Discussion:
Conflict between xunicode and fontspec?
Julien ÉLIE
2008-02-04 21:46:18 UTC
Permalink
Hi,

I have a problem with xunicode. When it is loaded *before* fontspec,
it does not trigger the issue.

You can see in <http://people.via.ecp.fr/~iulius/test-xunicode.pdf> that
the letter "î" has a weird circumflex if I compile with XeLaTeX:

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
\documentclass[a4paper,12pt]{article}
\usepackage[latin1]{inputenc}
\usepackage[T1]{fontenc}

\usepackage{xltxtra}

\usepackage{hyperref}

\begin{document}

âêîôû \textbf{âêîôû}

\end{document}
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

but *no* problem with:

\usepackage{xunicode,fontspec}
\usepackage{xltxtra}

while there is one with:

\usepackage{fontspec,xunicode}
\usepackage{xltxtra}


Do you understand what is going on? Why do they interfere?



By the way, there is *no* problem if I do not specify:

\usepackage[latin1]{inputenc}
\usepackage[T1]{fontenc}

But if I do not load these two packages, I have weird results with lots of
things (encoding problems for the index with the multind package, weird
accents, extra spaces with active French characters [I have " !" instead
of " !", for instance, with Babel], and maybe other minor issues I have not seen).


Thanks beforehand for your help regarding this issue.

Regards,
--
Julien ÉLIE

« Il ne faut jamais parler séchement à un Numide. » (Astérix)
Arthur Reutenauer
2008-02-05 01:12:34 UTC
Permalink
Hello,

You should not use inputenc and fontenc with XeTeX, they simply don't
support XeTeX at all. The output you get indicates incompatibility
between inputenc + fontenc and xunicode + fontspec, rather than a
conflict between the latter two (it might be that changing the order of
the packages improves things, but that's really not a robust solution).

If you really need to use the latin-1 encoding, you should use the
\XeTeXinputencoding command (\XeTeXinputencoding ISO-8859-1), but I
encourage you to use UTF-8, if possible (this is the default encoding
XeTeX expects).
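
A minimal sketch of that first approach, assuming the \XeTeXinputencoding
syntax described in the XeTeX reference (the file itself would have to be
saved as Latin-1):

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
% Sketch: a Latin-1 source compiled with XeLaTeX, no inputenc/fontenc.
\documentclass[a4paper,12pt]{article}
\XeTeXinputencoding ISO-8859-1 % everything after this line is read as Latin-1
\usepackage{xltxtra}
\begin{document}
âêîôû \textbf{âêîôû}
\end{document}
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=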

In addition, you could try and give up Babel altogether and use
François Charette's polyglossia instead, which is a XeTeX-aware
replacement for Babel.

http://scripts.sil.org/svn-view/xetex/TRUNK/texmf/tex/xelatex/polyglossia/

There may be other issues left like indexes, etc., but in any case the
solution is certainly not to stick to inputenc and fontenc. You should
fix that first.

Arthur
Post by Julien ÉLIE
« Il ne faut jamais parler séchement à un Numide. » (Astérix)
A pity Goscinny's humor doesn't translate well into other languages ;-)
Julien ÉLIE
2008-02-05 18:25:16 UTC
Permalink
Hi Arthur,

First of all, thanks for your answer.
Post by Arthur Reutenauer
You should not use inputenc and fontenc with XeTeX, they simply don't
support XeTeX at all.
Well, I have just tried polyglossia:

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
% encoding: utf-8

\documentclass[a4paper,12pt]{article}

\usepackage{polyglossia}
\setdefaultlanguage{french}

\usepackage{xltxtra}
\usepackage{hyperref}

\begin{document}

âêîôû \textbf{âêîôû}

test ! test ! test! test~!

\end{document}
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Accents are good but I do not have the right spaces before "!"...
The first one is an unbreakable space and the second one is a normal
space. And the result is that I have *two* spaces for the first one
and different kinds of *one* space for the others.

It is not good at all...

However, if I add:

\usepackage[latin1]{inputenc}
\usepackage[T1]{fontenc}

The result is fine!

Where do you think the problem is?
Post by Arthur Reutenauer
In addition, you could try and give up Babel altogether and use
François Charette's polyglossia instead, which is a XeTeX-aware
replacement for Babel.
Thanks for the pointer but, well, Babel has stuff like
\AddThinSpaceBeforeFootnotes, \FrenchFootnotes, \ier and other
commands I use, so switching to polyglossia breaks a lot of things...
Post by Arthur Reutenauer
A pity Goscinny's humor doesn't translate well in other languages ;-)
It depends on the sentence. This one should be better for translations :)
--
Julien ÉLIE

« Quand tu auras lu ces lignes, le papyrus s'autodétruira. » (Astérix)
Jonathan Kew
2008-02-06 00:01:04 UTC
Permalink
Post by Julien ÉLIE
Hi Arthur,
First of all, thanks for your answer.
Post by Arthur Reutenauer
You should not use inputenc and fontenc with XeTeX, they simply don't
support XeTeX at all.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
% encoding: utf-8
\documentclass[a4paper,12pt]{article}
\usepackage{polyglossia}
\setdefaultlanguage{french}
\usepackage{xltxtra}
\usepackage{hyperref}
\begin{document}
âêîôû \textbf{âêîôû}
test ! test ! test! test~!
\end{document}
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Accents are good but I do not have the right spaces before "!"...
The first one is an unbreakable space and the second one is a normal
space. And the result is that I have *two* spaces for the first one
This sounds like polyglossia doesn't recognize the non-breaking space
as a "space", and so adds space of its own; I expect François can
update this.
Post by Julien ÉLIE
and different kinds of *one* space for the others.
In the case of "test!", I think polyglossia is providing a \kern of a
certain width. Presumably "test !" and "test~!" simply give you the
standard space, which may not be the same.
Post by Julien ÉLIE
It is not good at all...
\usepackage[latin1]{inputenc}
\usepackage[T1]{fontenc}
The result is fine!
I guess \usepackage[latin1]{inputenc} has the effect of converting
some of the accented characters, and probably the non-breaking space,
into LaTeX control sequences, and then some internal macros may deal
with them differently. However, this is not a good idea in xelatex;
if you think about it, you're actually misleading the software,
claiming that your text is Latin-1 when in fact it was UTF-8!

The only reason your accented characters survived at all is that
their Unicode values happen to coincide with their Latin-1
codepoints. So after xetex has decoded the UTF-8 bytes into Unicode
characters, the inputenc package then "decodes" those character
values into LaTeX macros. But this will not work in most other cases;
you were lucky that Latin-1 and Unicode happen to share codepoints
for the characters of interest.

I don't know exactly how fontenc gets involved here; it may mean that
you end up using different virtual fonts, or something. Did you try
this in combination with fontspec-selected fonts, not just the
default CM/LM?

If there are language-specific issues like space before footnotes
that polyglossia doesn't yet handle, I hope François will consider
adding support for these; I think this is a much better way forward
than trying to use combinations of old stuff (built for legacy byte
encodings and fonts) and the new Unicode mechanisms.

JK
François Charette
2008-02-06 08:09:38 UTC
Permalink
Post by Jonathan Kew
Post by Julien ÉLIE
<JE>
<...>
Accents are good but I do not have the right spaces before "!"...
The first one is an unbreakable space and the second one is a normal
space. And the result is that I have *two* spaces for the first one
<JK>
This sounds like polyglossia doesn't recognize the non-breaking space
as a "space", and so adds space of its own; I expect François can
update this.
Sure, this should not be too difficult. I will look into this next
week-end.
Post by Jonathan Kew
Post by Julien ÉLIE
and different kinds of *one* space for the others.
In the case of "test!", I think polyglossia is providing a \kern of a
certain width. Presumably "test !" and "test~!" simply give you the
standard space, which may not be the same.
Indeed. I have not yet considered cases like "test !" and "test~!".
Currently polyglossia expects the input to be "test!" and inserts the
appropriate kerning, just as it was done in antomega by means of OTPs.
Post by Jonathan Kew
Post by Julien ÉLIE
It is not good at all...
Bonjour Julien. Please bear with me! Polyglossia is only a few weeks old
and has not even reached "alpha status"! I make the development version
available on the svn server (and as a snapshot on mine) so that people
like you can test it and point out bugs and omissions. Patches are of
course more than welcome ;-)
And thanks for your "bug report" of course.
Post by Jonathan Kew
If there are language-specific issues like space before footnotes
that polyglossia doesn't yet handle, I hope François will consider
adding support for these;
At this stage only "basic" language-specific features are implemented. I
have yet to look in more detail at the "Pro" version of French
support in Babel. I am prepared to make polyglossia eventually as
feature-rich as Babel (or rather more so!), but in some cases this will
take some time.
Post by Jonathan Kew
I think this is a much better way forward
than trying to use combinations of old stuff (built for legacy byte
encodings and fonts) and the new Unicode mechanisms.
I can only agree with that!


FC
Julien ÉLIE
2008-02-06 13:00:19 UTC
Permalink
Hi François,
Post by François Charette
Post by Jonathan Kew
This sounds like polyglossia doesn't recognize the non-breaking space
as a "space", and so adds space of its own; I expect François can
update this.
Sure, this should not be too difficult. I will look into this next
week-end.
Thanks a lot.
Post by François Charette
Indeed. I have not yet considered cases like "test !" and "test~!".
Currently polyglossia expects the input to be "test!" and inserts the
appropriate kerning, just as it was done in antomega by means of OTPs.
I believe it should give the same result, with the accurate spacing,
whatever space is given (even "test !" or 0x2003 [EM SPACE]
or 0x200A [HAIR SPACE] or 0x202F [NARROW NO-BREAK SPACE] or...).
Otherwise, the user should use "\string!".
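
The Babel mechanism alluded to here makes "!" active so that it can
normalize whatever space precedes it. A minimal sketch of that idea — not
frenchb's or polyglossia's actual code:

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
\documentclass{article}
\begin{document}
\catcode`\!=\active
% Remove any preceding glue, then insert an unbreakable thin space
% and the literal "!" (via \string, since "!" is now active).
\def!{\ifdim\lastskip>0pt \unskip\fi\penalty10000\thinspace\string!}
test ! test! test~!
\end{document}
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=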
Post by François Charette
Post by Jonathan Kew
Post by Julien ÉLIE
It is not good at all...
Bonjour Julien. Please bear with me! Polyglossia is only a few weeks old
and has not even reached "alpha status"!
Yep, I know! Sorry for having been a bit rude. I was speaking about
the spacing when I wrote "It is not good at all", not about the whole
package, which I understand is in development.
Post by François Charette
Patches are of course most than welcome ;-)
Sure but I am currently unable to write complex macros like the ones
we can find in French Babel:

http://ftp.oleane.net/pub/CTAN/macros/latex/required/babel/frenchb.dtx

I know how to add things like:

\def\ier{\textsuperscript{\lowercase{er}}\xspace} % requires the xspace package

But for more complex macros like \AddThinSpaceBeforeFootnotes, I am
not TeX-fluent at all!

\ifLaTeXe
\newcommand{}{\***@FNtrue}
\AtBeginDocument{\let\@footnotemarkORI\@footnotemark
\def\@footnotemarkFR{\leavevmode\unskip\unkern
\,\@footnotemarkORI}%
\***@FN
\let\@footnotemark\@footnotemarkFR
\fi}
\def\ftnISsymbol{\@fnsymbol\***@footnote}
\long\def\@makefntextFR#1{\ifx\thefootnote\ftnISsymbol
[...]

is the beginning of supporting \AddThinSpaceBeforeFootnotes, and I am
not good enough to convert it to Polyglossia, that is to say, to know
what should be changed between LaTeX and XeLaTeX.
Otherwise, it would have been a pleasure to help.
Post by François Charette
And thanks for your "bug report" of course.
Better say a "feature request" :)
Post by François Charette
I am prepared to make polyglossia eventually as
feature-rich as Babel (or rather more so!), but in some cases this will
take some time.
That sounds great!
Thanks!
--
Julien ÉLIE

« -- Et fais attention de ne pas te cogner aux arbres !
-- ...
-- Le tout, c'est de faire attention de ne pas se cogner aux Gaulois ! » (Astérix)
Arthur Reutenauer
2008-02-06 21:06:55 UTC
Permalink
Post by François Charette
Indeed. I have not yet considered cases like "test !" and "test~!".
Currently polyglossia expects the input to be "test!" and inserts the
appropriate kerning, just as it was done in antomega by means of OTPs.
Actually there is an interesting issue here, and it can be connected
with another recent thread, the one about soft hyphen: I think some of
these characters should be handled on a lower level than
language-specific packages, probably in the format itself.

Such characters include the Unicode soft hyphen, as has already been
discussed, and various space characters, as you and Adam pointed out,
including the ISO 8859 no-break space (Unicode U+00A0 NO-BREAK SPACE).
I'm convinced that handling the latter as equivalent to the usual TeX
tie (~) is the right thing to do here, that is, making U+00A0 active and
defining it as \penalty10000\space or something like that (just like ~
in every TeX dialect). Of course, active characters have always had a
rather bad reputation in TeX, and I certainly hope you won't be tempted
to make some innocuous punctuation marks active in gloss-french, for
example, as it is done in Babel's french.ldf! But for some particular
Unicode characters, it seems advisable and the most straightforward
thing to do; I would say most characters with General Category C (U+00AD
SOFT HYPHEN is one of them, having gc=Cf), as well as some space
characters (space is anyway an active thing in TeX, if you think about it).
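
A minimal sketch of that definition in XeLaTeX, writing the character as
^^a0 to keep it visible:

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
\documentclass{article}
\usepackage{fontspec}
\catcode`\^^a0=\active        % ^^a0 is TeX's notation for U+00A0
\def^^a0{\penalty10000\space} % U+00A0 now acts like the tie ~
\begin{document}
10^^a0kg % no line break allowed between number and unit
\end{document}
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=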

Anyway, what I wanted to stress here was that it might be advisable to
think in two steps: first, handle the characters according to their
semantics as defined by Unicode (this would be XeLaTeX' job) and then,
only, make some fine language-dependent typographical treatment
(Polyglossia's job), like (in French) modifying the length of the spaces
according to the character that follows them -- but this is only a
secondary issue: after all, if a user explicitly inputs a no-break
space before an exclamation mark, he shouldn't expect that space to be
treated on a different basis than other no-break spaces if Polyglossia
doesn't instruct him to do so; instead, it is quite legitimate for
Polyglossia-French to require users to type the exclamation marks just
after the last word with no space (that is, the user should not try and
be smarter than the system, so to say).

What do you think?
Post by François Charette
At this stage only "basic" language-specific features are implemented. I
have yet to look into more details at the "Pro" version of French
support in Babel.
If you mean frenchpro.sty, I suppose you're aware that it is something
different from frenchb.ldf, the French support file for Babel; which
means it is not the default support file most users will be used to.

Arthur
Julien ÉLIE
2008-02-06 21:21:18 UTC
Permalink
Hi Arthur,
Post by Arthur Reutenauer
instead, it is quite legitimate for
Polyglossia-French to require users to type the exclamation marks just
after the last word with no space
Do you mean that I would have to write "Ah bon?" *only* when I use XeLaTeX?
And "Ah bon ?" (with 0x00A0) for my mails, news articles, Word documents,
scripts, etc. It would be a nightmare to have to think about the system
on which I write, especially for people who have automatic reflexes to
correctly write the French language!
--
Julien ÉLIE

« -- Essayons d'interroger ce garde habilement sans éveiller ses soupçons...
-- Hep ! Où est enfermé Assurancetourix ? » (Astérix)
Arthur Reutenauer
2008-02-06 22:49:30 UTC
Permalink
Post by Julien ÉLIE
Do you mean that I would have to write "Ah bon?" *only* when I use XeLaTeX?
And "Ah bon ?" (with 0x00A0) for my mails, news articles, Word documents,
scripts, etc.
U+00A0 NO-BREAK SPACE is not the kind of space required by French
typographic rules before question marks, exclamation marks and
semicolons. The appropriate character is closer to U+202F NARROW
NO-BREAK SPACE (“espace fine insécable” in French typography), and this
is the one you should type if you want to convey that precise meaning.
If you type U+00A0, which is a normal no-break space (“espace mot
insécable”), you should not expect it to behave magically as a narrower
no-break space – unless, of course, you rely on a more elaborate text
processor which you *know* will handle those characters according to
different rules (a “higher-level protocol”, as Unicode puts it). But
such an expectation would, generally speaking, not be Unicode-compliant.

Incidentally, U+00A0 NO-BREAK SPACE also occurs in French
typography, since it is the amount of space required by most
typographers before the colon (not the other punctuation marks); some,
though, would put a narrow no-break space here, and it's of course up to
you to choose what space you want to put, so the difference really makes
sense, I think (and is endorsed by Babel-French, where the colon is
indeed preceded by a full no-break space, whereas the other special
punctuation marks have a narrow no-break space before them).

To sum up, wanting to be smart and inputting the appropriate Unicode
characters directly is of course legitimate, and should be handled
correctly by XeLaTeX (this is the “first step” I highlighted in my
previous post); but if you only go half the way, I don't consider it
should be a priority for Polyglossia to correct your input – although I
do think this would be a nice thing to do, but there would be a number
of problems involved, and it could take time to do so (especially
because one would not like to implement the same solution as Babel by
making punctuation marks active). On the other hand, if you don't want
to bother about the exact character to key in before the “double”
punctuation marks, Polyglossia-French requires you (for the moment) to
type no space at all, for technical reasons – and I expect this would be
more convenient to the vast majority of French-speaking users, who
wouldn't know how to input a no-break space in the first place.

Arthur

P-S for Adam: Maybe you understand now what I meant when I wrote that
“many TeX users do things which are not Unicode-compliant in a much
worse way” last week-end. I could not ask for a better example :-)
Julien ÉLIE
2008-02-07 19:31:24 UTC
Permalink
Hi Arthur,
Post by Arthur Reutenauer
To sum up, wanting to be smart and inputting the appropriate Unicode
characters directly is of course legitimate, and should be handled
correctly by XeLaTeX (this is the “first step” I highlighted in my
previous post);
You are right.
However, what I was pointing at was the fact that XeLaTeX was not doing
that since Polyglossia added an extra \thinspace after a non-breaking
space 0x00A0 before a question mark.

I know it will be fixed by François, as he said on this mailing list.
Post by Arthur Reutenauer
but if you only go half the way, I don't consider it
should be a priority for Polyglossia to correct your input – although I
do think this would be a nice thing to do
Sure.
--
Julien ÉLIE

« Les légionnaires ont adopté pour attaquer la redoutable tactique
dite de la tortue. Pour battre en retraite, les légionnaires
adoptent l'efficace tactique dite du lièvre. » (Astérix)
Will Robertson
2008-02-06 22:55:45 UTC
Permalink
Post by Julien ÉLIE
Post by Arthur Reutenauer
instead, it is quite legitimate for
Polyglossia-French to require users to type the exclamation marks just
after the last word with no space
Do you mean that I would have to write "Ah bon?" *only* when I use XeLaTeX?
That would indeed be inconvenient. It's troubling (to me) how many
language-specific ways there might be of writing even relatively plain
texts.

Anyway, I'm sure this won't be a problem. Even if x00A0 is defined
active, it can still be removed so that polyglossia can do its job:

\documentclass{article}
\begin{document}
\catcode`\^^a0=\active % ^^a0 is TeX's notation for U+00A0, the no-break space
\let^^a0~              % it now behaves like the tie
a^^a0\unskip b         % and \unskip can remove the space it produced
\end{document}

(I have written the no-break spaces as ^^a0 here, in case literal ones
don't make it through. Anyway, you get the idea.)
Post by Julien ÉLIE
Of course, active characters have always had a
rather bad reputation in TeX
Only when they're not expected :) For example, no-one has a problem
with the active character ~.

In classical TeX, there are simply too few characters to be able to
combine syntax and semantics for more than a couple of ascii
characters -- and the macros were never designed to sanitise their
inputs (no doubt due to quite legitimate performance concerns).

For example, it would be nice to make ` active so you can write
`quoted text` and have the quotes render correctly and check their
nesting (e.g. see the csquotes package). But then you'd better be
really careful about writing code like \catcode`\x !

In short, characters can be made active to enforce their unicode
meaning. I even make — (emdash) active so it inserts the right amount
of (breaking and non-breaking) space around it; although interchartoks
will soon make this the wrong way of doing that sort of thing.

It would be really nice to have an editor which replaced the
essentially invisible unicode spaces, breaks, and joins with distinct
coloured symbols (and gave shortcuts to type them, maybe). Consider
this a request for anyone's upcoming TeX editor :)

Will
Max Rabkin
2008-02-07 08:23:44 UTC
Permalink
Post by Will Robertson
It would be really nice to have an editor which replaced the
essentially invisible unicode spaces, breaks, and joins with distinct
coloured symbols (and gave shortcuts to type them, maybe). Consider
this a request for anyone's upcoming TeX editor :)
Yudit might do what you want.
Post by Will Robertson
Will
--Max
Julien ÉLIE
2008-02-07 19:31:28 UTC
Permalink
Hi Will,
Post by Will Robertson
Anyway, I'm sure this won't be a problem. Even if x00A0 is defined
active, it can still be removed so that polyglossia can do its job
Awesome!

Thanks a lot, Will: it solves all the problems I have with Babel
since I do not need inputenc+fontenc any longer.
Post by Will Robertson
(Who knows if those non-breaking spaces will make it through. Anyway,
you get the idea.)
Yes, thanks again!
--
Julien ÉLIE

« Un croyant, c'est un antiseptique. » (Raymond Devos)
Joel C. Salomon
2009-09-30 21:38:03 UTC
Permalink
I just came across this thread from about 15 months back, entitled
Post by Arthur Reutenauer
Of course, active characters have always had a
rather bad reputation in TeX
Only when they're not expected :) For example, no-one has a problem with
the active character ~.
<snip>
For example, it would be nice to make ` active so you can write `quoted
text` and have the quotes render correctly and check their nesting (e.g.
see the csquotes package). But then you'd better be really careful about
writing code like \catcode`\x !
In short, characters can be made active to enforce their unicode
meaning. I even make — (emdash) active so it inserts the right amount of
(breaking and non-breaking) space around it; although interchartoks will
soon make this the wrong way of doing that sort of thing.
I’m intrigued. I’ve had this code in my preamble for a while (don’t
recall who gave it to me):
\DeclareRobustCommand\dash{%
\ifvmode\leavevmode\else\unskip\nobreak\thinspace\fi
\textemdash\thinspace\ignorespaces
}
\catcode`\—=\active
\let—=\dash
What are interchartoks, and how would I do this with them?

—Joel Salomon
Will Robertson
2009-09-30 22:54:25 UTC
Permalink
On 2009-10-01 07:08:03 +0930, "Joel C. Salomon"
Post by Joel C. Salomon
\catcode`\—=\active
\let—=\dash
That looks vaguely like something I've written in the past :)
Post by Joel C. Salomon
What are interchartoks, and how would I do this with them?
There's an example in xetexref that might help to get you started.
(Although the feature wasn't really designed for this sort of
application, in a controlled environment you can get very good results
from it.)
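
For the curious, a rough sketch of what the interchartoks approach might
look like for the em-dash; the class number and spacing values here are
assumptions, not taken from this thread:

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
\documentclass{article}
\usepackage{fontspec}
\XeTeXinterchartokenstate=1
\XeTeXcharclass`\— = 4                   % put the em-dash in an unused class
\XeTeXinterchartoks 0 4 = {\penalty10000\thinspace} % no break before the dash
\XeTeXinterchartoks 4 0 = {\thinspace}              % a break may occur after it
\begin{document}
word—word
\end{document}
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=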

Will

Julien ÉLIE
2008-02-06 19:49:33 UTC
Permalink
Hi Jonathan,

Thanks for having answered. (And also thanks for one of the
interviews you gave, where you say that you, for your part, pronounce
XeLaTeX [ˈziːlɑtɛx].)
Post by Jonathan Kew
I guess \usepackage[latin1]{inputenc} has the effect of converting
some of the accented characters, and probably the non-breaking space,
into LaTeX control sequences
I believe it is what it does. And I must admit it is very bad since
if I copy/paste the PDF generated, I obtain stuff like "´et´e" or "goˆut"
instead of "été" or "goût". Native UTF-8 by fontspec is far better :)
And it highlights what you, Arthur and François say: inputenc should not
be used with XeLaTeX.
Post by Jonathan Kew
I don't know exactly how fontenc gets involved here; it may mean that
you end up using different virtual fonts, or something. Did you try
this in combination with fontspec-selected fonts, not just the
default CM/LM?
Only the default fonts.
Post by Jonathan Kew
If there are language-specific issues like space before footnotes
that polyglossia doesn't yet handle, I hope François will consider
adding support for these; I think this is a much better way forward
than trying to use combinations of old stuff (built for legacy byte
encodings and fonts) and the new Unicode mechanisms.
I have a question: why should polyglossia be written from scratch?
Couldn't Babel be "updated" to be used with XeLaTeX?

I think it is a waste of time and effort to write polyglossia instead
of improving what Babel does. Is it really incompatible, with no
way to make it work with XeLaTeX?
--
Julien ÉLIE

« L'éternité, c'est long, surtout vers la fin. » (Woody Allen)
Peter Dyballa
2008-02-06 20:39:22 UTC
Permalink
Post by Julien ÉLIE
I believe it is what it does. And I must admit it is very bad since
if I copy/paste the PDF generated, I obtain stuff like "´et´e" or "goˆut"
instead of "été" or "goût".
Use the cmap package to correct this behaviour in pdfLaTeX!

--
Greetings

Pete

Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs, and the Universe trying
to produce bigger and better idiots. So far, the Universe is winning.
– Rich Cook
Julien ÉLIE
2008-02-06 20:45:02 UTC
Permalink
Hi Peter,
Post by Peter Dyballa
Post by Julien ÉLIE
I believe it is what it does. And I must admit it is very bad since
if I copy/paste the PDF generated, I obtain stuff like "´et´e" or
"goˆut" instead of "été" or "goût".
Use the cmap package to correct this behaviour in pdfLaTeX!
I use XeLaTeX and this package does not change anything for it.

Thanks all the same.
--
Julien ÉLIE

« -- Rendre compte ? Mais nous n'y sommes pas allés
et nous n'avons rien vu ! Et puis Jules César a dit...
-- Je ne sais pas ce que Jules César a dit,
mais ne pas y aller et ne pas voir,
c'est le meilleur moyen de ne pas être vaincus ! » (Olibrius)
Will Robertson
2008-02-06 20:54:29 UTC
Permalink
Post by Julien ÉLIE
I have a question: why should polyglossia be written from scratch?
Couldn't Babel be "updated" to be used with XeLaTeX?
Long story short: no :) Almost everything that polyglossia and babel
have in common is implemented in pretty different ways. And there are
lots of new features that we'd like to add to polyglossia (e.g., like
the mem package), but the base packages of LaTeX don't really have
features added to them any more...

Will
François Charette
2008-02-06 21:38:27 UTC
Permalink
Post by Julien ÉLIE
I have a question: why should polyglossia be written from scratch?
Couldn't Babel be "updated" to be used with XeLaTeX?
I think it is a waste of time and effort to write polyglossia instead
of improving what Babel does. Is it really incompatible, with no
way to make it work with XeLaTeX?
In addition to what Will already wrote, read the previous threads on
polyglossia. You will realize that: 1) it is not written from scratch
and 2) it is not based on Babel either! ;)

FC
Bruno Voisin
2008-02-07 07:43:36 UTC
Permalink
Post by Julien ÉLIE
I have a question: why should polyglossia be written from scratch?
Couldn't Babel be "updated" to be used with XeLaTeX?
I think it is a waste of time and effort to write polyglossia instead
of improving what Babel does. Is it really incompatible, with no
way to make it work with XeLaTeX?
As François Charette already answered, there have been a number of
threads on this.

Babel does several disconnected things regarding language support:

- Select hyphenation files.

- Translate strings (like "section", "chapter", "carbon copy", etc.).

- Implement language-specific typographic conventions.

- Select specific fonts (for Russian, for Greek, ...) and specific
encodings.

Some of these things are incompatible with XeTeX (font and encoding
selection), and others are not agreed upon by all users.

Taking French as an example, the French babel definition file not only
sets spacing properly around punctuation (normal space after ".",
space before ":" and ";", etc.) and adds support for guillemets, but
it also enforces assumed norms (like only dashes before items in
itemized lists and no space between items, surnames in small caps,
etc.) which to my knowledge are not norms at all, and which I personally
consider bad taste.

Similarly, just yesterday or the day before there was a question here
or on the OS X TeX list on how to prevent the Spanish definition file
from replacing "." in digits in maths by ",".

There are switches, language by language, to prevent Babel from taking
these initiatives, but these switches are often undocumented and their
syntax varies greatly from one language to the other.

Because of all this, the general opinion here (my interpretation,
obviously) was that it would be better if these distinct functions of
Babel were independent from each other, and could be activated at will
by the user. For example, I'm personally interested in hyphenation
selection and translation of strings, and in a small subset of
language-specific typographic conventions, but in none of the other
initiatives Babel is taking.

Given it's rather unlikely that Babel will be modified to make its
functions independent (that would probably require a reimplementation
from scratch, given Babel's foundation on 8-bit fonts and encodings),
it was suggested that a new package be created, based on Unicode and
oriented towards XeTeX support (and possibly LuaTeX at some point).

Then François Charette took upon himself to create this package, and
the rest is history... ;-)

Bruno Voisin
Julien ÉLIE
2008-02-07 19:35:56 UTC
Permalink
Hi Bruno,
Post by Bruno Voisin
As François Charette already answered, there have been a number of
threads on this.
Thanks a lot for the explanation you gave me!
Post by Bruno Voisin
Taking French as an example, the French babel definition file not only
[...]
Post by Bruno Voisin
it also enforces assumed norms
[...]
Post by Bruno Voisin
which to my knowledge are no norms at all and which I personally
consider bad taste.
De gustibus non disputandumst. (There is no disputing about taste.)
Post by Bruno Voisin
Then François Charette took upon himself to create this package, and
the rest is history... ;-)
I could not agree more! Thanks, François.
--
Julien ÉLIE

« Ta vie ne tient qu'à un fil, Téléféric ! » (Astérix)
Ulrike Fischer
2008-02-07 13:14:48 UTC
Permalink
Post by Julien ÉLIE
Post by Jonathan Kew
I guess \usepackage[latin1]{inputenc} has the effect of converting
some of the accented characters, and probably the non-breaking space,
into LaTeX control sequences
I believe it is what it does. And I must admit it is very bad since
if I copy/paste the PDF generated, I obtain stuff like "´et´e" or "goˆut"
instead of "été" or "goût".
This has nothing to do with inputenc. It doesn't matter whether a glyph is
printed "directly" or through a more or less long chain of commands.
What matters is whether this chain of commands results in something the
reader recognizes and knows how to copy. In the case of the accents it
is in most cases a question of the output encoding:

\documentclass{article}
\usepackage[ansinew]{inputenc}
\usepackage[T1]{fontenc}
\begin{document}
T1: été, goût %copies fine

\fontencoding{OT1}\selectfont
OT1: été, goût %copies badly

\end{document}
--
Ulrike Fischer
Peter Dyballa
2008-02-07 14:43:09 UTC
Permalink
Post by Ulrike Fischer
\documentclass{article}
\usepackage[ansinew]{inputenc}
\usepackage[T1]{fontenc}
\begin{document}
T1: été, goût %copies fine
\fontencoding{OT1}\selectfont
OT1: été, goût %copies badly
\end{document}
It's not that, and it's also not a matter of using the cmap package – in Mac OS
X the characters, when copied, are inserted only in decomposed
form. Even when I open old files!

The change has come ... from this in the source Ḃạȳḵ
ÝŶŸƳÝẎỴỲỶȲɎỸ ǵğǧģĝġʛɠḡǥɢɡ to that from
PDF: Ḃạȳḵ ÝŶŸ�ÝẎỴỲỶȲ�Ỹ
ǵğǧģĝġ��ḡ���! Fontspec only used with Monaco.

--
Greetings

Pete

"If we don't succeed, we run the risk of failure."
– George W. Bush
Ulrike Fischer
2008-02-07 15:47:21 UTC
Permalink
Post by Peter Dyballa
Post by Ulrike Fischer
\documentclass{article}
\usepackage[ansinew]{inputenc}
\usepackage[T1]{fontenc}
\begin{document}
T1: été, goût %copies fine
\fontencoding{OT1}\selectfont
OT1: été, goût %copies badly
\end{document}
It's not that, and it's also not using the cmap package – in Mac OS
X the characters, when copied, are inserted only in decomposed
fashion. Even when I open old files!
I would say this is a misunderstanding: the above example was meant for
pdfLaTeX, not for XeLaTeX.

There inputenc maps the û to a command (\^{u}), and t1enc.def then maps
it to char position 251, which in turn is mapped by the cm-super fonts to
the glyph "ucircumflex", which can be copied without problems.
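A minimal pdfLaTeX sketch of that chain (assuming a latin-1 encoded source file and the standard latin1/T1 setup; all three lines should select the same glyph):

```latex
% pdfLaTeX only: trace how û travels from input byte to glyph slot
\documentclass{article}
\usepackage[latin1]{inputenc} % byte 0xFB -> \^{u}
\usepackage[T1]{fontenc}      % \^{u} -> slot 251 of the T1 encoding
\begin{document}
û        % entered directly in the (latin-1 encoded) source
\^{u}    % entered as an accent command
\char251 % the same T1 slot selected by number
\end{document}
```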

An example for XeLaTeX would be:

\documentclass{article}
\usepackage{fontspec}

\begin{document}
\^{u} % copies badly

\fontencoding{T1}\selectfont
\^{u} % copies fine
\end{document}

So the problem is not the input but the output encoding: the default
encoding EU1 from fontspec doesn't contain the necessary commands to map
commands like \^{u} to the correct single glyph "ucircumflex".
Post by Peter Dyballa
The change has come ... from this in the source Ḃạȳḵ
ÝŶŸƳÝẎỴỲỶȲɎỸ ǵğǧģĝġʛɠḡǥɢɡ to that from
PDF: Ḃạȳḵ ÝŶŸ�ÝẎỴỲỶȲ�Ỹ
ǵğǧģĝġ��ḡ���! Fontspec only used with Monaco.
Not quite sure what this is meant to show.
--
Ulrike Fischer
Peter Dyballa
2008-02-07 16:42:15 UTC
Permalink
Post by Ulrike Fischer
So the problem is not the input but the output encoding: the default
encoding eu1 from fontspec doesn't contain the necessary commands to map
the commands like \^{u} to the correct single glyph "ugrave".
This would explain why all of a sudden it fails to copy from the PDF file:
fontspec was changed in an unwanted and unexpected way – besides,
probably not many people write with LaTeX macros; they use the proper
characters instead.

Below is just the input, contents of the LaTeX source file, and the
output I could copy from the PDF file.
Post by Ulrike Fischer
Post by Peter Dyballa
The change has come ... from this in the source Ḃạȳḵ
ÝŶŸƳÝẎỴỲỶȲɎỸ ǵğǧģĝġʛɠḡǥɢɡ to that from
PDF: Ḃạȳḵ ÝŶŸ�ÝẎỴỲỶȲ�Ỹ
ǵğǧģĝġ��ḡ���! Fontspec only used with Monaco.
?? Not quite sure what this should mean.
I was thinking (and I think I also experienced this) that xdv2pdf and
xdvipdfmx were inserting a mapping from accented characters to their
Unicode positions, just as Vladimir Volovich's cmap package tries to
handle this with pdfTeX, so that copying from the PDF file resulted
in fully composed Unicode strings. At least today I cannot copy
composed Unicode characters from any PDF file.

--
Greetings

Pete

It's not the valleys in life I dread so much as the dips.
– Garfield
Ulrike Fischer
2008-02-07 17:39:38 UTC
Permalink
Post by Peter Dyballa
Post by Ulrike Fischer
So the problem is not the input but the output encoding: the default
encoding eu1 from fontspec doesn't contain the necessary commands to map
the commands like \^{u} to the correct single glyph "ugrave".
fontspec was changed in an unwanted and unexpected way
I don't know if this is really your problem. Perhaps you are using some
other package which maps your input to commands like \^{u}. As Bruno
mentioned, there is xunicode, which handles the \^{u} case, so you should
try it. If it doesn't help, a complete minimal example and the resulting
PDF would probably help to find what's going wrong.
Post by Peter Dyballa
– besides probably not so many will write with LaTeX macros but use the proper
characters.
How do you tell the two apart? A û in a normal (pdf)LaTeX file with e.g.
ansinew encoding is a "proper" character from the user's point of view,
yet inputenc still maps it to \^u.
Post by Peter Dyballa
I was thinking (and I think I also experienced this) xdv2pdf and
xdvipdfmx were inserting a mapping from accented characters to their
Unicode positions, just as Vladimir Volovich's cmap packages tries to
handle this with pdfTeX, so that copying from the PDF file resulted
in fully composed Unicode strings. At least today I cannot copy
composed Unicode characters from any PDF file.
Does "any PDF" include arbitrary PDF's from the net?
--
Ulrike Fischer
Peter Dyballa
2008-02-07 23:10:09 UTC
Permalink
Post by Ulrike Fischer
Post by Peter Dyballa
Post by Ulrike Fischer
So the problem is not the input but the output encoding: the default
encoding eu1 from fontspec doesn't contain the necessary commands to map
the commands like \^{u} to the correct single glyph "ugrave".
fontspec was changed in an unwanted and unexpected way
I don't know if this is really your problem. Perhaps you are using some
other package which maps your input to commands like \^{u}. As Bruno
mentioned there is xunicode which handles the \^{u} case so you should
try it. If it doesn't help: a complete minimal example and the
resulting
pdf would probably help to find what's going wrong.
In GNU Emacs I can input the characters as themselves, so xunicode is
not needed. I started LaTeX at a Sun keyboard with a compose key –
there was no real need to learn these macros (OK, TeX was also
patched to accept direct input). Since then I prefer to use as few
packages as possible. (Which sometimes fails badly: the latest hyperref
package only works when you also install the whole oberdiek plethora of
packages. But it has support for pdfTeX 1.50!)
Post by Ulrike Fischer
Post by Peter Dyballa
I was thinking (and I think I also experienced this) xdv2pdf and
xdvipdfmx were inserting a mapping from accented characters to their
Unicode positions, just as Vladimir Volovich's cmap packages tries to
handle this with pdfTeX, so that copying from the PDF file resulted
in fully composed Unicode strings. At least today I cannot copy
composed Unicode characters from any PDF file.
Does "any PDF" include arbitrary PDF's from the net?
Yes! I have some German PDF documents, often created without pdfTeX
or XeTeX. (The worst of them is from pdfFactory 2.42 (Windows XP
Professional German) – it seems to use some encryption or scrambling
such that what you see & copy is not what you paste & see. The names of
the ugly ragged fonts are hidden, except that they are TT.) But now
I've found a few causes/culprits ...

First problem is GNU Emacs – it does not like to compose.
Second problem is Apple's PDFKit – it decomposes.
Third are TextEdit and Adobe Reader – they behave correctly
(IMO).

When I copy from TeXShop (my preferred PDF viewer) to TextEdit the
characters are OK, composed as in GNU Emacs, i.e. the XeLaTeX source
file, or as displayed in the PDF viewer. Pasting the same into GNU
Emacs shows it decomposed.

When I copy from Adobe Reader 8 to TextEdit the characters are OK,
composed as in GNU Emacs, i.e. the XeLaTeX source file, or as
displayed in the PDF viewer. Pasting the same into GNU Emacs shows it
*alright*.

And when I copy in TextEdit what I had copied before from TeXShop and
pasted into TextEdit, and paste this secondary copy into GNU Emacs
*it's composed*.


I think this is worth another Apple Bug Report. Can MS Windows users
feel better?

--
Mit friedvollen Grüßen

Pete

Be careful of reading health books, you might die of a misprint.
– Mark Twain
Ross Moore
2008-02-08 00:31:06 UTC
Permalink
Hi Peter,
Post by Peter Dyballa
When I copy from TeXShop (my preferred PDF viewer) to TextEdit the
characters are OK, composed as in GNU Emacs, i.e. the XeLaTeX source
file, or as displayed in the PDF viewer. Pasting the same into GNU
Emacs shows it decomposed.
When I copy from Adobe Reader 8 to TextEdit the characters are OK,
composed as in GNU Emacs, i.e. the XeLaTeX source file, or as
displayed in the PDF viewer. Pasting the same into GNU Emacs shows it
*alright*.
And when I copy in TextEdit what I had copied before from TeXShop and
pasted into TextEdit, and paste this secondary copy into GNU Emacs
*it's composed*.
Yes, in my experience the results from Copy/Paste vary according
to the PDF reader being used, and the editor application being
the target into which you are pasting.
This is true when using the same PDF as the source.

A relevant issue is whether there is a /ToUnicode table for
the font within the PDF. But that isn't the only issue.


Another aspect is the extent to which word-boundaries
are respected after pasting. This makes it very hard to know
what to put into new /ToUnicode resources for existing TeX fonts.
Post by Peter Dyballa
I think this is worth another Apple Bug Report. Can MS Windows users
feel better?
Certainly this is something that only the big boys can fix.
Post by Peter Dyballa
--
Mit friedvollen Grüßen
Pete
Cheers,

Ross

------------------------------------------------------------------------
Ross Moore ross-To+F1JekST7X/***@public.gmane.org
Mathematics Department office: E7A-419
Macquarie University tel: +61 +2 9850 8955
Sydney, Australia 2109 fax: +61 +2 9850 8114
------------------------------------------------------------------------
Ulrike Fischer
2008-02-08 09:40:20 UTC
Permalink
Post by Peter Dyballa
Post by Ulrike Fischer
I don't know if this is really your problem. Perhaps you are using
some other package which maps your input to commands like \^{u}. As
Bruno mentioned there is xunicode which handles the \^{u} case so
you should try it. If it doesn't help: a complete minimal example
and the resulting pdf would probably help to find what's going
wrong.
In GNU Emacs I can input the characters as themselves, so xunicode is
not needed.
It has nothing to do with the editor or the input! Even the best editor
does not take a char in your tex-file and put it unchanged into a pdf.
tex-files are processed by latex/xelatex, then by drivers like dvips or
xdvipdfmx, then by readers like a pdf-reader, then by routines of your
OS when you try to copy and paste... and during all this processing the
chars of your input are moved around a lot, and a lot of weird things
can happen.

An "A" in your input can lead to a neat "A" in a pdf that copies fine.
But it can also lead to the Gettysburg address:

\documentclass{article}
\begin{document}
\begingroup
\catcode`\A\active
\def A{Four score and seven years ago \ldots}
A
\endgroup
\end{document}
Post by Peter Dyballa
I started LaTeX at a Sun keyboard with compose key –
there was no real need to learn using these macros (OK, TeX was also
patched to accept direct input).
You don't need to learn "these macros" (I guess you mean \^{u}). But you
should be aware that the fact that you don't use them in your input
doesn't mean that the macros aren't used at all. An input like û can
lead to such macros!
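One way to see this for yourself (a sketch; run with -interaction=nonstopmode, and note the exact expansion printed in the log may differ between inputenc versions):

```latex
% pdfLaTeX: reveal what the active û expands to when inputenc is loaded
\documentclass{article}
\usepackage[latin1]{inputenc}
\begin{document}
\show û % the log shows the active character's meaning, e.g. \IeC {\^u}
x
\end{document}
```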
Post by Peter Dyballa
Post by Ulrike Fischer
Post by Peter Dyballa
At least today I cannot copy
composed Unicode characters from any PDF file.
Does "any PDF" include arbitrary PDF's from the net?
Yes! I have some German PDF documents, often created without pdfTeX
or XeTeX. (The worst of them is from pdfFactory 2.42 (Windows XP
Professional German) – it seems to use some encryption or scrambling
that what you see & copy is not what you paste & see. The names of
the ugly ragged fonts are hidden, except that they are TT.) But now
I've found a few causes/culprits ...
First problem is GNU Emacs – it does not like to compose.
Second problem is Apple's PDFKit – it decomposes.
Third problem are TextEdit and Adobe Reader – they behave correctly
(IMO).
When I copy from TeXShop (my preferred PDF viewer) to TextEdit the
characters are OK, composed as in GNU Emacs, i.e. the XeLaTeX source
file, or as displayed in the PDF viewer. Pasting the same into GNU
Emacs shows it decomposed.
On the whole your description sounds as if your PDFs don't use the
correct glyphs. In a correct pdf "û" is *not* a composed char but the
simple glyph "ucircumflex". I don't see any reason why and how an
application should decompose such a glyph. I think you should really
make a complete minimal example.
--
Ulrike Fischer
Peter Dyballa
2008-02-08 10:34:42 UTC
Permalink
Post by Ulrike Fischer
I think you should really make a complete
minimal example.
OK, here it is:

\documentclass{article}
\usepackage[no-math]{fontspec}
%\usepackage{xunicode}
%\usepackage{xltxtra}
\defaultfontfeatures{Mapping=tex-text}
\setmainfont{Times}
\thispagestyle{empty}
\begin{document}
TeXt: ĄłÖūȚêĬñÒúŠçĐß—គạȳ឵ı :tXeT
\end{document}

And here is the PDF output attached.
Bruno Voisin
2008-02-08 10:50:41 UTC
Permalink
Post by Peter Dyballa
And here is the PDF output attached.
Opening this PDF on Mac OS X 10.5.1 in either Preview or Acrobat,
copying its content and pasting it in a text document with TextEdit,
there's no problem: no decomposed glyph. Is this what's expected?

Bruno Voisin
Arthur Reutenauer
2008-02-08 11:26:01 UTC
Permalink
Post by Bruno Voisin
Opening this PDF on Mac OS X 10.5.1 in either Preview or Acrobat,
copying its content and pasting it in a text document with TextEdit,
there's no problem: no decomposed glyph.
There are no “decomposed glyphs” ... but the characters are :-)
Depending on where you paste them, characters copied from Preview may be
put in Unicode Normalization Form D (decomposed) – apparently,
pasting them into Terminal does that; TextEdit keeps the exact stream.

This means that when you paste a <U+00D6 LATIN CAPITAL LETTER O WITH
DIAERESIS> from Preview to some applications, it will be decomposed into
<U+004F LATIN CAPITAL LETTER O, U+0308 COMBINING DIAERESIS>. I can
imagine that it happens for Emacs, for example.

Arthur
Peter Dyballa
2008-02-08 12:23:31 UTC
Permalink
Post by Bruno Voisin
Post by Peter Dyballa
And here is the PDF output attached.
Opening this PDF on Mac OS X 10.5.1 in either Preview or Acrobat,
copying its content and pasting it in a text document with TextEdit,
there's no problem: no decomposed glyph. Is this what's expected?
As I wrote: yes. Pasting it into GNU Emacs I see only decomposed
characters. And pasting into GNU Emacs a copy, taken from TextEdit, of
what I had pasted into TextEdit before, looks perfectly composed!
As if copied from Adobe Reader ...

--
Greetings

Pete

I wouldn't recommend sex, drugs or insanity for everyone, but they've
always worked for me.
– Hunter S. Thompson
Bruno Voisin
2008-02-07 17:03:06 UTC
Permalink
Post by Ulrike Fischer
\documentclass{article}
\usepackage{fontspec}
\begin{document}
\^{u} % copies badly
\fontencoding{T1}\selectfont
\^{u} % copies fine
\end{document}
So the problem is not the input but the output encoding: the default
encoding eu1 from fontspec doesn't contain the necessary commands to map
the commands like \^{u} to the correct single glyph "ugrave".
That's what the xunicode package does if I'm not mistaken.

And in addition, if you want to deal similarly with legacy TeX
ligatures such as --, ---, !`, ?`, '', ``, then you need to switch on
the tex-text TECkit mapping.

Which would transform your preamble to:

\usepackage{fontspec,xunicode}
\defaultfontfeatures{Mapping=tex-text}

Bruno Voisin
Peter Dyballa
2008-02-07 18:25:45 UTC
Permalink
Post by Bruno Voisin
\usepackage{fontspec,xunicode}
\defaultfontfeatures{Mapping=tex-text}
Makes no difference – but one thing makes me think: what if the font
used is missing so many accented glyphs that the characters get
composed?

I'll look into that later – but the pdfTeX problem can't have the
same cause!

--
Greetings

Pete

To most people solutions mean finding the answers. But to chemists
solutions are things that are still all mixed up.
Bruno Voisin
2008-02-07 22:21:03 UTC
Permalink
Post by Peter Dyballa
Post by Bruno Voisin
\usepackage{fontspec,xunicode}
\defaultfontfeatures{Mapping=tex-text}
Makes no difference
Here it does. I checked before posting, using the provided example:

\documentclass{article}
\usepackage{fontspec}

\begin{document}
\^{u} % copies badly

\fontencoding{T1}\selectfont
\^{u} % copies fine
\end{document}

As is, if I compile with TeXShop and the XeLaTeX-xdvipdfmx engine,
then copy the first û in the PDF output and paste it into a text file
using TextEdit, I get:

ˆu

If now I add the call to xunicode, compile, copy and paste as
before, I get:

û

My setup: MacTeX-TexLive2007 full, latest TeXShop, Mac OS X 10.5.1.

Bruno
Julien ÉLIE
2008-02-07 19:31:16 UTC
Permalink
Hi Jonathan,
Post by Jonathan Kew
This sounds like polyglossia doesn't recognize the non-breaking space
as a "space", and so adds space of its own; I expect François can
update this.
I have just made the non-breaking space active and replaced by "~".
It solves the problem.
And there is no need to use inputenc and fontenc any longer!
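For the record, a sketch of one way to do what is described above (an assumption on my part, using XeTeX's ^^^^ notation to address U+00A0 directly; the exact implementation may differ):

```latex
% XeLaTeX: make U+00A0 (no-break space) active and let it behave like ~
\catcode`^^^^00a0=\active
\def^^^^00a0{~}
```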
Post by Jonathan Kew
I don't know exactly how fontenc gets involved here; it may mean that
you end up using different virtual fonts, or something.
I think that inputenc is guilty there: it converts "î" to "\^{\i}"
or something like that. Afterwards, the circumflex is badly centered.
--
Julien ÉLIE

« -- Essayons d'interroger ce garde habilement sans éveiller ses soupçons...
-- Hep ! Où est enfermé Assurancetourix ? » (Astérix)
Ross Moore
2008-02-07 22:11:51 UTC
Permalink
Hi Julien,
Post by Julien ÉLIE
Post by Jonathan Kew
I don't know exactly how fontenc gets involved here; it may mean that
you end up using different virtual fonts, or something.
I think that inputenc is guilty there: it converts "î" to "\^{\i}"
or something like that. Afterwards, the circumflex is badly centered.
Aaah; there is a fault in xunicode.sty.
It should catch this conversion and map it back to the correct
code-point.

Currently xunicode.sty has:

\DeclareUTFcomposite[\UTFencname]{x00EC}{\`}{i}
\DeclareUTFcomposite[\UTFencname]{x00ED}{\'}{i}
\DeclareUTFcomposite[\UTFencname]{x00EE}{\^}{i}
\DeclareUTFcomposite[\UTFencname]{x00EF}{\"}{i}

It should have 4 extra declarations, to become:

\DeclareUTFcomposite[\UTFencname]{x00EC}{\`}{i}
\DeclareUTFcomposite[\UTFencname]{x00EC}{\`}{\i}
\DeclareUTFcomposite[\UTFencname]{x00ED}{\'}{i}
\DeclareUTFcomposite[\UTFencname]{x00ED}{\'}{\i}
\DeclareUTFcomposite[\UTFencname]{x00EE}{\^}{i}
\DeclareUTFcomposite[\UTFencname]{x00EE}{\^}{\i}
\DeclareUTFcomposite[\UTFencname]{x00EF}{\"}{i}
\DeclareUTFcomposite[\UTFencname]{x00EF}{\"}{\i}

The other (TIPA) accents have the appropriate extra rules.
Post by Julien ÉLIE
--
Julien ÉLIE
« -- Essayons d'interroger ce garde habilement sans éveiller ses soupçons...
-- Hep ! Où est enfermé Assurancetourix ? » (Astérix)
Hope this helps,

Ross

------------------------------------------------------------------------
Ross Moore ross-To+F1JekST7X/***@public.gmane.org
Mathematics Department office: E7A-419
Macquarie University tel: +61 +2 9850 8955
Sydney, Australia 2109 fax: +61 +2 9850 8114
------------------------------------------------------------------------
Julien ÉLIE
2008-02-08 13:50:14 UTC
Permalink
Hi Ross,
Post by Ross Moore
Post by Julien ÉLIE
I think that inputenc is guilty there: it converts "î" to "\^{\i}"
or something like that. Afterwards, the circumflex is badly centered.
Aaah; there is a fault in xunicode.sty .
It should catch this conversion and map it back to the correct code-
point.
Just to let you know that the modification you did in xunicode.sty with
\DeclareUTFcomposite[\UTFencname]{x00EE}{\^}{\i}
properly fixed the problem.

Thanks.
--
Julien ÉLIE

« -- Tu dois avoir un messager zélé autant qu'ailé
pour faire rapidement le trajet.
-- Oui ! et c'est une fine mouche ! » (Astérix)
Arthur Reutenauer
2008-02-06 12:10:24 UTC
Permalink
Post by Julien ÉLIE
The first one is an unbreakable space and the second one is a normal
space.
I take it you really typed a no-break space in your input file; it's
not present in your e-mail, though. I guess it has been transformed
into a regular space character by your e-mail agent (and the latin-1
characters have been converted to quoted-printable, too); next time you
want to show a sample, use an attached file.

As for your questions, Jonathan explained why you get seemingly better
results by adding inputenc and fontenc: by complete chance, and mostly
thanks to the fonts you use, as well as some bargaining on the
encodings. It will not, and I stress, *not* give better results in
general, and you shouldn't stick to it if you expect help on finer
typographic issues.
Post by Julien ÉLIE
Thanks for the pointer but, well, Babel has stuff like
\AddThinSpaceBeforeFootnotes, \FrenchFootnotes, \ier and other
commands I use so switching to polyglossia breaks a lot of things...
Actually, you shouldn't see it that way: the biggest leap is from
(pdf)TeX to XeTeX, not from Babel to Polyglossia. Of course, you could
continue using XeTeX mostly the same way you used TeX, but you would be
missing most of XeTeX's advantages if you did so, which is why you want
to use fontspec and other related packages. But you should realize that
a part of fontspec's job is to do the same actions for XeLaTeX as do
inputenc and fontenc for traditional LaTeX, and you shouldn't think of
it as a complement to the latter packages, but really as a replacement
(after all, XeLaTeX without any package isn't really useful at all; just
compare the two attached files: they demonstrate how fontspec takes over
the task of inputenc + fontenc, and this is how you should understand
it). Polyglossia is simply the next step, and even if for the moment it
lacks some commands for fine typography, those options can be added in
due time. Just be patient!
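A minimal complete preamble along these lines (a sketch following the thread's recommendations; the font name is an assumption and must be replaced by an OpenType font installed on your system):

```latex
% XeLaTeX: fontspec and friends replace inputenc + fontenc entirely
\documentclass{article}
\usepackage{xltxtra} % loads fontspec and xunicode in the correct order
\defaultfontfeatures{Mapping=tex-text} % legacy TeX ligatures: --, ---, `` ...
\setmainfont{Linux Libertine O} % assumption: any locally installed OpenType font
\begin{document}
âêîôû, été, goût, cœur
\end{document}
```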

Arthur
Post by Julien ÉLIE
It depends on the sentence. This one should be better for translations :)
But it's not by Goscinny :-D http://fr.wikipedia.org/wiki/L'Odyssée_d'Astérix
Julien ÉLIE
2008-02-06 20:03:40 UTC
Permalink
Hi Arthur,
Post by Arthur Reutenauer
I take it you really typed a no-break space in your input file; it's
not present in your e-mail, though. I guess it has been transformed
into a regular space character by your e-mail agent
Yes, they were. I do not know why, but both Windows Mail and Thunderbird
transform 0x00A0 to 0x0020 as soon as it is typed or pasted.
Post by Arthur Reutenauer
(and the latin-1 characters have been converted to quoted-printable, too)
Yes, but it is not a problem with the e-mail agent. It is the Mailman
list run by tug.org which does that!
It converts these characters to QP, even in your message by the way:
<20080205011234.GE28560-CModOCBWDQaW0+X1QFNjvGD2FQJk+8+***@public.gmane.org>.
And my last message <fod313$lpn$***@ger.gmane.org> encoded in UTF-8 was
even converted to base64...
Post by Arthur Reutenauer
next time you want to show a sample, use an attached file.
All right.
Post by Arthur Reutenauer
But you should realize that
a part of fontspec's job is to do the same actions for XeLaTeX as do
inputenc and fontenc for traditional LaTeX, and you shouldn't think of
it as a complement to the latter packages, but really as a replacement
I agree.
Post by Arthur Reutenauer
(after all, XeLaTeX without any package isn't really useful at all; just
compare the two attached files: they demonstrate how fontspec takes over
the task of inputenc + fontenc, and this is how you should understand
it).
I understand. Thanks for the samples.
Post by Arthur Reutenauer
Post by Julien ÉLIE
It depends on the sentence. This one should be better for translations :)
But it's not by Goscinny :-D http://fr.wikipedia.org/wiki/L'Odyssée_d'Astérix
My mistake. You are quite right!
--
Julien ÉLIE

« L'éternité, c'est long, surtout vers la fin. » (Woody Allen)