Discussion:
[XeTeX] problem with discretionary
jfbu
2017-03-12 14:34:31 UTC
Hi,

the problem is already fixed in XeTeX 0.99996, but it shows up
with 0.99992 (TL2015). I tried to browse the commit history
at SourceForge to find which commit fixed it,
but I am not familiar enough with the repository. Besides,
it could have been fixed as a side effect of another change, in which
case perhaps a test file needs to be added. So here we go:

% succeeds with:
% This is XeTeX, Version 3.14159265-2.6-0.99996 (TeX Live 2016) (preloaded
% format=xetex 2017.2.16)

% fails with:
% This is XeTeX, Version 3.14159265-2.6-0.99992 (TeX Live 2015) (preloaded
% format=xetex)
% ! Improper discretionary list.
% <recently read> }

% ;->\discretionary {\char `\;}
% {}{\char `\;}
% l.12 a;
% b

\catcode`@ 11
\XeTeXinterchartokenstate=1
\newXeTeXintercharclass\***@punctthin
\XeTeXcharclass `\; \***@punctthin
\XeTeXinterchartoks 255 \***@punctthin = {\nobreak\thinspace}%
\catcode`;\active
\def;{\discretionary{\char`\;}{}{\char`\;}}
a;b
\bye
% Local variables:
% TeX-engine: xetex
% End:

As you can see from the notation, this originated in the use of polyglossia + French.

The problem depends on the contents of the {\nobreak\thinspace} token list;
with other contents it does not show up.
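
A plausible reading of the error message, offered as an assumption rather than a confirmed diagnosis: TeX allows only characters, boxes, rules and kerns in a discretionary list, and \nobreak is a penalty in plain TeX. If the inter-character tokens {\nobreak\thinspace} end up inside the \discretionary, the penalty would make the list improper, whereas \thinspace alone (a kern in plain TeX) would not. The plain TeX sketch below illustrates only that rule and uses no XeTeX-specific primitives:

```latex
% Sketch: which material is legal inside a discretionary list.
% A kern is allowed, so this line is accepted:
a\discretionary{\thinspace ;}{}{\thinspace ;}b

% A penalty is not allowed. Uncommenting the next line is expected to
% stop with "! Improper discretionary list.":
% a\discretionary{\nobreak\thinspace ;}{}{\nobreak\thinspace ;}b
\bye
```

This would be consistent with the error depending on what the token list contains.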

Don't be alarmed by my strange \discretionary: it is here for
demonstration, and the original differs a bit.

So my question is: which version of XeTeX fixed this issue? (Was it fixed
in the initial TL2016? I only have a mature TL2016.)

Best,

Jean-François





--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
http://tug.org/mailman/listinfo/xetex
jfbu
2017-12-03 09:19:49 UTC
Hi,

I need some help identifying which XeTeX release fixed
this problem. The MWE is:

\catcode`@ 11
\XeTeXinterchartokenstate=1
\newXeTeXintercharclass\***@punctthin
\XeTeXcharclass `\; \***@punctthin
\XeTeXinterchartoks 255 \***@punctthin = {\nobreak\thinspace}%
\catcode`;\active
\def;{\discretionary{\char`\;}{}{\char`\;}}
a;b
\bye

In real life it appeared in a Polyglossia + French context,
with the semicolon made active to insert a \discretionary
similar to the above. There is no issue with lualatex.

It is currently seen upstream at Python (CPython) when
they try to build the French docs (via Sphinx):

https://bugs.python.org/issue31589

It would be nice to pinpoint precisely which XeTeX release
is OK. I know 0.99992 is bad and 0.99996 is good,
but I can't easily bisect.

Best,

Jean-François




Zdenek Wagner
2017-12-03 10:01:35 UTC
Hi,

please note that the number of character classes was increased from 256
to 4096, so 255 no longer works as the boundary class and 4095 must be used instead. I use
the following code, which I took from another package:

\edef\CSat{\the\catcode`\@} % in order to work in plain XeTeX
\catcode`\@=11
\ifdefined\***@alloc@***@top
  \chardef\CSboundary=\***@alloc@***@top
\else
  \ifdefined\XeTeXinterwordspaceshaping
    \chardef\CSboundary=4095 %
    \def\newXeTeXintercharclass{%
      \***@alloc\XeTeXcharclass\chardef
      \***@alloc@intercharclass\***@ne\@***@boundary}
  \else
    \chardef\CSboundary=255
  \fi
\fi
\catcode`\@=\CSat

Afterwards I use \CSboundary instead of a fixed number. It thus works both
with the old and new XeTeX.
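
As a minimal sketch of how such a feature test might be combined with the MWE from earlier in the thread: it assumes, as the code above does, that the presence of \XeTeXinterwordspaceshaping indicates an engine with 4096 classes. The class name \myPunctClass is hypothetical, and the format is assumed to provide \newXeTeXintercharclass, as the MWE does.

```latex
% Sketch only: pick the boundary class by feature test, not by number.
\ifdefined\XeTeXinterwordspaceshaping
  \chardef\CSboundary=4095 % newer XeTeX: 4096 classes (assumption: this
                           % primitive only exists in such engines)
\else
  \chardef\CSboundary=255  % older XeTeX: 256 classes
\fi

\XeTeXinterchartokenstate=1
\newXeTeXintercharclass\myPunctClass  % hypothetical class name
\XeTeXcharclass`\;=\myPunctClass

% Tokens to insert between a boundary (e.g. a space) and a semicolon:
\XeTeXinterchartoks \CSboundary \myPunctClass = {\nobreak\thinspace}

a ;b
\bye
```

The point of the design is that the same source uses whichever boundary class the running engine has, so nothing needs to be edited when the engine changes.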


Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml
http://icebearsoft.euweb.cz
jfbu
2017-12-03 10:58:34 UTC
Thanks Zdeněk!

Should I thus conclude from this that polyglossia + French is currently broken?
Indeed, the file gloss-french.ldf uses a hardcoded 255 in various places.

I am a bit lost, though, because my test MWE

\catcode`@ 11
\XeTeXinterchartokenstate=1
\newXeTeXintercharclass\***@punctthin
\XeTeXcharclass `\; \***@punctthin
\XeTeXinterchartoks 255 \***@punctthin = {\nobreak\thinspace}%
\catcode`;\active
\def;{\discretionary{\char`\;}{}{\char`\;}}
a;b
\bye

compiles fine with current XeTeX, but not with the TL2015 XeTeX.

(The @ thing is only there to stay close to the control sequence names in gloss-french.ldf.)

To clarify, the \def;{\discretionary{\char`\;}{}{\char`\;}} is analogous to
the kind of thing Sphinx does in verbatim listings to allow line breaks,
but it isn't the exact thing.

In any case, it originates neither from polyglossia nor from
gloss-french.ldf; it is a Sphinx add-on inside code listings.

If the problem can be solved by a patch at the macro level, that would
be best, because it would allow the CPython internationalization
team to build their PDF docs without worrying about which XeTeX
they use. I notice some of their team use Debian 2013.

Best

Jean-François
Zdenek Wagner
2017-12-03 11:09:34 UTC
Post by jfbu
Thanks Zdeněk!
Should I thus conclude from this that polyglossia + French is currently broken ?
indeed the file gloss-french.ldf uses hardcoded 255 at various locations.
Yes, everything with a hardcoded 255 has been broken since TL 2016. The change was
available in the pre-release for long enough and was mentioned on the lists for
developers. Please report it to the maintainer of gloss-french.ldf.

Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml
http://icebearsoft.euweb.cz
jfbu
2017-12-03 11:18:54 UTC
Just to point out that frenchb.ldf (babel-french) does indeed do

\ifdim\the\XeTeXversion\XeTeXrevision pt<0.99994pt
\***@nonchar=255 \relax
\else
\***@nonchar=4095 \relax
\fi

whereas I see nothing similar in gloss-french.ldf.

There now seem to be two problems, whereas
I initially had only one:

- my MWE does not compile with XeTeX 0.99992

- possibly, polyglossia's French support has an issue with
XeTeX 0.99994 and later
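
For comparison with a feature test, the version-based approach quoted from frenchb.ldf can be tried in isolation. The following is a small plain XeTeX sketch, not the babel-french code itself: the counter name \myBoundaryClass is hypothetical, and the threshold 0.99994 is simply the one used in the test above.

```latex
% Sketch only: choose the boundary class from the engine version.
\newcount\myBoundaryClass % hypothetical name, for illustration

% \the\XeTeXversion gives the major version and \XeTeXrevision the
% fractional part, so together with "pt" they form a dimension that
% \ifdim can compare.
\ifdim\the\XeTeXversion\XeTeXrevision pt<0.99994pt
  \myBoundaryClass=255 \relax  % older engines: 256 classes
\else
  \myBoundaryClass=4095 \relax % 0.99994 and later: 4096 classes
\fi

% Report the choice on the terminal and in the log.
\message{Boundary class chosen: \the\myBoundaryClass}
\bye
```

Running this under each installed XeTeX shows which branch that engine takes, which may help when only a few releases are available and bisecting is not practical.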

Jean-François
Zdenek Wagner
2017-12-03 11:21:39 UTC
I do not know the exact revision of the change. The code which I sent tests
the features, not the revision number.

Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml
http://icebearsoft.euweb.cz
jfbu
2017-12-03 14:34:46 UTC
Hi,

Regarding the character class issue

(which isn't directly the one
I reported at http://tug.org/pipermail/xetex/2017-March/027056.html
and again at the top of this renewed thread):

GitHub user eg9 has already reported it upstream:

https://github.com/reutenauer/polyglossia/issues/145

As far as I can tell, the issue remains unresolved.

Best,

Jean-François
jfbu
2017-12-03 14:46:22 UTC
Sorry for the noise, as

https://github.com/reutenauer/polyglossia/pull/178

has been merged upstream.

The question now is when this will make its way to CTAN or TeX Live.

As far as I can tell, CTAN has 1.42.4 and the above was merged into 1.42.5.

(This is a secondary topic for this thread, though.)

Best,

Jean-François