Discussion:
[XeTeX] ifcat changed?
Bruno Le Floch
2017-04-15 11:26:38 UTC
Dear all,

The primitive conditional "\ifcat\relax\cr true\else false\fi" gives
"true" in pdfTeX, LuaTeX, (e)(u)pTeX, and XeTeX from some time ago
(could be years), but "false" in XeTeX 0.99996.

It would be useful for me to know which of \ifcat, \relax, and \cr
changed, to determine whether I should just special-case \cr in my
package, or use some other tool than \ifcat.

Best regards,

Bruno


--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
http://tug.org/mailman/listinfo/xetex
Jonathan Kew
2017-04-15 12:18:13 UTC
This sounds like a bug. Offhand, I don't know what changed to cause
this, but it probably shouldn't have!

Filing an issue at https://sourceforge.net/projects/xetex/ would be
useful, to help us keep track.

JK
Ulrike Fischer
2017-04-15 17:36:14 UTC
Post by Bruno Le Floch
The primitive conditional "\ifcat\relax\cr true\else false\fi" gives
"true" in pdfTeX, LuaTeX, (e)(u)pTeX, and XeTeX from some time ago
(could be years),
Looks like *many* years. I get the wrong output with texlive 2012.
Post by Bruno Le Floch
It would be useful for me to know which of \ifcat, \relax, and \cr
changed,
To me it looks like a problem with \cr, but it is difficult to be
sure.
--
Ulrike Fischer
http://www.troubleshooting-tex.de/



Apostolos Syropoulos
2017-04-16 09:28:10 UTC
Definitely a bug. The TeXbook defines the behaviour of \if and \ifcat:
all control sequences are considered to have character code 256 and
category code 16 unless they have been \let equal to a non-active
character, in which case they take the codes of that character.
Having compared the relevant code in
texlive/source/texk/web2c/luatexdir/tex/conditional.w (function void conditional(void))
and
texlive/source/texk/web2c/xetexdir/xetex.web (@<Test if two characters match@>;)
I think they are identical. Note that these are the parts that process the \if and \ifcat commands.

A.S.
----------------------
Apostolos Syropoulos
Xanthi, Greece
Zdenek Wagner
2017-04-16 10:00:51 UTC
Post by Bruno Le Floch
The primitive conditional "\ifcat\relax\cr true\else false\fi" gives
"true" in pdfTeX, LuaTeX, (e)(u)pTeX, and XeTeX from some time ago
(could be years), but "false" in XeTeX 0.99996
Post by Apostolos Syropoulos
Definitely a bug. The TeXbook defines the behaviour of \if and \ifcat,
and all control sequences are considered to have character code 256
and category code 16, unless \let equal to a non-active character, in
which case they have the value of that character.
Not all control sequences, only primitives. Unlike \ifx, \if and \ifcat
perform full expansion.
Try the following code:

\def\a{$A$}
\def\b{hello}
\def\c{world}
\ifcat\a\b\else\c\fi

The output will be "world": \a expands to $A$, so \ifcat compares $
(category 3) with A (category 11), and these category codes differ.

Similarly, \ifcat\relax\a will compare \relax with $.


Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml
http://icebearsoft.euweb.cz
Philip Taylor
2017-04-16 10:19:01 UTC
Post by Zdenek Wagner
Not all control sequences, only primitives.
\if <token1> <token2>
TeX will expand macros following \if until two unexpandable tokens are found. If either token is a control sequence, TeX considers it to have character code 256 and category code 16, unless the current equivalent of that control sequence has been \let equal to a non-active character token ...
\ifcat <token1> <token2>
This is just like \if, but it tests the category code, not the character code ...
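The TeXbook rule quoted above can be modelled in a few lines of C. This is a simplified sketch, not engine code: the Token struct and the function names are hypothetical, only the two tokens left after expansion are represented, active characters and \noexpand are ignored, and the category codes are plain TeX's (3 for $, 11 for letters).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a token as seen by \if and \ifcat after expansion.
   Active characters and \noexpand are deliberately not modelled. */
typedef struct {
    bool is_cs;       /* token is a control sequence */
    bool let_to_char; /* ...whose meaning was \let to a non-active character */
    int charcode;     /* character code (of the let-to character, if any) */
    int catcode;      /* category code (of the let-to character, if any) */
} Token;

/* Character code seen by \if: 256 for control sequences (TeXbook rule). */
int if_char(Token t) {
    return (t.is_cs && !t.let_to_char) ? 256 : t.charcode;
}

/* Category code seen by \ifcat: 16 for control sequences (TeXbook rule). */
int if_cat(Token t) {
    return (t.is_cs && !t.let_to_char) ? 16 : t.catcode;
}

bool tex_if(Token a, Token b)    { return if_char(a) == if_char(b); }
bool tex_ifcat(Token a, Token b) { return if_cat(a) == if_cat(b); }

/* The cases mentioned in this thread, with plain TeX category codes. */
void check_thread_examples(void) {
    Token relax   = { true,  false, 0,   0 };  /* codes unused for a cs */
    Token cr      = { true,  false, 0,   0 };
    Token dollar  = { false, false, '$', 3 };  /* math shift */
    Token letterA = { false, false, 'A', 11 }; /* letter */

    assert(tex_ifcat(relax, cr));        /* \ifcat\relax\cr is true */
    assert(!tex_ifcat(dollar, letterA)); /* \ifcat\a\b compares $ with A */
    assert(!tex_ifcat(relax, dollar));   /* \ifcat\relax\a compares \relax with $ */
}
```

Under this rule both \relax and \cr are unexpandable control sequences not \let to a character, so \ifcat\relax\cr should be true, which is what pdfTeX, LuaTeX and (e)(u)pTeX report.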
Philip Taylor



Bruno Le Floch
2017-04-16 13:19:05 UTC
I filed https://sourceforge.net/p/xetex/bugs/138/ with text essentially
identical to my message below, explaining the bug's origin and how to fix it.
....
Post by Apostolos Syropoulos
Definitely a bug. The TeXbook defines the behaviour of \if and \ifcat,
and all control sequences are considered to have character code 256
and category code 16, unless \let equal to a non-active character, in
which case they have the value of that character.
Post by Zdenek Wagner
Not all control sequences, only primitives. Unlike \ifx, \if and \ifcat
perform full expansion.
(a) Yes, they do perform expansion. That's irrelevant to the point at
hand, since expansion happens before the comparison.
\ifcat\noexpand\foo\noexpand\baz true\else false \fi
\ifcat\noexpand\foo\halign true\else false \fi
As Philip pointed out, I was reporting Knuth's words, which are by
definition authoritative.
As far as I can tell from the sources, the bug was likely there from the
start, and it only affects \span, \cr and \crcr: their character codes
are too small. This can be fixed by changing "special_char" from 65537
to 1114112 or so, so that the values of "span_code", "cr_code" and
"cr_cr_code" are above "biggest_usv".

The test \ifcat and \if use to distinguish control sequences from
normal/active characters is

(cur_cmd>active_char)or(cur_chr>biggest_usv)

Most tokens that are not character tokens have "cur_cmd" greater than
"active_char". All exceptions are primitives, among them \relax,
\span, \cr and \crcr. For these primitives, Knuth made sure that "cur_chr"
was bigger than 255, but some values were not increased enough when
XeTeX switched to Unicode. I think I went through all cases, and only
"span_code", "cr_code" and "cr_cr_code" need to be changed, although I
think it also makes sense to increase "special_char" (used as a
\noexpand marker).
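The test above, and the effect of the value of "special_char" on it, can be sketched in C. The command codes (relax=0, tab_mark=4, car_ret=5, active_char=13) are taken from tex.web and assumed unchanged in xetex.web; the CmdChr type and the helper names are hypothetical. Only the command code matters for \ifcat, so the character code that XeTeX substitutes for a normalised token is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Command codes as defined in tex.web (assumed identical in xetex.web). */
enum { RELAX = 0, TAB_MARK = 4, CAR_RET = 5, ACTIVE_CHAR = 13 };

#define BIGGEST_USV 0x10FFFF /* largest Unicode scalar value, 1114111 */

/* A token as the conditional sees it: command code and modifier. */
typedef struct { int cmd, chr; } CmdChr;

/* Command code compared by \ifcat, following the xetex.web test
   (cur_cmd>active_char)or(cur_chr>biggest_usv): any token failing the
   "is a character" test is treated as \relax. */
int ifcat_cmd(CmdChr t) {
    if (t.cmd > ACTIVE_CHAR || t.chr > BIGGEST_USV)
        return RELAX;
    return t.cmd;
}

bool ifcat(CmdChr a, CmdChr b) { return ifcat_cmd(a) == ifcat_cmd(b); }

/* \span, \cr and \crcr as in xetex.web: cr_code=span_code+1,
   cr_cr_code=cr_code+1 (and span_code=special_char in XeTeX). */
CmdChr span_tok(int span_code) { CmdChr t = { TAB_MARK, span_code };     return t; }
CmdChr cr_tok(int span_code)   { CmdChr t = { CAR_RET,  span_code + 1 }; return t; }
CmdChr crcr_tok(int span_code) { CmdChr t = { CAR_RET,  span_code + 2 }; return t; }

/* XeTeX 0.99996 (65537), the proposed fix (1114112) and LuaTeX's
   align.h (1114114), each compared against \relax. */
void check_bug_and_fix(void) {
    CmdChr relax = { RELAX, 256 }; /* modifier irrelevant: cmd is RELAX */
    assert(!ifcat(relax, cr_tok(65537)));  /* the reported "false" */
    assert(ifcat(relax, cr_tok(1114112))); /* fixed: "true" again */
    assert(ifcat(relax, cr_tok(1114114))); /* LuaTeX: "true" */
}
```

With a span_code of 65537, \cr keeps its own command code (car_ret) and, in this model, \span even compares equal in category to an alignment-tab character; once the codes exceed biggest_usv, all three primitives are normalised to \relax, as in pdfTeX.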

On a related note, I think "define(p,relax,256)" should be
"define(p,relax,too_big_usv)" but I'm not quite following the code there
so don't trust me. Namely, I don't see how the XeTeX code ends up
correctly giving TRUE in \chardef\foo=123\ifx\relax\foo TRUE\fi.

Best,

Bruno


Apostolos Syropoulos
2017-04-16 15:12:34 UTC
Post by Bruno Le Floch
As far as I can tell from the sources, the bug likely was there from the
start, and only affects \span, \cr and \crcr.  Basically, their
character code is too small.  This can be fixed by changing
"special_char" from 65537 to 1114112 or so, to make the values of
"span_code", "cr_code", "cr_cr_code" be above "biggest_usv".
Exactly! This is the difference between XeTeX and LuaTeX. The code that follows is from xetex.web:

@d special_char=65537 {|biggest_char+2|}
@d span_code=special_char {distinct from any character}
@d cr_code=span_code+1 {distinct from |span_code| and from any character}
@d cr_cr_code=cr_code+1 {this distinguishes \.{\\crcr} from \.{\\cr}}

And this code is from LuaTeX's align.h:
#  define span_code 1114114     /*  {|biggest_char+3|} */
#  define cr_code (span_code+1) /* distinct from |span_code| and from any character */
#  define cr_cr_code (cr_code+1)        /* this distinguishes \.{\\crcr} from \.{\\cr} */


A.S.
----------------------
Apostolos Syropoulos
Xanthi, Greece

Apostolos Syropoulos
2017-04-28 14:50:30 UTC
Hello,
I have studied the source code of XeTeX and LuaTeX a bit. XeTeX defines a procedure that is used to insert primitive commands into a hash table. The procedure's declaration is as follows:
procedure primitive(@!s:str_number;@!c:quarterword;@!o:halfword);
This is not exactly Pascal, but I think one understands what is going on. Now, LuaTeX inserts the TeX primitives with the following macro:
#  define primitive_tex(a,b,c,d)    primitive((a),(b),(c),(d),tex_command)
where function primitive is declared as follows:
extern void primitive(const char *ss, quarterword c, halfword o, halfword off,
                      int cmd_origin);
XeTeX uses catcodes to compare commands, while LuaTeX has special command codes which are different from catcodes. I think this approach is better. However, it is my understanding that the problem that started this thread cannot be solved trivially.
A.S.

----------------------
Apostolos Syropoulos
Xanthi, Greece