Discussion:
[XeTeX] Xetex equiv to luatex's \directlua{}
maxwell
2018-03-23 20:33:00 UTC
Permalink
I'm just finishing up a project that involved typesetting text in
several languages, while outputting an XML file that defined in X/Y
coordinates the position and size of the bounding box surrounding each
line of text in the PDF. I used Luatex, because that made it possible
to call Lua from Luatex (using the \directlua{} command) to pass
information to Lua, and to return information from Lua to Luatex using
tex.print(). I also used Lua to write the XML file.

Too late, I discovered that LuaTeX botches the rendering of one of the
languages, Tamil. Tamil has a complex script, with some typical Indic
script features; so presumably LuaTeX would also mess up on other
languages with complex scripts. XeTeX of course does just fine at
rendering text in complex scripts.

As I say, it's too late to change now, but is there any way I could have
done something similar using xetex? That is, called another programming
language to output box positions and sizes. I suppose it's possible to
write to an XML file in xetex natively, but I'm not sure how I could get
the positions and sizes of boxes. My style sheet defines a command,
\outputpara{}, that requires the user to specify the X-position of the
paragraph, and hence of lines in the paragraph, where line breaks are of
course decided on the fly. The command optionally specifies the
Y-position of the paragraph, but the Y-position of each line in the
paragraph--except the first--is determined by the usual TeX algorithms.
Getting TeX to tell me those Y-positions, as well as the vertical size
of the box, was the difficult part. But maybe I was missing something
obvious?

Mike Maxwell
University of Maryland


--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
http://tug.org/mailman/listinfo/xetex
David Carlisle
2018-03-23 20:44:36 UTC
Permalink
There are several ways to get the box output in classic tex (or xetex),
although perhaps the easiest (and safest in terms of not accidentally
affecting the typeset positions) is to use \showoutput so all boxes
are (somewhat verbosely) logged in the log file, and then parse that
with perl or python or whatever to get whatever lengths you need.

David
maxwell
2018-03-23 21:49:43 UTC
Permalink
Post by David Carlisle
there are several ways to get the box output in classic tex (or xetex)
although perhaps the easiest (and safest in terms of not accidentally
affecting the typeset positions) is to use \showoutput so all boxes
are (somewhat verbosely) logged in the log file, and then parse that
with perl or python or whatever to get whatever lengths you need,
David
David, do you have a life? Everywhere I see some conversation about
TeX, your name is there...

Anyway, I just now tried your idea (thanks!), but I'm not clear how to
parse the results. When I run a small example with xelatex, I see lines
like
-------------
.\vbox(608.40024+0.0)x360.0, shifted 54.0
..\vbox(12.0+0.0)x360.0, glue set 12.0fil
(some lines omitted)
..\vbox(541.40024+0.0)x360.0, glue set 503.14648fil
(some lines omitted)
...\hbox(7.71974+2.25569)x360.0, glue set - 0.17555
....\hbox(0.0+0.0)x17.0
....\TU/lmr/m/n/10.95 Now
--------------
(where the word "Now" is the first word in one of the lines of PDF
output). I think the first hbox is a line of text in the output, but I
don't know what to make of those numbers. They seem to be approximately
the same for every line's hbox in the trivial example I wrote (except
for the number after "glue set", which is different for each one), so
maybe they translate into the horizontal position and/or size of the
line's hbox. And maybe the vbox line is the entire paragraph's box,
where the numbers translate into the vertical position and/or size of
the vbox. But I'm not sure how to go from those numbers to the position
(in points) of each hbox in the output PDF.

Is there a guide somewhere to interpreting this trace? I didn't see
anything on-line.

Mike


David Carlisle
2018-03-23 22:02:12 UTC
Permalink
The format is described in The TeXbook, but in short:

.\hbox(7.71974+2.25569)x360.0,

is a box of height 7.71974pt, depth 2.25569pt and width 360.0pt, nested
one level inside some other box (one leading dot per nesting level).

....\hbox(0.0+0.0)x17.0

is a box of height and depth 0 and width 17pt (it will be the
indentation box) that is nested 4 levels deep from the outer page box.
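
As a rough illustration of the "parse the log" approach, here is a minimal Python sketch that pulls the nesting level, height, depth and width out of a single \showoutput box line of the form described above. The function name parse_box_line and the dictionary keys are made up for this example, and the pattern only handles \hbox and \vbox lines; anything else (glue, penalties, font/character lines) is ignored.

```python
import re

# Matches lines such as "...\hbox(7.71974+2.25569)x360.0, glue set - 0.17555".
# Leading dots give the nesting level; the numbers are height+depth and width in pt.
BOX_RE = re.compile(
    r"^(?P<dots>\.*)\\(?P<kind>[hv]box)"
    r"\((?P<height>-?[\d.]+)\+(?P<depth>-?[\d.]+)\)x(?P<width>-?[\d.]+)"
)


def parse_box_line(line):
    """Return box dimensions from one log line, or None if it is not a box line."""
    m = BOX_RE.match(line.strip())
    if m is None:
        return None
    return {
        "level": len(m.group("dots")),      # number of leading dots
        "kind": m.group("kind"),            # "hbox" or "vbox"
        "height": float(m.group("height")),
        "depth": float(m.group("depth")),
        "width": float(m.group("width")),
    }
```

Feeding every line of the .log file through parse_box_line and keeping the non-None results gives the box sizes; turning those into page positions still means adding up the heights, depths and glue of the enclosing boxes, which this sketch does not attempt.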


Philip Taylor (RHUoL)
2018-03-24 10:13:38 UTC
Permalink
Mike Maxwell
2018-03-24 15:37:04 UTC
Permalink
Post by maxwell
I'm just finishing up a project that involved typesetting text in
several languages, while outputting an XML file that defined in X/Y
coordinates the position and size of the bounding box surrounding each
line of text in the PDF.  [...] [I]s there any way I could have done
something similar using xetex?  That is, called another programming
language to output box positions and sizes.  I suppose it's possible
to write to an XML file in xetex natively, but I'm not sure how I
could get the positions and sizes of boxes.
* \pdfsavepos
  Saves the current location of the page in the typesetting stream.
* \pdflastxpos
  Retrieves the horizontal position saved by \pdfsavepos.
* \pdflastypos
  Retrieves the vertical position saved by \pdfsavepos.

Thanks--playing around last night, I found a way (I think) to do what I
wanted to do using the \zsavepos macro in the zref package. In case
anyone is interested, there's an example here:
https://tex.stackexchange.com/questions/10343/
What I still can't do is determine in xetex what the text contents of a
box are. I can do that in luatex; the Lua function nodeText() here--
https://tex.stackexchange.com/questions/228312/
does that. I don't think there's an equivalent in xetex.
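
For the position side of the problem, here is a small Python sketch of how saved positions might be turned into XML once they have been written out to a file. It assumes the saved x/y values are integers in scaled points (65536sp = 1pt) measured from the lower-left corner of the page, and that the position was saved at the left end of the line's baseline. The element name "line", its attribute names, and the function line_box_element are hypothetical, not the format used in the project.

```python
import xml.etree.ElementTree as ET

SP_PER_PT = 65536  # TeX defines 1pt = 65536sp


def sp_to_pt(sp):
    """Convert scaled points to TeX points."""
    return sp / SP_PER_PT


def line_box_element(x_sp, y_sp, width_pt, height_pt, depth_pt):
    """Build a hypothetical <line> element describing one line's bounding box.

    Assumes (x_sp, y_sp) is the left end of the line's baseline, in sp.
    """
    x = sp_to_pt(x_sp)
    baseline_y = sp_to_pt(y_sp)
    return ET.Element("line", {
        "x": f"{x:.3f}",
        "y": f"{baseline_y - depth_pt:.3f}",     # bottom edge of the box
        "width": f"{width_pt:.3f}",
        "height": f"{height_pt + depth_pt:.3f}",  # total height plus depth
    })
```

Calling ET.tostring(line_box_element(...), encoding="unicode") yields the serialized element, which can then be appended to whatever XML document is being built.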

Combining luatex and xetex, I think I have an (untested) solution to my
problem: first running my document through luatex to get the line boxes
and their contents, and then running it through xetex to get the proper
shaping, telling it where to put each line box and what to put in the
box. Very much a kludge, and I suppose the output won't be perfectly
justified, but for our purposes that won't matter.
--
Mike Maxwell
"My definition of an interesting universe is
one that has the capacity to study itself."
--Stephen Eastmond

