[XeTeX] Khmer: ligatures break if XeTeXlinebreaklocale is turned on
Jo Hund
2018-04-10 15:55:14 UTC
Hi there,

I was told by David Carlisle on https://tex.stackexchange.com that the
xetex mailing list may be a good place to ask my question:

When generating documents in Khmer, we noticed that some ligatures do
not work. Turning off XeTeXlinebreaklocale fixes all ligatures, but it
breaks line breaking: lines extend past the right margin unless there
are zero-width spaces between words. (In Khmer, words within the same
sentence or phrase are generally run together with no spaces between
them.)

Our objective is to have all ligatures work, and to use
XeTeXlinebreaklocale at the same time.

Please note that we are not looking for a manual workaround for this
particular example. We are looking for a fix to the root cause that we
can then use in our automated system where we cannot fix individual
instances of this problem.

My question at
https://tex.stackexchange.com/questions/425652/khmer-ligatures-break-if-xetexlinebreaklocale-is-turned-on
includes screenshots of the generated output for both MWEs:


MWE 1: XeTeXlinebreaklocale turned off. Ligatures work (text in red),
but the text runs into the right margin.

\documentclass[12pt]{article}

%\XeTeXlinebreaklocale "km"
\usepackage{color}
\usepackage{fontspec}
\setmainfont[Ligatures=Required]{Khmer OS Content}

\begin{document}
មានមនុស្សម្នាក់ដែលបានធ្វើការមួយដែលអាចដឹងបាន
នោះគឺជាកា{\color{red}ស្រ្តូ}នៅទីនេះ។ បាទ អ្នកទាំងអស់គ្នា។
គាត់បានចុះទៅទីនោះ។ ហើយពួកនាយទុនមានអស់ទាំងព្រនង់ ហើយនឹងអ្វីៗដែល…
វាគឺឋិតនៅក្នុងស្តង់ដាមាស ដូចជាសហរដ្ឋអាមេរិកដែរ។ តើគាត់បានធ្វើអ្វី?
គាត់បានទិញអស់ទាំងបណ្ណ័ភាគហ៊ុន។
គឺរកលុយតាមគ្រប់ទាំងវិធីដែលគាត់អាចធ្វើទៅបាន។ បន្ទាប់មកតើគាត់ធ្វើអ្វីទៀត?
គាត់ក៏បានក្លែងបន្លំក្រដាស់ប្រាក់ ហើយបានប្ដូរវា។ ហើយក៏ដាក់ទៅវិញ។
នេះហើយគឺជាអ្វីតែមួយដែលជាតិសាសន៍នេះអាចធ្វើបាន។
\end{document}


MWE 2: XeTeXlinebreaklocale set to "km". Line breaks work, but some
ligatures are broken (text in red).

\documentclass[12pt]{article}

\XeTeXlinebreaklocale "km"
\usepackage{color}
\usepackage{fontspec}
\setmainfont[Ligatures=Required]{Khmer OS Content}

\begin{document}
មានមនុស្សម្នាក់ដែលបានធ្វើការមួយដែលអាចដឹងបាន
នោះគឺជាកា{\color{red}ស្រ្តូ}នៅទីនេះ។ បាទ អ្នកទាំងអស់គ្នា។
គាត់បានចុះទៅទីនោះ។ ហើយពួកនាយទុនមានអស់ទាំងព្រនង់ ហើយនឹងអ្វីៗដែល…
វាគឺឋិតនៅក្នុងស្តង់ដាមាស ដូចជាសហរដ្ឋអាមេរិកដែរ។ តើគាត់បានធ្វើអ្វី?
គាត់បានទិញអស់ទាំងបណ្ណ័ភាគហ៊ុន។
គឺរកលុយតាមគ្រប់ទាំងវិធីដែលគាត់អាចធ្វើទៅបាន។ បន្ទាប់មកតើគាត់ធ្វើអ្វីទៀត?
គាត់ក៏បានក្លែងបន្លំក្រដាស់ប្រាក់ ហើយបានប្ដូរវា។ ហើយក៏ដាក់ទៅវិញ។
នេះហើយគឺជាអ្វីតែមួយដែលជាតិសាសន៍នេះអាចធ្វើបាន។
\end{document}


Setup:

* TeXShop 3.62 on Mac
* Typeset with XeLaTeX: XeTeX, Version 3.14159265-2.6-0.99996
(TeX Live 2016)
* Font Khmer OS Content:
https://drive.google.com/file/d/1YuoheWcKu9jS0cyZ-LU9g2ohicCMd5yX/view?usp=sharing


My question is whether there is a way to change our setup so that both
XeTeXlinebreaklocale and ligatures work?

Thank you for your help

Jo Hund



--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
http://tug.org/mailman/listinfo
Jonathan Kew
2018-04-10 19:02:15 UTC
The problem arises because activating XeTeXlinebreaklocale effectively
makes xetex insert something like \penalty0 or \hskip0pt or similar
(depending on the settings of \XeTeXlinebreakpenalty and ...skip) at
each potential line-break position, so that the normal TeX line-breaking
algorithm will be able to find and use these breaks.

But the inserted penalty and/or skip interrupts the sequence of
characters that is passed to the OpenType shaping engine, and so
features like ligatures will not work across the boundary.
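
As a rough sketch of the settings mentioned above: the break locale enables the break opportunities, and the penalty and skip parameters determine what xetex inserts at each one. The specific values below are illustrative assumptions, not recommendations.

```latex
% Enable locale-based line-break opportunities for Khmer.
\XeTeXlinebreaklocale "km"
% Glue associated with each potential break position
% (the small stretch is an illustrative value only).
\XeTeXlinebreakskip=0pt plus 1pt
% Penalty associated with each potential break position
% (0 is a neutral break point).
\XeTeXlinebreakpenalty=0
```

Whatever values are chosen, the inserted item still sits between the characters on either side of it, which is why shaping features cannot act across that position.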

A possible workaround would be to set \XeTeXinterwordspaceshaping=2 in
your document. This will cause xetex to re-shape runs of text after
line-breaking, and at this point your ligatures should work.
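
A minimal sketch of this workaround, adapted from MWE 2 in the original post, is shown below. The \color markup is left out because of the caveat about \special{}s, and placing the setting in the preamble is an assumption made for simplicity.

```latex
\documentclass[12pt]{article}

\usepackage{fontspec}
\setmainfont[Ligatures=Required]{Khmer OS Content}

% Let xetex find Khmer line-break opportunities.
\XeTeXlinebreaklocale "km"
% Ask xetex to re-shape runs of text after line-breaking,
% so that ligatures can form across the inserted break positions.
\XeTeXinterwordspaceshaping=2

\begin{document}
នោះគឺជាកាស្រ្តូនៅទីនេះ។ បាទ អ្នកទាំងអស់គ្នា។
\end{document}
```

Compiling this with xelatex and comparing the output against MWE 2 should show whether the affected ligature is restored.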

There are some caveats: in particular you'll notice if you try this that
your red coloring of the example text fragment gets lost. This is
because the \special{}s that \color inserts will be moved out of the run
of text that is now being shaped as a unit. But depending on the needs
of your documents, this may be an acceptable trade-off.

Oh, by the way: you can change the \XeTeXinterwordspaceshaping setting
within the document if you like, but its effect does not respect the
usual TeX scoping rules; if I remember correctly,
\XeTeXinterwordspaceshaping=2 basically operates on a whole-page basis,
so what matters is the value at the time the page is completed.

HTH,

JK


Jo Hund
2018-04-10 22:08:23 UTC
Hi Jonathan,

Thank you very much for the helpful info. This fixed our problem with
ligatures. All our formatting seems to be working, too.

However, there are some problems if we insert "\hspace{0pt}" anywhere in
the document. In the worst case, the entire PDF conversion crashes; in
other situations, the ligatures are broken and character spacing is
off. We insert "\hspace{0pt}" in combination with "\nolinebreak[4]" in
some places to coax LaTeX to break lines where we want them to break,
around some em dashes and in other places. This is not a show stopper. We
can work around it. We were just wondering if you may have an obvious
solution to this.
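
A minimal sketch of the kind of markup described above; the surrounding words are placeholders, and the exact placement used in the real documents is an assumption:

```latex
% Discourage a break before the em dash, then offer a
% zero-width break opportunity right after it.
left\nolinebreak[4]---\hspace{0pt}right
```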

Thanks again for your help, much appreciated

Cheers

Jo




Jonathan Kew
2018-04-11 10:25:55 UTC
I'm afraid I don't have any ideas about this. I'd normally expect
inserting \hspace{0pt} to interrupt ligatures, but it shouldn't cause
the job to fail altogether! That sounds like a bug.

If you can provide a test document where it causes a crash, we may be
able to investigate further.

Thanks,

JK

