Discussion:
[XeTeX] how to customize sorting order of characters in index
Kamal Abdali
2018-05-13 23:04:10 UTC
I'm using polyglossia and imakeidx to produce an Urdu document. The
sorting order of letters in the index is wrong. The order is according to
Arabic, but Urdu has about 9 more letters which are being pushed after the
Arabic letters in the index. I couldn't find an option to fix this in the
imakeidx style file. It probably should be specified in polyglossia but I
don't see a way there either. Any suggestions please?

Kamal Abdali
Zdenek Wagner
2018-05-13 23:30:10 UTC
I do not know how to do it in imakeidx but I could do it in xindy. I have
an Urdu-Hindi dictionary so I can find the sort order of most characters but
I do not know where to put bari yeh (ے), dochashmee he (ھ) and maybe a few
more characters that never appear at the beginning of a word. This is not
too much work: if you give me the correct order of all characters
(including alif maqsura, which appears in words like موسی), I can make it
(but it may take a few days, I am currently quite busy).


Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml
http://icebearsoft.euweb.cz
--------------------------------------------------
http://tug.org/mailman/listinfo/xetex
Kamal Abdali
2018-05-14 00:17:38 UTC
Zdeněk,

Thanks for responding promptly and offering to help.

The Urdu letters, in their sorting order, are as follows:

آ ا ب پ ت ٹ ث ج چ ح خ د ڈ ذ ر ڑ ز ژ س ش

ص ض ط ظ ع غ ف ق ک گ ل م ن ں و ہ ھ ء ی ے

As you mentioned the following letters never start a word:

ھ ء ے

And ں occurs only at the end of a word. (Urdu is working hard to create
more challenges for its learners -:) ). But all the letters need to be
included in the sorting order.
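A minimal Python sketch (an illustration only, not xindy or imakeidx code) of what a custom sort order means: each word is compared letter by letter using the letter's position in the list above rather than its Unicode code point. Plain code-point order reproduces the problem described at the start of the thread, because Urdu-specific letters such as پ (U+067E), ٹ, چ and گ have code points above the basic Arabic letters (U+0621 to U+064A). The names `URDU_ORDER` and `urdu_key` are hypothetical.

```python
# Urdu alphabet in the order listed above (40 letters).
URDU_ORDER = "آ ا ب پ ت ٹ ث ج چ ح خ د ڈ ذ ر ڑ ز ژ س ش ص ض ط ظ ع غ ف ق ک گ ل م ن ں و ہ ھ ء ی ے".split()
RANK = {ch: i for i, ch in enumerate(URDU_ORDER)}


def urdu_key(word: str) -> tuple:
    """Map each letter to its position in URDU_ORDER.

    Characters not in the table (diacritics, digits, ...) are ranked
    after every letter, by code point, so they never raise KeyError.
    """
    return tuple(RANK.get(ch, len(URDU_ORDER) + ord(ch)) for ch in word)


words = ["گل", "پر", "لب", "بات", "تم"]
print(sorted(words))                # plain code-point order
print(sorted(words, key=urdu_key))  # Urdu alphabetical order
```

The first `sorted` call puts پر and گل after لب, as a code-point sort does; the second follows the Urdu alphabet. A real index would also need rules for diacritics, which this sketch sidesteps by ranking any unlisted character after all letters.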

Thanks in advance,

Kamal Abdali
Zdenek Wagner
2018-05-20 12:54:24 UTC
Hi,

I knew that I already had something on my computer. Many years ago
someone sent me some files for Arabic/Urdu/Farsi and I made some tests
and modifications. Looking at the old Urdu files (saved as old-*) I
see that the letter ث was missing (I do not know the names of the letters,
I know only some of them). Hamza was not present as a letter; it was
present only as hamza above waw, yeh, and baree yeh. This means that, for
instance, ماء would not be sorted correctly. He goal and dochashmee he
were defined as a group. This means that dochashmee he would never be
used as a heading, but بھارت would precede بہانا and پھل would precede
پہلو, which is probably wrong. If it is correct, I will revert it.
Similarly, yeh and baree yeh were defined as a group, which is probably
harmless: the index should contain چھہٹا but not چھہٹے and چھہٹی
(different grammatical forms). I have split them, as well as ن and ں,
which is most probably harmless because the dotless form is present
only at the end of a word. But without splitting, میں would precede
مینار, and I do not know whether that is correct. Please test the
attached xindy module. Even with XeTeX it should be invoked as

texindy -I omega -L urdu

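To make the effect of these groups concrete, here is a small Python sketch (not the xindy module itself) that uses the 40-letter order from Kamal's message and models a "group" simply as two letters sharing one rank. That is a simplification of how a xindy sort rule behaves, and `make_key` is a hypothetical helper. With ہ/ھ and ن/ں grouped, بھارت sorts before بہانا, پھل before پہلو and میں before مینار; splitting the letters reverses all three pairs.

```python
# Hypothetical illustration: compare "grouped" vs "split" letters.
URDU_ORDER = "آ ا ب پ ت ٹ ث ج چ ح خ د ڈ ذ ر ڑ ز ژ س ش ص ض ط ظ ع غ ف ق ک گ ل م ن ں و ہ ھ ء ی ے".split()


def make_key(groups=()):
    """Build a sort key; letters listed together in a group share one rank.

    A group is modelled as equal weight for its letters, which is a
    simplification of a real xindy sort rule.
    """
    rank = {ch: i for i, ch in enumerate(URDU_ORDER)}
    for group in groups:
        for ch in group[1:]:
            rank[ch] = rank[group[0]]  # merge into the first letter's rank
    unknown = len(URDU_ORDER)  # unlisted characters sort after all letters

    def key(word):
        return tuple(rank.get(ch, unknown + ord(ch)) for ch in word)

    return key


split = make_key()
grouped = make_key(groups=["ہھ", "نں"])  # he goal + dochashmee he, noon + noon ghunna

for a, b in [("بہانا", "بھارت"), ("پہلو", "پھل"), ("مینار", "میں")]:
    print("grouped:", sorted([a, b], key=grouped), "split:", sorted([a, b], key=split))
```

Whether the grouped or the split result is right is exactly the open question in the message; the sketch only shows which choice produces which order.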
Zdeněk Wagner
...
As someone working in thermodynamics, it is nice to know how to say
entropy in Urdu. My former boss has a large collection of translations of
"vapour-liquid equilibrium" into many languages. How is it in Urdu?
...
(Taking a brief thermodynamical break on this group :-) ), the equivalent
Urdu term would be
بخارمائع تعادل
(pronounced "bukẖār-māʾiʿ taʿādul").
...There are many other problems. The Persian module is a good
start because Persian also makes use of a few non-Arabic letters
...
...
Zdeněk Wagner
I have several comments about your suggestions. Let me write them
carefully and send them to you a bit later.
Thanks,
Kamal