<Michael Everson wrote>

> Well, there are good reasons for that, namely the stability of the
> standard and the suitability of the code set for encoding Khmer, but
> you are right we are obliged to discuss N2406. I will give it some
> attention tonight.
> One of the things N2406 does is it criticises some errors in N2380,
> on page 10. Please understand that those are my typing errors.
> Maurice and I wrote that paper in only two days, staying up late at
> night, under extreme pressure because I had to fly to New York the
> morning we finished, and also because the Singapore conference was
> happening on the following Monday. We made the best contribution we
> could in the time we had, because we wanted to help with the
> conversations in Singapore even though we couldn't be there.
> Best regards,
> --
> Michael Everson *** Everson Typography *** http://www.evertype.com
OK, never mind. I understand the conditions under which you wrote this document. It is the same for me: when I have no time I get nervous too, and make mistakes. Nobody is perfect. And thank you again for the contribution in Singapore. From the discussions after the Singapore meeting, we know each other better than before, and we hope that we can find a better solution for all, on our side and also on your side.

<Peter wrote>
>99.99% of people in Canada, Germany, Russia, Greece, China, Japan, India,
>Thailand, etc. will never know how their text is encoded. Is it *really*
>believed that Cambodians will be different, or is it just a matter that a
>small group of people who happen to be reviewing the technical details of
>the encoding are concerned about cultural perceptions that should really
>only be a concern with regard to end users?

I don't know about the number, but anyway I agree that only a small number of people know the standard. In Cambodia I am sure the number is smaller still; if it were greater, there might be many Cambodians working at IBM, Microsoft, or HP, like Chinese or Indian people. They would take care of the Khmer standard, and today's problem would not occur. Even if it is just a matter for a small group, let me say that the Cambodian government also needs a standard for interchange, and it cannot have a standard different from the ISO standard, which is the same as Unicode. For globalization, some countries in the world say that we have to use the same standards; on the other hand, if we say we do not need to take care of a small group, it is not fair. (In this case it is not the voice of a small group, but of the Cambodian National Body.) (In my mind, American and European people are very democratic and fair. Many minorities living there get some support from the government for the conservation of their cultures.) (Here I am not saying that you want to destroy our culture.) But anyway, I strongly believe that somebody will hit on a good idea to solve this sensitive problem.

Svay Leng


First of all let me express my thanks for the (increasing!) mutual respect shown in the communications on this list. The issues being raised here are complex...and it is a lot easier to discuss them when the differing viewpoints are thoroughly considered.

Earlier I requested that we next move our discussion to Khmer sorting/collation. Michael brought me back to topic by pointing out that "we're supposed to be dealing with the Cambodian objections to the way Khmer has been encoded". Hence, may we try to focus on characters which affect sorting (some of which the Cambodian delegation presumably would prefer to deprecate [which is not deleting...it is just annotating with a recommendation to ignore] and some of which the Cambodian delegation has proposed as additions).

There are many ways that a human language can be used and the visual display of characters is a relatively small part of that. What I most wish for Khmer are standardised sorting and searching...enterprise functions often associated with relational databases.

For sorting and searching to proceed efficiently and accurately it is important to:
(1) Avoid ambiguity (don't use two different codes for what is essentially the same thing)
(2) Emphasize uniformity (do things the same way: ensure the characters of a word are always in the same order, use global rules instead of observing multiple exceptional conditions)
(3) Distinguish what needs to be distinguished

Two classes of vowels have occasioned some controversy. These relate in part to sorting: INHERENT VOWELS and PRODUCTIVE VOWELS.

(1) Two inherent vowels have been encoded in Khmer Unicode (U+17B4 KHMER VOWEL INHERENT AQ and U+17B5 KHMER VOWEL INHERENT AA). As the names imply, these are invisible characters (in the Khmer script). U+17B4 is the abrupt/short inherent vowel inherited from Indic languages (and is exceptionally used in Khmer, typically when using Sanskrit [Pali?] loan words with their native pronunciations; often associated with the speech of Buddhist monks, royalty, and high ranking officials). U+17B5 is the long default Khmer inherent vowel. The pronunciation of these (indeed of virtually all vowels) is further refined according to the register of the consonants with which they are associated...but that has been ignored in the encoding.

(2) The note associated with these in Unicode cautions that "These are for phonetic transcription to distinguish Indic language inherent vowels from Khmer inherent vowels." They should not be used in conventional text (as that would doubtless violate the principle of uniformity [the same words of text would sometimes have them and sometimes not]). They are for specialist use.

(3) With transliteration, for example, their function becomes very important: They suddenly have to become visible and it would be very difficult to have the same Khmer encoded backing store without a link to them. I am presently working on a Graphite font which allows switching between ALA transliteration characters and proper Khmer script characters.

(4) Consonants are commonly thought to have these inherent vowels attached to them (but they just as easily 'lose' these vowels when followed by a subscript). There is at most one vowel per consonant cluster: it is either an inherent vowel, a dependent vowel, or no vowel at all [as is typically the case in the final cluster of a Khmer word...when there is more than one consonant cluster in the word]. Indic words used in Khmer speech more often have a distinctly pronounced final inherent vowel [if they lack a dependent vowel].

(5) Furthermore they do distinguish separate words which have separate entries when sorting in Chuon Nath's dictionary (note the entries on Arabic number page 1583). The adjective is derived from Sanskrit and Pali and is short inherent (with a meaning of 'negation'). The noun is Khmer with a long inherent vowel and is the name of a kind of rice.
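
To make points (1) to (5) concrete, here is a minimal Python sketch of how the two inherent vowels might behave in a backing store. The code points and character names come from the Unicode Khmer block; everything else is an assumption for illustration: the helper names are hypothetical, the visible markers are placeholders (not the ALA transliteration values), and KHMER LETTER KA is used only as a sample consonant, not as the dictionary pair mentioned in (5).

```python
import unicodedata

INHERENT_AQ = "\u17b4"  # KHMER VOWEL INHERENT AQ (short, Indic)
INHERENT_AA = "\u17b5"  # KHMER VOWEL INHERENT AA (long, Khmer default)

# Hypothetical placeholders so the invisible characters can be seen.
# These are NOT the ALA transliteration values.
VISIBLE = {INHERENT_AQ: "<short>", INHERENT_AA: "<long>"}


def strip_inherent(text):
    """Remove both inherent vowels, as conventional text would omit them."""
    return text.replace(INHERENT_AQ, "").replace(INHERENT_AA, "")


def show_inherent(text):
    """Replace each inherent vowel with a visible placeholder marker."""
    return "".join(VISIBLE.get(ch, ch) for ch in text)


KA = "\u1780"  # KHMER LETTER KA, sample consonant only
short_form = KA + INHERENT_AQ
long_form = KA + INHERENT_AA

# The two specialist spellings differ in the backing store...
print(short_form == long_form)
# ...but collapse to the same conventional text once the vowels are removed.
print(strip_inherent(short_form) == strip_inherent(long_form))
# A transliteration layer can surface the distinction again.
print(show_inherent(short_form), show_inherent(long_form))
print(unicodedata.name(INHERENT_AQ), "/", unicodedata.name(INHERENT_AA))
```

The point of the sketch is only that the distinction survives in the encoded text when a specialist needs it, and disappears cleanly when conventional text is wanted.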

(1) Khmer is reported to have more vowels than any other language. There are so many we do not know the exact number! I have prepared (and recently edited) a document which illustrates this:

(2) In the document just referenced (pages 4 and 5) one can note page numbers from Chuon Nath's dictionary that illustrate the union of KHMER SIGN YUUKALEAPINTU, KHMER SIGN REAHMUK, and KHMER SIGN NIKAHIT with dependent (or inherent) vowels to create new vowels. The ranking that is achieved is not simply that of the addition of a sign (for signs have a very minor [secondary] effect on sorting). They change the ranking of the vowel they follow...causing it to be sorted in a different location from the stand alone vowel. Typically the 'productive vowel' is sorted near its associated dependent vowel...but in the case of NIKAHIT, the shift is substantial (all the way to the end of the vowel listing).

(3) When one looks up the descriptions of KHMER SIGN YUUKALEAPINTU, KHMER SIGN REAHMUK, and KHMER SIGN NIKAHIT in the Chuon Nath dictionary they are in fact described as signs.

(4) KHMER SIGN YUUKALEAPINTU, KHMER SIGN REAHMUK, and KHMER SIGN NIKAHIT furthermore do not have what are essentially vowel sounds: they are typically a glottal stop, 'h', and 'm' respectively.

(5) According to the classification of characters allowed in a cluster, there is only one 'vowel' per consonant cluster. To call any one of these three a vowel would violate that principle (introducing two vowels per cluster).

(6) If, for example, a NIKAHIT were considered for convenience's sake to be a vowel, how would we encode the SIGN nikahit? Would it not be confusing...which one should be typed in any given situation? I appreciate that there are similar problems with U+17A2 and U+17A3 (but U+17A3 is only for specialist use, not for general use).

(7) The encouraging thing is that the complexity of dependent (or inherent) vowels combining with a sign to 'produce' new vowels [for collation purposes] can be handled algorithmically in the pre-processing of sorting. We do not have to encumber the typist (or the standardisation labourer) with every combination (particularly because [as was the case with independent vowel subscripts] we cannot anticipate every one that we will encounter).
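
The pre-processing described in (2) and (7) could be sketched roughly as follows in Python. This is an illustration under loud assumptions, not a proposed collation table: the function name is hypothetical, code point order stands in for the real vowel ranking, and the small offsets for YUUKALEAPINTU and REAHMUK are placeholders. Only the general shape comes from the discussion above: a sign changes the rank of the vowel it follows, a productive vowel normally stays near its associated vowel, and NIKAHIT moves the combination to the end of the vowel listing.

```python
NIKAHIT = "\u17c6"        # KHMER SIGN NIKAHIT
REAHMUK = "\u17c7"        # KHMER SIGN REAHMUK
YUUKALEAPINTU = "\u17c8"  # KHMER SIGN YUUKALEAPINTU

# Assumption: placeholder offsets that keep a productive vowel
# right after its associated dependent (or inherent) vowel.
NEAR_OFFSET = {"": 0, YUUKALEAPINTU: 1, REAHMUK: 2}


def vowel_sort_key(vowel, sign=""):
    """Return a sortable tuple for one vowel plus an optional sign.

    vowel: a single dependent or inherent vowel, or "" for no vowel.
    sign:  "", NIKAHIT, REAHMUK, or YUUKALEAPINTU.
    """
    # Placeholder ranking: code point order, NOT the dictionary order.
    base = ord(vowel) if vowel else 0
    if sign == NIKAHIT:
        # Shift the combination past every vowel that has no NIKAHIT.
        return (1, base, 0)
    return (0, base, NEAR_OFFSET[sign])


VOWEL_AA = "\u17b6"  # KHMER VOWEL SIGN AA
VOWEL_I = "\u17b7"   # KHMER VOWEL SIGN I

pairs = [(VOWEL_AA, NIKAHIT), (VOWEL_I, ""), (VOWEL_AA, REAHMUK), (VOWEL_AA, "")]
ordered = sorted(pairs, key=lambda p: vowel_sort_key(*p))
# AA, AA+REAHMUK, I, then AA+NIKAHIT at the end of the listing.
```

Because the key is computed from the vowel and the sign as they are already encoded, the typist keeps entering the sign as a sign, and no new precomposed vowel characters are needed for the sort to come out right.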

Hence, I would recommend that we do not deprecate the inherent vowels (to the point of asking that they not be used at all; although it might be useful to more strictly acknowledge in the associated annotation that they are only for specialist use). Further I recommend that we do not create new vowels out of NIKAHIT, for the handling of the sorting order of productive vowels with NIKAHIT is a separate issue from the encoding (and is something that is similarly handled in the cases of KHMER SIGN YUUKALEAPINTU and KHMER SIGN REAHMUK).

Sorry...this was a long message. Hope no one fell asleep in a dangerous position while plodding through it;-)