On 11/17/2001 12:07:35 PM "S Leng" wrote:

>And I also wrote "everybody working in this field can see this
>standard table" , it means for me that Cambodian who are programmers,
>computers engineers, and so on , when they study computer science they can
>see this standard.

Without trying to argue for or against either opinion, I would like to understand why the encoding is a problem for Cambodian computer engineers if it produces the necessary results.

>Could you please read once again the attached document especially page 9.

There are some points in this paper that seem valid to me. For example, I would not have expected U+17D8 KHMER SIGN BEYYAL to be encoded as a single character. Also, the presence of both U+17A2 and U+17A3 is likely to cause confusion for users, and I am led to wonder if it isn't a mistake to infer the need for a distinct QAQ in order to transliterate when the need has not been felt for hundreds of years. Looking at comparable issues in Thai script, it seems clear that such things were not needed to allow for successful software implementations for that script.

Some of those arguments seem valid to me. For better or worse, though, the encoding is part of the Standard, and I don't see that there could be any way to effectively change it.

Some of the arguments against the virama are not convincing, however.

First, since keyboard input has been mentioned in this thread, I will note the comment that "the encoding issue is independent from the key typing issue", which I agree with.

Secondly, I note that the developers involved with KPP did not see any technical limitation in the virama model: "if you decide to accept [the newly defined Khmer block in UCS/Unicode], we can recode our implementation accordingly."

Thirdly, with respect to some specific arguments against the virama model for Khmer:

"(a) The virama model is groundless for Khmer script;"

This is merely an assertion; it is not a valid argument against the virama model for Khmer.

"(b) The virama model is inefficient for Khmer because it will increase normal text file size by around 20%. This is a clear demerit for end-users;"

I cannot see that this is such a serious issue. Changing from 8-bit encodings to Unicode/ISO 10646 will have changed file sizes for text in Khmer and many other scripts by 100%, but that has been seen to be a non-issue by all.
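To make the arithmetic concrete, here is a rough sketch that counts code points for one Khmer word under the virama (COENG) model and under a hypothetical explicit-subscript model. The function names are mine, and the sketch assumes that every COENG is followed by exactly one consonant, that an explicit-subscript encoding would spend one code point per subscript, and that every character fits in one 16-bit code unit. It illustrates the comparison only; it is not a measurement of real text.

```python
COENG = "\u17D2"  # KHMER SIGN COENG

# The word "Khmer": KHA, COENG, MO, SIGN AE, RO
WORD = "\u1781\u17D2\u1798\u17C2\u179A"


def code_points_virama(text):
    """Code points as encoded in the Standard (COENG + consonant)."""
    return len(text)


def code_points_explicit(text):
    """Code points under a hypothetical model in which each
    COENG + consonant pair is a single subscript character.
    Assumes every COENG is followed by one consonant."""
    return len(text) - text.count(COENG)


def overhead_percent(text):
    """Extra size of the virama model relative to the hypothetical model."""
    explicit = code_points_explicit(text)
    return 100.0 * (code_points_virama(text) - explicit) / explicit


if __name__ == "__main__":
    print("virama model code points:  ", code_points_virama(WORD))
    print("explicit model code points:", code_points_explicit(WORD))
    print("virama overhead (%):       ", overhead_percent(WORD))
    # One byte per character in an 8-bit encoding versus two bytes per
    # 16-bit code unit: the move to Unicode alone doubles the size.
    print("bytes at 16 bits per code unit:", len(WORD.encode("utf-16-be")))
```

For this one word the COENG adds one code point to four, whereas going from one byte per character to two doubles the size regardless of which model is chosen.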

"(c) It will impose frequent application of an unnecessary step in the rendering phase (substituting COENG+consonant with a glyph of a subscript consonant). This is a demerit for implementers;"

Again, I do not see this as a serious issue: general-purpose rendering technologies can deal with these issues. Moreover, these technologies are required regardless of whether the virama model is used in order to deal with issues of subscript / diacritic positioning and ligation in Khmer.
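The substitution step in question can be sketched in a few lines. This is only an illustration under simplifying assumptions: the glyph names ("base.XXXX", "sub.XXXX") are hypothetical, consonants are taken to be the range U+1780..U+17A2, and nothing is done about vowel reordering, positioning or ligation, which a real rendering technology must handle in either model.

```python
COENG = "\u17D2"  # KHMER SIGN COENG


def is_consonant(ch):
    """Simplified test: Khmer consonants KA (U+1780) through QA (U+17A2)."""
    return "\u1780" <= ch <= "\u17A2"


def to_glyphs(text):
    """Map a character sequence to hypothetical glyph names, replacing
    each COENG + consonant pair with one subscript glyph."""
    glyphs = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == COENG and i + 1 < len(text) and is_consonant(text[i + 1]):
            # The pair collapses to a single subscript-form glyph.
            glyphs.append(f"sub.{ord(text[i + 1]):04X}")
            i += 2
        else:
            glyphs.append(f"base.{ord(ch):04X}")
            i += 1
    return glyphs


if __name__ == "__main__":
    # KHA, COENG, MO, SIGN AE, RO
    print(to_glyphs("\u1781\u17D2\u1798\u17C2\u179A"))
```

The substitution is a single pass over the text with a one-character lookahead, which is small next to the positioning and ligation work that is needed anyway.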

"(d) The rendering rules used for Devanagari cannot be used for Khmer even if its encoding were to be based on the virama model. The merit for implementers is not highly significant;"

That rendering rules for Devanagari cannot be applied to Khmer is, strictly speaking, true: for example, Devanagari has interactions with ZWJ and ZWNJ that do not apply to Khmer. That is a valid counter-argument against an argument that Devanagari rendering rules can be used for rendering Khmer. It does *not*, however, constitute an argument against the virama model for Khmer; at best, it nullifies an argument in favour of the virama model for Khmer. In addition, while Devanagari rendering rules might not exactly match what is needed for Khmer, it is the case that applying a virama model to Khmer would make rendering implementations needed for it similar *in some respects* to implementations for other scripts, thus making it possible for implementers to work with familiar concepts. Therefore, it seems to me that the argument still stands that there is merit with the virama model in this regard.

"(e) The only practical reason for the virama model seems to be to economize on the number of code points, as one of the authors of N2385 clearly said in the WG2 London meeting (WG2 N1903), but this is not a merit for end-users or for implementers."

Economy of code points is one reason, though of itself it is not really a strong argument. But there are others: for example, a rendering model that would be familiar to implementers is a practical benefit. Maurice has suggested other benefits that relate to other processes. If you concede that the encoding details can be hidden from end users, then I do not think we can say that there is any detriment to users of the virama model. I have not seen any explanation of how the virama model would be detrimental for implementers.

There is another consideration which to me seems to indicate a significant benefit to users and implementers of the virama model for Khmer: it is part of the Standard now. Efforts to change the Standard are resulting in delays in implementations of Unicode / ISO 10646 encoding for Khmer, and they are delaying support for Khmer in some important commercial software products using *any* encoding. Moreover, if there were to be changes that resulted in distinct but equivalent representations, that would certainly be detrimental to both users and implementers. I realise that no one is advocating ambiguity, but that might be the result if the requested changes were put into effect. There may yet be other strong arguments for making that change, and I am certainly open to hearing them. But I think we need to keep in mind the implications of such change.

In view of all this, it seems to me that this argument against a virama model is not really valid.

"(f) In sum, the virama model has nothing that can justify deviation from the natural and efficient approach for Khmer: explicit encoding of subscripts."

This is like point (a) in being merely an assertion and not a valid argument against the virama model for Khmer.

"1. The authors of N2385 present the difficulty of determining the entire set of possible subscript characters in advance as a reason for adoption of the virama model. However, a character set has to be explicitly determined in a standard. Without it, each implementer might define it as they like."

On the one hand, the possible subscripts need to be known in advance by those developing the encoding Standard, or by those developing fonts. On the other hand, in the days of writing prior to computer implementations, nothing existed to prevent an implementer (an author / scribe) from innovating new subscripts other than pressures due to the general behaviours of the script and the typical ways in which it is used. It seems to me that the same is true here as well.

I have not covered all of the points in the document. After reading it, I'm left thinking that the main argument against the existing encoding is that COENG as a separate character does not match what is known within Khmer culture about the script. I am not in a position to speak for or against that point (except to point out that it appears not to be relevant with regard to implementations). I do question the validity of some of the arguments discussed above, however.


- Peter

>Khmer language is our own culture and we don't need the implementation of
>different model (if it is the inside rendering it's OK, but here it's a
>standard, in other word our face). Developed countries can dominate in
>economy, in technology, but please don't change our existing culture. It's
>our identity, we don't need the domination on our culture.
>Svay Leng
> - n2406.pdf