Sunday, September 03, 2006

The Chinese language

There is no such thing as "the Chinese language". This should be no surprise to people. When you learn more about Chinese languages, it does not take long to realise that people mean Mandarin written in the simplified script when they talk about the Chinese language. For WiktionaryZ, this is not a position that we can leave as it is; WiktionaryZ is to be a lexicological, terminological and ontological resource in every language. So we will rename Chinese to Mandarin (simplified), and we will add Mandarin (traditional) and Min Nan to the languages that we will support in WiktionaryZ.

The basis for this is the way we have embraced ISO 639-3 and the experience we have gained with Serbian and English. For Serbian we have two scripts and this works well. For English, the words that are universal are English, and the specifically US-American words are now English (American).
The only thing left for English is to include English (British).

Thanks,
GerardM

2 comments:

Minh Nguyễn said...

The Vietnamese Wiktionary has been using templates for the structure of its entries for some time now. Most of our templates provide headings for parts of speech and language. Recently, we added another class of templates to provide headings according to writing system, so that {{-Latn-}} will mark an entry as a Latin character, {{-Hans-}} as a Simplified Chinese character, etc. (We use a similar system in the Translations section of each entry.)

One thing I’m trying to figure out, though, is what to do with chữ Nôm characters. Many of these characters still have meaning when transliterated into the modern, Latin-based Vietnamese alphabet, but they no longer have any meaning in Chinese, since chữ Nôm was based off classical Chinese characters.

I thought of using the {{-vie-}} tag to mark these characters as being Vietnamese, since they no longer have meaning in the Chinese languages, but this might make the Vietnamese Wiktionary look silly to its readers, because Vietnamese is now so far removed from the Chinese-based script that Vietnamese speakers would never classify chữ Nôm characters as being part of the Vietnamese language, even if they are specific to Vietnamese.

Since you’ve been doing a lot of work with multilingualism at WiktionaryZ, how do you think the project would handle such a situation? In general, how do you handle words in the ancient form of a language?

 – Minh Nguyễn / Nguyễn Xuân Minh

GerardM said...

At this moment I am very much a "standards" junkie. When there is a standard solution I'll have it; when there is no such thing I procrastinate.

ISO 15924 is the standard for scripts, and I do not see the chữ Nôm characters in there on their own. This means that either they are not part of Unicode or they are part of another script.

So essentially at this moment I am at a loss because I do not know. I do know that people who are in the standards committees acknowledge that it is an omission that there is no support for orthographies. I would not be surprised if chữ Nôm could be characterised as an orthography and not as a script.

Thanks,
Gerard