Copyright © The Document Company - Xerox 1997. All Rights Reserved.
All normal adult humans speak or sign at least one language, and language is so much a part of our humanity that the lack of language is a clear indication of a severe handicap or, in rare cases, a feral or abused childhood in which the person had little or no exposure to a language.
In contrast, there is nothing natural or essentially human about literacy, the ability to read and write. One can be a completely normal human being and be illiterate; and indeed illiteracy is still the norm in many places in the world.
An orthography is a learnable human technology consisting of 1) a set of characters and 2) conventions for using them to make language "visible". Prototypically, these characters take the form of marks on paper, parchment, bark or some similar medium, but the notion is easily extended to notches cut in a stick, carvings in stone or metal, raised Braille dot patterns that are felt rather than seen, magnetized bits in a computer file, etc.
Many language communities currently have no culturally-accepted or "standard" orthography, but only because too few people have ever wanted to write them badly enough. A single language may have multiple orthographies in reasonably common use (at least three separate orthographies, all romanizations, have been proposed and used for Aymará; and Serbian and Croatian are essentially the same language, with two different orthographies). Orthographies can change, through evolution or cultural revolution: up until the 1920s Turkish was written using Arabic letters and conventions; since then it has been written in a Roman orthography. But the Turkish language is still the Turkish language, no matter what orthography is used. English speakers have used many orthographies, including a fairly standard Roman version (with regional variations), Pitman shorthand, Gregg shorthand, Shavian and dozens of other proposed orthographical reforms. A competent linguist can invent a new, viable orthography for any human language.
Many language communities adopt their standard orthography more or less by historical accident. English and most of the languages of Western Europe have a Roman orthography culturally associated with them because these areas were conquered by the Romans and later proselytized by the Roman Catholic Church. Polish (a Slavic language) uses a Roman orthography because it too was proselytized from the Roman Catholic side. But Slavic Russian and Bulgarian speakers traditionally use a Cyrillic orthography because historically they were proselytized from the Greek Orthodox side. Serbo-Croatian is for all practical purposes a single language, but the Serbs use a Cyrillic orthography while the Croats use a Roman one; the difference is again the result of which missionaries got there first.
In a similar fashion, the conquest of Islam carried both religion and, to a lesser extent, the Arabic language; but where the local languages survived, as in Persia, Turkey, Indonesia and India, the speakers often adopted an orthography for their local language based on traditional Arabic characters.
The typical transcription of Arabic has as its purpose to convey the pronunciation of Arabic words, usually to foreigners who are not comfortable with traditional Arabic orthography. Given their previous schooling in the orthographies used for their native languages, Western Europeans are more comfortable with a Roman-based transcription; Russians and Bulgarians would obviously prefer a Cyrillic transcription, etc. In any case, traditional Arabic orthography includes silent letters, superficially ambiguous letters like waaw and yaa', and usually an absence of vowel signs and other diacritics necessary to convey the pronunciation reliably. For all these reasons, it is useful and proper for linguists, teachers and dictionary editors to devise and use whatever kinds of transcription are suited for their ends. These transcriptions are possible orthographies for Arabic, possible ways of making Arabic visible, but because they use different character inventories and different conventions, they are different from the standard Arabic orthography.
For an orthography to qualify as a transliteration, it must use the same orthographical conventions and a character set which has a one-to-one, fully reversible mapping with the character set of the original orthography.
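The one-to-one, reversible requirement can be checked mechanically. The following Python sketch is only an illustration: the function names and the three-letter sample table are hypothetical, and it assumes every character is replaced by exactly one substitute character.

```python
def is_one_to_one(table):
    """True if no two source characters share the same substitute."""
    return len(set(table.values())) == len(table)


def invert(table):
    """Build the reverse mapping, refusing tables that are not reversible."""
    if not is_one_to_one(table):
        raise ValueError("mapping is not one-to-one, so it cannot be reversed")
    return {substitute: original for original, substitute in table.items()}


def convert(text, table):
    """Replace each character of text by its counterpart in table."""
    return "".join(table[ch] for ch in text)


# Illustrative fragment only: siin, daal, zaay
SAMPLE = {"\u0633": "s", "\u062F": "d", "\u0632": "z"}
```

A table passes the test only if converting a text and then converting it back with the inverted table returns the original text unchanged.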
The standard Arabic orthography is a clear case where writing, displaying and storing the original character shapes is often inconvenient for people working with European-language text editors, email systems and networks; in these cases, there is often a practical need for a Roman transliteration that allows standard Arabic orthography to be represented faithfully using the available ASCII letters. Russians might devise a Cyrillic transliteration for the same reason.
Many linguists, including Arabists, have not learned to distinguish transcriptions from transliterations, and this leads to considerable confusion. The vast majority of Arabic romanizations are transcriptions, designed to convey the surface pronunciation or the deeper morphophonology of words; and many Arabists see no purpose in a romanization that doesn't serve this purpose. But in commercial Arabic natural-language processing systems, such as the Xerox Morphological Analyzer, where the input and output consist of written text in the traditional orthography, there is often a need for a true Roman transliteration. The Buckwalter Transliteration is used by the developers of the Xerox Arabic Morphology system when they need to communicate Arabic text, consistent with Arabic orthographical conventions but with substituted letter shapes, via common email and other media where it is inconvenient or impossible to display real Arabic script.
Inside the Xerox Arabic applets that display real Arabic script, strings are stored as UNICODE characters.
When a printer or terminal is directed to display a file of ASCII codes as English text, the codes are converted to letter shapes from a font and are then painted appropriately on paper or terminal screen by an English Rendering program.
cat [display on screen or paper]
^
|
English Rendering Program
^
|
99, 97, 116 [integers in a file]
The character distinctions of English orthography are reflected faithfully (and reversibly) in the ASCII encodings themselves, but some of the facts of English orthography are relegated to the Rendering Program, in particular the fact that English is rendered from left to right. Computer files have a beginning and an end, but they don't have any inherent left-to-right or right-to-left orientation; they're just sequences of byte values.
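The reversibility of the ASCII encoding can be seen in a few lines of Python. This is a minimal sketch with hypothetical function names; it covers only the mapping between stored integers and characters, not rendering or direction.

```python
def codes_to_text(codes):
    """Interpret a sequence of integers as ASCII characters."""
    return bytes(codes).decode("ascii")


def text_to_codes(text):
    """Recover the integer codes that would be stored in the file."""
    return list(text.encode("ascii"))


# The three integers from the diagram above
CAT_CODES = [99, 97, 116]
```

Here `codes_to_text(CAT_CODES)` gives the string `cat`, and `text_to_codes("cat")` gives back the same three integers. Nothing in the list of integers says whether the letters should be painted from left to right or from right to left.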
The banality of ASCII-transliterated English resides in the fact that it's unambiguously mappable to and from standard English orthography. For all practical purposes, ASCII-transliterated English texts are "the same thing" as traditionally typewritten or printed texts. This banality extends to all true transliterations: a transliteration of an Arabic orthographical text into ISO8859-6 or UNICODE characters is effectively the same as the original except that numbers are carefully substituted for the original characters. For exactly the same reason, a true transliteration of traditional Arabic orthography using Roman letters is again the same thing as traditional Arabic orthography except that the shape of each letter is different.
Of course, the rendering (on paper or computer screen) of Arabic script from a UNICODE or ISO8859-6 file is somewhat more difficult than the parallel rendering of encoded English. The Arabic encoding systems properly employ only a single character encoding for shiin, one for miim, one for daal, etc. and yet the fonts will contain multiple "glyphs" for each character, representing the isolated, initial, medial and final shapes for rendering each character appropriately in context. An Arabic-script rendering program, such as the Java applets that display Arabic script in the Xerox Arabic Morphology System, must accept a string of input codes (UNICODE in this case), compute which glyph is appropriate in each context, and then display the appropriate glyphs right to left.
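The glyph-selection step can be sketched roughly as follows. This is a simplified illustration, not the algorithm used by the Xerox applets: the function name is hypothetical, the set of letters that connect only to the preceding letter is deliberately incomplete, and vowel signs, hamza, ligatures such as laam-alif and the right-to-left painting itself are all ignored.

```python
# Letters that join only to the preceding letter, never to the following one
# (simplified, incomplete set): alif, daal, dhaal, raa', zaay, waaw
RIGHT_JOINING = set("\u0627\u062F\u0630\u0631\u0632\u0648")


def contextual_forms(word):
    """Return the name of the positional form needed for each letter.

    Assumes word contains only Arabic letters, with no diacritics.
    """
    forms = []
    for i, ch in enumerate(word):
        # A letter connects backwards if the previous letter can connect forwards.
        joins_prev = i > 0 and word[i - 1] not in RIGHT_JOINING
        # A letter connects forwards if it is not word-final and is able to.
        joins_next = i < len(word) - 1 and ch not in RIGHT_JOINING
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


# kaaf, taa', alif, baa': the word kitaab written without vowel signs
KITAAB = "\u0643\u062A\u0627\u0628"
```

For `KITAAB` the function reports initial, medial, final and isolated forms, in that order: the alif cannot connect to the following baa', so the baa' takes its isolated shape even though it is not alone in the word.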
Thus almost everyone uses the obvious equivalents like s for siin, d for daal, z for zaay, t for taa', w for waaw, y for yaa', etc. In the Buckwalter Transliteration we use uppercasing to distinguish pharyngealized (aka "emphatic") consonants: S for Saad, D for Daad, T for Taa', Z for Zaa' (DHaa'). (The same convention was adopted quite independently by Terry Regier in his atex transliteration for specifying Arabic strings in LaTeX.) We use a for fatHa, i for kasra, u for Damma, o for sukuun, etc. Eventually we (and anyone else devising a 7-bit ASCII transliteration) must start grasping for motivated character substitutes, and we freely recognize that many equivalent and equally good Roman transliterations could be devised.
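The substitutions named in this paragraph can be written down as a small table. The Python sketch below covers only those fourteen characters, not the full Buckwalter Transliteration; the function and variable names are hypothetical, and the keys are the standard Unicode code points for the Arabic characters.

```python
# Only the substitutions mentioned in the text above
BUCKWALTER_FRAGMENT = {
    "\u0633": "s",  # siin
    "\u062F": "d",  # daal
    "\u0632": "z",  # zaay
    "\u062A": "t",  # taa'
    "\u0648": "w",  # waaw
    "\u064A": "y",  # yaa'
    "\u0635": "S",  # Saad
    "\u0636": "D",  # Daad
    "\u0637": "T",  # Taa'
    "\u0638": "Z",  # Zaa' (DHaa')
    "\u064E": "a",  # fatHa
    "\u0650": "i",  # kasra
    "\u064F": "u",  # Damma
    "\u0652": "o",  # sukuun
}

TO_ROMAN = str.maketrans(BUCKWALTER_FRAGMENT)
TO_ARABIC = str.maketrans({v: k for k, v in BUCKWALTER_FRAGMENT.items()})


def to_roman(text):
    """Substitute Roman letters; characters outside the table pass through."""
    return text.translate(TO_ROMAN)


def to_arabic(text):
    """Reverse the substitution, character by character."""
    return text.translate(TO_ARABIC)
```

Because every Arabic character, including the sukuun, receives its own substitute, a fully vowelled spelling made of zaay, fatHa, yaa', sukuun and taa' comes out as `zayot`, and `to_arabic("zayot")` restores the original five characters.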
When the obvious letter substitutions run out, the usual course is to adopt Roman digraphs (or more generally ngraphs) to represent particular Arabic characters, but sloppy use of ngraphs disqualifies many orthographies from being faithful transliterations. Classic blunders include using 'sh' for shiin, while at the same time using 's' for siin and h for haa'; and using th for unvoiced thaa' and dh for voiced dhaal. The use of such ngraphs creates ambiguities in the transliterated text that were not present in the standard orthography.
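The ambiguity is easy to reproduce. In this small Python illustration the table and function name are hypothetical; it shows two different Arabic character sequences collapsing into the same Roman string.

```python
# A careless romanization table: shiin, siin, haa'
NAIVE = {"\u0634": "sh", "\u0633": "s", "\u0647": "h"}


def romanize_naively(text):
    """Replace each Arabic character with its unbracketed Roman equivalent."""
    return "".join(NAIVE[ch] for ch in text)


SHIIN = "\u0634"            # one character: shiin
SIIN_HAA = "\u0633\u0647"   # two characters: siin followed by haa'
```

Both `romanize_naively(SHIIN)` and `romanize_naively(SIIN_HAA)` produce `sh`, so the original spelling cannot be recovered from the romanized text.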
The use of ngraphs does not necessarily disqualify an orthography from being a faithful transliteration. If, for example, the exclamation mark has no independent role, then it could be used legitimately and unambiguously to distinguish siin, written just as "s", from Saad, written "s!", and daal, written "d", from Daad, written "d!", etc. Or digraphs can be bracketed, for example always writing "[sh]" for shiin, "[th]" for thaa' and "[dh]" for dhaal to preclude any possibility of confusing the digraphs with two separate characters. Nevertheless, bracketing is a nuisance, users can't always be expected to remember the conventions, and we found it safest to eschew ngraphs in the Buckwalter Transliteration.
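A bracketed scheme of this kind can be decoded without guesswork. The Python sketch below is an illustration under stated assumptions: the seven-character table and the function names are hypothetical, and the square brackets are assumed to have no independent role in the text.

```python
import re

# Bracketed digraphs for shiin, thaa', dhaal; plain letters for the rest
BRACKETED = {
    "\u0634": "[sh]",  # shiin
    "\u062B": "[th]",  # thaa'
    "\u0630": "[dh]",  # dhaal
    "\u0633": "s",     # siin
    "\u062A": "t",     # taa'
    "\u062F": "d",     # daal
    "\u0647": "h",     # haa'
}
FROM_BRACKETED = {roman: arabic for arabic, roman in BRACKETED.items()}

# A token is either a whole bracketed group or a single character.
TOKEN = re.compile(r"\[[a-z]+\]|.")


def encode(text):
    """Write each Arabic character as a plain letter or bracketed digraph."""
    return "".join(BRACKETED[ch] for ch in text)


def decode(text):
    """Split the Roman text into tokens and map each back to one character."""
    return "".join(FROM_BRACKETED[token] for token in TOKEN.findall(text))
```

With this table shiin is written `[sh]` while siin followed by haa' is written `sh`, and `decode` recovers each original sequence exactly.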
Other ways in which most transcriptions of Arabic fail to qualify as transliterations:
Transcriptions will naturally be used in pedagogical situations, where there is a genuine need, distinct from orthography, to convey phonetics, phonology and morphophonology; whereas in Arabic natural-language processing, there will more commonly be a need for strict orthographical transliterations like the Buckwalter Transliteration.
Ken.Beesley@xrce.xerox.com