Russian (original and up to date) version here

This is not a real English version but a computer-generated bastard with some minor manual corrections. Sorry. :-)

Languages, fonts and encodings. Russian and around

Edition from 05-may-1999,
small corrections 09-feb-2000, 09-jan-2001
Updates of the last month.
Lingua latina non penis canina est.
(An old scholars' saying, on which there are comments)
  Konstantin Kazarnovsky Forum/Guestbook


Any comments, flames etc. sent by e-mail to shlimazl@mtu-net.ru (Konstantin Kazarnovsky) will be gratefully appreciated.


Code Pages with cyrillic

Out of the large variety of encodings for Russian, most of which also contain non-Russian Cyrillic letters alongside the Russian part, the more widespread and documented variants are shown in tables, as pictures and as Word 6/95 documents:
Definitions of (cyrillic) code pages
The programs for code conversion and viewers

TrueType: documented and undocumented

Specifications and sources
Character Sets, Fonts and FontSubstitutes
TrueType font tables and their role in Windows

The main tables of a TrueType font that relate to languages and encodings are the table of names (name), the table of encodings (cmap) and the table of characteristics for Windows and OS/2, which is called OS/2.
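
As a rough illustration of where these tables live, the sketch below reads the table directory at the start of a TrueType file and lists the tags it finds. The helper names `list_tables` and `build_demo_directory` are hypothetical, and the demo input is a synthetic directory with no real table data, so this shows only the lookup step, not the parsing of name, cmap or OS/2 themselves.

```python
import struct


def list_tables(font_bytes):
    """Return {tag: (offset, length)} read from an sfnt table directory."""
    # Header: 4-byte sfnt version, then numTables, searchRange,
    # entrySelector and rangeShift as big-endian 16-bit values (12 bytes).
    _version, num_tables = struct.unpack(">4sH", font_bytes[:6])
    tables = {}
    for i in range(num_tables):
        start = 12 + 16 * i
        # Each record: 4-byte tag, checksum, offset, length (16 bytes).
        tag, _checksum, offset, length = struct.unpack(
            ">4sLLL", font_bytes[start:start + 16]
        )
        tables[tag.decode("latin-1")] = (offset, length)
    return tables


def build_demo_directory(tags):
    """Build a synthetic directory (no table data) for demonstration only."""
    header = struct.pack(">4sHHHH", b"\x00\x01\x00\x00", len(tags), 0, 0, 0)
    data_start = 12 + 16 * len(tags)
    records = b"".join(
        struct.pack(">4sLLL", tag.encode("latin-1"), 0, data_start + 4 * i, 4)
        for i, tag in enumerate(tags)
    )
    return header + records


demo = build_demo_directory(["OS/2", "cmap", "name"])
tables = list_tables(demo)
for tag in ("name", "cmap", "OS/2"):
    print(tag, tables[tag])
```

With a real font one would pass the bytes of the font file instead of `demo`.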

Unicode and Postscript-fonts

While work with TrueType fonts (first of all in Windows) is based on selecting characters by their numerical value in Unicode, work with PostScript fonts is based on the names of the characters in the font.
If characters have to be selected via Unicode (in Windows NT, in Word'97), the character names must be put into correspondence with Unicode values, which is the job of the PostScript font manager (Adobe Type Manager). ATM/NT 4.0 is known to be able to do this mapping, whereas ATM/95 4.0 cannot. The list of glyph names (Adobe Glyph List) and their Unicode values is contained in documents from Adobe; the main document, Unicode and Glyph Names, defines the mapping algorithm and contains references to the glyph lists.
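
The mapping from glyph names to Unicode values can be sketched roughly as follows. This is a simplified reading of the published Adobe rules, not the code of any font manager: `AGL_EXCERPT` is a tiny hand-picked excerpt of the Adobe Glyph List, and `glyph_name_to_text` is a hypothetical helper name.

```python
import re

# A few entries only; the real Adobe Glyph List is far longer.
AGL_EXCERPT = {
    "space": 0x0020,
    "A": 0x0041,
    "Adieresis": 0x00C4,
    "afii10017": 0x0410,  # CYRILLIC CAPITAL LETTER A
}


def _is_surrogate(cp):
    return 0xD800 <= cp <= 0xDFFF


def glyph_name_to_text(name, agl=AGL_EXCERPT):
    """Map a PostScript glyph name to a Unicode string (simplified)."""
    base = name.split(".", 1)[0]          # drop a suffix such as ".alt"
    out = []
    for comp in base.split("_"):          # "_" joins ligature components
        if comp in agl:
            out.append(chr(agl[comp]))
        elif re.fullmatch(r"uni(?:[0-9A-F]{4})+", comp):
            digits = comp[3:]
            cps = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
            if not any(_is_surrogate(cp) for cp in cps):
                out.extend(chr(cp) for cp in cps)
        elif re.fullmatch(r"u[0-9A-F]{4,6}", comp):
            cp = int(comp[1:], 16)
            if cp <= 0x10FFFF and not _is_surrogate(cp):
                out.append(chr(cp))
        # Any other component contributes nothing.
    return "".join(out)


for glyph in ("afii10017", "uni0416", "Adieresis.alt", "nosuchglyph"):
    print(glyph, repr(glyph_name_to_text(glyph)))
```

A font manager that cannot perform this step has no way to answer a request for a character by its Unicode value from a font that only knows glyph names.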



Fonts and Unicode applications
Office-97

The main feature of the file format and philosophy of Office-97, first of all Winword-97, is the almost complete transition to Unicode: the text part of Word-97 documents is stored in Unicode, and while the program is running (input and text processing) it uses, in all probability (a caveat is certainly needed here), exclusively Unicode. However, as in previous versions of Word, normal work with a language requires that Word itself supports that language, irrespective of whether the font is a Unicode font.

Internet Explorer 4

  Internet Explorer 4, including most of its components (the MSIE browser, the Outlook Express mail and news client and, to a somewhat lesser degree, the FrontPage Express HTML editor), uses Unicode as the main encoding for internal purposes and in input windows. As described in a number of Microsoft Knowledge Base articles, MSIE implements the concept of a "multilingual object": a class and functions for working with it are defined. All this makes it possible to support practically any encoding on the Internet, including the Unicode-based ones (UTF-7 and UTF-8). To render characters of a language (character set) that the default font does not support, a font in which this language is present is selected. This font substitution works for those Unicode values that can be associated with a particular character set (a problem similar to that of recognizing which character sets a font supports). The substitution is not made if the character set is present in the font but some characters of the set are missing; those characters are then displayed as small squares.

Internet Explorer 5 beta 2

The multilingual support in MSIE5 has been extended even further, and as a standard installation option it now includes, apart from the pan-European language pack, Hebrew, Arabic, Thai, Vietnamese, Japanese, Korean and Chinese. The standard label for the Ukrainian encoding is now KOI8-U. Keyboard support for Hebrew, Arabic and Thai is not installed, but, at least for Hebrew, if the language is registered in the system and a custom keyboard layout is installed, Outlook Express will support correct bidirectional input. Hebrew fragments in Unicode pages (UTF-8 etc.) are now represented "logically", i.e. with a change of direction, as in products from Accent, instead of "visually", as in the previous version.
As was promised, the management of encodings through the Registry (the MIME section) has been severely reduced in this version and moved into tables in mlang.dll (and not at all into its resources, which mainly hold the names of encodings in different languages, such as "Cyrillic (Windows)").


Netscape Communicator 4

  In its fourth version Netscape became Unicode-aware and multilingual, with support for UTF-7 and UTF-8; however, this support is much more broken than in MSIE4. Unicode fonts are required (or can be used), but for single-byte encodings Netscape does not interpret the encoding itself; instead it converts the text into the main Windows encoding for the language. In particular, all Russian encodings are recoded into CP1251, with loss of the pseudo-graphics from koi8-r. Of the "ampersand" entities, only those known in advance are understood, except in the Unicode encodings UTF-7 and UTF-8: there the characters for any "ampersand" Unicode entity are taken from the specific font via their Unicode values.
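
The loss of pseudo-graphics on such a recoding can be reproduced, as a minimal sketch, with Python's built-in koi8_r and cp1251 codecs. This only imitates the effect described above and says nothing about how Netscape itself performs the conversion; the function name `recode_koi8r_to_cp1251` is hypothetical.

```python
def recode_koi8r_to_cp1251(data):
    """Recode KOI8-R bytes into CP1251, reporting characters that get lost."""
    text = data.decode("koi8_r")
    lost = []
    for ch in text:
        try:
            ch.encode("cp1251")
        except UnicodeEncodeError:
            if ch not in lost:
                lost.append(ch)
    # Characters with no place in CP1251 are replaced by "?".
    return text.encode("cp1251", errors="replace"), lost


# Russian text followed by two KOI8-R box-drawing bytes (0x80 is U+2500).
source = "Привет".encode("koi8_r") + b"\x80\x80"
recoded, lost = recode_koi8r_to_cp1251(source)
print(recoded.decode("cp1251"))
print([hex(ord(ch)) for ch in lost])
```

The letters survive the conversion, while the box-drawing characters have no counterpart in CP1251 and are replaced.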


Adobe Photoshop 5 and Illustrator 7

   The latest versions of Adobe products have become Unicode-aware, seemingly with support for Chinese/Japanese/Korean ideographs; however, of the European code pages only the Western CP1252 is supported. When the keyboard is switched to, for example, Russian, the layout changes but the language does not, and Latin characters with diacritics are displayed on the screen instead of Russian.
For one language (for example, Russian) the problem can be worked around by replacing the NLS file for code page 1252 (CP_1252.NLS in Windows'9x, C_1252.NLS in NT) with the NLS file for code page 1251. This can be done either by direct copying (CP_1251.NLS over CP_1252.NLS, in DOS mode) or by editing the Registry and rebooting the computer:
REGEDIT4

[HKEY_LOCAL_MACHINE\System\CurrentControlSet\control\Nls\Codepage]
"1252" = "cp_1251.nls"

(For NT, in a similar way:
"1252" = "c_1251.nls").

As was to be expected, after that all Latin diacritics (French, Spanish, German and other languages) are blocked in other Unicode-aware programs that work through NLS as well: in the Western encoding in MSIE4, and on input in Word'97 and WordPad'98. At the same time, data stored directly in Unicode (UTF-7/UTF-8 in MSIE4, a previously created file in Word'97) is shown correctly.
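
Both the original symptom and the side effect of the workaround come down to the same bytes being read through the wrong code page table. A small sketch, assuming that Python's built-in cp1251 and cp1252 codecs stand in for the Windows NLS tables (the real tables may differ in details):

```python
# The symptom: bytes typed on a Russian layout, read through table 1252.
russian_bytes = "Привет".encode("cp1251")
seen_by_adobe = russian_bytes.decode("cp1252")
print(seen_by_adobe)

# After the swap: Western bytes are now read through table 1251.
western_bytes = "é".encode("cp1252")
seen_after_swap = western_bytes.decode("cp1251")
print(seen_after_swap)

# Data already stored as Unicode does not pass through either table.
utf8_bytes = "é".encode("utf-8")
print(utf8_bytes.decode("utf-8"))
```

The first line prints Latin letters with diacritics in place of the Russian word, the second prints a Cyrillic letter in place of the accented Latin one, and the third is unaffected.
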
Large fonts Windows'9x/NT4

Windows'95 was the first to include large TrueType fonts (100-200 KB and more) containing characters of various languages, instead of the separate font files of Windows 3.1*. This did not contradict the early TrueType Specification either, although a number of essential modifications were introduced into it; in particular, the new field ulCodePageRange, intended to support such fonts, was added to the new version of the OS/2 table. The novelty of these fonts lay first of all in the algorithms and ideology of Windows'95. Multilanguage support became much more advanced than in Windows 3.1, and including multilingual fonts was a natural part of it.
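
As an illustration of how this field announces the code pages a font covers, the sketch below decodes the low bits of its first 32-bit half. The bit assignments are given as they appear in the published OS/2 table specification, only the first nine are listed, and the names `CODE_PAGE_BITS` and `code_pages` are hypothetical.

```python
# Bit number -> Windows code page, low bits of the first 32-bit half only.
CODE_PAGE_BITS = {
    0: "1252 Latin 1",
    1: "1250 Latin 2 (Eastern Europe)",
    2: "1251 Cyrillic",
    3: "1253 Greek",
    4: "1254 Turkish",
    5: "1255 Hebrew",
    6: "1256 Arabic",
    7: "1257 Windows Baltic",
    8: "1258 Vietnamese",
}


def code_pages(range1):
    """List the code pages whose bits are set in the given 32-bit value."""
    return [
        name
        for bit, name in sorted(CODE_PAGE_BITS.items())
        if range1 & (1 << bit)
    ]


# A made-up value with bits 0 and 2 set: Western and Cyrillic coverage.
print(code_pages(0b101))
```

A font that sets the Cyrillic bit is declaring, in effect, that it can stand in for a separate Cyrillic font file of the Windows 3.1 kind.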




Converters of the font encoding and other tools

Multilanguage: programs and examples

