Chapter 11. Internationalization

Internationalization is a method of application development that allows the application to be run in many different languages without having to be rewritten or recompiled. This chapter describes how to design applications to use Motif's internationalization capability. It is not a general discussion of internationalization.

Issues in Internationalized Applications

There are several important issues to keep in mind when designing an application so that it takes advantage of Motif's internationalization capabilities.

Internationalization and Localization

An internationalized application contains no code that is dependent on the user's language, the characters needed to represent that language, or any formats (such as date and currency) that the user expects to see and interact with. Motif accomplishes this by storing language and custom dependent information outside the application.

The following figure shows the kinds of information that should be external to an application to simplify internationalization.

Figure 11-1. Information External to the Application


Because the language- and culture-dependent information is separate from the application source code, the application does not need to be rewritten or recompiled to be marketed in different countries. Instead, the only requirement is for the external information to be localized to accommodate local language and custom.

Localizing the application includes the process of translating certain parts of the external information into the appropriate language and storing the translated information in files that are then accessed by the application. In addition, the application may be told the format to use to display time, date, and the other language or culture dependent formats shown in the previous figure.

Every language consists of a set of characters that, either individually or in combination, represents meaningful words or concepts in the language. The set of characters is called a character set. The set of binary values needed to represent all the characters in a language is called a coded character set or, more simply, a code set.

Efforts to standardize character sets began long ago and continue to this day. The most commonly used code set for English is the American National Standard Code for Information Interchange (ASCII). It originally used a 7-bit encoding scheme plus an eighth bit for error control. Using 7 bits for character representation allows 128 unique binary values. Later versions use the eighth bit as a code bit, allowing 256 characters. Both are fine for English and some other alphabetic languages, but neither is suitable for ideographic languages such as Chinese, Japanese, and Korean. Ideographic languages represent a concept or an idea as a single character; consequently, there are thousands of characters in these languages, and two or more bytes are needed to represent the characters.
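The arithmetic behind these limits is that n code bits yield 2^n distinct values. The following is a minimal sketch of that calculation; the function name code_values is illustrative and is not part of Motif or the C library.

```c
#include <assert.h>

/* Number of distinct binary values that `bits` code bits can represent. */
unsigned long code_values(unsigned bits)
{
    assert(bits < 32);      /* keep the shift well defined for unsigned long */
    return 1UL << bits;
}
```

Under this sketch, code_values(7) is 128, code_values(8) is 256, and code_values(16) is 65536, which is why a two-byte encoding has room for the thousands of characters in an ideographic language.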

Other standard code sets have been developed to accommodate other languages. The ISO8859 standard is perhaps the most commonly used of these. Different versions of the ISO8859 standard exist for various areas of the world. The following table shows a typical language and character set relationship for various areas.

Table 11-1. Areas and Typical Character Sets

Area or Language       Character Set

English                ASCII, ISO8859-1
Western Europe         ISO8859-1
Eastern Europe         ISO8859-2
Northern Europe        ISO8859-4
Russia (Cyrillic)      ISO8859-5
Hebrew                 ISO8859-8
Greek                  ISO8859-7
Japan                  Shift JIS
Japan                  UJIS

See the specifications for the American National Standards Institute (ANSI) C programming language and the X/Open Portability Guide, Issue 3 (XPG3) for more information on standards involved in internationalization.

Obtaining Input

Special considerations must be made for the user of an application to input characters in the local written language. Virtually all applications require some action on the part of the user, often asking for input in one form or another. For example, an application can ask the user to input information in text form, such as name, home address, and so on. The user must then enter this information by typing it on the keyboard in the normal manner. This is done with relative ease in an English-based application but can become more complex when other language text is desired.

Motif uses Xlib functions to provide the basic support for obtaining input in a Text widget.

The Problems

Many languages are expressed by means of an alphabet made up of characters or letters. The letters are arranged in groups to form meaningful words. A keyboard suitable for the language normally contains all the letters of the alphabet, plus the standard numerals and punctuation marks. The problem arises when the keyboard does not have all the alphabet characters. This can happen when a German user is using an English-based keyboard and needs a German character such as "ß".

A far more involved example is the case of defining a keyboard to use for the ideographic languages. Because thousands of characters are needed to represent an ideographic language, no reasonable keyboard can be constructed with a single key for each character.

The Solution

Motif solves these input problems by using an input method, which is a layer of mapping between the keyboard keys (or combinations of keys) that the user types and the text data that is passed to the application. For example, a user with an English keyboard who needs the letter "Ø" must enter a combination of keystrokes (this varies among vendors but could be <Extend char> <O> </> as an example) rather than just one keystroke. This is very similar to using the <Shift> key to access uppercase letters.

An ideographic language's input method is often based on the language's phonetics, but there are also input methods based on a common graphics property of certain characters. The graphics method involves defining a key to map to a common graphic symbol that is the basis for multiple characters. The phonetic method is more commonly used. It requires a phonetic (alphabet-based) writing system. The number of phonetic signs or characters is few enough that a unique key is assigned to each phoneme. Characters are entered by pressing the appropriate phonetic keys. In several popular input methods, the user types a phonetic representation of a spoken word and the input method determines which characters are pronounced that way. If only one character meets this criterion, it is displayed. If more than one character meets the criterion, a list of all characters found is displayed and the user chooses the desired one. It is then passed to the application. See Section 11.4.1, "Internationalization and Text Input," for more information on input methods.
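The candidate-selection step of a phonetic input method can be sketched in a few lines of C. This is only an illustration of the idea described above, not the interface of any real input method: the dictionary, the DictEntry type, and the function lookup_candidates are all hypothetical, and the sample readings are a tiny hand-picked set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dictionary entry: a phonetic reading and one character
 * that is pronounced that way. */
typedef struct {
    const char *reading;
    const char *character;
} DictEntry;

/* A tiny sample dictionary, for illustration only. */
static const DictEntry demo_dict[] = {
    { "yama",  "山" },
    { "hashi", "橋" },
    { "hashi", "箸" },
    { "hashi", "端" }
};

/* Store up to `max` characters whose reading matches `reading` in `out`
 * and return the number stored.  A result of 1 can be committed at once;
 * a larger result means the user must choose from the list. */
size_t lookup_candidates(const char *reading, const char *out[], size_t max)
{
    size_t i, found = 0;
    size_t n = sizeof demo_dict / sizeof demo_dict[0];

    assert(reading != NULL && out != NULL);
    for (i = 0; i < n && found < max; i++) {
        if (strcmp(demo_dict[i].reading, reading) == 0)
            out[found++] = demo_dict[i].character;
    }
    return found;
}
```

With this sample data, the reading "yama" yields a single candidate that could be passed straight to the application, while "hashi" yields three candidates from which the user would choose.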

Displaying Output

Displaying the output produced by an application intended for international use also requires some consideration. To be displayed correctly, text must have the appropriate content, encoding, and fonts. For example, many languages, especially ideographic ones, require more than one font. Bitmaps and pixmaps must be localized as well, because an icon that is an appropriate or meaningful symbol in one country may be totally inappropriate or meaningless in another.

Locales and Localization

A locale is the language environment determined by the application at run time. XPG3 defines locale as a means of specifying three characteristics of a language environment that may be needed for localization: language, territory, and code set. Motif supports only one locale per application; that is, an application can set the locale only once, at start-up time.
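Locale names commonly carry these three characteristics in the form language[_territory][.codeset], for example fr_CA.ISO8859-1. The following sketch splits such a name into its parts. It assumes that naming convention, ignores any trailing modifier, and uses a hypothetical LocaleParts type and split_locale function that are not part of Motif or the C library.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical holder for the three parts of a locale name. */
typedef struct {
    char language[32];
    char territory[32];
    char codeset[32];
} LocaleParts;

/* Copy src[0..len) into dst, truncating if necessary, and terminate it. */
static void copy_part(char *dst, size_t dstsize, const char *src, size_t len)
{
    if (len >= dstsize)
        len = dstsize - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Split "language[_territory][.codeset]" into its parts.
 * Parts that are absent are returned as empty strings. */
void split_locale(const char *name, LocaleParts *out)
{
    const char *end, *dot, *us;

    assert(name != NULL && out != NULL);
    end = name + strlen(name);

    dot = strchr(name, '.');            /* start of the codeset, if any */
    if (dot == NULL)
        dot = end;

    us = memchr(name, '_', (size_t)(dot - name));   /* territory, if any */
    if (us == NULL)
        us = dot;

    copy_part(out->language, sizeof out->language, name, (size_t)(us - name));
    out->territory[0] = '\0';
    out->codeset[0] = '\0';
    if (us < dot)
        copy_part(out->territory, sizeof out->territory,
                  us + 1, (size_t)(dot - us - 1));
    if (dot < end)
        copy_part(out->codeset, sizeof out->codeset,
                  dot + 1, (size_t)(end - dot - 1));
}
```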

Motif uses the locale to help find:

  • Resource files

  • UID files

  • Bitmap files

  • Fonts used to display text and labels

  • Text input method

The ANSI C method of setting the locale in an application is to use the function setlocale. How setlocale obtains a language when the language is not explicitly referenced in the call to setlocale is system dependent. For example, on POSIX systems, the environment variable LANG is used. The locale name is also used to establish a path to the localized files of information. How this is actually accomplished is explained in Section 11.3, "Localizing Applications."
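As a minimal sketch of the ANSI C approach, the following function asks setlocale to adopt whatever locale the user's environment specifies and falls back to the "C" locale if that request fails. The function name init_locale is an assumption made for this example.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Adopt the locale named by the user's environment.  An empty string
 * tells setlocale to consult the environment (LANG on POSIX systems).
 * Returns the name of the locale that is in effect afterward. */
const char *init_locale(void)
{
    const char *name = setlocale(LC_ALL, "");

    if (name == NULL)                    /* requested locale not available */
        name = setlocale(LC_ALL, "C");   /* the "C" locale always exists */
    assert(name != NULL);
    return name;
}
```

An application would call a function like this once, at start-up, before creating any widgets, which matches the one-locale-per-application rule described earlier.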

Compound Strings, Fonts, and Text Display

A compound string is a means of encoding text so that it can be displayed in many different languages or fonts without changing anything in the program. Motif uses compound strings to display all text except that in the Text and TextField widgets. This section describes the structure of a compound string and the interaction between a compound string and a font list that determines how the compound string is displayed.

Compound String Components

A compound string is a byte stream in ASN.1 encoding, consisting of tag-length-value segments. Semantically, a compound string has components that contain the text to be displayed, a tag (called a font list element tag) that will be matched with an element of a font list, and an indicator denoting the direction in which it is to be displayed.
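To illustrate the tag-length-value idea, the sketch below walks a byte stream in which every component is a one-byte tag, a one-byte length, and then that many bytes of value. This layout and the tag numbers are simplifying assumptions made for the example; they are not Motif's actual encoding, and applications should always use the XmString functions rather than inspect the bytes themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical component tags; these are not Motif's real values. */
enum { TAG_FONT_TAG = 1, TAG_DIRECTION = 2, TAG_TEXT = 3, TAG_SEPARATOR = 4 };

/* Count the components in buf[0..len) whose tag equals `wanted`.
 * Returns -1 if a header or value runs past the end of the buffer. */
int count_components(const unsigned char *buf, size_t len, int wanted)
{
    size_t pos = 0;
    int count = 0;

    assert(buf != NULL || len == 0);
    while (pos < len) {
        unsigned char tag, vlen;

        if (len - pos < 2)
            return -1;                  /* truncated tag/length header */
        tag = buf[pos];
        vlen = buf[pos + 1];
        pos += 2;
        if (vlen > len - pos)
            return -1;                  /* truncated value */
        if (tag == wanted)
            count++;
        pos += vlen;                    /* skip over the value */
    }
    return count;
}
```

Note that a separator fits naturally into this scheme as a component whose length is zero.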

A compound string component can be one of four types:

  • A font list element tag.

    • The font list element tag XmFONTLIST_DEFAULT_TAG indicates that the text is encoded in the codeset of the current locale.

    • Other font list element tags are used later to match text with particular entries in a font list.

  • A direction identifier.

  • The text of the string. For internationalized applications, the text falls into two broad categories: either the text requires localized treatment or it does not.

  • A separator.

The following list describes each of the compound string components:

Font list element tag  


The font list element tag is a string value that correlates the text component of a compound string to a font or a font set in a font list.

Direction 

The relationship between the order in which characters are entered on the keyboard and the order in which the characters are displayed on the screen. For example, the display order is left to right in English, French, German, and Italian and right to left in Hebrew and Arabic.

Text 

The text to be displayed.

Separator 

A separator is a special form of a compound string component that has no value. It is used to separate other segments.

Motif uses the specified font list element tag identified in the text component to display the compound string. A specified font list element tag is used until a new font list element tag is encountered. Motif provides a special font list element tag, XmFONTLIST_DEFAULT_TAG, that matches a font that is correct for the current codeset. It identifies the default entry in a font list. See Section 11.2.3, "Compound Strings and Font Lists," for more information.

The direction segment of a compound string specifies the direction in which the text is displayed. Direction can be left-to-right or right-to-left.

Compound Strings and Resources

Compound strings are used to display all text except that in the Text and TextField widgets. The compound string is set into the appropriate widget resource so that it can be displayed. For example, the label for the PushButton widget is inherited from the Label widget, and the resource is XmNlabelString, which is type XmString. This means that the resource expects a value that is a compound string. A compound string can be created programmatically or defined in a resource file.

Setting a Compound String Programmatically

An application can set this resource programmatically by creating the compound string using one of the compound string convenience functions. There are several such functions:

XmStringCreate  


This function creates a compound string with text and a font list element tag, both of which are arguments in the function call.

XmStringCreateLocalized  


This function creates a compound string in the encoding of the current locale and automatically sets the font list element tag to XmFONTLIST_DEFAULT_TAG.

The following code segment shows one way to set XmNlabelString for a PushButton programmatically:

Widget    button;
Arg       args[10];
int       n;
XmString  button_label;
     .
     .
/* locvar is a variable assumed to contain
 * locale-encoded text. XmStringCreateLocalized takes
 * only the text; the tag is always
 * XmFONTLIST_DEFAULT_TAG. */
button_label = XmStringCreateLocalized (locvar);
/* Create an argument list for the button */
n = 0;
XtSetArg (args[n], XmNlabelString, button_label); n++;
/* Create and manage the button */
button = XmCreatePushButton (toplevel, "button", args, n);
XtManageChild (button);
/* The widget keeps its own copy, so free the original */
XmStringFree (button_label);

Setting a Compound String in a Defaults File

In an internationalized program, the label string for the button should be obtained from an external source. For example, the button label can come from a resource file instead of the program. For this example, assume that the PushButton is a child of a Form widget called "form1".

*form1.button.labelString:  Push Here

Here, Motif's string-to-compound-string converter produces a compound string from the resource file text. This converter always uses XmFONTLIST_DEFAULT_TAG.

Compound Strings in UIL

Three basic mechanisms exist for specifying strings in UIL files:

  • String literals, which may be stored in UID files as either NULL-terminated strings or compound strings

  • Compound strings

  • Wide-character strings

Both string literals and compound strings consist of text, a character set, and a writing direction. For string literals and for compound strings with no explicit direction, UIL infers the writing direction from the character set. The UIL concatenation operator (&) concatenates both string literals and compound strings.

Whether UIL stores string literals in UID files as NULL-terminated strings or as compound strings, it stores information about each string's character set and writing direction along with the text. In general, UIL stores string literals or string expressions as compound strings in UID files under the following conditions:

  • When a string expression consists of two or more literals with different character sets or writing directions

  • When the literal or expression is used as a value that has a compound string data type (such as the value of a resource whose data type is compound string)

UIL recognizes a number of keywords specifying character sets. With each character set it recognizes, UIL associates parsing rules, including parsing direction and whether characters have 8 or 16 bits. It is also possible to define a character set using the UIL character_set function.

The syntax of a string literal is one of the following:

    '[character_string]'
    [#char_set]"[character_string]"

For each syntax, the character set of the string is determined as follows:

  • For a string declared as 'string', the character set is the codeset component of the LANG environment variable if it is set in the UIL compilation environment, or the value of XmFALLBACK_CHARSET if LANG is not set or has no codeset. By default, the value of XmFALLBACK_CHARSET is ISO8859-1, but vendors may supply different values.

  • For a string declared as #char_set"string", the character set is char_set.

  • For a string declared as "string", the character set depends on whether or not the module has a character_set clause and on whether or not the UIL compiler's use_setlocale_flag is set:

    • If the module has a character_set clause, the character set is the one specified in that clause.

    • If the module has no character_set clause but the uil command was invoked with the -s option or the Uil function was invoked with the use_setlocale_flag set, UIL calls setlocale and parses the string in the current locale. The character set of the resulting string is XmFONTLIST_DEFAULT_TAG.

    • If the module has no character_set clause and the uil command was invoked without the -s option or the Uil function was invoked without the use_setlocale_flag, the character set is the codeset component of the LANG environment variable if it is set in the UIL compilation environment; if LANG is not set or has no codeset, the character set is the value of XmFALLBACK_CHARSET.

UIL always stores a string specified using the compound_string function as a compound string. This function takes as arguments a string expression and optional specifications of a character set, direction, and whether or not to append a separator to the string. If no character set or direction is specified, UIL derives it from the string expression, as described above.

Note that certain predefined escape sequences, beginning with a backslash, may appear in string literals, with these exceptions:

  • A string in single quotes can span multiple lines, with each newline escaped by a backslash. A string in double quotes cannot span multiple lines.

  • Escape sequences are processed literally inside a string that is parsed in the current locale (a localized string).

For more information on UIL string and compound string syntax, see the UIL(5X) reference page in the OSF/Motif Programmer's Reference.

Fonts, Font Lists, and Font Sets

Motif uses font sets and font lists to display text. A font defines a set of glyphs that represent the characters in a given language. A font set is a group of fonts that are needed to display text for a given locale. A font list is a list of fonts, font sets, or a combination of the two, that may be used. Motif has convenience functions to create a font list.

Font List Structure

Motif requires a font list for text display. A font list is a list of font structures, font sets, or both, each of which has a tag to identify it. A font set ensures that all characters in the current language can be displayed. With font structures, the responsibility for ensuring that all characters can be displayed rests with the programmer.

Each entry in a font list is in the form of a {tag, element} pair, where element can be either a single font or a font set. The application can create a font list entry from either a single font or a font set. For example, the following code segment creates a font list entry for a font set:

/* Base font name; the locale supplies the character set(s) */
char font1[] =
   "-adobe-courier-medium-r-normal--10-100-75-75-M-60";
XmFontListEntry font_list_entry;
font_list_entry = XmFontListEntryLoad (display,
                     font1, XmFONT_IS_FONTSET, "font_tag");

XmFontListEntryLoad loads a font or creates and loads a font set. There are four arguments to the function:

display 

The display on which the font list is to be used

font_name 

A string that represents either a font name or a base font name list, depending on the type argument

type 

A value that specifies whether font_name refers to a font name or a base font name list

tag 

A string that represents the tag for this font list entry

If type is XmFONT_IS_FONTSET, XmFontListEntryLoad creates a font set in the current locale from the value in font_name. The character set(s) of the fonts specified in the font set are dependent on the locale. If type is XmFONT_IS_FONT, XmFontListEntryLoad opens the font found in font_name. In either case, the font or font set is placed into a font list entry.

Now, the following code creates a font list, using the font list entry just created:

XmFontList font_list;
XmFontListEntry font_list_entry;
   .
   .
font_list = XmFontListAppendEntry (NULL, font_list_entry);
XmFontListEntryFree (font_list_entry);

The code example above creates a new font list and appends the entry font_list_entry to it.

Once a font list has been created, XmFontListAppendEntry adds a new entry to it. The following example uses XmFontListEntryCreate to create a new font list entry for an existing font list:

XFontSet font2;
char *font_tag;
XmFontListEntry font_list_entry2;
   .
   .
font_list_entry2 = XmFontListEntryCreate (font_tag,
                      XmFONT_IS_FONTSET,
                      (XtPointer) font2);

font2 specifies an XFontSet returned by XCreateFontSet. The arguments to XmFontListEntryCreate are font_tag, XmFONT_IS_FONTSET, and font2, which are the tag, type, and font, respectively. The tag and the font set are the {tag, element} pair of the font list entry.

Now, to add this entry to the font list, use XmFontListAppendEntry again, only this time its first parameter specifies the existing font list:

font_list = XmFontListAppendEntry(font_list, font_list_entry2);
XmFontListEntryFree(font_list_entry2);

Font Lists and Resources

The syntax for specifying a font list in a resource file depends on whether the list contains fonts, font sets, or both.

  • To obtain a font, specify a font and an optional font list element tag. If the tag is present, it should be preceded by an equal sign (=). If the tag is not present, do not use the equal sign. Entries specifying more than one font are separated by commas.

  • To obtain a font set, specify a base font list and an optional font list element tag. The tag should be preceded by a colon (:) instead of an equal sign. If the tag is not present, the colon must still be present, because this is what distinguishes a font from a font set in the resource declaration. Fonts specified in the base font list are separated by semicolons (;). Entries specifying more than one font set are separated by commas.

If the font list element tag is not present in either case, Motif uses the default XmFONTLIST_DEFAULT_TAG. Here are some examples:

  • Specifying a font:

    • Using the default font list element tag:

      *fontList:  fixed
      *fontList:\
         -adobe-courier-medium-r-normal--10-100-75-75-M-60-iso8859-1
      

    • Specifying a font list element tag:

      *fontList:  fixed=ROMAN, 8x13bold=BOLD
      

    • Specifying two fonts, one with the default font list element tag and one with an explicit tag:

      *fontList:  fixed, 8x13bold=BOLD
      

  • Specifying a font set:

    • List the fonts explicitly without specifying a font list element tag:

      *fontList:\
        -JIS-Fixed-Medium-R-Normal--26-180-100-100-C-240;\
        -JIS-Fixed-Medium-R-Normal--26-180-100-100-C-120;\
        -GB-Fixed-Medium-R-Normal--26-180-100-100-C-240;\
        -Adobe-Courier-Bold-R-Normal--25-180-100-100-M-150:
      

    • Let Xlib select the fonts without specifying a font list element tag:

      *fontList:  -*-*-*-R-Normal--*-180-100-100-*-*:
      

    • List the fonts explicitly and specify a font list element tag as MY_TAG:

      *fontList:\
        -JIS-Fixed-Medium-R-Normal--26-180-100-100-C-240;\
        -JIS-Fixed-Medium-R-Normal--26-180-100-100-C-120;\
        -GB-Fixed-Medium-R-Normal--26-180-100-100-C-240;\
        -Adobe-Courier-Bold-R-Normal--25-180-100-100-M-150:MY_TAG
      

    • Let Xlib select the fonts and specify a font list element tag as MY_TAG:

      *fontList:  -*-*-*-R-Normal--*-180-100-100-*-*:MY_TAG
      

    • List the fonts explicitly and specify a font list element tag for bold fonts, but use the default font list element tag for medium fonts:

      *fontList:\
        -JIS-Fixed-Medium-R-Normal--26-180-100-100-C-240;\
        -JIS-Fixed-Medium-R-Normal--26-180-100-100-C-120;\
        -GB-Fixed-Medium-R-Normal--26-180-100-100-C-240;\
        -Adobe-Courier-Bold-R-Normal--25-180-100-100-M-150:,\
        -JIS-Fixed-Medium-R-Normal--26-180-100-100-C-240;\
        -JIS-Fixed-Medium-R-Normal--26-180-100-100-C-120;\
        -GB-Fixed-Medium-R-Normal--26-180-100-100-C-240;\
        -Adobe-Courier-Bold-R-Normal--25-180-100-100-M-150:BOLD
    

    • Let Xlib select the fonts and specify a font list element tag for bold fonts and use the default font list element tag for the others:

      *fontList:  -*-*-*-R-Normal--*-180-100-100-*-*:,\
                  -*-*-Bold-R-Normal--*-180-100-100-*-*:BOLD
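The resource syntax rules above can be sketched as a small parser for a single font list entry, that is, one comma-separated piece of a fontList value. This is an illustration only and does not reproduce Motif's actual resource converter. The FontEntrySpec type, the parse_entry function, and the DEFAULT_TAG placeholder (standing in for XmFONTLIST_DEFAULT_TAG) are assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define DEFAULT_TAG "DEFAULT"   /* placeholder for XmFONTLIST_DEFAULT_TAG */

typedef enum { ENTRY_FONT, ENTRY_FONTSET } EntryKind;

/* Hypothetical parsed form of one font list entry. */
typedef struct {
    EntryKind kind;
    char name[256];     /* font name or base font name list */
    char tag[64];       /* font list element tag */
} FontEntrySpec;

/* Copy src[0..len) into dst without leading or trailing white space. */
static void copy_trimmed(char *dst, size_t dstsize, const char *src, size_t len)
{
    while (len > 0 && isspace((unsigned char)src[0])) { src++; len--; }
    while (len > 0 && isspace((unsigned char)src[len - 1])) len--;
    if (len >= dstsize)
        len = dstsize - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Classify one entry: a colon marks a font set, otherwise it is a font
 * whose optional tag follows an equal sign. */
void parse_entry(const char *entry, FontEntrySpec *out)
{
    const char *sep;
    size_t len;

    assert(entry != NULL && out != NULL);
    len = strlen(entry);

    sep = strchr(entry, ':');
    if (sep != NULL) {
        out->kind = ENTRY_FONTSET;
    } else {
        out->kind = ENTRY_FONT;
        sep = strchr(entry, '=');
    }

    if (sep != NULL) {
        size_t name_len = (size_t)(sep - entry);
        copy_trimmed(out->name, sizeof out->name, entry, name_len);
        copy_trimmed(out->tag, sizeof out->tag, sep + 1, len - name_len - 1);
    } else {
        copy_trimmed(out->name, sizeof out->name, entry, len);
        out->tag[0] = '\0';
    }
    if (out->tag[0] == '\0')            /* no tag given: use the default */
        strcpy(out->tag, DEFAULT_TAG);
}
```

Given the entry "8x13bold=BOLD", this sketch reports a font named 8x13bold with the tag BOLD. Given an entry that ends in a bare colon, it reports a font set with the default tag.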
      

Font List Resource Defaults

A font list resource exists for a number of different widgets. Motif uses a hierarchy system to determine the font list it should use. There are several font list resources for VendorShell, XmBulletinBoard, and XmMenuShell. These resources can be set, either programmatically or in resource files. VendorShell and XmMenuShell have some common font list resources but one of them, XmNdefaultFontList, exists only for compatibility with earlier Motif releases. The widgets that have a font list resource (or resources) are listed in the following table. Note that in some cases the resource is not named XmNfontList.

Table 11-2. Widgets With Font List Resources

Widget              Resource Name

VendorShell         XmNbuttonFontList
VendorShell         XmNdefaultFontList
VendorShell         XmNlabelFontList
VendorShell         XmNtextFontList
XmBulletinBoard     XmNbuttonFontList
XmBulletinBoard     XmNlabelFontList
XmBulletinBoard     XmNtextFontList
XmLabel             XmNfontList
XmLabelGadget       XmNfontList
XmList              XmNfontList
XmMenuShell         XmNbuttonFontList
XmMenuShell         XmNdefaultFontList
XmMenuShell         XmNlabelFontList
XmScale             XmNfontList
XmText              XmNfontList
XmTextField         XmNfontList

The three resources XmNbuttonFontList, XmNlabelFontList, and XmNtextFontList are used to specify a font list for descendants of a type associated with the resource. For example, XmNbuttonFontList specifies the font list used for button descendants of VendorShell, XmBulletinBoard, and XmMenuShell. If a button's XmNfontList is NULL at initialization, the font list for the button is set by searching the parent hierarchy of the button widget or gadget for an ancestor that is a subclass of VendorShell, XmBulletinBoard, or XmMenuShell. If such an ancestor is found, the button's font list is set to the value of XmNbuttonFontList in the ancestor widget. If no such ancestor is found, the result is implementation dependent.
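The ancestor search for a button's default font list can be sketched as follows. The SketchWidget structure is a hypothetical stand-in for a real widget record, the font lists are represented as plain strings, and the sketch assumes that the nearest qualifying ancestor is the one that is used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical widget classification for this sketch. */
typedef enum {
    CLASS_OTHER,
    CLASS_VENDOR_SHELL,
    CLASS_BULLETIN_BOARD,
    CLASS_MENU_SHELL
} SketchClass;

/* Hypothetical stand-in for a widget record. */
typedef struct SketchWidget {
    struct SketchWidget *parent;
    SketchClass kind;
    const char *button_font_list;   /* stands in for XmNbuttonFontList */
    const char *font_list;          /* stands in for XmNfontList; NULL if unset */
} SketchWidget;

/* Return the font list a button would use: its own if one is set,
 * otherwise the button font list of the nearest ancestor that is a
 * VendorShell, BulletinBoard, or MenuShell.  Returns NULL when no such
 * ancestor exists (the real result is implementation dependent). */
const char *resolve_button_font_list(const SketchWidget *button)
{
    const SketchWidget *w;

    assert(button != NULL);
    if (button->font_list != NULL)
        return button->font_list;
    for (w = button->parent; w != NULL; w = w->parent) {
        if (w->kind != CLASS_OTHER)
            return w->button_font_list;
    }
    return NULL;
}
```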

Font Lists in UIL

UIL has three functions for use in creating font lists: font, fontset, and font_table. The font and fontset functions create font list entries. The font_table function creates a font list from these font list entries.

The font function creates a font list entry containing a font specification. The argument is a string representing an XLFD font name. The fontset function creates a font list entry containing a font set specification. The argument is a comma-separated list of XLFD font names representing a base name font list.

Both font and fontset have optional character_set parameters that specify the font list element tag for the font list entry. In both cases, if no character_set parameter is specified, UIL determines the font list element tag as follows:

  • If the module contains no character_set declaration and if the uil command was invoked with the -s option or the Uil function was invoked with the use_setlocale_flag set, the font list element tag is XmFONTLIST_DEFAULT_TAG.

  • Otherwise, the font list element tag is the codeset component of the LANG environment variable if it is set in the UIL compilation environment, or the value of XmFALLBACK_CHARSET if LANG is not set or has no codeset.

The font_table function creates a font list from a comma-separated list of font list entries created by font or fontset. The resulting font list can be used as the value of a font list resource. If a single font list entry is supplied as the value for such a resource, UIL converts the entry to a font list.

Compound Strings and Font Lists

When Motif displays a compound string, it associates each segment with a font or font set by means of the font list element tag for that segment. The application must have loaded the desired font or font set, created a font list that contains that font or font set and its associated font list element tag, and created the compound string segment with the same tag.

Motif follows a set search procedure when it binds a compound string to a font list entry:

  1. Motif searches the font list for an exact match with the font list element tag specified in the compound string. If it finds a match, the compound string is bound to that font list entry.

  2. If the above does not provide a binding between the compound string and the font list, Motif binds the compound string to the first element in the font list, regardless of its font list element tag.
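The two-step search above can be sketched in C as follows. The BindFontEntry structure and the bind_segment function are hypothetical simplifications in which a font list is an array of {tag, font name} pairs. The backward-compatibility matching of XmFONTLIST_DEFAULT_TAG is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical font list entry: a tag paired with a font name. */
typedef struct {
    const char *tag;
    const char *font_name;
} BindFontEntry;

/* Return the entry to which a segment tagged `tag` is bound:
 * step 1, the entry whose tag matches exactly;
 * step 2, otherwise the first entry, whatever its tag.
 * Returns NULL only if the font list is empty. */
const BindFontEntry *bind_segment(const BindFontEntry *list, size_t n,
                                  const char *tag)
{
    size_t i;

    assert(tag != NULL);
    assert(list != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (strcmp(list[i].tag, tag) == 0)
            return &list[i];
    }
    return n > 0 ? &list[0] : NULL;
}
```

With a font list of fixed=ROMAN, 8x13bold=BOLD, a segment tagged BOLD binds to the second entry, while a segment tagged with an unknown tag such as ITALIC falls back to the first entry.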

For backward compatibility, if an exact match is not found, XmFONTLIST_DEFAULT_TAG in either a compound string or a font list matches the tag that would result from creating a compound string or font list entry with a tag of XmSTRING_DEFAULT_CHARSET.

The following figure shows the relationships between a compound string, a font set, and a font list when the font list element tag is set to something other than XmFONTLIST_DEFAULT_TAG.

Figure 11-2. Compound String and Explicit Tag


The following example shows how to use a tag called tagb.

XFontStruct     *font1;
XmFontListEntry     font_list_entry;
XmFontList font_list;
XmString  label_text;
char *tagb;  /* Font list element tag */
char *fontx; /* Initialize to XLFD or font alias */
char *button_label;   /* Contains button label text */
     .
     .
font1 = XLoadQueryFont (XtDisplay(toplevel), fontx);
font_list_entry = XmFontListEntryCreate (tagb, XmFONT_IS_FONT,
     (XtPointer)font1);
font_list = XmFontListAppendEntry (NULL, font_list_entry);
XmFontListEntryFree (font_list_entry);
label_text = XmStringCreate (button_label, tagb);

XLoadQueryFont loads the font and then XmFontListEntryCreate creates a font list entry. The application must create an entry and then append it to an existing font list or create a new font list, in either case using XmFontListAppendEntry. Because there is no font list in place, the previous code example has NULL for the font list argument. XmFontListAppendEntry creates a new font list called font_list with a single entry, font_list_entry. To add another entry to font_list, the application can follow the same procedure but supply a non-NULL font list argument.

The following figure shows the relationships between a compound string, a font set, and a font list when the font list element tag is set to XmFONTLIST_DEFAULT_TAG. In this case, the value field is locale text.

Figure 11-3. Compound String and XmFONTLIST_DEFAULT_TAG


Text and TextField Widgets and Font Lists

The Text and TextField widgets display text information. To do so, they must be able to select the correct font in which to display the information. The Text and TextField widgets follow a set search pattern to find the correct font:

  1. Search the font list for an entry that is a font set and has a font list element tag of XmFONTLIST_DEFAULT_TAG. If a match is found, use that font list entry. No further searching occurs.

  2. Search the font list for an entry that specifies a font set. Use the first one found.

  3. If no font set is found, use the first font in the font list.

A font set is preferred because it ensures that there are glyphs for every character in the locale.
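The three-step search used by the Text and TextField widgets can be sketched as follows. As before, this is an illustration under simplifying assumptions: TextFontEntry is a hypothetical structure, and DEFAULT_TAG is a placeholder standing in for XmFONTLIST_DEFAULT_TAG.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEFAULT_TAG "DEFAULT"   /* placeholder for XmFONTLIST_DEFAULT_TAG */

/* Hypothetical font list entry for this sketch. */
typedef struct {
    const char *tag;
    int is_font_set;    /* nonzero for a font set, zero for a single font */
} TextFontEntry;

/* Choose the entry a Text widget would use from a font list of n entries.
 * Returns NULL only if the font list is empty. */
const TextFontEntry *choose_text_entry(const TextFontEntry *list, size_t n)
{
    size_t i;

    assert(list != NULL || n == 0);

    /* Step 1: a font set that carries the default tag. */
    for (i = 0; i < n; i++) {
        if (list[i].is_font_set && strcmp(list[i].tag, DEFAULT_TAG) == 0)
            return &list[i];
    }
    /* Step 2: the first font set, whatever its tag. */
    for (i = 0; i < n; i++) {
        if (list[i].is_font_set)
            return &list[i];
    }
    /* Step 3: no font set at all, so the first font. */
    return n > 0 ? &list[0] : NULL;
}
```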

Localizing Applications

An internationalized application can be tailored to operate in many areas of the world, each with its own requirements for the language and customs to be used. This section explains some methods for localizing an application.

The following section describes how the user, the application developer, and the implementation combine to establish the language environment of the application. It then discusses two general approaches to localizing applications. Succeeding sections focus on four aspects of localizing information in Motif programs:

  • Resource files

  • UID files

  • Message catalogs

  • X bitmap files

Many aspects of localization depend on the particular operating system, Motif implementation, and user environment in which the application runs. The following must all cooperate for correct localization to occur:

  • The operating system's locale mechanism, if any

  • The Motif implementation

  • The application itself

  • The user's system administrator

  • The user's language environment

Techniques for Localization

Although there are different methods for localizing an application, there are some common considerations:

  • The application should not hard-code any language-dependent information. This includes strings, fonts, and language-dependent pixmaps.

  • The application should isolate text, fonts, and pixmaps, and translate them into the languages needed. Usually this information is stored in separate directories by language.

Establishing the Language Environment

The term language environment refers to the set of localized data that the application needs in order to run correctly in the user-specified locale. A language environment supplies the rules associated with a specific language. In addition, the language environment consists of any externally stored data, such as localized strings or text used by the application. For example, the menu items displayed by an application might be stored in separate files for each language supported by the application. This type of data can be stored in resource files, UID files, or, on XPG3-compliant systems, message catalogs.

A single language environment is established when an application executes. The actual language environment in which an application operates is specified by the application user, often either by setting an environment variable (LANG on POSIX-based systems) or by setting the xnlLanguage resource. The application then sets the language environment based on the user's specification. The application can do this either by using setlocale in a language procedure established by XtSetLanguageProc, or by using a method that does not call setlocale. In either case, Xt caches a per-display language string that is used by XtResolvePathname to find resource, bitmap, and UID files.

An application that supplies a language procedure may either provide its own or use an Xt default procedure. In either case, the application establishes the language procedure by calling XtSetLanguageProc before calling XtAppInitialize. When a language procedure is installed, Xt calls it in the process of constructing the initial resource database. Xt uses the value returned by the language procedure as its per-display language string.

The default language procedure performs the following tasks:

  • Sets the locale. On ANSI C-based systems, this is done by using the following code:

    setlocale(LC_ALL, language);
    

    where language is the value of xnlLanguage or the empty string ("") if xnlLanguage is not set. When xnlLanguage is not set, the locale is generally derived from an environment variable (LANG on POSIX-based systems).

  • Calls XSupportsLocale to verify that the locale just set is supported. If not, a warning message is issued and the locale is set to "C".

  • Calls XSetLocaleModifiers specifying the empty string.

  • Returns the value of the current locale. On ANSI C-based systems, this is the result of calling:

    setlocale(LC_ALL, NULL);
    

The application can use the default language procedure by making the call to XtSetLanguageProc in this manner:

XtSetLanguageProc(NULL, NULL, NULL);
   .
   .
toplevel = XtAppInitialize(...);

By default, Xt does not install any language procedure. If the application does not call XtSetLanguageProc, Xt uses as its per-display language string the value of the xnlLanguage resource if it is set. If xnlLanguage is not set, Xt derives the language string from the environment. On POSIX-based systems, this is the value of the LANG environment variable.

It is important to note that the per-display language string that results from this process is implementation dependent and that Xt provides no public means of examining the language string once it is established. The following vary by operating system and by Motif implementation:

  • The mechanism, if any, used to set the locale

  • On ANSI C-based systems, the value returned by setlocale

  • The possible values of any environment variables used to establish the language environment

  • Whether or not xnlLanguage is used and, if so, its possible values

Furthermore, by supplying its own language procedure, an application may use any procedure it wants for setting the language string.
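
As a rough illustration, the locale-setting core of such a procedure might look like the following sketch, written in standard C only. The name pick_language is hypothetical. A real language procedure would have the XtLanguageProc signature (display, language string, client data) and would normally also call XSupportsLocale and XSetLocaleModifiers, as the default procedure does; those calls are left out here so that the sketch compiles without the X headers.

```c
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: set the locale from the xnlLanguage value and
 * return the name of the locale now in effect.
 */
const char *pick_language(const char *xnl_language)
{
    /* An unset xnlLanguage is treated as "", which lets setlocale
     * derive the locale from the environment (LANG on POSIX systems). */
    const char *requested = (xnl_language != NULL) ? xnl_language : "";

    if (setlocale(LC_ALL, requested) == NULL) {
        /* The requested locale is not available; fall back to "C". */
        setlocale(LC_ALL, "C");
    }

    /* Query the current locale without changing it. */
    return setlocale(LC_ALL, NULL);
}
```

The string returned by setlocale is implementation dependent, which is one reason the resulting per-display language string varies between systems.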

Using Locales

The locale provides local information to an application based on the user's language, territory, and codeset. Both language and territory are needed because some languages are spoken in more than one country and more than one language may be spoken in some countries (Belgium, Canada, and Switzerland are examples).

Information in resource, UID, and image files can be localized and stored in separate directories by language. The Xt function XtResolvePathname uses the run-time locale to determine the proper directory to use.

On XPG3-compliant systems, an application can use message catalogs to localize text and messages. A message catalog file exists for each language, and each is usually stored in a separate directory by language.

The locale method of localizing compound strings and font lists consists of the following steps:

  1. Establish a language procedure before calling XtAppInitialize. The language procedure calls setlocale.

  2. Localize the compound strings and font lists using resource files, message catalogs, or UID files. Normally, do not specify any font list element tags other than XmFONTLIST_DEFAULT_TAG.

  3. Use font sets in resource or UID file font lists.

  4. Use XmStringCreateLocalized to create compound strings in the program. This function has only one argument, a text string, and automatically sets the font list element tag to XmFONTLIST_DEFAULT_TAG.

The run-time locale determines which fonts are used to display text. This is accomplished in the following manner:

  • Motif calls XtResolvePathname to load resource or UID files that specify the names of fonts for font sets. XtResolvePathname uses a file search path that may vary depending on the display's language string.

  • XCreateFontSet uses the locale to determine the fonts to be used from the base font name and the locale charset.

In this method, the application usually does not specify font list element tags other than XmFONTLIST_DEFAULT_TAG. It is possible to supply explicit font list element tags with locale-dependent text. For example, text might be displayed using large and small fonts or bold and italic fonts. The application can do this with special tags in both the compound string and the font list associated with it. In the font list, match the tag with a font set specification that supplies the desired attribute (point size, for example). When the application creates the font set, the charset comes from the locale. For example, a resource file might specify a font list in the following manner to obtain fonts with a different point size:

*fontList:  -*-*-*-R-Normal--*-120-100-100-*-*:,\
            -*-*-*-R-Normal--*-180-100-100-*-*:BIG,\
            -*-*-*-R-Normal--*-80-100-100-*-*:SMALL

In this case, the application should also map the tags to XmFONTLIST_DEFAULT_TAG in the Motif registry of font list element tags. See Section 11.4.2, "Compound Strings and Compound Text," for more information.

Localization Without Locales

In this method, the locale is not set in the program, and a language procedure is not needed. Instead, the user specifies the language environment using either xnlLanguage or an environment variable such as LANG. Resource, UID, and image files are localized and stored in separate directories by language, as they are when the application uses locales. XtResolvePathname uses the display's language string in the same way to determine the proper locations of these files. Message catalogs are not used in this method. Also, in this case Text and TextField cannot accommodate 16-bit data. The nonlocale method of localizing compound strings and font lists consists of these steps:

  1. Localize compound strings using UIL files. Note that resource files cannot be used for compound strings because the string-to-compound-string converter always uses the font list element tag XmFONTLIST_DEFAULT_TAG. Localized font lists can appear in resource files.

  2. Specify explicit font list element tags other than XmFONTLIST_DEFAULT_TAG in both compound strings and font lists.

  3. Use font names with explicit charset components in resource or UIL files. Do not use font sets.

  4. To create compound strings in the program, use XmStringCreate with the font list element tag set to something other than XmFONTLIST_DEFAULT_TAG.

Resources and Localization

The resources used in an application that are subject to internationalization are stored in files external to the application. These resources include

  • All labels, particularly those that identify controls. Such labels are defined as type XmString, meaning they are compound strings.

  • Text strings; that is, strings of text that are not compound strings.

  • Font lists.

Initial Resource Database

The information in the external resource files is used when Xt builds the initial resource database. The XtDisplayInitialize function loads the resource database by merging in resources from the following sources, in order of precedence (that is, each component takes precedence over the following components):

  • The application command line

  • A per-host user environment resource file on the local host

  • Screen-specific resources for the default screen of the display

  • A resource property on the server or user preference resource file on the local host

  • An application-specific user resource file on the local host

  • An application-specific class resource file on the local host

Localization applies to two components of the initial resource database—the application-specific user and class resources. Localized resources that are controlled by the programmer are in the application class resource file, and localized resources that are controlled by the user are in the user resource file. Note that the user resources take precedence over the application class resources.

Resource File Locations

XtDisplayInitialize calls XtResolvePathname to load both the user and the class resources.

To load the user's application resource file, XtDisplayInitialize uses the value of the XUSERFILESEARCHPATH environment variable as the search path. If that variable is not set or if the search path fails to find the file, and if the environment variable XAPPLRESDIR is defined, XtDisplayInitialize next tries an implementation-dependent search path with a number of entries that include XAPPLRESDIR and the user's home directory. If XAPPLRESDIR is not set or if that search path fails, XtDisplayInitialize tries another implementation-dependent search path with a number of entries that include the user's home directory.

To load the application-specific class resource file, XtDisplayInitialize uses the value of the XFILESEARCHPATH environment variable as the search path. If that variable is not set or if the search path fails to find the file, XtDisplayInitialize tries an implementation-dependent search path.

The search paths for both resource files may contain any substitutions recognized by XtResolvePathname. That routine substitutes the display's language string for %L. In an implementation-dependent manner, it substitutes the language, territory, and codeset components of the language string for %l, %t, and %c, respectively. This mechanism allows Xt to load different resource files for different languages, as specified by the display's language string.
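
The substitution can be pictured with a small stand-alone sketch. It is not XtResolvePathname itself: the functions split_language and expand_path are hypothetical, and the sketch assumes that the language string has the form language_territory.codeset, whereas the real decomposition is implementation dependent. Other substitution fields are copied through unchanged.

```c
#include <stddef.h>
#include <string.h>

/* Components of a language string (layout is an assumption). */
typedef struct {
    char language[32];
    char territory[32];
    char codeset[32];
} LangParts;

/* Copy len bytes of src into dst, truncating to fit, and terminate. */
static void copy_span(char *dst, size_t dstsize, const char *src, size_t len)
{
    if (len >= dstsize)
        len = dstsize - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Split "ll_TT.codeset"; missing components become empty strings. */
void split_language(const char *lang, LangParts *p)
{
    const char *us  = strchr(lang, '_');
    const char *dot = strchr(lang, '.');
    const char *end = lang + strlen(lang);
    const char *lang_end;

    /* An underscore after the dot belongs to the codeset. */
    if (us != NULL && dot != NULL && us > dot)
        us = NULL;

    lang_end = us ? us : (dot ? dot : end);
    copy_span(p->language, sizeof p->language, lang,
              (size_t)(lang_end - lang));
    p->territory[0] = '\0';
    p->codeset[0] = '\0';
    if (us)
        copy_span(p->territory, sizeof p->territory, us + 1,
                  (size_t)((dot ? dot : end) - (us + 1)));
    if (dot)
        copy_span(p->codeset, sizeof p->codeset, dot + 1,
                  (size_t)(end - (dot + 1)));
}

/* Expand %L, %l, %t, and %c in tmpl into out.
 * Returns 0 on success or -1 if out is too small. */
int expand_path(const char *tmpl, const char *lang, char *out, size_t outsize)
{
    LangParts p;
    size_t used = 0;

    if (outsize == 0)
        return -1;
    split_language(lang, &p);

    for (; *tmpl != '\0'; tmpl++) {
        char literal[3] = { 0, 0, 0 };
        const char *piece = literal;
        size_t len;

        if (tmpl[0] == '%' && tmpl[1] != '\0') {
            tmpl++;
            switch (*tmpl) {
            case 'L': piece = lang;        break;
            case 'l': piece = p.language;  break;
            case 't': piece = p.territory; break;
            case 'c': piece = p.codeset;   break;
            default:  /* keep other fields, such as %N or %T, as they are */
                literal[0] = '%';
                literal[1] = *tmpl;
                break;
            }
        } else {
            literal[0] = *tmpl;
        }

        len = strlen(piece);
        if (used + len >= outsize)
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

For instance, expanding the hypothetical template /usr/lib/X11/%L/app-defaults/Demo with the language string de_DE.ISO8859-1 yields /usr/lib/X11/de_DE.ISO8859-1/app-defaults/Demo, so each language can have its own directory of class resource files.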

The display's language string is determined by the application's language procedure, if present, or else by the value of the xnlLanguage resource or by the environment. The language string associated with any particular language and the search paths used to find the resource files depend on the system vendor, the Motif vendor, the application, and the user's system administrator. Determining the actual directories in which localized resource files reside requires coordination among all these sources.

In general, an application developer prepares a set of localized application class resource files, one for each language the application supports. The developer may also need to supply a language procedure appropriate for one or more of the systems on which the application will run. The application vendor must arrange for the resource files to be installed in the correct directories, depending on the operating system and the Motif implementation on which the application will run.

An Example

Following is an example of an application class defaults file for a simple program that creates a MainWindow with a Text widget. The font list specification includes a single font set with a default tag. This resource file would be appropriate for an application that uses locales.

*fontList:                 -*-*-*-R-Normal--*-180-100-100-*-*:
*Text1.value:\
Hier ist etwas Text fur das Text Widget.\n\
Gemischter 8-und 16-bit Text.
*version_box.messageString:     Dies ist i18n Demo Version
*version_box.okLabelString:     Schliessen
*version_box.dialogTitle:       I18n Demo Version
*pgm_ver_btn.labelString:       I18n Demo Version 
*events_btn.labelString:        Aktionen
*help_btn_menu.labelString:     Hilfe
*help_btn_cascade.labelString:  Hilfe
*help_box.messageString:        Leider ist keine Hilfe hier.
*help_box.okLabelString:        Schliessen
*help_box.dialogTitle:          i18n Demo Hilfe
*stop_btn.labelString:          Enden

UIL and Localization

The general models for localizing applications using UIL are the same as those for applications that do not use UIL. An application developer creates separate UIL files, each containing string and resource values for a particular language. UIL files can also be used in conjunction with localized resource and pixmap files. As with localization of resource files, there are two basic approaches to localizing UIL files: one that uses locales and one that does not.

Preparing Localized UID Files

When using locales with UIL, an application developer should take the following steps:

  • Do not use a character_set declaration for the module.

  • When creating compound strings in a UIL file, use double quotes and no character set specification for the text.

  • When creating font lists in a UIL file, use font sets, not fonts. Do not specify character sets for the font sets.

  • Before compiling a UIL file using the uil command, set up any environment variables (such as LANG) or other mechanisms the system vendor recommends to establish the locale that is appropriate for the UIL file to be compiled. Invoke the uil command with the -s option. This enables the UIL compiler to set the locale and parse double quoted strings without explicit character sets in the locale's encoding. It also ensures that localized compound strings and font list entries are created with font list element tags of XmFONTLIST_DEFAULT_TAG.

  • Before using the Uil function to compile a UIL file, set the locale that is appropriate for the UIL file to be compiled. In the Uil_command_type structure that is the first argument to the Uil function, set the use_setlocale_flag member to 1. This has the same effect as invoking the uil command with the -s option.

When localizing UIL files without using locales, an application developer should take the following steps:

  • When using single quotes for the text of compound strings, supply a character_set declaration for the module.

  • When using double quotes for the text of compound strings, supply an explicit character set for each segment.

  • When creating font lists in a UIL file, use fonts, not font sets. Specify an explicit character set for each font.

  • When compiling a UIL file using the uil command, do not invoke the command with the -s option. The UIL compiler does not set the locale, and it parses each string using rules derived from the explicitly specified character set for that string.

  • When compiling a UIL file using the Uil function, set the use_setlocale_flag member of the Uil_command_type structure to 0. This has the same effect as invoking the uil command without the -s option.

The UIL compiler processes a single source file for each invocation of the uil command or the Uil function. However, UIL has an include file directive that is similar to the C preprocessor's #include directive. If the file argument for this directive is not an absolute pathname, the compiler searches for the file in a series of directories. These include the directory of the main UIL source file and any directories specified via the -I option to the uil command or the include_dir member of the Uil_command_type structure for the Uil function.

One strategy for maintaining localized UIL source files is to place only language-independent information in the main UIL source file and to put all language-dependent information in included files that are in separate directories for each language. Then a developer can compile the UIL files for different languages without editing any UIL files. When using locales, a developer first sets up the environment for the intended locale. Whether using locales or not, the developer then invokes the UIL compiler with the proper include directory for the intended language.

In general, a developer can mix localized UIL files with localized resource files. For example, the developer might specify compound strings in UIL files and font lists in resource files. Note one exception: it is not practical to use resource files to localize compound strings without using locales. This is because no resource file syntax exists for supplying an explicit font list element tag for a compound string.

For resource values that the user may override, the developer must use resource files or fallback resources, or must in some way ensure that the user's resource settings can override the developer's settings from the UIL file.

MRM and Localized UID Files

Once the developer has generated localized UID files, the vendor and the user's system administrator must arrange for these files to be installed in the appropriate directories for the system where the program is to run. As with resource files, these directories depend on configurations established by the operating system vendor, the Motif vendor, and the system administrator.

MrmOpenHierarchyPerDisplay takes as an argument a list of names of UID files. It calls XtResolvePathname to find each file in the list. If a filename is an absolute pathname, that pathname is the search path for XtResolvePathname. Otherwise, MrmOpenHierarchyPerDisplay constructs a search path in the following way:

  • If the environment variable UIDPATH is set, the value of that variable is the search path.

  • If UIDPATH is not set, but XAPPLRESDIR is set, MrmOpenHierarchyPerDisplay uses a default search path with entries that include $XAPPLRESDIR, the user's home directory, and vendor-dependent system directories.

  • If neither UIDPATH nor XAPPLRESDIR is set, MrmOpenHierarchyPerDisplay uses a default search path with entries that include the user's home directory and vendor-dependent system directories.

These paths may include the substitution field %U. In each call to XtResolvePathname, MrmOpenHierarchyPerDisplay substitutes the current filename from the list of UID files for %U. The paths may also include other substitution fields accepted by XtResolvePathname. In particular, XtResolvePathname substitutes the display's language string for %L, and it substitutes the components of the display's language string (in a vendor-dependent way) for %l, %t, and %c. If necessary, MrmOpenHierarchyPerDisplay searches the path twice, first with %S mapped to .uid and then with %S mapped to NULL. The substitution field %T is always mapped to uid.

The usual mechanism for employing localized UID files is to use a search path that contains one of the substitutions derived from the display's language string. As with resource files, the vendor and system administrator must ensure that the directories where the localized UID files reside match the display's language string (or the appropriate component of the language string).
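
The order in which the search path is chosen can be summarized in a short sketch. The enumeration and the function choose_search_path are hypothetical names, and the sketch assumes that a variable counts as set whenever getenv returns a non-NULL value. It only reports which rule applies; the actual default paths are vendor dependent and are not reproduced here.

```c
#include <stddef.h>

typedef enum {
    PATH_FROM_VARIABLE,            /* the variable's value is the search path */
    PATH_DEFAULT_WITH_XAPPLRESDIR, /* default path that includes $XAPPLRESDIR */
    PATH_DEFAULT_HOME_AND_SYSTEM   /* default path: home and system directories */
} SearchPathSource;

/*
 * path_var is the value of the path variable (UIDPATH for UID files),
 * xapplresdir is the value of XAPPLRESDIR; NULL means "not set".
 */
SearchPathSource choose_search_path(const char *path_var,
                                    const char *xapplresdir)
{
    if (path_var != NULL)
        return PATH_FROM_VARIABLE;
    if (xapplresdir != NULL)
        return PATH_DEFAULT_WITH_XAPPLRESDIR;
    return PATH_DEFAULT_HOME_AND_SYSTEM;
}
```

A caller would typically pass getenv("UIDPATH") and getenv("XAPPLRESDIR") as the two arguments.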

Message Catalogs and Localization

On an XPG3-compliant system, an application can use message catalogs to localize text. The format of message catalogs is implementation dependent, and the application must take steps to coordinate the locations of the message catalogs with the locations of resource, UID, and image files. Use of message catalogs requires the following steps:

  • Using an implementation-dependent method, prepare a separate message catalog containing text to be localized for each language.

  • Arrange to have the message catalogs installed in the appropriate directories on the systems on which the application will run.

  • Arrange for the user's environment to be set up correctly so that the application can read the message catalog appropriate to the language.

  • In the program, use the catopen function to open a message catalog and the catclose function to close it.

  • Use the catgets function to read text from an open message catalog.

  • If necessary, convert the text to the target format (such as a compound string) and, for resources, supply the text in the appropriate widget creation argument list or call to XtSetValues.
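
The catopen, catgets, and catclose steps above might be combined as in the following sketch. The helper name lookup_message and its parameters are hypothetical, and the NL_CAT_LOCALE flag assumes a system whose nl_types.h defines it. The text is copied into a caller-supplied buffer before the catalog is closed, on the cautious assumption that the pointer returned by catgets may not remain valid afterward.

```c
#include <nl_types.h>
#include <stdio.h>

/*
 * Hypothetical helper: fetch message msg_id of set set_id from the
 * named catalog into buf. If the catalog or the message cannot be
 * found, fallback is copied instead. bufsize must be at least 1.
 */
const char *lookup_message(const char *catalog, int set_id, int msg_id,
                           const char *fallback, char *buf, size_t bufsize)
{
    nl_catd cat = catopen(catalog, NL_CAT_LOCALE);
    const char *text = fallback;

    /* catgets itself returns the default string when lookup fails. */
    if (cat != (nl_catd)-1)
        text = catgets(cat, set_id, msg_id, fallback);

    /* Copy the text out before the catalog is closed. */
    snprintf(buf, bufsize, "%s", text);

    if (cat != (nl_catd)-1)
        catclose(cat);
    return buf;
}
```

The resulting C string can then be converted to a compound string, if the target resource requires one, and supplied in an argument list or a call to XtSetValues.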

The catopen function takes as an argument the name of the message catalog file. If this is an absolute pathname, catopen opens that file. Otherwise, catopen uses the value of the NLSPATH environment variable as a search path. This path can contain a number of substitution fields. The filename passed to catopen is substituted for %N. The value of the LANG environment variable is substituted for %L, and its language, territory, and codeset components are substituted for %l, %t, and %c, respectively.

Note that these values may not be the same as the display's language string or its components. An application and software vendor that use message catalogs must coordinate the locations of message catalogs with those of localized resource, UID, and image files, which usually depend on the display's language string. One possible strategy is to call catopen with an absolute pathname constructed by calling XtResolvePathname with the value of NLSPATH as the search path argument. XtResolvePathname substitutes the display's language string and its components for %L, %l, %t, and %c in $NLSPATH. In this way, the application can use a single mechanism, the display's language string, to distinguish file locations by language. The software vendor must still arrange for the user's system administrator to install the message catalogs in the correct locations and to ensure that NLSPATH is appropriately set in the user's environment.

Images, Pixmaps, and Localization

A pixmap is a screen image that is stored in memory so that it can be recalled and displayed when needed. Motif has a number of pixmap resources that allow the application to supply pixmaps for backgrounds, borders, shadows, label and button faces, drag icons, and other uses. As with text, some pixmaps may be specific to particular language environments; these pixmaps need to be localized.

Motif maintains caches of pixmaps and images. The function XmGetPixmapByDepth searches these caches for a requested pixmap. If the requested pixmap is not in the pixmap cache and a corresponding image is not in the image cache, XmGetPixmapByDepth searches for an X bitmap file whose name matches the requested image name. XmGetPixmapByDepth calls XtResolvePathname to search for the file. If the requested image name is an absolute pathname, that pathname is the search path for XtResolvePathname. Otherwise, XmGetPixmapByDepth constructs a search path in the following way:

  • If the environment variable XBMLANGPATH is set, the value of that variable is the search path.

  • If XBMLANGPATH is not set but XAPPLRESDIR is set, XmGetPixmapByDepth uses a default search path with entries that include $XAPPLRESDIR, the user's home directory, and vendor-dependent system directories.

  • If neither XBMLANGPATH nor XAPPLRESDIR is set, XmGetPixmapByDepth uses a default search path with entries that include the user's home directory and vendor-dependent system directories.

These paths may include the substitution field %B. In each call to XtResolvePathname, XmGetPixmapByDepth substitutes the requested image name for %B. The paths may also include other substitution fields accepted by XtResolvePathname. In particular, XtResolvePathname substitutes the display's language string for %L, and it substitutes the components of the display's language string (in a vendor-dependent way) for %l, %t, and %c. The substitution field %T is always mapped to bitmaps, and %S is always mapped to NULL.

As with resource and UID files, the usual mechanism for employing localized X bitmap files is to use a search path that contains one of the substitutions derived from the display's language string. As with resource and UID files, the vendor and system administrator must ensure that the directories where the localized X bitmap files reside match the display's language string (or the appropriate component of the language string).

See Chapter 12, "Color and Pixmaps," for more information on images and pixmaps.

Comparing Approaches to Localization

The locale approach allows an application to use existing internationalization routines. On the other hand, the application is limited in portability to systems that support the same internationalization standards (XPG3, POSIX, or ANSI). This approach is also only applicable to applications using a single language.

The nonlocale approach only addresses the aspect of isolating information from the application and ensuring that it uses the proper localized version of this information. The disadvantage is that there is more work for the programmer and there may be nonstandard functionality. The advantages are that there is guaranteed portability across all platforms that support Motif, and that it allows handling of multiple character sets for specialized applications that require this functionality.

Advanced Topics in Internationalization

This section covers some advanced topics dealing with internationalization.

Internationalization and Text Input

An application subject to internationalization presents some unique problems when it deals with text input. The application must be able to correctly interpret and process text input in any language. This section explains how an application accomplishes this.

Input Method

Although there are many different keyboards in use, sometimes certain characters in an alphabetic language are not directly available on any keyboard. In this case, the user must type a combination of keys to input the desired character. The number of characters in an ideographic language far exceeds the capability of any keyboard and makes it impossible to have a keyboard with all of the language's symbols. In this case, input is usually accomplished based on the language's phonetics. These cases illustrate the concept of an input method. An input method is simply the mechanism that is used to map between the keys typed by a user and the resulting characters that are input to the application. A common feature of many input methods is that the application user may type combinations of keys to create a single character. Creating characters from keystrokes is called pre-editing.

Input methods may require several areas to display the actual keystrokes.

  • The status area is an output-only window that identifies the style of input (phonetic, numeric, stroke and radial, and so on) and the current status of an input method interaction.

  • The pre-edit area displays the intermediate text for languages that are composed before the application acts on the data. There are several possible locations for the pre-edit area:

    Over-the-spot

    Displays the data in an input method window that is placed over the point of insertion.

    Off-the-spot

    Displays the pre-edit window inside the application window (usually at the bottom) but not at the point of insertion.

    Root-window

    Uses a pre-edit window that is a child of the root window.

    A VendorShell resource, XmNpreeditType, determines which style is used for a Text or TextField input method. The syntax, possible values, and default value of this resource are implementation dependent.

  • The auxiliary area is used for popup menus and customizing dialogs that some input methods use.

Input methods are supplied by vendors and are implementation dependent. The VendorShell resource XmNinputMethod is an implementation-dependent string that specifies the input method portion of the locale modifiers. If a value is supplied for this resource, Motif uses it to set the locale modifiers before opening an input method for Text or TextField.

The following figure shows one possible program window with a Text widget using over-the-spot interaction for Japanese text input. The status area indicates that phonetic input is in use and insert mode is enabled. The pre-edit area shows that the letter "H" has been entered. Since there is no Hiragana phonetic equivalent, the "H" appears in the pre-edit window.

Figure 11-4. Text Widget Pre-Edit and Status Areas Using Over-the-Spot

The following figure shows the same window after a "u" has been entered following the "H" shown in the previous figure.

Figure 11-5. Text Widget Pre-Edit Area After Next Character Entry

Here the pre-edit area is displaying the phonetic equivalent of the English letters "hu" in Hiragana.

Input Context

An input context is the mechanism used to provide the state information needed to manage the information flow between the application and the input method. It is a combination of an input method, a locale specifying the encoding of character strings to be returned, an application window, and internal state information. The following figure shows the relationships involved. The input method is determined by the locale specified by the application user.

Figure 11-6. Input Method and Input Contexts

Input Manager

The input method and input context described in the previous sections are transparent to the Motif programmer. Motif has an Input Manager that handles all necessary interface between an application and the input context and input method. The input manager functions are performed by VendorShell. A Motif application only needs to register a widget using XmImRegister. As mentioned earlier, you can select the input method user interaction style by setting the VendorShell resource XmNpreeditType. This is shown in the example program.

A widget using the input manager must use the function XmImMbLookupString to retrieve character input from the keyboard.

Input and the Motif Text Widget

The Motif Text and TextField widgets, when editable, provide a transparent connection to the locale-specific input method for text input. The application programmer specifies an appropriate font set in the Text or TextField XmNfontList resource and creates either widget as a descendant of VendorShell. VendorShell provides geometry management of the status and pre-edit areas. It also supplies a visual separator between the status area window and the application's top level window.

Setting the VendorShell resource XmNpreeditType dictates the location of the input method window. With an off-the-spot input method, the pre-edit and status area windows appear at the bottom of the application window.

Text Input Using a DrawingArea

An application that needs special text processing may use a DrawingArea for text input and output. For internationalized text input with any widget other than Text or TextField, the application must use the Xlib input method facilities. These allow the application to open an input method and input context and to obtain input from the input method. When using these facilities, an application may also need to handle input method geometry management, focus management, event filtering, and other issues. For more information, see Xlib—C Language X Interface.

Geometry Management of Pre-Edit and Status Areas

When an off-the-spot input method is used with the Text or TextField widget, the pre-edit and status areas are below the client's main window but inside the VendorShell. VendorShell performs the necessary geometry management. If the application uses either XtGetValues or XtSetValues to get or set the height (XmNheight) of VendorShell, the height includes the height of the input method area.

The following figure shows a Text widget using an off-the-spot input method. The distance "h" is the additional height that the input manager needs to display the status and pre-edit areas. Note that with an off-the-spot input method, the pre-edit area is at the bottom of the application window.

Figure 11-7. Text Widget Pre-Edit and Status Areas Using Off-the-Spot


Compound Strings and Compound Text

Compound text is the standard format for exchanging textual data between X Window System applications. It is needed when the user moves text displayed in one code set to another window whose text is in a different code set. For example, the following figure shows two windows, one titled "UJIS" and the other titled "Shift JIS."

Figure 11-8. Reason for Compound Text


Each window represents a Motif Text widget, one displaying Japanese UJIS characters and the other displaying Shift JIS characters. If the user cuts text from one window and pastes it into the other, compound text is used to pass the data between the two. The Motif Text widget does this automatically.

If one of the widgets in the previous figure is a Label widget instead of a Text widget, the situation is different. The Label widget holds its text data in compound string format, while the Text widget holds its data as a simple character string. To pass text data between a Text or TextField widget and any other widget, the application needs to convert the compound string to compound text.

Motif has two functions, XmCvtXmStringToCT and XmCvtCTToXmString, for converting between compound strings and compound text.

XmCvtXmStringToCT converts a compound string to compound text. The converter uses the font list tag associated with a given compound string segment to select a compound text format for that segment. A registry defines a mapping between font list tags and compound text encoding formats. The converter uses the following algorithm for each compound string segment:

  1. If the compound string segment tag is mapped to XmFONTLIST_DEFAULT_TAG in the registry, the converter passes the text of the compound string segment to XmbTextListToTextProperty with an encoding style of XCompoundTextStyle and uses the resulting compound text for that segment.

  2. If the compound string segment tag is mapped to an MIT registered charset in the registry, the converter creates the compound text for that segment using the charset (from the registry) and the text of the compound string segment as defined in the X Consortium Standard Compound Text Encoding.

  3. If the compound string segment tag is mapped to a charset in the registry that is neither XmFONTLIST_DEFAULT_TAG nor an MIT registered charset, the converter creates the compound text for that segment using the charset (from the registry) and the text of the compound string segment as an "extended segment" with a variable number of octets per character.

  4. If the compound string segment tag is not mapped in the registry, the result is implementation dependent.
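The selection logic in steps 1 through 4 can be sketched in plain C. This is an illustration only, not the Motif implementation: the registry contents, the partial list of registered charsets, and the names CtStrategy, lookup_encoding, is_registered_charset, and choose_strategy are all hypothetical, and DEFAULT_TAG is a stand-in for XmFONTLIST_DEFAULT_TAG so that the sketch compiles without the Motif headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for XmFONTLIST_DEFAULT_TAG (assumption: lets this compile without Motif). */
#define DEFAULT_TAG "FONTLIST_DEFAULT_TAG_STRING"

/* One value per step of the algorithm described in the text. */
typedef enum {
    CT_LOCALE_TEXT,      /* step 1: tag maps to the default tag           */
    CT_STANDARD_CHARSET, /* step 2: tag maps to a registered charset      */
    CT_EXTENDED_SEGMENT, /* step 3: tag maps to some other charset        */
    CT_UNSPECIFIED       /* step 4: tag not mapped, result is unspecified */
} CtStrategy;

typedef struct {
    const char *tag;      /* font list tag            */
    const char *encoding; /* compound text encoding   */
} RegistryEntry;

/* Hypothetical registry contents, for illustration only. */
static const RegistryEntry registry[] = {
    { DEFAULT_TAG, DEFAULT_TAG },
    { "BOLD",      DEFAULT_TAG },
    { "ISO8859-1", "ISO8859-1" },
    { "KANJI",     "JISX0208.1983-0" },
    { "VENDOR",    "ACME-PRIVATE-1" }  /* made-up private charset */
};

/* Partial, illustrative list of registered charset names. */
static const char *const registered_charsets[] = {
    "ISO8859-1", "ISO8859-2", "JISX0201.1976-0",
    "JISX0208.1983-0", "KSC5601.1987-0", "GB2312.1980-0"
};

/* Return the encoding mapped to tag, or NULL if the tag is not in the registry. */
static const char *lookup_encoding(const char *tag)
{
    size_t i;
    for (i = 0; i < sizeof registry / sizeof registry[0]; i++) {
        if (strcmp(registry[i].tag, tag) == 0)
            return registry[i].encoding;
    }
    return NULL;
}

/* Return nonzero if encoding appears in the illustrative registered list. */
static int is_registered_charset(const char *encoding)
{
    size_t i;
    for (i = 0; i < sizeof registered_charsets / sizeof registered_charsets[0]; i++) {
        if (strcmp(registered_charsets[i], encoding) == 0)
            return 1;
    }
    return 0;
}

/* Decide which of the four steps applies to a segment with the given tag. */
CtStrategy choose_strategy(const char *tag)
{
    const char *encoding;

    assert(tag != NULL);
    encoding = lookup_encoding(tag);
    if (encoding == NULL)
        return CT_UNSPECIFIED;
    if (strcmp(encoding, DEFAULT_TAG) == 0)
        return CT_LOCALE_TEXT;
    if (is_registered_charset(encoding))
        return CT_STANDARD_CHARSET;
    return CT_EXTENDED_SEGMENT;
}
```

With this hypothetical registry, choose_strategy("BOLD") selects CT_LOCALE_TEXT because "BOLD" is mapped to the default tag, while a tag that has no entry yields CT_UNSPECIFIED.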

An application can use XmRegisterSegmentEncoding to map a font list element tag to a compound text encoding format. For example, the application may be using a font list element tag of "BOLD" to identify a compound string segment consisting of localized text to be displayed in a bold font. To ensure that the segment is treated as localized text when it is converted to compound text, map the tag "BOLD" to XmFONTLIST_DEFAULT_TAG as follows:

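/* Map "BOLD" to the locale encoding; the previous mapping (or NULL) is returned. */
char *old_encoding = XmRegisterSegmentEncoding("BOLD",
                         XmFONTLIST_DEFAULT_TAG);
/* The caller owns the returned string; XtFree accepts NULL. */
XtFree(old_encoding);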

XmCvtCTToXmString converts compound text to a compound string. The behavior of this function is implementation dependent.

See Chapter 16, "Interclient Communication," for more information on transferring data between applications. The compound text format is described in the X Consortium Standard Compound Text Encoding.
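To make the extended segment form mentioned in step 3 of the conversion algorithm more concrete, the following plain C sketch frames one extended segment with a variable number of octets per character. It follows the layout given in the Compound Text Encoding standard: the octets ESC, 02/05, 02/15, 03/00, two length octets M and L, the encoding name, an STX octet, and then the text. The function name ct_frame_extended_segment is hypothetical, and this is not a description of how Motif itself builds segments; the standard remains the authority on the format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Largest length expressible in M and L: 127 * 128 + 127. */
#define CT_MAX_EXT_PAYLOAD 16383u

/*
 * Write one extended segment into out and return the number of octets
 * written. Return 0 if the payload is too long for the two length
 * octets or if out is too small. Nothing is truncated.
 */
size_t ct_frame_extended_segment(const char *encoding_name,
                                 const unsigned char *text, size_t text_len,
                                 unsigned char *out, size_t out_size)
{
    size_t name_len, payload_len, pos = 0;

    assert(encoding_name != NULL && out != NULL);
    assert(text != NULL || text_len == 0);

    name_len = strlen(encoding_name);
    if (name_len > CT_MAX_EXT_PAYLOAD || text_len > CT_MAX_EXT_PAYLOAD)
        return 0;

    /* The encoded length covers the name, the STX octet, and the text. */
    payload_len = name_len + 1 + text_len;
    if (payload_len > CT_MAX_EXT_PAYLOAD || payload_len + 6 > out_size)
        return 0;

    out[pos++] = 0x1B;  /* ESC   */
    out[pos++] = 0x25;  /* 02/05 */
    out[pos++] = 0x2F;  /* 02/15 */
    out[pos++] = 0x30;  /* 03/00: variable number of octets per character */

    /* M and L each carry 7 bits of the length, with the high bit set. */
    out[pos++] = (unsigned char)(0x80 + payload_len / 128);
    out[pos++] = (unsigned char)(0x80 + payload_len % 128);

    memcpy(out + pos, encoding_name, name_len);
    pos += name_len;
    out[pos++] = 0x02;  /* STX terminates the encoding name */

    if (text_len > 0)
        memcpy(out + pos, text, text_len);
    pos += text_len;

    return pos;
}
```

Because the length in M and L counts the encoding name, the STX octet, and the text together, one segment can carry at most 16383 such octets. The sketch returns 0 in that case; a fuller implementation would presumably split the text across several segments.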