Chapter 8. Designing for International Markets

This chapter provides basic guidelines for producing applications for international markets. Internationalization is the process of generalizing programs or systems so that they can handle a variety of languages, character sets, and national customs. Localization is the process of providing language-specific or country-specific information or support for programs.

In general, internationalization issues are handled by tools available to programmers on their system. For example, the ANSI C standard (ANSI X3.159-1989) and POSIX 1003.1 have defined internationalization in terms of locale. The locale can then be set as part of the user's environment, allowing the program to access locale-specific information, such as data formats, collating sequences, and system messages, from system-specific or application-specific databases. Use the internationalization tools available on your system to support internationalization in your application.
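As a rough illustration of the locale model, the following Python sketch selects a locale and reads its numeric conventions through the standard locale module, which wraps the C library's setlocale and localeconv calls. The helper name numeric_conventions is hypothetical. The sketch defaults to the portable "C" locale so that the result is predictable; an application would normally pass an empty string to adopt the locale from the user's environment.

```python
import locale


def numeric_conventions(locale_name="C"):
    """Select a locale and return its (decimal point, thousands separator).

    Passing "" instead of "C" adopts the locale set in the user's environment.
    """
    locale.setlocale(locale.LC_ALL, locale_name)
    conventions = locale.localeconv()
    return conventions["decimal_point"], conventions["thousands_sep"]


print(numeric_conventions())
```

The point of the design is that the program asks the system for the conventions instead of hard-coding them.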

Following are some of the issues that need to be addressed in an internationalized application. In most cases, these issues are addressed by the internationalization tools available on your system. They are provided here primarily to increase your awareness of issues that can affect your programming. In a few cases, you may need to adjust your program to allow for size and layout changes of data in different locales.

Internationalized Text Input

Ideally, text is input from a keyboard that can directly produce all the characters needed for that language. It is sometimes the case, however, that text input requires a pre-edit step, whereby text is typed into a pre-edit area using a set of characters, later converted to another set of characters, and passed to the application. An input method is used to convert keyboard input to an encoding suitable for a Text control.

Fortunately, application developers do not need to worry about the input method as long as the Text controls in the application support the display and input of text in the writing system supported by the underlying system. Furthermore, defining how keyboard actions convert into characters suitable for text input and the display is the responsibility of the underlying system.

Designers of Text controls should create Text controls that support display and input of text in any writing system supported by the underlying system. Text controls can also support input and display of multiple writing systems.

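System designers need to create input methods that address the following issues of internationalized text input:

  • Locating the pre-edit area

  • Displaying status

  • Converting pre-edit characters to final characters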

The following subsections describe these issues and provide some guidelines for addressing them, but there is currently too much variation in systems, and the field is too new for this guide to make many firm recommendations about these issues.

Locating the Pre-Edit Area

The trend over time is for pre-edit areas to move closer to the location of the final text. Ideally, pre-edit should occur in place in the Text control being edited (known as an on-the-spot, or in-place, input method). This is technically difficult to do without fully integrating the Text control with the input method. In the absence of an on-the-spot input method, systems should use an over-the-spot input method, which places the pre-edit area above but separate from the Text control. In the absence of on-the-spot or over-the-spot input methods, systems can create input methods where the pre-edit area is separate on the display from the Text control that it sends input to. These models are known as off-the-spot input methods. In these input methods, a single pre-edit area can apply to a single Text control, a group of Text controls, an entire application window, or the whole screen. An input method where a single pre-edit area is responsible for all the Text controls on a screen is known as a root window input method.

Systems should support in-place or per-window input methods. They can support per-control, per-group, or per-screen input methods.

When using an off-the-spot input method, converted text goes to the Text control with the input focus. The pre-edit area itself does not get the input focus. It only acts as an intermediary for the Text control. The contents of the pre-edit area prior to conversion can be maintained separately for each Text control; that is, when the focus moves from one Text control to another, the unconverted text in the pre-edit area can be saved for the first Text control, and any existing unconverted pre-edit text for the second Text control can be restored to the pre-edit area.
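The per-control bookkeeping described above can be sketched roughly as follows. The PreEditArea class and its method names are hypothetical and stand in for whatever the input method actually provides; the sketch only shows unconverted text being saved and restored as the focus moves.

```python
class PreEditArea:
    """Hypothetical shared pre-edit area that keeps unconverted text per control."""

    def __init__(self):
        self.saved = {}     # control name -> unconverted pre-edit text
        self.focus = None   # control that currently has the input focus
        self.text = ""      # text currently shown in the pre-edit area

    def move_focus(self, control):
        # Save the unconverted text for the control that is losing the focus.
        if self.focus is not None:
            self.saved[self.focus] = self.text
        # Restore whatever was pending for the control gaining the focus.
        self.focus = control
        self.text = self.saved.pop(control, "")

    def type_text(self, chars):
        self.text += chars


area = PreEditArea()
area.move_focus("name")
area.type_text("ka")
area.move_focus("address")   # "ka" is saved for the "name" control
area.move_focus("name")      # and restored when the focus returns
print(area.text)
```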

Displaying Status

After a block of text is input to the pre-edit area, the user performs some action, usually a key sequence, that causes the system to convert the pre-edit text into the final characters and pass them to the Text control. Using an in-place input method, it is important to show the user which text is pre-edit text and which text has already been converted; that is, the system should give the user some idea of the status of input. In an in-place input method, this status is usually provided by some visual effect such as a font or color difference.

Any input method, including an in-place input method, can include an additional off-the-spot status area that displays input method status information such as input and output text formats.

Converting Pre-Edit Characters to Final Characters

When the user requests that the pre-edit text be converted to the final text format, the system may still not have enough information to unambiguously convert the text. In this case, the next step of the conversion depends on the system, the pre-edit format, and the final text format. The conversion can fail, prompt for more pre-edit text, or present a list of possible choices and let the user pick one. An input method can present conversion choices to the user in a number of ways including the following:

  • Listing the choices in a DialogBox

  • Presenting the choices in an Option Menu

  • Presenting the choices in a Popup Menu

  • Allowing the user to cycle through choices using key sequences
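The possible outcomes of a conversion request can be sketched as follows. The candidate table is a toy, deliberately incomplete mapping invented for illustration, and the convert function and its status names are hypothetical; a real input method would consult a dictionary supplied by the system.

```python
# Toy, deliberately incomplete candidate table (an assumption for illustration).
CANDIDATES = {
    "ka": ["か", "カ"],   # hiragana and katakana forms of "ka"
    "kya": ["きゃ"],
}


def convert(pre_edit):
    """Return (status, payload) for a request to convert pre-edit text."""
    matches = CANDIDATES.get(pre_edit)
    if matches is None:
        # If the text is the start of something convertible, ask for more.
        if any(key.startswith(pre_edit) for key in CANDIDATES):
            return "need_more", None
        return "failed", None
    if len(matches) == 1:
        return "converted", matches[0]
    # Ambiguous: hand the list to a menu, dialog, or key-cycling interface.
    return "choose", matches


print(convert("ky"))
print(convert("ka"))
```

Returning the candidate list rather than displaying it keeps the conversion logic independent of how the choices are presented.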

Collating Sequences

To produce an alphanumeric list, printable characters are sorted according to a collating sequence. Printable characters include letters possibly with accents, numbers, punctuation characters, and other symbols such as an * (asterisk) or & (ampersand). The collating sequence defines the value and position of a character relative to the other characters.

Many applications make frequent use of collating sequences to produce alphanumeric lists. Examples of alphanumeric lists include the following:

  • A directory listing of filenames

  • The output from a sorting utility

  • An index produced by a text-processing application

  • The lists produced by a database application, such as lists of names or addresses
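A minimal sketch of sorting such a list with the locale's collating sequence, rather than with raw character codes, might look like this in Python. It relies on the standard locale.strxfrm function; the helper name sort_for_display is hypothetical. The "C" locale is used as the default only because it is available everywhere, and in that locale uppercase ASCII letters sort before lowercase ones.

```python
import locale


def sort_for_display(names, locale_name="C"):
    """Sort strings using the collating sequence of the given locale."""
    locale.setlocale(locale.LC_COLLATE, locale_name)
    # strxfrm maps each string to a key that compares in collation order.
    return sorted(names, key=locale.strxfrm)


print(sort_for_display(["pear", "Zebra", "apple"]))
```

Passing an empty string as the locale name would sort according to the user's environment, and the resulting order can differ from the one produced in the "C" locale.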

Country-Specific Data Formats

Country-specific data formats include the following:

Thousands Separators

The comma, period, space, and apostrophe are examples of valid separators for units of thousands as shown in the following examples:

1 234 567
1.234.567
1'234'567
1,234,567

Decimal Separators

The period, comma, and the center dot are examples of valid separators for decimal fractions as shown in the following examples:

5,324
5.324
5·324

Grouping Separators

Grouping may not be restricted to thousands separators as shown in the following examples:

400,001.00
40,0001,00
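The separator and grouping variations above can be handled by treating the separators and group sizes as data rather than building them into the formatting code. The following Python sketch assumes non-negative values and uses hypothetical helper names; its sizes parameter is a simplified version of the grouping information that a locale database provides.

```python
def group_digits(digits, sizes, sep):
    """Insert sep into a digit string, working from right to left.

    sizes lists group sizes starting at the right; the last size repeats.
    """
    groups = []
    index = 0
    while digits:
        size = sizes[min(index, len(sizes) - 1)]
        groups.append(digits[-size:])
        digits = digits[:-size]
        index += 1
    return sep.join(reversed(groups))


def format_number(value, sizes=(3,), thousands_sep=",", decimal_point=".", places=2):
    """Format a non-negative number with the given separators and grouping."""
    whole, _, fraction = f"{value:.{places}f}".partition(".")
    result = group_digits(whole, sizes, thousands_sep)
    if fraction:
        result += decimal_point + fraction
    return result


print(format_number(1234567, thousands_sep=" ", places=0))
print(format_number(400001, sizes=(4,), decimal_point=","))
```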

Positive and Negative Values

Various countries indicate positive and negative values differently. The symbols + (plus) and - (minus) can appear either before or after the number. Negative numbers can be enclosed in parentheses in applications such as a spreadsheet.
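One way to keep sign placement out of the formatting logic is to apply it as a final step to an already formatted, unsigned number. The function and style names in this sketch are hypothetical.

```python
def apply_sign(digits, negative, style="leading"):
    """Attach a minus sign to an already formatted, unsigned number string."""
    if not negative:
        return digits
    if style == "leading":
        return "-" + digits
    if style == "trailing":
        return digits + "-"
    if style == "parentheses":
        return "(" + digits + ")"
    raise ValueError(f"unknown sign style: {style}")


print(apply_sign("1,234.50", True, "parentheses"))
```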

Currency

Currency formats differ among various countries. The comma, period, and colon are examples of valid separators for currency. There can be one or no space between the currency symbol and the amount. The currency symbol can be up to four characters. The following example shows valid currency values:

Sch3.50
SFr. 5.-
3.50FIM
25 c
3F50
760 Ptas
Esc. 3.50
kr. 3,50
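Several of these variations (symbol position, optional space, and decimal separator) can be captured as data, as in the following sketch. The CurrencyStyle class and format_currency function are hypothetical, and the sketch does not cover every form shown above, such as a symbol used as the separator (3F50).

```python
from dataclasses import dataclass


@dataclass
class CurrencyStyle:
    symbol: str                 # up to four characters, as noted above
    symbol_first: bool = True   # symbol before or after the amount
    space: bool = False         # one space or none between symbol and amount
    decimal_point: str = "."


def format_currency(units, cents, style):
    """Format a whole-unit and cent amount according to the given style."""
    amount = f"{units}{style.decimal_point}{cents:02d}"
    gap = " " if style.space else ""
    if style.symbol_first:
        return style.symbol + gap + amount
    return amount + gap + style.symbol


print(format_currency(3, 50, CurrencyStyle("kr.", space=True, decimal_point=",")))
```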

Date Formats

Most countries use the Gregorian calendar, but some do not. Dates can be formatted differently based on the locale. Separators can be different in different locales or left out altogether. The hyphen, comma, period, space, and slash are all examples of valid separators for the day, month, and year. In numeric date formats, the month and day fields can be reversed, and, in some cases, the year field can come first. For example, the 4th of August 1992 can be written as either 4/8/92 or 8/4/92 depending on locale. When the year comes first, June 11, 1992 could be written as 920611 or 921106.
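Because the same digits can mean different dates, it helps to keep a date as a structured value and choose the field order only when displaying it. The pattern table in this sketch is hypothetical; a real application would take the pattern from the system's locale database rather than hard-code it.

```python
from datetime import date

# Hypothetical display patterns, keyed by field order (an assumption).
DATE_PATTERNS = {
    "day_first": "{d}/{m}/{yy:02d}",
    "month_first": "{m}/{d}/{yy:02d}",
    "year_first": "{yy:02d}{m:02d}{d:02d}",
}


def format_date(value, order):
    """Render a date with two-digit year in the requested field order."""
    pattern = DATE_PATTERNS[order]
    return pattern.format(d=value.day, m=value.month, yy=value.year % 100)


fourth_of_august = date(1992, 8, 4)
print(format_date(fourth_of_august, "day_first"))
print(format_date(fourth_of_august, "month_first"))
```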

Time Formats

Time formats can change based on locale. The colon, period, and space are examples of valid separators for hours, minutes, and seconds. The letter h can separate hours and minutes. Both 12-hour and 24-hour notations are used. For 12-hour notation, a.m. or p.m. can appear after the time, separated by a space. The following example shows a number of valid time formats:

1830
18:30
04 56
08h15
11.45 a.m.
11.45 p.m.
13:07:31.30
13:07:31
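A formatter that takes the separator and the notation as parameters can produce several of the forms above from one stored time value. The function name and parameters in this sketch are hypothetical, and seconds are omitted for brevity.

```python
from datetime import time


def format_time(value, sep=":", twelve_hour=False):
    """Format hours and minutes with a chosen separator and notation."""
    if not twelve_hour:
        return f"{value.hour:02d}{sep}{value.minute:02d}"
    suffix = "a.m." if value.hour < 12 else "p.m."
    hour = value.hour % 12 or 12   # hours 0 and 12 both display as 12
    return f"{hour}{sep}{value.minute:02d} {suffix}"


print(format_time(time(18, 30)))
print(format_time(time(8, 15), sep="h"))
print(format_time(time(23, 45), sep=".", twelve_hour=True))
```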

Telephone Numbers

Blanks, commas, hyphens, periods, and brackets are examples of valid separators in telephone numbers. Telephone numbers can be displayed in local, national, and international formats. Local formats vary widely. National formats can have an area code in parentheses, while the international format can drop the parentheses but add a + (plus sign) at the beginning of the number to indicate the country code. The following examples show valid telephone number formats:

(038) 473589
+44 (038) 473549
617.555.2199
(617) 555-2199
1 (617) 555-5525
(1) 617 555 5525
911
1-800-ORDERME
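Since the separators are presentation only, an application that accepts telephone numbers as input may want to accept any of them and compare numbers with the separators removed. The following is a minimal sketch under that assumption; the names are hypothetical, and letters and a leading plus sign are kept as entered.

```python
# Characters treated as presentation-only separators (an assumption).
SEPARATORS = set(" ,-.()[]")


def strip_separators(number):
    """Remove separators, keeping digits, letters, and any plus sign."""
    return "".join(ch for ch in number if ch not in SEPARATORS)


print(strip_separators("(617) 555-2199") == strip_separators("617.555.2199"))
```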

Proper Names and Addresses

Addresses can vary from two to six lines long and can include any character used in the locale's character set. The post code (zip code) can be in various positions in the address and can include alphabetic characters and separators as well as numbers.

Icons, Symbols, and Pointer Shapes

It may not always be possible to design an icon, pointer shape, or other graphical symbol that adequately represents the same object or function in different countries. Culture is inherent even in seemingly universal symbols. For example, sending and receiving mail is a commonly understood function, but representing that function with an icon of a mail box can be inappropriate because the appearance of mail boxes varies widely among countries. Therefore, an envelope may be a more appropriate icon. You should make sure that graphical symbols are localizable.

When used correctly, graphical symbols offer the following advantages:

  • They are language independent and do not need to be translated. In some cases, you may not be able to avoid changing an icon or symbol for a culture that is vastly different. However, design icons and symbols with the entire user population in mind so that you can try to avoid redesigning.

  • They can be used instead of computer terms that have no national-language equivalent.

  • They may have more impact when used with text as warnings than the text alone.

Here are a few guidelines to follow when creating icons, symbols, or pointer shapes:

  • Use an already existing international icon, if possible.

  • Make your icons, symbols, or pointer shapes represent basic, concrete concepts. The more abstract the icon, the more explanatory documentation is needed.

  • Check your icons and symbols for conflicts with existing icons or symbols for that function.

  • Do not incorporate text in icons because the text will need to be translated. Translated text often expands and might no longer fit the icon.

  • Test and retest your symbols and icons in context with real users.

Scanning Direction

Readers of Western languages scan from left to right across the page (or display screen) and from top to bottom. In other languages, such as Hebrew and Arabic, this is not the case; readers scan from right to left. The scanning direction of the language can have an impact on the location of components in DialogBoxes, the order of selections in Menus, and other areas.

If your application will be used in environments other than those that scan from left to right, remember that the scanning direction should match the input direction.
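One simple way to allow for this is to describe components in logical order and derive the physical placement from the scanning direction. The function in this sketch is hypothetical and ignores details such as alignment and mixed-direction text.

```python
def layout_order(components, right_to_left=False):
    """Return components in the order they are placed from the left edge."""
    if right_to_left:
        return list(reversed(components))
    return list(components)


buttons = ["OK", "Cancel", "Help"]    # logical (scanning) order
print(layout_order(buttons, right_to_left=True))
```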

Designing Modularized Software

Modularizing software allows for easier localization; that is, a properly modularized application requires that fewer files be modified to localize the application. Guidelines for designing modularized software are as follows:

  • Create separate modules for text, code, and input/output components that need to be changed to accommodate different markets.

  • Separate all user interface text from the code that presents it.

  • Use standard (registered) data formats, such as ISO and IEEE.

  • Use standard processing algorithms for all processing, storage, and interchange.

In general, you should modularize your application so that elements that need to be translated to different languages are in separate files, and so that those files are the only files that need changes for localization. Furthermore, you should have a different set of language-dependent text files for each locale, read in at run time using the internationalization tools available on your system.
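The separation of text from code can be sketched as a lookup by message key. The in-memory tables below are hypothetical stand-ins for the per-locale text files; in practice each catalog would be a separate file loaded at run time by the system's internationalization tools.

```python
# Hypothetical catalogs; each would normally live in its own per-locale file.
CATALOGS = {
    "en": {
        "continue": "Would you like to continue?",
        "help": "Press the Help button.",
    },
    "de": {
        "continue": "Möchten Sie fortfahren?",
    },
}


def message(key, language, default_language="en"):
    """Look up user interface text, falling back to the default language."""
    catalog = CATALOGS.get(language, {})
    if key in catalog:
        return catalog[key]
    return CATALOGS[default_language][key]


print(message("continue", "de"))
print(message("help", "de"))   # not yet translated, so the default is used
```

With this arrangement, adding a language means adding a catalog, and the code that presents the text does not change.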

Translating Screen Text

Well-written screen text makes an application easier for users to understand. It also makes translation easier.

Use the following guidelines to write screen text for translation:

  • Write brief and simple sentences; they are easy to understand and translate.

  • Write affirmative statements; they are easier to understand than negative statements. For example, use "Would you like to continue?" rather than "Wouldn't you like to continue?"

  • Use active voice; it is easier for both application users and application translators to understand. For example, use "Press the Help button." rather than "The Help button should be pressed."

  • Use prepositions to clarify the relationship of nouns; avoid stringing three or more nouns together.

  • Use simple vocabulary; avoid using jargon unless it is a part of your audience's working vocabulary.

  • Allow space for text expansion. Text translated from English is likely to expand 30% to 50%, or even more.
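The expansion guideline can be turned into a rough sizing rule for labels and fields. The following sketch assumes fixed-width character cells and uses a hypothetical function name; the 50% default comes from the upper figure in the guideline above.

```python
import math


def reserved_width(english_text, expansion=0.5):
    """Character cells to reserve for text, allowing for translation growth."""
    return math.ceil(len(english_text) * (1 + expansion))


print(reserved_width("Cancel"))
```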