Joel Spolsky recently wrote an article on Unicode and character encodings that significantly shortens what I need to cover here, so before we go any further, please go read "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)".
Back? Great. That's just the beginning of world-class global software, but it is required for what I hope are now very clear reasons.
The next major resource I want to recommend for this subject is the book Developing International Software, Second Edition. The focus of the book is Windows-centric (as is this website), but there is plenty of information that you must know if you want to write international software no matter what platform it is for. Most (if not all) of the subjects in this article are covered in depth in that book.
Planning for Localization
When you create software for an international market you normally write and design it for your native language or the language of the first market (usually English), and then get it localized into other languages and for other regions. What needs to happen to localize software?
- Translation of all the strings to the new language (this is the most obvious one).
- Changing UI layout to properly display the new strings (German translations tend to be longer than English, Chinese tend to be shorter).
- Handling RTL (or 'mirrored') locales (where the text is laid out from right to left instead of the English left-to-right style).
- Changing the way dates, times, currencies, etc. are handled to the appropriate style for the locale.
- Making sure the fonts are appropriate for the locale and language.
For too many software products, localization is something that blindsides the product team either late in the cycle (near release date) or after the English release. You have a successful product, it's selling like hotcakes, and now your sales team wants to expand into Europe, Asia, South America, Africa... anywhere they can make money! Then begins a trek up a steep learning curve, and often the need to rewrite large portions of code to take into account all the differences between the regions and languages of the world.
Pseudo-Loc
The first technique for mitigating these problems is to pretend that you're localizing the product, without going through the (significant) expense of actually doing so, in order to flush out problems early. This involves using characters from the language you're trying to test, making your strings longer or shorter than they are in English, setting the thread locale, etc.
Your test team is then able to find and file bugs against code-related issues that can be fixed well ahead of the actual localization of the product. It ensures a few very good practices:
- The differences between the executable code for the different languages are either minimal or (ideally) nonexistent. This makes the testing matrix much more manageable.
- Localization and internationalization are on the team's mind throughout the whole product cycle.
- The processes for producing localized versions can be worked out early.
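To make the idea concrete, here is a minimal sketch of a pseudo-localization pass over a single string. It is written in Python purely for illustration; the function name, the character mapping, and the 40% expansion factor are all my own assumptions, not part of any particular tool. It swaps plain letters for accented ones, pads the string to simulate a longer translation, and wraps it in brackets so that truncation is easy to spot on screen.

```python
# Hypothetical pseudo-localization helper. The mapping and the padding
# factor are illustrative assumptions, not taken from a real tool.
ACCENTED = str.maketrans(
    "AEIOUaeiouCcNnYy",
    "\u00c0\u00c9\u00ce\u00d5\u00dc\u00e0\u00e9\u00ee\u00f5\u00fc"
    "\u00c7\u00e7\u00d1\u00f1\u00dd\u00fd",
)


def pseudo_localize(text, expansion=0.4):
    """Return a still-readable but visibly 'foreign' version of text."""
    # Replace plain letters with accented look-alikes so that any code
    # path that mishandles non-ASCII characters shows up immediately.
    accented = text.translate(ACCENTED)
    # Pad the string to simulate a translation longer than the English.
    padding = "~" * max(1, round(len(text) * expansion))
    # Brackets mark both ends, so a clipped string is obvious in the UI.
    return "[" + accented + padding + "]"


if __name__ == "__main__":
    for label in ("Save", "File", "Cancel"):
        print(pseudo_localize(label))
```

A real pass would also need to leave format placeholders and accelerator markers untouched, which this sketch does not attempt.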
Separation of Localizable Resources
The first major step is to move all resources (strings, dialogs, etc.) that can change from locale to locale into a separate resource-only DLL. Giving developers some way to communicate the intent of each string, dialog, etc. to the localizer saves time. The corollary is that only localizable resources should be in that DLL; any strings, bitmaps, or other resources that should not or will not change should not be in it.
Once you have that separation, you have a well-understood item that needs to change during the process, and nothing else should have to be modified. Let's pretend we have an application called myapp.exe that currently contains all the code and resources for the project. If I give this to a localization team, they will have to both determine what needs to be changed and change it, and the result will be a different myapp.exe version for the new locale.
If I've got myapp.exe and myappres.dll, however, I separate the code from the resources and the only difference between my application in English vs. Traditional Chinese vs. Hebrew is which resource DLL it loads. This is very important for supportability issues. If my resources and code are all in one executable and I fix a nasty crashing bug and push it out to customers, I have to release a different version of that executable for each locale I've localized to. If they're separate, I can release one version to fix the executable only for all localized versions. Much easier to test the changes and release these fixes fast, too.
Common Controls
Writing globalized UI code is difficult and time consuming, which is why I strongly recommend leveraging the common controls provided by Windows whenever possible. Even seemingly simple pieces of UI become very expensive once you add up all the different things that need to be done right for use in a world-class application.
MUI
Multilingual User Interface (MUI) is a feature that was added in Windows 2000 Professional and is very important for multinational companies that want to simplify deployment, support, and multi-user computer use.
Each user account on an MUI Windows system can set its own locale. This means that two users of a shared machine can each have the UI in their own language. It also means that support personnel can work on users' computers in their preferred language without disrupting the users' settings. For deployment, it enables a single image/installation type worldwide.
If these kinds of deployments are important for your application, supporting MUI can be a big win. The idea is that you have a MUI version of your application which installs a different resource DLL for each of the locales you support. When your application is run, it calls the appropriate APIs to determine which locale the user is running and loads the resource DLL whose resources best match that locale.
Notice how thoroughly doing the right thing end to end makes these kinds of features easy to do.
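The "pick the best resource DLL for the user's locale" step can be sketched as a simple fallback lookup. This is an illustration in Python, not the Windows MUI mechanism itself: the function names, the `myappres.<locale>.dll` naming scheme, and the `en-US` default are all assumptions I made up for the example. The point is only that a request for a locale you did not ship should fall back to something more general rather than fail.

```python
# Hypothetical resource selection with locale fallback. The file naming
# scheme and the default locale are assumptions for illustration only.
def fallback_chain(locale_name, default="en-US"):
    """List candidate locales from most to least specific."""
    parts = locale_name.split("-")
    # "zh-Hant-TW" -> ["zh-Hant-TW", "zh-Hant", "zh"]
    chain = ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]
    # Always end with the default so there is something to load.
    if default not in chain:
        chain.append(default)
    return chain


def pick_resource_dll(locale_name, available, base="myappres"):
    """Return the resource DLL name that best matches locale_name."""
    for candidate in fallback_chain(locale_name):
        if candidate in available:
            return base + "." + candidate + ".dll"
    raise LookupError("no resource DLL available for " + locale_name)


if __name__ == "__main__":
    shipped = {"en-US", "de", "zh-Hant"}
    for user_locale in ("de-AT", "zh-Hant-TW", "he-IL"):
        print(user_locale, "->", pick_resource_dll(user_locale, shipped))
```

Because the executable never changes, adding a new language in this scheme only means shipping one more resource DLL and adding its locale to the set.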
IMEs
IME stands for Input Method Editor, and it is how you can type (for instance) Japanese on a US keyboard. If you use the common controls (as recommended), they are IME aware. If you do it yourself, you need to make sure you understand the requirements and the level of support you want to provide.
Globalization Gotchas
Some other items to think about that may or may not have been called out above:
- Formats (and more!) can vary based on locale:
  - Time and date
  - Currencies
  - Numbers
  - Addresses, telephone numbers
  - Units of measurement, standard sizes
- String sorting and comparison are very difficult.
- Capitalization (uppercase, lowercase, when it is appropriate) is a big gotcha.
- Line and word breaking are very language specific.
The default answer should be to use the built-in Windows APIs for all these functions and let them do all the hard work for you. Windows has already paid the price to do this right; there's no reason for you to repeat it.
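To see why rolling your own is risky, here is a small sketch of two of the gotchas above, sorting and capitalization. It uses Python's locale-independent string operations as a stand-in for "naive" code; the word list is just sample data I picked for the example.

```python
# Naive, locale-independent string handling, shown only to illustrate
# the gotchas. The sample words are arbitrary.
words = ["zebra", "Apple", "\u00e9clair", "apple"]

# Plain sorting compares code points: every uppercase letter sorts before
# every lowercase one, and the accented e-acute lands after 'z'.
naive_order = sorted(words)

# Uppercasing can change the length of a string: the German sharp s
# becomes two letters, so buffers sized for the original are too small.
street = "stra\u00dfe"
shouted = street.upper()

# This uppercase is not locale-aware either: Turkish expects a dotted
# capital I (U+0130) here, but the generic rule produces a plain "I".
generic_upper_i = "i".upper()

if __name__ == "__main__":
    print(naive_order)
    print(street, len(street), "->", shouted, len(shouted))
    print(generic_upper_i)
```

None of these results is wrong by the generic rules, but none of them is what a user in the relevant locale expects, which is exactly the work the platform's locale-aware APIs already do for you.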