Glibc locale files

Locale files define the cultural conventions of a language and region. The glibc locale files are used on all GNU/Linux systems, so they are needed for any software that will run on Linux.

Data

When creating locale data for South African locales, the data files are stored in version control. While these files are in development, the reference version is the one contained there. When the files are suitably mature they are submitted to glibc, and the glibc copies become the reference versions; the copies in version control then hold any further development work, if needed.

In the mature stage the copy in version control can be abandoned and development done against the versions in glibc.

It is suggested that you follow a similar model.

Tools

The following tools are located in the Translate.org.za Subversion repository. They are useful for building and testing glibc locales:

missing determines if a locale file has a certain locale field or not.
error displays any compilation errors detected
install performs a test install (use -r for a real install – must be root)
definition prints the value of a locale field (installed locales only)
locale-escape converts your locale into <UNNNN> format
check-dates prints a list of the LC_TIME defined date formats for the locale

Editing

If you edit your locale file using vim, you can make use of the fdcc syntax highlighting file. Newer versions of vim should already have this file installed and will detect the filetype automatically. If your file is not automatically highlighted, you will need to download the file and follow these instructions.

It is preferable to edit your locale in UTF-8 and then use locale-escape to encode your work in the <UNNNN> format used in glibc locale files. Use locale-escape as follows:

cat unescaped-locale | ./locale-escape > escaped-locale

You can also use the iconv tool to achieve the same escaping (this will only work if your version of iconv supports the --unicode-subst option):

$ echo Русский |
  iconv -f UTF-8 -t ASCII --unicode-subst='<U%04X>'

<U0420><U0443><U0441><U0441><U043A><U0438><U0439>
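
If neither locale-escape nor an iconv with --unicode-subst is at hand, the same escaping can be approximated with standard tools. The following is a minimal sketch, not the locale-escape tool itself: escape_locale is a hypothetical helper name, and it assumes an iconv that supports UTF-32BE plus od, tr and awk. Code points above U+FFFF are written with eight hex digits.

```shell
# Sketch: escape every non-ASCII character read from stdin as <UNNNN>.
# escape_locale is a hypothetical name, not one of the repository tools.
escape_locale() {
    iconv -f UTF-8 -t UTF-32BE |   # one code point = four big-endian bytes
    od -An -v -tu1 |               # dump the bytes as decimal numbers
    tr -s ' ' '\n' |               # one byte value per line
    awk 'NF {
        cp = cp * 256 + $1         # accumulate the four bytes of a code point
        if (++n == 4) {
            if (cp < 128)        printf "%c", cp        # ASCII passes through
            else if (cp < 65536) printf "<U%04X>", cp
            else                 printf "<U%08X>", cp
            cp = 0; n = 0
        }
    }'
}

echo 'Русский' | escape_locale
```

Run on the example word above, this should print the same <U0420>...<U0439> sequence as the iconv command. Whole files can be converted with escape_locale < unescaped-locale > escaped-locale.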

% Charset

All locales should contain a comment like

% Charset: UTF-8

at the beginning.
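
A quick way to confirm the comment is present before submitting is sketched below. has_charset_comment is a hypothetical helper, the 10-line window for "the beginning" is an arbitrary assumption, and the check assumes the file uses % as its comment character.

```shell
# Sketch: succeed if a "% Charset:" comment appears near the top of the
# locale file supplied on stdin. The 10-line window is an assumption.
has_charset_comment() {
    head -n 10 | grep -q '^% Charset: '
}

# Demonstration on a two-line sample instead of a real locale file.
printf '%% Charset: UTF-8\n%% Sample comment\n' |
    has_charset_comment && echo "charset comment found"
```

For a real file, run has_charset_comment < xx_XX and check the exit status.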

ISO-8859-1 is outdated and is not recommended unless there is a substantial install base. Ideally you should only use UTF-8. If a legacy charset is necessary, ISO-8859-15 is preferred over ISO-8859-1.

Checking

For a quick check, first install the locale. As root, run:

install -r xx_XX

Then run checks, either

definition xx_XX

Or

definition -c LC_TIME xx_XX

Then go through each entry, checking that it is correct.

Defining LC_TIME

Use ‘man date’ to see which variables are valid in a locale file’s date and time formats. If you want to remove space padding, put a minus in the variable: e.g. %-e prints the day of the month without space padding before the number, so ‘[space]1’ becomes simply ‘1’.
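
The effect of the minus flag can be seen directly with the date command. This sketch assumes GNU date (for the -d option and the minus flag); the date used is an arbitrary example chosen because its day of the month has a single digit.

```shell
# %e pads the day of the month with a space, %-e does not.
# GNU date is assumed; 2024-03-01 is just an example date.
date -d 2024-03-01 '+[%e] [%-e]'
```

This prints [ 1] [1]. The brackets are only there to make the padding visible.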

Gentoo users

On Gentoo you should NOT use the install script, but rather execute

localedef -i <your_escaped_file> -f UTF-8 <your_locale>.UTF-8 -c -v

The install script uses locale-gen, which can give you quite a lot of trouble with accented characters.

Resources

Website and such:

Producing your locale file:

Resources specific to the ISO standard for locale files:

Notes

All changes to glibc locales must also be reflected in the IBM ICU locales, so you need to file bug reports against ICU and possibly against the OpenOffice.org (OO) locales as well.

Submitting your new/updated locale to glibc

Note: double check everything before sending. It’s easy to overlook silly things like comments that still apply to a previous language. Check them all again.

Officially you should send your locale files to:

I have in the past sent email to Ulrich Drepper, the glibc maintainer. This is not guaranteed to work but if all else fails try this route.

Attach the file and, preferably, a diff between your update and the version in glibc CVS:

diff -u xx_XX.glibc_version xx_XX.updated > xx_XX.diff

Make the subject very clear: “Update xx_XX glibc locale file”. Attach the files and send.

You also need to include a patch against localedata/SUPPORTED, which defines the charsets that can be used with your locale.
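
Entries in localedata/SUPPORTED take the form locale/charset followed by a line-continuation backslash. The sketch below prints candidate lines for a locale. supported_entries is a hypothetical helper and xx_XX is a placeholder; the naming of non-UTF-8 entries is an assumption, so compare the result against neighbouring lines in the real file before patching.

```shell
# Sketch: print SUPPORTED-style lines ("<locale>/<charset> \") for a locale.
# supported_entries is a hypothetical helper, not part of glibc.
supported_entries() {
    locale_name=$1
    shift
    for charset in "$@"; do
        if [ "$charset" = UTF-8 ]; then
            # UTF-8 entries carry the charset in the locale name as well.
            printf '%s.UTF-8/UTF-8 \\\n' "$locale_name"
        else
            # Assumed naming for a legacy charset entry.
            printf '%s/%s \\\n' "$locale_name" "$charset"
        fi
    done
}

supported_entries xx_XX UTF-8 ISO-8859-15
```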