OpenOffice.org

OpenOffice.org is the leading cross-platform office suite. It's a large project and a large localisation undertaking, but it is an important component of a localised desktop.

What is your language’s LCID?

Microsoft defines LCIDs (locale identifiers) for various locales. You need to know yours so that OpenOffice.org works well on Windows, and so that documents you create move seamlessly between MS Word and OpenOffice.org Writer with the correct language identifier.

There are a number of places where you can look up the LCID. For most languages they all agree, but in some cases (see 1072/Sutu/Sesotho) it helps to look at all the lists to clarify exactly what Microsoft meant.
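Different lists quote LCIDs in decimal or in hexadecimal, which makes them harder to compare. The following Python sketch converts between the two and splits the identifier into its primary language and sublanguage parts, assuming Microsoft's documented layout (low 10 bits for the primary language, the next 6 bits for the sublanguage). The function name describe_lcid is illustrative only and is not part of any tool mentioned here.

```python
def describe_lcid(lcid):
    """Split a Windows LCID into its documented bit fields."""
    langid = lcid & 0xFFFF  # low 16 bits hold the language identifier
    return {
        "decimal": lcid,
        "hex": "0x%04X" % lcid,
        "primary_language": langid & 0x3FF,  # low 10 bits
        "sublanguage": langid >> 10,         # next 6 bits
    }


# 1072 is the LCID mentioned above for Sutu/Sesotho
info = describe_lcid(1072)
print(info["hex"], hex(info["primary_language"]), info["sublanguage"])
```

Comparing the hexadecimal form and the primary language part across lists can make it easier to spot where two sources disagree.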

What to do first

This is a very large application. If you can translate a smaller section of the total and still have a useful product, that will help. We created this rough targeting guide using OpenOffice.org 1.1.3 and podebug.

Localisation

Read the localisation documentation on the OpenOffice.org website: http://wiki.services.openoffice.org/wiki/Category:Localization

Translation is now much easier because the project uses Pootle. You can translate online on Pootle, or download the files to work offline with a tool such as Virtaal.

gsicheck

The OpenOffice.org developers have a tool for checking the SDF file called gsicheck. Of course, you don’t want to build the whole of OpenOffice.org simply to get one tool. pofilter will pick up most of the errors that gsicheck does, but it's nice to know that your SDF is good before submitting it. Read more and download from the OpenOffice.org website:

http://wiki.services.openoffice.org/wiki/Gsicheck

Then install it and use it:

tar xvzf gsicheck-1.7.8_2.0m122.tar.gz
cd gsicheck-1.7.8_2.0m122
./gsicheck -c <GSI/SDF file>

Now go and fix the errors that it detected. You should correct these in your PO files.

AutoCorrect

The OpenOffice.org AutoCorrect file is a zip file called, for example, acor_en-US.dat. Søren Thing Pedersen has created csv2acor.py, which generates an AutoCorrect file from CSV sources.

The autocorrect file contains 3 XML files:

  • DocumentList.xml – pairs of mistyped words and their correct spelling
  • SentenceExceptList.xml – abbreviations that end with a full stop, which should be ignored when determining the end of a sentence
  • WordExceptList.xml – words that may legitimately begin with two capital letters, e.g. CDs
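Because the file is an ordinary zip archive, you can inspect an existing one before building your own. A minimal Python sketch using the standard zipfile module follows; the function name is illustrative, and a real archive may contain other members besides the three listed above.

```python
import zipfile


def list_autocorrect_members(path):
    """Return the sorted names of the files stored in an AutoCorrect .dat archive."""
    with zipfile.ZipFile(path) as archive:
        return sorted(archive.namelist())


# Usage (assuming acor_en-US.dat is in the current directory):
#   for name in list_autocorrect_members("acor_en-US.dat"):
#       print(name)
```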

When using csv2acor.py you need to have three files with the same names as above but with a .csv file extension. WordExceptList.csv and SentenceExceptList.csv contain just a list of entries, one per line, each surrounded by double quotes ("). DocumentList.csv is a comma-separated list with the mistyped word in the first column and the correct word in the second column, both also surrounded by double quotes.
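As an illustration of that layout, the following Python sketch writes three small files in the quoted format described above using the standard csv module. The entries are made-up sample data, UTF-8 encoding is an assumption, and the helper name write_quoted_csv is hypothetical; check what csv2acor.py itself expects before relying on this.

```python
import csv
import os
import tempfile


def write_quoted_csv(path, rows):
    """Write rows to a CSV file with every field wrapped in double quotes."""
    # UTF-8 is an assumption here, not something the source states
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, quoting=csv.QUOTE_ALL, lineterminator="\n")
        writer.writerows(rows)


# A temporary directory keeps the demo self-contained;
# in practice you would use your own working directory.
out_dir = tempfile.mkdtemp()

# One entry per line
write_quoted_csv(os.path.join(out_dir, "WordExceptList.csv"), [["CDs"], ["PCs"]])
write_quoted_csv(os.path.join(out_dir, "SentenceExceptList.csv"), [["e.g."], ["etc."]])

# Mistyped word first, correct word second
write_quoted_csv(os.path.join(out_dir, "DocumentList.csv"), [["teh", "the"], ["adn", "and"]])
```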

The translation program Virtaal also makes use of these files, so consider contributing yours to that project as well.

WordExceptList.xml

If you have an existing spell checking wordlist then use the following to extract potential words:

egrep "^[A-Z][A-Z][a-z]" spell-wordlist > WordExceptList.new

This extracts all words that start with two capitals followed by a lower case letter. The pattern shown covers only A-Z and a-z, so extend the character classes with all the upper and lower case letters valid in your language.
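If listing every accented letter by hand is awkward, one alternative is to let Python's Unicode-aware string methods decide what counts as upper or lower case. This is only a sketch of the same filter, not part of csv2acor.py; the function name and the sample words are illustrative.

```python
def two_capital_words(words):
    """Return words that start with two upper-case letters followed by a lower-case one."""
    return [
        w for w in words
        if len(w) >= 3 and w[0].isupper() and w[1].isupper() and w[2].islower()
    ]


# isupper()/islower() also recognise non-ASCII letters such as "É"
sample = ["CDs", "PCs", "Hello", "NATO", "ÉÉn", "dvd"]
print(two_capital_words(sample))
```

To run it over a real word list, read the file with one entry per line and strip the line endings before passing the words in.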

SentenceExceptList.xml

If you have an existing spell checking wordlist then use the following to extract potential abbreviations:

egrep "\.$" spell-wordlist > SentenceExceptList.new

This extracts all entries that end in a full stop.
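A word list saved with Windows line endings has a carriage return before each newline, which can stop a pattern anchored at the end of the line from matching. The following Python sketch applies the same filter after stripping whitespace and also drops duplicates; the function name and sample lines are illustrative.

```python
def abbreviation_candidates(lines):
    """Return unique entries ending in a full stop, in first-seen order."""
    seen = []
    for line in lines:
        entry = line.strip()  # also removes a trailing "\r" from CRLF files
        if entry.endswith(".") and entry not in seen:
            seen.append(entry)
    return seen


sample = ["etc.\r\n", "word\n", "e.g.\n", "etc.\n"]
print(abbreviation_candidates(sample))
```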

DocumentList.xml

If you have an existing DocumentList.xml you can convert it to CSV using the following:

sed "s/<block-list:block block-list:abbreviated-name=\"/\"\\n\"/g;s/\" block-list:name=\"/\",\"/g;s/\"\/>//g" < DocumentList.xml > DocumentList.csv

You’ll need to edit DocumentList.csv to remove some of the remaining XML data.

A cleaner method is to use the following XSLT – this way you don’t have to clean any XML data (so this is suitable for batch mode):

<?xml version="1.0" ?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 version="1.0"
 xmlns:block-list="http://openoffice.org/2001/block-list">

 <xsl:output method="text" encoding="utf-8"/>

 <xsl:template match="//block-list:block">
  <xsl:text>"</xsl:text>
  <xsl:value-of select="@block-list:abbreviated-name"/>
  <xsl:text>"</xsl:text>
  <xsl:text>,</xsl:text>
  <xsl:text>"</xsl:text>
  <xsl:value-of select="@block-list:name"/>
  <xsl:text>"</xsl:text>
  <xsl:text>&#x0a;</xsl:text>
 </xsl:template>

</xsl:stylesheet>

Run this stylesheet through any XSLT processor, e.g., for Saxon, type:

java -jar saxon8.jar DocumentList.xml <name-of-xslt> >DocumentList.new
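If no XSLT processor is to hand, the same conversion can be sketched in Python with the standard library. This assumes the block-list namespace and attribute names shown in the stylesheet; the function name and the sample XML are illustrative, and the csv module additionally doubles any double quote that appears inside an entry.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace URI as declared in the stylesheet above
NS = "http://openoffice.org/2001/block-list"


def document_list_to_csv(xml_text):
    """Convert DocumentList.xml content to quoted CSV text (mistyped, correct)."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for block in root.iter("{%s}block" % NS):
        writer.writerow([
            block.get("{%s}abbreviated-name" % NS),
            block.get("{%s}name" % NS),
        ])
    return out.getvalue()


# Made-up sample input in the same shape as a DocumentList.xml
sample = """<block-list:block-list xmlns:block-list="http://openoffice.org/2001/block-list">
  <block-list:block block-list:abbreviated-name="teh" block-list:name="the"/>
  <block-list:block block-list:abbreviated-name="adn" block-list:name="and"/>
</block-list:block-list>"""

print(document_list_to_csv(sample), end="")
```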

Generating your new AutoCorrect file

Then run csv2acor.py acor_xx-YY.dat, where xx-YY is your language and country code.

Spell Checker and Hyphenation in the official build

In order to add your spell checker and hyphenation file to OpenOffice.org CVS you need to do the following:

  • Ensure your license is compatible
  • Fill in the form at http://external.openoffice.org/
  • File an issue assigned to mh, who needs to process the approval for inclusion

Holidays

  • wizards/source/schedule/LocalHolidays.xba

This appears to be a StarBasic program that lets you specify holidays and similar dates. FIXME: this needs to be checked more carefully.

Child Workspace

OpenOffice.org developers use what they call child workspaces (CWS) to make fixes and commit changes. These are usually linked to related bugs in IssueZilla.

Here are some instructions to help you track your changes and see if they have been integrated/fixed:

Now you can see which l10n CWS have been integrated and which have not. By clicking on the CWS name you see the list of the bugs registered to that CWS. Once it is approved by QA you’ll know exactly in which milestone the CWS has been integrated.