lang¶
Classes that represent languages and provide language-specific information.
All classes inherit from the parent class Common, defined in the common module.
The data provided includes:
Language codes
Language name
Plurals
Punctuation transformation
etc.
af¶
This module represents the Afrikaans language.
See also
- class translate.lang.af.af(code)¶
This class represents Afrikaans.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter, modified to account for the indefinite article (‘n).
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are list of names for the ignored tests in the checker. A special ‘all’ checker name can be used to tell that the tests must be ignored in all the checkers.
Listed checkers to ignore tests on must be lowercase strings for the checker name, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of length length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
Used for languages that have a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value; it must be overridden. Any positive integer is valid (it should probably be between 1 and 6).
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = 'ëïêôûáéíóúý'¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper case, are included in the list.
- validdoublewords = ['']¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
- translate.lang.af.cyr2lat = {'Ё': 'Jo', 'А': 'A', 'Б': 'B', 'В': 'W', 'Г': 'G', 'Д': 'D', 'ДЖ': 'Dj', 'Е': 'Je', 'ЕЙ': 'Ei', 'Ж': 'Zj', 'З': 'Z', 'И': 'I', 'Й': 'J', 'К': 'K', 'Л': 'L', 'М': 'M', 'Н': 'N', 'О': 'O', 'П': 'P', 'Р': 'R', 'С': 'S', 'Т': 'T', 'У': 'Oe', 'Ф': 'F', 'Х': 'Ch', 'Ц': 'Ts', 'Ч': 'Tj', 'Ш': 'Sj', 'Щ': 'Sjtsj', 'Ъ': '', 'Ы': 'I', 'Ь': '', 'Э': 'E', 'Ю': 'Joe', 'Я': 'Ja', 'а': 'a', 'б': 'b', 'в': 'w', 'г': 'g', 'д': 'd', 'дж': 'dj', 'е': 'je', 'ей': 'ei', 'ж': 'zj', 'з': 'z', 'и': 'i', 'й': 'j', 'к': 'k', 'л': 'l', 'м': 'm', 'н': 'n', 'о': 'o', 'п': 'p', 'р': 'r', 'с': 's', 'т': 't', 'у': 'oe', 'ф': 'f', 'х': 'ch', 'ц': 'ts', 'ч': 'tj', 'ш': 'sj', 'щ': 'sjtsj', 'ъ': '', 'ы': 'i', 'ь': '', 'э': 'e', 'ю': 'joe', 'я': 'ja', 'ё': 'jo'}¶
Mapping of Cyrillic to Latin letters for transliteration in Afrikaans.
- translate.lang.af.tranliterate_cyrillic(text)¶
Convert Cyrillic text to Latin according to the AWS transliteration rules.
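The cyr2lat table above contains both single letters and two-letter keys such as 'дж'. The following is a minimal sketch of one way such a table could be applied, assuming a greedy longest-match strategy. The helper name transliterate_sketch and the reduced table CYR2LAT_SUBSET are hypothetical and chosen for illustration only; this is not necessarily how tranliterate_cyrillic() itself works.

```python
# Subset of the cyr2lat mapping listed above; the full table has more entries.
CYR2LAT_SUBSET = {
    "М": "M", "а": "a", "в": "w", "д": "d", "дж": "dj", "ж": "zj",
    "з": "z", "к": "k", "о": "o", "с": "s",
}


def transliterate_sketch(text, table=CYR2LAT_SUBSET):
    """Hypothetical helper: greedy longest-match transliteration."""
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible key first so 'дж' wins over 'д' + 'ж'.
        for size in range(longest, 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += len(chunk)
                break
        else:
            # Character not in the table: copy it through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(transliterate_sketch("Москва"))
```

With a plain character-by-character lookup the two-letter keys would never match, which is why the sketch tries longer keys first.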
am¶
This module represents the Amharic language.
See also
- class translate.lang.am.am(code)¶
This class represents Amharic.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are list of names for the ignored tests in the checker. A special ‘all’ checker name can be used to tell that the tests must be ignored in all the checkers.
Listed checkers to ignore tests on must be lowercase strings for the checker name, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of length length.
- listseperator = '፣ '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
Used for languages that have a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value; it must be overridden. Any positive integer is valid (it should probably be between 1 and 6).
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {',': '፣', '.': '።', ';': '፤'}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '።!?…'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper case, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
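The Amharic puncdict above maps single Latin punctuation marks to their Ethiopic counterparts. As a minimal sketch of what such a mapping does to a string, the following uses Python's built-in str.translate. The function name punctranslate_sketch is hypothetical, and the real punctranslate() may treat special cases (for example ellipses or decimal points) differently.

```python
# The puncdict value documented for translate.lang.am.am above.
AM_PUNCDICT = {",": "፣", ".": "።", ";": "፤"}


def punctranslate_sketch(text, puncdict=AM_PUNCDICT):
    """Hypothetical helper: replace each single-character key by its value."""
    if not puncdict:
        # An empty dictionary (the Common default) leaves the text untouched.
        return text
    # str.maketrans accepts a dict whose keys are single characters.
    return text.translate(str.maketrans(puncdict))


print(punctranslate_sketch("a, b; c."))
```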
ar¶
This module represents the Arabic language.
See also
- class translate.lang.ar.ar(code)¶
This class represents Arabic.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['acronyms', 'simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are list of names for the ignored tests in the checker. A special ‘all’ checker name can be used to tell that the tests must be ignored in all the checkers.
Listed checkers to ignore tests on must be lowercase strings for the checker name, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of length length.
- listseperator = '، '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
Used for languages that have a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value; it must be overridden. Any positive integer is valid (it should probably be between 1 and 6).
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = (('0', '٠'), ('1', '١'), ('2', '٢'), ('3', '٣'), ('4', '٤'), ('5', '٥'), ('6', '٦'), ('7', '٧'), ('8', '٨'), ('9', '٩'))¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {',': '،', ';': '؛', '?': '؟'}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper case, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
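The Arabic numbertuple above is a sequence of (source digit, target digit) pairs. A minimal sketch of how such pairs could be applied is shown below. The function name numbertranslate_sketch is hypothetical, and the simple replace loop is an assumption about the general idea, not a copy of numbertranslate().

```python
# The numbertuple value documented for translate.lang.ar.ar above.
AR_NUMBERTUPLE = (
    ("0", "٠"), ("1", "١"), ("2", "٢"), ("3", "٣"), ("4", "٤"),
    ("5", "٥"), ("6", "٦"), ("7", "٧"), ("8", "٨"), ("9", "٩"),
)


def numbertranslate_sketch(text, numbertuple=AR_NUMBERTUPLE):
    """Hypothetical helper: apply each (latin, local) digit pair in turn."""
    for latin, local in numbertuple:
        text = text.replace(latin, local)
    return text


print(numbertranslate_sketch("Page 12 of 30"))
```

An empty numbertuple, as in the Common default, leaves the text unchanged in this sketch.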
bn¶
This module represents the Bengali language.
See also
- class translate.lang.bn.bn(code)¶
This class represents Bengali.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are list of names for the ignored tests in the checker. A special ‘all’ checker name can be used to tell that the tests must be ignored in all the checkers.
Listed checkers to ignore tests on must be lowercase strings for the checker name, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of length length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
Used for languages that have a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value; it must be overridden. Any positive integer is valid (it should probably be between 1 and 6).
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = (('0', '০'), ('1', '১'), ('2', '২'), ('3', '৩'), ('4', '৪'), ('5', '৫'), ('6', '৬'), ('7', '৭'), ('8', '৮'), ('9', '৯'))¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {'.\n': '।\n', '. ': '। '}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '।!?…'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper case, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
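Unlike the single-character rules used for Amharic or Arabic, the Bengali puncdict above has two-character keys: a full stop followed by a space or a newline. One plausible reading is that this keeps full stops inside tokens such as version numbers intact. The sketch below illustrates that effect with plain string replacement. The name punctranslate_multichar_sketch is hypothetical, and the sketch deliberately does not handle a full stop at the very end of the string, which the real punctranslate() may treat specially.

```python
# The puncdict value documented for translate.lang.bn.bn above.
BN_PUNCDICT = {".\n": "।\n", ". ": "। "}


def punctranslate_multichar_sketch(text, puncdict=BN_PUNCDICT):
    """Hypothetical helper: apply multi-character punctuation rules."""
    for source, target in puncdict.items():
        text = text.replace(source, target)
    return text


# The full stop in "1.5" is not followed by whitespace, so it is kept.
print(punctranslate_multichar_sketch("v1.5 ok. next"))
```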
code_or¶
This module represents the Odia language.
See also
- class translate.lang.code_or.code_or(code)¶
This class represents Odia.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are list of names for the ignored tests in the checker. A special ‘all’ checker name can be used to tell that the tests must be ignored in all the checkers.
Listed checkers to ignore tests on must be lowercase strings for the checker name, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of length length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
Used for languages that have a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value; it must be overridden. Any positive integer is valid (it should probably be between 1 and 6).
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {'.\n': '।\n', '. ': '। '}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '।!?…'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper case, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
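The ignoretests dictionary above combines a special 'all' key with lowercase checker names. The following sketch shows how a consumer could resolve the ignored tests for one checker under those rules. The function ignored_tests_for is hypothetical and only illustrates the documented key convention; it is not part of the library.

```python
def ignored_tests_for(ignoretests, checker_name):
    """Hypothetical helper: tests to skip for one checker.

    Combines the entries under the special 'all' key with the entries
    listed under the lowercase checker name, without duplicates.
    """
    ignored = list(ignoretests.get("all", []))
    for test in ignoretests.get(checker_name.lower(), []):
        if test not in ignored:
            ignored.append(test)
    return ignored


# The ignoretests value documented for translate.lang.code_or.code_or above.
odia_ignoretests = {"all": ["simplecaps", "startcaps"]}
print(ignored_tests_for(odia_ignoretests, "mozilla"))
```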
common¶
This module contains all the common features for languages.
Supported features:
language code (km, af)
language name (Khmer, Afrikaans)
Plurals
Number of plurals (nplurals)
Plural equation
pofilter tests to ignore
Segmentation:
characters
words
sentences
Punctuation:
End of sentence
Start of sentence
Middle of sentence
Quotes
single
double
Valid characters
Accelerator characters
Special characters
Direction (rtl or ltr)
TODOs and Ideas for possible features:
Language-Team information
Segmentation
phrases
- class translate.lang.common.Common(code)¶
This class is the common parent class for all language classes.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are list of names for the ignored tests in the checker. A special ‘all’ checker name can be used to tell that the tests must be ignored in all the checkers.
Listed checkers to ignore tests on must be lowercase strings for the checker name, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of length length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
Used for languages that have a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value; it must be overridden. Any positive integer is valid (it should probably be between 1 and 6).
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper case, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
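The nplurals and pluralequation attributes above are described as the values that go into a PO file header. As a minimal sketch, the two values can be combined into the standard Gettext Plural-Forms field as follows. The function plural_forms_header is hypothetical and shown for illustration only.

```python
def plural_forms_header(nplurals, pluralequation):
    """Hypothetical helper: build the value of a Gettext Plural-Forms field."""
    if nplurals < 1:
        # The documentation above states that 0 is not a valid value.
        raise ValueError("nplurals must be a positive integer")
    return f"nplurals={nplurals}; plural={pluralequation};"


# Values for Afrikaans as listed in translate.lang.data.languages.
print(plural_forms_header(2, "(n != 1)"))
```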
data¶
This module stores information and functionality that relates to plurals.
- translate.lang.data.cldr_plural_categories = ['zero', 'one', 'two', 'few', 'many', 'other']¶
List of plural tags generated from CLDR 44.0.1 using https://github.com/WeblateOrg/language-data
- translate.lang.data.expansion_factors = {'af': 0.1, 'ar': -0.09, 'es': 0.21, 'fr': 0.28, 'it': 0.2}¶
Source to target string length expansion factors.
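To make the meaning of these factors concrete, the sketch below treats each factor as a proportional change relative to the source length, so 0.28 for French means roughly 28% longer and -0.09 for Arabic means roughly 9% shorter. That proportional reading, and the function name estimated_target_length, are assumptions made for illustration; the library's length_difference() may combine the factor with other terms.

```python
# The expansion_factors value documented above.
EXPANSION_FACTORS = {"af": 0.1, "ar": -0.09, "es": 0.21, "fr": 0.28, "it": 0.2}


def estimated_target_length(source_length, code, factors=EXPANSION_FACTORS):
    """Hypothetical helper: scale a source length by the language's factor."""
    # Languages without a known factor are assumed to keep the same length.
    factor = factors.get(code, 0.0)
    return source_length + round(source_length * factor)


for code in ("fr", "ar", "xx"):
    print(code, estimated_target_length(100, code))
```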
- translate.lang.data.languages = {'ach': ('Acholi', 2, 'n > 1'), 'af': ('Afrikaans', 2, '(n != 1)'), 'ak': ('Akan', 2, 'n > 1'), 'am': ('Amharic', 2, 'n > 1'), 'an': ('Aragonese', 2, '(n != 1)'), 'anp': ('Angika', 2, '(n != 1)'), 'ar': ('Arabic', 6, 'n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5'), 'arn': ('Mapudungun; Mapuche', 2, 'n > 1'), 'as': ('Assamese', 2, '(n != 1)'), 'ast': ('Asturian; Bable; Leonese; Asturleonese', 2, '(n != 1)'), 'ay': ('Aymará', 1, '0'), 'az': ('Azerbaijani', 2, '(n != 1)'), 'be': ('Belarusian', 3, 'n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2'), 'bg': ('Bulgarian', 2, '(n != 1)'), 'bn': ('Bengali', 2, '(n != 1)'), 'bn_BD': ('Bengali (Bangladesh)', 2, '(n != 1)'), 'bn_IN': ('Bengali (India)', 2, '(n != 1)'), 'bo': ('Tibetan', 1, '0'), 'br': ('Breton', 2, 'n > 1'), 'brx': ('Bodo', 2, '(n != 1)'), 'bs': ('Bosnian', 3, 'n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2'), 'ca': ('Catalan; Valencian', 2, '(n != 1)'), 'ca@valencia': ('Catalan; Valencian (Valencia)', 2, '(n != 1)'), 'cgg': ('Chiga', 1, '0'), 'cs': ('Czech', 3, '(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2'), 'csb': ('Kashubian', 3, 'n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2'), 'cy': ('Welsh', 2, '(n==2) ? 
1 : 0'), 'da': ('Danish', 2, '(n != 1)'), 'de': ('German', 2, '(n != 1)'), 'doi': ('Dogri', 2, '(n != 1)'), 'dz': ('Dzongkha', 1, '0'), 'el': ('Greek, Modern (1453-)', 2, '(n != 1)'), 'en': ('English', 2, '(n != 1)'), 'en_GB': ('English (United Kingdom)', 2, '(n != 1)'), 'en_ZA': ('English (South Africa)', 2, '(n != 1)'), 'eo': ('Esperanto', 2, '(n != 1)'), 'es': ('Spanish; Castilian', 2, '(n != 1)'), 'es_AR': ('Argentinean Spanish', 2, '(n != 1)'), 'et': ('Estonian', 2, '(n != 1)'), 'eu': ('Basque', 2, '(n != 1)'), 'fa': ('Persian', 2, 'n > 1'), 'ff': ('Fulah', 2, '(n != 1)'), 'fi': ('Finnish', 2, '(n != 1)'), 'fil': ('Filipino; Pilipino', 2, '(n > 1)'), 'fo': ('Faroese', 2, '(n != 1)'), 'fr': ('French', 2, '(n > 1)'), 'fur': ('Friulian', 2, '(n != 1)'), 'fy': ('Frisian', 2, '(n != 1)'), 'ga': ('Irish', 5, 'n==1 ? 0 : n==2 ? 1 : (n>2 && n<7) ? 2 :(n>6 && n<11) ? 3 : 4'), 'gd': ('Gaelic; Scottish Gaelic', 4, '(n==1 || n==11) ? 0 : (n==2 || n==12) ? 1 : (n > 2 && n < 20) ? 2 : 3'), 'gl': ('Galician', 2, '(n != 1)'), 'gu': ('Gujarati', 2, '(n != 1)'), 'gun': ('Gun', 2, '(n > 1)'), 'ha': ('Hausa', 2, '(n != 1)'), 'he': ('Hebrew', 2, '(n != 1)'), 'hi': ('Hindi', 2, '(n != 1)'), 'hne': ('Chhattisgarhi', 2, '(n != 1)'), 'hr': ('Croatian', 3, '(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 
1 : 2)'), 'ht': ('Haitian; Haitian Creole', 2, '(n != 1)'), 'hu': ('Hungarian', 2, '(n != 1)'), 'hy': ('Armenian', 1, '0'), 'ia': ('Interlingua (International Auxiliary Language Association)', 2, '(n != 1)'), 'id': ('Indonesian', 1, '0'), 'is': ('Icelandic', 2, '(n != 1)'), 'it': ('Italian', 2, '(n != 1)'), 'ja': ('Japanese', 1, '0'), 'jbo': ('Lojban', 1, '0'), 'jv': ('Javanese', 2, '(n != 1)'), 'ka': ('Georgian', 1, '0'), 'kab': ('Kabyle', 2, '(n != 1)'), 'kk': ('Kazakh', 2, 'n != 1'), 'kl': ('Greenlandic', 2, '(n != 1)'), 'km': ('Central Khmer', 1, '0'), 'kn': ('Kannada', 2, '(n != 1)'), 'ko': ('Korean', 1, '0'), 'kok': ('Konkani', 2, '(n != 1)'), 'ks': ('Kashmiri', 2, '(n != 1)'), 'ku': ('Kurdish', 2, '(n != 1)'), 'kw': ('Cornish', 4, '(n==1) ? 0 : (n==2) ? 1 : (n == 3) ? 2 : 3'), 'ky': ('Kirghiz; Kyrgyz', 2, 'n != 1'), 'lb': ('Luxembourgish; Letzeburgesch', 2, '(n != 1)'), 'ln': ('Lingala', 2, '(n > 1)'), 'lo': ('Lao', 1, '0'), 'lt': ('Lithuanian', 3, '(n%10==1 && n%100!=11 ? 0 : n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2)'), 'lv': ('Latvian', 3, '(n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2)'), 'mai': ('Maithili', 2, '(n != 1)'), 'me': ('Montenegrin', 3, 'n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2'), 'mfe': ('Morisyen', 2, '(n > 1)'), 'mg': ('Malagasy', 2, '(n > 1)'), 'mi': ('Maori', 2, '(n > 1)'), 'mk': ('Macedonian', 2, '(n==1 || n%10==1 ? 0 : 1)'), 'ml': ('Malayalam', 2, '(n != 1)'), 'mn': ('Mongolian', 2, '(n != 1)'), 'mni': ('Meithei (Manipuri)', 2, '(n != 1)'), 'mnk': ('Mandinka', 3, '(n==0 ? 0 : n==1 ? 1 : 2)'), 'mr': ('Marathi', 2, '(n != 1)'), 'ms': ('Malay', 1, '0'), 'mt': ('Maltese', 4, '(n==1 ? 0 : n==0 || ( n%100>1 && n%100<11) ? 1 : (n%100>10 && n%100<20 ) ? 
2 : 3)'), 'my': ('Burmese', 1, '0'), 'nah': ('Nahuatl languages', 2, '(n != 1)'), 'nap': ('Neapolitan', 2, '(n != 1)'), 'nb': ('Bokmål, Norwegian; Norwegian Bokmål', 2, '(n != 1)'), 'ne': ('Nepali', 2, '(n != 1)'), 'nl': ('Dutch; Flemish', 2, '(n != 1)'), 'nn': ('Norwegian Nynorsk; Nynorsk, Norwegian', 2, '(n != 1)'), 'nqo': ("N'Ko", 2, '(n > 1)'), 'nso': ('Pedi; Sepedi; Northern Sotho', 2, '(n != 1)'), 'oc': ('Occitan (post 1500)', 2, '(n > 1)'), 'or': ('Odia', 2, '(n != 1)'), 'pa': ('Panjabi; Punjabi', 2, '(n != 1)'), 'pap': ('Papiamento', 2, '(n != 1)'), 'pl': ('Polish', 3, '(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2)'), 'pms': ('Piemontese', 2, '(n != 1)'), 'ps': ('Pushto; Pashto', 2, '(n != 1)'), 'pt': ('Portuguese', 2, '(n != 1)'), 'pt_BR': ('Portuguese (Brazil)', 2, '(n > 1)'), 'rm': ('Romansh', 2, '(n != 1)'), 'ro': ('Romanian', 3, '(n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 1 : 2)'), 'ru': ('Russian', 3, '(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2)'), 'rw': ('Kinyarwanda', 2, '(n != 1)'), 'sa': ('Sanskrit', 3, '(n==1 ? 0 : n==2 ? 1 : 2)'), 'sah': ('Yakut', 1, '0'), 'sat': ('Santali', 2, '(n != 1)'), 'scn': ('Sicilian', 2, '(n != 1)'), 'sco': ('Scots', 2, '(n != 1)'), 'sd': ('Sindhi', 2, '(n != 1)'), 'se': ('Northern Sami', 2, '(n != 1)'), 'si': ('Sinhala; Sinhalese', 2, '(n != 1)'), 'sk': ('Slovak', 3, '(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2'), 'sl': ('Slovenian', 4, '(n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3)'), 'so': ('Somali', 2, '(n != 1)'), 'son': ('Songhai languages', 1, '0'), 'sq': ('Albanian', 2, '(n != 1)'), 'sr': ('Serbian', 3, '(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2)'), 'st': ('Sotho, Southern', 2, '(n != 1)'), 'su': ('Sundanese', 1, '0'), 'sv': ('Swedish', 2, '(n != 1)'), 'sw': ('Swahili', 2, '(n != 1)'), 'szl': ('Silesian', 3, '(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 
1 : 2)'), 'ta': ('Tamil', 2, '(n != 1)'), 'te': ('Telugu', 2, '(n != 1)'), 'tg': ('Tajik', 1, '0'), 'th': ('Thai', 1, '0'), 'ti': ('Tigrinya', 2, '(n > 1)'), 'tk': ('Turkmen', 2, '(n != 1)'), 'tr': ('Turkish', 2, '(n != 1)'), 'tt': ('Tatar', 1, '0'), 'ug': ('Uighur; Uyghur', 1, '0'), 'uk': ('Ukrainian', 3, '(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2)'), 'ur': ('Urdu', 2, '(n != 1)'), 'uz': ('Uzbek', 2, '(n > 1)'), 've': ('Venda', 2, '(n != 1)'), 'vi': ('Vietnamese', 1, '0'), 'wa': ('Walloon', 2, '(n > 1)'), 'wo': ('Wolof', 2, '(n != 1)'), 'yo': ('Yoruba', 2, '(n != 1)'), 'yue': ('Yue', 1, '0'), 'zh_CN': ('Chinese (China)', 1, '0'), 'zh_HK': ('Chinese (Hong Kong)', 1, '0'), 'zh_TW': ('Chinese (Taiwan)', 1, '0'), 'zu': ('Zulu', 2, '(n != 1)')}¶
Dictionary of language data. The language code is the dictionary key (which may contain country codes and modifiers). The value is a tuple: (Full name in English from iso-codes, nplurals, plural equation).
Note that the English names should not be used in user-facing places: they should always be passed through the function returned from tr_lang(), or at least through _fix_language_name().
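Since each value is a (name, nplurals, plural equation) tuple, one plausible use is building the Plural-Forms line of a PO header. The sketch below works on a small excerpt copied from the dictionary above; `plural_forms_header` is a hypothetical helper name, not a function of this module.

```python
# A three-entry excerpt of the documented languages dictionary.
LANGUAGES_EXCERPT = {
    "af": ("Afrikaans", 2, "(n != 1)"),
    "cs": ("Czech", 3, "(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2"),
    "ja": ("Japanese", 1, "0"),
}


def plural_forms_header(code):
    """Hypothetical helper: format a Gettext Plural-Forms header line."""
    _fullname, nplurals, equation = LANGUAGES_EXCERPT[code]
    return f"Plural-Forms: nplurals={nplurals}; plural={equation};"


for code in LANGUAGES_EXCERPT:
    print(code, "->", plural_forms_header(code))
```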
- translate.lang.data.normalize(string, normal_form='NFC')¶
Return a Unicode string in its normalized form.
- Parameters:
string – The string to be normalized
normal_form – NFC (default), NFD, NFKC, NFKD
- Returns:
Normalized string
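The effect of the normal forms can be seen with the standard library alone. The sketch below assumes the function behaves like `unicodedata.normalize` with the arguments swapped; `normalize_sketch` is a stand-in written for this example, not the library function itself.

```python
import unicodedata


def normalize_sketch(string, normal_form="NFC"):
    """Stand-in for illustration: delegate to the standard library."""
    return unicodedata.normalize(normal_form, string)


decomposed = "e\u0301"  # "e" followed by a combining acute accent
composed = normalize_sketch(decomposed)  # NFC merges them into U+00E9

print(len(decomposed), len(composed))
print(len(normalize_sketch(composed, "NFD")))
```

NFC composes characters where a precomposed form exists, while NFD splits them back into a base character plus combining marks.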
- translate.lang.data.scripts = {'Beng': ['bn', 'mni'], 'Deva': ['anp', 'bho', 'brx', 'doi', 'hi', 'kfy', 'kok', 'mai', 'mr', 'sa', 'sat'], 'Gujr': ['gu'], 'Khmr': ['km'], 'Knda': ['kn'], 'Laoo': ['lo'], 'Mlym': ['ml'], 'Mymr': ['my', 'shn'], 'Orya': ['or'], 'Sind': ['sd'], 'Taml': ['ta'], 'Tibt': ['bo'], 'assamese': ['as'], 'chinese': ['yue'], 'perso-arabic': ['ks']}¶
Dictionary of script data. The dictionary keys are ISO 15924 script codes, or script names where a script is missing from the standard. Each value is a list of codes for the languages using that script.
This is mainly used to alter the behavior of some checks (the accelerators one for example).
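Because the dictionary maps scripts to languages, finding the script of a given language needs a reverse lookup. The sketch below shows one way to do that on an excerpt of the data above; `script_for_language` is a hypothetical helper, not part of the module.

```python
# Excerpt of the documented scripts dictionary.
SCRIPTS_EXCERPT = {
    "Beng": ["bn", "mni"],
    "Gujr": ["gu"],
    "Khmr": ["km"],
}


def script_for_language(code):
    """Hypothetical helper: return the script key listing `code`, else None."""
    for script, codes in SCRIPTS_EXCERPT.items():
        if code in codes:
            return script
    return None


print(script_for_language("mni"))
print(script_for_language("de"))
```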
- translate.lang.data.simplercode(code)¶
This attempts to simplify the given language code by ignoring country codes, for example.
- translate.lang.data.simplify_to_common(language_code)¶
Simplify language code to the most commonly used form for the language, stripping country information for languages that tend not to be localized differently for different countries.
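To make the idea of simplifying a code concrete, the sketch below strips one trailing component (modifier or country) at a time. This is only an illustration of the concept under assumed separators (`@`, `_`, `-`); the helpers `strip_last_component` and `fallback_chain` are hypothetical, and the real functions may handle separators and edge cases differently.

```python
def strip_last_component(code):
    """Hypothetical helper: drop the last '@', '_' or '-' separated part.

    Returns an empty string when nothing is left to strip.
    """
    cut = max(code.rfind(separator) for separator in "@_-")
    return code[:cut] if cut > 0 else ""


def fallback_chain(code):
    """Yield the code followed by each progressively simpler form."""
    while code:
        yield code
        code = strip_last_component(code)


print(list(fallback_chain("sr_YU@Latn")))
```

A chain like this can be walked when looking up data that may be registered only under the simpler code.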
de¶
This module represents the German language.
See also
- class translate.lang.de.de(code)¶
This class represents German.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['simplecaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are lists of the names of the tests to ignore in that checker. The special ‘all’ checker name indicates that the tests must be ignored in all checkers.
Checker names must be given as lowercase strings, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
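The lookup implied by this structure can be sketched as follows, using the German value `{'all': ['simplecaps']}`. The function `is_test_ignored` is a hypothetical helper written for this example; how the checkers actually consult the dictionary is not shown in this reference.

```python
def is_test_ignored(ignoretests, checker_name, test_name):
    """Hypothetical helper: check the special 'all' key, then the checker's own key."""
    for key in ("all", checker_name.lower()):
        if test_name in ignoretests.get(key, []):
            return True
    return False


german_ignoretests = {"all": ["simplecaps"]}

print(is_test_ignored(german_ignoretests, "mozilla", "simplecaps"))
print(is_test_ignored(german_ignoretests, "mozilla", "startcaps"))
```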
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of the given length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
The plural equation for languages that use a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value - it must be overridden. Any positive integer is valid (it should probably be between 1 and 6).
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
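The role of sentenceend can be illustrated with a deliberately naive splitter that cuts after every sentence-ending mark. This is a sketch under simplifying assumptions, not the library's algorithm: it uses only a subset of the marks listed above, and `split_sentences` is a hypothetical name.

```python
# A subset of the documented sentenceend characters.
SENTENCE_END = ".!?…"


def split_sentences(text, strip=True):
    """Naive sketch: end a sentence after every sentence-ending mark."""
    sentences = []
    current = []
    for char in text:
        current.append(char)
        if char in SENTENCE_END:
            sentences.append("".join(current))
            current = []
    if current:
        # Trailing text without a closing mark still counts as a sentence.
        sentences.append("".join(current))
    if strip:
        sentences = [sentence.strip() for sentence in sentences]
    return [sentence for sentence in sentences if sentence]


print(split_sentences("Hello world. How are you?"))
```

A splitter this simple mishandles abbreviations and repeated marks such as "...", which is one reason a real implementation needs more care.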
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lowercase and uppercase, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
el¶
This module represents the Greek language.
See also
- class translate.lang.el.el(code)¶
This class represents Greek.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are lists of the names of the tests to ignore in that checker. The special ‘all’ checker name indicates that the tests must be ignored in all checkers.
Checker names must be given as lowercase strings, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of the given length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
The plural equation for languages that use a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value - it must be overridden. Any positive integer is valid (it should probably be between 1 and 6).
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {';': '·', '?': ';'}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
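The Greek puncdict shows why a transformation of this kind should be applied in a single pass: `?` maps to `;`, and `;` in turn maps to `·`, so two chained replacements would turn every question mark into a middle dot. The sketch below uses `str.translate` to avoid that. It is a simplified illustration with a hypothetical function name and may not match how punctranslate decides which characters to convert.

```python
# Copied from the documented Greek puncdict.
GREEK_PUNCDICT = {";": "·", "?": ";"}

# A translation table replaces each character exactly once.
GREEK_TABLE = str.maketrans(GREEK_PUNCDICT)


def translate_punctuation(text):
    """Sketch: map each punctuation mark through the table in one pass."""
    return text.translate(GREEK_TABLE)


print(translate_punctuation("a? b; c"))
```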
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!;…'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890αβγδεζηθικλμνξοπρστυφχψωΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ'¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lowercase and uppercase, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
es¶
This module represents the Spanish language.
Note
As it only has special case code for initial inverted punctuation, it could also be used for Asturian, Galician, or Catalan.
- class translate.lang.es.es(code)¶
This class represents Spanish.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are lists of the names of the tests to ignore in that checker. The special ‘all’ checker name indicates that the tests must be ignored in all checkers.
Checker names must be given as lowercase strings, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of the given length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
The plural equation for languages that use a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value - it must be overridden. Any positive integer is valid (it should probably be between 1 and 6).
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Implement some extra features for inverted punctuation.
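The general idea of initial inverted punctuation can be sketched as follows: a sentence ending in `?` or `!` gets the matching inverted mark at the start. This is an assumption-laden illustration, not the method's actual logic, which this reference does not describe; `add_inverted_punctuation` is a hypothetical name.

```python
# Closing mark -> the inverted mark that opens the sentence.
INVERTED = {"?": "¿", "!": "¡"}


def add_inverted_punctuation(sentence):
    """Sketch: prefix the inverted mark that matches the closing mark."""
    text = sentence.strip()
    if not text:
        return text
    opener = INVERTED.get(text[-1])
    if opener and not text.startswith(opener):
        return opener + text
    return text


print(add_inverted_punctuation("Como estas?"))
print(add_inverted_punctuation("¡Hola!"))
```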
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lowercase and uppercase, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
factory¶
This module provides a factory to instantiate language classes.
- translate.lang.factory.get_all_languages()¶
Return all language classes.
- translate.lang.factory.getlanguage(code)¶
This returns a language class.
- Parameters:
code – The ISO 639 language code
fa¶
This module represents the Persian language.
See also
- class translate.lang.fa.fa(code)¶
This class represents Persian.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are lists of the names of the tests to ignore in that checker. The special ‘all’ checker name indicates that the tests must be ignored in all checkers.
Checker names must be given as lowercase strings, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of the given length.
- listseperator = '، '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
The plural equation for languages that use a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value - it must be overridden. Any positive integer is valid (it should probably be between 1 and 6).
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = (('0', '٠'), ('1', '١'), ('2', '٢'), ('3', '٣'), ('4', '٤'), ('5', '٥'), ('6', '٦'), ('7', '٧'), ('8', '٨'), ('9', '٩'), ('0', '۰'), ('1', '۱'), ('2', '۲'), ('3', '۳'), ('4', '۴'), ('5', '۵'), ('6', '۶'), ('7', '۷'), ('8', '۸'), ('9', '۹'))¶
A tuple of number transformation rules that can be used by numbertranslate().
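One simple way to apply a tuple of (source, target) rules is to run them in order with `str.replace`, as sketched below. The tuple is rebuilt from code points to mirror the Persian value above: Arabic-Indic digits (U+0660 to U+0669) first, then Extended Arabic-Indic digits (U+06F0 to U+06F9). The function `translate_numbers` is a hypothetical stand-in; the real numbertranslate() may apply the rules differently.

```python
# Rebuild the documented tuple: ten Arabic-Indic rules, then ten
# Extended Arabic-Indic rules, each mapping an ASCII digit to its target.
NUMBERTUPLE = tuple(
    (str(digit), chr(base + digit))
    for base in (0x0660, 0x06F0)
    for digit in range(10)
)


def translate_numbers(text, rules=NUMBERTUPLE):
    """Sketch: apply each (source, target) rule in order."""
    for source, target in rules:
        text = text.replace(source, target)
    return text


print(translate_numbers("Page 12"))
```

With in-order replacement, the first rule for each ASCII digit consumes every occurrence, so the second set of rules has nothing left to match in this sketch.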
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {',': '،', ';': '؛', '?': '؟'}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Implement “French” quotation marks.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lowercase and uppercase, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
fi¶
This module represents the Finnish language.
- class translate.lang.fi.fi(code)¶
This class represents Finnish.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are lists of the names of the tests to ignore in that checker. The special ‘all’ checker name indicates that the tests must be ignored in all checkers.
Checker names must be given as lowercase strings, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of the given length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
The plural equation for languages that use a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value - it must be overridden. Any positive integer is valid (it should probably be between 1 and 6).
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts
- validaccel = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890äöÄÖ'¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
fr¶
This module represents the French language.
See also
- class translate.lang.fr.fr(code)¶
This class represents French.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are list of names for the ignored tests in the checker. A special ‘all’ checker name can be used to tell that the tests must be ignored in all the checkers.
Listed checkers to ignore tests on must be lowercase strings for the checker name, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of length length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
The plural equation for languages whose plural formula in Mozilla differs from the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value - it must be overridden. Any positive integer is valid (it should probably be between 1 and 6)
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
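As a rough illustration of how nplurals and pluralequation end up in a PO header, the sketch below formats a Plural-Forms line. The helper name plural_forms_header is hypothetical and not part of translate.lang. The French values (2 forms, n > 1) are the commonly used Gettext settings and are assumed here, since the defaults listed above are only placeholders.

```python
def plural_forms_header(nplurals, pluralequation):
    """Format a Gettext Plural-Forms header line (hypothetical helper)."""
    if nplurals < 1:
        # 0 is the "not overridden" placeholder and is not valid in a header.
        raise ValueError("nplurals must be a positive integer")
    return f"Plural-Forms: nplurals={nplurals}; plural={pluralequation};"


# Assumed values for French; check the language class before relying on them.
print(plural_forms_header(2, "(n > 1)"))
```

Rejecting 0 mirrors the note on nplurals that the default must be overridden before it is used.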
- puncdict = {'!': '\xa0!', '#': '\xa0#', ':': '\xa0:', ';': '\xa0;', '?': '\xa0?'}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Implement some extra features for quotation marks.
- Known shortcomings:
% and $ are not touched yet for fear of variables
Double spaces might be introduced
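A minimal sketch of what a puncdict-driven replacement can look like, using the French mapping listed above. This is not the library's implementation of punctranslate(); apply_puncdict is a hypothetical helper that simply substitutes every key, which is enough to show the double-space shortcoming.

```python
# Copied from the puncdict attribute shown above; "\u00a0" is a no-break space.
FR_PUNCDICT = {
    "!": "\u00a0!",
    "#": "\u00a0#",
    ":": "\u00a0:",
    ";": "\u00a0;",
    "?": "\u00a0?",
}


def apply_puncdict(text, puncdict):
    """Replace each single-character key with its mapped string in one pass."""
    return text.translate(str.maketrans(puncdict))


print(repr(apply_puncdict("Attention!", FR_PUNCDICT)))
print(repr(apply_puncdict("Quoi ?", FR_PUNCDICT)))
```

In the second call the input already has a space before the question mark, so the result contains a regular space followed by a no-break space.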
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts
- validaccel = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890éÉ'¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper, are included in the list.
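The following sketch shows one way a caller might test a candidate accelerator against validaccel. The helper is_valid_accelerator is hypothetical, and treating None as "no restriction" is an assumption made for the example, not documented behaviour.

```python
# Copied from the French validaccel attribute shown above.
FR_VALIDACCEL = (
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890éÉ"
)


def is_valid_accelerator(char, validaccel):
    """Return True if a single character may be used as an accelerator."""
    if validaccel is None:
        # Assumption: None means the language places no restriction.
        return True
    return len(char) == 1 and char in validaccel


print(is_valid_accelerator("é", FR_VALIDACCEL))
print(is_valid_accelerator("ç", FR_VALIDACCEL))
```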
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
gu¶
This module represents the Gujarati language.
See also
- class translate.lang.gu.gu(code)¶
This class represents Gujarati.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are list of names for the ignored tests in the checker. A special ‘all’ checker name can be used to tell that the tests must be ignored in all the checkers.
Listed checkers to ignore tests on must be lowercase strings for the checker name, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of length length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
The plural equation for languages whose plural formula in Mozilla differs from the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value - it must be overridden. Any positive integer is valid (it should probably be between 1 and 6)
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
he¶
This module represents the Hebrew language.
See also
- class translate.lang.he.he(code)¶
This class represents Hebrew.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['acronyms', 'simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are list of names for the ignored tests in the checker. A special ‘all’ checker name can be used to tell that the tests must be ignored in all the checkers.
Listed checkers to ignore tests on must be lowercase strings for the checker name, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
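The lookup described above, where the special ‘all’ key applies to every checker, can be sketched as follows. The helper ignored_tests_for is hypothetical and only illustrates the dictionary layout; it is not how filters.checks consumes the attribute.

```python
# Copied from the Hebrew ignoretests attribute shown above.
HE_IGNORETESTS = {"all": ["acronyms", "simplecaps", "startcaps"]}


def ignored_tests_for(checker_name, ignoretests):
    """Collect the test names to skip for one checker (hypothetical helper)."""
    # Tests listed under 'all' are ignored by every checker.
    names = set(ignoretests.get("all", []))
    # Keys are lowercase checker names such as "mozilla" or "libreoffice".
    names.update(ignoretests.get(checker_name.lower(), []))
    return names


print(sorted(ignored_tests_for("mozilla", HE_IGNORETESTS)))
```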
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of length length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
The plural equation for languages whose plural formula in Mozilla differs from the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value - it must be overridden. Any positive integer is valid (it should probably be between 1 and 6)
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
hi¶
This module represents the Hindi language.
See also
- class translate.lang.hi.hi(code)¶
This class represents Hindi.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are list of names for the ignored tests in the checker. A special ‘all’ checker name can be used to tell that the tests must be ignored in all the checkers.
Listed checkers to ignore tests on must be lowercase strings for the checker name, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of length length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
The plural equation for languages whose plural formula in Mozilla differs from the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value - it must be overridden. Any positive integer is valid (it should probably be between 1 and 6)
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
hy¶
This module represents the Armenian language.
See also
- class translate.lang.hy.hy(code)¶
This class represents Armenian.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are list of names for the ignored tests in the checker. A special ‘all’ checker name can be used to tell that the tests must be ignored in all the checkers.
Listed checkers to ignore tests on must be lowercase strings for the checker name, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of length length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = 'n!=1 ? 1 : 0'¶
The plural equation for languages whose plural formula in Mozilla differs from the standard one in Gettext.
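To make the value 'n!=1 ? 1 : 0' concrete, here is a Python rendering of that C-style expression. The function name plural_index is hypothetical; the sketch only shows which plural form index the equation selects for a given count.

```python
def plural_index(n):
    """Python equivalent of the C-style expression 'n!=1 ? 1 : 0'."""
    return 1 if n != 1 else 0


# Index 0 is chosen only for a count of exactly one.
for n in (0, 1, 2):
    print(n, plural_index(n))
```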
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value - it must be overridden. Any positive integer is valid (it should probably be between 1 and 6)
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {'!': '՜', '.': '։', ':': '՝', '?': '՞'}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»…±°¹²³·©®×£¥€։՝՜՞'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '։՝՜…'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
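A simplified sketch of how a sentenceend string can drive sentence splitting. This is not the algorithm used by sentences(); split_sentences is a hypothetical helper that breaks after any sentence-end mark followed by whitespace or the end of the text.

```python
# Copied from the Armenian sentenceend attribute shown above.
HY_SENTENCEEND = "։՝՜…"


def split_sentences(text, sentenceend, strip=True):
    """Split text after sentence-end marks (illustrative only)."""
    sentences, current = [], []
    for i, ch in enumerate(text):
        current.append(ch)
        following = text[i + 1] if i + 1 < len(text) else ""
        # Only break when the mark is followed by whitespace or ends the
        # text, so a mark inside a token (e.g. "3.5") does not split.
        if ch in sentenceend and (following == "" or following.isspace()):
            sentences.append("".join(current))
            current = []
    if current:
        sentences.append("".join(current))
    if strip:
        sentences = [s.strip() for s in sentences]
    return [s for s in sentences if s]


print(split_sentences("Բարև։ Ինչպես ես։", HY_SENTENCEEND))
```

The strip parameter mirrors the strip=True argument in the signature above: when set, surrounding whitespace is removed from each sentence.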
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
identify¶
This module contains functions for identifying languages based on language models.
ja¶
This module represents the Japanese language.
See also
- class translate.lang.ja.ja(code)¶
This class represents Japanese.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are list of names for the ignored tests in the checker. A special ‘all’ checker name can be used to tell that the tests must be ignored in all the checkers.
Listed checkers to ignore tests on must be lowercase strings for the checker name, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of length length.
- listseperator = '、、,,'¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
The plural equation for languages whose plural formula in Mozilla differs from the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value - it must be overridden. Any positive integer is valid (it should probably be between 1 and 6)
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {',\n': '、\n', ', ': '、', '.\n': '。\n', '. ': '。'}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
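The Japanese puncdict uses multi-character keys (a mark plus the following space or newline), so a plain character translation is not enough. The sketch below applies such a mapping in a single regular-expression pass. It is an illustration under that assumption, not the library's punctranslate() code, and apply_puncdict is a hypothetical name.

```python
import re

# Copied from the Japanese puncdict attribute shown above.
JA_PUNCDICT = {",\n": "、\n", ", ": "、", ".\n": "。\n", ". ": "。"}


def apply_puncdict(text, puncdict):
    """Replace every key of puncdict found in text, longest keys first."""
    if not puncdict:
        return text
    keys = sorted(puncdict, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    # A single pass avoids re-processing text that was already replaced.
    return pattern.sub(lambda match: puncdict[match.group(0)], text)


print(repr(apply_puncdict("Hello, world. Bye.\n", JA_PUNCDICT)))
```

Because each key includes the trailing whitespace, a full stop inside a number such as 3.5 is left alone.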
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '。。!?!?…'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
km¶
This module represents the Khmer language.
See also
- class translate.lang.km.km(code)¶
This class represents Khmer.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names, and values are lists of the names of the tests to ignore in that checker. The special checker name ‘all’ means the listed tests are ignored in all checkers.
Checker names used as keys must be lowercase strings, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- khmerpunc = '។៕៖៘'¶
These marks are only used for Khmer.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of the given length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = 'n!=1 ? 1 : 0'¶
This is for languages that have a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value - it must be overridden. Any positive integer is valid (it should probably be between 1 and 6)
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
- puncdict = {'!': '\xa0!', '.': '\xa0។', ':': '\xa0៖', '?': '\xa0?'}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»…±°¹²³·©®×£¥€។៕៖៘'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '!?…។៕៘'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper case, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
kn¶
This module represents the Kannada language.
- class translate.lang.kn.kn(code)¶
This class represents Kannada.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use the Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect names should have the form:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names, and values are lists of the names of the tests to ignore in that checker. The special checker name ‘all’ means the listed tests are ignored in all checkers.
Checker names used as keys must be lowercase strings, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of the given length.
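The relationship between a length estimate and a padded or truncated string can be sketched as follows. This is only an illustration: the function names, the linear formula, and the constant, factor, and filler parameters are assumptions made for the example, not the values or the algorithm used by the library.

```python
def length_difference_sketch(length, constant=0, factor=0.0):
    """Estimate how many characters a translation gains or loses.

    Hypothetical linear model: a fixed offset plus a share of the
    English length. Negative results mean the text gets shorter.
    """
    return constant + int(factor * length)


def alter_length_sketch(text, constant=0, factor=0.0, filler="x"):
    """Pad or truncate text to the estimated translation length."""
    diff = length_difference_sketch(len(text), constant, factor)
    if diff >= 0:
        return text + filler * diff
    # Negative difference: drop characters from the end.
    return text[:diff]


print(alter_length_sketch("Save", constant=2, factor=0.5))
print(alter_length_sketch("Settings", factor=-0.25))
```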
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
This is for languages that have a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value - it must be overridden. Any positive integer is valid (it should probably be between 1 and 6)
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
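For illustration, nplurals and pluralequation could be combined into the value of a Gettext Plural-Forms header as sketched below. The function name is hypothetical. The nplurals value of 1 in the usage line is an assumed override, since the default of 0 is documented as invalid.

```python
def plural_forms_header(nplurals, pluralequation):
    """Build the value of a Gettext 'Plural-Forms' header.

    Hypothetical helper: rejects the placeholder value 0, which the
    documentation above says must be overridden.
    """
    if nplurals < 1:
        raise ValueError("nplurals must be a positive integer")
    return "nplurals=%d; plural=%s;" % (nplurals, pluralequation)


# Assumed override: one plural form with the documented equation '0'.
print("Plural-Forms: " + plural_forms_header(1, "0"))
```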
- puncdict = {}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper case, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
ko¶
This module represents the Korean language.
- class translate.lang.ko.ko(code)¶
This class represents Korean.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use the Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect names should have the form:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names, and values are lists of the names of the tests to ignore in that checker. The special checker name ‘all’ means the listed tests are ignored in all checkers.
Checker names used as keys must be lowercase strings, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of the given length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
This is for languages that have a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value - it must be overridden. Any positive integer is valid (it should probably be between 1 and 6)
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
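One way such a check could work is sketched below: leading whitespace and punctuation are skipped, and the first remaining character is tested. The function name and the exact stripping rule are assumptions for the example; the punctuation string is the commonpunc value documented above.

```python
# The commonpunc value documented above, written as a Python literal.
COMMONPUNC = ".,;:!?-@#$%^*_()[]{}/\\'`\"<>"


def starts_with_number(text, punctuation=COMMONPUNC):
    """Return True if the first non-punctuation character is a digit.

    Hypothetical sketch: leading whitespace, then leading punctuation,
    is stripped before the first character is inspected.
    """
    stripped = text.lstrip().lstrip(punctuation)
    return bool(stripped) and stripped[0].isdigit()


print(starts_with_number("(3 items)"))
print(starts_with_number("items: 3"))
```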
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
- puncdict = {}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper case, are included in the list.
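A check against this attribute might look like the sketch below. The function name is hypothetical, and treating None as "no restriction" is an assumption made for the example rather than documented behaviour.

```python
def is_valid_accelerator(char, validaccel=None):
    """Return True if char may be used as an accelerator key.

    Hypothetical sketch. Assumption: a value of None means the
    language places no restriction on accelerator characters.
    """
    if validaccel is None:
        return True
    return char in validaccel


# Example with an assumed list that contains both cases of each letter.
letters = "abcABC"
print(is_valid_accelerator("B", letters))
print(is_valid_accelerator("z", letters))
```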
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
ml¶
This module represents the Malayalam language.
- class translate.lang.ml.ml(code)¶
This class represents Malayalam.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use the Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect names should have the form:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names, and values are lists of the names of the tests to ignore in that checker. The special checker name ‘all’ means the listed tests are ignored in all checkers.
Checker names used as keys must be lowercase strings, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of the given length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
This is for languages that have a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value - it must be overridden. Any positive integer is valid (it should probably be between 1 and 6)
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
- puncdict = {}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper case, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
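The relationship between word_iter and words can be pictured with the sketch below, where the list form simply collects what the iterator yields. The splitting rule (whitespace separation, then trimming surrounding punctuation) and the function names are assumptions for the example and may differ from what the library does.

```python
# The commonpunc value documented above, written as a Python literal.
COMMONPUNC = ".,;:!?-@#$%^*_()[]{}/\\'`\"<>"


def word_iter_sketch(text, punctuation=COMMONPUNC):
    """Yield words: whitespace-separated chunks without edge punctuation."""
    for chunk in text.split():
        word = chunk.strip(punctuation)
        if word:
            yield word


def words_sketch(text, punctuation=COMMONPUNC):
    """Return the words as a list by exhausting the iterator."""
    return list(word_iter_sketch(text, punctuation))


print(words_sketch("Hello, world! (again)"))
```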
mr¶
This module represents the Marathi language.
- class translate.lang.mr.mr(code)¶
This class represents Marathi.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use the Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect names should have the form:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names, and values are lists of the names of the tests to ignore in that checker. The special checker name ‘all’ means the listed tests are ignored in all checkers.
Checker names used as keys must be lowercase strings, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of the given length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
This is for languages that have a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value - it must be overridden. Any positive integer is valid (it should probably be between 1 and 6)
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
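The sketch below shows one plausible shape for such rules: a tuple of (source digit, target digit) pairs applied by plain string replacement. The pair layout, the function name, and the Devanagari digit table are assumptions for illustration. The documented default for this class is an empty tuple, which leaves text unchanged.

```python
# Assumed rule format: pairs of (Western digit, Devanagari digit).
DEVANAGARI_PAIRS = tuple(zip("0123456789", "०१२३४५६७८९"))


def numbertranslate_sketch(text, numbertuple=()):
    """Replace each source digit with its target, one rule at a time."""
    for source, target in numbertuple:
        text = text.replace(source, target)
    return text


print(numbertranslate_sketch("Page 12 of 30", DEVANAGARI_PAIRS))
# With the documented default (an empty tuple) nothing changes.
print(numbertranslate_sketch("Page 12 of 30"))
```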
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
- puncdict = {}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper case, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
ne¶
This module represents the Nepali language.
- class translate.lang.ne.ne(code)¶
This class represents Nepali.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use the Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect names should have the form:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['accelerators', 'simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names, and values are lists of the names of the tests to ignore in that checker. The special checker name ‘all’ means the listed tests are ignored in all checkers.
Checker names used as keys must be lowercase strings, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of the given length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
This is for languages that have a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value - it must be overridden. Any positive integer is valid (it should probably be between 1 and 6)
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
- puncdict = {'.': ' ।', '?': ' ?'}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '।!?…'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
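To show how the sentenceend marks listed for Nepali could drive sentence splitting, here is a simplified sketch. It closes a sentence at every end mark, so it does not handle abbreviations or runs of marks such as "?!". The function name and this splitting rule are assumptions, not the library's algorithm.

```python
# The sentenceend value documented above for Nepali.
SENTENCEEND = "।!?…"


def sentences_sketch(text, sentenceend=SENTENCEEND, strip=True):
    """Split text after each sentence-end mark and return the pieces."""
    pieces = []
    current = []
    for char in text:
        current.append(char)
        if char in sentenceend:
            pieces.append("".join(current))
            current = []
    if current:
        # Trailing text without a closing mark is kept as a sentence.
        pieces.append("".join(current))
    if strip:
        pieces = [piece.strip() for piece in pieces]
    # Drop pieces that are empty (for example whitespace-only remainders
    # once stripped).
    return [piece for piece in pieces if piece]


print(sentences_sketch("नमस्ते। के छ?"))
```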
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper case, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
ngram¶
Ngram models for language guessing.
Note
Original code from http://thomas.mangin.me.uk/data/source/ngram.py
pa¶
This module represents the Punjabi language.
- class translate.lang.pa.pa(code)¶
This class represents Punjabi.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use the Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect names should have the form:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names, and values are lists of the names of the tests to ignore in that checker. The special checker name ‘all’ means the listed tests are ignored in all checkers.
Checker names used as keys must be lowercase strings, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of the given length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
This is for languages that have a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value - it must be overridden. Any positive integer is valid (it should probably be between 1 and 6)
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
- puncdict = {'.\n': '।\n', '. ': '। '}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
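The effect of the Punjabi puncdict shown above can be sketched with plain string replacement. This is a simplified illustration under the assumption that each key is replaced wherever it occurs. The function name is hypothetical, and the library's own method may handle additional cases.

```python
# The puncdict value documented above for Punjabi: a full stop followed
# by a newline or a space becomes a danda followed by the same whitespace.
PUNCDICT = {".\n": "।\n", ". ": "। "}


def punctranslate_sketch(text, puncdict=PUNCDICT):
    """Apply every punctuation rule to the text by direct replacement."""
    for source, target in puncdict.items():
        text = text.replace(source, target)
    return text


print(repr(punctranslate_sketch("One. Two.\n")))
# A full stop that is not followed by whitespace is left alone.
print(punctranslate_sketch("version 1.2"))
```

Because the keys include the trailing whitespace, a full stop inside a token such as a version number is not affected.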
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '।!?…'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
poedit¶
Functions to manage Poedit’s language features.
Note
The ISO 639 maps are from Poedit’s isocode.cpp (v1.4.2) to ensure that we match currently released versions of Poedit.
- translate.lang.poedit.dialects = {'Chinese': {'CHINA': 'zh_CN', 'None': 'zh_CN', 'TAIWAN': 'zh_TW'}, 'English': {'None': 'en', 'SOUTH AFRICA': 'en_ZA', 'UNITED KINGDOM': 'en_GB'}, 'Portuguese': {'BRAZIL': 'pt_BR', 'None': 'pt', 'PORTUGAL': 'pt'}}¶
Language dialects based on ISO 3166 country names; ‘None’ is the default fallback.
- translate.lang.poedit.isocode(language, country=None)¶
Returns a language code for the given Poedit language name.
Poedit uses language and country names in the PO header entries:
X-Poedit-Language
X-Poedit-Country
This function converts the supplied language name into the required ISO 639 code. If needed, in the case of dialects, the country name is used to create an xx_YY style dialect code.
- Parameters:
language (String) – Language name
country (String) – Country name
- Returns:
ISO 639 language code
- Return type:
String
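The lookup described above can be sketched as follows. This is a simplified stand-in rather than the module's code: lookup_isocode is a hypothetical name, the two tables are small subsets of the lang_names and dialects mappings listed in this module, and the fallback behaviour for an unrecognised country is an assumption based on the note that ‘None’ is the default.

```python
# Subsets of the lang_names and dialects mappings documented in this module.
LANG_NAMES = {"Afrikaans": "af", "Chinese": "zh", "English": "en", "Portuguese": "pt"}
DIALECTS = {
    "Chinese": {"CHINA": "zh_CN", "None": "zh_CN", "TAIWAN": "zh_TW"},
    "English": {"None": "en", "SOUTH AFRICA": "en_ZA", "UNITED KINGDOM": "en_GB"},
    "Portuguese": {"BRAZIL": "pt_BR", "None": "pt", "PORTUGAL": "pt"},
}


def lookup_isocode(language, country=None):
    """Return a language code for a Poedit language name, or None if unknown.

    Hypothetical sketch: languages with dialects are resolved through the
    country name, falling back to the 'None' entry (assumed behaviour).
    """
    dialect_map = DIALECTS.get(language)
    if dialect_map is not None:
        key = country if country in dialect_map else "None"
        return dialect_map[key]
    return LANG_NAMES.get(language)


print(lookup_isocode("Portuguese", "BRAZIL"))
print(lookup_isocode("Chinese"))
```

Note that the country keys in the dialects table are upper case, so this sketch expects the country name in that form.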
- translate.lang.poedit.lang_codes = {'aa': 'Afar', 'ab': 'Abkhazian', 'ae': 'Avestan', 'af': 'Afrikaans', 'am': 'Amharic', 'ar': 'Arabic', 'as': 'Assamese', 'ay': 'Aymara', 'az': 'Azerbaijani', 'ba': 'Bashkir', 'be': 'Belarusian', 'bg': 'Bulgarian', 'bh': 'Bihari', 'bi': 'Bislama', 'bn': 'Bengali', 'bo': 'Tibetan', 'br': 'Breton', 'bs': 'Bosnian', 'ca': 'Catalan', 'ce': 'Chechen', 'ch': 'Chamorro', 'co': 'Corsican', 'cs': 'Czech', 'cu': 'Church Slavic', 'cv': 'Chuvash', 'cy': 'Welsh', 'da': 'Danish', 'de': 'German', 'dz': 'Dzongkha', 'el': 'Greek', 'en': 'English', 'eo': 'Esperanto', 'es': 'Spanish', 'et': 'Estonian', 'eu': 'Basque', 'fa': 'Persian', 'fi': 'Finnish', 'fj': 'Fijian', 'fo': 'Faroese', 'fr': 'French', 'fur': 'Friulian', 'fy': 'Frisian', 'ga': 'Irish', 'gd': 'Gaelic', 'gl': 'Galician', 'gn': 'Guarani', 'gu': 'Gujarati', 'ha': 'Hausa', 'he': 'Hebrew', 'hi': 'Hindi', 'ho': 'Hiri Motu', 'hr': 'Croatian', 'hu': 'Hungarian', 'hy': 'Armenian', 'hz': 'Herero', 'ia': 'Interlingua', 'id': 'Indonesian', 'ie': 'Interlingue', 'ik': 'Inupiaq', 'is': 'Icelandic', 'it': 'Italian', 'iu': 'Inuktitut', 'ja': 'Japanese', 'jw': 'Javanese', 'ka': 'Georgian', 'ki': 'Kikuyu', 'kj': 'Kuanyama', 'kk': 'Kazakh', 'kl': 'Kalaallisut', 'km': 'Khmer', 'kn': 'Kannada', 'ko': 'Korean', 'ks': 'Kashmiri', 'ku': 'Kurdish', 'kv': 'Komi', 'kw': 'Cornish', 'ky': 'Kyrgyz', 'la': 'Latin', 'lb': 'Letzeburgesch', 'ln': 'Lingala', 'lo': 'Lao', 'lt': 'Lithuanian', 'lv': 'Latvian', 'mg': 'Malagasy', 'mh': 'Marshall', 'mi': 'Maori', 'mk': 'Macedonian', 'ml': 'Malayalam', 'mn': 'Mongolian', 'mo': 'Moldavian', 'mr': 'Marathi', 'ms': 'Malay', 'mt': 'Maltese', 'my': 'Burmese', 'na': 'Nauru', 'nb': 'Norwegian Bokmal', 'ne': 'Nepali', 'ng': 'Ndonga', 'nl': 'Dutch', 'nn': 'Norwegian Nynorsk', 'nr': 'Ndebele, South', 'nv': 'Navajo', 'ny': 'Chichewa; Nyanja', 'oc': 'Occitan', 'om': '(Afan) Oromo', 'or': 'Oriya', 'os': 'Ossetian; Ossetic', 'pa': 'Panjabi', 'pi': 'Pali', 'pl': 'Polish', 'ps': 'Pashto, 
Pushto', 'pt': 'Portuguese', 'qu': 'Quechua', 'rm': 'Rhaeto-Romance', 'rn': 'Rundi', 'ro': 'Romanian', 'ru': 'Russian', 'rw': 'Kinyarwanda', 'sa': 'Sanskrit', 'sc': 'Sardinian', 'sd': 'Sindhi', 'se': 'Northern Sami', 'sg': 'Sangro', 'sh': 'Serbo-Croatian', 'si': 'Sinhalese', 'sk': 'Slovak', 'sl': 'Slovenian', 'sm': 'Samoan', 'sn': 'Shona', 'so': 'Somali', 'sq': 'Albanian', 'sr': 'Serbian', 'ss': 'Siswati', 'st': 'Sesotho', 'su': 'Sundanese', 'sv': 'Swedish', 'sw': 'Swahili', 'ta': 'Tamil', 'te': 'Telugu', 'tg': 'Tajik', 'th': 'Thai', 'ti': 'Tigrinya', 'tk': 'Turkmen', 'tl': 'Tagalog', 'tn': 'Setswana', 'to': 'Tonga', 'tr': 'Turkish', 'ts': 'Tsonga', 'tt': 'Tatar', 'tw': 'Twi', 'ty': 'Tahitian', 'ug': 'Uighur', 'uk': 'Ukrainian', 'ur': 'Urdu', 'uz': 'Uzbek', 'vi': 'Vietnamese', 'vo': 'Volapuk', 'wa': 'Walloon', 'wo': 'Wolof', 'xh': 'Xhosa', 'yi': 'Yiddish', 'yo': 'Yoruba', 'za': 'Zhuang', 'zh': 'Chinese', 'zu': 'Zulu'}¶
ISO 639 codes and names as used by Poedit. Mostly these are identical to ISO 639, but there are some differences.
- translate.lang.poedit.lang_names = {'(Afan) Oromo': 'om', 'Abkhazian': 'ab', 'Afar': 'aa', 'Afrikaans': 'af', 'Albanian': 'sq', 'Amharic': 'am', 'Arabic': 'ar', 'Armenian': 'hy', 'Assamese': 'as', 'Avestan': 'ae', 'Aymara': 'ay', 'Azerbaijani': 'az', 'Bashkir': 'ba', 'Basque': 'eu', 'Belarusian': 'be', 'Bengali': 'bn', 'Bihari': 'bh', 'Bislama': 'bi', 'Bosnian': 'bs', 'Breton': 'br', 'Bulgarian': 'bg', 'Burmese': 'my', 'Catalan': 'ca', 'Chamorro': 'ch', 'Chechen': 'ce', 'Chichewa; Nyanja': 'ny', 'Chinese': 'zh', 'Church Slavic': 'cu', 'Chuvash': 'cv', 'Cornish': 'kw', 'Corsican': 'co', 'Croatian': 'hr', 'Czech': 'cs', 'Danish': 'da', 'Dutch': 'nl', 'Dzongkha': 'dz', 'English': 'en', 'Esperanto': 'eo', 'Estonian': 'et', 'Faroese': 'fo', 'Fijian': 'fj', 'Finnish': 'fi', 'French': 'fr', 'Frisian': 'fy', 'Friulian': 'fur', 'Gaelic': 'gd', 'Galician': 'gl', 'Georgian': 'ka', 'German': 'de', 'Greek': 'el', 'Guarani': 'gn', 'Gujarati': 'gu', 'Hausa': 'ha', 'Hebrew': 'he', 'Herero': 'hz', 'Hindi': 'hi', 'Hiri Motu': 'ho', 'Hungarian': 'hu', 'Icelandic': 'is', 'Indonesian': 'id', 'Interlingua': 'ia', 'Interlingue': 'ie', 'Inuktitut': 'iu', 'Inupiaq': 'ik', 'Irish': 'ga', 'Italian': 'it', 'Japanese': 'ja', 'Javanese': 'jw', 'Kalaallisut': 'kl', 'Kannada': 'kn', 'Kashmiri': 'ks', 'Kazakh': 'kk', 'Khmer': 'km', 'Kikuyu': 'ki', 'Kinyarwanda': 'rw', 'Komi': 'kv', 'Korean': 'ko', 'Kuanyama': 'kj', 'Kurdish': 'ku', 'Kyrgyz': 'ky', 'Lao': 'lo', 'Latin': 'la', 'Latvian': 'lv', 'Letzeburgesch': 'lb', 'Lingala': 'ln', 'Lithuanian': 'lt', 'Macedonian': 'mk', 'Malagasy': 'mg', 'Malay': 'ms', 'Malayalam': 'ml', 'Maltese': 'mt', 'Maori': 'mi', 'Marathi': 'mr', 'Marshall': 'mh', 'Moldavian': 'mo', 'Mongolian': 'mn', 'Nauru': 'na', 'Navajo': 'nv', 'Ndebele, South': 'nr', 'Ndonga': 'ng', 'Nepali': 'ne', 'Northern Sami': 'se', 'Norwegian Bokmal': 'nb', 'Norwegian Nynorsk': 'nn', 'Occitan': 'oc', 'Oriya': 'or', 'Ossetian; Ossetic': 'os', 'Pali': 'pi', 'Panjabi': 'pa', 'Pashto, Pushto': 'ps', 
'Persian': 'fa', 'Polish': 'pl', 'Portuguese': 'pt', 'Quechua': 'qu', 'Rhaeto-Romance': 'rm', 'Romanian': 'ro', 'Rundi': 'rn', 'Russian': 'ru', 'Samoan': 'sm', 'Sangro': 'sg', 'Sanskrit': 'sa', 'Sardinian': 'sc', 'Serbian': 'sr', 'Serbo-Croatian': 'sh', 'Sesotho': 'st', 'Setswana': 'tn', 'Shona': 'sn', 'Sindhi': 'sd', 'Sinhalese': 'si', 'Siswati': 'ss', 'Slovak': 'sk', 'Slovenian': 'sl', 'Somali': 'so', 'Spanish': 'es', 'Sundanese': 'su', 'Swahili': 'sw', 'Swedish': 'sv', 'Tagalog': 'tl', 'Tahitian': 'ty', 'Tajik': 'tg', 'Tamil': 'ta', 'Tatar': 'tt', 'Telugu': 'te', 'Thai': 'th', 'Tibetan': 'bo', 'Tigrinya': 'ti', 'Tonga': 'to', 'Tsonga': 'ts', 'Turkish': 'tr', 'Turkmen': 'tk', 'Twi': 'tw', 'Uighur': 'ug', 'Ukrainian': 'uk', 'Urdu': 'ur', 'Uzbek': 'uz', 'Vietnamese': 'vi', 'Volapuk': 'vo', 'Walloon': 'wa', 'Welsh': 'cy', 'Wolof': 'wo', 'Xhosa': 'xh', 'Yiddish': 'yi', 'Yoruba': 'yo', 'Zhuang': 'za', 'Zulu': 'zu'}¶
Reversed lang_codes.
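A name-to-code table of this kind can be derived from a code-to-name table with a dictionary comprehension. The sketch below uses a three-entry subset of lang_codes and is only an illustration of the reversal, not necessarily how the module builds lang_names.

```python
# A small subset of the lang_codes mapping documented above.
LANG_CODES = {"af": "Afrikaans", "pt": "Portuguese", "zu": "Zulu"}

# Swap keys and values; this is safe here because every name is unique.
LANG_NAMES = {name: code for code, name in LANG_CODES.items()}

print(LANG_NAMES["Zulu"])
```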
si¶
This module represents the Sinhala language.
See also
- class translate.lang.si.si(code)¶
This class represents Sinhala.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are lists of the names of the tests to ignore in that checker. A special ‘all’ checker name can be used to indicate that the tests must be ignored in all checkers.
Listed checkers to ignore tests on must be lowercase strings for the checker name, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
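To make the dictionary layout concrete, the sketch below resolves which tests would be skipped for a given checker name. The helper ignored_tests is hypothetical, and the "mozilla" entry is an invented example value; only the 'all' entry comes from the ignoretests value shown above.

```python
# The 'all' entry matches the value documented above; the 'mozilla' entry
# is a made-up example to show a checker-specific key.
IGNORETESTS = {
    "all": ["simplecaps", "startcaps"],
    "mozilla": ["accelerators"],
}


def ignored_tests(ignoretests, checker_name):
    """Return the set of test names to skip for one lowercase checker name.

    Hypothetical helper: combines the special 'all' entry with the entry
    for the named checker, if there is one.
    """
    names = set(ignoretests.get("all", []))
    names.update(ignoretests.get(checker_name, []))
    return names


print(sorted(ignored_tests(IGNORETESTS, "mozilla")))
print(sorted(ignored_tests(IGNORETESTS, "libreoffice")))
```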
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of the given length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
The plural equation for languages whose plural formula in Mozilla differs from the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value; it must be overridden. Any positive integer is valid (it should probably be between 1 and 6).
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
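As a rough illustration of how a set of sentence-ending marks can drive sentence splitting, here is a naive sketch. It is not the library's algorithm: split_sentences is a hypothetical name, only a subset of the sentenceend marks is used, and abbreviations such as "e.g." would be split incorrectly.

```python
import re

# A subset of the sentenceend marks listed above, chosen for illustration.
SENTENCE_END = ".!?…"


def split_sentences(text, strip=True):
    """Split text after runs of sentence-ending marks (naive sketch)."""
    marks = re.escape(SENTENCE_END)
    # One sentence: a run of non-terminator characters, then any terminators.
    pattern = "[^{0}]+[{0}]*".format(marks)
    sentences = re.findall(pattern, text)
    if strip:
        sentences = [sentence.strip() for sentence in sentences]
    return [sentence for sentence in sentences if sentence]


print(split_sentences("Hello there. How are you? Fine"))
```

The strip parameter mirrors the strip=True argument in the documented signatures; in this sketch it only controls whether surrounding whitespace is removed from each piece.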
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
st¶
This module represents the Southern Sotho language.
- class translate.lang.st.st(code)¶
This class represents Southern Sotho.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are lists of the names of the tests to ignore in that checker. A special ‘all’ checker name can be used to indicate that the tests must be ignored in all checkers.
Listed checkers to ignore tests on must be lowercase strings for the checker name, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of the given length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
The plural equation for languages whose plural formula in Mozilla differs from the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value; it must be overridden. Any positive integer is valid (it should probably be between 1 and 6).
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
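For context, the Gettext Plural-Forms header combines a plural count and a plural equation into a single line. The sketch below formats such a line; plural_forms_header is a hypothetical helper, and the example values are the common two-form rule from the Gettext manual, not values taken from this module.

```python
def plural_forms_header(nplurals, pluralequation):
    """Format a Gettext Plural-Forms header line (illustrative helper)."""
    if nplurals < 1:
        # Mirrors the note above that 0 is not a valid number of plurals.
        raise ValueError("nplurals must be a positive integer")
    return "Plural-Forms: nplurals={}; plural={};".format(nplurals, pluralequation)


# Example values: two plural forms, with the plural used whenever n != 1.
print(plural_forms_header(2, "(n != 1)"))
```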
See also
- puncdict = {}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper, are included in the list.
- validdoublewords = ['o', 'le', 'ba']¶
Some languages allow double words in certain cases. This is a list of such words.
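A check for accidentally repeated words might consult this attribute roughly as sketched below. The function find_doubled_words is hypothetical and the sample text is artificial (it is not meant to be meaningful Southern Sotho); the real double-word test lives in the checker code and may behave differently.

```python
# The allowed words documented above for Southern Sotho.
VALID_DOUBLE_WORDS = ["o", "le", "ba"]


def find_doubled_words(text, allowed=()):
    """Return words that appear twice in a row and are not explicitly allowed.

    Hypothetical sketch: words are split on whitespace and compared
    case-insensitively.
    """
    words = text.split()
    doubled = []
    for previous, current in zip(words, words[1:]):
        same = previous.lower() == current.lower()
        if same and current.lower() not in allowed:
            doubled.append(current)
    return doubled


# "ba ba" is tolerated, "tla tla" is reported.
print(find_doubled_words("ba ba tla tla", VALID_DOUBLE_WORDS))
```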
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
sv¶
This module represents the Swedish language.
See also
- class translate.lang.sv.sv(code)¶
This class represents Swedish.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are lists of the names of the tests to ignore in that checker. A special ‘all’ checker name can be used to indicate that the tests must be ignored in all checkers.
Listed checkers to ignore tests on must be lowercase strings for the checker name, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of the given length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
The plural equation for languages whose plural formula in Mozilla differs from the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value; it must be overridden. Any positive integer is valid (it should probably be between 1 and 6).
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890åäöÅÄÖ'¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper, are included in the list.
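A membership test against this string is one way such a value could be used. The sketch below is illustrative only: is_valid_accelerator is a hypothetical helper, and treating None as "no restriction" is an assumption, since the documentation does not say how a None value is interpreted.

```python
# The Swedish validaccel value documented above.
VALID_ACCEL = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890åäöÅÄÖ"


def is_valid_accelerator(char, validaccel):
    """Check whether a single character may serve as an access key.

    Hypothetical helper. Assumption: a validaccel of None means the
    language places no restriction on accelerator characters.
    """
    if validaccel is None:
        return True
    return len(char) == 1 and char in validaccel


print(is_valid_accelerator("ö", VALID_ACCEL))
print(is_valid_accelerator("é", VALID_ACCEL))
```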
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
ta¶
This module represents the Tamil language.
See also
- class translate.lang.ta.ta(code)¶
This class represents Tamil.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are lists of the names of the tests to ignore in that checker. A special ‘all’ checker name can be used to indicate that the tests must be ignored in all checkers.
Listed checkers to ignore tests on must be lowercase strings for the checker name, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of the given length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
The plural equation for languages whose plural formula in Mozilla differs from the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value; it must be overridden. Any positive integer is valid (it should probably be between 1 and 6).
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
team¶
Module to guess the language ISO code based on the ‘Language-Team’ entry in the header of a Gettext PO file.
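The general idea of matching contact details against known team addresses can be sketched as below. This is a simplified illustration, not the module's implementation: guess_from_contact is a hypothetical name, the table is a three-language subset of LANG_TEAM_CONTACT_SNIPPETS, and plain case-sensitive substring matching is an assumption.

```python
# A subset of the LANG_TEAM_CONTACT_SNIPPETS entries documented below.
CONTACT_SNIPPETS = {
    "af": ("i18n@af.org.za", "Petri Jooste"),
    "ca": ("@softcatala.org",),
    "sv": ("debian-l10n-swedish@lists.debian.org", "tp-sv@listor.tp-sv.se"),
}


def guess_from_contact(team_header, snippets):
    """Return the first language code whose snippet occurs in the header.

    Hypothetical sketch: returns None when no snippet matches.
    """
    for code, fragments in snippets.items():
        if any(fragment in team_header for fragment in fragments):
            return code
    return None


print(guess_from_contact("Swedish <tp-sv@listor.tp-sv.se>", CONTACT_SNIPPETS))
```

Because some snippets are short fragments such as a bare domain, the order in which languages are tried can affect the result when several entries share a domain.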
- translate.lang.team.LANG_TEAM_CONTACT_SNIPPETS = {'af': ('i18n@af.org.za', 'Petri Jooste'), 'am': ('@geez.org',), 'ar': ('arabeyes.org', 'Arabeyes'), 'as': ('assam@mm.assam-glug.org',), 'ast': ('@softastur.org', 'launchpad.net/~ubuntu-l10n-ast', 'softast-xeneral@lists.sourceforge.net', 'Softastur'), 'az': ('linuxaz@azerimal.net', 'gnome@azitt.com', 'gnome@azətt.com'), 'az_IR': ('az-ir@lists.sharif.edu',), 'be': ('i18n@mova.org', 'i18n@tut.by', 'mozilla_byx@poczta.fm'), 'be@latin': ('translation-team-be-latin@lists', 'be-latin.open-tran.eu'), 'bg': ('dict@fsa-bg.org', 'dict@linux.zonebg.com'), 'bn': ('gnome-translation@bengalinux.org', 'core@bengalinux.org', 'ankur-bd-l10n@googlegroups.com', 'redhat-translation@bengalinux.org'), 'bn_IN': ('anubad@lists.ankur.org.in',), 'br': ('drouizig@drouizig.org', 'brenux@free.fr', 'tradgnome@softcatala.net', 'fedora@softcatala.org'), 'bs': ('lokal@linux.org.ba', 'lokal@lugbih.org'), 'ca': ('@softcatala.org',), 'crh': ('tilde-birlik-tercime@lists.sourceforge.net',), 'cs': ('fedora-cs-list@redhat.com', 'cs-users@lists.fedoraproject.org', 'debian-l10n-czech@lists.debian.org', 'kde-czech-apps@lists.sourceforge.net', 'kde-czech-apps@lists.sf.net', 'translations.cs@gnupg.cz'), 'cy': ('gnome-cy@lists.linux.org.uk', 'gnome-cy@pengwyn.linux.org.uk', 'gnome-cy@www.linux.org', 'gnome-cy@www.linux.org.uk', 'cy@pengwyn.linux.org.uk'), 'da': ('dansk@dansk-gruppen.dk', 'dansk@klid.dk', 'sslug-locale@sslug.dk'), 'de': ('gnome-de@gnome.org', 'debian-l10n-german@lists.debian.org'), 'dz': ('pgeyleg@dit.gov.bt', 'pgyeleg@dit.gov.bt'), 'el': ('debian-l10n-greek@lists.debian.org', 'i18ngr@lists.hellug.gr', 'i18n@hellug.gr', 'nls@tux.hellug.gr', 'team@gnome.gr', 'team@lists.gnome.gr', 'users@el.openoffice.org'), 'en@shaw': ('ubuntu-l10n-en-shaw@launchpad.net', 'ubuntu-l10n-en-shaw@lists.launchpad.net'), 'en_AU': ('trans@six-by-nine.com.au',), 'en_CA': ('adamw@gnome.org', 'adamw@freebsd.org'), 'en_GB': ('kde-en-gb@kde.me.uk',), 'eo': 
('eo-tradukado@lists.tuxfamily.org', 'debian-l10n-esperanto@lists.debian.org', 'ubuntu-l10n-eo@lists.launchpad.net', 'eo-tradukado.tuxfamily.org'), 'es': ('pgsql-es-ayuda@postgresql.org', 'debian-l10n-spanish@lists.debian.org', 'gnome-es@gnome.org', 'traductores@es.gnome.org'), 'et': ('gnome-et@linux.ee', 'kde-et@linux.ee', 'linux-ee@lists.eenet.ee', 'linux-et@lists.eenet.ee', 'et-gnome@linux.ee', 'linux-ee@eenet.ee'), 'eu': ('debian-l10n-basque@lists.debian.org', 'debian-l10n-eu@lists.debian.org', 'itzulpena@euskalgnu.org', 'gnome@euskalgnu.org', 'librezale@librezale.org', 'linux-eu@chanae.alphanet.ch'), 'fa': ('farsi@lists.sharif.edu', 'Farsiweb.info'), 'fi': ('debian-l10n-finnish@lists.debian.org', 'gnome-fi-laatu@lists.sourceforge.net', 'laatu@lokalisointi.org', 'lokalisointi-laatu@linux-aktivaattori.org', 'laatu@gnome.fi', 'yast-trans-fi@kotoistaminen.novell.fi'), 'fr': ('debian-l10n-french@lists.debian.org', 'gnomefr@traduc.org', 'kde-francophone@kde.org', 'traduc@traduc.org', 'pgsql-fr-generale@postgresql.org', 'rpm-fr@livna.org'), 'ga': ('gaeilge-gnulinux@lists.sourceforge.net', 'gaeilge-a@listserv.heanet.ie'), 'gl': ('trasno@ceu.fi.udc.es', 'gnome@g11n.net', 'gpul-traduccion@ceu.fi.udc.es', 'proxecto@trasno.net', 'trasno@gpul.org'), 'gu': ('indianoss-gujarati@lists.sourceforge.net',), 'he': ('debian-hebrew-common@lists.alioth.debian.org', 'kde-il@yahoogroups.com', 'fedora-he-list@redhat.com', 'mdk-hebrew@iglu.org.il'), 'hi': ('indlinux-hindi-gnome@lists.sourceforge.net', 'indlinux-hindi@lists.sourceforge.net'), 'hr': ('translator-shop.org', 'lokalizacija@linux.hr'), 'hu': ('debian-l10n-hungarian@lists.debian.org', 'gnome@fsf.hu', 'gnome@gnome.hu', 'magyar@lists.linux.hu'), 'id': ('@id.gnome.org', '@gnome.linux.or.id', 'mdk-id@yahoogroups.com', 'linux.or.id', 'gnome@i15n.org'), 'io': ('gnome-ido@lists.mterry.name',), 'is': ('gnome@techattack.nu', 'kde-isl@mmedia.is', 'kde-isl@molar.is'), 'it': ('debian-l10n-italian@lists.debian.org', 'traduzioni@itpug.org', 
'fedora-trans-it@redhat.com', 'tp@lists.linux.it'), 'ja': ('debian-doc@debian.or.jp', 'debian-japanese@lists.debian.org', 'gnome-translation@gnome.gr.jp', 'translation@gnome.gr.jp', 'jpug-doc@ml.postgresql.jp'), 'ka': ('geognome@googlegroups.com', 'Ubuntu-Georgian-Translators@googlegroups.com'), 'kk': ('kk_KZ@googlegroups.com',), 'km': ('@khmeros.info',), 'kn': ('debian-l10n-kannada@lists.debian.org',), 'ko': ('gnome-kr-hackers@list.kldp.net', 'gnome-kr-hackers@lists.kldp.net', 'gnome-kr-translation@lists.kldp.net', 'pgsql-kr@postgresql.or.kr', 'hangul-hackers@lists.kldp.net', 'debian-l10n-korean@lists.debian.org', 'gnome-kr-translation@lists.sourceforge.net'), 'ks': ('ks-gnome-trans-commits@lists.code.indlinux.net',), 'ku': ('gnu-ku-wergerandin@lists.sourceforge.net',), 'ky': ('i18n-team-ky-kyrgyz@lists.sourceforge.net', 'ky-li@mail.ru'), 'la': ('gnome-latin-list@gnome.org',), 'li': ('li@gnome.org',), 'lt': ('gimp-lt@lists.akl.lt', 'gnome-lt@lists.akl.lt', 'gnome-lt@lists.gnome.org', 'komp_lt@konferencijos.lt'), 'lv': ('lata-l10n@googlegroups.com', 'lata-i18n@groups.google.com', 'locale@laka.lv', 'll10nt@os.lv'), 'mai': ('maithili.sf.net',), 'mg': ('i18n-malagasy-gnome@gnome.org',), 'mi': ('maori@nzlinux.org.nz',), 'mk': ('gnomk-main@lists.sourceforge.net', 'lug@lists.linux.net.mk', 'mkde-l10n@lists.sourceforge.net', 'ossm-members@hedona.on.net.mk'), 'ml': ('smc-discuss@googlegroups.com',), 'mn': ('openmn-', 'openmn.org'), 'ms': ('gabai-penyumbang@lists.sourceforge.net', 'gabai-penyumbang@lists.sf.net', 'kedidiemas@yahoogroups.com'), 'nb': ('i18n-nb@lister.ping.uio.no',), 'nds': ('nds-lowgerman@lists.sourceforge.net',), 'ne': ('info@mpp.org.np',), 'nl': ('debian-l10n-dutch@lists.debian.org', 'vertaling@nl.gnome.org', 'vertaling@vrijschrift.org', 'nl@vrijschrift.org', 'vertaling@nl.linux.org', 'vertaling@nl.li.org'), 'nn': ('i18n-nn@lister.ping.uio.no',), 'nso': ('sepedi@translate.org.za',), 'or': ('oriya-group@lists.sarovar.org', 'oriya-it@googlegroups.com'), 
'pa': ('punjabi-l10n@users.sf.net', 'fedora-pa-list@redhat.com', 'punjabi-users@lists.sf.net', 'punjabi-l10n@lists.sourceforge.net', 'punlinux-i18n@lists.sourceforge.net'), 'pl': ('gnomepl@aviary.pl', 'debian-l10n-polish@lists.debian.org', 'gnome-l10n@lists.aviary.pl', 'translators@gnomepl.org'), 'ps': ('pathanisation@googelgroups.com',), 'pt': ('fedora-trans-pt@redhat.org', 'gnome_pt@yahoogroups.com', 'traduz@debianpt.org', 'traduz@debian.pt'), 'pt_BR': ('gnome-l10n-br@listas.cipsga.org.br', 'gnome-pt_br-list@gnome.org', 'fedora-docs-br@redhat.com', 'fedora-trans-pt-br@redhat.com', 'ldp-br@bazar.conectiva.com.br', 'pgbr-dev@postgresql.org.br', 'pgbr-dev@listas.postgresql.org.br', 'debian-l10n-portuguese@lists.debian.org'), 'ro': ('fedora-ro@googlegroups.com', 'gnomero-list@lists.sourceforge.net', 'debian-l10n-romanian@lists.debian.org'), 'ru': ('pgsql-rus@yahoogroups.com', 'debian-l10n-russian@lists.debian.org', 'gnupg-ru@gnupg.org'), 'scn': ('l10n@cademiasiciliana.org',), 'sk': ('sk-i18n@lists.linux.sk', 'kde-sk@linux.sk'), 'sl': ('gnome-si@googlegroups.com',), 'sq': ('gnome-albanian-perkthyesit@lists.sourceforge.net', 'debian-l10n-albanian@lists.debian.org'), 'sr': ('@prevod.org', 'serbiangnome-lista@nongnu.org'), 'sv': ('debian-l10n-swedish@lists.debian.org', 'tp-sv@listor.tp-sv.se'), 'ta': ('gnome-tamil-translation@googlegroups.com', 'tamilinix@yahoogroups.com', 'Ubuntu-l10n-tam@lists.ubuntu.com', 'tamil-DI@yahoogroups.com'), 'te': ('localisation@swecha.org', 'indlinux-telugu@lists.sourceforge.net'), 'th': ('l10n@opentle.org', 'thai-l10n@googlegroup.com', 'thailang@buraphalinux.org', 'thai-l10n@googlegroups.com', 'l10n.opentle.org'), 'tk': ('kakilikgroup@yahoo.com',), 'tl': ('debian-tl@banwa.upm.edu.ph',), 'tr': ('debian-l10n-turkish@lists.debian.org', 'gnome-turk@gnome.org', 'gnu-tr-u12a@lists.sourceforge.net', 'turkce@pardus.org.tr'), 'tt': ('tatarish.l10n@gmail.com',), 'ug': ('gnome-uighur@yahoogroups.com',), 'uk': ('linux@linux.org.ua',), 'ur': 
('l10n@urduweb.org', 'urdu.scs.gift@gmail.com'), 've': ('venda@translate.org.za',), 'vi': ('gnomevi-list@lists.sourceforge.net', 'vi-VN@googlegroups.com'), 'wa': ('linux-wa@',), 'xh': ('xh-translate@ubuntu.com', 'xhosa@translate.org.za', 'xhosa@ubuntu.com'), 'zh_CN': ('i18n-translation@lists.linux.net.cn', 'i18n-zh@googlegroups.com', 'translation-team-zh-cn@lists.sourceforge.net', 'i18n-zh@googlegroup.com'), 'zh_TW': ('zh-l10n@lists.linux.org.tw', 'chinese-l10n@googlegroups.com', 'community@linuxhall.org', 'zh-l10n@linux.org.tw'), 'zu': ('zulu@translate.org.za',)}¶
Language codes with snippets of contact information that can be used to uniquely identify the language.
- translate.lang.team.guess_language(team_string)¶
Guesses the language of a PO file based on the Language-Team entry.
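The lookup described above can be sketched as a substring search over the snippet table. This is a minimal illustration, not the library's implementation: the function name guess_language_sketch and the three-entry TEAM_SNIPPETS excerpt are hypothetical, and the real matching rules may differ.

```python
# Hypothetical excerpt of the snippet table; the real mapping is far larger.
TEAM_SNIPPETS = {
    "te": ("localisation@swecha.org", "indlinux-telugu@lists.sourceforge.net"),
    "th": ("l10n@opentle.org", "thai-l10n@googlegroups.com"),
    "zu": ("zulu@translate.org.za",),
}


def guess_language_sketch(team_string):
    """Return the first language code whose snippet occurs in team_string.

    Returns None when no snippet matches. Matching is case-insensitive here,
    which is an assumption of this sketch.
    """
    lowered = team_string.lower()
    for code, snippets in TEAM_SNIPPETS.items():
        if any(snippet.lower() in lowered for snippet in snippets):
            return code
    return None


print(guess_language_sketch("Telugu <localisation@swecha.org>"))
```

A Language-Team value that contains none of the known snippets yields None in this sketch.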
te¶
This module represents the Telugu language.
See also
- class translate.lang.te.te(code)¶
This class represents Telugu.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
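The three example codes follow a language, optional country, optional modifier pattern. A small sketch of splitting such a code into its parts is shown here; split_code and CODE_PATTERN are hypothetical names, and the pattern only covers the shapes listed above.

```python
import re

# language, optional _COUNTRY, optional @modifier (assumed shapes only)
CODE_PATTERN = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:_(?P<country>[A-Z]{2}))?"
    r"(?:@(?P<modifier>\w+))?$"
)


def split_code(code):
    """Split a code such as 'sr_YU@Latn' into (language, country, modifier)."""
    match = CODE_PATTERN.match(code)
    if match is None:
        raise ValueError(f"unrecognised language code: {code!r}")
    return match.group("language"), match.group("country"), match.group("modifier")


for example in ("km", "pt_BR", "sr_YU@Latn"):
    print(example, split_code(example))
```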
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are lists of the names of the tests to ignore in that checker. The special ‘all’ checker name indicates that the tests must be ignored in all checkers.
Checker names used as keys must be lowercase strings, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
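How such a dictionary might be consulted can be sketched as follows. The helper is_test_ignored is hypothetical and only illustrates the key and value layout described above; it is not how the checkers are known to read the attribute.

```python
def is_test_ignored(ignoretests, checker_name, test_name):
    """Return True if test_name is ignored for checker_name.

    Tests listed under the special 'all' key apply to every checker.
    Checker keys are lowercase, so the name is lowercased before lookup.
    """
    checker_key = checker_name.lower()
    ignored = set(ignoretests.get("all", [])) | set(ignoretests.get(checker_key, []))
    return test_name in ignored


# Value shown for Telugu in this section.
telugu_ignoretests = {"all": ["simplecaps", "startcaps"]}

print(is_test_ignored(telugu_ignoretests, "mozilla", "startcaps"))
print(is_test_ignored(telugu_ignoretests, "mozilla", "acronyms"))
```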
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of length length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
Used for languages that have a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value; it must be overridden. Any positive integer is valid (it should probably be between 1 and 6).
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
th¶
This module represents the Thai language.
See also
- class translate.lang.th.th(code)¶
This class represents Thai.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['sentencecount', 'simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are lists of the names of the tests to ignore in that checker. The special ‘all’ checker name indicates that the tests must be ignored in all checkers.
Checker names used as keys must be lowercase strings, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of length length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
Used for languages that have a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value; it must be overridden. Any positive integer is valid (it should probably be between 1 and 6).
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {'. ': ' '}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
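The Thai puncdict shown above maps a full stop followed by a space to a plain space. A simplified sketch of applying such a rule table with plain string replacement is given here; apply_puncdict is a hypothetical helper, and the real punctranslate() may handle additional cases (for example punctuation at the end of the string).

```python
# Value shown for Thai in this section.
THAI_PUNCDICT = {". ": " "}


def apply_puncdict(text, puncdict):
    """Apply each source -> target rule in turn using str.replace."""
    for source, target in puncdict.items():
        text = text.replace(source, target)
    return text


print(apply_puncdict("Save the file. Then close it.", THAI_PUNCDICT))
```

In this sketch the final full stop is left untouched because it is not followed by a space.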
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
ug¶
This module represents the Uyghur language.
See also
- class translate.lang.ug.ug(code)¶
This class represents Uyghur.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['acronyms', 'simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are lists of the names of the tests to ignore in that checker. The special ‘all’ checker name indicates that the tests must be ignored in all checkers.
Checker names used as keys must be lowercase strings, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of length length.
- listseperator = '، '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
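As a small illustration of the override, the sketch below joins a list with the default separator and with the Uyghur value shown above (an Arabic comma followed by a space). The helper join_items is hypothetical.

```python
DEFAULT_LIST_SEPARATOR = ", "
UYGHUR_LIST_SEPARATOR = "\u060c "  # Arabic comma followed by a space


def join_items(items, separator=DEFAULT_LIST_SEPARATOR):
    """Join textual elements with the language's list separator."""
    return separator.join(items)


items = ["alpha", "beta", "gamma"]
print(join_items(items))
print(join_items(items, UYGHUR_LIST_SEPARATOR))
```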
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
Used for languages that have a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value; it must be overridden. Any positive integer is valid (it should probably be between 1 and 6).
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {',': '،', ';': '؛', '?': '؟'}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
ur¶
This module represents the Urdu language.
See also
- class translate.lang.ur.ur(code)¶
This class represents Urdu.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are lists of the names of the tests to ignore in that checker. The special ‘all’ checker name indicates that the tests must be ignored in all checkers.
Checker names used as keys must be lowercase strings, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of length length.
- listseperator = '، '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
Used for languages that have a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value; it must be overridden. Any positive integer is valid (it should probably be between 1 and 6).
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {',': '،', '.': '۔', ';': '؛', '?': '؟'}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
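Because every key in the Urdu puncdict shown above is a single character, the mapping can be illustrated with a translation table. This is a sketch under that assumption; to_urdu_punctuation is a hypothetical helper and the real punctranslate() may apply extra rules.

```python
# Value shown for Urdu in this section.
URDU_PUNCDICT = {",": "،", ".": "۔", ";": "؛", "?": "؟"}

# str.maketrans accepts a dict whose keys are single characters.
URDU_TABLE = str.maketrans(URDU_PUNCDICT)


def to_urdu_punctuation(text):
    """Replace Latin punctuation with the Urdu equivalents in one pass."""
    return text.translate(URDU_TABLE)


print(to_urdu_punctuation("Yes, no; maybe? Done."))
```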
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
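One way a double-word check could make use of this attribute is sketched below. The function find_double_words is hypothetical and assumes whitespace-separated words and case-insensitive comparison; neither assumption is confirmed by this reference.

```python
def find_double_words(text, validdoublewords=()):
    """Return words that are immediately repeated and not explicitly allowed."""
    allowed = {word.lower() for word in validdoublewords}
    words = text.split()
    repeated = []
    for previous, current in zip(words, words[1:]):
        if previous.lower() == current.lower() and current.lower() not in allowed:
            repeated.append(current)
    return repeated


print(find_double_words("open the the file"))
print(find_double_words("open the the file", validdoublewords=["the"]))
```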
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
vi¶
This module represents the Vietnamese language.
See also
- class translate.lang.vi.vi(code)¶
This class represents Vietnamese.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are lists of the names of the tests to ignore in that checker. The special ‘all’ checker name indicates that the tests must be ignored in all checkers.
Checker names used as keys must be lowercase strings, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of length length.
- listseperator = ', '¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = 'n!=1 ? 1 : 0'¶
Used for languages that have a different plural formula in Mozilla than the standard one in Gettext.
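The Mozilla equation shown for Vietnamese, 'n!=1 ? 1 : 0', is a C-style conditional expression that selects a plural form index for a count n. A hand-translated Python equivalent is sketched here; mozilla_plural_index_vi is a hypothetical name and the library is not known to evaluate the string this way.

```python
def mozilla_plural_index_vi(n):
    """Python equivalent of the C-style expression 'n!=1 ? 1 : 0'."""
    return 1 if n != 1 else 0


for count in (0, 1, 2, 10):
    print(count, mozilla_plural_index_vi(count))
```

Under this equation only a count of exactly 1 selects form 0; every other count selects form 1.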
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value; it must be overridden. Any positive integer is valid (it should probably be between 1 and 6).
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
See also
- puncdict = {'!': ' !', '#': ' #', ':': ' :', ';': ' ;'}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Implement some extra features for quotation marks.
- Known shortcomings:
% and $ are not touched yet for fear of variables
Double spaces might be introduced
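The double-space shortcoming can be illustrated with the Vietnamese puncdict shown above, which puts a space in front of certain marks. The sketch below covers only that spacing rule, not the quotation mark handling; space_before_punctuation is a hypothetical helper.

```python
# Value shown for Vietnamese in this section.
VI_PUNCDICT = {"!": " !", "#": " #", ":": " :", ";": " ;"}


def space_before_punctuation(text, puncdict=VI_PUNCDICT):
    """Replace each character that has a rule; other characters pass through."""
    return "".join(puncdict.get(char, char) for char in text)


print(repr(space_before_punctuation("Warning: disk full!")))
# A space already precedes the colon, so a double space appears.
print(repr(space_before_punctuation("Note : done")))
```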
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '.!?…։؟।。!?።۔'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.
zh¶
This module represents the Chinese language (both traditional and simplified).
See also
- class translate.lang.zh.zh(code)¶
This class represents Chinese.
- CJKpunc = '。、,;!?「」『』【】'¶
These punctuation marks are used in certain circumstances with CJK languages.
- classmethod alter_length(text)¶
Converts the given string by adding or removing characters as an estimation of translation length (with English assumed as source language).
- classmethod capsstart(text)¶
Determines whether the text starts with a capital letter.
- classmethod character_iter(text)¶
Returns an iterator over the characters in text.
- classmethod characters(text)¶
Returns a list of characters in text.
- checker = None¶
A language specific checker instance (see filters.checks).
This doesn’t need to be supplied, but will be used if it exists.
- code = ''¶
The ISO 639 language code, possibly with a country specifier or other modifier.
Examples:
km pt_BR sr_YU@Latn
- commonpunc = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>'¶
These punctuation marks are common in English and most languages that use Latin script.
- ethiopicpunc = '።፤፣'¶
These punctuation marks are used by several Ethiopic languages.
- fullname = ''¶
The full (English) name of this language.
Dialect codes should have the form of:
Khmer
Portuguese (Brazil)
TODO: sr_YU@Latn?
- ignoretests = {'all': ['simplecaps', 'startcaps']}¶
Dictionary of tests to ignore in some or all checkers.
Keys are checker names and values are lists of the names of the tests to ignore in that checker. The special ‘all’ checker name indicates that the tests must be ignored in all checkers.
Checker names used as keys must be lowercase strings, for example “mozilla” for MozillaChecker or “libreoffice” for LibreOfficeChecker.
- indicpunc = '।॥॰'¶
These punctuation marks are used by several Indic languages.
- invertedpunc = '¿¡'¶
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
- classmethod length_difference(length)¶
Returns an estimate of the likely change in length relative to an English string of length length.
- listseperator = '、'¶
This string is used to separate lists of textual elements. Most languages probably can stick with the default comma, but Arabic and some Asian languages might want to override this.
- miscpunc = '…±°¹²³·©®×£¥€'¶
The middle dot (·) is used by Greek and Georgian.
- mozilla_pluralequation = '0'¶
This is for languages that have a different plural formula in Mozilla than the standard one in Gettext.
- nplurals = 0¶
The number of plural forms of this language.
0 is not a valid value; it must be overridden. Any positive integer is valid (it should probably be between 1 and 6).
See also
- classmethod numbertranslate(text)¶
Converts the numbers in a string according to the rules of the language.
- numbertuple = ()¶
A tuple of number transformation rules that can be used by numbertranslate().
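The sketch below shows one way such rules could be applied. It assumes that each rule is a `(source, target)` pair of strings, which is an assumption made for this illustration; `numbertranslate_sketch` and the sample rules (Arabic-Indic digits) are hypothetical and are not taken from this class, whose tuple is empty.

```python
def numbertranslate_sketch(text, numbertuple):
    """Apply each (source, target) rule to the text in order.

    Simplified illustration, assuming rules are pairs of strings.
    """
    for source, target in numbertuple:
        text = text.replace(source, target)
    return text


# Hypothetical rules mapping three Western digits to Arabic-Indic digits.
rules = (("1", "\u0661"), ("2", "\u0662"), ("3", "\u0663"))
print(numbertranslate_sketch("Page 12 of 3", rules))
```

With an empty tuple, as documented here, the text is returned unchanged.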
- classmethod numstart(text)¶
Determines whether the text starts with a numeric value.
- pluralequation = '0'¶
The plural equation for selection of plural forms.
This is used for PO files to fill into the header.
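For illustration, the number of plurals and the plural equation combine into the Gettext `Plural-Forms` header value roughly as follows. The helper `plural_forms_header` is hypothetical, and the sample values (one plural form, equation `0`) are assumed for the example rather than read from this class.

```python
def plural_forms_header(nplurals, pluralequation):
    """Format the value of a Gettext Plural-Forms header.

    Hypothetical helper; rejects the placeholder value 0 because
    nplurals must be a positive integer.
    """
    if nplurals < 1:
        raise ValueError("nplurals must be overridden with a positive integer")
    return f"nplurals={nplurals}; plural={pluralequation};"


print("Plural-Forms:", plural_forms_header(1, "0"))
```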
See also
- puncdict = {'!\n': '!\n', '! ': '!', '% ': '%', '.\n': '。\n', '. ': '。', ':\n': ':\n', ': ': ':', ';\n': ';\n', '; ': ';', '?\n': '?', '? ': '?'}¶
A dictionary of punctuation transformation rules that can be used by punctranslate().
- classmethod punctranslate(text)¶
Converts the punctuation in a string according to the rules of the language.
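The idea can be sketched as a series of plain string replacements driven by a dictionary like puncdict. This is a simplified illustration, not the toolkit's implementation, which may treat special cases differently; `punctranslate_sketch` is a hypothetical name and the rules are a two-entry subset chosen for the example.

```python
def punctranslate_sketch(text, puncdict):
    """Replace each source punctuation sequence with its target.

    Simplified illustration: longer source sequences are applied first
    so that a shorter rule cannot pre-empt a longer one.
    """
    for source in sorted(puncdict, key=len, reverse=True):
        text = text.replace(source, puncdict[source])
    return text


# Two rules: a full stop followed by a space or a newline.
rules = {". ": "。", ".\n": "。\n"}
print(punctranslate_sketch("First sentence. Second sentence.\n", rules))
```

Note that the rule for a full stop followed by a space also drops the space, since the ideographic full stop is not followed by one.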
- punctuation = '.,;:!?-@#$%^*_()[]{}/\\\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'¶
We include many types of punctuation here, simply since this is only meant to determine if something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won’t need to override this.
- quotes = '‘’‛“”„‟′″‴‵‶‷‹›«»'¶
These are different quotation marks used by various languages.
- rtlpunc = '،؟؛÷'¶
These punctuation marks are used by Arabic and Persian, for example.
- classmethod sentence_iter(text, strip=True)¶
Returns an iterator over the sentences in text.
- sentenceend = '。!?!?…'¶
These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won’t need to override this.
- classmethod sentences(text, strip=True)¶
Returns a list of sentences in text.
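A rough sketch of sentence splitting based on a set of sentence-end marks follows. It is an assumption-laden illustration and not the toolkit's algorithm: `sentences_sketch` is a hypothetical function, and the default marks are a small set assumed for the example.

```python
import re


def sentences_sketch(text, sentenceend="。!?…", strip=True):
    """Split text after runs of sentence-end marks.

    Simplified illustration: a sentence is a run of characters that are
    not end marks, followed by any end marks that directly follow it.
    """
    marks = re.escape(sentenceend)
    pattern = "[^{0}]+[{0}]*".format(marks)
    found = re.findall(pattern, text)
    if strip:
        # Trim surrounding whitespace and drop pieces that become empty.
        found = [piece.strip() for piece in found if piece.strip()]
    return found


print(sentences_sketch("你好。 今天好吗?"))
```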
- specialchars = ''¶
Characters used by the language that might not be easy to input with common keyboard layouts.
- validaccel = None¶
Characters that can be used as accelerators (access keys), i.e. Alt+X where X is the accelerator. These can include combining diacritics as long as they are accessible from the user’s keyboard in a single keystroke, but normally they would be at least precomposed characters. All characters, lower and upper case, are included in the list.
- validdoublewords = []¶
Some languages allow double words in certain cases. This is a list of such words.
- classmethod word_iter(text)¶
Returns an iterator over the words in text.
- classmethod words(text)¶
Returns a list of words in text.