search¶
Services for searching and matching text.
lshtein¶
A class to calculate a similarity based on the Levenshtein distance.
See also
If available, the python-Levenshtein package is used instead, which performs better because it is implemented natively.
- translate.search.lshtein.distance(a, b, stopvalue=0)¶
Same as python_distance in functionality. This uses the fast C version if we detected it earlier.
Note that this does not support arbitrary sequence types, but only string types.
- translate.search.lshtein.native_distance(a, b, stopvalue=0)¶
Same as python_distance in functionality. This uses the fast C version if we detected it earlier.
Note that this does not support arbitrary sequence types, but only string types.
- translate.search.lshtein.python_distance(a, b, stopvalue=-1)¶
Calculates the distance for use in similarity calculation. Python version.
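To illustrate what these functions compute, here is a minimal pure-Python sketch of the Levenshtein distance and a percentage similarity derived from it. It is not the module's implementation: the names `levenshtein` and `similarity` are hypothetical, the `stopvalue` parameter is left out, and the formula `100 * (1 - distance / longest)` is an assumption about how a distance might be turned into a score.

```python
def levenshtein(a, b):
    """Return the edit distance between sequences a and b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the inner row short
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Map the distance to a percentage in the range 0..100 (assumed formula)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)
```

For example, `levenshtein("kitten", "sitting")` is 3, so under the assumed formula the two strings are roughly 57% similar.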
match¶
Class to perform translation memory matching from a store of translation units.
- class translate.search.match.matcher(store, max_candidates=10, min_similarity=75, max_length=70, comparer=None, usefuzzy=False)¶
A class that will do matching and store configuration for the matching process.
- buildunits(candidates)¶
Builds a list of units conforming to the base API, with the score in the comment.
- extendtm(units, store=None, sort=True)¶
Extends the memory with extra unit(s).
- Parameters:
units – The units to add to the TM.
store – Optional store from where some metadata can be retrieved and associated with each unit.
sort – Optional parameter that can be set to False to suppress sorting of the candidates list. This should probably only be used in matcher.inittm().
- static getstartlength(min_similarity, text)¶
Calculates the minimum length we are interested in. Some extra margin is included because matching does not rely on plain character distance alone.
- getstoplength(min_similarity, text)¶
Calculates a length beyond which we are not interested. Some extra margin is included because matching does not rely on plain character distance alone.
- inittm(stores, reverse=False)¶
Initialises the memory for later use. We use simple base units for speedup.
- matches(text)¶
Returns a list of possible matches for the given source text.
- Parameters:
text (String) – The text that will be searched for in the translation memory
- Return type:
list
- Returns:
a list of units with the source and target strings from the translation memory. If self.addpercentage is True (default), the match quality is given as a percentage in the notes.
- setparameters(max_candidates=10, min_similarity=75, max_length=70)¶
Sets the parameters without reinitialising the TM. A parameter that is not specified is reset to its default rather than left unchanged.
- usable(unit)¶
Returns whether this translation unit is usable for TM.
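The length limits used by the matcher can be motivated by a simple bound: the edit distance between two strings is at least the difference in their lengths, so a candidate that is much shorter or longer than the text cannot reach the minimum similarity. The sketch below shows that idea together with a simplified matching loop. It assumes a plain Levenshtein-based percentage and `min_similarity` greater than zero; the real methods add extra margin, and every name here (`start_length`, `stop_length`, `find_matches`, the list of `(source, target)` pairs) is hypothetical rather than part of the library.

```python
import math


def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,
                current[j - 1] + 1,
                previous[j - 1] + (char_a != char_b),
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Percentage similarity, assumed to be 100 * (1 - distance / longest)."""
    longest = max(len(a), len(b))
    return 100.0 if longest == 0 else 100.0 * (1 - edit_distance(a, b) / longest)


def start_length(min_similarity, text):
    """Shortest candidate length that can still reach min_similarity."""
    return max(math.ceil(len(text) * min_similarity / 100), 1)


def stop_length(min_similarity, text, max_length=70):
    """Longest candidate length that can still reach min_similarity, capped."""
    return min(math.floor(len(text) * 100 / min_similarity), max_length)


def find_matches(text, memory, min_similarity=75, max_candidates=10, max_length=70):
    """Return (score, source, target) tuples, best first.

    memory is a list of (source, target) pairs.
    """
    shortest = start_length(min_similarity, text)
    longest = stop_length(min_similarity, text, max_length)
    scored = []
    for source, target in memory:
        if not shortest <= len(source) <= longest:
            continue  # cannot reach min_similarity; skip the costly comparison
        score = similarity(text, source)
        if score >= min_similarity:
            scored.append((score, source, target))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:max_candidates]
```

Filtering on length first is cheap, so the quadratic distance calculation only runs on candidates that have a chance of qualifying.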
- translate.search.match.sourcelen(unit)¶
Returns the length of the source string.
- class translate.search.match.terminologymatcher(store, max_candidates=10, min_similarity=75, max_length=500, comparer=None)¶
A matcher with settings specifically for terminology matching.
- buildunits(candidates)¶
Builds a list of units conforming to the base API, with the score in the comment.
- extendtm(units, store=None, sort=True)¶
Extends the memory with extra unit(s).
- Parameters:
units – The units to add to the TM.
store – Optional store from where some metadata can be retrieved and associated with each unit.
sort – Optional parameter that can be set to False to suppress sorting of the candidates list. This should probably only be used in matcher.inittm().
- getstartlength(min_similarity, text)¶
Calculates the minimum length we are interested in. Some extra margin is included because matching does not rely on plain character distance alone.
- getstoplength(min_similarity, text)¶
Calculates a length beyond which we are not interested. Some extra margin is included because matching does not rely on plain character distance alone.
- inittm(store)¶
Normal initialisation, but convert all source strings to lower case.
- matches(text)¶
Normal matching after converting text to lower case. Then replace with the original unit to retain comments, etc.
- setparameters(max_candidates=10, min_similarity=75, max_length=70)¶
Sets the parameters without reinitialising the TM. A parameter that is not specified is reset to its default rather than left unchanged.
- usable(unit)¶
Returns whether this translation unit is usable for terminology.
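The lower-casing described for the terminology matcher's inittm() and matches() can be pictured as an index keyed on the lower-cased source that still hands back the original entry. The following is a rough sketch of that pattern only. `Term` and `LowercaseTermIndex` are hypothetical names, and plain substring lookup stands in for the fuzzy comparison the real matcher performs.

```python
from dataclasses import dataclass


@dataclass
class Term:
    source: str
    target: str
    notes: str = ""


class LowercaseTermIndex:
    def __init__(self, terms):
        # Key the index on the lower-cased source, but keep the original
        # object so that notes and the original casing survive a lookup.
        self._by_lowered = {term.source.lower(): term for term in terms}

    def matches(self, text):
        """Return the original terms whose lower-cased source occurs in text."""
        lowered = text.lower()
        return [
            original
            for key, original in self._by_lowered.items()
            if key in lowered  # substring test: a deliberate simplification
        ]
```

Because the comparison runs on lower-cased copies, "FILE" in the text finds a term stored as "File", and the caller still receives the entry with its original casing and notes.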
- translate.search.match.unit2dict(unit)¶
Converts a pounit to a simple dict structure for use over the web.
terminology¶
A class that does terminology matching.