This mostly affects programmers who want search functionality in their applications.
A composed character in Unicode can often be represented in several different ways. For example, Ḽ can be stored as a single precomposed code point or as the letter L followed by a combining mark.
If you compare those two representations directly, you’ll notice that indeed Ḽ != Ḽ
To compare them correctly, the characters need to be normalized, that is, reduced to the same character composition.
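As a rough illustration of the problem, the Python sketch below builds both representations from explicit escapes and compares them before and after normalization. The specific code points (U+1E3C, and L followed by U+032D) are an assumption about which two forms are meant here; the variable names are purely illustrative.

```python
import unicodedata

# Two encodings of the same visible character. Which code points the text
# has in mind is an assumption; these are the two standard forms of the letter.
precomposed = "\u1E3C"   # LATIN CAPITAL LETTER L WITH CIRCUMFLEX BELOW
decomposed = "L\u032D"   # "L" + COMBINING CIRCUMFLEX ACCENT BELOW

# A plain comparison looks at code points, so the two strings differ.
raw_equal = precomposed == decomposed

# Once both sides are brought into the same form, they compare equal.
normalized_equal = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

print(raw_equal, normalized_equal)  # prints: False True
```

Both strings render identically on screen, which is why this kind of mismatch tends to show up only as a search that mysteriously finds nothing.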
The following shows how to normalize your data in various programming languages.
In most cases you don’t want to alter the stored data itself. Normalize a copy when comparing, then throw the copy away.
    import unicodedata

    # Bring the string into NFC form before comparing it
    string = unicodedata.normalize("NFC", string)
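One possible way to normalize only at comparison time, leaving the stored data untouched, is sketched below. The helper names `normalized_equal` and `search`, the `form` parameter, and the sample records are all hypothetical and only serve the illustration.

```python
import unicodedata


def normalized_equal(a, b, form="NFC"):
    """Compare two strings using temporary normalized copies of both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def search(records, query, form="NFC"):
    """Return the stored records that contain the query, whatever their composition."""
    needle = unicodedata.normalize(form, query)
    # The normalized copy of each record is used for the test and then discarded;
    # the list that is returned holds the original, unmodified strings.
    return [r for r in records if needle in unicodedata.normalize(form, r)]


# Placeholder data: one precomposed record, one decomposed record, one plain "L".
stored = ["\u1E3Cabc", "L\u032Dxyz", "Lxyz"]
hits = search(stored, "\u1E3C")
```

Here `hits` contains the first two records in their original encodings, and `stored` is left exactly as it was.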
You can replace NFC with NFD, NFKC or NFKD. Read the Python docs for the unicodedata module for more detail on what those options mean. For most cases it is enough to use the above to bring the data into the same form and perform your comparison.
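To get a feel for how the forms differ, the sketch below runs two sample characters through all four and prints the resulting code points. The sample characters and the helper names `code_points` and `all_forms` are chosen just for this illustration.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def code_points(text):
    """Render a string as a list of U+XXXX labels for easy inspection."""
    return [f"U+{ord(ch):04X}" for ch in text]


def all_forms(text):
    """Normalize the text into each of the four forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


composed = all_forms("\u1E3C")   # precomposed L with circumflex below
ligature = all_forms("\uFB01")   # LATIN SMALL LIGATURE FI

for form in FORMS:
    print(form, code_points(composed[form]), code_points(ligature[form]))
```

The composed and decomposed forms (NFC, NFD) only change how the accented letter is split up, while the compatibility forms (NFKC, NFKD) additionally replace the ligature with the plain letters "f" and "i". That extra folding can be handy for search, but it discards distinctions you may want to keep elsewhere.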