Utils

remove_duplicate_spaces

Remove duplicate spaces. Consecutive spaces are replaced with a single space.

Consecutive newline characters, including empty lines, are replaced
with a single newline character.

:param str text: input text
:return: text without duplicated spaces and newlines
:rtype: str

from pykhmernlp import remove_duplicate_spaces

remove_duplicate_spaces('ក    គ    ថ')

# Result: 
# 'ក គ ថ'
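
For intuition, the behaviour described above can be approximated with two regular expressions. This is a minimal sketch, not pykhmernlp's actual implementation: the name `collapse_whitespace_sketch` and both patterns are assumptions made for illustration, and the sketch does not trim leading or trailing whitespace because the description does not mention it.

```python
import re

# Hypothetical patterns, chosen for illustration only.
_MULTI_SPACE = re.compile(r" {2,}")           # two or more consecutive spaces
_BLANK_LINES = re.compile(r"[ \n]*\n[ \n]*")  # a newline plus surrounding spaces/newlines


def collapse_whitespace_sketch(text: str) -> str:
    """Approximate the documented behaviour of remove_duplicate_spaces."""
    text = _MULTI_SPACE.sub(" ", text)    # runs of spaces become one space
    text = _BLANK_LINES.sub("\n", text)   # runs of newlines/empty lines become one newline
    return text


print(collapse_whitespace_sketch("ក    គ    ថ"))           # ក គ ថ
print(repr(collapse_whitespace_sketch("a  b\n\n\n c")))    # 'a b\nc'
```

Compiling the patterns once at module level avoids recompiling them on every call, which may matter when cleaning many short strings.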

remove_zero_width_characters

Remove zero-width characters.

These invisible characters can cause results that look wrong from the
user's point of view, for example two strings that appear identical but
do not compare equal. Removing them makes string matching more robust.

Characters to be removed:

    * Zero-width space (ZWSP)
    * Zero-width non-joiner (ZWNJ)

:param str text: input text
:return: text without zero-width characters
:rtype: str

from pykhmernlp import remove_zero_width_characters

text1 = 'ក\u200b\u200cថ'
text2 = 'ក\u200b\u200b\u200c\u200cថ'
text3 = 'កគថ'

print(remove_zero_width_characters(text1))
print(remove_zero_width_characters(text2))
print(remove_zero_width_characters(text3))

# Result
# កថ
# កថ
# កគថ
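
To illustrate why removing these characters makes matching more robust, the sketch below deletes U+200B (zero-width space) and U+200C (zero-width non-joiner) with a translation table and compares strings before and after. It is an assumption-laden illustration, not the library's implementation: `strip_zero_width_sketch` and the constant names are hypothetical.

```python
ZWSP = "\u200b"  # zero-width space
ZWNJ = "\u200c"  # zero-width non-joiner

# Hypothetical helper table: the third argument of str.maketrans
# lists the characters to delete during translation.
_ZERO_WIDTH_TABLE = str.maketrans("", "", ZWSP + ZWNJ)


def strip_zero_width_sketch(text: str) -> str:
    """Approximate the documented behaviour of remove_zero_width_characters."""
    return text.translate(_ZERO_WIDTH_TABLE)


raw = "ក\u200bថ"  # renders like "កថ" but contains a hidden ZWSP

print(raw == "កថ")                           # False
print(len(raw))                              # 3
print(strip_zero_width_sketch(raw) == "កថ")  # True
```

A translation table handles both characters in one pass over the string; chained `str.replace` calls would work equally well for a list this short.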