Corpus
Khmer words
Load a list of Khmer words from ICU data.
Returns:
List[str]: A list of Khmer words.
from pykhmernlp.corpus import km_words
words = km_words()  # separate name, so the imported function is not shadowed
print(f"Length of Khmer words {len(words)}")
print(words[:100])
# Length of Khmer words 81028
# ['ក', 'កក', 'កកកុញ', 'កកកុះ', ...]
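Because the function returns a plain Python list, repeated membership checks are cheaper against a set. The following is a minimal sketch that assumes only the documented `List[str]` return type. The short `sample_words` list stands in for the real `km_words()` result so the snippet runs without the library, and `build_vocabulary` and `is_known_word` are hypothetical helpers, not part of pykhmernlp.

```python
from typing import Iterable, Set

# Stand-in for the list returned by km_words(); values taken from the sample output above.
sample_words = ['ក', 'កក', 'កកកុញ', 'កកកុះ']


def build_vocabulary(words: Iterable[str]) -> Set[str]:
    """Convert the word list to a set for average O(1) membership checks."""
    return set(words)


def is_known_word(word: str, vocabulary: Set[str]) -> bool:
    """Hypothetical helper: True if the word appears in the vocabulary."""
    return word in vocabulary


vocabulary = build_vocabulary(sample_words)
print(is_known_word('កក', vocabulary))     # True
print(is_known_word('hello', vocabulary))  # False
```

With the real corpus, `build_vocabulary(km_words())` would be built once and reused, since scanning an 81,028-item list on every lookup is linear in its length.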
English words
Load a list of English words from the NLTK library.
Returns:
List[str]: A list of English words.
from pykhmernlp.corpus import en_words
words = en_words()  # separate name, so the imported function is not shadowed
print(f"Length of English words {len(words)}")
print(words[:100])
# Length of English words 235892
# ['elcaja', 'problockade', 'chalkiness',...]
Khmer to Khmer Dictionary
Search for a Khmer word in the Khmer dictionary.
Args:
word (str): The Khmer word to search for.
Returns:
List[Dict[str, str]]: A list of dictionaries representing entries in the Khmer dictionary
corresponding to the provided word.
Each dictionary contains keys for
'word' (main word),
'pronounce' (pronunciation),
'pos' (part of speech),
'definition',
'example'.
from pykhmernlp.corpus import km2km_dict
entries = km2km_dict('កក់ក្ដៅ')
print(entries)
# Result:
# [{
# 'word': 'កក់ក្ដៅ',
# 'pronounce': '[កក់-ក្ដៅ]',
# 'pos': 'កិ.',
# 'definition': 'ក្ដៅល្មម ក្ដៅស្រួល :',
# 'example': 'ភួយនេះកក់ក្ដៅណាស់។'
# }]
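Since each entry is a dictionary with the five keys listed above, it can be rendered for display with ordinary dictionary access. The sketch below is illustrative only: `format_entry` is a hypothetical helper, and the `entries` literal copies the sample result rather than calling `km2km_dict`, so the snippet runs without the library.

```python
from typing import Dict, List

# Sample entry copied from the result shown above; a real call would be
# entries = km2km_dict('កក់ក្ដៅ')
entries: List[Dict[str, str]] = [{
    'word': 'កក់ក្ដៅ',
    'pronounce': '[កក់-ក្ដៅ]',
    'pos': 'កិ.',
    'definition': 'ក្ដៅល្មម ក្ដៅស្រួល :',
    'example': 'ភួយនេះកក់ក្ដៅណាស់។',
}]


def format_entry(entry: Dict[str, str]) -> str:
    """Hypothetical helper: render one entry as a single line.

    Uses .get() so a missing key yields an empty string instead of a KeyError.
    """
    fields = ('word', 'pronounce', 'pos', 'definition', 'example')
    return ' | '.join(entry.get(field, '') for field in fields)


for entry in entries:
    print(format_entry(entry))
```

Whether real entries can omit keys such as `'example'` is not stated here, so the defensive `.get()` is an assumption rather than documented behavior.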
English to English Dictionary
Search for an English word in the English dictionary.
Args:
word (str): The English word to search for.
Returns:
List[Dict[str, str]]: A list of dictionaries representing entries in the English dictionary
corresponding to the provided word.
Each dictionary contains keys for
'word',
'pos' (part of speech),
'definition'
from pykhmernlp.corpus import en2en_dict
entries = en2en_dict('BooK')
print(entries)
# Result:
# [{
# 'word': 'book',
# 'pos': 'n.',
# 'definition': 'A collection of sheets of paper, or similar material....
# }]
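The return type is a list, so a single word may yield several entries. One way to organize them is to group definitions by part of speech, as in the minimal sketch below. `group_by_pos` is a hypothetical helper, not part of pykhmernlp. The first sample entry mirrors the result above with its definition left truncated as shown, and the second entry is a made-up placeholder included only to show the grouping.

```python
from collections import defaultdict
from typing import Dict, List

# Stand-in for en2en_dict('BooK'); replace with the real call when the library is installed.
entries: List[Dict[str, str]] = [
    {'word': 'book', 'pos': 'n.',
     'definition': 'A collection of sheets of paper, or similar material....'},
    # Placeholder entry, NOT real dictionary data.
    {'word': 'book', 'pos': 'v.',
     'definition': '(placeholder definition)'},
]


def group_by_pos(items: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Hypothetical helper: map each part of speech to its definitions."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        grouped[item['pos']].append(item['definition'])
    return dict(grouped)


for pos, definitions in group_by_pos(entries).items():
    print(pos, len(definitions))
```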