PyLadiesCon 2025

Building Digital Dictionaries for Local Languages in Indonesia with Python NLP
06/12/2025, Main Stream
Language: English

Learn how to build open-source bilingual dictionaries for Indonesia’s local languages using Python NLP libraries such as spaCy, NLTK, and Transformers. This talk explores practical methods to process low-resource languages, including hybrid rule-based and transformer models. We will discuss how to build dictionary data pipelines and add features like search, synonyms, antonyms, and lemmatization. The session also highlights the role of open-source collaboration in preserving linguistic diversity through technology.


Discover practical techniques for building bilingual digital dictionaries (Indonesian ↔ local languages) using Python NLP libraries such as spaCy, NLTK, and Transformers. This talk will cover how to design and implement hybrid models that combine rule-based and machine learning approaches, build efficient data pipelines for dictionary management, and integrate linguistic features such as synonym and antonym lookup, lemmatization, and full-text search. It also emphasizes building sustainable, community-driven projects that support local language preservation and digital accessibility.
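
To make the feature list above more concrete, here is a minimal sketch of a dictionary lookup with rule-based lemmatization, synonym and antonym fields, and a simple search. It is not the speaker's implementation: it uses only the Python standard library, and the names `Entry`, `BilingualDictionary`, `SUFFIX_RULES`, and the `fallback` hook are hypothetical. The Javanese sample entries and the suffix list are toy data for illustration only and would need review by speakers of the language. The `fallback` callable stands in for whatever statistical or transformer-based lemmatizer a hybrid system might use when the rules fail.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Entry:
    """One dictionary entry: local-language headword -> Indonesian translation."""
    headword: str
    translation: str
    synonyms: list[str] = field(default_factory=list)
    antonyms: list[str] = field(default_factory=list)


# Toy suffix list (assumption, not a complete morphology); longer suffixes first.
SUFFIX_RULES = ("ne", "ku", "mu", "e")


def rule_based_lemma(word: str, known: set[str]) -> Optional[str]:
    """Return a known headword by stripping one suffix, or None if no rule applies."""
    word = word.lower()
    if word in known:
        return word
    for suffix in SUFFIX_RULES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in known:
            return stem
    return None


class BilingualDictionary:
    def __init__(
        self,
        entries: list[Entry],
        fallback: Optional[Callable[[str], Optional[str]]] = None,
    ) -> None:
        # Index entries by lowercased headword for case-insensitive lookup.
        self.entries = {e.headword.lower(): e for e in entries}
        # Hypothetical hook for a learned lemmatizer (the "hybrid" part).
        self.fallback = fallback

    def lemmatize(self, word: str) -> Optional[str]:
        """Try the rules first; ask the fallback only when the rules give up."""
        lemma = rule_based_lemma(word, set(self.entries))
        if lemma is None and self.fallback is not None:
            lemma = self.fallback(word)
        return lemma

    def lookup(self, word: str) -> Optional[Entry]:
        """Find the entry for a possibly inflected word."""
        lemma = self.lemmatize(word)
        return self.entries.get(lemma) if lemma else None

    def search(self, query: str) -> list[Entry]:
        """Naive substring search over headwords and translations."""
        q = query.lower()
        return [
            e for e in self.entries.values()
            if q in e.headword.lower() or q in e.translation.lower()
        ]


# Illustrative sample data only.
SAMPLE_ENTRIES = [
    Entry("omah", "rumah", synonyms=["griya"]),
    Entry("gedhe", "besar", antonyms=["cilik"]),
    Entry("cilik", "kecil", antonyms=["gedhe"]),
]

dictionary = BilingualDictionary(SAMPLE_ENTRIES)
inflected = dictionary.lookup("omahe")   # suffix "e" is stripped before lookup
matches = dictionary.search("besar")     # matches on the Indonesian side
```

Trying the rules first keeps lookups cheap and predictable for regular forms, while the fallback is only consulted for words the rules cannot resolve. A real project would likely replace the substring search with a proper full-text index.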

Muh Naufal Muzhaffar is an AI and Natural Language Processing (NLP) enthusiast with a strong interest in computational linguistics and local language technology. He is currently pursuing a postgraduate degree in Informatics at the Faculty of Science and Technology, Sunan Kalijaga State Islamic University, Yogyakarta, Indonesia. With a background in programming and a focus on NLP, he actively develops digital dictionaries and language resources for Indonesia’s local languages. His work combines AI, open-source collaboration, and linguistic preservation through technology.