PythonAsia 2026

[Keynote] Architectures of Ambiguity: Mapping the Technical Hurdles of Cultural Sensitivity in Localized LLMs
2026-03-21, Teresa Yuchengco Auditorium (Main Hall)

As Large Language Models (LLMs) expand into global markets, they often hit a "Nuance Gap": a failure to distinguish literal meaning from cultural context. This presentation examines the technical hurdles of building culturally competent AI through two lenses: linguistic ambiguity (sarcasm) and localized safety (toxicity). Using the Philippines as a case study, we identify four critical hurdles: (1) the Linguistic Inversion Problem, where sarcasm flips intended sentiment; (2) the Context Vacuum, where text lacks the "cultural scaffolding" needed for interpretation; (3) the Data Desert of low-resource languages; and (4) the Western-centricity of standard safety filters. We propose a roadmap for researchers to move beyond literal translation toward AI that respects the "unspoken" and "unseen" nuances of regional identities.


This talk examines the technical and socio-linguistic hurdles of developing culturally sensitive Large Language Models (LLMs) for the Philippines, a high-stakes digital environment where roughly 51% of citizens struggle to identify online misinformation. Using two major research pillars, FilSarcasm (sarcasm detection) and LLigtas (toxicity and safety), we demonstrate why "grammatical correctness" is insufficient for safe AI deployment in Southeast Asia.

The talk is structured around the "Architectures of Ambiguity" that emerge when global, Western-centric models encounter localized, high-context languages.

Talk Outline
1. The Linguistic Inversion Problem (Sarcasm)
Sarcasm acts as a "sentiment inverter," often flipping a literally positive statement into a sharp negative critique. Rule-based and traditional machine-learning systems often fail because they lack the "paralinguistic cues" (tone of voice, facial expressions) humans use to detect irony. We introduce FilSarcasm, a novel benchmark of 10,000 political tweets. A key technical innovation is the use of "Cultural Scaffolding": leveraging high-resource models like Gemini 2.5 Pro to generate political and social context that "teaches" smaller models how to resolve linguistic ambiguity.
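The "sentiment inverter" idea can be sketched in a few lines. This is a toy illustration only, not the FilSarcasm pipeline: the detector below is a naive word-overlap heuristic standing in for a trained classifier, and all names (`detect_sarcasm`, `resolved_sentiment`, the example tweets) are hypothetical. The `cultural_context` parameter hints at where model-generated "cultural scaffolding" would be appended to the input.

```python
# Toy sketch: sarcasm as a "sentiment inverter".
# If a (stand-in) sarcasm detector fires, the literal sentiment label flips.

LITERAL_SENTIMENT = {
    "Great job, traffic is worse than ever!": "positive",  # literally praise
    "The new bridge finally opened.": "positive",
}

def detect_sarcasm(text: str, cultural_context: str = "") -> bool:
    """Naive stand-in for a trained classifier. A real system would use a
    model fine-tuned on a sarcasm benchmark; generated cultural context
    ("cultural scaffolding") would be concatenated to the input here."""
    combined = text + " " + cultural_context
    positive = {"great", "wonderful", "finally"}
    negative = {"worse", "traffic", "flood"}
    words = {w.strip("!,.").lower() for w in combined.split()}
    # Heuristic for illustration: praise words next to a grievance word
    # suggest the literal sentiment is inverted.
    return bool(words & positive) and bool(words & negative)

def resolved_sentiment(text: str) -> str:
    literal = LITERAL_SENTIMENT[text]
    if detect_sarcasm(text):
        return "negative" if literal == "positive" else "positive"
    return literal

print(resolved_sentiment("Great job, traffic is worse than ever!"))  # negative
print(resolved_sentiment("The new bridge finally opened."))          # positive
```

The point of the sketch is structural: sentiment resolution needs a second, context-aware signal on top of literal polarity, which is exactly what word-level systems lack.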

2. Localizing Toxicity and Safety (LLigtas)
While global models include safety filters, these are often WEIRD-centric (Western, Educated, Industrialized, Rich, and Democratic), causing them to miss localized "dog whistles" and culturally specific harms. There is a critical gap in assessing cultural safety, specifically in the domains of toxicity, subjective topics, and emotional harm in the Filipino context. The LLigtas project develops a manually curated benchmark of culturally relevant prompts. We evaluate models across eight specific harm categories, using human evaluators to establish a "ground truth" that respects local norms and beliefs.
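Scoring a benchmark like this reduces to comparing model verdicts against human ground truth per harm category. The sketch below is a minimal, hypothetical version of that bookkeeping; the category names, labels, and records are made up for illustration and do not come from LLigtas.

```python
# Hypothetical per-category scoring for a human-annotated safety benchmark:
# each record pairs a harm category with the human ground-truth label and
# the model's label; we report agreement per category.
from collections import defaultdict

def per_category_accuracy(records):
    """records: iterable of (category, human_label, model_label) tuples."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for category, human, model in records:
        totals[category] += 1
        hits[category] += int(human == model)
    return {c: hits[c] / totals[c] for c in totals}

records = [
    ("toxicity", "harmful", "harmful"),
    ("toxicity", "harmful", "safe"),      # a miss the filter should catch
    ("emotional_harm", "safe", "safe"),
]
print(per_category_accuracy(records))  # {'toxicity': 0.5, 'emotional_harm': 1.0}
```

Breaking agreement out by category is what exposes WEIRD-centric blind spots: a filter can score well overall while failing badly on one locally salient harm type.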

Key Technical Challenges Explored:
1. The "Data Desert." Addressing the extreme scarcity of high-quality, annotated text corpora for low-resource languages like Filipino.
2. Cross-Lingual Transfer Learning. Investigating sequential training pipelines—such as pre-fine-tuning on the English SARC dataset using Low-Rank Adaptation (LoRA) before adapting to Filipino.
3. Annotation Subjectivity. The difficulty of establishing "inter-rater reliability" for subjective concepts like sarcasm, where human annotators may disagree based on their own cultural perspectives.
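The low-rank adaptation step in challenge 2 can be shown with plain arithmetic. This is a minimal, pure-Python sketch of the LoRA idea, not the actual training pipeline: the pretrained weight `W` stays frozen while only a small correction `(alpha / r) * B @ A` is trained, and the toy matrix sizes here are invented for illustration.

```python
# Minimal sketch of the low-rank update behind LoRA (Low-Rank Adaptation).

def matmul(M, N):
    """Multiply two matrices given as lists of rows."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def lora_weight(W, A, B, alpha, r):
    """Effective weight: W + (alpha / r) * B @ A."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 2.0], [3.0, 4.0]]   # frozen pretrained weight (toy 2x2)
A = [[0.1, 0.0]]               # trainable down-projection (rank r = 1)
B = [[0.0], [0.0]]             # trainable up-projection, zero-initialized

# Zero-initializing B makes the adapter an exact no-op before training:
assert lora_weight(W, A, B, alpha=2, r=1) == W

# After a hypothetical gradient step, only the small A and B move:
B = [[1.0], [0.0]]
print(lora_weight(W, A, B, alpha=2, r=1))  # [[1.2, 2.0], [3.0, 4.0]]
```

Because only `A` and `B` are trained, the same frozen backbone can first be adapted on the English SARC data and then re-adapted to Filipino at a small fraction of full fine-tuning cost.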
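Challenge 3 is usually quantified with Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below computes it for two annotators labeling tweets as sarcastic or literal; the labels are made-up toy data, not FilSarcasm annotations.

```python
# Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement).
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label at random,
    # given each rater's own label frequencies.
    labels = set(freq_a) | set(freq_b)
    expected = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

a = ["sarcastic", "sarcastic", "literal", "literal", "sarcastic", "literal"]
b = ["sarcastic", "literal",   "literal", "literal", "sarcastic", "sarcastic"]
print(round(cohens_kappa(a, b), 3))  # 0.333
```

A kappa this low would signal that the annotation guidelines, or the annotators' cultural frames, diverge too much for the labels to serve as reliable ground truth.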

Conclusion. We conclude by framing the Philippines as a proxy for other Southeast Asian nations. The talk provides a roadmap for moving beyond literal translation toward culturally aligned AI that can mitigate online disinformation and protect users in diverse linguistic landscapes.


Category: AI/ML | Audience Level: Beginner

Chari Cheng is the Associate Dean and a professor at the College of Computer Studies, De La
Salle University. A local pioneer in her field, she co-founded Senti AI and is a recognized expert
in Natural Language Processing (NLP). Since 2003, her work has been driven by a passion for
the digitization and better representation of Philippine languages.

Her contributions to the field have been widely recognized. She contributed to a 2020 Asian
Development Bank project that developed NLP tools for monitoring the Sustainable
Development Goals (SDGs). For her work on multilingual machine translation, she received the
2024 Imminent Research Grant from Translated, Inc., as the only Asian recipient that year. This
year, she received two grants from Google to support her team’s work on developing
models for Philippine languages.

Beyond academia, Chari is an active consultant on AI system design, evaluation, and impact
assessment. She is also a strong advocate for community collaboration, regularly sharing her
team’s resources through talks and open repositories to support the broader NLP community.