Ninai (நினை) and Udiron (উদীরণ): text generation with Wikidata items and lexemes
10-30, 14:10–14:30 (UTC), Room 2

In the lead-up to Abstract Wikipedia's launch, a sufficient body of linguistic information must be in place so that different sets of functions can work together to produce natural-sounding text. Assembling that information requires more thorough consideration of certain linguistic aspects sooner rather than later.

This session introduces Ninai and Udiron, two related tools with which functions can be built to generate text from the linguistic information available for a given language. It discusses three aspects and how each can be handled in these tools: the compositionality and manipulability of lexical units, the breadth and interconnectedness of meaning units, and the treatment of variation among a language’s lects, broadly construed.

The handling of these aspects will be presented with special reference to Bengali and a number of other languages.


Link to notes

https://etherpad.wikimedia.org/p/WikidataCon2021-Sisterprojects-Languages

What will the participants take away from this session?
  • Participants will get a brief look at some of the decisions made in setting up lexicographical data for a particular language.
  • Participants will gain a better understanding of where best to focus their attention on-wiki and off-wiki in order to ease the development of text generation systems for their languages.
  • Participants will learn the basics of how Ninai and Udiron work for the languages they currently support.
Language

English

Recording

Yes
