29 October 2023 –, Main program
(EN)
Demonstration of Wikidata's lexicographical data being used by Ninai/Udiron in various ways to generate Abstract Wikipedia article text
(ZH)
展示 Ninai/Udiron 如何以各種方式使用維基數據的詞位資料來產生抽象維基百科條目內容。
(EN)
As Wikifunctions expands after its launch, there will be many opportunities to explore how it can be used to realize the Abstract Wikipedia project, that is, which functions will need to be written to handle the many simple and complex tasks the project involves. Of the numerous text generation systems that might be considered for addition to Wikifunctions, only one 1) takes specific and direct advantage of Wikidata's information and 2) is specifically designed to be as easy as possible to add to Wikifunctions. It is this system that will be demonstrated here.
This session will first exhibit how the lexicographical data on Wikidata, namely its lexemes ("items for words") with their senses ("word meanings") and forms ("word inflections"), can best be set up for use with any natural language generation system. It will detail how one can model, in a number of different languages and in as many different ways, information about the structure of different types of phrases and sentences, the usage styles and circumstances of word meanings, and the different forms and patterns words can take on. The importance of Wikidata item links in many circumstances will also be highlighted here, not just for linking word meanings across languages but also for ensuring that, if a phrase in a particular language has no direct equivalent or has multiple such equivalents in another language, appropriate choices can be made based on those item links.
Throughout this exhibition, potential abstract content will be processed with the Ninai/Udiron living text generation system to show how the details in the displayed Wikidata items and lexemes affect that system's output. The session will also clarify what sorts of functionality, whether in the processing of individual abstract content elements or in the manipulation of sentence structure, are expected to be contributed by the community, while acknowledging areas in which, due to limits in the knowledge of the system's sole author, improvements to functionality for certain languages are indeed necessary.
(This session is largely a follow-up to the WikidataCon 2021 session "Ninai (நினை) and Udiron (উদীরণ): text generation with Wikidata items and lexemes", given how much the described software has evolved since then.)
(ZH)
在維基函數發布之後我們將會遭遇許多的機會來探索其在實現抽象維基百科專案的潛力,即探索我們需要什麼函數來解決那些複雜或簡單的問題。在眾多可能被收錄至維基函數中的敘述產生系統,只有
- 可以符合維基數據資料獨特的優勢和特性
- 可以相對簡易的被收錄至維基函數之中,才會被在此展示。
本場次中,首先我們將會展示維基數據的詞位資料 (描述字詞的項目) 和其詞意 (字詞的內涵) 以及其詞形 (字詞的變體) 需如何建立以最佳的供任何自然語言產生系統所使用。我們將會詳細說明如何在多種不同的語言中,以多種不同的方式,對有關不同類型的短句和句子的結構、單詞含義的使用風格和情況,以及單詞可以採取的不同形式和模式的信息進行建模。我們也會在此強調,維基數據項目應當連結至許多重要的使用情況,不僅僅是連結各語言之間相同意涵的單詞,更要確保在特定語言中與個別語言間缺乏對等翻譯,或是擁有數個相同翻譯時,仍然可以透過這些重要連結做出最適當的選擇。
在本次的展示,我們將會使用 Ninai/Udiron 敘述產生系統來處理潛在的抽象條目內容,以展示維基數據的項目和詞位資料將如何影響系統的輸出結果。這將可以協助釐清,不論是在處理單一抽象條目內容或是調整語句結構時需要社群貢獻的功能與函數;並且,受限於系統的單一作者性,這些功能在面對特定語言時必定有其侷限性且需要社群的協助進行改進。
(有鑑於所展示之軟體經過這些年的發展,本場次之內容大致上可以視作 2021 WikidataCon “Ninai (நினை) and Udiron (উদীরণ): text generation with Wikidata items and lexemes” 的後續。)
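English (en)
Difficulty level – Intermediate
Target audience – anyone interested in Abstract Wikipedia or Wikidata lexemes
Presentation venue – International, online only (friendly to European and American time zones)
Presentation format – Pre-recorded, online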