JuliaCon 2022 (Times are UTC)

Text Segmentation with Julia
07-28, 13:30–13:40 (UTC), Red

Introducing TextSegmentation.jl, a Julia package for text segmentation. Text segmentation divides an unstructured document that covers various content into several parts according to topic. It is an important technique that supports natural language processing tasks such as summarization, extraction, and question answering. Attendees will learn what text segmentation is and how to use the package to perform it easily.


TextSegmentation.jl (https://github.com/kawasaki-kento/TextSegmentation.jl) provides a Julia implementation of unsupervised text segmentation methods. Text segmentation divides an unstructured document that covers various content into several parts according to topic. A typical use is preprocessing for natural language processing. Natural language processing includes tasks such as summarization, extraction, and question answering, and achieving higher accuracy on them requires text preprocessing. Text segmentation helps improve the accuracy of those tasks by allowing documents to be segmented by topic. This package provides the following three text segmentation methods:

  • TextTiling
    • TextTiling is a method for finding segment boundaries based on lexical cohesion and similarity between adjacent blocks.
  • C99
    • C99 is a method for determining segment boundaries by divisive clustering.
  • TopicTiling
    • TopicTiling is an extension of TextTiling that uses the topic IDs of words in a sentence to calculate the similarity between blocks.
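To make the TextTiling idea above concrete, here is a minimal sketch written in Python for illustration only. It is not the API of TextSegmentation.jl, and every name in it (`gap_similarities`, `depth_scores`, `boundaries`, `SENTENCES`) is hypothetical. It assumes a simplified variant of TextTiling: bag-of-words blocks of `block_size` sentences, cosine similarity at each sentence gap, and a depth score that measures how far similarity dips relative to the neighbouring peaks. The cutoff (mean plus half a standard deviation) is an arbitrary choice for this toy input, and the sketch omits stemming, stop-word removal, and smoothing.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_similarities(sentences, block_size=2):
    """Similarity between the blocks on either side of each sentence gap."""
    bags = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    sims = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - block_size):gap], Counter())
        right = sum(bags[gap:gap + block_size], Counter())
        sims.append(cosine(left, right))
    return sims


def depth_scores(sims):
    """Depth of each gap: drop from the nearest peak on the left plus on the right."""
    depths = []
    for i, s in enumerate(sims):
        left_peak = s
        for v in reversed(sims[:i]):  # climb left while similarity keeps rising
            if v < left_peak:
                break
            left_peak = v
        right_peak = s
        for v in sims[i + 1:]:  # climb right while similarity keeps rising
            if v < right_peak:
                break
            right_peak = v
        depths.append((left_peak - s) + (right_peak - s))
    return depths


def boundaries(sentences, block_size=2):
    """Return, for each detected boundary, the number of sentences before it."""
    depths = depth_scores(gap_similarities(sentences, block_size))
    cutoff = mean(depths) + pstdev(depths) / 2  # illustrative cutoff, an assumption
    return [i + 1 for i, d in enumerate(depths) if d > cutoff]


# Toy input: three sentences about Julia followed by three about food.
SENTENCES = [
    "Julia is a fast programming language.",
    "The Julia language compiles code quickly.",
    "Julia code is fast to write.",
    "Pasta is a popular Italian food.",
    "Italian food uses fresh pasta and tomato.",
    "Fresh tomato sauce makes pasta tasty.",
]

print(boundaries(SENTENCES))  # a single boundary after sentence 3
```

On this toy input the similarity between adjacent blocks drops sharply at the gap between the third and fourth sentences, so that gap receives the largest depth score and is the only one reported. TopicTiling follows the same outline but, as described above, compares the topic IDs of words rather than the words themselves.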

The planned outline of the presentation is as follows:

  1. Introduction
    • I will introduce the purpose of TextSegmentation.jl and what it is useful for.
  2. Text Segmentation
    • I will explain specific methods of text segmentation.
  3. Overview of the package
    • I will introduce how to use the package and how to perform the tasks.
  4. Example
    • Using simple text data, I will show how to actually perform text segmentation with the package.
  5. Future work
    • I will share future plans for TextSegmentation.jl.

Researcher in natural language processing