Python Conference APAC 2024

How to learn Japanese with Python
2024-10-27, CLASS #3
Language: English

Japanese is reportedly one of the most difficult languages for English speakers to learn.
(FSI language difficulty: https://www.fsi-language-courses.org/blog/fsi-language-difficulty/)
The reasons include its three scripts (hiragana, katakana, and kanji) and the fact that words are not separated by spaces.
In this talk, I will first introduce what makes Japanese different from many European languages.
Then I will show how Python and natural language processing libraries can be used to support Japanese language learning.
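
The three scripts mentioned above occupy separate Unicode ranges, so a few lines of standard-library Python can tell them apart. The following is a minimal sketch, not material from the talk: the function name `script_of` and the sample sentence are made up for illustration, and the kanji range covers only the basic CJK Unified Ideographs block.

```python
def script_of(char: str) -> str:
    """Classify one character by the Unicode block it belongs to."""
    code = ord(char)
    if 0x3040 <= code <= 0x309F:   # Hiragana block
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:   # Katakana block (includes the long-vowel mark)
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:   # CJK Unified Ideographs (basic block only)
        return "kanji"
    return "other"


# Hypothetical sample sentence that mixes all three scripts.
sentence = "昨日はカレーを食べた"
for ch in sentence:
    print(ch, script_of(ch))
```

A single short sentence can switch script several times, which is part of what makes reading hard for beginners.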


  • Motivation and background (2 min)
    • Goal of this talk
  • Self introduction (1 min)
  • Explain difficult points in Japanese (5 min)
    • 3 types of characters: Hiragana, Katakana, and Kanji
    • Kanji have various readings
      • Reading changes depending on the combination of Kanji characters
    • Words are not separated by spaces
    • The same sequence of kanji can be used in different contexts
  • Kanji have various readings (10 min)
    • On its own, the kanji "日" means "day"
      • The most common readings are "Hi" and "Nichi"
      • But the reading changes depending on the combination of kanji
      • 一月一日(Tsuitachi)は元日(Ganjitsu)で昨日(Kinou)は大晦日(Omisoka)、明日(Ashita)は二日(Futsuka)
    • How to get the reading with Python
    • Install SudachiPy (a Japanese NLP library for morphological analysis)
    • Basic morphological analysis process
      • Command line execution
      • Running in Python
    • Furigana: phonetic guides to read Kanji
    • Furigana in Katakana
      • Can anyone read katakana?
    • Furigana in Hiragana
    • Furigana in Romaji
      • Convert Katakana to Romaji with jaconv
    • Read sentences with text-to-speech
      • gTTS or Amazon Polly with boto3
  • Words are not separated by spaces (5 min)
    • Can you split "すもももももももものうち"?
      • すもも / も / もも / も / もも / の / うち
    • Word segmentation by SudachiPy
    • Get the part-of-speech of a word
    • Translate word by word
  • Summary (1 min)
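
Two ideas from the outline can be sketched with the standard library alone. This is an illustration under stated assumptions, not the talk's code: it uses neither SudachiPy nor jaconv, and the names `kata_to_hira`, `longest_match_split`, and `TOY_WORDS` are hypothetical. The first function turns a katakana reading into hiragana by shifting code points. The second is a naive longest-match splitter over a toy word list, and it shows why a real morphological analyzer is needed for the plum sentence.

```python
# Toy word list for illustration only; a real analyzer ships a large lexicon.
TOY_WORDS = {"すもも", "もも", "も", "の", "うち"}


def kata_to_hira(text: str) -> str:
    """Shift katakana letters (U+30A1..U+30F6) down by 0x60 to get hiragana."""
    return "".join(
        chr(ord(ch) - 0x60) if "ァ" <= ch <= "ヶ" else ch
        for ch in text
    )


def longest_match_split(text: str, words: set[str]) -> list[str]:
    """Greedy segmentation: at each position take the longest known word."""
    max_len = max(len(w) for w in words)
    result = []
    pos = 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + size]
            # Fall back to a single character when no known word matches.
            if size == 1 or piece in words:
                result.append(piece)
                pos += size
                break
    return result


# Segmentation given in the outline above.
expected = ["すもも", "も", "もも", "も", "もも", "の", "うち"]
text = "".join(expected)

greedy = longest_match_split(text, TOY_WORDS)
print(" / ".join(greedy))        # greedy result
print(greedy == expected)        # False: greedy matching misses the particles
print(kata_to_hira("ツイタチ"))  # katakana reading shown as hiragana
```

The greedy splitter keeps choosing もも and never produces the particle も, so its output differs from the intended reading. A morphological analyzer resolves this by using part-of-speech and connection information rather than word length alone.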

Takanori is the Chair of PyCon JP Association (www.pycon.jp) and Co-Chair of PyCon JP 2024.
He is also a director of BeProud Inc. (www.beproud.jp), and his title is "Python Climber".
He currently teaches Python to beginners as a lecturer at Python Boot Camp (pycamp.pycon.jp) all over Japan.
In addition, he has published several Python books.
He plays the trumpet, enjoys bouldering, and loves ferrets, beer, and Lego.