2024-10-27 –, CLASS #3 - 4C
Language: English
Japanese is reportedly one of the most difficult languages for English speakers to learn.
(FSI language difficulty: https://www.fsi-language-courses.org/blog/fsi-language-difficulty/)
There are many reasons for this, including the fact that Japanese uses three types of characters (hiragana, katakana, and kanji) and that words are not separated by spaces.
In this talk, I will first introduce what makes Japanese different from many European languages.
Then I will show how Python and natural language processing libraries can be used to support Japanese language learning.
- Motivation and background (2 min)
- Goal of this talk
- Self introduction (1 min)
- Explain the difficult points of Japanese (5 min)
- 3 types of characters: Hiragana, Katakana, and Kanji
- Kanji have various readings
- The reading changes depending on the combination of kanji
- Words are not separated by spaces
- The same sequence of kanji can be used in different contexts
- Kanji have various readings (10 min)
- The kanji "日" means "day"
- The most common readings are "Hi" and "Nichi"
- But the reading changes depending on the combination of kanji
- 一月一日(Ichigatsu Tsuitachi)は元日(Ganjitsu)で昨日(Kinou)は大晦日(Omisoka)、明日(Ashita)は二日(Futsuka) ("January 1st is New Year's Day, yesterday was New Year's Eve, and tomorrow is the 2nd")
- How to get the reading with Python
- Install SudachiPy (a Japanese morphological analyzer)
- Introduction to SudachiPy
- https://github.com/WorksApplications/SudachiPy
- Basic morphological analysis process
- Command line execution
- Running in Python
- Furigana: phonetic guides to read Kanji
- Furigana in Katakana
- Can anyone read katakana?
- Furigana in Hiragana
- Convert Katakana to Hiragana with jaconv
- https://github.com/ikegami-yukino/jaconv
- Furigana in Romaji
- Convert Katakana to Romaji with jaconv
- Read sentences with text-to-speech
- gTTS or Amazon Polly with boto3
- Words are not separated by spaces (5 min)
- Can you split "すもももももももものうち"?
- すもも / も / もも / も / もも / の / うち
- Word segmentation with SudachiPy
- Get the part of speech of each word
- Translate word by word
- Summary (1 min)
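SudachiPy returns readings in katakana (for example through `Morpheme.reading_form()`), and the outline converts them to hiragana with jaconv's `kata2hira`. As a rough sketch of what that conversion involves, assuming only the Python standard library: the katakana letters U+30A1 to U+30F6 sit exactly 0x60 code points above their hiragana counterparts, so a per-character offset is enough for plain kana. This is an illustration, not jaconv's implementation; the function name `katakana_to_hiragana` is hypothetical, and symbols such as the long-vowel mark "ー" are simply left unchanged.

```python
# Katakana letters U+30A1 (ァ) .. U+30F6 (ヶ) map to hiragana U+3041 (ぁ) .. U+3096 (ゖ)
KATAKANA_START = 0x30A1
KATAKANA_END = 0x30F6
OFFSET = 0x60  # distance between a katakana letter and its hiragana counterpart


def katakana_to_hiragana(text: str) -> str:
    """Hypothetical helper: convert katakana letters to hiragana, keep everything else."""
    converted = []
    for char in text:
        code = ord(char)
        if KATAKANA_START <= code <= KATAKANA_END:
            converted.append(chr(code - OFFSET))
        else:
            # e.g. the long-vowel mark "ー" (U+30FC), kanji, or ASCII stay as-is
            converted.append(char)
    return "".join(converted)


if __name__ == "__main__":
    # Katakana readings of words from the 日 example sentence
    for reading in ["ツイタチ", "ガンジツ", "キノウ", "アシタ", "フツカ"]:
        print(reading, "->", katakana_to_hiragana(reading))  # e.g. ツイタチ -> ついたち
```

In practice jaconv also covers half-width katakana and other edge cases, so this sketch is only meant to show why the katakana-to-hiragana step is cheap.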
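The outline relies on SudachiPy for word segmentation. Morphological analyzers of this kind generally build a lattice of candidate dictionary words and choose the lowest-cost path, combining a cost for each word with a connection cost between adjacent words. The toy sketch below illustrates that idea on "すもももももももものうち". The five-word dictionary, the two part-of-speech labels, and every cost value are invented for illustration; this is not SudachiPy's dictionary, API, or actual cost model. With the same dictionary, a plain greedy longest-match segmenter would produce すもも / もも / もも / もも / の / うち, which shows why the connection costs matter.

```python
# Toy dictionary: surface form -> (part of speech, word cost). Values are invented.
DICTIONARY = {
    "すもも": ("noun", 1),    # plum
    "もも": ("noun", 1),      # peach
    "うち": ("noun", 1),      # "among" (a kind of)
    "も": ("particle", 1),    # "also / too"
    "の": ("particle", 1),    # possessive particle
}

# Toy connection costs from the previous part of speech to the next (invented).
CONNECTION_COST = {
    ("BOS", "noun"): 0,       # BOS = beginning of sentence
    ("BOS", "particle"): 10,
    ("noun", "noun"): 10,
    ("noun", "particle"): 0,
    ("particle", "noun"): 0,
    ("particle", "particle"): 10,
}


def segment(text: str) -> list[str]:
    """Return the lowest-cost segmentation of text (toy Viterbi over a word lattice)."""
    # best[i][pos] = (total cost, words) of the cheapest path covering text[:i]
    # whose last word has part of speech pos
    best: list[dict[str, tuple[int, list[str]]]] = [{} for _ in range(len(text) + 1)]
    best[0]["BOS"] = (0, [])
    for start in range(len(text)):
        for prev_pos, (prev_cost, words) in best[start].items():
            # Try every dictionary word that starts at this position
            for end in range(start + 1, len(text) + 1):
                surface = text[start:end]
                if surface not in DICTIONARY:
                    continue
                pos, word_cost = DICTIONARY[surface]
                cost = prev_cost + CONNECTION_COST[(prev_pos, pos)] + word_cost
                if pos not in best[end] or cost < best[end][pos][0]:
                    best[end][pos] = (cost, words + [surface])
    if not best[len(text)]:
        raise ValueError("no segmentation found with this dictionary")
    _, words = min(best[len(text)].values(), key=lambda entry: entry[0])
    return words


if __name__ == "__main__":
    print(" / ".join(segment("すもももももももものうち")))
    # prints: すもも / も / もも / も / もも / の / うち
```

Keeping one best path per (position, last part of speech) is what makes this a Viterbi search: the connection cost depends only on the previous word's part of speech, so no other history needs to be remembered.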
Takanori is the Chair of the PyCon JP Association (www.pycon.jp) and Co-Chair of PyCon JP 2024.
He is also a director of BeProud Inc. (www.beproud.jp), and his title is "Python Climber".
He currently teaches Python to beginners across Japan as a lecturer at Python Boot Camp (pycamp.pycon.jp).
He has also published several Python books.
He plays the trumpet, enjoys bouldering, and loves ferrets, beer, and Lego.