JuliaCon 2020 (times are in UTC)

Natural Language Processing in Julia
07-31, 18:30–19:00 (UTC), Purple Track

The JuliaText ecosystem provides various packages for working with human languages. In this talk, we show the usage of these JuliaText packages with Flux.jl for Natural Language Processing (NLP) with a focus on deep learning-based approaches.


Natural Language Processing (NLP) enables computers to read, analyse and understand human languages. Over the past decade, NLP has grown tremendously owing to milestones such as word embeddings, neural networks for NLP, attention, and pre-trained language models. The JuliaText packages, together with Flux, make deep learning for NLP easy in Julia.

Packages

We will start with an overview of natural language processing.
Then we take up the task of sentiment analysis and discuss the following packages:
- WordTokenizers.jl provides various high-speed tokenizers and APIs for writing custom tokenizers for natural languages.
- CorpusLoaders.jl contains a variety of (lazy) loaders for NLP corpora.
- Embeddings.jl for working with word embeddings.
- Flux.jl for neural networks.
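
To make the pipeline concrete, here is a minimal, language-neutral sketch of how tokenization and word embeddings combine into a fixed-size feature vector that a sentiment classifier could consume. It is written in plain Python purely for illustration and is not the API of WordTokenizers.jl, Embeddings.jl or Flux.jl. The function names, the regular expression and the tiny embedding table are all hypothetical.

```python
import re

# Toy 3-dimensional embedding table. The words and numbers are made up
# for illustration; real pre-trained embeddings have hundreds of dimensions.
EMBEDDINGS = {
    "great": [0.9, 0.1, 0.0],
    "movie": [0.0, 0.5, 0.5],
    "boring": [-0.8, 0.2, 0.1],
}
UNKNOWN = [0.0, 0.0, 0.0]  # assumed fallback for out-of-vocabulary tokens


def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def embed(tokens):
    """Look up one vector per token, using UNKNOWN for missing words."""
    return [EMBEDDINGS.get(token, UNKNOWN) for token in tokens]


def mean_vector(vectors):
    """Average the token vectors into a single sentence-level vector."""
    if not vectors:
        return list(UNKNOWN)
    return [sum(column) / len(vectors) for column in zip(*vectors)]


tokens = tokenize("Great movie!")
features = mean_vector(embed(tokens))
```

Averaging is only one simple way to pool token vectors; it discards word order, which is the motivation for sequence models such as recurrent networks.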

Next, we will move on to some other NLP pipelines and discuss some APIs from TextAnalysis.jl.

Talk

Attendees will gain working knowledge of how to apply these packages to NLP in Julia.
The talk will encompass the following:
- Tokenizers (Sentence splitters and word tokenizers) in WordTokenizers.jl
- Word Embeddings (mapping words to vectors of numbers) using Embeddings.jl
- Recurrent Neural Networks and Language models.
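
As a rough illustration of the recurrent idea in the last item, the sketch below runs a simple (Elman-style) recurrent cell over a sequence of word vectors, carrying a hidden state from one token to the next. It is plain Python for illustration only, under the assumption of a single tanh cell; it is not how Flux.jl implements its recurrent layers, and the names rnn_step, run_rnn, W_xh, W_hh and b are hypothetical.

```python
import math


def rnn_step(x, h, W_xh, W_hh, b):
    """One recurrent step: new_h[j] = tanh(b[j] + W_xh[j].x + W_hh[j].h)."""
    new_h = []
    for j in range(len(h)):
        total = b[j]
        total += sum(W_xh[j][i] * x[i] for i in range(len(x)))
        total += sum(W_hh[j][k] * h[k] for k in range(len(h)))
        new_h.append(math.tanh(total))
    return new_h


def run_rnn(inputs, W_xh, W_hh, b):
    """Feed the input vectors in order, starting from a zero hidden state."""
    h = [0.0] * len(b)
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b)
    return h


# Two 2-dimensional input vectors and a hidden state of size 1.
# The weights are arbitrary illustrative values, not trained parameters.
W_xh = [[1.0, 0.0]]
W_hh = [[0.5]]
b = [0.0]
final_state = run_rnn([[0.5, 2.0], [0.1, -1.0]], W_xh, W_hh, b)
```

The final hidden state summarises the whole sequence in order, so it can be passed to a classifier for sentiment analysis or, when read out at every step, used to predict the next word in a language model.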

All notebooks, along with model weights, are available at https://github.com/Ayushk4/JuliaCon20_Talk

Undergraduate student majoring in Computer Science and Engineering at Indian Institute of Technology Kharagpur.