2021-07-25, 14:00–17:00 (UTC), Green
Parsers are programs that break apart strings matching a grammar
and transform them into structured representations.
I demonstrate composing regular expressions, EBNF grammars and CombinedParsers.jl constructors to build a slick message broker system inspired by Apache Kafka:
Log lines and other messages are parsed into Julia types.
Parsed instances are brokered by Julia's multiple dispatch into different data sinks (git-managed CSV, SearchLight.jl).
Step by step, I show the available options for defining CombinedParsers that process different message formats (e.g. log lines) and transform them into Julia result_types.
The examples demonstrate that Julia's dispatch on parsed result_types straightforwardly provides a slick and powerful platform for complex string-processing workflows, such as message brokering similar to Apache Kafka:
Julia's multiple dispatch is easier to write and executes faster than conditional programming patterns of the form "if this kind of thing, then do x" in Java-based Kafka.
The demonstration exemplifies dispatch into different data sinks like git-managed CSV and text files, SearchLight.jl, and even Telegram.jl bot alerts.
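Below is a minimal sketch of the parse-then-dispatch idea. It is written in Python rather than Julia and does not use CombinedParsers.jl. A regular expression parses log lines into typed records, and functools.singledispatch routes each record to a sink based on its type. Several pieces are hypothetical: the log format, the InfoLog/ErrorLog types, and the in-memory lists that stand in for CSV files or chat alerts. Python's singledispatch selects on the first argument only, so it merely approximates Julia's multiple dispatch.

```python
import re
from dataclasses import dataclass
from functools import singledispatch

# Hypothetical log-line format: "<LEVEL> <service>: <message>"
LOG_LINE = re.compile(r"(?P<level>INFO|ERROR) (?P<service>\w+): (?P<message>.*)")


@dataclass
class InfoLog:
    service: str
    message: str


@dataclass
class ErrorLog:
    service: str
    message: str


def parse_line(line: str):
    """Parse one log line into a typed record, or return None if it does not match."""
    m = LOG_LINE.fullmatch(line)
    if m is None:
        return None
    record_type = InfoLog if m["level"] == "INFO" else ErrorLog
    return record_type(m["service"], m["message"])


# In-memory stand-ins for real sinks (CSV file, chat-bot alert).
csv_rows: list[list[str]] = []
alerts: list[str] = []


@singledispatch
def broker(record) -> None:
    """Fallback: records without a registered sink (including None) are ignored."""


@broker.register
def _(record: InfoLog) -> None:
    csv_rows.append([record.service, record.message])


@broker.register
def _(record: ErrorLog) -> None:
    csv_rows.append([record.service, record.message])
    alerts.append(f"{record.service} failed: {record.message}")


for line in ["INFO api: started", "ERROR db: connection lost", "garbage"]:
    broker(parse_line(line))
```

With this design, supporting a new message type means adding a record type and registering a handler for it. No if/else chain has to be extended.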
The workshop details the use of grammar languages supported by CombinedParsers.jl:
You can conveniently compose existing EBNF grammars with PCRE regular expressions and CombinedParsers' Julia constructors to create fast, compiled, pure-Julia (also recursive) parsers.
Parsers defined from regular expressions and EBNF produce nested (named) tuples by default.
Users can inject transformation functions into any (sub-)parser after defining it in EBNF/PCRE.
Alternatively, the CombinedParsers Julia syntax equivalent to a PCRE/EBNF grammar can be printed and amended with transformations.
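The following Python sketch illustrates the parser-combinator idea behind composing grammars and injecting transformations after definition. The helper names regex, seq, either, transform and parse_all are hypothetical and do not reproduce the CombinedParsers.jl API. By default, a sequence yields a nested tuple of matched strings, and transform attaches a function to a parser that has already been defined.

```python
import re
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, pos) and returns (value, new_pos), or None on failure.
Result = Optional[Tuple[Any, int]]
Parser = Callable[[str, int], Result]


def regex(pattern: str) -> Parser:
    """Leaf parser: match a regular expression at the current position."""
    compiled = re.compile(pattern)

    def parse(text: str, pos: int) -> Result:
        m = compiled.match(text, pos)
        return (m.group(0), m.end()) if m else None

    return parse


def seq(*parsers: Parser) -> Parser:
    """Match all parsers in order; the value is a tuple of sub-values."""

    def parse(text: str, pos: int) -> Result:
        values = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None
            value, pos = r
            values.append(value)
        return tuple(values), pos

    return parse


def either(*parsers: Parser) -> Parser:
    """Try alternatives in order and return the first success."""

    def parse(text: str, pos: int) -> Result:
        for p in parsers:
            r = p(text, pos)
            if r is not None:
                return r
        return None

    return parse


def transform(parser: Parser, f: Callable[[Any], Any]) -> Parser:
    """Attach a transformation to an existing parser without redefining it."""

    def parse(text: str, pos: int) -> Result:
        r = parser(text, pos)
        return None if r is None else (f(r[0]), r[1])

    return parse


def parse_all(parser: Parser, text: str) -> Any:
    """Run a parser and require that it consumes the whole input."""
    r = parser(text, 0)
    if r is None or r[1] != len(text):
        raise ValueError(f"no full match for {text!r}")
    return r[0]


number = regex(r"\d+")
pair = seq(number, regex(r"\s*,\s*"), number)  # value: nested tuple of strings

# Inject transformations after the fact, without touching the grammar.
int_pair = transform(pair, lambda t: (int(t[0]), int(t[2])))
word = transform(regex(r"[a-z]+"), str.upper)
value = either(int_pair, word)
```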
For improved performance, lazy transformations allow access to parts of a parsed string without transforming the full parsing result (similar to LazyJSON.jl).
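As a rough Python analogue of a lazy transformation, the sketch below keeps the regular-expression match and converts each field only when it is first accessed. Caching is handled by functools.cached_property. The semicolon-separated record format is a hypothetical example. This illustrates the idea only and does not show how CombinedParsers.jl or LazyJSON.jl implement it.

```python
import re
from functools import cached_property

# Hypothetical record format: "<id>;<timestamp>;<comma-separated payload>"
RECORD = re.compile(r"(?P<id>\d+);(?P<ts>\d+);(?P<payload>.*)")


class LazyRecord:
    """Keep only the regex match; convert each field on first access."""

    def __init__(self, line: str) -> None:
        match = RECORD.fullmatch(line)
        if match is None:
            raise ValueError(f"not a record: {line!r}")
        self._match = match

    @cached_property
    def id(self) -> int:
        return int(self._match["id"])

    @cached_property
    def ts(self) -> int:
        return int(self._match["ts"])

    @cached_property
    def payload(self) -> list[str]:
        # Possibly the expensive part: only split when a consumer asks for it.
        return self._match["payload"].split(",")
```

For example, reading LazyRecord("7;1627221600;a,b,c").id converts only the id field. The payload is split only if it is requested.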
Benchmarks and standards compliance are reported based on extensive unit tests.
The performance of CombinedParsers competes with the PCRE C library, one of the fastest regex libraries available.
This is achieved by leveraging the excellent julia compiler with generated functions, multiple dispatch and parametric types.
CombinedParsers supports lazily iterating over all valid parsings when the parse is not unique, as well as the TextParse interface for using CombinedParsers in packages such as CSV.jl.
Other parsing packages (Automa.jl, ParserCombinator.jl, Lerche.jl), current limitations, and considerations for further optimization will be discussed.
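To illustrate what lazily iterating all valid parsings can mean, here is a small Python generator that backtracks over an ambiguous grammar. The grammar splits a string into a sequence of words from a hypothetical vocabulary, and the generator yields each complete parsing on demand. It is a plain backtracking sketch, not the CombinedParsers.jl implementation.

```python
from typing import Iterator


def all_parsings(text: str, words: set[str], pos: int = 0) -> Iterator[tuple[str, ...]]:
    """Lazily yield every way to split text[pos:] into a sequence of known words."""
    if pos == len(text):
        yield ()
        return
    for end in range(pos + 1, len(text) + 1):
        head = text[pos:end]
        if head in words:
            # Backtrack: each valid head may be followed by several valid tails.
            for rest in all_parsings(text, words, end):
                yield (head,) + rest


VOCAB = {"no", "now", "here", "where"}
parsings = all_parsings("nowhere", VOCAB)  # nothing is computed yet
first = next(parsings)  # ("no", "where"); the alternatives are found only when requested
```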
Gregor Kappler carries out psychometric research and data science consulting, and is founder of FilingForest, a julia-focused startup developing solutions for fast unbiased measurement in graph data.
Gregor was initially trained as a mathematician and psychologist, implemented solutions for semantic text analytics for his PhD in 2007, and developed psychometric models for measurement with texts.
He has worked as a lecturer and researcher at the University of Vienna and the University of Jena, and on a series of predictive analytics projects for software vendors and customers.
Gregor switched from R to Julia in 2018 and is the creator of the CombinedParsers package, which provides parser combinators for fast, recursive, and type-safe parsing in pure Julia.