2020-07-31 –, Purple Track
I will introduce the CombinedParsers package for writing complex recursive parsers efficiently in a composable functional style.
The package API will be demonstrated by example of an CombinedParser for regular expressions which generates compiled regular expression parsers in pure julia.
Far more expressive than regular expressions, parser combinators allow for arbitrary transformations and higher-order parsers depending on the parsing state (exemplified with a very short html parser).
Parsing data from strings recurrently is at the beginning of scientific computing and thus regular expressions are a familiar part of standard tooling.
CombinedParsers constructors will be presented side-by-side with the equivalent regex syntax.
The regex parser provided with the package can be used as a pure julia plug-in replacement for the current julia Regex type.
Benchmarks and compliance with PCRE syntax will be reported based on the extensive unit tests of the PCRE library.
Leveraging julia compiler optimizations for multiple dispatch and parametric types,
CombinedParsers performance can for many patterns compete with the PCRE C library that currently is used by julia base Regex.
Arbitrary transformations can be defined as part of the grammar definition, with convenient syntax for extracting data as named tuples.
For optimized performance, parsing and transformation are decoupled, and parsing memoization can be used optionally.
Parser combinators straightforwardly generalize from strings to parsing any iterator type.
Logging and human-readable error messages help debugging complex parsers.
CombinedParsers supports the iterate interface to lazily generate all valid parsings, and the TextParse interface to include CombinedParsers e.g. in CSV.jl.
Preliminary packages for parsing wikitext and orgmode markup with ParserIterators are available.
Other parsing packages (Automa.jl, ParserCombinator.jl) will be acknowledged. Current limitations and considerations for further optimization will be discussed.
Gregor Kappler carries out psychometric research and data science consulting, and is founder of FilingForest, a julia-focused startup developing solutions for fast unbiased measurement in graph data.
Gregor was initially trained as a mathematician and psychologist, has implemented solutions for semantic text analytics for his PhD in 2007, and developed psychometric models for measuring with texts.
He has worked as a lecturer and researcher at the University of Vienna and the University of Jena and worked on a series of predictive analytic projects for software vendors and customers.
Gregor has switched to Julia from R in 2018, and is creator of the IterableParsers package which provides parser combinators for fast, recursive and type-save parsing in pure Julia.