Introducing Chemellia: Machine Learning, with Atoms!
2021-07-30, Green

In this talk, I introduce Chemellia: a machine learning ecosystem (built on Flux.jl) designed for chemistry and materials science problems involving molecules, crystals, surfaces, etc. I will focus on two packages I have developed: ChemistryFeaturization, which allows customizable and invertible featurization of atomic systems, and AtomicGraphNets, which implements graph neural network models tailored to atomic graphs and substantially outperforms comparable Python packages.


Machine learning is a promising approach in science and engineering for “filling the gaps” in modeling, particularly where substantial volumes of training data are available. These techniques are increasingly popular in the chemistry and materials science communities, as evidenced by Python packages such as DeepChem and matminer. There are many potential benefits to building, training, and running such models in Julia: improved performance, better code readability, and, perhaps most importantly, composability with the broader SciML ecosystem, enabling integration with packages for differential equation solving, sensitivity analysis, and more.

In this talk, I introduce Chemellia: an ecosystem for machine learning on atomic systems, built on Flux.jl. In particular, I will focus on two packages I have been developing that will be core to Chemellia. ChemistryFeaturization represents a novel paradigm for data representation of molecules, crystals, and more. It defines flexible types for features associated with individual atoms, pairs of atoms, etc., as well as for representing featurized structures, for example as a crystal graph (the AtomGraph type, which dispatches the right set of functions so that all of the LightGraphs analysis capabilities “just work”). It also implements an easily extensible set of modular featurization schemes to create inputs for a variety of models, graph-based and otherwise. A core design principle of the package is that all featurized data types carry the metadata required to “decode” their features back to human-readable values.
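To make that workflow concrete, here is a minimal sketch of building and featurizing a crystal graph. The names (AtomGraph, GraphNodeFeaturization, featurize!, decode) reflect the package’s design, but treat the exact calls, feature names, and file path as illustrative assumptions rather than guaranteed API:

```julia
using ChemistryFeaturization

# Build a crystal graph from a structure file (path is illustrative).
ag = AtomGraph("mp-195.cif")

# Choose per-atom features to encode, looked up by element property.
fzn = GraphNodeFeaturization(["Group", "Row", "Block", "Atomic mass"])

# Encode the features onto the graph nodes in place.
featurize!(ag, fzn)

# Because the featurization carries its own metadata, the encoding is
# invertible: recover human-readable feature values for each atom.
decode(ag)
```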

AtomicGraphNets provides a Julia implementation of the increasingly popular crystal graph convolutional neural net model architecture that trains and runs nearly an order of magnitude faster than the Python implementation, and requires fewer trainable parameters to achieve the same accuracy on benchmark tasks due to a more efficient and expressive convolutional operation. The layers provided by this package can be easily combined into other architectures using Flux’s utility functions such as Chain and Parallel.
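As a sketch of how those layers compose with standard Flux machinery, the snippet below chains two graph convolutions, a pooling layer, and a dense head. The layer names AGNConv and AGNPool come from AtomicGraphNets, but the argument conventions shown here are assumptions, not guaranteed signatures:

```julia
using Flux, AtomicGraphNets

input_dim = 61  # per-atom feature length (e.g. from a GraphNodeFeaturization)
conv_dim  = 32  # hidden feature length after convolution

# A small CGCNN-style model: graph convolutions over atom features,
# pooling to a fixed-length vector, then a dense layer to predict a
# scalar property such as formation energy.
model = Chain(
    AGNConv(input_dim => conv_dim),
    AGNConv(conv_dim => conv_dim),
    AGNPool("mean", conv_dim, conv_dim, 0.1),
    Dense(conv_dim, 1),
)
```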

We have some great summer student developers working on these packages now and would welcome further community feedback and contributions!
