JuliaCon 2023

Third Millennium Symbolic Learning with Sole.jl
2023-07-26, 32-123

Sole.jl is a framework for symbolic learning, i.e., machine learning with symbolic logic.
It comprises packages for:
- Computational logic (SoleLogics.jl);
- Operating with multimodal (un)structured data (SoleData.jl, SoleFeatures.jl);
- Learning, inspecting and analyzing symbolic models (SoleModels.jl, SolePostHoc.jl).


Symbolic learning is machine learning based on formal logic. Its distinguishing feature is that the learned models carry an explicit knowledge representation, which offers several opportunities:

  • Verifying that the model's thought process is adequate for a given task;
  • Learning new insights by simply inspecting the model;
  • Manual refinement of the model at a later time.
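
To make these three opportunities concrete, here is a minimal sketch in plain Python. It does not use Sole.jl or its API; the `Rule` type, the feature name, and the thresholds are all hypothetical and chosen only for illustration. The point is that a model stored as explicit rules can be printed, checked by a person, and edited by hand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Instance = Dict[str, float]


@dataclass
class Rule:
    description: str                        # human-readable antecedent
    condition: Callable[[Instance], bool]   # the same antecedent as an executable test
    label: str                              # consequent (predicted class)


def classify(rules: List[Rule], default: str, x: Instance) -> str:
    """Return the label of the first rule whose condition holds, else the default."""
    for rule in rules:
        if rule.condition(x):
            return rule.label
    return default


def describe(rules: List[Rule], default: str) -> List[str]:
    """Render the whole model as readable IF/THEN lines."""
    lines = [f"IF {r.description} THEN {r.label}" for r in rules]
    lines.append(f"ELSE {default}")
    return lines


# Hypothetical rule list; thresholds are illustrative, not learned from data.
rules = [
    Rule("temperature > 38.0", lambda x: x["temperature"] > 38.0, "fever"),
    Rule("temperature < 35.0", lambda x: x["temperature"] < 35.0, "hypothermia"),
]

# Inspection: the model is its own explanation.
model_text = describe(rules, "normal")

# Verification: a prediction can be traced back to the rule that fired.
prediction = classify(rules, "normal", {"temperature": 38.5})

# Manual refinement: an expert lowers the fever threshold after reading the model.
rules[0] = Rule("temperature > 37.5", lambda x: x["temperature"] > 37.5, "fever")
refined_prediction = classify(rules, "normal", {"temperature": 37.8})
```

Here `model_text` is what a domain expert would read, and the refinement step changes the model's behavior by editing one rule, with no retraining involved.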

These levels of transparency (or interpretability) are generally not available with standard machine learning methods. As AI permeates more and more aspects of our lives, symbolic learning is therefore becoming increasingly popular. In spite of this, implementations of symbolic algorithms (e.g., extraction of decision trees or rules) are mostly scattered across different languages and machine learning frameworks.

Enough with this! The symbolic learning community, less and less of a minority, deserves a programming framework of its own. So, here comes Sole.jl, a collection of Julia packages for symbolic modeling and learning. Sole.jl covers a relatively wide range of functionality of interest to the symbolic community, and it also fills some gaps in standard machine learning pipelines (e.g., feature selection on multimodal (un)structured data). At the time of writing, the framework comprises the following released packages:

  • SoleLogics.jl lays the logical foundations for symbolic learning. It provides a useful codebase for computational logic, which features easy manipulation of:
  • SoleData.jl provides a data layer built on top of DataFrames.jl. Its codebase is machine learning oriented and makes it possible to:
    • Instantiate and manipulate multimodal datasets for (un)supervised machine learning;
    • Deal with (un)structured data (e.g., graphs, images, time series);
    • Describe datasets via basic statistical measures;
    • Save to/load from npy/npz format;
    • Perform basic data processing operations (e.g., windowing, moving average).
  • SoleModels.jl defines the building blocks of symbolic modeling/learning. It is the core of the framework, and it features:
    • Definitions for symbolic models (decision trees/forests, rules, etc.);
    • Optimized data structures, useful when learning models from datasets;
    • Support for mixed, neuro-symbolic computation.
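
As a rough illustration of what a definition for a symbolic model can look like, the sketch below represents a decision tree as a small recursive data structure and extracts one rule per leaf. It is written in plain Python and is not the SoleModels.jl API; the `Leaf` and `Split` types, the feature names, and the thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Split:
    feature: str
    threshold: float
    left: "Node"    # branch taken when feature <= threshold
    right: "Node"   # branch taken when feature > threshold


Node = Union[Leaf, Split]


def predict(node: Node, x: Dict[str, float]) -> str:
    """Follow the splits from the root down to a leaf and return its label."""
    while isinstance(node, Split):
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def extract_rules(node: Node, path: Tuple[str, ...] = ()) -> List[str]:
    """Turn each root-to-leaf path into one IF/THEN rule (left branches first)."""
    if isinstance(node, Leaf):
        antecedent = " AND ".join(path) if path else "TRUE"
        return [f"IF {antecedent} THEN {node.label}"]
    left_cond = f"{node.feature} <= {node.threshold}"
    right_cond = f"{node.feature} > {node.threshold}"
    return (extract_rules(node.left, path + (left_cond,))
            + extract_rules(node.right, path + (right_cond,)))


# Hypothetical tree; thresholds are illustrative only.
tree = Split(
    "petal_length", 2.5,
    Leaf("setosa"),
    Split("petal_width", 1.7, Leaf("versicolor"), Leaf("virginica")),
)

rules = extract_rules(tree)
for rule in rules:
    print(rule)
```

Each extracted rule is the conjunction of the conditions along one path, so the tree and the rule list describe the same decision function in two interchangeable forms.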

Altogether, Sole.jl makes for a powerful tool built with an eye to formal correctness, and it can be of use for both machine learning practitioners and computational logicians.

Q: Ok, so what symbolic learning methods do you people provide?
A: At the moment, ModalDecisionTrees.jl is the only package compatible with Sole.jl, and it provides novel decision tree algorithms based on multimodal temporal and spatial logics for time-series and image classification. Check out the related talk at JuliaCon 2022.

Q: Why the name?
A: Sole stands for SymbOlic LEarning; it also means "sun" in Italian, a hint to the enlightening power of transparent modeling.

PhD Student in Artificial Intelligence @ University of Ferrara, Italy
Studying Symbolic Learning with Multimodal Logics.