Symbolic Learning and Rule Extraction with Sole.jl JuliaCon Local Paris 2025

Symbolic Learning and Rule Extraction with Sole.jl
2025-10-02 11:50–12:20, Jean-Baptiste Say Amphitheater
Language: English

Symbolic learning is the branch of machine learning that studies algorithms for building symbolic models, that is, classifiers that can be translated into logical rules. Unlike neural networks and other statistical models, symbolic models are therefore easier to read and interpret. Common examples include decision trees and their ensemble counterpart, random forests.
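
To make the "translated into logical rules" idea concrete, here is a minimal sketch in plain Julia. It does not use Sole.jl or DecisionTree.jl: the `Leaf` and `Node` types, the `extract_rules` function, and the feature names are all hypothetical, chosen only for illustration. Each root-to-leaf path of a toy decision tree becomes one IF/THEN rule.

```julia
# A toy binary decision tree: internal nodes test `feature < threshold`,
# leaves carry a class label. All names are illustrative assumptions.
struct Leaf
    label::String
end

struct Node
    feature::String
    threshold::Float64
    left::Union{Node,Leaf}    # branch taken when feature < threshold
    right::Union{Node,Leaf}   # branch taken when feature >= threshold
end

# A leaf closes a path: the conditions collected so far form the antecedent,
# the leaf label is the consequent.
function extract_rules(tree::Leaf, path::Vector{String}=String[])
    antecedent = isempty(path) ? "TRUE" : join(path, " AND ")
    return ["IF $antecedent THEN $(tree.label)"]
end

# An internal node extends the path with its test (or the negated test)
# and recurses into both branches.
function extract_rules(tree::Node, path::Vector{String}=String[])
    left_path = vcat(path, "$(tree.feature) < $(tree.threshold)")
    right_path = vcat(path, "$(tree.feature) >= $(tree.threshold)")
    return vcat(extract_rules(tree.left, left_path),
                extract_rules(tree.right, right_path))
end

tree = Node("petal_length", 2.5,
            Leaf("setosa"),
            Node("petal_width", 1.8, Leaf("versicolor"), Leaf("virginica")))

rules = extract_rules(tree)
foreach(println, rules)
```

A random forest can be handled in the same spirit by collecting the rules of each tree, which is one reason both model families count as symbolic.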

Sole.jl is a Julia package for symbolic learning and rule extraction, aimed at guiding the user through the whole process: from the initial data, to learning a symbolic model, to extracting and manipulating logical rules from that model.

It is also the entry point to the SOLE framework (which stands for SymbOlic LEarning), an open-source project developed entirely in Julia. It serves as an interface both to the framework's main packages, such as SoleData.jl, SoleModels.jl, and SolePostHoc.jl, and to useful packages developed by the learning community, such as DecisionTree.jl, ModalDecisionTrees.jl, ModalAssociationRules.jl, MLJ.jl, XGBoost.jl, and so on.

For instance, given a dataset, one can use Sole.jl to fit a decision tree model, provided by the DecisionTree.jl package, and then extract and, if needed, manipulate rules from that model through the SolePostHoc.jl package. Furthermore, when working with non-tabular data such as images or time series, one can leverage the SoleData.jl package to interpret the dataset as a set of logical interpretations of a more-than-propositional logic (e.g., interval or rectangle modal logic) and use a learning package compatible with such a logic, such as ModalDecisionTrees.jl or ModalAssociationRules.jl.
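
As a rough illustration of what "interpreting a time series as a logical interpretation of an interval logic" can mean, the sketch below treats every interval of a series as a world and checks a small modal formula on each of them. It is plain Julia and does not use SoleData.jl: the function names (`all_intervals`, `increasing`, `decreasing`, `diamond_after`) and the convention that an interval `(x, y)` covers samples `x` through `y` are assumptions made for this example.

```julia
# Worlds are intervals (x, y) with x < y, covering samples x through y.
all_intervals(n) = [(x, y) for x in 1:n-1 for y in x+1:n]

# Two propositional letters, each evaluated on a single interval.
increasing(series, (x, y)) = all(series[i] < series[i+1] for i in x:y-1)
decreasing(series, (x, y)) = all(series[i] > series[i+1] for i in x:y-1)

# Modal operator <A> ("after"): the property `p` holds on some interval
# that starts exactly where the current interval ends.
function diamond_after(p, series, (x, y))
    return any(p(series, (y, z)) for z in y+1:length(series))
end

# Formula: increasing AND <A>decreasing.
formula(series, w) = increasing(series, w) && diamond_after(decreasing, series, w)

series = [1.0, 2.0, 3.0, 2.0, 1.0]

# Intervals of the series on which the formula is true.
witnesses = filter(w -> formula(series, w), all_intervals(length(series)))
println(witnesses)
```

The same pattern extends to images by replacing intervals with rectangles and the temporal relation with spatial ones.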

In this presentation, we will look at how Sole.jl works through a hands-on tutorial, emphasizing its comprehensiveness and user-friendliness. This will also allow us to introduce two newcomers to the SOLE ecosystem: ModalAssociationRules.jl, a package for mining association rules between instances, and SolePostHoc.jl, a package to extract, interpret and simplify sets of rules starting from a symbolic model.

Sole.jl stems from the idea of simplifying the use of machine learning models for symbolic data analysis, a set of techniques that provide logical descriptions of data and are valued for their interpretability and explainability. It is designed both for newcomers who are unfamiliar with machine learning concepts and for experts in the field, who will find in Sole.jl a secure, reliable, and fully customizable system.

Also designed for extended use, Sole.jl offers full customization and supports functionalities such as:
- checking the parameters submitted by the user;
- handling basic preprocessing tasks, such as dataset partitioning and cross-validation;
- performing training, including parameter tuning, and testing;
- printing analysis results, with the option to extract, simplify and manipulate rules and models through the SolePostHoc.jl package.
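
The partitioning and cross-validation step in the list above can be sketched in a few lines of plain Julia. This is not Sole.jl's implementation: `kfold_indices` and `train_test_splits` are hypothetical helper names, and the only library used is the standard `Random` module. The argument check at the top stands in for the kind of parameter validation mentioned in the first item.

```julia
using Random

# Split the indices 1:n into k folds of near-equal size after shuffling.
function kfold_indices(n::Int, k::Int; rng=Random.default_rng())
    # Reject parameters that cannot produce k non-empty folds.
    2 <= k <= n || throw(ArgumentError("need 2 <= k <= n, got k=$k and n=$n"))
    shuffled = shuffle(rng, 1:n)
    # Fold f takes every k-th shuffled index, starting at position f.
    return [shuffled[f:k:n] for f in 1:k]
end

# One (train, test) pair per fold: fold f is held out for testing,
# the remaining folds are concatenated for training.
function train_test_splits(n::Int, k::Int; rng=Random.default_rng())
    folds = kfold_indices(n, k; rng=rng)
    return [(reduce(vcat, folds[setdiff(1:k, f)]), folds[f]) for f in 1:k]
end

splits = train_test_splits(10, 3; rng=MersenneTwister(42))
for (train, test) in splits
    println("train=", sort(train), " test=", sort(test))
end
```

Every instance ends up in exactly one test fold, so averaging a metric over the folds uses each instance once for evaluation.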

The Sole.jl engine can be operated with just a single line of code. For instance, we can model the theory underlying a dataset with different strategies and compare them using:

modelset = symbolic_analysis(X; model=(type=:randomforest, :modaldecisiontree))

where X denotes the dataset under analysis and model specifies the machine learning models to be employed. For example, X could consist of a set of body movement recordings performed by various athletes, while each model can be one of the many provided by the SOLE (SymbOlic LEarning; also "sun" in Italian) framework, for instance through the SoleModels.jl, ModalDecisionTrees.jl, or ModalAssociationRules.jl packages, or by other widely used packages from the learning community such as MLJ.jl, DecisionTree.jl, XGBoost.jl, and so on.

As an example, we will focus on ModalAssociationRules.jl, one of the latest novelties in the Julia ecosystem.

ModalAssociationRules.jl is a package for mining and analysing association rules from logisets (i.e., sets of logical interpretations onto which formulas can be checked).

Association rules are co-occurrence relations between patterns, that is, features or events that frequently occur together in the data. For example, given a dataset encoding arm movements, an association rule shared by many movements might be:

right_hand_moves_right AND <A>right_hand_moves_left THEN <B>left_hand_moves_up

which can be read as "when the right hand moves to the right in a certain range of time and for a certain distance, and, after that movement, the same hand moves to the left, then it must be that the whole movement started with the left hand moving up".
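
How often a rule like this one holds is typically quantified with measures such as support and confidence, which are standard notions in association rule mining. The sketch below computes both on a hand-made toy dataset in plain Julia. It does not use ModalAssociationRules.jl: the function names `support` and `confidence` are chosen for this example, and the truth value of each pattern on each instance is hard-coded, whereas in the modal setting it would come from checking formulas on a logiset.

```julia
# Each instance is summarised by the set of patterns that hold on it.
# These truth values are invented for illustration.
instances = [
    Set(["right_hand_moves_right", "<A>right_hand_moves_left", "<B>left_hand_moves_up"]),
    Set(["right_hand_moves_right", "<A>right_hand_moves_left", "<B>left_hand_moves_up"]),
    Set(["right_hand_moves_right", "<A>right_hand_moves_left"]),
    Set(["right_hand_moves_right"]),
]

# Fraction of instances on which every pattern in `patterns` holds.
support(patterns, data) = count(inst -> issubset(patterns, inst), data) / length(data)

# Among the instances satisfying the antecedent, the fraction that also
# satisfies the consequent.
confidence(antecedent, consequent, data) =
    support(union(antecedent, consequent), data) / support(antecedent, data)

antecedent = Set(["right_hand_moves_right", "<A>right_hand_moves_left"])
consequent = Set(["<B>left_hand_moves_up"])

println("support of the antecedent: ", support(antecedent, instances))
println("confidence of the rule:    ", confidence(antecedent, consequent, instances))
```

On this toy data the antecedent holds on three of the four instances and the consequent on two of those three, so the rule is frequent but not certain.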

The Miner structure serves as the core for the whole package, wrapping everything necessary to perform mining:

- the alphabet within which to count co-occurrences;
- the logiset representation of a dataset;
- a mining algorithm;
- a vector of measures to compute for each rule;
- a set of policies to reduce the search space.

One of the strengths of the ModalAssociationRules.jl package is the support for mining temporal relations, providing the implementation for two well-known algorithms generalised to handle modal logics and, thus, potentially deal with temporal, spatial, spatio-temporal, and other kinds of unstructured data.
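
To give a flavour of how frequent patterns can be mined and how the search space is kept small, here is a minimal level-wise (Apriori-style) search over propositional itemsets in plain Julia. This is an assumption about the general technique only: the abstract above does not name the two algorithms, and the sketch neither uses ModalAssociationRules.jl nor handles modal operators. The names `frequency` and `frequent_itemsets` are hypothetical.

```julia
# Fraction of transactions containing every item of `itemset`.
frequency(itemset, data) = count(t -> issubset(itemset, t), data) / length(data)

# Level-wise search: candidates of size k+1 are built only from frequent
# itemsets of size k, because any superset of an infrequent itemset is
# itself infrequent. This is what keeps the search space small.
function frequent_itemsets(data, minsupport)
    items = sort(collect(union(data...)))
    level = [[i] for i in items if frequency([i], data) >= minsupport]
    frequent = copy(level)
    while !isempty(level)
        # Join pairs of frequent itemsets that differ by exactly one item.
        candidates = unique([sort(union(a, b)) for a in level for b in level
                             if length(union(a, b)) == length(a) + 1])
        # Keep only the candidates that are frequent themselves.
        level = [c for c in candidates if frequency(c, data) >= minsupport]
        append!(frequent, level)
    end
    return frequent
end

data = [Set(["a", "b", "c"]), Set(["a", "b"]), Set(["a", "c"]),
        Set(["b", "c"]), Set(["a", "b", "c"])]

frequent = frequent_itemsets(data, 0.5)
foreach(println, frequent)
```

In the modal setting the items would be formulas checked on the worlds of each instance rather than plain symbols, but the pruning argument is the same.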

Finally, SolePostHoc.jl provides knowledge extraction algorithms through a uniform interface, allowing for the comparison of different post-hoc interpretation methods while maintaining a coherent and intuitive user experience:

struct ALGORITHMNAME <: RuleExtractor end
modalextractrules(:ALGORITHMNAME, model, args...)

SolePostHoc.jl integrates a wide range of algorithms for knowledge extraction, including:

- Surrogate Trees, algorithms that approximate complex models such as neural networks or random forests with more interpretable decision trees;
- Knowledge Distillation, techniques for transferring knowledge from complex models to simpler and more transparent ones;
- Rule Extraction, methods for deriving clear and understandable logical rules from any machine learning model.

Going back to our example, consider the following theory, consisting of a list of logical rules obtained from a decision tree provided by the DecisionTree.jl package:

IF right_hand_moves_right AND <A>right_hand_moves_left THEN <B>left_hand_moves_up;

OTHERWISE, IF right_hand_moves_right AND (not <A>right_hand_moves_left) THEN <B>left_hand_moves_up.

Starting from this theory, we can leverage SolePostHoc.jl to obtain the following new theory, which is more succinct and equally expressive:

IF right_hand_moves_right THEN <B>left_hand_moves_up.
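
The simplification above follows a simple logical pattern: two rules with the same consequent whose antecedents differ only in the polarity of one literal can be merged by dropping that literal. The sketch below implements just this one step in plain Julia. It is not SolePostHoc.jl's algorithm: the `Literal` and `Rule` types and the `merge_rules` function are hypothetical, and modal formulas such as `<A>right_hand_moves_left` are treated as opaque atoms.

```julia
# A literal is an atom with a polarity; a rule maps a conjunction of
# literals (the antecedent) to a consequent.
struct Literal
    atom::String
    positive::Bool
end

struct Rule
    antecedent::Set{Literal}
    consequent::String
end

negate(l::Literal) = Literal(l.atom, !l.positive)

# (A AND B -> C) together with (A AND not B -> C) is equivalent to (A -> C).
# Returns the merged rule, or `nothing` when the pattern does not apply.
function merge_rules(r1::Rule, r2::Rule)
    r1.consequent == r2.consequent || return nothing
    only1 = setdiff(r1.antecedent, r2.antecedent)
    only2 = setdiff(r2.antecedent, r1.antecedent)
    if length(only1) == 1 && length(only2) == 1 &&
       first(only1) == negate(first(only2))
        return Rule(intersect(r1.antecedent, r2.antecedent), r1.consequent)
    end
    return nothing
end

r1 = Rule(Set([Literal("right_hand_moves_right", true),
               Literal("<A>right_hand_moves_left", true)]),
          "<B>left_hand_moves_up")
r2 = Rule(Set([Literal("right_hand_moves_right", true),
               Literal("<A>right_hand_moves_left", false)]),
          "<B>left_hand_moves_up")

merged = merge_rules(r1, r2)
println(merged)
```

Applying such merges repeatedly until none applies is one basic way to shorten a rule list without changing what it predicts.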

Looking ahead, the development roadmap for Sole.jl includes:
- feature pre-selection;
- support for loading and storing models;
- formatted outputs, delivering results in both PDF and LaTeX form;
- a GUI to further enhance the user experience.

Alberto Paparella

Hello everyone! My name is Alberto Paparella, and I am currently a PhD student in Mathematics at the University of Ferrara. My main interests are Mathematical Logic, specifically Many-Valued and Modal Logics, and Machine Learning. In the last few years, I have been working with the Applied Computational Logic and Artificial Intelligence Laboratory on the SOLE framework for Symbolic Learning in Julia, where my main contributions have been a sub-module of the SoleLogics.jl core package for working with Many-Valued Logics, and SoleReasoners.jl, a package for satisfiability and automated theorem proving for Many-Valued Multi-Modal Logic based on the analytic tableau technique.

Mauro Milella

Hi, I am Mauro Milella, and I am currently a master's student at the University of Ferrara. In the last few years, I have been collaborating with the Applied Computational Logic and AI (ACLAI) Laboratory, where we have been developing the SOLE framework for symbolic learning entirely in Julia.

Marco Perrotta

Hi, my name is Marco Perrotta. I'm a computer science student at the University of Ferrara, where I also work as a collaborator at the Applied Computational Logic and Artificial Intelligence Lab. My main interest is how technology can be used to understand and study language.

Riccardo Pasini

Hi, I am Riccardo Pasini, and I am currently a student at the University of Ferrara. In the last few years, I have been collaborating with the Applied Computational Logic and AI (ACLAI) Laboratory, where we have been developing the SOLE framework for symbolic learning entirely in Julia.
