1. Assessment of Biomedical Datasets Fairness Using FanFAIR
Abstract
Research has shown that datasets can convey social bias to Artificial Intelligence (AI) systems, especially those based on machine learning. A biased dataset is not representative of reality and can perpetuate societal biases through the models trained on it. To tackle this problem, it is important to understand how to avoid biases, errors, and unethical practices when creating datasets. To provide guidance on the use of datasets in critical decision-making contexts, such as healthcare, we identified a variety of dataset features that can negatively affect a model’s fairness. The core of this tutorial is the FanFAIR approach, a rule-based framework that uses fuzzy logic to calculate multiple fairness metrics and condense them into a single, interpretable score. We will demonstrate these concepts in the high-stakes domains of biomedicine and drug discovery, where biased representations can lead to inequitable outcomes.
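As a rough illustration of how several metrics can be condensed into one score with fuzzy rules, the sketch below uses plain Python. It does not use or reproduce the FanFAIR library: the metric names, the membership functions, and the rule base are all hypothetical assumptions chosen for brevity.

```python
# Illustrative sketch only: this is NOT the FanFAIR API.
# Metric names, membership functions, and rules are hypothetical.

def high(x):
    """Membership of x (expected in [0, 1]) in the fuzzy set 'high'."""
    return max(0.0, min(1.0, x))


def low(x):
    """Membership of x (expected in [0, 1]) in the fuzzy set 'low'."""
    return max(0.0, min(1.0, 1.0 - x))


def fuzzy_fairness_score(metrics):
    """Combine metrics in [0, 1] into a single score in [0, 1].

    Zero-order Sugeno-style inference with two rules per metric:
      IF metric IS high THEN fairness = 1.0
      IF metric IS low  THEN fairness = 0.0
    The output is the average of the rule consequents, weighted by
    each rule's firing strength.
    """
    weighted_sum = 0.0
    total_strength = 0.0
    for value in metrics.values():
        for strength, consequent in ((high(value), 1.0), (low(value), 0.0)):
            weighted_sum += strength * consequent
            total_strength += strength
    return weighted_sum / total_strength if total_strength > 0 else 0.0


# Hypothetical metric values for one dataset.
example_metrics = {"balance": 0.8, "completeness": 1.0, "outlier_free": 0.6}
print(fuzzy_fairness_score(example_metrics))
```

With these complementary linear membership functions the score reduces to the mean of the metrics. A real rule base would typically use different membership shapes or rule weights to emphasize some metrics over others.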
Learning Objectives
- Bridging Law and CI: Deconstruct the EU AI Act and global bio-medical regulations into actionable Computational Intelligence (CI) constraints, identifying which “High-Risk” requirements necessitate intervention.
- Dataset Auditing with FanFAIR: Implement a complete data fairness audit using the FanFAIR Python library, moving from raw bio-medical datasets to a fairness report.
- Detecting Representation Bias: Identify and mitigate “hidden” biases in data representations that compromise the reliability of the resulting models.
Intended Audience and Multidisciplinary Scope
This tutorial is designed for researchers, Ph.D. students, and practitioners in Computational Intelligence who are transitioning their models toward clinical or industrial deployment. However, its scope extends beyond the purely technical to welcome participants from the legal and ethical sectors interested in the practical implementation of AI governance. The core mission is to bring together technical experts and ethical and legal scholars in a dedicated forum to discuss fairness in Computational Intelligence. We invite researchers who view representation as a central mechanism for promoting trustworthy, fair, and reliable AI systems. Our curriculum is built to accommodate both technical discussions addressing the human and legal dimensions of AI, and non-technical explorations focusing on the ethical implications of data within the bio-medical domain.
Outline with topics and timing breakdown
| Module | Content & Key Takeaways | Duration |
| --- | --- | --- |
| From Law to Logic: Foundations of Fair Datasets | Theoretical Basis: Mapping the EU AI Act and global healthcare regulations relevant to Computational Intelligence; the theoretical foundations of quantitative fairness metrics. Compliance Feature: Explaining how to properly set the qualitative metrics in FanFAIR. | 30 minutes |
| Hands-on Lab: The FanFAIR Python Library | Practical Audit: Overview of the FanFAIR library. Live Lab: Executing a semi-automatic audit on a given dataset. We will provide the datasets, but attendees are encouraged to test FanFAIR on their own data. Fuzzy Reporting: Evaluating the fairness of the dataset using the plots and the auditing report that FanFAIR produces. | 45 minutes |
| Bias in Bio-Molecular Representation | Domain Risks: Exploring challenges in Drug Discovery. Identifying Harm: Analyzing how imbalances in datasets and narrow chemical space coverage lead to limited therapeutic results. | 15 minutes |
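One simple way to quantify the kind of dataset imbalance mentioned in the last module is the normalized Shannon entropy of the label distribution. The sketch below is a generic measure written for illustration, with hypothetical "active"/"inactive" compound labels; it is not claimed to be the balance metric that FanFAIR computes.

```python
import math
from collections import Counter


def balance(labels):
    """Normalized Shannon entropy of the label distribution, in [0, 1].

    1.0 means all classes are equally represented; values near 0 mean
    one class dominates the dataset.
    """
    counts = Counter(labels)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # a single class (or no data) carries no balance information
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(n_classes)


# Hypothetical screening data: 9 inactive compounds for every active one.
skewed = ["inactive"] * 9 + ["active"]
even = ["inactive", "active"] * 5
print(balance(skewed), balance(even))
```

A low value on a screening dataset would flag that one outcome class dominates, which is one of the conditions under which a trained model's predictions for the minority class deserve extra scrutiny.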
Organizers:
- Dr. Chiara Gallese, Bioinformatics and Bioengineering Technical Committee (BBTC), Ethical, Legal, Social, Environmental and Human Dimensions of AI/CI (SHIELD) Technical Committee, Tilburg University, The Netherlands
- Matteo Grazioso, IEEE Computational Intelligence Society (CIS) Member, IEEE Task Force on Advanced Representation in Biomedical Optimization Member, Ca’ Foscari University of Venice, Italy
- Silvia Multari, IEEE Computational Intelligence Society (CIS) Member, Ca’ Foscari University of Venice, Italy
2. Generating and Refining Synthetic Networks with the Graph Evolution Tool
Abstract
The Graph Evolution Tool (GET) can be used to generate or refine synthetic network data for a wide range of applications. Users may provide their own fitness function and application-specific parameters, and GET will generate networks that satisfy these criteria. GET includes two frameworks for network generation, distinguished by the representation used in the genetic algorithm (GA). In addition to selecting a single framework, users may also stack both frameworks to produce higher-fitness networks. The first framework uses an edge-editing representation that modifies the edge structure of a provided initial network within a specified edit distance, making it well suited for refining existing networks under a given fitness function. The second framework uses self-driving automata (SDA), which directly specify network edges and therefore allow exploration of the full solution space. These frameworks can be combined by using the SDA to generate an initial network, which is then refined using the edge-editing framework. The tutorial will present the fundamentals of both frameworks, along with a brief introduction to genetic algorithms. The GET code will then be provided to attendees, followed by an interactive demonstration using supplied data. This demonstration will be framed as a case study on generating personal contact networks for epidemic simulation.
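To make the edge-editing idea concrete, here is a minimal sketch of a genetic algorithm whose chromosomes are lists of node pairs to toggle in an initial network. It is written in plain Python for illustration and is not GET's code or interface: the function names, parameters, and operator choices (tournament selection, one-point crossover, elitism) are assumptions.

```python
import random


def apply_edits(base_edges, edits):
    """Return a copy of base_edges with each edited node pair toggled."""
    edges = set(base_edges)
    for u, v in edits:
        edges ^= {(min(u, v), max(u, v))}  # add the edge if absent, remove it if present
    return edges


def evolve(base_edges, n_nodes, fitness, max_edits=4, pop_size=20,
           generations=30, seed=0):
    """Search for a network within max_edits toggles of base_edges.

    fitness is a user-supplied function mapping a set of edges to a number
    (higher is better). Requires max_edits >= 2 and pop_size >= 3.
    """
    rng = random.Random(seed)

    def random_edit():
        u, v = rng.sample(range(n_nodes), 2)
        return (u, v)

    def score(chromosome):
        return fitness(apply_edits(base_edges, chromosome))

    population = [[random_edit() for _ in range(max_edits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        next_population = [max(population, key=score)]  # elitism: keep the best
        while len(next_population) < pop_size:
            # Tournament selection of two parents.
            parent_a = max(rng.sample(population, 3), key=score)
            parent_b = max(rng.sample(population, 3), key=score)
            # One-point crossover.
            cut = rng.randrange(1, max_edits)
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: replace one edit with a fresh random one.
            if rng.random() < 0.3:
                child[rng.randrange(max_edits)] = random_edit()
            next_population.append(child)
        population = next_population
    return apply_edits(base_edges, max(population, key=score))


# Example: starting from a path on 5 nodes, prefer denser networks.
path = {(0, 1), (1, 2), (2, 3), (3, 4)}
refined = evolve(path, n_nodes=5, fitness=len, max_edits=3)
print(sorted(refined))
```

Because every chromosome holds exactly `max_edits` toggles, the result can differ from the initial network in at most `max_edits` edges, which mirrors the edit-distance constraint described above.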
Learning Objectives
- Understanding Network Generation with Genetic Algorithms: Deconstruct the core principles of genetic algorithms as applied in GET, identifying how evolutionary optimization drives both network generation and refinement.
- Selecting the Right GET Framework: Distinguish between the edge-editing and SDA frameworks based on their underlying representations, determining which approach best suits a given network generation task.
- Implementing Custom Fitness Functions: Translate application-specific network requirements into a working fitness function, moving from problem definition to a parameterized GET configuration.
- Generating Synthetic Networks with GET: Apply GET hands-on to produce or refine synthetic network data, using the personal contact network case study as a template for epidemic simulation and beyond.
Intended Audience and Multidisciplinary Scope
This tutorial is designed for researchers and graduate students who are interested in working with graph data, in particular in generating and analysing graphs that meet specific conditions. This is widely applicable in Bioinformatics and many other areas. We assume that attendees will have coding experience and access to a computer with Python.
Outline with topics and timing breakdown
| Module | Content & Key Takeaways | Duration |
| --- | --- | --- |
| Introduction to Genetic Algorithms | Foundational mechanics of genetic algorithms (selection, crossover, and mutation) and how they drive iterative optimization | 10 minutes |
| Representations Used in GET | Introduction to the edge-editing and SDA frameworks, their differences, and the concept of stacking both frameworks | 35 minutes |
| Getting Started with GET | Installation and basic usage of the Python bindings using provided graphs and a simple fitness function | 30 minutes |
| GET Demonstration: Edge-Editing and SDA | Generating networks using each representation and configuring system parameters and hyperparameters | 25 minutes |
| Data Visualization | Visualizing GET outputs as network diagrams, charts, and summary statistics | 10 minutes |
| Epidemic Case Study | Overview of the SIR epidemic model adapted for networks, and introduction to epidemic length as a fitness function | 10 minutes |
| Implementing a Custom Fitness Function | Modifying the GET library to add custom fitness functionality, demonstrated using epidemic spread (provided); includes compiling and importing changes | 20 minutes |
| Applying GET in Your Research | Recap of key steps: choosing a representation, configuring GET, and writing a custom fitness function | 10 minutes |
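The epidemic case study in the outline can be pictured with a small, self-contained sketch of a discrete-time SIR simulation on a network, with epidemic length used as the fitness value. This is a conceptual Python illustration only: the infection rule, the one-step recovery, and all function and parameter names are assumptions, and adding a fitness function to GET itself involves modifying and compiling the library as described in the outline.

```python
import random


def epidemic_length(edges, n_nodes, patient_zero=0, p_infect=0.5, rng=None):
    """Simulate one SIR epidemic and return its length in time steps.

    Assumed model: each infected node tries to infect each susceptible
    neighbour with probability p_infect, then recovers after one step.
    """
    rng = rng or random.Random(0)
    neighbours = {node: set() for node in range(n_nodes)}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    susceptible = set(range(n_nodes)) - {patient_zero}
    infected = {patient_zero}
    steps = 0
    while infected:
        newly_infected = set()
        for node in infected:
            for other in neighbours[node]:
                if other in susceptible and rng.random() < p_infect:
                    newly_infected.add(other)
        susceptible -= newly_infected
        infected = newly_infected  # the previous infected nodes recover
        steps += 1
    return steps


def mean_epidemic_length(edges, n_nodes, runs=30, seed=0):
    """Fitness: average epidemic length over several stochastic runs."""
    rng = random.Random(seed)
    total = sum(epidemic_length(edges, n_nodes, rng=rng) for _ in range(runs))
    return total / runs


# A path network spreads infection slowly, one node per step.
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(mean_epidemic_length(path, n_nodes=5))
```

Averaging over several runs is a common way to reduce the noise of a stochastic fitness function; the number of runs trades evaluation cost against stability of the fitness estimate.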
Organizers:
- Dr. Michael Dube, Research Associate, Otto-von-Guericke-Universität, Magdeburg, Germany.
- James Sargant, Doctoral Student, Brock University, St. Catharines, Ontario, Canada.