1. Assessment of Biomedical Dataset Fairness Using FanFAIR

Abstract

Research has shown that datasets can introduce social bias into Artificial Intelligence (AI) systems, especially those based on machine learning. A biased dataset is not representative of reality and may help perpetuate societal biases through the models trained on it. To tackle this problem, it is important to understand how to avoid biases, errors, and unethical practices when creating datasets. To provide guidance on the use of datasets in critical decision-making contexts, such as health decisions, we identified a variety of dataset features that can negatively affect a model's fairness. The core of this tutorial is the FanFAIR approach: a rule-based framework that leverages fuzzy logic to calculate multiple fairness metrics and condense them into a single, interpretable score. We will demonstrate these concepts in the high-stakes domain of biomedicine and drug discovery, where biased representations can lead to inequitable outcomes.
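To make the idea of condensing several metrics into one score concrete, the following minimal sketch implements a zero-order Sugeno-style fuzzy aggregation in plain Python. It is not FanFAIR's rule base or API: the metric names (`class_balance`, `group_coverage`, `label_quality`), the membership breakpoints `a` and `b`, and the two rules per metric are hypothetical assumptions chosen for illustration.

```python
"""Minimal zero-order Sugeno aggregation of normalized dataset-fairness metrics.

Hypothetical sketch: metric names, membership breakpoints and rule consequents
are illustrative assumptions, not FanFAIR's actual rule base or API.
"""


def ramp_up(x: float, a: float, b: float) -> float:
    """Degree to which x is HIGH: 0 below a, 1 above b, linear in between."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def fairness_score(metrics: dict[str, float], a: float = 0.4, b: float = 0.8) -> float:
    """Aggregate metrics in [0, 1] (1 = fair) into a single score in [0, 100].

    Two rules fire for every metric m:
      IF m is LOW  THEN score = 0
      IF m is HIGH THEN score = 100
    The crisp output is the firing-strength-weighted average of the consequents.
    """
    if not metrics:
        raise ValueError("at least one metric is required")
    numerator = denominator = 0.0
    for name, value in metrics.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
        high = ramp_up(value, a, b)  # membership in HIGH
        low = 1.0 - high             # LOW is assumed to be the complement of HIGH
        numerator += low * 0.0 + high * 100.0
        denominator += low + high
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical metric values for an illustrative dataset.
    example = {"class_balance": 0.9, "group_coverage": 0.6, "label_quality": 0.3}
    print(round(fairness_score(example), 1))
```

Because LOW is defined here as the complement of HIGH, each metric contributes a piecewise-linear value between 0 and 100 and the score is their average; a real rule base can encode richer interactions, for example letting a single very poor metric dominate the result.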

Learning Objectives

  • Bridging Law and CI: Deconstruct the EU AI Act and global bio-medical regulations into actionable Computational Intelligence (CI) constraints, identifying which “High-Risk” requirements necessitate intervention.
  • Dataset Auditing with FanFAIR: Implement a complete data fairness audit using the FanFAIR Python library, moving from raw bio-medical datasets to a fairness report.
  • Detecting Representation Bias: Identify and mitigate "hidden" biases in data representations that compromise model reliability.
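One simple way to surface representation bias is to compare each group's share of the dataset with its share in a reference population. The sketch below assumes a hypothetical list of per-record group labels and a hypothetical reference distribution; the 0.8 threshold is an arbitrary illustrative cut-off, not a value prescribed by FanFAIR or by any regulation.

```python
from collections import Counter


def representation_ratios(groups: list[str], reference: dict[str, float]) -> dict[str, float]:
    """Observed share of each group divided by its share in the reference population.

    A ratio of 1.0 means proportional representation; values below 1.0 mean the
    group is under-represented relative to the (assumed) reference.
    """
    if not groups:
        raise ValueError("the dataset contains no records")
    counts = Counter(groups)
    n = len(groups)
    ratios = {}
    for group, expected_share in reference.items():
        if expected_share <= 0:
            raise ValueError(f"reference share for {group!r} must be positive")
        ratios[group] = (counts.get(group, 0) / n) / expected_share
    return ratios


def under_represented(groups: list[str], reference: dict[str, float],
                      threshold: float = 0.8) -> list[str]:
    """Groups whose representation ratio falls below the (illustrative) threshold."""
    ratios = representation_ratios(groups, reference)
    return sorted(g for g, r in ratios.items() if r < threshold)


if __name__ == "__main__":
    # Hypothetical cohort: 30 female and 70 male patients, against a 50/50 reference.
    cohort = ["female"] * 30 + ["male"] * 70
    print(under_represented(cohort, {"female": 0.5, "male": 0.5}))
```

Groups that appear in the reference but are absent from the data get a ratio of 0.0, which is exactly the kind of gap that is easy to miss when looking at records one at a time.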

Intended Audience and Multidisciplinary Scope

This tutorial is designed for researchers, Ph.D. students, and practitioners in Computational Intelligence who are transitioning their models toward clinical or industrial deployment. Its scope, however, extends beyond the purely technical: participants from the legal and ethical sectors who are interested in the practical implementation of AI governance are also welcome. The core mission is to bring technical experts together with ethical and legal scholars in a dedicated forum to discuss fairness in Computational Intelligence. We invite researchers who view representation as a central mechanism for promoting trustworthy, fair, and reliable AI systems. The curriculum accommodates both technical discussions that address the human and legal dimensions of AI and non-technical explorations of the ethical implications of data in the biomedical domain.

Outline with topics and timing breakdown

Module | Content & Key Takeaways | Duration
--- | --- | ---
From Law to Logic: Foundations of Fair Datasets | Theoretical Basis: mapping the EU AI Act and global healthcare regulations relevant to Computational Intelligence; the theoretical foundations of quantitative fairness metrics. Compliance Feature: how to properly set the qualitative metrics in FanFAIR. | 30 minutes
Hands-on Lab: The FanFAIR Python Library | Practical Audit: overview of the FanFAIR library. Live Lab: executing a semi-automatic audit on a given dataset; the datasets will be provided by us, but participants are encouraged to test FanFAIR on their own data. Fuzzy Reporting: evaluating the fairness of the dataset using the plots and the audit report that FanFAIR produces. | 45 minutes
Bias in Bio-Molecular Representation | Domain Risks: exploring challenges in drug discovery. Identifying Harm: analyzing how dataset imbalances and narrow chemical space coverage can lead to limited therapeutic results. | 15 minutes
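Both harms named in the drug discovery module can be quantified with simple statistics. The sketch below assumes compounds are already encoded as binary fingerprints, represented here as hypothetical sets of on-bit indices (in practice these would come from a cheminformatics toolkit). It measures class imbalance as the majority-to-minority count ratio, and chemical space coverage as the fraction of query compounds whose nearest training neighbor has a Tanimoto similarity below an illustrative threshold of 0.4.

```python
from collections import Counter

# Hypothetical encoding: a fingerprint is the set of indices of its "on" bits.
Fingerprint = frozenset[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0  # two empty fingerprints: identical


def coverage_gap(queries: list[Fingerprint], train: list[Fingerprint],
                 threshold: float = 0.4) -> float:
    """Fraction of query compounds whose nearest training compound is below threshold."""
    if not queries or not train:
        raise ValueError("both compound sets must be non-empty")
    far = sum(1 for q in queries if max(tanimoto(q, t) for t in train) < threshold)
    return far / len(queries)


def imbalance_ratio(labels: list[str]) -> float:
    """Majority-class count divided by minority-class count (1.0 = balanced)."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels given")
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    # Toy, hypothetical fingerprints and activity labels.
    train = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 4, 5})]
    queries = [frozenset({1, 2, 3, 5}), frozenset({10, 11, 12})]
    print(coverage_gap(queries, train), imbalance_ratio(["inactive"] * 9 + ["active"]))
```

A high `coverage_gap` on an external or prospective compound set suggests that the training data spans too narrow a region of chemical space for predictions there to be trusted.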

Organizers:

  • Dr. Chiara Gallese, Bioinformatics and Bioengineering Technical Committee (BBTC), Ethical, Legal, Social, Environmental and Human Dimensions of AI/CI (SHIELD) Technical Committee, Tilburg University, The Netherlands
  • Matteo Grazioso, IEEE Computational Intelligence Society (CIS) Member, IEEE Task Force on Advanced Representation in Biomedical Optimization Member, Ca’ Foscari University of Venice, Italy
  • Silvia Multari, IEEE Computational Intelligence Society (CIS) Member, Ca’ Foscari University of Venice, Italy

2. Generating and Refining Synthetic Networks with the Graph Evolution Tool

Abstract

The Graph Evolution Tool (GET) can be used to generate or refine synthetic network data for a wide range of applications. Users may provide their own fitness function and application-specific parameters, and GET will generate networks that satisfy these criteria. GET includes two frameworks for network generation, distinguished by the representation used in the genetic algorithm (GA). In addition to selecting a single framework, users may also stack both frameworks to produce higher-fitness networks. The first framework uses an edge-editing representation that modifies the edge structure of a provided initial network within a specified edit distance, making it well suited for refining existing networks under a given fitness function. The second framework uses self-driving automata (SDA), which directly specify network edges and therefore allow exploration of the full solution space. These frameworks can be combined by using the SDA to generate an initial network, which is then refined using the edge-editing framework. The tutorial will present the fundamentals of both frameworks, along with a brief introduction to genetic algorithms. The GET code will then be provided to attendees, followed by an interactive demonstration using supplied data. This demonstration will be framed as a case study on generating personal contact networks for epidemic simulation.
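The edge-editing idea can be illustrated with a small, self-contained genetic algorithm. This is a hypothetical sketch, not GET's implementation or its Python bindings: each genome is a list of `k` vertex pairs whose edges are toggled in the initial network, so the result never differs from it by more than `k` edits. Binary tournament selection, one-point crossover, per-gene mutation, and elitism are generic GA choices assumed here, and the triangle count stands in for a user-supplied fitness function.

```python
import random
from typing import Callable

# Hypothetical types: an undirected simple graph is a frozenset of (u, v) with u < v.
Edge = tuple[int, int]
Graph = frozenset[Edge]


def random_pair(n: int, rng: random.Random) -> Edge:
    """A uniformly random vertex pair (u, v) with u < v."""
    u, v = rng.sample(range(n), 2)
    return (min(u, v), max(u, v))


def apply_edits(graph: Graph, genome: list[Edge]) -> Graph:
    """Toggle each listed pair: add the edge if absent, delete it if present."""
    edges = set(graph)
    for pair in genome:
        edges ^= {pair}
    return frozenset(edges)


def triangles(graph: Graph) -> float:
    """Example fitness (hypothetical stand-in): number of triangles in the graph."""
    adj: dict[int, set[int]] = {}
    for u, v in graph:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is counted once per edge, i.e. three times.
    return sum(len(adj[u] & adj[v]) for u, v in graph) / 3


def evolve_edits(initial: Graph, n: int, k: int, fitness: Callable[[Graph], float],
                 pop_size: int = 30, generations: int = 100, p_mut: float = 0.2,
                 seed: int = 0) -> tuple[Graph, float]:
    """Maximize `fitness` over graphs within k edge toggles of `initial`."""
    rng = random.Random(seed)

    def score(genome: list[Edge]) -> float:
        return fitness(apply_edits(initial, genome))

    def tournament(scored: list[tuple[float, list[Edge]]]) -> list[Edge]:
        a, b = rng.sample(scored, 2)          # binary tournament selection
        return a[1] if a[0] >= b[0] else b[1]

    pop = [[random_pair(n, rng) for _ in range(k)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((score(g), g) for g in pop), key=lambda t: t[0], reverse=True)
        next_pop = [scored[0][1]]             # elitism: best genome survives unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            cut = rng.randrange(1, k) if k > 1 else 0
            child = p1[:cut] + p2[cut:]       # one-point crossover
            # Per-gene mutation: replace a pair with a fresh random pair.
            child = [random_pair(n, rng) if rng.random() < p_mut else e for e in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=score)
    return apply_edits(initial, best), score(best)


if __name__ == "__main__":
    ring = frozenset((min(i, (i + 1) % 10), max(i, (i + 1) % 10)) for i in range(10))
    graph, fit = evolve_edits(ring, n=10, k=4, fitness=triangles)
    print(len(graph ^ ring), "edits, fitness", fit)
```

Because a pair toggled twice cancels out, the effective edit distance can be smaller than `k`, which lets the search also reach networks closer to the original.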

Learning Objectives

  • Understanding Network Generation with Genetic Algorithms: Deconstruct the core principles of genetic algorithms as applied in GET, identifying how evolutionary optimization drives both network generation and refinement.
  • Selecting the Right GET Framework: Distinguish between the edge-editing and SDA frameworks based on their underlying representations, determining which approach best suits a given network generation task.
  • Implementing Custom Fitness Functions: Translate application-specific network requirements into a working fitness function, moving from problem definition to a parameterized GET configuration.
  • Generating Synthetic Networks with GET: Apply GET hands-on to produce or refine synthetic network data, using the personal contact network case study as a template for epidemic simulation and beyond.
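As an example of translating an application requirement into a fitness function, the epidemic case study can be sketched with repeated stochastic SIR simulations. This is an assumption-laden illustration rather than the fitness function provided with GET: it uses a discrete-time SIR process in which, at each step, every infected node infects each susceptible neighbor with probability `beta` and then recovers with probability `gamma`. Epidemic length is the number of steps until no node is infected, averaged over several runs from a fixed patient zero; all parameter names and defaults are hypothetical.

```python
import random
from collections import defaultdict
from typing import Iterable, Optional


def epidemic_length(edges: Iterable[tuple[int, int]], n: int, beta: float = 0.5,
                    gamma: float = 0.25, patient_zero: int = 0,
                    rng: Optional[random.Random] = None) -> int:
    """Number of discrete time steps until a single SIR outbreak dies out."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must be in (0, 1] so the outbreak terminates")
    if rng is None:
        rng = random.Random(0)
    adj: dict[int, set[int]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    state = ["S"] * n
    state[patient_zero] = "I"
    infected = {patient_zero}
    steps = 0
    while infected:
        steps += 1
        # Transmission: each infected node tries each susceptible neighbor once.
        newly_infected = {v for u in infected for v in adj[u]
                          if state[v] == "S" and rng.random() < beta}
        # Recovery: only nodes infected before this step can recover now.
        recovered = {u for u in infected if rng.random() < gamma}
        for v in newly_infected:
            state[v] = "I"
        for u in recovered:
            state[u] = "R"
        infected = (infected - recovered) | newly_infected
    return steps


def epidemic_length_fitness(edges: Iterable[tuple[int, int]], n: int, runs: int = 20,
                            seed: int = 0, **params: float) -> float:
    """Mean epidemic length over several seeded runs."""
    edge_list = list(edges)
    rng = random.Random(seed)
    return sum(epidemic_length(edge_list, n, rng=rng, **params) for _ in range(runs)) / runs


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3), (3, 4)]
    print(epidemic_length_fitness(path, 5, beta=0.8, gamma=0.3))
```

Whether a longer epidemic should be rewarded or penalized depends on the study; the function only measures it, and the sign is left to the fitness definition.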

Intended Audience and Multidisciplinary Scope

This tutorial is designed for researchers and graduate students interested in working with graph data, in particular in generating and analyzing graphs that meet specific conditions. Such targeted graph generation is widely applicable in bioinformatics and many other areas. We assume that attendees have coding experience and access to a computer with Python.

Outline with topics and timing breakdown

Module | Content & Key Takeaways | Duration
--- | --- | ---
Introduction to Genetic Algorithms | Foundational mechanics of genetic algorithms (selection, crossover, and mutation) and how they drive iterative optimization | 10 minutes
Representations Used in GET | Introduction to the edge-editing and SDA frameworks, their differences, and the concept of stacking both frameworks | 35 minutes
Getting Started with GET | Installation and basic usage of the Python bindings using provided graphs and a simple fitness function | 30 minutes
GET Demonstration: Edge-Editing and SDA | Generating networks using each representation and configuring system parameters and hyperparameters | 25 minutes
Data Visualization | Visualizing GET outputs as network diagrams, charts, and summary statistics | 10 minutes
Epidemic Case Study | Overview of the SIR epidemic model adapted for networks, and introduction to epidemic length as a fitness function | 10 minutes
Implementing a Custom Fitness Function | Modifying the GET library to add custom fitness functionality, demonstrated using epidemic spread (provided); includes compiling and importing changes | 20 minutes
Applying GET in Your Research | Recap of key steps: choosing a representation, configuring GET, and writing a custom fitness function | 10 minutes
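The self-driving idea behind the SDA representation can be illustrated with a simplified automaton whose output bits are queued and read back as its own input, so a small machine can emit a bit string of any length; reading that string into the upper triangle of the adjacency matrix yields a network. This is a hypothetical sketch of the concept, not GET's SDA implementation: the class and function names, the single seed bit, the requirement that every transition emits at least one bit, and the row-by-row mapping of bits to vertex pairs are all assumptions.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class SelfDrivingAutomaton:
    # transitions[state][bit] is the next state after reading `bit` in `state`.
    transitions: list[list[int]]
    # outputs[state][bit] is the non-empty tuple of bits emitted on that transition.
    outputs: list[list[tuple[int, ...]]]
    seed_bits: tuple[int, ...] = (0,)

    def __post_init__(self) -> None:
        if not self.seed_bits:
            raise ValueError("at least one seed bit is required")
        if any(len(out) == 0 for row in self.outputs for out in row):
            raise ValueError("every transition must emit at least one bit")

    def bits(self, count: int) -> list[int]:
        """Run the automaton on its own output until `count` bits are produced."""
        queue = deque(self.seed_bits)
        state, produced = 0, []
        while len(produced) < count:
            bit = queue.popleft()          # read the next bit of its own output
            out = self.outputs[state][bit]
            state = self.transitions[state][bit]
            produced.extend(out)
            queue.extend(out)              # feed the output back as future input
        return produced[:count]


def random_automaton(n_states: int, max_out: int, rng: random.Random) -> SelfDrivingAutomaton:
    """A random automaton, e.g. for seeding a GA population."""
    transitions = [[rng.randrange(n_states) for _ in range(2)] for _ in range(n_states)]
    outputs = [[tuple(rng.randrange(2) for _ in range(rng.randint(1, max_out)))
                for _ in range(2)] for _ in range(n_states)]
    return SelfDrivingAutomaton(transitions, outputs)


def automaton_to_graph(sda: SelfDrivingAutomaton, n: int) -> frozenset[tuple[int, int]]:
    """Fill the upper triangle of an n x n adjacency matrix row by row."""
    pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]
    return frozenset(p for p, b in zip(pairs, sda.bits(len(pairs))) if b)


if __name__ == "__main__":
    sda = random_automaton(n_states=4, max_out=3, rng=random.Random(1))
    print(sorted(automaton_to_graph(sda, 8)))
```

In a GA over this representation, mutation would alter `transitions` and `outputs`. A graph produced this way could also serve as the initial network for an edge-editing stage, which is the stacking of both frameworks described in this tutorial.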

Organizers:

  • Dr. Michael Dube, Research Associate, Otto-von-Guericke-Universität, Magdeburg, Germany.
  • James Sargant, Doctoral Student, Brock University, St. Catharines, Ontario, Canada.