Patterns that Matter

News and updates

  • 10.07.2023 Our paper titled A Survey on Explainable Anomaly Detection, with Zhong Li and Yuxuan Zhu, got accepted for publication in Transactions on Knowledge Discovery from Data. Congratulations Zhong and Yuxuan!
  • 06.07.2023 Our paper titled WEARDA: recording wearable sensor data for human activity monitoring, with Richard van Dijk and Daniela Gawehns, got accepted for publication in the Journal of Open Research Software. Congratulations Richard and Daniela!
  • 21.06.2023 Our paper titled Explainable Contextual Anomaly Detection using Quantile Regression Forests, with Zhong Li, got accepted for publication in Data Mining and Knowledge Discovery. Congratulations Zhong!
  • 23.01.2023 Our paper titled Discovering Rule Lists with Preferred Variables, with Ioanna Papagianni, got accepted at IDA 2023. Congratulations Ioanna!
  • 23.01.2023 Our paper titled Discovering diverse top-k characteristic lists, with Antonio Lopez-Martinez-Carrasco, Hugo M. Proença, Jose M. Juarez, and Manuel Campos, got accepted at IDA 2023. Congratulations Antonio!
  • 14.12.2022 Our paper titled Feature Selection for Fault Detection and Prediction based on Event Log Analysis, with Zhong Li, got accepted for publication in the ACM SIGKDD Explorations Newsletter. Congratulations Zhong!
  • 09.12.2022 Our paper titled Unsupervised Discretization by Two-dimensional MDL-based Histogram, with Lincen Yang and Mitra Baratchi, got accepted for publication in Machine Learning. Congratulations Lincen!
  • 14.09.2022 Our paper titled Generating synthetic mixed discrete-continuous health records with mixed sum-product networks, with Shannon Kroes, Rolf Groenwold, and Mart Janssen, got accepted for publication in the Journal of the American Medical Informatics Association. Congratulations Shannon!
  • 01.09.2022 I am now Director of Education of LIACS' Master's programmes, and in that role also a member of the institute's management team.
  • 01.09.2022 Francesco Bariatti has joined the EDA group as a postdoctoral researcher. Welcome Francesco!
  • 14.08.2022 Our paper titled Explainable hemoglobin deferral predictions using machine learning models: interpretation and consequences for the blood supply, with Marieke Vinkenoog et al., got accepted for publication in Vox Sanguinis. Congratulations Marieke!
  • 22.07.2022 Our paper titled Feature Selection for Fault Detection and Prediction based on Log Analysis, with Zhong Li, got accepted at the AI for manufacturing workshop at ECML PKDD 2022. Congratulations Zhong!
  • 21.07.2022 Our paper titled Histogram-based Probabilistic Rule Lists for Numeric Targets, with Lincen Yang and Tim Opdam, got accepted at the KDID 2022 workshop at ECML PKDD 2022. Congratulations Lincen and Tim!
  • 14.06.2022 Our paper titled Truly Unordered Probabilistic Rule Sets for Multi-class Classification, with Lincen Yang, got accepted at ECML PKDD 2022. Congratulations Lincen!
  • 14.03.2022 Our paper titled Robust Subgroup Discovery, with Hugo Proença, Peter Grünwald, and Thomas Bäck, got accepted for publication in Data Mining and Knowledge Discovery. Congratulations Hugo!
  • 03.02.2022 Our paper titled Associations between symptoms, donor characteristics and IgG antibody response in 2082 COVID-19 convalescent plasma donors, with Marieke Vinkenoog et al., got accepted for publication in Frontiers in Immunology. Congratulations Marieke!
  • 27.01.2022 I received the Senior Teaching Qualification (SKO) certificate!
  • 20.01.2022 Our paper titled Finding Efficient Trade-offs in Multi-Fidelity Response Surface Modeling, with Sander van Rijn, Sebastian Schmitt, and Thomas Bäck, got accepted for publication in Engineering Optimization. Congratulations Sander!

I am associate professor and director of education at the Leiden Institute of Advanced Computer Science (LIACS), the computer science institute of Leiden University. I am group leader of the Explanatory Data Analysis group.

My primary research interest is exploratory data mining: how can we enable domain experts to explore and analyse their data, to discover structure and—ultimately—novel knowledge?

For this it is important that methods and results are explainable to domain experts, who may not be data scientists. My signature approach is to define and identify patterns that matter, i.e., succinct descriptions that characterise relevant structure present in the data. Which patterns matter strongly depends on the data and task at hand, hence defining the problem is one of the key challenges of exploratory data mining. Information-theoretic concepts such as the Minimum Description Length (MDL) principle have proven very useful to this end. I am also interested in interactive data mining, i.e., involving humans in the loop. Finally, I am interested in fundamental data mining research for real-world applications, both in science (e.g., life sciences, social sciences) and industry (e.g., manufacturing and engineering, aviation), as this is the best way to show that the theory works in practice.

I am affiliated with SAILS, the university-wide research programme for artificial intelligence. Broadly speaking, my research can be situated in the fields of data mining, machine learning, data science, and artificial intelligence (AI).



Selected recent publications

2024
Li, Z, Zhu, Y & van Leeuwen, M A Survey on Explainable Anomaly Detection. Transactions on Knowledge Discovery from Data vol.18(1), ACM, 2024.
2023
Kroes, SKS, van Leeuwen, M, Groenwold, RHH & Janssen, MP Evaluating Cluster-Based Synthetic Data Generation for Blood-Transfusion Analysis. Journal of Cybersecurity and Privacy vol.3(4), pp 882-894, MDPI, 2023.
van Dijk, R, Gawehns, D & van Leeuwen, M WEARDA: recording wearable sensor data for human activity monitoring. Journal of Open Research Software vol.11(1), 2023.
Vinkenoog, M, Toivonen, J, van Leeuwen, M, Janssen, M & Arvas, M The added value of ferritin levels and genetic markers for the prediction of haemoglobin deferral. Vox Sanguinis vol.118(10), pp 825-834, 2023.
Li, Z & van Leeuwen, M Explainable Contextual Anomaly Detection using Quantile Regression Forests. Data Mining and Knowledge Discovery, Springer
Lopez-Martinez-Carrasco, A, Proença, HM, Juarez, JM, van Leeuwen, M & Campos, M Novel approach for phenotyping based on diverse top-k subgroup lists. In: Proceedings of the Conference on Artificial Intelligence In Medicine (AIME 2023), Springer, 2023.
Lopez-Martinez-Carrasco, A, Proença, HM, Juarez, JM, van Leeuwen, M & Campos, M Discovering Diverse Top-k Characteristic Lists. In: Proceedings of the 21st International Symposium on Intelligent Data Analysis (IDA 2023), Springer, 2023.
Papagianni, I & van Leeuwen, M Discovering Rule Lists with Preferred Variables. In: Proceedings of the 21st International Symposium on Intelligent Data Analysis (IDA 2023), Springer, 2023.
van der Arend, B, Verhagen, I, van Leeuwen, M, van der Arend, M, van Casteren, D & Terwindt, G Defining migraine days, based on longitudinal E-diary data. Cephalalgia
Yang, L, Baratchi, M & van Leeuwen, M Unsupervised Discretization by Two-dimensional MDL-based Histogram. Machine Learning, Springer
Kroes, SKS, van Leeuwen, M, Groenwold, RHH & Janssen, MP Generating synthetic mixed discrete-continuous health records with mixed sum-product networks. Journal of the American Medical Informatics Association vol.30(1), Oxford University Press, 2023.
2022
Li, Z & van Leeuwen, M Feature Selection for Fault Detection and Prediction based on Event Log Analysis. ACM SIGKDD Explorations vol.24(2), ACM, 2022.
Spaink, HA, Verhagen, IE, van Leeuwen, M & Terwindt, GM Methodological considerations in predicting migraine attacks using machine learning. In: MTIS 2022 Cephalalgia Abstracts, Sage Publications, 2022.
Li, Z & van Leeuwen, M Feature Selection for Fault Detection and Prediction based on Log Analysis. In: Proceedings of the International Workshop on AI for Manufacturing at ECMLPKDD 2022, 2022.
Yang, L, Opdam, T & van Leeuwen, M Histogram-based Probabilistic Rule Lists for Numeric Targets. In: Proceedings of the 20th anniversary Workshop on Knowledge Discovery in Inductive Databases (KDID 2022) at ECMLPKDD 2022, CEUR Workshop Proceedings, 2022.
Yang, L & van Leeuwen, M Truly Unordered Probabilistic Rule Sets for Multi-class Classification. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD 2022), Springer, 2022.
Vinkenoog, M, van Leeuwen, M & Janssen, M Explainable hemoglobin deferral predictions using machine learning models: interpretation and consequences for the blood supply. Vox Sanguinis
Proença, HM, Grünwald, P, Bäck, T & van Leeuwen, M Robust subgroup discovery - Discovering subgroup lists using MDL. Data Mining and Knowledge Discovery
van Rijn, S, Schmitt, S, van Leeuwen, M & Bäck, T Finding Efficient Trade-offs in Multi-Fidelity Response Surface Modeling. Engineering Optimization
Yang, L & van Leeuwen, M Probabilistic Rule Sets Ready for Interactive Machine Learning. In: AAAI'22-Workshop on Interactive Machine Learning, 2022.
Vinkenoog, M, Steenhuis, M, ten Brinke, A, van Hasselt, C, Janssen, M, van Leeuwen, M, Swaneveld, F, Vrielink, H, van de Watering, L, Quee, F, van den Hurk, K, Rispens, T, Hogema, B & van der Schoot, E Associations between symptoms, donor characteristics and IgG antibody response in 2082 COVID-19 convalescent plasma donors. Frontiers in Immunology, Frontiers