that Matter

News and updates

  • 01.08.2018 I am now general co-chair of the IDA Council, the steering committee of the IDA symposium series. First up: IDA 2018 in Den Bosch, October 24-26!
  • 01.05.2018 Our research proposal titled Data Science for State-of-the-Art Blood Banking (BloodStarT), with Mart Janssen, Aske Plaat, Marian van Kraaij, and Katja van den Hurk, was funded by Sanquin.
  • 24.03.2018 Our paper titled Multi-Fidelity Surrogate Model Approach to Optimization, with Sander van Rijn, Sebastian Schmitt, Markus Olhofer, and Thomas Bäck, was accepted at GECCO 2018. Congratulations Sander!
  • 01.02.2018 Our special issue on Interactive Data Exploration and Analytics (IDEA), co-edited with Polo Chau, Jilles Vreeken, Dafna Shahaf, and Christos Faloutsos, was published in TKDD.
  • 31.08.2017 New website for my research group: Explanatory Data Analysis.
  • 18.07.2017 Our research proposal titled Dementia back in the heart of the community, a consortium effort in which we will carry out the data science component, was funded by ZonMW.

I am an assistant professor and group leader of the Explanatory Data Analysis group at the Leiden Institute of Advanced Computer Science (LIACS), the computer science institute of Leiden University. I am also affiliated with the Leiden Centre of Data Science (LCDS) and the university-wide Data Science Research Programme (DSRP). My primary research interest is exploratory data mining: how can we enable domain experts to explore and analyse their data, to discover structure and, ultimately, novel knowledge?

For this, it is essential that all methods and results are explainable to domain experts, who may not be data scientists. My signature approach is to define and identify patterns that matter, i.e., succinct descriptions that characterise relevant structure present in the data. Which patterns matter depends strongly on the data and the task at hand; hence, defining the problem is one of the key challenges of exploratory data mining. Information-theoretic concepts such as the Minimum Description Length (MDL) principle have proven very useful to this end. I am also interested in interactive data mining, i.e., involving humans in the loop.

Finally, I am interested in fundamental data mining research for real-world applications, both in science (e.g., life sciences, social sciences) and industry (e.g., manufacturing and engineering, aviation), as this is the best way to show that the theory works in practice.



Current and upcoming
  • Teacher of Information Theoretic Data Mining '18-'19 (MSc Computer Science & Data Science).
  • Participant at Honda Research Institute's EGN Symposium 2018, in Offenbach, Germany, 27 September.
  • Participant at Dagstuhl Seminar 18401, Automating Data Science, in Wadern, Germany, 30 September - 5 October.
  • Advisory Chair of IDA 2018 in Den Bosch, 24-26 October.
  • Committee member at the PhD defence of Sergey Paramonov, with whom I previously worked together with Luc De Raedt; KU Leuven, Leuven, 29 October.


Selected recent publications

van Rijn, S, van Leeuwen, M, Schmitt, S, Olhofer, M & Bäck, T Multi-Fidelity Surrogate Model Approach to Optimization. In: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO'18), ACM, 2018.
van Leeuwen, M, Chau, DH, Vreeken, J, Shahaf, D & Faloutsos, C Editorial: TKDD Special Issue on Interactive Data Exploration and Analytics. Transactions on Knowledge Discovery from Data vol.12(1), ACM, 2018.
Ukkonen, A, Dzyuba, V & van Leeuwen, M Explaining Deviating Subsets through Explanation Networks. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD'17), Springer, 2017.
Dzyuba, V & van Leeuwen, M Learning what matters – Sampling interesting patterns. In: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'17), pp 534-546, Springer, 2017.
Paramonov, S, van Leeuwen, M & De Raedt, L Relational Data Factorization. Machine Learning vol.106(12), pp 1867-1904, Springer, 2017.
Dzyuba, V, van Leeuwen, M & De Raedt, L Flexible constrained sampling with guarantees for pattern mining. Data Mining and Knowledge Discovery vol.31(5), pp 1266-1293, Springer, 2017. (ECMLPKDD'17 Special Issue) [implementation]
Le Van, T, Nijssen, S, van Leeuwen, M & De Raedt, L Semiring Rank Matrix Factorisation. Transactions on Knowledge and Data Engineering vol.29(8), pp 1737-1750, IEEE, 2017.
van Stein, B, van Leeuwen, M, Wang, H, Purr, S, Kreissl, S, Meinhardt, J & Bäck, T Towards Data Driven Process Control in Manufacturing Car Body Parts. In: Proceedings of IEEE International Conference on Computational Science and Computational Intelligence (IEEE CSCI-ISBD'16), IEEE, 2016.
van Rijn, S, Wang, H, van Leeuwen, M & Bäck, T Evolving the Structure of Evolution Strategies. In: Proceedings of IEEE Symposium Series on Computational Intelligence (IEEE SSCI'16), IEEE, 2016.
van Stein, B, van Leeuwen, M & Bäck, T Local Subspace-Based Outlier Detection using Global Neighbourhoods. In: Proceedings of IEEE International Conference on Big Data (IEEE BigData'16), IEEE, 2016.
van Leeuwen, M & Ukkonen, A Expect the Unexpected - On the Significance of Subgroups. In: Proceedings of Discovery Science (DS'16), pp 51-66, Springer, 2016.
Le Van, T, van Leeuwen, M, Fierro, AC, De Maeyer, D, Van den Eynden, J, Verbeke, L, De Raedt, L, Marchal, K & Nijssen, S Simultaneous discovery of cancer subtypes and subtype features by molecular data integration. Bioinformatics vol.32(17), pp 445-454, Oxford University Press, 2016. [implementation]
Copmans, D, Meinl, T, Dietz, C, van Leeuwen, M, Ortmann, J, Berthold, M & de Witte, PAM A KNIME-based Analysis of the Zebrafish Photomotor Response Clusters the Phenotypes of 14 Classes of Neuroactive Molecules. Journal of Biomolecular Screening vol.21(5), pp 427-436, SAGE Publishing, 2016. [implementation]
van Leeuwen, M, De Bie, T, Spyropoulou, E & Mesnage, C Subjective Interestingness of Subgraph Patterns. Machine Learning vol.105(1), pp 41-75, Springer, 2016. [implementation]