Software
On this page you can find software (i.e., code and/or binaries) for a number of projects in which I have been involved. Are you looking for an implementation that is not listed here? Contact me. Currently available are:
- DCM – Description-driven Community Mining
- DSSD – Diverse Subgroup Set Discovery
- Fast-Skyline – Efficient Discovery of the Cost-Influence Skyline
- Krimp – Itemsets that Compress
- SSG Miner – Subjective Interestingness of Subgraph Patterns
- Spectra – Fast Estimation of the Pattern Frequency Spectrum
- Translator – Association Discovery in Two-view Data
DCM – Description-driven Community Mining
Description-driven Community Mining (DCM) [2] is our solution for finding a diverse set of cohesive communities with concise descriptions in a social network. It can build well-described cohesive communities starting from any given description or seed set of nodes, which makes it flexible and easy to apply.
- DCM binaries and C# source code
Download (for Windows only).
DSSD – Diverse Subgroup Set Discovery
The latest release of my DSSD [3] implementation can be fully configured to:
- perform depth-first search or beam search;
- use a traditional top-k beam or one of the diverse beam selection strategies;
- do sequential or weighted covering using any of the depth-first or beam search strategies;
- perform post-selection using any of the subgroup selection strategies;
- use one of the Subgroup Discovery quality measures: Weighted Relative Accuracy (standard, multi-class, or numeric), Chi-squared, mean test, (Weighted) KL quality;
- use one of the Exceptional Model Mining quality measures: (Weighted) Kullback-Leibler quality, (Weighted) Krimp Gain quality.
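As an illustration of one of the quality measures listed above, standard Weighted Relative Accuracy for a binary target is commonly defined as the coverage of a subgroup multiplied by the difference between the target share inside the subgroup and in the whole dataset. The following is a minimal sketch of that textbook definition; it is not taken from the DSSD source code, and the function and parameter names are hypothetical.

```python
from typing import Sequence


def wracc(covered: Sequence[bool], positive: Sequence[bool]) -> float:
    """Standard WRAcc for a binary target (textbook definition).

    covered[i]  -- True if row i belongs to the subgroup
    positive[i] -- True if row i has the target value
    """
    n = len(covered)
    n_sub = sum(covered)
    if n == 0 or n_sub == 0:
        return 0.0  # an empty subgroup carries no information
    # Share of positives inside the subgroup and in the whole dataset.
    p_sub = sum(1 for c, y in zip(covered, positive) if c and y) / n_sub
    p_all = sum(positive) / n
    return (n_sub / n) * (p_sub - p_all)


# Hypothetical toy data: 8 rows, the subgroup covers the first 4.
positive = [True, True, True, False, False, False, False, False]
covered = [True, True, True, True, False, False, False, False]
quality = wracc(covered, positive)
```

A subgroup scores high only if it is both large and has an unusual target distribution, which is why the measure is a common default in Subgroup Discovery.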
Provided are both Windows binaries and the C++ source code:
- DSSD binaries
Download -- Should run on any Windows platform (includes binaries for both x86 and x64).
- DSSD C++ source code
Download -- Includes solution and project files for Visual Studio 2010, but the code depends very little on platform-specific features, so it should also compile on other platforms and with other compilers.
Fast-Skyline – Efficient Discovery of the Cost-Influence Skyline
Fast-Skyline is an algorithm for computing approximate “skylines” (also known as Pareto fronts or non-dominated sets) of subsets of size k with respect to two functions, one linear and one submodular. That is, the algorithm computes the set of non-dominated subsets of size k.
Van Leeuwen & Ukkonen (2015) describe this algorithm in the context of influence maximisation, where the subsets are sets of vertices, the seed sets. We consider the special case where the seed sets have different costs, defined as the sum of vertex-specific costs. We say that a seed set dominates another seed set if it has higher influence and lower cost.
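The dominance relation described above can be sketched on a small, explicitly enumerated set of candidates. This is only a brute-force illustration of what "non-dominated" means; it is not the Fast-Skyline algorithm, and the vertex costs and influence values below are hypothetical numbers chosen for the example.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class SeedSet(NamedTuple):
    vertices: FrozenSet[str]
    cost: float       # sum of vertex-specific costs (linear function)
    influence: float  # value of the influence function (assumed given)


def seed_cost(vertices: FrozenSet[str], vertex_cost: Dict[str, float]) -> float:
    """Cost of a seed set: the sum of the costs of its vertices."""
    return sum(vertex_cost[v] for v in vertices)


def dominates(a: SeedSet, b: SeedSet) -> bool:
    """a dominates b if it has higher influence and lower cost."""
    return a.influence > b.influence and a.cost < b.cost


def skyline(candidates: List[SeedSet]) -> List[SeedSet]:
    """Keep every candidate that no other candidate dominates (O(n^2))."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical data: four vertices, all seed sets of size k = 2.
vertex_cost = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
influence = {
    frozenset("ab"): 4.0,
    frozenset("ac"): 10.0,
    frozenset("ad"): 8.0,
    frozenset("bc"): 9.0,
    frozenset("bd"): 12.0,
    frozenset("cd"): 11.0,
}
candidates = [SeedSet(s, seed_cost(s, vertex_cost), inf)
              for s, inf in influence.items()]
front = skyline(candidates)
for s in front:
    print(sorted(s.vertices), s.cost, s.influence)
```

Enumerating all subsets of size k is infeasible for realistic graphs, which is the reason an efficient approximate method is needed in the first place.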
- More information and JavaScript source code
anttiukkonen.com
Krimp – Itemsets that Compress
Our implementation of Krimp [4] is freely available for research purposes; we provide both the C++ source code and binaries for Windows (x86 and x64) and Linux. In addition to the pattern set selection algorithm, it contains the Krimp classifier [5] and the StreamKrimp algorithm [6]. For your convenience, the package includes some example UCI datasets taken from the LUCS-KDD data library. Please refer to the documentation in the package for installation/compilation details and usage hints.
- Krimp binaries and C++ source code
Download (version 1st of February 2013)
SSG Miner – Subjective Interestingness of Subgraph Patterns
SSG Miner (short for Subjective Subgraph Miner) is our implementation of the method described in Subjective Interestingness of Subgraph Patterns.
Spectra – Fast Estimation of the Pattern Frequency Spectrum
FastEst and Spectra [1] are algorithms for estimating the number of frequent itemsets in a dataset. Exactly counting the number of frequent itemsets is a #P-complete problem. Our approach, based on Knuth's classical algorithm for estimating the size of a search tree, is much faster yet still accurate.
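The idea behind Knuth's estimator can be sketched as follows: walk one random root-to-leaf path and, at each level, multiply a running weight by the number of children; the sum of these weights is an unbiased estimate of the tree size. The code below is a minimal illustration applied to a naive itemset search tree. It is not the FastEst or Spectra implementation, and the helper names, the toy dataset, and the support counting are assumptions made for the example.

```python
import random
from typing import Callable, List, Sequence, Set, Tuple

Itemset = Tuple[int, ...]


def knuth_probe(root, children: Callable, rng: random.Random) -> float:
    """One random probe: unbiased estimate of the number of tree nodes."""
    estimate = 1.0  # counts the root
    weight = 1.0
    node = root
    while True:
        kids = children(node)
        if not kids:
            return estimate
        weight *= len(kids)     # each node at this level stands for `weight` nodes
        estimate += weight
        node = rng.choice(kids)  # continue along a uniformly chosen child


def frequent_extensions(itemset: Itemset, transactions: Sequence[Set[int]],
                        n_items: int, min_support: int) -> List[Itemset]:
    """Children in the search tree: frequent extensions by a larger item."""
    start = max(itemset) + 1 if itemset else 0
    kids = []
    for item in range(start, n_items):
        candidate = itemset + (item,)
        support = sum(1 for t in transactions if all(i in t for i in candidate))
        if support >= min_support:
            kids.append(candidate)
    return kids


def estimate_frequent_itemsets(transactions: Sequence[Set[int]], n_items: int,
                               min_support: int, n_probes: int,
                               rng: random.Random) -> float:
    """Average several probes; subtract 1 to exclude the empty itemset (root)."""
    def children(node: Itemset) -> List[Itemset]:
        return frequent_extensions(node, transactions, n_items, min_support)
    total = sum(knuth_probe((), children, rng) for _ in range(n_probes))
    return total / n_probes - 1.0


# Hypothetical toy dataset with three items.
transactions = [{0, 1, 2}, {0, 1}, {0, 2}, {0}]
estimate = estimate_frequent_itemsets(transactions, n_items=3, min_support=2,
                                      n_probes=20000, rng=random.Random(0))
print(round(estimate, 2))
```

Each probe only explores a single path, so its cost is tiny compared with full enumeration; the price is variance, which is reduced here by averaging many probes.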
The C++ implementation was used for the experiments reported on in our ECML PKDD 2014 paper. In addition, we also provide a JavaScript-based implementation that runs in your browser; a description and some performance benchmarks are in this paper.
- Spectra binaries and C++ source code
Download (supported: Windows, Linux).
- Browser-based demo and implementation in JavaScript
Demo (or read the motivation and performance benchmarks).
Translator – Association Discovery in Two-view Data
The Translator algorithms find small and non-redundant sets of associations that describe how the two views of two-view datasets are related, where two-view datasets are datasets whose attributes are naturally split into two sets. The models, dubbed translation tables, contain both unidirectional and bidirectional rules that span both views and provide lossless translation from either of the views to the opposite view. A score based on the Minimum Description Length (MDL) principle is used for model selection.
The implementation provided here was used for the experiments reported on in our TKDE paper.
- Translator binaries and C++ source code
Download (supported: Windows, Linux).
References
[1] Fast Estimation of the Pattern Frequency Spectrum. In Proc. ECML PKDD'14, pages ?, 2014.
[2] Description-driven Community Detection. Transactions on Intelligent Systems and Technology, 5(2):?, ACM, 2014.
[3] Diverse Subgroup Set Discovery. Data Min. Knowl. Discov., 25(2):208-242, Springer Netherlands, 2012.
[4] Item Sets that Compress. In Proc. SDM'06, pages 393-404, 2006.
[5] Compression Picks the Item Sets that Matter. In Proc. ECML PKDD'06, pages 585-592, 2006.
[6] StreamKrimp: Detecting Change in Data Streams. In Proc. ECML PKDD'08, pages 672-687, 2008.