Association Mining

The goal of this project is to develop new techniques for efficient mining of frequent item sets and for mining itemsets with constraints in general. MAFIA supports mining of maximal frequent itemsets from transactional databases and scales to very long itemsets. Dual-Miner employs novel dual-pruning algorithms for mining of itemsets with constraints.

MAFIA: MAximal Frequent Itemset Algorithm

MAFIA is a new algorithm for mining maximal frequent itemsets from a transactional database. Our algorithm is especially efficient when the itemsets in the database are very long. The search strategy of our algorithm integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms.

Our implementation of the search strategy combines a vertical bitmap representation of the database with an efficient relative bitmap compression schema. In a thorough experimental analysis of our algorithm on real data, we isolate the effect of the individual components of the algorithm. Our performance numbers show that our algorithm outperforms previous work by up to an order of magnitude.

Explanations of various aspects of the algorithm

Power-point walkthrough of the algorithm

Obtaining MAFIA

The full source code for MAFIA is now available for download from Sourceforge.

People

Doug Burdick
Manuel Calimlim
Jason Flannick
Johannes Gehrke

Publications

Doug Burdick, Manuel Calimlim, and J. E. Gehrke. MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases.
In Proceedings of the 17th International Conference on Data Engineering, Heidelberg, Germany, April 2001.