This is an implementation of FP-Growth-Tiny, which introduces a space
optimization to the FP-Growth algorithm.

There are three executables for mining frequent patterns: fpgrowth-tiny,
fpgrowth-tiny-bin and fpgrowth-tiny-debug. fpgrowth-tiny reads a plain ASCII
data format in which each transaction is a whitespace-separated list of
items on its own line. fpgrowth-tiny-bin reads the binary IBM
datagenerator format. fpgrowth-tiny-debug generates intermediate FP-trees
and outputs verbose logs about each step of the algorithm.
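
As an illustration of the ASCII format described above, here is a minimal
sketch of a reader. It is not taken from the fpgrowth-tiny sources: the
Transaction type and the read_ascii_transactions name are hypothetical, and
storing items as strings and skipping blank lines are assumptions.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type for this sketch; the real code may use integer item ids.
using Transaction = std::vector<std::string>;

// Reads one transaction per line; items are separated by whitespace.
std::vector<Transaction> read_ascii_transactions(std::istream& in) {
    std::vector<Transaction> db;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream items(line);
        Transaction t;
        std::string item;
        while (items >> item) {
            t.push_back(item);
        }
        if (!t.empty()) {
            db.push_back(t);  // blank lines are skipped (an assumption)
        }
    }
    return db;
}
```

Passing a std::ifstream opened on a dataset file would yield one vector of
items per non-empty line.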

The algorithms write the discovered patterns to a file in the format
used in the FIMI workshop.

random-ts creates a random dataset with a Gaussian distribution of
transaction lengths.
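
The idea can be sketched as follows. This is not the random-ts source; the
function name, its parameters, the uniform choice of items and the clamping
of lengths to [1, num_items] are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Generates n transactions over item ids 0..num_items-1 (num_items >= 1).
// Each transaction length is drawn from a normal distribution, rounded, and
// clamped to [1, num_items]; items are drawn uniformly without repetition.
std::vector<std::vector<int>> random_transactions(std::size_t n, int num_items,
                                                  double mean_len,
                                                  double stddev_len,
                                                  unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> len_dist(mean_len, stddev_len);
    std::uniform_int_distribution<int> item_dist(0, num_items - 1);

    std::vector<std::vector<int>> db;
    db.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        long len = std::lround(len_dist(rng));
        len = std::max<long>(1, std::min<long>(len, num_items));

        std::vector<int> t;
        while (t.size() < static_cast<std::size_t>(len)) {
            int item = item_dist(rng);
            if (std::find(t.begin(), t.end(), item) == t.end()) {
                t.push_back(item);  // keep items unique within a transaction
            }
        }
        std::sort(t.begin(), t.end());
        db.push_back(std::move(t));
    }
    return db;
}
```

The rejection loop is simple but slows down when the requested length is
close to num_items; shuffling the item ids and taking a prefix would be one
alternative.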

check-ts validates a transaction database; check-patterns validates the
correctness of discovered frequent patterns. patterncount.py counts the
number of patterns in a generated pattern file.

ts-to-ascii and ts-to-gnuplot convert a binary transaction set to ASCII
format and to a 2D plot, respectively. thrash-cache was used to evict the
contents of the OS file cache, but might not work as intended.

The test-* programs are CLIs for unit tests.

Under the perf directory, there are some neat Python scripts to measure the
time and space performance of fpgrowth-tiny and fpgrowth. serial-time.py and
space.py conduct the experiments; the plot-* commands create gnuplot data
files.

In the doc directory, only doc/performance is active; it plots the
performance data in the perf directory.

The source code is distributed under the GPL; please read the COPYING file.

Note that the ASCII input and pattern counting code was not optimized. In
particular, pattern counting could be made more efficient; the current code
uses std::map<T,T>.
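
One possible direction, shown only as a sketch, is to tally counts in a hash
table instead of an ordered map. The count_occurrences helper below is
hypothetical and assumes the counting step simply tallies how often each key
occurs; std::unordered_map offers average constant-time updates where
std::map is logarithmic, but whether this helps in practice would have to be
measured.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Tallies how many times each key occurs. Key must be hashable via std::hash.
// Hypothetical helper for illustration; not part of the existing code.
template <typename Key>
std::unordered_map<Key, std::size_t> count_occurrences(
    const std::vector<Key>& keys) {
    std::unordered_map<Key, std::size_t> counts;
    counts.reserve(keys.size());  // avoid rehashing while inserting
    for (const Key& k : keys) {
        ++counts[k];  // operator[] value-initializes missing entries to 0
    }
    return counts;
}
```

If the keys are small dense integers, a plain std::vector indexed by key
would avoid hashing altogether.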

--
Eray Ozkural <erayo@cs.bilkent.edu.tr> http://www.cs.bilkent.edu.tr/~erayo 
Ankara, 2004
