Introduction
WCImport is a set of tools for importing data from the 2014 Soccer World Cup and preparing it for experiments with the StreamPref Data Stream Management System (DSMS) prototype. Please see the related publications for more information.
Tools
The first step is to run the tool wcimport.py to download and convert the data. Next, use the tool for the operator you want to evaluate to create the environment for the experiments.
In addition, WCImport has the following individual tools:
- bestseqgen.py: tool for evaluation of BESTSEQ operator (temporal preference operator);
- seqgen.py: tool for evaluation of SEQ operator (sequence extraction);
- conseqgen.py: tool for evaluation of CONSEQ operator (subsequences with consecutive tuples);
- endseqgen.py: tool for evaluation of ENDSEQ operator (subsequences with the last position);
- maxseqgen.py: tool for evaluation of MAXSEQ operator (filtering by maximum length);
- minseqgen.py: tool for evaluation of MINSEQ operator (filtering by minimum length);
- utilgen.py: tool for utility experiments.
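As a rough illustration of the two length-filtering operators listed above, the sketch below filters a set of sequences by minimum and maximum length. It is not taken from StreamPref: the list-of-tuples representation, the function names, the sample data, and the choice of inclusive bounds are all assumptions made for this example.

```python
# Illustrative sketch only; not the StreamPref implementation.
# A sequence is modeled as a list of tuples (hypothetical representation).


def minseq(sequences, min_length):
    """Keep the sequences with at least min_length positions (inclusive bound assumed)."""
    return [s for s in sequences if len(s) >= min_length]


def maxseq(sequences, max_length):
    """Keep the sequences with at most max_length positions (inclusive bound assumed)."""
    return [s for s in sequences if len(s) <= max_length]


# Hypothetical sequences of match events, one sequence per player.
sequences = [
    [("pass",), ("shot",)],
    [("pass",)],
    [("dribble",), ("pass",), ("pass",), ("shot",)],
]

# Chain both filters: keep sequences whose length is between 2 and 3.
filtered = maxseq(minseq(sequences, 2), 3)
print(filtered)
```

Chaining the two filters mirrors how MINSEQ and MAXSEQ appear together in the operator combinations used by utilgen.py.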
Algorithms
Except for utilgen.py, all tools generate StreamPref environments for evaluating their operators. Each operator can be evaluated by one or more algorithms and by an equivalent CQL query. The available algorithms for each operator are the following:
- SEQ
  - Incremental algorithm
  - CQL Equivalence
- CONSEQ / ENDSEQ
  - Naive algorithm
  - Incremental algorithm
  - CQL Equivalence
- MINSEQ / MAXSEQ
  - Direct algorithm
  - CQL Equivalence
- BESTSEQ
  - Naive algorithm with depth search comparison
  - Incremental algorithm with sequences tree
  - Incremental algorithm with sequences tree and pruning
  - CQL Equivalence
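The operator-to-algorithm listing above can be written down as a small lookup table, which makes it easy to enumerate the runs an evaluation would need. This is only a sketch: the dictionary name, the algorithm labels, and the helper function are hypothetical and do not come from the WCImport source.

```python
# Hypothetical labels for the algorithms listed above; the real tools may
# name them differently.
ALGORITHMS = {
    "SEQ": ["incremental", "cql"],
    "CONSEQ": ["naive", "incremental", "cql"],
    "ENDSEQ": ["naive", "incremental", "cql"],
    "MINSEQ": ["direct", "cql"],
    "MAXSEQ": ["direct", "cql"],
    "BESTSEQ": [
        "naive_depth_search",
        "incremental_tree",
        "incremental_tree_pruning",
        "cql",
    ],
}


def runs_for(operator):
    """Return one (operator, algorithm) pair per algorithm available for the operator."""
    return [(operator, algorithm) for algorithm in ALGORITHMS[operator]]


for pair in runs_for("BESTSEQ"):
    print(pair)
```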
The goal of utilgen.py is to run experiments that analyze the utility of the operators. This tool executes experiments using the following combinations of operators:
- SEQ / BESTSEQ;
- SEQ / CONSEQ / BESTSEQ;
- SEQ / CONSEQ / ENDSEQ / BESTSEQ;
- SEQ / CONSEQ / ENDSEQ / MINSEQ / MAXSEQ / BESTSEQ.

During the execution of the experiments, the tool collects information about the sequences sent to the BESTSEQ operator and about the comparisons performed by this operator.
Parameters
The experiment parameters must be updated directly in the source code. The available parameters are the following:
- ATT: Number of attributes;
- NSQ: Number of distinct sequences;
- RAN: Temporal range;
- SLI: Slide interval;
- PCT: Percentage of consecutive instants (used only by conseqgen.py);
- MAX: Maximum valid length (used only by maxseqgen.py);
- MIN: Minimum valid length (used only by minseqgen.py);
- RUL: Number of rules (used only by bestseqgen.py);
- LEV: Maximum preference level (used only by bestseqgen.py);
- IND: Number of indifferent attributes (used only by bestseqgen.py).
Every parameter is a dictionary with the keys VAR (list of values) and DEF (default value).
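The sketch below shows what such parameter dictionaries might look like and one common way to expand them into experiment configurations: varying a single parameter over its VAR list while the others stay at their DEF value. The concrete numbers, the `PARAMETERS` name, and the one-parameter-at-a-time strategy are assumptions for illustration; check the tool source for the actual values and behavior.

```python
# Hypothetical values; the real ones are defined in the source code of each tool.
PARAMETERS = {
    "ATT": {"VAR": [8, 10, 12], "DEF": 10},  # number of attributes
    "RAN": {"VAR": [10, 20, 40], "DEF": 20},  # temporal range
    "SLI": {"VAR": [1, 5], "DEF": 1},  # slide interval
}


def experiment_configs(parameters):
    """Vary one parameter at a time, keeping the others at their default (assumed strategy)."""
    defaults = {name: spec["DEF"] for name, spec in parameters.items()}
    configs = []
    for name, spec in parameters.items():
        for value in spec["VAR"]:
            config = dict(defaults)
            config[name] = value
            # The all-defaults configuration shows up once per parameter; keep one copy.
            if config not in configs:
                configs.append(config)
    return configs


configs = experiment_configs(PARAMETERS)
for config in configs:
    print(config)
```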
Command Line
Although WCImport is composed of many tools, all of them share the same command line options.
gen.py [-h] [-g] [-o] [-r] [-s]
-h, --help show the help message and exit
-g, --gen Generate files
-o, --output Generate query output
-r, --run Run experiments
-s, --summarize Summarize results
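For readers who want to see how a shared option set like this is typically declared, here is a minimal sketch using Python's standard `argparse` module. Only the flag names and help texts come from the usage message above; the `build_parser` function, the use of boolean `store_true` flags, and the sample invocation are assumptions, not code from the tools.

```python
import argparse


def build_parser(prog="gen.py"):
    """Build a parser with the options shown in the usage message above."""
    parser = argparse.ArgumentParser(prog=prog)
    # argparse adds -h/--help automatically.
    parser.add_argument("-g", "--gen", action="store_true", help="Generate files")
    parser.add_argument("-o", "--output", action="store_true", help="Generate query output")
    parser.add_argument("-r", "--run", action="store_true", help="Run experiments")
    parser.add_argument("-s", "--summarize", action="store_true", help="Summarize results")
    return parser


# Example: generate the files and then run the experiments.
args = build_parser().parse_args(["-g", "-r"])
print(args.gen, args.output, args.run, args.summarize)
```

Because each option is an independent flag, several steps can be requested in a single invocation, for example generating files and running experiments together.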