Introduction
StreamPrefGen is a dataset generator for evaluating the operators of the StreamPref query language. These operators are implemented in the StreamPref Data Stream Management System (DSMS) prototype. StreamPrefGen is composed of tools that generate streams, relations, queries and auxiliary files for running experiments with StreamPref. Please see the related publications for more information.
Generators
StreamPrefGen is composed of individual dataset generators, each one targeting a specific StreamPref operator. The generators are the following:
- bestseqgen.py: generator for evaluation of BESTSEQ operator (temporal preference operator);
- seqgen.py: generator for evaluation of SEQ operator (sequence extraction);
- conseqgen.py: generator for evaluation of CONSEQ operator (subsequences with consecutive tuples);
- endseqgen.py: generator for evaluation of ENDSEQ operator (subsequences with the last position);
- maxseqgen.py: generator for evaluation of MAXSEQ operator (filtering by maximum length);
- minseqgen.py: generator for evaluation of MINSEQ operator (filtering by minimum length);
- utilgen.py: generator for utility experiments.
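The operator descriptions above can be illustrated with a small sketch. It assumes a sequence is a list of (instant, value) tuples ordered by instant, and it reads CONSEQ as splitting a sequence into maximal runs of consecutive instants, ENDSEQ as returning every suffix (every subsequence that ends at the last position), and MINSEQ/MAXSEQ as keeping sequences whose length is at least MIN or at most MAX. These readings and all function names are hypothetical; the exact semantics are defined in the StreamPref publications and prototype.

```python
from typing import List, Tuple

# Hypothetical representation: a sequence is a list of (instant, value) tuples
# ordered by instant. This is an illustration, not StreamPref's data model.
Sequence = List[Tuple[int, str]]


def conseq(seq: Sequence) -> List[Sequence]:
    """Split a sequence into maximal runs of tuples at consecutive instants."""
    runs: List[Sequence] = []
    for tup in seq:
        # Extend the current run only if this tuple immediately follows the previous one.
        if runs and tup[0] == runs[-1][-1][0] + 1:
            runs[-1].append(tup)
        else:
            runs.append([tup])
    return runs


def endseq(seq: Sequence) -> List[Sequence]:
    """Return every subsequence that ends at the last position (all suffixes)."""
    return [seq[i:] for i in range(len(seq))]


def minseq(seqs: List[Sequence], minimum: int) -> List[Sequence]:
    """Keep only sequences whose length is at least `minimum`."""
    return [s for s in seqs if len(s) >= minimum]


def maxseq(seqs: List[Sequence], maximum: int) -> List[Sequence]:
    """Keep only sequences whose length is at most `maximum`."""
    return [s for s in seqs if len(s) <= maximum]


seq = [(1, "a"), (2, "b"), (4, "c"), (5, "d"), (6, "e")]
runs = conseq(seq)            # two runs: instants 1-2 and instants 4-6
suffixes = endseq(runs[-1])   # three suffixes of the last run
print(maxseq(minseq(suffixes, 2), 2))  # [[(5, 'd'), (6, 'e')]]
```

Chaining the functions mirrors how the operators can be combined in a query, with each stage consuming the output of the previous one.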
Algorithms
Except for utilgen.py, all tools generate StreamPref environments for evaluating their respective operators. Each operator can be evaluated by one or more algorithms and by an equivalent CQL query. The available algorithms for each operator are the following:
- SEQ
  - Incremental algorithm
  - CQL Equivalence
- CONSEQ / ENDSEQ
  - Naive algorithm
  - Incremental algorithm
  - CQL Equivalence
- MINSEQ / MAXSEQ
  - Direct algorithm
  - CQL Equivalence
- BESTSEQ
  - Naive algorithm with depth search comparison
  - Incremental algorithm with sequences tree
  - Incremental algorithm with sequences tree and pruning
  - CQL Equivalence
The goal of utilgen.py is to run experiments that analyze the utility of the operators. This tool runs experiments using the following combinations of operators:
- SEQ / BESTSEQ;
- SEQ / CONSEQ / BESTSEQ;
- SEQ / CONSEQ / ENDSEQ / BESTSEQ;
- SEQ / CONSEQ / ENDSEQ / MINSEQ / MAXSEQ / BESTSEQ.
During the experiments, the tool collects information about the sequences sent to the BESTSEQ operator and about the comparisons performed by this operator.
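To see why the number of sequences reaching BESTSEQ matters, consider a naive "best" selection that keeps every sequence not dominated by any other and counts the pairwise comparisons it performs. This is a generic sketch, not StreamPref's BESTSEQ algorithm: the preference function `more_x` is a hypothetical placeholder for StreamPref's temporal preference rules, and `naive_best` is an illustrative name. In the worst case it performs n(n-1) comparisons for n input sequences, so operators that shrink the input ahead of BESTSEQ can reduce this cost.

```python
from typing import Callable, List, Tuple

Seq = Tuple[str, ...]


def more_x(a: Seq, b: Seq) -> bool:
    """Hypothetical preference: `a` beats `b` if it contains more "x" symbols."""
    return a.count("x") > b.count("x")


def naive_best(seqs: List[Seq], dominates: Callable[[Seq, Seq], bool]) -> Tuple[List[Seq], int]:
    """Return the sequences not dominated by any other one and the number of comparisons."""
    best: List[Seq] = []
    comparisons = 0
    for i, candidate in enumerate(seqs):
        dominated = False
        for j, other in enumerate(seqs):
            if i == j:
                continue
            comparisons += 1
            if dominates(other, candidate):
                dominated = True
                break  # one dominating sequence is enough to discard the candidate
        if not dominated:
            best.append(candidate)
    return best, comparisons


best, comparisons = naive_best([("a", "x"), ("x", "x"), ("b",), ("x", "b")], more_x)
print(best, comparisons)  # [('x', 'x')] 7
```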
Parameters
The experiment parameters must be edited directly in the source code. The available parameters are the following:
- ATT: Number of attributes;
- NSQ: Number of distinct sequences;
- RAN: Temporal range;
- SLI: Slide interval;
- PCT: Percentage of consecutive instants (used only by conseqgen.py);
- MAX: Maximum valid length (used only by maxseqgen.py);
- MIN: Minimum valid length (used only by minseqgen.py);
- RUL: Number of rules (used only by bestseqgen.py);
- LEV: Maximum preference level (used only by bestseqgen.py);
- IND: Number of indifferent attributes (used only by bestseqgen.py).
Each parameter is a dictionary with the keys VAR (list of values) and DEF (default value).
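A minimal sketch of how VAR/DEF dictionaries can drive a set of experiments, assuming a one-factor-at-a-time design: each parameter is varied over its VAR list while all the others stay at their DEF value. The parameter values and the `experiment_configs` helper are hypothetical; the real values are set in each generator's source code, and the generators may combine them differently.

```python
from typing import Dict, Iterator

# Hypothetical parameter values for illustration only; the real VAR/DEF
# values are set in each generator's source code.
PARAMETERS: Dict[str, dict] = {
    "ATT": {"VAR": [8, 10, 12], "DEF": 10},   # number of attributes
    "NSQ": {"VAR": [2, 4, 6], "DEF": 4},      # number of distinct sequences
    "RAN": {"VAR": [10, 20, 30], "DEF": 20},  # temporal range
    "SLI": {"VAR": [1, 5, 10], "DEF": 1},     # slide interval
}


def experiment_configs(params: Dict[str, dict]) -> Iterator[Dict[str, int]]:
    """Vary one parameter at a time over its VAR list; keep the rest at DEF."""
    defaults = {name: p["DEF"] for name, p in params.items()}
    for name, p in params.items():
        for value in p["VAR"]:
            config = dict(defaults)  # copy so each configuration is independent
            config[name] = value
            yield config


for config in experiment_configs(PARAMETERS):
    print(config)
```

This design keeps the number of runs linear in the number of values (12 configurations here) instead of the full cross product, at the cost of not measuring interactions between parameters.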
Command Line
Although StreamPrefGen is composed of many generators, all of them share the same command-line options (gen.py below stands for any of the generator scripts):
gen.py [-h] [-g] [-o] [-r] [-s]
-h, --help Show the help message and exit
-g, --gen Generate files
-o, --output Generate query output
-r, --run Run experiments
-s, --summarize Summarize results
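The shared options can be mirrored with Python's standard `argparse` module, treating each option as an on/off flag. This is a hypothetical sketch of such a parser (`build_parser` is an illustrative name), not the generators' actual code; what each step does internally is defined by each generator.

```python
import argparse


def build_parser(prog: str = "gen.py") -> argparse.ArgumentParser:
    """Hypothetical parser reproducing the shared command-line options."""
    parser = argparse.ArgumentParser(prog=prog)
    # argparse adds -h/--help automatically.
    parser.add_argument("-g", "--gen", action="store_true", help="Generate files")
    parser.add_argument("-o", "--output", action="store_true", help="Generate query output")
    parser.add_argument("-r", "--run", action="store_true", help="Run experiments")
    parser.add_argument("-s", "--summarize", action="store_true", help="Summarize results")
    return parser


# Parse an explicit argument list, e.g. generating files and running experiments.
args = build_parser().parse_args(["-g", "-r"])
enabled = [step for step in ("gen", "output", "run", "summarize") if getattr(args, step)]
print(enabled)  # ['gen', 'run']
```

Because the flags are independent, several steps can be requested in a single invocation.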