QFARM is a research tool for range-based rule mining, combining DFS rule-tree exploration with multi-objective genetic algorithms (Jenetics-based). It discovers high-quality rules of the form:
The system evaluates rules using multiple fitness metrics, evolves Pareto fronts, and outputs DOT graphs and JSON logs describing the search.
From the project root:
./gradlew shadowJar
The runnable JAR is generated at:
build/libs/qfarm-<version>.jar
Rebuild the JAR whenever you modify the source code.
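Because the JAR file name embeds a version, a small shell helper can save retyping it after each rebuild. The following is a minimal sketch and not part of QFARM: the function name latest_jar is hypothetical, and it assumes only the build/libs/qfarm-<version>.jar naming described above.

```shell
# Hypothetical helper (not shipped with QFARM): print the most recently
# modified qfarm-*.jar in the given directory (default: build/libs).
# Prints nothing if no matching JAR exists.
latest_jar() {
  dir=${1:-build/libs}
  # ls -t sorts by modification time, newest first
  ls -t "$dir"/qfarm-*.jar 2>/dev/null | head -n 1
}
```

It could then be used as: java -jar "$(latest_jar)" search --data data.csv --rhs y --rhs-range 4.0,MAX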
QFARM now uses a multi-command CLI. The general syntax is:
java -jar qfarm.jar <command> [options]
Available commands:
- search: run the rule-mining algorithm
- validate: run validation procedures on a dataset (initial implementation)
Run the main rule-mining algorithm.
java -jar qfarm.jar search \
--data <path/to.csv> \
--rhs <column_name> \
[--rhs-range <lo,hi>] \
[--rhs-range-percentile <pLo,pHi>] \
[optional hyperparameters...]
--data
Path to CSV dataset.
--rhs
Column name of the right-hand-side attribute.
Exactly one of:
--rhs-range
--rhs-range-percentile
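The "exactly one of" constraint can be checked in a wrapper script before the JVM is even started. This is a minimal sketch under stated assumptions, not QFARM's own validation: the function name check_rhs_flags is hypothetical, and it assumes flags are passed as separate words (as in the examples in this document), not in a --flag=value form.

```shell
# Hypothetical pre-flight check (not part of QFARM): succeed only if
# exactly one of --rhs-range / --rhs-range-percentile is among the arguments.
check_rhs_flags() {
  n=0
  for arg in "$@"; do
    case $arg in
      --rhs-range|--rhs-range-percentile) n=$((n + 1)) ;;
    esac
  done
  [ "$n" -eq 1 ]
}
```

A wrapper might call check_rhs_flags "$@" and print a usage message when it fails, instead of forwarding an ambiguous command line.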
java -jar qfarm.jar search \
--data data.csv \
--rhs y \
--rhs-range-percentile 90,100
java -jar qfarm.jar search \
--data data.csv \
--rhs y \
--rhs-range 4.0,MAX
Both --rhs-range and --rhs-range-percentile accept:
lo,hi
lo..hi
MIN,MAX
MIN,6.0
4.0,MAX
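To make the accepted forms concrete, the sketch below normalizes a range argument into a "lo hi" pair. It is an illustration only and does not reproduce QFARM's parser: the function name normalize_range is hypothetical, optional square brackets are stripped on the assumption that they are purely decorative, and the MIN and MAX keywords are passed through unchanged.

```shell
# Hypothetical helper (not QFARM's parser): normalize "lo,hi", "lo..hi",
# or a bracketed variant such as "[lo,hi]" into "lo hi".
normalize_range() {
  spec=$1
  spec=${spec#\[}                                  # drop optional leading [
  spec=${spec%\]}                                  # drop optional trailing ]
  spec=$(printf '%s' "$spec" | sed 's/\.\./,/')    # treat lo..hi like lo,hi
  case $spec in
    *,*) ;;                                        # has a separator: continue
    *) echo "invalid range: $1" >&2; return 1 ;;
  esac
  lo=${spec%%,*}                                   # text before the first comma
  hi=${spec#*,}                                    # text after the first comma
  if [ -z "$lo" ] || [ -z "$hi" ]; then
    echo "invalid range: $1" >&2
    return 1
  fi
  printf '%s %s\n' "$lo" "$hi"
}
```

For example, normalize_range "4.0..7.0" and normalize_range "4.0,7.0" both print the same pair, which is why the two spellings can be used interchangeably on the command line.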
--rhs-range-percentile "[90,100]"
All hyperparameters can be overridden through CLI flags.
Any parameter not provided falls back to defaults defined in HyperParameters.
--min-support (default: 100)
Minimum number of records that must satisfy the rule.
--max-support (default: 5000)
Maximum number of records a rule can cover.
--max-depth (default: 2)
Maximum number of attributes in the antecedent (rule length).
--max-children (default: 1)
Maximum number of children per internal node in the rule tree.
--max-first-children (default: 1)
Maximum number of children for the root node.
--evo-cheap-pop
Population size used in the cheap (initial) evolution phase.
--evo-cheap-gen
Number of generations for the cheap evolution phase.
--evo-full-pop
Population size used in the full evolution phase.
--evo-full-gen
Number of generations for the full evolution phase.
--prob-mutation (default: 1.0)
Probability of applying mutation to a gene during evolution.
--std-mutation (default: 0.15)
Standard deviation controlling mutation magnitude.
--alpha-threshold
Statistical significance threshold (e.g., for p-value filtering).
--excl-cols (default: [])
Comma-separated list of column names to exclude from the dataset before rule mining.
Example:
--excl-cols ID,Timestamp
--name (default: auto-generated)
Optional run name / experiment label.
Used for logging, plots, output directories, and DOT URLs.
Example:
--name experiment_1
java -jar qfarm.jar search \
--data data.csv \
--rhs y \
--rhs-range 4.0,MAX \
--max-depth 3 \
--max-children 2 \
--max-first-children 10 \
--evo-cheap-pop 100 \
--evo-cheap-gen 100 \
--evo-full-pop 500 \
--evo-full-gen 500 \
--prob-mutation 0.75 \
--std-mutation 0.02 \
--min-support 5 \
--max-support 500 \
--alpha-threshold 0.05
Run validation procedures on a dataset.
java -jar qfarm.jar validate \
--data <path/to.csv> \
--rules <path/to.jsonl>
--data (required)
Path to dataset used for validation.
--rules (required)
Path to the JSONL file containing rules generated from a previous run.
java -jar build/libs/qfarm-0.1.build.jar validate \
--data data.csv \
--rules /path/to/previous_run/log.jsonl
- The CLI is built using Clikt, enabling structured subcommands.
- Commands are independent and can evolve separately.
- Future versions will expand validate to support rule evaluation and metrics.
Both search and validate commands produce a full set of result files inside a run-specific directory:
results/<run_name>/
├── validation_summary.txt (only for validate)
├── final_rules_summary.txt
├── final_rules_table.csv
├── representative_rules.txt
├── log.jsonl
├── full_tree.dot
├── full_tree.svg
└── front_plots/
- validation_summary.txt
  Produced only by the validate command. Main validation report. Includes:
  - ROC p-values
  - KS p-values
  - Failure reasons (MISSING, KS_FAIL, ROC_FAIL, PARENT_FAIL)
  - Visual rule plots
  - Comparison with previous run (for KS failures)
- final_rules_summary.txt
  Summary of discovered fronts (attribute combinations).
- final_rules_table.csv
  Tabular export of fronts (attribute combinations) and their metrics.
- representative_rules.txt
  Selected subset of representative rules from final fronts.
- log.jsonl
  NDJSON log with detailed step-by-step execution (serves as input for the validation procedure).
- full_tree.dot
  GraphViz representation of the rule tree.
- full_tree.svg
  Rendered tree visualization (generated automatically if GraphViz is available).
- front_plots/
  HTML files with Pareto front visualizations for each rule.
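After a run, it can be handy to confirm that the expected files are present and to render the tree manually if the SVG was not produced. The sketch below is not part of QFARM: the function names check_run_dir and render_tree are hypothetical, the file list is taken from the layout above (omitting full_tree.svg and validation_summary.txt, which are conditional), and render_tree assumes the Graphviz dot executable is on the PATH.

```shell
# Hypothetical helper (not part of QFARM): report which of the always-expected
# outputs are missing from a run directory. Returns 0 if none are missing.
check_run_dir() {
  run_dir=$1
  missing=0
  for f in final_rules_summary.txt final_rules_table.csv \
           representative_rules.txt log.jsonl full_tree.dot; do
    if [ ! -f "$run_dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  if [ ! -d "$run_dir/front_plots" ]; then
    echo "missing: front_plots/"
    missing=1
  fi
  return "$missing"
}

# Hypothetical helper: render full_tree.dot to SVG with Graphviz, if installed.
render_tree() {
  run_dir=$1
  if command -v dot >/dev/null 2>&1; then
    dot -Tsvg "$run_dir/full_tree.dot" -o "$run_dir/full_tree.svg"
  else
    echo "Graphviz 'dot' not found; skipping SVG rendering" >&2
    return 1
  fi
}
```

For a run named experiment_1, this would be invoked as check_run_dir results/experiment_1, followed by render_tree results/experiment_1 if full_tree.svg is absent.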
To run without rebuilding the JAR each time, add this to build.gradle.kts:
application {
mainClass = "MainKt"
}
Run with:
./gradlew run --args="search --data data.csv --rhs LDL --rhs-range 4.0..7.0"
- Fork this repository
- Create a feature branch:
  git checkout -b feature/my-feature
- Commit your work
- Push your branch:
  git push origin feature/my-feature
- Open a pull request
...