Utility Scripts
In addition to the main script, run_experiment, SKLL comes with a number of helpful utility scripts that can be used to prepare feature files and perform other routine tasks. Each is described briefly below.
compute_eval_from_predictions
Compute evaluation metrics from prediction files after you have run an experiment.
Positional Arguments
- examples_file
SKLL input file with labeled examples
- predictions_file
file with predictions from SKLL
- metric_names
metrics to compute
Optional Arguments
- --version
Show program’s version number and exit.
filter_features
Filter feature file to remove (or keep) any instances with the specified IDs or labels. Can also be used to remove/keep feature columns.
Warning
Starting with v2.5 of SKLL, the arguments for filter_features have changed and are no longer backwards compatible with older versions of SKLL. Specifically:
- The input and output files must now be specified with -i and -o respectively.
- --inverse must now be used to invert the filtering command since -i is used to specify the input file.
Required Arguments
- -i, --input
Input feature file (ends in .arff, .csv, .jsonlines, .ndj, or .tsv)
- -o, --output
Output feature file (must have same extension as input file)
Optional Arguments
- -f <feature <feature ...>>, --feature <feature <feature ...>>
A feature in the feature file you would like to keep. If unspecified, no features are removed.
- -I <id <id ...>>, --id <id <id ...>>
An instance ID in the feature file you would like to keep. If unspecified, no instances are removed based on their IDs.
- --inverse
Instead of keeping the listed features and/or examples, remove them.
- --id_col <id_col>
Name of the column which contains the instance IDs in ARFF, CSV, or TSV files. (default: id)
- -L <label <label ...>>, --label <label <label ...>>
A label in the feature file you would like to keep. If unspecified, no instances are removed based on their labels.
- -l <label_col>, --label_col <label_col>
Name of the column which contains the class labels in ARFF, CSV, or TSV files. For ARFF files, this must be the final column to count as the label. (default: y)
- -db, --drop-blanks
Drop all lines/rows that have any blank values. (default: False)
- -rb <replacement>, --replace-blanks-with <replacement>
Specifies a new value with which to replace blank values in all columns in the file. To replace blanks differently in each column, use the SKLL Reader API directly. (default: None)
- -q, --quiet
Suppress printing of "Loading..." messages.
- --version
Show program’s version number and exit.
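As a rough illustration of the v2.5-style arguments, the sketch below assembles a filter_features command line in Python. The file names and the label "spam" are hypothetical; only the flags are taken from the option list above. The command is printed rather than executed so the snippet does not depend on SKLL being installed.

```python
import shlex

# Hypothetical file names and label; the flags are those documented above.
cmd = [
    "filter_features",
    "-i", "train.csv",          # input feature file
    "-o", "train_no_spam.csv",  # output file, same extension as the input
    "-L", "spam",               # label to filter on
    "--inverse",                # remove matching instances instead of keeping them
    "-q",                       # suppress "Loading..." messages
]

# Print a shell-ready version of the command.
print(shlex.join(cmd))
```

To actually run the command, the same list could be passed to subprocess.run(cmd, check=True), assuming the script is on the PATH.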
generate_predictions
Loads a trained model and outputs predictions based on input feature files. Useful if you want to reuse a trained model as part of a larger system without creating configuration files. Offers the following modes of operation:
- For non-probabilistic classification and regression, generate the predictions.
- For probabilistic classification, generate either the most likely labels or the probabilities for each class label.
- For binary probabilistic classification, generate the positive class label only if its probability exceeds the given threshold. The positive class label is either read from the model file or inferred the same way as a SKLL learner would.
Positional Arguments
- model_file
Model file to load and use for generating predictions.
- input_file(s)
One or more feature file(s) (ending in .arff, .csv, .jsonlines, .libsvm, .ndj, or .tsv) (with or without the label column), with the appropriate suffix.
Optional Arguments
- -i <id_col>, --id_col <id_col>
Name of the column which contains the instance IDs in ARFF, CSV, or TSV files. (default: id)
- -l <label_col>, --label_col <label_col>
Name of the column which contains the labels in ARFF, CSV, or TSV files. For ARFF files, this must be the final column to count as the label. (default: y)
- -o <path>, --output_file <path>
Path to output TSV file. If not specified, predictions will be printed to stdout. For probabilistic binary classification, the probability of the positive class will always be in the last column.
- -p, --predict_labels
If the model does probabilistic classification, output the class label with the highest probability instead of the class probabilities.
- -q, --quiet
Suppress printing of "Loading..." messages.
- -t <threshold>, --threshold <threshold>
If the model does binary probabilistic classification, return the positive class label only if it meets/exceeds the given threshold and the other class label otherwise.
- --version
Show program’s version number and exit.
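The thresholding behaviour described for -t can be pictured with a small, self-contained sketch. This is not SKLL's implementation; the function name and the labels "yes" and "no" are hypothetical, and the only rule encoded is the one stated above: return the positive class label when its probability meets or exceeds the threshold, and the other label otherwise.

```python
def label_from_threshold(positive_prob, threshold, positive_label, negative_label):
    """Pick the positive label only if its probability meets/exceeds the threshold."""
    return positive_label if positive_prob >= threshold else negative_label


# Hypothetical positive-class probabilities for three instances.
probs = [0.2, 0.5, 0.91]
labels = [label_from_threshold(p, 0.5, "yes", "no") for p in probs]
print(labels)
```

Note that a probability exactly equal to the threshold yields the positive label in this sketch, matching the "meets/exceeds" wording.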
join_features
Combine multiple feature files into one larger file.
Positional Arguments
- infile ...
Input feature files (ending in .arff, .csv, .jsonlines, .ndj, or .tsv)
- outfile
Output feature file (must have same extension as input file)
Optional Arguments
- -l <label_col>, --label_col <label_col>
Name of the column which contains the labels in ARFF, CSV, or TSV files. For ARFF files, this must be the final column to count as the label. (default: y)
- -q, --quiet
Suppress printing of "Loading..." messages.
- --version
Show program’s version number and exit.
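Conceptually, joining feature files means collecting the features for each instance ID from every input into a single record. The toy sketch below shows that idea on in-memory dictionaries; it is an illustration only, not SKLL's code. The function name and data are hypothetical, and it makes no attempt to handle mismatched IDs, duplicate feature names, or labels, all of which the real script has to deal with in some way.

```python
def join_feature_sets(*feature_sets):
    """Merge several {instance_id: {feature: value}} mappings into one."""
    merged = {}
    for feature_set in feature_sets:
        for instance_id, features in feature_set.items():
            # Start a fresh dict per instance so the inputs are not modified.
            merged.setdefault(instance_id, {}).update(features)
    return merged


# Two hypothetical feature sets describing the same two instances.
length_feats = {"ex1": {"len": 3}, "ex2": {"len": 5}}
caps_feats = {"ex1": {"caps": 1}, "ex2": {"caps": 0}}

joined = join_feature_sets(length_feats, caps_feats)
print(joined)
```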
plot_learning_curves
Generate learning curve plots from a learning curve output TSV file.
Positional Arguments
- tsv_file
Input learning curve TSV output file.
- output_dir
Output directory to store the learning curve plots.
print_model_weights
Prints out the weights of a given trained model. If the model was trained using feature hashing, feature names of the form hashed_feature_XX will be used since the original feature names no longer apply.
Positional Arguments
- model_file
Model file to load.
Optional Arguments
- --k <k>
Number of top features to print (0 for all) (default: 50)
- --sign {positive,negative,all}
Show only positive, only negative, or all weights (default: all)
- --sort_by_labels
Order the features by classes (default: False). Mutually exclusive with the --k option.
- --version
Show program’s version number and exit.
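The interaction of --k and --sign can be sketched as a filter followed by a top-k selection. The version below is a guess at the general idea rather than the script's actual logic: the function name is hypothetical, and ranking by absolute weight is an assumption made for illustration. The only behaviour taken from the option list is that k = 0 means "all" and that sign is one of positive, negative, or all.

```python
def top_weights(weights, k=50, sign="all"):
    """Return up to k (feature, weight) pairs, largest magnitude first."""
    if sign == "positive":
        items = [(f, w) for f, w in weights.items() if w > 0]
    elif sign == "negative":
        items = [(f, w) for f, w in weights.items() if w < 0]
    elif sign == "all":
        items = list(weights.items())
    else:
        raise ValueError(f"unknown sign: {sign!r}")
    # Assumption: rank by absolute value so strong negative weights surface too.
    items.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return items if k == 0 else items[:k]


# Hypothetical feature weights.
weights = {"a": 0.5, "b": -2.0, "c": 1.5, "d": -0.1}
print(top_weights(weights, k=2))
print(top_weights(weights, k=0, sign="negative"))
```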
skll_convert
Convert between .arff, .csv, .jsonlines, .libsvm, and .tsv formats.
Positional Arguments
- infile
Input feature file (ends in .arff, .csv, .jsonlines, .libsvm, .ndj, or .tsv)
- outfile
Output feature file (ends in .arff, .csv, .jsonlines, .libsvm, .ndj, or .tsv)
Optional Arguments
- -l <label_col>, --label_col <label_col>
Name of the column which contains the labels in ARFF, CSV, or TSV files. For ARFF files, this must be the final column to count as the label. (default: y)
- -q, --quiet
Suppress printing of "Loading..." messages.
- --arff_regression
Create ARFF files for regression, not classification.
- --arff_relation ARFF_RELATION
Relation name to use for ARFF file. (default: skll_relation)
- --no_labels
Used to indicate that the input data has no labels.
- --reuse_libsvm_map REUSE_LIBSVM_MAP
If you want to output multiple files that use the same mapping from labels and features to numbers when writing libsvm files, you can specify an existing .libsvm file to reuse the mapping from.
- --version
Show program’s version number and exit.
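One plausible use of --reuse_libsvm_map is converting a training file first and then converting a test file with the same label and feature numbering. The sketch below assembles both command lines in Python; the file names are hypothetical, only the documented arguments are used, and the commands are printed rather than executed.

```python
import shlex

# Hypothetical file names. The first conversion creates the mapping.
train_cmd = ["skll_convert", "train.csv", "train.libsvm"]

# The second conversion reuses the mapping from the file written above.
test_cmd = [
    "skll_convert",
    "--reuse_libsvm_map", "train.libsvm",
    "test.csv", "test.libsvm",
]

for cmd in (train_cmd, test_cmd):
    print(shlex.join(cmd))
```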
summarize_results
Creates an experiment summary TSV file from a list of JSON files generated by run_experiment.
Positional Arguments
- summary_file
TSV file to store summary of results.
- json_file
JSON results file generated by run_experiment.
Optional Arguments
- -a, --ablation
The results files are from an ablation run.
- --version
Show program’s version number and exit.