Running Experiments

The simplest way to use SKLL is to create configuration files that describe experiments you would like to run on pre-generated features. This document describes the supported feature file formats, how to create configuration files (and lay out your directories), and how to use run_experiment to get things going.

Quick Example

If you don’t want to read the whole document, and just want an example of how things work, do the following from the command prompt:

$ cd examples
$ python          # download a simple dataset
$ cd iris
$ run_experiment --local evaluate.cfg        # run an experiment

Feature file formats

The following feature file formats are supported:


ARFF (.arff)

The same file format used by Weka with the following added restrictions:

  • Only simple numeric, string, and nominal values are supported.
  • Nominal values are converted to strings.
  • If the data has instance IDs, there should be an attribute with the name specified by id_col in the Input section of the configuration file you create for your experiment. This defaults to id. If there is no such attribute, IDs will be generated automatically.
  • If the data is labelled, there must be an attribute with the name specified by label_col in the Input section of the configuration file you create for your experiment. This defaults to y. This must also be the final attribute listed (like in Weka).


CSV/TSV (.csv, .tsv)

A simple comma- or tab-delimited format with the following restrictions:

  • If the data is labelled, there must be a column with the name specified by label_col in the Input section of the configuration file you create for your experiment. This defaults to y.
  • If the data has instance IDs, there should be a column with the name specified by id_col in the Input section of the configuration file you create for your experiment. This defaults to id. If there is no such column, IDs will be generated automatically.
  • All other columns contain feature values, and every feature value must be specified (making this a poor choice for sparse data).
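A minimal sketch of how these CSV/TSV rules play out, using only Python's csv module. read_delimited is a hypothetical helper, not SKLL's reader; it assumes every feature value is numeric, uses the default id and y column names, and invents an EXAMPLE_<row> ID scheme purely for illustration.

```python
import csv
import io


def read_delimited(text, delimiter=",", id_col="id", label_col="y"):
    """Hypothetical helper: split a CSV/TSV table into IDs, labels, and features."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    ids, labels, features = [], [], []
    for row_num, row in enumerate(reader):
        # Generate an ID when the ID column is absent (ID format is illustrative)
        ids.append(row.pop(id_col) if id_col in row else f"EXAMPLE_{row_num}")
        # A missing label column means the data is unlabelled
        labels.append(row.pop(label_col) if label_col in row else None)
        # Every remaining column is a feature and must have a value
        features.append({name: float(value) for name, value in row.items()})
    return ids, labels, features


data = "id,f1,f2,y\nEX1,0.5,1,cat\nEX2,2,0,dog\n"
ids, labels, features = read_delimited(data)
```

Because every column must be filled in for every row, zeros have to be written out explicitly, which is why this format is a poor fit for sparse data.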


LibSVM (.libsvm)

While we can process the standard input file format supported by LibSVM, LibLinear, and SVMLight, we also support specifying extra metadata, usually missing from the format, in comments at the end of each line. The comments are not mandatory, but without them, your labels and features will not have names. The comment is structured as follows:

ID | 1=ClassX | 1=FeatureA 2=FeatureB

The entire format would look like this:

2 1:2.0 3:8.1 # Example1 | 2=ClassY | 1=FeatureA 3=FeatureC
1 5:7.0 6:19.1 # Example2 | 1=ClassX | 5=FeatureE 6=FeatureF


IDs, labels, and feature names cannot contain the following characters: | # =
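The comment layout can be illustrated with a small parser. parse_libsvm_line is a hypothetical function written for this sketch (it is not part of SKLL); it also shows why the three characters are reserved: # starts the comment, | separates its three fields, and = ties an index to a name.

```python
def parse_libsvm_line(line):
    """Hypothetical parser for one LibSVM line with an optional metadata comment."""
    data, _, comment = line.partition("#")
    tokens = data.split()
    label_index = tokens[0]
    values = {}
    for pair in tokens[1:]:
        index, value = pair.split(":")
        values[index] = float(value)
    if not comment.strip():
        # Without the comment, labels and features keep their numeric indices
        return None, label_index, values
    example_id, label_map, feature_map = (part.strip() for part in comment.split("|"))
    label_names = dict(entry.split("=") for entry in label_map.split())
    feature_names = dict(entry.split("=") for entry in feature_map.split())
    named = {feature_names.get(i, i): v for i, v in values.items()}
    return example_id, label_names.get(label_index, label_index), named


parsed = parse_libsvm_line("2 1:2.0 3:8.1 # Example1 | 2=ClassY | 1=FeatureA 3=FeatureC")
```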


MegaM (.megam)

An expanded form of the input format for the MegaM classification package with the -fvals switch.

The basic format is:

# Instance1
CLASS1    F0 2.5 F1 3 FEATURE_2 -152000
# Instance2
CLASS2    F1 7.524

where the optional comments before each instance specify the ID for the following line, class names are separated from feature-value pairs with a tab, and feature-value pairs are separated by spaces. Any omitted features for a given instance are assumed to be zero, so this format is handy when dealing with sparse data. We also include several utility scripts for converting to/from this MegaM format and for adding/removing features from the files.
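A rough parser for this layout makes the rules explicit: a # line names the next instance, a tab separates the label from the features, and feature names alternate with their values. parse_megam is hypothetical; it simply leaves the ID as None when no comment precedes an instance.

```python
def parse_megam(text):
    """Hypothetical parser for the MegaM -fvals style format described above."""
    examples = []
    pending_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A comment line names the ID of the next instance
            pending_id = line[1:].strip()
            continue
        # The label is separated from the feature-value pairs by a tab
        label, _, rest = line.partition("\t")
        tokens = rest.split()
        # Feature names and values alternate, separated by spaces
        features = {name: float(value) for name, value in zip(tokens[::2], tokens[1::2])}
        examples.append((pending_id, label, features))
        pending_id = None
    return examples


sample = "# Instance1\nCLASS1\tF0 2.5 F1 3 FEATURE_2 -152000\n# Instance2\nCLASS2\tF1 7.524\n"
result = parse_megam(sample)
```

Omitted features are simply absent from each dictionary, so a lookup such as features.get("F0", 0.0) treats them as zero.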

Creating configuration files

The experiment configuration files that run_experiment accepts are standard Python configuration files that are similar in format to Windows INI files. [1] There are four expected sections in a configuration file: General, Input, Tuning, and Output. A detailed description of each possible setting in each section is provided below, but to summarize:

  • If you want to do cross-validation, specify a path to training feature files, and set task to cross_validate. Please note that the cross-validation currently uses StratifiedKFold. You also can optionally use predetermined folds with the folds_file setting.


    When using classifiers, SKLL will automatically reduce the number of cross-validation folds to be the same as the minimum number of examples for any of the classes in the training data.

  • If you want to train a model and evaluate it on some data, specify a training location, a test location, and a directory to store results, and set task to evaluate.
  • If you want to just train a model and generate predictions, specify a training location, a test location, and set task to predict.
  • If you want to just train a model, specify a training location, and set task to train.
  • If you want to generate a learning curve for your data, specify a training location and set task to learning_curve. The learning curve is generated using essentially the same underlying process as in scikit-learn except that the SKLL feature pre-processing pipeline is used while training the various models and computing the scores.


    Ideally, one would first do cross-validation experiments with grid search and/or ablation and get a well-performing set of features and hyper-parameters for a set of learners. Then, one would explicitly specify those features (via featuresets) and hyper-parameters (via fixed_parameters) in the config file for the learning curve and explore the impact of the size of the training data.
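Putting the summary above together, a cross-validation configuration might look roughly like the string below. The sketch parses it with Python's configparser only to show the four-section layout; the experiment name, paths, and learner choices are placeholders, and decoding list-valued settings with ast.literal_eval is an assumption made for illustration, not a description of how SKLL itself reads them.

```python
import ast
import configparser

# Hypothetical cross-validation config; names and paths are placeholders
CONFIG_TEXT = """
[General]
experiment_name = Iris_CV
task = cross_validate

[Input]
train_directory = train
featuresets = [["example_iris_features"]]
learners = ["LogisticRegression", "SVC"]
suffix = .jsonlines

[Tuning]
grid_search = true
objectives = ['f1_score_micro']

[Output]
results = output
log = output
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

# List-valued settings are written as list literals; decode them for inspection
learners = ast.literal_eval(config["Input"]["learners"])
featuresets = ast.literal_eval(config["Input"]["featuresets"])
```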

Example configuration files are available here.


General

Both fields in the General section are required.

experiment_name

A string used to identify this particular experiment configuration. When generating result summary files, this name helps prevent overwriting previous summaries.

task

What type of experiment we’re trying to run. Valid options are: cross_validate, evaluate, predict, train, learning_curve.


Input

The Input section has only one required field, learners, but it must also contain either train_file or train_directory.

learners

List of scikit-learn models to try. A separate job will be run for each combination of learner and featureset. Acceptable values are described below. Custom learners can also be specified; see custom_learner_path.



For all regressors you can also prepend Rescaled to the beginning of the full name (e.g., RescaledSVR) to get a version of the regressor where predictions are rescaled and constrained to better match the training set.

train_file (Optional)

Path to a file containing the features to train on. Cannot be used in combination with featuresets, train_directory, or test_directory.


If train_file is not specified, train_directory must be.

train_directory (Optional)

Path to directory containing training data files. There must be a file for each featureset. Cannot be used in combination with train_file or test_file.


If train_directory is not specified, train_file must be.

test_file (Optional)

Path to a file containing the features to test on. Cannot be used in combination with featuresets, train_directory, or test_directory.

test_directory (Optional)

Path to directory containing test data files. There must be a file for each featureset. Cannot be used in combination with train_file or test_file.

featuresets (Optional)

List of lists of prefixes for the files containing the features you would like to train/test on. Each list will end up being a job. IDs are required to be the same in all of the feature files, and a ValueError will be raised if this is not the case. Cannot be used in combination with train_file or test_file.


If specifying train_directory or test_directory, featuresets is required.
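Conceptually, the feature files named in one featureset list are joined on their IDs. The sketch below models each file as a {id: {feature: value}} dictionary and raises a ValueError when the ID sets differ, as described above. combine_feature_files is hypothetical; how SKLL treats duplicate feature names across files is not covered here, and this sketch simply lets later files win.

```python
def combine_feature_files(feature_files):
    """Hypothetical merge of several {id: {feature: value}} mappings into one featureset."""
    id_sets = [set(f) for f in feature_files]
    # Every file in the featureset must describe exactly the same examples
    if any(ids != id_sets[0] for ids in id_sets[1:]):
        raise ValueError("IDs must be the same in all feature files of a featureset")
    combined = {}
    for example_id in sorted(id_sets[0]):
        merged = {}
        for f in feature_files:
            # Later files overwrite duplicate feature names (a simplification)
            merged.update(f[example_id])
        combined[example_id] = merged
    return combined


file_a = {"EX1": {"f1": 1.0}, "EX2": {"f1": 0.0}}
file_b = {"EX1": {"g1": 2.0}, "EX2": {"g1": 3.0}}
merged = combine_feature_files([file_a, file_b])
```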

suffix (Optional)

The file format the training/test files are in. Valid options are .arff, .csv, .jsonlines, .libsvm, .megam, .ndj, and .tsv.

If you omit this field, it is assumed that the “prefixes” listed in featuresets are actually complete filenames. This can be useful if you have feature files that are all in different formats that you would like to combine.

id_col (Optional)

If you’re using ARFF, CSV, or TSV files, the IDs for each instance are assumed to be in a column with this name. If no column with this name is found, the IDs are generated automatically. Defaults to id.

label_col (Optional)

If you’re using ARFF, CSV, or TSV files, the class labels for each instance are assumed to be in a column with this name. If no column with this name is found, the data is assumed to be unlabelled. Defaults to y. For ARFF files only, this must also be the final column to count as the label (for compatibility with Weka).

ids_to_floats (Optional)

If you have a dataset with lots of examples, and your input files have IDs that look like numbers (can be converted by float()), then setting this to True will save you some memory by storing IDs as floats. Note that this will cause IDs to be printed as floats in prediction files (e.g., 4.0 instead of 4 or 0004 or 4.000).

shuffle (Optional)

If True, shuffle the examples in the training data before using them for learning. This happens automatically when doing a grid search but it might be useful in other scenarios as well, e.g., online learning. Defaults to False.

class_map (Optional)

If you would like to collapse several labels into one, or otherwise modify your labels (without modifying your original feature files), you can specify a dictionary mapping from new class labels to lists of original class labels. For example, if you wanted to collapse the labels beagle and dachshund into a dog class, you would specify the following for class_map:

{'dog': ['beagle', 'dachshund']}

Any labels not included in the dictionary will be left untouched.
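Applying class_map amounts to inverting the dictionary so that each original label points at its new label. apply_class_map is a hypothetical helper illustrating that behavior; labels absent from the mapping pass through unchanged.

```python
def apply_class_map(labels, class_map):
    """Hypothetical helper: relabel using a {new_label: [old_labels]} mapping."""
    # Invert the mapping so each original label points at its new label
    old_to_new = {old: new for new, olds in class_map.items() for old in olds}
    # Labels missing from the mapping pass through unchanged
    return [old_to_new.get(label, label) for label in labels]


class_map = {"dog": ["beagle", "dachshund"]}
relabelled = apply_class_map(["beagle", "cat", "dachshund"], class_map)
```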

One other use case for class_map is to deal with classification labels that would otherwise be converted to float improperly. All Reader sub-classes use the safe_float function internally to read labels. This function tries to convert a single label first to int, then to float. If neither conversion is possible, the label remains a str. Thus, care must be taken to ensure that labels do not get converted in unexpected ways. For example, consider the situation where there are classification labels that are a mixture of int-converting and float-converting labels:

import numpy as np
from skll.data.readers import safe_float
np.array([safe_float(x) for x in ["2", "2.2", "2.21"]]) # array([2.  , 2.2 , 2.21])

The labels will all be converted to floats and any classification model generated with this data will predict labels such as 2.0, 2.2, etc., not str values that exactly match the input labels, as might be expected. class_map could be used to map the original labels to new values that do not have the same characteristics.

num_cv_folds (Optional)

The number of folds to use for cross validation. Defaults to 10.

random_folds (Optional)

Whether to use random folds for cross-validation. Defaults to False.

folds_file (Optional)

Path to a csv file specifying the mapping of instances in the training data to folds. This can be specified when the task is either train or cross_validate. For the train task, if grid_search is True, this file, if specified, will be used to define the cross-validation used for the grid search (leave one fold ID out at a time). Otherwise, it will be ignored.

For the cross_validate task, this file will be used to define the outer cross-validation loop and, if grid_search is True, also for the inner grid-search cross-validation loop. If the goal of specifying the folds file is to ensure that the model does not learn to differentiate based on a confound (e.g., the data from the same person is always in the same fold), it makes sense to keep the same folds for both the outer and the inner cross-validation loops.

However, sometimes the goal of specifying the folds file is simply for the purpose of comparison to another existing experiment or another context in which maintaining the constitution of the folds in the inner grid-search loop is not required. In this case, users may set the parameter use_folds_file_for_grid_search to False which will then direct the inner grid-search cross-validation loop to simply use the number specified via grid_search_folds instead of using the folds file. This will likely lead to shorter execution times as well depending on how many folds are in the folds file and the value of grid_search_folds.

The format of this file must be as follows: the first row must be a header. This header row is ignored, so it doesn’t matter what the header row contains, but it must be there. If there is no header row, whatever row is in its place will be ignored. The first column should consist of training set IDs and the second should be a string for the fold ID (e.g., 1 through 5, A through D, etc.). If specified, the CV and grid search will leave one fold ID out at a time. [2]
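A sketch of reading such a file and producing leave-one-fold-out splits. read_folds_file and leave_one_fold_out are hypothetical helpers; scikit-learn's LeaveOneGroupOut implements the same splitting strategy if you would rather not write it yourself.

```python
import csv
import io


def read_folds_file(text):
    """Hypothetical reader: map training IDs to fold IDs, skipping the header row."""
    rows = csv.reader(io.StringIO(text))
    next(rows)  # the first row is always treated as a header and ignored
    return {row[0]: row[1] for row in rows if row}


def leave_one_fold_out(fold_of):
    """Yield (held_out_fold, train_ids, test_ids), one split per distinct fold ID."""
    for held_out in sorted(set(fold_of.values())):
        train_ids = [i for i, f in fold_of.items() if f != held_out]
        test_ids = [i for i, f in fold_of.items() if f == held_out]
        yield held_out, train_ids, test_ids


folds = read_folds_file("id,fold\nEX1,A\nEX2,B\nEX3,A\nEX4,C\n")
splits = list(leave_one_fold_out(folds))
```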

learning_curve_cv_folds_list (Optional)

List of integers specifying the number of folds to use for cross-validation at each point of the learning curve (training size), one per learner. For example, if you specify the following learners: ["SVC", "LogisticRegression"], specifying [10, 100] as the value of learning_curve_cv_folds_list will tell SKLL to use 10 cross-validation folds at each point of the SVC curve and 100 cross-validation folds at each point of the logistic regression curve. Although more folds will generally yield more reliable results, a smaller number of folds may be better for learners that are slow to train. Defaults to 10 for each learner.

learning_curve_train_sizes (Optional)

List of floats or integers representing, respectively, relative or absolute numbers of training examples that will be used to generate the learning curve. If the type is float, it is regarded as a fraction of the maximum size of the training set (which is determined by the selected validation method), i.e., it has to be within (0, 1]. Otherwise, it is interpreted as the absolute size of the training set. Note that for classification the number of samples usually has to be large enough to contain at least one sample from each class. Defaults to [0.1, 0.325, 0.55, 0.775, 1.0].
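A sketch of how fractional sizes could be turned into example counts. absolute_train_sizes is hypothetical; truncating to an integer with a floor of one example resembles what scikit-learn's learning_curve does, but treat the exact rounding as an assumption.

```python
def absolute_train_sizes(train_sizes, n_max):
    """Hypothetical helper: turn fractions in (0, 1] into example counts."""
    sizes = []
    for size in train_sizes:
        if isinstance(size, float):
            if not 0.0 < size <= 1.0:
                raise ValueError("fractional training sizes must be in (0, 1]")
            # Truncate to an int, but never go below one example
            sizes.append(max(1, int(size * n_max)))
        else:
            # Integers are already absolute sizes
            sizes.append(size)
    return sizes


defaults = [0.1, 0.325, 0.55, 0.775, 1.0]
```

With a maximum training set of 100 examples, the defaults translate to roughly 10, 32, 55, 77, and 100 examples.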

custom_learner_path (Optional)

Path to a .py file that defines a custom learner. This file will be imported dynamically. This is only required if a custom learner is specified in the list of learners.

All custom learners must implement the fit and predict methods. Custom classifiers must either (a) inherit from an existing scikit-learn classifier, or (b) inherit from both sklearn.base.BaseEstimator and sklearn.base.ClassifierMixin.

Similarly, custom regressors must either (a) inherit from an existing scikit-learn regressor, or (b) inherit from both sklearn.base.BaseEstimator and sklearn.base.RegressorMixin.

Learners that require dense matrices should implement a method requires_dense that returns True.
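A minimal custom classifier following option (b) might look like the sketch below. MajorityClassClassifier is a hypothetical example that always predicts the most frequent training label; the mixin is listed before BaseEstimator, the order scikit-learn recommends.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin


class MajorityClassClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical custom learner that always predicts the most frequent training label."""

    def fit(self, X, y):
        # Count each label and remember the most frequent one
        labels, counts = np.unique(y, return_counts=True)
        self.classes_ = labels
        self.majority_ = labels[np.argmax(counts)]
        return self

    def predict(self, X):
        # One prediction per row; X.shape also works for scipy sparse matrices
        return np.full(X.shape[0], self.majority_)
```

To use it, you would save the class in a .py file (majority.py, say, a hypothetical name), point custom_learner_path at that file, and list MajorityClassClassifier in learners. It works with sparse input, so it does not define requires_dense.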

sampler (Optional)

Performs a non-linear transformation of the input, which can serve as a basis for linear classification or other algorithms. Valid options are: Nystroem, RBFSampler, SkewedChi2Sampler, and AdditiveChi2Sampler. For additional information, see the scikit-learn documentation.

sampler_parameters (Optional)

dict containing parameters you want to have fixed for the sampler. Any empty ones will be ignored (and the defaults will be used).

The default fixed parameters (beyond those that scikit-learn sets) are:

Nystroem
{'random_state': 123456789}
RBFSampler
{'random_state': 123456789}
SkewedChi2Sampler
{'random_state': 123456789}

feature_hasher (Optional)

If “true”, this enables a high-speed, low-memory vectorizer that uses feature hashing for converting feature dictionaries into NumPy arrays instead of using a DictVectorizer. This flag will drastically reduce memory consumption for data sets with a large number of features. If enabled, the user should also specify the number of features in the hasher_features field. For additional information see the scikit-learn documentation.

hasher_features (Optional)

The number of features used by the FeatureHasher if the feature_hasher flag is enabled.


To reduce the chance of collisions, you should always use a power of two larger than the number of features in the data set for this setting. For example, if you had 17 features, you would want to set hasher_features to 32.
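Picking that value is a one-liner: the power of two strictly greater than the feature count. hasher_size is a hypothetical helper; its result is what you would put in hasher_features (the corresponding parameter of scikit-learn's FeatureHasher is n_features).

```python
def hasher_size(num_features):
    """Hypothetical helper: smallest power of two strictly larger than num_features."""
    # bit_length() is the number of bits needed, so shifting 1 by it overshoots n
    return 1 << num_features.bit_length()
```

For 17 features this yields 32, matching the example above.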

featureset_names (Optional)

Optional list of names for the feature sets. If omitted, then the prefixes will be munged together to make names.

fixed_parameters (Optional)

List of dicts containing parameters you want to have fixed for each learner in the learners list. Any empty ones will be ignored (and the defaults will be used). If grid_search is True, there is a potential for conflict with specified/default parameter grids and fixed parameters.

The default fixed parameters (beyond those that scikit-learn sets) are:

AdaBoostClassifier and AdaBoostRegressor
{'n_estimators': 500, 'random_state': 123456789}
DecisionTreeClassifier and DecisionTreeRegressor
{'random_state': 123456789}
{'random_state': 123456789}
{'random_state': 123456789}
GradientBoostingClassifier and GradientBoostingRegressor
{'n_estimators': 500, 'random_state': 123456789}
{'random_state': 123456789}
LinearSVC and LinearSVR
{'random_state': 123456789}
{'random_state': 123456789}
MLPClassifier and MLPRegressor:
{'learning_rate': 'invscaling', 'max_iter': 500}
RandomForestClassifier and RandomForestRegressor
{'n_estimators': 500, 'random_state': 123456789}
{'loss': 'squared_loss', 'random_state': 123456789}
Ridge and RidgeClassifier
{'random_state': 123456789}
{'cache_size': 1000}
{'loss': 'log', 'random_state': 123456789}
{'random_state': 123456789}
{'random_state': 123456789}


This option allows us to deal with imbalanced data sets by using the parameter class_weight for the classifiers: DecisionTreeClassifier, LogisticRegression, LinearSVC, RandomForestClassifier, RidgeClassifier, SGDClassifier, and SVC.

Two possible options are available. The first one is balanced, which automatically adjusts weights inversely proportional to class frequencies, as shown in the following code:

{'class_weight': 'balanced'}

The second option allows you to assign a specific weight to each class. The default weight per class is 1. For example:

{'class_weight': {1: 10}}

Additional examples and information can be seen here.
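scikit-learn documents the balanced mode as n_samples / (n_classes * np.bincount(y)). The pure-Python sketch below reproduces that formula so the resulting numbers are easy to inspect; balanced_class_weights is a hypothetical helper.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: weights inversely proportional to class frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    # n_samples / (n_classes * count): rarer classes get larger weights
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


weights = balanced_class_weights([0, 0, 0, 1])
```

Here the minority class 1 receives a weight of 2.0, while the majority class 0 receives about 0.67.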

feature_scaling (Optional)

Whether to scale features by their mean and/or their standard deviation. If you scale by mean, your data will automatically be converted to dense, so use caution when you have a very large dataset. Valid options are:

none: Perform no feature scaling at all.
with_std: Scale feature values by their standard deviation.
with_mean: Center features by subtracting their mean.
both: Perform both centering and scaling.

Defaults to none.
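The four modes correspond to the with_mean and with_std switches of scikit-learn's StandardScaler. The sketch below applies them to a single feature column using the population standard deviation; scale_column is hypothetical. It also shows why centering forces dense data: subtracting the mean turns every zero into a nonzero value.

```python
from statistics import mean, pstdev


def scale_column(values, mode):
    """Hypothetical sketch of the four feature_scaling modes on one feature column."""
    if mode == "none":
        return list(values)
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against constant columns
    if mode == "with_std":
        return [v / sigma for v in values]
    if mode == "with_mean":
        return [v - mu for v in values]
    if mode == "both":
        return [(v - mu) / sigma for v in values]
    raise ValueError(f"unknown feature_scaling mode: {mode}")
```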


Tuning

grid_search (Optional)

Whether or not to perform grid search to find optimal parameters for the learner. Defaults to False. Note that for the learning_curve task, grid search is not allowed and setting it to True will generate a warning and be ignored.

grid_search_folds (Optional)

The number of folds to use for grid search. Defaults to 3.

grid_search_jobs (Optional)

Number of folds to run in parallel when using grid search. Defaults to the number of grid search folds.

use_folds_file_for_grid_search (Optional)

Whether to use the specified folds_file for the inner grid-search cross-validation loop when task is set to cross_validate. Defaults to True.


This flag is ignored for all other tasks, including the train task where a specified folds_file is always used for the grid search.

min_feature_count (Optional)

The minimum number of examples for which the value of a feature must be nonzero to be included in the model. Defaults to 1.
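The filter can be sketched by counting, for each feature, how many examples have a nonzero value for it. filter_rare_features is a hypothetical helper operating on a list of feature dictionaries; note that even with the default of 1, a feature that is zero everywhere is dropped.

```python
from collections import Counter


def filter_rare_features(examples, min_feature_count=1):
    """Hypothetical helper: drop features nonzero in fewer than min_feature_count examples."""
    nonzero_counts = Counter(
        name for features in examples for name, value in features.items() if value != 0
    )
    keep = {name for name, count in nonzero_counts.items() if count >= min_feature_count}
    return [{n: v for n, v in features.items() if n in keep} for features in examples]


examples = [{"a": 1, "b": 0}, {"a": 2, "c": 5}, {"b": 0, "c": 1}]
```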

objectives (Optional)

The objective functions to use for tuning. This is a list of one or more objective functions. Valid options are:


Classification:

  • accuracy: Overall accuracy
  • precision: Precision
  • recall: Recall
  • f1: The default scikit-learn F1 score (F1 of the positive class for binary classification, or the weighted average F1 for multiclass classification)
  • f1_score_micro: Micro-averaged F1 score
  • f1_score_macro: Macro-averaged F1 score
  • f1_score_weighted: Weighted average F1 score
  • f1_score_least_frequent: F1 score of the least frequent class. The least frequent class may vary from fold to fold for certain data distributions.
  • neg_log_loss: The negative of the classification log loss. Since scikit-learn recommends using negated loss functions as scorer functions, SKLL does the same for the sake of consistency. To use this as the objective, probability must be set to True.
  • average_precision: Area under PR curve (for binary classification)
  • roc_auc: Area under ROC curve (for binary classification)

Regression or classification with integer labels:

  • unweighted_kappa: Unweighted Cohen’s kappa (any floating point values are rounded to ints)
  • linear_weighted_kappa: Linear weighted kappa (any floating point values are rounded to ints)
  • quadratic_weighted_kappa: Quadratic weighted kappa (any floating point values are rounded to ints)
  • uwk_off_by_one: Same as unweighted_kappa, but all ranking differences are discounted by one. In other words, a ranking of 1 and a ranking of 2 would be considered equal.
  • lwk_off_by_one: Same as linear_weighted_kappa, but all ranking differences are discounted by one.
  • qwk_off_by_one: Same as quadratic_weighted_kappa, but all ranking differences are discounted by one.

Regression or classification with binary labels:


  • r2: R2
  • neg_mean_squared_error: The negative of the mean squared error regression loss. Since scikit-learn recommends using negated loss functions as scorer functions, SKLL does the same for the sake of consistency.

Defaults to ['f1_score_micro'].


  1. Using objective=x instead of objectives=['x'] is also acceptable, for backward-compatibility.
  2. Also see the metrics option below.
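As an illustration of the kappa family, here is a from-scratch sketch of quadratic weighted kappa using its standard definition over the contiguous range of observed ratings. It is not SKLL's implementation and does not cover the off-by-one variants; quadratic_weighted_kappa is hypothetical, and rounding with Python's round() is an assumption.

```python
from collections import Counter


def quadratic_weighted_kappa(y_true, y_pred):
    """Hypothetical from-scratch quadratic weighted kappa (not SKLL's implementation)."""
    # Round any floating point ratings to ints before comparing them
    y_true = [int(round(v)) for v in y_true]
    y_pred = [int(round(v)) for v in y_pred]
    low = min(y_true + y_pred)
    num_ratings = max(y_true + y_pred) - low + 1
    if num_ratings < 2:
        raise ValueError("kappa is undefined when only one rating value occurs")
    n = len(y_true)
    # Observed counts of (true rating, predicted rating) pairs
    observed = Counter(zip(y_true, y_pred))
    true_hist, pred_hist = Counter(y_true), Counter(y_pred)
    numerator = denominator = 0.0
    for i in range(low, low + num_ratings):
        for j in range(low, low + num_ratings):
            # Quadratic penalty grows with the squared distance between ratings
            weight = (i - j) ** 2 / (num_ratings - 1) ** 2
            # Expected counts assume true and predicted ratings are independent
            expected = true_hist[i] * pred_hist[j] / n
            numerator += weight * observed[(i, j)]
            denominator += weight * expected
    return 1.0 - numerator / denominator
```

Perfect agreement gives 1.0, and fully reversed ratings such as [1, 2, 3] against [3, 2, 1] give -1.0.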

param_grids (Optional)

List of parameter grids to search for each learner. Each parameter grid should be a list of dictionaries mapping from strings to lists of parameter values. When you specify an empty list for a learner, the default parameter grid for that learner will be searched.

The default parameter grids for each learner are:

AdaBoostClassifier and AdaBoostRegressor
[{'learning_rate': [0.01, 0.1, 1.0, 10.0, 100.0]}]
[{'alpha_1': [1e-6, 1e-4, 1e-2, 1, 10],
  'alpha_2': [1e-6, 1e-4, 1e-2, 1, 10],
  'lambda_1': [1e-6, 1e-4, 1e-2, 1, 10],
  'lambda_2': [1e-6, 1e-4, 1e-2, 1, 10]}]
DecisionTreeClassifier and DecisionTreeRegressor
[{'max_features': ["auto", None]}]
[{'alpha': [0.01, 0.1, 1.0, 10.0, 100.0]}]
GradientBoostingClassifier and GradientBoostingRegressor
[{'max_depth': [1, 3, 5]}]
[{'epsilon': [1.05, 1.35, 1.5, 2.0, 2.5, 5.0],
  'alpha': [1e-4, 1e-3, 1e-3, 1e-1, 1, 10, 100, 1000]}]
KNeighborsClassifier and KNeighborsRegressor
[{'n_neighbors': [1, 5, 10, 100],
  'weights': ['uniform', 'distance']}]
[{'alpha': [0.01, 0.1, 1.0, 10.0, 100.0]}]
[{'C': [0.01, 0.1, 1.0, 10.0, 100.0]}]
[{'C': [0.01, 0.1, 1.0, 10.0, 100.0]}]
MLPClassifier and MLPRegressor:
[{'activation': ['logistic', 'tanh', 'relu'],
  'alpha': [1e-4, 1e-3, 1e-3, 1e-1, 1],
  'learning_rate_init': [0.001, 0.01, 0.1]}]
[{'alpha': [0.1, 0.25, 0.5, 0.75, 1.0]}]
RandomForestClassifier and RandomForestRegressor
[{'max_depth': [1, 5, 10, None]}]
Ridge and RidgeClassifier
[{'alpha': [0.01, 0.1, 1.0, 10.0, 100.0]}]
SGDClassifier and SGDRegressor
[{'alpha': [0.000001, 0.00001, 0.0001, 0.001, 0.01],
  'penalty': ['l1', 'l2', 'elasticnet']}]
[{'C': [0.01, 0.1, 1.0, 10.0, 100.0],
  'gamma': ['auto', 0.01, 0.1, 1.0, 10.0, 100.0]}]
[{'C': [0.01, 0.1, 1.0, 10.0, 100.0]}]


Note that learners not listed here do not have any default parameter grids in SKLL, either because there are no hyper-parameters to tune or because decisions about which parameters to tune (and how) depend on the data being used for the experiment and are best left up to the user.
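To get a feel for grid-search cost, a grid can be expanded into its candidate settings; scikit-learn's ParameterGrid performs the same expansion. grid_candidates is a hypothetical helper, and the example uses the C/gamma grid listed above together with the default grid_search_folds of 3.

```python
from itertools import product


def grid_candidates(param_grid):
    """Hypothetical helper: expand a list of {name: [values]} dicts into candidates."""
    candidates = []
    for grid in param_grid:
        names = sorted(grid)
        # Every combination of one value per parameter is a candidate
        for values in product(*(grid[name] for name in names)):
            candidates.append(dict(zip(names, values)))
    return candidates


c_gamma_grid = [{'C': [0.01, 0.1, 1.0, 10.0, 100.0],
                 'gamma': ['auto', 0.01, 0.1, 1.0, 10.0, 100.0]}]
candidates = grid_candidates(c_gamma_grid)
n_fits = len(candidates) * 3  # 3 = default grid_search_folds
```

That grid yields 30 candidates, so a single job performs about 90 fits during the search.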

pos_label_str (Optional)

The string label for the positive class in the binary classification setting. If unspecified, an arbitrary class is picked.


Output

probability (Optional)

Whether or not to output probabilities for each class instead of the most probable class for each instance. Only really makes a difference when storing predictions. Defaults to False. Note that this also applies to the tuning objective.

pipeline (Optional)

Whether or not the final learner object should contain a pipeline attribute that contains a scikit-learn Pipeline object composed of copies of each of the following steps of training the learner:

  • feature vectorization (vectorizer)
  • feature selection (selector)
  • feature sampling (sampler)
  • feature scaling (scaler)
  • main estimator (estimator)

The strings in the parentheses represent the name given to each step in the pipeline.

The goal of this attribute is to allow better interoperability between SKLL learner objects and scikit-learn. The user can train the model in SKLL and then further tweak or analyze the pipeline in scikit-learn, if needed. Each component of the pipeline is a (deep) copy of the component that was fit as part of the SKLL model training process. We use copies since we do not want the original SKLL model to be affected if the user modifies the components of the pipeline in scikit-learn space.

Here’s an example of how to use this attribute.

from sklearn.preprocessing import LabelEncoder

from skll import Learner
from skll.data import Reader

# train a classifier and a regressor using the SKLL API
fs1 = Reader.for_path('examples/iris/train/example_iris_features.jsonlines').read()
learner1 = Learner('LogisticRegression', pipeline=True)
_ = learner1.train(fs1, grid_search=True, grid_objective='f1_score_macro')

fs2 = Reader.for_path('examples/boston/train/example_boston_features.jsonlines').read()
learner2 = Learner('RescaledSVR', feature_scaling='both', pipeline=True)
_ = learner2.train(fs2, grid_search=True, grid_objective='pearson')

# now, we can explore the stored pipelines in sklearn space
# the classifier pipeline predicts label indices, so fit an encoder
# on the original labels to map those indices back to label strings
enc = LabelEncoder().fit(fs1.labels)

# first, the classifier
D1 = {"f0": 6.1, "f1": 2.8, "f2": 4.7, "f3": 1.2}
pipeline1 = learner1.pipeline
print(enc.inverse_transform(pipeline1.predict(D1)))

# then, the regressor
D2 = {"f0": 0.09178, "f1": 0.0, "f2": 4.05, "f3": 0.0, "f4": 0.51, "f5": 6.416, "f6": 84.1, "f7": 2.6463, "f8": 5.0, "f9": 296.0, "f10": 16.6, "f11": 395.5, "f12": 9.04}
pipeline2 = learner2.pipeline
print(pipeline2.predict(D2))


  1. When using a DictVectorizer in SKLL along with feature_scaling set to either with_mean or both, the sparse attribute of the vectorizer stage in the pipeline is set to False since centering requires dense arrays.
  2. When feature hashing is used (via a FeatureHasher) in SKLL along with feature_scaling set to either with_mean or both, a custom pipeline stage (skll.learner.Densifier) is inserted in the pipeline between the feature vectorization (here, hashing) stage and the feature scaling stage. This is necessary since a FeatureHasher does not have a sparse attribute to turn off; it only returns sparse vectors.
  3. A Densifier is also inserted in the pipeline when using a SkewedChi2Sampler for feature sampling since this sampler requires dense input and cannot be made to work with sparse arrays.

results (Optional)

Directory to store result files in. If omitted, the current working directory is used.

metrics (Optional)

For the evaluate and cross_validate tasks, this is a list of additional metrics that will be computed in addition to the tuning objectives and added to the results files. For the learning_curve task, this will be the list of metrics for which the learning curves will be plotted. Can take all of the same functions as those available for the tuning objectives.


  1. For learning curves, metrics can be specified instead of objectives since both serve the same purpose. If both are specified, objectives will be ignored.
  2. For the evaluate and cross_validate tasks, any functions that are specified in both metrics and objectives are assumed to be objectives.
  3. If you just want to use neg_log_loss as an additional metric, you do not need to set probability to True. That is only needed for neg_log_loss to be used as a tuning objective.

log (Optional)

Directory to store log files in. If omitted, the current working directory is used.

models (Optional)

Directory to store trained models in. Can be omitted to not store models.

predictions (Optional)

Directory to store prediction files in. Can be omitted to not store predictions.


You can use the same directory for results, log, models, and predictions.

save_cv_folds (Optional)

Whether to save the folds that were used for a cross-validation experiment to a CSV file named EXPERIMENT_skll_fold_ids.csv in the results directory, where EXPERIMENT refers to the experiment_name. Defaults to False.

Using run_experiment

Once you have created the configuration file for your experiment, you can usually just get your experiment started by running run_experiment CONFIGFILE. [3] That said, there are a few options that are specified via command-line arguments instead of in the configuration file:

-a <num_features>, --ablation <num_features>

Runs an ablation study where repeated experiments are conducted with the specified number of feature files in each featureset in the configuration file held out. For example, if you have three feature files (A, B, and C) in your featureset and you specify --ablation 1, there will be three experiments conducted with the following featuresets: [[A, B], [B, C], [A, C]]. Additionally, since every ablation experiment includes a run with all the features as a baseline, the following featureset will also be run: [[A, B, C]].

If you would like to try all possible combinations of feature files, you can use the run_experiment --ablation_all option instead.


Ablation will not work if you specify a train_file and test_file since no featuresets are defined in that scenario.

-A, --ablation_all

Runs an ablation study where repeated experiments are conducted with all combinations of feature files in each featureset.


This can create a huge number of jobs, so please use with caution.
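The featuresets generated by the two ablation options can be sketched with itertools.combinations. Both helpers are hypothetical; the order of the generated featuresets may differ from SKLL's, and excluding the empty featureset from --ablation_all is an assumption of this sketch. The 2^n - 1 growth for n feature files is what makes --ablation_all expensive.

```python
from itertools import combinations


def ablation_featuresets(featureset, num_held_out):
    """Hypothetical sketch of --ablation N: hold out N files at a time, plus the baseline."""
    keep = len(featureset) - num_held_out
    subsets = [list(c) for c in combinations(featureset, keep)]
    # The full featureset is always run as a baseline
    return subsets + [list(featureset)]


def ablation_all_featuresets(featureset):
    """Hypothetical sketch of --ablation_all: every non-empty combination of files."""
    return [list(c) for size in range(1, len(featureset) + 1)
            for c in combinations(featureset, size)]
```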

-k, --keep-models

If trained models already exist for any of the learner/featureset combinations in your configuration file, just load those models and do not retrain/overwrite them.

-r, --resume

If result files already exist for an experiment, do not overwrite them. This is very useful when doing a large ablation experiment and part of it crashes.

-v, --verbose

Print more status information. For every additional time this flag is specified, output gets more verbose.


Show program’s version number and exit.

GridMap options

If you have GridMap installed, run_experiment will automatically schedule jobs on your DRMAA-compatible cluster. You can use the following options to customize this behavior.

-l, --local

Run jobs locally instead of using the cluster. [4]

-q <queue>, --queue <queue>

Use this queue for GridMap. (default: all.q)

-m <machines>, --machines <machines>

Comma-separated list of machines to add to GridMap’s whitelist. If not specified, all available machines are used.


Full names must be specified, (e.g.,

Output files

For most of the tasks, the result, log, model, and prediction files generated by run_experiment will all share the automatically generated prefix EXPERIMENT_FEATURESET_LEARNER_OBJECTIVE, where the following definitions hold:

EXPERIMENT: The name specified as experiment_name in the configuration file.
FEATURESET: The feature set we’re training on, joined with “+”.
LEARNER: The learner the current results/model/etc. was generated using.
OBJECTIVE: The objective function the current results/model/etc. was generated using.

However, if objectives contains only one objective function, the result, log, model, and prediction files will share the prefix EXPERIMENT_FEATURESET_LEARNER. For backward-compatibility, the same applies when a single objective is specified using objective=x.
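The naming rule can be sketched as follows. output_prefix is a hypothetical helper; it builds FEATURESET by joining the file prefixes with + as described above and does not model featureset_names.

```python
def output_prefix(experiment, featureset, learner, objective, num_objectives):
    """Hypothetical helper: build EXPERIMENT_FEATURESET_LEARNER[_OBJECTIVE]."""
    parts = [experiment, "+".join(featureset), learner]
    # The objective only appears when more than one objective is configured
    if num_objectives > 1:
        parts.append(objective)
    return "_".join(parts)
```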

In addition to the above log files that are specific to each “job” (a specific combination of featuresets, learners, and objectives specified in the configuration file), SKLL also produces a single, top level “experiment” log file with only EXPERIMENT as the prefix. While the job-level log files contain messages that pertain to the specific characteristics of the job, the experiment-level log file will contain logging messages that pertain to the overall experiment and configuration file. The messages in the log files are in the following format:


where TIMESTAMP refers to the exact time when the message was logged, LEVEL refers to the level of the logging message (e.g., INFO, WARNING, etc.), and MSG is the actual content of the message. All of the messages are also printed to the console in addition to being saved in the job-level log files and the experiment-level log file.

For every experiment you run, there will also be a result summary file generated that is a tab-delimited file summarizing the results for each learner-featureset combination you have in your configuration file. It is named EXPERIMENT_summary.tsv. For learning_curve experiments, this summary file will contain training set sizes and the averaged scores for all combinations of featuresets, learners, and objectives.

If seaborn is available when running a learning_curve experiment, actual learning curves are also generated as PNG files - one for each feature set specified in the configuration file. Each PNG file is named EXPERIMENT_FEATURESET.png and contains a faceted learning curve plot for the featureset with objective functions on rows and learners on columns. Here’s an example of such a plot.

If you didn’t have seaborn available when running the learning curve experiment, you can always generate the plots later from the learning curve summary file using the plot_learning_curves utility script.



[1] We are considering adding support for YAML configuration files in the future, but we have not added this functionality yet.
[2] K-1 folds will be used for grid search within CV, so there should be at least 3 fold IDs.
[3] If you installed SKLL via pip on macOS, you might get an error when using run_experiment to generate learning curves. To get around this, add MPLBACKEND=Agg before the run_experiment command and re-run.
[4] This will happen automatically if GridMap cannot be imported.