.. _contributing:

Contributing
============

Thank you for your interest in contributing to SKLL! We welcome any and all contributions.

Guidelines
----------

The SKLL contribution guidelines can be found in our GitHub repository `here `__. We strongly encourage all SKLL contributions to follow these guidelines.

SKLL Code Overview
------------------

This section will help you get oriented with the SKLL codebase by describing how it is organized, the various entry points into the code, and what the general code flow looks like for each entry point.

Organization
~~~~~~~~~~~~

The main Python code for the SKLL package lives inside the ``skll`` sub-directory of the repository. It contains the following files and sub-directories:

- `config/ `__ : Code to parse SKLL experiment configuration files.

- `experiments/ `__ : Code for creating and running SKLL experiments. It also contains the code that collects the evaluation metrics and predictions for each SKLL experiment and writes them out to disk.

- `learner/ `__ : Code for the `Learner `__ and `VotingLearner `__ classes. The former is instantiated for all learner names specified in the experiment configuration file *except* ``VotingClassifier`` and ``VotingRegressor``, for which the latter is instantiated instead.

- `metrics.py `__ : Code for any custom metrics that are not in ``sklearn.metrics``, e.g., ``kappa``, ``kendall_tau``, ``spearman``, etc. This module also contains the code that powers :ref:`user-defined custom metrics `.

- `data/ `__

  - `__init__.py `__ : Code used to initialize the ``skll.data`` Python package.
  - `featureset.py `__ : Code for the ``FeatureSet`` class, which encapsulates the features and metadata for a given set of instances.
  - `readers.py `__ : Code for classes that can read various file formats and create ``FeatureSet`` objects from them.
  - `writers.py `__ : Code for classes that can write ``FeatureSet`` objects to files on disk in various formats.
  - `dict_vectorizer.py `__ : Code for a ``DictVectorizer`` class that subclasses ``sklearn.feature_extraction.DictVectorizer`` to add the ``__eq__()`` method that we need for checking vectorizer equality.

- `utils/ `__ : Code for different utility scripts, functions, and classes used throughout SKLL. The most important ones are the command line scripts in the ``utils.commandline`` submodule.

  - `compute_eval_from_predictions.py `__ : See `documentation `__.
  - `filter_features.py `__ : See `documentation `__.
  - `generate_predictions.py `__ : See `documentation `__.
  - `join_features.py `__ : See `documentation `__.
  - `plot_learning_curves.py `__ : See `documentation `__.
  - `print_model_weights.py `__ : See `documentation `__.
  - `run_experiment.py `__ : See `documentation `__.
  - `skll_convert.py `__ : See `documentation `__.
  - `summarize_results.py `__ : See `documentation `__.

- `version.py `__ : Code to define the SKLL version. Only changed for new releases.

- `tests/ `__

  - ``test_*.py`` : These files contain the code for the unit tests and regression tests.

Entry Points & Workflow
~~~~~~~~~~~~~~~~~~~~~~~

There are three main entry points into the SKLL codebase:

1. **Experiment configuration files**. The primary way to interact with SKLL is by writing configuration files and then passing them to the `run_experiment `__ script. When you run the ``run_experiment`` command on a configuration file, the following happens (at a high level):

   - the configuration file is handed off to the `run_configuration() `__ function in ``experiments/``.
   - a `SKLLConfigParser `__ object from ``config/`` is instantiated, and it parses all of the relevant fields out of the given configuration file.
   - the configuration fields are then passed to the `_classify_featureset() `__ function in ``experiments/``, which instantiates the learners (using code from ``learner/``) and the featuresets (using code from ``readers.py`` and ``featureset.py``), runs the experiments, collects the results, and writes them out to disk.

2. **SKLL API**. Another way to interact with SKLL is to use the SKLL API directly in your Python code rather than using configuration files. For example, you could use the `Learner.from_file() `__ or `VotingLearner.from_file() `__ methods to load saved models of those types from disk and make predictions on new data. The documentation for the SKLL API can be found `here `__.

3. **Utility scripts**. The scripts listed in the section above under ``utils`` are also entry points into the SKLL code. These scripts are convenient wrappers that use the SKLL API for commonly used tasks, e.g., generating predictions on new data from an already trained model.
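
The configuration-file workflow described for the first entry point (parse the fields, then set up and run the experiments) can be sketched with the Python standard library alone. This is a toy illustration and not SKLL's implementation: the section and field names, the JSON-style list values, the assumption of one job per featureset and learner combination, and the helpers ``parse_config`` and ``plan_jobs`` are all hypothetical choices made for this sketch.

```python
import configparser
import json
from itertools import product

# Hypothetical, minimal experiment configuration. The section and field
# names are assumptions for illustration, not a complete SKLL config.
CONFIG_TEXT = """
[General]
experiment_name = toy_experiment
task = evaluate

[Input]
featuresets = [["family", "vitals"], ["misc"]]
learners = ["LogisticRegression", "SVC"]
"""


def parse_config(text):
    """Parse the configuration text and pull out the fields used below."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "experiment_name": parser.get("General", "experiment_name"),
        "task": parser.get("General", "task"),
        # List-valued fields are written as JSON in this toy example.
        "featuresets": json.loads(parser.get("Input", "featuresets")),
        "learners": json.loads(parser.get("Input", "learners")),
    }


def plan_jobs(fields):
    """Create one job description per (featureset, learner) combination."""
    jobs = []
    for featureset, learner in product(fields["featuresets"], fields["learners"]):
        name = f"{fields['experiment_name']}_{'+'.join(featureset)}_{learner}"
        jobs.append({
            "name": name,
            "task": fields["task"],
            "featureset": featureset,
            "learner": learner,
        })
    return jobs


fields = parse_config(CONFIG_TEXT)
jobs = plan_jobs(fields)
for job in jobs:
    print(job["name"])
```

In this sketch the parsing step and the job-planning step are kept separate, mirroring the hand-off from the configuration parser to the code that runs the experiments. The real code additionally instantiates learners and featuresets and writes results to disk, which is omitted here.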