3 minute read

Author: Author IconGael Varoquaux

Scikit-learn 1.9 release is out, and it comes with solid improvements to many existing estimators, making them faster, more stable, handling missing values, adding GPU support… The release also enhances the estimator displays in notebooks, and introduces a callback mechanism that opens the door to progress bars or advanced monitoring of convergence.

Improvement: Richer HTML views

The improvements that most will easily view are those on the estimator displays.
Since recent versions, the estimator views displayed in notebooks can show the parameters of the estimators (revealed by clicking on the estimator name). Latest release adds a view of the fitted attribute, as visible below:

HTML displays showing the fitted attributes
HTML display in a notebook, with the fitted attributes visible.

In addition, the ColumnTransformer’s view has been enhanced to help the user understand how features are assembled:

HTML displays showing the output feature names
HTML display in a notebook, with the output feature names visible.

A new promise: Callbacks

Scikit-learn 1.9 comes with a callback mechanism, currently experimental. We spent a lot of time designing it so that it enables many different uses: nested tracking of progress (even in parallel computing) on a variety of measures, early stopping…
The release notes come with an example on how to use these callbacks to monitor scores and to display progress barsa more advance monitoring example.

The logging and progress-bar callbacks in a notebook
The logging and progress-bar callbacks in a notebook.

As of today, the callbacks are implemented in logistic regression (with LBFGS solver), the various *SearchCV objects, Pipeline, StandardScaler. The next releases will progressively add callbacks in more and more estimators (this is a place where contributors can help).

Improved statistics and numerics

As users, what we like about scikit-learn routines is that they tend to be “fire-and-forget”, because they reliably run on a huge diversity of inputs: sparse inputs, missing values, different data types. This diversity of inputs is compounded by a diversity of modeling choices: different losses, sample weights…

Each release of scikit-learn extends the toolbox, sometimes by completing the combinatorial of options and data types rather than adding new estimators. Release 1.9 was a real consolidation in this respect:

Tree-based models

  • Native missing-value support in RandomForestRegressor when minimizing the absolute error criterion

  • Support of missing-values for tree-based models monotonic constraints

  • Improved the statistical correctness of fitting with sample weights in HistGradientBoosting, RandomForest and ExtraTree (having exact support of sample weights in complex pipelines is challenging)

Linear models

  • Logistic regression can use natively float32, thus removing memory pressure

  • Multi-Task linear models support fitting on sparse X and sample weights

  • More stable and faster RidgeCV and RidgeClassifierCV

  • Gap safe screening of features for very fast fitting of sparse squared-loss regressors

Other models

Staying up to date with the ecosystem

Scikit-learn can now return sparse arrays, rather than sparse matrices. Indeed, scipy is slowly de-emphasizing sparse matrices, which often surprise users with their behavior that depart from arrays.

Increasing support for GPU

Scikit-learn is increasingly gaining support for optimized compute backends, which enables it, for instance, to run on GPUs. The challenge (and the value) is the incredible diversity of estimators and usecases supported by scikit-learn, and the package is progressively adding backend support in more and more places.
In the 1.9 release, the major features to gain GPU support were:

See the docs for all details on how to use the compute backends and which estimator support them.

PS: the user experience is currently not as good as with the default compute backend (numpy). But adding and improving GPU support (with the “array API”) is a good place for talented volunteers to help move the project forward.

Acknowledgements

Scikit-learn is the work of many contributors, with people volunteering their time as well as financial sponsors – see the funding page.