
Pandas DataFrame Output for sklearn Transformers

2022-11-08

Author:

Video

Upcoming feature in release 1.2

Starting with the next release of scikit-learn (v1.2), pandas DataFrame output will be available for all sklearn transformers! This will make it much easier to run pipelines on DataFrames and to keep track of feature names. Previously, mapping a transformed output back to named columns was cumbersome, because the output columns do not always correspond one-to-one to the input columns when preprocessing is complex (e.g., polynomial features, which add new columns such as squares and pairwise products of the inputs).

The pandas DataFrame output feature solves this by automatically tracking the feature names generated by each step of a pipeline. The output format of transformers can be set explicitly to either NumPy arrays or pandas DataFrames with sklearn.set_config, as in the sample code below.

from sklearn import set_config

# Globally configure transformers to return pandas DataFrames
set_config(transform_output="pandas")

See the sample notebook pandas-dataframe-output-for-sklearn-transformer.ipynb and the documentation for a more detailed example and usage.

Links to documentation and example notebook

Reporting bugs

We’d love your feedback on this. If you have suggestions or run into bugs, please report them on the scikit-learn issues tracker.

Thanks 🙏🏾 to the maintainers Thomas J. Fan, Guillaume Lemaitre, and Christian Lorentzen!

