3 minute read

Yao Xiao recently earned his undergraduate degree in mathematics and computer science. He will be pursuing a Master’s degree in Computational Science and Engineering at Harvard SEAS. Yao joined the scikit-learn team in February 2024.

  1. Tell us about yourself.

    My name is Yao Xiao and I live in Shanghai, China. At the time of interview I have just got my Bachelor’s degree in Honors Mathematics and Computer Science at NYU Shanghai, and I’m going to pursue a Master’s degree in Computational Science and Engineering at Harvard SEAS. My current research interests are in networks and systems (e.g. sys4ml and ml4sys), but this may change in the future.

  2. How did you first become involved in open source and scikit-learn?

    In my junior year I took a course at NYU Courant called Open Source Software Development where we needed to make contributions to an open source software as our final project - and I chose scikit-learn.

  3. We would love to learn of your open source journey.

    I was lucky to get involved in a pretty easy meta-issue when I first started contributing to scikit-learn. I made quite a few PRs towards that issue, familiarizing myself with the coding standards, contributing workflow etc., and during which I gradually explored the codebase and learned a lot from maintainers how to write better code. After that meta-issue was completed, I decided to continue contributing since I enjoyed the experience, and I started looking through the open issues, tried reproducing and investigating them, then opened PRs for those that I was able to solve. It is the process of familiarizing with more parts of the codebase, being able to make more PRs, so on and so forth. While contributing to scikit-learn, sometimes there are also issues to solve upstream, so I also had opportunities to contribute to projects like pandas and pydata-sphinx-theme. Up till today I’m still far from familiar with the entire scikit-learn project, but I will definitely continue the amazing open-source journey.

  4. To which OSS projects and communities do you contribute?

    I have contributed to scikit-learn, pandas, pydata-sphinx-theme, sphinx-gallery. I’m also writing some small softwares that I decide to make open source.

  5. What do you find alluring about OSS?

    It is amazing to feel that my code is being used by so many people all around the world through contributing to open source projects. Well it might be inappropriate to say “my code”, but I do feel like making some actual contributions to the community instead of just writing code for myself. Also OSS makes me care about code quality and so on instead of merely making things “work”, which is very important for programmers but not really taught in school.

  6. What pain points do you observe in community-led OSS?

    Collaboration can lead to better code but also slows down the development process. Especially when there are not enough reviewers around, issues and PRs can easily get stale or forgotten. But I would say it’s more like a tradeoff rather than a pain point.

  7. If we discuss how far OS has evolved in 10 years, what would you like to see happen?

    I couldn’t say about the past 10 years since I’ve only been involved for about one and a half years, but regarding the scientific Python ecosystem I would like to see better coordination across projects (which is already happening). For instance a common interface for array libraries and dataframe libraries would allow downstream dependents to easily provide more flexible support for different input/output types, etc. And as a Chinese I would also hope that open source can thrive in my country some day as well.

  8. What are your favorite resources, books, courses, conferences, etc?

    As for physical books I would recommend The Pragmatic Programmer by Andy Hunt and Dave Thomas, and Refactoring: Improving the Design of Existing Code by Martin Fowler and Kent Back. As for courses I like MIT’s The Missing Semester of Your CS Education. In particular about learning Python, The Python Tutorial in the official Python documentation is good enough for me. By the way I want to mention that documentations of most languages and popular packages are very nice and they are the best place to learn the most up-to-date information.

  9. What are your hobbies, outside of work and open source?

    I would say my largest hobby is programming (not for school, not for work, just for fun). I’ve recently been fascinated with Tauri and wrote a lot of small desktop applications for myself in my spare time. Apart from this I also love playing the piano and I’m an anime lover, so I often listen to or play piano versions of anime theme songs (mostly arranged by Animenz).