Lucy Liu joined the scikit-learn Team in September 2020. In this interview, learn more about Lucy’s journey through open source, from rstats to scikit-learn.
Tell us about yourself.
My name is Lucy, I grew up in New Zealand and I am culturally Chinese. I currently live in Australia and work for Quansight labs.
How did you first become involved in open source?
I first discovered open source when I started a research Masters, after finding my clinical Optometry job unfulfilling. I loved learning to program but was initially not game enough to contribute as I was only a beginner. After my masters, while working as a bioinformatician, I wrote some R packages for analysis of niche biomedical data and put them on github. My first contribution to an existing open source project was later when I worked at INRIA (French National Institute for Research in Digital Science and Technology) alongside the INRIA scikit-learn core developers. They helped me put up my first pull request and I have been contributing ever since!
How did you get involved in scikit-learn? Can you share a few of the pull requests to scikit-learn that resonate with you?
I’m very interested in statistics and code so I was super keen to contribute to scikit-learn. Being relatively a beginner in both areas I started by contributing to documentation, then bug fixes and features. My first PR to scikit-learn was submitted in October 2019 to improve the multiclass classification documentation. I have contributed the most to the calibration module in scikit-learn (including refactoring CalibratedClassifierCV), which has been very interesting and useful for when I later worked on post-processing of weather forecasts at the Bureau of Meteorology in Australia.
Reference: Lucy’s list of pull requests
To which OSS projects and communities do you contribute?
I contribute to Sphinx-Gallery and scikit-learn. Sphinx-Gallery was a great introduction to open source for me as it is a small package that does not get a large number of issues and pull requests (unlike scikit-learn!).
What do you find alluring about OSS?
I think the ability to see the source code and contribute back to the project are the best parts. If there is a feature you are interested in you can suggest and add it yourself, all the while learning from code reviews during the process!
What pain points do you observe in community-led OSS?
I think some of the positive aspects of the OSS community can also lead to pain. While it is great that you are able to get many different perspectives from people of various backgrounds, it also makes forming a consensus more difficult, slow progress. People from any geographical location can work together asynchronously but this can also mean people work in their own silos, making it difficult to have a cohesive direction for the project. Large projects also have a difficult learning curve, making it difficult for new contributors and contributors interested in becoming core-developers. The latter is the problem if the project lacks core-developer time for project maintenance and reviewing PRs.
If we discuss how far OS has evolved in 10 years, what would you like to see happen?
Some system that enables continuity of funding, which can combine funds from public and private sources. This would enable long term planning of OS projects and give developers more job stability. Better coordination between projects within the same area (e.g., scientific Python) would allow a better experience for users using Python for their projects.
What are your favorite resources, books, courses, conferences, etc?
I also love the YouTube channel statquest, which explains statistical concepts in a very accessible manner and introduces videos with a jingle - what more could you want?
What are your hobbies, outside of work and open source?
I love cycling and feel strongly about designing cities for people instead of cars. I also enjoy rock climbing (indoors and outdoors), though sadly have not had much time for this recently.