CS PhD student Vlad Niculae, a member of the scikit-learn core team, gave an invited tutorial at the 2016 ADSA Data Summit. He spoke about scikit-learn with applications to textual data, and about principles of open-source collaboration and library design and how they apply to writing research code. The summit placed "emphasis on Big Data and Data Science, and was organized and run by the Association of Data Science and Analytics and the students at the University of Illinois at Urbana-Champaign."

NLP-centric scikit-learn tutorial: https://github.com/vene/adsa_uiuc_sklearn_tutorial

Slide deck (PDF): https://vene.ro/talks/LessonsLearned.pdf