Overview

CS-5152: Open-Source Software Engineering This class is about learning software engineering, especially as employed by the open-source community, through a hands-on experience with mentorship, guidance, and peers. Each student will work in a team on an established code base from an active open-source project using the guidance of an industry mentor from that project. This class is not about "open source" as an entity in and of itself, though; we do not cover aspects of open source like its history, philosophy, or legal complexities (such as licensing).

Teams Teams and projects will be decided before the semester begins. Teams usually consist of 4 to 8 students working together with one or two industry mentors.

Kickoff Hackathon The Kickoff Hackathon will kick off the projects by putting students in face-to-face contact with their project mentors from industry. All students are required to attend. The Kickoff Hackathon will be the weekend of February 9-10. The Kickoff Hackathon will not be an overnight endeavor; it will start after breakfast, include lunch, and end before dinner.

Projects Students will rank the available projects in order of interest, and a matching process will be run to determine which project each student is assigned to. So far, students have always been able to get one of their top 3 choices. Team rosters will be settled prior to class enrollment.

Grading 75% of a student's grade will be determined by contribution to the code base. The industry mentor will provide the majority of this evaluation. The remaining 25% of a student's grade will be determined by class participation and deliverables. There will be no final exam, but there will be final deliverables such as short papers and presentations to be done at the scheduled final time.

Weekly Meetings Every week students will video conference with their entire team, including their mentor, for 30 minutes, ideally during lab hours (some teams may need to meet outside of class due to external constraints). These meetings should quickly review what has been done, what problems people are hitting, and what should be accomplished in the next week. To keep meeting times flexible enough to accommodate the constraints of others on the projects, students must be able to attend all lab times, at least until a meeting time has been decided. Each student must also meet briefly with the professor to explain what they are working on and to confirm that they are on track.

Lecture W 11:15-12:05 in Upson 216
Lab MF 11:15-12:05 in Upson 216
Units 4
Final Sunday, May 12th at 9am-noon in Upson 142

Instructor Ross Tate
Office 434 Gates Hall
Office Hours by appointment

Announcements

Resources

Application Process CLOSED

You will not be able to enroll in this class directly, since enrollment is contingent on being admitted. Please enroll in classes as if this class were unavailable, since it is easier to drop classes than add them, and once admitted into CS 5152 your enrollment is guaranteed.

Projects

GeoWave-Kudu

Mentor: Rich Fecher

Team Size: 5

Description: GeoWave is a software library that connects the scalability of distributed computing frameworks and key-value stores with modern geospatial software to store, retrieve and analyze massive geospatial datasets. While the core toolkit is generally applicable to multi-dimensional use cases, GeoWave has focused on tailored extensions to support spatial types and operators, with or without temporal timestamps or time ranges. Additionally, it provides advanced features to leverage a distributed backend for visualization or analysis. The software is intended to be easily pluggable into any sorted key-value store, and its modular design is intended to enable feature extension into various geospatial toolkits.

Apache Kudu is a popular open-source distributed data store. It is a modern column-oriented data store built for fast analytics, and it can benefit from GeoWave's ability to index multi-dimensional datasets, which is the underpinning of geospatial or spatio-temporal indexing. This effort will focus on creating a data store for Apache Kudu similar to GeoWave's existing Accumulo, HBase, Cassandra, DynamoDB, Redis, and BigTable data store extensions.
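The description does not say how GeoWave builds its indexes. As a general illustration only, one common way to store multi-dimensional data in a sorted key-value store is to map each point to a single key with a space-filling curve. The following is a minimal sketch of a Z-order (Morton) key for two non-negative integer coordinates; the function name and the 16-bit default are hypothetical and are not part of the GeoWave or Kudu APIs.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return key


# Sorting by the key gives a one-dimensional order over 2-D grid cells.
points = [(3, 3), (0, 0), (1, 0), (0, 1)]
ordered = sorted(points, key=lambda p: interleave_bits(*p))
```

Because the key is a single integer, a store that keeps rows sorted by key can serve a range scan over it; points that are close in space tend to be close in key order, though not always.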

GeoWave-PyGW

Mentor: Michael Whitby

Team Size: 5

Description: GeoWave is a software library that connects the scalability of distributed computing frameworks and key-value stores with modern geospatial software to store, retrieve and analyze massive geospatial datasets. While the core toolkit is generally applicable to multi-dimensional use cases, GeoWave has focused on tailored extensions to support spatial types and operators, with or without temporal timestamps or time ranges. Additionally, it provides advanced features to leverage a distributed backend for visualization or analysis. The software is intended to be easily pluggable into any sorted key-value store, and its modular design is intended to enable feature extension into various geospatial toolkits.

Many GeoWave users are Python developers, such as those in the data science community. Providing good Python bindings for the GeoWave programmatic API would greatly benefit this community. GeoWave has a clear Java API, and this project aims to provide a clean interface to that API from Python.

Exercism

Mentor: Peter Tseng (a CS-5152 alum)

Team Size: 4-5

Summary: Exercism offers coding practice and mentorship for everyone. It does this by offering exercises that people can complete and submit for review from a mentor. Exercism mentors discuss the solution with the student and guide them through Exercism’s language tracks. This project focuses on reducing the workload for Exercism mentors.

Description: Your group will choose one or two language tracks and create a static analysis tool that can analyze submissions for the first few exercises in the chosen track(s). The analysis will then determine whether a given submission is ready for approval (doesn’t need further attention from a mentor) and/or suggest to mentors a list of possible things they could give feedback on.

For example, for Python some possible suggestions might look like:

  • Prefer `x is None` rather than `x == None`
  • Do you see a way you might be able to simplify `if some_condition: return True else: return False`?
  • Instead of using a for loop and manually appending to an array each time, see if there is a way to express what you want using a list comprehension
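The three suggestions above can be detected mechanically. The following is a minimal sketch using Python's standard `ast` module; it is not Exercism's actual tooling, and the function names and message wording are hypothetical. It assumes Python 3.9 or later.

```python
import ast


def _is_none(node: ast.AST) -> bool:
    """True if `node` is the literal None."""
    return isinstance(node, ast.Constant) and node.value is None


def _returns_bool(body: list) -> bool:
    """True if `body` is exactly one `return True` or `return False`."""
    return (
        len(body) == 1
        and isinstance(body[0], ast.Return)
        and isinstance(body[0].value, ast.Constant)
        and isinstance(body[0].value.value, bool)
    )


def suggest(source: str) -> list[tuple[int, str]]:
    """Return (line number, suggestion) pairs for a submission, sorted by line."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Compare):
            # `a == None` or `a != None`, including inside chained comparisons
            operands = [node.left, *node.comparators]
            for i, op in enumerate(node.ops):
                if isinstance(op, (ast.Eq, ast.NotEq)) and (
                    _is_none(operands[i]) or _is_none(operands[i + 1])
                ):
                    found.append((node.lineno, "compare to None with `is` / `is not`"))
        elif isinstance(node, ast.If):
            # `if cond: return True` paired with `else: return False` (or reversed)
            if _returns_bool(node.body) and _returns_bool(node.orelse):
                found.append((node.lineno, "consider returning the condition directly"))
        elif isinstance(node, ast.For):
            # a loop whose whole body is a single `something.append(...)` call
            stmt = node.body[0]
            if (
                len(node.body) == 1
                and isinstance(stmt, ast.Expr)
                and isinstance(stmt.value, ast.Call)
                and isinstance(stmt.value.func, ast.Attribute)
                and stmt.value.func.attr == "append"
            ):
                found.append((node.lineno, "consider a list comprehension"))
    return sorted(found)
```

Calling `suggest(submission_text)` on a submission yields candidate feedback items for a mentor. An empty result could be one signal, among others, that a submission is ready for approval.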

We can provide five years’ worth of submissions to help you test and tune your analysis tool. We can also provide previous feedback that mentors have given, which can be used to identify common patterns/problems that are good candidates for identification by static analysis.

Skills: You will learn about:

  • Static analysis for your chosen language(s)
  • Common idioms for your chosen language(s)
  • Insight about how to do code review

Pyret

Mentors: Joe Politz, Ben Lerner, Shriram Krishnamurthi

Team Size: 4

Summary: Make Pyret available offline

Description: code.pyret.org is an IDE for the Pyret programming language. It runs entirely online in stock web browsers and saves users' work to their Google Drive. This is a huge win for deploying in schools with access only to Chromebooks and iPads, and where the educational default is to have Google Apps for Education accounts for all users. It's also convenient for the many users who can rely on a Web-based service and don't need to install anything.

However, as use of the IDE grows, we find more and more users who want or need an option that runs offline and/or doesn't rely on cloud-based storage. These users range from undergraduates and instructors with busy travel schedules who want to work on planes, to schools with data privacy policies that preclude the use of cloud-based storage (especially in light of policies like GDPR), to those in environments with unreliable WiFi or fickle content-based firewalls.

For these users, we'd like to provide an offline solution. Much of the functionality of code.pyret.org makes heavy use of the flexible visual layout and rich content afforded by browsers. In addition, the Pyret compiler emits JavaScript and requires either a browser or a NodeJS-like environment to run. As a result, we think using tools like Electron or NW.js to bring the environment offline while maintaining much of the existing logic for rendering and evaluation will be a productive route.

The other substantial part of the project involves providing offline support for features typically available only online. For example, we'd like users to be able to save code to their local disk rather than to Google Drive. There is also functionality provided at the language level that assumes a network connection:

  • Users can import other modules by referring to their Google Drive ID, which is used to provide starter code for programming assignments.
  • A function called `image-url` takes an address of a jpeg or png image and loads it as a value in the language.
  • An API for loading and working with Google Spreadsheets.

These may work naturally on the desktop when the network is available. For the offline case, new or replacement APIs (e.g. adapting the same operations to load images from disk, or CSV versions of Drive spreadsheets) or some form of offline caching could keep them functional when the network isn't present.
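The offline-caching idea can be sketched independently of Pyret. The example below is written in Python purely for illustration (the real environment runs on JavaScript), and every name in it is hypothetical. It assumes the caller supplies a `fetch` function that raises `OSError` when the network is unavailable.

```python
import hashlib
from pathlib import Path
from typing import Callable


def load_with_cache(url: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> bytes:
    """Return the bytes for `url`, falling back to a local copy if the fetch fails."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL to get a file name that is safe on any filesystem.
    cached = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    try:
        data = fetch(url)  # assumed to raise OSError when offline
    except OSError:
        if cached.exists():
            return cached.read_bytes()  # serve the last copy seen online
        raise  # never fetched before, so there is nothing to fall back to
    cached.write_bytes(data)  # refresh the cache on every successful fetch
    return data
```

With this shape, a resource loaded once while online remains available offline, and a resource that was never fetched still fails loudly rather than silently returning nothing.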

Minimum viable product: An installable version of code.pyret.org that opens on the desktop and saves files to disk, with all of the IDE's other features that don't require internet connectivity working.

Extensions:

  • Identify features that require a network connection and provide alternatives as transparently as possible for using them offline.
  • Allow a user to choose between file-based storage and linking their Google Account in the desktop application.

Skills:

WordPress

Mentor: George Stephanis

Team Size: 6

Summary: WordPress Development for Jetpack and Gutenberg

Description: WordPress powers about one out of every three websites on the internet today, and Jetpack extends the cloud infrastructure and functionality of WordPress.com to self-hosted sites. Gutenberg is the new React-powered content editing interface for WordPress, shipping in WordPress 5.0.

In the course of this project we will learn how to build in the WordPress ecosystem; contribute to WordPress Core, to plugins such as Jetpack, and to themes; manage unit tests; and acclimate to new systems and codebases, especially those that also need to run on legacy systems.

Skills: Moderate PHP and JavaScript. Depending on level, we can focus more on the JavaScript side or the PHP side, but there should be a passing familiarity with both. React is a bonus.

App Inventor

Mentor: Evan Patton

Team Size: 4-5

Description: MIT App Inventor is a web-based platform for building mobile applications with little to no coding experience. It is primarily used by educators to teach computer science principles to students, but is also used by people of all ages to publish their own mobile apps. Last year, over 8.5 million active users built apps with MIT App Inventor in over 190 countries and in 13 different languages. There are a number of opportunities for students to work on improving App Inventor, with a focus on improving stability, performance, usability, and accessibility of the platform. Team members will collaborate with MIT staff and students as part of our development process to make contributions to the platform.

GitHub URL: https://github.com/mit-cml/appinventor-sources

Skills: For students working on improvements to the mobile components, knowledge of Java and Android is important. For students working on improvements to the web interface, familiarity with HTML, CSS, and JavaScript is important. Much of the interface is also written in Java using the Google Web Toolkit, so knowledge of Java is useful, although not necessarily required. Experience with Git and a Git-based development workflow is a plus, but not required. You will have the opportunity to learn any or all of these technologies as part of working on App Inventor.