CS 1380 + ORIE 1380 + STSCI 1380
Spring 2021


Course Description

This course provides an introduction to data science. Given data from economics, medicine, biology, or physics, collected from internet denizens, survey respondents, or wireless sensors, how can one understand the phenomenon generating the data, make predictions, and improve decisions? We focus on building skills in inferential thinking and computational thinking, guided by the practical questions we seek to answer. The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. We will also consider social issues in data analysis such as privacy and design.


Prerequisites

This course has no prerequisites beyond high-school algebra. It is designed specifically for students who have never written a computer program, and who have never taken a CS, ORIE, or statistics course. Students with some prior experience in either computing or statistics are welcome to enroll, though some parts of the course will be slow. Students who have already taken both programming and statistics courses should pass over 1380 and opt for a more advanced course, such as INFO 2950, INFO 3300, CS 4780, ORIE 4740, ORIE 4741, or STSCI 4060.


Communication

The course website is http://www.cs.cornell.edu/courses/cs1380/, which can also be reached by going to http://cornell-dsfa.org.

Your primary point of contact in the course is your recitation TA. Please speak with them about any issues involving technical support, grading, or logistics.

We use Ed Discussions for announcements. You are responsible for reading all announcements and pinned posts made on Ed Discussions.

Ed Discussions can also be used for Q&A. We will treat it as a virtual study group attended by all the students in the course. Your questions will primarily be answered by other students. Thus we will capitalize on the greatest strength of Ed Discussions: the ability to collaborate with and benefit from your peers.

To encourage collaboration, instructors will allow students to provide the (first) answer to a question. Students who demonstrate exceptional ability to answer their peers’ questions on Ed Discussions will be rewarded for their mastery of material and dedication to the course by a small bonus to their final grade. If the initial student answer seems insufficient, begin a follow-up and collaborate to improve the student answer. Posting of assignment solutions or hints is not permitted and could violate Academic Integrity.

Ed Discussions is best used to ask questions whose answers will be useful to many students. If you have questions that are likely to be specific just to yourself — especially if they fall in the category of technical support or debugging — we ask you to take those questions to office hours. But if you can phrase the questions in a way such that they are of broad interest, you are likely to find other students thanking you for them and answering them quickly.

Do not post questions about grading, grades, or regrades on Ed Discussions.

Emailing with instructors:

  • The best way to communicate with instructors is in person, after class, or during office hours. We ask that you always try to use this option first. In-person meetings are more efficient, more effective, and more fun.

  • If it becomes necessary to send email to the professors, assume that it will take some time to receive a reply. If you need a reply sooner, you always have the opportunity to ask questions in person after lecture and during office hours.


Course materials

The course textbook is Computational and Inferential Thinking: The Foundations of Data Science, by Ani Adhikari and John DeNero. The book is freely available online. We have, with the permission of the authors, created our own local adaptation of the book. Be sure to use our version, not any other version you might find online.

We’ll be using Poll Everywhere to take polls in class. Signing in as someone else to answer poll questions is a violation of Academic Integrity.

There is no custom software that you need to install for this course. All that is required is a computer and a web browser. You will need to purchase access to Vocareum, which is the computing platform we will use for course labs, homework assignments, and projects. The fee is $30 for the semester.

You will need to bring a laptop with you to your recitation section, so that you can work with your classmates and the TAs on lab assignments during that time. If you do not have access to a reliable laptop or tablet, the Cornell library provides laptops that you can check out for short periods of time. If you have trouble getting access to a laptop for the labs, please discuss it with the professors.

If your laptop ever malfunctions, there’s no need to be concerned about your work in this course: all of it is saved in Vocareum, and you can access it from any machine. There are many computer labs on campus, and the Cornell library even provides laptops that you can check out for short periods of time.


Participation

Participation in lecture will be recorded by Poll Everywhere. We ask that you silence and put away all mobile devices, so that you do not distract other students or yourself.

There are no excused absences: what we are tracking is whether you are present and participating, not whether you had a reason to be absent. So if you need to be absent because of illness or for personal, employment, or academic reasons, there is no need to notify your instructors. Similarly, if Poll Everywhere glitches on a given day, we regret that the instructor software doesn’t enable us to record that you were nonetheless present, so it won’t help to notify us.

Class participation will be worth only a small part of your final grade, typically around 1%. That’s enough to make a difference in borderline cases, but not enough for you to be worried about any particular absence you might have. As a guideline, don’t miss more than about a week’s worth of class meetings, and you’ll be fine. Participation points begin counting toward your final grade on the day of the Add Deadline. Before that we’ll use Poll Everywhere, but it won’t count toward your grade.


Labs

Labs are assignments containing programming and free-response questions designed to reinforce the material recently covered in lecture. There will be approximately ten labs. You may skip one lab during the semester without penalty.

The primary way to complete a lab is to attend your recitation section, work on it collaboratively with your classmates and TAs, and individually submit your own work. As long as you are at the recitation and working on the lab, you will receive full credit, even if you don’t manage to answer all the questions.

To receive credit, the recitation section you attend must be the section for which you are registered. We wish we could be more flexible on that, but it’s going to be necessary to fairly balance the number of students in each recitation. Also, your recitation TA will be the primary grader for all your assignments, so it is in your best interest to get to know them and to see them regularly.

Alternatively, you may also attempt an early completion of the lab on your own. Labs will be posted by Monday nights. If you finish 100% of the lab and submit it by Tuesday at 11:59 pm, you will receive full credit for the lab, and you will not need to attend your recitation section that week. Of course, if you try that and don’t manage to get 100% completion, you are still welcome to come to recitation and continue working there—in which case you’ll get full credit, regardless.


Homework

Homeworks are assignments that are similar in format to labs, except that you will work on them by yourself outside of recitation. You are permitted to discuss the homework problems with other students, but your final submission must be your own work. There will be approximately eight homeworks. Your lowest homework score will be dropped.

To encourage you to complete homeworks in a timely manner, there will usually be a small bonus granted to students who submit their homework at least 24 hours before the deadline.


Projects

Projects are assignments that are similar to both labs and homeworks, except that they are larger in scope and involve more creativity on your part. Projects are designed to give you experience with the kind of analysis a real-world data scientist would undertake. You are permitted to have a partner for each project. That partner must be from your recitation section, in part because there will be recitations devoted to project work. You are free to switch partners between projects. There will be approximately three projects.


Late policy and extensions

We will allocate each student some number of ‘slip days’ that can be used during the semester in order to turn in homework or projects up to two days after the stated deadline. Please use these carefully; they are intended to cover all circumstances under which you might want to turn in homework late, including such things as large workloads in other classes, extracurricular activities, and the like. We will not grant extensions for such things; instead, we will tell you to use your slip days.

Here are a few words of caution about late submissions:

  • The deadline for an assignment is not the time by which you must finish writing a solution; rather, the deadline is the time by which you must successfully submit your solution file in Vocareum. We recommend that you submit your file at least one hour before the deadline.

  • You must submit your work through Vocareum; email submissions, whether late or on time, will be deleted without response.

  • Vocareum enables you to upload as many times as you wish before submissions are closed. Only the most recent version will be graded. Requests to have the course staff grade earlier versions (with or without a penalty) will be denied.

  • It is your responsibility to verify before the deadline that you have submitted the intended versions of your files. Requests to substitute another version (e.g., “I accidentally submitted the wrong files”) will be denied.


Exams

There will be two preliminary exams and a final exam. Exams will cover material from lectures, textbook readings, labs, homeworks, and projects. The final exam is cumulative.

Exams will occur at the dates and times published by the Registrar. Unless you have accommodations as determined by the university or previous permission from the professors, you must take the exams at the published dates and times. For exam accommodations, contact the course administrator, Beatrix Johnson (bj11@cornell.edu).

If you cannot attend an exam because of a health or family crisis, or a similar life event, you may ask the professors for permission to be excused from the exam. Once you enter the examination room you may no longer ask permission to be excused. If you miss a prelim and we have excused you, we will modify the grading formula to compute your grade with just the other prelim. It is unlikely anyone would ever be excused from the final. You must ask us for permission in advance: don’t just skip exams and assume you can be excused retroactively. If you miss an exam and we have not excused you, you will get a zero for the missing score.


Regrades and Appeals

In our experience, exceptionally few regrade requests would actually make a difference in the final course grade. So rather than obsessing over regrades, we’d prefer that you spend your time doing well on the next assignment. The course staff will go to great lengths to help you understand the course content. We are considerably less enthusiastic about discussing how what you submitted could be stretched to seem correct.

Regrades. If there is something you don’t understand about your grading, your primary point of contact is your recitation TA. You should feel free to ask them in person for clarifications or for advice on how to improve your work. But the grade on your solution and/or changes to your grade are “out of bounds” for discussion. If during the course of your discussion the grader realizes they might have made a mistake, they will volunteer to take a second look outside your meeting, fix any mistake they discover, and change your grade up or down accordingly. You are free to point out grading mistakes of a purely arithmetic nature, which the grader will happily fix immediately.

Appeals. If, after discussing your solution with the grader, you still disagree with your grade, you may appeal. Appeals are intended to correct serious errors in grading, not to dispute judgment calls made by graders. Graders do sometimes take off a little too much, but they just as often give a little too much. If you decide that a serious mistake was made in grading your assignment, then we would be happy to fix it. Here is the process:

  1. Schedule a meeting with a PhD TA. Present your appeal to them.

  2. The PhD TA will, in consultation with the professors, consider the merits of your appeal, and make a decision about a grade change. The grade on your assignment might increase, decrease, or remain unchanged as a result of the additional scrutiny the appeal engenders.

The deadline for making an appeal is seven days after the original grade was released. Appeals made after that will be denied without consideration of their merits.

Resist the temptation to use appeals as a means to fish for a better grade. Here are two words of caution:

  • Any appeal that we perceive to be specious will inspire increased rigor in rechecking your submission, and that often leads to a grade reduction.

  • We track all appeals that are submitted throughout the semester. When we determine final course grades, we look carefully at students who are near the cutoffs between letter grades to see whether any extra consideration is warranted for adjusting their grade up or down. Abuse of appeals will factor heavily into this extra consideration.

We sincerely regret having to enforce this policy, but “grade grubbing” is a serious problem leading to an unjustifiable amount of work. We would rather be generous at the end of the semester in determining final grades than debate half-point deductions throughout the semester.


Academic Integrity

Absolute integrity is expected of every Cornell student in all academic undertakings. If you are unsure about what is permissible and what is not, please ask.

We encourage you to discuss your work with your friends and classmates. You will definitely learn more in this class if you work with others than if you do not. Ask questions, answer questions, and share ideas liberally.

Cooperation has limits, however, as set forth in these university, departmental, and course policies:

Integrity includes your being honest about the sources of the work you submit. When you submit work in this course, you are representing it as the work of the stated authors, subject to any exceptions that are clearly stated in the submission itself. To avoid committing plagiarism, simply be sure always to accurately credit your sources. To do otherwise is to commit fraud by claiming credit for the ideas and efforts of others, and that is punishable under the Code of Academic Integrity. Penalties for violation of Academic Integrity may be severe, ranging from a zero grade for an assignment or exam, up to expulsion from the University for a second offense.

Grades, on the other hand, are about the course staff assessing what you have learned. If you turn in someone else’s work for course credit, and forthrightly acknowledge you are doing so, you are not acting dishonestly and are not violating academic integrity. But you also give us no basis for concluding that you have learned the course content. So you are likely to receive a grade penalty, but it will be less severe than if you had failed to cite the other person.

We recommend the following rule of thumb: Never look at any other student’s solutions (including source code), or have their solutions in your possession in any portion or form whatsoever. Once you have seen another solution, it becomes impossible to unsee and is likely to infect your own. Likewise, never share your solutions with other students. That includes not writing code together at a whiteboard: even if you erase it and later write code separately at a computer, you are likely to write similar code that could be flagged as a potential violation.

You are always free to use code presented in this class in lecture, or on this semester’s course website. It does not require citation. Any other code, however, at least requires citation, and it could result in a grade deduction.


Grading

We expect the breakdown for the final course grade to be as follows:

  • Labs: 10%
  • Homeworks: 20%
  • Projects: 27%
  • Prelim 1: 10%
  • Prelim 2: 10%
  • Final exam: 20%
  • Other factors: 3%

These weights are approximate. We reserve the right to change them later. We will not publish details about how scores translate to letter grades.

Labs, homeworks, and projects are weighted equally within their category unless otherwise specified. “Other factors” include participation as measured by Poll Everywhere, submission of course evaluations, participation in any surveys we might hold, and excellence in answering other students’ questions on Ed Discussions.
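
As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes scores are expressed as fractions between 0 and 1, uses the approximate weights listed above, and treats the one skippable lab as a dropped lowest lab score. The function names and sample scores are hypothetical, and this is not necessarily the formula the course staff will use.

```python
# Illustrative only: the weights are the approximate ones listed above;
# function names and sample scores are hypothetical.
WEIGHTS = {
    "labs": 0.10,
    "homeworks": 0.20,
    "projects": 0.27,
    "prelim1": 0.10,
    "prelim2": 0.10,
    "final": 0.20,
    "other": 0.03,
}


def category_average(scores, drop_lowest=0):
    """Equal-weight mean of scores in [0, 1], ignoring the lowest `drop_lowest`."""
    kept = sorted(scores)[drop_lowest:]
    return sum(kept) / len(kept)


def course_score(labs, homeworks, projects, prelim1, prelim2, final, other):
    """Weighted sum of category averages, as a fraction in [0, 1]."""
    parts = {
        # Assumption: the one skippable lab is modeled as dropping the lowest lab.
        "labs": category_average(labs, drop_lowest=1),
        # The lowest homework score is dropped.
        "homeworks": category_average(homeworks, drop_lowest=1),
        "projects": category_average(projects),
        "prelim1": prelim1,
        "prelim2": prelim2,
        "final": final,
        "other": other,
    }
    return sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)


# Made-up scores for one student: ten labs (one skipped), eight homeworks,
# three projects, two prelims, a final, and full marks for "other factors".
example = course_score(
    labs=[1.0] * 9 + [0.0],
    homeworks=[0.9, 0.8, 1.0, 0.7, 0.95, 0.85, 0.9, 0.6],
    projects=[0.9, 0.85, 0.95],
    prelim1=0.8,
    prelim2=0.85,
    final=0.9,
    other=1.0,
)
print(f"Weighted course score: {example:.3f}")
```

The result is a single number between 0 and 1. How that number translates to a letter grade is not published.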

Sometimes students ask whether the final grade is curved. The answer is that it depends on what you mean by “curved.” Any mapping from numeric scores to a letter grade implicitly defines some kind of curve. But we will not give out a fixed percentage of A’s, B’s, etc. In fact, we would be delighted to give a high grade to all students who complete all assignments and show mastery of the material on the exams.