General Information | Course Description | Textbooks | Grading | Ed Discussions | Homework | Course Conduct | Accommodations

General Information

Instructor: Bobby Kleinberg, Gates Hall 317, email

Office Hours: Mondays 2:30–3:30pm; Wednesdays 10–11am

Zoom link — https://cornell.zoom.us/j/99763995306?pwd=N2F0Yksyd1htRFVhT1ZSK2tiYnRNQT09).

Lectures: MWF 9:05am–9:55am

Zoom link — https://cornell.zoom.us/j/92674518945?pwd=QWl3VlZCblhpL1J0RnJkNDhldU9YZz09.

Instructions for phone access here.

Lectures will be given live during this time and you are encouraged to attend. Lectures will also be recorded and available for asynchronous viewing. There will be links to the video+audio and audio-only for each lecture on the schedule page, along with any notes or handouts associated with that lecture. The raw recordings will be available soon after the lecture, to be replaced by edited and closed-captioned versions as soon as they can be processed.

Zoom etiquette If you attend the live lecture, please enable your camera. You may ask questions at any time during the lecture. Type your question in the chat window. A staff member will be monitoring the chat window and will either respond with an answer in the chat or will interrupt the instructor at an appropriate moment.


Course Description

Probability and high-dimensional geometry have become valuable tools in the analysis of algorithms. This course will explore the mathematics that lies behind designing and analyzing algorithms for massive, high-dimensional, and often random, data. Topics to be covered include dimensionality reduction for high-dimensional data; random walks and Markov Chain Monte Carlo algorithms; randomized algorithms and data structures for hashing, data sketching, and data stream processing; random graphs; the probabilistic method; and algorithms for detecting sparse or low-rank structures in data.

Textbooks

Foundations of Data Science The textbook for the course is Foundations of Data Science by Avrim Blum, John Hopcroft, and Ravi Kannan. Although this book was designed for this course, there will be topics covered in lecture that are not in the text and there will be topics in the text that are not covered in lecture. You are responsible for topics covered in lecture and for any assigned reading in the text.

This course is enrolled in the Instant Access Program, providing digital textbooks or publisher courseware directly in Canvas. You have access to your digital course materials directly in Canvas once the course opens. You have a trial period to decide whether to continue accessing the digital course materials or opt-out.

Instant Access is an opt-out program for required course materials. You may have the option to opt-in for optional course materials. You can access required course materials without charge until the opt-out deadline of February 14, 2022. After this date, you will be bursar billed for continued use of the eBook unless you have opted-out.

To Opt-Out of required course materials: If you do not wish to have access to the eBook or courseware through Instant Access, you may opt-out by clicking the Instant Access - VitalSource tool in the Canvas course's navigation panel, then OPT-OUT. If you opt-out, you will lose access to the digital course materials and it may be your responsibility to acquire the appropriate course materials on your own to satisfy course requirements.

For Instant Access assistance, please contact instantaccess@cornell.edu.

The following (optional) books are also useful references.

Prerequisites

The prerequisites for the course are CS 2800 and 2110, and a linear algebra course at the level of MATH 2210/2940 or above. CS 4820 is highly recommended but not required. A prior course in probability (such as ENGRD 2700, MATH 4710, ECE 3100, BTRY 3080, ECON 3130) is also recommended but not required.

Grading

For CS 4850, your grade will be based on seven homework assignments, four in-class quizzes, one final exam, and completion of a course evaluation. For CS 5850, your grade will be based on all of the above, plus a final project.

Each of these components will be given a weight in the following ranges:

Component4850 range5850 range
Homework 40% to 60%35% to 50%
Quizzes 15% to 40% 15% to 35%
Final Exam 0% to 35% 0% to 30%
Final Project 0% 15% to 30%
Completion of course evaluation: 1%1%
For each student, we will determine weights in these ranges that add up to 100% and result in the best grade. If the score computed above exceeds 55%, you are guaranteed a passing grade.

Ed Discussions

We will be using Ed Discussions as an online discussion forum. Ed Discussions allows for open discussions of all course-related questions. You are encouraged to post any questions you might have about the course material. The course staff monitor Ed Discussions closely and you will usually get a quick response. If you know the answer to a question, you are encouraged to post it. Posting questions or answers that are endorsed by TAs or instructors can improve your participation grade.

By default, your posts are visible to the course staff and other students, and you should prefer this mode so that others can benefit from your question and the answer. However, you can post privately so that only the course staff can see your question, and you should do so if your post might reveal information about a solution to a homework problem. You can also post anonymously if you wish. If you post privately, we reserve the right to make your question public if we think the class will benefit.

Ed Discussions is the most effective way to communicate with course staff. Please avoid email if Ed Discussions will do. Broadcast messages from the course staff to students will be sent using Ed Discussions and all course announcements will be posted there and pinned, so check the pinned announcements often!

Homework

Homework is an important part of the course. We will have homework assignments every week or two. All homework assignments will be posted on CMS. Most homework assignments will be due on Wednesday at 11:59pm.

Typesetting

We will require problem sets to be typeset and submitted as a PDF. This requirement is for everyone's benefit. In general, we recommend that you first develop your solutions in draft form, and then write or type your solution in a concise way. Typesetting not only makes the last step essential (instead of handing in solution in draft form), it also makes it much easier for you to edit and improve your writeup, as well as easier for your TAs to read your proofs. It is up to you which tool you use; though we recommend LaTeX, tools like the Equation Editor in Microsoft Word can be surprisingly effective as an alternative. See typesetting resources for a list of typesetting software and references.

For some proofs or writeups, it may be helpful to use a figure to explain your thinking more concisely. This is encouraged! Again, it is up to you how you want to include that in your writeup, whether it is a picture of a drawing in your notebook that you took with your phone or something you made digitally, as long as the figure was produced by you personally and is clear enough to see, it's a great idea to include it.

Collaboration

In the real world of algorithms research, collaboration and conversation is an important part of how ideas get generated. So too in this course; we encourage you to work in groups of up to three peers on the homework. For homework questions that require you to write a piece of code, every member of the group can submit the same code. However, for homework questions that require a written solution to a problem, your solution must be written up completely on your own. Except for problems that involve writing code, you are not allowed to share digital or written notes or images of your work in any form with each other. Just like in research, your work must also include acknowledgements of all students with whom you collaborated. Both the physical or digital distribution of information about solutions and the failure to acknowledge collaborators are serious violations of academic integrity.

Admissible Resources

For the homework, it is not admissible to use resources beyond course material and student discussions. In particular, you may not use Wikipedia, or search the Web, or look at any textbook, other than the ones assigned/recommended in the course. Using such additional resources is a violation of academic integrity. If you feel the resources available to you are insufficient, talk to course staff or ask questions on Ed Discussions.

Advice for Success

Algorithms assignments can often require creative insights and complex proofs beyond what previous courses have required. Here are a few tips for succeeding in your writeups:

Course Material Copyright

Course materials posted on this website, Ed Discussions, or CMS, are intellectual property belonging to the author. Students are not permitted to post course materials on the web, share them on any online platform, buy, sell, or distribute them outside of Cornell without the express permission of the instructor. Such unauthorized behavior constitutes academic misconduct.

Course Conduct

We understand that our members represent a rich variety of backgrounds and perspectives. Cornell University is committed to providing an atmosphere for learning that respects diversity. We expect students to communicate in a respectful manner with the instructors, course staff, and fellow students, in a way the honors the unique experiences, values, and beliefs represented by different members of our community.

Academic Integrity

Any violation of academic integrity will be severely penalized. You are allowed to collaborate on the homework to the extent of formulating ideas as a group. However, you are expected to write up (and understand) the homework on your own, and you should acknowledge the names of the students with whom you collaborated.

From Cornell's code of academic integrity:

Absolute integrity is expected of every Cornell student in all academic undertakings. Integrity entails a firm adherence to a set of values, and the values most essential to an academic community are grounded on the concept of honesty with respect to the intellectual efforts of oneself and others. Academic integrity is expected not only in formal coursework situations, but in all University relationships and interactions connected to the educational process, including the use of University resources. […]

A Cornell student's submission of work for academic credit indicates that the work is the student's own. All outside assistance should be acknowledged, and the student's academic position truthfully reported at all times. In addition, Cornell students have a right to expect academic integrity from each of their peers.

Inclusiveness

You should expect and demand to be treated by your classmates and the course staff with respect. You belong here, and we are here to help you learn and enjoy this course. If any incident occurs that challenges this commitment to a supportive and inclusive environment, please let the instructors know so that the issue can be addressed. We are personally committed to this, and subscribe to the Computer Science Department’s Values of Inclusion.

Accommodations

This course complies with the Cornell University policy and equal access laws to ensure that students with disabilities can still participate fully in this course. Requests for academic accommodations should be made during the first three weeks of the semester, except for unusual circumstances, so arrangements can be made as soon as possible. Students are encouraged to register with Student Disability Services, as we may require verification of eligibility to provide appropriate accommodations.