CS4740/LING4744/COGST4740/CS5740 Fall 2022

The links below (some in bold) are ordered so that the ones we think you'll check most often during the semester come first. Hence, information unlikely to change throughout the semester, such as policies, is later in the list.

  1. The course schedule, with links to lecture and assignment materials.

  2. We will post some announcements in the lecture slides, and others (when timely broadcasting to the class is needed) on Ed Stem (Discussions). Please set your Ed notifications appropriately.

  3. Need to reach us? Consult our office hours and staff contact info.

  4. Resources

  5. Policies:

    1. Prerequisites: This course is not only an introduction to natural language processing, but also satisfies the practicum/project requirement for CS majors, and the coursework is designed with that in mind. Hence, Fall 2022 has the following prerequisites:
      1. Strong programming skills are important. Three semesters of programming classes are recommended. CS2110 suffices if you could have completed the assignments by yourself.
      2. Python experience is required. Some Python courses at Cornell are listed here: https://www.cs.cornell.edu/courses/cs1110/2022sp/alternatives.html
      3. Elementary probability and familiarity with differentiation. Depending on topic coverage (TBD), linear algebra may also be required.
    2. Workload/grading:

      Four programming assignments (with possible partial milestones) = 17% each; one evening midterm = 16%; one in-person final = 16%. Assignments should be done in pairs, although working individually is OK. Each assignment is expected to take tens of hours, but this time is distributed over multiple weeks.
      However, since the exams test conceptual, individual-level knowledge, students must receive at least a C- on both exams to receive a C- or above in the course.

      1. For each "4740" piece of graded coursework, we make preliminary score-to-grade conversions. We do not use the same absolute cutoffs across different assignments/exams for grade-levels, so as to adjust for the difficulty of the exams and assignments each semester. Also, because you are not in competition with each other, student course grades are not dependent on how other students do. We do not report medians or means: as a wise former course-staff member said, "Reporting the median is guaranteed to make at least half the class feel bad", even if everyone did well.
      2. Students enrolled in CS5740 complete an additional component for each 4740 homework, to be done individually. Scores on these components are converted to "satisfactory", "borderline", and "unsatisfactory". If a student receives two "borderline"s or one "unsatisfactory" among the four homeworks, we reserve the right to lower the student's letter grade as computed for 4740 by the equivalent of a "level", for example, from a B to a B-.
      3. Regrade requests: Communication regarding regrade requests must be done only in writing via Gradescope/CMS (as appropriate): given the number of staff members involved in handling regrade requests, we need records of all discussions.
        We want to give grades that accurately represent our assessment of your understanding of the course material. Hence, if you are given a lower score than you should have been, you should absolutely bring it to our attention via the mechanisms just described. However, we must explicitly mention an additional consequence of the importance of grade accuracy: if we notice that you have been assigned more points than you should have been, we are duty-bound to correct such scores downward to the correct value.
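As a quick check on the arithmetic in the Workload/grading item above, here is a minimal Python sketch of how the stated weights combine. The names `WEIGHT_PERCENT`, `weighted_total`, and `eligible_for_c_minus_or_above` are hypothetical, as are the example scores. The course's actual score-to-grade conversions are set separately for each assignment and exam and are not modeled here.

```python
# Weights from the Workload/grading policy: four programming assignments
# at 17% each, one midterm at 16%, one final at 16%.
WEIGHT_PERCENT = {
    "assignment_1": 17,
    "assignment_2": 17,
    "assignment_3": 17,
    "assignment_4": 17,
    "midterm": 16,
    "final": 16,
}


def weighted_total(scores):
    """Combine per-component scores (fractions in [0, 1]) into a percentage.

    This is only the weighted sum; it is an assumption that scores are
    expressed as fractions, and no letter-grade cutoffs are applied.
    """
    missing = set(WEIGHT_PERCENT) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHT_PERCENT[name] * scores[name] for name in WEIGHT_PERCENT)


def eligible_for_c_minus_or_above(midterm_at_least_c_minus, final_at_least_c_minus):
    """The policy requires at least a C- on BOTH exams for a course grade of C- or above."""
    return midterm_at_least_c_minus and final_at_least_c_minus


# The weights account for the whole grade.
print(sum(WEIGHT_PERCENT.values()))  # 100

# Hypothetical scores, for illustration only.
example = {
    "assignment_1": 0.9,
    "assignment_2": 0.9,
    "assignment_3": 0.9,
    "assignment_4": 0.9,
    "midterm": 0.8,
    "final": 0.7,
}
print(round(weighted_total(example), 1))  # 85.2
```

Note that the exam condition is separate from the weighted sum: a high weighted total does not by itself satisfy the requirement of at least a C- on both exams.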
    3. Deadlines policy: We do not have slip days, and there is no "you can submit late for a small penalty": you need to hit the submission deadlines. But if there are extenuating circumstances, please email us and we can talk. (Still submit what you have before the deadline, so we have an indication of your progress at that point.)
    4. Exam conflicts: We will open a CMS "quiz" to submit make-up exam requests roughly two weeks before the exams. We cannot determine the dates/locations too far ahead of time, in part because there are many classes with currently-unknown demand that are looking to reserve rooms.
    5. SDS accommodations: The instructor(s) have online access to SDS letters regarding accommodations for exams and other course matters, and will honor these accommodations. As recommended by the SDS office, we do ask that for each homework/exam, you let us know beforehand in a timely fashion whether you wish to apply your accommodations. For homeworks, email suffices; for exams, you should use the aforementioned CMS "quiz" to register.
    6. Collaboration and preserving academic integrity:
      1. Groups (a.k.a. "teams" or "partners") of two are allowed on all assignments except for CS5740 add-on assignments. You can partner with anyone in the class (regardless of whether they or you are registered for a grad or undergrad version of the class), but we strongly suggest that you let potential partners know (1) whether you are taking the course for a letter grade or S/U, and (2) what your preferred working hours are (morning, afternoon, night). You do not have to have the same partner on each assignment.

        If you want to use git with your partner, you should use the Cornell COECIS GitHub, which allows you to create a private repository. Other versions of GitHub may make your private repository public without your knowledge.

        If there is a need for a "group divorce" (some work was done jointly but the two of you no longer wish to work together), please contact us for further instructions.

      2. Below, "you" means you and, if there is one, your official group partner.

        Until grades for all students' submissions for the assignment have been posted (in case there are people with extensions and makeups) ...

        (1) You must never look at, access or possess any portion of another group's program(s) in any form. (This includes lines of code written on a whiteboard or described verbally.)

        (2) You must never show or share any portion of your program(s) in any form to anyone except a member of the course staff. As a consequence, do not post any part of your programs to Ed Discussions. (Posting error messages that contain snippets of code is OK.)

        (3) You must not ask for or copy solutions from outside sources (such as StackOverflow or code autogenerators).

        (4) You should specifically acknowledge by name all help you received, whether or not it was "legal" according to rules (1)-(3) above. This is also known as "citing your sources". Exception: you do not need to acknowledge the course staff (although we appreciate it if you do!).
        Example: in an assignment file, the header could read "Sources/people consulted: discussed strategy for process_strings() with Claire Cardie and Hakim Weatherspoon".

        Of particular note:
        • The minimum penalty in this course for receiving unauthorized help (upon a guilty finding for an academic integrity violation): besides the mandated letter to the student's college, a negative score on the affected work (this is more than just a grade deduction, where one might retain some points). Hence, a student who submits fraudulent work receives less credit than a student who didn't turn the work in at all.
        • The minimum penalty for giving unauthorized help (upon a guilty finding for an academic integrity violation): the mandated letter to the student's college. Please don't put your friends at risk by asking them for unauthorized help.
        • We plan to use software-similarity checkers for each assignment.

        If you turn in someone else's work for course credit, and forthrightly acknowledge you are doing so, you are not acting dishonestly and are not violating academic integrity, but that also does not show us you have learned anything. Thus, you may not receive grading credit, but you would not undergo academic integrity hearings. If, on the other hand, you violate academic integrity by claiming someone else's work as yours or by giving unauthorized help, then the academic integrity hearing process will be triggered, which can incur both grade penalties and storage of records by your College. For more on Cornell's policies and procedures, see the Dean of the Faculty's Academic Integrity Website.

  6. Course description: This course constitutes an introduction to natural language processing (NLP), the goal of which is to enable computers to use human languages as input, output, or both. NLP is at the heart of many of today's most exciting technological achievements, including machine translation, automatic conversational assistants and Internet search. Possible topics include methods for handling underlying linguistic phenomena (e.g., syntactic analysis, word sense disambiguation and discourse analysis) and vital emerging applications (e.g., machine translation, sentiment analysis, summarization and information extraction).

Acknowledgments: the collaboration policies are drawn from those posted for CS1110 Spring 2022. Image from https://unsplash.com/photos/AmLssHPF58k.