Assignment 1

(updates will be posted on Piazza).

Task: Propose a research idea related to one of the readings below and execute a pilot empirical study using one of the listed datasets. Most crucial is that (a) your idea is interesting, and (b) your pilot empirical study demonstrates that you can quickly evaluate feasibility and estimate the chances of an interesting result.

It is neither required nor expected that your proposal for this assignment will relate to your final course project.

Please strive to post your initial ideas well in advance of the actual due date (a suggested goal: Tuesday Aug. 28, 11:59pm). Posting early (a) gives your classmates time to read your proposal and post feedback, and (b) since you are encouraged to work in groups, makes it easier to link up with classmates who have similar interests.

After posting your proposal, continue to monitor and participate on the course discussion site. After all, your classmates have read the same papers and are using the same data, so we have a lot of common ground. Example things to post: feedback on other people's proposals; some oddity of the datasets you've found that is worth alerting others to; unexpected early results that are interesting or that you need help interpreting.

Basically, I would like us all to act as a team; we're all in this together!

The two required readings

  1. Excerpts from anaesthetica's “Attacked from within”, 2009.
  2. Justine Zhang, Ravi Kumar, Sujith Ravi, and Cristian Danescu-Niculescu-Mizil, 2016. Conversational flow in Oxford-style debates. NAACL, pp.136–141.

These readings were chosen because they are thought-provoking, accessible, short, and together represent a wide range of possibilities.

The two datasets — you are required to use one.

  1. Cornell ChangeMyView data, November 2016 version
  2. Reddit coarse discourse dataset

    Data format

    If you program in Python 3, you are strongly encouraged to transform the dataset you are working with into ConvoKit format. This will allow you to (a) directly use the ConvoKit functionality; (b) share code with (future) teammates and other groups; (c) contribute to ConvoKit.
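As a rough illustration of what such a transformation involves, here is a minimal sketch that maps raw comment records onto utterance-style dictionaries. It uses only the standard library and does not call ConvoKit itself. The input field names (`id`, `author`, `parent_id`, `created_utc`, `body`) are hypothetical placeholders, not the actual schema of either dataset, and the output field names reflect one reading of ConvoKit's utterance layout, so check them against the documentation for the ConvoKit version you are using.

```python
import json


def to_utterances(comments):
    """Map raw comment records to ConvoKit-style utterance dicts.

    Input keys (id, author, parent_id, created_utc, body) are hypothetical;
    rename them to match the dataset you actually load.
    """
    by_id = {c["id"]: c for c in comments}

    def root_of(comment_id):
        # Follow parent links until we reach a comment whose parent is
        # missing or absent from the data; that comment is the thread root.
        current = by_id[comment_id]
        while current.get("parent_id") in by_id:
            current = by_id[current["parent_id"]]
        return current["id"]

    utterances = []
    for c in comments:
        utterances.append({
            # Output keys are an assumption about ConvoKit's layout.
            "id": c["id"],
            "speaker": c["author"],
            "conversation_id": root_of(c["id"]),
            "reply-to": c.get("parent_id"),
            "timestamp": c.get("created_utc"),
            "text": c["body"],
            "meta": {},
        })
    return utterances


def to_jsonl(utterances):
    """Serialize utterances as one JSON object per line."""
    return "\n".join(json.dumps(u) for u in utterances)


# Tiny made-up thread: c1 is the original post, c2 replies to c1, c3 to c2.
sample = [
    {"id": "c1", "author": "alice", "parent_id": None,
     "created_utc": 100, "body": "Original post"},
    {"id": "c2", "author": "bob", "parent_id": "c1",
     "created_utc": 160, "body": "First reply"},
    {"id": "c3", "author": "alice", "parent_id": "c2",
     "created_utc": 220, "body": "Reply to the reply"},
]

if __name__ == "__main__":
    print(to_jsonl(to_utterances(sample)))
```

Keeping the mapping in one small function makes it easy to share across groups: only the input key names need to change when switching between the two datasets.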

Collaboration

Teamwork is encouraged. Groups of any size can be formed; each group jointly submits a single project report at the end on the official course management system, CMS. However, each person remains individually responsible for posting feedback on other people's/groups' proposals.

There are further notes on how to find/work as a group below.

Due dates

All deadlines refer to 5:00pm unless otherwise specified.

  1. Monday Aug. 27:
    1. Enroll on the course Piazza page http://piazza.com/cornell/Fall2018/cs6742. The Piazza password will be provided on the first day of class (and later listed in the CMS class description).
  2. Wednesday Aug. 29, 2:30pm (Note the earlier-than-5pm deadline, and, as mentioned in the "Task" description above, aim for an earlier date of Tuesday Aug. 28, 11:59pm):
    1. Post your proposal on Piazza.
  3. Friday August 31: form groups on CMS. CMS group formation requires invitations and acceptance of invitations via the system, i.e., action by two people per person added; please check the official CMS documentation or this more graphically-oriented guide for instructions. I need the group information from CMS to schedule the group presentations.
  4. Thursday Sept. 6, before class:
    1. Check back on Piazza for any comments on your proposal, and add, as replies, any suggestions you have on other people's proposals. Ideally, you will continually monitor the site for updates to your or other people's proposals.
    2. Be prepared to informally discuss in class how things are going. For example, any preliminary observations about the data? No formal presentation materials are required.
  5. Monday Sept. 17: Submit a project report on CMS. One group = one CMS submission: any group member can upload a version, which will overwrite any previous versions by any other members of the group.
    Required information: (a) the overall research problem you proposed; (b) relation of your research problem to the reading(s) (this description should provide evidence that you read the relevant parts of the readings carefully); (c) proposed techniques; steps employed to process/clean/select data; (d) results (probably preliminary, possibly negative); (e) what you learned; (f) a list of the roles that each member of the group played, if there is more than one person in your group; (g) if you collaborated a bit with people outside your group, an acknowledgment of those people by name and an explanation of their contribution.
  6. Thursday Sept. 20, in class: Group presentations. You can bring handouts (often most effective for discussions, since people can refer to things out of order) or project slides off a laptop. If the latter, bring a spare copy of your presentation on a flash drive and email a copy.

Academic Integrity

Academic and scientific integrity compels one to properly attribute to others any work, ideas, or phrasing that one did not create oneself. To do otherwise is fraud.

Certain points deserve emphasis here. In this class, talking to and helping others is strongly encouraged. You may also, with attribution, use code from other sources. The easiest rule of thumb is: acknowledge the work, contributions, ideas, words, and wordings of others. Do not copy or slightly reword portions of papers, Wikipedia articles, textbooks, other students' work, Stack Overflow answers, something you heard from a talk or a conversation or saw on the Internet, or anything else, really, without acknowledging your sources. See "Acknowledging the Work of Others" in The Essential Guide to Academic Integrity at Cornell and http://www.theuniversityfaculty.cornell.edu/AcadInteg/ for more information and useful examples.

This is not to say that you can receive course credit for work that is not your own — e.g., taking someone else's report and putting your name at the top, next to the other person(s)' names. However, violations of academic integrity (e.g., fraud) undergo the academic-integrity hearing process on top of any grade penalties imposed, whereas not following the rules of the assignment “only” risks grade penalties.