Kilian Q. Weinberger

Associate Professor of Computer Science


Machine Learning Seminar


When: Wednesdays 2-4pm
Where: Jolley 508
Who: Everybody is welcome. (It is recommended that you have already taken cse517a (ML) and/or cse511a (A.I.).)

Format: Our current format is that at each meeting a team of two students presents a full lecture on a specific topic of their choice.





Students should follow these guidelines for their lecture:

Introduce the learning problem:  (10 minutes - limited or no equations)
(This is very important and currently nobody spends enough time on it.)
Data and Task: 
- What is the data? [e.g. Vectorial data, text, images, time-series,….]
- What is the goal? [e.g. Regression, classification, modeling, …]
- Draw a figure if possible. 
- Why is it hard? [e.g. Noise, low sample size, missing features, …]
- What is specific about this problem setting?  [e.g. Focus on very high dimensions, data arrives instance by instance, you can query an oracle, labels are noisy, …]
- What assumptions do you make? [e.g. do you have the test-data during training time?, you are able to store the training data, …]

Example Application: 
- Provide several intuitive example applications. [e.g. predicting the stock market, doctor-patient diagnosis, predicting topics for Wikipedia articles, ...]
- Choose one primary example that you will carry through the entire presentation as a running example. [Choose the most intuitive one, ideally one for which you have data, so that you can use it in your demo.]
- Consider a pre-demo: it can be helpful to start with an illustration of the setting and of what you want to achieve.

Mathematical background: (15 minutes)
Setting/ Notation:
- Introduce your notation (follow ML standards!!!) [e.g. x=data, y=label, n=#training points, d=#dimensions, \Sigma=covariance matrix, …]
- Relate each piece of notation with your example  [e.g. here x is a wikipedia entry, y is the topic, each feature is a word, ….]
Baseline:
- What is the simple and naive but surprisingly successful approach (there always is one -- always) [e.g. Random Forests, TF-IDF, locally weighted regression, OLS, …]
- Explain this approach so that everybody gets it  [i.e. don't assume any prior knowledge]
- Draw a figure if possible.
- Relate the simple approach to your running example
{Everybody should follow you up to this point. If they fall asleep now, they already got their money's worth: they understood an important problem and a simple yet effective method to solve it.}
Limitation:
- Explain when and where the simple yet effective baseline fails 
- Relate this limitation to your running example application (i.e. give an intuitive example where it fails)
[e.g. if a Wikipedia article is too short, TF-IDF similarities are bad; for handwritten digits the decision boundary is non-linear and OLS fails, ...]
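As a concrete instance of the baseline-then-limitation pattern above, here is a small sketch (my own illustration in Python/NumPy, not part of the seminar materials) of OLS failing on XOR-style data: because the classes are not linearly separable, the least-squares solution degenerates to the zero vector and the linear model predicts 0 for every point.

```python
import numpy as np

# XOR-style toy data: the two classes are not linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1], dtype=float)

# Ordinary least squares with a bias column: w = argmin_w ||Xb w - y||^2
Xb = np.hstack([np.ones((4, 1)), X])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# The least-squares solution is w = 0, so the linear model predicts 0
# everywhere, i.e. it carries no information about the labels at all.
print(w)       # approximately [0, 0, 0]
print(Xb @ w)  # approximately [0, 0, 0, 0]
```

This is exactly the kind of two-minute blackboard-or-laptop moment that makes the limitation stick before you bring out the big gun.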

Meat and Potatoes: (30 minutes)
High Level:
- Introduce your elegant solution to this problem on a high level [do not introduce any new equations yet]
- Draw a figure that explains this high-level concept. 
- Relate this solution to your running example. [e.g. "Now we make OLS non-linear by applying a linear classifier locally", "we represent articles as the topics which are associated with its words - so short articles obtain a similarly rich representation as long ones.", …] 
- Show how this intuition solves the problem in your particular example. 
The big gun:
- Introduce your elegant solution to this problem in terms of mathematics. 
- Relate each step to your running example. [e.g. the covariance matrix captures the interaction between articles, the Dirichlet distribution returns mixing coefficients over Wikipedia topics, ...]
- Do not assume that the audience knows anything about this topic. 
- Do NOT skip steps even if they seem trivial. You would rather have 8 people in the audience who think the intermediate steps are obvious than lose 4 because they got stuck on some silly notational detail.
Awesomeness:
- All algorithms that we talk about are absolutely awesome. Explain why this one is so incredible. Explain why people in the audience should name their first-born daughter/son after this algorithm.
- What is so elegant about it? [e.g. Does it converge to the exact solution as n-> infinity, can it be computed in closed-form, does it combine two seemingly unrelated topics, …]
Limitations: 
- What is the price of the awesomeness [e.g. is it a lot slower, does it require a lot of memory, …]

Demo (10 minutes):
- Showcase your awesome demo.
- Do not download someone else's code; the expectation is that you implement the algorithm yourself. [almost all ML algorithms are less than 20 lines of Matlab]
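To make the "less than 20 lines" claim concrete, here is a hedged sketch (in Python/NumPy rather than Matlab, as my own illustration) of a complete k-nearest-neighbor classifier that fits comfortably in that budget:

```python
import numpy as np

def knn_predict(Xtr, ytr, Xte, k=3):
    """k-nearest-neighbor classification: for each test point, take a
    majority vote over the labels of its k closest training points."""
    # Pairwise squared Euclidean distances: (test points) x (training points).
    d2 = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, :k]   # indices of the k nearest neighbors
    votes = ytr[nn]                      # their labels
    # Majority vote per test point.
    return np.array([np.bincount(v).argmax() for v in votes])

# Toy usage: two well-separated clusters.
Xtr = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
ytr = np.array([0, 0, 1, 1])
Xte = np.array([[0.05, 0.1], [5.0, 4.9]])
print(knn_predict(Xtr, ytr, Xte, k=3))  # [0 1]
```

Writing it yourself at this scale is exactly what gives you the intuition you need for the demo.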
- Find some interesting aspects about the algorithm, and highlight them with specific demos. [e.g. I really liked the Gaussian Process Demo from today]



General advice:
- Imagine the target audience is yourself on the first day of your PhD. Remember how little you knew?
- Practice your talk several times by yourself or with your partner
- When your partner practices, ask questions about everything that is unclear to you. Make sure both of you gain a deep understanding of the material. 
- If there is anything that is unclear, DO NOT gloss over it - instead dig deep; this is one of the rare opportunities to gain a profound insight.
- Don't be scared to admit that you didn't understand something. It is better to figure something out as a group than to confuse people. 
- Your IQ will drop by 50% once you are in front of the blackboard. Do not plan to do anything on the fly. 
- Do not think that other people know everything. Most likely you are the student with the most knowledge about this topic in the room. 
- Do NOT assume that people pay attention all the time or remember what you said 5 minutes ago. As a rule of thumb: At any given time at least one person is daydreaming about Justin Bieber or Lady Gaga. 
- There is nothing wrong with being over-repetitive. Never say "and here we plug x into f and obtain y". Always say "Here we plug the feature vector x into the classifier f and obtain a prediction y". 
- Be aware that people write down what you say. There is a time-lag. Do not erase what you just wrote. Give people a few minutes to catch up with you before you explain something important.
- Less is more. My rule is that 4 single-sided pages of handwritten notes easily fill 90 minutes. My handwriting is pretty big.
- If you are worried about running out of time, plan ahead what you could skip. Do not explain things faster.
- It is very hard to be too obvious. If you include many simple steps you only make people feel smart about themselves. If you skip them, you'll make some people give up and start daydreaming. 
- Nobody can absorb new material for longer than 4 minutes. (Try it out: listen to a new piece of classical music that you've never heard before and try to really pay attention - after 3 1/2 minutes you'll find yourself reading the news or searching Wikipedia for random facts, like what currency people use in Somalia. I just checked; it is the Somali Shilling.)
- Explain everything three times. Once when you write it down. Wait until people copy it. Explain it again with different words. Then relate it to your example and explain it in the specific terms of your running example. 
- Ask people if there are questions. 
- Good talks are layered. Explain things several times on different levels. E.g. One for the experts in terms of pure mathematics and one that is purely intuitive for those who are happy with only the big picture.