Schedule

Below is the tentative course schedule. It is subject to change at any time.

Date Topic Presenter
  SE Basics  
25-Aug-25 Intro and Course Details Saikat
27-Aug-25 Program Analysis 1 Saikat
1-Sep-25 Labor Day NO CLASS  
3-Sep-25 Program Analysis 2 Saikat
8-Sep-25 Software Testing Saikat
10-Sep-25 Debugging Saikat
  LLM Basics  
15-Sep-25 ML Models: Intro Saikat
17-Sep-25 LLMs for Code (CodeBERT/T5/Code Llama) Project Proposal Due
  Primary: CodeBERT: A Pre-Trained Model for Programming and Natural Languages  
  Secondary: AST-T5: Structure-Aware Pretraining for Code Generation and Understanding  
  CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation  
  Code Llama: Open Foundation Models for Code  
22-Sep-25 Post-Training LLM Adaptation  
  Primary: Training language models to follow instructions with human feedback  
  Secondary: Direct Preference Optimization: Your Language Model is Secretly a Reward Model  
  SelfCodeAlign: Self-Alignment for Code Generation  
24-Sep-25 Fine-Tuning  
  LoRA: Low-Rank Adaptation of Large Language Models  
  QLoRA: Efficient Finetuning of Quantized LLMs  
29-Sep-25 Proposal Presentations  
1-Oct-25 Evaluating LLMs  
  Primary: Evaluating Large Language Models Trained on Code  
  Secondary: Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation  
  ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation  
  CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution  
  ML4SE  
6-Oct-25 Fuzzing with LLMs  
  Primary: CodaMosa: Escaping coverage plateaus in test generation with pre-trained large language models  
  Secondary: Large Language Model assisted Hybrid Fuzzing  
  Automated Unit Test Improvement using Large Language Models at Meta  
  FuzzGuard: Filtering out unreachable inputs in directed grey-box fuzzing through deep learning  
  Can Large Language Models Write Good Property-Based Tests?  
8-Oct-25 Program Repair with LLMs and Agents  
  Primary: AutoCodeRover: Autonomous Program Improvement  
  Secondary: AGENTLESS: Demystifying LLM-based Software Engineering Agents  
  SWE-bench: Can language models resolve real-world GitHub issues?  
15-Oct-25 Verification  
  Primary: Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification  
  Secondary: Baldur: Whole-proof generation and repair with large language models  
20-Oct-25 Security  
  Primary: Large language models for code: Security hardening and adversarial testing  
  Secondary: NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness  
22-Oct-25 Test Generation Midterm Report Due
  Primary: Learning Deep Semantics for Test Completion  
  Secondary: Generating Exceptional Behavior Tests with Reasoning Augmented Large Language Models  
27-Oct-25 Code Translation  
  Primary: AlphaTrans: A Neuro-Symbolic Compositional Approach for Repository-Level Code Translation and Validation  
  Secondary: Scalable, Validated Code Translation of Entire Projects using Large Language Models  
  VERT: Verified Equivalent Rust Transpilation with Large Language Models as Few-Shot Learners  
  SE4ML  
29-Oct-25 Test Oracle Generation  
  Primary: TOGA: A neural method for test oracle generation  
  Secondary: On learning meaningful assert statements for unit test cases  
3-Nov-25 Code Generation  
  Primary: Monitor-guided decoding of code LMs with static analysis of repository context  
  Secondary: SynCode: LLM Generation with Grammar Augmentation  
  CodePlan: Repository-level coding using LLMs and planning  
5-Nov-25 Human-AI Collaboration  
  Primary: Why AI Agents Still Need You: Findings from Developer-Agent Collaborations in the Wild  
  Secondary: Grounded Copilot: How Programmers Interact with Code-Generating Models  
10-Nov-25 Fuzzing  
  Large Language Models are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models  
  WhiteFox: White-box Compiler Fuzzing Empowered by Large Language Models  
12-Nov-25    
17-Nov-25 Debugging  
  Primary: ReproCopilot: LLM-Driven Failure Reproduction with Dynamic Refinement  
  Secondary: Testora: Using Natural Language Intent to Detect Behavioral Regressions  
  ChatDBG: Augmenting Debugging with Large Language Models  
19-Nov-25 Detecting Numerical Errors  
  Primary: Automatically Detecting Numerical Instability in Machine Learning Applications via Soft Assertions  
  Secondary: Detecting numerical bugs in neural network architectures  
24-Nov-25 Testing DL Libraries  
  Primary: Lightweight Concolic Testing via Path-Condition Synthesis for Deep Learning Libraries  
  Secondary: NeuRI: Diversifying DNN Generation via Inductive Rule Inference  
  DocTer: Documentation-guided fuzzing for testing deep learning API functions  
  DLLens: Testing Deep Learning Libraries via LLM-aided Synthesis  
26-Nov-25 Thanksgiving break NO CLASS  
1-Dec-25 Project Presentations  
3-Dec-25 Project Presentations  
8-Dec-25 Project Presentations  
15-Dec-25 No classes Final Report Due