| | SE Basics | |
| 25-Aug-25 | Intro and Course Details | Saikat |
| 27-Aug-25 | Program Analysis 1 | Saikat |
| 1-Sep-25 | Labor Day NO CLASS | |
| 3-Sep-25 | Program Analysis 2 | Saikat |
| 8-Sep-25 | Software Testing | Saikat |
| 10-Sep-25 | Debugging | Saikat |
| | LLM Basics | |
| 15-Sep-25 | ML Models: Intro | Saikat |
| 17-Sep-25 | LLMs for Code (CodeBERT/T5/CodeLlama) | Project Proposal Due |
| | Primary: CodeBERT: A Pre-Trained Model for Programming and Natural Languages | |
| | Secondary: AST-T5: Structure-Aware Pretraining for Code Generation and Understanding | |
| | CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation | |
| | Code Llama: Open Foundation Models for Code | |
| 22-Sep-25 | Post-Training LLM Adaptation | |
| | Primary: Training language models to follow instructions with human feedback | |
| | Secondary: Direct Preference Optimization: Your Language Model is Secretly a Reward Model | |
| | SelfCodeAlign: Self-Alignment for Code Generation | |
| 24-Sep-25 | Fine-Tuning | |
| | LoRA: Low-Rank Adaptation of Large Language Models | |
| | QLoRA: Efficient Finetuning of Quantized LLMs | |
| 29-Sep-25 | Proposal Presentations | |
| 1-Oct-25 | Evaluating LLMs | |
| | Primary: Evaluating Large Language Models Trained on Code | |
| | Secondary: Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation | |
| | ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation | |
| | CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution | |
| | ML4SE | |
| 6-Oct-25 | Fuzzing with LLMs | |
| | Primary: CodaMosa: Escaping coverage plateaus in test generation with pre-trained large language models | |
| | Secondary: Large Language Model assisted Hybrid Fuzzing | |
| | Automated Unit Test Improvement using Large Language Models at Meta | |
| | FuzzGuard: Filtering out unreachable inputs in directed grey-box fuzzing through deep learning | |
| | Can Large Language Models Write Good Property-Based Tests? | |
| 8-Oct-25 | Program Repair with LLMs and Agents | |
| | Primary: AutoCodeRover: Autonomous Program Improvement | |
| | Secondary: AGENTLESS: Demystifying LLM-based Software Engineering Agents | |
| | SWE-bench: Can language models resolve real-world GitHub issues? | |
| 15-Oct-25 | Verification | |
| | Primary: Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification | |
| | Secondary: Baldur: Whole-proof generation and repair with large language models | |
| 20-Oct-25 | Security | |
| | Primary: Large language models for code: Security hardening and adversarial testing | |
| | Secondary: NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness | |
| 22-Oct-25 | Test Generation | Midterm Report Due |
| | Primary: Learning Deep Semantics for Test Completion | |
| | Secondary: Generating Exceptional Behavior Tests with Reasoning Augmented Large Language Models | |
| 27-Oct-25 | Code Translation | |
| | Primary: AlphaTrans: A Neuro-Symbolic Compositional Approach for Repository-Level Code Translation and Validation | |
| | Secondary: Scalable, Validated Code Translation of Entire Projects using Large Language Models | |
| | VERT: Verified Equivalent Rust Transpilation with Large Language Models as Few-Shot Learners | |
| | SE4ML | |
| 29-Oct-25 | Test Oracle Generation | |
| | Primary: TOGA: A neural method for test oracle generation | |
| | Secondary: On learning meaningful assert statements for unit test cases | |
| 3-Nov-25 | Code Generation | |
| | Primary: Monitor-guided decoding of code LMs with static analysis of repository context | |
| | Secondary: SynCode: LLM Generation with Grammar Augmentation | |
| | CodePlan: Repository-level coding using LLMs and planning | |
| 5-Nov-25 | Human-AI collaboration | |
| | Primary: Why AI Agents Still Need You: Findings from Developer-Agent Collaborations in the Wild | |
| | Secondary: Grounded Copilot: How Programmers Interact with Code-Generating Models | |
| 10-Nov-25 | Fuzzing | |
| | Large Language Models are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models | |
| | WhiteFox: White-box Compiler Fuzzing Empowered by Large Language Models | |
| 12-Nov-25 | | |
| 17-Nov-25 | Debugging | |
| | Primary: ReproCopilot: LLM-Driven Failure Reproduction with Dynamic Refinement | |
| | Secondary: Testora: Using Natural Language Intent to Detect Behavioral Regressions | |
| | ChatDBG: Augmenting Debugging with Large Language Models | |
| 19-Nov-25 | Detecting Numerical Errors | |
| | Primary: Automatically Detecting Numerical Instability in Machine Learning Applications via Soft Assertions | |
| | Secondary: Detecting numerical bugs in neural network architectures | |
| 24-Nov-25 | Testing DL Libraries | |
| | Primary: Lightweight Concolic Testing via Path-Condition Synthesis for Deep Learning Libraries | |
| | Secondary: NeuRI: Diversifying DNN Generation via Inductive Rule Inference | |
| | DocTer: Documentation-guided fuzzing for testing deep learning API functions | |
| | DLLens: Testing Deep Learning Libraries via LLM-aided Synthesis | |
| 26-Nov-25 | Thanksgiving break NO CLASS | |
| 1-Dec-25 | Project Presentations | |
| 3-Dec-25 | Project Presentations | |
| 8-Dec-25 | Project Presentations | |
| 15-Dec-25 | No classes | Final Report Due |