# CS 3410: Computer System Organization and Programming

Prof. Kavita Bala and Prof. Hakim Weatherspoon

CS 3410, Spring 2014

**Computer Science**, **Cornell University**

# **Course Objective**

Bridge the gap between hardware and software
- How a processor works
- How a computer is organized

Establish a foundation for building higher-level applications
- How to understand program performance
- How to understand where the world is going

# Where did it begin?

### **Electrical Switch**
- On/Off
- Binary

### **Transistor**
The first transistor, on a workbench at AT&T Bell Labs in 1947

# Moore's Law

#### 1965
The number of transistors that can be integrated on a die would double every 18 to 24 months (i.e., grow exponentially with time).

### Amazingly visionary
- 2,300 transistors, 1 MHz clock (Intel 4004), 1971
- 16 million transistors (UltraSPARC III)
- 42 million transistors, 2 GHz clock (Intel Xeon), 2001
- 55 million transistors, 3 GHz, 130 nm technology, 250 mm<sup>2</sup> die (Intel Pentium 4), 2004
- 290+ million transistors, 3 GHz (Intel Core 2 Duo), 2007
- 721 million transistors, 2 GHz (Nehalem), 2009
- 1.4 billion transistors, 3.4 GHz (Intel Haswell, quad core), 2013

*Figure: Microprocessor Transistor Counts 1971-2011 & Moore's Law*

*Figure: Processor Performance Increase*

# **Parallelism**

### Then and Now

Image source: http://techguru3d.com/4th-gen-intel-haswell-processors-architecture-and-lineup/

#### The first transistor
- One workbench at AT&T Bell Labs
- 1947
- Bardeen, Brattain, and Shockley

#### An Intel Haswell
- 1.4 billion transistors
- 177 square millimeters
- Four processing cores

### Then and Now

#### The first transistor
- One workbench at AT&T Bell Labs
- 1947
- Bardeen, Brattain, and Shockley

#### Galaxy Note 3
- 8 processing cores

# **Parallelism**

### **GPU-type computation offers higher GFlops**

(Source: Sam Naffziger, AMD)

# GPUs: Faster than Moore's Law

*Figure: graph courtesy of Professor John Poulton (from Eric Haines)*

### Supercomputers
- Petaflops (10<sup>15</sup> floating-point operations per second)
- GPUs/multicore/100s-1000s of cores

**China's Tianhe-2 Supercomputer Maintains Top Spot** (2013-11-18)

MANNHEIM, Germany; BERKELEY, Calif.; and KNOXVILLE, Tenn. - Tianhe-2, a supercomputer developed by China's National University of Defense Technology, retained its position as the world's No. 1 system with a performance of 33.86 petaflop/s (quadrillions of calculations per second) on the Linpack benchmark, according to the 42nd edition of the twice-yearly TOP500 list of the world's most powerful supercomputers. The list was announced Nov. 18 at the SC13 conference in Denver, Colo.

Titan, a Cray XK7 system installed at the Department of Energy's (DOE) Oak Ridge National Laboratory, remains the No. 2 system. It achieved 17.59 Pflop/s on the Linpack benchmark. Titan is one of the most energy-efficient systems on the list, consuming a total of 8.21 MW and delivering 2.143 gigaflops/W.

Sequoia, an IBM BlueGene/Q system installed at DOE's Lawrence Livermore National Laboratory, is again the No. 3 system. It was first delivered in 2011 and achieved 17.17 Pflop/s on the Linpack benchmark.

# Why?
- Parallelism
- Pipelining

# Programmable Hardware
- Started in 1999
- Flexible, programmable
  - Vertex, Geometry, Fragment Shaders
- And much faster, of course
  - 1999 GeForce 256: 0.35 Gigapixel/s peak fill rate
  - 2001 GeForce3: 0.8 Gigapixel/s peak fill rate
  - 2003 GeForce FX Ultra: 2.0 Gigapixel/s peak fill rate
  - ATI Radeon 9800 Pro: 3.0 Gigapixel/s peak fill rate
  - 2006 NV60: ... Gigapixel/s peak fill rate
  - 2009 GeForce GTX 285: 10 Gigapixel/s peak fill rate
  - 2011
    - GeForce GTX 590: 56 Gigapixel/s peak fill rate
    - Radeon HD 6990: 2x26.5 Gigapixel/s peak fill rate
  - 2012
    - GeForce GTX 690: 62 Gigapixel/s peak fill rate

# How the class is organized

Instructors: Kavita Bala and Hakim Weatherspoon (kb@cs.cornell.edu, hweather@cs.cornell.edu)

#### Lecture:
- Tu/Th 1:25–2:40pm
- Statler Auditorium

#### Lab sections:
- Start next week
- Carpenter 104 (Blue room)
- Carpenter 235 (Red room)
- Upson B7

### Who am I?

#### Prof. Kavita Bala
- Undergraduate: IIT Bombay
- PhD: MIT
- Started in compilers and systems
- Moved to graphics
- Also work on parallel processing in graphics

# Autodesk 360 Cloud Render

### Who am I?

#### Prof. Hakim Weatherspoon
- (Hakim means doctor, wise, or professor in Arabic)
- Background in education
  - Undergraduate: University of Washington
    - Played varsity football
      - Some teammates collectively make \$100s of millions
      - I teach!!!
  - Graduate: University of California, Berkeley
    - Some classmates collectively make \$100s of millions
    - I teach!!!
- Background in operating systems
  - Peer-to-peer storage
    - Antiquity project: secure wide-area distributed system
    - OceanStore project: store your data for 1000 years
  - Network overlays
    - Bamboo and Tapestry: find your data around the globe
  - TinyOS
    - Early adopter in 1999, but ultimately chose the P2P direction

### Who am I?

#### Cloud computing/storage
- Optimizing a global network of data centers

### **Course Staff**

#### cs3410-staff-l@cs.cornell.edu

#### Lab/Homework TAs

| Name | Email | Program |
|---|---|---|
| Paul Upchurch | paulu@cs.cornell.edu | PhD |
| Zhiming Shen | zshen@cs.cornell.edu | PhD |
| Pu Zhang | pz59@cornell.edu | PhD |
| Andrew Hirsch | akh95@cornell.edu | PhD |
| Emma Kilfoyle | efk23@cornell.edu | MEng |
| Roman Averbukh | raa89@cornell.edu | MEng |
| Lydia Wang | lw354@cornell.edu | MEng |
| Favian Contreras | fnc4@cornell.edu | |
| Victoria Wu | vw52@cornell.edu | |
| Detian Shi | ds629@cornell.edu | |
| Maxwell Dergosits | mad293@cornell.edu | |
| Jimmy Zhu | jhz22@cornell.edu | |
| Antoine Pourchet | app63@cornell.edu | |
| Brady Jacobs | bij4@cornell.edu | |
| Kristen Tierney | kjt54@cornell.edu | |
| Gary Zibrat | gdz4@cornell.edu | |
| Naman Agarwal | na298@cornell.edu | |
| Sanyukta Inamdar | sri7@cornell.edu | |
| Sean Salmon | ss2669@cornell.edu | |
| Ari Karo | aak82@cornell.edu | |
| Brennan Chu | bc385@cornell.edu | |

#### Administrative Assistant
- Molly Trufant (mjt264@cs.cornell.edu)

# Pre-requisites and scheduling

**CS 2110 is required** (Object-Oriented Programming and Data Structures)
- Must have satisfactorily completed CS 2110
- Cannot take CS 2110 concurrently with CS 3410

**CS 3420 (ECE 3140)** (Embedded Systems)
- Take either CS 3410 or CS 3420
  - Both satisfy CS and ECE requirements
- However, you need ENGRD 2300 to take CS 3420

**CS 3110** (Data Structures and Functional Programming)
- Not advised to take CS 3110 and CS 3410 together

#### CS 2043 (UNIX Tools and Scripting)
- A 2-credit course that will greatly help with CS 3410
- Meets Mon, Wed, Fri at 11:15am–12:05pm in Hollister (HLS) B14
- Class started yesterday and ends March 5<sup>th</sup>

#### CS 2022 (Introduction to C) and CS 2024 (C++)
- 1- to 2-credit courses that will greatly help with CS 3410
- Unfortunately, offered in the fall, not spring
- Instead, we will offer a primer on C during lab sections and include some C questions in homeworks

# Schedule (subject to change)

(K = Prof. Kavita Bala, H = Prof. Hakim Weatherspoon)

| Week | Date (Tue) | Lecture # | Lecturer | Lecture Topic | HW | Prelim (evening) | Lab Topic | Lab/Project |
|---|---|---|---|---|---|---|---|---|
| 1 | 23-Jan | 1 | K&H | Intro | | | | |
| 2 | 28-Jan | 2 | H | Logic & Gates | | | Logisim | Lab 0: Adder/Logisim intro handout |
| | | 3 | K | Numbers & Arithmetic | HW1: Logic, Gates, Numbers, & Arithmetic | | | |
| 3 | 4-Feb | 4 | H (KB out) | State & FSMs | | | ALU/Design Docs | Lab 1: ALU handout (design doc due in one week, Lab 1 due in two weeks) |
| | | 5 | H (KB out) | Memory | | | | |
| 4 | 11-Feb | 6 | K | Simple CPU | | | FSM | Lab 2: (IN-CLASS) FSM handout |
| | | 7 | K | CPU Performance & Pipelines | HW2: FSMs, Memory, CPU, Performance, and pipelined MIPS | | | |
| | 18-Feb | | (H out) | Winter Break | | | MIPS | Proj 1: MIPS 1 handout |
| | | 8 | K (H out) | Pipelined MIPS | | | | |
| 5 | 25-Feb | 9 | | Pipeline Hazards | | | C for Java Programmers | Proj 1: Design Doc due |
| | | 10 | K | Control Hazards & ISA Variations | | | | |
| 6 | 4-Mar | 11 | K | RISC & CISC & Prelim 1 Review | | Prelim 1 | C lecture 2 | C lecture 2 |
| | | 12 | H | Calling Conventions | HW3: Calling Conventions, RISC, CISC, Linkers, and more calling conventions | | | |
| 7 | 11-Mar | 13 | H | Calling Conventions | | | MIPS 2 | Proj 2: MIPS 2 handout |
| | | 14 | | Calling Conventions | | | | |
| 8 | 18-Mar | 15 | H | Linkers | | | Intro to UNIX/Linux (ssh, gcc, how to tunnel) | Proj 2: Design Doc due |
| | | 16 | | Caches 1 | | | | |
| 9 | 25-Mar | 17 | | Caches 2 | | | C lecture 3 | C lecture 3 |
| | | 18 | | Caches 3 | | | | |
| | 1-Apr | | (H out) | Spring Break | | | | |
| 10 | 8-Apr | 19 | | Virtual Memory 1 | | | Stack Smashing | Lab 3: Buffer Overflows handout |
| | | 20 | | Virtual Memory 2 | | | | |
| 11 | 15-Apr | 21 | H | Traps | HW4: Virtual Memory, Caches, Traps, Multicore, Synchronization | | Caches | Proj 3: Caches handout |
| | | 22 | K | Multicore Architectures & GPUs | | | | |
| 12 | 22-Apr | 23 | K | Synchronization | | | Virtual Memory | Lab 4: (IN-CLASS) Virtual Memory |
| | | 24 | | Synchronization 2 | | | | |
| 13 | 29-Apr | 25 | K, H | GPUs & Prelim 2 Review | | | Synchronization | Proj 4: Multicore/NW handout |
| | | 26 | | I/O | | Prelim 2 | | |
| 14 | 6-May | | K&H | Future Directions | | | | Proj 4: Design Doc due |
| | 13-May | | | | | | | Proj 4 due |
| | 20-May | | | | | | | |

# Grading

**Lab (approx. 50%)**
- 5-6 individual labs
  - 2 out-of-class labs (5-10%)
  - 3-4 in-class labs (5-7.5%)
- 4 group projects (30-35%)
- Participation/quizzes in lab (2.5%)

**Lecture (approx. 50%)**
- 2 prelims (35%)
  - Dates: March 4, May 1
- Homework (10%)
- Participation/quizzes in lecture (5%)

#### Regrade policy
- Submit a written request to the lead TA, who will pick a different grader
- If still unresolved, submit another written request, and the lead TA will regrade directly
- Submit yet another written request for the professor to regrade

#### Late policy
- Each person has a total of four "slip days"
- Max of *two* slip days for any individual assignment
- For projects, slip days are deducted from all partners
- 25% deducted per day late after slip days are exhausted

Put all devices into *Airplane Mode*

# **Active Learning**

*Fig. 1: Histogram of 270 physics student scores for the two sections: experiment (with quizzes and active learning) and control (without).* L. Deslauriers et al., *Science* 2011;332:862-864

# **Active Learning**

Demo: What year are you in school?
- a) Freshman
- b) Sophomore
- c) Junior
- d) Senior
- e) Other

# **Active Learning**

Also, activity handouts will be available before class, in front of the doors before you walk in.

### Administrivia

http://www.cs.cornell.edu/courses/cs3410/2014sp
- Office Hours / Consulting Hours
- Lecture slides, schedule, and Logisim
- CSUG lab access (esp. second half of course)

### Lab Sections (start next week)

| Time | Location |
|---|---|
| 2:55 – 4:10pm | Carpenter Hall 104 (Blue Room) |
| 8:40 – 9:55am | Carpenter Hall 104 (Blue Room) |
| 11:40am – 12:55pm | Carpenter Hall 104 (Blue Room) |
| 3:35 – 4:50pm | Carpenter Hall 104 (Blue Room) |
| 7:30 – 8:45pm | Carpenter Hall 235 (Red Room) |
| 8:40 – 9:55pm | Carpenter Hall 104 (Blue Room) |
| 11:40 – 12:55pm | Carpenter Hall 104 (Blue Room) |
| 2:55 – 4:10pm | Carpenter Hall 104 (Blue Room) |
| 8:40 – 9:55am | Carpenter Hall 104 (Blue Room) |
| 11:40am – 12:55pm | Upson B7 |
| 2:55 – 4:10pm | Carpenter Hall 104 (Blue Room) |

- Labs are separate from lecture and homework
- Bring a laptop to labs
- Next week: intro to Logisim and building an adder

### Course Virtual Machine (VM)
- Identical to CSUG Linux machines
- Download and use for labs and projects
- https://confluence.cornell.edu/display/coecis/CSUG+Lab+VM+Information

## Communication

#### **Email**
- cs3410-staff-l@cs.cornell.edu
- The email alias goes to me and the TAs, not to the whole class

#### **Assignments**
- CMS: http://cms.csuglab.cornell.edu

#### Newsgroup
- http://www.piazza.com/cornell/spring2014/cs3410
- For students

#### iClicker
- http://atcsupport.cit.cornell.edu/pollsrvc/

# Lab Sections, Projects, and Homeworks

#### Lab sections start *next* week
- Intro to Logisim and building an adder

#### **Lab Assignments**
- Individual
- One week to finish (usually Monday to Monday)

#### **Projects**
- Two-person teams
- Find a partner in the same section

#### Homeworks
- One before each prelim
- Will be released a few weeks ahead of time
- Finish each question after its topic is covered in lecture

# **Academic Integrity**

All submitted work must be your own
- OK to study together, but do not share solutions
- Cite your sources

Project groups submit joint work
- Same rules apply to projects at the group level
- Cannot use someone else's solution

Closed-book exams, no calculators

- Stressed? Tempted? Lost?
- Come see us before the due date!

Plagiarism in any form will not be tolerated

# Why do CS Students Need Transistors?

Functionality and Performance

#### To be better Computer Scientists and Engineers
- Abstraction: simplifying complexity
- How is a computer system organized? How do I build it?
- How do I program it? How do I change it?
- How does its design/organization affect performance?

# **Computer System Organization**

Computer System = ?
Computer System = Input + Output + Memory + Datapath + Control

*Figure: block diagram of a computer system: the CPU (registers, datapath, control) and memory, connected by buses to I/O devices (keyboard, mouse, video, network, USB, serial, disk, audio).*

# **Compilers & Assemblers**

```
high-level language:
    int x = 10;
    x = 2 * x + 15;

compiler ↓ (r0 is always 0)
    r5 = r0 + 10
    r5 = r5 * 2
    r5 = r5 + 15

MIPS assembly language:
    addi r5, r0, 10
    muli r5, r5, 2
    addi r5, r5, 15

assembler ↓ MIPS machine language:
    00100000000001010000000000001010    op = addi, rs = r0, rt = r5, imm = 10
    00000000000001010010100001000000    (encoded as sll r5, r5, 1)
    00100000101001010000000000001111    op = addi, rs = r5, rt = r5, imm = 15
```

## Instruction Set Architecture

#### ISA
- Abstract interface between hardware and the lowest-level software

#### ABI (Application Binary Interface)
- User portion of the instruction set plus the operating system interfaces used by application programmers

# **Basic Computer System**

### A processor executes instructions
- Processor has some internal state in storage elements (registers)

### A memory holds instructions and data
- von Neumann architecture: combined instruction and data memory

### A bus connects the two

# How to Design a Simple Processor

## Inside the Processor

## AMD Barcelona: 4 processor cores

*Figure from Patterson & Hennessy, Computer Organization and Design, 4<sup>th</sup> Edition*

# How to Program the Processor: MIPS R3000 ISA

## **Instruction Categories**
- Load/Store
- Computational
- Jump and Branch
- Floating Point
  - coprocessor
- Memory Management

## **Registers**
- R0 - R31
- PC
- HI
- LO

## **Instruction Formats** (all 32 bits wide; field widths in bits)
- R-type: OP (6) | rs (5) | rt (5) | rd (5) | sa (5) | funct (6)
- I-type: OP (6) | rs (5) | rt (5) | immediate (16)
- J-type: OP (6) | jump target (26)

## Overview

# Applications

## Everything these days!
- Phones, cars, televisions, games, computers, ...
#### **Applications**

*Figure: chart of cell phones, PCs, and TVs by year (1997–2007), alongside photos of a Xilinx Virtex-6 FPGA, cloud computing, a Berkeley mote, an NVIDIA GPU, cars, and a cell phone.*

# Covered in this course

## Reflect

## Why take this course?
- Basic knowledge needed for all other areas of CS: operating systems, compilers, ...
- Levels are not independent
  - hardware design ↔ software design ↔ performance
- Crossing boundaries is hard but important
  - device drivers
- Good design techniques
  - abstraction, layering, pipelining, parallel vs. serial, ...
- Understand where the world is going