Section notes for CS162 Section #2, February 04 2003
Hakim Weatherspoon

- Welcome to section
  - Personal introduction, undergrad experience, grad school, research
    interests, etc.

- Logistics:
  - My class web page: http://www.cs.berkeley.edu/~hweather/ta/cs162
  - Office hours: Mon 1:30-2:30, Wed 4:30-5:30
    Location: 651 Soda (alcove on the 6th floor)
  - E-mail: cs162-td@imail.cs.berkeley.edu
  - Please use the newsgroup for most things: ucb.class.cs162

- Class info and registration
  - Please fill out the form on my web page by Friday Feb 7 (so I can make
    sure I have all of your details correct - important!)
  - Important that I get to know everyone in section
  - Get to know each other: go around and say name

- SSH
  - Need to generate a public key: 'ssh-keygen'
  - Pick a passphrase you can remember!
  - Key will be used for identification when using CVS

- Project info
  - We will be using Java!
    - Mainly need to understand the basics of the language, classes, etc.
      (not all of the crazy APIs)
    - You will not be using Java threads (directly), so you don't really
      need to understand them either
  - Java/Nachos will be on the course web page soon - please download and
    read!
  - First project design due on Feb 19th
    - Example design document on web page soon
    - Keep them short: 3 pages max
  - GET STARTED NOW! Don't wait until the last minute
  - Next section: will do Nachos walkthrough
  - Please use CVS:
    - Will really help your project
    - CVS tutorial later in section
    - CVS tutorial on web, also Wed 2/5/2003 6:00-8:00pm, 306 Soda

- Why do we study OS's?
  - Isn't this just about hacking Linux or Windows NT?
    - No -- OS class is the foundation for much important research and
      industry work
  - Why develop OS's? Aren't Linux and Windows good enough?
    - Increased performance
    - More features - ease of use, device support
    - New applications - not just about desktop use!
  - High performance systems
    - Internet servers, workstation clusters
    - How to scale systems to enormous size in terms of: CPUs, memory,
      disk storage, network bandwidth
    - cf. Google: 4000-node Linux cluster
  - Embedded systems:
    - Tiny networked sensors everywhere (cf. TinyOS)
    - Ubiquitous computing: intelligence and networking transparently
      embedded in the environment
  - Fault tolerant systems:
    - How to keep data consistent and always available?
    - cf. air traffic control systems, bank accounts
  - Distributed systems:
    - How to access data seamlessly from anywhere in the world?
    - How to deal with intermittent connectivity?
    - cf. OceanStore
  - All of these things are OS problems!

- Basic structure of operating systems
  - OS is like a library: invoked whenever a program needs services from it
    - e.g. file access, network access, memory allocation, etc.
  - OS is like an interrupt handler: responds to events from h/w
    - e.g. network board interrupts CPU when a packet arrives
    - e.g. hardware timer device interrupts CPU to maintain the system clock
  - OS also responds to software traps from the CPU
    - e.g. trap when a process reads/writes a memory address it does not
      have access to
    - BTW, a trap is also used for system calls - jump into trusted code
      within the OS
    - QUESTION: Why can't this just be done with a library call?
      A: Because a library call isn't safe: the user could just provide
      their own routine. A trap provides a trusted "entry point" into the
      kernel code. The trap also raises the protection level of the CPU to
      "kernel mode" -- so that the OS code can do anything it wants

- Multiprogramming: Here be dragons!
  - One of the trickiest aspects of OS design to understand
  - The illusion to support: multiple programs on the same hardware, each
    of which thinks it has infinite resources and the whole machine to
    itself
  - The reality: a single machine with limited resources (one CPU, memory,
    network, disk, etc.)
  - How does it all work?
    The most important aspect of OS design

- Address spaces for isolation and abstraction
  - Give each program its own ADDRESS SPACE:
    - Range of "virtual addresses" (usually 0 to 2^32 - 1) which it can
      access
    - Program thinks that each virtual address corresponds to its own
      "private memory"
    - Important as this ISOLATES processes
    - QUESTION: Apart from protection, how else are address spaces useful?
      ANSWER: They also let programs be compiled to use virtual addresses
      (i.e. the compiler can place a data variable at location "0x3000"
      without needing to know where other programs are running on the
      machine)
  - OS maps virtual memory addresses to physical memory addresses, with the
    help of the MMU (in hardware) - more on h/w support later
    - QUESTION: What if we have less physical memory than virtual memory?
    - ANSWER: Paging: OS swaps memory to and from disk
      - Will get into all of the details later in the course
  - Regardless of paging, address spaces isolate the memory accesses that a
    program can make
    - Programs can only access their own memory, not that of other
      processes or the OS
    - So if program X writes to "* 0xdeadabba" and then program Y reads
      from "* 0xdeadabba", X and Y are really accessing different physical
      addresses
    - A nice feature to support for inter-process communication is shared
      memory: will be discussed later in the course
    - Also, if X reads/writes memory it doesn't have access to (e.g. the
      "null page"), the access gets trapped by hardware and X is killed:
      -- segmentation fault: core dumped!

- PROCESSES and THREADS
  - A "process" generally refers to a single program that you run, with its
    own code, address space, etc.
    - The unit of resource allocation that the operating system deals with
  - A "thread" is an entity corresponding to a single flow of control
    through that program
    - The unit of scheduling
  - One process can have many threads
    - Those threads all share the same address space, but each has its own
      stack, CPU registers, etc.
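To make "share the same address space, but each has its own stack" concrete, here is a minimal sketch using ordinary Java threads. It is illustration only: the project does not use Java threads directly, and the class and method names (SharedAddressSpaceDemo, sumTo, runDemo) are made up for this example rather than taken from Nachos. Each thread computes into a local variable on its own stack, then stores the result into a slot of one array that both threads can see because it lives in the process's shared heap.

```java
import java.util.Arrays;

// Hypothetical demo class; not part of Nachos or the course code.
class SharedAddressSpaceDemo {

    // 'local' and 'i' live on the stack of whichever thread calls sumTo,
    // so two threads running this at once never see each other's copies.
    static int sumTo(int n) {
        int local = 0;
        for (int i = 1; i <= n; i++) {
            local += i;
        }
        return local;
    }

    static int[] runDemo() {
        // One array on the heap, reachable from every thread in the process.
        int[] shared = new int[2];

        // Each thread writes only its own slot, so no locking is needed here.
        Thread t0 = new Thread(() -> shared[0] = sumTo(10));
        Thread t1 = new Thread(() -> shared[1] = sumTo(100));
        t0.start();
        t1.start();

        try {
            // join() waits for each thread to finish and makes its writes
            // visible to the caller.
            t0.join();
            t1.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(runDemo())); // prints [55, 5050]
    }
}
```

The two threads deliberately write to different array slots. If both wrote to the same slot, the result would depend on scheduling, which is the synchronization problem raised later in these notes.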
  - A thread consists of:
    - Program counter
    - Register set
    - Stack space
  - It shares with peer threads its code section, data section, and OS
    resources (open files, signals):
    - Program text and data
    - Heap
    - Other memory
    - I/O state (file descriptors, sockets, etc.)
  - No protection between threads in a task; this is not a problem because
    they should be cooperating
  - Thread states
    - Thread states are ready, blocked, running, or terminated
  - Thread switch
    - Save state of current thread -> TCB
    - Restore state to new thread <- TCB
    - Jump to new returnPC
    Hint: It's good to always have one idle thread so you always have a
    thread to switch to/from.
  - Kernel level threads
    - Use system calls (interrupt to kernel)
    - Mach and OS/2
    - Kernel schedules each thread
  - User level threads
    - No system calls, faster to switch
    - Project Andrew (CMU)
    - Kernel schedules at the task level
    - (disadv) If the kernel is single-threaded, then any user-level thread
      executing a system call will cause the entire task to block until the
      system call returns
    - (disadv) Scheduling can be unfair
    - Example of user level threads
      - Process a has 1 thread and process b has 100 threads. Because the
        kernel schedules at the task level, the thread in a runs 100 times
        faster than a thread in b

- How does multiprogramming work?
  - Remember that only one program is really running at any time
  - QUESTION: So if I start running a program (e.g. Netscape), how do I
    prevent it from spinning on the CPU and just hanging the machine?
    - Note that Win 3.1 used to have this problem; programs had to
      explicitly yield to the OS
  - ANSWER: OS is interrupted by a hardware timer
    Define: PREEMPTION
      Whenever the timer goes off, the OS gets control back from the program
      Can decide to CONTEXT SWITCH to another program
  - Also: OS gets control whenever you do a system call, so it can make
    scheduling decisions then
  - Also: any time a hardware interrupt occurs

- CONTEXT SWITCH
  - OS does the following:
    - Saves state of previous thread (saves registers, I/O descriptors,
      page table ptr, etc. in TCB)
    - Restores machine registers to state of new program
    - Sets up stack pointer, etc.
    - Sets up other state (I/O descriptors, etc.)
    - Sets up PAGE TABLES to map address space of new program
    - Flushes TLB (QUESTION: Why? ANSWER: So that the new
      virtual->physical mappings are enforced)
    - Calls special instruction (sometimes) to "return from system call":
      resume CPU from user program state at user's protection level

- EIT - Exceptions = Interrupts + Traps
  - Traps are synchronous events in the CPU
    - e.g. page fault, divide by zero, memory error, illegal instruction,
      illegal address, hardware error, supervisor call (SVC)
  - Interrupts are asynchronous events outside of the CPU
    - e.g. I/O, timer

- Hardware support
  - Memory Management Unit (MMU) provides address translation
    - Page table for virtual->physical address mapping
    - Page Table Base Register (PTBR) points to page table
    - Checking protection bits in page table entries
  - Hardware timers - to allow preemption
  - Protection levels: disallow user code from doing certain things
    (e.g. mucking with page tables, etc.)

- Issues with threads and concurrency (things to think about)
  - How to pick next thread to run? (SCHEDULING.)
    - Want to maximize fairness and still have low response time
    - Much detail on this later in the course
  - How to let threads interact safely? (SYNCHRONIZATION.)
    - E.g. two threads each writing to the same shared variable
    - How to avoid having one thread overwrite the other's value, e.g.:
        T1 does: local := shared; local := f(local); shared := local;
        T2 does: local := shared; local := g(local); shared := local;
      RACE CONDITION: T1 writes new value of 'shared' while T2 is computing
      g, so T2 writes 'shared' based on the "old" value.
    - Various ways of solving this: condition variables, mutexes: next few
      lectures
  - How to make threads efficient
    - E.g. web server: convenient to code as one thread per client request
    - But there is some overhead with threads.
      QUESTION: What overhead is there?
      - TCB, stack, machine registers, etc.
      - Time to context switch
      - Time to warm up cache, TLB, etc. after switch

- Quick CVS Primer
  - Basic model: single REPOSITORY which stores the "master" version of the
    code, as well as all previous versions
    - Users do NOT edit anything in the repository - rather, they "check
      out" a copy of the files, and edit that copy
    - After making changes, you "commit" the changes back to the repository
    - Other users only see those changes if they do an "update"

  - Basic CVS usage
    - Repository will be created for each project group - stay tuned
    - Repository should be readable ONLY to members of your group - so we
      will assign UNIX groups to you

    1) Set your CVSROOT environment variable to point to the repository:

         setenv CVSROOT

       If you want to check the files out from a remote machine (say, over
       ssh), then you would use the following instead:

         setenv CVSROOT cs162@torus:/home/ff/cs162/repository
         setenv CVS_RSH ssh

    2) Check out a local copy of the files you want to edit. For example,
       to modify your project's Nachos code, you might do:

         cvs co nachos

       This will create a "nachos" directory with all of the Nachos code
       in it.

    3) Edit your local copy of the files.

    4) Commit the changes back to the repository:

         cvs commit

       This will start up an editor for you to enter a log entry describing
       your changes. PLEASE enter a descriptive entry of the changes you
       made -- this will make tracking changes a lot easier.

       Note that you cannot commit changes unless your local files are up
       to date: that is, you have been editing the most recent copy. See
       next step...

    5) For other users to see the changes, they need to do:

         cvs update -d

       which will update their local files to be in sync with the
       repository. If there are conflicts (that is, someone else edited the
       same file while you were in the process of editing it), you will be
       notified at this point and given a chance to fix the conflict.

    To add a file or directory to the repository, just do:

      cvs add filenames...
    After 'cvs add' you need to 'cvs commit' for the addition to take
    effect.

    Also: 'cvs import' to add a whole tree (e.g. the nachos code for the
    first time):

      cvs import

    To remove a file, first remove it from your local copy:

      rm filename

    Then remove it from the CVS repository:

      cvs remove filename
      cvs commit

    Other features:

    - Can check out a copy of the code as it looked at a given time:

        cvs co -D "yesterday" nachos
        cvs co -D "02/03/03 8:00" nachos

    - Can tag a version of the files for later use:

        cvs tag working1
        cvs co -r working1 nachos

    - Can look at changes:

        cvs log
        cvs diff -D "yesterday"
        cvs diff -r 1.3 foo.java

    IMPORTANT CAVEATS:

    Note that CVS does not allow you to rename files - you need to "remove"
    the file and then "add" the file under the new name. In general it's
    best not to rename files, since this loses all of the editing history
    for that file.

    Also note that CVS does not allow you to remove or rename directories
    once they have been added to the repository. THIS IS VERY IMPORTANT!!
    Do not add directories to the repository temporarily - you can never
    get rid of them. This is a real drawback to CVS, but it's something you
    just have to live with.