Contents:

  1. Indirection in the queue functions

    Why is that all the append and prepend queue functions accept an "any_t" parameter, while dequeue accepts an any_t * item? Since any_t is already a void pointer, are we supposed to return a pointer to a pointer ?
    That's C's method of returning multiple results from a function. Remember that C only uses call-by-value. dequeue is supposed to return true/false based on its success. It is also supposed to return a pointer to an element. It does the former by just issuing a statement like "return 0". It does the latter by taking a pointer to a location, and patching that location with the value it would like to return.

    E.g. let's say you have a function that is supposed to return height (which is 10) and width (say, 20). If you do the obvious:

    int getheightwidth(int width) {
            width = 20;
            return 10;
    }
    main() {
            int h = 0,w = 0;
    
            h = getheightwidth(w);
    }
    

    The change you made to width will not be visible to the calling routine - w will still be 0 after the call, because C uses call-by-value parameters.

    You can get around this problem by passing in a pointer:

    int getheightwidth(int *width) {
            *width = 20;
            return 10;
    }
    main() {
            int h = 0,w = 0;
    
            h = getheightwidth(&w);
    }
    

    Working through an example and convincing yourself that this approach works is the best thing to do. If you have further questions, please see the course staff during one of the TA hours.

  2. The initial context

    We were told that we could represent the thread that NT gives us as a minithread. But what I don't see is where we get the NT stack's base and top to initialize our minithread structure. How do we get such information and more importantly, what is the exact purpose of representing the kernel thread as a minithread?
    Bootstrapping is a problem for any OS. In minithreads, just like in a real system, you have the problem of going from the initial bootloader context to switching between threads that you have created. In our case, the initial stack assigned to us by NT is the bootloader stack, and you would like to then start context switching between minithreads. There are a couple of things one could do here. For instance, you could take a context_switch out of the initial stack, and never return there. This is OK, but you would in effect be throwing away the entire initial stack, which is wasteful.

    A better approach is to use the initial stack provided to you by the bootloader (NT in our case) as if it were a minithread stack. The problem is that you don't know the base or the top of the stack assigned to you by NT. But why do you need to know the base or the top ? You need the base if you want to ever free the stack, and you need the top when you need to initialize the stack. But the boot stack is already initialized, and if you turn the initial boot context into the idle thread or the stack cleaner thread, then it will never terminate. Hence, you'll never need to free its stack.

    So it's perfectly OK to create a special TCB for the initial context. In effect, you have the context already, and you are legitimizing it in your threads package by wrapping it in a TCB. Unlike every other TCB, this one may have NULLs for stack base and top, but that's ok. You get the nice property that you do not lose or discard any of the memory available to you, including the initial stack.

  3. Queue_delete signature

    Also, with the new definition of queue_delete(queue_t, any_t* item), what is the purpose of the any_t* item? Is it so that we can return a ptr to the data of what we are deleting? If this is the case, then how do we identify what it is we are to delete?
    In queue_delete, *item is a pointer to the queue element the calling application wants you to delete. queue_delete does not return a pointer, do the (any_t *) is not strictly necessary. So why is there a level of indirection ? For no fundamental reason, only to keep the queue_dequeue interface identical to queue_delete, so people will not make mistakes when they use either of the routines that take items off of a queue.
  4. Error reporting

    What should we do about errors (in non-C jargon, "exceptions") which occur in the functions we write?
    It depends on what causes the error. If it's because of bad parameters passed by the caller, or because a data structure is empty when the caller thought it was full, or some similar condition, then you should return an error code indicating what the error is, if necessary. If the error is one which the function or the caller can't deal with gracefully (memory corruption, not enough memory for a malloc, etc), then it may be better to print out as descriptive an error message as possible, and terminate the program, since the error is likely to be unrecoverable.
  5. Assertions

    What is the assert() function for? When should I use it?
    assert() is for detecting mistakes and inconsistencies when you're writing and debugging code. If the condition in the assert fails, the program prints an error and dumps core (writes the contents of its memory to a file, for non-Unix people). For instance, if at at the start of function A, the value of X should be 2, and the value of Y should be 8, you could put an "assert(X == 2 && Y == 8)" at the top. Asserts are useful for detecting when you made an incorrect assumption about a program invariant, or you violated the invariant elsewhere in your code. You should not really use them for catching any of the caller-caused errors in the previous paragraph.
  6. Memory Management Errors in C

    I seem to be getting some kind of corruption of memory pointed to by pointers. What could be the cause?
    There are many causes for memory errors. We list a few common errors here, and a way to find and locate any memory error quite efficiently.

    There are several common causes of memory errors:

    • Using a pointer to an object on a stack frame, which is then deallocated.
    • Writing past the end of an allocated structure, often caused by allocating too little memory to fit an object.
    • Freeing the same object more than once.
    Here's how the first case can arise in practice: you use a pointer to a local variable on the stack, which you returned from a function. For instance:
    char* return_a_string(...) {
      char string[32];
      ...
      return string;
    }
    
    ...
    
    int main() {
      char* x = return_a_string(...);
      ...
    }
    
    This is a serious, but unfortunately common C mistake, which is often hard to spot. The problem is that a function allocates its local variables on the stack, which is shared by subsequent function invocations. The pointer x ends up pointing to some location on the stack, which may get used for some other data and overwritten by a later function invocation. Returning this kind of a pointer is even more dangerous because you might not notice the bug immediately. If in place of return_a_string we had
    int* return_an_integer(...) {
      int integer = 32;
      ...
      return &integer;
    }
    
    then we'd have a different version of the same problem. Be aware that the problem still occurs if you store one of these pointers-to-the-stack in a data structure which gets used after the function finishes, as well. You just have to be careful.

    The second case of memory errors is often caused by code like this:

    struct foo { int x; int y};
    typedef struct foo *foo_t;
    
    void function(...) {
      ...  
      // this code allocates too small a space for a struct foo
      foo_t myptr = (foo_t) malloc(sizeof(foo_t));
      myptr->y = 13; // writing past the end of allocated memory
      ...
    }
    
    

    Here, foo_t is a pointer to an object. The amount of memory that has to be allocated to hold a struct foo object is sizeof(struct foo). sizeof(foo_t) will always return the size of a single pointer, (4 bytes on 32-bit architectures, no matter how big the actual object might be).

    Finally, if you free the same object several times, the memory allocator's internal data structures may become inconsistent, and may lead to a crash during a subsequent malloc or free. Some memory allocators will catch this kind of memory errors and alert you, especially if you build within a debug as opposed to a release environment; others might silently fail later.

    Luckily, debugging memory errors is fairly straightforward most of the time. When the crash occurs, you need to identify which virtual address caused the crash. Then you need to examine the instruction at the crashed program counter to see what type of an access was performed. Then work backwards from that point to determine what kind of an inconsistency in memory is the root cause of the problem (e.g. location 0x4003fdc has an erroneous value of 0xdeadbeef stored in it, which causes a subsequent store to location 0xdeadbeef). Finally, set up a watchpoint such that you can see the value stored at that address, and perform a binary search through the program execution to find the instruction that is creating the inconsistency in the first place. For instance, stop the program halfway through to the crash to see the value stored at location 0x4003fdc. If it's not 0xdeadbeef, the memory access is occuring in the second half of the execution. Using binary search, you should be able to pinpoint the root cause of the problem in log N steps.

    Some memory errors require a bit more detective work, especially if the crash is not easily reproducible or "non-deterministic" (by that, most people actually mean that the crash is triggered by external events over which they have little control, such as the timing of interrupts. But keep in mind that most crashes can be made "deterministic" with some effort. For example, recording the PC values at which interrupts occurred, and disabling interrupts at all PC's except those recorded in the crash, will often enable the deterministic reproduction of timing-related errors). Finally, it helps to do the debugging as a team.

  7. Guard Pages

    I am getting a "guard page exception" when running my system, what is the reason?
    It could be because you are declaring an array as a local variable in a function, which is too big to fit on the stack. Threads have quite small stacks (about 64k). Either increase the stack size (defined by the STACKSIZE constant in minithread_md.h), or make the array smaller.
  8. The Build

    Why is it necessary to use "make" to build the PortOS system? Why can't I just add all the sources to a Visual Studio Project?
    The emulation of interrupts requires that we specify the order object files are linked in order to allow us to prevent preemptions in NT library code. Visual Studio doesn't allow specifying the order of linking without a makefile. If you do not use the Makefile, you may get random crashes when preemption catches your code in an NT library.
  9. Creating Stacks

    How does stack allocation work?
    Each new thread you create needs its own stack to store function arguments and return addresses on. You can obtain a stack by caling minithread_allocate_stack(), and initialise it by calling minithread_initialize_stack(). Initialising the stack requires two functions: the function the new thread should run, and a "final" function, which runs after the main function has returned. The final function need not be unique for each thread. Both functions have arguments, so that the order of invocation looks like: main_proc(main_arg); finally_proc(finally_arg). The exception is the first thread you create, which gets the original stack for the process, and runs inside minithread_system_initialize(). It still needs a thread control block, however, so that it can participate in context switches correctly. This first thread cannot terminate in the same way as the others: since it's running on the NT stack, when it returns from main(), it will destroy the whole process.
  10. Freeing Stacks

    Why can't a thread free its own stack?
    When a thread announces it wants to terminate, by calling the thread destroy routine, it is still running on its own stack. In order to transfer control to another thread in a context switch, it pushes some data onto its stack. So if it freed its stack and then context-switched, it would end up writing to an invalid stack. The solution is to have the terminating thread "notify the system" (set a flag, put itself on a queue, etc) so that a thread which runs after it will know it has to free the stack.
  11. Thread Stop and Start

    What are minithread_start() and minithread_stop() for?
    They're really only useful for implementing semaphores, it's unlikely that your "user programs" will need to use these low-level calls.
  12. Thread Identifiers

    What are minithread_self() and minithread_id() for?
    A thread needs to be able to get a pointer to its TCB in case it wants to add it to some data structure, for instance: then another thread could call minithread_start() on it. Of course, this is only useful outside minithread.c. Thread IDs are really only helpful for debugging.
  13. System Termination

    What do I do if all the threads terminate? What if there's no thread avaible to run when a thread does a stop or yield?
    Some students have the tendency to exit from the system when there are no threads to schedule. This is a mistake. Keep in mind that you are writing an operating system, a long-lived process that is never supposed to terminate. If, for some reason, there are no more threads to schedule, your system should just loop in the idle loop. A subsequent interrupt might kick off a previously blocked thread, and cause a thread to start up at an arbitrary time in the future.
  14. Reading Semaphore Internals

    Is it ok to read (not write) the value of a semaphore without executing a P on it?
    No, you should stick to the interface in synch.h. Semaphores do not provide an interface by which the internal semaphore count can be read. There are two good reasons for this: (1) reading internals of a semaphore breaks the semaphore abstraction and encapsulation, (2) that value will be out of date the moment it is read, and therefore code that depends on reading the internal count of a semphore cannot be correct.