Final Project - A Thread-safe Dynamic ory Allocator

CS 3410 Spring 2018


Schedule design doc meeting by: 11:59pm, Saturday, May 6th, 2018

Design Doc Deadline: 11:59pm, Wednesday, May 9, 2018

Deadline: 4:30pm, Tuesday, May 15, 2018

Important: you must work in a group of two for this project. You don't need to have the same partner as the earlier projects. See What to submit for details.


In this assignment, you will implement a dynamic memory allocation library (aka malloc) that supports the allocation, freeing, and resizing of blocks of memory. Your implementation will be assessed based on its correctness, adherence to the specifications of the library, and its utilization of the memory it manages. Please note that although egregiously slow implementations could time out (and you should fix this if this applies to your code), your implementation will not otherwise be judged on performance. Lastly, we will be comparing your implementation of malloc to your classmates' in a competition for the best malloc! You may find this slides helpful

You can find the files for this project in your Github repository. You may find these git tips useful.

WARNING!!

Do not change anything other than the following files:

  • alias.txt
  • heaplib.c
  • spinlock.c
  • tests.h

Changing anything else could result in your project not being able to function, which is VERY difficult to debug as a student or TA.

Make sure you thoroughly read through this ENTIRE write-up before starting.

What to submit:

Note the different due dates.

By 11:59pm, Saturday, May 6th, 2018
  • Schedule a meeting with a TA to present your intended implementation of spinlock and heaplib.
By 11:59pm, Wednesday, May 9, 2018
  • Meet with your TA at your scheduled time to present your intended design.
  • Submit your preliminary design doc (with any suggested modifications your TA gives you).
By 4:30pm, Tuesday, May 15th, 2018
  • alias.txt: 1-line file containing your Leaderboard Alias, to be used to identify you on the Leaderboard. The alias can consist of alphanumerics only plus _ (underscore) or - (hyphen) but no other symbols. It cannot start or end with a symbol, and must have no more than 32 characters. Should your alias not follow these rules, the 3410 TAs are more than happy to create a great name for you. Remember to keep it clean, folks.
  • heaplib.c: your implementation of malloc!
  • spinlock.c: your implementation of spinlock!
  • A git commit history obtained using the command git log on your repository.

Overview

By now you should be familiar with static vs dynamic allocation. Statically declared memory is allocated on the stack and exists only for the life of the function in which it was created; dynamically allocated memory is allocated on the heap and exists from the moment it is allocated until it is explicitly freed. Support for dynamic memory allocation is part of the Standard C Library.

A dynamic memory allocator maintains the memory associated with the heap. The most basic task will be to keep track of which bytes of the heap are currently in use (i.e., are allocated) and which bytes are not in use (i.e., free). On real systems, the heap can grow with the needs of a process. For this assignment, however, we will consider the heap to be a fixed-size block of memory. Your dynamic memory allocator will always be associated with a void *heap that points to the beginning of the heap that you must manage. (The heap pointed to by heap is a simulated version of the actual heap that the C Standard Library manages.)

A dynamic memory allocator is a bit like an actual library (with books). When someone asks for 512 bytes of memory, the allocator finds 512 free contiguous bytes of memory for the caller and returns the address of the first of these 512 bytes. These 512 bytes are now "checked out" and cannot be given out by any subsequent requests until they are explicitly freed. The bytes are freed when the user explicitly tells the allocator it no longer needs the bytes beginning at that address.

For this assignment, it will be important to understand not only what it means to correctly implement a dynamic memory allocator, but also what it means to correctly use a dynamic memory allocator. If at any point your code seg-faults, you must know whether the fault lies in the library or the user of the library. For example, a user should never write past the fixed-size heap. If they do, the problem could be that the user has written past the chunk of memory they were given (this is the fault of the user) or it could be that the memory allocator gave the user a block of memory which did not actually reside entirely inside the heap (this is the fault of the allocator). To help you learn this distinction, we begin by having you write tests that should correctly use the dynamic memory allocator. Then, should you experience an error, you will know the fault lies in the implementation. Once you have a test suite that you can trust, you can use it to test your own implementation.

Remember the best way to implement a large task is to break it up into very small, testable subtasks. Code a little, test a lot. Code a little, test a lot... and so forth.

The Specs

Your dynamic memory allocator will consist of four functions, which are declared in heaplib.h and will be implemented (by you) in heaplib.c. The functions hl_alloc(), hl_release(), and hl_resize() correspond to the actual C Standard Library functions malloc(), free(), and realloc(). You should use your understanding of these functions to guide your implementation, but nowhere in heaplib.c should you call these three functions.

void hl_init(void *heap, unsigned int heap_size)

Sets up a new heap beginning at heap of size heap_size (in bytes). This function does not allocate the heap (of size heap_size) pointed to by heap. That should be done already by the user calling this function.

PRECONDITIONS:
  1. the allocated heap must be heap_size in bytes
  2. heap_size must be >= MIN_HEAP_SIZE (here, 1024)
If preconditions are violated, non-graceful failure is acceptable.

See the given tests in the tests.h file for what it means to allocate the heap before calling hl_init.

This hl_init function should be called exactly once for each heap being managed.

Your memory allocator must live entirely inside the heap it is managing. All meta-data (pointers, size fields, flags indicating free or in-use) must all live in the heap itself. You may not use any global variables or structures. Every variable or structure must live inside either the heap or in the local function being executed. Your design should work for any heap size that is at least 1024 bytes in size.

void *hl_alloc(void *heap, unsigned int block_size)

Allocates a block of memory of size block_size bytes from the heap starting at heap.

Returns a pointer to the block on success; returns 0 (NULL) if the allocator cannot satisfy the request.

PRECONDITIONS:
  1. hl_init must have been called exactly once for this heap
If preconditions are violated, non-graceful failure is acceptable.

The returned address should be aligned at multiples of 8 bytes, but heap is not required to be 8-byte aligned. The function is allowed to allocate more than the requested size, but never less. The memory "allocated" by this function does not need to be zeroed out. Requested blocks of memory can be of size 0.

void hl_release(void *heap, void *block)

Releases the block of memory pointed to by block (which currently resides in the heap pointed to by heap). Acts as a NOP if block == 0 (NULL).

PRECONDITIONS:
  1. block must have been returned to the user from a prior call to hl_alloc or hl_resize
  2. hl_release can only be called ONCE in association with that prior call to hl_alloc or hl_resize
If preconditions are violated, non-graceful failure is acceptable.

For utilization purposes, released memory should be able to be allocated again by subsequent calls to hl_alloc. The memory released by this function does not need to be zeroed out.

void *hl_resize(void *heap, void *block, unsigned int new_size)

Changes the size of the block pointed to by block (that currently resides in the heap pointed to by heap) from its current size to size new_size bytes, returning a pointer to the new block, or 0 if the request cannot be satisfied. The contents of the block should be preserved (even if the location of the block changes -- this will happen when it is not possible to increase the size of the current block but there is room elsewhere on the heap to satisfy the request). If block has the value 0 (NULL), the function should behave like hl_alloc.

PRECONDITIONS:
  1. block must have been returned to the user from a prior call to hl_alloc or hl_resize
If preconditions are violated, non-graceful failure is acceptable.

Note: The new block size might be smaller than the current size of the block. As for hl_alloc, the return value should be 8-byte aligned. You can copy the contents of a chunk in memory by using memcpy (if the source and destination do not overlap) or cast block to a char * and copy the contents byte by byte, making sure to copy in the correct direction.

void spin_lock(slock* l)

This function implements the locking semantics: if the original value in l is 0, then you can have the lock. Also, you need to change the value in l to 1. Keep in mind that this operation has to be done atomicly.

void spin_unlock(slock* l)

This function implements the unlocking semantics: you release the lock by set the value in l to be 0.

Note: You should protect your four hl_* functions using the global lock (slock* malloc_lock), such that within a multicore environment where there are more than one cores calling these functions, only one core should operate on the heap at a time. In another word, your malloc should be thread-safe. See spinlock.h for more details.

For hints on how to implement the spinlocks, take a look at the lecture slides from the multicore lecture. They should give you an idea on how to implement the spinlock with inline MIPS assembly.

Compiling and Running Your Code

You have been given the following files as part of this assignment (heaplib.c is the only one you will submit):

  • heaplib.h -- contains the interfaces of the functions you are to test and implement
  • heaplib.c — this is the file containing your implementation
  • tests.h — this is the file containing your test suite
  • compile.sh — this is how you will compile your malloc implemenation
  • run_each_tests.sh — this is how you will run your malloc implemation against tests
  • spinlock.h — this contains the headers for the spinlock you are to implement
  • spinlock.c — this is the file containing the spinlock you should implement with MIPS assembly

The heart of this assignment is your implementation of the functions declared in heaplib.h. You are implementing a library, which does not contain a main. The tests.h file uses your library. The test file will contain many test files but will not be exhaustive. You should consider implementing some of your own tests. You should be prepared to do this after your lab.

The compile.sh file will compile your malloc implemation when you run ./compile.sh. And run_each_tests.sh will run the tests.h file on your compile malloc.

Case Study

We have here a lame implementation of the assignment that we call heaplame. This implementation is meant to give you a way to start understanding how a dynamic memory allocator might work, but beware: (1) this implementation is buggy. In several respects, it does not follow the specs above and (2) this approach is inherently simplistic.

/* --------------------- HEAPLAME ------------------- */

#include "heaplib.h"

/* -------------------- DEFINITIONS ----------------- */

/* to keep things simple, support only 5 blocks of memory at a time*/
#define N_SUPPORTED_BLOCKS 5
#define BLOCK_NOT_FOUND -1

/* Useful shorthand: casts a pointer to a (char *) before adding */
#define ADD_BYTES(base_addr, num_bytes) (((char *)(base_addr)) + (num_bytes))

typedef struct _block_info_t {
    unsigned int block_size;
    void *block;
} block_info_t;

typedef struct _heap_header_t {
    unsigned int heap_size;
    block_info_t blocks[N_SUPPORTED_BLOCKS];
    bool in_use_f[N_SUPPORTED_BLOCKS]; // true or false?
} heap_header_t ;


/* -------------------- PRINT DEBUG FNS ----------------- */

/* This is an example of a debug print function.  The entire body is
 * wrapped in an #ifdef so that the default compile does not include
 * them.  Print statements create huge output files (for which we have
 * ZERO TOLERANCE) so please wrap all printf statements like this.
 */
void print_debug_heap_header(heap_header_t *header) {
#ifdef PRINT_DEBUG
    printf("heap starts at addr %p\n"   // C printing trick.
           "heap_size = %d\n",            // Notice: no commas between lines
           header, header->heap_size);
    for (int i = 0; i < N_SUPPORTED_BLOCKS; i++) {
        printf("[%d] = [%d,%p,%d]\n", i,
               header->blocks[i].block_size,
               header->blocks[i].block,
               header->in_use_f[i] ?  1 : 0);
    }
#endif
}

void print_debug_block_block(heap_header_t *header, int which_block) {
#ifdef PRINT_DEBUG
    int i, num_chars = header->blocks[which_block].block_size;
    char *block = header->blocks[which_block].block;
    printf("block:\n");
    for (i = 0; i < num_chars; i++) {
        printf("\t%c\n", block[i]);
    }
#endif
}

void print_debug_alloc(void *block_addr) {
#ifdef PRINT_DEBUG
    printf("will give user block beginning @: %p\n", block_addr);
#endif
}

void print_debug_entering_init() {
#ifdef PRINT_DEBUG
    printf("Entering hl_init()\n");
#endif
}

void print_debug_entering_alloc(int size) {
#ifdef PRINT_DEBUG
    printf("Entering hl_alloc(), looking for block of size %d\n", size);
#endif
}

void print_debug_entering_release() {
#ifdef PRINT_DEBUG
    printf("Entering hl_release()\n");
#endif
}

void print_debug_entering_resize() {
#ifdef PRINT_DEBUG
    printf("Entering hl_resize()\n");
#endif
}

/* ------------------- HELPER FUNCTIONS ----------------- */

/* Given the heap header & a block address, finds the block in
 * question. Returns BLOCK_NOT_FOUND if not found.
 *
 */
int find_block(heap_header_t *header, void *block) {

    // searching through blocks, looking for one given
    for (int i = 0; i < N_SUPPORTED_BLOCKS; i++) {
        if (header->blocks[i].block == block) { // found it!
            return i;
        }
    }

    return BLOCK_NOT_FOUND;
}

/* -------------------- THE BIG FOUR FNS ----------------- */

/* See the .h for the advertised behavior of this library function.
 * These comments describe the implementation, not the interface.
 *
 * Initializes the heap size (after asserting size >= MIN_HEAP_SIZE).
 * Initializes block information for N_SUPPORTED_BLOCKS, which start
 * empty ( = size of 0, NULL pointer, and not in use).
 */
void hl_init(void *heap, unsigned int heap_size) {

    int i;

    print_debug_entering_init();

    heap_header_t *header = (heap_header_t *)heap;
    header->heap_size = heap_size;

    for (i = 0; i < N_SUPPORTED_BLOCKS; i++) {
       header->blocks[i].block_size = 0;
       header->blocks[i].block = NULL;
       header->in_use_f[i] = false;
    }

    print_debug_heap_header(header);

}

/* See the .h for the advertised behavior of this library function.
 * These comments describe the implementation, not the interface.
 *
 * Looks through N_SUPPORTED_BLOCKS, looking for one that is free.  If
 * found, initialize the block and return the location of the block
 * to the user. If not found, return FAILURE.
 */
void *hl_alloc(void *heap, unsigned int block_size) {

    heap_header_t *header = (heap_header_t *)heap;
    int i;

    // begin by pointing to the first block
    void *next_free_byte = ADD_BYTES(header,
                                     sizeof(unsigned int) /* heapsize */ +
                                     sizeof(block_info_t) * N_SUPPORTED_BLOCKS /* block info */ +
                                     sizeof(bool) * N_SUPPORTED_BLOCKS /* in use */);

    print_debug_entering_alloc(block_size);

    // searching through blocks, looking for one not in use
    for (i = 0; i < N_SUPPORTED_BLOCKS; i++) {
        if (!header->in_use_f[i]) {      // found one!
            header->in_use_f[i] = true;  // mark used
            header->blocks[i].block_size = block_size;
            header->blocks[i].block = next_free_byte;
            break;
        } else {
            // if this block is taken, the next block will be "block size" later
            next_free_byte = header->blocks[i].block + header->blocks[i].block_size;
        }
    }
    if (i == N_SUPPORTED_BLOCKS)
        return FAILURE;

    print_debug_alloc(next_free_byte);
    print_debug_heap_header(header);

    return next_free_byte;
}

/* See the .h for the advertised behavior of this library function.
 * These comments describe the implementation, not the interface.
 *
 * Initializes the heap size (after asserting size >= MIN_HEAP_SIZE).
 * Initializes block information for N_SUPPORTED_BLOCKS, which start
 * empty ( = size of 0, NULL pointer, and not in use).
 */
void hl_release(void *heap, void *block) {
    heap_header_t *header = (heap_header_t *)heap;

    print_debug_entering_release();
    int i = find_block(header, block);
    header->in_use_f[i] = false;
}

/* See the .h for the advertised behavior of this library function.
 * These comments describe the implementation, not the interface.
 *
 * Finds the block that needs to be resized. Changes the size of that block.
 */
void *hl_resize(void *heap, void *block, unsigned int new_size) {

    heap_header_t *header = (heap_header_t *)heap;

    print_debug_entering_resize();

    int i = find_block(header, block);
    header->blocks[i].block_size = new_size;

    return block;

}

The overall approach of heaplame is to keep track of only N_SUPPORTED_BLOCKS:

#define N_SUPPORTED_BLOCKS 5

This means that the user may only allocate 5 blocks at a time. (Pretty lame, huh?) The information for each block (the size and the starting address of the block) is stored in a the following defined structure called block_info_t:

typedef struct _block_info_t {
    unsigned int block_size;
    void *block;
} block_info_t;

This block information is stored in an array in a containing structure called block_header_t:

typedef struct _heap_header_t {
    unsigned int heap_size;
    block_info_t blocks[N_SUPPORTED_BLOCKS];
    bool in_use_f[N_SUPPORTED_BLOCKS];
} block_header_t ;

This implementation dedicates the beginning of the heap to the metadata (which is initialized in hl_init), and the remaining portion of the heap to the 5 blocks allocated for the user.

In the beginning (see Figure 1) there is an array which the user has created (either by statically allocating it on its stack or by calling malloc). This is the area of memory which simulates the heap. If you look in heaplib.h, you will see the following definition:

#define MIN_HEAP_SIZE 1024

Let's step through the implementation of some of the functions. Figure 1 shows the state of memory before a call to hl_init with a heap of size 1024 beginning at address x100:

figure1

After the call to hl_init, the meta data has been initialized and the heap looks as shown below in Figure 2:

figure2

Figure 3 shows what a subsequent call to hl_alloc with a block size of 512 bytes would change the heap:

figure3

Figure 4 shows how a call to resize would change the heap:

figure4

These diagrams are not intended to show you the correct behavior of the functions you should implement. Instead they show you an accurate depiction of what is currently implemented so that you may fully understand the code you have been given. In fact, the code you have been given is woefully incomplete and incorrect. You should take some time to look at the implementation, run it through a few of the given tests, and write your own tests that expose the ways in which the code is broken or inadequate.

Known Correctness Bugs. In addition to being really unfriendly (allowing only five blocks to be allocated!), the lame solution in heaplame.c is incorrect in several respects:

  1. no functions return pointers that are 8-byte aligned
  2. heap is assumed to be 8-byte aligned
  3. not all functions behave correctly when given null block pointers

.... and possibly more???

Although the implementation is not a working solution, it provides you important examples of heap management, casting, pointer arithmetic, and debugging.

Fixing all of the problems in this implementation would require a significantly different approach than the lame implementation. You are encouraged to spend some time thinking about how to debug this code (and to place your fixes in heaplesslame.c for sanity's sake, but it should not be viewed as the baseline for your own implementation, which should be far more sophisticated. For Part 2 of this assignment, you will be asked to come up with a completely new approach to a dynamic memory allocator--one that is robust enough to accomplish the task.

  • Casting. Take a look at the implementations of hl_init and hl_alloc. Notice that casting the heap to a heap_header_t * is a straightforward and very readable way to set and access the fields of the otherwise unspecified heap.

  • print_debug functions. Look at the function print_debug_entering_init. It only prints "Entering hl_init()" to the screen. Although this function may be useful to a programmer when implementing and debugging, it is not the kind of information that should be printed to the screen when a program includes and makes calls to your heaplib library. Novice programmers notoriously litter their code with print statements that are commented in and out as they code. This is not good form, especially since some straggling printf's almost always remain in the final version of the code. Print statements can be particularly devastating during a long run -- at best they make the code painfully slow; at worst they fill up the hard drive and make the computer unusable.

  • One solution to this problem is to create a variable such as debug_mode_f (_f often indicates a flag in C) and put the print statement inside an if statement that checks the variable:

    if (print_debug_f) {
        print_heap(heap);
    }
    

    This solution has the nice property that (if tied to a command line option) the flag can be turned off and on each time you run the program. The problem with this approach is that even your "production-level" code will be littered with if statements that always evaluate to false.

    The code you have been given solves the problem by placing the body of the print statement as the controlled text in a conditional group:

    void print_debug_entering_init() {
    #ifdef PRINT_DEBUG
        printf("Entering hl_init()\n");
    #endif
    }
    

    The preprocessor (part of the compiler) will only insert the controlled text if the PRINT_DEBUG has been defined/set.
    This Macro approach requires re-compiling in order to insert the print statements. If you didn't want to create a separate function, you could also simply wrap the printf directly:

    #ifdef PRINT_DEBUG
      printf("the value of x is %d\n", x);
    #endif
    

    This may not be the best strategy for every scenario, but it is good for this assignment. Do not use any print statements inside your heaplib.c that are not wrapped in an #ifdef flag. Compiling and Running Your Code discussed how to turn the flag on and off at compile time.

  • Pointer arithmetic. To implement this assignment, you will need to understand pointer arithmetic. As a basic example, see the following snippet of code:

  • int array[4];
    int *ptr = &array + 1;
    

    What do you think ptr points to? Not sure? Paste the code into the C Tutorial, click on "Visualize Execution", and step through it. When you add 1 to the pointer, the unit of addition is int, meaning that you're taking the base address of array, and saying "go one integer later". The 1 is actually 1 integer, or 4 bytes.

    When you are manipulating pointers in this assignment, you may often want to add "raw bytes". For example, if you want to get the address that is 16 bytes into your heap, which you've already cast to be a heap_header_t *, you would need to write:

    heap_header_t *header = (heap_header_t *)heap;
    void *sixteen_later = ((char *)header)+16;
    

    Because we suspect you might want to do addition like this a lot, we've provided you with a simple #define:

    #define ADD_BYTES(base_addr, num_bytes) (((char *)(base_addr)) + (num_bytes))
    

    which you can could then use as follows:

    heap_header_t *header = (heap_header_t *)heap;
    void *sixteen_later = ADD_BYTES(header, 16);
    
  • C is not the same everywhere! You should not assume that you know the size of all variable types in C. An int might be 4 bytes on one machine and 8 bytes on another. This is one reason why it is VERY important to use the VM or the Linux machines we have provided for this class for this specific assignment. If you never run your code on the machines we test them on, you may be in for a horrible, seg-faulty surprise. It is always a good idea to use sizeof() instead of assuming you know the size of any variable or type in your code. Alternatively, uintptr_t in <stdint.h> is guaranteed to contain the size of a pointer on the machine where the executable is running. You may find this useful.

  • Another aspect of C is that the compiler will align your structures for you. How it performs the alignment varies not only by machine but also by operating system, so once again, do not make any assumptions. As an example, look at the definition of heap_header_t.

    typedef struct _heap_header_t {
        unsigned int heap_size;
        block_info_t blocks[N_SUPPORTED_BLOCKS];
        bool in_use_f[N_SUPPORTED_BLOCKS];
    } heap_header_t ;
    

    How large is this structure? hl_alloc calculates the size of the header to be:

    sizeof(unsigned int) /* heapsize */
    + sizeof(block_info_t) * N_SUPPORTED_BLOCKS /* block info */
    + sizeof(bool) * N_SUPPORTED_BLOCKS /* in use */
    

    However, if you simply said sizeof(heap_header_t) you might get a different answer because the size of the structure is typically rounded to a size that is divisible by 4 (or sometimes 8). This is one of the reasons we require you to keep your block pointers also 8-byte aligned. (This is also one of the reasons we did not put the in_use_f flags inside the block_info_t structure.) You should decide exactly how you want to pack your data in the heap. It will be important for you to understand structure alignment so that as you make these decisions you understand how to implement them.

  • Using Macros. A macro is a <keyword, definition> pair. Any reference to the keyword will be replaced by the definition at compile time. You may be familiar with simple macros such as:

  • #define ALIGNMENT 8

    which is a great way to rid your code of magic numbers. Macros can also be more complex, taking arguments. For example:

    /* Useful shorthand: casts a pointer to a (char *) before adding */
    #define ADD_BYTES(base_addr, num_bytes) (((char *)(base_addr)) + (num_bytes))
    

    We recommend using macros for any complicated but simple tasks; your code will be much more readable, maintainable, and debug-able at no performance cost.

  • Ternary Operators. You will also notice that a ternary operator is used in the print_block_header function of heaplame. This is very useful as it allows you to have a conditional output without explicitly writing an if statement.

  • block->in_use_f ? "Yes" : "No"

    The value before the ? is a boolean expression, and if true, evaluates to "Yes", otherwise it evaluates to "No".

    Note that the ternary operator has very low precedence, so be sure to wrap your ternary operator usage in parentheses! i.e.

    (x ? 1 : 2)

    instead of

    x ? 1 : 2

Student implementations of dynamic memory allocators are notoriously ugly and impossible to debug. Yours should not be. First, because you will make your own life a living hell. Second, because we will deduct style points for anything egregious. We have given you sample code with all sorts of tricks in it because these are tricks that expert programmers know about and use regularly. Now that you have been shown these techniques, you will be expected to use them.

Cost of a ridiculously long project writeup? 15 minutes. Shedding your status as a novice and being able to produce code that is readable, debug-able, and portable? Priceless.

Malloc Test Suite

While we will be providing you with tests, we will have a more extensive line up of tests for our autograder. Which is why, alongside a malloc test lab — where we give you some guidance on how to create malloc tests — we encourage you to write some of your own tests. Additionally, if you write some tests yourself, you will further your understanding of what can go wrong with a malloc implementation.

Look in tests.h to see how the test suite is structured. There is room for 26 tests.

Spec Tests (0-15)

"Spec tests" detect ways in which a heaplib implementation does not meet the given specifications. There are many ways in which heaplame is not up to spec. Your task is to write tests that detect these problems. Use the known problems in heaplame as a way to foresee and avoid possible problems that your code could experience in Part 2.

Stress Tests (16-23)

"Stress tests" detect errors that might only occur after many calls to the heaplib library. This could be due to subtle bugs, corner cases, or just errors that only manifest under extreme conditions (like the heap filling up).

There are two classic problems that result from a buggy implementation of heaplib:

The first we will call a structural integrity problem. At some point a field in the heap thought to contain a pointer will no longer contain a valid address. Perhaps you have a next pointer in a linked list, but you accidentally overwrite it with the size of a block. Now you try to dereference the address 42. This will result in a segmentation fault. Bugs that corrupt the structural integrity of your heap lead to fatal errors at runtime. You can test for structural integrity problems by simply hammering the library with a bunch of calls to alloc, resize, and free. If the test finishes, it is a success. If a segmentation fault occurs, the test has failed.

The second problem we will call a data integrity problem. In this case, a buggy implementation will either give the user a bad block of memory (maybe a block that goes past the edge of the heap, for example, or a block with some overlapping region that has already been given out in a previous request to alloc that has not yet been freed) OR the library itself writes something into the block portion of the heap, which it should never do unless that block has been freed by the user. You can test for data integrity by requesting blocks from the library and then filling the block with data. After subsequent calls to the library, check and see whether the data is still intact.

TIPS:

  • Make sure you use legal, well-formed calls. Also, make sure that the errors are a result of the heap library, and not your test:
    • if you ask for 24 bytes and then fill 42 bytes, you will probably corrupt data on your heap, but that's the test's fault not the library's
    • if you try to call free with a pointer that was never returned by the library from a previous call to alloc, the library might very well seg-fault, but that's the test's fault not the library's. You cannot free something that was never allocated in the first place.
  • Structural and data corruption often manifests itself only when different blocks are requested for, not when the same chunk of memory is requested and freed repeatedly.
  • Our robustness tests (which we will use to grade your code) check for both structural and data integrity.

IMPORTANT: In semesters past students have asked for more test placeholders. This is why you have room to write 23 tests. However, you might not need all 23 test placeholders. (We do not have 23 tests...)

Designing and Implementing Your Own Memory Allocation Library

Now it's time for you to implement your own dynamic memory allocation library that supports the allocation, freeing, and resizing of blocks of memory. Your code must follow both the interface and the functional specifications which were outlined above.

You are free to implement malloc as we suggest or use your own original ideas. As you consider your options, you and your partner should under no circumstances look at anyone else's code (including code on the internet).

Your implementation will be graded based on correctness. Correct implementations of heaplib.c will then be evaluated on memory utilization. Speed will not play a factor unless your code is so slow that it times out when we test it. (Incorrect implementations will not be tested for utilization--and will forgo the points associated with these tests--because these metrics are meaningless in the context of an incorrect program. (Example: you can fulfil many allocation requests when you give everyone the same overlapping bytes in the heap...)

Utilization. When a heap is initialized, it is given a fixed memory area that it can use to fulfil allocation requests. (Note: the actual heap on a real system can grow as needed within the virtual address space of the process. The fixed-sized heap in this assignment is an artificial but simplifying restriction.)

  • Overhead: the first aspect of heap utilization relates to how much meta data your implementation requires. Meta-data is the information about the block, the size of each block, whether it is free, etc. Less meta data allows you to dedicate more of the heap to service alloc requests.
  • Management: the second aspect of heap utilization relates to the management of the heap itself. Careful implementations of alloc, resize, and free will keep your heap less fragmented and more able to fulfil future allocation and resize requests.
  • Allocation Strategies. Think about the pros and cons of some common allocation strategies. A first-fit approach always gives the user the first memory block in the region that is big enough to fulfil the request. This is the easiest to implement. A best-fit approach gives the user the smallest memory block that fulfils the request.
  • Fragmentation. After responding to many calls to hl_alloc and hl_release the heap could become chopped up into small interwoven blocks of allocated and freed memory. When your memory region becomes fragmented, the sum of the free blocks of memory might satisfy an hl_alloc request, but since they are not contiguous, the request will fail. Whether you implement a first-fit or best-fit allocation strategy is just one way in which you control the fragmentation of your memory region. Whether you support the coalescing of blocks (explained in the pdf) is another.

The Leaderboard

The autograder will run your code 5 times per day.

We will run your code and test it for correctness. Code that passes our correctness tests will then be tested for overhead, and heap maintenance. At some point during the following week we will set up a leaderboard that shows how your code fares on these three tests compared to other groups in the class. We hope you will be able to see from this exercise that design choices are full of trade-offs, often sacrificing one metric for another.

Design Doc and Meetings

After having thoroughly read this project write-up, you will need to prepare for your project by writing up a design doc and meeting up with a TA to go over your design doc. Your design doc should consist of the following elements:

  • An explanation of how you plan to implement your heaplib
  • An explanation of how you plan to implement the spinlock

Make sure you schedule your design doc meeting promptly (it's on CMS).

Tips

  • These slides can help give you an idea of the different kinds of implementations you can use.
  • Continuously implement the very smallest, testable unit of work possible and then immediately test it for correctness. (Here, your testing helps your implementation.) If no such test exists, write one. (Here, your implementation helps your testing.)

Grading

You will be graded first on the correctness of your malloc implementation. Afterwards, you will receive points based on how well your implementation utilizes the given heap.

A rough breakdown of your grade is as follows:

  • 70%: Correctness
  • 30%: Utilization

Note that you will not receive points for utilization if your implementation does not pass the correctness tests!


Acknowledgements

This assignment is the literal descendant of an assignment originally written by Robbert Van Renesse at Cornell. It is the spiritual descendant of the textbook "Computer Systems: A Programmer's Perspective" by Bryant and O'Hallaron, which your instructor thinks is a fantastic book.