A9: subprocess

Instructions: Remember, all assignments in CS 3410 are individual. You must submit work that is 100% your own. Remember to ask for help from the CS 3410 staff in office hours or on Ed! If you discuss the assignment with anyone else, be careful not to share your actual work.

The assignment is due via Gradescope at 11:59pm on the due date indicated on the schedule.

Overview

With this assignment, we hope to demystify how the essential, OS-adjacent parts of a computer system work. Your task is to implement a minimal but functional subprocess management library called subprocess. The subprocess API resembles that of Python’s subprocess module, but it is significantly simplified. The library supports the very basics of subprocess management, including environment variable support, passing input to STDIN, and error handling.

Provided Files

The source files included with the release of this assignment contain a mock of the subprocess_run function. Implementing that function is your only task for this assignment. The release code contains the following files:

  • subprocess.h — header file defining the subprocess_run API your code will use. Reading it provides helpful documentation.
  • subprocess.c — where you will implement the subprocess_run function. This is the only file you need to submit.
  • runner.c — a driver for subprocess; while you won’t submit this file, we encourage you to use it as a testing utility.
  • Makefile — a makefile you can use to build runner and subprocess.c

Run rv make to create the executable ‘runner’, and then run rv qemu runner to run it.

Getting Started

To get started, obtain the release code by cloning your assignment repository from GitHub:

git clone git@github.coecis.cornell.edu:cs3410-2026sp-student/<NETID>_subprocess.git

Replace <NETID> with your NetID. All the letters in your NetID should be in lowercase.

Using C23 to Make Your Code Cleaner

Starting in C23, which is the C standard used in CS 3410, you may use the following language features.

  • Instead of using the NULL macro you may use the nullptr keyword.
  • Boolean values (true and false) are natively supported.
  • The auto keyword allows you to automatically deduce the type of a variable. So, instead of writing int x = 2;, you could write auto x = 2;.

We encourage you to use them to make your code more concise.

Background

Fork and exec give programmers considerable flexibility in how to initialize and manage child processes. Unfortunately, this flexibility comes with a downside — complexity. This complexity is often unnecessary when most programmers want to declaratively create child processes in predictable ways. In addition, it makes code much more error prone with regard to certain edge cases. This is part of the motivation that led Python’s developers to introduce the subprocess module. You will implement a small subset of this subprocess functionality, in C, yourself. Below is the signature of the function you will implement:

completed_process_result subprocess_run(const char *path, char *const args[],
                                        char *const env[], const char *cwd,
                                        arr_with_length input);

Your subprocess_run will support arbitrary paths, arguments, inheriting (or not inheriting) the parent’s environment, setting the current working directory, and finally passing arbitrary input. While implementing this function may look deceptively easy at first glance, there are many small complexities that add up. In particular, you will notice that a substantial portion of your code will not be dedicated to the common case, but rather to handling spawning errors caused by the parent process.

You will implement this assignment in three tasks. Task 0 is focused on supporting the path, args, env, and cwd arguments. You should also make sure to handle fork errors at this stage and save the return code. In Task 1, you will implement passing input to the child process’s STDIN. This will be a bit more challenging and will require learning more about pipes. Finally, in Task 2, you will implement handling for EINTR errors and add a feature that kills the child process (and reaps it) if an I/O error occurs while writing to its STDIN.

Task 0: Basic Implementation

As mentioned above, you will need to support some basic arguments for subprocess_run in this part of the assignment. While we won’t tell you how to achieve your goals, we will describe what you need to do and provide you with a reference of every library function you will need to accomplish your job. A completed Task 0 should do the following:

  • Support the path and args argument in the same way they would be handled by the exec system call.
  • If the env argument is a nullptr, then the environment of the child process should be inherited from the parent. Otherwise, set the child’s environment to be env.
  • Likewise, for cwd, a nullptr indicates that the parent’s cwd should be preserved. Otherwise, set the child’s current working directory to cwd.
  • Your code should populate and return a correct returncode field as described in the subprocess.h header. You may find the WIFSIGNALED, WTERMSIG, and WEXITSTATUS macros helpful for this process. They are documented in the waitpid man page.
  • Finally, in the event of a fork or waitpid error, your code should set the parent_error flag in the completed_process_result struct and zero the returncode.
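As a rough illustration of the status macros mentioned above, the following sketch forks a child that either exits with a chosen code or kills itself with a signal, then decodes the status reported by waitpid. The helper names (decode_wait_status, run_demo_child), the DEMO_PARENT_ERROR sentinel, and the "negated signal number" encoding are assumptions made for this example only. The encoding your code must use is the one described in subprocess.h. The sketch uses NULL-free, pre-C23 syntax so that it also compiles with older compilers.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sentinel for this sketch: fork or waitpid failed. */
#define DEMO_PARENT_ERROR INT_MIN

/* Assumed encoding (NOT necessarily the one in subprocess.h):
 * exit code if the child exited normally, negated signal number
 * if it was terminated by a signal, 0 otherwise. */
static int decode_wait_status(int status) {
    if (WIFSIGNALED(status)) {
        return -WTERMSIG(status);
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);
    }
    return 0;
}

/* Forks a child that exits with exit_code, or, if sig is nonzero,
 * sends sig to itself first. Returns the decoded status. */
static int run_demo_child(int exit_code, int sig) {
    pid_t pid = fork();
    if (pid < 0) {
        return DEMO_PARENT_ERROR;
    }
    if (pid == 0) {
        if (sig != 0) {
            raise(sig); /* with SIGKILL the child dies here */
        }
        _exit(exit_code);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return DEMO_PARENT_ERROR;
    }
    return decode_wait_status(status);
}
```

Under these assumptions, run_demo_child(7, 0) yields 7, and run_demo_child(0, SIGKILL) yields the negated value of SIGKILL.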

To solve this part of the assignment you will need the following library functions. We encourage you to read their man pages.

Errors that occur in the child process while replacing the parent’s image (i.e. calling exec or handling pipes in the child process) do not need to be handled or reported in any way. Handling these types of errors correctly requires a non-trivial usage of pipes to distinguish between a child subprocess terminating with an error and a launch error during the invocation of exec.

Task 1: Adding Input Support

In this task, you will use pipes and the write system call to add support for the input argument. The input argument specifies input that must be passed to the child process’s STDIN using a length-prefixed string. That is, the number of bytes in the input string is governed solely by its len field. If len == 0, then the input argument is ignored. If len is not zero, you must do the following to pass the input into STDIN:

  • Create an stdin pipe in the parent using the pipe system call. If pipe creation fails, you must set the parent_error flag to true and return.
  • In the parent process, use the write system call to write to the write end of the pipe you created. Note that write is not guaranteed to write the entire string you pass to it at once, so you will need to handle that. We will handle write errors in Task 2 of the assignment, which will involve killing and reaping the subprocess as well as handling EINTR errors.
  • In the child process, you should use the dup2 system call to connect the read end of the pipe to the child’s STDIN.
  • In both the parent and child, you should always make sure to close the file descriptors associated with the pipe using the close function.
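The steps above can be sketched on a toy example. This is only an illustration under stated assumptions, not the expected solution: the child here does not call exec but simply counts the bytes arriving on its STDIN and reports the count as its exit code, and the names write_all and feed_child are hypothetical. EINTR and write errors are deliberately left to the simplest possible handling.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Calls write repeatedly until all len bytes are written.
 * Returns 0 on success, -1 on the first error (no EINTR handling). */
static int write_all(int fd, const char *buf, size_t len) {
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}

/* Hypothetical demo: sends len bytes (len <= 255) to a forked child's
 * STDIN. The child counts the bytes and exits with that count.
 * Returns the count, or -1 on a parent-side error. */
static int feed_child(const char *data, size_t len) {
    int fds[2]; /* fds[0] is the read end, fds[1] the write end */
    if (pipe(fds) < 0) {
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        if (fds[0] != STDIN_FILENO) {
            dup2(fds[0], STDIN_FILENO); /* read end becomes STDIN */
            close(fds[0]);
        }
        close(fds[1]); /* otherwise the child never sees end-of-file */
        char c;
        int count = 0;
        while (read(STDIN_FILENO, &c, 1) == 1) {
            count++;
        }
        _exit(count);
    }
    close(fds[0]); /* the parent only writes */
    int rc = write_all(fds[1], data, len);
    close(fds[1]); /* signals end-of-file to the child */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || rc < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Closing the write end in both processes matters: the child's read loop only ends once every copy of the write end has been closed.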

We again encourage you to read the man pages for all of the system calls you will use in this part.

Task 2: Making the Code More Robust

In this task, we will make our subprocess implementation more robust and specification-conforming. We will do this by handling EINTR errors and killing the child process when necessary.

Killing the Child Process on Write Failure

When we write to the STDIN of the child process, an error can occur. Our specification mandates that we kill the child process if possible. Use the kill system call with the SIGKILL signal. We are not done yet, however. A dead “zombie” process still has information about it stored in the process table, which is helpful if the parent wants to know its return code. So, we need to use waitpid, like before, to get the process’s return code. This tells the kernel that it is safe to delete the information it stored about our dead process. As a note, we will not use this return code for any purpose, and you should not return it in any way.
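A minimal sketch of the kill-then-reap sequence, assuming a child that is still running. The helper names spawn_sleeper and kill_and_reap are hypothetical and exist only for this illustration; the child here just blocks in pause instead of running a real program.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that blocks until it is killed.
 * Returns the child's pid in the parent, or -1 if fork failed. */
static pid_t spawn_sleeper(void) {
    pid_t pid = fork();
    if (pid == 0) {
        for (;;) {
            pause(); /* sleep until a signal arrives */
        }
    }
    return pid;
}

/* Sends SIGKILL to the child and then reaps it so that no zombie
 * entry is left in the process table. The wait status is discarded.
 * Returns 0 once the child has been reaped, -1 on error. */
static int kill_and_reap(pid_t pid) {
    if (kill(pid, SIGKILL) < 0) {
        return -1;
    }
    int status = 0; /* intentionally unused afterwards */
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return 0;
}
```

After kill_and_reap succeeds, a second waitpid on the same pid fails with ECHILD, which is one way to confirm that the kernel has released the entry.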

Handling EINTR Errors

As mentioned in your lab, the kernel may interrupt a system call whenever a process receives a signal. This is helpful if, for example, the process wants to pause the system call to handle an alarm. In our case, however, we do not want our subprocess library to be interrupted by an unrelated signal. As a result, we should ignore any EINTR errors and restart the corresponding system call. This can be done by checking the errno variable after a system call returns with an error. You should implement this handling using a do-while loop for any call to write or waitpid in your code. In general, it is rare for a process to encounter an EINTR unless the program installs a special signal handler. However, since our library should work robustly for many different programs, this feature makes sense. We also implement this feature to mimic the behavior that Python exhibits.
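The do-while pattern described above might look roughly like the following. The wrapper names waitpid_retry and write_retry are hypothetical; whether you wrap the calls in helpers or write the loops inline is up to you.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Like waitpid(pid, status, 0), but restarts the call whenever it
 * fails only because a signal interrupted it (errno == EINTR). */
static pid_t waitpid_retry(pid_t pid, int *status) {
    pid_t rc;
    do {
        rc = waitpid(pid, status, 0);
    } while (rc < 0 && errno == EINTR);
    return rc;
}

/* Like write, but restarts the call on EINTR. It may still write
 * fewer than len bytes; any other error is returned to the caller. */
static ssize_t write_retry(int fd, const void *buf, size_t len) {
    ssize_t n;
    do {
        n = write(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

Note that errno is only inspected when the call has actually failed, since a successful call does not reset it.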

Testing

Once you have implemented your code, you can use the runner.c file to test your subprocess implementation.

Submission

You need to submit the following files to Gradescope:

  • subprocess.c, which will contain your solution to the subprocess assignment.

To receive credit for this assignment, as in all 43 previous assignments for this class, you must mark the code for each “task” above with a comment of the form /* HERE LIES TASK N */ for some value of N. Be sure to use exactly this syntax for your comment.

Rubric