A10: shall

Instructions: Remember, all assignments in CS 3410 are individual. You must submit work that is 100% your own. Remember to ask for help from the CS 3410 staff in office hours or on Ed! If you discuss the assignment with anyone else, be careful not to share your actual work.

The assignment is due via Gradescope at 11:59pm on the due date indicated on the schedule.

Overview

With this assignment, we hope to demystify how the essential, OS-adjacent parts of a computer system work. Your task in this assignment is to implement a minimal but functional command-line shell called shall (“shall we execute some commands?”). The shall syntax resembles that of the original Bourne shell (sh) and its successor, bash. Further, it supports a realistically large subset of the capabilities provided by these real-world tools, including I/O redirection, foreground/background process execution, and a limited form of interrupt handling.

Provided Files

The source files included with the release of this assignment contain most of the code for running shall commands. In particular, all the code associated with reading and parsing shell commands is complete. Your job is to write the code for spawning processes and redirecting I/O. In addition, you should improve the code for reading files.

  • shall.h — header file used by all other source files in this distribution.
  • shall.c — main driver for shall; you won’t modify this file
  • parser.c, token.c — code for lexical analysis and parsing shall commands; again, you won’t modify anything in these files
  • exec.c — code for running the built-in commands, spawning new processes, I/O redirection, etc.; almost all of your work will be done in this file
  • reader.c — code for implementing a very basic, 1-character-at-a-time input reader; you will make an improved version of this

Run rv make to create the executable ‘shall’, and then run rv qemu shall to run it.

Getting Started

To get started, obtain the release code by cloning your assignment repository from GitHub:

git clone git@github.coecis.cornell.edu:cs3410-2025fa-student/<NETID>_shall.git

Replace <NETID> with your NetID. All the letters in your NetID should be in lowercase.

In addition, the autograder for A10 compiles your code with an extra flag: -D_GNU_SOURCE.
Please add this flag to your Makefile so your local build matches the autograder’s environment.

Before

CFLAGS = -Wall -Wpedantic -fsanitize=address,undefined -Wshadow -Wformat=2 -Wconversion -std=c23 -g

After

CFLAGS = -Wall -Wpedantic -fsanitize=address,undefined -Wshadow -Wformat=2 -Wconversion -std=c23 -g -D_GNU_SOURCE

Background

Real-World Shells

In this section, we’ll use examples from a real-world shell, bash, which is perhaps the most common shell program in the Unix/Linux world. The syntax of the shall you implement is close to, but not quite the same as, the examples given in this section.

It’s easy to forget that a shell, despite its centrality as the user interface to a computer, is “just another normal program.” It is not part of the kernel, and it does not have any kind of special privileges. It processes commands that you type and then uses standard OS facilities to launch other programs on your behalf.

For our purposes, a shell is an interactive command-line interpreter. It prints a prompt, waits for the user to enter a command line on its standard input stream, and then carries out some action that the command describes.

A command is a string, consisting of whitespace-separated words, like ls -l somedir. The first word (ls in our example) is the command: either a special built-in command name or the name of an executable file to launch. The remaining words (-l and somedir in the example) are arguments to pass to the command. The command receives this list of arguments as strings and can do anything it likes with them. (That’s what the argc and argv arguments to C’s main function receive.)
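As a small illustration of how a launched program sees its command line, here is a minimal sketch. The helper name print_args is hypothetical and is not part of the shall distribution; a program would typically call it from main with the argc and argv it received.

```c
#include <stdio.h>

/* Print every argument exactly as a launched program would receive it.
   Hypothetical helper for illustration; returns the number of arguments. */
int print_args(int argc, char *argv[]) {
    for (int i = 0; i < argc; i++) {
        /* argv[0] is the command name; the rest are its arguments. */
        printf("argv[%d] = %s\n", i, argv[i]);
    }
    return argc;
}
```

For the command ls -l somedir, argc is 3 and the three strings are ls, -l, and somedir, in that order.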

Built-in commands are implemented and run by the shell itself. Some shell built-ins in “real” Unix shells you may have used before include set, source, exit, and alias.

Most of the time, however, commands refer to actual executable files (i.e., compiled programs) that exist in the filesystem. For example, ls, though seemingly an essential piece of functionality, is the name of an executable file in most shells, not a built-in command. Your shell has a set of directories it looks in to find executables.

Built-in Command Or Separate Program?

Most shells have a which command that shows you the location of a program. You can also use it to distinguish between shell commands that are built in vs. actual executables.

$ which exec
$ which source
$ which alias
$ which python
/home/jhl287/anaconda3/bin/python
$ which ls
/usr/bin/ls
$ which which
/usr/bin/which

You can also type the full path to any executable to use it. On my machine, for example, /usr/bin/ls -l <somedir> is equivalent to ls -l <somedir>.

A shell’s main purpose is to launch and manage processes that execute these shell commands. In general, a single command might entail launching multiple processes—together, we’ll call this group of processes a job. For example, you can type ls | head -n 2 to combine the ls and head executables (if you want to see only the first 2 files in a directory); the job for this command consists of an ls process and a head process with the standard output of the first connected to the standard input of the second.

Foreground / Background Jobs

Usually commands run in the foreground: the shell waits for them to complete before showing you another prompt. Unix shells also support launching long-running commands in the background. This way, you can continue to use the shell while the background command is executing.

To run a command in the background, put a & at the end. For example, try typing this in your computer’s “real” shell:

sleep 5

The sleep command runs for 5 seconds in the foreground, during which you can’t type any new commands. Now try this version:

sleep 5 &

Your shell will print out some information about the background job it launched, and then it will immediately print another prompt and let you type more commands.

Job Control

Because you can have any number of background jobs running at once, real-world shells provide job control features to manage them. In Linux systems, the most useful of these are the built-in commands jobs, fg, and bg. You can also interrupt the current foreground job by typing CTRL-C; similarly, CTRL-Z will pause the current foreground job and send it to the background. It is an interesting and highly rewarding exercise to implement these features, but doing so is beyond the scope of this assignment.

Signals, Interrupts

Recall that signals are the mechanism that the kernel uses to communicate with processes. In C, you can use the following two functions from signal.h to handle and send signals:

  • signal(signum, sig_handler) — register the function sig_handler to respond to the signal signum
  • kill(pid,sig) — send sig to process pid
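The two calls above can be tried together in a short sketch: install a handler for SIGINT, send SIGINT to the current process, and observe that the handler ran. The names record_signal and demo_sigint are made up for this example, and this handler is not the sighandler shipped with the assignment.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <unistd.h>

/* Written by the handler; sig_atomic_t is safe to assign inside a handler. */
static volatile sig_atomic_t last_signal = 0;

/* Hypothetical handler: just remember which signal arrived. */
static void record_signal(int signum) {
    last_signal = signum;
}

/* Install the handler, send SIGINT to ourselves, and return the signal
   number the handler saw (or -1 if a call failed). */
int demo_sigint(void) {
    last_signal = 0;
    void (*old)(int) = signal(SIGINT, record_signal);
    if (old == SIG_ERR) {
        return -1;
    }
    /* In a single-threaded process with SIGINT unblocked, the signal is
       delivered before kill() returns. */
    if (kill(getpid(), SIGINT) != 0) {
        return -1;
    }
    signal(SIGINT, old); /* restore whatever was installed before */
    return (int)last_signal;
}
```

Passing SIG_DFL or SIG_IGN to signal in place of a function restores the default action or ignores the signal, respectively.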

Using shall

This section describes how you can interact with the shall shell. You’ll need to refer back to this section for Task 1. Shall we begin?

Built-in Commands

As in bash and other “real-world” shells, shall has a short list of built-in commands: cd, source, exec, and exit. exec and exit have already been implemented for you. You’ll implement cd and source in Task 1.

cd dir
    change the working directory to directory 'dir'

source script
    read commands from file script

exec cat exec.c
    same as 'cat exec.c', but without forking, so the 'shall' is
    replaced with 'cat exec.c' and doesn't return

exec > exec.out
    this doesn't run anything, but redirects standard output to the
    file exec.out. Further commands that are executed now have their
    standard output redirected to file exec.out.

exit 3
    exit 'shall' with status 3. If no status is specified, 'shall'
    exits with status 0.

Other Commands

shall also has access to every executable file that the parent shell process from which you launched shall does. For example,

$ rv qemu shall
-> ls -l
total 212
-rw-rw-r-- 1 1001 1001   342 Apr  7 14:48 Makefile
-rw-rw-r-- 1 1001 1001  8159 Apr  7 17:37 exec.c
-rw-r--r-- 1 root root 20592 Apr  7 16:10 exec.o
-rwxr-xr-x 1 root root  9040 Apr  7 16:10 myint
-rw-rw-r-- 1 1001 1001   706 Apr  2 17:31 myint.c
-rwxr-xr-x 1 root root  9032 Apr  7 16:10 myspin
-rw-rw-r-- 1 1001 1001   546 Apr  2 17:26 myspin.c
<output omitted>
-> cat exec.c
<output omitted: contents of exec.c>
-> which ls
/usr/bin/ls
-> qemu myspin 3
spinning ... spun.
-> apt
apt 2.4.13 (amd64)
Usage: apt [options] command

apt is a commandline package manager and provides commands for
<output omitted>

Real-World Shells

Anything that you can run from the shell invoked by the rv alias can be similarly invoked at the shall prompt. However, programs like emacs, vim, etc., which make extensive use of TTY escape characters, will probably misbehave.

Shortcuts aren’t supported

Shells like bash, sh, Windows PowerShell, and others usually provide a set of features to simplify the typing of commands. The ones you’re most familiar with are probably tab completion and wildcard patterns (for example, “ls *.c” to display all and only the C source files in a folder). Most of these “extras” aren’t supported by the shall parser. Another one you won’t be able to try is the use of parentheses to group commands together (as in “qemu mysplit 5 & (sleep 1 ; ps ax)”).

Background commands

shall also supports running commands as a background process. To do so, terminate your command with an &:

-> qemu myspin 5 &
Process 11 running in background
-> spinning ...
-> spun.

You can even invoke multiple background commands by using the ; separator. For example,

-> qemu myspin 5 & ; qemu myspin 3 &
Process 106 running in background
Process 108 running in background
-> spinning ... spinning ... spun.
spun.

I/O Redirection

By default, a shell command receives its input from a keyboard, and it displays conventional and diagnostic output to a terminal window. All three of these are actually file handles that the process reads from (stdin) or writes to (stdout and stderr). It is possible to change a program’s input source and/or either of its output streams, using a variety of built-in redirection operators. You will implement a fairly complete set of the standard redirection operators, though in some cases, we have changed the syntax to simplify the parsing tasks:

  • cat exec.c > exec.out — execute cat exec.c but create the file exec.out, and write all output to it, instead of stdout. If exec.out already exists, this replaces the original contents with the output of the cat exec.c command.

  • cat exec.c >> exec.out — execute cat exec.c, but append output to file exec.out, if it already exists. If exec.out does not yet exist, it should be created.

  • cat nosuchfile.c {2}> exec.err — execute cat nosuchfile.c but redirect error output only (file descriptor 2) to file exec.err (for example, in case nosuchfile.c does not exist). Note that the format is a little different from the standard shell, where the command would have been cat nosuchfile.c 2> exec.err.

  • cat < exec.c — execute cat without arguments, but take standard input from file exec.c.

  • cat exec.c > exec.err {2}>{1} — write both error messages and standard output to file exec.err. In the standard shell, the command is cat exec.c > exec.err 2>&1.
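At the system-call level, the operators above usually come down to open() followed by dup2(). The sketch below shows the general technique only. The function name point_fd_at_file and the 0644 creation mode are assumptions for illustration, not the signatures or values used in exec.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Make target_fd refer to the file at path, opened with the given flags.
   Hypothetical helper; returns 0 on success and -1 on failure. */
int point_fd_at_file(const char *path, int target_fd, int flags) {
    int fd = open(path, flags, 0644); /* mode only matters with O_CREAT */
    if (fd < 0) {
        return -1;
    }
    if (fd != target_fd) {
        if (dup2(fd, target_fd) < 0) { /* target_fd now names the file */
            close(fd);
            return -1;
        }
        close(fd); /* the extra descriptor is no longer needed */
    }
    return 0;
}
```

With this helper, > would typically correspond to the flags O_WRONLY | O_CREAT | O_TRUNC, >> to O_WRONLY | O_CREAT | O_APPEND, and < to O_RDONLY. Making one descriptor a copy of another, as in {2}>{1}, needs only dup2(1, 2).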

Task 1

Complete the implementation of shall by implementing the following functions in exec.c:

  • void interrupts_disable() — Disable responses to the SIGINT signal. This one has been completed for you.
  • void interrupts_enable() — Enable handling of the SIGINT signal. For the simple version of signal handling here, it suffices to use the SIG_DFL handler.
  • void interrupts_catch() — Install a custom signal handler for SIGINT. You should use the implementation of sighandler that is part of the assignment distribution.
  • void redir_fd(fd1,fd2) — This implements redirection of fd1 output to the input of fd2 (the > operator).
  • void redir_file(name, fd,flags) — Used for redirection of input and/or output when fd is itself a file descriptor.
  • void spawn(command, background) — Spawn and run command, in either the background or foreground.
  • void cd(command) — The built-in cd command; change the current directory to command->argv[1], or to the directory in environment variable $HOME if command->argv[1] is null. Be sure to print out any errors if there are any when changing directories.
  • void source(command) — The built-in source command; read and execute commands from the specified files in the list of arguments of the command.

Your work in each procedure is delimited by the // BEGIN and // END pairs. After this task, you should have a working shall that can execute the examples above. You can build shall by running rv make. Then, you can run shall by executing rv qemu shall.

Important Details

  • shall is usually in one of two modes: it is either waiting for input, or it is waiting for processes to finish executing. It cannot wait for both at the same time.

  • At any point in time, there is at most one foreground process. If there is a foreground process, shall waits for the foreground process to finish executing before going back to waiting for input. Only input can cause shall to start processes.

  • shall itself should catch interrupts and print a message whenever one arrives, such as “got signal 2” (2 is the signal number of SIGINT).

  • shall should disable interrupts for commands that run in the background (using &). For commands that run in the foreground, interrupts should generally cause the programs to finish executing.

Desired Behavior

  • When running a command in the background, have shall print its process identifier. For example

    process 36877 running in background
    
  • If a command finishes normally with exit status 0, shall simply prints a new “prompt” (-> in the case of shall). However, if a program returns a non-zero status or a program exits abnormally (say, due to an interrupt), have shall print a message. Examples include:

    process 36689 exited with signal 2
    process 36889 exited with status 1
    
  • It may also be useful for shall to print information about background processes that terminate, even if they exit normally with exit status 0:

    process 36889 exited with status 0
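The messages above can be derived from the status word that wait() fills in. The following is a rough sketch of that decoding, assuming a single child; the helper names describe_status and demo_exit_report are hypothetical, and the sketch is not the spawn() you are asked to write.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write a report for a terminated child into buf. Returns 1 if a message
   was produced, 0 if the child exited normally with status 0. */
int describe_status(pid_t pid, int status, char *buf, size_t len) {
    if (WIFSIGNALED(status)) {
        snprintf(buf, len, "process %d exited with signal %d",
                 (int)pid, WTERMSIG(status));
        return 1;
    }
    if (WIFEXITED(status) && WEXITSTATUS(status) != 0) {
        snprintf(buf, len, "process %d exited with status %d",
                 (int)pid, WEXITSTATUS(status));
        return 1;
    }
    if (len > 0) {
        buf[0] = '\0'; /* nothing to report */
    }
    return 0;
}

/* Fork a child that exits with `code`, wait for it, and describe the
   result. Returns -1 if fork() or wait() fails. */
int demo_exit_report(int code, char *buf, size_t len) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        _exit(code); /* child: terminate immediately */
    }
    int status = 0;
    pid_t done = wait(&status); /* returns the pid of whichever child ended */
    if (done < 0) {
        return -1;
    }
    return describe_status(done, status, buf, len);
}
```

Because wait() returns the pid of the child that terminated, the same decoding applies whether that child was the foreground process or a background one.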
    

Hints

  • For the spawn() function, you are expected to use the fork() and wait() system calls as well as macros WIFSIGNALED() and WEXITSTATUS(). You should invoke interrupts_disable() for processes that run in the background.

  • Use wait() rather than waitpid(), as you will want to wait for any process terminating, including background processes. There’s no need to catch SIGCHLD. Note that termination of background processes will only be discovered and reported while waiting for a foreground process to terminate.

  • To execute a command, first invoke redir(command) (which redirects I/O), then execute(command) (which invokes system call execv() to run the command).

  • Most of the redir() function is already written, but you need to fill in the code in functions redir_file() and redir_fd(). All of the execute() function is already written. If redirection fails, use _exit(1) to exit the forked process with a failure status of 1. In the case of redir_file(), don’t forget to close the original file descriptor returned by open().

  • For cd(), you are expected to use the C library function getenv() to obtain the $HOME environment variable in case no directory is specified in the cd command. You should also use errno to determine the error message to print.

  • For source(), use open() and close(). Once you have a file descriptor fd, you can use the following code to invoke the interpreter:

    reader_t reader = reader_create(fd);
    interpret(reader, 0);
    reader_free(reader);
    
  • Don’t forget to implement the remaining interrupt functions.

  • Here are some additional system calls that may be useful: chdir, dup2, stat, and opendir
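The combination of getenv(), chdir(), and errno mentioned in the hints follows a common pattern, sketched below. The function name change_dir and the exact wording of the messages are assumptions for illustration; the real cd() takes a command structure, which this sketch does not model.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Change to dir, or to $HOME when dir is NULL. Hypothetical helper;
   returns 0 on success, -1 after printing a message on failure. */
int change_dir(const char *dir) {
    if (dir == NULL) {
        dir = getenv("HOME"); /* may itself be NULL if HOME is unset */
        if (dir == NULL) {
            fprintf(stderr, "cd: HOME is not set\n");
            return -1;
        }
    }
    if (chdir(dir) != 0) {
        /* strerror turns errno into text such as "No such file or directory" */
        fprintf(stderr, "cd: %s: %s\n", dir, strerror(errno));
        return -1;
    }
    return 0;
}
```

Reading errno immediately after the failing call matters, since later library calls are free to overwrite it.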

Task 2

The reader struct defined in reader.c uses the read() system call to read one character at a time. This is highly inefficient because every call to read() is a system call, and each system call incurs the cost of switching into the kernel and back.

Your job in this task is to modify the reader struct and associated functions reader_create, reader_next, and reader_free to read up to 512 characters from the file descriptor fd at a time.

However, reader must function the same way as before, meaning, the reader’s interface must stay the same. For example, reader_next() should still return a single character.
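The general buffering technique can be sketched independently of reader.c. The type and function names below (buffered_reader, buffered_create, buffered_next, buffered_free) are hypothetical, and returning -1 at end of input is an assumption; keep whatever types and return conventions the provided reader interface already uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

#define BUF_CAP 512

/* Hypothetical buffered reader: refills a 512-byte buffer with one read()
   and then hands out bytes from memory. */
typedef struct buffered_reader {
    int fd;
    size_t pos;        /* index of the next unread byte in buf */
    size_t len;        /* number of valid bytes currently in buf */
    char buf[BUF_CAP];
} buffered_reader;

buffered_reader *buffered_create(int fd) {
    buffered_reader *r = calloc(1, sizeof *r); /* pos and len start at 0 */
    if (r != NULL) {
        r->fd = fd;
    }
    return r;
}

/* Return the next byte (0..255), or -1 on end of input or read error. */
int buffered_next(buffered_reader *r) {
    if (r->pos == r->len) { /* buffer exhausted: refill it */
        ssize_t n = read(r->fd, r->buf, BUF_CAP);
        if (n <= 0) {
            return -1;
        }
        r->len = (size_t)n;
        r->pos = 0;
    }
    return (unsigned char)r->buf[r->pos++];
}

void buffered_free(buffered_reader *r) {
    free(r);
}
```

Callers still receive one character per call, but read() is invoked only once per 512 characters at most, rather than once per character.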

Submission

You need to submit the following files to Gradescope:

  • exec.c, which will contain your solution to (Task 1)
  • reader.c, with your improved reader (Task 2)

Rubric

Attribution

This assignment was adapted from a project by Prof. Robbert van Renesse.