Lab 7: Introduction to UNIX and C

CS 3410 Spring 2017


Due: Sunday, March 19th at 11:59 PM. Submit all required files on CMS.


For this lab, we will cover virtual machines, SSH, Linux commands, and the GCC compiler. Before we go into the core material of this lab, we will give brief descriptions of various terms to make sure everyone is on the same page with certain concepts.

Virtual Machine and SSH Overview

What is an Operating System (OS)?

You may have heard this term several times, but as a basic introduction, an operating system is system software that interfaces between hardware (e.g. CPU, RAM) and application software (e.g. Word, Eclipse). OSes are in charge of user interface management, control of access to the system and data files, program scheduling and activation, efficient resource allocation, and other things like deadlock and error detection. Some examples of operating systems are Windows (XP, Vista, 7, 10), Mac OS X, Linux, Google Android, and Apple iOS.

[Figure: Operating System Placement. Image courtesy of https://en.wikipedia.org/wiki/Operating_system]

What is open-source software?

This is software that is freely available to the public to modify and download. The software is distributed with an open-source license so that the source code can be freely used, shared, and edited. The source code is often created in collaboration with many programmers. This is unlike Windows, where you pay Microsoft for their software, or Mac OS X, where you have to pay Apple.

What is Linux?

Linux is an operating system that is free to the public to download, since it was created as an open-source program. It was first released by Linus Torvalds in 1991. Since then, Linux has evolved through many improvements made by the computing community. It used to be considered the most secure OS because of the vast number of programmers who can fix bugs quickly. However, malicious hackers have been targeting Linux more in recent years, possibly because of its increasing popularity as server software.

The system is modeled on an older OS called UNIX. To navigate through a Linux or UNIX system, you type instructions to the computer at what is called a command-line prompt (nowadays, we often use graphical user interfaces (GUIs) to click on folders instead of typing commands to find those folders). You can even download software using the command-line prompt. Later in this lab, you will learn some of the most common Linux commands.

What is Ubuntu?

It is a Nguni Bantu word from southern Africa, often translated as "the belief in a universal bond of sharing that connects all humanity" (https://en.wikipedia.org/wiki/Ubuntu_(philosophy)).

More seriously, it is a version of Linux, usually called a distribution. Some of the most popular distributions are Ubuntu, Mint, Fedora, openSUSE, and Debian, though there are dozens more. Why are there so many versions out there? Well, people create versions that optimize certain features for certain users. For example, Ubuntu is known for being kind to beginners with its software center and media management, while Debian is less user friendly and geared towards more technically oriented users. We will be using Ubuntu for our OS. In fact, we have created an Ubuntu version specifically for this class!

What is Dual Booting?

Well, right now, most of you are constrained to using one OS. But what if I want to run two OSes on one computer? Turns out, you can in fact partition your hard drive to store two operating systems on one machine. This is called dual booting. However, in order to switch from one OS to another, you have to shut down and restart the system. You also have to commit a certain amount of storage to each OS. Is there another option? Yup. Let’s finally talk about virtual machines!

What is a Virtual Machine (VM)?

A virtual machine is an application that imitates hardware so that you can run more than one operating system on your own computer. For example, I can run Windows on a Mac computer using a virtual machine. The VM runs while my usual operating system is running, without requiring a reboot. It works like any other application, sitting in a window you can minimize when you want to go on Facebook and maximize when you want to work on CS 3410 again. In case you hear this terminology, the Host OS refers to the operating system your computer normally runs, while the Guest OS is the operating system you are running on the VM.

(So why do people use dual booting? Well, one reason is that VMs tend to have higher overhead, such as being slow with 3D graphics, whereas booting one OS at a time means that the OS can use the computer’s full speed.) Instructions to download a VM are in Part 1a.

What is a kernel?

The kernel is the program at the core of the operating system, with complete control over the whole system. It is the first program loaded when the OS starts up, placed in a safe location in memory to prevent being overwritten. It handles all the interfacing between the hardware and software. For instance, it may take input requests from a keyboard and translate them into instructions the CPU understands.

What is a shell?

It is a program that provides the user interface for the kernel. This is where you type in the commands we mentioned earlier. Just as there are various Linux versions, there are also several shell versions, such as sh, bash, csh, tcsh, and ksh. Again, these versions came about due to improvements or extra features implemented by others (e.g. bash built upon sh by adding command completion and command history). On a Mac, you reach the shell through the Terminal application. On Windows, the analogous program is the Command Prompt.

What is SSH?

SSH (Secure SHell) is a method of remotely connecting to and running commands on another computer (“remote” meaning something far away, so we are connecting to a computer far away). It can be accessed using the ssh command in a shell (like the Mac Terminal). It lets you access the resources of another computer that you may not have on your own computer. However, you cannot use SSH without an Internet connection. Cornell makes Ubuntu computers available to computer science students, which you can access remotely with SSH, explained in Part 1b.

Windows does not come with an SSH client, which means you will need to install one on your computer if you are a Windows user. The two options we suggest are Git and Cygwin - feel free to choose whichever you would like. You should have already installed Git, but it only lets you use the basic Linux commands. Cygwin puts a full UNIX environment (or close to it) on your computer, and you can use its command-line prompt. It is very powerful, easy to set up, and a great way to be able to use UNIX commands even on your Windows machine. However, a disadvantage is that Cygwin has more overhead than Git.

Part 1: VM and SSH

For this class, you will have the option of either using the VirtualBox VM to run Ubuntu or using SSH to remotely connect to a Cornell Ubuntu machine. Choose either Part 1a or Part 1b to ensure you have access to an Ubuntu machine. Part 1b may be easier to set up for some students (despite the length of the write-up), but as mentioned above, you must have an Internet connection to use it.

Why are we going through all the trouble of setting this up?

For one, it's often very difficult to get all of the course software working on the variety of computers and operating systems that everybody brings to the class. But more importantly, we want to ensure that everyone is working in the same environment so that we can guarantee that if it works for you, it will also work for us (and our autograder). We therefore require that you do all of your project work in the Ubuntu environment that we set up for you, because some of the projects we will be doing later on may work differently in different environments. We want to avoid anyone submitting anything that works on their computer but then doesn't work on our machines because the environment is different.

Part 1a: Virtual Machine Setup

For the setup:

  1. Go to the course resources page.
  2. Follow the instructions under the Computing Environment section.
    Note: On the VirtualBox website, download the VirtualBox platform package for your current operating system.
  3. Once you have started up the OS, you can move on to Part 2.
    Note: It may take some time to start up.

* You can learn more about VirtualBox at https://www.virtualbox.org/manual/ch01.html

Part 1b: SSH

SSH (Secure shell) is a method of remotely connecting to and running commands on another computer. Cornell makes Ubuntu machines available to computer science students and we have created one specifically for this class that is equivalent to the VM that is made available.

Important: If you use Windows, your computer does not come with an SSH client. Skip down to "For Windows users only", then come back here to complete the process.

For everyone:

  1. Open the terminal (Cygwin users, remember to use the Cygwin terminal) and type “ssh netid@ugclinux.cs.cornell.edu”, where netid is your NetID.
  2. Type yes to accept the new SSH target.
  3. Now type the password associated with your NetID.
  4. You’re in! If you don’t already know how the Unix command line works, you’ll learn it later in this lab. That will be how you interact with the files on this machine.
  5. To exit the shell, type exit or hit Ctrl-D.

For Windows users only:

Choose one of the two options:

Git
  1. You should already have Git installed from earlier in the semester. If you do not, install it now.
  2. You’ll now need to find where Git installed its ssh client. It will either be in “C:\Program Files\Git\usr\bin” or “C:\Program Files\Git\bin”. Find which of those directories contains a file called “ssh.exe”, then copy that filepath
  3. Open Control Panel -> System and Security -> System, then click “Change Settings”. Go to “Advanced”, then click “Environment Variables”
  4. Under “User Variables” click “Path”, then click “Edit”
  5. Click “New”, then paste in the filepath from above (You may see just a string to edit instead - in this case simply type a semicolon separator then paste the path)
  6. Done! Now when you open the Windows command line, you will be able to use ssh.
Cygwin
  1. Visit https://www.cygwin.com/ and download the setup script appropriate for your computer (almost certainly the 64-bit one).
  2. Run the setup script. Select the option to download Cygwin. It will then prompt you for a download site; I recommend simply selecting the top one.
  3. When it asks you to select packages, scroll down to “net” and click on “default” until it changes to “install”. This will ensure that the ssh client is installed with Cygwin.
  4. Hit next and allow Cygwin to complete the installation process. Hit OK to install dependencies when it asks. The installation process may take some time (~5 to 10 minutes).
  5. Now when you need to ssh, you’ll open the program “Cygwin” and use the terminal it provides.

Important Note for SSH

You now have a system that will work anytime you’re connected to the internet... on Cornell campus. Cornell's SSH machines can only be reached from the same network they are on. Never fear though: the solution to this is called a VPN, a Virtual Private Network. This is a service that routes your traffic through a server (in this case on Cornell campus) so that you can connect to the SSH machines even when you are not on Cornell Wi-Fi.

You can find the installation and connection instructions here. After installation, follow the connection tutorial on the same page to set up the connection.

Linux Command Overview

Now that we have a Linux environment to use, let’s learn some commands to make your Linux machine do your bidding! But first, we have a few more bits of knowledge for you. Note that there are some necessary tasks to complete along with the reading!

Which type of shell are we using?

We are using the bash shell, which is a typical default for Linux systems. Type the following command into your shell to check!

echo $0

Can we use editors within the shell?

Yep. You have several at your disposal: nano, emacs, and vim. Each has its own set of commands for actions like saving text, exiting, etc. If you have never used any of these editors before, we suggest starting with nano to get accustomed to using an editor within a shell, and because its commands are listed at the bottom of the nano screen, as shown in the screenshot below:

[Image: Nano screenshot]

The ^ stands for the Control key on your keyboard (this is true even for Mac users - do not use the Command key!).

Complete the following:

  1. To start nano, type nano into the shell.
  2. Type “I love math, science, and computer science!” into the nano window.
  3. To save your file, use Control-O. It will prompt you for a file name. Call it science.txt (to be consistent with the naming in a tutorial we will use later).
  4. To search for the word “science”, type Control-W, type “science” when prompted, hit Enter, and watch the cursor jump to the start of the first occurrence of “science”. To find the same string again, type Alt-W. Hit Alt-W one more time to return to the first occurrence.
  5. To cancel out of a prompt, use Control-C.
  6. You should now know enough to be able to experiment with the other commands on your own. But for now, to exit, type Control-X. If you did not save some changes, it will prompt you to save the file - just follow the prompts (e.g. hit Y to continue saving, and Enter if the filename shown is the one you want).
  7. To see your file again, you can type nano science.txt
  8. Keep reading to learn more commands!

How do I get files from my computer to the remote machine and back?

For this class, the best option is to use your git repository. Complete the following to get the repository on your remote machine:

  1. Use a web browser to log in to your GitHub account at http://github.coecis.cornell.edu/
  2. Navigate to your repository
  3. Click on Clone or download. You will need the link you see there (you may not be able to use copy and paste).
  4. Return to your shell. Type in git clone your_repository_link, where your_repository_link is what you found in step 3.

You should now be able to navigate through the downloaded directory. Use the git commands to add, commit, push, pull, etc. to update your files. If you need a refresher on how to use git for the command-line prompt, see the tutorials in the Git, etc. section of the course resources page.

What are man pages?

These are manual pages, not pages for men. If you cannot remember what parameters you need for a command, you can type in man command_name to obtain a description. For example, try man mv. Use the up and down arrows to scroll, and use q to leave the screen. You can even do man man to get information about man itself.

Part 2: UNIX Commands

Read this whole part carefully before starting!

Complete Tutorials 1, 2, and 3 from the UNIX Tutorial for Beginners.

For the second command under Section 2.1 Copying Files, instead of copying the file given on the website, copy your previously made science.txt file from the unixstuff parent directory using

cp ../science.txt . (note the dots - there are four in this line!)

Save information about your Linux distribution to a file called linux_info.txt. This is to verify that you genuinely have the correct environment set up. Try running the command lsb_release -a and looking at the output. The lsb_release command prints out information about your Linux distribution (Ubuntu). LSB stands for Linux Standard Base, which is a project to help standardize the various distributions of Linux. It should look something like this:

No LSB modules are available.
Distributor ID: LinuxMint
Description:    Linux Mint 17.3 Rosa
Release:    17.3
Codename:   rosa

The above is an example of lsb_release -a output created by running the command on a different version of Linux. Yours should look very similar.

Yes, you will see the message No LSB modules are available. From what we can figure, Ubuntu is not totally LSB compliant since it is "a considerable amount of work for little measurable benefit" according to those working on the Debian project.

Put the output of this command in a file by redirecting it, as you learned in Tutorial 3 of the UNIX tutorial. The redirection you learned redirects only stdout (standard output), which means the "No LSB modules are available." line will still be printed to the terminal and not to the file. This is because that line is written to stderr (standard error), and so isn't redirected with the rest of the text. This is fine: we're only looking for your file to contain the lines written to stdout, which means the contents of your file should look similar to the block above but without the first line.


Intro to C Overview

Welcome to C! It is a programming language that was first introduced almost half a century ago but is still one of the most commonly used programming languages due to its speed, efficiency, and ability to closely interface with hardware. Let’s finish discussing a few more terms:

What is a high-level programming language?

Thus far, you have been learning a machine language (the binary instructions) and an assembly language. These are what we call low-level programming languages, languages that are “close to the hardware”. High-level programming languages, on the other hand, are more similar to human languages and thus are easier for programmers to read and write. C is an example of a high-level programming language.

How do we get high-level programming languages to be read by the hardware?

Basically, we have to do a few conversions, where software like compilers and assemblers convert programs to other types of programs. Here is one possible chain:

High-level program -> Compiler -> Assembly language program -> Assembler -> Machine language program -> Linker (and machine code from libraries) -> Complete machine code -> Loader -> Machine code loaded into memory -> Hardware

I will provide basic explanations of some of these pieces below.

What is a compiler?

A compiler converts a high-level language program into either an assembly language program or directly into a machine language program that the hardware can read. Another way to look at it is that a compiler takes in source code and outputs an executable file containing binary instructions.

For this lab, you will use the GCC compiler to compile C code into an executable file. GCC stands for GNU Compiler Collection.

What is an assembler?

This converts an assembly language program into a machine language program (which is filled with just 0s and 1s). The machine language program is also known as object code.

What is a linker?

This takes the object code and code from external libraries and combines them into one complete piece of machine code: the executable file.

The GCC compiler also takes care of the linking portion of the process. So, typically when people talk about compiling, they really mean compiling and linking, which is what GCC does.

What is a loader?

This takes an executable file and reads the complete object code into memory. Then the processor finishes the job!

What is a statically-typed language?

This is a language in which the type of every variable is fixed at compile time. In other words, you have to give the type of each variable (e.g. int, char) in the source code. This means that the compiler has a chance to check whether you stored the right types of values in each variable before you get to run the program, which helps to catch bugs early in the coding process. C is a statically-typed language. (An example of a dynamically-typed language, where type errors are not caught until runtime, is Python.)

Part 3: Intro to C

Now that you know how all the pieces fit together, let's get back to coding in C! On your Ubuntu machine, open up a text editor and create a new file called hello.c. Type in the following C program:

#include <stdio.h>

int main() {
    printf("Hello world! I am [netid].\n");
    return 0;
}

But replace [netid] with your NetID (don't print square brackets). When you are done typing, save the file and exit the editor. Now you are ready to compile and run the program you just created! In your terminal, navigate to where you saved hello.c. Now run the command:

gcc -o sayhello hello.c

The C compiler (GCC) has compiled your source code hello.c into an executable named sayhello. The -o option allows us to name the resulting executable file anything we want. Without the flag, the default name given to the executable is a.out.

If this gives you any errors, make sure you are in the right working directory (use ls to confirm that hello.c is in the same directory). If you are in the right location, then you did not enter the program correctly — go back to your editor and fix the program. Otherwise, you have just compiled a C program. You can run your program by running the command:

./sayhello

And your program should run! It should print Hello world! I am [netid]. (with your NetID in place of [netid]) and do nothing else. For example, if your NetID is "abc123", then compilation and execution would look something like this:

~$ gcc -o sayhello hello.c
~$ ./sayhello
Hello world! I am abc123.

What does ./sayhello mean? The command ./sayhello means, "Run the executable sayhello in the current directory."

What to Submit

Congrats, you're done! It's time to submit some of the files you've made to CMS. To get these files from the machine you are ssh'ed into (or from the VM) onto your host machine so you can submit them, you'll likely have to push them to your GitHub repository. Feel free to make a new folder to do so. Please include:

  1. The science.txt file you made during the Linux Command Overview portion
  2. The linux_info.txt file you made in Part 2
  3. The hello.c file you made in Part 3, with your netid filled in