COMP 530: Lab 1: Simple Shell

The goal of this lab is to become familiar with low-level Unix/POSIX system calls related to process and job control, file access, and IPC (pipes and redirection). You will write a mini-shell with basic operations (a small subset of Bash's functionality). The expected length of this C program is 1000-2000 lines of code (not very long, but the code will be challenging, so start early).

Picking your group

You may do the lab alone, or in pairs. If you work in pairs, only one student will hand in the assignment for you.

Getting started

We will provide you with some initial source code to start from here. The baseline code does simple input parsing and echoes the result to standard output. You will extend this over the course of the assignment.

Helpful References

There are no required readings for this lab, but a few references explain how shells work in some detail. These references may provide substantial insight into how to complete this assignment. Do NOT copy and paste code from these sources into your assignment.

Core assignment

Write a C program named "thsh.c" (for Tar Heels SHell) that performs a subset of commands you're familiar with from other shells like GNU's Bash. You're welcome to study the code for bash, but the code you submit should be your own!

When you start your shell, you should be able to type commands such as this and see their output:

Note that commands like ls are (usually) just programs. There are a few built-in commands, discussed below. In general, though, the shell's job is to launch programs and coordinate their input and output.

Important: You do not need to reimplement any binaries that already exist, such as ls. You simply need to launch these programs appropriately and coordinate their execution.

Helpful and allowed interfaces

You are welcome to use any standard C version, including C99 or C11, as well as K&R, ANSI, or ISO C.

You will have to parse the command line and then use fork(2), clone(2), and/or exec(2) (or flavors of exec, such as execve, execle, etc.). Programs you run should write output to stdout and errors to stderr; programs you run should take input from stdin. You will have to study the wait(2) system call and its variants, so your shell can return the proper status codes. Don't spend time writing a full parser in yacc/lex: use plain str* functions to do your work, such as strtok(3). You may use any system call (section 2 of the man pages) or library call (section 3 of the man pages) for this assignment, other than system(3).

Hint: Note that, by convention, the name of the binary is the first argument to a program. Carefully check in the manual of the exec() variant you are using whether you should put the binary name in the argument list or not.
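As a rough illustration of the fork/exec/wait pattern described above, here is a minimal sketch. The helper name run_command is hypothetical (it is not part of the provided baseline code), and the sketch assumes you pick execvp(3), which takes the binary name as argv[0] and searches PATH for you. Your own design may differ.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program named by argv[0] with the
 * NULL-terminated argument list argv, wait for it to finish, and return
 * its exit status. Returns -1 if the child could not be created or did
 * not exit normally (e.g., it was killed by a signal). */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: by convention argv[0] is the name of the binary. */
        execvp(argv[0], argv);
        perror(argv[0]);   /* only reached if exec failed */
        _exit(127);
    }

    /* Parent: block until this particular child terminates. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would build the argument vector from the parsed command line, for example {"ls", "-l", NULL}, and store the returned value as the status of the last command.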

In general, your selection of libraries is unrestricted, with one important exception: you should avoid the use of system(), which is really just a wrapper for another shell. Speaking more broadly, it is not acceptable to simply write a wrapper for another shell---you should implement your own shell for this assignment.

Finding programs

Shells provide a nicer command-line environment by automatically searching common locations for commands. For instance, a user may type ls, and the shell will automatically figure out that the binary is actually located at /bin/ls. On Linux, the list of paths to search automatically is stored in the environment variable PATH.

When using PATH, check if the command includes a '/' character. If so, you may pass this command directly to the exec() system call, as the command itself is specifying a relative or absolute path. If the command does not include a '/' character, then the shell should try each of the values in the PATH list, e.g., ls should be checked as /usr/lib/lightdm/lightdm/ls, /usr/local/sbin/ls, /usr/sbin/ls, /usr/bin/ls, /sbin/ls, /bin/ls, /usr/games/ls, and /home/porter/bin/ls.

Hint: You can use the stat() system call to check whether a file exists, rather than relying on the more expensive exec() system call to fail.
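The search described above might look roughly like the following sketch. The helper name find_in_path and its signature are hypothetical; it takes the PATH value as a parameter (you would pass in the current value of PATH) and uses stat(2) to test each candidate. For simplicity it skips empty PATH entries and only accepts regular files.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: locate cmd using the colon-separated directory
 * list path_list. On success, writes the path to try into out (of size
 * outsz) and returns 0. Returns -1 if no candidate exists. */
int find_in_path(const char *cmd, const char *path_list,
                 char *out, size_t outsz)
{
    struct stat st;

    /* A command containing '/' is already a relative or absolute path. */
    if (strchr(cmd, '/') != NULL) {
        int n = snprintf(out, outsz, "%s", cmd);
        if (n < 0 || (size_t)n >= outsz)
            return -1;
        return stat(out, &st) == 0 ? 0 : -1;
    }

    const char *p = path_list ? path_list : "";
    while (*p != '\0') {
        size_t len = strcspn(p, ":");   /* length of this PATH entry */
        if (len > 0) {
            int n = snprintf(out, outsz, "%.*s/%s", (int)len, p, cmd);
            if (n > 0 && (size_t)n < outsz &&
                stat(out, &st) == 0 && S_ISREG(st.st_mode))
                return 0;
        }
        p += len;
        if (*p == ':')
            p++;                        /* step over the separator */
    }
    return -1;
}
```

Note that stat() only tells you the file exists; exec can still fail (for example, if the file is not executable), so that error path must be handled as well.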

You may use any exec() variant you like for this assignment, including variants that search the PATH for you. If you do not implement path searching yourself, be sure to test the case where the user changes the PATH (as described below), ensuring that the newer PATH value is used.

In general, environment variables are passed from the parent through the envp argument to main(). Be sure to parse these variables so that you can use them to find programs, as well as pass them to child processes.

Note: 'exit' is not a program you'll execute, but a built-in special program that should exit(3) from your shell.

Exercise 1. (15 points) Implement simple command parsing in your shell. Upon reading a line, launch the appropriate binary, or detect when the command is a special "built-in" command, such as exit. For now, exit is the only built-in command you need to worry about, but we will add more in the following exercises.

Before waiting for input, you should write the shell prompt thsh> to the screen. After each command completes, the shell should print another prompt.

The shell should print output from commands as output arrives, rather than buffering all output until the command completes. Similarly, if the user is typing input that should go to the running command via stdin, your shell should send these characters as soon as possible, rather than waiting until the user types a newline.

You do not need to clear characters from the screen if the user presses backspace (if this doesn't "just work" on your system). Simply rewrite the command on a new line without the missing character.

We will refine the parsing logic in subsequent exercises. Hint: you may want to read the input character by character, as some keystrokes may require action without a newline.

Be sure to use the PATH environment variable to search for commands. Be sure you handle the case where a command cannot be found.

When you are finished, your shell should be able to execute simple commands like ls and then exit.

If you build your shell correctly, you should be able to run your warmup program from lab0 inside.

Another built-in command you should support is 'cd' to change directory using the chdir(2) system call.

Exercise 2. (10 points.) Add support for changing the working directory, (i.e., cd). Verify that pwd works properly.

Note that the working directory can affect the interpretation of environment variables, as '.', the current working directory, is a valid entry in PATH.

Note that cd - should change to the last directory the user was in, and cd with no argument should go to a user's home directory (also stored in an environment variable).

Similarly, the built-in command should handle the targets cd . and cd .. properly. (Note that every directory includes these file names if you type ls -a, so this should not require special handling.)

Now that we can change directories, let's add some style to our shell. Any self-respecting shell has a fancier command prompt, which includes the working directory.

Exercise 3. (5 points.) Add the current working directory to your shell prompt. Rather than simply printing thsh> , instead print the current working directory in brackets, like this:

[/tmp] thsh> ls
# shows files in /tmp

Dealing with Zombies!

Whenever you fork a process (i.e., create a new process), the forked process runs in parallel with the forking process. If these processes are not synchronized properly, or do not terminate properly, you run the risk of creating a "zombie" process. A zombie process is a process that does not execute (is "dead") but does not terminate and go away ("die"), and as such continues to consume resources within the operating system such as process descriptors. If zombie processes accumulate, it is possible to slow down, hang, or crash the operating system.
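To make the problem concrete, the sketch below shows a child that is started without being waited for, plus one way to clean up such children later using waitpid(2) with WNOHANG. The names spawn_background and reap_children are hypothetical, and this is only one of several reasonable approaches (a SIGCHLD handler is another).

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start argv[0] without waiting for it. Until the
 * parent calls wait/waitpid, an exited child stays around as a zombie.
 * Returns the child's PID, or -1 if fork failed. */
pid_t spawn_background(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        perror(argv[0]);   /* only reached if exec failed */
        _exit(127);
    }
    return pid;
}

/* Hypothetical helper: collect every child that has already exited,
 * without blocking. Returns the number of children reaped. */
int reap_children(void)
{
    int reaped = 0;
    int status;

    /* waitpid returns 0 if children exist but none has exited yet, and
     * -1 if there are no children at all; both end the loop. */
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    return reaped;
}
```

Calling something like reap_children() before printing each prompt is one simple way to keep exited children from piling up.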

Since this assignment will be the first time many of you have created processes, there is an excellent chance you will create one or more zombie processes because of bugs in your program. If this happens, the server classroom.cs.unc.edu can become sluggish and/or hang.

That you will have bugs related to the use of fork is to be expected. However, to mitigate the effect of these bugs on the performance of the server, you need to take steps to limit the number of processes you can create. Every time you log into the servers classroom or snapper please execute the following commands from the command line:

The first command will show the limits of various resources you can consume. The second command limits the number of processes you can create to 10. The third command will again show your limits and allow you to confirm that you've correctly limited the number of processes you can create.

It is essential that you execute these commands every time you use the system. For this reason, the best thing to do is to edit the file ".cshrc" (with a leading period) in your home directory, and add the line "limit maxproc 10" to the file as the last line in the file. This will always set the process limit and then you don't manually have to do it every time you log in.

If you limit the maximum number of processes, then, if you have a bug in your program and are creating zombie processes, you'll eventually get an error message when you try to run your program (the message indicating the maximum number of processes has been exceeded). This error message will be the only indication you get that you have a bug in your program. Should this happen, you won't be able to continue testing your program until you kill off your zombie processes. To kill zombie processes, first, use the "ps" command to see the identities of the processes you've created:

where you replace YOUR-LOGIN-NAME with your Linux login name. For example, if this were my login session I'd type:

You can then use the "kill" command to kill any found zombie processes by using the process number (the PID) which is shown under the second column of output by the ps command. To kill a process use the command:

Generally, until you are certain your program is working, execute the ps command prior to logging out so that you can see if you are leaving behind any zombie processes and kill them before you log out.

Debugging

One feature which will help with development of your shell is to add debugging messages, which can be enabled when you start your shell.

Exercise 4. (5 points.) Add debugging messages to your shell

If you start thsh with -d, it should display debugging info on stderr:

every command executed should say "RUNNING: cmd", where cmd is replaced with the text of the command.
When a command ends, you should say "ENDED: "cmd" (ret=%d)" and show its exit status
add anything else to the debugging output (be creative)
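A minimal sketch of such debugging output follows. The names debug_enabled, parse_flags, format_ended, debug_running, and debug_ended are all hypothetical; the only parts taken from the exercise are the -d flag, the use of stderr, and the two message formats.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical global: nonzero once -d has been seen on the command line. */
int debug_enabled = 0;

/* Scan the shell's own arguments for -d. */
void parse_flags(int argc, char *argv[])
{
    for (int i = 1; i < argc; i++)
        if (strcmp(argv[i], "-d") == 0)
            debug_enabled = 1;
}

/* Build the ENDED message in buf; returns what snprintf returns. */
int format_ended(char *buf, size_t bufsz, const char *cmd, int ret)
{
    return snprintf(buf, bufsz, "ENDED: \"%s\" (ret=%d)", cmd, ret);
}

/* Print "RUNNING: cmd" on stderr, but only in debug mode. */
void debug_running(const char *cmd)
{
    if (debug_enabled)
        fprintf(stderr, "RUNNING: %s\n", cmd);
}

/* Print the ENDED message on stderr, but only in debug mode. */
void debug_ended(const char *cmd, int ret)
{
    char buf[1024];

    if (!debug_enabled)
        return;
    format_ended(buf, sizeof buf, cmd, ret);
    fprintf(stderr, "%s\n", buf);
}
```

Writing to stderr keeps the debugging text out of any output that a command's stdout is redirected to.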

Variables and Echo Support

In some sense, a shell actually defines a simple programming language. Like any self-respecting language, thsh should have variables. In order to avoid confusion with commands, our shell will require all variable names to start with a '$' character, to consist of either alphanumeric characters or a single "special" character (e.g., '?', '@', etc.), and to be terminated by a space or newline.

For now, we will just add a few simple variables, namely the environment variables and a special variable to store the return code ($?). You are welcome to add others if you like.

A shell user may use a variable in a command, and the shell will automatically replace the variable with the value of this variable. Similarly, a user may assign a new value to a variable (including an environment variable) using the built-in set commands. A useful tool for debugging variables is the echo program.

Exercise 5. (10 points.) Add variable support to thsh. You should be able to set variables, and use them in commands, as illustrated above.

Test your environment variable support with the printenv binary, which prints all of the environment variables and their values. Be sure that, if the shell user changes an environment variable, the output of printenv reflects this.

It is ok to treat all variables as environment variables. You may exclude or include $? from the environment variable list.
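Taking the "treat all variables as environment variables" route, substitution of a single token could be sketched as below. The helper name expand_token is hypothetical, and the sketch assumes the command line has already been split into space-separated tokens and that unset variables expand to the empty string. Assignment could then be implemented with setenv(3).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: expand one token into out (of size outsz).
 *   "$?"     becomes the decimal value of last_status
 *   "$NAME"  becomes the value of environment variable NAME ("" if unset)
 *   anything else is copied unchanged
 * Returns 0 on success, -1 if the result does not fit. */
int expand_token(const char *tok, int last_status, char *out, size_t outsz)
{
    int n;

    if (tok[0] != '$' || tok[1] == '\0') {
        n = snprintf(out, outsz, "%s", tok);
    } else if (strcmp(tok, "$?") == 0) {
        n = snprintf(out, outsz, "%d", last_status);
    } else {
        const char *val = getenv(tok + 1);   /* skip the leading '$' */
        n = snprintf(out, outsz, "%s", val ? val : "");
    }
    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}
```

Because children inherit the shell's environment, a value changed with setenv(3) should also show up in the output of printenv run from your shell.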

Redirection Support

One of the most powerful features of a Unix-like shell is the ability to compose a series of simple applications into a more complex workflow. The key feature that enables this composition is output redirection.

Redirection is accomplished by three special characters '<', '>', and '|'. You will need to add logic to your parsing code which identifies these characters and uses them to identify shell-level directives, rather than simply passing them to exec().

The first two characters direct input from a file into a program, and output from a program into a file, respectively.

In the example above, the standard output of ls -l is directed to a file, named newfile. If this file didn't exist previously, the shell created it. Note that the ls program does not know it is writing to a file, and is not passed the string '>newfile' as an argument. Similarly, the contents of newfile are passed to the cat program as its standard input.

Note that we are not constrained to just use standard input (handle 0) and output (handle 1) with these operators. You should be able to put an integer in front of the operator to indicate another handle, such as stderr.

You'll have to learn how to manipulate file descriptors carefully using system calls such as open, close, read/write, dup/dup2, and more.
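As a sketch of how these calls fit together, the code below redirects a child's stdin and/or stdout to files before calling exec. The names redirect_fd and run_redirected are hypothetical, and the file mode 0644 is an arbitrary choice. Since redirect_fd takes the target descriptor as a parameter, the same helper could serve a form such as 2>file.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: make target_fd refer to the file at path. */
static int redirect_fd(int target_fd, const char *path, int flags)
{
    int fd = open(path, flags, 0644);
    if (fd < 0)
        return -1;
    if (fd != target_fd) {
        if (dup2(fd, target_fd) < 0) {
            close(fd);
            return -1;
        }
        close(fd);   /* the duplicate on target_fd stays open */
    }
    return 0;
}

/* Hypothetical helper: run argv with stdin taken from infile and stdout
 * sent to outfile (either may be NULL for "leave unchanged").
 * Returns the child's exit status, or -1 on error. */
int run_redirected(char *const argv[], const char *infile,
                   const char *outfile)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Redirect in the child only, so the shell keeps its own fds. */
        if (infile && redirect_fd(STDIN_FILENO, infile, O_RDONLY) < 0) {
            perror(infile);
            _exit(126);
        }
        if (outfile && redirect_fd(STDOUT_FILENO, outfile,
                                   O_WRONLY | O_CREAT | O_TRUNC) < 0) {
            perror(outfile);
            _exit(126);
        }
        execvp(argv[0], argv);
        perror(argv[0]);
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The program being run never sees the redirection operator or the file name; it just reads and writes its usual descriptors.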

In this example, my shell creates three child processes. The first reads the contents of my home directory and outputs them to the grep program, which searches for the string '.txt'. The output of grep, i.e., all files with the .txt extension, is then sent to the wc program, which counts how many lines of input it is given (i.e., the number of .txt files in my home directory).
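A two-stage pipeline can be sketched with pipe(2) and dup2(2) as shown below. The helper name run_pipe is hypothetical, and it handles exactly two commands; a longer pipeline such as the three-process example would need one pipe per '|' and a loop.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "left | right" and return the exit status of
 * the right-hand command, or -1 on error. */
int run_pipe(char *const left[], char *const right[])
{
    int fds[2];   /* fds[0] is the read end, fds[1] is the write end */
    int status;

    if (pipe(fds) < 0) {
        perror("pipe");
        return -1;
    }

    pid_t lpid = fork();
    if (lpid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (lpid == 0) {
        /* Left child: stdout goes into the pipe. */
        if (dup2(fds[1], STDOUT_FILENO) < 0)
            _exit(126);
        close(fds[0]);
        close(fds[1]);
        execvp(left[0], left);
        perror(left[0]);
        _exit(127);
    }

    pid_t rpid = fork();
    if (rpid == 0) {
        /* Right child: stdin comes from the pipe. */
        if (dup2(fds[0], STDIN_FILENO) < 0)
            _exit(126);
        close(fds[0]);
        close(fds[1]);
        execvp(right[0], right);
        perror(right[0]);
        _exit(127);
    }

    /* The parent must close both ends, or the reader never sees EOF. */
    close(fds[0]);
    close(fds[1]);

    waitpid(lpid, &status, 0);
    if (rpid < 0 || waitpid(rpid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Forgetting to close an unused pipe end is a common cause of pipelines that hang, because the reading process keeps waiting for more input.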

Exercise 6. (15 points.) Add support for all three forms of redirection described above, as well as assigning inputs to arbitrary file handles other than stdin and stdout.
Be sure to run several test cases for piping applications together, and ensure that termination is handled cleanly.

Scripting Support

Most shells can be run interactively as well as non-interactively. In non-interactive mode, you can put the shell commands in a plain file---essentially creating a program of shell commands (called a shell script). For example, if I put this in a file called "foo.sh":

Then I can use this file (or program) to have the shell run these commands sequentially as follows:

In other words, thsh will identify the string 'foo.sh' on its own command line and then interpret these commands as a batch. In a batch, the first line runs to completion, then the second, and so forth. These commands do not need to run in parallel, except for pipes on the same line (described below).

One can also make the shell script executable, and then run it directly like any other program. For that, I need the file to start with a special character sequence called a 'shebang', followed by the path of the shell.

Note that thsh must be in your PATH for the shebang above to work, otherwise, you should use an absolute path, like /home/porter/lab2/thsh.

Exercise 7. (10 points.) Add support for thsh to run non-interactively: this boils down to basically supporting an optional input file argument. If 'testscript' is a shell script, the following examples should work, where '$' indicates your default shell (e.g., bash).

$ ./thsh testscript

[/home/porter] thsh> chmod u+x testscript
[/home/porter] thsh> ./testscript

You will also have to support a comment character '#', so if you see a line starting with '#' in the script, you should ignore it. A comment can also be entered interactively, where it should likewise "do nothing".

[/home/porter] thsh> #this is some text
[/home/porter] thsh>
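The two pieces this exercise adds, choosing the input source and skipping comment lines, could look roughly like this. The names is_comment_or_blank and open_input are hypothetical, and treating blank lines like comments is an extra assumption. Note that a shebang line also starts with '#', so the same check skips it.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if line holds nothing to execute, i.e.
 * it is empty, contains only whitespace, or its first non-blank
 * character is '#'. Returns 0 otherwise. */
int is_comment_or_blank(const char *line)
{
    while (*line != '\0' && isspace((unsigned char)*line))
        line++;
    return *line == '\0' || *line == '#';
}

/* Hypothetical helper: choose where commands come from. With no script
 * name the shell reads stdin (interactive mode); otherwise it reads the
 * named file. Returns NULL if the script cannot be opened. */
FILE *open_input(const char *script_path)
{
    if (script_path == NULL)
        return stdin;
    return fopen(script_path, "r");
}
```

With this split, the main loop can read lines from whichever FILE * it was given and only print a prompt when that stream is stdin.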

Job Control Support

Another useful feature of a shell is the ability to pause and resume execution of a job. Here, we define a job as a single command, which can be either a single process, or multiple processes in a pipeline. For instance, ps -eaf | grep foo would be a single job.

In the case of a long-running program, it is helpful to be able to place it in the "background"---allowing the user to issue more commands interactively while the long-running program continues execution.

Your shell should identify the special character '&', which means that a program should be executed in the background, returning a shell prompt immediately. The built-in command jobs should list all background running jobs, their name, PID, job number, etc. just like bash with their status (running or suspended). It should also print the exit status code of background jobs that just ended.

In addition to jobs, we will need to add a few more built-in commands to make job control useful. The command fg 3 should bring job number 3 in your list to the foreground (and resume its execution if it is stopped). The command bg 2 should cause suspended job 2 to run in the background.

Finally, we need to be able to forcibly pause or terminate a program. If you type Ctrl+C, the foreground program(s) should be killed. If you hit Ctrl+Z, the foreground program(s) should be suspended and added to the list of jobs (i.e., you send it a SIGTSTP signal to suspend it; fg sends it a SIGCONT to resume running).
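One way to structure the signal side of job control is sketched below: each job gets its own process group, the shell waits with WUNTRACED so it notices suspension, and SIGCONT is sent to the whole group to resume it. The names spawn_job, wait_foreground, resume_job, and enum job_state are hypothetical. The sketch leaves out the job list and terminal handling (a full implementation would likely also need tcsetpgrp(3)).

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum job_state { JOB_DONE, JOB_STOPPED, JOB_ERROR };

/* Hypothetical helper: start argv in its own process group so signals
 * can be sent to the whole job. Returns the PID (also the group ID). */
pid_t spawn_job(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        setpgid(0, 0);               /* child leads a new group */
        signal(SIGINT, SIG_DFL);     /* undo any ignoring done by the shell */
        signal(SIGTSTP, SIG_DFL);
        execvp(argv[0], argv);
        _exit(127);
    }
    if (pid > 0)
        setpgid(pid, pid);           /* also set in parent to avoid a race */
    return pid;
}

/* Wait for a foreground job. WUNTRACED makes waitpid return when the
 * child is suspended as well as when it terminates. */
enum job_state wait_foreground(pid_t pid, int *status)
{
    if (waitpid(pid, status, WUNTRACED) < 0)
        return JOB_ERROR;
    return WIFSTOPPED(*status) ? JOB_STOPPED : JOB_DONE;
}

/* Resume a suspended job, as fg and bg would: a negative PID passed to
 * kill(2) addresses every process in that group. */
int resume_job(pid_t pgid)
{
    return kill(-pgid, SIGCONT);
}
```

When wait_foreground reports JOB_STOPPED, the shell would add the job to its list and print a new prompt instead of waiting further.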

Exercise 8. (15 points.) Add support for job control, including the '&' character, the built-in commands jobs, fg, and bg, and Ctrl+C and Ctrl+Z. Be sure to run plenty of tests, including handling of piped applications or scripts.

Fun

Exercise 9. (5 points.) Create a built-in command, or a separate program, called goheels that draws a Tar Heel on the console using ASCII art. You are welcome to use an ASCII art generator, or draw your own by hand.

Contests

In order to encourage creativity and a bit of friendly competition, the instructor and TAs will judge a few contests. The prizes will be bonus points. Only teams that complete all exercises will be eligible to win.

Challenge! (10 bonus points) The team that implements its shell in the fewest lines of readable, clean code will get a bonus. This count excludes blank lines and comments (comments are always welcome). Code that is confusing and difficult to read, as subjectively judged by the course staff, will be disqualified.

Winners will be announced in class after the grading of lab 1 is complete. More than the points, of course, is the pride of winning.

Late submissions that are handed in after the judging has started (probably a few days to a week after the deadline), will not be included.

For this exercise, it is fine to set a compile-time history length, such as 50 lines.

In order for my history to survive after the shell exits, most shells will write a file in the user's home directory, such as /home/porter/.thsh_history. For full credit, your shell should persistently store the user's command history. You can use environment variables to figure out where the user's home directory is.

Challenge! (5 bonus points.) Add support for tracking the history of a user, including saving the history to a file. Support the up and down keys to cycle through history, and add a built-in command, history, that dumps the entire history to the console. Also, add a built-in command, clear, to reset the history.

Style and More

Aside from testing the proper functionality of your code, we will also evaluate the quality of your code. Be sure to use a consistent style, document your code well, and break your code into separate functions and/or source files where it makes sense.

It should not be possible for a user of your program to ever make your program crash or hang. If your program has some limitation (e.g., the command line is limited to being x characters or less), you must detect when the limitation is reached and take an appropriate action (e.g., output a meaningful error message).

Along the lines of the previous point, any error conditions generated by fork or exec (or any other system calls you make) should be processed by your program and should result in the generation and output of an appropriate error message.

To be sure your code is very clean, it must compile with "gcc -Wall -Werror" without any errors or warnings!

If the various sources you use require common definitions, then do not duplicate the definitions. Make use of C's code-sharing facilities.

You must include a README file with this and any assignment. The README file should describe what you did, what approach you took, results of any measurements you made, which files are included in your submission and what they are for, etc. Feel free to include any other information you think is helpful to us in this README; it can only help your grade.

Challenge! (5 bonus points) Support time counting. If you start thsh with -t, it should count how long each program ran and print stats when the program ends. The output should be something like:

$ thsh -t
thsh> du -sh /usr
4.3MB /usr
TIMES: real=23.7s user=12.1s sys=7.0s

Be sure to note this in challenge.txt if you do this.

Challenge! (10 bonus points) Support file "globbing" for extensions, such as

thsh> ls *.jpg

The above should print all the file names that end with ".jpg". Only support *.[EXTENSION]. That is, you'll need to check to see if an argument starts with an '*', then use readdir(3) or getdents(2) as needed to read all files from the current directory, match them -- using strstr(3) -- and add them to the list of args you pass to exec(2). In other words, your shell will be exec-ing a command that'll be as if you typed the full names of all the files on the command line one by one.
Be sure to note this in challenge.txt if you do this.

Challenge! (10 bonus points) Add support for "tab completion" in your shell. If a user types a prefix of a command and then hits the "Tab" key twice, the shell should show all possible commands that match the prefix. If only one command is possible, the shell should automatically fill in the rest of the command. If all possible commands share subsequent letters, automatically fill in letters until the commands diverge.
Hint: Consider using a trie data structure to organize the available commands.
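The "fill in letters until the commands diverge" step amounts to finding the longest prefix shared by all matching commands. The sketch below does this with a plain linear scan rather than the trie suggested in the hint, which is simpler but does not help with finding the matches in the first place. The name common_prefix_len is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: length of the longest prefix shared by all count
 * strings in names. Returns 0 if count is 0. */
size_t common_prefix_len(const char *const names[], size_t count)
{
    if (count == 0)
        return 0;

    size_t len = strlen(names[0]);
    for (size_t i = 1; i < count; i++) {
        size_t j = 0;
        /* Advance while names[i] still agrees with the first name. */
        while (j < len && names[i][j] != '\0' && names[i][j] == names[0][j])
            j++;
        len = j;   /* the shared prefix can only shrink */
    }
    return len;
}
```

If the user has typed fewer characters than the returned length, the shell can append the remaining shared characters to the command line.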

Hand-In Procedure

For all programming assignments you will "turn in" your program for grading by placing it in a special directory on a Department of Computer Science Linux machine and sending mail to the TA alias (comp530ta-f16 at cs dot unc dot edu). To ensure that we can grade your assignments in an efficient and timely fashion, please follow these guidelines precisely. Failure to do so may result in a failing score for this assignment.

As before, create a directory named lab1 (inside your ~/comp530/submissions directory). Note that Linux file names are case sensitive and hence case matters!

When you have completed your assignment you should put your program and any other necessary parts (header files, etc.) in the specified subdirectory and send mail to the TA (comp530ta-f16 at cs dot unc dot edu), indicating that the program is ready for grading. Be sure to include "COMP 530" in the subject line. In this email, indicate if you worked alone or with a partner (and both team members' names and CS login names, as many of you have emails different from your CS login); be sure to cc your partner on this email. If you used any late hours, please also indicate both how many late hours you used on this assignment, and in total, so that we can agree.

After you send this email, do not change any of your files for this assignment (unless you are re-turning in late, as prescribed by the lateness policy)! If the timestamps on the files change you will be penalized for turning in a late assignment. If your program has a timestamp after the due date it will be considered late. If you wish to keep fiddling with your program after you submit it, you should make a copy of your program and work on the copy and should not modify the original.

All programs will be tested on classroom.cs.unc.edu. All programs, unless otherwise specified, should be written to execute in the current working directory. Your correctness grade will be based solely on your program's performance on classroom.cs.unc.edu. Make sure your programs work on classroom!

Functional correctness will be based on the program's ability to parse and execute arbitrary commands, deal with errors (e.g., "file not found"), and deal with arbitrary behavior of the child process (premature termination, faulting, etc.).

Generally, unless the homework assignment specifies otherwise, you should compile your program without using any special command line arguments ("flags") or compiler options.

The program should be neatly formatted (i.e., easy to read) and well-documented. In general, 75% of your grade for a program will be for correctness, 25% for "programming style" (appropriate use of language features [constants, loops, conditionals, etc.], including variable/procedure/class names), and documentation (descriptions of functions, general comments [problem description, solution approach], use of invariants, pre-and post conditions where appropriate).

Make sure you put your name(s) in a header comment in every file you submit. Make sure you also put an Honor pledge in every file you submit.