CSE 306: Lab 2: The Shell

You may do the lab alone or in pairs. If you work in a pair, only one of you will hand in the assignment on behalf of both.

Please email the instructor your group preference as soon as possible. Once we have your group membership, we will add group permissions to one of your git repositories on scm, which you will use to hand in the assignment. These group permissions will allow you to share code with your partner. If you choose to work alone, please email this to the instructor.

Important: If you work in a group, one partner will hand in the assignment for both. You are welcome to coordinate code sharing with your partner however you prefer, including emailing code, setting up your own version control system, etc.

Introduction

The goal of this lab is to become familiar with low-level Unix/POSIX system calls related to process and job control, file access, and IPC (pipes and redirection). You will write a mini-shell with basic operations (a small subset of Bash's functionality). The expected length of this C program is 1000-2000 lines of code (not very long, but the code will be challenging, so start early).

Getting started

We will provide you with some initial source code to start from. To fetch that source, use Git to commit your Lab 1 source, fetch the latest version of the course repository, and then create a local branch called lab2 based on our lab2 branch, origin/lab2:

The git checkout -b lab2 origin/lab2 command actually does two things: it first creates a local branch lab2 that is based on the origin/lab2 branch provided by the course staff, and second, it changes the contents of your lab directory to reflect the files stored on the lab2 branch. Git allows switching between existing branches using git checkout branch-name, though you should commit any outstanding changes on one branch before switching to a different one.

You will now need to merge the changes you made in your master (lab1) branch into the lab2 branch, with the git merge master command.

In some cases, Git may not be able to figure out how to merge your changes with the new lab assignment (e.g. if you modified some of the code that is changed in the second lab assignment). In that case, the git merge command will tell you which files are conflicted, and you should first resolve the conflict (by editing the relevant files) and then commit the resulting files with git commit -a.

Sharing code with a partner

We will set up group permission to one partner's git repository on scm. Suppose Partner A is the one handing in the code. Partner A should follow the instructions above to merge the lab2 code. After Partner A has pushed this change to scm, Partner B should simply clone Partner A's repository and use it. For example:

Note that, after you let the course staff know your partner selection, it may take a few days for the tech staff to apply these permission changes. Again, you are not required to use git to coordinate changes, only to hand in the assignment, but we recommend you learn to use git. You may use any means you like to share code with your partner.

Hand-In Procedure

When you are ready to hand in your lab code and write-up, create a file called slack.txt noting how many late hours you have used both for this assignment and in total. (This is to help us agree on the number that you have used.) This file should contain a single line formatted as follows (where n is the number of late hours):

Then run make handin in the labs directory. If you submit multiple times, we will take the latest submission and count late hours accordingly.

In this and all other labs, you may complete challenge problems for extra credit. If you do this, please create a file called challenge.txt, which includes a short (e.g., one or two paragraph) description of what you did to solve your chosen challenge problem and how to test it. If you implement more than one challenge problem, you must describe each one.

This lab does not include any questions for you to answer, but you should document your design in the README file.

Helpful References

There are no required readings for this lab, but a few references explain how shells work in some detail. These references may provide substantial insight into how to complete this assignment. Do NOT copy and paste code from these sources into your assignment.

Core assignment

Write a C program named "swish" (for SeaWolves Interactive SHell) that performs a subset of commands you're familiar with from other shells like GNU's Bash. You're welcome to study the code for bash, but the code you submit should be your own!

When you start your shell, you should be able to type commands such as this and see their output:

Note that commands like ls are (usually) just programs. There are a few built-in commands, discussed below. In general, though, the shell's job is to launch programs and coordinate their input and output.

Important: You do not need to reimplement any binaries that already exist, such as ls. You simply need to launch these programs appropriately and coordinate their execution.

Helpful and allowed interfaces

You are welcome to use any standard C version, including C99 or C11, as well as K&R, ANSI, or ISO C.

You will have to parse the command line and then use fork(2), clone(2), and/or exec(2) (or flavors of exec, such as execve, execle, etc.). Programs you run should write output to stdout and errors to stderr, and should take input from stdin. You will have to study the wait(2) system call and its variants, so your shell can return the proper status codes. Don't spend time writing a full parser in yacc/lex: use plain str* functions to do your work, such as strtok(3). You may use any system call (section 2 of the man pages) or library call (section 3 of the man pages) for this assignment, other than system(3).

Hint: Note that, by convention, the name of the binary is the first argument to a program. Carefully check in the manual of the exec() variant you are using whether you should put the binary name in the argument list or not.

In general, your selection of libraries is unrestricted, with one important exception: you should avoid the use of system(), which is really just a wrapper for another shell. Speaking more broadly, it is not acceptable to simply write a wrapper for another shell---you should implement your own shell for this assignment.
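
As a rough illustration of the fork/exec/wait pattern described above, here is a minimal sketch. The helper name run_command and the convention of returning 128 plus the signal number are assumptions made for this example, not part of the assignment. Note that argv[0] carries the program name, as the hint above points out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with the given arguments and return its
 * exit status, 128 + signal number if a signal killed it, or -1 if fork or
 * waitpid failed. */
int run_command(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: by convention argv[0] is the name of the binary. */
        execvp(argv[0], argv);
        perror(argv[0]);   /* only reached if exec failed */
        _exit(127);
    }

    /* Parent: wait for the child and decode its status. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

This sketch uses execvp, which searches PATH on its own; a shell that does its own path lookup would call execv or execve with the resolved path instead.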

Finding programs

Shells provide a nicer command-line environment by automatically searching common locations for commands. For instance, a user may type ls, and the shell will automatically figure out that the binary is actually located at /bin/ls. On Linux, the list of paths to search is stored in the environment variable PATH.

When using PATH, check if the command includes a '/' character. If so, you may pass this command directly to the exec() system call, as the command itself is specifying a relative or absolute path. If the command does not include a '/' character, then the shell should try each of the values in the PATH list in order. For example, ls should be checked as /usr/lib/lightdm/lightdm/ls, /usr/local/sbin/ls, /usr/sbin/ls, /usr/bin/ls, /sbin/ls, /bin/ls, /usr/games/ls, and /home/porter/bin/ls.

Hint: You can use the stat() system call to check whether a file exists, rather than relying on the more expensive exec() system call to fail.

You may use any exec() variant you like for this assignment, including variants that search the PATH for you. If you do not implement path searching yourself, be sure to test the case where the user changes the PATH (as described below), ensuring that the newer PATH value is used.
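
One possible way to do the lookup yourself is sketched below. The function name find_in_path and its buffer-based interface are hypothetical. The sketch only checks that a regular file exists with stat(); it does not check the execute permission bit, which a fuller version might do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: resolve cmd against a colon-separated path list.
 * On success writes the resolved path to out and returns 0; returns -1 if
 * no entry contains a regular file named cmd (or out is too small). */
int find_in_path(const char *cmd, const char *path, char *out, size_t outsz)
{
    assert(cmd != NULL && path != NULL && out != NULL && outsz > 0);

    if (strchr(cmd, '/') != NULL) {
        /* Already a relative or absolute path: use it as given. */
        int n = snprintf(out, outsz, "%s", cmd);
        return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
    }

    const char *p = path;
    for (;;) {
        size_t len = strcspn(p, ":");   /* length of this PATH entry */
        /* An empty entry conventionally means the current directory. */
        int n = (len == 0)
            ? snprintf(out, outsz, "./%s", cmd)
            : snprintf(out, outsz, "%.*s/%s", (int)len, p, cmd);
        struct stat st;
        if (n >= 0 && (size_t)n < outsz &&
            stat(out, &st) == 0 && S_ISREG(st.st_mode))
            return 0;
        if (p[len] == '\0')
            break;                      /* that was the last entry */
        p += len + 1;                   /* skip past the ':' */
    }
    return -1;
}
```

Passing the current value of PATH on every lookup, rather than caching it at startup, means a later change to PATH takes effect immediately.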

In general, environment variables are passed from the parent through the envp argument to main(). Be sure to parse these variables so that you can use them to find programs, as well as pass them to child processes.

Note: 'exit' is not a program you'll execute, but a built-in special program that should exit(3) from your shell.

Exercise 1. (15 points) Implement simple command parsing in your shell. Upon reading a line, launch the appropriate binary, or detect when the command is a special "built-in" command, such as exit. For now, exit is the only built-in command you need to worry about, but we will add more in the following exercises.

Before waiting for input, you should write the shell prompt swish> to the screen. After each command completes, the shell should print another prompt.

The shell should print output from commands as output arrives, rather than buffering all output until the command completes. Similarly, if the user is typing input that should go to the running command via stdin, your shell should send these characters as soon as possible, rather than waiting until the user types a newline.

You do not need to clear characters from the screen if the user presses backspace. Simply rewrite the command on a new line without the missing character. Note, there is a challenge problem at the end to add backspace support.

We will refine the parsing logic in subsequent exercises. Hint: you may want to read the input character by character, as some keystrokes may require action without a newline.

Be sure to use the PATH environment variable to search for commands. Be sure you handle the case where a command cannot be found.

When you are finished, your shell should be able to execute simple commands like ls and then exit.

If you build your shell correctly, you should be able to run your fileutil program from hw1 inside.
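
Once a complete line has been read, splitting it into arguments with strtok(3) might look like the sketch below. The names tokenize and is_builtin and the MAX_ARGS limit are assumptions for this example; the sketch does not handle quoting or the special characters introduced in later exercises.

```c
#include <assert.h>
#include <string.h>

#define MAX_ARGS 64   /* arbitrary limit chosen for this sketch */

/* Split line in place on spaces, tabs, and newlines. Fills argv with
 * pointers into line, NULL-terminates the array (as exec expects), and
 * returns the number of tokens. */
int tokenize(char *line, char *argv[MAX_ARGS])
{
    assert(line != NULL && argv != NULL);

    int argc = 0;
    for (char *tok = strtok(line, " \t\n");
         tok != NULL && argc < MAX_ARGS - 1;
         tok = strtok(NULL, " \t\n"))
        argv[argc++] = tok;
    argv[argc] = NULL;
    return argc;
}

/* Returns 1 if the command must be handled inside the shell itself.
 * For Exercise 1 the only built-in is exit. */
int is_builtin(const char *cmd)
{
    return cmd != NULL && strcmp(cmd, "exit") == 0;
}
```

Because strtok writes NUL bytes into its input, the line buffer must be writable and must stay alive for as long as the argument pointers are in use.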

Two more built-in commands you should support are 'cd', which changes directory using the chdir(2) system call, and 'pwd', which prints the current working directory via getcwd(3):

Exercise 2. (10 points.) Add support for changing the working directory, including cd and pwd.

Note that the working directory can affect the interpretation of environment variables, as '.', the current working directory, is a valid entry in PATH.

Note that cd - should change to the last directory the user was in, and cd with no argument should go to a user's home directory (also stored in an environment variable).

Similarly, the built-in command should handle the targets cd . and cd .. properly. (Note that every directory includes these file names if you type ls -a, so this should not require special handling.)
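
The cd rules above could be sketched as follows. The function name builtin_cd and the static prev_dir buffer are hypothetical choices for this example; a real shell might instead keep the previous directory in a variable such as OLDPWD.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

static char prev_dir[PATH_MAX];  /* directory we were in before the last cd */

/* Hypothetical built-in. arg is the argument to cd, or NULL for a bare
 * "cd". Returns 0 on success and -1 on failure. */
int builtin_cd(const char *arg)
{
    char cwd[PATH_MAX];
    const char *target = arg;

    assert(sizeof prev_dir == sizeof cwd);  /* the memcpy below relies on this */

    if (getcwd(cwd, sizeof cwd) == NULL) {
        perror("getcwd");
        return -1;
    }
    if (target == NULL) {
        target = getenv("HOME");            /* bare "cd" goes home */
        if (target == NULL) {
            fprintf(stderr, "cd: HOME not set\n");
            return -1;
        }
    } else if (strcmp(target, "-") == 0) {
        if (prev_dir[0] == '\0') {
            fprintf(stderr, "cd: no previous directory\n");
            return -1;
        }
        target = prev_dir;                  /* "cd -" goes back */
    }
    if (chdir(target) != 0) {
        perror("cd");
        return -1;
    }
    /* Only after a successful chdir, remember where we came from. */
    memcpy(prev_dir, cwd, sizeof prev_dir);
    return 0;
}
```

Targets such as . and .. need no special case here because chdir(2) resolves them like any other path.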

Now that we can change directories, let's add some style to our shell. Any self-respecting shell has a fancier command prompt, which includes the working directory.

Exercise 3. (5 points.) Add the current working directory to your shell prompt. Rather than simply printing swish> , instead print the current working directory in brackets, like this:


[/tmp] swish> ls
# shows files in /tmp
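
A prompt in this format can be built with getcwd(3), as in the sketch below. The function name format_prompt is hypothetical, and falling back to the plain prompt when getcwd fails is an assumption of this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Write the prompt into buf and return the value of snprintf, i.e. the
 * number of characters the full prompt needs. */
int format_prompt(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);

    char cwd[PATH_MAX];
    if (getcwd(cwd, sizeof cwd) == NULL)
        return snprintf(buf, size, "swish> ");   /* fallback: no directory */
    return snprintf(buf, size, "[%s] swish> ", cwd);
}
```

Since the prompt does not end in a newline, it should be written with write(2), or followed by fflush(stdout), so that it appears before the shell starts waiting for input.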

Debugging

One feature which will help with development of your shell is to add debugging messages, which can be enabled when you start your shell.

Exercise 4. (5 points.) Add debugging messages to your shell.

If you start swish with -d, it should display debugging info on stderr:

Every command executed should print "RUNNING: cmd", where cmd is replaced with the text of the command.
When a command ends, the shell should print "ENDED: "cmd" (ret=%d)", showing its exit status.
Add anything else you like to the debugging output (be creative).

Variables and Echo Support

In some sense, a shell actually defines a simple programming language. Like any self-respecting language, swish should have variables. To avoid confusion with commands, our shell will require every variable reference to start with a '$' character. A variable name is either alphanumeric or a single "special" character (e.g., '?', '@', etc.), and is terminated by a space or newline.

For now, we will just add a few simple variables, namely the environment variables and a special variable to store the return code ($?). You are welcome to add others if you like.

A shell user may use a variable in a command, and the shell will automatically replace the variable with its value. Similarly, a user may assign a new value to a variable (including an environment variable) using the built-in set command. A useful tool for debugging variables is the echo program.

Exercise 5. (10 points.) Add variable support to swish. You should be able to set variables, and use them in commands, as illustrated above.

Test your environment variable support with the printenv binary, which prints all of the environment variables and their values. Be sure that, if the shell user changes an environment variable, the output of printenv reflects this.

It is ok to treat all variables as environment variables. You may exclude or include $? from the environment variable list.
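
Under the simplification that every variable is an environment variable, substitution for a single token might be sketched as below. The function name expand_token is hypothetical, and treating an unset variable as the empty string is an assumption of this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Expand one whitespace-delimited token into out.
 *   "$?"     becomes the last exit status
 *   "$NAME"  becomes the value of environment variable NAME ("" if unset)
 *   anything else is copied unchanged
 * Returns 0 on success, -1 if out is too small. */
int expand_token(const char *tok, int last_status, char *out, size_t outsz)
{
    assert(tok != NULL && out != NULL && outsz > 0);

    int n;
    if (strcmp(tok, "$?") == 0) {
        n = snprintf(out, outsz, "%d", last_status);
    } else if (tok[0] == '$' && tok[1] != '\0') {
        const char *val = getenv(tok + 1);     /* skip the leading '$' */
        n = snprintf(out, outsz, "%s", val ? val : "");
    } else {
        n = snprintf(out, outsz, "%s", tok);
    }
    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}
```

If assignments are stored with setenv(3), child processes started afterwards inherit the new values, so printenv will reflect them without extra work.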

Redirection Support

One of the most powerful features of a Unix-like shell is the ability to compose a series of simple applications into a more complex workflow. The key feature that enables this composition is output redirection.

Redirection is accomplished by three special characters '<', '>', and '|'. You will need to add logic to your parsing code which identifies these characters and uses them to identify shell-level directives, rather than simply passing them to exec().

The first two characters direct input from a file into a program, and output from a program into a file, respectively.

In the example above, the standard output of ls -l is directed to a file, named newfile. If this file didn't exist previously, the shell created it. Note that the ls program does not know it is writing to a file, and is not passed the string '>newfile' as an argument. Similarly, the contents of newfile are passed to the cat program as its standard input.

Note that we are not constrained to just use standard input (handle 0) and output (handle 1) with these operators. You should be able to put an integer in front of the operator to indicate another handle, such as stderr.
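
Recognizing a redirection token, with or without a leading descriptor number, could be sketched like this. The struct redirect type and the parse_redirect function are hypothetical, and the sketch assumes the operator and file name are written without a space between them, as in '>newfile'.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* One parsed redirection such as ">newfile", "<input", or "2>errors". */
struct redirect {
    int fd;               /* descriptor to replace in the child */
    char op;              /* '<' or '>' */
    const char *target;   /* file name; points into the original token */
};

/* Returns 1 and fills *r if tok is a redirection, 0 otherwise. */
int parse_redirect(const char *tok, struct redirect *r)
{
    assert(tok != NULL && r != NULL);

    const char *p = tok;
    while (isdigit((unsigned char)*p))   /* optional descriptor number */
        p++;
    if ((*p != '<' && *p != '>') || p[1] == '\0')
        return 0;                        /* no operator, or no file name */

    r->op = *p;
    if (p == tok)
        r->fd = (*p == '<') ? 0 : 1;     /* defaults: stdin or stdout */
    else
        r->fd = atoi(tok);               /* atoi stops at the operator */
    r->target = p + 1;
    return 1;
}
```

In the child, after fork and before exec, the shell would open the target file and use dup2 to install it on the parsed descriptor.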

You'll have to learn how to manipulate file descriptors carefully using system calls such as open, close, read/write, dup/dup2, and more.

In this example, my shell creates three child processes. The first reads the contents of my home directory and outputs them to the grep program, which searches for the string '.txt'. The output of grep, i.e., all files with the .txt extension, is then sent to the wc program, which counts how many lines of input it is given (i.e., the number of .txt files in my home directory).

Exercise 6. (15 points.) Add support for all three forms of redirection described above, as well as assigning inputs to arbitrary file handles other than stdin and stdout.
Be sure to run several test cases for piping applications together, and ensure that termination is handled cleanly.
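
A two-stage pipeline built from pipe(2) and dup2(2) might look like the following sketch. The names spawn and run_pipeline are hypothetical, and the example is limited to exactly two commands; a real shell would generalize this to any number of stages.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start argv with stdin taken from in_fd and stdout sent to out_fd.
 * close_fd is a descriptor the child must not keep open (or -1). */
static pid_t spawn(char *const argv[], int in_fd, int out_fd, int close_fd)
{
    pid_t pid = fork();
    if (pid == 0) {
        if (in_fd != STDIN_FILENO)   { dup2(in_fd, STDIN_FILENO);   close(in_fd); }
        if (out_fd != STDOUT_FILENO) { dup2(out_fd, STDOUT_FILENO); close(out_fd); }
        if (close_fd >= 0)
            close(close_fd);
        execvp(argv[0], argv);
        perror(argv[0]);
        _exit(127);
    }
    return pid;
}

/* Run "left | right" with right's stdout going to out_fd.
 * Returns right's exit status, or -1 on failure. */
int run_pipeline(char *const left[], char *const right[], int out_fd)
{
    assert(left != NULL && left[0] != NULL);
    assert(right != NULL && right[0] != NULL);

    int fds[2];
    if (pipe(fds) < 0) {
        perror("pipe");
        return -1;
    }
    pid_t l = spawn(left, STDIN_FILENO, fds[1], fds[0]);
    pid_t r = spawn(right, fds[0], out_fd, fds[1]);

    /* The parent must close both ends, or right never sees end-of-file. */
    close(fds[0]);
    close(fds[1]);

    int status = 0, result = -1;
    if (l > 0)
        waitpid(l, &status, 0);
    if (r > 0 && waitpid(r, &status, 0) == r && WIFEXITED(status))
        result = WEXITSTATUS(status);
    return result;
}
```

The most common bug with this pattern is a leftover open write end of the pipe, in the parent or in the reading child, which keeps the reader blocked forever.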

Scripting Support

Most shells can be run interactively as well as non-interactively. In non-interactive mode, you can put the shell commands in a plain file---essentially creating a program of shell commands (called a shell script). For example, if I put this in a file called "foo.sh":

Then I can use this file (or program) to have the shell run these commands sequentially as follows:

In other words, swish will identify the string 'foo.sh' on its own command line and then interpret the commands in that file as a batch. In a batch, the first line runs to completion, then the second, and so forth. These commands do not need to run in parallel, except for pipes on the same line (described above).

One can also make the shell script executable, and then run it directly like any other program. For that, the file needs to start with a special character sequence called a 'shebang', followed by the path of the shell.

Note that swish must be in your PATH for the shebang above to work, otherwise, you should use an absolute path, like /home/porter/lab2/swish.

Exercise 7. (10 points.) Add support for swish to run non-interactively: this boils down to basically supporting an optional input file argument. If 'testscript' is a shell script, the following examples should work, where '$' indicates your default shell (e.g., bash).


$ ./swish testscript


[/home/porter] swish> chmod u+x testscript
[/home/porter] swish> ./testscript

You will also have to support the comment character '#': if a line in the script starts with '#', you should ignore it. A comment can also be entered interactively, in which case it should "do nothing".


[/home/porter] swish> #this is some text
[/home/porter] swish>
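
Choosing the input source and skipping comment lines could be sketched as below. The names is_skippable and open_input are hypothetical. Allowing whitespace before the '#' and treating blank lines as skippable are assumptions that go slightly beyond the text above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Returns 1 if the line should be ignored: it is blank, or its first
 * non-space character is '#'. A shebang line is also a comment. */
int is_skippable(const char *line)
{
    assert(line != NULL);
    while (isspace((unsigned char)*line))
        line++;
    return *line == '\0' || *line == '#';
}

/* Choose where commands come from: the first non-option argument is
 * treated as a script file; with no such argument, read stdin.
 * Returns NULL if the script cannot be opened. */
FILE *open_input(int argc, char *argv[])
{
    assert(argv != NULL);
    for (int i = 1; i < argc; i++)
        if (argv[i][0] != '-')
            return fopen(argv[i], "r");
    return stdin;
}
```

When a script is run through its shebang line, the kernel starts the interpreter with the script path as an argument, so the same open_input logic would cover both ways of running a script.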

Job Control Support

Another useful feature of a shell is the ability to pause and resume execution of a job. In the case of a long-running program, it is helpful to be able to place it in the "background"---allowing the user to issue more commands interactively while the long-running program continues execution.

Your shell should identify the special character '&', which means that a program should be executed in the background, returning a shell prompt immediately. The built-in command jobs should list all background jobs with their name, PID, job number, etc., along with their status (running or suspended), just like bash. It should also print the exit status code of background jobs that just ended.

In addition to jobs, we will need to add a few more built-in commands to make job control useful. The command fg 3 should bring job number 3 in your list to the foreground (and resume its execution if it is stopped). The command bg 2 should cause suspended job 2 to resume running in the background.

Finally, we need to be able to forcibly pause or terminate a program. If you type Ctrl+C, the foreground program(s) should be killed. If you type Ctrl+Z, the foreground program(s) should be suspended and added to the list of jobs (i.e., you send them a SIGTSTP signal to suspend them; fg sends a SIGCONT to resume them).

Important: Because you will be launching swish from a parent shell, you need to disable job control on the parent shell. Otherwise, the parent will intercept control key sequences such as Ctrl+Z or Ctrl+C. We have included a launcher script, invoked as . ./launcher.sh, which will disable job control in the parent shell, so that swish will receive control sequences on standard input appropriately. Note also that you must include a single . in front of the launcher script. This dot causes the script to be "sourced", or executed within the parent process, rather than only in the child. We need to do this in order to disable job control in the parent shell. Because this launcher was not included in the original handout, you may need to git pull to fetch it from the read-only "origin" repository.

Exercise 8. (15 points.) Add support for job control, including the '&' character, the built-in commands jobs, fg, and bg, and Ctrl+C and Ctrl+Z. Be sure to run plenty of tests, including handling of piped applications or scripts.
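
To tell a suspended job apart from a finished one, the shell can pass WUNTRACED to waitpid(2), as in the sketch below. The names start_job, wait_job, and enum job_state are hypothetical. Putting each job in its own process group is one common design, not something the assignment mandates, and the sketch leaves out the job table and terminal handling.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum job_state { JOB_EXITED, JOB_KILLED, JOB_STOPPED, JOB_ERROR };

/* Start argv in its own process group so a signal can later be sent to
 * the whole job. Returns the child's pid, or -1 if fork failed. */
pid_t start_job(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid == 0) {
        setpgid(0, 0);
        execvp(argv[0], argv);
        _exit(127);
    }
    if (pid > 0)
        setpgid(pid, pid);   /* also set in the parent to avoid a race */
    return pid;
}

/* Block until the job exits, is killed, or is stopped. *code receives the
 * exit status or the signal number, depending on the returned state. */
enum job_state wait_job(pid_t pid, int *code)
{
    assert(code != NULL);

    int status;
    if (waitpid(pid, &status, WUNTRACED) != pid)
        return JOB_ERROR;
    if (WIFSTOPPED(status))  { *code = WSTOPSIG(status); return JOB_STOPPED; }
    if (WIFSIGNALED(status)) { *code = WTERMSIG(status); return JOB_KILLED; }
    *code = WEXITSTATUS(status);
    return JOB_EXITED;
}
```

With this shape, the Ctrl+Z handler would send SIGTSTP to the foreground job, and a JOB_STOPPED result from wait_job tells the shell to add the job to its list and print a new prompt. The fg and bg built-ins would send SIGCONT, with fg then calling wait_job again.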

Keeping History

A very useful feature of a shell is the ability to keep previously typed commands, and allow a user to easily re-issue them.

For this feature, the user should be able to type the 'up' or 'down' arrow and cycle through a circular buffer of their previous commands. In other words, if I type an up arrow, I should see a prompt followed by my previous command, and if I hit 'enter', the command will execute again.

Although a production-quality shell would clear the command buffer when you type the up or down arrow, it is ok for this assignment to just print a new prompt with the command on a new line.

Important: You will also need to run with the launcher (as described above) for this exercise, as the parent shell buffers input characters until it sees a newline by default. The launcher will disable line buffering in the parent.

For this exercise, it is fine to set a compile-time history length, such as 50 lines.

In order for my history to survive after the shell exits, most shells will write a file in the user's home directory, such as /home/porter/.swish_history. For full credit, your shell should persistently store the user's command history. You can use environment variables to figure out where the user's home directory is.

Exercise 9. (10 points.) Add support for tracking the history of a user, including saving the history to a file. Support the up and down keys to cycle through history, and add a built-in command, history, that dumps the entire history to the console. Also add a built-in command, clear, to reset the history.
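
The in-memory part of the history could be kept in a fixed-size circular buffer, sketched below. The struct layout, the function names, and the LINE_MAX_LEN limit are assumptions for this example; saving to and loading from the history file is not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HISTORY_LEN 50      /* compile-time length, as suggested above */
#define LINE_MAX_LEN 1024   /* assumption: longest command we store */

struct history {
    char lines[HISTORY_LEN][LINE_MAX_LEN];
    int count;   /* number of valid entries, at most HISTORY_LEN */
    int next;    /* slot the next command will be written to */
};

/* Reset the history to empty (also usable as an initializer). */
void history_clear(struct history *h)
{
    assert(h != NULL);
    h->count = 0;
    h->next = 0;
}

/* Append a command, overwriting the oldest entry once the buffer is full. */
void history_add(struct history *h, const char *cmd)
{
    assert(h != NULL && cmd != NULL);
    snprintf(h->lines[h->next], LINE_MAX_LEN, "%s", cmd);
    h->next = (h->next + 1) % HISTORY_LEN;
    if (h->count < HISTORY_LEN)
        h->count++;
}

/* back = 1 is the most recent command, 2 the one before it, and so on.
 * Returns NULL when back is out of range. */
const char *history_get(const struct history *h, int back)
{
    assert(h != NULL);
    if (back < 1 || back > h->count)
        return NULL;
    return h->lines[(h->next - back + HISTORY_LEN) % HISTORY_LEN];
}
```

The up arrow then corresponds to increasing back and the down arrow to decreasing it, and the history built-in can print entries from back = count down to 1 to list them oldest first.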

Fun

Exercise 10. (5 points.) Create a built-in command, or a separate program, called wolfie that draws Wolfie on the console using ASCII art. You are welcome to use an ASCII art generator, or draw your own by hand.

Contests

In order to encourage creativity and a bit of friendly competition, the instructor and TAs will judge a few contests. The prizes will be bonus points. Only teams that complete all exercises will be eligible to win.

Challenge! (10 bonus points) The team that implements its shell in the fewest lines of readable, clean code will get a bonus. This count excludes blank lines and comments (comments are always welcome). Code that is confusing and difficult to read, as subjectively judged by the course staff, will be disqualified.

Winners will be announced in class after the grading of lab 2 is complete. More than the points, of course, is the pride of winning.

Style and More

Aside from testing the proper functionality of your code, we will also evaluate its quality. Be sure to use a consistent style, document your code well, and break it into separate functions and/or source files where it makes sense.

To be sure your code is very clean, it must compile with "gcc -Wall -Werror" without any errors or warnings!

If the various sources you use require common definitions, then do not duplicate the definitions. Make use of C's code-sharing facilities.

You must include a README file with this and any assignment. The README file should describe what you did, what approach you took, results of any measurements you made, which files are included in your submission and what they are for, etc. Feel free to include any other information you think is helpful to us in this README; it can only help your grade.

Challenge! (5 bonus points) Support time counting. If you start swish with -t, it should count how long each program ran and print stats when the program ends. The output should be something like:


$ swish -t
swish> du -sh /usr
4.3MB /usr
TIMES: real=23.7s user=12.1s sys=7.0s

Be sure to note this in challenge.txt if you do this.

Challenge! (10 bonus points) Support file "globbing" for extensions, such as


swish> ls *.jpg

The above should print all the file names that end with ".jpg". Only support *.[EXTENSION]. That is, you'll need to check whether an argument starts with a '*', then use readdir(3) or getdents(2) as needed to read all files from the current directory, match them (using strstr(3)), and add them to the list of arguments you pass to exec(2). In other words, your shell will exec the command as if you had typed the full names of all the matching files on the command line one by one.
Be sure to note this in challenge.txt if you do this.
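
One way to collect the matching names is sketched below using opendir(3) and readdir(3). The names has_suffix and expand_glob are hypothetical. This sketch compares the end of each name instead of calling strstr(3), because a plain substring search would also match a name like c.jpg.bak. Skipping hidden files is an assumption borrowed from bash's behavior.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if name ends with suffix (for "*.jpg" the suffix is ".jpg"). */
int has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/* Collect up to max names in dir that match a pattern of the form
 * "*SUFFIX". Each stored name is a malloc'd copy that the caller frees.
 * Returns the number of matches, or -1 if dir cannot be opened. */
int expand_glob(const char *dir, const char *pattern, char *out[], int max)
{
    assert(dir != NULL && pattern != NULL && pattern[0] == '*' && out != NULL);

    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *e;
    while (count < max && (e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;                     /* skip ".", "..", hidden files */
        if (has_suffix(e->d_name, pattern + 1)) {
            char *copy = strdup(e->d_name);
            if (copy == NULL)
                break;                    /* out of memory: stop early */
            out[count++] = copy;
        }
    }
    closedir(d);
    return count;
}
```

The returned names would replace the pattern in the argument list before exec. Note that readdir returns entries in no particular order, so a shell that wants bash-like output would sort them first.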

Challenge! (10 bonus points) Add support for "tab completion" in your shell. If a user types a prefix of a command and then hits the "Tab" key twice, the shell should show all possible commands that match the prefix. If only one command is possible, the shell should automatically fill in the rest of the command. If all possible commands share subsequent letters, automatically fill in letters until the commands diverge.
Hint: Consider using a trie data structure to organize the available commands.