Programming Assignment 4: Catastrophe!

COMP 211-002 | Spring 2023 (Bakita)

Author: Prof. Bakita

These steps were last updated .

Due Date

Tuesday, March 28th, 1:59 PM.

This assignment allows for free late submissions up until 1:59 PM on April 20th. 1 (Even if you did not submit before the original due date.)

The collaboration policy on this assignment is also relaxed. You are welcome to discuss more specific components of the assignment with friends, rather than only abstract concepts. You are also allowed to use another’s keyboard to interact with tools and utilities, such as man, valgrind, or gdb. You should still never be copying code.

Introduction

Catastrophe! In attempting to delete some falsified Tetris quicksaves, we accidentally deleted all of them! Even worse, in attempting to clean up some of vim’s autosaves (swap files), we accidentally deleted verify.c—the file containing our work-in-progress code for detecting falsified quicksaves! Fortunately, we were storing all our Tetris quicksaves on a flash drive, and so we quickly yanked it out as soon as we realized that the delete was happening. In an additional stroke of luck, we built our falsification detection program with gcc’s -save-temps option, so we still have the temporary object file (.o file) for verify.cverify.o.

You have been given verify.o, and a disk image of the flash drive with partially-deleted saves. Your task is to:

  1. Build a program which can call the work-in-progress logic in verify.o, catching and recovering from any faults that occur.
  2. Write a C problem to recover the Tetris savefiles from the disk image.

Key Challenges

Part 1:

Part 2:

Part 1: Making verify.o Useful

Getting Started

You’ll need to understand how to:

You can find verify.o in /playpen/a4/verify.o on the course server, or you can download it from this link. You will also want an updated copy of tetris.h. You can find that at /playpen/a4/tetris.h, or you can download it from this link. Move copies of these files into wherever you will be working on your implementation.

Detailed Specification

Your program is to be called check, and will be implemented in the C file check.c. In order to link with verify.o, you need to tell gcc to use it as an input file. If you copied verify.o into your current directory, you could do this via gcc verify.o check.c -o check. The autograder will provide tetris.h and verify.o in the same directory as your code.

Our autograding infrastructure will compile your code as gcc verify.o check*.c -o check.

Using verify.o

The verification function in verify.o which will sometimes segfault has the function signature:

void is_legitimate(TetrisGameState* game, int* is_legit);

It will set the integer pointed to by is_legit to 1 if the game is likely legitimate (from a human player) and 0 otherwise. You will need to make sure to declare the function in check.c so that gcc can link to the implementation of is_legitimate() in verify.o.

Since we hadn’t quite finished verify.o when we deleted verify.c, is_legitimate() will sometimes segfault (specifically, if the save is legitimate) after completing all the verification operations. You need to set up a signal handler to catch the segfault (SIGSEGV) and resume execution of your program after it occurs. You can use sigsetjmp() to mark the location you want to jump to, and siglongjmp() to jump to it from your signal handler.

Command-Line Arguments

check is to take one argument, the path to a Tetris quicksave. Your program should terminate with a help message on stderr if it is run with less than or more than one argument.

Input

The passed Tetris quicksave is of the same format as in prior assignments—just read it into a TetrisGameState as declared in tetris.h.

Output

On stdout, check should print a single line containing the text “is legitimate” if the quicksave is legitimate and “is not legitimate” if the quicksave is not legitimate.

Error Handling

On stderr, you are encouraged to print error messages if any C library functions fail.

Test Data

See the tetris quicksaves from Assignment 3 in /playpen/a3/ The saves in /playpen/a3/test_data/team_quicksaves and /playpen/a3/test_data/tetris_quicksaves are legitimate, everything else is randomly generated.

Example

A legitimate quicksave:

jbakita@comp211-2sp23:~$ ./check /playpen/a3/test_cases/team_quicksaves/andrew/save2.bin
The quicksave /playpen/a3/test_cases/test_256/random_test_98.bin is legitimate.

A faked quicksave:

jbakita@comp211-2sp23:~$ ./check /playpen/a3/test_cases/test_256/random_test_98.bin
The quicksave /playpen/a3/test_cases/test_256/random_test_98.bin is not legitimate.

Note that in the above examples I chose to print a more complicated output message than the specification requires, but it contains the words “is legitimate” or “is not legitimate”, which is sufficient for the autograder.

Suggested Steps

Since is_legitimate() only crashes when a quicksave is legitimate, first implement the part of check.c which verifies arguments, loads the Tetris quicksave, calls is_legitimate(), and prints the result. Only test this with illigitimate quicksaves for now.

Next, add a signal handler for SIGSEGV and test that it is called when you try testing with a legitimate quicksave.

Finally, add in sigsetjmp() and siglongjmp(). When you call siglongjmp() from your signal handler, execution jumps to as though sigsetjmp() had just been called. The second argument you pass to siglongjmp() dictates what the return value of sigsetjmp() will be the second time. You will likely want to use this value to prevent your program from calling is_legitimate() a second time.

Part 2: File Recovery

Detailed Specification

Your data-recovery program is to be called recover, and will be implemented in the C file recover.c. You may use multiple source files if you would like, the autograder will compile your code as gcc recover*.c verify.o -o recover.

We only deleted the user-generated (legitimate) quicksaves, and so only recover those. Use is_legitimate() from Part 1 to check if a recovered quicksave is legitimate.

The autograder will provide tetris.h and verify.o in the same directory as your code. See the description of Part 1 for information on where you can get the latest copies of these files.

You are allowed to copy SanityCheckState() from tetris.c. This function will be useful in verifying whether the data loaded into a TetrisGameState is sensible; if tetris could even load that datae.

Command-Line Arguments

recover is to take one argument, the path to a disk image. Your program should terminate with a help message on stderr if it is run with less than or more than one argument.

Input

The passed disk image is a binary file containing every byte (in order) that was stored on our flash drive at the time it was pulled from the computer. The flash drive was formatted using the ext4 filesystem with a 4096 byte block size.

Output

Each recovered quicksave should be stored in its own file, which your program will create with a generated name. All of these files should be stored in a subdirectory, also of your creation.

On stdout, recover should print the path at which you have stored each recovered quicksave.

Extra credit opportunity: Rather than simply generating names for the recovered files, recover the original filenames and paths. See Extra Credit Opportunities for details.

Error Handling

On stderr, you are encouraged to print error messages if any library functions fail. (The perror() function is very helpful for reading, interpreting, and printing error codes stored in errno. See man 3 errno and man perror.)

(You are also welcome to print progress messages to stderr, if so desired. Note that you can use terminal escape sequences to move and overwrite the last line printed—this avoids filling the full screen with progress messages.)

Test Data

The disk image you are to recover files from is /playpen/a4/a4_disk.img on the course server.

There are a total of 200356 valid quicksaves in this disk image, but only 100 are legitimate quicksaves.

The valid and legitimate quicksave that appears first in the disk image is in block #11565 (where the first block is #0) and is identical to /playpen/a3/test_cases/tetris_quicksaves/full_board.bin. You may find xxd -c10 (as in Assignment 2), or the diff command helpful in examining and comparing quicksaves.

Example

jbakita@comp211-2sp23:~$ ./recover /playpen/a4/a4_disk.img
./output/recovered_1.bin
./output/recovered_2.bin
... [not all lines shown in this example]
./output/recovered_100.bin

Where the 100 recovered legitimate quicksaves are in the directory output created by my program, and files recovered_1.bin through recovered_100.bin. Your files may be named differently, the autograder will just look at the file locations you print.

Suggested Steps

Preparation

If you missed the class 19, where we discussed how to do this assignment, please review the recording. Everything you should need to know has been covered in class.

If you’d like a more thorough review, some potentially useful readings in File System Forensics:

Open and Read Blocks from the Disk Image

Implement the argument parsing and input file opening logic, with error checking.

Note that there are at least two approaches to opening and reading from the disk image:

  1. Use fopen() and fread()
  2. Use open() and mmap()

I suspect that using mmap() should be slightly faster for any implementation, but either approach is acceptable.

Find Valid Quicksaves

You will want to examine each file system block, and look at if the data in the block (starting at offset 0 in the block) makes sense when interpreted as a TetrisGameState.

You can copy SanityCheckState() from tetris.c, and use this to verify if the data you loaded into a TetrisGameState is valid.

If it is valid, you’ll want to also call is_legitimate() to see if it’s a human-generated quicksave. (You’ll need appropriate signal handling, as in check.c.)

Save Recovered Quicksaves

Once you found a block containing a valid and legitimate quicksave, you should save it to a randomly-named file. (snprintf() may be helpful for generating the file name.) If this is the first file you are saving, also create a subdirectory to save the files in (see man 2 mkdir or 14.8 and 14.9.5 in The GNU C Library Reference Manual, info libc mkdir). Once you save the file, make sure to print the path to stdout, so that the user can know where to find it.

Extra Credit Opportunities

Recover Quicksave File Names (very difficult)

The top-level folder structure before the delete was:

jbakita@comp211-2sp23:~$ tree -L 2 /media/a4_disk
/media/a4_disk
├── lost+found
├── real_saves
│   ├── answer_cases
│   └── test_cases
├── synthetic_saves
│   ├── answer_cases
│   └── test_cases
└── tetris.h

7 directories, 1 file

The folder I accidentally deleted was real_saves/test_cases. Recover the names and paths of the files and folders in that directory, and use those for your output file names.

If you would like to see what else is on the disk, /playpen/a4/a4_disk.img is mounted read-only at /media/a4_disk on the course server.

In case it’s relevant, the disk was originally formatted as mkfs.ext4 -T small -m 0 <path to block device for disk>, with the part in <> substituted for the disk location.

I have yet to find an excellent document laying out how ext4 works, but it merely adds a few features to ext2 and ext3. Pgs. 738-766 in Understanding the Linux Kernel (3rd Ed) thoroughly lay out how ext2 works, which should be enough.

Also see File System Forensics, which has sections which should much more clearly show you how to recover the data.

While poking around the disk, you may want to also check for any other recently deleted files…

Submission

You will submit via Gradescope. We will not be posting an autograder for this assignment, but will curve the results from our internal autograder when computing your final score.

Use submit_a4 on the course server to submit your implementations of check.c and recover.c. This program will also check that your code successfully compiles before uploading it.

Please remember to include the honor code pledge on the first line of both check.c and recover.c, following the same format as with prior assignments.

Note that when you test your code in valgrind for this assignment, it will complain about an invalid read of size 4 in is_legitimate()—this is expected, and you should ignore it.


  1. Note: The syllabus states that late submissions “must be turned in by 168 hours (one week) after the time that the next assignment is due, or before the start of the final exam, whichever is sooner.” As that requires us to accept late A4 submissions up until the start of the final exam, you may request such a late submission using this form. BE WARNED: We will not be able to provide style feedback before Assignment 5 or 6 are due if you take this path. Furthermore, as Assignment 5 has a higher weight than Assignment 4, it is in your best interest to move on