Artifact Evaluation Instructions for “Enabling GPU Memory Oversubscription via Transparent Paging to an NVMe SSD”

These are the steps required to reproduce the experiments in our paper:

J. Bakita and J. H. Anderson, “Enabling GPU Memory Oversubscription via Transparent Paging to an NVMe SSD”, Proceedings of the 43rd IEEE Real-Time Systems Symposium, Dec 2022, to appear. (PDF)

Preliminaries

On Ubuntu, run the following to ensure kernel build dependencies are installed:

apt install -y build-essential flex bison libssl-dev git

For plotting the figures, you will need Python 3, matplotlib, and numpy (the python3-matplotlib package pulls in numpy as a dependency). On Ubuntu:

apt install -y python3 python3-matplotlib
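
To confirm the plotting dependencies are available, the following optional check (not part of the original steps) should exit without errors:

python3 -c "import matplotlib, numpy"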

The CUDA SDK is also required, but it is normally preinstalled on NVIDIA Jetson systems and does not need to be installed separately.
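
As an optional check, nvcc --version should report the installed CUDA version; on Jetson, nvcc typically lives in /usr/local/cuda/bin, which may not be on your PATH by default:

/usr/local/cuda/bin/nvcc --version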

Note: Throughout this document, text in angle brackets <like this> should be replaced with an appropriate value before the command is executed. For example, <number of sampling iterations> might become 100.

Kernel Setup

Obtain and build the kernel sources by running:

git clone --branch tegra-l4t-r32.7.1 git://nv-tegra.nvidia.com/linux-4.9.git
git clone --branch rtss22-ae http://rtsrv.cs.unc.edu/cgit/cgit.cgi/nvgpu.git
git clone --branch rtss22-ae http://rtsrv.cs.unc.edu/cgit/cgit.cgi/nvidia-tegra-modules.git nvidia
cd linux-4.9
zcat /proc/config.gz > .config # This autoconfigures the kernel with the config of your currently running kernel
# The L4T Kernel includes an out-of-tree patch to the dmabuf subsystem which updates the I/O MMU lazily.
# Lazy updates make buffer deallocation unsafe when paging out, so disable this.
sed -i "s/CONFIG_DMABUF_DEFERRED_UNMAPPING=y/CONFIG_DMABUF_DEFERRED_UNMAPPING=n/" .config
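# (Optional check, not part of the original steps:) confirm the option is now disabled;
# this should print CONFIG_DMABUF_DEFERRED_UNMAPPING=n
grep CONFIG_DMABUF_DEFERRED_UNMAPPING .config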
# Build the kernel and modules
make Image modules -j8
sudo make INSTALL_MOD_STRIP=1 modules_install

To install the kernel, run sudo cp arch/arm64/boot/Image /boot/Image.ae from the same directory as before.

Configuring the bootloader requires appending the following to the end of /boot/extlinux/extlinux.conf:

LABEL ae
   MENU LABEL Artifact Evaluation Kernel
   LINUX /boot/Image.ae
   INITRD /boot/initrd
   APPEND ${cbootargs} quiet root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 console=ttyTCU0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0

Then change the default kernel by replacing DEFAULT primary with DEFAULT ae at the top of extlinux.conf.
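
If you prefer to make that change non-interactively, a sed one-liner such as the following should work, assuming the stock file still contains a line reading DEFAULT primary:

sudo sed -i 's/^DEFAULT primary/DEFAULT ae/' /boot/extlinux/extlinux.conf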

Warning: Incorrect formatting of extlinux.conf can crash the bootloader.

Now sudo reboot the machine to boot into the custom kernel and modules. After rebooting, verify that the correct kernel is running by checking the kernel build date returned by uname -v.
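
For example, the following optional checks (not part of the original steps) show the running kernel's build timestamp and confirm that the modules installed earlier are present for the running kernel:

uname -v
ls /lib/modules/$(uname -r)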

Note: Our kernel changes are all carefully documented, and we encourage people to review them. We suggest starting with git log and git show <some commit> on the nvgpu repo.

Compatibility Note: We only support the NVIDIA Jetson AGX Xavier (any variant). Our code should also work on the NVIDIA Jetson TX2, but this is untested and will require different bootloader configuration.

Benchmark Setup

Download and build our benchmarks by running:

git clone --recurse-submodules --branch rtss22-ae http://rtsrv.cs.unc.edu/cgit/cgit.cgi/gpu-paging-tools.git
cd gpu-paging-tools
make
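
If the build succeeded, the binaries and scripts used in the next section should now be present in this directory; an optional quick check:

ls paging_speed *_experiments.sh plot_fig*.py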

Generating Figures

  1. Complete Kernel Setup and Benchmark Setup.
  2. Switch to the gpu-paging-tools folder created during benchmark setup.
  3. Select how many sampling iterations you would like to run. 1,000 samples are used in the paper, but similar results can be obtained with 100 samples or fewer. Be advised that even 10 samples may take several minutes, as creating and randomly initializing the test buffers can take much longer than the transfer operations themselves. The scripts are all non-interactive and can be left unattended.

Figure 4

  1. Run sudo ./paging_speed <number of sampling iterations> > fig4_ae_data.csv to perform the Direct I/O Read and Demand Paging experiments.
  2. Plot the results with ./plot_fig4.py fig4_ae_data.csv.

Figure 10

  1. Run sudo ./fig10_experiments.sh <number of sampling iterations> to perform the GPU, Direct I/O, and Demand Paging experiments.
  2. Plot the results with ./plot_fig10.py <gpu_pg_results> <direct_pg_results> <demand_pg_results> using the filenames output by the previous step.

Note: These results should be slightly different from the numbers in Fig. 4 for Demand Paging and Direct I/O Reads. They differ in how they account for the userspace overhead of walking the buffer to trigger page faults as part of demand paging. This walk is not really part of the paging process, but it is also unavoidable. In Fig. 4, we add the time for a sequential walk to the direct I/O numbers to make the comparison fair. That is not possible for GPU paging in Fig. 10, so we instead subtract the cost of a sequential walk from the demand paging time in those experiments. Compare directio_paging_speed.c and demand_paging_speed.c with paging_speed.c for details.

Figure 11

  1. Run ./fig11_experiments.sh <number of sampling iterations> to perform the GPU Paging overhead experiments.
  2. Plot the results with ./plot_fig11.py <gpu_pg_overhead_results> using the filename output by the previous step.

Figures 12 and 13

This process is more complicated and is still being documented. This page will be updated as those instructions become available.