These are the steps required to reproduce the experiments in our paper:
J. Bakita and J. H. Anderson, “Enabling GPU Memory Oversubscription via Transparent Paging to an NVMe SSD”, Proceedings of the 43rd IEEE Real-Time Systems Symposium, Dec 2022, to appear. (PDF)
On Ubuntu, run the following to ensure kernel build dependencies are installed:
apt install -y build-essential flex bison libssl-dev git
For plotting the figures, you will need Python 3, matplotlib and numpy. On Ubuntu:
apt install -y python3 python3-matplotlib
The CUDA SDK is also required, but is normally included with NVIDIA Jetson systems and need not be reinstalled.
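As a quick sanity check before building, you can confirm the CUDA toolkit is visible; the path below is the usual JetPack default and is an assumption that may differ on your system:

```shell
# Check for the CUDA compiler at the default JetPack install path
# (path is an assumption; adjust if CUDA was installed elsewhere).
ls /usr/local/cuda/bin/nvcc 2>/dev/null || echo "nvcc not found -- install the CUDA SDK first"
```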
Note: Throughout this document, text in brackets <like me> should be replaced before the command is executed.
Obtain and build the kernel sources by running:
git clone --branch tegra-l4t-r32.7.1 git://nv-tegra.nvidia.com/linux-4.9.git
git clone --branch rtss22-ae http://rtsrv.cs.unc.edu/cgit/cgit.cgi/nvgpu.git
git clone --branch rtss22-ae http://rtsrv.cs.unc.edu/cgit/cgit.cgi/nvidia-tegra-modules.git nvidia
cd linux-4.9
zcat /proc/config.gz > .config # This autoconfigures the kernel with the config of your currently running kernel
# The L4T Kernel includes an out-of-tree patch to the dmabuf subsystem which updates the I/O MMU lazily.
# Lazy updates make buffer deallocation unsafe when paging out, so disable this.
sed -i "s/CONFIG_DMABUF_DEFERRED_UNMAPPING=y/CONFIG_DMABUF_DEFERRED_UNMAPPING=n/" .config
# Build the kernel and modules
make Image modules -j8
sudo make INSTALL_MOD_STRIP=1 modules_install
To install the kernel, run sudo cp arch/arm64/boot/Image /boot/Image.ae from the same directory as before.
Configuring the bootloader requires appending the following to the end of /boot/extlinux/extlinux.conf:
LABEL ae
MENU LABEL Artifact Evaluation Kernel
LINUX /boot/Image.ae
INITRD /boot/initrd
APPEND ${cbootargs} quiet root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 console=ttyTCU0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 rootfstype=ext4
Then change the default kernel by replacing DEFAULT primary with DEFAULT ae at the top of extlinux.conf.
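If you prefer to make the DEFAULT change non-interactively, a sed one-liner like the following works. The sketch below runs against a scratch copy so it can be tried safely; on the real system, point it at /boot/extlinux/extlinux.conf and keep a backup.

```shell
# Sketch: flip the default boot entry with sed. A scratch file stands in
# for /boot/extlinux/extlinux.conf here; always keep a backup, since a
# malformed extlinux.conf can crash the bootloader.
CONF=./extlinux.conf.scratch
printf 'DEFAULT primary\nMENU TITLE L4T boot options\n' > "$CONF"
cp "$CONF" "$CONF.bak"                       # backup before editing
sed -i 's/^DEFAULT primary$/DEFAULT ae/' "$CONF"
grep '^DEFAULT' "$CONF"                      # prints: DEFAULT ae
```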
Warning: Incorrect formatting of extlinux.conf can crash the bootloader.
Now sudo reboot the machine to load the custom kernel and modules. After reboot, verify that the correct kernel is running by checking the kernel build date returned by uname -v.
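For example (the exact version and date strings will vary with your build):

```shell
# The build date printed by `uname -v` should match the time you ran
# `make`; `uname -r` shows the running release (a 4.9 kernel on L4T r32).
uname -v
uname -r
```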
Note: Our kernel changes are all carefully documented, and we encourage people to review them. We suggest starting with git log and git show <some commit> on the nvgpu repo.
Compatibility Note: We only support the NVIDIA Jetson AGX Xavier (any variant). Our code should also work on the NVIDIA Jetson TX2, but this is untested and will require different bootloader configuration.
Download and build our benchmarks by running:
git clone --recurse-submodules --branch rtss22-ae http://rtsrv.cs.unc.edu/cgit/cgit.cgi/gpu-paging-tools.git
cd gpu-paging-tools
make
Run all of the following commands from the gpu-paging-tools folder created during benchmark setup.
Run sudo ./paging_speed <number of sampling iterations> > fig4_ae_data.csv to run the Direct I/O Read and Demand Paging experiments.
Plot the results with ./plot_fig4.py fig4_ae_data.csv.
Run sudo ./fig10_experiments.sh <number of sampling iterations> to run the GPU, Direct I/O, and Demand Paging experiments.
Plot the results with ./plot_fig10.py <gpu_pg_results> <direct_pg_results> <demand_pg_results> using the filenames output by the previous step.
Note: These results should be slightly different from the numbers in Fig. 4 for Demand Paging and Direct I/O Reads. They differ in how they account for the userspace overhead of walking the buffer to trigger page faults as part of demand paging. This walk isn't really part of the paging process, but it is also unavoidable. In Fig. 4, we add the time for a sequential walk to the direct I/O numbers to make them more comparable. That isn't possible with GPU paging in Fig. 10, so we instead subtract the cost of a sequential walk from the demand paging time in those experiments. See directio_paging_speed.c and demand_paging_speed.c compared to paging_speed.c.
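As a hypothetical illustration of that accounting (all numbers below are invented, not measurements), the two figures adjust for the sequential walk in opposite directions:

```shell
# Invented example times, in milliseconds:
walk=2.0     # cost of a sequential walk over the buffer
direct=10.0  # raw direct I/O read time
demand=15.0  # raw demand paging time (includes the walk)
# Fig. 4 adds the walk to direct I/O; Fig. 10 subtracts it from demand paging.
awk -v w="$walk" -v d="$direct" -v p="$demand" \
    'BEGIN { printf "Fig. 4 direct I/O + walk: %.1f ms\nFig. 10 demand paging - walk: %.1f ms\n", d + w, p - w }'
```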
Run ./fig11_experiments.sh <number of sampling iterations> to run the GPU Paging overhead experiments.
Plot the results with ./plot_fig11.py <gpu_pg_overhead_results> using the filename output by the previous step.
Note: This process is more complicated and is still being documented. This page will be updated as instructions become available.