For extra credit of up to 5% of your final grade.

In the programming language of your choice, implement a cache simulator for the data cache in the matrix multiply example we looked at in class. Determine the number of cycles for the inner loop for various matrix sizes.

Your simulator should allow you to change the size of the matrices which you can assume are all square. It should also allow you to change the number of blocks and the block size in your cache.

Plot a graph like the one in the lecture slides showing the cycles per iteration of the loop on the dependent axis and matrix size (from 2x2 to 50x50 in steps of 1) on the independent axis. Your graph should include curves for 4 different cache configurations and should illustrate cache issues. You may assume each instruction takes 1 cycle except for load on a cache miss which takes 50 cycles. You may plot the graph with any tool that works for you. Your program can simply write out the appropriate data in a table, then you can plot with Excel, Matlab, XGraph, or even by hand.

Assume the data are single precision floating point requiring 4 bytes each. You are multiplying A*B to produce C. Assume the base address of A is 0x1000000, the base address of B is 0x2345678, and the base address of C is 0x3141592. Assume column major order for the arrays.

Make other assumptions as necessary an include them in your well commented source code.

Turn in your program source and the graph you produced to Gary by the beginning of class on December 8th.