Performance

11 October 2005

Which has best performance?

Airplane

Passengers

Range

Speed

Throughput

777

375

4630

610

228,750

747

470

4150

610

286,700

Concorde

132

4000

1350

178,200

DC-8-50

146

8720

544

79,424

Which network is best?

TIME is THE measure!

What kind of time?

DANGER Will Robinson!

Performance

Problem: machine A runs a program in 20 seconds, machine B runs the same program in 25 seconds.

Cycles

seconds/program = cycles/program * seconds/cycle

A 2GHz clock has a 1/(2*109) = 0.5 nanoseconds cycle time

How to improve performance?

seconds/program = cycles/program * seconds/cycle

So to improve performance you can either:

__________ the # of required cycles for a program, or
__________ the clock cycle time or, said another way,
__________ the clock rate

How many cycles are required for a program?

Could assume that # of cycles = # of instructions

ICycles.png

Wrong! Different instructions take different numbers of cycles on different machines? Why?

Instructions take differing # of cycles

VCycles.png

Important point: changing cycle time often changes # of cycles required

Now that we understand cycles...

Do any of these equal performance?

Common pitfall: thinking one of the variables is indicative of performance when it really isn't.

CPI Example

Suppose we have two implementations of the same instruction set architecture (ISA). For some program, machine A has a clock cycle time of 10 ns. and a CPI of 2.0, and machine B has a clock cycle time of 20 ns. and a CPI of 1.2. Which machine is faster for this program, and by how much?

If two machines have the same ISA which of our quantities (e.g., clock rate, CPI, execution time, # of instructions, MIPS) will always be identical?

# of instructions example

A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles (respectively).

The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C.

The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C.

Which sequence will be faster? How much? What is the CPI for each sequence?

MIPS example

Two different compilers are being tested for a 100 MHz. machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles (respectively). Both compilers are used to produce code for a large piece of software.

The first compiler's code uses 5 million Class A, 1 million Class B, and 1 million Class C instructions. The second compiler's code uses 10 million Class A, 1 million Class B, and 1 million Class C instructions.

Which sequence will be faster according to MIPS?

Which sequence will be faster according to execution time?

Benchmarks

SPEC '89

Compiler "enhancements" and performance

SpecCompiler.jpg

SPEC '95

Does increasing clock rate increase performance?

Spec95.png

Amdahl's Law

Execution time after improvement =
  Execution time unaffected +
    Execution time affected / Amount of improvement

TI = TU + TA/I

Amdahl's Law Example

Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?

100/4 = 20 + 80/n

n = 16

How about 5 times faster?

Amdahl's Law

TI = TU + TA/I

Another example

Suppose we enhance a machine making all floating-point instructions run five times faster. If the execution time of some benchmark before the floating-point enhancement is 10 seconds, what will the speedup be if half of the 10 seconds is spent executing floating-point instructions?

speedup = old/new = 10/(0.5*10 + 0.5*10/5) = 1.67

One more example

We are looking for a benchmark to show off the new floating-point unit described above, and want the overall benchmark to show a speedup of 3. One benchmark we are considering runs for 100 seconds with the old floating-point hardware. How much of the execution time would floating-point instructions have to account for in this program in order to yield our desired speedup on this benchmark?

100/3 = 100*f/5 + 100*(1-f)

f = 5/6

Remember

Cultural Highlight

last edited 2005-10-11 17:35:42 by GaryBishop