Notes on Software Testing
Comp 401 / 410
Software testing is a large and complex subject. In fact, the verb "test"
has different meanings for different people: the programmer, the project manager,
and the customer. Software testing can be categorized by the size of the pieces
being tested and the goals of the testing.
Size of software under test
- Unit - one method, class or cluster of very closely related classes.
- Component (also called a subsystem) - group of related units. For example:
a windowing subsystem, a database subsystem, or an email list manager. The
size of the components may vary enormously.
- System - all components or subsystems assembled into a whole.

Usually when we think of testing we think of finding bugs in the program, and,
although correctness testing is important, it is not the only reason to test.
Production software is often evaluated for usability: Does it do what the end
users need and is it easy for them to use? (If you design software that runs over
the web, Jakob Nielsen's book, Designing Web Usability, is a must read. He also
has an instructive web page.) Stress testing determines if the software is robust
under heavy loads: Will the database crash when subjected to maximum loads?
Will the system recover after a crash? Performance testing determines if the
program will be fast enough: Will the weather prediction program take longer
than one day to forecast tomorrow's weather? Some common criteria for testing
are:
- Correctness
- Functionality
- Performance
- Security
- Usability
- Stress
- Recovery
In Comp 401 and 410, we will concentrate on unit testing: testing the correctness
of a single class or a small set of closely related classes (e.g. LinkedList,
ListNode, and ListIterator). Comp 410 will touch on testing the correctness
of a component using test stubs and mock objects.
Unit Testing
Unit testing is the testing of an individual class in isolation from other
classes. First, we write test cases based on the expected external behavior
of the class. These are called black box tests since they depend only
on how the object behaves and not how it is implemented. As testing progresses,
we need to make sure that all of our code is exercised. As needed, we add tests
to execute all Java statements in our implementation. These are called white
box tests since they depend on the implementation details of our code. The
test coverage analyzer Emma measures our degree of success at white box testing.
In general, each individual unit test (each test method in our JUnit test)
is made up of four steps:
- Set up the test case (optional).
- Exercise the code under test.
- Verify the correctness of the results.
- Tear down (optional).
Simple tests may require no setup at all, or only the creation of a single object
to be exercised by the code under test. Teardown releases all resources allocated
during the test setup. For simple tests we can leave it to the garbage collector
to reclaim objects no longer needed.
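The four steps above can be sketched as a single test. This is a plain Java program rather than a JUnit test (a JUnit test method has the same shape, with assertEquals in the verify step); the Counter class is hypothetical, invented just for the illustration.

```java
// Sketch of the four test phases for a hypothetical Counter class.
class Counter {
    private int count = 0;
    public void increment() { count++; }
    public int getCount() { return count; }
}

public class CounterTest {
    public static void main(String[] args) {
        // 1. Setup: create the object under test.
        Counter c = new Counter();
        // 2. Exercise: call the code being tested.
        c.increment();
        c.increment();
        // 3. Verify: check the result.
        if (c.getCount() != 2)
            throw new AssertionError("expected 2, got " + c.getCount());
        // 4. Teardown: nothing to release here; the garbage collector
        //    reclaims the Counter when the test ends.
        System.out.println("testIncrement passed");
    }
}
```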
Method Testing
Use the following checklist to thoroughly test the correctness of a single
method.
- Typical Case. Test at typical values of parameters and object state (black
box).
- Boundary Conditions. Test at boundaries of parameters and object state (black
box).
- Parameter Validation. Verify that parameter and object bounds are documented
and checked. Normally, a public method should throw an exception, and a private
method should use an assert (black box).
- Statement Coverage. Ensure that all statements are executed by the test
suite (white box). Emma should report 100% coverage with the exception of:
- Assert statements that check program invariants. With correctly
functioning code, these will never fail, so these lines will always show 50%
coverage (colored yellow by Emma).
- Parameter validation. This need not be explicitly tested when the code
is of the form
if ( parameter == null ) throw new NullPointerException();
or
Args.checkForContent( parameter );
- Getter / setter methods, as long as they are trivial, one-line methods.
- Condition Coverage. For each compound Boolean condition, ensure that each
subexpression evaluates to both true and false. For example, in
if ( a > 0 && null == b ), full statement coverage can be achieved by testing
with { a=1 (with b = null), a=0 } since the compound condition evaluates to
both true and false. However, we have ignored 'b'. Full condition coverage is
achieved by testing with { <a=1, b!=null>, <a=1, b=null>, <a=0, b!=null>,
<a=0, b=null> }.
- Loop Coverage. For each indeterminate loop (i.e. a loop that doesn't
execute a fixed number of times), ensure that it is tested with the loop body
being executed
- 0 times
- Once
- Twice
- k times, 2 < k < maximum
- The maximum number of times possible
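The checklist above can be sketched against a single hypothetical method: indexOf returns the first index of key in the array a, or -1 if it is absent. Its loop condition is compound, so the same tests that give loop coverage also illustrate condition coverage. The method and the check helper are inventions for this sketch, not part of any assignment.

```java
// Hypothetical method under test: first index of key in a, or -1.
public class FindTest {
    static int indexOf(int[] a, int key) {
        if (a == null) throw new NullPointerException("a is null");
        int i = 0;
        // Indeterminate loop with a compound condition: condition
        // coverage requires each subexpression to go both ways.
        while (i < a.length && a[i] != key) i++;
        return (i < a.length) ? i : -1;
    }

    static void check(boolean ok, String msg) {
        if (!ok) throw new AssertionError(msg);
    }

    public static void main(String[] args) {
        check(indexOf(new int[] {5, 7, 9}, 7) == 1, "typical case");
        check(indexOf(new int[] {}, 7) == -1, "empty data; loop body 0 times");
        check(indexOf(new int[] {7}, 7) == 0, "first element boundary");
        check(indexOf(new int[] {5, 7}, 7) == 1, "loop body executed once");
        check(indexOf(new int[] {5, 6, 8, 7}, 7) == 3, "last element; loop k times");
        check(indexOf(new int[] {5, 6}, 7) == -1, "not finding what we seek");
        try {                   // parameter validation: public method throws
            indexOf(null, 7);
            check(false, "expected NullPointerException");
        } catch (NullPointerException expected) { }
        System.out.println("checklist tests passed");
    }
}
```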
Class Testing
Correctness testing of a class requires that you first test each method then
their interaction. Special testing may be needed for the following:
- Inheritance.
- Parameterized types (e.g. generics).
- Polymorphism.
JUnit provides the framework for testing methods and classes. The really hard
part is deciding on the test cases to run. The following acronyms are useful.
Good unit tests are A-TRIP
- A - Automatic. Testing needs to be really, really easy since you
will be doing so much of it. Both the running of the test cases and the evaluation
of the results must be automated. Any input to the test software should be
part of the test software itself or come from a file. Manual entry of test
data or test cases is out of the question. Assertions should be used to check
the correctness of most member functions.
- T - Thorough. Nuff said.
- R - Repeatable. You must be able to reproduce the results (erroneous
or not) independently of the environment in which the test runs.
- I - Independent. Each test case should be independent of the environment
and each other. You need to have a traceable correspondence between a bug
and the test code. When a test fails, it should be simple to find the code
responsible. Independence of test cases is one of the key advantages of the
XUnit testing framework.
- P - Professional. "A-TRI" is a lousy acronym.
What test cases should I run? Right-BICEP
- Right - are the results right in the typical case? These are usually
the easiest and most obvious test cases. You must answer the question: If the
code ran correctly, how would I know? Any input should come from a file and
not from manual user input (see "A" in the previous section).
- B - Boundary conditions. Identifying the boundary conditions is the
single most valuable part of testing. Most bugs live in the corners. The specific
boundary conditions depend on your data structure (and, in part, on your code).
Typical boundary conditions are:
- Empty data. Full data.
- Handling the first data element. Handling the last element.
- Not finding what we are looking for.
- Incorrect input values: total gibberish (e.g. "%^*%^*&^"
as a name), a negative value where positives are expected, an unusually
large value (e.g. age = 100000000), or an edge-case date (e.g. February 29
in a leap year).
- I - Inverse relationships. Some methods can be checked by applying
the logical inverse (e.g. square the result of a square root calculation).
Sometimes data structure consistency can be checked through inverses. For
example, in a doubly linked list, is this == this.next.previous ?
- C - Cross check. Often, in large software projects, more than one
version will be developed. First, a quick prototype implements a slow and,
perhaps, partially functional version of the software. The prototype is often
used to test feasibility or to solicit feedback from the users. Next comes the
production version. The prototype can be used to validate the results of the
production version by running both and comparing the results.
- E - Error conditions. Errors happen and production software must
deal with them gracefully. Some common possibilities are:
- Running out of main memory.
- Running out of disk space.
- Network is down.
- Heavy system load.
- Limited color palette or low video resolution on a user's display.
- Old versions of software (e.g. Internet Explorer version 4).
- P - Performance. Will the software be fast enough when run on large
amounts of data?
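The inverse-relationship check for a doubly linked list can be sketched as a small consistency test. The Node class and linkAfter helper here are hypothetical, invented only to illustrate verifying node == node.next.prev along the list.

```java
// Hypothetical doubly linked list node for an inverse-relationship check.
class Node {
    int value;
    Node next, prev;
    Node(int value) { this.value = value; }
}

public class InverseCheck {
    // Link b after a, maintaining both directions.
    static void linkAfter(Node a, Node b) {
        a.next = b;
        b.prev = a;
    }

    // Walk the list and verify the inverse: n == n.next.prev everywhere.
    static boolean consistent(Node head) {
        for (Node n = head; n.next != null; n = n.next)
            if (n.next.prev != n) return false;
        return true;
    }

    public static void main(String[] args) {
        Node a = new Node(1), b = new Node(2), c = new Node(3);
        linkAfter(a, b);
        linkAfter(b, c);
        if (!consistent(a)) throw new AssertionError("list inconsistent");
        System.out.println("inverse check passed");
    }
}
```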
For Comp 401 / 410, we will primarily use the first three elements (Right-BI?).
Gracefully handling error conditions is a time-consuming programming task,
and, although it is essential for production software, it will be largely ignored
in our class assignments. For the most part, we will not be concerned with the
performance of our programs nor their robust behavior under incorrect program
inputs or abnormal situations. Our performance concerns will emphasize use of
appropriate data structures and algorithms; we will not delve into the performance
tuning of software. Several of our assignments will parallel classes already
in the standard class library, and, in these assignments, cross checking could
be performed by comparison to a second program that uses the class library.
However, the modest size of these assignments does not warrant building a second
implementation (using the class library).
How to fix a bug
- Identify the bug and its cause. What is the simplest situation in which
the incorrect behavior is exhibited? Eliminate everything else from the code.
- Based on the first step, write a set of test cases that fail in order to
reproduce and demonstrate the conditions causing the bug. These test cases
now become the operational definition of fixing the bug - when they all pass,
the bug is fixed.
- Fix the code so that the test cases now pass.
- Run all other tests to make sure you didn't break anything as a result of
the fix. This is often called regression testing - we regress, go back to,
all previous testing.
- Could the same kind of problem occur anywhere else? It is much quicker to
find it now rather than later.
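The second step above, in miniature: a regression test that pins down a classic (here hypothetical) bug. The naive midpoint (lo + hi) / 2 overflows for large ints; the test below fails on the buggy form and defines what "fixed" means.

```java
// Hypothetical midpoint method; the buggy form was (lo + hi) / 2,
// which overflows when lo + hi exceeds Integer.MAX_VALUE.
public class MidpointBugTest {
    static int midpoint(int lo, int hi) {
        return lo + (hi - lo) / 2;   // fixed version, immune to overflow
    }

    public static void main(String[] args) {
        // The simplest failing case: near Integer.MAX_VALUE the buggy
        // version overflows and returns a negative number.
        int lo = Integer.MAX_VALUE - 10, hi = Integer.MAX_VALUE;
        if (midpoint(lo, hi) != Integer.MAX_VALUE - 5)
            throw new AssertionError("midpoint overflow bug has returned");
        // Typical case still passes (step four: regression testing).
        if (midpoint(0, 10) != 5)
            throw new AssertionError("typical case broken by the fix");
        System.out.println("regression tests passed");
    }
}
```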
Process of Test Driven Design
Writing a method is a 4 step process.
- READY - Write the method signature, a brief comment explaining what the
method does, and its pre- and post-conditions.
- AIM - Write the JUnit test cases for the method.
- FIRE - Implement the method. You are done when the test cases all pass.
- FINISH - Refactor. Rearrange the code to improve its organization and clarity.
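The four steps in miniature, for a hypothetical isLeapYear method (plain Java checks stand in for JUnit assertions here):

```java
public class LeapYearTest {
    // READY: signature and contract.
    // Returns true iff year is a leap year in the Gregorian calendar.
    // Precondition: year > 1582.
    static boolean isLeapYear(int year) {
        // FIRE: body written only after the tests below existed.
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }

    public static void main(String[] args) {
        // AIM: test cases written before the body of isLeapYear.
        if (!isLeapYear(2004)) throw new AssertionError("2004 is a leap year");
        if (isLeapYear(1900)) throw new AssertionError("1900 is not a leap year");
        if (!isLeapYear(2000)) throw new AssertionError("2000 is a leap year");
        if (isLeapYear(2001)) throw new AssertionError("2001 is not a leap year");
        System.out.println("all tests pass; time to FINISH (refactor)");
    }
}
```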
Acknowledgement
Adapted, in part, from Humphrey, W.S. 1995. "A Discipline for Software
Engineering." Addison-Wesley. Diagram of categories of testing taken from the
IBM developerWorks web site.