Regression Testing During the Design Process

The Design Verification Process

In the modern top-down design methodology, a design progresses through several levels of abstraction from initial conception to final implementation (Figure 1). Ideally, design verification is done at the level at which the designer works, and not lower: wherever the designer writes code or lays out gates. In reality, some design verification is done at each level of the design.

Architectural design
    verify specification correctness
Logic design
    verify logic correctness
Synthesize
Place & Route
    verify timing correctness
Fabricate

FIGURE 1. Top-Down Design Process

Different levels of abstraction in the design process require different types of models for verification. See Figure 2.

Level            Verification model or activity
specification    network models
behavior         debugging
RTL              regression
gate             timing

FIGURE 2. Verification activities by level

Design verification at the specification, or architectural, level is often done with transaction-based models, such as network models. These give the designer an indication of how the design is going to perform. They are often queueing-based and often ad hoc, written in some general-purpose programming language, and they sometimes involve hardware-software co-simulation. The results at this level tend to be intuitive rather than quantitative: they simply indicate that the system will work more or less the way the designer intended.

There is another use for models at this level: performance analysis. These models are quantitative; they tell you how fast your system will execute the modelled operations. For example, if you are designing a computer, you can use a model at this level to estimate what its SPECmark might be.
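The performance-analysis use can be illustrated with a minimal sketch of a queueing model: a single shared resource (say, a bus or memory port) modelled as a first-in, first-out server. The arrival and service times are hypothetical inputs, not data from any real design; the point is only that such a model yields numbers (latency, throughput) rather than an intuitive yes or no.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    mean_latency: float  # average time from arrival to completion
    throughput: float    # completed transactions per time unit, measured from time 0


def fifo_server(arrivals, service_times):
    """Single-server FIFO queue. Arrivals are assumed sorted; each transaction
    starts at its arrival time or when the previous one finishes, whichever
    is later."""
    finish = 0.0
    latencies = []
    for arrive, service in zip(arrivals, service_times):
        start = max(arrive, finish)
        finish = start + service
        latencies.append(finish - arrive)
    return Stats(
        mean_latency=sum(latencies) / len(latencies),
        throughput=len(latencies) / finish,
    )
```

For example, three transactions arriving one time unit apart, each needing two units of service, queue up behind each other: `fifo_server([0, 1, 2], [2, 2, 2])` reports a mean latency of 3.0 and a throughput of 0.5.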

At the behavioral level, the main activity is debugging. At first the model won't do anything, because the pieces are hooked up wrong or some pieces are missing. The emphasis is on finding a problem, fixing it, and trying again. This is where we hear the argument that turn-around time is the most important characteristic of a simulator: it needs to be easy to simulate, debug, and re-simulate. Typically, this is not a long part of the design process.

As we move down the levels of abstraction, we get to RTL. This is where today's designers do the bulk of their work; once they have the behavioral model more or less correct, they move to this level. The main activity at RTL is regression testing. In most cases the design is not really completed at the behavioral level and then transcribed to RTL; instead there is a gradual transformation from behavioral to RTL, with the bulk of the work done at RTL. Regression testing begins when the model works to some degree but is still being completed or revised, and as new functionality is added, old functionality is broken or has to be changed. At this stage there are typically tests that exercise the functionality implemented so far. These are called regression tests, and they are used to confirm that functionality which worked at some point continues to work as the design evolves.

The usual routine is to run the regression tests at night, when no one is there; come in the next morning and fix whatever broke; make new revisions later in the day; and repeat the process.
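The morning triage in this routine amounts to comparing tonight's results with last night's. A minimal sketch, assuming each nightly run produces a pass/fail flag per test name (the test names below are hypothetical):

```python
def triage(last_night: dict[str, bool], tonight: dict[str, bool]) -> dict[str, list[str]]:
    """Classify tonight's results (True = pass) against last night's."""
    report = {"newly_broken": [], "still_broken": [], "fixed": [], "new_tests": []}
    for name, passed in sorted(tonight.items()):
        if name not in last_night:
            report["new_tests"].append(name)
        elif last_night[name] and not passed:
            report["newly_broken"].append(name)   # a true regression: look here first
        elif not last_night[name] and not passed:
            report["still_broken"].append(name)
        elif not last_night[name] and passed:
            report["fixed"].append(name)
    return report
```

The "newly broken" list is the interesting one: those are tests that worked yesterday and were broken by yesterday's revisions.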

Because this is the level at which the designer works, this is where his work is checked. Naturally, this can be very time-consuming: as changes are made and the regression suite grows, there are more and more things that can break. Typically, this process takes far longer than anyone expects.

Proceeding down the levels of abstraction, we get to gate level. After the model works at RTL, you synthesize and get a netlist representation. Why would you simulate at this level? There are several reasons.

you don't trust your synthesizer

This was a common situation in the early days of logic synthesis. When people ran logic synthesizers, they wanted to check that the synthesizer had produced good logic. This is comparable to analyzing machine instructions produced by a compiler. People don't do this much anymore.

there is new information which needs to be included in the model

This is the case with back-annotation for timing simulation: the delay information only exists once the design has been synthesized and laid out, so if you are going to do timing simulation, this is the level at which you have to do it.

you want to do fault simulation


Fault simulation determines how complete your test vectors are, that is, whether or not they will catch faults in the fabricated chip.

you want to use simulation capabilities that only work at the gate level

One of these is a hardware emulator. Emulators typically can't be used at higher levels, but they can take a netlist and they can simulate it. There are some drawbacks to using hardware emulators -- the correction cycle is long and hardware emulators are expensive. However, hardware emulators are fast.
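The fault-simulation case can be sketched with the classic single stuck-at fault model: each net is forced to 0 or 1 in turn, and a fault counts as detected if some test vector makes the output differ from the fault-free circuit. The three-input netlist below is a hypothetical toy; a real fault simulator works on the synthesized netlist and is far more efficient than this brute-force loop.

```python
from itertools import product

# Hypothetical toy netlist, y = (a AND b) OR c, as (output_net, gate, input_nets)
# entries listed in topological order.
NETLIST = [
    ("n1", "AND", ("a", "b")),
    ("y", "OR", ("n1", "c")),
]
INPUTS = ("a", "b", "c")
OUTPUTS = ("y",)
GATES = {"AND": lambda p, q: p & q, "OR": lambda p, q: p | q}


def simulate(vector, fault=None):
    """Evaluate the netlist for one input vector.
    `fault` is an optional single stuck-at fault given as (net, value)."""
    values = dict(zip(INPUTS, vector))
    if fault is not None and fault[0] in values:
        values[fault[0]] = fault[1]  # stuck-at fault on a primary input
    for out, gate, ins in NETLIST:
        values[out] = GATES[gate](*(values[i] for i in ins))
        if fault is not None and fault[0] == out:
            values[out] = fault[1]  # stuck-at fault on a gate output
    return tuple(values[o] for o in OUTPUTS)


def fault_coverage(vectors):
    """Return (fraction of stuck-at faults detected, list of undetected faults)."""
    nets = list(INPUTS) + [out for out, _, _ in NETLIST]
    faults = [(net, v) for net in nets for v in (0, 1)]
    detected = {
        f for f in faults
        if any(simulate(vec, f) != simulate(vec) for vec in vectors)
    }
    missed = [f for f in faults if f not in detected]
    return len(detected) / len(faults), missed
```

With the two vectors (1, 1, 0) and (0, 0, 1), only the five stuck-at-0 faults are detected (coverage 0.5); applying all eight input combinations detects every fault.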

An interesting observation about doing design verification at the gate level is that the amount of synthesis required becomes proportional to the number of designers, not to the size of the design. In the typical case, verification is done at RTL and synthesis follows once the design is relatively stable, so the synthesis work is proportional to the size of the design. If verification is instead done after synthesis, every correction a designer makes requires re-synthesis and then re-verification. This can significantly increase the demand for synthesis, which is often a scarce resource.

Regression Test Generation

Since the bulk of the design effort is at RTL, design verification should be done primarily at RTL. A variety of tests make up the regression suite, and their nature depends largely on the kind of system being designed. If you are designing a processor, for example, the tests probably look a lot like programs: a set of instructions to be executed.

If you are designing a computer system, it probably contains a processor that you did not design, so there is little point in running particular instructions. There is, however, a need to run the transactions that come out of the processor and stimulate the rest of the system. There is also a need to simulate the system's interaction with its environment: interrupts coming in, perhaps messages, I/O activity, and the response of the outside world.

The tests that are done for these different kinds of systems are often called architectural verification tests. These should be distinguished from traditional diagnostic programs that are used in hardware bring-up. It doesn't do a whole lot of good to test every bit of a register if all the bits are hooked up the same in the RTL description, whereas at the hardware bring-up level, you might very well test every bit to check that it was fabricated correctly.

Who produces the tests? This is always a problem. The designer will usually produce unit tests: if the designer has produced the logic for a piece of the system, he will write some tests of that piece's functionality. This is usually a solo effort.

However, once his piece has gone into the larger system, another verification process must take place: determining that each piece fits with the rest of the system. Are the interfaces correct, and were the right assumptions made?

Typically, a separate verification team produces system tests. A major benefit of this is that the verification team interprets the specification independently: they write tests to the specification while the designer designs logic to the specification. The tests will only run on the design if the two interpretations agree, so discrepancies surface as test failures. In effect, the regression suite becomes an executable version of the specification.

Regression Test Characteristics

The best tests are self-checking: the result is simply yes or no. This is not always straightforward; for example, if you run a program on a processor model, there may be more than one output. Ideally, something checks the model output against a known good output and then says yes, this worked, or no, it didn't.

Self-checking models can range from simple consistency checks to complete duplicate models. A consistency check is code in the model that is not part of the design; it monitors the state of the design and identifies illegal conditions or states. Note that consistency checks often require the hardware description language to allow observing the model state from an external module. For example, a design is often paired with a watcher model that looks at the design state, inputs, and outputs, and compares the state and outputs with its own predictions. See Figure 3.
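A minimal sketch of a watcher, written in Python rather than an HDL and assuming a hypothetical FIFO whose push and pop inputs and occupancy count can be observed each cycle. The watcher keeps its own prediction of the occupancy and records illegal conditions and mismatches, without being part of the design itself:

```python
class FifoWatcher:
    """Hypothetical watcher for a FIFO of a given depth. Each cycle it observes
    the design's inputs (push, pop) and its reported occupancy after the cycle."""

    def __init__(self, depth):
        self.depth = depth
        self.expected = 0    # the watcher's own prediction of the occupancy
        self.errors = []     # (cycle, message) for every illegal condition seen

    def observe(self, cycle, push, pop, count):
        # Consistency checks: conditions that should never occur.
        if pop and self.expected == 0:
            self.errors.append((cycle, "pop from empty FIFO"))
        if push and not pop and self.expected == self.depth:
            self.errors.append((cycle, "push to full FIFO"))
        # Predict the next occupancy, clamped to the legal range.
        self.expected = max(0, min(self.depth, self.expected + int(push) - int(pop)))
        # Compare the design's state with the prediction.
        if count != self.expected:
            self.errors.append((cycle, f"count is {count}, expected {self.expected}"))
```

Feeding it a trace in which the design pushes into a full two-entry FIFO and then reports the wrong count after a pop produces two entries in `errors`, each tagged with the cycle at which it occurred.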

FIGURE 3. A watcher model

A completely self-checking model is often hard to build, since it requires a second, fully functional model of the design against which to check the results. People actually do this. For example, if you are designing a processor, you might have an "architectural" model running in parallel with the simulation model, comparing results at each simulated clock cycle. This is a very effective way of catching errors. However, if you were designing an I/O interface, you probably would not have a functional model of your design to serve as a reference.
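A sketch of the parallel-model idea, assuming a hypothetical 8-bit accumulator machine and a simulation trace that records the accumulator after each instruction (a simplification of per-clock comparison). The reference model here is a few lines of Python; for a real processor it would be a full architectural simulator.

```python
def reference_step(acc, instr):
    """Architectural model of a hypothetical 8-bit accumulator machine."""
    op, operand = instr
    if op == "add":
        return (acc + operand) & 0xFF
    if op == "mul":
        return (acc * operand) & 0xFF
    raise ValueError(f"unknown opcode {op!r}")


def lockstep_check(program, dut_trace, acc=0):
    """Step the reference model alongside the simulation trace. Return
    (index, expected, actual) for the first mismatch, or None if they agree."""
    if len(dut_trace) != len(program):
        raise ValueError("trace and program lengths differ")
    for i, (instr, actual) in enumerate(zip(program, dut_trace)):
        acc = reference_step(acc, instr)
        if actual != acc:
            return i, acc, actual
    return None
```

Reporting the first point of divergence, rather than just a final mismatch, is what makes this approach so effective: the error is caught at the cycle where it occurs.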

Where you do not have a reference model, the next best thing is reference output for each test. This is nearly mandatory: for any given test, you need a known good output. It is best if the reference output can be produced by an independent model. For example, you might run linpack as the stimulus to a workstation model and use its known good output as the reference. If you don't have an independent model to produce known good output, you simply have to go back to the specification to determine what the correct output of the test is.
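Given a reference output, the self-checking verdict can be as simple as an exact comparison, with a diff kept for debugging when the answer is no. A sketch using Python's standard `difflib`, assuming test output is compared line by line:

```python
import difflib


def check_against_reference(actual_lines, reference_lines):
    """Self-checking verdict: (True, "") on an exact match, otherwise
    (False, unified diff of reference vs. actual) for debugging."""
    if actual_lines == reference_lines:
        return True, ""
    diff = difflib.unified_diff(
        reference_lines, actual_lines,
        fromfile="reference", tofile="actual", lineterm="",
    )
    return False, "\n".join(diff)
```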

Another need in testing is to separate the stimulus from the model. This is becoming more important as simulation technology changes. It is standard practice in ordinary software design: it is a rare program that takes no external input at run time. In hardware modelling, however, the stimulus is often part of the model itself, because the simulation language is as good at describing the stimulus as it is at describing the model. As compiling a model becomes a more heavyweight, time-consuming activity, it becomes a greater burden to recompile the model to run each new test. To get away from this, people have been moving toward reading simulation stimulus from an outside file, or using some form of input language. Again, this is not terribly revolutionary; people have been doing it in the programming world for many years. But it is relatively new in hardware modelling.
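A minimal sketch of stimulus kept outside the model, assuming a hypothetical one-event-per-line text format (`<cycle> <signal> <value>`). A testbench would read such a file at run time, so a new test is a new file rather than a recompiled model:

```python
def parse_stimulus(text):
    """Parse the hypothetical format, one event per line:
        <cycle> <signal> <value>     e.g. "12 irq 1" or "40 data 0x3f"
    Blank lines and '#' comments are ignored. Returns {cycle: [(signal, value), ...]}."""
    events = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected '<cycle> <signal> <value>'")
        cycle, signal = int(fields[0]), fields[1]
        value = int(fields[2], 0)  # accepts decimal or 0x-prefixed hex
        events.setdefault(cycle, []).append((signal, value))
    return events
```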

Another good practice is to build the regression suite from many small tests instead of a few large ones. This not only helps isolate errors, but also protects against simulation platform crashes and allows the regression to be spread across more than one machine. Typically, large system designs with large regression requirements run on many machines overnight, getting through as much of the regression suite as possible. If the tests were a few very long ones, you could run only a handful of them, no more than you had machines, in a given period of time. With many small tests, you can complete far more tests in the same period.
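The scheduling argument can be made concrete. This sketch assigns tests to machines with the longest-processing-time-first (LPT) greedy heuristic, a standard scheduling rule swapped in here for illustration; durations are hypothetical hours of simulation.

```python
import heapq


def makespan(test_durations, machines):
    """Wall-clock time to finish all tests when each test, longest first,
    is handed to the currently least-loaded machine (the LPT heuristic)."""
    loads = [0.0] * machines  # min-heap of per-machine busy time
    for duration in sorted(test_durations, reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + duration)
    return max(loads)
```

A single 40-hour test occupies one machine for 40 hours no matter how many machines are available (`makespan([40], 4)`), while the same 40 hours split into 40 one-hour tests finish in 10 hours on four machines (`makespan([1] * 40, 4)`).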

Large tests are sometimes valuable as well. This is often true when the test is a program and the model is a computer: it is easier to use the program in its original form than to decompose it into a larger number of equivalent small tests. One of the classic tests is "boot unix" on your workstation model, which often takes many days of simulation time. Another example is running linpack; it may be easier to run it as a simulation test than to cut it down to the part that is interesting for a particular system. The drawback of large tests, of course, is that they weren't written to exercise the functionality of the system model; they were written for something else. As a result, they end up testing the same state transitions over and over again. For instance, when you boot unix, the most common operation is probably loading a value from memory. Once that has been done a few times, all of the possible state transitions associated with loading a value have been tested, and simulation time is simply wasted getting through those cycles to reach the more interesting ones.

Dumber test production methods usually require more simulation cycles but less engineering effort to produce the test. By dumber, we mean more automatic: something like a randomly generated test. It is often relatively easy to write a test generator. A test generator may produce tests that exercise state transitions which a designer might not have guessed were important to test. However, it might also produce tests that exercise the same state transitions over and over again before reaching the interesting states.
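A minimal sketch of a random test generator for a processor, assuming a hypothetical five-opcode instruction set with eight registers. Seeding the generator makes every test reproducible, which matters when a random test fails overnight and must be rerun in the morning.

```python
import random

# Hypothetical toy instruction set: opcode -> number of register operands.
OPCODES = {"add": 3, "sub": 3, "load": 2, "store": 2, "branch": 1}


def random_test(seed, length, num_regs=8):
    """Generate a reproducible random instruction sequence. Recording the
    seed is enough to regenerate a failing test exactly."""
    rng = random.Random(seed)
    program = []
    for _ in range(length):
        op = rng.choice(list(OPCODES))
        regs = [f"r{rng.randrange(num_regs)}" for _ in range(OPCODES[op])]
        program.append(f"{op} {', '.join(regs)}")
    return program
```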

Another way of doing automatic test generation is state enumeration. It is often possible to write a program that produces a test which exhaustively enumerates all the state transitions possible in a design. Of course, this is only possible with small to medium designs, but for designs with up to about 20 bits of state (about a million states), it is feasible.
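A sketch of state enumeration, assuming the design's next-state function can be evaluated directly (here a hypothetical 2-bit saturating counter). Breadth-first search from the reset state finds the shortest input path to every reachable state, and each transition then gets one short sequence: that path plus the input that exercises it. Concatenated with resets between them, these form a single exhaustive test.

```python
from collections import deque


def transition_tests(reset_state, inputs, next_state):
    """Return {(state, input): input sequence from reset} covering every
    reachable state transition."""
    path = {reset_state: []}
    queue = deque([reset_state])
    while queue:  # breadth-first search yields shortest paths from reset
        state = queue.popleft()
        for inp in inputs:
            nxt = next_state(state, inp)
            if nxt not in path:
                path[nxt] = path[state] + [inp]
                queue.append(nxt)
    return {(s, inp): path[s] + [inp] for s in path for inp in inputs}


def sat_counter(state, inp):
    """Hypothetical 2-bit saturating counter: input 1 counts up, 0 counts down."""
    return min(state + 1, 3) if inp == 1 else max(state - 1, 0)
```

For the counter, `transition_tests(0, (0, 1), sat_counter)` yields eight sequences, one per (state, input) pair; reaching the top state and incrementing it takes four increments.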

Error Diagnosis

Analyzing the results of a test can be time-consuming, because the results of regression tests can be quite voluminous. As we said earlier, the ideal result is either yes or no. However, if it is no, you need some data in order to debug the failure, so regression tests typically put out a fair amount of intermediate data. Assuming we know what the correct result of the test is, we use the output of the test to determine where the test went wrong. Usually, though, the output of the test doesn't contain enough information to debug the model itself. Because the model may have thousands, or hundreds of thousands, of nets and registers, you don't want to save every net and register transition throughout the life of the test every time it is run; that produces an enormous amount of data.

What is usually done instead is to run the test in normal regression mode to determine that it has produced an error, and then re-run it with monitoring turned on, so that it produces a much larger amount of data. That data can then be analyzed, often using waveform displays or other data-display programs. Display capabilities are often added to the model itself so that intermediate data can be represented at the level of the design. For example, you might put out data as a picture if you were designing a graphics subsystem, or as an instruction stream or a set of Ethernet packets.
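The two-pass approach can be sketched as follows, assuming a hypothetical simulator hook `run_test(name, trace)` that returns a pass/fail flag and, when tracing is on, the name of a waveform file. Only failing tests pay for the expensive traced run.

```python
def run_with_diagnosis(name, run_test):
    """Run a test quietly; if it fails, rerun it once with full tracing.
    `run_test(name, trace)` is a hypothetical hook returning
    (passed, trace_file_or_None)."""
    passed, _ = run_test(name, trace=False)
    if passed:
        return "pass", None
    # The rerun must be deterministic (same stimulus, same seed) to reproduce the failure.
    _, trace_file = run_test(name, trace=True)
    return "fail", trace_file
```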

Exit Criteria

Finally, we need to address the question of when you are done. When have you simulated enough to be confident that your design is in fact logically correct? Typically, people use a variety of ad hoc means to increase their comfort level with the amount of simulation they have done.

State coverage is often used: the design is monitored to determine whether all the important states and state transitions have been covered. Branch-taken coverage and line coverage of the HDL description are both used, though most HDL simulators do not offer much help in measuring these attributes.
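A minimal sketch of transition coverage, assuming the simulator can log the state of a state machine each cycle and that the designer lists the important transitions by hand; the arbiter states below are hypothetical.

```python
def transition_coverage(trace, important):
    """Fraction of the important (from_state, to_state) transitions that
    appear in a per-cycle state trace, plus the ones never exercised."""
    seen = set(zip(trace, trace[1:]))
    hit = important & seen
    return len(hit) / len(important), sorted(important - seen)
```

For an arbiter trace that never abandons a request, the transition from REQ back to IDLE shows up as uncovered, which points directly at the test that still needs to be written.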

But what it usually comes down to is that you stop when you run out of time. This is the most common exit criterion for the design verification stage: the design looks relatively stable, the number of new bugs found has not increased for some period of time, and it is time to move on to the rest of the design process.

Conclusion

There is an awful lot of simulation that needs to be done during design verification and the completion of the design at the RTL level. Creating and running regression tests can consume a large part of the design process. By using the techniques described here, this process can be accomplished in an efficient manner.