A Loosely Coupled C/Verilog Environment for System Level Verification
Andreas S. Meyer
Introduction
In this paper, we present a software C-Verilog interface designed for the functional verification of large system designs. As a company specializing in ASIC verification, we work with a wide range of systems, including routers, parallel processors, and video applications. We developed this tool and actively use it in large development environments across a variety of systems. We discuss the major concepts behind this type of environment, the issues it raises, and our experience with the tool in practice.
Requirements of a System Level Verification System
The goal of system level verification is primarily to test the functional operation of a complete large design. This may range from a few ASICs or FPGAs in a small system to systems with millions of gates, where there are major difficulties in getting system level simulations to run at all. Determining the correct functionality of a system is a major task. This means more than verifying that the ASIC RTL design meets the requirements of the specification. Our major goal is to verify that the intent of the system architecture is met. To do this, we have developed, and are using, a tool which lets us wrap around the entire system simulation and communicate through all available ports, including processor interfaces, memory, disk, and communications ports.
In order to verify the intent of the system, we use code from many sources in the verification process. This may include diagnostic code, OS drivers, and application code in addition to writing specific verification code. Since writing verification code is very time-consuming, re-using existing code fragments from all of these sources is fundamental to our verification methodology. Since the verification code can be re-used in the diagnostic environment, the diagnostic code will have been developed and tested at the same time, again reducing the project development cycle.
Example
The following example illustrates the basic concepts of a system level verification environment. It shows a basic router, which has a control unit, a switching system, and several types of communications ports. The following figure illustrates this example. There are several ways that the verification test environment can communicate with the simulation. Transactors may transfer entire operations, such as an Ethernet packet. Bus functional simulation models understand the legal bus operations, and what they exchange with the test environment includes not only the data, but also all timing parameters and special conditions.
The transactor will return any received data, status, and error conditions. Low level operations may probe and force individual signals in the simulation environment. Interrupt handlers need to be triggered from the simulation environment, and monitors may be added for performance analysis or condition coverage analysis.
The test environment may consist of any number of independent processes, and each process may be communicating with any number of transactors. Communication, queuing, scheduling, and transaction progress monitoring are all dealt with in the C-Verilog software. The test environment itself does not need to see any of this.
One of the key features of this design environment is the re-usability of developed test code. A platform of resources and libraries of design verification functions are portable across projects and platforms. When a new system is introduced, individual transactors may be replaced, along with a specific set of low-level drivers. The higher level test environment is not aware of any changes, and remains portable across projects and platforms.
Design of the System
Many issues must be considered when a verification interface is designed. We will discuss a few of the more visible ones.
Tightly Vs. Loosely coupled test environment
A test environment needs to communicate with the simulation. How the test environment is coupled to the simulation is an important issue. A tightly coupled environment generally will have the test environment linked rigidly into the simulation, while a loosely coupled environment will have multiple processes, with some type of inter-process communication. Because communications overhead tends to be less expensive for tightly coupled systems, these tend to permit higher levels of interaction between the test environment and the simulation.
Loosely coupled systems can also provide tight linking between simulation variables and the test environment. However, this can become prohibitively expensive if several thousand signals are being monitored. Loosely coupled systems will therefore try to communicate at a higher level, using entire transactions as the favored unit of communication, with only important single signals being transferred asynchronously.
The major advantage of loosely coupled systems is that they can be more flexible. Using a standard communications protocol, such as sockets, the test environment can be run in a native environment. If a system is being developed on an embedded controller, the software can be run on a development system, linked through sockets to the simulation running on another system. This permits the test code to be run on its native processor with the actual operating system. A loosely coupled system usually takes significantly less time to rerun a test, since only the test needs to be re-compiled. The simulation can either be rerun, which works well with compiled simulators, or re-connected to a running simulation, so that a corrected test is re-run without ever stopping the simulation. A loosely coupled system also permits a test environment to be connected transparently to any type of simulator, including cycle based or emulated systems.
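As a rough illustration of communicating whole transactions rather than individual signals, the sketch below packs one transaction into a fixed-size message that could be written to a socket. The `txn_msg` layout, its field names, the opcode encoding, and the `txn_pack`/`txn_unpack` helpers are all hypothetical and are not the wire format of the tool described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transaction record: one whole bus operation per message. */
typedef struct {
    uint32_t transactor_id; /* which transactor instance should run it */
    uint32_t opcode;        /* assumed encoding: 0 = read, 1 = write */
    uint32_t address;
    uint32_t data;
} txn_msg;

#define TXN_WIRE_SIZE 16u

/* Store a 32-bit value big-endian, so both hosts agree on byte order. */
static void put_u32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t get_u32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Serialize a transaction into buf (at least TXN_WIRE_SIZE bytes). */
size_t txn_pack(const txn_msg *m, uint8_t *buf) {
    put_u32(buf + 0, m->transactor_id);
    put_u32(buf + 4, m->opcode);
    put_u32(buf + 8, m->address);
    put_u32(buf + 12, m->data);
    return TXN_WIRE_SIZE;
}

/* Rebuild a transaction from a received buffer of TXN_WIRE_SIZE bytes. */
void txn_unpack(const uint8_t *buf, txn_msg *m) {
    m->transactor_id = get_u32(buf + 0);
    m->opcode = get_u32(buf + 4);
    m->address = get_u32(buf + 8);
    m->data = get_u32(buf + 12);
}
```

Fixing the byte order in the message is one way to let the test side run on its native processor while the simulation runs on a different host architecture.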
System Level Environment
Another major issue is in providing a complete system level environment. Our major goal is to be able to simulate an entire system, running actual applications under actual operating conditions. This requires that the test environment actually be able to wrap around the entire system simulation. All input and output ports must be accessible from the test environment. This includes not only the processor ports, but also memory, disk, and communications ports.
All relevant data must be stored in the C environment, since it is likely to be set up by the application or OS driver code. Since we use transactors wherever possible to communicate between the test environment and the simulation, we need to have classes of transactors, which can be used for different types of ports, and instantiations of transactors, since there may be more than one port of each type.
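The distinction between transactor classes and transactor instances could be modeled in C roughly as follows. This is only a sketch: the `transactor_class` and `transactor` structures, the registry, and the `eth_send` stand-in (which merely counts bytes) are assumptions made for illustration, not the interface of the tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct transactor transactor;

/* A class describes one type of port and how to drive it. */
typedef struct {
    const char *name;
    int (*send)(transactor *t, const void *data, size_t len);
} transactor_class;

/* An instance is one physical port of that type. */
struct transactor {
    const transactor_class *cls;
    int port;
    size_t bytes_sent;
};

/* Stand-in driver: a real one would hand the data to the simulation. */
static int eth_send(transactor *t, const void *data, size_t len) {
    (void)data;
    t->bytes_sent += len;
    return 0;
}

static const transactor_class ethernet_class = {"ethernet", eth_send};

#define MAX_TRANSACTORS 8
static transactor registry[MAX_TRANSACTORS];
static size_t registry_count;

/* Create a new instance of a class; returns NULL when the table is full. */
transactor *transactor_create(const transactor_class *cls, int port) {
    if (registry_count == MAX_TRANSACTORS)
        return NULL;
    transactor *t = &registry[registry_count++];
    t->cls = cls;
    t->port = port;
    t->bytes_sent = 0;
    return t;
}

/* Look up an instance by class name and port number. */
transactor *transactor_find(const char *class_name, int port) {
    for (size_t i = 0; i < registry_count; i++) {
        if (registry[i].port == port &&
            strcmp(registry[i].cls->name, class_name) == 0)
            return &registry[i];
    }
    return NULL;
}
```

With this shape, test code addresses a port by class and instance number, and a new project would only need to supply different class definitions.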
In a complete system, there may be many classes, with hundreds of transactors all running simultaneously. This leads to a great deal of complexity throughout the test environment. Managing this effectively requires a great deal of flexibility in how the test environment is organized. We have found three mechanisms to permit a parallel, flexible environment: independent processes, parallel C functions, and a built-in fork/join construct.
Parallelism in C
The ability to run multiple distinct test processes in the same simulation environment can be critical. Since much test code can actually come from real diagnostic or application software, it is written to be run in a real environment, not for verification purposes. This means that it must be able to run alone, with no direct knowledge of any other software. This is easiest to accomplish by leaving each such area of code as a stand-alone process, unaware of the rest of the test environment, or even that it is running in a simulation environment.
Some functions are rather complex, and need to be handled whenever needed by the simulation. For this purpose, we have found C functions linked to specific transactors, or to classes of transactors, to be useful. These functions operate in a parallel, asynchronous fashion, as determined by the simulation event scheduling.
Allowing parallelism in the test environment is critical to permitting tests to replicate worst-case system environments. Integrated queuing permits one test to access any number of transactions in parallel. Alternatively, a fork-join construct can be used to permit separate functions to access transactors in parallel using a more conventional programming style.
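The built-in fork/join construct itself is not shown in this paper. As a stand-in, the sketch below gets a similar effect with ordinary POSIX processes: each branch runs in its own child, and the parent joins on all of them. The `fork_join` function, the `test_fn` type, and the two sample branches are hypothetical names introduced here.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*test_fn)(void); /* a branch returns 0 on success */

/* Run every branch in its own child process, then wait for all of them.
   Returns the number of failed branches, or -1 if a child could not be
   started. Assumes the caller has no other outstanding children. */
int fork_join(test_fn *fns, int n) {
    int started = 0;
    int failures = 0;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                           /* could not start this branch */
        if (pid == 0)
            _exit(fns[i]() == 0 ? 0 : 1);    /* child: run branch and exit */
        started++;
    }

    /* Join: reap exactly the children that were started. */
    for (int i = 0; i < started; i++) {
        int status = 0;
        if (wait(&status) < 0 || !WIFEXITED(status) ||
            WEXITSTATUS(status) != 0)
            failures++;
    }
    return started == n ? failures : -1;
}

/* Sample branches used only to exercise the sketch. */
static int branch_pass(void) { return 0; }
static int branch_fail(void) { return 1; }
```

Because each branch is a separate process, the branches share no memory; a version in which branches share transactor state would need threads or an explicit communication channel.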
System Performance Enhancements
We have found that once we had this environment, there were several other interesting uses to permit system level simulation flexibility.
Mix between behavioral, RTL, and C models
As one builds a complete simulation environment, several impediments show up. The first is that not all models are available at the same time. Some sections may have RTL code written and running, while others are still using C models, or behavioral models. Since we have a class-based simulation environment, we can use it to mix and match models. This is not only useful to get simulations running sooner, but for large simulation environments, it can have drastic performance implications, since RTL based simulations are rather compute intensive.
Fitting Into a Design Environment
When running in a large design project, there are several smaller issues which are absolutely critical to keep designers and verification engineers happy. Our experience with this system in the field has shown that some fairly basic issues are very important to the engineering teams.
Debug Environment
Determining what has gone wrong, or even what is happening, in a large environment can be very frustrating. The ability to see what is happening in the test environment is important. Since this can be a large, diverse environment, it is important to supply good information. The most obvious tool is a source debugger. Since we tend to keep our test environment in C/C++, there are many professional debugging tools available. The combination of a test debugger and a simulation waveform viewer is reasonably powerful. One of the major drawbacks of conventional debuggers is that they don't show previous state information the way a waveform debugger does. We have seen that the ability to show transaction history across the test environment is also important.
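One simple way to keep recent transaction history available to a debugger is a fixed-depth ring buffer, sketched below. The `history_log` structure, its fields, and the depth of four entries are illustrative assumptions, not a description of how the tool records history.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HISTORY_DEPTH 4 /* small depth chosen for illustration */

typedef struct {
    uint64_t sim_time;  /* simulation time when the transaction completed */
    int transactor_id;
    uint32_t address;
} history_entry;

typedef struct {
    history_entry e[HISTORY_DEPTH];
    size_t total;       /* number of entries ever recorded */
} history_log;

/* Record one transaction, overwriting the oldest entry once full. */
void history_record(history_log *h, uint64_t t, int id, uint32_t addr) {
    history_entry *slot = &h->e[h->total % HISTORY_DEPTH];
    slot->sim_time = t;
    slot->transactor_id = id;
    slot->address = addr;
    h->total++;
}

/* Number of entries currently retained. */
size_t history_count(const history_log *h) {
    return h->total < HISTORY_DEPTH ? h->total : HISTORY_DEPTH;
}

/* Entry by age: 0 is the most recent. NULL if it is no longer retained. */
const history_entry *history_back(const history_log *h, size_t age) {
    if (age >= history_count(h))
        return NULL;
    return &h->e[(h->total - 1 - age) % HISTORY_DEPTH];
}
```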
Repeatability of Tests
If a test is run multiple times, it must have the same results each time. This is true not only for a single test, but also for any number of test processes, which are all running in parallel. Repeatability is the only way to know that an error in the RTL code has indeed been fixed.
Randomness is an important function for most system level simulations, and must be a part of any complete verification environment. Access to good random generators, and stochastic functions is important. Even random tests need to be repeatable to verify that a bug has been fixed. By specifying a specific seed, random tests can be made repeatable when needed.
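Seed-based repeatability can be sketched with a small generator whose entire state lives in the test process. The example uses the well-known 32-bit xorshift generator purely for illustration; the `test_rng` type and function names are hypothetical, and the paper does not say which generator the tool uses.

```c
#include <assert.h>
#include <stdint.h>

/* Per-process generator state, so parallel tests do not disturb each other. */
typedef struct {
    uint32_t state;
} test_rng;

/* Seed the generator; xorshift state must be nonzero, so 0 is mapped to 1. */
void test_rng_seed(test_rng *r, uint32_t seed) {
    r->state = seed ? seed : 1u;
}

/* 32-bit xorshift step (shifts 13, 17, 5). */
uint32_t test_rng_next(test_rng *r) {
    uint32_t x = r->state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    r->state = x;
    return x;
}

/* Value in [lo, hi]. Assumes lo <= hi and hi - lo < UINT32_MAX;
   the small modulo bias is accepted in this sketch. */
uint32_t test_rng_range(test_rng *r, uint32_t lo, uint32_t hi) {
    return lo + test_rng_next(r) % (hi - lo + 1u);
}
```

Logging the seed at the start of each run is what makes a failing random test reproducible later: rerunning with the same seed replays the same sequence.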
Turn around time between bugs
Our experience is that most errors are actually in the test environment, not in the simulation code. For every RTL error found, there are many test environment errors. For verification engineers, the turn around time between bugs is a prime consideration. With a test environment written in C, it is usually possible to recompile a test in minutes, particularly if a distributed make environment is available. A loosely coupled test environment does not require that the simulation be re-linked when an error is found in the test environment. The simulation does not even need to be re-run. When an error is found in the test, it can be disconnected from the simulation, re-compiled, then dynamically re-linked to the simulation. This results in a turn-around time between test bugs of minutes rather than hours if a large simulation needs to be restarted.
Fitting into a Regression Environment
Environments with many designers and verification engineers require a thorough, automated regression environment. For this type of system simulation to fit into a regression environment, there are several key requirements, including the ability to start up and shut down the simulation, and the ability to detect errors anywhere in the simulation, including any of the test processes, or the simulation itself.
A distributed environment, with parallel independent processes requires a mechanism to provide error reporting across all processes. We have found that a built-in distributed error reporting mechanism was essential for regressions. This ensures that all processes know about any errors detected anywhere in the system. Furthermore, a process must have the ability to shut down the entire simulation environment when critical errors have been detected. Finally, for the regression process to work, any process must return with status indicating the success or failure of the entire simulation environment.
For regressions to be effective, they must be able to determine whether tests have passed or failed. This means that no human interaction is involved in determining pass/fail status. For this to work, all tests in the environment must be self-checking. How a test checks correct behavior is important. Simply storing what happened on each cycle is not acceptable in a large system, since a small timing change could cause all tests to fail. Rather, tests must verify that the system is operating as intended, and that all performance criteria are met. That may mean specifying the permissible range of timing characteristics for a transaction. This will permit tests to continue passing when small modifications are made to the RTL design, yet ensure that out-of-specification parameters will cause errors to be reported.
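A self-check against a permissible timing range, rather than an exact cycle count, might look like the following sketch. The `timing_spec` structure, the `check_result` codes, and `check_latency` are hypothetical names, and measuring time in cycles is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Permissible latency window for one kind of transaction, in cycles. */
typedef struct {
    uint64_t min_cycles;
    uint64_t max_cycles;
} timing_spec;

typedef enum { CHECK_PASS, CHECK_TOO_FAST, CHECK_TOO_SLOW } check_result;

/* Compare an observed transaction against its window.
   Assumes end_cycle >= start_cycle. */
check_result check_latency(const timing_spec *spec,
                           uint64_t start_cycle, uint64_t end_cycle) {
    uint64_t elapsed = end_cycle - start_cycle;
    if (elapsed < spec->min_cycles)
        return CHECK_TOO_FAST;
    if (elapsed > spec->max_cycles)
        return CHECK_TOO_SLOW;
    return CHECK_PASS;
}
```

A transaction that moves from 12 to 13 cycles after an RTL change still passes a window of 8 to 16 cycles, while one that takes 40 cycles is reported as an error.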
How a Group is Affected by this Environment
One of the differences we have seen with this environment is how various groups interact with each other. Having a uniform strategy between functional verification, bring-up tests, and field diagnostics can greatly reduce the time to deliver a complete product. Once diagnostic code is ported to the simulation environment, the diagnostic and software engineers are able to run and test their code at a much earlier stage of the project.
There are several side effects from this, aside from greater simulation usage. The first is improved communications between the groups, since they are debugging code together earlier. The second is that verification engineers tend to be more closely associated with the diagnostic group, since they have access to routines and tools already existing in that group, and they have the ability to write verification code which may also be re-used as system diagnostics.
Results
From our work with this tool, we have seen some of the strengths and weaknesses of this approach.
Our experience is that verification test writers and diagnostic engineers are not even aware of the simulation interface. Their code is exactly what they will run on a real system. The only observable difference is that running the tests takes far longer than it would on a real system.
- Easy to get people working effectively
When this tool is installed, it works well with large groups of designers. Verification engineers are not hampered by the writing style. One of our observations is that the software-oriented engineers are not really aware that they are communicating with a hardware simulation environment.
- All the constructs of C/C++.
Engineers writing software do not want to be limited in their choice of coding style or available system functions. By keeping the test environment in C/C++, engineers have a whole range of system calls, and support
- Verification code is not throwaway
The ability to combine the verification and diagnostic efforts has several major advantages. First, both of these tasks are very labor intensive. While there are some verification tests which will not make sense in the diagnostic suite and vice versa, there is still a great deal of overlap, which can save considerable effort.
Second, as with any HW/SW co-design, since much of the diagnostic code and possibly also system software is run on the simulation, many of the basic system errors between hardware and software will have been fixed much earlier in the design cycle. Not only does this help ensure that the hardware is correct, but it also drastically reduces the integration cycle, since much of the software has also been debugged.
Sub-block level control not great - removed from system
Currently, one of the major weaknesses we see with this type of environment is that it is not as well suited to low-level testing. Unlike system level tests, most low-level environments are much less transaction oriented. Individual bits tend to be observed and manipulated more frequently. This can be done from the test environment, and we are currently providing the additional support routines necessary for this.
Complex transactors can be hard to design
Complex transactors are well suited to this type of test environment, since a set of support functions is easy to build in using standard C techniques. On the other hand, a complex transactor is still difficult to design. As an example, consider a P6 transactor, which can support multiple parallel split transactions, any of which could detect an error in one of many phases. A P6 will re-issue the faulty transaction without any interaction from the actual processor code. When a transactor is designed correctly, it will handle multiple split transactions, error detection, and recovery automatically as specified, and will also permit the calling routine to control timing parameters, insert errors, observe the transaction as it progresses, or be called back on errors.
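The combination of automatic re-issue and an optional error callback can be sketched in a much simplified form. The example below handles a single transaction at a time and does not model split transactions or bus phases; `retry_transactor`, `transactor_issue`, and the two stand-in functions are hypothetical names, not part of any real P6 model.

```c
#include <assert.h>
#include <stddef.h>

/* One bus attempt: returns 0 on success or a nonzero error code. */
typedef int (*attempt_fn)(void *ctx, unsigned addr);
/* Optional notification to the calling test on each failed attempt. */
typedef void (*error_cb)(void *user, unsigned addr, int err, int attempt);

typedef struct {
    attempt_fn attempt;  /* how to drive the bus (simulation side) */
    void *ctx;
    int max_retries;     /* re-issues allowed after the first attempt */
    error_cb on_error;   /* may be NULL if the test does not care */
    void *user;
} retry_transactor;

/* Issue a transaction, re-issuing on error without involving the caller.
   Returns the number of retries used, or -1 if all attempts failed. */
int transactor_issue(retry_transactor *t, unsigned addr) {
    for (int attempt = 0; attempt <= t->max_retries; attempt++) {
        int err = t->attempt(t->ctx, addr);
        if (err == 0)
            return attempt;
        if (t->on_error)
            t->on_error(t->user, addr, err, attempt);
    }
    return -1;
}

/* Stand-in bus: fails with error code 7 until the counter in ctx hits 0. */
static int flaky_attempt(void *ctx, unsigned addr) {
    int *remaining = ctx;
    (void)addr;
    if (*remaining > 0) {
        (*remaining)--;
        return 7;
    }
    return 0;
}

/* Stand-in callback: counts how many errors the test was told about. */
static void count_errors(void *user, unsigned addr, int err, int attempt) {
    (void)addr;
    (void)err;
    (void)attempt;
    (*(int *)user)++;
}
```

The test writer sees only `transactor_issue`; recovery happens inside the transactor, and the callback is the hook for tests that want to observe or count errors.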
This functionality is essential to fully test a complex port. Once designed, such a transactor can provide that functionality with very little overhead from the test writer. However, developing a good bus-functional model of the transactor is a time-consuming task.
Conclusions
Overall, we have been very pleased with this approach for system level verification. We have been able to build environments which were unthinkable before, which offer high degrees of parallelism, permit stochastic approaches to control large numbers of transactors, and allow complex environments to control the levels of system traffic and error densities and to test a system at various performance levels.
We have also been quite pleased with the amount of overlap we have found between the software development and verification efforts. While this is highly desirable in terms of both the level of simulation and the advantages of co-design, it has increased the demand for simulation resources considerably.