Zaiq Technologies












 

FPGA-Based Systems: To Verify or Not to Verify!


1.0 Introduction
FPGAs are wonderful devices. They allow changes to be made relatively easily compared to ASICs. Their design flows aren't that different from ASIC flows, and in fact the FPGA flow is a bit easier for the novice to understand. And FPGAs, like ASICs, offer designers ever greater logic densities for their systems - more with less, so to speak. However, it's been my experience that FPGA-based systems can lure inexperienced system designers into a trap. The logic goes something like "if it doesn't work, we can fix it in the lab!"

In the sections that follow, I'll provide a more rigorous approach to understanding the verification problem as it relates to FPGA-based systems. To illustrate the approach and results, a real system will be used as an example. In the end, I hope to show some easily avoidable mistakes and what those mistakes could cost.

2.0 Re-programmability: A License to Steal?
Steal? What do I mean, steal? There seems to be something about re-programmability that makes a designer or a design manager feel good about removing significant chunks of their implementation schedule when FPGAs are involved. So, what I mean is that re-programmability is not a license to steal development time.

For instance, I've seen a development program totally ignore simulating a multi-FPGA system. The argument was that verification was not necessary, since the FPGAs could "easily" be repaired during the hardware debug phase of the project (i.e. "in the lab."). The popular sentiment for not doing this essential verification was, "Why waste the time!"

As a result of skipping the verification, they ended up with the equivalent of a rock on power-up. The clock was running and little to no activity was observed while probing the individual FPGAs. In the end, a simulation environment was constructed that matched the board netlist, and simple tests were run -- gross-functionality testing. Once the gross testing was completed, hardware debug could begin again.

The second attempt was more successful than the first. The design team could make progress debugging the hardware AND problems discovered during hardware debug could be run in the simulation environment. The simulation environment provided visibility into the FPGAs that was not available during hardware debug.

Another example: I've been involved peripherally with another development program with 16 complex FPGAs. Here, at least part-centric verification was planned - verify each FPGA stand-alone. However, no system-level verification was planned. The argument was that management was willing to take on the added risk of not doing system-level verification. Unfortunately, this group had minimal experience with complex digital systems. Their experience had been with small numbers of standard digital parts, and analog subsystem design. All the parts were thrown on a board and hardware debugged in the lab. This case will be discussed in greater detail later in this paper.

Most ASIC-based development programs (see footnote 1) with which I've been involved, either directly or indirectly, have not had the problems with verification that I've seen with FPGA-based systems. Is it because ASIC designers are much more savvy than FPGA designers? Do ASIC-based systems get the better development managers for their programs?

Relative Characteristics: ASICs vs. FPGAs
Logic density & speed
  • ASICs: High/Medium
  • FPGAs: Medium/Low
Implementation cost
  • ASICs: High/Medium - requires back-end and ASIC vendor sign-off phases, plus NRE charges.
  • FPGAs: Medium/Low - may take a day to fit the design and add the changes to the hardware (longer if over-designed), and could easily take an hour or less for small changes.
Cost to spin a new device for functionality changes
  • ASICs: High -- need to repeat some verification and the whole back-end and ASIC vendor sign-off phases. This costs NRE charges and development time.
  • FPGAs: Low - but can get more complicated for designs that push speed or density.
Visibility inside device
  • ASICs: Low
  • FPGAs: Low
Correct by design
  • ASICs: ?
  • FPGAs: ?
Table 1: High-level relative comparison of ASIC vs. FPGA characteristics.

Table 1 above compares some relative characteristics of ASICs and FPGAs. Note the costs to implement the functionality the first time and to spin a device for functionality changes due to bugs found during hardware debug. The ASIC path requires not only NRE (Non-Recurring Engineering) charges, but greater development costs due to more complicated back-end and ASIC-vendor sign-off processes. The FPGA path is fairly straightforward … and cheap. So, could it be that the ASIC folks take verification more seriously up-front due to the high cost of not producing required functionality on the first try?

For FPGA-based systems, this cost issue is not so great, so it's easier to get lost in determining what's required for verification prior to system hardware debug. That subject is discussed below, where the issues of logic density, speed, and visibility will also be important.


Footnote 1: Note that when I refer to ASIC-based development programs, I also include full-custom ICs. For the purposes of this paper they are the same.

3.0 Characterizing The Verification Problem
To understand the verification requirements for a given development program, it's important to look at the problem in terms of layers. Like stripping off the layers of an onion, we need to understand the problem at more than just the technical level. Input from product planning, marketing and product applications is important to an optimal system verification strategy.

The sections that follow will explore the verification space for electronics-based systems, at the planning level. Three levels, or orders, of verification will be presented. It's important to note that these levels are just distinctions to help understand how to approach a solution to verifying any complex electronics-based system. This holds for ASIC-based or FPGA-based systems, and also for mixed analog and digital systems, or systems with embedded processors.

3.1 Verification First-Order
The first and best place to begin any verification strategy discussion is by understanding the scope of the problem. Here we would look at verifying everything about the system:

  • Product scope & requirements;
  • Architecture;
  • Specifications;
  • Implementation.

Why should we verify everything? The golden rule of verification states that if you don't verify it, it doesn't work!

The questions to ask:

  1. Does the specified architecture meet the product scope and requirements?
  2. Do the specifications accurately capture the details of the architecture?
  3. Does the implementation match the specifications?

The first two questions above are an important part of the overall product verification. They are also outside the scope of this discussion. The last question is where we'll focus.

An implementation generally encompasses the following:

  • Board(s) - unique boards and unique configurations of boards;
  • Intra-device logic - what goes on inside a custom logic device, either an FPGA or ASIC;
  • Inter-device logic - what goes on between devices, either FPGAs, ASICs, or standard parts (i.e. SDRAMs, FIR filter chips, or embedded processors);
  • Control/status registers (or CSRs) - control software and/or diagnostic access to the system's devices, which provide the user's view of the system.

Figure 1 below shows a general system block diagram, with multiple FPGAs on a board, interconnected as they might be in a typical application, all tied together with a backplane. Even though this is a general system block diagram, the goal is to illustrate some important observations pertaining to a real multiple FPGA-based system.

There are four unique FPGA implementations on the visible board. Assuming a typical FPGA today, like a Xilinx V300 device, there are approximately 200K equivalent ASIC gates and 65Kbits of SRAM on each FPGA. This gives a board total of 800K equivalent ASIC gates and 260Kbit of SRAM. This is all custom logic. Except for the operation at the interfaces, all this logic operates invisibly at the board level to any observer with a logic analyzer or scope probe.

There are five inter-device paths (counting the backplane interface). Each interface, if it does not have an air-tight specification prior to design, may work differently than expected on different FPGAs. Because each path is implemented separately at each of its two ends, it's smart to treat the five inter-device paths as ten custom interfaces. Having one person responsible for each inter-device interface will help in getting it correct the first time, and should minimize the verification effort.
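The totals quoted for the board in Figure 1 are simple arithmetic on the per-device figures. A minimal sketch, using only the numbers given in the text (the variable names are illustrative, and the two-ends-per-path assumption is one reading of why five paths become ten interfaces):

```python
# Per-device figures quoted in the text for the board in Figure 1.
FPGAS_PER_BOARD = 4
GATES_PER_FPGA = 200_000       # approximate equivalent ASIC gates
SRAM_KBITS_PER_FPGA = 65

INTER_DEVICE_PATHS = 5         # counting the backplane interface
ENDS_PER_PATH = 2              # assumption: each end is designed separately

board_gates = FPGAS_PER_BOARD * GATES_PER_FPGA
board_sram_kbits = FPGAS_PER_BOARD * SRAM_KBITS_PER_FPGA
custom_interfaces = INTER_DEVICE_PATHS * ENDS_PER_PATH

print(f"{board_gates} gates, {board_sram_kbits} Kbit SRAM, "
      f"{custom_interfaces} custom interfaces to verify")
```

Changing the device count or per-device figures shows how quickly the amount of invisible custom logic grows with each added FPGA.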

There is a common interface for the CSR reads and writes. As with the inter-device paths, if this common interface does not have an air-tight specification prior to design, it may work differently than expected. Having one person specify and design the common interface will help in getting it right the first time, and should minimize the verification effort. In general, common interfaces are a good thing vis-à-vis verification.

There are standard part interfaces for the Synchronous SRAM and Synchronous DRAM parts. Sometimes these can be purchased as pre-verified cores. Other times, they must be created from scratch. Having vendor-certified RAM models is important in making sure the interfaces work, in either case. Even using a purchased core, there's a risk it's broken too. The core vendor may stand behind their product if it doesn't work, but it doesn't help your time to market or time to revenue. It's also easier for a core vendor to guarantee an FPGA core than an ASIC core!

There may be unique boards that comprise a system, with their own unique FPGAs and interconnections, compounding the verification problem.

Table 2 below shows a verification matrix for each implementation component listed above, versus the required logical verification tasks (see footnote 2). This table should help summarize the type of verification needed, based upon what the implementation looks like at each level. A correct result must be achieved for each verification task listed in the table.

Implementation Component Logical Verification Tasks
Boards
  • operation vs. the specification
  • connections between devices on board
  • connections between multiple boards in a system
  • mode tie-offs for each device
  • clock connections
  • reset connections
  • pull-up/pull-down/series resistor connections
Intra-device logic
  • operation vs. the device specification
  • operation in each functional module
  • operation between functional modules
Inter-device logic
  • interface specifications
  • operation vs. the specification
  • operation between devices on a board and between boards in the system
  • operation between interface and internal logic
Configuration/Status Registers
  • Address space allocation
  • global/broadcast address operation
  • address uniqueness across boards and common devices
  • various power-on and software reset states

Table 2: Implementation vs. verification matrix

For example, if we took the four FPGAs in Figure 1 and squished them into an ASIC, would you still need to verify the inter-FPGA interface logic? Since the inter-FPGA interfaces are now inter-module interfaces inside the ASIC, you sure do have a verification task! Another example would be a system that is composed of multiple instances of the same board. In this case, would you need to verify operation between boards? If there is any functionality that requires another board in the system to operate as part of the specification, then there's a verification task!
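Some of the tasks in Table 2 lend themselves to automated checks. As one illustration, address uniqueness across boards and devices can be tested by looking for overlapping CSR address windows. The sketch below is a hypothetical example: the `CsrRange` type, the device names, and the address map are invented for illustration and are not taken from any system described in this paper.

```python
from itertools import combinations
from typing import NamedTuple


class CsrRange(NamedTuple):
    """One device's CSR address window (hypothetical layout)."""
    board: str
    device: str
    base: int
    size: int

    @property
    def end(self) -> int:
        return self.base + self.size  # exclusive upper bound


def find_overlaps(ranges):
    """Return every pair of CSR windows whose address ranges intersect."""
    overlaps = []
    for a, b in combinations(ranges, 2):
        if a.base < b.end and b.base < a.end:
            overlaps.append((a, b))
    return overlaps


# Hypothetical address map, invented for this example.
csr_map = [
    CsrRange("board0", "fpga_a", 0x0000, 0x100),
    CsrRange("board0", "fpga_b", 0x0100, 0x100),
    CsrRange("board1", "fpga_a", 0x0180, 0x100),  # collides with board0/fpga_b
]

for a, b in find_overlaps(csr_map):
    print(f"overlap: {a.board}/{a.device} and {b.board}/{b.device}")
```

A check like this is cheap to run against the address map long before the boards exist, which is exactly when an allocation mistake is cheapest to fix.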


Footnote 2: Here I'm using logical verification, where the behavior of the device apart from timing is verified, as distinguished from physical verification. Note that timing is only one aspect of physical verification.

3.2 Verification Second-Order
In the last section we looked at the verification requirements for checking the entire system. In this section we'll look at the second-order verification issues resulting from balancing risk vs. time-to-revenue. In other words, how much can we slack off on the verification above without shooting ourselves in the foot, while still getting a product out the door as quickly as possible AND getting it up the revenue curve as quickly as possible?

Figure 2 shows a representation of program risk versus time-to-revenue. Some things to note about this figure are that:

  • The penalty for too much risk can get really out of control fast!
  • Getting to the optimal time-to-revenue should be a goal, not a requirement, as it's usually hard to quantify where the optimal spot is located. Also, there are usually lots of factors influencing both program risk and time-to-revenue, like requirements versus goals, and company politics for example.
  • Removing too much risk may get you over the good/bad line in the figure, but is usually not as bad as too much risk!

If we take on too much risk, then the chances are good that we'll be identifying and fixing lots more bugs during the hardware debug phase, where they cost more in time and money to fix. If we take on too little risk, then the chances are good that we're spending too much time up front and delaying our entry into the market (see footnote 3).


Footnote 3: Note that it's probable that no design is ever truly bug free!

So, what are the risks we can take-on for a heavily FPGA-based project, and how do we keep the impact of those risks to a minimum or at least quantified? Table 3 below lists suggested allowable risks and the countermeasures that will help minimize and/or quantify their effects on the development program.

Allowable Risks and Risk Countermeasures
Verify less & fix bugs during the hardware debug (i.e. lab) phase
  • Have a prioritized simulation plan, which ensures that gross functionality is covered. For instance, verifying that the architecture is sound and that the data-paths are correct is important in general. For FPGAs, it's critical that large logic blocks like data-path logic are clean, due to fitting issues. You don't want to find out during hardware debug that you need two FPGAs instead of one!
  • Have a simulation environment that matches what's in the hardware debug lab. Remember: if it's not verified, it doesn't work. Assuming some things don't work, what's the best plan for closing? A good simulation environment that matches what's in the hardware debug lab can go a long way towards getting to market optimally (see footnote 4). Remember, fixing the bugs is usually easier than finding them!
Don't run the code coverage tool (see footnote 5) until later, or not at all
  • Following the two bulleted items above will mitigate most problems here for FPGAs. Any dead (unused) code will not hurt, and will generally free space later on the device, helping with fitting. Also, any uncovered code will likely fall into boundary cases in the control logic. Having a simulation environment will help to identify the cause of the problem, as it will provide visibility into each FPGA.
Tape-out the boards before simulation is closed
  • A prioritized simulation plan, with focus on closing board issues early, will allow the boards to be taped-out while simulation is still on-going. Focusing on the issues in Table 2 is a good start.
  • Rigorously track the board-level bugs found (see footnote 6), ideally using a good bug tracking tool, and check that the bug rate is dropping or is zero for board-level bugs.
  • Make sure that any problems with fitting your designs into the FPGAs are resolved - for instance logic density, I/O placement, or timing -- as adding or changing logic during hardware debug will become a problem later (i.e. change in pinout).
Start hardware debug before simulation is closed
  • If you've done a good job thus far, following the suggestions above, then only small boundary cases should remain in the logic. While hardware debug is starting, with things like smoke-testing and reset/initialization testing, small fixes can be made in parallel using the simulation environment, which will save time later.

Table 3: Allowable risks and countermeasures for FPGA-based systems

Footnote 4: Note that all changes made during lab debug should be carefully tracked back into the simulation environment.
Footnote 5: Code coverage is discussed in the next section in more detail.
Footnote 6: Bug tracking and bug rates are also discussed in the next section in more detail.

Following the above will help optimize the time-to-market and time-to-revenue for FPGA-based systems. There are also permutations of the above suggestions, for example consciously deciding not to do any lower-priority simulation, and going to market with only high-priority functionality verified - and hopefully only this functionality presented to your customers! There are always follow-on releases of hardware.
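The idea of a prioritized simulation plan that gates board tape-out can be sketched in a few lines. This is a hedged illustration only: the `SimTest` record, the test names, and the gating rule (all board-level tests passing and no open board-level bugs) are assumptions made for the example, loosely following the countermeasures in Table 3.

```python
from dataclasses import dataclass


@dataclass
class SimTest:
    name: str
    area: str        # "board", "inter-device", "intra-device", or "csr"
    priority: int    # 1 = highest
    passed: bool


def run_order(tests):
    """Highest-priority tests first; board-level tests win ties."""
    return sorted(tests, key=lambda t: (t.priority, t.area != "board"))


def ready_for_board_tapeout(tests, open_board_bugs):
    """All board-level tests must pass and no board-level bug may be open."""
    board_tests = [t for t in tests if t.area == "board"]
    return (bool(board_tests)
            and all(t.passed for t in board_tests)
            and open_board_bugs == 0)


# Hypothetical plan, invented for this example.
plan = [
    SimTest("csr_boundary_cases", "csr", 3, False),
    SimTest("clock_reset_connectivity", "board", 1, True),
    SimTest("datapath_gross_function", "intra-device", 1, True),
    SimTest("mode_tieoffs", "board", 2, True),
]

print([t.name for t in run_order(plan)])
print("tape-out OK:", ready_for_board_tapeout(plan, open_board_bugs=0))
```

Note that the low-priority CSR test is still failing in this example, yet the board can go out: that is the kind of quantified risk the table describes.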

3.3 Verification Third-Order
So far, we've looked at what to verify and optimizations in the simulation program to help get to market quicker and up the revenue stream faster. This is a real benefit of heavily FPGA-based systems. With heavily ASIC-based systems, the decisions get a lot harder and usually more costly. How do you know when you're ready to close the simulation or phases of the simulation program?

Some of the "done-ness" techniques:

  • Running a code-coverage tool. A tool of this nature, like Synopsys' CoverMeter, will instrument the RTL code for each FPGA, and report statistics of how the RTL was exercised by the suite of verification tests. For instance, a given test will target specific functionality of each device, and the code coverage tool will report the extent of the coverage. This is useful feedback to the verification engineers for closing individual test coverage. It's also useful to combine the coverage results from each test to understand if there is logic that is not exercised that is also needed for correct operation of the device. This is especially important for high-priority functionality.
  • Rigorously tracking bugs found, for each device, for each board, and possibly per sub-category of component, like FPGA interfaces versus internal logic. Tracking every bug will allow management to review statistics -- like the bug rate curve or having too many bugs per device module -- per device, to decide if most of the bugs have been found and if it's therefore safe to release the device to manufacturing.
  • Simulation test reviews versus the simulation plan. This is a good check-step, to make sure the individual verification engineers interpreted the verification plan and device specifications as intended. It will also help to ensure that the tests are complete.

Adding these techniques to your verification program will provide you with a comprehensive and rigorous approach to the transition from simulation to real hardware.
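Two of the "done-ness" checks can be sketched in code: merging per-test coverage to find logic that no test exercised, and checking whether the weekly bug rate is falling. This is a minimal sketch under stated assumptions. It does not use any real coverage tool's API, and the line numbers, test names, weekly bug counts, and the three-week comparison window are all hypothetical.

```python
def uncovered_after_merge(all_lines, per_test_coverage):
    """Union the lines hit by every test; return what no test exercised."""
    covered = set().union(*per_test_coverage.values())
    return set(all_lines) - covered


def bug_rate_is_falling(bugs_per_week, window=3):
    """True when the latest `window`-week average is below the previous one."""
    if len(bugs_per_week) < 2 * window:
        return False  # not enough history to judge a trend
    recent = sum(bugs_per_week[-window:]) / window
    previous = sum(bugs_per_week[-2 * window:-window]) / window
    return recent < previous


# Hypothetical data, invented for this example.
rtl_lines = range(1, 11)
coverage = {
    "reset_test": {1, 2, 3},
    "datapath_test": {3, 4, 5, 6, 7},
    "csr_test": {8, 9},
}
weekly_bugs = [4, 9, 12, 10, 6, 3]

print("never exercised:", sorted(uncovered_after_merge(rtl_lines, coverage)))
print("bug rate falling:", bug_rate_is_falling(weekly_bugs))
```

Neither check replaces a test review against the simulation plan; they only give management numbers to look at when deciding whether a phase can be closed.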

4.0 So What Do I Do Now?
In this section, we'll put the concepts described above together with a real system example, composed of 14 unique FPGAs and other devices that are not re-programmable. Figure 3 below gives a rough idea of the nature of the example system. The relevant characteristics of the design:

  • approximately 800K equivalent ASIC gates and 400Kbits of on-FPGA SRAM, distributed amongst 14 unique FPGAs -- all custom logic -- implementing a DSP-based architecture;
  • the design includes standard Synchronous SRAM components;
  • a standard backplane for interconnection to the rest of the system;
  • interface logic to the outside world, with analog, D-to-A and A-to-D converters;
  • most FPGAs exhibiting a tight fit for the intended logic;
  • weak specifications for not only the architecture but each component, and the architecture is new (little in-house experience).

In addition, the program is late to market and the verification plan is to simulate only at the FPGA-level! Your first reaction to this situation? Run away! It's going to be a disaster. But there's a challenge here. How can this be turned around?

Table 4 below summarizes the problems and proposed solutions for turning what appears to be a potential disaster into a success.

5.0 How Did I Do?
After putting a simulation plan together, based on the data in Table 4 below, the plan was executed over a 17 week period with the following results (15 weeks of actual simulation and 2 weeks of putting the environment together).

Problems and Solutions
Incomplete system and architecture specifications, resulting in FPGA and board/system designs that don't meet the intent.
  • There must be a plan to verify that the architecture meets the intent! If there are incomplete specifications, they must be completed during simulation. Test cases that verify gross architectural intent should be first on the list. In the end, the architectural requirements should have check marks against the verification/simulation plan and the individual tests.
Only FPGA-level simulation planned, resulting in risk that board-level, inter-device, and configuration/status components of the system will not work as intended.
  • All the logical verification tasks detailed above in Table 2 (Implementation vs. verification matrix) for boards, inter-device logic, and configuration/status registers should be rolled into a simulation plan.
  • Also, the "done-ness" techniques should be rolled in as well, especially test reviews and bug tracking.
System software is an integral part of the overall device as presented to the user, but there is no co-verification planned, resulting in risk that there will be architectural hardware/software integration issues.
  • Hardware/software co-verification should be included in the simulation plan.
Inexperienced design team, plus time-to-market pressure, resulting in the team resisting any additional work. The view of the team is "we'll fix it during hardware debug!"
  • Lots of talking.
  • Lots of persuasion, including providing data pointing to expected problems.
  • Verification optimizations, as discussed in section 3.2 above, should be rolled into the simulation plan.
  • A detailed verification schedule, including simulation and hardware debug phases, should be created and consensus gained by verification and design team members.

Table 4: Example design verification problems and proposed solutions

Figure 4 shows the number of FPGA bugs found per week. These bugs fell into the following categories:

  • Architectural bugs,
  • Intra-device (interface) bugs,
  • Inter-device bugs, and
  • Configuration/Status register bugs.

It would have been nice and useful to see the bugs found per category, but this was sacrificed early in the program. Note that the FPGAs were simulated at the module and device levels prior to the system-level simulation, so all these bugs were "extras" that would have been dealt with during hardware debug.

Figure 5 above shows the board-level bugs found per week. Note that simulation of the board was not planned initially. Imagine getting stuck with over 70 board-level bugs during hardware debug! Considering that today's printed-circuit boards contain BGAs, blind vias, and components on both sides of the board, there is much less visibility than in the past for debugging hardware in the lab.

The total number of bugs found is shown in Table 5 below. Do you think that, with over 200 bugs found in additional simulation, the design would have made it into production in less than a year if the original plan had been executed? The question is especially pointed given the architectural bugs found, which affected the critical product requirements!

Component Category: Total Bugs Found
  • FPGAs: 150
  • Boards: 76
  • Total bugs found: 226
Table 5: Total bugs found in 15 weeks

The additional simulation took 17 weeks prior to initial board tape-out. Then, more simulation was done as part of a second phase, to wring out more boundary cases in the logic. Do you think the additional 17-week hit was worth it, considering the impact on time-to-revenue? Not putting the 17-week effort into the schedule could easily have added 52 weeks of unplanned work!
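The figures in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the derived average and the net schedule difference are rough illustrations that rest on the 52-week estimate above, not measured results.

```python
# Figures stated in the text.
fpga_bugs = 150
board_bugs = 76
simulation_weeks = 15
environment_setup_weeks = 2
unplanned_weeks_estimate = 52   # estimated cost of skipping the simulation

total_bugs = fpga_bugs + board_bugs
planned_weeks = simulation_weeks + environment_setup_weeks
avg_bugs_per_simulation_week = total_bugs / simulation_weeks
net_weeks_avoided = unplanned_weeks_estimate - planned_weeks

print(f"total bugs: {total_bugs}")
print(f"planned effort: {planned_weeks} weeks")
print(f"average bugs per simulation week: {avg_bugs_per_simulation_week:.1f}")
print(f"net weeks avoided (if the estimate holds): {net_weeks_avoided}")
```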

Some additional risks taken:

  • A code coverage tool was not used. The verification plan was reviewed along with the individual tests, so there was good confidence that the only bugs remaining were related to boundary cases in the control logic. These could be identified and fixed during hardware debug, with the aid of the simulation environment.
  • A board was sent for layout and fabrication prior to complete simulation closure. This was only after architectural and board-level verification were deemed closed. Any remaining board-level bugs were assumed to be few and non-critical, and would be caught during simulation while the board was being manufactured.
  • A second board was sent to layout and fabrication after all high priority verification was completed. The assumption was that low priority simulations could be completed while the higher-quality board was being manufactured.

6.0 Conclusion
Re-programmability is not a license to ignore verification -- don't fall into the trap. There is little distinction between FPGAs and ASICs when it comes to verification requirements. However, with FPGAs you can optimize your verification requirements to minimize time-to-revenue.

It's important to view product verification methodically, with the steps outlined above. Take a rigorous approach, and work your way down to a plan that meets schedule requirements without taking on major risks. In other words, tailor the verification to the system and to the schedule and market constraints.

7.0 References
Verification Links ASIC, FPGA Design, Ian Kersley, ASIC Alliance Corporation, EE Times, April 19, 1999.
