Beyond Testing: Verification and Constraints

Slides

Some of the material is on the slides, although I recommend the notes.

Computers are powerful assistive devices. Earlier in the semester, we talked about how computers can help us write tests---ideas like fuzz testing, model-based testing, and property-based testing. Today we'll go even further in that direction. But first, a story.

Intel and The Bug

It's 1994. Intel is releasing their best processor yet---the Pentium, which sports clock speeds of 66, 75, or even 100 MHz. It's a great time to be Intel.

Then an academic researching prime numbers found a bug: sometimes floating-point division was inaccurate. For instance, 4195835 / 3145727 was calculated in error. The result should have been 1.333820449136241002 to within 19 significant digits. The Pentium returned 1.333739068902037589 instead.

This may seem minor. Intel certainly thought so. It did result in some unfortunate humor at their expense. My favorite joke was:

Q: How many Intel executives does it take to screw in a light bulb? A: 1, but 0.999999996274709702 is close enough unless you can convince us it matters to you.

You see, Intel initially said that you needed to prove that you were impacted to get a replacement. They even released a paper analyzing the bug (link here, although it's not the original; I need to do some searching to find a more reliable source). The paper's abstract said that its analyses:

lead us to the overall conclusion that the flaw is of no concern to the vast majority of users of Pentium processor based systems

and that

A few users of applications in the scientific/engineering and financial engineering fields who require unusual precision and invoke millions of divides per day may need to employ either an updated Pentium processor without the flaw or a software workaround.

This didn't go well. Intel lost the PR battle and ended up recalling a significant number of processors. Still, the recall was crafted to minimize costs, e.g., by requiring that users themselves decide whether to replace their processor:

"It is the individual decision of the end user to determine if the flaw is affecting their application accuracy."

and exhausting the original lifetime warranty:

The lifetime replacement is 'exhausted' once the replacement is made. Your replacement processor is then covered under a one-year warranty.

Even with these obstacles in place, according to Intel's 1994 annual report, the bug cost around $475 million in 1995 dollars (roughly $750 million today).

Public-relations issues aside, why do you think this specific bug was so damaging to Intel?

Think, then click! Hardware isn't easy to patch! In fact, even some software bugs can be tough to patch. (Consider all those security vulnerabilities in early versions of Internet Explorer.)

So the impact of a bug isn't just about how much harm it does, or how many people are affected, or even how tough the bug itself is to fix, but also how difficult it is to deploy a fix.


Things Work (Somehow)

It's sort of amazing that our computers even work today. Here's some perspective:

  • The Pentium (1993): 3.1 million transistors.
  • A quad-core Core i7 (2008): 731 million transistors.
  • An iPhone 6 (2014, A8 SoC): 2 billion transistors.
  • An iPhone 13 (2021, A15 SoC): 15 billion transistors.

It's not perfect to use the number of transistors as a proxy for complexity (there's a lot of duplication in there), but still: our hardware and software have become enormously more complex since 1994. And given how much more heavily cars, planes, and other safety-critical systems now rely on software and commodity hardware, it's amazing that our systems haven't killed far more people than they have.

(I'm not joking.)

Basic Example: Puzzles

Two keys to what we just did:

  • we're working declaratively: we describe what a solution looks like, not how to compute one; and
  • we're working symbolically: the solver reasons about constraints over unknown values, rather than trying concrete inputs one at a time (see the sketch below).
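The notes don't reproduce the puzzle from class here, but as a small stand-in, here's the classic SEND + MORE = MONEY cryptarithm solved in the same spirit. This is a hedged sketch using the Z3 solver's Python bindings (pip install z3-solver) rather than Forge; the point is the shape of the approach, not the specific tool. We state what a solution must satisfy, and the solver finds one.

# Each letter stands for a distinct digit; SEND + MORE must equal MONEY.
from z3 import Ints, Solver, Distinct, sat

S, E, N, D, M, O, R, Y = Ints('S E N D M O R Y')

solver = Solver()
solver.add([d >= 0 for d in (S, E, N, D, M, O, R, Y)])  # digits are...
solver.add([d <= 9 for d in (S, E, N, D, M, O, R, Y)])  # ...0 through 9
solver.add(Distinct(S, E, N, D, M, O, R, Y))            # all letters differ
solver.add(S != 0, M != 0)                              # no leading zeros

send  = 1000*S + 100*E + 10*N + D
more  = 1000*M + 100*O + 10*R + E
money = 10000*M + 1000*O + 100*N + 10*E + Y
solver.add(send + more == money)  # declarative: what holds, not how to search

if solver.check() == sat:
    print(solver.model())  # one satisfying assignment, e.g., M = 1

Notice that nowhere did we write a search procedure: the declarative constraints plus the solver's symbolic reasoning do all the work.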

Concolic Execution

Consider this function:

int do_something(int x) {
  if (x == 0)
    return 0;
  if (x == 17)
    return 17;
  if (x % 17 == 3)
    return -4;
  if (x < 0)
    return -1;
  else
    return 1;
}

How would you go about producing a high-coverage test suite for it?

There exist tools that, for small but possibly intricate programs, can produce very high-coverage test suites. I really like one called KLEE. These tools work by combining two ideas:

  • Symbolic execution: Imagine evaluating the code above, not on a concrete value like 5 or -123, but on a set of constraints that defines the allowable range of inputs, like x: Int | x != 0. In fact, that constraint (x is an int value, but not 0) is exactly what you'd have left after passing through the first if statement without returning.
  • Solving for concrete values: Given such a constraint, we can use a solver like Forge to produce concrete values that serve as test cases. So we might take that constraint and produce x = 5. In practice, we'd get a constraint for every branch and solve for each separately, getting (perhaps) the inputs 0, 17, 20, -1, and 1 (see the sketch below).
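Here's a hedged sketch of that second idea applied to do_something, using the Z3 solver's Python bindings (pip install z3-solver) rather than Forge or KLEE itself. This is an illustration of the approach, not how KLEE works internally: KLEE operates on LLVM bitcode and models C semantics precisely, while this sketch uses mathematical integers for simplicity.

# Build one path condition per branch: that branch's guard holds and
# every earlier guard failed. Then ask the solver for a concrete
# input satisfying each path condition.
from z3 import Int, Solver, And, Not, sat

x = Int('x')
guards = [x == 0, x == 17, x % 17 == 3, x < 0]

paths = [And(*([Not(g) for g in guards[:i]] + [guards[i]]))
         for i in range(len(guards))]
paths.append(And(*[Not(g) for g in guards]))  # the final else branch

for pc in paths:
    solver = Solver()
    solver.add(pc)
    if solver.check() == sat:
        print(solver.model()[x])  # one concrete test input for this path

Running this yields one input per branch (for example 0, 17, 3, some negative number, and some positive number): exactly the kind of high-coverage test suite we wanted.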

The key is that constraint solvers can be used to augment other forms of reasoning about programs.

Model Checking Software and Hardware

When we wrote the puzzle in Forge and asked for a solution, we were using a solver to find a path. We asked: does there exist a path from a starting state to a goal state?
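To make that concrete, here's a hedged toy version of the same query, again in Z3's Python bindings (my substitution; in class we used Forge). We bound the path length, introduce one variable per state, constrain consecutive states by a transition relation, and ask whether any state along the path hits the goal.

# Toy transition system: states are integers; from state s you may
# step to s + 3 or s - 2. Start at 0; goal: reach 7 within K steps.
from z3 import Ints, Solver, Or, sat

K = 6  # bound on path length
trace = Ints(' '.join('s%d' % i for i in range(K + 1)))

solver = Solver()
solver.add(trace[0] == 0)                   # initial state
for a, b in zip(trace, trace[1:]):
    solver.add(Or(b == a + 3, b == a - 2))  # transition relation
solver.add(Or(*[st == 7 for st in trace]))  # goal reached somewhere

if solver.check() == sat:
    m = solver.model()
    print([m[st] for st in trace])          # a witness path

This style of query (unroll the transition relation to a finite bound, then solve) is the essence of bounded model checking, and it's close in spirit to what Forge was doing for our puzzle.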

Takeaways

There are many, many techniques to use if you want to go beyond testing. We can't do justice to the idea in half a class meeting, but if you're interested in tools and algorithms for modeling systems and reasoning about them, you might be interested in taking CSCI 1710---which I'm offering in the Spring! We also have a course on program verification coming in the Fall.