I have been commenting on the testing of security software, specifically anti-spam and anti-virus products. The main point I made in both of those posts was that testing has to be done on live data feeds, regardless of how difficult that is, because threats evolve at such a high rate that corpus-based testing quickly becomes stale and no longer represents the true state of incoming traffic.
In situations where there are a limited number of security vendors and adversaries, even live testing becomes extremely difficult. Let's consider an extreme case: a single security vendor and multiple adversaries. Every system is identical, running an up-to-date anti-virus package. (Yes, I fully realize this is a completely unrealistic example, but bear with me.) From the standpoint of the testing and user community, the accuracy of the system is perfect: no viruses are seen by the system, because they never get an opportunity to propagate. At the same time, virus writers realize there is a huge, untapped market of machines just waiting to be compromised if they could only gain a foothold. They sit around and hack at the code until they find a vulnerability in the AV system, and upon finding one, release a virus into the wild that exploits it.
Before the virus is released, the accuracy of the system is:
- 100%: it catches all known viruses.
- Anywhere from 0% to 100%: there is no way to test it against threats that do not yet exist.
After the virus is released, havoc breaks out, aircraft fall out of the sky, and dogs and cats start living together. 5% of all computers worldwide are infected before the vendor releases a patch. Had the vendor been able to move faster, the number of compromised systems would have been only 1%; left to its own devices, the virus would have compromised every system connected to the net. In this situation, the accuracy of the system is:
- (1 - 1/(# of known viruses))*100%: only one virus couldn't be stopped.
- 0%: the only virus in circulation at the time was the one that caused mass havoc.
- (1 - (# of compromised systems)/(# of total systems))*100%: one minus the expected fraction of systems compromised at the end of the virus' run.
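The three candidates can be made concrete with the numbers from the scenario above. This is a minimal sketch; the count of previously known viruses is a hypothetical placeholder, while the 5% infection rate comes from the example:

```python
# Three candidate accuracy measures for the outbreak scenario.
known_viruses = 1000          # hypothetical count of previously known viruses
compromised_fraction = 0.05   # 5% of systems infected before the patch shipped

# Measure 1: fraction of known viruses stopped (the new virus is the lone miss).
acc_known = (1 - 1 / known_viruses) * 100

# Measure 2: fraction of circulating viruses stopped (only the new one was
# circulating, and it got through).
acc_circulating = 0.0

# Measure 3: one minus the expected fraction of systems compromised.
acc_expected = (1 - compromised_fraction) * 100

print(acc_known, acc_circulating, acc_expected)
```

The first measure rewards a large historical signature database almost regardless of outcome, while the third directly reflects the damage actually suffered.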
The third of these accuracy measures seems the most appropriate, and the most flexible across a variety of network conditions, economic conditions, and adversary styles. The measure, which is effectively the expected probability of exploitation for a given host, is what anti-spam system evaluators use today. It is a slightly more sophisticated way of asking, "what is the probability that a piece of spam will get through?"
From a general security standpoint, however, it captures a difficult and often ignored parameter critical to the accuracy of a security product: response time. If the window of vulnerability (the time between when the virus first appears and when signatures are issued) shrinks, the accuracy expressed by this metric improves. In fact, zero-hour anti-virus has become an emergent cottage industry in the security space. Ferris covered it back in 2004, and I talked about it at Virus Bulletin 2006.
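The link between response time and the third accuracy measure can be illustrated with a toy model. The model below is my own assumption, not something from the post: infections grow exponentially until the vendor ships signatures, after which propagation stops, and accuracy is one minus the fraction infected at patch time. The seed fraction and growth rate are hypothetical parameters:

```python
import math

def accuracy(patch_delay_hours, growth_rate=0.5, seed_fraction=1e-6):
    """Expected-compromise accuracy (%) for a given response time.

    seed_fraction: fraction of hosts infected at hour zero (hypothetical)
    growth_rate:   exponential growth constant per hour (hypothetical)
    """
    # Fraction infected when signatures arrive, capped at every host.
    infected = min(1.0, seed_fraction * math.exp(growth_rate * patch_delay_hours))
    return (1.0 - infected) * 100.0

# Shrinking the window of vulnerability directly improves the metric.
for delay in (24, 12, 6, 0):
    print(f"patch after {delay:2d}h -> accuracy {accuracy(delay):6.2f}%")
```

Under any monotone growth model the conclusion is the same: every hour shaved off the signature turnaround translates directly into a higher expected-compromise accuracy.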
Many of these zero-hour technologies are being deployed primarily in the message stream, but this probably won't last for long. I suspect the technology popped up there first because of the sheer volume of e-mail-borne viruses, as well as the ease with which service providers, who ultimately end up paying for these technologies, can quantify the cost. Unlike web-based trojans, which just fly through on port 80, mail is stored and then forwarded along, giving providers an opportunity to actually count the viruses they see. As the industry gains experience with automated means of identifying and distributing fingerprints or signatures for newly identified malware, we will see the technology spring up in other places as well.