Thursday, 18 February 2010

Vulnerability Modelling

Vulnerability rates can be modelled accurately for major products. Those with a very small user base can also be modelled, but the results will fluctuate more.

What most people miss is that the number of vulnerabilities or bugs in software is fixed at release. Once the software has been created, the number of bugs is a set value. What varies stochastically is the number of bugs discovered at any time.

This is also simple to model, with the variance depending on the number of users (both benign and malicious) of the software. As this value tends to infinity (a large user-base), the addition of any further users makes only a marginal variation in the function. Small user-bases of course show large variations as more people pay attention (such as after the release of a vulnerability).

As I have noted in prior posts, this is a Cobb-Douglas function with the number of users and the rate of decay as variables. For widely deployed software (such as Microsoft’s Office suite or the Mozilla browser), the discovery rate can be approximated as an exponential decay function.

That is, for a static software system under uniform usage, the rate of change in N, the number of defects, is directly proportional to the number of defects in the system:

dN/dt = -λN,  λ > 0
Here, a static system is defined as one that experiences no new development, only defect repair. Likewise, uniform usage means the same number of runs per unit time. As the user-base of the product tends to infinity, this becomes a better assumption.

If we set time T to be any reference epoch, then N satisfies

N(t) = N(T) e^(-λ(t - T))
This means we can observe A(t), the accumulated number of defects at time t:

A(t) = N(0) - N(t) = N(0)(1 - e^(-λt))
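The pair of functions above can be sketched in a few lines of Python. The initial defect count and decay rate below are illustrative values, not estimates from real data:

```python
import math

def remaining_defects(n0, lam, t):
    """Undiscovered defects at time t: N(t) = N0 * exp(-lam * t)."""
    return n0 * math.exp(-lam * t)

def accumulated_defects(n0, lam, t):
    """Discovered defects at time t: A(t) = N0 * (1 - exp(-lam * t))."""
    return n0 * (1.0 - math.exp(-lam * t))

# Illustrative values: 100 latent defects, decay rate 0.5 per period
n0, lam = 100.0, 0.5
for t in range(5):
    print(t, round(remaining_defects(n0, lam, t), 1),
          round(accumulated_defects(n0, lam, t), 1))
```

Note that N(t) + A(t) = N(0) at every t, which reflects the claim above that the total number of bugs is fixed at release; only the split between discovered and undiscovered changes.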
With continuous development, an additional function is required to model the ongoing addition of code. Each instantaneous code addition (patch fix or feature) can be modelled in a similar manner.

What we do not have is the decay rate and we need to be able to calculate this.

For software with a large user-base that has been running for a sufficient epoch of time, this is simple.

This problem is the same as having a jar with an unknown but set number of red and white balls. If we have a selection of balls that have been drawn, we can estimate the ratio of red and white balls in the jar.
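The draw-and-estimate step is simple to sketch. The jar composition below is made up for illustration; the observer only ever sees the sample:

```python
import random

def estimate_red_fraction(draws):
    """Point estimate of the proportion of red balls from a sample."""
    red = sum(1 for ball in draws if ball == "red")
    return red / len(draws)

random.seed(1)
jar = ["red"] * 30 + ["white"] * 70   # true ratio, unknown to the observer
sample = random.sample(jar, 40)       # draw without replacement
print(estimate_red_fraction(sample))  # estimate of the true fraction 0.3
```

The larger the sample relative to the jar, the tighter the estimate, which mirrors the large-user-base argument above.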

Likewise, if we have two jars with approximately the same number of balls in approximately the same ratio, and we add balls from the second jar to the first periodically, we have a mathematically complex and difficult problem, but one that has a solution.

This reflects the updating of existing software.

In addition, where we have a new software product, we have prior information. We can calculate the defect rate per SLOC, the rate for other products from the team, the size of the software (in SLOC), etc.

This information forms the prior distribution. This is where Bayesian calculations are used.
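One way to encode such prior information is a conjugate Gamma prior on the defect rate per KLOC, updated by Poisson-distributed defect counts. This is a minimal sketch; the prior parameters and project figures below are made up for illustration:

```python
def gamma_poisson_update(alpha, beta, defects_found, kloc_reviewed):
    """Conjugate update of a Gamma(alpha, beta) prior on defects per KLOC
    given a Poisson-distributed count from a newly reviewed code base."""
    return alpha + defects_found, beta + kloc_reviewed

# Hypothetical prior from past projects: mean 5 defects/KLOC (alpha=10, beta=2)
alpha, beta = 10.0, 2.0

# Hypothetical new project: 12 defects found in 4 KLOC of review
alpha, beta = gamma_poisson_update(alpha, beta, 12, 4)
print(alpha / beta)  # posterior mean defects per KLOC
```

The posterior mean sits between the prior mean and the new project's raw rate, weighted by how much code each is based on.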

Wednesday, 17 February 2010

Bayesian Statistics

I am going to post a few details concerning a number of distributions for those wanting to learn more on this topic. I will begin with a little on the Beta-Binomial probability density function and, as a Bayesian, a small discourse on predictive distributions. These are used to account for parameter uncertainty with respect to:
· Estimation
· Inference
· Comparing models
In general, the prior predictive distribution can be stated as:

p(y) = ∫ p(y|θ) p(θ) dθ

Starting with the Y ~ Binomial(n, θ) distribution, the distribution of the data as it would be modelled before we have made any observations can be expressed as:

p(y) = ∫₀¹ (n choose y) θ^y (1 - θ)^(n-y) p(θ) dθ

Starting with the selection of a conjugate Beta(a, b) prior, then:

p(θ) = θ^(a-1) (1 - θ)^(b-1) / B(a, b)

To prove the prior result, we need to use the following. As (by definition):

B(a, b) = ∫₀¹ θ^(a-1) (1 - θ)^(b-1) dθ = Γ(a)Γ(b) / Γ(a + b)

We can calculate the prior predictive density. This comes out as:

p(y) = (n choose y) B(y + a, n - y + b) / B(a, b),  y = 0, 1, …, n
The above distribution is called the Beta-Binomial. I will detail some aspects of this distribution tomorrow.
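The density can be computed directly from the Gamma-function form of the Beta function; a minimal sketch:

```python
from math import comb, lgamma, exp

def log_beta(a, b):
    """log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(y, n, a, b):
    """P(Y = y) = C(n, y) * B(y + a, n - y + b) / B(a, b)."""
    return comb(n, y) * exp(log_beta(y + a, n - y + b) - log_beta(a, b))

# With a uniform Beta(1, 1) prior, every count 0..n is equally likely:
print([round(beta_binomial_pmf(y, 4, 1, 1), 3) for y in range(5)])
```

The uniform-prior case is a handy sanity check: each of the n + 1 outcomes gets probability 1/(n + 1).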

A response to modeling risk

This post follows from a prior post.

“P(compromise) already has the possibility of a vulnerability and the possibility of it being attacked by someone with sufficient motivation and resources built in, doesn't it?”

A normal or Gaussian curve is a good fit for white noise and random error. The number of software bugs is fixed on each iteration, and hence the use of a normal distribution is an error. The number may be unknown, but it exists. An unknown set number is not random. The rate at which vulnerabilities are discovered is random, but it is not normally distributed.

“There are two problems with your approach that I think I would encounter while trying to use it. First, my company does a lot of in-house development. When a new web application is deployed, assuming we've tested thoroughly and remediated the vulnerabilities we've found, it would appear that the number of vulnerabilities is zero. This is false, of course.”

This is the wrong approach. In developing software, you already have a priori information. You have statistics on the native coding error rate from prior exercises; this will be greater than zero. The SLOC (source lines of code) data is also available, and you should have some idea of the number of users.

This means that you should be using a Poisson decay model for bugs and vulnerabilities. This can be made more accurate with a Weibull function that incorporates users, SLOC, and data from past coding assignments (based on lines, and correlated errors by programmer if available).

The assumption that all bugs are remediated is flawed. This would require that no software bug had ever been discovered post-remediation. Unlikely, at best.

So, the very beginning of the modeling exercise must start from an unknown number of possibly theoretical vulnerabilities.

Unless you are starting with formally verified software, the start is an unknown but estimable number of flaws. For remote compromise, you need to include all paths. This is a network analysis (not as in hardware network, but mathematical)[1]. For a web application assuming a non-local attacker, this needs to incorporate the OS, your app, services used and any applications that an attacker can access remotely.

“What I don't know is the density function to apply when estimating (since I have no historical data at time zero).”
Actually, you do. There is data from other products; and for your own, unless this is the first exercise the company has ever done, data will exist.

The simplest method is to create a Poisson decay model, based on data from prior projects.
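As a sketch, the decay rate can be backed out of historical per-period discovery counts with a log-linear least-squares fit. The monthly counts below are invented for illustration:

```python
import math

def fit_decay_rate(counts):
    """Fit log(count) = log(c) - lam * t by least squares to per-period
    defect discovery counts and return the decay rate lam."""
    pts = [(t, math.log(c)) for t, c in enumerate(counts) if c > 0]
    n = len(pts)
    sx = sum(t for t, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(t * t for t, _ in pts)
    sxy = sum(t * y for t, y in pts)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return -slope

# Hypothetical monthly bug counts from a past project, halving each month
print(round(fit_decay_rate([64, 32, 16, 8, 4]), 3))  # ln(2) ≈ 0.693
```

With real data the counts will be noisy, so the fitted line will not pass through every point; that residual scatter is exactly the fluctuation described for small user-bases above.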

“In addition to this, I may not have a detailed understanding of my user base to factor into the chance of vulnerability.”

What matters is the number of users. As an Internet application is open, this is difficult to model, but it will be estimable from traffic. You can also model the risk based on how well known the site is, estimated from how many people visit it.

“I can choose a standard density function, like the Gaussian for example, but I believe there will be cases where it's difficult to predict with certainty what the risk will be, due to not knowing what key factors will push a particular population of users to produce even one attacker, especially when the population is smaller and more restricted than the Internet at large.”

You have a set but unknown distribution of bugs. This is not a Gaussian (normal) distribution.

"I believe that better models might be produced with more data, but I also believe those models will be influenced by observation. Risk modeling in the financial sector has to be a sure sign of this. Taleb predicted in 1997 that the model being used wouldn't be accurate and he had analysis to back that position up."

There are a few issues here. Models need to be tested against real-world conditions. At the least, hold back some of the data used to create the model and use it to test the completed model.

Black Swans are not the issue in financial models. Freddie and Fannie have been basket cases for years, and models have demonstrated the problems for years. But bailouts and subsidies have covered these failings for many people.

Like the financial crisis, data exists in the case you have noted. The issue is whether people are honest enough to use it.

[1] See also Graph Theory.

Sunday, 14 February 2010


Over the last few days I have noticed a severe lack of understanding of reliability and survival modelling among people in the IT security community. As such, I am going to try to explain a few terms here so that I can minimise (a little) some of this confusion.

Time Degrading Functions
A time degrading reliability function is one that approaches a minimal limit as time increases. That means that the longer the system runs, the more unreliable it becomes. This in information security parlance means that as the number of users and the length of time each increase, the security of the system decreases (and in some cases to zero).

  • R(t) → 0 as t → ∞
Limiting Functions
Software vulnerabilities are a limited function. In any piece of software code, there is a limited number of bugs. These are not known, but they can be estimated. Unlike a time-degrading function, the discovery of each bug lowers the number of remaining bugs and increases the expected time to discovery of the next.

In this case, the longer the software is used and the more users it has, the fewer vulnerabilities will remain.

The issue of course is the rate at which new vulnerabilities are added through patching. If patching introduces, at the mean, less than one additional expected bug for each bug fixed, then the software will tend over time to become more secure.
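This fix-and-reintroduce process is a geometric series. A minimal sketch, assuming each fix independently introduces r expected new bugs (r is an illustrative parameter, not a measured rate):

```python
def expected_total_bugs(n0, r):
    """Expected total bugs ever present if each fix introduces r new bugs
    on average: n0 * (1 + r + r^2 + ...) = n0 / (1 - r), valid for r < 1."""
    assert 0 <= r < 1, "series diverges for r >= 1"
    return n0 / (1 - r)

# 100 initial bugs, each fix introducing 0.2 expected new bugs
print(expected_total_bugs(100, 0.2))  # 125.0
```

If r reaches or exceeds 1, the series diverges and the software never converges towards being secure, which is the boundary condition implied above.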

The issue is that old software becomes obsolete over time and is replaced with new, and hence buggier, software.

Finding software bugs can hence be modelled as a Cobb-Douglas function with decay over time (x-axis) and a rate input based on the number of users (y-axis). This means the rate at which bugs are discovered increases as the number of users of the software increases. It also means that over time, fewer vulnerabilities will remain in the software and the chance of a new vulnerability being discovered decreases.
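One plausible reading of such a function is a power of the user base multiplied by an exponential time decay. This is a sketch under that assumption; A, alpha, and lam are made-up parameters, not fitted values:

```python
import math

def discovery_rate(users, t, A=1.0, alpha=0.5, lam=0.1):
    """Cobb-Douglas-style discovery rate: rises with the user base
    (diminishing returns for alpha < 1) and decays over time."""
    return A * users ** alpha * math.exp(-lam * t)

# More users raise the rate; elapsed time lowers it
print(round(discovery_rate(10_000, 0), 1))   # 100.0
print(round(discovery_rate(10_000, 12), 1))  # lower, after a year of decay
```

With alpha < 1 the marginal contribution of each extra user shrinks, matching the earlier observation that adding users to an already large user-base makes only a marginal difference.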