Tuesday, 9 February 2010

Modelling Risk

Many people feel that it is not feasible to model risk quantitatively. This, of course, is blatantly false. In the past, many of the calculations were economically costly at best and computationally infeasible at worst. This has changed. The large volume of computational power now available, coupled with novel stochastic methods, has made it viable to calculate risk quantitatively with a high degree of accuracy. Risk can be measured as a function of time (as survival time), of monetary value, or of any number of other processes.

As an example, a recent question about the ability to secure SMS-based banking applications was posed on the Security Focus mailing list.

The reality is that any SMS banking application should be a composite of multiple applications. For a system that uses an SMS response together with a separate channel (such as a web page over SSL), the probability that the banking user is compromised and a fraud is committed, P(Compromise), can be calculated as:

P(Compromise) = P(C.SMS) x P(C.PIN)

Where: P(C.SMS) is the probability of compromising the SMS function and P(C.PIN) is the probability of compromising the user authentication method.

P(C.PIN) is related to the security of the GSM system itself when no additional input is used. Where a separate channel is used, P(C.SMS) and P(C.PIN) are statistically independent and hence we can simply multiply these two probability functions to obtain P(Compromise).

The reason for this is that (at present) the SMS and web functions are not the same process, and compromising one does not aid in compromising the other. With the uptake of 4G networks this may change, and the function will not remain as simple.

The probability that an SMS-only system can be cracked is simply the P(C.SMS) function and this is far lower than a system that deploys multiple methods.
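As a rough numerical sketch of this comparison (the probability values below are purely illustrative placeholders, not measured figures), the composite calculation under the independence assumption can be run directly:

```python
# A minimal sketch of the composite compromise calculation under the
# independence assumption above. The probability values are purely
# illustrative placeholders, not measured figures.

def p_compromise(p_sms: float, p_pin: float) -> float:
    """P(Compromise) = P(C.SMS) * P(C.PIN) for independent channels."""
    return p_sms * p_pin

p_c_sms = 0.05  # hypothetical probability of compromising the SMS function
p_c_pin = 0.01  # hypothetical probability of compromising the separate authentication channel

composite = p_compromise(p_c_sms, p_c_pin)
print(f"SMS only:  P(Compromise) = {p_c_sms:.4f}")   # 0.0500
print(f"Composite: P(Compromise) = {composite:.4f}") # 0.0005
# Compromising the composite system requires compromising both independent
# channels, so its probability of compromise is lower than for SMS alone.
```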

For each application, we can use Bayes' theorem to model the number of vulnerabilities and the associated risk. For open ports, we can use the expected reliability of the software together with the expected risk of each individual vulnerability to model the expected risk of the application. For instance, we could model the probability of compromising the SMS function, P(C.SMS), using this method.

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

alternatively:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \bar{A})\,P(\bar{A})}$$

Over time, as vulnerabilities are uncovered, the system has a growing number of known issues. Hence, confidence in the product decreases with time as a function of the SMS utility alone. This also means that mathematical observations can be used to produce better estimates of the number of vulnerabilities and attacks as more are uncovered.

It is thus possible to observe the time that elapses since the last discovery of a vulnerability. This value depends upon the number of vulnerabilities in the system and the number of users of the software. The more vulnerabilities, the faster the discovery rate of flaws. Likewise, the more users of the software, the faster the existing vulnerabilities are found (through both formal and adverse discovery).

If we let E stand for the event that a vulnerability is discovered between times T and T+h, given n vulnerabilities in the software, then:

$$P(E \mid n) = \int_{T}^{T+h} n\lambda\, e^{-n\lambda t}\,dt = e^{-n\lambda T}\left(1 - e^{-n\lambda h}\right)$$

where λ is the decay constant for defect discovery.

Given that a vulnerability is discovered between times T and T+h, we can use Bayes' theorem to compute the probability that there are n bugs:

$$P(n \mid E) = \frac{P(E \mid n)\,P(n)}{\sum_{m=1}^{\infty} P(E \mid m)\,P(m)}$$

Assuming that the number of defects at release follows a Poisson distribution with mean μ, so that $P(n) = \mu^{n} e^{-\mu}/n!$, and taking h to be small, from this we see that:

$$P(n \mid E) \propto P(E \mid n)\,P(n) \propto n\, e^{-n\lambda T}\, \frac{\mu^{n}}{n!}$$

By summing the denominator, we can see that if we observe a vulnerability at time T after release and the decay constant for defect discovery is λ, then the conditional distribution of the number of defects is a Poisson distribution with expected number of defects $\mu e^{-\lambda T}$.

Hence:

$$P(n \mid E) = \frac{\left(\mu e^{-\lambda T}\right)^{n}}{n!}\, e^{-\mu e^{-\lambda T}}$$
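A short numerical sketch of this posterior (the prior mean μ, discovery decay constant λ and observation time T below are hypothetical values chosen purely for illustration):

```python
# A minimal sketch of the conditional (posterior) distribution of the number
# of defects stated above: a Poisson prior with mean mu, an exponential
# per-defect discovery rate lam, and a vulnerability observed at time T.
# All parameter values are hypothetical illustration figures.
import math

mu = 20.0   # hypothetical expected number of defects at release
lam = 0.1   # hypothetical decay constant for defect discovery (per week)
T = 10.0    # time (weeks) after release at which a vulnerability is observed

# Conditional expected number of defects, mu * exp(-lam * T).
mu_post = mu * math.exp(-lam * T)

def posterior(n: int) -> float:
    """Poisson pmf with mean mu_post, i.e. P(n | E) as given above."""
    return (mu_post ** n) * math.exp(-mu_post) / math.factorial(n)

print(f"conditional expected defects = {mu_post:.2f}")
print(f"P(n = 5 | E) = {posterior(5):.4f}")
```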

The reliability function (also called the survival function) represents the probability that a system will survive a specified time t. Reliability is expressed as either MTBF (mean time between failures) or MTTF (mean time to failure); the choice of term depends on the system being analysed. In the case of system security, it relates to the time that the system can be expected to survive when exposed to attack. This function is hence defined as:

$$R(t) = 1 - F(t)$$

The function F(t) above is the probability that the system will fail within time t. As such, it is the failure distribution function (also called the unreliability function). The randomly distributed life of the system can be represented by a density function, f(t), and thus the reliability function can be expressed as:

$$R(t) = 1 - \int_{0}^{t} f(x)\,dx = \int_{t}^{\infty} f(x)\,dx$$

The time to failure of a system under attack can be expressed as an exponential density function:

$$f(t) = \frac{1}{\theta}\, e^{-t/\theta}, \qquad t \ge 0$$

where θ is the mean survival time of the system in the hostile environment and t is the time of interest (the time over which we wish to evaluate the survival of the system). The reliability function R(t) can then be expressed as:

$$R(t) = \int_{t}^{\infty} \frac{1}{\theta}\, e^{-x/\theta}\,dx = e^{-t/\theta}$$
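A small sketch of this exponential survival model (the mean survival time θ below is a hypothetical figure, not a measured one):

```python
# A minimal sketch of the exponential survival model above. The mean
# survival time theta is a hypothetical value used only for illustration.
import math

theta = 52.0  # hypothetical mean survival time under attack (weeks)

def density(t: float) -> float:
    """f(t) = (1/theta) * exp(-t/theta), the time-to-failure density."""
    return math.exp(-t / theta) / theta

def reliability(t: float) -> float:
    """R(t) = exp(-t/theta), the probability of surviving beyond time t."""
    return math.exp(-t / theta)

for t in (4, 13, 26, 52):
    print(f"R({t:>2}) = {reliability(t):.3f}")
```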

The mean (θ), or expected life, of the system under hostile conditions can hence be expressed as:

$$\theta = \int_{0}^{\infty} R(t)\,dt = \int_{0}^{\infty} e^{-t/\theta}\,dt = M$$

where M is the MTBF of the system or component under test and λ is the instantaneous failure rate. Mean life and failure rate are related by the formula:

$$\lambda = \frac{1}{M}$$

The failure rate for a specific time interval can also be expressed as:

$$\lambda(t_1, t_2) = \frac{R(t_1) - R(t_2)}{(t_2 - t_1)\, R(t_1)}$$
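A brief sketch of this interval calculation, again using the exponential R(t) and hypothetical values for θ, t1 and t2:

```python
# A minimal sketch of the interval failure rate formula above, using the
# exponential reliability function R(t) = exp(-t/theta). The values of
# theta, t1 and t2 are hypothetical and for illustration only.
import math

theta = 52.0         # hypothetical mean survival time (weeks)
t1, t2 = 10.0, 20.0  # hypothetical interval of interest (weeks)

def reliability(t: float) -> float:
    return math.exp(-t / theta)

# lambda(t1, t2) = (R(t1) - R(t2)) / ((t2 - t1) * R(t1))
rate = (reliability(t1) - reliability(t2)) / ((t2 - t1) * reliability(t1))
print(f"failure rate over [{t1}, {t2}] = {rate:.4f} per week")
```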

As $R(t) = 1 - F(t)$ and the number of defects follows the Poisson distribution derived above, we can see that the reliability of the SMS function can be expressed as:

$$R_{SMS}(t) = e^{-t/\theta_{SMS}}$$

where $\theta_{SMS}$ is the mean survival time of the SMS function under attack.

What this means is that the reliability of the SMS-only function has a limit of R(t) = 0 as t → ∞. The longer the application is running, the less secure it is.

Adding an independent second factor goes some way towards mitigating this issue, as long as that factor's R(t) does not also tend to 0 as t → ∞, or at least decays more slowly than the reliability of the SMS function itself.
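As a rough sketch of that comparison (assuming each factor fails according to an exponential survival model with hypothetical mean survival times, and that a fraud requires both independent factors to be compromised):

```python
# A minimal sketch comparing an SMS-only control with SMS plus an
# independent second factor. Each factor's survival is modelled as
# exponential; the mean survival times are hypothetical illustration values.
import math

theta_sms = 26.0      # hypothetical mean survival time of the SMS function (weeks)
theta_second = 104.0  # hypothetical mean survival time of the independent factor (weeks)

def r_sms(t: float) -> float:
    """Reliability of the SMS-only control."""
    return math.exp(-t / theta_sms)

def r_combined(t: float) -> float:
    """Reliability of SMS plus an independent second factor.

    A fraud requires compromising both independent factors, so the system
    fails only once both have failed: R = 1 - F_sms * F_second.
    """
    f_sms = 1.0 - math.exp(-t / theta_sms)
    f_second = 1.0 - math.exp(-t / theta_second)
    return 1.0 - f_sms * f_second

for t in (13, 52, 156):
    print(f"t={t:>3} weeks: SMS only R={r_sms(t):.3f}  with second factor R={r_combined(t):.3f}")
```

Both curves still decay towards zero, but the combined control decays far more slowly than the SMS-only function.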

9 comments:

Scudette said...

Craig,
This post is rather confused. You state that:

The probability that an SMS-only system can be cracked is simply the P(C.SMS) function and this is far lower than a system that deploys multiple methods.

From a simple arithmetic point of view it's clear that multiplying two probabilities results in a smaller probability (as both numbers are less than 1), so P(C.SMS) * P(C.PIN) < P(C.SMS)

unless of course the PIN is automatically compromised.

Neglecting that, it is rather dangerous of you to talk about the statistics and probability of security vulnerabilities when these events are not actually random. While probability theory might be acceptable for dice rolling, where the outcome is random, it is entirely not applicable to a security system where the likelihood of the event occurring does not depend on chance, but rather on the attacker knowing a vulnerability in your system.

Suppose you have a service exposed to the world - the probability of compromise is not constant and has nothing to do with the length of time the service is exposed - either it's vulnerable or not. If it's not vulnerable you can leave it there for an infinite length of time before it's compromised. If it's vulnerable, and you are aware of it, it will get hacked quickly. If it is vulnerable to a zero day and you don't know about it - you are not in a position to estimate the risk.

Security is really simple - don't make it too complex by throwing numbers around - it just confuses people. Your basic assumption in using probability theory is that the probability of failure is constant for all attackers - and that the event happens randomly.

Probability theory is not applicable to security. You are better off designing a system which is as secure as you possibly can make it rather than knowingly leaving a system vulnerable just because your math tells you there is a mean time before failure of 20 years.

Craig S Wright said...

Probability is of course relevant to risk and security.

"Your basic assumption by using probability theory is that the probability of failure is constant for all attackers - and the event happens randomly."

Not at all. You have missed the point here totally.

In fact, I have stated that the SMS-only application is "inherently" insecure. That is, it will start with a given failure rate that will, over time, increase.

So the longer the application runs, the less secure it becomes.

"Security is really simple"
Security is a trade off. It is an economic function. There is no such thing as perfect security.

"Support you have a service exposed to the world - the probability of compromise is not constant and has nothing to do with the length of time the service is exposed - either its vulnerable or not."

No, it is a function of time and of the number of users - both malicious and real users.

These functions can be modelled; this is simple. The fact that some people do not understand this is not the point.

Craig S Wright said...

PS Scudette,

You have confused dependence and independence.

SMS and PIN in the example are dependent. The PIN is sent over SMS, and hence you cannot multiply as you have and assume independence.

If you compromise the SMS function, you can also compromise the PIN function.

Craig S Wright said...

PPS
As the PIN function is related to other applications (and not just the SMS one), the SMS function makes the PIN function LESS secure.

That is, compromising the SMS feature also compromises the PIN with a high probability.

The PIN alone does not necessarily lead to the SMS function being compromised.

Hence, adding SMS and PIN is worse than the PIN alone.

SMS as a separate function to the PIN cannot lead to a PIN compromise, so this is (slightly) better than the combination.

The simple answer: do not deploy the SMS feature. If you have to, use an independent external authentication method that does not use the same compromise vector as SMS.

Craig S Wright said...

PPPS
"not actually random"

So? You do not require randomness for modelling a probabilistic function.

I could point out some good introductory probability and stats texts?

Hands said...

Rereading this post, I realized that I made a mistake in my previous response (although that response is not posted -- maybe because of this error).

P(compromise) already has the possibility of a vulnerability and the possibility of it being attacked by someone with sufficient motivation and resources built in, doesn't it?

I missed this because the work I've done generally requires me to split these things out. And we split them out because they're not well modeled where I work. I've seen some approaches that use a Gaussian curve for parts of this, but I don't generally agree with that approach.

I agree with your statement that the probability of compromise increases generally. There's a great deal of evidence to support the statement and it's not hard to see why.

There are two problems with your approach that I think I would encounter while trying to use it. First, my company does a lot of in-house development. When a new web application is deployed, assuming we've tested thoroughly and remediated the vulnerabilities we've found, it would appear that the number of vulnerabilities is zero. This is false, of course.

So, the very beginning of the modeling exercise must start from an unknown number of possibly theoretical vulnerabilities. It may turn out that the vendor for the application container has a protocol flaw, or the operating system services have some flaw, or the application itself has some non-obvious behaviour defects under unusual circumstances. I expect some of these will be found over time and increase the probability of compromise.

What I don't know is the density function to apply when estimating (since I have no historical data at time zero). In addition to this, I may not have a detailed understanding of my user base to factor into the chance of vulnerability.

I can choose a standard density function, like the Gaussian for example, but I believe there will be cases where it's difficult to predict with certainty what the risk will be, due to not knowing what key factors will push a particular population of users to produce even one attacker, especially when the population is smaller and more restricted than the Internet at large.

And I won't be surprised if you say I simply don't understand statistics well enough in this case to know how to approach the problem. Without question, you understand it better than I do. My reason for pointing to Taleb's work is that he has substantial data indicating that the choice of density function can mask the fact that we don't know something and then the rest of our calculation, while precise, will not be accurate.

I believe that better models might be produced with more data, but I also believe those models will be influenced by observation. Risk modeling in the financial sector has to be a sure sign of this. Taleb predicted in 1997 that the model being used wouldn't be accurate and he had analysis to back that position up.

Craig S Wright said...

Please go to the followup at:
http://gse-compliance.blogspot.com/2010/02/response-to-modeling-risk.html

I have answered these responses here.

Anonymous said...

What makes you believe that your original integral for P(E|n) holds? This is not a stationary process. I think this original assumption is mistaken. Furthermore, you are assuming that your actions will not affect the environment. This is not true in security. Game theory is probably a better model here. You don't have to outrun the bear, just the guy next to you.

Craig S Wright said...

"What makes you believe that your original integral for P(E|n) holds?"

The integral is not a stationary process, and if you check game-theoretic models, they are mathematical.

You should note that I have been using game theoretic models.

What matters is the fit to the real world, and this fit occurs.