Saturday, 11 April 2009

The Quantification of Information Systems Risk

I am underway with my latest research project: PhD studies at CSU towards another postgraduate degree. The research topic follows.

The goal of this research project is to create a series of quantitative models for information security. Mathematical modelling techniques that can be used to model and predict information security risk will be developed using a combination of approaches, including:

  • Economic theory,
  • Quantitative financial modelling,
  • Algorithmic game theory and
  • Statistical hazard/survival models.
The models will account for heteroscedastic confounding variables and include appropriate transforms so that variance homogeneity can be achieved even with non-normal distributions. Process models for an integrated continuous-time Poisson risk process (working through the hazard function) will be developed using a combination of:
  • Business financial data (company accountancy and other records),
  • Legal databases for tortious and regulatory costs and
  • Insurance datasets.
This data will be coupled with hazard models created using honeynets (e.g. the Honeynet Project) and reporting sites such as the Internet Storm Centre and iDefence. The combination of this information will provide the basis for the first truly quantitative security risk framework.
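As a first, minimal sketch of the Poisson-process view of incident arrivals, the R fragment below estimates an attack rate (and hence a simple hazard) from daily incident counts. The counts, the rate and the sensor are simulated placeholders, not real Honeynet or Storm Centre data.

```r
## Minimal sketch: treat attack arrivals at a single sensor as a homogeneous
## Poisson process and estimate the rate (hazard) from daily incident counts.
## All values below are simulated placeholders, not real honeynet feeds.
set.seed(42)
daily_incidents <- rpois(90, lambda = 3.2)   # 90 days of simulated counts

## Maximum-likelihood estimate of the Poisson rate is the sample mean
lambda_hat <- mean(daily_incidents)

## Standard error and a rough 95% confidence interval for the rate
se_lambda <- sqrt(lambda_hat / length(daily_incidents))
ci_lambda <- lambda_hat + c(-1.96, 1.96) * se_lambda

## Under this model, the daily probability of at least one incident is
p_any <- 1 - dpois(0, lambda_hat)

cat(sprintf("rate = %.2f/day (95%% CI %.2f-%.2f); P(incident in a day) = %.2f\n",
            lambda_hat, ci_lambda[1], ci_lambda[2], p_any))
```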
Support has been sought and received from SANS (including DShield), CIS (the Centre for Internet Security) and the Honeynet Project. At present, the DShield storm centre receives logging from over 600,000 organisations, a larger quantity of data than is used for actuarial work in the insurance industry. The problem is that this information is not currently collated or analysed in any quantitatively sound manner. This data will provide the necessary rigour with which to model survival times for types of applications. There is also a body of research into quantitative code analysis for risk that could be incorporated.
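As an illustration of the kind of survival modelling this data supports, the following R sketch fits non-parametric (Kaplan-Meier) survival curves for time-to-compromise by application type. The data frame and its columns are hypothetical stand-ins for collated DShield/honeynet records.

```r
## Sketch: non-parametric survival curves for "time to first compromise"
## by application type. The data frame is a hypothetical stand-in for
## collated DShield/honeynet exposure records.
library(survival)

set.seed(1)
exposure <- data.frame(
  days_survived = rexp(300, rate = 1 / 40),            # days until compromise
  compromised   = rbinom(300, 1, 0.7),                 # 1 = event, 0 = censored
  app_type      = sample(c("dns", "web", "mail"), 300, replace = TRUE)
)

## Kaplan-Meier estimate of survival by application type
km_fit <- survfit(Surv(days_survived, compromised) ~ app_type, data = exposure)
summary(km_fit, times = c(7, 30, 90))   # survival at 1 week, 1 month, 3 months

## Log-rank test: do the application types differ?
survdiff(Surv(days_survived, compromised) ~ app_type, data = exposure)
```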

The aim of this research is to create a series of models (such as are used within mechanical engineering, materials science, etc.) and hence to move information risk modelling towards a science rather than an art. Stengel, R.F. (1984, 1996), "Optimal Control and Estimation", provides an indication of such a framework in systems engineering.

Some of the methods used in the creation of the risk framework will include:
  • Random forest clustering,
  • K-means analysis,
  • Other classification algorithms, and
  • Network associative maps in text analysis forensic work.
The correlation of reference data (such as IP and functional analysis data) between C&C (Command and Control) systems used in “botnets” is one aspect of this research.
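As a rough illustration of how the clustering and classification methods listed above might be applied to C&C correlation, the R sketch below runs k-means and a random forest over hypothetical per-host traffic features. The feature names (connections per hour, destination entropy, mean packet size) are assumptions for illustration only.

```r
## Sketch: cluster hosts on hypothetical traffic features, then fit a random
## forest on the labelled subset. Feature names are illustrative assumptions;
## real features would come from the collated C&C / flow data.
library(randomForest)

set.seed(7)
hosts <- data.frame(
  conn_per_hour  = c(rpois(100, 5),        rpois(40, 60)),     # benign vs C&C-like
  dst_entropy    = c(runif(100, 2, 4),     runif(40, 0.2, 1)),
  mean_pkt_bytes = c(rnorm(100, 600, 150), rnorm(40, 120, 30))
)

## Unsupervised: k-means on scaled features
km <- kmeans(scale(hosts), centers = 2, nstart = 25)
table(km$cluster)

## Supervised: random forest using (hypothetical) known labels
hosts$label <- factor(c(rep("benign", 100), rep("c2", 40)))
rf <- randomForest(label ~ conn_per_hour + dst_entropy + mean_pkt_bytes,
                   data = hosts, ntree = 500, importance = TRUE)
print(rf)        # out-of-bag error estimate
importance(rf)   # which features drive the separation
```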
Starting from the outside (the cloud and perimeter) and working inwards to the network, the risk model would begin by assessing external threats and move on to internal threat sources, becoming gradually more granular as one moves from the network to individual hosts and finally to people (user behaviour) and application modelling.

The eventual result will be the creation of a model that can incorporate the type of organisation, its size, location, the applications and systems used, and user awareness levels to create a truly quantitative risk model. Results would be reported with a standard error (SE) and confidence level rather than as a point estimate.
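A minimal sketch of what "reported with SE and confidence level" could look like in practice: a logistic regression on hypothetical organisation-level covariates, with the predicted risk returned as an estimate plus a 95% interval rather than a bare number. The covariates and data are invented placeholders.

```r
## Sketch: report a compromise-probability estimate with a standard error and
## confidence interval rather than a bare point estimate. The organisation
## covariates (log size, awareness score) are invented placeholders.
set.seed(11)
orgs <- data.frame(
  compromised = rbinom(500, 1, 0.3),
  log_size    = rnorm(500, 5, 1),     # log of employee count
  awareness   = runif(500, 0, 1)      # user-awareness / training score
)

fit <- glm(compromised ~ log_size + awareness, family = binomial, data = orgs)

## Predicted risk for a new organisation, with SE on the link scale
new_org <- data.frame(log_size = 6, awareness = 0.4)
pred    <- predict(fit, newdata = new_org, type = "link", se.fit = TRUE)

## Back-transform the 95% interval to the probability scale
est <- plogis(pred$fit)
ci  <- plogis(pred$fit + c(-1.96, 1.96) * pred$se.fit)
cat(sprintf("estimated risk %.2f (95%% CI %.2f to %.2f)\n", est, ci[1], ci[2]))
```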

To begin, a number of questions can be answered and a number of related papers can be published on this topic. For instance, the following are all associated research topics in this project:
  1. Is a router/firewall stealth rule effective?
  2. What types of DMZ are most effective for a given cost?
  3. How economical is the inclusion of additional router logging outside the perimeter firewall?
  4. Are "drop" or "reject" rules more effective at limiting attacks, by type? (A sketch of one way to test this follows the list.)
  5. How do firewalls, IDS and IPS affect system survival times?
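For question 4 (and, in the same way, question 5), one possible analysis is a comparison of survival curves between hosts behind "drop" and "reject" rules; the R sketch below uses a log-rank test and a simple Cox model on simulated placeholder data.

```r
## Sketch for question 4: compare time-to-compromise between hosts behind
## "drop" and "reject" firewall rules. Data are simulated placeholders for
## what honeynet exposure logs would provide.
library(survival)

set.seed(3)
trial <- data.frame(
  rule  = rep(c("drop", "reject"), each = 150),
  days  = c(rexp(150, 1 / 55), rexp(150, 1 / 40)),   # time to compromise
  event = rbinom(300, 1, 0.8)                        # 1 = compromised, 0 = censored
)

## Log-rank test of the null hypothesis that the rules give identical survival
survdiff(Surv(days, event) ~ rule, data = trial)

## Effect size as a hazard ratio from a simple Cox model
cox_rule <- coxph(Surv(days, event) ~ rule, data = trial)
summary(cox_rule)$conf.int   # hazard ratio with 95% confidence interval
```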
The creation of a classification model that allows for the remote determination of application versions for DNS software (which is to be expanded to other applications and devices, e.g. routers) has already been completed and published; it should appear in the SANS reading room shortly.
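The published classifier is not reproduced here; purely as a hypothetical sketch of the general idea (inferring a DNS server implementation from response-behaviour features), a simple decision tree might be fitted as below. All feature names, class labels and distributions are invented for illustration.

```r
## Hypothetical sketch only: classify a DNS server implementation from
## response-behaviour features with a decision tree. Features, labels and
## their distributions are invented; the published paper defines the real
## feature set and method.
library(rpart)

set.seed(5)
version      <- factor(sample(c("bind8", "bind9", "msdns"), 400, replace = TRUE))
edns_support <- rbinom(400, 1, ifelse(version == "bind9", 0.9, 0.2))
ra_flag      <- rbinom(400, 1, ifelse(version == "msdns", 0.8, 0.4))
fingerprints <- data.frame(version, edns_support, ra_flag)

tree <- rpart(version ~ edns_support + ra_flag, data = fingerprints,
              method = "class")
printcp(tree)                                    # cross-validated error by tree size
predict(tree, fingerprints[1:5, ], type = "class")
```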

I would like to have data from the Honeynet Project aligned with this information so that survival times for types of applications can be collected and modelled.
Code to import data from hosts and networks, using raw "pcap traces", will be developed so that system statistics and other data can be collated into a standardised format. This code will be developed in "R" and "C++".
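One possible shape for the R side of that importer, assuming tshark is available to flatten the capture: the chosen fields, column names and example file path are assumptions, and the real code would cover far more fields (with a C++ path for bulk processing).

```r
## Sketch: flatten a raw pcap trace into a standardised data frame by shelling
## out to tshark (assumed to be installed). The fields, column names and file
## path are assumptions.
read_pcap_summary <- function(pcap_file) {
  fields <- c("frame.time_epoch", "ip.src", "ip.dst", "tcp.dstport", "frame.len")
  args <- c("-r", pcap_file, "-T", "fields",
            as.vector(rbind("-e", fields)),       # interleave -e with each field
            "-E", "separator=,", "-E", "header=y")
  csv_text <- system2("tshark", args, stdout = TRUE)
  df <- read.csv(text = paste(csv_text, collapse = "\n"), stringsAsFactors = FALSE)
  names(df) <- c("time", "src", "dst", "dport", "bytes")
  df
}

## Example usage (the path is a placeholder)
## flows <- read_pcap_summary("capture.pcap")
## aggregate(bytes ~ src, data = flows, FUN = sum)   # bytes sent per source host
```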

This will enable the creation and release of actuarially sound threat risk models that incorporate heterogeneous tendencies in variance across multidimensional determinants while maintaining parsimony. I foresee a combination of heteroscedastic predictors (GARCH/ARIMA, etc.) coupled with non-parametric survival models. I expect that this will result in a model where the underlying hazard rate (rather than the survival time) is a function of the independent variables (covariates). Cox's proportional hazards model with time-dependent covariates would be a starting point, moving to non-parametric methods if necessary.
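A minimal sketch of that starting point, using the counting-process (start, stop] format from R's survival package; the time-varying "threat_index" covariate is a hypothetical stand-in for a GARCH/ARIMA-derived series attached to each exposure interval.

```r
## Sketch: Cox proportional hazards with a time-dependent covariate, in the
## counting-process (start, stop] format. "threat_index" is a hypothetical
## stand-in for a GARCH/ARIMA-derived threat or volatility series attached to
## each exposure interval.
library(survival)

set.seed(9)
intervals <- data.frame(                     # one row per host per 30-day interval
  host_id      = rep(1:100, each = 3),
  start        = rep(c(0, 30, 60), times = 100),
  stop         = rep(c(30, 60, 90), times = 100),
  event        = 0,
  threat_index = rnorm(300),                 # time-varying covariate
  patched      = rbinom(300, 1, 0.5)
)
## Mark a compromise in the final interval for a random 40% of hosts
hit <- sample(1:100, 40)
intervals$event[intervals$host_id %in% hit & intervals$stop == 90] <- 1

cox_fit <- coxph(Surv(start, stop, event) ~ threat_index + patched +
                   cluster(host_id), data = intervals)
summary(cox_fit)
```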

The end goal will be to create a framework, and possibly a program, that can assess a data stream based on a number of dependent variables (threat models, system survival, etc.) and covariates and return a quantified risk forecast with a standard error.

I am looking at incorporating fractal statistics, but this seems to be an area with little existing research.
