I am underway with my latest research project. I am undertaking PhD studies at CSU for another PG degree. The research topic follows.
The goal of this research project is to create a series of quantitative models for information security. Mathematical techniques for modelling and predicting information security risk will be developed using a combination of approaches, including:
- Economic theory,
- Quantitative financial modelling,
- Algorithmic game theory and
- Statistical hazard/survival models.
Data sources will include:
- Business financial data (company accountancy and other records),
- Legal databases for tortious and regulatory costs and
- Insurance datasets.
Support has been sought and received from SANS (including DShield), CIS (Center for Internet Security) and the Honeynet Project. At present, the DShield storm centre receives logging from over 600,000 organisations. This is a larger quantity of data than the insurance industry uses for actuarial purposes. The problem is that this information is not collated or analysed in any quantitatively sound manner. This data will provide the rigour needed to model survival times for types of applications. There is also a body of research into quantitative code analysis for risk that could be incorporated.
The aim of this research is to create a series of models (such as those used within mechanical engineering, materials science, etc.) and hence to move information risk modelling towards a science (instead of an art). Stengel R.F. (1984, 1996) “Optimal Control and Estimation” provides an indication of such a framework in systems engineering.
Some of the methods used in the creation of the risk framework will include:
- Random forest clustering,
- K-means analysis,
- Other classification algorithms, and
- Network associative maps in text analysis forensic work.
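To make one of the listed methods concrete, here is a minimal sketch of k-means (Lloyd's algorithm), written in Python for brevity. The two features per host and the starting centroids are hypothetical and chosen only for illustration. Passing the initial centroids in explicitly keeps the example deterministic; a real analysis would more likely use random restarts or a library implementation.

```python
def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm on tuples of floats. Returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its old centroid
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    return centroids, labels


# Hypothetical, already-scaled features per host: (probe rate, distinct sources).
hosts = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.1, 7.9), (7.9, 8.2)]
centres, labels = kmeans(hosts, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
```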
Starting from the outside (the cloud and perimeter) and working inwards to the network, the risk model would begin by assessing external threats and then move to internal threat sources, becoming gradually more granular as one moves from the network to individual hosts and finally to people (user behaviour) and application modelling.
The eventual result will be the creation of a model that can incorporate the type of organisation, size, location, application and systems used and the user awareness levels to create a truly quantitative risk model. This would be reported with SE (standard error) and confidence level rather than a point estimate.
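As a small illustration of reporting an estimate with its standard error and a confidence interval rather than a bare point value, the sketch below uses a normal approximation. The loss figures are invented for the example, and the helper name `estimate_with_uncertainty` is hypothetical.

```python
import math
import statistics


def estimate_with_uncertainty(samples, z=1.96):
    """Return (mean, standard error, (lower, upper)) for a list of observations.

    z=1.96 gives an approximate 95% interval under a normal approximation.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(samples)
    se = statistics.stdev(samples) / math.sqrt(n)  # sample std dev / sqrt(n)
    return mean, se, (mean - z * se, mean + z * se)


# Invented loss observations in arbitrary units, for illustration only.
losses = [12.0, 15.0, 9.0, 14.0, 10.0]
mean, se, (lower, upper) = estimate_with_uncertainty(losses)
print(f"estimate {mean:.2f}, SE {se:.2f}, approx. 95% CI [{lower:.2f}, {upper:.2f}]")
```

With so few observations a t-based interval would be wider and more defensible; the normal approximation is used here only to keep the sketch short.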
To begin, a number of questions can be answered and a number of related papers can be published on this topic. For instance, the following are all associated research topics in this project:
- Is a router/firewall stealth rule effective?
- What types of DMZ are most effective for a given cost?
- How economical is the inclusion of additional router logging outside the perimeter firewall?
- Are "drop" or "reject" rules more effective at limiting attacks (by type)?
- How do firewalls, IDS and IPS influence system survival times?
I would like to have a collection of data from the Honeynet Project aligned with this information. We can collect and model survival times for types of applications.
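One simple non-parametric way to model survival times from this kind of data is the Kaplan-Meier estimator. The sketch below is in Python for brevity and makes several assumptions: each host contributes one duration, `observed` marks whether the host was actually compromised (as opposed to being withdrawn from observation, i.e. censored), and the sample values are invented.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of the survival function.

    durations: time each host was followed (for example, hours online)
    observed:  True if the host was compromised at that time, False if censored
    Returns a list of (time, survival probability) pairs, one per event time.
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)  # still under observation
        events = sum(1 for d, e in zip(durations, observed) if e and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Invented example: hours until compromise (True) or end of observation (False).
hours = [2, 3, 3, 5, 8]
compromised = [True, True, False, True, False]
curve = kaplan_meier(hours, compromised)
for t, s in curve:
    print(f"t={t}: S(t)={s:.2f}")
```

Censored hosts still count in the at-risk set up to the time they leave, which is the main reason to prefer this over a naive average of observed compromise times.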
Code to import data from hosts and networks using raw “pcap traces” will be developed so that system statistics and other data can be collated into a standardised format. This code will be developed in “R” and “C++”.
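As a rough sketch of what such an import step might look like, the following reads a classic pcap byte stream and reduces it to a flat summary record. It is written in Python rather than R or C++ purely to keep it short. It assumes the classic libpcap file layout with microsecond timestamps (not pcapng), and the output field names are hypothetical rather than a proposed standard.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap with microsecond timestamps


def summarise_pcap(data):
    """Summarise a classic pcap byte string into a flat dict of statistics."""
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    # The magic number tells us the byte order the file was written in.
    if struct.unpack("<I", data[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", data[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    # Global header: magic, version major/minor, thiszone, sigfigs, snaplen, linktype.
    _, major, minor, _, _, snaplen, linktype = struct.unpack(
        endian + "IHHiIII", data[:24])
    offset, packets, total_bytes = 24, 0, 0
    first_ts = last_ts = None
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_usec, captured length, original length.
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        end = offset + 16 + incl_len
        if end > len(data):
            break  # ignore a truncated final record
        ts = ts_sec + ts_usec / 1e6
        if first_ts is None:
            first_ts = ts
        last_ts = ts
        packets += 1
        total_bytes += orig_len
        offset = end
    duration = (last_ts - first_ts) if packets else 0.0
    return {"version": (major, minor), "linktype": linktype, "snaplen": snaplen,
            "packets": packets, "bytes": total_bytes, "duration_s": duration}


def _record(ts_sec, payload):
    """Build one synthetic packet record (test data only)."""
    return struct.pack("<IIII", ts_sec, 0, len(payload), len(payload)) + payload


# Synthetic two-packet capture so the example runs without a real trace file.
sample = (struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
          + _record(100, b"\x00" * 60) + _record(105, b"\x00" * 40))
summary = summarise_pcap(sample)
print(summary)
```

A real importer would go on to decode the link-layer and IP headers of each packet; this sketch stops at per-capture counts.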
This will enable the creation and release of actuarially sound threat risk models that incorporate heterogeneous tendencies in variance across multidimensional determinants while maintaining parsimony. I foresee a combination of heteroscedastic predictors (GARCH/ARIMA, etc.) coupled with non-parametric survival models. I expect that this will result in a model where the underlying hazard rate (rather than survival time) is a function of the independent variables (covariates). Cox's proportional hazards model with time-dependent covariates would be a starting point, moving to non-parametric methods if necessary.
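For reference, the Cox model with time-dependent covariates is conventionally written as below, where $h_0(t)$ is an unspecified baseline hazard, $x(t)$ is the covariate vector at time $t$ and $\beta$ is the vector of coefficients to be estimated. Which covariates would enter the model (firewall configuration, patch level and so on) is still an open question and is not fixed by this outline.

$$
h\big(t \mid x(t)\big) = h_0(t)\,\exp\!\big(\beta^{\top} x(t)\big)
$$

The hazard is the modelled quantity, and each covariate acts multiplicatively on the baseline, so $\exp(\beta_k)$ can be read as the hazard ratio for a one-unit change in covariate $k$.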
The end goal will be to create a framework and possibly a program that can assess a data stream based on a number of dependent variables (threat models, system survival, etc.) and covariates and return a quantified risk forecast and standard error.
I am looking at incorporating fractal statistics, but this seems to be an area with little existing research.


