Saturday, 28 November 2009

Size and Power in a Statistical Study

When the assumptions of a test are violated, the test and its associated statistic can be selected using two criteria: the statistic's robustness and its power.

Robustness is defined as the capacity of the statistic to control the Type I error rate.
For instance, a test of variances is robust if it does not falsely detect non-homogeneous variances when the original data are not normally distributed but the variances are in fact homogeneous.

Peechawanich (1992) noted that if the probability of a Type I error occurring exceeds the Cochran limit, the test is not capable of controlling the error rate. As such, a test can be considered robust where the calculated probability of a Type I error lies within the Cochran limit. The Cochran limits on the discrepancy of the empirical Type I error rate (τ̂) from the nominal significance level (α) are set at the following values:

  • α = 0.01 significance: 0.007 <= τ̂ <= 0.015
  • α = 0.05 significance: 0.04 <= τ̂ <= 0.06
  • τ is defined as the true probability of a Type I error occurring. This is also equal to the probability that H0 will be rejected when H0 is actually true.
  • τ̂ is the empirically calculated estimate of the probability of a Type I error occurring.
  • α is the nominal level of significance. For the purpose of my ongoing research, the values α = 0.01 and α = 0.05 have been used.
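
To make the Cochran criterion concrete, here is a minimal simulation sketch (my own illustration, not part of the cited work). It estimates τ̂ for two tests of variances, Bartlett's and Levene's, when the data are exponentially distributed (non-normal) but the variances are truly homogeneous, then checks each estimate against the Cochran limits for α = 0.05.

import numpy as np
from scipy import stats

# Simulation settings (illustrative choices, not from the original post).
rng = np.random.default_rng(42)
alpha = 0.05
n_sims, n = 10_000, 30

rejections = {"Bartlett": 0, "Levene": 0}
for _ in range(n_sims):
    # Skewed (exponential) samples with equal variances: H0 is true,
    # so every rejection is a Type I error.
    a = rng.exponential(scale=1.0, size=n)
    b = rng.exponential(scale=1.0, size=n)
    if stats.bartlett(a, b).pvalue < alpha:
        rejections["Bartlett"] += 1
    if stats.levene(a, b).pvalue < alpha:
        rejections["Levene"] += 1

lo, hi = 0.04, 0.06  # Cochran limits for alpha = 0.05
for name, count in rejections.items():
    tau_hat = count / n_sims
    print(f"{name}: tau_hat = {tau_hat:.4f}, "
          f"within Cochran limits: {lo <= tau_hat <= hi}")
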
A test’s power is the probability of rejecting the null hypothesis (H0) when it is false and should correctly be rejected.

The power of a test is calculated by subtracting the probability of a Type II error (β) from the maximum power value (1.0). Power is therefore defined as:
Power = 1 - β

As such, the power of a test ranges from 0 (no power) to 1.0 (perfect power).

Power studies rely on four context variables:
  1. the expected “size” of the effect (such as the approximate incidence rate for a population of survival times),
  2. the sample size of the data being evaluated,
  3. the significance level (α) used in the experiment, and
  4. the type of statistical analysis applied to the data (a simulation sketch of these factors follows below).
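
As a minimal Monte Carlo sketch (my own illustration, not part of the original study), the following Python code estimates the power of a two-sample t-test while varying two of the variables above, the effect size and the sample size, at a fixed α. The particular effect sizes and sample sizes are arbitrary illustrative choices.

import numpy as np
from scipy import stats

def estimated_power(effect_size, n, alpha=0.05, n_sims=5_000, seed=1):
    """Fraction of simulations in which H0 is (correctly) rejected."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, size=n)          # control group
        b = rng.normal(effect_size, 1.0, size=n)  # shifted group, so H0 is false
        if stats.ttest_ind(a, b).pvalue < alpha:
            rejections += 1
    return rejections / n_sims

for effect in (0.2, 0.5, 0.8):   # small, medium, large effects (Cohen's d)
    for n in (10, 30, 100):
        print(f"d={effect}, n={n}: power ~ {estimated_power(effect, n):.3f}")
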
Statistical power can be likened to a fishing net: a low-powered test (for example, one based on a small sample) is comparable to a net with a large mesh. Such a net catches only the largest fish, and the test will detect only the largest effects while missing most of the others.

This leads to accepting the null hypothesis when it is actually false. Conversely, tests can be constructed that are too sensitive. Using larger sample sizes increases the probability that the postulated effect will be detected; at the extreme, very large samples make it highly likely that the sample closely mirrors the population, which drives power towards 1. This increase in power comes at a cost.

Many settings do not allow for the economical collection of extremely large samples. In destructive testing, any sample that approaches the population also defeats the purpose of testing. Consequently, the selection of tests that remain powerful at low sample sizes is important. There is a trade-off between sample size and the size of the uncontrolled error.

A test rejects H0 when the p-value falls below the nominal significance level α, which is the accepted risk of making a Type I error (the Type I error rate is also designated "alpha"). The lower the α or β values that are selected, the larger the sample size required.

Analysing data with small α and low β values can require immense datasets. Such analyses were impractical in the past due to a lack of processing power.
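
As a sketch of this trade-off (the post names no particular tool; statsmodels and the two-sample t-test here are my assumptions), the following solves for the required per-group sample size as α shrinks and the required power (1 - β) grows:

from statsmodels.stats.power import TTestIndPower

solver = TTestIndPower()
effect_size = 0.3  # hypothetical, modest standardised effect size

for alpha in (0.05, 0.01):
    for power in (0.80, 0.95):
        # Solve for the per-group n of a two-sample t-test.
        n = solver.solve_power(effect_size=effect_size, alpha=alpha,
                               power=power, ratio=1.0)
        print(f"alpha={alpha}, power={power}: n per group ~ {n:.0f}")

The required n grows quickly as both error rates are pushed down, which is the computational burden described above.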

References:
  • Lawless, J. F. (2003) Statistical Models and Methods for Lifetime Data. Second Edition, Wiley Series in Probability and Statistics.
  • Peechawanich, V. (1992) Probability Theory and Applications. Prakypueg, Bangkok.
  • Rayner, J.C.W. (1997) The Asymptotically Optimal Tests, The Statistician, Vol. 46, No. 3, pp. 337-346.

Friday, 27 November 2009

The 'IDS' "Mythical Man Month"

The idea of the man-month has been analysed in software engineering for at least a generation. Frederick Brooks addressed this in the seminal work, "The Mythical Man-Month" (1975). Using these forms of analysis, we can demonstrate that incident response, intrusion analysis and other complex security tasks are "tasks with complex interrelationships".

Many in the industry will say that this is nothing new to them; what is new here is that I am attempting to quantify this relationship. In this post I have included a small amount of data from a paper I will be publishing next year. I have taken data from a 5-year period covering incidents at 165 companies and other organisations with an Internet presence and some level of security response capability.

This has been an analysis of 423,000 incidents. Costs are based on reported financial figures; these are the actual financial and accounting records for these firms. The data being analysed was collected over the Australian financial years 2003 to 2008.

The boxplot above plots the individual incident response times (in minutes) against the number of personnel involved in the process (including management etc.). We see that the data has a positive or right skew. This is clearly displayed in the histogram of incident response times for 6-person teams (displayed below).

For the purpose of this post, I have simplified the results. The same data is displayed in the following graph; the difference is that it has been summarised, with only the mean (average) values being reported.
In this, we see Brooks' (1975, pp. 18-19) supposition that in tasks with complex interrelationships, "the added effort of communicating may fully counteract the division of the original task". This is shown in the inflection point: once around 6-7 people become involved in the incident response process, the amount of time required per incident actually increases sharply with each additional team member. A toy model of this effect is sketched below.
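
As a toy model of this inflection (my own illustration with made-up coefficients, not the study's fitted data), per-incident time can be written as a divisible component W/n plus a coordination component over Brooks' n(n-1)/2 pairwise communication channels:

# Hypothetical coefficients chosen only to illustrate the shape of the curve.
W = 260.0  # divisible work per incident, in minutes
c = 1.0    # coordination cost per communication channel, in minutes

def time_per_incident(n: int) -> float:
    # Divisible work shared across n people, plus n(n-1)/2 channel overhead.
    return W / n + c * n * (n - 1) / 2

for n in range(1, 13):
    print(f"{n:2d} people: {time_per_incident(n):6.1f} minutes per incident")

With these hypothetical coefficients the curve bottoms out around 6-7 people and then rises, echoing the inflection point in the data above.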

This holds in both response and detection time. Additional people help an incident response team to a point. Adding the system administrator, a coordinator and other such parties does reduce the time per incident, but only to an inflection point, where the effort to coordinate team members starts to negatively impact the gains.

We see from the co-plot above that additional team members (over the ideal number) have a greater negative effect on the incident response time than on the detection time. These differences can be used to create teams that are customised to the needs of an individual organisation.

The Economics
The real issue here is the economic impact. This is more difficult to quantify overall. For this I shall be conducting a multivariate analysis of the data with separate classifications for industry, size, etc.

The economic results are what really matter. These results are tied to the individual organisation.

In the plot above, I have compared the mean costs of an incident (in consulting fees, displaced revenue, etc.) for a number of types of organisations. Here we clearly see that the online casino operation suffers the greatest impact from under-staffing its incident team.
Conversely, a construction firm with little existing online presence may find little benefit in doing anything. In this particular instance, with little tortious exposure, no PCI or other compliance restrictions, and an open-form design style for its blueprints, there was little to make the firm care about security. In this instance, the economics rendered any action a pure cost. This is the exception: only 2 of the 165 organisations displayed this pattern.

What we can take from this is that it is best to determine the costs and impacts of incidents for your organisation and construct a team suited to the needs and requirements that you face, before the impact hits.

[1] Brooks, Frederick P. (1975) "The Mythical Man-Month" Addison-Wesley, USA