## Saturday, 28 November 2009

### Size and Power in a Statistical Study

When the assumptions of a test are violated, the choice of test and of the appropriate statistic can be guided by two selection criteria: the statistic’s robustness and its power.

Robustness is defined as the capacity of the statistic to control Type I error.
For instance, a test of variances is robust if it does not report non-homogeneous variances when the variances are in fact homogeneous, even though the original data are not normally distributed.

Peechawanich (1992) noted that if the probability of a Type I error exceeds the Cochran limit, the test is not capable of controlling the error rate. As such, a test can be considered robust where the empirically calculated probability of a Type I error lies within the Cochran limit. The Cochran limits on the discrepancy of the empirical Type I error (Tau_Hat) from the nominal significance level (Alpha or α) are set at the following values:

• Alpha = 0.01 significance, 0.007 <= Tau_Hat <= 0.015
• Alpha = 0.05 significance, 0.04 <= Tau_Hat <= 0.06
• Where tau is defined as the real probability of a Type I error occurring. This is also equal to the probability that H0 will be rejected when H0 is actually true.
• Tau_Hat is the empirically calculated estimate of tau.
• Alpha is the nominal level of significance. For the purpose of my ongoing research, the values of Alpha = 0.01 and Alpha = 0.05 have been used.
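
The robustness check described above can be sketched as a small Monte Carlo experiment. This is only an illustration under stated assumptions: the test is a two-sided z-test of the mean with known sigma = 1, the data are drawn from a normal distribution so that H0 is true, and the Cochran limits are hard-coded as 0.007 to 0.015 (Alpha = 0.01) and 0.04 to 0.06 (Alpha = 0.05). The names `COCHRAN_LIMITS`, `is_robust` and `empirical_type1_rate` are hypothetical helpers, not part of any library.

```python
import random
from statistics import NormalDist, fmean

# Assumed Cochran limits for Tau_Hat, keyed by the nominal Alpha.
COCHRAN_LIMITS = {0.01: (0.007, 0.015), 0.05: (0.04, 0.06)}


def is_robust(tau_hat, alpha):
    """Return True if the empirical Type I error rate lies within the limits."""
    low, high = COCHRAN_LIMITS[alpha]
    return low <= tau_hat <= high


def empirical_type1_rate(alpha, n=20, reps=20000, seed=1):
    """Estimate Tau_Hat: the share of replications that reject a true H0."""
    rng = random.Random(seed)
    # Two-sided critical value of the standard normal distribution.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        # H0 is true here: the population mean really is 0 and sigma is 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps


tau_hat = empirical_type1_rate(0.05)
print(tau_hat, is_robust(tau_hat, 0.05))
```

To study robustness under a violated assumption, the line that draws `sample` would be swapped for a non-normal generator with the same mean, and Tau_Hat compared against the same limits.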

A test’s power is the probability of rejecting the null hypothesis (H0) when it is in fact false.

The power of a test is calculated by subtracting the probability of a Type II error (beta) from the maximum power value (1.0). Power is therefore defined as:
Power = 1 - beta

As such, the power of a test ranges from a value of 0 (no power) to 1.0 (highly powerful).
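
As a rough illustration of Power = 1 - beta, the sketch below computes the power of a one-sided z-test analytically. The setting is an assumption chosen for simplicity (H0: mean = 0 against H1: mean = `effect` > 0, with known `sigma`), and `z_test_power` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha):
    """Power of a one-sided z-test of H0: mu = 0 against H1: mu = effect > 0."""
    std_normal = NormalDist()
    # H0 is rejected when the z statistic exceeds this critical value.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under H1 the z statistic is normal with this mean and unit variance.
    shift = effect * n ** 0.5 / sigma
    # beta: probability of failing to reject H0 although H1 is true.
    beta = std_normal.cdf(z_crit - shift)
    return 1 - beta


# Power for the same effect and alpha at increasing sample sizes.
for n in (5, 25, 100):
    print(n, round(z_test_power(effect=0.5, sigma=1.0, n=n, alpha=0.05), 3))
```

With no true effect (`effect=0`) the function returns alpha itself, and for a fixed effect the power climbs towards 1.0 as `n` grows.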

Power studies rely on four context variables:
1. the expected “size” of the effect (such as the approximate incidence rate for a population of survival times),
2. the sample size of the data being evaluated,
3. the statistical significance level (α) used in the experiment, and
4. the type of analysis that is used on the data.

Statistical power can be pictured as a fishing net: a low-power test (such as one based on a small sample) is like a net with a large mesh. It will catch only the large effects and will generally miss most of the others.

This leads to failing to reject the null hypothesis when it is actually false (a Type II error). At the other end, tests can be constructed that are too sensitive. Using larger sample sizes increases the probability that the postulated effect will be detected. In the extreme, a very large random sample closely reflects the population, which leads to high power. This increase in power comes at a cost.

Many studies do not allow for the economical collection of extremely large samples. In destructive testing, any sample that approaches the population in size also defeats the purpose of testing. Consequently, the selection of powerful tests that hold at low sample sizes is important. There is a trade-off between sample size and the size of the uncontrolled error.

The significance level, designated "alpha", is the accepted risk of making a Type I error; the p-value calculated from the data is compared against it. The lower the alpha or beta value that is selected, the larger the sample size required.

Analysing data with a small alpha and a low beta can require immense datasets. In the past this was impractical due to a lack of processing power.
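
The way alpha and beta drive the sample size can be made concrete with the usual normal-approximation formula for a one-sided z-test, n = ((z_(1-alpha) + z_(1-beta)) * sigma / effect)^2. The sketch below assumes a known `sigma` and a hypothesised `effect`; `required_sample_size` is a hypothetical helper name used for illustration only.

```python
import math
from statistics import NormalDist


def required_sample_size(effect, sigma, alpha, beta):
    """Smallest n giving power 1 - beta for a one-sided z-test at level alpha."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)  # guards against Type I error
    z_beta = std_normal.inv_cdf(1 - beta)    # guards against Type II error
    n = ((z_alpha + z_beta) * sigma / effect) ** 2
    # Round up: a fractional observation cannot be collected.
    return math.ceil(n)


# The same effect under a looser and a stricter pair of error rates.
print(required_sample_size(effect=0.5, sigma=1.0, alpha=0.05, beta=0.20))
print(required_sample_size(effect=0.5, sigma=1.0, alpha=0.01, beta=0.05))
```

Tightening both error rates more than doubles the required sample in this example, and halving the effect size quadruples it, which is why small alpha and beta values quickly call for very large datasets.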

References:
• Lawless, J. F. (2003). Statistical Models and Methods for Lifetime Data, Second Edition. Wiley Series in Probability and Statistics.
• Peechawanich, V. (1992). Probability Theory and Applications. Prakypueg, Bangkok.
• Rayner, J. C. W. (1997). The Asymptotically Optimal Tests. The Statistician, Vol. 46, No. 3, pp. 337-346.