Friday, 11 December 2009

Introduction to Homogeneity of Variance tests

The majority of parametric statistical procedures (including the analysis of variance, or ANOVA) rely on an assumption of homogeneity of variance[1]. This critical assumption underlies every analysis of variance calculation. ANOVA is one of the most frequently used techniques across many disciplines (including agriculture, physiology and medicine, amongst others). Its use in comparing the means of dissimilar groups is a key part of many studies, including treatment evaluations.
The importance of being able to test the assumptions used in evaluating experimental data cannot be overstated. Further, the mere fact that populations and groups do not express the same variance can be an important finding. The O’Brien test is one of the more frequently used procedures for testing the homogeneity of variance assumption. Developed by O’Brien (1979, 1981), it is considered one of the more sensitive tests. In the O’Brien test, the null hypothesis asserts that the populations from which the groups are drawn all have the same variance. Conversely, the alternative hypothesis asserts that at least one of these population variances differs from the others.
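As a rough illustration of the O’Brien procedure, the sketch below transforms each group with SciPy's obrientransform and then runs a one-way ANOVA on the transformed values. The helper name obrien_test and the simulated data are assumptions made purely for illustration; they do not come from O’Brien's papers or from JMP.

```python
import numpy as np
from scipy import stats


def obrien_test(*groups):
    """Hypothetical helper: O'Brien transform followed by a one-way ANOVA.

    A small p-value is evidence against the null hypothesis that all
    groups share the same population variance.
    """
    transformed = stats.obrientransform(*groups)
    return stats.f_oneway(*transformed)


# Illustrative data only: two groups with SD 1 and one group with SD 3.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=30)
b = rng.normal(0.0, 1.0, size=30)
c = rng.normal(0.0, 3.0, size=30)

stat, p = obrien_test(a, b, c)
print(f"O'Brien F = {stat:.3f}, p = {p:.4f}")
```

A useful property of the transform is that the mean of each transformed group equals the sample variance of the original group, which is why an ANOVA on the transformed values compares variances.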
Many software packages such as JMP (JMP, 2007) have an option that “Tests that the Variances are Equal”. In JMP, the tests applied are:
· O’Brien (0.5),
· Brown-Forsythe,
· Levene,
· Bartlett and
· F Test 2-sided.
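The other tests in this list can be approximated in Python with SciPy. The sketch below is not JMP's implementation: it assumes that Levene's test centres on the group mean and Brown-Forsythe on the group median, and f_test_two_sided is a hypothetical helper written for this example. The data are simulated.

```python
import numpy as np
from scipy import stats


def f_test_two_sided(x, y):
    """Hypothetical helper: two-sided F test for equal variances (two groups)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    f = np.var(x, ddof=1) / np.var(y, ddof=1)
    dfx, dfy = x.size - 1, y.size - 1
    # Double the smaller tail probability and cap the result at 1.
    lower = stats.f.cdf(f, dfx, dfy)
    upper = stats.f.sf(f, dfx, dfy)
    return f, min(1.0, 2.0 * min(lower, upper))


# Illustrative data only: SD 1 versus SD 2.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=25)
y = rng.normal(0.0, 2.0, size=25)

p_values = {
    "Levene (mean-centred)": stats.levene(x, y, center="mean").pvalue,
    "Brown-Forsythe (median-centred)": stats.levene(x, y, center="median").pvalue,
    "Bartlett": stats.bartlett(x, y).pvalue,
    "F test, 2-sided": f_test_two_sided(x, y)[1],
}
for name, p in p_values.items():
    print(f"{name:32s} p = {p:.4f}")
```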
There are many alternative tests for the homogeneity of variance, and a number of these will be evaluated in this paper. The assumption of homoscedasticity for a nominal level dependent variable is a related problem that commonly needs to be addressed. Testing it is an “inappropriate application of a statistic” (Hair et al. 1998), since the variance is not computed for nominal variables. Likewise, the ANOVA calculation cannot be considered if the independent variable is interval level; any answer in that case is also an “inappropriate application of a statistic” (Hair et al. 1998).
Where the dependent variable is ordinal level and the variable, or a transformation of it, satisfies the assumption of homogeneity of variance, the solution is “true with caution” (Hair et al. 1998). The caution is needed because we may have to defend the treatment of ordinal variables as metric values.
In the next few posts we offer the derivations of several commonly used homogeneity of variance tests. This leads to the development of a transformed test and a power study in Chapter 3. We also show that the F test and Levene’s test are highly non-robust. This may mean that, when normality is suspect, p-values and sizes (significance levels) are not reliable, that the weak optimality of likelihood ratio tests no longer holds (the power is poor), or both. We offer a series of new tests that are more robust than the F and Levene tests and have power that is at least almost as good. Relative to the nonparametric tests, they offer less robustness but more power.
In the final part of this series I present data that display questionable F and Levene test p-values. For example, the bootstrap p-values and the Levene test p-values for these data are out of line with expectations. The p-values for the various tests have been used in the creation of a decision tree designed to select the optimal test based on the dataset under test.
The main issues addressed in this paper:
(1) Under what conditions is heterogeneity of variances actually a problem?
(2) Under these conditions, are any of the tests of homogeneity of variances useful for the detection of heterogeneity?
(3) A size and power study of various tests for the homogeneity of variance is presented, and
(4) We demonstrate situations where the F and Levene test p-values are questionable (they are out of line with, for example, bootstrap p-values and the Levene test p-values) and offer a test where the p-values are not questionable.
A simulation study is conducted to determine which tests of homogeneity should be used under the various conditions (small sample sizes, non-normal distributions). This incorporates a power study and evaluation of the robustness of the commonly deployed tests.

[1] This is the homoscedasticity of a set of data and is also referred to as uniformity of variance. Homoscedasticity is associated with the assumption that the dependent variable will display a similar variance over the entire range of values of the independent variable.


Homerspy said...

This was great. Thank you! I'm still a little concerned that two tests will say there's a definite significant difference, while two will have a p-value of .54...but I guess if at least one of them is showing a p-value of less than .05...then maybe there's something there...

Craig S Wright said...

When you consider how few scientific papers are published where no heteroscedasticity tests have been completed and there are no references to power etc...

It does mean that care is needed in the selection and that it may be wise to increase sample sizes.