Friday, 11 December 2009

Derivations and Motivations

The following post sets out the derivations and basis for the models presented in subsequent posts.

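The hypotheses used when testing for homogeneity of variances across K groups are:

$$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_K^2 \qquad \text{(F 2.1)}$$

representing the null hypothesis to be tested, versus

$$H_1: \sigma_i^2 \neq \sigma_j^2 \text{ for at least one pair } (i, j) \qquad \text{(F 2.2)}$$

as the alternative hypothesis.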

The null hypothesis for the test of homogeneity of variance asserts that the variance of the dependent variable is equal across the groups defined by the independent variable; that is, the variance is homogeneous. Where the p-value associated with the homogeneity test statistic is less than or equal to the significance level α used in the test, the null hypothesis is rejected and we conclude that the variance is not homogeneous. Conversely, where the p-value is greater than α, the null hypothesis is not rejected and the variance is treated as homogeneous.
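The decision rule above can be sketched as a small Python function. This is an illustrative helper, not part of any library; the name `homogeneity_decision` and the default α = 0.05 are assumptions.

```python
def homogeneity_decision(p_value, alpha=0.05):
    """Apply the rejection rule for H0: sigma_1^2 = ... = sigma_K^2."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value <= alpha:
        # p-value at or below the significance level: reject H0.
        return "reject H0: variances are not homogeneous"
    # Otherwise H0 is not rejected and the variances are treated as homogeneous.
    return "do not reject H0: variances treated as homogeneous"

print(homogeneity_decision(0.012))  # reject H0: variances are not homogeneous
print(homogeneity_decision(0.30))   # do not reject H0: variances treated as homogeneous
```

Not rejecting H0 does not prove that the variances are equal; it only means the data give insufficient evidence against equality at level α.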

ANOVA F Test

Let $X_{ki}$ be the ith observation in the kth group, for $k = 1, \ldots, K$ and $i = 1, \ldots, n_k$, and let $N = \sum_{k=1}^{K} n_k$ be the total number of observations. If we assume that the $X_{ki}$ are normally distributed and independent for all values of i and k, with mean $\mu_k$ and standard deviation $\sigma_k$, we can obtain the usual unbiased estimates of $\mu_k$ and $\sigma_k^2$. These are defined to be:

$\mu_k$ is estimated by:

$$\bar{X}_k = \frac{1}{n_k}\sum_{i=1}^{n_k} X_{ki} \qquad \text{(F 2.3)}$$

$\sigma_k^2$ is estimated by:

$$S_k^2 = \frac{1}{n_k - 1}\sum_{i=1}^{n_k}\left(X_{ki} - \bar{X}_k\right)^2 \qquad \text{(F 2.4)}$$
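A minimal sketch of the estimators (F 2.3) and (F 2.4), assuming each group's observations are held in a plain Python list. The function `group_estimates` and the sample data are hypothetical. `statistics.variance` uses the $n_k - 1$ divisor, so it matches $S_k^2$.

```python
from statistics import mean, variance

def group_estimates(groups):
    """Return (n_k, sample mean, unbiased sample variance) for each group.

    `groups` is a list of lists: one inner list of observations X_ki per group k.
    """
    results = []
    for k, xs in enumerate(groups, start=1):
        if len(xs) < 2:
            raise ValueError(f"group {k} needs at least two observations")
        # statistics.variance divides by n_k - 1, i.e. the unbiased S_k^2 of (F 2.4).
        results.append((len(xs), mean(xs), variance(xs)))
    return results

groups = [[4.0, 5.0, 6.0], [7.0, 9.0, 11.0], [1.0, 2.0, 3.0, 6.0]]
for n_k, xbar_k, s2_k in group_estimates(groups):
    print(n_k, xbar_k, s2_k)
```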

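The F statistic for the ANOVA F test is defined (Casella, 2002, p. 534) as:

$$F = \frac{\sum_{k=1}^{K} n_k\left(\bar{X}_k - \bar{X}\right)^2 / (K-1)}{S_p^2} \qquad \text{(F 2.5)}$$

where $\bar{X} = \frac{1}{N}\sum_{k=1}^{K}\sum_{i=1}^{n_k} X_{ki}$ is the grand mean, and we reject the null hypothesis of equal group means, $H_0: \mu_1 = \mu_2 = \cdots = \mu_K$, when $F > F_{K-1,\,N-K,\,\alpha}$.

In this calculation it is generally assumed that the population variances are equal (homoscedasticity, or homogeneity of variances). In that case the common variance is estimated by the pooled variance $S_p^2$, which can be written in a simplified form as:

$$S_p^2 = \frac{1}{N-K}\sum_{k=1}^{K}\left(n_k - 1\right)S_k^2 \qquad \text{(F 2.6)}$$

Under $H_0$ and equal variances, F follows a central F distribution with (K-1) and (N-K) degrees of freedom. Consequently, a suitable test of variances is needed to check whether homogeneity of variances holds before the F test is relied upon.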
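The F statistic in (F 2.5), with the pooled variance of (F 2.6), can be sketched in plain Python as below. The helper names `anova_f` and `f2_sf` are hypothetical, and the p-value helper relies on the closed form $P(F > x) = (1 + 2x/\nu)^{-\nu/2}$, which holds only for an $F(2, \nu)$ distribution, i.e. K = 3 groups. For general K, a library routine such as `scipy.stats.f_oneway` could serve as a cross-check.

```python
from statistics import mean, variance

def anova_f(groups):
    """Return (F, K - 1, N - K) for a one-way ANOVA on a list of groups."""
    K = len(groups)
    n = [len(xs) for xs in groups]
    N = sum(n)
    grand_mean = sum(sum(xs) for xs in groups) / N
    # Numerator of (F 2.5): between-group mean square.
    ms_between = sum(nk * (mean(xs) - grand_mean) ** 2 for nk, xs in zip(n, groups)) / (K - 1)
    # Denominator, (F 2.6): pooled within-group variance S_p^2.
    s2_pooled = sum((nk - 1) * variance(xs) for nk, xs in zip(n, groups)) / (N - K)
    return ms_between / s2_pooled, K - 1, N - K

def f2_sf(x, df2):
    """Upper-tail probability P(F > x) for F(2, df2) only (i.e. K = 3 groups)."""
    return (1.0 + 2.0 * x / df2) ** (-df2 / 2.0)

groups = [[4.0, 5.0, 6.0], [7.0, 9.0, 11.0], [1.0, 2.0, 3.0, 6.0]]
F, df1, df2 = anova_f(groups)
p_value = f2_sf(F, df2) if df1 == 2 else None
print(F, df1, df2, p_value)  # F ≈ 9.1 with df (2, 7); p ≈ 0.011
```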

Robustness of the F statistic under unequal variances

Figure 1 shows how the rejection probability decreases as the variation among the groups increases. Consequently, the probability of the correct decision, rejecting the null hypothesis when it is false, decreases as the variances of the datasets become more heterogeneous. In other words, the power of the test to detect a difference decreases as the variances of the three groups become more heterogeneous. This implies that the F test is not robust for datasets with highly heterogeneous variances.

Figure 1: Rejection probability as heteroscedasticity increases

The starting variances and the size of the increments are arbitrary; we can expect the same result as long as a reasonable relationship is maintained between the initial values of the means and the standard deviations.

Similarly we compare the rejection probabilities by increasing clip_image029and clip_image031.

As the probability of rejection increases with heteroscedasticity in this comparison, the F ratio is not robust for datasets with non-homogeneous variances.
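The behaviour summarized in Figure 1 can be reproduced qualitatively with a small Monte Carlo sketch. The group means (0, 1, 2), the group size n = 10, the number of replications, the seed and the scheme for increasing the standard deviations are all arbitrary assumptions for illustration, not the settings behind Figure 1. With K = 3 groups the critical value of $F(2, \nu)$ has the closed form $\frac{\nu}{2}(\alpha^{-2/\nu} - 1)$, which avoids a dependency on SciPy.

```python
import random

def f_crit_k3(alpha, df2):
    """Upper-alpha critical value of F(2, df2).

    Uses P(F > x) = (1 + 2x/df2) ** (-df2/2), which holds only for df1 = 2,
    i.e. K = 3 groups.
    """
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)

def f_statistic(groups):
    """One-way ANOVA F statistic with the pooled within-group variance."""
    K = len(groups)
    n = [len(g) for g in groups]
    N = sum(n)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / N
    ss_between = sum(nk * (m - grand) ** 2 for nk, m in zip(n, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (K - 1)) / (ss_within / (N - K))

def rejection_rate(means, sds, n=10, reps=2000, alpha=0.05, seed=1):
    """Monte Carlo estimate of P(F > critical value) for normal groups of size n."""
    if len(means) != 3 or len(sds) != 3:
        raise ValueError("this sketch assumes K = 3 groups")
    rng = random.Random(seed)
    crit = f_crit_k3(alpha, 3 * n - 3)
    rejections = 0
    for _ in range(reps):
        groups = [[rng.gauss(mu, sd) for _ in range(n)] for mu, sd in zip(means, sds)]
        if f_statistic(groups) > crit:
            rejections += 1
    return rejections / reps

means = (0.0, 1.0, 2.0)  # H0 is false: the group means differ
for step in (0.0, 0.5, 1.0, 1.5, 2.0):
    sds = (1.0, 1.0 + step, 1.0 + 2.0 * step)  # spread of the SDs grows with step
    print(step, sds, rejection_rate(means, sds))
```

Because the means differ, every rejection here is a correct decision, so the printed rates estimate power and should fall as `step` grows. This particular scheme also raises the average variance, so it mirrors the setting described above rather than isolating heterogeneity alone.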

Other Tests

In this series of posts, a number of alternatives to the ANOVA F test have been evaluated.

Bartlett χ² test

Bartlett’s test is one of the most frequently taught tests of variance homogeneity (Conover et al., 1981). Its ease of calculation and general simplicity make it a staple of introductory statistics classes (Lim & Loh, 1996; Ott, 1998; Zar, 1999). The test statistic B compares the separate within-group sums of squares with the pooled within-group sum of squares. The test statistic is given by:

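With $S_k^2$ the sample variance of group k, define the pooled variance

$$S_p^2 = \frac{1}{N-K}\sum_{k=1}^{K}\left(n_k - 1\right)S_k^2$$

and the correction factor

$$C = 1 + \frac{1}{3(K-1)}\left(\sum_{k=1}^{K}\frac{1}{n_k - 1} - \frac{1}{N-K}\right)$$

such that:

$$B = \frac{(N-K)\ln S_p^2 - \sum_{k=1}^{K}\left(n_k - 1\right)\ln S_k^2}{C} \qquad \text{(F 2.7)}$$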

Bartlett’s statistic B can be used to test the null hypothesis (F 2.1), assuming that the observations in each group are normally distributed. Bartlett (1946) showed that where the variances are equal ($\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_K^2$), B approximately follows a $\chi^2_{K-1}$ distribution, and that the approximation holds well even for small sample sizes. It was also shown that, where $H_0$ holds, B converges in distribution as the group sizes grow: $B \xrightarrow{d} \chi^2_{K-1}$. Consequently, the null hypothesis is rejected where B exceeds the $100(1-\alpha)$th percentile of a chi-squared distribution with (K-1) degrees of freedom.
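A plain-Python sketch of Bartlett’s statistic (F 2.7) and its chi-squared p-value. The names `bartlett_b` and `chi2_sf` are hypothetical; `chi2_sf` evaluates the upper tail of $\chi^2_{\nu}$ for integer ν through the standard recurrence for the regularized incomplete gamma function, $Q(a+1, y) = Q(a, y) + y^a e^{-y}/\Gamma(a+1)$. SciPy users could cross-check the result against `scipy.stats.bartlett`.

```python
import math
from statistics import variance

def chi2_sf(x, df):
    """P(X > x) for X ~ chi-squared(df), integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from Q(1, y) = exp(-y) for even df, or Q(1/2, y) = erfc(sqrt(y)) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Step a up to df / 2 using Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def bartlett_b(groups):
    """Return (B, p_value) for Bartlett's test of equal variances across groups."""
    K = len(groups)
    n = [len(xs) for xs in groups]
    N = sum(n)
    s2 = [variance(xs) for xs in groups]  # S_k^2 with divisor n_k - 1, as in (F 2.4)
    s2_pooled = sum((nk - 1) * v for nk, v in zip(n, s2)) / (N - K)
    numerator = (N - K) * math.log(s2_pooled) - sum((nk - 1) * math.log(v) for nk, v in zip(n, s2))
    C = 1.0 + (sum(1.0 / (nk - 1) for nk in n) - 1.0 / (N - K)) / (3.0 * (K - 1))
    B = numerator / C
    # Under H0, B is approximately chi-squared with K - 1 degrees of freedom.
    return B, chi2_sf(B, K - 1)

groups = [[4.0, 5.0, 6.0], [7.0, 9.0, 11.0], [1.0, 2.0, 3.0, 6.0]]
B, p = bartlett_b(groups)
print(B, p)  # B ≈ 1.027, p ≈ 0.598
```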
