Saturday, 12 December 2009

Levene Tests (Levene 1, Levene 2, Levene 3 and Levene 4)

Levene (1960) demonstrated that Bartlett’s test performs poorly when the normality assumption is violated, and a number of tests were introduced as possible alternatives to it. Four variants of Levene’s test (Levene 1, Levene 2, Levene 3 and Levene 4) are investigated in this paper. The Levene tests are homogeneity-of-variance tests known to give improved results (over many of the other tests) when conditions of non-normality exist. Each Levene test computes, for every observation, the absolute difference between that observation and its group mean (note: some of the variants use the median in place of the mean), and then performs a one-way analysis of variance on those differences.

Each of the four Levene tests used is defined completely (with derivations) in Chapter 2. The Levene 1 test (Lev-1) replaces each observation in a group with its absolute deviation from the group mean, $z_{ij} = |x_{ij} - \bar{x}_i|$. The substitute observations $z_{ij}$ are then treated as raw observations in the standard ANOVA test. The Levene 1 statistic computed on the new values $z_{ij}$ is thus given by:

$W = \dfrac{(n-K)}{(K-1)} \cdot \dfrac{\sum_{i=1}^{K} n_i (\bar{z}_{i\cdot} - \bar{z}_{\cdot\cdot})^2}{\sum_{i=1}^{K} \sum_{j=1}^{n_i} (z_{ij} - \bar{z}_{i\cdot})^2}$ (F 1.8)
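As an illustration, the Levene 1 statistic can be computed directly from its definition. This is a minimal sketch (the function and variable names are my own, not the paper's), using NumPy:

```python
import numpy as np

def levene1(*samples):
    """Levene 1 statistic (F 1.8): a one-way ANOVA computed on the
    absolute deviations z_ij = |x_ij - mean_i| of each group."""
    K = len(samples)
    # z_ij: absolute deviation of each observation from its group mean
    z = [np.abs(np.asarray(s, dtype=float) - np.mean(s)) for s in samples]
    n = sum(zi.size for zi in z)                  # total sample size
    z_bar_i = [zi.mean() for zi in z]             # per-group means of z
    z_bar = np.concatenate(z).mean()              # grand mean of z
    between = sum(zi.size * (zb - z_bar) ** 2 for zi, zb in zip(z, z_bar_i))
    within = sum(((zi - zb) ** 2).sum() for zi, zb in zip(z, z_bar_i))
    return (n - K) / (K - 1) * between / within
```

For a cross-check, `scipy.stats.levene` with `center='mean'` computes this same mean-based form of the statistic.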

The assumptions that must be met for the Levene tests to hold true include:

1. The samples from the populations under consideration are independent.

2. The populations under consideration are approximately normally distributed.

The first assumption is validated by ensuring that the samples used have been selected independently of one another. The second assumption can be checked using a set of side-by-side boxplots or Q-Q plots, constructed separately for each treatment to assess normality; any of the other standard tests of normality can also be used. It should be noted that at least one of the samples must have 3 or more observations, or else the Levene statistic will be undefined: when every sample has only 1 or 2 observations, the denominator equals zero.

The notation used in the Levene test is defined as follows:

The test concerns K populations, with samples $x_{i1}, x_{i2}, \ldots, x_{in_i}$ as given in (F 1.4) and (F 1.5).

  • K = number of sample sets used,
  • $x_{ij}$ = sample observation j from sample set i (j = 1, 2,…, $n_i$ & i = 1, 2,…, K),
  • $n_i$ = number of observations from treatment i (at least one $n_i$ must be 3 or more),
  • $n = \sum_{i=1}^{K} n_i$ = total number of data in all samples (overall size of combined samples),
  • $\bar{x}_i$ = mean of the sample data from treatment i,
  • $z_{ij} = |x_{ij} - \bar{x}_i|$ = the absolute deviation of observation j from the mean of treatment i,
  • $\bar{z}_{i\cdot}$ = the average of the $n_i$ absolute deviations from treatment i,
  • $\bar{z}_{\cdot\cdot}$ = the average value of all n absolute deviations.

The test procedure for Levene’s test[1] is:

Step 1: Validate that the assumptions (above) hold true.

Step 2: State the null and alternative hypotheses. The null hypothesis, $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_K^2$, is tested against the alternative, $H_a: \sigma_i^2 \neq \sigma_j^2$ for at least one pair $(i, j)$.

Step 3: Select a significance level α (a single fixed value of α has been used throughout this paper).

Step 4: Select the critical value and rejection region to be used.

Classical Approach: the critical value is $F_{1-\alpha;\,K-1,\,n-K}$, and the rejection region consists of all values of the test statistic at or above this critical value.

p-Value Approach: a critical value is not valid for this approach; instead, the null hypothesis is rejected whenever the p-value is less than or equal to α.
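The classical critical value can be obtained directly from the F distribution; a quick sketch using SciPy (the group count and sample size here are made-up values for illustration):

```python
from scipy.stats import f

K, n = 3, 30                        # assumed: 3 groups, 30 total observations
alpha = 0.05
dfn, dfd = K - 1, n - K
crit = f.ppf(1 - alpha, dfn, dfd)   # the 100(1 - alpha)th percentile
print(f"reject H0 when the Levene statistic exceeds {crit:.3f}")
```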

Step 5: Compute the Levene statistic. In this example, the Levene 1 test is given:

$W = \dfrac{(n-K)}{(K-1)} \cdot \dfrac{\sum_{i=1}^{K} n_i (\bar{z}_{i\cdot} - \bar{z}_{\cdot\cdot})^2}{\sum_{i=1}^{K} \sum_{j=1}^{n_i} (z_{ij} - \bar{z}_{i\cdot})^2}$ (F 1.9)

Step 6: If the value of the test statistic, W, lies within the rejection region, or if the p-value is less than or equal to α, we reject the null hypothesis, $H_0$. Otherwise, if neither condition is met, we fail to reject the null hypothesis, $H_0$.

Step 7: The final step is to state the conclusion:

Reject $H_0$: At the α level of significance, there is enough evidence to conclude that the population variances are not all equal.

Fail to reject $H_0$: At the α level of significance, there is not enough evidence to conclude that the population variances are not all equal.
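The seven steps above can be run end to end. A sketch using SciPy's implementation (`scipy.stats.levene` with `center='mean'` matches the mean-based Levene 1 form; the sample data below are invented for illustration):

```python
from scipy.stats import levene, f

# Invented example data: three independent samples (step 1 assumed checked)
a = [51.0, 52.5, 49.8, 50.7, 51.9]
b = [48.0, 55.0, 44.5, 58.2, 50.1]
c = [50.2, 50.9, 49.5, 51.1, 50.4]

alpha = 0.05                                     # step 3
K, n = 3, 15
crit = f.ppf(1 - alpha, K - 1, n - K)            # step 4: critical value
stat, p_value = levene(a, b, c, center='mean')   # step 5: Levene 1 statistic

# step 6: the classical and p-value approaches give the same decision
reject = stat > crit

# step 7: state the conclusion
print("reject H0" if reject else "fail to reject H0")
```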

Levene (1960) proposed four separate tests. Levene’s Tests 2 to 4 are defined below, with the Levene 1 test in the section above. It has been demonstrated (Brown & Forsythe, 1974) that the Levene test remains robust when the distribution is asymmetric and the deviations are taken from the group median rather than the group mean.

Levene 2

The Levene 2 test involves the application of the ANOVA procedure to a set of transformed observations, derived by taking the following $z_{ij}$ value:

clip_image042 (F 1.10)

Using the Levene 2 test, the null hypothesis (F 1.1) will be rejected when the Levene 2 statistic exceeds the $100(1-\alpha)$th percentile of an F-distribution with (K−1) and (n−K) degrees of freedom. In all Monte Carlo simulations conducted, the Levene 2 test showed low testing power.

Levene 3

The Levene 3 test is based on another set of transformed variables. In this case, the $z_{ij}$ value is given by:

clip_image046 (F 1.11)

As with the Levene 2 test, an ANOVA procedure is applied to the transformed variables. Using the Levene 3 test, the null hypothesis (F 1.1) will be rejected when the Levene 3 statistic exceeds the $100(1-\alpha)$th percentile of an F-distribution with (K−1) and (n−K) degrees of freedom.

Levene 4

The Levene 4 test is based on yet another set of transformed variables. In this case, the $z_{ij}$ value is given by:

clip_image048 (F 1.12)

Again, the Levene 4 test applies an ANOVA procedure to the transformed variables. Using the Levene 4 test, the null hypothesis (F 1.1) will be rejected when the Levene 4 statistic exceeds the $100(1-\alpha)$th percentile of an F-distribution with (K−1) and (n−K) degrees of freedom.
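Whatever transformation each variant applies to the deviations, the common machinery is the same: transform the absolute deviations, then run a one-way ANOVA on the transformed values. A sketch of that machinery (the `transform` argument stands in for the variant-specific $z_{ij}$ formulas in F 1.10 to F 1.12, which are not reproduced here; `np.sqrt` below is only an illustrative choice, not necessarily one of the paper's variants):

```python
import numpy as np

def anova_f(groups):
    """Plain one-way ANOVA F statistic for a list of 1-D arrays."""
    K = len(groups)
    n = sum(g.size for g in groups)
    grand = np.concatenate(groups).mean()
    between = sum(g.size * (g.mean() - grand) ** 2 for g in groups) / (K - 1)
    within = sum(((g - g.mean()) ** 2).sum() for g in groups) / (n - K)
    return between / within

def levene_variant(transform, *samples):
    """Apply `transform` to the absolute deviations from each group
    mean, then run the ANOVA on the transformed values."""
    z = [transform(np.abs(np.asarray(s, dtype=float) - np.mean(s)))
         for s in samples]
    return anova_f(z)

# e.g. an illustrative square-root transformation of the deviations
stat = levene_variant(np.sqrt, [1.0, 2.0, 3.0], [0.0, 10.0, 20.0])
```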

[1] N.B. the same steps apply for each of the Levene tests.

Friday, 11 December 2009

Disability statistics in Australia

The present state of disabilities in Australia is displayed below.

The majority of aids allow for full interaction within the community. The ageing population shows an interesting distribution in the data.

Of interest is the higher rate of disability among 5–14 year olds than among 15–29 year olds.

Derivations and Motivations

The following post will set out the derivations and the basis for the models presented in subsequent posts.

The hypothesis used when testing for homogeneity of variances is:

$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_K^2$ (F 2.1)

representing the null hypothesis to be tested, versus

$H_a: \sigma_i^2 \neq \sigma_j^2$ for at least one pair $(i, j)$ (F 2.2)

as the alternate hypothesis.

The null hypothesis for the test of homogeneity of variance asserts that the variance of the dependent variable is equal across the groups defined by the independent variable; that is, the variance is homogeneous. Where the probability (p-value) associated with the homogeneity test statistic is less than or equal to the level of significance being used, the null hypothesis is rejected and we conclude that the variance is not homogeneous. Conversely, where this probability is greater than the level of significance, the null hypothesis is not rejected and the variance is held to be homogeneous.


If we take $x_{ij}$ to be the jth value of the ith group, define $n_i$ as the size of group i and $N = \sum_{i=1}^{K} n_i$, and assume that the $x_{ij}$ are normally distributed and independent for all values of i and j, with mean $\mu_i$ and standard deviation $\sigma_i$, then we can obtain the best unbiased estimates of $\mu_i$ and $\sigma_i^2$. These are defined as follows:

$\mu_i$ is estimated by:

$\bar{x}_i = \dfrac{1}{n_i} \sum_{j=1}^{n_i} x_{ij}$ (F 2.3)

$\sigma_i^2$ is estimated by:

$s_i^2 = \dfrac{1}{n_i - 1} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$ (F 2.4)
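These two estimators correspond directly to the sample mean and the unbiased sample variance. For instance (the data values are arbitrary):

```python
import numpy as np

x_i = np.array([2.0, 4.0, 6.0])   # one treatment group (arbitrary data)
mu_hat = x_i.mean()               # (F 2.3): the sample mean
var_hat = x_i.var(ddof=1)         # (F 2.4): ddof=1 gives the n_i - 1 divisor
```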

The F test value (for the ANOVA F Test) is defined (Casella, 2002, p 534) as:

$F = \dfrac{\sum_{i=1}^{K} n_i (\bar{x}_i - \bar{x})^2 / (K-1)}{\sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 / (N-K)}$ (F 2.5)

and we reject $H_0$ when F exceeds the critical value $F_{1-\alpha;\,K-1,\,N-K}$.

In this calculation, it is generally assumed that the population variances are equal (homoscedasticity, or the homogeneity of variances). In this event, the value of the F statistic can be written in a simplified form as:

clip_image025 (F 2.6)

This results in a central F variable with (K−1) and (N−K) degrees of freedom. As a consequence, a suitable test of variances is needed to ensure that homogeneity of variances exists.
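(F 2.5) can be evaluated directly; a small sketch with hand-checkable numbers (the data are arbitrary):

```python
import numpy as np

groups = [np.array([1.0, 2.0, 3.0]), np.array([2.0, 3.0, 4.0])]
K = len(groups)
N = sum(g.size for g in groups)
grand = np.concatenate(groups).mean()

# numerator: between-group sum of squares over (K - 1)
msb = sum(g.size * (g.mean() - grand) ** 2 for g in groups) / (K - 1)
# denominator: within-group sum of squares over (N - K)
msw = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N - K)
F_stat = msb / msw
```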

Robustness of F- statistic under unequal variances

Figure 1 displays how the rejection probability decreases as the variation among the groups increases. As a consequence, the probability of correctly rejecting the null hypothesis when it should be rejected decreases as the variances of the datasets become more and more heterogeneous. Thus the capacity, or power, of the test to detect the difference decreases as the variances of the three groups become more heterogeneous. This implies that the F-test is not robust for datasets that have large heterogeneous variances.


Figure 1 Rejection probability as heteroscedasticity increases

The variances that we start with, and their increments, are arbitrary; we can expect the same result as long as we maintain some reasonable relation between the initial values of the means and standard deviations. The magnitude of the increments can also be arbitrary.

Similarly, we compare the rejection probabilities as the means and standard deviations are increased.

As the probability of rejection changes with heteroscedasticity, the F-ratio is non-robust with non-homogeneous datasets.
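The loss of power under heteroscedasticity can be checked with a small Monte Carlo sketch (the group sizes, means and standard deviations here are my own illustrative choices, not the settings used for Figure 1):

```python
import numpy as np
from scipy.stats import f_oneway

def rejection_rate(sigmas, means=(0.0, 1.0, 2.0), n=10,
                   reps=500, alpha=0.05, seed=0):
    """Estimate how often the ANOVA F test rejects H0 (equal means)
    when group i is drawn from N(means[i], sigmas[i]^2)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        samples = [rng.normal(m, s, n) for m, s in zip(means, sigmas)]
        if f_oneway(*samples).pvalue < alpha:
            hits += 1
    return hits / reps

power_homogeneous = rejection_rate(sigmas=(1.0, 1.0, 1.0))
power_heterogeneous = rejection_rate(sigmas=(1.0, 5.0, 10.0))
```

With the group means held fixed, inflating the variances should drive the estimated power down, matching the pattern described for Figure 1.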

Other Tests

In this series of posts, a number of alternatives to the ANOVA F test have been evaluated.

Bartlett χ2 test

Bartlett’s test is one of the most frequently taught tests of variance homogeneity (Conover et al., 1981). The ease of calculation and the general simplicity of many aspects of this test make it a staple of introductory statistics classes (Lim & Loh, 1996; Ott, 1998; Zar, 1999). The test statistic B compares the separate within-group sums of squares to the pooled within-group sum of squares. The test statistic is given by:

$B = \dfrac{(N-K)\ln s_p^2 - \sum_{i=1}^{K} (n_i - 1) \ln s_i^2}{C}$, where

$s_p^2 = \dfrac{1}{N-K} \sum_{i=1}^{K} (n_i - 1) s_i^2$ and

$s_i^2 = \dfrac{1}{n_i - 1} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$, such that:

$C = 1 + \dfrac{1}{3(K-1)} \left( \sum_{i=1}^{K} \dfrac{1}{n_i - 1} - \dfrac{1}{N-K} \right)$ (F 2.7)

Bartlett’s statistic (B) can generally be used to test the null hypothesis (F 2.1 for $H_0$) assuming that the distribution function F follows a standard normal cumulative distribution function (CDF). Bartlett (1946) demonstrated that where the variances are equal ($\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_K^2$), B approximately follows a $\chi^2_{K-1}$ distribution, with the approximation holding well even for small sample sizes. It was also demonstrated that where the null hypothesis holds, B converges in distribution to $\chi^2_{K-1}$. Consequently, the null hypothesis will be rejected where B exceeds the $100(1-\alpha)$th percentile of a chi-squared distribution with (K−1) degrees of freedom.
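A direct implementation of (F 2.7) can be cross-checked against `scipy.stats.bartlett`; this is a sketch assembled from the formulas above (the sample data are arbitrary):

```python
import numpy as np

def bartlett_B(*samples):
    """Bartlett's statistic B as given by (F 2.7)."""
    K = len(samples)
    n_i = np.array([len(s) for s in samples])
    N = n_i.sum()
    s2_i = np.array([np.var(s, ddof=1) for s in samples])   # group variances
    sp2 = ((n_i - 1) * s2_i).sum() / (N - K)                # pooled variance
    C = 1 + (np.sum(1 / (n_i - 1)) - 1 / (N - K)) / (3 * (K - 1))
    return ((N - K) * np.log(sp2) - ((n_i - 1) * np.log(s2_i)).sum()) / C

B = bartlett_B([1.0, 2.0, 3.0, 4.0],
               [2.0, 4.0, 6.0, 8.0],
               [1.0, 1.5, 2.0, 2.5])
```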