Citation
Comparisons among treatment means in an analysis of variance

Material Information

Title:
Comparisons among treatment means in an analysis of variance
Creator:
Agricultural Research Service, United States Department of Agriculture
Place of Publication:
Washington, D. C.
Publisher:
Agricultural Research Service, United States Department of Agriculture
Language:
English

Subjects

Subjects / Keywords:
Test ranges
Analysis of variance
Linear regression

Notes

General Note:
ARS/H/6

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida



COMPARISONS AMONG TREATMENT MEANS IN AN ANALYSIS OF VARIANCE


AGRICULTURAL RESEARCH SERVICE
UNITED STATES DEPARTMENT OF AGRICULTURE


HEADQUARTERS


ARS/H/6


















FOREWORD
That the analysis of variance is a powerful technique for testing hypotheses has been accepted for many years.
In analyzing a set of data, however, the scientist usually is interested in relationships between the means to which the
analysis of variance is insensitive.
As early as 1939, statisticians used techniques independent of the analysis of variance to compare means from a
given experiment. Since the middle 1950's, the interest and literature have increased almost exponentially.
In May 1957, Biometrical Services issued ARS 20-3, Mean Separation by the Functional Analysis of Variance
and Multiple Comparisons. This publication has been out of print for many years. Since the publication of ARS 20-
3, much work has been done on the subject, indicating the need for a major revision.
Since the job of coordinating the national aspects of statistical consulting in ARS was delegated to the Data
Systems Application Division (DSAD), we asked Victor Chew, mathematical statistician, to revise ARS 20-3. We
feel that he has done a very thorough job, which should put mean separation techniques in the appropriate frame of
reference with respect to other statistical techniques that may be used in drawing judgments from data.
Copies of this publication may be obtained from Victor Chew, University of Florida, Room 217, Rolfs Hall,
Gainesville, Florida 32611.
Judson U. McGuire, Jr.
Staff Specialist, DSAD-ARS













PREFACE


The equality of the true average responses of two treatments (varieties, insecticides, concentrations,
temperatures, etc.) usually is tested statistically by the Student's t-test. This is generalized for t (three or
more) treatments by the F-test or the analysis of variance. If the F-test rejects the hypothesis that the t
treatment means are equal, the only conclusion is that the t means are not all equal. It does not necessarily
follow that these t means are all unequal although this may well be true. The next stage in the data analysis is
to determine which treatment means are different. Repeated application of the Student's t-test to all possible
pairs of treatment means (using pooled error either from all t samples or only from the two samples involved
in the t-test) usually is discouraged since this procedure gives a large probability of getting one or more false
positives (that is, of declaring two treatment means to be different, when they are, in fact, equal). Special
techniques (called multiple comparison procedures) are available for this purpose.
Uses and abuses of multiple comparison procedures are discussed in this publication. One glaring abuse
is their use in comparing several levels of a quantitative factor (such as concentration, temperature, and pH).
Regression analysis is the appropriate technique here. Equivalently, the treatment sum of squares in the
analysis of variance table should be partitioned into linear, quadratic, etc., components. In comparing the
effects of, say, 10, 20, 30, and 40 p/m of a certain chemical, if the regression of the response on concentration or
if any component of the sum of squares for concentrations is significant, then no multiple comparison
procedure is necessary. ALL concentrations are significantly different in their effects. In fact, not only will 10
and 20 p/m be different, but so also will 10 and 10.1 p/m. The difference, of course, between the effects of 10
p/m and 10.1 p/m will be extremely small. However, the usual statistical test of significance is not concerned
with the magnitude of the difference, but only whether a true difference exists, no matter how small.


Washington, D.C.                                                Issued October 1977









CONTENTS

Page
Chapter 1. Introduction ----- 1
Chapter 2. Partitioning of Degrees of Freedom for Treatments ----- 2
    2.1 Orthogonal Contrasts ----- 3
    2.2 Qualitative Factors ----- 4
    2.3 Quantitative Factors ----- 7
        2.3.1 One Factor ----- 7
        2.3.2 Two or More Factors ----- 11
    2.4 Mixed Factors ----- 13
Chapter 3. Multiple Comparison Procedures ----- 15
    3.1 Error Rates ----- 15
    3.2 Fisher's Protected and Unprotected LSD Methods ----- 16
    3.3 Newman-Keuls' Multiple Range Test ----- 17
    3.4 Tukey's HSD Method and Multiple Range Test ----- 18
    3.5 Scheffé's Method ----- 19
    3.6 Duncan's Methods ----- 20
        3.6.1 Multiple Range Test ----- 20
        3.6.2 Bayesian k-ratio t (LSD) Rule ----- 22
    3.7 Studentized Maximum Modulus Procedure ----- 24
    3.8 Comparisons Against a Control ----- 24
        3.8.1 Dunnett's Method ----- 24
        3.8.2 Gupta and Sobel's Method ----- 25
        3.8.3 Williams' Method ----- 26
        3.8.4 Sequential Methods ----- 27
    3.9 Miscellaneous Methods ----- 27
        3.9.1 Bonferroni Procedure for Preselected Contrasts ----- 27
        3.9.2 Gabriel's Simultaneous Test Procedure (STP) ----- 27
        3.9.3 Kurtz-Link-Tukey-Wallace Procedure ----- 28
        3.9.4 Covariance Adjusted Means ----- 28
        3.9.5 Procedures for Two-Way Interactions ----- 28
        3.9.6 Nonparametric Methods ----- 28
        3.9.7 Gupta's Random Subset Selection Procedure ----- 29
        3.9.8 Scott and Knott's Cluster Analysis Method ----- 29
        3.9.9 Multivariate Populations ----- 30
        3.9.10 Subset Selection Approach to Multiple Comparisons ----- 31
        3.9.11 Other Parameters and Populations ----- 31
Chapter 4. Conclusion ----- 32
Tables
    A. Two-Sided 100α% Points of Student's t-Distribution With ν Degrees of Freedom ----- 36
    B. Percentage Points of the Studentized Range q(α;p,ν) ----- 37
    C. Critical Values for Duncan's Multiple Range Test ----- 45
    D1. Critical Values of the k-ratio t test (k = 100) ----- 49
    D2. Critical Values of the k-ratio t test (k = 500) ----- 52
    E. 100γ% Points of the Distribution of the Largest Absolute Value of k Uncorrelated Student t Variates With ν Degrees of Freedom ----- 54
    F1. Critical Values of t(α;q,ν) for One-Sided Dunnett's Tests for Comparing Control Against Each of q Other Treatments ----- 55
    F2. Critical Values of t(α;q,ν) for Two-Sided Dunnett's Tests for Comparing Control Against Each of q Other Treatments ----- 56
    G. Critical Values of t(α;p,ν) for Testing Zero Against Nonzero Dose Levels ----- 57
List of References ----- 59














COMPARISONS AMONG TREATMENT MEANS
IN AN ANALYSIS OF VARIANCE
By Victor Chew¹

CHAPTER 1. INTRODUCTION

Before embarking on an experimental project, the research scientist should carefully consider various
issues. These issues include questions that the experiment hopefully will answer, the factors or variables to
be controlled or kept constant during the experiment, the levels of the factors to be varied in the study, the
number of observations to be taken, and the manner in which these observations will be grouped into blocks.
We shall need fewer observations or have wider applicability of the results, or both, if the experiment is
designed efficiently.
This publication is concerned with a particular facet of the analysis of the experimental data, assuming
that the experiment has been designed properly. It is applicable irrespective of the experimental design
(completely randomized, randomized blocks, Latin square, split plot, etc.). We also shall assume that the
reader is familiar with the computational aspects of the analysis of variance for these designs.
The basic terms and notions in statistical inference will be reviewed in this chapter. This is necessary to
understand the relative merits of multiple comparison procedures that are currently available.
In the simplest hypothesis testing situation, we compare two treatments (varieties of peanuts, fertilizers,
temperatures, pH, machine settings, etc.). If we denote the true means of the two treatments by μ₁ and μ₂, the
statistical hypothesis to be tested is usually that these two means are equal (μ₁ = μ₂). This hypothesis, called
the null hypothesis, often is denoted by H₀. We write it as H₀: (μ₁ − μ₂) = 0. (We can test a more general
hypothesis, viz., (μ₁ − μ₂) = d, where d is specified numerically.)
In classical hypothesis testing, we must decide whether to accept or to reject H₀. (In sequential testing,
we allow a third alternative of requiring more observations to be taken.) Because the true or population
means μ₁ and μ₂ are unknown and unknowable, our decision from the statistical test (whether to accept or
reject H₀) is subject to error. If ȳ₁ and ȳ₂ are the observed or sample means, estimating μ₁ and μ₂
respectively, then because of nonhomogeneity of the experimental material (such as plants, animals, plots of
land, batches of peanuts), failure to reproduce identical experimental conditions, errors of measurement,
etc., ȳ₁ and ȳ₂ will be unequal even if μ₁ and μ₂ are equal. In fact, we may even have ȳ₁ larger than ȳ₂ when
actually μ₁ is smaller than μ₂, especially in a small experiment.
There are two kinds of error in hypothesis testing:
Type I: Reject H₀ when H₀ is, in fact, true (i.e., erroneously deciding that μ₁ and μ₂ are unequal).
Type II: Accept H₀ when H₀ is, in fact, false (i.e., incorrectly deciding that μ₁ and μ₂ are equal).
The probabilities of a test making these errors usually are denoted by α and β, respectively. The perfect test
is, of course, infallible (α = β = 0), but this is impossible with a finite sample. A good test is one
in which both α and β are small. The value of α is called the significance level of the test, sometimes expressed
as a percentage. By suitably choosing the rejection region or critical values for the test statistic, we can make
α as small as we like, but only at the expense of increasing β. For example, we can make α = 0 by always
accepting H₀, regardless of the experimental data, but in this case β = 1. The only way to decrease both α and
β simultaneously is to increase the sample size (number of observations). Conventionally, α is taken to be
equal to .05 or .01. With β defined as the probability of accepting H₀ when H₀ is false, (1 − β) is the probability
¹ Mathematical statistician, Biometrical and Statistical Services, Agricultural Research Service, U.S. Department of Agriculture,
217 Rolfs Hall, University of Florida, Gainesville, Fla. 32611.






of rejecting H₀ when H₀ is false. This quantity is called the power of the test: the probability that the test will
detect a difference when one exists. There are infinitely many tests with the same value of α; among these, we
choose the most powerful one (for which β is least) if one exists.
If H₀ is false, some alternative hypothesis (denoted by Hₐ) is true. Corresponding to H₀: (μ₁ − μ₂) = 0,
three possible alternative hypotheses are (μ₁ − μ₂) > 0, (μ₁ − μ₂) < 0, and (μ₁ − μ₂) ≠ 0, called the right-tail,
left-tail, and two-tail alternative hypotheses, respectively. If the first treatment is "control" (i.e., no
treatment at all), the second treatment is the application of some insecticide, and the response being
measured is the number of a particular insect per plant, we know a priori that the alternative to H₀: μ₁ = μ₂ is
Hₐ: μ₁ > μ₂, because the application of the insecticide cannot possibly increase the average count. By
capitalizing on the one-sidedness of Hₐ, we can construct a more powerful test of H₀, with the same α. If we
are comparing two new insecticides, the alternative hypothesis is two-sided.
It will be seen that α is associated with H₀ and β with Hₐ. This explains why we can control α but not β.
We need the actual difference between the two means to control β. For this reason, experimenters too often
ignore Type II errors. If they are only concerned with holding Type I errors down to 5%, they need not
conduct the experiment at all. They merely need to take 20 index cards, mark one with an X, shuffle them
thoroughly, and draw one card at random. Reject H₀ if the marked card is drawn. At a saving of hundreds if
not thousands of dollars, this experimenter has only a 5% chance of making a Type I error. The reader should
think about the value of β in this case.
We cannot emphasize strongly enough the distinction between statistical and practical significance. Any
difference between the sample means ȳ₁ and ȳ₂, no matter how small, must be declared statistically
significant if the population or true means μ₁ and μ₂ are unequal, unless the test has committed a Type II
error (incorrectly declaring two means equal). The test will declare the difference significant if we have
enough replications. In calculating the number n of observations to be taken, we only should require n to be
large enough so that the test will detect a difference of at least d (of practical significance) between μ₁ and μ₂.
It is no big loss to declare incorrectly that μ₁ and μ₂ are equal if they differ by an insignificant amount.
The author thinks that the research worker has been oversold on hypothesis testing. Just as no two peas
in a pod are identical, no two treatment means will be exactly equal. They always will be different, even if only
in the thousandth decimal place. It seems ridiculous, therefore, to test a hypothesis that we know a priori is
almost certain to be false. If the test accepts the hypothesis of equal treatments, a Type II error probably has
occurred. A related but much more informative alternative approach is interval estimation of (μ₁ − μ₂). The
confidence limits, of the form (ȳ₁ − ȳ₂) ± c, will tell us whether the null hypothesis will be accepted (if the
limits have different signs) or rejected (if they have the same sign). They also will give the estimated
magnitude of the actual difference. The value of c depends, among other things, on the confidence level γ. If γ
= 0.95, we have 95% confidence that (μ₁ − μ₂) is between (ȳ₁ − ȳ₂ − c) and (ȳ₁ − ȳ₂ + c). The closer γ is to
unity, the wider the confidence interval. For a given γ, we can shorten the interval by increasing the sample
size.
The practice of hypothesis testing when comparing several treatments is even more difficult to justify.
When comparing 10 new varieties of corn, for example, it is inconceivable that all the true average yields will
be exactly equal. Besides a simultaneous confidence interval approach for all pairs of varieties, a better
objective may be to select the smallest subgroup that has a preassigned probability (95%, say) of including the
highest yielding variety. This subgroup of varieties may be tested more intensively and compared in a later
experiment, as in the screening of new drugs.

CHAPTER 2. PARTITIONING OF DEGREES OF FREEDOM FOR TREATMENTS

This chapter deals with situations in which it is possible, before performing the experiment, to partition
the degrees of freedom (d.f.) for treatments, either completely into single d.f. or partially into groups of d.f.
Partitioning must not be suggested after examination of the experimental data. LeClerg (1957)² referred
this partitioning as "functional analysis of variance." Use of a multiple comparison procedure in this chapter
(with a couple of exceptions, explicitly stated) constitutes an abuse of the technique. If the difference between


² The year in parentheses following the author's name refers to the List of References, p. 59.









the observed average responses of two treatments is statistically significant, we shall simply say that the two
treatments are different.
In this chapter, a significant F-test for treatments is not a prerequisite for the partitioning of the
treatments d.f. or s.s. (sum of squares). In fact, the F-test need not and should not be carried out at all. In
comparing t treatments, with (t − 1) d.f., the blanket or overall F-test for treatments is averaged over (t − 1)
orthogonal comparisons (defined later). If only one or two of these comparisons (or contrasts) are significant,
the overall F-test is diluted or weakened by the (t − 2) or (t − 3) nonsignificant contrasts and erroneously may
give a nonsignificant F value.

2.1 Orthogonal Contrasts

Let ȳ₁, ȳ₂, ..., ȳₜ and T₁, T₂, ..., Tₜ be the sample means and totals from Treatments 1, 2, ..., t,
respectively. (Unless otherwise stated, we shall assume that the treatments are equally replicated. If n is the
common number of replicates per treatment, we have ȳᵢ = Tᵢ/n.) The expression Σaᵢȳᵢ = (a₁ȳ₁ + ... + aₜȳₜ) is
called a linear combination of the treatment means. A linear combination is called a comparison or a contrast
if the coefficients (the aᵢ's) add up to zero. For example, if we have t = 4 treatments, ȳ₁ − (ȳ₂ + ȳ₃ + ȳ₄) is a
linear combination of the treatment means. It is not a contrast, however, since the sum of the coefficients is
nonzero. (It is equal to −2.) This linear combination compares the mean of the first treatment with the sum of
the means of the remaining three treatments, which is not a fair comparison according to the ordinary
meaning of "fair." A fair comparison is to compare ȳ₁ with the average of the means of the remaining three
treatments, given by ȳ₁ − (ȳ₂ + ȳ₃ + ȳ₄)/3, which is now also a contrast since the coefficients add up to zero. To
avoid fractional coefficients, the preceding contrast usually is written 3ȳ₁ − (ȳ₂ + ȳ₃ + ȳ₄).
The sum of squares corresponding to a contrast C = Σaᵢȳᵢ is
s.s.(C) = n(Σaᵢȳᵢ)²/(Σaᵢ²) = (ΣaᵢTᵢ)²/[n(Σaᵢ²)], (2.1)
where Σaᵢ² is the sum of the squares of the coefficients in the contrast. (Notice that the s.s. is unchanged if we
multiply the coefficients by a constant.) Since a contrast has one d.f., the s.s. is also a mean square (m.s.)
because (m.s.) = (s.s.)/(d.f.). It may be tested for significance by dividing it by the error m.s. (with m d.f.,
say) that normally would be used to make the overall test for treatments in the analysis of variance. The
calculated ratio is compared with the critical value of the F-distribution with 1 and m d.f.
If we are comparing t = 4 treatments in a completely randomized experiment with n = 3 replicates per
treatment, the d.f. for the error m.s. is m = t(n − 1) = 8. In a 5% two-tail test, the critical value of the
F-distribution with 1 and 8 d.f. is 5.32. If a one-tail test is justifiable (as, for example, if in the contrast 3ȳ₁ −
(ȳ₂ + ȳ₃ + ȳ₄) the first treatment is control and the other treatments are three types of insecticides), the 5%
critical value is only 3.46. Since a smaller critical value is easier to exceed, a significant difference is easier to
declare in a one-tail test. Consequently, the test is less likely to commit a Type II error (failure to declare a
difference when one exists).
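
To make Equation (2.1) and the F test just described concrete, here is a minimal sketch in Python (not part of the original publication); the treatment means and error mean square are invented for illustration, with t = 4 treatments and n = 3 replicates as in the text.

```python
import numpy as np
from scipy import stats

# Contrast sum of squares, Equation (2.1): ss(C) = n*(sum a_i*ybar_i)^2 / sum(a_i^2)
a = np.array([3, -1, -1, -1])             # contrast: treatment 1 vs. mean of the rest
ybar = np.array([12.0, 9.5, 10.1, 9.8])   # hypothetical treatment means
n, ms_error, df_error = 3, 1.4, 8         # hypothetical error m.s., m = t(n - 1) = 8 d.f.

ss_contrast = n * np.dot(a, ybar) ** 2 / np.sum(a ** 2)   # one d.f., so also the m.s.
F = ss_contrast / ms_error
print(F, stats.f.sf(F, 1, df_error))   # compare F with 5.32, the 5% "two-tail" value
```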
Two contrasts, C₁ = Σaᵢȳᵢ and C₂ = Σbᵢȳᵢ, are said to be orthogonal if Σaᵢbᵢ = 0 (i.e., if the sum of the
products of the corresponding coefficients in the two contrasts is zero). A set of contrasts is said to be
mutually orthogonal if all pairs of contrasts in the set are orthogonal. If, for brevity, we write (a₁ȳ₁ + a₂ȳ₂ +
... + aₜȳₜ) as (a₁, a₂, ..., aₜ), the three contrasts (1, 1, -1, -1), (1, -1, -1, 1), and (1, -1, 1, -1) are
mutually orthogonal. It can be proved that there are only (t − 1) mutually orthogonal contrasts among t
means; however, there are infinitely many such sets of mutually orthogonal contrasts.
The following are another two sets of mutually orthogonal contrasts: (1, 1, -1, -1), (1, -1, 0, 0), (0, 0, 1,
-1), and (3, -1, -1, -1), (0, 2, -1, -1), (0, 0, 1, -1). It also can be proved that if C₁, C₂, ..., Cₜ₋₁ are (t − 1)
mutually orthogonal contrasts, their individual sums of squares add up exactly to the treatments s.s. The
statistical distributions of these contrasts are independent. This is one reason why, whenever possible, we
should aim for an orthogonal decomposition of the treatments d.f. Of the possible sets of mutually orthogonal
contrasts, the experimenter should choose the set that is most interesting or most relevant to his study.
Mutual orthogonality is desirable but not absolutely essential. If several contrasts interest the scientist, he
should not let the lack of mutual orthogonality prevent him from performing the statistical tests, as long as
these contrasts have not been suggested by the data. Contrasts suggested after data snooping should be
tested by a multiple comparison procedure.
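
As a quick computational check (a sketch, not in the original), the last set of contrasts listed above can be verified to be mutually orthogonal by confirming that every pairwise sum of products of coefficients is zero:

```python
import numpy as np

# Mutual orthogonality: the sum of products of corresponding coefficients is
# zero for every pair (equal replication assumed).
contrasts = np.array([[3, -1, -1, -1],
                      [0,  2, -1, -1],
                      [0,  0,  1, -1]])
for i in range(len(contrasts)):
    for j in range(i + 1, len(contrasts)):
        print(i, j, np.dot(contrasts[i], contrasts[j]))   # all zero
```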






2.2. Qualitative Factors


Experimental variables or factors may be divided into qualitative and quantitative factors. Examples of
qualitative factors are varieties (peanuts, corn, etc.), types (soils, fungicides, etc.), locations, and methods of
chemical analysis or of counting bacteria. Examples of quantitative factors are temperature,
humidity, pH, concentration, and several levels of a fertilizer. Although the various varieties or soil types in
an experiment also are referred to as the levels of the factors "varieties" and "soil types," no meaningful
numerical values can be assigned to the levels of a qualitative factor. Levels of a quantitative variable are, of
course, naturally numerical.
Factorial experiments are those in which the treatments are made up of all possible combinations of the
levels of two or more factors (qualitative or quantitative). (The term "factorial" thus merely describes the
nature of the treatments and not the design of the experiment, which may be completely randomized,
randomized block, Latin square, split-plot, etc.) The simplest factorial is the 2² or 2 x 2 experiment, with two
factors A and B, each at two levels. For the 2 x 2 factorial, the partitioning of the d.f. for treatments is the
same whether the two factors are both qualitative or quantitative, or one of each kind. The two levels are
designated generally as H (high) or L (low). The low level, in particular, may be zero. For a qualitative factor,
we may arbitrarily label one level H and the other L. The four treatments are denoted by (1), a, b, and ab,
where absence of a letter implies that the corresponding factor is at the low level; and (1) is a special symbol
for the treatment where both factors are at the low level. These four treatments could have been more
explicitly but awkwardly denoted by ALBL, AHBL, ALBH, and AHBH, respectively.
The three d.f. for treatments are partitioned into the main effect of A, the main effect of B, and the A x B
interaction. The coefficients for these contrasts are as follows:

Treatments
Contrasts      (1)    a     b    ab
C₁             -1     1    -1     1    Main effect of A
C₂             -1    -1     1     1    Main effect of B
C₃              1    -1    -1     1    Interaction of A and B

The coefficients for the main effect of A are +1 for treatments where A is at the high level and -1 where A is at the
low level; and similarly for B. The coefficients for interaction are obtained by multiplying corresponding
coefficients for the main effects. To get the sums of squares for the preceding contrasts, we apply Equation (2.1)
to the four treatment means or totals, using the coefficients for each contrast in turn.
The difference [a − (1)] is called the simple effect of A at the low level of B; similarly, (ab − b) is the simple
effect of A at the high level of B. The main effect of A is the average of the simple effects of A; to avoid
fractions, the coefficients for this average have been multiplied by two. (The reader will recall that a
contrast is unchanged if the coefficients are multiplied by a common number.)
If the factors A and B act independently, the two simple effects of A should be about equal.
(Experimental or random errors will prevent them from being exactly equal.) Therefore, their difference
(ab − b) − [a − (1)] = ab + (1) − a − b = C₃
should be approximately zero if A and B are independent. If this quantity is large (significantly different from
zero), we say that there is interaction between A and B (i.e., the effect of A at the low level of B is different from
the effect of A at the high level of B). We also can write C₃ as (ab − a) − [b − (1)] = (effect of B at high level of A)
− (effect of B at low level of A), so that if the effect of A depends on the level of B, we know that the effect of B
depends on the level of A.
The following artificial two-way tables of means show some possible results of the tests for main effects
and interaction. In (d), for example, the simple effect of A is 10 units at low B and 20 units at high B, indicating
dependence of the effect of A on the level of B, or interaction between A and B.












(a)
                 A
           Low   High   Average
B Low       10     20      15
B High      12     24      18
Average     11     22

A sig.; B not sig.; A x B not sig.

(b)
                 A
           Low   High   Average
B Low       10     20      15
B High      22     34      28
Average     16     27

A sig.; B sig.; A x B not sig.

(c)
                 A
           Low   High   Average
B Low       10     20      15
B High       6     26      16
Average      8     23

A sig.; B not sig.; A x B sig.

(d)
                 A
           Low   High   Average
B Low       10     20      15
B High      18     38      28
Average     14     29

A sig.; B sig.; A x B sig.

In general, a two-factor experiment is a p x q factorial. The (pq − 1) d.f. for treatments will be
partitioned into main effects of A with (p − 1) d.f., main effects of B with (q − 1) d.f., and interaction with (p
− 1)(q − 1) d.f. The A x B interaction is more difficult to illustrate if p and q are greater than two, but the
interpretation is similar to that in the 2 x 2 factorial; viz., differences among levels of A depend on the levels of
B, and vice versa. If the p levels of A are such that orthogonal contrasts are possible, the (p − 1) d.f. for the
main effects of A should be partitioned further into single d.f. If it is impossible to partition the (p − 1) d.f. for
A, then it is legitimate to use a multiple comparison procedure to compare the p levels of A.
Testing the main effects of A presupposes that there is no A x B interaction. If interaction exists, the
differences among the levels of A depend on the level of B. It does not make much sense to compare the levels
of A averaged over all levels of B, which is what the main effect is. It is more instructive to compare the levels of A






for each level of B separately, and vice versa, using the pooled error mean square from the complete
experiment, if the assumption of homogeneous variances is valid.
With three factors, the simplest is a 2³ or 2 x 2 x 2 factorial. The eight treatments may be denoted by (1),
a, b, ab, c, ac, bc, abc, in an obvious extension of the previous notation, where, for example, ac stands for the
treatment with factors A and C at their high levels and B at the low level. The seven d.f. for treatments will be
partitioned into main effects (A, B, C), two-factor (or first order) interactions (A x B, A x C, B x C), and the
three-factor (or second order) interaction (A x B x C), each with a single d.f. Second and higher order
interactions are difficult to interpret. The A x B x C interaction is the interaction of (A x B) and C. If the A x B
x C interaction is significant, the A x B interaction at the high level of C is different from that at the low level
of C. The coefficients for the following contrasts are obtained as in the 2 x 2 factorial experiment.

Treatments
             (1)    a    b   ab    c   ac   bc  abc
A            -1     1   -1    1   -1    1   -1    1
B            -1    -1    1    1   -1   -1    1    1
A x B         1    -1   -1    1    1   -1   -1    1
C            -1    -1   -1   -1    1    1    1    1
A x C         1    -1    1   -1   -1    1   -1    1
B x C         1     1   -1   -1   -1   -1    1    1
A x B x C    -1     1    1   -1    1   -1   -1    1
The 2 x 2 x 2 factorial can be generalized to the p x q x r factorial (three factors A, B, and C, with p, q,
and r levels, respectively), to the 2ᵖ factorial (p factors, each at two levels), and to the p₁ x p₂ x ... x pᵣ factorial (r
factors with p₁, p₂, ..., pᵣ levels). The total number of treatment combinations increases rapidly with an
increasing number of factors. With six factors, even if each is at two levels, we require 2⁶ = 64 experimental
units per replicate. Besides the 6 main effects, there will be 15 two-factor, 20 three-factor, 15 four-factor, 6
five-factor, and 1 six-factor interactions. If we can assume that high order interactions (four-factor or higher,
say) do not exist, as is usually true, we may pool these interactions for use as the error mean square so that we do
not need to replicate. In fact, a single replicate already may be too large an experiment, and our resources
may allow us to carry out only a portion of the full factorial experiment. So-called fractional factorial
experiments are available for this purpose. They are discussed in Davies (1956), Cochran and Cox (1957),
Peng (1967), John (1971), and Anderson and McLean (1974).
The following example, taken from Little and Hills (1972), shows the partitioning of treatments d.f. to
give meaningful single d.f. contrasts. The effects of six sources of nitrogen on the yield of sugar beets were
compared: Control (1), urea (2), ammonium sulfate (3), ammonium nitrate (4), calcium nitrate (5), and sodium
nitrate (6).

Treatments
Contrasts    1    2    3    4    5    6
C₁          -5    1    1    1    1    1    Nitrogen vs. no nitrogen
C₂           0   -4    1    1    1    1    Organic vs. inorganic nitrogen
C₃           0    0   -1   -1    1    1    Ammonium vs. nitrate nitrogen
C₄           0    0   -1    1    0    0    Ammonium nitrate vs. ammonium sulfate
C₅           0    0    0    0   -1    1    Calcium vs. sodium nitrate

The reader should check the mutual orthogonality of the contrasts. Note that the interpretation of Contrast
C₃ is not quite right since Treatment 4 contains both ammonium and nitrate nitrogen.
An interesting factorial experiment was conducted by Dr. Ralph Segall at the U.S. Horticultural
Research Laboratory in Orlando, Fla. He studied the effects of 10 fertilizer treatments on the incidence of
postharvest bacterial soft-rot of tomato fruits. The 10 treatments (all of which had 18-0-25) initially may be
regarded as a 2 x 5 factorial (mulching at two levels and "additives" at five levels). The five additives are
made up of a control and four chemicals. The four chemicals are in the form of a 2 x 2 factorial (2 anions and 2
cations). We have shown the coefficients for only five mutually orthogonal contrasts. The remaining four
contrasts are the interactions between C₁ and each of C₂, C₃, C₄, and C₅. The reader may interpret the
contrasts C₁, ..., C₅ and the interactions between C₁ and each of C₂, ..., C₅.








                                             Contrasts
Treatments                              C₁   C₂   C₃   C₄   C₅
Mulched beds:
  Control (1)                            1   -4    0    0    0
  Calcium nitrate (2)                    1    1    1    1    1
  Calcium chloride (3)                   1    1    1   -1   -1
  Potassium nitrate (4)                  1    1   -1    1   -1
  Potassium chloride (5)                 1    1   -1   -1    1
Nonmulched beds:
  Control (6)                           -1   -4    0    0    0
  Calcium nitrate (7)                   -1    1    1    1    1
  Calcium chloride (8)                  -1    1    1   -1   -1
  Potassium nitrate (9)                 -1    1   -1    1   -1
  Potassium chloride (10)               -1    1   -1   -1    1
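
The four remaining contrasts can be generated as elementwise products of C₁ with each of C₂, ..., C₅, and the full set of nine checked for mutual orthogonality; a sketch (not in the original; the comments on C₃ through C₅ are one plausible reading of the design):

```python
import numpy as np

# Coefficients in treatment order (1) through (10), from the table above.
C1 = np.array([1, 1, 1, 1, 1, -1, -1, -1, -1, -1])   # mulched vs. nonmulched
C2 = np.array([-4, 1, 1, 1, 1, -4, 1, 1, 1, 1])      # control vs. four chemicals
C3 = np.array([0, 1, 1, -1, -1, 0, 1, 1, -1, -1])    # calcium vs. potassium (cations)
C4 = np.array([0, 1, -1, 1, -1, 0, 1, -1, 1, -1])    # nitrate vs. chloride (anions)
C5 = np.array([0, 1, -1, -1, 1, 0, 1, -1, -1, 1])    # cation x anion interaction

full_set = [C1, C2, C3, C4, C5] + [C1 * C for C in (C2, C3, C4, C5)]
for i in range(len(full_set)):
    for j in range(i + 1, len(full_set)):
        assert np.dot(full_set[i], full_set[j]) == 0   # all nine mutually orthogonal
print("all nine contrasts are mutually orthogonal")
```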

There may be situations in which it is justifiable to apply a multiple comparison procedure to compare
factorial treatments. For example, suppose a farmer is interested in growing one of three types of grasses and
using one of four types of fertilizers. The farmer is not interested in the scientific comparison of yields from
the three varieties of grasses or types of fertilizers. He is only interested in maximizing his profit. If the
commercial values of the three grasses and the costs of the four fertilizers are different, analyzing the profit
(in dollars and cents) per plot is more relevant than analyzing yields per plot. The 12 treatments (combina-
tions of grasses and fertilizers) may be compared for profitability, using a multiple comparison procedure and
ignoring their factorial nature.
At a panel discussion sponsored by the Data Systems Application Division, Agricultural Research
Service, during the joint meeting of the statistical societies in Atlanta in August 1975, two panel members
(Dr. David B. Duncan and Dr. John W. Tukey) said they might condone multiple comparisons of individual
factorial treatments (from qualitative factors) if the main effects were not significant (Duncan) or if their F
ratios were less than two (Tukey).


2.3. Quantitative Factors

2.3.1. One Factor
With a quantitative factor (e.g., temperature, pressure, humidity, pH, and concentration or levels of a
fertilizer), regression analysis or curve fitting is the most appropriate technique. The treatments d.f. and s.s.
should be partitioned into components due to linear (first degree) regression, quadratic (second degree)
regression, cubic (third degree) regression, and so forth. If enough theoretical knowledge exists to specify
the mathematical form of the relationship between the response y and the experimental variable x (e.g.,
logistic, Mitscherlich's law, Gompertz's law, von Bertalanffy's curve, etc.), this equation should be fitted to
the data. In most (if not all) agricultural experimentation, however, the mathematical relationship between
the response and the so-called independent variable is so complex that it defies specification. Therefore, we
must approximate the unknown mathematical relationship by means of a polynomial of the form y = b₀ + b₁x
+ b₂x² + ... + b_d xᵈ. Within a limited range of the independent variable, a polynomial approximation is
usually satisfactory, unless the response levels off in the experimental range of x, in which case an
asymptotic curve should be fitted.
Table 1 shows the analysis of variance of a randomized block experiment with b replicates or blocks, t
treatments (levels of a quantitative factor), and m measurements per plot (experimental unit), with partition-
ing of the treatments d.f. and s.s. into linear and quadratic components. With the general availability of
computer programs, it is not difficult to fit a polynomial of a degree higher than quadratic. The ratio
ms(dr)/ms(e) provides a test for the statistical significance of the combined contributions from the higher
order polynomials, sometimes called a test of the lack of fit of the fitted model (in this case quadratic). If the
quadratic is sufficient, this ratio has the F-distribution with (t − 3) and (b − 1)(t − 1) d.f. (For testing, the
author generally recommends the use of ms(e) rather than ms(s) as the error term since the latter does not
represent true replications. If b = 1, we are forced to use ms(s) as the error term, but this is dangerous since
ms(s) may seriously underestimate ms(e) and it will then be easy to get a spuriously significant result.)
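
For readers computing by machine, the lack-of-fit test just described amounts to one F ratio; a minimal sketch with invented placeholder numbers:

```python
from scipy import stats

# Lack-of-fit test for the quadratic model (Table 1 layout):
# ms(dr)/ms(e) is referred to F with (t - 3) and (b - 1)(t - 1) d.f.
t, b = 6, 4                       # hypothetical numbers of levels and blocks
ss_dev, ms_error = 9.3, 2.1       # hypothetical deviations s.s. and error m.s.
df_dev, df_error = t - 3, (b - 1) * (t - 1)
F = (ss_dev / df_dev) / ms_error
print(F, stats.f.sf(F, df_dev, df_error))   # large p suggests the quadratic fits
```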








If the quadratic term is statistically significant but its s.s. is only a small part of the treatments s.s., we
may prefer to fit a linear trend only since the curvature of the response curve is only slight. We may be able to
predict the response y better (i.e., with a smaller mean squared error of prediction) by using a straight line
rather than a quadratic, even if the true response curve is a quadratic function. The curvature, however,
must be slight. This comes about through having to estimate fewer parameters (constants of the response
function) in linear regression. A straight line is also easier to use than a parabolic curve.
In comparing the effects of, say, 10, 20, 30, and 40 p/m of a certain chemical, if the linear or quadratic
regression of response on concentration is significant, or both are significant, no multiple comparison
procedure is necessary. All concentrations are significantly different in their effects. In fact, even 10 and 10.1
p/m also will be different. Of course, the difference between the effects of 10 and 10.1 p/m will be extremely
small. The usual significance test is not concerned with the magnitude of the difference, however. It is only
concerned about whether a true difference exists, no matter how small.
We have the following possible results with one factor:


[Figure: four sketches of the response y against concentration x (10, 20, 30, 40 p/m):
(a) LR not significant, QR not significant; (b) LR significant, QR not significant;
(c) LR significant, QR significant; (d) LR not significant, QR significant, with the
curve peaking at x*, the level giving the maximum response y*.
LR = linear regression; QR = quadratic regression; S = significant; NS = not significant.]

In (a), all treatments (infinitely many between 10 and 40 p/m) are the same, while in (b) and (c) all treatments
are different. In (d), all treatments less than x* (the value of x that will maximize y) are different. We may
want to estimate x* and construct confidence limits for it. If y* is the maximum response, we may be
interested in finding the range of x that will give a response higher than (y* − Δ), where (y* − Δ) is an
acceptably high yield. If the cost of applying the factor x increases with its level, we should take z as the




Table 1. Analysis of variance of a randomized block experiment to compare effects of several levels of a
quantitative factor

Sources of variation             d.f.          s.s.      m.s.      F
Blocks (B)                       b-1           ss(b)     ms(b)     ms(b)/ms(e)
Treatments (T)                   t-1           ss(t)     ms(t)     ms(t)/ms(e)
  Linear regression              1             ss(lr)    ms(lr)    ms(lr)/ms(e)
  Quadratic reg. (additional)    1             ss(qr)    ms(qr)    ms(qr)/ms(e)
  Deviations from reg.           t-3           ss(dr)    ms(dr)    ms(dr)/ms(e)
Error (B x T)                    (b-1)(t-1)    ss(e)     ms(e)
Subsampling error                bt(m-1)       ss(s)     ms(s)
Total                            btm-1         ss(T)

response variable, where z is the yield per unit cost of application of x. These considerations are more
meaningful than the question often asked by the naive experimenter: Among 10, 20, 30, and 40 p/m, which are
different in their effects?
There are two options if the lowest level of x in the experiment is zero (control). We may fit a regression
curve to all levels (including zero), or we may isolate a single d.f. for the contrast between zero and nonzero
levels and fit a regression curve to the nonzero levels only. Quite often the regression is curvilinear in the first
option and linear in the second option. If this is so, the second method of analysis is preferable, especially if in
actual usage the factor x will not be applied at a level below the first nonzero level of the experiment.
For the linear regression model y = b₀ + b₁x, the estimated responses at x = x* and at x = x** are y* = b₀
+ b₁x* and y** = b₀ + b₁x**, respectively. Therefore, the estimated difference in response at any two values
x* and x** is equal to b₁(x** − x*), and the variance of this estimated or predicted difference is (x** − x*)²
(variance of b₁). The formula for the variance of b₁ is given in Equation (2.4). The 100(1 − α)% confidence
interval for the true difference is b₁(x** − x*) ± t(α;ν) √[(x** − x*)² (estimated variance of b₁)], where t(α;ν) is
the two-sided 100α% point of Student's t-distribution with ν d.f.
For the quadratic regression model y = b₀ + b₁x + b₂x², the estimated difference is b₁(x** − x*) +
b₂(x**² − x*²), with variance equal to [(x** − x*)² (variance of b₁) + (x**² − x*²)² (variance of b₂) + 2(x** −
x*) (x**² − x*²) (covariance of b₁ and b₂)]. In a good regression computer program, the printout will include
the estimated variances and covariances of the estimated regression coefficients.
Because linear relationships occur frequently, we will give the computational results for linear regres-
sion analysis. In general, let ȳᵢ be the mean of the nᵢ observations taken at xᵢ, the ith level of the factor (i = 1, 2,
..., t). (We are allowing unequal replications here. In Table 1, nᵢ = bm, a constant.) The equation of the
fitted line is y = b₀ + b₁x, where

b₁ = [Σnᵢxᵢȳᵢ − (Σnᵢxᵢ)(Σnᵢȳᵢ)/N] / [Σnᵢxᵢ² − (Σnᵢxᵢ)²/N]  (2.2)
b₀ = [(Σnᵢȳᵢ) − b₁(Σnᵢxᵢ)]/N,  (2.3)

and N = (n₁ + n₂ + ... + nₜ), the total number of observations. (In the simplest linear regression problem, n₁
= n₂ = ... = nₜ = 1, and the above formulas for the slope and intercept of the line will reduce to more familiar
ones.) The s.s. for linear regression is (Num.)²/Den., where "Num." and "Den." are the numerator and
denominator, respectively, of the expression for b₁ above. The s.s. for deviations from regression, now with (t
− 2) d.f. if we are only fitting a straight line, is most conveniently obtained by subtracting ss(lr) from ss(t),
the treatments s.s. Finally, the variance of b₁ is

var(b₁) = σ²/[Σnᵢxᵢ² − (Σnᵢxᵢ)²/N],  (2.4)

and σ² may be estimated by ms(e) in Table 1, or by ms(dr) if b = 1.
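
Equations (2.2) through (2.4) translate directly into code; the following sketch (not part of the original, with hypothetical argument names) returns the intercept, slope, linear regression s.s., and estimated variance of b₁:

```python
import numpy as np

def linreg_means(x, ybar, n, ms_error):
    """Linear regression of treatment means ybar_i on levels x_i with
    n_i replications, per Equations (2.2)-(2.4)."""
    x, ybar, n = map(np.asarray, (x, ybar, n))
    N = n.sum()
    num = np.sum(n * x * ybar) - np.sum(n * x) * np.sum(n * ybar) / N
    den = np.sum(n * x ** 2) - np.sum(n * x) ** 2 / N
    b1 = num / den                                     # Equation (2.2)
    b0 = (np.sum(n * ybar) - b1 * np.sum(n * x)) / N   # Equation (2.3)
    ss_lr = num ** 2 / den                             # s.s. for linear regression
    var_b1 = ms_error / den                            # Equation (2.4), sigma^2 = ms(e)
    return b0, b1, ss_lr, var_b1

# With the data of the Table 3 example later in this section:
# returns b0 = 41.424, b1 = 0.727, ss_lr = 264.26, var_b1 = 0.0758.
print(linreg_means([0, 2, 4, 6], [41.94, 42.36, 43.82, 46.30], [25] * 4, 37.92))
```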
If the levels are replicated equally and spaced equally, the computations for obtaining the various s.s. for
regression will be simplified considerably by the use of orthogonal polynomials, shown in Table 2 for 3, 4, and
5 levels only. For more extensive tables and discussion of the method for getting the actual regression
equation, see Fisher and Yates (1963). If we look at t = 4 levels, say, in Table 2, we see that the three sets of












coefficients form a set of mutually orthogonal contrasts. (A polynomial curve of degree (t − 1) will pass
through the t means exactly.) With these coefficients, we can obtain the s.s. for linear or quadratic
regression, using Equation (2.1) in the previous section on orthogonal contrasts. An example follows.



Table 2. Orthogonal polynomials
(t = number of levels; d = degree of polynomial)

      t=3             t=4                   t=5
  d=1   d=2     d=1   d=2   d=3     d=1   d=2   d=3   d=4
  -1    +1      -3    +1    -1      -2    +2    -1    +1
   0    -2      -1    -1    +3      -1    -1    +2    -4
  +1    +1      +1    -1    -3       0    -2     0    +6
                +3    +1    +1      +1    -1    -2    -4
                                    +2    +2    +1    +1


Chew (1962) discussed published results of an experiment wherein the research worker erroneously
concluded that there were no treatment differences, through failure to partition the treatments d.f. and s.s.
Table 3 shows the analysis of variance and treatment means with b = 5 blocks, t = 4 treatments (0, 2, 4, and 6
degrees of angle), and m = 5 repeated measurements on each experimental unit. (The response was the force
in pounds required to separate a set of electrical connectors at various angles of pull.) The treatment means
show increasing response with increasing angles. Each treatment mean was an average of nᵢ = bm = 25
observations. From the coefficients in Table 2, the means in Table 3, and Equation (2.1), we have the following
sums of squares for regression:


linear regression = 25[(-3)(41.94) + (-1)(42.36) + (1)(43.82) + (3)(46.30)]² / [(-3)² + (-1)² + (1)² + (3)²] = 264.26

quadratic regression = 25[(1)(41.94) + (-1)(42.36) + (-1)(43.82) + (1)(46.30)]² / [(1)² + (-1)² + (-1)² + (1)²] = 26.52

cubic regression = 25[(-1)(41.94) + (3)(42.36) + (-3)(43.82) + (1)(46.30)]² / [(-1)² + (3)² + (-3)² + (1)²] = 0.01

In a two-tail test, the F-ratio for linear regression is significant at between the 2½% and the 1% level. In a
one-tail test it will be significant at between the 1¼% and the ½% level. (A one-tail test could be justified
here.)

Table 3. Analysis of variance and means

Source of variation       d.f.    s.s.       m.s.      F
Blocks                    4       1234.83    308.71
Treatments:               3       290.79     96.93     2.56 (not sig.)
  Linear regression       1       264.26     264.26    6.97*
  Quadratic regression    1       26.52      26.52     <1
  Cubic regression        1       .01        .01       <1
Error                     12      455.03     37.92
Subsampling error         80      316.50     3.96
Total                     99      2297.15

x:            0       2       4       6
ȳ:            41.94   42.36   43.82   46.30
Difference:      0.42    1.46    2.48




With nᵢ = 25 and N = 100, the formulas for the slope and intercept give:
b₁ = {(25)[0(41.94) + 2(42.36) + 4(43.82) + 6(46.30)] − (25)(12)(25)(174.42)/100} / {(25)(0 + 4 + 16 + 36) − [25(12)]²/100}
   = 0.727;
b₀ = [25(174.42) − 0.727(25)(12)]/100 = 41.424,
so that the equation is y = 41.424 + 0.727x.
Since regression is significant, no multiple comparisons are necessary. The treatments are ALL different
(in their effects). For example, 0 and 2 degrees are different (without testing), as well as 0 and 1 degree or
even 0 and 0.1 degree. This equation gives an estimate of y for any given x; and, clearly, for two different
values of x, the equation gives different values of y. The difference in response at x = x* from that at x = x** is
y(at x**) − y(at x*) = 0.727(x** − x*),
and its estimated variance is (x** − x*)²(37.92)/{(25)(56) − [25(12)]²/100} = 0.0758(x** − x*)², using
Equation (2.4) for the variance of b₁. The 95% confidence interval for the difference in the two responses
corresponding to a unit difference in the x values is 0.727 ± 2.179√0.0758 = 0.727 ± 0.600 = (0.127, 1.327).
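
The worked example can be checked by machine; a sketch (not in the original) reproducing the orthogonal-polynomial sums of squares and the fitted line. (From the rounded means the cubic s.s. comes out near .0005; the published .01 evidently was computed from unrounded data.)

```python
import numpy as np
from scipy import stats

ybar = np.array([41.94, 42.36, 43.82, 46.30])   # means at x = 0, 2, 4, 6
x = np.array([0.0, 2.0, 4.0, 6.0])
n_i, ms_error, df_error = 25, 37.92, 12

# Sums of squares via the Table 2 coefficients and Equation (2.1)
for name, a in [("linear", np.array([-3, -1, 1, 3])),
                ("quadratic", np.array([1, -1, -1, 1])),
                ("cubic", np.array([-1, 3, -3, 1]))]:
    print(name, n_i * np.dot(a, ybar) ** 2 / np.sum(a ** 2))  # 264.26, 26.52, ~0

# Slope, intercept, and 95% CI for a unit difference in x (Equations 2.2-2.4)
N = 4 * n_i
den = n_i * np.sum(x ** 2) - (n_i * np.sum(x)) ** 2 / N
b1 = (n_i * np.sum(x * ybar) - n_i * np.sum(x) * n_i * np.sum(ybar) / N) / den
b0 = (n_i * np.sum(ybar) - b1 * n_i * np.sum(x)) / N
half = stats.t.ppf(0.975, df_error) * np.sqrt(ms_error / den)
print(b0, b1, (b1 - half, b1 + half))   # 41.424, 0.727, (0.127, 1.327)
```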
If the observed means of the t levels are in increasing (or decreasing) order and t is at least four, no
further statistical test is necessary to establish significance of treatment effects, if it is known a priori that the
effect of the treatment, if any, is to increase (or decrease) the response, for the probability of the t means falling in
that order under the null hypothesis is 1/t! ≤ 1/24 if t ≥ 4, which is significant at the conventional 5% level. If
there is no prior knowledge of the direction of the treatment effect, a two-sided test is necessary and t has to
be at least five for the ordering of the t means to be significant at the 5% level.
For a criticism of the widespread misuse of Duncan's multiple range test in agricultural research to
compare levels of a quantitative factor, see Mead and Pike (1975), particularly Section 2.2.

2.3.2. Two or More Factors
For one quantitative factor, we partition the treatments d.f. into linear, quadratic, cubic, etc., regression,
which is equivalent to fitting a polynomial of the form y = b₀ + b₁x + b₂x² + ... + b_d xᵈ, where y is the
measured response and x is the level of the experimental factor. We analyze two quantitative factors
A and B similarly. Denote the levels of A and B by x₁ and x₂, respectively. The following are the first and the second
degree (or order) polynomials in two variables:
y = b₀ + b₁x₁ + b₂x₂  (first order)
y = b₀ + (b₁x₁ + b₂x₂) + (b₁₁x₁² + b₁₂x₁x₂ + b₂₂x₂²)  (second order)
In the second order polynomial, the coefficients b₁₁, b₁₂, b₂₂ could have been replaced by b₃, b₄, b₅. The double
subscript, however, reminds us that these are the coefficients for the quadratic terms. Just as the second
order model is obtained from the first order model by adding the second order (or quadratic) terms, we
similarly obtain the third order model by adding the cubic terms (b₁₁₁x₁³ + b₁₁₂x₁²x₂ + b₁₂₂x₁x₂² + b₂₂₂x₂³) to
the second order model.
In partitioning the d.f. in a 2 x 2 factorial, we are in essence fitting the model y = b₀ + b₁x₁ + b₂x₂ +
b₁₂x₁x₂, an incomplete second order model. (With only two levels, we cannot estimate squared terms.)
In a 3 x 3 factorial, the 2 d.f. for each of the two main effects may be further partitioned into linear and
quadratic terms. The 4 d.f. for the A x B interaction may be partitioned into products of the linear and
quadratic terms of the main effects. Therefore, we are fitting the model
y = b₀ + (b₁x₁ + b₁₁x₁²) + (b₂x₂ + b₂₂x₂²) + (b₁₂x₁x₂ + b₁₂₂x₁x₂² + b₁₁₂x₁²x₂ + b₁₁₂₂x₁²x₂²),
      (main effects of A)   (main effects of B)   (interaction A x B)
which is a second order model plus two cubic and one quartic terms.
Table 4 gives the analysis of variance of a randomized block experiment with b blocks and t treatments,
with the t treatments forming a p x q factorial. This table should be compared with Table 1 for one
quantitative factor. (If m measurements were made on each experimental unit, we will assume that they have
been averaged; otherwise, there will be an extra line in the analysis of variance, as in Table 1.) The 2 d.f. for
linear regression may be further partitioned to show the individual contributions from x₁ and x₂ separately.















They are partitioned similarly for quadratic and cubic regressions. The sums of squares in the s.s. column
usually are called the sequential sums of squares. For example, ss(qr) is not the total quadratic regression
s.s.; it is the additional s.s., after fitting a linear model. In other words, ss(qr) is the difference in regression
sums of squares between fitting a linear model and a full quadratic model. If the true model (true state of
nature) is linear, ms(qr), ms(cr), and ms(lof) will be almost the same as ms(e), the error m.s. The quadratic
model has 5 coefficients (other than the intercept b₀); therefore, it has 5 d.f. and its s.s. is obtained by adding
ss(lr) and ss(qr). If p = q = 5 (i.e., a 5 x 5 factorial), t = pq = 25 and "lack of fit" has (t − 10) = 15 d.f. If we are
certain that a cubic model is adequate, and this is usually so, we do not need any replication. We can use
ms(lof) as the error m.s. in making tests of significance. With replication, however, we can test the cubic
model. The extension of Table 4 to three or more quantitative factors should be obvious.

Table 4. Analysis of variance of a randomized block experiment
with 2 quantitative factors

Sources of variation             d.f.          s.s.       m.s.
Blocks (B)                       b-1           ss(b)      ms(b)
Treatments (T)                   t-1           ss(t)
  Linear regression              2             ss(lr)     ms(lr)
  Quadratic reg. (additional)    3             ss(qr)     ms(qr)
  Cubic reg. (additional)        4             ss(cr)     ms(cr)
  Lack of fit                    t-10          ss(lof)    ms(lof)
Error (B x T)                    (b-1)(t-1)    ss(e)      ms(e)
Total                            bt-1          ss(T)


Since getting the various s.s. is extremely tedious on a desk calculator, a computer is necessary. If the
levels of A and B are equally replicated and equally spaced (e.g., 5, 10, and 15 units for A and 100, 200, and 300
p/m for B), we can use orthogonal polynomials, as in the one-factor case. We illustrate this with a 3 x 3
factorial. From Section 2.3.1, we know how to obtain the linear and quadratic regression s.s. for A and for B,
using either the means or the sums for the levels of A and of B. Table 5 gives the coefficients for getting the
s.s. corresponding to x₁x₂, x₁²x₂, x₁x₂², and x₁²x₂². The coefficients will operate on the treatment means as
usual. For example, if we denote the treatment means by ȳ₁, ..., ȳ₉ in the order shown in Table 5, the s.s.
corresponding to x₁x₂ (or AL x BL) is, from Equation (2.1) in Section 2.1, equal to b(ȳ₁ − ȳ₃ − ȳ₇ + ȳ₉)²/4, where
b is the number of observations in each mean. We also can use the coefficients in Table 5 to get the s.s. for AL,
AQ, BL, and BQ, but these can be obtained more easily from the three means for the three levels of A, and
similarly for B. The reader should verify that the coefficients for the components of the main effects are
similar to those given in Table 2. As before, the coefficients for interactions are the products of corresponding
coefficients for the main effects. With Table 5 as an example, the reader should have no difficulty in extending
this to a 3 x 4 or 4 x 5 factorial, or to more than two factors. As an exercise, the reader should write the
coefficients for a 2 x 3 x 3 factorial.

Table 5. Orthogonal polynomials for 3 x 3 factorial (equally spaced)

Treatments
                       A=1           A=2           A=3
B:                   1   2   3     1   2   3     1   2   3
x₁ or AL:           -1  -1  -1     0   0   0     1   1   1
x₁² or AQ:           1   1   1    -2  -2  -2     1   1   1
x₂ or BL:           -1   0   1    -1   0   1    -1   0   1
x₂² or BQ:           1  -2   1     1  -2   1     1  -2   1
x₁x₂ or AL x BL:     1   0  -1     0   0   0    -1   0   1
x₁²x₂ or AQ x BL:   -1   0   1     2   0  -2    -1   0   1
x₁x₂² or AL x BQ:   -1   2  -1     0   0   0     1  -2   1
x₁²x₂² or AQ x BQ:   1  -2   1    -2   4  -2     1  -2   1
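
Table 5's rows can be generated mechanically as products of the one-factor coefficients in Table 2; a sketch (not in the original) using Kronecker products:

```python
import numpy as np

# One-factor linear and quadratic coefficients for three levels (Table 2).
AL, AQ = np.array([-1, 0, 1]), np.array([1, -2, 1])
BL, BQ = np.array([-1, 0, 1]), np.array([1, -2, 1])
ones = np.ones(3, dtype=int)

rows = {"AL": np.kron(AL, ones), "AQ": np.kron(AQ, ones),
        "BL": np.kron(ones, BL), "BQ": np.kron(ones, BQ)}
for a in ("AL", "AQ"):
    for b in ("BL", "BQ"):
        rows[a + " x " + b] = rows[a] * rows[b]   # interaction = elementwise product
for name, r in rows.items():
    print(name, r)   # treatment order: (A=1, B=1..3), (A=2, B=1..3), (A=3, B=1..3)
```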







As in the one-factor case, if regression (whether linear or quadratic) is significant, then all treatments
are different and no multiple comparison procedure is necessary. Suppose a second order model is necessary
and sufficient. We can use this model for interpolation; i.e., to predict the response y at any point within the
range of the values of the two factors used in the experiment. Polynomials are notoriously bad for extrapola-
tion. We also can find the combination of values of x₁ and x₂ that will optimize (maximize or minimize) y. To do
this, we differentiate y with respect to x₁ and x₂, set these two derivatives to zero, and solve the two resulting
equations. The solution is:

x₁* = (2b₁b₂₂ − b₂b₁₂)/(b₁₂² − 4b₁₁b₂₂)
x₂* = (2b₂b₁₁ − b₁b₁₂)/(b₁₂² − 4b₁₁b₂₂).

These values of x₁* and x₂* (if the true values of the b's are known) will optimize y. The estimated optimum
value of y is obtained by putting the estimated values of x₁* and x₂* (in terms of the estimated b's) into the
second order model.
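
A sketch (invented coefficient values, not from the original) of the stationary-point calculation, both from the closed-form solution above and from the equivalent pair of linear equations:

```python
import numpy as np

# Fitted second order surface y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b12*x1*x2 + b22*x2^2
b1, b2, b11, b12, b22 = 4.0, 3.0, -0.5, 0.2, -0.4   # hypothetical estimates

den = b12 ** 2 - 4 * b11 * b22
x1_star = (2 * b1 * b22 - b2 * b12) / den
x2_star = (2 * b2 * b11 - b1 * b12) / den
print(x1_star, x2_star)   # 5.0, 5.0 for these values

# Same point from setting both partial derivatives to zero and solving:
H = np.array([[2 * b11, b12], [b12, 2 * b22]])
print(np.linalg.solve(H, -np.array([b1, b2])))
```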
If the two factors are two kinds of fertilizers, say, the optimum y may require such a large amount of both
fertilizers that it will not be economically optimum. Instead of fitting a model to the yield y, perhaps we should
fit a model to z, the yield per dollar of fertilizers applied, and optimize z.
If the response surface (the value of y as x₁ and x₂ vary) is highly peaked at the optimum, we should not stray
far from the optimum combination of x₁ and x₂ because y will drop sharply. On the other hand, if the response
surface is rather flat near the optimum, we can depart from the optimum condition without any appreciable
decrease in y, and the other combinations may be more convenient. One way to study the response surface is to
draw contours. Suppose the estimated optimum value of y is 138, say. We can set y = 135, 130, 125, etc., in the
second order model. These values will give us the sets of values of x₁ and x₂ that will give an estimated yield of
135, 130, etc.
We also can use the equation to estimate the difference in the response at two different points. For
example, for the same value of x₁ but different values of x₂ (x₂* and x₂′, say), the difference in the responses
is y(x₁,x₂*) − y(x₁,x₂′) = (x₂* − x₂′)b₂ + x₁(x₂* − x₂′)b₁₂ + (x₂*² − x₂′²)b₂₂, and its variance is (x₂* − x₂′)²
V(b₂) + x₁²(x₂* − x₂′)² V(b₁₂) + (x₂*² − x₂′²)² V(b₂₂) + 2x₁(x₂* − x₂′)² Cov(b₂,b₁₂) + 2(x₂* − x₂′)(x₂*² − x₂′²)
Cov(b₂,b₂₂) + 2x₁(x₂* − x₂′)(x₂*² − x₂′²) Cov(b₁₂,b₂₂). Similarly, we can estimate y(x₁*,x₂) − y(x₁′,x₂) and
y(x₁*,x₂*) − y(x₁′,x₂′), and their standard errors. Variances and covariances of the regression coeffi-
cients will be included in the computer printout from a good regression analysis program.
We conclude by mentioning a question of experimental design. Box and Wilson (1951) pointed out that
the squared terms in the second order model are estimated with relatively low precision in a 3 x 3 factorial.
Box and his coworkers have developed so-called response surface designs. The texts mentioned previously
for fractional factorials also contain discussion on response surface methodology. Further references are Box
and Hunter (1958) and Myers (1971).

2.4. Mixed Factors

Consider two factors A and B, with p and q levels respectively, where A is qualitative and B is
quantitative. An example would be an experiment comparing several varieties of peanuts and several rates of
a fertilizer, or destruction rates of a certain bacteria at different temperatures, using several culture media.
Table 6 shows the analysis of variance of a randomized block experiment, showing the partitioning of the
d.f. for the pq treatments. We have partitioned the d.f. for the main effects of B into linear and quadratic
regression only, but a higher polynomial also may be fitted. If the levels of B are spaced equally, ss(BL) and
ss(BQ) will be easy to get, using orthogonal polynomials, and ss(BR) will be obtained by difference, using ss(B).
If the levels of A are such that meaningful orthogonal contrasts can be formed among them (before looking at
the data), we should partition its d.f. accordingly, and also the d.f. for A x BL, etc.






Page missing or unavailable.


Page missing or unavailable.








If a = .05, this equation gives E = .05, .0975, .1426, .1855, .2262, .2649, .3017, .3366, .3698, .5123, and .6226
for t = 2, 3, ..., 9, 10, 15, and 20, respectively. Thus, if we test each of the 9 orthogonal comparisons at the
5% level, in an experiment with t = 10 treatments (and the null hypothesis Ho is true), the probability of
rejecting (incorrectly) one or more comparisons is 36.98%. The overall protection against incorrectly reject-
ing any of the nine comparisons is 63.02% in this example.
If E = .05, the preceding equation gives a = .05, .0253, .0169, .0127, .0057, .0037, and .0027 for t = 2, 3, 4,
5, 10, 15, and 20, respectively. Thus, if we wish to hold the experimentwise error rate to 5% (i.e., a 5%
probability of rejecting one or more orthogonal comparisons in an experiment where all treatments are equal
or, equivalently, 95% protection against incorrectly rejecting any comparison), we have to make each
comparison at a = .0057 (i.e., the 0.57% level) if there are 10 treatments in the experiment.
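The two conversions just used are simple enough to verify directly. The short Python sketch below (the function names are ours) reproduces the figures quoted above.

    def experimentwise(a, t):
        # E = 1 - (1 - a)**(t - 1) for (t - 1) independent comparisons at level a.
        return 1.0 - (1.0 - a) ** (t - 1)

    def comparisonwise(E, t):
        # The inverse relation: the per-comparison level needed to hold E.
        return 1.0 - (1.0 - E) ** (1.0 / (t - 1))

    print(round(experimentwise(0.05, 10), 4))   # 0.3698, as quoted above
    print(round(comparisonwise(0.05, 10), 4))   # 0.0057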
There is no rigid rule or criterion that enables us to decide whether a comparisonwise or an experimentwise
error rate is more appropriate. It is mostly a subjective choice. An experimentwise rate is more
conservative in that fewer Type I errors (false significance) will be made; however, more Type II errors
(failure to detect true differences) will be made. A similar problem exists in choosing the significance level a
in the simple two-treatment case. Should a be taken to be .05 or .01? In situations where incorrectly rejecting
one comparison may vitiate the entire experiment or incorrectly rejecting one comparison is as serious as
incorrectly rejecting 10 comparisons, an experimentwise error rate is more pertinent. A comparisonwise
error rate should be used if one faulty inference does not affect the remaining inferences from the same
experiment. The author favors comparisonwise error rates in general. For further discussion of error rates,
see Tukey (1953b), Harter (1957), and Federer (1961).
We shall now describe the multiple comparison procedures in turn. Some textbooks that contain a
discussion of this topic are Federer (1955), Steel and Torrie (1960), Scheffé (1959), Seeger (1966), Kirk (1968),
Bancroft (1968), and Miller (1966). Some review papers on this topic are Hartley (1955), Cornell (1971), Gill
(1973), Games (1971), Ryan (1959), O'Neill and Wetherill (1971), Thomas (1973), Waldo (1976), etc. The
O'Neill and Wetherill paper has a bibliography of 234 references, classified into 15 categories (multiple range
tests, error rates, simultaneous confidence intervals, etc.). Thomas has an unpublished bibliography on
multiple comparison techniques (available from him) containing about 300 references up to 1970.


3.2 Fisher's Protected and Unprotected LSD Methods

Fisher's protected LSD (least significant difference) procedure is to be applied only if the overall F test
for treatments is significant. It consists of applying the ordinary Student's t test to any pair of means yi and yj.
Let s2 be the error mean square (with v degrees of freedom) from the analysis of variance table, and ni and nj
be the number of replications of treatments i and j, respectively. The two treatments will be declared
different if the two observed means yi and yj differ (in absolute magnitude) by more than the LSD given by
LSD = t(a, v)√{s2[(1/ni) + (1/nj)]}, (3.2)
where t(a, v) is the tabulated two-sided (100 a)% value of the t-distribution with v degrees of freedom; e.g.,
t(.05, 30) = 2.04.
Besides permitting unequally replicated treatments, the procedure is applicable for interval estimation.
Thus, the 100(1 - a)% confidence interval for (μi - μj) is (yi - yj) ± LSD. (Note that if the difference between
yi and yj is less than the LSD, the confidence limits will have different signs, so that the hypothesis of equal
means is accepted. Recall the connection between hypothesis testing and interval estimation mentioned in
chapter 1.) A third desirable feature is its ease of application, especially if all treatments are replicated
equally. The LSD for all pairs of treatments is t(a, v)√(2s2/n), where n is the common number of replications.
(It is possible for the overall F test to be significant but none of the t tests for the pairwise differences to be
significant. See Miller (1966, page 91).)
To illustrate the method we will use the data in Duncan (1955) from a randomized block experiment with
six blocks and seven treatments (varieties of barley). The analysis of variance gave a treatments mean square
of 366.97 (with 6 d.f.), an error mean square (s2) of 79.64 (with v = 30 d.f.), with a highly significant F ratio of
4.61. The means (in bushels per acre) of the seven varieties, given below, have been relabeled A through G in
increasing order.






49.6 58.1 61.0 61.5 67.6 71.2 71.3
A B C D E F G
With v = 30 and taking a to be 0.05, t(a, v) = 2.04 and the LSD = 2.04 × √(2(79.64)/6) = 10.51. Any two means
differing by more than 10.51 will be significantly different at the 5% level. We systematically test G-A,
G-B, G-C, G-D, G-E, G-F; F-A, F-B, ..., F-E; E-A, ..., E-D; ...; B-A. In practice, of
course, we may not need to test all possible pairs. For example, once we have found G-C = 10.3 to be less
than the LSD, we need not test G-D, G-E, and G-F, for these cannot be significant. The results usually
are presented by underscoring (means underscored by the same line are not significantly different) or by
using superscripts (means having the same superscript are not significantly different). For the preceding
example, the results are as follows:
49.6c 58.1bc 61.0ab 61.5ab 67.6ab 71.2a 71.3a
A B C D E F G


Another way of presenting the results, which is typographically convenient, is to group the means as follows:
(A,B), (B,C,D,E), and (C,D,E,F,G). Means in the same parentheses are not different. There were seven
significant differences (G-A, G-B, F-A, F-B, E-A, D-A, and C-A). An unpleasant feature of many multiple comparison
procedures is the lack of "transitivity." In the preceding example, (A and B) and (B and C) were the same, but
A and C were different.
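For readers who compute the LSD by machine, the following Python sketch (not part of the original publication; it uses the scipy library's t distribution) reproduces the LSD of about 10.5 for Duncan's barley data and lists the seven significant pairs found above.

    import numpy as np
    from scipy import stats

    # Duncan's (1955) barley data as summarized in the text.
    means = {"A": 49.6, "B": 58.1, "C": 61.0, "D": 61.5,
             "E": 67.6, "F": 71.2, "G": 71.3}
    s2, v, n, alpha = 79.64, 30, 6, 0.05

    lsd = stats.t.ppf(1 - alpha / 2, v) * np.sqrt(2 * s2 / n)   # about 10.5
    labels = list(means)
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            d = abs(means[labels[i]] - means[labels[j]])
            if d > lsd:
                print(labels[j], "-", labels[i], "=", round(d, 1), "significant")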
This procedure is satisfactory if Ho is true. However, suppose Ho is false such that all means but one are
equal, and this single mean is much larger (or much smaller) than the other (t-1) means. The overall F-test
will be significant, and repeated t-tests applied to the (t-1) equal means will have a large probability of
declaring some of these (t-1) means to be unequal. This objection is removed in the Newman-Keuls
procedure, to be discussed in Section 3.3.
In the unprotected LSD method, a preliminary F test need not be carried out at all, but the error rate for
each individual comparison is reduced to a/m, where m is the total number of comparisons (preferably
specified in advance) that we wish to make among the t treatments. If we restrict ourselves to orthogonal
contrasts, m = (t-1); if we make all possible pairwise comparisons, m = t(t-1)/2. More generally, we can
budget m different error rates a1, a2, ..., am for the m contrasts, where these add up to a. If it is more
serious to incorrectly reject the i-th contrast than the j-th contrast, we would choose ai < aj. It can be
shown (using the so-called Bonferroni inequality) that the experimentwise error rate E is at most a.
Percentage points of the t-distribution for carrying out Fisher's unprotected LSD procedure may be found in
Table A in the appendix, reproduced from Dunn (1961). Alternatively, Scheffé (1959, page 80) gives the
following approximation (due to A.M. Peiser) for the upper (one-sided) a point of the t distribution with v d.f.:
t(a, v) = za + (za + za^3)/(4v),

where za denotes the upper a point of the standard normal distribution; e.g., z.05 = 1.645.
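As an illustration of the unprotected procedure, the Python sketch below computes the Bonferroni critical value t(a/m, v) both exactly (via the scipy library) and from Peiser's approximation; the two agree closely for moderate v. The choice m = 9 is hypothetical.

    from scipy import stats

    def peiser_t(alpha, v):
        # Peiser's approximation to the upper (one-sided) alpha point of t, v d.f.
        z = stats.norm.ppf(1 - alpha)
        return z + (z + z**3) / (4 * v)

    a, m, v = 0.05, 9, 30        # say, nine preselected comparisons
    tail = (a / m) / 2           # two-sided test at level a/m
    print(stats.t.ppf(1 - tail, v))   # exact critical value
    print(peiser_t(tail, v))          # Peiser's approximation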



3.3. Newman-Keuls' Multiple Range Test

This method is applicable only in situations where all t treatments are equally replicated n times. As in
Section 3.2, s2 is the error mean square with v degrees of freedom. This method does not have a prior
significant F test as a prerequisite. To apply the method, we arrange the means in ascending order, but
instead of comparing the difference between any two means with a constant least significant difference (as in
Section 3.2), we test it against a variable yardstick
Wp = q(a; p, v)√(s2/n), (3.3)
where p (= 2, 3, ..., t) is the number of means whose range (i.e., largest minus smallest) we are testing, and q(a;
p, v) is the (100 a)% point of q(p, v), the distribution of the studentized range of p means and v degrees of
freedom. Values of q(a; p, v) are tabulated in Pearson and Hartley (1966) and Harter (1960a). They are






reproduced in condensed form in the Appendix (Table B), Beyer (1968), Miller (1966), Steel and Torrie (1960),
etc.
For the numerical example in Section 3.2, t = 7, v = 30, and √(s2/n) = √(79.64/6) = 3.643. For a = .05, the
values of q are:
p: 2 3 4 5 6 7
q(.05; p, 30): 2.89 3.49 3.85 4.10 4.30 4.46
Wp = 3.643q: 10.53 12.71 14.03 14.94 15.66 16.25
Fisher's LSD and W2 are identical. We test G-A against W7 = 16.25 since G-A is the range of 7 means.
There are 2 ranges of 6 means (viz., G-B and F-A), and these are compared with W6 = 15.66. Similarly, we
test the three five-mean ranges G-C, F-B, E-A against W5 = 14.94; G-D, F-C, E-B, D-A against W4 =
14.03; G-E, F-D, E-C, D-B, C-A against W3 = 12.71; and G-F, F-E, E-D, D-C, C-B, B-A against
W2 = 10.53. In practice, we need to perform far fewer tests than these, for once two means are judged to be
not different, they are underscored by a line, and no further testing is made among means that are between
the two means so underscored. We need only test G-A = 21.7 > W7, G-B = 13.2 < W6 (underscore), F-A =
21.6 > W6, E-A = 18.0 > W5, and D-A = 11.9 < W4 (underscore). No further testing is necessary. The
results are as follows:
Aa Bab Cab Dab Eb Fb Gb or (A, B, C, D) and (B, C, D, E, F, G).
This method gives only 3 significant pairs (G-A, F-A, and E-A), compared to 7 pairs from the LSD
method. The Newman-Keuls procedure is intuitively more appealing than the LSD method. One feels that
the difference between the extremes of 7 means should pass a more stringent test than the difference between
the extremes of, say, 3 means. The method has the disadvantage of not being amenable to interval estimation.
The error rate is confusing because it is neither experimentwise nor comparisonwise. At each stage of testing
(range of t means, (t-1) means, etc.), the probability of rejecting the hypothesis of equal means, if true, is a.
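A rough computational rendering of the test is sketched below in Python (it requires the studentized range distribution available in SciPy 1.7 or later). For brevity, the sketch omits the underscoring rule and simply flags every range of p means that exceeds its yardstick Wp; for these data that yields the same three significant pairs found above.

    import numpy as np
    from scipy import stats   # studentized_range requires SciPy >= 1.7

    means = np.array([49.6, 58.1, 61.0, 61.5, 67.6, 71.2, 71.3])  # ascending
    labels = "ABCDEFG"
    s2, v, n, alpha = 79.64, 30, 6, 0.05
    se = np.sqrt(s2 / n)
    t = len(means)

    for p in range(t, 1, -1):                       # ranges of p means
        Wp = stats.studentized_range.ppf(1 - alpha, p, v) * se
        for i in range(t - p + 1):
            rng = means[i + p - 1] - means[i]
            if rng > Wp:
                print(f"{labels[i + p - 1]}-{labels[i]} = {rng:.1f} > W{p} = {Wp:.2f}")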

3.4. Tukey's HSD Method and Multiple Range Test

Tukey's original HSD (honestly significant difference) procedure (1951, 1953) requires equal replications.
It has the simplicity of Fisher's LSD method in having a constant yardstick with which to test all pairs
of treatment means. The HSD is calculated as the Wp of the Newman-Keuls procedure, with p taken at its
maximum value (i.e., with p = t, the total number of treatments). Thus, two treatments are declared to be
different (in their effects) if the absolute magnitude of the difference between their means exceeds
HSD = Wt = q(a; t, v)√(s2/n), (3.4a)
where the symbols are as in Equation (3.3).
In the previous example, with t = 7 treatments, error mean square s2 = 79.64 with v = 30 d.f. and n = 6
replications, the HSD = q(a; 7, 30) × 3.643 = 4.46 × 3.643 = 16.25, if a = .05. Testing the difference between
every pair of means against 16.25, we get results that are identical to those given by the Newman-Keuls
procedure. In general, we shall get fewer significant differences from Tukey's method. Since the error rate of
Tukey's HSD method is experimentwise, Hartley (1955) recommends that a be taken as 0.10 or higher.
Tukey's HSD procedure also can be used to construct simultaneous confidence intervals for all pairs of
treatment differences as follows:
Prob.{(μi - μj) lies within (yi - yj) ± HSD: i, j = 1, 2, ..., t} = (1 - a). (3.4b)
In words, Equation (3.4b) states that the probability is 0.95 that all of the following statements are true:
μG - μA = (71.3 - 49.6) ± 16.25; μG - μB = (71.3 - 58.1) ± 16.25; ...;
μG - μF = (71.3 - 71.2) ± 16.25; μF - μA = (71.2 - 49.6) ± 16.25; ...;
μF - μE = (71.2 - 67.6) ± 16.25; ...; μB - μA = (58.1 - 49.6) ± 16.25.
Equation (3.4b) can be generalized to simultaneous confidence intervals for linear contrasts among the t
treatment population means, as shown in Equation (3.4c).






Prob.{Σ(i=1 to t) ci μi lies within Σ ci yi ± (HSD)(1/2)Σ|ci| for all contrasts} = (1 - a), (3.4c)

for all sets of coefficients (c1, c2, ..., ct) satisfying Σci = 0. (There is an uncountable infinity of such sets.)
Equation (3.4c) immediately reduces to (3.4b) if the contrast is a pairwise difference, for then one coefficient is
+1, another is -1, and the rest are zero. Equation (3.4c) also enables us to test a more general hypothesis Ho:
Σci μi = d (specified). We reject Ho if the confidence limits for the contrast exclude d. Gabriel (1964) shows that
at least one contrast will be significant if, and only if, the overall F test is significant. This is not true if the
contrasts are restricted to paired differences only.
To overcome the conservativeness of his HSD procedure, Tukey also has proposed a multiple range test,
using the average of his HSD and the Newman-Keuls statistic as the test criterion. Thus, the range of p
ranked means is tested against
(1/2)[q(a; p, v) + q(a; t, v)]√(s2/n). (3.4d)
Spjøtvoll and Stoline (1973) and Hochberg (1975, 1976) have extended Tukey's HSD procedure to allow
unequal variances or unequal sample sizes. If sample sizes are unequal, two approximate procedures are to
use the harmonic mean of the sample sizes (the reciprocal of the arithmetic mean of the reciprocals of the sample
sizes) or to replace the estimated variance of a mean (s2/n) in Equation (3.4a) by the average of the variances of
the two means concerned, viz., s2[(1/ni) + (1/nj)]/2, as in Kramer's (1956) modification of Duncan's multiple
range test. Keselman, Toothaker, and Shooter (1975) found that these two methods "have the same
sensitivity for detecting real mean differences."
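The HSD and its Tukey-Kramer modification for unequal sample sizes are one-line calculations, sketched below in Python (again using SciPy's studentized range distribution; the unequal sample sizes in the last line are hypothetical).

    import numpy as np
    from scipy import stats   # studentized_range requires SciPy >= 1.7

    s2, v, t, alpha = 79.64, 30, 7, 0.05
    q = stats.studentized_range.ppf(1 - alpha, t, v)   # q(.05; 7, 30), about 4.46

    def hsd(n):
        # Equation (3.4a), equal replication n.
        return q * np.sqrt(s2 / n)

    def hsd_kramer(ni, nj):
        # Replace s2/n by the average of the variances of the two means.
        return q * np.sqrt(s2 * (1 / ni + 1 / nj) / 2)

    print(round(hsd(6), 2))            # about 16.25, as in the text
    print(round(hsd_kramer(6, 4), 2))  # hypothetical unequal sample sizes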

3.5. Scheffe's Method

Like Tukey's HSD, Scheffé's (1953) procedure is applicable to general contrasts, and not just paired
comparisons. Since it employs an experimentwise error rate, Scheffé (1959, page 71) suggests taking a = .10.
Scheffé's procedure is more general than Tukey's in being able to handle unequal replications. Let ni be the
number of replications of the i-th treatment. The contrast C = Σ(i=1 to t) ci μi will be estimated by Ĉ = Σci yi, with
variance estimated by
V(Ĉ) = s2 Σ(ci^2/ni), (3.5a)
where s2 is the error mean square (from the analysis of variance table) with v degrees of freedom, say. The 100
(1 - a)% simultaneous confidence intervals for all contrasts C (an uncountable infinity of them, obtainable by
varying the set of coefficients c1, c2, ..., ct) are
Ĉ ± √[(t-1)·F(a; t-1, v)·V(Ĉ)], (3.5b)
where F(a; t-1, v) is the upper (100 a)% point of the F-distribution with (t-1) and v degrees of freedom (for
numerator and denominator, respectively). As an example, F(.05; 6, 30) = 2.42. For pairwise differences (Σci^2
= 2) and equal replications (ni = n), Equation (3.5a) reduces to
V(yi - yj) = 2s2/n. (3.5c)
From Equation (3.5b), the 100(1 - a)% simultaneous confidence interval for all paired differences (μi - μj) (for
all i and j) is
(yi - yj) ± √[(t-1)·F(a; t-1, v)·(2s2/n)]. (3.5d)
Equation (3.5d) can be used to test the significance of the difference between two means μi and μj. We declare
these to be different if the sample means yi and yj differ in absolute magnitude by an amount exceeding
S = √[(t-1)·F(a; t-1, v)·(2s2/n)]. (3.5e)








For t = 2 treatments, S above is identical with the LSD since √F(a; 1, v) = t(a, v). Using the relationship
between hypothesis testing and interval estimation, we can test the general null hypothesis Ho: Σci μi = d
(specified) by seeing whether d falls inside or outside the interval given in Equation (3.5b).
For the previous numerical example, taking a = .05, we have S = √(6 × 2.42 × 2(79.64)/6) = 19.63. Two
treatment sample means will be declared significantly different at the 5% level if their difference exceeds
19.63 in magnitude. (Note that this least significant difference is even larger than Tukey's HSD = 16.25. This
is a general result. Tukey's procedure is preferred over Scheffé's for pairwise comparisons, but for general
contrasts Scheffé's method gives a shorter interval.) Application of Scheffé's procedure to the previous
numerical example gives the following results: (A,B,C,D,E) and (B,C,D,E,F,G). There are only two
significant differences (G-A and F-A), compared to three differences from the Newman-Keuls and the
Tukey procedures.
Equations (3.5a) and (3.5b) are directly applicable to situations where the sample means have unequal
variances because of unequal replications, assuming that single observations are uncorrelated and have equal
variances. For situations where the unequal variances of the sample means also may be caused by observa-
tions from the different treatments having unequal variances, Brown and Forsythe (1974) replace Equation
(3.5a) by Σ(ci^2 si^2/ni), where si^2 is the sample variance of the i-th treatment, and F(a; t-1, v) in Equation (3.5b) is
replaced by F(a; t-1, f), where f is obtained using Satterthwaite's result on the d.f. of a linear combination of
sample variances, as follows:
1/f = Σ fi^2/(ni - 1), where fi = (ci^2 si^2/ni)/Σ(cj^2 sj^2/nj).
For another approximation, see Spjøtvoll (1972).
If the sample means are correlated, Equation (3.5b) will still hold but Equation (3.5a) must be modified to
include the covariances of the sample means, as in Equation (3.5f).
Scheffé's method can be directly generalized to linear model situations, expressible in matrix notation as
y = Xβ + e. This covers both multiple regression and analysis of variance models higher than just the one-way
classification. The contrast C = Σci βi will be estimated by Ĉ = Σci bi, where the bi's are the least squares
estimates of the βi's. The estimated variance of Ĉ is
V(Ĉ) = ΣΣ ci cj (estimated covariance of bi, bj). (3.5f)
Most regression computer programs (e.g., the SAS package put out by North Carolina State University)
include the estimated covariances of the estimated regression coefficients as part of the output. Equations
(3.5b) and (3.5f) may now be used to construct simultaneous confidence intervals for linear contrasts or to
make multiple comparisons among the βi's.
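As an illustration of Equations (3.5a) and (3.5b), the Python sketch below computes a Scheffé interval for one particular contrast among the barley means: the comparison of G against the average of the other six varieties, a contrast chosen here purely for illustration.

    import numpy as np
    from scipy import stats

    means = np.array([49.6, 58.1, 61.0, 61.5, 67.6, 71.2, 71.3])
    n = np.full(7, 6)                  # replications per treatment
    s2, v, t, alpha = 79.64, 30, 7, 0.05

    # Contrast (for illustration): G against the average of the rest.
    c = np.array([-1, -1, -1, -1, -1, -1, 6]) / 6.0   # coefficients sum to zero
    C_hat = c @ means
    var_C = s2 * np.sum(c**2 / n)                                 # (3.5a)
    half = np.sqrt((t - 1) * stats.f.ppf(1 - alpha, t - 1, v) * var_C)
    print(C_hat - half, C_hat + half)                             # (3.5b)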

3.6. Duncan's Methods

Of the several procedures that D. B. Duncan proposed between 1941 and 1975, we shall discuss only
two: his most popular (the multiple range test) and his most recent (the Bayesian k-ratio LSD rule), which he hopes
will supplant the former.

3.6.1. Multiple Range Test
This method assumes homoscedastic (equal variances) and uncorrelated means. It is very similar to the
Newman-Keuls procedure, except that the protection level at each testing stage varies with p, the number of
means whose range is being tested for significance. Duncan's rationale for decreasing the protection level as p
increases is as follows. In experiments (factorial or otherwise) where the (p-1) degrees of freedom for the p
treatments are partitioned into single degrees of freedom to correspond to (p-1) mutually orthogonal
contrasts, the experimenter has no qualms about testing each contrast at the a level. Assuming for simplicity
that the number of degrees of freedom for the error mean square is infinite (or quite large), the (p-1) F-ratios
are statistically independent (almost). Therefore, the probability of rejecting one or more contrasts, if all p
means are equal, is
ap = 1 - (1 - a)^(p-1). (3.6a)










Duncan (1955) modifies the Newman-Keuls multiple range test by using the variable level ap as the significance
level when testing the range of p means. As an illustration, with p = 9 equal means and a = .05, the
probability of incorrectly rejecting one or more of the 8 orthogonal contrasts is 1 - (.95)^8 = 1 - .6634 = .3366. This
large probability of Type I error makes Duncan's multiple range test very powerful (large probability of
detecting differences when they exist). Experimenters are often more interested in finding than in not
finding significant differences among the treatments being tested. For this reason, Duncan's procedure
received widespread acceptance among research workers, particularly in the agricultural sciences. As
originally proposed, no preliminary significant overall F test is required. To overcome, somewhat, the
objection of a possibly large Type I error probability, we may conservatively require a significant overall F
test as a necessary condition for the application of the multiple range test.
In the Newman-Keuls procedure, the yardstick for testing the significance of the range of p means is Wp
= q(a; p, v)√(s2/n). In Duncan's procedure, the yardstick is similar, except that a is replaced by ap, defined by
Equation (3.6a), giving the following "shortest significant range" criterion:
Rp = q(ap; p, v)√(s2/n). (3.6b)

Thus, no special tables are required if we have extensive tables of q(p, v), the distribution of the studentized
range of p means and v d.f. However, the percentiles ap are "awkward," being equal, for example, to .05,
.0975, .1426, .1855, .2262, and .2649 if a = .05 and p = 2, 3, 4, 5, 6, and 7, respectively. For this reason, Duncan
(1955) tabulates q(ap; p, v) for a = .05 and .01; p = 2(1)10(2)20, 50, 100; and v = 1(1)20(2)30, 40, 60, 100, and ∞.
More accurate and more extensive tables are given in Harter (1960), reproduced in Harter (1970). A
condensed table of q(ap; p, v) is given in the appendix as Table C, in Steel and Torrie (1960), etc.
To apply the method, we arrange the means in ascending order and test each pair against Rp, starting
with the extremes. Once two means are declared to be not significantly different, we underline them and no
further testing is made between means underscored by this line. Applied to the previous example with t = 7
means, v = 30 d.f., s2 = 79.64, and each treatment equally replicated n = 6 times so that √(s2/n) = 3.643, we
have:

p: 2, 3, 4, 5, 6, 7
q(ap; p, 30) for a = .05: 2.89, 3.04, 3.12, 3.20, 3.25, 3.29
Rp = 3.643q: 10.53, 11.07, 11.37, 11.66, 11.84, 11.99

The results of the test are:
A B C D E F G
49.6 58.1 61.0 61.5 67.6 71.2 71.3

In these results, G-A = 21.7 > R7, the shortest significant range for 7 means; G-B = 13.2 > R6; G-C =
10.3 < R5, so we underline C through G and make no comparisons among C, D, E, F, and G. F-A = 21.6 > R6;
F-B = 13.1 > R5, and we need not test F-C, etc.; E-A = 18.0 > R5; E-B = 9.5 < R4, so underline B
through E; D-A = 11.9 > R4; C-A = 11.4 > R3; and finally B-A = 8.5 < R2, so underline A and B. Thus,
the method gives seven significant differences (G-A, G-B, F-A, F-B, E-A, D-A, C-A), compared to three
significant differences from the Newman-Keuls test.
One disadvantage of this procedure is that it is not amenable to simultaneous interval estimation. If we
use (yi - yj) ± Rp as the confidence interval for (μi - μj), some pairs of means will have confidence intervals of
different widths, even though all treatments are equally replicated.
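Because ap and Rp are simple functions of a, p, and the studentized range distribution, the full set of shortest significant ranges can be generated directly, as in the Python sketch below (SciPy 1.7 or later); it reproduces the values 10.53, 11.07, ..., 11.99 tabulated above, up to rounding.

    import numpy as np
    from scipy import stats   # studentized_range requires SciPy >= 1.7

    s2, v, n, a = 79.64, 30, 6, 0.05
    se = np.sqrt(s2 / n)                                      # 3.643

    for p in range(2, 8):
        ap = 1 - (1 - a) ** (p - 1)                           # Equation (3.6a)
        Rp = stats.studentized_range.ppf(1 - ap, p, v) * se   # Equation (3.6b)
        print(p, round(ap, 4), round(Rp, 2))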
In a sense, Fisher's LSD, Newman-Keuls' MRT, and Tukey's HSD are particular cases of Duncan's
MRT. If in Equation (3.6b) we put ap = a and p = 2, we obtain Fisher's LSD. Tukey's HSD is obtained by
putting ap = a and p = t; and substitution of a for ap gives the Newman-Keuls MRT.
If the sample sizes are unequal, Bancroft (1968) suggests using the harmonic mean of the sample sizes
(reciprocal of the arithmetic mean of the reciprocals of the sample sizes):
nh = [(1/n1 + 1/n2 + ... + 1/nt)/t]^(-1).










Kramer (1956) suggests replacing s2/n (the common variance of the sample means) in Equations (3.3), (3.4a),
and (3.6b) by the average of s2/ni and s2/nj, the variances of the two sample means being tested. Equation
(3.6b) becomes
Rp = q(ap; p, v)√{s2[(1/ni) + (1/nj)]/2}. (3.6c)
Kramer (1957) extends the procedure in an obvious manner to correlated as well as heteroscedastic
means, where the variance of yi is cii σ2, that of yj is cjj σ2, and their covariance is cij σ2. The coefficients cii, cjj,
and cij are known, but σ2 is unknown and is estimated as usual by the error mean square with, say, v d.f. from
the analysis of variance. (This does not handle the situation where the unequal variances of the means are due
to observations from the different treatments having unequal variances. The correlation between the means
may be due to an incomplete block design or a covariate being used in the analysis.) If yi and yj are the
extremes of p ranked treatments, then we declare these treatments to be different if their difference exceeds
q(ap; p, v)√{(1/2)(cii - 2cij + cjj)s2} (3.6d)
in Duncan's test, and similarly for the Newman-Keuls or the Tukey tests. Note that if the means are
uncorrelated, cij = 0, cii = 1/ni, and cjj = 1/nj, so that Equation (3.6d) reduces to (3.6c).
Kramer's extension of the test to correlated and heteroscedastic means is approximate; and it is also
conservative, in the sense that it tends to declare two means equal when they are not. Duncan (1957) proposes
a more powerful test, which imposes a further condition for a subset of means to be declared homogeneous.


3.6.2 Bayesian k-ratio t (LSD) Rule
In Fisher's protected LSD method, the result of the overall F test for treatment effects is used only in a
go, no-go fashion. In Duncan's Bayesian k-ratio t or k-ratio LSD rule, the observed value of the F test statistic
actually is used in calculating the LSD or the critical t value for comparing two means. If the F ratio is large
(indicating heterogeneous treatments), the critical t value is reduced, thereby increasing the power of the
test; and if the F ratio is small (indicating homogeneous or nearly homogeneous treatments), the critical t
value is increased, making it more difficult to declare two treatments to be significantly different and thus
decreasing Type I error probability. Duncan (1975) summarizes his earlier work (1961 and 1965) and that of
his former doctoral students (Ray A. Waller and Dennis O. Dixon) at The Johns Hopkins University in 1969
and 1974.
The k-ratio t test is based on an EBALEP (empirical Bayes, additive losses, exchangeable priors)
approach. The sample mean yi is, of course, a random variable, usually assumed to be normally distributed
with mean μi and variance σ2/n. In Bayesian statistical inference, the population means μ1, μ2, ..., μt also
are regarded as random variables, with a prior distribution that usually is assumed to be normal with some
mean μ0 and variance σ0^2. (This may well be true experimentally and not merely conceptually, if the t
treatments correspond to t varieties, say, randomly selected for field testing from a larger collection of
varieties.) The term "empirical Bayes" comes about through having to use the data to estimate the parame-
ters of the conceptual superpopulation of populations. If Li is the loss incurred when the i-th decision is
erroneous, and similarly with Lj, the additive losses assumption states that the loss incurred is Li + Lj if both
the i-th and the j-th decisions are incorrect. Finally, the exchangeable prior distributions assumption states
that a priori the comparisons are "equally plausible." This rules out, for example, the case where the t treatments
form a p x q factorial (where a priori comparisons of main effects are more likely to be significant than
interaction effects) or where the t treatments correspond to t levels of a quantitative factor, where we may a
priori expect an ordering of the true treatment means. (These situations were treated under Chapter 2, and no multiple comparison technique is appropriate.)
A novel feature of the test is the use of the ratio (denoted by k) of the relative seriousness of Type I to
Type II errors. By considering the case of t = 2 treatments (where no multiple comparison problem exists),
the critical value in the regular Student's t test at a given a level can be made approximately equal to that in
the k-ratio t test for some value of k. In round figures, the approximate correspondence between a and k is:

a: .10, .05, .01
k: 50, 100, 500.





Therefore, Duncan recommends that k be taken equal to 100 or 500 where an experimenter previously
tested at the 5% or the 1% level, respectively.
Any difference d between two means or, more generally, any contrast c among the means is significantly
different from zero if the ratio d/sd or c/sc exceeds some critical value t(k, F, t, v), where sd = √(2s2/n), s2 is
the error mean square with v degrees of freedom, n is the constant number of replications of each treatment,
and F is the observed F ratio for treatments from the analysis of variance table. (The estimated variance sc^2
of a contrast is given in Equation (3.5a).) As indicated above, the critical t value depends on the four
arguments k, F, t, and v. (Unfortunately, we have used the same letter t to denote two entirely different
things: the total number of treatments in the experiment and the t test or distribution.) Its dependence on F
is awkward for tabulation because of the uncountably infinite number of values that F can take, making
interpolation almost inevitable in each application. There is also no easy or explicit formula for calculating the
critical value. It is the solution of an extremely complicated integral equation, which appears as Equation
(3.15) in Duncan (1975). Table D in the appendix gives the critical values for the k-ratio t test for k = 100 and
500, taken from Waller and Duncan (1972). For interpolating with respect to F, Waller and Duncan (1969)
recommend linear interpolation using a = √F for F ≤ 2.4, except when q > 100 and v > 60; otherwise, we
use b = √(F/(F-1)) for F > 2.4, except when q < 20 and v ≤ 20, where q = t-1. When a cannot be used, b is
used, and vice versa. Interpolation with respect to q and v should hardly ever be necessary. If needed, the
recommendation is to interpolate using q and 1/v. Values of a and b are included in Table D.
For large experiments (large number t of treatments and large number v of d.f. for error), the critical
values may be approximated as follows, with b already defined above:
t(100, F, ∞, ∞) = 1.72b (for k = 100) (3.6e)
t(500, F, ∞, ∞) = 2.23b (for k = 500).

Duncan (1965) considers Equation (3.6e) to give an adequate approximation if t ≥ 15 and v ≥ 30. Equation (3.6e)
shows that for large F (a sign of heterogeneous treatments), two means will be declared different if their
studentized difference (d/sd) exceeds only 1.72 (for k = 100, corresponding to a = .05), while for a small F =
1.5, say, the critical value is raised to 1.72√(1.5/0.5) = 2.98, reducing the probability of Type I error.
In the numerical example we have been considering, t = 7 treatments, error mean square s2 = 79.64 with
v = 30 degrees of freedom, F = 4.61, and standard error of a difference sd = √(2s2/n) = √(2(79.64)/6) = 5.15. For
k = 100, q = t-1 = 6, and v = 30, Table D gives t = 2.16 for F = 4.0 (and b = 1.155) and t = 2.02 for F = 6.0
(and b = 1.095). Interpolating for F = 4.61 (and b = √(4.61/3.61) = 1.130), we get the critical t value as t(100,
4.61, 7, 30) = 2.02 + (2.16 - 2.02)(1.130 - 1.095)/(1.155 - 1.095) = 2.02 + .08 = 2.10. (If we had interpolated
directly with respect to F, instead of the recommended b = √(F/(F-1)), the calculated value of t would be 2.12.
Although t = 7 is too small to be regarded as infinite, use of Equation (3.6e) gives a calculated t of
1.72√(4.61/3.61) = 1.72(1.13) = 1.94.) Instead of dividing each difference by its standard error sd and
comparing it with the k-ratio t value, it is more convenient computationally to multiply the t value by sd
to give the corresponding k-ratio LSD = 2.10(5.15) = 10.82 for the present problem. Any two means differing
by more than 10.82 will be declared different. The results are as follows, being identical to those obtained by
using Fisher's LSD method.

49.6 58.1 61.0 61.5 67.6 71.2 71.3
A B C D E F G
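The interpolation with respect to b = √(F/(F-1)) is mechanical, as the Python sketch below shows; the tabled values for k = 100, q = 6, v = 30 are the two quoted above, and the sketch reproduces the critical value 2.10 and the k-ratio LSD of about 10.8.

    import numpy as np

    # Tabled k-ratio t values for k = 100, q = 6, v = 30, as quoted in the text.
    F_lo, t_lo = 4.0, 2.16
    F_hi, t_hi = 6.0, 2.02

    def b_of(F):
        # Waller and Duncan's interpolating variable for F > 2.4.
        return np.sqrt(F / (F - 1))

    F = 4.61
    w = (b_of(F) - b_of(F_hi)) / (b_of(F_lo) - b_of(F_hi))
    t_crit = t_hi + w * (t_lo - t_hi)                # about 2.10
    sd = np.sqrt(2 * 79.64 / 6)                      # 5.15
    print(round(t_crit, 2), round(t_crit * sd, 2))   # k-ratio LSD, about 10.8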


The LSD's (in multiples of sd) from the procedures for 7 treatments and 30 d.f. for error are:

                                                          LSD/sd
Fisher's LSD                                              2.04
Newman-Keuls' MRT (q(a; p, v)/√2)                         2.04, 2.47, ..., 3.15
Tukey's HSD                                               3.15
Tukey's MRT                                               2.60 to 3.15
Scheffé's                                                 3.81
Duncan's MRT (q(ap; p, v)/√2)                             2.04, 2.15, ..., 2.33
Duncan's k-ratio t test (for an observed F = 4.61)        2.10








This tabulation shows that Duncan's k-ratio LSD rule is almost as powerful as Fisher's LSD, without the
latter's higher Type I error probability, for if Ho were true, the observed F would have been smaller (equal to
2.4, say) and from Table D, the critical value for t would have been 2.42. If the treatments are very
heterogeneous, the k-ratio LSD rule can be more powerful than Fisher's LSD. If F = 10, for example, the
critical k-ratio t value is 1.93, compared to 2.04 for Fisher's LSD rule.
The k-ratio t test is adaptable for simultaneous interval estimation. Following Fisher's, Scheffé's, and
Tukey's methods, one would expect the k-ratio confidence interval for δ = (μi - μj) to be (d = yi - yj) ± k-ratio
LSD, where the LSD = t(k, F, t, v)sd, but this is not so. Besides the four parameters k, F, t, and v, the LSD in the
interval estimation problem also depends on the observed value of t = d/sd. Unfortunately, tables are not
available at present. We refer the reader to Duncan (1975) and Dixon and Duncan (1975) for details. A large
sample solution for the limits is as follows:
[δL, δU] = [1 - (1/F)]d ± √(1 - (1/F)) sd t(k, ∞, ∞, ∞), (3.6f)
where t = 1.72 (for k = 100) and 2.23 (for k = 500). Note that the point estimate of δ = (μi - μj) is [1 - (1/F)](yi -
yj). Dixon and Duncan (1975) think that the preceding large sample approximation is adequate if t ≥ 16, v ≥
60, and F > 6.
Another approximation that assumes only a large observed F value (with finite t and v) is the following:
[δL, δU] = d ± sd t(k, ∞, t, v). (3.6g)
The values of t(k, ∞, t, v) are independent of F and are obtainable from the last row in Table D in the
appendix for k = 100 and 500.
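Equation (3.6f) is easily programmed; the Python sketch below (the function name is ours) computes the large-sample limits for the difference G-A of the numerical example, purely as an illustration, since F = 4.61 falls short of the F > 6 recommended above.

    import numpy as np

    def k_ratio_limits(d, sd, F, k=100):
        # Large-sample limits of Equation (3.6f); the point estimate of
        # delta shrinks the observed difference d by the factor (1 - 1/F).
        t_inf = {100: 1.72, 500: 2.23}[k]
        shrink = 1 - 1 / F
        half = np.sqrt(shrink) * sd * t_inf
        return shrink * d - half, shrink * d + half

    print(k_ratio_limits(d=21.7, sd=5.15, F=4.61))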


3.7. Studentized Maximum Modulus Procedure

All the procedures so far discussed for simultaneous interval estimation are for contrasts among the t
means (or paired differences in particular). Sometimes, the experimenter may wish to construct simultane-
ous confidence intervals for the population means themselves. Assume that all the sample means y1, y2, ...,
yt are correlated equally with correlation coefficient ρ and with possibly unequal variances d1σ2, d2σ2, ...,
dtσ2, where the d's are known constants. If s2 is the usual unbiased estimate of σ2 with v degrees of freedom,
the probability is γ = (1 - a) that μi lies within yi ± u(t, v, ρ; γ)√(di)·s, for all i = 1, 2, ..., t simultaneously,
where u(t, v, ρ; γ) is the two-sided (100 γ)% point of the maximum absolute value of the t-variate Student's t
distribution with v degrees of freedom and common correlation ρ. (Constructing a 100(γ^(1/t))% confidence
interval for each μi independently of the others, using data from the i-th sample only, is not efficient.)
This technique can be extended to linear combinations of the means (not necessarily contrasts). The
probability is (1 - a) that Σci μi lies within Σci yi ± u(t, v, ρ; γ)·s·Σ(|ci|√di) for all (uncountably infinite) sets of
constants (c1, c2, ..., ct). Values of u(t, v, ρ; γ) are given in Hahn and Hendrickson (1971) for ρ = 0, .2, .4, .5;
γ = .90, .95, .99; t = 1(1)6(2)12, 15, 20; and v = 3(1)12, 15(5)30, 40, 60. Table E in the appendix gives the values of
u(t, v, 0; γ). Use of Table E in cases where ρ ≠ 0 gives conservative results; the values for ρ ≠ 0 are smaller
than corresponding ones with ρ = 0.


3.8 Comparisons Against a Control

3.8.1. Dunnett's Method
In experiments comparing t treatments, one of the treatments quite often is a control (check or
untreated). In these experiments, we could partition the (t -1) d.f. for treatments into 1 d.f. for comparing
control against the average of the other treatments and (t -2) d.f. for comparisons among the (t -1) "real"
treatments. If these (t -1) other treatments are significantly different, the 1 d.f. comparison between their
average and the control may not be meaningful. The experimenter may wish to compare the control with each
of the other (t -1) treatments (and not with their average). Duncan's k-ratio t test is not applicable here since
the exchangeable priors (or equally plausible comparisons) assumption is not satisfied. (The difference
between a control and a treatment is a priori likely to be larger than that between two treatments.) Dunnett






(1955) gives a procedure for the simultaneous interval estimation or multiple comparisons of the control with
each of the others, with an experimentwise error rate. A treatment and a control are declared different if
their means differ by more than t(a; q, v)sd, where sd is the standard error of a difference and q = (t-1) is the
number of treatments other than the control. Values of t(a; q, v) are given in Dunnett (1964) and reproduced for
both one-sided and two-sided tests in Table F of the appendix. If we are comparing insecticides, for example,
and the control is a standard one, two-sided tests would be proper since we do not know a priori if the new
insecticides would be better or worse than the standard insecticide. More extensive tables of √2·t(a; q, v) for
one-sided tests are given in Gupta and Sobel (1957) for up to 50 treatments.
To illustrate the method, suppose that variety A in our numerical example is a standard variety, thus
calling for two-sided tests of A against each of the others. From Table F, with 30 d.f. for error and q = 6 other
treatments besides the control, the critical t value in a 5% two-sided test is t(.05; 6, 30) = 2.72. The standard error
of a difference is sd = √(2s2/n) = 5.15. The LSD between the control and each of the others is LSD = 2.72(5.15) =
14.0. Since the mean of A is 49.6, any variety will be different from A if its mean is at least 49.6 + 14.0 = 63.6.
The result is that B, C, and D are not different from A, but E, F, and G are better than A. The two-sided interval
estimate of the difference between a standard variety and any other variety is their observed mean difference ±
14.0.
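The arithmetic of Dunnett's test is sketched below in Python, using the tabled two-sided value t(.05; 6, 30) = 2.72 quoted above rather than computing it (the multivariate t percentage points come from Dunnett's tables).

    import numpy as np

    # Two-sided Dunnett comparisons against the control A.
    means = {"B": 58.1, "C": 61.0, "D": 61.5, "E": 67.6, "F": 71.2, "G": 71.3}
    control, s2, n, t_crit = 49.6, 79.64, 6, 2.72

    lsd = t_crit * np.sqrt(2 * s2 / n)          # 2.72 x 5.15 = 14.0
    for name, m in means.items():
        verdict = "different from A" if abs(m - control) > lsd else "not different"
        print(name, verdict)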
The preceding discussion assumes equal replications. If the control is replicated n0 times and the i-th
treatment is replicated ni times, we define sd = √{s2[(1/n0) + (1/ni)]}, which reduces to the previous definition if
all replications are equal. More generally, if within-treatment variances are not homogeneous, we define sd =
√{(s0^2/n0) + (si^2/ni)} and use Satterthwaite's result for getting the d.f. of a linear combination of mean squares. It
may suffice to calculate only two error mean squares, one for within the control and the other for within the other
treatments. For a refinement, see Dunnett (1964). Dunnett's paper also gives the following optimal allocation
of experimental units. If n1 = n2 = ... = n(t-1) = n, say, we should take n0 = n√(t-1). Bechhofer (1969)
generalizes this result to the case where the variances are unequal but their ratios σi^2/σ0^2 (i = 1, 2, ..., t-1)
are known.
Robson (1961) extends Dunnett's procedure to the case of a balanced incomplete block design, giving rise
to correlated treatment means.


3.8.2 Gupta and Sobel's Method
Using the statistic in Dunnett's method, Gupta and Sobel (1958) give the following procedure for
selecting all treatments that are as good as or better than the control or standard treatment. The procedure
guarantees a probability of at least (1 -a) that the selected subset of treatments contains all treatments that
are at least as good as the control. The rule is to include in the subset all treatments whose means yi satisfy

(yi - y0) ≥ -t(a; q, v)sd, (3.8a)
where t(a; q, v) is the one-sided critical value in Dunnett's test.
In using Equation (3.8a) as the criterion, we throw away only treatments that are significantly worse than the
control. Treatments whose sample means are slightly less than that of the control (so that yi - y0 is slightly
negative) will be included in the subset. If we use Dunnett's test as a screening procedure, we declare the
i-th treatment to be as good as or better than the control if

(yi - y0) ≥ +t(a; q, v)sd. (3.8b)

Comparing Equations (3.8a) and (3.8b), it is obvious that Gupta and Sobel's procedure will give a larger
subset of treatments. Dunnett's method retains only those treatments that have proved themselves superior
to control, while Gupta and Sobel's method discards only those treatments that have proved inferior to
standard treatment.
Gupta and Sobel (1958) also discuss other related problems: comparing variances and binomial parame-
ters.
Sobel and Tong (1971) consider the optimal allocation of observations for partitioning a set of normal
populations in comparison with a control.





3.8.3 Williams' Method
Williams (1971) considers the case where the t treatments are t levels or doses of some substance, with
the control corresponding to zero dose. This situation was discussed in Section 2.3.1, where the recommended
analysis was either to compare zero against the average of the nonzero doses and fit a regression to the q =
(t-1) nonzero levels or to fit a curve through all t doses (including zero). Williams claims there are
circumstances in which the experimenter may not wish to fit a curve to the t doses. He may wish, instead, to
compare zero dose against each of the other doses. As an example, he cites toxicity studies in which the aim of
the experiment may be to determine the lowest dose at which there is activity. (The assumption is that the
response is zero up to this "lowest dose" and increases thereafter, instead of continuously increasing from
zero, slowly at first and more rapidly afterwards.) Another reason for not wishing to fit a curve may be the
experimenter's unwillingness to assume a particular form (logistic, etc.) for the response function. The
number of levels is usually very small (3 to 5), making model fitting rather difficult.
Dunnett's procedure may be used to compare zero with the other doses, but some power is lost in not
making use of the structure in the treatments. Williams assumes a nondecreasing response function, so that μ0
≤ μ1 ≤ ... ≤ μq if the treatments T0, T1, T2, ..., Tq are in increasing order of dosage. (If, say, the third
dose (i.e., the second nonzero dose) is the level at which activity first becomes noticeable, we have μ0 = μ1 < μ2
≤ μ3 ≤ ...) The first step in Williams' test is to estimate μi (i = 0, 1, ..., q). Because of the constraints on the
μi's, μi is not necessarily estimated by yi, the sample mean. Bartholomew (1961) gives the following maximum
likelihood estimates of the μ's. If y0 ≤ y1 ≤ y2 ≤ ... ≤ yq, then μ̂i = yi (i.e., μi is estimated by yi). Otherwise,
there is at least one i for which yi > yi+1. We replace both yi and yi+1 by their weighted average
ȳi,i+1 = (ni yi + ni+1 yi+1)/(ni + ni+1),
where ni is the number of replications of treatment or dose i. We now have only q means y0, y1, ..., yi-1,
ȳi,i+1, yi+2, ..., yq. If these means are in nondecreasing order, we stop and estimate μj by yj (for j = 0, 1,
..., i-1, i+2, ..., q) and estimate both μi and μi+1 by ȳi,i+1. Otherwise, we repeat the averaging process,
giving ȳi,i+1 a weight of (ni + ni+1). For instance, if ȳi,i+1 > yi+2, we average them to give
ȳi,i+1,i+2 = [(ni + ni+1)ȳi,i+1 + ni+2 yi+2]/(ni + ni+1 + ni+2)
as the common estimate of μi, μi+1, and μi+2, if the sample means are now in correct ascending order.
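Bartholomew's averaging process is what is now usually called pooling adjacent violators. A minimal Python sketch follows; applied to the means of Williams' example discussed next, it reproduces the estimates 10.1, 10.6, 11.4, and 11.8.

    def pava(means, weights):
        # Pool adjacent violators: replace each out-of-order adjacent pair of
        # (weighted) means by its weighted average until the sequence is
        # nondecreasing, as in Bartholomew's estimates described above.
        y, w = list(map(float, means)), list(map(float, weights))
        i = 0
        while i < len(y) - 1:
            if y[i] > y[i + 1]:
                pooled = (w[i] * y[i] + w[i + 1] * y[i + 1]) / (w[i] + w[i + 1])
                y[i:i + 2], w[i:i + 2] = [pooled], [w[i] + w[i + 1]]
                i = max(i - 1, 0)   # a new violation may appear to the left
            else:
                i += 1
        return y, w

    # Williams' (1971) example (next paragraph): seven doses, n = 8 each.
    print(pava([10.4, 9.9, 10.0, 10.6, 11.4, 11.9, 11.7], [8] * 7))
    # gives [10.1, 10.6, 11.4, 11.8] with weights [24, 8, 8, 16] (up to rounding)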
We now have the estimated population means μ̂0, μ̂1, ..., μ̂q, where some of these may be equal from
the averaging process. Assuming equal replications for all doses (including zero), we now test
t̄p = (μ̂p - y0)/√(2s2/n), (3.8c)
taking p = q, q-1, ..., 1 in this order, stopping as soon as we get a nonsignificant result. We declare the
p-th nonzero dose to be different from the control if t̄p above exceeds the critical value t̄(a; p, v), given in Table G
in the appendix. (Note that for simplicity of statistical distribution, we test μ̂p against the unadjusted sample
mean y0 and not against μ̂0, even if μ0 is not estimated by y0.) Of course, we can apply the test in the following
alternative way. Declare μp and μ0 different if

(μ̂p - y0) > t̄(a; p, v)sd. (3.8d)
Williams (1971) gives an example of a randomized block experiment with 8 blocks and t = 7 doses (zero
and q = 6 nonzero doses), and an error mean square s2 = 1.16 with v = 42 d.f. The observed means are y0 =
10.4, y1 = 9.9, y2 = 10.0, y3 = 10.6, y4 = 11.4, y5 = 11.9, and y6 = 11.7. The effect of the substance in the
experiment, if anything, can only increase the mean of the response. Since y0 > y1, we average these to give
ȳ0,1 = (10.4 + 9.9)/2 = 10.15, and because this average exceeds y2, we form the weighted average ȳ0,1,2 = (2ȳ0,1
+ y2)/3 = 10.1. Since y5 and y6 are not in the correct ascending order, we average them to give ȳ5,6 = 11.8. We
thus have the following estimates of the population means:
μ̂0 = μ̂1 = μ̂2 = ȳ0,1,2 = 10.1; μ̂3 = y3 = 10.6; μ̂4 = y4 = 11.4;
μ̂5 = μ̂6 = ȳ5,6 = 11.8.
The standard error of a difference is sd = √(2(1.16)/8) = .539. For a test at a = .05, Table G gives the following
critical values for 40 d.f.:

p: 6, 5, 4, 3, 2, 1
t̄(.05; p, 40): 1.81, 1.80, 1.80, 1.79, 1.76, 1.68
t̄(.05; p, 40)sd: .98, .97, .97, .96, .95, .91
Applying Equation (3.8d):
μ̂6 - y0 = 11.8 - 10.4 = 1.4 > .98; conclude μ6 > μ0.
μ̂5 - y0 = 11.8 - 10.4 = 1.4 > .97; conclude μ5 > μ0.
μ̂4 - y0 = 11.4 - 10.4 = 1.0 > .97; conclude μ4 > μ0.
μ̂3 - y0 = 10.6 - 10.4 = 0.2 < .96; conclude μ3 = μ2 = μ1 = μ0.
The conclusion is that the fourth nonzero dose was the lowest dose at which a response was observed.
Williams (1972) extends the procedure to handle the case where the zero dose has a different (larger)
number of replications than that of the nonzero levels, for both one-sided and two-sided tests.
In general, we would recommend the regression approach of Section 2.3.1. Suppose we have the
following results:
Dose: 0 1 2 3 4 5
Response: 5 7 10 15 25 40
Using the present procedure, we may conclude that the treatment is first effective at dose 3. The author would
rather believe that the response is increasing continuously from dose 0, gradually at first and more rapidly at
higher doses. We might fit a curve and estimate the lowest dose at which the response will be at least some
specified value, y* say. If higher doses are more expensive and cost is a consideration, we could adjust the
response to a per dollar basis and estimate the dose that will produce the highest adjusted response.

3.8.4 Sequential Methods
See Dudewicz, Ramberg, and Chen (1975) for a two-stage procedure when variances are unequal and
unknown, and Paulson (1962) for a sequential procedure, assuming equal variances. In the latter, inferior
treatments are dropped at each stage.


3.9. Miscellaneous Methods

In this section we shall discuss briefly various related techniques or merely cite their references.

3.9.1 Bonferroni Procedure for Preselected Contrasts
Tukey's and Scheffé's methods enable us to construct confidence intervals for an infinite number of linear
contrasts among the t means so that the probability is (1 - a) that they are all simultaneously true. Usually an
experimenter is only interested in a rather small subset of m contrasts, say. If these m contrasts are
preselected and not suggested by the data, Dunn (1961) recommends the usual method based on the Student's
t distribution to construct an interval for each contrast independently, with confidence coefficient 1 - (a/m),
so that from Bonferroni's inequality, the overall or simultaneous confidence level for all m contrasts is at least
(1 - a), as in Fisher's unprotected LSD. Two-sided (100 a/m)% points of the t distribution are given in the
paper and reproduced in Table A in the appendix. In the notation of Section 3.5, the confidence interval for
each contrast is
Ĉ ± t(a/m; v)√V(Ĉ), (3.9a)
where t(a/m; v) is the two-sided (100 a/m)% point of the t distribution with v degrees of freedom. These
intervals often will be narrower than those given by Tukey's or Scheffé's methods. See also Schafer and
MacReady (1975).

3.9.2 Gabriel's Simultaneous Test Procedure (STP)
Gabriel (1964, 1969a) gives a procedure for testing the homogeneity of the (2^t - t - 1) subsets (with at
least two means) from a set of t means. Let P be any subset containing at least two treatments and S2 be the
treatment sum of squares for those treatments in P. These treatments will be declared to be different if









S2 > (t-1)s2F(a; t-1, v), (3.9b)
where s2 is the error mean square with v d.f. from the analysis of variance of the complete data (with t
treatments), and F(a; t-1, v) is the upper (100 a)% point of the F distribution with (t-1) and v d.f. Note that
the critical value of F in Equation (3.9b) is that for the complete data, so that the righthand side is identical for
all subsets.
The error rate is experimentwise. If Ho is true (all t means are equal), the probability is only a that one or
more of the (2^t - t - 1) subsets will be declared incorrectly to be heterogeneous. The procedure also has the
following nice property. Any set containing a significant subset is itself significant. (However, the converse is
not necessarily true, and it is possible for a significant set to contain no significant proper subsets.) Because of
this property, it is not necessary to test all subsets. For example, if the set (A, B, C) is significant, the set (A,
B, C, D) will be significant; and if (E, F, G) is not significant, the subsets (E, F), (E, G), and (F, G) also will
not be significant.
The 1964 paper has a numerical example. Tukey's HSD method, which is conservative compared with
Newman-Keuls' or Duncan's multiple range tests, found two significant pairs. Gabriel's STP and Scheffé's
test found all subsets of two means (i.e., all paired differences) to be not significant. Generally, a set P will be
declared significant by Gabriel's STP if and only if some contrast involving only those means in P is judged
significant by Scheffé's procedure.

3.9.3 Kurtz-Link-Tukey-Wallace Range Procedure
The analysis of variance is based on sums of squares. For computational convenience, analogous
procedures based on ranges are available. Kurtz, Link, Tukey, and Wallace (1965) give a similar shortcut
procedure for multiple comparisons. This paper also has an interesting general discussion on the philosophy of
multiple comparisons.

3.9.4 Covariance Adjusted Means
For multiple comparisons of adjusted treatment means in an analysis of covariance, see Kramer (1957);
Halperin and Greenhouse (1958); Scheffé (1959, pp. 209-213); Bancroft (1968, Section 8.7); and Thigpen and
Paulson (1974).

3.9.5 Procedures for Two-Way Interactions
Suppose that the t treatments are in the form of a p x q factorial, both factors being qualitative. The
partitioning of the pq-1 degrees of freedom for the t = pq treatments is discussed in Section 2.2. Harter (1970)
gives a procedure for comparing interaction effects of the form
AiBu + AjBv - AiBv - AjBu = [(Ai - Aj)Bu] - [(Ai - Aj)Bv]
= [Ai(Bu - Bv)] - [Aj(Bu - Bv)],
where AiBu, for example, is the mean for the i-th level of factor A and the u-th level of factor B. The
preceding interaction is the difference between two differences; viz., (the difference between the i-th and the
j-th levels of factor A, both at the u-th level of B) minus (the difference between the i-th and the j-th levels of
A, both at the v-th level of B). As the second form of the expression shows, the interaction also can be written
as the difference between the u-th and the v-th levels of B at the i-th level of A minus the same difference at
the j-th level of A. See also Dunn and Massey (1965), Sen (1969), Johnson (1976), and Bradu and Gabriel
(1974). The last paper describes three methods for testing and simultaneous interval estimation.

3.9.6 Nonparametric Methods
In all the methods considered so far, we have assumed that the data are distributed normally. If we
cannot or do not wish to make this assumption, we must resort to nonparametric methods for separating the
means. See Steel (1959, 1961); Dunn (1964); Miller (1966, ch. 4); Rhyne and Steel (1965, 1967); McDonald and
Thompson (1967); Tobach et al. (1967); Rizvi, Sobel, and Woodworth (1968); Sen (1969); Puri and Puri (1969);
Slivka (1970); and Hollander and Wolfe (1973, Sections 6.3, 7.3, and 7.7).





3.9.7 Gupta's Random Subset Selection Procedure
In experiments where the scientist is looking for the best treatment (e.g., a plant breeder selecting a new
variety for highest yield or resistance to some disease), multiple comparison techniques are inappropriate.
We cited Gupta and Sobel (1958) in Section 3.8.2 for a method for selecting treatments that are as good as or
better than a control or standard treatment. Some selected references on problems of selecting the best out of
t treatments are Paulson (1964); Gupta (1965); Robbins, Sobel, and Starr (1968); Bechhofer, Kiefer, and Sobel
(1968); Sobel (1969); Tong (1970); Rizvi (1971); Chiu (1974a, 1974b); a review paper with 71 references by
Wetherill and Ofosu (1974); Wackerly (1975); Santner (1975); and Gupta and Panchapakesan (1971).
Selection problems may be posed in several ways, of which the following two are the most common.
(a) Given δ* > 0 and P* < 1, find a procedure that will, with probability of at least P*, choose the
population with the largest mean if this mean exceeds the second largest mean by at least δ*.
(b) Given 1/t < P* < 1, find the smallest subset of the t treatments such that the probability is at least P*
that the subset will contain the best population.
The preceding formulations are referred to as the "indifference zone" and the "random subset" approaches,
respectively. In (a), we are indifferent to all differences that are less than δ*; and in (b), the number of
treatments that are included in the subset is a random variable. Decision theoretic approaches (minimax,
Bayesian, etc.) are also possible.
Gupta (1965) gives the following random subset solution. Include the i-th treatment in the subset if its
sample mean yi satisfies the condition
yi ≥ ymax - t(a; t, v)sd, (3.9c)
where t(a; t, v) is the one-sided critical value of Dunnett's test statistic (Section 3.8.1). Values of t(a; t, v) are
given in Table F1 in the appendix, with t = (q+1); e.g., if t = 7, we look under q = (t-1) = 6.
In our numerical example, we have t = 7, v = 30 d.f., sd = √(2s2/n) = √(2(79.64)/6) = 5.15, and ymax = 71.3.
Taking a = 1 - P* = .05, the value of t(.05; 7, 30) from Table F1 with t = 7 (or q = 6) is 2.40. From Equation
(3.9c), we include in the subset all treatments whose means exceed 71.3 - (2.40)(5.15) = 71.3 - 12.36 = 58.94.
Thus, we are 95% confident that the set (C, D, E, F, G) will contain the best treatment (variety).
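Gupta's rule is a one-line computation once the tabled constant is in hand, as the Python sketch below illustrates.

    import numpy as np

    # Gupta's rule (3.9c) with the tabled one-sided value t(.05; 7, 30) = 2.40.
    means = {"A": 49.6, "B": 58.1, "C": 61.0, "D": 61.5,
             "E": 67.6, "F": 71.2, "G": 71.3}
    sd, t_crit = np.sqrt(2 * 79.64 / 6), 2.40

    cutoff = max(means.values()) - t_crit * sd           # about 58.9
    print([k for k, m in means.items() if m >= cutoff])  # ['C','D','E','F','G']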

3.9.8 Scott and Knott's Cluster Analysis Method
If a scientist has collected a mass of data (usually multivariate), he may wish to know if these came from
one or more populations. If the latter, he would like to know into how many groups or clusters the data should
be divided, and the best way of forming these groups. (For a recent paper and book on cluster analysis, see
Kuiper and Fisher (1975) and Hartigan (1975).) With univariate data, we can arrange the observations in
ascending order. If the data are 10, 11, 55, 56, 59, for example, they can be divided into two clusters in an
obvious manner, namely (10, 11) and (55, 56, 59). In less clearcut situations, an objective criterion for
grouping is required. If we know that the data came from two populations only, we can form the two groups
by maximizing the sum of squares between the two groups (or equivalently, such that the sum of the within
groups sums of squares is a minimum). With t observations (or means), we need only consider the (t -1)
possible partitions formed by dividing between two successive ordered means. The multiple range tests we
have considered do, in fact, group the means, but they allow a particular mean to be in more than one group.
Duncan's test, for example, groups the means in the example into (A, B), (B, C, D, E), and (C, D, E, F, G).
Tukey (1949) was the first to consider forming nonoverlapping clusters by looking at the gaps in the ordered
means and testing their statistical significance, but he retracted this procedure in his 1953 manuscript
(circulated privately) on the problem of multiple comparisons.
Scott and Knott (1974) propose the following sequential partitioning and testing procedure. Arrange the
t = 7 means in ascending order, denoted by A, B, C, D, E, F, and G, respectively. Partition these into two
groups, using the above criterion. Suppose this results in (A, B, C, D) and (E, F, G) as the two groups. Now
test the null hypothesis Ho: μ1 = μ2 = ... = μ7 against the alternative hypothesis Ha: μi = m1 or m2.
(Presumably, the overall F test with 6 and v d.f. need not be performed. The usual F statistic tests Ho against
the most general alternative that not all means are equal. The proposed procedure tests Ho against the much
more specific alternative that all the means are either m1 or m2, with at least one mean in each group, and,
therefore, should be more powerful than the usual F test.) If Ho is rejected, we partition (A, B, C, D) into two






groups and test the equality of these groups. The procedure is similar for (E, F, G). It is repeated until Ho is
accepted.
The test is as follows. We assume that the t means y1, y2, ..., yt are uncorrelated and homoscedastic,
which implies equal replications n, say. As usual, let s2 be the estimate (with v d.f.) of the common variance σ2
of single observations. (In the completely randomized design, v = t(n-1).) Suppose that the partitioning
criterion forms two groups with t1 and t2 = (t - t1) means. The groups G1 and G2 will contain nt1 and nt2 original
observations, respectively. Let T1 be the sum of the nt1 observations in G1, and similarly for T2. In the usual
analysis of variance computations, the between groups sum of squares is
B0 = [T1^2/(nt1)] + [T2^2/(nt2)] - [T^2/(nt)], (3.9d)
where T = (T1 + T2). Under the null hypothesis, the maximum likelihood estimate of σ2 is
σ̂0^2 = [n Σ(i=1 to t)(yi - ȳ)^2 + vs2]/(t + v), (3.9e)
where ȳ = (y1 + ... + yt)/t.
The test statistic is
λ = [π/(2(π - 2))](B0/σ̂0^2) = 1.376(B0/σ̂0^2). (3.9f)
The 95% points for the distribution of λ were obtained by simulation and were found to be approximated
adequately, for practical purposes, by the chi-square distribution with v0 = t/(π - 2) = t/(1.1416) d.f.
(The simulation also included the case with v = 0, for which the 95% points of λ were estimated to be 2.75,
6.60, 12.11, and 21.74 for t = 2, 5, 10, and 20, respectively. This shows that we can test the homogeneity of t
means even when each mean is based on n = 1 replication. This is, of course, impossible with the usual F test
and its general alternative hypothesis since the error mean square has zero d.f. As mentioned earlier, the
present λ test makes an extra assumption about the alternative hypothesis.)
In our numerical example, t = 7, n = 6, s² = 79.64 with v = 30 d.f. (design being that of a randomized
block experiment). The means in ascending order were 49.6(A), 58.1(B), 61.0(C), 61.5(D), 67.6(E), 71.2(F),
and 71.3(G). To find the partition with the largest between groups sum of squares, we should try, theoreti-
cally, the t - 1 = 6 possible partitions: (A, BCDEFG), (AB, CDEFG), (ABC, DEFG), (ABCD, EFG),
(ABCDE, FG), and (ABCDEF, G). In practice, we need try two or three possibilities only. (With a computer
it is easy enough to try all (t-1) partitions.) In this example, (A, BCDEFG) and (ABCD, EFG) are the two
most serious candidates. It can be shown that (ABCD, EFG) is the optimum partition. Here, t1 = 4, t2 = 3,
T1 = 6(49.6 + 58.1 + 61.0 + 61.5) = 1381.2, T2 = 1260.6, T = T1 + T2 = 2641.8, ȳ = (49.6 + . . . + 71.3)/7 = 62.9,
and Σ(ȳi - ȳ)² = 370.04.

From Equations (3.9d) and (3.9e), B0 = (1381.2)²/24 + (1260.6)²/18 - (2641.8)²/42 = 1602.86 and σ̂0² =
[6(370.04) + 30(79.64)]/(7 + 30) = 124.58. From Equation (3.9f), the test statistic is λ = 1.376(1602.86/124.58)
= 17.70. Using the chi-square approximation with v0 = t/1.1416 = 7/1.1416 = 6.1 d.f., the value 17.70 is
significant. (The 95% point of the chi-square distribution is 12.6 for 6 d.f. and 14.1 for 7 d.f.)
We next have to partition (ABCD) and (EFG). In partitioning (EFG), t is now equal to three. For t = 3
means, the optimum partition is at the larger of the two gaps, giving (E, FG) with t1 = 1, t2 = 2, T1 = 405.6,
T2 = 855.0, Σ(ȳi - ȳ)² = 8.8866, giving σ̂0² = [6(8.8866) + 30(79.64)]/33 = 74.02, B0 = (405.6)²/6 + (855.0)²/12 -
(1260.6)²/18 = 53.29, and λ = 1.376(53.29)/74.02 = 0.99, which is not significant. The significance of the
partition of (ABCD) into (A, BCD) is borderline. If we accept this as being significant, the final groupings are
A, BCD, and EFG, which is what inspection of the means would suggest. (One partitioning step is sketched in
code below.)
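The following Python sketch (ours, not part of the original report; the function name is hypothetical) carries out one Scott-Knott partitioning step on the seven means above. It searches the t - 1 ordered splits for the largest between groups sum of squares and then evaluates Equations (3.9d) through (3.9f):

    import math

    def scott_knott_step(means, n, s2, v):
        # One step: best two-group split of the ordered means, plus the
        # test statistic lambda and its approximate chi-square d.f.
        t = len(means)
        ybar = sum(means) / t
        best_B0, best_i = -1.0, None
        for i in range(1, t):                  # the t - 1 ordered partitions
            g1, g2 = means[:i], means[i:]
            T1, T2 = n * sum(g1), n * sum(g2)
            T = T1 + T2
            B0 = T1**2/(n*len(g1)) + T2**2/(n*len(g2)) - T**2/(n*t)   # (3.9d)
            if B0 > best_B0:
                best_B0, best_i = B0, i
        sig0_sq = (n * sum((y - ybar)**2 for y in means) + v * s2) / (t + v)  # (3.9e)
        lam = (math.pi / (2 * (math.pi - 2))) * best_B0 / sig0_sq             # (3.9f)
        return means[:best_i], means[best_i:], best_B0, sig0_sq, lam, t / (math.pi - 2)

    means = [49.6, 58.1, 61.0, 61.5, 67.6, 71.2, 71.3]
    g1, g2, B0, s0, lam, df0 = scott_knott_step(means, n=6, s2=79.64, v=30)
    print(g1, g2)        # [49.6, 58.1, 61.0, 61.5] [67.6, 71.2, 71.3]
    print(round(B0, 2), round(s0, 2), round(lam, 2), round(df0, 1))
    # 1602.86 124.09 17.77 6.1

The small discrepancy in σ̂0² and λ (the report obtains 124.58 and 17.70) arises because the tabled means are rounded to one decimal; computed from the rounded means, Σ(ȳi - ȳ)² is 367.04 rather than the report's 370.04.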
For another cluster analysis approach to multiple comparisons, see Jolliffe (1975).

3.9.9 Multivariate Populations
We have so far considered univariate populations only. Quite often, we may collect several kinds of
measurements from each experimental unit. For example, in comparing t brands of chocolate cake mixes, we
may evaluate the resulting cakes with respect to each of p characteristics (flavor, aroma, texture, moistness,
etc.). As another example, we may compare t treatments (storage conditions) for degreening lemons and take





color measurements on each of p dates. We may (and sometimes do) carry out p separate univariate analyses
of variance, one for each of the p characteristics or dates, but we sacrifice some power in not making use of the
correlations among the p characteristics. There is also a problem with the overall significance level in making
p separate analyses. Preferably, we should perform one multivariate (p-dimensional) analysis of variance. If
the null hypothesis of equal mean vectors (each population mean is now a set of p numbers) is rejected, we now
have two different kinds of multiple comparison problems. With respect to which of the p characteristics do
the populations differ? (In the preceding cake example, do the cakes differ in flavor only, in flavor and texture
only, or in all p characteristics?) We do not, of course, have this problem in univariate (p = 1) situations. We
have been considering the other kind of multiple comparisons in this report (viz., which populations differ
from which). These comparisons are discussed in Kramer (1972, Section 5.11), Gabriel (1968, 1969b),
Krishnaiah (1969), Miller (1966, Chapter 5), and Morrison (1967, Section 5.4).

3.9.10 Subset Selection Approach to Multiple Comparisons
We mentioned in the last paragraph of Chapter 1 that hypothesis testing is usually almost totally
irrelevant. Two treatments will be declared significantly different if they are sufficiently replicated. If two
means are declared significantly different, many experimenters often are misled into thinking that the
difference is of practical importance. Reading (1975) applies the indifference zone formulation of subset
selection problems to multiple comparisons. The experimenter specifies three quantities: P* (the probability
that all decisions concerning pairwise means are correct, an experimentwise probability), δ* (the largest
amount by which two population means can differ and still be considered practically the same), and δ** (the
smallest amount by which two population means must differ to be considered definitely different). The
interval (δ*, δ**) is the indifference zone. If two treatments differ by an amount in this zone, the experimenter
does not care whether the treatments are declared different or the same. Given these three quantities,
Reading gives tables for the necessary sample size and the critical value that must be exceeded for the
difference between two means to be declared significant. Unfortunately, at present, the tables go up to t = 4
treatments only and assume that σ² is known.

3.9.11 Other Parameters and Populations
In this publication, we have been comparing, estimating, or selecting normal populations with respect to
their means. We conclude this chapter by citing selected references to similar work for other parameters and
other populations.
(a) Variances of normal populations. See David (1956), Ryan (1960), Bechhofer (1968), and Levy (1975a,
1975b) for multiple comparisons; Jensen and Jones (1969) for simultaneous interval estimation; Gupta
(1965), Ofosu (1975), and Arvesen and McCabe (1975) for subset selection.
(b) Various kinds of simultaneous prediction intervals. Hahn (1970, 1972).
(c) Regression coefficients. Duncan (1970) for multiple comparisons, and Hahn and Hendrickson (1971) for
simultaneous interval estimation.
(d) Subset selection for the normal population with the largest (or smallest) α quantile. Barlow and Gupta.
(e) Subset selection for the normal population with the largest exceedance probability. Kappenman (1972)
gives a method for selecting the normal population with the highest pi = P(Xi > c), where Xi ~ N(μi, σi²)
and c is a given constant.
(f) Comparison of several independent treatment mean squares against a common error mean square. See
Nair (1948); Hartley (1955); and David (1962, pages 155-156).
(g) Subset selection for gamma populations. Gupta (1963).
(h) Ranking and selection of binomial populations. Gupta and Sobel (1960), Ryan (1960), Taylor and David
(1962), Paulson (1967), Bland and Bratcher (1968), Hoel and Sobel (1972), and Leonard (1972).
(i) Multinomial populations. Goodman (1965) and Fienberg and Holland (1973) for simultaneous estima-
tion; Bechhofer, Elmaghraby, and Morse (1959) for selection; and Gabriel (1966) for multiple compari-
sons.
(j) Subset selection for Poisson, negative binomial, and Fisher's logarithmic distributions. Gupta and
Panchapakesan (1971).







(k) Multiple comparisons of regression functions. Spjøtvoll (1972).
(l) Multiple comparisons of logistic curves. Reiersøl (1961).
(m) Selection of best treatment in paired-comparison experiments. Trawinski and David (1963).
(n) Ranking of main effects in analysis of variance, variances of normal populations, and correlation
coefficients of bivariate normal distributions. Eaton (1967).
(o) Interval estimation of a ranked parameter. Alam and Saxena (1974).
(p) Simultaneous interval estimation of contrasts among means of a multivariate normal population.
Bhargava and Srivastava (1973).
(q) Applications to multiple regression problems. Miller (1966), Morrison (1967, Section 3.6), Wynn and
Bloomfield (1971), Hochberg and Quade (1975), and Tarone (1976).

CHAPTER 4. CONCLUSION

The findings from some Monte Carlo sampling studies that have been conducted to evaluate the relative
performances of the various multiple comparison procedures are summarized in this chapter. Here, we
assume that multiple comparisons are appropriate, ruling out situations covered in Chapter 2, where the
proper statistical technique is the partitioning of the degrees of freedom for treatments into orthogonal
contrasts. When it is not possible a priori to form meaningful orthogonal contrasts, it is assumed that the
problem is really one of multiple comparisons and not of ranking and subset selection. A plant breeder who is
interested in selecting a new variety should not be concerned with multiple comparisons of all possible pairs of
varieties.
Scheffé's method is the most versatile. It allows unequal replications, correlated means from covariance
adjustment, general contrasts (and not just paired comparisons), and simultaneous interval estimation. The
penalty for this generality is reduced power (failure to detect true differences in testing and wide confidence
intervals in interval estimation of differences between two means). Tukey's HSD method also can handle
general contrasts and interval estimation, but it requires equal replications and uncorrelated means.
Duncan's and Newman-Keuls' multiple range tests are exact only for paired comparisons of uncorrelated
means with equal replications and are not adaptable for interval estimation. The LSD easily can handle
unequal replications, can be used for interval estimation, and can be extended in a simple and obvious manner
to general contrasts. Duncan's Bayesian k-ratio rule is too new to have found widespread acceptance by
experimental scientists. Duncan is very enthusiastic about this procedure and, in a private communication,
expressed the hope that his Biometrics 1975 paper "will mark the beginning of the end of all of the earlier (pre-
1960) a-level multiple comparison procedures."
We refer the reader to Section 3.6.2, where we tabulate the LSD's for the various procedures (in
multiples of the standard error of the difference between two means). In ascending order, we have Fisher's
LSD, Duncan's k-ratio rule, Duncan's MRT, Newman-Keuls' MRT, Tukey's MRT, Tukey's HSD, and
Scheffé's method. (Duncan's k-ratio rule is data dependent. It may be more "reckless" than Fisher's LSD or
more conservative than Tukey's HSD, depending on the observed value of the F ratio for treatments.) The
above order is, therefore, in decreasing order of the number of paired comparisons that will be declared
significant. If the objective is to find as many significantly different pairs as possible, Fisher's LSD is best.
The problem, however, is not this simple.
There are two main difficulties in assessing the relative merits of the multiple comparison procedures.
"In testing a hypothesis involving a simple two-decision situation, such as that to which the Neyman-Pearson
theory is directly applicable, one compares two competing test criteria by fixing the Type I errors to be the
same for both and compare the two power curves. Unfortunately, multiple-comparison procedures do not
pertain to a single simple two-decision situation, but are special cases of multiple-decision procedures. At
present there is no generally acceptable analytical method of comparing, in a manner similar to that for the
two-decision situation, two competing multiple-decision test criteria." (Bancroft 1968, p. 105.)
Another difficulty is due to the different error rates used. Tukey's and Scheffé's methods use an
experimentwise error rate, while Fisher's LSD adopts a comparisonwise error rate. The multiple range tests
of Duncan and of Newman-Keuls use different error rates, both of which are neither experimentwise nor
comparisonwise. Duncan's k-ratio rule does not even use the concept of error rate; it uses the ratio of the
relative seriousness of the two types of errors.




Because of these difficulties, the procedures have been compared using Monte Carlo sampling methods
only. There is a difficulty with such empirical sampling studies. It is easy to study the probability of Type I
error (declaring two equal means to be unequal) because there is, of course, only one way in which t means can
be equal. It is much more difficult to compare the probability of Type II error (declaring two unequal means to
be equal), because t means can be unequal in many ways. They can be all unequal (equally spaced, clustered in
t\N o or more groups, etc.), all equal but one, etc. It is unlikely that one method will be best for all patterns of
inequality.
Balaam (1963) was the first to publish results of a sampling study. He considered only four means, each
with five observations, in eighteen configurations: (0,0,0,0); (1,0,0,0), . . ., (6,0,0,0); (1,1,0,0), (2,1,0,0), . . .,
(5,1,0,0); (2,2,0,0), (3,2,0,0), (4,2,0,0); (3,3,0,0), (4,2,1,0), and (4,4,1,0). Three procedures (LSD, Newman-
Keuls', and Duncan's MRT) were compared, each with and without a significant preliminary F test. The
Newman-Keuls' procedure was found inferior. The LSD was superior to Duncan's MRT, in both protected
and unprotected cases, but the difference in performance was small in the protected case.
Boardman and Moffitt (1971) compared five procedures (LSD, Scheffé's, Tukey's HSD, Newman-Keuls'
MRT, and Duncan's MRT) for testing all possible pairs of means with respect to their Type I comparisonwise
and experimentwise error rates. They carried out 30 sets of 10,000 sampling experiments with t = 2, 3, . . ., 11
normal populations; samples of equal sizes n = 5, 10, and 15; and a = .05.
For t = 10 treatments, and taking a = 5%, the Type I comparisonwise error rate is about 2.5% for
Duncan's MRT, .21% for Tukey's, and .01% for Scheffé's procedure, showing the conservativeness of the
latter two procedures.
On an experimentwise basis, the error rate in Tukey's HSD and Newman-Keuls' multiple range test
remains constant at 5% as t increases from 2 to 10, while for Duncan's MRT and Fisher's LSD, it increases to
38% and 63%, respectively. For Scheffé's procedure, it decreases from 5% to .23%, showing the conservative-
ness of the Scheffé procedure for pairwise contrasts. Thus, with t = 10 populations with equal means (and
(10 x 9)/2 = 45 possible pairwise comparisons), there is a 38% probability that one or more of the 45
comparisons will be declared significantly different by Duncan's procedure.
In view of this rather high experimentwise probability, Gill (1973) recommends that Duncan's procedure
be discontinued. Of course, Gill has even stronger feelings against the LSD procedure. In defense of these
two procedures, the comparison, rather than the experiment, is the basic unit for the comparisonwise
adherents. One wrong conclusion will not affect the usefulness of the remaining 44 comparisons. On the other
hand, the rationale of the experimentwise error rate philosophy is that one wrong comparison vitiates all of
the remaining 44 comparisons. Thus, making one wrong conclusion is as serious as making 45 wrong
judgments in the same experiment (is this reasonable, in most cases?). We have to ensure that all 45
comparisons are correct, not without having to pay a high premium, of course. For example, in a cubic lattice
design with t = 729 varieties (Cochran and Cox 1957, page 423), it will be virtually impossible to ensure that
all (729 x 728)/2 = 265,356 paired comparisons will be judged correctly.
Because of the independence of the validity of the individual comparisons (in the comparisonwise school),
we can "afford" one wrong comparison out of 45. After all, in a 5% test, there is a one in 20 chance of an
incorrect rejection so that out of 45 comparisons we should expect and tolerate about two wrong conclusions.
In addition to the probability of one or more wrong rejections out of 45, it will be interesting to know also the
probability of two or more wrong rejections. If the probability of two or more incorrect conclusions is
considerably lower than that of one or more wrong conclusions, this should remove much of Gill's objections to
Duncan's MRT and Fisher's LSD procedures. (A rough computation along these lines is sketched below.)
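As a rough illustration (ours, not from the report), treat the 45 tests as independent 5% tests. They are not independent, since they share means and a common error estimate, so these binomial figures are only ballpark:

    from math import comb

    m, alpha = 45, 0.05
    p0 = (1 - alpha) ** m                              # P(no false rejection)
    p1 = comb(m, 1) * alpha * (1 - alpha) ** (m - 1)   # P(exactly one)

    print(round(m * alpha, 2))      # 2.25 expected false rejections
    print(round(1 - p0, 3))         # 0.901 = P(one or more)
    print(round(1 - p0 - p1, 3))    # 0.665 = P(two or more)

Under this crude independence assumption, the probability of two or more wrong rejections is lower than that of one or more, but not dramatically so.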
In agricultural experiments, the treatment means are much more likely to be unequal so that Type II
error consideration should be at least as important as Type I error consideration. In the Boardman and
Moffitt study, the procedures were applied without a prior significant overall F test, which is, in fact, a
prerequisite of the Fisher's protected LSD method. Although not required for the Duncan procedure, it may
be desirable to apply the procedure only after a significant F test. As Dunnett (1970) points out, multiple
comparison procedures are techniques for ferreting out differences among the t means, and there is no reason
for doing so, unless there is an indication that differences exist, either a priori or as evidenced by a significant
F test. The experimentwise error rates for the protected Fisher's LSD and the "protected" Duncan's MRT
will, of course, be 5%. See Bernhardson (1975).









Based on the Boardman-Moffitt study (which considered only the null case of equal means), Gill recom-
mended Tukey's HSD and, to a lesser extent, the Newman-Keuls' procedure. In another simulation study,
Carmer and Swanson (1973) recommended just the opposite. Their conclusions were:
. . . that Scheffé's test, Tukey's test, and the Student-Newman-Keuls' test are less appropriate than either the least
significant difference with the restriction that the analysis of variance F value be significant at a = .05, two Bayesian
modifications of the least significant difference, or Duncan's multiple range test. Because of its ease of application, many
researchers may prefer the restricted least significant difference.
Carmer and Swanson conducted 88,000 simulations in all, with various numbers of treatments and
replicates, and different patterns of heterogeneity among the treatment means. The study "was prompted
mainly by the authors' own uncertainty as to the most appropriate procedure to recommend to students and
researchers in the agricultural sciences." In an earlier publication, Carmer and Swanson (1971) reported on 5
of the present 10 procedures.
The following multiple comparison procedures were studied:
1. LSD (unprotected)
2. TSD (Tukey's HSD)
3. SNK (Student-Newman-Keuls)
4. MRT (Duncan's multiple range test)
5. SSD (Scheffé's procedure)
6. FSD1 (Fisher's protected LSD, with the preliminary F test applied at the 1% level)
7. FSD2 (as in FSD1 but F test at 5% level)
8. FSD3 (as in FSD1 but F test at 10% level)
9. BSD (Duncan's approximate Bayesian k-ratio LSD rule for t ≥ 15 treatments and error d.f. v ≥ 30;
see Equation (3.6e) of present report)
10. BET (Waller-Duncan's exact Bayesian k-ratio LSD rule)

We quote from Section 7 ("Concluding Remarks") of Carmer and Swanson (1973):
. . . the SSD should never be employed for pairwise multiple comparisons. . . . the TSD and SNK are clearly inferior in
ability to detect real differences. Although the SSD, TSD, and SNK provide excellent protection against Type I errors, it is the
authors' feeling that, in evaluation of the various procedures, concern for ability to detect real differences should receive a high
priority. . . . the FSD1 procedure also appears to stress protection against Type I errors at the expense of sensitivity . . . it
also seems reasonable not to recommend procedures which unduly deemphasize protection against Type I errors. From this
point of view, then, the ordinary LSD and perhaps the FSD3 can be eliminated from consideration; in addition, their
sensitivities to real differences are not appreciably greater than those of the FSD2, BSD, BET, and MRT. These latter four
procedures thus constitute a group from which the consulting statistician or experimenter might generally make a choice . . .
while the MRT often produces a lower frequency of Type I errors, the other three are generally more sensitive in detecting real
differences . . . dependence of the critical value on the observed analysis of variance F value is more appealing than
dependence on the number of treatments in the experiment. Since the BET is an improved and more exact version than the
BSD, it seems reasonable to prefer the former. . . . the procedure (BET) is easier to apply than the MRT . . . many subject
matter researchers will find the FSD2 attractive because of its simplicity and the fact that they are already familiar with
Student's t table.
Carmer and Swanson's final choice is thus between FSD2 and BET. Waller and Duncan (1969) claim that
the similarity in performance between the FSD2 and BET says a lot for the BET, but as Carmer and Swanson
point out, it is just as reasonable to claim that this similarity speaks a lot for the FSD2.
Thomas (1974) compared "seven methods of pairwise comparisons and four for constructing simultane-
ous sets of confidence limits. The general conclusions are that Duncan's multiple range test is the best method
of those considered for the former and the Bonferroni t-based limits for the latter."
We mentioned at the beginning of this chapter that one main difficulty in comparing the procedures is due
to the different kinds of Type I error rates used. Comparing one procedure using a 5% comparisonwise Type I
error rate with another procedure using a 5% experimentwise Type I error rate is almost like comparing
oranges with bananas. As Einot and Gabriel (1975) pointed out, any observed difference in the performance of
the two procedures is more likely to be due to the different Type I error probabilities than to the techniques
used. Therefore, one should force all procedures to have the same experimentwise (or comparisonwise) Type
I error rate and compare their powers, as in the Neyman-Pearson two-decision situations. With orthogonal
contrasts and large numbers of degrees of freedom for error mean square, we have seen in Section 3.1 that for
t = 10 treatments, say, a 5% experimentwise error rate corresponds to a .57% comparisonwise error rate, and
a 5% comparisonwise error rate is equivalent to a 36.98% experimentwise error rate.
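These two figures follow from the multiplication rule applied to the t - 1 = 9 independent orthogonal contrasts; a one-line check (ours):

    k, alpha = 9, 0.05
    print(round(1 - (1 - alpha) ** (1 / k), 4))   # 0.0057 comparisonwise, for a 5% experimentwise rate
    print(round(1 - (1 - alpha) ** k, 4))         # 0.3698 experimentwise, for a 5% comparisonwise rate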





Einot and Gabriel (1975) studied the powers of multiple comparison procedures for fixed maximal
experimentwise levels, and ". . . generally recommend the Tukey technique for its elegant simplicity and
existent confidence bounds-its power is little below that of any other method. Simulation was for 3, 4, and 5
treatments: the conclusions might need modification for more treatments."
No doubt the reader will think that the last word has not been written on the choice of a multiple
comparison procedure. (Some statisticians do not even believe in multiple comparisons. In his discussion of
the review paper by O'Neill and Wetherill (1971), R. L. Plackett expressed his "view that much of the subject
of multiple comparisons is essentially artificial," while J. A. Nelder went so far as stating that in his opinion
"multiple comparison methods have no place at all in the interpretation of data.") In the final analysis, the
choice will be subjective. To a very large extent, this choice will hinge on a choice between an experimentwise
error rate (for which Tukey's HSD is the recommended procedure) and a comparisonwise error rate (for
which Duncan's MRT is recommended). As mentioned earlier, the author's opinion is that in the majority of
cases, the comparisonwise basis is more appropriate since one wrong inference usually does not make the
other inferences in the same experiment meaningless. There is really not that much difference between the
methods. We can remove or reduce objections to Duncan's MRT by requiring an initial significant overall F
test or by taking Duncan's comparisonwise a to be 0.01 or 0.001. Similarly, we can remove or reduce
objections to Tukey's HSD by taking Tukey's experimentwise a to be 0.10 or 0.25, but, as Einot and Gabriel
wondered, it may be that "it does not seem scientifically respectable to work explicitly with a level of 0.25."
The choice of the kind of Type I error rates is bypassed altogether in the Waller-Duncan Bayesian k-ratio
LSD rule. It also has the extremely appealing feature that the observed F value is used in the calculation of
the LSD. With a large F (of 3.0 and above, indicating strong evidence of existence of differences), the test
behaves like the comparisonwise procedures (Duncan's MRT and Fisher's LSD) with good power properties,
while for a small F, it becomes conservative with good protection against Type I error, as in the Tukey HSD
procedure. It is as if the choice between a comparisonwise and an experimentwise error rate is taken out of
the experimenter's hands and is determined by the experiment itself (the experimental F value). "In this way
the decision theoretic rule enjoys the advantages of both comparisonwise and experimentwise a rules without
their disadvantages." (Dixon and Duncan 1975, p. 822). This procedure will become more popular in the
future, especially if more extensive tables become available.





























7"












TABLE A.-Two-sided 100(a/m)% points of Student's t-distribution with v degrees of freedom*

a = .05

 v\m   2     3     4     5     6     7     8     9     10    15    20    25    30    35    40    45    50

  5   3.17  3.54  3.81  4.04  4.22  4.38  4.53  4.66  4.78  5.25  5.60  5.89  6.15  6.36  6.56  6.70  6.86
  7   2.84  3.13  3.34  3.50  3.64  3.76  3.86  3.95  4.03  4.36  4.59  4.78  4.95  5.09  5.21  5.31  5.40
 10   2.64  2.87  3.04  3.17  3.28  3.37  3.45  3.52  3.58  3.83  4.01  4.15  4.27  4.37  4.45  4.53  4.59
 12   2.56  2.78  2.94  3.06  3.15  3.24  3.31  3.37  3.43  3.65  3.80  3.93  4.04  4.13  4.20  4.26  4.32
 15   2.49  2.69  2.84  2.95  3.04  3.11  3.18  3.24  3.29  3.48  3.62  3.74  3.82  3.90  3.97  4.02  4.07

 20   2.42  2.61  2.75  2.85  2.93  3.00  3.06  3.11  3.16  3.33  3.46  3.55  3.63  3.70  3.76  3.80  3.85
 24   2.39  2.58  2.70  2.80  2.88  2.94  3.00  3.05  3.09  3.26  3.38  3.47  3.54  3.61  3.66  3.70  3.74
 30   2.36  2.54  2.66  2.75  2.83  2.89  2.94  2.99  3.03  3.19  3.30  3.39  3.46  3.52  3.57  3.61  3.65
 40   2.33  2.50  2.62  2.71  2.78  2.84  2.89  2.93  2.97  3.12  3.23  3.31  3.38  3.43  3.48  3.51  3.55
 60   2.30  2.47  2.58  2.66  2.73  2.79  2.84  2.88  2.92  3.06  3.16  3.24  3.30  3.34  3.39  3.42  3.46

120   2.27  2.43  2.54  2.62  2.68  2.74  2.79  2.83  2.86  2.99  3.09  3.16  3.22  3.27  3.31  3.34  3.37
 ∞    2.24  2.39  2.50  2.58  2.64  2.69  2.74  2.77  2.81  2.94  3.02  3.09  3.15  3.19  3.23  3.26  3.29

a = .01

[The a = .01 half of the table is too badly garbled in this copy to reconstruct reliably, except for its last two rows:]

120   2.86  2.99  3.09  3.16  3.22  3.27  3.31  3.34  3.37  3.50  3.58  3.64  3.69  3.73  3.77  3.80  3.83
 ∞    2.81  2.94  3.02  3.09  3.15  3.19  3.23  3.26  3.29  3.40  3.48  3.54  3.59  3.63  3.66  3.69  3.72

† Obtained by graphical interpolation.
Source: Reproduced from Olive Jean Dunn, Multiple Comparisons Among Means, Journal of the American Statistical Association, vol. 56 (1961), pp. 52-64, with the permission of the author and the editor.








TABLE B.--Percentage points of the studentized range q(a;p,v)*

a = .05


2 3 4 5 6 7 8 9 10

1 17.97 26.98 32.82 37.08 40.41 43.12 45.40 47.36 49.07
2 6.085 8.331 9.798 10.88 11.74 12.44 13.03 13.54 13.99
3 4.501 5.910 6.825 7.502 8.037 8.478 8.853 9.177 9.462
4 3.927 5.040 5.757 6.287 6.707 7.053 7.347 7.602 7.826
5 3.635 4.602 5.218 5.673 6.033 6.330 6.582 6.802 6.995
6 3.461 4.339 4.896 5.305 5.628 5.895 6.122 6.319 6.493
7 3.344 4.165 4.681 5.060 5.359 5.606 5.815 5.998 6.158
8 3.261 4.041 4.529 4.886 5.167 5.399 5.597 5.767 5.918
9 3.199 3.949 4.415 4.756 5.024 5.244 5.432 5.595 5.739
10 3.151 3.877 4.327 4.654 4.912 5.124 5.305 5.461 5.599

11 3.113 3.820 4.256 4.574 4.823 5.028 5.202 5.353 5.487
12 3.082 3.773 4.199 4.508 4.751 4.950 5.119 5.265 5.395
13 3.055 3.735 4.151 4.453 4.690 4.885 5.049 5.192 5.318
14 3.033 3.702 4.111 4.407 4.639 4.829 4.990 5.131 5.254
15 3.014 3.674 4.076 4.367 4.595 4.782 4.940 5.077 5.198
16 2.998 3.649 4.046 4.333 4.557 4.741 4.897 5.031 5.150
17 2.984 3.628 4.020 4.303 4.524 4.705 4.858 4.991 5.108
18 2.971 3.609 3.997 4.277 4.495 4.673 4.824 4.956 5.071
19 2.960 3.593 3.977 4.253 4.469 4.645 4.794 4.924 5.038
20 2.950 3.578 3.958 4.232 4.445 4.620 4.768 4.896 5.008

24 2.919 3.532 -3.901 4.166 4.373 4.541 4.684 4.807 4.915
30 2.888 3.486 3.845 4.102 4.302 4.464 4.602 4.720 4.824
40 2.858 3.442 3.791 4.039 4.232 4.389 4.521 4.635 4.735
60 2.829 3.399 3.737 3.977 4.163 4.314 4.441 4.550 4.646
120 2.800 3.356 3.685 3.917 4.096 4.241 4.363 4.468 4.560
0 2.772 3.314 3.633 3.858 4.030 4.170 4.286 4.387 4.474







TABLE B. -Percentage points of the studentized range q(a;p, v)*-Continued

a = .05
P
11 12 13 14 15 16 17 18 19
1 50.59 51.96 53.20 54.33 55.36 56.32 57.22 58.04 58.83
2 14.39 14.75 15.08 15.38 15.65 15.91 16.14 16.37 16.57
3 9.717 9.946 10.15 10.35 10.53 10.69 10.84 10.98 11.11
4 8.027 8.208 8.373 8.525 8.664 8.794 8.914 9.028 9.134
5 7.168 7.324 7.466 7.596 7.717 7.828 7.932 8.030 8.122
6 6.649 6.789 6.917 7.034 7.143 7.244 7.338 7.426 7.508
7 6.302 6.431 6.550 6.658 6.759 6.852 6.939 7.020 7.097
8 6.054 6.175 6.287 6.389 6.483 6.571 6.653 6.729 6.802
9 5.867 5.983 6.089 6.186 6.276 6.359 6.437 6.510 6.579
10 5.722 5.833 5.935 6.028 6.114 6.194 6.269 6.339 6.405

11 5.605 5.713 5.811 5.901 5.984 6.062 6.134 6.202 6.265
12 5.511 5.615 5.710 5.798 5.878 5.953 6.023 6.089 6.151
13 5.431 5.533 5.625 5.711 5.789 5.862 5.931 5.995 6.055
14 5.364 5.463 5.554 5.637 5.714 5.786 5.852 5.915 5.974
15 5.306 5.404 5.493 5.574 5.649 5.720 5.785 5.846 5.904
16 5.256 5.352 5.439 5.520 5.593 5.662 5.727 5.786 5.843
17 5.212 5.307 5.392 5.471 5.544 5.612 5.675 5.734 5.790
18 5.174 5.267 5.352 5.429 5.501 5.568 5.630 5.688 5.743
19 5.140 5.231 5.315 5.391 5.462 5.528 5.589 5.647 5.701
20 5.108 5.199 5.282 5.357 5.427 5.493 5.553 5.610 5.663

24 5.012 5.099 5.179 5.251 5.319 5.381 5.439 5.494 5.545
30 4.917 5.001 5.077 5.147 5.211 5.271 5.327 5.379 5.429
40 4.824 4.904 4.977 5.044 5.106 5.163 5.216 5.266 5.313
60 4.732 4.808 4.878 4.942 5.001 5.056 5.107 5.154 5.199
120 4.641 4.714 4.781 4.842 4.898 4.950 4.998 5.044 5.086
∞ 4.552 4.622 4.685 4.743 4.796 4.845 4.891 4.934 4.974




,. ~,-


TABLE B. -Percentage points of the studentized range q(a;p, v)*-Continued

a = .05


p
 20 22 24 26 28 30 32 34 36

1 59.56 60.91 62.12 63.22 64.23 65.15 66.01 66.81 67.56
2 16.77 17.13 17.45 17.75 18.02 18.27 18.50 18.72 18.92
3 11.24 11.47 11.68 11.87 12.05 12.21 12.36 12.50 12.63
4 9.233 9.418 9.584 9.736 9.875 10.00 10.12 10.23 10.34
5 8.208 8.368 8.512 8.643 8.764 8.875 8.979 9.075 9.165
6 7.587 7.730 7.861 7.979 8.088 8.189 8.283 8.370 8.452
7 7.170 7.303 7.423 7.533 7.634 7.728 7.814 7.895 7.972
8 6.870 6.995 7.109 7.212 7.307 7.395 7.477 7.554 7.625
9 6.644 6.763 6.871 6.970 7.061 7.145 7.222 7.295 7.363
10 6.467 6.582 6.686 6.781 6.868 6.948 7.023 7.093 7.159

11 6.326 6.436 6.536 6.628 6.712 6.790 6.863 6.930 6.994
12 6.209 6.317 6.414 6.503 6.585 6.660 6.731 6.796 6.858
13 6.112 6.217 6.312 6.398 6.478 6.551 6.620 6.684 6.744
14 6.029 6.132 6.224 6.309 6.387 6.459 6.526 6.588 6.647
15 5.958 6.059 6.149 6.233 6.309 6.379 6.445 6.506 6.564
16 5.897 5.995 6.084 6.166 6.241 6.310 6.374 6.434 6.491
17 5.842 5.940 6.027 6.107 6.181 6.249 6.313 6.372 6.427
18 5.794 5.890 5.977 6.055 6.128 6.195 6.258 6.316 6.371
19 5.752 5.846 5.932 6.009 6.081 6.147 6.209 6.267 6.321
20 5.714 5.807 5.891 5.968 6.039 6.104 6.165 6.222 6.275

24 5.594 5.683 5.764 5.838 5.906 5.968 6.027 6.081 6.132
30 5.475 5.561 5.638 5.709 5.774 5.833 5.889 5.941 5.990
40 5.358 5.439 5.513 5.581 5.642 5.700 5.753 5.803 5.849
60 5.241 5.319 5.389 5.453 5.512 5.566 5.617 5.664 5.708
120 5.126 5.200 5.266 5.327 5.382 5.434 5.481 5.526 5.568
∞ 5.012 5.081 5.144 5.201 5.253 5.301 5.346 5.388 5.427












TABLE B.--Percentage points of the studentized range q(a;p,v)*-Continued

a = .05

p
 38 40 50 60 70 80 90 100
1 68.26 68.92 71.73 73.97 75.82 77.40 78.77 79.98
2 19.11 19.28 20.05 20.66 21.16 21.59 21.96 22.29
3 12.75 12.87 13.36 13.76 14.08 14.36 14.61 14.82
4 10.44 10.53 10.93 11.24 11.51 11.73 11.92 12.09
5 9.250 9.330 9.674 9.949 10.18 10.38 10.54 10.69
6 8.529 8.601 8.913 9.163 9.370 9.548 9.702 9.839
7 8.043 8.110 8.400 8.632 8.824 8.989 9.133 9.261
8 7.693 7.756 8.029 8.248 8.430 8.586 8.722 8.843
9 7.428 7.488 7.749 7.958 8.132 8.281 8.410 8.526
10 7.220 7.279 7.529 7.730 7.897 8.041 8.166 8.276

11 7.053 7.110 7.352 7.546 7.708 7.847 7.968 8.075
12 6.916 6.970 7.205 7.394 7.552 7.687 7.804 7.909
13 6.800 6.854 7.083 7.267 7.421 7.552 7.667 7.769
14 6.702 6.754 6.979 7.159 7.309 7.438 7.550 7.650
15 6.618 6.669 6.888 7.065 7.212 7.339 7.449 7.546
16 6.544 6.594 6.810 6.984 7.128 7.252 7.360 7.457
17 6.479 6.529 6.741 6.912 7.054 7.176 7.283 7.377
18 6.422 6.471 6.680 6.848 6.989 7.109 7.213 7.307
19 6.371 6.419 6.626 6.792 6.930 7.048 7.152 7.244
20 6.325 6.373 6.576 6.740 6.877 6.994 7.097 7.187

24 6.181 6.226 6.421 6.579 6.710 6.822 6.920 7.008
30 6.037 6.080 6.267 6.417 6.543 6.650 6.744 6.827
40 5.893 5.934 6.112 6.255 6.375 6.477 6.566 6.645
60 5.750 5.789 5.958 6.093 6.206 6.303 6.387 6.462
120 5.607 5.644 5.802 5.929 6.035 6.126 6.205 6.275
∞ 5.463 5.498 5.646 5.764 5.863 5.947 6.020 6.085




TABLE B.-Percentage points of the studentized range q(a;p,v)*-Continued

a = .01

v 2 3 4 5 6 7 8 9 10
1 90.03 135.0 164.3 185.6 202.2 215.8 227.2 237.0 245.6
2 14.04 19.02 22.29 24.72 26.63 28.20 29.53 30.68 31.69
3 8.261 10.62 12.17 13.33 14.24 15.00 15.64 16.20 16.69
4 6.512 8.120 9.173 9.958 10.58 11.10 11.55 11.93 12.27
5 5.702 6.976 7.804 8.421 8.913 9.321 9.669 9.972 10.24
6 5.243 6.331 7.033 7.556 7.973 8.318 8.613 8.869 9.097
7 4.949 5.919 6.543 7.005 7.373 7.679 7.939 8.166 8.368
8 4.746 5.635 6.204 6.625 6.960 7.237 7.474 7.681 7.863
9 4.596 5.428 5.957 6.348 6.658 6.915 7.134 7.325 7.495
10 4.482 5.270 5.769 6.136 6.428 6.669 6.875 7.055 7.213

11 4.392 5.146 5.621 5.970 6.247 6.476 6.672 6.842 6.992
12 4.320 5.046 5.502 5.836 6.101 6.321 6.507 6.670 6.814
13 4.260 4.964 5.404 5.727 5.981 6.192 6.372 6.528 6.667
14 4.210 4.895 5.322 5.634 5.881 6.085 6.258 6.409 6.543
15 4.168 4.836 5.252 5.556 5.796 5.994 6.162 6.309 6.439
16 4.131 4.786 5.192 5.489 5.722 5.915 6.079 6.222 6.349
17 4.099 4.742 5.140 5.430 5.659 5.847 6.007 6.147 6.270
18 4.071 4.703 5.094 5.379 5.603 5.788 5.944 6.081 6.201
19 4.046 4.670 5.054 5.334 5.554 5.735 5.889 6.022 6.141
20 4.024 4.639 5.018 5.294 5.510 5.688 5.839 5.970 6.087

24 3.956 4.546 4.907 5.168 5.374 5.542 5.685 5.809 5.919
30 3.889 4.455 4.799 5.048 5.242 5.401 5.536 5.653 5.756
40 3.825 4.367 4.696 4.931 5.114 5.265 5.392 5.502 5.599
60 3.762 4.282 4.595 4.818 4.991 5.133 5.253 5.356 5.447
120 3.702 4.200 4.497 4.709 4.872 5.005 5.118 5.214 5.299
∞ 3.643 4.120 4.403 4.603 4.757 4.882 4.987 5.078 5.157






TABLE B. -Percentage points of the studentized range q(a;p, v)*-Continued

a = .01

11 12 13 14 15 16 17 18 19
1 253.2 260.0 266.2 271.8 277.0 281.8 286.3 290.4 294.3
2 32.59 33.40 34.13 34.81 35.43 36.00 36.53 37.03 37.50
3 17.13 17.53 17.89 18.22 18.52 18.81 19.07 19.32 19.55
4 12.57 12.84 13.09 13.32 13.53 13.73 13.91 14.08 14.24
5 10.48 10.70 10.89 11.08 11.24 11.40 11.55 11.68 11.81
6 9.301 9.485 9.653 9.808 9.951 10.08 10.21 10.32 10.43
7 8.548 8.711 8.860 8.997 9.124 9.242 9.353 9.456 9.554
8 8.027 8.176 8.312 8.436 8.552 8.659 8.760 8.854 8.943
9 7.647 7.784 7.910 8.025 8.132 8.232 8.325 8.412 8.495
10 7.356 7.485 7.603 7.712 7.812 7.906 7.993 8.076 8.153

11 7.128 7.250 7.362 7.465 7.560 7.649 7.732 7.809 7.883
12 6.943 7.060 7.167 7.265 7.356 7.441 7.520 7.594 7.665
13 6.791 6.903 7.006 7.101 7.188 7.269 7.345 7.417 7.485
14 6.664 6.772 6.871 6.962 7.047 7.126 7.199 7.268 7.333
15 6.555 6.660 6.757 6.845 6.927 7.003 7.074 7.142 7.204
16 6.462 6.564 6.658 6.744 6.823 6.898 6.967 7.032 7.093
17 6.381 6.480 6.572 6.656 6.734 6.806 6.873 6.937 6.997
18 6.310 6.407 6.497 6.579 6.655 6.725 6.792 6.854 6.912
19 6.247 6.342 6.430 6.510 6.585 6.654 6.719 6.780 6.837
20 6.191 6.285 6.371 6.450 6.523 6.591 6.654 6.714 6.771

24 6.017 6.106 6.186 6.261 6.330 6.394 6.453 6.510 6.563
30 5.849 5.932 6.008 6.078 6.143 6.203 6.259 6.311 6.361
40 5.686 5.764 5.835 5.900 5.961 6.017 6.069 6.119 6.165
60 5.528 5.601 5.667 5.728 5.785 5.837 5.886 5.931 5.974
120 5.375 5.443 5.505 5.562 5.614 5.662 5.708 5.750 5.790
∞ 5.227 5.290 5.348 5.400 5.448 5.493 5.535 5.574 5.611







TABLE B.--Percentage points of the studentized range q(a;p,v)*-Continued

a = .01

2" 20 22 24 26 28 30 32 34 36
1 298.0 304.7 310.8 316.3 321.3 326.0 330.3 334.3 338.0
2 37.95 38.76 39.49 40.15 40.76 41.32 41.84 42.33 42.78
3 19.77 20.17 20.53 20.86 21.16 21.44 21.70 21.95 22.17
4 14.40 14.68 14.93 15.16 15.37 15.57 15.75 15.92 16.08
5 11.93 12.16 12.36 12.54 12.71 12.87 13.02 13.15 13.28
6 10.54 10.73 10.91 11.06 11.21 11.34 11.47 11.58 11.69
7 9.646 9.815 9.970 10.11 10.24 10.36 10.47 10.58 10.67
8 9.027 9.182 9.322 9.450 9.569 9.678 9.779 9.874 9.964
9 8.573 8.717 8.847 8.966 9.075 9.177 9.271 9.360 9.443
10 8.226 8.361 8.483 8.595 8.698 8.794 8.883 8.966 9.044

11 7.952 8.080 8.196 8.303 8.400 8.491 8.575 8.654 8.728
12 7.731 7.853 7.964 8.066 8.159 8.246 8.327 8.402 8.473
13 7.548 7.665 7.772 7.870 7.960 8.043 8.121 8.193 8.262
14 7.395 7.508 7.611 7.705 7.792 7.873 7.948 8.018 8.084
15 7.264 7.374 7.474 7.566 7.650 7.728 7.800 7.869 7.932
16 7.152 7.258 7.356 7.445 7.527 7.602 7.673 7.739 7.802
17 7.053 7.158 7.253 7.340 7.420 7.493 7.563 7.627 7.687
18 6.968 7.070 7.163 7.247 7.325 7.398 7.465 7.528 7.587
19 6.891 6.992 7.082 7.166 7.242 7.313 7.379 7.440 7.498
20 6.823 6.922 7.011 7.092 7.168 7.237 7.302 7.362 7.419

24 6.612 6.705 6.789 6.865 6.936 7.001 7.062 7.119 7.173
30 6.407 6.494 6.572 6.644 6.710 6.772 6.828 6.881 6.932
40 6.209 6.289 6.362 6.429 6.490 6.547 6.600 6.650 6.697
60 6.015 6.090 6.158 6.220 6.277 6.330 6.378 6.424 6.467
120 5.827 5.897 5.959 6.016 6.069 6.117 6.162 6.204 6.244
∞ 5.645 5.709 5.766 5.818 5.866 5.911 5.952 5.990 6.026
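The studentized range percentage points in Table B can also be reproduced numerically; a one-line check (ours; assumes scipy 1.7 or later, which provides scipy.stats.studentized_range):

    from scipy.stats import studentized_range

    # q(.05; p, v): upper 5% point for p means and v error d.f.
    print(round(studentized_range.ppf(0.95, 3, 10), 3))   # 3.877, as tabled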






[Two pages are missing from this copy; they evidently contained the first part of Table C (critical values for Duncan's Multiple Range Test, a = .05, p = 2 to 19).]









TABLE C.-Critical values for Duncan's Multiple Range Test-Continued

a = .05

[This panel (p = 20 to 100) is too badly garbled in this copy to reconstruct reliably. The legible fragments show that for v ≤ 20 the a = .05 critical values are constant in p over this range (17.97, 6.085, 4.516, 4.033, 3.814, 3.697, 3.626, 3.579, 3.547, 3.526, 3.510, 3.499, 3.490, 3.485, 3.481, 3.478, 3.476, 3.474, 3.474, 3.474 for v = 1, . . ., 20), with the entries diverging across p only for v ≥ 24.]







TABLE C.-Critical values for Duncan's Multiple Range Test-Continued

a = .01

P 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
1 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03 90.03
2 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04 14.04
3 8.261 8.321 8.321 8.321 8.321 8.321 8.321 8.321 8.321 8.321 8.321 8.321 8.321 8.321 8.321 8.321 8.321 8.321
4 6.512 6.677 6.740 6.756 6.756 6.756 6.756 6.756 6.756 6.756 6.756 6.756 6.756 6.756 6.756 6.756 6.756 6.756
5 5.702 5.893 5.989 6.040 6.065 6.074 6.074 6.074 6.074 6.074 6.074 6.074 6.074 6.074 6.074 6.074 6.074 6.074
6 5.243 5.439 5.549 5.614 5.655 5.680 5.694 5.701 5.703 5.703 5.703 5.703 5.703 5.703 5.703 5.703 5.703 5.703
7 4.949 5.145 5.260 5.334 5.383 5.416 5.439 5.454 5.464 5.470 5.472 5.472 5.472 5.472 5.472 5.472 5.472 5.472
8 4.746 4.939 5.057 5.135 5.189 5.227 5.256 5.276 5.291 5.302 5.309 5.314 5.316 5.317 5.317 5.317 5.317 5.317
9 4.596 4.787 4.906 4.986 5.043 5.086 5.118 5.142 5.160 5.174 5.185 5.193 5.199 5.203 5.205 5.206 5.206 5.206
10 4.482 4.671 4.790 4.871 4.931 4.975 5.010 5.037 5.058 5.074 5.088 5.098 5.106 5.112 5.117 5.120 5.122 5.124

11 4.392 4.579 4.697 4.780 4.841 4.887 4.924 4.952 4.975 4.994 5.009 5.021 5.031 5.039 5.045 5.050 5.054 5.057
12 4.320 4.504 4.622 4.706 4.767 4.815 4.852 4.883 4.907 4.927 4.944 4.958 4.969 4.978 4.986 4.993 4.998 5.002
13 4.260 4.442 4.560 4.644 4.706 4.755 4.793 4.824 4.850 4.872 4.889 4.904 4.917 4.928 4.937 4.944 4.950 4.956
14 4.210 4.391 4.508 4.591 4.654 4.704 4.743 4.775 4.802 4.824 4.843 4.859 4.872 4.884 4.894 4.902 4.910 4.916
15 4.168 4.347 4.463 4.547 4.610 4.660 4.700 4.733 4.760 4.783 4.803 4.820 4.834 4.846 4.857 4.866 4.874 4.881
16 4.131 4.309 4.425 4.509 4.572 4.622 4.663 4.696 4.724 4.748 4.768 4.786 4.800 4.813 4.825 4.835 4.844 4.851
17 4.099 4.275 4.391 4.475 4.539 4.589 4.630 4.664 4.693 4.717 4.738 4.756 4.771 4.785 4.797 4.807 4.816 4.824
18 4.071 4.246 4.362 4.445 4.509 4.560 4.601 4.635 4.664 4.689 4.711 4.729 4.745 4.759 4.772 4.783 4.792 4.801
19 4.046 4.220 4.335 4.419 4.483 4.534 4.575 4.610 4.639 4.665 4.686 4.705 4.722 4.736 4.749 4.761 4.771 4.780
20 4.024 4.197 4.312 4.395 4.459 4.510 4.552 4.587 4.617 4.642 4.664 4.684 4.701 4.716 4.729 4.741 4.751 4.761

24 3.956 4.126 4.239 4.322 4.386 4.437 4.480 4.516 4.546 4.573 4.596 4.616 4.634 4.651 4.665 4.678 4.690 4.700
30 3.889 4.056 4.168 4.250 4.314 4.366 4.409 4.445 4.477 4.504 4.528 4.550 4.569 4.586 4.601 4.615 4.628 4.640
40 3.825 3.988 4.098 4.180 4.244 4.296 4.339 4.376 4.408 4.436 4.461 4.483 4.503 4.521 4.537 4.553 4.566 4.579
60 3.762 3.922 4.031 4.111 4.174 4.226 4.270 4.307 4.340 4.368 4.394 4.417 4.438 4.456 4.474 4.490 4.504 4.518
120 3.702 3.858 3.965 4.044 4.107 4.158 4.202 4.239 4.272 4.301 4.327 4.351 4.372 4.392 4.410 4.426 4.442 4.456
∞ 3.643 3.796 3.900 3.978 4.040 4.091 4.135 4.172 4.205 4.235 4.261 4.285 4.307 4.327 4.345 4.363 4.379 4.394









TABLE C.-Critical values for Duncan's Multiple Range Test-Continued

a = .01

[This panel (p = 20 to 100) is too badly garbled in this copy to reconstruct reliably. The legible fragments show that for v ≤ 10 the a = .01 critical values are constant in p over this range (90.03, 14.04, 8.321, 6.756, 6.074, 5.703, 5.472, 5.317, 5.206, 5.124 for v = 1, . . ., 10), with the entries diverging across p at larger v.]

Source: Reproduced from H. Leon Harter, Critical Values for Duncan's New Multiple Range Test, Biometrics, vol. 16 (1960), with the permission of the author and the editor.




TABLE D1.-Critical values of k-ratio t test (k = 100)

v (denominator d.f. for F)

q (num. d.f. for F)    6    8    10    12    14    16    18    20    24    30    40    60    120


F = 1.2 (a = .913, b = 2.449); F = 1.4 (a = .845, b = 1.871); F = 1.7 (a = .767, b = 1.558); F = 2.0 (a = .707, b = 1.414)

[These four panels are too badly garbled in this copy to reconstruct reliably.]

See footnotes at end of table.








TABLE D1.-Critical values of k-ratio t test (k = 100)-Continued
v(denominator d.f. for F)

q(num. d.f. for F) 6 8 10 12 14 16 18 20 24 30 40 60 120

F = 2.4 (a = .645, b = 1.309)
2 2.18
4 2.71 2.63 2.57 2.53 2.49 2.47 2.44 2.43 2.40 2.37 2.34 2.31 2.28
6 2.75 2.68 2.63 2.58 2.55 2.52 2.50 2.48 2.46 2.42 2.39 2.36 2.32
8 2.77 2.71 2.66 2.62 2.59 2.56 2.54 2.52 2.49 2.45 2.42 2.38 2.34
10 2.79 2.73 2.68 2.64 2.61 2.58 2.56 2.54 2.50 2.47 2.43 2.39 2.34
12 2.79 2.74 2.70 2.66 2.62 2.60 2.57 2.55 2.52 2.48 2.44 2.39 2.35
14 2.80 2.75 2.71 2.67 2.64 2.61 2.58 2.56 2.53 2.49 2.44 2.40 2.35
16 2.81 2.76 2.72 2.68 2.65 2.62 2.59 2.57 2.53 2.49 2.45 2.40 2.34
20 2.82 2.77 2.73 2.69 2.66 2.63 2.60 2.58 2.54 2.50 2.45 2.40 2.34
40 2.83 2.80 2.76 2.72 2.69 2.66 2.63 2.60 2.56 2.51 2.46 2.39 2.33
100 2.84 2.81 2.78 2.74 2.71 2.67 2.64 2.62 2.57 2.51 2.45 2.39 2.32
∞ 2.85 2.83 2.79 2.76 2.72 2.68 2.65 2.62 2.57 2.51 2.45 2.38 2.31

F = 3.0 (a = .577, b = 1.225)

2 2.41 2.36 2.32 2.29 2.27 2.25 2.22 2.20 2.17 2.14 2.11
4 2.68 2.57 2.50 2.45 2.41 2.38 2.35 2.33 2.30 2.27 2.24 2.20 2.17
6 2.71 2.61 2.54 2.49 2.44 2.41 2.39 2.36 2.33 2.29 2.26 2.22 2.18
8 2.72 2.63 2.56 2.51 2.47 2.43 2.40 2.38 2.34 2.31 2.27 2.22 2.18
10 2.74 2.65 2.58 2.52 2.48 2.44 2.41 2.39 2.35 2.31 2.27 2.22 2.18
12 2.74 2.66 2.59 2.53 2.49 2.45 2.42 2.40 2.36 2.31 2.27 2.22 2.18
14 2.75 2.66 2.60 2.54 2.49 2.46 2.43 2.40 2.36 2.32 2.27 2.22 2.17
16 2.75 2.67 2.60 2.55 2.50 2.46 2.43 2.40 2.36 2.32 2.27 2.22 2.17
20 2.76 2.68 2.61 2.55 2.51 2.47 2.43 2.41 2.36 2.32 2.27 2.22 2.17
40 2.77 2.70 2.63 2.57 2.52 2.48 2.44 2.41 2.37 2.32 2.26 2.21 2.16
100 2.78 2.71 2.64 2.58 2.53 2.49 2.45 2.42 2.37 2.31 2.26 2.21 2.16
∞ 2.79 2.71 2.65 2.59 2.53 2.49 2.45 2.42 2.37 2.31 2.26 2.20 2.15

F = 4.0 (a = .500, b = 1.155)

2 2.58 2.44 2.35 2.29 2.25 2.22 2.20 2.18 2.15 2.12 2.09 2.06 2.03
4 2.63 2.50 2.41 2.35 2.30 2.27 2.24 2.22 2.18 2.15 2.12 2.08 2.05
6 2.65 2.52 2.43 2.37 2.32 2.28 2.25 2.23 2.19 2.16 2.12 2.08 2.04
10 2.67 2.55 2.46 2.39 2.34 2.30 2.26 2.24 2.20 2.16 2.12 2.08 2.04
20 2.69 2.57 2.47 2.40 2.35 2.30 2.27 2.24 2.20 2.15 2.11 2.07 2.03
∞ 2.71 2.59 2.49 2.42 2.36 2.31 2.27 2.24 2.19 2.15 2.11 2.06 2.02

F = 6.0 (a= .408, b = 1.095)

2 2.53 2.37 2.27 2.21 2.16 2.13 2.10 2.08 2.05 2.02 1.99 1.96 1.93
4 2.56 2.40 2.30 2.23 2.18 2.14 2.12 2.09 2.06 2.02 1.99 1.96 1.93
6 2.58 2.42 2.31 2.24 2.19 2.15 2.12 2.09 2.06 2.02 1.99 1.95 1.92
10 2.59 2.43 2.32 2.24 2.19 2.15 2.12 2.09 2.06 2.02 1.99 1.95 1.92
20 2.60 2.44 2.32 2.25 2.19 2.15 2.12 2.09 2.05 2.02 1.98 1.95 1.92
∞ 2.61 2.44 2.33 2.25 2.19 2.15 2.12 2.09 2.05 2.02 1.98 1.95 1.92

See footnotes at end of table.




v(denominator d.f. for F)

q(num. d.f. for F) 6 8 10 12 14 16 18 20 24 30 40 60 120
F= 10.0 (a = .316, b = 1.054)

2 2.48 2.30 2.19 2.12 2.07 2.04 2.01 1.99 1.96 1.93 1.90 1.87 1.85
4 2.49 2.31 2.20 2.13 2.08 2.04 2.01 1.99 1.96 1.93 1.90 1.87 1.84
6 2.50 2.31 2.20 2.13 2.08 2.04 2.01 1.99 1.96 1.93 1.90 1.87 1.84
10-∞ 2.51 2.32 2.20 2.13 2.08 2.04 2.01 1.99 1.96 1.93 1.90 1.87 1.84

F = 25.0(a = .200, b = 1.021)

2-4 2.40 2.20 2.10 2.03 1.99 1.95 1.93 1.91 1.88 1.86 1.83 1.80 1.78
6-∞ 2.41 2.21 2.10 2.03 1.99 1.95 1.93 1.91 1.88 1.86 1.83 1.80 1.78

F = ∞ (a = 0, b = 1)

2-∞ 2.33 2.13 2.03 1.97 1.93 1.90 1.88 1.86 1.84 1.81 1.79 1.76 1.74

*All differences not significant. a = 1/√F, b = [F/(F-1)]^1/2.
If v = 4, t = 2.83 for all q and F satisfying √F > 8.12/q.
Source: Reproduced from Waller, Ray A., and Duncan, David B., A Bayes Rule for the Symmetric Multiple
Comparisons Problem, Corrigenda, Journal of the American Statistical Association, vol. 67 (1972), pp. 253-255,
with the permission of the author and the publisher.
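The footnote constants a and b can be checked directly against the panel headings; a short sketch (ours):

    import math

    for F in (1.2, 1.4, 1.7, 2.0, 2.4, 3.0, 4.0, 6.0, 10.0, 25.0):
        a = 1 / math.sqrt(F)
        b = math.sqrt(F / (F - 1))
        print(F, round(a, 3), round(b, 3))
    # e.g., F = 2.4 gives a = 0.645 and b = 1.309, matching the heading above.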















































TABLE D2.-Critical values of k-ratio t test (k = 500)

v (denominator d.f. for F)

q (num. d.f. for F) 6 8 10 12 14 16 18 20 24 30 40 60 120

F= 1.2 (a = .913, b = 2.449)

2-16 *
20 4.70 4.82 4.89
40 4.75 4.91 5.03 5.12 5.20 5.25 5.30 5.34 5.41 5.48 5.55 5.61 5.67
100 4.79 4.98 5.13 5.25 5.34 5.43 5.50 5.56 5.65 5.76 5.89 6.02 6.13
∞ 4.81 5.03 5.20 5.34 5.46 5.56 5.65 5.73 5.86 6.02 6.20 6.41 6.56

F = 1.4 (a= .845, b = 1.871)

2-14 *
16 4.61 4.66 4.68 4.69 4.69 4.69 4.69 4.68 4.67 4.65 4.62 4.58 4.53
20 4.64 4.70 4.73 4.75 4.76 4.77 4.77 4.76 4.76 4.74 4.72 4.68 4.62
40 4.68 4.78 4.85 4.89 4.92 4.94 4.96 4.96 4.97 4.97 4.95 4.90 4.81
∞ 4.74 4.88 4.99 5.06 5.12 5.17 5.20 5.23 5.26 5.28 5.26 5.16 4.82

F = 1.7 (a = .767, b = 1.558)

2-8 *
10 4.08 4.02 3.95 3.87
12 4.50 4.46 4.42 4.38 4.34 4.30 4.27 4.24 4.19 4.14 4.07 3.99 3.90
20 4.55 4.54 4.52 4.49 4.46 4.43 4.40 4.37 4.32 4.26 4.18 4.08 3.95
40 4.59 4.61 4.61 4.60 4.57 4.55 4.52 4.49 4.44 4.36 4.26 4.12 3.93
∞ 4.64 4.69 4.71 4.72 4.71 4.69 4.66 4.63 4.57 4.46 4.31 4.07 3.76

F = 2.0 (a = .707, b = 1.414)

2-6 *
8 3.98 3.93 3.89 3.83 3.76 3.69 3.60 3.51
10 4.41 4.31 4.22 4.15 4.08 4.03 3.98 3.94 3.88 3.80 3.72 3.63 3.53
20 4.48 4.41 4.34 4.27 4.21 4.16 4.10 4.06 3.98 3.89 3.78 3.65 3.51
40 4.51 4.47 4.41 4.35 4.29 4.23 4.17 4.12 4.03 3.92 3.78 3.62 3.44
∞ 4.55 4.53 4.49 4.43 4.37 4.31 4.25 4.19 4.07 3.93 3.75 3.54 3.33

F = 2.4 (a = .645, b = 1.309)

2-4 *
6 3.77 3.71 3.65 3.61 3.54 3.47 3.39 3.30 3.22
8 4.31 4.14 4.01 3.91 3.83 3.76 3.70 3.66 3.58 3.50 3.41 3.32 3.22
10 4.33 4.18 4.05 3.95 3.87 3.79 3.73 3.68 3.60 3.51 3.42 3.31 3.21
20 4.39 4.26 4.14 4.04 3.95 3.87 3.80 3.74 3.64 3.53 3.41 3.28 3.15
∞ 4.45 4.35 4.25 4.14 4.03 3.94 3.85 3.78 3.64 3.50 3.34 3.18 3.04

F = 3.0 (a = .577, b = 1.225)

2 *
4 3.43 3.38 3.33 3.26 3.19 3.12 3.04 2.97
6 4.19 3.95 3.79 3.66 3.56 3.49 3.43 3.37 3.30 3.21 3.13 3.04 2.95
10 4.24 4.02 3.85 3.72 3.62 3.53 3.46 3.40 3.31 3.21 3.12 3.02 2.92
20 4.28 4.08 3.91 3.77 3.65 3.56 3.48 3.41 3.31 3.20 3.09 2.98 2.87
∞ 4.33 4.15 3.97 3.82 3.69 3.57 3.48 3.40 3.28 3.15 3.03 2.92 2.82






TABLE D2.-Critical values of k-ratio t test (k = 500)-Continued

v (denominator d.f. for F)

q (num. d.f. for F)    6    8    10    12    14    16    18    20    24    30    40    60    120


F = 4.0 (a = .500, b = 1.155); F = 6.0 (a = .408, b = 1.095); F = 10.0 (a = .316, b = 1.054); F = 25.0 (a = .200, b = 1.021); F = ∞ (a = 0, b = 1)

[These panels are too badly garbled in this copy to reconstruct reliably.]

*All differences not significant. a = 1/√F, b = [F/(F-1)]^1/2.
If v = 4, t = 4.52 for all q and F satisfying √F > 20.43/q.
Source: Reproduced from Waller, Ray A., and Duncan, David B., A Bayes Rule for the Symmetric Multiple
Comparisons Problem, Corrigenda, Journal of the American Statistical Association, vol. 67 (1972), pp. 253-255,
with the permission of the author and the publisher.
TABLE E.-100γ% points of the distribution of the largest absolute value of k uncorrelated Student t variates
with v degrees of freedom

v\k 1 2 3 4 5 6 8 10 12 15 20

[Sections for γ = 0.90, γ = 0.95, and γ = 0.99; tabulated values omitted.]

Source: Reproduced from Hahn and Hendrickson (1971), Biometrika 58, p. 323, with the permission of the authors and the publisher.






TABLE F1.-Critical values of t(a;q,v) for one-sided Dunnett's tests for comparing control against each of q other treatments

[Sections for a = .05 and a = .01, columns q = 1 to 9; tabulated values omitted.]

Source: Reproduced from C. W. Dunnett, A multiple comparison procedure for comparing several treatments with a control, Journal of the American Statistical
Association, vol. 50 (1955).


TABLE F2.-Critical values of t(a;q,v) for two-sided Dunnett's tests for comparing control against each of q
other treatments

a = .05
v\q 1 2 3 4 5 6 7 8 9 10 11 12 15 20
5 2.57 3.03 3.29 3.48 3.62 3.73 3.82 3.90 3.97 4.03 4.09 4.14 4.26 4.42
6 2.45 2.86 3.10 3.26 3.39 3.49 3.57 3.64 3.71 3.76 3.81 3.86 3.97 4.11
7 2.36 2.75 2.97 3.12 3.24 3.33 3.41 3.47 3.53 3.58 3.63 3.67 3.78 3.91
8 2.31 2.67 2.88 3.02 3.13 3.22 3.29 3.35 3.41 3.46 3.50 3.54 3.64 3.76
9 2.26 2.61 2.81 2.95 3.05 3.14 3.20 3.26 3.32 3.36 3.40 3.44 3.53 3.65

10 2.23 2.57 2.76 2.89 2.99 3.07 3.14 3.19 3.24 3.29 3.33 3.36 3.45 3.57
11 2.20 2.53 2.72 2.84 2.94 3.02 3.08 3.14 3.19 3.23 3.27 3.30 3.39 3.50
12 2.18 2.50 2.68 2.81 2.90 2.98 3.04 3.09 3.14 3.18 3.22 3.25 3.34 3.45
13 2.16 2.48 2.65 2.78 2.87 2.94 3.00 3.06 3.10 3.14 3.18 3.21 3.29 3.40
14 2.14 2.46 2.63 2.75 2.84 2.91 2.97 3.02 3.07 3.11 3.14 3.18 3.26 3.36

15 2.13 2.44 2.61 2.73 2.82 2.89 2.95 3.00 3.04 3.08 3.12 3.15 3.23 3.33
16 2.12 2.42 2.59 2.71 2.80 2.87 2.92 2.97 3.02 3.06 3.09 3.12 3.20 3.30
17 2.11 2.41 2.58 2.69 2.78 2.85 2.90 2.95 3.00 3.03 3.07 3.10 3.18 3.27
18 2.10 2.40 2.56 2.68 2.76 2.83 2.89 2.94 2.98 3.01 3.05 3.08 3.16 3.25
19 2.09 2.39 2.55 2.66 2.75 2.81 2.87 2.92 2.96 3.00 3.03 3.06 3.14 3.23

20 2.09 2.38 2.54 2.65 2.73 2.80 2.86 2.90 2.95 2.98 3.02 3.05 3.12 3.22
24 2.06 2.35 2.51 2.61 2.70 2.76 2.81 2.86 2.90 2.94 2.97 3.00 3.07 3.16
30 2.04 2.32 2.47 2.58 2.66 2.72 2.77 2.82 2.86 2.89 2.92 2.95 3.02 3.11
40 2.02 2.29 2.44 2.54 2.62 2.68 2.73 2.77 2.81 2.85 2.87 2.90 2.97 3.06
60 2.00 2.27 2.41 2.51 2.58 2.64 2.69 2.73 2.77 2.80 2.83 2.86 2.92 3.00

120 1.98 2.24 2.38 2.47 2.55 2.60 2.65 2.69 2.73 2.76 2.79 2.81 2.87 2.95
∞ 1.96 2.21 2.35 2.44 2.51 2.57 2.61 2.65 2.69 2.72 2.74 2.77 2.83 2.91
a = .01

v\q 1 2 3 4 5 6 7 8 9 10 11 12 15 20
5 4.03 4.63 4.98 5.22 5.41 5.56 5.69 5.80 5.89 5.98 6.05 6.12 6.30 6.52
6 3.71 4.21 4.51 4.71 4.87 5.00 5.10 5.20 5.28 5.35 5.41 5.47 5.62 5.81
7 3.50 3.95 4.21 4.39 4.53 4.64 4.74 4.82 4.89 4.95 5.01 5.06 5.19 5.36
8 3.36 3.77 4.00 4.17 4.29 4.40 4.48 4.56 4.62 4.68 4.73 4.78 4.90 5.05
9 3.25 3.63 3.85 4.01 4.12 4.22 4.30 4.37 4.43 4.48 4.53 4.57 4.68 4.82

10 3.17 3.53 3.74 3.88 3.99 4.08 4.16 4.22 4.28 4.33 4.37 4.42 4.52 4.65
11 3.11 3.45 3.65 3.79 3.89 3.98 4.05 4.11 4.16 4.21 4.25 4.29 4.39 4.52
12 3.05 3.39 3.58 3.71 3.81 3.89 3.96 4.02 4.07 4.12 4.16 4.19 4.29 4.41
13 3.01 3.33 3.52 3.65 3.74 3.82 3.89 3.94 3.99 4.04 4.08 4.11 4.20 4.32
14 2.98 3.29 3.47 3.59 3.69 3.76 3.83 3.88 3.93 3.97 4.01 4.05 4.13 4.24

15 2.95 3.25 3.43 3.55 3.64 3.71 3.78 3.83 3.88 3.92 3.95 3.99 4.07 4.18
16 2.92 3.22 3.39 3.51 3.60 3.67 3.73 3.78 3.83 3.87 3.91 3.94 4.02 4.13
17 2.90 3.19 3.36 3.47 3.56 3.63 3.69 3.74 3.79 3.83 3.86 3.90 3.98 4.08
18 2.88 3.17 3.33 3.44 3.53 3.60 3.66 3.71 3.75 3.79 3.83 3.86 3.94 4.04
19 2.86 3.15 3.31 3.42 3.50 3.57 3.63 3.68 3.72 3.76 3.79 3.83 3.90 4.00

20 2.85 3.13 3.29 3.40 3.48 3.55 3.60 3.65 3.69 3.73 3.77 3.80 3.87 3.97
24 2.80 3.07 3.22 3.32 3.40 3.47 3.52 3.57 3.61 3.64 3.68 3.70 3.78 3.87
30 2.75 3.01 3.15 3.25 3.33 3.39 3.44 3.49 3.52 3.56 3.59 3.62 3.69 3.78
40 2.70 2.95 3.09 3.19 3.26 3.32 3.37 3.41 3.44 3.48 3.51 3.53 3.60 3.68
60 2.66 2.90 3.03 3.12 3.19 3.25 3.29 3.33 3.37 3.40 3.42 3.45 3.51 3.59

120 2.62 2.85 2.97 3.06 3.12 3.18 3.22 3.26 3.29 3.32 3.35 3.37 3.43 3.51
∞ 2.58 2.79 2.92 3.00 3.06 3.11 3.15 3.19 3.22 3.25 3.27 3.29 3.35 3.42
Source: Reproduced from C. W. Dunnett, New tables for multiple comparisons with a control, Biometrics 20 (1964), with the
permission of the author and the editor.
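In use, a treatment mean is declared different from the control when the two means differ by more than t(a;q,v) times the standard error of their difference. A minimal sketch, assuming a common error mean square s2; the 2.65 below is the a = .05, v = 20, q = 4 entry of the table above, and the function name is ours.

    import math

    def dunnett_allowance(t_crit, s2, n_control, n_treatment):
        # Half-width of Dunnett's two-sided comparison of one treatment
        # with the control: t(a; q, v) * s * sqrt(1/n0 + 1/ni).
        return t_crit * math.sqrt(s2 * (1.0 / n_control + 1.0 / n_treatment))

    d = dunnett_allowance(2.65, 2.8, 5, 5)
    print(round(d, 2))   # about 2.80

With 5 replications throughout and s2 = 2.8 on 20 d.f., any of the 4 treatment means differing from the control mean by at least 2.80 would be declared significant at the joint .05 level.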





TABLE G.-Critical values of t(a;p,v) for testing zero against nonzero dose levels
(p = number of nonzero levels)

a = .05
v\p 1 2 3 4 5 6 7 8 9 10
5 2.02 2.14 2.19 2.21 2.22 2.23 2.24 2.24 2.25 2.25
6 1.94 2.06 2.10 2.12 2.13 2.14 2.14 2.15 2.15 2.15
7 1.89 2.00 2.04 2.06 2.07 2.08 2.08 2.09 2.09 2.09
8 1.86 1.96 2.00 2.01 2.02 2.03 2.04 2.04 2.04 2.04
9 1.83 1.93 1.96 1.98 1.99 2.00 2.00 2.01 2.01 2.01

10 1.81 1.91 1.94 1.96 1.97 1.97 1.98 1.98 1.98 1.98
11 1.80 1.89 1.92 1.94 1.94 1.95 1.95 1.96 1.96 1.96
12 1.78 1.87 1.90 1.92 1.93 1.93 1.94 1.94 1.94 1.94
13 1.77 1.86 1.89 1.90 1.91 1.92 1.92 1.93 1.93 1.93
14 1.76 1.85 1.88 1.89 1.90 1.91 1.91 1.91 1.92 1.92

15 1.75 1.84 1.87 1.88 1.89 1.90 1.90 1.90 1.90 1.91
16 1.75 1.83 1.86 1.87 1.88 1.89 1.89 1.89 1.90 1.90
17 1.74 1.82 1.85 1.87 1.87 1.88 1.88 1.89 1.89 1.89
18 1.73 1.82 1.85 1.86 1.87 1.87 1.88 1.88 1.88 1.88
19 1.73 1.81 1.84 1.85 1.86 1.87 1.87 1.87 1.87 1.88

20 1.72 1.81 1.83 1.85 1.86 1.86 1.86 1.87 1.87 1.87
22 1.72 1.80 1.83 1.84 1.85 1.85 1.85 1.86 1.86 1.86
24 1.71 1.79 1.82 1.83 1.84 1.84 1.85 1.85 1.85 1.85
26 1.71 1.79 1.81 1.82 1.83 1.84 1.84 1.84 1.84 1.85
28 1.70 1.78 1.81 1.82 1.83 1.83 1.83 1.84 1.84 1.84

30 1.70 1.78 1.80 1.81 1.82 1.83 1.83 1.83 1.83 1.83
35 1.69 1.77 1.79 1.80 1.81 1.82 1.82 1.82 1.82 1.83
40 1.68 1.76 1.79 1.80 1.80 1.81 1.81 1.81 1.82 1.82
60 1.67 1.75 1.77 1.78 1.79 1.79 1.80 1.80 1.80 1.80
120 1.66 1.73 1.75 1.77 1.77 1.78 1.78 1.78 1.78 1.78
∞ 1.645 1.716 1.739 1.750 1.756 1.760 1.763 1.765 1.767 1.768










TABLE G.-Critical values of t(a;p,v) for testing zero against nonzero dose levels
(p = number of nonzero levels)-Continued

a = .01

v\p 1 2 3 4 5 6 7 8 9 10
5 3.36 3.50 3.55 3.57 3.59 3.60 3.60 3.61 3.61 3.61
6 3.14 3.26 3.29 3.31 3.32 3.33 3.34 3.34 3.34 3.35
7 3.00 3.10 3.13 3.15 3.16 3.16 3.17 3.17 3.17 3.17
8 2.90 2.99 3.01 3.03 3.04 3.04 3.05 3.05 3.05 3.05
9 2.82 2.90 2.93 2.94 2.95 2.95 2.96 2.96 2.96 2.96

10 2.76 2.84 2.86 2.88 2.88 2.89 2.89 2.89 2.90 2.90
11 2.72 2.79 2.81 2.82 2.83 2.83 2.84 2.84 2.84 2.84
12 2.68 2.75 2.77 2.78 2.79 2.79 2.79 2.80 2.80 2.80
13 2.65 2.72 2.74 2.75 2.75 2.76 2.76 2.76 2.76 2.76
14 2.62 2.69 2.71 2.72 2.72 2.73 2.73 2.73 2.73 2.73

15 2.60 2.66 2.68 2.69 2.70 2.70 2.70 2.71 2.71 2.71
16 2.58 2.64 2.66 2.67 2.68 2.68 2.68 2.68 2.68 2.69
17 2.57 2.63 2.64 2.65 2.66 2.66 2.66 2.66 2.67 2.67
18 2.55 2.61 2.63 2.64 2.64 2.64 2.65 2.65 2.65 2.65
19 2.54 2.60 2.61 2.62 2.63 2.63 2.63 2.63 2.63 2.63

20 2.53 2.58 2.60 2.61 2.61 2.62 2.62 2.62 2.62 2.62
22 2.51 2.56 2.58 2.59 2.59 2.59 2.60 2.60 2.60 2.60
24 2.49 2.55 2.56 2.57 2.57 2.57 2.58 2.58 2.58 2.58
26 2.48 2.53 2.55 2.55 2.56 2.56 2.56 2.56 2.56 2.56
28 2.47 2.52 2.53 2.54 2.54 2.55 2.55 2.55 2.55 2.55

30 2.46 2.51 2.52 2.53 2.53 2.54 2.54 2.54 2.54 2.54
35 2.44 2.49 2.50 2.51 2.51 2.51 2.51 2.52 2.52 2.52
40 2.42 2.47 2.48 2.49 2.49 2.50 2.50 2.50 2.50 2.50
60 2.39 2.43 2.45 2.45 2.46 2.46 2.46 2.46 2.46 2.46
120 2.36 2.40 2.41 2.42 2.42 2.42 2.42 2.42 2.42 2.43
∞ 2.326 2.366 2.377 2.382 2.385 2.386 2.387 2.388 2.389 2.389

Source: Reproduced from D. A. Williams, A test for differences between treatment means when
several dose levels are compared with a zero dose control, Biometrics 27 (1971), with the permission of
the author and the editor.
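Williams' procedure first replaces the observed dose means by monotone non-decreasing amalgamated estimates, then compares the highest-dose estimate with the zero-dose mean using t(a;p,v) from this table; if that test is significant, the next dose down is tested against t(a;p-1,v), and so on. The sketch below assumes equal replication n at every level and uses pool-adjacent-violators averaging; the function names are ours.

    import math

    def isotonic_means(means):
        # Pool adjacent violators: non-decreasing estimates of the dose
        # means (equal replication, so pooling is a plain average).
        blocks = []                                # [pooled mean, block size]
        for m in means:
            blocks.append([m, 1])
            while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
                m2, w2 = blocks.pop()
                m1, w1 = blocks.pop()
                blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
        est = []
        for m, w in blocks:
            est.extend([m] * w)
        return est

    def williams_t(control_mean, dose_means, s2, n):
        # t-bar for the highest dose: (M_p - y0) / sqrt(2 * s2 / n);
        # refer the result to t(a; p, v) in Table G.
        m_hat = isotonic_means(dose_means)
        return (m_hat[-1] - control_mean) / math.sqrt(2.0 * s2 / n)

    print(round(williams_t(10.2, [10.8, 10.6, 11.5], 0.9, 4), 2))   # 1.94

Here p = 3, and with v = 20 error d.f. the a = .05 critical value above is 1.83, so 1.94 would lead to declaring the highest dose different from the zero dose.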






LIST OF REFERENCES


Alam, K., and Saxena, K. M. L. 1974. On Interval Estimation of a Ranked Parameter. Jour. Roy. Statis. Soc.
B 36: 277-283.
Anderson, V. L., and McLean, R. A. 1974. Design of Experiments: A Realistic Approach. Marcel Dekker,
Inc., New York.
Arvesen, J. N., and McCabe, G. P., Jr. 1975. Subset Selection Problems for Variances With Applications to
Regression Analysis. Jour. Amer. Statis. Assoc. 70: 166-170.
Balaam, L. N. 1963. Multiple Comparisons: A Sampling Experiment. Austral. Jour. Statis. 5: 62-85.
Bancroft, T. A. 1968. Topics in Intermediate Statistical Methods. V. 1. Iowa State Univ. Press, Ames.
Barlow, R. E., and Gupta, S. S. 1969. Selection Procedures for Restricted Families of Probability Distribu-
tions. Ann. Math. Statis. 40: 905-934.
Bartholomew, D. J. 1961. Ordered Tests in the Analysis of Variance. Biometrika 48: 325-332.
Bechhofer, R. E. 1968. Single-Stage Procedures for Ranking Multiply-Classified Variances of Normal
Populations. Technometrics 10: 693-714.
___ 1969. Optimal Allocation of Observations When Comparing Several Treatments With a Control. In
Multivariate Analysis-II, P. R. Krishnaiah, ed., pp. 463-473. Academic Press, New York.
___, Kiefer, J., and Sobel, M. 1968. Sequential Identification and Ranking Procedures. Univ. Chicago
Press, Chicago.
___, Elmaghraby, S., and Morse, N. 1959. A Single-Sample Multiple-Decision Procedure for Selecting the
Multinomial Event Which Has the Highest Probability. Ann. Math. Statis. 30: 102-119.
Bernhardson, C. A. 1975. Type I Error Rates When Multiple Comparison Procedures Follow a Significant F
Test of ANOVA. Biometrics 31: 229-232.
Beyer, W. H., ed. 1968. Handbook of Tables for Probability and Statistics. 2d ed. The Chemical Rubber Co.,
Cleveland.
Bhargava, R. P., and Srivastava, M. S. 1973. On Tukey's Confidence Intervals for the Contrasts of Means for
the Intraclass Correlation Model. Jour. Roy. Statis. Soc. B 35: 147-152.
Bland, R. P., and Bratcher, T. L. 1968. A Bayesian Approach to the Problem of Ranking Binomial
Probabilities. SIAM Jour. Appl. Math. 16: 843-850.
Boardman, T. J., and Moffitt, D. R. 1971. Graphical Monte Carlo Type I Error Rates for Multiple Comparison
Procedures. Biometrics 27: 738-744.
Bohrer, R. 1967. On Sharpening Scheffe's Bounds. Jour. Roy. Statis. Soc. B 29: 110-114.
Box, G. E. P., and Hunter, J. S. 1958. Experimental Designs for Exploring Response Surfaces. In
Experimental Designs in Industry. Victor Chew, ed., pp. 138-190. John Wiley and Sons, Inc., New York.
Bradu, D., and Gabriel, K. R. 1974. Simultaneous Statistical Inference on Interactions in Two-Way Analysis
of Variance. Jour. Amer. Statis. Assoc. 69: 428-436.
Brown, M. B., and Forsythe, A. B. 1974. The ANOVA and Multiple Comparisons for Data With Heterogene-
ous Variances. Biometrics 30: 719-724.
Carmer, S. G., and Swanson, M. R. 1971. Detection of Differences Between Means: A Monte Carlo Study of
Five Pairwise Multiple Comparisons Procedures. Agron. Jour. 63: 940-945.
Carmer, S. G., and Swanson, M. R. 1973. Evaluation of Ten Pairwise Multiple Comparison Procedures by
Monte Carlo Methods. Jour. Amer. Statis. Assoc. 68: 66-74.
Chew, V. 1962. Regression Techniques in the Analysis of Variance. Industrial Quality Control. v. 18, No. 12,
pp. 1-2.
Chiu, W. K. 1974a. Selecting the m Populations With Largest Means From k Normal Populations With
Unknown Variances. Austral. Jour. Statis. 16: 144-147.
___ 1974b. The Ranking of Means of Normal Populations for a Generalized Selection Goal. Biometrika 61:
579-584.
Cochran, W. G., and Cox, G. M. 1957. Experimental Designs, 2d ed. John Wiley and Sons, Inc., New York.
Cornell, J. A. 1971. A Review of Multiple Comparison Procedures for Comparing a Set of k Population
Means. Soil Crop Sci. Soc. Fla. Proc. 31: 92-97.
Cox, D. R. 1965. A Remark on Multiple Comparison Methods. Technometrics 7: 223-224.









David, H. A. 1956. The Ranking of Variances in Normal Populations. Jour. Amer. Statis. Assoc. 51: 621-626.
___ 1962. Multiple Decisions and Multiple Comparisons, Chapter 9. In Contributions to Order Statistics,
Sarhan, A. E., and Greenberg, B. G., eds., pp. 144-162. John Wiley and Sons, Inc., New York.
Davies, O. L., ed. 1956. The Design and Analysis of Industrial Experiments. Oliver and Boyd, Edinburgh.
Dixon, D. 0., and Duncan, D. B. 1975. Minimum Bayes Risk t-Intervals for Multiple Comparisons. Jour.
Amer. Statis. Assoc. 70: 822-831.
Dudewicz, E. J. 1976. Introduction to Statistics and Probability (Ch. 11, Ranking and Selection Procedures).
Holt, Rinehart and Winston, New York.
___, Ramberg, J. S., and Chen, H. J. 1975. New Tables for Multiple Comparisons With a Control
(Unknown Variances). Biometrische Zeitschrift 17: 13-26.
Duncan, D. B. 1955. Multiple Range and Multiple F Tests. Biometrics 11: 1-42.
___ 1957. Multiple Range Tests for Correlated and Heteroscedastic Means. Biometrics 13: 164-176.
___ 1965. A Bayesian Approach to Multiple Comparisons. Technometrics 7: 171-222.
___ 1970. Answer to Query #273, Multiple Comparison Methods for Comparing Regression Coefficients.
Biometrics 26: 141-143.
___ 1975. t Tests and Intervals for Comparisons Suggested by the Data. Biometrics 31: 339-359.
Dunn, O. J. 1961. Multiple Comparisons Among Means. Jour. Amer. Statis. Assoc. 56: 52-64.
__ 1964. Multiple Comparisons Using Rank Sums. Technometrics 6: 241-252.
___ and Massey, F. J., Jr. 1965. Estimation of Multiple Contrasts Using t-Distributions. Jour. Amer.
Statis. Assoc. 60: 573-583.
Dunnett, C. W. 1955. A Multiple Comparisons Procedure for Comparing Several Treatments With a Control.
Jour. Amer. Statis. Assoc. 50: 1096-1121.
___ 1964. New Tables for Multiple Comparisons With a Control. Biometrics 20: 482-491.
___ 1970. Multiple Comparison Tests (Query #272). Biometrics 26: 139-141.
Eaton, M. L. 1967. Some Optimum Properties of Ranking Procedures. Ann. Math. Statis. 38: 124-137.
Einot, I., and Gabriel, K. R. 1975. A Study of the Powers of Several Methods of Multiple Comparisons. Jour.
Amer. Statis. Assoc. 70: 574-583.
Federer, W. T. 1955. Experimental Design, Theory and Application. Macmillan & Co., New York.
___ 1961. Experimental Error Rates. Amer. Soc. Hort. Sci. Proc. 78: 605-615.
Fienberg, S. E., and Holland, P. W. 1973. Simultaneous Estimation of Multinomial Cell Probabilities. Jour.
Amer. Statis. Assoc. 68: 683-691.
Fisher, R. A. 1935. The Design of Experiments. 1st ed. Oliver and Boyd, London.
___ and Yates, F. 1963. Statistical Tables for Biological, Agricultural, and Medical Research. 6th ed.
Oliver and Boyd Ltd., Edinburgh.
Gabriel, K. R. 1964. A Procedure for Testing the Homogeneity of all Sets of Means in Analysis of Variance.
Biometrics 20: 459-477.
--__ 1966. Simultaneous Test Procedures for Multiple Comparisons on Categorical Data. Jour. Amer.
Statis. Assoc. 61: 1081-1096.
___ 1968. Simultaneous Test Procedures in Multivariate Analysis of Variance. Biometrika 55: 489-504.
___ 1969a. Simultaneous Test Procedures: Some Theory of Multiple Comparisons. Ann. Math. Statis.
40: 224-250.
Gabriel, K. R. 1969b. A Comparison of Some Methods of Simultaneous Inference in MANOVA. In Mul-
tivariate Analysis-II. P. R. Krishnaiah, ed., pp. 67-88. Academic Press, New York.
Games, P. A. 1971. Multiple Comparisons of Means. Amer. Ed. Res. Jour. 8: 531-565.
Gill, J. L. 1973. Current Status of Multiple Comparisons of Means in Designed Experiments. Jour. Dairy Sci.
56: 973-977.
Goodman, L. A. 1965. On Simultaneous Confidence Intervals for Multinomial Proportions. Technometrics 7:
247-254.
Gupta, S. S. 1963. On a Selection and Ranking Procedure for Gamma Populations. Ann. Inst. Statis. Math.
14: 199-216.
___ 1965. On Some Multiple Decision (Selection and Ranking) Rules. Technometrics 7: 225-245.
____ and Sobel, M. 1957. On a Statistic Which Arises in Selection and Ranking Problems. Ann. Math.
Statis. 28: 957-967.




___ and Sobel, M. 1958. On Selecting a Subset Which Contains All Populations Better Than a Standard.
Ann. Math. Statis 29: 235-244.
___ and Sobel, M. 1960. Selecting a Subset Containing the Best of Several Binomial Populations. In
Contributions to Probability and Statistics, ch. 20. Stanford University Press, Stanford.
___ and Panchapakesan, S. 1971. Contributions to Multiple Decision (Subset Selection) Rules, Mul-
tivariate Distribution Theory and Order Statistics. Report No. 71-0218. Aerospace Res. Lab., AFSC,
USAF, Wright-Patterson AFB, Ohio.
-__ and Panchapakesan, S. 1972. On a Class of Subset Selection Procedures. Ann. Math. Statis. 43:
814-822.
Hahn, G. J. 1970. Prediction Intervals for a Normal Distribution. Gen. Elec. Co. TIS Rpt. No. 71-C-038. Gen.
Elec. Co., Schenectady.
-__ 1972. Simultaneous Prediction Intervals for a Regression Model. Technometrics 14: 203-214.
___ and Hendrickson, R. W. 1971. A Table of Percentage Points of the Distribution of the Largest
Absolute Value of k Student t Variates and its Applications. Biometrika 58: 323-332.
Halperin, M., and Greenhouse, S. W. 1958. A Note on Multiple Comparisons for Adjusted Means in the
Analysis of Covariance. Biometrika 45: 256-259.
Harter, H. L. 1957. Error Rates and Sample Sizes for Range Tests in Multiple Comparisons. Biometrics 13:
511-536.
-__ 1960a. Critical Values for Duncan's New Multiple Range Tests. Biometrics 16: 671-685.
-__ 1960b. Tables of Range and Studentized Range. Ann. Math. Statis. 31: 1122-1147.
-__ 1961. Corrected Error Rates for Duncan's New Multiple Range Test. Biometrics 17: 321-324.
-__ 1970. Order Statistics and Their Use in Testing and Estimation. v. 1. Tests Based on Range and
Studentized Range of Samples from a Normal Population. (Contains updated versions of Harter's Biomet-
rics (1957, 1960, 1961), Technometrics (1961), and AMS (1960) papers.) U.S. Govt. Print. Off., Washington,
D.C.
___ 1970. Multiple Comparison Procedures for Interactions. Amer. Statis. 24: 30-32.
Hartigan, J. A. 1975. Clustering Algorithms. John Wiley and Sons, Inc., New York.
Hartley, H. O. 1955. Some Recent Developments in Analysis of Variance. Communications on Pure and
Applied Mathematics 8: 47-72.
Hochberg, Y. 1975. An Extension of the T-Method to General Unbalanced Models of Fixed Effects. Jour.
Roy. Statis. Soc. B 37: 426-433.
___ 1976. A Modification of the T-Method of Multiple Comparisons for a One-Way Layout with Unequal
Variances. Jour. Amer. Statis. Assoc. 71: 200-203.
and Quade, D. 1975. One-Sided Simultaneous Confidence Bounds on Regression Surfaces With
Intercepts. Jour. Amer. Statis. Assoc. 70: 889-891.
Hoel, D., and Sobel, M. 1972. Comparisons of Sequential Procedures for Selecting the Best Binomial
Population. In Sixth Berkeley Symposium Math. Statis. Probability Proc., v. 4, pp. 53-69.
Hollander, M., and Wolfe, D. A. 1973. Nonparametric Statistical Methods. John Wiley and Sons, New York.
Jensen, D. R. 1976. The Comparison of Several Response Functions With a Standard. Biometrics 32: 51-59.
___ and Jones, M. Q. 1969. Simultaneous Confidence Intervals for Variances. Jour. Amer. Statis. Assoc.
64: 324-332.
John, P. W. M. 1971. Statistical Design and Analysis of Experiments. The Macmillan Company, New York.
Johnson, D. E. 1976. Some New Multiple Comparison Procedures for the Two-Way AOV Model With
Interaction. Biometrics 32: 929-934.
Jolliffe, I. T. 1975. Cluster Analysis as a Multiple Comparison Method. In Applied Statistics. R. P. Gupta, ed.
North-Holland Pub. Co., New York.
Kappenman, R. F. 1972. A Note on Selection of the Greatest Exceedance Probability. Technometrics 14:
219-222.
Keselman, H. J., Toothaker, L. E., and Shooter, M. 1975. An Evaluation of Two Unequal nk forms of the
Tukey Multiple Comparison Statistic. Jour. Amer. Statis. Assoc. 70: 584-587.
Keuls, M. 1952. The Use of the "Studentized Range" in Connection With an Analysis of Variance. Euphytica
1: 112-122.
Kirk, R. E. 1968. Experimental Design Procedures for the Behavioral Sciences. Brooks/Cole, Belmont.














Kramer, C. Y. 1956. Extension of Multiple Range Tests to Group Means With Unequal Numbers of
Replications. Biometrics 12: 309-310.
-__ 1957. Extension of Multiple Range Tests to Group Correlated Adjusted Means. Biometrics 13: 13-18.
-__ 1972. A First Course in Methods of Multivariate Analysis. Va. Polytech. Inst. State Univ.,
Blacksburg.
Krishnaiah, P. R. 1969. Simultaneous Test Procedures Under General MANOVA Models. In Multivariate
Analysis-II, P. R. Krishnaiah, ed., pp. 121-144. Academic Press, New York.
Kuiper, F. K., and Fisher, L. 1975. A Monte Carlo Comparison of Six Clustering Procedures. Biometrics 31:
777-784.
Kurtz, T. E., Link, R. F., Tukey, J. W., and Wallace, D. L. 1965. Short-Cut Multiple Comparisons for
Balanced Single and Double Classifications: Part 1, Results. Technometrics 7: 95-169.
LeClerg, E. L. 1957. Mean Separation by the Functional Analysis of Variance and Multiple Comparisons,
U.S. Dept. Agr., Agr. Res. Serv., ARS 20-3. (Reprinted July 1970.)
Leonard, T. 1972. Bayesian Methods for Binomial Data. Biometrika 59: 581-589.
Levy, K. J. 1975a. An Empirical Comparison of Several Multiple Range Tests for Variances. Jour. Amer.
Statis. Assoc. 70: 180-183.
__ 1975b. A Multiple Range Procedure for Correlated Variances in a Two-Way Classification. Biomet-
rics 31: 243-246.
Little, T. M., and Hills, F. J. 1972. Statistical Methods in Agricultural Research. Univ. Calif., Agr. Ext.
Serv., Davis.
Marriott, F. H. C. 1971. Practical Problems in a Method of Cluster Analysis. Biometrics 27: 501-514.
McCool, J. I. 1975. Multiple Comparisons for Weibull Parameters. IEEE Transactions on Reliability R-24:
186-192.
McDonald, B. J., and Thompson, W. A., Jr. 1967. Rank Sum Multiple Comparisons in One- and Two-Way
Classifications. Biometrika 54: 487-497.
Mead, R., and Pike, D. J. 1975. A Review of Response Surface Methodology From a Biometrics Viewpoint.
Biometrics 31: 803-852.
Miller, R. G., Jr. 1966. Simultaneous Statistical Inference. McGraw-Hill Book Co., New York.
Morrison, D. F. 1967. Multivariate Statistical Methods. McGraw-Hill Book Co., New York.
Myers, R. H. 1971. Response Surface Methodology. Allyn and Bacon, Inc., Boston.
Nair, K. R. 1948. The Studentized Form of the Extreme Mean Square Test in the Analysis of Variance.
Biometrika 35: 16-31.
Newman, D. 1939. The Distribution of the Range in Samples From a Normal Population, Expressed in Terms
of an Independent Estimate of Standard Deviation. Biometrika 31: 20-30.
Ofosu, J. B. 1975. A Two-Stage Minimax Procedure for Selecting the Normal Population With the Smallest
Variance. Jour. Amer. Statis. Assoc. 70: 171-174.
O'Neill, R., and Wetherill, G. B. 1971. The Present State of Multiple Comparison Methods. Jour. Roy. Statis.
Soc. B 33: 218-250.
Patel, J. K. 1976. Ranking and Selection of IFR Populations Based on Means. Jour. Amer. Statis. Assoc. 71:
143-146.
Paulson, E. 1962. A Sequential Procedure for Comparing Several Experimental Categories With a Standard
or Control. Ann. Math. Statis. 33: 438-443.
___ 1964. A Sequential Procedure for Selecting the Population with the Largest Mean From K Normal
Populations. Ann. Math. Statis. 35: 174-180.
-__ 1967. Sequential Procedures for Selecting the Best One of Several Binomial Populations. Ann. Math.
Statis. 38: 117-123.
Pearson, E. S., and Hartley, H. O. 1966. Biometrika Tables for Statisticians. V. 1, 3d ed. Cambridge Univ.
Press, London.
Peng, K. C. 1967. The Design and Analysis of Scientific Experiments. Addison-Wesley Pub. Co., Inc.,
Reading.
Petrinovich, L. F., and Hardyck, C. D. 1969. Error Rates for Multiple Comparison Methods. Psychol. Bul.
71: 43-54.





Puri, M. L., and Puri, P. S. 1969. Multiple Decision Procedures Based on Ranks for Certain Problems in
Analysis of Variance. Ann. Math. Statis. 40: 619-632.
Ramachandran, K. V. 1956. Contributions to Simultaneous Confidence Interval Estimation. Biometrics 12:
51-56.
Reading, J. C. 1975. A Multiple Comparison Procedure for Classifying All Pairs out of K Means as Close or
Distant. Jour. Amer. Statis. Assoc. 70: 832-838.
Reiersol, O. 1961. Linear and Non-Linear Multiple Comparisons in Logit Analysis. Biometrika 48: 359-365.
Corrigenda, Biometrika 49: 284.
Rhyne, A. L., and Steel, R. G. D. 1965. Tables for a Treatments Versus Control Multiple Comparisons Sign
Test. Technometrics 7: 293-306.
___ and Steel, R. G. D. 1967. A Multiple Comparisons Sign Test: All Pairs of Treatments. Biometrics 23:
539-549.
Rizvi, M. H., Sobel, M., and Woodworth, G. C. 1968. Nonparametric Ranking Procedures for Comparisons
With a Control. Ann. Math. Statis. 39: 2075-2093.
-__ 1971. Some Selection Problems Involving Folded Normal Distributions. Technometrics 13: 355-369.
Robbins, H., Sobel, M., and Starr, N. 1968. A Sequential Procedure for Selecting the Largest of K Means.
Ann. Math. Statis. 39: 88-92.
Robson, D. S. 1961. Multiple Comparisons With a Control in Balanced Incomplete Block Designs.
Technometrics 3: 103-105.
Ryan, T. A. 1959. Multiple Comparisons in Psychological Research. Psychol. Bul. 56: 26-47.
___ 1960. Significance Tests for Multiple Comparison of Proportions, Variances, and Other Statistics.
Psychol. Bul. 57: 318-328.
Ryan, T. A., Jr., and Antle, C. E. 1976. A Note on Gupta's Selection Procedure. Jour. Amer. Statis. Assoc.
71: 140-142.
Santner, T. J. 1975. A Restricted Subset Selection Approach to Ranking and Selection Problems. Ann. Stat.
3: 334-349.
Saxena, K. M. L. 1976. A Single-Sample Procedure for Estimation of the Largest Mean. Jour. Amer. Statis.
Assoc. 71: 147-148.
Schafer, W. D., and MacReady, G. B. 1975. A Modification of the Bonferroni Procedure on Contrasts Which
Are Grouped Into Internally Independent Sets. Biometrics 31: 227-228.
Scheffe, H. 1953. A Method for Judging All Contrasts in the Analysis of Variance. Biometrika 40: 87-104.
-__ 1959. The Analysis of Variance. John Wiley and Sons, Inc., New York.
Scott, A. J., and Knott, M. 1974. A Cluster Analysis Method for Grouping Means in the Analysis of Variance.
Biometrics 30: 507-512.
Seeger, P. 1966. Variance Analysis of Complete Designs. Almqvist and Wiksell, Stockholm.
Sen, P. K. 1969. A Generalization of the T-Method of Multiple Comparisons. Jour. Amer. Statis. Assoc. 64:
290-295.
___ 1969. On Nonparametric T-Method of Multiple Comparisons for Randomized Blocks. Ann. Inst.
Statis. Math. 21: 329-333.
Sherman, E. 1965. A Note on Multiple Comparisons Using Rank Sums. Technometrics 7: 255-256.
Siotani, M. 1964. Interval Estimates for Linear Combinations of Means. Jour. Amer. Statis. Assoc. 59:
1141-1164.
Slivka, J. 1970. A One-Sided Nonparametric Multiple Comparison Control Percentile Test: Treatments
Versus Control. Biometrika 57: 431-438.
Sobel, M. 1969. Selecting a Subset Containing at Least One of the T Best Populations. In Multivariate
Analysis-II. P. R. Krishnaiah, ed. pp. 515-539. Academic Press, New York.
___ and Tong, Y. L. 1971. Optimal Allocation of Observations for Partitioning a Set of Normal Popula-
tions in Comparison With a Control. Biometrika 58: 177-181.
Spjøtvoll, E. 1972. Multiple Comparisons of Regression Functions. Ann. Math. Statis. 43: 1076-1088.
___ 1972. Joint Confidence Intervals for All Linear Functions of Means in the One-Way Layout With
Unknown Group Variances. Biometrika 59: 683-685.
___ and Stoline, M. R. 1973. An Extension of the T-Method of Multiple Comparison to Include the Cases
With Unequal Sample Sizes. Jour. Amer. Statis. Assoc. 68: 975-978.




Steel, R. G. D. 1959. A Multiple Comparison Rank Sum Test: Treatments Versus Control. Biometrics 15:
560-572.
__ _- 1961. Some Rank Sum Multiple Comparisons Tests. Biometrics 17: 539-552.
___ and Torrie, J. H. 1960. Principles and Procedures of Statistics. McGraw-Hill, New York.
Tarone, R. E. 1976. Simultaneous Confidence Ellipsoids in the General Linear Model. Technometrics 18:
85-87.
Taylor, R. J., and David, H. A. 1962. A Multi-Stage Procedure for the Selection of the Best of Several
Binomial Populations. Jour. Amer. Statis. Assoc. 57: 785-796.
Thigpen, C. C., and Paulson, A. S. 1974. A Multiple Range Test for Analysis of Covariance. Biometrika 61:
475-484.
Thomas, D. A. H. 1973. Multiple Comparisons Among Means: A Review. Statistician 22: 16-42.
___ 1974. Error Rates in Multiple Comparisons Among Means: Results of a Simulation Exercise. Jour.
Roy. Statis. Soc. C 23: 284-294.
Tobach, E., Smith, M., Rose, G., and Richter, D. 1967. A Table for Rank Sum Multiple Paired Comparisons.
Technometrics 9: 561-567.
Tong, Y. L. 1970. Multi-Stage Interval Estimation of the Largest Mean of K Normal Populations. Jour. Roy.
Statis. Soc. B 32: 272-277.
Trawinski, B. J., and David, H. A. 1963. Selection of the Best Treatment in a Paired-Comparison Experi-
ment. Ann. Math. Statis. 34: 75-94.
Tukey, J. W. 1949. Comparing Individual Means in the Analysis of Variance. Biometrics 5: 99-114.
___ 1951. Quick-and-Dirty Methods in Statistics, Part 2. Simple Analyses for Standard Designs. Amer.
Soc. Qual. Control, 5th Ann. Conv. Trans. pp. 189-197.
___ 1953a. Some Selected Quick and Easy Methods of Statistical Analysis. Trans. N.Y. Acad. Sci. (2) 16:
88-97.
___ 1953b. The Problem of Multiple Comparisons. Unpublished Dittoed Notes, Princeton Univ., 396 pp.
_ 1960. Conclusions vs. Decisions. Technometrics 2: 423-433.
Ury, H. K. 1976. A Comparison of Four Procedures for Multiple Comparisons Among Means (Pairwise
Contrasts) for Arbitrary Sample Sizes. Technometrics 18: 89-97.
Verhagen, A. M. W. 1963. The "Caution Level" in Multiple Tests of Significance. Austral. Jour. Statis. 5:
41-48.
Wackerly, D. D. 1975. An Alternative Approach to the Problem of Selecting the Best of K Populations.
Technical Report #91. Univ. Fla. Dept. Statis., Gainesville.
Waldo, D. R. 1976. An Evaluation of Multiple Comparison Procedures. Jour. Animal Sci. 42: 539-544.
Waller, R. A., and Duncan, D. B. 1969 and 1972. A Bayes Rule for the Symmetric Multiple Comparison
Problem. Jour. Amer. Statis. Assoc. 64: 1484-1503, and Corrigenda 67: 253-255.
Wetherill, G. B., and Ofosu, J. B. 1974. Selection of the Best of K Normal Populations. Jour. Roy. Statis.
Soc. C 23: 253-277.
Williams, D. A. 1971. A Test for Differences Between Treatment Means When Several Dose Levels Are
Compared With a Zero Dose Control. Biometrics 27: 103-117.
___ 1972. The Comparison of Several Dose Levels With a Zero Dose Control. Biometrics 28: 519-531.
Wynn, H. P., and Bloomfield, P. 1971. Simultaneous Confidence Bands in Regression Analysis. Jour. Roy.
Statis. Soc. B 33: 202-217.















U.S. GOVERNMENT PRINTING OFFICE: 1978 0-280-931/SEA-5



