STATISTICAL METHODS FOR
DIVIDING SITES INTO RECOMMENDATION
DOMAINS ON THE BASIS OF
EXPERIMENTAL RESULTS
Training Working Document No. 4
Prepared by
Roger Mead
Consultant
in collaboration
with CIMMYT staff
CIMMYT
Lisboa 27
Apdo. Postal 6-641,
06600 México, D.F., Mexico
PREFACE
This is one of a new series of publications from CIMMYT entitled Training Working
Documents. The purpose of these publications is to distribute, in a timely fashion,
training-related materials developed by CIMMYT staff and colleagues. Some Training
Working Documents will present new ideas that have not yet had the benefit of extensive
testing in the field while others will present information in a form that the authors have
tested and found useful for teaching. Training Working Documents are intended for
distribution to participants in courses sponsored by CIMMYT and to other interested
scientists, trainers, and students. Users of these documents are encouraged to provide
feedback as to their usefulness and suggestions on how they might be improved. These
documents may then be revised based on suggestions from readers and users and
published in a more formal fashion.
CIMMYT is pleased to begin this new series of publications with a set of six documents
developed by Professor Roger Mead of the Applied Statistics Department, University of
Reading, United Kingdom, in cooperation with CIMMYT staff. The first five documents
address various aspects of the use of statistics for on-farm research design and analysis,
and the sixth addresses statistical analysis of intercropping experiments. The documents
provide on-farm research practitioners with innovative information not yet available
elsewhere. Thanks go to the following CIMMYT staff for providing valuable input
into the development of this series: Mark Bell, Derek Byerlee, Jose Crossa, Gregory
Edmeades, Carlos Gonzalez, Renee Lafitte, Robert Tripp, Jonathan Woolley.
Any comments on the content of the documents or suggestions as to how they might be
improved should be sent to the following address:
CIMMYT Maize Training Coordinator
Apdo. Postal 6-641
06600 Mexico D.F., Mexico.
Document 4A
RECOMMENDATION DOMAIN CONSTRUCTION AND TESTING
The purpose of identifying recommendation domains must be for making recommendations from current
data and for planning and interpreting future experiments. In a general sense every applied scientist must
have a concept of a recommendation domain for his/her research. Some concept of the population to which
results are relevant is integral to any research (even statistics!).
The data on which the division of sites into groups for potential domains is based may be some combination of
(1) economic/sociological: based usually on surveys;
(2) physical/meteorological/soils/vegetation: based on observation or on records from nearby available
sources;
(3) experimental results.
There is also always potential for general qualitative judgement about site similarities.
There are two stages to the identification of domains, which is a dynamic process rather than a permanent
decision. We should separate the process of constructing groups from that of testing, or validating the
group structure. The group construction may be attempted using any of the three forms of data. The
validation process appears (to me) to be peculiar to experimental data because that data carries with it
information about the precision of estimates calculated from the data. It would, of course, be possible to
use the precision information inherent in experimental data to test groups (i.e. tentative domains) derived
from other forms of data.
1. Group Construction
Many techniques for identifying groups have been tried. These have been based on various forms of cluster
analysis, dimension-reducing methods such as principal components analysis, and breaking down the site x
treatment interaction variation. The last of these, of course, can be used only with experimental data. Cluster
analysis or principal components analysis can be applied to any form of multiple measurement data.
The underlying information for all data-based techniques for forming groups must be that contained in the
distance matrix for between-site variation. The measurements from which the distances are calculated may
be chosen in many ways. For example, for experimental data we could use treatment mean yields for all, or
a subset of, treatments, or we could use a defined set of treatment contrasts. Whatever the particular
measurements chosen, the between-site distance for each pair of sites is calculated as the sum of the squares of the
differences between the two sites over the measurements used. For some forms of measurement,
scaling of different measurements may be necessary to make information from different measurements
compatible, but this is unlikely to be necessary for measurements based on experimental yield data.
There is no doubt in my mind that the appropriate technique for searching for clusters is some form of
cluster analysis, rather than a more indirect method. There are, though, many different forms of cluster
analysis and it is important to choose one that tends to form compact clusters. Such is, perhaps surprisingly,
not true of all methods. Some, such as single link clustering, tend to produce strings of individuals in a
cluster each linked to only one or two other members of the cluster.
The obvious candidate for the choice of clustering method for manual calculation is the average link
method. We shall look later at some results using both average-link and complete link clustering with
computer packages. Whichever clustering algorithm is used it produces a single clustering structure,
regardless of whether there are alternatives which are nearly as good. This is a very clear justification for
using more than one clustering method to gain some idea whether there are viable alternatives.
2. Cluster Validity Assessment
Invariably clustering methods produce clusters, or more precisely systems of clusters at various levels of
clustering. Because they are thus defined to be successful we cannot assume that the resulting clusters are
meaningful. The peculiar advantage of using data derived from experimental data for clustering is that we
usually have an estimate of the precision of the experimental results. Hence it is possible to consider testing
the validity of the clusters obtained from the clustering process by testing the prediction for the
measurement values of a site from the average of other sites in the proposed cluster.
Using the precision of the treatment mean yields, or contrasts, we can test the prediction of the cluster for
the individual sites within the cluster. For each site in the proposed cluster we compare, using the precision
derived from the experimental error mean squares, the value for that site of each measurement with the
value of the measurement predicted by the average of all other sites in the cluster. The significance of the
comparison can be assessed by the extent to which the difference between site value and prediction value is
large relative to the standard error of that difference.
Such a series of comparisons produces a set of t-values, one for each measurement at each site. Although
the values are interdependent we can obtain a rough idea whether the t-values are compatible with the
appropriate t-distribution. Significance of individual t-values is not so important as the overall pattern of
the set of t-values.
3. Results
Two data sets have been clustered manually using average link clustering. The results are discussed in
Documents 4B and 4C. Three further data sets have been clustered using SAS and SPSS average link and
complete link clustering algorithms. The results are discussed briefly in Document 4D, which consists
mainly of computer output.
Document 4B
CLUSTERING AND VALIDATION EXAMPLE: DATA FROM IPIALES
BEANS/MAIZE VERIFICATION TRIAL 1985
The initial data are the mean yields of (3xBeans + Maize) for each of the 8 treatments at each of 7 sites (site 5
had incomplete data).
1. Finding Groups of Sites
                            Site
Treatment     1     2     3     4     6     7     8
1           428   165   231   244   536   171   272
2           487   290   342   303   517   310   254
3           352   328   358   315   422   355   202
4           564   328   324   441   479   382   266
5           412   461   531   504   478   248   230
6           556   274   346   350   405   360   290
7           476   436   366   320   484   471   287
8           479   382   420   370   698   256   286
To represent the similarity of the sites in terms of the eight treatment values, we define a distance measure
between two sites. This is calculated from the set of differences between the treatment values at the two
sites. Thus, for sites 1 and 2 the yield differences for the eight treatments are:
263 (=428-165),
197
24
236
-49
282
40
97
The total distance measure between sites 1 and 2 is the sum of squares of these differences:
69169 + 38809 + 576 + 55696 + 2401 + 79524 + 1600 + 9409 = 257184
(strictly the distance is the square root of this quantity, but it is convenient to work with squared distances).
We now calculate the complete set of between-site squared distances:
                              Site
Site        2        3        4        6        7        8
1      257184   191312   171327    99871   245577   366779
2                24404    40573   340417    73947   115679
3                         22589   249269   126203   155059
4                                 282134   111892   133400
6                                          440430   515916
7                                                    90182
(Since the distance between site 2 and site 1 is the same as that between site 1 and site 2 we only need to
display half of the matrix.)
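The calculation of the distance matrix can be sketched in a few lines of Python. This is only an illustration: the names `YIELDS`, `squared_distance` and `distance_matrix` are introduced here and are not part of the original analysis; the yields are copied from the table of treatment means above.

```python
from itertools import combinations

# Mean yields of (3 x Beans + Maize) copied from the table above:
# one list of eight treatment means per site (site 5 is absent).
YIELDS = {
    1: [428, 487, 352, 564, 412, 556, 476, 479],
    2: [165, 290, 328, 328, 461, 274, 436, 382],
    3: [231, 342, 358, 324, 531, 346, 366, 420],
    4: [244, 303, 315, 441, 504, 350, 320, 370],
    6: [536, 517, 422, 479, 478, 405, 484, 698],
    7: [171, 310, 355, 382, 248, 360, 471, 256],
    8: [272, 254, 202, 266, 230, 290, 287, 286],
}


def squared_distance(a, b):
    """Sum of squared differences between two equal-length lists of values."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distance_matrix(values_by_site):
    """Squared distance for every unordered pair of sites, keyed by (i, j) with i < j."""
    return {
        (i, j): squared_distance(values_by_site[i], values_by_site[j])
        for i, j in combinations(sorted(values_by_site), 2)
    }


if __name__ == "__main__":
    dist = distance_matrix(YIELDS)
    print(dist[(1, 2)])  # squared distance between sites 1 and 2
    print(dist[(3, 4)])  # squared distance between sites 3 and 4
```

Only the upper half of the matrix is stored, in line with the remark that the distance between two sites does not depend on their order.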
We can observe some patterns of similarity and dissimilarity by direct inspection of the matrix. The most
similar pair of sites are sites 3 and 4; the pairs (2,3) and (2,4) are also similar, suggesting the beginning of a
group (the "inner circle"). At the other extreme the most dissimilar pair are sites 6 and 8 and we can see
that site 6 is dissimilar to each other site except site 1. Since site 1 is also fairly dissimilar to all other sites
except site 6 this suggests the beginning of another group (the "outcasts").
A simple group selection strategy is to choose the grouping which makes the distances between sites within
a group as small as possible and conversely makes the distances between sites in different groups as large
as possible. Let us try some possible groupings.
Possible grouping 1) Two groups: (1,2,3,4) and (6,7,8)
Within-group Distances                  Between-group Distances
Group(1,2,3,4)      Group(6,7,8)
257184  191312      440430               99871  245577  366779
171327   24404      515916              340417   73947  115679
 40573   22589       90182              249269  126203  155059
                                        282134  111892  133400
Means
117898              348843              191686
Overall within-group mean: 194880
A poor attempt since the mean between-group distance is almost exactly the same as the mean within-
group distance.
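The comparison of within-group and between-group means can be automated for any trial grouping. The following is a minimal sketch, assuming the squared distances printed in the half-matrix above; the function name `mean_within_between` is hypothetical, and the within-group mean is taken over all within-group pairs pooled across groups.

```python
from itertools import combinations

# Squared distances copied from the half-matrix above, keyed by (site, site).
DIST = {
    (1, 2): 257184, (1, 3): 191312, (1, 4): 171327, (1, 6): 99871,
    (1, 7): 245577, (1, 8): 366779,
    (2, 3): 24404, (2, 4): 40573, (2, 6): 340417, (2, 7): 73947, (2, 8): 115679,
    (3, 4): 22589, (3, 6): 249269, (3, 7): 126203, (3, 8): 155059,
    (4, 6): 282134, (4, 7): 111892, (4, 8): 133400,
    (6, 7): 440430, (6, 8): 515916,
    (7, 8): 90182,
}


def mean_within_between(groups, dist):
    """Return (mean within-group distance, mean between-group distance).

    At least one group must contain two or more sites, otherwise there are
    no within-group pairs to average.
    """
    label = {site: g for g, members in enumerate(groups) for site in members}
    within, between = [], []
    for i, j in combinations(sorted(label), 2):
        (within if label[i] == label[j] else between).append(dist[(i, j)])
    return sum(within) / len(within), sum(between) / len(between)


if __name__ == "__main__":
    w, b = mean_within_between([(1, 2, 3, 4), (6, 7, 8)], DIST)
    print(round(w), round(b), round(b / w, 2))
```

Passing a different list of groups evaluates any other candidate division of the seven sites in the same way.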
Possible grouping 2) Two groups: (1,6) and (2,3,4,7,8)
Within-group Distances                       Between-group Distances
Group(1,6)     Group(2,3,4,7,8)
 99871          24404   40573   73947        257184  191312  171327
               115679   22589  126203        245577  366779  340417
               155059  111892  133400        249269  282134  440430
                90182                        515916
Means
 99871          89393                        306035
Overall within-group mean: 90345
Much better: the ratio between/within is about 3.4
(306035/90345 = 3.4).
In fact we cannot find a better pair of groups. However, we can find a set of three groups which at least
deserves comparison with the (1,6),(2,3,4,7,8) grouping.
Possible grouping 3) Three groups: (1,6), (2,3,4) and (7,8)
Within-group Distances                       Between-group Distances
Group(1,6)   Group(2,3,4)   Group(7,8)
 99871        24404          90182           257184  191312  171327  245577
              40573                          366779  340417  249269  282134
              22589                          440430  515916   73947  115679
                                             126203  155059  111892  133400
Means
 99871        29189          90182           236033
Overall within-group mean: 55524
The ratio between/within is higher (236033/55524 = 4.3). However, we should expect this ratio to increase when groups
are split to form more groups, since we will inevitably take the larger between-site distances out of the set
of within-group distances to add them to the between-group distances.
There are no general theoretical results about the extent to which we should expect the ratio to increase.
The decision about how many groups to select from a clustering system is always rather arbitrary, based
largely on experience with other similar data. I think that, with little formal justification, I would select
(1,6),(2,3,4,7,8) as the best guess of the clustering for this example. However, because we are using
experimental data we have the possibility of comparing the agreement between sites in a cluster with the
experimental error.
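Document 4A names average-link clustering as the obvious candidate for manual calculation. As a hedged illustration of how such a method proceeds, the sketch below repeatedly merges the two groups with the smallest average squared distance, using the distance matrix printed earlier. The function names are hypothetical, ties are broken arbitrarily, and this is not claimed to be the exact procedure followed by hand. On this matrix the sequence of merges passes through the three-group division (1,6), (2,3,4), (7,8) and then the two-group division (1,6), (2,3,4,7,8).

```python
from itertools import combinations

# Squared distances copied from the half-matrix in section 1.
DIST = {
    (1, 2): 257184, (1, 3): 191312, (1, 4): 171327, (1, 6): 99871,
    (1, 7): 245577, (1, 8): 366779,
    (2, 3): 24404, (2, 4): 40573, (2, 6): 340417, (2, 7): 73947, (2, 8): 115679,
    (3, 4): 22589, (3, 6): 249269, (3, 7): 126203, (3, 8): 155059,
    (4, 6): 282134, (4, 7): 111892, (4, 8): 133400,
    (6, 7): 440430, (6, 8): 515916,
    (7, 8): 90182,
}


def pair_distance(i, j, dist):
    """Look up a distance regardless of the order in which the pair is given."""
    return dist[(i, j)] if (i, j) in dist else dist[(j, i)]


def average_link(group_a, group_b, dist):
    """Mean squared distance over all pairs with one site in each group."""
    total = sum(pair_distance(i, j, dist) for i in group_a for j in group_b)
    return total / (len(group_a) * len(group_b))


def cluster_levels(sites, dist):
    """Merge the two closest groups repeatedly; return the grouping at each level."""
    groups = [(s,) for s in sorted(sites)]
    levels = [list(groups)]
    while len(groups) > 1:
        a, b = min(combinations(groups, 2),
                   key=lambda pair: average_link(pair[0], pair[1], dist))
        groups = [g for g in groups if g not in (a, b)] + [tuple(sorted(a + b))]
        levels.append(sorted(groups))
    return levels


if __name__ == "__main__":
    for grouping in cluster_levels([1, 2, 3, 4, 6, 7, 8], DIST):
        print(len(grouping), grouping)
```

The algorithm returns one grouping per level and says nothing about which level to prefer, which is why the choice of the number of groups remains a matter of judgement.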
2. Testing the Clusters
Any clustering method will produce clusters and these will be such that the sites within a cluster will be
more similar than sites in different clusters. The actual size of the between/within ratio may give an
indication of whether the clustering is "genuine", particularly for experienced users of the clustering
algorithm. However, the intention of clustering sites is to attempt to define recommendation domains
which should be such that the groups of sites we have identified represent a single population and that the
sites in a group could have some predictive power for each other.
A reasonable way to test this potential for prediction would be to compare the treatment yields for a site
with the means of the yields for that treatment at the other sites in the group. For example we consider site
2 (comparing the treatment means with the average from sites 3,4,7,8).
Site 2 yields Prediction (3,4,7,8) Difference
165 230 -65
290 302 -12
328 308 +20
328 353 -25
461 378 +83
274 336 -62
436 361 +75
382 333 +49
These seem to indicate quite good agreement but we should try to test this. To do this we use the standard
errors of the mean yields which we can obtain from the analysis of variance for each site experiment.
The Error Mean Square values from the analyses of variance of the plot values of (3xBeans + Maize) are
shown.
Site EMS(on 7 df)
1 9929
2 9167
3 6228
4 1976
6 27828
7 1384
8 2036
Note that the error mean squares are clearly heterogeneous so that combined analyses across sites would be
extremely dubious and, in particular, significance tests would not be valid.
The standard error of a treatment mean (based on two plot values) at site 2 is
$\sqrt{9167/2} = 68$.
The standard error of the average of the corresponding treatment means from sites 3,4,7,8 is
$\sqrt{\left((6228+1976+1384+2036)/16\right)/2} = 19$.
Thus the standard error of the difference between a site 2 treatment mean and the average treatment mean
from sites 3,4,7,8 is
$\sqrt{\left(9167 + (6228+1976+1384+2036)/16\right)/2} = 70$.
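The three calculations above follow a single pattern, which may be written generally as a sketch. The symbols are introduced here for illustration only: $s_i^2$ is the error mean square at site $i$, $r$ is the number of plot values behind each treatment mean (2 here), and $G$ is the set of $k$ other sites in the group whose means are averaged to form the prediction. The site experiments are assumed to be independent.

```latex
\mathrm{SE}(\text{site mean}) = \sqrt{\frac{s_i^2}{r}}, \qquad
\mathrm{SE}(\text{prediction}) = \sqrt{\frac{1}{r}\cdot\frac{1}{k^2}\sum_{j \in G} s_j^2}, \qquad
\mathrm{SE}(\text{difference}) = \sqrt{\frac{1}{r}\left(s_i^2 + \frac{1}{k^2}\sum_{j \in G} s_j^2\right)}
```

With $i = 2$, $G = \{3,4,7,8\}$, $k = 4$ and $r = 2$ these expressions give the values 68, 19 and 70 obtained above.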
Now we can express each difference between the site 2 value and that predicted from the other sites in the
group as a t-statistic.
Site 2 yields   Prediction (3,4,7,8)   Difference   t-statistic
165             230                    -65          -0.92 (= -65/70)
290             302                    -12          -0.17
328             308                    +20          +0.29
328             353                    -25          -0.36
461             378                    +83          +1.19
274             336                    -62          -0.89
436             361                    +75          +1.07
382             333                    +49          +0.70
An overall measure of the agreement is provided by the sum of squares of the t-statistics. Since we would
expect each squared t-statistic to average about 1.0 if the prediction agreement is good, the sum of squares should be
about 8 (the number of comparisons). In this case the sum of squares is 4.9, confirming that the predictions
are reasonable.
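The whole prediction test for one site can be collected into a short function. This is a sketch under stated assumptions: the names `prediction_test`, `YIELDS`, `EMS` and `PLOTS_PER_MEAN` are introduced here, the data are copied from the tables above, and predictions are not rounded before differencing, so individual t-statistics may differ from the printed ones in the second decimal place.

```python
from math import sqrt

# Treatment mean yields (3 x Beans + Maize) and error mean squares, copied from the tables above.
YIELDS = {
    1: [428, 487, 352, 564, 412, 556, 476, 479],
    2: [165, 290, 328, 328, 461, 274, 436, 382],
    3: [231, 342, 358, 324, 531, 346, 366, 420],
    4: [244, 303, 315, 441, 504, 350, 320, 370],
    6: [536, 517, 422, 479, 478, 405, 484, 698],
    7: [171, 310, 355, 382, 248, 360, 471, 256],
    8: [272, 254, 202, 266, 230, 290, 287, 286],
}
EMS = {1: 9929, 2: 9167, 3: 6228, 4: 1976, 6: 27828, 7: 1384, 8: 2036}
PLOTS_PER_MEAN = 2  # each treatment mean is based on two plot values


def prediction_test(site, others, yields=YIELDS, ems=EMS, r=PLOTS_PER_MEAN):
    """Compare one site with the mean of the other sites in its proposed group.

    Returns (list of t-statistics, standard error of a difference, sum of squared t).
    """
    k = len(others)
    se_diff = sqrt((ems[site] + sum(ems[j] for j in others) / k ** 2) / r)
    t_values = []
    for m, observed in enumerate(yields[site]):
        predicted = sum(yields[j][m] for j in others) / k
        t_values.append((observed - predicted) / se_diff)
    return t_values, se_diff, sum(t * t for t in t_values)


if __name__ == "__main__":
    t_values, se_diff, ss = prediction_test(2, (3, 4, 7, 8))
    print(round(se_diff), round(ss, 1))
    print([round(t, 2) for t in t_values])
```

Calling `prediction_test(1, (6,))` gives the corresponding test of site 1 against site 6 alone.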
The comparisons for other sites are listed below:
Site 3   Prediction (2,4,7,8)   t          Site 4   Prediction (2,3,7,8)   t
231      213                    +0.30      244      210                    +0.87
342      289                    +0.89      303      299                    +0.10
358      300                    +0.96      315      311                    +0.10
324      354                    -0.50      441      325                    +2.97
531      361                    +2.83      504      368                    +3.49
346      318                    +0.47      350      318                    +0.82
366      378                    -0.20      320      390                    -1.79
420      324                    +1.60      370      336                    +0.87
SE(diff) = 60                              SE(diff) = 39
sum of t-squares = 12.9                    sum of t-squares = 26.4

Site 7   Prediction (2,3,4,8)   t          Site 8   Prediction (2,3,4,7)   t
171      228                    -1.58      272      203                    +1.72
310      297                    +0.36      254      311                    -1.42
355      301                    +1.50      202      339                    -3.42
382      340                    +1.06      266      369                    -2.58
248      432                    -5.11      230      436                    -5.15
360      315                    +1.25      290      332                    -1.05
471      352                    +3.31      287      398                    -2.78
256      364                    -3.00      286      357                    -1.78
SE(diff) = 36                              SE(diff) = 40
sum of t-squares = 53.6                    sum of t-squares = 60.6

Site 1   Prediction (6)   t
428      536              -0.78
487      517              -0.22
352      422              -0.51
564      479              +0.62
412      478              -0.48
556      405              +1.10
476      484              -0.06
479      698              -1.60
SE(diff) = 137
sum of t-squares = 5.3
The immediate conclusions are that predictions are acceptable in the (1,6) group but sites 4, 7 and 8 are not
well predicted in the (2,3,4,7,8) group. We have already noted, however, that the experimental precision is
very different in different site experiments and this affects the potential sensitivity of the different
prediction tests.
The comparison between sites 1 and 6 (which operates in both directions) is less precise both because of
the large standard errors in each site and because the prediction is based on only one site. Nevertheless the
agreement is an indication that the group is predictive to the precision that should be expected.
The group (2,3,4,7,8) does not provide adequate prediction and we could look for possible subdivisions of
the group. The subdivision considered earlier into (2,3,4) and (7,8) gives the following results for testing
prediction:
Site   Prediction   Sum of t-squares
2      (3,4)        4.8
3      (2,4)        3.4
4      (2,3)        8.0
7      (8)          51.2
It is clear that the grouping (2,3,4) is acceptable (if anything, too good) but that sites 7 and 8, both of which
have small error mean squares, are not adequate predictors for each other.
Overall, this data set demonstrates heterogeneity of sites more strongly than anything else, though the
(2,3,4) grouping is consistently homogeneous.
3. Alternative Data for Clustering.
Clustering has also been attempted for the Ipiales data for
(i) bean yields,
(ii) maize yields,
(iii) treatment contrasts instead of treatment yields
Detailed results are not given since the cluster patterns are even less clear than those for the
(3xBeans+Maize) results.
(i) Bean yields.
The distance matrix is shown:
                        Site
Site      2      3      4      6      7      8
1      5575  12756  10477  34421  20165   3317
2             3523   3020  18170   8230  14746
3                    1189  13835  10275  23573
4                          11058   7416  22752
6                                 10890  51842
7                                        31024
The best grouping is (1,8) and (2,3,4,6,7) with a between/within ratio of 2.7.
(ii) Maize yields.
The distance matrix (values divided by 100) is shown:
                        Site
Site      2      3      4      6      7      8
1      3622   3931   4040   2674   6525   2435
2              411    135    499    650    308
3                     169    445    562    211
4                            670    410    340
6                                  1629    282
7                                         1111
The best grouping is the very uninteresting pattern of (1) and (2,3,4,6,7,8), which gives the large
between/within ratio of 7.4. The next best grouping is (1,6) and (2,3,4,7,8), as for (3xBeans + Maize), with
a much smaller ratio of 3.9.
(iii) Treatment contrasts.
In our main analysis the basic data for each site was the set of eight treatment means. Since we are
concerned to define recommendation domains an alternative form of data would be to use treatment
contrasts. This would eliminate the effect on the clustering process of site mean yields and would be
attempting to cluster sites on the basis of similar treatment differences.
The set of treatment comparisons intended for the Ipiales experiments was
(i) Treatment 3 - Treatment 2
(ii) Treatment 4 - Treatment 3
(iii) Treatment 5 - Treatment 4
(iv) Treatment 6 - Treatment [number illegible]
(v) Treatment 7 - Treatment 6
(vi) Treatment 8 - Treatment 6
Note that these contrasts are not orthogonal, so they are not independent, which may reduce the efficiency
of, but in no way invalidates, the clustering process. Also, since each contrast is a simple difference between
two treatments, the treatment contrasts will be less precise than the treatment means.
Reverting to using (3xBeans + Maize), the distance matrix (values divided by 100) is shown:
                        Site
Site      2      3      4      6      7      8
1      2510   2459   1969   2150   1044    545
2              342    674    804   1206    866
3                     645   1245   1627    787
4                           1070    897    323
6                                  1990   1076
7                                          455
The best grouping is (1,4,7,8) and (2,3,6) with a between/within ratio of 1.8. It is interesting that the
grouping is distinctly different from that using the treatment mean yields, even the (2,3,4) group being now
split. However the between/within ratio really is rather low and it was decided not to pursue the testing of
groups.
The overall pattern of results from using clustering methods on this set of experimental mean data is not
very encouraging. The groupings postulated prior to the experiments were (1,2,3,8), (4,7) with 6 grouped
with site 5, for which the data were incomplete. The set of site results had been felt to be surprisingly inconsistent
with this grouping. The cluster analysis results tend to confirm that the results do not give clear-cut and
useful patterns. They are not, of course, less valid and informative because they do not produce clear
patterns.
Document 4C
CLUSTERING AND VALIDATION EXAMPLE: DATA FROM GHANA ZERO
TILLAGE TRIAL 82TRIAL2 (G.EDMEADES)
The data are from 14 sites of an experiment with 18 experimental treatments arranged in four blocks of
nine plots per block. The main set of treatments are all combinations of four two-level factors and there are
two "satellite" treatments. Details are:
a) Slashing/no slashing of ground cover,
b) Two rates of Gramoxone (knock-down herbicide),
c) Bellater/no Bellater applied (residual herbicide), and
d) Handweeding/no handweeding.
The satellite treatments are hoeing or scraping prior to handweeding.
The experimental design was a confounded factorial in two blocks of eight treatment combinations,
confounding the four-factor interaction between blocks, with one satellite treatment added to each block.
The same randomization was utilized at each site with the plot and block configurations also constant
across sites.
The data analysis, a summary of which is presented here, is in five stages:
1) The initial analysis included the analysis of variance from each site to extract treatment yields and the
error mean square.
2) A set of eleven effect contrasts were defined and calculated from the treatment mean yields at each site.
The use of effects rather than treatment mean yields was to eliminate differences in mean yield between
sites and to concentrate the information by ignoring the higher order interactions. The effects used were
(1) the difference between the mean of 16 factorial combinations and the mean of 2 satellite
treatments (Fact-Sat),
(2) the four main effects for the factors,
(3) the six interaction effects.
3) The inter-site distance matrix, based on sums of squared differences for the eleven effects, was
calculated.
4) The clustering of sites was investigated both formally through minimizing mean within cluster distance
and by considering alternative similar cluster patterns when there were several near alternatives for the
optimum clustering.
5) The proposed clusterings were tested by assessing, for each site in a cluster, the significance of the
deviation for that site from the average of the other sites in the cluster, for each of the eleven effects, using
the site error mean squares.
1. The Analysis of Variance and Treatment Means
Treatment means (yields in kg/plot *100, uncorrected for moisture)
Site
Treatment 3 4 5 6 7 8 12 13 14 15 16 17 18 21
1111 0 455 150 145 190 195 450 5 170 205 232 160 5 25
1112 95 435 395 325 305 360 600 425 385 455 342 468 200 110
1121 5 462 460 480 280 465 565 60 405 525 240 435 42 302
1122 290 502 465 375 415 470 480 685 415 525 315 470 225 352
1211 5 430 160 310 255 180 540 205 300 340 390 420 30 130
1212 195 460 570 405 430 430 550 660 375 480 355 515 125 138
1221 120 460 255 425 280 500 480 535 385 540 452 535 150 315
1222 335 500 425 285 285 490 490 775 415 545 300 568 245 312
2111 0 425 105 210 245 195 455 10 390 300 428 270 8 142
2112 120 448 355 330 320 360 440 400 300 480 270 398 255 225
2121 5 492 355 490 265 395 495 25 300 495 400 485 185 365
2122 370 390 535 340 240 415 450 630 385 525 302 580 295 410
2211 0 350 90 295 240 265 490 265 230 280 365 245 40 35
2212 270 430 340 360 260 410 575 490 430 400 315 462 115 178
2221 120 415 310 315 295 575 500 475 440 600 252 545 272 345
2222 310 465 445 465 380 440 595 730 355 550 420 490 175 375
S1 385 431 360 405 290 415 320 745 460 535 240 635 102 198
S2 425 425 285 365 420 400 535 650 410 565 318 458 285 258
Site Error Mean Square
3 2804
4 4689
5 5257
6 11117
7 6342
8 5268
12 13324
13 8722
14 7225
15 7947
16 2859
17 4731
18 8920
21 9240
2. Treatment Contrasts
                              CONTRASTS
       Fact-       Main Effects            Two-factor Interactions
Site     Sat     A     B     C     D   AxB   AxC   AxD   BxC   BxD   CxD
3       -265   +18   +58  +108  +216    -8    -5   +20    -5     0   +48
4        +16   -35   -12   +30   +18   -10    -6    -4   +10   +34   -12
5        +16   -43   -28  +135  +205   -13   +53    -2   -67   +35   -83
6        -38    +7   +20  +100   +27    -4    +4   +20   -69   +16   -88
7        -62   -24   +20   +24   +73    +6    +4   -34   -11    -1   -23
8        -24    -4   +54  +170   +76   +27   -20   +26   +10   -13  -106
12       +82   -19   +36    -5   +25   +45   +25    +5   -17   +24   -32
13      -300   -41  +237  +181  +401   -13    -8   -33   +41  -108   +29
14       -80    -2   +22   +64   +58    -2   -32   -28     0     0   -44
15       -97    +2   +28  +171   +84   -20    +7   -14   +13   -29   -88
16       +57   +16   +40    -3   -18   -52    +6   -18    +2    +1   +16
17      -106   -12   +64  +147  +106   -62   +35   -11   -22   -35   -80
18       -46   +40    -8  +102  +112   -28   +26   -29   +32   -71   -40
21        +7   +48   -12  +224   +55   -40    +4   +20    -8   -10   -24
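The contrasts can be recomputed from the treatment means. The sketch below does this for site 3, assuming that the k-th digit of the treatment code gives the level (1 or 2) of the k-th factor in the order A, B, C, D, and that each effect is the mean of the eight treatments on one side of the contrast minus the mean of the other eight. Both assumptions, and all names in the code, are introduced here for illustration. The tabulated values appear to be rounded or truncated to whole numbers, so the main effect of D, for example, comes out as 216.25 against the tabulated +216.

```python
from itertools import combinations

# Treatment means for site 3, copied from the table in section 1.
# Keys are the four-digit treatment codes; digit k is the level of factor k (assumed order A, B, C, D).
SITE3 = {
    "1111": 0, "1112": 95, "1121": 5, "1122": 290,
    "1211": 5, "1212": 195, "1221": 120, "1222": 335,
    "2111": 0, "2112": 120, "2121": 5, "2122": 370,
    "2211": 0, "2212": 270, "2221": 120, "2222": 310,
}
SITE3_SATELLITES = [385, 425]  # treatments S1 and S2
FACTORS = "ABCD"


def effect(means, factors):
    """Main effect (one factor) or two-factor interaction (two factors).

    Each factor is coded -1 at level 1 and +1 at level 2; the effect is the mean of the
    treatments whose product of codes is +1 minus the mean of those whose product is -1.
    """
    total = 0.0
    for code, value in means.items():
        sign = 1
        for f in factors:
            sign *= 1 if code[FACTORS.index(f)] == "2" else -1
        total += sign * value
    return total / (len(means) / 2)


def contrasts(means, satellites):
    """The eleven contrasts: Fact-Sat, four main effects, six two-factor interactions."""
    out = {"Fact-Sat": sum(means.values()) / len(means) - sum(satellites) / len(satellites)}
    for f in FACTORS:
        out[f] = effect(means, f)
    for f, g in combinations(FACTORS, 2):
        out[f + "x" + g] = effect(means, f + g)
    return out


if __name__ == "__main__":
    for name, value in contrasts(SITE3, SITE3_SATELLITES).items():
        print(name, value)
```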
3. The Inter-site Distance Matrix
For each pair of sites the squared distance is calculated as the sum of the squares of the differences in value
for all eleven contrasts. Thus for sites 18 and 21 the squared distance is
$(-46-7)^2 + (40-48)^2 + (-8-(-12))^2 + \dots + (-40-(-24))^2$
$= (-53)^2 + (-8)^2 + (4)^2 + (-122)^2 + (57)^2 + (12)^2 + (22)^2 + (-49)^2 + (40)^2 + (-61)^2 + (-16)^2 = 29628.$
The full distance matrix is shown, all values being reduced by a factor of 100.
                                        Site
Site     4     5     6     7     8    12    13    14    15    16    17    18    21
3     1375  1170  1118   802  1071  1825   932   744   719  1756   622   816  1256
4            608   237   133   434   135  3550   178   501   134   575   378   500
5                  440   544   442   704  2600   461   467   982   440   454   516
6                        203   178   353  3087   128   220   420   246   327   290
7                              364   284  2536    41   303   305   309   214   564
8                                    554  2502   237   112   699   223   286   235
12                                        3932   391   788   172   827   588   828
13                                              2469  2086  3944  1825  2321  3042
14                                                     175   373   231   183   424
15                                                           776    65   160   246
16                                                                 779   348   665
17                                                                       279   370
18                                                                             296
The distance matrix contains all the information about the relative similarities and dissimilarities of sites in
respect of the eleven contrasts considered. Of course if we change the set of contrasts by omitting or adding
contrasts we would change the set of distances, though we would hope that if the patterns of similarity have
a genuine basis the patterns of distances would show consistency. We notice immediately that site 13 is
very different from all other sites with the possible exception of site 3; that site 3 is not strongly similar to
any other site; that site 5 is also not very similar to any other site; and that some sites (4,14,15) are similar
to many other sites. In making these semi-quantitative assessments we seem to be identifying values under
about 300 as indicating similarity and values over about 600 as indicating dissimilarity.
Continuing subjectively I would guess that possible groupings might be
In 3 groups: (4,6,7,12,14,16); (5,8,15,17,18,21); (3,13)
In 6 groups: (4,12,16); (6,7,14); (8,15,17,18,21); with 3, 5 and 13 isolated.
More formally we may use a system of developing clusters and since we desire that all pairs of sites in a
particular cluster be strongly linked the method adopted is to group sites, initially, which are most similar
and recalculate distance of a site from a group of sites as the average distance of the site from all sites in
the group.
The two smallest distances are 41 (sites 7 and 14) and 65 (sites 15 and 17). If these two groups are formed
then all distances of sites from these groups are recalculated and some of the other small distances become
larger. For instance the 112 between sites 8 and 15 now becomes (112+223)/2 = 168 between site 8 and
group (15,17). In the same way the 128 between sites 6 and 14 becomes (203+128)/2 = 166 between site 6
and group (7,14). The distance of site 4 from group (7,14) becomes 156. We could recalculate the complete
matrix but shall, to keep the analysis compact, make a further grouping first. After the initial two groupings
the next grouping is the three-way linking of sites 4, 12 and 16.
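The recalculation of a site-to-group distance described above is simply an average over the members of the group. A minimal sketch follows, using only the handful of distances quoted in the text; the dictionary `DIST` and the function `site_to_group` are names introduced here for illustration.

```python
# Inter-site distances (values divided by 100) copied from the full matrix above,
# restricted to the pairs needed for the three recalculations in the text.
DIST = {
    (4, 7): 133, (4, 14): 178,
    (6, 7): 203, (6, 14): 128,
    (8, 15): 112, (8, 17): 223,
}


def site_to_group(site, group, dist):
    """Average distance from one site to every member of a group."""
    values = [dist[tuple(sorted((site, member)))] for member in group]
    return sum(values) / len(values)


if __name__ == "__main__":
    print(site_to_group(8, (15, 17), DIST))  # 167.5, quoted as 168 in the text
    print(site_to_group(6, (7, 14), DIST))   # 165.5, quoted as 166
    print(site_to_group(4, (7, 14), DIST))   # 155.5, quoted as 156
```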
The new distance matrix for sites and groups 3, (4,12,16), 5, 6, (7,14), 8, 13, (15,17), 18, 21 is shown:
                                                  Site
Site       (4,12,16)         5         6    (7,14)         8        13   (15,17)        18        21
3               1652      1170      1118       773      1071       933       670       816      1256
(4,12,16)                  620       337       220       562      3809       708       438       664
5                                    440       502       442      2600       454       454       516
6                                              166       178      3087       233       327       290
(7,14)                                                   300      2502       254       198       494
8                                                                 2502       168       286       235
13                                                                          1956      2321      3042
(15,17)                                                                                220       308
18                                                                                               296
The next two joins are site 6 with group (7,14) and site 8 with group (15,17), and these two groupings,
being quite separate, can be made together, giving the new distance matrix shown:
                                        Site
Site        (4,12,16)          5   (6,7,14)  (8,15,17)         13         18         21
3                1652       1170        888        804        933        816       1256
(4,12,16)                    620        259        659       3809        438        664
5                                       481        450       2600        454        516
(6,7,14)                                           251       2697        241        426
(8,15,17)                                                    2138        242        284
13                                                                      2321       3042
18                                                                                  296
The distances between (6,7,14), (8,15,17) and 18 are all now very similarly small and this is the next
grouping:
                                        Site
Site                  (4,12,16)       5  (6,7,8,14,15,17,18)      13      21
3                          1652    1170                  842     933    1256
(4,12,16)                           620                  456    3809     664
5                                                        464    2600     516
(6,7,8,14,15,17,18)                                             2404     306
13                                                                      3042
The subsequent joins, for which the revised distance matrices are not shown, are
21 joins (6,7,8,14,15,17,18),
5 joins (6,7,8,14,15,17,18,21)
(4,12,16) joins (6,7,8,14,15,17,18,21)
and 3 joins 13.
5. Testing the Clusters
The method of testing the membership of clusters involves comparing the observed value of each contrast
at a site with the mean value predicted for the contrast by the other sites in the putative cluster. The form of
the test is to calculate the ratio of the difference between site and predicted values to the standard error of
that difference. The standard error is calculated from the error mean squares obtained in section 1 from the
analysis of variance at each site. For example for comparing values for site 4 and the mean of sites 12 and
16 we need the error mean squares for those three sites: 4689, 13324 and 2859. The variance of a
difference between site 4 and (12,16) is
$4689 + (13324 + 2859)/4 = 8734$.
The standard error of a difference for a factorial main effect or interaction is
$\sqrt{2(8734)/16} = 38$.
The standard error for comparing the mean of the factorial treatments with the mean of the satellite
treatments is
$\sqrt{8734/32 + 8734/4} = 50$.
Initially we shall test the clusters (4,12,16), (6,7,14) and (8,15,17,18,21).
Site Prediction
Contrast 4 (12,16) Difference t-value
F-S +16 +70 -54 -1.08
A -35 -2 -33 -0.87
B -12 +38 -50 -1.32
C +30 -4 +34 +0.89
D +18 +4 +14 +0.37
AB -10 -4 -6 -0.16
AC -6 +16 -22 -0.58
AD -4 -6 +2 +0.05
BC +10 -8 +18 +0.47
BD +34 +12 +22 +0.58
CD -12 -8 -4 -0.11
An overall summary of the agreement is provided by the sum of squares of the t-values which in this case
is 5.5. Formal theory for testing this criterion is not, I think, available but, since each squared t-value should be
about 1.0 on average, a sum of squares of less than 11.0 must indicate excellent agreement. If the sum of
squares is greater than the 5% point of the chi-square distribution on 11 df then the agreement is becoming
dubious at something like the 5% significance level (though the theoretical arguments behind this assertion
are very approximate).
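The rough rule just described can be written down as a small helper. This is a sketch only: the 5% point of chi-square is obtained from the Wilson-Hilferty cube-root approximation rather than from tables, the function names are hypothetical, and the label "acceptable" for values between the degrees of freedom and the 5% point is an assumption of this illustration. As the text stresses, the t-values are interdependent, so the comparison is approximate at best.

```python
from statistics import NormalDist


def chi2_upper_point(df, alpha=0.05):
    """Approximate upper-alpha point of chi-square on df degrees of freedom.

    Uses the Wilson-Hilferty cube-root approximation, not an exact quantile.
    """
    z = NormalDist().inv_cdf(1 - alpha)
    c = 2 / (9 * df)
    return df * (1 - c + z * c ** 0.5) ** 3


def assess(ss_t, df):
    """Classify a sum of squared t-values using the rough rule in the text."""
    if ss_t <= df:
        return "excellent"
    if ss_t <= chi2_upper_point(df):
        return "acceptable"  # label assumed for this sketch
    return "dubious"


if __name__ == "__main__":
    print(round(chi2_upper_point(11), 1))  # approximate 5% point on 11 df
    print(assess(5.5, 11))                 # site 4 compared with (12,16)
```

For a whole cluster the same comparison can be made with the total of the SS(t) values and the total number of t-values as degrees of freedom.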
The unsigned t-values for sites 12 and 16 compared with (4,16) and (4,12) respectively are shown with the
site 4 values repeated:
Site 4 Site 12 Site 16
1.08 0.17 0.71
0.87 1.43 0.20
1.32 0.93 0.50
0.89 0.50 0.43
0.37 1.33 0.57
0.16 2.33 1.73
0.58 0.13 0.57
0.05 0.60 0.36
0.47 0.20 0.52
0.58 0.93 0.18
0.11 1.27 0.77
SS(t) 5.5 13.3 5.6 Total 24.4
Both the individual t-values and the SS(t) values give no reason to be unhappy about this cluster.
For the possible cluster (6,7,14) there are no t-values above 1.57 and the SS(t) are 8.5 for site 6, 2.8 for site
7 and 6.4 for site 14. The agreement is only worrying in the sense that it is too good!
For the possible cluster (8,15,17,18,21) there are four t-values in excess of 2.0, with no site having more
than one. The SS(t) are 13.2 for site 8, 3.6 for site 15, 15.4 for site 17, 12.3 for site 18 and 12.8 for site 21.
This cluster therefore gives almost exactly the degree of agreement which should be expected.
The further steps in testing involve either combining two of these three acceptable groups together or
adding site 5 to one of the groups or testing the (3,13) group.
The (3,13) group test (which is identical in both directions) gives two t-values over 4.5 (for main effects B
and D) and a t-value of 2.7 for the BxD interaction. The SS(t) is 63.4 and the disagreement between the sites
is very significant.
Adding site 5 to that group to which it seems closest, namely (8,15,17,18,21), gives SS(t) of 41.5 with a
t-value of 4.4 for main effect D and three other t-values of 2.0 and over. It seems that site 5 is not
sufficiently like this group or either of the others.
Finally we try to combine groups. Clearly, from the distance matrix, group (6,7,14) could combine with
either (4,12,16) or (8,15,17,18,21) but the latter two are further apart. We therefore try combining (6,7,14)
with each in turn.
For the combination (4,6,7,12,14,16) the SS(t) for the six sites are shown:
Site   SS(t)   Comments
 4      6.6    max t-value of 1.4 for B
 6     11.1    t-values of 1.95, 1.65 for C and BxC
 7      7.8    max t-value of 1.7 for D
12      7.3    max t-value of 1.6 for F-S
14      8.2    max t-value of 1.8 for F-S
16     28.6    t-values between 2.0 and 2.5 for F-S, D, AxB and CxD
The total of the 6 SS(t) is 69.6 which is about what should be expected. Also there are only 4 out of 66 t-
values which could possibly be viewed as significant at 5% and the largest of these is 2.46. Therefore in
spite of the discrepancy for site 16 this cluster is acceptable.
For the combination (6,7,8,14,15,17,18,21) the SS(t) for the eight sites are shown:
Site   SS(t)   Comments
 6      8.2    max t-value of 1.8 for BxC
 7     21.1    a large t-value of 3.7 for C
 8     15.4    several t-values of 1.7 or 1.8 (C, AxB, CxD)
14     11.0    one large t-value of 2.6 for D
15      5.1    max t-value of 1.5 for C
17     13.7    max t-value of 1.96 for AxB
18     10.8    max t-value of 1.7 for BxD
21     35.3    a large t-value of 4.8 for C and a smaller one of 2.2 for D
The total SS(t) is 120.5 which is about the approximate 5% significance level. On the other hand there are
only four individually significant t-values out of 88 which suggests that we have two bad predictions, both
for the main effect of C. I wouldn't feel bad about accepting this as a viable cluster. As in many clustering
situations there are almost equally convincing alternative sets of clusters. Marginally I still feel that the
(6,7,14) group goes better with (4,12,16).
My conclusion would, therefore, be that there are two main clusters
(4,6,7,12,14,16), (8,15,17,18,21).
If the other sites have to be grouped in some manner the best clustering is
(4,6,7,12,14,16), (5,8,15,17,18,21), (3,13).
Finally it is interesting to compare the observed distribution of the set of t-values in a cluster with the
expected proportions for the t-distribution on 15df:
t-distribution (unsigned t-values)
                          0 to 0.69   0.69 to 1.34   1.34 to 1.75   1.75 to 2.13   over 2.13
Expected proportion          50%          30%            10%             5%            5%
Cluster
(4,12,16)                  20(60%)      10(30%)         2(7%)            0           1(3%)
(6,7,14)                   21(64%)      10(30%)         2(7%)            0             0
(8,15,17,18,21)            23(42%)      22(40%)         6(11%)         3(5%)         1(2%)
(5,8,15,17,18,21)          27(41%)      19(29%)        10(15%)         6(9%)         4(6%)
(4,6,7,12,14,16)           36(55%)      17(26%)         6(9%)          4(6%)         3(4%)
(6,7,8,14,15,17,18,21)     40(45%)      31(35%)         9(10%)         4(5%)         4(5%)
In each case the distribution of t-values is very close to that which should be expected, the agreement in the
last case being startlingly good and perhaps pushing the preference back towards accepting
(6,7,8,14,15,17,18,21) and (4,12,16) as the better system of clusters.
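The comparison of observed and expected proportions can be tabulated as follows. This is a minimal sketch: the counts are copied from the table above, the expected shares are the 50/30/10/5/5 split quoted for the t-distribution on 15 df, and the function names are illustrative only.

```python
# Expected share of unsigned t-values in each band for a t-distribution on
# 15 df (band limits 0.69, 1.34, 1.75 and 2.13, as in the table above).
EXPECTED = [0.50, 0.30, 0.10, 0.05, 0.05]

# Observed counts per band for each candidate cluster, copied from the table.
observed = {
    "(4,12,16)": [20, 10, 2, 0, 1],
    "(6,7,14)": [21, 10, 2, 0, 0],
    "(8,15,17,18,21)": [23, 22, 6, 3, 1],
    "(5,8,15,17,18,21)": [27, 19, 10, 6, 4],
    "(4,6,7,12,14,16)": [36, 17, 6, 4, 3],
    "(6,7,8,14,15,17,18,21)": [40, 31, 9, 4, 4],
}


def percentages(counts):
    """Observed counts expressed as whole-number percentages of their total."""
    n = sum(counts)
    return [round(100 * c / n) for c in counts]


def expected_counts(counts):
    """Counts expected in each band if the t-values followed the t-distribution."""
    n = sum(counts)
    return [share * n for share in EXPECTED]


for cluster, counts in observed.items():
    exp = ", ".join(f"{e:.1f}" for e in expected_counts(counts))
    print(f"{cluster:24s} n={sum(counts):2d}  observed % {percentages(counts)}  expected counts [{exp}]")
```

Each cluster contributes 11 t-values per site, so the totals are 33, 55, 66 or 88 depending on the number of sites.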
Document 4D
CLUSTERING USING COMPUTER PACKAGES
Three data sets have been clustered using the SAS and SPSS packages. The primary objective was simply
to demonstrate the equivalent computer procedure to the manual calculations described in documents 4B
and 4C. Some comments on the procedures and results are included here, together with some brief further
analysis and some suggestions on extensions to the analysis.
The experimental data on which these analyses are based are from three years of an experiment in Ghana
on "Factors of Production" (data supplied by Greg Edmeades). The experimental design was two replicates
of a 2^4 factorial in four blocks of eight plots per block, with the four-factor interaction confounded in each
replicate. The treatment factors were
1) Variety
V1 La Posta
V2 Local variety (was an improved variety in 1979)
2) Weed Control
W1 1 weeding (6 weeks)
W2 2 weedings (3 & 6 weeks)
3) Plant Density
D1 25,000 plants/ha
D2 50,000 plants/ha
4) Fertilizer
F1 No fertilizer
F2 At sowing and after 4 weeks
The numbers of sites were 24 in 1979, 12 in 1980 and 7 in 1981. Sites were not repeated in different years.
The information extracted from each site was, first, the set of 16 treatment mean yields and, subsequently,
the estimates of the four main effects and the six two-factor interactions.
The three analyses are discussed in reverse chronological order, or equivalently in order of increasing size.
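The step from 16 treatment means to ten effects can be sketched as below. The scaling is an assumption: each effect is taken here as the mean of the eight "+" cells minus the mean of the eight "-" cells, which may differ by a constant factor from the convention used for the tables in this document. The example data are hypothetical, not taken from the Ghana experiment, and the labels F1 to F4 simply follow the factor order listed above.

```python
from itertools import combinations, product

FACTORS = ("F1", "F2", "F3", "F4")  # assumed order: variety, weeding, density, fertilizer


def factorial_effects(means):
    """Main effects and two-factor interactions of a 2^4 factorial.

    `means` maps a tuple of four levels (0 = lower, 1 = upper) to the
    treatment mean yield for that combination.
    """
    def contrast(indices):
        total = 0.0
        for levels, y in means.items():
            sign = 1
            for i in indices:
                sign *= 1 if levels[i] == 1 else -1
            total += sign * y
        return total / 8  # assumed scaling: mean of "+" cells minus mean of "-" cells

    effects = {FACTORS[i]: contrast((i,)) for i in range(4)}
    for i, j in combinations(range(4), 2):
        effects[FACTORS[i] + FACTORS[j]] = contrast((i, j))
    return effects


# Hypothetical treatment means: factor 1 adds 10, and factors 2 and 3
# together add a further 4 when both are at their upper level.
example_means = {
    levels: 100 + 10 * levels[0] + 4 * levels[1] * levels[2]
    for levels in product((0, 1), repeat=4)
}

effects = factorial_effects(example_means)
for name, value in effects.items():
    print(f"{name:5s} {value:6.1f}")
```

The resulting vector of ten effects per site is what the distance calculations in the following sections operate on.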
1. The 1981 Experiment
The values of the ten effects for the seven sites are given in Table 1. The distance matrix and the
dendrograms for Average Linkage and for Complete Linkage provided by SPSS are given in Tables 2
and 3.
The distance matrix shows small distances for (1,4), (1,7), (2,7), (4,7), (3,6) and (3,7), with large distances
between 5 and each of 2, 3, 6 and 7. The pattern of clusters appears more compact from the Complete
Linkage and the two-cluster structure is used for further analysis.
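For readers without access to SPSS, complete linkage clustering on the 1981 squared distances can be sketched in plain Python. This is an illustrative implementation written for this note, not the SPSS algorithm itself, and it covers complete linkage only (the SPSS "within group" average linkage is a different rule). The distances are those printed in Table 3.

```python
# Squared Euclidean distances between the seven 1981 sites (from Table 3).
D2 = {
    (1, 2): 42291.25, (1, 3): 38505.50, (1, 4): 5846.50, (1, 5): 38304.75,
    (1, 6): 32750.25, (1, 7): 13942.25, (2, 3): 32988.75, (2, 4): 38153.25,
    (2, 5): 95343.50, (2, 6): 29549.00, (2, 7): 32316.00, (3, 4): 30297.00,
    (3, 5): 95729.75, (3, 6): 19398.75, (3, 7): 19479.25, (4, 5): 24390.75,
    (4, 6): 27623.25, (4, 7): 14711.25, (5, 6): 87457.00, (5, 7): 69708.00,
    (6, 7): 14227.00,
}


def dist(a, b):
    """Look up the squared distance between two sites, in either order."""
    return D2[(min(a, b), max(a, b))]


def complete_linkage(sites, n_clusters):
    """Agglomerate sites until n_clusters remain, using complete linkage."""
    clusters = [frozenset([s]) for s in sites]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between clusters is the largest
                # distance between any pair of their members.
                d = max(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)


print(complete_linkage(range(1, 8), 2))  # [[1, 4, 5], [2, 3, 6, 7]]
```

Stopping at two clusters gives the same grouping, (1,4,5) and (2,3,6,7), as is used in the distance table that follows.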
Distances (not squared) between sites, between sites and the cluster centroids (averages) and between the
two cluster centroids are shown below:
                            Site                       Cluster
Site/Cluster     1    2    3    4    5    6    7      C1      C2
                                                   (145)  (2367)
1                   206  196   76  196  181  118      81     147
2                        182  195  309  172  180     215     123
3                             174  309  139  140     217      92
4                                  156  166  121      51     135
5                                       296  264     114     278
6                                            119     203      81
7                                                    158      85
C1                                                           178
This set of distances displays the pattern one might expect. The centers of clusters are nicely closer to each
site in the cluster than most of the distances between sites within a cluster. In each cluster the least close
site is on the opposite side of the cluster from the alternative cluster. Site 4 is almost as close to the
"wrong" cluster as one of the members of that cluster (site 2). All these are typical patterns after a
clustering.
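The site-to-site distances above are the square roots of the squared Euclidean distances between the vectors of ten effects. A short sketch, using the effects for sites 1, 4 and 5 as printed in Table 1 (the function name is illustrative):

```python
import math

# Effects F1, F2, F3, F4, F1F2, F1F3, F1F4, F2F3, F2F4, F3F4 for three of the
# 1981 sites, copied from Table 1.
site_effects = {
    1: [-122.0, -46.0, -87.0, -156.0, -15.5, 3.0, 42.0, 33.5, 21.0, 37.0],
    4: [-79.5, -36.0, -43.5, -167.5, 3.0, 10.5, 32.0, 16.5, -2.5, 13.0],
    5: [-0.5, 1.5, -21.5, -44.5, 2.5, 7.5, 16.0, -11.0, -12.5, 16.0],
}


def squared_distance(a, b):
    """Squared Euclidean distance between two effect vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


for i, j in [(1, 4), (1, 5), (4, 5)]:
    d2 = squared_distance(site_effects[i], site_effects[j])
    print(f"sites {i} and {j}: squared distance {d2:.2f}, distance {math.sqrt(d2):.0f}")
```

The squared values agree with the SPSS dissimilarity matrix in Tables 2 and 3, and their square roots with the corresponding entries of the table above.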
2. The 1980 Experiment.
The values of the ten effects for the 12 sites are shown in Table 4. The distance matrix and dendrograms
for Average Linkage and Complete Linkage are shown in Table 5. This time the more compact clusters are
obtained with the Average Linkage clustering and we shall assume clusters of (3,4,6,8,10,11,12) and
(2,5,7,9) with site 1 as unclusterable. The set of distances between sites, between sites and clusters and
between clusters are shown.
                                    Sites                              Clusters
Sites     1    2    3    4    5    6    7    8    9   10   11   12   C1   C2
1            297  291  312  382  245  355  296  335  209  200  241  244  338
2                 144  140  163  105  133  141  112  137  123  166  109   88
3                      153  252  108  213   99  183  160  172   94   95  184
4                           209  129  186  129  117  142  140  134   95  136
5                                227   88  221  158  218  217  275  217   87
6                                     188   92  134  116  116   79   51  150
7                                          176  117  190  195  237  180   56
8                                               147  144  160   97   56  153
9                                                    166  171  180  133   81
10                                                         89  129   86  163
11                                                            146   97  163
12                                                                   64  203
C1                                                                       144
All the cluster 1 sites are clearly not in cluster 2 and only site 2 from cluster 2 is a candidate for cluster 1
as an alternative. The neat dividing distance of 100 for being in or out of a cluster is coincidental and rather
less than the corresponding distance of 125 for the 1981 data.
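The "dividing distance" remark can be checked directly from the site-to-centroid columns of the two distance tables. The sketch below copies those columns and assigns each site to the single centroid lying within the threshold, if there is exactly one; the function names are illustrative.

```python
# Distances from each site to the cluster centroids (C1, C2), copied from the
# 1980 and 1981 distance tables.
to_centroids_1980 = {
    1: (244, 338), 2: (109, 88), 3: (95, 184), 4: (95, 136),
    5: (217, 87), 6: (51, 150), 7: (180, 56), 8: (56, 153),
    9: (133, 81), 10: (86, 163), 11: (97, 163), 12: (64, 203),
}
to_centroids_1981 = {
    1: (81, 147), 2: (215, 123), 3: (217, 92), 4: (51, 135),
    5: (114, 278), 6: (203, 81), 7: (158, 85),
}


def assign(dists, threshold):
    """Map each site to the one cluster whose centroid is within threshold, else None."""
    result = {}
    for site, pair in dists.items():
        near = [k for k, d in enumerate(pair, start=1) if d < threshold]
        result[site] = near[0] if len(near) == 1 else None
    return result


def members(assignment, cluster):
    """Sorted list of sites assigned to the given cluster (or to None)."""
    return sorted(s for s, c in assignment.items() if c == cluster)


a80 = assign(to_centroids_1980, 100)
a81 = assign(to_centroids_1981, 125)
print("1980:", members(a80, 1), members(a80, 2), "unassigned:", members(a80, None))
print("1981:", members(a81, 1), members(a81, 2), "unassigned:", members(a81, None))
```

With thresholds of 100 and 125 respectively, the assignment recovers the clusters used in the text, with site 1 of 1980 left unassigned.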
3. The 1979 Experiment
Tables 6, 7, and 8 give the effects data for each site, the distance matrix (two and a bit sheets) and the two
dendrograms. The two clustering methods both give interesting, and interestingly different cluster patterns.
Somewhat arbitrarily I have chosen to use the average linkage clusters (1,3,9,12,22), the big cluster
(7,13,15,17,18,19,20,21,23,24), (4,5,6,8,11,14) and (2,10) with 16 as an outsider.
It would be interesting to look at the site and cluster distances but too long(!) and really several different
clusterings should be examined. So instead I shall use the unusual circumstance of having the same set of
treatment effects for all three years and compare the ten groups (two in 1981, three in 1980 and five in
1979).
                                Effect Means
Group  Size    F1    F2    F3    F4  F1F2  F1F3  F1F4  F2F3  F2F4  F3F4
81(1)     3   -67   -28   -51  -123    -3    +7   +30    +2    +2   +22
81(2)     4  -134   -22   -25  -278   -17    -1   +24    +5    +1    +8
80(1)     7   -81    -9   -39   -74    -9   +11   +15   -11    -3    +1
80(2)     4  -112   -23   -40  -206   +15   +27   +36   +10   -28   +14
80(3)     1  -238   -38  -104   +94   -17   +22   +24   +20   -12    -6
79(1)     5  +131   -49   -91   -85   +42   +20    -8   -12   -18   +51
79(2)    10   +13   -25   -39   -48    +7    +6    +1    -8    -2   +20
79(3)     6   +29   -36   -57  -174   -20   +28   -23   +24    +8   +26
79(4)     2   +11    -5   -46  -209   -48    -6   -56   -67    +2   +82
79(5)     1    +3   -21   -71  -284   +44   +33   +39    -2   -33  +118
These means tell us quite a lot about the three sets of clusters. First, of course, the clustering is dominated
by the main effects of factors 1 and 4. Second, 1979 is different in the main effect of factor 1, though notice
how the correlation between main effects 1 and 4 has the same pattern in 1979 as in the other two years.
Third, although the 1979 clusters are different from those for the other two years, the third 1980 cluster (site
1) is even more different from the rest. Notice also the much stronger interactions in 1979.
We finish with the distance matrix between clusters:
             1981        1980                1979
            C1   C2   C1   C2   C3   C1   C2   C3   C4   C5
1981(1)         178   63  104  285  219  115  126  179  211
1981(2)              213   93  396  346  276  206  211  198
1980(1)                   145  243  235  183  192  208  266
1980(2)                        335  286  208  166  201  179
1980(3)                             420  301  383  426  472
1979(1)                                  146  124  229  257
1979(2)                                       140  200  268
1979(3)                                            131  179
1979(4)                                                 180
We observe that clusters 1 in 1981 and 1980 are strikingly similar and that cluster 2 in 1980 is midway
between clusters 1 and 2 in 1981. The clusters in 1980 are mostly far from those in 1979, compared with the
distances between the 1979 clusters, and cluster 2 in 1981 is similarly far from the 1979 clusters.
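These observations can be read off the between-cluster matrix mechanically. The following sketch stores the upper triangle as printed and looks up the closest pair overall and the nearest neighbour of a given cluster; the helper names are illustrative.

```python
labels = [
    "1981(1)", "1981(2)", "1980(1)", "1980(2)", "1980(3)",
    "1979(1)", "1979(2)", "1979(3)", "1979(4)", "1979(5)",
]

# Upper triangle of the between-cluster distance matrix, row by row,
# copied from the table above.
upper = [
    [178, 63, 104, 285, 219, 115, 126, 179, 211],
    [213, 93, 396, 346, 276, 206, 211, 198],
    [145, 243, 235, 183, 192, 208, 266],
    [335, 286, 208, 166, 201, 179],
    [420, 301, 383, 426, 472],
    [146, 124, 229, 257],
    [140, 200, 268],
    [131, 179],
    [180],
]

distance = {}
for i, row in enumerate(upper):
    for k, d in enumerate(row):
        distance[(labels[i], labels[i + 1 + k])] = d


def between(a, b):
    """Distance between two clusters, given in either order."""
    return distance[(a, b)] if (a, b) in distance else distance[(b, a)]


def nearest(cluster):
    """Closest other cluster to the given one, with its distance."""
    others = [c for c in labels if c != cluster]
    best = min(others, key=lambda c: between(cluster, c))
    return best, between(cluster, best)


closest_pair = min(distance, key=distance.get)
print("closest pair:", closest_pair, distance[closest_pair])
print("nearest to 1980(2):", nearest("1980(2)"))
print("nearest to 1980(3):", nearest("1980(3)"))
```

The closest pair is cluster 1 of 1981 with cluster 1 of 1980, and the single-site cluster 1980(3) is far from every other cluster.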
As the students always say "No time"!
Table 1
          F1      F2      F3      F4    F1F2    F1F3    F1F4    F2F3    F2F4    F3F4
S1    -122.0   -46.0   -87.0  -156.0   -15.5     3.0    42.0    33.5    21.0    37.0
S2    -170.5    38.0   -10.5  -257.5   -61.5   -10.0    99.5    42.5   -29.0   -55.5
S3     -86.0   -24.0   -48.0  -337.0   -28.0    13.5    33.0    -2.5     1.0    17.0
S4     -79.5   -36.0   -43.5  -167.5     3.0    10.5    32.0    16.5    -2.5    13.0
S5      -0.5     1.5   -21.5   -44.5     2.5     7.5    16.0   -11.0   -12.5    16.0
S6    -159.5   -19.0     9.5  -278.0    29.5   -22.0    68.0   -22.0    23.5    39.5
S7    -128.5   -81.0   -50.0  -239.5    -6.5    15.5    97.0     3.0    10.0    32.5
Table 2
* * * HIERARCHICAL CLUSTER ANALYSIS * * *
AVERAGE LINKAGE
81T2FP
Data Information
7 unweighted cases accepted.
0 cases rejected because of missing value.
Squared Euclidean measure used.
1 Agglomeration method specified.
Squared Euclidean Dissimilarity Coefficient Matrix
Case             1             2             3             4
2       42291.2500
3       38505.5000    32988.7500
4        5846.5000    38153.2500    30297.0000
5       38304.7500    95343.5000    95729.7500    24390.7500
6       32750.2500    29549.0000    19398.7500    27623.2500
7       13942.2500    32316.0000    19479.2500    14711.2500
Case             5             6
6       87457.0000
7       69708.0000    14227.0000
Page 5 SPSS/PC+ 4/3/90
[Dendrogram using Average Linkage (Within Group); horizontal axis: Rescaled Distance Cluster Combine, 0 to 25. The tree is not recoverable from the source.]
Table 3
* * * HIERARCHICAL CLUSTER ANALYSIS * * *
COMPLETE LINKAGE
81T2FP
Data Information
7 unweighted cases accepted.
0 cases rejected because of missing value.
Squared Euclidean measure used.
1 Agglomeration method specified.
Squared Euclidean Dissimilarity Coefficient Matrix
Case             1             2             3             4
2       42291.2500
3       38505.5000    32988.7500
4        5846.5000    38153.2500    30297.0000
5       38304.7500    95343.5000    95729.7500    24390.7500
6       32750.2500    29549.0000    19398.7500    27623.2500
7       13942.2500    32316.0000    19479.2500    14711.2500
Case             5             6
6       87457.0000
7       69708.0000    14227.0000
[Dendrogram using Complete Linkage; horizontal axis: Rescaled Distance Cluster Combine, 0 to 25. The tree is not recoverable from the source.]
Table 4
          F1      F2      F3      F4    F1F2    F1F3    F1F4    F2F3    F2F4    F3F4
S1    -238.5   -38.5  -103.5    94.0   -17.0    21.5    23.5    20.5   -11.5    -5.5
S2     -85.0   -24.5   -46.0  -142.5     7.5    43.0    48.5     6.0   -64.0   -22.0
S3     -53.5   -12.0    27.0   -73.5   -51.5   -17.0    12.5   -14.0   -37.0   -16.0
S4     -46.0    11.5   -68.0  -124.5    17.0    12.5   -28.5   -45.5     7.5    13.0
S5    -162.5     5.5   -24.0  -264.5    28.0    48.5    20.5    16.5   -20.0     7.5
S6     -72.0   -25.5   -29.5   -63.5    -3.0    30.5    37.0    11.5   -24.5    27.0
S7    -138.0   -46.0   -30.5  -234.5     6.5     8.5    53.5     1.0   -11.5    18.0
S8     -60.0   -20.5    20.5   -94.5     8.0   -10.0    36.5    -6.0    26.5    13.5
S9     -61.5   -26.5   -77.5  -181.0    13.5     7.5    23.5    18.0   -17.0    51.0
S10   -131.5    -6.5   -80.5   -74.5   -26.0    30.0    24.0     5.5    28.5   -29.5
S11   -144.5   -17.5   -79.5   -64.0    31.5    33.5    13.0   -26.0   -27.0   -23.5
S12    -57.5     4.0   -19.5   -24.5   -24.0    -5.5    11.0    -5.0     3.0    22.5
Table 5
* * * HIERARCHICAL CLUSTER ANALYSIS * * *
AVERAGE LINKAGE
80T2FP
Data Information
12 unweighted cases accepted.
0 cases rejected because of missing value.
Squared Euclidean measure used.
1 Agglomeration method specified.
Squared Euclidean Dissimilarity Coefficient Matrix
Case             1             2             3             4
2       87923.0000
3       84758.0000    20780.5000
4       97559.0000    19564.0000    23291.5000
5      145574.7500    26425.2500    63715.2500    43592.7500
6       59939.0000    11073.5000    11681.0000    16731.5000
7      125951.5000    17573.0000    45250.0000    34675.0000
8       87388.0000    19915.0000     9729.5000    16686.0000
9      112129.0000    12634.0000    33610.0000    13784.5000
10      43780.5000    18873.0000    25515.0000    20100.5000
11      40150.0000    15195.5000    29483.5000    19612.0000
12      58244.2500    27535.7500     8889.2500    17856.7500
Case             5             6             7             8
6       51565.2500
7        7768.7500    35225.0000
8       48845.2500     8481.0000    31082.0000
9       24863.2500    17879.5000    13682.0000    21504.5000
10      47700.2500    13348.0000    36128.0000    20827.0000
11      47243.2500    13558.5000    37907.5000    25707.5000
12      75574.0000     6163.2500    56400.2500     9435.2500
Case             9            10            11
10      27531.5000
11      29359.5000     7939.0000
12      32275.2500    16655.2500    21253.7500
SPSS/PC+ 4/4/90
[Dendrogram using Average Linkage (Within Group); horizontal axis: Rescaled Distance Cluster Combine, 0 to 25. The tree is not recoverable from the source.]
Table 5 (con't)
[Dendrogram using Complete Linkage; horizontal axis: Rescaled Distance Cluster Combine, 0 to 25. The tree is not recoverable from the source.]
Table 6
        F1    F2    F3    F4  F1F2  F1F3  F1F4  F2F3  F2F4  F3F4
S1     133   -19  -100  -154    69    11   -27    18    -4    93
S2      15   -21   -20  -227   -68   -34   -99   -44    41    86
S3     145   -29  -161   -28    38    25   -25    -8    44    22
S4      63   -33    -5  -108   -14    57   -24     8    -5   -32
S5      52  -114   -48  -178   -12    42    24    67     7    23
S6     102  -107  -107  -234   -14   -14   -79    59    32    52
S7       8   -58    34   -51    34    76    40    -2    11    87
S8     -26   -60   -91  -155   -35     7   -30    16    37    37
S9     129   -65   -70  -133    47    12   -35   -34    -7    44
S10      7    11   -71  -192   -28    21   -14   -90   -36    79
S11     30   -37   -70  -137    -7    64   -37    11   -15    30
S12    119   -30   -81   -35    63    49    29    19   -60    44
S13     17   -11   -18   -92    41   -67   -11   -52    12   -21
S14    -46   -50   -22  -232   -40    10     7   -15    -5    52
S15     -9   -29   -41   -48    20   -32    10    -8   -64    46
S16      3   -21   -71  -284    44    33    39    -2   -33   118
S17      0     9     2   -13    -3    -6    -3   -11    11   -11
S18     -1   -36  -106   -67   -63     8   -12   -29   -23    18
S19     47    -3    16   -47    -3    16   -35   -22   -10    22
S20     44     4   -80   -10    19    14     3     8    11    12
S21    -37   -23   -55   -45    46   -17    -3    -9    -5    71
S22    130  -105   -41   -73    -5     1    19   -54   -61    50
S23     47   -43  -128   -65   -13    60    28    24    -2   -39
S24     15   -61   -12   -45   -10    12    -5    18    34    13
Table 7
* * * HIERARCHICAL CLUSTER ANALYSIS * * *
AVERAGE LINKAGE
79T2FP
Data Information
24 unweighted cases accepted.
0 cases rejected because of missing value.
Squared Euclidean measure used.
1 Agglomeration method specified.
Squared Euclidean Dissimilarity Coefficient Matrix
Case 1 2 3 4
2 57553.0000
3 29023.0000 102040.0000
4 40977.0000 52400.0000 46778.0000
5 36411.0000 54690.0000 63328.0000 22614.0000
6 30030.0000 38145.0000 65959.0000 52369.0000
7 56311.0000 79800.0000 70356.0000 29698.0000
8 43414.0000 27599.0000 57615.0000 27029.0000
9 9136.0000 49873.0000 24969.0000 23641.0000
10 41623.0000 24858.0000 76506.0000 40962.0000
11 24946.0000 37547.0000 41047.0000 10391.0000
12 24993.0000 107850.0000 23272.0000 31974.0000
13 49364.0000 51401.0000 54731.0000 25605.0000
14 60979.0000 22660.0000 108542.0000 39266.0000
15 47085.0000 66672.0000 55554.0000 30150.0000
16 41976.0000 50335.0000 113241.0000 70369.0000
17 66689.0000 73028.0000 54576.0000 20396.0000
18 51703.0000 51780.0000 41516.0000 25126.0000
19 44507.0000 53046.0000 46944.0000 11082.0000
20 39881.0000 81512.0000 20886.0000 22818.0000
21 45925.0000 63124.0000 51800.0000 36984.0000
22 35421.0000 75356.0000 40712.0000 30964.0000
23 46291.0000 89504.0000 25754.0000 20362.0000
24 49883.0000 58326.0000 44144.0000 13154.0000
Case 5 6 7 8
6 24445.0000
7 40326.0000 94559.0000
8 19613.0000 30688.0000 46159.0000
9 29539.0000 30630.0000 45317.0000 36354.0000
10 50150.0000 59209.0000 55470.0000 27269.0000
11 16477.0000 33756.0000 30985.0000 11242.0000
12 46016.0000 78185.0000 35608.0000 60861.0000
13 52167.0000 73034.0000 43939.0000 33114.0000
14 27098.0000 47875.0000 51448.0000 15759.0000
15 45786.0000 80499.0000 26866.0000 32347.0000
16 41643.0000 52792.0000 71555.0000 43874.0000
17 57928.0000 99853.0000 26648.0000 39341.0000
18 39772.0000 62839.0000 43804.0000 16073.0000
19 46052.0000 76121.0000 20546.0000 36555.0000
20 49040.0000 79829.0000 30980.0000 35297.0000
21 49738.0000 80541.0000 22410.0000 26741.0000
22 38988.0000 62397.0000 40046.0000 53787.0000
23 30350.0000 63711.0000 47516.0000 28229.0000
24 28138.0000 63805.0000 16672.0000 21635.0000
Case 9 10 11 12
10 35491.0000
11 18510.0000 21721.0000
12 22389.0000 63570.0000 30261.0000
13 31608.0000 39655.0000 33014.0000 47973.0000
14 52721.0000 18552.0000 24475.0000 86774.0000
15 37025.0000 37778.0000 25519.0000 27674.0000
16 53716.0000 26915.0000 40170.0000 83001.0000
17 49427.0000 55448.0000 32489.0000 40426.0000
18 36995.0000 28132.0000 15863.0000 39672.0000
19 28513.0000 40030.0000 20443.0000 29672.0000
20 32559.0000 53472.0000 23891.0000 17430.0000
21 40513.0000 39584.0000 26081.0000 35368.0000
22 15135.0000 48426.0000 33411.0000 21184.0000
23 35347.0000 55480.0000 18249.0000 24638.0000
24 33615.0000 51744.0000 19105.0000 34290.0000
Case 13 14 15 16
14 44907.0000
15 17773.0000 44966.0000
16 76324.0000 22265.0000 68627.0000
17 14833.0000 60100.0000 14856.0000 103831.0000
18 29035.0000 39072.0000 16578.0000 74059.0000
19 16779.0000 50670.0000 15608.0000 84117.0000
20 23453.0000 69562.0000 16066.0000 93017.0000
21 19829.0000 45490.0000 6202.0000 66291.0000
22 40509.0000 65754.0000 29658.0000 80543.0000
23 41615.0000 61304.0000 33160.0000 84015.0000
24 20167.0000 44090.0000 16880.0000 83553.0000
Case 17 18 19 20
18 22804.0000
19 6864.0000 23104.0000
20 10504.0000 17096.0000 14016.0000
21 16172.0000 20670.0000 19610.0000 14892.0000
22 46636.0000 33650.0000 28772.0000 36240.0000
23 32112.0000 16144.0000 34566.0000 14338.0000
24 8668.0000 18558.0000 9758.0000 12454.0000
Case 21 22 23
22 44604.0000
23 36754.0000 39478.0000
24 15592.0000 33086.0000 22642.0000
Table 8
[Dendrogram using Average Linkage (Within Group), 24 sites of the 1979 experiment; horizontal axis: Rescaled Distance Cluster Combine, 0 to 25. The tree is not recoverable from the source.]
[Dendrogram using Complete Linkage, 24 sites of the 1979 experiment; horizontal axis: Rescaled Distance Cluster Combine, 0 to 25. The tree is not recoverable from the source.]