##### Copter ANOVAs I #####

# Experimental design and analysis must match. If they don't, we risk drawing wrong
# conclusions about the results. This may happen more often in science than we wish to admit.
# Here we use classic ANOVAs to test our hypotheses that helicopter flight times are affected by
# wing length and by whether or not the wings included a fold.
# We build up to the full ANOVA in steps so you can see the parts, but the experiment calls for
# the full analysis: a two-way factorial (wing length X folding), with blocks (groups) and a
# covariate (steps).

# First load the dplyr, car, and lattice packages
library(dplyr)
library(car)      # provides leveneTest()
library(lattice)  # provides xyplot()

# Import and attach the Helicopter Data v2 file from the course web page:
# http://jenkins.cos.ucf.edu/wordpress/wp-content/uploads/copter-data-F16.csv
# Note that we assume hereafter you name this as "data".
data <- read.csv("http://jenkins.cos.ucf.edu/wordpress/wp-content/uploads/copter-data-F16.csv")
attach(data)

# Examine the data file: DesGr is a code for Design+Group. That and other variables (Group,
# Wing, Fold, and Step) are numbers, but we will want to make the grouping variables
# categorical factors. Step stays numeric, because we use it later as a covariate.
fdesign <- factor(DesGr)
# Do likewise for the other factors; below we assume you called them fwing, fgroup, and ffold:
fwing  <- factor(Wing)
fgroup <- factor(Group)
ffold  <- factor(Fold)

# You have already examined normality of these data. To save time right now, let's assume that
# normality is not too far out of whack (most Designs per Group were normal), but we should
# re-examine homogeneity of variance more closely, because it is more critical to ANOVAs.

# Now run Levene's test (from the "car" package) on design-group combinations:
leveneTest(Time ~ fdesign)
# You can see variances among designs & groups are not quite homogeneous. For example:
boxplot(Time ~ fdesign)
# But for purposes of getting to ANOVAs here, we will bet that once we take into account the
# effect of Steps, this will not be too bad.... (ahem, typing with fingers crossed...)

# First, conduct an ANOVA on just Wing.
# We will do this two ways (the first is a simple ANOVA, the second a linear model, as for a
# regression):
Wanova <- aov(Time ~ fwing)
Wlm    <- lm(Time ~ fwing)
summary(Wanova)
summary(Wlm)
# What does the aov command show you that the lm command does not?
# Vice versa: what does the lm command show you that the aov command does not?
# Finally, what is the same between aov and lm?

# Now edit those commands to repeat those analyses, but use Fold in place of Wing.
# Summarize the results so far: did wing length significantly affect flight time?
# What about folded wings?
# How much of the variance in flight times is explained by each factor? Is this satisfactory?

# Now run a factorial model to match the actual experiment more closely; so far we have
# only analyzed pieces that do not fully represent the whole. A factorial design allows us to
# examine whether the effect of one factor depends on the level of a second factor. Our factorial
# experiment answers the question: "Does the effect of wing length on flight time depend on
# whether wings are folded?" You could flip that as "Does the effect of folded wings on flight
# time depend on wing length?"
WxFanova <- aov(Time ~ fwing * ffold)  # shorthand for fwing + ffold + fwing:ffold
FxWanova <- aov(Time ~ ffold * fwing)  # shorthand for ffold + fwing + ffold:fwing (same terms, other order)
WxFlm    <- lm(Time ~ fwing * ffold)   # a linear model (regression) version of WxFanova
FxWlm    <- lm(Time ~ ffold * fwing)   # a linear model (regression) version of FxWanova
summary(WxFanova)
summary(FxWanova)
summary(WxFlm)
summary(FxWlm)
# Now squint at the summaries for WxFanova & FxWanova: similar but not identical? That's
# because the aov command uses Type I sums of squares, meaning the sums of squares are
# calculated sequentially, in the order the terms are listed. This would be no problem if we had
# no missing data (i.e., a balanced design).
# But we have missing data, and this causes problems with simple aov.
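# Aside (a minimal sketch on simulated data, not the copter file): term order changes a
# sequential (Type I) table only when cell sizes are unequal. The factors A and B, the cell
# sizes, and the effect sizes are made up for illustration. anova(lm(...)) prints the same
# sequential table that summary(aov(...)) does; it is used here only because its numbers
# are easy to extract.
set.seed(1)

# Simulate a 2 x 2 design; n gives the cell sizes in the order a1b1, a1b2, a2b1, a2b2.
sim_2x2 <- function(n) {
  A <- factor(rep(c("a1", "a1", "a2", "a2"), times = n))
  B <- factor(rep(c("b1", "b2", "b1", "b2"), times = n))
  y <- 10 + 2 * (A == "a2") + 1 * (B == "b2") + rnorm(sum(n))
  data.frame(y, A, B)
}

# Type I sum of squares for A when A is entered first vs. after B.
ss_A <- function(d) {
  a_first  <- anova(lm(y ~ A * B, data = d))
  a_second <- anova(lm(y ~ B * A, data = d))
  c(A_first = a_first["A", "Sum Sq"], A_second = a_second["A", "Sum Sq"])
}

d_bal <- sim_2x2(c(5, 5, 5, 5))  # balanced: no missing data
d_unb <- sim_2x2(c(8, 2, 3, 7))  # unbalanced: as if some runs had been lost
ss_bal <- ss_A(d_bal)
ss_unb <- ss_A(d_unb)
print(ss_bal)  # the two values agree
print(ss_unb)  # the two values differ

# One common remedy (assuming the car package is installed): Type II sums of squares adjust
# each main effect for the other, so the table no longer depends on term order.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(lm(y ~ A * B, data = d_unb), type = 2))
}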
# The problem is especially obvious if you compare the two aov tables (WxFanova vs. FxWanova),
# which differ, with the two lm summaries (WxFlm vs. FxWlm), which give the same estimates in
# either order. With a balanced design (i.e., no missing data), the aov tables would not depend
# on term order; with missing data, plain old aov won't cut it.
# We should be using better approaches, such as lm (and others, e.g., Anova() in the car
# package, which computes Type II or Type III sums of squares). So consider the lm output again:
# if a significant interaction exists, the individual (main) effects of Wing and Fold cannot be
# interpreted on their own, because by definition the effect of each factor depends on the level
# of the other. Both factors then matter, whether or not each is significant by itself.

# Let's plot results to help see what happened. xyplot is a lattice function, so it ignores
# par(mfrow = ...); its layout argument (columns, rows) arranges the panels, one per group.
xyplot(Time ~ fwing | fgroup, groups = ffold, type = "a", layout = c(3, 2))
# type = "a" joins the average Time at each wing length, one line per Fold level. Lines that are
# not parallel indicate a Wing * Fold interaction; NA values will not plot properly.

# Or try this one to see the effect of Steps on our Time ~ Wing results:
xyplot(Time ~ Wing | fgroup, groups = Step, type = "a")
# Here each line is one Step; lines that are not parallel suggest that the effect of Wing on
# Time depends on Step. Again, NA values will not plot properly.

# So we had better include Group as a block and Step as a covariate, because they were part of
# our actual experimental design and seem to matter. Let's build the complete model (which we
# would do anyway):
wfgsaov <- aov(Time ~ fwing * ffold + fgroup + Step)  # fwing + ffold + fwing:ffold, plus fgroup
                                                      # as a block and Step as a covariate
wfgslm  <- lm(Time ~ fwing * ffold + fgroup + Step)   # the same model, fitted with lm
summary(wfgsaov)
summary(wfgslm)
# IMPORTANT NOTE: the complex lm output can be read this way: the Intercept is the predicted
# Time for the reference (first) level of every factor (the shortest Wing, Fold 0, Group 1) at
# Step 0. All other coefficients are added to (or subtracted from) that baseline. The reference
# level of each factor is the first one in alphabetical/numerical order. If you want to change
# that, use the relevel command as described in Hector's book.
# Still see differences between aov and lm outputs? Remember, aov's sequential (Type I) table is
# trustworthy only for balanced designs, those without missing data. If data are missing, aov is
# not appropriate.
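# Aside (a minimal sketch with made-up numbers; grp and y are hypothetical, not copter data):
# how R's default treatment contrasts build lm coefficients from a baseline, and what relevel()
# changes. Only one factor is used, so each fitted value is simply a group mean; in the copter
# model the baseline would additionally fix Step at 0.
y   <- c(5, 6, 5, 6, 8, 9, 8, 9, 4, 3, 4, 3)
grp <- factor(rep(c("1", "2", "3"), each = 4))  # group means: 5.5, 8.5, 3.5

fit1 <- lm(y ~ grp)  # reference level is "1", the first level
coef(fit1)           # (Intercept) = 5.5 (group 1 mean); grp2 = +3; grp3 = -2

# Predicted mean for group 2 = baseline + its offset = 5.5 + 3 = 8.5
sum(coef(fit1)[c("(Intercept)", "grp2")])

grp_ref3 <- relevel(grp, ref = "3")  # make group 3 the baseline instead
fit3 <- lm(y ~ grp_ref3)
coef(fit3)           # (Intercept) = 3.5 (group 3 mean); grp_ref31 = +2; grp_ref32 = +5

# The fitted values are identical; only the baseline used to express them has changed.
all.equal(fitted(fit1), fitted(fit3))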
# What would happen if you mistakenly included Group (numbers) instead of factor(Group)?

# Finally, we should inspect residuals of the lm model to check it against its assumptions.
# Run this:
par(mfrow = c(2, 2))  # sets up a 2x2 base-graphics array for the four plots drawn next
plot(wfgslm)          # given a model, this ESSENTIAL command plots its residuals (errors)
# upper left (Residuals vs Fitted): you want a flat line near 0 (no pattern left over that the
#   model failed to capture) and an even scatter left to right (homogeneous variance throughout).
# upper right (Normal Q-Q): a typical QQ plot; dots on the line indicate normally distributed
#   residuals.
# lower left (Scale-Location): the square root of the absolute standardized residuals, which
#   makes them all positive. A more sensitive view of the first plot, with the same criteria.
# lower right (Residuals vs Leverage): shows how much "leverage" or "pull" each data point has.
#   Again, even scatter near 0 and a flat pattern is best; points beyond the dashed Cook's
#   distance contours have a large influence on the fit and deserve a closer look.

# So what have we accomplished? We learned:
# 1. how to compute a factorial linear model with blocks and a covariate, which showed that the
#    copter treatments (Wing & Fold) interacted to significantly affect flight times.
# 2. that aov's sequential (Type I) ANOVA table is not appropriate when there are missing data.
# 3. that groups were important to time, after accounting for the effects of Wing*Fold and Step.
# 4. that each mm of Wing added to flight time, in a more-than-linear fashion.
# 5. that adding a folded wing increased flight time by ~0.5 sec, after accounting for all else.
# 6. that each step reduced flight time by ~0.1 sec.
# 7. and that although assumptions were not ideal, the final, rather complex model appears to
#    have met them well enough.
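# Aside (a minimal sketch on simulated data; g, x, and y are hypothetical, not copter data):
# the quantities behind the four default panels of plot() for an lm fit, computed directly with
# base R extractor functions, so you can see what each panel is built from.
set.seed(3)
g <- factor(rep(1:3, each = 10))   # a blocking factor with 3 levels
x <- rep(1:10, times = 3)          # a numeric covariate
y <- 5 + as.numeric(g) - 0.1 * x + rnorm(30, sd = 0.5)
fit <- lm(y ~ g + x)

std <- rstandard(fit)              # standardized residuals, used by panels 2-4
diag_df <- data.frame(
  fitted   = fitted(fit),          # x axis of panels 1 and 3
  resid    = residuals(fit),       # panel 1: Residuals vs Fitted
  sqrt_std = sqrt(abs(std)),       # panel 3: Scale-Location
  leverage = hatvalues(fit),       # panel 4: x axis of Residuals vs Leverage
  cook     = cooks.distance(fit)   # panel 4: the basis of the dashed Cook's distance contours
)
qq <- qqnorm(std, plot.it = FALSE) # panel 2: theoretical vs. sample quantiles, without plotting
head(diag_df)

# Leverages always sum to the number of estimated coefficients (here 4: intercept, g2, g3, x),
# so the average leverage is 4 / 30; points far above that average have unusual "pull".
sum(diag_df$leverage)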