##### Copter ANOVAs I #####

# Experimental design and analysis must match. If they don't, we risk drawing wrong
# conclusions from the results, which may happen more often in science than we like to admit.
# Here we use classic ANOVAs to test our hypotheses that helicopter flight times are affected by
# wing length and by whether or not the wings were folded.

# We build up to the full ANOVA in steps so you can see the parts, but the experiment calls for
# the full analysis: a two-way factorial (wing length X folding), with blocks (groups) and a
# covariate (steps). 

# First, load the dplyr, car, and lattice packages:
library(dplyr)
library(car)      # for leveneTest()
library(lattice)  # for xyplot()

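# Import and attach the Helicopter Data v2 file from the course web page (this assumes the file
# has a header row with the column names used below). We assume hereafter that you name the
# data frame "data".
data <- read.csv("http://jenkins.cos.ucf.edu/wordpress/wp-content/uploads/copter-data-F16.csv")
attach(data)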

# Examine the data file: DesGr is a code for Design+Group. It and the other variables (Group,
# Wing, Fold, and Step) are stored as numbers, but we want all of them except Step (our numeric
# covariate) as categorical factors, using commands like these:

fdesign <- factor(DesGr) # Design+Group code; do likewise for Wing, Group, and Fold
fwing <- factor(Wing); fgroup <- factor(Group); ffold <- factor(Fold) # names assumed below

# You have already examined the normality of these data. To save time now, let's assume that
# normality is not too far out of whack (most Designs per Group were normal), but we should
# re-examine homogeneity of variance more closely, because it matters more for ANOVAs.

# Now run Levene's test (from the "car" package) on the design-group combinations:

leveneTest(Time~fdesign)

# You can see variances among designs & groups are not quite homogeneous. For example,

boxplot(Time~fdesign)

# But to get on with the ANOVAs, we will bet that once we account for the effect of Step,
# this will not be too bad... (ahem, typing with fingers crossed...)
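
# What Levene's test computes: a sketch on simulated data (NOT the copter data). With car's
# default center = median, leveneTest() amounts to a one-way ANOVA on the absolute deviations
# of each observation from its own group's median. The names sim_lev, grp_median, absdev, and
# lev_by_hand are made up for this illustration.
set.seed(5)
sim_lev <- data.frame(fdesign = factor(rep(c("A", "B", "C"), each = 10)))
# three hypothetical designs with deliberately unequal spreads (sd 0.2, 0.4, 0.8)
sim_lev$Time <- rnorm(30, mean = 3, sd = rep(c(0.2, 0.4, 0.8), each = 10))
grp_median <- ave(sim_lev$Time, sim_lev$fdesign, FUN = median)  # each row's group median
sim_lev$absdev <- abs(sim_lev$Time - grp_median)                 # absolute deviations
lev_by_hand <- anova(lm(absdev ~ fdesign, data = sim_lev))
lev_by_hand  # its F value should match car::leveneTest(Time ~ fdesign, data = sim_lev)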

# First, conduct an ANOVA on just Wing. We will do this two ways (the first is a simple
# ANOVA, the second a linear model, as for a regression):

Wanova <- aov(Time ~ fwing)
Wlm <- lm(Time ~ fwing)
summary(Wanova)
summary(Wlm)

# What does the aov command show you that the lm command does not?

# Vice versa: what does the lm command show you that the aov command does not?

# Finally, what is the same between aov and lm?
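
# One way to check your answers: a sketch on simulated data (NOT the copter data; sim_w,
# sim_aov, sim_lm, aov_tab, and lm_tab are made-up names). anova() applied to an lm fit
# returns the same sequential ANOVA table that summary() prints for the equivalent aov fit,
# and the two fits share identical coefficients, because aov() fits its model through lm().
set.seed(2)
sim_w <- data.frame(fwing = factor(rep(c(70, 90, 110), each = 6)))
sim_w$Time <- 2 + 0.4 * as.integer(sim_w$fwing) + rnorm(nrow(sim_w), sd = 0.3)
sim_aov <- aov(Time ~ fwing, data = sim_w)
sim_lm <- lm(Time ~ fwing, data = sim_w)
aov_tab <- summary(sim_aov)[[1]]  # the ANOVA table inside summary.aov's list
lm_tab <- anova(sim_lm)           # the same table, computed from the lm fit
aov_tab
lm_tab
coef(sim_aov)
coef(sim_lm)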

# Now edit those commands to repeat the analyses using Fold in place of Wing.
# Summarize the results so far: did wing length significantly affect flight time? 
# What about folded wings?

# How much of the variance in flight times is explained by each factor? Is this satisfactory?
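
# "Variance explained" here is R-squared: the factor's sum of squares divided by the total sum
# of squares. A sketch on simulated data (NOT the copter data; sim_r2, fit_r2, ss, and
# r2_by_hand are made-up names) showing that the hand calculation from the ANOVA table
# matches the "Multiple R-squared" that summary() reports for an lm fit:
set.seed(1)
sim_r2 <- data.frame(fwing = factor(rep(c(70, 90, 110), each = 8)))
sim_r2$Time <- 1.5 + 0.5 * as.integer(sim_r2$fwing) + rnorm(nrow(sim_r2), sd = 0.4)
fit_r2 <- lm(Time ~ fwing, data = sim_r2)
ss <- anova(fit_r2)[["Sum Sq"]]  # SS for fwing, then SS for Residuals
r2_by_hand <- ss[1] / sum(ss)     # share of the total SS explained by Wing
r2_by_hand
summary(fit_r2)$r.squared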

# Now fit a factorial model to match the actual experiment more closely; so far we have only
# analyzed pieces that do not represent the whole. A factorial design lets us test whether the
# effect of one factor depends on the level of a second factor (an interaction). Our factorial
# experiment answers the question: “Does the effect of wing length on flight time depend on
# whether wings are folded?” You could flip that around: “Does the effect of folded wings on
# flight time depend on wing length?”

WxFanova <- aov(Time ~ fwing*ffold) # shorthand for Time ~ fwing + ffold + fwing:ffold
FxWanova <- aov(Time ~ ffold*fwing) # shorthand for Time ~ ffold + fwing + ffold:fwing (note the order)
WxFlm <- lm(Time ~ fwing * ffold) # a linear model (regression) version of WxFanova
FxWlm <- lm(Time ~ ffold * fwing) # a linear model (regression) version of FxWanova

summary(WxFanova)
summary(FxWanova)
summary(WxFlm)
summary(FxWlm)
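
# To see exactly which terms each formula expands into, and in which order, ask R for the term
# labels (no data needed). That order is what drives aov's sequential (Type I) sums of squares:
attr(terms(Time ~ fwing * ffold), "term.labels")  # "fwing" "ffold" "fwing:ffold"
attr(terms(Time ~ ffold * fwing), "term.labels")  # "ffold" "fwing" "ffold:fwing"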

# Now squint at the summaries for WxFanova and FxWanova: similar but not identical? That's
# because aov uses Type I (sequential) sums of squares: each term's SS is calculated after
# adjusting only for the terms listed before it, so the order of terms matters. This would be
# no problem if we had no missing data (i.e., a balanced design).

# But We Have Missing Data, And This Causes Problems with simple aov. Compare WxFanova and
# WxFlm, for example: the lm coefficient tests do not depend on the order of terms, but aov's
# sequential tests do. With missing data, plain old aov won't cut it.
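
# A sketch of why missing data matter for sequential (Type I) sums of squares, on simulated
# data (NOT the copter data; sim_bal, sim_unbal, and the wing/fold values are made up).
# anova() on an lm fit gives the same sequential table that summary(aov()) prints. In the
# complete, balanced data the SS for fwing is the same whichever factor is listed first; after
# dropping three runs it is not. The interaction, entered last, is unaffected by the order.
set.seed(42)
sim_bal <- expand.grid(wing = c(70, 90, 110), fold = c(0, 1), rep = 1:4)
sim_bal$Time <- 1 + 0.02 * sim_bal$wing + 0.3 * sim_bal$fold + rnorm(nrow(sim_bal), sd = 0.2)
sim_bal$fwing <- factor(sim_bal$wing)
sim_bal$ffold <- factor(sim_bal$fold)
sim_unbal <- sim_bal[-c(1, 2, 7), ]  # pretend three runs were lost

bal_wf <- anova(lm(Time ~ fwing * ffold, data = sim_bal))
bal_fw <- anova(lm(Time ~ ffold * fwing, data = sim_bal))
unbal_wf <- anova(lm(Time ~ fwing * ffold, data = sim_unbal))
unbal_fw <- anova(lm(Time ~ ffold * fwing, data = sim_unbal))

c(balanced_wing_first = bal_wf["fwing", "Sum Sq"], balanced_wing_second = bal_fw["fwing", "Sum Sq"])
c(unbalanced_wing_first = unbal_wf["fwing", "Sum Sq"], unbalanced_wing_second = unbal_fw["fwing", "Sum Sq"])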

# We should be using better approaches, such as lm (and others). So consider the lm output
# again: if a significant interaction exists, then the individual (main) effects of Wing and
# Fold are moot, because by definition each factor then affects flight time through the
# interaction, whether or not its main effect is significant by itself.
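
# summary(lm) gives each interaction coefficient its own t-test, which is awkward when Wing has
# more than two levels. One order-independent way to test the interaction as a whole is
# drop1(), which compares the full model with the model lacking the interaction (it respects
# marginality, so for fwing * ffold only the interaction is tested). A sketch on simulated,
# unbalanced data (NOT the copter data; sim_int, int_wf, and int_fw are made-up names):
set.seed(3)
sim_int <- expand.grid(wing = c(70, 90, 110), fold = c(0, 1), rep = 1:4)
sim_int$Time <- 1 + 0.02 * sim_int$wing + 0.3 * sim_int$fold +
  0.004 * sim_int$wing * sim_int$fold + rnorm(nrow(sim_int), sd = 0.2)
sim_int$fwing <- factor(sim_int$wing)
sim_int$ffold <- factor(sim_int$fold)
sim_int <- sim_int[-c(1, 2, 7), ]  # unbalanced: three runs "missing"

int_wf <- drop1(lm(Time ~ fwing * ffold, data = sim_int), test = "F")
int_fw <- drop1(lm(Time ~ ffold * fwing, data = sim_int), test = "F")
int_wf
int_fw  # same F and P for the interaction, whichever order the factors are written in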

# Let's plot results to help see what happened:

# lattice plots ignore par(mfrow = ...); the layout argument, c(columns, rows), sets a 2x3 panel array
xyplot(Time ~ fwing | fgroup, groups = ffold, type = "a", layout = c(3, 2), auto.key = TRUE)
# lines that are not parallel indicate a Wing x Fold interaction; missing (NA) values may not plot properly
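
# What "lines that are not parallel" means in numbers: the Fold difference (the vertical gap
# between the two lines) changes with Wing. A sketch with made-up, noise-free values (NOT
# copter results; sim_cells, cell_means, and fold_gap are illustrative names):
sim_cells <- expand.grid(fwing = factor(c(70, 110)), ffold = factor(c(0, 1)), rep = 1:3)
sim_cells$Time <- with(sim_cells, 2 + 0.5 * (fwing == "110") + 0.2 * (ffold == "1") +
  0.4 * (fwing == "110" & ffold == "1"))
cell_means <- with(sim_cells, tapply(Time, list(fwing, ffold), mean))
cell_means                                         # rows = Wing, columns = Fold
fold_gap <- cell_means[, "1"] - cell_means[, "0"]  # Fold effect at each Wing: 0.2 and 0.6
diff(fold_gap)                                     # 0.4; zero would mean parallel lines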

# or try this one to see the effect of Step on our Time ~ Wing results:

xyplot(Time ~ Wing | fgroup, groups = Step, type = "a", layout = c(3, 2)) # one line per Step:
# lines offset from one another suggest a Step effect; non-parallel lines suggest a Wing x Step interaction

# So we had better include Group as a block and Step as a covariate, because both were part of
# our actual experimental design and seem to matter. Let's build the complete model (which we
# would do anyway):

wfgsaov <- aov(Time ~ fwing*ffold + fgroup + Step) # fwing + ffold + fwing:ffold, plus fgroup as blocks and Step as a covariate
wfgslm <- lm(Time ~ fwing*ffold + fgroup + Step) # same gig but using lm
summary(wfgsaov)
summary(wfgslm)

# IMPORTANT NOTE: read the complex lm output this way: the Intercept is the predicted Time for
# the reference (first) level of each factor (the lowest Wing value, Fold 0, Group 1) at Step = 0.
# All other coefficients are added to (or subtracted from) that. By default, factor levels are
# sorted in alphabetical/numerical order and the first level is the reference. If you want to
# change that order, use the relevel command as described in Hector's book.
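
# A minimal sketch of reading and changing the reference level, using a few made-up flight
# times (NOT copter data; sim_ref and ffold_r are illustrative names). In a one-factor model
# with R's default treatment contrasts, the Intercept is the mean of the reference level and
# the other coefficient is a difference from it. relevel() (base R, stats package) picks a
# new reference level:
sim_ref <- data.frame(ffold = factor(c(0, 0, 0, 1, 1, 1)),
                      Time = c(2.0, 2.2, 2.1, 2.5, 2.3, 2.4))
coef(lm(Time ~ ffold, data = sim_ref))    # Intercept 2.1 = Fold 0 mean; ffold1 0.3 = Fold 1 minus Fold 0
sim_ref$ffold_r <- relevel(sim_ref$ffold, ref = "1")
coef(lm(Time ~ ffold_r, data = sim_ref))  # Intercept 2.4 = Fold 1 mean; ffold_r0 -0.3 = Fold 0 minus Fold 1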

# Still see differences between the aov and lm outputs? Remember, aov's sequential (Type I)
# tests suit balanced designs (those without missing data). If data are missing, aov is not appropriate.

# What would happen if you mistakenly included Group (numbers) instead of the factor(Group)?
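
# A sketch of the difference, on simulated data (NOT the copter data; sim_grp and block_shift
# are made-up names): entered as a number, Group is fitted as one straight-line slope (1 df),
# as if group 6 were "more" than group 1; entered as a factor, each group gets its own mean
# (number of groups - 1 df), which is what a blocking factor needs.
set.seed(4)
sim_grp <- data.frame(Group = rep(1:6, each = 4))
sim_grp$fgroup <- factor(sim_grp$Group)
block_shift <- rnorm(6, sd = 0.3)  # a random offset for each hypothetical group
sim_grp$Time <- 2 + block_shift[sim_grp$Group] + rnorm(nrow(sim_grp), sd = 0.2)
anova(lm(Time ~ Group, data = sim_grp))   # Group: 1 df
anova(lm(Time ~ fgroup, data = sim_grp))  # fgroup: 5 df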

# Finally, we should inspect residuals of the lm model to validate it for its assumptions. Run this:

par(mfrow=c(2,2)) # this sets up a 2x2 plot array to be used in the next line
plot(wfgslm) # given a model, this ESSENTIAL command views residuals (errors)
# upper left (Residuals vs Fitted): you want a flat line (indicating that the model has captured
#   the pattern in the data) and an even scatter left to right (indicating homogeneous variance)
# upper right: a typical QQ plot; dots on the line indicate normally distributed residuals
# lower left (Scale-Location): the square root of the absolute standardized residuals, which
#   makes them all positive. A more sensitive view of the first plot, judged by the same criteria.
# lower right (Residuals vs Leverage): shows how much "leverage" or "pull" each data point has.
#   Again, an even scatter near 0 and a flat pattern is best; points beyond the dashed Cook's
#   distance contours have an outsized influence on the fit.
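
# The four panels are built from quantities you can also extract directly. A sketch on
# simulated data (NOT the copter data; sim_diag, fit_diag, and diag_tab are made-up names):
set.seed(6)
sim_diag <- data.frame(Step = rep(1:6, times = 4), ffold = factor(rep(c(0, 1), each = 12)))
sim_diag$Time <- 3 - 0.05 * sim_diag$Step + 0.3 * (sim_diag$ffold == "1") +
  rnorm(nrow(sim_diag), sd = 0.15)
fit_diag <- lm(Time ~ ffold + Step, data = sim_diag)
diag_tab <- data.frame(
  fitted = fitted(fit_diag),                     # x-axis of the upper-left and lower-left panels
  resid = resid(fit_diag),                       # y-axis of the upper-left panel
  std_resid = rstandard(fit_diag),               # standardized residuals, used in the QQ plot
  sqrt_abs_std = sqrt(abs(rstandard(fit_diag))), # y-axis of the lower-left panel
  leverage = hatvalues(fit_diag),                # x-axis of the lower-right panel
  cooks = cooks.distance(fit_diag)               # what the Cook's distance contours summarize
)
head(diag_tab)
sum(diag_tab$leverage)  # leverages sum to the number of coefficients (here 3)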

# So what have we accomplished? We learned:
# 1. how to compute a factorial linear model with blocks and a covariate, and found that the
#    copter treatments (Wing and Fold) interacted to significantly affect flight times
# 2. that aov is not appropriate when there are missing data
# 3. that groups were important to time, after accounting for the effects of Wing*Fold and Step
# 4. that each mm of Wing added to flight time, in a more-than-linear fashion
# 5. that folding the wings increased flight time by ~0.5 sec, after accounting for all else
# 6. that each step reduced flight time by ~0.1 sec
# 7. and that, though assumptions were not ideal, the final, rather complex model appears to have met them OK.