##### GRAPHING DATA II: Seeing data is the first step toward effective stats modeling.
# Last lab you made boxplots of copter data, using base plot and ggplot. Here we continue to play with
# ggplot options using scatter plots - a very useful graph for regressions. In addition, we make lattice
# and interaction plots; both are great ways to see how one variable affects another.
# We use our copter data again, plus a multi-variable data set that comes with R.

require(ggplot2)
require(viridis)
require(lattice) # assuming it is already installed
demo(lattice)    # then hit Enter in the Console to click through example lattice graphs

# Set your working directory and download the copter data (if you don't already have it):
# https://sciences.ucf.edu/biology/d4lab/wp-content/uploads/sites/23/2021/09/helicopter-data.csv
# Here I assume you called it "data". Do not attach it: we use a few data sets,
# so we have to specify which data we use each time (data$...).

# It will be easier if we designate our Groups, IDs, and Folds as categorical factors:
data$fID <- factor(data$ID)        # converts the numeric ID into a factor (i.e., category)
data$Group <- factor(data$Group)   # Groups and Folds are named with letters, but since R 4.0.0
data$Fold <- factor(data$Fold)     # read.csv() keeps text as character, so convert them too

# Let's try a lattice graph for some copter data: a density plot, which is like a smoothed histogram.
# We'll use the Time data:
densityplot(~ data$Time)

# Now alter that command to show "bell curves" per ID:
densityplot(~ data$Time | data$fID) # one curve per categorical design (ID); the | means "for each"
# What about for Groups?

# Is each data set "normal-ish"? This logical step is essential when comparing categories, and it is
# often done wrong: we need to evaluate normality OF EACH SET, NOT(!!) the overall pattern --ACROSS-- all sets.
# This comes up in the following weeks, too.

# How about scatter plots of Time vs. Wing Length (WL) for each Group?
xyplot(data$Time ~ data$WL | data$Group, layout = c(1, 3)) # layout = c(columns, rows) of panels
# Does the effect of Wing Length on Time look consistent among groups?

# Ooh! Ooh! I know! How about a 3D scatter plot of Time vs. Wing Length and Step, with one panel per Fold?
cloud(data$Time ~ data$WL * data$Step | data$Fold)
# You get the idea: you have lots of options in lattice.

# Perhaps more useful will be grids of multiple variables. For example:
# a Scatter Plot Matrix is Very Handy For First Squints at Data Patterns!
splom(data[c(3, 4, 7:8)]) # using only columns 3, 4, 7 & 8. The 2nd row (Time) is most relevant as your Y axis.

# What if we had more complex data, like the mtcars data set in R? It has 11 measures of 32 cars
# (1973-74 models, from Motor Trend magazine). To see the first few rows in that data set, type:
head(mtcars) # note that car names are row labels, not a first column

# Make a big grid of scatter plots for all those data using splom, like above.
# If you wanted to predict mpg, which variables might be most useful?
# So winnow that list down to the best apparent predictors of mpg and make a smaller scatter plot grid.
# Do some of these potential predictors look closely correlated with each other (i.e., are they redundant)?
# If yes, this is called collinearity, something to avoid. For now, select predictors that do not seem to be correlated.

# Now make a prettier ggplot with those. Here is some example code to work with -
### TWEAK THIS ### to make your own plot and play with the code:
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = disp), size = 5, alpha = 0.5, show.legend = TRUE) +
  geom_smooth(method = "lm", formula = y ~ x, col = "black") + # straight-line (linear model) fit
  theme_classic(base_size = 20) +
  scale_color_viridis(discrete = FALSE, option = 'D') + # NOTICE the discrete bit here?
  theme(axis.text = element_text(size = 14)) +          # Too big? Change the font size here
  labs(x = 'Car weight (1000 lbs)', y = 'miles per gallon')
# Can you make a similar plot for the helicopter data?

##### Interaction plots help us understand interactive effects between experimental treatments.
# Did one treatment change the effect of the other treatment?
# This is important for factorial experiments, and even for sampling designs across complex landscapes,
# where factors may counteract each other to yield complex outcomes.
### An interaction shows up as non-parallel lines: crossed lines, or even a sideways V. Why?

# Let's explore the possible interaction between Wing Length and Folding in our copter data.
# This assumes treatments are categories. interaction.plot() averages Time within each Fold x WL
# combination, so wrap the numeric WL in factor() to treat each wing length as a category:
interaction.plot(data$Fold, factor(data$WL), data$Time, type = "b", trace.label = "WL")
# Did wing folding and wing length interact to alter each other's effects?
# What if you replace Fold with other ## Factors ##?
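
# A minimal sketch to help answer the "Why?" above, using SIMULATED data. The data frame `sim`
# and its columns fold, wing, and Time are hypothetical, not the copter measurements. We assume
# long wings add about 1 s of flight time with fold A but nothing with fold B. Because the effect
# of wing length depends on folding, the lines in the interaction plot are not parallel.
set.seed(1)  # make the simulated noise reproducible
sim <- expand.grid(fold = c("A", "B"), wing = c("short", "long"), run = 1:5)
sim$fold <- factor(sim$fold, levels = c("A", "B"))
sim$wing <- factor(sim$wing, levels = c("short", "long"))
# Assumed true mean Time: 2 s, plus 1 s only for long wings with fold A (the interaction)
mu <- 2 + ifelse(sim$wing == "long" & sim$fold == "A", 1, 0)
sim$Time <- mu + rnorm(nrow(sim), sd = 0.1)  # add small random noise
# Mean Time for each fold x wing cell (rows = fold, columns = wing)
cell.means <- tapply(sim$Time, list(sim$fold, sim$wing), mean)
print(round(cell.means, 2))
# Effect of long vs. short wings within each fold level; unequal effects = interaction
wing.effect <- cell.means[, "long"] - cell.means[, "short"]
print(round(wing.effect, 2))
# The two lines meet at fold B and spread apart at fold A: a sideways V
interaction.plot(sim$fold, sim$wing, sim$Time, type = "b")
# If long wings added the same time under both folds, the two wing.effect values would be
# about equal and the lines would be parallel, i.e., no interaction.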