##### GLMs and multiple regressions and model selection... we're having fun now! # This data set represents counts of amphbians killed by cars at 52 sample points on roads. # The goal: to reliably predict amphibian roadkill, and thus find ways to mitigate it. # But there are 17 explanatory variables - too many to explain only 52 data points when all is done. # So you have to narrow down the list to no more than 5-6 likely predictors. ### Import and attach the data: amphibRK.csv on the course web site attach(amphibRK) ### Use the lattice package or the pairs command to plot variables together in a grid - squint at patterns: # Variables represent landcover surrounding roads (within a "buffer" distance - a GIS thing), or distances to habitats: # D.PARK - distance to a park # D.POND - distance to nearest pond or lake # D.STREAM - distance to nearest stream # PONDS - area (ha) of ponds and lakes in buffer # OPEN.L - area (ha) of open land in buffer # SHRUB - area (ha) of shrub land in buffer # URBAN - area (ha) of urban land in buffer # L.P.ROAD - length (km) of paved road in buffer # L.D.ROAD - length (km) of dirt road in buffer # L.STREAM - length (km) of streams in buffer # L.SDI - landscape habitat diversity, etimated as Shannon diversity index # Make a list of potential explanatory factors in a simple lm. Can you cross some off your list due to multicollinearity? library(car) vif(lm(HITS~D.PARK+D.POND+D.STREAM+PONDS+OPEN.L+SHRUB+URBAN+L.P.ROAD+L.D.ROAD+L.STREAM+L.SDI)) # Now make some hypotheses. For example, maybe you could organize them by upland vs. aquatic habitat variables. # Or distance to habitats vs. distance or area inside buffers. # Be creative! # Now build and compute GLMs for your hypotheses. Here's two examples for format (but you can make better models): pondsmodelp1 <- glm(HITS~D.POND+PONDS, family=poisson) # Alternatively, use the slick, tres chic glmmADMB package to be able to run both Poisson and negative binomials. library(glmmADMB) # For example; pondsmodeln <- glmmadmb(HITS~D.POND+PONDS, family="nbinom") ### But Note: you should probably stick with one package. pondsmodelp2 <- glmmadmb(HITS~D.POND+PONDS, family="poisson") # Notice that glm and glmmadmb find different AIC values for the same data, same distribution: # Compare your models with AICc, for example using bbmle library(bbmle) AICctab(pondsmodelp1,pondsmodeln,pondsmodelp2, base=T,delta=T,weights=T) # And evaluate residuals plot(pondsmodelp1) # Note: glmmadmb does not make simple residual plots like lm and glm do. Instead, do something like this: p2res <- residuals(pondsmodelp2) p2pred <- predict(pondsmodelp2) plot(p2res~p2pred) ### One last (important) squint at patterns: make a plot of predicted vs. observed values, such as this: plot(p2pred~HITS) # a good fit to the data should be close to a straight line ### What variables best predict amphibian roadkill?