##### GLMs and multiple regressions and model selection... we're having fun now!

# This data set represents counts of amphbians killed by cars at 52 sample points on roads.
# The goal: to reliably predict amphibian roadkill, and thus find ways to mitigate it.
# But there are 17 explanatory variables - too many to explain only 52 data points when all is done.
# So you have to narrow down the list to no more than 5-6 likely predictors.

### Import and attach the data: amphibRK.csv on the course web site
attach(amphibRK)

### Use the lattice package or the pairs command to plot variables together in a grid - squint at patterns:
# Variables represent landcover surrounding roads (within a "buffer" distance - a GIS thing), or distances to habitats:
# D.PARK - distance to a park
# D.POND - distance to nearest pond or lake
# D.STREAM - distance to nearest stream
# PONDS - area (ha) of ponds and lakes in buffer
# OPEN.L - area (ha) of open land in buffer
# SHRUB - area (ha) of shrub land in buffer
# URBAN - area (ha) of urban land in buffer
# L.P.ROAD - length (km) of paved road in buffer
# L.D.ROAD - length (km) of dirt road in buffer
# L.STREAM - length (km) of streams in buffer
# L.SDI - landscape habitat diversity, etimated as Shannon diversity index

# Make a list of potential explanatory factors in a simple lm. Can you cross some off your list due to multicollinearity?
library(car)
vif(lm(HITS~D.PARK+D.POND+D.STREAM+PONDS+OPEN.L+SHRUB+URBAN+L.P.ROAD+L.D.ROAD+L.STREAM+L.SDI))

# Now make some hypotheses. For example, maybe you could organize them by upland vs. aquatic habitat variables.
# Or distance to habitats vs. distance or area inside buffers.
# Be creative!

# Now build and compute GLMs for your hypotheses. Here's two examples for format (but you can make better models):

pondsmodelp1 <- glm(HITS~D.POND+PONDS, family=poisson)

# Alternatively, use the slick, tres chic glmmADMB package to be able to run both Poisson and negative binomials.
library(glmmADMB)
# For example;
pondsmodeln <- glmmadmb(HITS~D.POND+PONDS, family="nbinom")

### But Note: you should probably stick with one package.
pondsmodelp2 <- glmmadmb(HITS~D.POND+PONDS, family="poisson")
# Notice that glm and glmmadmb find different AIC values for the same data, same distribution:

# Compare your models with AICc, for example using bbmle 
library(bbmle)
AICctab(pondsmodelp1,pondsmodeln,pondsmodelp2, base=T,delta=T,weights=T)

# And evaluate residuals
plot(pondsmodelp1)
# Note: glmmadmb does not make simple residual plots like lm and glm do. Instead, do something like this:
p2res <- residuals(pondsmodelp2)
p2pred <- predict(pondsmodelp2)
plot(p2res~p2pred)

### One last (important) squint at patterns: make a plot of predicted vs. observed values, such as this:
plot(p2pred~HITS) # a good fit to the data should be close to a straight line

### What variables best predict amphibian roadkill?