##### Generalized Linear Models Lab 1 #####

# Today we keep to a simple data set, with the point to explore different distribution families
# in GLM and find the most plausible one.
# Next class we turn it up a notch and explore more complex model options.

# The idea of a cancer cluster is that an area has more cancer cases than would be expected. 
# If true, cancer rates should decline with distance away from a cluster, and indicate a cluster
# centered around point zero, perhaps due to environmental contamination.

# The clusters.txt data set (on the course web page) includes two columns; cancers (number of cancers
# reported per year per clinic) and distance from a nuclear plant (km).

attach(clusters)
# First evaluate the cancers distribution with a histogram
# Will a log-transform help? (try adding 1 to values to include zeroes in the transform)

# Now let's try some regressions
# We can already expect problems with plain ol lm, but run one to include as a reference for GLMs.
# Call that lm modelA.

# Now some GLMs. Here's the first one (a poisson) - then you follow with some other family and link combinations
# of your choice, such gamma with link=log, or quasipoisson, etc.
modelB <- glm(Cancers~Distance, family=poisson)
summary(modelB)
par(mfrow=c(2,2)) # to set up 2x2 residuals plots
plot(modelB)

par(mfrow=c(1,1)) # to go back to one plot
plot(Cancers~Distance, cex=0.5, col="gray") # scatterplot of data
lines(predict.glm(modelB)~Distance) # plot model prediction on top of data
# Now run a few more of your choice. Use the Help for glm to ID families and links to use.

#Having run a few GLMs, which one worked best? Find out with AIC (insert more models you made)
library(bbmle)
AICctab(modelA,modelB, base=T,delta=T,weights=T) #include additional models you made

# Still wish we also had a R^2 to report? Try this:
library(MuMIn) # you may install it first
# This command reports two values: the first is for fixed effects, the second is for fixed + random effects.
# In the absence of random effects (as here), the values match
r.squaredGLMM(modelA) 

# Now let's try these again but using a package which is fast becoming one of my favorites: glmmADMB.
# That mouthful is based on the ADMB program, with glmm adapted to R.
#Load it if it is already installed, and if not, go to http://glmmadmb.r-forge.r-project.org/ and follow Installation instructions

# For this simple data set, we can ignore many command options and run with this:
modelX <- glmmadmb(Cancers~Distance, data=clusters, family="poisson") # which should be identical to the poisson model above
#or
modelY <- glmmadmb(Cancers~Distance, data=clusters, family="nbinom") # which uses a negative binomial distribution.
# This is not available in plain ol glm, but should be our default for count data, according to O'Hara & Kotze 
# (2010; http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2010.00021.x/full)

# Does the negative binomial distribution work "better" than the other models?
# Finally, based on your most plausible model, is there a significant decay of Cancers with Distance?
# Imagine being an expert witness in court - would you be willing to argue for / against a lawsuit based on this?