##### Generalized Linear Models Lab 1 ##### # Today we keep to a simple data set, with the point to explore different distribution families # in GLM and find the most plausible one. # Next class we turn it up a notch and explore more complex model options. # The idea of a cancer cluster is that an area has more cancer cases than would be expected. # If true, cancer rates should decline with distance away from a cluster, and indicate a cluster # centered around point zero, perhaps due to environmental contamination. # The clusters.txt data set (on the course web page) includes two columns; cancers (number of cancers # reported per year per clinic) and distance from a nuclear plant (km). attach(clusters) # First evaluate the cancers distribution with a histogram # Will a log-transform help? (try adding 1 to values to include zeroes in the transform) # Now let's try some regressions # We can already expect problems with plain ol lm, but run one to include as a reference for GLMs. # Call that lm modelA. # Now some GLMs. Here's the first one (a poisson) - then you follow with some other family and link combinations # of your choice, such gamma with link=log, or quasipoisson, etc. modelB <- glm(Cancers~Distance, family=poisson) summary(modelB) par(mfrow=c(2,2)) # to set up 2x2 residuals plots plot(modelB) par(mfrow=c(1,1)) # to go back to one plot plot(Cancers~Distance, cex=0.5, col="gray") # scatterplot of data lines(predict.glm(modelB)~Distance) # plot model prediction on top of data # Now run a few more of your choice. Use the Help for glm to ID families and links to use. #Having run a few GLMs, which one worked best? Find out with AIC (insert more models you made) library(bbmle) AICctab(modelA,modelB, base=T,delta=T,weights=T) #include additional models you made # Still wish we also had a R^2 to report? Try this: library(MuMIn) # you may install it first # This command reports two values: the first is for fixed effects, the second is for fixed + random effects. # In the absence of random effects (as here), the values match r.squaredGLMM(modelA) # Now let's try these again but using a package which is fast becoming one of my favorites: glmmADMB. # That mouthful is based on the ADMB program, with glmm adapted to R. #Load it if it is already installed, and if not, go to http://glmmadmb.r-forge.r-project.org/ and follow Installation instructions # For this simple data set, we can ignore many command options and run with this: modelX <- glmmadmb(Cancers~Distance, data=clusters, family="poisson") # which should be identical to the poisson model above #or modelY <- glmmadmb(Cancers~Distance, data=clusters, family="nbinom") # which uses a negative binomial distribution. # This is not available in plain ol glm, but should be our default for count data, according to O'Hara & Kotze # (2010; http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2010.00021.x/full) # Does the negative binomial distribution work "better" than the other models? # Finally, based on your most plausible model, is there a significant decay of Cancers with Distance? # Imagine being an expert witness in court - would you be willing to argue for / against a lawsuit based on this?