##### T TESTS #####
# A t-test compares the means of two groups to test the null hypothesis that the two means do
# not differ. t-tests are a classic method, closely related to ANOVA, and still appropriate for
# simple comparisons. Here we explore a number of binary (two-condition) potential drivers of
# birth weights in North Carolina.

# Here we use data collected in 2004 in North Carolina related to births there. 
# Import the file (NC births data on our web site).
# This data set is useful to researchers studying predictors of newborn weights,
# which itself predicts infant health. 

### We will test the following three hypotheses:
# A. Smoking by pregnant women causes their newborn babies to weigh less at birth than babies
# born to mothers who do not smoke. This would support efforts to reduce smoking.

# B. Babies born to married women weigh more on average than those born to unmarried
# mothers (note that same-sex marriage was not legal in NC in 2004). This would support
# need-based prenatal programs (where single- vs. dual income is assumed to affect access to
# health care, etc.).

# C. Babies born to white mothers weigh more on average than those born to non-white mothers.
# This would support prenatal programs in minority-dominated areas to improve
# newborn health (because race remains a proxy for economic disadvantages, health care
# access, etc.).
### Notice that we focus on ### Quantitative Responses ### to Categorical Predictors ###.

# Load the ncbirths data set, then attach it for convenience, and view it. Variables are:
# fage = father's age
# mage = mother's age
# mature = category for mother's age
# weeks = length of pregnancy (weeks)
# premie = category for weeks
# visits = number of prenatal doctor visits
# marital = legal marital status
# gained = weight gained while pregnant (lbs)
# weight = baby's weight at delivery (lbs)
# lowbirthweight = category for weight
# gender = baby's sex
# habit = mother is smoker or not
# whitemom = mother is white or not

summary(ncbirths) # to inspect the data - any problems with the data columns we will use?
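# What to do with NAs? t.test() drops NAs on its own, so it needs no extra argument. For
# summary functions such as mean() and sd(), add na.rm=TRUE inside the parentheses;
# this essentially says "NA removal = true" and omits NAs from the calculation.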
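
# A minimal sketch of how NAs behave, using a made-up toy vector (NOT the ncbirths data).
# The names toy_weights and toy_na_test are hypothetical and only for illustration.
toy_weights <- c(7.1, 6.8, NA, 8.0, 7.5)
mean(toy_weights)                 # returns NA because one value is missing
mean(toy_weights, na.rm = TRUE)   # drops the NA before averaging
sum(is.na(toy_weights))           # counts the missing values
# t.test() discards NAs itself, so this one-sample test runs on the 4 observed values:
toy_na_test <- t.test(toy_weights, mu = 7)
toy_na_test$parameter             # df = n - 1 = 3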

### Assumptions
# Do the data we use here (weights) fit assumptions of normality [and homogeneous variance]
# for each comparison we will make (habit, marital, whitemom)?
# If not, make data fit assumptions using transformations, as you already learned to do.
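
# One possible way to check these assumptions, sketched on simulated stand-in data so that it
# runs on its own. The data frame toy and its group means and sd are arbitrary assumptions,
# NOT estimates from ncbirths; swap in the real columns (e.g., weight and habit) when ready.
set.seed(1)
toy <- data.frame(
  weight = c(rnorm(40, mean = 7.2, sd = 1.4), rnorm(40, mean = 6.8, sd = 1.4)),
  habit  = factor(rep(c("nonsmoker", "smoker"), each = 40))
)
# Visual check: side-by-side boxplots of the response for each group
boxplot(weight ~ habit, data = toy, ylab = "weight (lbs)")
# Normality within each group: Shapiro-Wilk p-value per level of the factor
toy_shapiro <- tapply(toy$weight, toy$habit, function(w) shapiro.test(w)$p.value)
toy_shapiro
# Homogeneity of variance: F test comparing the two group variances
toy_vartest <- var.test(weight ~ habit, data = toy)
toy_vartest
# A small p-value in either check suggests the assumption is doubtful for that comparison.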

# Here's the scoop on t-tests. Unlike some statistical packages, R's t.test() assumes unequal
# variances by default (convenient!) and applies the Welch df modification. The basic commands are:
# independent 2-group t-test (formula interface):
t.test(y~x) # where y is numeric and x is a binary factor
# independent 2-group t-test (two numeric vectors):
t.test(y1,y2) # where y1 and y2 are numeric
# paired t-test:
t.test(y1,y2,paired=TRUE) # where y1 & y2 are numeric
# one sample t-test:
t.test(y,mu=0) # Ho: mu=0

# where options include:
# the var.equal = TRUE statement, inside (), to specify equal variances
# the alternative="less" or alternative="greater" option to specify a one-tailed test, as opposed
# to the default alternative="two.sided". Notice that the order of subtraction for that option
# follows the factor levels, which are alphabetical by default (e.g., nonsmoker minus smoker).
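
# A sketch of these options in action, again on simulated stand-in data so that it runs on its
# own. The data frame toy2 and its numbers are arbitrary assumptions, NOT the ncbirths data.
set.seed(2)
toy2 <- data.frame(
  weight = c(rnorm(30, mean = 7.2, sd = 1.4), rnorm(30, mean = 6.8, sd = 1.4)),
  habit  = factor(rep(c("nonsmoker", "smoker"), each = 30))
)
levels(toy2$habit)   # level order sets the subtraction: first level minus second level
welch    <- t.test(weight ~ habit, data = toy2)                    # default: Welch, two-sided
pooled   <- t.test(weight ~ habit, data = toy2, var.equal = TRUE)  # classic equal-variance test
one_tail <- t.test(weight ~ habit, data = toy2, alternative = "greater")
# alternative="greater" here means Ha: mean(nonsmoker) - mean(smoker) > 0
welch$method         # reports which version of the test was run
pooled$parameter     # df = 30 + 30 - 2 = 58 when variances are pooled
one_tail$p.value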

# Now test the three hypotheses using t-tests, where ### YOU CHOOSE ### appropriately among the
# options above.

# Try one more hypothesis that you make up.

# One last consideration: we have now tried (at least) 4 different t-tests on birth weight. With
# enough attempts, we might eventually stumble on a significant effect by chance alone. Thus we
# apply a Bonferroni correction, which adjusts the critical p-value for the number of t-tests
# we conduct.
# So if we stick to just the three hypotheses (A-C), the Bonferroni correction would lead to a
# critical p-value of 0.05 / 3 = 0.0167. Thus any one test would have to attain a p-value of 0.0167
# or less to be considered significant, rather than the customary 0.05. Did this change any of your
# interpretations?
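
# The Bonferroni arithmetic can be sketched in R. The values in p_vals are hypothetical
# placeholders, NOT results from the ncbirths tests; substitute your own three p-values.
alpha   <- 0.05
n_tests <- 3
alpha / n_tests                       # adjusted critical p-value
p_vals <- c(A = 0.010, B = 0.030, C = 0.200)   # hypothetical p-values for illustration only
p_vals <= alpha / n_tests             # compare each raw p-value to the adjusted threshold
# Equivalent approach: p.adjust() multiplies each p-value by the number of tests (capped at 1),
# and the adjusted values are then compared to the usual 0.05.
p_adj <- p.adjust(p_vals, method = "bonferroni")
p_adj <= alpha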