---
title: "Moving On Up: Coral Growth in Moorea"
author: "Final Project For EEMB 146"
date: "Allison Kelton"
output:
  html_document: default
  pdf_document: default
---
## Abstract
This analysis is motivated by an enthusiasm for tropical marine life and a hope to better understand the dynamics of coral reef life on the island of Moorea. The data examined consist of two categorical and two quantitative variables, all randomly sampled. The data were assessed with a two-way ANOVA using the categorical variables, timepoint and location, and with a linear regression testing whether one quantitative variable, change in reef biomass, predicts the other, change in coral diameter. Both timepoint and location were found to have a significant effect on the change in coral diameter, and the change in biomass was found to predict the change in coral diameter. The results suggest how local abiotic conditions affect wild coral and how a coral's biomass relates to its net change in growth.
## Introduction
This dataset was gathered by researchers at the University of Rhode Island in conjunction with the Moorea Coral Reef Long Term Ecological Research program (MCR LTER). It focuses on coral in Moorea, a small island in French Polynesia, and on the change in coral biomass over a short period of time, in order to assess how the species is faring in its current environment. The data were generated by labeling and taking 240 measurements of the local coral on two separate occasions in order to observe how the coral's biomass and diameter changed after the first measurement. Knowing how the biomass and diameter of the coral have changed is essential to understanding whether the reef habitat has stopped being conducive to coral life. If this is the case, investigations to pinpoint the disturbance and explore potential origins and remedies can begin immediately (Edmunds and Putnam, 2020). The studies can also reveal whether one type of habitat is more suitable for coral growth and, if so, which conditions make it preferable for locally growing coral species.
The goal for this analysis is to evaluate the MCR LTER dataset in order to investigate which independent variables or interactions between independent variables have a significant effect on the change in coral diameter, and which, if any, can be used to predict it. I hypothesize that timepoint and location will have a significant effect on the change in coral diameter, but the interaction between timepoint and location will not. I also hypothesize that the measured change in coral biomass will be able to predict the measured change in coral diameter.
## Exploratory Data Analysis
#### Boxplot
To graph the data meaningfully, I created a boxplot that compares coral diameter in shallow versus deep water and at the initial versus final measurement for both locations. The boxplot shows that deep water coral exhibited a larger range in diameter than shallow water coral for both initial and final measurements. It also shows that, for both deep and shallow water locations, coral exhibited larger diameters overall in the final measurements than in the initial measurements. Each boxplot has at least one outlier, but the outliers are few and the overall data appear consistent.
```{r, include=FALSE}
library(car)
library(psych)
library(stats)
library(multcomp)
library(ggplot2)
```
```{r warning=FALSE, echo=FALSE, output=FALSE, fig.show="hold", fig.cap="Figure 1. Boxplot of coral diameter in centimeters by depth and timepoint. The final timepoint of deep water coral has the highest number of outliers, but the sample is large enough that, under the Central Limit Theorem, comparisons of the group means remain reliable."}
# Load coral data. The CSV is read twice so that the two diameter columns
# (columns 3 and 4) can be stacked into one long column.
# Copy 1: drop column 3; the remaining third column is labeled "Final" below
coral_data <- read.csv("C:\\Users\\akelt\\OneDrive\\Documents\\R\\coral_data.csv")
coral_data <- coral_data[,-3]
# Copy 2: drop column 4; the remaining third column is labeled "Initial" below
coral_data1 <- read.csv("C:\\Users\\akelt\\OneDrive\\Documents\\R\\coral_data.csv")
coral_data1 <- coral_data1[,-4]
# Give both copies the same column name so they can be row-bound
names(coral_data)[3] <- "Coral_Diameter"
names(coral_data1)[3] <- "Coral_Diameter"
coral_data <- rbind(coral_data, coral_data1)
# First 240 rows come from copy 1 (Final), last 240 from copy 2 (Initial)
coral_data$Timepoint <- c(rep("Final",240), rep("Initial",240))
# Explore the data
boxplot(Coral_Diameter ~ Location + Timepoint, data=coral_data,
        xlab="Location in Water",
        names=c("Final (Deep)", "Final (Shallow)", "Initial (Deep)", "Initial (Shallow)"),
        ylab="Coral Diameter (Centimeters)", col='lightskyblue',
        main="Boxplot of Growth in Coral")
```
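The reshaping step in the chunk above reads the CSV file twice and stacks the two copies. As a hedged sketch, the same wide-to-long stacking can be done from a single data frame. The toy values, the helper `stack_one`, and the column names `Diameter_Initial` and `Diameter_Final` are assumptions made for illustration; the real column names in `coral_data.csv` are not shown in this report.

```r
# Toy stand-in for the wide CSV (hypothetical values and diameter column names)
wide <- data.frame(
  Location         = c("Deep", "Deep", "Shallow", "Shallow"),
  Biomass.mg.cm2   = c(3.1, 2.7, 2.2, 2.9),
  Diameter_Initial = c(10.2, 11.5, 9.8, 10.9),
  Diameter_Final   = c(11.0, 12.1, 10.1, 11.4)
)

# Hypothetical helper: keep the shared columns plus one diameter column
stack_one <- function(df, diameter_col, label) {
  data.frame(
    Location       = df$Location,
    Biomass.mg.cm2 = df$Biomass.mg.cm2,
    Coral_Diameter = df[[diameter_col]],
    Timepoint      = label
  )
}

# Final rows first, then initial rows, matching the order used in the report
long <- rbind(
  stack_one(wide, "Diameter_Final", "Final"),
  stack_one(wide, "Diameter_Initial", "Initial")
)
```

Labeling each block as it is stacked avoids relying on a hard-coded row count such as `rep("Final", 240)`.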
#### Histograms
A histogram was used to explore the distribution of the coral diameter data. The data appear approximately normal: the histogram has a bell-like shape and shows no outliers. There is a very slight right skew, but it is weak. In addition, the Central Limit Theorem states that the sampling distribution of the mean approaches a normal distribution as the size of a random sample grows, regardless of the shape of the underlying data. With a large random sample of 240 corals, tests that compare means are therefore robust to this minor departure from normality.
```{r warning=FALSE, echo=FALSE, output=FALSE, fig.show="hold", fig.cap="Figure 2. Histogram of coral diameter data. The data closely follow a bell-shaped curve with a very slight skew to the right; given the large sample, the Central Limit Theorem makes this skew unlikely to affect inference on the means."}
#Load Coral Data
coral_data <- read.csv("C:\\Users\\akelt\\OneDrive\\Documents\\R\\coral_data.csv")
coral_data <- coral_data[,-3]
coral_data1 <- read.csv("C:\\Users\\akelt\\OneDrive\\Documents\\R\\coral_data.csv")
coral_data1 <- coral_data1[,-4]
names(coral_data)[3]<-"Coral_Diameter"
names(coral_data1)[3]<-"Coral_Diameter"
coral_data <- rbind(coral_data, coral_data1)
coral_data$Timepoint <- c(rep("Final",240), rep("Initial",240))
#Explore The Data
hist(coral_data$Coral_Diameter,
xlab= "Change in Diameter of Coral (Centimeters)",
main= "Histogram of Coral Diameter Growth",
col="lightslateblue")
```
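The Central Limit Theorem argument used for these histograms can be illustrated with a short simulation. This is a sketch on simulated values, not on the coral data: the exponential population, the seed, and the number of repetitions are all assumptions chosen only to make the effect visible.

```r
# Illustrative simulation only: the exponential "population" is an assumption
set.seed(146)
n_obs  <- 240    # same sample size as the coral survey
n_reps <- 2000   # number of simulated surveys

# One right-skewed sample, and the means of many such samples
one_sample   <- rexp(n_obs, rate = 1)
sample_means <- replicate(n_reps, mean(rexp(n_obs, rate = 1)))

# Simple moment-based skewness (0 for a symmetric distribution)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew_raw   <- skewness(one_sample)    # skewness of the raw values
skew_means <- skewness(sample_means)  # skewness of the sample means

# Side-by-side histograms: raw values versus sample means
par(mfrow = c(1, 2))
hist(one_sample, main = "One sample (n = 240)", xlab = "Value")
hist(sample_means, main = "Means of 2000 samples", xlab = "Sample mean")
```

The raw values stay skewed no matter how many are collected; it is the distribution of the sample mean that becomes bell-shaped, which is what the tests on means rely on.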
A histogram was also used to explore the distribution of the reef biomass data. The histogram shows that the data are right-skewed. Attempts to normalize the data via logarithm and square transformations only increased the severity of the right skew, so the data appear most normal when left untransformed. Because the random sample is large, the Central Limit Theorem implies that the sampling distribution of the mean is still approximately normal, so the analyses below proceed with the untransformed data.
```{r warning=FALSE, echo=FALSE, output=FALSE, fig.show="hold", fig.cap="Figure 3. Histogram of reef biomass data. The data have right skew, but most closely follow a normal distribution when untransformed. Given the large sample, the Central Limit Theorem supports inference on the means despite the skew."}
#Load Coral Data
coral_data <- read.csv("C:\\Users\\akelt\\OneDrive\\Documents\\R\\coral_data.csv")
coral_data <- coral_data[,-3]
coral_data1 <- read.csv("C:\\Users\\akelt\\OneDrive\\Documents\\R\\coral_data.csv")
coral_data1 <- coral_data1[,-4]
names(coral_data)[3]<-"Coral_Diameter"
names(coral_data1)[3]<-"Coral_Diameter"
coral_data <- rbind(coral_data, coral_data1)
coral_data$Timepoint <- c(rep("Final",240), rep("Initial",240))
#Explore The Data
hist(coral_data$Biomass.mg.cm2,
xlab= "Change in Biomass of Reef (mg/cm squared)",
main= "Histogram of Reef Biomass Growth",
col="mediumspringgreen")
```
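The comparison of transformations described above can be made concrete by computing a skewness statistic for each candidate. The sketch below uses simulated log-normal values as a stand-in, so the ranking it produces says nothing about the real reef biomass data; the variable `biomass_sim` and the `skewness` helper are assumptions for illustration.

```r
# Hypothetical right-skewed stand-in for the biomass column
set.seed(146)
biomass_sim <- rlnorm(240, meanlog = 1, sdlog = 0.5)

# Simple moment-based skewness (positive values indicate right skew)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Skewness of the raw values and of each transformed version
skew_table <- c(
  raw    = skewness(biomass_sim),
  log    = skewness(log(biomass_sim)),
  sqrt   = skewness(sqrt(biomass_sim)),
  square = skewness(biomass_sim^2)
)
round(skew_table, 2)
```

Whichever version has skewness closest to zero is the most symmetric; applied to the real column, the same table would document why the untransformed data were kept.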
## Statistical Methods
#### ANOVA
An ANOVA test is a statistical test that assesses how the mean of a quantitative variable differs across the levels of one or more categorical variables.
Running an ANOVA test will allow me to analyze how coral diameter changes according to timepoint and location. For this ANOVA test, coral diameter is the dependent variable, and timepoint and location are the independent variables. The timepoint variable has two levels, initial measurement and final measurement. The location variable also has two levels, shallow water and deep water.
The ANOVA test has three assumptions: homogeneity of variance, normality of residuals, and random sampling. Variance is the average squared deviation of the datapoints from their group mean, and its homogeneity can be assessed by graphing the residuals. A residual is the difference between an observed value and the value fitted by the model. In the residual plots, the red line is a smoothed trend of the residuals, which should stay close to the 0 line when the assumptions hold.
The homogeneity of variance assumption is assessed with the "Residuals vs. Fitted" plot. The assumption appears to be met because the residuals have a relatively equal spread about the 0 line and show no distinct pattern, which provides evidence for the equality of variances.
The normality of residuals is assessed with the "Normal Q-Q" plot, in which the datapoints adhere closely to the reference line. There are several outliers, but given the general adherence to the line and the large dataset, the residuals can be assumed to be normal.
The random sampling assumption is met because all coral datapoints were randomly sampled from the Moorea coral reef.
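The visual checks above can be complemented with formal tests. The sketch below is not part of the original analysis: it uses simulated data with the same two-by-two layout, and it relies on `bartlett.test` and `shapiro.test` from base R (`leveneTest` from the already-loaded `car` package would be an alternative for the variance check). The effect sizes and the seed are assumptions.

```r
# Simulated stand-in with the same design: 2 locations x 2 timepoints
set.seed(146)
sim <- expand.grid(
  rep       = 1:120,
  Location  = c("Deep", "Shallow"),
  Timepoint = c("Initial", "Final")
)
sim$Coral_Diameter <- 10 +
  1.0 * (sim$Location == "Deep") +
  0.5 * (sim$Timepoint == "Final") +
  rnorm(nrow(sim), sd = 2)

fit_sim <- aov(Coral_Diameter ~ Location * Timepoint, data = sim)

# Homogeneity of variance across the four Location x Timepoint groups
var_test <- bartlett.test(Coral_Diameter ~ interaction(Location, Timepoint),
                          data = sim)

# Normality of the model residuals
norm_test <- shapiro.test(residuals(fit_sim))

var_test$p.value
norm_test$p.value
```

For both tests, a large p-value means there is no evidence against the assumption. With samples this large, the tests can flag trivial departures, so they are best read alongside the plots rather than instead of them.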
For each independent variable and for the interaction, the null hypothesis states that the term has no effect on the change in coral diameter, that is, the group means do not differ. The alternative hypothesis states that the term does have an effect on the change in coral diameter. The null hypothesis is rejected when the p-value is less than 0.05 (p<0.05); when the p-value is greater than 0.05 (p>0.05), we fail to reject it.
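For reference, the two-way ANOVA with interaction can be written as a model equation. The notation below is the standard textbook form and is an assumption of this sketch rather than something defined in the dataset: $\alpha_i$ denotes the effect of location $i$, $\beta_j$ the effect of timepoint $j$, $(\alpha\beta)_{ij}$ their interaction, and $\varepsilon_{ijk}$ the residual for coral $k$.

$$
Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \qquad \varepsilon_{ijk} \sim N(0, \sigma^2)
$$

$$
H_0^{\text{location}}: \alpha_i = 0 \ \text{for all } i, \qquad
H_0^{\text{timepoint}}: \beta_j = 0 \ \text{for all } j, \qquad
H_0^{\text{interaction}}: (\alpha\beta)_{ij} = 0 \ \text{for all } i, j
$$

Each of the three p-values reported in the Results corresponds to one of these null hypotheses.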
#### Linear Regression
A linear regression is a mathematical model used to perform predictive analysis. In this study, performing a linear regression using my dataset allows me to understand whether or not the change in reef biomass is able to predict the change in coral diameter.
The linear regression model has three assumptions. It assumes that the data come from a random sample, the Y variable is normally distributed with equal variance for all values of X, and the residuals are normal. The random sampling assumption is met because the data come from a random sample taken from the Moorea coral reef.
The equality of variance assumption is tested by plotting the residuals and assessing the "Residuals vs. Fitted" plot. The data appear slightly heteroskedastic. However, transformations intended to stabilize the variance only increased the heteroskedasticity, so the most homoskedastic model is the untransformed one. Because of this, and because of the very large random sample, we can still consider the assumption approximately met while acknowledging the slight heteroskedasticity.
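A cone-shaped residual plot can also be checked numerically. The sketch below computes a studentized Breusch-Pagan statistic by hand in base R, on simulated data in which the error spread deliberately grows with the predictor. It was not part of the original analysis, and the variable names and simulation settings are assumptions.

```r
# Simulated data whose error spread grows with the predictor (cone shape)
set.seed(146)
n <- 480
biomass_sim  <- runif(n, min = 1, max = 10)
diameter_sim <- 5 + 0.8 * biomass_sim + rnorm(n, sd = 0.3 * biomass_sim)

fit_sim <- lm(diameter_sim ~ biomass_sim)

# Auxiliary regression: squared residuals on the predictor
aux <- lm(residuals(fit_sim)^2 ~ biomass_sim)

# Studentized Breusch-Pagan statistic: n * R^2 of the auxiliary regression,
# compared to a chi-squared distribution with 1 degree of freedom
bp_stat <- n * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

c(statistic = bp_stat, p.value = bp_p)
```

A small p-value indicates that the residual variance changes with the predictor. When that happens and no transformation helps, the coefficient estimates remain unbiased, but their standard errors and p-values should be interpreted with some caution.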
The normality of the residuals is assessed with the "Normal Q-Q" plot, in which the datapoints adhere closely to the reference line. There are several outliers, but given the close adherence to the line and the large dataset, the residuals can be assumed to be normal.
The null hypothesis for this linear regression states that the slope for change in reef biomass is zero, which means that the change in coral diameter cannot be predicted by the change in reef biomass. The alternative hypothesis states that the slope is not zero, which means that the change in coral diameter can be predicted by the change in reef biomass. The null hypothesis is rejected when the p-value is less than 0.05 (p<0.05); when the p-value is greater than 0.05 (p>0.05), we fail to reject it.
## Results
#### ANOVA
The two-way ANOVA test yields three p-values. Every p-value is compared to an alpha level of 0.05. P-values below the alpha level result in a rejection of the null hypothesis, and p-values above the alpha level result in a failure to reject the null hypothesis.
For the location variable, the ANOVA yields a p-value of 0.000192. Because this p-value is below 0.05, I reject the null hypothesis and conclude that location does have a significant effect on the change in coral diameter.
For the timepoint variable, the ANOVA yields a p-value of 0.003029. Because this p-value is below 0.05, I reject the null hypothesis and conclude that timepoint does have a significant effect on the change in coral diameter.
For the interaction between timepoint and location, the ANOVA yields a p-value of 0.604195. Because this p-value is above 0.05, I fail to reject the null hypothesis and find no evidence that the interaction between timepoint and location has a significant effect on the change in coral diameter.
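The three p-values above are read from the table printed by `summary(fit1)` in the Appendix. As a sketch of how they can be extracted programmatically, the code below fits the same model formula to simulated data; the simulated effects and the seed are assumptions, so its p-values are unrelated to the ones reported here.

```r
# Simulated stand-in with a location effect, a timepoint effect,
# and no interaction
set.seed(146)
sim <- expand.grid(
  rep       = 1:120,
  Location  = c("Deep", "Shallow"),
  Timepoint = c("Initial", "Final")
)
sim$Coral_Diameter <- 10 +
  1.0 * (sim$Location == "Deep") +
  0.5 * (sim$Timepoint == "Final") +
  rnorm(nrow(sim), sd = 2)

fit_sim <- aov(Coral_Diameter ~ Location * Timepoint, data = sim)

# summary() returns a list; its first element is the ANOVA table.
# Rows 1-3 are the two main effects and the interaction; row 4 is Residuals.
anova_table <- summary(fit_sim)[[1]]
p_values <- anova_table[["Pr(>F)"]][1:3]
names(p_values) <- c("Location", "Timepoint", "Location:Timepoint")

# Compare each p-value to the alpha level of 0.05
reject_null <- p_values < 0.05
reject_null
```

Pulling the values from the fitted object rather than retyping them keeps the reported numbers in step with the model if the data or formula change.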
```{r warning=FALSE, echo=FALSE, output=FALSE, fig.show="hold", fig.cap="Figure 4. Residual plots for the ANOVA test. The residuals vs. fitted plot exhibits the homoskedasticity required to assume equality of variances, based on the breadth and distribution of the data points. The normal Q-Q plot supports the normality of residuals through the close alignment of the datapoints with the reference line."}
#Load Coral Data
coral_data <- read.csv("C:\\Users\\akelt\\OneDrive\\Documents\\R\\coral_data.csv")
coral_data <- coral_data[,-3]
coral_data1 <- read.csv("C:\\Users\\akelt\\OneDrive\\Documents\\R\\coral_data.csv")
coral_data1 <- coral_data1[,-4]
names(coral_data)[3]<-"Coral_Diameter"
names(coral_data1)[3]<-"Coral_Diameter"
coral_data <- rbind(coral_data, coral_data1)
coral_data$Timepoint <- c(rep("Final",240), rep("Initial",240))
#Two Way ANOVA Test
fit1 <- aov(Coral_Diameter ~ Location*Timepoint, data=coral_data)
par(mfrow=c(2,2))
plot(fit1)
```
```{r warning=FALSE, echo=FALSE, output=FALSE, fig.show="hold", fig.cap="Figure 5. A ggPlot visualizing the results of the ANOVA test. The plot shows that deep water coral experiences more coral growth than shallow water coral. The plot also shows that, regardless of location, coral diameter was generally larger at the final measurement than at the initial measurement."}
#Load Coral Data
coral_data <- read.csv("C:\\Users\\akelt\\OneDrive\\Documents\\R\\coral_data.csv")
coral_data <- coral_data[,-3]
coral_data1 <- read.csv("C:\\Users\\akelt\\OneDrive\\Documents\\R\\coral_data.csv")
coral_data1 <- coral_data1[,-4]
names(coral_data)[3]<-"Coral_Diameter"
names(coral_data1)[3]<-"Coral_Diameter"
coral_data <- rbind(coral_data, coral_data1)
coral_data$Timepoint <- c(rep("Final",240), rep("Initial",240))
#ggPlot The Data
ggplot(coral_data, aes(x=Coral_Diameter, y=Location, color=Timepoint))+
geom_point(alpha=0.8)+
geom_boxplot(alpha=0.5)+
labs(title="ggPlot of ANOVA Test", x="Change in Coral Diameter (Centimeters)", y="Water Location")
```
#### Linear Regression
The linear regression yields a p-value for the relationship between the change in coral diameter and the change in reef biomass over a period of time. The p-value is compared to an alpha level of 0.05. P-values below the alpha level result in a rejection of the null hypothesis, and p-values above the alpha level result in a failure to reject the null hypothesis.
The linear regression model for change in coral diameter and change in reef biomass yields a p-value of 0.000816. Because this p-value is below 0.05, I reject the null hypothesis and conclude that the change in reef biomass can be used to predict the change in coral diameter.
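The regression p-value comes from the coefficient table printed by `summary(coral.lm)` in the Appendix. The sketch below shows how the biomass row of that table can be extracted, using the same model formula on simulated data; the simulated slope, noise level, and seed are assumptions and do not reproduce the reported result.

```r
# Simulated stand-in: diameter depends on biomass and on timepoint
set.seed(146)
n <- 240
sim <- data.frame(
  Timepoint      = rep(c("Final", "Initial"), each = n),
  Biomass.mg.cm2 = runif(2 * n, min = 1, max = 5)
)
sim$Coral_Diameter <- 8 +
  0.6 * sim$Biomass.mg.cm2 +
  0.5 * (sim$Timepoint == "Final") +
  rnorm(2 * n, sd = 1.5)

# Same formula as the report's regression
fit_sim <- lm(Coral_Diameter ~ Timepoint + Biomass.mg.cm2, data = sim)

# Coefficient table: one row per term, with estimate, SE, t value, p-value
coef_table    <- coef(summary(fit_sim))
biomass_slope <- coef_table["Biomass.mg.cm2", "Estimate"]
biomass_p     <- coef_table["Biomass.mg.cm2", "Pr(>|t|)"]

c(slope = biomass_slope, p.value = biomass_p)
```

Because the formula also includes `Timepoint`, the biomass p-value tests the biomass slope after adjusting for timepoint, not the simple two-variable relationship.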
```{r warning=FALSE, echo=FALSE, output=FALSE, fig.show="hold", fig.cap="Figure 6. Residual plots for the linear regression. The residuals vs. fitted plot exhibits slight heteroskedasticity, visible as a slight cone shape in the distribution of data points. Transformations did not make the data more homoskedastic, and the dataset is large, so the equal variance assumption is treated as approximately met despite the minor anomaly. The normal Q-Q plot supports the normality of residuals through the close alignment of the datapoints with the reference line."}
#Load Coral Data
coral_data <- read.csv("C:\\Users\\akelt\\OneDrive\\Documents\\R\\coral_data.csv")
coral_data <- coral_data[,-3]
coral_data1 <- read.csv("C:\\Users\\akelt\\OneDrive\\Documents\\R\\coral_data.csv")
coral_data1 <- coral_data1[,-4]
names(coral_data)[3]<-"Coral_Diameter"
names(coral_data1)[3]<-"Coral_Diameter"
coral_data <- rbind(coral_data, coral_data1)
coral_data$Timepoint <- c(rep("Final",240), rep("Initial",240))
#Linear Regression
coral.lm <- lm(Coral_Diameter ~ Timepoint+Biomass.mg.cm2, data=coral_data)
par(mfrow=c(2,2))
plot(coral.lm)
```
```{r warning=FALSE, echo=FALSE, output=FALSE, fig.show="hold", fig.cap="Figure 7. A ggPlot visualizing the results of the linear regression. The plot shows that the datapoints and trendline for change in reef biomass very closely align with the datapoints and trendline for change in coral diameter, which supports the idea that change in reef biomass can serve as a good predictor for change in coral diameter.", message=FALSE}
#Load Coral Data
coral_data <- read.csv("C:\\Users\\akelt\\OneDrive\\Documents\\R\\coral_data.csv")
coral_data <- coral_data[,-3]
coral_data1 <- read.csv("C:\\Users\\akelt\\OneDrive\\Documents\\R\\coral_data.csv")
coral_data1 <- coral_data1[,-4]
names(coral_data)[3]<-"Coral_Diameter"
names(coral_data1)[3]<-"Coral_Diameter"
coral_data <- rbind(coral_data, coral_data1)
coral_data$Timepoint <- c(rep("Final",240), rep("Initial",240))
#ggPlot The Data
ggplot(coral_data, aes(x=Coral_Diameter, y=Biomass.mg.cm2, color=Timepoint))+geom_point()+geom_smooth(method="lm")+
labs(title="ggPlot of Linear Regression", x="Change in Coral Diameter (Centimeters)", y="Change in Reef Biomass (mg/cm squared)")
```
## Discussion
The results of the ANOVA test demonstrate that the depth at which the coral grows and the passage of time both contribute significantly to the growth in diameter of a coral colony. Naturally, this supports the idea that coral expand upon themselves given the proper amount of time, as their mechanism of growth is slow but steady additions to their calcium carbonate exoskeleton (Huston, 1985). These results also support the idea that coral growth improves when the coral is located deeper in the water, which speaks to the effect of erosion on coral colonies. In the environment, the constant action of waves (especially during tropical storms) beating against the exoskeleton of corals erodes the reef by breaking pieces off and shrinking the overall size of the colony (Huston, 1985). Thus, coral located at greater depths are better able to escape the violence of wave action by residing beneath it, leading to less frequent erosion. The lower frequency of erosion allows for a larger net gain in coral diameter for deep water corals than for shallow water corals when recorded over a period of time.
The ANOVA test for the interaction between timepoint and location as it affects change in coral diameter was not significant. However, this is expected, as in the natural world the interaction of the two variables has no physical effect on the growth of a coral colony, rendering it unable to alter the changing diameters of wild corals.
The results of the linear regression demonstrate that the change in reef biomass can be used to predict the change in coral diameter. This supports the idea that as coral diameter increases, biomass increases as well, while stagnant or decreasing biomass could mean no change or a negative change in coral diameter after a period of time. Naturally, as coral diameter increases, its exoskeleton becomes larger, which increases the mass of the organism used in calculating biomass. So, adverse environmental conditions such as low sunlight or increased water acidity could lead to little to no coral growth or even coral loss, as the bigger corals become unable to support themselves and the smaller corals cannot gather enough energy to experience a net gain in diameter (Edmunds and Putnam, 2020). Subsequently, we could expect a lack of change or a negative change in coral biomass to predict the same change in coral diameter under those environmental conditions.
This analysis has limitations. Given the extreme sensitivity and variety of coral reefs all over the world, the conclusions I have drawn for coral in Moorea may not apply to coral in different parts of the ocean that thrive best under different conditions or are subject to different environmental factors. Given more time and data, I would run similar assessments on coral in different parts of the world and examine an additional variable, sunlight intensity, in order to make the results more broadly applicable to coral reefs worldwide.
The final takeaway presented by this study is that the keystone species coral can be easily influenced by the abiotic conditions of its habitat, making for all different kinds of living experiences depending on the place the sessile coral settle in. The overall status of the coral reef can be used to investigate coral on a more individual scale that tracks the biological changes of the polyps.
## References
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
Huston, M. (1985). Variation in coral growth rates with depth at Discovery Bay, Jamaica. Coral Reefs 4, 19–25. https://doi.org/10.1007/BF00302200
Fox, J. and Weisberg, S. (2019). An R Companion to Applied Regression, Third Edition. Thousand Oaks, CA: Sage. https://socialsciences.mcmaster.ca/jfox/Books/Companion/
Moorea Coral Reef LTER, Edmunds, P. and Putnam, H. (2020). MCR LTER: Coral Reef: Porites biomass data in support of Edmunds and Putnam Roy. Soc. Biology Letters 2020 ver 11. Environmental Data Initiative. https://doi.org/10.6073/pasta/643be961dc6ba5791023a0526b6ceef4 (Accessed 2020-09-10).
R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Revelle, W. (2020). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston, Illinois, USA. Version 2.0.7. https://CRAN.R-project.org/package=psych
Hothorn, T., Bretz, F. and Westfall, P. (2008). Simultaneous Inference in General Parametric Models. Biometrical Journal 50(3), 346-363.
## Appendix
```{r Coral}
#Load Coral Data
coral_data <- read.csv("C:\\Users\\akelt\\OneDrive\\Documents\\R\\coral_data.csv")
coral_data <- coral_data[,-3]
coral_data1 <- read.csv("C:\\Users\\akelt\\OneDrive\\Documents\\R\\coral_data.csv")
coral_data1 <- coral_data1[,-4]
names(coral_data)[3]<-"Coral_Diameter"
names(coral_data1)[3]<-"Coral_Diameter"
coral_data <- rbind(coral_data, coral_data1)
coral_data$Timepoint <- c(rep("Final",240), rep("Initial",240))
#Explore The Data
boxplot(Coral_Diameter~Location+Timepoint ,data=coral_data,
xlab="Location in Water", names=c("Final(Deep)", "Final(Shallow)", "Initial(Deep)", "Initial(Shallow)"),
ylab="Coral Diameter (Centimeters)",col='lightskyblue',
main="Boxplot of Growth in Coral")
hist(coral_data$Coral_Diameter,
xlab= "Change in Diameter of Coral (Centimeters)",
main= "Histogram of Coral Diameter Growth",
col="lightslateblue")
hist(coral_data$Biomass.mg.cm2,
xlab= "Change in Biomass of Reef (mg/cm squared)",
main= "Histogram of Reef Biomass Growth",
col="mediumspringgreen")
#Two Way ANOVA Test
fit1 <- aov(Coral_Diameter~Location*Timepoint, data=coral_data)
par(mfrow=c(2,2))
plot(fit1)
summary(fit1)
#Graphically Represent ANOVA Results
ggplot(coral_data, aes(x=Coral_Diameter, y=Location, color=Timepoint))+
geom_point(alpha=0.8)+
geom_boxplot(alpha=0.5)+
labs(title="ggPlot of ANOVA Test", x="Change in Coral Diameter (Centimeters)", y="Water Location")
#Linear Regression
coral.lm <- lm(Coral_Diameter ~ Timepoint+Biomass.mg.cm2, data=coral_data)
par(mfrow=c(2,2))
plot(coral.lm)
summary(coral.lm)
#Graphically Represent Linear Regression Results
ggplot(coral_data, aes(x=Coral_Diameter, y=Biomass.mg.cm2, color=Timepoint))+
geom_point()+
geom_smooth(method="lm")+
labs(title="ggPlot of Linear Regression", x="Change in Coral Diameter (Centimeters)", y="Change in Reef Biomass (mg/cm squared)")
#Cite Packages
citation("psych")
citation("car")
citation("stats")
citation("multcomp")
citation("ggplot2")
```