### Introduction

Theory informs us that tides are complex, influenced by geography of the undersea surface at the measuring station, gravitational forces of the Sun and Moon, and the oceanic basin a station is located in.^{[1], [2], [3]} This analysis attempted to determine if there is a season in which the *tidal range* for Northern Hemisphere observing stations is greatest. The measure *tidal range* was defined as *the difference in height between consecutive high and low waters*.^{[4]} The questions I sought to answer:

- Is the *tidal range* in the Northern Hemisphere greatest during a solstice or an equinox?^{[5]} If so, which one?
- If a difference was detected, does the season-to-season change fit a linear or non-linear pattern?

Theory says that the distance of a station from the Equator is just one factor in predicting a tide, and hence the *tidal range*. Comparing stations to each other would therefore have required a sufficiently large and random set of stations, and such a comparison was beyond the scope of this analysis. Instead, I aggregated water level observations to form a mean for a group of stations situated at a similar latitude, or region, and analyzed that mean across seasons.

It might not actually be seasons that explain variation in *tidal range* but the degrees of tilt of the Earth's axis. I used seasons here as a proxy for the degrees of tilt.

### Materials and methods

I performed secondary data analysis on water level data collected and curated by the National Oceanic and Atmospheric Administration (NOAA).^{[6]} Because tide data is well kept, I did not expect variance in the data that was due to collection or curation methodologies. The datum I used was *MLLW* (mean lower low water), taken at six-minute intervals.

Data for 14 days before and 14 days after each solstice/equinox was obtained. Each of the three data sets contained 29 days of data for each of the approximately 15 stations. This number of days, 29, was used to approximately match one complete 28-day lunar cycle for each season.
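
The 29-day window can be sketched in R as follows. The event date is a hypothetical placeholder, not a date taken from the data, and the observation count assumes that no six-minute readings are missing.

```r
# Hypothetical event date; substitute the actual solstice or equinox date.
eventDate <- as.Date("2013-03-20")

# 14 days before, the event day itself, and 14 days after.
window <- seq(eventDate - 14, eventDate + 14, by = "day")
nDays <- length(window)

# Expected six-minute observations per station for the window,
# assuming no gaps in the record.
obsPerStation <- nDays * 24 * 60 / 6

range(window)
nDays
obsPerStation
```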

The geodata for each station was converted from degrees, minutes, and seconds (DMS) to decimal using a freely available service.^{[7]} Conversion of this data is not algorithmically difficult and I assumed no variance was introduced into the data as a result of errors in the conversion.
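
For reference, the conversion itself is a short calculation. The sketch below is not the service used in the analysis; the function name `dmsToDecimal` and the sample coordinates are illustrative assumptions.

```r
# Convert degrees, minutes, and seconds to decimal degrees.
# Southern and western hemispheres are returned as negative values.
dmsToDecimal <- function(degrees, minutes, seconds, hemisphere = "N") {
  decimal <- degrees + minutes / 60 + seconds / 3600
  ifelse(hemisphere %in% c("S", "W"), -decimal, decimal)
}

# Illustrative coordinates only, not taken from the station list.
lat <- dmsToDecimal(34, 28, 8.4, "N")
lon <- dmsToDecimal(120, 40, 30, "W")
c(latitude = lat, longitude = lon)
```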

The map figure shows the locations and names of the stations selected for use in the analysis. The meaningful classification was by latitude. No regard was made to longitude; the hemispheric division in the figure was an artifact of the R technique used for coding locations.

My null hypothesis was that the *tidal range* for a region was the same during each of the four seasons. Since I did not have an expectation that *tidal range* varies from season to season, this was the appropriate statement. I formed a null hypothesis and then tested the data. At the end of testing, I was able to show whether or not the station data provided evidence against my null hypothesis.

The statement of null hypothesis and testing procedure was as follows:

- Fit the data to a linear model and perform a one-way ANOVA to obtain test statistics
- If significance was determined, proceed to identify the groups that are different and attempt to explain why they are different

The values used in each test were the *mean of the difference between respective high and low tides for one 29-day period for one region for one season*. The *season* was the factor (categorical predictor) that was used for the ANOVA.
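
The testing procedure for a single region can be sketched as below. The data frame is simulated stand-in data, and the column names `station`, `season`, and `range` are assumptions made for illustration; they are not the names used in the original analysis.

```r
set.seed(42)

# Simulated stand-in data: one mean tidal range per station per season.
ranges <- data.frame(
  station = rep(paste0("station", 1:5), times = 3),
  season  = factor(rep(c("winter solstice", "vernal equinox", "summer solstice"),
                       each = 5)),
  range   = rnorm(15, mean = 1.5, sd = 0.3)
)

# Fit a linear model with season as the only factor, then run the ANOVA.
fit <- lm(range ~ season, data = ranges)
result <- anova(fit)
result

# Extract the test statistics reported per region.
fValue <- result["season", "F value"]
pValue <- result["season", "Pr(>F)"]
```

With three seasons and 15 records, the table has 2 degrees of freedom for `season` and 12 for the residuals.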

An assumption I made was that water levels for the autumnal equinox and vernal equinox are the same. As a result, I acquired and used data only for the vernal equinox.

Station locations were carefully selected. Where I had personal knowledge of the location, I made a judgement on whether or not to include the station. Where I did not have personal experience with the location of a station, I instead relied on Google Maps to identify physical features that might introduce variance to observations. For example, stations near the Columbia Bar, near the entrance to San Francisco Bay, or at the head of a fjord in Alaska, were disregarded as candidate sites. The ideal site was one that had unimpeded ocean water flow. For this reason, the Harvest Oil Platform situated off the coast of Lompoc, CA, was considered ideal.

Counting a pairing of tides means that I started with the first high tide in the data for the station. It was possible that the acquired data did not include the same count of high and low tides. I took this into account by avoiding pairings of tides and instead calculating the range as the difference between the mean high water and the mean low water. (See the *Appendix* for acknowledgement of an error in this approach.)
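
A minimal sketch of that calculation, assuming the high and low waters are already available as two numeric vectors. The function name `tidalRange` and the sample values are hypothetical.

```r
# Tidal range as mean high water minus mean low water.
# The two vectors may have different lengths, so no pairing is needed.
tidalRange <- function(highWater, lowWater) {
  mean(highWater, na.rm = TRUE) - mean(lowWater, na.rm = TRUE)
}

# Hypothetical water levels relative to MLLW: four highs, three lows.
highs <- c(1.62, 1.48, 1.71, 1.55)
lows  <- c(0.21, 0.35, 0.18)
stationRange <- tidalRange(highs, lows)
stationRange
```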

### Results

I began the review of my results by examining the data. The data shown in this table was aggregated from approximately 105,000 observations. The size of the resulting data set used in the linear models was only 39 records, 15 records for the low- and mid-latitude regions and 13 for the high latitude region. This table of descriptive statistics shows the data ordered by region and season:

To aid the reader in gaining some familiarity with the data, the columns can be interactively sorted by mouse clicks on the column header.

Interesting patterns are revealed when the data are plotted in a boxplot of season by region. I can see quite clearly that in this data set, the median tidal range increases as regions move from a low latitude to a high latitude. The interquartile range (IQR) of the high latitude tidal range was far wider than the range of either of the other two regions. Very clearly, there was far more variation in the water level readings obtained for the stations in Alaska. I discuss these characteristics of this data set further in the *Discussion* section.

The test statistics from my ANOVA are summarized as follows:

- Region *low latitude*: F = 0.1204, with a p-value = 0.89.
- Region *mid latitude*: F = 0.6631, with a p-value = 0.53.
- Region *high latitude*: F = 0.0058, with a p-value = 0.99.

With one exception, the diagnostic plots (residuals against fitted values, and a normal Q-Q plot of residuals) appear consistent with the assumptions of normality and similar variance across seasons. The exception was the high latitude region, which contains data from stations in Alaska.
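
Diagnostics of this kind can be produced as sketched below, again on simulated stand-in data with assumed column names. The Shapiro-Wilk and Bartlett tests at the end are optional numeric complements to the plots; they were not part of the original analysis.

```r
set.seed(1)

# Simulated stand-in data for one region.
ranges <- data.frame(
  season = factor(rep(c("winter solstice", "vernal equinox", "summer solstice"),
                      each = 5)),
  range  = rnorm(15, mean = 1.5, sd = 0.3)
)
fit <- lm(range ~ season, data = ranges)
res <- residuals(fit)

# Residuals against fitted values: look for similar spread in each group.
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot of residuals: points near the line suggest normality.
qqnorm(res)
qqline(res)

# Optional numeric checks (an addition, not used in the report).
normalityCheck <- shapiro.test(res)
varianceCheck <- bartlett.test(range ~ season, data = ranges)
```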

It's possible that the non-normal distribution for the high region was explained in part by the stations I selected. The stations include Prudhoe Bay and Nome, AK, sites that are known to have frozen ocean water at the site for several months of the year. I am not educated on the impact sea ice has on the stations being able to record the water level. Further, the three remaining sites for the high region are located along the eastern shore of the Gulf of Alaska, an area known to have a large tidal range.

Summarizing my results, the p-values for each of the three regions are very high, meaning that differences in tidal range as large as those observed would not be unusual if the season had no effect. I therefore **fail to reject** the null hypothesis and conclude that there was not enough evidence to suggest a relationship between season and tidal range.

### Discussion

In looking closely at the number of observations, column `n`, in the table of data at the beginning of the *Results* section, there are six stations that have a lower than expected number of tidal range pairings. I discuss this in the *Appendix*.

Notwithstanding, the approach was valid and the results are sufficiently lacking any evidence for an effect; I am not able to draw a conclusion that there was an effect on tidal range due to season.

The boxplot introduced in the *Results* section above shows an apparent pattern between latitude and tidal range. The pattern is shown here:

Visually inspecting this chart provides further confidence in my results that the season does not explain the variance in tidal range. But, the primary reason for sharing this chart is that it appears that there is a pattern of the tidal range increasing as the distance from the Equator increases, i.e., distance from the Equator would explain a significant portion of tidal range. However, I advise against studying this line of reasoning as there is not necessarily a correlation between latitude and tidal range. In the data selected for the high latitude region here, I know that three of the stations have a particularly large tidal range compared to the readings obtained at the average station.

An interesting next step in investigating this data would be to analyze the distribution of the times between tidal events. I know that natural phenomena occur according to distributions that are understood and can be modelled. I would like to analyze this data set to see if the timing of tidal events follows such a distribution.

The ANOVA tables should be included in the next round of edits to this report.

The two factors could have been handled with a single ANOVA that includes a *region* x *season* interaction, rather than running a one-way ANOVA three times.
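
A sketch of that alternative, using simulated stand-in data and assumed column names (`station`, `season`, `region`, `range`):

```r
set.seed(7)

# Simulated stand-in data: five stations per region, three seasons each.
design <- expand.grid(
  station = 1:5,
  season  = c("winter solstice", "vernal equinox", "summer solstice"),
  region  = c("low", "mid", "high")
)
design$range <- rnorm(nrow(design), mean = 1.5, sd = 0.3)

# region * season expands to region + season + region:season.
fit2 <- lm(range ~ region * season, data = design)
tab <- anova(fit2)
tab
```

The `region:season` row tests whether the seasonal effect differs by region, which three separate one-way ANOVAs cannot show.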

### Sources and references

- Laplace theory of tides: http://en.wikipedia.org/wiki/Theory_of_tides
- Tidal nodes: http://en.wikipedia.org/wiki/Amphidromic
- Discussion of earth tides: http://en.wikipedia.org/wiki/Earth_tide
- See *range of tide*, National Oceanic and Atmospheric Administration (NOAA) glossary: http://tidesandcurrents.noaa.gov/glossary.html#R
- Discussion on tidal datum: http://tidesandcurrents.noaa.gov/publications/tidal_datums_and_their_applications.pdf
- Source data, retrieved by station location: http://tidesandcurrents.noaa.gov/
- Conversion of DMS to decimal for station locations: http://andrew.hedges.name/experiments/convert_lat_long/

### Appendix

The custom code written for this analysis is included below. As discussed in *Conclusions*, further work is necessary to control for fluctuations in water level near maxima and minima.

The approach taken in this analysis was to determine the local maxima and minima by comparing the mean of *x* number of future data points in the series to the current observation, and then skipping *y* number of observations to avoid falsely identifying the next maximum or minimum. The ideal values for the two parameters *x* and *y* need to be further evaluated, and perhaps region needs to be taken into account.
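
The description above can be sketched as follows. This is an illustration of the approach as described, not the code used in the analysis; the function name `findTides`, the default values of `x` and `y`, and the synthetic sine-wave series are all assumptions.

```r
# Find indices of high and low tides in a numeric series of water levels.
# x: number of future observations averaged and compared to the current one.
# y: number of observations skipped after each detected extremum.
findTides <- function(level, x = 5, y = 20) {
  n <- length(level)
  highs <- integer(0)
  lows <- integer(0)
  k <- 1
  # Look for a high tide first if the series starts out rising.
  lookForHigh <- mean(level[2:(1 + x)]) > level[1]
  while (k + x <= n) {
    ahead <- mean(level[(k + 1):(k + x)])
    if (lookForHigh && ahead < level[k]) {
      highs <- c(highs, k)   # water ahead is lower: treat k as a high tide
      lookForHigh <- FALSE
      k <- k + y
    } else if (!lookForHigh && ahead > level[k]) {
      lows <- c(lows, k)     # water ahead is higher: treat k as a low tide
      lookForHigh <- TRUE
      k <- k + y
    } else {
      k <- k + 1
    }
  }
  list(highs = highs, lows = lows)
}

# Synthetic series: five cycles of 124 six-minute samples (12.4 hours each).
level <- sin(2 * pi * (0:619) / 124)
tides <- findTides(level)
tides
```

On noisy data, a larger `x` smooths over small fluctuations at the cost of detecting each extremum slightly early or late.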

```
# Build a vector of high water levels and a vector of low water levels. One value
# in each vector represents a high or low tide, respectively.
# Parameters: Vector of water levels.
# Preconditions: Vector passed in is numeric with no NA values. No validation is performed.
# Post conditions: None.
# Returns: List with equal-length vectors highWater and lowWater.
buildWaterLevels <- function(waterLevel) {
  n <- length(waterLevel)
  lowWater <- rep(0, 60)
  highWater <- rep(0, 60)
  i <- 1 # index for high water
  j <- 1 # index for low water
  k <- firstLowTide(head(waterLevel, 500)) # index for water level
  lowWater[j] <- waterLevel[k]
  j <- j + 1
  lookForHighTide <- TRUE # high tide is TRUE, low tide FALSE
  # Stop before waterLevel[k + 1] runs past the end of the data.
  while (k < n) {
    if (lookForHighTide) {
      if (waterLevel[k + 1] >= waterLevel[k]) {
        k <- k + 20 # hack to move past fluctuations that might result in false high/low tide
        if (k < n && waterLevel[k + 1] < waterLevel[k]) {
          highWater[i] <- waterLevel[k]
          i <- i + 1
          lookForHighTide <- FALSE
        }
      } else {
        k <- k + 1 # guard: always advance so the loop cannot stall
      }
    } else {
      if (waterLevel[k + 1] <= waterLevel[k]) {
        k <- k + 20
        if (k < n && waterLevel[k + 1] > waterLevel[k]) {
          lowWater[j] <- waterLevel[k]
          j <- j + 1
          lookForHighTide <- TRUE
        }
      } else {
        k <- k + 1 # guard: always advance so the loop cannot stall
      }
    }
  }
  # Keep only as many values as there are complete high/low pairs.
  pairs <- min(i, j) - 1
  highWater <- highWater[seq_len(pairs)]
  lowWater <- lowWater[seq_len(pairs)]
  return(list(highWater = highWater, lowWater = lowWater))
}

# Identify the first low tide in a set of water level data.
# Parameters: Vector of water levels.
# Preconditions: Vector passed in is numeric with no NA values. No validation is performed.
# Post conditions: None.
# Returns: Index value of low tide.
firstLowTide <- function(waterLevel) {
  k <- 1
  # Walk forward while the water is still falling, without reading past the end.
  while (k < length(waterLevel) && waterLevel[k + 1] <= waterLevel[k]) {
    k <- k + 1
  }
  return(k)
}
```

All the code will eventually be made freely available in a repo on GitHub under user sculpturearts.