I was checking out Quandl and some of the datasets with commodity and oil price information. I was curious about price changes in the price of oil between markets and just how highly correlated price changes between markets are. Stock prices of oil companies correlate pretty well sometimes, but what about the actual commodity in different world markets?

I didn’t find the dataset I wanted on Quandl but did find a lead and went out to the British Petroleum site. A service that BP provides is an annual, statistical review of world energy production and consumption. The most recent reports is titled BP Statistical Review of World Energry Full Report 2012 and is available in an Excel workbook format from the BP site. The workbook includes a worksheet with historical, spot crude prices. On the worksheet and used in my data are the columns. The first column is the year. The second through fourth columns are the spot oil crude prices in U.S. Dollars per barrel ($/bbl) for each of three benchmark crude markets, Dubai, Brent, and West Texas Intermediate (WTI), the fifth column contains Nigerian Forcados crude, also in $/bbl. I renamed the columns in the R data frame to make the variables easier to refer to.

The measure I was going for was the Pearson correlation coefficient:

.

The coefficient is within a range . The value -1 represents a perfect, negative, linear correlation, 0 means there is no linear correlation, and +1 is a perfect, linear correlation.

I used R for the analysis. Included below are portions of the data obtained from the R `head()`

and `tail()`

functions:

> head(oil.prices) Year Dubai Brent Nigeria WTI 1 1972 1.900000 NA NA NA 2 1973 2.833333 NA NA NA 3 1974 10.412500 NA NA NA 4 1975 10.702500 NA NA NA 5 1976 11.625000 12.80 12.87 12.23 6 1977 12.375000 13.92 14.21 14.22 > tail(oil.prices) Year Dubai Brent Nigeria WTI 31 2006 61.50 65.14 67.07 66.02 32 2007 68.19 72.39 74.48 72.20 33 2008 94.34 97.26 101.43 100.06 34 2009 61.39 61.67 63.35 61.92 35 2010 78.06 79.50 81.05 79.45 36 2011 106.18 111.26 113.65 95.04

Each of the dot (.) connected variables below represents the Pearson correlation coefficient between the two, respective markets. For example, e.g., `brent.dubai`

represents the correlation between the Brent and Dubai benchmarks.

The default coefficient for computing correlation and covariance in the R `cor()`

and `cov()`

functions is the Pearson’s *r*. In my calls to the <code&>cor() below, I am using the Pearson method. Because there is missing data in the first four rows; I specify that I want to use complete observations only. Calling the `cor()`

with this avoids the missing data and ensures the function doesn’t return *NA*.

> brent.dubai <- cor(brent, dubai, use = "complete.obs"); print(brent.dubai) [1] 0.9988885 > brent.wti <- cor(brent, wti, use = "complete.obs"); print(brent.wti) [1] 0.9925021 > dubai.wti <- cor(dubai, wti, use = "complete.obs"); print(dubai.wti) [1] 0.9915567

Interpreting these results then, for each bivariate relationship, there is a *very strong relationship. *

Here is the visual examination of the data:

There is nearly a perfect, linear correlation in the prices of benchmark crude oil spot prices for the years 1976 through 2011. This does not match my intuition. My estimate prior to this analysis was the correlation was in the range of the lower- to upper-90s, with variations explained by the demand for the different qualities of crude and the geopolitical implications of taking delivery in the Middle East and the Gulf of Guinea. After this analysis, I’m changing my view to accept that liquidity of contracts and price transparency take precedence over quality and geopolitics in understanding the correlation of spot prices for crude oil.