I am a physical oceanographer interested in how ocean water is mixed and transformed. I am currently working as Research Scientist at the Bedford Institute of Oceanography in Halifax, Nova Scotia.

Recent Posts:

A Makefile for knitr documents

One of the best things I’ve found about using R for all my scientific work is powerful and easy to use facilities for generating dynamic reports, particularly using the knitr package. The seamless integration of text, code, and the resulting figures (or tables) is a major step toward fully-reproducible research, and I’ve even found that it’s a great way of doing “exploratory” work that allows me to keep my own notes and code contained in the same document.

Being a fan of a “Makefile” approach to working with R scripts, as well as an Emacs/ESS addict, I find the easiest way to automatically run/compile my knitr latex documents is with a Makefile. Below is a template I adapted from here:

all: pdf

RFILES    := 
CACHEDIR  := cache
FIGUREDIR := figures
##### Explicit Dependencies #####
RNWTEX = $(RNWFILES:.Rnw=.tex)

# Dependencies

.PHONY: pdf tex clean 

pdf: $(MAINPDF)


	Rscript \
	  -e "library(knitr)" \
	  -e "knitr::opts_chunk[['set']](fig.path='$(FIGUREDIR)/$*-')" \
	  -e "knitr::opts_chunk[['set']](cache.path='$(CACHEDIR)/$*-')" \
	  -e "knitr::knit('$<','$@')"

	Rscript -e "Sweave('$^', driver=Rtangle())"

	R CMD BATCH "$^" "$@"

%.pdf: %.tex 
	latexmk -pdf $<

	-latexmk -c -quiet $(MAINFILE).tex
	-rm -f $(MAINTEX) $(RNWTEX)
	-rm -rf $(FIGUREDIR)
	-rm *tikzDictionary
	-rm $(MAINPDF)

Making section plots with oce and `imagep()`

section objects in the oce package are a convenient way of storing a series of CTD casts together – indeed, the object name derives from the common name for such a series of casts collected from a ship during a single campaign.

In it’s heart, a section object is really just a collection of ctd objects, with some other metadata. The CTD stations themselves are stored as a list of ctd objects in the @data slot, like:

str(section@data$station, 1)
List of 124
 $ :Formal class 'ctd' [package "oce"] with 3 slots
 $ :Formal class 'ctd' [package "oce"] with 3 slots
 $ :Formal class 'ctd' [package "oce"] with 3 slots
 $ :Formal class 'ctd' [package "oce"] with 3 slots
 $ :Formal class 'ctd' [package "oce"] with 3 slots
 $ :Formal class 'ctd' [package "oce"] with 3 slots
 $ :Formal class 'ctd' [package "oce"] with 3 slots
 $ :Formal class 'ctd' [package "oce"] with 3 slots
  [list output truncated]

Just to prove it, we can plot make a standard ctd plot of one of them, by accessing them directly with the [[ accessor syntax. Let’s plot the 100th station:

## Loading required package: testthat

plot of chunk plotctd

Making nice plots of the sections themselves

The main advantage of a section object is to be able to quickly make plots summarizing all the data in the section. This is accomplished using the plot method for section objects, which you can read about by doing ?"plot,section-method". For example, to make a contour plot of the temperature:

plot(section, which='temperature', xtype='distance')

plot of chunk temperature

Ok, cool. But what about some colors? Use the ztype='image' argument!

plot(section, which='temperature', xtype='distance',

plot of chunk temperature2

Finer control over the section plot

To get finer control over the section plot than is possible with the section plot() method, one trick I will sometimes do is extract the data I want from the section as a gridded matrix, and then plot the matrix directly using the imagep() function.

First, we “grid” the section so that all the stations comprise the same pressure levels:

s <- sectionGrid(section, p='levitus')

Now, we can loop through the station fields, extracting the data as we go.

nstation <- length(s[['station']])
p <- unique(s[['pressure']])
np <- length(p)
T <- S <- array(NA, dim=c(nstation, np))
for (i in 1:nstation) {
    T[i, ] <- s[['station']][[i]][['temperature']]
    S[i, ] <- s[['station']][[i]][['salinity']]

Basically, what we’re doing here is creating an empty matrix, then filling each row with the data from the section stations. We can make a quick plot with imagep():

distance <- unique(s[['distance']])
par(mfrow=c(2, 1))
imagep(distance, p, T, col=oceColorsTemperature, flipy=TRUE)
imagep(distance, p, S, col=oceColorsSalinity, flipy=TRUE)

plot of chunk plot1

Or we can do some fancier things, like use the colormap() function and plot some filled contours:

par(mfrow=c(2, 1))
Tcm <- colormap(T, breaks=seq(4, 24, 2), col=oceColorsTemperature)
Scm <- colormap(S, breaks=seq(34, 36.8, 0.2), col=oceColorsSalinity)
imagep(distance, p, T, colormap=Tcm, flipy=TRUE,
       ylab='p [dbar]', filledContour=TRUE,
       zlab='temperature [degC]')
imagep(distance, p, S, colormap=Scm, flipy=TRUE,
       xlab='distance [km]', ylab='p [dbar]', filledContour=TRUE,

plot of chunk plot2

Using the oce colormap function in R

When I talk to fellow colleagues about why I use R as my language of choice for scientific data analysis, I typically point out all the advantages, and because I’m honest, the disadvantages.

Typically the biggest disadvantage, especially for those coming from the java-GUI world of Matlab, is the non-interactive graphics. Now, I’ve managed to convince myself that I actually prefer making plots this way (because it forces me to script rather than noodling around with a mouse, the final plot is predictable, etc), but there are always a few things that I wish were easier.

One of those is handling colors in “image” plots and in scatter plots. The former is usually handled pretty easily using the oce function imagep(..., col=oceColorsJet), but the latter tends to be trickier. There is no base R functionality for automatically coloring points by some other attribute. I believe this is relatively easy to do with ggplot2, but that of course requires using ggplot2 (nothing against ggplot2, it just really isn’t an option for me – perhaps the subject of a future blog post).

the colormap() function

With that in mind, Dan and I set out to create a function that could be used to make an explicit “map” between colors and values to facilitate making plots, but also to ensure that the results of the plot are correct. The concept of a “colormap”, as implemented in Matlab, where the information connecting colors to values is inherent in the plot attributes, doesn’t exist in R. One can plot any colors one would like without thinking twice about whether they mean anything. On the one hand, this can be an advantage because it makes it easier to have multiple colormaps in a single figure. The downside is that using colors to represent numerical values requires some care.

The basic idea of colormap() is that it creates an object that connects a series of colors with values, which can be passed to various plotting functions to ensure that the color-mapping is done correctly. Probably the best way to illustrate the various options is through some examples. In most cases the colormap is communicated through the use of a “palette”, which is either drawn implicitly by the plotting function, or through an explicit call to oceDrawPalette().

imagep() plots

The imagep() function is a tweaked and customizable version of the base image() function. It is used for making pseudo-color maps of matrix-style data. A nice example comes from the included argo dataset:


## remove bad data, and grid to regular pressure levels
argo <- argoGrid(handleFlags(argo))

## two-panel plot of T and S, using the cmocean colors
par(mfrow=c(2, 1))
Tcm <- colormap(argo[['temperature']], col=oceColorsTemperature)
imagep(argo[['time']], argo[['pressure']][,1], t(argo[['temperature']]),
       ylab='pressure [dbar]',
       colormap=Tcm, flipy=TRUE, main='Temperature')
Scm <- colormap(argo[['salinity']], col=oceColorsSalinity)
imagep(argo[['time']], argo[['pressure']][,1], t(argo[['salinity']]),
       ylab='pressure [dbar]',
       colormap=Scm, flipy=TRUE, main='Salinity')

plot of chunk argo

Pretty easy. But using colors with imagep() is pretty easy anyway, since the colormap is defined based on the input data and automatically scaled to match the palette.

What if we wanted to add points showing the temperature values at certain depths to the salinity plot? In Matlab, combining the two colormaps is nigh impossible. Using oce, all we do is reference the Tcm object when we set the colors of the points, specifically the $zcol element within it – which contains a colormapped color for every element in the original data used to create Tcm:

## random points to plot
II <- sample(seq_along(argo[['temperature']]), 100)
t <- matrix(rep(argo[['time']], dim(argo[['pressure']])[1]),
            nrow=dim(argo[['pressure']])[1], byrow=TRUE)[II]
p <- argo[['pressure']][II]
T <- argo[['temperature']][II]

par(mfrow=c(2, 1))
Tcm <- colormap(argo[['temperature']], col=oceColorsTemperature)
imagep(argo[['time']], argo[['pressure']][,1], t(argo[['temperature']]),
       ylab='pressure [dbar]',
       colormap=Tcm, flipy=TRUE, main='Temperature')
points(t, p, pch=22, bg=Tcm$zcol[II], cex=0.75)
Scm <- colormap(argo[['salinity']], col=oceColorsSalinity)
imagep(argo[['time']], argo[['pressure']][,1], t(argo[['salinity']]),
       ylab='pressure [dbar]',
       colormap=Scm, flipy=TRUE, main='Salinity with discrete temperature measurements')

## Add the temperature colormapped points overtop
points(t, p, pch=22, bg=Tcm$zcol[II], cex=0.75)

plot of chunk argo-points

Plotting a colored scatterplot, with a palette

The example above introduced how to use the $zcol return value of the colormap object to color the plotted points according to the desired colormap. Here I’ll explore that a bit further, highlighting how to use it with a basic plot, but with a palette on the side.

x <- seq(0, 6*pi, 0.1)
y <- sin(x)

## create a colormap for cos(x)
cm <- colormap(cos(x), col=oceColorsPalette)

drawPalette(colormap=cm, zlab=expression(cos(x)))
plot(x, y, col=cm$zcol, pch=19)

plot of chunk sine

Using named GMT-style palettes

In creating colormap(), Dan and I were impressed with the color palettes available in the Generic Mapping Tools (GMT) software, and decided to implement a similar approach to defining custom colormaps. In addition, colormap() includes a number of “named” GMT palettes (see ?colormap), several of which are quite handy for plotting topography.

par(mar=c(0.5, 0.5, 0.5, 0.5))

## Use the gmt_relief palette
cm <- colormap(name='gmt_relief')

mapImage(topoWorld, colormap=cm)

plot of chunk gmt


The colormap() function is pretty powerful, and as a result somewhat complex to use. I hope the above examples have helped shed some light on how to use oce to map colors to values consistently and reliably in plots.

Calculating buoyancy frequency for argo/section objects using the apply() family

The most recent CRAN release of oce includes some nice new functionality for reading and converting argo objects (see http://www.argo.ucsd.edu/ for more information about the fantastic Argo float program). One question that arose out of this increased functionality was how to calculate (also known as the buoyancy or Brunt-Väisälä frequency) for such objects.

Buoyancy frequency

The definition of is:

where is the acceleration due to gravity, is the fluid density, and is the vertical coordinate. Essentially describes the vertical variation of fluid density (also known as “stratification”).

Calculating for regular ctd objects is easily accomplished with the function oce::swN2(). A caution: readers are encouraged to read the documentation carefully, as the details of the actual calculation can have important consequences when applied to real ocean data.

for station objects

For the case of a station object (which is essentially a collection of ctd stations), the most straightforward way to calculate is to use the lapply() function to “apply” the swN2() function to each of the stations in the object. An example:

section[['station']] <- lapply(section[['station']],
                               function(x) oceSetData(x, 'bvf', swN2(x)))

The line with the lapply() command takes the list of stations from the section object, and evaluates each of the resulting ctd objects using the oceSetData() function to add the result of swN2() back into the station @data slot.

If we wanted to make a nice plot of the result, we could do:

col <- colorRampPalette(c('white', rev(oceColorsViridis(32))))
plot(section, which='bvf', ztype='image', zbreaks=seq(0, 1e-4, 0.5e-5), zcol=col)

plot of chunk stationplot where I’ve defined a custom colormap just for the fun of it.

for argo objects

In an argo object, the default storage for the profiles is a matrix, rather than a list of ctd objects. To calculate and make a plot, the simplest approach would be to use as.section() to convert the argo object to a section class object and then do as above. However, having the field as a matrix allows for greater flexibility in plotting, e.g. using the imagep() function, so one might want to calculated in a manner consistent with the default argo storage format.

Let’s load some example data from the argo dataset included in oce:

argo <- argoGrid(argo)

Note that I’ve gridded the argo fields so the matrices are at consistent pressure levels. Now we create a function that can be applied to each of the matrix columns, to calculate from a single column of the density matrix:

N2 <- function(x) {
    swN2(argo[['pressure']][,1], x)

Now we use the above function N2 to calculate buoyancy frequency and add it back to the original object, like:

argo <- oceSetData(argo, 'N2', apply(swSigmaTheta(argo), 2, N2) )

Note that because of the difference between the “list” and “matrix” approach, the oceSetData() occurs outside of the apply(). Also note the second argument in the apply() call, which specifies to apply the N2() function along the 2nd dimension of the density matrix, i.e. along columns.

Now, lets make a sweet plot of the N2 field using imagep()!

p <- argo[['pressure']][,1]
t <- argo[['time']]
imagep(t, p, t(argo[['N2']]), flipy=TRUE, col=col, breaks=seq(0, 1e-4, 1e-6),
       ylab='pressure [dbar]',
       zlab=expression(N^2~group('[', 1/s^2, ']')), zlabPosition = 'side',
       mar=c(3, 3, 2, 0.5))

plot of chunk argoplot

A thing of stratified beauty, if I do say so myself.

R tutorial at the 2016 AGU Ocean Sciences Meeting

Today I had the pleasure of presenting a talk about R during one of the tutorial sessions at the AGU Ocean Sciences Meeting in New Orleans. I made a deliberate point of saying that my main message was more like: “R is really cool, and here’s why”, rather than: “You should all stop using Matlab”. Being divisive doesn’t help anybody work better.

I was quite surprised at the number of people who raised their hand when I asked if they use R as their main analysis tool – perhaps about a third of the (surprisingly full) room!

Anyway, see below for the pdf of my talk and the Rnw source file. Note that the Rnw has links to some external images so you won’t be able to knitr() it for yourself, but feel free to use the slide content.