I am a physical oceanographer interested in how ocean water is mixed and transformed. I am currently working as Research Scientist at the Bedford Institute of Oceanography in Halifax, Nova Scotia.

Recent Posts:

Adding NOAA bottom profile to section plots

I use the section-class plotting method in the oce package a lot. It’s one of the examples I really like showing to new oceanographic users of R and oce, to see the power in making quick plots from potentially very complicated data sets. A canonical example is to use the built-in data(section) dataset:

plot(section, which='temperature')

plot of chunk example

Note the grey bottom profile that is automatically overlaid on the plot – the values for those points come from the individual stations in the section object, from the waterDepth metadata item in each of the stations in the section. The values can be extracted to a vector with our trusty friend lapply1:

depth <- unlist(lapply(section[['station']], function(x) x[['waterDepth']]))
distance <- unique(section[['distance']])
plot(distance, depth, type='l')

plot of chunk depth

However, many CTD datasets don’t automatically include the water depth at the station, and even if they do the large spacing between stations may make the bottom look clunky.

Using the marmap package to add a high res bottom profile

To add a nicer looking profile to the bottom, we can take advantage of the awesome marmap package, which can download bathymetric data from NOAA.

To add a nice looking bottom profile to our section plot, we can use the getNOAA.bathy() and get.depth() functions. Note the resolution=1 argument, which downloads the highest resolution data available from NOAA (1 minute resolution), and the keep=TRUE argument, which saves a local copy of the data to prevent re-downloading every time the script is re-run (note that at 1 minute resolution the csv file obtained below is 29 MB):

lon <- section[["longitude", "byStation"]]
lat <- section[["latitude", "byStation"]]
lon1 <- min(lon) - 0.5
lon2 <- max(lon) + 0.5
lat1 <- min(lat) - 0.5
lat2 <- max(lat) + 0.5

## get the bathy matrix -- 29 MB file
b <- getNOAA.bathy(lon1, lon2, lat1, lat2, resolution=1, keep=TRUE)
## File already exists ; loading 'marmap_coord_-74.1727;35.703;-8.0263;38.7373_res_1.csv'
plot(section, which="temperature", showBottom=FALSE)
loni <- seq(min(lon), max(lon), 0.1)
lati <- approx(lon, lat, loni, rule=2)$y
dist <- rev(geodDist(loni, lati, alongPath=TRUE))
bottom <- get.depth(b, x=loni, y=lati, locator=FALSE)
polygon(c(dist, min(dist), max(dist)), c(-bottom$depth, 10000, 10000), col='grey')

plot of chunk marmap-example


Downloading and plotting MODIS Chlorophyll-a data

I was recently asked for help with a project that involves correlating occurrences of marine animals found at the surface with satellite measurements of surface chlorophyll. Being a physical oceanographer, I’m not too familiar with the details of such data sets (though I did previously help someone read in MODIS netcdf with the oce package), but saw it as a nice opportunity to learn a bit about a different data set, but also to gain some new data processing and plotting skills.

MODIS Chla data

The MODIS chlorophyll data are provided by NASA through the OceanColor WEB site, which provides various manual ways of downloading binary files (e.g. hdf and netCDF) files. For the present application, which potentially required approximately 400 or so images, this wasn’t a very appealing option.

A quick google search turned up two very relevant (and fantastic looking!) packages, the spnc package and the obpgcrawler package (both authored by Ben Tupper from the Bigelow Laboratory). spnc provides some simplified methods for dealing with “spatial” datasets and netCDF files, and the obpgcrawler provides an interface for programmatically downloading various datasets from the NASA Ocean Biology Processing Group (including MODIS!).

Installing spnc and obpgcrawler

As the packages are not on CRAN (yet?), they have to be installed using the devtools package (and of course all its dependencies, etc). The obpgcrawler package also depends on another non-CRAN package, called threddscrawler which must be installed first. Note that due to an issue with OpenDAP support for the ncdf4 package on windows, the below only works on Linux or OSX.

To install the packages, do:

# if you don't have threddscrawler installed

There are some great examples provided on the Github pages, from which I built on to accomplish what I needed. The below example is pulled straight from the obpgcrawler page, to download a subset of the most recent MODIS data and plot it as a “raster” image (more on that later).

## Loading required package: sp
query <- obpg_query(top = 'https://oceandata.sci.gsfc.nasa.gov/opendap/catalog.xml',
   platform = 'MODISA', 
   product = 'L3SMI',
   what = 'most_recent',
   greplargs = list(pattern='8D_CHL_chlor_a_4km', fixed = TRUE))
q <- query[[1]]
chl <- SPNC(q$url)
bb <- c(xmin = -77, xmax = -63, ymin = 35, ymax = 46)
r <- chl$get_raster(what = 'chlor_a', bb = bb)
spplot(log10(r), main=paste('MODIS Chla for', format(chl$TIME, '%Y-%d-%m')))

plot of chunk example

The animal data

The animal data consists of a data frame containing: a date of observation, a longitude, and a latitude. To mimic the data set, I’ll just create a single random point and time in the North Atlantic:

library(latticeExtra) # for the `layer()` function
## Loading required package: lattice
## Loading required package: RColorBrewer
date <- as.Date('2017-02-25')
lat <- 43.783179
lon <- -62.860410
query <- obpg_query(top = 'http://oceandata.sci.gsfc.nasa.gov/opendap/catalog.xml',
                    platform = 'MODISA', 
                    product = 'L3SMI',
                    what = 'within',
                    greplargs = list(pattern='8D_CHL_chlor_a_4km', fixed = TRUE),
                    date_filter=c(date-3, date+4) ## find nearest image within one week
q <- query[[1]]
bb <- c(lon-10, lon+10, lat-10, lat+10) # define a 10x10 degree box around the point
chl <- SPNC(q$url, bb=bb)
r <- chl$get_raster(what='chlor_a')
p <- spplot(log10(r), main=paste0('Image=', format(chl$TIME)),
            scales=list(draw=TRUE), auto.key=list(title="log10[chl]"))
p <- p + layer(panel.points(lon, lat, pch=19, col=1, cex=2))

plot of chunk animal-point

Now, we can extract a chlorophyll value from the location, using the extract() function:

## Note the `radius` argument, which takes a radius in meters
surface_chl <- mean(extract(r, cbind(lon, lat), buffer=50000)[[1]], na.rm=TRUE)
print(paste0('The surface chlorophyll at ', lon, ', ', lat, ' is: ',
             format(surface_chl, digits=3), ' mg/m^-3'))
## [1] "The surface chlorophyll at -62.86041, 43.783179 is: 1.09 mg/m^-3"

Things to figure out (sp plots, rasters, projections, etc)

The world of “spatial” objects (e.g. through the sp package), and things that derive from them, is a new one for me. For example, in the oce package, we have developed methods for plotting matrices (e.g. imagep()) and geographical data (e.g. mapPlot(), mapImage(), etc) that differ from the GIS-like approach contained in the world of spatial analyses in R. I have long desired to learn more about this “other” world, and so have taken this opportunity with MODIS data to do so.

Projected rasters and lon/lat labels

The neat thing about raster objects is that they contain the coordinate projection information. For example, the coordinate system for the MODIS data that we downloaded can be seen with:

## [1] "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"

For those used to doing projected maps in oce, this string should be familiar as a proj4 string, which specifies that the coordinate system is simply “longlat” (i.e. not projected). To change the projection of the raster, we can use the projectRaster() function to update it to a new reference, e.g. polar stereographic centred on -20 degrees W:

rp <- projectRaster(r, crs='+proj=sterea +lon_0=-20')

Now, if we use spplot() again, we get a raster that is plotted in a projected coordinate:


plot of chunk plot-projected

There are some things I haven’t figured out yet, particularly how to plot nice graticules (e.g. grid lines for constant latitude and longitude), since if the above is plotted as before with the scales= argument, what is plotted are the projected coordinates:

spplot(log10(rp), scales=list(draw=TRUE))

plot of chunk scales

It looks like the graticules package will be helpful for this, but it still doesn’t appear to be non-trivial. See also here for some other good-looking examples.

Extract a matrix from raster to use imagep()

One solution (at least for making maps), would be to extract the matrix data from the raster along with the longitude and latitude vectors. This would then allow for plotting in a projection using mapImage() from the oce package as I’m used to. Let’s try and pull stuff out of the raster object r:

lon <- unique(coordinates(r)[,1])
lat <- unique(coordinates(r)[,2])
chl_mat <- as.matrix(r)
## [1] 231361      1

Note that using as.matrix() on a raster should extract the matrix values of the object, however in this case it extracts a vector with all the values. So, we need to reshape it:

chl_mat <- array(as.matrix(r)[,1], dim=c(length(lon), length(lat)))
imagep(lon, lat, log10(chl_mat), col=oceColorsChlorophyll)

plot of chunk reshape Ok! Now we’re getting somewhere! Let’s try a projected version:

## Loading required package: testthat
mapPlot(coastlineWorldFine, projection=projection(rp),
        longitudelim=range(lon), latitudelim=range(lat))
mapImage(lon, lat, log10(chl_mat), col=oceColorsChlorophyll)

plot of chunk mapPlot Awesome!

A Makefile for knitr documents

One of the best things I’ve found about using R for all my scientific work is powerful and easy to use facilities for generating dynamic reports, particularly using the knitr package. The seamless integration of text, code, and the resulting figures (or tables) is a major step toward fully-reproducible research, and I’ve even found that it’s a great way of doing “exploratory” work that allows me to keep my own notes and code contained in the same document.

Being a fan of a “Makefile” approach to working with R scripts, as well as an Emacs/ESS addict, I find the easiest way to automatically run/compile my knitr latex documents is with a Makefile. Below is a template I adapted from here:

all: pdf

RFILES    := 
CACHEDIR  := cache
FIGUREDIR := figures
##### Explicit Dependencies #####
RNWTEX = $(RNWFILES:.Rnw=.tex)

# Dependencies

.PHONY: pdf tex clean 

pdf: $(MAINPDF)


	Rscript \
	  -e "library(knitr)" \
	  -e "knitr::opts_chunk[['set']](fig.path='$(FIGUREDIR)/$*-')" \
	  -e "knitr::opts_chunk[['set']](cache.path='$(CACHEDIR)/$*-')" \
	  -e "knitr::knit('$<','$@')"

	Rscript -e "Sweave('$^', driver=Rtangle())"

	R CMD BATCH "$^" "$@"

%.pdf: %.tex 
	latexmk -pdf $<

	-latexmk -c -quiet $(MAINFILE).tex
	-rm -f $(MAINTEX) $(RNWTEX)
	-rm -rf $(FIGUREDIR)
	-rm *tikzDictionary
	-rm $(MAINPDF)

Making section plots with oce and `imagep()`

section objects in the oce package are a convenient way of storing a series of CTD casts together – indeed, the object name derives from the common name for such a series of casts collected from a ship during a single campaign.

In it’s heart, a section object is really just a collection of ctd objects, with some other metadata. The CTD stations themselves are stored as a list of ctd objects in the @data slot, like:

str(section@data$station, 1)
List of 124
 $ :Formal class 'ctd' [package "oce"] with 3 slots
 $ :Formal class 'ctd' [package "oce"] with 3 slots
 $ :Formal class 'ctd' [package "oce"] with 3 slots
 $ :Formal class 'ctd' [package "oce"] with 3 slots
 $ :Formal class 'ctd' [package "oce"] with 3 slots
 $ :Formal class 'ctd' [package "oce"] with 3 slots
 $ :Formal class 'ctd' [package "oce"] with 3 slots
 $ :Formal class 'ctd' [package "oce"] with 3 slots
  [list output truncated]

Just to prove it, we can plot make a standard ctd plot of one of them, by accessing them directly with the [[ accessor syntax. Let’s plot the 100th station:

## Loading required package: testthat

plot of chunk plotctd

Making nice plots of the sections themselves

The main advantage of a section object is to be able to quickly make plots summarizing all the data in the section. This is accomplished using the plot method for section objects, which you can read about by doing ?"plot,section-method". For example, to make a contour plot of the temperature:

plot(section, which='temperature', xtype='distance')

plot of chunk temperature

Ok, cool. But what about some colors? Use the ztype='image' argument!

plot(section, which='temperature', xtype='distance',

plot of chunk temperature2

Finer control over the section plot

To get finer control over the section plot than is possible with the section plot() method, one trick I will sometimes do is extract the data I want from the section as a gridded matrix, and then plot the matrix directly using the imagep() function.

First, we “grid” the section so that all the stations comprise the same pressure levels:

s <- sectionGrid(section, p='levitus')

Now, we can loop through the station fields, extracting the data as we go.

nstation <- length(s[['station']])
p <- unique(s[['pressure']])
np <- length(p)
T <- S <- array(NA, dim=c(nstation, np))
for (i in 1:nstation) {
    T[i, ] <- s[['station']][[i]][['temperature']]
    S[i, ] <- s[['station']][[i]][['salinity']]

Basically, what we’re doing here is creating an empty matrix, then filling each row with the data from the section stations. We can make a quick plot with imagep():

distance <- unique(s[['distance']])
par(mfrow=c(2, 1))
imagep(distance, p, T, col=oceColorsTemperature, flipy=TRUE)
imagep(distance, p, S, col=oceColorsSalinity, flipy=TRUE)

plot of chunk plot1

Or we can do some fancier things, like use the colormap() function and plot some filled contours:

par(mfrow=c(2, 1))
Tcm <- colormap(T, breaks=seq(4, 24, 2), col=oceColorsTemperature)
Scm <- colormap(S, breaks=seq(34, 36.8, 0.2), col=oceColorsSalinity)
imagep(distance, p, T, colormap=Tcm, flipy=TRUE,
       ylab='p [dbar]', filledContour=TRUE,
       zlab='temperature [degC]')
imagep(distance, p, S, colormap=Scm, flipy=TRUE,
       xlab='distance [km]', ylab='p [dbar]', filledContour=TRUE,

plot of chunk plot2

Using the oce colormap function in R

When I talk to fellow colleagues about why I use R as my language of choice for scientific data analysis, I typically point out all the advantages, and because I’m honest, the disadvantages.

Typically the biggest disadvantage, especially for those coming from the java-GUI world of Matlab, is the non-interactive graphics. Now, I’ve managed to convince myself that I actually prefer making plots this way (because it forces me to script rather than noodling around with a mouse, the final plot is predictable, etc), but there are always a few things that I wish were easier.

One of those is handling colors in “image” plots and in scatter plots. The former is usually handled pretty easily using the oce function imagep(..., col=oceColorsJet), but the latter tends to be trickier. There is no base R functionality for automatically coloring points by some other attribute. I believe this is relatively easy to do with ggplot2, but that of course requires using ggplot2 (nothing against ggplot2, it just really isn’t an option for me – perhaps the subject of a future blog post).

the colormap() function

With that in mind, Dan and I set out to create a function that could be used to make an explicit “map” between colors and values to facilitate making plots, but also to ensure that the results of the plot are correct. The concept of a “colormap”, as implemented in Matlab, where the information connecting colors to values is inherent in the plot attributes, doesn’t exist in R. One can plot any colors one would like without thinking twice about whether they mean anything. On the one hand, this can be an advantage because it makes it easier to have multiple colormaps in a single figure. The downside is that using colors to represent numerical values requires some care.

The basic idea of colormap() is that it creates an object that connects a series of colors with values, which can be passed to various plotting functions to ensure that the color-mapping is done correctly. Probably the best way to illustrate the various options is through some examples. In most cases the colormap is communicated through the use of a “palette”, which is either drawn implicitly by the plotting function, or through an explicit call to oceDrawPalette().

imagep() plots

The imagep() function is a tweaked and customizable version of the base image() function. It is used for making pseudo-color maps of matrix-style data. A nice example comes from the included argo dataset:


## remove bad data, and grid to regular pressure levels
argo <- argoGrid(handleFlags(argo))

## two-panel plot of T and S, using the cmocean colors
par(mfrow=c(2, 1))
Tcm <- colormap(argo[['temperature']], col=oceColorsTemperature)
imagep(argo[['time']], argo[['pressure']][,1], t(argo[['temperature']]),
       ylab='pressure [dbar]',
       colormap=Tcm, flipy=TRUE, main='Temperature')
Scm <- colormap(argo[['salinity']], col=oceColorsSalinity)
imagep(argo[['time']], argo[['pressure']][,1], t(argo[['salinity']]),
       ylab='pressure [dbar]',
       colormap=Scm, flipy=TRUE, main='Salinity')

plot of chunk argo

Pretty easy. But using colors with imagep() is pretty easy anyway, since the colormap is defined based on the input data and automatically scaled to match the palette.

What if we wanted to add points showing the temperature values at certain depths to the salinity plot? In Matlab, combining the two colormaps is nigh impossible. Using oce, all we do is reference the Tcm object when we set the colors of the points, specifically the $zcol element within it – which contains a colormapped color for every element in the original data used to create Tcm:

## random points to plot
II <- sample(seq_along(argo[['temperature']]), 100)
t <- matrix(rep(argo[['time']], dim(argo[['pressure']])[1]),
            nrow=dim(argo[['pressure']])[1], byrow=TRUE)[II]
p <- argo[['pressure']][II]
T <- argo[['temperature']][II]

par(mfrow=c(2, 1))
Tcm <- colormap(argo[['temperature']], col=oceColorsTemperature)
imagep(argo[['time']], argo[['pressure']][,1], t(argo[['temperature']]),
       ylab='pressure [dbar]',
       colormap=Tcm, flipy=TRUE, main='Temperature')
points(t, p, pch=22, bg=Tcm$zcol[II], cex=0.75)
Scm <- colormap(argo[['salinity']], col=oceColorsSalinity)
imagep(argo[['time']], argo[['pressure']][,1], t(argo[['salinity']]),
       ylab='pressure [dbar]',
       colormap=Scm, flipy=TRUE, main='Salinity with discrete temperature measurements')

## Add the temperature colormapped points overtop
points(t, p, pch=22, bg=Tcm$zcol[II], cex=0.75)

plot of chunk argo-points

Plotting a colored scatterplot, with a palette

The example above introduced how to use the $zcol return value of the colormap object to color the plotted points according to the desired colormap. Here I’ll explore that a bit further, highlighting how to use it with a basic plot, but with a palette on the side.

x <- seq(0, 6*pi, 0.1)
y <- sin(x)

## create a colormap for cos(x)
cm <- colormap(cos(x), col=oceColorsPalette)

drawPalette(colormap=cm, zlab=expression(cos(x)))
plot(x, y, col=cm$zcol, pch=19)

plot of chunk sine

Using named GMT-style palettes

In creating colormap(), Dan and I were impressed with the color palettes available in the Generic Mapping Tools (GMT) software, and decided to implement a similar approach to defining custom colormaps. In addition, colormap() includes a number of “named” GMT palettes (see ?colormap), several of which are quite handy for plotting topography.

par(mar=c(0.5, 0.5, 0.5, 0.5))

## Use the gmt_relief palette
cm <- colormap(name='gmt_relief')

mapImage(topoWorld, colormap=cm)

plot of chunk gmt


The colormap() function is pretty powerful, and as a result somewhat complex to use. I hope the above examples have helped shed some light on how to use oce to map colors to values consistently and reliably in plots.