When I talk to colleagues about why R is my language of choice for scientific data analysis, I typically point out the advantages and, because I’m honest, the disadvantages.
Typically the biggest disadvantage, especially for those coming from the Java-GUI world of Matlab, is the non-interactive graphics. Now, I’ve managed to convince myself that I actually prefer making plots this way (because it forces me to script rather than noodle around with a mouse, the final plot is predictable, etc.), but there are always a few things that I wish were easier.
One of those is handling colors in “image” plots and in scatter plots. The former is usually handled pretty easily using the oce function imagep(..., col=oceColorsJet), but the latter tends to be trickier. There is no base R functionality for automatically coloring points by some other attribute. I believe this is relatively easy to do with ggplot2, but that of course requires using ggplot2 (nothing against ggplot2, it just really isn’t an option for me – perhaps the subject of a future blog post).
the colormap() function
With that in mind, Dan and I set out to create a function that could be used to make an explicit “map” between colors and values to facilitate making plots, but also to ensure that the results of the plot are correct. The concept of a “colormap”, as implemented in Matlab, where the information connecting colors to values is inherent in the plot attributes, doesn’t exist in R. One can plot any colors one would like without thinking twice about whether they mean anything. On the one hand, this can be an advantage because it makes it easier to have multiple colormaps in a single figure. The downside is that using colors to represent numerical values requires some care.
The basic idea of colormap() is that it creates an object that connects a series of colors with values, which can be passed to various plotting functions to ensure that the color-mapping is done correctly. Probably the best way to illustrate the various options is through some examples. In most cases the colormap is communicated through the use of a “palette”, which is either drawn implicitly by the plotting function, or through an explicit call to drawPalette().
The imagep() function is a tweaked and customizable version of the base image() function. It is used for making pseudo-color maps of matrix-style data. A nice example comes from the included argo dataset:
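The original listing is not reproduced here, so the following is only a minimal sketch of what such an example might look like. It assumes the profiles are first gridded onto common pressure levels with argoGrid(), and the names Tcm and Scm are chosen to match the discussion that follows.

```r
library(oce)
data(argo)

# Assumption: grid the profiles onto common pressure levels so the
# fields can be treated as regular matrices.
ga <- argoGrid(argo)
time <- ga[["time"]]
p <- ga[["pressure"]][, 1]
Tm <- ga[["temperature"]]   # matrix: pressure level x profile
Sm <- ga[["salinity"]]

# One colormap per field, each built from the range of its own data.
Tcm <- colormap(Tm)
Scm <- colormap(Sm)

par(mfrow = c(2, 1))
# imagep() expects z with dimensions length(x) by length(y), hence t().
imagep(time, p, t(Tm), colormap = Tcm, flipy = TRUE,
       ylab = "Pressure [dbar]", zlab = "Temperature")
imagep(time, p, t(Sm), colormap = Scm, flipy = TRUE,
       ylab = "Pressure [dbar]", zlab = "Salinity")
```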
Pretty easy. But using colors with imagep() is pretty easy anyway, since the colormap is defined based on the input data and automatically scaled to match the palette.
What if we wanted to add points showing the temperature values at certain depths to the salinity plot? In Matlab, combining the two colormaps is nigh impossible. Using oce, all we do is reference the Tcm object when we set the colors of the points, specifically the $zcol element within it – which contains a colormapped color for every element in the original data used to create Tcm:
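As a rough sketch of that idea (not the original code), the block below draws the salinity image and overlays temperature-colored points along a few pressure levels. The gridding step and the choice of rows are assumptions made purely for illustration.

```r
library(oce)
data(argo)

ga <- argoGrid(argo)            # assumption: work on gridded fields
time <- ga[["time"]]
p <- ga[["pressure"]][, 1]
Tm <- ga[["temperature"]]
Sm <- ga[["salinity"]]

Tcm <- colormap(Tm)
Scm <- colormap(Sm)

# zcol holds one color per element of Tm; reshape it so that it can be
# indexed in the same way as the temperature matrix.
Tcol <- matrix(Tcm$zcol, nrow = nrow(Tm))

imagep(time, p, t(Sm), colormap = Scm, flipy = TRUE,
       ylab = "Pressure [dbar]", zlab = "Salinity")

# Hypothetical choice: four evenly spaced pressure levels.
rows <- round(seq(1, length(p), length.out = 4))
for (i in rows) {
    points(time, rep(p[i], length(time)), pch = 21, bg = Tcol[i, ], cex = 0.8)
}
```

The palette at the side still describes salinity; the point colors follow the temperature colormap, which is exactly the two-colormap situation described above.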
Plotting a colored scatterplot, with a palette
The example above introduced how to use the $zcol return value of the colormap object to color the plotted points according to the desired colormap. Here I’ll explore that a bit further, highlighting how to use it with a basic plot, but with a palette on the side.
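A minimal sketch of this pattern, assuming the ctd dataset that ships with oce and coloring a temperature-salinity scatterplot by pressure (the choice of variables is mine, not necessarily the one used for the original figure):

```r
library(oce)
data(ctd)

S <- ctd[["salinity"]]
temp <- ctd[["temperature"]]
p <- ctd[["pressure"]]

# Map pressure to colors; zcol holds one color per data point.
cm <- colormap(p)

# Draw the palette first. It takes space at the side of the figure,
# and the next high-level plot call fills the remaining region.
drawPalette(colormap = cm, zlab = "Pressure [dbar]")
plot(S, temp, pch = 21, bg = cm$zcol,
     xlab = "Salinity", ylab = "Temperature")
```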
Using named GMT-style palettes
In creating colormap(), Dan and I were impressed with the color palettes available in the Generic Mapping Tools (GMT) software, and decided to implement a similar approach to defining custom colormaps. In addition, colormap() includes a number of “named” GMT palettes (see ?colormap), several of which are quite handy for plotting topography.
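For instance, a topography plot using one of the named palettes might look like the sketch below. It assumes the topoWorld dataset included in oce and the "gmt_globe" name listed in ?colormap; the axis labels are my own.

```r
library(oce)
data(topoWorld)

# "gmt_globe" is one of the built-in palette names listed in ?colormap.
cm <- colormap(name = "gmt_globe")

lon <- topoWorld[["longitude"]]
lat <- topoWorld[["latitude"]]
z <- topoWorld[["z"]]           # elevation matrix, longitude x latitude

imagep(lon, lat, z, colormap = cm,
       xlab = "Longitude", ylab = "Latitude", zlab = "Elevation [m]")
```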
The colormap() function is pretty powerful, and as a result somewhat complex to use. I hope the above examples have helped shed some light on how to use oce to map colors to values consistently and reliably in plots.
The most recent CRAN release of oce includes some nice new functionality for reading and converting argo objects (see http://www.argo.ucsd.edu/ for more information about the fantastic Argo float program). One question that arose out of this increased functionality was how to calculate $N^2$ (the square of the buoyancy, or Brunt-Väisälä, frequency) for such objects.
The definition of $N^2$ is:

$$N^2 = -\frac{g}{\rho}\frac{\partial \rho}{\partial z}$$

where $g$ is the acceleration due to gravity, $\rho$ is the fluid density, and $z$ is the vertical coordinate. Essentially $N^2$ describes the vertical variation of fluid density (also known as “stratification”).
Calculating $N^2$ for regular ctd objects is easily accomplished with the function oce::swN2(). A caution: readers are encouraged to read the documentation carefully, as the details of the actual calculation can have important consequences when applied to real ocean data.
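As a quick illustration, assuming the ctd dataset bundled with oce and the default settings of swN2() (which, per the caution above, may not suit every dataset):

```r
library(oce)
data(ctd)

# swN2() accepts a ctd object directly and returns one value per level.
N2 <- swN2(ctd)
p <- ctd[["pressure"]]

# Profile plot with pressure increasing downward.
plot(N2, p, type = "l", ylim = rev(range(p)),
     xlab = "N2 [1/s^2]", ylab = "Pressure [dbar]")
```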
$N^2$ for section objects
For the case of a section object (which is essentially a collection of ctd stations), the most straightforward way to calculate $N^2$ is to use the lapply() function to “apply” the swN2() function to each of the stations in the object. An example:
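The original listing is missing from this copy, so here is a sketch of the approach, assuming the section dataset included in oce. It reaches into the data slot directly, which is one way (not necessarily the author's) to replace the station list.

```r
library(oce)
data(section)

# The stations are stored as a list of ctd objects in the data slot.
# For each one, compute N^2 with swN2() and store it as a new data item.
section@data$station <- lapply(section@data$station, function(ctd) {
    oceSetData(ctd, "N2", swN2(ctd))
})

# Each station now carries an N2 item alongside the original fields.
names(section@data$station[[1]]@data)
```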
The line with the lapply() command takes the list of stations from the section object, and evaluates each of the resulting ctd objects using the oceSetData() function to add the result of swN2() back into the station @data slot.
If we wanted to make a nice plot of the result, we could do:
where I’ve defined a custom colormap just for the fun of it.
$N^2$ for argo objects
In an argo object, the default storage for the profiles is a matrix, rather than a list of ctd objects. To calculate $N^2$ and make a plot, the simplest approach would be to use as.section() to convert the argo object to a section class object and then do as above. However, having the field as a matrix allows for greater flexibility in plotting, e.g. using the imagep() function, so one might want to calculate $N^2$ in a manner consistent with the default argo storage format.
Let’s load some example data from the argo dataset included in oce:
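A sketch of the loading step, assuming argoGrid() with its default pressure levels (the variable names ga and p are my own):

```r
library(oce)
data(argo)

# Interpolate every profile onto a common set of pressure levels.
ga <- argoGrid(argo)

# After gridding, every column shares the same pressures, so the first
# column can serve as the vertical axis.
p <- ga[["pressure"]][, 1]
dim(ga[["temperature"]])        # pressure levels x profiles
```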
Note that I’ve gridded the argo fields so the matrices are at consistent pressure levels. Now we create a function that can be applied to each of the matrix columns, to calculate from a single column of the density matrix:
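The original function is not shown in this copy. One possible version, written in base R with centered finite differences, is sketched below. It assumes depth (or pressure in dbar as a stand-in for depth in metres) is positive downward, uses a constant g, and leaves the end points as NA; it is an illustration rather than a replacement for the more careful swN2().

```r
# rho:   density [kg/m^3] at each level of a single profile
# depth: depth [m], positive downward (pressure in dbar is a rough proxy)
# g:     gravitational acceleration [m/s^2], assumed constant
N2 <- function(rho, depth, g = 9.81) {
    n <- length(rho)
    stopifnot(n >= 3, length(depth) == n)
    z <- -depth                          # vertical coordinate, positive up
    drhodz <- rep(NA_real_, n)
    i <- 2:(n - 1)
    # Centered difference at interior levels; end points stay NA.
    drhodz[i] <- (rho[i + 1] - rho[i - 1]) / (z[i + 1] - z[i - 1])
    -g / rho * drhodz
}

# A linearly stratified test profile: density increases 0.01 kg/m^3 per metre.
depth <- seq(0, 100, by = 10)
rho <- 1025 + 0.01 * depth
N2(rho, depth)
```

For real data, potential density is usually a better input than in-situ density, since the latter includes the effect of compression.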
Now we use the above function N2 to calculate buoyancy frequency and add it back to the original object, like:
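Put together, the calculation might look like the sketch below. The N2() helper is the same illustrative finite-difference function as above, repeated so the block runs on its own. The use of potential density from swSigmaTheta() with the UNESCO equation of state, and of pressure as a proxy for depth, are my assumptions.

```r
library(oce)
data(argo)
ga <- argoGrid(argo)
p <- ga[["pressure"]][, 1]
S <- ga[["salinity"]]

# Assumption: potential density (1000 + sigma-theta), UNESCO formulation.
sig <- swSigmaTheta(as.vector(S),
                    as.vector(ga[["temperature"]]),
                    as.vector(ga[["pressure"]]),
                    eos = "unesco")
rho <- matrix(1000 + sig, nrow = nrow(S))

# Illustrative helper: centered differences, NA at the end points.
N2 <- function(rho, depth, g = 9.81) {
    n <- length(rho)
    z <- -depth
    drhodz <- rep(NA_real_, n)
    i <- 2:(n - 1)
    drhodz[i] <- (rho[i + 1] - rho[i - 1]) / (z[i + 1] - z[i - 1])
    -g / rho * drhodz
}

# MARGIN = 2 applies N2() to each column (profile) of the density matrix.
N2matrix <- apply(rho, 2, N2, depth = p)

# Store the whole matrix in the object in a single call.
ga <- oceSetData(ga, "N2", N2matrix)
```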
Note that because of the difference between the “list” and “matrix” approach, the oceSetData() occurs outside of the apply(). Also note the second argument in the apply() call, which specifies to apply the N2() function along the 2nd dimension of the density matrix, i.e. along columns.
Now, let’s make a sweet plot of the N2 field using imagep()!
A thing of stratified beauty, if I do say so myself.
Today I had the pleasure of presenting a talk about R during one of the tutorial sessions at the AGU Ocean Sciences Meeting in New Orleans. I made a deliberate point of saying that my main message was more like: “R is really cool, and here’s why”, rather than: “You should all stop using Matlab”. Being divisive doesn’t help anybody work better.
I was quite surprised at the number of people who raised their hand when I asked if they use R as their main analysis tool – perhaps about a third of the (surprisingly full) room!
Anyway, see below for the pdf of my talk and the Rnw source file. Note that the Rnw has links to some external images so you won’t be able to knit() it for yourself, but feel free to use the slide content.
Today we released a new version of oce, and it has been uploaded to CRAN. Only the source version is available as of the time of writing, but binary versions for all platforms should become available in the next few days. As always, the best way to install the package is to run install.packages("oce") at an R prompt.
Then you can do stuff like:
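The original snippet is not included in this copy; a first session might look something like the following sketch, which uses the ctd dataset that comes with the package:

```r
library(oce)
data(ctd)

summary(ctd)   # overview of the metadata and data columns
plot(ctd)      # default multi-panel overview plot for a CTD cast
```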
The previous version of oce was uploaded to CRAN about 9 months ago. In the meantime, we’ve fixed lots of bugs and added even more improvements. A quick look at the NEWS file gives a summary of the enhancements:
- improve plot.coastline() and mapPlot()
- add support for G1SST satellite
- all objects now have metadata items for units and flags
- ctdTrim() method renamed: old A and B are new A; old C is new B
- support more channels and features of rsk files
- as.adp() added
- convert argo objects to sections
- makeSection() deprecated; use as.section() instead
- read.adp.rdi() handles Teledyne/RDI vsn 23.19 bottom-track data
- geodXyInverse() added; geod functions now spell out longitude etc
- read.odf() speeded up by a factor of about 30
- add colour palettes from Kristen Thyng's cmocean Python package
- as.oce() added
- rename 'drifter' class as 'argo' to recognize what it actually handles
- add oceColorsViridis()
- interpBarnes() has new argument 'pregrid'
- binMean2D() has new argument 'flatten'
- data(topoWorld) now has longitude from -179.5 to 180
- ODF2oce() added
- read.odf() handles more data types
- read.adp.rdi() reads more VmDas (navigational) data
- ITS-90 is now the default temperature unit
- ctd objects can have vector longitude and latitude
- logger class renamed to rsk
- bremen class added
- coastlineCut() added
- rgdal package used instead of local PROJ.4 source code
- mapproj-style map projections eliminated
Some of the best additions (in my opinion) are:
- addition of a units metadata field for objects
- more tools for working with argo objects (which have been renamed from drifter)
oce has benefited immensely from some great requests recently, and we’d like to keep the momentum going!
R/oce at AGU Ocean Sciences 2016
If you’re going to be at the AGU Ocean Sciences meeting in New Orleans in a couple weeks, make sure you come by to check out the R/oce tutorial I’ll be doing. It’s on Wednesday February 24th at 3:00pm in room R03. See you there!