Welcome!

I am a physical oceanographer interested in how ocean water is mixed and transformed. I am currently working as Research Scientist at the Bedford Institute of Oceanography in Halifax, Nova Scotia.

Recent Posts:

Discrete colorbars with R and the oce package

Introduction

Making plots in oceanography (or anything, really) often requires creating some kind of “color map” – that is, having a color represent a field in a plot that is otherwise two-dimensional. Frequently this is done when making “image”-style plots (known in Matlab parlance as “pcolor”, or pseudocolor, plots), but it could also mean coloring the points in a 2D scatter plot based on a third variable (e.g. a TS plot with points colored by depth).

There are a whole bunch of different ways to make colormaps in R, including various approaches derived from the “tidyverse” and the ggplot2 package for analyzing and plotting data. I don’t really use that approach for most of my work, so I won’t touch on them here.

Instead, the purpose of this post (inspired by a question from a colleague who I know is a Matlab and Python user) is to show some of the ways various functions contained in the oce package can be used to make colormaps and colorbars (or “palettes”), in particular for the case where one wants a “discrete” (i.e. not continuous) colormap.

The imagep() function

For making image-style plots, the imagep() function provided by the oce package is handy for quickly producing nice-looking pseudocolor plots of matrices. The “p” in imagep() stands for “palette” or “pseudocolor”. Mostly, it is a wrapper around the base R function image(), allowing increased control of the axes, colors, and palette specification.

library(oce)
## Loading required package: gsw
## Loading required package: testthat
library(ocedata)
library(cmocean)
data(levitus)
par(mfrow=c(1, 3))
with(levitus, imagep(longitude, latitude, SST))
with(levitus, imagep(longitude, latitude, SST, col=oceColorsJet, breaks=-2:25))
with(levitus, imagep(longitude, latitude, SST, col=cmocean('thermal'), breaks=seq(-2, 30, 2)))

plot of chunk unnamed-chunk-1

The above example shows how to use imagep() with 3 different colormaps (the default, the classic “jet” scheme, and the cmocean package) to generate an image plot with a nice palette automatically placed on the side.

The drawPalette() and colormap() functions

Under the hood, the imagep() function calls another function to actually draw the palette on the side of the plot – the drawPalette() function. That function can be called on its own, making plot construction much more flexible. Additionally, there is the colormap() function, which allows for detailed specification of the colormap properties; the result can then be passed as an object to drawPalette().

For example, say we wanted to make a TS (temperature-salinity) plot of the Levitus surface data, but with each point colored by latitude in 10 degree increments. We do that first by making a colormap object:

cm <- colormap(expand.grid(levitus$latitude, levitus$longitude)[,1], breaks=seq(-90, 90, 10), col=oceColorsViridis)

(note that we use expand.grid() to make the number of lon/lat points match the number of elements in the SST matrix).
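To see why this lines up, recall that expand.grid() cycles its first argument fastest, which mirrors the column-major order in which R stores (and vectorizes) matrix elements. A minimal sketch, with made-up coordinate values:

```r
# expand.grid() varies the first argument fastest, matching the
# column-major order of as.vector() applied to a matrix
g <- expand.grid(lat=c(10, 20), lon=c(100, 110, 120))
head(g)
##   lat lon
## 1  10 100
## 2  20 100
## 3  10 110
## 4  20 110
## 5  10 120
## 6  20 120
```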

Looking inside the object, we can see some of the details that further plotting/palette functions can make use of:

str(cm)
## List of 11
##  $ zlim        : num [1:2] -90 90
##  $ breaks      : num [1:19] -90 -80 -70 -60 -50 -40 -30 -20 -10 0 ...
##  $ col         : chr [1:18] "#440154" "#481769" "#472A7A" "#433D84" ...
##  $ zcol        : chr [1:64800] "#440154" "#440154" "#440154" "#440154" ...
##  $ missingColor: chr "gray"
##  $ x0          : num [1:18] -80 -70 -60 -50 -40 -30 -20 -10 0 10 ...
##  $ x1          : num [1:18] -80 -70 -60 -50 -40 -30 -20 -10 0 10 ...
##  $ col0        : chr [1:18] "#440154" "#481769" "#472A7A" "#433D84" ...
##  $ col1        : chr [1:18] "#440154" "#481769" "#472A7A" "#433D84" ...
##  $ zclip       : logi FALSE
##  $ colfunction :function (z)  
##   ..- attr(*, "srcref")= 'srcref' int [1:8] 853 24 855 5 24 5 20257 20259
##   .. ..- attr(*, "srcfile")=Classes 'srcfilealias', 'srcfile' <environment: 0x7fffee4dbc60> 
##  - attr(*, "class")= chr [1:2] "list" "colormap"

Some of those fields are obvious (some probably aren’t to inexperienced users), but one field that is handy to know about is the zcol field. It encodes a color for every value in the original object, based on the colormap specification. So, we can make a plot with the points colored according to the colormap using the argument col=cm$zcol. We can also add the palette to the plot using the drawPalette() function, which has to be called before the main plot:

drawPalette(colormap=cm)
plotTS(with(levitus, as.ctd(SSS, SST, 0)), pch=19, col=cm$zcol, mar=par('mar'))

plot of chunk unnamed-chunk-4

One problem that can happen when there are a lot of points is that overplotting obscures patterns in the colors. An easy way to fix this is to randomize the order of the plotted points with the sample() function:

drawPalette(colormap=cm)
sI <- sample(seq_along(cm$zcol))
plotTS(with(levitus, as.ctd(SSS[sI], SST[sI], 0)), pch=19, col=cm$zcol[sI], mar=par('mar'))

plot of chunk unnamed-chunk-5

Custom color palettes with colormap()

In addition to the “known” color palettes included in R and oce (see also the cmocean package below, as well as RColorBrewer), the colormap() function has arguments that allow for custom-built palettes: specifically the x0, x1, col0 and col1 arguments, which are detailed in the help file as:

x0, x1, col0, col1: Vectors that specify a color map.  They must all be
          the same length, with ‘x0’ and ‘x1’ being numerical values,
          and ‘col0’ and ‘col1’ being colors.  The colors may be
          strings (e.g. ‘"red"’) or colors as defined by rgb or
          hsv.

The idea is that the x0 values define the numeric level of the bottom of the color ranges, the x1 values define the top of the color ranges, and the col0 and col1 the colors associated with the levels. An example:

cm <- colormap(expand.grid(levitus$latitude, levitus$longitude)[,1],
               x0=c(-90, -45, 0, 45), x1=c(-45, 0, 45, 90),
               col0=c(1, 2, 3, 4), col1=c(2, 3, 4, 5))
drawPalette(colormap=cm)
sI <- sample(seq_along(cm$zcol))
plotTS(with(levitus, as.ctd(SSS[sI], SST[sI], 0)), pch=19, col=cm$zcol[sI], mar=par('mar'))

plot of chunk unnamed-chunk-6

(Note that to make the above plot, I had to fix a bug in oce that was causing zcol to come out as “black” in all cases. Either build oce from source, or wait for the update to be pushed to CRAN in a month or two.)

CO2 concentration at birth

Introduction

There is a recent trend in places like Twitter to include in your bio the atmospheric CO2 concentration when you were born. I like it, since it is both a neat measure of the range of ages of people you can interact with (without really being about age per se), and a sobering reminder of just how much damage we as a species have done in a very short amount of time.

Anyway, there’s nothing overly complicated about figuring this out – probably a simple Google search would be enough to tell me what the atmospheric concentration of CO2 was when I was born. But where’s the fun in that? :)

The R co2 dataset

Handily, R comes bundled with an example dataset called co2 (as an example of a “time series”, or ts, object), which contains monthly measurements of CO2 from the Mauna Loa observatory from 1959 up to 1997 (I wonder if we can get the R core team to update this dataset for the last 20 years …).
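As a quick sanity check of the dataset, the standard ts accessors from base R confirm its class, time span, and sampling:

```r
# co2 is a "ts" object: inspect its class, time span, and sampling
data(co2)
class(co2)
## [1] "ts"
start(co2)
## [1] 1959    1
end(co2)
## [1] 1997   12
frequency(co2)  # 12 samples per year, i.e. monthly
## [1] 12
```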

My birthday is in September 1979, so let’s see where that lands on the curve:

data(co2)
plot(co2)
grid()
abline(v=1979+9/12, col=2)

plot of chunk unnamed-chunk-1

It occurs to me looking at that graph that perhaps the raw monthly value isn’t the right number to choose, since I was clearly born at a seasonal minimum of CO2 concentration (i.e. at the end of Northern Hemisphere summer, when lots of atmospheric CO2 is locked up in plants). So, first I’ll figure out the “raw” value, and then we’ll smooth the series to get something more representative of the background CO2 concentration.

t <- time(co2)
II <- which(1979+9/12 <= t & t <= 1979+10/12)
clark_co2 <- co2[II]
plot(co2)
grid()
abline(v=1979+9/12, col=2)
abline(h=clark_co2, col=2)
text(par('usr')[2], clark_co2, paste0(format(clark_co2, digits=3), '          '), pos=3, offset=0.25)

plot of chunk unnamed-chunk-2

Concentration based on a smoothed time series

I love smoothing splines, so I’ll use that to smooth the co2 data before interpolating:

co2s <- smooth.spline(co2, df=30)
plot(co2)
lines(co2s, col=2, lwd=2)
grid()
abline(v=1979+9/12, col=2)
clark_co2 <- co2s$y[II]
abline(h=clark_co2, col=2)
text(par('usr')[2], clark_co2, paste0(format(clark_co2, digits=3), '          '), pos=3, offset=0.25)

plot of chunk unnamed-chunk-3

So, I was born at 337 ppm.

A perfect CTD profile

I love the tanh function. A lot. It’s such a perfect model for a density interface in the ocean that it is commonly used in theoretical and numerical models, and I regularly use it for both research and demonstration/example purposes. Behold, a tanh interface:

T <- function(z, T0=10, dT=5, z0=-25, dz=5) T0 + dT*tanh((z - z0)/dz)
z <- seq(0, -60)
plot(T(z), z)

plot of chunk unnamed-chunk-1

But whenever I use it, especially for teaching, I’m always saying how it’s idealized and really doesn’t represent what an ocean interface actually looks like. UNTIL NOW.

library(oce)
ctd <- read.oce('../assets/D19002034.ODF')
## Warning in read.odf(file = file, columns = columns, debug = debug -
## 1): "CRAT_01" should be unitless, but the file states the unit as "S/
## m" so that is retained in the object metadata. This will likely cause
## problems. See ?read.odf for an example of rectifying this unit error.
par(mfrow=c(1, 2))
plot(ctd, which=1, type='l')
plot(ctd, which=5)

plot of chunk unnamed-chunk-2

Yes, this is real data.

Just how close to a tanh is it?1

z <- -ctd[["depth"]]
T <- ctd[["temperature"]]
m <- nls(T~a+b*tanh((z-z0)/dz), start=list(a=3, b=1, z0=-10, dz=5))
m
## Nonlinear regression model
##   model: T ~ a + b * tanh((z - z0)/dz)
##    data: parent.frame()
##        a        b       z0       dz 
##   3.7251   0.9516 -25.0448  -2.7648 
##  residual sum-of-squares: 0.05222
## 
## Number of iterations to convergence: 16 
## Achieved convergence tolerance: 4.394e-06
plot((T-predict(m))/T * 100, z, type="o", xlab='Percent error')

plot of chunk unnamed-chunk-3

  1. My PhD advisor, who also taught me introductory physical oceanography, once said to a class of students while using tanh to describe an idealized interface: “tanh – it’s like ‘lunch’, only better!” 

Negating functions and function definitions: an 'opposite' function to the wonderful `%in%` operator

Introduction

R has some neat functions, and even some weird quirks, that you aren’t likely to discover on your own but can either be immensely helpful or horribly confounding.

For example, the “+” operator (i.e. addition) is actually a function, even when it is written in the usual infix notation:

1 + 2
## [1] 3

We can use backticks to refer to it as a “regular” function:

`+`
## function (e1, e2)  .Primitive("+")

And can therefore call it as a “regular” function, using brackets to pass the arguments:

`+`(1, 2)
## [1] 3

One consequence of this is that it is possible to redefine how “+” works:

`+` <- function(x1, x2) x1 - x2
1 + 2
## [1] -1

Ok … admittedly that’s confusing. Why would you want to redefine “+”? Well, one example is given by the syntax of the ggplot2 package, which defines its own version of “+” that lets you string plotting functions together to build up a plot (e.g. see my post about plotting here).
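For the record, ggplot2 doesn’t actually clobber the global “+” this way; it supplies an S3 method for its own object class, so normal arithmetic is untouched. A toy sketch of that mechanism (the myplot class and layer names here are made up for illustration):

```r
# Restore the base "+" (removing the subtraction version defined above;
# harmless warning if it was never defined)
rm("+")
# Define a toy class and an S3 "+" method for it; "+" dispatches on
# the class of its arguments, so base arithmetic is unaffected
myplot <- function() structure(list(layers=character()), class="myplot")
"+.myplot" <- function(e1, e2) {
    e1$layers <- c(e1$layers, e2)
    e1
}
p <- myplot() + "points" + "smoother"
p$layers
## [1] "points"   "smoother"
1 + 2  # base addition still works
## [1] 3
```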

The %in% function

The %in% function is one of those functions that just clicked when I started using R. It’s an elegant way to write conditional statements – it checks whether each element of the object on the left of the operator occurs anywhere in the object on the right. An example:

Does the number 5 occur in the vector of 1 through 10?

5 %in% 1:10
## [1] TRUE

Yes, it does (obviously).

One thing that I often find myself doing, however, is wanting to know if something doesn’t occur in another object. To make that work, I usually wrap the whole statement in brackets and then precede it with the ! operator (logical negation). Like this:

!(11 %in% 1:10)
## [1] TRUE

This evaluates to TRUE because 11 is not in the vector 1:10. While this works, it’s always bugged me because it just looks inelegant.

Well, while browsing Twitter recently, I came across this post from @groundwalkergmb:

First, note the (confusing) syntax: you don’t actually have to negate the entire expression, because %in% (like all %…% operators) has higher precedence than !, so the negation applies to the result of the %in%. It is equivalent to write:

!11 %in% 1:10
## [1] TRUE

Ew. Gross! (@groundwalkergmb agrees with me about this)

But reading through the comments led me to this one:

Basically, you can use the Negate() function (never knew about this before) to create a new function which returns the logical negation of the output of the original.

`%nin%`<- Negate(`%in%`)
11 %nin% 1:10
## [1] TRUE
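Note that Negate() isn’t specific to %in%: it wraps any predicate function, returning a new function whose result is the logical negation of the original. For example (the not_na name is just an illustrative choice):

```r
# Negate(f) builds a function equivalent to function(...) !f(...),
# so it works for any function that returns logicals
not_na <- Negate(is.na)
not_na(c(1, NA, 3))
## [1]  TRUE FALSE  TRUE
```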

A Shiny app that uses 'reactive' data

Introduction

A powerful, and increasingly useful, tool available to users of R is the interactive app package known as Shiny. Shiny provides a framework for building interactive web apps that can be used for visualization, exploratory data analysis, data quality control – really anything that could benefit from an interactive environment.

This post will not go into the basics of Shiny. There are many resources on the web for getting started, and of particular value is the RStudio Shiny gallery.

The purpose of this post is two-fold: to test the embedding of a Shiny app within a jekyll blog post, and to document a test app that I made that uses “reactive data”.

Using Shiny apps with jekyll

In order to run a Shiny app, a “shiny server” is required. The easiest way to do this is to install the necessary R packages (e.g. install.packages('shiny')) on your computer, and run a “local” server. This is very easy to do within RStudio, but it can also be done from the system command line, e.g. using a command something like:

$ Rscript -e "library(shiny); runApp(port=4321)"

Then, opening a web browser to the URL 127.0.0.1:4321 will display and run the app. However, the app is only available on the machine on which it is running, not as a distributed web app that other users can connect to.

For apps that are hosted and made available to others, a proper Shiny server environment must be installed on a properly configured web server. Thankfully, Shiny Server is free (and open source); however, configuring such a server is not a task for the uninitiated.

Another option is to use the ShinyApps.io service that is provided by RStudio. The free option allows for 5 applications, with a limited amount of server time available (25 hours per month at the time of writing).

For a GitHub Pages-hosted site (like this one), the lack of an R environment and a Shiny server on GitHub means that any apps included in posts must be hosted somewhere else. For this example, I will use my shinyapps.io account.

Reactive data

The premise of “reactive data” (actually, “reactivity” is a fundamental principle of Shiny apps in general) is that the data being used in the app may change from time to time. In a regular R script, loading data using commands like load(), read.csv(), etc. is easy, and occurs every time the script is run. In a Shiny app, however, using such commands may result in the data being loaded only when the server is started, and then never updating as the underlying data change (e.g. if more data are downloaded).

To get around this, we use the shiny function reactiveFileReader(), which allows the data reading to occur as a “reactive” object, meaning that it can be updated (at prescribed intervals) even while the app is running.

The example code below is a simple app I made to demonstrate the principle. In it, the user clicks the action button to generate new data, which is saved as a .rds object in the server file system. That .rds object is then read by the file reader, which checks for changes every 1000 milliseconds (1 second) and only re-reads the file if the data have changed. It’s a silly way to make an interactive plot that updates with new data, but it demonstrates the principle of reactive data well.

library(shiny)

new_data <- function() {
    d <- list(a=1:100, b=rnorm(100))
    saveRDS(file='data.rds', d)
}
new_data()

server <- function(input, output, session) {
    data <- reactiveFileReader(1000,
                               session,
                               'data.rds',
                               readRDS)

    observeEvent(input$newdata, {
        new_data()
    })
    
    output$plot1 <- renderPlot({
        hist(data()[['b']])
    })
}

ui <- fluidPage(    
    
    ## Give the page a title
    titlePanel("Title"),
    
    ## Generate a row with a sidebar
    sidebarLayout(      
        
        ## Define the sidebar with one input
        sidebarPanel(
            actionButton(inputId = 'newdata',
                         label = 'Generate new data')
        ),
        
        ## Create a spot for the barplot
        mainPanel(
            plotOutput("plot1")  
        )
        
    )
)

shinyApp(ui = ui, server = server)

The app

To include the app in the post, we upload it to shinyapps.io, and then embed it as an iframe (note the HTML comment tags around the iframe code chunk here, to keep it from being evaluated as HTML):

<!-- <iframe id="example1" src="https://clarkrichards.shinyapps.io/reactiveData/" 
style="border: none; width: 100%; height: 500px" 
frameborder="0">
</iframe> -->