I am a physical oceanographer interested in how ocean water is mixed and transformed. I am currently working as Research Scientist at the Bedford Institute of Oceanography in Halifax, Nova Scotia.

Recent Posts:

An R function to shift vectors by a specified lag

Quite of bit of my work involves looking at “shifts” between two time series. There are lots of reasons why shifts are interesting, including such things as:

  • phase differences in the tides at two different locations,

  • physical separation between sensors on a profiling instrument, and

  • clock drifts between two logging sensors.

To accomplish the actual shifting of the vectors (I’m not going to discuss here how to determine the amount by which the series should be shifted, since that depends on the parameters of the problem), I created the following function:

shift <- function(x, lag) {
    n <- length(x)
    xnew <- rep(NA, n)
    if (lag < 0) {
        xnew[1:(n-abs(lag))] <- x[(abs(lag)+1):n]
    } else if (lag > 0) {
        xnew[(lag+1):n] <- x[1:(n-lag)]
    } else {
        xnew <- x

The function takes as input the vector x and a specified integer lag, and shifts the input series by that amount. NA’s are added to either the beginning or the end (depending on the sign of lag) to pad the shifted vector to be the same length as the input. Note that lag is defined so that a positive lag shifts x “to the right”, i.e. moves values to a higher index.

Back to blogging

I’ve always intended to keep a better blog, partly for the online presence, and partly for the record of things I learn as I muddle my way through computer work and scientific research. My first real attempt, at Coded Ocean, was a bit of a failure – partly for the normal reasons (lazy, etc), but I believe also because ultimately the WordPress format didn’t suit my style very well. Too much moving a mouse and clicking, manually uploading figures and images, and a frustratingly buggy Markdown renderer that often required me to write raw html in my posts.

Since learning about Jekyll, particularly since my friend and colleague (and former supervisor) Dan Kelley set up his own Jekyll blog, it seems like it’s much more along the lines of the kind of blog that would work for me. Specifically, Jekyll:

Blog details

The guts of this website are based off of the knitr-hyde sample blog, which is a great starting point for a R/knitr website and blog, which is in turn built off of the great poole/hyde template. The source code is hosted on GitHub (of course).

Using the scan() function in R to read weirdly formatted data

I am writing some code to parse a weird data format, and using scan() to suck in everything first. Basically, it’s csv-style lines, but some lines have a different number of fields and are for different things — imaging CTD data interspersed with system messages, where the line is identified by the very first field. Something like:

MESSAGE, 20150727T120005,Begin descent
MESSAGE,20150727T121000,Begin ascent
etc ...

Anyway, when I was just reading in the CTD fields, everything was fine, but when I started trying to parse the MESSAGE fields, I found that scan() was doing something unexpected with the spaces in the message field, and producing a char vector like:

"MESSAGE, 20150727T120005,Begin" 

Basically, scan() was treating the space between “Begin” and “descent” as a delimiter (as well as the carriage returns).

Anyway, after much attempting to interpret the man page, and trying different things, I discovered that

scan(con, character(), sep='\n')

would suck in the entire line as a character vector, which is what I wanted.

Using the R apply() family with oce objects


In the oce package, the various different data formats are stored in consistently structured objects. In this post, I’ll explore a way to access elements of multiple oce objects using the R lapply(), from the apply family of functions.

Example with a ctd object

The objects always contain three fields (or “slots”): metadata, data, and processingLog. The layout of the object can be visualized using the str() command, like:

## Loading required package: methods
## Loading required package: gsw

which produces something like:

Formal class 'ctd' [package "oce"] with 3 slots
  ..@ metadata     :List of 26
  .. ..$ header                  : chr [1:42] "* Sea-Bird SBE 25 Data File:"
  .. ..$ type                    : chr "SBE"
  .. ..$ conductivityUnit        : chr "ratio"
  .. ..$ temperatureUnit         : chr "IPTS-68"
  .. ..$ systemUploadTime        : POSIXct[1:1], format: "2003-10-15 11:38:38"
  .. ..$ station                 : chr "Stn 2"
  .. ..$ date                    : POSIXct[1:1], format: "2003-10-15 11:38:38"
  .. ..$ startTime               : POSIXct[1:1], format: "2003-10-15 11:38:38"
  .. ..$ latitude                : num 44.7
  .. ..$ longitude               : num -63.6
  ..@ data         :List of 9
  .. ..$ scan         : int [1:181] 130 131 132 133 134 135 136 137 138 139 ...
  .. ..$ time         : num [1:181] 129 130 131 132 133 134 135 136 137 138 ...
  .. ..$ pressure     : num [1:181] 1.48 1.67 2.05 2.24 2.62 ...
  .. ..$ depth        : num [1:181] 1.47 1.66 2.04 2.23 2.6 ...
  .. ..$ temperature  : num [1:181] 14.2 14.2 14.2 14.2 14.2 ...
  .. ..$ salinity     : num [1:181] 29.9 29.9 29.9 29.9 29.9 ...
  .. ..$ temperature68: num [1:181] 14.2 14.2 14.2 14.2 14.2 ...
  ..@ processingLog:List of 2
  .. ..$ time : POSIXct[1:5], format: "2015-08-18 19:22:36" "2015-08-18 19:22:36" ...
  .. ..$ value: chr [1:5] "create 'ctd' object" "ctdAddColumn(x = res, column = swSigmaTheta(res@data$salinity,     res@data$temperature, res@data$pressure), name = "sigmaThet"| __truncated__ "read.ctd.sbe(file = file, processingLog = processingLog)" "converted temperature from IPTS-69 to ITS-90" ...

(where I’ve trimmed a few lines out just to make it shorter).

For a single object, there are several ways to access the information contained in the object. The first (and generally recommended) way is to use the [[ accessor — for example if you wanted the temperature values from a ctd object you would do

T <- ctd[['temperature']]

Another way is to access the element directly, by using the slot and list syntax, like:

T <- ctd@data$temperature

The disadvantage to the latter is that it requires knowledge of exactly where the desired field is in the object structure, and is brittle to downstream changes in the oce source.

Working with multiple objects

Frequently, especially with CTD data, it is common to have to work with a number of individual ctd objects — usually representing different casts. One way of organizing such objects, particularly if they share a common instrument, or ship, or experiment etc, is to collect them into a list.

For example, we could loop through a directory of individual cast files (or extract multiple casts from one file using ctdFindProfiles()), and append each one to a list like:

files <- dir(pattern='*.cnv')
casts <- list()
for (ifile in 1:length(files)) {
    casts[[ifile]] <- read.oce(files[ifile])

If we summarize the new casts list, we can see that it’s filled with ctd objects:

str(casts, 1) # the "1" means just go one level deep

Extracting fields from multiple objects at once

Say we want to extract all the temperature measurements from each object in our new list? How could we do it?

The brute force approach would be to loop through the list elements, and append the temperature field to a vector, maybe something like:

T_all <- NULL
for (i in 1:length(casts)) {
    T_all <- c(T_all, casts[[i]][['temperature']])

But in R, there’s a more elegant way — lapply()!

T_all <- unlist(lapply(casts, function(x) x[['temperature']]))

Forking and syncing branches with git and github

When forking a branch on github, it was not entirely clear to me how to sync branches other than master (e.g. to make a pull request). The following eventually seemed to work:

Set upstream remotes

First, you need to make sure that your fork is set up to track the original repo as upstream (from here):

List the current remotes:

$ git remote -v
# origin  https://github.com/YOUR_USERNAME/YOUR_FORK.git (fetch)
# origin  https://github.com/YOUR_USERNAME/YOUR_FORK.git (push)

Specify a new remote upstream repository that will be synced with the fork.

$ git remote add upstream https://github.com/ORIGINAL_OWNER/ORIGINAL_REPOSITORY.git

Verify the new upstream repository you’ve specified for your fork.

$ git remote -v
# origin    https://github.com/YOUR_USERNAME/YOUR_FORK.git (fetch)
# origin    https://github.com/YOUR_USERNAME/YOUR_FORK.git (push)
# upstream  https://github.com/ORIGINAL_OWNER/ORIGINAL_REPOSITORY.git (fetch)
# upstream  https://github.com/ORIGINAL_OWNER/ORIGINAL_REPOSITORY.git (push)

Syncing a fork

Now you’re ready to sync changes! See here for more details on syncing a “main” branch:

Fetch the branches:

$ git fetch upstream
# remote: Counting objects: 75, done.
# remote: Compressing objects: 100% (53/53), done.
# remote: Total 62 (delta 27), reused 44 (delta 9)
# Unpacking objects: 100% (62/62), done.
#  * [new branch]      master     -> upstream/master

Check out your fork’s local master branch.

$ git checkout master
# Switched to branch 'master'

Merge the changes from upstream/master into your local master branch. This brings your fork’s master branch into sync with the upstream repository, without losing your local changes.

$ git merge upstream/master
# Updating a422352..5fdff0f
# Fast-forward
#  README                    |    9 -------
#  README.md                 |    7 ++++++
#  2 files changed, 7 insertions(+), 9 deletions(-)
#  delete mode 100644 README
#  create mode 100644 README.md

Syncing an upstream branch

To sync upstream changes from a different branch, do the following (from here):

git fetch upstream                            #make sure you have all the upstream changes
git checkout --no-track upstream/newbranch    #grab the new branch but don't track it
git branch --set-upstream-to=origin/newbranch #set the upstream repository to your origin
git push                                      #push your new branch up to origin