I first tried running the well-known numerical ocean model MITgcm about 12 years ago when I was thinking of trying to run some numerical simulations of shoaling nonlinear internal waves for my thesis work. As it turned out, I had more observational data than even I was going to be able to look at, so I shelved the modeling part of the project and moved on to pure data analysis.
Fast forward a decade or so, and I’m interested again. What follows is a crude log of my attempts over the course of an afternoon to get MITgcm up and running on my (quite new) M1 MacBook Pro, including pieces such as getting parallel processing (MPI) and netCDF output to work.
Setup notes: Installing MITgcm stuff on an M1 Mac
I am starting out by following instructions here:
https://jklymak.github.io/MITgcmExampleSteadyGauss/install.html
I already had Xcode and Homebrew installed. Then:
$ brew install gcc
$ brew install open-mpi
Step 6 at the link doesn’t work, but it seems like the newer version should be:
$ brew install hdf5-mpi
Then:
$ brew install netcdf
which gave:
==> Installing netcdf dependency: hdf5
Error: Cannot install hdf5 because conflicting formulae are installed.
hdf5-mpi: because hdf5-mpi is a variant of hdf5, one can only use one or the other
Please `brew unlink hdf5-mpi` before continuing.
Unlinking removes a formula's symlinks from /opt/homebrew. You can
link the formula again after the install finishes. You can --force this
install, but the build may fail or cause obscure side effects in the
resulting software.
so I did:
$ brew unlink hdf5-mpi
(Note: based on Jody’s comments, MITgcm doesn’t need hdf5-mpi; it is only required for a parallel-aware netCDF interface he is developing, and I don’t know the current status of that.) Now that there is a separate hdf5 package with MPI built in, I don’t know what the consequences would be for the two “build from source” steps in Jody’s example.
Then I did:
$ brew reinstall --build-from-source netcdf
It’s not clear to me whether the --build-from-source is actually necessary here; it might be part of the steps Jody added for his netCDF interface work. A simple brew install netcdf might actually be fine …
Trying an example:
Clone the repo and go into the exp2 example (following https://mitgcm.readthedocs.io/en/latest/getting_started/getting_started.html):
$ git clone https://github.com/MITgcm/MITgcm.git
$ cd MITgcm/verification/exp2/build
Generate the Makefile:
$ ../../../tools/genmake2 -mods ../code/ -optfile ../../../tools/build_options/darwin_amd64_gfortran
GENMAKE :
A program for GENerating MAKEfiles for the MITgcm project.
For a quick list of options, use "genmake2 -h"
or for more detail see the documentation, section "Building the model"
(under "Getting Started") at: https://mitgcm.readthedocs.io/
=== Processing options files and arguments ===
getting local config information: none found
Warning: ROOTDIR was not specified ; try using a local copy of MITgcm found at "../../.."
getting OPTFILE information:
using OPTFILE="../../../tools/build_options/darwin_amd64_gfortran"
get Compiler-version: '11'
getting AD_OPTFILE information:
using AD_OPTFILE="../../../tools/adjoint_options/adjoint_default"
check Fortran Compiler... pass (set FC_CHECK=5/5)
check makedepend (local: 0, system: 1, 1)
=== Checking system libraries ===
Do we have the system() command using gfortran... yes
Do we have the fdate() command using gfortran... yes
Do we have the etime() command using gfortran... c,r: yes (SbR)
Can we call simple C routines (here, "cloc()") using gfortran... yes
Can we unlimit the stack size using gfortran... yes
Can we register a signal handler using gfortran... no
Can we use stat() through C calls... yes
Can we create NetCDF-enabled binaries... yes
skip check for LAPACK Libs
Can we call FLUSH intrinsic subroutine... yes
=== Setting defaults ===
Adding MODS directories: ../code/
Making source files in eesupp from templates
Making source files in pkg/exch2 from templates
Making source files in pkg/regrid from templates
=== Determining package settings ===
getting package dependency info from ../../../pkg/pkg_depend
getting package groups info from ../../../pkg/pkg_groups
checking list of packages to compile:
using PKG_LIST="../code//packages.conf"
before group expansion packages are: gfd cd_code
replacing "gfd" with: mom_common mom_fluxform mom_vecinv generic_advdiff debug mdsio rw monitor
after group expansion packages are: mom_common mom_fluxform mom_vecinv generic_advdiff debug mdsio rw monitor cd_code
applying DISABLE settings
applying ENABLE settings
packages are: cd_code debug generic_advdiff mdsio mom_common mom_fluxform mom_vecinv monitor rw
applying package dependency rules
packages are: cd_code debug generic_advdiff mdsio mom_common mom_fluxform mom_vecinv monitor rw
Adding STANDARDDIRS='eesupp model'
Searching for *OPTIONS.h files in order to warn about the presence
of "#define "-type statements that are no longer allowed:
found CPP_EEOPTIONS="../../../eesupp/inc/CPP_EEOPTIONS.h"
found CPP_OPTIONS="../../../model/inc/CPP_OPTIONS.h"
Creating the list of files for the adjoint compiler.
=== Creating the Makefile ===
setting INCLUDES
Determining the list of source and include files
Writing makefile: Makefile
Add the source list for AD code generation
Making list of "exceptions" that need ".p" files
Making list of NOOPTFILES
Add rules for links
Adding makedepend marker
=== Done ===
original 'Makefile' generated successfully
=> next steps:
> make depend
> make (<-- to generate executable)
Then we run:
$ make depend
which for me generates lots of warnings, e.g.
In file included from ini_model_io.F:21:
./EESUPPORT.h:11:20: warning: empty character constant [-Winvalid-pp-token]
C | environment'' code. This data should be private to the |
^
1 warning generated.
but does produce:
Appending dependencies to Makefile
../../../tools/f90mkdepend >> Makefile
rm -f makedepend.out
Looks like it worked! Now to test.
$ cd ../run
$ ln -s ../input/* .
$ cp ../build/mitgcmuv .
$ ./mitgcmuv
Works!
(PID.TID 0000.0001) // Avg. barrier spins = 1.00E+00
PROGRAM MAIN: Execution ended Normally
STOP NORMAL END
However, I didn’t get netCDF files … there must be a setting somewhere that I missed. Will look later.
Let’s try a 2-core run. This needs a recompile with MPI; I think that because there is a SIZE.h_mpi in code/, I don’t need to specify anything else other than doing:
$ ../../../tools/genmake2 -mods ../code -mpi -optfile ../../../tools/build_options/darwin_amd64_gfortran
which produces:
GENMAKE :
A program for GENerating MAKEfiles for the MITgcm project.
For a quick list of options, use "genmake2 -h"
or for more detail see the documentation, section "Building the model"
(under "Getting Started") at: https://mitgcm.readthedocs.io/
=== Processing options files and arguments ===
getting local config information: none found
Warning: ROOTDIR was not specified ; try using a local copy of MITgcm found at "../../.."
getting OPTFILE information:
using OPTFILE="../../../tools/build_options/darwin_amd64_gfortran"
get Compiler-version: '11'
getting AD_OPTFILE information:
using AD_OPTFILE="../../../tools/adjoint_options/adjoint_default"
check Fortran Compiler... pass (set FC_CHECK=5/5)
check makedepend (local: 0, system: 1, 1)
Turning on MPI cpp macros
=== Checking system libraries ===
Do we have the system() command using mpif77... yes
Do we have the fdate() command using mpif77... yes
Do we have the etime() command using mpif77... c,r: yes (SbR)
Can we call simple C routines (here, "cloc()") using mpif77... yes
Can we unlimit the stack size using mpif77... yes
Can we register a signal handler using mpif77... no
Can we use stat() through C calls... yes
Can we create NetCDF-enabled binaries... yes
skip check for LAPACK Libs
Can we call FLUSH intrinsic subroutine... yes
=== Setting defaults ===
Adding MODS directories: ../code
Making source files in eesupp from templates
Making source files in pkg/exch2 from templates
Making source files in pkg/regrid from templates
=== Determining package settings ===
getting package dependency info from ../../../pkg/pkg_depend
getting package groups info from ../../../pkg/pkg_groups
checking list of packages to compile:
using PKG_LIST="../code/packages.conf"
before group expansion packages are: gfd cd_code
replacing "gfd" with: mom_common mom_fluxform mom_vecinv generic_advdiff debug mdsio rw monitor
after group expansion packages are: mom_common mom_fluxform mom_vecinv generic_advdiff debug mdsio rw monitor cd_code
applying DISABLE settings
applying ENABLE settings
packages are: cd_code debug generic_advdiff mdsio mom_common mom_fluxform mom_vecinv monitor rw
applying package dependency rules
packages are: cd_code debug generic_advdiff mdsio mom_common mom_fluxform mom_vecinv monitor rw
Adding STANDARDDIRS='eesupp model'
Searching for *OPTIONS.h files in order to warn about the presence
of "#define "-type statements that are no longer allowed:
found CPP_OPTIONS="./CPP_OPTIONS.h"
found CPP_EEOPTIONS="./CPP_EEOPTIONS.h"
Creating the list of files for the adjoint compiler.
=== Creating the Makefile ===
setting INCLUDES
Determining the list of source and include files
Writing makefile: Makefile
Add the source list for AD code generation
Making list of "exceptions" that need ".p" files
Making list of NOOPTFILES
Add rules for links
Adding makedepend marker
=== Done ===
I don’t know how much of the material about “modules” and $MPI_INC_DIR described here I actually need:
https://mitgcm.readthedocs.io/en/latest/getting_started/getting_started.html#building-with-mpi
Let’s just see if it works with what I did above:
$ make clean
$ make depend
worked.
$ make
Also seems to have worked.
Now we recopy the executable to the run/ directory, and run with MPI:
$ cd ../run
$ cp ../build/mitgcmuv .
$ mpirun -np 2 ./mitgcmuv
Which … didn’t work. The error log suggests it’s because I didn’t configure things properly:
$ less STDERR.0000
(PID.TID 0000.0001) *** ERROR *** EEBOOT_MINIMAL: No. of procs= 2 not equal to nPx*nPy= 1
(PID.TID 0000.0001) *** ERROR *** EEDIE: earlier error in multi-proc/thread setting
(PID.TID 0000.0001) *** ERROR *** PROGRAM MAIN: ends with fatal Error
So let’s try again. Probably I just need to copy code/SIZE.h_mpi over to SIZE.h, then redo the genmake2, make depend, and make steps (after completely nuking everything in the build/ directory).
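The error above comes from a startup consistency check: the number of MPI processes passed to mpirun -np must equal nPx*nPy from the SIZE.h that was compiled in, and the default SIZE.h still had nPx = nPy = 1. As a minimal sketch (a hypothetical helper, not something that ships with MITgcm), the Python below pulls the integer literals out of a SIZE.h-style PARAMETER block and repeats that check before launching. The nSx/nSy/nPx/nPy values in the sample block are the MPI layout used later in this post; the sNx, sNy, OLx, OLy, and Nr values are made up, and the regex assumes one "name = value" entry per continuation line.

```python
import re

# SIZE.h-style PARAMETER block. nSx/nSy/nPx/nPy match the MPI SIZE.h used
# in this post; sNx, sNy, OLx, OLy and Nr are made-up illustration values.
EXAMPLE_SIZE_H = """
      PARAMETER (
     &           sNx =  45,
     &           sNy =  20,
     &           OLx =   3,
     &           OLy =   3,
     &           nSx =   1,
     &           nSy =   2,
     &           nPx =   2,
     &           nPy =   1,
     &           Nx  = sNx*nSx*nPx,
     &           Ny  = sNy*nSy*nPy,
     &           Nr  =   4)
"""


def parse_size_h(text):
    """Return {name: value} for each integer literal in the PARAMETER block.

    Derived entries such as 'Nx = sNx*nSx*nPx' are skipped because their
    right-hand side is an expression rather than a literal.
    """
    pattern = re.compile(r"^\s*&\s*(\w+)\s*=\s*(\d+)\s*[,)]", re.MULTILINE)
    return {name: int(value) for name, value in pattern.findall(text)}


def check_process_count(params, nprocs):
    """Return an error string if mpirun -np would not equal nPx*nPy, else None."""
    expected = params["nPx"] * params["nPy"]
    if nprocs != expected:
        return f"No. of procs= {nprocs} not equal to nPx*nPy= {expected}"
    return None


if __name__ == "__main__":
    params = parse_size_h(EXAMPLE_SIZE_H)
    for n in (1, 2):
        print(n, check_process_count(params, n) or "OK")
```

Pointing parse_size_h at the text of build/SIZE.h before each mpirun would catch this mismatch without a failed run.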
Then:
$ mpirun -np 2 ./mitgcmuv
STOP NORMAL END
STOP NORMAL END
Yay!
Now to figure out how to do netCDF and then look at data …
NetCDF output
Ok, looks like I need to add mnc to the packages.conf file in code/, then nuke the build directory and rebuild as above.
Ok, that didn’t work. I still only get *.data/*.meta files …
According to here:
https://mitgcm.readthedocs.io/en/latest/outp_pkgs/outp_pkgs.html#using-pkg-mnc
I need to run a genmake2 that looks like:
$ ../../../tools/genmake2 -mods ../code -mpi -enable=mnc -optfile ../../../tools/build_options/darwin_amd64_gfortran
Then the classic make depend and make, move mitgcmuv to the run/ directory, and … I still don’t get any netCDF files.
Maybe it’s because I need to set it in the input/data.pkg file? I modified it to look like:
# Packages
&PACKAGES
useMNC=.TRUE.,
&
Then build and run again. Actually, I shouldn’t need to rebuild, because this is a runtime option and I already built with MNC enabled.
This now fails with:
STOP ABNORMAL END: S/R OPEN_COPY_DATA_FILE
STOP ABNORMAL END: S/R OPEN_COPY_DATA_FILE
Let’s just switch back to single processor, and build without mpi to see if netCDF will work.
This doesn’t work either, but at least the ./mitgcmuv output shows that it’s failing because it can’t find a data.mnc file:
(PID.TID 0000.0001) MNC_READPARMS: opening file 'data.mnc'
File data.mnc does not exist!
I’m just going to try copying the one from Jody’s “SteadyGauss” example, which contains:
# =====================================================================
# | Parameters for MNC (NetCDF) |
# =====================================================================
# Example "data.mnc" file
# Lines beginning "#" are comments
&MNC_01
/
# Note: Some systems use & as the
# namelist terminator. Other systems
# use a / character (as shown here).
Remember to make sure everything in input/ is linked into run/ (ln -s ../input/* .), and then run.
It worked! I see .nc files (and .data/.meta files):
$ ls
PHrefC.data mitgcmuv* pickup.ckptA.002.002.meta
PHrefC.meta monitor.0000000000.t001.nc pickup_cd.ckptA.001.001.data
PHrefF.data monitor_grid.0000000000.t001.nc pickup_cd.ckptA.001.001.meta
PHrefF.meta phiHyd.0000000000.t001.nc pickup_cd.ckptA.001.002.data
RhoRef.data phiHyd.0000000000.t002.nc pickup_cd.ckptA.001.002.meta
RhoRef.meta phiHyd.0000000000.t003.nc pickup_cd.ckptA.002.001.data
SSS.bin@ phiHyd.0000000000.t004.nc pickup_cd.ckptA.002.001.meta
SST.bin@ phiHydLow.0000000000.t001.nc pickup_cd.ckptA.002.002.data
STDERR.0000 phiHydLow.0000000000.t002.nc pickup_cd.ckptA.002.002.meta
data@ phiHydLow.0000000000.t003.nc salt.bin@
data.mnc@ phiHydLow.0000000000.t004.nc state.0000000000.t001.nc
data.pkg@ pickup.ckptA.001.001.data state.0000000000.t002.nc
eedata@ pickup.ckptA.001.001.meta state.0000000000.t003.nc
eedata.mth@ pickup.ckptA.001.002.data state.0000000000.t004.nc
grid.t001.nc pickup.ckptA.001.002.meta theta.bin@
grid.t002.nc pickup.ckptA.002.001.data topog.bin@
grid.t003.nc pickup.ckptA.002.001.meta windx.bin@
grid.t004.nc pickup.ckptA.002.002.data windy.bin@
Apparently the PHrefC.* files showing up is a bug; Ruth sees them in all of her runs too.
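The listing mixes mdsio output (*.data/*.meta, pickups), symlinked inputs, and pkg/mnc netCDF files. Judging only from the names above, the netCDF files look like either prefix.ITERATION.tNNN.nc (10-digit iteration, 3-digit tile) or prefix.tNNN.nc for grid files; that pattern is an assumption inferred from this one listing. A small sketch that groups the netCDF files by output stream and lists which tiles exist for each:

```python
import os
import re
from collections import defaultdict

# Assumed naming pattern, inferred from the run/ listing above:
#   <prefix>.<10-digit iteration>.t<3-digit tile>.nc   e.g. state.0000000000.t001.nc
#   <prefix>.t<3-digit tile>.nc                        e.g. grid.t001.nc
MNC_NAME = re.compile(
    r"^(?P<prefix>\w+)(?:\.(?P<iteration>\d{10}))?\.t(?P<tile>\d{3})\.nc$"
)


def group_mnc_files(names):
    """Map (prefix, iteration or None) -> sorted tile numbers.

    Names that don't look like mnc output (e.g. *.data, *.meta, *.bin)
    are ignored.
    """
    groups = defaultdict(list)
    for name in names:
        m = MNC_NAME.match(name)
        if m is None:
            continue
        key = (m.group("prefix"), m.group("iteration"))
        groups[key].append(int(m.group("tile")))
    return {key: sorted(tiles) for key, tiles in groups.items()}


if __name__ == "__main__":
    # Run from run/; os.listdir gives plain names without the ls -F suffixes (@, *).
    for (prefix, iteration), tiles in group_mnc_files(os.listdir(".")).items():
        print(prefix, iteration or "-", tiles)
```

For the listing above this would report tiles 1 to 4 for each of state, phiHyd, phiHydLow, and grid, and tile 1 only for monitor and monitor_grid.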
The state* files are the netCDF files that contain the model variables. Note that there are 4 of them: even though I wasn’t using MPI, the SIZE.h for this example (exp2) has nSx=2 and nSy=2, which are the number of tiles per process in x and y. So the single process handles 4 tiles, and the output is written one file per tile. (That doesn’t by itself mean it ran multithreaded; the thread counts are set separately, via nTx/nTy in eedata.) What this means is that in order to look at the full model output you have to piece together the 4 files. There are python/matlab scripts to do this (somewhere), but probably no one has done it for R. I’m not sure whether there’s good documentation on how to read these in generally.
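Reading each tile needs a netCDF library (for example the netCDF4 or xarray Python packages, not shown here), but once each tile’s 2-D field is in memory, piecing them together is index bookkeeping. A minimal sketch, assuming (not verified against the MITgcm source) that tile numbers increase fastest in x starting from 1, that all tiles have the same sNy x sNx shape, and that halo points are not included. The tile grid is n_tiles_x = nSx*nPx by n_tiles_y = nSy*nPy, which is 2 x 2 for this exp2 run.

```python
def stitch_tiles(tiles, n_tiles_x, n_tiles_y):
    """Assemble equally sized 2-D tiles into one global field.

    tiles: dict mapping 1-based tile number -> list of rows (indexed [j][i]),
           each tile holding sNy rows of sNx values.
    Assumes tile numbers run fastest in x, so tiles 1..n_tiles_x form the
    first row of tiles.
    """
    expected = n_tiles_x * n_tiles_y
    if len(tiles) != expected:
        raise ValueError(f"expected {expected} tiles, got {len(tiles)}")

    global_field = []
    for ty in range(n_tiles_y):
        # The tiles that make up this row of tiles, in increasing x.
        row_of_tiles = [tiles[ty * n_tiles_x + tx + 1] for tx in range(n_tiles_x)]
        for j in range(len(row_of_tiles[0])):
            # Concatenate row j of each tile to form one full-width row.
            global_field.append([v for tile in row_of_tiles for v in tile[j]])
    return global_field


if __name__ == "__main__":
    # Toy 2 x 2 tile layout with 1 x 2 tiles, standing in for t001..t004.
    toy = {1: [[1, 2]], 2: [[3, 4]], 3: [[5, 6]], 4: [[7, 8]]}
    for row in stitch_tiles(toy, 2, 2):
        print(row)
```

Real mnc variables usually carry time and depth dimensions too, so in practice this would be applied to one 2-D slice at a time.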
To really control output I should be using the “diagnostics” package. Ruth will hopefully go over this with the group.
Note that I tried changing SIZE.h to have:
& nSx = 1,
& nSy = 1,
and when I recompile and try to run I get:
(PID.TID 0000.0001) INI_PARMS ; starts to read PARM04
At line 4910 of file ini_parms.for (unit = 11, file = 'scratch1.000000000')
Fortran runtime error: Repeat count too large for namelist object dely
Error termination. Backtrace:
#0 0x1073d5147
#1 0x1073d5dbf
#2 0x1073d6743
#3 0x10749a387
#4 0x1074a2683
#5 0x1074a2937
#6 0x1048966d7
#7 0x1048a2a93
#8 0x1048bbdf3
#9 0x10483e9c3
#10 0x1048c3597
So … not sure what that’s about. Probably some other config parameter that I got wrong.
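One plausible explanation (not confirmed here): SIZE.h defines the global grid as Nx = sNx*nSx*nPx and Ny = sNy*nSy*nPy, so dropping nSx and nSy from 2 to 1 without doubling sNx and sNy halves the grid in each direction. If the PARM04 section of input/data gives delY with a repeat count written for the original Ny (a form like delY=40*4.0; I haven’t checked exp2’s actual values), the namelist now supplies more values than the delY array holds, which matches gfortran’s “Repeat count too large” complaint. The MPI layout used next (nSx=1, nSy=2, nPx=2, nPy=1) keeps 2 tiles in each direction, which would avoid the problem. The sketch below does that arithmetic with made-up sNx/sNy values:

```python
def global_grid_size(params):
    """Return (Nx, Ny) using Nx = sNx*nSx*nPx and Ny = sNy*nSy*nPy."""
    nx = params["sNx"] * params["nSx"] * params["nPx"]
    ny = params["sNy"] * params["nSy"] * params["nPy"]
    return nx, ny


def count_namelist_values(rhs):
    """Count the values in a namelist right-hand side like '20*2.0, 3*1.5,'.

    'r*c' means the value c repeated r times. This simple sketch ignores
    quoted strings and null values.
    """
    total = 0
    for item in rhs.split(","):
        item = item.strip()
        if not item:
            continue  # trailing comma
        total += int(item.split("*", 1)[0]) if "*" in item else 1
    return total


if __name__ == "__main__":
    # Hypothetical numbers, not exp2's real ones: sNy = 20 with 2 tiles in y
    # gives Ny = 40, and a data file written for that grid with delY=40*4.0.
    before = {"sNx": 45, "sNy": 20, "nSx": 2, "nSy": 2, "nPx": 1, "nPy": 1}
    after = dict(before, nSx=1, nSy=1)
    n_dely = count_namelist_values("40*4.0,")
    for label, params in (("nSy=2", before), ("nSy=1", after)):
        _, ny = global_grid_size(params)
        status = "fits" if n_dely <= ny else "too many delY values"
        print(f"{label}: Ny={ny}, delY entries={n_dely} -> {status}")
```

If this is the cause, the fix would be either to keep the tile counts and change only how they are split across processes, or to scale sNx and sNy up when reducing nSx and nSy.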
So now I go back to checking whether MPI AND netCDF work together. I revert to the SIZE.h file configured for MPI, which has:
& nSx = 1,
& nSy = 2,
& nPx = 2,
& nPy = 1,
so it should need two processes to run (nPx*nPy = 2). I rebuild everything, then go into the run/ directory and do:
$ mpirun -np 2 ./mitgcmuv
which works! It runs on 2 processors and makes netCDF files (which I don’t really know how to read properly yet …).