Extracting ArcGIS rasters from the HYCOM global ocean model's 4D netCDFs
The Hybrid Coordinate Ocean Model, or HYCOM, is a sophisticated, high resolution system for simulating ocean physics. HYCOM is a set of equations refined over many years that describe the effects of the tides, winds, earth's rotation, and many other factors on the flow of water. Using supercomputers, the HYCOM team executes these equations at fine spatial and temporal scales to produce daily 3D snapshots of oceanographic variables such as temperature, salinity, and current velocity. At the time of this writing, HYCOM had been applied to global ocean simulations at 1/12º resolution and several ocean basins at higher resolution.
Acknowledgement: Thanks to Alan Wallcraft from the Naval Research Laboratory for providing critical hints that made this example possible.
Advantages of HYCOM over satellite data
- HYCOM resolution is similar to satellite resolution. The 1/12º simulations have a cell size of approximately 8.9 km at the equator. This is not as good as the popular global SST products; NODC AVHRR Pathfinder 5.0 and MODIS have cell sizes of 4 to 5 km. But it is far better than the popular Aviso geostrophic currents product, which has a 37 km cell size.
- HYCOM images are cloud-free. Daily satellite images are often very, very cloudy. Even 8-day images can be quite cloudy.
- HYCOM is 3D. Satellite images only provide data for the ocean surface.
- HYCOM provides forecasts. HYCOM was designed to predict the future state of the ocean in near real time. Thus the operational HYCOM datasets extend several days in the future, while satellite images only show past conditions.
Disadvantages of HYCOM
- HYCOM is a model, not reality. While HYCOM has a high spatial resolution, is very sophisticated, has been under refinement for years, has been tested extensively against in situ measurements and satellite estimates, and uses assimilation to improve its accuracy, it is important to keep in mind that HYCOM is a model. If you have in situ or satellite data available, we recommend you compare it against HYCOM output and form your own opinion about whether HYCOM provides enough accuracy for your situation.
Below is visual comparison of relatively cloud-free GOES satellite SST images and corresponding HYCOM SST images for the Gulf Stream. The top HYCOM image is from the HYCOM Global 1/12° Simulation (expt_05.8), while the bottom one is from the HYCOM + NCODA Global 1/12° Analysis (expt_90.8). As you can see, both HYCOM images resemble the GOES images at a broad spatial scale, but the bottom HYCOM image shows a better resemblence at finer scale. Also, both HYCOM images appear to deviate from the GOES images in the hottest and coldest areas by as much several degrees.
These differences may or may not be important, depending on how the data are used. In showing these comparisons, we make no claims about whether HYCOM output might or might not be suitable for your analysis. We simply urge you to make your own comparison and decide for yourself. An important difference between the two HYCOM datasets shown here is that the top one was a "free run" that simulated the global ocean without attempting to increase accuracy by assimilating in situ or satellite measurements, while the bottom one did use assimilation. The bottom one is also several years newer than the top one and is probably based on improved science. Finally, it is very difficult to model the fine scale structure and dynamics of the Gulf Stream, so it is not surprising to find that HYCOM does not match the satellite at fine scales. Better correspondance might be observed elsewhere, in less dynamic regions.
- HYCOM data are available for limited time ranges. At the time of this writing, global simulations were available from 2003 to the present, a north and equatorial Pacific simulation was avialble for 1979-2003, and a Gulf of Mexico simulation was available from 2008 to the present. Data that incorporate the latest assimilation techniques are only available for the most recent years.
- HYCOM provides only physical variables. These variables typically include temperature, salinity, u and v current vectors (eastward and northward velocity), sea surface height, and various properties of the mixed layer. For biological variables such as chlorophyll density or primary and secondary productivity, you must use products estimated from satellite data or other ocean models such as ROMS-CoSINE.
- Global HYCOM simulations use a complicated coordinate system. As discussed below, global HYCOM simulations use three different coordinate systems, making it difficult to import these data into a GIS as raster data. Most of the complexity of this example relates to this problem; please see below for details.
The structure of HYCOM output
HYCOM output is structured as a time series of snapshots of the state of simulated region. At each time period, typically 1 day, there are a set of 2D grids that provide ocean surface parameters such as sea surface height, thickness of the mixed layer, and so on. There are also a set of 3D grids that specify temperature, salinity, and u and v current vectors at a series of depths. There are typically 33 depth layers: 0, 10, 20, 30, 50, 75, 100, 125, 150, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1750, 2000, 2500, 3000, 3500, 4000, 4500, 5000, and 5500 meters. Although HYCOM executes the simulation at a variety of depths using a sophisticated algorithm, the model outputs are usually interpolated to these 33 common depths, for consistency with other global ocean datasets.
Acquring HYCOM output
At the time of this writing, HYCOM output could be acquired from http://www.hycom.org as OPeNDAP datasets in a THREDDS catalog and as series of netCDF files from an FTP or HTTP server. If you are familiar with OPeNDAP and can write the code necessary to acquire data through it, I suggest you use it, particularly if you only need a small subset of the data.
In the project that gave rise to this example, I needed to acquire four years of temperature and currents data for all 33 depth layers. This worked out to be about 9 TB of data. I found that it was faster to download netCDF files than go through OPeNDAP for that much data. Although the HYCOM FTP server appeared to impose a throughput limit of 1.25 MB/s per FTP download, I was able to run 8 simultaneous downloads with SmartFTP and maintain an overall throughput of 10 MB/s. If you have a fast Internet connection, this may a good way to acquire data quickly. The remainder of this example assumes you will also use netCDF files.
On the HYCOM FTP server, the directory housing each dataset was organized like this:
The data directory contained the model output. The 2d subdirectory contained one netCDF file per day. Each of these contained the 2D variables representing the state of the ocean surface, as you can see in this netCDF header. The salt, temp, uvel, and vvel subdirectories contained the salinity, temperature, eastward current velocity, and northward current velocity data, respectively. These also contained one netCDF per day, as shown above. Each netCDF contained a single physical variable (salinity, temperature, u, or v) as well as several auxiliary variables representing the grid coordinates, as you can see in this netCDF header. Each physical variable had four dimensions, time, depth, x, and y, with just one time slice but 33 depth slices.
The topo directory contained four files:
- regional.depth.a contained the bathymetry grid used by the simulation as a 2D binary array of big endian 32-bit IEEE 754 floats in row-major order (i.e. the order used by the C programming language).
- regional.depth.b was a text file specifying the dimensions of regional.depth.a.
- regional.grid.a contained grids specifying the latitudes and longitudes of the centers of the HYCOM grid cells.
- regional.grid.b was a text file specifying the dimensions of regional.grid.a.
The HYCOM User's Guide, available on the HYCOM documentation page describes these files in detail.
How HYCOM output is georeferenced
Two of the biggest challenges in working with HYCOM output are understanding how it is georeferenced and getting the data into a GIS-compatible raster format. The HYCOM User's Guide provides some essential information but is written for oceanographers who will be running HYCOM, not for ecologists who will consume HYCOM output. The essential parts are chapter 3: The HYCOM Grid, section 2.3: I/O File Formats in HYCOM, and section 5.1: File "regional.grid.[ab]". From these, some hints from Alan Wallcraft, and extensive experimentation, I was able to understand how HYCOM data is georeferenced and develop a strategy for getting it into ArcGIS-compatible format.
HYCOM simulations are either global or regional. This discussion concerns global simulations. I have not looked at any regional simulations yet so I don't know how much of it applies to them.
The main difficulty with global HYCOM simulations is that they use three different map projections in one grid, as shown below. Ignore the data (the colors) in this map and just focus on the geographic elements.
The central part of the grid uses a Mercator projection with square cells and a fixed cell size. This section can be converted directly to an ArcGIS-compatible raster format without much trouble. To the south is an equiangular section (i.e. a traditional "geographic" projection in ArcGIS terminology). Although these cells have a fixed size (in degrees), the cell width and height are different. The common raster formats compatible with ArcGIS require square cells, so this section is more difficult to get into ArcGIS. To the north is a "bi-polar" section. To my knowledge, there is no way to represent this directly in ArcGIS without ESRI introducing support for this projection.
My approach to dealing with this is to extract three ArcGIS rasters from each HYCOM grid. I call these the equatorial, Arctic, and Antarctic rasters. The equatorial raster is simply a verbatim copy of the Mercator section of the grid. The Arctic and Antarctic rasters are created by treating the HYCOM cells in those sections as points, projecting to a polar stereographic projection, and interpolating using ArcGIS's inverse distance weighting algorithm. The details of this are shown in the code below.
HYCOM simulations assume the Earth is a sphere with radius 6371001.0 meters (yes, 6371001.0, not 6371000.0). Alan Wallcraft provided this information; I did not find it in any documentation. Thus the rasters output by the code below use a custom datum having that spheroid. To get other geographic data into these projections, you can use a custom geographic transformation, as described in the Sinusoidal MODIS example.
Step-by-step instructions for extracting HYCOM output
Prerequisites / assumptions
- ArcGIS 9.1 or later is installed (note: this has only been tested with 9.3.1)
- MGET 0.7 or later is installed
- You are comfortable running programs from the Windows command prompt (a.k.a. DOS)
- You are using a global HYCOM dataset, not a regional one; the procedure will probably fail on regional datasets but you could adapt it to work with them
- You are extracting the one of the 3D variables called temperature, salinity, u, or v; you must modify the code to extract one of the 2D surface variables (e.g. ssh)
- Create a folder on your hard drive. I suggest and will assume you will use C:\HYCOM, although you can use a different folder if desired.
- Right-click on the following files and save them into C:\HYCOM:
- Extract_Rasters_From_HYCOM_Global_NetCDFs.py - Python script that performs the extraction
- Extract_Rasters_From_HYCOM_Global_NetCDFs.cmd - Batch file for running the Python script in a loop until it completes successfully, to work around memory leaks in ArcGIS that crash the script
- ascii2shp.exe - Helper program for rapidly converting a CSV file into a shapefile. Warning: be sure to save this with a .exe extension (some browsers may try to prevent it).
- Using your favorite web browser or FTP client, go to the HYCOM server and navigate to the directory containing the files for the HYCOM dataset you want. For this example, I used the HYCOM Global 1/12° Simulation (expt_05.8) which had files available by FTP here, but you should study the different datasets that are available and choose the one that best fits your needs. Remember that the script I wrote for this procedure only works with global HYCOM datasets. (If you have some programming skills, you could probably modify it to work with regional datasets.)
- On the HYCOM server, go to the topo directory, download the files regional.grid.a and regional.grid.b, and save them to C:\HYCOM.
- Create the directory C:\HYCOM\NetCDFs.
- On the HYCOM server, go to the data directory and then to the subdirectory for your oceanographic variable of interest, either salt, temp, u, or v. Download the netCDF files (.nc file extension) for your dates of interest. If you are downloading a lot of files and have a fast Internet connection, consider using a program like SmartFTP that can download multiple files simultaneously, to work around the per-file throughput limitation imposed by the server. (I was told by Michael McDonald of HYCOM that this is ok, although at the time of this writing the HYCOM server limited number of simultaneous connections from the same client IP address to 8.)
- Create the directory C:\HYCOM\Rasters.
- Start a CMD shell. (On Windows XP or Server 2003, click Start, Run, type CMD, and press Enter. On Vista or later, click Start, type CMD, and press Enter.) CD to the C:\HYCOM directory.
- Before extracting a large batch of data, you should verify that you can extract a single depth layer from a single file. Type the following into the CMD shell and press Enter (this assumes you want to extract the temperature variable for the 0 m depth layer from the file archv.2003_232_00_3zt.nc):
Extract_Rasters_From_HYCOM_Global_NetCDFs.py NetCDFs\archv.2003_232_00_3zt.nc temperature regional.grid.a regional.grid.b Rasters 0
You should see output that looks like this:
C:\HYCOM>Extract_Rasters_From_HYCOM_Global_NetCDFs.py NetCDFs\archv.2003_232_00_3zt.nc temperature regional.grid.a regional.grid.b Rasters 0 2009-10-01 10:52:16,187 INFO Initializing the ArcGIS geoprocessor. 2009-10-01 10:52:25,671 INFO This HYCOM grid has 4500 rows and 3298 columns. 2009-10-01 10:52:25,671 INFO Each 2D variable in a HYCOM .a file takes 59375616 bytes, including the padding to a 16 KB boundary. 2009-10-01 10:52:25,671 INFO Reading the longitude and latitude grids from regional.grid.a. 2009-10-01 10:52:30,625 INFO The cell size of the Mercator section of the grid is 8895.5955278281017 m. 2009-10-01 10:52:30,625 INFO The central meridian is -105.88. 2009-10-01 10:52:30,780 INFO The Mercator section of the grid spans rows 1126 through 2907, inclusive, where the top row is 0. 2009-10-01 10:52:30,780 INFO The center latitudes of the top and bottom rows of the Mercator section are 46.9873 and -66.1599. 2009-10-01 10:52:30,780 INFO The projected x and y coordinates of the lower-left corner of the Mercator section are -20015089.937613226, -9914094.8628181182. 2009-10-01 10:52:31,171 INFO Found 1 input files. Looking for existing output rasters. 2009-10-01 10:52:31,217 INFO Extracting 1 total depth slices spread across 1 input files. 2009-10-01 10:52:31,217 INFO Reading depth slice 0 from C:\HYCOM\NetCDFs\archv.2003_232_00_3zt.nc. 2009-10-01 10:52:44,203 INFO Creating ArcGIS raster Rasters\temperature\Equatorial\Depth_0\2003\temp20032322.img... 2009-10-01 10:53:32,233 INFO Writing 1788880 points to a temporary CSV file. 2009-10-01 10:53:50,483 INFO Executing program: C:\HYCOM\ascii2shp.exe C:\Temp\GeoEcoTemp_jjr8\tmp0vucia\points.csv C:\Temp\GeoEcoTemp_jjr8\tmp0vucia\points.shp X Y 2009-10-01 10:54:28,250 INFO ascii2shp.exe returned exit code 0. 2009-10-01 10:54:30,421 INFO Projecting the points to a polar stereographic projection. 2009-10-01 10:58:29,437 INFO Interpolating an Arctic polar stereographic raster with cell size 6065.1235625980898 m. 2009-10-01 10:59:06,421 INFO Copying ArcGIS raster C:\Temp\GeoEcoTemp_jjr8\tmp0vucia\raster to C:\HYCOM\Rasters\temperature\Arctic\Depth_0\2003\temp20032321.img... 2009-10-01 10:59:10,733 INFO Writing 753298 points to a temporary CSV file. 2009-10-01 10:59:19,655 INFO Executing program: C:\HYCOM\ascii2shp.exe C:\Temp\GeoEcoTemp_jjr8\tmpkdlzzm\points.csv C:\Temp\GeoEcoTemp_jjr8\tmpkdlzzm\points.shp X Y 2009-10-01 10:59:34,500 INFO ascii2shp.exe returned exit code 0. 2009-10-01 10:59:34,546 INFO Projecting the points to a polar stereographic projection. 2009-10-01 11:00:55,125 INFO At the northernmost row of the equirectangular Antarctic section, the cells are 3595.4712307818422 m wide and 8809.0454264488071 m high. 2009-10-01 11:00:55,125 INFO Interpolating an Antarctic polar stereographic raster with cell size 8809.0454264488071 m. 2009-10-01 11:01:09,203 INFO Finished extracting: 0:08:37 elapsed, 1 slices extracted, 0:08:37.985000 per slice. C:\HYCOM>
- Examine the C:\HYCOM\Rasters directory. It should look like the example below, but not be as fully populated. I show a more populated one here so you'll get a better idea what it will look like after you extract more data.
Within C:\HYCOM\Rasters is one subdirectory for each netCDF variable that you have extracted. With the steps above, only the temperature subdirectory will be created. The example below shows all four: salinity, temperature, u, and v.
Within each variable subdirectory are three subdirectories: Antarctic, Arctic, and Equatorial. These hold rasters for the three sections of the HYCOM grid. Each time you extract a depth layer for a given variable from a given file, one raster will be written to each of these subdirectories.
Within those are subdirectories specifying the depth. Depth_0 is the surface layer, Depth_100 is the 100 m depth layer, and so on. With the steps above, only Depth_0 will be created. The example below shows four depth layers.
Within the depth directories are year subdirectories. The steps above will produce one year (2003), while the example below shows two years.
The year directories contain rasters. Each raster name begins with an abbreviation for the variable (either salt, temp, u, or v) followed by the four-digit year, followed by the three-digit day of the year, followed by a one-digit code indicating the region (1 - Arctic, 2 - Equatorial, 3 - Antarctic). The steps above will create one raster in each year directory (temp2003292*) while the example below shows two rasters. The rasters are stored in ERDAS Imagine format (.img format). If you prefer another format such as ArcInfo Binary Grid or GeoTIFF, you can edit the script to produce that format.
- Examine the three output rasters. The rasters extracted from depth 0 for the temperature file used in this example look like the following. Note that the temperature scale is different in each of the three images; do not compare the colors between the images here. These are from day 232 of 2003, which is Arctic summer and Antarctic winter, thus the Antarctic image shows a relatively constant temperature of about -1.8 °C. The two red intrusions are about 0 to 0.5 °C.
- Now you are ready to do some batch extractions. The Python script is capable of doing this but there is a problem: some of the ArcGIS tools leak memory. After the Python script runs for a few hours it will crash. To work around this, I wrote the .cmd script to call the Python script in a loop until it completes successfully. You pass the same command line arguments to it. But now you can specify a wildcard expression (technically, a "glob" expression) for the file name and also a list of depths to extract. For example, to process all temperature files from 2003 and extract depth layers at 0, 100, 500, and 5000 meters:
Extract_Rasters_From_HYCOM_Global_NetCDFs.cmd NetCDFs\archv.2003_*_3zt.nc temperature regional.grid.a regional.grid.b Rasters 0 100 500 5000
It is important that you specify valid depth values. If you do not, the tool will report an error and remind you what they are. Also, the tool can only extract one variable at a time. If you downloaded files for multiple variables, be sure your wildcard expression restricts the tool to the files for one variable or you will receive an error.
You can monitor the tool's progress by watching the CMD window. Every time the tool completes an extraction it will issue a status report giving an estimated time of completion. Eventually, if the tool runs long enough it will run out of memory. When this happens you will likely see either of the following two errors. When that happens, the script will pause for 10 seconds and then restart. It will pick up where it left off. If it continues successfully then you know it was the memory leak problem. If it keeps repeating the same thing over and over, then something else is wrong and you should press Ctrl-Break to stop the script and investigate the problem.
GeoEco.ArcGIS.ArcGISError: An ArcGIS geoprocessing function failed. This usually results from a problem with your inputs. Please check them and try again. Also review any preceding error messages and the detailed error information that appears at the end of this message. If you suspect the failure is due to a programming mistake in this tool or ArcGIS, please contact the author of this tool for assistance. Detailed error information: ArcGIS Geoprocessor object 0x01583068: Invocation of Idw_sa(*args=('C:
projected_points.shp', 'Variable', 'C:
raster', 6065.1235625980898, 1, 'FIXED 8577.3799996948419')) raised ExecuteError: ERROR 010067: Error in executing grid expression.
Failed to execute (Idw).
- The HYCOM u variable provides the water velocity in the east/west direction, with east being positive. The v variable provides water velocity in the north/south direction, with north being positive. On the Mercator rasters, east is to the left and north is up. But on the polar stereographic rasters, east is either anti-clockwise around the center of the image (Arctic) or clockwise (Antarctic), while north is either towards the center of the image (Arctic) or away from it (Antarctic). If you use a tool to produce a quiver plot of the current vectors for the polar rasters, the tool must be smart enough to understand this. The tool that I have privately provided to people in the past for creating vector shapefiles does not understand this.
- The extents of the rasters for deep layers are smaller than those for the surface layers. This is because the deeper layers have less data (i.e. the water stops at the sea floor) and the Python script does not try to create layers on a fixed extent. TODO: Add pictures showing this. You could modify the script to set the geoprocessing's extent prior to doing the extractions and it should produce them on a consistent extent.
- In the Python script, a lot of the execution time is spent handling the Arctic and Antarctic regions. The script writes the points out to a CSV file, converts it to a shapefile, and then interpolates a raster. If you are just interested in the Mercator section, you can comment out the code that creates the Arctic and Antarctic regions and the script will go a lot faster. Be sure to also comment out the portion of the code that checks for the existence of the Arctic and Antarctic rasters. If you don't comment that out, every time the script restarts, it will start from the beginning again and probably never complete a large batch of extractions.