Converting HDFs to ArcGIS Rasters
Many oceanography products are published in HDF format but ArcGIS 9.1 cannot read HDF. Arc 9.2 is supposed to support HDF but my experience has not been very good. For example, I have been unable to get it to read PO.DAAC QuickSCAT ocean winds HDFs, and it is extremely slow at reading NOAA NODC 4 km AVHRR Pathfinder SST HDFs. Arc also cannot detect the projection used by HDF files, which is not surprising due to the lack of a widely-adopted standard for HDF metadata. As a result, you must copy the HDF to some other format, shift and rescale the data, and define the projection yourself.
HDF conversion utilities exist but as far as I've seen, none are available as geoprocessing tools. The MGET Convert SDS in HDF to ArcGIS Raster tool will convert a Scientific Data Set (SDS) in an HDF file to ArcGIS raster format.
2009 update: In ArcGIS 9.3, you can use the built-in Extract Subdataset tool to extract an HDF SDS to a raster. Using that tool in conjunction with other built-in tools such as Define Projection, Clip, Single Output Map Algebra, and so on, you can reproduce almost all of the capabilities of MGET's conversion tool using built-in tools. Therefore, if you have ArcGIS 9.3, you may be able to accomplish your conversion job without installing MGET. But there are several reasons you may want to install MGET. First, I have never been able to get Extract Subdataset's Subdataset ID parameter to work; it always fails with "ERROR 999998: Unexpected Error.". Second, the MGET tool automatically handles compressed HDFs (e.g. files with .gz, .bz2, .zip, or .Z extensions) and collects many common post-conversion steps into a single tool, making it unnecessary to create a lengthy geoprocessing model with lots of temporary outputs for all of these steps. Finally, MGET is useful for other tasks besides HDF conversion; here are more examples.
MGET can now download and convert popular products automatically
Since we first wrote this example, we have developed a number of tools that completely automate the procedure of downloading and converting various popular products. These tools perform all of the necessary steps to create a collection of georeferenced ArcGIS-compatible rasters. Given a spatial and temporal extent, they efficiently download data for only the region of interest using the OPeNDAP protocol; when the data provider does not make the data available with OPeNDAP, the tools download and convert files from the providers FTP or HTTP server and clip them to your region of interest.
Before you proceed with manually downloading and converting files, we suggest to check whether MGET already has a specialized tool designed for your product of interest. These tools are available under the Data Products node of the MGET toolbox. Look for tools beginning with the word Create Rasters. Here are a few examples:
- Create Rasters for AVHRR Pathfinder V5 SST
- Create Rasters for NASA OceanColor L3 SMI Product
- Create Rasters for PO.DAAC MODIS L3 SST
- Create Rasters for Aviso SSH Product
- Create Rasters for HYCOM GLBa0.08 Equatorial 4D Variable
This list is incomplete, so of you do not see the product you're interested in, be sure you check the Data Products node before resorting to manually downloading and converting your product. Assuming you need to do that, the rest of this page describes the procedure using AVHRR Pathfinder version 5 SST data (MGET can download and convert it automatically, but we use it as an example because many people are familiar with it).
Finding the tool
The tool is in the Marine Geospatial Ecology Tools toolbox. If you do not see the toolboxes window, click the Show/Hide ArcToolbox Window button on the toolbar.
Examining the tool's parameters
The tool has six required parameters and a number of optional ones.
In addition to specifying the input HDF file and output raster, you must also specify the name of the SDS to extract from the HDF, the coordinates of the lower-left corner, and the cell size. The raster dimensions in rows and columns and the data type are automatically obtained by the tool from the HDF metadata. You will often also specify a NODATA value, and other optional parameters that are not shown.
The parameter values can usually be found in the documentation for the data product you are converting. Many of them are often also present as "attributes" in the HDF metadata, but because there is no reliable naming convention for these attributes, the tool cannot automatically obtain them. You can often figure out the parameter values yourself by extracting the HDF header and examining the attributes. The MGET Extract HDF Header tool will write the header to a text file.
Extracting the HDF header and finding parameter values
Running the MGET Extract HDF Header tool is very simple. You simply specify the input HDF file and the output text file to create:
The hard part is interpreting the header and finding the values to plug into the Convert SDS in HDF to ArcIGS Raster tool. Although all HDF headers adhere to a common data model of variables and attributes, each data provider defines their own variables and attributes. Here, I provide an example that will hopefully give you an idea what to look for.
- 199001.s04m1pfv50-sst-16b.hdf contains global monthly sea surface temperature data for January 1990 from the AVHRR sensor on the NOAA POES satellites, published by the NOAA NODC 4 km AVHRR Pathfinder Project.
Here is the extracted header:
Finding the SDS name in the HDF header
Open the HDF file in a text editor such as Notepad. Each SDS is represented as a "variable" in the HDF header. You can find the names of the SDSes in the header by searching for Variable Name = and then looking at what follows. In the Pathfinder SST header, your search will find three variables:
Variable Name = sst Index = 0 Type= 16-bit unsigned integer Ref. = 2 Rank = 2 Number of attributes = 12 Dim0: Name=lat Size = 4096 Scale Type = 64-bit floating point Number of attributes = 2 Dim1: Name=lon Size = 8192 Scale Type = 64-bit floating point Number of attributes = 2 Attr0: Name = dsp_PixelType Type = 8-bit unsigned integer Count= 1 Value = 1 Attr1: Name = dsp_PixelSize Type = 8-bit unsigned integer Count= 1 Value = 2 Attr2: Name = dsp_Flag Type = 16-bit unsigned integer Count= 1 Value = 0 Attr3: Name = dsp_nBits Type = 16-bit unsigned integer Count= 1 Value = 16 Attr4: Name = dsp_LineSize Type = 32-bit signed integer Count= 1 Value = 0 Attr5: Name = dsp_cal_name Type = 8-bit signed char Count= 11 Value = Temperature Attr6: Name = units Type = 8-bit signed char Count= 4 Value = Temp Attr7: Name = dsp_cal_eqnNumber Type = 16-bit unsigned integer Count= 1 Value = 2 Attr8: Name = dsp_cal_CoeffsLength Type = 16-bit unsigned integer Count= 1 Value = 8 Attr9: Name = dsp_cal_coeffs Type = 32-bit floating point Count= 2 Value = 0.075000 -3.000000 Attr10: Name = scale_factor Type = 32-bit floating point Count= 1 Value = 0.075000 Attr11: Name = add_off Type = 32-bit floating point Count= 1 Value = -3.000000 Dimension Variable Name = lat Index = 1 Scale Type= 64-bit floating point Ref. = 113 Rank = 1 Number of attributes = 2 Dim0: Name=lat Size = 4096 Attr0: Name = unit Type = 8-bit signed char Count= 14 Value = degrees_north\000 Attr1: Name = long_name Type = 8-bit signed char Count= 10 Value = latitude\000\000 Dimension Variable Name = lon Index = 2 Scale Type= 64-bit floating point Ref. = 115 Rank = 1 Number of attributes = 2 Dim0: Name=lon Size = 8192 Attr0: Name = unit Type = 8-bit signed char Count= 13 Value = degrees_east\000 Attr1: Name = long_name Type = 8-bit signed char Count= 11 Value = longitude\000\000
In this case, it is obvious that you want the variable named sst, simply because if its name. But the other two variables deserve some explanation in case you encounter them in other HDF files. They are "dimension" variables. These list the coordinates of other variables. In this file you can see two dimension variables, named lat and lon. In general you can ignore the dimension variables, although you can sometimes find out interesting information by examining their names or attributes. If you try to convert them, the tool will report an error because they are 1-dimensional and the tool can only convert 2-dimensional variables.
Finding the coordinates of the lower-left corner in the HDF header
This is more difficult than finding the SDS name. You must carefully scroll through the entire HDF header and look for something that seems like the right information. After lengthy examination, you may discover the following under File attributes:
Attr69: Name = dsp_nav_ximgcv Type = 32-bit floating point Count= 8 Value = 89.980003 0.000000 -89.980003 0.000000 -179.981979 0.000000 179.981979 0.000000
These look like coordinates of top, bottom, left, and right extents. If you search the Internet for dsp_nav_ximgcv you will get several hits, but when I searched, my results did not lead to a definition of that attribute. For Pathfinder SST, you must resort to the 4 km Pathfinder Version 5.0 User Guide. On that page, you will eventually find this:
Corner Coordinates: Upper Left (180d 0' 0.00"W, 90d 0' 0.00"N) Lower Left (180d 0' 0.00"W, 90d 0' 0.00"S) Upper Right (180d 0' 0.00"E, 90d 0' 0.00"N) Lower Right (180d 0' 0.00"E, 90d 0' 0.00"S) Center ( 0d 0' 0.00"E, 0d 0' 0.00"N)
From this experience, you should have learned that there is no 100% reliable means to finding out the coordinates. Your best bet is to carefully review both the HDF header and the documentation for the product you are working with.
Determining the cell size from the HDF header
In the previous step you found the coordinates of the lower-left corner, i.e. the left and bottom extent of the raster. The easiest way to determine the cell size is to find the top or right extent, calculate the height or width of the raster, and divide by the number of rows or columns:
Cell size = (top - bottom) / rows = (right - left) / columns
The number of rows and columns appear as the Size parameter of the Dim0 and Dim1 metadata for the HDF variable. For the Pathfinder SST file, there are 4096 rows and 8192 columns:
Variable Name = sst Index = 0 Type= 16-bit unsigned integer Ref. = 2 Rank = 2 Number of attributes = 12 Dim0: Name=lat Size = 4096 Scale Type = 64-bit floating point Number of attributes = 2 Dim1: Name=lon Size = 8192 Scale Type = 64-bit floating point Number of attributes = 2 ...
So the cell size is:
Pathfinder cell size = (top - bottom) / rows = 180 / 4096 = 0.0439453125
Occasionally you can also find the cell size listed as an attribute. For example, in HDF headers for MODIS chlorophyll data, there are two File attributes:
Attr42: Name = Latitude Step Type = 32-bit floating point Count= 1 Value = 0.041667 Attr43: Name = Longitude Step Type = 32-bit floating point Count= 1 Value = 0.041667
Notice that these round the cell size to six decimal places. We recommend you use as many decimal places as ArcGIS will allow you to enter. If you truncate at only six, you may notice that the output raster does not span exactly 360 degrees of longitude and 180 degrees of latitude; it may span slightly more or less, depending on how the rounding was done. The error may be insignificant for most purposes, but it is always best to minimize error whenever possible.
It is a common practice for data providers to use Dim0 for the rows and Dim1 for the columns, but occasionally they use Dim1 for the rows and Dim0 for the columns. When this happens, the converted raster will appear as though it was flipped about the diagonal axis. You can correct this problem using Transpose and Flip under Pre-conversion processing.
Converting the SDS
Now that you've populated all of the required parameters, you can convert the image and see what they look like in ArcMap width a blue-to-red gradient applied:
Using the optional parameters
When you loaded the converted raster into ArcMap, you might have noticed several problems:
- The land came out as blue, rather than NODATA.
- There is no coordinate system defined for the raster.
- The raster values are strange integers rather than the floating point values you expected (i.e. degrees C).
- There were no pyramids built for the rasters.
You can correct these problems with the optional parameters.
Setting land to NODATA
To set land to NODATA during the conversion process, you must specify the value to use for NODATA. You can determine this value by loading the rasters into ArcMap and using the Identify tool to obtain the value of pixels you know are land.
The Pathfinder SST data uses the value 0 for land:
After reconverting the files using these NODATA values, you'll notice that cloud pixels may have been marked as NODATA. Some data products, such as NASA OceanColor's MODIS chlorophyll product, mark clouds using the same value as land. Other data products, such as NOAA NODC's AVHRR Pathfinder SST provide separate HDF files or variables that indicate the quality of each pixel, or whether it failed or passed automated cloud tests. In the January 1990 image in this example, clouds are visible off the coast of North America:
One way to mark these as NODATA is:
- Obtain the file 199001.m04m1pfv50-qual.hdf, which contains a 0 to 7 "quality flag" for each pixel.
- Convert the qual SDS from that file to an ArcGIS raster.
- Use the ArcGIS Spatial Analyst Set Null tool to set pixels of the SST raster to NODATA where the corresponding quality flag raster value is less than 7.
The procedure for masking clouds or other defects is different for every data product. Some products, such as the AVHRR Pathfinder SST, even have multiple ways to mask clouds in the same product. Usually, you have to read the product documentation to figure out how to do it.
Defining the projection
You can instruct the tool to define the raster's projection by providing a value for the Define coordinate system parameter under Post-conversion processing:
You usually have to look up the coordinate system for the product you're converting in the product's documentation. Some global products that use a geographic coordinate system (i.e. longitudes and latitudes), including the product presented in this example, do not identify a coordinate system in their documentation, presumably because the data provider believes it doesn't matter which datum you use because the resolution is so coarse. For these products, we recommend you use the "WGS 1984" coordinate system (i.e. the geographic coordinate system that uses the WGS 1984 datum).
Converting integer values to the expected floating-point values
Many data products use integer data types to save space. In the example presented here, the values are stored as 16-bit integers. This allows the HDF file to be 2x or 4x smaller than it would be if the values were stored as the typical 32-bit or 64-bit floats. To obtain the floating-point values, you must run the integers through a scaling equation. The scaling equation is often present in the HDF header in some form, usually as attributes of the variable you're converting. If you can't find it there, check the product's documentation.
In the Pathfinder SST header, the sst variable has some promising attributes:
Attr7: Name = dsp_cal_eqnNumber Type = 16-bit unsigned integer Count= 1 Value = 2 Attr8: Name = dsp_cal_CoeffsLength Type = 16-bit unsigned integer Count= 1 Value = 8 Attr9: Name = dsp_cal_coeffs Type = 32-bit floating point Count= 2 Value = 0.075000 -3.000000 Attr10: Name = scale_factor Type = 32-bit floating point Count= 1 Value = 0.075000 Attr11: Name = add_off Type = 32-bit floating point Count= 1 Value = -3.000000
After accruing some experience with different data products, I can guess that the scale factor and add_off parameters are what we need. Compare the values of those to this text from the 4 km Pathfinder Version 5.0 User Guide:
Each of the seven parameter files listed above contains a mapped array with 8192 elements in longitude and 4096 in latitude, plus a vector of length 8192 identifying the longitudes and a vector with 4096 values indicating the latitudes. There are also global tags describing the entire contents as well as tags describing each of the 2 vectors and 1 array. The seven parameters are stored either as 8-bit or 16-bit unsigned integers which may be converted linearly (y = mx + b) to geophysical units using a scale (i.e., slope=m) and offset (i.e., intercept=b) according to the following table:
PARAMETER # BITS SCALE OFFSET UNITS ------------------------------------------------------------ "All-pixel" SST 16 bit 0.075 -3.0 Deg C First-guess SST 16 bit 0.075 -3.0 Deg C Standard Deviation 16 bit 0.150 0.0 Deg C Number of Observations 8 bit 1.000 0.0 Unitless Overall Quality Flag 8 bit 1.000 0.0* Unitless Mask 1 8 bit 1.000 0.0 Unitless Mask 2 8 bit 1.000 0.0 Unitless
We are dealing with SST data, so the scaling equation is:
The corresponding Map algebra expression is:
To instruct the tool to build pyramids for the ArcGIS raster, simply enable the Build pyramids option.
The final raster
After you populate the optional parameters, run the tool again to regenerate the raster. It should now have NODATA for land, have a coordinate system defined, use floating point values, and have pyramids already built. You can compare it to these rasters that I created using the parameters specified above.