Converting HDFs to ArcGIS Rasters

Many oceanography products are published in HDF format but ArcGIS 9.1 cannot read HDF. Arc 9.2  is supposed to support HDF but my experience has not been very good. For example, I have been unable to get it to read  PO.DAAC QuickSCAT ocean winds HDFs, and it is extremely slow at reading  NOAA NODC 4 km AVHRR Pathfinder SST HDFs. Arc also cannot detect the projection used by HDF files, which is not surprising due to the lack of a widely-adopted standard for HDF metadata. As a result, you must copy the HDF to some other format, shift and rescale the data, and define the projection yourself.

HDF conversion utilities exist but as far as I've seen, none are available as geoprocessing tools. The MGET Convert SDS in HDF to ArcGIS Raster tool will convert a Scientific Data Set (SDS) in an HDF file to ArcGIS raster format.

2009 update: In ArcGIS 9.3, you can use the built-in  Extract Subdataset tool to extract an HDF SDS to a raster. Using that tool in conjunction with other built-in tools such as Define Projection, Clip, Single Output Map Algebra, and so on, you can reproduce almost all of the capabilities of MGET's conversion tool using built-in tools. Therefore, if you have ArcGIS 9.3, you may be able to accomplish your conversion job without installing MGET. But there are several reasons you may want to install MGET. First, I have never been able to get Extract Subdataset's Subdataset ID parameter to work; it always fails with "ERROR 999998: Unexpected Error.". Second, the MGET tool automatically handles compressed HDFs (e.g. files with .gz, .bz2, .zip, or .Z extensions) and collects many common post-conversion steps into a single tool, making it unnecessary to create a lengthy geoprocessing model with lots of temporary outputs for all of these steps. Finally, MGET is useful for other tasks besides HDF conversion; here are more examples.

MGET can now download and convert popular products automatically

Since we first wrote this example, we have developed a number of tools that completely automate the procedure of downloading and converting various popular products. These tools perform all of the necessary steps to create a collection of georeferenced ArcGIS-compatible rasters. Given a spatial and temporal extent, they efficiently download data for only the region of interest using the OPeNDAP protocol; when the data provider does not make the data available with OPeNDAP, the tools download and convert files from the providers FTP or HTTP server and clip them to your region of interest.

Before you proceed with manually downloading and converting files, we suggest to check whether MGET already has a specialized tool designed for your product of interest. These tools are available under the Data Products node of the MGET toolbox. Look for tools beginning with the word Create Rasters. Here are a few examples:

This list is incomplete, so of you do not see the product you're interested in, be sure you check the Data Products node before resorting to manually downloading and converting your product. Assuming you need to do that, the rest of this page describes the procedure using AVHRR Pathfinder version 5 SST data (MGET can download and convert it automatically, but we use it as an example because many people are familiar with it).

Finding the tool

The tool is in the Marine Geospatial Ecology Tools toolbox. If you do not see the toolboxes window, click the Show/Hide ArcToolbox Window button on the toolbar.

source:/WikiFiles/Examples/ConvertingHDFsToRasters/FindingTheTool.png

Examining the tool's parameters

The tool has six required parameters and a number of optional ones.

source:/WikiFiles/Examples/ConvertingHDFsToRasters/ToolParams1.png

In addition to specifying the input HDF file and output raster, you must also specify the name of the SDS to extract from the HDF, the coordinates of the lower-left corner, and the cell size. The raster dimensions in rows and columns and the data type are automatically obtained by the tool from the HDF metadata. You will often also specify a NODATA value, and other optional parameters that are not shown.

The parameter values can usually be found in the documentation for the data product you are converting. Many of them are often also present as "attributes" in the HDF metadata, but because there is no reliable naming convention for these attributes, the tool cannot automatically obtain them. You can often figure out the parameter values yourself by extracting the HDF header and examining the attributes. The MGET Extract HDF Header tool will write the header to a text file.

Extracting the HDF header and finding parameter values

Running the MGET Extract HDF Header tool is very simple. You simply specify the input HDF file and the output text file to create:

source:/WikiFiles/Examples/ConvertingHDFsToRasters/ExtractAVHRRPathfinder.png

The hard part is interpreting the header and finding the values to plug into the Convert SDS in HDF to ArcIGS Raster tool. Although all HDF headers adhere to a common data model of variables and attributes, each data provider defines their own variables and attributes. Here, I provide an example that will hopefully give you an idea what to look for.

Here is the extracted header:

Finding the SDS name in the HDF header

Open the HDF file in a text editor such as Notepad. Each SDS is represented as a "variable" in the HDF header. You can find the names of the SDSes in the header by searching for Variable Name = and then looking at what follows. In the Pathfinder SST header, your search will find three variables:

Variable Name = sst
	 Index = 0
	 Type= 16-bit unsigned integer
	 Ref. = 2
	 Rank = 2
	 Number of attributes = 12
	 Dim0: Name=lat
		 Size = 4096
		 Scale Type = 64-bit floating point
		 Number of attributes = 2
	 Dim1: Name=lon
		 Size = 8192
		 Scale Type = 64-bit floating point
		 Number of attributes = 2
	 Attr0: Name = dsp_PixelType
		 Type = 8-bit unsigned integer 
		 Count= 1
		 Value = 1 
	 Attr1: Name = dsp_PixelSize
		 Type = 8-bit unsigned integer 
		 Count= 1
		 Value = 2 
	 Attr2: Name = dsp_Flag
		 Type = 16-bit unsigned integer 
		 Count= 1
		 Value = 0 
	 Attr3: Name = dsp_nBits
		 Type = 16-bit unsigned integer 
		 Count= 1
		 Value = 16 
	 Attr4: Name = dsp_LineSize
		 Type = 32-bit signed integer 
		 Count= 1
		 Value = 0 
	 Attr5: Name = dsp_cal_name
		 Type = 8-bit signed char 
		 Count= 11
		 Value = Temperature
	 Attr6: Name = units
		 Type = 8-bit signed char 
		 Count= 4
		 Value = Temp
	 Attr7: Name = dsp_cal_eqnNumber
		 Type = 16-bit unsigned integer 
		 Count= 1
		 Value = 2 
	 Attr8: Name = dsp_cal_CoeffsLength
		 Type = 16-bit unsigned integer 
		 Count= 1
		 Value = 8 
	 Attr9: Name = dsp_cal_coeffs
		 Type = 32-bit floating point 
		 Count= 2
		 Value = 0.075000 -3.000000 
	 Attr10: Name = scale_factor
		 Type = 32-bit floating point 
		 Count= 1
		 Value = 0.075000 
	 Attr11: Name = add_off
		 Type = 32-bit floating point 
		 Count= 1
		 Value = -3.000000 

Dimension Variable Name = lat
	 Index = 1
	 Scale Type= 64-bit floating point
	 Ref. = 113
	 Rank = 1
	 Number of attributes = 2
	 Dim0: Name=lat
		 Size = 4096
	 Attr0: Name = unit
		 Type = 8-bit signed char 
		 Count= 14
		 Value = degrees_north\000
	 Attr1: Name = long_name
		 Type = 8-bit signed char 
		 Count= 10
		 Value = latitude\000\000

Dimension Variable Name = lon
	 Index = 2
	 Scale Type= 64-bit floating point
	 Ref. = 115
	 Rank = 1
	 Number of attributes = 2
	 Dim0: Name=lon
		 Size = 8192
	 Attr0: Name = unit
		 Type = 8-bit signed char 
		 Count= 13
		 Value = degrees_east\000
	 Attr1: Name = long_name
		 Type = 8-bit signed char 
		 Count= 11
		 Value = longitude\000\000

In this case, it is obvious that you want the variable named sst, simply because if its name. But the other two variables deserve some explanation in case you encounter them in other HDF files. They are "dimension" variables. These list the coordinates of other variables. In this file you can see two dimension variables, named lat and lon. In general you can ignore the dimension variables, although you can sometimes find out interesting information by examining their names or attributes. If you try to convert them, the tool will report an error because they are 1-dimensional and the tool can only convert 2-dimensional variables.

Finding the coordinates of the lower-left corner in the HDF header

This is more difficult than finding the SDS name. You must carefully scroll through the entire HDF header and look for something that seems like the right information. After lengthy examination, you may discover the following under File attributes:

	 Attr69: Name = dsp_nav_ximgcv
		 Type = 32-bit floating point 
		 Count= 8
		 Value = 89.980003 0.000000 -89.980003 0.000000 -179.981979 
                         0.000000 179.981979 0.000000 

These look like coordinates of top, bottom, left, and right extents. If you search the Internet for dsp_nav_ximgcv you will get several hits, but when I searched, my results did not lead to a definition of that attribute. For Pathfinder SST, you must resort to the  4 km Pathfinder Version 5.0 User Guide. On that page, you will eventually find this:

Corner Coordinates:
Upper Left    (180d 0' 0.00"W, 90d 0' 0.00"N)
Lower Left    (180d 0' 0.00"W, 90d 0' 0.00"S)
Upper Right   (180d 0' 0.00"E, 90d 0' 0.00"N)
Lower Right   (180d 0' 0.00"E, 90d 0' 0.00"S)
Center        (  0d 0' 0.00"E,  0d 0' 0.00"N)

From this experience, you should have learned that there is no 100% reliable means to finding out the coordinates. Your best bet is to carefully review both the HDF header and the documentation for the product you are working with.

Determining the cell size from the HDF header

In the previous step you found the coordinates of the lower-left corner, i.e. the left and bottom extent of the raster. The easiest way to determine the cell size is to find the top or right extent, calculate the height or width of the raster, and divide by the number of rows or columns:

Cell size = (top - bottom) / rows = (right - left) / columns

The number of rows and columns appear as the Size parameter of the Dim0 and Dim1 metadata for the HDF variable. For the Pathfinder SST file, there are 4096 rows and 8192 columns:

Variable Name = sst
	 Index = 0
	 Type= 16-bit unsigned integer
	 Ref. = 2
	 Rank = 2
	 Number of attributes = 12
	 Dim0: Name=lat
		 Size = 4096
		 Scale Type = 64-bit floating point
		 Number of attributes = 2
	 Dim1: Name=lon
		 Size = 8192
		 Scale Type = 64-bit floating point
		 Number of attributes = 2
	 ...

So the cell size is:

Pathfinder cell size = (top - bottom) / rows = 180 / 4096 = 0.0439453125

Occasionally you can also find the cell size listed as an attribute. For example, in HDF headers for MODIS chlorophyll data, there are two File attributes:

	 Attr42: Name = Latitude Step
		 Type = 32-bit floating point 
		 Count= 1
		 Value = 0.041667 
	 Attr43: Name = Longitude Step
		 Type = 32-bit floating point 
		 Count= 1
		 Value = 0.041667 

Notice that these round the cell size to six decimal places. We recommend you use as many decimal places as ArcGIS will allow you to enter. If you truncate at only six, you may notice that the output raster does not span exactly 360 degrees of longitude and 180 degrees of latitude; it may span slightly more or less, depending on how the rounding was done. The error may be insignificant for most purposes, but it is always best to minimize error whenever possible.

It is a common practice for data providers to use Dim0 for the rows and Dim1 for the columns, but occasionally they use Dim1 for the rows and Dim0 for the columns. When this happens, the converted raster will appear as though it was flipped about the diagonal axis. You can correct this problem using Transpose and Flip under Pre-conversion processing.

Converting the SDS

Now that you've populated all of the required parameters, you can convert the image and see what they look like in ArcMap width a blue-to-red gradient applied:

source:/WikiFiles/Examples/ConvertingHDFsToRasters/PathfinderConverted.png

Using the optional parameters

When you loaded the converted raster into ArcMap, you might have noticed several problems:

  • The land came out as blue, rather than NODATA.
  • There is no coordinate system defined for the raster.
  • The raster values are strange integers rather than the floating point values you expected (i.e. degrees C).
  • There were no pyramids built for the rasters.

You can correct these problems with the optional parameters.

Setting land to NODATA

To set land to NODATA during the conversion process, you must specify the value to use for NODATA. You can determine this value by loading the rasters into ArcMap and using the Identify tool to obtain the value of pixels you know are land.

The Pathfinder SST data uses the value 0 for land:

source:/WikiFiles/Examples/ConvertingHDFsToRasters/PathfinderIdentify.png

After reconverting the files using these NODATA values, you'll notice that cloud pixels may have been marked as NODATA. Some data products, such as NASA OceanColor's MODIS chlorophyll product, mark clouds using the same value as land. Other data products, such as NOAA NODC's AVHRR Pathfinder SST provide separate HDF files or variables that indicate the quality of each pixel, or whether it failed or passed automated cloud tests. In the January 1990 image in this example, clouds are visible off the coast of North America:

source:/WikiFiles/Examples/ConvertingHDFsToRasters/PathfinderClouds.png

One way to mark these as NODATA is:

  1. Obtain the file  199001.m04m1pfv50-qual.hdf, which contains a 0 to 7 "quality flag" for each pixel.

  1. Convert the qual SDS from that file to an ArcGIS raster.

  1. Use the ArcGIS Spatial Analyst Set Null tool to set pixels of the SST raster to NODATA where the corresponding quality flag raster value is less than 7.

The procedure for masking clouds or other defects is different for every data product. Some products, such as the AVHRR Pathfinder SST, even have multiple ways to mask clouds in the same product. Usually, you have to read the product documentation to figure out how to do it.

Defining the projection

You can instruct the tool to define the raster's projection by providing a value for the Define coordinate system parameter under Post-conversion processing:

source:/WikiFiles/Examples/ConvertingHDFsToRasters/ToolParams2.png

You usually have to look up the coordinate system for the product you're converting in the product's documentation. Some global products that use a geographic coordinate system (i.e. longitudes and latitudes), including the product presented in this example, do not identify a coordinate system in their documentation, presumably because the data provider believes it doesn't matter which datum you use because the resolution is so coarse. For these products, we recommend you use the "WGS 1984" coordinate system (i.e. the geographic coordinate system that uses the WGS 1984 datum).

Converting integer values to the expected floating-point values

Many data products use integer data types to save space. In the example presented here, the values are stored as 16-bit integers. This allows the HDF file to be 2x or 4x smaller than it would be if the values were stored as the typical 32-bit or 64-bit floats. To obtain the floating-point values, you must run the integers through a scaling equation. The scaling equation is often present in the HDF header in some form, usually as attributes of the variable you're converting. If you can't find it there, check the product's documentation.

In the Pathfinder SST header, the sst variable has some promising attributes:

	 Attr7: Name = dsp_cal_eqnNumber
		 Type = 16-bit unsigned integer 
		 Count= 1
		 Value = 2 
	 Attr8: Name = dsp_cal_CoeffsLength
		 Type = 16-bit unsigned integer 
		 Count= 1
		 Value = 8 
	 Attr9: Name = dsp_cal_coeffs
		 Type = 32-bit floating point 
		 Count= 2
		 Value = 0.075000 -3.000000 
	 Attr10: Name = scale_factor
		 Type = 32-bit floating point 
		 Count= 1
		 Value = 0.075000 
	 Attr11: Name = add_off
		 Type = 32-bit floating point 
		 Count= 1
		 Value = -3.000000 

After accruing some experience with different data products, I can guess that the scale factor and add_off parameters are what we need. Compare the values of those to this text from the  4 km Pathfinder Version 5.0 User Guide:

Each of the seven parameter files listed above contains a mapped array with 8192 elements in longitude and 4096 in latitude, plus a vector of length 8192 identifying the longitudes and a vector with 4096 values indicating the latitudes. There are also global tags describing the entire contents as well as tags describing each of the 2 vectors and 1 array. The seven parameters are stored either as 8-bit or 16-bit unsigned integers which may be converted linearly (y = mx + b) to geophysical units using a scale (i.e., slope=m) and offset (i.e., intercept=b) according to the following table:

          PARAMETER        # BITS   SCALE  OFFSET  UNITS 
------------------------------------------------------------
  "All-pixel" SST          16 bit   0.075   -3.0   Deg C
  First-guess SST          16 bit   0.075   -3.0   Deg C
  Standard Deviation       16 bit   0.150    0.0   Deg C
  Number of Observations    8 bit   1.000    0.0   Unitless
  Overall Quality Flag      8 bit   1.000    0.0*  Unitless
  Mask 1                    8 bit   1.000    0.0   Unitless
  Mask 2                    8 bit   1.000    0.0   Unitless

We are dealing with SST data, so the scaling equation is:

source:/WikiFiles/Examples/ConvertingHDFsToRasters/PathfinderEquation.png

The corresponding Map algebra expression is:

source:/WikiFiles/Examples/ConvertingHDFsToRasters/PathfinderMapAlgebra.png

Building pyramids

To instruct the tool to build pyramids for the ArcGIS raster, simply enable the Build pyramids option.

The final raster

After you populate the optional parameters, run the tool again to regenerate the raster. It should now have NODATA for land, have a coordinate system defined, use floating point values, and have pyramids already built. You can compare it to these rasters that I created using the parameters specified above.