Building an archive of popular oceanographic products in ArcGIS raster format
At Duke, we graduate approximately 100 Master's students per year, most of whom take at least one GIS class. We teach ArcGIS in our GIS classes because it is the program that our students are most likely to encounter after graduation. As discussed in this example, ArcGIS has trouble reading data in HDF, NetCDF, and custom binary formats, and many oceanographic products are published in these formats. To relieve our students of the burden of downloading and converting frequently used oceanographic products, we maintain an archive of these products in ArcGIS format (specifically, in ArcInfo binary grid format).
To maintain the archive, we wrote a Python script that uses MGET and Python's standard library to scan both the archive and the data providers' servers, and then adds to the archive any data that it does not already contain. The script runs every night as a Windows Scheduled Task.
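The core of that update step is a comparison between what the archive holds and what the provider offers. The following is a minimal sketch of the idea only, not the logic in UpdateOceano.py: the function name missing_dates and the use of one date per raster are assumptions made for illustration.

```python
import datetime


def missing_dates(archived, available):
    """Return, in sorted order, the dates the provider has but the archive lacks.

    Hypothetical helper: `archived` and `available` are iterables of
    datetime.date objects, one per raster or data file.
    """
    return sorted(set(available) - set(archived))


# Example with made-up dates: two days are already archived, four are published.
archived = [datetime.date(2009, 1, 1), datetime.date(2009, 1, 2)]
available = archived + [datetime.date(2009, 1, 3), datetime.date(2009, 1, 4)]
to_download = missing_dates(archived, available)
```

In this example, to_download holds 3 and 4 January 2009, the two dates that would still need to be downloaded and converted.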
Products downloaded by the script
The script currently downloads the following products:
- MODIS Aqua, CZCS, OCTS, and SeaWiFS Level 3 Mapped chlorophyll-a from the NASA GSFC OceanColor Group
- Various geostrophic currents datasets from Aviso that are accessible by OPeNDAP (see here and here; "merged" products only)
- Various sea surface height datasets from Aviso that are accessible by OPeNDAP (see here and here; "merged" products only)
- NOAA GOES Level 3 SST from NASA PO.DAAC
- Significant wave height from Aviso ("merged" only)
- Ocean wind speed modulus from Aviso ("merged" only)
- QuikSCAT L3 daily gridded ocean wind vectors from NASA PO.DAAC
We are adding more products to the script, in particular: NOAA NODC 4km AVHRR Pathfinder SST; and MODIS Aqua SST, MODIS Terra SST, and the Merged Chlorophyll dataset from the NASA GSFC OceanColor Group.
Getting the script
You can download the script from source:/OceanoArchive. There are three files:
- source:/OceanoArchive/UpdateOceano.py - the script itself
- source:/OceanoArchive/UpdateOceano.cmd - a batch file wrapper that we use to run UpdateOceano.py (more on this below)
- source:/OceanoArchive/Logging.ini - the logging configuration file that we use when running the script (more on this below)
Requirements
- Windows XP SP2 or later
- ArcGIS 9.3 or later
- We use 9.3.1; 9.2 will probably work, but we haven't tested it
- Python 2.5
- We use 2.5.4; 2.4 may work; 2.6 will probably not work
- PyWin32 build 212 or later
- The latest available version of MGET
- The script may work with older versions, but it will always work with the latest version
Script parameters
You can view the script parameters by running UpdateOceano.py with no parameters:
USAGE: UpdateOceano rootdir tempdir emailaddr [/S startdate] [/E enddate]
[/L logini] [/T maxtime] [/P product [...]]
rootdir - Full path to oceano archive root directory. This must start with
a drive letter, e.g. C:. If the archive is stored on a remote
computer, access it through a mapped drive.
tempdir - Full path to directory to hold temporary files. If the archive
is extremely out of date, this directory should have 10 to 200
GB of free space. For optimal performance, it should be a hard
drive installed in the local computer.
emailaddr - Email address to be provided as the password when logging into
FTP servers with the anonymous user name. Many FTP servers
require you to provide an email address when logging in
anonymously, so this parameter is not optional.
startdate - Starting date for the data to download, in YYYY-MM-DD format.
If not specified, the script will scan the archive and start the
download at the date that the product ends in the archive. If
the product has never been downloaded before, the script will
start the download at the beginning of the product.
enddate - Ending date for the data to download, in YYYY-MM-DD format. If
not specified, downloading will end with the last date of the
product.
logini - Full path to Logging.ini file to configure logging. If not
specified, the normal MGET Logging.ini file will be used
(from the user's %APPDATA% directory, if it exists, or from the
GeoEco installation directory, if it does not).
maxtime - Maximum time, in minutes, that the script should run. If the
script exceeds this time, it will exit with error code 2. Use
this option to work around ArcGIS memory leaks. Call the script
from a loop in a batch file that keeps calling the script so
long as %ERRORLEVEL% is 2 when the script exits.
product - List of one or more products to update, separated by spaces. If
none are specified, all will be updated. Product names are case
sensitive.
The normal way to use this script is to omit the /S, /E, and /P parameters. The
script will then download all of the products, starting where the download ended
last time, and ending with the most recently-available data.
The products that may be specified for the /P parameter are:
Chl:GSFC:Aqua, Chl:GSFC:CZCS, Chl:GSFC:OCTS, Chl:GSFC:SeaWiFS,
GC:AVISO, SSH:AVISO, SST:PODAAC:GOES, Wave:AVISO, Winds:AVISO,
Winds:PODAAC:QuikSCAT
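The usage text recommends calling the script from a batch file loop for as long as it exits with code 2. The same idea can be sketched in Python, as shown below. This is an illustration under stated assumptions, not the contents of UpdateOceano.cmd: the helper names build_command and run_until_done, the max_restarts safety limit, and the paths in the closing comment are all hypothetical, and the sketch assumes the wrapper runs under the same Python interpreter that the script requires.

```python
import subprocess
import sys

RESTART_CODE = 2  # exit code the usage text gives for "exceeded /T maxtime"


def build_command(rootdir, tempdir, emailaddr, maxtime=None, products=None,
                  script="UpdateOceano.py"):
    """Assemble an argument list that follows the USAGE line above."""
    cmd = [sys.executable, script, rootdir, tempdir, emailaddr]
    if maxtime is not None:
        cmd += ["/T", str(maxtime)]
    if products:
        cmd += ["/P"] + list(products)
    return cmd


def run_until_done(cmd, run=subprocess.call, max_restarts=100):
    """Re-run cmd while it exits with RESTART_CODE; return the last exit code.

    max_restarts is an assumed safety limit so that the loop cannot spin forever.
    """
    code = run(cmd)
    restarts = 0
    while code == RESTART_CODE and restarts < max_restarts:
        restarts += 1
        code = run(cmd)
    return code


# Hypothetical usage; substitute your own paths and email address:
#   cmd = build_command(r"X:\Oceano", r"C:\Temp\Oceano", "you@example.com", maxtime=120)
#   sys.exit(run_until_done(cmd))
```

Because each restart is a fresh process, any memory that ArcGIS leaked during the previous run is released before the next run picks up where the archive left off.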
Running the script manually
To be written
Running the script automatically as a Windows Scheduled Task
To be written
ArcGIS rasters output by the script
The script builds a directory tree under the root directory that you pass as its first parameter (rootdir). The organization of this tree and the naming of the rasters are not yet documented; for now, the best way to learn them is to inspect the output and read the script source code. If you have questions, feel free to email jason.roberts@duke.edu.
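One way to inspect the output is to list every raster in the tree. The sketch below relies on the general convention that an ArcInfo binary grid is stored as a directory containing a file named hdr.adf; it assumes nothing about how UpdateOceano.py names or arranges its rasters. The function name find_grids and the path in the closing comment are hypothetical.

```python
import os


def find_grids(rootdir):
    """Yield each directory under rootdir that looks like an ArcInfo binary grid."""
    for dirpath, dirnames, filenames in os.walk(rootdir):
        if "hdr.adf" in [name.lower() for name in filenames]:
            yield dirpath
            dirnames[:] = []  # do not descend into the grid directory itself


# Hypothetical usage; replace the path with your archive root:
#   for path in find_grids(r"X:\Oceano"):
#       print(path)
```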
