Building an archive of popular oceanographic products in ArcGIS raster format

At Duke, we graduate approximately 100 Masters students per year, most of whom take at least one GIS class. We teach ArcGIS in our GIS classes because it is the program that our students are most likely to encounter after graduation. As discussed in this example, ArcGIS has trouble reading data in HDF, NetCDF, and custom binary formats. Many oceanographic products are published in these formats. To relieve our students of the burden of downloading and converting frequently used oceanographic products, we maintain an archive of these products in ArcGIS format (specifically, in ArcInfo binary grid format).

To maintain the archive, we wrote a Python script that uses MGET and Python's standard library to scan the archive and the data providers' servers, and then update the archive with any data not already in it. The script runs every night as a Windows Scheduled Task.
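At its core, each nightly run compares what the archive already holds with what the provider offers. The following minimal sketch shows that comparison under a simplifying assumption: each product can be reduced to a set of dates. The function name `dates_to_download` is hypothetical and does not come from the script, which also has to list the remote servers and convert the files with MGET. The defaults follow the /S and /E rules in the usage text. If no start date is given, the sketch starts where the product ends in the archive, or at the beginning of the product if nothing has been archived yet. If no end date is given, it stops at the last available date.

```python
import datetime


def dates_to_download(archived, available, start=None, end=None):
    """Return, in order, the provider dates that the archive still lacks.

    archived  -- iterable of datetime.date values already in the local archive
    available -- iterable of datetime.date values offered by the provider
    start     -- optional first date (like /S); by default, the last archived
                 date, or the first available date if nothing is archived yet
    end       -- optional last date (like /E); by default, the last available date
    """
    archived = set(archived)
    available = sorted(set(available))
    if not available:
        return []
    if start is None:
        start = max(archived) if archived else available[0]
    if end is None:
        end = available[-1]
    # Keep only dates inside the window that are not already archived.
    return [day for day in available
            if start <= day <= end and day not in archived]


d = datetime.date
print(dates_to_download([d(2009, 1, 1), d(2009, 1, 2)],
                        [d(2009, 1, 1), d(2009, 1, 2), d(2009, 1, 3), d(2009, 1, 4)]))
# [datetime.date(2009, 1, 3), datetime.date(2009, 1, 4)]
```

Because the default start is the last archived date, a gap in the middle of the archive is only filled when an explicit earlier start date is passed, which mirrors the behavior of /S.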

Products downloaded by the script

The script currently downloads the following products:

We are adding more products to the script, in particular NOAA NODC 4km AVHRR Pathfinder SST, MODIS Aqua SST, MODIS Terra SST, and the Merged Chlorophyll dataset from the NASA GSFC OceanColor Group.

Getting the script

You can download the script from source:/OceanoArchive. There are three files:

Requirements

  • Windows XP SP2 or later
  • ArcGIS 9.3 or later
    • We use 9.3.1; 9.2 will probably work, but we haven't tested it
  • Python 2.5
    • We use 2.5.4; 2.4 may work; 2.6 will probably not work
  • PyWin32 build 212 or later
  • The latest available version of MGET
    • The script may work with older versions, but it will always work with the latest version

Script parameters

You can view the script parameters by running UpdateOceano.py with no parameters:

USAGE: UpdateOceano rootdir tempdir emailaddr [/S startdate] [/E enddate]
                    [/L logini] [/T maxtime] [/P product [...]]

    rootdir   - Full path to oceano archive root directory. This must start with
                a drive letter, e.g. C:. If the archive is stored on a remote
                computer, access it through a mapped drive.

    tempdir   - Full path to directory to hold temporary files. If the archive
                is extremely out of date, this directory should have 10 to 200
                GB of free space. For optimal performance, it should be a hard
                drive installed in the local computer.

    emailaddr - Email address to be provided as the password when logging into
                FTP servers with the anonymous user name. Many FTP servers
                require you to provide an email address when logging in
                anonymously, so this parameter is not optional.

    startdate - Starting date for the data to download, in YYYY-MM-DD format.
                If not specified, the script will scan the archive and start the
                download at the date that the product ends in the archive. If
                the product has never been downloaded before, the script will
                start the download at the beginning of the product.

    enddate   - Ending date for the data to download, in YYYY-MM-DD format. If
                not specified, downloading will end with the last date of the
                product.

    logini    - Full path to Logging.ini file to configure logging. If not
                specified, the normal MGET Logging.ini file will be used
                (from the user's %APPDATA% directory, if it exists, or from the
                GeoEco installation directory, if it does not).

    maxtime   - Maximum time, in minutes, that the script should run. If the
                script exceeds this time, it will exit with error code 2. Use
                this option to work around ArcGIS memory leaks. Call the script
                from a loop in a batch file that keeps calling the script so
                long as %ERRORLEVEL% is 2 when the script exits.

    product   - List of one or more products to update, separated by spaces. If
                none are specified, all will be updated. Product names are case
                sensitive.

The normal way to use this script is to omit the /S, /E, and /P parameters. The
script will then download all of the products, starting where the download ended
last time, and ending with the most recently-available data.

The products that may be specified for the /P parameter are:

Chl:GSFC:Aqua, Chl:GSFC:CZCS, Chl:GSFC:OCTS, Chl:GSFC:SeaWiFS,
GC:AVISO, SSH:AVISO, SST:PODAAC:GOES, Wave:AVISO, Winds:AVISO,
Winds:PODAAC:QuikSCAT
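The /T option is meant to be used with a loop that reruns the script for as long as it exits with code 2. The usage text suggests a batch file for this. A Python equivalent using only the standard library is sketched below as an alternative, not as the authors' own wrapper. It also checks /P names against the product list in the usage text, because those names are case sensitive. The helper names `build_command` and `run_until_done` and the `max_runs` safety cap are hypothetical. The /L option is left out for brevity.

```python
import re
import subprocess
import sys

# Product names accepted by /P, copied from the usage text (case sensitive).
PRODUCTS = set([
    "Chl:GSFC:Aqua", "Chl:GSFC:CZCS", "Chl:GSFC:OCTS", "Chl:GSFC:SeaWiFS",
    "GC:AVISO", "SSH:AVISO", "SST:PODAAC:GOES", "Wave:AVISO", "Winds:AVISO",
    "Winds:PODAAC:QuikSCAT",
])

# Exit code the script returns when it stops because /T was exceeded.
TIME_LIMIT_EXIT_CODE = 2


def build_command(script, rootdir, tempdir, emailaddr, startdate=None,
                  enddate=None, maxtime=None, products=None,
                  python=sys.executable):
    """Assemble an UpdateOceano command line as a list of arguments."""
    args = [python, script, rootdir, tempdir, emailaddr]
    for flag, date in (("/S", startdate), ("/E", enddate)):
        if date is not None:
            # Dates must be given in YYYY-MM-DD format.
            if not re.match(r"\d{4}-\d{2}-\d{2}\Z", date):
                raise ValueError("%s must be in YYYY-MM-DD format: %r" % (flag, date))
            args.extend([flag, date])
    if maxtime is not None:
        args.extend(["/T", str(int(maxtime))])
    if products:
        unknown = [p for p in products if p not in PRODUCTS]
        if unknown:
            raise ValueError("unknown product names: %s" % ", ".join(unknown))
        # /P takes a variable-length list, so it goes last.
        args.append("/P")
        args.extend(products)
    return args


def run_until_done(command, max_runs=100):
    """Rerun command while it exits with code 2; return the final exit code.

    max_runs is a hypothetical safeguard against looping forever.
    """
    for _ in range(max_runs):
        code = subprocess.call(command)
        if code != TIME_LIMIT_EXIT_CODE:
            return code
    return TIME_LIMIT_EXIT_CODE
```

A scheduled job would pass the result of `build_command` to `run_until_done` with the same arguments it would otherwise place on the command line, for example with `maxtime=240`, and then exit with the code that `run_until_done` returns.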

Running the script manually

To be written

Running the script automatically as a Windows Scheduled Task

To be written

ArcGIS rasters output by the script

The script constructs a tree under the directory that you provide as the first parameter to the script. The organization of this tree, the names of the rasters, and so on are not currently documented. You can work them out by examining the output and the script source code. If you have questions, feel free to email jason.roberts@duke.edu.
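Because the tree layout is undocumented, it can help to start by listing the rasters the script has written. An ArcInfo binary grid is stored as a directory that contains `hdr.adf` along with files such as `w001001.adf`. A sketch that walks the archive root and reports every directory holding `hdr.adf` therefore gives a rough inventory. The function name `find_arcinfo_grids` is hypothetical. The sketch relies only on the standard grid file layout and assumes nothing about how the script names or nests its output.

```python
import os


def find_arcinfo_grids(rootdir):
    """Return sorted paths of ArcInfo binary grids found under rootdir.

    A directory is treated as a grid if it contains a file named hdr.adf
    (compared case-insensitively, since Windows file names are not case
    sensitive).
    """
    grids = []
    for dirpath, dirnames, filenames in os.walk(rootdir):
        if "hdr.adf" in [name.lower() for name in filenames]:
            grids.append(dirpath)
            # Grid directories are not expected to contain further grids,
            # so prune the walk here.
            del dirnames[:]
    return sorted(grids)
```

Printing `find_arcinfo_grids(rootdir)` for the archive root lists one path per raster, which makes the directory and naming scheme easier to see.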