Downloading species observations from the Ocean Biogeographic Information System (OBIS) with the DiGIR protocol
THIS EXAMPLE IS UNDER CONSTRUCTION
This example shows how to invoke MGET tools from ArcGIS to download species observation records from OBIS and save them as a point feature class. It assumes you have basic familiarity with ArcGIS.
What are OBIS and DiGIR?
The Ocean Biogeographic Information System (OBIS) is a strategic alliance of people and organizations sharing a vision to make marine biogeographic data, from all over the world, freely available over the World Wide Web. OBIS maintains a database of species observations contributed by member organizations and individuals. At the time of this writing, the database contained over 16 million records. Most records include a latitude, longitude, and date, making them suitable for geospatial analysis.
Records may be extracted from the OBIS database with the Distributed Generic Information Retrieval (DiGIR) protocol. This protocol is in widespread use throughout the bioinformatics community. OBIS uses it to collect records from contributing organizations such as OBIS-SEAMAP, a database of marine mammal, seabird, and sea turtle records maintained here at Duke University. The Global Biodiversity Information Facility (GBIF) uses it to collect records from OBIS into a larger biogeographic database for all taxa. At the time of this writing, there were dozens of servers that implemented the DiGIR protocol. The Big Dig website maintained a list of DiGIR servers. Big Dig stopped operating in February 2008, but many of the servers listed there are still operational and can be queried with DiGIR.
Although DiGIR is widely used, it is no longer under development. A new protocol, TAPIR, may eventually replace it. And although GBIF uses DiGIR to collect data from its many contributors, it exposes data through several GBIF-specific web services. As far as we know, you cannot query GBIF with DiGIR.
Like many "web service" protocols, DiGIR is an XML-based stateless request/response protocol. For technical details, please see the documents and XML schemas on the DiGIR home page.
Finding MGET's DiGIR tools in ArcGIS
You must have MGET 0.7a10 or a later release to use the tools. They appear in the Marine Geospatial Ecology Tools node in the ArcToolbox window. If you do not see the ArcToolbox window, click the Show/Hide ArcToolbox Window button on the toolbar.
DiGIR server URLs
Before you can use the MGET tools, you must obtain the URL for the server you wish to query. The best way to obtain the URL is to contact the server operator. The Big Dig website contains a list of several hundred URLs for servers that were active as of February 2008. (That website uses the formal DiGIR term provider rather than server. The terms are synonymous.) At the time of this writing, the URL for the OBIS server was http://iobis.marine.rutgers.edu/digir2/DiGIR.php. The URL for the OBIS-SEAMAP server maintained by Duke was http://seamap.env.duke.edu/digir/DiGIR.php.
Before you attempt to use the MGET tools, it is a good idea to enter the server URL into a web browser to make sure the server is responding. Some servers may take several minutes to respond if they have not been accessed by anyone for a while. Eventually you should receive some XML or be prompted to save a document that contains XML. For example, the OBIS server returns a response that looks like this in Internet Explorer:
If you get a response similar to this, it is likely that the server is functional and the MGET tools will work. If you don't, something is wrong and the MGET tools will probably fail. You should follow up with the server operator to resolve any problems.
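The same check can be scripted rather than done in a browser. The following is a minimal Python sketch, not part of MGET: the function names looks_like_xml and probe_server are hypothetical, and the five-minute default timeout is an assumption based on the observation above that idle servers may take several minutes to respond. It only tests that the response is well-formed XML; it does not validate the response against the DiGIR schema.

```python
import urllib.request
import xml.etree.ElementTree as ET


def looks_like_xml(body):
    """Return True if body (bytes or str) parses as a well-formed XML document."""
    try:
        ET.fromstring(body)
    except ET.ParseError:
        return False
    return True


def probe_server(url, timeout=300):
    """Fetch url and report whether the response body is well-formed XML.

    The long default timeout (an assumption) allows for servers that are
    slow to respond after sitting idle. Network errors propagate to the caller.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return looks_like_xml(response.read())


# Example call, using the OBIS URL quoted in the text (it may no longer be live):
# probe_server("http://iobis.marine.rutgers.edu/digir2/DiGIR.php")
```

If probe_server returns False or raises an error, the MGET tools will probably fail against that URL as well.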
Discovering the resources available from a DiGIR server
Each DiGIR server hosts a set of resources. A resource is a collection of related species observation records. Once you obtain a URL to a server, it is possible to immediately start downloading these records, but it is usually worthwhile to first understand what resources are available from the server and review some metadata about them. MGET's Get DiGIR Resources as Table tool assists with this process.
The input parameters to the tool are the URL to the server and the paths and names of five tables of metadata:
Important: We suggest you store the output tables from this tool in a geodatabase rather than DBF files. Text fields in DBF files are limited to 254 characters, and several of the fields of this table will usually exceed this limitation.
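To see why the 254-character limit matters, here is a small Python sketch that reports which text fields in a set of rows would not fit in a DBF file. It is an illustration only: the function name is hypothetical, and it assumes the rows have already been read into Python dictionaries keyed by field name.

```python
DBF_TEXT_LIMIT = 254  # maximum characters in a DBF text field, per the note above


def fields_exceeding_dbf_limit(rows, limit=DBF_TEXT_LIMIT):
    """Return {field name: longest value length} for text fields longer than limit.

    rows is an iterable of dicts mapping field names to values. Non-text
    values are ignored.
    """
    too_long = {}
    for row in rows:
        for name, value in row.items():
            if isinstance(value, str) and len(value) > limit:
                too_long[name] = max(too_long.get(name, 0), len(value))
    return too_long
```

Any field this function reports would lose text if the table were stored as a DBF file.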
The most important output is the resources table:
The important fields of this table are:
- Code contains the server's abbreviation for the resource. When you use the Search DiGIR Records and Create Points tool, you can provide a list of codes to restrict the search to specific resources. Also, the points output by that tool have an attribute called ResourceCode, allowing you to determine which resource provided each point.
- NumRecords shows the number of species occurrence records available from the resource and LastUpdate shows when they were last updated.
- Name, Abstract, and Keywords describe the resource. This information is provided by the organization or individual that contributed the resource.
- Citation gives instructions for citing the resource and UseRestr specifies any use restrictions imposed by the contributor. You are responsible for properly citing the resources you use and for observing these restrictions.
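As an illustration of how these fields can be used together, the Python sketch below picks out the codes of resources that actually contain records and collects their citation and use-restriction text. The function names are hypothetical, and the sketch assumes the resources table has already been read into a list of dictionaries keyed by the field names described above.

```python
def select_resource_codes(resources, min_records=1):
    """Return the sorted Code values of resources with at least min_records records."""
    return sorted(
        row["Code"]
        for row in resources
        if (row.get("NumRecords") or 0) >= min_records
    )


def citations_for(resources, codes):
    """Map each chosen resource code to its (Citation, UseRestr) pair."""
    wanted = set(codes)
    return {
        row["Code"]: (row.get("Citation"), row.get("UseRestr"))
        for row in resources
        if row["Code"] in wanted
    }
```

The list of codes is what you would supply to restrict a search to specific resources, and the citation mapping is a reminder of the obligations that come with each one.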
The remaining four tables are less important. The related information table lists the home page for each resource. The contacts table lists names, telephone numbers, and email addresses of the people who contributed the resource. The conceptual schemas table lists the formal names of the technical documents that specify the fields that appear in the species occurrence records available from the resource. The data elements table lists those fields and some metadata about them.
You should review the data elements table:
Each row of this table corresponds to a field of the records available from the server's resources. By default, the Search DiGIR Records and Create Points tool will attach all of these as attributes to the points it creates. In DiGIR terminology, these fields are formally called data elements, and their characteristics are defined in conceptual schemas. An important part of DiGIR's design is that it does not require the resources on a server to use the same conceptual schemas. An advantage of this design is that it allows data contributors to disagree about what fields should be in their biogeographic databases; they can each define their own fields, if desired. It also allows the databases to evolve over time without requiring DiGIR itself to be changed. A disadvantage is that it pushes complexity onto you, the consumer of the data: if different resources use different conceptual schemas, it is up to you to understand the implications and adjust your analysis accordingly.
In practice, a relatively small number of conceptual schemas are in widespread use. The most common is called Darwin 2, defined in the document http://digir.net/schema/conceptual/darwin/2003/1.0/darwin2.xsd. OBIS defines a conceptual schema that extends Darwin 2 by adding 27 additional fields, defined in http://www.iobis.org/obis/obis.xsd. The Big Dig website lists the conceptual schemas that were in use in February 2008.
The important fields of the data elements table are:
- Name specifies the name of the data element. It will be created as an attribute of the points using this name. The points may contain a slightly different name if the data element name is not a legal ArcGIS attribute name. For example, the name Order may be changed to Order_ because Order is a reserved word in database systems. If the points are stored in a shapefile, all of the names will be truncated to 10 characters, the maximum length of an attribute name in a shapefile.
- ArcGISType specifies the ArcGIS data type that will be used when the attribute is created for this data element.
- Searchable indicates whether you can use the data element name in the Filter expression parameter of the Search DiGIR Records and Create Points tool. A value of 1 indicates that you can use it.
- Returnable indicates whether the data element can be returned by the server. A value of 1 indicates it is returnable. Data elements that are not returnable are intended only for searching. I only know of one such element: the BoundingBox element of the Darwin 2 schema. I was never able to locate the documentation for this element, so I never use it. Instead, I always use the Latitude and Longitude elements.
- Descr contains the description of the data element, taken from the conceptual schema that defines it.
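The renaming rules described for the Name field can be sketched in Python as follows. This is an approximation for illustration, not the logic ArcGIS or MGET actually uses: the reserved-word set is a small made-up subset, the function names are hypothetical, and only the two rules mentioned above (appending an underscore to reserved words and truncating to 10 characters for shapefiles) are modeled. The second function flags data elements whose names become identical after truncation, which is worth checking before choosing a shapefile as the output format.

```python
# Illustrative subset only; the real list of reserved words is longer.
RESERVED_WORDS = {"ORDER", "SELECT", "TABLE", "GROUP"}
SHAPEFILE_NAME_LIMIT = 10  # maximum attribute name length in a shapefile


def legal_field_name(name, shapefile=False):
    """Approximate the attribute name that a data element name would receive."""
    if name.upper() in RESERVED_WORDS:
        name += "_"  # e.g. Order becomes Order_
    if shapefile:
        name = name[:SHAPEFILE_NAME_LIMIT]
    return name


def truncation_collisions(names):
    """Return {truncated name: [original names]} where truncation is ambiguous."""
    groups = {}
    for name in names:
        groups.setdefault(legal_field_name(name, shapefile=True), []).append(name)
    return {short: full for short, full in groups.items() if len(full) > 1}
```

For example, two hypothetical elements named ScientificName and ScientificNameAuthor would both truncate to Scientific. How ArcGIS resolves such a clash is not covered here, so storing the points in a geodatabase avoids the question.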
Downloading points from a DiGIR server
After you have gained an understanding of the resources available on the DiGIR server, you are ready to download some points using the Search DiGIR Records and Create Points tool. This tool only has two required parameters, the DiGIR server URL and the name of the output point feature class to create:
If you just specify these two parameters, the tool will try to download all of the georeferenced records offered by the server. You can monitor the tool's processing in the ArcGIS progress window:
In this example, I attempted to download all of the records from the OBIS server. There are a few things you should note about this example:
- Before the tool can download the records, it has to obtain the count of records that match your request. To do this, it must query the server once for each resource available on the server. In this example, there were 182 resources. Depending on the speed of the server, this can take quite some time. In this example, it took 24 minutes. You can speed this up by restricting your request to specific resources, as described below.
- In this example, the OBIS server reported that 7,876,681 records were available. This is significantly fewer than the 16.7 million records that the OBIS home page reported. Because the tool creates points, it only downloads records that include geographic coordinates. At the time I placed this request, the OBIS server was undergoing some kind of maintenance and a substantial number of records were missing coordinates.
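The numbers in these notes can be put in perspective with a little arithmetic, shown here as a short Python sketch. The figures come from the example above. The function estimate_count_minutes is hypothetical, and it assumes the counting time scales linearly with the number of resources queried, which may not hold for a real server.

```python
RESOURCES = 182              # resources on the OBIS server in this example
COUNT_MINUTES = 24           # time taken to count matching records
GEOREFERENCED = 7_876_681    # records the server reported as available
REPORTED_TOTAL = 16_700_000  # records reported on the OBIS home page

# Average time spent on the count query for each resource
seconds_per_resource = COUNT_MINUTES * 60 / RESOURCES

# Share of the advertised records that the tool was able to request
georeferenced_fraction = GEOREFERENCED / REPORTED_TOTAL


def estimate_count_minutes(n_resources, per_resource_seconds=seconds_per_resource):
    """Estimate counting time in minutes, assuming a constant cost per resource."""
    return n_resources * per_resource_seconds / 60


print(f"{seconds_per_resource:.1f} seconds per resource")
print(f"{georeferenced_fraction:.1%} of reported records had coordinates")
print(f"{estimate_count_minutes(5):.1f} minutes to count 5 resources")
```

In other words, each resource took about 8 seconds to count, and a little under half of the advertised records could be downloaded as points.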
TO BE CONTINUED...