Developing New Tools

This document will provide a step-by-step guide to developing new tools for Marine Geospatial Ecology Tools.

Topics that need to be addressed (as I think of them, in no particular order):

  • Asserts
    • Use them to check conditions that represent programming errors inside GeoEco. These are intended to be discovered during development and testing of GeoEco, not by end users. Assertion error messages need not be localized; do not enclose them in _().
    • If you need to report an error that is not a GeoEco programming mistake, raise an exception and enclose the message text in _(). Use Unicode messages, like this: raise TypeError(_(u'someParameter must be an int, or None.'))
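
The distinction between the two cases can be sketched as follows. This is only an illustration: scale_values and its parameters are hypothetical names, and _ is bound to gettext.gettext as a stand-in for the function imported from GeoEco.Internationalization, so that the sketch runs without GeoEco installed.

```python
import gettext

# Stand-in for "from GeoEco.Internationalization import _" (an assumption made
# so this sketch runs on its own). With no translation catalog installed,
# gettext.gettext returns the message unchanged.
_ = gettext.gettext


def scale_values(values, factor=None):
    """Hypothetical tool function: multiply each value by factor (default 1)."""
    # A bad argument is the caller's mistake and may be seen by an end user,
    # so raise an exception and wrap the message in _() for localization.
    if factor is not None and not isinstance(factor, int):
        raise TypeError(_(u'factor must be an int, or None.'))
    if factor is None:
        factor = 1

    result = [v * factor for v in values]

    # This condition can only fail if the code above is wrong, so it is an
    # assert, and its message is NOT wrapped in _().
    assert len(result) == len(values), 'scale_values lost or gained elements'
    return result
```

Calling scale_values([1, 2, 3], 2) returns the scaled list, while passing a string as factor raises the localizable TypeError.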

Supporting International Users

GeoEco was designed with international users in mind. Although the initial versions of GeoEco will only be localized to English, all of the code and documentation are written to allow localization to other languages. If the user community requests releases in other languages and resources are available, we can produce releases in those languages.

Inside GeoEco, all string processing is done in Unicode. For a brief review of Unicode in Python, see Unicode HOWTO.

Developer Checklist

  • DO NOT declare an encoding at the top of your .py files. When no encoding is declared, Python interpreters and editors are supposed to assume the encoding is ASCII. We want to use ASCII to maintain platform neutrality in our Python code.
    # -*- coding: latin-1 -*-       # DON'T DO THIS IN YOUR .py FILES!
    

Note: If you can figure out how to explicitly specify an ASCII encoding, you may include it in your .py files. I could not figure out how to include one, at least in a way that the IDLE editor would understand.
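
Whichever editor is used, one way to confirm that a .py file stays within ASCII is to scan its bytes. The following is a minimal sketch, not part of GeoEco; find_non_ascii is a hypothetical helper name.

```python
import io


def find_non_ascii(path):
    """Return a list of (line_number, column) pairs, both 1-based, giving the
    position of every byte in the file whose value is above 127."""
    hits = []
    # Open in binary mode so that no decoding is attempted.
    with io.open(path, 'rb') as f:
        for line_number, line in enumerate(f, 1):
            # bytearray yields integers under both Python 2 and Python 3.
            for column, byte in enumerate(bytearray(line), 1):
                if byte > 127:
                    hits.append((line_number, column))
    return hits
```

An empty list means the file contains only ASCII bytes.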

  • DO NOT use non-ASCII characters in your .py files. The IDLE editor will warn you when you do. Pythonwin users be careful: Pythonwin will not warn you!
  • Import the _ function from GeoEco.Internationalization before defining any string literals.
    from GeoEco.Internationalization import _
    
  • Write ALL string literals in Unicode unless you have a specific reason to use 8-bit strings. To support international users, many strings must be in Unicode, such as file system paths or strings displayed to the user. Writing all string literals as Unicode helps avoid problems that arise when Unicode strings are mixed with 8-bit strings.
    s = u'Some string'
    
  • Enclose all string literals that will be viewed by the user in a call to the _ function. This function is the hook into Python's translation system, which is based on the GNU gettext utility. (For more information, look up the gettext module in Python's documentation.) If you have any doubt about whether the user will view a given string, assume that it will be and enclose it in the _ function. If you are wrong, the only cost is that we will needlessly localize the string to other languages.
    self.LogInfo(_(u'Finished processing %i input files.') % len(inputFiles))
    
  • DO NOT enclose the error messages from assert statements in the _ function. These statements should only fail during development and pre-release testing, never for the end user, and therefore do not require localization.
  • When calling any function or manipulating any external data (e.g. writing a file), always be aware of any string encoding issues.
    • ArcGIS has mixed support for Unicode. See Technical Article 27345 and Known Issues below.
    • String parameters passed using Microsoft COM are always Unicode. This means, by default, any component invoked through instances of the win32com.client.Dispatch class, such as the ArcGIS geoprocessor or Matlab's ActiveX interface, will accept and return Unicode strings. You can override this behavior by passing UnicodeToString=False to the Dispatch class. This causes Dispatch to automatically convert everything to 8-bit ASCII strings. But beware: unpredictable results will occur if a caller tries to invoke you with a Unicode string containing characters with ordinal values above 127.
    • XML is typically encoded in UTF-8. But XML allows alternate encodings, which may be specified in the XML header. In any case, your Python code should use one of the standard XML modules, such as xml.dom or xml.sax, to manipulate XML.
    • Text files are encoded in various encodings, depending on the operating system. On English versions of the Windows operating system, they are usually 8-bit strings using the Latin-1 encoding. But even the notepad.exe program can now save files in alternative encodings, such as UTF-8 or Unicode-LE. If your tool processes text files, consider allowing the caller to specify an encoding or at least document the encodings you support. See Unicode HOWTO for an example of using the codecs.open function to read and write text files in specific encodings.
    • Only pass 8-bit ASCII strings to the Python exec statement. exec does seem to accept Unicode, but I'm not sure it's a good idea, since Python says the interpreter is supposed to assume ASCII encoding if no encoding is explicitly specified.
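
For the text file case above, letting the caller specify the encoding might look like the sketch below. It uses io.open rather than the codecs.open function mentioned above; io.open takes a similar encoding argument and is available in Python 2.6 and later. The helper names read_text and write_text, and the UTF-8 default, are assumptions made for illustration.

```python
import io


def read_text(path, encoding='utf-8'):
    """Read a text file and return its contents as a Unicode string."""
    with io.open(path, 'r', encoding=encoding) as f:
        return f.read()


def write_text(path, text, encoding='utf-8'):
    """Write a Unicode string to a text file using the given encoding."""
    with io.open(path, 'w', encoding=encoding) as f:
        f.write(text)
```

A tool built this way would expose the encoding as an optional parameter and document which encodings it has been tested with.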

Known Issues

  • ArcGIS users may only provide ASCII strings for GeoEco tool parameters (by definition, ASCII characters are those with ordinal values 0-127, inclusive). For example, when invoking from ArcGIS a GeoEco tool that accepts a path to a shapefile, that path cannot contain any non-ASCII characters. As far as I can determine, this restriction is due to two limitations, one with ArcGIS and the other with Python:
    • All GeoEco tools are registered with ArcGIS as script tools. ArcGIS apparently can only pass ASCII strings to script tools, even though it allows the user to input non-ASCII characters. The non-ASCII characters are converted to ? characters before they are passed to the script.
    • Python only allows a program to access its command line arguments as an array of 8-bit strings, not as an array of Unicode strings. Because ArcGIS provides input parameter values to the script as command line arguments, the script is forced to accept them as 8-bit strings.

Frequently Asked Questions

My tool's parameter needs to be an ArcGIS data type that is not currently supported. How do I add support for this data type?