Friday, June 21, 2024

Working with Geometries in Python

Module 6 introduces working with geometries in Python as well as with external text files and tabular data. Geometry objects held in memory can be used as input and output in geoprocessing, which avoids the need to create temporary feature classes. Using the open() function, the fileinput module, and the write() method, text from .txt, .csv and other files can be read into Python or written to a file outside of ArcGIS Pro. The readlines() and writelines() methods also work with external files line by line: readlines() returns a list of a file's lines, and writelines() writes a sequence of strings, often built or consumed in a loop. These aspects further add to the utility of automating processes with Python.
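
A minimal sketch of these file operations in plain Python follows. The file name and the sample lines are invented for illustration, and the file is written to a temporary folder rather than to any Lab workspace.

```python
import fileinput
import os
import tempfile

# Hypothetical file in a throwaway folder; the contents are made up.
path = os.path.join(tempfile.mkdtemp(), "streams.txt")

# open() in "w" mode creates (or overwrites) the file.
with open(path, "w") as f:
    f.write("Name,Length\n")  # write() adds a single string
    # writelines() writes each string in a sequence; it adds no newlines itself.
    f.writelines(["Stream A,12.5\n", "Stream B,8.0\n"])

# readlines() returns every line of the file as a list of strings.
with open(path) as f:
    lines = f.readlines()
print(lines)

# fileinput reads one or more files line by line and tracks line numbers.
numbered = []
with fileinput.input(files=[path]) as stream:
    for line in stream:
        numbered.append((stream.filelineno(), line.rstrip("\n")))
print(numbered)
```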

The Lab assignment for Module 6 is to write a Python script that creates a new text file outside of ArcGIS, and to populate it with geometry data from a feature class of polylines for rivers in Maui, Hawaii.

The Maui, Hawaii rivers feature class for Module 6

Following is a graphic I made in an attempt to better explain how to work with geometries in Python. The getPart() method, which returns an array of point objects for each geometry part, is the crux of the assignment. 

Diagram outlining Feature Class Architecture and the getPart Method


The SearchCursor function is used to read feature class geometries. Reaching the points, or vertices, that make up the geometry of a feature with a SearchCursor requires several nested iterations. All polylines and polygons in a feature class are made up of point objects.

Every feature class consists of individual features, things we can see and interact with in ArcGIS Pro. Each feature has a record in the attribute table: the rows of the table hold the records of every feature in the feature class, and each column is a field holding one attribute.

When a SearchCursor is used to work with geometries, the outer loop iterates through the records and returns a tuple for each feature, with one element for each field listed in the field_names argument. Here, the first element is the feature's ID number and the second is the polyline object. A polyline object is a shape defined by one or more paths.

A second loop inside the first is required to reach the point objects in each polyline object; a loop within a loop is known as a nested loop. This nested for loop iterates through each polyline object and returns an array object for each part. Part of the ArcPy package, an Array is a grouping of the points that make up the geometry of a feature, and it behaves much like a Python list.

A further nested loop then iterates through each array object to extract the point objects, which contain the coordinates of the vertices that make up each feature. The getPart() method, however, lets a script skip one of these iterations. Called on the polyline object with an optional index parameter, it returns the array of point objects for that part directly, so a single nested loop over that array reaches the points.

The getPart() method is called on the polyline object taken from the tuple returned by the SearchCursor; the order of the elements in that tuple follows the field_names argument. Part in the method name refers to a part of the geometry: the index argument selects which part's array of point objects is returned. For Module 6 the index is 0, the first part of each polyline. This part index is separate from the feature's FID number, which the script reads from the tuple.
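
The loop structure described above can be sketched without ArcPy, assuming plain Python lists as stand-ins: each polyline is a list of parts, each part is a list of (x, y) tuples, and get_part() is a hypothetical helper that mimics Polyline.getPart(index). In a real script the rows would come from something like arcpy.da.SearchCursor(fc, ["OID@", "SHAPE@"]), and the inner loop would read point.X and point.Y.

```python
# Stand-in rows (assumption): (ID, polyline) tuples, where a polyline is a
# list of parts and each part is a list of (x, y) tuples. Coordinates are made up.
rows = [
    (0, [[(0.0, 0.0), (1.0, 1.0), (2.0, 1.5)]]),
    (1, [[(5.0, 5.0), (6.0, 4.0)]]),
]


def get_part(polyline, index):
    """Hypothetical helper mimicking Polyline.getPart(index)."""
    return polyline[index]


# Three-level version: rows -> parts -> points.
full = []
for oid, polyline in rows:
    for part in polyline:
        for point in part:
            full.append((oid, point))

# getPart(0) version: rows -> points of the first part only.
short = []
for oid, polyline in rows:
    for point in get_part(polyline, 0):
        short.append((oid, point))

print(full == short)  # True here because every polyline has a single part
```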

The flowchart for the Module 6 script outlines the SearchCursor function and one nested loop used to obtain point objects for each part of the array object of a feature:

Nested Loops Program Flowchart

This becomes a little more complicated with multipart features, where a single feature, with one set of attributes, is made up of multiple arrays of point objects. Examples include a feature whose polygons are not physically connected, or one where empty (interior) polygons fall within a larger (exterior) polygon. Think of St. Martin Parish, Louisiana, which consists of two areas separated by Iberia Parish, or Shively, Kentucky, a city encircled on all sides by Louisville.

With multipart features, the script must iterate over every distinct part of the feature to return its complete geometry. In summary, the polyline object returned from the SearchCursor contains an array object for each part, and each of those arrays contains the point objects.
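
A small sketch of why every part matters, using the same list-based stand-in: a hypothetical feature with two separate parts and invented coordinates. In ArcPy, a comparable loop would iterate over the geometry directly, or call getPart(i) for each i in range(geometry.partCount).

```python
# Stand-in (assumption): one feature made of two separate parts, each a
# list of (x, y) tuples. The coordinates are invented.
multipart = [
    [(0.0, 0.0), (0.0, 2.0), (2.0, 2.0), (0.0, 0.0)],
    [(5.0, 5.0), (5.0, 6.0), (6.0, 6.0), (5.0, 5.0)],
]

# Reading only the first part would silently drop the second area.
first_only = list(multipart[0])

# Iterating over every part returns the whole geometry.
all_points = []
for part_number, part in enumerate(multipart):
    for x, y in part:
        all_points.append((part_number, x, y))

print(len(first_only), len(all_points))  # 4 8
```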

The script written for Module 6 produced a formatted text file listing the vertices of each river feature in the Maui rivers feature class:

Module 6 text file screenshot of output data

The lines written with the writelines() method recorded the ID number and the X and Y coordinates of every vertex for all records in the rivers feature class. They went into the new text file that the open() function created at the beginning of the script.

The vertex numbers were assigned in the nested loop over the array object, where a variable vertexnum increased by one on each pass. Each line also included the river's name, which was derived from the SearchCursor iteration rather than from the getPart() method, since the point objects returned by getPart() hold coordinates, not attributes.
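
Putting the pieces together, the sketch below writes one line per vertex to a new text file. It is not the Lab script: the rows, names and coordinates are invented, the output path is a temporary file, polyline[0] stands in for getPart(0), and restarting vertexnum for each feature is an assumption about how the numbering should work.

```python
import os
import tempfile

# Stand-in rows (assumption): (ID, name, polyline), where a polyline is a list
# of parts and each part a list of (x, y) tuples. All values are made up.
rows = [
    (0, "Stream A", [[(100.0, 200.0), (101.5, 202.0)]]),
    (1, "Stream B", [[(300.0, 400.0), (302.0, 401.0), (305.0, 399.5)]]),
]

out_path = os.path.join(tempfile.mkdtemp(), "rivers.txt")  # hypothetical name

with open(out_path, "w") as out_file:
    for oid, name, polyline in rows:
        vertexnum = 0                    # restart the count for each feature
        for x, y in polyline[0]:         # stands in for getPart(0)
            vertexnum += 1
            # One line per vertex: ID, vertex number, X, Y, feature name.
            out_file.write(f"{oid} {vertexnum} {x} {y} {name}\n")

with open(out_path) as f:
    written = f.readlines()
print(written[0], end="")  # 0 1 100.0 200.0 Stream A
```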

‘Ohe’o Gulch in Maui, Hawaii

‘Ohe’o Gulch, downriver from Palikea Stream, which is one of the features from the Module 6 dataset

My wife and I visited Maui in January 2011 and drove the iconic Hana Highway to Haleakalā National Park. The above photo is one of ‘Ohe’o Gulch, which connects Palikea Stream to the Pacific Ocean at the southeast end of the island.

Wednesday, June 12, 2024

Automating Geoprocessing with Python

Moving into Module 5, our assignment this week consists of writing a Python script to automate geoprocessing tasks. We were provided a dataset with several Alaska and New Mexico feature classes and the task of copying them into a newly created file geodatabase (fGDB). Working with lists and dictionaries, our script also implements several functions including ListFeatureClasses, Describe, and SearchCursor. Our end task was to output a dictionary (a Python collection of key:value pairs) of New Mexico county seats.

As an aside, I always like working with data from places I have visited. Including an Albuquerque airport stop in 2007, I've been to New Mexico four times. I also updated the Universal Map Group Wall Map for Albuquerque as one of my projects in 2008.

Tijeras, New Mexico in 2017
I-40 west ahead of Tijeras from my trip to New Mexico in 2017.

The ListFeatureClasses() function returns a list of feature classes in the current workspace. This list can be stored in a variable to be used with subsequent functions. Part of the ArcPy package, the Describe() function returns a Describe object which includes properties of a feature class such as data type, geometry type, basename, etc.

Calling the Describe function on each feature class in the list makes a property, such as the basename, available to the CopyFeatures() tool in the Data Management toolbox. This tool copies the features of an input feature class into a new feature class. With a for loop, we used it to populate our newly created fGDB, building each output path by concatenating a variable for the output environment, the name of our fGDB, and the basename property of each feature class.
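
The loop can be sketched in plain Python to show how each output path is concatenated. The folder, geodatabase and feature class names below are hypothetical, os.path.splitext() stands in for Describe(fc).baseName, and the actual copy is left as a comment because it needs ArcPy.

```python
import os

# Hypothetical inputs: in ArcPy these would come from
# arcpy.ListFeatureClasses() and arcpy.Describe(fc).baseName.
feature_classes = ["airports.shp", "lakes.shp", "rivers.shp"]
out_folder = "C:/GISProgramming/Module5/Results"  # hypothetical path
gdb_name = "mod5.gdb"                             # hypothetical name

out_paths = []
for fc in feature_classes:
    base_name = os.path.splitext(fc)[0]  # stands in for Describe(fc).baseName
    out_fc = out_folder + "/" + gdb_name + "/" + base_name
    out_paths.append(out_fc)
    # With ArcPy available, the copy itself would be:
    # arcpy.management.CopyFeatures(fc, out_fc)

print(out_paths[0])  # C:/GISProgramming/Module5/Results/mod5.gdb/airports
```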

The program flowchart for this week's Lab assignment

While to me the term cursor refers to the blinking vertical line in this word processor, it has a separate meaning in computer programming. Cursor is a database term for accessing a set of records in a table, and records in a table are referred to as rows. The three cursors in ArcPy, each of which iterates over the rows of a table, are as follows:

  • Search Cursor - this function retrieves specific rows on the basis of attribute values. This is similar to performing a SQL query.
  • Insert Cursor - this function adds new rows to a table, which can in turn be populated with new attribute values.
  • Update Cursor - this function modifies the attribute data of existing rows or deletes rows from the table.
Each type of cursor is created by the corresponding class of the arcpy.da module. The cursor object accesses row objects, returning a tuple of field values in the order specified by the function's field_names argument. The cursor iterates over the table rows but reads only the fields requested.

Cursors support with statements, which guarantee that the cursor is closed and its data lock released whether or not the cursor finished running successfully. Without a with statement, the data lock remains unless the cursor is deleted with a del statement. The data lock prevents multiple processes from changing the same table simultaneously.
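
The guarantee a with statement provides can be shown with a toy class. HypotheticalCursor is not part of ArcPy; it only mimics the idea of a cursor that holds a lock until its __exit__ method runs, which Python calls whether the loop finishes normally or stops on an error.

```python
class HypotheticalCursor:
    """Toy stand-in for a data-access cursor that holds a lock while open.
    Not ArcPy: it only shows what a with statement guarantees."""

    def __init__(self, table):
        self.table = table
        self.locked = True  # opening the cursor takes a lock

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.locked = False  # runs on normal exit AND when an error occurs
        return False         # do not swallow the exception

    def __iter__(self):
        return iter(self.table)


cursor = HypotheticalCursor([("Row 1",), ("Row 2",)])
try:
    with cursor:
        for row in cursor:
            raise RuntimeError("simulated failure mid-iteration")
except RuntimeError as e:
    print("Caught:", e)

print("Lock still held?", cursor.locked)  # Lock still held? False
```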

The SearchCursor syntax also accepts a SQL query through its optional where_clause parameter. SQL queries find records in a database table that meet specific criteria, selecting data WHERE specific conditions occur. I often use SQL queries when updating the database for the main AARoads pages and also occasionally with the Simple Machines Forum database on the site.
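
Since the where_clause is ordinary SQL, the idea can be tried with Python's built-in sqlite3 module. The table, its rows and the 'County Seat' value are invented stand-ins for the Lab's city feature class; in ArcPy the same condition would be passed as the third argument of arcpy.da.SearchCursor.

```python
import sqlite3

# In-memory stand-in for the cities table; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (NAME TEXT, FEATURE TEXT, POP_2000 INTEGER)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [
        ("Town A", "County Seat", 5000),
        ("Town B", "Populated Place", 1200),
        ("Town C", "County Seat", -99999),
    ],
)

# Only rows meeting the WHERE condition are returned.
where_clause = "FEATURE = 'County Seat'"
query = f"SELECT NAME, POP_2000 FROM cities WHERE {where_clause} ORDER BY NAME"
rows = conn.execute(query).fetchall()
print(rows)  # [('Town A', 5000), ('Town C', -99999)]
conn.close()
```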

The Lab assignment specified using the SearchCursor on the feature class of point data for cities in the state of New Mexico. The fields requested from the cursor were the city NAME, the city FEATURE type and the 2000 Census population data (POP_2000). I found that unpacking the SearchCursor output into variables made formatting the print statement vastly easier and the code much cleaner looking.

The biggest challenge I had was populating the dictionary with the county seats. I eventually incorporated the process into the for loop of the SearchCursor with statement. I used the W3Schools web site to narrow down the code to use, and with the previously assigned variables, creating the dictionary was simple.

Looking at the example output included in the Lab assignment PDF, I opted to expand upon the formatting. Being the county collector that I am, I incorporated the COUNTY name field so that the output listed the associated county with each seat. Furthermore, after discovering that three of the entries lacked population data, I implemented an if statement. The missing data values showed -99999. Since that value was an integer, I first cast it as a string, and then reassigned the population variable to read "not available".
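
A sketch of the same steps on invented rows: unpacking each tuple into variables, swapping the -99999 placeholder for "not available", and filling the dictionary. Using seat names as keys and populations as values is an assumption about the dictionary's layout, and this version compares the integer directly instead of casting it to a string first.

```python
# Invented rows shaped like SearchCursor output for the fields
# ["NAME", "FEATURE", "POP_2000", "COUNTY"].
rows = [
    ("Town A", "County Seat", 5000, "Alpha"),
    ("Town B", "County Seat", -99999, "Beta"),
]

county_seats = {}
report = []
for row in rows:
    name, feature, population, county = row  # unpack the tuple into variables
    if population == -99999:                 # placeholder for missing data
        population = "not available"
    county_seats[name] = population          # key: seat name, value: population
    report.append(f"{name} ({county} County), population: {population}")

print("\n".join(report))
print(county_seats)  # {'Town A': 5000, 'Town B': 'not available'}
```
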
Output showing completion of Geoprocessing tools
Collage of screen shots showing the output of the Module 5 Python Script


Wednesday, June 5, 2024

Geoprocessing with Python scripts and Models in GIS

The two main focuses of this week's Lab assignment in GIS Programming were an introduction to ModelBuilder in ArcGIS Pro and coding a geoprocessing script from scratch. The lessons show that geoprocessing tools can be run solely with Python scripts and that the process can be automated using models. Both use the ArcPy package, which contains several modules and other elements that add functionality to Python.

Geoprocessing is a series of actions performed on geographic data where there is an input or multiple inputs, a process or task, and then an output of data. There are two general categories of geoprocessing tools in ArcGIS Pro. There are the system or built-in tools created by ESRI. Then there are custom tools, including models and scripts, created by a user or a third-party.

ModelBuilder is used to run a sequence of tools to obtain a desired result. The application uses four elements: data variables, which reference data on disk; value variables, which hold values in formats such as linear units; connectors; and tools. ModelBuilder uses a graphical interface with a layout somewhat similar to the program flowcharts designed in previous modules. The model elements are color coded to aid in their classification.

ArcGIS Pro Model Builder
The model I developed showing the automation of using three Geoprocessing tools

With sample data provided, the model created for Module 4 took a polygon layer of soils and clipped it to a polygon layer of a basin. The clipped soils were then filtered to select only the parcels deemed unsuitable for farming. That subset was then removed from the clipped soils layer, leaving only the soils within the original basin that are suitable for farming.
Polygon feature class showing soils suitable for farming
Output layer of suitable soils for farming

If you had this kind of data over a large area, such as a county or state, rerunning this model for specific locations could save a significant amount of time. ArcGIS Pro can also export this model as a standalone Python script. The only caveat is making sure the workspace environment, the default location for the files being processed, is set. This typically means using more explicit information such as full file paths.

What is an environment? Our textbook describes environments as "essentially hidden parameters that influence how a tool runs." Environments are properties of the env class in ArcPy. In object-oriented programming (OOP), classes are used to create objects, which in turn have properties and methods that can be used. Classes are often implemented to avoid using long and complicated strings in code. The OOP concept is still somewhat fuzzy to me, but it is becoming clearer with continued use of Python.
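
To make the class, object, property and method vocabulary concrete, here is a small hypothetical class loosely modeled on the idea behind arcpy.env. It is not ArcPy; in ArcPy itself the equivalent settings would be made with lines such as arcpy.env.workspace = path and arcpy.env.overwriteOutput = True. The folder path is invented.

```python
class Settings:
    """Hypothetical class loosely modeled on the idea behind arcpy.env:
    properties hold settings that other code can read."""

    def __init__(self, workspace=None, overwrite_output=False):
        self.workspace = workspace                # property: default folder
        self.overwrite_output = overwrite_output  # property: a simple flag

    def full_path(self, name):
        """Method: build a path relative to the current workspace."""
        return f"{self.workspace}/{name}"


env = Settings(workspace="C:/Data/Module4")  # create an object (an instance)
env.overwrite_output = True                  # set a property
print(env.full_path("hospitals.shp"))        # C:/Data/Module4/hospitals.shp
```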

Using Python code to perform geoprocessing tasks was not as difficult as anticipated. The three tasks to complete started with a point feature class of hospitals as the input data. The first geoprocessing tool added X and Y coordinate fields based upon the coordinate system of the dataset. The second created a 1,000 meter buffer around each point, while the last dissolved the individual buffers into a single feature in a completely separate feature class.

Geoprocessing Python Script Flow Chart
Flowchart showing the general behavior of the Python script

I approached writing this script by researching the syntax for the three geoprocessing tools. With a basic understanding of the required and optional parameters, coding was fairly straightforward. Trying out some of the syntax options, I hard-coded parameters such as the input feature layer name while assigning a variable for another function argument.

I also tried out separate nomenclature for calling the tools. A tool can be called as a function, arcpy.<toolname>_<toolboxalias>(<parameters>), or through its toolbox as a module followed by the tool as a function, arcpy.<toolboxalias>.<toolname>(<parameters>). For example, the Buffer tool can be called as arcpy.Buffer_analysis() or as arcpy.analysis.Buffer(). The main difference is the use of an underscore or a dot between the tool name and the toolbox alias.

I used comments on most lines of the script so I can return to it for reference. With the final line of code written, I ran the script and encountered an error referencing incorrect parameters for the Dissolve tool. I then implemented a try-except statement and quickly identified that I had forgotten a comma between the in_features and out_feature_class parameters. With that fixed, the script ran successfully!
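
One way a missing comma can produce a parameter error instead of a syntax error: when both arguments are string literals, Python silently joins adjacent literals into a single string. Whether that is exactly what happened in the Lab script is an assumption; dissolve() below is a hypothetical stand-in, not the real Dissolve tool.

```python
def dissolve(in_features, out_feature_class, dissolve_field=None):
    """Hypothetical stand-in for a tool; it just returns what it received."""
    return in_features, out_feature_class, dissolve_field


error_message = None
try:
    # Missing comma: the two literals become ONE string, so only one
    # positional argument reaches the function.
    dissolve("buffers.shp" "dissolved.shp")
except TypeError as e:
    error_message = str(e)
    print("Caught:", error_message)

# With the comma restored, both arguments arrive separately.
result = dissolve("buffers.shp", "dissolved.shp")
print(result)  # ('buffers.shp', 'dissolved.shp', None)
```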

Successful run of a Python script running Geoprocessing Tools





Saturday, June 1, 2024

Debugging and Error Handling in Python

It has been a busy week outside of class, which made processing this week's module on Debugging and Error Handling more challenging. I entered this module feeling overwhelmed by just the concept of debugging Python. But as I worked through the exercises and reading, I realized that I already have experience with some debugging practices from editing PHP scripts for AARoads. The textbook, Python Scripting for ArcGIS Pro, also continues to be straightforward, with good example blocks of code.

This week we also gained more experience with creating program flowcharts. The try-except statement is the focus of this week's final Lab assignment:

Python Try-Except program flowchart


The reading and Lab exercises for Module 3 provide an overview of two of the three main types of errors encountered in Python programming, along with methods for handling them. Logic errors are the third type; they occur when a script runs but produces undesired results. I have encountered this on a number of occasions when testing PHP scripts, where the webpage output the wrong data or only a portion of the data. Logic errors are often difficult to track down, as they do not generate meaningful error messages.

Syntax errors are the first type and the easiest to comprehend. They are akin to typographical errors in writing. Mistyped keywords are a common syntax error, while a mistyped variable or function name usually surfaces only when the script runs, as a NameError. Other syntax errors relate to misplaced or missing punctuation and case sensitivity. An aspect somewhat unique to Python is indentation, where inconsistent spacing or mixing spaces and tabs can result in a syntax error.

A useful feature built into IDLE, the editor that ships with Python, is the syntax check option. Accessed by selecting Check Module from the Run menu, it opens a pop-up window describing any syntax error it detects, and otherwise returns to the cursor if none are present.
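
The difference between an error caught before a script runs and one found only at run time can be shown with Python's built-in compile() function, which parses code without executing it, much like a syntax check. The two snippets are invented examples.

```python
# compile() checks syntax without running the code, similar in spirit
# to IDLE's Run > Check Module.
bad_indent = "if True:\nprint('hello')\n"
try:
    compile(bad_indent, "<example>", "exec")
except SyntaxError as e:  # IndentationError is a subclass of SyntaxError
    syntax_error_name = type(e).__name__
    print(syntax_error_name, "on line", e.lineno)

# A misspelled variable name passes the syntax check...
typo = "count = 1\nprint(cuont)\n"
code = compile(typo, "<example>", "exec")

# ...and only fails when the code actually runs.
try:
    exec(code, {})
except NameError as e:
    runtime_error_name = type(e).__name__
    print("NameError:", e)
```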

Part 1 of the Lab exercise for Module 3 introduced us to a script with syntax errors. Correcting instances of variable names resulted in the successful output of the Python script:

Successful Python Script Output

The second main type of error covered is exceptions. An exception is how a programming language differentiates between a normal course of events and something exceptional. When Python encounters an error in a script, it "throws" (in Python terms, raises) an exception, which usually means that the script stops running. If the exception object, which describes the error, is not handled or "trapped", the script terminates with a runtime error.

Part 2 of the Lab for Module 3 included two examples of exceptions, one of which was a common syntax error. With those errors corrected, the results:

Output of corrected Python Script

With some knowledge of the main Python error types, debugging can correct or clean up bad code. Debugging is the methodical process of finding errors in a script. Basic principles of the process include carefully reviewing the content of generated error messages, adding print messages to a script, selectively commenting out code, and using a Python debugger.

There are many types of exceptions in Python, and the built-in ones are included in the builtins module. A named exception is one where a specific exception is referred to by name. There is also the generic exception, which is referenced as an unnamed exception. Having an idea of what type of error occurred is beneficial in correcting it.
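
A short sketch of a named exception handled ahead of a generic one. The function is hypothetical; except Exception serves as the generic catch-all here, which is the usual idiomatic alternative to a bare except: clause.

```python
def safe_divide(a, b):
    """Hypothetical example of named vs. generic exception handling."""
    try:
        return a / b
    except ZeroDivisionError:   # named exception: handled specifically
        return "cannot divide by zero"
    except Exception as e:      # generic exception: catches everything else
        return f"unexpected {type(e).__name__}"


print(safe_divide(10, 2))    # 5.0
print(safe_divide(10, 0))    # cannot divide by zero
print(safe_divide(10, "2"))  # unexpected TypeError
```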

The use of print messages, where output is produced midway through a script just to determine whether it functioned properly up to that point, is a tactic for dealing with unnamed exceptions. Commenting out code involves isolating a problematic line or block of code to determine whether it is preventing the rest of the script from executing. I have used both of these methods when debugging PHP scripts for AARoads. As it stands right now, the current PHP that generates the live pages contains a problematic block of code that I commented out until I can find a solution.

Part 3 of the Lab assignment for Module 3 focuses on the try-except statement. This method of error handling prevents a script from producing a run-time error while also reporting a meaningful error message to the user. It allows a script to continue beyond the exception and finish normally. In other words, code following the trapped error will still be executed.

The try-except statement isolates a problematic block of code between the try statement and an except clause. The exception can be assigned to a variable e and subsequently printed to display information on the error in question:

Successful run of a two part Python Script
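
The try-except pattern described above, in miniature: the exception is bound to e, its message is printed, and the lines after the try-except statement still run. math.sqrt(-1) is simply a convenient way to raise an error; it is not from the Lab script.

```python
import math

results = []
try:
    value = math.sqrt(-1)  # raises ValueError: math domain error
    results.append(value)
except Exception as e:
    print("Error:", e)     # report a meaningful message
    results.append("failed")

# Code after the trapped error still runs.
results.append("finished")
print(results)  # ['failed', 'finished']
```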

The flowchart at the beginning of this post illustrates the approach I made in implementing the try-except statement on the script for Part 3. While I quickly identified an error in the script, my use of the try-except statement did not produce the expected results. Instead the exception message changed from one type to another, with a remaining run-time error preventing the rest of the script from executing.

The Lab instructions mentioned modifying the script by adding try-except statements. With this in mind, I considered whether or not multiple statements were needed.

Questions to answer were, did the initial exception object in the script trigger subsequent exceptions? If so, how many additional ones, and where are they located? What ultimately worked was shifting my placement of the except statement to where it trapped all exception objects.

The wisdom gained here is not to assume that an exception object always stands independent of others. A variable omission or misspelling can easily have a ripple effect that continues through the rest of the script.
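
A hypothetical illustration of that ripple effect: a typo in the first statement means a variable is never assigned, so a separate try-except around the next statement just trades one exception for another, while a single try block around both traps the original error once. Inside a function, the follow-on error is an UnboundLocalError, a subclass of NameError.

```python
import math


def separate_blocks():
    """Each statement gets its own try-except."""
    caught = []
    try:
        radius = float("1OOO")        # typo: letter O instead of zero
    except ValueError as e:
        caught.append(type(e).__name__)
    try:
        area = math.pi * radius ** 2  # radius was never assigned
    except NameError as e:            # catches UnboundLocalError too
        caught.append(type(e).__name__)
    return caught


def one_block():
    """One try-except wraps every statement that depends on the first."""
    caught = []
    try:
        radius = float("1OOO")
        area = math.pi * radius ** 2
    except Exception as e:
        caught.append(type(e).__name__)
    return caught


print(separate_blocks())  # ['ValueError', 'UnboundLocalError']
print(one_block())        # ['ValueError']
```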