Friday, June 21, 2024

Working with Geometries in Python

Module 6 introduces both working with geometries with Python and external text files or tabular data. Using geometry objects in memory can be used for input and output in geoprocessing. This also circumvents the need to make temporary feature classes. Utilizing the open() function, fileinput module, and write module, text from .txt, .csv and other files can be imported into Python or exported to a file outside of ArcGIS Pro. The readlines() and writelines() methods also work with external files, but on a line by line basis, sometimes with iteration through a loop. These aspects further add to the utility of automating processes with Python.

The Lab assignment for Module 6 is to write a Python script that creates a new text file outside of ArcGIS, and to populate it with geometry data from a feature class of polylines for rivers in Maui, Hawaii.

Maui, Hawaii Rivers feature class
The Maui, Hawaii rivers feature class for Module 6

Following is a graphic I made in an attempt to better explain how to work with geometries in Python. The getPart() method, which returns an array of point objects for each geometry part, is the crux of the assignment. 

Diagram outling Feature Class Architecture and the getPart Method


The SearchCursor function is used to read feature class geometries. Using the SearchCursor function requires several iterations of a feature class to return the points or vertices that make up the geometry of a feature. All polylines and polygons in a feature class are comprised of point objects.

Every feature class is comprised of individual features, things we can see and interact with in ArcGIS Pro. Each feature has a record, which is tabulated in an attribute table. The rows of an attribute table contain the records of every feature found within the feature class. The columns contain the attributes based upon a field.

When using a SearchCursor function to work with geometries, the initial loop iterates through the records and returns a tuple with two objects for each feature. The first variable is the row index number and the second is the polyline object. A polyline object is a shape defined by one or more paths.

A second iteration within the first iteration is required to ultimately return point objects from each polyline object. This is what is known as a nested loop. The for loop that is the nested loop iterates through each polyline object and returns an array object. Part of the ArcPy package, arrays are groupings of points that comprise the geometry of a feature. An array in other programming languages  is a list.

A second nested loop iterates through the array object to extract the point object, which contains the coordinate information of the vertices that make up each feature. The getPart() method however allows a script to bypass this third iteration by coupling the operation with the previous for loop that returns the array object of a feature. That is because the polyline object has this method, which receives an input parameter and index, and returns an array of point objects.

The getPart() method iterates through the tuple output by the SearchCursor function. The index inputs the first element of the tuple, which for the SearchCursor is based upon the field names argument. Part in the function name references the return of an array object of point objects for a particular part of the geometry, which for Module 6 is index 0, the FID number. The method iterates off the polyline object returned from the tuple.

The flowchart for the Module 6 script outlines the SearchCursor function and one nested loop used to obtain point objects for each part of the array object of a feature:

Nested Loops Program Flowchart

This becomes a little more complicated with multipart features, where a feature is comprised of multiple arrays of point objects making up one set of attributes. Scenarios of this are where a feature contains polygons that are not physically connected, or where empty (interior) polygons fall within a larger (exterior) polygon. Think of St. Martin Parish, Louisiana, which consists of two areas separated by Iberia Parish in between, or Shively, Kentucky, a city encircled on all sides by Louisville.

With multipart features, an iteration over all distinct parts comprising the overall feature is required to return its geometry. In summary, the polyline object returned from the SearchCursor contains array objects for each part. Within those array objects is another array which contains the point objects.

The output of the script compiled for Module 6 produced a formatted text file for each river feature in the Maui rivers feature class:

Module 6 text file screenshot of output data

Data output from point objects in the writelines() method produced the ID number and the X-coordinate and Y-coordinate for each vertex of all records in the rivers feature class. This in turn was written to the new text file created at the beginning of the script with the open() function.

The vertex number for the vertices were assigned as part of the nested loop on the array object, where a variable vertexnum increased by one for each pass. The getPart() method also returned the name object of each vertex, which was derived from the SearchCursor iteration.

‘Ohe’o Gulch in Maui, Hawaii

‘Ohe’o Gulch, downriver from Palikea Stream, which is one of the features from the Module 6 dataset

My wife and I visited Maui in January 2011 and drove the iconic Hana Highway to Haleakalā National Park. The above photo is one of ‘Ohe’o Gulch, which is connects Palikea Stream to the Pacific Ocean at the southeast end of the island.

No comments:

Post a Comment