Wednesday, July 17, 2024

Visualizing data in 3D with GIS

The course work for GIS Applications Module 3 looks further into 3D visualization, Line of Sight (LOS) analysis and Viewshed analysis with ArcGIS Pro. We are tasked with completing four exercises through ESRI training and reading an article assessing the Lines of Torres Vedras, the defensive lines constructed to defend the Portuguese capital of Lisbon against Napoleon in 1810, with LOS analysis in GIS.

Presentation of GIS data in three dimensions has a variety of potential benefits for the end user. 3D visualization can help identify patterns that may not be apparent in 2D. 3D data can provide context or a different perspective for a presentation or display of various phenomena. Allowing users to see vertically stacked content, a 3D scene can also go beyond terrain modeling.

3D scene of San Diego, California
Downtown San Diego 3D scene with marker symbols, fill symbol layers and procedural symbology

Furthermore, 3D visualization can spark imagination and enhance understanding while also accenting environmental effects, such as shadowing and depth. Rendering quantitative data in terms of three-dimensional shapes, 3D visualization can also more strongly contrast data ranges.

3D scene of San Diego visualized with a Summer sun angle
San Diego 3D scene with a procedural rule package for buildings and illumination visual enhancement

The ESRI Get Started with Visibility Analysis learning plan covered three-dimensional basics in GIS. Any data that uses a z-value, or vertical coordinate system, can be considered 3D. Z-values do not have to represent elevation or height, and instead can be some numerical value of a particular attribute, such as population or precipitation amount.

Vertical coordinate systems comprise both a linear unit of measurement and a direction. The direction defines whether values are positive up or positive down, where positive down represents depths below a surface. A scene also requires information about where to start measurements, which is provided by the ground or elevation surface.

A two-dimensional (2D) map is a top-down or orthographic view of data that shows a flat representation of features. A 3D scene is viewable from any angle. A midpoint between the two is a functional surface. While technically not 3D, a functional surface contains a single associated z-coordinate for every x,y coordinate pair. 3D data, on the other hand, may be associated with multiple z-values at a single point or vertex.

A raster dataset can be used to create a 3D rendering. One of the most common is a digital elevation model (DEM), where each cell of a raster stores an elevation value. A more detailed 3D rendering revealing sharp, defined edges is a triangulated irregular network (TIN). Also a type of functional surface, a TIN consists of nonoverlapping triangles that border each other and vary in size and proportion.

Triangular Irregular Network (TIN)
Triangulated irregular network (TIN)
Input points for a TIN form the triangle vertices (nodes). These vertices are connected by lines that form the legs of the triangles (edges). Using mathematical interpolation, the elevation for any location on a TIN surface can be obtained. Calculations also produce the slope and aspect for each triangle face, where aspect is the compass direction that the slope faces.

Data in a 3D scene must include a height source or base height (elevation source), which is the height at which it is displayed. This can be stored in a feature's geometry as a z-value or as an attribute value in a table. The elevation type of a layer determines how to draw data in relation to the ground. The surface provides context for the elevation of the data and the basis for layer measurements.

30-meter resolution
3-foot resolution

With 30-meter resolution, WorldElevation3D/Terrain3D is the default elevation surface service from ArcGIS Online. Some projects may require an elevation dataset with higher resolution to reveal more detail.

You must determine how to draw 2D features in relation to the ground surface when visualizing in a 3D scene. The elevation type for a feature built from the ground up is referenced as on the ground. Features drawn at a consistent or known height above the ground are referenced at an absolute height. Features with variable heights, including subsurface depths, are positioned relative to the ground.

Height variables can be manipulated using cartographic offset and vertical exaggeration among other techniques.

Vertical Exaggeration
Horizontal desert terrain enhanced visually with vertical exaggeration by amplifying relief

Cartographic Offset
Setting the cartographic offset by 50 feet in this scene reveals point locations previously obscured

3D Features

2D features can be displayed in 3D by using extrusion. Extrusion creates a 3D shape using a numerical attribute as a z-value. This could be points on a map representing precipitation amounts, where the height of the column is the amount. It can also be used to display a building footprint as a 3D object if a value for height is included in the attribute data.

3D buildings at the UWF Campus from Intro to GIS
3D objects of buildings on the UWF campus from the Geoprocessing lab in Intro to GIS last Fall
Extrusion of parcels based upon property value
The z-value of these extruded polygons in Manhattan, Kansas is based upon the property value of the parcel
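
Extrusion of 2D footprints can also be scripted. Below is a minimal ArcPy sketch, assuming a hypothetical "BuildingFootprints" feature class with a "HEIGHT" attribute and an available 3D Analyst license (names and paths are illustrative only):

    # Convert 2D building footprints to 3D features using an attribute as the
    # z-value (hypothetical layer, field and workspace names).
    import arcpy

    arcpy.env.workspace = r"C:\GIS\Campus.gdb"
    arcpy.CheckOutExtension("3D")

    # Each footprint is raised to the value stored in its HEIGHT field,
    # producing 3D features comparable to extruding the layer in a scene.
    arcpy.ddd.FeatureTo3DByAttribute(
        in_features="BuildingFootprints",
        out_feature_class="BuildingFootprints_3D",
        height_field="HEIGHT")

    arcpy.CheckInExtension("3D")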

A common way to create 3D features is to use a mesh. A mesh is a collection of triangles combined to create models of the real world. Multiple z-coordinates can be associated with an x,y coordinate in a mesh, unlike a TIN where each x,y pair has only one z-value. Integrated meshes represent discrete objects on the ground surface as a single 3D object and do not have feature attributes.

Multipatch features are a type of 3D geometry representing the outer surfaces, or shells, of features that occupy a discrete area or volume in three-dimensional space. They consist of 3D rings and triangles, which can be used to represent both simple and complex objects. Multipatches can store texture, image, color and transparency within a feature's geometry. Buildings rendered in Google Earth utilize multipatches.

As covered in the previous Module, point clouds from LiDAR data also produce 3D scenes. Stored in a LAS file, the mass of coordinates in a point cloud consists of highly accurate x,y,z measurements of the earth's surface.

LiDAR Point Cloud data of an overpass at State College, PA
LiDAR Point Cloud data showing part of I-99 at State College, PA. Structures like overpasses, where the laser pulses cannot reach the ground level, reveal portions of the basemap.

3D Visualization

Within ArcGIS Pro there are three views that an analyst can start with. The map view can contain 3D data but visualizes it in 2D, drawing both the same way. Visualizing data in 3D requires converting to one of two 3D viewing modes, which display 3D content from a real-world perspective.

The Global scene is used for data covering a large extent where the curvature of the earth is important for analysis. The Local scene covers other cases, where the earth's curvature is not necessary for analysis and the content covers a smaller, fixed extent. As to which mode to use, questions to ask include: What is the minimum surface resolution? What is the appropriate basemap? Will thematic styling aid or detract from the GIS information being presented? Will the visualization include subsurface content?

A 3D scene should have multipatch features, thematically symbolized layers, an elevation surface, and either an imagery or topographic layer as a basemap. Structuring the content well in a 3D scene benefits the end user with reduced complexity and increased understanding. Furthermore, defining an area of interest (AOI) for the scene both limits the amount of data to be published (and geoprocessing time) and focuses on a subset of the data. This also reduces unneeded data that could distract the end user.

3D scene of Portland, Oregon
3D scene of Portland, Oregon where buildings and trees are multipatch features created from 2D data.

Geoprocessing for a 3D scene results in 3D lines or multipatch features using the 3D display properties of the input feature layer. The multipatch feature data is used to create a scene layer, which can then be saved as part of a scene layer package to be published to ArcGIS Online. Depending upon the permission settings for who can access it, the scene layer package can then be used in Scene Viewer and ArcGIS Pro.

View my 3D Buildings scene of Portland, Oregon from this week's exercise at https://pns.maps.arcgis.com/home/webscene/viewer.html?webscene=7e10ab3514b04c8796cd1b0f8c07d8f7

Line of Sight Analysis

With the definitions of 3D data, we can proceed with Line of Sight (LOS) analysis. A line of sight (LOS) calculates intervisibility along a straight line between an observer and target. This calculation factors in obstructions, which can include any combination of rasters (surfaces), TINs, extruded polygons or lines, and multipatch features. If the target is a line or polygon feature as opposed to a point, then a sampling distance can be input to determine the density of sight lines along the target.

The array produced by LOS analysis from an observer point at a fictional parade route
The array of sight lines generated from an observer to a target line feature during the LOS analysis.

The Line of Sight geoprocessing tool in ArcGIS Pro determines the visibility along sight lines. The result includes the TarIsVis attribute field, a Boolean value where 1 is visible and 0 is not visible. With these values, sight lines with obstructions can be removed from the analysis.

LOS lines with TarIsVis values of zero (not visible) removed
LOS lines where features with a TarIsVis value of zero were removed.
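
The same workflow can be scripted with ArcPy. This is a minimal sketch under assumed dataset names ("Observer", "ParadeRoute", "Terrain_DEM") and a 3D Analyst license, not the exact exercise steps:

    # Construct sight lines, evaluate visibility, and keep only unobstructed lines.
    import arcpy

    arcpy.env.workspace = r"C:\GIS\Parade.gdb"
    arcpy.CheckOutExtension("3D")

    # Sight lines from the observer point to the target line, sampled at a
    # set interval along the route.
    arcpy.ddd.ConstructSightLines("Observer", "ParadeRoute", "SightLines",
                                  sampling_distance=100)

    # Evaluate each sight line against the elevation surface.
    arcpy.ddd.LineOfSight("Terrain_DEM", "SightLines", "LOS_Output")

    # TarIsVis = 1 means the target is visible; keep only those lines.
    visible = arcpy.management.SelectLayerByAttribute(
        "LOS_Output", "NEW_SELECTION", "TarIsVis = 1")
    arcpy.management.CopyFeatures(visible, "LOS_Visible")

    arcpy.CheckInExtension("3D")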

Viewshed analysis models the visibility from a vantage point in up to a 360 degree view. The geoprocessing tool models the visibility of the horizon, shadows and lines of sight. The output is a raster showing the areas visible from the specified vantage point.

The Viewshed tool considers the height of the vantage point and obstructions surrounding it. Parameters that can be factored into the Viewshed tool control elevation values, vertical offsets, horizontal and vertical scanning angles and distance bounds. 
Optional Viewshed Settings
The various Fields in the Viewshed geoprocessing tool controlling vertical offsets, angles and radius

Additionally, while the tool is set to reflect visible light, the model can work with other wave-based transmissions, such as radar, by adjusting the refractivity coefficient.
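
As a rough illustration of those parameters, here is a minimal Viewshed2 sketch with hypothetical raster and point names and arbitrary example values:

    # Viewshed from a single vantage point (requires Spatial Analyst or 3D Analyst).
    import arcpy
    from arcpy.sa import Viewshed2

    arcpy.CheckOutExtension("Spatial")

    out_viewshed = Viewshed2(
        in_raster="Terrain_DEM",
        in_observer_features="VantagePoint",
        analysis_type="FREQUENCY",            # count of observers seeing each cell
        observer_offset=1.75,                 # vertical offset (eye height)
        outer_radius=2000,                    # distance bound
        horizontal_start_angle=0,             # horizontal scanning angles
        horizontal_end_angle=360,
        vertical_upper_angle=90,              # vertical scanning angles
        vertical_lower_angle=-90,
        refractivity_coefficient=0.13)        # default for visible light
    out_viewshed.save(r"C:\GIS\viewshed_out.tif")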



Monday, July 8, 2024

GIS applications using LiDAR

Module 2 for GIS Applications returns us to take a more in-depth look at the use of LiDAR, a topic briefly covered in previous courses. LiDAR (LIght Detection And Ranging) uses lasers, usually in the visible or near-infrared portion of the spectrum, to calculate heights by measuring distances between a scanner and a target area. The high energy pulses record the reflected response of different objects in a very narrow wavelength. This produces a point cloud, where the masspoints associated with each return are distributed throughout the target area at various densities. The densities vary depending upon the materials encountered by the laser pulses.

Interpolation of the individual masspoints creates a digital surface model (DSM), which shows the elevation characteristics of natural phenomena and man-made structures. Another procedure, the removal of LiDAR masspoints from the first, intermediate and/or last returns, produces a bare-Earth digital terrain model (DTM). LiDAR is also effective at measuring water depth relative to the water surface.

Digital Elevation Models (DEM) provide elevation data in a raster format. DEMs with 3, 10 and 30 meter resolution are available for most of the United States. The USGS catalogs DEMs and makes this data available through the Earth Explorer and National Map portals. Furthermore, states also provide DEM datasets through Clearinghouse Websites. 

LiDAR data is often delivered in a standard LAS format, which defines the file structure, content, storage order, naming, codes, etc. The standard LiDAR exchange file uses the .las file extension. The Lab for this week's module utilizes a LiDAR dataset provided by the Virginia Geographic Information Network (VGIN) covering a section of the Shenandoah Mountains at Big Meadows.

Looking north at the Shenandoah Mountains from Swift Run Overlook
North view along the Shenandoah Mountains from Skyline Drive. Photo by Andy Field.

Our textbook GIS Fundamentals references discrete-return LiDAR as the most common collection system. This system records specific values for each downward laser pulse. It produces a point cloud consisting of X, Y, and Z coordinates along with the intensity, scan angle, return order and other information.

Points in the LiDAR point cloud are assigned to feature types such as structures or vegetation. Standard codes identify ground, buildings and water. These are derived from return strength, point order, local slope, etc.

LiDAR data is widely used to estimate vegetation characteristics such as tree height, forest density, growth rates and forest type. The initial part of our lab focuses on calculating forest height starting with the conversion of the LAS file into both a DEM and a Digital Surface Model (DSM).

We first use the Point File Information geoprocessing tool in ArcGIS Pro on the LAS file to summarize the file with values for the minimum bounding rectangle, number of points, the average point spacing, and the min/max z-values. The DEM and DSM are created next using the LAS Dataset to Raster geoprocessing tool. With these two rasters, a calculation using the Minus tool outputs a raster populated with estimated heights.

The LAS file for Big Meadows at Shenandoah National Park and derived DEM
The LAS file for Big Meadows at Shenandoah National Park and the compiled DEM
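
A rough ArcPy sketch of this sequence follows. The file names are hypothetical, a LAS dataset (.lasd) referencing the LAS file is assumed to exist, and Spatial Analyst is needed for the Minus step:

    import arcpy
    from arcpy.sa import Minus

    arcpy.env.workspace = r"C:\GIS\BigMeadows"
    arcpy.CheckOutExtension("Spatial")

    # Summarize the LAS file: bounding rectangle, point count, average point
    # spacing, and min/max z-values.
    arcpy.ddd.PointFileInformation("BigMeadows.las", "las_info", "LAS")

    # Ground-classified points (class code 2) -> bare-earth DEM.
    arcpy.management.MakeLasDatasetLayer("BigMeadows.lasd", "ground_lyr",
                                         class_code=[2])
    arcpy.conversion.LasDatasetToRaster("ground_lyr", "dem.tif", "ELEVATION")

    # All returns -> digital surface model capturing canopy and structures.
    arcpy.conversion.LasDatasetToRaster("BigMeadows.lasd", "dsm.tif", "ELEVATION")

    # DSM minus DEM estimates feature (tree) heights.
    heights = Minus("dsm.tif", "dem.tif")
    heights.save("tree_height.tif")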

The subsequent step in the Lab calculates the biomass density of the forested area in question. Multipoint files for the ground and vegetation are created using the LAS to Multipoint geoprocessing tool on the point file previously created. These are then processed using the Point to Raster tool to output the respective rasters.

Continuing with geoprocessing, both the ground and vegetation multipoint rasters are further processed using the Is Null tool, which works like a Boolean: a value of 1 is assigned to cells that are null (NoData) and 0 to cells that are not. The Con tool then juxtaposes the IsNull rasters with the multipoint rasters for both sets: where a cell is null, a count of zero is assigned, while all other cell values are pulled from the original multipoint rasters. This produces rasters of the cell counts.

Tree Height Estimation and Raster Cell Count for data derived from LiDAR
The output tree height estimation and the statistics of raster cells (points) versus height

Working forward to calculate the density of the returns, the Plus tool combines the counts for both the vegetation and ground. This results in a raster where all cell counts are assigned an integer value from zero to 23. After converting the raster values from integer to float (decimals), the tree canopy density can finally be calculated. This is completed by using the Divide geoprocessing tool on the raster of vegetation cell counts and the combined vegetation/ground counts raster with float values.
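
A minimal sketch of that raster algebra, with hypothetical raster names, looks like this:

    import arcpy
    from arcpy.sa import IsNull, Con, Plus, Float, Divide

    arcpy.CheckOutExtension("Spatial")

    # IsNull returns 1 where a cell is NoData; Con assigns 0 there and keeps
    # the original return count everywhere else.
    veg_count = Con(IsNull("veg_points.tif"), 0, "veg_points.tif")
    ground_count = Con(IsNull("ground_points.tif"), 0, "ground_points.tif")

    # Total returns per cell, converted to float so the division yields decimals.
    total = Float(Plus(veg_count, ground_count))

    # Canopy density = vegetation returns / all returns, ranging from 0.0 to 1.0.
    density = Divide(Float(veg_count), total)
    density.save(r"C:\GIS\canopy_density.tif")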

Canopy Density derived from LiDAR of Shenandoah National Forest at Big Meadows, VA
Forest Canopy Density of the Big Meadows area of Shenandoah National Park derived from LiDAR




Wednesday, July 3, 2024

Crime Analysis in GIS

Our first topic in GIS Applications is crime analysis and the use of crime mapping for determining crime hotspots. Crime mapping techniques provide insight into the spatial and temporal distributions of crime. This benefits criminologists in the research community and professionals in law enforcement.

Crime mapping factors in the importance of local geography as a reason for crime, considering that it may be as important as criminal motivation. Identifying patterns and hotspots through crime mapping tends to be a precursor to implementing effective crime prevention methods.

Fundamental to crime mapping is spatial autocorrelation, which acknowledges the spatial dependency of values measured within areas. This recognizes that crime in one area can influence the crime rate of a nearby area.

We are tasked this week with quantifying data and generating hotspot maps showing crime density using various methods on the clustering of events. The Lab for Module 1 works with crime data for Washington, DC and Chicago.

Kernel Density Map showing crime hotspots for assaults with dangerous weapons in 2018
Output in this week's lab, a kernel density map showing 2018 crime hotspots for Washington, DC

A relative measure, a crime hotspot represents an area with a greater than average frequency of criminal or disorderly events. An area where people have an above-average risk of victimization can also be classified as a crime hotspot. Victimization, however, cannot always be shown on maps, as the theory refers to multiple incidents against the same individual, regardless of location. A hotspot can also be represented by a street (line) or a neighborhood (polygon) where repeat occurrences take place.

Determining crime hotspots can aid in detecting spatial and temporal patterns and trends of crime. The concept can benefit law enforcement in better allocating resources to target areas. Crime hotspots can also be used to identify underlying causes for crime events.

The concept of local clustering, concentrations of high data values, is the most useful for crime analysis. Methods determine where clusters are located and produce a hotspot map showing concentrations.

Point data can be used directly in this analysis of clusters. A collection of points can produce a hotspot whose bounds are derived from the local density of points. Using point data also has the advantage of not being constrained by a predetermined jurisdictional boundary. Data aggregated into meaningful areas, such as jurisdictional polygons holding the highest values, can also reveal hot spots. Aggregation can produce crime rates, such as the number of events per number of residents or per household for an area.

Aggregated data showing the number of crimes per 1,000 households
Choropleth map with aggregated data determining the crime rate for Washington, DC. 

The Lab for Module 1 focuses on three methods for local clustering. Grid-Based Thematic Mapping overlays a regular grid of polygons above point data of crime events. This produces a count of events for each grid cell. Since all cells are uniform in dimensions, the count is equivalent to a density.

The choropleth map output showing crime density can be further analyzed to determine the crime hotspots. Extracting the crime hotspots involves selecting the highest class of the data. Quintile classification is commonly used to determine this.

The data provided in the Lab included a point feature class of homicides reported in the city of Chicago for 2017. Additionally, we were supplied with a polygon feature class of 1/2 mile grid cells clipped to the Chicago city limits.

The grid cells and point data for Chicago were spatially joined, and grid cells where the homicide value was zero were removed from the analysis. Using quintile classification, the top 20% of grid cells based on homicide values were extracted to generate a hotspot map:

Grid-Based Thematic Map of Chicago where three or more homicides were recorded in 2017
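
A condensed ArcPy sketch of that process, with hypothetical feature class names (the count field produced by Spatial Join is Join_Count):

    import arcpy

    arcpy.env.workspace = r"C:\GIS\Chicago.gdb"

    # Count homicide points falling within each half-mile grid cell.
    arcpy.analysis.SpatialJoin("GridCells", "Homicides2017", "Grid_Homicides",
                               join_operation="JOIN_ONE_TO_ONE")

    # Collect counts for cells with at least one homicide, then find the break
    # value for the top quintile (highest 20 percent of cells).
    with arcpy.da.SearchCursor("Grid_Homicides", ["Join_Count"]) as cursor:
        counts = sorted(row[0] for row in cursor if row[0] > 0)
    top_break = counts[int(len(counts) * 0.8)]

    # Extract the hotspot cells into their own feature class.
    hotspots = arcpy.management.SelectLayerByAttribute(
        "Grid_Homicides", "NEW_SELECTION", f"Join_Count >= {top_break}")
    arcpy.management.CopyFeatures(hotspots, "Grid_Hotspots")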

Using point values, Kernel Density can also be used to calculate a local density without the use of aggregation. The estimation method utilizes a user-defined grid over the point distribution. A search radius, known as the bandwidth, is applied to each grid cell. Using these two parameters, the method calculates weights for each point within the kernel search radius.

Points closer to the grid cell center are weighted more and therefore contribute more to the total density value of the cell. The final grid cell values are derived by summing the values of all circle surfaces for each location. For the Lab, we used the Spatial Analyst Kernel Density tool in ArcGIS Pro. Inputs were the grid cell size and bandwidth, run on the 2017 homicides feature class for Chicago. The output was a raster file with ten classes.

Since we were only interested in areas with the highest homicide rate, we reclassified the raster data into two classes. The upper class ranged from a value three times the mean to the maximum value of the raster data. This represented the crime hotspot as estimated with kernel density:

Continuous surface map showing the crime hotspots for Chicago based upon 2017 homicide point data
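
A minimal sketch of the kernel density step with hypothetical names and example cell size and bandwidth values (not the exact lab parameters):

    import arcpy
    from arcpy.sa import KernelDensity, Con

    arcpy.env.workspace = r"C:\GIS\Chicago.gdb"
    arcpy.CheckOutExtension("Spatial")

    # Density surface from the 2017 homicide points: example 100 ft cells
    # with a half-mile (2,640 ft) search radius (bandwidth).
    kd = KernelDensity("Homicides2017", "NONE", cell_size=100, search_radius=2640)

    # Threshold at three times the mean density to isolate the hotspot cells
    # in a two-class result.
    hotspot = Con(kd >= (3 * kd.mean), 1)
    hotspot.save("KD_Hotspots")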

Local Moran's I is the final method implemented on the 2017 Chicago GIS data for Module 1. A measure of spatial autocorrelation, the Moran's I method addresses the question: are nearby features similar? Features that are closer to each other are assumed to be more similar to one another than those located farther apart. The global Moran's I produces a single statistic that reveals whether a spatial pattern is clustered by comparing the value at any one location with the value at all other locations.

The result of Moran's I varies between -1.0 and +1.0. Positive values correspond to positive spatial autocorrelation (clustering) and negative values to negative autocorrelation. Where points that are closer together have similar values, the Moran's I result is high. If the point pattern is random, the value will be close to zero.

For the Lab, the homicides feature class and census tract data were spatially joined, and a field calculating the number of homicides per 1,000 units was added. This feature class in turn was input into the Cluster and Outlier Analysis (Anselin Local Moran's I) Spatial Statistics tool to output a new feature class based upon Local Moran's I. The result includes attribute data revealing two types of clusters: High-High (HH), representing clusters of high values, and Low-Low (LL), representing clusters of low values.

High-high clusters in the context of the Chicago crime data represent areas with high homicide values in close proximity to other areas with high homicide values. These are the crime hotspots:

Crime hotspots derived from 2017 homicide data for Chicago using the Local Moran's I method
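
A minimal sketch of this step with hypothetical names; the joined tract feature class and its homicide rate field are assumed to already exist:

    import arcpy

    arcpy.env.workspace = r"C:\GIS\Chicago.gdb"

    # Cluster and Outlier Analysis (Anselin Local Moran's I) on the rate of
    # homicides per 1,000 units in each census tract.
    arcpy.stats.ClustersOutliers(
        "Tracts_Homicides", "Rate_Per1000", "Tracts_LocalMoran",
        "INVERSE_DISTANCE", "EUCLIDEAN_DISTANCE", "NONE")

    # The COType field labels the clusters; HH marks high values surrounded
    # by high values -- the crime hotspots.
    hotspots = arcpy.management.SelectLayerByAttribute(
        "Tracts_LocalMoran", "NEW_SELECTION", "COType = 'HH'")
    arcpy.management.CopyFeatures(hotspots, "Moran_Hotspots")
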
Sources:

Ratcliffe, J. (2010). Crime Mapping: Spatial and Temporal Changes. In Handbook of Quantitative Criminology (pp. 5-8). Springer, New York, NY.

Eck, J.E., Chainey, S., Cameron, J.G., Leitner, M., & Wilson, R.E. (2005). Mapping Crime: Understanding Hot Spots. National Institute of Justice (NIJ) Special Report.

Friday, June 21, 2024

Working with Geometries in Python

Module 6 introduces working with geometries in Python as well as with external text files and tabular data. Geometry objects in memory can be used for input and output in geoprocessing, which circumvents the need to make temporary feature classes. Utilizing the open() function, the fileinput module, and the write() method, text from .txt, .csv and other files can be imported into Python or exported to a file outside of ArcGIS Pro. The readlines() and writelines() methods also work with external files, but on a line by line basis, sometimes with iteration through a loop. These aspects further add to the utility of automating processes with Python.

The Lab assignment for Module 6 is to write a Python script that creates a new text file outside of ArcGIS, and to populate it with geometry data from a feature class of polylines for rivers in Maui, Hawaii.

Maui, Hawaii Rivers feature class
The Maui, Hawaii rivers feature class for Module 6

Following is a graphic I made in an attempt to better explain how to work with geometries in Python. The getPart() method, which returns an array of point objects for each geometry part, is the crux of the assignment. 

Diagram outlining Feature Class Architecture and the getPart Method
Diagram outlining Feature Class Architecture and the getPart Method

The SearchCursor function is used to read feature class geometries. Using the SearchCursor function requires several iterations over a feature class to return the points, or vertices, that make up the geometry of a feature. All polylines and polygons in a feature class are composed of point objects.

Every feature class is comprised of individual features, things we can see and interact with in ArcGIS Pro. Each feature has a record, which is tabulated in an attribute table. The rows of an attribute table contain the records of every feature found within the feature class. The columns contain the attributes based upon a field.

When using a SearchCursor function to work with geometries, the initial loop iterates through the records and returns a tuple with two objects for each feature. The first variable is the row index number and the second is the polyline object. A polyline object is a shape defined by one or more paths.

A second iteration within the first iteration is required to ultimately return point objects from each polyline object. This is what is known as a nested loop. The for loop that forms the nested loop iterates through each polyline object and returns an array object. Part of the ArcPy package, arrays are groupings of points that comprise the geometry of a feature. An array is analogous to a list in other programming languages.

A second nested loop iterates through the array object to extract the point objects, which contain the coordinate information for the vertices that make up each feature. The getPart() method, however, allows a script to bypass this third iteration by coupling the operation with the previous for loop that returns the array object of a feature. That is because the polyline object has this method, which receives an index as an input parameter and returns an array of point objects.

The getPart() method works off the polyline object returned in the tuple output by the SearchCursor function, whose elements are based upon the field names argument. The "part" in the method name references the array of point objects returned for a particular part of the geometry; for the single-part rivers in Module 6, index 0 returns the array for the entire feature.

The flowchart for the Module 6 script outlines the SearchCursor function and one nested loop used to obtain point objects for each part of the array object of a feature:

Nested Loops Program Flowchart
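
A minimal sketch of this structure, with hypothetical paths and field names:

    import arcpy

    fc = r"C:\GIS\Maui.gdb\rivers"

    with open(r"C:\GIS\rivers_vertices.txt", "w") as output:
        # Each row is a tuple: the object ID, the river name, and the
        # polyline geometry object.
        with arcpy.da.SearchCursor(fc, ["OID@", "NAME", "SHAPE@"]) as cursor:
            for oid, name, polyline in cursor:
                vertexnum = 0
                # getPart(0) returns the array of point objects for the first
                # (and, for single-part features, only) part of the geometry.
                for point in polyline.getPart(0):
                    vertexnum += 1
                    output.write(f"Feature {oid}: Vertex {vertexnum}: "
                                 f"X {point.X}, Y {point.Y}, Name: {name}\n")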

This becomes a little more complicated with multipart features, where a feature is comprised of multiple arrays of point objects making up one set of attributes. Scenarios of this are where a feature contains polygons that are not physically connected, or where empty (interior) polygons fall within a larger (exterior) polygon. Think of St. Martin Parish, Louisiana, which consists of two areas separated by Iberia Parish in between, or Shively, Kentucky, a city encircled on all sides by Louisville.

With multipart features, an iteration over all of the distinct parts comprising the overall feature is required to return its geometry. In summary, the polyline object returned from the SearchCursor contains an array object for each part, and within those array objects are the point objects.

The output of the script compiled for Module 6 produced a formatted text file for each river feature in the Maui rivers feature class:

Module 6 text file screenshot of output data

Data output from the point objects with the writelines() method produced the ID number and the X-coordinate and Y-coordinate for each vertex of all records in the rivers feature class. This in turn was written to the new text file created at the beginning of the script with the open() function.

The vertex numbers were assigned as part of the nested loop on the array object, where a variable vertexnum increased by one for each pass. The name of each feature, written alongside its vertices, was derived from the SearchCursor iteration.

‘Ohe’o Gulch in Maui, Hawaii

‘Ohe’o Gulch, downriver from Palikea Stream, which is one of the features from the Module 6 dataset

My wife and I visited Maui in January 2011 and drove the iconic Hana Highway to Haleakalā National Park. The above photo is of ‘Ohe’o Gulch, which connects Palikea Stream to the Pacific Ocean at the southeast end of the island.

Wednesday, June 12, 2024

Automating Geoprocessing with Python

Moving into Module 5, our assignment this week consists of writing a Python script to automate geoprocessing tasks. We were provided a dataset with several Alaska and New Mexico feature classes and the task of copying them into a newly created file geodatabase (fGDB). Working with lists and dictionaries, our script also implements several functions including ListFeatureClasses, Describe, and SearchCursor. Our end task was to output a dictionary (a Python collection of key:value pairs) of New Mexico county seats.

As an aside, I always like working with data from places I have visited. Including an Albuquerque airport stop in 2007, I've been to New Mexico four times. I also updated the Universal Map Group Wall Map for Albuquerque as one of my projects in 2008.

Tijeras, New Mexico in 2017
I-40 west ahead of Tijeras from my trip to New Mexico in 2017.

The ListFeatureClasses() function returns a list of feature classes in the current workspace. This list can be stored in a variable to be used with subsequent functions. Part of the ArcPy package, the Describe() function returns a Describe object which includes properties of a feature class such as data type, geometry type, basename, etc.

Using the Describe function on the variable holding the feature class list allows a property, such as the basename, to be used as part of the CopyFeatures() function in the Data Management toolbox. This function copies the features of an input feature class into a new feature class. With a for loop, we used this to populate our newly created file geodatabase (fGDB), concatenating a variable for the output environment, the name of our created fGDB and the basename property of each feature class.

The Flow Chart for this week's Lab assignment
The program flowchart for this week's Lab assignment
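
A minimal sketch of that loop, with hypothetical paths and assuming the target file geodatabase already exists:

    import arcpy
    import os

    arcpy.env.workspace = r"C:\GIS\Module5\Data"
    arcpy.env.overwriteOutput = True
    out_gdb = r"C:\GIS\Module5\Results\mod5.gdb"

    # List every feature class in the workspace and copy each one into the
    # new file geodatabase using its basename property.
    fc_list = arcpy.ListFeatureClasses()
    for fc in fc_list:
        desc = arcpy.Describe(fc)
        out_fc = os.path.join(out_gdb, desc.baseName)
        arcpy.management.CopyFeatures(fc, out_fc)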

While the term cursor to me references the blinking vertical line in this word processor, it has a separate meaning in computer programming. Cursor is a database technology term for accessing a set of records in a table. Records in a table are referred to as rows. Iterating over the rows of a table, the three cursor types in ArcPy are as follows:

  • Search Cursor - this function retrieves specific rows on the basis of attribute values. This is similar to performing a SQL query.
  • Insert Cursor - this function adds new rows to a table, which can in turn be populated with new attribute values.
  • Update Cursor - this function modifies the attribute data of existing rows or deletes rows from the table.
Each type of cursor is created by the corresponding class of the arcpy.da module. The cursor object accesses row objects, returning a tuple of field values in an order specified by the field names argument of the function. The cursor iterates over all table rows but uses only the specific fields needed.

Cursors support with statements, which have the advantage of closing the cursor and releasing the data lock regardless of whether the cursor finished running successfully. A data lock otherwise ensues if a cursor is not deleted with a del statement. The data lock prevents multiple processes from changing the same table simultaneously.

The optional where_clause parameter in the SearchCursor syntax incorporates SQL queries. SQL queries find records in a database table based upon specific criteria, selecting data WHERE specific conditions occur. I often use SQL queries when updating the database for the main AARoads pages and also occasionally with the Simple Machines Forum database on the site.

The Lab assignment specified using the SearchCursor on the feature class of point data for cities in the state of New Mexico. The search criteria fields were the city NAME, the city FEATURE type and the 2000 Census population data (POP_2000). I found that assigning the SearchCursor output as variables made formatting the print statement vastly easier and cleaner looking codewise.

The biggest challenge I had was populating the dictionary with the county seats. I eventually incorporated the process in the for loop of the SearchCursor with statement. I used the W3Schools website to narrow down the code to use, and with the previous variables, creating the dictionary was simple.

Looking at the example output included in the Lab assignment PDF, I opted to expand upon the formatting. Being the county collector that I am, I incorporated the COUNTY name field so that the output listed the associated county with each seat. Furthermore, after discovering that three of the entries lacked population data, I implemented an if statement. The missing data values showed -99999. Being that it was an integer, I first cast it as a string, and then reassigned the population variable to read "not available".
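
A minimal sketch of the dictionary step; the query value and workspace path are hypothetical, while the field names (NAME, FEATURE, POP_2000, COUNTY) come from the lab data:

    import arcpy

    arcpy.env.workspace = r"C:\GIS\Module5\Results\mod5.gdb"

    county_seats = {}
    fields = ["NAME", "FEATURE", "POP_2000", "COUNTY"]
    query = "FEATURE = 'County Seat'"   # assumed attribute value

    with arcpy.da.SearchCursor("cities", fields, where_clause=query) as cursor:
        for name, feature, pop, county in cursor:
            # Missing population values are stored as -99999 in the data.
            pop_text = "not available" if pop == -99999 else str(pop)
            county_seats[name] = pop_text
            print(f"{name} ({county} County): population {pop_text}")
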
Output showing completion of Geoprocessing tools
Collage of screen shots showing the output of the Module 5 Python Script


Wednesday, June 5, 2024

Geoprocessing with Python scripts and Models in GIS

The two main focuses of this week's Lab assignment in GIS Programming were an introduction to Model Builder in ArcGIS Pro and coding a geoprocessing script from scratch. The lessons show that geoprocessing tools can be run solely with Python scripts and that the process can be automated using models. Both use the ArcPy package, which contains several modules and other elements that add functionality to Python.

Geoprocessing is a series of actions performed on geographic data where there is an input or multiple inputs, a process or task, and then an output of data. There are two general categories of geoprocessing tools in ArcGIS Pro. There are the system or built-in tools created by ESRI. Then there are custom tools, including models and scripts, created by a user or a third-party.

Model Builder is used to run a sequence of tools to obtain a desired result. The application uses four elements: data variables, which reference data on disk; value variables, provided in formats such as linear units; connectors; and tools. Model Builder uses a GUI layout with some similarities to the program flowcharts designed in previous modules. The model elements are color coded to aid in their classification.

ArcGIS Pro Model Builder
The model I developed showing the automation of using three Geoprocessing tools

With sample data provided, the model created for Module 4 took a polygon layer of soils and clipped it to a polygon layer of a basin. The extracted section of the soils feature class was then filtered to select only parcels deemed unsuitable for farming. That subset of the soils data was then erased from the previously created soils layer, leaving only suitable areas within the original basin.

Polygon feature class showing soils suitable for farming
Output layer of suitable soils for farming

If you had this kind of data over a large area, such as a county or state, rerunning this model for specific locations could save a significant amount of time. ArcGIS Pro can also export this model as a stand alone Python script. The only caveat is making sure the workspace environment, the default location for the files being processed, is set. This typically means using more explicit information such as full file paths.

What is an environment? Our textbook describes them as "essentially hidden parameters that influence how a tool runs." Environments are properties of the env class in ArcPy. Part of object-oriented programming (OOP), classes can be used to create objects, which in turn have properties and methods that can be used. Classes are often implemented to avoid using long and complicated strings in code. The OOP concept is still somewhat fuzzy to me, but it is becoming more clear with continued use of Python.

Using Python code to perform geoprocessing tasks was not as difficult as anticipated. The three tasks to complete started with a point feature class of hospitals as the input data. The first geoprocessing tool adds X and Y field values based upon the coordinate system of the dataset. The second created a 1,000 meter buffer around each point, while the last dissolved the individual buffers into a single feature in a completely separate feature class.

Geoprocessing Python Script Flow Chart
Flowchart showing the general behavior of the Python script
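
A minimal sketch of those three steps, with hypothetical workspace and feature class names:

    import arcpy

    arcpy.env.workspace = r"C:\GIS\Module4\Data.gdb"
    arcpy.env.overwriteOutput = True

    # 1. Add POINT_X / POINT_Y fields based on the dataset's coordinate system.
    arcpy.management.AddXY("hospitals")

    # 2. Buffer each hospital point by 1,000 meters.
    arcpy.analysis.Buffer("hospitals", "hospitals_buffer", "1000 Meters")

    # 3. Dissolve the individual buffers into a single feature in a separate
    #    feature class.
    arcpy.management.Dissolve("hospitals_buffer", "hospitals_dissolved")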

I approached writing this script by researching the syntax for the three geoprocessing tools. With a basic understanding of the required and optional parameters, coding was fairly straightforward. Trying out some of the syntax options, I hard-coded parameters such as the input feature layer name while assigning a variable for another function argument.

I also tried out the separate nomenclature for calling the tools. A tool can be called as a function, such as arcpy.<toolname_toolboxalias>(<parameters>), or with the toolbox as a module followed by the tool as a function, as in arcpy.<toolboxalias>.<toolname>(<parameters>). The main difference is the use of an underscore or dot between the tool name and toolbox alias.

I used comments on most lines of the script so I can return to it for reference. With the final line of code compiled, I ran the script and encountered an error referencing incorrect parameters for the Dissolve tool. I then implemented a try-except statement and quickly identified that I had forgotten to add a comma between the in_features and out_feature_class parameters. With that, the script ran successfully!

Successful run of a Python script running Geoprocessing Tools





Saturday, June 1, 2024

Debugging and Error Handling in Python

It has been a busy week outside of class, which made processing this week's module on debugging and error handling more challenging. I entered this Module feeling overwhelmed by just the concept of debugging Python. But as I worked through the exercises and reading, I realized that I already have experience implementing some of these debugging practices from editing PHP scripts for AARoads. That, and the textbook Python Scripting for ArcGIS Pro continues to be straightforward with good example blocks of code.

This week we also gain more experience with creating program flowcharts. The try-except expression is the focus of this week's final Lab assignment:

Python Try-Except program flowchart


The reading and Lab exercises for Module 3 provide an overview of two of the three main types of errors encountered in Python programming and methods for handling them. Logic errors are the third type; these occur when a script runs but produces undesired results. I have encountered this on a number of occasions when testing out PHP scripts, where the webpage output the wrong data or only a portion of the data. Logic errors are often difficult to parse, as they do not generate meaningful error data.

Syntax errors are the first type, and the easiest to comprehend. Syntax errors are akin to typographical errors in writing. Mistyped variable names or functions are a common source of syntax errors. Others relate to misplaced or missing punctuation and case sensitivity. An aspect somewhat unique to Python is indentation, where inconsistencies with spacing and the mixed use of spaces and tabs can result in a syntax error.

A useful feature embedded within the IDLE Python interpreter is the check syntax option. Accessed by selecting Check Module from the Run menu, the feature produces a pop-up window referencing any syntax error detected and otherwise returns to the cursor if none are present.

Part 1 of the Lab exercise for Module 3 introduced us to a script with syntax errors. Correcting instances of variable names resulted in the successful output of the Python script:

Successful Python Script Output

Exceptions are the second main type of error covered in Module 3. An exception is where a programming language differentiates between a normal course of events and something exceptional. When Python encounters an error in a script, it "throws" an exception, which usually means that the script stops running. If the exception object, the cause of the error, is not handled or "trapped," the script terminates with a runtime error.

Part 2 of the Lab for Module 3 included two examples of exceptions, one of which was a common syntax error. With those errors corrected, the results:

Output of corrected Python Script

With some knowledge of the main Python error types, debugging can correct or clean up bad code. Debugging is the methodological process for finding errors in a script. Basic principles of the process include carefully reviewing the content of generated error messages, adding print messages to a script, selectively commenting out code, and using a Python debugger.

There are many types of exceptions in Python, which are included in the builtins module. Named exceptions are where a specific exception is referred to by name. There is also the generic exception, which is referenced as an unnamed exception. Having an idea of what type of error occurred is beneficial for correcting it.

The use of print messages, where output is produced midway through a script just to determine if it functioned properly up to that point, is a tactic for handling unnamed exceptions. Commenting out code involves isolating a problematic line or block of code in order to determine if it prevents the rest of the script from executing. I have implemented both of these methods when debugging PHP scripts for AARoads. As it stands right now, there is a problematic block of code in the current PHP that generates the live pages that I have commented out until I can find a solution.

Part 3 of the Lab assignment for Module 3 focuses on the try-except statement. This method of error handling prevents a script from producing a run-time error while also reporting a meaningful error message to the user. It allows a script to continue beyond the exception and finish normally. In other words, code following the trapped error will still be executed.

The try-except statement isolates a problematic block of code between a try expression and an except expression. The exception can be assigned to a variable e and subsequently printed to display information on the error in question:

Successful run of a two part Python Script
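
A minimal sketch of the pattern (the Buffer call and names are hypothetical, not the actual Part 3 script):

    import arcpy

    try:
        # The problematic block of code is isolated inside the try clause.
        arcpy.analysis.Buffer("hospitals", "hospitals_buffer", "1000 Meters")
        print("Buffer completed")
    except Exception as e:
        # The trapped exception is reported without halting the script.
        print(f"A geoprocessing error occurred: {e}")

    # Code following the trapped error still executes.
    print("Script finished")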

The flowchart at the beginning of this post illustrates the approach I took in implementing the try-except statement on the script for Part 3. While I quickly identified an error in the script, my use of the try-except statement did not produce the expected results. Instead, the exception message changed from one type to another, with a remaining run-time error preventing the rest of the script from executing.

The Lab instructions mentioned modifying the script by adding try-except statements. With this in mind, I considered whether or not multiple statements were needed.

Questions to answer were, did the initial exception object in the script trigger subsequent exceptions? If so, how many additional ones, and where are they located? What ultimately worked was shifting my placement of the except statement to where it trapped all exception objects.

The wisdom gained here is to not assume that an exception object always stands independent of others. There can easily be a ripple effect with a variable omission or misspelling that continues through the rest of the script.