Tuesday, July 30, 2024

Damage Assessment - Hurricane Sandy

Module 5 for GIS Applications continues our focus on Hurricane Sandy and explores damage assessment for the storm's impact in the Garden State.

Our first task was to create a formal hurricane track map showing the path Sandy took from the Caribbean Sea to the Northeastern U.S. The symbology uses custom color coded coordinate points showing the hierarchy of storm intensity. Included are the maximum sustained winds and the barometric pressure shown in 12 hour increments to improve legibility.

Map showing the path of Hurricane Sandy.

The next section of lab 5 was the creation of a damage assessment survey using Survey123. This was a pretty straightforward process, with options to add multiple choice questions pertaining to damage to be documented, a field for describing that damage surveyed, the option to include an image or have the mobile device take a photo, and a required location setting either through GPS or map locator. Following the form creation, we determine what the survey application does after the submission of a completed survey and we set the restrictions on what an individual viewing the survey can see.

Our next task is the preparation of raster data of air photos showing an area of New Jersey both before and after Superstorm Sandy. An array of .SID raster images of pre-storm photos created the first mosaic dataset using geoprocessing. .JPG images of the post-storm photos were compiled into the second mosaic dataset.

With both mosaic datasets in place, we revisit the Flicker and Swipe tools (located in the Compare group below the Mosaic Layer tab), which were previously used in the Remote Sensing course, to alternate the display between the pre and post storm imagery. These are both fast methods to visually compare the two imageries.

Example of the Swipe Tool in ArcGIS Pro
An example of the Swipe tool showing pre-storm imagery above and post-storm imagery below

Step 3 of the lab focuses on the creation of data. For this, we revisit the concept of Domains previously covered in Intro to GIS last Fall. Attribute domains constrain values allowed in an attribute for a table of feature class. Domains create a rule set of acceptable attribute values, or in the case of Module 5, a range of integers associated with predefined aspects of damage assessment:

Domains set for an attribute of a newly created feature class
Helping to ensure data integrity, domains limit the number of acceptable values for a field.

Attribute domains are store in a geodatabase. They can be utilized by multiple feature classes, tables and subtypes in a geodatabase. Through Catalog in ArcGIS Pro, Data Design>Fields, these can be added to an existing feature class.

Using the aforementioned air photos showing Seaside Heights, NJ before and after Superstorm Sandy, we were tasked with conducting damage assessment within a study area of parcel data using the preset domains for the various damage categories. Symbolization uses a continuous color scheme from green for no damage to red for destroyed.

Point feature class showing damage assessment for each parcel within a study area
Damage assessment study area for Superstorm Sandy at Seaside Heights, NJ

Given the four domains of Structure Damage, Inundation, Wind Damage and Structure Type, each parcel within a seaside neighborhood was evaluated for damage based upon the two air photos. This was a tedious task due to relatively low image resolution and long shadows in the post-Sandy aerial imagery. Without in-situ data collection, evaluating parcels for wind damage was impractical given details such as missing roof shingles was not possible.

Expanding our analysis, we aggregate the damage assessment points into buffers of within 100 meters of the coastline, between 100-200 meters and between 200-300 meters. Using the Multi-ring Buffer geoprocessing tool, created the three storm surge zones. Proceeded to run a Spatial Join on the Structure Damage point file with the MultipleRing Buffer polygon file to quantify the damage type by buffer zone. The Summary Statistics geoprocessing tool does the tabulation for us:

Hurricane Sandy Damage Assessment - GIS Applications
The final results of our damage analysis confirms the penetration of Superstorm Sandy's storm surge varied as the distance from the coastline increased from 100 to 300 meters. Structures facing the ocean were generally pulverized, while buildings located around 300 meters inland fared much better, some with seemingly no visible damage. This damage estimation appears to be consistent along other parts of the barrier island where the elevation and slope are similar. Exceptions were noted, such as further south of the study area in Seaside Heights, New Jersey, where the barrier provided by a boardwalk and two piers protected adjoining neighborhood areas from the storm surge.

Wednesday, July 24, 2024

Coastal Flooding Analysis - Storm Surge

Module 4 for GIS Applications performs analyses on coastal flooding and storm surge. Storm surge is generally associated with landfalling tropical storms and hurricanes, but it can also be attributed to extratropical storms, such a Nor'easters along the Eastern Seaboard, or powerful winter storms with low barometric pressure and tight wind gradients. Coastal flooding events can also be due to spring tide events based upon the moon's cycle.

Storm surge from Hurricane Idalia inundated Bayshore Boulevard in Tampa, FL
Storm surge inundating Bayshore Boulevard in Tampa during Hurricane Idalia on August 30, 2023.

The first lab assignment revisits Superstorm Sandy, which made landfall as a hurricane transitioning into a powerful extratropical storm along the New Jersey coastline on October 29, 2012. The second and third part of the lab assignment uses Digital Elevation Models (DEMs) to develop scenarios for a generalized storm surge.

The lab analysis on Hurricane Sandy works with LiDAR data covering a barrier island along the Atlantic Ocean between Mantoloking and Point Pleasant Beach, New Jersey. LAS files were downloaded showing the conditions before the storm's impact and afterward.

Initial work in the lab for Module 4 created DEMs by converting the two LAS files to TIN files using geoprocessing in ArcGIS Pro. The TINs were then converted to a raster with a separate geoprocessing tool running upwards of ten minutes.

Comparing the two raster datasets, some pronounced impacts from the hurricane turned extratropical storm were visible. Several datapoints representing structures along the beach were noticeably missing. Additionally a wide breech was cut across the island, with several smaller breeches visible further north. It also appearing that severe scouring of the sand along the coast occurred with a wide area of lower data returns on the post Sandy dataset.

Indicative of the large file size of LiDAR data, when substracting the raster cell values of the post Sandy dataset from the pre Sandy dataset, geoprocessing took 12 minutes and 59 seconds. The result is a raster with values ranging from 33.69 to -35.87. Values toward the high range reflect earlier LiDAR returns, representing the build-up of material, such as sand or debris. Lower values in the change raster indicate later returns, or returns of bare-Earth. This correlates to areas where significant erosion may have occurred or the destruction of a structure.

The change in the the LiDAR pointclouds reveal parcels where homes were destroyed or where the barrier island was breeched by storm surge. The change raster quantifies the amount of change.


LiDAR before Superstorm Sandy

LiDAR showing a major breech caused by Superstorm Sandy

The difference between the two LiDAR pointclouds showing the breech and associated destruction of structures

Recent aerial imagery of Mantoloking, NJ where the breech occurred

The overall impact of Hurricane Sandy on the boroughs of Mantoloking, Bay Head and Point Pleasant Beach in Ocean County, New Jersey:

The raster quantifying the rate of change between the LiDAR datasets before and after Sandy

Output raster using a Boolean

The second analysis for Module 4 utilizes a storm surge DEM for the state of New Jersey. Our task was to reclassify the raster where all cells with values of 2 meters or less constitute areas potentially submerged as a result of Hurricane Sandy. Those cells with values above 2 meters were classified as "no data."

I began the process by adding a new field to the DEM for flooded areas due to storm surge. Cells where the elevation value was equal to or less than 2 were assigning a flood value of 1 for the Boolean of true. All other cells with an elevation value above 2 were assigned 0, for false.

With the added field, I used the Reclassify geoprocessing tool to output a raster of the DEM showing potentially flooded areas versus those at higher ground. The mask was set to the feature class of the New Jersey state outline to exclude areas of the DEM outside of the state that were not needed for our analysis.

Our analysis then focused on Cape May County in South Jersey, where we quantify the percentage of the county potentially inundated with a 2 meter storm surge. The storm surge raster was converted to a polygon and subsequently clipped to the the polygon of the Cape May County boundary.

Another issue encountered was that the storm surge data and county boundary were in different units of measurement. Ended up clipping the storm surge polygon from the county polygon, then comparing the output with the unclipped county boundary for the final percentage. This workaround succeeded as both used the same units.

Clipped feature class of the storm surge polygon over Cape May County, NJ
2-ft storm surge data clipped to Cape May County, NJ

The third analysis for Lab 4 focuses on a potential 1 meter storm surge in Collier County, Florida. Two DEM's are provided, one derived from LiDAR data and another from the regular elevation model from the USGS. Commenced working with this data by reclassifying each DEM to a new raster using a Boolean where any elevation 1 meter or less is considered flooded and anything above is not flooded.

Since we are only interested in storm surge related flooding, any areas shown inland that are entirely disconnected from the tidal basin are omitted from analysis. Accomplished this by using the Region Group geoprocessing tool, where all cells in a raster are reclassified by group and assigned a new ObjectID number.

The Region Group tool takes all of the cells within the hydrologic area of open waters extending into the Gulf of Mexico, and all associated bays and waterways seamlessly feeding into it, and assigns them to a single ObjectID. Similarly, the mainland of Florida is assigned an ObjectID as well. Islands, lakes, ponds, etc. that are independent of one another are also assigned unique ObjectID numbers.
Results of Region Group geoprocessing
Region Group assigns a unique ObjectID for each homogenous area of raster cells. The different colors in this sample from Naples shows separate groups for each land and hydrologic feature based upon the 1 meter elevation threshold
Using the Extract by Attribute geoprocessing tool, selecting the hydrologic area comprising the entire tidal basin is straightforward once the ObjectID number is determined. With that, a new raster comprising just water areas subjected to storm surge is output and subsequently converted to a polygon. The polygon feature class was juxtaposed with a feature class of building footprints for quantitative analysis.

There are a variety of methods in ArcGIS Pro that can be used to determine the number of impacted buildings of a 1 meter storm surge. One such process was to Select by Location based upon the Intersect relationship. This selects records where any part of a building footprint polygon falls within the storm surge raster polygon. Having preadded two fields to the buildings feature class based upon the Boolean of 1 = impacted and 0 = unaffected, with those records selected, used Calculate Field to assign each a value of 1. Repeated the process for both rasters and then proceeded with statistical calculations.

The final analysis quantified whether a building was located within the storm surge zone for the LiDAR based DEM, the USGS based DEM, or both. Errors of omission were calculated where a building was impacted by storm surge in the LiDAR DEM but not the USGS DEM, with that total divided by the overall total number of buildings affected in the LiDAR DEM. Errors of commission were calculated using the opposite and taking that result and dividing it again by the overall total number of buildings affected in the LiDAR DEM. The result tabulates affected buildings by feature type:

Storm surge inundation of areas 1 meter or less in elevation based upon DEMs







Wednesday, July 17, 2024

Visualizing data in 3D with GIS

The course work for GIS Applications Module 3 looks further into 3D visualization, Line of Sight (LOS) analysis and Viewshed analysis with ArcGIS Pro. We are tasked with completing four exercises through ESRI training and read an article accessing the Lines of Torres Vedra, the defensive lines constructed to defend the Portuguese capital of Lisbon against Napoleon in 1810, with LOS analysis in GIS.

Presentation of GIS data in three dimensions has a variety of potential benefits for the end user. 3D visualization can help identify patterns that may not be apparent in 2D. 3D data can provide context or a different perspective for a presentation or display of various phenomena. Allowing users to see vertically stacked content, a 3D scene can also go beyond terrain modeling.

3D scene of San Diego, California
Downtown San Diego 3D scene with marker symbols, fill symbol layers and procedural symbology

Furthermore, 3D visualization can spark imagination and enhance understanding while also accenting environmental effects, such as shadowing and depth. Rendering quantitative data in terms of three-dimensional shapes, 3D visualization can also more strongly contrast data ranges.

3D scene of San Diego visualized with a Summer sun angle
San Diego 3D scene with a procedural rule package for buildings and illumination visual enhancement

The ESRI Get Started with Visibility Analysis learning plan covered three-dimensional basics in GIS. Any data that uses a z-value, or vertical coordinate system, can be considered 3D. Z-values do not have to represent elevation or height, and instead can be some numerical value of a particular attribute, such as population or precipitation amount.

Vertical coordinate systems comprise both a linear unit of measurement and a direction. The direction defines if values are positive up or positive down. Positive down represents the depths below a surface. A scene requires information from where to start measurements. This is provided by the ground or elevation surface.

A two-dimensional (2D) map is a top-down or orthographic view of data that shows a flat representation of features. A 3D scene is viewable from any angle. A midpoint between the two is a functional surface. While technically not 3D, a functional surface contains a single associated z-coordinate for every x,y coordinate pair. 3D data, on the other hand, may be associated with multiple z-values at a single point or vertex.

A raster dataset can be used to create a 3D rendering. One of the most common is a digital elevation model (DEM), where each cell of a raster stores an elevation value. A more detailed 3D rendering revealing sharp, defined edges is a triangular irregular network (TIN). Also a type of functional surface, a TIN consists of nonoverlapping triangles that border each other that vary in size and proportion.

Triangular Irregular Network (TIN)
Triangulated irregular network (TIN)
Input points for a TIN form the triangle vertices (nodes). These vertices are connected by lines that form the legs of a triangle (edges). Using mathematical interpolation, the elevation for any surface on a TIN can be obtained. Calculations also produce the slope and aspect for each triangle space, where aspect is the compass direction that the slope faces.

Data in a 3D scene must include a height source value / base height (elevation source), which is the height at which it is displayed. This can be stored in a feature's geometry as a z-value or as an attribute value in a table. The elevation type of a layer determines how to draw data in relation to the ground. The surface provides content for the elevation of the data and the basis for layer measurements.

30-meter resolution
3-foot resolution

With a 30-meter resolution, WorldElevation3D/Terrain3D is the default elevation surface service from ArcGIS Online. Some projects may require a elevation dataset with higher resolution to reveal more detail.

You must determine how to draw 2D features in relation to the ground surface when visualizing in a 3D scene. The elevation type for a feature built from the ground up is referenced as on the ground. Features drawn at a consistent or known height above the ground are referenced at an absolute height. Features with variable heights, including subsurface depths, are positioned relative to the ground.

Height variables can be manipulated using cartographic offset and vertical exaggeration among other techniques.

Vertical Exaggeration
Horizontal desert terrain enhanced visually with vertical exaggeration by amplifying relief

Cartographic Offset
Setting the cartographic offset by 50 feet in this scene reveals point locations previously obscured

3D Features

2D features can be displayed in 3D by using extrusion. Extrusion creates a 3D shape using a numerical attribute as a z-value. This could be points on map representing precipitation amounts where the height of the column is the amount. It can also be used to display a building footprint as a 3D object if a value for a height is included in the attribute data.

3D buildings at the UWF Campus from Intro to GIS
3D objects of buildings on the UWF campus from the Geoprocessing lab in Intro to GIS last Fall
Extrusion of parcels based upon property value
The z-value of these extruded polygons in Manhattan, Kansas is based upon the property value of the parcel

A common way to create 3D features is to use a mesh. A mesh is a collection of triangles combined to create models of the real world. Multiple z-coordinates can be associated with x,y coordinate in a mesh. This is unlike a TIN where the x,y pair only has one z-value. Integrated meshes represent discrete objects on the ground surface as a single 3D object. These do not have feature attributes.

Multipatch features is a type of 3D geometry representing the outer surfaces or shells of a feature that occupy a discrete area of volume in three-dimensional space. They consist of 3D rings and triangles which can be used to represent both simple and complex objects. Multipatches can store texture, image, color and transparency within a feature's geometry. Buildings rendered in Google Earth utilize multipatches.

As covered in the previous Module, point clouds from LiDAR data also produce 3D scenes. Stored in a LAS file, the mass of coordinates in a point cloud consist of high accurate x,y,z measurements of the earth's surface.

LiDAR Point Cloud data of an overpass at State College, PA
LiDAR Point Cloud data showing part of I-99 at State College, PA. Structures like overpasses, where the laser pulses cannot reach the ground level, reveal portions of the basemap.

3D Visualization

Within ArcGIS Pro there are three views that an analyst can start with. The map view can map 3D data but visualizes it in 2D, with both drawn the same way. Visualizing data in 3D requires converting to one of two 3D viewing modes. These display 3D content from a real-world perspective.

The Global scene is used for data covering a large extent where the curvature of the earth is important for analysis. The Local scene covers other cases where the earth's curvature is not necessary for analysis and where the content covers a smaller, fixed extent. As to which mode to use, questions to ask include what is the minimum surface resolution, what is the appropriate basemap, and will thematic styling detract or aid the GIS information to be presented, will visualization include subsurfaces?

A 3D scene should have multipatch features, thematically symbolized layers, an elevation surface, and either an imagery or topographic layer as a basemap. Structuring the content well in a 3D scene benefits the end user with reduced complexity and increased understanding. Furthermore defining an area of interest (AOI) for the scene both limits the amount of data to be published (and geoprocessing time) and focuses on a subset of the data. This also reduces unneeded data that could be distracting to the end user.

3D scene of Portland, Oregon
3D scene of Portland, Oregon where buildings and trees area multipatch features created from 2D data.

Geoprocessing for a 3D scene results in 3D lines or multipatch features using the 3D display properties of the input feature layer. The multipatch feature data is used to create a scene layer, which can then be saved as part of a scene layer package to be published to ArcGIS Online. Depending upon the permission settings for who can access it, the scene layer package can be then be accessed or used in Scene Viewer and ArcGIS Pro.

View my 3D Buildings scene of Portland, Oregon from this week's exercise at https://pns.maps.arcgis.com/home/webscene/viewer.html?webscene=7e10ab3514b04c8796cd1b0f8c07d8f7

Line of Sight Analysis

With the definitions of 3D data, we can proceed with Line of Sight (LOS) analysis. A line of sight (LOS) calculates intervisibility along a straight line between an observer and target. This calculation factors in obstructions, which can include any combination of rasters (surfaces), TINs, extruded polygons or lines, and multipatch features. If the target is a line or polygon feature as opposed to a point, then a sampling distance can be input to determine the density of sight lines along the target.

The array produced by LOS analysis rom an observer point at a fictional parade route
The array of sight lines generated from an observer to a target line feature during the LOS analysis.

The Line of Sight geoprocessing tool in ArcGIS Pro determines the visibility along sight lines. The result includes the TarlsVis attribute field, which consists of a Boolean where 1 is visible and zero is not visible. With these values, sight lines with obstructions can be removed from analysis.

LOS lines with those with TarlsVis values of zero (not visible) removed
LOS lines where features with the TarlsVis value of zero removed.

Viewshed analysis models the visibility from a vantage point in up to a 360 degree view. The geoprocessing tool models the visibility of the horizon, shadows and line of sight. Outputs are a raster showing visible areas from the specified vantage point.

The Viewshed tool considers the height of the vantage point and obstructions surrounding it. Parameters that can be factored into the Viewshed tool control elevation values, vertical offsets, horizontal and vertical scanning angles and distance bounds. 
Optional Viewshed Settings
The various Fields in the Viewshed geoprocessing tool controlling vertical offsets, angles and radius

Additionally while the tool is set to reflect visible light, the model can work with other wave-based transmissions such as radar by adjusting the refractivity coefficient.



Monday, July 8, 2024

GIS applications using LiDAR

Module 2 for GIS Applications returns us to take a more in depth look at the use of LiDAR, a topic briefly covered in previous courses. LiDAR (LIght Detection And Ranging) uses lasers, usually in the visible or near-infrared portion of the spectrum, calculates heights by measuring distances between a scanner and target area. The high energy pulses record the reflected response of different objects in a very narrow wave length.  This produces a point cloud, where the masspoints associated with each return are distributed throughout the target area at various densities. The densities vary depending upon the materials encountered by the laser pulses.

Interpolation processes of individual masspoints creates a digital surface model (DSM), which shows the elevation characteristics of natural phenomenon and man-made structures. Another procedure with the removal of LiDAR masspoints in the first, intermediate and/or the last returns produces a bare-Earth digital terrain model (DTM). LiDAR is also effective at measuring water depth relative to the water surface.

Digital Elevation Models (DEM) provide elevation data in a raster format. DEMs with 3, 10 and 30 meter resolution are available for most of the United States. The USGS catalogs DEMs and makes this data available through the Earth Explorer and National Map portals. Furthermore, states also provide DEM datasets through Clearinghouse Websites. 

LiDAR data is often delivered in a standard LAS format, which defines the file structure, content, storage order, naming, codes, etc. The standard LiDAR exchange file uses the .las file extension. The Lab for this week's module utilizes a LiDAR dataset provided by the Virginia Geographic Information Network (VGIN) covering a section of the Shenandoah Mountains at Big Meadows.

Looking north at the Shenandoah Mountains from Swift Run Overlook
North view toward along the Shenandoah Mountains from Skyline Drive. Photo by Andy Field.

Our textbook GIS Fundamentals references discrete-return LiDAR as the most common collection system. This system records records specific values for each laser pulse downward. It produces a point cloud consisting of X, Y, and Z coordinates along with the intensity, scan angle, return order and other information.

Points in the LiDAR point cloud are assigned to feature types such as structures or vegetation. Standard codes identify ground, buildings and water. These are derived by return strength, point order, local slope, etc.

LiDAR data is widely used to estimate vegetation characteristics such as tree height, forest density, growth rates and forest type. The initial part of our lab focuses on calculating forest height starting with the conversion of the LAS file into both a DEM and a Digital Surface Model (DSM).

We first use the Point File Information geoprocessing tool in ArcGIS Pro on the LAS file to summarize the file with values for the minimum bounding rectangle, number of points, the average point spacing, and the min/max z-values. DEM and DSMs are created next using the LAS Dataset to Raster geoprocessing tools. With these two rasters, a calculation using the Minus tool outputs a raster populated with estimated heights.

The LAS file for Big Meadows at Shenandoah National Park and derived DEM
The LAS file for Big Meadows at Shenandoah National Park and the compiled DEM

The subsequent step in Lab calculates the biomass density of the forested area in question. MultiPoint files for the ground and vegetation are created using the LAS to Multipoint geoprocessing tool on the point file previously created. These are then processed using the Point to Raster tool to output respective rasters.

Continuing with geoprocessing, both ground and vegetation multipoint rasters are further processed using the IS NULL tool to produce a boundary file where similar to Boolean, 1 is assigned to all values that are not null . The Con tool then juxtaposes the IsNull rasters with the Multipoint rasters for both sets so that if a value of zero is encountered, it is accepted as a true value and values of 1 are in turn pulled from the original multipoint rasters. This produces rasters of the cell counts.

Tree Height Estimation and Raster Cell Count for data derived from LiDAR
The output tree height estimation and the statistics of raster cells (points) versus height

Working forward to calculate the density of the returns, the Plus tool combines the counts for both the vegetation and ground. This results in a raster where all cell counts are assigned an integer value from zero to 23. After converting the raster values from integer to float (decimals), the tree canopy density can finally be calculated. This is completed by using the Divide geoprocessing tool on the raster of the cell counts for vegetation and the combined vegetation/ground counts raster with float values.

Canopy Density derived from LiDAR of Shenandoah National Forest at Big Meadows, VA
Forest Canopy Density of the Big Meadows area of Shenandoah National Park derived from LiDAR




Wednesday, July 3, 2024

Crime Analysis in GIS

Our first topic in GIS Applications is crime analysis and the use of crime mapping for determining crime hotspots. Crime mapping techniques provide insight into the spatial and temporal distributions of crime. This benefits criminologists in the research community and professionals in law enforcement.

Crime mapping factors in the importance of local geography as a reason for crime and considers that it may be as important as criminal motivation. The importance of identifying patterns and hotspots in crime mapping tends to be a precursor for implementing effective crime prevention methods.

Fundamental to crime mapping is spatial autocorrelation, which acknowledges the spatial dependency of values measured within areas. This recognizes that crime in one area can influence the crime rate of a nearby area.

We are tasked this week with quantifying data and generating hotspot maps showing crime density using various methods on the clustering of events. The Lab for Module 1 works with crime data for Washington, DC and Chicago.

Kernel Density Map showing crime hotspots for assaults with dangerous weapons in 2018
Output in this week's lab, a kernel density map showing 2018 crime hotspots for Washington, DC

A relative measure, a crime hotspot represents an area with a greater than average frequency of criminal or disorderly events. An area where people have an above average risk of victimization can also be classified as a crime hotspot. Victimization however cannot always be shown on maps, as the theory refers to multiple incidents on the same individual, regardless of location. This can also represent a street (line) or a neighborhood (polygon) where repeat occurrences take place.

Determining crime hotspots can aid in detecting spatial and temporal patterns and trends of crime. The concept can benefit law enforcement in better allocating resources to target areas. Crime hotspots can also be used to identify underlying causes for crime events.

The concept of local clustering, concentrations of high data values, is the most useful for crime analysis. Methods determine where clusters are located and produce a hotspot map showing concentrations.

Point data can be used directly in this analysis of clusters. A collection of points can produce a hotspot whose bounds are derived from the local density of points. Using point data also has the advantage of not being constrained by a predetermined jurisdictional boundary. Data aggregated into meaningful areas, such as within a jurisdiction where the polygons consists of the highest values, can also result in hot spots. Aggregation can produce crime rates, such as the number of events per number of residents or per households for an area.

Aggregated data showing the number of crimes per 1,000 households
Choropleth map with aggregated data determining the crime rate for Washington, DC. 

The Lab for Module 1 focuses on three methods for local clustering. Grid-Based Thematic Mapping overlays a regular grid of polygons above point data of crime events. This produces a count of events for each grid cell. Since all cells are uniform in dimensions, the count is equivalent to a density.

The choropleth map output showing crime density can be further analyzed to determine the crime hotspots. Extracting the crime hotspots involves selecting the highest class of the data. Quintile classification is commonly used to determine this.

The data provided in Lab included point class data of homicides reported in the city of Chicago for 2017. Additionally we were supplied with polygon class data of 1/2 mile grid cells clipped to the Chicago city limits.

The grid cells and point data for Chicago were spatially joined and grid cells where the homicide value was zero were removed from analysis. Using quintile classification, the top 20% of grid cells based on the homicide values was extracted to generate a hotspot map:

Grid-Based Thematic Map of Chicago where three or more homicides were recorded in 2017

Using point values, Kernel Density can also be used to calculate a local density without the use of aggregation. The estimation method utilizes a user-defined grid over the point distribution. A search radius, known as the bandwidth, is applied to each grid cell. Using these two parameters, the method calculates weights for each point within the kernel search radius.

Points closer to the grid cell center are weighted more and therefore contribute more to the total density value of the cell. The final grid cell values are derived by summing the values of all circle surfaces for each location. For the Lab, we used the Spatial Analyst Kernel Density tool in ArcGIS Pro. Input were the grid cell size and bandwidth to run on the 2017 homicides feature class for Chicago. The output was a raster file with ten classes.

Since we were only interested in areas with the highest homicide rate, we reclassified the raster data into two classes. The upper class ranged from a value three times the mean to the maximum value of the raster data. This represented the crime hotspot as estimated with kernel density:

Continuous surface map showing the crime hotspots for Chicago based upon 2017 homicide point data

Local Moran's I is the final method implemented on the 2017 Chicago GIS data for Module 1. A global measure of spatial autocorrelation, the Moran's I method addresses the question, are nearby features similar? Features that are closer to each other are more similar to one another than those located farther apart. Moran's I produces a single statistic that reveals if a spatial pattern is clustered by comparing the value at any one location with the value at all other locations.

The result of Moran's I varies between -1.0 and +1.0. Positive values correlate to positive spatial autocorrelation (clustering) and negative values with negative autocorrelation. Where points that are closer together have similar values, the Moran's I result is high. If the point pattern is random, the value will be close to zero.

For the Lab, the homicides feature class and census tract data were spatially joined. A field calculating the number of homicides per 1,000 units was added. This feature in turn was input into the Cluster and Outlier Analysis (Anselin Local Moran's I) Spatial Statistics tool to output a new feature class based upon Local Moran's I. The result includes attribute data revealing two types of clusters: High-High (HH) representing clusters of high values and Low-low (LL) representing clusters of low values.

High-high clusters in the context of the Chicago crime data represent areas with high homicide values in close proximity to other areas with high homicide values. These are the crime hotspots:

Crime hotspots derived from 2017 homicide data for Chicago using the Local Moran's I method
Sources:

Ratcliffe, J. (2010). Crime Mapping: Spatial and Temporal Changes. Handbook of Quantitative Criminology,. (pp. 5-8).  Springer New York, NY.

Eck, J.E., Chainey, S., Cameron, J.G, Leitner, M. & Wilson, R.E. (2005) Mapping Crime: Understanding Hot Spots. National Institute of Justice (NIJ) Special Report.