Wednesday, November 20, 2024

GIS Day 2024 Event at FDOT District 7

Following months of planning, to which I contributed starting on the second day of my internship, GIS Day is finally here! Beyond brainstorming ways to better spread the word about the event at FDOT District 7, I was tasked with creating one or two GIS Day maps for display on the wall of the auditorium.

As the semester progressed, I took inspiration from Special Topics assignments and learned skills from Computer Cartography and GIS Applications for several mapping concepts to share on GIS Day. My idea was to show a few examples of the capabilities of GIS, both from an analytical standpoint, and also in the different ways data can be visualized.

After reading several classmates' discussion board posts on presentations they made for GIS Day, I decided to follow their lead and create a presentation of my own. My goal was to provide an overview of maps in GIS, then cover each of the five maps I created with a mix of technical information such as the geoprocessing that went into it or the type of map (choropleth, graduated symbol), principles of design, and inspiration for the map subjects.

Our efforts paid off, and the D7 GIS Department's three-hour event this morning was a great success! We had around 30 attendees, many of whom stayed for all presentations, and received several positive comments on the event. My presentation went over well, and I thoroughly enjoyed sharing some of the GIS knowledge gained from my time with the University of West Florida.

The start of 2024 GIS Day at FDOT District 7
2024 GIS Day at FDOT District 7

My GIS Day 2024 presentation and the maps I created for the event follow:

D7 GIS Day Map Overview

There are two general categories of maps: Reference Maps and Thematic Maps. We are all familiar with Reference Maps, such as a road map or a political map. On display in the auditorium here are examples of Thematic Maps, which are maps that focus on a specific theme, such as climate, population, or in our case, transportation. This leads me into our first GIS Day map…

Hurricane Tracks Map

Map quantifying the number of hurricanes striking Florida from 1851 to 2024
Florida Hurricanes quantifying direct impacts from 1851 to 2024

When we were planning our GIS Day event, one of the map concepts discussed was a Florida map of hurricane tracks impacting the state over the last 20 years. Sounds simple enough, but as the map was in production, Hurricane Milton formed, and one fact mentioned by media outlets was that Tampa had not been hit directly by a major hurricane since 1921.

This ultimately led me to expand the hurricane tracks map concept to quantify the number of hurricanes whose center has passed directly over each county in the state.

I opted to cover two sets of temporal data. A choropleth map shows the number of hurricanes per county in the last 50 years. It uses dark colors for higher values, conveying that higher values have a heavier visual weight. The graduated symbols map, which quantifies the number of hurricanes per county since 1851, the year of the first Florida hurricane in the dataset, correlates the size of the symbol with quantity, i.e. larger means more.

As for how the map was created, the geoprocessing for the choropleth and graduated symbols maps was based upon the number of hurricane polylines crossing any part of each county polygon. These calculations are automated in GIS, so no manual comparisons are needed.

D7 Interstates History Map

Map showing the opening dates of every mile of the Interstate system within FDOT D7
FDOT District 7 Interstate opening dates color coded by decade

This thematic map aggregates sections of the District 7 Interstate system by the decade in which they opened to traffic. This also shows how the use of graphics can enhance the presentation of a map.

I also factored into the design the Gestalt Principles of Perceptual Organization, which in cartography include Visual Hierarchy, where important features are emphasized and less relevant ones deemphasized. The Figure-Ground relationship accentuates certain objects over others by making these appear closer to the map user. Visual Balance is where the size, weight and orientation of map elements are adjusted to achieve balance in the center of the map. Contrast and Color are other principles used in good map design.

D7 Lighting Raster Map

Raster showing the number of light poles per square mile in FDOT District 7
Raster quantifying light poles in FDOT District 7

I created this map to show how raster data can be used by GIS. The concept took the point feature class for all light poles within District 7 and overlaid it with a fishnet grid in ArcGIS Pro. This is also referred to as grid-based thematic mapping. I aggregated the light poles by 1 square mile grid cells and obtained a density unit via geoprocessing. I then symbolized the raster set so that lighter colors convey more light fixtures. The end result is a map clearly showing where we maintain the most lighting.
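
The aggregation step can be sketched in a few lines of Python. This is only an illustration of the grid-counting idea, not the ArcGIS Pro fishnet workflow used for the map; the function name and the light pole coordinates (in miles from an arbitrary origin) are hypothetical.

```python
from collections import Counter


def count_points_per_cell(points, cell_size=1.0):
    """Count the points falling in each square grid cell.

    points: iterable of (x, y) pairs in the same linear unit as cell_size.
    Returns a Counter keyed by (column, row) cell index.
    """
    counts = Counter()
    for x, y in points:
        col = int(x // cell_size)  # cell column containing the point
        row = int(y // cell_size)  # cell row containing the point
        counts[(col, row)] += 1
    return counts


# Hypothetical light pole locations, in miles from an arbitrary origin.
light_poles = [(0.2, 0.3), (0.8, 0.9), (1.5, 0.1), (1.6, 0.4), (1.9, 0.7), (3.2, 2.8)]

# With 1 mile cells, each count is already a density in poles per square mile.
density = count_points_per_cell(light_poles, cell_size=1.0)
for cell, poles in sorted(density.items()):
    print(cell, poles)
```

Cells with no light poles simply do not appear in the result, which corresponds to the empty grid cells on the map.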

D7 Storm Surge Map

Storm Surge map for FDOT District 7
Areas in FDOT District 7 inundated by storm surge by Saffir-Simpson category

Storm surge data is another form of raster data. It is generally calculated with the use of a Digital Elevation Model (DEM). One useful aspect of ArcGIS Pro is the ability to use geoprocessing to convert a raster into a polygon feature class, as was done here with this NOAA storm surge dataset.

This expands the options for the GIS analyst. Among others, geoprocessing options include least cost path analysis, buffer analysis, and data interpolation, where unknown values between known data points, such as rainfall rates, can be estimated.

3D Traffic Count Map

3-Dimensional map of traffic volume (AADT) for FDOT District 7
3-Dimensional representation of traffic counts (AADT) on the FDOT D7 state road system

When you think of 3D mapping, you probably think of modeling buildings or terrain, but there are several other uses. One such concept of 3D mapping is to visualize 2D data in a different, and perhaps more thought-provoking way.

That was the idea behind this 3D traffic count map of District 7. ArcGIS uses the Extrusion method to add a 3D element to our 2D feature class. Extrusion bases the height of data on a Z-unit, where the unit can be based upon real-world units, such as the height of a building, or upon ranges of data, such as with the traffic counts here.

ArcGIS Pro renders data in three dimensions differently for points, polylines and polygons. Points appear as columns. Polylines appear as walls, as they do here, and polygons appear as solid objects, which is probably easiest to imagine when applied to a building footprint.

One thing revealed with this 3D traffic count map was that a stretch of traffic count data for Interstate 4 was missing. So, the 3D map produced an unintended benefit, revealing a section of missing data that we could correct.

So, as you can see, GIS allows you to show geospatial data in a more meaningful way. And these maps are only the tip of the iceberg when it comes to the types of deliverables that can be produced.

Wednesday, October 9, 2024

GIS Internship - Training and Hurricanes

The internship with the Florida Department of Transportation (FDOT) at District 7 Headquarters has been quite eventful. I got started a bit later in the semester through the Volunteer Program, so I am approximately one-third of the way through the hours for GIS. Beyond mandatory training for state employees, I got started on putting together a training manual for ArcGIS Pro desktop to be used by the GIS Department in future classes for general employees. I have also taken part in weekly GIS Check-in and GIS Day Progress meetings.

Things changed two weeks ago with the expected development of what became Hurricane Helene in the Northwestern Caribbean Sea. From the beginning the projected path focused on the west coast of Florida and Big Bend region. Either way, the counties within District 7 would be affected, so the focus of FDOT shifted from routine day-to-day operations to storm preparation and emergency management applications.

Additionally the office closed for a couple of days and work-from-home was implemented for most employees. Being an intern, that left me waiting until the following week to resume work. But since I am considered an employee, I could still partake in Microsoft Teams communications from my home PC. So I was able to assist in putting together storm surge inundation graphics for upper management using raster data provided in the NHC Data in GIS Formats downloads page.
NHC Storm Surge Inundation Raster Data for Tampa Bay
Potential Storm Surge Inundation for Hurricane Helene with Florida State Roads

Just over a week removed from Hurricane Helene, Tropical Storm Milton quickly formed within the Bay of Campeche over the southwestern Gulf of Mexico. The initial forecast track immediately targeted the Tampa Bay region. So again the focus at FDOT D7 shifted to emergency management operations and storm preparedness. The biggest difference this time was that FDOT played a role in logistics with relocating the massive amount of debris along the Pinellas County coastline that was the result of Hurricane Helene's storm surge.

The office closed again on Tuesday, October 8 and remained closed through Thursday. This meant my internship was again on hold, but similar to Helene, I could still contribute GIS related graphics from home. With storm surge being the biggest concern again, I put together another storm surge inundation map with GIS data downloaded from the NHC.
NHC raster data for Hurricane Milton storm surge inundation 10/09/24
Hurricane Milton NHC Storm Surge Inundation for Tampa Bay with Florida State Roads

With the knowledge gained from class, I can quantify this data to show the total mileage of Florida State Roads potentially inundated by storm surge. Then, by calculating statistics, I can aggregate the percentage of state roads affected by county. One thing I have learned thus far with my internship at FDOT is that the turnaround time for producing a deliverable is often very short. Any premade styling or formatting is absolutely necessary, and fine tuning an output map is more of a luxury than a necessity.

10/15/24 Update: after 120 hours of no electricity, water or internet due to Hurricane Milton, I finally have the opportunity to create a LinkedIn profile! While I have known about this platform for many years, my internship and recent classes gave me the impetus for making an account.

My approach to setting up the initial profile was to cover as many aspects of my experiences with Geographic Information Systems (GIS) and geography in general as possible. Barring any more hurricanes, and as I get more time, especially as my internship with FDOT progresses, I will further expound upon my GIS work and classes in my profile.

Monday, October 7, 2024

Scale and Resolution Effects on Spatial Data

What a last two weeks it has been this semester. Hurricane Helene threatened the area during the final week of September, shifting everyone's focus to preparation and expected impacts. The storm center passed approximately 90 miles to our west. While coastal impacts were severe, we were spared the brunt inland, even keeping electricity throughout the storm.

Followed that with a preplanned trip for AARoads to Puerto Rico. Then got started on the final module for GIS Special Topics and increased my time investment into the module leading into this past weekend as newly named tropical storm Milton formed in the Bay of Campeche. A Category 5 hurricane as of this writing, Hurricane Milton is expected to make landfall somewhere on the west coast of Florida on Wednesday or Thursday. While wind shear is eventually expected to weaken the storm, unlike Helene, Debby, Idalia and other storms, Milton is forecast to be a major wind event for inland locations. So anxiety levels are high!

The sixth module for GIS Special Topics investigates the effects of scale on vector spatial data and resolution on raster spatial data. The lab also covers spatial data aggregation and the concept of gerrymandering using GIS.

There are multiple meanings of scale to consider for Geographic Information Systems (Zanbergen, 2004).
  • as an indication of the relationship between units on a map and units in the real world. This is typically a representative fraction, which is commonly used with USGS Quads and GIS Maps in general.
  • to indicate the extent of the area of interest. Examples include spatial areas such as neighborhoods, cities, counties and regions.
  • to express the amount of detail or resolution. The resolution of a raster spatial dataset is the cell size, such as 10 meters for the Sentinel 2 blue, green and red spectral bands. This defines the scale of the data.
Scale in the Raster Data Model is straightforward: it is represented by the resolution or cell size. A general rule is that a real world object needs to be at least as large as a cell in order to be recognizable.

Scale in the Vector Data Model also represents the amount of detail. While there is no single best method to express scale in vector data, a good indicator is the size of the smallest polygon or length of the shortest segment of a polyline.

When measuring the length of a complex shape, the total length depends on the smallest unit of the measuring tool. As the unit of the measuring tool decreases, the total measured length of the shape increases. More nodes and connecting segments result in longer shape lengths or area perimeters. The following images illustrate the differences in scale for the Vector Data Model.
Differing scales of Wake County, NC water flowlines
Water flowline vector data for Wake County, NC in different scales
Polygon vector data for Wake County, NC waterbodies at different scales
Waterbodies vector data for Wake County, NC in different scales
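
The relationship between vertex detail and measured length can be sketched in Python. This is a toy illustration only, assuming a hypothetical zigzag "stream" rather than the Wake County flowlines from the lab; the function names are my own.

```python
import math


def polyline_length(vertices):
    """Sum the straight-line lengths of the segments between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def thin(vertices, step):
    """Generalize a polyline by keeping every step-th vertex plus the last one."""
    kept = vertices[::step]
    if kept[-1] != vertices[-1]:
        kept.append(vertices[-1])
    return kept


# Hypothetical meandering stream: vertices alternate between y = 0 and y = 1.
stream = [(float(x), 1.0 if x % 2 else 0.0) for x in range(9)]

detailed_length = polyline_length(stream)          # all nine vertices
coarse_length = polyline_length(thin(stream, 2))   # every other vertex

print(round(detailed_length, 2), coarse_length)
```

The detailed version follows every bend and measures longer, while the generalized version cuts straight across the meanders, which is the same effect seen between the large and small scale flowlines.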

The properties of a Digital Elevation Model (DEM) depend upon what resolution is used. Higher resolution provides more detail. When measuring slope, values decrease as the cell size increases and detail decreases. Higher detail results in steeper slopes. This effect applies to the full range of slopes, not just to steep areas of terrain (Zanbergen, 2004).
Scatterplot showing the relationship of Resolution vs. Slope in a DEM
Quantification of Resolution vs. Slope for a DEM in lab
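
A simplified way to see why slope flattens at coarser resolution is to work with a single row of DEM cells. The sketch below is an assumption-laden toy, not the lab method: the elevation profile is made up, slope is expressed as rise over run between neighboring cells, and resampling is done by averaging pairs of cells.

```python
def mean_slope(elevations, cell_size):
    """Mean absolute rise over run between neighboring cells of a 1D profile."""
    rises = [abs(b - a) for a, b in zip(elevations, elevations[1:])]
    return sum(rises) / len(rises) / cell_size


def coarsen(elevations):
    """Halve the resolution by averaging each pair of neighboring cells."""
    return [(a + b) / 2 for a, b in zip(elevations[0::2], elevations[1::2])]


# Hypothetical elevation profile in meters, sampled with 10 meter cells.
profile = [100, 104, 103, 109, 108, 115, 113, 121]

fine_slope = mean_slope(profile, cell_size=10)
coarse_slope = mean_slope(coarsen(profile), cell_size=20)

print(round(fine_slope, 3), round(coarse_slope, 3))
```

Averaging smooths out the small rises and dips between neighboring cells, so the coarser profile reports a gentler mean slope for the same terrain.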

The Modifiable Areal Unit Problem (MAUP) factors into deciding what scale to use for analysis of spatial data. MAUP is a complication with statistical analysis when quantifying areal data. There are two facets of MAUP.

Scale Effect
The optimal spatial scale for analysis is generally not known, as there are multiple scales for analysis to be theoretically considered (Manley, 2013). The results of data can be manipulated positively or negatively depending upon the size of the aggregation units used.

Zoning Effect
The method used to create areal units. This effect is the result of how spatial data is separated, such as the grouping of smaller areal units into fewer, larger areal units (Dark & Bram, 2007). Changing the grouping can manipulate the results of spatial analysis.

Part 2 of the lab, which conducted Linear Regression analysis of poverty statistics for Florida from U.S. Census data, provided an example of MAUP. Different levels of aggregation convey different results:

Linear Regression Results based upon Congressional District

Linear Regression Results based upon Counties

Linear Regression Results based upon Zip Codes

Gerrymandering is the purposeful manipulation of a district shape with intentional bias (Morgan & Evans, 2018) or to affect political power (Levitt, 2010). Partisan gerrymandering takes place when the political party controlling the redistricting process draws district lines to benefit itself and restrict opportunities for opposition parties. While this maneuvering aims to inordinately increase the political power of a group (Levitt, 2010), the U.S. Supreme Court ruled that partisan-focused gerrymandering is not unconstitutional (Morgan & Evans, 2018).

GIS can measure gerrymandering through compactness in a number of ways. Compactness is the only common rule pertaining to redistricting that takes into account the geometric shape of the district. A district is considered compact if it has a regular shape where constituents generally live near each other. A circular district is very compact while a linear district is not (Levitt, 2010).

Thanks to a discussion board post from our classmate Emily Jane, I found a method for determining compactness that is easy to interpret: the Reock Score. Using this method, geoprocessing determines the minimum bounding circle around each polygon of a Congressional District. That is the smallest circle that entirely encloses the district. Reock scoring uses the ratio of the district area to the area of the minimum bounding circle with the equation R = A_D / A_MBC, where A_D is the area of the district and A_MBC is the area of the minimum bounding circle. The score ranges from 0, which is not compact, to 1, which is optimally compact.
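
As a quick worked example of the ratio, here is a small Python sketch. It assumes two hypothetical rectangular districts, chosen because the minimum bounding circle of a rectangle is simply the circle whose diameter is the diagonal; real districts need a geoprocessing tool to find that circle.

```python
import math


def reock_score(district_area, mbc_radius):
    """Ratio of the district area to the area of its minimum bounding circle."""
    mbc_area = math.pi * mbc_radius ** 2
    return district_area / mbc_area


# Hypothetical square district, 10 x 10 miles.
# Its minimum bounding circle has a radius of half the diagonal.
square_score = reock_score(10 * 10, math.hypot(10, 10) / 2)

# Hypothetical strip district, 100 x 1 miles, with the same area.
strip_score = reock_score(100 * 1, math.hypot(100, 1) / 2)

print(round(square_score, 3), round(strip_score, 3))
```

Both districts cover 100 square miles, but the square scores about 0.64 while the long strip scores close to 0, which is the kind of separation used to rank the least compact districts.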

Example of the Minimum Bounding Circle used with the Reock Score method
An example of the Minimum Bounding Circle around a District polygon for the Reock Score method 

I proceeded with the Reock Score analysis using the Minimum Bounding Geometry tool in ArcGIS Pro. This creates circular polygons for each record in the Congressional District dataset provided. With the minimum bounding circle area and the area of the district, I calculated the Reock Score for every district. With a field added for the Reock Score, the worst "offenders" of gerrymandering in the provided dataset, based upon a lack of district 'compactness', were determined.

Florida District 5 - 2nd worst gerrymandering 'offender'

North Carolina District 2 - the worst gerrymandering 'offender'

References

Zanbergen (2004). DEM Resolution. Vancouver Island University, Nanaimo, BC, Canada.

Manley, D. J. (2013). Scale, Aggregation, and the Modifiable Areal Unit Problem. In Handbook of Regional Science. Springer Verlag.

Dark, S. J., & Bram, D. (2007). The modifiable areal unit problem (MAUP) in physical geography. Progress in physical geography, 31(5), 471-479.

Morgan, J. D., & Evans, J. (2018). Aggregation of spatial entities and legislative redistricting. The geographic information science & technology body of knowledge, 2018(Q3).

Levitt, J. (2010). A Citizen's Guide to Redistricting. New York, NY: Brennan Center for Justice at New York University School of Law.



Sunday, September 22, 2024

Interpolation Methods - Tampa Bay Water Quality

There are numerous spatial interpolation methods used to generate surfaces in GIS. This is the prediction of variables at unmeasured locations based upon sampling of similar variables at known locations or true points. Related, spatial prediction is the estimation of variables at unsampled locations based partly on other variables and a collective set of measurements. Comprised of spatially continuous data, surfaces could be topographic, a measure of air pollution, soil moisture, air temperatures and population density among others (Bolstad & Manson, 2022).

A number of factors can affect the performance of spatial interpolation methods. Some of these factors are data accuracy, temporality of the data, sampling design, sample spatial distribution, the presence of abnormal values or outliers, and the correlation of primary and secondary variables (Hu, 1995; Li & Heap, 2014).

Deciding upon the best interpolation method is not always a straightforward process. Methods often work well for a specific data set because of inherent assumptions and algorithm design for estimation. Different interpolation methods applied to the same data set may produce desired results for one study objective but not another (Hu, 1995).

Module 5 for GIS Special Topics performs interpolation analyses for Tampa Bay water quality data. Specifically, four methods are used to estimate Biochemical Oxygen Demand (BOD), in milligrams per liter, for Tampa Bay. A point feature class of BOD sample locations is provided and the study area is all of Tampa Bay, Old Tampa Bay and Hillsborough Bay. The statistics of each derived surface are compared in an effort to determine which best describes water quality.

The first interpolation method implemented for the Tampa Bay water quality analysis is Thiessen Polygon. This method was the easiest to interpret. It divides the study area into polygons, one per sample point, with each sample point referred to as a centroid. Every location within a Thiessen polygon (proximal zone) is closer to the associated centroid than to any other centroid in the overall analysis, so it is assigned the value of that centroid.
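
The nearest-sample logic behind Thiessen polygons can be sketched in Python. This is a minimal illustration and not the ArcGIS Pro tool; the function name, coordinates and BOD values are hypothetical.

```python
import math


def thiessen_value(location, samples):
    """Return the value of the sample point nearest to the location.

    samples: list of ((x, y), value) pairs.
    """
    nearest_xy, nearest_value = min(samples, key=lambda s: math.dist(location, s[0]))
    return nearest_value


# Hypothetical BOD samples (mg/L) at three monitoring locations.
bod_samples = [((0.0, 0.0), 1.2), ((10.0, 0.0), 3.4), ((5.0, 8.0), 2.1)]

print(thiessen_value((2.0, 1.0), bod_samples))  # falls in the zone of the first sample
print(thiessen_value((9.0, 1.0), bod_samples))  # falls in the zone of the second sample
```

Evaluating this for every raster cell would produce the blocky surface seen in the Thiessen Polygon output, with an abrupt change in value at each polygon edge.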

The Thiessen Polygon method is optimal when there is no uniform distribution of the sample points. The method is applicable to environmental management (Wrublack et al., 2013).

Thiessen Polygon interpolation of Tampa Bay water quality
The Thiessen Polygon raster with an output cell size of 250.

Previously discussed in the Isarithmic Mapping lab in Computer Cartography, the Inverse Distance Weighting (IDW) spatial interpolation method estimates values using the values of sample points and the distance to nearby known points (Bolstad & Manson, 2022). Values closer to a location have more weight on the predicted value than those further away. The power parameter in the mathematical equation of the method determines the weighting, which decreases as the distance increases. When the power parameter increases, a heavier weight is applied to nearby samples, which increases their influence on estimation (Ikechukwu, 2017).
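
To make the role of the power parameter concrete, here is a minimal IDW sketch in Python. It uses every sample for each estimate and ignores search radius and neighbor limits, so it is a simplification of what the ArcGIS Pro tool offers; the sample locations and values are hypothetical.

```python
import math


def idw_estimate(location, samples, power=2.0):
    """Inverse distance weighted estimate at a location.

    samples: list of ((x, y), value) pairs. Each weight is 1 / distance**power.
    """
    numerator = 0.0
    denominator = 0.0
    for xy, value in samples:
        distance = math.dist(location, xy)
        if distance == 0.0:
            return value  # the location coincides with a sample point
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Two hypothetical samples: 2.0 mg/L at x = 0 and 6.0 mg/L at x = 4.
samples = [((0.0, 0.0), 2.0), ((4.0, 0.0), 6.0)]

estimate_p2 = idw_estimate((1.0, 0.0), samples, power=2.0)
estimate_p4 = idw_estimate((1.0, 0.0), samples, power=4.0)

print(round(estimate_p2, 3), round(estimate_p4, 3))
```

At one unit from the first sample and three from the second, a power of 2 gives an estimate of 2.4, while a power of 4 pulls the estimate closer to the value of the nearer sample.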

The IDW method assumes that the underlying surface is smooth. It works well with regularly spaced data, but cannot account for the spatial clustering of sample points (Li & Heap, 2014).

Tampa Bay water quality estimates from the IDW method
The IDW raster for water quality. The power parameter was 2 and the output cell size was 250.

Spline interpolation uses a mathematical function to interpolate a smooth curve along a set of sample data points with minimal curvature. Polynomial functions calculate the segments between join points. These accommodate local adjustments and define the amount of smoothing. The method is named after splines, the flexible ruler cartographers used to fit smooth curves through fixed points (Ikechukwu, 2017).

The performance of Splines improves when dense, regularly-spaced data is used (Li & Heap, 2014). The method is very suitable for estimating densely sampled heights and climatic variables (Ikechukwu, 2017).

The lab uses the options of Regularized and Tension for the Spline geoprocessing tool in ArcGIS Pro. This changes the weight parameter, where higher values in Regularized splines result in smoother surfaces. A weight of zero for the Tension spline option results in a basic thin plate spline interpolation. This is also referenced as the basic minimum curvature technique.

Tampa Bay water quality - Regularized Spline interpolation
Estimated Tampa Bay water quality - Regularized Spline Interpolation Method

Tampa Bay water quality - Tension Spline Interpolation Method
Estimated Tampa Bay water quality - Tension Spline Interpolation Method

References:

Bolstad, B., & Manson, S. (2022). GIS Fundamentals – 7th Edition. Eider Press.

Hu, J. (1995, May). Methods of generating surfaces in environmental GIS applications. In 1995 ESRI user conference proceedings.

Li, J., & Heap, A. D. (2014). Spatial interpolation methods applied in the environmental sciences: A review. Environmental Modelling & Software, 53, 173-189.

Wrublack, S. C., Mercante, E., & Vilas Boas, M. A. (2013). Water quality parameters associated with soil use and occupation features by Thiessen polygons. Journal of Food, Agriculture & Environment, 11(2), 846-853.

Ikechukwu, M. , Ebinne, E. , Idorenyin, U. and Raphael, N. (2017) Accuracy Assessment and Comparative Analysis of IDW, Spline and Kriging in Spatial Interpolation of Landform (Topography): An Experimental Study. Journal of Geographic Information System, 9, 354-371. doi: 10.4236/jgis.2017.93022.

Sunday, September 15, 2024

Searching for the right GIS job

Finally got started with my GIS Internship with the Florida Department of Transportation at District 7 (D7) Headquarters last week. The position affords me the opportunity to work on several GIS related tasks and with multiple departments. I am working with a great team and providing assistance to others with ArcGIS Pro.

Settling into my internship position at D7 went very smoothly. My initial task is working on a basic training manual for ArcGIS Pro to be used in future courses that the GIS department will offer employees. Additionally I was invited to join planning meetings for this year's GIS Day, which will include demonstrations and information on how various departments across D7 use GIS. I am excited to contribute ideas and provide input, and this will also aid in my eventual GIS Day assignment for GIS4944!
GIS Day - November 20, 2024

One of the assignments for this week in GIS4944 is to conduct a job search for what we could consider to be our Dream GIS Job. Working on road map production for a major mapping company in GIS would be it, but the paper map industry is minimal and becoming more niche. So my second GIS job choice is working in transportation. My positive experiences after two days at FDOT have already reinforced this! 

The job that is most appealing in my search is for a GIS Analyst I for the Texas Department of Transportation (TxDOT). Generally all of the essential duties listed in the job posting fall somewhere within my knowledge wheelhouse. Collecting, preparing and digitizing GIS data is the first listed. Creating, maintaining and updating GIS databases and cartographic products is another duty. Extraction of features from georeferenced scanned paper maps is a third duty that I have experience with. Even the bullet point referencing converting CAD and other formats into ArcGIS formats is a task I likely could master, given previous work with CAD at Mapsource and Adobe Illustrator for AARoads.

The position requires no prior experience, but a Bachelor's Degree in Geography, GIS or a related field is required. However, the posting reveals that relevant work experience may be substituted for a degree on a year-for-year basis. I am confident I can meet this requirement through my previous work with Mapsource, Universal Map Group, and GIS Cartography & Publishing Services, in addition to our coursework in the UWF GIS Certificate program.

The results of the GIS job search gave me a framework for what to look for in future job searches. The TxDOT position is about as optimum as I could get for both my skillset and interests. A job description for a GIS analyst position with FDOT would likely be similar. However, with ongoing budgetary issues, no positions at FDOT will be posted in the near future. There's always the private sector to consider as well.

Friday, September 13, 2024

3D Mapping - TINs and DEMs

Moving on from Spatial Data Quality in GIS Special Topics, the next Lab focuses on surfaces with a comparison of the Digital Elevation Model (DEM) and Triangular Irregular Network (TIN). A surface in GIS is a geographic phenomenon represented as continuous data. Continuous spatial data references geographic objects characterized by very gradual boundaries such as temperature or elevation.

The most common way to represent elevation data is with contour lines. Contour lines are 2-dimensional features with attributes containing the value of the surface at a given location. They can be derived by the TIN vector model or the DEM raster model.

TINs are used exclusively to represent a 3-dimensional surface. A series of linked irregular triangles composed of elevation points (nodes) in 3D (X,Y,Z) coordinates (Manandhar, 2005) occurring at any given location represent the 3D surface. The topological relationship of the network of triangles creates a continuous surface. The normal vector of each triangle is used to assign the properties of Slope and Aspect.
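
How a triangle's normal vector yields Slope and Aspect can be sketched in Python. This is my own minimal illustration, not how ArcGIS Pro implements it; it assumes X points east, Y points north, and aspect is reported as a compass bearing of the downslope direction. The triangle coordinates are hypothetical.

```python
import math


def triangle_slope_aspect(p1, p2, p3):
    """Slope (degrees) and aspect (compass degrees) of a TIN triangle.

    Each point is (x, y, z) with x pointing east and y pointing north.
    Aspect is None for a flat triangle.
    """
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    # Normal vector from the cross product of two triangle edges.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nz < 0:  # make sure the normal points upward
        nx, ny, nz = -nx, -ny, -nz
    horizontal = math.hypot(nx, ny)
    slope = math.degrees(math.atan2(horizontal, nz))
    if horizontal == 0:
        return slope, None
    # The horizontal part of the upward normal points downslope.
    aspect = math.degrees(math.atan2(nx, ny)) % 360
    return slope, aspect


# Hypothetical triangle on a plane that rises 1 meter per meter toward the east.
slope, aspect = triangle_slope_aspect((0.0, 0.0, 0.0), (10.0, 0.0, 10.0), (0.0, 10.0, 0.0))
print(slope, aspect)
```

The example triangle climbs toward the east at 45 degrees, so its face looks west, giving an aspect of 270 degrees.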
 
DEMs are the simplest way to represent a topographic surface. A DEM is a regular raster that uses a regular rectangular grid method (Manandhar, 2005) with cell values representing elevation or spot height. The cell size of a DEM determines the resolution. Therefore a DEM with a high number of smaller cells provides more detail than a DEM with fewer, larger cells. Data becomes more implicit with larger cell sizes.

One part of this week's lab utilizes a DEM to develop a 3-dimensional Ski Run Suitability Map. Initially the supplied DEM was converted to a TIN for the 3D component for the Local Scene. The suitability parameters included Elevation where areas exceeding 2,500 meters are most favorable, Slope where angles between 30 and 45 degrees rank highest, and Aspect where south and west facing slopes are most preferred.

Following reclassification, respective rasters were generated from the DEM using geoprocessing tools in ArcGIS Pro. These in turn were input into the Weighted Overlay tool where the suitability rate for aspect is 25%, elevation is 40% and slope is 35%.
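
The weighting itself is simple arithmetic, which a short Python sketch can show. This is not the Weighted Overlay tool, only the weighted sum behind it: the percentages come from the lab, while the tiny 2 x 2 rasters and their suitability classes (1 = low, 3 = high) are hypothetical.

```python
# Percent influence from the lab, expressed as fractions that sum to 1.
WEIGHTS = {"aspect": 0.25, "elevation": 0.40, "slope": 0.35}


def weighted_overlay(layers, weights):
    """Cell-by-cell weighted sum of reclassified rasters stored as nested lists."""
    names = list(weights)
    rows = len(layers[names[0]])
    cols = len(layers[names[0]][0])
    return [
        [sum(weights[n] * layers[n][r][c] for n in names) for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical reclassified rasters with suitability classes from 1 to 3.
layers = {
    "aspect": [[3, 1], [2, 3]],
    "elevation": [[3, 2], [1, 3]],
    "slope": [[3, 3], [1, 2]],
}

suitability = weighted_overlay(layers, WEIGHTS)
for row in suitability:
    print([round(value, 2) for value in row])
```

A cell rated 3 in all three layers keeps a score of 3, while a cell with low elevation and slope ratings drops toward 1 even with a moderate aspect rating, since elevation and slope carry the most weight.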

The final 3D Ski Run Suitability Map for Lab 2.1 Part B
The output Ski Run Suitability Map. Lighting enhancements include shadowing and adjustment of the sun angle. The Vertical Exaggeration is 2.50.

The next part of the lab further explores TINs with adjustments to symbology between elevation, slope and aspect. The deliverable included the generation of contours and selecting appropriate colors.
TIN with Graduated Color for Slope and Contours
Cividis color TIN with 50 meter contours and 250 meter index contours.

The last section of the lab provides a point feature class that will represent the mass points for a TIN. These points were input along with a study area soft clip polygon boundary into the Create TIN tool. The resulting TIN was modified symbolically to show contours set at an interval of 100 meters.

The same mass points feature class was input into the Spline tool to create a DEM. Contours were subsequently generated from the DEM with additional geoprocessing. The two contour feature classes were then compared.
Comparison of TIN and DEM based Contours

While not necessarily more accurate, the DEM based contours have smoother curvature resulting from the implicit data values from each grid cell (Manandhar, 2005). Appearing more jagged in areas with less slope, the TIN based contours are derived from every node, where 3D coordinates are more explicit. There are fewer Faces (triangles) in flatter areas.

References:

Manandhar, N. (2005). Comparison of TIN and Grid Method of Contour Generation from Spot Height. Nepalese Journal on Geoinformatics, 4, 1-8.
https://www.nepjol.info/index.php/NJG/article/view/51271/38351

Friday, September 6, 2024

Spatial Data Quality - Road Network Completeness

Continuing the focus on Spatial Data Quality in GIS Special Topics, Module 1.3 covers the Accuracy Assessment of Roads. Road networks are widely used as the basemap for many applications. This factors into expectations for positional accuracy and completeness, which this week's lab covers.

Road networks are also used for geocoding and network routing. Their usability is dependent upon robust attributes such as street names, address numbers and zip codes, in addition to networking aspects such as turn restrictions and one-way directions. Topologically, road networks must also be robust, with the exact connectivity found in reality (Zanbergen, 2004).

Typically road network datasets are compiled from an array of historical sources, with digitization from aerial imagery and augmentation from GPS field data collection. One of the most comprehensive datasets in the U.S. with a long lineage is TIGER (Topologically Integrated Geographic Encoding and Referencing).

Produced by the US Census Bureau for 1:100,000 scale maps (Suyoung & O'Hara, 2009), TIGER was originally compiled to be topologically correct. That is, the data was not focused on being as accurate as possible, but instead stressed connections and boundaries (Zanbergen 2004). This resulted in legacy errors, which were carried over in succeeding updates from 2000 onward.

TIGER roads centerline data for Jackson County, Oregon

As covered in last week's lab, accuracy assessment of roads utilizes methods such as "ground-truthing" with GPS or surveying equipment, comparing roads with high resolution imagery, and comparing roads to existing datasets deemed to be of higher accuracy.

Last week's positional accuracy assessment compared points between two datasets, using root-mean-square error (RMSE) against reference or true points. Additional methods include buffering, where the true line is buffered by some distance to show discrepancies. Buffers are also used to determine where displacements between matching features fall within an expected nominal accuracy (Suyoung & O'Hara, 2009). In other words, data located outside the buffer (the specified tolerance) are deemed to be substantial errors.

Another method for positional accuracy is line displacement, where the displacement of various sections of a polyline is measured using Euclidean distance. Using matching algorithms, the errors show the displacement of one road network from another. These displacements can be summarized (Zanbergen 2004), or be represented as a raster dataset to analyze vector geometry (Suyoung & O'Hara, 2009).
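As a rough illustration of the buffer and line displacement ideas, here is a minimal Python sketch that measures the Euclidean distance from each vertex of a tested polyline to a reference polyline, then summarizes the displacements and flags those beyond a tolerance. The coordinates, the 10 unit tolerance, and the function names are all made up for the example, and this is not the matching algorithm described in either cited source.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (all (x, y) tuples)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: both endpoints are the same point.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def vertex_displacements(test_line, reference_line):
    """Distance from each vertex of test_line to the nearest part of reference_line."""
    segments = list(zip(reference_line, reference_line[1:]))
    return [
        min(point_segment_distance(p, a, b) for a, b in segments)
        for p in test_line
    ]

# Hypothetical data: a straight reference road and a slightly offset tested road.
reference = [(0.0, 0.0), (100.0, 0.0)]
tested = [(0.0, 3.0), (50.0, 4.0), (100.0, 12.0)]
tolerance = 10.0  # assumed buffer distance, in the same units as the coordinates

displacements = vertex_displacements(tested, reference)
mean_displacement = sum(displacements) / len(displacements)
outside_buffer = [d for d in displacements if d > tolerance]

print(displacements, mean_displacement, outside_buffer)
```

Only the last vertex falls outside the assumed tolerance, which is the kind of location a buffer comparison would flag as a substantial error.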

The lab assignment for Module 1.3 conducts accuracy assessment for completeness on two datasets of street centerlines for Jackson County, Oregon. The feature classes are TIGER road data from 2000 and a Streets_Centerlines feature class compiled by Jackson County GIS.

Street Centerlines data from Jackson County, Oregon GIS

Completeness is one of the aspects cited by Haklay (2010) in assessing data quality. Completeness is a measure of the lack of data, i.e. how much data is expected versus how much data is present. Zanbergen (2004) references two approaches: measuring the total length of a road network and comparing it to a reference scenario, and counting the number of missing elements as a count of features.

Both accuracy assessment scenarios for completeness overlay an arbitrary grid over the compared datasets to determine the total length or count within smaller units. A comparison between two sets of roads based on total length can then be made.

Haklay (2010) references completeness as asking the question of how comprehensive is the coverage of real-world objects. Generalizing this as a simple measure of completeness for our analysis, the dataset with the higher total length of polylines is assumed to be more complete.

Our analysis proceeds by projecting the TIGER roads data into StatePlane coordinates to match the other provided datasets. For each road feature class, the shape length of each polyline is converted from feet to kilometers and stored in a new field. The total length of all road segments per dataset is then summarized for the initial assessment of completeness, where the dataset with more kilometers of roads is considered more complete.
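The length conversion and summary step can be sketched in plain Python as follows. This is only an illustration of the arithmetic, not the field calculation used in the lab: the segment lengths are made up, and the conversion factor assumes the international foot (a projection defined in US survey feet would need a slightly different factor).

```python
# Assumption: shape lengths are stored in international feet (1 ft = 0.3048 m).
FEET_TO_KM = 0.0003048

def length_km(length_ft, feet_to_km=FEET_TO_KM):
    """Convert one polyline length from feet to kilometers."""
    return length_ft * feet_to_km

def total_length_km(lengths_ft):
    """Sum the lengths of all segments in a dataset, in kilometers."""
    return sum(length_km(ft) for ft in lengths_ft)

# Hypothetical segment lengths in feet for two small road datasets.
county_ft = [5280.0, 2640.0, 1000.0]
tiger_ft = [5280.0, 3000.0, 1500.0]

county_km = total_length_km(county_ft)
tiger_km = total_length_km(tiger_ft)

# Simple completeness rule from the text: more kilometers means more complete.
more_complete = "TIGER" if tiger_km > county_km else "County"
print(round(county_km, 3), round(tiger_km, 3), more_complete)
```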

The results were 10,805.82 km of roads for the County Street Centerlines feature class and 11,382.69 km for the TIGER roads feature class. With the greater total length, the TIGER roads data is considered more complete.

Further accuracy assessment for completeness continues with a feature class of grid polygons to be used as the smaller units for comparison. Both feature classes were clipped so that all roads outside of the 297 grid cells were dropped. Geoprocessing using the Pairwise Intersect tool separates each road centerline dataset by grid. This provides a numerical summary indicating a simple factor of completeness on a smaller scale.

The collective length of TIGER road segments exceeds the County street centerline segment length in 162 of the 297 grid cells.
The collective length of County street centerline segments exceeds the TIGER road segment length in 134 of the 297 grid cells.
Additionally, one grid cell contained zero polylines for either centerline dataset.
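The per-cell tally above can be sketched in Python as below. It assumes the intersect output has already been reduced to (grid cell ID, length in km) pairs for each dataset. The cell IDs, lengths, and function names are hypothetical, and the actual lab summarized these values with ArcGIS tools rather than a script like this.

```python
from collections import defaultdict

def length_by_cell(pieces):
    """Sum road length per grid cell from (cell_id, length_km) pairs."""
    totals = defaultdict(float)
    for cell_id, km in pieces:
        totals[cell_id] += km
    return totals

def compare_cells(cell_ids, tiger_pieces, county_pieces):
    """Count the cells where each dataset has the greater total road length."""
    tiger = length_by_cell(tiger_pieces)
    county = length_by_cell(county_pieces)
    counts = {"tiger_longer": 0, "county_longer": 0, "equal_or_empty": 0}
    for cell_id in cell_ids:
        t = tiger.get(cell_id, 0.0)
        c = county.get(cell_id, 0.0)
        if t > c:
            counts["tiger_longer"] += 1
        elif c > t:
            counts["county_longer"] += 1
        else:
            counts["equal_or_empty"] += 1
    return counts

# Hypothetical three-cell grid; cell 3 has no roads in either dataset.
cells = [1, 2, 3]
tiger_pieces = [(1, 2.0), (1, 1.5), (2, 0.5)]
county_pieces = [(1, 3.0), (2, 1.0)]

counts = compare_cells(cells, tiger_pieces, county_pieces)
print(counts)
```

Iterating over the full list of cell IDs, rather than only the cells present in the intersect output, is what lets an empty cell show up in the tally.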

Visualization of these results shows the percent difference for the length of TIGER roads centerline data as compared to the County roads centerline data. Statistics were calculated using a mathematical formula:
% difference = (total length of centerlines − total length of TIGER roads) / (total length of centerlines) × 100%
On the map, cells with more kilometers of TIGER roads than County roads appear in reds and oranges, while cells where the collective length of County roads polylines exceeds the length of the TIGER roads data appear in shades of green.
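As a quick check of how the formula above behaves, the sketch below applies it to the county-wide totals reported earlier (the lab itself applied it per grid cell). The function names, the handling of a zero County length, and the "neutral" class for an exact tie are assumptions added for the example.

```python
def percent_difference(county_km, tiger_km):
    """Percent difference of TIGER length relative to County centerline length."""
    if county_km == 0:
        # Assumption: the formula is undefined with no County roads, so return None.
        return None
    return (county_km - tiger_km) / county_km * 100.0

def color_family(pct):
    """Map a percent difference to the color family described for the map."""
    if pct is None:
        return "no data"
    if pct < 0:
        return "reds/oranges"  # more TIGER kilometers than County
    if pct > 0:
        return "greens"        # more County kilometers than TIGER
    return "neutral"           # assumed class for an exact tie

# County-wide totals from the initial assessment, in kilometers.
countywide = percent_difference(10805.82, 11382.69)
print(round(countywide, 2), color_family(countywide))
```

A negative result means TIGER has more kilometers of roads, so the county as a whole would fall on the red and orange side of the color ramp, at roughly 5.3 percent.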

Length comparison between County street centerline data and TIGER roads data
Map showing the geographic distribution in the differences of completeness for the two road datasets

References:

Zanbergen (2004, May). Spatial Data Management: Quality and Control. Quality of Road Networks. Vancouver Island University, Nanaimo, BC, Canada.

Suyoung & O'Hara (2009, December). International Journal of Geographical Information Science 23, 1503-1525.

Haklay (2010, August 1). Environment and Planning B: Planning and Design, 37, 682-703.