Sunday, September 22, 2024

Interpolation Methods - Tampa Bay Water Quality

There are numerous spatial interpolation methods used to generate surfaces in GIS. Spatial interpolation is the prediction of a variable at unmeasured locations based upon sampling of the same variable at known locations, or true points. The related concept of spatial prediction is the estimation of variables at unsampled locations based partly on other variables and a collective set of measurements. Surfaces consist of spatially continuous data and can represent topography, air pollution, soil moisture, air temperature and population density, among others (Bolstad & Manson, 2022).

A number of factors can affect the performance of spatial interpolation methods. Some of these factors are data accuracy, temporality of the data, sampling design, sample spatial distribution, the presence of abnormal values or outliers, and the correlation of primary and secondary variables (Hu, 1995; Li & Heap, 2014).

Deciding upon the best interpolation method is not always a straightforward process. Methods often work well for a specific data set because of inherent assumptions and the design of the estimation algorithm. Different interpolation methods applied to the same data set may produce desired results for one study objective but not another (Hu, 1995).

Module 5 for GIS Special Topics performs interpolation analyses for Tampa Bay water quality data. Specifically, four methods are used to estimate Biochemical Oxygen Demand (BOD), in milligrams per liter, across Tampa Bay. A point feature class of BOD sample locations is provided, and the study area is all of Tampa Bay, Old Tampa Bay and Hillsborough Bay. Statistics for each derived surface are compared in an effort to determine which one best describes water quality.

The first interpolation method implemented for the Tampa Bay water quality analysis is Thiessen Polygon. This method was the easiest to interpret. It divides the study area into polygons, one per sample point, with each sample point referred to as a centroid. Every location within a Thiessen polygon (proximal zone) is closer to its associated centroid than to any other centroid in the overall analysis, so it is assigned the value of that centroid.

The Thiessen Polygon method is optimal when there is no uniform distribution of the sample points. The method is applicable to environmental management (Wrublack et al., 2013).
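The nearest-point rule behind Thiessen polygons can be sketched in a few lines of Python. This is an illustration only, not the ArcGIS Pro workflow used in the lab: the function name `thiessen_value`, the sample coordinates and the BOD values are all made up, and only the 250-unit cell size mirrors the lab output.

```python
import math

def thiessen_value(x, y, samples):
    """Return the value of the sample point nearest to (x, y).

    samples is a list of (x, y, value) tuples; ties go to the first sample found.
    """
    nearest = min(samples, key=lambda s: math.hypot(s[0] - x, s[1] - y))
    return nearest[2]

# Hypothetical BOD samples in mg/L (coordinates and values are made up).
samples = [(0.0, 0.0, 1.8), (1000.0, 0.0, 2.6), (0.0, 1000.0, 3.1)]

# Rasterize a small 4 x 4 grid with a 250-unit cell size,
# evaluating the nearest sample at each cell center.
cell = 250.0
grid = [
    [thiessen_value(col * cell + cell / 2, row * cell + cell / 2, samples)
     for col in range(4)]
    for row in range(4)
]
```

Every cell takes the value of exactly one sample, which is why the resulting surface changes abruptly at polygon edges rather than gradually.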

Thiessen Polygon interpolation of Tampa Bay water quality
The Thiessen Polygon raster with an output cell size of 250.

Previously discussed in the Isarithmic Mapping lab in Computer Cartography, the Inverse Distance Weighting (IDW) spatial interpolation method estimates values using the values of sample points and the distance to nearby known points (Bolstad & Manson, 2022). Values closer to a location have more weight on the predicted value than those further away. The power parameter in the mathematical equation of the method determines the weighting, which decreases as the distance increases. When the power parameter increases, a heavier weight is applied to nearby samples, which increases their influence on estimation (Ikechukwu, 2017).

The IDW method assumes that the underlying surface is smooth. It works well with regularly spaced data, but cannot account for the spatial clustering of sample points (Li & Heap, 2014).
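The effect of the power parameter is easier to see in code. The sketch below is a minimal IDW estimator, assuming every sample contributes to the estimate (no search radius or neighbor limit, unlike a typical GIS tool); the function name `idw_estimate` and the sample values are hypothetical.

```python
import math

def idw_estimate(x, y, samples, power=2.0):
    """Inverse distance weighted estimate at (x, y) from (x, y, value) samples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(sx - x, sy - y)
        if distance == 0.0:
            return value  # location coincides with a sample point
        weight = 1.0 / distance ** power  # weight shrinks as distance grows
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Two made-up samples 10 units apart with values of 2.0 and 4.0.
samples = [(0.0, 0.0, 2.0), (10.0, 0.0, 4.0)]
low_power = idw_estimate(2.0, 0.0, samples, power=1.0)
high_power = idw_estimate(2.0, 0.0, samples, power=2.0)
```

At a location two units from the first sample, the estimate with a power of 2 sits closer to that sample's value than the estimate with a power of 1, which matches the behavior described above.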

Tampa Bay water quality estimates from the IDW method
The IDW raster for water quality, with a power parameter of 2 and an output cell size of 250.

Spline interpolation uses a mathematical function to interpolate a smooth curve along a set of sample data points with minimal curvature. Polynomial functions calculate the segments between join points. These accommodate local adjustments and define the amount of smoothing. The method is named after splines, the flexible rulers cartographers used to fit smooth curves through fixed points (Ikechukwu, 2017).

The performance of Splines improves when dense, regularly-spaced data is used (Li & Heap, 2014). The method is very suitable for estimating densely sampled heights and climatic variables (Ikechukwu, 2017).

The lab uses both the Regularized and Tension options of the Spline geoprocessing tool in ArcGIS Pro. The meaning of the weight parameter depends on the option selected. With Regularized splines, higher weight values result in smoother surfaces. With the Tension option, a weight of zero results in a basic thin plate spline interpolation, which is also referred to as the basic minimum curvature technique.

Tampa Bay water quality - Regularized Spline interpolation
Estimated Tampa Bay water quality - Regularized Spline Interpolation Method

Tampa Bay water quality - Tension Spline Interpolation Method
Estimated Tampa Bay water quality - Tension Spline Interpolation Method

References:

Bolstad, P., & Manson, S. (2022). GIS Fundamentals (7th ed.). Eider Press.

Hu, J. (1995, May). Methods of generating surfaces in environmental GIS applications. In 1995 ESRI user conference proceedings.

Li, J., & Heap, A. D. (2014). Spatial interpolation methods applied in the environmental sciences: A review. Environmental Modelling & Software, 53, 173-189.

Wrublack, S. C., Mercante, E., & Vilas Boas, M. A. (2013). Water quality parameters associated with soil use and occupation features by Thiessen polygons. Journal of Food, Agriculture & Environment, 11(2), 846-853.

Ikechukwu, M., Ebinne, E., Idorenyin, U., & Raphael, N. (2017). Accuracy Assessment and Comparative Analysis of IDW, Spline and Kriging in Spatial Interpolation of Landform (Topography): An Experimental Study. Journal of Geographic Information System, 9, 354-371. doi: 10.4236/jgis.2017.93022

Sunday, September 15, 2024

Searching for the right GIS job

Finally got started with my GIS Internship with the Florida Department of Transportation at District 7 (D7) Headquarters last week. The position affords me the opportunity to work on several GIS related tasks and with multiple departments. I am working with a great team and providing assistance to others with ArcGIS Pro.

Settling into my internship position at D7 went very smoothly. My initial task is working on a basic training manual for ArcGIS Pro to be used in future courses that the GIS department will offer employees. Additionally I was invited to join planning meetings for this year's GIS Day, which will include demonstrations and information on how various departments across D7 use GIS. I am excited to contribute ideas and provide input, and this will also aid in my eventual GIS Day assignment for GIS4944!
GIS Day - November 20, 2024

One of the assignments for this week in GIS4944 is to conduct a job search for what we could consider to be our Dream GIS Job. Working on road map production for a major mapping company in GIS would be it, but the paper map industry is minimal and becoming more niche. So my second GIS job choice is working in transportation. My positive experiences after two days at FDOT have already reinforced this! 

The job that is most appealing in my search is for a GIS Analyst I for the Texas Department of Transportation (TxDOT). Generally all of the essential duties listed in the job posting fall somewhere within my knowledge wheelhouse. Collecting, preparing and digitizing GIS data is the first listed. Creating, maintaining and updating GIS databases and cartographic products is another duty. Extraction of features from georeferenced scanned paper maps is a third duty that I have experience with. Even the bullet point referencing converting CAD and other formats into ArcGIS formats is a task I likely could master, given previous work with CAD at Mapsource and Adobe Illustrator for AARoads.

The position requires no prior experience, but a Bachelor's Degree in Geography, GIS or a related field is required. However, the posting reveals that relevant work experience may be substituted for a degree on a year-for-year basis. I am confident I can meet this requirement through my previous work with Mapsource, Universal Map Group, and GIS Cartography & Publishing Services, in addition to our coursework in the UWF GIS Certificate program.

The results of the GIS job search gave me a framework for what to look for in future job searches. The TxDOT position is about as optimal as I could get for both my skillset and interests. A job description for a GIS analyst position with FDOT would likely be similar. However, with ongoing budgetary issues, no positions at FDOT will be posted in the near future. There's always the private sector to consider as well.

Friday, September 13, 2024

3D Mapping - TINs and DEMs

Moving on from Spatial Data Quality in GIS Special Topics, the next lab focuses on surfaces with a comparison of the Digital Elevation Model (DEM) and Triangular Irregular Network (TIN). A surface in GIS is a geographic phenomenon represented as continuous data. Continuous spatial data references geographic objects characterized by very gradual boundaries, such as temperature or elevation.

The most common way to represent elevation data is with contour lines. Contour lines are 2-dimensional features with attributes containing the value of the surface at a given location. They can be derived from either the TIN vector model or the DEM raster model.

TINs are used exclusively to represent a 3-dimensional surface. The surface is represented by a series of linked irregular triangles composed of elevation points (nodes) with 3D (X,Y,Z) coordinates, which can occur at any given location (Manandhar, 2005). The topological relationship of the network of triangles creates a continuous surface. The normal vector of each triangle is used to assign the properties of Slope and Aspect.
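How slope and aspect fall out of a triangle's normal vector can be shown with a short Python sketch. It assumes planar coordinates in consistent units, with aspect reported as a compass bearing of the downslope direction; the function name `triangle_slope_aspect` is hypothetical and this is not how ArcGIS Pro necessarily implements the calculation.

```python
import math

def triangle_slope_aspect(p1, p2, p3):
    """Return (slope_degrees, aspect_degrees) for a single TIN face.

    Each vertex is an (x, y, z) tuple. Aspect is a compass bearing
    (0 = north, 90 = east) of the downslope direction, or None when flat.
    """
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    # The cross product of two edges gives a vector normal to the face.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nz < 0:  # flip so the normal always points upward
        nx, ny, nz = -nx, -ny, -nz
    horizontal = math.hypot(nx, ny)
    if horizontal == 0.0 and nz == 0.0:
        raise ValueError("degenerate triangle: vertices are collinear")
    # Slope is the angle between the normal and the vertical axis.
    slope = math.degrees(math.atan2(horizontal, nz))
    if horizontal == 0.0:
        return slope, None  # a flat face has no aspect
    # The horizontal part of the upward normal points downslope.
    aspect = math.degrees(math.atan2(nx, ny)) % 360.0
    return slope, aspect

# A face rising toward the east at 45 degrees, so it faces west.
slope, aspect = triangle_slope_aspect((0, 0, 0), (1, 0, 1), (0, 1, 0))
```

Because every point on a face shares one normal, slope and aspect are constant across each triangle and change only at triangle edges.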
 
DEMs are the simplest way to represent a topographic surface. A DEM is a raster that uses a regular rectangular grid (Manandhar, 2005), with cell values representing elevation or spot height. The cell size of a DEM determines the resolution. Therefore, a DEM with a high number of smaller cells captures the surface in more detail than a DEM with fewer, larger cells. Data becomes more implicit with larger cell sizes.

One part of this week's lab utilizes a DEM to develop a 3-dimensional Ski Run Suitability Map. Initially the supplied DEM was converted to a TIN for the 3D component for the Local Scene. The suitability parameters included Elevation where areas exceeding 2,500 meters are most favorable, Slope where angles between 30 and 45 degrees rank highest, and Aspect where south and west facing slopes are most preferred.

Rasters for each parameter were generated from the DEM and reclassified using geoprocessing tools in ArcGIS Pro. These in turn were input into the Weighted Overlay tool, where the influence of aspect is 25%, elevation is 40% and slope is 35%.
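The arithmetic of a weighted overlay can be sketched in Python as a cell-by-cell weighted sum. The percentages come from the lab, but the function name, the 1 to 3 suitability scale and the raster values are made up for illustration, and the sketch leaves the result unrounded rather than reproducing the exact output of the ArcGIS Pro tool.

```python
# Percent influence from the lab, expressed as fractions that sum to 1.
INFLUENCE = {"aspect": 0.25, "elevation": 0.40, "slope": 0.35}

def weighted_overlay(reclassified):
    """Combine reclassified rasters (dict of name -> 2D list) cell by cell."""
    names = list(INFLUENCE)
    rows = len(reclassified[names[0]])
    cols = len(reclassified[names[0]][0])
    return [
        [sum(INFLUENCE[n] * reclassified[n][r][c] for n in names)
         for c in range(cols)]
        for r in range(rows)
    ]

# Hypothetical 1 x 2 rasters scored from 1 (least) to 3 (most suitable).
rasters = {
    "aspect": [[3, 1]],
    "elevation": [[2, 1]],
    "slope": [[1, 3]],
}
suitability = weighted_overlay(rasters)
```

In this example the first cell scores higher overall even though the second cell has the best slope, because elevation and aspect together carry 65% of the influence.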

The final 3D Ski Run Suitability Map for Lab 2.1 Part B
The output Ski Run Suitability Map. Lighting enhancements include shadowing and adjustment of the sun angle. The Vertical Exaggeration is 2.50.

The next part of the lab further explores TINs with adjustments to symbology between elevation, slope and aspect. The deliverable included the generation of contours and selecting appropriate colors.
TIN with Graduated Color for Slope and Contours
Cividis color TIN with 50 meter contours and 250 meter index contours.

The last section of the lab provides a point feature class that represents the mass points for a TIN. These points were input, along with a study area polygon boundary used as a soft clip, into the Create TIN tool. The symbology of the resulting TIN was modified to show contours at an interval of 100 meters.

The same mass points feature class was input into the Spline tool to create a DEM. Contours were subsequently generated from the DEM with additional geoprocessing. The two contour feature classes were then compared.
Comparison of TIN and DEM based Contours

While not necessarily more accurate, the DEM based contours have smoother curvature resulting from the implicit data values of each grid cell (Manandhar, 2005). The TIN based contours, which appear more jagged in areas with less slope, are derived from every node, where 3D coordinates are more explicit. There are fewer faces (triangles) in flatter areas.

References:

Manandhar, N. (2005). Comparison of TIN and Grid Method of Contour Generation from Spot Height. Nepalese Journal on Geoinformatics, 4, 1-8.
https://www.nepjol.info/index.php/NJG/article/view/51271/38351

Friday, September 6, 2024

Spatial Data Quality - Road Network Completeness

Continuing the focus on Spatial Data Quality in GIS Special Topics, Module 1.3 covers the Accuracy Assessment of Roads. Road networks are widely used as the basemap for many applications. This factors into expectations for positional accuracy and completeness, which this week's lab covers.

Road networks are also used for geocoding and network routing. Their usability depends upon robust attributes such as street names, address numbers and zip codes, in addition to networking aspects such as turn restrictions and one-way directions. Road networks must also be topologically robust, with connectivity that exactly matches reality (Zanbergen, 2004).

Typically road network datasets are compiled from an array of historical sources, with digitization from aerial imagery and augmentation from GPS field data collection. One of the most comprehensive datasets in the U.S. with a long lineage is TIGER (Topologically Integrated Geographic Encoding and Referencing).

Produced by the US Census Bureau for 1:100,000 scale maps (Suyoung & O'Hara, 2009), TIGER was originally compiled to be topologically correct. That is, the data did not focus on being as positionally accurate as possible, but instead stressed connections and boundaries (Zanbergen, 2004). This resulted in legacy errors, which were carried over in succeeding updates from 2000 onward.

TIGER roads centerline data for Jackson County, Oregon

Covered in last week's lab, accuracy assessment of roads utilizes methods such as "ground-truthing" using GPS or surveying equipment, comparing roads with high resolution imagery, and comparing roads to existing datasets deemed to be of higher accuracy.

Positional accuracy last week looked at the comparison of points between two datasets using root-mean-square error (RMSE) with reference or true points. Additional methods include using buffers, where the true line is buffered by some distance to show discrepancies. Buffers are also used to determine whether displacements between matching features fall within an expected nominal accuracy (Suyoung & O'Hara, 2009). In other words, data located outside a buffer (the specified tolerance) are deemed to be substantial errors.
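As a reminder of how RMSE summarizes positional error, here is a minimal Python sketch. It assumes the test and reference points are already matched pair by pair and share a planar coordinate system; the function name `horizontal_rmse` and the coordinates are hypothetical.

```python
import math

def horizontal_rmse(test_points, reference_points):
    """Root-mean-square error of horizontal offsets between matched points."""
    if len(test_points) != len(reference_points) or not test_points:
        raise ValueError("point lists must be non-empty and the same length")
    squared_errors = [
        (tx - rx) ** 2 + (ty - ry) ** 2
        for (tx, ty), (rx, ry) in zip(test_points, reference_points)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up example: one point is offset by 5 units, the other matches exactly.
test_points = [(3.0, 4.0), (10.0, 10.0)]
reference_points = [(0.0, 0.0), (10.0, 10.0)]
rmse = horizontal_rmse(test_points, reference_points)
```

Squaring the offsets before averaging means a few large displacements raise the RMSE more than many small ones.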

Another method for positional accuracy is line displacement, where the displacement of various sections of a polyline is measured using Euclidean distance. Using matching algorithms, errors show the displacement of one road network from another. These displacements can be summarized (Zanbergen, 2004), or be represented as a raster dataset to analyze vector geometry (Suyoung & O'Hara, 2009).

The lab assignment for Module 1.3 conducts accuracy assessment for completeness on two datasets of street centerlines for Jackson County, Oregon. The feature classes are TIGER road data from 2000 and a Streets_Centerlines feature class compiled by Jackson County GIS.

Street Centerlines data from Jackson County, Oregon GIS

Completeness is one of the aspects cited by Haklay (2010) in assessing data quality. Completeness is the measure of the lack of data, i.e. how much data is expected versus how much data is present. Zanbergen (2004) references two measures: comparing the total length of a road network to that of a reference dataset, and counting the number of missing features.

Both accuracy assessment scenarios for completeness overlay an arbitrary grid over the compared datasets to determine the total length or count within smaller units. A comparison between two sets of roads based on total length can then be made for each unit.

Haklay (2010) frames completeness as the question of how comprehensive the coverage of real-world objects is. Generalizing this as a simple measure of completeness for our analysis, the dataset with the higher total length of polylines is assumed to be more complete.

Our analysis proceeds by projecting the TIGER roads data into StatePlane coordinates to match the other provided datasets. The shape length of each polyline, stored in feet, is converted to kilometers in a new field for each road feature class. Statistics for the total length of all road segments per dataset are then summarized for the initial assessment of completeness, where the dataset with more kilometers of roads is considered more complete.

The results were 10,805.82 km of roads for the County Street Centerlines feature class and 11,382.69 km for the TIGER roads feature class. With more data, the TIGER roads data is considered more complete.
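The length conversion and the simple completeness comparison can be sketched in Python. The two totals are the ones reported above; the function names are hypothetical, and the conversion factor assumes international feet, which may differ slightly from the foot definition used by a given StatePlane zone.

```python
# Assumption: lengths are in international feet (0.3048 m each).
METERS_PER_FOOT = 0.3048

def feet_to_km(length_ft):
    """Convert a length in feet to kilometers."""
    return length_ft * METERS_PER_FOOT / 1000.0

def total_length_km(segment_lengths_ft):
    """Sum segment lengths given in feet and return the total in kilometers."""
    return sum(feet_to_km(length) for length in segment_lengths_ft)

def more_complete(totals_km):
    """Return the dataset name with the greatest total length."""
    return max(totals_km, key=totals_km.get)

# Totals reported in this post, in kilometers.
totals_km = {"County Street Centerlines": 10805.82, "TIGER roads": 11382.69}
winner = more_complete(totals_km)
```

This treats length alone as the measure of completeness, so it cannot distinguish real roads missing from one dataset from duplicated or erroneous segments in the other.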

Further accuracy assessment for completeness continues with a feature class of grid polygons to be used as the smaller units for comparison. Both feature classes were clipped so that all roads outside of the 297 grid cells were dropped. Geoprocessing using the Pairwise Intersect tool separates each road centerline dataset by grid. This provides a numerical summary indicating a simple factor of completeness on a smaller scale.

The collective length of TIGER road segments exceeds the County street centerline segment length in 162 of the 297 grid cells.
The collective length of County street centerline segments exceeds the TIGER road segment length in 134 of the 297 grid cells.
Additionally, one grid cell contained zero polylines for either centerline dataset.

Visualization of these results shows the percent difference in the length of TIGER roads centerline data as compared to the County roads centerline data for each grid cell. Statistics were calculated using the following formula:
% difference = (total length of centerlines − total length of TIGER Roads) / (total length of centerlines) × 100%
In the resulting map, cells with more kilometers of TIGER roads than County roads appear in reds and oranges, while cells where the collective length of County roads polylines exceeds the length of the TIGER roads data appear in shades of green.
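The per-cell percent difference and the tally of which dataset is longer can be sketched in Python. The function names and the three example cells are made up; returning None for a cell with no County roads is an assumption of this sketch, since the formula is undefined when the denominator is zero.

```python
def percent_difference(county_km, tiger_km):
    """Percent difference relative to the County centerline length.

    Negative values mean the TIGER roads are longer in that cell.
    Returns None when there are no County roads to divide by.
    """
    if county_km == 0:
        return None
    return (county_km - tiger_km) / county_km * 100.0

def tally(cells):
    """Count the cells where each dataset has the greater total length."""
    counts = {"tiger_longer": 0, "county_longer": 0, "equal_or_empty": 0}
    for county_km, tiger_km in cells:
        if tiger_km > county_km:
            counts["tiger_longer"] += 1
        elif county_km > tiger_km:
            counts["county_longer"] += 1
        else:
            counts["equal_or_empty"] += 1
    return counts

# Hypothetical (county_km, tiger_km) totals for three grid cells.
cells = [(10.0, 12.0), (10.0, 8.0), (0.0, 0.0)]
differences = [percent_difference(c, t) for c, t in cells]
counts = tally(cells)
```

With this sign convention, the cells drawn in reds and oranges correspond to negative percent differences and the green cells to positive ones.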

Length comparison between County street centerline data and TIGER roads data
Map showing the geographic distribution in the differences of completeness for the two road datasets

References:

Zanbergen (2004, May). Spatial Data Management: Quality and Control. Quality of Road Networks. Vancouver Island University, Nanaimo, BC, Canada.

Suyoung & O'Hara (2009, December). International Journal of Geographical Information Science 23, 1503-1525.

Haklay (2010, August 1). Environment and Planning B: Planning and Design, 37, 682-703.