
Sunday, November 24, 2024

GIS Portfolio

What a journey it has been in the GIS Undergraduate Certificate Program these last 15 months! As we fast approach the Thanksgiving holiday, the semester is winding down and our final assignment is the compilation of a GIS Portfolio showing some of our best examples of deliverables we produced both in class and professionally.

Giving an overview of thematic maps as part of my GIS Day presentation at FDOT District 7

My initial plan was to create an online portfolio on AARoads.com. Avada, the content management system that we use with WordPress on the site, has a built-in Portfolio module. Perfect, or so I thought. A nagging issue with the site has been the deprecated PHP script that generates the bulk of the guide pages. A customization to WordPress for this script breaks the Portfolio aspect of Avada, and addressing this issue so that it will function properly is not an easy task.

Rather than adding stress to an already busy schedule as of late, I opted to instead compile my GIS Portfolio using PowerPoint. The presentation can be found on my Google Drive at https://drive.google.com/file/d/1I36tE8EkP2LvJoM6Ro2yOl6Xk2r5DMev/view

Alternatively, I also uploaded the GIS Portfolio to AARoads at https://www.aaroads.com/anitzman-gis-portfolio.pdf

Wednesday, November 20, 2024

GIS Day 2024 Event at FDOT District 7

Following months of planning, to which I contributed starting on the second day of my internship, GIS Day is finally here! Beyond brainstorming ideas on how to better spread the word of the event at FDOT District 7, I was tasked with creating one or two GIS Day maps for display on the wall of the auditorium.

As the semester progressed, I took inspiration from Special Topics assignments and learned skills from Computer Cartography and GIS Applications for several mapping concepts to share on GIS Day. My idea was to show a few examples of the capabilities of GIS, both from an analytical standpoint, and also in the different ways data can be visualized.

After reading several classmates' discussion board posts on presentations they made for GIS Day, I decided to follow their lead and create a presentation of my own. My goal was to provide an overview of maps in GIS, then cover each of the five maps I created with a mix of technical information, such as the geoprocessing that went into them or the type of map (choropleth, graduated symbol), principles of design, and inspiration for the map subjects.

Our efforts paid off, and the D7 GIS Department's three hour event this morning was a great success! We had around 30 attendees, many of whom stayed for all of the presentations, and received several positive comments on the event. My presentation went over well and I thoroughly enjoyed sharing some of the GIS knowledge gained from my time with the University of West Florida.

The start of 2024 GIS Day at FDOT District 7

My GIS Day 2024 presentation and the maps I created for the event follow:

D7 GIS Day Map Overview

There are two general categories of maps, Reference maps and Thematic maps. We are all familiar with Reference Maps, such as a road map or a political map. On display in the auditorium here are examples of Thematic Maps, which are maps that focus on a specific theme, such as climate, population, or in our case, transportation. This leads me into our first GIS Day map…

Hurricane Tracks Map

Map quantifying the number of hurricanes striking Florida directly from 1851 to 2024

When we were planning our GIS Day event, one of the map concepts discussed was a Florida map of hurricane tracks impacting the state over the last 20 years. Sounds simple enough, but as the map was in production, Hurricane Milton formed, and one fact mentioned by media outlets was that Tampa had not been hit directly by a major hurricane since 1921.

This ultimately factored into my decision to expand upon the hurricane tracks map concept to quantify the number of hurricanes whose center passed directly over each county in the state.

I opted to cover two sets of temporal data. A choropleth map shows the number of hurricanes per county in the last 50 years. It uses darker colors for higher values, conveying that higher values carry a heavier visual weight. The graduated symbols map, which quantifies the number of hurricanes per county since 1851, the first Florida hurricane in the dataset, correlates symbol size with quantity, i.e. larger means more.

As for how the map was created, the geoprocessing for the choropleth and graduated symbols maps was based upon the number of hurricane polylines crossing any part of the county polygons. These calculations are automated in GIS; no manual comparisons are needed.
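To illustrate the idea behind that calculation (a simplification, not the actual ArcGIS geoprocessing), here is a rough Python sketch that counts a track as crossing a county when any of its vertices falls inside the county polygon. The counties, tracks, and the `hurricanes_per_county` helper are made-up for illustration:

```python
# Simplified sketch: counting how many hurricane tracks touch each county.
# In ArcGIS Pro this is handled by geoprocessing (e.g. a spatial join);
# here a track "crosses" a county when any of its vertices lies inside
# the county polygon, tested with ray casting.

def point_in_polygon(x, y, polygon):
    """Ray-casting point-in-polygon test."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def hurricanes_per_county(counties, tracks):
    counts = {name: 0 for name in counties}
    for track in tracks:
        for name, polygon in counties.items():
            if any(point_in_polygon(x, y, polygon) for x, y in track):
                counts[name] += 1
    return counts

# Two square "counties" and two polyline tracks (illustrative coordinates)
counties = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
tracks = [
    [(0.5, 1.0), (3.5, 1.0)],   # passes over both counties
    [(3.0, 0.5), (3.0, 1.5)],   # stays within county B
]
print(hurricanes_per_county(counties, tracks))  # {'A': 1, 'B': 2}
```

A real implementation would also test edge crossings where no vertex lands inside a polygon, which is exactly the kind of detail the geoprocessing tools handle for you.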

D7 Interstates History Map

FDOT District 7 Interstate opening dates color coded by decade

This thematic map aggregates sections of the District 7 Interstate system by the decade in which they opened to traffic. This also shows how the use of graphics can enhance the presentation of a map.

I also factored into the design the Gestalt Principles of Perceptual Organization, which in cartography includes Visual Hierarchy, where important features are emphasized, and less relevant ones deemphasized. The Figure-Ground relationship accentuates certain objects over others by making these appear closer to the map user. Visual Balance is where the size, weight and orientation of map elements are adjusted to achieve balance in the center of the map. Contrast and Color are other principles used in good map design.

D7 Lighting Raster Map

Raster showing the number of light poles per square mile in FDOT District 7

I created this map to show how raster data can be used in GIS. The concept took the point feature class for all light poles within District 7 and overlaid them with a fishnet grid in ArcGIS Pro. This is also referred to as grid-based thematic mapping. I aggregated the light poles by 1 square mile grid cells and obtained a density unit via geoprocessing. I then symbolized the raster set so that lighter colors convey more light fixtures. The end result is a map clearly showing where we maintain the most lighting.
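The aggregation step boils down to binning points into uniform grid cells. A minimal Python sketch of that idea, with made-up pole coordinates in place of the actual District 7 feature class:

```python
# Sketch of grid-based thematic mapping: bin light-pole points into
# uniform grid cells to get a count (density) per cell. In ArcGIS Pro
# this is a fishnet grid plus a spatial join.
from collections import Counter

def bin_points(points, cell_size):
    """Return a Counter keyed by (col, row) grid cell."""
    counts = Counter()
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts

# Illustrative pole locations; one unit here stands in for one mile
poles = [(0.2, 0.3), (0.8, 0.9), (1.5, 0.4), (1.9, 1.7), (0.1, 1.1)]
density = bin_points(poles, cell_size=1.0)  # 1 x 1 "square mile" cells
print(density[(0, 0)])  # two poles fall in the lower-left cell
```

Because every cell has the same area, the raw count per cell doubles as a density, which is what the symbology then shades.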

D7 Storm Surge Map

Areas in FDOT District 7 inundated by storm surge, by Saffir-Simpson category

Storm surge data is another form of raster data. These are generally calculated by the use of a Digital Elevation Model or DEM. One useful aspect of ArcGIS Pro is the ability to use geoprocessing to convert a raster into a polygon feature class, such as was done here with this NOAA storm surge dataset.

This expands the options for the GIS analyst. Among others, geoprocessing options include least cost path analysis, buffer analysis, and data interpolation, where unknown values between known data points such as rainfall rates, can be estimated.

3D Traffic Count Map

3-Dimensional representation of traffic counts (AADT) on the FDOT D7 state road system

When you think of 3D mapping, you probably think of modeling buildings or terrain, but there are several other uses. One such concept of 3D mapping is to visualize 2D data in a different, and perhaps more thought-provoking way.

That was the idea behind this 3D traffic count map of District 7. ArcGIS uses the Extrusion method to add a 3D element to our 2D feature class. Extrusion bases the height of data on a Z-unit, where the unit can be based upon real-world units, such as the height of a building, or upon ranges of data, such as with the traffic counts here.

ArcGIS Pro renders points, polylines and polygons differently in three dimensions. Points appear as columns. Polylines appear as a wall, as they do here, and polygons appear as solid objects, which is probably easiest to imagine when applied to a building footprint.

One thing revealed with this 3D traffic count map was that a stretch of traffic count data for Interstate 4 was missing. So, the 3D map produced an unintended benefit, revealing a section of missing data that we could correct.

So, as you can see, GIS allows you to show geospatial data in a more meaningful way. And these maps are only the tip of the iceberg when it comes to the types of deliverables that can be produced.

Tuesday, July 30, 2024

Damage Assessment - Hurricane Sandy

Module 5 for GIS Applications continues our focus on Hurricane Sandy and explores damage assessment for the storm's impact in the Garden State.

Our first task was to create a formal hurricane track map showing the path Sandy took from the Caribbean Sea to the Northeastern U.S. The symbology uses custom color coded coordinate points showing the hierarchy of storm intensity. Included are the maximum sustained winds and the barometric pressure shown in 12 hour increments to improve legibility.

Map showing the path of Hurricane Sandy.

The next section of Lab 5 was the creation of a damage assessment survey using Survey123. This was a pretty straightforward process, with options to add multiple choice questions pertaining to the damage to be documented, a field for describing the damage surveyed, the option to include an image or have the mobile device take a photo, and a required location setting via either GPS or a map locator. Following the form creation, we determined what the survey application does after the submission of a completed survey and set the restrictions on what an individual viewing the survey can see.

Our next task was the preparation of raster data from air photos showing an area of New Jersey both before and after Superstorm Sandy. Geoprocessing combined an array of .SID raster images of pre-storm photos into the first mosaic dataset. .JPG images of the post-storm photos were compiled into the second mosaic dataset.

With both mosaic datasets in place, we revisited the Flicker and Swipe tools (located in the Compare group below the Mosaic Layer tab), which were previously used in the Remote Sensing course, to alternate the display between the pre- and post-storm imagery. Both are fast methods to visually compare the two sets of imagery.

An example of the Swipe tool in ArcGIS Pro showing pre-storm imagery above and post-storm imagery below

Step 3 of the lab focuses on the creation of data. For this, we revisit the concept of domains previously covered in Intro to GIS last fall. Attribute domains constrain the values allowed in an attribute for a table or feature class. Domains create a rule set of acceptable attribute values, or in the case of Module 5, a range of integers associated with predefined aspects of damage assessment:

Helping to ensure data integrity, domains limit the acceptable values for a field of a newly created feature class.

Attribute domains are stored in a geodatabase. They can be utilized by multiple feature classes, tables and subtypes in a geodatabase. Through the Catalog pane in ArcGIS Pro (Data Design > Fields), they can be added to an existing feature class.
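Conceptually, a coded-value domain is just a lookup of allowed codes that rejects anything outside the rule set. A small Python sketch of that behavior; the `STRUCTURE_DAMAGE` codes below are illustrative, not the lab's actual domain definitions:

```python
# Sketch of how an attribute domain constrains field values: a coded-value
# domain maps allowed codes to descriptions, and any value outside the
# domain is rejected before it reaches the table.
STRUCTURE_DAMAGE = {  # illustrative coded-value domain
    0: "No damage",
    1: "Affected",
    2: "Minor damage",
    3: "Major damage",
    4: "Destroyed",
}

def validate(value, domain):
    """Accept only values defined in the domain."""
    if value not in domain:
        raise ValueError(f"{value!r} is not a valid domain code")
    return domain[value]

print(validate(3, STRUCTURE_DAMAGE))  # Major damage
```

This is the data-integrity benefit in a nutshell: a field editor physically cannot store a damage code the domain does not define.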

Using the aforementioned air photos showing Seaside Heights, NJ before and after Superstorm Sandy, we were tasked with conducting damage assessment within a study area of parcel data using the preset domains for the various damage categories. Symbolization uses a continuous color scheme from green for no damage to red for destroyed.

Point feature class showing the damage assessment for each parcel within the Superstorm Sandy study area at Seaside Heights, NJ

Given the four domains of Structure Damage, Inundation, Wind Damage and Structure Type, each parcel within a seaside neighborhood was evaluated for damage based upon the two air photos. This was a tedious task due to relatively low image resolution and long shadows in the post-Sandy aerial imagery. Without in-situ data collection, evaluating parcels for wind damage was impractical, since details such as missing roof shingles could not be discerned.

Expanding our analysis, we aggregated the damage assessment points into buffers within 100 meters of the coastline, between 100-200 meters, and between 200-300 meters. Using the Multiple Ring Buffer geoprocessing tool, I created the three storm surge zones, then ran a Spatial Join on the Structure Damage point file with the buffer polygon file to quantify the damage type by buffer zone. The Summary Statistics geoprocessing tool does the tabulation for us:

Hurricane Sandy Damage Assessment - GIS Applications
The final results of our damage analysis confirm that the penetration of Superstorm Sandy's storm surge diminished as the distance from the coastline increased from 100 to 300 meters. Structures facing the ocean were generally pulverized, while buildings located around 300 meters inland fared much better, some with seemingly no visible damage. This damage estimation appears to be consistent along other parts of the barrier island where the elevation and slope are similar. Exceptions were noted, such as further south of the study area in Seaside Heights, New Jersey, where the barrier provided by a boardwalk and two piers protected adjoining neighborhood areas from the storm surge.
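The buffer-and-tally workflow can be sketched in plain Python: assign each point to a distance ring, then count damage categories per ring. The distances, ring breaks, and damage labels below are illustrative stand-ins for the lab's actual data:

```python
# Sketch of the Multiple Ring Buffer + Spatial Join + Summary Statistics
# workflow: classify damage points into coastline-distance zones, then
# tally the damage categories within each zone.
from collections import defaultdict

RINGS = [100, 200, 300]  # ring upper limits in meters from the coastline

def zone_for(distance):
    """Return the ring label for a distance, or None outside all rings."""
    for limit in RINGS:
        if distance <= limit:
            return f"0-{limit}m" if limit == RINGS[0] else f"{limit - 100}-{limit}m"
    return None

def tally(points):
    """points: iterable of (distance_to_coast_m, damage_category) pairs."""
    summary = defaultdict(lambda: defaultdict(int))
    for dist, damage in points:
        zone = zone_for(dist)
        if zone is not None:
            summary[zone][damage] += 1
    return summary

# Illustrative assessment points
points = [(40, "Destroyed"), (90, "Major"), (150, "Minor"), (250, "None")]
result = tally(points)
print(dict(result["0-100m"]))  # {'Destroyed': 1, 'Major': 1}
```

The geoprocessing tools do the same bookkeeping against real geometry, measuring distance to the coastline feature rather than taking it as an input.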

Wednesday, July 3, 2024

Crime Analysis in GIS

Our first topic in GIS Applications is crime analysis and the use of crime mapping for determining crime hotspots. Crime mapping techniques provide insight into the spatial and temporal distributions of crime. This benefits criminologists in the research community and professionals in law enforcement.

Crime mapping factors in the importance of local geography as a reason for crime and considers that it may be as important as criminal motivation. The importance of identifying patterns and hotspots in crime mapping tends to be a precursor for implementing effective crime prevention methods.

Fundamental to crime mapping is spatial autocorrelation, which acknowledges the spatial dependency of values measured within areas. This recognizes that crime in one area can influence the crime rate of a nearby area.

We are tasked this week with quantifying data and generating hotspot maps showing crime density using various methods on the clustering of events. The Lab for Module 1 works with crime data for Washington, DC and Chicago.

Output in this week's lab, a kernel density map showing 2018 crime hotspots for assaults with dangerous weapons in Washington, DC

A relative measure, a crime hotspot represents an area with a greater than average frequency of criminal or disorderly events. An area where people have an above average risk of victimization can also be classified as a crime hotspot. Victimization, however, cannot always be shown on maps, as the theory refers to multiple incidents involving the same individual, regardless of location. A hotspot can also be represented as a street (line) or a neighborhood (polygon) where repeat occurrences take place.

Determining crime hotspots can aid in detecting spatial and temporal patterns and trends of crime. The concept can benefit law enforcement in better allocating resources to target areas. Crime hotspots can also be used to identify underlying causes for crime events.

The concept of local clustering, concentrations of high data values, is the most useful for crime analysis. Methods determine where clusters are located and produce a hotspot map showing concentrations.

Point data can be used directly in this analysis of clusters. A collection of points can produce a hotspot whose bounds are derived from the local density of points. Using point data also has the advantage of not being constrained by a predetermined jurisdictional boundary. Data aggregated into meaningful areas, such as jurisdictional polygons containing the highest values, can also produce hotspots. Aggregation can yield crime rates, such as the number of events per number of residents or per households for an area.
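The rate calculation behind such an aggregated map is simple normalization. A one-line Python sketch, with made-up counts in place of the DC data:

```python
# Sketch of normalizing aggregated crime counts into a rate per area unit,
# e.g. crimes per 1,000 households, as used for the choropleth map.
def crime_rate(crime_count, households, per=1000):
    return crime_count / households * per

# Illustrative figures for one enumeration area
print(round(crime_rate(45, 12500), 2))  # 3.6 crimes per 1,000 households
```

Normalizing this way lets areas with very different household counts be compared fairly on one map.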

Choropleth map with aggregated data showing the crime rate (crimes per 1,000 households) for Washington, DC

The Lab for Module 1 focuses on three methods for local clustering. Grid-Based Thematic Mapping overlays a regular grid of polygons above point data of crime events. This produces a count of events for each grid cell. Since all cells are uniform in dimensions, the count is equivalent to a density.

The choropleth map output showing crime density can be further analyzed to determine the crime hotspots. Extracting the crime hotspots involves selecting the highest class of the data. Quintile classification is commonly used to determine this.

The data provided in Lab included point class data of homicides reported in the city of Chicago for 2017. Additionally we were supplied with polygon class data of 1/2 mile grid cells clipped to the Chicago city limits.

The grid cells and point data for Chicago were spatially joined, and grid cells where the homicide value was zero were removed from the analysis. Using quintile classification, the top 20% of grid cells based on homicide values were extracted to generate a hotspot map:

Grid-Based Thematic Map of Chicago where three or more homicides were recorded in 2017
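That extraction step, dropping empty cells and keeping the top quintile, can be sketched in a few lines of Python. The cell counts here are illustrative, not the Chicago values:

```python
# Sketch of extracting a hotspot as the top quintile (highest 20%) of
# grid cells ranked by homicide count, after removing zero-value cells.
def top_quintile(cell_counts):
    cells = sorted((c for c in cell_counts if c > 0), reverse=True)
    k = max(1, len(cells) // 5)  # top 20% of the non-zero cells
    return cells[:k]

# Illustrative per-cell homicide counts
counts = [0, 0, 1, 1, 2, 3, 3, 5, 8, 12]
print(top_quintile(counts))  # [12]
```

In the lab, the equivalent selection happens through the symbology classification rather than code, but the logic is the same: rank the cells, keep the highest class.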

Kernel density can also be used to calculate a local density directly from point values, without aggregation. The estimation method utilizes a user-defined grid over the point distribution. A search radius, known as the bandwidth, is applied to each grid cell. Using these two parameters, the method calculates weights for each point within the kernel search radius.

Points closer to the grid cell center are weighted more and therefore contribute more to the total density value of the cell. The final grid cell values are derived by summing the values of all circle surfaces for each location. For the Lab, we used the Spatial Analyst Kernel Density tool in ArcGIS Pro, with the grid cell size and bandwidth as inputs, run on the 2017 homicides feature class for Chicago. The output was a raster file with ten classes.
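To show the weighting idea concretely, here is a small Python sketch of kernel density over grid cell centers using a quartic kernel, which decays with distance and drops to zero at the bandwidth. The points, centers, and bandwidth are illustrative; the actual Spatial Analyst tool uses its own kernel and scaling:

```python
# Sketch of kernel density estimation: each point within the bandwidth of
# a cell center contributes a weight that decays with distance, so nearby
# points dominate the cell's density value.
import math

def kernel_density(points, centers, bandwidth):
    densities = []
    for cx, cy in centers:
        total = 0.0
        for px, py in points:
            d = math.hypot(px - cx, py - cy)
            if d < bandwidth:
                total += (1 - (d / bandwidth) ** 2) ** 2  # quartic kernel
        densities.append(total)
    return densities

points = [(1, 1), (1.2, 0.9), (4, 4)]   # illustrative event locations
centers = [(1, 1), (4, 4)]              # two grid cell centers
print(kernel_density(points, centers, bandwidth=1.0))
```

The first center picks up two nearby points (one at distance zero, one partially weighted); the second sees only the lone point sitting on it.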

Since we were only interested in areas with the highest homicide rate, we reclassified the raster data into two classes. The upper class ranged from a value three times the mean to the maximum value of the raster data. This represented the crime hotspot as estimated with kernel density:

Continuous surface map showing the crime hotspots for Chicago based upon 2017 homicide point data

Local Moran's I is the final method implemented on the 2017 Chicago GIS data for Module 1. Moran's I, a global measure of spatial autocorrelation, addresses the question: are nearby features similar? Features that are closer to each other tend to be more similar to one another than those located farther apart. Moran's I produces a single statistic that reveals whether a spatial pattern is clustered by comparing the value at any one location with the values at all other locations.

The result of Moran's I varies between -1.0 and +1.0. Positive values correlate to positive spatial autocorrelation (clustering) and negative values with negative autocorrelation. Where points that are closer together have similar values, the Moran's I result is high. If the point pattern is random, the value will be close to zero.
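The global statistic can be computed by hand for a toy dataset. Below is a hedged Python sketch using the standard formulation with a binary adjacency weight matrix; the four-cell chain of values is made up to show positive autocorrelation (high values neighboring high values):

```python
# Sketch of global Moran's I: I = (n / W) * sum_ij w_ij * d_i * d_j / sum_i d_i^2,
# where d_i are deviations from the mean and w_ij are spatial weights.
# Values near +1 suggest clustering, near -1 dispersion, near 0 randomness.
def morans_i(values, weights):
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    w_sum = sum(sum(row) for row in weights)
    return (n / w_sum) * (num / den)

values = [10, 10, 2, 2]      # four cells in a row: two high, then two low
weights = [                  # binary adjacency for the chain 0-1-2-3
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(round(morans_i(values, weights), 4))  # 0.3333, a clustered pattern
```

The local (Anselin) version decomposes this into one statistic per feature, which is what lets the tool label individual tracts High-High or Low-Low.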

For the Lab, the homicides feature class and census tract data were spatially joined. A field calculating the number of homicides per 1,000 units was added. This feature in turn was input into the Cluster and Outlier Analysis (Anselin Local Moran's I) Spatial Statistics tool to output a new feature class based upon Local Moran's I. The result includes attribute data revealing two types of clusters: High-High (HH), representing clusters of high values, and Low-Low (LL), representing clusters of low values.

High-high clusters in the context of the Chicago crime data represent areas with high homicide values in close proximity to other areas with high homicide values. These are the crime hotspots:

Crime hotspots derived from 2017 homicide data for Chicago using the Local Moran's I method
Sources:

Ratcliffe, J. (2010). Crime Mapping: Spatial and Temporal Changes. In Handbook of Quantitative Criminology (pp. 5-8). Springer, New York, NY.

Eck, J.E., Chainey, S., Cameron, J.G., Leitner, M., & Wilson, R.E. (2005). Mapping Crime: Understanding Hot Spots. National Institute of Justice (NIJ) Special Report.

Saturday, April 20, 2024

Isarithmic Mapping - Washington State Precipitation

The semester is accelerating and we move into the 6th lab, covering Isarithmic Mapping! Following choropleth mapping, this thematic map type is the second most widely used in cartography. Isarithmic maps consider geographic phenomena to be continuous and smooth, with measurements in the area of interest presumed to change gradually between data point locations instead of abruptly. There are two primary types of isarithmic mapping.

Often associated with meteorology, isometric maps depict smooth, continuous phenomena, such as temperatures, rainfall, barometric pressure and wind velocity, derived from data occurring at true points where values are actually measured at that location. The most common form of isometric map is the contour map, in which lines mark equal values across a geographical area.

Collectively, the contours used in isometric maps can be referred to as isolines. Iso, from Greek, means equal or the same. Variations of isoline terminology include isobars for lines of equal barometric pressure, isotherms for lines of equal temperature and isodrosotherms for lines of equal dew point.

Isopleth maps are compiled from data that occur over geographic areas using conceptual points, where values are presumed to exist at point locations. Isopleth maps show variations in the quantity of features as a surface. The volume can be represented using contour lines or by filled contours with color shading representing quantitative values. Data for isopleth maps must be standardized to account for the area in which the data was collected.

Various interpolation methods on raster data sets are employed in the creation of isopleth maps. These methods generate data values over a given area using samples measured at control points. An algorithm in turn processes the data to predict the values of unknown points on an isopleth map. Values between the control points are predicted under the premise that spatially distributed objects are spatially correlated. Also referred to as the concept of spatial autocorrelation, this basis of interpolation assumes that locations close together are more likely to share similar characteristics than those located farther apart.
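The simplest of these methods, inverse distance weighting, is easy to sketch in Python: the estimate at an unknown location is a weighted average of the control points, with nearer points weighted more heavily. The stations and the power-2 weighting below are illustrative, not PRISM's regression-based variant:

```python
# Sketch of inverse distance weighted (IDW) interpolation: the value at an
# unknown location is a distance-weighted average of sampled control points.
def idw(x, y, stations, power=2):
    num = den = 0.0
    for sx, sy, value in stations:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0:
            return value              # exactly at a station
        w = 1.0 / d2 ** (power / 2)   # weight decays with distance^power
        num += w * value
        den += w
    return num / den

stations = [(0, 0, 10.0), (2, 0, 30.0)]   # (x, y, precipitation) samples
print(idw(1, 0, stations))                # midpoint: equal weights -> 20.0
```

Raising the power makes the surface hug the nearest station more tightly; power 2 is a common default.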

The focus of lab this week is the creation of an isopleth map showing the average annual precipitation for a 30 year period across the state of Washington. The provided dataset was derived using PRISM, an inverse distance weighted (IDW) interpolation method developed at Oregon State University.

Washington Precipitation map using Hypsometric Tinting

The Parameter-elevation Regressions on Independent Slopes Model (PRISM) stresses elevation as the most important aspect in a localized region for the distribution of climate variables such as rainfall, temperature and dew point. The model calculates a climate-elevation relationship for each cell of a raster data set based upon data from nearby weather stations. The regression function used with the IDW method weights station data points to incorporate a wide range of physiographic variables that have a direct correlation with precipitation amounts and other climatological aspects.

Two types of isarithmic maps were created in Lab 6. The first was a continuous tone map, where geographic surfaces represent the values that exist across an entire area. Data collected at sample points, whether mapping the density of points or the values they represent, factor into the interpolation that generates the continuous surface. This method portrays a more fluid appearance where data values in a raster set gradually transition from cell to cell.

The second was hypsometric tinting, which reminds me of the Futurama character the Hypnotoad, and which classifies data into bands. These bands color different ranges of values to enhance changes, such as in elevation with a Digital Elevation Model (DEM).

Using contours, hypsometric tint separates raster data into bands with uniform data values. These bands can represent a single value, or a range of values with lower and upper limits. An advantage of hypsometric tint is that changes in data are more clearly visualized over the smooth transitions of a continuous tone map. A drawback is that local variation of data values is lost with the generalization between contours.
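The banding itself is just classification of a continuous value against a set of upper limits. A tiny Python sketch; the break values and color names are illustrative, not the lab's symbology:

```python
# Sketch of hypsometric tinting: classify continuous raster values into
# discrete bands, each band mapped to one color.
BANDS = [
    (20, "light yellow"),
    (60, "green"),
    (120, "blue"),
    (float("inf"), "purple"),
]

def tint(value):
    """Return the color of the first band whose upper limit contains value."""
    for upper, color in BANDS:
        if value <= upper:
            return color

print([tint(v) for v in (5, 45, 200)])  # ['light yellow', 'green', 'purple']
```

Every value inside a band renders identically, which is exactly the generalization trade-off described above: clearer class boundaries, lost local variation.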

The hypsometric tint map of Washington precipitation projected in State Plane coordinates.

Reprojecting the Washington precipitation data into State Plane coordinates, I ran through the lab again to create a second map showing Washington in a more aesthetically pleasing projection. This gave me both more practice with creating continuous tone and hypsometric tint maps and exposure to some of the difficulties of projecting data, as the hillshade values changed from using world statistics to local statistics.

PRISM

PRISM was initially developed in 1991. Enhancements over time garnered the interest of the USDA Natural Resources Conservation Service (NRCS), which sought improvements for updated digital precipitation maps. With funding support, PRISM precipitation maps were generated for the Pacific Northwest and Intermountain West region of the U.S., where topographic features made mapping precipitation complex.

State Climatologists evaluated the maps produced by PRISM, offering their own suggestions for improvements. Following two years of trial and error, they concurred that PRISM produced maps equaling or exceeding previous ones produced by hand. The result is that the NRCS utilized PRISM to map averages for temperature and precipitation nationwide for the period from 1961 to 1990.

Sources:

Daly, C., & Bryant, K. (n.d.). The PRISM Climate and Weather System – An Introduction. Oregon State University. Retrieved April 20, 2024, from https://www.prism.oregonstate.edu/documents/PRISM_history_jun2013.pdf

The Hypnotoad may or may not approve of hypsometric tint!


Monday, April 15, 2024

Hybrid Mapping - Choropleth and Graduated Symbols

Map showing population density vs wine consumption for European countries

Module 5 for Computer Cartography advances our understanding and usage of choropleth maps while introducing us to proportional and graduated symbol map types.

A choropleth map can be described as a statistical thematic map showing differences in quantitative area data (enumeration units) using color shading or patterns. Choropleth maps should not be used to map totals across unequally sized areas or unequally sized populations. Instead, data should be normalized using ratios, percentages or another comparison measure.

Proportional symbol maps show quantitative differences between mapped features. This is the appropriate map type designed for totals. The map type shows differences on an interval or ratio scale of measurement for numerical data. Symbols are scaled based upon the actual data value (magnitude) occurring at point locations instead of a classification or grouping.

Graduated symbol maps also show quantitative differences in data, but with features grouped into classes of similar values. Differences between features use an interval or ratio scale of measurement. The data classifications use a scheme that reflects the data distribution similar to a choropleth map. Previously discussed data classification methods, such as Equal Interval and Quantile, can be applied to generate classes.

Our lab for Module 5 was the creation of a map dually showing the population density of people per square kilometer and wine consumption at the rate of liter per capita for countries in Europe. A dual choropleth map will display population densities for the continent while a graduated or proportional symbol map will quantify wine consumption rates for each country.

The lab exercise tasks included the creation of both a proportional symbol map and a graduated symbol map of Europe. The ultimate map type used to portray the country data is partly based upon the anticipated ease of a map user to visually interpret the maps.

Generating a proportional symbol map in ArcGIS Pro is a more rigid process with fewer user options. The scale classifications are preset to five breaks partitioning data into ranges of 20%. However, the feature class labels are not clearly understood, as the range array is 1, 2.5, 5, 7.5 and 10. The minimum symbol size, set by the user, proportionally determines the size at the maximum value.

The raw and mostly unstylized output of the proportional symbol map, with arbitrary values showing the rank of countries in wine consumption from lowest to highest, while the symbol sizes convey the actual wine consumption rate in liters per capita:

Proportional Symbol Map of Europe

A graduated symbol map for this assignment provided more flexibility with various methods of classification, more easily understood class separations and automatically generated labels, the ability to adjust classes using Manual Breaks, and absolute control over setting symbol sizes. The final output:

Map showing population density vs wine consumption for European countries

An added aspect of this lab was the introduction of picture symbols, which can be used in place of the default ArcGIS symbol set. Picture symbols allow for more personalized customization to a map, as long as they appropriately distinguish between differences of data magnitude.

Using a blue color palette from the ColorBrewer website, I applied the Natural Breaks data classification method to generate the choropleth map of European countries by population density. The graduated symbol element of the map uses picture symbols that I created in Adobe Illustrator based off the Winery sign specifications used on Florida roads.

Picture Symbols Created for the European Wine Map

The winery icons incorporate a color scheme to aid in visually distinguishing the differences in data magnitude. The highest wine consumption rate equates to the largest symbol size where all grapes in the graphic are colored magenta. The next tier down in order reduces the symbol size by 15% and the proportion of graphics colored magenta versus those shaded green.

A series of three insets were created to better show detail on some of the smaller countries or groups of countries. These required some data exclusion so as not to conflict with data on the main map frame. Prior to creating the insets, I used the Polygon to Point geoprocessing tool to generate a separate point feature class for the graduated symbols. This provided me with the flexibility to relocate the placement of symbols in addition to the option of moving annotated text for the final layout.

The inset creation utilized a definition query with the SQL expression "not including values(s)", where wine consumption data for countries not to be displayed were omitted from the respective inset dataset. The annotation layer for the main map frame was also replicated for each inset to reduce conflict and speed up labeling time.

I chose the Garamond font to give a more elegant look to the final map, since wine is often equated with fine dining and culture. Additionally, the blue color palette was specifically selected so as not to clash with the color of the winery symbols.

Sunday, April 7, 2024

Thematic Mapping - Data Classification Methods

Module 4 for Computer Cartography contrasts 2010 Census Data for Miami-Dade County, Florida using multiple data classification methods. Our objective is to distribute quantitative data into thematic maps based upon two criteria. The first series of maps shows the percentage of the total population per Census Tract of the number of seniors aged 65 and older. The second map array uses normalized data to partition Census Tracts based upon the number of seniors per square mile.

When analyzing data distribution, it is important to understand that many geographical phenomena result in an array of values that can be represented by a bell-shaped curve. This is also referred to as "normal distribution." With normally distributed data, values farther from the mean are much less frequent than those occurring nearer the mean.

Data classification is a topic that I have limited experience with. This lab required me to do additional research beyond the lectures and the textbook Cartography to better understand the methods. Based upon online articles and the course material, the four data classification methods for this lab can be defined as follows.

Equal Interval

The Equal Interval data classification method creates equally sized data classes for a feature layer based upon the range between the lowest and highest values. The number of classes is determined by the user. A simple way to understand this: if a dataset had values ranging from 0 to 100, Equal Interval set to 4 classes would create four classes each spanning a range of 25.

Equal Interval data classification is optimal for continuous datasets where data occurs throughout a geographic area, such as elevation, precipitation or temperature. The method is easily understandable and its breaks can be computed manually. However, with unevenly distributed data, Equal Interval can result in classes with no data or classes holding substantially more records than others.
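The 0-to-100 example can be sketched in plain Python (the function name is my own, not an ArcGIS tool):

```python
def equal_interval_breaks(values, n_classes):
    """Upper boundary of each class, with every class spanning the same range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes + 1)]

# values from 0 to 100 with 4 classes -> a boundary every 25 units
print(equal_interval_breaks([0, 37, 58, 100], 4))  # [25.0, 50.0, 75.0, 100.0]
```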

Quantile

Similar to Equal Interval, the Quantile data classification method results in classes with an equal number of data values, but based instead upon the number of records in an attribute table. That is, for a feature layer with 100 records, Quantile classification with five classes partitions the data into classes of 20 records apiece.

Furthermore, identical records cannot be placed in separate classes, nor will empty data classes be created. The method can also place similar data values in different classes, or very different values in a single class. Adjusting the number of classes can improve upon this.

Quantile data classification is good at showing the relative position of data values, such as where the highest concentration of data is located. It depicts variability even when there is little within the data.
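The 100-records example can be sketched the same way (again plain Python rather than ArcGIS; the function name is my own):

```python
import math

def quantile_breaks(values, n_classes):
    """Upper boundary of each class so classes hold (nearly) equal record counts."""
    data = sorted(values)
    n = len(data)
    return [data[math.ceil(n * i / n_classes) - 1] for i in range(1, n_classes + 1)]

# 100 records split into 5 classes of 20 records apiece
print(quantile_breaks(list(range(1, 101)), 5))  # [20, 40, 60, 80, 100]
```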

Standard Deviation

Standard Deviation is the average amount of variability within a dataset from the data mean, or in simpler terms, how spread out the data values are. The Standard Deviation data classification method adds and subtracts the standard deviation from the dataset mean to generate classes. These indicate how far data values diverge from the mean.

A disadvantage to implementing the Standard Deviation method is that the data needs to be normally distributed. Normally distributed data has a symmetrical bell shape where the mean and median are equal, both located at the center of the distribution. The empirical rule for normal distribution indicates that 68% of the data is within 1 standard deviation of the mean, 95% is within 2 standard deviations of the mean and 99.7% is within 3 standard deviations of the mean.

For our lab, the mean of the data for the percentage of seniors within the overall Census Tract population is 14.26%. The standard deviation is 7.19, so 207 of the 519 tracts of Miami-Dade County have senior population rates between 17.85% and 25.04%. The class spanning standard deviations of -1.5 (-10.78%) to -0.5 (-3.59%) shows Census tracts where the senior population makes up between 3.49% and 10.67%, another 151 tracts of Miami-Dade County. Viewing a thematic map based upon standard deviation reveals where senior populations are near the average, juxtaposed with areas that fall below or above it.
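Given the lab's mean (14.26%) and standard deviation (7.19), those class boundaries fall out directly; this sketch just reproduces that arithmetic:

```python
mean_pct = 14.26  # mean percentage of seniors per tract (lab value)
sd_pct = 7.19     # standard deviation (lab value)

def sd_boundary(k):
    """Class boundary located k standard deviations from the mean."""
    return mean_pct + k * sd_pct

# the +0.5 to +1.5 SD class spans roughly 17.85% to 25.04%
lower, upper = sd_boundary(0.5), sd_boundary(1.5)
print(f"{lower:.2f}% to {upper:.2f}%")
```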

Miami-Dade Standard Deviation for the percent of seniors per Census Tract


Natural Breaks

The Natural Breaks data classification method groups data so that values within the same class differ minimally. Focusing on the natural gaps within a dataset, the differences between classes are maximized. The aim of Natural Breaks is to determine logical separation points so that naturally occurring clusters are identified.

Natural Breaks works well with unevenly distributed data where values are not skewed toward one end of the distribution. The method can still result in classes with a wide range of values; manually adjusting the break values can offset this or remove the gaps between classes.
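A minimal brute-force sketch of the idea (real implementations use the Jenks optimization algorithm; this exhaustive version only suits small datasets, and the function name is my own):

```python
from itertools import combinations

def natural_breaks(values, n_classes):
    """Pick class boundaries that minimize total within-class squared deviation."""
    data = sorted(values)
    n = len(data)

    def ssd(group):
        mean = sum(group) / len(group)
        return sum((v - mean) ** 2 for v in group)

    best_cost, best_breaks = float("inf"), None
    # try every set of n_classes-1 split positions between consecutive records
    for splits in combinations(range(1, n), n_classes - 1):
        bounds = (0,) + splits + (n,)
        cost = sum(ssd(data[bounds[i]:bounds[i + 1]]) for i in range(n_classes))
        if cost < best_cost:
            best_cost, best_breaks = cost, [data[b - 1] for b in bounds[1:]]
    return best_breaks

# two tight clusters plus an outlier: the breaks land on the natural gaps
print(natural_breaks([1, 2, 3, 10, 11, 12, 50], 3))  # [3, 12, 50]
```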

A solid grasp of these methods is needed to provide adequate data analysis. Admittedly, I will benefit from further work with creating maps using these data classification methods to better understand their utility.

The Module 4 lab assignment tasks us with assessing which of the classification methods best displays the data for an audience seeking to market to senior citizens. Further, the lab asks which criterion distributes the data more accurately: classifying the population by the percentage of seniors per tract, or using normalized data indicating the number of seniors per square mile.

The most accurate display of senior citizen population in Miami-Dade County, Florida is derived from the Natural Breaks data classification method. The thematic map clearly shows the urban areas that represent the highest concentration of the population aged 65 plus. The upper data classes are reserved for just 42 Census tracts while classes showing the mid-range population rate draw the most visual weight.

An audience targeting the senior citizen population may benefit from the Quantile data classification since it shifts the classification scale lower, with 441 seniors per square mile as the starting point for the 2nd class versus the 872 seniors per square mile that Natural Breaks generates. This might be a better distribution of the data from an audience standpoint.

Miami-Dade County Census Maps showing senior population by area

Having gained a better understanding of Standard Deviation while writing this blog post, I find that classification method adequately shows areas of Miami-Dade County where the senior population is below average. The thematic map generally matches the Quantile and Natural Breaks maps in displaying areas of typical and above-average senior population.

Which is preferable really depends upon the needs of the end user. A drawback to the Standard Deviation thematic map is that the color palette for below-average senior population tracts dominates the visual aesthetics.

The normalized data based upon the population of seniors per square mile offsets outliers generated by simply using the percentage of seniors per Census Tract. That is because the percentage of seniors per tract gives no indication of how many people that number represents. The tract with the highest percentage of seniors represents just 95 out of 120 people. Thematic maps for all four data classification methods showed that tract as having the highest concentration of seniors, despite its very rural population statistics:

Thematic maps showing Miami-Dade County Census data based upon the percentage of seniors