Tabular Data

There are a few accepted ways to organize data for storage, manipulation, and analysis. Observations that share the same fields or characteristics can be stored in a table format, represented by rows (observations) and columns (fields). Ideally the same types of information are collected for each observation so that direct comparisons can be made and patterns detected across multiple variables. For example, a table of housing characteristics may include the number of bedrooms, number of bathrooms, interior square footage, and lot size. In this case each row represents one housing unit, and each of the four columns represents one of the attributes mentioned. Tabular data is commonly rectangular in shape, with the same number of columns (fields) for every row (observation). Also referred to as a flat file, tabular data is a simple structure, although the number of rows and columns can be quite large.
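The housing example can be sketched in a few lines of Python. This is a minimal illustration only: the values and the field names (`bedrooms`, `bathrooms`, `interior_sqft`, `lot_sqft`) are made up for the example and do not come from any real dataset.

```python
# Hypothetical housing table: each dict is one row (a housing unit),
# each key is one column (a field). All values are invented.
housing = [
    {"bedrooms": 3, "bathrooms": 2.0, "interior_sqft": 1850, "lot_sqft": 7200},
    {"bedrooms": 2, "bathrooms": 1.0, "interior_sqft": 1100, "lot_sqft": 5000},
    {"bedrooms": 4, "bathrooms": 2.5, "interior_sqft": 2400, "lot_sqft": 9600},
]


def is_rectangular(rows):
    """Return True when every row has exactly the same set of fields."""
    if not rows:
        return True
    fields = set(rows[0])
    return all(set(row) == fields for row in rows)


def column(rows, field):
    """Extract one column (field) across all rows (observations)."""
    return [row[field] for row in rows]


rectangular = is_rectangular(housing)
mean_bedrooms = sum(column(housing, "bedrooms")) / len(housing)
```

Because every row carries the same fields, a column can be pulled out and summarized directly, which is what makes comparison across observations straightforward.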

Relational data structures join together or relate multiple flat files when each row or observation in one flat file may have multiple associated records in an external data file. An example is property tax payments associated with a single house address (Figure 4.2). The primary table identifies individual houses by address, and an accompanying table holds the annual tax statements (or transactions). The two tables relate to each other by address, parcel identification number, or another shared identifier. This can be a more efficient structure for managing different types of data, such as ownership information (single records) and associated transactions (multiple records), than repeating a significant amount of redundant information in a single file.

Figure 4.2 Flat File and Relational Database Structure
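A rough sketch of the one-to-many relationship described above, using plain Python structures. The records, the `parcel_id` key, and the helper `tax_history` are hypothetical and chosen only to illustrate the idea; a real system would typically use a database or a data-frame library.

```python
from collections import defaultdict

# Primary table: one record per house (invented data).
houses = [
    {"parcel_id": "P-001", "address": "12 Elm St", "owner": "A. Rivera"},
    {"parcel_id": "P-002", "address": "48 Oak Ave", "owner": "B. Chen"},
]

# Related table: zero or more tax statements per house (invented data).
tax_statements = [
    {"parcel_id": "P-001", "year": 2021, "amount": 3100.0},
    {"parcel_id": "P-001", "year": 2022, "amount": 3250.0},
    {"parcel_id": "P-002", "year": 2022, "amount": 2800.0},
]

# Index the statements by the shared identifier so lookups are cheap.
statements_by_parcel = defaultdict(list)
for statement in tax_statements:
    statements_by_parcel[statement["parcel_id"]].append(statement)


def tax_history(parcel_id):
    """Return all statements for one parcel, ordered by year."""
    return sorted(statements_by_parcel.get(parcel_id, []), key=lambda s: s["year"])


# Relate the two tables: total tax paid per address.
total_paid = {
    house["address"]: sum(s["amount"] for s in tax_history(house["parcel_id"]))
    for house in houses
}
```

Ownership details are stored once per house, while the statements table grows by one row per transaction, so nothing about the owner has to be repeated on every payment.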

Qualitative Data

Qualitative data represent characteristics that are not easily measured or quantified. Without a precise scale or dimensionality, a degree of subjectivity may be involved. Housing characteristics such as the number of bedrooms, the number of bathrooms, and square footage have quantitative values, but house condition (e.g., excellent, good, fair, poor) or resident satisfaction (e.g., high, medium, low) do not have accepted measurement scales or criteria. Self-reported qualitative data like condition or satisfaction may also be inconsistent because they are based on personal opinion rather than uniform criteria. Qualitative data can be useful when underlying variables or values are unknown; however, analysts should take care that these data are collected and interpreted in ways that meet the intended purpose of the study and can be analyzed with appropriate methods (Dandekar, 2003). This is especially true in cases where qualitative and quantitative data are combined within a particular analysis, such as from a survey or interviews.

Structured and Unstructured Data

Data can also be characterized as being structured or unstructured. The previous examples, such as tabular data with rows, columns, and fields of information, are traditional, structured formats. Unstructured data, such as text from interviews, emails, and other text-based social media, have no predefined fields. The methods for accessing and analyzing structured versus unstructured data are quite different. Statistical methods are used for both, with predetermined fields or variables used for structured data. Because unstructured data lack distinct variables, they are processed to identify latent variables and underlying patterns. This includes the use of machine learning (ML) methods such as natural language processing (NLP). Text-based data from public comments may be of particular use to planners during the community involvement phases of a project.
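As a very small illustration of turning free text into something countable, the sketch below tallies word frequencies across a few public comments. It is far simpler than the NLP methods mentioned above and is not a substitute for them. The comments, the stop-word list, and the `tokenize` helper are all invented for the example.

```python
import re
from collections import Counter

# Hypothetical public comments (invented for illustration).
comments = [
    "Traffic on Main Street is unsafe for bikes.",
    "We need more bike lanes and safer crossings.",
    "Parking is limited, but bike lanes matter more.",
]

# A tiny, ad hoc stop-word list; real analyses use much larger ones.
STOPWORDS = {"on", "is", "for", "we", "and", "but", "more", "need"}


def tokenize(text):
    """Lowercase the text, keep alphabetic words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


# Count how often each remaining term appears across all comments.
term_counts = Counter(token for comment in comments for token in tokenize(comment))
top_terms = term_counts.most_common(2)
```

The text has no fields to begin with, so the recurring terms that emerge from the counts act as candidate variables. Note that this naive approach treats "bike" and "bikes" as different terms, which is one of the problems more capable NLP tooling addresses.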

Spatial Data

Spatial data store locational characteristics along with geographic references (i.e., X and Y coordinates) in either vector or raster formats (Figure 4.3). Vector features can be zero- to three-dimensional: a single X,Y pair indicates a specific point location (no width, depth, or height, and therefore 0-dimensional), two X,Y pairs identify a line segment (or edge), several X,Y pairs (2-dimensional) identify an enclosed area, and several X, Y, and Z coordinate sets (3-dimensional) identify a volume or mass. Querying and analyzing these data involve both locational and attribute criteria, to answer questions that combine both what and where. Patterns can be detected when locations also correlate with particular feature characteristics. Clusters are one example of patterns that can be detected using spatial statistics and indices. The scale of observation plays an important role because a grouping or cluster may only be distinguishable from a certain distance or range. Spatial data can represent a wide range of physical and non-physical features such as the location of a landmark (point data), road networks (line data), or jurisdictional boundaries (polygon data).

Figure 4.3 Spatial Feature Types
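The vector feature types can be sketched with plain coordinate tuples. The coordinates, feature names, and helper functions below are hypothetical and assume a flat (planar) coordinate system. Real GIS software also handles map projections and much more.

```python
import math

# Point: a single X,Y pair. Line: ordered X,Y pairs. Polygon: a closed ring.
landmark = (2.0, 3.0)
road = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
parcel = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]


def line_length(coords):
    """Sum the straight-line distances between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def polygon_area(ring):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Invented point features that carry attributes as well as a location.
landmarks = [
    {"name": "Library", "kind": "civic", "xy": (1.0, 1.0)},
    {"name": "Cafe", "kind": "retail", "xy": (2.0, 2.0)},
    {"name": "Town Hall", "kind": "civic", "xy": (9.0, 9.0)},
]


def find_features(features, kind, center, radius):
    """Combine 'what' (attribute) and 'where' (distance) in one query."""
    return [
        f["name"]
        for f in features
        if f["kind"] == kind and math.dist(f["xy"], center) <= radius
    ]


road_length = line_length(road)
parcel_area = polygon_area(parcel)
civic_nearby = find_features(landmarks, "civic", (0.0, 0.0), 3.0)
```

The last query filters on an attribute (`kind`) and a location (distance from a center) at the same time, which is the "what and where" combination described above.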

Raster data are significantly different from vector geographic/spatial features. A raster stores data in a matrix format composed of rows and columns. The cells or pixels are geographically referenced and can vary in size depending on the resolution or granularity of the data. Aerial photographs and satellite imagery are common sources of raster data, where cell colors or values represent locational characteristics. While objects such as roads, buildings, and water bodies are visible in a raster, they cannot be identified as such because rasters lack the topology of vector data. Additional processing, however, can convert data from raster to vector and vice versa.
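A minimal sketch of a georeferenced raster and a deliberately simplified raster-to-vector step follows. The grid values, origin, and cell size are assumptions made up for the example. Real conversions usually trace polygon boundaries instead of emitting cell-center points as done here.

```python
# Hypothetical 3x3 raster; 1 marks cells classified as water, 0 as land.
raster = [
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
]

# Assumed georeferencing: coordinates of the upper-left corner and cell size.
ORIGIN_X, ORIGIN_Y = 100.0, 200.0
CELL_SIZE = 10.0


def cell_center(row, col):
    """Map a (row, col) index to the X,Y coordinate of the cell center."""
    x = ORIGIN_X + (col + 0.5) * CELL_SIZE
    y = ORIGIN_Y - (row + 0.5) * CELL_SIZE  # row index grows downward
    return (x, y)


def cells_to_points(grid, value):
    """Simplified raster-to-vector: one point per cell holding `value`."""
    return [
        cell_center(r, c)
        for r, row in enumerate(grid)
        for c, v in enumerate(row)
        if v == value
    ]


water_points = cells_to_points(raster, 1)
```

The raster itself only knows cell values. It is the extra processing step that produces discrete features which can then be treated as vector data.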
