Engaging in Exploratory Data Analysis, Visualization, and Hypothesis Testing
- 1. Explore trends in the development and use of data visualization methods.
- 2. Understand how to create and interpret graphical summaries.
- 3. Understand the uses and applications of hypothesis testing.
- 4. Learn how to compute and interpret tests of independent sample means.
- 5. Learn how to compute and interpret chi-square tests.
In Chapters 2 and 3, we suggested that a good place to start with data analysis is to compute the descriptive measures that summarize the data distribution. In Chapter 3, we covered the statistical summaries that best describe the center, spread, shape, and relative position of the observations, along with the optional measures that apply to spatial data. In a similar fashion, we now explore graphical summaries and data visualization methods as complementary tools in data exploration and analysis. As the familiar adage goes, "a picture is worth more than a thousand words," so becoming adept in the growing field of data visualization will significantly enrich our analytical skills. These tools enable us to explore and visualize data in ways that help us discern information that would not be readily apparent from conventional statistical tools alone. Data visualization methods are integral to what we might call "value-added" statistics: they let us take large amounts of diverse data and analyze, synthesize, and graphically display meaningful information, with the expectation of constructing and conveying new knowledge for use in decision-making. These methods are effective in exploring differences between phenomena, identifying expected as well as unexpected patterns, detecting clusters, revealing new relationships, and more. The many approaches to data visualization draw on several areas, including spatial data mining, machine learning, geographic information systems (GISs), and cognitive science, and have applications in several domains.
For example, we can use visualization tools and methods to simulate real-world environments in which users can test different scenarios and gain practical experience; provide exploratory functions; represent two-dimensional (2D) and three-dimensional (3D) environments; show spatial relationships; model different scenarios, such as an urban environment; integrate real-time applications (such as wearable computers) with virtual environments and deliver timely information and updates; support landscape viewing and drafting; engage the human visual system; and support the formulation of study hypotheses. The visualization community has also focused on developing visualization algorithms, tools, methods, and strategies, such as social network analysis, which is currently used for visualizing online social networks (Hoff et al. 2002; Heer and Boyd 2005; Perer and Shneiderman 2006; Luo et al. 2011; Luo and MacEachren 2014).
Given the cognitive and inherently subjective nature of synthesizing and interpreting graphical displays, it is often best to validate visual findings through hypothesis testing. There are also times when the results of hypothesis testing and statistical validation are best depicted through plots, charts, graphs, and maps that communicate the findings to the intended audience. As such, the processes of data exploration and visualization are closely aligned with hypothesis testing methods, a linkage that forms an integral part of spatial analysis and one that is clearly recognized and valued by geographers. Our plan in this chapter is therefore twofold. First, we explore the emerging field of data visualization and the contributory role of cartography and GISs in the development of these tools. This discussion is accompanied by examples of how standard plots are derived and how the resulting images are interpreted. The second half of the chapter is devoted to the key steps in hypothesis testing. Here our focus is on Student's t-test and the chi-square (χ²) statistic, which are among the most commonly used significance tests. The examples presented in the chapter are foundational, with the primary goal of introducing the reader to the core concepts and tools in data visualization and hypothesis testing. Thereafter, in subsequent chapters of the book, we share examples that entail the use of more advanced visualization tools and statistical validation methods.
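As a preview of the two significance tests named above, the following is a minimal Python sketch of how the test statistics are formed. It is not the chapter's own worked example: the function names, the equal-variance (pooled) form of the t statistic, and the sample values are all illustrative assumptions. It computes only the statistics and their degrees of freedom; judging significance still requires comparing them against the t and chi-square distributions.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Student's t statistic for two independent samples.

    Assumes equal population variances (pooled form). Returns (t, df).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows).

    Returns (chi2, df), with df = (rows - 1) * (columns - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Made-up illustrative values, not data from the chapter.
urban = [12.0, 15.0, 11.0, 14.0, 13.0]
rural = [9.0, 10.0, 8.0, 11.0, 12.0]
t_stat, t_df = pooled_t_statistic(urban, rural)

counts = [[10, 20], [20, 10]]
chi2_stat, chi2_df = chi_square_statistic(counts)
```

For these made-up values, the t statistic is 3.0 with 8 degrees of freedom and the chi-square statistic is about 6.67 with 1 degree of freedom. In both cases a larger statistic indicates a greater departure from the null hypothesis of no difference or no association.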
Exploratory Data Analysis, Geovisualization, and Data Visualization Methods
Data visualization, geovisualization, visual analytics, and exploratory data analysis (EDA) are all part of a growing domain of data-rich analytical, graphical, and interactive methods that are now available for screening, exploring, and synthesizing information. In the era of big data, these approaches are increasingly capable of converting diverse, dynamic, and complex forms of data into valuable information, and presenting this information in a comprehensible format that is beneficial to end users. A survey of the emerging literature on EDA, geovisualization, and data visualization may lead one to believe that these are three disparate fields with different end goals. However, scrutiny of the embedded tools and applications reveals many similarities in the core goals and objectives. These include the following: (1) data representation, (2) feature exploration and identification, (3) pattern recognition, (4) human-computer interaction, (5) knowledge construction and storage, and (6) effective communication and transmission of knowledge. These commonalities are elaborated on in the ensuing sections.