Spatial data analysis. An introduction to spatial autocorrelation and spatial regression analysis
Many research questions require analysis of complex patterns of interrelated social, behavioral, economic and environmental phenomena. In addressing these questions, it is increasingly argued that both spatial thinking and spatial analytical perspectives have an important role to play. Indeed, research on social stratification and inequality, health, mortality, fertility and many other issues depends on the collection and analysis of individual-level and context-level data.
The geospatial and methodological development environment has changed. The volume, sources and forms of available geospatial data are growing rapidly. The flow of information from a host of sensors has grown exponentially in recent years to the point that many observations can be geo-referenced. Data storage and handling (e.g. cloud computing) change what, how and when we collect data on individuals and their environments.
In a world where information is increasingly seen through geographic filters, spatial thinking is becoming more important. More and more instances show that space and place are important elements and stress the leverage of place-based politics. For example, conventional approaches in health research underestimate the contribution of place to disease risk. Several studies reinforce the view that neighborhood context is an important condition of human wellbeing. Place emerges as an important contextual framework for considering a number of critical societal issues. Place as a social context is deeply connected to larger patterns of social advantage and disadvantage.
Since the mid-1990s, there has been renewed interest in the much earlier tradition of spatial demography that focuses on areal aggregates as units of analysis.
When analyzing spatial data from a large number of units (e.g. counties), researchers are naturally inclined to move from simple descriptive analysis to questions such as: How might these data be modeled? How well can we account for variability in attribute values among geographic units?
To answer these questions, analysts have turned to multivariate regression modeling, the common methodology in the social sciences. However, applying the standard regression approach to data tied to spatial units brings special complications because “spatial is special”. Attention has been drawn to the fact that spatial data require special analytic approaches.
Two properties are particularly important in the analysis of spatial data. The first, spatial dependence, refers to the tendency for spatial data to exhibit spatial autocorrelation. For most social phenomena mapped in space, local proximity usually results in value similarity. High values tend to be located near other high values, while low values tend to be located near other low values, thus exhibiting positive spatial autocorrelation. Less often, high values may tend to be co-located with low values (or vice versa), as islands of dissimilarity (negative spatial autocorrelation).
In either case, the units of analysis in spatial demography likely fail a key assumption of classical statistics: independence among observations. With respect to statistical analysis that presumes such independence (e.g. standard regression analysis), positive autocorrelation means that the spatially autocorrelated observations bring less information to the model estimation process than would the same number of independent observations. The greater the extent of spatial autocorrelation, the more severe the information loss.
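The contrast between positive and negative spatial autocorrelation can be made concrete with a small numerical sketch. The code below uses global Moran's I, one widely used summary measure, and is only an illustration under stated assumptions: the six areal units arranged in a row, the binary "shares a border" weights and the attribute values are all hypothetical.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for one attribute and a spatial weights matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                      # deviations from the mean
    n = x.size
    s0 = w.sum()                          # sum of all spatial weights
    return (n / s0) * (z @ w @ z) / (z @ z)

# Hypothetical study area: six units in a row; adjacent units are neighbors.
n_units = 6
w = np.zeros((n_units, n_units))
for i in range(n_units - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0

clustered = [1, 1, 1, 9, 9, 9]            # high values next to high, low next to low
alternating = [1, 9, 1, 9, 1, 9]          # high values next to low values

i_clustered = morans_i(clustered, w)      # positive (0.6 for this layout)
i_alternating = morans_i(alternating, w)  # negative (-1.0 for this layout)
print(i_clustered, i_alternating)
```

A value well above zero signals clustering of similar values, and a value well below zero signals the "islands of dissimilarity" pattern. Judging significance would require a reference distribution (for example, random permutations of the values across units), which this sketch omits.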
A quick explanation for the presence of spatial autocorrelation can be found in the oft-cited “first law of geography” enunciated by Tobler in 1970: “Everything is related to everything else, but near things are more related than distant things” (Tobler, 1970: 236). Tobler’s first law is somewhat unsatisfying because it doesn’t tell us why this phenomenon arises in practice. The answer to this question can only be approximated with models of the spatial process and the analyst’s theory about the process.
The second property, spatial heterogeneity, refers to the tendency for phenomena distributed in space to be statistically nonstationary (a lack of stability across space of one or more attribute values). Spatial heterogeneity confounds attempts to generalize because the results of an analysis of a limited area can change when the boundaries of the area are shifted.
One of the more recent and fascinating developments in the design of local statistics is the theoretical background and associated software to explore how regression parameters and regression model performance vary across a study region.
Geographically weighted regression (GWR) is similar to a global regression model in that the familiar constant, regression coefficients and error term are all present within the regression specification. There are two ways in which GWR differs from standard (global) regression. First is the fact that a separate regression is carried out at each location (observation) using only the other observations that lie within a user-specified distance from that location. Second, the regression specification includes a statistical device which weights the attributes of nearby geographical units more highly than it does the attributes of distant geographical units. The result is a set of local regression parameters for each geographical unit. The regression is thus localized.
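The two ingredients described above, a separate regression at each location and distance-based weights, can be sketched in a few lines. This is a minimal illustration, not the implementation used by any GWR software: the bisquare kernel, the function name `gwr_coefficients` and the synthetic data are assumptions made for the example, and the sketch keeps the focal observation in its own local regression with weight 1.

```python
import numpy as np

def gwr_coefficients(coords, x, y, bandwidth):
    """Local [intercept, slope(s)] at every location via weighted least squares."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    local = np.empty((len(y), design.shape[1]))
    for i, focal in enumerate(coords):
        dist = np.linalg.norm(coords - focal, axis=1)
        # Bisquare kernel (assumed): weights fall with distance and are
        # zero at or beyond the user-specified bandwidth.
        wts = np.where(dist < bandwidth, (1.0 - (dist / bandwidth) ** 2) ** 2, 0.0)
        root = np.sqrt(wts)
        # Weighted least squares by rescaling rows with sqrt(weight).
        local[i] = np.linalg.lstsq(design * root[:, None], y * root, rcond=None)[0]
    return local

# Hypothetical data: 20 units along a line; the true slope is 1 in the
# western half and 3 in the eastern half, with a constant intercept of 2.
idx = np.arange(20)
coords = np.column_stack([idx, np.zeros(20)])
x = (idx * 7) % 5 + 1.0
true_slope = np.where(idx < 10, 1.0, 3.0)
y = 2.0 + true_slope * x

betas = gwr_coefficients(coords, x, y, bandwidth=5.0)
print(betas[0], betas[-1])   # local estimates at the two ends of the study area
```

A single global regression on these data would report one compromise slope, whereas the local estimates recover a slope near 1 at the western end and near 3 at the eastern end. In practice the bandwidth is usually chosen by a criterion such as cross-validation rather than fixed by hand.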
A GWR approach to regression analysis is a highly useful exploratory device for understanding parameter heterogeneity in one’s data. The output of GWR enables the researcher to examine and map local parameter estimates and local regression diagnostics, thereby enabling assessment of the utility of the model for various portions of the larger study region.
In the first part of this guide, we provide a general introduction to performing spatial autocorrelation and spatial regression analysis. In the second part, we model spatial data with geographically weighted regression to explain local variations in relationships.
CONTENTS
Part 1
An introduction to spatial autocorrelation and spatial regression with GeoDa
Manipulating data
Mapping and exploratory data analysis
Spatial autocorrelation
Spatial regression
Part 2
Analyzing spatial heterogeneity with geographically weighted regression