Geodemographic classification of local communities in Belgium : The construction of an index of social deprivation


In this post we calculate a deprivation index for Belgian municipalities based on census data. Deprivation can be defined as a state of observable and demonstrable disadvantage relative to the local community or the wider society to which an individual, family of group belongs. A deprivation index provides a single estimate of the level of deprivation for each location within the study area. The index is a latent construct calculated using a combination of indicator variables.

  1. Data and method

We create a deprivation index for each municipality in Belgium (N = 589) based on a principle components analysis (PCA) on a set of 19 socio-demographic indicators and measures of socio-economic status (the data used hereafter can be downloaded from my github). These 19 variables were selected from a set of 30 variables made available in the census of 2011 and complement each other based on a correlation of .25 or higher (see correlation matrix below)*. Since the purpose is to identify multiple deprivation, the collinearity between indicators may not be an issue.

Rcode

# geodemographic census data
setwd("xxxxx")
geodemo <- read.csv("geodemo.csv",sep=";",header=TRUE)
str(geodemo)
# missing values
sapply(geodemo, function(x) sum(is.na(x)))
# meansubstitution missing values
for(i in 1:ncol(geodemo[,4:33])) {
geodemo[,4:33][is.na(geodemo[,4:33][,i]),i] <- median(geodemo[,4:33][,i],na.rm=TRUE)
}
sapply(geodemo, function(x) sum(is.na(x)))
# correlations between variables
library(ggplot2)
library(reshape2)
library(scales)
co <- melt(cor(geodemo[,4:33]))
names(co) <- c("Var1","Var2","corr.coef")
qplot(x=Var1,y=Var2,data=co,fill=corr.coef,geom="tile") +
scale_fill_gradient2(limits=c(-1,1)) +
theme(axis.text.x=element_text(angle=90,hjust=1)) +
labs(title="Heatmap of Correlation Matrix",x=NULL,y=NULL)
qplot(x=Var1,y=Var2,data=melt(cor(geodemo[,4:33],use="p")),fill=value,geom="tile") +
scale_fill_gradient2(limits=c(-1,1)) +
theme(axis.text.x=element_text(angle=90,hjust=1)) +
labs(title="Heatmap of Correlation Matrix",x=NULL,y=NULL)
# order rows and columns according to hierarchical clustering
corr_mat <- cor(geodemo[,4:33],method="s")
corr_mat[1:5,1:5]
ord <- hclust(1-as.dist(corr_mat))$order
co <- melt(corr_mat[ord,ord])
ggplot(co, aes(Var1, Var2)) +
geom_tile(aes(fill = value)) +
geom_text(aes(fill = co$value, label = round(co$value, 2))) +
scale_fill_gradient2(low = muted("darkred"),
mid = "white",
high = muted("midnightblue"),
midpoint = 0) +
theme(panel.grid.major.x=element_blank(),
panel.grid.minor.x=element_blank(),
panel.grid.major.y=element_blank(),
panel.grid.minor.y=element_blank(),
panel.background=element_rect(fill="white"),
axis.text.x = element_text(angle=90, hjust = 1,vjust=1,size = 8,face = "bold"),
plot.title = element_text(size=12,face="bold"),
axis.text.y = element_text(size = 8,face = "bold")) +
ggtitle("Correlation plot census variables (grouped according to hierarchical clustering)") +
theme(legend.title=element_text(face="bold", size=8)) +
scale_x_discrete(name="") +
scale_y_discrete(name="") +
labs(fill="Corr. Coef.")

2. The construction of an index of social deprivation

Rcode

# PCA 19 census variables
selection <- subset(geodemo,select=c("income_index","couples_without_children","married",
"employment_rate_1564","active_pop","couples_with_children",
"average_HHsize","owner_buildings",
"noneuro_foreigners","ratio_FM","unemployed", "poverty_S",
"minimum_sub_income","one_parent_HH","moved_in","one_person_HH",
"criminality_index","euro_foreigners","foreigners"))
pca <- selection %>%
na.omit() %>%
ungroup() %>%
prcomp(center=TRUE,scale=TRUE)

tibble(sd = pca$sdev,
pc = 1:length(sd)) %>%
mutate(cumvar = cumsum((sd^2)/sum(sd^2))) %>%
ggplot(aes(pc,cumvar))+geom_line(linetype="dashed",col="blue")+geom_point(size=4,shape=20,col="red") +
labs(x="Principal Component",y="Cummulative Proportion of Variance Explained") +
ylim(0.25,1.0) +
ggtitle("Variance explained by deprivation indices")

The first component from the PCA explains more than 60% of the total variance present and is used to construct the deprivation index.

Rcode

libary(ellipsis)
dep_weights <- pca$rotation %>%
as_tibble() %>%
mutate(measure=row.names(pca$rotation)) %>%
gather(component, weight, -measure)
# plot loading weights
dep_weights %>%
filter(component == "PC1") %>%
ggplot(aes(measure, weight,fill=measure)) +
geom_bar(stat='identity') +
coord_flip() +
labs(title='Weights of selected census variables on first principal component') +
xlab(' ') +
theme(legend.position = "none")

The weights of the variables on the first component show that each of them is associated with deprivation in the expected direction.

For example, unemployment and poverty are associated with higher deprivation (deprivation supporting factors, deprivation enhancement) while higher income levels and higher employment rates are associated with lower deprivation (deprivation inhibiting factors, deprivation inhibition).

We normalize the scores on the first component to create a deprivation index ranging from 0 to 1, with a higher score indicative of more deprivation.

Rcode

# take the first pc and normalize to [0,1]
dep_index <- pca$x %>%
as_tibble() %>%
select(dep_index = PC1) %>%
mutate(dep_index = (dep_index - min(dep_index)) / diff(range(dep_index))) %>%
mutate(ID = geodemo %>% na.omit() %>% pull(ID))
# visualize univariate distribution of dep_index
dep_index %>%
ggplot(aes(dep_index)) +
geom_density(fill='lightgrey') +
labs(title='Distribution of deprivation index for all municipalities') +
xlab('deprivation index') +
geom_vline(aes(xintercept=0.24),linetype="dashed",color=c("blue")) +
geom_text(aes(x=0.24,y=1,label="mean",hjust=1.1))

Rcode

selection <- selection %>%
mutate(ID = geodemo %>% na.omit() %>% pull(ID))
d <- left_join(dep_index,selection,by = "ID")
# pairs plot including indices
d %>%
ungroup() %>%
select(-ID) %>%
gather(measure, value, -dep_index) %>%
ggplot(aes(dep_index, value,col=measure)) +
geom_point(alpha=0.5) +
facet_wrap(~ measure, scales='free') +
labs(title = 'Relationship of deprivation index with census variables') +
xlab('deprivation index') + ylab('') +
theme(legend.position="none")

3. Mapping deprivation in Belgium

To visually represent the deprivation scores of Belgian municipalities, we use library tmap which offers the possibility to perform a kmeans cluster analysis on the calculated deprivation scores**.

Rcode

library(rgdal)
# shapefile municipalities Belgium
munic <- readOGR(".","BEL_adm4")
munic <- merge(munic,d,by.x="ID_4",by.y="ID")
adminbel <- subset(geodemo,select=c("ID","admin"))
adminbel$admin <- as.character(adminbel$admin)
munic <- merge(munic,adminbel,by.x="ID_4",by.y="ID")
library(tmap)
var <- "dep_index"
tm_shape(munic) +
tm_fill(var, style="kmeans",palette="viridis",legend.hist=TRUE) +
tm_layout(legend.outside=TRUE,
main.title = "Deprivation index in Belgian municipalities, style=kmeans",title.size=0.75)

From the map it is clear that the distribution of social deprivation in Belgium shows a clear north-south divide, with substantially higher deprivation scores in the southern part of Belgium. Deprivation scores are highest in densely urban areas.

We further visualized the scores of municipalities on the variables used to construct a deprivation index with UMAP. Like tSNE, UMAP (Uniform Manifold Approximation and Projection) is a dimensionality reduction technique. In the plot below, we colour the municipalities with the cluster to which they belong. Municipalities with low to moderate deprivation scores are situated in the upper part of the plot while municipalities with a high deprivation score are situated in the lower right quadrant of the plot.

Rcode

data <- munic@data
library(umap)
umapdata <- subset(data, select=c("income_index","couples_without_children","married",
"employment_rate_1564","active_pop","couples_with_children",
"average_HHsize","owner_buildings",
"noneuro_foreigners","ratio_FM","unemployed", "poverty_S",
"minimum_sub_income","one_parent_HH","moved_in","one_person_HH",
"criminality_index","euro_foreigners","foreigners"))
viz.umap <- umap(umapdata)
umap.labels <- data[, "class"]
plot.umap <- function(x, labels,
main="A UMAP visualization of census variables",
colors=c("darkgreen","darkblue","orange","darkred","purple"),
pad=0.1, cex=0.6, pch=19, add=FALSE, legend.suffix="",
cex.main=1, cex.legend=0.85) {
layout = x
if (is(x, "umap")) {
layout = x$layout
}
xylim = range(layout)
xylim = xylim + ((xylim[2]-xylim[1])*pad)*c(-0.5, 0.5)
if (!add) {
par(mar=c(0.2,0.7,1.2,0.7), ps=10)
plot(xylim, xylim, type="n", axes=F, frame=F)
rect(xylim[1], xylim[1], xylim[2], xylim[2], border="#aaaaaa", lwd=0.25)
}
points(layout[,1], layout[,2], col=colors[as.integer(labels)],
cex=cex, pch=pch)
mtext(side=3, main, cex=cex.main)
labels.u = unique(labels)
legend.pos = "topleft"
legend.text = as.character(labels.u)
if (add) {
legend.pos = "bottomleft"
legend.text = paste(as.character(labels.u), legend.suffix)
}
legend(legend.pos, legend=legend.text, inset=0.03,
col=colors[as.integer(labels.u)],
bty="n", pch=pch, cex=cex.legend)
}
umapdata <- as.data.frame(viz.umap$layout)
plot(umapdata,t="n",main="UMAP visualization of census variables",
xlab="UMAP dimension 1",ylab="UMAP","cex.main"=1,"cex.lab"=0.8)
text(umapdata,labels=data$admin,cex=0.45,col=colors[data$class])
legend("bottomright",legend=c("0 - 0.139 (low)", "0.139 - 0.247", "0.247 - 0.390", "0.390 - 0.629", "0.629 - 1 (high)"),
pch=19,col=c("darkgreen","darkblue","orange","darkred","purple"),title="deprivation index", cex=0.75)

Municipalities with a low to moderate social deprivation index are situated in the upper half of the plot, while municipalities with the highest deprivation levels are found in the lower right quadrant of the plot.

Rcode

# ranking deprivation index for municipalities (20 lowest & 210 highest)
rank <- subset(try,select=c("admin","dep_index"))
rank <- na.omit(rank)
df <- rank[order(rank$dep_index), ]
df$row <- 1:nrow(df)
df1 <- subset(df,row <= 20)
df2 <- subset(df,row >= 570)
KA <- rbind(df1,df2)
KA$type <- ifelse(KA$dep_index < 0.1,"deprivation index lowest 20","deprivation index highest 20")
KA <- KA[order(-KA$dep_index), ]
library(ggplot2)
ggplot(KA,aes(x=reorder(admin,dep_index),y=dep_index)) +
geom_bar(stat="identity",aes(fill=type),width=.5) +
coord_flip() +
labs(fill="Deprivation index\n(ranking municipalities)") +
xlab ("municipality") +
ylab ("deprivation index")

* Source : www.census2011.be (for a description of the variables, see www.census2011.be/info/gloss2_nl.html). The data on poverty were sourced from P. Marissal, X. May & D.M. Lombillo, Pauvreté rural et urbaine. Stedelijke en plattelandsarmoede, Programme Agora, Rapport final, 2013, pp. 102-105).

** In the same way as we created a general deprivation score, we also applied PCA on deprivation inhibiting factors (8), resp. deprivation enhancing factors (11) and calculated a score ranging from 0 to 1. Deprivation scores for municipalities by combinations of deprivation inhibition and deprivation enhancement (quantile scores) were visualized using a conditional map developed with GeoDa software (www.geodacenter.github.io).


Previous
Previous

Evolutie van de kansarmoede in Vlaanderen, 2001-2016

Next
Next

Mapping the results of the French presidential elections 2nd round, may 2017