As you know, ColOpenData can be used to access both geospatial and demographic data from Colombia, in independent modules. However, we thought it would be helpful to present a module that incorporates a way to merge information between geospatial and demographic data. In this vignette you will learn how to use the function merge_geo_demographic().
Disclaimer: all data is loaded into the environment of the user’s R session, but is not downloaded to the user’s computer.
Geospatial and demographic data can be merged based on the spatial aggregation level (SAL). While geospatial data can be aggregated down to the block level, demographic data is typically available only at the department and municipality levels. Therefore, department and municipality are the only SALs available in both types of data, and the only levels at which they can be merged.
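To illustrate what merging on a shared spatial aggregation level means, here is a minimal sketch in base R using two toy tables keyed by a department code. The column names (dept_code, dept_name, people) and the counts are invented for illustration and are not the columns returned by ColOpenData; the real function also carries geometries, which this sketch leaves out.

```r
# Toy stand-in for a geospatial table (no geometries, hypothetical columns)
geo <- data.frame(
  dept_code = c("05", "17", "25"),
  dept_name = c("Antioquia", "Caldas", "Cundinamarca"),
  stringsAsFactors = FALSE
)

# Toy stand-in for a demographic table at the same level (made-up counts)
demo <- data.frame(
  dept_code = c("05", "17", "25"),
  people = c(120, 80, 200),
  stringsAsFactors = FALSE
)

# Both tables share the department level, so they can be joined on its code
merged <- merge(geo, demo, by = "dept_code")
print(merged)
```

A block-level geospatial table could not be joined this way, because the demographic side has no matching block-level key.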
The merge_geo_demographic() function takes the demographic dataset of interest as a parameter. Therefore, we should first consult the demographic documentation to decide which dataset we want to work with. Let’s suppose we want to select a dataset at the department level. We can list all available demographic datasets and then filter them by the desired SAL.
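library(ColOpenData)
library(dplyr) # provides %>% and mutate(), used later
library(ggplot2) # used for the final map
datasets_dem <- list_datasets("demographic", "EN")
department_datasets <- datasets_dem[datasets_dem$level == "department", ]
head(department_datasets)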
After reviewing the available datasets, we can select the one we wish to work with and take a closer look. For instance, let’s suppose we choose the dataset “DANE_CNPVPD_2018_14BPD”.
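To take that closer look, the dataset has to be loaded into the session first. The following is a minimal sketch, assuming the package's download_demographic() function accepts the dataset name as its argument; the object name chosen_data is the one used in the text that follows. It needs an internet connection.

```r
library(ColOpenData)

# Assumption: download_demographic() takes the dataset name as its argument
chosen_data <- download_demographic("DANE_CNPVPD_2018_14BPD")

# Inspect the first rows to see which columns are available
head(chosen_data)
```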
chosen_data presents information on the health services used by people who, in the last thirty days, had an illness, accident, dental problem, or other health problem. Now we can use the merge_geo_demographic() function.
The simplified argument downloads a simplified version of the geometries. This is not recommended for applications that require high accuracy, but the approximation is sufficient for a simple plot, and it makes the download much faster. To override this, you could use simplified = FALSE.
merged_data <- merge_geo_demographic(
  demographic_dataset = "DANE_CNPVPD_2018_14BPD"
)
head(merged_data)
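For applications that need the full-resolution geometries, the same call can be made with simplified = FALSE, as described above. This is only a sketch: the object name merged_full is hypothetical, and the download is expected to take longer than the simplified one.

```r
library(ColOpenData)

# Same dataset as before, requesting the non-simplified geometries
merged_full <- merge_geo_demographic(
  demographic_dataset = "DANE_CNPVPD_2018_14BPD",
  simplified = FALSE
)
head(merged_full)
```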
merged_data contains the geospatial information for each department, together with the information on the health services used by the population. We can use this dataset to visualize the proportion of people in each department who used home remedies for health issues. To do so, we divide the count of people who reported using home remedies (“uso_remedios_caseros”) by the total count of people who reported experiencing a health problem in each department (“total_personas_que_tuvieron_alguna_enfermedad”).
merged_data <- merged_data %>%
  mutate(
    proportion_home_remedies = uso_remedios_caseros /
      total_personas_que_tuvieron_alguna_enfermedad
  )
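The same calculation can be checked on a small toy table in base R. The counts below are made up, and the guard against a zero denominator is an optional addition, not part of the original calculation. Without the guard, a department with no reported health problems would give 0 / 0, which is NaN in R.

```r
# Toy counts (invented) using the same column names as the merged dataset
toy <- data.frame(
  uso_remedios_caseros = c(10, 0, 5),
  total_personas_que_tuvieron_alguna_enfermedad = c(40, 0, 20)
)

# Divide only where the denominator is positive; otherwise record NA
toy$proportion_home_remedies <- ifelse(
  toy$total_personas_que_tuvieron_alguna_enfermedad > 0,
  toy$uso_remedios_caseros /
    toy$total_personas_que_tuvieron_alguna_enfermedad,
  NA_real_
)
print(toy)
```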
We can now plot the results.
ggplot(data = merged_data) +
  geom_sf(mapping = aes(fill = proportion_home_remedies), color = "white") +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "white", colour = "white"),
    panel.background = element_rect(fill = "white", colour = "white"),
    panel.grid = element_blank(),
    axis.text = element_blank(),
    axis.ticks = element_blank(),
    plot.title = element_text(hjust = 0.5)
  ) +
  # The fill shows a proportion, so the legend is labelled accordingly
  scale_fill_gradient("Proportion", low = "#10bed2", high = "#deff00") +
  ggtitle(
    label = "Proportion of people who reported using home remedies to treat\na health problem",
    subtitle = "Colombia"
  )