The R package wiesbaden
provides functions to directly
retrieve data from databases maintained by the Federal Statistical
Office of Germany (DESTATIS) in Wiesbaden.
Access to the following databases is implemented:
To access any of the databases using this package, you need to register on the respective website to get a personal login name and password. The registration is free.
To authenticate, supply a vector with your user name, password, and
database shortcut (“regio”, “de”, “nrw”, “bm”) as an argument for the
genesis
parameter whenever you call a
retrieve_*
function:
c(user="your-username", password="your-password", db="database-shortname")
Alternatively, you can use save_credentials()
to store
the credentials on your computer. This function relies on the keyring package. For more
details about how credentials are stored by this package, see the
keyring package documentation.
Use the function test_login()
to check if your
login/password combination allows you to access the respective database
(and if the server is functioning properly).
library(wiesbaden)
# Assuming credentials are stored via save_credentials()
test_login(genesis=c(db='regio'))
#> [1] "Sie wurden erfolgreich an- und abgemeldet."
# ... or supply password/username
test_login(genesis=c(db='regio', user="your-username", password="your-password"))
#> [1] "Sie wurden erfolgreich an- und abgemeldet."
The available data are organized by themes (“Themen”) and subthemes. To get a list of all available themes go to the respective database website (links above) and click on “Themen”. Each theme typically comes with multiple subthemes.
Suppose we want to download the federal election results on the
county level from regionalstatistik.de.
This data is available in the theme “Wahlen” which has the code
14
. The federal election results are available in subtheme
141
.
Using retrieve_datalist()
, download a
data.frame
of all available data cubes in theme
141
:
Note, we are assuming that credentials are stored via
save_credentials()
.
Use grepl
(or str_detect()
from the
stringr
package) to filter cubes with a description that
contains the word “Kreise” (county):
subset(d, grepl("Kreise", description))
#> tablename
#> 1 14111KJ001
#> 2 14111KJ002
#> description
#> 1 Wahlberechtigte, Wahlbeteiligung, Gültige Zweitstimmen, Kreise und kreisfreie Städte, Stichtag
#> 2 Gültige Zweitstimmen, Kreise und kreisfreie Städte, Parteien, Stichtag
Having identified the correct data cube, call
retrieve_data()
to download the data:
head(data)
#> id14111 KREISE PART04 STAG WAHL09_val WAHL09_qual WAHL09_lock
#> 1 D 01001 AFD 22.09.2013 1855 e NA
#> 2 D 01001 AFD 24.09.2017 3702 e NA
#> 3 D 01001 B90-GRUENE 16.10.1994 4651 e NA
#> 4 D 01001 B90-GRUENE 27.09.1998 3815 e NA
#> 5 D 01001 B90-GRUENE 22.09.2002 5556 e NA
#> 6 D 01001 B90-GRUENE 18.09.2005 5028 e NA
#> WAHL09_err
#> 1 0
#> 2 0
#> 3 0
#> 4 0
#> 5 0
#> 6 0
The data are organized in long format: For each combination of
KREIS
(county), PART04
(political party) and
STAG
(election date) there is a vote count
(WAHL09_value
). Please see help file for the information on
the additional variables (*_qual, *_lock, *_err).
To get the metadata for each variable, call
retrieve_metadata()
:
retrieve_metadata(tablename="14111KJ002", genesis=c(db='regio'))
#> name description unit
#> 1 WAHL09 Gültige Zweitstimmen Anzahl
#> 2 STAG Stichtag
#> 3 PART04 Parteien
#> 4 KREISE Kreise und kreisfreie Städte
To get the value labels for the variable PART04
, call
retrieve_valuelabel()
:
retrieve_valuelabel("PART04", genesis=c(db='regio'))
#> PART04 description
#> 1 AFD AfD
#> 2 B90-GRUENE GRÜNE
#> 3 CDU CDU/CSU
#> 4 DIELINKE DIE LINKE
#> 5 FDP FDP
#> 6 SONSTIGE Sonstige Parteien
#> 7 SPD SPD
This function also works with the other variables (e.g.,
KREIS
).