The usdatasets
package provides a comprehensive
collection of U.S. datasets, encompassing various fields such as crime,
economics, education, finance, energy, healthcare, and more.
This package serves as a valuable resource for researchers and analysts
seeking to perform in-depth analyses and derive insights from
U.S.-specific data.
To facilitate the identification of data types, a suffix is added to the end of the name of each dataset. These suffixes indicate the format and type of the datasets, such as:
tbl_df
: A tibble data framedf
: A standard data framets
: A time series objectmatrix
: A matrix objectcharacter
: A character vectornumeric
: A numeric vectorfactor
: A factor variableHere are some examples of datasets included in the
usdatasets
package:
marathon_tbl_df: A tibble containing marathon race data, including runner statistics and performance metrics.
mn_police_use_of_force_df: A data frame documenting incidents of police use of force in Minnesota.
nba_players_19_tbl_df: A tibble that includes data on NBA players for the 2019 season.
ncbirths_tbl_df: A tibble summarizing birth statistics across various demographics.
nyc_marathon_tbl_df: A tibble containing results and statistics from the New York City Marathon.
nycvehiclethefts_tbl_df: A data frame documenting vehicle theft incidents in New York City.
To illustrate the data, we can use the ggplot2
package
to create some visualizations. Here are a few examples:
# Example: Visualizing finish times of the NYC Marathon
# Ajustado para las columnas disponibles en 'marathon_tbl_df'
marathon_tbl_df %>%
ggplot(aes(x = year, y = time, color = gender)) +
geom_point(alpha = 0.6) +
labs(title = "Marathon Finish Times by Year and Gender",
x = "Year",
y = "Finish Time (minutes)",
color = "Gender") +
theme_minimal()
# Example: Visualizing police use of force incidents by race
mn_police_use_of_force_df %>%
group_by(race) %>%
summarize(count = n()) %>%
ggplot(aes(x = reorder(race, count), y = count, fill = race)) +
geom_bar(stat = "identity") +
labs(title = "Incidents of Police Use of Force by Race",
x = "Race",
y = "Number of Incidents") +
theme_minimal() +
coord_flip()
The usdatasets
package is an invaluable tool for those
looking to analyze and derive insights from a variety of U.S.-specific
datasets. The suffixes used in the dataset names help users quickly
identify the type of data they are working with, facilitating a smoother
analysis process.
For more information and to explore the datasets, please refer to the package documentation.