Demographic statistics popularized by Hans Rosling’s TED talks.
library(gapminder)
gapminder
head(gapminder, n = 6)
## Systematic view
str(gapminder)
Hey, I’m a professional! I want to see the data in a systematic way, such as finding out how many rows and columns it has, what the variables are called, and how each one is stored:
gapminder
nrow(gapminder)
ncol(gapminder)
names(gapminder)
str(gapminder)
Q: Tell me something about the population variable in the dataset: for how many countries we have population data, what the average is, which countries have the largest and smallest populations, and so on. By the way, what type is pop stored as?
head(gapminder$year, n = 10)
mean(gapminder$year, na.rm = TRUE)
median(gapminder$year)
min(gapminder$year)
max(gapminder$year)
length(gapminder$year)
summary(gapminder$year)
class(gapminder$gdpPercap)
typeof(gapminder$gdpPercap)
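The calls above inspect year and gdpPercap; the same base R functions can be pointed at pop to answer the question as asked. This is only a minimal sketch, assuming the gapminder package is installed; the object names pop and n_countries are illustrative and not part of the original slides.

```r
library(gapminder)

pop <- gapminder$pop                               # population column as a plain vector

class(pop)                                         # storage class of the column
typeof(pop)                                        # underlying type
length(pop)                                        # number of country-year observations
n_countries <- length(unique(gapminder$country))   # number of distinct countries
n_countries
summary(pop)                                       # min, quartiles, mean, and max at once

# Rows holding the largest and the smallest recorded populations
gapminder[which.max(pop), c("country", "year", "pop")]
gapminder[which.min(pop), c("country", "year", "pop")]
```

Note that length(pop) counts country-year rows, not countries, which is why the distinct countries are counted separately.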
Welcome to the Tidyverse
A widely used toolkit for data manipulation
Installation:
# install.packages("tidyverse")
library("tidyverse")
We focus on dplyr today.
dplyr
Its functions each do one thing, but they do it well.
The pipe operator %>% makes code more readable.
Shortcut for %>%:
You still remember str(), right?
str(gapminder)
glimpse(gapminder)
Q: Which countries have the largest populations? And the smallest?
gapminder
gapminder %>%
  arrange(pop)
arrange(gapminder, desc(pop))
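arrange() sorts the whole table. When only the extreme rows are needed, dplyr's slice_max() and slice_min() (available from dplyr 1.0.0) are one possible alternative. A minimal sketch; the object names largest and smallest are illustrative.

```r
library(gapminder)
library(dplyr)

# Three country-year rows with the largest populations, in descending order
largest <- gapminder %>%
  slice_max(pop, n = 3, with_ties = FALSE)

# Three country-year rows with the smallest populations, in ascending order
smallest <- gapminder %>%
  slice_min(pop, n = 3, with_ties = FALSE)

largest
smallest
```

Because the data holds one row per country and year, the same country can appear more than once in either result.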
Q: How many observations do we have in each continent? Do we have the same number of observations for each country within a continent?
gapminder %>%
  count(continent)
# gapminder %>%
#   add_count(continent)
gapminder %>%
  count(continent, country)
What does count() give?
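One way to see what count() gives is to compare it with a group_by() plus summarise(n = n()) pipeline, which produces the same table. The sketch below also checks the second question by listing the distinct per-country row counts. The object names by_count, by_hand, and obs_per_country are illustrative.

```r
library(gapminder)
library(dplyr)

# count() returns one row per group with the group size in a column named n
by_count <- gapminder %>%
  count(continent)

# The same table built by hand
by_hand <- gapminder %>%
  group_by(continent) %>%
  summarise(n = n())

by_count
by_hand

# Distinct numbers of observations per country; a single row means
# every country has the same number of observations
obs_per_country <- gapminder %>%
  count(continent, country) %>%
  distinct(n)

obs_per_country
```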
Q: What was the average GDP per capita and median life expectancy?
gapminder %>%
  summarise(mean_gdp = mean(gdpPercap), median_life = median(lifeExp))
Q: What was the average GDP per capita and median life expectancy in each continent?
gapminder %>%
  group_by(continent) %>%
  summarise(mean_gdp = mean(gdpPercap), median_life = median(lifeExp))
Q: Which countries had the largest population in 2007?
gapminder %>%
  arrange(desc(pop))
gapminder %>%
  filter(year == 2007) %>%
  arrange(desc(pop))
How about the country with the largest population in the decade ending in 2007? (Tip: use %in% as a condition.)
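A possible answer using the %in% tip is sketched below. It assumes that "the decade ending in 2007" means the years 1998 to 2007; the object names decade and top_decade are illustrative. Since gapminder records one observation every five years, only 2002 and 2007 fall in that range.

```r
library(gapminder)
library(dplyr)

decade <- 1998:2007   # assumed reading of "the decade ending in 2007"

top_decade <- gapminder %>%
  filter(year %in% decade) %>%   # keep rows whose year is any value in the vector
  arrange(desc(pop))             # largest population first

top_decade
```

%in% tests membership in a set of values, so it replaces a chain of year == ... | year == ... comparisons.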
Q: What if I want to keep only some of the variables?
gapminder %>%
  select(country, year, pop)
gapminder %>%
  select(-continent)
gapminder %>%
  select(starts_with("co"))
Q: What’s the life expectancy of the country that had the largest population in 2007—showing the country name, population, and life expectancy together, please?
gapminder
gapminder %>%
  filter(year == 2007) %>%
  arrange(desc(pop)) %>%
  select(country, pop, lifeExp)
Q: What’s the total GDP of each country?
gapminder %>%
  mutate(gdp = pop * gdpPercap) %>%
  select(country, pop, gdpPercap, gdp)
Q: How do we round all the numeric variables to whole numbers?
gapminder %>%
  mutate_if(is.double, round, digits = 0)
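mutate_if() still works, but since dplyr 1.0.0 the scoped verbs are documented as superseded by across(). The sketch below is one possible equivalent, not the slides' own code; the object name rounded is illustrative.

```r
library(gapminder)
library(dplyr)

rounded <- gapminder %>%
  # apply round() to every double column; integer columns such as pop are untouched
  mutate(across(where(is.double), ~ round(.x, digits = 0)))

rounded
```

The rounded columns are still stored as doubles; round() changes the values, not the type.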
When doing gapminder %>% ..., you are NOT adding or changing anything in gapminder itself. If you want to save the changes, assign the result to an object.
gapminderNew <- gapminder %>% ...
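A quick way to convince yourself of this is to check the column names before and after assignment. A minimal sketch; the object name gapminder_gdp is illustrative.

```r
library(gapminder)
library(dplyr)

# The result is printed and then discarded; gapminder itself is untouched
gapminder %>%
  mutate(gdp = pop * gdpPercap)

"gdp" %in% names(gapminder)        # checks the original data

# Assigning the result keeps the new column in a separate object
gapminder_gdp <- gapminder %>%
  mutate(gdp = pop * gdpPercap)

"gdp" %in% names(gapminder_gdp)    # checks the saved copy
```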
Use dplyr functions wisely and in combination:
arrange, count, summarise
filter, select, mutate
group_by and mutate_if
Q: How do I fill the missing values in x and combine y and z into one variable?
df_toy %>%
  mutate(x = coalesce(x, 0L),
         yz = coalesce(y, z))
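The slides do not show df_toy, so the sketch below builds a hypothetical one to make the call runnable: x is an integer column with a gap, and y and z are character columns that each hold part of the information. The data and the name df_filled are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical toy data; not from the original slides
df_toy <- tibble(
  x = c(1L, NA, 3L),
  y = c("a", NA, NA),
  z = c(NA, "b", NA)
)

df_filled <- df_toy %>%
  mutate(x  = coalesce(x, 0L),    # replace missing x with integer zero
         yz = coalesce(y, z))     # take y where present, otherwise z

df_filled
```

coalesce() returns the first non-missing value at each position, so a row stays NA in yz only when both y and z are missing. The replacement 0L is written as an integer to match the type of x.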