In this practice, we will work with a dataset that comes with the
ggplot2 package.
To load this data, run the following code.
library(ggplot2)
data(mpg)
You can now see the first few rows using the head()
function.
head(mpg)
## # A tibble: 6 × 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compa…
## 2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compa…
## 3 audi a4 2 2008 4 manual(m6) f 20 31 p compa…
## 4 audi a4 2 2008 4 auto(av) f 21 30 p compa…
## 5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compa…
## 6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compa…
This dataset includes fuel economy data for 38 different car models in 1999 and 2008. We will work with this dataset throughout this practice set.
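As a quick check on the description above, the following sketch looks at the size of the data and the values it covers. It uses only base R functions (dim(), unique(), length()) and assumes ggplot2 is installed so that mpg is available.

```r
library(ggplot2)  # provides the mpg dataset

# Number of rows and columns in mpg
dim(mpg)

# Number of distinct values in the model column
length(unique(mpg$model))

# Years covered by the data
unique(mpg$year)
```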
In this activity, I will provide hints throughout. Only use the hints if you need them! Try to see what you can do without them first.
Load the dplyr package.
library(dplyr)
Find the mean city mileage, cty, for each vehicle class, class.
mpg |>
group_by(class) |>
summarize(city_mean = mean(cty))
Find the maximum highway mileage (hwy) for each manufacturer
(manufacturer).
Hint: use max() for the highway mileage.
mpg |>
group_by(manufacturer) |>
summarize(hwy_max = max(hwy))
Use summarize() to find the mean city and highway mileage for each year.
mpg |>
group_by(year) |>
summarize(city_mean = mean(cty), hwy_mean = mean(hwy))
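When the same summary function is applied to several columns, dplyr's across() helper can shorten the call. This is only a sketch of an alternative, not the answer the practice expects: across() and its .names argument are not used elsewhere in this practice, and the resulting column names (cty_mean, hwy_mean) differ slightly from the ones above.

```r
library(ggplot2)  # provides the mpg dataset
library(dplyr)

# Assumption: dplyr >= 1.0.0, where across() is available.
# Apply mean() to both mileage columns and name the results <column>_mean.
year_means <- mpg |>
  group_by(year) |>
  summarize(across(c(cty, hwy), mean, .names = "{.col}_mean"))

year_means
```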
mpg |>
group_by(class, year) |>
summarize(city_mean = mean(cty))
mpg |>
group_by(class, year) |>
summarize(city_mean = mean(cty)) |>
filter(year == 2008)
Use arrange() to sort by the column you
made in the summarize() function.
mpg |>
group_by(manufacturer) |>
summarize(city_median = median(cty)) |>
arrange(desc(city_median))
Fit a linear model using lm() with cty as the response
and year as the predictor.
year_lm <- lm(cty ~ year, data = mpg)
Use summary() to view the statistical model.
summary(year_lm)
There is no significant difference in city mileage between the two years.
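Because year takes only two values in mpg (1999 and 2008), treating it as a number or as a category should lead to the same conclusion. The sketch below compares the two fits; the factor(year) model and the names year_factor_lm, p_numeric, and p_factor are illustrative and not part of the original practice.

```r
library(ggplot2)  # provides the mpg dataset

# Model from the practice: year as a numeric predictor
year_lm <- lm(cty ~ year, data = mpg)

# Hypothetical alternative: year as a two-level factor
year_factor_lm <- lm(cty ~ factor(year), data = mpg)

# p-value for the year term in each model (second row of the coefficient table)
p_numeric <- summary(year_lm)$coefficients[2, "Pr(>|t|)"]
p_factor  <- summary(year_factor_lm)$coefficients[2, "Pr(>|t|)"]

c(numeric = p_numeric, factor = p_factor)
```

With only two distinct years, both models fit the same two group means, so the two p-values should agree.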
Fit a linear model using lm() with hwy as the response
and class as the predictor.
class_lm <- lm(hwy ~ class, data = mpg)
Use summary() to view the statistical model.
summary(class_lm)
There is a significant effect of vehicle class on highway gas mileage.
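summary() reports one coefficient per class level relative to a baseline level, so it can help to also look at a single overall test for class. A minimal sketch, assuming the base R anova() function (not used elsewhere in this practice) applied to the same model:

```r
library(ggplot2)  # provides the mpg dataset

class_lm <- lm(hwy ~ class, data = mpg)

# Overall F-test for the class term: one row for class, one for residuals
class_anova <- anova(class_lm)
class_anova
```

The Pr(>F) value in the class row is the p-value for the effect of class as a whole.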
Make a boxplot with class on the x axis and hwy on the y axis.
ggplot(mpg, aes(x = class, y = hwy)) +
geom_boxplot() +
theme_classic()
There is a significant effect of vehicle class on highway gas mileage.
Use the lm() function with cty as the predictor and hwy as
the response.
mpg_lm <- lm(hwy ~ cty, data = mpg)
Use summary() to view the results.
summary(mpg_lm)
There is a significant effect of city mileage on highway mileage.
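To see what the fitted relationship looks like in numbers, the sketch below pulls out the intercept and slope with coef() and predicts highway mileage for one city mileage value. The value of 20 mpg is a made-up input for illustration, and coef() and predict() are base R functions not used elsewhere in this practice.

```r
library(ggplot2)  # provides the mpg dataset

mpg_lm <- lm(hwy ~ cty, data = mpg)

# Intercept and slope of the fitted line
coef(mpg_lm)

# Predicted highway mileage for a hypothetical car that gets 20 mpg in the city
predict(mpg_lm, newdata = data.frame(cty = 20))
```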
Make a scatterplot with cty on the x axis and hwy on the y axis. Add a
geom_smooth() layer.
ggplot(mpg, aes(x = cty, y = hwy)) +
geom_point() +
geom_smooth(method = 'lm') +
theme_classic()
Here, we will look at the two dplyr datasets
band_instruments and band_members.
Use the function head() to look at these two
datasets.
library(dplyr)
head(band_instruments)
## # A tibble: 3 × 2
## name plays
## <chr> <chr>
## 1 John guitar
## 2 Paul bass
## 3 Keith guitar
head(band_members)
## # A tibble: 3 × 2
## name band
## <chr> <chr>
## 1 Mick Stones
## 2 John Beatles
## 3 Paul Beatles
Join the two data sets to include all names in both data frames.
Answer
band_instruments |>
full_join(band_members)
Join the two data sets to include only names that are found in both data frames.
Answer
band_instruments |>
inner_join(band_members)
Join the two data sets to include only names that are
found in the band_instruments data frame.
Answer
band_instruments |>
left_join(band_members)
or
band_members |>
right_join(band_instruments)
Join the two data sets to include only names that are
found in the band_members data frame.
Answer
band_instruments |>
right_join(band_members)
or
band_members |>
left_join(band_instruments)
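One way to check which names each join keeps is to compare the number of rows returned. The sketch below repeats the four joins with an explicit by = "name" argument, which names the join key and avoids the "Joining with" message. The object names (all_names, both_names, and so on) are made up for illustration.

```r
library(dplyr)  # provides band_instruments, band_members, and the join functions

# Each join uses the shared "name" column as the key
all_names    <- full_join(band_instruments, band_members, by = "name")
both_names   <- inner_join(band_instruments, band_members, by = "name")
instr_names  <- left_join(band_instruments, band_members, by = "name")
member_names <- right_join(band_instruments, band_members, by = "name")

# Number of rows returned by each join
c(full = nrow(all_names), inner = nrow(both_names),
  left = nrow(instr_names), right = nrow(member_names))
```

The full join keeps all four names (John, Paul, Keith, Mick), the inner join keeps only the two names found in both data frames (John and Paul), and the left and right joins each keep the three names from one data frame.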