Additional practice questions exam 2

Set up

In this practice, we will work with a dataset that comes with the ggplot2 package.

To load this data, run the following code.

library(ggplot2)
data(mpg)

You can now view the first few rows using the head() function.

head(mpg)

## # A tibble: 6 × 11
##   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class 
##   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr> 
## 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
## 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
## 3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
## 4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
## 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
## 6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…

This dataset includes fuel economy data for 38 different car models in 1999 and 2008. We will work with it throughout this practice set.

In this activity, I will provide hints throughout. Only use the hints if you need them! Try to see what you can do without them first.

Summary tables

Before you begin the summaries, make sure you load the package to do summaries!

Hint
Load in the dplyr package
Answer
```
 library(dplyr)
```
Make a summary table to show the average city gas mileage, cty, for each vehicle class, class.

Hint
Group by class and summarise based on cty
Answer
```
 mpg |>
   group_by(class) |>
   summarize(city_mean = mean(cty))
```
Make a summary table to show the maximum highway gas mileage (hwy) for each manufacturer (manufacturer).

Hint
Group by the manufacturer and summarize to calculate the max() for the highway mileage.
Answer
```
 mpg |> 
   group_by(manufacturer) |>
   summarize(hwy_max = max(hwy))
```
Make a summary table to show the mean city gas mileage and the mean highway gas mileage for each year.

Hint
Group by the year. Then add two different summaries in summarize()
Answer
```
 mpg |>
   group_by(year) |>
   summarize(city_mean = mean(cty), hwy_mean = mean(hwy))
```
Make a summary table to show the mean city gas mileage for each class type and year.

Hint
Group by the year and class
Answer
```
 mpg |>
   group_by(class, year) |>
   summarize(city_mean = mean(cty))
```
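When summarize() follows a group_by() with two variables, dplyr keeps the result grouped by the first variable and prints a message saying so. A minimal sketch, assuming you would rather get back an ungrouped table, uses the .groups argument of summarize(). The name class_year_means is only illustrative.

```r
library(ggplot2)  # provides the mpg data
library(dplyr)

# Same table as above, but drop the grouping so the result is a plain tibble
class_year_means <- mpg |>
  group_by(class, year) |>
  summarize(city_mean = mean(cty), .groups = "drop")

# Check that no grouping is left on the result
is_grouped_df(class_year_means)
```

Either version gives the same numbers; the difference only matters if you keep working with the table afterwards.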
Filter the summary table you just made to only include the year 2008.

Hint
Add a filter to the code you just used to make the table after a pipe.
Answer
```
 mpg |>
   group_by(class, year) |>
   summarize(city_mean = mean(cty)) |>
   filter(year == 2008)
```
Make a summary table to display the median city mileage for every manufacturer. Then, sort the table to find which manufacturer has the highest median city mileage.

Hint
Make the summary table and then arrange() by the column you made in the summarize function
Answer
```
 mpg |>
   group_by(manufacturer) |>
   summarize(city_median = median(cty)) |>
   arrange(-city_median)
```

Statistics

Do a statistical test to determine if the year the car was manufactured affected the city gas mileage.

Hint
Use the function lm() with cty as the response and year as the predictor
Answer
```
 year_lm <- lm(cty ~ year, data = mpg)
```
View the statistical test. Is there a significant difference in the gas mileage for the different years?

Hint
Use the function summary() to view the statistical model
Answer
```
   summary(year_lm)
```
There is no significant difference in city gas mileage between the two years.
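The conclusion above comes from the p-value in the year row of the coefficient table. As a sketch, that table can be pulled out of the summary as a matrix with coef(). The object names year_coefs and year_p are illustrative.

```r
library(ggplot2)  # provides the mpg data

year_lm <- lm(cty ~ year, data = mpg)

# Coefficient table: estimate, standard error, t value, and p-value per term
year_coefs <- coef(summary(year_lm))

# p-value for the year term; compare it to your significance cutoff (often 0.05)
year_p <- year_coefs["year", "Pr(>|t|)"]
year_p
```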
Do a statistical test to see if the highway gas mileage is predicted by the vehicle class.

Hint
Use the function lm() with hwy as the response and class as the predictor
Answer
```
 class_lm <- lm(hwy ~ class, data = mpg)
```
View the statistical test. Is there a significant difference in the gas mileage based on vehicle class?

Hint
Use the function summary() to view the statistical model
Answer
```
   summary(class_lm)
```
There is a significant effect of vehicle class on highway gas mileage.
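With a categorical predictor, summary() reports one row per class, each compared with the reference class (the first alphabetically, 2seater). As a sketch, the single overall test of whether class matters is the F-test, which anova() shows in one row. The name class_anova is illustrative.

```r
library(ggplot2)  # provides the mpg data

class_lm <- lm(hwy ~ class, data = mpg)

# One-row F-test for the overall effect of class on highway mileage
class_anova <- anova(class_lm)
class_anova

# The p-value for the class term
class_anova["class", "Pr(>F)"]
```

The same F-statistic and p-value also appear on the last line of the summary() output.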
Make a boxplot to show the highway gas mileage for each vehicle class.

Hint
Use your ggplot functions to make a boxplot with the mpg data, with class on the x axis and hwy on the y axis.
Answer
```
   ggplot(mpg, aes(x = class, y = hwy)) +
   geom_boxplot() +
   theme_classic()
```
Based on the plot, which class had the lowest highway mileage?
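The middle line of each box is the median, so one way to check your reading of the plot is to compute the median highway mileage per class and sort it. A sketch, where hwy_by_class is an illustrative name:

```r
library(ggplot2)  # provides the mpg data
library(dplyr)

# Median highway mileage per class, lowest first
hwy_by_class <- mpg |>
  group_by(class) |>
  summarize(hwy_median = median(hwy)) |>
  arrange(hwy_median)

hwy_by_class
```

The first row of the table should match the lowest box in your plot.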
Perform a statistical test to see if city gas mileage predicts highway gas mileage.

Hint
Use the lm() function with cty as the predictor and hwy as the response
Answer
```
 mpg_lm <- lm(hwy ~ cty, data = mpg)
```
Is the result significant?

Hint
Use the function summary to view the results.
Answer
```
 summary(mpg_lm)
```
There is a significant effect of city mileage on highway mileage.
Plot a scatterplot of the city and highway gas mileage and add the best fit line.

Hint
Use your ggplot functions to make a scatterplot with the mpg data, with cty on the x axis and hwy on the y axis. Add a geom_smooth() layer.
Answer
```
   ggplot(mpg, aes(x = cty, y = hwy)) +
   geom_point() +
   geom_smooth(method = 'lm') +
   theme_classic()
```
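The line drawn by geom_smooth(method = 'lm') is the same least-squares fit as the hwy ~ cty model. As a sketch, the intercept and slope of that line, and how much of the variation it explains, can be read from the model object. The names fit_coefs and fit_r2 are illustrative.

```r
library(ggplot2)  # provides the mpg data

mpg_lm <- lm(hwy ~ cty, data = mpg)

# Intercept and slope of the best fit line
fit_coefs <- coef(mpg_lm)
fit_coefs

# Proportion of the variation in hwy explained by cty
fit_r2 <- summary(mpg_lm)$r.squared
fit_r2
```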

Joins

Here, we will look at the two dplyr datasets band_instruments and band_members.

Use the function head() to look at these two datasets.

library(dplyr)
head(band_instruments)

## # A tibble: 3 × 2
##   name  plays 
##   <chr> <chr> 
## 1 John  guitar
## 2 Paul  bass  
## 3 Keith guitar

head(band_members)

## # A tibble: 3 × 2
##   name  band   
##   <chr> <chr>  
## 1 Mick  Stones 
## 2 John  Beatles
## 3 Paul  Beatles

Join the two data sets to include all names in both data frames.
Answer
```
 band_instruments |>
   full_join(band_members)
```
Join the two data sets to include only names that are found in both data frames.
Answer
```
 band_instruments |>
   inner_join(band_members)
```
Join the two data sets to include only names that are found in the band_instruments data frame.
Answer
```
 band_instruments |>
   left_join(band_members)
```
or
```
 band_members |>
   right_join(band_instruments)
```

Join the two data sets to include only names that are found in the band_members data frame.

Answer
```
 band_instruments |>
   right_join(band_members)
```
or
```
 band_members |>
   left_join(band_instruments)
```
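To compare the four joins side by side, the sketch below counts the rows each one keeps. It also names the key column with by = "name", which gives the same result as the answers above but avoids the message dplyr prints when it has to guess the key. The object names are illustrative.

```r
library(dplyr)

# Naming the key column explicitly avoids the "Joining with `by = ...`" message
all_names  <- full_join(band_instruments, band_members, by = "name")
both_names <- inner_join(band_instruments, band_members, by = "name")
inst_names <- left_join(band_instruments, band_members, by = "name")
memb_names <- right_join(band_instruments, band_members, by = "name")

# Number of rows kept by each join
c(full = nrow(all_names), inner = nrow(both_names),
  left = nrow(inst_names), right = nrow(memb_names))
# full = 4, inner = 2, left = 3, right = 3
```

The full join keeps all four names, the inner join keeps only John and Paul, and the left and right joins each keep the three names from one of the two data frames.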