Description
Load one of the built-in datasets in R. Make sure the dataset has at least two numeric variables and at least one categorical variable. (Avoid time series datasets for this assignment; we will have an opportunity to work on those later.) You are welcome to use the built-in dataset used for the Python assignment.
Create an RMarkdown file and include:
background or the context of the data selected: sources, description of how it was collected, the time period it represents, and the context in which it was collected, if available,
reason(s) why you selected it.
Description of the data:
how big is it (number of observations, variables),
how many numeric variables,
how many categorical variables,
description of the variables, if available
Are there any missing values?
Any duplicate rows?
Compute summary statistics (mean, median, mode, standard deviation, variance, range).
Select one categorical variable and compute these statistics for a numeric variable, grouped by that categorical variable.
Add visualizations to illustrate:
1. Relationship between variables
2. Trend
3. Distribution of the variable(s)
4. Comparison of summary statistics across categories
Ask an inferential question and use a hypothesis test to answer it.
Record your observations. What did you find most fascinating in your descriptive and inferential analysis?
Data transformation with R
2023-03-04
Task Instructions.
Pick a data set of your choice from the built-in datasets, check its data quality, and showcase
at least 5 of the 6 dplyr verbs (filter, arrange, select, mutate, summarise, group_by) used to
transform data.
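To make the verbs named above concrete, here is a minimal sketch that chains four of them (select, mutate, group_by, summarise) on mtcars. It assumes the dplyr package is installed. The names cyl_summary, hp_per_wt, n_cars, mean_mpg and mean_hp_per_wt are illustrative choices, not part of the assignment.

```r
library(dplyr)  # assumed to be installed

data(mtcars)

# select:    keep only the columns needed for this illustration
# mutate:    add a derived column (hp_per_wt is a hypothetical name)
# group_by:  split the rows by number of cylinders
# summarise: collapse each group to one row of statistics
cyl_summary <- mtcars %>%
  select(mpg, cyl, hp, wt) %>%
  mutate(hp_per_wt = hp / wt) %>%
  group_by(cyl) %>%
  summarise(
    n_cars = n(),
    mean_mpg = mean(mpg),
    mean_hp_per_wt = mean(hp_per_wt)
  )

print(cyl_summary)
```

The result has one row per distinct value of cyl, so the group counts should add up to the 32 cars in the dataset.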
Data Description.
For this assignment, I have chosen the built-in mtcars dataset in R to showcase at least 5
ways to transform data. We will start by loading the data into R.
data(mtcars)
library(dplyr) # Library for data transformation.
str(mtcars) # Checking the structure of our data.
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
The dataset has 32 observations of 11 recorded variables about cars.
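Since the task also asks for a data quality check, a minimal base R sketch of one could look like the following. The variable names (na_per_column, n_duplicates, mtcars_typed) are illustrative. Converting am to a factor is an optional choice made here for demonstration: str() reports every column as numeric, but some columns, such as am, are category codes.

```r
data(mtcars)

# Count missing values in each column
na_per_column <- colSums(is.na(mtcars))
print(na_per_column)

# Count rows that are exact duplicates of an earlier row
n_duplicates <- sum(duplicated(mtcars))
print(n_duplicates)

# Work on a copy so the original data frame stays untouched.
# Per the mtcars help page, am is coded 0 = automatic, 1 = manual.
mtcars_typed <- mtcars
mtcars_typed$am <- factor(
  mtcars_typed$am,
  levels = c(0, 1),
  labels = c("automatic", "manual")
)

table(mtcars_typed$am)
```

If both counts are zero, the data can be transformed without any imputation or de-duplication step.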
Data Transformation.
1. Filter.
Filtering allows us to select a subset of the data based on a condition. For example, let’s
filter the dataset to include only cars with mpg greater than or equal to 22.
mtcars_filter <- filter(mtcars, mpg >= 22)
head(mtcars_filter)
##                 mpg cyl  disp hp drat    wt  qsec vs am gear carb
## Datsun 710     22.8   4 108.0 93 3.85 2.320 18.61  1  1    4    1
## Merc 240D      24.4   4 146.7 62 3.69 3.190 20.00  1  0    4    2
## Merc 230       22.8   4 140.8 95 3.92 3.150 22.90  1  0    4    2
## Fiat 128       32.4   4  78.7 66 4.08 2.200 19.47  1  1    4    1
## Honda Civic    30.4   4  75.7 52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla 33.9   4  71.1 65 4.22 1.835 19.90  1  1    4    1
From the above output, we can see that only observations where mpg is greater than or equal to
22 are displayed.
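Because head() prints at most the first six rows, it is worth checking how many rows the filter actually keeps. The sketch below does that and also shows how filter() combines several conditions. It assumes dplyr is installed. The second condition (am == 1) and the names efficient, efficient_manual and efficient_manual_base are illustrative additions, not part of the original analysis.

```r
library(dplyr)  # assumed to be installed

data(mtcars)

# Single condition, as in the filter step above
efficient <- filter(mtcars, mpg >= 22)
nrow(efficient)  # number of cars kept, which may exceed what head() shows

# Conditions separated by commas are combined with a logical AND
efficient_manual <- filter(mtcars, mpg >= 22, am == 1)

# Base R equivalent of the two-condition filter, for comparison
efficient_manual_base <- mtcars[mtcars$mpg >= 22 & mtcars$am == 1, ]

nrow(efficient_manual)
```

Both approaches keep the same rows. The dplyr form tends to read more cleanly once several conditions are involved.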
2. Arrange.
Arranging allows us to reorder the rows of the dataset based on one or more variables. For
example, let’s arrange the mtcars dataset in ascending order of horsepower (hp).
mtcars_arranged <- arrange(mtcars, hp)
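As a minimal sketch of the arrange step described above, the following sorts mtcars by hp in both directions and then by two keys at once. It assumes dplyr is installed. The names by_hp_asc, by_hp_desc and by_cyl_then_hp are illustrative.

```r
library(dplyr)  # assumed to be installed

data(mtcars)

# Ascending order is the default
by_hp_asc <- arrange(mtcars, hp)

# Wrap a column in desc() to sort it in descending order
by_hp_desc <- arrange(mtcars, desc(hp))

# Several keys: sort by cyl first, then by descending hp within each cyl value
by_cyl_then_hp <- arrange(mtcars, cyl, desc(hp))

# Inspect a few columns of the lowest-horsepower cars
head(by_hp_asc[, c("mpg", "cyl", "hp")])
```

A quick way to confirm the ordering is is.unsorted(by_hp_asc$hp), which returns FALSE when the column is in non-decreasing order.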