RStudio and Big Data Assignment
The goal of this project is to perform data analysis using R. The techniques and
skills we have introduced in class will help you along the way.
Datasets: The data frames flights and airlines in the nycflights13 package.
Other data can be integrated when needed.
In this project, you will need to load the given dataset in RStudio and then
perform the following data analysis using R.
Part I. Reading in the dataset and basic analysis
Part II. Visualizing relationships between pairs of variables
Part III. Manipulating/joining/transforming data
Part IV. Summarizing data
For each of the above four topics, please design 5 interesting questions/tasks
and run R commands to get the answers or complete the tasks.
Sample Questions: Q1. What are the subjects of this dataset? Q2. How
many subjects are there? Q3. What type of variable is origin? Q4. What is
the most common origin? Q5. What is the minimum distance?
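The sample questions above could be approached along the following lines. This is only a minimal sketch, assuming the nycflights13 package is already installed; the object names (n_flights, origin_class, and so on) are illustrative and not required by the assignment.

```r
# Assumption: nycflights13 is installed, e.g. install.packages("nycflights13")
library(nycflights13)

# Q1/Q2: each row of flights is one flight, so nrow() counts the subjects
n_flights <- nrow(flights)

# Q3: class() reports how R stores the origin variable
origin_class <- class(flights$origin)

# Q4: tabulate origin and pick the level with the highest count
origin_counts <- table(flights$origin)
most_common_origin <- names(which.max(origin_counts))

# Q5: smallest value of distance (na.rm guards against missing values)
min_distance <- min(flights$distance, na.rm = TRUE)

print(n_flights)
print(origin_class)
print(most_common_origin)
print(min_distance)
```

Running ?flights in the console shows the package's own description of each variable, which helps when answering Q1.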
Sample Tasks: T1. Produce a table to summarize the origin variable. T2.
Produce a bar chart of the origin variable. T3. Calculate the mean and standard
deviation of the distance variable. T4. Produce a histogram of the distance
variable. T5. Visualize the relationship between arr_time and sched_arr_time.
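One possible way to carry out the sample tasks is sketched below. It assumes the ggplot2 package is installed in addition to nycflights13; base R graphics such as barplot() and hist() would work as well. The object names and the choice of 30 bins are illustrative assumptions.

```r
# Assumption: nycflights13 and ggplot2 are both installed
library(nycflights13)
library(ggplot2)

# T1: frequency table of the origin variable
origin_table <- table(flights$origin)

# T2: bar chart of origin (geom_bar counts the rows in each category)
origin_bar <- ggplot(flights, aes(x = origin)) +
  geom_bar()

# T3: mean and standard deviation of distance
distance_mean <- mean(flights$distance, na.rm = TRUE)
distance_sd <- sd(flights$distance, na.rm = TRUE)

# T4: histogram of distance (30 bins is an arbitrary choice)
distance_hist <- ggplot(flights, aes(x = distance)) +
  geom_histogram(bins = 30)

# T5: scatter plot of actual vs. scheduled arrival time;
# alpha makes heavily overlapping points easier to read
arr_scatter <- ggplot(flights, aes(x = sched_arr_time, y = arr_time)) +
  geom_point(alpha = 0.1)

print(origin_table)
print(c(mean = distance_mean, sd = distance_sd))
```

Each plot object is drawn when printed, for example print(origin_bar). The scatter plot may take a moment to render because flights has several hundred thousand rows, and ggplot2 will warn about rows with a missing arr_time.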
The report should include, but is not limited to, the following components.
1. Title and author(s).
2. Project Description:
-Give a brief introduction to the project goal.
-Describe the basic information about the dataset.
3. Data Analysis and Results:
-In this section, please document all the data
analysis in the following four parts. For each part, document the
following: the specific questions/tasks; the R commands you used to answer
the questions or complete the tasks; and the answers or the output of the
commands (tables, figures, or values).
o Part I. Reading in the dataset and basic analysis
o Part II. Visualizing relationships between pairs of variables
o Part III. Manipulating/joining/transforming data
o Part IV. Summarizing data
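As an illustration of what a Part III and Part IV entry might look like, the sketch below joins flights with airlines and then summarizes arrival delay by airline. It assumes the dplyr package is installed; the question it answers and the object names are hypothetical examples, not required tasks.

```r
# Assumption: nycflights13 and dplyr are both installed
library(nycflights13)
library(dplyr)

# Part III example: attach the full airline name to every flight
# by joining on the carrier column shared by both data frames
flights_named <- flights %>%
  left_join(airlines, by = "carrier")

# Part IV example: number of flights and mean arrival delay per airline,
# sorted from the largest mean delay to the smallest
delay_by_airline <- flights_named %>%
  group_by(name) %>%
  summarise(
    n_flights = n(),
    mean_arr_delay = mean(arr_delay, na.rm = TRUE)
  ) %>%
  arrange(desc(mean_arr_delay))

print(delay_by_airline)
```

A left join keeps every row of flights even if a carrier code had no match in airlines, so the joined table has the same number of rows as the original.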