RStudio and Big Data Assignment

Project Description:

The goal of this project is to perform data analysis using R. The techniques and skills we have introduced in class will help you along the way.

Datasets: the data frames flights and airlines in the nycflights13 package. Other data can be integrated when needed.

In this project, you will need to read in the given dataset in RStudio and then perform the following data analysis using R.
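For this package, "reading in" the data amounts to loading nycflights13, because flights and airlines ship as ready-made data frames. The sketch below assumes the package is already installed (for example with install.packages("nycflights13")). The short names fl and al are hypothetical and only for convenience.

```r
# Assumption: nycflights13 is installed; its datasets are lazily loaded with the package.
library(nycflights13)

# Optional short local copies (hypothetical names).
fl <- flights
al <- airlines

dim(fl)               # number of rows (flights) and columns (variables)
str(fl)               # variable names and their types
head(al)              # first rows of the carrier lookup table
summary(fl$distance)  # quick numeric summary of one variable
```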

Part I. Reading in the dataset and basic analysis

Part II. Visualizing relationships between pairs of variables

Part III. Manipulating/joining/transforming data

Part IV. Summarizing data

For each of the four topics above, please design five interesting questions or tasks, and run R commands to answer the questions or complete the tasks.
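As an illustration of Parts III and IV, the sketch below joins flights to airlines on their shared carrier column, derives a new variable, and summarizes by group. It uses base R (merge, aggregate, tapply) rather than dplyr; the assignment does not prescribe either. The 15-minute lateness threshold and the names flights_named, late15, delay_by_airline and late_share are hypothetical choices for the example.

```r
library(nycflights13)

# Part III: left join so every flight keeps its row and gains the airline's full name.
flights_named <- merge(as.data.frame(flights), as.data.frame(airlines),
                       by = "carrier", all.x = TRUE)

# Part III: derive a new variable (hypothetical threshold: departure delay of 15+ minutes).
flights_named$late15 <- flights_named$dep_delay >= 15

# Part IV: mean departure delay per airline; the formula method drops rows with NA delay.
delay_by_airline <- aggregate(dep_delay ~ name, data = flights_named, FUN = mean)
delay_by_airline <- delay_by_airline[order(-delay_by_airline$dep_delay), ]
print(head(delay_by_airline))

# Part IV: share of late departures per origin airport, ignoring missing delays.
late_share <- tapply(flights_named$late15, flights_named$origin, mean, na.rm = TRUE)
print(round(late_share, 3))
```

The same steps can be written with dplyr (left_join, mutate, group_by, summarise) if that style was used in class.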

Sample Questions:
Q1. What are the subjects of this dataset?
Q2. How many subjects are there?
Q3. What type of variable is origin?
Q4. What is the most common origin?
Q5. What is the minimum distance?
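A minimal base-R sketch for the sample questions, assuming nycflights13 is installed. In flights, each row (subject) is one flight that departed a New York City airport (JFK, LGA or EWR) in 2013. The variable names used to hold the answers are hypothetical.

```r
library(nycflights13)

# Q1. Subjects: one row per flight; list the variables recorded for each flight.
print(names(flights))

# Q2. Number of subjects = number of rows.
n_flights <- nrow(flights)
print(n_flights)

# Q3. Type of the origin variable (a character vector of airport codes).
origin_type <- class(flights$origin)
print(origin_type)

# Q4. Most common origin: the level with the largest count.
origin_counts <- table(flights$origin)
most_common_origin <- names(which.max(origin_counts))
print(most_common_origin)

# Q5. Minimum distance.
min_distance <- min(flights$distance, na.rm = TRUE)
print(min_distance)
```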

Sample Tasks:
T1. Produce a table to summarize the origin variable.
T2. Produce a bar chart of the origin variable.
T3. Calculate the mean and standard deviation of the distance variable.
T4. Produce a histogram of the distance variable.
T5. Visualize the relationship between arr_time and sched_arr_time using a scatterplot.
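The sample tasks can be sketched with base R graphics, assuming nycflights13 is installed; ggplot2 would work equally well. The bin count, titles, and axis labels are illustrative choices. Distance is recorded in miles, and arr_time and sched_arr_time are stored as HHMM clock times.

```r
library(nycflights13)

# T1. Frequency table of origin, with counts and proportions.
origin_tab <- table(flights$origin)
print(origin_tab)
print(round(prop.table(origin_tab), 3))

# T2. Bar chart of origin.
barplot(origin_tab, main = "Flights by origin airport",
        xlab = "Origin", ylab = "Number of flights")

# T3. Mean and standard deviation of distance.
dist_mean <- mean(flights$distance, na.rm = TRUE)
dist_sd   <- sd(flights$distance, na.rm = TRUE)
print(c(mean = dist_mean, sd = dist_sd))

# T4. Histogram of distance (50 bins is an arbitrary choice).
hist(flights$distance, breaks = 50, main = "Distribution of flight distance",
     xlab = "Distance (miles)")

# T5. Scatterplot; points with a missing arr_time (e.g. cancelled flights) are not drawn.
plot(flights$sched_arr_time, flights$arr_time, pch = ".",
     xlab = "Scheduled arrival (HHMM)", ylab = "Actual arrival (HHMM)",
     main = "arr_time vs sched_arr_time")
```

Because HHMM values are not continuous in time (1259 is followed by 1300), converting both variables to minutes past midnight, for example `x %/% 100 * 60 + x %% 100`, may give a cleaner scatterplot.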

Project Report

The report should include, but is not limited to, the following components.

1. Title and author(s).

2. Project Description:

- Give a brief introduction to the project goal.

- Describe the basic information about the dataset (for example, its source, number of observations, and variables).

3. Data Analysis and Results:

- In this section, please document all of the data analysis in the following four parts. For each part, document: the specific questions/tasks; the R commands you used to answer the questions or complete the tasks; and the answers/results of running those commands (tables, figures, or values).

  - Part I. Reading in the dataset and basic analysis
      - Question/Task…
      - Commands…
      - Results…

  - Part II. Visualizing relationships between pairs of variables
      - Question/Task…
      - Commands…
      - Results…

  - Part III. Manipulating/joining/transforming data
      - Question/Task…
      - Commands…
      - Results…

  - Part IV. Summarizing data
      - Question/Task…
      - Commands…
      - Results…