Stratford University Amount of Water in Bottle Central Limit Theorem Discussion.
Central Limit Theorem
Initial Post (250+ words):
Collect some quantitative data. Find the sample mean and standard deviation. Plot the data in a histogram. Does the data seem to follow the bell curve of the normal distribution? What features of the data do or do not fit the shape of the normal curve? How much deviation from the curve is to be expected?
Now perform a normality test on your data (Shapiro-Wilk test: http://sdittami.altervista.org/shapirotest/ShapiroTest.html or http://www.brianreedpowers.com/MAT240/stats/descriptiveStats.html). The test will give you a p-value. A high p-value means the test found little evidence against normality; a low p-value suggests the data were not drawn from a normal distribution. Based on the test, do you think your data could have been drawn from a normal distribution?
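The descriptive steps in the initial post can be sketched in Python using only the standard library. The data values below are hypothetical placeholders, and the helper name text_histogram and the choice of 5 bins are assumptions made for illustration, not part of the assignment.

```python
import statistics

# Hypothetical measurements (e.g. millilitres of water per bottle); replace with your own data.
data = [498.2, 501.5, 499.8, 500.4, 502.1, 497.6, 500.9, 499.1, 501.0, 500.2]

mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)


def text_histogram(values, bins=5):
    """Return a list of (bin_start, bin_end, count) tuples using equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value would land just past the last bin, so clamp it in.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, c) for i, c in enumerate(counts)]


print(f"mean = {mean:.2f}, sd = {sd:.2f}")
for start, end, count in text_histogram(data):
    print(f"{start:7.2f} to {end:7.2f} | {'*' * count}")
```

If SciPy happens to be installed, scipy.stats.shapiro(data) returns the Shapiro-Wilk statistic and its p-value, which is one alternative to the web calculators linked above.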
Responses (100+ words x2):
Choose two of your classmates’ data sets. Take 30 random samples of 5 data points each (one way: paste the data at http://www.randomizelist.com/, randomize the list, and take the first 5 numbers; or use the sampling feature at http://www.brianreedpowers.com/MAT240/stats/descriptiveStats.html), and calculate the average of each sample. You will now have 30 sample means. Create and post a histogram of your sample means. What is the mean of these means? What is their standard deviation? Does this make sense based on the Central Limit Theorem? Do the sample means follow a normal distribution? What p-value does the normality test give? How and why does this differ from the original data?
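The resampling procedure in the response prompt could also be done in code rather than with the list randomizer. The following is a minimal sketch, assuming a hypothetical data set and a fixed random seed; the variable names are illustrative only.

```python
import random
import statistics

# Hypothetical stand-in for a classmate's data set; paste the real values here.
data = [61, 64, 58, 70, 66, 63, 59, 72, 65, 62, 68, 60, 67, 64, 63,
        71, 57, 66, 62, 69, 64, 61, 65, 63, 60, 74, 66, 62, 59, 67]

random.seed(0)  # fixed seed so the run is repeatable; any seed works

SAMPLES = 30      # number of samples, as in the prompt
SAMPLE_SIZE = 5   # data points per sample, as in the prompt

# random.sample draws without replacement, like taking the first 5 of a shuffled list.
sample_means = [
    statistics.mean(random.sample(data, SAMPLE_SIZE)) for _ in range(SAMPLES)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Central Limit Theorem benchmark: the sd of a sample mean is roughly sd / sqrt(n).
expected_sd = statistics.stdev(data) / SAMPLE_SIZE ** 0.5

print(f"mean of sample means: {mean_of_means:.2f} (data mean {statistics.mean(data):.2f})")
print(f"sd of sample means:   {sd_of_means:.2f} (CLT benchmark {expected_sd:.2f})")
```

The mean of the sample means should land near the mean of the original data, and their standard deviation should be noticeably smaller than that of the original data, close to the benchmark printed beside it.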
Classmate #1 to respond to:
Because I am curious about statistics on income and economic improvement in the USA, I collected Personal Income (in millions of dollars) by state for 2018. My purpose in studying this data is to see which states are getting a boost in Personal Income and which states still need attention and government funds to improve their economies.
What is Personal Income by State?
Personal Income by State is the income that people living in each state and the District of Columbia receive from wages, proprietors’ income, dividends, interest, rents, and government benefits. These statistics help assess and compare the economic well-being of state residents. The data comes from the Bureau of Economic Analysis, whose website is an official website of the United States government. The Bureau collects this data for all 50 states.
The mean Personal Income [Millions of dollars] in all 50 states of the United States for the year 2018 is 344567.2.
The standard deviation of Personal Income [Millions of dollars] in all 50 states of the United States for the year 2018 is 432699.8.
The data doesn’t seem to follow the bell curve of the normal distribution.
Because Personal Income is very low for some states and very high for others, the data is not normally distributed and does not fit the shape of the normal curve.
The data does not follow the curve. Its standard deviation of 432699.8 is larger than its mean of 344567.2, which for strictly positive values such as income suggests a strongly right-skewed distribution.
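For the resamples of 5 described in the response prompt, the Central Limit Theorem gives a rough benchmark for what the sample means from this data set should look like. The sketch below uses only the mean and standard deviation reported in the post; the function name standard_error is illustrative, and the calculation ignores the finite-population correction, so it is an approximation rather than an exact prediction.

```python
import math

# Values reported in the post (millions of dollars).
reported_mean = 344567.2
reported_sd = 432699.8
sample_size = 5  # size of each resample in the response exercise


def standard_error(sd, n):
    """Approximate standard deviation of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


se = standard_error(reported_sd, sample_size)
print(f"expected mean of sample means: about {reported_mean:.1f}")
print(f"expected sd of sample means:   about {se:.1f}")
```

Because the samples hold only 5 values each and the original data is far from normal, the sample means may still look skewed even though their spread shrinks.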
Classmate #2 to respond to:
A survey was posted to social media to collect the heights of 30 people in inches for study. This is the data set that was used for analysis this week on normally distributed data. The sample size of 30 was chosen specifically because it is hypothesized that this is a large enough sample size to apply the Central Limit Theorem.
The sample mean and standard deviation were calculated. The mean, or average of the data, was found to be 63.6 inches, and the sample standard deviation was calculated to be 9.1 inches. To understand the shape of the distribution, a histogram was constructed and is presented below. Since there were 30 data points, it was calculated that 6 bins should be used, each with a width of 6.2 inches. This is demonstrated in the figure.
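The post does not say which rule produced 6 bins. One common choice, Sturges' rule, does give 6 bins for 30 data points, so it is used in the sketch below as an assumption rather than as the author's stated method. The function name sturges_bins is illustrative.

```python
import math

n = 30  # number of height measurements reported in the post


def sturges_bins(count):
    """Sturges' rule: number of bins = ceil(1 + log2(count))."""
    return math.ceil(1 + math.log2(count))


bins = sturges_bins(n)
bin_width = 6.2                    # width reported in the post, in inches
implied_range = bins * bin_width   # span the bins would cover, in inches

print(bins, round(implied_range, 1))
```

Multiplying the bin count by the reported width gives the span the histogram covers, which is a quick way to check the bin settings against the minimum and maximum of the data.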
There is a slight bell-curve shape to the data, but the area in the right tail is too large to follow a strict bell-shaped pattern. We would expect very few counts in the two outer tails and a single central mode near the mean; the data here has two modes. The sample size is only 30, which is somewhat small, but large enough that one might expect the data to look approximately normal, especially for a measurement like height. However, small samples are subject to more sampling error, so the sample size could be what is causing the non-normal appearance of the data.
A normality test was performed for this data set using the Shapiro-Wilk test. This test evaluates the null hypothesis that the data is normally distributed against the alternative hypothesis that the data is not normally distributed. The calculated p-value was 0.939, tested at the 0.10 significance level. Because this p-value is well above 0.10, the null hypothesis is not rejected, and the data is consistent with having been drawn from a normal distribution.
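The comparison between the p-value and the significance level can be written out as a short decision rule. This is a minimal sketch using the two numbers reported in the post; the function name normality_decision is illustrative.

```python
def normality_decision(p_value, alpha):
    """Return the conclusion of a normality test at significance level alpha."""
    if p_value < alpha:
        return "reject normality"
    return "fail to reject normality"


# Values reported in the post.
print(normality_decision(p_value=0.939, alpha=0.10))
```

Failing to reject does not prove the data is normal; it only means the test found no strong evidence against normality in this sample.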