Question

Dataset 1
		Assignment : Hypothesis Testing
		Type the last three digits of your student number in the green cell:
								724
		 XXXXXXXXXX
		 XXXXXXXXXX		328
		Dataset 1
			7
	20		Groups			Frequencies
	0.1		300	to	307	12	 XXXXXXXXXX
	0.8		307	to	314	18	 XXXXXXXXXX
	62	0.67	314	to	321	44	 XXXXXXXXXX
	0.8	0.76	321	to	328	88	 XXXXXXXXXX
		1	328	to	335	86	 XXXXXXXXXX
		0.73	335	to	342	41	 XXXXXXXXXX
	 XXXXXXXXXX	0.45	342	to	349	15	 XXXXXXXXXX
	 XXXXXXXXXX	0.42	349	to	356	9	 XXXXXXXXXX
	 XXXXXXXXXX	0.14
	 XXXXXXXXXX	0
	 XXXXXXXXXX
	 XXXXXXXXXX
	 XXXXXXXXXX
	 XXXXXXXXXX
Dataset 2
	 XXXXXXXXXX	Assignment : Hypothesis Testing
	 XXXXXXXXXX
	 XXXXXXXXXX	Dataset 2
	20
	0.1	Part (a)
		207.20	202.13	196.93	198.16	197.74	198.15
	0.8	207.65	203.68	197.13	197.06	196.60	197.55
		208.93	202.22	198.63	197.09	197.40	198.36
		207.51	201.32	197.97	198.31	198.04	198.78
		206.02	200.07	196.67	199.85	199.05	200.31
		205.84	199.09	197.67	198.40	200.32	199.29
		204.36	198.89	196.90	197.34	199.11	200.46
		Part (b)
		203.64	197.56	198.07	198.70	198.13	202.00
		203.23	198.43	199.61	197.65	198.25	200.55
		198.56	199.07	199.70	199.13	203.00	204.23
		 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
		 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
		 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
		 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
		 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
		 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
		 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
		 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
		 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
		 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
Dataset 3
	 XXXXXXXXXX	Assignment : Hypothesis Testing
	 XXXXXXXXXX
	 XXXXXXXXXX	Dataset 3
	20
	0.1	12
	1500	List (a)
	10	1571.96	1521.34	1469.33	1481.63	1477.40	1481.47	1500.52
	0.8	1576.48	1536.76	1471.25	1470.57	1465.99	1475.55	1470.84
		1589.26	1522.16	1486.27	1470.91	1473.99	1483.60	1504.37
		1575.09	1513.18	1479.72	1483.13	1480.39	1487.82
		List (b)
		1548.18	1488.69	1454.72	1486.54	1478.50	1491.10	1477.71
		1546.39	1478.92	1464.75	1472.01	1491.17	1480.93	1489.03
		1531.61	1476.90	1457.05	1461.36	1479.14	1492.60	1472.54
		1524.35	1463.57	1468.71	1474.99	1469.29	1508.04	1484.82
		1520.35	1472.29	1484.15	1464.52	1470.48	1493.46
		 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
		 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
		 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
		 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
		 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
		 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
		 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
		 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
		 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
Dataset 4
	 XXXXXXXXXX
	 XXXXXXXXXX	Assignment : Hypothesis Testing
	20	Dataset 4
	0.1
	0.8		Resistance:
	14
	0.8		Motor running	Motor not running
		0.86	15.72	15.12	 XXXXXXXXXX	0.2
		0.9	15.80	15.18	 XXXXXXXXXX	0.18
	 XXXXXXXXXX	1	16.00	15.27	 XXXXXXXXXX	0.07
	 XXXXXXXXXX	0.88	15.76	15.10	 XXXXXXXXXX	0.14
	 XXXXXXXXXX	0.76	15.52	14.74	 XXXXXXXXXX	0.02
	 XXXXXXXXXX	0.75	15.50	14.74	 XXXXXXXXXX	0.04
	 XXXXXXXXXX	0.63	15.26	14.62	 XXXXXXXXXX	0.16
	 XXXXXXXXXX	0.57	15.14	14.45	 XXXXXXXXXX	0.11
	 XXXXXXXXXX	0.54	15.08	14.28	 XXXXXXXXXX	0
	 XXXXXXXXXX	0.45	14.90	14.18	 XXXXXXXXXX	0.08
	 XXXXXXXXXX	0.57	15.14	14.36	 XXXXXXXXXX	0.02
	 XXXXXXXXXX	0.45	14.90	14.21	 XXXXXXXXXX	0.11
	 XXXXXXXXXX	0.38	14.76	14.20	 XXXXXXXXXX	0.24
	 XXXXXXXXXX	0.28	14.56	13.88	 XXXXXXXXXX	0.12
	 XXXXXXXXXX	0.2	14.40	13.63	 XXXXXXXXXX	0.03
	 XXXXXXXXXX	0.18	14.36	13.59	 XXXXXXXXXX	0.03
	 XXXXXXXXXX	0.07	14.14	13.47	 XXXXXXXXXX	0.13
	 XXXXXXXXXX	0.14	14.28	13.74	 XXXXXXXXXX	0.26
	 XXXXXXXXXX	0.72	15.44	14.78	 XXXXXXXXXX	0.14
	 XXXXXXXXXX	0.72	15.44	14.69	 XXXXXXXXXX	0.05
Dataset 5
	 XXXXXXXXXX
	 XXXXXXXXXX	Assignment : Hypothesis Testing
	0.8
	28	Dataset 5		132
	0.2			0.15
				4.8
	62		Additive	Yield
	 XXXXXXXXXX	1	77	119.68	 XXXXXXXXXX	0.48
	0.2	 XXXXXXXXXX	74.37	119.26	 XXXXXXXXXX	0.31
	62.2	 XXXXXXXXXX	75.1	119.78	 XXXXXXXXXX	0.44
	 XXXXXXXXXX	 XXXXXXXXXX	76.07	119.29	 XXXXXXXXXX	0.37
	 XXXXXXXXXX	 XXXXXXXXXX	73.74	118.88	 XXXXXXXXXX	0.21
	 XXXXXXXXXX	 XXXXXXXXXX	72.83	119.49	 XXXXXXXXXX	0.31
	 XXXXXXXXXX	 XXXXXXXXXX	71.81	118.92	 XXXXXXXXXX	0.16
	 XXXXXXXXXX	 XXXXXXXXXX	70.94	118.72	 XXXXXXXXXX	0.09
	 XXXXXXXXXX	 XXXXXXXXXX	68.53	119.08	 XXXXXXXXXX	0.09
	 XXXXXXXXXX	 XXXXXXXXXX	65.94	119.04	 XXXXXXXXXX	0
	 XXXXXXXXXX	 XXXXXXXXXX	68.11	119.29	 XXXXXXXXXX	0.12
	 XXXXXXXXXX	 XXXXXXXXXX	70.43	119.85	 XXXXXXXXXX	0.31
	 XXXXXXXXXX	 XXXXXXXXXX	70.61	120.5	 XXXXXXXXXX	0.45
	 XXXXXXXXXX	 XXXXXXXXXX	69.07	120.06	 XXXXXXXXXX	0.31
	 XXXXXXXXXX
	 XXXXXXXXXX
	 XXXXXXXXXX
Dataset 6
		Assignment : Hypothesis Testing
	15	Dataset 6
	0.12
			G1	G2	G3
		A	10	13	14
		B	18	11	6
	School	C	16	20	17
		D	12	25	13
		E	5	22	14
		 XXXXXXXXXX
		 XXXXXXXXXX	0.8	 XXXXXXXXXX	 XXXXXXXXXX
		 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
			 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
			 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
			 XXXXXXXXXX	 XXXXXXXXXX	 XXXXXXXXXX
				 XXXXXXXXXX
			 XXXXXXXXXX
			 XXXXXXXXXX
Reference
			3
			724
		Student numbers	Last3		Seed value
		B XXXXXXXXXX	204	1	0.8
		B XXXXXXXXXX	224	1
		B XXXXXXXXXX	476	1
		B XXXXXXXXXX	935	0
		B XXXXXXXXXX	479	1
		B XXXXXXXXXX	662	1
		B XXXXXXXXXX	463	1
		B XXXXXXXXXX	837	0
		B XXXXXXXXXX	309	1
		B XXXXXXXXXX	219	1
		B XXXXXXXXXX	85	1
		B XXXXXXXXXX	353	1
		B XXXXXXXXXX	414	1
		B XXXXXXXXXX	800	0
		B XXXXXXXXXX	347	1
		B XXXXXXXXXX	307	1
		B XXXXXXXXXX	44	1
		B XXXXXXXXXX	110	1
		B XXXXXXXXXX	967	0
		B XXXXXXXXXX	464	1
		B XXXXXXXXXX	570	1
		B XXXXXXXXXX	650	1
		B XXXXXXXXXX	882	0
		B XXXXXXXXXX	304	1
		B XXXXXXXXXX	276	1
		B XXXXXXXXXX	488	1
		B XXXXXXXXXX	585	1
		B XXXXXXXXXX	295	1
		B XXXXXXXXXX	300	1
		B XXXXXXXXXX	346	1
		B XXXXXXXXXX	534	1
		B XXXXXXXXXX	448	1
		B XXXXXXXXXX	233	1
		B XXXXXXXXXX	32	1
		B XXXXXXXXXX	458	1
		B XXXXXXXXXX	56	1
		B XXXXXXXXXX	582	1
		B XXXXXXXXXX	439	1
		B XXXXXXXXXX	196	1
		B XXXXXXXXXX	627	1
		B XXXXXXXXXX	455	1
		B XXXXXXXXXX	814	0
		B XXXXXXXXXX	322	1
		B XXXXXXXXXX	901	0
		B XXXXXXXXXX	724	0
		B XXXXXXXXXX	328	1
		B XXXXXXXXXX	853	0
		B XXXXXXXXXX	7	1
		B XXXXXXXXXX	463	1
		B XXXXXXXXXX	522	1
		B XXXXXXXXXX	878	0
		B XXXXXXXXXX	983	0
		B XXXXXXXXXX	503	1
		B XXXXXXXXXX	367	1
		B XXXXXXXXXX	975	0
		B XXXXXXXXXX	51	1
		B XXXXXXXXXX	146	1
		B XXXXXXXXXX	765	0
		B XXXXXXXXXX	210	1
		B XXXXXXXXXX	959	0
		B XXXXXXXXXX	834	0
		B XXXXXXXXXX	572	1
		B XXXXXXXXXX	67	1
		B XXXXXXXXXX	640	1
		B XXXXXXXXXX	863	0
		B XXXXXXXXXX	876	0
		B XXXXXXXXXX	39	1
		B XXXXXXXXXX	956	0
		B XXXXXXXXXX	73	1
		B XXXXXXXXXX	969	0
		B XXXXXXXXXX	10	1
		B XXXXXXXXXX	688	1
		B XXXXXXXXXX	187	1
		B XXXXXXXXXX	882	0
		B XXXXXXXXXX	112	1
		B XXXXXXXXXX	282	1
		B XXXXXXXXXX	654	1
		B XXXXXXXXXX	373	1
		B XXXXXXXXXX	176	1
		B XXXXXXXXXX	24	1
		B XXXXXXXXXX	972	0
		B XXXXXXXXXX	339	1
		B XXXXXXXXXX	312	1
		B XXXXXXXXXX	10	1
		B XXXXXXXXXX	95	1
		B XXXXXXXXXX	610	1
		B XXXXXXXXXX	198	1
		B XXXXXXXXXX	430	1
		B XXXXXXXXXX	754	0
		B XXXXXXXXXX	841	0
		B XXXXXXXXXX	311	1
		B XXXXXXXXXX	946	0
		B XXXXXXXXXX	852	0
		B XXXXXXXXXX	676	1
		B XXXXXXXXXX	770	0
		B XXXXXXXXXX	18	1
		B XXXXXXXXXX	255	1
		B XXXXXXXXXX	502	1
		B XXXXXXXXXX	838	0
		B XXXXXXXXXX	111	1
		B XXXXXXXXXX	30	1
 
Statistics and Probability
Assignment on Hypothesis Testing
January 30, 2023
Instructions
This document contains the questions for your assignment project
on Statistical Testing. The questions refer to the data given in the
individual worksheets in Excel document ‘Assignment Datasets.xlsx’.
Please read the following points.
1. All submissions must be in the form of PDF documents. Spreadsheets
exported to PDF will be accepted, but calculations must
be annotated or explained.
2. It is up to you how you do the calculations in each question, but
you must explain how you arrived at your answer for any given
calculation. This can be done with a written explanation and
by using the relevant equations, along with showing the results
of intermediate stages of the calculations. In other words, you
need to show that you know how to do a calculation for a statistic
other than using spreadsheet functions.
3. Each one of the questions involves a statistical test. Marks within
each question will generally be awarded for:
• Deciding which statistical test to use,
• Framing your Hypotheses and proper conclusions,
• Identifying the parameters for the test and
• Showing a reasonable level of clarity, detail and explanation
in the calculations needed to carry out the test.
4. The data you have been given is in the worksheets of an Excel
spreadsheet. This spreadsheet is locked against editing. Please
do not try to circumvent this; if you wish to use a spreadsheet to
do your calculations, you should copy and paste your data into
your own spreadsheet and work with that.
Question 1
The lifetimes (in units of 10^6 seconds) of certain satellite components
are shown in the frequency distribution given in ‘Dataset1’.
1. Draw a frequency polygon, histogram and cumulative frequency
polygon for the data.
2. Calculate the frequency mean, the frequency standard deviation,
the median and the first and third quartiles for this grouped data.
3. Compare the median and the mean and state what this indicates
about the distribution. Comment on how the answer to this question
relates to your frequency polygon and histogram.
4. Explain the logic behind the equations for the mean and standard
deviation for grouped data, starting from the original equations
for a simple list of data values. (This does not just mean ‘explain
how the equations are used’.)
5. Carry out an appropriate statistical test to determine whether the
data is normally distributed.
Question 2
A manufacturer of metal plates makes two claims concerning the
thickness of the plates they produce. They are stated here:
• Statement A: The mean is 200 mm.
• Statement B: The variance is 1.5 mm^2.
To investigate Statement A, the thickness of a sample of metal plates
produced in a given shift was measured. The values found are listed
in Part (a) of worksheet ‘Dataset2’, with millimetres (mm) as unit.
1. Calculate the sample mean and sample standard deviation for the
data in Part (a) of ‘Dataset2’. Explain why we are using the phrase
‘sample’ mean or ‘sample’ standard deviation.
2. Set up the framework of an appropriate statistical test on Statement A.
Explain how knowing the sample mean before carrying
out the test will influence the structure of your test.
3. Carry out the statistical test and state your conclusions.
To investigate the second claim, the thickness of a second sample of
metal sheets was measured. The values found are listed in Part (b) of
worksheet ‘Dataset2’, with millimetres (mm) as unit.
1. Calculate the sample mean and then the sample variance and
standard deviation for the data in Part (b).
2. Set up the framework of an appropriate statistical test on Statement B.
Explain how knowing the sample variance before carrying
out the test would influence the structure of your test.
3. Carry out the statistical test and state your conclusions.
Question 3
A manager of an inter-county hurling team is concerned that his team
lose matches because they ‘fade away’ in the last ten minutes. He
has measured GPS data showing how much ground particular players
cover within a given time period; this is the data in list (a) in worksheet
‘Dataset3’. He has acquired the corresponding data from an opposing,
more successful team, which is given in list (b).
1. Calculate the sample mean and sample standard deviation for the
two sets of data.
2. Set up the framework of an appropriate statistical test to determine
whether there is a difference in the distances covered by the
two groups of players.
3. Explain how having the results of the calculations above in advance
of doing your statistical test will influence the structure of
that test.
4. Carry out the statistical test and state your conclusions.
Question 4
A study was carried out to determine whether the resistance of the
control circuits in a machine is lower when the machine motor is
running. To investigate this question, a set of the control circuits was
tested as follows. Their resistance was measured while the machine
motor was not running for a certain period of time and then again
while the motor was running. The values found are listed in worksheet
‘Dataset4’, with kilo-Ohms as the unit of measurement.
1. Set up the structure of an appropriate statistical test to determine
whether the resistance of the control circuits in a machine is
lower when the machine motor is running.
2. Explain how the order of subtraction chosen to calculate the
differences will influence the structure of the test.
3. Give a reason why the data is measured with the engine not running
first and then with the engine running.
4. Explain how knowing the mean of the differences in advance will
influence the structure of your statistical test.
5. Carry out the statistical test and state your conclusions.
Question 5
A study was carried out to determine the influence of a trace element
found in soil on the yield of potato plants grown in that soil, defined as
the weight of potatoes produced at the end of the season. A large field
was divided up into 14 smaller sections for this experiment. For each
section, the experimenter recorded the amount of the trace element
found (in milligrams per metre squared) and the corresponding weight
of the potatoes produced (in kilograms). This information is presented
in the worksheet ‘Dataset5’ in the Excel document. Define X as the
trace element amount and Y as the yield.
1. Draw a scatterplot of your data set.
2. Calculate the coefficients of a linear equation to predict the yield
Y as a function of X.
3. Calculate the correlation coefficient for the paired data values.
4. Set up the framework for an appropriate statistical test to establish
if there is a correlation between the amount of the trace element
and the yield. Explain how having the scatterplot referred
to above and having the value of r in advance will influence the
structure of your statistical test.
5. Carry out and state the conclusion of your test on the correlation.
6. Comment on how well the regression equation will perform based
on the results above.
Question 6
A multinational corporation is conducting a study to see how its
employees in five different countries respond to three gifts in an incentive
scheme. The numbers of employees who choose each of the three gifts
(G1 to G3) in each of the five countries (A to E) are given in the table
in ‘Dataset6’ in the Excel document.
1. Set up the structure of an appropriate statistical test to determine
whether the data supports a link between choice of gift and
country, including the statistic to be used.
2. Carry out this test, showing clearly in your work how the expected
values are calculated for your test statistic.

Atul · Accepted Answer

Question 1 
Groups Frequencies 
300 to 307 12 
307 to 314 18 
314 to 321 44 
321 to 328 88 
328 to 335 86 
335 to 342 41 
342 to 349 15 
349 to 356 9
The lifetimes (in units of 10^6 seconds) of certain satellite components are shown in the 
frequency distribution given in ‘Dataset1’.
 1. Draw a frequency polygon, histogram and cumulative frequency polygon for the 
data.
To draw the frequency polygon, we first need to calculate the midpoints of each group: 
 
Groups Frequencies Midpoints 
300 to 307 12 303.5 
307 to 314 18 310.5 
314 to 321 44 317.5 
321 to 328 88 324.5 
328 to 335 86 331.5 
335 to 342 41 338.5 
342 to 349 15 345.5 
349 to 356 9 352.5
Histogram 

Finally, to draw the cumulative frequency polygon, we need to calculate the cumulative 
frequencies:
 
 
Groups Midpoints Frequencies Cumulative Frequencies 
300-307 303.5 12 12 
307-314 310.5 18 30 
314-321 317.5 44 74 
321-328 324.5 88 162 
328-335 331.5 86 248 
335-342 338.5 41 289 
342-349 345.5 15 304 
349-356 352.5 9 313
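As a cross-check on the two tables above, the midpoints and cumulative frequencies can be generated from the class boundaries. This is a minimal Python sketch; the variable names are illustrative and are not part of the assignment spreadsheet.

```python
from itertools import accumulate

# Class lower boundaries, class width and frequencies copied from Dataset 1.
lower_bounds = [300, 307, 314, 321, 328, 335, 342, 349]
class_width = 7
frequencies = [12, 18, 44, 88, 86, 41, 15, 9]

# Midpoint of each class: lower boundary plus half the class width.
midpoints = [lo + class_width / 2 for lo in lower_bounds]

# Running total of the frequencies (used for the cumulative frequency polygon).
cumulative = list(accumulate(frequencies))

for lo, mid, f, cf in zip(lower_bounds, midpoints, frequencies, cumulative):
    print(f"{lo}-{lo + class_width}  {mid:6.1f}  {f:3d}  {cf:4d}")
```

The last cumulative value is the total frequency N, which every later calculation for this question relies on.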
2. Calculate the frequency mean, the frequency standard deviation, the median and the 
first and third quartiles for this grouped data.
To calculate the mean, we need to use the midpoint of each group and the frequency of each 
group:
Midpoint    Frequency    Midpoint * Frequency 
---------------------------------------------- 
  303.5         12           3,642.0 
  310.5         18           5,589.0 
  317.5         44          13,970.0 
  324.5         88          28,556.0 
  331.5         86          28,509.0 
  338.5         41          13,878.5 
  345.5         15           5,182.5 
  352.5          9           3,172.5 
---------------------------------------------- 
                             102,499.5
The total frequency is N = 313 (the last cumulative frequency), so the frequency mean is: 
Mean = 102,499.5 / 313 ≈ 327.47 
To calculate the frequency standard deviation, we need to find the variance first. We can use 
the formula: 
Variance = (Σ (f * (x - mean)²)) / (N - 1) 
where f is the frequency, x is the midpoint, mean is the frequency mean, and N is the total 
frequency. 
Here N = 313 and mean = 102,499.5 / 313 ≈ 327.47. The squares below use the unrounded 
mean; the displayed values are rounded to two decimal places. 
Midpoint    Frequency    x - mean      (x - mean)²    f * (x - mean)² 
-------------------------------------------------------------------- 
  303.5         12        -23.97         574.77          6897.29 
  310.5         18        -16.97         288.13          5186.37 
  317.5         44         -9.97          99.49          4377.54 
  324.5         88         -2.97           8.85           778.56 
  331.5         86          4.03          16.21          1393.64 
  338.5         41         11.03         121.56          4984.08 
  345.5         15         18.03         324.92          4873.81 
  352.5          9         25.03         626.28          5636.51 
-------------------------------------------------------------------- 
                                               Σ          34127.80 
Variance = 34127.80 / (313 - 1) ≈ 109.38 
Standard deviation = √109.38 ≈ 10.46 
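The grouped mean and standard deviation can be recomputed directly from the midpoints and frequencies as a check on the hand calculation. The sketch below assumes the sample form of the variance (dividing by N - 1), as in the formula above; `grouped_mean_sd` is a hypothetical helper name chosen for illustration.

```python
import math

# Midpoints and frequencies copied from the Dataset 1 table.
midpoints = [303.5, 310.5, 317.5, 324.5, 331.5, 338.5, 345.5, 352.5]
frequencies = [12, 18, 44, 88, 86, 41, 15, 9]


def grouped_mean_sd(xs, fs):
    """Return (N, mean, sample variance, sample SD) for grouped data."""
    n = sum(fs)                                              # total frequency
    mean = sum(f * x for x, f in zip(xs, fs)) / n            # weighted mean of midpoints
    ss = sum(f * (x - mean) ** 2 for x, f in zip(xs, fs))    # weighted squared deviations
    variance = ss / (n - 1)                                  # sample variance (N - 1)
    return n, mean, variance, math.sqrt(variance)


n, mean, variance, sd = grouped_mean_sd(midpoints, frequencies)
print(f"N = {n}, mean = {mean:.2f}, variance = {variance:.2f}, sd = {sd:.2f}")
```

Keeping the mean unrounded inside the function avoids the small drift that appears when rounded deviations are squared by hand.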
To calculate the median, we need the cumulative frequencies: 
Midpoint    Frequency    Cumulative frequency 
---------------------------------------------- 
  303.5         12                12 
  310.5         18                30 
  317.5         44                74 
  324.5         88               162 
  331.5         86               248 
  338.5         41               289 
  345.5         15               304 
  352.5          9               313 
The total frequency is N = 313, so the median is the value at a cumulative frequency of 
N / 2 = 156.5. This lies in the fourth group (321 to 328), where the cumulative frequency 
rises from 74 to 162. To estimate the median, we can use the formula: 
Median = L + ((N / 2 - CF(L-1)) / f) * w 
where L is the lower limit of the group that contains the median, N is the total frequency, 
CF(L-1) is the cumulative frequency up to the previous group, f is the frequency of the group 
that contains the median, and w is the width of the group. 
In this case, we have: 
L = 321 
N = 313 
CF(L-1) = 74 
f = 88 
w = 7
Median = 321 + ((313 / 2 - 74) / 88) * 7 ≈ 327.56 
To find the quartiles, we can use the formula: 
Q(n) = L + ((n / 4 * N) - CF(L-1)) / f * w 
where n is the quartile number (1 for the first quartile, 3 for the third quartile), L, CF(L-1) 
and f now refer to the group that contains the quartile, and w is the group width as before.
For the first quartile, we have: 
n = 1, so the target cumulative frequency is 313 / 4 = 78.25. This also lies in the fourth 
group (L = 321, CF(L-1) = 74, f = 88):
Q(1) = L + ((1 / 4 * 313) - CF(L-1)) / f * w 
     = 321 + ((0.25 * 313) - 74) / 88 * 7 
     ≈ 321.34 
For the third quartile: 
n = 3, so the target cumulative frequency is 3 * 313 / 4 = 234.75. This lies in the fifth 
group (L = 328, CF(L-1) = 162, f = 86):
Q(3) = L + ((3 / 4 * 313) - CF(L-1)) / f * w 
     = 328 + ((0.75 * 313) - 162) / 86 * 7 
     ≈ 333.92 
So, the estimated first quartile is 321.34, and the estimated third quartile is 333.92.
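The interpolation used for the median and quartiles is the same calculation with a different target cumulative frequency, so it can be sketched as one function. This is an illustrative Python sketch, assuming equal class widths; `grouped_quantile` is a hypothetical name, not something taken from the assignment.

```python
# Class lower boundaries, class width and frequencies copied from Dataset 1.
lower_bounds = [300, 307, 314, 321, 328, 335, 342, 349]
class_width = 7
frequencies = [12, 18, 44, 88, 86, 41, 15, 9]


def grouped_quantile(p, lows, width, freqs):
    """Estimate the p-quantile (0 <= p <= 1) by linear interpolation in its class."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    target = p * sum(freqs)      # target cumulative frequency, e.g. N/2 for the median
    cum_before = 0               # cumulative frequency up to the previous class
    for low, f in zip(lows, freqs):
        if f > 0 and cum_before + f >= target:
            # L + ((target - CF) / f) * w
            return low + (target - cum_before) / f * width
        cum_before += f
    raise ValueError("frequencies must contain at least one positive value")


q1 = grouped_quantile(0.25, lower_bounds, class_width, frequencies)
median = grouped_quantile(0.50, lower_bounds, class_width, frequencies)
q3 = grouped_quantile(0.75, lower_bounds, class_width, frequencies)
print(f"Q1 = {q1:.2f}, median = {median:.2f}, Q3 = {q3:.2f}")
```

The loop picks the class by comparing the target with the running cumulative frequency, which is the step most easily mistaken when done by eye from the table.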
3. Compare the median and the mean and state what this indicates about the 
distribution. Comment on how the answer to this question relates to your frequency 
polygon and histogram.
Comparison of Median and Mean
The median of the data set is 327.56, and the mean is 327.47. The two differ by less than 0.1, 
which is very small compared with the standard deviation of about 10.46, so the distribution 
is close to symmetric. The median being marginally larger than the mean points, if anything, 
to a very slight negative (left) skew. This is consistent with the frequency polygon and 
histogram, which show a single peak in the 321 to 335 range and tails of similar length on 
either side.
4. Explain the logic behind the equations for the mean and standard deviation for 
grouped data, starting from the original equations for a simple list of data values. (This 
does not just mean ’explain how the equations are used’.)
The equations for the mean and standard deviation for grouped data are modifications of the 
equations for the mean and standard deviation for a simple list of data values. The main 
difference is that the grouped data is divided into intervals, and the frequency of each interval 
is used to determine the weight of each interval in the calculation of the mean and standard 
deviation.
For the mean, the equation for grouped data is:
mean = Σ (midpoint * frequency) / Σ frequency 
where midpoint is the midpoint of each interval, and frequency is the frequency of each 
interval. The numerator represents the sum of the products of the midpoint and frequency of 
each interval, while the denominator represents the total frequency of all intervals. This 
equation is used to calculate the weighted average of the midpoints of the intervals, where the 
weight of each interval is its frequency.
For the standard deviation, the equation for grouped data is:
standard deviation = sqrt(Σ [(x - mean)^2 * frequency] / (Σ frequency - 1)) 
where x is the midpoint of each interval, mean is the mean of the data set, and frequency is 
the frequency of each interval. The numerator represents the sum of the products of the 
squared differences between the midpoint and the mean and the frequency of each interval, 
while the denominator represents the total frequency of all intervals minus one. This equation 
is used to calculate the weighted average of the squared deviations of the midpoints from the 
mean, where the weight of each interval is its frequency.
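Written out, the step from the list formulas to the grouped formulas is a regrouping of the sum: each of the values in a class is replaced by the class midpoint, so the identical terms collapse into a single term multiplied by the class frequency. The symbols m_i (midpoint), f_i (frequency) and k (number of classes) are notation introduced here for illustration.

```latex
% Raw data x_1, ..., x_N sorted into k classes; class i has midpoint m_i and frequency f_i.
\begin{align*}
\bar{x} &= \frac{1}{N}\sum_{j=1}^{N} x_j
  \approx \frac{1}{N}\sum_{i=1}^{k} f_i\, m_i,
  \qquad N = \sum_{i=1}^{k} f_i, \\
s^2 &= \frac{1}{N-1}\sum_{j=1}^{N} (x_j - \bar{x})^2
  \approx \frac{1}{N-1}\sum_{i=1}^{k} f_i\,(m_i - \bar{x})^2 .
\end{align*}
```

The approximation sign marks the only place where information is lost: the exact values inside each class are unknown, so the midpoint stands in for all of them.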
The modification of the equations is necessary because grouped data provides less 
information about the individual data points than a simple list of values. The midpoint of each 
interval is used to represent all the data points within the interval,
