Linear Regression 20%
Use Excel to run a linear regression on the Quiz 2 Dataset tab
using donor_weight as the dependent variable.
Save the regression results on a new sheet in this submission.
1 What is the regression equation?
2 What is the model's Adjusted R-Squared?
3 What would the model predict for a 30 year old male 62 inches tall?
4 How many terms are NOT significant at a 10% critical value?
5 If we increased the number of observations, coefficients' p-values would?
6 Beyond coefficient significance, identify two other concerns with these results.
7
8 What is the model's Sum of Squares Residual?
9 What is its Sum of Squares Total?
10 With linear and logistic regression, we convert qualitative variables into indicators, then we remove one of the indicators from the model to avoid what?
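As a cross-check on the Excel output, the quantities asked for in questions 1 to 9 can be sketched in Python. This is a minimal sketch, assuming NumPy is available and that donor_weight is the response with donor_height, donor_age, and donor_male as predictors (as question 3 implies). It uses only the first eight rows of the Quiz 2 Dataset for brevity, so its numbers will not match a regression on the full tab.

```python
import numpy as np

# First eight rows of the Quiz 2 Dataset: weight, height, age, male.
# Illustrative subset only; load every row to reproduce the Excel output.
rows = np.array([
    [209, 70, 31, 0],
    [173, 72, 55, 1],
    [210, 63, 53, 0],
    [169, 66, 51, 1],
    [230, 64, 16, 0],
    [160, 64, 32, 0],
    [171, 64, 34, 0],
    [115, 63, 66, 0],
], dtype=float)

y = rows[:, 0]                                        # donor_weight (assumed response)
X = np.column_stack([np.ones(len(y)), rows[:, 1:]])   # intercept, height, age, male

# Ordinary least squares fit.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

n, p = X.shape[0], X.shape[1] - 1                     # observations, predictors
ss_residual = float(np.sum((y - fitted) ** 2))        # Sum of Squares Residual
ss_total = float(np.sum((y - y.mean()) ** 2))         # Sum of Squares Total
r2 = 1 - ss_residual / ss_total
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Prediction for a 30 year old male, 62 inches tall (question 3).
prediction = float(np.array([1.0, 62.0, 30.0, 1.0]) @ beta)

print("coefficients (intercept, height, age, male):", beta)
print("SS residual:", ss_residual, "SS total:", ss_total)
print("R-squared:", r2, "adjusted R-squared:", adj_r2)
print("predicted weight:", prediction)
```

The regression equation is the intercept plus each coefficient times its variable, and the adjusted R-squared penalizes R-squared for the number of predictors.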
Regression Issues (20%)
What issue does each assess or address (2pts each)?
11 Plot #1
12 Plot #2
13 Plot #3
14 Plot #4
15 Variance Inflation Factor
16 The diagonal of a Hat Matrix
17 Information Criterion
18 Outliers
19 Regularization
20 Principal Components
PCA & ANOVA 20%
Perform a Principal Components Analysis on the Quiz 2 Dataset tab. (You'll need to make it a CSV file.)
What proportion of variance is explained by
21 the first component?
22 the second component?
23 the third component?
24 How many principal components can this dataset have?
25 What is a plot showing the components' explained variance in declining order called?
If you're using SAS EM,
Upload the file to Enterprise Miner https://www.youtube.com/watch?v=nd1otR42ARs
Connect the dataset to the Principal Components tool on the Modify tab
From the results, find the Eigenvalues of Correlation Matrix report
If you're using Python,
use "from sklearn.decomposition import PCA"
with n_components = 4
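For the Python route mentioned above, the explained-variance proportions can also be sketched with NumPy alone. This is a minimal sketch, not the required sklearn workflow: it takes the eigenvalues of the correlation matrix, which mirrors the SAS "Eigenvalues of Correlation Matrix" report. Note that sklearn's PCA centers the data but does not scale it, so standardizing is an assumption made here. Only the first eight dataset rows are used, so the numbers are illustrative.

```python
import numpy as np

# First eight rows of the Quiz 2 Dataset: weight, height, age, male.
# Illustrative subset only; load the full CSV for the real answer.
rows = np.array([
    [209, 70, 31, 0],
    [173, 72, 55, 1],
    [210, 63, 53, 0],
    [169, 66, 51, 1],
    [230, 64, 16, 0],
    [160, 64, 32, 0],
    [171, 64, 34, 0],
    [115, 63, 66, 0],
], dtype=float)

# Correlation matrix of the four columns (PCA on standardized data).
corr = np.corrcoef(rows, rowvar=False)

# eigvalsh returns eigenvalues of a symmetric matrix in ascending order,
# so reverse them to list the largest component first.
eigenvalues = np.linalg.eigvalsh(corr)[::-1]

# Each eigenvalue's share of the total is that component's explained variance.
explained_ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained_ratio)

for i, (ratio, cum) in enumerate(zip(explained_ratio, cumulative), start=1):
    print(f"component {i}: proportion {ratio:.4f}, cumulative {cum:.4f}")
```

Plotting `explained_ratio` against the component number gives the declining-order plot that question 25 asks about.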
The prior visit count for blood donors was sampled 13 times from each ethnicity with these results:
Prior Visit Counts
Not Hispanic/Latino   Hispanic or Latino   Prefer not to answer
12                    0                    4
151                   6                    8
1                     17                   9
19                    3                    2
2                     6                    10
50                    0                    2
6                     57                   2
6                     17                   1
19                    3                    8
34                    17                   2
3                     0                    8
30                    17                   2
1                     0                    2
Use the Data Analysis in Excel to run a Single Factor ANOVA.
Save the ANOVA results on a new sheet in this submission.
26 What is the Sum of Squares between groups?
27 Is the mean Prior Visit Count the same across Ethnicity at a 20% critical value?
28 What proportion of Total Sum of Squares comes from between groups?
29 What is the variance of the Hispanic or Latino sample?
30 What is the mean Prior Visit Count within the Not Hispanic/Latino sample?
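The Single Factor ANOVA described above can be cross-checked with a short Python sketch that uses only the standard library. It computes the sums of squares and the F statistic from the Prior Visit Counts data. The p-value is left to Excel's output, since it needs the F distribution. The sample variance uses an n - 1 denominator, which is assumed to match Excel's summary table.

```python
# Prior Visit Counts, 13 samples per ethnicity group.
groups = {
    "Not Hispanic/Latino": [12, 151, 1, 19, 2, 50, 6, 6, 19, 34, 3, 30, 1],
    "Hispanic or Latino": [0, 6, 17, 3, 6, 0, 57, 17, 3, 17, 0, 17, 0],
    "Prefer not to answer": [4, 8, 9, 2, 10, 2, 2, 1, 8, 2, 8, 2, 2],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


all_values = [v for g in groups.values() for v in g]
grand_mean = mean(all_values)

# Between-group, within-group, and total sums of squares.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups.values())
ss_total = sum((v - grand_mean) ** 2 for v in all_values)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_statistic = (ss_between / df_between) / (ss_within / df_within)

# Sample variance per group (n - 1 denominator, assumed to match Excel).
variances = {
    name: sum((v - mean(g)) ** 2 for v in g) / (len(g) - 1)
    for name, g in groups.items()
}

print("SS between:", ss_between, "SS within:", ss_within, "SS total:", ss_total)
print("share of SS total from between groups:", ss_between / ss_total)
print("F statistic:", f_statistic, "with df", df_between, "and", df_within)
print("group means:", {name: mean(g) for name, g in groups.items()})
print("group variances:", variances)
```

Comparing the F statistic with the critical value Excel reports at the chosen alpha answers the equal-means question.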
Logistic Regression 20%
A model predicting a Titanic passenger's survival is summarized below.
Based on this summary report . . .
31 What is the linear equation that would feed into the sigmoid function? (just the first few terms is fine)
32 Which variable has a potentially insignificant coefficient?
33 What proportion of the survival variance is explained by the model?
34 How much would the odds of survival rise by having FirstClass?
35 How many times did the logistic regression attempt to estimate these coefficients?
36 How many passengers are in this dataset?
37 Which link was used to estimate this model?
According to this model . . .
38 Are older people more or less likely to survive?
39 Are men more or less likely to survive?
40 Are people who paid more for their ticket more or less likely to survive?
Optimization terminated successfully.
Current function value: XXXXXXXXXX
Iterations 6
Logit Regression Results
==============================================================================
Dep. Variable: Survived   No. Observations: 714
Model: Logit   Df Residuals: 707
Method: MLE   Df Model: 6
Date: Mon, 16 Oct XXXXXXXXXX   Pseudo R-squ.: XXXXXXXXXX2762
Time: 22:00:13   Log-Likelihood: XXXXXXXXXX.06
converged: True   LL-Null: XXXXXXXXXX482.26
Covariance Type: nonrobust   LLR p-value: XXXXXXXXXX274e-54
==============================================================================
coef std err z P>|z| [ XXXXXXXXXX]
------------------------------------------------------------------------------
Age XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX.003
SibSp XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX.033
Parch XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX.233
Fare XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX.013
FirstClass XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX
Male XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX.850
EmbarkedS XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX
==============================================================================
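The summary above feeds a linear combination of the variables into the sigmoid function. A minimal sketch of that mechanism follows, using only the Python standard library. The coefficient and log-likelihood values below are hypothetical placeholders, since the real values are masked in the summary, and the intercept is an assumption (no intercept row appears in the table). The pseudo R-squared formula shown is McFadden's, which is what statsmodels reports for Logit.

```python
import math

# Hypothetical values for illustration only; they are NOT the masked
# coefficients from the summary table.
intercept = -0.5
coefs = {"Age": -0.03, "FirstClass": 1.2, "Male": -2.5}


def sigmoid(z):
    """Map a linear predictor z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(features):
    """Linear equation first, then the sigmoid (logit link inverted)."""
    z = intercept + sum(coefs[name] * value for name, value in features.items())
    return sigmoid(z)


def mcfadden_pseudo_r2(log_likelihood, log_likelihood_null):
    """McFadden's pseudo R-squared: 1 - LL / LL-Null."""
    return 1.0 - log_likelihood / log_likelihood_null


passenger = {"Age": 30, "FirstClass": 1, "Male": 0}
probability = predict_probability(passenger)

# exp(coefficient) is the multiplicative change in the odds for a one-unit
# increase in that variable, holding the others fixed.
odds_ratio_first_class = math.exp(coefs["FirstClass"])

print("predicted probability:", probability)
print("odds multiplier for FirstClass:", odds_ratio_first_class)
print("pseudo R-squared (hypothetical LLs):", mcfadden_pseudo_r2(-300.0, -400.0))
```

A negative coefficient lowers the odds as its variable increases, and a positive one raises them, which is how questions 38 to 40 can be read off the coef column.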
Confusion Tables & Misc 20%
From the confusion matrix below, calculate . . .
                    Actual
                    Innocent   Guilty
Judged   Acquitted  248        81
         Convicted  42         209
41 Accuracy
42 Recall
43 Precision
44 Type I Error Rate
45 Type II Error Rate
46 Specificity
47 F1
48 A logistic regression dataset's target is 97% negative. What issue needs to be addressed?
49 What method is used to predict a student's grade (A, B, C, etc)?
50 With more than 4 classes, One vs One classifiers are less complex than One vs All.
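The metrics in questions 41 to 47 all derive from the four cells of the confusion matrix. The sketch below uses the standard formulas and treats "Guilty" (judged "Convicted") as the positive class. That choice is an assumption, since the quiz does not name the positive class, and swapping it would change recall, precision, and the error rates.

```python
# Counts from the confusion matrix; Guilty/Convicted is the assumed positive class.
tp = 209  # guilty and convicted (true positive)
fp = 42   # innocent but convicted (false positive, Type I error)
fn = 81   # guilty but acquitted (false negative, Type II error)
tn = 248  # innocent and acquitted (true negative)

total = tp + fp + fn + tn

accuracy = (tp + tn) / total
recall = tp / (tp + fn)               # share of actual positives caught
precision = tp / (tp + fp)            # share of predicted positives that are right
type_i_error_rate = fp / (fp + tn)    # false positive rate
type_ii_error_rate = fn / (fn + tp)   # false negative rate
specificity = tn / (tn + fp)          # true negative rate
f1 = 2 * precision * recall / (precision + recall)

for name, value in [
    ("accuracy", accuracy),
    ("recall", recall),
    ("precision", precision),
    ("type I error rate", type_i_error_rate),
    ("type II error rate", type_ii_error_rate),
    ("specificity", specificity),
    ("F1", f1),
]:
    print(f"{name}: {value:.4f}")
```

Specificity and the Type I error rate sum to 1, as do recall and the Type II error rate, which gives a quick consistency check on hand calculations.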
Quiz 2 Dataset
donor_weight donor_height donor_age donor_male
209 70 31 0
173 72 55 1
210 63 53 0
169 66 51 1
230 64 16 0
160 64 32 0
171 64 34 0
115 63 66 0
194 62 54 1
133 67 38 0
288 75 20 1
120 62 42 0
175 67 46 1
209 70 66 0
288 75 29 1
179 63 49 0
171 64 48 0
120 62 58 0
188 73 66 1
188 73 37 1
228 70 53 1
223 64 45 0
266 74 55 1
155 67 62 0
194 62 94 1
153 67 22 0
132 61 59 0
155 61 75 0
184 68 65 0
161 65 53 1
133 67 42 0
288 75 16 1
228 70 16 1
184 68 78 0
167 65 52 1
223 64 23 0
266 74 16 1
132 61 32 0
174 66 48 0
129 62 40 0
209 70 59 0
175 61 59 0
266 74 34 1
167 65 18 1
194 62 17 1
210 63 52 0
162 65 60 0
184 68 65 0
178 65 20 0
175 67 45 1
230 64 40 0
228 70 49 1
165 61 76 0
161 65 48 1
223 74 68 1
180 71 24 0
266 70 24 1
194 62 49 1
175 67 40 1
175 67 50 1
223 64 78 0
223 64 61 0
162 66 60 1
266 74 57 1
266 74 53 1
148 64 59 0
107 69 75 0
175 67 33 1
120 62 33 0
167 65 16 1
174 66 58 0
169 66 39 1
215 65 50 0
175 67 79 1
169 66 46 1
167 65 61 1
184 68 66 0
225 71 80 1
132 61 65 0
107 69 41 0
188 62 58 0
133 67 37 0
215 64 81 0
169 66 71 1
132 61 59 0
228 70 44 1
142 64 46 1
185 68 34 1
196 64 42 0
221 64 73 1
225 71 44 1
173 70 60 1
164 70 57 1
221 64 18 1
142 64 62 1
167 65 71 1
167 65 72 1
169 66 47 1
172 64 22 0
167 65 54 1
167 65 43 1
137 64 34 0
173 74 51 1
107 69 52 0
288 75 23 1
175 67 59 1
169 66 57 1
188 72 63 1
210 63 75 0
167 65 64 1
174 66 57 0
143 68 47 0
120 62 53 0
266 74 51 1
153 67 81 0
233 63 46 0
157 66 60 0
132 61 47 0
161 65 16 1
161 65 69 1
162 66 64 1
137 64 19 0
288 75 49 1
254 64 19 0
153 67 55 0
129 64 26 0
175 67 61 1
221 64 61 1
249 73 68 0
107 69 63 0
184 68 27 0
228 70 59 1
143 68 17 1
188 73 21 1
225 71 42 1
126 68 62 1
132 61 26 0
175 61 18 0
167 65 77 1
165 61 63 0
266 74 30 1
162 66 35 1
174 66 58 0
230 64 50 0
132 61 21 0
184 68 18 0
184 68 42 0
184 68 24 0
210 63 55 0
175 67 22 1
165 61 68 0
180 71 62 0
171 64 67 0
143 68 59 0
138 63 45 0
234 68 75 1
158 59 39 0
175 67 62 1
184 68 74 0
132 61 70 0
175 67 40 1
228 70 18 1
175 61 58 0
143 68 72 0
135 68 36 1
119 60 44 0
288 75 18 1
107 69 17 0
188 72 64 1
266 70 39 1
159 64 46 0
180 63 42 0
184 68 62 0
174 66 54 0
171 64 17 0
173 71 53 1
209 70 54 0
210 63 49 0
173 70 69 1
167 65 40 1
196 64 40 0
190 72 55 1
223 74 40 1
179 63 62 0
132 61 43 0
221 64 32 1
174 66 36 0
238 70 25 1
175 67 44 1
209 70 22 0
184 68 73 0
188 73 20 1
175 67 72 1
210 63 59 0
175 67 17 1
210 63 19 0
122 64 48 0
120 62 31 0
132 61 48 0
174 66 63 0
209 70 62 0
120 62 66 0
194 62 42 1
175 67 50 1
132 61 62 0
132 61 18 0
180 71 56 0
142 64 52 1
175 67 82 1
120 62 66 0
266 74 28 1
167 65 51 1
169 66 36 1
169 66 54 1
120 62 63 0
107 69 61 0
143 68 51 0
223 64 27 0
150 66 38 0
223 64 33 0
288 75 41 1
161 65 53 1
180 63 58 0
221 64 36 1
107 69 63 0
194 62 40 1
188 73 54 1
221 64 55 1
173 74 17 1
162 72 16 1
153 67 57 0
209 70 54 0
221 64 57 1
209 70 66 0
143 68 60 0
129 64 50 0
162 72 21 1
164 63 21 0
165 61 34 0
288 75 61 1
133 67 63 0
162 66 31 1
188 73 48 1
167 65 69 1
119 60 67 0
210 63 47 0
174 66 16 0
184 68 53 0
215 65 26 0
180 71 57 0
190 71 16 1
210 63 32 0
133 67 58 0
157 66 53 0
175 67 32 1
115 63 75 0
157 66 31 0
266 74 17 1
234 68 36 1
161 65 26 1
126 68 57 1
163 68 34 0
175 67 55 1
194 62 57 1
188 73 44 1
162 66 67 1
232 62 29 0
143 68 45 1
162 72 69 1
264 68 39 1
221 64 49 1
119 60 58 0
171 64 32 0
180 71 17 0
174 66 61 0
194 62 65 1
150 66 51 0
209 70 69 0
221 64 52 1
210 63 72 0
210 63 65 0
161 65 66 1
165 61 55 0
208 73 22 1
162 65 17 0
167 65 69 1
162 72 35 1
126 68 59 1
180 71 32 0
161 65 59 1
212 72 30 1
174 66 62 0
107 69 39 0
288 75 27 1
208 73 61 1
107 69 83 0
132 61 65 0
169 66 63 1
188 73 17 1
174 66 68 0
126 68 16 1
173 70 77 1
210 63 38 0
163 68 41 0
288 75 76 1
266 74 19 1
173 72 68 1
171 64 61 0
143 62 35 0
161 65 67 1
144 64 38 0
173 74 40 1
169 66 42 1
161 65 41 1
209 70 49 0
266 74 64 1
153 67 63 0
171 64 54 0
210 63 71 0
210 63 26 0
169 66 66 1
223 64 66 0
188 73 49 1
175 61 28 0
158 59 21 0
171 64 47 0
194 62 34 1
145 61 66 1
121 65 17 0
184 68 48 0
184 68 62 0
175 67 52 1
209 70 62 0
210 63 27 0
175 67 34 1
167 65 45 1
190 72 74 1
184 68 30 0
143 68 62 0
175 61 34 0
184 68 17 0
161 65 51 1
153 67 17 0
209 70 66 0
162 65 44 0
175 67 48 1
119 60 17 0
171 64 61 0
210 63 58 0
188 72 66 1
132 61 34 0
132 61 28 0
171 64 65 0
175 67 58 1
165 61 63 0
223 64 48 0
194 62 64 1
126 68 38 1
288 75 61 1
184 68 23 0
163 68 45 0
132 61 71 0
223 64 29 0
173 70 55 1
161 65 34 1
174 66 36 0
188 62 28 0
120 62 17 0
154 66 16 0
266 74 43 1
142 64 48 1
120 62 41 0
194 62 59 1
184 68 28 0
162 66 27 1
148 64 67 0
132 61 41 0
182 67 27 1
223 64 74 0
190 72 39 1
171 64 66 0
234 68 16 1
266 74 66 1
188 73 49 1
126 68 23 1
169 66 23 1
148 64 46 0
142 64 45 1
162 66 50 1
169 66 55 1
221 64 27 1
175 67 62 1
161 65 67 1
132 61 17 0
221 64 52 1
132 61 16 0
106 66 16 0
171 64 52 0
132 61 16 0
228 70 39 1
133 67 47 0
163 68 63 0
133 67 63 0
288 75 69 1
223 74 62 1
175 61 79 0
119 60 17 0
143 68 57 0
223 74 37 1
119 60 60 0
144 64 50 0
133 67 69 0
126 68 58 1
153 67 28 0
161 65 67 1
228 70 62 1
264 68 75 1
174 66 34 0
132 61 48 0
174 66 30 0
172 64 53 0
209 70 34 0
153 67 51 0
188 73 49 1
164 63 49 0
133 67 50 0
243 62 54 0
167 65 25 1
175 61 66 0
120 62 58 0
223 74 82 1
196 64 30 0
266 74 42 1
230 64 71 0
121 65 43 0
210 63 22 0
188 73 36 1
174 66 73 0
133 67 40 0
225 71 18 1
148 64 17 0
107 69 38 0
173 71 34 1
175 67 46 1
174 66 29 0
188 73 57 1
173 70 59 1
194 62 53 1
196 64 79 0
107 69 53 0
188 73 27 1
184 68 66 0
184 68 71 0
221 64 39 1
175 67 57 1
175 67 62 1
132 61 36 0
133 67 21 0
184 68 58 0
132 61 52 0
184 68 43 0
194 62 17 1
121 65 33 0
175 67 43 1
194 62 19 1
167 65 69 1
188 62 23 0
188 72 49 1
163 68 55 0
288 75 54 1
120 62 62 0
161 65 57 1
215 64 36 0
173 70 44 1
178 65 35 0
132 61 16 0
137 64 58 0
162 66 57 1
175 67 66 1
169 66 40 1
135 68 50 1
194 62 64 1
183 72 19 1
169 67 24 0
210 63 17 0
210 63 58 0
223 64 63 0
157 66 16 0
174 66 58 0
266 74 41 1
228 70 32 1
209 70 40 0
175 61 51 0
169 66 60 1
184 68 45 0
107 69 78 0
194 62 17 1
132 61 68 0
119 60 25 0
142 64 60 1
Lists
Independent columns     TRUE    Increase          Power          Log           More Likely
Multicollinearity       FALSE   Decrease          Significance   Interaction   Less Likely
Linear relationships            Remain the same                                Just as Likely
Normal distributions
Regularization
Influential columns
Influential rows
Comparing Models
Heteroscedasticity
Overfitting