11/18/22, 4:22 PM WGU Performance Assessment
https://tasks.wgu.edu/student/XXXXXXXXXX/course/XXXXXXXXXX/task/2807/overview
NVM2 — NVM2 TASK 1: CLASSIFICATION ANALYSIS
DATA MINING I — D209
PRFA — NVM2
COMPETENCIES
XXXXXXXXXX : Classification Data Mining Models
The graduate applies observations to appropriate classes and categories using classification models.
XXXXXXXXXX : Data Mining Model Performance
The graduate evaluates data mining model performance for precision, accuracy, and model comparison.
INTRODUCTION
In this task, you will act as an analyst and create a data mining report. In doing so, you must select one of the data dictionary and data set files to use
for your report from the following link: Data Sets and Associated Data Dictionaries.
You should also refer to the data dictionary file for your chosen data set from the provided link. You will use Python or R to analyze the given data
and create a data mining report in a word processor (e.g., Microsoft Word). Throughout the submission, you must visually represent each step of your
work and the findings of your data analysis.
Note: All algorithms and visual representations used need to be captured either in tables or as screenshots added into the submitted document. A
separate Microsoft Excel (.xls or .xlsx) document of the cleaned data should be submitted along with the written aspects of the data mining report.
REQUIREMENTS
Your submission must be your original work. No more than a combined total of 30% of the submission and no more than a 10% match to any one
individual source can be directly quoted or closely paraphrased from sources, even if cited correctly. The originality report that is provided when you
submit your task can be used as a guide.
You must use the rubric to direct the creation of your submission because it provides detailed criteria that will be used to evaluate your work. Each
requirement below may be evaluated by more than one rubric aspect. The rubric aspect titles may contain hyperlinks to relevant portions of the
course.
Tasks may not be submitted as cloud links, such as links to Google Docs, Google Slides, OneDrive, etc., unless specified in the task requirements. All
other submissions must be file types that are uploaded and submitted as attachments (e.g., .csv, .docx, .pdf, .ppt).
Part I: Research Question
A. Describe the purpose of this data mining report by doing the following:
1. Propose one question relevant to a real-world organizational situation that you will answer using one of the following classification methods:
• k-nearest neighbor (KNN)
• Naive Bayes
2. Define one goal of the data analysis. Ensure that your goal is reasonable within the scope of the scenario and is represented in the available
data.
Part II: Method Justification
B. Explain the reasons for your chosen classification method from part A1 by doing the following:
1. Explain how the classification method you chose analyzes the selected data set. Include expected outcomes.
2. Summarize one assumption of the chosen classification method.
3. List the packages or libraries you have chosen for Python or R, and justify how each item on the list supports the analysis.
Part III: Data Preparation
C. Perform data preparation for the chosen data set by doing the following:
1. Describe one data preprocessing goal relevant to the classification method from part A1.
2. Identify the initial data set variables that you will use to perform the analysis for the classification question from part A1, and classify each
variable as continuous or categorical.
3. Explain each of the steps used to prepare the data for the analysis. Identify the code segment for each step.
4. Provide a copy of the cleaned data set.
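As an illustration of a preprocessing goal that matters for a distance-based method such as KNN, the sketch below rescales continuous variables to a common range so that no single variable dominates the distance calculation. It is a minimal example under stated assumptions, not a required approach: the function name min_max_scale and the two sample columns (age, monthly charge) are hypothetical and are not taken from the provided data sets.

```python
def min_max_scale(rows):
    """Rescale each column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has no spread, so map it to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled


# Hypothetical records: [age in years, monthly charge in dollars]
records = [[20, 50.0], [40, 150.0], [60, 250.0]]
scaled_records = min_max_scale(records)
print(scaled_records)
```

After scaling, both columns span 0 to 1, so a difference in monthly charge no longer outweighs a difference in age simply because of its larger units.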
Part IV: Analysis
D. Perform the data analysis and report on the results by doing the following:
1. Split the data into training and test data sets and provide the file(s).
2. Describe the analysis technique you used to appropriately analyze the data. Include screenshots of the intermediate calculations you
performed.
3. Provide the code used to perform the classification analysis from part D2.
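The split and classification steps in part D can be sketched in plain Python as follows. This is only an illustration on made-up data, assuming KNN was the method chosen in part A1. The helper names split_train_test and knn_predict, the 80/20 proportion, the seed, and k=3 are all assumptions for the example. In practice a library such as scikit-learn offers equivalents (train_test_split and KNeighborsClassifier).

```python
import math
import random
from collections import Counter


def split_train_test(rows, labels, test_fraction=0.2, seed=42):
    """Shuffle indices reproducibly and hold out a fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [rows[i] for i in train_idx],
        [rows[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )


def knn_predict(train_rows, train_labels, query, k=3):
    """Return the majority label among the k training rows nearest to query."""
    nearest = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up, already-scaled data: class 0 clusters near (0, 0), class 1 near (1, 1).
rows = [
    [0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2], [0.0, 0.0],
    [0.9, 1.0], [1.0, 0.9], [0.8, 0.9], [0.9, 0.8], [1.0, 1.0],
]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

train_rows, test_rows, train_labels, test_labels = split_train_test(rows, labels)
predictions = [knn_predict(train_rows, train_labels, q, k=3) for q in test_rows]
print(len(train_rows), len(test_rows), predictions, test_labels)
```

Fixing the seed keeps the split reproducible, which makes it easier to provide the same training and test files that the reported results were computed on.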
Part V: Data Summary and Implications
E. Summarize your data analysis by doing the following:
1. Explain the accuracy and the area under the curve (AUC) of your classification model.
2. Discuss the results and implications of your classification analysis.
3. Discuss one limitation of your data analysis.
4. Recommend a course of action for the real-world organizational situation from part A1 based on your results and implications discussed in
part E2.
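To make the two metrics in part E1 concrete, the sketch below computes them by hand on made-up values. Accuracy is the share of predictions that match the true labels. The AUC of the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The function names, the sample scores, and the 0.5 decision threshold are assumptions for illustration. A library function such as scikit-learn's roc_auc_score would normally be used instead.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the chance that a positive case outscores a negative case."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Made-up true labels and predicted probabilities for the positive class.
y_true = [0, 0, 1, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

model_accuracy = accuracy(y_true, y_pred)
model_auc = roc_auc(y_true, scores)
print(model_accuracy, model_auc)
```

Accuracy depends on the chosen threshold, while AUC summarizes how well the scores rank positives above negatives across all thresholds, which is why the two numbers can differ for the same model.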
Part VI: Demonstration
F. Provide a Panopto video recording that includes a demonstration of the functionality of the code used for the analysis and a summary of the
programming environment.
Note: The audiovisual recording should feature you visibly presenting the material (i.e., not in voiceover or embedded video) and should
simultaneously capture both you and your multimedia presentation.
Note: For instructions on how to access and use Panopto, use the "Panopto How-To Videos" web link provided below. To access Panopto's
website, navigate to the web link titled "Panopto Access," and then choose to log in using the “WGU” option. If prompted, log in using your WGU
student portal credentials, and then it will forward you to Panopto’s website.
To submit your recording, upload it to the Panopto drop box titled “Data Mining I – NVM2.” Once the recording has been uploaded and processed
in Panopto's system, retrieve the URL of the recording from Panopto and copy and paste it into the Links option. Upload the remaining task
requirements using the Attachments option.
G. Record the web sources used to acquire data or segments of third-party code to support the analysis. Ensure the web sources are reliable.
H. Acknowledge sources, using in-text citations and references, for content that is quoted, paraphrased, or summarized.
I. Demonstrate professional communication in the content and presentation of your submission.
File Restrictions
File name may contain only letters, numbers, spaces, and these symbols: ! - _ . * ' ( )
File size limit: 200 MB
File types allowed: doc, docx, rtf, xls, xlsx, ppt, pptx, odt, pdf, txt, qt, mov, mpg, avi, mp3, wav, mp4, wma, flv, asf, mpeg, wmv, m4v, svg, tif, tiff, jpeg, jpg, gif, png,
zip, rar, tar, 7z
RUBRIC
A1:PROPOSAL OF QUESTION
A2:DEFINED GOAL
B1:EXPLANATION OF CLASSIFICATION METHOD
B2:SUMMARY OF METHOD ASSUMPTION
NOT EVIDENT
The submission does not propose 1 question.
APPROACHING COMPETENCE
The submission proposes 1 question that is not relevant to a real-world organizational situation. Or the proposal does not include 1 of the given classification methods.
COMPETENT
The submission proposes 1 question that is relevant to a real-world organizational situation, and the proposal includes 1 of the given classification methods.
NOT EVIDENT
The submission does not define 1 goal for data analysis.
APPROACHING COMPETENCE
The submission defines 1 goal for data analysis, but the goal is not reasonable, is not within the scope of the scenario, or is not represented in the available data.
COMPETENT
The submission defines 1 reasonable goal for data analysis that is within the scope of the scenario and is represented in the available data.
NOT EVIDENT
The submission does not explain how the chosen classification method analyzes the selected data set.
APPROACHING COMPETENCE
The submission does not logically explain how the chosen classification method analyzes the selected data set, or the explanation includes inaccurate expected outcomes.
COMPETENT
The submission logically explains how the chosen classification method analyzes the selected data set and includes accurate expected outcomes.
NOT EVIDENT
The submission does not summarize 1 assumption of the chosen classification method.
APPROACHING COMPETENCE
The submission inadequately summarizes 1 assumption of the chosen classification method.
COMPETENT
The submission adequately summarizes 1 assumption of the chosen classification method.
B3:PACKAGES OR LIBRARIES LIST
C1:DATA PREPROCESSING
C2:DATA SET VARIABLES
C3:STEPS FOR ANALYSIS
NOT EVIDENT
The submission does not list the packages or libraries chosen for Python or R.
APPROACHING COMPETENCE
The submission lists the packages or libraries chosen for Python or R but does not justify how 1 or more items on the list support the analysis.
COMPETENT
The submission lists the packages or libraries chosen for Python or R and justifies how each item on the list supports the analysis.
NOT EVIDENT
The submission does not describe 1 data preprocessing goal.
APPROACHING COMPETENCE
The submission describes 1 data preprocessing goal, but it is not relevant to the classification method from part A1.
COMPETENT
The submission describes 1 data preprocessing goal that is relevant to the classification method from part A1.
NOT EVIDENT
The submission does not identify any data set variables used to perform the analysis for the classification question from part A1 or does not classify the variables as continuous or categorical.
APPROACHING COMPETENCE
The submission identifies the data set variables used to perform the analysis for the classification question from part A1, but the submission inaccurately classifies 1 or more variables as continuous or categorical.
COMPETENT
The submission identifies the data set variables used to perform the analysis for the classification question from part A1, and the submission accurately classifies each variable as continuous or categorical.
C4:CLEANED DATA SET
D1:SPLITTING THE DATA
D2:OUTPUT AND INTERMEDIATE CALCULATIONS
NOT EVIDENT
The submission does not explain each step used to prepare the data for the analysis, or the submission does not identify the code segment for each step.
APPROACHING COMPETENCE
The submission inaccurately explains 1 or more steps used to prepare the data for analysis, or the submission identifies an inaccurate code segment for 1 or more steps.
COMPETENT
The submission accurately explains each step used to prepare the data for analysis, and the submission identifies an accurate code segment for each step.
NOT EVIDENT
The submission does not include a copy of the cleaned data set.
APPROACHING COMPETENCE
The submission includes a copy of the cleaned data set, but the data set is inaccurate.
COMPETENT
The submission includes an accurate copy of the cleaned data set.
NOT EVIDENT
The submission does not provide the training and test data set file(s).
APPROACHING COMPETENCE
The submission provides training and test data sets, but the split is not reasonably proportioned.
COMPETENT
The submission provides reasonably proportioned training and test data sets.
NOT EVIDENT
The submission does not describe the analysis technique used to analyze the data, or it does not include screenshots of the intermediate calculations performed.
APPROACHING COMPETENCE
The submission inaccurately describes the analysis technique used to appropriately analyze the data, or the submission includes screenshots of the intermediate calculations performed but they are inaccurate.
COMPETENT
The submission accurately describes the analysis technique used to appropriately analyze the data, and the submission includes accurate screenshots of the intermediate calculations performed.
D3:CODE EXECUTION
E1:ACCURACY AND AUC
E2:RESULTS AND IMPLICATIONS
E3:LIMITATION
NOT EVIDENT
The submission does not provide the code used to perform the classification analysis from part D2.
APPROACHING COMPETENCE
The submission provides the code used to perform the classification analysis from part D2, but 1 or more errors are evident during the execution of the code.
COMPETENT
The submission provides the code used to perform the classification analysis from part D2 and the code executes without errors.
NOT EVIDENT
The submission does not explain the accuracy or the AUC of the classification model.
APPROACHING COMPETENCE
The submission does not logically explain the accuracy or the AUC of the classification model.
COMPETENT
The submission logically explains both the accuracy and the AUC of the classification model.
NOT EVIDENT
The submission does not discuss both the results and implications of the classification analysis.
APPROACHING COMPETENCE
The submission discusses both the results and implications of the classification analysis, but the discussion is inadequate.
COMPETENT
The submission adequately discusses both the results and implications of the classification analysis.
E4:COURSE OF ACTION
F:PANOPTO RECORDING
G:SOURCES FOR THIRD-PARTY CODE
NOT EVIDENT
The submission does not discuss 1 limitation of the data analysis.
APPROACHING COMPETENCE
The submission discusses 1 limitation of the data analysis but lacks adequate detail or is illogical.
COMPETENT
The submission logically discusses 1 limitation of the data analysis with adequate detail.
NOT EVIDENT
The submission does not recommend a course of action for the real-world organizational situation from part A1.
APPROACHING COMPETENCE
The submission does not recommend a reasonable course of action for the real-world organizational situation from part A1, or the course of action is not based on the results and implications discussed in part E2.
COMPETENT
The submission recommends a reasonable course of action for the real-world organizational situation from part A1 based on the results and implications discussed in part E2.
NOT EVIDENT
The submission does not provide a Panopto video recording.
APPROACHING COMPETENCE
The submission provides a Panopto video recording, but it does not include a demonstration of the functionality of the code used for the analysis or a summary of the programming environment or both.
COMPETENT
The submission provides a Panopto video recording that