C291 – Systems Programming with C & Unix
Assignment 6
Grading an exam is a tedious job. While MCQs are autograded, they represent only one form of
the assessment. One simple idea to grade textual answers is to determine how similar the students’
answers are to the instructor provided answers. A similarity measure assigns a numerical score
which may represent the marks assigned by the instructor to a given student’s answer.
In this assignment, you will implement an auto grading system in C. The system will present the
questions to the students and prompt them for answers. These answers will be compared against
instructor provided answers using Cosine similarity to compute a numerical score representing the
assigned marks.
Here are the (informal) algorithms that you can use:
Main:
1. Define ‘n’ questions and answers. (They will represent the instructor provided Q&As and
will be provided by us at the end of the assignment)
2. Display each question and get the student's answer (limit response to 500 words).
3. For each question:
a. Compute cosine similarity between the answer provided by the instructor and the
student’s answer.
b. Convert similarity to marks between 0-10
4. Display marks for each question and the total marks.
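The last two steps of Main can be sketched as follows. The assignment does not say how a similarity becomes marks, so the linear scaling (similarity × 10) is an assumption, and the names `similarityToMarks`, `reportMarks` and `MAX_MARKS` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_MARKS 10.0   /* assumed maximum marks per question */

/* Assumed linear mapping: similarity 0 gives 0 marks, 1 gives MAX_MARKS.
   Out-of-range inputs are clamped to [0, 1] first. */
double similarityToMarks(double similarity)
{
    if (similarity < 0.0) similarity = 0.0;
    if (similarity > 1.0) similarity = 1.0;
    return similarity * MAX_MARKS;
}

/* Print the marks for each question and the total; return the total. */
double reportMarks(const double *similarities, int n)
{
    assert(similarities != NULL && n > 0);

    double total = 0.0;
    for (int i = 0; i < n; i++) {
        double marks = similarityToMarks(similarities[i]);
        printf("Question %d: %.2f / %.0f\n", i + 1, marks, MAX_MARKS);
        total += marks;
    }
    printf("Total: %.2f / %.0f\n", total, n * MAX_MARKS);
    return total;
}
```

If integer marks are wanted instead, the result could be rounded before it is printed; that choice is left open here.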
Define Q&As
1. Create an array of n pointers representing questions and initialize it with the provided
questions
2. Create an array of n pointers representing instructor answers and initialize it with the
provided answers
3. Create an array of n pointers representing student answers.
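One possible way to lay out the three arrays is shown below, using the questions and answers listed in the Questions & Answers section. The names `NUM_QUESTIONS`, `questions`, `instructor_answers` and `student_answers` are hypothetical; the assignment does not prescribe them.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_QUESTIONS 5

/* Array of pointers to the instructor-provided questions. */
const char *questions[] = {
    "What are local variables?",
    "What is an identifier?",
    "What is recursion?",
    "What is a pointer?",
    "What is the purpose of applying static to a local array?"
};

/* Array of pointers to the instructor-provided answers, in the same order. */
const char *instructor_answers[] = {
    "Variables defined in function definition are local variables. "
    "They can be accessed only in that function scope.",
    "Identifiers are user defined names given to variables, functions and arrays.",
    "A function calling itself again and again to compute a value is referred "
    "to as recursive function or recursion. Recursion is useful for branching "
    "processes and is effective where terms are generated successively to "
    "compute a value.",
    "A pointer is a variable that stores the memory address of another "
    "variable as its value.",
    "By making a local array definition static the array is not created and "
    "initialized every time the function is called and it is not destroyed "
    "every time the function is exited. Also, the execution time is reduced."
};

/* Array of pointers for the student answers; all NULL until they are read. */
char *student_answers[NUM_QUESTIONS];

/* Compile-time checks that both tables have one entry per question. */
static_assert(sizeof questions / sizeof questions[0] == NUM_QUESTIONS,
              "one question per slot");
static_assert(sizeof instructor_answers / sizeof instructor_answers[0] == NUM_QUESTIONS,
              "one instructor answer per question");
```

The string literals are stored once and only pointed to, which is why the first two arrays hold `const char *`. The student array holds plain `char *` because its buffers are filled at run time.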
Get Students Answers
1. For each question in the questions array
a. Display the question
b. Store the student's answer in the student answers array
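A minimal sketch of these steps follows. It assumes `len` is the maximum answer length in characters, as in the description of `getStudentAnswers` later in this assignment. `readAnswers` is a hypothetical helper that takes the input stream as a parameter so the same code can be tried on a file instead of the keyboard.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: prompts with each question and reads one line per
   answer from `in`, keeping at most `len` characters of each line. */
char **readAnswers(FILE *in, const char **questions, int n, int len)
{
    assert(in != NULL && questions != NULL && n > 0 && len > 0);

    char **answers = malloc((size_t)n * sizeof *answers);
    if (answers == NULL)
        return NULL;

    for (int i = 0; i < n; i++) {
        printf("Q%d. %s\n> ", i + 1, questions[i]);
        fflush(stdout);

        answers[i] = malloc((size_t)len + 1);      /* +1 for the terminator */
        if (answers[i] == NULL) {
            while (i-- > 0)                        /* undo earlier allocations */
                free(answers[i]);
            free(answers);
            return NULL;
        }
        if (fgets(answers[i], len + 1, in) == NULL)
            answers[i][0] = '\0';                  /* EOF: record an empty answer */

        size_t used = strcspn(answers[i], "\n");
        if (answers[i][used] == '\n') {
            answers[i][used] = '\0';               /* strip the newline */
        } else {
            int c;                                 /* line too long: skip the rest */
            while ((c = fgetc(in)) != '\n' && c != EOF) { }
        }
    }
    return answers;
}

/* The function named in the assignment, reading from standard input. */
char **getStudentAnswers(const char **questions, int n, int len)
{
    return readAnswers(stdin, questions, n, len);
}
```

The caller owns the returned array and should `free` each answer and then the array itself once grading is finished.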
Compute Cosine Similarity:
1. Convert the answer provided by the instructor and the input, i.e., the student’s answer, to
vector representation (a vector is simply a 1D array)
2. Compute and return the cosine similarity between the two vectors.
Convert Answers to Vectors:
1. Convert the instructor’s and student’s answers to lowercase
2. Determine the total number of unique words U in the two answers, i.e., the instructor’s
answer and the student’s answer.
3. Create an array of pointers of size U to store those unique words as strings (call it
dictionary).
4. Create two int vectors of size U and initialize them to 0 (they represent the answer vectors).
5. For each word in the dictionary:
a. If the word is found in the instructor’s answer, set the appropriate index in the instructor
vector to 1.
b. If the word is found in the student’s answer, set the appropriate index in the student
vector to 1.
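The steps above can be sketched as follows. Several things here are assumptions: words are taken to be runs of letters and digits, so punctuation such as the final "." is dropped; `buildVectors` is a hypothetical variant of `ans2Vectors` with an extra `u_out` parameter, because the caller needs U to use the vectors; and each entry is a 0/1 presence flag exactly as step 5 states, so a repeated word such as "a" is stored as 1 rather than as a count.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Lowercase heap copy of s; every non-alphanumeric character becomes a
   space so that "value." and "value" count as the same word. */
static char *normalize(const char *s)
{
    size_t n = strlen(s);
    char *copy = malloc(n + 1);
    if (copy == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[i];
        copy[i] = isalnum(c) ? (char)tolower(c) : ' ';
    }
    copy[n] = '\0';
    return copy;
}

/* Split text in place on spaces; returns the number of words stored. */
static int splitWords(char *text, char **words)
{
    int n = 0;
    for (char *w = strtok(text, " "); w != NULL; w = strtok(NULL, " "))
        words[n++] = w;
    return n;
}

/* 1 if word occurs in words[0..n-1], otherwise 0. */
static int hasWord(char **words, int n, const char *word)
{
    for (int i = 0; i < n; i++)
        if (strcmp(words[i], word) == 0)
            return 1;
    return 0;
}

/* Returns {instructor vector, student vector} and writes U to *u_out.
   Returns NULL if memory runs out. */
int **buildVectors(const char *instructor_answer, const char *student_answer,
                   int *u_out)
{
    assert(instructor_answer != NULL && student_answer != NULL && u_out != NULL);

    /* A string of length L holds at most L/2 + 1 words. */
    size_t cap = strlen(instructor_answer) / 2 + strlen(student_answer) / 2 + 2;
    char *a = normalize(instructor_answer);           /* step 1 */
    char *b = normalize(student_answer);
    char **wa = malloc(cap * sizeof *wa);
    char **wb = malloc(cap * sizeof *wb);
    char **dict = malloc(cap * sizeof *dict);         /* points into a and b */
    int **vec = NULL;

    if (a && b && wa && wb && dict) {
        int na = splitWords(a, wa), nb = splitWords(b, wb), u = 0;

        /* Steps 2-3: collect unique words in order of first appearance. */
        for (int i = 0; i < na; i++)
            if (!hasWord(dict, u, wa[i])) dict[u++] = wa[i];
        for (int i = 0; i < nb; i++)
            if (!hasWord(dict, u, wb[i])) dict[u++] = wb[i];

        /* Step 4: two zero-initialized vectors (at least one slot each). */
        vec = malloc(2 * sizeof *vec);
        if (vec != NULL) {
            vec[0] = calloc(u > 0 ? (size_t)u : 1, sizeof **vec);
            vec[1] = calloc(u > 0 ? (size_t)u : 1, sizeof **vec);
            if (vec[0] == NULL || vec[1] == NULL) {
                free(vec[0]); free(vec[1]); free(vec);
                vec = NULL;
            }
        }
        /* Step 5: flag each dictionary word that an answer contains. */
        for (int i = 0; vec != NULL && i < u; i++) {
            vec[0][i] = hasWord(wa, na, dict[i]);
            vec[1][i] = hasWord(wb, nb, dict[i]);
        }
        if (vec != NULL)
            *u_out = u;
    }
    free(dict); free(wb); free(wa); free(b); free(a);
    return vec;
}
```

The dictionary is only needed while the vectors are being filled, so this sketch frees it before returning. The caller frees `vec[0]`, `vec[1]` and `vec`.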
Cosine similarity is a measure of similarity between two documents (vectors in our case)
irrespective of their size. A document is represented as a vector in an n-dimensional space where
n is the total number of unique words in all the documents. In our case, an answer is considered as
a document.
Mathematically, the Cosine similarity metric measures the cosine of the angle between two
n-dimensional vectors. Because the vectors here have no negative entries, the Cosine similarity of
two documents will range from 0 to 1. If the Cosine similarity score is 1, the two vectors have the
same orientation, i.e., the documents contain the same words in the same proportions. A value
close to 0 indicates that the two documents have almost no words in common.
The equation of Cosine similarity between two non-zero vectors is:
\[
\text{similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}
= \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}
\]
Let’s see an example of how to calculate the cosine similarity between two texts.
Consider the question ‘What is a pointer?’
Now, you have two answers: instructor's answer and student’s answer.
instructor_ans = “A pointer is a variable that stores the memory address of another variable as its
value.”
student_ans = “Variable that stores the memory address”
You convert the answers to lowercase.
instructor_ans = “a pointer is a variable that stores the memory address of another variable as its
value.”
student_ans = “variable that stores the memory address”
No. of unique words in these two answers = 14
Create a vector of unique words (dictionary):
dictionary: [“a”, “pointer”, “is”, “variable”, “that”, “stores”, “the”, “memory”, “address”, “of”,
“another”, “as”, “its”, “value”]
instructor_vector = [2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
student_vector = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
Table 1: A pictorial representation of the instructor and student's answers as vectors. Row A represents instructor’s
answer while row B represents student’s answer
   a  pointer  is  variable  that  stores  the  memory  address  of  another  as  its  value
A  2  1        1   1         1     1       1    1       1        1   1        1   1    1
B  0  0        0   1         1     1       1    1       1        0   0        0   0    0
Let instructor_vector be A and student_vector be B.
\[
A \cdot B = \sum_{i=1}^{n} A_i B_i
\]
\[
= (2 \times 0) + (1 \times 0) + (1 \times 0) + (1 \times 1) + (1 \times 1) + (1 \times 1) + (1 \times 1)
\]
\[
+ (1 \times 1) + (1 \times 1) + (1 \times 0) + (1 \times 0) + (1 \times 0) + (1 \times 0) + (1 \times 0) = 6
\]
\[
\|A\| = \sqrt{\sum_{i=1}^{n} A_i^2} = \sqrt{2^2 + 13 \times 1^2} = \sqrt{17}
\]
\[
\|B\| = \sqrt{\sum_{i=1}^{n} B_i^2} = \sqrt{6 \times 1^2 + 8 \times 0^2} = \sqrt{6}
\]
\[
\text{similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{6}{\sqrt{17}\sqrt{6}} \approx 0.594
\]
After getting this cosine similarity you convert it to marks between 0-10 and return it.
Implement the following functions:
char ** getStudentAnswers(const char ** questions, int n, int len)
const char ** questions represents the array of questions, int n is the number of questions and int
len represents the length of the answers which is no more than 500 characters. The function will
return an array of student answers. Implementation details are provided in the Get Students Answers
section of this assignment.
int ** ans2Vectors(char *instructor_answer, char *student_answer)
char *instructor_answer and char *student_answer represent the instructor’s answer and the
corresponding student’s answer, respectively. The function will return a 2D array (through a double
pointer) consisting of the two vectors representing the provided instructor and student answers.
The implementation details can be found in the Convert Answers to Vectors section of this
assignment.
double cosineSimilarity(char *instructor_answer, char *student_answer)
char *instructor_answer and char *student_answer represent the instructor’s answer and the
corresponding student’s answer, respectively. The function will return the cosine similarity between
the two answers. The implementation details are available in the Compute Cosine Similarity section
of this assignment.
Questions & Answers
Use the following set of questions and answers as instructor provided Q&As
1. What are local variables?
Ans: Variables defined in function definition are local variables. They can be accessed
only in that function scope.
2. What is an identifier?
Ans: Identifiers are user defined names given to variables, functions and arrays.
3. What is recursion?
Ans: A function calling itself again and again to compute a value is referred to as recursive
function or recursion. Recursion is useful for branching processes and is effective where terms are
generated successively to compute a value.
4. What is a pointer?
Ans: A pointer is a variable that stores the memory address of another variable as its value.
5. What is the purpose of applying static to a local array?
Ans: By making a local array definition static the array is not created and initialized every
time the function is called and it is not destroyed every time the function is exited.
Also, the execution time is reduced.