Project Specification
COMP7025 Social Media Intelligence
Aim
The project requires you to analyse social media data using the knowledge gained in this unit, with assistance from a computer-based statistical package. For this project, we will focus on Twitter.
Method
To complete this project:
1. Read through this specification
2. Complete the data analysis required by the specification
3. Write up your analysis using your favourite word processing/typesetting program, making sure that all of the working is shown and that it is presented well.
4. Include the student declaration text on the front page of your report. Please make sure that your name and student number are clearly displayed on the front page.
5. Submit the report as a PDF by the due date.
Report Format
Once the required analysis is performed, write up the analysis as a report. Remember that the assessor will only see the report and will be marking the analysis based on your report. Therefore the report should contain a clear and concise description of the procedures carried out, the analysis of results, and any conclusions reached from the analysis.
The required analysis in this specification covers material presented in lectures and labs. Students should use the computer software R to carry out the required analysis and then present the results from the analysis in the report.
Marks
This project is worth 30% of your final grade, and so the project will be marked out of 30. The project consists of six parts, and each part contributes equally (5 marks) to your final mark.
Each of the six parts will be marked using the following criteria:
Marks  Criteria satisfied
0      The method does not lead to insightful analysis.
1      The method is flawed, but the analysis would have provided insight had the method been correct.
2      The correct method leads to partially correct results and analysis.
3      The correct method leads to correct results and analysis.
4      The correct method leads to correct results and analysis, with an insightful aim and conclusion.
5      The correct method leads to correct results and analysis, with an insightful aim and conclusion. Limitations of the analysis are identified and suggestions for further analysis are provided.
If a report is submitted late, the maximum mark it can achieve will be reduced by 10% (3 marks) per day. E.g., if a report is submitted five days late, it can receive at most 15 marks.
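The late penalty can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and flooring the result at zero is an assumption that the specification does not state.

```r
# Maximum achievable mark after the late penalty:
# 3 marks (10% of 30) are deducted for each day late.
# Flooring at zero is an assumption, not part of the specification.
max_mark_when_late <- function(days_late, full_marks = 30, penalty_per_day = 3) {
  max(0, full_marks - penalty_per_day * days_late)
}

max_mark_when_late(5)  # five days late, as in the example above
```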
Declaration
The following declaration must be included in a clearly visible and readable place on the first page of the report.
By including this statement, I the author of this work, verify that:
· I hold a copy of this assignment that I can produce if the original is lost or damaged.
· I hereby certify that no part of this assignment/product has been copied from any other student’s work or from any other source except where due acknowledgement is made in the assignment.
· No part of this assignment/product has been written/produced for me by another person except where such collaboration has been authorised by the subject lecturer/tutor concerned.
· I am aware that this work may be reproduced and submitted to plagiarism detection software programs for the purpose of detecting possible plagiarism (which may retain a copy on its database for future plagiarism checking).
· I hereby certify that I have read and understand what the School of Computing and Mathematics defines as minor and substantial breaches of misconduct as outlined in the learning guide for this unit.
Note: An examiner or lecturer/tutor has the right not to mark this project report if the above declaration has not been added to the cover of the report.
Project Description
A social and behavioural research group at Western Sydney University is studying social activists. They have consulted you to investigate the flow of information regarding environmental activist Greta Thunberg on Twitter. Researchers have provided a set of tasks below that need completion. The results are to be presented at the International Social and Behaviour Change Communication (SBCC) Summit.
Perform this analysis using R with the rtweet and igraph libraries. Use the rtweet documentation to find functions that will assist your analysis:
· https://cran.r-project.org/web/packages/rtweet/vignettes/intro.html
· https://cran.r-project.org/web/packages/rtweet/rtweet.pdf
1 Followed by Greta
Find the 12 people followed by Greta who have the most followers. Include only individuals, not company Twitter handles. Examine their Twitter accounts and summarise the types of people.
2 Followers of Greta
Find the 12 people who follow Greta and have the most followers, and examine whether each has a positive or negative relationship with Greta based on their tweets. Examine their Twitter accounts and summarise the types of people.
3 Bypassing Greta
Plot the graph containing Greta, the 12 people followed by Greta and the 12 followers. Identify whether any of the people found in parts 1 and 2 are friends with each other and add these edges to the graph. Then determine whether any of them should be friends, based on their background, and add those edges to the graph.
4 Graph Statistics
Compute the diameter and density of the graph and the neighbourhood overlap of each edge, and determine which nodes have the greatest social capital. State whether the results are obvious from the graph structure, and why.
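As a rough illustration of these statistics, the sketch below uses igraph on a small toy graph. The vertex names are hypothetical, the neighbourhood_overlap() helper is written here for illustration and is not an igraph function, and using betweenness as a proxy for bridging social capital is an assumption rather than the measure prescribed by the unit.

```r
library(igraph)

# Toy undirected graph; the vertex names are hypothetical.
g <- graph_from_literal(Greta - A, Greta - B, Greta - C, A - B, C - D)

graph_diameter <- diameter(g)      # length of the longest shortest path
graph_density  <- edge_density(g)  # edges present divided by edges possible

# Neighbourhood overlap of an edge: neighbours shared by both endpoints,
# divided by neighbours of at least one endpoint (excluding the endpoints).
neighbourhood_overlap <- function(graph, edge_id) {
  endpoints <- as.integer(ends(graph, edge_id, names = FALSE))
  n1 <- as.integer(neighbors(graph, endpoints[1]))
  n2 <- as.integer(neighbors(graph, endpoints[2]))
  shared <- length(intersect(n1, n2))
  either <- length(setdiff(union(n1, n2), endpoints))
  if (either == 0) 0 else shared / either
}

overlaps <- sapply(seq_len(ecount(g)), function(e) neighbourhood_overlap(g, e))
names(overlaps) <- apply(ends(g, E(g)), 1, paste, collapse = "-")

# Betweenness as one possible indicator of bridging social capital.
bridging <- betweenness(g, directed = FALSE)

print(graph_diameter)
print(graph_density)
print(overlaps)
print(bridging)
```

An edge with an overlap of zero is a local bridge, so its endpoints are candidates for high bridging capital.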
5 Graph Homophily
Test whether there is homophily in the graph. To do this, label each node as either a supporter or non-supporter of Greta using the information gathered in parts 1, 2 and 3. Write out the hypotheses, the test statistic and the conclusion of the test. Use a significance level of α = 0.05.
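One way to set up such a test is sketched below in base R, on a hypothetical toy network. It uses a permutation test, which is an assumption and may differ from the test taught in lectures. Here the null hypothesis is that the supporter labels are unrelated to the edges (no homophily), the alternative is that there are fewer cross-group edges than chance would give, and the test statistic is the number of supporter to non-supporter edges.

```r
# Toy network: two tightly knit groups joined by one edge (hypothetical names).
supporters     <- c("s1", "s2", "s3", "s4")
non_supporters <- c("n1", "n2", "n3", "n4")
edges <- rbind(t(combn(supporters, 2)),
               t(combn(non_supporters, 2)),
               c("s4", "n1"))
labels <- c(setNames(rep("supporter", 4), supporters),
            setNames(rep("non-supporter", 4), non_supporters))

# Test statistic: number of edges joining a supporter to a non-supporter.
count_cross_edges <- function(edges, labels) {
  sum(labels[edges[, 1]] != labels[edges[, 2]])
}
observed <- count_cross_edges(edges, labels)

# Null distribution: shuffle the labels over the nodes, keeping the edges fixed.
set.seed(42)
null_counts <- replicate(5000, {
  shuffled <- setNames(sample(labels), names(labels))
  count_cross_edges(edges, shuffled)
})

# One-sided p-value: how often chance gives as few cross-group edges as observed.
p_value <- mean(null_counts <= observed)
reject_null <- p_value < 0.05

print(observed)
print(p_value)
print(reject_null)
```

Rejecting the null hypothesis at α = 0.05 would indicate homophily; otherwise the data do not provide evidence of it.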
6 Structural Balance
Finally, determine if the signed network is weakly balanced (using hierarchical clustering) and identify if any within or between signed relationships are not as expected. To perform this analysis, first label all existing edges as either positive or negative, based on their association to Greta.
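The sketch below shows one possible way to use hierarchical clustering for this check, in base R on a hypothetical signed network. The distance coding (0 for a positive edge, 1 for no edge, 2 for a negative edge) and the choice of single linkage are assumptions made for illustration. Cutting the tree between 0 and 1 groups together nodes connected by positive edges, so any negative edge inside a group is a relationship that is not as expected.

```r
# Hypothetical signed network: +1 is a positive edge, -1 a negative edge.
nodes <- c("a", "b", "c", "d", "e")
signed_edges <- data.frame(
  from = c("a", "b", "d", "a", "c", "a"),
  to   = c("b", "c", "e", "d", "e", "c"),
  sign = c(1, 1, 1, -1, -1, -1),
  stringsAsFactors = FALSE
)

# Symmetric sign matrix (0 means no edge).
signs <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
for (i in seq_len(nrow(signed_edges))) {
  signs[signed_edges$from[i], signed_edges$to[i]] <- signed_edges$sign[i]
  signs[signed_edges$to[i], signed_edges$from[i]] <- signed_edges$sign[i]
}

# Distances: 0 for positive edges, 1 for no edge, 2 for negative edges.
distances <- as.dist(1 - signs)
tree <- hclust(distances, method = "single")
groups <- cutree(tree, h = 0.5)  # groups joined only through positive edges

# Negative edges inside a group violate weak balance.
same_group <- outer(groups, groups, "==")
violations <- which(signs < 0 & same_group & upper.tri(signs), arr.ind = TRUE)
weakly_balanced <- nrow(violations) == 0

print(groups)
print(violations)
print(weakly_balanced)
```

In this toy network the negative edge between a and c sits inside a group held together by positive edges, so the network is reported as not weakly balanced.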
Write up a report containing your code and analysis of the data with each section clearly labelled. Clearly annotate your code and make sure to state any conclusions you make from each piece of analysis. The report is marked using the marking criteria, so make sure that each piece of analysis covers all of the criteria. Remember that you are examining the relationship of Twitter users to Greta, so make sure that the conclusion of each section refers back to this.
##ASSIGNMENT SOCIAL MEDIA INTELLIGENCE COMP7025
##STUDENT_NAME : SUHAS THOTA
##STUDENT_ID : XXXXXXXXXX
version
# Install the required packages (only needed once per machine)
install.packages(c("rtweet", "base64enc", "httpuv", "dplyr", "tidytext", "tidyr", "textdata", "igraph"))
# Load the packages used throughout the script
library("rtweet")
library("base64enc")
library("httpuv")
library("magrittr")
library("dplyr")
library("textdata")
library("igraph")
#app="1657696929873301504suhasthota1"
#api_key="1ag4NiBTizl4S5vRf40jsYFhH"
#api_secret_key="kNPoy4r1spzb7ZaZaB7RoDjrTWucPHxiDdjZDDEDjwGgYR3v9f"
#acc_token=" XXXXXXXXXXYcpXyvRhjzdELJwDxUWPBXwYkwgEME6u2afVMbc"
#acc_secret_token="4Yutcn8OaSvn6i7xPEZaVTqurWKmeRzVcWH7Vv6pH184t"
### Using the keys above resulted in an API error [403] from Twitter. To avoid this,
## I used the keys supplied with the 6a solutions (Twitterkeys.txt).
# Authenticate with the Twitter API credentials
app='SMIProject_2023'
api_key='AagjVq96hOMojkDdc0fz8OJPI'
api_secret_key='DWrqQZWe2QDabVKDT5nVped8jqDk6UrPGAmJM74xX1xMIVL6Cf'
acc_token=' XXXXXXXXXX1fvDtoNyoah7sq92QWFZ8GGsAkmmSl1xWBSgb3E3'
acc_secret_token='N29dRKpzRSgt7vCcVj8AFCuwfHUROGStK15X7HMeBWvg4'
#generate token
create_token(
app=app,
consumer_key=api_key,
consumer_secret=api_secret_key,
access_token=acc_token,
access_secret=acc_secret_token
)
#Retrieving tweets
tweets=search_tweets("Greta Thunberg",n=5,include_rts=FALSE)
print(tweets)
#####################################################################################################################################################################################################
#######################Q1.)Followed by Greta Thunberg ###############################################################################################################################################
#####################################################################################################################################################################################################
# Get Greta Thunberg's friends (people followed by Greta)
friends_data <- get_friends("GretaThunberg", n = 1000)
# Extract the user IDs of the friends
friend_ids <- friends_data$to_id
# Fetch complete user information for the friends
full_friends_data <- lookup_users(user = friend_ids)
# Filter out company accounts based on user description (first attempt)
filtered_friends <- full_friends_data[!grepl("^[A-Za-z0-9_]{1,15}$", tolower(full_friends_data$description)), ]
## The code above attempts to filter out company accounts by applying a regular expression to the user description.
## However, the pattern "^[A-Za-z0-9_]{1,15}$" only matches descriptions made up entirely of 1 to 15 letters, digits or underscores (the shape of a Twitter handle).
## This pattern is ineffective at filtering out company accounts and does not provide meaningful results, so the result is replaced below.
filtered_friends <- full_friends_data[!grepl("company|corporation|organization", tolower(full_friends_data$description)), ]
# This second strategy excludes company accounts from the full_friends_data dataframe based on keywords in their user description.
# grepl() searches the lowercase version of each description for the words "company", "corporation" or "organization".
# The ! before grepl() negates the match, so the rows in which the pattern matches are removed.
# After this filter, filtered_friends holds only the accounts whose descriptions do not contain any of these keywords.
# After that, we can continue with our analysis of these users.
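## A minimal sketch of the keyword filter and the top-n selection, run on hypothetical data.
## The account names, descriptions and follower counts below are invented for illustration.
## Keyword matching is only a heuristic: a company whose description avoids these words is not removed.

```r
# Hypothetical accounts standing in for the output of lookup_users().
accounts <- data.frame(
  screen_name = c("climate_scientist", "green_energy_co", "activist_anna", "news_corp"),
  description = c("Climate scientist and author", "A renewable energy COMPANY",
                  "Student activist", "Media corporation covering politics"),
  followers_count = c(5000, 90000, 12000, 300000),
  stringsAsFactors = FALSE
)

# Flag descriptions containing any of the keywords (case-insensitive via tolower).
is_company <- grepl("company|corporation|organization", tolower(accounts$description))

# Keep the non-company accounts and take the two with the most followers.
people <- accounts[!is_company, ]
top_people <- head(people[order(-people$followers_count), ], 2)
print(top_people$screen_name)
```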
# Sort filtered friends by follower count in descending order
sorted_friends <- filtered_friends[order(-filtered_friends$followers_count), ]
# Select the top 12 friends with the most followers
top_friends <- head(sorted_friends, 12)
# Summarize the types of people
summary(top_friends$description)
top_friends$description
# Keep only the accounts that have a name, location, screen name and description
filtered_top_friends <- top_friends[complete.cases(top_friends$name, top_friends$location, top_friends$screen_name, top_friends$description), ]
## Group by the desired columns and summarise the types of friends
summary_friends <- filtered_top_friends %>%
group_by(name, location, screen_name, description) %>%
summarize(Count = n()) %>%
ungroup()
print(summary_friends)
################################################################################################################################################################
################################ Q2.) Greta Thunberg Followers #################################################################################################
################################################################################################################################################################
library(tidytext)
#Loads the tidytext package, which provides functions for text mining and analysis.
library(dplyr)
# Loads the dplyr package, which provides tools for data manipulation and transformation.
library(tidyr)
#Loads the tidyr package, which provides functions for data tidying and reshaping.
#list of Greta Thunberg's followers
follower_ids <- get_followers("GretaThunberg", n = 100)
# Retrieves the IDs of Greta Thunberg's followers by using the get_followers function from the rtweet package. It retrieves 100 follower IDs.
#Get the follower's profiles and sort them by the number of followers:
follower_profiles <- lookup_users(user = follower_ids$from_id)
#Retrieves the profile information of Greta Thunberg's followers using the lookup_users function from the rtweet package.
#It takes the follower IDs as input and returns their profiles.
sorted_profiles <- follower_profiles[order(follower_profiles$followers_count, decreasing = TRUE), ]
#Sorts the follower profiles based on the number of followers in descending order, using the order function.
#The profiles with the highest number of followers will be at the top.
top_followers <- head(sorted_profiles, 12)
#Selects the top 12 followers from the sorted profiles using the head function.
#These are the followers with the highest number of followers themselves.
View(top_followers)
#Examine their relationship with Greta Thunberg based on their tweets:
follower_tweets <- get_timeline(user = top_followers$id_str, n = 100)
#Retrieves the timeline tweets of the top followers by using the get_timeline function from the rtweet package.
#It takes the user IDs of the top followers as input and retrieves 100 tweets from each follower.
View(follower_tweets)
colnames(follower_tweets)
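follower_sentiments <- follower_tweets %>%
select(in_reply_to_screen_name, text) %>%
unnest_tokens(word, text) %>%
inner_join(get_sentiments("bing"), by = "word") %>%
count(in_reply_to_screen_name, sentiment) %>%
spread(sentiment, n, fill = 0)
View(follower_sentiments)
#Performs sentiment analysis on the follower tweets. It selects the relevant columns
#(in_reply_to_screen_name and text), tokenizes the text into words using unnest_tokens, and keeps only the words found in the bing sentiment lexicon using inner_join.
#It then counts the positive and negative words for each value of in_reply_to_screen_name.
#Note that in_reply_to_screen_name is the account being replied to, not the author of the tweet, so these counts describe the sentiment directed at each account (including Greta) rather than a score per follower.
#Finally, it spreads the sentiment counts into separate columns using spread.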
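## A minimal base R sketch of the same word-counting idea, grouped by tweet author.
## The tweets, author names and the five-word lexicon are hypothetical stand-ins; the real analysis uses the bing lexicon from tidytext.

```r
# Hypothetical tweets; "author" stands in for the account that wrote each tweet.
tweets <- data.frame(
  author = c("user_a", "user_a", "user_b"),
  text = c("Greta is a great inspiration",
           "I love this brave speech",
           "What a terrible, boring stunt"),
  stringsAsFactors = FALSE
)

# Tiny hypothetical lexicon mapping words to a sentiment.
lexicon <- c(great = "positive", love = "positive", brave = "positive",
             terrible = "negative", boring = "negative")

# Tokenise: lowercase the text and split on anything that is not a letter.
words <- strsplit(tolower(tweets$text), "[^a-z]+")
tokens <- data.frame(
  author = rep(tweets$author, lengths(words)),
  word = unlist(words),
  stringsAsFactors = FALSE
)

# Look up each word; words missing from the lexicon become NA and are dropped.
tokens$sentiment <- unname(lexicon[tokens$word])
scored <- tokens[!is.na(tokens$sentiment), ]

# Count positive and negative words per author.
sentiment_counts <- table(scored$author, scored$sentiment)
print(sentiment_counts)
```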
summary_followers_2 <- top_followers[, c("name", "location", "screen_name", "description")]
#Creates a data frame called summary_followers_2 with the name, location, screen_name and description of each top follower.
#Selecting the columns directly keeps one row per follower, so each name stays aligned with its own location and description.
#(Building the columns from separate table() calls would sort each column independently and fail when the tables differ in length.)
View(summary_followers_2)
#########################################################################################################################################
############################################### Q3.) ByPassing Greta #####################################################################
#########################################################################################################################################
#######################################################################################################
# Retrieve the user IDs of Greta's followers and the people she follows
follower_ids <- get_followers("GretaThunberg", n = 1000)$from_id
following_ids <- get_friends("GretaThunberg", n = 1000)$to_id
# Get the profiles of Greta's followers and the people she follows
follower_profiles <- lookup_users(user = follower_ids)
following_profiles <- lookup_users(user = following_ids)
# Extract the screen names of the followers and the people Greta follows
follower_screen_names <- follower_profiles$screen_name
following_screen_names <- following_profiles$screen_name
# Find common screen names between followers and following
common_screen_names <- intersect(follower_screen_names, following_screen_names)
### There are no common accounts between the people following Greta and the people whom Greta is following.
### For a broader perspective, to see whether there are any indirect connections or shared interests among them, we investigate further using factors such as shared locations, similar interests or common affiliations.
# Reuse follower_profiles and following_profiles retrieved above rather than querying the API again
# Extract the screen names and locations of the followers and the people Greta follows
follower_screen_names <- follower_profiles$screen_name
follower_locations <- follower_profiles$location
following_screen_names <- following_profiles$screen_name
following_locations <- following_profiles$location
# Find common locations between followers and following, ignoring blank or missing locations
common_locations <- setdiff(intersect(follower_locations, following_locations), c("", NA))
# Filter profiles based on common locations
follower_profiles_common <- follower_profiles[follower_locations %in% common_locations, ]
following_profiles_common <- following_profiles[following_locations %in% common_locations, ]
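## A small sketch of matching on location, using hypothetical values.
## It shows why blank locations should be excluded: many profiles leave the field empty, so "" would otherwise count as a shared location.
## Locations are free text, so exact matching also misses variants such as "Stockholm, Sweden".

```r
# Hypothetical location fields for the two groups.
follower_locations  <- c("Stockholm", "", "London", "Sydney")
following_locations <- c("", "Stockholm", "New York")

# The raw intersection includes the empty string.
shared_raw <- intersect(follower_locations, following_locations)

# Drop blank and missing values before filtering profiles.
shared <- setdiff(shared_raw, c("", NA))
keep <- follower_locations %in% shared

print(shared_raw)
print(shared)
print(keep)
```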
library(igraph)
# Identify friendships among the followers (from_id follows to_id)
follower_friends <- get_friends(users = follower_profiles_common$id_str, n = 250)
follower_friends_common <- follower_friends[follower_friends$to_id %in% follower_profiles_common$id_str, ]
# Identify friendships among the accounts Greta follows
following_friends <- get_friends(users = following_profiles_common$id_str, n = 250)
following_friends_common <- following_friends[following_friends$to_id %in% following_profiles_common$id_str, ]
# Look-up table from user ID to screen name for every account in the graph
id_to_name <- c(setNames(follower_profiles_common$screen_name, follower_profiles_common$id_str),
                setNames(following_profiles_common$screen_name, following_profiles_common$id_str))
# Edge list: Greta's tie to every account, followed by the friendships found above
found_friendships <- rbind(follower_friends_common, following_friends_common)
edges <- data.frame(
  from = c(rep("Greta Thunberg", length(id_to_name)), id_to_name[as.character(found_friendships$from_id)]),
  to = c(unname(id_to_name), id_to_name[as.character(found_friendships$to_id)]),
  stringsAsFactors = FALSE
)
# Build an undirected graph with one vertex per account; simplify() removes duplicate edges
vertex_names <- unique(c("Greta Thunberg", unname(id_to_name)))
graph <- simplify(graph_from_data_frame(edges, directed = FALSE, vertices = data.frame(name = vertex_names)))
# Determine if any of the followers and following should be friends based on their background
# You can add logic here based on your criteria for determining friendship
# Print and plot the graph
print(graph)
plot(graph)
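## A minimal sketch of building such a graph from an edge list with igraph, on hypothetical data.
## The names f1, f2 (followers) and g1, g2 (accounts Greta follows) are invented, and the "kind" edge attribute is an illustrative way to mark suggested friendships.

```r
library(igraph)

# Hypothetical edge list: Greta's ties plus friendships found within each group.
# The f1/f2 friendship appears in both directions to mimic mutual following.
edges <- data.frame(
  from = c("Greta", "Greta", "Greta", "Greta", "f1", "f2", "g1"),
  to   = c("f1", "f2", "g1", "g2", "f2", "f1", "g2"),
  stringsAsFactors = FALSE
)
vertices <- data.frame(name = c("Greta", "f1", "f2", "g1", "g2"),
                       stringsAsFactors = FALSE)

# Undirected graph; simplify() collapses the duplicated f1-f2 edge.
g <- simplify(graph_from_data_frame(edges, directed = FALSE, vertices = vertices))

# Add a friendship that "should" exist based on background, tagged as suggested.
g <- add_edges(g, c("f1", "g1"), kind = "suggested")

print(g)
```

## Tagging the suggested edges keeps them distinguishable from observed friendships when plotting or computing statistics.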