
SOCIAL MEDIA ANALYSIS

TOOL

Session 2014-2018

Project Advisors
Dr. Syed Asim Ali

Submitted By

Muhammad Sameer Khan B14101060


Hasnain Ahmed Khan B14101039

Department of Computer Science


University of Karachi
Karachi
BS

Mar 15, 2019


A report submitted to the

Department of Computer Science

in partial fulfillment of the requirements for the

degree

Bachelor of Science

in

Computer Science

by

Muhammad Sameer Khan


Hasnain Ahmed Khan

University of Karachi

March 15, 2019


Acknowledgements
The success and final outcome of this project required a great deal of guidance and
assistance from many people, and we are extremely privileged to have received it throughout
the completion of our project. All that we have done is due to such supervision and
assistance, and we would not forget to thank them. We owe special gratitude to our venerable
teacher and project supervisor, Dr. Syed Asim Ali, for giving us the opportunity to do this
work as our final year project and for the support and guidance that enabled us to complete
it on time. We would like to express our deepest appreciation to him for providing this
support and guidance despite a very busy schedule managing corporate and university
affairs.

(Signed)

Muhammad Sameer Khan B14101060


Hasnain Ahmed Khan B14101039

Date

March 15, 2019


ABSTRACT

In the past few years, the popularity of social media has grown dramatically, with more
and more users sharing all kinds of information through different platforms. Companies
use social media platforms to promote their brands, professionals maintain a public
profile online and use social media for networking, and regular users discuss any topic
they like. More users also means more data waiting to be mined.
Social media has turned the world into a global village in which people across the planet
are brought together on a single platform, where they can share their thoughts, images and
feelings with the people they know or are in contact with. There are multiple social media
platforms on which a person can do all of this.
Some of the well-known social media platforms are Twitter, Facebook and Instagram, but we
consider Twitter the most popular and powerful of them because of its distinctive way of
connecting users, through which a user can get in touch with others or receive their
updates. Unlike Facebook, Twitter is built on following someone and being followed by
someone, which gives a user more reach over other users.
The main purpose of this project is to gather the data and activities of a Twitter user
from his or her Twitter timeline, analyze those activities and tweets through high-level
analysis and manipulation of the data, and quickly show the end-user a short report about
that user. From this report the end-user can judge the personality, nature and way of
thinking of that user and make decisions about them, such as whether or not to follow
them, or any other decision the end-user wishes to make.
CHAPTER 1 INTRODUCTION

OVERVIEW

The main purpose of Social Reach is to apply data mining techniques to Twitter using
Python in order to obtain interesting and useful insights about a user. In 2013, Twitter
reported a volume of more than 500 million tweets per day. These numbers are just the tip
of the iceberg when describing how the popularity of social media has grown, with more
users sharing more and more information through Twitter. This wealth of data gives data
mining practitioners a unique opportunity to apply their skills and draw interesting
facts and figures out of it.

OPPORTUNITIES
The key opportunity in developing data mining systems is to extract useful insights from
the data. The aim of the process is to answer interesting (and sometimes difficult)
questions using data mining techniques to enrich our knowledge about a particular
domain. This project offers many such opportunities. For example, with the help of this
product you can mine the account of any Twitter user, who may be a friend, a family
member, a favorite personality, a colleague or any other person with a legitimate Twitter
account, and you will be provided with up-to-date insights into that user's activity on
the account.
SOFTWARE PLATFORM
Social Reach is a web application built with Django (version 1.9), a web application
development framework for Python. The core logic of the program, which software engineers
usually call the back-end of a web application, is written in Python (version 3) following
the conventions of the language. The user interface, usually called the front-end, is
built with HTML, CSS, JavaScript and jQuery.

CHALLENGES
Along with these opportunities, there are challenges that the development team faced and
that affect the performance of this product. Social Reach is a web application, which
means anyone can use it from a mobile phone, laptop, tablet or personal computer, but an
internet connection is required for any user to connect to the application. The major
challenges are as follows:

1) Authentication

a. The user agrees with the consumer to grant access to the social media platform.
b. As the user doesn't give their social media password directly to the consumer, the
consumer has an initial exchange with the resource provider to generate a token
and a secret. These are used to sign each request and prevent forgery.
c. The user is then redirected with the token to the resource provider, which will ask
to confirm authorizing the consumer to access the user's data.
d. Depending on the nature of the social media platform, it will also ask to confirm
whether the consumer can perform any action on the user's behalf, for example,
post an update, share a link, and so on.
e. The resource provider issues a valid token for the consumer.
f. The token can then go back to the user confirming the access.
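
Step (b) above says that the token and the secret are used to sign each request. As a rough illustration only, the sketch below shows how a shared secret can produce a signature that changes whenever the request is altered. It uses HMAC-SHA1 from the Python standard library. The function name sign_request and all sample values are hypothetical, and the string being signed is much simpler than the full signature base string defined by OAuth 1.0a.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_request(base_string, consumer_secret, token_secret):
    """Return a base64-encoded HMAC-SHA1 signature for base_string.

    The signing key joins the two percent-encoded secrets with '&'.
    """
    key = "&".join(quote(s, safe="") for s in (consumer_secret, token_secret))
    digest = hmac.new(key.encode("utf-8"),
                      base_string.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical values, for illustration only.
original = sign_request("GET&timeline&count=10", "app-secret", "token-secret")
tampered = sign_request("GET&timeline&count=9999", "app-secret", "token-secret")

# A modified request no longer matches the original signature.
print(original != tampered)
```

In practice a client library such as Tweepy computes the signature, so application code rarely needs to do this by hand.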

2) Fetching Data

Twitter gives us access to some of its APIs (Application Programming Interfaces), which
we call from our back-end in order to get data. These APIs return the requested data in
JSON (JavaScript Object Notation) format. When using a third-party API, developers don't
need to worry about the internals of the component, only about how they can use it. By
the term Web API, we refer to a web service that exposes a number of URIs to the public,
possibly behind an authentication layer, to access the data. A common architectural
approach for designing this kind of API is called Representational State Transfer (REST).
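
To make the JSON step concrete, here is a small sketch of how a response body can be turned into Python dictionaries and lists with the standard json module. The response text is a made-up, heavily abbreviated stand-in for what the API returns. It is not a real tweet.

```python
import json

# Made-up, abbreviated response body in the general shape of a tweet object.
response_body = """
{
  "id_str": "1",
  "text": "Learning #Python and #Django",
  "user": {"screen_name": "example_user"},
  "entities": {"hashtags": [{"text": "Python"}, {"text": "Django"}]}
}
"""

# json.loads converts the JSON text into nested dicts and lists.
tweet = json.loads(response_body)

print(tweet["user"]["screen_name"])
print([tag["text"] for tag in tweet["entities"]["hashtags"]])
```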
3) Data Volume:
When dealing with social data, we are often dealing with big data. To understand the
meaning of big data and the challenges it entails, we can go back to the traditional
definition (3D Data Management: Controlling Data Volume, Velocity and Variety, Doug
Laney, 2001), also known as the three Vs of big data: volume, variety and velocity. Over
the years, this definition has been expanded by adding more Vs, most notably value, as
providing value to an organization is one of the main purposes of exploiting big data.
Regarding the original three Vs, volume means dealing with data that spans more than one
machine. This requires a different infrastructure from small data processing (for
example, in-memory processing). Volume is also associated with velocity, in the sense
that data is growing so fast that the concept of big becomes a moving target. Finally,
variety concerns how data is present in different formats and structures, often
incompatible with each other and with different semantics. Data from social media can
exhibit all three Vs. The data provided to us by Twitter is in JSON format, which means
we can classify it as semi-structured data.

4) Rate Limits:

The Twitter API limits the access it gives to applications. These limits are set on a
per-user basis or, to be more precise, on a per-access-token basis. This means that when
an application uses application-only authentication, the rate limits are counted globally
for the entire application, while with per-user authentication the limits are counted for
each access token, so the application can increase its overall number of requests to the
API.
The implication of hitting the API limits is that Twitter returns an error message
rather than the data we are asking for. Moreover, if we continue sending requests to the
API, the time required to regain regular access will increase, as Twitter could flag us
as potential abusers. When our application needs many API requests, we need a way to
avoid this. In Python, the time module, part of the standard library, allows us to
suspend code execution for an arbitrary period using the time.sleep() function.
For example, in pseudo-code:
# Assume first_request() and second_request() are defined.
# They are meant to perform an API request.
import time
first_request()
time.sleep(10)
second_request()
In this case, the second request will be executed 10 seconds (as specified by the sleep()
argument) after the first one.
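
The same idea can be wrapped in a small helper so that any sequence of requests is spaced out. This is a minimal sketch and not the project's actual code. The helper run_with_pause and the two stand-in request functions are hypothetical, and the sleep function is passed as a parameter only so that the pause can be replaced in tests.

```python
import time


def run_with_pause(requests, pause_seconds, sleep=time.sleep):
    """Call each function in requests, sleeping between consecutive calls."""
    results = []
    for index, request in enumerate(requests):
        if index > 0:
            sleep(pause_seconds)  # wait before every call except the first
        results.append(request())
    return results


# Stand-ins for real API calls (hypothetical).
def first_request():
    return "first"


def second_request():
    return "second"


print(run_with_pause([first_request, second_request], pause_seconds=0.01))
```

Tweepy's API class also accepts a wait_on_rate_limit argument that makes the client wait automatically when a limit is reached, which may remove the need for manual pauses in many cases.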

COLLECTING DATA FROM TWITTER

In order to interact with the Twitter APIs, we use a Python client that implements the
different calls to the API itself. There are several such clients, but none of them is
officially maintained by Twitter; they are backed by the open source community. Although
several of the options are almost equivalent, we chose Tweepy because it offers wider
support for different features and is actively maintained. We installed Tweepy version
3.3 in order to authenticate and to start fetching data from Twitter.
With Tweepy we define two functions through which we can authenticate our application and
create a Twitter client, that is, the API object needed to interface with Twitter.
All the code related to authentication and creating the Twitter client is written in the
authentication.py file, a custom module that we use frequently to authenticate and to
initialize the client. The code in the authentication.py module is as follows:

/// CODE GOES HERE

Here we define two functions. The first, get_twitter_auth(), takes no arguments and is
responsible for authentication. The second, get_twitter_client(), also takes no arguments
and creates an instance of tweepy.API, which is used for many different types of
interaction with Twitter.
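
The original listing is not reproduced in this report. As a hedged sketch of what a module with these two functions could look like, the code below reads the four credentials from environment variables and passes them to Tweepy's OAuthHandler and API classes. The environment variable names and the helper read_credentials are assumptions made for illustration, and Tweepy is imported inside the functions so the module can be loaded even where Tweepy is not installed.

```python
import os


def read_credentials(env=os.environ):
    """Read the four OAuth values; raises KeyError if one is missing.

    The variable names are assumptions, not taken from the project.
    """
    names = ("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
             "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_SECRET")
    return tuple(env[name] for name in names)


def get_twitter_auth():
    """Build the Tweepy OAuth handler from the stored credentials."""
    from tweepy import OAuthHandler  # imported lazily; requires Tweepy
    consumer_key, consumer_secret, access_token, access_secret = read_credentials()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    return auth


def get_twitter_client():
    """Create the tweepy.API instance used to talk to Twitter."""
    from tweepy import API  # imported lazily; requires Tweepy
    return API(get_twitter_auth())
```

Keeping the credentials outside the source code, as assumed here, avoids committing secrets to version control. Other storage choices would work equally well.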

CHAPTER 2 STRUCTURE OF TWEET


Internal Structure Of A Tweet:

RESTful APIs usually return data in the JSON format, and a tweet is a complex object with
many properties that we have to handle. We fetch hundreds or thousands of tweets from the
user's timeline and keep them in a Python list, a data structure with many built-in
features that help in manipulating data. Every tweet arrives as JSON, so every element of
the list should be a complete JSON object. We therefore use the _json attribute of each
status, which gives us a dictionary holding the JSON response for that status, and append
each one to the list. The complete structure of the tweet is as follows:
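
As an illustration of the list-building step described above, the sketch below collects the _json dictionary of each status into a list. The statuses are stand-in objects with made-up values, and the tweet shown is heavily abbreviated. A real tweet object carries many more fields than these.

```python
from types import SimpleNamespace


def statuses_to_dicts(statuses):
    """Collect the raw JSON dictionary of every status into a list."""
    return [status._json for status in statuses]


# Stand-in objects mimicking the _json attribute of Tweepy statuses.
# All values are made up and the structure is abbreviated.
fake_statuses = [
    SimpleNamespace(_json={
        "created_at": "Fri Mar 15 10:00:00 +0000 2019",
        "id_str": "1",
        "text": "Hello #world",
        "user": {"screen_name": "example_user"},
        "entities": {
            "hashtags": [{"text": "world", "indices": [6, 12]}],
            "user_mentions": [],
            "urls": [],
        },
        "retweet_count": 0,
        "favorite_count": 0,
    }),
]

tweets = statuses_to_dicts(fake_statuses)
print(sorted(tweets[0].keys()))
```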

//Structure of tweet

Getting Tweets From Timeline:


All the analysis, processing, refining and manipulation of data is done on the tweets of
the corresponding user, which we get from Twitter's API. As a Twitter user, your timeline
is the screen that you see when you log in to Twitter. It contains a sequence of tweets
from the accounts you have chosen to follow, with the most recent and interesting tweets
at the top. Here is the code snippet we use for getting the tweets from a user's
timeline:

//Code goes here

The preceding snippet shows how to use tweepy.Cursor to loop through the first XYZ items
of a user's timeline. First, we need to import Cursor and the get_twitter_client function
that was defined previously. We have wrapped the code in a function so that we can get
the tweets of any user just by passing that user's username to it.
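
Since the snippet itself is not included in this report, the following is only a sketch of how such a function could be written with tweepy.Cursor and the user_timeline call. The function name get_user_tweets, the limit parameter and the cursor_cls parameter are assumptions. cursor_cls exists only so the function can be exercised without Tweepy or network access.

```python
def get_user_tweets(client, username, limit, cursor_cls=None):
    """Return up to limit tweets from username's timeline as dictionaries.

    client is expected to be a tweepy.API instance, for example the one
    returned by get_twitter_client().
    """
    if cursor_cls is None:
        from tweepy import Cursor as cursor_cls  # requires Tweepy
    # Ask for up to 200 tweets per page; the cursor handles the paging.
    cursor = cursor_cls(client.user_timeline, screen_name=username, count=200)
    return [status._json for status in cursor.items(limit)]
```

A call such as get_user_tweets(get_twitter_client(), "some_username", 500) would then return a list of tweet dictionaries, subject to the rate limits discussed earlier.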

CHAPTER 3 ENTITY ANALYSIS

From the previous chapter we already know how to get tweets from a user's timeline, how
to store them in a list of tweets, and what the structure of a tweet looks like. This
chapter focuses on analyzing entities in tweets. We are going to perform some frequency
analysis using the data collected in the previous chapter. Slicing and dicing this data
allows us to produce some interesting statistics that can be used to gain insights into
the data and answer some questions.
Analyzing entities such as hashtags is interesting because these annotations are an
explicit way for the author to label the topic of the tweet.
// code for hashtag Frequency

The above code needs a list of tweets, which we already know how to create. We run a for
loop over all the tweets in the list, and for each tweet we call the get_hashtags(tweet)
function, which takes a tweet as an argument and returns the hashtags used in it. After
collecting the hashtags from all the tweets, we analyze the resulting list to determine
which hashtags the user uses most in his or her tweets, and return a list containing each
hashtag and its count.
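
The steps just described can be sketched as follows. This is an illustrative version and not the project's listing. It assumes each tweet is a dictionary with the usual entities and hashtags keys, and it lowercases hashtags so that different capitalizations are counted together, which is a design choice of this sketch. The name hashtag_frequency and the sample data are hypothetical.

```python
from collections import Counter


def get_hashtags(tweet):
    """Return the lowercased hashtags of one tweet dictionary."""
    entities = tweet.get("entities", {})
    return [tag["text"].lower() for tag in entities.get("hashtags", [])]


def hashtag_frequency(tweets, top_n=10):
    """Return (hashtag, count) pairs, most frequent first."""
    counts = Counter()
    for tweet in tweets:
        counts.update(get_hashtags(tweet))
    return counts.most_common(top_n)


# Made-up sample data in the shape of abbreviated tweet dictionaries.
sample_tweets = [
    {"entities": {"hashtags": [{"text": "Python"}, {"text": "Django"}]}},
    {"entities": {"hashtags": [{"text": "python"}]}},
    {"entities": {"hashtags": []}},
]

print(hashtag_frequency(sample_tweets))  # [('python', 2), ('django', 1)]
```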
The previous script gives an overview of the hashtags most frequently used by the user,
but we want to dig a little deeper. We can, in fact, produce more descriptive statistics
that show how the user uses hashtags:

// code for hashtag analysis

The above code gives interesting insights into the user's use of hashtags. Just by
looking at a user's timeline, we cannot easily estimate how many tweets contain hashtags
or how many hashtags each of those tweets contains. The script answers both questions
directly, and the results can be surprising.
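
A minimal sketch of such descriptive statistics is given below, under the same assumption that each tweet is a dictionary with entities and hashtags keys. The function name hashtag_stats, the keys of the returned dictionary and the sample data are all hypothetical.

```python
from collections import Counter


def hashtag_stats(tweets):
    """Summarise how many tweets use hashtags and how many each one uses."""
    per_tweet = [len(t.get("entities", {}).get("hashtags", [])) for t in tweets]
    with_hashtags = sum(1 for n in per_tweet if n > 0)
    total = len(per_tweet)
    return {
        "total_tweets": total,
        "tweets_with_hashtags": with_hashtags,
        "percent_with_hashtags": 100.0 * with_hashtags / total if total else 0.0,
        # Maps "number of hashtags in a tweet" to "number of such tweets".
        "tweets_by_hashtag_count": dict(Counter(n for n in per_tweet if n > 0)),
    }


# Made-up sample data: one tweet with two hashtags, one with one, two with none.
sample = [
    {"entities": {"hashtags": [{"text": "a"}, {"text": "b"}]}},
    {"entities": {"hashtags": [{"text": "a"}]}},
    {"entities": {"hashtags": []}},
    {},
]

print(hashtag_stats(sample))
```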
