Abstract — The aim of this paper is to elucidate the implications of Machine Learning in detecting the threat of phishing. Machine learning can provide an efficient method for detecting whether a website or pop-up is a phishing website. The term phishing comes from “fishing”, probably influenced by phreaking, and alludes to the use of increasingly sophisticated lures to "fish" for users' financial information and passwords. There are many different techniques to combat phishing, including legislation and technology created specifically to protect against phishing.
This paper presents various methods to detect phishing attacks using Machine Learning. Machine learning is widely used and is evolving at a rapid rate in today’s technological world. Phishing attacks could be detected efficiently with the application of Machine Learning.
Keywords— Machine Learning, Phishing attacks, Phishers, Domain, Websites, Decision Tree.
INTRODUCTION
Phishing is a fraudulent practice of sending messages or emails to prominent entities of an organization to persuade them to reveal personal information, company-related information, and other sensitive information.
Generally, in phishing, the victim receives a message via some communication medium which appears to have been sent by a known contact or organization. Such messages look authentic but contain malware to steal sensitive information.
Users who are not alert fall prey to such malware, and their private information is compromised. According to the 3rd Microsoft Computing Safer Index Report, released in February 2014, the annual worldwide impact of phishing could be as high as $5 billion [3].
The more sophisticated a phishing email becomes, the more difficult it is to detect. Fortunately, Machine Learning offers a sophisticated approach to detecting phishing.
Zhang et al. (2012) claimed that the method used for phishing varies from region to region: Chinese phishers register a new domain to host phishing websites, while American phishers hack an existing domain to deploy the phishing website [1].
Figure 1: The Phishing Procedure
Every phisher carries out phishing using a generic approach. The process can be elaborated as follows:
1. Planning: In this step, the phishers decide which organization to target and what information to get hold of. They also decide the strategy for obtaining the victim's private information.
2. Setup: After the victim has been decided, the phishers create the basic setup to attack the victim and persuade him to give up the relevant information. This often involves the creation of e-mails, websites, etc.
3. Attack: After the creation of the setup, the phishers deploy the website or send the e-mail to the victim.
4. Collection: If the victim falls into the trap, the phishers collect the information leaked by the victim.
5. Illicit use of information: Phishers use the information to commit fraud, identity theft, and many other illicit activities.
I. PHISHING STRATEGIES
Phishing has established itself as a major security threat in today’s web-driven world. The people who carry out phishing, colloquially known as phishers, choose from a variety of ways to compromise the data security of millions of web users around the world.
The primary reason many web servers fall prey to hosting phishing websites is their vulnerability. Weaknesses in a web server are exploited to host a phishing website without the knowledge of the owner.
It is also possible for a phisher to host a legitimate and independent server just for carrying out phishing activities.
Based on the mode of attack, phishing can be classified into the following types:
1. Deceptive Phishing: This is the most common phishing approach, in which the phishers pose as a legitimate organization in order to steal someone’s login credentials or other personal information. This also includes “Dropbox phishing” and “Google Docs phishing”.
2. Spear Phishing: This approach is a refinement of the deceptive approach. Here, the phishers lure the victim into giving up sensitive personal information by posing as a sender with whom the victim has a connection. This includes the “Whaling” attack.
3. Pharming: The Internet uses DNS to locate servers. DNS converts alphabetical website names into the numerical IP addresses of the servers. In pharming, the phisher changes the IP address associated with the website name and can thereby redirect the user to a malicious website to steal his information.
II TRADITIONAL METHODS FOR PHISHING DETECTION
Criminal hackers have long used phishing to obtain secret and sensitive information from users. Phishing websites, emails, and ads are well disguised and closely replicate the ones the user trusts enough to enter sensitive information into.
But no matter how complex and difficult to detect, these can never be perfect. There are simple things a user can look out for when determining whether a program is legitimate or a phishing candidate.
Figure 2: Example of Phishing Email
[...] such random links. This could potentially be a phishing attack.
3. Check the links [6]: If you encounter a link or website that seems suspicious, you can copy the link and check it on one of the websites that report whether a website or link is malicious.
4. Secure connection [4]: This is usually identified by a green area in the address bar, along with https in the URL.
5. Check the grammar [4]: Usually, if a site, pop-up, or link is a phishing attack, the grammar is very poor, which is a clear signal to stay away from that site or link.
6. Keep software updated [5]: Phishing malware usually depends on system bugs to attack. If the system is updated regularly, such bugs are fixed, which reduces the chances of an attack.
7. Beware of offers [13]: Most malware lures the user with a “too good to be true” offer or some exciting deal. Never fall prey to such false schemes. When something seems too good to be true, it usually is.
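The "secure connection" check among the traditional methods (https in the URL) can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the paper: the function names `uses_https` and `link_summary` and the example URLs are assumptions introduced here. A missing https is only a warning sign, and its presence alone does not establish that a site is legitimate.

```python
from urllib.parse import urlparse


def uses_https(url):
    """Return True when the URL's scheme is https (hypothetical helper)."""
    return urlparse(url).scheme == "https"


def link_summary(url):
    """Collect the parts of a link a user would inspect by hand.

    The returned keys are illustrative assumptions, not a standard format.
    """
    parsed = urlparse(url)
    return {
        "scheme": parsed.scheme,
        "hostname": parsed.hostname,
        "uses_https": parsed.scheme == "https",
    }


if __name__ == "__main__":
    # Example URLs are placeholders chosen for illustration only.
    for link in ("https://example.com/login", "http://example.com/login"):
        print(link, link_summary(link))
```

The hostname returned by `link_summary` is the part a user would compare against the organization they expect, or paste into a link-checking service.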
i. Meta Tags
ii. Images
iii. Page title
These features give out information like:
i. Website category
ii. Requirement of login through third-party domains.
iii. Information about the traffic.
All three of the above criteria, when scanned, give a clear picture of the website we’re using.
IV DETECTION PROCESS
Let's say we select the length of the URL as the feature. The websites will be divided into two sets, denoting long and short URLs. The entropy and the gain will be calculated. This calculation is repeated for every feature that is relevant to us. When all the calculations are done, a decision tree is created. As we traverse the tree downwards, the nodes become increasingly pure.
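The entropy and gain calculation for the URL-length feature can be sketched as follows. This is a minimal illustration under stated assumptions: the paper gives no formula, so the standard Shannon entropy and information gain definitions are used here, and the 30-character threshold, the label names, and the sample URLs are hypothetical.

```python
import math
from collections import Counter

# Assumed cut-off between "short" and "long" URLs; not a value from the paper.
LONG_URL_THRESHOLD = 30


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(labels, feature_values):
    """Entropy of the whole set minus the weighted entropy of each subset."""
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


def url_length_feature(url, threshold=LONG_URL_THRESHOLD):
    """Map a URL to the two-valued feature described in the text."""
    return "long" if len(url) > threshold else "short"


if __name__ == "__main__":
    # Hypothetical toy data set of (URL, label) pairs.
    samples = [
        ("https://example.com/", "legitimate"),
        ("https://example.org/home", "legitimate"),
        ("http://example.net/secure/login/update/account/verify", "phishing"),
        ("http://example.com/a/b/c/d/e/confirm-password-now", "phishing"),
    ]
    labels = [label for _, label in samples]
    feature = [url_length_feature(url) for url, _ in samples]
    print("gain for URL length:", information_gain(labels, feature))
```

Repeating `information_gain` for every candidate feature and splitting on the one with the highest gain is the usual way such a tree is grown; a subset whose entropy is 0 is a pure node.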
REFERENCES
[1] Oluwatobi Akanbi and Elahe Fazeldehkordi, A ML approach to phishing detection and defense, 2014 edition.
[3] https://towardsdatascience.com/phishing-domain-detection-with-ml-5be9c99293e5
[4] https://www.itgovernance.co.uk/blog/5-ways-to-detect-a-phishing-email
[5] https://ssd.eff.org/en/module/how-avoid-phishing-attacks
[6] https://www.makeuseof.com/tag/4-general-methods-detect-phishing-attacks/
[7] https://www.sciencedirect.com/science/article/pii/S0957417418306067
[8] https://link.springer.com/chapter/10.1007/978-3-319-72598-7_20
[9] https://www.icann.org/resources/pages/phishing-2013-05-03-en
[10] https://www.business.com/articles/machine-learning-spear-phishing/
[11] https://www.globalsign.com/en-in/blog/how-to-spot-a-fake-website/