

1. Introduction

Because bots are so widespread, Web applications must try to determine whether they are interacting
with an actual human user or with an automated tool. Automated tools can be used for many malicious
purposes, such as scraping, spamming, and application-level DoS attacks. More specifically, attackers may
use automated tools to post comments in blogs and forums, create fake accounts, harvest mailing lists, and
advertise products. A technique named CAPTCHA is used for this purpose; it is a common security measure
against automated attacks at present.

CAPTCHA

CAPTCHA is an acronym for Completely Automated Public Turing Test to Tell Computers and Humans
Apart. It is a test that determines whether a system is interacting with a human or with a computer. In this
test users perform some task, such as reading digits or words or listening to speech, and are then asked to
type what they saw or heard. The picture or sound is usually distorted in various ways. For humans the
task is very easy to pass, but for bots it is difficult and time-consuming to complete. In other words,
CAPTCHA is a class of HIPs (Human Interactive Proofs) that has been able to efficiently prevent Web bots
from getting unauthorized access to Web services. In the Turing test there were two players and one human
judge. The human judge had to find which player was human and which was a computer, and did so by
asking the players a series of questions. CAPTCHA is similar to the Turing test; the difference is that here
the judge is a computer and the participants are Web bots and humans. The computer has to distinguish
between them. CAPTCHAs are based on AI problems that cannot be solved by current computer programs
and bots but can be solved by humans. A CAPTCHA model must follow a standard, and some principles
are widely maintained in every model.

History of CAPTCHAs

• In 1996, Moni Naor suggested the use of an Automated Turing Test to distinguish between human
users and bots.
• In 1997, Andrei Broder et al. developed a mechanism to distinguish between human users and
computer programs. In the same year, the AltaVista website used this method to block bot
programs from entering by displaying a distorted English word to the user and asking the user to
copy it.
• In 2002, Broder announced that a CAPTCHA system had been in place for more than a year and
had reduced the number of spam advertising URLs by more than 95%.
• In 2000, the term CAPTCHA was coined by the team led by Manuel Blum and Luis von Ahn at
Carnegie Mellon University.
• In 2003, Baird and Monica Chew, from California, designed the Baffle Text CAPTCHA.
• In 2004, the Yahoo website used a simple version of the EZ-Gimpy method.

Why CAPTCHA

CAPTCHAs arose to keep bots from abusing Web sites and search engines. With the help of technology,
some people try to game the system: they want to exploit weaknesses in the computers running the site.
While these individuals probably make up a minority of all the people on the Internet, their actions can
affect millions of users and Web sites. For example, a free e-mail service might find itself bombarded by
account requests from an automated program. That automated program could be part of a larger attempt to
send out spam mail to millions of people. The CAPTCHA test helps identify which users are real human
beings and which ones are computer programs. Spammers are constantly trying to build algorithms that
read the distorted text correctly, so strong CAPTCHAs have to be designed and built to thwart their
efforts.

Basic properties of CAPTCHAs


CAPTCHAs are based on AI problems that cannot be solved by current computer programs or bots but can
be solved by humans. There are three basic properties that CAPTCHAs must satisfy:

1. It should be easy enough for a user to participate and pass the test.
2. It should be easy for the tester machine to generate and grade.
3. It should accept virtually all human users and reject software robots.
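
The second property, that a challenge be cheap for the server to generate and grade, can be illustrated
with a minimal sketch. The function names, the alphabet, and the case-insensitive comparison are
assumptions made for illustration; rendering and distorting the text as an image is omitted.

```python
import secrets
import string

# Hypothetical alphabet: upper-case letters and digits, minus easily confused ones.
ALPHABET = "".join(c for c in string.ascii_uppercase + string.digits
                   if c not in "O0I1")


def generate_challenge(length=6):
    """Return a random answer string; a server would render it as a distorted image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def grade(expected, response):
    """Accept the response if it matches the answer, ignoring case and outer spaces."""
    return expected.upper() == response.strip().upper()
```

Both steps are a few string operations, so the cost falls on the solver rather than on the server.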

2. Different Types of CAPTCHA

Websites use different types of CAPTCHAs as a security measure to distinguish human users from
bots. Here we discuss text-based, image-based, audio-based, and video-based CAPTCHAs.

1. Text-Based CAPTCHA:
In a text-based CAPTCHA, characters are distorted and connected to prevent recognition by bots. The
security of a text-based CAPTCHA is increased by adding noise and distortion and by arranging characters
more tightly. Usability is always an important issue in designing a CAPTCHA. Successful text CAPTCHAs
used by Microsoft, Yahoo, and Google use techniques that resist segmentation attacks, such as random
arcs, connected random lines, and crowded characters.

A. Drag and drop CAPTCHA: Desai and Patadia proposed the DnD CAPTCHA, which requires the use of a
computer mouse to answer. The user has to solve a normal text-based CAPTCHA but cannot type the
answer to the challenge in a text box. Instead, the user has to drag and drop character blocks into
their respective blank blocks as they appear in the image.

Figure 1: Drag and drop CAPTCHA

B. Gimpy: Gimpy was built by CMU in collaboration with Yahoo for their Messenger service. Gimpy is
based on the human ability to read extremely distorted text and the inability of computer programs to
do the same. Gimpy randomly chooses ten words from a dictionary and displays them in a distorted
and overlapped manner. It then asks the user to enter a subset of the words in the image.

Figure 2: Gimpy CAPTCHA
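
The Gimpy procedure described above can be sketched as follows. The word list, the function names,
and the rule that three correct words are enough to pass are assumptions for illustration; the
overlapping and distortion of the rendered words are not shown.

```python
import random

# Hypothetical stand-in for a real dictionary file.
DICTIONARY = ["apple", "river", "stone", "cloud", "chair", "green", "light",
              "paper", "music", "horse", "table", "house", "flower", "bread"]


def make_gimpy_challenge(rng, count=10):
    """Pick `count` distinct dictionary words to be drawn overlapped and distorted."""
    return rng.sample(DICTIONARY, count)


def grade_gimpy(shown_words, typed_words, required=3):
    """Pass if at least `required` distinct typed words appear among the shown words."""
    shown = {w.lower() for w in shown_words}
    typed = {w.strip().lower() for w in typed_words}
    return len(shown & typed) >= required
```

Comparing sets rather than lists means that typing the same word several times counts only once.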



C. Ez-Gimpy: Ez-Gimpy is a simplified version of the Gimpy CAPTCHA, adopted by Yahoo on their signup
page. Ez-Gimpy takes a single word from a dictionary and displays it in a distorted manner. The user
has to identify the word.

Figure 3: Ez-Gimpy CAPTCHA

D. Baffle Text: This was developed by Henry Baird at the University of California at Berkeley. It picks
random letters to create a pronounceable word that is not in the dictionary. The text is then
distorted and presented to the user, who has to identify it. It overcomes a drawback of the Gimpy
CAPTCHA: because Gimpy uses dictionary words, a bot can break it more easily using brute force.

Figure 4: Baffle Text CAPTCHA
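
A minimal sketch of generating a pronounceable non-dictionary word is given below. Alternating
consonants and vowels is a simplification chosen here for illustration and is not claimed to be the
generator that Baffle Text itself uses; the letter pools and function names are likewise assumptions.

```python
import random

CONSONANTS = "bcdfghjklmnprstvw"
VOWELS = "aeiou"


def pronounceable_word(rng, length=6):
    """Build a pseudo-word by alternating consonants and vowels."""
    letters = []
    for i in range(length):
        pool = CONSONANTS if i % 2 == 0 else VOWELS
        letters.append(rng.choice(pool))
    return "".join(letters)


def non_dictionary_word(rng, dictionary, length=6):
    """Draw pseudo-words until one is found that is not a real dictionary word."""
    while True:
        word = pronounceable_word(rng, length)
        if word not in dictionary:
            return word
```

Rejecting real words is what removes the dictionary shortcut available against Gimpy.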

E. MSN CAPTCHA: Microsoft uses a different CAPTCHA for services provided under the MSN umbrella,
also known as the MSN Passport service CAPTCHA. It uses eight characters (upper-case letters and
digits), which are distorted by warping.

Figure 5: MSN CAPTCHA



F. reCAPTCHA:
To archive human knowledge and to make information more accessible to the world, multiple
projects are currently digitizing physical books that were written before the computer age. The book
pages are photographically scanned and then transformed into text using Optical Character
Recognition (OCR). The transformation into text is useful because scanning a book produces
images, which are difficult to store on small devices, expensive to download, and cannot be
searched. The problem is that OCR is not perfect. reCAPTCHA improves the process of digitizing
books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs
for humans to read. Two words are shown: one that the machine knows and one that it does not.
If the user correctly types the word that the machine knows, the system assumes that the answer
for the other word is also correct.

Figure 6: reCAPTCHA

2. Image-Based CAPTCHA:

In image-based CAPTCHAs the user is given a small set of images to name, distinguish, or inspect for
anomalies. The user does not need to read; a simple click is enough. Bots have difficulty identifying
images: they cannot recognize human facial expressions or colors as efficiently as humans do.
Various implementations of image-based CAPTCHAs are as follows:

a. Bongo: Bongo is named after M.M. Bongard, who published a book of pattern recognition
problems in the 1970s. Bongo asks the user to solve a visual pattern recognition problem. Bongo
displays two series of blocks, the left and the right series. The blocks in the left series differ from
those in the right, and the user must identify the characteristic that sets the two series apart. A
possible left and right series is shown in Figure 7. These two sets are different because everything
on the left is drawn with thick lines and everything on the right with thin lines. The system then
presents a single block to the user, who has to identify the set to which it belongs.

Figure 7: Bongo CAPTCHA

b. Asirra: In October 2007 Elson et al. developed an image-based authentication system called Asirra
(Animal Species Image Recognition for Restricting Access). In this CAPTCHA the user has to identify
the cats in a set of 12 photographs of both cats and dogs. Asirra is not scalable. It is powered by
over three million photos from a unique partnership with Petfinder.com.

Figure 8: Asirra CAPTCHA

c. Google Image Orientation CAPTCHA: In this CAPTCHA the user has to adjust randomly rotated
images to their upright orientation. Google shows the three sample images below:

Figure 9: Google Image Orientation CAPTCHA

According to Google, sample A is easy for humans to orient upright, but bots may also succeed here
because they can use face detection. Sample B is the most useful: adjusting it to an upright
orientation is easy for humans but not for bots. Sample C is less useful because it is hard for
humans, too, to adjust correctly.
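
Grading an orientation answer reduces to checking how far the corrected image is from upright. The
sketch below is an illustration under assumed conventions: angles are in degrees, 0 means upright, and
the 15-degree tolerance and function names are hypothetical rather than values from the Google system.

```python
def angular_error(angle_degrees):
    """Smallest absolute distance, in degrees, between an angle and upright (0)."""
    wrapped = angle_degrees % 360
    return min(wrapped, 360 - wrapped)


def grade_orientation(initial_rotation, user_rotation, tolerance=15):
    """Pass if the user's correction brings the image within `tolerance` of upright."""
    return angular_error(initial_rotation + user_rotation) <= tolerance
```

Wrapping the angle first means that 350 degrees and -10 degrees are both treated as 10 degrees off
upright.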

3. Multi-Modal CAPTCHA:

This CAPTCHA uses both text-based and image-based systems. An image is given together with four
attached text labels, and the user has to select the right label.

PIX: PIX, an example of a multi-modal CAPTCHA, is a program that has a large database of labeled images.
All of these images are pictures of concrete objects (a horse, a table, a house, a flower, etc.). The
program picks an object at random, finds random images of that object in its database, distorts them at
random, presents them to the user, and then asks the question "What are these pictures of?" Current
computer programs are not able to answer this question.

Figure 10: Multi-Modal CAPTCHA

4. Audio-Based CAPTCHA:

Audio-based CAPTCHAs rely on sound. They were developed by Nancy Chan for visually disabled users and
contain downloadable audio clips. An audio CAPTCHA takes a random sequence drawn from recordings of
simple words or numbers, combines them, and adds some disturbance and noise. The recording is played
when the user clicks a button provided on the Web page, and the CAPTCHA system then asks the user to
enter the words and/or numbers in the recording. Audio CAPTCHAs are more difficult to solve, harder to
internationalize, and more demanding in terms of time and effort than text and image CAPTCHAs.

Figure 2.10: Audio-Based CAPTCHA

5. Video-Based CAPTCHA:
Video offers the greatest advancement in CAPTCHA technology since it was first introduced ten years
ago, due to the simple fact that motion in video is very hard for computers to read, yet extremely easy
for humans. In video-based CAPTCHAs the user provides three words (tags) that describe a video. If one
of the user's tags belongs to a set of automatically generated ground-truth tags, the challenge is
passed. YouTube, which currently stores and indexes close to 150 million videos, is used as the video
dataset.

Figure 11: Video-Based CAPTCHA

6. Question-based CAPTCHA

A question-based CAPTCHA merges OCR-based and non-OCR-based methods. A simple mathematical
problem is shown to the user in the form of images, and the user is asked to answer it. The
images are selected randomly from a database of images and can be changed. For example, "There are 5
apples, 3 pencils and 4 bananas. How many fruits are there on the table?" Figure 12 shows an example of
a question-based CAPTCHA.

Figure 12. Example of a Question-based CAPTCHA

Also, the CAPTCHA can place an image in the question instead of text, as in the example in Figure 13.

Figure 13. Example of using images in the question.

Humans can answer this question easily, whereas it is very difficult for computers to recognize the
phrases and shapes that are shown and also to understand the question. This method has the following
advantages:

a) It is easy, because the user only needs to type one number as the answer; hence it is also very
time-effective for the user.

b) The keyboard is not important, because the user only has to enter a number. The method is therefore
useful on devices that have no keyboard or on which a keyboard is difficult to use, such as mobile
phones and Pocket PCs.

c) It does not require any processing to be executed by the client, so it can be used on small
devices and on devices with limited resources.
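
The question-based scheme described above can be sketched as follows. This is an illustration only:
the object lists, the count range, and the function names are assumptions, and the sketch produces the
question as plain text, whereas the method in the text renders it as images.

```python
import random

FRUITS = ["apples", "bananas", "oranges"]      # hypothetical object database
OTHER_OBJECTS = ["pencils", "books", "cups"]


def make_question(rng):
    """Return (question_text, expected_answer) for a fruit-counting question."""
    fruits = rng.sample(FRUITS, 2)
    other = rng.choice(OTHER_OBJECTS)
    counts = {name: rng.randint(1, 9) for name in fruits + [other]}
    text = (f"There are {counts[fruits[0]]} {fruits[0]}, {counts[other]} {other} "
            f"and {counts[fruits[1]]} {fruits[1]}. "
            "How many fruits are there on the table?")
    return text, counts[fruits[0]] + counts[fruits[1]]


def grade_question(expected, response):
    """Accept the response only if it parses as an integer equal to the answer."""
    try:
        return int(response.strip()) == expected
    except ValueError:
        return False
```

For the example in the text (5 apples, 3 pencils, 4 bananas), the expected answer is 9, since the
pencils are not fruit.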

3. Applications of CAPTCHA:

The CAPTCHA test is a simple visual test or puzzle that any human can solve but Internet bots cannot,
so bots cannot get permission to access the services provided by websites. CAPTCHA has the following
practical security applications on the Web:

I. Protect Online Polls: In November 1999, slashdot.com released an online poll asking which was the
best graduate school in computer science (a dangerous question to ask over the Web!). As is the case
with most online polls, the IP addresses of voters were recorded in order to prevent single users from
voting more than once. However, students at Carnegie Mellon found a way to stuff the ballots by using
programs that voted for CMU thousands of times, and CMU's score started growing rapidly. The next
day, students at MIT wrote their own voting program, and the poll became a contest between voting
"bots". MIT finished with 21,156 votes, Carnegie Mellon with 21,032, and every other school with fewer
than 1,000. Can the result of any online poll be trusted? Not unless the poll ensures that only humans
can vote.

II. Protecting Web Registration: Many companies offer free email and other services. Until
recently, these service providers suffered from a serious problem: bots. Because bots are programs,
they can easily sign up for thousands of email accounts every minute and waste Web space. To avoid the
misuse of such services, users have to prove that they are human by solving a CAPTCHA. For example,
Yahoo! uses a CAPTCHA to prevent bots from registering for accounts. Their CAPTCHA asks users
to read a distorted word such as the one shown below:

Figure 12: The Yahoo! CAPTCHA

III. E-Ticketing: Ticket brokers like Ticketmaster use CAPTCHA applications. These applications
help prevent ticket scalpers from bombarding the service with massive ticket purchases for big
events. Without some sort of filter, it is possible for a scalper to use a bot to place hundreds or
thousands of ticket orders in a matter of seconds. Legitimate customers become victims as events sell
out minutes after tickets become available. Scalpers then try to sell the tickets above face value.
While CAPTCHA applications do not prevent scalping, they do make it more difficult to scalp
tickets on a large scale.

IV. Prevent dictionary attacks: CAPTCHAs can be used to prevent dictionary attacks on password systems.
The idea is simple: prevent a computer from iterating through the entire space of passwords
by requiring a human to type the password.

V. Prevent deceiving advertisers: Websites often carry advertising for other sites and get paid when a
user visits the advertised website. To deceive advertisers, bots generate fake visits. As a result,
advertisers have to pay for ads that were not viewed by humans. CAPTCHAs are used to solve
this problem.

VI. Preventing comment spam: Many programs submit large numbers of automated posts to
increase the search engine rank of a site. A CAPTCHA can be required before a post is submitted to
ensure that the post is made by a human. A CAPTCHA will not stop someone who is determined to
post a rude message or harass an administrator, but it will help prevent bots from posting
messages automatically.

VII. Email spam: CAPTCHAs also provide a defense against email spam and worms. Before sending a mail,
the user has to solve a CAPTCHA to prove that they are human.

VIII. Search Engine Bots: Some Web pages should not be indexed by search engines. There is an HTML tag
that asks search engine bots not to read a Web page, but the tag does not guarantee that bots will not
read it. To make sure that bots cannot enter such pages, CAPTCHAs are needed.

IX. As a tool to verify digitized books: reCAPTCHA is an application that increases the value of
CAPTCHA. It harnesses users' responses in CAPTCHA fields to verify the contents of a scanned piece
of paper. Because computers are not always able to identify words from a digital scan, humans have to
verify what a printed page says. It is then possible for search engines to search and index the contents
of a scanned document. This is how it works: the application already recognizes one of the words. If
the visitor types that word into a field correctly, the application assumes that the second word the
user types is also correct. That second word goes into a pool of words that the application will present
to other users. As each user types in a word, the application compares the word to the original answer.
Eventually, the application receives enough responses to verify the word with a high degree of
certainty. That word can then go into the verified pool.
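
The verification loop described in item IX can be sketched as a small vote counter. The class name, the
agreement threshold, and the case-insensitive matching are assumptions made for illustration; the
source does not state how many matching responses the real service requires.

```python
from collections import Counter


class WordVerifier:
    """Tracks human transcriptions of one word that OCR could not read."""

    def __init__(self, votes_needed=3):
        self.votes_needed = votes_needed  # hypothetical agreement threshold
        self.votes = Counter()

    def submit(self, control_answer, control_typed, unknown_typed):
        """Record the unknown-word answer only if the known word was typed correctly."""
        if control_typed.strip().lower() != control_answer.lower():
            return False  # the user failed the known word, so ignore the other answer
        self.votes[unknown_typed.strip().lower()] += 1
        return True

    def verified_word(self):
        """Return the transcription once enough users agree on it, else None."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= self.votes_needed else None
```

The known word acts as the actual human test; the unknown word contributes only a vote.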

4. Comparison Among Different Types of CAPTCHA:

• Strengths and weaknesses of text-based CAPTCHA


Text-based CAPTCHA is the simplest type of CAPTCHA. It was the first type to be invented and was
implemented in email services and search engines. A text-based CAPTCHA consists of English letters and
numbers. Because these characters are limited, bot programs and hackers can solve a text CAPTCHA by
designing programs that scan the text and type it in the appropriate place. This problem is addressed by
modifying the characters, for example by adding noise, rotating and scattering letters, presenting
characters as corrupted and distorted letters, or rendering characters in 3D. These modifications cause
problems for users when they try to identify the correct characters, because some characters have similar
shapes after modification. In addition, the English language itself is an issue: some users cannot
understand some types of text-based CAPTCHA, such as clickable CAPTCHA and Strangeness in Sentences
CAPTCHA.

• Strengths and weaknesses of image-based CAPTCHA

In this type, several similar images are presented to the user, who selects the suitable image according
to the question under the CAPTCHA. Although this type is simple, users can face some problems when trying
to solve an image-based CAPTCHA:

1. Users who have low vision or a learning disability will have difficulty when attempting to
solve this CAPTCHA.

2. The probability that a bot program breaks the CAPTCHA increases as the number of choices decreases,
so it is better to offer more options to make the CAPTCHA strong; however, this consumes more of the
database.

3. This CAPTCHA is available only in English, so only English speakers and others who have
some knowledge of English vocabulary can solve it.
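
Point 2 above can be made concrete with a short calculation. The sketch assumes a bot that picks one
option uniformly at random and challenges that are independent of one another; the function name is
hypothetical.

```python
def random_guess_success(num_choices, rounds=1):
    """Chance that a bot guessing uniformly at random passes `rounds` challenges."""
    return (1 / num_choices) ** rounds
```

With 4 options a blind guess succeeds 25% of the time, and with 12 options about 8% of the time.
Chaining several challenges lowers the rate further: three rounds of 4 options give (1/4)^3, about 1.6%.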

• Strengths and weaknesses of audio-based CAPTCHA

This type of CAPTCHA is designed for people who have visual problems: recorded words are played to the
user, who should type the words that he or she has heard. Although audio CAPTCHA makes the test
available to visually disabled users, users may face some issues:

1. The noise added to the recorded words, which makes the CAPTCHA stronger and protects it from being
broken by bot programs, can confuse the user and lead to a wrong answer.

2. Audio CAPTCHA is presented in English, so only users with English ability can solve this
type.

3. In English some letters have similar sounds, such as J and G or C and K. This can confuse
the user.

• Strengths and weaknesses of video-based CAPTCHA

A video-based CAPTCHA presents a short movie in which a person performs some kind of action, and the
user must select the correct description from a list.

The video in this method is large, so users will have trouble downloading it from the Internet. This
problem can lead users to leave the website or email service that they were trying to use. Another issue
affects users who do not have English language ability, because video CAPTCHA is available in English
only.

• Strengths and weaknesses of puzzle-based CAPTCHA

A puzzle CAPTCHA presents parts of images, and the user has to merge the parts or identify a
particular part of an image. This mechanism takes more time to solve, so the user may get bored and leave
the website. Furthermore, users who have low vision will have difficulty identifying the images.

5. Different Types of Attacks

CAPTCHAs are an important and widely used modern Internet technology for protecting websites.
Unfortunately, in reality CAPTCHAs have not been quite so successful. Designers of malicious automated
agents have shown that they can break CAPTCHAs using various mechanisms. The different types of
attack that attackers use to break CAPTCHAs are:

1. Dictionary Attack: This is a method of breaking into a password-protected computer or server by
systematically entering every word in a dictionary as a password. There are two types of dictionary
attack:

• Online Dictionary Attack: In an online dictionary attack, each password attempt is sent to the
verifier to check.
• Offline Dictionary Attack: In an offline dictionary attack, the attacker knows something that allows
him to determine the correctness of a password by himself.

Figure 13: Dictionary Attack

2. Pixel Count Attack: In this attack, the number of foreground pixels in each segmented character is
counted and looked up in a pre-computed table to determine the character in the segment.

Figure 14: Pixel Count Attack
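
The lookup step of a pixel count attack can be sketched on a toy font. The 3x3 glyphs, the "#" marker
for foreground pixels, and the function names are assumptions for illustration; the sketch also assumes
that the image has already been segmented into characters and that every character has a unique pixel
count.

```python
# Toy font: each glyph is a list of rows, "#" marks a foreground pixel.
FONT = {
    "I": [".#.", ".#.", ".#."],
    "L": ["#..", "#..", "###"],
    "H": ["#.#", "###", "#.#"],
    "O": ["###", "#.#", "###"],
}


def foreground_pixels(glyph):
    """Count the foreground pixels in one segmented character."""
    return sum(row.count("#") for row in glyph)


def build_table(font):
    """Pre-compute a pixel-count -> character table; counts must be unique."""
    table = {}
    for char, glyph in font.items():
        count = foreground_pixels(glyph)
        if count in table:
            raise ValueError(f"{char!r} and {table[count]!r} share a pixel count")
        table[count] = char
    return table


def decode(segments, table):
    """Look up each segment by pixel count; unknown counts become '?'."""
    return "".join(table.get(foreground_pixels(seg), "?") for seg in segments)
```

Because a distortion that only moves pixels around leaves the count unchanged, a displaced "L" such as
["..#", "#..", "###"] still has five foreground pixels and is still decoded as "L".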

3. Denial-of-Service Attack: This is an incident in which a user or organization is deprived of the
services of a resource they would normally expect to have.

4. Brute-Force Attack: In a brute-force attack, the attacker systematically checks all possible keys or
passwords until the correct one is found.
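
The cost of a brute-force attack grows with the size of the answer space, which the sketch below
computes and then searches exhaustively on a toy example. The function names and the `check` callback
standing in for the verifier are assumptions for illustration.

```python
import itertools


def keyspace_size(alphabet_size, length):
    """Number of distinct answers of a given length over a given alphabet."""
    return alphabet_size ** length


def brute_force(check, alphabet, length):
    """Try every candidate in order; return (answer, attempts) or (None, attempts)."""
    attempts = 0
    for candidate in itertools.product(alphabet, repeat=length):
        attempts += 1
        guess = "".join(candidate)
        if check(guess):
            return guess, attempts
    return None, attempts
```

For an eight-character answer drawn from upper-case letters and digits, as in the MSN CAPTCHA described
earlier, keyspace_size(36, 8) is 2,821,109,907,456, which is why blind enumeration is impractical when
each guess must be submitted to the server.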

6. CONCLUSIONS
CAPTCHA plays an important role in World Wide Web security, where it prevents bot programs and hackers
from abusing online services. This paper has presented the concepts and history of CAPTCHAs and
discussed their applications. It has described a classification of current CAPTCHA methods based on
text, images, voice, video, and puzzles; within each class, many varied methods have been introduced and
discussed. In addition, we discussed the strengths and weaknesses of each category. Finally, we suggest
that improvements such as developing multilingual CAPTCHAs may result in better security systems and
website protection.
