Paper-2 Smart Dictionary Attack Based On Patterns Trained From Disclosed Passwords

International Journal of Computational Intelligence and Information Security, September 2012 Vol. 3, No.
7 ISSN: 1837-7823
Smart Dictionary Attack Based on Patterns Trained from Disclosed Passwords

Hsien-Cheng Chou1, Hung-Chang Lee2, Hwan-Jeu Yu1, Kuo-Hsuan Huang4, Fei-Pei Lai1,3
1
Department of Computer Science and Information Engineering, National Taiwan University, Taiwan d96922034@csie.ntu.edu.tw, ecpro@seed.net.tw
2
Department of Information Management, Tamkang University, Taiwan hclee@mail.im.tku.edu.tw
Graduate institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taiwan flai@csie.ntu.edu.tw
4
Department of Computer Science and Engineering, Tatung University, Taiwan khhuang@ttu.edu.tw
Abstract
Password-based authentication systems are still the most commonly used mechanism for protecting sensitive information despite being vulnerable to dictionary based attacks. To guard against such attacks, many organizations enforce complicated password creation rules and require that passwords include numeric and special characters. This study demonstrates that as long as passwords are not difficult to remember, they remain vulnerable to smart dictionary attacks. In this paper, a password analysis platform is developed to formally analyze commonly used passwords and identify frequently used password patterns and their associated probabilities. Based upon these patterns, we establish a model consisting of a Training set, a Dictionary set and a Testing set (TDT model) to generate probabilistic passwords sorted in decreasing order. The model can be used to dramatically reduce the size of the password space to be searched. Moreover, as long as a password is successfully cracked by the model, the cracked password can be fed back to the Training set, and regenerate the new passwords again by the TDT model. Through this repeat training mechanism, the generated passwords become better adapted to the targets, increasing the hit rate. Simulation results show that the number of passwords cracked using the TDT model is 1.58 and 2.82 times higher compared to the John-the-Ripper attack and Brute-force attack, respectively. We also design a hybrid password cracking system combining different attacks to verify the effectiveness of the proposed method. After applying the TDT model, the number of passwords cracked increased by up to 297%. Keywords: Password cracking, dictionary attack, brute-force attack, TDT model.
1. Introduction
1.1 Background
Identity authentication or access control through passwords is still a widely used method of ensuring system security, despite the increased use of alternative techniques such as graphical passwords [1], smart card, and biometrics. However, password based security is vulnerable to the dictionary attack [2]. In order to force users to create strong passwords, system administration policies often enforce several complex rules intended to force users into creating strong passwords [3, 4]. Under such rules, users may be required to use numeric or special characters, have to enter passwords of a minimum length, and avoid words found in a dictionary. Users often struggle to create passwords that meet these requirements. However, it also becomes difficult to crack such passwords using a dictionary attack. For example, consider the password 'computer123!'. If password rules were not enforced, a user may simply have selected 'computer' as their password. However, in order to satisfy the requirements of the system administration policy, they may have to append numeric characters '123' and a special character '!'. In this way, security is enhanced because the difficulty of cracking the password increases from 268 to 6812. In addition to enforcing high password complexity, many password-based cryptosystems execute multiple 15
International Journal of Computational Intelligence and Information Security, September 2012 Vol. 3, No. 7 ISSN: 1837-7823 iterations in their core encryption algorithm to defend against the Brute-force attack. Take the WORD software as an example, where an attacker can crack more than 30,000 passwords per second for the 2003 edition using a dual quad-core 2.33GHz Intel computer. However, it is only able to crack around 400 passwords per second for the 2007 edition. Therefore, taking WORD 2007 as an example, for a lowercase string of length 8, it may take 16.5 years (268/400*86400*365) to crack using the Brute-force attack.
1.2 Motivation
Due to rapidly changing user habits for password selection and enhancements in the encryption algorithm, the dictionary attack and brute-force attack face an enormous challenge. This paper, looking from the attacker's standpoint, aims to overcome this problem and design a good method for generating highly efficient passwords for cracking. The proposed method makes the dictionary attack stronger by overcoming its shortcomings and also overcomes the limitations of the brute-force attack. In this paper, we analyze user habits in creating passwords, and design an effective password attack method. First, we design a password analysis platform using Flex [5, 6] with Cygwin [7]. It uses regular expressions to categorize and analyze disclosed passwords to understand how users create passwords. By analyzing the disclosed passwords, we are able to determine how users incorporate the required character types in their passwords and summarize some frequently used password patterns with their associated probabilities. We then create a model comprising a Training set, a Dictionary set and a Testing set, called the TDT model. This model can generate passwords and sort them in decreasing order of probability. As long as a password is successfully cracked by the model, the cracked password can be fed back to the Training set, and regenerate the new passwords again. Through this repeat training mechanism, the generated passwords become better adapted to the targets, increasing the hit rate. Finally, we devise a hybrid password cracking system, which is a three stage sequence comprising a dictionary attack, the TDT model attack, and a brute force attack. We then simulate and apply this system to crack UNIX access passwords.
1.3 Structure
The content of this paper is structured and organized as follows: Section 2 describes the related researches on password attacks. Section 3 analyses pattern analysis from disclosed passwords and proposes the strategy for generating passwords. Section 4 presents the experimental results and improved methods. Section 5 demonstrates the physical application in cracking UNIX access passwords. Section 6 contains conclusion and future work.
2. Related Researches
Techniques for cracking or acquiring passwords include social engineering, phishing and shoulder surfing. However, most current identity authentication attacks (focusing on passwords) are still based on the dictionary attack or on brute-force methods. Other attacks such as time-memory tradeoff [8, 9] are gradually gaining importance as computing power and storage space increases. However, this attack requires substantial computational time and effort to build Rainbow tables [10] and thus has not been widely deployed. In addition, an effective defense has been mounted against Rainbow tables by exploiting the fact that some hash functions are combined with a random salt (for example salted SHA-1), which causes the same passwords to produce different hash values. The effectiveness of dictionary attack and brute-force attack has been challenged by the increasing awareness and emphasis on secure password policies. In addition, many password-based cryptosystems have incorporated more complicated protection mechanisms into existing encryption algorithms to defend against a brute-force attack, increasing the time required for every password guess in a brute-force attack. Castelluccia [12] and Narayanan [13] proposed a password-cracking technique based on a Markov model, in which password guesses are based on the contextual frequency of characters. Weir et al. [14, 15] presented a novel password-cracking technique that used the text structure from training data while applying mangling rules to the text itself. The authors found their technique to be more effective than the John the Ripper tool [16]. In a separate study, Zhang et al. [17] found Weirs algorithm to be the most effective among the techniques they used. In 2011, Kelley et al. [18] applied Weirs algorithm and a variation of the Markov model to generate simulated passwords and evaluate their strength. However, Weirs algorithm inherently depends on common dictionaries. Weir et al. used three different dictionaries to generate passwords. Their method will not be effective if users have used words that are not in 16
International Journal of Computational Intelligence and Information Security, September 2012 Vol. 3, No. 7 ISSN: 1837-7823 these dictionaries as passwords. For example, if the password is '!qazxsw@', based on the meaningless sequence of characters 'qazxsw', then Weirs algorithm will not be able to crack the password. Although 'qazxsw' is not a meaningful word, it is a keyboard pattern. This paper applies a variation of Weirs algorithm to generate more passwords more efficiently. We analyzed the patterns found in disclosed passwords, and developed the TDT model to generate passwords. These generated passwords were then used to crack UNIX access passwords.
3. Principles and Methods

3.1 Pattern Analysis from Disclosed Passwords
Users password usually comes from printable characters. These characters are used in forming the passwords. We define the elements and password patterns obtained from the disclosed passwords.
3.1.1 Basic Elements and Patterns of Disclosed Passwords

The basic elements of users disclosed passwords are the printable characters. These characters available on a keyboard can be categorized into four different types, i.e., Numeric (N), Lowercase (L), Uppercase (U) and Other (O). Table 1 shows all the 94 printable characters and their types.
Table 1: Basic Elements and Types of Users Passwords. Type Numeric (N) Lowercase (L) Uppercase (U) Other (O) Character Number 10 26 26 32 Basic Elements 0123456789 abcdefghijklmnopqrstuvwsyz ABCDEFGHIJKLMNOPQRSTUVWXYZ ~`!@#$%^&*()_-+=[]\{}|;:',./<>?
In order to describe and analyze the disclosed passwords comprising of N, L, U and O strings, a terminology is defined as follows. Definition 1: (N+, Nn) A Number string (denoted N+) is a continuous sequence of characters of N type. A Number string of length n (denoted Nn) is a continuous sequence of characters of N type of length n. Definition 2: (Element) An element of type Nn is an instance of Number string Nn found in a disclosed password. For example, from an instance of disclosed passwords like '12348888', we obtain an element '12348888' of type 'N8', which also belongs to 'N+'. By the definition 1, 'N+' represents either one of N1, N2, N3, , Ni and denotes a super class among N1, N2, N3, , Ni. Lemma 1: The same definition also applies to Lowercase string (L+, Ln), Uppercase string (U+, Un) and Other string (O+, On). An Alpha string (A+, An) is a continuous sequence of characters of L or U types. Users create their passwords as patterns based on a combination among the strings of character types, defined as follows: Definition 3: (Password pattern and pattern class) The password pattern of a password is the combination of strings of character type (N, L, U and O) in a password. It is represented in terms of the combinations strings of Nn, Ln, Un and On, where the subscript n represents the length of the corresponding type. The pattern class represents a set of password patterns and is a combination of strings of N+, L+, U+ and O+. For example, the password pattern of the password 'Password10!!' is 'U1L7N2O2' since the password consists of four elements, i.e., 'P', 'assword', '10', and '!!'. These elements belong to patterns U1, L7, N2, O2, respectively. Furthermore, the password pattern 'U1L7N2O2' is an instance of pattern class 'U+L+N+O+'.
3.1.2 Statistics of Password Patterns

In October 2008, a phishing attack was launched to obtain the email addresses and passwords of Rockyou 17
International Journal of Computational Intelligence and Information Security, September 2012 Vol. 3, No. 7 ISSN: 1837-7823 users [19]. The phishing attack received much publicity, and the 14,344,389 passwords obtained were made available for download. In the paper, we use these passwords as disclosed real password data for further analysis. To analyze the password corpus more systematically, we developed a password analysis platform using Flex [5, 6] with Cygwin [7]. The Cygwin package provides a complete UNIX environment from within a Windows PC. After installing Cygwin, Flex will be able to complete all flex assignments on a PC. Flex is normally used to partition a stream of characters into tokens. It takes a specification that associates regular expressions with actions as input. With our password analysis platform, we represent the pattern classes by regular expressions, and use Flex to obtain statistics. Table 2 lists the top ten pattern classes in RockYou disclosed passwords. The top ranked 'L+N+' pattern class comprises 32.91% of the total. For the 'L+N+' pattern class, we further calculate the password patterns according to various lengths of the Lowercase and Number strings. The result is shown in Table 3. After obtaining the statistics of password patterns for each of the pattern class, we parse the passwords according to the previously defined four strings (N, L, U, O) and then count their probabilities respectively. Table 4 describes the statistics of length between one and ten in Number strings. It is clear that pattern of length two occurs more frequently than any others do.
Table 2: Top Ten Pattern Classes from Rockyou.
Pattern Class L+N+ L+ N+ N+L+ L+N+L+ U+N+ U+L+N+ U+ + + + L O L L+O+N+
Occurrences 4,720,184 3,726,129 2,346,744 499,167 388,157 325,941 236,331 229,875 172,279 144,129
Percentage of Total 32.91% 25.98% 16.36% 3.48% 2.71% 2.27% 1.65% 1.6% 1.2% 1.0%
Table 3: Top Ten Password Patterns in 'L+N+' Pattern Class from Rockyou.
Password Patterns L6N2 L5N2 L7N2 L4N4 L4N2 L8N2 L6N1 L7N1 L5N4 L6N4
Occurrences 420,318 292,306 273,624 235,360 215,074 213,109 193,097 189,847 173,559 160,592
Percentage of 'L+N+' 8.91% 6.19% 5.80% 4.99% 4.56% 4.51% 4.10% 4.02% 3.68% 3.40%
Table 4: Patterns of Length between One and Ten in Number Strings.
Patterns of Number Strings N2 N4 N1 N3 N6 N7 N8 N10 N9 N5
Occurrences 2,131,337 1,330,610 1,321,553 928,760 776,436 608,993 523,130 503,930 343,396 272,088
Percentage of Total 24.38% 15.22% 15.12% 10.63% 8.88% 6.97% 5.98% 5.77% 3.93% 3.11% 18
International Journal of Computational Intelligence and Information Security, September 2012 Vol. 3, No. 7 ISSN: 1837-7823
In addition, we also analyze the password patterns generated by the keyboard positions. The Keyboard strings refer to passwords formed by relative keyboard positions and associated finger movements. Its examples include 'keyboard neighbor' and 'keyboard parallel n', which are parallel patterns of length n [21]. In this paper, such passwords are referred to as Keyboard strings, and represented by the symbol K. For example, the Keyboard string '1qa2ws' is formed by a keyboard parallel 3 pattern on the keyboard. Using Flex, we overcome the limitation on recursive expressions, and use regular expressions to analyze keyboard neighbor and parallel relationships. The statistics are given in Table 5. Notice that although the percentage of keyboard strings is not high, with increasing awareness and emphasis on password security, more and more users choose this structure for their passwords.
Table 5: Statistics of Keyboard Strings in Rockyou.
Keyboard String Keyboard Neighbor Keyboard Parallel 3 Keyboard Parallel 4 Keyboard Parallel 5
Example asdfvcxz 1qa2ws 0okm9ijn 12345qwert
Numbers of Occurrences 20,739 1898 780 231
Percentage of Total 0.15% 0.013% 0.0054% 0.0016%
3.2 Strategy for Generating Passwords

From statistics of disclosed passwords for RockYou described in Section 3.1, we summarize some common patterns for selecting passwords. 1. Many passwords are composed of Lowercase and Number strings, i.e. 'L+N+', the Lowercase strings are likely appended by Number strings. 2. Number or Other strings are inserted before or appended after Alpha strings, e.g., 'jessica!!' belongs to 'L8O2' pattern, the Alpha string 'jessica' is appended with the Other string '!!'. 3. Alpha strings include Keyboard strings, e.g., 'qwerty123' belongs to 'K6N3' pattern, containing the Keyboard string 'qwerty'. Instead of trying to create rules that mimic these common password patterns, we assign probabilities to these patterns, and then use those probabilities directly to generate very fine-grained rules, and thus generate passwords for cracking.
3.2.1 TDT Model

Based on the analysis of the disclosed passwords in Section 3.1, we create the TDT model consisting of the Training set, Dictionary set and Testing set, to generate the passwords. Three sets in the model are defined as follows. Definition 4: (Training set, Dictionary set and Testing set) The Training set specifies users disclosed passwords. The Dictionary set specifies dictionary words, Alpha strings or Keyboard strings. The Testing set specifies the cracking targets, which are plaintext passwords. If a password is used in the Training set, we ensure that it is not included in the Testing set. The TDT model uses the Training set to create password rules, and use these rules along with the Dictionary set to generate passwords. Then it uses the generated passwords to crack passwords in the Testing set. Once passwords have been cracked, they can be fed back to the Training set. When the Training set is updated, we can update UDP rules and generate new passwords. The framework of the TDT model is shown in Figure1.
19
Dictionary Set
Training Set (real world passwords)
Rules
Generated Passwords
Testing Set (cracking passwords)
Feedback cracked passwords
Figure 1: Framework of TDT Model.
3.2.2 Establishing UDP Rules

Each password in the Training set is parsed. The parsed strings in a password that have Number strings and Other strings are called 'leaf nodes' and those having Lowercase, Uppercase, and Keyboard strings are called 'inner nodes'. Leaf nodes are categorized based on character types and their length. Their probabilities are calculated based on the number of times they occur in the Training set. Inner nodes are used as dictionary words and added to the Dictionary set. At this point, dictionary words in the Dictionary set do not have probabilities associated with them, so we set the probability for inner nodes is 1.0. Moreover, we use the symbols Nn, Ln, Un, On, Kn defined in Section 3.1 to represent each password pattern, where 'N' stands for Number, 'L' for Lowercase, 'U ' for Uppercase, 'O ' for Other, 'K' for Keyboard, and the subscript n represents the length of the corresponding symbols. For example, the password pattern of 'abc123' is 'L3N3'. Based on this rule, passwords with the same patterns will be merged, and the probability of each pattern can be calculated. Thus, we can calculate the probability of each password pattern and find which among them are used more frequently. We describe this process through a simple example. Assume that there are eight passwords in the Training set as shown in Figure 2. For leaf nodes, we categorize them by length, and then calculate their probabilities. Among strings of length=1, the Other string '@' appears twice, and '!' appears once. Therefore, the probability of '@' is 2/3=0.67, while that of '!' is 1/3=0.33. Using the same method, we can find the probabilities of Number strings of length three and four, as shown in Figure 2 (a). Inner nodes are used as dictionary words, as explained in Section 3.2.3. In addition, after converting the eight passwords into their patterns, we merge these patterns and obtain five distinct patterns, as shown in Figure 2 (b). For example, the three passwords 'abc123', 'cat789' and 'man666' all have the 'L3N3' pattern, so its probability is 3/8=0.375. Note that the higher the probability of a pattern, the more frequently it is used by users.
Number/Other Strings Length String @ 1 ! 456 123 3 789 666 4 1970 (a) Password Pattern L3N3 L8N3 O1K6O1 L8N4 U1L3O1 (b) Figure 2: Example of Generated UDP Rules Using a Training Set. Probability 0.67 0.33 0.4 0.2 0.2 0.2 1.0 Probability 0.375 0.25 0.125 0.125 0.125
Training Set abc123 password456 !qwerty@ cat789 man666 computer456 football1970 Hero@
20
Dictionary Set
Training Set abc123 password456 !qwerty@ cat789 man666 computer456 football1970 Hero@
Parsing
Dictionary Words
abc password qwerty cat man computer football Hero princess nicole you
Figure 3: The Dictionary Set.
3.2.3 Passwords Generation

Using the UDP rules created in Section 3.2.2, we generate passwords with the help of a Dictionary set. The Dictionary set mainly consists of commonly used dictionary words, or the inner nodes parsed from the Training set. By constructing the Dictionary set, we can gain a larger coverage of the target set of passwords. The core idea of this paper is to generate highly efficient passwords by using the Dictionary set as the basis and combining it with rules generated from the Training set. An example Dictionary set is shown in Figure 3. In addition to the eight inner nodes parsed from the Training set in Figure 2, we added the three words 'princess', 'nicole' and 'you' to the Dictionary set. Next, we use the Dictionary set and the UDP rules to generate passwords and calculate their probabilities. The formal mathematical definition is as follows. Theorem 1: Assume that P(x) be the probability of the pattern of a password x, which x = t1t2t3tn is the pattern of a password x, consisting of t1, t2, t3, , tn inner or leaf nodes. Let v ( ti ) be the probability of the ith inner or leaf nodes of length in the Training set. Therefore, the probability of generated password x is given by
i
P ( x ) =x ) v i ( ti ), ti {inner nodes, leaf nodes} P(

i =1
Since the probability of each generated password can be calculated, we sort the probabilities of generated passwords in decreasing order and apply a threshold . Therefore, the generated passwords from the TDT model are
= D ,
n X : P( ) i ( ti ) , ti {inner nodes, leaf nodes} i =1
When a password is successfully cracked by our method, we can move the password back to the Training set, and rebuild the UDP rules. Through this repeat training mechanism, the generated passwords become better adapted to the targets, increasing the hit rate.
4 Experiments and Results 4.1 Password Generation and Cracking

To reduce computational complexity, we randomly divided the 14,344,389 leaked Rockyou passwords into 28 subsets using the GNU shuf tool. Each such subset consisted of 500,000 passwords. For convenience, we selected the first and last subset as the Training set and Testing set respectively, called Rockyou1 and Rockyou28. The Dictionary set used is Dic-0294, which is a commonly used password cracking dictionary [20]. This dictionary contains 869,228 unique words. We selected Rockyou28, Myspace, and Phpbb plaintext passwords as the Testing set. This was then used to evaluate performance. The complete Myspace and Phpbb list contained 36,764 and 184,389 plaintext passwords, respectively. A summary is provided in Table 6, and experimental results are presented in Figure 4. 21
Table 6: The Training Set, Dictionary Set and Testing Set.
Set Training set Dictionary set Testing set
Name Rockyou1 Dic-0294 Rockyou28 Myspace Phpbb
Number 500,000 869,228 500,000 36,764 184,389
In Figure 4, the x-axis represents the 1 million, 5 million, 50 million and 450 million passwords generated by our TDT-model. The y-axis depicts the corresponding numbers of cracked passwords for Myspace, Phpbb and Rockyou28 in the Testing set. To clarify further, we will consider the case of 450 million passwords as an example: the number of cracked passwords for Myspace is 12,132, accounting for 32.99%. For Phpbb, the number is 56,460 cracked passwords, accounting for 30.62%, and for Rockyou28 the amount is 137,800 cracked passwords, accounting for 27.56%. In the next step, we will apply the cracked passwords as feedback to the Training set, and try to regenerate new second-round passwords, and crack the remaining passwords in the Testing set. To further illustrate, we define the n-round, where n may be the first, second, etc., representing the time required to crack passwords of targets in the Testing set.
Myspace 160,000 140,000 120,000 100,000 80,000 60,000 40,000 20,000 -
Phpbb
Rockyou28
137,800 99,100
Number of Cracked Passwords
62,750 23,122 12,300 34,314 5,592
47,351 9,830
56,460 12,132
1,077
50
450
Number of Generated Passwords (Millions)
Figure 4: Number of Passwords Cracked.
4.2 Feedback Mechanism of Cracked Passwords

4.2.1 Problem
When we inputted the cracked passwords as feedback to the Training set and generated second-round passwords, we found that these generated passwords were the same as the first-round passwords, and only the order of partial passwords was altered. To fully analyze the reasons for this, we discovered that passwords generated using our TDT-model are based mainly on password patterns determined by users disclosed passwords in the Training set (as shown in Figure 2 (b)), i.e., password patterns of cracked passwords are consistent with password patterns in the Training set. Therefore, we could not alter or change password patterns when we fed the cracked passwords back into the Training set. Considering Figure 2 as an example, we will assume that the generated password 'abc456', belonging to the 'L3N3' password pattern, can crack a password in the Testing set. In this case, the use of the password as feedback to the Training set cannot alter or change the password pattern. The reason is that the 'L3N3' password pattern appeared in the Training set, and when the passwords are inputted back into the Training set, only the probability changes, increasing from 0.375 (3/8) to 0.444 (4/9). Therefore, if we want to generate new passwords in the second-round, the key condition is to generate new password patterns that are different from password patterns in the first-round.
22
4.2.2 Improved Methods

To solve the bottleneck encountered in Section 4.2.1, i.e., to improve the hit rate of the second-round, we experimented with three improved methods to increase the password patterns in the Training set and to enhance the diversity of each password pattern. Method 1. Improved Password Patterns In the method that we developed, password patterns are essential to determine the formation of generated passwords, therefore, if we are able to obtain more password patterns in the second-round, it is possible to generate a greater number of different passwords. We propose here an improved method to obtain more password patterns: first, we summarized password patterns in the Training set. Second, we adopted a brute-force method to generate length 1~8 password patterns that do not appear in the Training set according to the character types (N, L, U and O) and assigned the associated probability to each. To obtain the probability of each password pattern, we calculated the number of length 1~8 password patterns that do not appear in the Training set. Taking Figure 2 for example, the total number of length 1~8 password patterns is 87,380. Excluding the 'L3N3' pattern in the Training set, the total number of password patterns is 87,379, and therefore the probability of the length 1~8 password patterns generated using the brute-force method are
1 = 1.14*105 87379
Method 2. Improved Number and Other Strings Upon performing a statistical analysis of Section 3, we discovered that many users set complex and easy-toremember passwords by using Alpha strings as the basis of their passwords, combined with a variable-length Number or Other strings. For example, 'jessica!!' belongs to the 'L8O2' pattern, as the Alpha string 'jessica' is appended with the Other string '!!'. From the statistics in Table 4, we know that Number strings of length 1~4 occur more frequently than any others, as do Other strings of length between 1~2. In order to generate more new passwords in the second-round, we devised a strategy as follows: first, we classified and analyzed the Number and Other strings in the Training set. Second, we used the brute-force method to generate Number strings of length 1~4 and Other strings of length 1~2 that do not appear in the Training set. For the example in Figure 2, the only Number strings of length three that appear in the Training set are the '123', '456', '666', and '789' strings, and therefore, we generated all other Number strings using the brute-force method, formed as '000', '001', ..., '122', '124', , '999', and then assigned the associated probabilities. To calculate the probabilities, we adopted a variant of the Laplace smoothing [22] defined as follows:
( Number of Appeared in the Training Set ) + (Total Number of Value s in a Category ) + ( Number of Categories * )
where represents a value between 0 and 1, where > 0 is the smoothing parameter ( = 0 corresponds to no smoothing). For the example in Figure 2, for an value of 1, five Number strings of length 3 were observed in the Training set, but the '777' was never encountered. The '777' string would therefore be assigned a probability:
0 +1 0.00099 = 0.099% 5 + 1000 *1
Method 3. Character Replacement According to the analysis in [16], more and more characters in passwords are being replaced by corresponding similar shapes, sounds, and meanings to enhance password complexity. For example, the character 'a' is replaced by '@' and 'o' is replaced by '0' in the password, 'p@ssw0rd'. Such common characters are organized in Table 7. This technique can also be used to increase the volume of new generated passwords.
23
Table 7: Replacement Characters.
Character 0 1 2 3 4 5 6 7 8 9
Replacement Character o, ) e, !, l r, @ e, # d, $ f, % l, ^, 9 c, & g, * 6, (
Character a b c d e f g i l o
Replacement Character @ p 7 b 3 5 8 1 1 0
Character p q r s t u v w x z
Replacement Character b 9 2 $ d w ^ u s 2
The purpose of the previous improvements, i.e., Method 1, is to increase password patterns in the Training set, and Methods 2 and 3 are to increase the diversity of new generated passwords of each password pattern, i.e., each password pattern can generate more new passwords. For example, in Figure 2, when we apply our first improvement, i.e., Method 1, the new password pattern 'N3L3' will be added to the Training set. The selection of the Number strings 'N3' in the original method only has the four strings '456', '123', '789', and '666'. However, when we apply our second improvement, Method 2, Number strings can be selected from '000', '001', ..., to '999'. The application of our method can substantially increase the number of new passwords generated.
4.2.3 Experimental results

To validate our approach, we undertook experimental testing of the second-round using cracked passwords from Rockyou28, Phpbb, and Myspace in the Testing set based on the 450 million passwords that were generated in the first-round. First, the cracked passwords in Table 8 were fed back to the Training set. Updated numbers of the Training set and the Testing set are shown in Table 9. Second, we applied the method introduced in Section 3.2.2 to regenerate UDP rules, and then in conjunction with the Dictionary set to generate the second-round passwords. Finally, we took these new generated passwords to crack the remaining passwords belonging to Rockyou28, Phpbb, and Myspace in the Testing set.
Table 8: Number of Cracked Passwords for Generated 450 million Passwords.
Name Rockyou1 Dic-0294 Rockyou28 Myspace Phpbb
Number 500,000 869,228 500,000 36,764 184,389
Number of Cracked Passwords N/A N/A 137,800 12,132 57,160
Table 9: The Updated Training Set, Dictionary Set and Updated Testing Sets.
Name Rockyou1 Dic-0294
Number 500,000+137,800+12,132+57,160=707,092 869,228
Rockyou28_1 500,000-137,800=362,200 Myspace_1 36,764-12,132=24,632 Phpbb_1 184,389-57,160=127,229
24
Figure 5 (a)
Figure 5 (b)
Figure 5 (c)
Figure 5: Number of Cracked Passwords of Rockyou28, Phpbb and Myspace.
Figure 5 shows the cracked passwords of the first-round and the second-round for Rockyou28, Phpbb, and Myspace in the Testing set. The light blue shade represents the number of cracked passwords in the first-round and the purple shade represents those in the second-round. For example, Figure 5 (a) depicts the cracking results for Rockyou28 for 450 million generated passwords. The number of cracked passwords was 137,800 (28%) in the first-round and was 54,801 in the second-round, adding up to a total of 192,601. The performance of the cracking operation increased from 28% in the first-round to 38.52% in the second. Similarly, Figure 5 (b) depicts the cracking results for Phpbb. Here, the performance of the cracking operation improved from 31% to 40.83%. Figure 5 (c) displays the results for Myspace where the performance improved from 33% to 43.8%. Overall, the performance of the cracking operation of the second-round enhanced the effect by about 10%, raising the total cracking performance from 28%~33% to 38%~44%. This paper presented a design for a TDT-model with a feedback mechanism to enhance password cracking performance. The first-round utilized users disclosed passwords to produce UDP rules and then generates passwords. The second-round referred to UDP rules of the first-round in conjunction with a brute-force method to generate new passwords that differ from passwords in the first-round. We also attempted a third-round: the effect was extremely limited, mainly due to the poor efficiency in the generation of new passwords. One possible improvement is to change or increase the elements of the Training set or the Dictionary set. We will discuss this issue in the future.
5 Application
This section describes a hybrid password cracking system to crack access passwords collected from UNIX. Within this system, we introduce a new stage called TDT-model attack, which uses the passwords generated by TDT model. The design goal of the hybrid password cracking system is to crack encrypted target passwords by incrementally increasing the size of the password cracking space. Therefore, the system consists of three attack stages: the Dictionary attack followed by the TDT-model attack, and then the Brute-force attack; this system is called the DTB password cracking system, as shown in Figure 7. The cracking strategy is to crack encrypted passwords using the Dictionary attack and the TDT-model attack first in finite time, and to then run the Bruteforce attack when some encrypted passwords have not been cracked. The system can be used in DTB mode 25
International Journal of Computational Intelligence and Information Security, September 2012 Vol. 3, No. 7 ISSN: 1837-7823 where all stages are enabled or in DB mode where only the TDT-model attack stage is disabled.
DTB Password Cracking System
Encrypted passwords
Dictionary attack
TDT-model attack
Brute-force attack
Decrypted passwords
Figure 7: The System Architecture of DTB Password Cracking System.
UNIX Access Password Attack We collected 382 encrypted access passwords for the UNIX system using the John-the-Ripper cracker tool [16]. These passwords are of 13-byte values generated by the DES cryptosystem. Then, we proceeded to crack the passwords in DTB and DB modes. The results showed that, in the same 5-hour experiments, 236 passwords and 155 passwords are cracked in DTB mode and DB mode, respectively. After the Dictionary attack, in the same period of time, the number of passwords cracked with the TDT-model increases by 81 and by up to 297% [(85+37)/41]. The details are shown in Table 10, where the time taken to crack the passwords is given in brackets.
Table 10: Number of UNIX Passwords Cracked.
DTB Password Cracking System Number of Passwords Cracked
DTB mode DB mode Dictionary TDT-model Brute-force Dictionary Brute-force attack attack attack attack attack 114 85 37 114 41 (12min) (15min) (4hr 33min) (12min) (4hr 48min) 236 155
6 Conclusions and Future Work

We developed a password analysis platform to formally analyze disclosed passwords, and establish the TDT model for generating passwords. Based on the probabilistic patterns of the TDT model, the passwords generated are sorted in decreasing order of probability; this is known as the TDT-model attack. We also design a hybrid password cracking system consisting of the Dictionary attack, TDT-model attack and Brute-force attack to verify the effectiveness of the TDT model. Experimental results show that this hybrid model is effective for cracking UNIX access passwords and there is a performance improvement of up to 297% from using the TDT-model attack. In future work, we plan to collect information about the subject, such as birthdays, home addresses, names of family members, etc, and then create specialized input dictionaries with associated high probabilities by providing more flexibility and generality. The effectiveness of this strategy with other languages may also be studied.
References
[1] [2] [3] [4] [5] H. Gao, X. Liu, S. Wang, H.g Liu, R. Dai, Design and Analysis of a Graphical Password Scheme, International Conference of Innovative Computing, Information and Control, pp.675-678, 2009. S. Delaune, F. Jacquemard, A Theory of Dictionary Attacks and its Complexity, 17th IEEE Computer Security Foundations Workshop, 2004. J. Yan, A. Blackwell, R. Anderson, and A. Grant. Password Memorability and Security: Empirical Results. IEEE Security and Privacy Magazine, Volume 2, Number 5, pages 25-31, 2004. R. V. Yampolskiy. Analyzing User Password Selection Behavior for Reduction of Password Space. Proceedings of the IEEE International Carnahan Conferences on Security Technology, pp.109-115, 2006. Flex, The Fast Lexical Analyzer, http://flex.sourceforge.net/, 2007. 26
International Journal of Computational Intelligence and Information Security, September 2012 Vol. 3, No. 7 ISSN: 1837-7823 [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] J.E. Hopcroft and J.D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison Wesley, 1979. G. J. Noer, "Cygwin32: A Free Win32 Porting Layer for UNIX Applications", 1998. P. Oechslin, Making a Faster Cryptanalytic Time-Memory Trade-Off, in Advances in Cryptology CRYPTO 03, pp.617630, 2003. V. L. L. Thing, H. M. Ying, A Novel Time-Memory Tradeoff Method for Password Recovery, June, 2009. O. Billet, H. Gilbert, Cryptanalysis of Rainbow, Security and Cryptography for Networks, Volume 4116, pp.336-347, 2006. Security issues with MD5 - http://en.wikipedia.org/wiki/MD5#Security. C. Castelluccia, M. Durmuth, and D. Perito, Adaptive Password-Strength Meters from Markov Models, NDSS 12, in Proceedings of the Network and Distributed System Security Symposium, 2012. A. Narayanan, V. Shmatikov, Fast Dictionary Attacks on Passwords Using Time-Space Tradeoff, Proceedings of the 12th ACM conference on Computer and communications security, 2005. M. Weir, S. Aggarwal, B. de Medeiros, B. Glodek, Password Cracking Using Probabilistic Context-Free Grammars, 30th IEEE Symposium on Security and Privacy, 2009, pp.391405. M. Weir. Using Probabilistic Techniques to Aid in Password Cracking Attacks. PhD thesis, 2010. Openwall Project. John the Ripper Password Cracker. Retrieved on 13/3/2010. Y. Zhang, F. Monrose, and M. K. Reiter. The security of modern password expiration: an algorithmic framework and empirical analysis. In CCS 10, Proceedings of the 17th ACM conference on Computer and communications security, ACM, pp.176186, 2010. P. G. Kelley, S. Komanduri, M. L. Mazurek, R. Shay, T. Vidas, L. Bauer, N. Christin, L. F. Cranor, and J. Lopez, Guess again (and again and again): Measuring password strength by simulating password-cracking algorithms, Carnegie Mellon University, Tech. Rep. CMU-CyLab-11-008, 2011. Rockyou users, http://www.skullsecurity.org/wiki/index.php/Passwords. http://www.linux-pour-lesnuls.com/traduc/Dictionnaires/dic-0294/dic-0294.txt. H. C. Chou, H. C. Lee, C. W. Hsueh, F.P. Lai, Password Cracking Based on Special Keyboard Patterns, International Conference of Innovative Computing, Information and Control, pp.387-402, Volume 8, Number 1(A), January 2012. S. Chen, J. Goodman, An empirical study of smoothing techniques for language modeling, Proceedings of the 34th annual meeting on Association for Computational Linguistics, 1996.
[18]
[19] [20] [21]
[22]
27

Paper-2 Smart Dictionary Attack Based On Patterns Trained From Disclosed Passwords

Caricato da

Informazioni sul documento

Descrizione originale:

Titolo originale

Copyright

Formati disponibili

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Copyright:

Formati disponibili

Paper-2 Smart Dictionary Attack Based On Patterns Trained From Disclosed Passwords

Caricato da

Copyright:

Formati disponibili

International Journal of Computational Intelligence and Information Security, September 2012 Vol. 3, No.

Smart Dictionary Attack Based on Patterns Trained from Disclosed Passwords

Department of Information Management, Tamkang University, Taiwan hclee@mail.im.tku.edu.tw

Department of Computer Science and Engineering, Tatung University, Taiwan khhuang@ttu.edu.tw

3. Principles and Methods

3.1.1 Basic Elements and Patterns of Disclosed Passwords

3.1.2 Statistics of Password Patterns

Pattern Class L+N+ L+ N+ N+L+ L+N+L+ U+N+ U+L+N+ U+ + + + L O L L+O+N+

Table 4: Patterns of Length between One and Ten in Number Strings.

Patterns of Number Strings N2 N4 N1 N3 N6 N7 N8 N10 N9 N5

Example asdfvcxz 1qa2ws 0okm9ijn 12345qwert

Numbers of Occurrences 20,739 1898 780 231

Percentage of Total 0.15% 0.013% 0.0054% 0.0016%

3.2 Strategy for Generating Passwords

3.2.1 TDT Model

Training Set (real world passwords)

Testing Set (cracking passwords)

Feedback cracked passwords

Figure 1: Framework of TDT Model.

3.2.2 Establishing UDP Rules

Figure 3: The Dictionary Set.

3.2.3 Passwords Generation

P ( x ) =x ) v i ( ti ), ti {inner nodes, leaf nodes} P(

4 Experiments and Results 4.1 Password Generation and Cracking

Set Training set Dictionary set Testing set

Name Rockyou1 Dic-0294 Rockyou28 Myspace Phpbb

Number 500,000 869,228 500,000 36,764 184,389

Myspace 160,000 140,000 120,000 100,000 80,000 60,000 40,000 20,000 -

Number of Cracked Passwords

62,750 23,122 12,300 34,314 5,592

Number of Generated Passwords (Millions)

Figure 4: Number of Passwords Cracked.

4.2 Feedback Mechanism of Cracked Passwords

4.2.2 Improved Methods

Replacement Character o, ) e, !, l r, @ e, # d, $ f, % l, ^, 9 c, & g, * 6, (

4.2.3 Experimental results

Set Training set Dictionary set Testing set

Name Rockyou1 Dic-0294 Rockyou28 Myspace Phpbb

Number 500,000 869,228 500,000 36,764 184,389

Number of Cracked Passwords N/A N/A 137,800 12,132 57,160

Set Training set Dictionary set Testing set

Name Rockyou1 Dic-0294

Number 500,000+137,800+12,132+57,160=707,092 869,228

Rockyou28_1 500,000-137,800=362,200 Myspace_1 36,764-12,132=24,632 Phpbb_1 184,389-57,160=127,229

DTB Password Cracking System

Figure 7: The System Architecture of DTB Password Cracking System.

DTB Password Cracking System Number of Passwords Cracked

6 Conclusions and Future Work

[19] [20] [21]

Potrebbero piacerti anche