Evaluation of Google Hacking

Justin Billig
Department of Computer Science Northern Kentucky University Highland Heights, KY 41099 (859)572-5320

Yuri Danilchenko
Department of Computer Science Northern Kentucky University Highland Heights, KY 41099 (859)572-5320

Charles E. Frank
Department of Computer Science Northern Kentucky University Highland Heights, KY 41099 (859)572-5320

justin.billig@gmail.com

yuri@danilchenko.com

frank@nku.edu

ABSTRACT
Google Hacking uses the Google search engine to locate sensitive information or to find vulnerabilities that may be exploited. This paper evaluates how much effort it takes to get Google Hacking to work and how serious the threat of Google Hacking is. The paper discusses the countermeasures that can be used against Google Hacking.

techniques that worked in the past no longer work, as vulnerabilities are patched. We tried to determine how much effort it took to perform various Google hacks. This was done purely for research purposes. We never had the intent of maliciously using any sensitive information or potential security vulnerabilities. We have disclosed potential issues to the security staff at our university. In this paper, we assess the seriousness of information disclosure using Google Hacking and make recommendations of what can be done to defend against Google hackers.

Categories and Subject Descriptors


K.6.5 [Management of Computing and Information Systems]: Security and Protection - authentication, unauthorized access.

2. BACKGROUND
The definitive source for information about Google Hacking is Long [5]. This book provides background on Google queries and advanced operators. It has chapters on locating information on the Web in various types of documents, locating exploit code and finding vulnerable targets, and searching for usernames, passwords, and social security numbers. This book is a must-read for security professionals wishing to protect their websites from disclosing information to Google hackers. A second important source is Johnny Long's website [3]. Its Google Hacking Database [2] contains a large number of Google searches by category. The categories include "Files containing passwords", "Pages containing login portals", and "Sensitive directories". A user can try a Google search in the database by simply clicking on a link. We were only able to find one paper on Google Hacking in the academic literature. Lancor and Workman [4] describe incorporating Google Hacking into a graduate course on web security. This paper serves as a good introduction to Google Hacking. It describes a series of exercises used to teach students how to use Google Hacking to test their own sites and how to defend against it.

General Terms
Security.

Keywords
Information security, web security, hacking, Google Hacking, information assurance.

1. INTRODUCTION
Wikipedia [7] defines Google Hacking as "the art of creating complex search engine queries in order to filter through large amounts of search results for information related to computer security. In its malicious format it can be used to detect websites that are vulnerable to numerous exploits and vulnerabilities as well as locate private, sensitive information about others, such as credit card numbers, social security numbers, and passwords." This filtering is performed by using advanced Google operators. Attackers can use Google Hacking to uncover sensitive information about a company or to uncover potential security vulnerabilities. Security professionals can use Google Hacking to determine whether their websites are disclosing sensitive information. Northern Kentucky University is a regional state university with 15,000 students. We performed a Google Hacking security assessment of our university. In a few cases, we tried some of the Google Hacking techniques more widely on the Internet. This allowed us to determine if various Google hacks actually work. Often,
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. InfoSecCD Conference'08, September 26-27, 2008, Kennesaw, GA, USA. Copyright 2008 ACM 978-1-60558-333-4/00/0006 $5.00.

3. TECHNIQUES
We mostly limited our Google Hacking activities to Northern Kentucky University. We sometimes tried other educational sites in the US, and a few network device searches required a somewhat broader domain. Our main goal while performing the searches was to check which Google hacks actually work. We found these hacks on the Internet, in Johnny Long's book [5], and in his Google Hacking Database [2]. This information is critical to understanding how vulnerable we really are to Google Hacking. Are websites protecting information against Google Hacking? Sadly, most of the examples of

unprotected sensitive information were found within our own university and did not require a substantial amount of time to find. Google Hacking turned out to be a very powerful and flexible hacking approach. Many of the most powerful hacks we found did not quite work. In most cases, however, if we spent enough time analyzing the target and understanding how the queries found information, we were able to tweak the original query by changing the parameters or the advanced operators to find information similar to that requested by the original query.

We found it very helpful to use Google cached pages while performing Google hacks. Google crawls web pages and stores a copy of them on its own servers. We used Google cached pages to anonymously browse a target's site without sending a single packet to its server. Google grabs most of the pages it crawls, but omits images and some other space-consuming media. When we viewed Google cached pages by simply clicking on the cached link on the results page, we ended up connecting to the target's server to get the rest of the page content. This might reveal our Google Hacking to the target website. We added the &strip=1 parameter to the URL to tell Google to return only crawled content and not connect to the target's server to get any information.

A system administrator might decide to prevent access to a certain part of the site by moving it, protecting it with a password, or simply shutting down the server. What administrators often do not realize is that the information they are trying to protect may still exist on Google's servers and can be accessed through cached pages. This allowed us to view data on websites that had been removed.

[2, p. 88]. Here is another example of an error message that provides SQL query information.
"You have an error in your SQL syntax near" + inurl:.edu
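The &strip=1 step described in this section can be sketched in Python. This is only an illustration: the function name `add_strip_parameter` and the example cache URL are hypothetical, and the sketch assumes the cached-page URL has already been copied from a results page. It only edits the URL string and sends no request.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_strip_parameter(cached_url: str) -> str:
    """Return cached_url with strip=1 set exactly once in its query string."""
    parts = urlsplit(cached_url)
    # Keep every existing parameter except a previous "strip" value.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "strip"]
    query.append(("strip", "1"))
    # Leave ":" and "/" unescaped so a "cache:host/path" value stays readable.
    return urlunsplit(parts._replace(query=urlencode(query, safe=":/")))


# Hypothetical cached-page URL, used only to show the transformation.
print(add_strip_parameter("http://cache.example/search?q=cache:www.example.edu/"))
```

Removing any earlier strip value before appending makes the function safe to apply twice to the same URL.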

4. GOOGLE HACKING
According to Johnny Long's Google Hacking Database [2], there are roughly fourteen categories of Google hacks. This paper looks at five of them: Error Messages, Open Directories, Documents & Files, Network Devices, and Personal Information Gathering.

4.1 Error Messages


Error messages provide a wealth of information. Developers use these error messages to pinpoint where their code has gone wrong. Unfortunately for web administrators, error messages that are open to the world provide that information to those who know how to look for them. Database error messages can provide information such as usernames, passwords, and server names. Here is an example of a query for MySQL error messages that tell the Googler the username for a MySQL database.
"Warning: mysql_connect(): Access denied for user: '*@*" "on line" -help forum
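A site owner can turn this around and check the text of their own pages for the error strings that such queries target. The following Python sketch is one way to do that. The function name `find_error_leaks` and the signature names are illustrative assumptions, and the patterns cover only the two error strings quoted in this paper, not a full catalogue.

```python
import re

# Patterns built from the two error strings quoted in this paper.
# A real audit would need a longer, site-specific list (assumption).
ERROR_SIGNATURES = {
    "mysql_connect_denied": re.compile(
        r"Warning: mysql_connect\(\): Access denied for user", re.IGNORECASE),
    "sql_syntax_error": re.compile(
        r"You have an error in your SQL syntax", re.IGNORECASE),
}


def find_error_leaks(page_text: str) -> list[str]:
    """Return the names of all signatures that occur in page_text."""
    return [name for name, pattern in ERROR_SIGNATURES.items()
            if pattern.search(page_text)]
```

Running a check like this over rendered pages before they are published catches the leak before a crawler can index it.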

4.2 Open Directories


Google's web bots crawl pages in a site that a web administrator may not want to be catalogued. Most sites stop users from browsing their directory structure, but not all websites are set up correctly. A simple Google search can provide a wealth of information. Directory browsing allows someone to see all the files you have on your web server. Much of a company's important information is stored in its server directories. Leaving those directories accessible to outsiders can compromise the company's entire line of defense and make hackers' lives far too easy. A search of intitle:index of returns a list of sites that allow directory browsing. Often this search reveals all kinds of information. Not only does it give a potential hacker access to all of your files, but index pages often also reveal information such as the operating system and web server software. This information gives a hacker a roadmap to the vulnerabilities you may have. A simple Google search like intitle:index of + solutions potentially gives students access to solutions. By adding a site search parameter (site:some_university.edu), we were able to obtain solution manuals for a science department, potentially allowing students to cheat on class assignments. In one of the results brought back by an intitle:index of query, we found a directory listing that contained a screen shot of a university's financial management system. One of the most popular hacking techniques used within directory listings is the directory traversal technique. This technique refers to modifying parts of the originally found URL in order to access other directories on the server. These may not be accessible to direct Google searches. For example, if you found a relative URL /cs/accounting/admin/jerryb, you can start removing parts of the original URL in order to access parent directories such as admin or accounting, or you could replace some parts of the URL with potential directory names, such as hr [5, p. 109].
Starting from the financial management system documentation we had found, we used the directory traversal technique to get to parent directories of the original search result. As we browsed through these directories, we found the complete documentation on managing and using that university's financial system. Screen shots contained some user IDs and potentially valid names of the university's funds. Such information might be used by hackers to attack the university. This technique should be used by penetration testers to determine whether sensitive company information is being exposed on the web.
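For a penetration tester checking their own site, the parent directories that the traversal technique visits can be listed mechanically. The Python sketch below is a minimal illustration. The function name `parent_directories` and the host www.example.edu are hypothetical, and only the path /cs/accounting/admin/jerryb comes from the example above. The function builds URLs and sends no requests.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_directories(url: str) -> list[str]:
    """List every ancestor directory URL of url, nearest parent first."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    parents = []
    # Drop one trailing segment at a time until only the site root is left.
    for depth in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:depth])
        if not path.endswith("/"):
            path += "/"
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


for candidate in parent_directories("http://www.example.edu/cs/accounting/admin/jerryb"):
    print(candidate)
```

Each URL in the list is a directory that should refuse a listing if directory browsing is disabled correctly.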

spreadsheets. By searching Google using this simple query site:some_university.edu intitle:index.of .xls, we found several Microsoft Excel files stored within directory listings. We found the equipment spending master list of a university department. This file contained equipment purchases with vendor and price information. Another Excel file from the same department contained faculty salaries. This information should not be publicly obtainable through a simple Google search.

4.3.2 WS_FTP Logs


Another source of information is log files [6]. By default, WS_FTP creates a WS_FTP.log file on the web server. WS_FTP.log files contain information about file transfers to and from FTP servers, including sensitive details such as usernames, file directories, file names, times of file uploads and downloads, and web server usage information. This information can save hackers a lot of time in their attempt to attack a company's website. The query site:some_university.edu index.of ".log" brought back many results. Among these was a link to a WS_FTP.log file in the file directory of a university's physics and geology department, which listed dates and times of file uploads made with the WS_FTP client. This file disclosed usernames and names of file directories.

4.3.3 Source Code


The source code of a computer program can contain large amounts of sensitive information. Source code can show how the system was implemented and how the database is accessed. Code can contain passwords, server names, database table and field names, and directories. Many companies are still not using any version control or professional backup solution for their source code. As a result, programmers back up their code by making copies of their files with extensions such as .bak, .bak2, or .bak3. Web servers may contain pages like MyCode.asp.bak. What programmers do not realize is that these code files may be retrieved from the web server. Web servers decide how to handle a page based on its file extension. The web server has no idea how to process these backup files and will display them as plain text. That means that all of the code is now exposed to the user, perhaps revealing sensitive information [5, p. 112]. By using the simple query site: .edu index.of asp.bak, we found many such pages on university websites. These included backed-up ASP pages from the careers site of one university. We can search other domains by simply replacing .edu with another domain such as .com.
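An administrator can look for such backup copies in their own web root before a crawler does. The following Python sketch assumes that backup copies follow the .bak, .bak2, .bak3 naming described above. The function names and the example path are hypothetical, and other naming habits (for example .old or a trailing ~) would need extra patterns.

```python
import re
from pathlib import Path

# Matches names such as MyCode.asp.bak, page.php.bak2, or index.aspx.bak3.
BACKUP_PATTERN = re.compile(r"\.bak\d*$", re.IGNORECASE)


def is_backup_copy(filename: str) -> bool:
    """True if filename ends in .bak, optionally followed by digits."""
    return BACKUP_PATTERN.search(filename) is not None


def find_backup_copies(web_root: str) -> list[Path]:
    """Walk web_root recursively and return every file that looks like a backup copy."""
    return sorted(path for path in Path(web_root).rglob("*")
                  if path.is_file() and is_backup_copy(path.name))
```

A call such as `find_backup_copies("/var/www/html")` (an assumed path) returns the files to move out of the served directory.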

4.4 Network Devices


You can find much more than just documents on the Internet. There are also many types of devices, interactive environments, collaboration tools, and social networks. Devices accessible through the Internet are a very popular target for hackers. Being able to control printers, web cameras, and network routers can be useful in planning an attack on a company. It is important that penetration testers understand those threats and protect companies against them.

4.3 Documents & Files


4.3.1 Office Documents
Website administrators do not always think of how a search engine will crawl their site when they build it. People will put sensitive files on their website without thinking. Word documents, Excel spreadsheets, and Access databases have a wealth of information in them. Companies may store sensitive information, such as financial reporting or human resources documentation, on their websites in

For the convenience of their employees, companies may put hardware devices online. With the increase in telecommuting, this is happening more and more. There are countless devices online, and the Google Hacking Database [2] provides users with queries to find them.

4.4.1 WebCams
The first type of device that rookie Google hackers will attempt to find is webcams. Simple searches like camera linksys inurl:main.cgi reveal web pages that have Linksys web cameras. Other queries like inurl:"ViewerFrame?Mode=" + inurl:.edu allintitle: Axis 2.10 OR 2.12 OR 2.30 OR 2.31 OR 2.32 OR 2.33 OR 2.34 OR 2.40 OR 2.42 OR 2.43 "Network Camera " also provide users with information about cameras. Webcam information may not seem very interesting, considering that webcams themselves are designed to be shown on the web. Some webcam owners put their devices online but do not share the URL for the device, except with a certain set of people. This security through obfuscation does not hold up very well against Google. The Google bots crawl all accessible pages indiscriminately. One specific webcam we found allowed the user to control the camera's direction, tilt, zoom, and display size. Another example that we found was a webcam at a construction website, which showed so much detail that we could read license plate numbers.

4.4.2 Routers and Firewalls


Routers and hardware firewalls are connected to the Internet to allow remote administration. These devices are almost always password protected by system administrators. Unfortunately, some companies keep the default login and password. This information is easily found by using these Google queries.
intitle:"Main page - SmoothWall Express"
intitle:"Smoothwall Express" inurl:cgi-bin "up * days"
Google uses the information in the title of the SmoothWall Express firewall client to find the administrative login pages for the device. In Johnny Long's Google Hacking Database [2], the bottom query was listed as a query to use to find the administrative login page for the device. We found that the bottom query does not return results.

4.4.3 Network Printers


Finally, network printers are also available online. Many of these are password protected, but often they are available to anyone. intext:"MaiLinX Alert (Notify)" -site:networkprinters.com

We tried searching for UPS tracking information using the following Google query, posted in Johnny Long's Google Hacking Database [2].
site:ups.com intitle:"Ups Package tracking" intext:"1Z ### ### ## #### ### #"
The original query no longer worked, but that does not mean that the information is not there. By simply going to the UPS website and opening the shipment tracking page, we found that the URL of the shipment tracking site had changed since the original query was posted, and so had the format of the tracking number. By updating the URL and removing the tracking number format from the query, we obtained a cleaner and simpler query that works.
"In Transit" site:wwwapps.ups.com
This query can be adjusted to filter down to the information you need. The new query brings back a substantial number of pages with tracking information for UPS packages that are currently in transit. This information can be used to track all incoming UPS packages for a selected address, perhaps to steal a package. Surely, most people would not be happy that this kind of information is available through a simple Google query.

5. PROTECTING AGAINST GOOGLE HACKING


Google Hacking is well documented and easy to learn. It is very important for security professionals to protect their companies against Google Hacking. To protect your site against Google Hacking, you need to establish a solid security policy on what information can be put on the web. Security professionals should perform Google Hacking against their own websites to check for sensitive information disclosure. There is no 100% protection against Google Hacking, but strong policies and testing can improve the security of your site. Security professionals need to learn Google Hacking to provide a good level of protection for their sites. As you become more familiar with manual hacks, you can start using some of the automated Google Hacking tools. These automate your hacks and help you check every page within your site. Automated tools allow for periodic security checks at a frequency that is simply impossible to achieve with manual hacks. There are different routes you can take with automated Google Hacking tools. You can use some of the pre-built automated tools, or take advantage of the Google API and build your own Google Hacking tool. Pre-built automated Google Hacking tools, such as Johnny Long's Gooscan [5, pp. 489-499], are very good for many common hacks and will save you time. If you need something more customized, you may need to implement your own tool using the Google API.
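A first step toward a customized self-audit tool is to generate the queries to run against your own domain. The Python sketch below is only an illustration. The templates are adapted from queries quoted in this paper, the names `QUERY_TEMPLATES` and `build_audit_queries` are hypothetical, and the step of submitting the queries is deliberately left out, since this paper notes that the Google Terms of Service restrict automated querying.

```python
# Query templates adapted from queries quoted in this paper; {domain} is
# replaced by the site being audited. The list is illustrative, not complete.
QUERY_TEMPLATES = [
    "site:{domain} intitle:index.of .xls",
    'site:{domain} index.of ".log"',
    "site:{domain} index.of asp.bak",
    "site:{domain} + @",
]


def build_audit_queries(domain: str) -> list[str]:
    """Return one Google query string per template, restricted to domain."""
    return [template.format(domain=domain) for template in QUERY_TEMPLATES]


# example.edu is a placeholder domain.
for query in build_audit_queries("example.edu"):
    print(query)
```

Keeping the templates in a plain list makes it easy to add queries from the Google Hacking Database [2] as new ones become relevant to a site.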

4.5 Personal Information Gathering


4.5.1 Email Address Harvesting
A simple search like site:nku.edu + @ will return all web pages that have the @ sign on the page. This query gives a spammer a legal means to gather countless email addresses. While the Google Terms of Service prohibit users from using tools that will automatically query websites, you can create a simple program that will use a simple Google query to return a list of pages that have email addresses. Using screen scrapes and regular expressions, this kind of program can be written in no time. An example program that we wrote can be found at [1]. Once you have harvested the email addresses, you can run a simple telnet session and use the Gmail servers to validate them.
telnet
open gmail-smtp-in.l.google.com 25
HELO test
MAIL FROM: <email address>
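The regular-expression step mentioned above can be sketched as follows, for example to audit which addresses your own pages expose. This is not the program referenced in [1]. The pattern, the function name `extract_emails`, and the sample addresses are illustrative assumptions, and the pattern is a simplification that will miss some valid addresses.

```python
import re

# Simplified address pattern: local part, "@", then a dotted domain name.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(page_text: str) -> list[str]:
    """Return unique addresses in page_text, in order of first appearance."""
    seen = {}
    for match in EMAIL_PATTERN.findall(page_text):
        # Compare case-insensitively, but keep the first spelling found.
        seen.setdefault(match.lower(), match)
    return list(seen.values())


sample = "Contact alice@example.edu or ALICE@example.edu; also bob@cs.example.edu."
print(extract_emails(sample))
```

Running this over the text of pages before publication shows which addresses a site:your_domain + @ search would hand to a spammer.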

4.5.2 Shipment Tracking Information


In the past few years, online shipment tracking systems have become very popular. People enjoy checking the status of their shipments online in real time. But how secure is that information?

6. CONCLUSION
While Google Hacking does not necessarily follow the standard definition of hacking, it can prove just as fruitful. By using Google, you can gain access to information that may otherwise be hidden. The information that you gather using these hacks will allow you to gain access to systems or devices. The hacks work because Google indiscriminately stores information when its web spiders crawl the Internet. By using the advanced operators, you can view this information. Google makes it extremely easy to find this information. Those with more computer knowledge will have a smaller learning curve, but it will not take that long for even a novice Internet user to master these techniques. Security professionals can address the problem of Google Hacking in a manner similar to addressing other security issues. 1) They can use Google Hacking to test their Web sites for sensitive information disclosure. 2) They can educate employees concerning what information should not be put on the Internet. 3) They can also implement enforceable policies to ensure employee compliance.

7. REFERENCES
[1] Email Address Harvesting, http://www.nku.edu/~frank/FindEmailAddresses.htm.
[2] Google Hacking Database Web Site, http://johnny.ihackstuff.com/ghdb.php.
[3] Johnny Long's Web Site, http://johnny.ihackstuff.com/.
[4] Lancor, L. and Workman, R., Using Google Hacking to Enhance Defense Strategies. SIGCSE Bull. 39, 1 (Mar. 2007), 491-495. DOI= http://doi.acm.org/10.1145/1227504.1227475.
[5] Long, J., Google Hacking for Penetration Testers, Vol. 2, Syngress Press, 2008.
[6] Neohapsis Archive ws_ftp.log, http://archives.neohapsis.com/archives/fulldisclosure/200408/0663.html.
[7] Wikipedia Google Hacking Web Site, http://en.wikipedia.org/wiki/Google_Hacking.
