Justin Billig
Department of Computer Science, Northern Kentucky University, Highland Heights, KY 41099, (859) 572-5320
justin.billig@gmail.com
Yuri Danilchenko
Department of Computer Science, Northern Kentucky University, Highland Heights, KY 41099, (859) 572-5320
yuri@danilchenko.com
Charles E. Frank
Department of Computer Science, Northern Kentucky University, Highland Heights, KY 41099, (859) 572-5320
frank@nku.edu
ABSTRACT
Google Hacking uses the Google search engine to locate sensitive information or to find vulnerabilities that may be exploited. This paper evaluates how much effort it takes to get Google Hacking to work and how serious the threat of Google Hacking is. The paper discusses the countermeasures that can be used against Google Hacking.
General Terms
Security.
Keywords
Information security, web security, hacking, Google Hacking, information assurance.
1. INTRODUCTION
Wikipedia [7] defines Google Hacking as the art of creating complex search engine queries in order to filter through large amounts of search results for information related to computer security. In its malicious format it can be used to detect websites that are vulnerable to numerous exploits and vulnerabilities as well as locate private, sensitive information about others, such as credit card numbers, social security numbers, and passwords. This filtering is performed by using advanced Google operators.
Attackers can use Google Hacking to uncover sensitive information about a company or to uncover potential security vulnerabilities. Security professionals can use Google Hacking to determine whether their websites are disclosing sensitive information.
Northern Kentucky University is a 15,000-student regional state university. We performed a Google Hacking security assessment of our university. In a few cases, we tried some of the Google Hacking techniques more widely on the Internet. This allowed us to determine whether various Google hacks actually work. Often, techniques that worked in the past no longer work, as vulnerabilities are patched. We tried to determine how much effort it took to perform various Google hacks. This was done purely for research purposes. We never intended to make malicious use of any sensitive information or potential security vulnerabilities. We have disclosed potential issues to the security staff at our university. In this paper, we assess the seriousness of information disclosure using Google Hacking and recommend what can be done to defend against Google hackers.
2. BACKGROUND
The definitive source for information about Google Hacking is Long [5]. This book provides background on Google queries and advanced operators. It has chapters on locating information on the Web in various types of documents, on locating exploit code and finding vulnerable targets, and on searching for usernames, passwords, and social security numbers. This book is a must-read for security professionals wishing to protect their websites from disclosing information to Google hackers.
A second important source is Johnny Long's website [3]. Its Google Hacking Database [2] contains a large number of Google searches organized by category. The categories include Files containing passwords, Pages containing login portals, and Sensitive directories. A user can try a Google search in the database by simply clicking on a link.
We were able to find only one paper on Google Hacking in the academic literature. Lancor and Workman [4] describe incorporating Google Hacking into a graduate course on web security. This paper serves as a good introduction to Google Hacking. It describes a series of exercises used to teach students how to use Google Hacking to test their own sites and how to defend against it.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. InfoSecCD Conference '08, September 26-27, 2008, Kennesaw, GA, USA. Copyright 2008 ACM 978-1-60558-333-4/00/0006$5.00.
3. TECHNIQUES
We mostly limited our Google Hacking activities to Northern Kentucky University. We sometimes tried other educational sites in the US, and a few network device searches required a broader domain. Our main goal while performing the searches was to check which Google hacks actually work. We found these hacks on the Internet, in Johnny Long's book [5], and in his Google Hacking Database [2]. This information is critical to understanding how vulnerable we really are to Google Hacking. Are websites protecting information against Google Hacking? Sadly, most of the examples of unprotected sensitive information were found within our own university and did not require a substantial amount of time to find.
Google Hacking turned out to be a very powerful and flexible hacking approach. Many of the most powerful hacks we found did not work as published. In most cases, however, if we spent enough time analyzing the target and understanding how the queries found information, we were able to tweak the original query by changing the parameters or the advanced operators and find information similar to what the original query targeted.
We found it very helpful to use Google cached pages while performing Google hacks. Google crawls web pages and stores a copy of them on its own servers. We used Google cached pages to anonymously browse a target's site without sending a single packet to its server. Google stores most of the content of the pages it crawls, but omits images and some other space-consuming media. When we viewed Google cached pages by simply clicking on the cached link on the results page, we ended up connecting to the target's server to get the rest of the page content. This might reveal our Google Hacking to the target website. We added the &strip=1 parameter to the URL to tell Google to return only the crawled content and not to connect to the target's server for any information.
A system administrator might decide to prevent access to a certain part of a site by moving it, protecting it with a password, or simply shutting down the server. What administrators often do not realize is that the information they are trying to protect may still exist on Google's servers and can be accessed through cached pages. This allowed us to view data that had been removed from websites.
[2, p. 88]. Here is another example of an error message that provides SQL query information: "You have an error in your SQL syntax near" + inurl:.edu
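The &strip=1 step described in this section can be illustrated with a short sketch. It only shows how the parameter is appended to a cached-page link that has already been copied from a results page; the function name add_strip_param and the example host are hypothetical, and whether a given cache service honors the parameter today is an assumption, not something the sketch verifies.

```python
from urllib.parse import parse_qs, urlsplit


def add_strip_param(cached_url: str) -> str:
    """Return cached_url with strip=1 appended to its query string."""
    parts = urlsplit(cached_url)
    # Leave the URL untouched if a strip parameter is already present.
    if "strip" in parse_qs(parts.query):
        return cached_url
    # Append verbatim instead of re-encoding, so the existing query is unchanged.
    new_query = f"{parts.query}&strip=1" if parts.query else "strip=1"
    return parts._replace(query=new_query).geturl()


if __name__ == "__main__":
    # Hypothetical cached-page link; host and query are placeholders.
    print(add_strip_param("http://cache.example/search?q=cache:www.example.edu/"))
```

Appending to the raw query string, rather than parsing and re-encoding it, keeps the rest of the copied link exactly as the results page produced it.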
4. GOOGLE HACKING
According to Johnny Long's Google Hacking Database [2], there are roughly fourteen categories of Google hacks. This paper looks at five of them: Error Messages, Open Directories, Documents & Files, Network Devices, and Personal Information Gathering.
spreadsheets. By searching Google with the simple query site:some_university.edu intitle:index.of .xls, we found several Microsoft Excel files stored within directory listings. One was the equipment spending master list of a university department, containing equipment purchases with vendor and price information. Another Excel file from the same department contained faculty salaries. This information should not be publicly obtainable through a simple Google search.
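The directory-listing query above combines a site: restriction, the intitle:index.of operator, and a file extension. A minimal sketch of how an administrator might compose the same query for several file types on their own domain follows; the function name directory_listing_query and the extra extensions in the usage loop are illustrative assumptions, not part of the original study.

```python
def directory_listing_query(site: str, extension: str) -> str:
    """Compose a query of the form: site:<site> intitle:index.of .<ext>"""
    # Accept the extension with or without a leading dot.
    ext = extension.lstrip(".")
    return f"site:{site} intitle:index.of .{ext}"


if __name__ == "__main__":
    # some_university.edu is the placeholder used in the text;
    # "doc" and "pdf" are assumed examples beyond the .xls search reported above.
    for ext in ("xls", "doc", "pdf"):
        print(directory_listing_query("some_university.edu", ext))
```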
For the convenience of their employees, companies may put hardware devices online. With the increase in telecommuting, this is happening more and more. There are countless devices online, and the Google Hacking Database [2] provides users with queries to find them.
4.4.1 WebCams
The first type of device that rookie Google hackers will attempt to find is webcams. Simple searches like camera linksys inurl:main.cgi reveal web pages that have Linksys web cameras. Other queries, like inurl:"ViewerFrame?Mode=" + inurl:.edu and allintitle: Axis 2.10 OR 2.12 OR 2.30 OR 2.31 OR 2.32 OR 2.33 OR 2.34 OR 2.40 OR 2.42 OR 2.43 "Network Camera ", also provide users with information about cameras. Webcam information may not seem very interesting, considering that webcams themselves are designed to be shown on the web. However, some webcam owners put their devices online but share the URL only with a certain set of people. This security through obscurity does not hold up very well against Google. The Google bots crawl all accessible pages indiscriminately. One specific webcam we found allowed the user to control the camera's direction, tilt, zoom, and display size. Another example that we found was a webcam at a construction website, which showed so much detail that we could read license plate numbers.
We tried searching for UPS tracking information using the following Google query, posted on Johnny Long's Google Hacking Database [2]: site:ups.com intitle:"Ups Package tracking" intext:"1Z ### ### ## #### ### #". The original query no longer worked, but that does not mean that the information is not there. By simply going to the UPS website and opening the shipment tracking page, we found that the URL of the shipment tracking site had changed since the original query had been posted, and so had the format of the tracking number. By updating the URL and removing the tracking number format from the query, we obtained a cleaner and simpler query that works: "In Transit" site:wwwapps.ups.com. This query can be adjusted to filter down to the information needed. The new query brings back a substantial number of pages with tracking information for UPS packages that are currently in transit. This information can be used to track all incoming UPS packages for a selected address, perhaps to steal a package. Surely, most people would not be happy that this kind of information is available through a simple Google query.
6. CONCLUSION
While Google Hacking does not necessarily follow the standard definition of hacking, it can prove just as fruitful. By using Google, you can gain access to information that may otherwise be hidden. The information that you gather using these hacks will allow you to gain access to systems or devices. The hacks work because Google indiscriminately stores information when its web spiders crawl the Internet. By using the advanced operators, you can view this information. Google makes it extremely easy to find this information. Those with more computer knowledge will have a smaller learning curve, but it will not take that long for even a novice Internet user to master these techniques. Security professionals can address the problem of Google Hacking in a manner similar to addressing other security issues. 1) They can use Google Hacking to test their Web sites for sensitive information disclosure. 2) They can educate employees concerning what information should not be put on the Internet. 3) They can also implement enforceable policies to ensure employee compliance.
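The first recommendation, testing one's own Web sites, can be sketched as a small script that turns a list of query templates into search URLs for a single domain. This is a minimal illustration under stated assumptions: the templates reuse searches quoted in this paper (the SQL error search is adapted here to use a site: restriction), the names AUDIT_TEMPLATES, SEARCH_ENDPOINT, and audit_search_urls are hypothetical, and the script only builds URLs for a person to open; it does not submit any searches.

```python
from urllib.parse import urlencode

# Query templates based on searches quoted in this paper; {site} is filled in per domain.
# The second template is an adaptation that swaps inurl:.edu for a site: restriction.
AUDIT_TEMPLATES = [
    "site:{site} intitle:index.of .xls",
    'site:{site} "You have an error in your SQL syntax near"',
]

# Assumed search endpoint; adjust if a different search front end is used.
SEARCH_ENDPOINT = "https://www.google.com/search"


def audit_search_urls(site, templates=AUDIT_TEMPLATES):
    """Return one URL-encoded search link per template for the given site."""
    urls = []
    for template in templates:
        query = template.format(site=site)
        # urlencode handles quoting of spaces, colons, and double quotes.
        urls.append(f"{SEARCH_ENDPOINT}?{urlencode({'q': query})}")
    return urls


if __name__ == "__main__":
    # example.edu is a placeholder for a domain the reader is authorized to assess.
    for url in audit_search_urls("example.edu"):
        print(url)
```

Keeping the templates in a plain list makes it easy for security staff to add further checks as new queries are published.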
7. REFERENCES
[1] Email Address Harvesting, http://www.nku.edu/~frank/FindEmailAddresses.htm.
[2] Google Hacking Database Web Site, http://johnny.ihackstuff.com/ghdb.php.
[3] Johnny Long's Web Site, http://johnny.ihackstuff.com/.
[4] Lancor, L. and Workman, R., Using Google Hacking to Enhance Defense Strategies. SIGCSE Bull. 39, 1 (Mar. 2007), 491-495. DOI= http://doi.acm.org/10.1145/1227504.1227475.
[5] Long, J., Google Hacking for Penetration Testers, Vol. 2, Syngress Press, 2008.
[6] Neohapsis Archive ws_ftp.log, http://archives.neohapsis.com/archives/fulldisclosure/200408/0663.html.
[7] Wikipedia Google Hacking Web Site, http://en.wikipedia.org/wiki/Google_Hacking.