As we know, there are plenty of search engines: Bing, Yahoo, DuckDuckGo, Dogpile, Ixquick, and so on, but the most popular is Google. We should also know that search engines can only find information on websites they have indexed. Websites that are not indexed will simply not appear as results. In this topic, we will focus only on the most popular search engine: Google.
Whenever you do a search in Google, you are not actually searching the web but Google's index of the web. Google builds that index with the help of small pieces of software called spiders, or crawlers (Googlebot is the best known). They fetch not only websites but also the links on those websites, making the database bigger and bigger. When you type a keyword or keywords, Google searches its index by asking 200+ questions in a matter of seconds. These questions could be something like: How many times does this page contain your keywords? Where do those keywords appear: in the title, URL, or headers? What is the page's rank (how many links point to it, and how often were those links clicked)? Does it come from a high-quality or low-quality website? Google combines all these factors (and many more) into a formula that determines where in the results page that website will appear. This helps the user find exactly what she wants with little or no hassle. The crawlers then fetch more pages by analyzing every outbound link on every site they visit, and the whole process repeats. That is how Google works in terms of crawling and spidering data.

Now, the funny thing here is that Google's robots are simply robots. They cannot discern whether the websites they are crawling are helpful or harmful. They also have no idea that you put your backup system online and that they are now indexing it for the whole world to access and see. The same goes for your system's backup files, your baby monitor, personal or home cameras, digital toaster, refrigerator, home heater, or an entire NAS exposed to the Internet: if it is reachable, Google's bots have probably already crawled it.
|Taken from: http://smartdatacollective.com|
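The fetch-a-page, follow-its-links loop described above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not Googlebot's actual code (which is vastly more sophisticated), and the URLs in it are made up:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect every <a href=...> target, resolved against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    """Return the outbound links a crawler would queue up from one page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links

# A real spider repeats this in a loop: fetch a page, extract its links,
# add the unseen ones to its frontier, fetch those, and so on.
page = '<a href="/products">Products</a> <a href="https://other.example/">Other</a>'
print(extract_links(page, "https://example.com"))
```

The key point is the self-feeding loop: every crawled page hands the spider more URLs, which is exactly why anything linked from an indexed page tends to get indexed too.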
This is where the fun begins. It takes research, countless hours of experimenting, and several sodas to stay awake the whole night! But this is nothing new or illegal. It is called "The IoT (Internet of Things)." According to TechTarget (whatis.techtarget.com/definition/Internet-of-Things), the Internet of Things is a scenario in which objects, animals, or people are provided with unique identifiers and the ability to transfer data over a network without requiring human-to-human or human-to-computer interaction. Simply put, it is the practice of attaching every device that runs your life to the Internet (giving it an IP address) so it can be accessed from the comfort of your couch, wherever you might be. This, of course, poses a huge risk, because there are things you would not want accessible to the whole world, for example your thermostat or any other critical component of your "private" life.
While we willingly accept exposing our lives to the world, the IoT idea amounts to a "controlled," centralized way of communicating with no privacy at all. There are simply not enough security controls in place to prevent someone from changing the cooling settings on our refrigerator, or from intercepting and talking over our baby monitor. If you think I am being paranoid: it has actually happened, several times.
But freaking out families and checking someone's thermostat are not the only uses (or misuses) of Google and other search engines. You can actually have some fun and do some passive (and not-so-passive) reconnaissance for penetration testing :)
Popularized by Johnny Long in the early 2000s, "Google dork" is the term for using Google search keywords (queries) in a smart way to get exactly what you want. Needless to say, whatever you want to look for 1) needs to be indexed by Google; 2) is legal to view, since it is available on the Internet and therefore public; and 3) you should note that breaking into anything without the user's or owner's consent and approval is illegal and punishable by law.
DISCLAIMER: THIS TUTORIAL IS MEANT FOR ETHICAL PURPOSES ONLY. PLEASE REFRAIN FROM USING THESE QUERIES FOR UNLAWFUL MEANS
After this small disclaimer, let's dive into some interesting stuff. I will use microsoft.com as an example; use your logic to replace the examples with the target of your choice. If, for example, you need to know the indexed subdomains of microsoft.com, you can simply search for:
site:microsoft.com -inurl:www
To find only the indexed links to PDF documents within the site of microsoft.com, you search for:
site:microsoft.com filetype:pdf
You can go ahead and search for more interesting stuff. For example, if you know the host is running WordPress, you can find out the service version:
"Powered by WordPress" -html filetype:php -demo -wordpress.org -bugtraq
You can also tweak this a little to fingerprint services such as IIS, Telnet, and more (the equivalent of nmap -sV, but done passively).
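A handy way to apply this idea is to template the banner-based dorks. Below is a small, hypothetical helper (my own naming, not a standard tool) that builds such a query for any banner string, optionally scoped to one site:

```python
def service_dork(banner, site=None):
    """Build a Google query that finds pages exposing a service banner.

    `banner` is the text the service leaks (e.g. a directory-listing footer);
    intitle:index.of narrows results to open directory listings.
    """
    query = f'"{banner}" intitle:index.of'
    if site:
        query = f"site:{site} {query}"
    return query

# A passive "nmap -sV": ask Google which indexed pages advertise IIS 4.0.
print(service_dork("Microsoft-IIS/4.0 Server at"))
print(service_dork("Apache/1.3.28 Server at", site="example.com"))
```

The banner strings match the "Server at" fingerprints listed further down in this post.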
We can also find out information about users by performing the following query:
Also, you can find out what files the host has:
Additionally, we can even find vulnerabilities in a system. How so? By keeping up to date with CVEs, we can perform a fast audit of our client's host. For example, if our client's host is totallyowned.com and we know (via white-box knowledge) that it runs an out-of-date version of PHP-Fusion 6.x.x, we can try this:
site:totallyowned.com "Powered by PHP-Fusion v6.00.110" | "Powered by PHP-Fusion v6.00.2.." | "Powered by PHP-Fusion v6.00.3.." -v6.00.400
From there, it is just a matter of understanding the PoC from the CVE, and we have successfully taken advantage of the vulnerability.
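The same version check can be automated once you have the page HTML (from Google's cache or the site itself). This is a hypothetical sketch; the "vulnerable range" logic below just mirrors the version strings in the dork above and is not a real CVE lookup:

```python
import re

def fusion_version(html):
    """Pull the PHP-Fusion version out of a 'Powered by' footer, if present."""
    match = re.search(r"Powered by PHP-Fusion v([\d.]+)", html)
    return match.group(1) if match else None

def looks_vulnerable(version):
    """Hypothetical check mirroring the dork: flag 6.00.1xx-6.00.3xx builds."""
    return version.startswith(("6.00.1", "6.00.2", "6.00.3"))

footer = "<p>Powered by PHP-Fusion v6.00.110 &copy; 2005</p>"
version = fusion_version(footer)
print(version, looks_vulnerable(version))
```

In a real audit you would match the extracted version against the affected ranges listed in the CVE entry itself.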
This is one of thousands of examples out there, and this topic is so vast that it would take me hundreds of blog posts to cover them all (and even those would not suffice). The point here is to make you aware of the risks associated with the Internet of Things. Be careful what you put up on the Internet; it will certainly come back to bite you. To conclude this section, I would like to share some interesting "dorks" with you:
inurl:"ViewerFrame?Mode=" Panasonic Network Camera webcams
inurl:indexFrame.shtml Axis Axis webcams
SNC-RZ30 HOME Sony SNC-RZ30 webcams
intitle:"my webcamXP server!" inurl:":8080" Webcams accessible via WebcamXP Server
intitle:liveapplet inurl:LvAppl Canon Webview webcams
"Copyright (c) Tektronix, Inc." "printer status" PhaserLink printers
inurl:"printer/main.html" intext:"settings" Brother HL printers
intitle:"Dell Laser Printer" ews Dell printers with EWS technology
intext:centreware inurl:status Xerox Phaser 4500/6250/8200/8400 printers
inurl:hp/device/this.LCDispatcher HP printers
"Apache/1.3.28 Server at" intitle:index.of Apache 1.3.28
"Microsoft-IIS/4.0 Server at" intitle:index.of Microsoft Internet Information Services 4.0
"Oracle HTTP Server/* Server at" intitle:index.of Any version of Oracle HTTP Server
"IBM _ HTTP _ Server/* * Server at" intitle:index.of Any version of IBM HTTP Server
"Red Hat Secure/*" intitle:index.of Any version of the Red Hat Secure server
Advisories and Vulnerabilities:
"powered by tikiwiki"
SQL Database:
filetype:cfg mrtg "target[*]" -sample -cvs -example
"Index of" / "chat/logs"
Interesting confidential (but public) documents:
"not for distribution" Confidential documents
intitle:"curriculum vitae" "phone * * *" "address *" "e-mail"
What about non-indexed content?
This does not work with every site, but it does with most of them. To tell Google (and other search engines) not to index certain paths of a website, a robots.txt file states what NOT to list. Unfortunately, this file itself must be public so that search engines can read it, which means you can discover those "non-indexed" paths really easily.
For example, let's try this with the city of Oak Brook, IL:
user-agent: Baiduspider
Disallow: /
User-agent: Yandex
Disallow: /
User-agent: *
Disallow: /activedit
Disallow: /admin
Disallow: /common/admin/
Disallow: /OJA
Disallow: /support
Disallow: /currenteventsview.asp
Disallow: /search.asp
Disallow: /currenteventsview.aspx
Disallow: /search.aspx
Disallow: /currentevents.aspx
Disallow: /Support
Disallow: /CurrentEventsView.asp
Disallow: /Search.asp
Disallow: /CurrentEventsView.aspx
Disallow: /Search.aspx
Disallow: /Search
Disallow: /CurrentEvents.aspx
Disallow: /Currentevents.aspx
Disallow: /map.aspx
Disallow: /map.asp
Disallow: /Map.aspx
Disallow: /Map.asp
This file blocks Baidu's spider (Baiduspider) and Yandex from the entire site (Disallow: /), and asks all remaining crawlers (User-agent: *) not to index the listed paths. You can simply try each of those paths after the .org/ and see what comes up. This is useful for finding login portals during your pentesting (note the /admin), which may themselves be vulnerable to some flaws.
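Reading a robots.txt by hand gets tedious on large sites, so here is a small sketch that parses one into a per-agent list of disallowed paths. This is a deliberately simplified parser (it ignores Allow, wildcards, and other extensions), and the sample string just mirrors the file above:

```python
def disallowed_paths(robots_txt):
    """Map each User-agent in a robots.txt body to its Disallow paths."""
    rules = {}          # agent -> list of disallowed paths
    agents = []         # agents the next Disallow lines apply to
    grouping = True     # still collecting consecutive User-agent lines?
    for raw in robots_txt.splitlines():
        line = raw.split("#")[0].strip()        # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if not grouping:                    # a new agent group starts
                agents = []
            agents.append(value)
            rules.setdefault(value, [])
            grouping = True
        elif field == "disallow":
            grouping = False
            if value:
                for agent in agents:
                    rules[agent].append(value)
    return rules

sample = """user-agent: Baiduspider
Disallow: /
User-agent: *
Disallow: /admin
Disallow: /OJA
"""
print(disallowed_paths(sample))
# -> {'Baiduspider': ['/'], '*': ['/admin', '/OJA']}
```

To run it against a live site, fetch the file first, e.g. with urllib.request.urlopen on the site's /robots.txt URL, and feed the decoded body into the function.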
Please, feel free to drop your comments and don't forget to keep on investigating. Happy lurking!!
How Google Works: www.google.com/howgoogleworks
Baby Monitor Hack 1:
Baby Monitor Hack 2:
Baby Monitor Hack 3:
Johnny Long PDF: https://www.blackhat.com/presentations/bh-europe-05/BH_EU_05-Long.pdf
Johnny Long DefCon Presentation: https://www.youtube.com/watch?v=N3dzVl40lQA
Some Google Dorks (use it under your own discretion): https://blackmoreops.wordpress.com/2014/07/08/useful-google-hacks/