25th October 2021
theHarvester

Gathering Information with theHarvester

Overview

In this article, I want to talk about the passive information-gathering tool theHarvester. theHarvester is a relatively easy-to-use program for gathering information about a target. It collects email addresses, subdomains, hosts, employee names, open ports, and banners from different sources, such as Google (dorks), Bing, LinkedIn, and Shodan. I use it almost weekly, and it is very useful when you are conducting a penetration test, a cybersecurity assessment, or bug bounty research.

theHarvester is used by ethical and unethical hackers alike.

theHarvester comes pre-installed on most penetration testing Linux distributions, such as Kali Linux and Parrot OS. It also supports other operating systems. Let’s get started!

Configuring the APIs

Some modules of theHarvester require an API key. I advise you to configure these keys to get the most out of the tool. The following modules require an API key:

Module           Free API   Purpose
Bing             Yes        Search engine from Microsoft
GitHub           Yes        To search through GitHub repos for information
Hunter           Yes        Connecting email addresses to people
Intelx           Yes        Search engine
Pentest Tools    No         Pentesting platform
SecurityTrails   Yes        Explores historical data from any internet asset
Shodan           Yes        Search engine for the Internet of Things (IoT)
Spyse            No         Cybersecurity search engine

The API keys need to be configured in the api-keys.yaml file. On an operating system where theHarvester comes pre-installed, you can find this file in the /etc/theHarvester directory. If you have cloned the GitHub repository, you can find the file in the directory where ‘theHarvester.py’ is located.

~$ cat api-keys.yaml 
apikeys:
  bing:
    key: 

  github:
    key: 

  hunter:
    key: 

  intelx:
    key: 

  pentestTools:
    key:

  projectDiscovery:
    key:

  securityTrails:
    key: 

  shodan:
    key:

  spyse:
    key: 
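Before a long run, it can be handy to check which modules actually have a key filled in. The snippet below is a minimal sketch of that idea, not part of theHarvester itself. It assumes the flat layout shown above (a module name followed by a `key:` line), and the function name `read_api_keys` and the `SAMPLE` text are my own, made up for illustration. It uses only the Python standard library, so it is not a full YAML parser.

```python
from pathlib import Path

# Hypothetical sample in the same layout as the api-keys.yaml shown above.
SAMPLE = """\
apikeys:
  bing:
    key:

  github:
    key: abc123

  shodan:
    key:
"""


def read_api_keys(text):
    """Return {module: key or None} for the simple layout shown above.

    Assumption: every module is a line ending in ':' and its key sits on a
    following 'key:' line. Other sub-fields that carry a value are ignored.
    """
    keys = {}
    module = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line == "apikeys:":
            continue
        name, _, value = line.partition(":")
        value = value.strip().strip("'\"")
        if name == "key" and module is not None:
            keys[module] = value or None
        elif not value:
            # A name with nothing after the colon starts a new module.
            module = name
            keys.setdefault(module, None)
    return keys


if __name__ == "__main__":
    # Try the two locations mentioned in the article, else use the sample.
    for candidate in (Path("/etc/theHarvester/api-keys.yaml"), Path("api-keys.yaml")):
        if candidate.is_file():
            text = candidate.read_text()
            break
    else:
        text = SAMPLE
    for module, key in read_api_keys(text).items():
        print(f"{module:20} {'configured' if key else 'missing'}")
```

The script only reports whether a key is present. It never prints the key itself, so the output is safe to paste into notes or a report.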

Parameters of theHarvester

In the overview section, I briefly explained the capabilities of theHarvester. The program started out as an email address harvester, but nowadays it can do much more. After the installation, the program can be started.

When theHarvester comes pre-installed, you can start the program with this command:

~$ theharvester --help

If you are using Microsoft Windows or macOS, you can launch the Python version of the program.

~$ python3 theHarvester.py --help

After starting theHarvester with the ‘--help’ flag, the following is shown.

*******************************************************************
*  _   _                                            _             *
* | |_| |__   ___    /\  /\__ _ _ ____   _____  ___| |_ ___ _ __  *
* | __|  _ \ / _ \  / /_/ / _` | '__\ \ / / _ \/ __| __/ _ \ '__| *
* | |_| | | |  __/ / __  / (_| | |   \ V /  __/\__ \ ||  __/ |    *
*  \__|_| |_|\___| \/ /_/ \__,_|_|    \_/ \___||___/\__\___|_|    *
*                                                                 *
* theHarvester 3.2.0                                              *
* Coded by Christian Martorella                                   *
* Edge-Security Research                                          *
* [email protected]                                   *
*                                                                 *
******************************************************************* 


usage: theHarvester.py [-h] -d DOMAIN [-l LIMIT] [-S START] [-g] [-p] [-s]
                       [--screenshot SCREENSHOT] [-v] [-e DNS_SERVER]
                       [-t DNS_TLD] [-r] [-n] [-c] [-f FILENAME] [-b SOURCE]

theHarvester is used to gather open source intelligence (OSINT) on a company
or domain.

optional arguments:
  -h, --help            show this help message and exit
  -d DOMAIN, --domain DOMAIN
                        Company name or domain to search.
  -l LIMIT, --limit LIMIT
                        Limit the number of search results, default=500.
  -S START, --start START
                        Start with result number X, default=0.
  -g, --google-dork     Use Google Dorks for Google search.
  -p, --proxies         Use proxies for requests, enter proxies in
                        proxies.yaml.
  -s, --shodan          Use Shodan to query discovered hosts.
  --screenshot SCREENSHOT
                        Take screenshots of resolved domains specify output
                        directory: --screenshot output_directory
  -v, --virtual-host    Verify host name via DNS resolution and search for
                        virtual hosts.
  -e DNS_SERVER, --dns-server DNS_SERVER
                        DNS server to use for lookup.
  -t DNS_TLD, --dns-tld DNS_TLD
                        Perform a DNS TLD expansion discovery, default False.
  -r, --take-over       Check for takeovers.
  -n, --dns-lookup      Enable DNS server lookup, default False.
  -c, --dns-brute       Perform a DNS brute force on the domain.
  -f FILENAME, --filename FILENAME
                        Save the results to an HTML and/or XML file.
  -b SOURCE, --source SOURCE
                        baidu, bing, bingapi, bufferoverun, certspotter,
                        crtsh, dnsdumpster, duckduckgo, exalead, github-code,
                        google, hackertarget, hunter, intelx, linkedin,
                        linkedin_links, netcraft, otx, pentesttools,
                        projectdiscovery, qwant, rapiddns, securityTrails,
                        spyse, sublist3r, threatcrowd, threatminer, trello,
                        twitter, urlscan, virustotal, yahoo

Let’s walk through the possibilities.

-d <DOMAIN>, --domain <DOMAIN>
This is the entry point of your search. Here you specify the domain name or company name of your target.

-l <LIMIT>, --limit <LIMIT>
By default, theHarvester searches for and returns a maximum of 500 results. To change this limit, specify an integer that meets your needs. The fewer results you request, the faster your search.

-S START, --start START
By default, the results are displayed starting from result number 0. If you want to start from a specific result number, you can specify it with this parameter.

-g, --google-dork
With this parameter, theHarvester uses Google Dorks search queries. The queries are stored in the ‘/wordlists/dorks.txt’ file, and you can add more Google Dorks to this file if you want.

-p, --proxies
If you want to hide behind a proxy, such as a SOCKS5 proxy, you can specify this parameter to enable it. The proxy settings need to be defined in the proxies.yaml file.

-s, --shodan
Use Shodan to query the discovered hosts. This gives you insight into possible open ports or vulnerabilities on those hosts.

--screenshot SCREENSHOT
If you want theHarvester to take screenshots of resolved domains, you can use this parameter. You need to specify the output directory where theHarvester should store the images.

-v, --virtual-host
This verifies the specified hostname through DNS resolution and checks for virtual hosts.

-e DNS_SERVER, --dns-server DNS_SERVER
If you want to use a different DNS server for the lookups, you can specify the IP address of that DNS server here.

-t DNS_TLD, --dns-tld DNS_TLD
Some companies own their name under multiple top-level domains. If you want theHarvester to try to discover them all, you can set this parameter to True. theHarvester uses a TLD dictionary for this.

-r, --take-over
This parameter checks whether the domain is vulnerable to a takeover.

-n, --dns-lookup
Performs a reverse lookup of the discovered IP addresses in order to find the hostnames.

-c, --dns-brute
This parameter runs a dictionary-based DNS brute-force enumeration on the domain.

-f FILENAME, --filename FILENAME
To save the results to an HTML and/or XML file, specify the path and filename with this parameter.

-b SOURCE, --source SOURCE
Defines the source that theHarvester must use to investigate the specified domain.
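If you run theHarvester often, you may want to assemble the command line from a script instead of typing it every time. The following is a minimal sketch of how that could look. The helper `build_command` and its argument names are hypothetical (my own invention), while the flags it emits (-d, -b, -l, -S, -c, -s, -f) are taken from the help output above. It only builds and prints the command; it does not run anything.

```python
import shlex


def build_command(domain, source, limit=None, start=None, dns_brute=False,
                  shodan=False, filename=None, script="theHarvester.py"):
    """Build an argument list for theHarvester from a few common options.

    Hypothetical helper: only flags listed in the --help output are used.
    """
    cmd = ["python3", script, "-d", domain, "-b", source]
    if limit is not None:
        cmd += ["-l", str(limit)]      # limit the number of search results
    if start is not None:
        cmd += ["-S", str(start)]      # start with result number X
    if dns_brute:
        cmd.append("-c")               # DNS brute force on the domain
    if shodan:
        cmd.append("-s")               # query discovered hosts on Shodan
    if filename is not None:
        cmd += ["-f", filename]        # save the results to a file
    return cmd


if __name__ == "__main__":
    # Print the command so it can be reviewed before it is executed.
    print(shlex.join(build_command("microsoft.com", "google", limit=100)))
```

Returning a list rather than a single string means the result can be handed directly to `subprocess.run(cmd)` without shell quoting problems, assuming you are in the directory where theHarvester.py is located.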

How to use theHarvester?

Now we know how to configure the API keys and which parameters we can use to conduct our research. Let’s walk through an example investigation. In this example, we will use the domain microsoft.com, from the company Microsoft Corporation. We are going to search on Google (-b google), and we will limit the results to 100 (-l 100).

~$ python3 theHarvester.py -d microsoft.com -l 100 -b google

 *******************************************************************
 *  _   _                                            _             *
 * | |_| |__   ___    /\  /\__ _ _ ____   _____  ___| |_ ___ _ __  *
 * | __|  _ \ / _ \  / /_/ / _` | '__\ \ / / _ \/ __| __/ _ \ '__| *
 * | |_| | | |  __/ / __  / (_| | |   \ V /  __/\__ \ ||  __/ |    *
 *  \__|_| |_|\___| \/ /_/ \__,_|_|    \_/ \___||___/\__\___|_|    *
 *                                                                 *
 * theHarvester 3.2.0                                              *
 * Coded by Christian Martorella                                   *
 * Edge-Security Research                                          *
 * [email protected]                                   *
 *                                                                 *
 ******************************************************************* 
 

 

 [*] Target: microsoft.com 
  
 Searching 0 results.
 Searching 100 results.
 [*] Searching Google. 
 

 [*] No IPs found.
 

 [*] No emails found.
 

 [*] Hosts found: 13
 ---------------------
 account.microsoft.com:95.100.163.25
 azure.microsoft.com:13.107.42.16
 browser.pipe.aria.microsoft.com:52.114.77.164
 ftp.microsoft.com:134.170.188.232
 mcr.microsoft.com:13.69.64.81
 msdn.microsoft.com:95.101.199.168
 schemas.microsoft.com:95.101.204.37
 support.microsoft.com:104.81.140.150
 technet.microsoft.com:95.101.199.168
 visualstudio.microsoft.com:23.38.18.69
 www.microsoft.com:104.99.234.13
 x3esupport.microsoft.com
 x3ewww.microsoft.com


The results show some hosts that belong to Microsoft. There are probably many more subdomains under microsoft.com, and with the -c parameter we can perform a subdomain brute-force.

 
 ~$ python3 theHarvester.py -d microsoft.com -l 100 -b google -c
...
 [*] Starting DNS brute force.
 Starting DNS brute forcing with 566 words
 

 [*] Hosts found after DNS brute force:
 admin.microsoft.com:13.107.9.156
 ajax.microsoft.com:152.199.19.160
 apps.microsoft.com:95.100.162.252
 asia.microsoft.com:40.113.200.201, 40.112.72.205, 40.76.4.15, 104.215.148.63, 13.77.161.179
 billing.microsoft.com:13.90.255.202
 blogs.microsoft.com:141.193.213.21, 141.193.213.20
 careers.microsoft.com:13.92.199.137
 communication.microsoft.com:191.234.1.49
 connect.microsoft.com:40.113.200.201
 customers.microsoft.com:104.40.179.243
 demo.microsoft.com:104.215.148.63, 40.113.200.201, 40.112.72.205, 40.76.4.15, 13.77.161.179
 dev.microsoft.com:95.101.202.154
 dns.microsoft.com:13.77.161.179, 40.76.4.15, 40.112.72.205, 104.215.148.63, 40.113.200.201
 docs.microsoft.com:88.221.70.216
 download.microsoft.com:104.81.140.145
 e.microsoft.com:40.113.200.201, 40.76.4.15, 40.112.72.205, 104.215.148.63, 13.77.161.179
 email.microsoft.com:157.55.150.73
 engineering.microsoft.com:13.107.246.13
 es.microsoft.com:51.143.57.13
 flow.microsoft.com:40.68.225.143
 forums.microsoft.com:65.52.103.99
 fs.microsoft.com:104.81.140.70
 ftp.microsoft.com:134.170.188.232
 g.microsoft.com:52.142.114.176
 games.microsoft.com:207.46.166.10
 gateway.microsoft.com:131.107.16.143, 131.107.16.142
 go.microsoft.com:104.108.39.131
 help.microsoft.com:40.113.200.201
 home.microsoft.com:52.169.118.173
 i.microsoft.com:104.108.37.246
 id.microsoft.com:13.107.42.22
 img.microsoft.com:104.108.37.246
 int.microsoft.com:13.107.246.13
 jobs.microsoft.com:52.207.139.125
 labs.microsoft.com:134.170.188.221, 134.170.185.46
 linux.microsoft.com:13.77.154.182
 login.microsoft.com:40.126.9.73, 20.190.137.96, 40.126.9.66, 20.190.137.6, 40.126.9.77, 20.190.137.10, 20.190.137.73, 20.190.137.14
 m.microsoft.com:65.55.186.235
 mail.microsoft.com:167.220.71.19, 157.58.197.10
 marketing.microsoft.com:207.46.242.110
 mobile.microsoft.com:65.55.186.235
 ms.microsoft.com:199.15.215.8
 my.microsoft.com:95.100.163.25
 neo.microsoft.com:104.40.3.53
 news.microsoft.com:141.193.213.20, 141.193.213.21
 online.microsoft.com:40.112.72.205, 104.215.148.63, 13.77.161.179, 40.76.4.15, 40.113.200.201
 open.microsoft.com:40.113.200.201
 ops.microsoft.com:40.71.199.117
 outlook.microsoft.com:40.76.4.15, 40.113.200.201, 13.77.161.179, 104.215.148.63, 40.112.72.205
 owa.microsoft.com:131.107.1.90, 131.107.1.89, 131.107.1.91, 131.107.0.91
 portal.microsoft.com:13.107.9.156
 public.microsoft.com:131.107.1.70
 rcs.microsoft.com:52.175.227.175
 remote.microsoft.com:131.107.0.6
 research.microsoft.com:13.67.218.189
 retail.microsoft.com:40.112.72.205, 40.113.200.201, 104.215.148.63, 13.77.161.179, 40.76.4.15
 rss.microsoft.com:40.112.72.205, 104.215.148.63, 13.77.161.179, 40.76.4.15, 40.113.200.201
 s.microsoft.com:40.113.200.201
 search.microsoft.com:213.144.255.200, 213.144.255.209
 sharepoint.microsoft.com:13.77.161.179, 40.112.72.205, 40.76.4.15, 40.113.200.201, 104.215.148.63
 signup.microsoft.com:13.107.246.13
 support.microsoft.com:104.81.140.150 

Here we can see the power of theHarvester. Many subdomains belonging to the domain microsoft.com were found. Microsoft has a bug bounty program, so the next step could be to run these subdomains against Shodan to enumerate the open ports and perhaps some known vulnerabilities.
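A list like this gets long quickly, so before moving on to something like Shodan it can help to group the hosts by IP address, since several subdomains point at the same servers. Below is a rough sketch of that post-processing step. The function names `parse_hosts` and `group_by_ip` are my own, and I'm assuming you feed it the `host:ip, ip` lines copied from the console output shown above.

```python
from collections import defaultdict

# A few lines copied from the output shown above.
SAMPLE_OUTPUT = """\
 [*] Hosts found after DNS brute force:
 admin.microsoft.com:13.107.9.156
 blogs.microsoft.com:141.193.213.21, 141.193.213.20
 news.microsoft.com:141.193.213.20, 141.193.213.21
 portal.microsoft.com:13.107.9.156
 x3ewww.microsoft.com
"""


def parse_hosts(lines):
    """Turn 'host:ip, ip' lines into {host: [ip, ...]}.

    Status lines such as '[*] ...' and separator lines are skipped.
    Hosts without a resolved address get an empty list.
    """
    hosts = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("[") or line.startswith("-"):
            continue
        host, _, ips = line.partition(":")
        if " " in host:
            continue  # not a hostname, probably a progress message
        hosts[host] = [ip.strip() for ip in ips.split(",") if ip.strip()]
    return hosts


def group_by_ip(hosts):
    """Invert the mapping to {ip: [host, ...]} to spot shared servers."""
    by_ip = defaultdict(list)
    for host, ips in hosts.items():
        for ip in ips:
            by_ip[ip].append(host)
    return dict(by_ip)


if __name__ == "__main__":
    hosts = parse_hosts(SAMPLE_OUTPUT.splitlines())
    for ip, names in sorted(group_by_ip(hosts).items()):
        print(f"{ip:18} {', '.join(names)}")
```

With the grouped view you only need to look up each unique IP address once, which saves queries when your Shodan plan has a limit.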

Another powerful feature of theHarvester is that it can find the email addresses of employees working for an organization. Recently this was very useful for me when reporting a vulnerability I had found in an organization. This organization did not have a bug bounty program, and with theHarvester I could fairly easily find many employee email addresses, so I could notify them of my findings by email.

For this example, I will use the domain pvv.nl. This is a political party in the Netherlands, and I want to find the email addresses of its employees.

 ~$ python3 theHarvester.py -d pvv.nl -b google
 
 *******************************************************************
 *  _   _                                            _             *
 * | |_| |__   ___    /\  /\__ _ _ ____   _____  ___| |_ ___ _ __  *
 * | __|  _ \ / _ \  / /_/ / _` | '__\ \ / / _ \/ __| __/ _ \ '__| *
 * | |_| | | |  __/ / __  / (_| | |   \ V /  __/\__ \ ||  __/ |    *
 *  \__|_| |_|\___| \/ /_/ \__,_|_|    \_/ \___||___/\__\___|_|    *
 *                                                                 *
 * theHarvester 3.2.0                                              *
 * Coded by Christian Martorella                                   *
 * Edge-Security Research                                          *
 * [email protected]                                   *
 *                                                                 *
 ******************************************************************* 
 

 

 [*] Target: pvv.nl 
  
 Searching 0 results.
 Searching 100 results.
 Searching 200 results.
 Searching 300 results.
 Searching 400 results.
 Searching 500 results.
 [*] Searching Google. 
 

 [*] No IPs found.
 

 [*] Emails found: 5
 ----------------------
 [email protected]
 [email protected]
 [email protected]
 [email protected]
 [email protected] 
...

As you can see, it found 5 email addresses belonging to the domain pvv.nl. For the last example, we will save the output to an HTML file. This time we are using the domain europe.eu with the ‘-b all’ parameter, so theHarvester will use all of its available sources to find information about this domain and save the results to an HTML file.

 ~$ python3 theHarvester.py -d europe.eu -b all -f ./europe.eu.html

A massive amount of output is saved to the HTML file. The contents of the HTML file are presented in a searchable table, and you can filter on specific records, such as email address and plugin.

I hope you’ve learned something about theHarvester.

Thanks for reading!

T13nn3s

I'm a cybersecurity enthusiast! I'm working as an IT Security Engineer for a company in The Netherlands. I love writing scripts and doing research and pentesting. As a big fan of Hack The Box, I share my write-ups on this blog. I'm blogging because I like to summarize my thoughts and share them with you.
