
Web Reconnaissance and Enumeration Cheatsheet

Working cheatsheet of recon and enumeration commands for web pentesting.


Introduction

When I start a web pentest, the first hour or two goes into recon. The wider I cast the net here, the more attack surface shows up later.

This is my working cheatsheet for that first phase: the commands and tools I keep reaching for across OSINT, subdomain enumeration, port scanning, and content discovery. I’ll keep updating it as I pick up new techniques.

Important:

  • These commands are a baseline. Manual analysis and verification are still essential.
  • Adapt techniques to the target. Recon is not one-size-fits-all.
  • Tool versions may affect command syntax and output. Adjust accordingly.

Project Setup

# Create project structure
mkdir -p 2026/target.com && cd 2026/target.com
mkdir subdomains ports dirs waf ssl
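If you run many engagements, the setup above can be wrapped in a small function so the target and year are parameters instead of hard-coded values. This is a minimal POSIX-shell sketch. `init_recon` is a hypothetical helper name, and defaulting the year to `date +%Y` is an assumption that is not part of the original layout.

```shell
# Hypothetical helper: create the per-target folder layout used in this cheatsheet.
# Usage: init_recon <target> [year]   (year defaults to the current year)
init_recon() {
  target=$1
  year=${2:-$(date +%Y)}
  base="$year/$target"
  # One subfolder per phase, matching the cd commands used in later sections
  mkdir -p "$base/subdomains" "$base/ports" "$base/dirs" "$base/waf" "$base/ssl" || return 1
  # Print the project root so the caller can cd into it
  printf '%s\n' "$base"
}
```

Usage: `cd "$(init_recon target.com 2026)"`.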

OSINT

Passive intel gathering before any active probing. It often surfaces credentials and infrastructure details that shortcut later phases.

GitHub

"target.com"
"target.com" password
"target.com" secret
"target.com" api_key
"target.com" token
"target.com" path:.env
"target.com" path:config
"target.com" filename:credentials
"target.com" filename:.npmrc
"target.com" extension:sql
"target.com" language:python password

Shodan

hostname:"target.com"
ssl.cert.subject.cn:"target.com"
http.title:"target.com"
http.html:"target.com"
org:"Organization Name"
net:"203.0.113.0/24"

Google Dorking

site:target.com
site:*.target.com

# Login pages
site:target.com inurl:login
site:target.com inurl:admin
site:target.com intitle:admin

# Sensitive files
site:target.com filetype:pdf
site:target.com filetype:sql
site:target.com filetype:log
site:target.com filetype:env
site:target.com filetype:config

# Sensitive directories
site:target.com inurl:backup
site:target.com inurl:api
site:target.com inurl:test
site:target.com inurl:dev

# Credentials
site:target.com intext:password
site:target.com intext:username

# Error messages
site:target.com intext:"sql syntax"
site:target.com intext:"stack trace"

# Technology info
site:target.com inurl:phpinfo
site:target.com intext:"powered by"
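Typing the dorks above into the search box one at a time gets tedious. The sketch below turns each dork into a search URL that you can open in bulk, and it assumes ASCII-only queries. `urlencode` and `google_url` are hypothetical helper names. The encoding follows RFC 3986: unreserved characters pass through and everything else becomes `%XX`.

```shell
# Percent-encode an ASCII string (RFC 3986 unreserved characters pass through).
# Assumption: input is plain ASCII; multibyte UTF-8 is not handled.
urlencode() {
  s=$1
  out=''
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character
    s=$rest
    case $c in
      [A-Za-z0-9.~_-]) out="$out$c" ;;
      # POSIX printf: an argument starting with a quote yields the character's code
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: turn one dork into a Google search URL.
google_url() {
  printf 'https://www.google.com/search?q=%s\n' "$(urlencode "$1")"
}
```

Usage, with `dorks.txt` as a hypothetical file that holds the queries above: `grep -v -e '^#' -e '^$' dorks.txt | while read -r q; do google_url "$q"; done`.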

Bing

site:target.com
site:target.com filetype:pdf
site:target.com instreamset:(url title):login

Wayback Machine

https://web.archive.org/web/*/target.com
https://web.archive.org/web/*/subdomain.target.com
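The web UI works for browsing. The Wayback Machine also exposes a CDX API that returns archived URLs as plain text, which is easier to feed into the content discovery steps. The sketch below builds the query URL, and `wayback_cdx_url` is a hypothetical helper. The parameters it uses are standard CDX server options:

  • `matchType=domain` includes subdomains.
  • `fl=original` returns only the URL column.
  • `collapse=urlkey` drops duplicate captures of the same URL.

```shell
# Hypothetical helper: build a Wayback CDX API query for a domain and its subdomains.
wayback_cdx_url() {
  printf 'https://web.archive.org/cdx/search/cdx?url=%s&matchType=domain&fl=original&collapse=urlkey\n' "$1"
}
```

Usage: `curl -s "$(wayback_cdx_url target.com)" | sort -u > wayback-cdx.txt`. Large domains can return very long lists, so expect slow responses.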

Censys

"target.com"
host.services.dns.names: "target.com"
host.services.http.response.html_title: "App Name"
host.services.port = 443 and host.services.dns.names: "target.com"
host.services.software.product: "nginx"

Paste Sites

site:pastebin.com "target.com"
site:gist.github.com "target.com"
site:paste.ee "target.com"
site:rentry.org "target.com"

Cloud Storage

# Public buckets/blobs that leak via misconfiguration
site:s3.amazonaws.com "target.com"
site:blob.core.windows.net "target.com"
site:storage.googleapis.com "target.com"
site:digitaloceanspaces.com "target.com"

Project Trackers

# Tickets/boards sometimes leak credentials and internal info
site:trello.com "target.com"
site:atlassian.net "target.com"

Social Media

"target.com" site:x.com
"target.com" site:linkedin.com
"target.com" site:reddit.com

Root Domain Discovery

cd subdomains

# Domain -> IP
dig +short target.com

# IP -> ASN, name, prefix, country (Team Cymru). 8.8.8.8 and AS15169 below are example values
whois -h whois.cymru.com " -v 8.8.8.8"

# IP -> ASN + org info (richer, via ipinfo.io, no token needed)
curl -s https://ipinfo.io/8.8.8.8/json

# ASN -> announced prefixes
whois -h whois.radb.net -- "-i origin AS15169" | grep ^route

# Org name -> ASN: https://bgp.he.net/ (web)

# WHOIS lookup
whois target.com

# Historical WHOIS: https://whois-history.whoisxmlapi.com/
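The `grep ^route` output above still carries the `route:`/`route6:` labels. The small sketch below reduces RADb output to a unique prefix list, which you can review for scope before any active scanning. `route_prefixes` is a hypothetical helper.

```shell
# Hypothetical helper: keep only the prefix column of RADb "route:"/"route6:" objects.
# Reads whois output from stdin, or from files given as arguments.
route_prefixes() {
  awk '/^route6?:/ { print $2 }' "$@" | LC_ALL=C sort -u
}
```

Usage: `whois -h whois.radb.net -- "-i origin AS15169" | route_prefixes > prefixes.txt`.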

Subdomain Enumeration

domain='target.com'

# OSINT discovery - subfinder
subfinder -d "$domain" -o domain-ip-subfinder.txt -active -ip
awk -F"," '{print $1}' domain-ip-subfinder.txt > domain-subfinder.txt
awk -F"," '{print $2}' domain-ip-subfinder.txt > ip-subfinder.txt

# OSINT discovery - assetfinder
assetfinder --subs-only "$domain" > domain-assetfinder.txt

# OSINT discovery - certificate transparency (crt.sh)
curl -s "https://crt.sh/?q=%25.${domain}&output=json" \
  | jq -r '.[].name_value' \
  | tr ',' '\n' | sed 's/^\*\.//' \
  | grep -E "(\.|^)${domain}$" \
  | sort -u > domain-crt.txt

# Brute-force discovery
gobuster dns --domain "$domain" \
  -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-110000.txt \
  -o domain-ip-brute.txt --resolver 1.1.1.1 --delay 100ms --timeout 5s

sed 's/\x1B\[[0-9;]*[mK]//g' domain-ip-brute.txt | awk '{print $1}' | sort -u > domain-brute.txt
sed 's/\x1B\[[0-9;]*[mK]//g' domain-ip-brute.txt | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | sort -u > ip-brute.txt

# Merge results
cat domain-assetfinder.txt domain-brute.txt domain-crt.txt domain-subfinder.txt | sort -u > all-subdomains.txt
cat ip-brute.txt ip-subfinder.txt | sort -u > all-ip.txt

# Generate permutations
dnsgen all-subdomains.txt | sort -u > permutations.txt

# Resolve the original findings plus permutations and filter wildcards
# (puredns wraps massdns with wildcard heuristics)
cat all-subdomains.txt permutations.txt | sort -u > candidates.txt
puredns resolve candidates.txt \
  --resolvers resolvers.txt \
  --resolvers-trusted resolvers-trusted.txt \
  -w valid-subdomains.txt
echo "$domain" >> valid-subdomains.txt

Resolver files for puredns:

  • resolvers.txt - Public resolver list, grab from trickest/resolvers (auto-updated daily).
  • resolvers-trusted.txt - Small list of high-quality resolvers used for double-validation. A safe default:
    1.1.1.1
    8.8.8.8
    9.9.9.9
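
Merged subdomain lists often mix upper-case names, wildcard prefixes, trailing dots, and out-of-scope hosts that merely contain the domain string. The minimal sketch below is a strict scope filter that could run on `all-subdomains.txt` before permutations are generated. `normalize_subs` is a hypothetical helper. It compares suffixes as plain strings, so dots in the domain are never treated as regex wildcards.

```shell
# Hypothetical helper: normalize candidate names and keep only <domain> and its subdomains.
# Reads names from stdin, one per line or comma-separated.
normalize_subs() {
  domain=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  tr ',' '\n' \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e 's/^\*\.//' -e 's/\.$//' \
    | awk -v d="$domain" '
        # exact match on the apex domain
        $0 == d { print; next }
        # suffix match on "." d, requiring at least one label in front
        length($0) > length(d) + 1 && substr($0, length($0) - length(d)) == "." d { print }
      ' \
    | LC_ALL=C sort -u
}
```

Usage: `normalize_subs "$domain" < all-subdomains.txt > in-scope.txt`.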
    

Live Host Discovery

cd ../ports

# Probe subdomains
cat ../subdomains/valid-subdomains.txt | httpx -o http.txt -title -status-code -tech-detect

# Probe IPs
cat ../subdomains/all-ip.txt | httpx -o http-ip.txt -title -status-code -tech-detect

# Merge and screenshot
cat http-ip.txt http.txt | sort -u > http-all.txt
awk '{print $1}' http-all.txt | sort -u > http-all-clean.txt
gowitness scan file -f http-all-clean.txt
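Before working through screenshots, it can help to triage the httpx results by status code, for example 200s first and then 401/403s that deserve a second look. The sketch below assumes httpx lines look like `URL [status] [title] [tech]`, the shape produced by the `-status-code` flag above. `httpx_by_status` is a hypothetical helper. The ANSI-stripping `sed` mirrors the one used for gobuster and assumes GNU sed.

```shell
# Hypothetical helper: print "<status> <url>" for each httpx line, sorted by status.
# Assumes lines shaped like: https://host [200] [Title] [Tech]
httpx_by_status() {
  # Strip ANSI colour codes (GNU sed \x escape), then take the first [NNN] field
  sed 's/\x1B\[[0-9;]*[mK]//g' "$@" \
    | awk 'match($0, /\[[0-9][0-9][0-9]\]/) { print substr($0, RSTART + 1, 3), $1 }' \
    | LC_ALL=C sort -k1,1n -k2,2
}
```

Usage: `httpx_by_status http-all.txt | grep '^40[13] '` lists the auth-protected and forbidden endpoints.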

Port Scanning

# Passive scanning (Shodan via nrich)
while read -r ip; do echo "$ip" | nrich -; sleep 2; done < ../subdomains/all-ip.txt > shodan-ips.txt

# Active scanning - naabu
naabu -l ../subdomains/all-ip.txt -top-ports 100 -o naabu-ips.txt

# Active scanning - nmap (loop over unique IPs from naabu)
for target in $(awk -F: '{print $1}' naabu-ips.txt | sort -u); do
  nmap -p- -oN "${target}-allports.nmap" "$target" -vv
  ports=$(grep -E '^[0-9]+/tcp.*open' "${target}-allports.nmap" | awk -F/ '{print $1}' | paste -sd,)
  [ -n "$ports" ] && nmap -sV -sC --script vuln -p"$ports" -oN "${target}.nmap" "$target" -vv
done
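Running `nmap -p-` per host is thorough but slow. When time is short, you can send the ports naabu already found straight to service detection. The sketch below groups naabu's `ip:port` lines into one comma-separated port list per IP. `naabu_ports_by_ip` is a hypothetical helper. It assumes IPv4 addresses, because IPv6 addresses contain colons that break the `ip:port` split.

```shell
# Hypothetical helper: turn naabu "ip:port" lines into "ip port1,port2,..." lines.
naabu_ports_by_ip() {
  # Sort by IP, then numerically by port, dropping duplicate ip:port pairs
  LC_ALL=C sort -t: -k1,1 -k2,2n -u "$@" \
    | awk -F: '
        $1 != ip { if (ip != "") print ip, ports; ip = $1; ports = $2; next }
        { ports = ports "," $2 }
        END { if (ip != "") print ip, ports }
      '
}
```

Usage: `naabu_ports_by_ip naabu-ips.txt | while read -r ip ports; do nmap -sV -sC -p"$ports" -oN "${ip}-quick.nmap" "$ip" < /dev/null; done`. The `< /dev/null` is a precaution so nmap cannot read from the loop's input.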

Note:

  • nrich uses Shodan’s free InternetDB API. The sleep 2 keeps you under the rate limit for small to medium IP lists. For several hundred IPs or more, expect throttling: split the list into chunks or switch to the authenticated Shodan API with an API key.
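You can script the chunking with `split`. The sketch below assumes nrich accepts a file path as its argument, as the `nrich -` form for stdin suggests. `run_in_chunks` is a hypothetical helper. The chunk size and pause are values to tune; they are not known InternetDB limits.

```shell
# Hypothetical helper: run a command on fixed-size chunks of a line-based file,
# sleeping after each chunk to stay gentle on rate-limited APIs.
# Usage: run_in_chunks <lines-per-chunk> <pause-seconds> <command> <input-file>
run_in_chunks() {
  size=$1; pause=$2; cmd=$3; input=$4
  tmp=$(mktemp -d) || return 1
  # split names chunks chunk-aa, chunk-ab, ... so the glob below keeps input order
  split -l "$size" "$input" "$tmp/chunk-"
  for chunk in "$tmp"/chunk-*; do
    [ -e "$chunk" ] || continue   # empty input: no chunks were created
    $cmd "$chunk"                 # unquoted on purpose so "cmd --flag" splits into words
    sleep "$pause"
  done
  rm -rf "$tmp"
}
```

Usage: `run_in_chunks 50 60 nrich ../subdomains/all-ip.txt > shodan-ips.txt`.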

WAF Detection

cd ../waf
wafw00f https://target.com/ | tee waf.txt

Finding the origin IP behind a WAF:

  • Check DNS history: SecurityTrails
  • Compare screenshots from Live Host Discovery: a raw IP serving the same page as target.com is a likely origin candidate

If found, test access:

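# Replace 203.0.113.50 with the actual origin IP
echo "203.0.113.50 target.com" | sudo tee -a /etc/hosts
curl -i http://target.com
# Remove the /etc/hosts entry afterwards so later tests go through the WAF again
# Alternative without touching /etc/hosts: pin the name to the origin for one request (-k: origin cert may not match)
curl -ik --resolve target.com:443:203.0.113.50 https://target.com/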
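Screenshots are one way to compare. Another quick check is whether the page title served by the candidate origin matches the title served through the WAF. The sketch below works on saved HTML files. `html_title` and `same_title` are hypothetical helpers. A matching title is only a hint and not proof that the IP is the origin, because default and error pages can match too.

```shell
# Hypothetical helper: print the text of the first <title> element in an HTML file,
# with whitespace collapsed and trimmed.
html_title() {
  tr '\n' ' ' < "$1" | awk '{
    start = index(tolower($0), "<title")
    if (start == 0) exit
    rest = substr($0, start)
    rest = substr(rest, index(rest, ">") + 1)   # skip past the opening tag
    stop = index(tolower(rest), "</title")
    if (stop == 0) exit
    t = substr(rest, 1, stop - 1)
    gsub(/[ \t\r]+/, " ", t)
    gsub(/^ | $/, "", t)
    print t
  }'
}

# Hypothetical helper: succeed if both files have the same non-empty title.
same_title() {
  a=$(html_title "$1"); b=$(html_title "$2")
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Usage: save both responses with `curl -sk https://target.com/ -o via-waf.html` and `curl -sk --resolve target.com:443:203.0.113.50 https://target.com/ -o via-origin.html`, then run `same_title via-waf.html via-origin.html && echo "titles match"`.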

SSL/TLS Analysis

cd ../ssl

# Extract HTTPS hostnames from live host discovery
grep '^https://' ../ports/http-all-clean.txt | sed 's|https://||;s|[/:].*||' | sort -u > https-hosts.txt

# Quick SSL scan via nmap NSE (cert info, cipher enum, known vulns)
nmap --script "ssl*" -p443 -iL https-hosts.txt -oN ssl.nmap

# Comprehensive SSL/TLS analysis per host
# </dev/null is a precaution so testssl.sh cannot consume the host list on stdin
while read -r host; do
  testssl.sh --quiet --color 0 "$host" < /dev/null > "testssl-${host}.txt"
done < https-hosts.txt
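The extraction above drops any port, so HTTPS services on non-standard ports such as 8443 only get tested on 443. testssl.sh also accepts `host:port` targets, so a variant that keeps the port could look like the sketch below. `https_targets` is a hypothetical helper. It assumes hostnames or IPv4 addresses, because bracketed IPv6 URLs are not handled. nmap's `-iL` does not take `host:port` entries, so keep the original list for the nmap step.

```shell
# Hypothetical helper: reduce https:// URLs to unique host:port targets (443 by default).
https_targets() {
  grep '^https://' "$@" \
    | sed -e 's|^https://||' -e 's|[/?#].*||' \
    | awk -F: '{ print (NF > 1 ? $0 : $0 ":443") }' \
    | LC_ALL=C sort -u
}
```

Usage: `https_targets ../ports/http-all-clean.txt > https-targets.txt`, then run the testssl.sh loop over `https-targets.txt`.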

Content Discovery

cd ../dirs

# Passive discovery
echo "target.com" | gau > gau.txt
echo "target.com" | waybackurls > waybackurls.txt
cat gau.txt waybackurls.txt | sort -u > osint-url.txt

# Active discovery - dirsearch
dirsearch -u "https://target.com/" -o dirsearch.txt
dirsearch -r -u "https://target.com/" -o dirsearch-recursive.txt

# Active discovery - gobuster
gobuster dir -u https://target.com/ -w /usr/share/dirb/wordlists/common.txt -o gobuster-common.txt
gobuster dir -u https://target.com/ -w /usr/share/seclists/Discovery/Web-Content/raft-large-directories.txt -o gobuster-raftdir.txt

# Crawling - katana (-jc parses JS endpoints, -kf all includes robots.txt + sitemap.xml)
katana -d 4 -jc -kf all -u https://target.com -o katana.txt

# Extract parameters
cat osint-url.txt | grep "="
grep -oP '(?<=\?).*' osint-url.txt | tr '&' '\n' | cut -d= -f1 | sort -u > parameters.txt

# Parameter discovery (active)
arjun -u https://target.com
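The `grep -P` lookbehind in the parameter extraction is a GNU grep extension, so it is not available in every grep. The portable sketch below does the same extraction and also drops `#fragment` parts and empty names. `extract_params` is a hypothetical helper.

```shell
# Hypothetical helper: list unique query-parameter names from URLs (stdin or files).
extract_params() {
  grep -o '?[^#]*' "$@" \
    | sed 's/^?//' \
    | tr '&' '\n' \
    | cut -d= -f1 \
    | grep -v '^$' \
    | LC_ALL=C sort -u
}
```

Usage: `extract_params osint-url.txt katana.txt > parameters.txt` merges parameters from the passive and crawled URL lists.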

Browser Extensions

Useful for spot-checking the target while browsing manually.

Technology Detection:

  • Wappalyzer - Identifies frameworks, CMS, and server software

Endpoint Discovery:

Secret Detection:

Conclusion

We’ve walked through the recon phase from passive OSINT through asset enumeration, live host probing, and defense fingerprinting to active content discovery. None of these tools alone gives the full picture. The real work is correlating their outputs and following up manually on what looks interesting.

This post is licensed under CC BY 4.0 by the author.