You can verify whether a web crawler accessing your server really is who it claims to be. This is useful if you're concerned that spammers or other troublemakers are accessing your site while claiming to be a known crawler. Crawlers do not publish public lists of IP addresses for you to whitelist, because those IP address ranges can change and break any system that has hard-coded them, so you must verify the caller with a DNS lookup, as described next.
Example flow to verify Googlebot as the caller
import { verify } from "reverse-dns-lookup";
import requestIP from "request-ip";
// `request` is the incoming HTTP request object (e.g. from Express or Node's http module)
const clientIp = requestIP.getClientIp(request);
const isGooglebotServer = await verify(clientIp, "google.com", "googlebot.com");
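A result like this can be used to gate suspicious traffic. The sketch below assumes an Express app and a simple User-Agent check; the middleware shape, the regex, and the 403 response are illustrative only, not part of the library.

import express from "express";
import requestIP from "request-ip";
import { verify } from "reverse-dns-lookup";

const app = express();

// Illustrative middleware: reject requests that claim to be Googlebot but do not verify.
app.use(async (request, response, next) => {
  const claimsGooglebot = /Googlebot/i.test(request.get("User-Agent") || "");
  if (!claimsGooglebot) return next();

  const clientIp = requestIP.getClientIp(request);
  const isGooglebotServer = await verify(clientIp, "google.com", "googlebot.com");
  if (!isGooglebotServer) return response.status(403).send("Forbidden");
  next();
});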
- Run a reverse DNS lookup on the accessing IP address from your logs.
- Verify that the resulting host name matches one of the supplied domain names.
- Run a forward DNS lookup on the host name retrieved in the first step.
- Verify that it resolves back to the original accessing IP address (see the sketch after this list).
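Here is a minimal, hand-rolled sketch of those four steps using Node's built-in dns/promises module. It illustrates the flow only and is not the library's actual implementation; the helper name verifyCrawlerIp and the suffix-matching rule are assumptions.

import { reverse, lookup } from "node:dns/promises";

// Hypothetical helper walking through the four verification steps above.
async function verifyCrawlerIp(ip, ...domains) {
  // 1. Reverse DNS lookup on the accessing IP address.
  const hostnames = await reverse(ip).catch(() => []);
  for (const hostname of hostnames) {
    // 2. The host name must end with one of the supplied domain names.
    if (!domains.some((domain) => hostname.endsWith(domain))) continue;
    // 3. Forward DNS lookup on the retrieved host name.
    const addresses = await lookup(hostname, { all: true }).catch(() => []);
    // 4. The forward lookup must resolve back to the original accessing IP address.
    if (addresses.some(({ address }) => address === ip)) return true;
  }
  return false;
}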
Some popular domains
const crawler_domains = [
".google.com",
".googlebot.com",
"search.msn.com", // Bing
".applebot.apple.com",
".twttr.com", // Twitter
".crawl.baidu.com", // Baidu crawler
];
const isCrawlerServer = await verify(clientIp, ...crawler_domains);
reverse-dns-lookup 66.249.66.1 google.com googlebot.com
| Result | Output | Exit code |
|---|---|---|
| Checks out | 66.249.66.1 checks up with google.com, googlebot.com | 0 |
| Does not check out | 1.1.1.1 does not check up with google.com, googlebot.com. | 1 |
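Because the result is reported through the exit code, the CLI is easy to call from other tooling. Below is a small sketch using Node's child_process; it assumes the reverse-dns-lookup binary is resolvable on your PATH (for example installed globally or run from an npm script).

import { execFile } from "node:child_process";

// A non-zero exit code (the IP did not check out) surfaces here as an error.
execFile("reverse-dns-lookup", ["66.249.66.1", "google.com", "googlebot.com"], (error, stdout) => {
  console.log(stdout.trim());
  console.log(error ? "IP could not be verified" : "IP verified");
});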