msih-apify/Social-Media-and-Contact-Info-Extractor

Features

Our free Contact Details Scraper can crawl any website and extract the following contact information for individuals listed on the website: email addresses, phone numbers, and LinkedIn, Twitter, Instagram, and Facebook profiles.

Why scrape contact information?

Scraping contact details can give you a fast way to get lead generation data for your marketing and sales teams. Harvesting contact details can help you populate and maintain an up-to-date database of contacts, leads, and prospective customers. Instead of manually visiting web pages and copy-pasting names and numbers, you can extract the data and rapidly sort it in spreadsheets or feed it directly into your existing workflow.

Check out our industry pages for use cases and more ideas on how you can take advantage of web scraping.

Tutorial

Read our step-by-step guide to using Contact Details Scraper.

Input Configuration

The actor offers several input options to let you specify which pages will be crawled:

  • Start URLs - Lets you add a list of URLs of web pages where the scraper should start. You can enter multiple URLs, upload a text file with URLs, or even use a Google Sheets document.
  • Maximum link depth - Specifies how many links away from the Start URLs the actor will crawl. If zero, the actor ignores the links and only crawls the Start URLs.
  • Stay within domain - If enabled, the actor will only follow links that are on the same domain as the referring page. For example, if the setting is enabled and the actor finds a link on http://www.example.com/some-page to http://www.another-domain.com/, it will not crawl the second page, because www.example.com is not the same as www.another-domain.com.

The actor also accepts additional input options that let you specify proxy servers, limit the number of pages, etc.
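
The way Maximum link depth and Stay within domain interact can be sketched as a small link filter. This is only an illustration, not the actor's actual code: the function name `shouldFollowLink` and the option names `maxDepth` and `sameDomain` are assumptions made for this example, and the domain check is simplified to a comparison of full hostnames.

```javascript
// Hypothetical helper: decides whether a link found on a page should be crawled.
// `maxDepth` and `sameDomain` stand in for the "Maximum link depth" and
// "Stay within domain" options; the names are assumptions for this sketch.
function shouldFollowLink({ referrerUrl, linkUrl, referrerDepth, maxDepth, sameDomain }) {
    // A link found on a page at depth d leads to a page at depth d + 1.
    // With maxDepth = 0, no links are followed and only Start URLs are crawled.
    if (referrerDepth + 1 > maxDepth) return false;

    let referrer;
    let link;
    try {
        referrer = new URL(referrerUrl);
        link = new URL(linkUrl, referrerUrl); // also resolves relative links
    } catch (err) {
        return false; // skip malformed URLs
    }

    // Only web pages can be crawled, not mailto:, tel:, javascript:, etc.
    if (link.protocol !== 'http:' && link.protocol !== 'https:') return false;

    // Simplification: compare full hostnames of the link and the referring page.
    if (sameDomain && link.hostname !== referrer.hostname) return false;

    return true;
}
```

With `sameDomain` enabled, a link from http://www.example.com/some-page to http://www.another-domain.com/ is rejected, while a relative link such as /contact on the same page is accepted as long as the depth limit allows it.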

Results

The actor stores its results in the default dataset associated with the actor run. You can then download the results in formats such as JSON, HTML, CSV, XML, or Excel. For each page crawled, the following contact information is extracted (examples shown):

  • Emails
  • Phone numbers - These are extracted from phone links in HTML (e.g. <a href='tel://123456789'>phone</a>).
    123456789
    +123456789
    00123456789
    
  • Uncertain phone numbers - These are extracted from the plain text of the web page using a number of regular expressions. Note that this approach can generate false positives.
    +123.456.7890
    123456789
    123-456-789
    
  • LinkedIn profiles
    https://www.linkedin.com/in/alan-turing
    en.linkedin.com/in/alan-turing
    linkedin.com/in/alan-turing
    
  • Twitter profiles
    https://www.twitter.com/apify
    twitter.com/apify
    
  • Instagram profiles
    https://www.instagram.com/old_prague
    www.instagram.com/old_prague/
    instagr.am/old_prague
    
  • Facebook profiles or pages
    https://www.facebook.com/apifytech
    facebook.com/apifytech
    fb.com/apifytech
    https://www.facebook.com/profile.php?id=123456789
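
The difference between phone numbers and uncertain phone numbers can be illustrated with a rough sketch. The two functions below are hypothetical and use deliberately simple regular expressions written for this example; they are not the patterns the actor uses. The first reads numbers only from phone links in the HTML, the second scans plain text and therefore can return false positives.

```javascript
// Hypothetical sketch: "certain" phone numbers come from tel: links in the HTML.
// Accepts both href="tel:..." and href='tel://...' forms.
function phonesFromTelLinks(html) {
    const regex = /href\s*=\s*["']tel:(?:\/\/)?([^"']+)["']/gi;
    const phones = [...html.matchAll(regex)].map((match) => match[1].trim());
    return [...new Set(phones)]; // remove duplicates, keep first-seen order
}

// Hypothetical sketch: "uncertain" phone numbers come from plain text.
// Illustrative pattern only: an optional "+", then 7 to 15 digits that may be
// separated by a single space, dot, or dash.
function phonesUncertainFromText(text) {
    const regex = /\+?\d(?:[ .-]?\d){6,14}/g;
    return [...new Set(text.match(regex) || [])];
}
```

Because the text-based pattern matches any sufficiently long run of digits, a string such as "Invoice 20210315" yields "20210315" as an uncertain phone number, which is why those results are reported separately.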
    

The results also contain the URL of the web page, its domain, the referring URL (if the page was linked from another page), and the depth (how many links away from the Start URLs the page was found).

For each page crawled, the resulting dataset contains a single record, which looks like this (in JSON format):

{
  "url": "http://www.robertlmyers.com/index.html",
  "domain": "robertlmyers.com",
  "depth": 2,
  "referrerUrl": "http://www.robertlmyers.com",
  "emails": [
    "[email protected]"
  ],
  "phones": [],
  "phonesUncertain": [
    "717.393.3643"
  ],
  "linkedIns": [],
  "twitters": [],
  "instagrams": [],
  "facebooks": [
    "https://www.facebook.com/robertlmyers/"
  ]
}
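
Since the dataset holds one record per page, a common follow-up step is to combine the records of each domain into a single deduplicated contact list. The sketch below assumes records shaped like the example above; the function name `mergeContactsByDomain` is hypothetical and not part of the actor or the Apify SDK.

```javascript
// Contact fields as they appear in the example record above.
const CONTACT_FIELDS = [
    'emails', 'phones', 'phonesUncertain',
    'linkedIns', 'twitters', 'instagrams', 'facebooks',
];

// Hypothetical helper: groups page records by `domain` and merges their
// contact arrays, dropping duplicate values.
function mergeContactsByDomain(records) {
    const byDomain = new Map();
    for (const record of records) {
        if (!byDomain.has(record.domain)) {
            // One Set per contact field, so duplicates are ignored on insert.
            byDomain.set(record.domain, Object.fromEntries(
                CONTACT_FIELDS.map((field) => [field, new Set()]),
            ));
        }
        const merged = byDomain.get(record.domain);
        for (const field of CONTACT_FIELDS) {
            for (const value of record[field] || []) merged[field].add(value);
        }
    }

    // Convert the Sets back to plain arrays for easy export.
    const result = {};
    for (const [domain, merged] of byDomain) {
        result[domain] = Object.fromEntries(
            CONTACT_FIELDS.map((field) => [field, [...merged[field]]]),
        );
    }
    return result;
}
```

The input would typically be the array of items downloaded from the dataset in JSON format, and the output is a plain object keyed by domain.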

Personal data

You should be aware that your results might contain personal data. Personal data is protected by GDPR in the European Union and by other regulations around the world. You should not scrape personal data unless you have a legitimate reason to do so. If you're unsure whether your reason is legitimate, consult your lawyers. You can also read our blog post on the legality of web scraping.

Notes

This actor was built using utils.social from the Apify SDK. If you need more control over the crawling and data extraction process, you can build a new actor with the Apify SDK relatively easily. For more details on how to build actors, see our documentation.
