
How does See-Link work?

Bharat Sharma edited this page Dec 11, 2021 · 4 revisions

See-A-Link!

See-Link first extracts URLs from the input text and uses the first one for all further processing. It then visits that URL and scrapes the page data, just as a normal user would see it after opening the page in a browser. It uses Puppeteer for this and, by default, opens the webpage in a headless Chrome/Chromium browser. To get past some of the restrictions websites use to detect bots, See-Link sends Facebook's user agent in its request headers.
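As a rough illustration of the first step, here is a minimal sketch of pulling the first URL out of free text. The regular expression, the trailing-punctuation rule, and the name `extractFirstUrl` are assumptions for this example, not See-Link's actual implementation.

```typescript
// Assumption: a URL is "http(s)://" followed by non-whitespace characters.
// See-Link's own extraction logic may differ.
const URL_PATTERN = /https?:\/\/[^\s<>"']+/gi;

/** Hypothetical helper: returns the first URL found in `text`, or null. */
function extractFirstUrl(text: string): string | null {
  const matches = text.match(URL_PATTERN);
  if (!matches) return null;
  // Drop trailing punctuation that usually ends the sentence, not the URL.
  const candidate = matches[0].replace(/[.,;:!?)]+$/, "");
  try {
    // The standard URL parser rejects anything that is not an absolute URL.
    return new URL(candidate).href;
  } catch {
    return null;
  }
}
```

For example, `extractFirstUrl("Read https://example.com/post. Also http://other.org")` picks `https://example.com/post` and ignores the second link, which matches the "first URL only" behavior described above.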

See-Link gives first priority to Open Graph markup, followed by Twitter Card markup and then other meta tags. This priority order is broadly the same throughout the package, but a few fields use a different approach, as described below.

Metadata Provided by See-Link

The following fields make up the metadata returned by See-Link. For each field, the meta markups/tags are listed in the priority order See-Link follows when scraping it.

Title

  • og:title
  • twitter:title
  • Document title tag
  • First h1 tag
  • First h2 tag

Description

  • og:description
  • twitter:description
  • Description meta tag
  • First p tag
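Both lists above describe a simple fallback chain: take the first source that has a non-empty value. Here is a minimal sketch, assuming the tags have already been scraped into a plain object. The `ScrapedPage` shape and the function names are hypothetical, not See-Link's API.

```typescript
/** Hypothetical shape of already-scraped page data (not See-Link's real type). */
interface ScrapedPage {
  meta: Record<string, string>; // keyed by property/name, e.g. "og:title", "description"
  documentTitle?: string; // text of the <title> element
  h1?: string[]; // text of <h1> elements, in document order
  h2?: string[];
  p?: string[];
}

/** Returns the first candidate that is non-empty after trimming, or null. */
function firstNonEmpty(...candidates: (string | undefined)[]): string | null {
  for (const candidate of candidates) {
    const trimmed = candidate?.trim();
    if (trimmed) return trimmed;
  }
  return null;
}

function resolveTitle(page: ScrapedPage): string | null {
  return firstNonEmpty(
    page.meta["og:title"],
    page.meta["twitter:title"],
    page.documentTitle,
    page.h1?.[0], // first h1 tag
    page.h2?.[0], // first h2 tag
  );
}

function resolveDescription(page: ScrapedPage): string | null {
  return firstNonEmpty(
    page.meta["og:description"],
    page.meta["twitter:description"],
    page.meta["description"], // <meta name="description">
    page.p?.[0], // first p tag
  );
}
```

Keeping the order in a single argument list makes the priority easy to read and change.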

Image

See-Link checks the og:image and twitter:image markups. If neither is present, it looks for a link tag with the attribute rel="image_src". However, many websites provide none of these markups. In that case, See-Link parses images from the page's body.

The problem with this approach is deciding which image to use. A user asked a similar question on Quora:

How does Facebook determine which images to show as thumbnails when posting a link?

A Facebook employee answered the question (in 2010):

On the client-side, the candidate images are filtered by javascript that removes all images less than 50 pixels in height or width and all images with a ratio of the longest dimension to the shortest dimension greater than 3:1. The filtered images are then sorted by area and users are given a selection of multiple images that exist.

See-Link uses the same strategy to filter the candidate images and returns the first image that remains after filtering.
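Here is a minimal sketch of the image selection described above: markup first, then the filtered body images. It assumes each body image's rendered width and height are already known, for example measured in the headless browser. `ImageCandidate` and `resolveImage` are hypothetical names. The quoted answer also sorts candidates by area, while See-Link is described as returning the first image after filtering, so sorting is an opt-in flag here.

```typescript
/** Hypothetical candidate image with its rendered size in pixels. */
interface ImageCandidate {
  src: string;
  width: number;
  height: number;
}

const MIN_SIDE_PX = 50; // drop images under 50 px in width or height
const MAX_ASPECT_RATIO = 3; // drop images whose longest:shortest side exceeds 3:1

function passesFilter(img: ImageCandidate): boolean {
  if (img.width < MIN_SIDE_PX || img.height < MIN_SIDE_PX) return false;
  const longest = Math.max(img.width, img.height);
  const shortest = Math.min(img.width, img.height); // >= 50 here, so no division by zero
  return longest / shortest <= MAX_ASPECT_RATIO;
}

function resolveImage(
  meta: Record<string, string>,
  imageSrcLink: string | undefined, // href of <link rel="image_src">, if any
  bodyImages: ImageCandidate[],
  sortByArea = false,
): string | null {
  // Markup always wins over images parsed from the body.
  const fromMarkup = meta["og:image"] || meta["twitter:image"] || imageSrcLink;
  if (fromMarkup) return fromMarkup;
  const filtered = bodyImages.filter(passesFilter); // new array; input is untouched
  if (sortByArea) {
    filtered.sort((a, b) => b.width * b.height - a.width * a.height); // largest first
  }
  return filtered.length > 0 ? filtered[0].src : null;
}
```

A 900×100 banner fails the 3:1 rule and a 16×16 icon fails the 50 px rule, so neither can become the preview image.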

Domain Name

  • Link tag with rel='canonical' attribute
  • og:url

If neither is found, it falls back to the page's URL and returns its domain name.
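Here is a minimal sketch of that fallback using the standard `URL` parser. The function name and parameters are hypothetical. Resolving a relative canonical link against the page URL is an assumption added for robustness. The sketch returns the full hostname (for example `www.example.com`); this page does not say whether See-Link strips a `www.` prefix.

```typescript
/** Hypothetical helper: canonical link -> og:url -> page URL, returning the hostname. */
function resolveDomain(
  canonicalHref: string | undefined, // href of <link rel="canonical">, if any
  meta: Record<string, string>,
  pageUrl: string,
): string {
  for (const candidate of [canonicalHref, meta["og:url"]]) {
    if (!candidate) continue;
    try {
      // Assumption: relative values are resolved against the page URL.
      return new URL(candidate, pageUrl).hostname;
    } catch {
      // Unparsable value: try the next source.
    }
  }
  return new URL(pageUrl).hostname;
}
```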

Theme-Color

This value comes from the meta tag with the attribute name="theme-color". Designers can use it to give the link preview colors that match the site. Because many sites don't provide it, See-Link falls back to returning the page's dominant color when this metadata is missing.

See-Link uses color-thief to extract the dominant color from the page. This fallback is controlled by the getDominantThemeColor option, which is set to true by default.
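To show what the fallback does, here is a minimal sketch that prefers the theme-color tag and otherwise computes a dominant color from raw RGBA pixel data. The dominant-color step is a simple stand-in: a histogram over coarsely quantized colors. It is not color-thief's algorithm, which uses median-cut quantization, so results can differ. Where the pixels come from (for example, a decoded page screenshot) is an assumption outside this sketch. The function names are hypothetical, and only `getDominantThemeColor` comes from the text above.

```typescript
type Pixels = Uint8ClampedArray | number[]; // flat RGBA bytes: r, g, b, a, r, g, b, a, ...

interface Bucket {
  count: number;
  r: number;
  g: number;
  b: number;
}

/**
 * Simple stand-in for color-thief (NOT its median-cut algorithm): quantizes each
 * channel to 16 levels, finds the most common bucket, and returns the average
 * color of the pixels in that bucket as a hex string.
 */
function dominantColor(rgba: Pixels): string | null {
  const buckets = new Map<number, Bucket>();
  let bestKey = -1;
  let bestCount = 0;
  for (let i = 0; i + 3 < rgba.length; i += 4) {
    if (rgba[i + 3] < 125) continue; // skip mostly transparent pixels (threshold is an assumption)
    const r = rgba[i];
    const g = rgba[i + 1];
    const b = rgba[i + 2];
    const key = ((r >> 4) << 8) | ((g >> 4) << 4) | (b >> 4); // 4 bits per channel
    const bucket = buckets.get(key) ?? { count: 0, r: 0, g: 0, b: 0 };
    bucket.count += 1;
    bucket.r += r;
    bucket.g += g;
    bucket.b += b;
    buckets.set(key, bucket);
    if (bucket.count > bestCount) {
      bestCount = bucket.count;
      bestKey = key;
    }
  }
  const best = buckets.get(bestKey);
  if (!best) return null; // no opaque pixels
  const toHex = (sum: number) =>
    Math.round(sum / best.count).toString(16).padStart(2, "0");
  return `#${toHex(best.r)}${toHex(best.g)}${toHex(best.b)}`;
}

/** Prefer <meta name="theme-color">; otherwise optionally fall back to the dominant color. */
function resolveThemeColor(
  meta: Record<string, string>,
  pixels: Pixels,
  getDominantThemeColor = true,
): string | null {
  const declared = meta["theme-color"]?.trim();
  if (declared) return declared;
  return getDominantThemeColor ? dominantColor(pixels) : null;
}
```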

Video

  • og:video
  • twitter:player
  • Link tag with rel='video_src' attribute
  • The page URL itself, if it points to video content
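The video lookup follows the same fallback pattern. This sketch approximates "points to video content" with a `video/*` Content-Type or a common video file extension. That heuristic, the extension list, and the function names are assumptions, not See-Link's documented behavior.

```typescript
// Assumption: a rough, hypothetical way to decide that a URL is itself a video.
const VIDEO_EXTENSIONS = [".mp4", ".webm", ".ogv", ".mov", ".m4v"];

function looksLikeVideo(pageUrl: string, contentType?: string): boolean {
  if (contentType && contentType.toLowerCase().startsWith("video/")) return true;
  const path = new URL(pageUrl).pathname.toLowerCase();
  return VIDEO_EXTENSIONS.some((ext) => path.endsWith(ext));
}

function resolveVideo(
  meta: Record<string, string>,
  videoSrcLink: string | undefined, // href of <link rel="video_src">, if any
  pageUrl: string,
  contentType?: string, // response Content-Type header, if known
): string | null {
  const fromMarkup = meta["og:video"] || meta["twitter:player"] || videoSrcLink;
  if (fromMarkup) return fromMarkup;
  return looksLikeVideo(pageUrl, contentType) ? pageUrl : null;
}
```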

Favicon

It checks link tags with the following attributes, in this order:

  • rel='icon'
  • rel='shortcut icon'
  • rel='apple-touch-icon'

Media Type

This is the type of the web page, taken from the og:type meta markup.