Link analysis

Notes:

  • How can link analysis be used to identify relevant pages?

Assumption

web graph

  • In-link = endorsement of the target page
  • Similar to citation analysis of academic papers
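
A minimal sketch of the endorsement assumption: count the in-links of each page in a small edge list. The graph `links` and the helper `in_degree` are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical toy web graph: (source page, target page) pairs.
links = [("a", "c"), ("b", "c"), ("c", "a"), ("d", "c"), ("d", "a")]

def in_degree(edges):
    """Count in-links per page; each in-link is treated as one endorsement."""
    return Counter(target for _, target in edges)

counts = in_degree(links)
# Pages sorted by number of in-links, most endorsed first.
ranking = [page for page, _ in counts.most_common()]
```

Pages that nobody links to (here `b` and `d`) do not appear in `ranking`; `counts` reports 0 for them.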

Notes:


Link info

Notes:


PageRank

  • Rank pages with many in-links higher
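  • PageRank ≈ in-degree (the two are strongly correlated)
  • But harder to manipulate, because a page's score depends on the scores of the pages linking to it
  • One in-link from a high-PageRank page can be worth more than many in-links from low-PageRank pages
  • PageRank = probability that a random surfer ends up on the page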
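
The propagation idea is commonly written as the recursive formula below. The symbols are assumptions not used elsewhere in these notes: $d$ is the probability of following a link (often 0.85), $N$ the number of pages, $\text{In}(p)$ the set of pages linking to $p$, and $L(q)$ the number of out-links of $q$. Pages without out-links need extra handling that this form leaves out.

$$\text{pagerank}(p) = \frac{1 - d}{N} + d \sum_{q \in \text{In}(p)} \frac{\text{pagerank}(q)}{L(q)}$$

Each page $q$ splits its own score evenly over its out-links, so a single in-link from a high-scoring page with few out-links can outweigh many in-links from low-scoring pages.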

PageRank vs In-Degree


PageRank calculation

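  1. Assume a random surfer
  2. Let the surfer randomly walk the web graph
    • No out-links → teleport to a random page
    • At every step, 15% chance that the surfer opens a random page instead of following a link
  3. Count how often each page is visited; the share of visits is its PageRank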

$$\sum_{p} \text{pagerank}(p) = 1$$
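
The random-surfer procedure can be sketched as a small simulation. This is an illustration under stated assumptions, not a production method: the graph `toy_graph`, the function name `simulate_pagerank`, and the step count are hypothetical, and real systems compute the scores iteratively rather than by sampling.

```python
import random

def simulate_pagerank(graph, steps=200_000, teleport=0.15, seed=0):
    """Estimate PageRank as the share of visits of a random surfer."""
    rng = random.Random(seed)
    pages = sorted(graph)
    visits = dict.fromkeys(pages, 0)
    page = rng.choice(pages)
    for _ in range(steps):
        visits[page] += 1
        out_links = graph[page]
        # Teleport on a dead end, or with 15% probability at any step.
        if not out_links or rng.random() < teleport:
            page = rng.choice(pages)
        else:
            page = rng.choice(out_links)
    # Dividing by the number of steps makes the scores sum to 1.
    return {p: count / steps for p, count in visits.items()}

# Hypothetical toy graph: page -> list of pages it links to.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"], "e": []}
ranks = simulate_pagerank(toy_graph)
```

In this toy graph, `c` collects in-links from `a`, `b`, and `d` and should end up with the largest share of visits, while `d` and `e` are reached only by teleporting.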


PageRank visualization

Notes:


Reasonable surfer

  • Clicks some links more often than others
  • Prefers links in the main content over sidebar / footer links
  • Prefers links whose anchor text relates to the query / user intent
  • Avoids ads
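
One way to picture the reasonable surfer is to replace the uniform choice among out-links with a weighted one. The sketch below is only an illustration: the placement labels, the weights in `PLACEMENT_WEIGHT`, and the helper `click_probabilities` are hypothetical, and real weights would have to be estimated from data.

```python
# Hypothetical weights per link placement (assumed values, not measured).
PLACEMENT_WEIGHT = {"main": 1.0, "sidebar": 0.3, "footer": 0.1, "ad": 0.0}

def click_probabilities(out_links):
    """Turn (target, placement) pairs into a click probability per target."""
    weights = {}
    for target, placement in out_links:
        weights[target] = weights.get(target, 0.0) + PLACEMENT_WEIGHT[placement]
    total = sum(weights.values())
    if total == 0:
        return {}  # no clickable link: the surfer would teleport instead
    return {target: w / total for target, w in weights.items()}

# Hypothetical out-links of a single page.
page_links = [("article", "main"), ("about", "footer"),
              ("related", "sidebar"), ("promo", "ad")]
probs = click_probabilities(page_links)
```

With these assumed weights, the main-content link receives most of the probability mass and the ad link none, so the page passes on its score unevenly instead of splitting it equally.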

Notes:

  • How can the random surfer be improved to provide more realistic results?