🐘 PageRank implementation in PHP with extendable features (PHP 7.4)

PHP-Science/PageRank

PageRank

This source code is an OOP implementation of the PageRank algorithm.

About

This implementation is based on Larry Page's PageRank algorithm. It weights the connections between the nodes in a graph. In theory, the nodes can be websites, words, people, etc. As the number of nodes grows, the algorithm becomes slower. To scale to millions of nodes, the accuracy of the calculation can be limited, and the long-running calculation can be scheduled in batches. The implementation uses the Strategy OOP pattern to make this possible.
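
As a rough illustration of how a Strategy can hide where the nodes live and how they are batched, here is a minimal sketch. The interface RankStoreSketch, its methods, and the function halveAllRanks are hypothetical names made up for this example. They are not the package's NodeDataSourceStrategyInterface, whose actual methods may differ.

```php
<?php
declare(strict_types=1);

// Hypothetical interface for illustration only; the package's real
// NodeDataSourceStrategyInterface may declare different methods.
interface RankStoreSketch
{
    /** @return iterable<int, array<string, float>> batches of node id => rank */
    public function batches(int $size): iterable;

    public function updateRank(string $id, float $rank): void;
}

// One possible strategy: keep everything in memory.
final class InMemoryRankStoreSketch implements RankStoreSketch
{
    /** @var array<string, float> */
    private array $ranks;

    public function __construct(array $ranks)
    {
        $this->ranks = $ranks;
    }

    public function batches(int $size): iterable
    {
        // preserve_keys = true keeps the node ids as array keys.
        yield from array_chunk($this->ranks, $size, true);
    }

    public function updateRank(string $id, float $rank): void
    {
        $this->ranks[$id] = $rank;
    }

    public function all(): array
    {
        return $this->ranks;
    }
}

// Stand-in for one pass of a ranking algorithm. It depends only on the
// interface, so the storage strategy can be swapped without changing it.
function halveAllRanks(RankStoreSketch $store, int $batchSize): int
{
    $batchCount = 0;
    foreach ($store->batches($batchSize) as $batch) {
        foreach ($batch as $id => $rank) {
            $store->updateRank((string) $id, $rank / 2);
        }
        $batchCount++;
    }

    return $batchCount;
}
```

A database-backed class implementing the same hypothetical interface could load each batch with a paged query instead, and halveAllRanks would not need to change.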

Workflow

  • It calculates the initial ranking values. At the first iteration, all nodes have the same rank.
  • It iterates over the nodes using the power method (power iteration), repeating until it reaches the maximum number of iterations.
  • The iteration stops earlier, however, if the ranks are accurate enough before the maximum number of iterations is reached.
  • The accuracy is measured against the float epsilon constant.
  • At the end, the algorithm normalizes the ranks between 0 and 1 and then scales them between 1 and 10. The scaling range is configurable.
  • Getting, setting, and updating the nodes in the underlying resource is the responsibility of the NodeDataSourceStrategyInterface.
  • The package provides a simple implementation of the NodeDataSourceStrategyInterface that keeps the nodes only in memory. Another implementation could be a simple class that uses an ORM to handle the node collection.
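
The steps above can be sketched as a single self-contained function. This is an illustration under stated assumptions, not the package's code. The function name pageRankSketch, the damping factor of 0.85, the way rank from nodes without outgoing links is redistributed, and the use of PHP_FLOAT_EPSILON as the default threshold are all assumptions made for the example.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative PageRank sketch (not the package's implementation).
 * Assumes every link target also appears as a key of $outLinks.
 *
 * @param array<string, string[]> $outLinks node id => ids it links to
 * @return array<string, float> node id => rank scaled into [$scaleMin, $scaleMax]
 */
function pageRankSketch(
    array $outLinks,
    int $maxIteration = 100,
    float $damping = 0.85,              // assumed conventional value
    float $epsilon = PHP_FLOAT_EPSILON, // assumed accuracy threshold
    float $scaleMin = 1.0,
    float $scaleMax = 10.0
): array {
    $n = count($outLinks);
    if ($n === 0) {
        return [];
    }

    // Initial ranking: every node starts with the same rank.
    $ranks = array_fill_keys(array_keys($outLinks), 1.0 / $n);

    for ($i = 0; $i < $maxIteration; $i++) {
        // Rank held by nodes without outgoing links is spread over all nodes.
        $dangling = 0.0;
        foreach ($outLinks as $id => $targets) {
            if ($targets === []) {
                $dangling += $ranks[$id];
            }
        }

        $base = (1.0 - $damping) / $n + $damping * $dangling / $n;
        $next = array_fill_keys(array_keys($outLinks), $base);

        // Each node passes an equal share of its rank to every target.
        foreach ($outLinks as $id => $targets) {
            foreach ($targets as $target) {
                $next[$target] += $damping * $ranks[$id] / count($targets);
            }
        }

        // Stop early once no rank moved by more than epsilon.
        $maxDelta = 0.0;
        foreach ($next as $id => $rank) {
            $maxDelta = max($maxDelta, abs($rank - $ranks[$id]));
        }
        $ranks = $next;
        if ($maxDelta <= $epsilon) {
            break;
        }
    }

    // Min-max normalize into [0, 1], then scale into the target range.
    $min = min($ranks);
    $span = max($ranks) - $min;
    foreach ($ranks as $id => $rank) {
        $unit = $span > 0.0 ? ($rank - $min) / $span : 0.0;
        $ranks[$id] = $scaleMin + $unit * ($scaleMax - $scaleMin);
    }

    return $ranks;
}
```

Calling pageRankSketch(['a' => ['b', 'c'], 'b' => ['c'], 'c' => ['a']]) returns one scaled rank per node. With the default range, the node with the highest raw rank maps to 10 and the one with the lowest maps to 1. A looser $epsilon (for example 1e-6) trades accuracy for fewer iterations, which is the kind of accuracy limit described in the About section.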

Install

composer require php-science/pagerank

Example

// Placeholder: supply your own node data here.
$dataSource = $this->getYourDataSource();

// Strategy that keeps the nodes in memory.
$nodeBuilder = new NodeBuilder();
$nodeCollectionBuilder = new NodeCollectionBuilder();
$strategy = new MemorySourceStrategy(
    $nodeBuilder,
    $nodeCollectionBuilder,
    $dataSource
);

$rankComparator = new RankComparator();
$ranking = new Ranking(
    $rankComparator,
    $strategy
);

$normalizer = new Normalizer();

$pageRankAlgorithm = new PageRankAlgorithm(
    $ranking,
    $strategy,
    $normalizer
);

// Upper bound on iterations; the run may stop earlier once the ranks are accurate enough.
$maxIteration = 100;
$nodeCollection = $pageRankAlgorithm->run($maxIteration);

var_dump($nodeCollection->getNodes());

Functional Sample