Fix using screenshot step with response cache #1

Merged 2 commits on Feb 17, 2024
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -6,7 +6,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [1.0.0] - 2024-02-17
### Changed
* Change the output of the `Screenshot` step, from an array `['response' => RespondedRequest, 'screenshotPath' => string]` to a `RespondedRequestWithScreenshot` object, that has a `screenshotPath` property. The problem with the previous solution was: when using the response cache, the step failed, because it gets a cached response from the loader that was not actually loaded in the headless browser. When the step afterwards tries to take a screenshot from the page that is still open in the browser, it just fails because there is no open page. Now, with the new `RespondedRequestWithScreenshot` object, the `screenshotPath` is also saved in the cached response.

## [0.1.2] - 2024-02-07
### Fixed
* Upgrade to `crwlr/crawler` v1.5.3 and remove the separate `HeadlessBrowserLoader` and `HeadlessBrowserCrawler`. The steps shall simply use the normal `HttpLoader` and automatically switch to use the headless browser for loading and switch back afterwards if the loader was configured to use the HTTP client.

## [0.1.1] - 2024-02-06
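
The 1.0.0 entry above changes what the `Screenshot` step yields, so code that consumes its output may need to accept both shapes during a migration. A minimal sketch of that idea, assuming a hypothetical stand-in class (`ScreenshotOutputStandIn`) in place of the real `RespondedRequestWithScreenshot` and a hypothetical helper name (`screenshotPathFrom`); only the public `screenshotPath` property and the old array key are taken from the changelog:

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for RespondedRequestWithScreenshot. Only the public
// screenshotPath property mirrors what the changelog describes.
final class ScreenshotOutputStandIn
{
    public function __construct(public string $screenshotPath) {}
}

// Hypothetical helper: returns the screenshot path from either the pre-1.0.0
// array output or the 1.0.0 object output, or null for anything else.
function screenshotPathFrom(mixed $output): ?string
{
    if (is_array($output) && array_key_exists('screenshotPath', $output)) {
        return (string) $output['screenshotPath'];
    }

    if ($output instanceof ScreenshotOutputStandIn) {
        return $output->screenshotPath;
    }

    return null;
}

$legacy = ['response' => null, 'screenshotPath' => '/tmp/a.png'];
$current = new ScreenshotOutputStandIn('/tmp/b.png');

echo screenshotPathFrom($legacy), PHP_EOL;
echo screenshotPathFrom($current), PHP_EOL;
```

This mirrors the kind of check the `GetColors` step does in this PR, where both the array key and the object property are accepted as input.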
2 changes: 1 addition & 1 deletion composer.json
@@ -33,7 +33,7 @@
  },
  "require": {
      "php": "^8.1",
-     "crwlr/crawler": "^1.5.3"
+     "crwlr/crawler": "^1.6.1"
  },
"require-dev": {
"pestphp/pest": "^2.19",
59 changes: 59 additions & 0 deletions src/Aggregates/RespondedRequestWithScreenshot.php
@@ -0,0 +1,59 @@
<?php

namespace Crwlr\CrawlerExtBrowser\Aggregates;

use Crwlr\Crawler\Loader\Http\Messages\RespondedRequest;
use Exception;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

class RespondedRequestWithScreenshot extends RespondedRequest
{
public function __construct(
RequestInterface $request,
ResponseInterface $response,
public string $screenshotPath,
) {
parent::__construct($request, $response);
}

/**
* @throws Exception
*/
public static function fromRespondedRequest(RespondedRequest $respondedRequest, string $screenshotPath): self
{
return new self($respondedRequest->request, $respondedRequest->response, $screenshotPath);
}

public static function fromArray(array $data): RespondedRequestWithScreenshot
{
$respondedRequest = parent::fromArray($data);

return self::fromRespondedRequest($respondedRequest, $data['screenshotPath']);
}

public function __serialize(): array
{
$serialized = parent::__serialize();

$serialized['screenshotPath'] = $this->screenshotPath;

return $serialized;
}

public function __unserialize(array $data): void
{
parent::__unserialize($data);

$this->screenshotPath = $data['screenshotPath'] ?? '';
}

public function toArrayForAddToResult(): array
{
$array = parent::toArrayForAddToResult();

$array['screenshotPath'] = $this->screenshotPath;

return $array;
}
}
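
The class above extends the parent's `__serialize()` and `__unserialize()` so that the extra property survives a cache round trip. A self-contained sketch of that pattern, assuming hypothetical stand-in classes (`CachedResponseStandIn`, `CachedResponseWithScreenshotStandIn`) rather than the real `RespondedRequest` hierarchy, whose internals are not shown in this diff:

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the library's RespondedRequest.
class CachedResponseStandIn
{
    public function __construct(public string $url, public string $body) {}

    public function __serialize(): array
    {
        return ['url' => $this->url, 'body' => $this->body];
    }

    public function __unserialize(array $data): void
    {
        $this->url = $data['url'];
        $this->body = $data['body'];
    }
}

// Hypothetical subclass that adds one property to the serialized payload,
// following the same shape as the class in this PR.
class CachedResponseWithScreenshotStandIn extends CachedResponseStandIn
{
    public function __construct(string $url, string $body, public string $screenshotPath)
    {
        parent::__construct($url, $body);
    }

    public function __serialize(): array
    {
        $serialized = parent::__serialize();
        $serialized['screenshotPath'] = $this->screenshotPath;

        return $serialized;
    }

    public function __unserialize(array $data): void
    {
        parent::__unserialize($data);

        // Fall back to an empty string when the payload has no screenshot path.
        $this->screenshotPath = $data['screenshotPath'] ?? '';
    }
}

$original = new CachedResponseWithScreenshotStandIn(
    'http://localhost:8000/screenshot',
    '<h1>Hello World!</h1>',
    '/tmp/shot.png',
);

// unserialize() does not call the constructor; __unserialize() restores state.
$restored = unserialize(serialize($original));

echo $restored->screenshotPath, PHP_EOL;
```

The `?? ''` fallback means a payload written without the extra key still unserializes, at the cost of an empty path that callers would have to check for.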
3 changes: 3 additions & 0 deletions src/Steps/GetColors.php
@@ -3,6 +3,7 @@
namespace Crwlr\CrawlerExtBrowser\Steps;

use Crwlr\Crawler\Steps\Step;
use Crwlr\CrawlerExtBrowser\Aggregates\RespondedRequestWithScreenshot;
use Crwlr\CrawlerExtBrowser\Utils\ImageColors;
use Exception;
use Generator;
@@ -31,6 +32,8 @@ protected function validateAndSanitizeInput(mixed $input): mixed
{
if (is_array($input) && array_key_exists('screenshotPath', $input)) {
$input = $input['screenshotPath'];
} elseif ($input instanceof RespondedRequestWithScreenshot) {
$input = $input->screenshotPath;
}

return $this->validateAndSanitizeStringOrStringable($input);
23 changes: 15 additions & 8 deletions src/Steps/Screenshot.php
@@ -3,12 +3,14 @@
namespace Crwlr\CrawlerExtBrowser\Steps;

use Crwlr\Crawler\Loader\Http\Messages\RespondedRequest;
use Crwlr\CrawlerExtBrowser\Aggregates\RespondedRequestWithScreenshot;
use Exception;
use Generator;
use HeadlessChromium\Exception\CommunicationException;
use HeadlessChromium\Exception\FilesystemException;
use HeadlessChromium\Exception\ScreenshotFailed;
use Psr\Http\Message\UriInterface;
use Psr\SimpleCache\InvalidArgumentException;
use Throwable;

class Screenshot extends BrowserBaseStep
@@ -35,8 +37,8 @@ public static function loadAndTake(string $storePath, array $headers = []): Scre

/**
* @param UriInterface|UriInterface[] $input
- * @return Generator<array{ response: RespondedRequest, screenshotPath: string }>
- * @throws Exception
+ * @return Generator<RespondedRequestWithScreenshot>
+ * @throws Exception|InvalidArgumentException
*/
protected function invoke(mixed $input): Generator
{
@@ -48,14 +50,19 @@ protected function invoke(mixed $input): Generator
      $response = $this->getResponseFromInputUri($uri);

      if ($response) {
-         $screenshotPath = $this->makeScreenshot($response);
+         if (!$response instanceof RespondedRequestWithScreenshot) {
+             $screenshotPath = $this->makeScreenshot($response);

-         if (is_string($screenshotPath)) {
-             yield [
-                 'response' => $response,
-                 'screenshotPath' => $screenshotPath,
-             ];
+             if (is_string($screenshotPath)) {
+                 $response = RespondedRequestWithScreenshot::fromRespondedRequest($response, $screenshotPath);
+
+                 $this->loader->addToCache($response);
+             } else {
+                 return;
+             }
          }
+
+         yield $response;
      }
  }

44 changes: 41 additions & 3 deletions tests/Pest.php
@@ -1,13 +1,12 @@
<?php

+ use Crwlr\Crawler\Cache\FileCache;
  use Crwlr\Crawler\HttpCrawler;
  use Crwlr\Crawler\Loader\Http\HttpLoader;
  use Crwlr\Crawler\Loader\Http\Politeness\TimingUnits\MultipleOf;
  use Crwlr\Crawler\Loader\LoaderInterface;
  use Crwlr\Crawler\UserAgents\UserAgent;
  use Crwlr\Crawler\UserAgents\UserAgentInterface;
- use Crwlr\CrawlerExtBrowser\Crawlers\HeadlessBrowserCrawler;
- use Crwlr\CrawlerExtBrowser\Loaders\HeadlessBrowserLoader;
  use Crwlr\Utils\Microseconds;
use Psr\Log\LoggerInterface;
use Symfony\Component\Process\Process;
@@ -66,6 +65,19 @@ protected function loader(UserAgentInterface $userAgent, LoggerInterface $logger
};
}

function helper_getFastCrawlerWithCache(): HttpCrawler
{
$crawler = helper_getFastCrawler();

$loader = $crawler->getLoader();

/** @var HttpLoader $loader */

$loader->setCache(new FileCache(__DIR__ . '/_cache'));

return $crawler;
}

function helper_testFilePath(string $inPath = ''): string
{
$basePath = __DIR__ . '/_files';
@@ -77,13 +89,24 @@ function helper_testFilePath(string $inPath = ''): string
return $basePath;
}

function helper_testCachePath(string $inPath = ''): string
{
$basePath = __DIR__ . '/_cache';

if (!empty($inPath)) {
return $basePath . (!str_starts_with($inPath, '/') ? '/' : '') . $inPath;
}

return $basePath;
}

function helper_cleanFiles(): void
{
$scanDir = scandir(helper_testFilePath());

if (is_array($scanDir)) {
foreach ($scanDir as $file) {
- if ($file === '.' || $file === '..' || $file === '.gitkeep') {
+ if ($file === '.' || $file === '..' || $file === 'demo-screenshot.png') {
continue;
}

@@ -92,6 +115,21 @@ function helper_cleanFiles(): void
}
}

function helper_cleanCache(): void
{
$scanDir = scandir(helper_testCachePath());

if (is_array($scanDir)) {
foreach ($scanDir as $file) {
if ($file === '.' || $file === '..' || $file === '.gitkeep') {
continue;
}

unlink(helper_testCachePath() . '/' . $file);
}
}
}

function helper_dump(mixed $var): void
{
error_log(var_export($var, true));
33 changes: 32 additions & 1 deletion tests/_Integration/GetColorsTest.php
@@ -7,7 +7,7 @@
helper_cleanFiles();
});

- it('gets the colors from a screenshot image', function () {
+ it('gets the colors from a screenshot from a screenshot step execute before it in a crawler', function () {
$crawler = helper_getFastCrawler();

$crawler
@@ -21,6 +21,37 @@

$result = $results[0]->toArray();

expect($result['colors'][0])
->toHaveKeys(['red', 'green', 'blue', 'rgb', 'percentage'])
->and($result['colors'][0]['percentage'])
->toBeGreaterThanOrEqual(80.3)
->and($result['colors'][1])
->toHaveKeys(['red', 'green', 'blue', 'rgb', 'percentage'])
->and($result['colors'][1]['percentage'])
->toBeGreaterThanOrEqual(15.3)
->and($result['colors'][2])
->toHaveKeys(['red', 'green', 'blue', 'rgb', 'percentage'])
->and($result['colors'][2]['percentage'])
->toBeGreaterThanOrEqual(3.1)
->and($result['colors'][2])
->toHaveKeys(['red', 'green', 'blue', 'rgb', 'percentage'])
->and($result['colors'][2]['percentage'])
->toBeGreaterThanOrEqual(3.1);
});

it('gets the colors from an image', function () {
$crawler = helper_getFastCrawler();

$crawler
->input(['screenshotPath' => helper_testFilePath('demo-screenshot.png')])
->addStep(GetColors::fromImage());

$results = iterator_to_array($crawler->run());

expect($results)->toHaveCount(1);

$result = $results[0]->toArray();

expect($result['colors'])
->toBe([
['red' => 96, 'green' => 177, 'blue' => 119, 'rgb' => '(96,177,119)', 'percentage' => 80.3],
62 changes: 52 additions & 10 deletions tests/_Integration/ScreenshotTest.php
@@ -1,33 +1,75 @@
<?php

- use Crwlr\Crawler\Loader\Http\Messages\RespondedRequest;
  use Crwlr\Crawler\Steps\Loading\Http;
+ use Crwlr\CrawlerExtBrowser\Aggregates\RespondedRequestWithScreenshot;
  use Crwlr\CrawlerExtBrowser\Steps\Screenshot;

  afterEach(function () {
      helper_cleanFiles();
+
+     helper_cleanCache();
  });

it('takes a screenshot', function () {
$crawler = helper_getFastCrawler();

      $crawler
          ->input('http://localhost:8000/screenshot')
-         ->addStep(Screenshot::loadAndTake(helper_testFilePath()));
+         ->addStep(
+             Screenshot::loadAndTake(helper_testFilePath())
+                 ->addToResult('response')
+         );

      $results = iterator_to_array($crawler->run());

-     expect($results)
-         ->toHaveCount(1);
+     expect($results)->toHaveCount(1);

-     $result = $results[0]->toArray();
+     $response = $results[0]->get('response');

-     expect($result['response'])
-         ->toBeInstanceOf(RespondedRequest::class)
-         ->and(Http::getBodyString($result['response']))
+     expect($response)
+         ->toBeInstanceOf(RespondedRequestWithScreenshot::class)
+         ->and(Http::getBodyString($response))
          ->toContain('<h1>Hello World!</h1>')
-         ->and($result['screenshotPath'])
+         ->and($response->screenshotPath)
          ->toBeString()
-         ->and(file_exists($result['screenshotPath']))
+         ->and(file_exists($response->screenshotPath))
          ->toBeTrue();
});

it('can be cached and remembers the screenshot path', function () {
$crawler = helper_getFastCrawlerWithCache();

$crawler
->input('http://localhost:8000/screenshot')
->addStep(
Screenshot::loadAndTake(helper_testFilePath())
->addToResult(['screenshotPath'])
);

$results = iterator_to_array($crawler->run());

expect($results)->toHaveCount(1);

$screenshotPath = $results[0]->get('screenshotPath');

expect($screenshotPath)
->toBeString()
->not()
->toBeEmpty();

$crawler = helper_getFastCrawlerWithCache();

$crawler
->input('http://localhost:8000/screenshot')
->addStep(
Screenshot::loadAndTake(helper_testFilePath())
->addToResult(['screenshotPath'])
);

$results = iterator_to_array($crawler->run());

expect($results)
->toHaveCount(1)
->and($results[0]->get('screenshotPath'))
->toBe($screenshotPath);
});
File renamed without changes.
Binary file added tests/_files/demo-screenshot.png