[Question]: React: Flaky screenshots (pixel shift) #7548

Closed
florianbepunkt opened this issue Jul 10, 2021 · 9 comments

@florianbepunkt

Your question

I have a lot of visual regression tests that are quite flaky. Inside our React app I use a hook that runs after the screen under test has been mounted. This hook adds a "rendered" CSS class to the document's body.

In Playwright I wait for this class as an indicator that the React screen has loaded.

Still, I get very flaky and often failing tests that produce diffs like the ones below. Both the reference and test images are created on the same machine. It looks to me like the DOM has not been fully rendered yet, or some other minimal pixel shift is happening.

Any ideas how this can be improved or what might be the cause of this?

Example 1 diff: [image: circle-of-suspects-diff]

Example 2 diff: [image: background-diff]

@florianbepunkt florianbepunkt changed the title React: Flaky screenshots (pixel shift) [Question]: React: Flaky screenshots (pixel shift) Jul 11, 2021
@lo1tuma

lo1tuma commented Jul 12, 2021

I’m facing the same problem with flaky visual regression tests. I’m currently migrating our test suite from backstop.js to Playwright. With backstop.js I’ve never had these pixel-shift problems (but several others), so I think this might come from the different image-diffing tools they use.
I’ve also found related issues in the issue tracker of pixelmatch, which Playwright uses.

@florianbepunkt
Author

@lo1tuma I'm working on an enhanced image snapshot matcher that supports more image comparison algorithms as well as blurring the snapshot and test images. I will report back here once things have matured a bit. But currently I can get rid of a lot of flakiness by using SSIM and slight blurring (by 1-2 pixels).
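For readers unfamiliar with SSIM: instead of counting differing pixels, it scores similarity from mean brightness, contrast and covariance, so a small uniform brightness change barely moves the score even though every pixel differs. Below is a minimal sketch of a *global* (single-window) SSIM over grayscale pixel arrays. The name `globalSsim` is hypothetical, the constants are the usual SSIM defaults, and real implementations (including whatever the matcher mentioned above uses) compute it over sliding windows and average the results.

```typescript
// Simplified, single-window SSIM for two equally sized grayscale images
// (pixel values 0..255). Returns 1 for identical inputs.
function globalSsim(x: ArrayLike<number>, y: ArrayLike<number>): number {
  if (x.length === 0 || x.length !== y.length) {
    throw new Error("images must be non-empty and the same size");
  }
  const n = x.length;
  const range = 255; // dynamic range of 8-bit pixels
  const c1 = (0.01 * range) ** 2; // stabilising constants (usual defaults)
  const c2 = (0.03 * range) ** 2;

  // Means (luminance)
  let meanX = 0;
  let meanY = 0;
  for (let i = 0; i < n; i++) {
    meanX += x[i];
    meanY += y[i];
  }
  meanX /= n;
  meanY /= n;

  // Variances (contrast) and covariance (structure)
  let varX = 0;
  let varY = 0;
  let cov = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    varX += dx * dx;
    varY += dy * dy;
    cov += dx * dy;
  }
  varX /= n;
  varY /= n;
  cov /= n;

  return (
    ((2 * meanX * meanY + c1) * (2 * cov + c2)) /
    ((meanX * meanX + meanY * meanY + c1) * (varX + varY + c2))
  );
}
```

For example, `globalSsim(img, img.map((v) => v + 1))` stays very close to 1, while a strict pixel-by-pixel diff would flag every pixel.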

@florianbepunkt
Author

florianbepunkt commented Jul 17, 2021

We migrated the visual regression tests of a moderately complex React app to Playwright and the Playwright test runner. Here are a few of our findings regarding test flakiness; hope this helps people in a similar situation.

1. Disabling animations and images

Animations

In our case we used material-ui. This enabled us to disable animations centrally in the theme file, behind a flag that is set when we are in the test environment, REACT_APP_ENV=test (see the scripts below).

But you could also inject some custom styles into your page to disable animations:

import { Page } from "@playwright/test";

type Params = { page: Page };

export const disableAnimations = async ({ page }: Params): Promise<void> => {
  await page.addStyleTag({
    content: `*,
        *::before,
        *::after {
        -moz-animation: none !important;
        -moz-transition: none !important;
        animation: none !important;
        caret-color: transparent !important;
        transition: none !important;
        }`,
  });
};

This depends on how your app is designed. But since animations make tests very time-sensitive, they inevitably introduce flakiness.

Images

The same goes for images: they add network load, so we cancel all image requests. Instead of cancelling, you could also mock all image requests so that they return a placeholder.

import { Route } from "@playwright/test";

// Abort every image request; let all other requests through unchanged
const cancelImageRequests = (route: Route) =>
  route.request().resourceType() === "image" ? route.abort() : route.continue();

// inside a test or a beforeEach hook
await page.route("**/*", cancelImageRequests);
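The placeholder variant mentioned above could look roughly like this sketch. It is typed against minimal hypothetical `RouteLike`/`RequestLike` interfaces (Playwright's `Route` provides these methods) so it compiles without `@playwright/test`, and the 1×1 PNG is an arbitrary stand-in: any small image works as long as it is identical on every run.

```typescript
// Minimal structural types so this sketch compiles without @playwright/test
interface RequestLike {
  resourceType(): string;
}
interface RouteLike {
  request(): RequestLike;
  fulfill(response: { status?: number; contentType?: string; body?: Buffer }): Promise<void>;
  continue(): Promise<void>;
}

// Arbitrary tiny 1x1 PNG served in place of every image
const PLACEHOLDER_PNG = Buffer.from(
  "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNkYPhfDwAChwGA60e6kgAAAABJRU5ErkJggg==",
  "base64",
);

// Answer image requests with the placeholder instead of aborting them;
// every other request goes to the network unchanged
const mockImageRequests = (route: RouteLike): Promise<void> =>
  route.request().resourceType() === "image"
    ? route.fulfill({ status: 200, contentType: "image/png", body: PLACEHOLDER_PNG })
    : route.continue();

// Usage inside a test: await page.route("**/*", mockImageRequests);
```

Images without explicit CSS dimensions will lay out at the placeholder's 1×1 size, so the screenshot reflects the placeholder rather than the real image.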

2. Using a Docker image that matches the GitHub Actions runner OS for reference image creation

Font rendering varies greatly between operating systems (as well as OS versions) and browsers. Reference images should be created with a setup as close as possible to your CI. Our GitHub Actions run on Ubuntu 20.04, so we use the Playwright Docker image (based on Ubuntu 20.04 "focal") for local testing and snapshot generation. These are our package.json scripts:

  "scripts": {
    "start": "HTTPS=true REACT_APP_ENV=development react-scripts start",
    "start:test": "HTTPS=true REACT_APP_ENV=test react-scripts start",
    "build:development": "REACT_APP_ENV=development react-scripts build",
    "build:test": "REACT_APP_ENV=test react-scripts build",
    "build:production": "REACT_APP_ENV=production react-scripts build",
    "test:ci": "npx playwright test",
    "pretest": "npm run build:test",
    "test": "docker run -it --rm --ipc=host -v \"${PWD}:/var/app/\" mcr.microsoft.com/playwright:focal /bin/bash -c 'cd /var/app; npx playwright install; npx playwright test'"
  },

As you can see, we set an environment variable with the current environment (test, ci, production, etc.). Based on this environment we disable, for example, animations, API endpoints, etc. The build:test script creates a build with animations disabled. The test command is used for local tests with the Docker image, while test:ci is used in our GitHub Actions.
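A minimal sketch of turning that variable into flags. `parseAppEnv` and `flagsFor` are hypothetical names; the only thing taken from the setup above is that `canUseAnimation` is false when `REACT_APP_ENV=test`. Because react-scripts inlines `REACT_APP_*` variables at build time, the value comes from the build script (e.g. `build:test`), not from the machine that later serves the build.

```typescript
// The environments used by the package.json scripts above
const APP_ENVS = ["development", "test", "production"] as const;
type AppEnv = (typeof APP_ENVS)[number];

// Fail loudly on a typo instead of silently producing a build with animations on
function parseAppEnv(raw: string | undefined): AppEnv {
  const match = APP_ENVS.find((env) => env === raw);
  if (match === undefined) {
    throw new Error(`Unknown REACT_APP_ENV: ${String(raw)}`);
  }
  return match;
}

interface EnvFlags {
  canUseAnimation: boolean; // false in the test build (point 1)
}

function flagsFor(env: AppEnv): EnvFlags {
  return { canUseAnimation: env !== "test" };
}

// In the app (hypothetical module, e.g. src/env.ts):
// export const { canUseAnimation } = flagsFor(parseAppEnv(process.env.REACT_APP_ENV));
```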

3. Page / component readiness

With React's virtual DOM we experienced some issues when it comes to knowing when a page is fully rendered and ready. We came up with the following solution:

A: Add a rendered class to the body when a screen is fully mounted

How you do this depends heavily on your setup / app structure. In our case we tested single screens/pages instead of single components.

// useHasRenderedClass.ts
import React from "react";
// canUseAnimation is a bool flag: false if REACT_APP_ENV=test, true in production (see point 1 above).
// Where it lives depends on your app; "./env" is only an example path.
import { canUseAnimation } from "./env";

export const useHasRenderedClass = (className = "rendered") => {
  React.useEffect(() => {
    // Only mark screens in the test build
    if (canUseAnimation) {
      return;
    }

    // useEffect runs after React has committed the screen to the DOM
    document.body.classList.add(className);

    return function cleanUp() {
      document.body.classList.remove(className);
    };
  }, [className]);
};

// in your component

const SomeScreenYouTest: React.FC = () => {
  useHasRenderedClass();
  return <h1>My awesome screen</h1>;
};

B: Await page readiness with a custom function

With the setup above, we can now use Playwright's page.waitForFunction with a custom predicate to check whether a page is ready for screenshotting:

// Runs in the browser: true once the web font is available and the screen has marked itself as rendered
export const pageLoaded = () =>
  (document as any).fonts.check("12px eb-garamond") && document.body.classList.contains("rendered");

// in your tests
await Promise.all([
  page.click(`[data-test-id="nav-element-some-page"]`), // or page.goto("some-url"),
  page.waitForFunction(pageLoaded),
]);

In the example above we also checked for the existence of a font. We are using Adobe Fonts which introduce an external dependency with a network load. Mocking this made no sense, since without the correct font our layout shifted so much that the whole point of running a visual regression suite was moot. The font check above provided a nice workaround.

4. Slightly blur screenshots before comparing them

Our app is very text-heavy. We blur our snapshots by 2 pixels before image comparison. This helps a lot with text anti-aliasing issues.

This is something that @playwright/test currently does not support. Therefore we ported jest-image-snapshot to Playwright: https://github.com/florianbepunkt/playwright-image-snapshot

If someone from the playwright team reads this: It would help tremendously if you could incorporate a blurring option into the image comparison part.
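To make the blurring step concrete, here is a minimal sketch of blur-then-compare on raw RGBA pixels. It uses a simple box blur as a stand-in (the ported matcher linked above may use a different kernel), and `boxBlur`, `mismatchRatio` and the per-channel `tolerance` are hypothetical names, not that library's API. What it illustrates: a 1px shift of a hard edge produces a full-intensity difference, whereas after blurring the same shift leaves only a small difference that a modest tolerance absorbs.

```typescript
// Raw RGBA pixels, row-major, 4 bytes per pixel
interface RgbaImage {
  width: number;
  height: number;
  data: Uint8ClampedArray;
}

// Box blur: every output pixel is the average of its (2r+1)x(2r+1)
// neighbourhood, using only neighbours that lie inside the image.
// O(width * height * r^2), which is fine for a sketch.
function boxBlur(image: RgbaImage, radius: number): RgbaImage {
  const { width, height, data } = image;
  const out = new Uint8ClampedArray(data.length);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const sums = [0, 0, 0, 0];
      let count = 0;
      for (let dy = -radius; dy <= radius; dy++) {
        const yy = y + dy;
        if (yy < 0 || yy >= height) continue;
        for (let dx = -radius; dx <= radius; dx++) {
          const xx = x + dx;
          if (xx < 0 || xx >= width) continue;
          const i = (yy * width + xx) * 4;
          for (let c = 0; c < 4; c++) sums[c] += data[i + c];
          count++;
        }
      }
      const o = (y * width + x) * 4;
      for (let c = 0; c < 4; c++) out[o + c] = Math.round(sums[c] / count);
    }
  }
  return { width, height, data: out };
}

// Fraction of pixels whose largest per-channel difference exceeds `tolerance`
function mismatchRatio(a: RgbaImage, b: RgbaImage, tolerance: number): number {
  if (a.width !== b.width || a.height !== b.height) {
    throw new Error("images must have the same dimensions");
  }
  const pixels = a.width * a.height;
  let differing = 0;
  for (let p = 0; p < pixels; p++) {
    let maxDelta = 0;
    for (let c = 0; c < 4; c++) {
      maxDelta = Math.max(maxDelta, Math.abs(a.data[p * 4 + c] - b.data[p * 4 + c]));
    }
    if (maxDelta > tolerance) differing++;
  }
  return differing / pixels;
}
```

Both the reference and the fresh screenshot must be blurred with the same radius before comparing, e.g. `mismatchRatio(boxBlur(reference, 2), boxBlur(actual, 2), tolerance)`.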

Our snapshot settings look like this. The comparison threshold is quite high; it could be fine-tuned for individual tests.

import { ImageSnapshotOptions } from "playwright-image-snapshot";

export const BASE_URL = "http://localhost:3000";
export const SNAPSHOT_SETTINGS: ImageSnapshotOptions = {
  blur: 2,
  comparisonAlgorithm: "pixelmatch",
  failureThreshold: 0.2,
  failureThresholdType: "percent",
};
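Assuming playwright-image-snapshot keeps jest-image-snapshot's semantics (it is described above as a port), `failureThresholdType: "percent"` treats `failureThreshold` as a fraction of all pixels, so `0.2` tolerates up to 20% of the pixels differing, which is why it counts as high. The hypothetical helper below spells out both threshold types:

```typescript
type FailureThresholdType = "pixel" | "percent";

// Decide whether an image comparison passes, given how many pixels differed.
// "pixel": the threshold is an absolute number of differing pixels.
// "percent": the threshold is a ratio in [0, 1] of differing pixels to all pixels.
function passesThreshold(
  diffPixels: number,
  totalPixels: number,
  failureThreshold: number,
  failureThresholdType: FailureThresholdType,
): boolean {
  if (failureThresholdType === "pixel") {
    return diffPixels <= failureThreshold;
  }
  return diffPixels / totalPixels <= failureThreshold;
}
```

With this helper and the settings above, a 1280×720 screenshot (921,600 pixels) passes with up to 184,320 differing pixels.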

5. Retries

With all of the above, we were able to reduce test flakiness a lot. Still, some tests occasionally needed a retry, so we configured Playwright to retry each test up to 3 times.
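In Playwright Test this is a single config option. A minimal sketch of a `playwright.config.ts`, written as a plain object so it needs no imports (a real project would typically annotate it with the config type from `@playwright/test`); the `baseURL` entry is an assumption that simply reuses `BASE_URL` from the snapshot settings above. `retries` counts re-runs after the first failure, and tests that only pass on a retry are reported as flaky, so they stay visible.

```typescript
// playwright.config.ts (sketch)
const config = {
  // Re-run a failing test up to 3 more times before reporting it as failed
  retries: 3,
  use: {
    // Lets tests call page.goto("/some-page") relative to the test server
    baseURL: "http://localhost:3000",
  },
};

export default config;
```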

@lo1tuma

lo1tuma commented Jul 23, 2021

@florianbepunkt thanks for sharing your insights. In our project we already do most of the points you mentioned, except (1) and (3), which we are handling via waitFor(timeout) at the moment. So in our case the flakiness really only comes from font anti-aliasing and scaled video content (the video is always paused when taking screenshots), which unfortunately shows a lot of circles (it is the ffmpeg test video).
A workaround is to increase the threshold and retries, but that makes the tests less useful, as we might not catch actual errors when we increase the threshold.

I’ve also found this list of Chromium flags and tried some of them to make the rendering more deterministic, but with little success. This is the list of Chromium args I’m currently using:

    '--disable-gpu',
    '--no-sandbox',
    '--disable-infobars',
    '--hide-scrollbars',
    '--disable-setuid-sandbox',
    '--disable-dev-shm-usage',
    '--disable-skia-runtime-opt',
    '--font-render-hinting=none',
    '--run-all-compositor-stages-before-draw',
    '--disable-new-content-rendering-timeout',
    '--disable-threaded-animation',
    '--disable-threaded-scrolling',
    '--disable-checker-imaging',
    '--disable-image-animation-resync',
    '--disable-features=PaintHolding',
    '--disable-partial-raster',
    '--in-process-gpu',
    '--use-gl=swiftshader',
    '--force-color-profile=srgb',
    '--force-device-scale-factor=1',
    '--single-process',
    '--disable-background-timer-throttling',
    '--disable-backgrounding-occluded-windows',
    '--disable-hang-monitor',
    '--disable-ipc-flooding-protection',
    '--disable-renderer-backgrounding',
    '--disable-background-networking',
    '--disable-breakpad',
    '--disable-component-update',
    '--disable-domain-reliability',
    '--disable-sync'

An interesting option is --deterministic-mode where you can control when frame rendering should happen via the devtools protocol. But that needs to be supported by playwright first, I guess.
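For reference, flags like those listed above reach Chromium in Playwright Test through the `launchOptions` option. A minimal sketch, again as a plain config object; the selection of flags is only an example taken from that list, and which of them (if any) actually improve determinism depends on the environment.

```typescript
// Example subset of the flags above; adjust for your own setup
const deterministicRenderingArgs = [
  "--font-render-hinting=none",
  "--disable-skia-runtime-opt",
  "--force-color-profile=srgb",
  "--force-device-scale-factor=1",
  "--disable-partial-raster",
];

// playwright.config.ts (sketch): pass the flags only to the Chromium project
const config = {
  projects: [
    {
      name: "chromium",
      use: {
        browserName: "chromium" as const,
        launchOptions: { args: deterministicRenderingArgs },
      },
    },
  ],
};

export default config;
```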


Another thing I’ve noticed is that every time I update the snapshots, almost all images change, even though the tests were not failing before (thanks to the anti-aliasing detection in pixelmatch). This is quite annoying, as it makes it very hard to review which changes are intentional and which are not. So I was wondering whether it would be possible to apply some filters (e.g. anti-aliasing, blurring, etc.) before we save the snapshot or compare against it. I’m not sure if there are any good libraries out there for doing things like that, but I think this would be the superior solution, since it would also reduce the noise when updating the snapshots.
Since Playwright gives us the raw screenshot buffer before we pass it to toMatchSnapshot, I think we could try this approach without any changes in Playwright itself. Or am I missing something?
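The approach suggested here, filtering the raw screenshot before it is saved or compared, can be sketched without touching Playwright internals. Everything below is hypothetical: `normalizeScreenshot`, the injected `Codec` (in practice perhaps a thin wrapper around pngjs's `PNG.sync.read`/`PNG.sync.write`) and the `posterize` filter, which is just one possible filter. Posterizing snaps each channel to a few levels, so tiny anti-aliasing differences usually collapse to the same value, although values sitting right at a level boundary can still flip.

```typescript
// Decoded RGBA pixels, row-major, 4 bytes per pixel
interface RawImage {
  width: number;
  height: number;
  data: Uint8Array;
}

// Injected PNG decoder/encoder, so this sketch has no image-library dependency
interface Codec {
  decode(png: Buffer): RawImage;
  encode(image: RawImage): Buffer;
}

// Snap R, G and B to `levels` evenly spaced values; alpha is left untouched
function posterize(image: RawImage, levels: number): RawImage {
  if (levels < 2) throw new Error("levels must be at least 2");
  const step = 255 / (levels - 1);
  const data = new Uint8Array(image.data.length);
  for (let i = 0; i < data.length; i++) {
    const isAlpha = i % 4 === 3;
    data[i] = isAlpha ? image.data[i] : Math.round(Math.round(image.data[i] / step) * step);
  }
  return { ...image, data };
}

// Filter the screenshot before it is stored or compared, so reference images
// and fresh screenshots go through exactly the same normalisation
function normalizeScreenshot(
  png: Buffer,
  codec: Codec,
  filter: (image: RawImage) => RawImage,
): Buffer {
  return codec.encode(filter(codec.decode(png)));
}

// Assumed usage in a test:
// expect(normalizeScreenshot(await page.screenshot(), codec, (img) => posterize(img, 16)))
//   .toMatchSnapshot("home.png");
```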

thewilkybarkid added a commit to PREreview/prereview that referenced this issue Aug 2, 2021
The integration tests seem to have nondeterministic failures regarding antialiasing; this allows
the test runner to retry 3 times to reduce the number of failed builds.

Refs #388, microsoft/playwright#7548
thewilkybarkid added a commit to PREreview/prereview that referenced this issue Aug 3, 2021
I've spent quite a while trying various options to reduce the number of failures due to
antialiasing differences, without success. This takes a bit more of an extreme option to blur the
screenshot slightly to try and normalise it. The goal of the test isn't pixel-precision, so this
should be ok.

Ideally, this will be a feature in Playwright (so the reference image doesn't have to be blurred
too).

Refs #388, microsoft/playwright#7548
@thewilkybarkid

Thanks for the pointers @florianbepunkt. I ran into a lot of trouble when using the provided Docker image. I tried various options without luck, so I've ended up blurring the screenshots manually: PREreview/prereview@d36ae9d.

Hopefully this could become a feature in Playwright, so the reference image doesn't have to be blurred too (currently it has to be regenerated when changing the blur level).

@andreyfel

Apologies for the off-topic question.
@lo1tuma I was wondering why you are migrating from backstop.js? I'm exploring different visual regression testing approaches and backstop seemed pretty cool. I'm considering Playwright as well, so I'm curious to know why you are migrating from one to the other.

@lo1tuma

lo1tuma commented Aug 4, 2021

@andreyfel There are two reasons:

  1. backstop is quite good when you are happy with its current feature set. I have the impression that it is in a maintenance mode, so bugs are still fixed and dependencies get updated, but new features are not actively developed or at least the progress is very slow.
  2. I love the flexibility of playwright-test and folio, e.g. that you can run the same tests against multiple environments (which differ in more than just the browser vendor). With backstop we had to build a lot of boilerplate code around creating dynamic scenarios, and not everything was configurable per scenario (e.g. fileNameTemplate, IIRC).

@andreyfel

@lo1tuma Thanks for your reply!
How do you deal with the features missing in playwright compared to backstop? Like the visual comparison of screenshots?

@aslushnikov
Collaborator

Folks, we're exploring how we can improve Visual Regression Testing with Playwright Test. The umbrella bug is #8161, and I'll close this in favor of that one. If you have anything to share regarding VRT, please do so there!
