VRT: A series of mistakes

So if you’ve been following my blog for a while, you probably know a bit about my first crack at test automation – the Visual Regression Tests. These take screenshots of our products and, after a change is deployed, run pixel comparisons to check whether the code changes have caused any unwanted visual differences.
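If you haven't met a VRT before, the pixel comparison at its heart is simple enough to sketch in a few lines. This is a minimal illustration, not what Gemini or any other tool actually does internally. It assumes both screenshots have already been decoded into same-sized RGBA pixel arrays, and the `threshold` and `maxRatio` settings are hypothetical knobs added here to tolerate tiny rendering noise.

```javascript
// Count the pixels that differ between two same-sized RGBA images.
// `threshold` is a per-channel tolerance (0-255); it is an assumption added
// for this sketch so that tiny rendering differences can be ignored.
function countDiffPixels(a, b, width, height, threshold = 0) {
  if (a.length !== width * height * 4 || b.length !== a.length) {
    throw new Error("Images must be the same size and in RGBA format");
  }
  let diff = 0;
  for (let i = 0; i < a.length; i += 4) {
    // A pixel counts as changed if any channel (R, G, B, A) moves by more than the threshold.
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a[i + c] - b[i + c]) > threshold) {
        diff++;
        break;
      }
    }
  }
  return diff;
}

// Hypothetical pass/fail policy: flag a regression when more than `maxRatio`
// of all pixels changed (the default of 0 fails on any change at all).
function isVisualRegression(a, b, width, height, { threshold = 0, maxRatio = 0 } = {}) {
  return countDiffPixels(a, b, width, height, threshold) / (width * height) > maxRatio;
}
```

Dedicated diffing libraries (pixelmatch is a popular one) go further, for example by detecting anti-aliased pixels, which matters a lot for keeping visual tests from failing on noise.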

It was a project I got going before Christmas last year, and I learnt a lot in the process of implementing it – not just about how to write automated tests, but about the context and purpose of those tests, and about planning. It’s been a massive learning experience, even though the end result hasn’t been super useful yet.

I will readily admit this: the VRT project was a succession of mistakes. Planning mistakes, coding mistakes, tool mistakes, stupid mistakes and mistakes I couldn’t have foreseen. I wouldn’t say that the project itself was a mistake, though. It wasn’t. It was neither a mistake nor a failure, because it gave me the most amazing type of opportunity imaginable.

The opportunity to learn.

(cringe ahahaha)

In the beginning…

In the beginning, there was a test manager, and he said “let there be VRTs” and a developer wrote them and lo there were VRTs, and they rejoiced. And then they failed a lot and everyone ignored them and then they were deleted. The end.

This was the state of play when I entered the scene. I thought to myself, wow, those were terribly planned and executed, I could do way better than that!

I was wrong.

Maybe I would have been right had I stopped to consider in more depth why they had failed, or maybe I would have gone on to make the same mistakes anyway. Regardless, I decided one day that I could do better. I’d been learning a bit of JavaScript on the side for six months, had done the basic tutorials on six different sites and hadn’t truly grasped any of the defining concepts. I was ready.

Mistake #1 Arrogance

Regardless of the front I sometimes put up, I am generally not a confident or conceited person. Our original VRTs were pretty awful though, so I didn’t need to be much of anything to hope that I could put something better in place. They worked by taking screenshots of individual components from our ads and checking for changes, but we have a billion gazillion components that go together in different shapes and sizes, so this regression suite took fifteen minutes to run and at the end of it half of the tests would fail because of timeouts and connection errors. All I needed to do was make a version that didn’t have timeouts and connection error failures, and it would already be twenty million times more useful… hahaha… aah…

They didn’t test any of the states of the components either, like hover state or the change as score counters are ticked over, dropdowns being clicked on and the like. Well, mine were going to change that all right! Gotta cover those key user interactions, amirite?

I also decided to whittle the number of tests all the way down. The old tests had been deleted, so I can’t tell you how many screenshots were taken every time the suite ran, but it was hundreds. Imagine trying to look through all the failures in that, knowing that they mostly had nothing to do with regressions in the product. It just wasn’t worth doing every time they started to fail. My new suite was going to run on a representative set of whole products, going for effective coverage over exhaustive coverage. How much could I realistically keep on top of, maintaining the tests and triaging their failures, while leaving the suite open for expansion at a later date? Not a lot, with my normal workload carrying on.

I started with just one ad as a proof of concept, and then copied out the test over and over for a set of twenty that gave approximate coverage of the different components (we call them molecules, and they’re made of smaller components called atoms. We hate this system now because of how restrictive it is – always asking “is this a molecule or an atom?”, another ten minutes wasted…). I stuck with the same tool, which was Gemini. It uses Selenium standalone, and I had no idea how any of it worked, but I followed the README as best I could and got something working locally.

Huzzah! Hooray! I’m the best! I made some examples to run on my personal playground site, a horrific yellow, red and white Saitama-themed monstrosity I originally created to practise Flexbox and then CSS Grid, and to show off for this blog a million years ago. (Who else is excited about subgrids? 😀 I don’t even work with CSS, but I love to play about.)

Screenshot of a webpage I made while learning html and css. The header reads "Ultimate Saitama Fan Club". It is garishly coloured and has images from the animated cartoon series One Punch Man, and text from Monty Python reading "Your mother was a hamster and your father smelled like elderberries". — Throwback to this beauty

Then I tried interacting with the ad to get screenshots of its different states. Oof, turns out Gemini doesn’t support iframes, and we can’t view the ads without ’em! Back to the drawing board…

Mistake #2 Duplication of code

Undeterred, I went looking for another cool tool that would save me from writing my own combined screenshotter and pixel matcher from scratch. Around this time I saw a talk on YouTube about the many uses of Puppeteer and decided this was just the tool for the job.

It was surprisingly easy to set up and get going, especially since I already had the information I needed from having written the tests once. I didn’t need to worry about browser drivers because Puppeteer comes with the browser it drives, so I had the tests remade in Puppeteer and doing interactions within a few days. It would probably take me about half an hour now, but it was my first go at this sort of thing and I’m still proud of how quickly I managed to pick things up.
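Here’s roughly what driving an ad into different states looks like with Puppeteer, including hopping into the iframe that Gemini couldn’t handle. It’s a sketch rather than my actual test code: `captureAdStates`, the `states` format and the selectors are hypothetical, while `page.$`, `elementHandle.contentFrame()`, `frame.hover`, `frame.click` and `page.screenshot` are standard Puppeteer calls.

```javascript
// Hypothetical helper: screenshot an ad (rendered inside an iframe) in several
// interaction states. `page` is a Puppeteer Page; `states` is a made-up format:
// { name, action?: "hover" | "click", selector? }.
async function captureAdStates(page, iframeSelector, states, outDir) {
  // Find the iframe on the host page and get a handle on its document.
  const iframeHandle = await page.$(iframeSelector);
  if (!iframeHandle) throw new Error(`No iframe matches ${iframeSelector}`);
  const frame = await iframeHandle.contentFrame();
  if (!frame) throw new Error(`Could not access the frame for ${iframeSelector}`);

  const paths = [];
  for (const state of states) {
    // Interactions run inside the iframe, not on the host page.
    if (state.action === "hover") {
      await frame.hover(state.selector);
    } else if (state.action === "click") {
      await frame.click(state.selector);
    } else if (state.action) {
      throw new Error(`Unknown action: ${state.action}`);
    }
    // A state with no action captures whatever is currently on screen.
    const path = `${outDir}/${state.name}.png`;
    await page.screenshot({ path });
    paths.push(path);
  }
  return paths;
}

// Example usage (URL and selectors are placeholders):
// const puppeteer = require("puppeteer");
// const browser = await puppeteer.launch();
// const page = await browser.newPage();
// await page.goto("https://example.com/ad-preview");
// await captureAdStates(page, "iframe#ad", [
//   { name: "initial" },
//   { name: "cta-hover", action: "hover", selector: ".cta" },
// ], "screenshots");
// await browser.close();
```

The important detail is that the hover and click happen on the iframe’s `Frame`, while the screenshot is taken of the whole page, so the ad is captured the way a user would see it.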

Excitedly, I began writing up more tests because it was so easy. I copy-pasted the code for each ad I wanted to test, added cases for interactions and ended up with sixty almost identical test files, and the prospect of duplicating them for each brand. That would have been twenty times the amount, plus more for the unique configurations some of our clients use… I was staring down the barrel of a few hundred files even for my nice basic level of coverage.

Surely there was a way to make one file take in a bunch of options, create tests from them and then run those tests? This is pretty basic stuff, but I had no idea. I kept trying on my own for about a month’s worth of spare time, writing and rewriting arrays and then JSON files containing all the unique information needed for each test. No matter how I tried, I just didn’t have the knowledge to take that information and write a function that built out the tests I needed. I’d never written my own function before without following along with a tutorial.

In the end my team manager set aside an hour to pair with me on it, and by the end of that hour we had a working solution. I suppose the mistake could just as easily be described as “not asking for help early enough”.
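The fix for the copy-paste problem boils down to keeping the per-test details as data and generating the tests from that data in one place. Here’s a minimal sketch of the idea, assuming a hypothetical config shape: the `id` and `interactions` fields and the brand names are made up for illustration, not taken from our real suite.

```javascript
// Hypothetical config: one entry per ad, plus the brands to run each ad against.
const ads = [
  { id: "leaderboard", interactions: ["initial", "hover-cta"] },
  { id: "quiz", interactions: ["initial", "answer-selected"] },
];
const brands = ["brand-a", "brand-b"];

// Expand brands x ads x interaction states into a flat list of test cases,
// so one file generates every test instead of one copy-pasted file per ad.
function buildTestCases(ads, brands) {
  const cases = [];
  for (const brand of brands) {
    for (const ad of ads) {
      for (const state of ad.interactions) {
        cases.push({ name: `${brand} / ${ad.id} / ${state}`, brand, adId: ad.id, state });
      }
    }
  }
  return cases;
}
```

In a runner such as Mocha or Jest, the generated list becomes real tests with a loop like `for (const tc of buildTestCases(ads, brands)) it(tc.name, async () => { /* open the ad, set the state, screenshot, compare */ });`, so adding a brand or an ad is a one-line config change rather than another file.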

Mistake #3 Being a right tool about tools

I mean, you could argue that I made this mistake right from the start, but I like to think I couldn’t have been expected to pick the right tool first time. After the first time, though, I should have learnt to think more about what I wanted to achieve: make a list of requirements at the start, then compare it with what a selection of tools could provide, so that I could make an informed choice about which requirements could realistically be met at the same time. But naaah, I decided not to learn this when I should have, and so here it is.
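Even a throwaway script would have done the job here. This is a minimal sketch of the requirements-versus-tools comparison I wish I’d made, with placeholder tool names and capability lists. These are hypothetical and are not an assessment of Gemini, Puppeteer or Hermione.

```javascript
// Requirements written down before picking a tool.
const requirements = [
  "iframe support",
  "hover and click states",
  "Firefox and mobile browsers",
  "BrowserStack integration",
];

// Hypothetical capability table: placeholder tools, to be filled in from each tool's docs.
const tools = {
  "tool-a": ["hover and click states", "BrowserStack integration"],
  "tool-b": ["iframe support", "hover and click states"],
  "tool-c": ["iframe support", "hover and click states", "Firefox and mobile browsers"],
};

// For each tool, list the requirements it does not meet, best fit first.
function compareTools(requirements, tools) {
  return Object.entries(tools)
    .map(([name, supports]) => ({
      name,
      missing: requirements.filter((r) => !supports.includes(r)),
    }))
    .sort((x, y) => x.missing.length - y.missing.length);
}
```

Sorting by what’s missing makes the trade-off obvious up front: if no tool covers everything, you know before writing a single test which requirement you’re giving up.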

We decided that the tests were great, so we needed to make them work with MORE BROWSERS. Most of our users are on mobile, so why are we only testing on Chrome on a Mac? Um, ah…

At the time, Puppeteer ran on Chromium. That was it. That was what it was for.

Baaaaaaaaack to the drawing board!

Mistake #4 Read the notice at the top of the repo, Bruce

I went back to Gemini and rewrote my previous tests into a single file, using the same method we had used for Puppeteer. Then I installed a package that managed the connection to BrowserStack, because I had no idea how to set that up myself without spending a few months learning WebDriver (I should have done that instead, by the way ahah), and away we went. I got a few successful test runs with Chrome and Edge on Windows and gave myself a pat on the back.

But no matter what I did, I simply could not get Firefox to work. It was days of struggle, pulling out my hair, before I eventually noticed the notice at the top of the Gemini repo. DEPRECATED. Argh! Something to do with Selenium standalone and newer versions of Firefox, “use Hermione instead”. Cool cool cool cool cool… It’s not like I’ve been through this enough times already, fine. It’s all good. I’m fine. I’m good ahahaha.

Cooooooooooool…

Differencify report screenshot, over which I have drawn a cartoon, Earth-ish map using image diffing to show the difference global climate change and rising sea levels would make. I had fun drawing this so I’m including it. Also TIL that I don’t know off the top of my head what the world map looks like…

Mistake #5 Resource assumptions

So I spent an afternoon swapping over to Hermione, which is like an up-to-date version of Gemini using WebdriverIO. Lovely. I got it all working locally just fine, but as soon as I tried connecting up BrowserStack I ran into problems. The entire project is open-source (<3), but either the package that did all the BrowserStack integration wasn’t available for Hermione, or the documentation that would have made it clear was missing. Googling the issues just gives Harry Potter quotes and memes, predictably.

I was still trying to work out how to do all of the things when I made an even bigger realisation.

We don’t have the mobile automate package as part of our BrowserStack account. So all the going back and forth to make the original tests potentially work with mobile browsers was done under the false assumption that integrating with our existing cross-browser platform would cover them. It wouldn’t, at least not for the devices we most wanted coverage on.

I did petition for the better plan for a while by leaving very, very subtle hints in company demos such as “WOW IMAGINE IF WE HAD THE MOBILE AUTOMATE PLAN WE COULD DO THIS ON MOBILE AHAHAHA”, but no luck. We actually ended up downgrading the plan when the company phoenix’d…

Mistake #6 Feature creep

I still use the Puppeteer tests locally as part of my regression testing activities. I can’t say the same for any of the other stuff I did afterwards. When the idea first came into my head to take this project on, it was just going to help me save time and have a bit more confidence in quick releases. Over time, as I spoke about it to others, it became something more. It was going to run in CI, developers were going to run it, it would never ever be flaky, it was going to run on all the browsers and devices, and we would all dance for joy or something. I don’t know.

That does sound like an excellent plan, and by now I have written other tests that are part of our CI workflows, among other great things. It’s just that this was my first foray into automation, and I let myself be carried along by the excitement of those around me until the project grew into something I couldn’t achieve. I just didn’t have the skills or experience to build a test suite that good at that time.

The lessons

know what you actually want to do and why
be realistic about what you can achieve with your skills and resources (but play with more if you can)
PSYCH! MAKE LOADS OF MISTAKES THEY’RE THE BEST LOOK AT WHAT I LEARNT?!

This is making me think about one of my old driving instructors (I went through four, don’t judge me >.<‘), who used to say something along the lines of “There are no bad lessons. You either make no mistakes and practice good habits, or you make mistakes and learn from them.” He was a wise man, and I am very lucky to have been under his tutelage because he spouted amazing life advice like there was no tomorrow.

(He also taught me that the biggest thing you notice after moving away from home is the price of toilet roll. Such endless wisdom.)

If I could go back, I’d do it all again. From inside the project trying to make it all work, it was horrible and stressful and I felt awful about my abilities and knowledge a lot of the time. From the outside though, wow did I learn a lot:

Setting up projects and installing packages
The importance of planning test automation
The limitations of a bunch of tools!
Writing my own functions in JavaScript
How to integrate tests into a CI pipeline
Reading docs for usage
Better and more confident use of git commands

So yeah, as I said at the beginning: it was a series of mistakes, but not a mistake to make those mistakes.
