Our progress in DevOpsIng it up

When I first started at my current workplace just over six months ago, I had no concept of what a QA did, other than having watched an anime in which the protagonist’s friend gets a job as a video game tester. I didn’t know what systems or processes were common in the development world, let alone how they were important for testing.

From what I know of the company before I joined, until recently all testing was done by developers and it wasn’t a big focus – as you might expect from a startup. It seems to have been a get-the-code-running-and-through-the-door kind of deal, which is fair enough. However, that leaves us now in a place where there are relatively few automated tests, and one of our two QAs is so busy manually testing everything that they don’t have time to work out how to automate any of it.

So what’s a good starting point for a would-be automator? The actual starting point, surely. I think a good place to look forwards from is the present, and maybe the recent past. I’m going to talk about our release process, and what testing gets done at which stages.

How we released until recently (ie the triple dip hellscape)

We’ve recently adopted a new release process. Obviously this begins with the developers creating a product, fix or update. I’ve done some bits and pieces of this type of work, and I know how my team’s product works behind the scenes, but I’ll skip over this for today’s post.

Until a couple of months ago, we had a three-environment testing cycle, aka the triple dip hellscape. It would take a full week for even the simplest of changes to get through this process, and since we were only deploying once per week, anything that missed a deadline would take two weeks.

First off, we would test new changes in the Testing Environment using a testing tool made in-house – each time a new ticket was raised, the developer would create a bucket within the environment, effectively cutting off their changes from everything else. Each change for that week’s release would have its own bucket, except in some cases where the ‘default’ could be used. The new or updated product could be tested there, without affecting or being affected by any other new work done by other developers. Kind of like a fish in an actual bucket – there’s more water outside somewhere, and there are more fish. There’s bigger fish, there’s plankton to eat and places to swim to, but for now we’re literally just testing… does it look and act like a fish?


After passing on Testing the ticket then went through the process of being released to Staging, along with all the other changes going into this release. This would be done once per week, usually on a Thursday or Friday. On staging, all the new work for that week was tested manually a second time to see if the changes would interfere with each other. Staging is supposed to be a pretend-production, ie a somewhat realistic version of reality, using similar feeds and storage. Like moving our fish from the bucket into a small pond. Maybe a fish will eat another fish, or compete for food until one of them dies. Or perhaps the fish hates the temperature variable, or can’t handle dead leaves falling into the pond. For clarity, these are metaphors for bugs causing each other to break, or being broken by other factors such as our data feed, taking assets from the staging database, or text from our offer copy service. There are a lot more complex things that can break on staging.


This is also where UAT comes in. UAT is User Acceptance Testing, and is done by our client services team to make sure that any change matches what the client desires. At this point, the product has had three full rounds of testing, plus retesting bugs as they come up at each stage.


On Tuesdays or Wednesdays, again depending on how quickly testing went, we would deploy to production and then manually test everything there again. This would be a smoke test, ie just checking that changes were applied and nothing was broken. We’d also do a quick regression test by clicking around different clients’ products to see if anything unrelated to new changes had accidentally been broken in the process.


This type of cycle is incredibly slow and restrictive. If we wanted to add something into a release at the last minute, it would need to be pushed through two full rounds of testing before it could go in, making it nearly impossible to react to new needs – and if a bug came up in production, unless it was something major enough to necessitate an extra release, it would wait until the following week for a fix. For QA, the main issue was the need to test everything three times in a week. It was difficult to manage and keep track of ticket status. Although the above process might seem clear cut, it really wasn’t. We would be testing things in each stage at the same time – this week’s release would be in Staging at the same time as next week’s release would be in Testing, and QAs needed to switch back and forth. It got worse when there were emergency production deployments for an urgent fix or client request – we’d be testing tickets on Testing, Staging and Production at the same time.

Handover time was also bad, even when handing over to yourself. By the time a ticket got to the next environment, you’d already have tested five or six other tickets and forgotten the details of the first. We had multiple test spreadsheets for the same ticket in different environments, status spreadsheets that said what was in each environment, as well as notes and comments everywhere in Slack describing which tickets had to drop out of a release due to bugs brought up in staging not being fixed in time etc. It was a nightmare, and I will confess here and now that I had absolutely no idea whatsoever what I was doing. I relied on developers and my team manager to know where any given ticket was, and I’d have to ask for test links from the dev who had been working on a ticket because I would get the combination of test bucket and environment wrong.

The devs also had to manually release their fixes to each environment, keep track of their buckets and fix bugs in three different places. It was cumbersome, slow and unreactive.

And then there was DevOps

I don’t know how long this was in the planning, but I’d guess many months if not more than a year. DevOps is a really hot concept right now; the name is a blend of Development and Operations. It’s basically releasing small and fast, as often as you can. Obviously there’s more to it than that, but that’s the crux of it. Break things down into smaller jobs, release them fast and fix them fast.

We’re definitely not right up there on the forefront of this, as we don’t release every day (unless you count cross-team releases, in which case we do). My team releases three times a week to production on average, and tens of times a week to staging. Our products are broken down into many components, which are used across multiple products, so when we make changes to those components it: 1. makes it really fast to make changes; and 2. makes it really slow to test changes. We can’t really break down our client work into smaller tickets to release more often, as the development work across the whole shebang takes only half a day. There’s no point breaking that down, when it’s already so small. It then takes a day to manually test the entire product, and another half day to fix any bugs that come up. This is the main blocker to us achieving even faster releases. Having said that, we got four releases out for our team last week, and we aim to release daily.

We now work in slices. I drew a picture for the office when we first moved to this process, showing how I imagined it working.


First things first, we got rid of the testing environment. This alone shaves 24hrs off our time, reduces the complexity of a release and takes a lot of annoyance off the developers. There are no bugs that come up in testing that we can’t also report in staging.

Another big change is that it’s now the QA’s job to deploy code to the staging and production environments. We do this by slices – since we don’t want to add complexity by picking and choosing what to add in and what to leave out, we just take EVERYTHING that’s been Peer Reviewed and merged into the develop branch, into staging together. It’s done this way since it was easy to automate it and add a simple button in the CircleCI (Continuous Integration) workflow. The once laborious process is now a simple approve step. This means that everything that’s ready for testing goes into that slice, and the slice moves across the kanban board together. You can’t split up a slice, but you can add new work to it so long as it’s everything new that’s ready for testing. (Kanban is an agile methodology for development; the board shows what status a ticket is in, and tickets move from left to right towards Done.)
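To make the slice rules concrete, here is a minimal sketch in Python. It is only a model of the rules as described above, not our actual CircleCI setup: the `Slice` class, its method names and the ticket IDs are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Slice:
    """Hypothetical model: everything deployed to staging together moves as one unit."""

    tickets: set[str] = field(default_factory=set)

    def deploy_to_staging(self, merged_into_develop: set[str]) -> set[str]:
        """Take ALL merged work into the slice; return the tickets that are new to it."""
        newly_added = merged_into_develop - self.tickets
        # No picking and choosing: the slice absorbs everything that has been merged.
        self.tickets |= merged_into_develop
        return newly_added

    def ready_for_production(self, passed: set[str]) -> bool:
        """The slice can only be released once every ticket in it has passed testing."""
        return self.tickets <= passed


# Usage with made-up ticket IDs.
current = Slice()
current.deploy_to_staging({"T-1", "T-2"})
print(current.ready_for_production({"T-1"}))         # False: T-2 is still untested
print(current.ready_for_production({"T-1", "T-2"}))  # True: the whole slice can go
```

There is deliberately no method for removing a ticket, which mirrors the "you can’t split up a slice" rule.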

Once everything in the slice has passed this round of testing then it gets released straight to production. Whereas before, we would have to delay releases week after week waiting on UAT, only for them to find nothing major that couldn’t wait a week to be fixed, now we UAT in production. We can do this because we know that if an issue comes up, it can be fixed on production within 24hrs. A lot of the work we do is also behind-the-scenes so no one sees it until the client services team has connected it up on their side.

There is one glaring problem with this system, especially if you have tickets that cannot be broken down into small parts. That is: if I am testing a ticket on staging and it gets a bug, I add a bug ticket. This then goes into Ready For Testing, and I get to deploy staging again in order to bring the new change into the slice. However, this takes everything that’s been merged into develop, including non-related work. That means my slice just gets bigger and bigger every time I bring bugfixes in, and I can’t bring those fixes in without also bringing potentially breaking work into the slice – so I have to test those new changes too, and if there are bugs on them then it happens again.
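The snowball effect is easier to see with numbers. This is a rough sketch in Python with invented figures; `simulate_slice_growth` is a hypothetical helper, not something from our tooling.

```python
def simulate_slice_growth(initial_tickets, merges_between_deploys):
    """Return the slice size after each staging deploy.

    initial_tickets: number of tickets in the first deploy.
    merges_between_deploys: one (bugfixes, unrelated) pair per redeploy,
    counting what was merged since the previous deploy.
    """
    sizes = [initial_tickets]
    for bugfixes, unrelated in merges_between_deploys:
        # A redeploy cannot take only the bugfix: unrelated work comes along too.
        sizes.append(sizes[-1] + bugfixes + unrelated)
    return sizes


# Invented numbers: a 3-ticket slice, then three redeploys to pick up one bugfix each.
print(simulate_slice_growth(3, [(1, 2), (1, 1), (1, 0)]))  # [3, 6, 8, 9]
```

In this made-up run, three bugfixes triple the amount of work that has to be tested before the slice can be released.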

Unfortunately, we haven’t really solved this issue. The answer we came up with is: Gotta Go Fast. Bug fixes are prioritised by developers, so they get done as quickly as possible, tested even faster and the whole thing is released to Prod, hopefully before any new major work gets merged. We also have the good fortune of having a whole lot of work that isn’t live / available to clients or customers. That means that there are tickets that can stay in the Ready For Testing column even though they’re technically in production and haven’t been tested yet. They’re deemed to be extremely low-risk, and over time this has stopped making me anxious to press the Deploy button.

What do we need to change further?

Our aim is to release daily or multiple times daily, which is not doable when you have one QA who has to manually test everything, as well as check for regressions at each stage.

We have one set of automated Visual Regression tests (comparing screenshots of the current product against screenshots of the product before a change), which looks at the components our product is made up of. These are more reliable than they used to be, but still bring up a few false positives. We run these weekly, since they take 15-20 minutes to run each time. We have nothing for whole products or client-specific products, and nothing that tests cross-browser. There are a lot of tests we just don’t do, since I don’t have time to manually test them for each daily release but we have no automated tests for them either. For example, analytics testing or usability regression tests. You can’t always tell from a screenshot that there haven’t been any regressions – maybe a click-through is broken now, or a button has the wrong colour on hover. This can only currently be tested manually.
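For anyone who hasn’t seen visual regression testing, the core idea can be sketched in a few lines of Python. This is not the tool we use: screenshots are stood in for by plain nested lists of RGB tuples, and the function names, `tolerance` and `max_ratio` values are all assumptions for illustration. Allowing a small tolerance like this is one common way to cut down on spurious failures from tiny rendering differences.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def diff_ratio(baseline: Image, current: Image, tolerance: int = 0) -> float:
    """Fraction of pixels that differ by more than `tolerance` on any colour channel."""
    if len(baseline) != len(current) or any(
        len(a) != len(b) for a, b in zip(baseline, current)
    ):
        return 1.0  # a change in size is treated as a complete mismatch
    total = sum(len(row) for row in baseline)
    if total == 0:
        return 0.0
    changed = sum(
        1
        for row_a, row_b in zip(baseline, current)
        for p, q in zip(row_a, row_b)
        if any(abs(x - y) > tolerance for x, y in zip(p, q))
    )
    return changed / total


def has_visual_regression(baseline, current, tolerance=8, max_ratio=0.01):
    """Flag a regression when more than `max_ratio` of the pixels have changed."""
    return diff_ratio(baseline, current, tolerance) > max_ratio


# Usage: a 10x10 white "screenshot" versus one with five red pixels.
white = (255, 255, 255)
before = [[white] * 10 for _ in range(10)]
after = [row[:] for row in before]
for x in range(5):
    after[0][x] = (255, 0, 0)
print(has_visual_regression(before, after))  # True
```

A check like this only sees what is in the static screenshot, which is exactly why a broken click-through or a wrong hover colour slips past it.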

The main problem we have is time, then. QA doesn’t have time to manually test everything that needs testing, but also doesn’t have time to plan and automate those things. Where do we get that time from?

There has been talk every single time we run a retro or MakeOurLivesEasier meeting of more dev testing taking place. Currently, there seems to be a fence over which work is thrown to QA. It doesn’t get a real look until it gets to the other side of that fence, often with frustratingly obvious bugs that would have been faster and easier to fix before deployment to staging. In order for us to go fast and improve overall velocity, the developers need to actually go slow.

Another point to note is the line between which jobs are for a developer and which are for a QA. Developers have better language and writing knowledge, but QA has superior knowledge of testing best practices (ie the ‘journey’ being tested), so pairing on that work would be valuable for both teams.



