What is the future of Manual Testing?
There has been a dynamic shift in software engineering and software testing practices, from manual to automated, in most areas. With AI, machine learning, and the rise of automation tools, some think manual testing will be replaced by automated testing. This article will debunk this myth.

 

MYTH: Automation can completely remove the need for manual expertise

Quite a few practitioners believe this shift to be universal; in reality, this is not the case. Automation can never substitute manual methods to the fullest extent. There are certain critical areas that require manual supervision.

 

Manual Testing should not be replaced when…

It meets the requirements for ideal testing. Here are several areas where automation cannot substitute for manual testing.

  1. Small projects: the overhead of implementing an automated testing system is considerably higher than that of traditional manual testing. Incurring high setup costs for a small project is a waste of both money and effort.
  2. User experience: A human can better understand another human. Expert manual testers imitate user behavior and then analyze the software according to users' needs and demands. This results in a better user experience, which is almost impossible to achieve in automated testing without the help of a highly trained AI.
  3. Going into minute details: Automation works on fixed, predetermined testing procedures and is oftentimes not customizable. Though the results may come quicker, minute defects can be overlooked. Manual testing is therefore needed to catch any bugs that automation would neglect.
  4. The high maintenance cost of automation: Automation may exceed the budget of some smaller organizations. They may prefer manual testing to escape the high setup and maintenance costs of automated methods.
Manual Testing should be replaced when…
It falls short of the requirements for ideal testing. Here are a few points where it is preferable to substitute manual work with automated techniques.
  1. For repetitive steps: Automation is excellent for repetitive steps within the testing process, which do not necessarily require manual expertise or supervision. Therefore, automation should be adopted to save manpower, time, and energy (see the sketch after this list).
  2. Saving time: There is no need to redefine the testing parameters every time part or all of a piece of software is checked. Automated tests are reusable, hence they save time and reduce the chance of delays in software releases.
  3. Reducing human errors: When properly set up, automated checks run with machine precision, preventing the human errors that creep into repetitive manual testing and making it a more reliable method.
  4. Works on complicated code: Automated methods are designed to work on all kinds of code and programs, including complicated ones. Manual methods may prove less useful when the code is complicated or new to the testers.
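To make the repetitive-steps point concrete, here is a minimal JUnit 5 sketch: one reusable check replayed over several inputs, with no human re-typing anything between runs. The class, the login helper, and the credentials are hypothetical stand-ins rather than a real system.

```java
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;

class LoginCheckTest {

    // One reusable check, executed once per input row: exactly the kind of
    // repetitive step that is tedious and error-prone to perform by hand.
    @ParameterizedTest
    @CsvSource({ "alice, secret1", "bob, secret2", "carol, secret3" })
    void validUsersCanLogIn(String user, String password) {
        // login(...) is a hypothetical call into the system under test.
        assertTrue(login(user, password));
    }

    private boolean login(String user, String password) {
        // Stub implementation so the sketch is self-contained.
        return user != null && password.startsWith("secret");
    }
}
```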

In conclusion, both methods have their own merits and demerits. Note that neither can substitute the other to the fullest extent. It is true that many of the leading testing agencies are going down the path of automated testing. But as explained above, manual testing will never be fully replaced; instead, the proper balance between manual and automated testing should be practiced.

Web Application Security Testing: What? How? Why?

The what…

 

According to Techopedia, web application security testing is the process of measuring, analyzing, and summarizing the security level and/or posture of a web application.

Web developers and security administrators use this form of security verification to test and gauge the security strength of a web application, using manual and automated security testing techniques. What is the main objective? To identify any vulnerabilities or threats that can jeopardize the security or integrity of the web app.

Typically, this security test is performed after the web app is developed. The web app undergoes a meticulous testing process that includes a series of fabricated malicious attacks to see how well the Web application performs/responds.

The overall security testing process is followed by a report that includes:  (1) identified vulnerabilities, (2) possible threats, and (3) recommendations for overcoming the security shortfalls.

The how…

Below are the three different approaches to web application security testing. 

  1. Dynamic Application Security Testing (DAST)
  2. Static Application Security Testing (SAST)
  3. Application Penetration Testing
Dynamic Application Security Testing (DAST)

This approach looks for vulnerabilities in a web app that an attacker could try to exploit. DAST works to find which vulnerabilities an attacker could target and how they could break into the system from the outside. Since the app’s original source code is not needed, testing with DAST can be done quickly and frequently.
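As a flavor of what an automated, outside-in check can look like, here is a minimal, hypothetical sketch that probes a running application for common security response headers. The target URL is a placeholder, and a real DAST tool does far more (crawling, injection payloads, session handling).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

public class SecurityHeaderCheck {

    public static void main(String[] args) throws Exception {
        // Placeholder target: point this at a test instance you own.
        URI target = URI.create("https://staging.example.com/");

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(target).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        // Headers a hardened web application is commonly expected to send.
        List<String> expected = List.of(
                "Strict-Transport-Security",
                "X-Content-Type-Options",
                "Content-Security-Policy");

        for (String header : expected) {
            boolean present = response.headers().firstValue(header).isPresent();
            System.out.printf("%-30s %s%n", header, present ? "OK" : "MISSING");
        }
    }
}
```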

Static Application Security Testing (SAST)

This is more of an inside-out approach. SAST looks for vulnerabilities within the app’s source code. Since it requires this form of access, i.e. access to the web app’s original source code, SAST offers a real-time snapshot of the web app’s security.

Application Penetration Testing

Application penetration testing brings in the human element. A security professional will try to duplicate how an attacker would break into the web app by using their security know-how and a variety of penetration tests to exploit any potential flaws. This can be outsourced to a web application penetration service provider if in-house resources are limited.

The why…

 

If your application is not tested and validated against security threats right from the initial stages of development, it may fail to protect valuable corporate data and resources from malicious attacks. 

To build a highly secure web application, it is vital to follow a secure development lifecycle. Security is a key element that should be considered throughout the application development lifecycle, especially when the application is designed to deal with critical business data and resources. Web application security testing ensures that the information system is capable of protecting the data and maintaining its functionality.

The process encompasses analyzing the application for its technical flaws, weaknesses, and vulnerabilities, right from the design and development phase. The primary objective is to identify the potential risks and subsequently, fix them before the final deployment.

Working from Home: Tips to Get You Through the COVID-19 Crisis
Having the flexibility to work from home on a regular basis, or as and when needed – such as in the midst of a coronavirus pandemic – certainly has its advantages.

But for some professionals, it’s a completely new concept that has never been tried and tested – how can you best collaborate with colleagues and clients? How do you stay visible and keep up productivity?

If your company is planning on implementing home working contingencies, here are a few remote working tips to help you get the most out of your new working environment.

Access to Hardware, Software, and Data

Hardware and Internet Requirements

Before you start working at home, you need the right hardware in place. Do you have a company laptop, a company desktop or a home computer (i.e. personal computer) that is appropriate for home use? Connect with your IT department and make sure that you have the right hardware to get started.

Next, think about your internet service. Working remotely requires reliable internet. A wired connection is preferable, but a decent wireless connection can also work. Although unlimited plans are common in the USA, keep in mind that not all plans provide unlimited data, so data caps could be a challenge with the large datasets your team may often work with.

With a home internet service, consider bandwidth – both up and down speeds (Autodesk recommends symmetrical 25 Mbps), and find out if you have data caps and what the overage rates are.

Stay Visible And Over Communicate

Next up on our list of working from home tips – visibility and communication.

One of my favorite tools that I frequently use when working remotely is Slack – an instant communication tool that helps keep teams connected. It’s the first thing that I open and check when I power up my laptop each morning.

Slack: A Communication Tool for Teams
Often used by both on-site and off-site employees, tools like Slack can help to quickly communicate company-wide messages via dedicated announcement channels and provide a place for teams to connect and keep projects moving with messaging streams that eliminate the need for endless email threads.

The instant messaging functionality means that you can get fast answers to your questions, either via private one-to-one or group messages.

And with a variety of status options, it’s easy to stay visible and share your availability with colleagues – whether you’re at your desk, in a meeting, or even out sick or on vacation.

When working remotely, it’s important to stay visible and over-communicate. Those software-development conversations that happen when you see a colleague in the office or coming out of a meeting aren’t as common when you work from home, so it’s a good idea to proactively use tools such as instant messaging to keep collaboration channels open and to feel better connected with your teams.

Slack offers a lot of great advice for getting the most out of Slack as a remote worker – check out their tips: Your Guide to Working Remotely in Slack.

Use Tools to keep Collaborating with Colleagues and Clients

With team members and clients based afar, there will be times when you need to connect and collaborate virtually, and to do this, you’ll need to have the right tools in place.

Online Video Meetings

Video meetings offer an affordable way for companies of all sizes to meet online and are particularly popular when travel is not possible.

With the aid of screen sharing, you can share whatever is on your screen – so it’s still possible to walk colleagues and clients through your design concepts and deliver a presentation.

It’s also possible to record meetings and share that recording with non-attendees. Meeting notes can even be transcribed automatically by some meeting tools, helping everyone keep track of key decisions and action items.

If you plan on using a webcam, then be mindful of how your background will appear – you may be working from home, but you still want to look professional. Alternatively, you could look into Microsoft Teams, whose meeting tool automatically blurs your background for you.

Project Management Solutions

Another collaboration tool worth exploring is project management software. Particularly useful when working as part of a distributed team on small or complex projects – project management tools provide excellent visibility into open and completed tasks and help teams hit their deadlines and reach their goals.

Cloud-Based Documents

Cloud-based collaboration tools such as Google Docs, are ideal for when you want others to view, edit and comment on your documents simultaneously.

It is simple to create your document and share it with your teammates. You can get a shareable link, or email your document directly from the Google Doc itself. Make sure that you enable editing rights to allow others to make edits. Click: Share > Advanced > Change > On – Anyone with the link > Access: Anyone ‘Can edit’:

Google Docs: How to Enable Editing Rights

Still Dress for Work

Although we’re in the comfort of our own homes and it may be tempting to dress in sweats or stay in our PJs, studies have shown that what we wear influences how we act.

Getting dressed for work immediately signals that you’re in ‘work mode’, inspires abstract thinking, and gives you a feeling of competency that can help boost your working day.

Not to mention that if you’re invited to an unplanned last-minute video meeting, you’ll look and feel more professional if you’re dressed for work, rather than home.

Health is Wealth

Most of us have probably heard that sitting at a desk for extended periods is not great for you.

A desk that offers both seated and standing options is ideal for both home and office use. But if you don’t have one, you’ll need to build in regular breaks and raise your screen to prevent discomfort.

If you are using a laptop and have a spare keyboard or can bring your keyboard home from the office, raise your laptop and connect your keyboard so that your eyes are in line with the top of the screen. This is far better for your neck.

Another great thing to do when working from home is to deliberately use a small glass of water so that you have to regularly get up to re-fill it. Your back and legs appreciate the short walk to the kitchen, and your eyes are also thankful for a break from your screen.

One final piece of advice is to eat lunch away from your desk and even slip in a quick walk if you can. This gives your brain the chance to be distracted, which can lead to greater creativity, it helps you rack up your steps for the day, and it also keeps your work area crumb-free, with fewer germs!

Here are a few additional tips that apply to both office and remote workers from Lifehack: How You Can Stay Healthy Even Though You Sit At A Desk All Day

Create the right Working Environment

Lastly, if you’re working from home, try to find the best place to work that will be comfortable, practical, yet quiet enough for you to be able to concentrate.

If you miss the buzz of the office and are finding it strange working in a quieter environment, don’t be tempted to turn on the TV – it will more than likely distract you.

Instead, choose instrumental music or songs that are familiar to you – YouTube offers an enormous number of classical tracks that can provide calming background music that may even stimulate your creativity and productivity.

Your basic Allowlist-testing Tools

If you are new to the concept of allowlist-testing and just want to try it without too many strings attached, what are your basic tool options?

At retest, we are big fans of allowlist-testing. We honestly think allowlist-testing is the future of testing for any interface. For those unfamiliar with the terms: denylist-testing is typical assertion-based testing, which denies specified changes and ignores all else. Allowlist-testing guards against all changes, except for the changes you specified as irrelevant, i.e. allowlisted. Other names for this technique are whitelist/blacklist testing (terms now deprecated for their racial connotations), difference testing, snapshot testing, Golden Master testing, approval testing, or characterization testing.

We wanted to come up with an executable demo of a tool with allowlist-testing capabilities. It should allow you to focus on the concept and to play around with it to get a first impression, with the least amount of overhead possible. Such a demo tool can be used in workshops and conferences, tweeted, or otherwise easily shared.

For that, there are 3 simple requirements:

  • The tool should be hassle-free and simple to install and play around with, producing results with as little overhead and as few prerequisites in terms of knowledge and technology as possible.
  • The tool should demo the many advantages of allowlist-testing.
  • It should be vendor-agnostic, i.e. the tool should ideally be open source and as little commercial as possible.

This list of requirements is short and relatively straightforward, and there are many implementations of allowlist-testing. Yet in my opinion, candidates fulfilling those criteria are hard to come by.

Hassle-free and simple to try Allowlist-testing tools

The requirement that the tool be hassle-free and simple to try disqualifies quite a lot of interesting candidates. Most open-source frameworks are test frameworks that are used during development by developers. That means you need to set up a project environment with some build mechanism for test execution.

Jest

Being used in over 1,000,000 public repos on GitHub, Jest is a very popular example of allowlist-testing—if not the most popular example. However, it only works with JavaScript in a Node, React, or Angular project environment. And setting up yarn or Node.js in a project just to execute a few tests for a short demo is definitely neither simple nor hassle-free.

Approval Tests

The same applies to another great example of an allowlist-testing tool, Approval Tests. Approval Tests is available as a library on almost any platform (Java, C++, NodeJS, Python, etc.). It mainly focuses on technical interfaces and works great with XML and other technical formats. The barriers to using it are a lot lower than Jest’s: you only need a test that can be executed on any of the supported platforms, e.g. in Java. It is open source and free to use, so Approval Tests fulfills both criteria.
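For illustration, a minimal Approval Tests sketch in Java might look like the following. On the first run it produces a received file that you inspect and approve as the Golden Master; subsequent runs fail on any difference. The report string is a stand-in for real output of a system under test.

```java
import org.approvaltests.Approvals;
import org.junit.jupiter.api.Test;

class ReportTest {

    @Test
    void reportMatchesApprovedGoldenMaster() {
        // Stand-in for the real output of the system under test.
        String report = "Order #42\nTotal: 13.37 EUR\nStatus: shipped";

        // First run: writes a *.received.txt file for you to inspect
        // and approve. Later runs: fails on any difference to the
        // approved Golden Master file.
        Approvals.verify(report);
    }
}
```

Intended changes are then handled by approving the new output, i.e. replacing the Golden Master, rather than by editing assertions one by one.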

TextTest

Another great tool is TextTest, which runs on Python and can be combined with a number of other platforms — Java Swing, SWT, and Tkinter. To set it up, you need to install Python and a few more components, depending on the platform of your choice. However, it is very much geared towards tests written in a domain language. For a more visual test of e.g. a GUI (custom or web), it needs some more tooling to drive the GUI. For Python and Java GUIs there is StoryText, which is specially designed to work with TextTest.

recheck-web

recheck-web is open source and comes with a Chrome extension that can be used to easily try it. The Chrome extension is simple to install and simple to remove. In order for the Chrome extension to run without any additional setup costs, it sends the data to retest.org. To guard your sensitive test data, you need to create a free account before trying it. The detailed results are in a proprietary format that you need an open-source CLI or a free GUI to open. The GUI comes in a self-contained ZIP file.
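Beyond the Chrome extension, recheck-web can also be used directly from a Selenium test. Here is a minimal sketch along the lines of the recheck-web examples; the demo URL is illustrative, so substitute your own application.

```java
import de.retest.recheck.Recheck;
import de.retest.recheck.RecheckImpl;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

class MyRecheckWebTest {

    private WebDriver driver;
    private Recheck re;

    @BeforeEach
    void setUp() {
        driver = new ChromeDriver();
        re = new RecheckImpl();
    }

    @Test
    void index() {
        re.startTest();
        driver.get("https://assets.retest.org/demos/app/demo-app.html");
        // One check captures the whole page as the Golden Master,
        // instead of many hand-written single assertions.
        re.check(driver, "open");
        re.capTest();
    }

    @AfterEach
    void tearDown() {
        re.cap();
        driver.quit();
    }
}
```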

Demo the Advantages of Allowlist-testing

Although pixel comparison is a form of Golden Master testing, a tool must implement some mechanism to allowlist changes (i.e. not notifying the user when they occur) in order to also count as an allowlist-testing tool. This is important so that the tool can be used in test automation on a regular basis, without reporting (too many) false positives. Many tools fail at this.

In the long term, mere pixel comparison is of limited value. A good tool should both allow you to mass-accept changes, as allowlist-testing often creates redundancies, and provide convenient ignore options.

Be Open source and not Commercial
The remaining tools are commercial; none offer an open-source version of their testing tool.

Summary

So, what do you choose when you want to demonstrate allowlist-testing to a group of testers with diverse backgrounds (i.e. who are used to working on different platforms)? Do you happen to know any other tool options we forgot?

Whitelist Testing vs. Blacklist Testing

Current approach to GUI test automation

From an IT security point of view, the current approach to GUI test automation is careless or even dangerous. And here is why…

A general principle in IT security is to forbid everything and only allow what is really needed. This reduces your attack surface and with it the number of problems you can encounter. For most situations (e.g. when configuring a firewall), this means to apply a whitelist: forbid everything and allow only individual, listed exceptions. And make sure to review and document them.

Compare this to the current state of the art in test automation of software GUIs. With tools like Selenium — the quasi-standard in web test automation — it is the other way around. These tools allow every change in the software under test (SUT) unless you manually create an explicit check. With regard to changes, this is a blacklisting approach. If you are familiar with software test automation, you know that this is for good reasons: both the brittleness of every such check and the maintenance effort it brings about. But apart from why it is that way, does it make sense? After all, false negatives (missing checks) will erode trust in your test automation.

To be defensive would mean to check everything and only allow individual and documented exceptions. Every other change to the software should be highlighted and reviewed. This is comparable to the “track changes” mode in Word, or to version control as used for source code. And it is the only way not to miss the dancing pony on the screen that you didn’t create a check for. At the end of the day, this is what automated tests are for: to find regressions.

In practice, the ideal lies somewhere between the two extremes. There are two important considerations when choosing between the approaches:

  1. How do you reach that middle ground in the most effective way?
  2. Which “side” is less risky to err towards if the perfect spot is missed?

IT security guidelines recommend to err on the side of caution. So in case both approaches create an equal amount of effort, you should choose whitelisting. But, of course, you usually don’t have equal amounts of effort.

A real-life example

Imagine you have software that features a table. In your GUI test, you should put a check on every column of every row. With seven columns and seven rows, this would mean 49 checks — just for the table. And if any of the displayed data ever changes, you have to copy & paste the changes manually to adjust the checks.
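To see the maintenance burden, here is a hypothetical Selenium sketch of the blacklisting approach: one hard-coded assertion per cell, each of which breaks whenever the data changes. The locators and expected values are placeholders.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

class TableChecks {

    // Blacklisting approach: every cell needs its own explicit check,
    // 49 checks in total for a 7x7 table. All expected values are
    // placeholders, maintained by hand.
    static void checkTable(WebDriver driver) {
        List<WebElement> rows = driver.findElements(By.cssSelector("#orders tr"));
        List<WebElement> firstRow = rows.get(1).findElements(By.tagName("td"));
        assertEquals("Order #42", firstRow.get(0).getText());
        assertEquals("13.37 EUR", firstRow.get(1).getText());
        // ... 47 more assertions, each updated manually when the data changes
    }
}
```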


Starting with a whitelisting approach, the complete table is checked by default. You then only need to exclude volatile data or components (typically the build number or the current date and time). And if the data ever changes, maintaining the test is way easier, because you usually (depending on the tool) have efficient ways to update the checks. Guess which of the two approaches is less effort…

Text-based vs pixel-based whitelist tests

There are already tools out there that let you create whitelist tests. Some are purely visual/pixel-based, such as PDiff, Applitools, and the like. This approach comes with its benefits and drawbacks. It is universally applicable — no matter if you check a web site or a PDF document. But on the other hand, if the same change appears multiple times, it is hard to treat it in one go. Whitelisting of changes (i.e. excluding parts of the image) can be a problem.

Approval Tests and TextTest are text-based tools that are much more robust. But PDFs, web sites, or software GUIs have to be converted to plain text or images for comparison. Ignoring changes is usually done via regular expressions.

What is test automation?

What is Test automation?

Test automation is the automated execution of predefined tests. A test in this context is a sequence of predefined actions interspersed with evaluations that James Bach calls checks. These checks are manually defined algorithmic decision rules that are evaluated at specific and predefined observation points of a software product.

And herein lies the problem. If, for instance, you define an automated test of a website, you might define a check that ascertains that a specific text (e.g. the headline) is shown on that website. When executing that test, this is exactly what is checked—and only this. So if your website looks like the one shown in the picture, your test still passes, making you think everything is ok. A human, on the other hand, recognises with a single glance that something has gone awry.
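A minimal, hypothetical sketch of such a check in Selenium: it asserts only the headline text, so it keeps passing no matter how broken the rest of the page is. The URL and expected text are placeholders.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

class HeadlineCheckTest {

    @Test
    void headlineIsShown() {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://example.com/"); // placeholder URL
            // The one and only check in this test: the headline text.
            // A broken layout, missing images, or garbled styling all
            // pass, because nothing else is ever looked at.
            assertEquals("Example Domain",
                    driver.findElement(By.tagName("h1")).getText());
        } finally {
            driver.quit();
        }
    }
}
```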

What is testing?

Testing as a craft is a highly complex endeavour, an interactive cognitive process. Humans are able to evaluate hundreds of problem patterns, some of which can only be specified in purely subjective terms. Many others are complex, ambiguous, and volatile. Therefore, we can only automate very narrow spectra of testing, such as searching for technical bugs (i.e. crashes).

What is more important is that testing is not only about finding bugs. As the Testing Manifesto from Growing Agile summarises very illustratively and to the point, testing is about getting to understand the product and the problem(s) it tries to solve and finding areas where the product or the underlying process can be improved.

It is about preventing bugs, rather than finding bugs and building the best system by iteratively questioning each and every aspect and underlying assumption, rather than breaking the system. A good tester is a highly skilled professional, constantly communicating with customers, stakeholders and developers. So talking about automated testing is abstruse to the point of being comical.

Why do we automate tests in the first place?

Because we have to; there is simply no other way. Development adds up, testing doesn’t. Each iteration and release adds new features to the software (or so it should). And they need to be tested, manually.

But new features also usually cause changes in the software that can break existing functionality. So existing functionality has to be tested, too. Ideally, you even want existing functionality to be tested continuously, so you recognise quickly if changes break existing functionality and need some rework. But even if you only test before releases, in a team with a fixed number of developers and testers, over time, the testers are bound to fall behind. This is why, at some point, testing has to be automated.

Considering all of its shortcomings, we are lucky that testing existing functionality isn’t really testing. As we said before, real testing is questioning each and every aspect and underlying assumption of the product. Existing functionality has already endured that sort of testing. Although it might be necessary to re-evaluate assumptions that were considered valid at the time of testing, this is typically not necessary before every release and certainly not continuously.

Testing existing functionality is not really testing. It is called regression testing, and although it sounds the same, regression testing is to testing like pet is to carpet—not at all related. The goal of regression testing is merely to recheck that existing functionality still works as it did at the time of the actual testing. So regression testing is about controlling the changes of the behaviour of the software. In that regard it has more to do with version control than with testing.

In fact, one could say that regression testing is the missing link between controlling changes of the static properties of the software (configuration and code) and controlling changes of the dynamic properties of the software (the look and behaviour). Automated tests simply pin those dynamic properties down and transform them to a static artefact (e.g. a test script), which again can be governed by current version control systems.

5 Reasons to Automate Testing

This sort of testing (I’d rather call it checking) can be automated. And it should be automated for several reasons:

  • In the long run, it is cheaper to automate it.
  • It can be done continuously, giving you faster feedback on whether a change has broken existing functionality.
  • As the software grows, your testers will not be able to perform it to the full extent necessary anymore, because development adds up—testing doesn’t.
  • It is a trivial yet, in its repetitiveness, boring and exhausting task that insults the intelligence and abilities of any decent tester and keeps them from their actual work.
  • Worse yet, testing the same functionality over and over again makes testers routine-blinded and makes them lose their ability to question assumptions and spot improvement potential.

Test automation is an important part of overall quality control, but since it is not really testing, the term “automated testing” is very misleading and should be avoided. This also emphasises that test automation and manual testing do complement each other, not replace each other.

Many people have tried to make this point in different ways (e.g. this is also the quintessence of the discussion about testing vs. checking, started by James Bach and Michael Bolton). But the emotionally loaded discussions (because it is about people’s self-image and their jobs) often split discussants into two broad camps: those who think test automation is snake oil and should be used sparsely and with caution, and those who think it is a silver bullet and the solution to all of our quality problems. Test automation is an indispensable tool of today’s quality assurance, but like every tool it can also be misused.

TL;DR:

Testing is a sophisticated task that requires a broad set of skills and with the means currently available cannot be automated. What can (and should) be automated is regression testing. This is what we usually refer to when we say test automation. Regression testing is not testing, but merely rechecking existing functionality. So regression testing is more like version control of the dynamic properties of the software.

Your 2 Basic Visual Regression Testing Tools

Can you spot the visual bug on Netflix (still live as of August 2019)?

No wonder there is a trend towards visual regression testing of websites. Most current approaches to the problem are pixel-based, meaning that they compare screenshots of the pages pixel by pixel. Deep visual regression testing instead considers the CSS attributes of all elements, giving you unique advantages.

In this article, we will compare your 2 basic visual regression testing options and see which offers the most benefit for identifying changes within your web application.

Pixel-comparison based visual regression testing

When you see that Netflix has had a simple visual bug on its website for over three months now (as of August 2019), the trend towards visual regression testing of websites is understandable. This approach guards you against unexpected changes (for which writing assertions is impossible) and is much more complete than assertion-based testing. Most current approaches today are pixel-based, meaning that they compare screenshots of the pages pixel by pixel. This makes a lot of sense:

  • The first version of a pure pixel-diffing tool is easy to implement (see the sketch after this list).
  • It works for any browser, app, or other situation, as long as a screenshot can be retrieved.
  • It gives instant results.
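To illustrate how easy a first version is, here is a minimal, hypothetical pixel-diffing sketch in plain Java. The file names are placeholders, and real tools add reporting, thresholds, and anti-aliasing tolerance on top.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

public class PixelDiff {

    public static void main(String[] args) throws Exception {
        // Placeholder file names for the baseline and current screenshots.
        BufferedImage expected = ImageIO.read(new File("baseline.png"));
        BufferedImage actual = ImageIO.read(new File("current.png"));

        if (expected.getWidth() != actual.getWidth()
                || expected.getHeight() != actual.getHeight()) {
            System.out.println("Size differs; pages cannot be identical.");
            return;
        }

        // Count pixels whose RGB value differs between the screenshots.
        long diffPixels = 0;
        for (int y = 0; y < expected.getHeight(); y++) {
            for (int x = 0; x < expected.getWidth(); x++) {
                if (expected.getRGB(x, y) != actual.getRGB(x, y)) {
                    diffPixels++;
                }
            }
        }
        System.out.println("Differing pixels: " + diffPixels);
    }
}
```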

However, there are also some downsides to pixel-based visual regression testing:

  • Similar changes cannot easily be recognized: e.g. if the header or footer of the site changes, this affects all tests.
  • You usually get a lot of false positives, as even a small change can result in many elements changing e.g. position.
  • Filtering these false positives is tricky, because it can result in either too many false positives (irrelevant changes being reported) or false negatives (important changes being missed).

In the below example, you see all of this play out. The demoed tool uses an AI algorithm to filter the differences for artifacts. As you can see, it produces both false negatives – the added colon after “Password” being missed – and false positives – the “Remember Me” checkbox did not change but was merely moved to the left as a whole.

Deep visual regression testing

recheck-web goes a different route and compares all rendered elements and their respective CSS attributes. So instead of being told that the pages differ in pixels, where a human has to review the difference and interpret it, recheck reports the exact way in which they differ:

The difference as recheck reports it, displayed in the review GUI

As you can see, the text of the button changed from “Sign in” to “Log in”, and the type of the element changed from a (a link) to button. Also, the class changed from btn-primary to btn-secondary. All other changes to the button are probably a result of those last two changes. The changes in the labels (added colons) were truthfully reported.

Since these are now semantic changes in contrast to pixel differences, it is easy to add rules and filters for handling them. Both the review GUI and CLI come with predefined filters. For instance, for the given situation, you could choose to ignore all invisible differences (class, type, background color, etc.). You could also choose to ignore all changes to any CSS style attributes, focusing only on relevant content changes (i.e. text).

Filtering changes

In the case of the changed login button, filtering all of these would show only the changes below.

This powerful mechanism lets you quickly focus on changes that are relevant to you. This works for CSS animations (try pixel-diffing that: http://www.csszengarden.com/215/) as well as for websites that are completely different in layout, but not in content. With this mechanism, you can easily ignore font but not text or color but not size.

It lets you see where these two differ in content:

Even better – you can create your own filters using a simple git-like syntax. For instance, you can filter specific elements (e.g. of tag meta) with: matcher: type=meta. To filter attributes globally (e.g. for the class attribute), use: attribute=class. To ignore attributes e.g. of specific elements (e.g. alt of images), use: matcher: type=img, attribute: alt. You can also use regex for both elements or attributes: attribute-regex=data-.*.
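Collected in a filter file, the rules from this paragraph might look like the following sketch; it uses exactly the syntax above, with comment lines assumed for readability (see the documentation for the exact file format).

```
# Ignore <meta> elements entirely.
matcher: type=meta

# Ignore the class attribute on all elements.
attribute=class

# Ignore the alt attribute of images.
matcher: type=img, attribute: alt

# Ignore all data-* attributes.
attribute-regex=data-.*
```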

More details and examples can be found in the documentation.

Use recheck-web in your own automated tests (https://retest.de/recheck-open-source/) or demo it using the Chrome Extension (https://retest.de/recheck-web-chrome-extension/).

3 Ways AI will transform the testing process

To what extent will AI fundamentally change the way we work? What about the change management process? Will AI augment or replace testers? How do we introduce an AI initiative and ensure we are taking the testing team along with us on the journey?

Although biology often inspires human innovation, it hardly leads to a direct implementation. Birds taught humans that flying is possible and inspired human creativity for centuries. But the design of today’s planes and helicopters does not have much in common with their biological role models.

As humans learn and apply principles, we adapt them to our needs. Instead of creating mechanical legs for our vehicles that can climb over obstacles, we removed the obstacles and paved the way for our wheeled transportation—which happens to be both faster and more efficient.

The same will be true for our AI efforts in testing: hardly will they be a faithful recreation of human testing efforts. To better understand where AI could be applied in the overall testing process, we need to break down the individual tasks and challenges of a tester.

Just as a motor is no direct replacement for a muscle, we need to understand the underlying motivation for each task and how it interplays with the overall testing goals, so that we can envision how the process could be improved and altered while the goals are still being served. So, in the following, we are talking about goals, not the actual tasks of human testers.

On a very coarse level, testing can be divided into two situations, where it is applied:

  • Testing new software and functionality
  • Testing existing software and functionality

Testing new functionality

New functionality requires thoughtful testing. We must make sure the new functionality makes sense, adheres to UX design principles, is safe and secure, is performant, and just generally works as intended. More formally, the ISO 25010 standard defines 8 main characteristics for product quality, which we will address individually:

  • Functionality (Completeness, Correctness, Appropriateness)
  • Performance (Time behavior, Resource utilization, Capacity)
  • Compatibility (Co-existence, Interoperability)
  • Usability (Operability, Learnability, User error protection, User interface aesthetics, Accessibility, Recognizability)
  • Reliability (Maturity, Availability, Fault tolerance, Recoverability)
  • Security (Confidentiality, Integrity, Non-repudiation, Accountability, Authenticity)
  • Maintainability (Modularity, Reusability, Analysability, Modifiability, Testability)
  • Portability (Adaptability, Installability, Replaceability)

Asserting correct and complete functionality is basically an AI-complete problem, meaning that the AI needs to be at least as intelligent as a human to be able to do it. For example, when searching for a first and last name on a social network site like Facebook, the search should return all people with the specified names. When doing the same on a privacy-sensitive site like Ashley Madison, this would be a severe problem.

Whether any given functionality is correct or faulty generally lies in the eye of the beholder. This problem is called the oracle problem, because we would need a Delphian oracle to tell whether a certain displayed functionality is correct. That means that, in the foreseeable future, we cannot use AI to test for the correctness of software functionality.

Performance criteria, on the other hand, usually can be specified in a simple and very general manner—e.g. a site should not take longer than 2 seconds to load, and after pressing a button, feedback should arrive within 500 milliseconds. So AI could test for performance, and indeed, products doing that are already available.
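Such a criterion is simple enough to check mechanically. A minimal, hypothetical sketch (the URL is a placeholder; real performance tests take many samples under controlled load):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class LoadTimeCheck {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("https://staging.example.com/")) // placeholder
                .build();

        long start = System.nanoTime();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        Duration elapsed = Duration.ofNanos(System.nanoTime() - start);

        // The general rule from the text: a page should load in under 2 seconds.
        System.out.printf("Status %d in %d ms: %s%n",
                response.statusCode(),
                elapsed.toMillis(),
                elapsed.toMillis() < 2000 ? "OK" : "TOO SLOW");
    }
}
```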

Compatibility can have many different meanings. Some widespread instances of compatibility testing, like cross-browser testing, cross-device testing, or cross-OS testing, which focus mainly on design and functionality, can easily be automated. And again, we are already seeing products for that. Other compatibility issues are much more subtle, technically oriented, or specific. Developing specialized AI in those cases is often prohibitive for economic reasons.

Usability is as yet hard for current AI systems to analyze, although this may be a promising avenue in the future. Interestingly enough, improving the usability of a software product may also improve the ability of AI systems to understand and test the software, thus creating further incentives to do so.

Even without AI, there already exists software that analyzes some aspects of the reliability of a software system, such as fault tolerance and recoverability. AI will only improve such analysis and yield better results. Other aspects like maturity and availability are more connected to the long-term usage and operation of such systems and are generally hard to test for—even for humans.

Also for security, there already exists software that tests for some aspects, using existing and well-known attack scenarios. Apart from such standard attacks, security in general is very hard to test for. Security analysts are usually highly paid professionals who are very well-versed in their field and ingeniously combine various specific aspects of the system to find new weaknesses and loopholes. If business functionality is hard to test with AI, security (apart from known attacks) is the royal discipline that will be tackled last.

Maintainability and portability are usually more internal aspects of the software system, very relevant to the development and operation of the system, but hardly tested for.

The ISO 25010 standard also defines 5 characteristics for quality in use:

  • Effectiveness
  • Efficiency
  • Satisfaction (Usefulness, Trust, Pleasure, Comfort)
  • Freedom from risk (Economic, health and safety and environmental risk mitigation)
  • Context coverage (Context completeness and Flexibility)

As is obvious, these characteristics all relate to the outcome of human interaction with the software. As such, they are highly personal and can hardly be quantified and tested for in a systematic manner.

It is also clear that, although the aforementioned characteristics are all important for a software product, they hardly account for equal amounts of testing effort in the field. Numbers are hard to come by, but it seems clear that testing for correct and complete functionality is the lion’s share of the effort. Unfortunately, this is also the aspect where, following the oracle problem, we said that we couldn’t employ AI to help us.

But not so fast: A huge part of testing for correct functionality of software is not done on new software, but on existing software. Maybe this could somehow remedy the problem?

Testing existing functionality

Software is very unlike many things we encounter in the non-digital world. If, e.g., we repair the front light of a car, we do not need to test the horn. But because software has so many invisible and unknown inter-dependencies, making a change to one part of a software system could have unforeseen and unintended side-effects on basically any other part of the system.

Therefore, it is necessary to retest already tested and approved functionality, even if it wasn’t changed, just to make sure that it did not change indeed. This form of testing is called regression testing, and it makes up a significant amount of the overall testing effort.

Now the very interesting aspect of regression testing is that it is about already tested and approved functionality. Which means that instead of testing for correctness, we can focus on testing for changes. Following this train of thought, regression testing is not so much a form of testing but a specific form of change control. Developers already routinely use change control in the form of version control systems. The problem is that these systems only govern static artifacts, like source code and configuration files.

Software as it is encountered by users and subject to testing, however, is a dynamic thing, living in the memory of the computer. The program code and configuration are the starting point for creating this dynamic state of the software. But many more ingredients, such as the specifics of the underlying hardware and operating system, the input data and the user interaction, form that dynamic software state.

While the source code and configuration is analogous to the original blueprint of a building, the dynamic state is comparable to the actual building. The concrete characteristics of that building depend on many more aspects, like building materials, painting, furniture, decoration and houseplants, all of which are not part of the blueprint, yet all are completely relevant to the user experience of the building. The same is true for the dynamic state of the software.

To remedy the fact that the encountered essence of the software, the dynamic state, is not governed by the version control system, the current state of affairs is to create and maintain automated regression tests. These tests then codify the dynamic state of the software and, as such, turn it into static artefacts—which are governable by existing version control systems. The problem, however, is that most existing regression test systems are modeled after the very successful JUnit.

Part of this heritage is the checking mechanism. This checking mechanism consists of individual checks (called asserts), each of which checks one single fact at a time. These facts are considered to be hard (and unchanging) truths. As such, these tests are currently created and maintained manually, bearing a lot of effort, and are not well geared towards detecting and allowing changes.

However, there exist alternatives to this approach. These systems go by names like Golden Master testing, characterization testing, or snapshot-based testing, and are just now coming into fashion. Not only are these tests much easier to create, they are also easier to maintain, as detected changes can simply be applied to the underlying test if intended. Additionally, it turns out that these tests remedy some of the other long-standing issues of regression tests.

Using this testing paradigm, an AI could thus create such Golden Master tests for an existing (and approved) version of the software. After changing the software, these tests would then show the human tester any changes in functionality (or the absence thereof). A human tester would then only need to review new functionality or detected changes to existing functionality.

In many cases, this already bears huge savings in effort and tremendous decrease in risk. The reason why this works for AI is simply that it circumvents the oracle problem. The AI now does not need to decide whether a specific functionality is correct—it merely needs to execute the software and record its behavior.

Having solved the main challenge that today keeps AI from testing software, we can now turn to some remaining challenges. These are additional challenges that we would face even if we could somehow magically solve the oracle problem. One is that the AI needs to understand how to execute the software. That is, given a (possibly empty) track of previous actions and a current state of the software, the AI needs to decide what user action to perform next.

Formulated like that, the problem is very comparable to that of playing a game like Chess or Go. Actually, we already have AIs that play computer games, having to solve the exact same problem. So, we have a clear path for how to accomplish the task. The only difference is how to formulate a suitable reward function.

For computer games, such a reward function is rather easy: “increase the number of points”. For executing different use cases of business software, this would maybe be something like “increase code coverage” or some similar metric. Supplying recordings of typical usage scenarios for the AI to learn from would overcome initial challenges like guessing a correct username/password combination or finding valid values for dates, emails, or other more obscure input data (think SAP transaction codes).

In the process of generating such recordings, the AI could already test for performance and some aspects of reliability and security, as mentioned above. Any technical errors it would encounter (where the oracle is the fact that such errors should simply never occur), it could report, making separate smoke testing obsolete as well. Note that, as mentioned above, improvements to the usability of the software will probably boost the performance of AI in testing as well.

It is noteworthy that we have already, and for a long time, had an automated testing approach that is, in principle, capable of achieving the same results. It is called monkey testing. This approach is named after the monkey theorem, which states that a monkey on a typewriter, hitting random keys for eternity, will eventually write all the works of Shakespeare. The reasoning is simple: in eternity, it will produce all possible combinations of characters.

One such (long) combination will be the works of Shakespeare, together with any possible variation thereof. Monkey testing simply applies this theorem to testing, generating random inputs on the GUI. There already exist systems for that. Using AI, we simply increase the efficiency and get some valuable results in reasonable time, rather than in eternity.

A new testing process

Given the insights from the previous sections, a new testing process could be envisioned that looks like the following: A new piece of software is created. Human testers make sure that this software is correct and complete, usable, and secure. Note that the first two tasks could as well be assigned to the role of business analysts.

The software is then given to an AI, which is trained by recordings of typical usage scenarios and thus knows how to execute the software. The AI executes the software and records sufficiently many different input/output scenarios as a Golden Master, which allows it to detect changes in the next version of the software. Other quality aspects are taken care of by the AI as well, e.g. it tests for performance, known security attacks, and fault tolerance.

Using the feedback from the AI, from testers/business analysts and from actual users, the developers improve the software in the next sprint. A subset of the Golden Master tests could be executed on a nightly basis or after every commit, providing early feedback for developers. After the next version is created, the full set of Golden Master tests is executed, showing every change in behavior and allowing for both easy approval of those changes and stable GUI tests.

This will also increase the test coverage and dramatically reduce the risk of undetected changes. Testers are then free to focus on new functionality and changes to the behavior of the software. Note that this also allows for much better tracking (who approved which change?) and easier certification of the software.

This process will free testers from repetitive and mundane tasks, such as manual regression testing. It will thus augment testers, not replace them. What we are talking about here is, essentially, codeless and autonomous automation—two buzzwords that have been haunting the realm of test automation tools for years but turned out to be promises that tool vendors failed to deliver upon. This means that we are freeing many testers from a career choice they don’t want to make—veering into test automation. Applying AI to testing like this, testers have much to gain and practically nothing to lose.

Long-term perspective

The proposed process changes are such that they can be achieved with AIs current capabilities. Researchers expect that these capabilities will only improve and broaden over time. Once AI has gained human or super-human capabilities, there are practically no tasks that AI cannot perform, from tester to developer to manager. But it is yet unclear when that mark will be reached. And on the path to reaching these capabilities, there are many more interesting milestones.

One ongoing discussion is whether AI threatens the jobs of testers. Following the above train of thought will yield near-complete automated tests, together with the capability to generate more such tests on demand. That basically cracks the problem of impact analysis—finding out which parts of the software any given change affects.

Solving this problem allows us to apply AI to the adaptation and generation of source code. Think of automatically generating patches for bugs, automatically dissolving performance bottlenecks, or automatically improving the quality of source code by restructuring it, e.g. into shorter methods and classes.

No major capability comes with a big bang. We had driver-assistance systems that helped us stay in lane, adapt headlights, or keep distance long before we had full-fledged autonomous driving. The same will be true for the development and testing of software.

Having AI generate or improve little parts of the code will be the first steps towards generating simple methods, modules or eventually whole systems. And when that happens, the oracle problem will still be unsolved. So even with these approaches, someone still needs to make sure that the generated functionality is correct and complete.

Whether this role is then called developer, business analyst or tester, is beyond my guess. But in my view, those who currently call themselves developers should probably be more worried about the long-term prospect of their jobs than those who call themselves testers.
