A Tired Developer’s non-Illustrated Primer to B2G Testing

Dec 21, 2012 by Andrew Halberstadt in mozilla, ateam, b2g

As B2G continues to trudge onward toward its release, there is still a lot of confusion about the level and state of its test coverage. Back in November we started running mochitests, reftests and marionette/webapi tests on ARM emulators. Now we’ve also added xpcshell tests, and for the most part we have these nice green letters to look at on TBPL that make us feel good about ourselves. But what is really being run? What is the meaning behind these letters “M”, “R”, “Mn” and “X”? Are there any causes for concern? Are there other tests being run that don’t show up on TBPL? What are the current automation priorities? What are the next platforms to use after emulators?

This blog post aims to answer these questions and more. It is a comprehensive snapshot of the current state of automated testing on B2G.

Pandaboards
Mochitests
Reftests
Marionette/Webapi Tests
XPCShell Tests
Gaia UI Tests
Gaia Integration Tests
Eideticker
Application Startup Tests

Pandaboards

Pandaboards are still the future of automated testing on B2G. We’ve hit many problems with them over the course of the last two quarters, but all the pieces are starting to fall together. In fact, we already have tests running on pandas on the Cedar branch (N.B. these are to test the infrastructure, not the product).

Mozpool, Lifeguard and BlackMobileMagic

Mozpool is the system that will be used to control and assign pandas. The build system will send a job request to mozpool, which will analyze the available devices and return the IP of a device that meets all of the requirements. Before doing so, it will invoke lifeguard, which will perform diagnostic tests on the device and remove it from the pool if it is unsuitable for testing. Lifeguard will use BlackMobileMagic to perform its low-level operations on the device, such as diagnosing network issues, restarting, retrieving device info, etc. All of these components are currently completed, tested and awaiting the test harnesses.
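To make the request flow above a bit more concrete, here is a minimal sketch of the idea in Python. It is not Mozpool's actual code or API: the `Device` record, the `lifeguard_check` and `request_device` functions, the capability strings and the IP addresses are all made up for illustration, and the "diagnostics" are reduced to reading a flag.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """A hypothetical record for one pandaboard in the pool."""
    name: str
    ip: str
    capabilities: set = field(default_factory=set)
    healthy: bool = True  # stand-in for the result of real diagnostics


def lifeguard_check(device):
    """Stand-in for lifeguard's diagnostics; here it only reads a flag."""
    return device.healthy


def request_device(pool, requirements):
    """Return the IP of the first suitable, healthy device, or None.

    Devices that fail the health check are removed from the pool,
    mirroring the behaviour described in the text.
    """
    for device in list(pool):  # iterate over a copy so we can remove
        if not requirements <= device.capabilities:
            continue  # device lacks something the job needs
        if not lifeguard_check(device):
            pool.remove(device)  # unsuitable for testing
            continue
        return device.ip
    return None


pool = [
    Device("panda-0001", "10.0.0.1", {"b2g"}, healthy=False),
    Device("panda-0002", "10.0.0.2", {"b2g", "hdmi-capture"}),
]
assigned = request_device(pool, {"b2g"})
print(assigned)
```

In this toy run the first board fails its check and is dropped from the pool, so the job is handed the second board's IP. A request for a capability that no board has simply returns `None`.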

Test Schedule

How to help

Getting Gaia UI tests running on pandas is well under way. After that we will be shifting focus to the B2G unittests (mochitest, reftest, marionette/webapi and xpcshell tests). Bug 807126 comes to mind as an important bug that we’ll need to complete before we can even start running unittests on pandaboards. It is currently on my radar for sometime in Q1 but has been slipping down my priority list lately.

Mochitests

A subset of mochitest-plain is being run on the emulator. There are no plans for mochitest-chrome. Mochitests will also be used for B2G permissions testing. Mochitests are rolled out to all branches and are being staged on the Cedar branch. These are denoted “M” on TBPL.

What’s being run

See the full list of mochitests running on b2g. Currently it’s only some of the DOM and layout tests, but these are in the process of getting expanded (bug 793045).

Causes for concern

Overall mochitests have been pretty stable save for a few intermittent harness issues.

B2G/emulator instability problems (bug 814551 and bug 802877)
Runs slowly on emulators using linux 32 slave load (the set of enabled tests takes ~60 minutes on a debug build)
There are a fair number of mochitest failures (see tracking bug 781696)

Future work

Enable a larger set of mochitests (bug 793045)
Run mochitests on pandaboards
Create mochitest permissions tests
Expand mochitest-plain with additional b2g tests
Speed up test runs (emulators are slow, so test runs take a long time)

Try Syntax

try: -b o -p ics_armv7a_gecko -u mochitests -t none

How to help

See the instructions on MDN for information on running mochitests
Fix a bug on the tracking bug’s dependency tree
Harness performance improvements
General refactoring and improvements (consolidate code between the harnesses)

Reftests

Reftests are being run against the ARM emulators. They are rolled out to all branches and are being staged on Cedar. These are denoted “R” on TBPL.

What’s being run

Only the reftest-sanity tests!!! Yes, there is practically no test coverage here at the moment. The tracking bug to expand the set of tests is bug 811779. The patch in this bug should give a relatively green run of all the reftests, but we simply don’t have the capacity to turn them on. Instead, I’m in the process of triaging on Cedar a subset of reftests that Chris Jones deemed “important”. These should be ready to turn on soon, which is why there are so many chunks.

Causes for concern

See the main reftest tracking bug for a full list of issues associated with reftests on B2G. Some of the highlights include:

B2G/emulator instability problems (bug 814551 and bug 802877)
In general there are a lot of intermittent failures with the B2G reftests
Enabling <iframe mozbrowser> causes additional failures (bug 785074)
The harness is not quite getting launched properly (bug 807970)
No crashtests or jsreftests being run yet
Emulators are slow, test runs take a long time

Future work

Fix remaining harness issues
Enable a larger set of reftests (bug 811779)
Run reftests on pandaboards
Add crashtests and jsreftests

Try Syntax

try: -b o -p ics_armv7a_gecko -u reftest-1,reftest-2,reftest-3,reftest-4,reftest-5,reftest-6,reftest-7,reftest-8,reftest-9,reftest-10 -t none
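Typing out ten chunk names by hand gets old fast, so here is a small, hypothetical helper that assembles the same string. The function names `chunked` and `build_try_syntax` are my own invention, not part of any Mozilla tool, and the defaults simply mirror the try lines in this post (`-b` for the build type, `-p` for the platform, `-u` for the unit test suites, `-t` for talos).

```python
def chunked(suite, total):
    """Return chunk names such as reftest-1 .. reftest-N."""
    return ["%s-%d" % (suite, i) for i in range(1, total + 1)]


def build_try_syntax(suites, platform="ics_armv7a_gecko", build="o", talos="none"):
    """Assemble a try commit-message string from its parts."""
    return "try: -b %s -p %s -u %s -t %s" % (
        build,
        platform,
        ",".join(suites),
        talos,
    )


# Reproduce the ten-chunk reftest line shown above.
reftest_line = build_try_syntax(chunked("reftest", 10))
print(reftest_line)
```

The same helper covers the unchunked suites by passing a single-element list, e.g. `build_try_syntax(["xpcshell"])`.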

How to help

See the instructions on MDN for information on running reftests
Pick up a bug listed in the dependency tree
Harness performance improvements
Try running them locally and file/attempt to fix any issues you come across (the dependency list above is definitely not exhaustive)

Marionette and WebAPI tests

Marionette and WebAPI tests are a combination of marionette unittests for testing marionette itself and some B2G webapi tests. These are rolled out to all branches and are being staged on Cedar. They are denoted with an “Mn” on TBPL.

What’s being run

All of the marionette unit tests. In addition there are many other webapi tests being run. These include tests for telephony, battery, sms, network and more.

Causes for concern

The webapi tests tend to be much more crashy than any of the other unit tests. Currently there are a lot of instability issues caused by B2G process crashes and full-on emulator crashes.

B2G/emulator instability problems (bug 814551 and bug 802877)
Recently the gaia/gonk snapshot got updated and we see a 70% crash rate now, but oddly not on the b2g18 branch! (see bug 823076 to track fixing it)
Emulators are slow, test runs take a long time

Future work

Add some screen orientation tests
Expand existing test coverage in general
Some of the tests require the emulator (for synthesizing events). Run the ones that don’t on pandaboards
Fix stability issues on the emulator

Try Syntax

try: -b o -p ics_armv7a_gecko -u marionette-webapi -t none

How to help

See the documentation on running marionette tests
Pick up a bug from the dependency tree
Expand the set of marionette/webapi tests

XPCShell Tests

XPCShell tests were just recently added. They are rolled out on all branches and are being staged on Cedar. These are denoted “X” on TBPL.

What’s being run

The xpcshell tests being run include the update tests, the ril tests, the debugger tests and a handful of others. This is just a small subset of tests that were chosen to start out with. If you know of any other tests that should be getting run, feel free to let me know (or just add them yourself after verifying that they pass).

Causes for concern

The xpcshell tests seem to be quite reliable on B2G. There are a few open bugs, but nothing nearly as bleak as e.g. reftests or marionette/webapi.

B2G/emulator instability problems (bug 814551 and bug 802877)
See the general tracking bug for xpcshell related problems
Test return codes are always “1” (bug 773703)
Segfaults happen in between tests (bug 816086)
Emulators are slow, test runs take a long time

Future work

Run xpcshell tests on pandas
Expand set of tests being run
Fix remaining issues
Speed up test runs

Try Syntax

try: -b o -p ics_armv7a_gecko -u xpcshell -t none

How to help

Expand the set of tests being run (be mindful of test time and chunks)
Grab a bug from the dependency tree

Gaia UI Tests

The gaia UI tests (aka gaia smoketests) are a set of tests being run by the WebQA team. They are running automatically in a Jenkins instance but are not currently being reported in TBPL. You can see the results (you must be on the MV network to see that link, sorry).

What is being run

You can check out the current set of tests being run from github.

Causes for concern

There are some test stability issues and some issues that appear to be legitimate failures. There are ~13 tests (roughly half) that are currently passing and stable on the pandaboards. The others require things that a pandaboard doesn’t have (e.g. a camera) and can’t be run.

Most of the gaia ui test issues can be seen on github
Pandas don’t recognize the SD card which is needed by a fair chunk of the tests (see bug 820833)
The two main blockers for getting this going on pandas are bug 820617 and bug 821379

Future work

Get them running in a Jenkins CI on a Unagi (see bug 801898)
Get them running in TBPL on a pandaboard (see bug 802317)
Expand set of tests
Fix stability issues

How to help

Read the documentation on running them
Check out the dependency trees for bug 801898 and bug 802317
Contribute additional tests (see writing unit tests)

Gaia Integration Tests

Some of the Gaia integration tests are being run by the WebQA team in a Jenkins CI. Others are being run manually on shipping devices by gaia developers. These are targeted to run on pandaboards once the Gaia UI tests are finished and running stably.

Eideticker

Eideticker is a performance testing harness that captures HDMI out and performs frame-by-frame analysis. There should be B2G-specific eideticker tests running by end of Q4 or early Q1.
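For a rough feel of what analysing a capture one frame at a time can mean, here is a toy sketch that counts how many captured frames actually differ from the one before them. This is not Eideticker's code: the `count_unique_frames` function, its `threshold` parameter and the tiny four-pixel "frames" are all assumptions made up for the example.

```python
def count_unique_frames(frames, threshold=0):
    """Count frames that differ from the previous frame.

    `frames` is a sequence of equal-length sequences of pixel values.
    A frame counts as new when more than `threshold` pixels changed;
    the first frame always counts.
    """
    unique = 0
    previous = None
    for frame in frames:
        if previous is None:
            unique += 1
        else:
            changed = sum(1 for a, b in zip(previous, frame) if a != b)
            if changed > threshold:
                unique += 1
        previous = frame
    return unique


capture = [
    (0, 0, 0, 0),
    (0, 0, 0, 0),  # repeated frame: nothing was redrawn
    (0, 9, 9, 0),
    (0, 9, 9, 0),  # repeated again
    (5, 9, 9, 5),
]
unique_frames = count_unique_frames(capture)
print(unique_frames)
```

A capture with many repeated frames suggests the device was not redrawing as often as the video was being recorded, which is the kind of signal a frame-level comparison can surface.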

Causes for concern

There is a problem where the HDMI output seems to go to sleep (bug 819431)
Pandaboards become unresponsive after idling for too long (bug 821379)

Application Startup Tests

These tests are currently running on unagis and are reporting data to datazilla. Once the remaining panda issues have been ironed out, these will provide some basic per-push performance numbers.