Monday, June 30, 2014

Resume screening burnout

While getting cars fixed in anticipation of summer road trips, I spent two days screening resumes. I am burned out!

So, going back to building my lab tomorrow.


Thursday, June 26, 2014

Hiring

So, we are hiring in my group. Here is the link to the job posting:

https://hire.jobvite.com/Jobvite/jobvite.aspx?b=nkspCnwF

We are looking for people with automation skills. What does that mean, you might ask? Well, here is what I think.

First of all, automation engineers are developers. They have strong skills in developing automation software. At Mozilla, that means they know Python, JavaScript, C, or C++. They know how to code and debug.

However, they are also QA engineers. They have a desire to break software in order to make it better, and they want some measurable level of confidence in quality. They are the kind of people who break websites while trying to do normal things like buying a shirt or logging in to their bank. And they always want to know why things broke, and may try to figure it out because they are angry, because they are curious, or because they enjoy the thrill of a problem solved.

These kinds of people are very hard to find. Most people with coding skills want to work on some other kind of software than automation. Most really good testers don't necessarily have the technical skills to write automation.

I am plowing through a bunch of resumes this afternoon. Most of them are from candidates who submit their resume to every position they can find, regardless of qualifications. While it is true that there are many fewer jobs in tech than there are people who want them, it is also true that the number of qualified candidates is very small. There is no such thing as a candidate who matches all of the job requirements perfectly. My philosophy is to look for somebody who is smart, has a track record of success, knows some of the required skills, has knowledge of similar skills, and has a track record of learning. Add some social skills on top of that, and you have a good candidate.

It's a big job, standing in front of the firehose and screening resumes. It's fun, but at the end of the day, I am glad it is my boss's position and I am only helping out.

Wednesday, June 25, 2014

Steeplechase

In order to run our multi-machine tests, we are going to need some kind of process runner which can execute remote-control commands on the test machines. Our solution uses a technology called Steeplechase (https://github.com/mozilla/steeplechase), which has a simple command language. It uses another technology called signalingserver (https://github.com/luser/simplesignalling). This in turn relies on node.js (http://nodejs.org/), which lets a server run JavaScript without needing a browser.

So, I first created another Linux VM to run these pieces. It won't need as many cores or as much RAM as the Jenkins machine, but it will potentially need more disk space for logs and binaries. So, the specs on this machine:
  • 2 cores
  • 2 GB RAM
  • 60 GB disk
Now I installed the signaling server and its dependencies. I ran the following:


sudo apt-get install git nodejs npm
npm install socket.io

Now, I ran this to get the signaling server source:

git clone https://github.com/luser/simplesignalling

Finally, I started it up:

cd simplesignalling
nodejs ./server.js
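It is worth checking that the server actually came up before moving on. Here is a minimal sketch using bash's built-in /dev/tcp redirection; the port number is an assumption on my part, so check server.js for the port your copy actually binds:

```shell
# wait_for_port HOST PORT [TRIES]: succeed once something is listening
# on HOST:PORT. Uses bash's /dev/tcp, so no extra packages are needed.
wait_for_port() {
  local host=$1 port=$2 tries=${3:-10} i
  for ((i = 0; i < tries; i++)); do
    if timeout 1 bash -c ">/dev/tcp/${host}/${port}" 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# 8080 is a guess at the signaling server's port; adjust to match server.js.
if wait_for_port 127.0.0.1 8080 5; then
  echo "signaling server is up"
else
  echo "nothing listening yet"
fi
```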


Now, for steeplechase, we need the following dependencies:

sudo apt-get install python-setuptools

Clone the steeplechase repo:
git clone https://github.com/mozilla/steeplechase

We need to bootstrap the python environment for steeplechase.
cd steeplechase
sudo python setup.py install

The machine is now set up and ready to run steeplechase, but without test clients, there is nothing for steeplechase to talk to. Next installment: setting up the test clients.

Tuesday, June 24, 2014

Starting to build out a proto-system

I have a meeting with the person who can provide me with final hardware and lab space today. In the meantime, I am building the prototype system.

Let me talk about the basic setup.

Assumptions:
  • Each test run requires two machines to run the actual tests.
    • Each machine has an agent called Negatus on it.
    • There is a large pool of these machines on which to run tests.
  • Those machines are controlled by a controlling agent called steeplechase. This runs on another machine.
  • There needs to be a TURN server on some machine.
  • There may need to be more VMs created to provide networking gateways/routing to simulate network conditions.
  • There needs to be some kind of scheduler to run the tests. We are going to use Jenkins (http://jenkins-ci.org/). It's easy, and we should not run into its limitations.
So, I will set up the last one right now, the Jenkins machine.

At this point, I should point out that this is a large number of VMs for one desktop machine to host on VMWare Fusion. I may have to host some of them on a second machine.

Yesterday I built an Ubuntu 14.10 Desktop VM in VMWare Fusion. I use the Desktop edition because I like having X-Windows. Everything else is easy to install. The initial specs:

  • 4 cores
  • 8 GB RAM
  • 100 GB disk
Jenkins loves RAM. Well, it's a Java app, and all Java apps love RAM. Also, all logs and intermediate results from Jenkins processes end up being stored on the host, so the host needs some disk space. And giving this machine 4 cores allows one to administer the machine without shutting down Jenkins.

I then ran all of the software updates and installed VMWare Tools, which you need for the clock to work and to be able to copy and paste into apps.

It used to be that you had to install Java (which on Linux can be a pain), set up your own script to run Jenkins, set up your own startup script, and so on. In the last couple of years, somebody smart made Jenkins an installable Debian package; details here: http://pkg.jenkins-ci.org/debian/. If you are having trouble figuring out how to change /etc/apt/sources.list, see this page for info: http://askubuntu.com/questions/197564/how-do-i-add-a-line-to-my-etc-apt-sources-list.
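For the record, the recipe boils down to a few commands. These follow the instructions on the pkg.jenkins-ci.org page as I remember them from this era, so verify the key URL and repository line against that page before running anything:

```shell
# Add the Jenkins package signing key and apt repository, then install.
# Key and repository lines are from http://pkg.jenkins-ci.org/debian/
# circa mid-2014; check that page for the current values.
wget -q -O - http://pkg.jenkins-ci.org/debian/jenkins-ci.org.key | sudo apt-key add -
echo "deb http://pkg.jenkins-ci.org/debian binary/" | sudo tee /etc/apt/sources.list.d/jenkins.list
sudo apt-get update
sudo apt-get install jenkins
```

The package also installs an init script, so Jenkins comes back up on reboot without any extra work.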

At this point, Jenkins is running; visit http://localhost:8080 to verify.

Tomorrow: Setting up signalingserver, steeplechase, negatus...

Monday, June 23, 2014

What we are going to tackle next

So, now that we have our basic Sunny Day environment more-or-less running, it's time to figure out what to do next.

We have identified two areas as fertile ground for further testing using the Sunny Day environment.

  1. Establish WebRTC connection - This set of tests will cover connecting two clients in a variety of network situations. Networks with different characteristics will either be set up or simulated with test doubles. We will test with high-latency, low-bandwidth, high-packet-loss, and other pathological environments. We will also test various configurations with NAT, firewalls, and other network topologies.
  2. Quality of established connection - Once the connection is made, how well does it hold up when various bad things happen to the network? We will also attempt to test audio/video quality.
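One cheap way to fake those pathological networks on a Linux gateway is the kernel's netem queueing discipline. A sketch, assuming the gateway forwards the test traffic out eth0 (the interface name and the numbers are placeholders, and this needs root on the gateway VM, not on the test clients):

```shell
# Add 200ms (+/- 40ms jitter) of delay and 5% packet loss to everything
# leaving eth0.
sudo tc qdisc add dev eth0 root netem delay 200ms 40ms loss 5%

# Inspect what is currently applied.
tc qdisc show dev eth0

# Put the interface back to normal when the test run is done.
sudo tc qdisc del dev eth0 root
```

For the low-bandwidth cases, netem can be combined with a token bucket filter (tbf) qdisc to cap throughput as well.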
We are first going to write test plans and then implement tests to those plans. Other members of the team are primarily responsible for these efforts.

I will be working on the environmental problem. Basically, we have to run in a wide variety of environments:
  • Different platforms - Linux, Windows, Mac OS X on the desktop; Android and Firefox OS for mobile.
  • Different Firefox versions - We will test Nightly connecting with the various public releases of Firefox: Nightly, Aurora, Beta, Release, and Extended Support Release on desktop; as-yet-unknown for mobile.
  • Potentially different browsers - Chrome, Opera
I will be building an environment using both VMWare ESX and VMWare Fusion, with a smattering of bare hardware. The job control software will be Jenkins, with the rest of the machines closely matching the configuration of the Sunny Day environment.

I have to build out the hardware (or have it built out), but I also have to develop the Jenkins instance configuration, find a way to store it in source control, and write scripts to build machines up. We have a few that we use in the Sunny Day environment; I plan on expanding that work.

Should be fun!

What we have done so far

So, before I start blogging-as-I-go, I should explain what we have been doing so far.

We set up some machines in a lab that we have designated our Sunny Day environment. This is designed to test WebRTC with nightly Firefox builds on Linux with real audio and video hardware connected.

Our setup consists of the following:
  • One Linux machine running our test harnesses, server.js (run with node), and steeplechase. server.js is a signaling server, and steeplechase manages commands to remote clients, in this case the other machines in the fleet. This machine also runs a TURN server, which handles discovering the clients' public IP addresses and relaying traffic when the clients are behind a firewall/NAT.
  • Two Linux machines, each running our test agent, called Negatus. steeplechase sends files and commands to Negatus, and Negatus launches Firefox and runs Firefox test commands. These machines each have an mp3 player and a video camera (trained on a clock in the lab).
We run a series of tests connecting the two clients and sending 3 hours of video and audio back and forth. When the system is finished, it will report results into Mozilla's new result-reporting mechanism, Treeherder, where interested engineers can see the results of the tests on the web.

Sunny Day provides a level of assurance: if the tests pass, we know that our nightly Firefox can establish WebRTC sessions on a real network with real video/audio streaming. It is a great first step.

The next entry in this blog will talk about what we are doing next.

Hello, and Welcome

Hi. My name is Syd Polk, and I recently joined Mozilla. It's a great place, and it is really great to be able to be open about what I work on.

My position is officially titled Technical Lead for Platform QA. So, what does that really mean? I am still figuring it out, but basically, Mozilla is divided into several engineering groups. We have a centralized QA group, with engineers assigned to test various technologies. The base technology for everything is called the Platform Group, and until very recently, it has not had dedicated QA.

You are meeting the only dedicated Platform QA engineer by reading this blog post. Now, most of the platform technology is tested by the other groups. Platform is used to build the browser, for instance, and there are people who test the browser. I was hired to help identify the cracks where the other groups' testing does not cover something adequately.

One area that we identified right before I joined was WebRTC (http://www.webrtc.org), the technology that allows Firefox users to connect to each other for real-time video/audio chat sessions. Our existing automation for this feature ran the network test on the build machines we have here. They have no external network connections, so the tests used fake audio/video streams on two browser instances on the same machine running only on Linux. While this does verify that the basic connection mechanism is not fatally broken, it does not test interesting network setups, different network characteristics, or video/audio stream quality. It also does not test the WebRTC connections on heterogeneous platforms, nor with differing versions of Firefox. Or, for that matter, with other browsers that support WebRTC, like Chrome or Opera.

So, right now, I am working with a few Mozilla engineers to build a system to address some of those concerns. Should be fun!