My Mozilla QA Adventures: 2014

Thursday, August 14, 2014

Working Jenkins

Well, that was a lot of trial and error.

So, here is how the Jenkins setup is working. I wrote a script (available at this repo in my GitHub account) called maintain_firefox_cache.sh. (It calls other scripts in the checkout directory.) What it does:

- Calls mozdownload. It will download the latest nightly build, but the name does not stay the same from day-to-day.
- If this is the first time it is downloaded, it will copy the download payload to "firefox-latest-nightly.en-US.<platform>.<ext>", where <platform> and <ext> are appropriate to the platform we are caching.
- If there is one there, it uses the unix find command to find the name of the latest binary, and copies that one. It also finds binaries older than the cached version and removes them.
- On the Mac, this runs in a Mac builder, and this script will open the .dmg, copy the contents out, and repackage them into a .tar.bz2 file, since the steeplechase machine will be on Linux and doesn't easily know how to open a .dmg file.
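
The copy-and-prune step above can be sketched in Python. This is only an illustration of the idea, not the contents of maintain_firefox_cache.sh; the function name refresh_cache and the glob pattern are assumptions made for the example.

```python
import shutil
from pathlib import Path


def refresh_cache(download_dir, canonical_name, pattern="firefox-*.tar.bz2"):
    """Copy the newest download to a stable name and prune older downloads.

    Hypothetical helper: the name and the pattern are illustrative only.
    """
    download_dir = Path(download_dir)
    canonical = download_dir / canonical_name
    # Every matching download except the cached copy itself.
    candidates = [p for p in download_dir.glob(pattern) if p != canonical]
    if not candidates:
        return None
    newest = max(candidates, key=lambda p: p.stat().st_mtime)
    # copy2 keeps the modification time of the source file.
    shutil.copy2(newest, canonical)
    # Remove every download older than the one we just cached.
    for stale in candidates:
        if stale != newest:
            stale.unlink()
    return newest
```

Calling refresh_cache(".", "firefox-latest-nightly.en-US.linux-x86_64.tar.bz2") after each download would leave one dated binary plus the stable copy in the directory.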

This leaves an artifact on the Jenkins master filesystem. This is important later on.

We also have to have the payload (this is the Firefox 34 version) with the tests directory. We are using the linux64 version, because our tests do not require any platform-specific compiled assets from the package. Unfortunately, mozdownload does not know about the tests payload, and this URL will have to be updated every time there is a version bump. Maybe I'll add that to mozdownload some day.

So that's great. We have firefox binaries, and test assets. Both of these need to be on the local filesystem of the steeplechase machine. So, how do we get them there?

My previous post talked about a couple of plugins designed to help track assets and when they changed:

URLSCM is really good at copying assets based on URLs. Its polling mechanism is broken, however; it always wants to download the asset even when it has not changed.
URLTrigger allows you to track modification date changes, but does not actually copy them.

For the tests download, I use the URL Trigger to detect changes and URLSCM to download it.

For the Firefox binaries, I was trying to use URL Trigger to track changes in the URL of the Last Successful builds, but they were never triggering. Instead, I use the FSTrigger to detect when files change on the Jenkins master itself.

So here is the sequence (using linux64 as an example):

Once every 24 hours, firefox-nightly-linux64 fires. It runs maintain_firefox_cache.sh, which runs mozdownload to get the linux64 binary. If there is a new binary, the new firefox-latest-nightly.en-US.tar.bz2 file is archived.
trigger-firefox-nightly-linux64 (running on the Jenkins master) notices the new file, and immediately triggers expand-firefox-nightly-linux64.
On the steeplechase machine, expand-firefox-nightly-linux64:

Copies the firefox-latest-nightly.en-US.tar.bz2 file from the Jenkins master to the local filesystem.
Expands the payload to the /home/mozilla/firefoxes/nightly/linux64 directory.
Triggers all of the steeplechase jobs based on linux64 to be run.

A steeplechase job will run, passing the correct binaries and test files to the test machines.

I have also added the SCM Sync plugin to the Jenkins instance so hopefully I won't have to create all of these jobs on the ESX machine from scratch (although I will have to edit them).

Monday, August 11, 2014

Back to Jenkins fun

So, now, I know how to download binaries and get the correct versions independent of what their names actually are.

Now, I need my jobs to trigger each other. The scheme I have is:

Run download. If the binary is newer, overwrite the canonically named version, i.e., firefox-latest-nightly.en-US.linux-x86_64.tar.bz2.
Another job triggers when firefox-latest-nightly.en-US.linux-x86_64.tar.bz2 changes. It runs on the Steeplechase machine, and it expands the archive into a known directory location.
It then triggers jobs for all of the steeplechase runs that depend on it.

At Coverity I used the URLSCM plugin to do the triggering. Basically it used the SCM polling mechanism built into Jenkins to see if the local copy of a file is newer than the version at a URL. The problem is, this mechanism broke a few years ago, and to this day nobody has fixed the bug.

Today I found out about another Jenkins plugin, URLTrigger Plugin. This allows you to trigger builds on a variety of things, but most germane to me is that it can trigger if md5 checksums are different. I am trying this out overnight; we'll see what happens.
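
The checksum idea is simple enough to sketch. This is not how the URLTrigger plugin is implemented; it only illustrates comparing md5 digests between polls, and the ChangeDetector class is hypothetical.

```python
import hashlib


class ChangeDetector:
    """Remember the last md5 digest seen per key and report when it changes."""

    def __init__(self):
        self._last = {}

    def changed(self, key, data):
        digest = hashlib.md5(data).hexdigest()
        previous = self._last.get(key)
        self._last[key] = digest
        # The first observation only records a baseline; it is not a change.
        return previous is not None and previous != digest
```

A polling job would fetch the bytes behind a URL on a schedule, pass them to changed(), and kick off the downstream job only when it returns True.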

Thursday, August 7, 2014

Which Firefox build to download?

So, eventually this system is going to have to download all Firefox releases and test them. All of the releases are available here:

http://ftp.mozilla.org/pub/mozilla.org/firefox/

So, there are a LOT of releases on this server. So which ones do we want? Mozilla has five (or more) active release trains at any one point:

Nightly - This is a nightly build of the code checked into mozilla-central, the Mercurial repository that holds Firefox. It is the least stable of them all.
Aurora - This is for code that is Feature Complete™, or some other terminology, which means that the feature has landed but has had very little testing.
Beta - We want to release this build next time we release something, so we are going through final testing.
Latest - This is the latest release. Right now, this is Firefox 31, but that will change next release cycle.
ESR (Extended Support Release) - This release is primarily intended for large organizations who want fewer releases for stability. Right now, there are two of these, FF 24 and FF 31. More info on this topic is here.

There is a utility called mozdownload which is here. Once you build and install it, mozdownload is a big help. It deals with changing file names and dates and the like.

So where is all of this stuff on ftp.mozilla.org? And what is the mozdownload command-line for it?

Nightly - https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-central/ - mozdownload --type=daily
Aurora - https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-aurora/ - mozdownload --type=daily --branch=mozilla-aurora
Beta - http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/latest-beta - mozdownload --version=latest-beta
Release ("latest") - http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/latest/ - mozdownload --version=latest
ESR - http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/latest-esr/ - mozdownload --version=latest-esr
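
For scripting, the table above can be captured as data. A small sketch that uses only the mozdownload options listed above; the dictionary name and helper function are my own.

```python
# Release train -> mozdownload arguments, copied from the list above.
MOZDOWNLOAD_ARGS = {
    "nightly": ["--type=daily"],
    "aurora": ["--type=daily", "--branch=mozilla-aurora"],
    "beta": ["--version=latest-beta"],
    "release": ["--version=latest"],
    "esr": ["--version=latest-esr"],
}


def mozdownload_command(train):
    """Return the argv list for downloading the given release train."""
    if train not in MOZDOWNLOAD_ARGS:
        raise ValueError("unknown release train: %r" % (train,))
    return ["mozdownload"] + MOZDOWNLOAD_ARGS[train]
```

The resulting list can be handed straight to subprocess.check_call in a Jenkins build step.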

I hope that this helps. It sure helps me.

Thursday, July 31, 2014

Finally.

I have a Jenkins instance running the regression suite on Firefox nightly builds in all platform combinations of linux64, linux32, and macosx. The front page looks like this:

So, I have proof-of-concept. I am demoing this at our QA Work Week QA Fair later today. Should be fun!

Lots of work to do, however.

I need to investigate using parameterized builds. Having to create and maintain a separate Jenkins job for each combination is painful (I have a lot of experience with this from Coverity, alas).
Henrik Skupin has a Jenkins instance, and he has solved the problem of the nightly version numbers changing every month. Need to implement that.
Need to add jobs for nightly versions connecting to Aurora, Beta, Release and ESR versions of Firefox.
The test is not incredibly valid. I need to take the test additions done by Geo Mealer in our Sunny Day Environment and run the connections for 1 minute each. I really want to do the parameterization first so I don't have to update dozens of jobs.
I need to put this in our ESX farm. However, I only want to do this when most of the above is done. I am going to set up Jenkins on the ESX farm, running linux64 regressions for now.
I have got to get Jenkins and Steeplechase working for Windows. While I am in Mountain View this week, I do not have access to my Windows VMs. Will have to wait until I get home next week.
Sunny Day Environment does most of its job maintenance and triggering via cron jobs. It would be nice to move this into Jenkins.
Need to report results of test runs into Treeherder. This is a pretty big effort.
And then, there is B2G and Android.
Steeplechase changes

Run with existing binaries but new profile. (or whatever; independent control)
Send binary archives down instead of directories, and tell clients how to unpack them.
Steeplechase should talk to treeherder.

Ought to keep me busy.

Wednesday, July 30, 2014

Another fun fact: For Negatus to be able to run firefox for the tests, it has to run it in a display. Without setting up a fake X session, the Negatus client has to be logged in, at least on Linux. I am sure that this will be true on other platforms as well. For Linux, we could run in a virtual frame buffer, but I am not sure that this is necessary. Just set up the account the test will run in to auto login. Make sure it is on a private network, though...

OK, it's just too hard to nail down all of the libraries to run firefox32 on linux64. I give up. Average users won't do this.

A-ha!

You actually have to do work to run 32-bit binaries on 64-bit linux. I knew this, but I rediscovered this fact this morning.

Of course, this got much harder in modern Ubuntu:

sudo apt-get install libxtst6:i386 libxext6:i386 libxi6:i386 libncurses5:i386 libxt6:i386 libxpm4:i386 libxmu6:i386 libxp6:i386
sudo apt-get install libstdc++6-4.8-dbg:i386

Sigh.

Wednesday, July 23, 2014

Three things I am working on - follow-up

Thing 1 - Bugzilla 1036992 - I have a patch now which completes the split up of test_seek.html, including refactoring of common JavaScript in test_seek-split*.html files. Figuring out how to add the JavaScript so that it was actually available was fun. Basically, everything goes in mochitest.ini. Who knew?

Thing 2 - mochitest dying on my VM. Well, it still happens, and it often does not leave a stack dump. Still working on this one.

Thing 3 - Jenkins and nodejs. I restarted everything and it started working. Huh. Weird.

So, now I am setting up Jenkins in Mountain View to reproduce the result. Unfortunately, when the physical box was moved into the lab, the networking broke. All of the existing VMs have IPv6 addresses rather than the static addresses I thought were assigned.

Coworker in charge of box said he would look at that.

So, now I am building linux32 builders for this test on my home VM. I am working out of one of my relative's houses in the Midwest right now, and her internet connection was SLOW. Well, the upgrade came through yesterday, and now it is as fast as my home network in Texas.

It helps.

Tuesday, July 22, 2014

Three things I am working on

The first one is this bugzilla. Basically, the audio/video tests sometimes time out in Mozilla's build/test environment, and we are trying to track down which tests are sensitive. We run a lot of tests in Amazon's EC2 cloud, and disk and network access are not predictable in that environment.

My boss and I have submitted a few patches to unparallelize some of the tests to see if it helps. Latest patch is run by our environment in tbpl, here. I need to run quite a few more tests and analyze them today. One thing I had to learn was Mercurial queues; using this is the best way to work with patches with Firefox and Bugzilla. More info here and here. You have to be really careful with them, as it is easy to blow away work. Still, it's a really nice system for managing patches. In git, you would do the same with local branches, but it's not quite as easy.

The second is a test time out running mochitest on a subdirectory on a Mac OS X VM. Mochitest is the oldest Firefox test suite. More info on it can be found here. I chose to run it from a build tree, so I had to go build Firefox on Mac. Info on that here. I then ran the following:

./mach mochitest-plain content/media/test

And then watched the magic! Note that this is the same test suite as we are watching with the first problem I am working on. Sometimes, I get a test failure on my VM that nobody else seems to be running into, so I am working on trying to get enough data to file a bug. A coworker directed me to try out setting MINIDUMP_STACKWALK before running the test. Once I dug enough for somebody to tell me that this tool was part of another Mercurial repo (http://hg.mozilla.org/build/tools), I tried it. No dice. Need to submit that bug report today.

And last, I am having trouble getting steeplechase to run in my Jenkins lab. The latest:

cmd: ['/tmp/tests/steeplechase/app/firefox', '-no-remote', '-profile', '/tmp/tests/steeplechase/profile', 'http://172.16.141.51:55293/index.html']
Traceback (most recent call last):
  File "/home/mozilla/jenkins/workspace/linux64-linux64/steeplechase/steeplechase/runsteeplechase.py", line 311, in <module>
    sys.exit(0 if main(sys.argv[1:]) else 1)
  File "/home/mozilla/jenkins/workspace/linux64-linux64/steeplechase/steeplechase/runsteeplechase.py", line 301, in main
    html_pass_count, html_fail_count = test.run()
  File "/home/mozilla/jenkins/workspace/linux64-linux64/steeplechase/steeplechase/runsteeplechase.py", line 187, in run
    passes, failures = result
TypeError: 'NoneType' object is not iterable
Exception in thread Client 1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/home/mozilla/jenkins/workspace/linux64-linux64/steeplechase/steeplechase/runsteeplechase.py", line 100, in run
    output = dm.shellCheckOutput(cmd, env=env)
  File "/usr/local/lib/python2.7/dist-packages/mozdevice-0.37-py2.7.egg/mozdevice/devicemanager.py", line 395, in shellCheckOutput
    raise DMError("Non-zero return code for command: %s (output: '%s', retval: '%s')" % (cmd, output, retval))
DMError: Non-zero return code for command: ['/tmp/tests/steeplechase/app/firefox', '-no-remote', '-profile', '/tmp/tests/steeplechase/profile', 'http://172.16.141.51:55293/index.html'] (output: 'r', retval: '256')

Exception in thread Client 2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/home/mozilla/jenkins/workspace/linux64-linux64/steeplechase/steeplechase/runsteeplechase.py", line 100, in run
    output = dm.shellCheckOutput(cmd, env=env)
  File "/usr/local/lib/python2.7/dist-packages/mozdevice-0.37-py2.7.egg/mozdevice/devicemanager.py", line 395, in shellCheckOutput
    raise DMError("Non-zero return code for command: %s (output: '%s', retval: '%s')" % (cmd, output, retval))
DMError: Non-zero return code for command: ['/tmp/tests/steeplechase/app/firefox', '-no-remote', '-profile', '/tmp/tests/steeplechase/profile', 'http://172.16.141.51:55293/index.html'] (output: 'r', retval: '256')

Need to see what that is about...
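
Reading the log, the TypeError at the top appears to be a secondary symptom: both client threads raised DMError, so there was no result to unpack into passes and failures. A hedged sketch of a guard that would make this failure mode more obvious; unpack_result is a hypothetical helper, not code from steeplechase.

```python
def unpack_result(result):
    """Return (passes, failures), failing loudly if nothing was reported."""
    if result is None:
        # Nothing came back from the clients; the client-side error
        # (here, the DMError) is the one worth chasing first.
        raise RuntimeError("clients finished without reporting results")
    passes, failures = result
    return passes, failures
```

With a guard like this, the first traceback would point at the missing results rather than at tuple unpacking.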

[LAB] Setting up for jenkins

Packages required on Jenkins machine:

Packages required on Steeplechase machine:

openjdk-7-jre-headless
curl

Jenkins plugins:

Git plugin
Mercurial plugin

I am afraid I finished the buildout of this without good notes. Sorry!

Tuesday, July 15, 2014

New task

I was given a new area to investigate on top of building out a lab. Basically, we have some media streaming tests in our tree, in <mozilla-central>/content/media/test, and they are flaky. I have been asked to investigate a couple of things, using test_seek.html as an example:

- The tests actually get run in parallel. I have been asked to see if running them singly will affect the intermittent failure rate.
- I have been asked to split this file up. It currently calls 13 sub-files. I have already generated the first of those 13 files and it works fine.

All of this requires checking out the Firefox source (instructions here), and then running Mochitest (instructions here). I can run the individual test.

If you do add a test file in a directory, such as <mozilla-central>/content/media/test, you have to add it to the mochitest.ini file in that directory for it to be picked up by the system.
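
A quick way to check that a new test file made it into the manifest is to look for its section name. This sketch uses Python's configparser as a stand-in; the real harness has its own manifest parser, and the sample manifest below (including the test_seek-1.html name and the support file) is invented for illustration.

```python
import configparser

# Invented sample; a real mochitest.ini lists many more files.
SAMPLE_MANIFEST = """\
[DEFAULT]
support-files = manifest.js

[test_seek.html]
[test_seek-1.html]
"""


def is_listed(manifest_text, test_name):
    """Return True if test_name appears as a section in the manifest text."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(manifest_text)
    return parser.has_section(test_name)
```

Running is_listed over the directory's mochitest.ini before pushing a patch catches the case where a split-out file exists on disk but never gets run.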

I have two build trees (my Mercurial foo is low, and I am lazy), one to generate patches for the split out of the tests, and one to generate patches for our try system running only one test at a time.

Alas, I am not at my house; I am with family in another state, and the internet is slow here. It will be upgraded soon, but this is taking a while...

Thursday, July 10, 2014

OK, time to step back

Occasionally, you reach some little milestone, and it helps to make a To Do list. So, here is what is left to have this lab up and running now that I have 3 machines that can run tests.

1. Get Negatus to run on boot for the two client machines.
2. On my home lab, develop the Jenkins scripts and job templates that will do the work.
3. Install the Jenkins instance in the ESX lab.
4. Port the Jenkins work from the home lab to the ESX lab.
5. Start adding platforms.

That's a good general list.

Tomorrow is a test day, so I probably won't be doing much with this stuff. Let's see what I can get done today.

Wednesday, July 9, 2014

Running the tests

One of my coworkers is going to post a public document on how to get steeplechase and Negatus running. Once he posts that, I will repost the link here. Basically, it looks like this:

- One machine needs to run simplesignalling. This is a nodejs-based server to facilitate Firefox communication.
- A machine needs to run steeplechase. This can be the same machine as the one that runs simplesignalling, but that is not required.
- Each of the client machines runs Negatus, which is a test agent.

The steeplechase machine needs to download the firefox binaries and tests from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/. The binaries and test files have to be de-archived. Steeplechase can then be run:

% tar xvfj firefox-33.0a1.en-US.linux-x86_64.tar.bz2 
% mkdir tests
% cd tests/
% unzip ../firefox-33.0a1.en-US.linux-x86_64.tests.zip 
% mkdir ~/logs
% python ~/src/steeplechase/steeplechase/runsteeplechase.py --binary /home/mozilla/firefox-releases/firefox/firefox --specialpowers-path /home/mozilla/firefox-releases/tests/steeplechase/specialpowers --prefs-file /home/mozilla/firefox-releases/tests/steeplechase/prefs_general.js --signalling-server 'http://192.168.1.2:8080/' --html-manifest /home/mozilla/firefox-releases/tests/steeplechase/tests/steeplechase.ini --save-logs-to ~/logs/ --host1 192.168.1.3:20701 --host2 192.168.1.4:20701
steeplechase INFO | Pushing app to Client 1...
steeplechase INFO | Pushing app to Client 2...
Writing profile for Client 1...
Pushing profile to Client 1...
cmd: ['/tmp/tests/steeplechase/app/firefox', '-no-remote', '-profile', '/tmp/tests/steeplechase/profile', 'http://192.168.1.3:38439/index.html']
Writing profile for Client 2...
Pushing profile to Client 2...
cmd: ['/tmp/tests/steeplechase/app/firefox', '-no-remote', '-profile', '/tmp/tests/steeplechase/profile', 'http://192.168.1.4:38439/index.html']
steeplechase INFO | Waiting for results...
steeplechase INFO | All clients finished
steeplechase INFO | Result summary:
steeplechase INFO | Passed: 118
steeplechase INFO | Failed: 0
%

I now have this working on both my lab at home and the ESX lab in Mountain View. Next: making it work autonomously.
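
Since that command line is long and mostly derived from two directories, a first step toward automation could be a helper that assembles it. A sketch using only the options shown in the transcript above; the function name and its parameters are my own.

```python
def steeplechase_command(script, binary, tests_dir, signalling_server,
                         host1, host2, log_dir=None):
    """Assemble the runsteeplechase.py argv from a few inputs."""
    sc = tests_dir.rstrip("/") + "/steeplechase"
    cmd = [
        "python", script,
        "--binary", binary,
        "--specialpowers-path", sc + "/specialpowers",
        "--prefs-file", sc + "/prefs_general.js",
        "--signalling-server", signalling_server,
        "--html-manifest", sc + "/tests/steeplechase.ini",
    ]
    # Log saving is optional, as in the two runs shown in these posts.
    if log_dir is not None:
        cmd += ["--save-logs-to", log_dir]
    cmd += ["--host1", host1, "--host2", host2]
    return cmd
```

A Jenkins job could then pass in just the binary location, the tests directory, and the two client addresses.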

Bootstrapping a data center remotely

Using a VM running vSphere connecting to an ESX server 2000 miles away over VPN, and using VNC to connect to a known Linux box on the same network as the ESX box, I was able to download enough ISOs to create a Windows 7 VM. The idea is to put vSphere Client on it, and use Microsoft Remote Desktop to connect to it. I can then download ISOs to its hard drive and create VMs with "ISO on local disk" option.

Had to download MS RDC. It's free now. This is great.

I am also starting to build up the first linux box. I used VNC on the other machine to download the ISO of Ubuntu 14.10 Desktop. Used the vSphere Client to install the machine. Of course, I had to use the vSphere Client to access the Desktop. Doing this from RDC is painful; typing often results in duplicated characters. So I installed openssh-server:

sudo apt-get install openssh-server
sudo /etc/init.d/ssh restart

And then I could ssh in from a Terminal on my machine. Ah, much better.

I still want desktop access, though. I enabled Desktop Sharing on the Linux VM. Alas, the Mac Screen Sharing could not connect to it, although it could connect to an Ubuntu 12 machine. My coworker pointed me to this article, and now the Linux machine is good to go.

Monday, July 7, 2014

ESX, remotely

OK, somebody gave me VNC to a linux box in the office over VPN, so I can download ISOs. MSDN's site has a really really bad captcha on it, and using a password manager was a pain, and it did 2-level, but I'm safe now, right?

Anyway, I can now download ISOs onto a machine on the same network as the ESX box. Need to figure out how to get ESX to mount the newly created NFS export where the ISO lives.

OK, I figured it out. You create the VM in vSphere Client. Let the boot fail. Then, connect the .iso stored on your data store to the DVD drive. Finally, press Ctrl-Alt-Ins, and it should boot to your ISO.

More later.

Thursday, July 3, 2014

Home lab

Now, it's time to try everything out. The steeplechase machine needs to have firefox binaries and test artifacts. So, let's go get them.

The nightly build artifacts are here. On the steeplechase machine, we need to download both firefox-33.0a1.en-US.linux-x86_64.tar.bz2 and firefox-33.0a1.en-US.linux-x86_64.tests.zip. We will then unpack them appropriately:

mkdir firefox-releases
cd firefox-releases
mv ~/Downloads/firefox* .
tar xvfj firefox*.tar.bz2
mkdir tests
cd tests
unzip ../firefox*.zip

We have to have node running on this machine:

mozilla@jenkins-steeplechase:~$ cd simplesignalling/
mozilla@jenkins-steeplechase:~/simplesignalling$ ls
package.json  README.md  server.js
mozilla@jenkins-steeplechase:~/simplesignalling$ nodejs server.js

We need to start the agent on the two Negatus machines:

mozilla@ubuntu:~/src$ cd Negatus/
mozilla@ubuntu:~/src/Negatus$ git pull
Already up-to-date.
mozilla@ubuntu:~/src/Negatus$ ./agent
Command handler listening on 0.0.0.0:20701
Heartbeat handler listening on 0.0.0.0:20700
Query url: IPADDR=0.0.0.0%3A20701&NAME=SUTAgent
No SUTAgent.ini data.
No reboot callback data.

Mmm. It looks like I forgot a step. Running server.js should have output something. Looking back on our internal notes, I needed to run this:

npm install socket.io@0.9.6

If you do that from the simplesignalling directory, it will fail kind of like:

npm ERR! Error: Invalid version: "0.1"
npm ERR!     at Object.module.exports.fixVersionField (/usr/lib/nodejs/normalize-package-data/lib/fixer.js:178:13)
npm ERR!     at /usr/lib/nodejs/normalize-package-data/lib/normalize.js:29:38
npm ERR!     at Array.forEach (native)
npm ERR!     at normalize (/usr/lib/nodejs/normalize-package-data/lib/normalize.js:28:15)

Once you install this correctly, then server.js will output something correctly:

mozilla@jenkins-steeplechase:~/simplesignalling$ nodejs server.js 
   info  - socket.io started

Now, we are ready to try to run steeplechase.

mozilla@jenkins-steeplechase:~/steeplechase$ python `pwd`/steeplechase/runsteeplechase.py --binary /home/mozilla/firefox-releases/firefox/firefox --specialpowers-path /home/mozilla/firefox-releases/tests/steeplechase/specialpowers --prefs-file /home/mozilla/firefox-releases/tests/steeplechase/prefs_general.js --signalling-server 'http://172.16.141.51:8080/' --html-manifest /home/mozilla/firefox-releases/tests/steeplechase/tests/steeplechase.ini --host1 172.16.141.52:20701 --host2 172.16.141.53:20701
steeplechase INFO | Pushing app to Client 1...
steeplechase INFO | Pushing app to Client 2...
Writing profile for Client 1...
Pushing profile to Client 1...
cmd: ['/tmp/tests/steeplechase/app/firefox', '-no-remote', '-profile', '/tmp/tests/steeplechase/profile', 'http://172.16.141.51:59367/index.html']
Writing profile for Client 2...
Pushing profile to Client 2...
cmd: ['/tmp/tests/steeplechase/app/firefox', '-no-remote', '-profile', '/tmp/tests/steeplechase/profile', 'http://172.16.141.51:59367/index.html']
steeplechase INFO | Waiting for results...
steeplechase INFO | All clients finished
steeplechase INFO | Result summary:
steeplechase INFO | Passed: 112
steeplechase INFO | Failed: 0
mozilla@jenkins-steeplechase:~/steeplechase$

It worked! We have a running lab on Linux now.
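
For the Jenkins step, the job will need to turn that output into a pass or fail. A minimal sketch that pulls the two counts out of the summary lines shown above; the regular expression assumes the log format stays exactly as it appears in this run.

```python
import re

# Matches lines such as "steeplechase INFO | Passed: 112".
SUMMARY_RE = re.compile(r"steeplechase INFO \| (Passed|Failed): (\d+)")


def parse_summary(output):
    """Return (passed, failed) from steeplechase output, or None if absent."""
    counts = {}
    for label, value in SUMMARY_RE.findall(output):
        counts[label.lower()] = int(value)
    if set(counts) != {"passed", "failed"}:
        return None
    return counts["passed"], counts["failed"]
```

A wrapper script could exit non-zero when the result is None or the failed count is above zero, which is all Jenkins needs to mark the build red.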

Next step: Get Jenkins to invoke this.

Virtual Center

The ESX server I am installing onto does not have any OS ISO images to use to make VMs. So I need to have them locally. However, I'm in Austin, and the ESX server is in Mt. View. This is not going to work. Somebody at Moz HQ is going to set up a machine I can log in to via Remote Desktop Connection so I can continue.

Networking home lab setup

Things work much better when you have static IP addresses. This means that your IP addresses won't change when you restart your VMs, which means any command lines you use will remain valid. VMWare Fusion does not make assigning IP addresses easy, but it can certainly be done.

When you created your VMs, they were assigned generated MAC addresses by Fusion. Need to retrieve those here:

Virtual Machine -> Network Adapter -> Network Adapter Settings... Turn down the "Advanced Options" disclosure triangle...

Once you have the MAC addresses for your VMs, you can change your config. One source I drew heavily on is here; you have to restart your network services if VMWare is running. And you have to be careful; Fusion loves to blow away your changes.

On my machine, here is the mapping:

# Configuration file for ISC 2.0 vmnet-dhcpd operating on vmnet8.
#
# This file was automatically generated by the VMware configuration program.
# See Instructions below if you want to modify it.
#
# We set domain-name-servers to make some DHCP clients happy
# (dhclient as configured in SuSE, TurboLinux, etc.).
# We also supply a domain name to make pump (Red Hat 6.x) happy.
#


###### VMNET DHCP Configuration. Start of "DO NOT MODIFY SECTION" #####
# Modification Instructions: This section of the configuration file contains
# information generated by the configuration program. Do not modify this
# section.
# You are free to modify everything else. Also, this section must start 
# on a new line 
# This file will get backed up with a different name in the same directory 
# if this section is edited and you try to configure DHCP again.

# Written at: 05/14/2014 16:13:27
allow unknown-clients;
default-lease-time 1800;                # default is 30 minutes
max-lease-time 7200;                    # default is 2 hours

subnet 172.16.141.0 netmask 255.255.255.0 {
 range 172.16.141.128 172.16.141.254;
 option broadcast-address 172.16.141.255;
 option domain-name-servers 172.16.141.2;
 option domain-name localdomain;
 default-lease-time 1800;                # default is 30 minutes
 max-lease-time 7200;                    # default is 2 hours
 option netbios-name-servers 172.16.141.2;
 option routers 172.16.141.2;
}
host vmnet8 {
 hardware ethernet 00:50:56:C0:00:08;
 fixed-address 172.16.141.1;
 option domain-name-servers 0.0.0.0;
 option domain-name "";
 option routers 0.0.0.0;
}
####### VMNET DHCP Configuration. End of "DO NOT MODIFY SECTION" #######

host jenkins {
        hardware ethernet 00:0c:29:ff:39:df;
        fixed-address 172.16.141.50;
}

host steeplechase {
        hardware ethernet 00:0c:29:5a:8e:75;
        fixed-address 172.16.141.51;
}

host linux64-negatus-01 {
        hardware ethernet 00:0c:29:b7:fe:99;
        fixed-address 172.16.141.52;
}

host linux64-negatus-02 {
        hardware ethernet 00:0c:29:69:b0:0f;
        fixed-address 172.16.141.53;
}
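
Because Fusion can overwrite hand edits, it may help to regenerate the host entries from a small table. A sketch under that assumption; the MAC and IP values are the ones from the config above, while the helper names are invented. It also checks that each fixed address sits outside the dynamic range (172.16.141.128 to 172.16.141.254) so the DHCP server cannot hand the same address to another VM.

```python
import ipaddress

HOSTS = [
    ("jenkins", "00:0c:29:ff:39:df", "172.16.141.50"),
    ("steeplechase", "00:0c:29:5a:8e:75", "172.16.141.51"),
    ("linux64-negatus-01", "00:0c:29:b7:fe:99", "172.16.141.52"),
    ("linux64-negatus-02", "00:0c:29:69:b0:0f", "172.16.141.53"),
]


def host_stanza(name, mac, ip):
    """Render one dhcpd host block in the same layout as the entries above."""
    return ("host {name} {{\n"
            "        hardware ethernet {mac};\n"
            "        fixed-address {ip};\n"
            "}}\n").format(name=name, mac=mac, ip=ip)


def outside_dynamic_range(ip, start="172.16.141.128", end="172.16.141.254"):
    """True if ip is not inside the range the DHCP server allocates from."""
    addr = ipaddress.ip_address(ip)
    return not (ipaddress.ip_address(start) <= addr <= ipaddress.ip_address(end))


def render(hosts):
    """Render all host blocks, separated by blank lines."""
    return "\n".join(host_stanza(*host) for host in hosts)
```

The output of render(HOSTS) can be appended after the "DO NOT MODIFY SECTION" whenever Fusion resets the file.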

Restart the vmware network.

sudo /Applications/VMware\ Fusion.app/Contents/Library/vmnet-cli --configure
sudo /Applications/VMware\ Fusion.app/Contents/Library/vmnet-cli --stop
sudo /Applications/VMware\ Fusion.app/Contents/Library/vmnet-cli --start

This article also discusses this issue.

And we can see that all four machines are up and running by pinging them from Terminal on the same machine (they are not visible outside of the Mac):

sydpolkzillambp:~ spolk$ ping -c 1 172.16.141.50
PING 172.16.141.50 (172.16.141.50): 56 data bytes
64 bytes from 172.16.141.50: icmp_seq=0 ttl=64 time=0.355 ms

--- 172.16.141.50 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.355/0.355/0.355/0.000 ms
sydpolkzillambp:~ spolk$ ping -c 1 172.16.141.51
PING 172.16.141.51 (172.16.141.51): 56 data bytes
64 bytes from 172.16.141.51: icmp_seq=0 ttl=64 time=0.263 ms

--- 172.16.141.51 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.263/0.263/0.263/0.000 ms
sydpolkzillambp:~ spolk$ ping -c 1 172.16.141.52
PING 172.16.141.52 (172.16.141.52): 56 data bytes
64 bytes from 172.16.141.52: icmp_seq=0 ttl=64 time=0.338 ms

--- 172.16.141.52 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.338/0.338/0.338/0.000 ms
sydpolkzillambp:~ spolk$ ping -c 1 172.16.141.53
PING 172.16.141.53 (172.16.141.53): 56 data bytes
64 bytes from 172.16.141.53: icmp_seq=0 ttl=64 time=0.331 ms

--- 172.16.141.53 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.331/0.331/0.331/0.000 ms
sydpolkzillambp:~ spolk$

OK, so next we'll put it all together.

Tuesday, July 1, 2014

Installing Virtual Center

Actually, the first thing I install on a new VM is Firefox. Duh.

Once that is done, I went to the IP address of the ESX machine. It gives you the option of installing vSphere Client, which is what we will use here.

I downloaded and installed it. Great.

Where are the ISOs to install VMs with? Checking with coworker...

After updating stuff

The Linux machines running Negatus need these packages:

sudo apt-get install git g++ libnspr4-dev

You can then clone it, build it, and run it:

mkdir src
cd src
git clone https://github.com/mozilla/Negatus
cd Negatus
make -f Makefile.linux
./agent

When you run the agent, it looks like this:

%  ./agent
Command handler listening on 0.0.0.0:20701
Heartbeat handler listening on 0.0.0.0:20700
Query url: IPADDR=0.0.0.0%3A20701&NAME=SUTAgent
No SUTAgent.ini data.
No reboot callback data.

That's great. Ctrl-C to quit, shut down the VM, copy or clone it, and then launch both agents. Next task: Networking
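
Once both agents are launched, a quick check that each one is listening saves some guessing later. A small sketch using a plain TCP connect; port 20701 is the command port printed by the agent above, and port_open is a name I made up.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all as "not listening".
        return False
```

Calling port_open with each client's address and 20701 before starting a run tells you whether the agent survived the clone and reboot.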

Back to lab building

Doing two things at once:

- Starting to play with an ESX server a coworker set up.
- Continuing building out a home lab.

So, ESX requires Virtual Center to actually build machines. You download Virtual Center with your browser and install it.

Oh, yeah. It requires Windows. Sigh.

So I decided to make a VM for it. Making a Windows 8 VM is a pain because you have to do registry tricks so as not to be presented with swipe panels you can't dismiss. So I decided to make a Windows 7 VM. Straightforward. Except that now I am waiting for 145 Windows Updates to download....

Meanwhile, back to building my lab. So, I need to build clients now. Time for two more Ubuntu VMs. Running out of memory, so these will be 2 core/2 GB machines (we'll try that).

And, of course, Linux has its own set of updates....

And the Macs are now wanting to update...

Lots of waiting at times when building labs. More later.

Monday, June 30, 2014

Resume screening burnout

While taking care of getting cars fixed in anticipation of summer road trips, I spent two days screening resumes. I am burned out!

So, going back to building my lab tomorrow.

Thursday, June 26, 2014

Hiring

So, we are hiring in my group. Here is the link to the job posting:

https://hire.jobvite.com/Jobvite/jobvite.aspx?b=nkspCnwF

We are looking for people with automation skills. What does that mean, you might ask? Well, here is what I think.

First of all, automation engineers are developers. They have good skills at developing automation software. At Mozilla, that means that they know Python, JavaScript, C, or C++. They know how to code and debug.

However, they are also QA engineers. They have a desire to break software to make it better. They have a desire to have some level of comfort in measurable quality. They are the kind of people who break websites while trying to do normal things like buy a shirt or log in to their bank. And they always want to know why it breaks, and may try to figure it out because they are angry, they are curious, or they enjoy the thrill of a problem solved.

These kinds of people are very hard to find. Most people with coding skills want to work on some other kind of software than automation. Most really good testers don't necessarily have the technical skills to write automation.

I am plowing through a bunch of resumes this afternoon. Most of them are from candidates who are submitting their resume to every position that they can find regardless of qualifications. While it is true that there are many fewer jobs in tech than there are people who want them, it is also true that the number of qualified candidates is very small. There is no such thing as a candidate that matches all of the job requirements perfectly. My philosophy is looking for somebody who is smart, has a track record of success, knows some of your skills, has knowledge in similar skills, and has a track record of learning. Add on top of that some requirement of social skills, and you have a good candidate.

It's a large job standing in front of the firehose and screening resumes. It's fun, but at the end of the day, I am glad it is my boss's position and I am only helping out.

Wednesday, June 25, 2014

Steeplechase

In order to run our multi-machine tests, we are going to need some kind of process runner which can execute remote-control commands on the test machines. Our solution uses a technology called Steeplechase (https://github.com/mozilla/steeplechase), which has a simple command language. It uses another technology called signalingserver (https://github.com/luser/simplesignalling). This in turn relies on node.js (http://nodejs.org/), which allows a server to run JavaScript without having to have a browser.

So, first I created another Linux VM to run these pieces in. It won't need as many cores nor as much RAM as the Jenkins machine, but it will potentially need some more disk space for logs or binaries. So, the specs on this machine:

2 cores
2 GB RAM
60 GB disk

Next, I installed the signalling server and its dependencies. I ran the following:

sudo apt-get install git nodejs npm
npm install socket.io

Now, I ran this to get the signaling server source:

git clone https://github.com/luser/simplesignalling

Finally, I started it up:

cd simplesignalling
nodejs ./server.js

Now, for steeplechase, we need the following dependencies:

sudo apt-get install python-setuptools

Clone the steeplechase repo:

git clone https://github.com/mozilla/steeplechase

We need to bootstrap the python environment for steeplechase.

cd steeplechase
sudo python setup.py install

The machine is now set up and ready to run steeplechase, but without test clients, there is nothing for steeplechase to talk to. Next installment, setting up the test clients.
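
Before moving on, it is worth checking that the command-line tools used in the steps above are really on the PATH, especially after cloning the VM. A small sketch in Python 3 (it relies on shutil.which, which Python 2 does not have); the tool list is simply what the commands above invoke, and missing_tools is a name I made up:

```python
import shutil

# Command-line tools invoked by the setup steps above.
REQUIRED_TOOLS = ("git", "nodejs", "npm", "python")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from the list that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

The which parameter is only there so the lookup can be swapped out when testing the function itself.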

Tuesday, June 24, 2014

Starting to build out a proto-system

I have a meeting with the person who can provide me with final hardware and lab space today. In the meantime, I am building the prototype system.

Let me talk about the basic setup.

Assumptions:

Each test run requires two machines to run the actual tests.

Each machine has an agent called Negatus on it.
Large pool of these machines to run tests from.

Those machines are controlled by a controlling agent called steeplechase. This runs on another machine.
There needs to be a TURN server on some machine.
There may need to be more VMs created to provide networking gateways/routing to simulate network conditions.
There needs to be some kind of scheduler to run the tests. We are going to use Jenkins (http://jenkins-ci.org/). It's easy, and we should not run into its limitations.
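
To make those assumptions concrete, here is a rough sketch in Python 3 of how the lab could be modelled: a list of machines with roles, and a helper that groups the Negatus clients into pairs, one pair per test run. The class, the role names, and the host names are all hypothetical; nothing here is steeplechase's or Jenkins's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str  # hypothetical host name
    role: str  # "client", "steeplechase", "turn", "gateway", or "jenkins"


def pair_clients(machines):
    """Group the client machines into pairs, one pair per test run.

    If there is an odd number of clients, the last one is left unpaired.
    """
    clients = [m for m in machines if m.role == "client"]
    return [(clients[i], clients[i + 1])
            for i in range(0, len(clients) - 1, 2)]


# Example lab layout (all names are placeholders):
LAB = [
    Machine("jenkins-vm", "jenkins"),
    Machine("controller-vm", "steeplechase"),
    Machine("turn-vm", "turn"),
    Machine("client-a", "client"),
    Machine("client-b", "client"),
]
```

With the example layout above, pair_clients(LAB) yields a single pair made of client-a and client-b.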

So, I will set up the last one right now, the Jenkins machine.

At this point, I should point out that this is a large number of VMs for one desktop machine to host on VMWare Fusion. I may have to host some of them on a second machine.

Yesterday I built a Ubuntu 14.10 Desktop VM in VMWare Fusion. I use Desktop because I like having X-Windows. Everything else is easy to install. The initial specs:

4 cores
8192 MB RAM
100 GB disk

Jenkins loves RAM. Well, it's a Java app, and all Java apps love RAM. Also, all logs and intermediate results from Jenkins processes end up being stored on the host, so the host needs some disk space. And giving this machine 4 cores allows one to administer the machine without shutting down Jenkins.

I then ran all of the software updates and installed VMWare Tools, which you need for the clock to work and to be able to copy-paste into apps.

It used to be that you had to install Java, which on Linux can be a pain, and set up your own script to run Jenkins, and set up your own startup script, etc. In the last couple of years, somebody smart made Jenkins an installable Debian package; details here: http://pkg.jenkins-ci.org/debian/. If you are having trouble figuring out how to change /etc/apt/sources.list, see this page for info: http://askubuntu.com/questions/197564/how-do-i-add-a-line-to-my-etc-apt-sources-list.

At this point, Jenkins is running (see http://localhost:8080 to verify).
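
If you would rather verify from a script than from a browser, something like the following works. It is a minimal sketch in Python 3 using only the standard library; http_status is a name I made up, and the URL is the one mentioned above.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if nothing answers."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 403 or 503.
        return error.code
    except OSError:
        # Connection refused, timeout, DNS failure, and so on.
        return None


# Usage:
#   print(http_status("http://localhost:8080"))
```

Any status code at all means that something is listening on that port; None means nothing answered. It does not prove that the thing answering is Jenkins.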

Tomorrow: Setting up signalingserver, steeplechase, negatus...

Monday, June 23, 2014

What we are going to tackle next

So, now that we have our basic Sunny Day environment more-or-less running, it's time to figure out what to do next.

We have identified two areas as fertile ground for further testing using the Sunny Day environment.

Establish WebRTC connection - This set of tests will test connecting between two clients in a variety of network situations. Networks with different characteristics will either be set up or simulated with test doubles. We will be testing with high-latency, low-bandwidth, high packet loss and other pathological environments. We will also be testing in various configurations with NAT, firewalls, and other different network topologies.
Quality of established connection - Once the connection is made, how well does it hold up when various bad things happen to the network? We will also attempt to test audio/video quality.

We are first going to write test plans for this and then implement tests to the plan. Other members of the team are primarily responsible for these efforts.

I will be working on the environmental problem. Basically, we have to run in a wide variety of environments:

Different platforms - Linux, Windows, Mac OS X on the desktop; Android and Firefox OS for mobile.
Different Firefox versions - We will test nightly connecting with the various public releases of Firefox - Nightly, Aurora, Beta, Release and Extended Release for Desktop; as-yet-unknown for mobile.
Potentially different browsers - Chrome, Opera
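
To get a feel for how quickly this grows, here is a small sketch in Python 3 that enumerates just the desktop pairings: a Nightly build on one platform connecting to a peer on some platform running some release channel. The platform and channel names come from the list above; the function name and the tuple layout are my own illustration, not an existing test plan.

```python
from itertools import product

PLATFORMS = ("Linux", "Windows", "Mac OS X")
CHANNELS = ("Nightly", "Aurora", "Beta", "Release", "Extended Release")


def desktop_matrix():
    """List every desktop pairing of a Nightly client with a peer.

    Each entry is (nightly_platform, peer_platform, peer_channel).
    """
    return list(product(PLATFORMS, PLATFORMS, CHANNELS))


if __name__ == "__main__":
    matrix = desktop_matrix()
    print("%d desktop combinations" % len(matrix))
```

With three platforms and five channels, that is already 3 x 3 x 5 = 45 combinations, before adding mobile, other browsers, or different network conditions.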

I will be building an environment using both VMWare ESX and VMWare Fusion, with a smattering of bare hardware. The job control software will be Jenkins, with the rest of the machines closely matching the configuration of the Sunny Day environment.

I have to build out the hardware (or have it built out), but I also have to develop the Jenkins instance configuration, find a way to store it in source control, and write scripts to build machines up. We have a few that we use in the Sunny Day environment; I plan on expanding that work.

Should be fun!

What we have done so far

So, before I start blogging-as-I-go, I should explain what we have been doing so far.

We set up some machines in a lab that we have designated our Sunny Day environment. This is designed to test WebRTC with nightly Firefox builds on Linux with real audio and video hardware connected.

Our setup consists of the following:

One Linux machine running our test harnesses, server.js (run with node), and steeplechase. server.js is a signalling server, and steeplechase manages commands to remote clients, in this case the other machines in the fleet. This machine also runs a TURN server, which relays traffic between clients that cannot reach each other directly; this is necessary when the clients are behind a firewall/NAT.
Two Linux machines, each running our test agent, called Negatus. steeplechase sends files and commands to Negatus, and Negatus launches Firefox and runs Firefox test commands. These machines each have an mp3 player and a video camera (trained on a clock in the lab).

We run a series of tests connecting the two clients and sending 3 hours of video and audio back and forth. When the system is finished, it will report results into Mozilla's new result reporting mechanism, treeherder, where interested engineers can see the results of the tests on the web.

Sunny Day provides a level of confidence: if the tests pass, we know that our nightly Firefox can establish WebRTC sessions on a real network with real video/audio streaming. It is a great first step.

The next entry in this blog will talk about what we are doing next.

Hello, and Welcome

Hi. My name is Syd Polk, and I recently joined Mozilla. It's a great place, and it is really great to be able to be open about what I work on.

My position is officially titled Technical Lead for Platform QA. So, what does that really mean? I am still figuring it out, but basically, Mozilla is divided into several engineering groups. We have a centralized QA group, with engineers assigned to test various technologies. The group responsible for the base technology underneath everything is called the Platform Group, and until very recently, it has not had dedicated QA.

You are meeting the only dedicated Platform QA Engineer by reading this blog post. Now, most of the platform technology is tested in the other groups. Platform is used to build the browser, for instance, and there are people who test the browser. I was hired to help identify the cracks where other technology does not test something adequately.

One area that we identified right before I joined was WebRTC (http://www.webrtc.org), the technology that allows Firefox users to connect to each other for real-time video/audio chat sessions. Our existing automation for this feature ran the network test on the build machines we have here. They have no external network connections, so the tests used fake audio/video streams on two browser instances on the same machine running only on Linux. While this does verify that the basic connection mechanism is not fatally broken, it does not test interesting network setups, different network characteristics, or video/audio stream quality. It also does not test the WebRTC connections on heterogeneous platforms, nor with differing versions of Firefox. Or, for that matter, with other browsers that support WebRTC, like Chrome or Opera.

So, right now, I am working with a few Mozilla engineers to build a system to address some of those concerns. Should be fun!