Thursday, July 31, 2014

Finally.

I have a Jenkins instance running the regression suite on Firefox nightly builds in all platform combinations of linux64, linux32, and macosx. The front page looks like this:


So, I have proof-of-concept. I am demoing this at our QA Work Week QA Fair later today. Should be fun!

Lots of work to do, however.


  1. I need to investigate using parameterized builds. Having to create and maintain a separate Jenkins job for each combination is painful (I have a lot of experience with this from Coverity, alas).
  2. Henrick Skupin has a Jenkins instance, and he has solved the problem of the nightly version numbers changing every month. Need to implement that.
  3. Need to add jobs for nightly versions connecting to Aurora, Beta, Release and ERS versions of Firefox.
  4. The test is not incredibly valid. I need to take the test additions done by Geo Mealer in our Sunny Day Environment and run the connections for 1 minute each. I really want to do the parameterization first so I don't have to update dozens of jobs.
  5. I need to put this in our ESX farm. However, I only want to do this when most of the above is done. I am going to set up Jenkins on the ESX farm, running linux64 regressions for now.
  6. I have got to get Jenkins and Steeplechase working for Windows. While I am in Mountain View this week, I do not have access to my Windows VMs. Will have to wait until I get home next week.
  7. Sunny Day Environment does most of its job maintenance and triggering via cron jobs. It would be nice to move this into Jenkins.
  8. Need to report results of test runs into Treeherder. This is a pretty big effort.
  9. And then, there is B2G and Android.
  10. Steeplechase changes
    1. Run with existing binaries but new profile. (or whatever; independent control)
    2. Send binary archives down instead of directories, and tell clients how to unpack them.
    3. Steeplechase should talk to treeherder.
Ought to keep me busy.

Wednesday, July 30, 2014

Another fun fact: For Negatus to be able to run firefox for the tests, it has to run it in a display. Without setting up a fake X session, the Negatus client has to be logged in, at least on Linux. I am sure that this will be true on other platforms as well. For Linux, we could run in a virtual frame buffer, but I am not sure that this is necessary. Just set up the account the test will run in to auto login. Make sure it is on a private network, though...

OK, it's just too hard to nail down all of the libraries to run firefox32 on linux64. I give up. Average users won't do this.

A-ha!

You actually have to do work to run 32-bit binaries on 64-bit linux. I knew this, but I rediscovered this fact this morning.

Of course, this got much harder in modern Ubuntu:

sudo apt-get install libxtst6:i386 libXext6:i386 libxi6:i386 libncurses5:i386 libxt6:i386 libxpm4:i386 libxmu6:i386 libxp6:i386
sudo apt-get install libstdc++6-4.8-dbg:i386

Sigh.

Wednesday, July 23, 2014

Three things I am working on - follow-up

Thing 1 - Bugzilla 1036992 - I have a patch now which completes the split up of test_seek.html, including refactoring of common javascript in test_seek-split*.html files. Figuring out how to add the javascript so that it was actually available was fun. Basically, everything goes in mochitest.ini. Who new?

Thing 2 - mochitest dying on my VM. Well, it still happens, and it often does not leave a stack dump. Still working on this one.

Thing 3 - Jenkins and nodejs. I restarted everything and it started working. Huh. Weird.

So, now I am setting up Jenkins in Mountain View to reproduce the result. Unfortunately, when the physical box was moved into the lab, the networking broke. All of the existing VMs have IPv6 addresses rather than the static addresses I thought were assigned.

Coworker in charge of box said he would look at that.

So, now I am building linux32 builders for this test on my home VM. I am working out of one of my relative's houses in the Midwest right now, and her internet connection was SLOW. Well, the upgrade came through yesterday, and now it is as fast as my home network in Texas.

It helps.


Tuesday, July 22, 2014

Three things I am working on

The first one is this bugzilla. Basically, the audio/video tests sometimes time out in Mozilla's build/test environment, and we are trying to track down which tests are sensitive. We run a lot of tests in Amazon's S3 cloud, and disk and network access are not predictable in that environment.

My boss and I have submitted a few patches to unparallelize some of the tests to see if it helps. Latest patch is run by our environment in tbpl, here. I need to run quite a few more tests and analyze them today. One thing I had to learn was Mercurial queues; using this is the best way to work with patches with Firefox and Bugzilla. More info here and here. You have to be really careful with them, as it is easy to blow away work. Still, it's a really nice system for managing patches. In git, you would do the same with local branches, but it's not quite as easy.

The second is a test time out running mochitest on a subdirectory on a Mac OS X VM. Mochitest is the oldest Firefox test suite. More info on it can be found here. I chose to run it from a build tree, so I had to go build Firefox on Mac. Info on that here. I then ran the following:


./mach mochitest-plain content/media/test


And then watched the magic! Note that this is the same test suite as we are watching with the first problem I am working on. Sometimes, I get a test failure on my VM that nobody else seems to be running into, so I am working on trying to get enough data to file a bug. A coworker directed me to try out setting MINIDUMP_STACKWALK before running the test. Once I dug enough for somebody to tell me that this tools was part of a another mercurial repo (http://hg.mozilla.org/build/tools), I tried it. No dice. Need to submit that bug report today.

And last, I am having trouble getting steeplechase to run in my Jenkins lab. The latest:


cmd: ['/tmp/tests/steeplechase/app/firefox', '-no-remote', '-profile', '/tmp/tests/steeplechase/profile', 'http://172.16.141.51:55293/index.html']
Traceback (most recent call last):
  File "/home/mozilla/jenkins/workspace/linux64-linux64/steeplechase/steeplechase/runsteeplechase.py", line 311, in 
    sys.exit(0 if main(sys.argv[1:]) else 1)
  File "/home/mozilla/jenkins/workspace/linux64-linux64/steeplechase/steeplechase/runsteeplechase.py", line 301, in main
    html_pass_count, html_fail_count = test.run()
  File "/home/mozilla/jenkins/workspace/linux64-linux64/steeplechase/steeplechase/runsteeplechase.py", line 187, in run
    passes, failures = result
TypeError: 'NoneType' object is not iterable
Exception in thread Client 1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/home/mozilla/jenkins/workspace/linux64-linux64/steeplechase/steeplechase/runsteeplechase.py", line 100, in run
    output = dm.shellCheckOutput(cmd, env=env)
  File "/usr/local/lib/python2.7/dist-packages/mozdevice-0.37-py2.7.egg/mozdevice/devicemanager.py", line 395, in shellCheckOutput
    raise DMError("Non-zero return code for command: %s (output: '%s', retval: '%s')" % (cmd, output, retval))
DMError: Non-zero return code for command: ['/tmp/tests/steeplechase/app/firefox', '-no-remote', '-profile', '/tmp/tests/steeplechase/profile', 'http://172.16.141.51:55293/index.html'] (output: 'r', retval: '256')

Exception in thread Client 2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/home/mozilla/jenkins/workspace/linux64-linux64/steeplechase/steeplechase/runsteeplechase.py", line 100, in run
    output = dm.shellCheckOutput(cmd, env=env)
  File "/usr/local/lib/python2.7/dist-packages/mozdevice-0.37-py2.7.egg/mozdevice/devicemanager.py", line 395, in shellCheckOutput
    raise DMError("Non-zero return code for command: %s (output: '%s', retval: '%s')" % (cmd, output, retval))
DMError: Non-zero return code for command: ['/tmp/tests/steeplechase/app/firefox', '-no-remote', '-profile', '/tmp/tests/steeplechase/profile', 'http://172.16.141.51:55293/index.html'] (output: 'r', retval: '256')


Need to see what that is about...

[LAB] Setting up for jenkins

Packages required on Jenkins machine:

  • git
Packages required on Steeplechase machine:
  • openjdk-7-jre-headless
  • curl
Jenkins plugins:
  • Git plugin
  • Mercurial plugin
I am afraid I finished the buildout of this without good notes. Sorry!

Tuesday, July 15, 2014

New task

I was given a new area to investigate on top of building out a lab. Basically, we have some media streaming tests in our tree, in <mozilla-central>/content/media/test, and they are flaky. I have been asked to investigate a couple of things, using test_seek.html as an example:

- The tests actually get run in parallel. I have been asked to see if running them singly will affect the intermediate failure rate.
- I have been asked to split this file up. It currently calls 13 sub-files. I have already generated the first of those 13 files and it works fine.

All of this requires checking out the Firefox source (instructions here), and then running Mochitest (instructions here). I can run the individual test.

If do add a test file in a directory, such as <mozilla-central>/content/media/test, you have to add it to the mochitest.ini file in that directory to be picked up by the system.

I have two build trees (my Mercurial foo is low, and I am lazy), one to generate patches for the split out of the tests, and one to generate patches for our try system running only one test at a time.

Alas, I am not at my house; I am with family in another state, and the internet is slow here. It will be upgraded soon, but this is taking a while...

Thursday, July 10, 2014

OK, time to step back

Occasionally, you reach some little milestone, and it helps to make a To Do list. So, here is what is left to have this lab up and running now that I have 3 machines that can run tests.


1. Get Negatus to run on boot for the two client machines.
2. On my home lab, develop the Jenkins scripts and job templates that will do the work.
3. Install the Jenkins instance in the ESX lab.
4. Port the Jenkins work from the home lab to the ESX lab.
5. Start adding platforms.

That's a good general list.

Tomorrow is a test day, so I probably won't be doing my with this stuff. Let's what I can get done today.


Wednesday, July 9, 2014

Running the tests

One of my coworkers is going to post a public document on how to get steeplechase and Negatus running. Once he posts that, I will repost the link here. Basically, it looks like this:

- One machine needs to run simplesignaling. This is a nodejs-based server to facilitate Firefox communication.
- A machine needs to run steelpechase. This can be the same machine as the one that runs simplesignalling, but not required.
- Each of the client machines runs Negatus, which is a test agent.

The steeplechase machine needs to download the firefox binaries and tests from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/. The binaries and test files have to be de-archived. Steeplechase can then be run:

% tar xvfj firefox-33.0a1.en-US.linux-x86_64.tar.bz2 
% mkdir tests
% cd tests/
% unzip ../firefox-33.0a1.en-US.linux-x86_64.tests.zip 
% mkdir ~/logs
% python ~/src/steeplechase/steeplechase/runsteeplechase.py --binary /home/mozilla/firefox-releases/firefox/firefox --specialpowers-path /home/mozilla/firefox-releases/tests/steeplechase/specialpowers --prefs-file /home/mozilla/firefox-releases/tests/steeplechase/prefs_general.js --signalling-server 'http://192.168.1.2:8080/' --html-manifest /home/mozilla/firefox-releases/tests/steeplechase/tests/steeplechase.ini --save-logs-to ~/logs/ --host1 192.168.1.3:20701 --host2 192.168.1.4:20701
steeplechase INFO | Pushing app to Client 1...
steeplechase INFO | Pushing app to Client 2...
Writing profile for Client 1...
Pushing profile to Client 1...
cmd: ['/tmp/tests/steeplechase/app/firefox', '-no-remote', '-profile', '/tmp/tests/steeplechase/profile', 'http://192.168.1.3:38439/index.html']
Writing profile for Client 2...
Pushing profile to Client 2...
cmd: ['/tmp/tests/steeplechase/app/firefox', '-no-remote', '-profile', '/tmp/tests/steeplechase/profile', 'http://192.168.1.4:38439/index.html']
steeplechase INFO | Waiting for results...
steeplechase INFO | All clients finished
steeplechase INFO | Result summary:
steeplechase INFO | Passed: 118
steeplechase INFO | Failed: 0
%


I now have this working on both my lab at home and the ESX lab in Mountain View. Next: making it work autonomously.

Bootstrapping a data center remotely

Using a VM running vSphere connecting to an ESX server 2000 miles away over VPN, and using VNC to connect to a known Linux box on the same network as the ESX box, I was able to download enough ISOs to create a Windows 7 VM. The idea is to put vSphere Client on it, and use Microsoft Remote Desktop to connect to it. I can then download ISOs to its hard drive and create VMs with "ISO on local disk" option.

Had to download MS RDC. It's free now. This is great.

I am also starting to build up the first linux box. I used VNC on the other machine to download the ISO of Ubuntu 14.10 Desktop. Used the VSphere Client to install the machine. Of course, I had to use the vSphere Client to access the Desktop. Doing this from RDC is painful; typing often results in duplicated characters. So I installed openssh-server:

sudo apt-get install openssh-server
sudo /etc/init.d/ssh restart

And then I could ssh in from a Terminal on my machine. Ah, much better.

I still want desktop access, though. I enabled Desktop Sharing on the Linux VM. Alas, the Mac Screen Sharing could not connect to it, although it could connect to an Ubuntu 12 machine. My coworker point me to this article, and now the Linux machine is good to go.


Monday, July 7, 2014

ESX, remotely

OK, somebody gave me VNC to a linux box in the office over VPN, so I can download ISOs. MSDN's sight has a really really bad captcha on it, and using a password manager was a pain, and it did 2-level, but I'm safe now, right?

Anyway, I can now download ISOs onto a machine on the same network as ESX box. Need to figure out how to get ESX to mount the newly created NFS export where the ISO lives.

OK, I figured it out. You create the VM in vSphereClient. Let the boot fail. Then, connect the .iso stored on your data store to the DVD drive. Finally, press Ctrl-Alt-Ins, and it should boot to your ISO.

More later.

Thursday, July 3, 2014

Home lab

Now, it's time to try everything out. The steeplechase machine needs to have firefox binaries and test artifacts. So, let's go get them.

The nightly build artifacts are here. On the steeplechase machine, we need to download both firefox-33.0a1.en-US.linux-x86_64.tar.bz2 and firefox-33.0a1.en-US.linux-x86_64.tests.zip. We will then unpack them appropriately:


mkdir firefox-releases
cd firefox-releases
mv ~/Downloads/firefox* .
tar xvfz firefox*.tab.bz2
mkdir tests
cd tests
unzip ../firefox*.zip

We have to have node running on this machine:


mozilla@jenkins-steeplechase:~$ cd simplesignalling/
mozilla@jenkins-steeplechase:~/simplesignalling$ ls
package.json  README.md  server.js
mozilla@jenkins-steeplechase:~/simplesignalling$ nodejs server.js 

We need to start the agent on the two Negatus machines:


mozilla@ubuntu:~/src$ cd Negatus/
mozilla@ubuntu:~/src/Negatus$ git pull
Already up-to-date.
mozilla@ubuntu:~/src/Negatus$ ./agent
Command handler listening on 0.0.0.0:20701
Heartbeat handler listening on 0.0.0.0:20700
Query url: IPADDR=0.0.0.0%3A20701&NAME=SUTAgent
No SUTAgent.ini data.
No reboot callback data.



Mmm. It looks like I forgot a step. Running server.js should have output something. Looking back on our internal notes, I needed to run this:
npm install socket.io@0.9.6


If you do that from the simplesignalling directory, it will fail kind of like:
npm ERR! Error: Invalid version: "0.1"
npm ERR!     at Object.module.exports.fixVersionField (/usr/lib/nodejs/normalize-package-data/lib/fixer.js:178:13)
npm ERR!     at /usr/lib/nodejs/normalize-package-data/lib/normalize.js:29:38
npm ERR!     at Array.forEach (native)
npm ERR!     at normalize (/usr/lib/nodejs/normalize-package-data/lib/normalize.js:28:15)


Once you install this correctly, then server.js will output something correctly:
mozilla@jenkins-steeplechase:~/simplesignalling$ nodejs server.js 
   info  - socket.io started


Now, we are ready to try to run steeplechase.
mozilla@jenkins-steeplechase:~/steeplechase$ python `pwd`/steeplechase/runsteeplechase.py --binary /home/mozilla/firefox-releases/firefox/firefox --specialpowers-path /home/mozilla/firefox-releases/tests/steeplechase/specialpowers --prefs-file /home/mozilla/firefox-releases/tests/steeplechase/prefs_general.js --signalling-server 'http://172.16.141.51:8080/' --html-manifest /home/mozilla/firefox-releases/tests/steeplechase/tests/steeplechase.ini --host1 172.16.141.52:20701 --host2 172.16.141.53:20701
steeplechase INFO | Pushing app to Client 1...
steeplechase INFO | Pushing app to Client 2...
Writing profile for Client 1...
Pushing profile to Client 1...
cmd: ['/tmp/tests/steeplechase/app/firefox', '-no-remote', '-profile', '/tmp/tests/steeplechase/profile', 'http://172.16.141.51:59367/index.html']
Writing profile for Client 2...
Pushing profile to Client 2...
cmd: ['/tmp/tests/steeplechase/app/firefox', '-no-remote', '-profile', '/tmp/tests/steeplechase/profile', 'http://172.16.141.51:59367/index.html']
steeplechase INFO | Waiting for results...
steeplechase INFO | All clients finished
steeplechase INFO | Result summary:
steeplechase INFO | Passed: 112
steeplechase INFO | Failed: 0
mozilla@jenkins-steeplechase:~/steeplechase$ 


It worked! We have a running lab on Linux now.

Next step: Get Jenkins to invoke this.

Virtual Center

The ESX server I am installing onto does not have any OS IOS images to use to make VMs. So I need to have them locally. However, I'm in Austin, and the ESX server is in Mt. View. This is not going to work. Somebody at Moz HQ is going to get up a machine I can login via Remote Desktop Connection so I can continue.


Networking home lab setup

Things work much better when you have static IP addresses. This means that your IP addresses won't change when you restart your VMs, which means any command lines you use will remain valid. VMWare Fusion does not make assigning IP addresses easy, but it can certainly be done.

When you created your VMs, they were assigned generated MAC addresses by Fusion. Need to retrieve those here:

Virutal Machine -> Network Adapter -> Network Adapter Settings... Turn down the "Advanced Options" disclosure triangle...



Once you have the Mac addresses for your VMs, you can change your config. One source I drew heavily on is here; you have to restart your network services if VMWare is running. And you have to be careful; Fusion loves to blow away your changes.

On my machine, here is the mapping:


# Configuration file for ISC 2.0 vmnet-dhcpd operating on vmnet8.
#
# This file was automatically generated by the VMware configuration program.
# See Instructions below if you want to modify it.
#
# We set domain-name-servers to make some DHCP clients happy
# (dhclient as configured in SuSE, TurboLinux, etc.).
# We also supply a domain name to make pump (Red Hat 6.x) happy.
#


###### VMNET DHCP Configuration. Start of "DO NOT MODIFY SECTION" #####
# Modification Instructions: This section of the configuration file contains
# information generated by the configuration program. Do not modify this
# section.
# You are free to modify everything else. Also, this section must start 
# on a new line 
# This file will get backed up with a different name in the same directory 
# if this section is edited and you try to configure DHCP again.

# Written at: 05/14/2014 16:13:27
allow unknown-clients;
default-lease-time 1800;                # default is 30 minutes
max-lease-time 7200;                    # default is 2 hours

subnet 172.16.141.0 netmask 255.255.255.0 {
 range 172.16.141.128 172.16.141.254;
 option broadcast-address 172.16.141.255;
 option domain-name-servers 172.16.141.2;
 option domain-name localdomain;
 default-lease-time 1800;                # default is 30 minutes
 max-lease-time 7200;                    # default is 2 hours
 option netbios-name-servers 172.16.141.2;
 option routers 172.16.141.2;
}
host vmnet8 {
 hardware ethernet 00:50:56:C0:00:08;
 fixed-address 172.16.141.1;
 option domain-name-servers 0.0.0.0;
 option domain-name "";
 option routers 0.0.0.0;
}
####### VMNET DHCP Configuration. End of "DO NOT MODIFY SECTION" #######

host jenkins {
        hardware ethernet 00:0c:29:ff:39:df;
        fixed-address 172.16.141.50;
}

host steeplechase {
        hardware ethernet 00:0c:29:5a:8e:75;
        fixed-address 172.16.141.51;
}

host linux64-negatus-01 {
        hardware ethernet 00:0c:29:b7:fe:99;
        fixed-address 172.16.141.52;
}

host linux64-negatus-02 {
        hardware ethernet 00:0c:29:69:b0:0f;
        fixed-address 172.16.141.53;
}


Restart the vmware network.

sudo /Applications/VMware\ Fusion.app/Contents/Library/vmnet-cli --configure
sudo /Applications/VMware\ Fusion.app/Contents/Library/vmnet-cli --stop
sudo /Applications/VMware\ Fusion.app/Contents/Library/vmnet-cli --start

This article also discusses this issue.

And we can see that all four machines are up and running by pinging them from Terminal on the same machine (they are not visible outside of the Mac):


sydpolkzillambp:~ spolk$ ping -c 1 172.16.141.50
PING 172.16.141.50 (172.16.141.50): 56 data bytes
64 bytes from 172.16.141.50: icmp_seq=0 ttl=64 time=0.355 ms

--- 172.16.141.50 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.355/0.355/0.355/0.000 ms
sydpolkzillambp:~ spolk$ ping -c 1 172.16.141.51
PING 172.16.141.51 (172.16.141.51): 56 data bytes
64 bytes from 172.16.141.51: icmp_seq=0 ttl=64 time=0.263 ms

--- 172.16.141.51 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.263/0.263/0.263/0.000 ms
sydpolkzillambp:~ spolk$ ping -c 1 172.16.141.52
PING 172.16.141.52 (172.16.141.52): 56 data bytes
64 bytes from 172.16.141.52: icmp_seq=0 ttl=64 time=0.338 ms

--- 172.16.141.52 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.338/0.338/0.338/0.000 ms
sydpolkzillambp:~ spolk$ ping -c 1 172.16.141.53
PING 172.16.141.53 (172.16.141.53): 56 data bytes
64 bytes from 172.16.141.53: icmp_seq=0 ttl=64 time=0.331 ms

--- 172.16.141.53 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.331/0.331/0.331/0.000 ms
sydpolkzillambp:~ spolk$ 

OK, so next we'll put it all together.

Tuesday, July 1, 2014

Installing Virtual Center

Actually, the first I install on a new VM is Firefox. Duh.

Once that is done, I went to the IP address of the ESX machine. It gives you the option of installing vSphere  Client, which is what we will use here.





I downloaded and installed it. Great.

Where are the ISOs to install VMs with. Checking with coworker...


After updating stuff

The Linux machines running Negatus need these packages:

sudo apt-get install git g++ libnspr4-dev

You can then clone it, build it, and run it:


mkdir src
cd src
git clone https://github.com/mozilla/Negatus
cd Negatus
make -f Makefile.linux
./agent


When you run the agent, it looks like this:

%  ./agent
Command handler listening on 0.0.0.0:20701
Heartbeat handler listening on 0.0.0.0:20700
Query url: IPADDR=0.0.0.0%3A20701&NAME=SUTAgent
No SUTAgent.ini data.
No reboot callback data.


That's great. Ctrl-C to quit, shutdown the VM, copy or clone it, and then launch both agents. Next task: Networking


Back to lab building

Doing two things at once:

- Starting to pay with an ESX server a coworker set up.
- Continuing building out a home lab.

So, ESX requires Virtual Center to actually build machines. You download Virtual Center with your browser and install it.

Oh, yeah. It requires Windows. Sigh.

So I decided to make a VM for it. Making a Windows 8 VM is a pain because you have to do registry tricks so as not to be presented with swipe panels you can't dismiss. So I decided to make a Windows 7 VM. Straightforward. Except that now I am waiting for 145 Windows Updates to downloads....

Meanwhile, back to building my lab. So, I need to build clients now. Time for two more Ubuntu VMs. Running out of memory, so these will be 2 core/2 GB machines (we'll try that).

And, of course, Linux has its own set of updates....

And the Macs are now wanting to update...

Lots of waiting at times when building labs. More later.