June 12, 2014

Curtis Hovey
sinzui
Sinzui » Sinzui
Migrating juju to HP Cloud’s Horizon

HP Cloud retired its old regions this month. If you use juju on HP Cloud and your account was created before this year, you may need to update your juju config and possibly complete your project setup in Horizon.

There are two changes you must make to each HP Cloud environment listed in environments.yaml. Set the region option to “region-a.geo-1” or “region-b.geo-1”, which are US West and US East respectively. The new regions do not automatically assign public IPs. The use-floating-ip option must be set to “true” to tell juju to request a public address.

use-floating-ip: true
region: region-a.geo-1

Note that juju does not yet release the public address when you destroy the machine. You need to use the nova command line tools or the Horizon console > Project > Compute > Access & Security > Floating IPs to release them. We will release a fix for “Floating IPs are not recycled in OpenStack Havana” soon.

If you haven’t used these regions before, you need to register your public SSH keys and create a default network for each one you use. Visit Horizon console > Project > Compute > Access & Security > Key Pairs. Most people migrating just copied their public keys to the new region. Visit Horizon console > Project > Networks > Network and verify or create a default network, something like 10.0.0.0/24.

June 10, 2014
Building trans-cloud environments with juju

The Juju QA team uses juju to build the CI services that test juju. We have built 3 CIs in fact, because juju makes it easy to move services to the cloud of our choice. Our choice is driven by which cloud provides the best combination of resources, but CI has grown beyond what one cloud can provide. We want CI in all clouds to test with the resources they provide. While juju doesn’t support cross-environment relations, that hasn’t stopped the QA team from building a hybrid environment that straddles two clouds and a private network.

The steps to provision a machine on another network are codified in our add-remote-machine.bash script. The Juju QA team uses this script to add physical machines, machines in kvm, and machines launched in other clouds into our Juju environment. Once a machine is registered with a juju environment, any charmed service can be deployed to it. Adding a machine to an open network is painless.

add-remote-machine.bash juju-ci3 my-keys/juju_ci_rsa 10.10.10.3
    |                          |          |              |
  script                     env-name   ssh-key   private-ip

Adding machines on restricted networks may require firewall egress changes.

Provisioning the best resources for services

Juju provides two essential devops features:

  • A cloud-agnostic way to provision machines. Juju supports Azure, EC2, Joyent, MAAS, and OpenStack-based clouds like HP Cloud.
  • A trivial means to deploy configured services to machines. I can put Jenkins slaves and web applications into production in minutes.

The mechanism to deploy services is so valuable that juju provides a command to add existing machines to the environment. Existing machines often have special resources not provided by a simple cloud image. Using placement, I can deploy one or more services to each machine. For example, to add a machine that I want Jenkins to use, I would issue these commands to register machine number 2 in my environment, then deploy a jenkins slave to it, and finally configure the existing jenkins master to send work to it:

juju add-machine ssh:ubuntu@10.10.10.3
juju deploy --to 2 jenkins-slave ppc64-slave
juju add-relation jenkins-master ppc64-slave

Provisioning remote machines

The problem is that juju wants a private address when provisioning a machine. The solution to building hybrid environments composed of machines from other networks was understanding what network resources juju requires during and after provisioning. First, there are three parts to juju:

  • The juju client (juju command line or juju-gui) that issues commands
  • The juju state-server that manages the machines and services in the environment
  • The many juju agents (1 for each machine and service) that ask the state-server for tasks

During the act of provisioning (bootstrap or add-machine) the client acts as a bridge between the state-server and the remote machine. The private address is thus private to the host that the client is running on! Once the agent is running on the remote machine, it will talk to the state-server using the state-server’s private DNS name. The state-server doesn’t really know where the remote machine is, and it cannot access it.

Since I have ssh access to all the machines via their private address, I can add them to the juju environment. Even when I only know a machine’s public address, I can add an ssh rule that maps a random private address to the public host. As for helping the agent talk to the state-server, a single addition to the agent’s /etc/hosts is required to map the state-server’s public IP address to the private DNS name.

The script checks the required connectivity. It tries to verify common addresses that juju will use, such as the location of the environment’s private container, the server images, and juju tools. It is not authoritative. It will ask me to verify some connections that are not explicitly defined in the environment. It doesn’t know the network requirements of the services that will be deployed to the machine. This isn’t an issue for machines on open networks. Several of the Juju CI machines are on restricted networks, and I asked the IS department to allow egress to the required addresses and ports. We may improve the script’s verification support as we extend the environment to HP Cloud and Joyent.

As for how Juju makes CI easier, that is the subject of a future post.

April 21, 2014
Working with Juju, Ubuntu 12.04 Precise, and 14.04 Trusty

Juju 1.18.x is the first Juju that is aware of two Ubuntu LTS releases.
In the past, when bootstrapping the state-server or deploying a charm,
Juju selected Ubuntu 12.04 Precise when the series was not specified. If
you care about the version of Ubuntu used, then you want to specify the
series.

You can specify the series of the state-server and set the default charm
series by adding “default-series” to environments.yaml. Setting
“default-series” is the only way to specify the series of the
state-server (bootstrap node). For example, if your local machine is
Ubuntu 14.04 Trusty, but your organisation only supports Ubuntu 12.04
Precise in the cloud, you can add this to your environment in
environments.yaml:

default-series: precise

There are many ways to specify the series to deploy with a charm. In
most cases you don’t need to. When you don’t specify a series, Juju
checks the environment’s “default-series”, and if that isn’t set, Juju
asks the charm store to select the best series to deploy with the charm.
The series of the state-server does not restrict the series of the
charm. You can use the best series for a charm when deploying a service.

When working with local charms, Juju cannot fall back to the charm
store; it can only fall back to the environment’s “default-series”. You must
specify the series in the environment or when deploying the charm. If
your environment is running, you can add “default-series” like so:

juju set-environment default-series=precise

These commands choose Ubuntu 12.04 Precise when “default-series” is set to “precise” in the environment:

 juju deploy cs:precise/mysql
 juju deploy precise/mysql
 juju deploy mysql
 juju deploy local:precise/mysql
 juju deploy local:mysql
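
The selection rules described in this post can be summarised as a small function. This is a sketch of the rules as written above, not Juju's implementation: resolveSeries is a hypothetical name, the storeSeries argument stands in for whatever the charm store would answer, and only the simple charm forms used in the commands above are handled.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// resolveSeries is a hypothetical helper that mirrors the rules in
// this post. charmRef is a name such as "cs:precise/mysql",
// "precise/mysql", "mysql" or "local:mysql". defaultSeries is the
// environment's "default-series" ("" when unset). storeSeries is an
// assumed stand-in for the series the charm store would choose.
func resolveSeries(charmRef, defaultSeries, storeSeries string) (string, error) {
	local := strings.HasPrefix(charmRef, "local:")

	// Drop the "cs:" or "local:" schema if there is one.
	ref := charmRef
	if i := strings.Index(ref, ":"); i >= 0 {
		ref = ref[i+1:]
	}

	// 1. A series given with the charm always wins.
	if i := strings.Index(ref, "/"); i >= 0 {
		return ref[:i], nil
	}
	// 2. Otherwise use the environment's default-series.
	if defaultSeries != "" {
		return defaultSeries, nil
	}
	// 3. Local charms cannot fall back to the charm store.
	if local {
		return "", errors.New("local charm needs a series or default-series")
	}
	// 4. Charm store charms let the store pick.
	return storeSeries, nil
}

func main() {
	refs := []string{"cs:precise/mysql", "precise/mysql", "mysql", "local:precise/mysql", "local:mysql"}
	for _, ref := range refs {
		series, err := resolveSeries(ref, "precise", "trusty")
		fmt.Println(ref, "->", series, err)
	}
}
```

With “default-series” set to “precise”, every charm in the list resolves to precise, matching the commands above. With it unset, only local:mysql fails, because there is nothing left to fall back to.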

November 16, 2013
Restoring network to lxc and juju local-provider

I have experienced two cases where lxc containers stop working or new containers never work because they cannot join the network. My LTS container started, but without network, there were a lot of things I couldn’t do in it. In the case of Juju, I couldn’t deploy new local services. The unit agent status was stuck in PENDING.

You can verify the network is broken by listing the lxc containers you started, either with lxc-start or juju deploy, using the --fancy option:

sudo lxc-ls --fancy

NAME        STATE   IPV4 IPV6 AUTOSTART 
-------------------------------------------
<container> RUNNING -    -    NO

This shows that the running container doesn’t have an IPv4 or IPv6 address.

For each existing container that needs fixing, you need to install the new dhcp packages, but without a network, you cannot do it from a running container. Instead chroot can be used to update the container’s root file system.

cd /var/lib/lxc/<container>/rootfs/
sudo chroot ./
# You are root inside the chroot, so sudo is not needed from here on.
apt-get update
apt-get install isc-dhcp-common isc-dhcp-client

To ensure that new containers work, you need to clear the lxc cache. The lxc image cache was built when you created or deployed your first container. The images there are probably more than a few months old.

sudo ls -lh /var/cache/lxc/precise/
sudo ls -lh /var/cache/lxc/cloud-precise/
sudo rm -r /var/cache/lxc/*

Lxc will get the new images the next time it needs to create a container. The first call will be slower to complete since it will build a fresh cache.

October 28, 2013
Restoring grub after OS X Mavericks

I had a scary moment after I updated my dual boot MacBookAir to Mavericks. Refit started Ubuntu, but grub was showing me the rescue prompt. This was a different misadventure from my upgrade to Mountain Lion; the partitions were there, but grub was lost.

I use refit to manage dual boot. I remembered from past experience that I needed to use the partition tool from the refit boot screen to sync the GPT and EFI tables. The task was done in seconds.

I knew Ubuntu was on the 4th partition (suggested by the GPT output from the sync), but to be sure I listed the partitions, then listed the 4th one.

ls
ls (hd0,gpt4)/boot

I saw the boot images and a grub dir. I then assumed all my data was in place, but the MBR was lost during the Mavericks install. To start Ubuntu, I typed:

set root=(hd0,gpt4)
set prefix=(hd0,gpt4)/boot/grub
insmod normal
normal

Ubuntu started up as normal. The grub screen was identical to my last boot. I was confident grub’s configuration was fine; I just needed to restore grub to the MBR. I opened a terminal after logging in and typed:

sudo grub-install --force /dev/sda

With hesitation, I rebooted. All was back to normal.

September 2, 2013

Launchpad blog
lp-blog
Launchpad blog
Launchpad build farm improvements

We’ve made a number of improvements to the Launchpad build farm in the last month, with the aim of improving its performance and robustness.  This sort of work is usually invisible to users except when something goes wrong, so we thought it would be worth taking some time to give you a summary.  Some of this work was on the Launchpad software itself, while some was on the launchpad.net hardware.

(To understand some of the rest of this post, it’s useful to be aware of the distinction between virtualised and devirtualised builders in Launchpad.  Virtualised builders are used for most PPAs: they build untrusted code in a Xen guest which is initialised from scratch at the start of each build, and are only available for i386, amd64, and a small number of ARM builds by way of user-mode QEMU.  Devirtualised builders run on ordinary hardware with less strict containment, and are used for Ubuntu distribution builds and a few specialised PPAs.)

ARM builders have been a headache for some time.  For our devirtualised builders, we were using a farm of PandaBoards, having previously used BeagleBoards and Babbage boards.  These largely did the job, but they’re really development boards rather than server-class hardware, and it showed in places: disk performance wasn’t up to our needs and we saw build failures due to data corruption much more frequently than we were comfortable with.  We recently installed a cluster of Calxeda Highbank nodes, which have been performing much more reliably.

It has long been possible to cancel builds on virtualised builders: this is easy because we can just reset the guest.  However, it was never possible to cancel builds on devirtualised builders: killing the top-level build process isn’t sufficient for builds that are stuck in various creative ways, and you need to make sure to go round and repeatedly kill all processes in the build chroot until they’ve all gone away.  We’ve now hooked this up properly, and it is possible for build daemon maintainers to cancel builds on devirtualised builders without operator assistance, which should eliminate situations where we need urgent builds to jump the queue but can’t because all builders are occupied by long-running builds.  (People with upload privileges can currently cancel builds too, which is intended mainly to allow cancelling your own builds; please don’t abuse this or we may need to tighten up the permissions.)  As a bonus, cancelling a build no longer loses the build log.

Finally, we have been putting quite a bit of work into build farm reliability.  A few problems have led to excessively long queues on virtual builders:

  • Builders hung for some time when they should have timed out, due to a recent change in su; this is now fixed in the affected Ubuntu series.
  • Xen guests often fail to restore for one reason or another, and when this happened builders would fail in ways that required an operator to fix.  We had been dealing with this by having our operators do semi-automatic builder fixing runs a few times a day, but in recent months the frequency of failures has been difficult to keep up with in this way, especially at the weekend.  Some of this is probably related to our current use of a rather old version of Xen, but the builder management code in Launchpad could also handle this much better by trying to reset the guest again in the same way that we do at the start of each build.  As of this morning’s code deployment, we now do this, and the build farm seems to be holding up much more robustly.

This should make things better for everyone, but we aren’t planning to stop here.  We’re intending to convert the virtual builders to an OpenStack deployment, which should allow us to scale them much more flexibly.  We plan to take advantage of more reliable build cancellation to automatically cancel in-progress builds that have been superseded by new source uploads, so that we don’t spend resources on builds that will be rejected on upload.  And we plan to move Ubuntu live file system building into Launchpad so that we can consolidate those two build farms and make better use of our available hardware.

June 25, 2013

Curtis Hovey
sinzui
Sinzui » Sinzui
Managing Juju Charm Versions

It is difficult for Juju charm authors to support forked charms. Dev-ops often don’t realise they are forking a charm when they add files that they want the charm to deploy. We cannot assume that the deployed charm’s revision (or the version control revision number) will ever match what the author released. I use Bazaar tags to mark the versions I support and help identify the true version of a deployed charm.

Juju uses the charm’s “revision” file to store the version. The number in this file must be incremented to deploy additions or changes to the charm. Upgrading a deployed charm requires a higher revision number. Organisations commonly create a local charm repo (managed by a version control system like Bazaar) to freeze the versions they deploy. These known versions are stable; this is a best practice when working with charms that change often. The local branch will get commits that are not in the branch published at Launchpad. The revision numbers are meaningless between the two branches.

I tag my charm branches with the charm revision number. When a dev-op forks my charm to add it to the local charm repo, we can compare branch tags. For example, I tagged lp:~charming-devs/charms/precise/elasticsearch/trunk r39 as elasticsearch-29 to match the number in the revision file. I can ask the dev-op which tags the local branch has. I can quickly identify missing releases. I can ask for a diff from the tag I last released to help diagnose problems with the changes.

For example, several issues were reported about my elasticsearch charm. I had addressed the problems months before. I suggested the dev-ops upgrade to the latest version. Since their version numbers were higher than my versions, they did not upgrade. Tags are more precise. I can suggest the dev-ops merge “-r tag:elasticsearch-29”.

In another case, charms may need to change often if the application they deploy is under active development. A recent deploy of charmworld failed; the charm upgrade errored. The dev-op and I couldn’t reconcile the version numbers between what was deployed and what I released. We had to read a diff of the charm’s directories and files to discover the deployed charm was missing several releases–the charm was incompatible with the deployed code. If I had tagged my release, we could have identified the issue in minutes.

June 6, 2013
Closing milestone bugs using Launchpad’s API

A few years ago I wrote a contrib script for Launchpad’s launchpadlib called ‘close-my-bugs.py’ which attempted to close (aka mark them ‘fix released’) all of your bugs in a project that were targeted to a particular milestone.

For various reasons it grew out of date and when I needed to use it recently, it didn’t work!  Long story short, I just fixed it up and added a couple of new features:

  • You can optionally close just your own bugs, or all the bugs in the milestone
  • You can search for bugtasks targeted against a series in your project (these are not normally picked up when searching in a project’s milestone)

You can grab the code here:

bzr branch lp:launchpadlib

contrib/close-my-bugs.py

April 24, 2013
Error handling in Go

There’s been a debate raging in some corners of the internet lately about how superior Go’s error handling is to other languages.  I am going to address some of the points made here:

Claim 1: It’s impossible to ignore errors in Go, they are “in your face”

This is patently false.  Take this example:

fmt.Println("Hello world")

Pretty innocuous wouldn’t you say?  Well let’s take a look at the language documentation for fmt.Println:

// Println formats using the default formats for its operands and writes to standard output.
// Spaces are always added between operands and a newline is appended.
// It returns the number of bytes written and any write error encountered.
func Println(a ...interface{}) (n int, err error)

So Println can return an error!  Where did we check it?  Well, we didn’t.  Any other claims that it’s OK to ignore it in this case further strengthen my argument.

Some will say that it’s a deliberate choice to ignore the error and I deserve all I get. Well, was it? I didn’t even know that Println returned an error until I looked at the documentation (and who is going to do that for Println?). And that’s the point: if I need to look at the documentation to see that it can return an error, then in a language that raises exceptions I will equally have seen the documentation about how it deals with errors.

You could even argue that an exception is superior in this case.  With Go, the code will march on regardless, oblivious to the fact that Println failed.  With exceptions, it’ll fail and show you exactly where it failed.

The compiler will reject code that takes only the first value from a function that also returns an error as its second value.  But this is trivially bypassed by assigning the error to _.  When reading code, that is more easily missed than the exception style of “catching then dropping”, because Go itself encourages assigning to _ (in its own range statement, for example) as a deliberate way of ignoring things that the language is trying to force you to see.

So really in both cases, ignoring the error doesn’t really stand out as wrong.
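
To make those compile-time rules concrete, here is a minimal sketch of the call styles discussed so far. The greet function is a made-up wrapper around fmt.Println, introduced only so there is something with two return values to call.

```go
package main

import (
	"fmt"
	"os"
)

// greet is a hypothetical wrapper that passes Println's results
// (bytes written and any write error) straight back to the caller.
func greet(name string) (int, error) {
	return fmt.Println("Hello", name)
}

func main() {
	// 1. Both results dropped. This compiles without complaint.
	greet("world")

	// 2. Taking only the first result does NOT compile:
	//    n := greet("world") // assignment mismatch
	//
	// 3. The blank identifier makes it compile again, and the
	//    error is silently discarded.
	n, _ := greet("again")

	// 4. Actually checking the error.
	if _, err := fmt.Println("bytes written:", n); err != nil {
		os.Exit(1)
	}
}
```

Only style 2 is stopped by the compiler. Styles 1 and 3 both lose the error.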

Here’s a concrete example in Go I was recently shown:

w := bufio.NewWriter(os.Stdout)
for _, name := range ListAll(conf) {
    fmt.Fprintln(w, name)
}
w.Flush()
return

As you can see, the caller completely forgot to check the error returned from Fprintln and Flush and there would be no compiler warning about it.

Claim 2: Exceptions teach developers to not care about errors

Citing an example where someone didn’t catch an exception and the code consequently blew up is really not a good example of this claim.  It’s a bug, for sure and you get a full traceback of your error in the resulting exception, which is handy.  You go away and fix it quickly based on that info.

If I am in the same situation with Go and I ignore a returned error from a function, at some point (which is likely to be nowhere near the place where the error occurred) my code will blow up.  I’ll have to run up the debugger to try and find out where it really occurred though.

Because unused variables in Go are a compile-time error, it’s actively discouraging you from assigning the result of the function to a variable (or you can deal with it, of course).  For anyone who’s not read the full documentation for a function call or missed its return value (we’re all human) as I said above – you’re not even going to notice that you missed it.

Based on this, I can see no difference at all that suggests one way or the other teaches developers to not care about errors.  Developers do care about errors, really, but bugs creep in however careful you are.  And when they do, I’d rather have a decent indication of where the bug is.

Other parts of error handling that I dislike

When you look at the average Go program, you will see a lot of this:

if err != nil {
  return nil, err
}

This is the recommended way of error handling in Go.  But this is not error handling, it’s error propagation.  In nearly all languages there will arise situations where in well-factored code you have a low-level error that you need to pass right back up to the entry point for the caller. That means you need this error propagation code in every single place where you check for errors.  There’s no syntactic sugar, just the same three lines everywhere.

For me, this vastly decreases the readability of the code. This is where exceptions excel because inside my own library I can factor the bejeesus out of it into many small functions and if I need to return an error, I just catch a lower-level exception in the top-level function and return something else.  You can do this in Go with a panic(), but it seems to be discouraged.  Using panic() feels almost exactly like using exceptions, only the syntax is worse. If Go’s style is to encourage people to handle errors like this, it needs the sugar.
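
A minimal sketch of that panic() style, under the assumption that the panics never cross the package boundary: small internal helpers panic with a private type, and the exported entry point recovers and returns an ordinary error. All the names here (parseError, mustAtoi, sum, Sum) are invented for the example.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseError marks panics raised by this package, so that recover
// only swallows our own failures and not genuine bugs.
type parseError struct{ err error }

// mustAtoi panics instead of returning an error.
func mustAtoi(s string) int {
	n, err := strconv.Atoi(s)
	if err != nil {
		panic(parseError{err})
	}
	return n
}

// sum is an internal helper with no error plumbing at all.
func sum(fields []string) int {
	total := 0
	for _, f := range fields {
		total += mustAtoi(f)
	}
	return total
}

// Sum is the entry point. It converts an internal panic back into
// a returned error, much like catching a lower-level exception.
func Sum(fields []string) (total int, err error) {
	defer func() {
		if r := recover(); r != nil {
			pe, ok := r.(parseError)
			if !ok {
				panic(r) // not ours: let it keep unwinding
			}
			err = fmt.Errorf("sum: %v", pe.err)
		}
	}()
	return sum(fields), nil
}

func main() {
	fmt.Println(Sum([]string{"1", "2", "3"}))
	fmt.Println(Sum([]string{"1", "x"}))
}
```

The helpers stay free of the three-line propagation block, at the cost of the deferred recover boilerplate in every entry point.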

Conclusion

Many people might think that I completely hate Go’s error handling from this post.  That’s not strictly true – I don’t hate it, I just think it can be improved.  I challenge assumptions that I see which state that Go’s error handling is superior in some way, when as far as I can see it’s not that different from other languages in terms of usefulness.

Go is clearly in its infancy.  Most languages will have started out with youthful enthusiasm and realised that some change was needed.  These languages are the successful ones, where developers enjoy coding in them and feel productive.  I hope that Go embraces change as it matures and attracts more developers.

I welcome comments on this post – unlike some people I won’t censor them or delete ones I can’t argue with (unless they are outright abusive and use foul language, this is a family blog!).


December 4, 2012

Launchpad blog
lp-blog
Launchpad blog
Private Projects and Private Blueprints leave beta

Today, the Private Projects and Private Blueprints features on Launchpad are leaving beta. These features are available now for use by all Launchpad users. Private Blueprints was started as part of the Private Projects work, with the end goal in mind of truly private projects on Launchpad. Private Projects was described in its beta announcement like this:

When creating a new project on Launchpad, beta testers will have the option to create “Proprietary” or “Embargoed” projects. Embargoed exists for projects that intend to start private but later be revealed publicly. All other private projects should be proprietary. Milestones and series are proprietary or embargoed based on the project setting. To make them public, you will need to make the project itself public.

When you create a proprietary or embargoed project on Launchpad, all of the sharing policies for your project will be set correctly for you. This means that if you start your project as a proprietary project, your branches, bugs, and blueprints will be created proprietary by default. Answers and translations are not available for private projects.

A commercial subscription is required to use private projects, but any user who creates a proprietary or embargoed project on Launchpad will receive a 30 day trial commercial subscription. Launchpad users with existing commercial subscriptions can convert a public project to proprietary or embargoed by changing the information type in the project’s settings. You may have some work to do on your project before you can transition to a private information type — for example, disable answers if you have that app enabled for your project — but Launchpad will block the change and tell you what needs to happen before you can switch to a private information type.

Users should be aware, though, that if your project has been listed on Launchpad publicly until now, then search engines know of its existence already. If you want a proprietary project whose name no one can learn, you should create a new project on Launchpad. Transitioning a public project to private allows you to keep your series and milestones private going forward, but users may have already been able to discover the existence of the project since it was public already.

We are happy to make truly private projects available for all users on Launchpad. If you run into any issues, please file a bug against Launchpad or ask for help in #launchpad on Freenode.

November 20, 2012
How Novacut uses Launchpad

Launchpad has been a key tool used in developing Novacut. I use Launchpad for code hosting, bug tracking, daily builds, and more. For almost two years I’ve been doing monthly stable releases on Launchpad, and Novacut now spans six separate Launchpad projects. To say the least, I’ve learned a lot about Launchpad in the process.

I don’t think Novacut could be where it is today without Launchpad, so I want to pass on some of what I’ve learned the past two years. Here are my five essential Launchpad best practices:

1. Daily Builds

I’m always very thankful that early on Paul Hummer took the time to school me on using Source Package Recipes to do daily builds. This Launchpad service gives you automated package builds across multiple architectures, and multiple Ubuntu releases.

I don’t know how to emphasize this enough, but seriously, you need daily builds. As a point of reference, daily builds are the 3rd item in the famed Joel Test.

These builds are triggered simply by making commits to the appropriate bzr branch on Launchpad (usually your trunk branch). You’ll automatically get up to one build per 24-hour period, and you can manually trigger additional builds when needed.

You can include your debian/ packaging directory in your project source tree, or you can keep debian/ in a separate bzr branch. For the Novacut components, I’ve found it most helpful to keep debian/ in the source trees because it’s handy to be able to land a code change and its corresponding packaging change in a single merge. This works for us because we currently can use the exact same debian/ for all the Ubuntu versions we support. If that’s not true for your project, you’ll need multiple debian/ branches.

For reference, here’s the Novacut Source Package Recipe.

2. Unit Tests

You should run your unit tests during your package builds, and you should fail the build when any unit test fails. This is particularly important for daily builds, because this will prevent a package with broken unit tests from reaching your daily PPA.

The Launchpad build servers are strict and unforgiving environments, which is a good thing when it comes to unit tests. The build servers are also probably quite different from your local development environment. On countless occasions our daily builds have caught failures that only occur on i386 (my workstation is amd64), or only occur on an Ubuntu release other than the one I’m running, etc.

To run your unit tests during the package build, you’ll need to modify your debian/rules file as appropriate. If you’re using debhelper, add an override_dh_auto_test target.

You might also need to add additional packages to the Build-Depends section of your debian/control file, packages that are needed by the unit tests but are otherwise not needed by the build itself.

For reference, here’s the debian/rules file used to run the Dmedia unit tests (which is also a handy Python3 example).

3. Track Ubuntu+1

When a new Ubuntu version opens up for development, I immediately start doing daily builds on the development version, even though I don’t typically upgrade my own computers till around 4 months into the cycle.

I use daily builds on the development release as an early warning system. With no extra effort on my part, these builds give me a heads-up about code or packaging changes that might be needed to make Novacut work well on the next Ubuntu release.

To enable daily builds on the next Ubuntu version, just go to your Source Package Recipe, click on “Distribution series”, and check the box for the newest series. Now you’ll have daily builds on the newest Ubuntu version, in addition to all the versions you were already building for.

For example, I’m currently in the process of enabling daily builds for Raring, as you can see in the Microfiber Source Package Recipe. And I did indeed encounter a build failure on Raring, seemingly caused by a debhelper issue.

For the first month or so in a cycle, I don’t tend to worry much about build failures on the development version. But after the dust has settled a bit, I make sure to keep the builds in working order, and I even do monthly stable releases for the Ubuntu development version. Again, I do all this pro-actively even before I personally start running the newest Ubuntu version.

4. PPAs & Users

Whenever someone asks me why I use Launchpad instead of github, my short answer is always, “PPAs and users”.

Source Package Recipes give you much more than just a build, they give you daily packages that are easily consumable by your testing community and early adopters. This tight feedback loop prevents you from running too far ahead without getting a good reality check from your target users.

Keep in mind that for some products, the early adopters willing to install from a PPA might not be all that representative of your target user. So when it comes to making design decisions, you might need to politely ignore certain feedback from some of these early adopters. In my experience, this won’t cause any hard feelings as long as you have clearly communicated who your target user is, and why.

For reference, you might look at the way we’ve defined the Novacut target user.

I recommend creating PPA names that are well-branded and easy to remember. First, create a Launchpad team with the same name as your product. In our case, we have a ~novacut team. Second, I recommend creating a daily and a stable PPA owned by the same team. In our case, that gives us two easy to remember PPAs:

Although none of our target users (professional video editors) currently use Ubuntu to do their job, I’ve been surprised by how many follow Novacut’s development via our stable PPA, and even our daily PPA. This has helped keep us on track, and has helped us build customer loyalty even before we have a finished product.

For me personally, this daily user engagement also makes the design and development process more enjoyable. It’s hard to empathize with an abstract persona; it’s easier to solve specific problems for specific people.

5. Use Apport

Till recently I didn’t realize that you can use Apport for automated crash reporting in unofficial packages delivered through a PPA.

We haven’t had Apport integration for that long, but it’s already provided us with dozens of highly valuable crash reports. Almost immediately some hardware specific issues came to light and were fixed, convincing me that a key benefit of Apport is knowing how your app might misbehave on a larger, more variable pool of hardware.

Apport also helped some rare bugs come to light. I thought Dmedia was basically crash-free, but those one-in-a-thousand bugs pop out quickly when thousands of people are running it. Most of these bugs would have eventually been found by one of our core devs, but the quicker a bug is discovered, the quicker and easier the bug is to fix.

For more info, check out this blog post and this screencast, where I covered our Apport integration in detail.

And for reference, see the merge proposal that added Apport integration in Novacut.

A big thank you to Jason DeRose for sharing how his project uses Launchpad on a daily basis.

 

November 9, 2012

Deryck Hodge
deryck
Considering Tumblr

I've grown tired of self hosting this site. I think it affects how much I blog or write online because I wrote this site myself, it's a bit heavy weight compared to more recent tools, and I just don't have the time or motivation to work on the site anymore. I just want to blog, link to interesting stuff, post pictures, etc.

Therefore, I'm considering moving this site to Tumblr. Now's your chance to weigh in and tell me I'm crazy. Or not.

What say ye all?

November 6, 2012

Launchpad blog
lp-blog
Launchpad blog
The information sharing feature is complete

Launchpad’s bug and branch privacy features were replaced by information sharing that permits project maintainers to share kinds of confidential information with people at the project level. No one needs to manage bug and branch subscriptions to ensure trusted users have access to confidential information.

The Disclosure features

Disclosure is a super feature composed of many features that will allow commercial projects to work in private. Untrusted users cannot see the project’s data. Project maintainers can share their project with trusted users to reveal all or just some of the project’s data. The ultimate goal is to create private projects in Launchpad, but that feature required several other features to be completed first. The Purple squad worked on Trusted Pickers, Privacy Transitions, Hardened Projects, Social Private Teams, and Sharing.

There was a lot of overlap between the features the Purple squad worked on. Though we could start each feature independently of the others, we could only complete about 90% of each. When the Sharing UI changes entered beta, we were unblocked and fixed most of the remaining issues, but fixing all the issues required all projects to switch to Sharing. We did not consider Sharing, or any of the required features, complete until we fixed all the bugs.

Disclosure facts

  • Planning started in June 2010 to replace the existing privacy mechanisms with something that would scale.
  • Early testing revealed that users did not trust Launchpad because the UI could not explain what was confidential, or what the consequences of a change would be — this needed to be fixed too.
  • 149 related bugs were identified in Launchpad.
  • Work started in June 2011 by the Purple squad.
  • Replacing the old privacy mechanisms and addressing the trust and information issues took 16 months.
  • About 45,000 lines of code were added to support the features.
  • About 15% of the lines were for missing JavaScript test coverage.
  • More than 700 bugs were fixed in total.
  • About 5% of the fixed bugs were caused by the old non-scaling privacy mechanisms.
  • About 4% of the fixed bugs were caused by old JavaScript enhancements that broke features for non-JavaScript users.

Lessons learned

  • Avoiding misrepresentation of what is confidential, or of what will become confidential or public, is very important to users, more important than supporting private data.
  • Privacy/Sharing must be a first-class mechanism beneath all the mechanisms that work with confidential data.
    • Privacy was added on top of bugs, and it failed to scale to 100′s of bugs.
    • Privacy was added on top of branches, and it failed to scale to 1000′s of branches.
    • Filtering private items in code, or in database joins is not fast enough to work with 100,000′s of items.
  • Launchpad’s ReSTful object API is not suitable for working with large collections of objects like bugs or branches; a lighter, service-based approach was used to quickly work with large amounts of data.
  • Users need to work with confidential data via the API, using a text web browser from servers, using a browser with accessibility tools, as well as the common case of using a JavaScript enabled browser.
  • Lots of mock-ups and interactive tests will not predict all the interactions a user will have with real data; test with real code and data early to develop the final design.

October 25, 2012
Private projects for beta testers

If you are part of Launchpad’s beta testers team, you can now start trialing private projects on Launchpad. The private projects feature builds on the great sharing work that Launchpad’s Purple Squad has done, allowing Launchpad users to create true private projects now. A commercial subscription is required to use private projects, but any user who creates a private project on Launchpad will receive a 30 day trial commercial subscription.

When creating a new project on Launchpad, beta testers will have the option to create “Proprietary” or “Embargoed” projects. Embargoed exists for projects that intend to start private but later be revealed publicly. All other private projects should be proprietary. Milestones and series are proprietary or embargoed based on the project setting. To make them public, you will need to make the project itself public.

Be warned, this is a large change to Launchpad and there are certainly bugs in our handling of privacy. You can check out our list of known issues, if you’d like. We, as the Launchpad Orange Squad, are committed to fixing all of those before we leave beta. So don’t worry, we’re still actively working on this feature. We did, however, want users to begin using this feature to get early feedback on the work. Don’t trial your super secret project with this feature just yet, but if you have something safe to try out private projects, now is a good time for beta testers to get going with the feature.

Enjoy private projects on Launchpad now, beta testers! And please file any bugs you find.

October 23, 2012
Launchpad Workshop at UDS-R

After the success of the Launchpad clinics at the last UDS-Q, we’ve decided to run some more! This time we removed the sterile name of clinic and called them workshops.

If you want to get involved, scratch that itch, or learn how to fix that irksome bug that has been bugging you, you’re not alone. Everyone probably has at least one bug that they’d like to see fixed. The problem is not knowing how to fix it, or not knowing how to set up the Launchpad development environment. Luckily for you, we have a lot of Launchpad developers at UDS-R and we’d like to help you get bugs fixed!

The idea is that if you have a bug you would like to fix, or would like to be pointed in the right direction, we’ll be there to help you get on the road and to offer advice on every step of the Launchpad development process, from lines of code to branch reviews to getting things done. We’ll have EC2 instances ready for you to develop on, so if you haven’t already gone through the process of setting up local Launchpad development on your machine, you don’t need to worry.

I have created a wiki page on which you should register if you’re going to be attending either of the clinics. On that page, just list your name and the ID of the bug(s) you want to work on. We’ll check the bugs out and get in touch with you if we think they’re too big to work on in the clinics – in which case we’ll try and work with you to get them fixed over a longer period. We’ve added the event to the summit schedule, for Tuesday and Thursday of UDS, so why not sign up and come along!

If you’ve never contributed before, Graham Binns has written a useful guide to contributing to Launchpad. He has also done up a screencast on fixing a bug in Launchpad.

October 2, 2012
Burning down critical bugs

I have been analysing Launchpad’s critical bugs to track the Purple squad’s progress while on Launchpad maintenance duty. In January of 2011, the Cloud Engineering team né Launchpad Engineering team was reorganised into squads, where one or more squads would maintain Launchpad while other squads work on features. This change also aligned with a new found effort to enforce the zero-oops policy. The two maintenance squads had more than 332 critical bugs to close before we could consider adding features that the stakeholders and community wanted. By July 2011, the count dropped to its lowest point, 250 known critical bugs. Why did the count stop falling for fifteen months? Why is the count falling again?

Charting and analysing critical bugs

Chart of Launchpad's critical bugs since the formation of Launchpad squads and maintenance duties
The chart above needs some explanation to understand what is happening in Launchpad’s critical bugs over time. (You may want to open the image in a separate window to see everything in detail.) Each iteration is one week. The backlog represents the open critical bugs in Launchpad at the start of the iteration. The future bugs are either bugs that are not discovered, not introduced, or reported and fixed within the iteration. The last group is crucial to understanding the lines plotting the number of bugs fixed and added during the iteration. We strive to close critical bugs immediately. Most critical bugs are reported and fixed in a few days, so most bugs were not open long enough to show up in the backlog. The number of bugs fixed must exceed the number added to make the backlog count fall. You can see that the maintenance squads have always been burning down the critical bugs, but if you are just watching the number of open bugs in Launchpad, you get the sense that the squads are running to just stand still.

I use the lp-milestone YUI widget to chart the bugs and analyse our progress through the critical bugs. It allows me to summarise a set of bugs, or analyse a subset by bug tag.

Launchpad maintenance analysis -- driving critical bugs to zero

Though 22 bugs were fixed this past week, 14 were added, thus the critical count dropped by 8. The last eight iterations are used to calculate the average bugs closed and opened per iteration. The relative velocity (velocity – flux) is used to estimate the remaining number of days to drive the count to zero. When the Purple squad started maintenance on September 10th of 2012, the estimated days of effort was more than 1,200. In just three weeks, the number has fallen dramatically. The principal reason the backlog of critical bugs has fallen is that the Purple squad is now giving those bugs their full attention, but that generalisation is unsatisfactory.
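The estimate described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the lp-milestone widget's code: the function name, the one-week conversion to days, and the sample history are assumptions, and only the final pair of numbers (22 fixed, 14 added) comes from this post.

```python
from statistics import mean

DAYS_PER_ITERATION = 7  # assumption: the post says each iteration is one week


def estimate_days_to_zero(backlog, fixed, added, window=8):
    """Estimate the days needed to drive the backlog to zero.

    velocity is the average number of bugs fixed per iteration over the
    last `window` iterations; flux is the average number added over the
    same window. Returns None when the backlog is not shrinking.
    """
    velocity = mean(fixed[-window:])
    flux = mean(added[-window:])
    relative_velocity = velocity - flux
    if relative_velocity <= 0:
        return None  # fixes are not outpacing new reports
    iterations_left = backlog / relative_velocity
    return iterations_left * DAYS_PER_ITERATION


# Hypothetical history; only the last iteration (22 fixed, 14 added)
# is taken from the post. The backlog size is also illustrative.
fixed = [10, 12, 9, 15, 11, 18, 20, 22]
added = [11, 10, 9, 13, 12, 14, 13, 14]
print(estimate_days_to_zero(250, fixed, added))
```

Because the estimate divides by the difference between two averages, it swings widely when velocity and flux are close, which is one way a figure above 1,200 days can fall sharply within a few iterations.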

Why is the Purple squad so good at closing bugs in the critical backlog?

I do not know the answer to my question. The critical backlog reached its all-time low of 250 bugs with the release of the Purple squad’s maintenance work in July 2011. There was supposition that Purple fixed the easy bugs, or that the fixes did not address the root cause, so another critical bug was opened. I disagree. The squad had no trouble finding easy bugs, and it too would have been fixing secondary bugs if the first fix was incomplete. I can tell you how the squad works on critical bugs, but not why it is successful.

I was surprised to see the Purple squad were still the top critical bug closers when it returned to maintenance after 15 months of feature work. How could that be? The squad fixed a lot of old timeout and JavaScript bugs in the last few months through systemic changes, enough to significantly affect the statistics. About 600 critical bugs were closed while Purple squad were on feature work. The squad closed 210 of those bugs. 60 were regressions that were fixed within the iteration, so they never showed up in the backlog. 70 critical bugs were fixed because they blocked the feature, and 80 critical bugs were fixed because Purple was the only squad awake when the issue was reported. The 4 other squads fixed an average of 98 bugs each when they were on maintenance. The Purple squad fixed more bugs than maintenance squads on average even when they were not officially doing maintenance work. The data, charts, and analysis always include the Purple squad.
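As a quick consistency check on those figures, the arithmetic can be written out as below. The variable names are illustrative, and the sketch assumes the 98-bug average for the other squads covers the same 15-month period.

```python
# Figures quoted in the post.
purple_breakdown = {
    "regressions fixed within the iteration": 60,
    "bugs that blocked the feature": 70,
    "issues reported while only Purple was awake": 80,
}
other_squads = 4
other_squad_average = 98

purple_total = sum(purple_breakdown.values())
others_total = other_squads * other_squad_average
overall_total = purple_total + others_total
purple_share = purple_total / overall_total

print(f"Purple squad: {purple_total}")
print(f"Other squads: {others_total}")
print(f"Overall: {overall_total}")
print(f"Purple share: {purple_share:.1%}")
```

The three categories add up to the 210 bugs credited to Purple, and the combined total is consistent with the "about 600" critical bugs closed over the period.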

I suspect the Purple squad has more familiarity with bugs in the critical backlog. They never stopped reading the critical bugs when they were on feature work. They saw opportunities to fix critical bugs while solving feature problems. I know some of the squad members are subscribed to all critical bugs and re-read them often. They triage and re-triage Launchpad bugs. This familiarity means that many bugs are ready to code — they know where the problem is and how to fix it before the work is assigned to them. They fixed many bugs in less than a day, often doing exactly what was suggested in the bug comments.

During the first week of their return to maintenance, about 30 critical bugs were discovered to be dupes of other bugs. Though this change does make the backlog count fall, it also revised all the data, so the chart is not showing these 30 bugs at all now. The decline of backlog bugs does not include dupes. While the squad was familiar enough to find many bugs that they could close in a single day, they were not so familiar as to have known that there were 30 duplicate bugs in the backlog when they started.

Most squads have only one person with DB access, but the Purple squad is blessed with 3 people who can test queries against production-level data. This could be a significant factor. It is nigh impossible to fix a timeout bug without proper database testing. Only 13 of the recent bugs closed were timeouts though. The access also helps plan proper fixes for other bugs, so maybe 20% of the fixed bugs can be attributed to database access.

Maybe the Purple squad are better maintenance engineers than other squads who work on maintenance. For 28 months, I was the leading bug closer working on Launchpad. I closed 3 times more bugs than the average Launchpad engineer. I am not a great engineer though. My “winning” streak came to a close shortly after William Grant started working on Launchpad full time; he soundly trounced me over several months. Then he and I were put on the same squad and asked to fix critical bugs. Purple also had Jon Sackett, who was closing almost twice as many bugs as the other engineers. I don’t think I need to be humble on this matter. To use the vulgar, we rocked! Ian was the odd man on the Purple squad. He was the slowest bug closer, often going beyond our intended scope to fix an issue. Then Purple switched to feature work…Ian leapt to the first rank while the rest of the squad struggled. Ian fixed almost double the number of Disclosure bugs that other squad members did. The leading critical bug closer on the squad at the moment though is Steve Kowalik. This is his first time working on maintenance. His productivity has jumped since transitioning to maintenance.

I can only speculate as to why some engineers are better at maintenance, or can just close more bugs than others. A maintenance engineer must be familiar with the code and the rules that guide it. Feature engineers need to analyse issues and create new rules to guide code. I did not gradually become a leading bug closer; it happened in a single day when I realised, while solving one issue, that the code I was looking at was flawed and was certainly causing a bug. I knew how to fix it, and with a few extra hours of effort, I could close two bugs in a single day. Closing bugs has always been easy since that moment.

I believe the Purple squad values certainty over severity and small scope over large scope when choosing which critical backlog bugs to fix. I created several charts that break the critical bugs into smaller categories. I suggested the squad burn down sub-categories of bugs like regressions, or 404s. The squad members are instead fixing bugs from the entire backlog. They are choosing bugs that they are certain they can fix in a few hours. I think the squad has tacitly agreed to fix bugs that are less than a day of effort. When this group is exhausted, they will fix issues that require days of effort, but that also fix as many bugs. The last bugs to be fixed will be those that require many days to fix a single bug. Fixing the bugs with the highest certainty reduced our churn through the critical bugs; there are fewer to triage, to dupe, to get ready to code.

The Purple squad avoids doing feature-level design and effort to fix critical bugs. Feature-level efforts entail more risk, more planning, and much more time. There is often no guarantee, low certainty, that a feature will fix the issue. A faster change with higher certainty can fix the issue, but leaves cruft in the code that the engineers do not like. Choosing to do feature-level fixes when a more certain fix is available indicates there is tension between the Launchpad users who have a “critical” issue that stops them from using Launchpad, and the engineers who have a “high” issue maintaining mediocre code. I contend it is easier to do feature-level work when you are not interrupted with maintenance issues. When the Purple squad does choose to do feature-level work to fix a critical, they have a list of the bugs they expect to fix, and they cut scope when fixing a single bug delays the fix of the others. The Launchpad Answers email subsystem was re-written when other options were not viable; there were about 20 leading timeouts represented by 5 specific bugs to justify 10 days of effort to fix them.

The Purple squad is not unique

Nothing that I have written explains why the Purple squad are better at closing critical bugs. All squads have roughly the same skills and make decisions like Purple. Maybe the issue is just a matter of degree. If the maintenance squad is not closing enough bugs to burn down the backlog, their time is consumed by triaging and duping new critical bug reports. Familiarity with Launchpad’s 1000′s of bugs is an advantage when triaging bugs and getting a bug ready to code. Being able to test queries yourself on a production-level database takes hours or days off the time needed to fix an issue. Familiarity with the code and the reasoning that guided it increases the certainty of success. The only domain that Purple is not comfortable working with is lp.translations; the squad is comfortable changing 90% of Launchpad’s code. There may be a correlation between familiarity with the code and the facts that the squad members participated in the apocalypse that re-organised the code base, and that some have a LoC credit count in the 1000′s.

September 27, 2012
Mercurial imports will end on October 5th

On the 5th of October we’ll be ending our beta of Mercurial imports in Launchpad. On that day your existing Mercurial imports will cease and you won’t be able to create new ones.

This doesn’t affect Bazaar, Git, Subversion or CVS imports.

You’re probably wondering why. During the beta, we found that not many people wanted to import Mercurial branches into Launchpad. Today there are only around forty people using the facility. It’s also fair to say that our importer wasn’t of the quality we want for Launchpad.

So, with low demand for the feature we decided to focus engineering effort elsewhere rather than continue to maintain, or fix up, a less than satisfactory feature.

I’m sorry if you currently rely on Launchpad to import code from Mercurial into Bazaar. You can, though, still use the bzr-hg plugin locally.

September 25, 2012
Launchpad JavaScript now combo loaded and faster than ever.

Network graph of the combo loaded JavaScript.

Updated network graph

Back in January a side project was started to update the JavaScript used in Launchpad. Launchpad has been using YUI 3.3.0 for a long time, very successfully; however, recent advances in YUI 3.5 and higher have added some great tools for development that Launchpad can take advantage of. To make upgrades of our YUI library version easier, Launchpad has been moved to using a combo loader for serving out JavaScript.

This means that instead of a single launchpad.js file that can be upwards of 3MB in size, each request builds a list of JavaScript modules needed for the current page to work, and the combo loader only sends down those modules. This drastically cuts down on the download size of the JavaScript for users. These combo loaded JavaScript files are also cached for speedy serving to other users of Launchpad.

The combo loader also allows us to specify which YUI version to load via a tweak to the URL. In this way we can easily test new versions of YUI side by side with the current stable version as they come out. This allows Launchpad to keep up with future YUI releases much faster.
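To make the idea concrete, here is a minimal Python sketch of how a combo URL might be assembled from a page's module list. The base URL, the path layout, and the function name are assumptions for illustration; this is not Launchpad's actual URL scheme or combo loader code.

```python
def combo_url(base, yui_version, modules):
    """Build one combo-loader URL that requests only the listed modules.

    Assumed layout: <base>?<version>/build/<name>/<name>-min.js&...
    Duplicate module names are dropped while keeping request order.
    """
    unique_modules = list(dict.fromkeys(modules))
    paths = [
        f"{yui_version}/build/{name}/{name}-min.js"
        for name in unique_modules
    ]
    query = "&".join(paths)
    return f"{base}?{query}"


# Hypothetical base URL and module list for a single page.
page_modules = ["node", "event", "node"]
print(combo_url("https://example.test/combo", "3.5.1", page_modules))

# Under this layout, trying another YUI release only changes the version.
print(combo_url("https://example.test/combo", "3.7.0", page_modules))
```

Since the URL is determined by the version and the module list, any two pages that need the same modules produce the same URL, which is what lets the response be cached and reused.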

We’re excited that today Launchpad has moved from YUI 3.3.0 to 3.5.1 and is now served by the combo loader. This change provides a faster experience for users along with easier maintenance and new JavaScript library features for developers.

We’ve still got more to do though. YUI just released version 3.7 and we aim to push that into production faster than ever before. Please let us know how these changes work for you.

Launchpad also wants to thank the folks over at YUI for continuing the great work on a tool that Launchpad heavily depends on.

September 24, 2012
Parallel testing is live

One of the projects the Launchpad squads (Yellow) worked on during the last cycle was parallel testing; it is now complete and in operation. WebOps have today finished setting up parallel testing in buildbot. Buildbot-poll has been updated to know about the new builders, and the developers have confirmed that [testfix] and automatic stable merging etc. work fine. Nothing should have changed except that builds now take 35 minutes rather than 6.5 hours.

If something goes wrong, http://lpbuildbot.canonical.com/waterfall and https://dev.launchpad.net/yellow/ParallelTestingTroubleshooting may be helpful. The buildbot master is still praseodymium, but the slaves are new: sluagh for devel, and radande for db-devel. If you need packages upgraded on the slaves, poke WebOps as before.

If you would like to follow the discussion on this topic, you’ll find more on the Launchpad development mailing list.

September 17, 2012
Privacy for blueprints enabled for beta testers

To go along with recent work to enable information sharing for bugs and branches, we are now enabling privacy for blueprints for beta testers. This means that blueprints now support some of the different information types that bugs and branches also support. For projects with a commercial subscription on Launchpad, this means blueprints can now be set to proprietary or embargoed. Project owners can also manage sharing for blueprints from their project’s sharing details page. For more on how sharing itself works, see Curtis’ blog post that announced that Information sharing is now in beta for everyone.

We have some minor fit-n-finish issues to complete, like nicer UI elements, and of those, we have one last known bug in progress — we know that blueprints don’t currently honor the sharing policy default when new blueprints are created. However, we thought it was worth getting this work to beta testers now to start getting feedback on this as we turn to finishing off the privacy work that is left to do.

Enjoy privacy for blueprints, beta testers! And please file bugs on any issues you find.