I recently had to restore databases from a rough mysqldump backup in a piecemeal fashion. One necessity is to SET the environment correctly, lest weird encoding issues creep in when restoring the data and cause failures.

A sed one-liner can help for this.

DBNAME=mydb
sed -n "/^-- Server version/,/^-- Current Database/p;/^-- Current Database.*\`${DBNAME}\`/,/^-- Current Database/{p}" mysqldump.sql > "${DBNAME}.sql"

This extracts the SQL from the initial header up to the first database, which contains all the session SETs. It then captures statements whenever the target database is the current one. Note that this doesn’t restore the GRANTs.

Before blindly piping the output SQL into mysql, one would be well advised to review the contents of the file, to ensure only the desired modifications are included.
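
For instance, something along these lines (assuming the mysql client is already configured to reach the target server):

less "${DBNAME}.sql"     # check the CREATE DATABASE, SET and character-set statements
mysql < "${DBNAME}.sql"  # only then load it into the server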


I wrote this article for the Learnosity blog, where it originally appeared. I repost it here, with permission, for archival. With thanks, again, to Micheál Heffernan for countless editing passes.

In this series, I look at how we load-test our platform to ensure stability during periods of heavy user traffic. For Learnosity, that’s typically during the back-to-school period. That year was different, though, as COVID caused a dramatic global pivot to online assessment in education. Here is what that looked like in terms of traffic.

Weekly Learnosity users comparison 2012--2020

We expect major growth every year, but that kind of hockey-stick curve is not something you can easily predict. Still, because scalability is one of the cornerstones of our product offering, we were well equipped to handle it.

This article series reveals how we prepared for that.

In part one (which was, incidentally, pre-COVID), I detailed how we actually created the load by writing a script using Locust. In this post, I’ll go through the process of running the load test. I’ll also look at some system bottlenecks it helped us find.

Let’s kick things off by looking at what a good load-testing method should do. Namely, it should:

  1. Apply a realistic load, starting from known-supported levels.
  2. Determine whether the behaviour under load matches the requirements.
    • If the behaviour is not as desired, you need to identify errors and fix them. These could be in
      • the load-test code (not realistic enough)
      • the load-test environment (unable to generate enough load)
      • the system parameters
      • the application code
    • If the behaviour is as desired, then ramp up the load exponentially.

We used two separate tools for steps 1 and 2 above (step 1 was described in the first part of this series), and tracked the outcomes of step 2 in a spreadsheet.

TL;DR

  • We used Locust to create the load, and a custom application to verify correct behaviour.
  • We found a number of configuration-level issues, mainly around limits on file descriptors and open connections.
  • Stuff we learned along the way:
    • Record all parameters and their values, and change only one at a time;
    • Be conscious of system limits, particularly on the allowed number of open files and sockets (a quick check is sketched below).
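
As a quick illustration of that last point, on a Linux load generator the relevant knobs look roughly like this (the values are placeholders, not our actual settings):

ulimit -n                           # per-process open-file limit in the current shell
cat /proc/sys/fs/file-max           # system-wide file-descriptor limit
sysctl net.core.somaxconn           # backlog of pending TCP connections
ulimit -n 65536                     # raise the soft limit (needs a matching hard limit, or root)
sysctl -w net.core.somaxconn=4096   # raise the accept backlog (as root)
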
Continue reading

Every now and then, spurious peaks show up on Munin graphs. The peaks are orders of magnitude higher than the expected range of the data. This particularly happens with DERIVE plugins, which are notably used for network interfaces.

One way to fix this, as suggested by Steve Schnepp (and in the FAQ), is to set the maximum straight in the RRD database, and then have it reprocess the data to honour this maximum.
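
In rough strokes, and glossing over the details below, this looks something like the following (the RRD path and the 1Gb/s cap are placeholders; Munin usually names the single data source 42):

RRD=/var/lib/munin/example.net/router.example.net-if_eth0-down-d.rrd   # hypothetical file
rrdtool tune "$RRD" --maximum 42:125000000                  # cap the data source at 1Gb/s, in bytes per second
rrdtool dump "$RRD" > /tmp/if_eth0.xml                      # export the stored data...
rrdtool restore --range-check /tmp/if_eth0.xml "$RRD.new"   # ...and re-import it, discarding out-of-range samples
mv "$RRD.new" "$RRD"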

Continue reading

When using syspatch on OpenBSD, the upgrade sometimes fails with

Relinking to create unique kernel... failed!
!!! "/usr/libexec/reorder_kernel" must be run manually to install the new kernel

This generally happens after a system upgrade, or an otherwise manual change of the kernel. The fix is to update the kernel hash before re-running reorder_kernel.

# sha256 /bsd > /var/db/kernel.SHA256
# /usr/libexec/reorder_kernel 

I wrote this article for the Learnosity blog, where it originally appeared. I repost it here, with permission, for archival. With thanks to Micheál Heffernan for countless editing passes.

The dramatic increase in Learnosity users during the back-to-school period each year challenges our engineering teams to find new approaches to ensuring rock-solid reliability at all times.

Stability is a core part of Learnosity’s offering. Prior to back-to-school (known as “BTS” internally), we load-test our system to handle a 5x to 10x increase over current usage. That might sound excessive, but it accounts for the surge of first-time users that new customers bring to the fold, as well as the additional users that existing customers bring.

Since the BTS traffic spike occurs from mid-August to mid-October, we start preparing in March. We test our infrastructure and apps to find and remove any bottlenecks.

Last year, one of our larger clients ramped up their testing, creating a 3x increase in usage of our Events API. In the process, several of our monitoring thresholds were breached, and message delivery latency increased to an unacceptable level.

As a result, we poured resources into testing and ensuring our system was stable even under exceptional stress. To detail the process, I’ve broken the post into two parts:

  1. Creating the load with Locust (this piece)
  2. Running the load test (in part two, coming soon).

TL;DR

Here’s a snapshot of what I cover in this post:

  • Our target metrics.
  • How we wrote a Locust script to generate load for a Publish/Subscribe system.
  • Our observations that:
    • The load test must reflect real user behaviours and interactions
    • Load testing alone doesn’t validate system behaviour against target metrics. It’s better to measure this separately while the system is under load.
Continue reading

I use Munin to monitor a few machines, and bubble up alerts when issues show up. It’s pretty good, easy to set up, and has a large number of contributed plugins to monitor pretty much everything. If you’re still out of luck, it’s easy enough to write your own.

To ease the task of viewing the data, each machine runs munin-node, but only a couple of masters do the data collection with munin-update. This works reasonably well, except that machines monitored by more than one master need to do extra work to provide the same data to both.

Fortunately, Munin 2.0 introduced a proxy mode, which decouples running the plugins to collect fresh data (with munin-asyncd) from serving that data to the collection servers (via munin-async).

Setting this up is relatively easy, and the benefits show quickly, in the form of a reduced collection time, and fewer gaps in the data.

Reduced update time for the master, and no more gaps in the data.

Surprisingly, it also resulted in a substantially reduced load on low-power machines. But beware of the --fork parameter to munin-asyncd.
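
For reference, the node side only needs munin-asyncd running alongside munin-node to spool the plugin results; the master then fetches that spool over ssh, with something like this in munin.conf (host, user and binary path are placeholders, and vary between distributions and Munin versions):

[host.example.net]
    address ssh://munin-async@host.example.net/usr/share/munin/munin-async --spoolfetch
    use_node_name yes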

Continue reading

Until last year, work had Mitel 5212 IP phones as the main devices on desks. This had been the case since 2008, and it was apparently high time to replace them. As they had nowhere to go but the bin, I grabbed a few in the hope of using them at home. While Mitel has a proprietary protocol (MiNET), the phones also support standard SIP through an alternative vendor firmware, which allowed me to add a few more physical phones behind my FRITZ!Box.

Continue reading

Today, when trying to log in remotely to my home router (a FRITZ!Box), I was greeted with a TLS certificate error. I was pretty sure it was my router, but was I really keen to type a password into a page that, for all I knew, could just as well be a nice-looking replica of my machine? A clear indication that it was time to use a better cert than a self-signed one that I cannot verify remotely.

I use Let’s Encrypt for all my other certificates, so why not use it on my router? However, I found precious little information about how to use it with the FRITZ!Box. Fortunately, it’s pretty straightforward.
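
In rough strokes, it can look like this (the domain is illustrative, and the single-PEM import format is an assumption to check against the FRITZ!Box import dialog):

# obtain the certificate, here via a manual DNS-01 challenge (HTTP-01 may not reach a home router)
certbot certonly --manual --preferred-challenges dns -d fritzbox.example.net
# bundle private key and certificate chain into one PEM file for import in the web interface
cat /etc/letsencrypt/live/fritzbox.example.net/privkey.pem \
    /etc/letsencrypt/live/fritzbox.example.net/fullchain.pem > fritzbox.pem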

Continue reading

I’ve long been meaning to store all my passwords in a single, safe location, as a way to remain sane as well as secure. But which one? Every operating system (or desktop environment) now has its own store, but choosing one casts a lot of things into stone, and most have a lot of third-party dependencies.

KeePass seems to be a good cross-platform solution, with clients for Linux, Windows, OS X and even Android, and nice features such as filling on demand. But I don’t like the whole clicky interface, if only because it is of little use without a graphical display. It also doesn’t offer a native way to synchronise the stores across boxes.

For a while, I have been storing all my important configuration files in a git repository, with some make magic to install and update the files on the system. This magic would also store all passwords in a GPG-encrypted file, and substitute them back in when installing the files.

The problem, of course, is that the passwords are still in plaintext on the live systems. And it came back to bite me when I sent an innocuous script (the ics2dav.sh script from this post) to a friend… with the password nicely sitting there. Fortunately, I noticed this before he did, and changed my password. In addition, this doesn’t cater for passwords stored in other applications, such as Firefox.

So things had to change. And then I discovered pass(1), a simple command-line tool based on GPG-encrypted flat files, with an option to sync natively with Git. There is finally an option for me to store passwords in a way that fits my workflow.
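
In a nutshell (the key ID, entry names and git remote are illustrative):

pass init 0xDEADBEEF                 # initialise the store with a GPG key ID
pass git init                        # put the store under Git
pass git remote add origin git@git.example.net:password-store.git
pass insert web/example.net          # prompts for, then GPG-encrypts, the password
pass show web/example.net            # decrypts and prints it back
pass git push -u origin master
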
Continue reading