Bash Shellshocked Bug Of Doom

The intertubes are currently ablaze with the news of the Bash Shellshocked bug, with the usual glut of misinformed commentary through to apocalyptic doom-mongering. What I haven't seen mentioned is that there's a relatively straightforward workaround that I think you could use if you can't get hold of a patched version of bash and you have to expose bash scripts to the outside world - which of course you shouldn't be doing anyway, right? ;-) It's to make sure that any such scripts use the -p flag to bash when they are invoked:

$ env x='() { :;}; echo vulnerable' bash -c "echo this is a test"
vulnerable
this is a test

$ env x='() { :;}; echo vulnerable' bash -cp "echo this is a test"
this is a test

As the bash manpage says:

          -p      Turn on privileged mode.   In  this  mode,  the
                  $ENV  and  $BASH_ENV  files  are not processed,
                  shell functions  are  not  inherited  from  the
                  environment,   and   the  SHELLOPTS,  BASHOPTS,
                  CDPATH,  and  GLOBIGNORE  variables,  if   they
                  appear in the environment, are ignored.  If the
                  shell  is  started  with  the  effective   user
                  (group)  id  not equal to the real user (group)
                  id, and the -p option is  not  supplied,  these
                  actions  are taken and the effective user id is
                  set to the real user id.  If the -p  option  is
                  supplied  at  startup, the effective user id is
                  not reset.  Turning this option off causes  the
                  effective  user  and group ids to be set to the
                  real user and group ids.

So simply add -p to the #!/bin/bash line at the start of your scripts, i.e. #!/bin/bash -p. This isn't entirely devoid of side-effects, as the manpage segment says, and there may be clever ways of hacking around even this protection but I'm surprised I haven't seen it mentioned anywhere as a potential workaround.
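
For anyone who wants to see the effect of -p without relying on the exploit string, here is a minimal sketch. It uses an ordinary exported function, so it should behave the same on patched and unpatched shells. The function name greet and the helper check_inherit are made up for illustration; all it assumes is that bash is on the PATH.

```shell
#!/bin/sh
# check_inherit [FLAGS...]: from a parent bash, export a function into the
# environment, then see whether a child bash started with FLAGS can call it.
# "greet" and "check_inherit" are hypothetical names used only for this demo.
check_inherit() {
    bash -c '
        greet() { echo inherited; }
        export -f greet
        bash "$@" -c greet 2>/dev/null || echo "not inherited"
    ' _ "$@"
}

echo "without -p: $(check_inherit)"
echo "with -p:    $(check_inherit -p)"
```

If the child started with -p reports that the function was not inherited, privileged mode is doing what the manpage says it should.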

Categories : Web, Tech

Merging Mercurial repositories

A quick note because if I ever need to do this again I know I'll have forgotten how :-) I had three Mercurial repos that were parts of the same thing, a web-based browser/searcher for engineering documents at work. The three parts were all NetBeans projects that were interrelated, so having them as separate repos didn't make much sense. However, merging them into a single, non-branched repo whilst maintaining history wasn't particularly straightforward. I couldn't find any description of how to do this that didn't end up with branches in the resulting repo, and Mercurial extensions such as mq didn't seem to cut it either. After some experimentation, I came up with the following sequence of steps which did what I wanted:

  1. Use the Mercurial Convert extension to move each project one directory level down so they didn't clash when merged. This uses the --filemap flag with a simple filemap of the form
    rename . NewProjSubDir
    
    to move each project into a subdirectory:
    $ hg convert --filemap filemap orig1 moved1
    

  2. Pull all the intermediate moved projects into a new repo:
    $ hg clone moved1 merged
    $ cd merged
    $ hg pull -f ../moved2
    $ hg pull -f ../moved3
    
    The -f flag is needed because the repos are unrelated. This will result in a repo with a branch for each source repo, with each branch sorted in date order but the repo overall won't be date sorted.

  3. Reorder the history so it is sorted in date order, again using the Convert extension to do that:
    $ hg convert --datesort merged merged1
    

  4. Reparent each changeset in the merged, sorted repo so that its parent is the immediately preceding one, in date order. The Convert extension's --splicemap feature is used to do this, along with a small perl script to create the splicemap:
    #!/bin/perl
    use strict;
    use warnings;
    my ($line, $parent);
    while (defined($line = <>)) {
            if ($line =~ m{\bchangeset:\s+\d+:([\dabcdef]+)\b}) {
                    my $child = $1;
                    if (defined($parent)) {
                            printf("%s %s\n", $child, $parent);
                    }
                    $parent = $child;
            }
    }
    
    
    Then, listing the log oldest-first so that each changeset is paired with the one before it:
    $ cd merged1
    $ hg log -r 0:tip --debug | perl ../splicemap.pl > ../splicemap
    $ cd ..
    $ hg convert --splicemap splicemap merged1 merged2
    
    The resulting merged2 repo will have the merged, date-sorted and branchless union of the contents of the original repos.
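
Step 1 has to be repeated once per source repo, each with its own filemap. The following is a minimal sketch of that loop as a dry run: it only prints the filemap contents and the hg commands rather than running them. The subdirectory names Project1 to Project3 and the orig/moved repo names are placeholders, and it assumes the Convert extension's rename directive is what moves everything into the subdirectory.

```shell
#!/bin/sh
# write_filemap SUBDIR: emit a one-line filemap that moves the whole repo
# into SUBDIR. The helper and the ProjectN names are hypothetical.
write_filemap() {
    printf 'rename . %s\n' "$1"
}

# Dry run: show what each filemap would contain and the matching command.
for n in 1 2 3; do
    echo "filemap$n: $(write_filemap "Project$n")"
    echo "hg convert --filemap filemap$n orig$n moved$n"
done
```

To do it for real, redirect write_filemap into each filemap file and run the printed commands.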
Categories : Tech, Work

Rohan? Rubbish

Rohan clothing = expensive, utter junk. Over the last couple of years I've been bought two Rohan products as presents by my mother, a Rohan Daybreak down vest and a Rohan long sleeved shirt. About one inch of the vest zip has pulled clean out of the rest of the garment halfway down the front. On careful examination I can see that about an inch of the back edge of the zip has been cut off during manufacture and as a result the zip fabric has shredded and pulled out of the garment. That's already in the bin. The shirt has turned into a pilled, static-laden mess and is also destined for the bin as it's unwearable.

Considering how much the clothing costs, the quality is awful. The products may look nice on a hanger, or if all you want is to ponce around in a wine bar, but if my experience is anything to go by they clearly aren't serious outdoor equipment. Anyone thinking about buying Rohan should save their money and buy something else; you can pay less than half the price and still end up with significantly better quality.

Save your money, avoid http://www.rohan.co.uk/.

Categories : Peak District, Personal

Linux Mint LMDE, Xfce, Toshiba R700, suspend-resume and backlight keys

I've put Linux Mint Debian Edition on my Toshiba R700 laptop and I'm using the Xfce desktop. Initially everything seemed to work fine, but after a suspend-resume cycle the brightness keys (Fn-F6 and Fn-F7) no longer worked, although everything else did. A fruitless search for a solution across the intertubes then ensued. It appears that the original problem was caused because somebody decided they'd rip support for a feature they wanted to deprecate out of the Toshiba ACPI driver more-or-less "to see what broke" - nice. Well, the answer is "lots of things" if the number of reports of this problem that I've found is any indication.

I found a lot of suggested workarounds and without going into the gory details, none of them actually worked and in fact they generally made things worse, for example making the backlight not come on at all after resume. I did get a few clues as to how to work around the problem by diddling the backlight driver files under /sys/class/backlight, specifically the ones in the intel_backlight subdirectory. The problem is that you can only write to those files if you are root, so I've knocked up a small C program that can be made setuid-root and used to change the backlight level. I've put this in /usr/local/bin and used the Keyboard settings manager to map the <Super>F7 and <Super>F6 keys to calls to "backlight -u" and "backlight -d" to adjust the brightness up and down. This works both before and after suspend-resume and I've set it up to have 16 different brightness levels as I find the standard 8 a bit too wide apart. In case it helps someone else I've put the source here and a precompiled 64-bit binary here - if you need the 32-bit version you'll have to build from source. Enjoy!
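
This is not the C program itself, just a minimal sketch of the arithmetic behind a 16-level scheme. The function name step_brightness is made up; the brightness and max_brightness files are the standard ones the kernel exposes under /sys/class/backlight/intel_backlight. Writing the result back still needs root, which is why the real thing is a setuid binary.

```shell
#!/bin/sh
# step_brightness CURRENT MAX DIRECTION: print the next raw brightness value,
# moving one sixteenth of MAX up or down and clamping to the range 0..MAX.
# The function name and the fixed 16 levels are assumptions for this sketch.
step_brightness() {
    current=$1
    max=$2
    direction=$3
    step=$(( max / 16 ))
    if [ "$step" -lt 1 ]; then step=1; fi

    case $direction in
        up)   new=$(( current + step )) ;;
        down) new=$(( current - step )) ;;
        *)    echo "direction must be up or down" >&2; return 1 ;;
    esac

    if [ "$new" -gt "$max" ]; then new=$max; fi
    if [ "$new" -lt 0 ]; then new=0; fi
    echo "$new"
}

# Example with made-up numbers; on real hardware the inputs would come from
#   /sys/class/backlight/intel_backlight/brightness
#   /sys/class/backlight/intel_backlight/max_brightness
# and the result would be written back to the brightness file as root.
step_brightness 800 1600 up
```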

Driving the WS2811 at 800KHz with a 16MHz AVR

Recently some of the folks at the Manchester Hackspace did a bulk-order of WS2811 LED pixels. These are available from several vendors on aliexpress.com (search for "WS2811 5050 RGB") and they combine a 5050 RGB LED with a WS2811 driver chip. The WS2811 provides 24-bit RGB colour plus constant-current output, so no external components are required, although the datasheet does suggest the addition of some impedance matching and decoupling resistors and capacitors. The WS2811 can run at a data rate of either 400KHz or 800KHz, although the 800KHz ones seem more common. Having got hold of some, I set about getting them working using a Minimus 32 board which has an Atmel ATmega32U2 running at 16MHz.

Note also that this part, the WS2811, is commonly confused with one with a very similar name, the WS2801, but they are radically different beasts. The WS2801 has a SPI interface which means you need to provide both a clock and a data signal. That in turn means you can send data at a wide range of speeds (the WS2801 datasheet says up to 25MHz) and still have everything work fine as the clock line signals the WS2801 when to sample the data line. Plus most MCUs have hardware SPI which makes driving the WS2801 pretty much a doddle.

The WS2811 on the other hand uses a rather unusual control scheme. It uses a single combined clock and data line. You reset the chain by keeping the input low for around 50usec (less will usually work as well), then start sending 24-bit RGB sequences in a continuous stream. The first LED in the chain displays the first RGB value to be sent and passes the rest along the chain, the second displays the second value and so on. The datasheet gives the required timings, and there are a couple of writeups here and here. Mostly these are talking about the WS2811 in low speed (400KHz) mode, whereas the ones I have are 800KHz. The way the WS2811 protocol works is that there is a low-to-high transition at the beginning of each bit cell, then a high-to-low transition at a variable point within the cell, depending on whether the bit value is 0 or 1. For a logical zero the transition is near to the beginning of the cell, for a logical one it is later on in the cell. The exact timings seem to vary depending on which source you believe (800KHz mode):

Source                      Cell timing   Logical 0 high   Logical 0 low   Logical 1 high   Logical 1 low
WS2811 datasheet            1.25 usec     250 nsec         1000 nsec       600 nsec         650 nsec
aliexpress.com              1.25 usec     350 nsec         800 nsec        700 nsec         600 nsec
doityourselfchristmas.com   1.25 usec     250 nsec         1000 nsec       1000 nsec        250 nsec

The timings seem to be all over the place; in particular the aliexpress ones don't even add up to the bit cell length of 1.25 usec! The doityourselfchristmas.com explanation of how the WS2811 works made sense, so I used the timings from there and put together a simple test using the delay_x.h that's floating around the net. That worked OK for a single pixel, but if I tried slow fades or driving more than one pixel I got a lot of jittering.

Hmm, OK, let's look at the timings again. I'm using a 16MHz AVR, so each clock cycle is 62.5 nsec long. The short pulses in the WS2811 protocol are 250 nsec long and each bit cell is 1.25 usec long. Wow, that's only 4 clock cycles for the short pulses and only 20 cycles for each bit cell, and the allowed +/- timing variation is 75 nsec, which is just over 1 clock cycle. That means that driving these with a simple C routine is unlikely to be sufficient.

I spent a bit of time looking to see if there was any sort of hardware assist that could be brought to bear, but even SPI at 4MHz, close to the maximum that the MCU can support, wouldn't be fast enough as it would still be necessary to marshal each byte into a series of 5-bit patterns to get the timings right for the WS2811 protocol. And anything interrupt-driven is also out, as it takes 5 clocks just to dispatch an interrupt and we only have 20 cycles to play with. That only leaves bit-banging, which I generally try to avoid, but because of the relatively high speed of the WS2811 we could update 100 pixels every 10 msec using around 33% of the available CPU, which is perfectly acceptable.

There's another oddity as well - although the WS2811 takes its three 8-bit colour values in RGB order, the pixels have been wired up so the order is GRB, which makes life a little more complicated as the bytes need reordering on output.
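
The cycle budget is easy to check with a few lines of shell arithmetic. This is just a sketch of the sums, with times held in picoseconds so that the 62.5 nsec cycle stays an integer; the function names are made up for illustration. Note that the last figure is the raw bit time only, so it comes out a little under the rough CPU share quoted above, which also has to cover the reset gap and loop overhead.

```shell
#!/bin/sh
# Timing budget for bit-banging the WS2811 at 800KHz from a 16MHz AVR.
# cycles_for_ns and cell_total are hypothetical helper names.

F_CPU_MHZ=16
CYCLE_PS=$(( 1000000 / F_CPU_MHZ ))      # picoseconds per clock cycle

# cycles_for_ns NSEC: whole clock cycles that fit into NSEC nanoseconds
cycles_for_ns() {
    echo $(( $1 * 1000 / CYCLE_PS ))
}

# cell_total HIGH_NSEC LOW_NSEC: bit cell length from its high and low parts
cell_total() {
    echo $(( $1 + $2 ))
}

echo "short pulse (250 nsec): $(cycles_for_ns 250) cycles"
echo "bit cell (1250 nsec):   $(cycles_for_ns 1250) cycles"
echo "aliexpress 0 cell:      $(cell_total 350 800) nsec"
echo "aliexpress 1 cell:      $(cell_total 700 600) nsec"

# Raw bit time for 100 pixels of 24 bits, as a share of a 10 msec frame
frame_ns=$(( 100 * 24 * 1250 ))
echo "100 pixels: $(( frame_ns / 1000 )) usec, $(( frame_ns * 100 / 10000000 ))% of 10 msec"
```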

OK, so the only realistic option looks like it is going to be some hand-crafted assembler. Although this post on arduino.cc suggests it is not possible to meet the timing constraints, I thought it was possible, if not particularly simple, and indeed that's the case. Anyway, to cut to the chase, I've put a copy of the resulting code on SourceForge, and there's a demo of it in use there as well. Some notes about the implementation:

  • The basic algorithm is to have an outer loop that iterates over the array of RGB values we've been passed and an inner loop that iterates over each 8-bit R, G or B value, setting the output pin as necessary. This is made somewhat more complicated than it should be because the WS2811 pixels I have are wired in (GRB) order rather than (RGB) order.
  • This code requires instantiating for each port/pin combination it is used on. The reason is that dereferencing a port pointer and assigning a value to it takes 4 cycles, which is too long to be usable here, bearing in mind we only have 4 cycles in which to toggle the pin low/high/low or high/low/high as appropriate to produce the short 250nsec pulse that's required.
  • If we want to keep the timings accurate it is necessary to run with interrupts disabled.
  • Conditional branch instructions on the AVR take a different number of clock cycles depending on whether they are true or false. It's therefore necessary to insert additional instructions to equalise the time taken by the true and false paths. That means a bit test and pin set takes 8 cycles, once code to equalise the timings is added. That's nearly half of the 20 cycles we have available per bit.
  • We only need to do the inner 8-bit loop bit-test-and-set-pin once per bit, to see if it is a 0 bit. If it is, we set the output pin low at 250nsec into the bit cell. For a 1 bit we don't need to test at all, we just need to unconditionally set the output pin low 1000nsec into the bit cell. That's because if we are outputting a 0 bit the output pin will already have been set low at 250nsec and the additional set to low at 1000nsec will have no effect. On the other hand, if we are outputting a 1 bit we'll be correctly changing the pin from high to low at 1000nsec.
  • We can't leave the outer loop testing, to see if we've reached the end of the array of RGB values, until after we've output each 24-bit RGB value. If we did we'd introduce jitter between one set of 24 bits and the next. We therefore have to interleave the necessary outer loop housekeeping with the inner 8-bit loops that do the actual bit output.
  • We only have at most 6 cycles free per bit once all the inner loop testing, pin setting and loop handling is accounted for. We've already established that it takes around 8 cycles to do a conditional bit-test-and-pin-set and perform the necessary adjustments to keep the timings the same - the bare minimum to do a test that takes the same time down the true and false paths is 4 cycles. We need to only do the interleaved outer loop handling on the last iteration of the inner loops so that we don't end up doing it multiple times per RGB value - but it's going to take a minimum of 4 cycles just to do the necessary test, and we only have 6 cycles available.
  • To solve that problem we partially unroll the R and B loops. We loop over the R and B bit values 7 times and output the 8th bit with an unrolled version of the loop. That means there's no need to explicitly test if we are on the last iteration of the R or B loop as we just 'fall through' from the 7th iteration of the loop. That saves us sufficient cycles to be able to interleave the outer loop handling with the handling of the 8th bit of the R and B values.
  • Setting an output pin doesn't change any of the flags in the status register, so it is possible to perform a test then set an output pin and then perform a conditional jump using the result of the prior test.
  • Conditional jumps can only be made -64/+63 words relative to the current program counter; if we need to jump further it takes a combination of a local conditional branch and a long-range jump.
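
The "test once, then clear unconditionally" trick from the notes above can be modelled in a few lines. This is only a sketch of the pin state per clock cycle, not the assembler itself; the function name bit_cell is made up, and it assumes the 250 nsec / 1000 nsec timings at 16MHz, i.e. transitions at cycles 4 and 16 of a 20-cycle cell.

```shell
#!/bin/sh
# bit_cell BIT: print the pin level (H or L) for each of the 20 clock cycles
# in one bit cell. The pin goes high at cycle 0, is set low at cycle 4 only
# when BIT is 0, and is set low unconditionally at cycle 16.
# "bit_cell" is a hypothetical name used only for this model.
bit_cell() {
    bit=$1
    cycle=0
    level=H
    out=
    while [ "$cycle" -lt 20 ]; do
        # Conditional early clear: the only place the bit value is tested
        if [ "$cycle" -eq 4 ] && [ "$bit" -eq 0 ]; then level=L; fi
        # Unconditional late clear: a no-op if the pin is already low
        if [ "$cycle" -eq 16 ]; then level=L; fi
        out=$out$level
        cycle=$(( cycle + 1 ))
    done
    echo "$out"
}

echo "0 bit: $(bit_cell 0)"
echo "1 bit: $(bit_cell 1)"
```

The second clear never needs a test, which is where the saving of one conditional per bit comes from.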

To validate the timings I hooked up the Minimus to a scope and verified that the timings were as expected, and they are as per the table above. In particular, the overall period per 8 bits is exactly 10 usec, with no jitter between one 24-bit RGB value and the next.

1 bit, low
1 bit, high

In addition, I also looked at the output of the pixel, which is passed down to the rest of the chain. That revealed that there is a delay of approximately 200 nsec per pixel, and that the signal is reshaped before being passed to the next pixel in the chain. The timings are not the same as those specified in the datasheet, which suggests to me that the datasheet timings are most likely just an average of the output timings of a sample of chips rather than being a characterisation of the operational input range of the chips.

Input versus output of a pixel, high and low cells

The output timings are as follows:

Logical 0 high   Logical 0 low   Logical 1 high   Logical 1 low
338 nsec         912 nsec        680 nsec         570 nsec

That leads me to suspect that the most important thing when driving the WS2811 is not the exact intra-cell timings for low and high bits, it is getting the bit rate as close to the specified 800 KHz as possible and in avoiding jitter between each block of 24 bits. The code I've linked to above does exactly that, so although I've only tested it on a short string it should be fine for driving much longer ones as well.
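
A quick check of the measured output figures supports this: the high and low parts differ from the input, but each cell still adds up to the same length. A minimal sketch of the sums, with made-up function names:

```shell
#!/bin/sh
# cell_ns HIGH LOW: length of a bit cell in nsec from its high and low parts.
# rate_khz CELL_NS: bit rate in KHz for a cell of CELL_NS nanoseconds.
# Both helper names are hypothetical.
cell_ns() {
    echo $(( $1 + $2 ))
}

rate_khz() {
    echo $(( 1000000 / $1 ))
}

zero_cell=$(cell_ns 338 912)   # reshaped logical 0 measured on the scope
one_cell=$(cell_ns 680 570)    # reshaped logical 1 measured on the scope

echo "0 cell: $zero_cell nsec = $(rate_khz "$zero_cell") KHz"
echo "1 cell: $one_cell nsec = $(rate_khz "$one_cell") KHz"
```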

Two pixels, daisy-chained

And finally, here is the obligatory YouTube video clip - enjoy ;-) The pixels are so bright I had to put 4 layers of paper in front of them to stop the camera overloading. The shot of the scope shows the input to the first pixel at the top, in red. The yellow trace below is the output of the first pixel and that's fed in to the input of the second pixel, and so on. As you can see, the bottom trace is 1/3 shorter than the top trace as this chain has 3 pixels in it. The overall pulse train is 90usec, each pixel taking 30usec to refresh. That comes out as a bit rate of 800KHz, as per the datasheet.

Update

This post got mentioned on Hackaday, after which I've had a lot of feedback, both on Hackaday and here. Some of it has been good, some has been, well, let's just call it ill-informed.

Please don't bother telling me that you can do this with an Xmega, a PIC, an ARM or whatever. All that proves is you've entirely missed the point of this post.

Some of the comments have been along the lines of "Why don't you use hardware SPI, it works for me". Firstly, the WS2811 is not a SPI device but if you do have it working, please leave me a note saying what SPI settings you used, because I've not found an obvious way of getting the right 800KHz data rate and the right mark/space ratio that the WS2811 requires, or of avoiding jitter as the ATmega SPI hardware is not double-buffered. Note in particular if you are using the FastSPI library you are not using hardware SPI. When driving the WS2811, TM1809 or TM1804, FastSPI uses bit-banging, as does this code.

Another suggestion is to use the USART in synchronous mode and set to 5 bits per byte. The problem is that the ATmega sends start and stop bits even in synchronous mode, and the signal polarity is wrong as well. There's an inconclusive discussion on Hackaday about this option, but I don't think it's practical. And, as I note above, because of the data rates required, even if you use hardware you are still going to spend most of the available cycles managing it.

I've also been told that I've wasted my time because the FastSPI library can already drive these chips. FastSPI is a fine library, but if you search around you'll find people who have had trouble getting it to work with the WS2811 (including in the comments to this post). I've done an investigation into FastSPI and the possible causes of the problems people have getting it to work and I have the following comments to make:

  • FastSPI depends on the Arduino environment and libraries, which I don't use. It's therefore no use to me. My code has no dependencies on the Arduino environment.
  • The WS2811 datasheet specifies that the allowed variation is +/- 75 nsec. That's just over 1 clock cycle (62.5 nsec @ 16MHz), so to stay within spec the timings have to be accurate to +/- 1 cycle. For '0' bits the FastSPI library is spot-on at 1250 nsec but for '1' bits it is 1 cycle over at 1312.5 nsec. That's just within spec.
  • The FastSPI code sends 3 blocks of 8 bits for each RGB value. The 8th bit of each block is significantly out of spec, 1625 nsec for '0' bits and 1687.5 nsec for '1' bits compared to the spec value of 1250 nsec.
  • Between each RGB block (24 bits) there's an even bigger out of spec gap of 2062.5 nsec which is 65% longer than it should be.
  • The overall effect is that, worst-case, the pulse train that FastSPI outputs can be up to 10% longer than it should be. With some batches of chips you may get away with this but with others you may not, which is most likely why for some people FastSPI works OK and for others it doesn't - it's down to luck. And as I noted above, individual bit jitter is probably more problematic than a pulse train that is slightly too fast or too slow.
  • Finally, a simple test program that uses FastSPI to set 3 LEDs to a fixed value is 11048 bytes long. The equivalent program using my code is 450 bytes - about 25 times smaller. On the board I'm using, FastSPI would use up 1/3 of the available program memory and that's more than I can afford. The reason for this difference is simple: FastSPI supports multiple LED driver chips and even allows you to select them at run-time, whereas mine is just intended to drive the WS2811. That's a classic flexibility/space design tradeoff that in my case doesn't work out in FastSPI's favour. Your mileage may of course vary.
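
The "up to 10% longer" figure can be reproduced from the per-bit numbers above. This is a sketch under one assumption of mine: that within each 24-bit value the 8th and 16th cells take the longer "8th bit" time and the 24th cell takes the 2062.5 nsec figure. Times are in tenths of a nanosecond so the half-nanoseconds stay integral, and the function names are made up.

```shell
#!/bin/sh
# Length of one 24-bit pulse train, in tenths of a nanosecond.
# Assumption: cells 8 and 16 use the "8th bit" time and cell 24 uses the
# inter-RGB gap time. train_length and percent_over are hypothetical names.

NOMINAL=12500                       # 1250.0 nsec per bit cell, per the spec

# train_length NORMAL_CELL EIGHTH_CELL LAST_CELL
train_length() {
    echo $(( 21 * $1 + 2 * $2 + $3 ))
}

# percent_over TRAIN: whole percent by which TRAIN exceeds 24 nominal cells
percent_over() {
    echo $(( ($1 - 24 * NOMINAL) * 100 / (24 * NOMINAL) ))
}

ones=$(train_length 13125 16875 20625)    # all '1' bits
zeros=$(train_length 12500 16250 20625)   # all '0' bits

echo "all ones:  $(( ones / 10 )).$(( ones % 10 )) nsec, $(percent_over "$ones")% over"
echo "all zeros: $(( zeros / 10 )).$(( zeros % 10 )) nsec, $(percent_over "$zeros")% over"
```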

Finally, here's a scope trace showing 2 RGB values being output by FastSPI (top trace) and WS2811.h (bottom trace). You can see the jitter between the 8-bit blocks and the 24-bit RGB blocks on the FastSPI trace.

FastSPI (top) versus WS2811.h (bottom)

Categories : AVR, Tech