Driving the WS2811 at 800KHz with a 16MHz AVR

Recently some of the folks at the Manchester Hackspace did a bulk order of WS2811 LED pixels. These are available from several vendors on aliexpress.com (search for "WS2811 5050 RGB") and they combine a 5050 RGB LED with a WS2811 driver chip. The WS2811 provides 24-bit RGB colour plus constant-current output, so no external components are required, although the datasheet does suggest adding some impedance-matching and decoupling resistors and capacitors. The WS2811 can run at a data rate of either 400KHz or 800KHz, although the 800KHz ones seem more common. Having got hold of some, I set about getting them working using a Minimus 32 board, which has an Atmel ATmega32U2 running at 16MHz.

Note also that this part, the WS2811, is commonly confused with the WS2801, which has a very similar name but is a radically different beast. The WS2801 has an SPI interface, which means you need to provide both a clock and a data signal. That in turn means you can send data at a wide range of speeds (the WS2801 datasheet says up to 25MHz) and still have everything work fine, as the clock line tells the WS2801 when to sample the data line. Plus most MCUs have hardware SPI, which makes driving the WS2801 pretty much a doddle.

The WS2811 on the other hand uses a rather unusual control scheme: a single combined clock and data line. You reset the chain by holding the input low for around 50usec (less will usually work as well), then send 24-bit RGB sequences in a continuous stream. The first LED in the chain displays the first RGB value sent and passes the rest along the chain, the second displays the second value, and so on. The datasheet gives the required timings, and there are a couple of writeups here and here. Mostly these talk about the WS2811 in low-speed (400KHz) mode, whereas the ones I have are 800KHz. The WS2811 protocol works by having a low-to-high transition at the beginning of each bit cell, then a high-to-low transition at a variable point within the cell, depending on whether the bit value is 0 or 1. For a logical zero the transition is near the beginning of the cell, for a logical one it is later in the cell. The exact timings seem to vary depending on which source you believe (800KHz mode):

Source | cell timing | logical 0 high | logical 0 low | logical 1 high | logical 1 low
WS2811 datasheet | 1.25 usec | 250 nsec | 1000 nsec | 600 nsec | 650 nsec
aliexpress.com | 1.25 usec | 350 nsec | 800 nsec | 700 nsec | 600 nsec
doityourselfchristmas.com | 1.25 usec | 250 nsec | 1000 nsec | 1000 nsec | 250 nsec
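As a quick sanity check on the table, the cell length implied by each row is just the high time plus the low time for each bit type. The sketch below is host-side C arithmetic only; the ws2811_timing struct and the helper names are made up for illustration. It confirms that the datasheet and doityourselfchristmas.com rows fill a 1250 nsec cell, while the aliexpress.com row gives 1150 nsec for a 0 and 1300 nsec for a 1.

```c
/* One row of the timing table, all values in nanoseconds (hypothetical type). */
typedef struct {
    const char *source;
    unsigned zero_high, zero_low;   /* logical 0 */
    unsigned one_high, one_low;     /* logical 1 */
} ws2811_timing;

static const ws2811_timing timings[] = {
    { "WS2811 datasheet",          250, 1000,  600, 650 },
    { "aliexpress.com",            350,  800,  700, 600 },
    { "doityourselfchristmas.com", 250, 1000, 1000, 250 },
};

/* Cell length implied by a 0 bit and by a 1 bit. */
static unsigned cell_ns_zero(const ws2811_timing *t) { return t->zero_high + t->zero_low; }
static unsigned cell_ns_one(const ws2811_timing *t)  { return t->one_high + t->one_low; }

/* 1 if both bit types fill exactly one 1250 ns (800KHz) cell, else 0. */
static int fills_cell(const ws2811_timing *t)
{
    return cell_ns_zero(t) == 1250 && cell_ns_one(t) == 1250;
}
```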

The timings seem to be all over the place; in particular, the aliexpress ones don't even add up to the bit cell length of 1.25usec! The doityourselfchristmas.com explanation of how the WS2811 works made sense, so I used the timings from there and put together a simple test using the delay_x.h that's floating around the net. That worked OK for a single pixel, but if I tried slow fades or driving more than one pixel I got a lot of jitter.

Hmm, OK, let's look at the timings again. I'm using a 16MHz AVR, so each clock cycle is 62.5 nsec long. The short pulses in the WS2811 protocol are 250 nsec long and each bit cell is 1.25 usec long. Wow, that's only 4 clock cycles for the short pulses and only 20 cycles for each bit cell, and the allowed +/- timing variation is 75 nsec, which is just over 1 clock cycle. That means driving these with a simple C routine is unlikely to be sufficient.

I spent a bit of time looking to see if there was any sort of hardware assist that could be brought to bear, but even SPI at 4MHz, close to the maximum that the MCU can support, wouldn't be fast enough, as it would still be necessary to marshal each byte into a series of 5-bit patterns to get the timings right for the WS2811 protocol. Anything interrupt-driven is also out, as it takes 5 clocks just to dispatch an interrupt and we only have 20 cycles to play with. That only leaves bit-banging, which I generally try to avoid, but because of the relatively high speed of the WS2811 we could update 100 pixels every 10 msec using around 30% of the available CPU (100 pixels x 24 bits x 1.25 usec = 3 msec), which is perfectly acceptable. There's another oddity as well: although the WS2811 takes the 8-bit colour values in RGB order, the pixels have been wired up so the order is GRB, which makes life a little more complicated as the bytes need reordering on output.
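The clock-cycle arithmetic in the paragraphs above can be reproduced with a few lines of integer C. This is a host-side sketch rather than AVR code; F_CPU_HZ and the helper names are invented for the example, and the CPU-load figure ignores the reset gap and any loop overhead.

```c
#include <stdint.h>

#define F_CPU_HZ 16000000UL   /* 16MHz AVR, as in the text */

/* Length of one CPU clock cycle in picoseconds (62500 ps = 62.5 ns at 16MHz).
   Picoseconds keep all the arithmetic in integers. */
static uint32_t cycle_ps(void)
{
    return (uint32_t)(1000000000000ULL / F_CPU_HZ);
}

/* Number of whole clock cycles in an interval given in nanoseconds. */
static uint32_t ns_to_cycles(uint32_t ns)
{
    return (uint32_t)((uint64_t)ns * 1000u / cycle_ps());
}

/* Percentage of CPU time spent bit-banging `pixels` pixels (24 bits of
   1250 ns each) once every `period_us` microseconds. */
static uint32_t cpu_load_percent(uint32_t pixels, uint32_t period_us)
{
    uint64_t busy_ns = (uint64_t)pixels * 24u * 1250u;
    return (uint32_t)(busy_ns * 100u / ((uint64_t)period_us * 1000u));
}
```

At 16MHz, ns_to_cycles(250) gives 4, ns_to_cycles(1250) gives 20 and ns_to_cycles(75) gives 1, while cpu_load_percent(100, 10000) gives 30, i.e. roughly a third of the CPU for 100 pixels refreshed every 10 msec.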

OK, so the only realistic option looks like some hand-crafted assembler. Although this post on arduino.cc suggests it is not possible to meet the timing constraints, I thought it was possible, if not particularly simple, and indeed that's the case. Anyway, to cut to the chase, I've put a copy of the resulting code on SourceForge, and there's a demo of it in use there as well. Some notes about the implementation:

  • The basic algorithm is to have an outer loop that iterates over the array of RGB values we've been passed and an inner loop that iterates over each 8-bit R, G or B value, setting the output pin as necessary. This is made somewhat more complicated than it should be because the WS2811 pixels I have are wired in (GRB) order rather than (RGB) order.
  • This code requires instantiating for each port/pin combination it is used on. The reason is that dereferencing a port pointer and assigning a value to it takes 4 cycles, which is too long to be usable here bearing in mind we only have 4 instructions to toggle the pin low/high/low or high/low/high as appropriate to produce the short 250nsec pulse that's required.
  • If we want to keep the timings accurate it is necessary to run with interrupts disabled.
  • Conditional branch instructions on the AVR take a different number of clock cycles depending on whether they are true or false. It's therefore necessary to insert additional instructions to equalise the time taken by the true and false paths. That means a bit test and pin set takes 8 cycles, once code to equalise the timings is added. That's nearly half of the 20 cycles we have available per bit.
  • We only need to do the inner 8-bit loop bit-test-and-set-pin once per bit, to see if it is a 0 bit. If it is, we set the output pin low 250nsec into the bit cell. For a 1 bit we don't need to test at all, we just unconditionally set the output pin low 1000nsec into the bit cell. That works because if we are outputting a 0 bit the output pin will already have been set low at 250nsec, so the additional set-low at 1000nsec has no effect. On the other hand, if we are outputting a 1 bit, it correctly changes the pin from high to low at 1000nsec.
  • We can't leave the outer loop testing, to see if we've reached the end of the array of RGB values, until after we've output each 24-bit RGB value. If we did we'd introduce jitter between one set of 24 bits and the next. We therefore have to interleave the necessary outer loop housekeeping with the inner 8-bit loops that do the actual bit output.
  • We only have at most 6 cycles free per bit once all the inner loop testing, pin setting and loop handling are accounted for. We've already established that it takes around 8 cycles to do a conditional bit-test-and-pin-set and perform the necessary adjustments to keep the timings the same; the bare minimum for a test that takes the same time down the true and false paths is 4 cycles. We need to do the interleaved outer loop handling only on the last iteration of the inner loops so that we don't end up doing it multiple times per RGB value, but it's going to take a minimum of 4 cycles just to do the necessary test, and we only have 6 cycles available.
  • To solve that problem we partially unroll the R and B loops. We loop over the R and B bit values 7 times and output the 8th bit with an unrolled version of the loop. That means there's no need to explicitly test if we are on the last iteration of the R or B loop as we just 'fall through' from the 7th iteration of the loop. That saves us sufficient cycles to be able to interleave the outer loop handling with the handling of the 8th bit of the R and B values.
  • Setting an output pin doesn't change any of the flags in the status register, so it is possible to perform a test then set an output pin and then perform a conditional jump using the result of the prior test.
  • Conditional branches can only reach -64/+63 words relative to the program counter; if we need to jump further, it takes a combination of a local conditional branch and a long-range jump.
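The bit-level behaviour described in the notes above (one test per bit, a conditional set-low at 250nsec, an unconditional set-low at 1000nsec, and bytes sent in GRB order) can be modelled on a PC to check what ends up on the wire. This is only a model of the timing logic in plain C, not the assembler in the actual code; the rgb_t type and the function names are hypothetical, and most-significant-bit-first ordering is an assumption of the sketch.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical pixel type for this sketch: colours stored in RGB order. */
typedef struct { uint8_t r, g, b; } rgb_t;

/* High time of one bit cell: the pin goes high at 0 ns, a 0 bit is pulled
   low at 250 ns (the only conditional test), and every bit is pulled low
   unconditionally at 1000 ns, which is a no-op if the pin is already low. */
static unsigned bit_high_ns(int bit)
{
    unsigned low_at_ns = 1000;   /* unconditional set-low */
    if (bit == 0)
        low_at_ns = 250;         /* earlier conditional set-low wins for a 0 */
    return low_at_ns;            /* pin went high at 0 ns */
}

/* Fills `out` with the high time of every bit sent for `n` pixels, in wire
   order: G, R, B per pixel, most significant bit first (assumed).
   `out` needs n * 24 entries. Returns the number of entries written. */
static size_t ws2811_model(const rgb_t *px, size_t n, unsigned *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        const uint8_t bytes[3] = { px[i].g, px[i].r, px[i].b };  /* GRB wiring */
        for (int c = 0; c < 3; c++)
            for (int bit = 7; bit >= 0; bit--)
                out[k++] = bit_high_ns((bytes[c] >> bit) & 1);
    }
    return k;
}
```

For a pixel with R = 0xFF, G = 0x00 and B = 0x0F, the first 8 high times are 250nsec (the G byte), the next 8 are 1000nsec (R), then four of 250nsec and four of 1000nsec (B).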

To validate the timings I hooked up the Minimus to a scope and verified that they are as per the table above. In particular, the overall period per 8 bits is exactly 10 usec, with no jitter between one 24-bit RGB value and the next.

1 bit, low
1 bit, high

In addition, I also looked at the output of the pixel, which is passed down to the rest of the chain. That revealed that there is a delay of approximately 200 nsec per pixel, and that the signal is reshaped before being passed to the next pixel in the chain. The timings are not the same as those specified in the datasheet, which suggests to me that the datasheet timings are most likely just an average of the output timings of a sample of chips rather than being a characterisation of the operational input range of the chips.

Input versus output of a pixel, high and low cells

The output timings are as follows:

logical 0 high | logical 0 low | logical 1 high | logical 1 low
338 nsec | 912 nsec | 680 nsec | 570 nsec

That leads me to suspect that the most important thing when driving the WS2811 is not the exact intra-cell timings for 0 and 1 bits, but getting the bit rate as close as possible to the specified 800KHz and avoiding jitter between each block of 24 bits. The code I've linked to above does exactly that, so although I've only tested it on a short string it should be fine for driving much longer ones as well.

Two pixels, daisy-chained

And finally, here is the obligatory YouTube video clip - enjoy ;-) The pixels are so bright I had to put 4 layers of paper in front of them to stop the camera overloading. The shot of the scope shows the input to the first pixel at the top, in red. The yellow trace below is the output of the first pixel, which is fed into the input of the second pixel, and so on. As you can see, the bottom trace is 1/3 shorter than the top trace as this chain has 3 pixels in it. The overall pulse train is 90usec, each pixel taking 30usec to refresh. That comes out as a bit rate of 800KHz (24 bits in 30usec), as per the datasheet.

Update

This post got mentioned on Hackaday, after which I've had a lot of feedback, both on Hackaday and here. Some of it has been good, some has been, well, let's just call it ill-informed.

Please don't bother telling me that you can do this with an Xmega, a PIC, an ARM or whatever. All that proves is you've entirely missed the point of this post.

Some of the comments have been along the lines of "Why don't you use hardware SPI, it works for me". Firstly, the WS2811 is not an SPI device. Secondly, if you do have it working, please leave me a note saying what SPI settings you used, because I've not found an obvious way of getting the right 800KHz data rate and the right mark/space ratio that the WS2811 requires, or of avoiding jitter, as the ATmega SPI hardware is not double-buffered. Note in particular that if you are using the FastSPI library you are not using hardware SPI: when driving the WS2811, TM1809 or TM1804, FastSPI uses bit-banging, as does this code.

Another suggestion is to use the USART in synchronous mode, set to 5 data bits per character. The problem is that the ATmega sends start and stop bits even in synchronous mode, and the signal polarity is wrong as well. There's an inconclusive discussion on Hackaday about this option, but I don't think it's practical. And, as I noted above, because of the data rates required, even if you use hardware you are still going to spend most of the available cycles managing it.
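To make the marshalling overhead concrete, here is roughly what feeding the WS2811 from 4MHz SPI would involve: each WS2811 bit becomes a 5-bit pattern (250nsec per SPI bit), so every colour byte expands into 5 SPI bytes. This is an illustrative sketch in plain C with a made-up function name, using the doityourselfchristmas.com 250/1000 nsec timings; it does nothing about the gaps between SPI bytes caused by the single-buffered ATmega SPI transmitter, which is one of the reasons given above for not using this approach.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Encodes WS2811 data bytes as an SPI bit stream, assuming SPI clocked at
   4MHz so one SPI bit lasts 250 ns and one WS2811 cell (1250 ns) is 5 SPI
   bits: a 0 becomes 10000 (250 ns high), a 1 becomes 11110 (1000 ns high).
   Each input byte expands to 40 bits, i.e. exactly 5 output bytes, so `out`
   must hold n * 5 bytes. Returns the number of output bytes. */
static size_t ws2811_to_spi5(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t out_bits = 0;
    memset(out, 0, n * 5);
    for (size_t i = 0; i < n; i++) {
        for (int bit = 7; bit >= 0; bit--) {              /* MSB first */
            uint8_t pattern = ((in[i] >> bit) & 1) ? 0x1E  /* 11110 */
                                                   : 0x10; /* 10000 */
            for (int p = 4; p >= 0; p--, out_bits++) {
                if ((pattern >> p) & 1)
                    out[out_bits / 8] |= (uint8_t)(0x80 >> (out_bits % 8));
            }
        }
    }
    return out_bits / 8;
}
```

A single 0x80 byte, for example, comes out as 0xF4 0x21 0x08 0x42 0x10, and a full 24-bit pixel needs 15 SPI bytes, each of which would have to be loaded into SPDR with no gap between them.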

I've also been told that I've wasted my time because the FastSPI library can already drive these chips. FastSPI is a fine library, but if you search around you'll find people who have had trouble getting it to work with the WS2811 (including in the comments to this post). I've done an investigation into FastSPI and the possible causes of the problems people have getting it to work and I have the following comments to make:

  • FastSPI depends on the Arduino environment and libraries, which I don't use. It's therefore no use to me. My code has no dependencies on the Arduino environment.
  • The WS2811 datasheet specifies that the allowed variation is +-75 nsec, that's just over 1 clock cycle (62.5 nsec @ 16MHz) so to stay within spec the timings have to be accurate to +-1 cycle. For '0' bits the FastSPI library is spot-on at 1250 nsec but for '1' bits it is 1 cycle over at 1312.5 nsec. That's just within spec.
  • The FastSPI code sends 3 blocks of 8 bits for each RGB value. The 8th bit of each block is significantly out of spec, 1625 nsec for '0' bits and 1687.5 nsec for '1' bits compared to the spec value of 1250 nsec.
  • Between each RGB block (24 bits) there's an even bigger out of spec gap of 2062.5 nsec which is 65% longer than it should be.
  • The overall effect is that, worst case, the pulse train that FastSPI outputs can be up to 10% longer than it should be. With some batches of chips you may get away with this but with others you may not, which is most likely why FastSPI works OK for some people and not for others - it's down to luck. And as I noted above, individual bit jitter is probably more problematic than a pulse chain that is slightly too fast or too slow.
  • Finally, a simple test program that uses FastSPI to set 3 LEDs to a fixed value is 11048 bytes long. The equivalent program using my code is 450 bytes - about 25 times smaller. On the board I'm using, FastSPI would use up 1/3 of the available program memory and that's more than I can afford. The reason for this difference is simple, FastSPI supports multiple LED driver chips and even allows you to select them at run-time whereas mine is just intended to drive the WS2811. That's a classic flexibility/space design tradeoff that in my case doesn't work out in FastSPI's favour, your mileage may of course vary.
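The 10% worst-case figure can be reproduced from the bit cell lengths listed above. The sketch assumes an RGB value made entirely of '1' bits and reads the 2062.5 nsec inter-RGB gap as the length of the final bit cell of each 24-bit block; both are interpretations made for the sake of the arithmetic, and the function names are hypothetical. Everything is in picoseconds so it stays in integer maths.

```c
/* Worst-case (all '1' bits) length of one 24-bit RGB value with the FastSPI
   cell lengths measured above, in picoseconds. */
static unsigned long fastspi_worst_rgb_ps(void)
{
    return 21UL * 1312500UL    /* 3 bytes x 7 ordinary '1' bits at 1312.5 ns */
         + 2UL * 1687500UL     /* 8th bit of the first two bytes at 1687.5 ns */
         + 1UL * 2062500UL;    /* last bit plus inter-RGB gap at 2062.5 ns */
}

/* Nominal length of 24 bits at 1250 ns each: 30 usec. */
static const unsigned long SPEC_RGB_PS = 24UL * 1250000UL;

/* Percentage by which the worst case exceeds the nominal length. */
static unsigned long fastspi_overrun_percent(void)
{
    return (fastspi_worst_rgb_ps() - SPEC_RGB_PS) * 100UL / SPEC_RGB_PS;
}
```

That works out at 33 usec per RGB value instead of 30 usec, i.e. exactly 10% long.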

Finally, here's a scope trace showing 2 RGB values being output by FastSPI (top trace) and WS2811.h (bottom trace). You can see the jitter between the 8-bit blocks and between the 24-bit RGB blocks on the FastSPI trace.

FastSPI (top) versus WS2811.h (bottom)



Re: Driving the WS2811 at 800KHz with a 16MHz AVR

FastSPI works fine. I have a chain of 50 WS2811s in a row. Fades up and down in brightness and color changes perfectly fine. See http://www.youtube.com/watch?v=QyqxMJVi4tE for one of my early demos.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

As  I said, you may get lucky, especially if they are the slower 400KHz ones.  But they are not SPI - what did you do with the SPI clock line?  What speed did you clock them at?

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

I'm not sure exactly how the implementation of the FastSPI library works, but I just hooked up the data line, set the driver type and it works. If you think using SPI's data line is risky, it looks like you can tell FastSPI to bit-bang instead.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Right, in which case you weren't using SPI hardware, you were using FastSPI's bit-banging for the WS2811 - see the Update section above.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

I got those LEDs from aliexpress too, and I expected them to work "just like that", but alas, they don't. At least they don't with FastSPI_LED with TM-1809 timing.

So I am curious about this code but the amount of assembler and the lack of a minimal working example is stopping me. Is there an example available?

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

After some help in the shape of a complete minimal example from Alan, I can happily confirm this worked as advertised, at least up to 15 LEDs. I have no idea why FastSPI_LED did not work, but given that both use bit-banging for the WS2811, there is no performance difference per se between the two solutions, as they basically both do the same thing.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

See the Update section above for a possible reason why it didn't work. The example code in question can be found here.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Hello

Nice work and interesting article.

Could you please add a photo of the LED itself?

Is it a WS2811 and LED inside the same 5050 package?

Or is it a WS2811 next to a 5050 LED?

Thanks & best regards

~barbudor~

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

The WS2811 is inside the 5050 package. It's visible as a small rectangular chip connected to the 3 LEDs.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Can you confirm the output timing table figures?

The "logical 0 low" figure of 680 looks wrong - perhaps it should be 880?  This is also more consistent with the trace shown above the table.

 

 

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Yes, they weren't right; I've redone the capture, remeasured the timings and fixed the table, thanks. In general, there seems to be no agreement amongst the datasheets about what the timings should be, and none of them really match up with what the WS2811 actually does. As I said, I think the explanation on the doityourselfchristmas site makes the most sense: the point at which the high-to-low transition takes place doesn't matter all that much; it is the state at 625nsec into the cell that counts, that and making sure that the overall bit cell length is 1.25usec and that there's no jitter between bit cells.
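The rule of thumb in the reply above, that what matters is the line state about 625nsec into the 1250nsec cell, can be written as a very simple receiver model. This is an assumption-laden sketch: the fixed 625nsec sampling point comes from the doityourselfchristmas.com explanation rather than the datasheet, and the names are illustrative. It shows why both the 250/1000 nsec input timings and the 338/680 nsec output timings in the table above decode the same way, and also that the re-shaped '1' (680nsec) sits only 55nsec past the sampling point, so the model should not be read as a precise threshold.

```c
/* Assumed sampling point, in ns from the rising edge at the start of a cell. */
#define SAMPLE_POINT_NS 625u

/* Decoded bit for a pulse that is high for `high_ns` from the start of the
   cell: 1 if the line is still high at the sampling point. */
static int decode_bit(unsigned high_ns)
{
    return high_ns > SAMPLE_POINT_NS;
}

/* Distance in ns between the falling edge and the sampling point; a bigger
   margin means more tolerance to timing error. */
static unsigned margin_ns(unsigned high_ns)
{
    return high_ns > SAMPLE_POINT_NS ? high_ns - SAMPLE_POINT_NS
                                     : SAMPLE_POINT_NS - high_ns;
}
```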

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Great post, Alan. Well done on the fine tuning of the assembler code to get the timings just so.  I appreciate the captures of the timings - especially the data between cascaded devices.

I am also playing with WS2811 – got a bunch pre-mounted on tape via Aliexpress. Initially, just for playing around, I got one of the cheap multi-chip controller boxes from China, but I’m planning on using a PIC32 – surprisingly cheap for the amount of flash and RAM that’s inside, and I need to get back into microcontrollers (it’s been about 15 years since I last played with them).

Cheers, Les.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

What are your thoughts on the perceived linearity of the colors on the LEDs? Since you only have 256 values (0-255) to play with (i.e. to send to the WS2811), does the IC appear to light the LEDs in a somewhat linear fashion (so it doesn't look like it's almost topped out at 60%)?

I make my own xmas lights controllers, and I'm trying to find a cheap but effective way to get pixels, as long as they have reasonable dimming performance.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

I don't think any LEDs are truly linear, and even if they were, your eye's response is non-linear anyway. And once you start mixing colours it gets even more complicated. There are tables on the web for adjusting brightness levels but I've never really bothered with them myself. Also, I'm not sure whether the RGB values you supply to the WS2811 are mapped linearly to the PWM values it uses to drive the LEDs; the datasheet doesn't say. And because the WS2811 is embedded inside the pixel there's no real way to find out. Having said all that, I think these will be fine for xmas lights. In the video they are being updated by 1 step every 10 milliseconds, and I can't see any perceptible jumps.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Thanks for your thoughts, Alan. My apologies, I didn't explain myself clearly -- that's pretty much what I meant. I'm trying to figure out if the ICs *appear* linear to the human eye, i.e. if the WS2811 is mapping the 8-bit values to a non-linear 12 or 16 bit PWM value internally which is "gamma" corrected (see http://neuroelec.com/2011/04/led-brightness-to-your-eye-gamma-correction-no/).
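If you do want a perceptual correction on the driving side, one common approach is to pass each colour value through a lookup table before handing it to the LED routine. The sketch below is a hypothetical, cheap gamma-2.0 curve computed with integer maths (tables found on the web often use a larger exponent such as 2.2); it says nothing about what the WS2811 does internally, which, as noted above, the datasheet doesn't document.

```c
#include <stdint.h>

/* Approximate gamma-2.0 correction: maps a perceived-brightness level
   (0-255) to the 8-bit value sent to the LED, out = in^2 / 255, rounded. */
static uint8_t gamma2_8(uint8_t level)
{
    return (uint8_t)(((unsigned)level * level + 127u) / 255u);
}

/* Builds a 256-entry table so the correction costs one lookup per colour
   byte at output time. */
static void build_gamma_table(uint8_t table[256])
{
    for (int i = 0; i < 256; i++)
        table[i] = gamma2_8((uint8_t)i);
}
```

Note what happens at the bottom of the range: with only 8 bits going to the WS2811, inputs 0-11 all map to 0 and 12 maps to 1, so a correction table like this makes the first visible steps coarser in input terms rather than removing them.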

Are you able to tell how much of a jump there is in the lowest few values, at very slow step changes? E.g. between 0x00 -> 0x01, and then 0x01 -> 0x02? How much of a noticeable jump is there for each step? (I'm particularly interested in the low end, because I use it heavily in my display for very slow fades, which can be very noticeable. For example, have a look at the end of this video from one of my 2012 songs: http://www.youtube.com/watch?v=xLCqIztOCXk&t=2m20s -- you can easily see steps at the low end as discrete channels are dropping out.)

Similarly, are you able to discern the difference in intensity between values at the top? I.e. are you able to see the difference between 250 and 255?

Thanks for your help and thoughts!

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

You can see it step between about the first 4 levels, beyond that it's imperceptible. As for the top end, difficult to say because they are too bright to look at comfortably and beyond the first 4 levels I can't pick out any individual changes.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

I wish I'd found your work here before I started on driving my WS2811 strips. Feedback on what timings actually work (I got mine from Ray Wu, and the datasheet I got was way off, with a bit period of 2.5 us) would have saved me a ton of time. For anyone looking to do something similar with a PIC32 using the Arduino-like IDE, the example code here works: http://myroundpeg.blogspot.com/2012/12/driving-ws2811-base-led-strips-with.html The code drives 2 strips at the same time.

I definitely want to clean this up by optimising in assembly, but there is no support for debugging assembly with the MPIDE. One thing I would like to do, if someone is interested in collaborating: I think more strips can be driven in parallel if using pins associated with a single port register.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

The 2.5usec timing sounds like it is for the WS2811 in low-speed mode (400KHz), most of the WS2811 pixels/strips that people have seem to be running in high-speed mode (800KHz).

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Here be some code for doing it a little different

http://www.instructables.com/id/My-response-to-the-WS2811-with-an-AVR-thing/ 

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Alan,

Excellent work. I will try this out over the next couple of days. Do you think this will drive 100 pixels (ignoring any voltage drop issues that may arise from a long string)?

I wonder if your code would be simplified if there was no need to correct the RGB vs GRB issue.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

That depends on how fast you want to refresh them, but yes, 100 should be OK. Each LED draws around 50mA, so you'll have to take that into account.
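A rough budget for the 100-pixel case can be worked out from the figures in the post: 30usec of data per pixel, a reset gap of around 50usec, and the ~50mA per pixel quoted in the reply. The helper names are hypothetical and the current figure is the reply's approximation, not a datasheet value.

```c
#include <stdint.h>

#define US_PER_PIXEL 30u   /* 24 bits x 1.25 usec at 800KHz */
#define RESET_US     50u   /* line held low to reset the chain */
#define MA_PER_PIXEL 50u   /* approximate figure from the reply */

/* Time to send one full update to the chain, including the reset gap. */
static uint32_t frame_us(uint32_t pixels)       { return pixels * US_PER_PIXEL + RESET_US; }

/* Upper bound on updates per second if the MCU did nothing else. */
static uint32_t max_refresh_hz(uint32_t pixels) { return 1000000u / frame_us(pixels); }

/* Approximate supply current needed for the whole chain. */
static uint32_t supply_ma(uint32_t pixels)      { return pixels * MA_PER_PIXEL; }
```

For 100 pixels that gives a 3050usec frame, a maximum refresh rate of about 327Hz, and a supply requirement of around 5A.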

And yes, the code would be much simpler if the order was RGB.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Alan,

Partial success here. Pixels light up randomly green; some flash red and blue for a while.

I am using a string of 50 pixels.

The only mods to your code are:

#include "WS2811.h"

Added #define F_CPU 16000000UL

Changed //void threepixeldemo(void) to int main(void)

I am running a Mega88 at 16MHz using Studio 6.

Must be time to try that logic analyser I purchased.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

You almost certainly need to set up the MCU first before calling the LED display routine.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

No interrupts, no timers are being used.

The bit in use is set up by:

    // Configure pin for output.
    SET_BIT_HI(DDRB, 0);   // PB0 is an output
    SET_BIT_LO(PORTB, 0);  // PB0 starts low (no internal pull-up)

Fuses set to Extended 0xf9, High 0xDD & Low 0xFF.

What else is needed?  Be cool, I will work this out.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Depending on the MCU you are using, you may need to set it up, for example by setting the clock prescaler and/or disabling the watchdog timer. See for example here - I'm not saying that's right for your particular MCU, but it shows the sort of things you might have to do.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Alan

It turns out the pixels I had been supplied were the wrong ones. After finally buying a super cheap WS2811 string that included a mini controller, I discovered my pixels still didn't work. Using a sharp knife and side cutters revealed I had been supplied with TLS3001 pixels. I didn't have a chance in hell of getting your routine working.

It does work a treat on the new WS2811 string I purchased.

Thanks.

 

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Thanks for this!  I picked up a strip of 60 of these LEDs for $30 on eBay and needed a quick way to test them out.  Turns out I didn't even have to select a different pin, since PB0 is right next to a ground pin on an Arduino 2560.

If people keep giving you grief about your loop I volunteer to post the PWM code I wrote for a PIC (which I selected for fan control out of my junkbox before realizing its lack of PWM).  Its main loop is something like a BSF, then a computed jump into 50 NOP followed by a computed jump into 50 BCF.  An interrupt handler does the ADC for the temperature sensor.  After all, what would I do with any cycles I saved?  It's a fan controller!

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Thanks for taking the time to write the code and article. I just picked up a strip of 240 WS2811 LEDs and used your WS2811 library, and all the LEDs are updating perfectly using a 16MHz Arduino Nano.

I plan on attaching the LEDs to my snowboard and hooking it all up to an accelerometer which will use the hardware SPI port so it works out perfectly.

Really great work!

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Thanks for the work on your ws2811 library! 

I've picked up some WS2811 LEDs for testing, to see if they are fast enough to be used for a "POV" display.

My speed test is, using an Arduino Uno, I set a string of 59 LEDs, off, then red, then green and finally blue 1000 times (effectively writing 236,000 LEDs) and measure the time to complete the loop.

Using FastSPI it took more than a millisecond on average to write an LED - not fast enough!

Your library is plenty fast, but I'm getting an odd result with regard to speed: I get a very consistent slowdown when repeating my timing loops.

The first 8 loops average 0.0188 milliseconds per LED

The next 15 loops average 0.2965 

The next 15 loops average 0.5742

And every 15 loops after that slow down by another 0.2777 milliseconds.

It is exactly 8 loops, then every 15th loop after that.

Any thoughts?

And what about porting this library to the Teensy 3.0? I could use its memory for building the image to be displayed by the POV.

Here's the very basic loop code:

void loop(){

  StartTime=millis();

  for(int i=0; i<1000; i++){

    RGB_t rgb[] = {{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0}};

      WS2811RGB(rgb, ARRAYLEN(rgb));

    RGB_t rgb1[] = {{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0},{255,0,0}};

      WS2811RGB(rgb1, ARRAYLEN(rgb1));

    RGB_t rgb2[] = {{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0},{0,255,0}};

      WS2811RGB(rgb2, ARRAYLEN(rgb2));

    RGB_t rgb3[] = {{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255},{0,0,255}};

      WS2811RGB(rgb3, ARRAYLEN(rgb3));

  }

 //(time at the end of the loop - time at the start of the loop) / (number of loops * number of LEDs * number of "colors" in loop)

  Serial.println((millis()-StartTime)/(1000.0 * 59.0 * 4), DEC);
}

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

I think your numbers are untrustworthy. It takes exactly 30usec or 0.03msec to drive each LED; that's the whole point - it has to be (and is) exactly that rate to run at the 800kHz the LEDs require. Your numbers range both sides of 30usec, which means they can't possibly be right - slower than 30usec/LED, yes, but faster? Impossible. There are a couple of things that might be going on. You are initialising the rgb arrays each time round the loop; depending on what the optimiser does with that, you may be doing a lot of unnecessary work in the inner loop. Also, the LED driver code turns off interrupts, and the Arduino library uses one of the AVR timer/counters and interrupts to implement millis(), so you may be missing clock tick interrupts if you do a long stream of LED updates with no gaps in between.
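The impossibility argument can be put in numbers. Every LED costs 30usec on the wire, so the test loop above (1000 iterations of 4 frames of 59 LEDs) cannot take less than about 7 seconds, and each 59-LED frame keeps interrupts off for 1770usec. On a 16MHz Arduino, millis() is driven by a Timer0 overflow interrupt roughly every 1.024msec, so a frame that long can miss a tick. The function names below are made up for the example.

```c
#include <stdint.h>

#define US_PER_LED 30u   /* 24 bits x 1.25 usec at 800KHz */

/* Minimum time the test loop can take, in milliseconds, ignoring resets,
   array initialisation and everything else. */
static uint32_t min_loop_ms(uint32_t loops, uint32_t frames_per_loop, uint32_t leds)
{
    uint64_t us = (uint64_t)loops * frames_per_loop * leds * US_PER_LED;
    return (uint32_t)(us / 1000u);
}

/* How long one frame of `leds` LEDs keeps interrupts disabled, in usec. */
static uint32_t irq_off_us(uint32_t leds)
{
    return leds * US_PER_LED;
}
```

min_loop_ms(1000, 4, 59) gives 7080, so any measured figure below 0.03 msec per LED means the timer itself was losing time.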

This code is completely separate from the Arduino environment, so it should run on any AVR8, including the AVR-based Teensy boards (the Teensy 3.0 itself is ARM-based, so it would need porting), although I don't have one myself; I've been using a Minimus for my testing.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

He's definitely missing clock ticks - millis() becomes unreliable when interrupts get disabled. If you want to time things, the thing to do is set up a tight loop where you do ~1000-10000 iterations of writing data to the strip and time that with a stopwatch, then do the math to work backwards.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

I'm working on an update to the FastSPI_LED library that, among other things, should be more portable for driving these chips, not just on the AVRs but also on the ARM and friends. I'm experimenting with some code generation so that, to support future variations of this style of chip (tm1809, tm1803, ucs1903, now ws2811), I can just specify the timings for 0 and 1 bits. Also, while still supporting some half dozen or so chipsets, a "set 3 leds" program is down to about 1300 bytes, mostly because I've kept the handling of individual bits unrolled so that I could do some tighter timing control. In the version of the library that's currently published, the ws2811 is just piggybacking on the tm1809 timings, which have a lot more give.

Alan, you'll also be happy to know that the new version of the library strips out most of the need for the Arduino library (though, if it's around, it will make use of it for some definition shortcuts), mostly because I've had requests to make the library available for the msp430 and chipkit and friends as well as AVR-derived stuff. And for the person who asked about the teensy 3.0 above: the teensy 3.0 is the next platform I'm going to be working with after I finish the rewrite.

Also, I spent some time trying to use the hardware SPI support at full speed to drive the chips, and it worked fairly poorly, which is to say not at all. Probably because the inter-byte timing with SPI had some slop in it.

I have a love/hate relationship with the clockless chips, but I have some thoughts for games I can play when running on a 24-96MHz ARM chip to interleave output for multiple strands, greatly increasing the total output of a single device.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

One last update: the core of the library rewrite is finished. Depending on chipset, it now adds 290-714 bytes to sketch size, compared to ~12k before. It's also faster, getting 5.8Mbps out with hardware SPI and 2.9-3.1Mbps bitbanging the SPI output, and I have a mechanism for getting exact clock timings for the various clockless chipsets (for example, the ws2811 output is now exactly/solidly 20 clocks per bit). Finally, it should be mostly Arduino-code free. The library will also expose classes for higher-performance access to pins and SPI output, allowing people to switch between hardware SPI and bitbanged SPI by changing the pins they reference, without any other rewriting of code.

Next I need to track down a reliable mapping of pins to ports for some of the various chipsets.

Now I need to do up the interfaces that most people will use to interact with the library (and possibly compatibility classes for the older library, as well as for Adafruit's and other providers' libraries), and I should start collecting more chipsets to support... One day I'll go back to working on LED projects instead of the library! :)

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

One thing you might also look at is using the USART in SPI mode - the larger ATMegas often have multiple USARTs but only one SPI channel, so you'd get the ability to do multiple hardware SPI. The other advantage of the USART in SPI mode is that it is double-buffered whereas the normal SPI channel is not, so you might be able to get higher throughput.
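For picking the USART clock in master SPI mode, the ATmega datasheets give f_SCK = f_CPU / (2 x (UBRRn + 1)). The hypothetical helpers below invert that and report the clock you actually get. They only do the arithmetic and don't touch any registers; UBRRn's 12-bit range is not checked.

```c
#include <stdint.h>

/* USART in master SPI mode: f_sck = f_cpu / (2 * (UBRR + 1)).
 * Returns the smallest UBRR whose clock does not exceed the requested
 * rate. f_sck_wanted must be nonzero; the 12-bit UBRR limit is not checked. */
static uint16_t mspim_ubrr(uint32_t f_cpu, uint32_t f_sck_wanted)
{
    uint32_t div = (f_cpu + 2u * f_sck_wanted - 1u) / (2u * f_sck_wanted); /* ceil */
    return (uint16_t)(div == 0u ? 0u : div - 1u);
}

/* The SPI clock actually produced by a given UBRR value. */
static uint32_t mspim_actual_hz(uint32_t f_cpu, uint16_t ubrr)
{
    return f_cpu / (2u * ((uint32_t)ubrr + 1u));
}
```

At 16MHz, UBRR = 0 gives the maximum 8MHz and UBRR = 1 gives 4MHz; asking for 3MHz lands on UBRR = 2, about 2.67MHz.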

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Yeah, one of the many reasons for this rewrite is to abstract out the code that implements a particular protocol from the code that drives pins. For example, the teensy 3.0 can do hardware DMA of an arbitrary number of bytes out over SPI :)

I'll check out the mega's USARTs!

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Also, as a quick peek, this is what the definition for supporting the ws2811 looks like in the new library:

template <uint8_t DATA_PIN> class WS2811Controller800Mhz : public ClocklessController<DATA_PIN, NS(350), NS(350), NS(550)> {};

By defining the class in terms of the timings that are needed, the code can adjust itself to the clock speed we're running at. This is important as I start moving things to the teensy 3.0, where clock rates of 24MHz, 48MHz, and even 96MHz exist. I don't want to have to redo code by hand every time I change chipsets or clock speeds; with this, I only need to do hand work when I change the underlying instruction set, since that is when instruction timings change.
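A rough illustration of how nanosecond timings can be turned into cycle counts for whatever clock you're running at. This is not the library's NS() implementation; the names are made up, and it assumes the three template values are the phases of one bit (always high, high only for a 1, always low). Rounding the cumulative edge times, rather than each phase separately, keeps the bit period exact: 1250ns is 20 cycles at 16MHz, whereas rounding 350/350/550 individually would give 6+6+9 = 21.

```c
#include <stdint.h>

/* Round-to-nearest conversion of nanoseconds to CPU cycles.
 * 64-bit intermediate avoids overflow at high clock rates. */
static uint32_t ns_to_cycles(uint32_t ns, uint32_t f_cpu)
{
    return (uint32_t)(((uint64_t)ns * f_cpu + 500000000u) / 1000000000u);
}

/* Hypothetical per-phase cycle counts for one clockless bit:
 * t1 always high, t2 high only for a 1, t3 always low. */
struct bit_cycles { uint32_t t1, t2, t3; };

static struct bit_cycles clockless_cycles(uint32_t t1_ns, uint32_t t2_ns,
                                          uint32_t t3_ns, uint32_t f_cpu)
{
    /* Round each edge's absolute time, then take differences, so the
     * phases always sum to the rounded bit period. */
    uint32_t e1 = ns_to_cycles(t1_ns, f_cpu);
    uint32_t e2 = ns_to_cycles(t1_ns + t2_ns, f_cpu);
    uint32_t e3 = ns_to_cycles(t1_ns + t2_ns + t3_ns, f_cpu);
    struct bit_cycles c = { e1, e2 - e1, e3 - e2 };
    return c;
}
```

With 350/350/550ns this gives 6/5/9 cycles at 16MHz, 17/17/26 at 48MHz, and a 10-cycle bit at 8MHz.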

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

And the library is up (http://waitingforbigo.com/2013/02/19/fastspi_led2_preview_release/); your post was a useful reference in refining the ws2811 timings. Also, thank you again for the USART comment. I have preliminary code in place for optionally using the USART in SPI mode; it still needs testing to make it work, but the basic framework is now in place!

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Also, I think I need to get a scope for testing some of this code so that I'm not relying on hand-counting instruction clocks to check my timing. What are you using/recommending, Alan?

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Hi Dan, after much looking around I bought an OWON SDS 7102 <http://shop.owon.co.uk/sds7102-v-15-p.asp>. It's the first scope I've owned, but I'm impressed with it. It was reasonably priced (for a DSO) and does everything I need.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Hi Alan, I have used your work to drive a strip of 300 WS2811 LED chips and it worked like a charm (using a 150W 5V PSU). Just wanted to say thank you for this great piece of code.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Alan, thanks for this post. It inspired me to write an 8MHz version that can run off the internal oscillator. It turns out this can be done, and the information in your article was really helpful. See my write-up.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

You wanted to know if anyone had this working with SPI.

Before I saw this post I saw this one:

https://www.insomnialighting.com/products/rgbpxws2811.html

He suggests using a 64MHz PIC with a 4MHz SPI clock. I wondered if it would work with something Arduino-compatible, say an ATMega328 clocked at 16MHz. It appears to, although I have not worked out the timing and I don't know if it is within spec. I am using the strips with the embedded WS2811 chip and I don't notice any flicker or jitter. Unfortunately, I do not have as nice a scope as you do. Pixels appear to fade smoothly. There are enough clock cycles left over to take in DMX levels with interrupts. This is the code I use to send the output:

http://doityourselfchristmas.com/forums/showthread.php?25582-RPM-s-DMX-Pixel-Bridge-Modified-Version&p=260725#post260725

I am using the inverse of the SPI data output as it idles in a high state. To compensate, I'm using 0b00001111 for "1" and 0b01111111 for "0". 

-MH
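To see what that scheme puts on the wire, the sketch below computes the high time and bit period for one SPI byte per WS2811 bit. It assumes MSB-first output and that the patterns are inverted before reaching the strip, as described above. gap_clocks is a hypothetical parameter for any idle SCK periods between bytes (the data line idles high, so after inversion these stretch the low part of the cell).

```c
#include <stdint.h>

/* High time and full cell length produced by one SPI byte. */
struct pulse { uint32_t high_ns, period_ns; };

/* Rough model: MSB-first, optional inversion, and a contiguous run of
 * high bits starting at the MSB. gap_clocks models idle clocks between
 * bytes, which lengthen the cell but not the high pulse. */
static struct pulse spi_pulse(uint8_t spi_byte, int inverted,
                              uint32_t f_sck, uint32_t gap_clocks)
{
    uint8_t wire = inverted ? (uint8_t)~spi_byte : spi_byte;
    uint32_t ns_per_clk = 1000000000u / f_sck;
    uint32_t ones = 0;

    /* Count the leading run of 1s, MSB first. */
    for (int bit = 7; bit >= 0 && ((wire >> bit) & 1u); --bit)
        ones++;

    struct pulse p = { ones * ns_per_clk, (8u + gap_clocks) * ns_per_clk };
    return p;
}
```

At a 4MHz SPI clock a "1" comes out 1000ns high and a "0" 250ns high in a 2000ns cell, well off the 1250ns cell of 800KHz mode. At 8MHz, two idle clocks between bytes happen to give a 1250ns cell, but the "0" pulse shrinks to 125ns.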

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

As I said in the post, you may get lucky and it might work for you. But the timings won't be correct.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

I still have a preference for the timing-adjusted bit banging that I'm using, but this thread has me wondering. On the teensy 3 there's a much wider range of possible clock dividers (2-10 inclusive, 12, 14, 15, 16, 18, 20, 21, 24, 28, 30, 32, 40, 42, 48 and more), so you may be able to get closer to the right timings using the SPI clock. The trick then, however, becomes doing the bit explosion quickly enough to keep the SPI output buffer saturated. Also, hardware SPI often introduces an inter-byte delay: it was as high as 3 SPI clock cycles, if not higher, time-wise (the SPI clock didn't cycle during this gap). I've got it down to 1, but it's still always there. On the flip side, this does mean I can push LPD8806s at a 22Mbit real data transfer rate. Compared to the ws2811's 800kbit rate, or even the 6.4Mbit rate with Paul's OctoWS2811 library, I may be leaning more towards the lpd8806 than the ws2811 for certain classes of future projects.
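The "bit explosion" step, in its simplest form, expands each data bit into one SPI byte. This sketch reuses the two byte patterns from the inverted-SPI comment earlier in the thread and is only meant to show the 8x expansion; a real driver would generate the bytes on the fly (or use a denser encoding) to keep the SPI FIFO fed. The names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Patterns from the inverted-SPI comment: the wire sees their inverse. */
#define SPI_ONE  0x0Fu  /* 0b00001111, a 1 after inversion */
#define SPI_ZERO 0x7Fu  /* 0b01111111, a 0 after inversion */

/* Expand n pixel bytes into 8*n SPI bytes, MSB first.
 * out must have room for 8 * n bytes. */
static void explode_bits(const uint8_t *in, size_t n, uint8_t *out)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t b = in[i];
        for (int bit = 7; bit >= 0; bit--)
            *out++ = ((b >> bit) & 1u) ? SPI_ONE : SPI_ZERO;
    }
}
```

A full RGB pixel becomes 24 SPI bytes this way, which is why marshalling the data, not shifting it out, tends to be the bottleneck.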

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Yeah, it's not really pushing out the bits that's the fiddly part, it's marshalling the data to do it. Did you try using the USART in SPI mode? That's double-buffered so it might be possible to use it to avoid the intra-byte delay.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

I still need to go back and finish implementing the USART support for AVR; I've been pretty focused lately on the teensy 3 ARM support. On the teensy 3's ARM chip the SPI subsystem has a 4-item FIFO buffer, and each item can be up to 16 bits, so I can queue up 8 bytes' worth of data at a time for SPI. However, that's the system that introduces all sorts of irritating inter-byte timings (that said, I could probably take those timings into account). DMA support for pushing data out in the background is next in the wings.

Even then, though, I'm finding myself frustrated by the ws2811's 800kHz data rate, as I'm currently pushing lpd8806s at just a hair under 22MHz.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

I reinvented the wheel using PWM:

http://techblog.zenrobotics.com/2013/04/bit-banging-ws2811-led-strips-with-pwm.html

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

The use of the PWM hardware is interesting, but as you found the tricky part is that you only have 20 cycles to handle each bit, no matter what technique you use to actually waggle an IO pin.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Alan,

Awesome post: insightful, crisp and clear. Thank you for sharing your work. The code works like a charm with a chain of 240 LEDs. In fact, I know of no reason why it should not work with longer chains, since the signal is reshaped by the WS2811 chip in the first and all successive LEDs.

I can confirm that timing is critical, even between the 24-bit blocks for each LED, where one could reasonably suspect that a short gap (say, less than 20 us) might not matter. I modified your assembler program by inserting "sei - nop - cli" after label 2 in the outer loop, in order to allow interrupts to be serviced between sending data for different LEDs. When my main test program alternates between allowing and suppressing the timer0 overflow interrupt (a fairly short interrupt routine that keeps track of time in the Arduino environment), the LEDs are addressed smoothly only during periods in which this interrupt is suppressed. However, *very* short interrupt routines are possible with the above modification: I switched the timer0 overflow interrupt off (TIMSK0 &= ~(1<<TOIE0)) and installed my own timer1 overflow interrupt that fires at 4800 Hz. The following test interrupt routine

#include <avr/interrupt.h>

volatile unsigned long icount, jcount;

ISR(TIMER1_OVF_vect) {
  icount++;
  jcount += 2;
}

is short enough (2-3 us?) to interrupt your assembler code without resetting the data stream to the LED chain. One more long integer increment, though, tips the test interrupt routine above over the edge, so that the LED chain resets while the interrupt is being carried out.

In conclusion, if one is desperate to count interrupt events (an unsigned char as opposed to a long in the test routine should be sufficient if the LED chain has fewer than 256 LEDs), it appears possible to do that. What is not clear to me is whether the reset time of the WS2812 LED varies much between chips or changes with age (is this the discharge time of a capacitor in the IC?), so I don't know how reliable this is in general.

Best wishes

Stefan

PS: The only change I had to make for the Arduino environment on a Mac is to replace 

  #include <WS2811.h>

with

 #include "WS2811.h"
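The 8-bit counter idea can be sketched as follows: the ISR only bumps a byte, and the main loop folds it into a wide total between LED writes. The names are hypothetical, and this only works if fewer than 256 ticks pass between folds (about 53ms at the 4800Hz rate above). Reading a single byte is atomic on AVR, so no cli/sei is needed around the read.

```c
#include <stdint.h>

/* On an AVR the ISR would be just: ISR(TIMER1_OVF_vect) { tick8++; } */
static volatile uint8_t tick8;   /* bumped by the (tiny) ISR */
static uint8_t last_tick8;       /* value seen at the previous fold */
static uint32_t total_ticks;     /* wide count, maintained in the main loop */

/* Fold the 8-bit counter into the 32-bit total. Unsigned 8-bit
 * subtraction handles wraparound, provided < 256 ticks elapsed. */
static void fold_ticks(void)
{
    uint8_t now = tick8;         /* single-byte read: atomic on AVR */
    total_ticks += (uint8_t)(now - last_tick8);
    last_tick8 = now;
}
```

If the counter goes from 250 to 4 between folds, the 8-bit difference is 10, so the total advances by 10 ticks as expected.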

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

I'm pleased to find it works with such a long chain. Your ISR is almost identical to the one I use for my clock ISR which is useful to know if I want to drive long chains and still keep track of the time - thanks!

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Also, since you know the timing of each cycle of the loop, at the end of writing a chain of LED data out you could adjust the timer to bring it back into line.
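One way to do that is sketched below, assuming a software millisecond clock that you own (the names are hypothetical; this is not the Arduino millis() internals). The write time is known exactly (30usec per pixel at 800KHz), so it can be added back, carrying the sub-millisecond remainder forward. One caveat: on AVR, an overflow that happens while interrupts are off leaves its flag set, so one tick will still be serviced after sei, and a real implementation would need to avoid counting that time twice.

```c
#include <stdint.h>

static uint32_t clock_ms;      /* hypothetical software millisecond clock */
static uint32_t clock_us_rem;  /* sub-millisecond remainder carried forward */

/* Advance the clock by the time spent writing n_leds pixels with
 * interrupts disabled: 24 bits * 1.25 us = 30 us per pixel. */
static void compensate_after_write(uint32_t n_leds)
{
    clock_us_rem += n_leds * 30u;
    clock_ms     += clock_us_rem / 1000u;
    clock_us_rem %= 1000u;
}
```

Writing 100 pixels adds exactly 3ms; a following 50-pixel write adds 1ms and leaves 500usec pending for the next call.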

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

Hi Alan

I used your code to drive a 40-pixel WS2811 string on an Arduino Uno and it works perfectly. No flicker or anything! Whilst I can program to an acceptable level in C/C++, my skills with assembler hover about the zero mark. I have another bunch of LEDs which use the not-so-good LPD6803. The C I've written is as good as I can get it, but it obviously won't be as fast as hand-coded assembler. Is there anything you could do to help? Quite happy to bribe you with beer money if that helps!

Cheers

Tim

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

I believe the FastSPI library supports the LPD6803, I suggest you use that.

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

The first version of the library does; I've dropped support for the lpd6803s in subsequent versions of the library, however. The problem with the 6803s isn't going to be anything that you can fix at a code level. It uses SPI for pushing data out to it, and even the most optimized bitbanging is still not going to match or beat the fastest hardware SPI speeds (though I've gotten really, really close on AVR; on ARM on the teensy 3, I'm capping out at about 4.5Mbit for bitbanged SPI and over 22Mbit for hardware SPI). The real problem with the 6803s, however, is that they don't have anything onboard to drive the PWM, unlike just about everything from the ws2801s on. That means I needed a timer function running in the background that continuously strobed the clock line to "drive" the PWM. To add insult to injury, if I just blasted all the data out to the 6803s, the clock would run really fast for that block of time, then slower for the rest of the time, which resulted in a really, really annoying pulsing effect. I'm not terribly interested in continuing to support things that require a running clock in the background in the future.

There is a chance I'll be able to continue supporting the 6803's on the teensy 3 - as the spi system there has an option to run the clock line continuously in the background.  I'm going back and forth on whether or not to do this, however.

I really wish the 6803's would just die off, already.  

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

So are we now agreeing that the critical timing is about making sure the signal is stably high or low at roughly 625 ns after the rising edge (ie: having it fall and stay low long enough before that, or stay high long enough after that, to cover error margins)? In which case the overall cycle is not as important. We can't shrink it too much, but it has some room for being stretched.

Of course the trace of the output from a 2811 will have a precise 1250ns cycle time if the input to it has that cycle time. The chip is going to have the same output bit cycle length, since it doesn't buffer the bitstream and retime the bit rate on output; all it can do to reshape and clean the output is change the timing of the falling edge it sends to the next pixel within each bit cycle. The length of the reshaped high pulses is more of a clue, albeit a sloppy one (since it too is subject to variation).

Some people are using SPI inverted to drive these. Since the AVR's SPI inserts at least a couple of clocks' worth of idle time (high) between bytes, after inversion this just stretches the "low" time (and thus the bit cycle time) before the next rising edge, which isn't as critical.

What does this mean?  Sometimes being able to insert a cycle or two here or there can be handy - eg: reducing the need to exactly balance branch times.

We can even insert time for a short ISR between bits, stretching the low time before the next rising edge (I don't know if it matters whether the stretched bit cycle is between pixels, between bytes in a pixel, or within a byte).

Some believe that 10 us is safe, but not much more.  It's going to depend on the batch, the temperature, etc.
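The sample-point hypothesis can be written as a toy decoder. The 625ns sample point is the commenter's estimate, not a datasheet figure, and the function is hypothetical. Note that the bit period never enters the check, which is exactly why stretching the low time might be tolerated.

```c
#include <stdint.h>

/* Commenter's estimate of when the chip samples, after the rising edge. */
#define SAMPLE_NS 625u

/* Returns 0 or 1 for a clean read, or -1 if the falling edge lands
 * within margin_ns of the sample point (too close to call). */
static int decode_pulse(uint32_t high_ns, uint32_t margin_ns)
{
    if (high_ns + margin_ns <= SAMPLE_NS) return 0;  /* fell well before */
    if (high_ns >= SAMPLE_NS + margin_ns) return 1;  /* still high after */
    return -1;
}
```

Under this model a 350ns pulse is a safe 0 even with a 100ns margin, while a 700ns pulse is a safe 1 only if the real margin is under 75ns.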

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

 Thanks for this great post, Alan!  The detailed explanation is really helpful.

I think I ended up getting the same led strip as you and I was able to get your sample fading program running pretty consistently on my Arduino UNO (16MHz), but trying much of anything else (e.g. moving a white led up and down the strip) is really choppy and has random flashing, especially in the first led.  I recognize that this is probably a timing issue but I feel like your and others' success with your code indicates the timing is pretty well figured out.

I feel pretty SOL without a scope and am hoping I didn't just get a bad batch of controllers.  Do you have any advice here as to what you'd look into?  Or do I just need a scope?

Re: Driving the WS2811 at 800KHz with a 16MHz AVR

I took the threepixeldemo.c and the WS2811.h and compiled it for ATmega32u2 using Atmel Studio 6.1.

However, when I plug the ATmega32u2 into the USB port of my notebook (for the 5V) and connect a separate 5V supply to the LED strip, the 4 LEDs light up only once. When I remove the 5V from the strip and reconnect it, the next colour seems to be shown once, but it doesn't loop.

Any ideas what I did wrong with this setup?