Faster ST7735 image_to_data()

The ST7735 module internally converts a PIL image to a buffer of 16-bit 565 RGB bytes for writing to the display. When tested in isolation, the following version is 3x faster:

import numpy as np

def image_to_data(image, rotation=0):
    """Convert a PIL image to a bytes object of 16-bit 565 RGB pixels."""
    # NumPy is much faster at doing this. NumPy code provided by:
    # Keith (
    # Updated to use byteswap/tobytes instead of dstack.
    pb = np.rot90(np.array(image.convert('RGB')), rotation // 90).astype('uint16')
    color = ((pb[:, :, 0] & 0xF8) << 8) | ((pb[:, :, 1] & 0xFC) << 3) | (pb[:, :, 2] >> 3)
    return color.byteswap(inplace=True).tobytes()
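To make the bit layout concrete, here is a minimal pure-Python sketch of the same packing for a single pixel. The helper name rgb565_bytes is mine and is not part of the ST7735 module; it only mirrors the mask/shift expression above and emits the high byte first, which is the order byteswap produces on a little-endian host like the Pi.

```python
def rgb565_bytes(r, g, b):
    """Pack one 8-bit-per-channel RGB pixel into two 565 bytes, high byte first."""
    # Hypothetical helper for illustration; same masks and shifts as image_to_data().
    value = ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3)
    return bytes([value >> 8, value & 0xFF])


print(rgb565_bytes(255, 0, 0).hex())  # f800 (red lives in the top 5 bits)
print(rgb565_bytes(0, 255, 0).hex())  # 07e0 (green in the middle 6 bits)
print(rgb565_bytes(0, 0, 255).hex())  # 001f (blue in the low 5 bits)
```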

However, when I ran it with the example, my framerate actually dropped (very slightly) on the Pi Zero.

Any ideas?

The only thing I can suggest is that the bottleneck is somewhere else. Looking at the Python module, the display (I assume you’re talking about Pimoroni’s 0.96" display?) defaults to 4MHz. At 160x80 resolution and 565RGB, a frame is 25,600 bytes (204,800 bits), so 4MHz on the wire only allows ~19.5 FPS, and the store page’s “you should be able to run it at up to ~50FPS” needs a clock of at least ~10MHz. If you’re already running the bus faster than the default and a 3x faster conversion still doesn’t help, something else is bottlenecking it. I wonder if it’s something to do with the Python driver for the SPI bus? The bus might be clocked fast enough, but Python might not be pushing data to it that fast. Just a wild guess.

Long post. Sorry.

So, I can’t find it now, but somewhere I read that the max SPI speed the Sitronix module supports is 15MHz, so I’m currently setting spi_speed_hz=15600000 when initializing the ST7735 module.

Also, I tweaked the test app to pre-convert all the images using a local copy of image_to_data(). It calls st7735.set_window() once, then in the main loop calls the st7735.send() function to continuously write frame data, nothing else. No image conversion, no display commands, etc.
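Roughly, the timing loop looks like the sketch below. This is not the actual test app: measure_fps, frames and send are names made up for illustration, with send standing in for whatever pushes one pre-converted buffer to the display.

```python
import time


def measure_fps(frames, send, seconds=1.0):
    """Cycle through pre-converted frame buffers for `seconds` and return frames per second."""
    # `frames` is a list of ready-to-send buffers; `send` is any callable taking one buffer.
    count = 0
    start = time.perf_counter()
    while time.perf_counter() - start < seconds:
        send(frames[count % len(frames)])
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed


# Dry run without hardware: "send" just records what it was given.
sent = []
fps = measure_fps([b"frame-a", b"frame-b"], sent.append, seconds=0.05)
print(f"{fps:.0f} FPS (no real display attached)")
```

Because nothing is converted inside the loop, any change in the number it reports comes from the send path alone.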

The ST7735 module uses ‘spidev’ for all the SPI bus interaction, and specifically in the st7735.send() function uses spidev.xfer3(). Checking the code for xfer3() I saw that it’s written to take just about any data format, as well as optionally setting spi bus params, chunking data, etc. It has a ton of overhead.

Looking at the available spidev interfaces, there’s a much simpler function, spidev.writebytes2(), that works with our data buffer types. So I ran some tests comparing two things: my image_to_data() using byteswap/tobytes against the original using dstack/tolist, and xfer3() against writebytes2().

Changing st7735.send() from using xfer3() to writebytes2() instead:
with the original image_to_data(), my framerates go from ~37 FPS to ~45 FPS.
with my modified image_to_data(), my framerates go from ~34 FPS to ~45 FPS.

That’s a pretty nice speedup and suggests that xfer3() has more overhead for the buffer format returned by byteswap/tobytes than dstack/tolist and that writebytes2() handles them equally well.
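For anyone who wants to try the same swap, the change boils down to something like this. It is only a sketch: send_frame and RecordingSpi are hypothetical names, the stand-in class exists so the snippet runs without a display attached, and the data/command pin handling that the real send() does is left out.

```python
class RecordingSpi:
    """Stand-in for spidev.SpiDev so the sketch runs without hardware (hypothetical)."""

    def __init__(self):
        self.writes = []

    def writebytes2(self, data):
        # The real spidev writebytes2() also accepts buffer-protocol objects such as bytes.
        self.writes.append(bytes(data))


def send_frame(spi, frame):
    """Write one pre-converted frame buffer with a single writebytes2() call."""
    spi.writebytes2(frame)


spi = RecordingSpi()
send_frame(spi, b"\xf8\x00" * 4)  # four red pixels in 565 format
print(len(spi.writes[0]))  # 8
```

On the Pi you would pass the module's real spidev.SpiDev instance instead of the stand-in.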

As a sanity check, I reverted the test app to calling st7735.display(), which does the image conversion on every call, and again tested going from xfer3() to writebytes2():
with the original image_to_data(), my framerates go from ~25 FPS to ~28 FPS.
with my modified image_to_data(), my framerates go from ~30 FPS to ~34 FPS.

This suggests that the speed boost from using byteswap/tobytes over dstack/tolist (it’s about 3x faster) is worth it, especially when used in conjunction with the writebytes2() change when using non-static frame data.

So, on my non-overclocked Pi Zero using a SPI speed of 15MHz, short of replacing spidev, ~45 FPS seems to be my experimental max FPS.
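For context, here is a back-of-the-envelope ceiling for what the wire itself allows, assuming one bit per SPI clock, a 160x80 panel at 16 bits per pixel, and no command overhead or gaps between chunks. max_wire_fps is just an illustrative name.

```python
def max_wire_fps(spi_hz, width=160, height=80, bits_per_pixel=16):
    """Upper bound on FPS if the SPI bus were kept 100% busy with pixel data."""
    bits_per_frame = width * height * bits_per_pixel  # 204,800 bits for 160x80 RGB565
    return spi_hz / bits_per_frame


for hz in (4_000_000, 15_600_000):
    print(f"{hz / 1e6:.1f} MHz -> {max_wire_fps(hz):.1f} FPS")
# 4.0 MHz -> 19.5 FPS
# 15.6 MHz -> 76.2 FPS
```

So ~45 FPS at 15.6MHz means the bus is busy a bit under 60% of the time, which fits the idea that the remaining cost is on the Python/spidev side.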