Pi 5 NVMe Base - Issues with dtparam=pciex1_gen=3

Hi Guys,

I have the NVMe Base with the supplied Kioxia SSD.

With gen 3 enabled I see poor write benchmarks and lots of these errors:

[  776.324543] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[  776.324545] nvme 0000:01:00.0:   device [1e0f:0009] error status/mask=00001000/00006000
[  776.324548] nvme 0000:01:00.0:    [12] Timeout               
[  776.324700] pcieport 0000:00:00.0: AER: Corrected error received: 0000:00:00.0
[  776.324705] pcieport 0000:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
[  776.324708] pcieport 0000:00:00.0:   device [14e4:2712] error status/mask=00000040/00002000
[  776.324711] pcieport 0000:00:00.0:    [ 6] BadTLP                
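To compare runs, log lines like the ones above can be tallied per device and error type. This is a minimal Python sketch, assuming the text comes from dmesg or journalctl; the function name tally_aer is made up for illustration and the pattern only covers the line shapes seen in this thread.

```python
import re
from collections import Counter

# Matches the flag lines the kernel prints under each AER report, e.g.
#   "nvme 0000:01:00.0:    [12] Timeout"
#   "pcieport 0000:00:00.0:    [ 6] BadTLP"
FLAG_LINE = re.compile(
    r"(\w+) ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+\[\s*(\d+)\]\s+(\w+)"
)

def tally_aer(log_text):
    """Count AER error flags per (driver, PCI address, flag name)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = FLAG_LINE.search(line)
        if m:
            driver, addr, _bit, name = m.groups()
            counts[(driver, addr, name)] += 1
    return counts

# Two of the flag lines quoted in this thread, plus one line that should be ignored.
sample = """\
[  776.324548] nvme 0000:01:00.0:    [12] Timeout
[  776.324700] pcieport 0000:00:00.0: AER: Corrected error received: 0000:00:00.0
[  776.324711] pcieport 0000:00:00.0:    [ 6] BadTLP
"""

counts = tally_aer(sample)
for key, n in counts.items():
    print(key, n)
```

Feeding it the log from a Gen 2 boot and from a Gen 3 boot gives two tallies that can be compared directly.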

I get 87490 KB/s 4K random write using: https://raw.githubusercontent.com/TheRemote/PiBenchmarks/master/Storage.sh

With dtparam=pciex1_gen=3 removed I do not get the errors and I get 176009 KB/s 4K random writes.

I have reseated the cable and SSD to make sure that is not the problem.

Anyone any ideas?

Thanks

Andy

No ideas other than what you have tried (the cables are certainly fiddly).
FYI, another user with a Kioxia SSD was getting errors with Gen 3 and dropped back to Gen 2.

https://forums.raspberrypi.com/viewtopic.php?t=362617

Regards

Thanks for the info and the link.

I’m thinking, though, that as it is an SSD supplied by Pimoroni, they must have been pretty sure it works with Gen 3; the benchmarks they link to seem to show this.

I reseated the cable again and am getting fewer errors: 50% of the time it benchmarks at the same speed as the Pimoroni figures, 50% of the time it is terrible!

I’m thinking it is the cable. I wish it was a little longer; it seems quite tight.

You should always keep two things in mind:

  • Gen3 is not officially supported. This is high-frequency stuff, so running Gen3 on a system laid out for Gen2 might or might not work, and you should expect random errors.
  • I am still waiting for a real-life usage scenario that gives you an advantage with Gen3 over Gen2. And if you need this additional performance, then I question whether the Pi 5 is a good basis for your needs.

Well, for me a real-life advantage would be building software natively rather than cross-compiling, which is basically what I got the Pi 5 for.

Also, as the Pimoroni page specifically mentions Gen 3 and how to enable it, I’m guessing they think it should work; their benchmarks show it working.

I’m still hoping for a replacement cable. Do Pimoroni support people use this forum?

As it stands I have just disabled it, it works about the same speed as using a USB3 SSD. I was hoping for a bit more though :)

They do, but if you actually want to talk to support to arrange a replacement, you’ll be wanting to email support (which is what “pimoroni support people” will tell you here anyway!)

When you are building software, you won’t even be near the 250MB/s that Gen2 provides (theoretically it should be near 500MB/s).
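For reference, the theoretical per-lane figures follow from the line rate and the line encoding. The sketch below uses the standard PCIe numbers (nothing here was measured on a Pi 5), and real throughput is lower again because of packet and protocol overhead.

```python
# Per-lane line rate (GT/s) and line encoding for each PCIe generation.
# These are the standard PCIe figures, not measurements from this thread.
PCIE_GENS = {
    1: (2.5, 8, 10),     # 8b/10b encoding
    2: (5.0, 8, 10),     # 8b/10b encoding
    3: (8.0, 128, 130),  # 128b/130b encoding
}

def lane_mb_per_s(gen):
    """Theoretical bandwidth of one lane in MB/s, before protocol overhead."""
    gt_per_s, payload_bits, line_bits = PCIE_GENS[gen]
    bits_per_s = gt_per_s * 1e9 * payload_bits / line_bits
    return bits_per_s / 8 / 1e6

for gen in PCIE_GENS:
    print(f"Gen{gen} x1: {lane_mb_per_s(gen):.0f} MB/s")
```

The Pi 5 exposes a single lane, so these x1 figures are the ceiling for the link at each setting.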

@bablokb

What is important is random read and write speeds though, isn’t it?

It doesn’t really matter though, does it? The Pimoroni benchmarks show it working and the website tells you to enable it, yet here it is not working.

Thanks I will email them.

Yes, but that is limited by the SSD, not the interface.

So when random read and write speeds in the benchmarks change depending on the interface, is the SSD changing?

The Pi Foundation is very clear about PCIe 3:

Warning
The Raspberry Pi 5 is not certified for Gen 3.0 speeds, and connections to PCIe devices at these speeds may be unstable.

And the product page from Pimoroni does not claim anything different. The fact that one specific device and SSD combination works does not imply anything. There are so many tolerances involved. A different cable might make a difference, but maybe it is the FPC connector, or a minuscule flux residue on the PCB, or maybe humidity or something else.

Thanks @bablokb you have been very helpful, I think I will do what @ahnlak says though.

I’m sure anyone else with this issue that comes here will also thank you for your valuable input.

Thanks for your kind words. It would be helpful if you tell us about your results. If swapping the cable helps, then this is definitely an option regardless of the practical impact for Gen2 - it is always good to rule out any hardware issues.

@AndyCap

Please report back on Pimoroni’s stance on this. It will be interesting to see if they will replace it because Gen 3 is not reliable, when only Gen 2 is officially supported on the Pi 5. There is no big bold disclaimer on the product page.

Hi Guys.

I got a Samsung 980 to test and that seems to work as the pimoroni benchmarks, actually a bit better.

Samsung 980 https://pibenchmarks.com/benchmark/75739/

My 4k random read           65,794 KB/s
Pimoroni 4k random read     62,718 KB/s

My 4k random write          205,045 KB/s
Pimoroni 4k random write    199,937 KB/s

Kioxia Exceria https://pibenchmarks.com/benchmark/75561/

My 4k random read           54,563 KB/s
Pimoroni 4k random read     49,117 KB/s

My 4k random write          87,965 KB/s
Pimoroni 4k random write    244,597 KB/s

The links are for the Pimoroni benchmark figures.

My write figures for the supplied Kioxia drive are way off from the Pimoroni figures, though.
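To put a number on "way off", the figures above can be turned into percentage differences against the Pimoroni results. This is a quick Python sketch; the helper name pct_diff is just for illustration.

```python
def pct_diff(mine, reference):
    """Signed difference of my result relative to the reference, in percent."""
    return (mine - reference) / reference * 100

# (my KB/s, Pimoroni KB/s), copied from the figures above
results = {
    "Samsung 980 4k read":  (65_794, 62_718),
    "Samsung 980 4k write": (205_045, 199_937),
    "Kioxia 4k read":       (54_563, 49_117),
    "Kioxia 4k write":      (87_965, 244_597),
}

for name, (mine, ref) in results.items():
    print(f"{name:22s} {pct_diff(mine, ref):+6.1f}%")
```

Three of the four results land a few percent above the reference; only the Kioxia write figure falls far below it.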

Interestingly, there are some more recent Kioxia benchmarks under User Submissions - pimoroni - pibenchmarks.com.

Test #76,937 seems to have the same issue as mine, but there are three others that don’t:

Test #76,992
Test #76,939
Test #76,938

So the current guess is that it is the Kioxia drive that may be a little dodgy.

p.s. I have not heard back from Pimoroni yet…

I’ve come across the same issue using a Patriot P310 (240GB), and it’s happening at both Gen 2 and Gen 3 speeds. I’m not seeing as many errors as some people (I had two errors close to each other when running ‘dmesg | grep pcie’).

Could someone at Pimoroni please confirm whether these error messages are something to worry about or can simply be ignored? Do I need to re-seat the PCIe cable?

It’s early days yet, and mostly everyone seems to be doing well with their NVMe Base. Unleashing it on so many people has given us more to digest than our own internal testing, unsurprisingly.

So, with the RPi PCIe spec barely 1-month old at this point, here’s where we are right now. Which should give people something to work with as far as what is expected and what isn’t:

  • As with all new solutions, don’t go putting your family photo archive on there without backups. This is prudence.

  • Some correctable errors at Gen 3 are not uncommon. A few drives show no errors, but since I’m usually testing one drive from one batch, I can’t say for certain that this makes this an automatically ‘good’ drive.

  • Some drives will be more ‘chatty’ than others. We don’t know yet if this is them pushing the envelope, the firmware not being tolerant of out of spec signals, or something else.

  • It doesn’t seem to stop them performing perfectly well, as long as the rest of the setup is installed and playing nicely.

  • At Gen 2 I expect you to see a handful of correctable errors per day at most, unless you live in the path of a lot of cosmic rays. If you see more than this, try reseating the cable, check whether your drive is one we haven’t tested or is known to have issues, and then, if all else fails, check in with support. Running at Gen 2 speeds and with a drive we’ve tested should give you a quiet life :-)

  • It’s normal to occasionally see a ‘bad’ benchmark. Drives have their own internal lives and housekeeping that means sometimes they’re busy and can’t hoik data as fast as they’d like.

  • If you consistently see poor benchmarks (multiple tests some hours apart, and most of the time you can’t get a result within, say, 20% of what we got in testing) then get in touch with support with a photo of your setup and the exact drive so we can take a look. It may be we send you another ribbon cable to try, maybe even a new NVMe base to see if that fixes it. If you bought the drive from us, we may swap that as well just in case that’s the thing upsetting the apple cart.

  • If you have an interesting SSD, and are in the UK, we may ask nicely if we could swap it for one of ours to improve our stable of test drives. Donations appreciated by arrangement :-)

  • Otherwise, your statutory rights are not affected, and we’re Pimoroni, so we like to go above and beyond with support and try and get you to a happy place.

  • We’re very, very busy with RPi 5 launch/supply and NVMe Base orders at the moment, so sorry if shop/support are slower than normal. The crew are pulling out all the stops, but we are still a wee company in Sheffield in the scheme of things and the sustained peak of activity (somewhat of our own making) does overwhelm us somewhat.
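The "within, say, 20% most of the time" rule of thumb from the list above can be written down as a small check. This is only a sketch: reading "most of the time" as "more than half of the runs" is an assumption, and the function names are made up for illustration.

```python
def within_tolerance(result, reference, tolerance=0.20):
    """True if the result is no more than `tolerance` below the reference figure."""
    return result >= reference * (1 - tolerance)

def consistently_poor(results, reference, tolerance=0.20):
    """True if the runs within tolerance are not a majority.

    Assumption: "most of the time" is read as "more than half of the runs".
    """
    if not results:
        raise ValueError("need at least one benchmark run")
    good = sum(within_tolerance(r, reference, tolerance) for r in results)
    return good * 2 <= len(results)

# Kioxia 4k random write figures quoted earlier in this thread (KB/s),
# checked against the Pimoroni reference of 244,597 KB/s.
print(consistently_poor([87_490, 87_965], 244_597))
```

Runs should be spread over several hours, as the list suggests, so that a drive busy with its own housekeeping does not skew the verdict.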

My Pimoroni PIM700 came with an NVMe Base and a 512GB Netac NV2000. With the default PCIe Gen2 configuration, there were no Advanced Error Reporting (AER) messages in the output of journalctl -b. With the Gen3 configuration, the output contained the following four lines multiple times:

pcieport 0000:00:00.0: AER: Corrected error received: 0000:00:00.0
pcieport 0000:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
pcieport 0000:00:00.0: device [14e4:2712] error status/mask=00000040/00002000
pcieport 0000:00:00.0: [ 6] BadTLP

These four lines were repeated more frequently with increased SSD activity. These masked correctable errors did not appear to downgrade performance. I turned them off by adding to /boot/firmware/cmdline.txt

pcie_aspm=off

before rootwait, then rebooting. Call this the Gen3x configuration. Note:

$ journalctl -b | grep ASPM
Feb 18 15:04:06 rpi5 kernel: PCIe ASPM is disabled
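Since cmdline.txt has to stay a single line, the edit described above amounts to inserting one token into a space-separated list. Here is a minimal Python sketch of that step; the function name and the example command line are illustrative only, and nothing is written to disk.

```python
def add_before_rootwait(cmdline, option="pcie_aspm=off"):
    """Return the kernel command line with `option` inserted before `rootwait`.

    If the option is already present the line is returned unchanged (apart
    from whitespace normalisation); if there is no `rootwait` token the
    option is appended at the end instead.
    """
    tokens = cmdline.split()
    if option in tokens:
        return " ".join(tokens)
    if "rootwait" in tokens:
        tokens.insert(tokens.index("rootwait"), option)
    else:
        tokens.append(option)
    return " ".join(tokens)

# Illustrative command line, not copied from a real system.
example = "console=tty1 root=/dev/nvme0n1p2 rootfstype=ext4 rootwait quiet"
print(add_before_rootwait(example))
```

Keeping a backup copy of the original cmdline.txt before changing it makes it easy to get back to the plain Gen3 configuration.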

After rebooting, the errors were gone. With Gen2, the pibenchmarks score was 38389. With Gen3x, the score was 44516. Pimoroni benchmarked this Netac model and got a score of 44058.

Boot each configuration, run the command

$ sudo lspci -vvv > gen2.txt   # use gen3.txt / gen3x.txt for the other two configurations

and examine line 99 in the three output files.

  • Gen2: DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
  • Gen3: DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
  • Gen3x: DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-

These lines show the PCIe Device Status register of the Netac NVMe controller. DevSta stands for device status and CorrErr stands for correctable error. A plus means the flag is set (the device has detected a correctable error since the flag was last cleared) and a minus means it is clear. Active State Power Management (ASPM) can be used to lower the PCIe lane power consumption when the NVMe drive is idle, but requires support from the kernel, the bridge, and the NVMe controller. The Broadcom 2712 PCI bridge supports ASPM but it is disabled. The Netac NV2000 NVMe drive I am currently using does not support ASPM. Yet the command line option pcie_aspm=off has flipped the Gen3 plus back to the minus seen with Gen2 and Gen3x, suppressing the Gen3 correctable errors. The output shows these errors are being masked. Tracking down masked correctable errors may only make sense if they are impacting NVMe performance. Consequently, it would be helpful if users experimenting with PCIe Gen3 would report the NVMe drives they are using, the errors they are seeing, and any impact on performance.
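Rather than relying on a fixed line number, which can shift between systems, the DevSta lines can be pulled out of the saved lspci output by name. This is a rough Python sketch, assuming the files were written as described above; parse_devsta is a made-up helper name.

```python
import re

def parse_devsta(lspci_text):
    """Return one {flag: bool} dict per DevSta line found in lspci -vvv output.

    A flag is True where lspci prints '+' after its name and False for '-'.
    """
    rows = []
    for line in lspci_text.splitlines():
        if "DevSta:" not in line:
            continue
        flags = re.findall(r"(\w+)([+-])", line.split("DevSta:", 1)[1])
        rows.append({name: sign == "+" for name, sign in flags})
    return rows

# The Gen3 line quoted above.
gen3_line = "DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-"
print(parse_devsta(gen3_line))
```

lspci prints one DevSta line for each PCIe device, so on a real system the list will contain an entry for the bridge as well as for the NVMe controller.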