Hello,
I’ve started running into some strange issues with my setup, and I can’t reliably trace them back to a specific cause. My suspicion is that it’s related to storage, but I’m not entirely sure, so I’m trying to shed more light on the problem.
My setup:
- Raspberry Pi 5, 8GB model (soon upgrading to 16GB)
- Official active cooler
- Pimoroni NVMe Base
- Lexar NM710 500GB NVMe
- Official USB-C power supply
This setup had been stable for about a year and a half. A few months ago, I moved it to a different environment. The only real change was temperature (smaller room, rack-mounted, less airflow), but it ran fine for about 2–3 months after the move.
A few weeks ago, I noticed the Pi started acting up. When I investigated, I found that I couldn’t log into it. Most of my services were in a semi-crashed state, and it seemed like only the kernel and a few core components were working. For example, the Pi would still respond to pings and route WireGuard connections, but SSH logins would always fail.
I also saw that the CPU usage was abnormally high. From the partial monitoring data (via Netdata), I caught a glimpse of some stats: NFS client activity, page faults, and a large spike in I/O writes. CPU temperature was around 65 °C and the NVMe was at about 45 °C.
My theory is that the NVMe drive freezes or otherwise becomes unreadable, which would explain why services crash or return errors (e.g., 404s) and why SSH might fail (perhaps the Pi can’t access the authorized keys). I haven’t been able to gather proper logs, since I can’t connect once it gets into this state.
At this point, my best guesses are either temperature-related or power-related issues. But 45 °C doesn’t seem too high, and the official power supply should be sufficient. I’m running out of ideas, so any help or suggestions would be greatly appreciated.
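To help separate the power theory from the temperature theory, one thing that could be checked after the next incident is the firmware's throttling state. Below is a minimal sketch, assuming Raspberry Pi OS with the `vcgencmd` tool available. The function names `decode_throttled` and `read_throttled` are only my own illustration, and I have not yet run this against the failing system.

```python
import subprocess

# Bit positions reported by `vcgencmd get_throttled`.
# Bits 0-3 describe the current state, bits 16-19 are "has occurred" flags.
THROTTLE_FLAGS = {
    0: "under-voltage detected (now)",
    1: "ARM frequency capped (now)",
    2: "currently throttled",
    3: "soft temperature limit active (now)",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output: str) -> list[str]:
    """Turn a line such as 'throttled=0x50005' into readable flag names."""
    # Everything after '=' is a hexadecimal bitmask.
    value = int(output.strip().split("=", 1)[1], 16)
    return [
        name
        for bit, name in sorted(THROTTLE_FLAGS.items())
        if value & (1 << bit)
    ]


def read_throttled() -> list[str]:
    """Query the firmware on the Pi itself (assumes vcgencmd is on PATH)."""
    result = subprocess.run(
        ["vcgencmd", "get_throttled"],
        capture_output=True,
        text=True,
        check=True,
    )
    return decode_throttled(result.stdout)
```

Calling `read_throttled()` on the Pi and getting an under-voltage flag would point toward power, while a soft temperature limit flag would point toward cooling. An empty list would suggest that neither is the cause, which would put the NVMe drive itself back at the top of the list.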