I'm using Raspbian Stretch with desktop, the release published in June 2018.

I'm doing some performance testing on a Pi 3B+ by running my application, which simply keeps sending/writing data to some devices via TCP/IP and a USB serial port.

Everything looks good for the first hours (say 6 hours), but later I noticed that the Pi just shuts itself down (nothing on the screen, no keyboard response); I have to unplug and replug the power cable to bring it back.

During the test I logged performance counters every 60 seconds, using /opt/vc/bin/vcgencmd commands and top -n 1 to record temperature, voltage, frequency, and CPU/memory usage from the beginning until the shutdown. The temperature kept rising from 51'C to 63.4'C, the voltage was quite stable at around 1.3625V, the frequency switched between 1400000000 and 600000000, CPU usage (in top) ranged from 10 to 30, and memory usage (in top) was less than 10.

Could anyone help? Where can I find the root cause of the shutdown?

Shawn
  • Be sure to check out the answer provided by @Ingo for more detail on debugging the shutdown issue... if it's not temperature :) – Seamus Jul 01 '18 at 17:43
  • @Seamus I'm quite new to Linux and confused about the steps. Are there any detailed tutorials on the debug process provided below? – Shawn Jul 03 '18 at 07:27
  • Post a comment under @Ingo's answer; I'm sure he's more knowledgeable on this. – Seamus Jul 03 '18 at 10:01

2 Answers

Maybe systemd.debug-shell can help you find the reason for the unexpected shutdown. Look at:

rpi ~$ zcat /usr/share/doc/systemd/README.Debian.gz | less

There is a section Debugging boot/shutdown problems. For shutdown problems, run

rpi ~$ sudo systemctl start debug-shell

In situations where the debug shell is not available, you can generate a /shutdown-log.txt file instead:
1. Boot with these kernel command line options:
systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M
2. Save the following script as /lib/systemd/system-shutdown/debug.sh and make it executable:
#!/bin/sh
mount -o remount,rw /
dmesg > /shutdown-log.txt
mount -o remount,ro /
3. Reboot

update:
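Some more details on using /shutdown-log.txt: the kernel command line options shown above make systemd log at debug level to the kernel log buffer, and systemd-shutdown runs every executable in /lib/systemd/system-shutdown/ at a very late stage of the shutdown, so the script can save that buffer with dmesg. Edit cmdline.txt: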

rpi ~$ sudo -e /boot/cmdline.txt

and append the following parameters to the existing line, separated by spaces. Don't insert a line break: the whole command line must stay on one line. Then save and quit the editor:

systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M
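
If you would rather script this edit, here is a minimal sketch. The function name append_cmdline_params is made up for illustration, and it assumes the file holds a single line that is safe to rewrite. It keeps a .bak copy and skips the edit if the parameters are already present:

```shell
#!/bin/sh
# Hypothetical helper (not part of Raspbian): append kernel parameters to a
# one-line cmdline file without introducing a line break.
append_cmdline_params() {
    file=$1
    shift
    params=$*

    # Nothing to do if the exact parameter string is already there.
    if grep -qF -- "$params" "$file"; then
        return 0
    fi

    # Keep a backup so the change is easy to revert after debugging.
    cp -- "$file" "$file.bak" || return 1

    # Fold any stray line breaks into spaces and trim trailing whitespace,
    # then write everything back as a single line.
    line=$(tr '\n' ' ' < "$file.bak" | sed 's/[[:space:]]*$//')
    printf '%s %s\n' "$line" "$params" > "$file"
}
```

As root it could be called as: append_cmdline_params /boot/cmdline.txt systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M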

Create the debug script and make the script executable. Start with:

rpi ~$ sudo -Es

You can copy and paste this block, including the EOF, to your command line and execute it:

cat > /lib/systemd/system-shutdown/debug.sh <<EOF
#!/bin/sh
mount -o remount,rw /
dmesg > /shutdown-log.txt
mount -o remount,ro /
EOF

root@rpi ~# chmod u+x /lib/systemd/system-shutdown/debug.sh
root@rpi ~# exit
rpi ~$

Now you can reboot, and afterwards you will find a file /shutdown-log.txt. It may be a little difficult to find the shutdown messages because the kernel log also contains the older startup messages. You can look at the timestamps or search for systemd-shutdown. There may also be important messages just before the systemd-shutdown lines. For example, here is the relevant snippet from my raspi:

[   14.348539] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[   14.348551] Bluetooth: BNEP filters: protocol multicast
[   14.348568] Bluetooth: BNEP socket layer initialized
[  921.128046] Bluetooth: hci0 sending frame failed (-49)
[  923.621971] systemd-shutdow: 33 output lines suppressed due to ratelimiting
[  923.802419] systemd-shutdown[1]: Sending SIGTERM to remaining processes...
[  923.809468] systemd-journald[190]: Received SIGTERM from PID 1 (systemd-shutdow).
[  923.888164] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
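
A minimal sketch for narrowing the log down, with made-up function names. Note that in the snippet above the process name sometimes appears truncated as systemd-shutdow, so the search uses that shorter string to catch both spellings. The second helper assumes the usual "[ seconds.micros]" timestamp prefix of dmesg:

```shell
#!/bin/sh
# Hypothetical helpers for reading /shutdown-log.txt.

# Print, with line numbers, every line that mentions systemd-shutdown.
# The pattern stops at "shutdow" because the name can show up truncated.
find_shutdown_lines() {
    grep -n 'systemd-shutdow' "$1"
}

# Print only the lines whose kernel timestamp (in seconds) is at or after $2,
# which skips the old startup messages.
lines_since() {
    awk -v since="$2" '
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            # Take the text between the brackets and force a number.
            ts = substr($0, 2, RLENGTH - 2) + 0
            if (ts >= since + 0) print
        }' "$1"
}
```

For example, lines_since /shutdown-log.txt 900 would print everything logged from second 900 of uptime onwards.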

You can add any command you want to the script. You may insert your temperature logging. For example I have appended the date to the log:

rpi ~$ cat /lib/systemd/system-shutdown/debug.sh
#!/bin/sh
mount -o remount,rw /
dmesg > /shutdown-log.txt

date >> /shutdown-log.txt

mount -o remount,ro /
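
As a sketch of how temperature logging could be added in the same way: the helpers below are hypothetical, and they assume the SoC temperature is readable in millidegrees Celsius from /sys/class/thermal/thermal_zone0/temp. I have not verified that this file is still readable that late in the shutdown, so the function writes "unavailable" if it is not:

```shell
#!/bin/sh
# Hypothetical helpers for debug.sh; the sysfs path is an assumption.

# Convert millidegrees Celsius (e.g. 63400) to "63.4" using integer
# arithmetic only; the value is truncated to one decimal, not rounded.
millideg_to_celsius() {
    printf '%d.%d\n' "$(( $1 / 1000 ))" "$(( ($1 % 1000) / 100 ))"
}

# Append a timestamped temperature line to the log file given as $1.
# $2 optionally overrides the sensor file.
log_soc_temp() {
    log_file=$1
    temp_file=${2:-/sys/class/thermal/thermal_zone0/temp}
    stamp=$(date '+%Y-%m-%d %H:%M:%S')
    if [ -r "$temp_file" ]; then
        printf '%s temp=%s C\n' "$stamp" \
            "$(millideg_to_celsius "$(cat "$temp_file")")" >> "$log_file"
    else
        printf '%s temp=unavailable\n' "$stamp" >> "$log_file"
    fi
}
```

With these definitions placed at the top of debug.sh, a call to log_soc_temp /shutdown-log.txt between the two mount lines would add one temperature line to the log.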

I hope you can see something when the raspi gets stuck. It may be that it isn't even able to execute the script, or that it simply doesn't shut down. Don't forget to revert all these settings when you have finished debugging.

Ingo

From the Raspberry Pi organization's FAQs page:

The Raspberry Pi is built from commercial chips which are qualified to different temperature ranges; the LAN9514 (LAN9512 on older models with 2 USB ports) is specified by the manufacturers as being qualified from 0°C to 70°C, while the SoC is qualified from -40°C to 85°C. You may well find that the board will work outside those temperatures, but we’re not qualifying the board itself to these extremes.

So, you're getting fairly close to the cutoff at 70 degC. The info you've given doesn't prove the shutdown was caused by heat, but at that temp, I would say that it places it on the list of suspects. To determine if it is a heat problem, you'll need to make more runs and keep the temperature cooler.

How can you run it cooler? Several things you can try:

  1. If your RPi has a cover, remove it.
  2. If you're running your test in very warm surroundings, reduce the ambient temperature.
  3. Airflow! A small fan of any sort that maintains some sort of airflow around the RPi will cool the electronics.
  4. Heatsink: A number of vendors sell small, "stick-on" heatsinks that will lower the thermal resistance between the CPU and the surrounding air.

Try that first; if the shutdown problem persists, post your script/application; perhaps there's something there.
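
To compare runs before and after these changes, here is a rough sketch of a logging loop. The function names are made up; it assumes vcgencmd measure_temp prints a line like temp=63.4'C (the format of the values quoted in the question), and the default limit of 70 is simply the FAQ figure above. Keep in mind that this reading is the SoC temperature, not the LAN chip's:

```shell
#!/bin/sh
# Hypothetical helpers for logging the SoC temperature during a test run.

# Extract the number from a line such as "temp=63.4'C".
parse_temp() {
    printf '%s\n' "$1" | sed -n "s/^temp=\([0-9.]*\)'C\$/\1/p"
}

# Succeed (exit status 0) if temperature $1 is at or above limit $2.
at_or_above() {
    awk -v t="$1" -v limit="$2" 'BEGIN { exit !(t + 0 >= limit + 0) }'
}

# Log the temperature to file $1 every $3 seconds (default 60) and mark
# readings at or above $2 degrees (default 70). Runs until interrupted.
log_temps() {
    log_file=$1
    limit=${2:-70}
    interval=${3:-60}
    while :; do
        temp=$(parse_temp "$(/opt/vc/bin/vcgencmd measure_temp)")
        note=
        if [ -n "$temp" ] && at_or_above "$temp" "$limit"; then
            note=' ABOVE_LIMIT'
        fi
        printf '%s temp=%s%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
            "${temp:-unknown}" "$note" >> "$log_file"
        sync    # flush to disk so the last lines survive a sudden power-off
        sleep "$interval"
    done
}
```

The sync after each line is there because the Pi in question dies without a clean shutdown; without it the last few readings could be lost.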

Seamus
  • Thanks for the reply. The 63'C must be the CPU core temp, which is far below 85'C, so it's unlikely to be the reason, but I'll test anyway. Is there any way to read the LAN chip's temperature, since its limit is just 70'C? And shouldn't the system have logged the reason for the shutdown somewhere? – Shawn Jul 01 '18 at 00:44
  • You could try enabling core dumps... you could also try gdb on your app. Some reads you may find useful: one, two, three. Personally, I'd try a lower temp first to see if that fixes it, but I'm lazy. – Seamus Jul 01 '18 at 00:46
  • The SoC is rated at 85 degC, the CPU only for 70 degC. – Seamus Jul 01 '18 at 00:47
  • Could you explain more about the SoC chip and the LAN9514 chip? Don't they refer to the ARM CPU chip and the local area network chip? – Shawn Jul 01 '18 at 01:18
  • Whoops! You're right; my apologies. The SoC contains the CPU, and the LAN is a different chip (maybe also mfd by Broadcom?). Here's some info on the SoC, and Wikipedia has a decent article, although some sections appear to need an update. All that said, the FAQ referenced above is clear that the 85 degC limit doesn't apply to the entire board and all components. I continue to feel that the quickest way forward is to try eliminating temperature as the cause first. – Seamus Jul 01 '18 at 01:43
  • So we have no way to read the LAN chip's temperature, is that right? Either way, I'll do another round of testing and we'll see. – Shawn Jul 01 '18 at 01:57
  • It looks like the LAN chip always runs hotter than the CPU, and my CPU is now above 65'C, so it's plausible that the LAN chip reached its limit (70'C). I'll do more testing. Thank you so much. – Shawn Jul 01 '18 at 02:34
  • One more question, maybe off topic: do you know what the chip on the top-left side of the Pi 3B+ with the Pi logo on it is? – Shawn Jul 01 '18 at 03:04
  • I think it's the WiFi chip, based on this information. – Seamus Jul 01 '18 at 17:37