I’ve had a Raspberry Pi 3 in the garage running Raspbian, and it had been attached to Ethernet for a long time. A few weeks ago, however, I wanted to bring the Pi into the house so that my kid, who was showing interest in robotics, and I could play with it. That required having the ability to place the device onto the dining table, next to a laptop, which meant connecting it to WiFi. Easy peasy, right?

Well… while that should have been trivial, it did not work right away and the solutions I found online back then were all nonsensical. I gave up in desperation because I did not have enough time to find the root cause, and all interest was lost. Until last weekend when I gave this ordeal another try. At this point, I found once again the same nonsensical solutions online, got equally frustrated about the fact that they even existed, and decided to find the real answer to my problem on my own.

Yes, this is mostly a rant about the Internet being littered with misleading answers of the kind “I reinstalled glibc and my problem is gone!”. But this is also the tale of a troubleshooting session—and you know I like to blog about those.

First steps

Before even unplugging the Pi from its physical network link, I had to configure its WiFi connection so that I could use it on the dining table. To do so, I was going to need to run more than one command as root so, in preparation for that, I started a root shell. I was helpfully greeted by the following:

jmmv@rpi3:~$ sudo su -
Wi-Fi is currently blocked by rfkill.
Use raspi-config to set the country before use.
root@rpi3:~#

Nice. Somehow Raspbian expects that what one most likely wants to do after becoming root is to configure the WiFi and it points us in the right direction. That’s… a bold assumption, but hey, it was pretty accurate.

In any case, this was the first time I heard of rfkill. From the name, I assumed that this controlled some kind of kill switch for all radio devices in the system—the kind of switch you would need to implement an “airplane mode” feature. I further assumed, based on this message, that rfkill was active because one needs to know the location of the device to decide which radio frequencies can be used.

I did as told: I ran raspi-config, went into the network configuration options to set up the WiFi, selected the right country, joined a WiFi network, and dropped back into the shell. Once I did that, the WiFi connection was up and running and rfkill reported the right thing:

root@rpi3:~# ifconfig | grep wlan0
wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
root@rpi3:~# rfkill
ID TYPE      DEVICE      SOFT      HARD
 0 wlan      phy0   unblocked unblocked
 1 bluetooth hci0     blocked unblocked

Seeing this and knowing from past experience that the raspi-config setting changes are permanent, I tried to SSH into the Pi over the WiFi adapter, confirmed that it was working, shut the system down, unplugged the Raspberry Pi from the network, brought it into the house from the garage, turned it on and… it wasn’t reachable over WiFi anymore. What the…!?

Thinking about the problem

This did not make any sense. The instructions told me to configure a country and I had set one up, but somehow that was insufficient. Interestingly, though, logging back in as root showed me the same warning about having to configure a country… so I double-checked things.

jmmv@rpi3:~$ sudo su -
Wi-Fi is currently blocked by rfkill.
Use raspi-config to set the country before use.
root@rpi3:~# rg country /etc/
/etc/wpa_supplicant/wpa_supplicant.conf
3:country=US
... and more matches ...

The country seemed to be configured properly in persistent storage in a place that made sense but:

root@rpi3:~# rfkill
ID TYPE      DEVICE      SOFT      HARD
 0 wlan      phy0     blocked unblocked
 1 bluetooth hci0     blocked unblocked

The WiFi adapter was indeed back to the blocked state. Why? As you might imagine, running the Raspberry Pi in a headless state and with a WiFi connection surely has to be a well-anticipated scenario, which explains why logging in as root has, of all things, a check for the WiFi state and a pointer to configure it. The WiFi should work without extra effort.
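
The same information that the rfkill tool prints can also be read from sysfs, which is handy when scripting a check. The following is a minimal sketch, assuming the kernel exposes its switches under /sys/class/rfkill with the usual type, name, soft and hard attributes. The function names rfkill_report and wlan_soft_blocked are made up for illustration.

```shell
# Sketch: list rfkill switches straight from sysfs.
# The root directory is a parameter (defaulting to /sys/class/rfkill)
# so that the function can also be pointed at a copy of that tree.
rfkill_report() {
    root="${1:-/sys/class/rfkill}"
    for dev in "$root"/rfkill*; do
        # Skip the unexpanded glob when there are no switches at all.
        [ -d "$dev" ] || continue
        kind="$(cat "$dev/type")"
        name="$(cat "$dev/name")"
        # "soft" and "hard" hold 1 when blocked and 0 when unblocked.
        soft="$(cat "$dev/soft")"
        hard="$(cat "$dev/hard")"
        echo "$kind $name soft=$soft hard=$hard"
    done
}

# Succeeds (exit status 0) if any wlan switch is soft-blocked.
wlan_soft_blocked() {
    rfkill_report "$@" | grep -q '^wlan .* soft=1 '
}

# Usage:
#   rfkill_report
#   wlan_soft_blocked && echo "WiFi is soft-blocked"
```

A soft block is the one that software can lift, for example with rfkill unblock wifi. A hard block comes from a physical switch and cannot be cleared this way.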

Initial research

I don’t remember how exactly I came across this, but I ended up noticing the following files:

root@rpi3:~# fd rfkill /lib/systemd/
/lib/systemd/system/systemd-rfkill.service
/lib/systemd/system/systemd-rfkill.socket
/lib/systemd/systemd-rfkill

As it turns out, there is a systemd service to manage the rfkill state. Looking at its manual page (while shocked that one even existed):

SYSTEMD-RFKILL.SERVICE(8)   systemd-rfkill.service   SYSTEMD-RFKILL.SERVICE(8)

NAME
       systemd-rfkill.service, systemd-rfkill.socket, systemd-rfkill – Load
       and save the RF kill switch state at boot and change

systemd-rfkill is a service that saves the blocked/unblocked state of rfkill whenever it changes and restores it at system startup time. It seems that this whole rfkill thing was well thought out after all. But this finding meant that rfkill should have remained unblocked after the reboot. And it didn’t. Why not was the question that had to be answered.
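
To see what the service has actually saved, one can look at its state directory. This is a minimal sketch, assuming the state lives under /var/lib/systemd/rfkill as one small file per switch that contains 1 for blocked and 0 for unblocked. Both the location and the file format are assumptions that may vary across systemd versions, and the function name saved_rfkill_state is made up for illustration.

```shell
# Sketch: print the rfkill state that systemd-rfkill has persisted.
# The directory is a parameter so that a copy can be inspected too.
saved_rfkill_state() {
    dir="${1:-/var/lib/systemd/rfkill}"
    for f in "$dir"/*; do
        # Skip the unexpanded glob when nothing has been saved yet.
        [ -f "$f" ] || continue
        case "$(cat "$f")" in
            1) state=blocked ;;
            0) state=unblocked ;;
            *) state=unknown ;;
        esac
        echo "$(basename "$f"): $state"
    done
}

# Usage (reading the real directory may require root):
#   saved_rfkill_state
```

Comparing this output right before a reboot with the rfkill output right after it shows whether the saved state matches what the system ends up with.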

Trying to find an answer online

Searching online for a solution, using obvious queries like raspberry pi enable wifi boot rfkill, resulted in me finding the following “solutions”:

Disable rfkill via a kernel command line setting and ask systemd to not restore state. Sure, that’ll work. But it’s… drastic? Why would I have to do that when the system seems to be prepared to “just work” after a raspi-config? Why should I disable a core feature that’s installed by default to make headless WiFi work? This will “break” whatever thing is using rfkill during the boot process to disable the network, and it will possibly break any “airplane mode”-like toggles that might exist in the UI. Not that I care about the UI, but still.
Add a systemd startup script to force rfkill to the unblocked state. Another thing that will work. But why should I do that? This will just paper over whatever exists in the boot process that disables the network after systemd-rfkill has done its thing. It does not address the root cause.
Create a WiFi configuration file in the boot partition. OK, maybe this one makes sense. Maybe if the system knows to configure the WiFi early enough during the boot process, it knows that the WiFi must remain enabled later on. A bit of research proved this solution wrong too though. I stumbled upon /lib/systemd/system/raspberrypi-net-mods.service, which is the boot code that handles this feature, and it reads like this:
```
[Unit]
Description=Copy user wpa_supplicant.conf
ConditionPathExists=/boot/wpa_supplicant.conf
Before=dhcpcd.service
After=systemd-rfkill.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/mv /boot/wpa_supplicant.conf /etc/wpa_supplicant/wpa_supplicant.conf
ExecStartPost=/bin/chmod 600 /etc/wpa_supplicant/wpa_supplicant.conf
ExecStartPost=/usr/sbin/rfkill unblock wifi

[Install]
WantedBy=multi-user.target
```
This unit runs when /boot/wpa_supplicant.conf is present (ConditionPathExists), but all it does is move that file to the right place and unblock the WiFi using the rfkill unblock wifi command, which is the same one that raspi-config invokes and the one I had already tried to run by hand. The key insight is that, because this unit moves the file, it only runs once. That makes sense if you look at this unit as a first-time setup action for headless installations, but it had no chance of fixing my problem.

So no. None of these “solutions” address the root cause. They are hacks and workarounds that may achieve the desired outcome, but they don’t explain why the systemd-rfkill automation, which is built into the system and should restore the changes made by raspi-config, isn’t working as designed.

Proper troubleshooting

Given that these answers were all misguided, I had to do some extra work to reach the true solution.

The first question to answer was: did the systemd-rfkill service even work fine? It took me a bit of fiddling to discover how to enable debug logging in systemd, but once I did that, I could confirm that this service was indeed working fine. The service correctly persisted the disabled state to a file and restored it on the next boot.
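
One way to get debug output from a single systemd unit, which is not necessarily the method used here, is a drop-in that sets the SYSTEMD_LOG_LEVEL environment variable for it. The sketch below assumes that systemd-rfkill honors that variable like other systemd components do. The helper name write_debug_dropin and the debug.conf file name are made up for illustration.

```shell
# Sketch: write a drop-in that raises the log level of one unit.
# The second argument is the directory holding unit overrides; it
# defaults to /etc/systemd/system, which requires root to modify.
write_debug_dropin() {
    unit="$1"
    base="${2:-/etc/systemd/system}"
    mkdir -p "$base/$unit.d"
    cat > "$base/$unit.d/debug.conf" <<EOF
[Service]
Environment=SYSTEMD_LOG_LEVEL=debug
EOF
}

# Usage (as root):
#   write_debug_dropin systemd-rfkill.service
#   systemctl daemon-reload
#   # ...reboot, and then read the messages from the current boot:
#   journalctl -b -u systemd-rfkill.service
```

Removing the debug.conf file and running systemctl daemon-reload again undoes the change.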

This meant that there had to be something else in the boot process that disabled the WiFi after systemd-rfkill had run. The question was what, and knowing this much meant I could scope my next searches more accurately.

My next thought took me to NetworkManager. I have had my fights with this service in the past, so I thought that maybe it was installed on the system to handle the state of network connections in the UI. I do have the standard Raspbian desktop installed but I had no easy way to log into it… so verifying this wasn’t exactly trivial. (In retrospect, I think that if I had gone through the hassle of logging into the UI and clicked on whichever button exists to enable the WiFi, my problems would have been resolved immediately. But then I wouldn’t have gained all of this knowledge.)

Fear not though. NetworkManager has a CLI to manipulate its state so I could use that. Except for the fact that NetworkManager was nowhere to be seen on the system. Dead end.

My next step was to look into the standard Debian configuration under /etc/network/, but I couldn’t find anything obvious either. Actually, I couldn’t find anything because the interfaces apparently aren’t configured there at all anymore. Yay for modern Linux?

It was then that I went back to searching online (I don’t remember the exact search terms though) and ended up in yet another forum that talked about connman. And bingo! I did have connman installed and it sounded like the same NetworkManager thing I had seen before. Running connmanctl enable wifi was all it took to mark the WiFi link enabled and get the right fix in place. And by “right fix” I mean a fix that works in conjunction with the way the whole system is expected to run, not in spite of it.

In other words: it was connman that was kicking in after the systemd-rfkill startup script, and it was connman that was re-disabling the WiFi because it thought that the WiFi had to be disabled. Now, as to why I have connman installed and whether it should be there or not… I do not know. This Raspbian installation has been in place for years now and has been upgraded a few times, so my guess is that connman is there to support the lightweight Raspbian desktop, and at some point in the past I logged into it and clicked on the button to disable the WiFi.
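
The check and the fix can be sketched as follows. This assumes that connman's command-line client is connmanctl and that its technologies subcommand prints one block per technology, with a path line such as /net/connman/technology/wifi followed by indented "Key = Value" properties. That output format is an assumption, and the helper name wifi_powered is made up for illustration.

```shell
# Sketch: report whether connman considers the WiFi technology powered.
# Reads the output of "connmanctl technologies" on stdin.
wifi_powered() {
    awk '
        # A path line starts a new technology block.
        /^\/net\/connman\/technology\// { in_wifi = ($0 ~ /\/wifi[ \t]*$/) }
        # Inside the wifi block, pick the value of the Powered property.
        in_wifi && $1 == "Powered" { print $3; found = 1 }
        END { if (!found) print "unknown" }
    '
}

# Usage:
#   connmanctl technologies | wifi_powered
#   connmanctl enable wifi    # asks connman to power the WiFi on
```

Because connman remembers this setting on its own, enabling the WiFi through it should survive reboots without touching rfkill or systemd.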

The AI future

The Internet being polluted by wrong answers obviously impacts everyone, from beginner engineers and sysadmins that will not know/care enough to find the true root causes behind problems, to our new AI overlords which will outright lie to us.

Or will they? Let’s try to get an answer from Bing’s AI-powered new search engine:

Q: I enabled the WiFi on the Raspberry Pi with raspi-config but, after a reboot, rfkill reports that the WiFi is blocked. How do I fix it?
A: You can try running the command sudo rfkill unblock wifi to unblock the wifi. If that doesn’t work, you can try adding the command rfkill unblock all to /etc/rc.local file before the line that says exit 0. This will ensure that the command is run at boot time and should unblock the wifi.

Yeah, no. While that might have the right effect as described earlier, it’s not the correct answer. But I know the answer, so I can cheat and steer the conversation towards what I think the problem really is:

Q: But why doesn’t the rfkill unblocked state persist after the reboot? The systemd unit says it should.
A: The reason why the rfkill unblocked state doesn’t persist after reboot is because the Network Manager is blocking it back¹. You can try creating a service to unblock the wifi devices at boot time. Here’s how you can do it: … snippet to modify systemd …

Oh wow, I’m actually surprised. This answer was pretty close!

Q: But I do not have NetworkManager installed. What else could it be?
A: If you don’t have NetworkManager installed, then it could be that the wifi is being blocked by the kernel. You can try adding the following line to /etc/modprobe.d/raspi-blacklist.conf file: … more steps …

Back to the wrong heavy-handed solution of fully disabling rfkill. Bing… you were soooo close, but yet so far. Finally:

Q: Could it be connman?
A: Yes, it could be connman. You can try disabling connman by running the following command: … steps to fully disable …

Of course it could be connman’s fault, but the answer is still wrong. I might want to have connman installed and the system ought to work with it (it was there by default after all). Uninstalling random system components isn’t usually the answer.

Is there any moral to this story? Sure there is, the usual one: in order to solve a problem, you must always understand what the root cause really is. This typically requires understanding how the system works top-to-bottom and reasoning about what the behavior should be end-to-end. Only then can you come up with the right fix. Asking “why” several times in a row tends to work well in discovering what you might be missing. Oh, and this all applies to software bugs as well: the next time you find yourself adding a nullness check to fix a crash… keep digging.

It would be pretty funny if my root-causing in this post was also wrong. If that’s the case, don’t hesitate to say so!