… or, how to upload lots of data to the file server for the first time using the fastest possible mechanism.
I recently got¹ an old PowerMac G5 to play with and installed FreeBSD 12 on it. At this point in time, FreeBSD seems to be the only modern OS that fully works on this machine, which is great given my OS preferences but sad overall. (Things were different in 2013.)
The PowerMac had been sitting under my desk since early January, but it’s noisier than I’d like and I only access it over SSH²… so I spent last weekend setting up space in the garage to host the machine. As part of this, I extended my physical network leveraging the already-existing coax wiring in the house, using an extra goCoax MoCA 2.5 adapter (non-affiliate link). And just yesterday, I got a second-hand 4TB drive to see how well this machine can act as a file server.
In particular, I would like to host pictures and videos on the machine so that they can be accessed from multiple devices at home. All of these files currently reside on an external USB 3.0 drive on my Mac Pro and amount to about 2TB of storage.
So the question was: how can I seed the contents of the G5 with these files?
The first thought that you might have is: well, the data currently lives on a USB drive. Just attach it to the PowerMac and copy the data! Unfortunately, it’s not that easy. First of all, the drive is using APFS and I can’t imagine this file system being accessible from FreeBSD. But I didn’t bother to look because there was one more important issue: while the drive is USB 3.0, the PowerMac only sports USB 2.0, which means the transfer rates I’d see would be around 20MB/s. Not great.
The second thought is to leverage the network, of course. A transfer rate test shows that my connection from the Mac Pro to the PowerMac can sustain 70MB/s, which is almost three times as fast as the USB 2.0 option—and 2TB of data are a lot to copy. (Yes, yes, 70MB/s is shy of what the Gigabit equipment involved in the connection should reach. I still have to diagnose where the inefficiencies lie and fix those.)
The third thought is to take the external drive out of its USB enclosure and attach it to the SATA bus on the PowerMac. That’d be the fastest indeed, but I’d encounter the issue with APFS… and the lack of SATA ports on the machine.
So, we are left with network transfers and a limit of 70MB/s.
But I measured those 70MB/s with a synthetic test using Netcat: in other words, a raw socket with no encryption and no complex server process behind it. Could I reach the same transfer speeds during the actual data copies? Copying data usually involves more complex protocols, so there can be other inefficiencies.
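For reference, the synthetic test itself is just Netcat on both ends. This is a sketch of how such a test can be run — port 1234 is an arbitrary choice, and g5 is the PowerMac’s hostname on my network:

```shell
$ # On the G5: listen on a port and throw away everything that arrives.
$ nc -l 1234 > /dev/null

$ # On the Mac Pro: push 1GB of zeros through the socket; dd prints the
$ # sustained transfer rate when it finishes.
$ dd if=/dev/zero bs=1m count=1024 | nc g5 1234
```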
The first obvious choice we have is to use scp. Using scp involves encryption (high CPU usage), and the scp protocol doesn’t seem to be very friendly to lots of small files. The maximum rates I saw using this mechanism were about 30MB/s, which is very far from the limit I had measured.
The second obvious choice is rsync. But it isn’t, because rsync is backed by SSH and suffers from the same issues as scp. Yes, I know rsync can be configured to not use SSH (as it used to do in the distant past), but I was too lazy to set that up.
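(For completeness: running rsync without SSH means talking to a standalone rsync daemon over a plain TCP socket. A rough sketch — the module name and the config file location are made up for illustration — would look something like this on the G5, pushing from the Mac Pro afterwards:)

```shell
$ # Hypothetical rsyncd.conf for the daemon:
$ #   [depot]
$ #       path = /depot/jmmv
$ #       read only = false
$ rsync --daemon

$ # And from the Mac Pro, over a plain TCP socket (no SSH involved):
$ rsync -av Pictures rsync://g5/depot/
```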
The third obvious choice was to rely on the Samba service I had already set up on the G5. Large file copies over this protocol were able to reach the peak rates I had measured, but transferring lots of small files (the Lightroom catalog, for example) was excruciatingly slow. I still wonder how NFS might stack up in this test but, as before, I have been too lazy to set it up.
So we get to the fourth, not-so-obvious choice. If Netcat was the thing that reached the fastest transfer rates, can we use it to copy the files? Yes, we can! So, how?
Copying files with tar and Netcat
On the server, open a socket and pipe its output to tar so that it unpacks the contents on the external drive (which I’ve mounted under /depot/jmmv):

$ cd /depot/jmmv
$ nc -l 1234 | tar xvf -
And on the client, upload all contents using tar, piping its output to Netcat:

$ cd /Volumes/Data
$ tar cvf - Pictures | nc g5 1234
Voilà. A sustained 70MB/s during large file transfers. (Small file transfers are still slower due to inefficiencies involving file seeks on the local and remote machines.)
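If you want to convince yourself that the tar stream round-trips byte-for-byte before pointing it at the network, a quick local sanity check works too. This is the same pipeline with a plain pipe standing in for Netcat, using throwaway scratch directories:

```shell
# Scratch source and destination trees (hypothetical content, created on the fly).
src=$(mktemp -d)
dst=$(mktemp -d)
echo "one" > "$src/a.txt"
mkdir "$src/sub"
echo "two" > "$src/sub/b.txt"

# Same idea as the network copy: pack on one side, unpack on the other,
# connected by a plain pipe instead of nc.
( cd "$src" && tar cf - . ) | ( cd "$dst" && tar xf - )

# diff -r exits 0 only if both trees are identical.
diff -r "$src" "$dst" && echo "trees match"
```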
I believe this is the fastest you are going to get for a bulk file copy:
- We are using a single socket and streaming all files as “just a bunch of bytes”: tar packs the data in a very simple sequential format, and the network connection does not interpret it. In particular, there is no cross-machine protocol overhead to communicate that different files are being transferred (which is what killed Samba’s performance with small files).
- We are not using compression because most of the files (pictures and videos) are already compressed, so we’d be burning CPU for no reason. The CPUs on the G5 are relatively strong, but there is no need to stress them.
- We are not using encryption because, well, I’m assuming my network is secure (I know, I know…); encryption is what was killing scp and rsync performance. If you are worried about data integrity, you can run a follow-up rsync over SSH once the initial seeding is complete to ensure the data in the destination is good. And if you are worried about data snooping… well, then use encryption and pay the cost.
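That follow-up verification pass can be a single rsync invocation with checksumming forced: it is slow because it reads everything on both sides, but it only re-sends what actually differs. Something along these lines, reusing the paths from above:

```shell
$ # -c forces full-content checksums instead of the size/mtime heuristic.
$ rsync -avc Pictures/ g5:/depot/jmmv/Pictures/
```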
So that’s about it. We’ll see how well this experiment with the G5 goes. The file server isn’t going to be faster than the locally-attached USB drive I was using, but I don’t think that’s going to be noticeable given my simplistic usage patterns.
¹ Yes, I already had one of these machines a few years ago, but I stupidly sold it so I had to buy it again. I just like this machine. I guess the second one I got is better because it is liquid-cooled, so it is generally silent, and it’s slightly more modern. ↩︎
² I had wanted to use this machine as a desktop… but, unfortunately, the NVIDIA card it includes is not well-supported, the machine doesn’t have AGP, and finding a PCI Express ATI card for this machine seems impossible. ↩︎