The text below is a message I just sent to NetBSD's tech-kern mailing list. I'm reproducing it here with better formatting and with some e-mail specific sentences removed.
The tmpfs code is up to the point where I have to start implementing the vnode operations. To do this, I have to decide how to organize the file data in memory as well as all other information needed to manage it.
After thinking about this for a while, it seems that the best way to do this is to follow a layout similar to the one used for existing on-disk file systems. That is, I need:
- A set of nodes that describe files. These could be like regular inodes.
- A set of blocks that store file contents. These could store directories as well (i.e., the "file" representing the directory contents.).
Suppressing the former (merging it with directory contents, thus becoming something like the FAT) means that implementing hard links becomes very complex, so the separation is worth it.
Now, to store these two data structures in memory. Before I started coding, I thought I could use malloc/free to allocate and deallocate blocks of memory on the fly. I.e., when a new block is needed, I do a malloc and use it. When it is useless, I do a free and its associated memory is released.
This is inappropriate because malloc/free operate on wired memory. Furthermore, I guess malloc is an expensive operation and should not be used intensively to avoid its overhead (think about how tmpfs could abuse it).
Therefore, I started reading a bit about UVM, and although I don't understand it very well yet, it seems to me that it is possible to map a region of virtual (unwired) memory and use it at will (just as if you had done a malloc). So, for now, I'm using malloc/free to allocate some (few) memory blocks, and I hope that changing this to use unwired memory will be as easy as doing some call to UVM to request a memory region and use pointers inside it for our own use.
With this in mind, it seems that a good way to map the data structures said before into these memory blocks could be:
- Allocate a memory block of no. nodes * sizeof node. This could be used to hold all nodes. It could be handled with a bitmap.
- Allocate a memory block of no. blocks * sizeof blocks. As before, with an associated bitmap describing its contents.
(Note that the number of blocks and nodes is configured by the user at mount time, and is already implemented.)
The advantages of this structure are:
- Access to blocks and nodes is constant time, as they are located in known memory positions (it's a vector).
- Allocation of blocks and nodes is of linear time due to the location of empty holes in the bitmap. Can be solved with a pool of available entries (at a later time).
- Deallocation of blocks and nodes is constant time.
- It's easy to grow (and to some extent, shrink) the file system at will.
- The implementation is straightforward.
- It is very similar to traditional Unix file systems, so it will be easy to understand.
The code in the CVS did exactly this some revisions before HEAD. In a later change, I undid the separation explained here and used blocks for everything. But I'm starting to regret that decision because it will be very complex to store nodes if I don't want to waste much memory.
If you have any better ides, suggestions or you see something wrong in this rationale, please say it! :-)
Edit (Jul 20th, 00:00): Check out the original mail and all its follow ups. They include very interesting suggestions (such as this one) and expose that my proposal is just wrong (frankly, I could imagine this ;).