After a very long delay, Google has finally chosen the projects that will be part of the Summer of Code program. There seems to be no official announcement on the page yet, but I have already received an e-mail... and... my project is accepted! :-)
I briefly outlined my project some days ago, but I'll explain it in more detail now (copying some paragraphs from the application form verbatim).
At the moment, NetBSD includes a memory-based file-system called mfs. mfs is just an implementation of the regular ffs - designed for persistent storage - on top of the (volatile) virtual memory system. This means that it uses the same data structures as the on-disk implementation, which results in less than optimal performance and memory usage. As regards the latter, and in the words of another NetBSD developer, the physical memory and swap space needed to back these pages constantly grows.
The NetBSD OS is in need of an efficient memory file-system that uses its own data structures to manage the stored files. The main design goal is to make it use just the amount of memory it needs to work correctly and efficiently; no more, no less.
Having said this, the visible goals of the project are:
- Implement this memory-efficient file-system for the NetBSD OS under a 3-clause BSD license (that is, without the advertising clause). I'll call it tmpfs throughout this post, as I think it'll take this name.
- Document tmpfs in detail, describing its data structures, the algorithms used, and the rationales that led to the decisions taken.
- Write a "file-system how-to" document explaining how to write a file-system driver for NetBSD from scratch. This will be similar in spirit to the Device Driver Writing Guide and will probably be merged into it.
Of course, another goal is to learn about kernel internals. I would love to be able to contribute more to this part of the system, but I'm currently not capable of doing much. I hope to learn the details of the VFS layer during this summer, aiming for other possible contributions in the future.
How will this project be driven? Here is a preview of the development plan:
1. Read the file-system chapter in the book The Design and Implementation of the 4.4BSD Operating System. I've been told it includes a few notes about how tmpfs could be done (or about what mfs' limitations are).
2. Read the code of some existing simple file-systems to get an idea of the whole picture (what the call traces look like, how data is handled, what the main entry points are, etc.). This includes reading code from ptyfs, kernfs and maybe procfs, all of which are memory-based file-systems.
3. Probably read code from mfs. The file-systems described in the previous point are very special as regards write support (i.e., it is almost entirely lacking, because it makes no sense for them), so I feel they won't be especially useful for understanding how this functionality works (but I don't know yet). On the other hand, mfs is a complete file-system, so it will be helpful in this area.
4. Design the necessary in-memory data structures to hold directories, files and all the associated meta-data. This will be done with memory constraints in mind, but, of course, also trying to be fast. A correct balance between speed and memory usage needs to be achieved, hence a design is required beforehand. To make things simpler, the design could be made in two iterations: the first using very simple data structures to speed up development, and the second improving these to achieve maximum performance.
5. Implement the file-system itself according to the decisions taken in 4 and the knowledge acquired in 2 and 3.
6. Debug (although this will heavily overlap with 5).
7. Optimize the code, if possible, with more debugging as this goes on.
8. Write the document describing how to write file-systems and the document explaining the internal layout of tmpfs, based on all the notes I'll have taken during the previous points.
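To make the design step of the plan a bit more concrete, here is a minimal userspace sketch of what the "very simple" first-iteration data structures might look like. Everything in it is an assumption made for illustration: the names (`mem_node`, `mem_dirent`, `mem_dir_add`, `mem_dir_lookup`, `mem_file_write`) are hypothetical, it uses plain `malloc`/`realloc` instead of kernel allocators, and it ignores the VFS layer, locking and most meta-data. It only shows the idea of a file-system keeping its own structures and allocating exactly the memory a file needs.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; a userspace sketch, not NetBSD kernel code. */

enum mem_node_type { MEM_NODE_FILE, MEM_NODE_DIR };

struct mem_dirent;

/* One node per file or directory. */
struct mem_node {
    enum mem_node_type type;
    size_t size;                /* file: bytes stored; dir: number of entries */
    char *data;                 /* file contents; NULL for empty files and dirs */
    struct mem_dirent *entries; /* dir: singly linked list of entries */
};

/* A directory entry maps a name to a node. */
struct mem_dirent {
    char *name;
    struct mem_node *node;
    struct mem_dirent *next;
};

/* Allocate a zeroed node of the given type; returns NULL if out of memory. */
struct mem_node *mem_node_new(enum mem_node_type type)
{
    struct mem_node *n = calloc(1, sizeof(*n));
    if (n != NULL)
        n->type = type;
    return n;
}

/* Linear search: simple, and good enough for a first iteration. */
struct mem_node *mem_dir_lookup(const struct mem_node *dir, const char *name)
{
    if (dir == NULL || dir->type != MEM_NODE_DIR)
        return NULL;
    for (const struct mem_dirent *e = dir->entries; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e->node;
    return NULL;
}

/* Add an entry; returns 0 on success, -1 on error (not a dir, duplicate, no memory). */
int mem_dir_add(struct mem_node *dir, const char *name, struct mem_node *child)
{
    if (dir == NULL || dir->type != MEM_NODE_DIR || child == NULL)
        return -1;
    if (mem_dir_lookup(dir, name) != NULL)
        return -1;

    struct mem_dirent *e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    size_t len = strlen(name) + 1;
    e->name = malloc(len);
    if (e->name == NULL) {
        free(e);
        return -1;
    }
    memcpy(e->name, name, len);
    e->node = child;
    e->next = dir->entries;
    dir->entries = e;
    dir->size++;
    return 0;
}

/* Write len bytes at offset off, growing the buffer to exactly the size needed. */
int mem_file_write(struct mem_node *file, size_t off, const void *buf, size_t len)
{
    if (file == NULL || file->type != MEM_NODE_FILE)
        return -1;
    if (len == 0)
        return 0;
    size_t end = off + len;
    if (end < off)              /* size_t overflow */
        return -1;
    if (end > file->size) {
        char *p = realloc(file->data, end);
        if (p == NULL)
            return -1;
        if (off > file->size)   /* zero-fill the hole before the new data */
            memset(p + file->size, 0, off - file->size);
        file->data = p;
        file->size = end;
    }
    memcpy(file->data + off, buf, len);
    return 0;
}
```

The linked list and the single contiguous buffer are the kind of thing a second iteration would likely replace (a faster directory lookup, page-sized chunks for file data), which is exactly why it seems worth splitting the design into two passes.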
I'm eager to start working on this. However, there will be some timing constraints I hadn't planned for at first: we are having some work done at home, which will take us, at the very least, another two weeks. Furthermore, I have my last final exam on the 28th, so it will be difficult to start any work before that.
Lastly, I think I'll be mostly away from the Internet during July, though I will surely have intermittent access. This will make things a bit more difficult as regards asking questions or contacting developers, but I'll have to manage. Fortunately, August will be completely dedicated to the project :-)