Tmpfs - Julio Merino (jmmv.dev)

SoC: Status report

It has already been a week since the last SoC-related post, so I owe you a status report.

Development has continued at a constant rate and, even though I work a lot on the project, it may seem to advance slowly from an external point of view. The thing is that getting the ATF core components complete and right is a tough job! Just look at the current and incomplete TODO list to see what I mean.

Some things worth noting:

The NetBSD cross-build tool-chain no longer requires a C++ compiler to build the atf-compile host tool. I wrote a simplified version in POSIX shell to be used as the host tool alone (not to be installed). ATF's distfile also uses it to allow "cross-building" its own test programs.
Improved the cleanup procedure of the test case's work directories by handling mount points in them. This is done through a new tool called atf-cleanup.
Added a property to allow test cases to specify whether they require root privileges.
Many bug fixes, cleanups and new test cases; these are driving development right now.

On the NetBSD front, there have also been several cosmetic improvements and bug fixes, but most importantly I've converted the tmpfs test suite to ATF. This conversion is what has exposed many bugs and missing features in ATF's code. The TODO file has grown mostly because of this.

So, at the moment, both the regress/bin and regress/sys/fs/tmpfs trees in NetBSD have been converted to ATF. I think that's enough for now and that I should focus on adding the necessary features to ATF to improve these tests. One of these is to add support for a configuration file to let the user specify how certain tests should behave; e.g. how to become root or which specific file system to use for certain tests.

I also have a partial implementation that adds a "fork" property to test cases so that they execute in subprocesses. This way they will be able to mess all they want with the open file descriptors without disturbing the main test program. But to get there, I first need to clean up the reporting of test case results.
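The idea of running a test body in a subprocess can be illustrated with plain POSIX calls. This is only a minimal sketch and not ATF's code: run_in_subprocess, test_body_fn and the two sample bodies are hypothetical names, and a body is assumed to return 0 on success.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical test body type: returns 0 on success, non-zero on failure. */
typedef int (*test_body_fn)(void);

/*
 * Runs body in a child process so that whatever it does to its file
 * descriptors cannot affect the caller. Returns the child's exit code
 * (0 or 1), or -1 if the child could not be run or died abnormally.
 */
static int
run_in_subprocess(test_body_fn body)
{
    pid_t pid;
    int status;

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(body() == 0 ? 0 : 1);   /* child: report through exit code */

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Sample body that closes stdout, which only affects the child. */
static int
closes_stdout_and_passes(void)
{
    close(STDOUT_FILENO);
    return 0;
}

/* Sample body that simply fails. */
static int
always_fails(void)
{
    return 1;
}
```

Calling run_in_subprocess(closes_stdout_and_passes) leaves the parent's stdout open, because the close happens in the child's copy of the descriptor table.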

On the other hand, I also started preparing manual pages for the user tools as some of them should remain fairly stable at this point.

July 28, 2007 · Tags: atf, soc, tmpfs
Continue reading (about 2 minutes)

tmpfs added to FreeBSD

A bit more than a year ago, I reported that tmpfs was being ported to FreeBSD from NetBSD (remember that tmpfs was my Google SoC 2005 project and was integrated into NetBSD soon after the program ended). And Juan Romero Pardines has just brought to my attention that tmpfs is already part of FreeBSD-current! This is really cool :-)

The code was imported to FreeBSD-current on the 16th as seen in the commit mail, so I suppose it will be part of the next major version (7.0). I have to thank Rohit Jalan, Howard Su and Glen Leeder for their efforts in this area.

Some more details are given in their TMPFS wiki page.

Edit (June 23): Mentioned where tmpfs is being ported from!

June 22, 2007 · Tags: freebsd, tmpfs
Continue reading (about 1 minute)

tmpfs marked non-experimental

The implementation of an efficient memory-based file system (tmpfs) for NetBSD was my Google Summer of Code 2005 project. After the program was over, the code was committed to the repository and some other developers (especially YAMAMOTO Takashi) made several fixes and improvements to it. However, several problems remained that prevented tagging it as release quality (see this thread).

Finally I found some time to deal with most of them, something that has kept me busy for around three weeks (and which I should have done much, much earlier). All the issues that were resolved are detailed in this other post.

There are still some problems in the code (which code doesn't have any?) but these do not prevent tmpfs from working fine. Of course they should be addressed in the future, but people are already enjoying tmpfs in their installations and have been requesting its activation by default for a long time.

Hence, after core@'s blessing, I'm proud to announce that tmpfs has been marked non-experimental and is now enabled by default in the GENERIC kernels of amd64, i386, macppc and sparc64. Other platforms will probably follow soon.

The next logical step is to replace mfs with tmpfs wherever the former is used (e.g. in sysinst) but more testing is required before this happens. And this is what 4.0_BETA will allow users to do :-) Enjoy!

November 11, 2006 · Tags: netbsd, tmpfs
Continue reading (about 2 minutes)

Making vnd(4) work with tmpfs

vnd(4) is the virtual disk driver found in NetBSD. It provides a disk-like interface to files which allows you to treat them as if they were disks. This is useful, for example, when a file holds a file system image (e.g. the typical ISO-9660 files) and you want to inspect its contents.

Up until now vnd(4) used the vnode's bmap and strategy operations to access the backing file. These operate at the block-level and therefore do not involve any system-wide caches; this is why they were used (see below). Unfortunately, some file systems (e.g. tmpfs and smbfs) do not implement these operations so vnd could not work with files stored inside them.

One of the possible fixes for this problem was to make vnd(4) use the regular read and write operations; these act on a higher (byte) level and are so fundamental that they must be implemented by all file systems. The disadvantage is that all data that flows through these two methods ends up in the buffer cache. (If I understand it correctly, this is problematic because vnd itself will also push a copy of the same data into the cache, thus ending up with duplicates in there.)

Despite that minor problem, I believe it is better to have vnd(4) working in all cases even if that involves some performance penalty in some situations (which can be fixed anyway by implementing the missing operations later on). So this is what I have done: vnd(4) will now use read and write for those files stored in file systems where bmap and strategy are not available and continue to use the latter two if they are present (as it has always done).
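The selection logic described above can be sketched in a few lines. This is a userland illustration under stated assumptions, not the vnd(4) code: struct fs_ops_sketch, choose_io_path and stub_op are hypothetical, and a missing operation is modeled as a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Which access path the virtual disk would use for a backing file. */
enum io_path {
    IO_PATH_STRATEGY,   /* block-level bmap + strategy */
    IO_PATH_RDWR        /* byte-level read + write */
};

/* Hypothetical operations table; NULL means "not implemented". */
struct fs_ops_sketch {
    int (*fo_bmap)(void);
    int (*fo_strategy)(void);
    int (*fo_read)(void);
    int (*fo_write)(void);
};

/* Placeholder operation used to populate the table in examples. */
static int
stub_op(void)
{
    return 0;
}

/*
 * Prefers the block-level operations when both are available and
 * falls back to read/write otherwise.
 */
static enum io_path
choose_io_path(const struct fs_ops_sketch *ops)
{
    if (ops->fo_bmap != NULL && ops->fo_strategy != NULL)
        return IO_PATH_STRATEGY;
    return IO_PATH_RDWR;
}
```

A table that only fills in fo_read and fo_write, as a tmpfs-like file system would in this model, selects IO_PATH_RDWR.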

Some more information can be found in the CVS commit and its corresponding bug report.

November 9, 2006 · Tags: netbsd, tmpfs, vnd
Continue reading (about 2 minutes)

NetBSD's KNF: Prefixes for struct members

The NetBSD coding style guide, also known as Kernel Normal Form (KNF), suggests prefixing a struct's members with a string that represents the structure they belong to. For example: all struct tmpfs_node members are prefixed by tn_ and all struct wsdisplay_softc members start with sc_. But why is there such a rule? After all, the style guide does not mention the reasons behind this.

The first reason is clarity. When accessing a structure instance, whose name may be anything, seeing a known prefix in the attribute helps in determining the variable's type. For example, if I see foo->sc_flags, I know that foo is an instance of some softc structure. As happens with all clarity guidelines, this is subjective.

But there is another reason, which is more technical. Using unprefixed names pollutes the global namespace, an especially dangerous situation if the structure belongs to a public header. Why? Because of the preprocessor (that thing that should have never existed) or, more specifically, the macros provided by it.

Let's see an example: consider a foo.h file that does this:

#ifndef _FOO_H_
#define _FOO_H_

struct foo {
    int locked;
};

#endif

And now take the following innocent piece of code:

#define locked 1
#include <foo.h>

Any attempt to access struct foo's locked member will fail later on because of the macro definition. Prefixing the member name mitigates this situation.
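To make the mitigation concrete, here is a minimal sketch of the prefixed variant. The foo_ prefix and the foo_is_locked helper are hypothetical choices for illustration; the point is only that the macro no longer matches the member name.

```c
#include <assert.h>

/* A macro defined by unrelated code before the header is included. */
#define locked 1

/*
 * With an unprefixed member, the declaration would expand to
 * "int 1;" and fail to compile:
 *
 *     struct foo { int locked; };
 *
 * The prefixed member below is a different token, so the macro
 * leaves it alone.
 */
struct foo {
    int foo_locked;
};

/* Returns the member's value. */
static int
foo_is_locked(const struct foo *f)
{
    return f->foo_locked;
}
```

The macro is still usable as a value, for example in f.foo_locked = locked, which simply stores 1.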

April 18, 2006 · Tags: tmpfs
Continue reading (about 2 minutes)

tmpfs on FreeBSD

It has just been brought to my attention that tmpfs is being ported to FreeBSD by Rohit Jalan. This is good news: more eyes looking at the code (even if it has been modified to work on another OS) means that more bugs can be caught.

April 14, 2006 · Tags: tmpfs
Continue reading (about 1 minute)

SoC: Introductory article to tmpfs

Dr. Dobb's Journal is running a set of mini-articles promoting Summer of Code projects. Next month's issue includes the tmpfs introductory article, written by me and William Studenmund, the project's mentor.

Looks like you have to register to access the full article; previous issues used to have them publicly available. Personally, I'm going to wait for the printed version :-)

February 11, 2006 · Tags: tmpfs
Continue reading (about 1 minute)

File systems documentation for review

My Summer of Code project, tmpfs, promised that I would write documentation describing how file systems work in NetBSD (and frankly, I think this point had a lot to do with my proposal being picked up). I wrote such documentation during August but I failed to make it public: my mentor and I first thought about making it an article (which would have delayed it anyway), but soon after it became apparent that that structure was inappropriate.

Anyway, I decided to deal with the documentation whenever I had enough free time to rewrite most of it and restructure its sections to make it somewhat decent. And guess what, this is what I started to do some days (a week?) ago. So... here is the long-promised documentation!

Be aware that this is still just for review. The documentation will end up either being part of The NetBSD Guide or being a "design and implementation" guide on its own.

Also note that there is still much work to do. Many sections are not yet written. In fact, I started by writing the general ideas needed to get into file system development because, once you know the basics, learning other stuff is relatively easy by looking at existing manual pages and code. Of course, the document should eventually be completed, especially to avoid having to reverse-engineer code.

I'll take this opportunity to state something: the lack of documentation is a serious problem in most open source projects, especially those that have some kind of extensibility. Many developers don't like to write documentation and, what is worse, they think it's useless, that other developers will happily read the existing code to understand how things work. Wrong! If you are willing to extend some program, you want its interface to be clearly defined and documented; if there is no need to look at the code (except for carefully done examples), even better. FWIW, reading the program's code can be dangerous because you may get things wrong and end up relying on implementation details. So, write documentation, even if it is tough and difficult (I know, it can be very difficult, but don't you like challenges? ;-).

January 27, 2006 · Tags: tmpfs
Continue reading (about 2 minutes)

NetBSD: File system directories, part 2

In the first part, we saw what a directory is and gave some fuzzy ideas on how it is implemented. Let's now outline the most common operations run on directories: lookup and readdir.

The lookup operation receives a path component name (a string without slashes) and returns the node pointed to by this name within the directory, assuming, of course, that the entry exists. Otherwise, it tells the caller that the entry is missing or incorrect (i.e., not a directory). This operation takes advantage of the name cache because it must be fast; keep in mind that lookups are executed extremely often.

The implementation of the lookup operation, however, is very complex. It is cluttered by a weird locking protocol and has a lot of special cases. These include access hints, a technique used to tell the operation what kind of lookup is happening: a creation, a removal, a rename or a simple lookup. UFS uses these to locate empty holes in the directory while looking for an entry, among other things. tmpfs uses them to avoid two lookups for the same file on some operations, such as creation.

On the other hand, we have the readdir operation, the one used to read the contents of a directory. This operation is conceptually simple, as all it has to do is read as many entries as possible from the offset given to it. These entries are returned in a standard format, described in getdents(2).

However, there is a tricky thing in readdir: the cookies. They are used by the NFS server to map directory entries to offsets within it so that further lookups can be done in a more efficient manner. For each entry returned by readdir, a cookie is also returned that specifies its physical offset inside the directory. A further call to this operation using the cookie's value could restart the read at the point where the entry lives.

It is also interesting to note that some file systems return fake cookies because they do not have physical offsets within them — in other words, they are not stored on disk. This happens in, e.g., tmpfs or kernfs.
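The resume-by-cookie idea can be sketched in userland as follows. This is an illustration under assumptions, not the kernel interface: readdir_sketch and struct entry_sketch are hypothetical, the cookies are fake sequential indexes (as a memory-based file system might hand out), and by this sketch's convention each cookie is the value to pass back in order to resume right after its entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory directory entry; only the name matters here. */
struct entry_sketch {
    const char *es_name;
};

/*
 * Copies up to max entry names into names[], starting at the entry
 * identified by cookie. For each copied entry, cookies[i] receives the
 * value a caller would pass back to resume right after that entry.
 * Returns the number of entries copied.
 */
static size_t
readdir_sketch(const struct entry_sketch *dir, size_t ndir, size_t cookie,
    const char **names, size_t *cookies, size_t max)
{
    size_t n = 0;

    while (cookie < ndir && n < max) {
        names[n] = dir[cookie].es_name;
        cookies[n] = cookie + 1;   /* fake cookie: index of the next entry */
        cookie++;
        n++;
    }
    return n;
}
```

A caller with a small buffer reads a few entries, remembers the last cookie and passes it to the next call to continue where it stopped.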

Post suggested by Pavel Cahyna.

November 18, 2005 · Tags: tmpfs
Continue reading (about 2 minutes)

NetBSD: File system directories, part 1

A file-system directory is an object that maps file names to nodes (i-nodes in UFS terminology). When given a file name, the directory must be able to tell whether it has that name or not and return the node number attached to it. File names are not stored in the nodes themselves because this makes hard links straightforward: you can have multiple directory entries pointing to the same file at no extra cost.

Let's see a very simple directory implementation, coming from NetBSD's tmpfs (see tmpfs.h). This file system implements directories by using a linked list (struct tmpfs_dir) of directory entries (struct tmpfs_dirent). These structures look like:

struct tmpfs_dirent {
        TAILQ_ENTRY(tmpfs_dirent) td_entries;
        uint16_t td_namelen;
        char *td_name;
        struct tmpfs_node *td_node;
};
TAILQ_HEAD(tmpfs_dir, tmpfs_dirent);

Of special interest are the td_name field, which holds the entry name (file name), and the td_node pointer, which points to the file system node related to this directory entry (UFS could use i-node numbers instead).

This implementation is really simple as it is completely backed by virtual memory; adding and removing entries is as easy as allocating a memory block and modifying the linked list accordingly. It could, of course, be more complex if it used a B+ tree or some other structure instead.
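A userland sketch of such a list-backed directory, built on the same <sys/queue.h> macros, might look like the following. It is not the tmpfs code: the *_sketch names, the de_ prefix and the dir_attach, dir_lookup and dir_detach helpers are hypothetical, and struct node_sketch merely stands in for struct tmpfs_node.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

struct node_sketch { int ns_id; };   /* stand-in for a file system node */

struct dirent_sketch {
    TAILQ_ENTRY(dirent_sketch) de_entries;
    uint16_t de_namelen;
    char *de_name;
    struct node_sketch *de_node;
};
TAILQ_HEAD(dir_sketch, dirent_sketch);

/* Linear search for name; returns the entry or NULL. */
static struct dirent_sketch *
dir_find(struct dir_sketch *dir, const char *name)
{
    struct dirent_sketch *de;
    size_t len = strlen(name);

    TAILQ_FOREACH(de, dir, de_entries) {
        if (de->de_namelen == len && memcmp(de->de_name, name, len) == 0)
            return de;
    }
    return NULL;
}

/* Adds an entry mapping name to node; returns 0 on success, -1 on error. */
static int
dir_attach(struct dir_sketch *dir, const char *name, struct node_sketch *node)
{
    struct dirent_sketch *de;
    size_t len = strlen(name);

    if (len > UINT16_MAX || (de = malloc(sizeof(*de))) == NULL)
        return -1;
    if ((de->de_name = malloc(len + 1)) == NULL) {
        free(de);
        return -1;
    }
    memcpy(de->de_name, name, len + 1);
    de->de_namelen = (uint16_t)len;
    de->de_node = node;
    TAILQ_INSERT_TAIL(dir, de, de_entries);
    return 0;
}

/* Returns the node attached to name, or NULL if there is no such entry. */
static struct node_sketch *
dir_lookup(struct dir_sketch *dir, const char *name)
{
    struct dirent_sketch *de = dir_find(dir, name);

    return de != NULL ? de->de_node : NULL;
}

/* Removes the entry called name; returns 0 on success, -1 if missing. */
static int
dir_detach(struct dir_sketch *dir, const char *name)
{
    struct dirent_sketch *de = dir_find(dir, name);

    if (de == NULL)
        return -1;
    TAILQ_REMOVE(dir, de, de_entries);
    free(de->de_name);
    free(de);
    return 0;
}
```

Attaching two names to the same node models a hard link: removing one entry leaves the node reachable through the other.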

However, on-disk file systems do extra tasks to optimize directory accesses. For example, when an entry is removed, it is marked as deleted using a special flag but it is not really removed from disk, because shrinking the directory could be expensive. Similarly, new entries overwrite previously deleted entries, if possible.

In the next post, we will outline how some directory operations (such as lookup and readdir) work.

Post suggested by Pavel Cahyna.

November 17, 2005 · Tags: tmpfs
Continue reading (about 2 minutes)

SoC: Payment received

Being part of Planet SoC, I think it is a good idea to post this: I've just received Google's cheque for my Summer of Code 2005 tmpfs project! I'm happy :-)

Unfortunately, due to some tax issues, Google has withheld 30% of the original payment. I hope to be able to ask for a refund next year...

October 24, 2005 · Tags: tmpfs
Continue reading (about 1 minute)

NFS exports lists rototill

After two weeks of work, the NFS exports lists rototill that I briefly outlined in this past post is finished and committed into NetBSD's source tree. Believe it or not, the whole set of changes was triggered by an XXX mark in mountd(8)'s code (in other words, fixing code marked as such is not always trivial).

In the past, when a file system wanted to support NFS, it had to include two fields in a fixed position of its mount arguments structure due to the broken way in which mountd(8) handled the mount(2) calls. Furthermore, each file system had to deal internally with the NFS exports list, duplicating code among all NFS-aware file systems; this feature was clearly generic, so it had to be moved to a higher, shared layer. Lastly, you had to use the mount(2) system call in a weird way to change the exports.

September 23, 2005 · Tags: tmpfs
Continue reading (about 2 minutes)

Linker's link sets

I don't know about other linkers, but GNU ld provides a very useful feature: link sets. A link set is a list of symbols constructed during link time which can then be inspected in regular code. This is very interesting in situations when you want to initialize several subsystems from a centralized place but don't know which of these will be available; that is, you don't know which ones will be in the final binary.
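One way to build such a set by hand relies on GNU ld synthesizing __start_NAME and __stop_NAME symbols for any section whose name is a valid C identifier. The following is a minimal sketch assuming GCC or Clang with an ELF target and GNU ld (or a compatible linker); the init_set section, the INIT_SET_ADD macro and the subsystem functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*init_fn)(void);

/*
 * Stores a pointer to fn in the "init_set" section. The "used"
 * attribute keeps the compiler from discarding the otherwise
 * unreferenced variable.
 */
#define INIT_SET_ADD(fn) \
    static init_fn init_set_##fn \
    __attribute__((section("init_set"), used)) = fn

/* Provided by the linker: the bounds of the collected section. */
extern init_fn __start_init_set[];
extern init_fn __stop_init_set[];

/* Two hypothetical subsystems that register themselves. */
static int subsys_a_init(void) { return 1; }
static int subsys_b_init(void) { return 2; }
INIT_SET_ADD(subsys_a_init);
INIT_SET_ADD(subsys_b_init);

/*
 * Calls every registered initializer without naming any of them and
 * returns the sum of their return values.
 */
static int
run_all_inits(void)
{
    int sum = 0;

    for (init_fn *p = __start_init_set; p < __stop_init_set; p++)
        sum += (*p)();
    return sum;
}
```

The central loop never mentions subsys_a_init or subsys_b_init; dropping one of the registrations (or the object file containing it) simply makes the set shorter.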

September 18, 2005 · Tags: tmpfs
Continue reading (about 2 minutes)

Interface to change NFS exports

While adding NFS support to tmpfs, I found how NetBSD currently manages NFS exports from userland. The interface is, IMHO, scary. As NetBSD aims for clean code and design, it must be fixed.

See my mail to the tech-kern@ mailing list for more details on the issue and a preliminary patch.

September 11, 2005 · Tags: tmpfs
Continue reading (about 1 minute)

tmpfs: Project merged into NetBSD

After listening to many queries from developers asking when tmpfs could be integrated into NetBSD, I finally imported the code into the CVS repository. I'm really happy about this :-) Development will be simplified from now on and it will be a lot easier for interested parties to test the code. Please read the announcement for more information.

I'd now like to comment on some of the improvements I've been making during the past days, which mostly addressed optimization. I started by removing the storage of . and .. directory entries from directories, generating them on the fly when requested. This was done for three reasons. First, to remove redundancy: as . points to the directory itself, the entry can always be generated; similarly, .. can be generated from the pointer to the parent directory stored in the nodes. Secondly, to simplify the code: there were multiple assertions to ensure that these entries were correct and there was code to update them on file-system changes; this code was hard to understand and, as you can see, avoidable. Lastly, to reduce memory consumption: this removed around 1KB of storage from each directory (due to a change I've made recently, this gain is lower, because entries are now a lot smaller, but still this saves 40 bytes from each directory).

September 10, 2005 · Tags: tmpfs
Continue reading (about 2 minutes)

NFS file handles

NFS uses a structure called a file handle to uniquely identify exported files. When a client requests to access a file, the server constructs a file handle that identifies it; from this point on, this identifier will be used in all communications between the server and the client to access that specific file. (Look at the fs_vptofh and fs_fhtovp hooks in NetBSD's VFS layer to see how this mapping works.)

In order to identify a file uniquely, you need three things: the file-system identifier (i.e., something unique to each mount point), the node number (e.g., the inode number in FFS terminology) and a generation number. It is clear why the first two are needed, but the generation number may be confusing. In order to explain this last concept, let's see an example:

September 10, 2005 · Tags: tmpfs
Continue reading (about 3 minutes)

SoC: The end

So... the deadline for Google's Summer of Code 2005 program arrived some time between yesterday and today (don't know exactly due to timezones). The final results from my side: a functional memory file-system (not efficient yet) for NetBSD named tmpfs, as well as the beginnings of a book/article on file-system development under NetBSD.

As regards the file-system itself, its code can be found in the CVS repository. Although it has some bugs and misfeatures, it is functional from the user's point of view. The file-system's code is around 4000 lines (plus 500 from the mount utility) and the regression test suite around 2000 (half of which are license texts). I know these numbers are low, but man... this is the hardest code I've ever written (mostly due to lack of documentation and having to reverse engineer existing stuff). I've also written a document describing tmpfs' internals (as said in the initial proposal), which is available in the form of a manual page in the repository and is around 700 lines long.

September 1, 2005 · Tags: tmpfs
Continue reading (about 3 minutes)

SoC: Project announced

Although I don't like making premature announcements of my projects, I've been kind of forced to do it for tmpfs. The reason is that SoC's deadline is really close now and people should have a chance to test it. Not to mention that the code won't see any serious improvements in the coming days, so delaying the announcement is not worth it either.

You can read the announcement in my mail to the tech-kern@ mailing list, which also includes a step-by-step guide to test tmpfs.

August 26, 2005 · Tags: tmpfs
Continue reading (about 1 minute)

SoC: Status report 6

This past week has not been excessively productive because I spent some time dealing with long overdue pkgsrc tasks (mainly updating GNOME to 2.10.2, the latest stable version) and was away from the computer more than usual. Anyway, I have done a bunch of things, although they are not as visible as the work from other weeks (this is, in part, why I felt less productive).

I started by adding support for local sockets, which was easy enough to do but caused panics (especially when switching /tmp to tmpfs and starting an X session).

August 23, 2005 · Tags: tmpfs
Continue reading (about 3 minutes)

SoC: Status report 5

I started this week's work by reading the first chapters of Design and Implementation of the UVM virtual memory system to see if I'd learn how to manage anonymous memory. It had been suggested that I use anonymous memory objects (aobjs, for short) to store file contents, so I was faced with the task of learning what they are and how to use them. I have to confess that I was afraid of not knowing how to complete the read/write operations for the file-system, because things were very confusing to me even after reading the document. In fact, I spent two or three days reading documentation and code, as well as doing tests, but not doing any real work.

August 15, 2005 · Tags: tmpfs
Continue reading (about 3 minutes)

SoC: Status report 4

This past week has been quite productive as regards my SoC project, tmpfs, although at the beginning I was a bit stalled (and afraid of not knowing how to solve the problems I had).

I started trying to fix the rmdir operation, which had been broken since its addition. Thanks to the dedicated test machine, I was able to discover the point of failure quite easily because it panic'ed long before the iBook did. Solving the issue was not easy, though, as I didn't have some concepts clear (which the nice guys at tech-kern@ quickly clarified after my post). The thing is that I had serious issues with vnode allocation (duplicate vnodes for a single real file) and node removal (which can only happen after a reclaim operation).

August 8, 2005 · Tags: tmpfs
Continue reading (about 2 minutes)

Using 'goto's in C

It is common knowledge that uses of the goto statement are potentially dangerous in any structured programming language, as their abuse can quickly make your code unreadable. This is why this construction is seldom explained to people learning how to program and its use is strongly discouraged.

However, there are some situations in which it is very useful and, despite what some people might say, makes your code more readable. I had never used gotos before, but have experienced this recently while writing tmpfs.
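A commonly cited case where goto helps readability is centralized error cleanup, where each failure jumps to the label that undoes exactly what has been set up so far. The sketch below illustrates that general pattern and is not taken from tmpfs; struct record_sketch and record_create are hypothetical, and the negative-value check merely stands in for a later step that can fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct record_sketch {
    char *rs_name;
    int rs_value;
};

/*
 * Builds a record in several steps. On failure, control jumps to the
 * label that releases whatever was acquired before the failing step,
 * so the cleanup code is written once and in reverse order.
 * Returns 0 on success or an errno value on failure.
 */
static int
record_create(const char *name, int value, struct record_sketch **out)
{
    struct record_sketch *rec;
    size_t len = strlen(name);
    int error;

    rec = malloc(sizeof(*rec));
    if (rec == NULL) {
        error = ENOMEM;
        goto out;
    }

    rec->rs_name = malloc(len + 1);
    if (rec->rs_name == NULL) {
        error = ENOMEM;
        goto out_free_rec;
    }
    memcpy(rec->rs_name, name, len + 1);

    if (value < 0) {           /* stand-in for a later failing step */
        error = EINVAL;
        goto out_free_name;
    }
    rec->rs_value = value;

    *out = rec;
    return 0;

out_free_name:
    free(rec->rs_name);
out_free_rec:
    free(rec);
out:
    return error;
}
```

Without goto, each failure branch would have to repeat the frees for everything allocated before it, or the steps would need to be nested one level deeper each time.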

August 7, 2005 · Tags: tmpfs
Continue reading (about 2 minutes)

Dedicated machine for kernel testing

During the past month, I had to do all tmpfs development on my laptop. This includes coding and testing. If you have ever done any kernel hacking you know what this means: reboot every now and then to test your changes, which can drive you crazy after a few reboots (especially if things keep breaking).

So when I got back home, the first thing I did was to set up a machine I had lying around for kernel testing exclusively. The machine is a 133 MHz Pentium with 32MB of RAM and a 3GB hard disk, more than enough for my purposes. (I could have also used qemu... but since I had the hardware...)

August 5, 2005 · Tags: tmpfs
Continue reading (about 2 minutes)

SoC: Status report 3

It has been a long time since the previous status report; I'm sorry for that, but I haven't been able to publish one earlier. The good thing is I'm finally back from my vacation, so I'll be able to work on tmpfs more seriously and continuously from now on (and I have to!).

Anyway, to the point of this post. I've just pushed all the changes I had in my work tree to the mainstream CVS server. Most of these changes focused on adding new vnode operations (none of them were implemented when I posted the previous entry), although there have been multiple improvements all around the code too.

August 2, 2005 · Tags: tmpfs
Continue reading (about 2 minutes)

SoC: Thoughts about tmpfs data representation

The text below is a message I just sent to NetBSD's tech-kern mailing list. I'm reproducing it here with better formatting and with some e-mail specific sentences removed.

The tmpfs code is up to the point where I have to start implementing the vnode operations. To do this, I have to decide how to organize the file data in memory as well as all other information needed to manage it.

After thinking about this for a while, it seems that the best way to do this is to follow a layout similar to the one used for existing on-disk file systems. That is, I need:

July 19, 2005 · Tags: tmpfs
Continue reading (about 4 minutes)

SoC: Status report 2

Several days after the previous status report, it's time for a new one. During the past week, I've improved several aspects of the existing code, without adding much new stuff.

First of all, I added several other VFS hooks needed to avoid crashes due to null pointers and completed the code up to the point where the file system can be mounted and unmounted (the latter was more difficult than I thought).

July 18, 2005 · Tags: tmpfs
Continue reading (about 2 minutes)

SoC: Status report 1

During this past week, I've been working a bit on my SoC project; only a bit because I had to prepare the slides for the NetBSD presentation I'm giving tomorrow at Partyzip@. Fortunately, from now on I won't have anything else to do, so I'll be able to devote all my time to tmpfs :-)

So... here is a little status report: I started the week reading the I/O chapter from the Design and Implementation of the 4.4BSD Operating System. It was a really interesting read, as I could understand some common concepts in detail.

July 9, 2005 · Tags: tmpfs
Continue reading (about 3 minutes)

SoC: Project page ready

I've just set up the NetBSD-SoC: Efficient memory file-system project page. At the moment, it includes a list of the project goals, a copy of the original proposal text (in case you would like to read it) and a list of existing documentation.

This page will be extended to hold technical information as well as installation instructions (that is, how to merge the code in that page with NetBSD's source tree) when the project matures during the summer.

June 27, 2005 · Tags: tmpfs
Continue reading (about 1 minute)

SoC: The NetBSD-SoC project

The NetBSD Project has set up a project at the Sourceforge site that aims to centralize the development of the eight projects chosen for the Summer of Code program. Its name is NetBSD-SoC and its page contains information about all the elected projects, information about mailing lists of interest and a CVS repository for the students and their mentors. Read the official announcement for more information.

As regards my project, I'll start filling up its page in the site when I've got access to it (probably tomorrow). BTW, all the other seven projects are really nice and I hope all of them will be "finished" (or mostly working) by the end of the summer.

June 26, 2005 · Tags: tmpfs
Continue reading (about 1 minute)

SoC: Accepted!

After a very long delay, Google has finally chosen the projects that will be part of the Summer of Code program. There seems to be no official announcement in the page yet, but I already received a mail... and... my project is accepted! :-)

I briefly outlined my project some days ago, but I'll explain it in more detail now (copying some paragraphs from the application form verbatim).

At the moment, NetBSD includes a memory-based file-system called mfs. mfs is just an implementation of the regular ffs (designed for persistent storage) on top of the (volatile) virtual memory system. This means that it uses the same data structures as the on-disk implementation, resulting in less than optimal performance and memory usage. As regards the latter, and in the words of another NetBSD developer, the physical memory and swap space needed to back these pages constantly grows.

June 25, 2005 · Tags: tmpfs
Continue reading (about 4 minutes)
