CS414 Homework 5.  Solutions

 

1.      Suppose that a file with contents F has two “names” in the file system, A and B.  You delete the file by deleting name B, then create a new file with contents F’ and give it this same name, B.

    a. If A was a “symbolic link” and B was a “true link” (e.g. A was a symbolic link to B) and you print A, would you see the original file contents, F, or the modified contents, F’?

You’ll see F’.  In this case A just contains the name “B”, so printing A shows whatever the file currently named B contains – F’.

    b. If A was a “true link” and B was a “symbolic link” (e.g. B was a symbolic link to A) and you print A, would you see the original file contents, F, or the modified contents, F’?

You’ll see F.  In this case removing the name “B” removed the symbolic link itself; the file that A names was never touched.  The newly created B is a separate file containing F’, while A still names the original file containing F.

    c. If A was a “true link” and B was a “true link” and you print A, would you see the original file contents, F, or the modified contents, F’?

You’ll see F.  Here there was really just one “real” file at first, with two names (two hard links).  Removing B dropped one link to that file, but the link from A remained, so the file and its contents F survived.  Recreating B then allocated a brand-new file with a single link to it, from the name B.  Thus we end up with two files – A is the name for the original contents F, and B the name for the new contents, F’.
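
For concreteness, here is a small POSIX C sketch (our own illustration, not part of the assignment) that reproduces the three scenarios with the link(), symlink(), unlink() and open()/write()/read() system calls; the file names “A” and “B” and the helper functions are just illustrative choices, and error checking is omitted.

    /* Sketch: recreate the three link scenarios from problem 1 on a POSIX system.
     * Assumes a writable current directory; the names "A" and "B" are illustrative. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static void write_file(const char *name, const char *data) {
        int fd = open(name, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        write(fd, data, strlen(data));
        close(fd);
    }

    static void print_file(const char *name) {
        char buf[64] = {0};
        int fd = open(name, O_RDONLY);            /* open() follows symbolic links */
        if (fd >= 0) { read(fd, buf, sizeof buf - 1); close(fd); }
        printf("%s contains \"%s\"\n", name, buf);
    }

    static void cleanup(void) { unlink("A"); unlink("B"); }

    int main(void) {
        /* a. A is a symbolic link to B: printing A follows the name to the new B. */
        cleanup();
        write_file("B", "F");
        symlink("B", "A");
        unlink("B");
        write_file("B", "F'");
        print_file("A");                          /* expect F' */

        /* b. B is a symbolic link to A: deleting B removes only the symlink. */
        cleanup();
        write_file("A", "F");
        symlink("A", "B");
        unlink("B");
        write_file("B", "F'");
        print_file("A");                          /* expect F */

        /* c. A and B are both true (hard) links to the same inode. */
        cleanup();
        write_file("A", "F");
        link("A", "B");
        unlink("B");
        write_file("B", "F'");
        print_file("A");                          /* expect F: A still names the old file */

        cleanup();
        return 0;
    }

Compiled and run in an empty, writable directory, it should print F’, F, F in that order, matching parts a, b and c.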


 

2.      Suppose that you carefully measure file copy times on your PC and discover that for files up to 8 blocks in length, copying occurs at 550KB/sec.  As the file grows to include even a single byte of the 9th block, copying suddenly slows down to 500KB/sec.  You check a bunch of larger files and this 500KB/sec number is typical for most of them, but there are a few enormous files on your computer for which copying is even slower: 400KB/sec.  Explain why you are getting these different numbers.

 

The point here is that as files get larger, the way the file system represents the blocks of the file changes.  A small file can be described by the inode alone, which lists a handful of block numbers directly (here, evidently eight).  Once the file grows past that, the file system adds one level of pointer indirection: a block pointed to by the inode holds the list of block numbers for the file (a single-indirect block).  For huge files a second level of indirection is added (a double-indirect block).  Each extra level means extra reads of pointer blocks during the copy, and that overhead accounts for the small successive drops in transfer rate: 550KB/sec while only direct blocks are used, 500KB/sec once the single-indirect block is needed, and 400KB/sec once the double-indirect block is needed.
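
To make the bookkeeping concrete, here is a small C sketch (our own illustration, not part of the question) that counts how many extra pointer blocks have to be read for a file of a given size.  The layout constants – 8 direct pointers, 4KB blocks, 4-byte block numbers – are assumptions chosen to match the 8-block threshold in the question, not the parameters of any particular file system.

    /* Sketch: count the metadata (pointer) blocks touched when reading a file of
     * n data blocks, under an assumed classic UNIX-style inode layout:
     * 8 direct pointers, one single-indirect block, one double-indirect block,
     * 4KB blocks, 4-byte block numbers.  All of these parameters are illustrative. */
    #include <stdio.h>

    #define NDIRECT 8
    #define BLOCK   4096
    #define PTRS    (BLOCK / 4)              /* block numbers per indirect block */

    static long metadata_blocks(long nblocks) {
        long meta = 0;
        if (nblocks <= NDIRECT) return 0;            /* the inode alone is enough */
        nblocks -= NDIRECT;
        meta += 1;                                   /* single-indirect block     */
        if (nblocks <= PTRS) return meta;
        nblocks -= PTRS;
        meta += 1;                                   /* double-indirect block     */
        meta += (nblocks + PTRS - 1) / PTRS;         /* its second-level blocks   */
        return meta;
    }

    int main(void) {
        long sizes[] = {8, 9, 1000, 2000000};        /* file sizes in blocks */
        for (int i = 0; i < 4; i++)
            printf("%8ld data blocks -> %ld extra metadata blocks\n",
                   sizes[i], metadata_blocks(sizes[i]));
        return 0;
    }

Every one of those extra metadata blocks is an additional disk read on top of the data itself, which is where the successive drops from 550 to 500 to 400 KB/sec can come from.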

 

3.      Suppose that a new generation of disks will use special laser optical access technology with no moving parts.  This technology has identical access time for all blocks on the disk no matter where the block is located.  Would the file system need to worry about placement of blocks close to inodes or file fragmentation when using such a disk?  (In fact a form of this type of disk already exists – memory sticks for digital cameras and PDAs are treated like a disk, but because the underlying technology is just flash memory, access to the data has the same cost no matter what you access and no matter what the access pattern.)

 

This would make a very big difference.  Current file systems go to great lengths to cluster data close to the inode and thus minimize the overhead of disk arm (head) motion.  The new disks described here would eliminate that issue and simplify life enormously.  There would be no issue of fragmentation when using such a disk (files would still get fragmented, but there would be no seek cost associated with this).  The only remaining issue is that contiguous blocks can sometimes be read as a single “large” I/O transfer while a fragmented file requires multiple smaller ones – since each request carries some fixed overhead, there might still be a reason to keep files organized as sequentially as possible.
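
A back-of-the-envelope model of that last point: with no seek cost at all, the only penalty for fragmentation is the fixed cost of issuing many small requests instead of one large one.  The bandwidth, per-request overhead and block size in this sketch are invented numbers, used purely for illustration.

    /* Sketch: even with zero seek time, per-request overhead can make a
     * fragmented file slower to read than a contiguous one.  The overhead
     * and bandwidth figures below are made-up, illustrative numbers. */
    #include <stdio.h>

    int main(void) {
        double file_bytes  = 10.0 * 1024 * 1024;   /* 10MB file              */
        double block_bytes = 4096.0;               /* 4KB blocks             */
        double bandwidth   = 20e6;                 /* 20MB/sec raw transfer  */
        double per_request = 0.1e-3;               /* 0.1ms setup per I/O    */

        double transfer   = file_bytes / bandwidth;                       /* same either way */
        double contiguous = 1 * per_request + transfer;                   /* one big request */
        double fragmented = (file_bytes / block_bytes) * per_request + transfer;

        printf("contiguous: %.3f s, fragmented: %.3f s\n", contiguous, fragmented);
        return 0;
    }

Whether this matters in practice depends entirely on how large that per-request overhead really is for the device and driver in question.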

 

4.      An experiment.  On your PC or a department computer, using C or C++, write a program that outputs characters into a file (using the file “create” and “write” system calls).  Write an n MB file in chunks of various sizes, starting with a single byte and then doubling the “chunk size” repeatedly until it reaches 32KB (2^15 bytes).  Run your program and time it.  Convert these times into an I/O “rate” (bytes/second) and graph the results for a few values of n: 1, 5, 10 and 20.

 

We didn’t get a chance to make these graphs to show you the expected answer.
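
Here, though, is a sketch of the kind of C program the question asks for.  The file name “testfile”, the command-line handling and the decision not to call fsync() are arbitrary choices; in particular, without an fsync() the measured rate may largely reflect the buffer cache rather than the disk.

    /* Sketch of the experiment: write an n-MB file in chunks of various sizes
     * and report the resulting I/O rate. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(int argc, char **argv) {
        long n_mb  = (argc > 1) ? atol(argv[1]) : 1;     /* file size in MB */
        long total = n_mb * 1024 * 1024;
        char *buf  = malloc(32 * 1024);
        memset(buf, 'x', 32 * 1024);

        for (long chunk = 1; chunk <= 32 * 1024; chunk *= 2) {
            int fd = open("testfile", O_CREAT | O_WRONLY | O_TRUNC, 0644);
            double start = now_sec();
            for (long written = 0; written < total; written += chunk)
                write(fd, buf, chunk);
            close(fd);
            double elapsed = now_sec() - start;
            printf("chunk %6ld bytes: %.0f KB/sec\n",
                   chunk, total / elapsed / 1024.0);
            unlink("testfile");
        }
        free(buf);
        return 0;
    }

Run it once for each value of n (1, 5, 10 and 20 MB) and plot the reported rates against the chunk size.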

 

Your experiment will result in a graph with several curves, one for each value of n, and with 16 data points on each (one per chunk size from 1 byte to 32KB).  How do you explain the shapes of these curves?  Why aren’t the curves flat lines?

 

You’ll see that the I/O rate of the file system is very low when writing small amounts of data, e.g. a byte at a time, because every tiny write pays the full system-call and buffer-management overhead.  Performance becomes pretty good once you write a block or so at a time.  With very large transfers it can slow down again because the O/S needs to do more and more work just to build the file for you – allocating blocks, and eventually indirect blocks, as the file grows.  If the file already exists and you are simply rewriting into it, this effect is smaller; it is the same effect noted in problem 2, but without the added cost of needing to allocate blocks for the file.