CS414 Homework 5. Solutions
1. Suppose that a file with contents F has two “names” in the file system, A and B. You delete the file by deleting name B, then create a new file with contents F’ and give it this same name, B.
If A is a symbolic link to B, you’ll see F’. A just contains the name “B”, so opening A shows whatever the file currently named B contains – F’.

If B was a symbolic link to A, you’ll see F. Once B was removed, the symbolic link to A was gone; the recreated B is an ordinary file containing F’, while A still contains F.

If A and B are hard links, you’ll see F. There was actually just one “real” file at first, with two names. When you removed B, one link to that file was gone, and when B was recreated, a brand-new file was created with a single link to it from the name B. Thus we end up with two files – A is the name for the file with contents F, and B the name for the new file with contents F’.
2. Suppose that you carefully measure file copy times on your PC and discover that for files up to 8 blocks in length, copying occurs at 550KB/sec. As the file grows to include even a single byte of the 9th block, copying suddenly slows down to 500KB/sec. You check a bunch of larger files and this 500KB/sec number is typical for most of them, but there are a few enormous files on your computer for which copying is even slower: 400KB/sec. Explain why you are getting these different numbers.
The point here is that as files get larger, the way the file system represents the blocks within the file changes. A very small file can be represented using the inode alone, which usually lists a few block numbers directly in the inode structure. As files grow we switch representations: first we add one level of pointer indirection (a block listed in the inode contains a list of block numbers for the file), and then, for huge files, a second level of indirection. Reading the indirect pointers takes extra disk accesses, which accounts for the modest loss of transfer speed at each size threshold.
3. Suppose that a new generation of disks will use special laser optical access technology with no moving parts. This technology has identical access time for all blocks on the disk, no matter where the block is located. Would the file system need to worry about placement of blocks close to inodes or file fragmentation when using such a disk? (In fact a form of this type of disk already exists – memory sticks for digital cameras and PDAs are treated like disks, but because the underlying technology is flash memory, access to the data has the same cost no matter what you access and no matter what the access pattern.)
This would make a very big difference. Current file systems go to great lengths to cluster data close to the inode and thus minimize the overhead of disk arm (head) motion. The new disks described here would eliminate that issue and simplify life enormously. There would also be no issue of fragmentation when using such a disk: files would still become fragmented, but there would be no seek cost associated with it. The only remaining concern would arise if contiguous blocks could somehow be read as a single “large” I/O transfer while fragmented files required multiple smaller ones – in that case there would still be a reason to keep files laid out as sequentially as possible.
4. An experiment. On your PC or a department computer, using C or C++, write a program that outputs characters into a file (using the file “create” and “write” system calls). Write an n MB file in chunks of various sizes, starting with a single byte and then doubling the “chunk size” repeatedly until it reaches 32KB (2^15 bytes). Run your program and time it. Convert these times into an I/O “rate” (bytes/second) and graph the results for a few values of n: 1, 5, 10 and 20.
We didn’t get a chance to make these graphs to show you the expected
answer.
Your experiment will result in a graph with several curves, one for each value of n, and with 16 data points on each. How do you explain the shapes of these curves? Why aren’t the curves flat lines?
You’ll see that the I/O rate of the file system is very low when writing small amounts of data, e.g. a byte at a time, because each tiny write still pays the full cost of a system call. Performance becomes pretty good once you write a block or so at a time. With very large transfers it can slow down again, because the O/S needs to do more and more work just to build the file for you. If the file already exists and you are simply rewriting into it, this effect is smaller; it would be the same one noted in problem 2 but without the added cost of needing to allocate blocks for the file.