<article>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#article09_11_02_2117206</id>
	<title>ZFS Gets Built-In Deduplication</title>
	<author>ScuttleMonkey</author>
	<datestamp>1257160860000</datestamp>
	<htmltext>elREG writes to mention that <a href="http://www.theregister.co.uk/2009/11/02/zfs_gets_dedupe/">Sun's ZFS now has built-in deduplication</a> utilizing a master hash function to map duplicate blocks of data to a single block instead of storing multiples.  <i>"File-level deduplication has the lowest processing overhead but is the least efficient method. Block-level dedupe requires more processing power, and is said to be good for virtual machine images. Byte-range dedupe uses the most processing power and is ideal for small pieces of data that may be replicated and are not block-aligned, such as e-mail attachments. Sun reckons such deduplication is best done at the application level since an app would know about the data.  ZFS provides block-level deduplication, using SHA256 hashing, and it maps naturally to ZFS's 256-bit block checksums. The deduplication is done inline, with ZFS assuming it's running with a multi-threaded operating system and on a server with lots of processing power. A multi-core server, in other words."</i></htmltext>
<tokentext>elREG writes to mention that Sun 's ZFS now has built-in deduplication utilizing a master hash function to map duplicate blocks of data to a single block instead of storing multiples .
" File-level deduplication has the lowest processing overhead but is the least efficient method .
Block-level dedupe requires more processing power , and is said to be good for virtual machine images .
Byte-range dedupe uses the most processing power and is ideal for small pieces of data that may be replicated and are not block-aligned , such as e-mail attachments .
Sun reckons such deduplication is best done at the application level since an app would know about the data .
ZFS provides block-level deduplication , using SHA256 hashing , and it maps naturally to ZFS 's 256-bit block checksums .
The deduplication is done inline , with ZFS assuming it 's running with a multi-threaded operating system and on a server with lots of processing power .
A multi-core server , in other words .
"</tokentext>
<sentencetext>elREG writes to mention that Sun's ZFS now has built-in deduplication utilizing a master hash function to map duplicate blocks of data to a single block instead of storing multiples.
"File-level deduplication has the lowest processing overhead but is the least efficient method.
Block-level dedupe requires more processing power, and is said to be good for virtual machine images.
Byte-range dedupe uses the most processing power and is ideal for small pieces of data that may be replicated and are not block-aligned, such as e-mail attachments.
Sun reckons such deduplication is best done at the application level since an app would know about the data.
ZFS provides block-level deduplication, using SHA256 hashing, and it maps naturally to ZFS's 256-bit block checksums.
The deduplication is done inline, with ZFS assuming it's running with a multi-threaded operating system and on a server with lots of processing power.
A multi-core server, in other words.
"</sentencetext>
</article>
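The summary above compresses the mechanism into one sentence, so a sketch may help: block-level dedup keyed by a SHA-256 digest, applied inline on the write path. This is an illustrative toy, not ZFS code; the class and method names are invented, and as the summary notes, real ZFS keys its dedup table off the 256-bit block checksums it already keeps rather than hashing twice.

```python
import hashlib

class DedupTable:
    """Toy inline block dedup in the spirit of the summary above."""

    def __init__(self):
        self.blocks = {}    # digest -> stored block
        self.refcount = {}  # digest -> number of references to it

    def write_block(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.blocks:
            self.refcount[digest] += 1   # duplicate: share, don't store
        else:
            self.blocks[digest] = data   # first copy: store it
            self.refcount[digest] = 1
        return digest                    # the digest acts as the block pointer

table = DedupTable()
a = table.write_block(b"x" * 4096)
b = table.write_block(b"x" * 4096)        # identical content
assert a == b and len(table.blocks) == 1  # one physical copy, two references
```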
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959318</id>
	<title>BTRFS is better</title>
	<author>Anonymous</author>
	<datestamp>1257178680000</datestamp>
	<modclass>Interesting</modclass>
	<modscore>2</modscore>
	<htmltext><p>At first, BTRFS started out as an also-ran, trying to duplicate a bunch of ZFS features for Linux (where the licensing wasn't compatible with incorporating ZFS into Linux).  But then BTRFS took a number of things that were overly rigid about ZFS (shrinking volumes, block sizes, and some other stuff), and made them better, including totally unifying how data and metadata are stored.  I'm sure there are a number of ways in which ZFS is still better (RAIDZ), but putting aside some of the enterprise features that most of us don't need, BTRFS is turning out to be more flexible, more expandable, more efficient, and better supported.</p></htmltext>
<tokentext>At first , BTRFS started out as an also-ran , trying to duplicate a bunch of ZFS features for Linux ( where the licensing was n't compatible with incorporating ZFS into Linux ) .
But then BTRFS took a number of things that were overly rigid about ZFS ( shrinking volumes , block sizes , and some other stuff ) , and made them better , including totally unifying how data and metadata are stored .
I 'm sure there are a number of ways in which ZFS is still better ( RAIDZ ) , but putting aside some of the enterprise features that most of us do n't need , BTRFS is turning out to be more flexible , more expandable , more efficient , and better supported .</tokentext>
<sentencetext>At first, BTRFS started out as an also-ran, trying to duplicate a bunch of ZFS features for Linux (where the licensing wasn't compatible with incorporating ZFS into Linux).
But then BTRFS took a number of things that were overly rigid about ZFS (shrinking volumes, block sizes, and some other stuff), and made them better, including totally unifying how data and metadata are stored.
I'm sure there are a number of ways in which ZFS is still better (RAIDZ), but putting aside some of the enterprise features that most of us don't need, BTRFS is turning out to be more flexible, more expandable, more efficient, and better supported.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29965190</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>Anonymous</author>
	<datestamp>1257271440000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>And yet, until Windows Vista, if you deleted a &quot;reparse point&quot; in Windows Explorer, it also <i>deleted the original directory it linked to</i>.</p><p>I prefer the &quot;weaker&quot; symlinks, thanks.</p></htmltext>
<tokentext>And yet , until Windows Vista , if you deleted a " reparse point " in Windows Explorer , it also deleted the original directory it linked to . I prefer the " weaker " symlinks , thanks .</tokentext>
<sentencetext>And yet, until Windows Vista, if you deleted a "reparse point" in Windows Explorer, it also deleted the original directory it linked to. I prefer the "weaker" symlinks, thanks.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957986</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961728</id>
	<title>Re:This is good news...</title>
	<author>BrentH</author>
	<datestamp>1257249660000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>I want my Zee Ef Ess.....

<br> <br> <i>guitar solo</i></htmltext>
<tokentext>I want my Zee Ef Ess.... . guitar solo</tokentext>
<sentencetext>I want my Zee Ef Ess.....

  guitar solo</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959438</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959156</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>Captain Segfault</author>
	<datestamp>1257177360000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>This is block based. Changing one block of each file will only result in one new block written, not a full copy of the file -- unless the file is only one block.</p></htmltext>
<tokentext>This is block based .
Changing one block of each file will only result in one new block written , not a full copy of the file -- unless the file is only one block .</tokentext>
<sentencetext>This is block based.
Changing one block of each file will only result in one new block written, not a full copy of the file -- unless the file is only one block.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957658</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962128</id>
	<title>Re:BTRFS is better</title>
	<author>samjam</author>
	<datestamp>1257254280000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>And so I thank Apple for being mean and stinky about ZFS, or we wouldn't get BTRFS</p></htmltext>
<tokentext>And so I thank Apple for being mean and stinky about ZFS , or we would n't get BTRFS</tokentext>
<sentencetext>And so I thank Apple for being mean and stinky about ZFS, or we wouldn't get BTRFS</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959318</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958384</id>
	<title>There are three types of files.</title>
	<author>Animats</author>
	<datestamp>1257173040000</datestamp>
	<modclass>Interesting</modclass>
	<modscore>5</modscore>
	<htmltext><p>
I'd argue that file systems should know about and support three types of files:
</p><ul>
<li> <b>Unit files.</b> Unit files are written once, and change only by being replaced.  Most common files are unit files.  Program executables, HTML files, etc. are unit files.  The file system should guarantee that if you open a unit file, you will always read a consistent version; it will never change underneath a read.  Unit files are replaced by opening for write, writing a new version, and closing; upon close, the new version replaces the old. In the event of a system crash during writing, the old version of the file remains.  If the writing program crashes before an explicit close, the old file remains. Unit files are good candidates for unduplication via hashing.  While the file is open for writing, attempts to open for reading open the old version.  This should be the default mode.  (This would be a big convenience; you always read a good version.  Good programs try to fake this by writing a new file, then renaming it to replace the old file, but most operating systems and file systems don't support atomic multiple rename, so there's a window of vulnerability. The file system should give you that for free.)</li>
<li> <b>Log files.</b> Log files can only be appended to.  UNIX supports this, with an open mode of O_APPEND.  But it doesn't enforce it (you can still seek) and NFS doesn't implement it properly.  Nor does Windows.  Opens of a log file for reading should be guaranteed that they will always read exactly out to the last write.  In the event of a system crash during writing, log files may be truncated, but must be truncated at an exact write boundary; trailing off into junk is unacceptable.  Unduplication via hashing probably isn't worth the trouble.</li>
<li> <b>Managed files.</b> Managed files are random-access files managed by a database or archive program.  Random access is supported.  The use of open modes O_SYNC, O_EXCL, or O_DIRECT during file creation indicates a managed file.  Seeks while open for write are permitted, multiple opens access the same file, and O_SYNC and O_EXCL must work as documented. Unduplication via hashing probably isn't worth the trouble and is bad for database integrity.</li>
</ul><p>
That's a useful way to look at files.  Almost all files are "unit" files; they're written once and are never changed; they're only replaced.  A relatively small number of programs and libraries use "managed" files, and they're mostly databases of one kind or another. Those are the programs that have to manage files very carefully, and those programs are usually written to be aware of concurrency and caching issues.
</p><p>
Unix and Linux have the right modes defined.  File systems just need to use them properly.</p></htmltext>
<tokentext>I 'd argue that file systems should know about and support three types of files : Unit files .
Unit files are written once , and change only by being replaced .
Most common files are unit files .
Program executables , HTML files , etc. are unit files .
The file system should guarantee that if you open a unit file , you will always read a consistent version ; it will never change underneath a read .
Unit files are replaced by opening for write , writing a new version , and closing ; upon close , the new version replaces the old .
In the event of a system crash during writing , the old version of the file remains .
If the writing program crashes before an explicit close , the old file remains .
Unit files are good candidates for unduplication via hashing .
While the file is open for writing , attempts to open for reading open the old version .
This should be the default mode .
( This would be a big convenience ; you always read a good version .
Good programs try to fake this by writing a new file , then renaming it to replace the old file , but most operating systems and file systems do n't support atomic multiple rename , so there 's a window of vulnerability .
The file system should give you that for free . )
Log files . Log files can only be appended to .
UNIX supports this , with an open mode of O_APPEND .
But it does n't enforce it ( you can still seek ) and NFS does n't implement it properly .
Nor does Windows .
Opens of a log file for reading should be guaranteed that they will always read exactly out to the last write .
In the event of a system crash during writing , log files may be truncated , but must be truncated at an exact write boundary ; trailing off into junk is unacceptable .
Unduplication via hashing probably is n't worth the trouble .
Managed files . Managed files are random-access files managed by a database or archive program .
Random access is supported .
The use of open modes O_SYNC , O_EXCL , or O_DIRECT during file creation indicates a managed file .
Seeks while open for write are permitted , multiple opens access the same file , and O_SYNC and O_EXCL must work as documented .
Unduplication via hashing probably is n't worth the trouble and is bad for database integrity .
That 's a useful way to look at files .
Almost all files are " unit " files ; they 're written once and are never changed ; they 're only replaced .
A relatively small number of programs and libraries use " managed " files , and they 're mostly databases of one kind or another .
Those are the programs that have to manage files very carefully , and those programs are usually written to be aware of concurrency and caching issues .
Unix and Linux have the right modes defined .
File systems just need to use them properly .</tokentext>
<sentencetext>
I'd argue that file systems should know about and support three types of files:

 Unit files.
Unit files are written once, and change only by being replaced.
Most common files are unit files.
Program executables, HTML files, etc. are unit files.
The file system should guarantee that if you open a unit file, you will always read a consistent version; it will never change underneath a read.
Unit files are replaced by opening for write, writing a new version, and closing; upon close, the new version replaces the old.
In the event of a system crash during writing, the old version of the file remains.
If the writing program crashes before an explicit close, the old file remains.
Unit files are good candidates for unduplication via hashing.
While the file is open for writing, attempts to open for reading open the old version.
This should be the default mode.
(This would be a big convenience; you always read a good version.
Good programs try to fake this by writing a new file, then renaming it to replace the old file, but most operating systems and file systems don't support atomic multiple rename, so there's a window of vulnerability.
The file system should give you that for free.)
Log files. Log files can only be appended to.
UNIX supports this, with an open mode of O_APPEND.
But it doesn't enforce it (you can still seek) and NFS doesn't implement it properly.
Nor does Windows.
Opens of a log file for reading should be guaranteed that they will always read exactly out to the last write.
In the event of a system crash during writing, log files may be truncated, but must be truncated at an exact write boundary; trailing off into junk is unacceptable.
Unduplication via hashing probably isn't worth the trouble.
Managed files. Managed files are random-access files managed by a database or archive program.
Random access is supported.
The use of open modes O_SYNC, O_EXCL, or O_DIRECT during file creation indicates a managed file.
Seeks while open for write are permitted, multiple opens access the same file, and O_SYNC and O_EXCL must work as documented.
Unduplication via hashing probably isn't worth the trouble and is bad for database integrity.
That's a useful way to look at files.
Almost all files are "unit" files; they're written once and are never changed; they're only replaced.
A relatively small number of programs and libraries use "managed" files, and they're mostly databases of one kind or another.
Those are the programs that have to manage files very carefully, and those programs are usually written to be aware of concurrency and caching issues.
Unix and Linux have the right modes defined.
File systems just need to use them properly.</sentencetext>
</comment>
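Animats's unit-file semantics can be approximated today with the write-new-then-rename workaround he mentions; here is a minimal sketch (the function name and file path are invented, and this only narrows the vulnerability window he describes on a single file rather than providing true filesystem-level support):

```python
import os
import tempfile

def replace_atomically(path: str, data: bytes) -> None:
    """Write a new version beside the target, then atomically swap it in.
    Readers see either the old file or the new one, never a partial write;
    a crash before os.replace() leaves the old version intact."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the new bytes to stable storage
        os.replace(tmp, path)     # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp)            # failed: discard the half-written version
        raise

replace_atomically("example.html", b"<html>new version</html>\n")
```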
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957074</id>
	<title>Re:Wake me when they build it into the hard disk</title>
	<author>Anonymous</author>
	<datestamp>1257167160000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>True, though commodity grade hard drives are so inexpensive these days that the cost of providing a generously larger amount of them than what you plan to store is usually not a big deal.<br>The only really expensive drives today are high end enterprise type SAS / SAN / SCSI units or FLASH based ones.  If you're storing media like digitized video, the benefits of dedup are usually insignificant since you're unlikely to accidentally / routinely have duplicated data at anything less than the file level, and at the file level you'd probably have easy options not to duplicate that content by design if so desired.</p><p>The REAL "wake me up when they integrate it into the drive itself" list for me is:<br>* drive integrated mirroring at a head/platter level with the functions of different platters being independent enough so you could still stand a good chance of reading functional ones even after one head/platter is damaged.</p><p>* drive integrated gigabit / 10GbE Ethernet interfaces for commodity drives and an iSCSI protocol over IPv6.</p><p>* drive integrated ECC and spatial data striping at user selectable and much higher than default levels so that even a single drive could give you much better data reliability / redundancy across platters.</p><p>* drives with integrated encryption being the norm</p><p>* drives with built-in ZFS / NAS and the ability to link to each other over e.g. PCIE / infiniband so you could set up small clusters of RAIDZ'd drives just with a few cables and inexpensive drives.</p></htmltext>
<tokentext>True , though commodity grade hard drives are so inexpensive these days that the cost of providing a generously larger amount of them than what you plan to store is usually not a big deal . The only really expensive drives today are high end enterprise type SAS / SAN / SCSI units or FLASH based ones .
If you 're storing media like digitized video , the benefits of dedup are usually insignificant since you 're unlikely to accidentally / routinely have duplicated data at anything less than the file level , and at the file level you 'd probably have easy options not to duplicate that content by design if so desired . The REAL " wake me up when they integrate it into the drive itself " list for me is : * drive integrated mirroring at a head/platter level with the functions of different platters being independent enough so you could still stand a good chance of reading functional ones even after one head/platter is damaged .
* drive integrated gigabit / 10GbE Ethernet interfaces for commodity drives and an iSCSI protocol over IPv6 .
* drive integrated ECC and spatial data striping at user selectable and much higher than default levels so that even a single drive could give you much better data reliability / redundancy across platters .
* drives with integrated encryption being the norm * drives with built-in ZFS / NAS and the ability to link to each other over e.g. PCIE / infiniband so you could set up small clusters of RAIDZ 'd drives just with a few cables and inexpensive drives .</tokentext>
<sentencetext>True, though commodity grade hard drives are so inexpensive these days that the cost of providing a generously larger amount of them than what you plan to store is usually not a big deal. The only really expensive drives today are high end enterprise type SAS / SAN / SCSI units or FLASH based ones.
If you're storing media like digitized video, the benefits of dedup are usually insignificant since you're unlikely to accidentally / routinely have duplicated data at anything less than the file level, and at the file level you'd probably have easy options not to duplicate that content by design if so desired. The REAL "wake me up when they integrate it into the drive itself" list for me is: * drive integrated mirroring at a head/platter level with the functions of different platters being independent enough so you could still stand a good chance of reading functional ones even after one head/platter is damaged.
* drive integrated gigabit / 10GbE Ethernet interfaces for commodity drives and an iSCSI protocol over IPv6.
* drive integrated ECC and spatial data striping at user selectable and much higher than default levels so that even a single drive could give you much better data reliability / redundancy across platters.
* drives with integrated encryption being the norm * drives with built-in ZFS / NAS and the ability to link to each other over e.g. PCIE / infiniband so you could set up small clusters of RAIDZ'd drives just with a few cables and inexpensive drives.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958462</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>Drishmung</author>
	<datestamp>1257173460000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>No, not inefficient at all, because of the nature of ZFS.<p>
When you write in ZFS, it does not do a write-in-place, overwriting what was there before. What it does is write a new block somewhere else, then mark the old block as free for garbage collection.</p><p>
With de-dup, each block also has a reference count. When you write a block, it notes that the ref count is greater than one, and does not mark the old block as food for the GC until the ref count decrements to zero.</p><p>Note that this is at the block level, not the file level. Which means that de-dup is very efficient and has no particular performance penalty for writes. The performance hit only comes in identifying duplicate blocks.</p><p>In the above instance, ZFS needs to check to see if the block it just wrote already exists. It does this by calculating a block checksum (which it does anyway, so no extra overhead there), and then looking that up in a table to see if it already exists. If it does, then it changes the reference count of the existing block.</p><p>Note there are two ways to proceed here: One delays the write until the pre-existence of an identical block has been determined. The other writes the block out then checks asynchronously for a duplicate and if found fixes the reference count (and recycles the just written block). I'm not sure which route ZFS takes.</p></htmltext>
<tokentext>No , not inefficient at all , because of the nature of ZFS .
When you write in ZFS , it does not do a write-in-place , overwriting what was there before .
What it does is write a new block somewhere else , then mark the old block as free for garbage collection .
With de-dup , each block also has a reference count .
When you write a block , it notes that the ref count is greater than one , and does not mark the old block as food for the GC until the ref count decrements to zero . Note that this is at the block level , not the file level .
Which means that de-dup is very efficient and has no particular performance penalty for writes .
The performance hit only comes in identifying duplicate blocks . In the above instance , ZFS needs to check to see if the block it just wrote already exists .
It does this by calculating a block checksum ( which it does anyway , so no extra overhead there ) , and then looking that up in a table to see if it already exists .
If it does , then it changes the reference count of the existing block . Note there are two ways to proceed here : One delays the write until the pre-existence of an identical block has been determined .
The other writes the block out then checks asynchronously for a duplicate and if found fixes the reference count ( and recycles the just written block ) .
I 'm not sure which route ZFS takes .</tokentext>
<sentencetext>No, not inefficient at all, because of the nature of ZFS.
When you write in ZFS, it does not do a write-in-place, overwriting what was there before.
What it does is write a new block somewhere else, then mark the old block as free for garbage collection.
With de-dup, each block also has a reference count.
When you write a block, it notes that the ref count is greater than one, and does not mark the old block as food for the GC until the ref count decrements to zero. Note that this is at the block level, not the file level.
Which means that de-dup is very efficient and has no particular performance penalty for writes.
The performance hit only comes in identifying duplicate blocks. In the above instance, ZFS needs to check to see if the block it just wrote already exists.
It does this by calculating a block checksum (which it does anyway, so no extra overhead there), and then looking that up in a table to see if it already exists.
If it does, then it changes the reference count of the existing block. Note there are two ways to proceed here: One delays the write until the pre-existence of an identical block has been determined.
The other writes the block out then checks asynchronously for a duplicate and if found fixes the reference count (and recycles the just written block).
I'm not sure which route ZFS takes.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957388</parent>
</comment>
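To make the parent's write path concrete, here is a toy copy-on-write overwrite with reference counts; the names are invented for illustration and this is not the ZFS implementation (which, as the parent notes, may order the duplicate check differently):

```python
import hashlib

blocks = {}    # digest -> block contents
refcount = {}  # digest -> number of live references

def put(data: bytes) -> str:
    """Write a block, sharing it if an identical block already exists."""
    d = hashlib.sha256(data).hexdigest()
    if d in blocks:
        refcount[d] += 1           # duplicate: bump the ref count
    else:
        blocks[d] = data
        refcount[d] = 1
    return d

def overwrite(old: str, new_data: bytes) -> str:
    """Copy-on-write: the new block is written somewhere else, and the old
    block is only reclaimed when its reference count drops to zero."""
    new = put(new_data)
    refcount[old] -= 1
    if refcount[old] == 0:
        del blocks[old], refcount[old]  # old block becomes garbage
    return new
```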
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957936</id>
	<title>NTFS has bit level dedup</title>
	<author>Cur8or</author>
	<datestamp>1257170820000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext>Just store one 0 and one 1.
Then just store references to each from in the bits.</htmltext>
<tokentext>Just store one 0 and one 1 .
Then just store references to each from in the bits .</tokentext>
<sentencetext>Just store one 0 and one 1.
Then just store references to each from in the bits.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962376</id>
	<title>Re:More reason to be a ZFS fanboy</title>
	<author>Anonymous</author>
	<datestamp>1257257040000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>How about you go back to playing with your toy OS of choice?</p><p>How about you RTFM?</p><p>How about you understand how ZFS works, what's its model of storage, what are devices and vdevs, and that they are not interchangeable.</p><p>How about you don't expect ZFS to behave the way you want? For the fantastic tool that it is, oh, and free, you complain loudly if things aren't done The Way Ignorants Think Ought To Be Done(TM).</p><p>How about you issue the right commands to offline a device that you plan to substitute instead of telling ZFS to forget about it for the time being and then complain that ZFS did what you told it to?</p><p>Downsizing a pool is the only half-sane requirement for ZFS that you manage to write (and that's only because you saw it written somewhere else). I might half-agree to that but I fail to see how that is a priority in our ever-expanding, data guzzling environments. Again, ZFS is meant as an enterprise tool. The only time you want to delete data in a company is because you are closing for good.</p><p>How about you have a nice day, and RTFM.</p></htmltext>
<tokentext>How about you go back to playing with your toy OS of choice ? How about you RTFM ? How about you understand how ZFS works , what 's its model of storage , what are devices and vdevs , and that they are not interchangeable . How about you do n't expect ZFS to behave the way you want ?
For the fantastic tool that it is , oh , and free , you complain loudly if things are n't done The Way Ignorants Think Ought To Be Done ( TM ) . How about you issue the right commands to offline a device that you plan to substitute instead of telling ZFS to forget about it for the time being and then complain that ZFS did what you told it to ? Downsizing a pool is the only half-sane requirement for ZFS that you manage to write ( and that 's only because you saw it written somewhere else ) .
I might half-agree to that but I fail to see how that is a priority in our ever-expanding , data guzzling environments .
Again , ZFS is meant as an enterprise tool .
The only time you want to delete data in a company is because you are closing for good . How about you have a nice day , and RTFM .</tokentext>
<sentencetext>How about you go back to playing with your toy OS of choice? How about you RTFM? How about you understand how ZFS works, what's its model of storage, what are devices and vdevs, and that they are not interchangeable. How about you don't expect ZFS to behave the way you want?
For the fantastic tool that it is, oh, and free, you complain loudly if things aren't done The Way Ignorants Think Ought To Be Done(TM). How about you issue the right commands to offline a device that you plan to substitute instead of telling ZFS to forget about it for the time being and then complain that ZFS did what you told it to? Downsizing a pool is the only half-sane requirement for ZFS that you manage to write (and that's only because you saw it written somewhere else).
I might half-agree to that but I fail to see how that is a priority in our ever-expanding, data guzzling environments.
Again, ZFS is meant as an enterprise tool.
The only time you want to delete data in a company is because you are closing for good. How about you have a nice day, and RTFM.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957072</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959884</id>
	<title>Re:Open Source Cures Cancer</title>
	<author>frankm_slashdot</author>
	<datestamp>1257183840000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>I breathe a sigh of relief every time someone combats the "use open source" with "blah blah blah, AutoCAD, blah blah blah". It's like you're reading my mind.</p><p>AutoCAD has barely acceptable performance when running on *great* hardware, let alone a virtualized instance or whatever concoction the avoid-windows-at-all-costs crowd would come up with. Not to say the grandparent-commenter falls into this category but very often people do.</p><p>When people step into my computer room they usually note the PC, MacBook, G5 and Solaris box and give me the 50 questions. My usual response is "AutoCAD, travel, Final Cut/Logic/everyday use &amp; ZFS storage pool".</p></htmltext>
<tokentext>I breathe a sigh of relief every time someone combats the " use open source " with " blah blah blah , AutoCAD , blah blah blah " .
It 's like you 're reading my mind . AutoCAD has barely acceptable performance when running on * great * hardware , let alone a virtualized instance or whatever concoction the avoid-windows-at-all-costs crowd would come up with .
Not to say the grandparent-commenter falls into this category but very often people do . When people step into my computer room they usually note the PC , MacBook , G5 and Solaris box and give me the 50 questions .
My usual response is " AutoCAD , travel , Final Cut/Logic/everyday use &amp; ZFS storage pool " .</tokentext>
<sentencetext>I breathe a sigh of relief every time someone combats the "use open source" with "blah blah blah, AutoCAD, blah blah blah".
It's like you're reading my mind. AutoCAD has barely acceptable performance when running on *great* hardware, let alone a virtualized instance or whatever concoction the avoid-windows-at-all-costs crowd would come up with.
Not to say the grandparent-commenter falls into this category but very often people do. When people step into my computer room they usually note the PC, MacBook, G5 and Solaris box and give me the 50 questions.
My usual response is "AutoCAD, travel, Final Cut/Logic/everyday use &amp; ZFS storage pool".</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957442</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961640</id>
	<title>I wrote about this</title>
	<author>jesset77</author>
	<datestamp>1257248520000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>I recently wrote an article about my thoughts on filesystems and operating systems by way of a fictional reference OS mentioning ZFS in a positive light for reasons including the dedupe feature mentioned in today's article:</p><p> <a href="http://bbt8.blogspot.com/2009/10/ironcloud-outline-of-what-modern-os.html" title="blogspot.com" rel="nofollow">IRON/Cloud &mdash; the outline of what a modern OS should be</a> [blogspot.com] </p><p>I link back to the (yes, slashdot) article wherein I first learned about ZFS, and a rundown of the features I like about ZFS.</p><p>But no, I checked and our article texts do not hash to the same value, so I do not believe we would be stored at the same location on disk.<nobr> <wbr></nobr>;D</p></htmltext>
<tokentext>I recently wrote an article about my thoughts on filesystems and operating systems by way of a fictional reference OS mentioning ZFS in a positive light for reasons including the dedupe feature mentioned in today 's article : IRON/Cloud — the outline of what a modern OS should be [ blogspot.com ] I link back to the ( yes , slashdot ) article wherein I first learned about ZFS , and a rundown of the features I like about ZFS . But no , I checked and our article texts do not hash to the same value , so I do not believe we would be stored at the same location on disk .
; D</tokentext>
<sentencetext>I recently wrote an article about my thoughts on filesystems and operating systems by way of a fictional reference OS mentioning ZFS in a positive light for reasons including the dedupe feature mentioned in today's article: IRON/Cloud — the outline of what a modern OS should be [blogspot.com] I link back to the (yes, slashdot) article wherein I first learned about ZFS, and a rundown of the features I like about ZFS. But no, I checked and our article texts do not hash to the same value, so I do not believe we would be stored at the same location on disk.
;D</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956668</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957268</id>
	<title>What's the point?</title>
	<author>Mask</author>
	<datestamp>1257168000000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>The amount of resources it reportedly takes makes this not so practical.</p><p>What would one want deduplication for? The cost of disk storage has two big elements - speed (latency &amp; throughput) and backup.</p><p>It does not seem that this technology would help much in the speed department; it might actually hurt. Managing copy on write has several potential costs. It may help backup if the backup program knows the fine details of deduplication, but that means that old backup software will have to be replaced.</p><p>It reminds me of the compressed file system I used to have on my old SLS Linux PC which had a small disk (1992 if memory serves me right). It was dog slow to run X11 on it. I have not seen a compressed file system since; there was no need. Disk storage grows much faster than my need for data.</p></htmltext>
<tokentext>The amount of resources it reportedly takes makes this not so practical . What would one want deduplication for ?
The cost of disk storage has two big elements - speed ( latency &amp; throughput ) and backup . It does not seem that this technology would help much in the speed department ; it might actually hurt .
Managing copy on write has several potential costs .
It may help backup if the backup program knows the fine details of deduplication , but that means that old backup software will have to be replaced . It reminds me of the compressed file system I used to have on my old SLS Linux PC which had a small disk ( 1992 if memory serves me right ) .
It was dog slow to run X11 on it .
I have not seen a compressed file system since , there was no need .
Disk storage grows much faster than my need for data .</tokentext>
<sentencetext>The amount of resources it reportedly takes makes this not so practical. What would one want deduplication for?
The cost of disk storage has two big elements - speed (latency &amp; throughput) and backup. It does not seem that this technology would help much in the speed department; it might actually hurt.
Managing copy on write has several potential costs.
It may help backup if the backup program knows the fine details of deduplication, but that means that old backup software will have to be replaced. It reminds me of the compressed file system I used to have on my old SLS Linux PC which had a small disk (1992 if memory serves me right).
It was dog slow to run X11 on it.
I have not seen a compressed file system since, there was no need.
Disk storage grows much faster than my need for data.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29966948</id>
	<title>Re:What if...</title>
	<author>Sulphur</author>
	<datestamp>1257280080000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>In Soviet Russia Zek File System never stores dupes.</p><p>--</p><p>This is an upper bound, not an actual value.</p></htmltext>
<tokentext>In Soviet Russia Zek File System never stores dupes . -- This is an upper bound , not an actual value .</tokentext>
<sentencetext>In Soviet Russia Zek File System never stores dupes. -- This is an upper bound, not an actual value.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961166</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960186</id>
	<title>Re:More reason to be a ZFS fanboy</title>
	<author>Samah</author>
	<datestamp>1257186420000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>I've been a ZFS fanboy for quite a while now, and last weekend I finally made the shift from Ubuntu to OpenSolaris on my server.  At the moment I'm just using a basic mirrored pool and no raid-z (it scares me).</p><p>The only thing that really bugs me at the moment is its poor support for ext2/3.  I was getting ridiculously slow transfer speeds from my old drives (sub 100k/s) and/or hard locks where I've had to kill off the copy process and hope that I can unmount it.</p><p>I've been booting into Ubuntu, copying files over the network to my desktop PC, booting OpenSolaris, and copying back.  I'm sure there's an easier way to copy 4TB, but I'm not in a hurry.</p><p>What have been your experiences in migration from Linux to OpenSolaris?</p><p>*Disclaimer: I use Solaris 10 at work.</p></htmltext>
<tokentext>I 've been a ZFS fanboy for quite a while now , and last weekend I finally made the shift from Ubuntu to OpenSolaris on my server .
At the moment I 'm just using a basic mirrored pool and no raid-z ( it scares me ) . The only thing that really bugs me at the moment is its poor support for ext2/3 .
I was getting ridiculously slow transfer speeds from my old drives ( sub 100k/s ) and/or hard locks where I 've had to kill off the copy process and hope that I can unmount it . I 've been booting into Ubuntu , copying files over the network to my desktop PC , booting OpenSolaris , and copying back .
I 'm sure there 's an easier way to copy 4TB , but I 'm not in a hurry . What have been your experiences in migration from Linux to OpenSolaris ?
* Disclaimer : I use Solaris 10 at work .</tokentext>
<sentencetext>I've been a ZFS fanboy for quite a while now, and last weekend I finally made the shift from Ubuntu to OpenSolaris on my server.
At the moment I'm just using a basic mirrored pool and no raid-z (it scares me). The only thing that really bugs me at the moment is its poor support for ext2/3.
I was getting ridiculously slow transfer speeds from my old drives (sub 100k/s) and/or hard locks where I've had to kill off the copy process and hope that I can unmount it. I've been booting into Ubuntu, copying files over the network to my desktop PC, booting OpenSolaris, and copying back.
I'm sure there's an easier way to copy 4TB, but I'm not in a hurry. What have been your experiences in migration from Linux to OpenSolaris?
*Disclaimer: I use Solaris 10 at work.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956752</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956712</id>
	<title>ehem</title>
	<author>oldhack</author>
	<datestamp>1257164880000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext>Before we get all excited and look all silly, can somebody confirm with Netcraft first?</htmltext>
<tokentext>Before we get all excited and look all silly , can somebody confirm with Netcraft first ?</tokentext>
<sentencetext>Before we get all excited and look all silly, can somebody confirm with Netcraft first?</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957866</id>
	<title>Re:What's the point?</title>
	<author>Anonymous</author>
	<datestamp>1257170460000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>The canonical use case for dedup is backup servers.  Imagine you have one Solaris file server serving 40 workstations.  Each of these does a full backup of its 10GB Windows (or Linux, or whatever) install.  You then have 400GB of data, but only about 12GB of unique data.  Dedup lets you only store this 12GB, and you can store it with n redundant copies so it's easier to recover in cases of partial hardware failure.  Each workstation then does incremental backups, copying files with any changes to the server.  The server dedups these and only stores the changed blocks.  <p>
The clients can be using NFS, CIFS, or iSCSI for the backup, and the server has a complete disk image (and periodic snapshots) of the clients' disks, but uses a tiny fraction of the space that this may require.  </p><p>
Oh, and with regard to this:</p><p><div class="quote"><p>  It may help backup if the backup program knows the fine details of deduplication</p></div><p>The entire point of dedup in the FS layer is that the backup software can be completely unaware of it.  As long as it produces a copy of the data on the server, the server will handle turning it from a full backup or a per-file incremental backup into a per-block incremental backup.</p>
	</htmltext>
<tokentext>The canonical use case for dedup is backup servers .
Imagine you have one Solaris file server serving 40 workstations .
Each of these does a full backup of its 10GB Windows ( or Linux , or whatever ) install .
You then have 400GB of data , but only about 12GB of unique data .
Dedup lets you only store this 12GB , and you can store it with n redundant copies so it 's easier to recover in cases of partial hardware failure .
Each workstation then does incremental backups , copying files with any changes to the server .
The server dedups these and only stores the changed blocks .
The clients can be using NFS , CIFS , or iSCSI for the backup , and the server has a complete disk image ( and periodic snapshots ) of the clients ' disks , but uses a tiny fraction of the space that this may require .
Oh , and with regard to this : It may help backup if the backup program knows the fine details of deduplication . The entire point of dedup in the FS layer is that the backup software can be completely unaware of it .
As long as it produces a copy of the data on the server , the server will handle turning it from a full backup or a per-file incremental backup into a per-block incremental backup .</tokentext>
<sentencetext>The canonical use case for dedup is backup servers.
Imagine you have one Solaris file server serving 40 workstations.
Each of these does a full backup of its 10GB Windows (or Linux, or whatever) install.
You then have 400GB of data, but only about 12GB of unique data.
Dedup lets you only store this 12GB, and you can store it with n redundant copies so it's easier to recover in cases of partial hardware failure.
Each workstation then does incremental backups, copying files with any changes to the server.
The server dedups these and only stores the changed blocks.
The clients can be using NFS, CIFS, or iSCSI for the backup, and the server has a complete disk image (and periodic snapshots) of the clients' disks, but uses a tiny fraction of the space that this may require.
Oh, and with regard to this:  It may help backup if the backup program knows the fine details of deduplication. The entire point of dedup in the FS layer is that the backup software can be completely unaware of it.
As long as it produces a copy of the data on the server, the server will handle turning it from a full backup or a per-file incremental backup into a per-block incremental backup.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957268</parent>
</comment>
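A toy content-addressed store shows why the parent's 400GB-to-12GB arithmetic works: identical images from many clients collapse to one set of unique chunks. The chunk size and image contents below are invented for illustration (ZFS dedups per block, not per fixed chunk):

```python
import hashlib

CHUNK = 128 * 1024   # fixed-size chunks for this sketch
store = {}           # digest -> chunk, shared across every client's backup

def backup(image: bytes) -> list:
    """Store a client disk image as a manifest of chunk digests."""
    manifest = []
    for i in range(0, len(image), CHUNK):
        chunk = image[i:i + CHUNK]
        d = hashlib.sha256(chunk).hexdigest()
        store.setdefault(d, chunk)   # identical chunks are stored once
        manifest.append(d)
    return manifest

# Forty workstations backing up the same install image:
image = b"".join(bytes([i]) * CHUNK for i in range(4))  # 4 distinct chunks
manifests = [backup(image) for _ in range(40)]
logical = 40 * len(image)
physical = sum(len(c) for c in store.values())
print(logical // physical)  # -> 40: forty full backups, one physical copy
```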
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29963946</id>
	<title>Re:This is good news...</title>
	<author>Anonymous</author>
	<datestamp>1257266280000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>http://en.wikipedia.org/wiki/Nexenta_OS</p></htmltext>
<tokentext>http : //en.wikipedia.org/wiki/Nexenta_OS</tokentext>
<sentencetext>http://en.wikipedia.org/wiki/Nexenta_OS</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959438</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961470</id>
	<title>Re:Hash Collisions</title>
	<author>kripkenstein</author>
	<datestamp>1257246240000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p><div class="quote"><p>The probability of a hash collision for a 256 bit hash (or even a 128 bit one) is negligible.</p><p>How negligible? Well, the probability of a collision is never more then N^2 / 2^h, where N is the number of blocks stored and h is the number of bits in the hash. So, if we have 2^64 blocks stored (a mere billion terabytes or so for 128 byte blocks) , the probability of a collision is less than 2^(-128), or 10^(-38). Hardly worth worrying about.</p><p>And that's an upper limit, not the actual value.</p></div><p>
There are a <b>lot</b> of assumptions there. For one thing, you assume that hash functions on normal data give 'random' hashes. Optimally that is the case, and it seems to be so in practice, but it isn't a mathematical certainty. In other words there is a risk here we cannot quantify.
<br> <br>
For another thing, hashes can have security vulnerabilities. That is, if someone is intentionally trying to find collisions, that might be easier than attempting to do so at random. This could then lead to attacks of the following sort:
</p><ul>
<li>Rent a VM on a hosted server</li><li>Find the hash value of some crucial area on the disk (e.g. part of the kernel). This might be easy if you know what OS they use.</li><li>Create a block with the same hash, potentially confusing the underlying filesystem into using yours.</li><li>(Most likely it won't, because it will use the older one. But in theory you can do this before say a security patch is applied, and your data will be used instead.)</li></ul><p>
Other attacks might be against data and not the OS, say if you know some data is stored on another VM on the same machine.
<br> <br>
I would personally not run this cool feature without the flag to actually check for duplicates.
</p>
	</htmltext>
<tokentext>The probability of a hash collision for a 256 bit hash ( or even a 128 bit one ) is negligible . How negligible ?
Well , the probability of a collision is never more than N ^ 2 / 2 ^ h , where N is the number of blocks stored and h is the number of bits in the hash .
So , if we have 2 ^ 64 blocks stored ( a mere billion terabytes or so for 128 byte blocks ) , the probability of a collision is less than 2 ^ ( -128 ) , or 10 ^ ( -38 ) .
Hardly worth worrying about . And that 's an upper limit , not the actual value .
There are a lot of assumptions there .
For one thing , you assume that hash functions on normal data give 'random ' hashes .
Optimally that is the case , and it seems to be so in practice , but it is n't a mathematical certainty .
In other words there is a risk here we can not quantify .
For another thing , hashes can have security vulnerabilities .
That is , if someone is intentionally trying to find collisions , that might be easier than attempting to do so at random .
This could then lead to attacks of the following sort : Rent a VM on a hosted server . Find the hash value of some crucial area on the disk ( e.g. part of the kernel ) .
This might be easy if you know what OS they use . Create a block with the same hash , potentially confusing the underlying filesystem into using yours .
( Most likely it wo n't , because it will use the older one .
But in theory you can do this before say a security patch is applied , and your data will be used instead .
) Other attacks might be against data and not the OS , say if you know some data is stored on another VM on the same machine .
I would personally not run this cool feature without the flag to actually check for duplicates .</tokentext>
<sentencetext>The probability of a hash collision for a 256 bit hash (or even a 128 bit one) is negligible. How negligible?
Well, the probability of a collision is never more than N^2 / 2^h, where N is the number of blocks stored and h is the number of bits in the hash.
So, if we have 2^64 blocks stored (a mere billion terabytes or so for 128 byte blocks), the probability of a collision is less than 2^(-128), or 10^(-38).
Hardly worth worrying about. And that's an upper limit, not the actual value.
There are a lot of assumptions there.
For one thing, you assume that hash functions on normal data give 'random' hashes.
Optimally that is the case, and it seems to be so in practice, but it isn't a mathematical certainty.
In other words there is a risk here we cannot quantify.
For another thing, hashes can have security vulnerabilities.
That is, if someone is intentionally trying to find collisions, that might be easier than attempting to do so at random.
This could then lead to attacks of the following sort:
Rent a VM on a hosted server. Find the hash value of some crucial area on the disk (e.g. part of the kernel).
This might be easy if you know what OS they use. Create a block with the same hash, potentially confusing the underlying filesystem into using yours.
(Most likely it won't, because it will use the older one.
But in theory you can do this before say a security patch is applied, and your data will be used instead.
)
Other attacks might be against data and not the OS, say if you know some data is stored on another VM on the same machine.
I would personally not run this cool feature without the flag to actually check for duplicates.

	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956856</parent>
</comment>
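The bound quoted in this exchange is easy to sanity-check numerically; the snippet below verifies only the quoted arithmetic, not the "hashes behave randomly" assumption the reply questions:

```python
from math import log10

# Quoted bound: P <= N^2 / 2^h for N stored blocks and an h-bit hash.
N_log2 = 64    # 2^64 blocks stored
h = 256        # SHA-256 output bits

bound_log2 = 2 * N_log2 - h      # log2 of the bound: 128 - 256 = -128
print(bound_log2)                # -128, i.e. the quoted 2^(-128)
print(bound_log2 * log10(2))     # about -38.5, so the bound is below 10^(-38)
```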
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960688</id>
	<title>Re:Wake me when they build it into the hard disk</title>
	<author>Hurricane78</author>
	<datestamp>1257191340000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Why didn't you simply copy it to *another* drive with built-in compression?</p><p>That will be $5000 then. Do you pay cash? ^^</p></htmltext>
<tokentext>Why did n't you simply copy it to * another * drive with built-in compression ? That will be $ 5000 then .
Do you pay cash ?
^ ^</tokentext>
<sentencetext>Why didn't you simply copy it to *another* drive with built-in compression? That will be $5000 then.
Do you pay cash?
^^</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29964416</id>
	<title>Re:well ...</title>
	<author>swordgeek</author>
	<datestamp>1257268260000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>We're starting to roll out ZFS in our (large!) enterprise. We've played with it in the lab, and in our internal support systems (e.g. documentation and authentication systems) enough to be comfortable with it.</p><p>However, you nailed the biggest weakness with it in five words:</p><p><i>"Sun support had no explanation."</i></p><p>We are a BIG Sun shop, and this has been our general experience with Sun in the last two years or so. Sun is bleeding competence faster than they can fire it. For every good person they lay off (because tech staff are expensive--especially tech support staff), two more will quit in disgust.</p><p>I'm a big Sun fan - have been since SunOS 4 was the new kid on the block. I also think that ZFS is the third-best thing since sliced bread (if they added volume shrinking and online relayout, it'd be #1). Solaris 10, for all of its warts, is still the best Unix on the market right now. However, I don't see Sun surviving much longer--enterprises with a lot of investment and loyalty are starting to turn away in frustration.</p></htmltext>
<tokentext>We 're starting to roll out ZFS in our ( large ! ) enterprise .
We 've played with it in the lab , and in our internal support systems ( e.g. documentation and authentication systems ) enough to be comfortable with it . However , you nailed the biggest weakness with it in five words : " Sun support had no explanation . "
We are a BIG Sun shop , and this has been our general experience with Sun in the last two years or so .
Sun is bleeding competence faster than they can fire it .
For every good person they lay off ( because tech staff are expensive--especially tech support staff ) , two more will quit in disgust . I 'm a big Sun fan - have been since SunOS 4 was the new kid on the block .
I also think that ZFS is the third-best thing since sliced bread ( if they added volume shrinking and online relayout , it 'd be # 1 ) .
Solaris 10 , for all of its warts , is still the best Unix on the market right now .
However , I do n't see Sun surviving much longer--enterprises with a lot of investment and loyalty are starting to turn away in frustration .</tokentext>
<sentencetext>We're starting to roll out ZFS in our (large!) enterprise.
We've played with it in the lab, and in our internal support systems (e.g. documentation and authentication systems) enough to be comfortable with it. However, you nailed the biggest weakness with it in five words: "Sun support had no explanation."
We are a BIG Sun shop, and this has been our general experience with Sun in the last two years or so.
Sun is bleeding competence faster than they can fire it.
For every good person they lay off (because tech staff are expensive--especially tech support staff), two more will quit in disgust.I'm a big Sun fan - have been since SunOS 4 was the new kid on the block.
I also think that ZFS is the third-best thing since sliced bread (if they added volume shrinking and online relayout, it'd be #1).
Solaris 10, for all of its warts, is still the best Unix on the market right now.
However, I don't see Sun surviving much longer--enterprises with a lot of investment and loyalty are starting to turn away in frustration.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957690</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957876</id>
	<title>Re:Hash Collisions</title>
	<author>Junta</author>
	<datestamp>1257170520000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>They have the 'verify' mode to do what you prescribe, though I'm presuming it comes with a hefty performance penalty.</p><p>I have no idea if they do this up front, inducing latency on all write operations, or as it goes.</p><p>What I would like to see is a strategy where it does the hash calculation, writes the block to a new part of the disk assuming it is unique, records the block location as an unverified block in a hash table, and schedules a dedupe scan if one is not already pending.  Then, a very low priority I/O task could scan that structure for block locations that have yet to be verified, and then scan all the blocks that match its hash for sameness and update the structures to retroactively make it a single copy (effectively unlinking a block deemed duplicate after the fact).  That gives the absolute hard guarantee of sameness without a write performance penalty.</p><p>I'm very far from a filesystem designer, and I recognize the likelihood of a collision given sufficiently large block size is low, but I'd really be wary of something that relies on not having bad luck to accidentally lose data on a write due to an unlikely hash collision.</p></htmltext>
<tokenext>They have the 'verify ' mode to do what you prescribe , though I 'm presuming it comes with a hefty performance penalty.I have no idea if they do this up front , inducing latency on all write operations , or as it goes.What I would like to see is a strategy where it does the hash calculation , writes the block to a new part of the disk assuming it is unique , records the block location as an unverified block in a hash table , and schedules a dedupe scan if one is not already pending .
Then , a very low priority I/O task could scan that structure for block locations that have yet to be verified , and then scan all the blocks that match its hash for sameness and update the structures to retroactively make it a single copy ( effectively unlinking a block deemed duplicate after the fact ) .
That gives the absolute hard guarantee of sameness without a write performance penalty.I 'm very far from a filesystem designer , and I recognize the likelihood of a collision given sufficiently large block size is low , but I 'd really be wary of something that relies on not having bad luck to accidentally lose data on a write due to an unlikely hash collision .</tokentext>
<sentencetext>They have the 'verify' mode to do what you prescribe, though I'm presuming it comes with a hefty performance penalty.I have no idea if they do this up front, inducing latency on all write operations, or as it goes.What I would like to see is a strategy where it does the hash calculation, writes the block to a new part of the disk assuming it is unique, records the block location as an unverified block in a hash table, and schedules a dedupe scan if one is not already pending.
Then, a very low priority I/O task could scan that structure for block locations that have yet to be verified, and then scan all the blocks that match its hash for sameness and update the structures to retroactively make it a single copy (effectively unlinking a block deemed duplicate after the fact).
That gives the absolute hard guarantee of sameness without a write performance penalty.I'm very far from a filesystem designer, and I recognize the likelihood of a collision given sufficiently large block size is low, but I'd really be wary of something that relies on not having bad luck to accidentally lose data on a write due to an unlikely hash collision.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720</parent>
</comment>
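The lazy-verification strategy described above is easy to model. Below is a minimal toy sketch in Python (all names are invented for illustration; this is not ZFS code): the write path only hashes and stores the block as provisionally unique, and a low-priority scrubber later byte-compares blocks that share a digest, retroactively merging true duplicates.

```python
import hashlib
from collections import defaultdict, deque

class LazyDedupStore:
    """Toy model of the parent's proposal: no byte-for-byte check on the
    write path; duplicates are merged retroactively by a background task."""

    def __init__(self):
        self.blocks = {}                  # location -> bytes
        self.by_hash = defaultdict(list)  # digest -> [locations]
        self.redirect = {}                # merged location -> surviving one
        self.unverified = deque()         # queue for the low-priority scan
        self.next_loc = 0

    def write(self, data: bytes) -> int:
        """Fast path: hash, store as if unique, defer verification."""
        digest = hashlib.sha256(data).digest()
        loc, self.next_loc = self.next_loc, self.next_loc + 1
        self.blocks[loc] = data
        self.by_hash[digest].append(loc)
        self.unverified.append((digest, loc))
        return loc

    def read(self, loc: int) -> bytes:
        while loc in self.redirect:       # follow any retroactive merges
            loc = self.redirect[loc]
        return self.blocks[loc]

    def scrub_one(self) -> None:
        """Low-priority task: byte-compare blocks sharing a digest and
        unlink the ones that really are duplicates, after the fact."""
        if not self.unverified:
            return
        digest, loc = self.unverified.popleft()
        if loc not in self.blocks:        # already merged away
            return
        for other in self.by_hash[digest]:
            if other != loc and other in self.blocks \
                    and self.blocks[other] == self.blocks[loc]:
                self.redirect[loc] = other
                self.by_hash[digest].remove(loc)
                del self.blocks[loc]
                break
```

The trade-off is visible in the sketch: writes never stall on verification, but duplicate blocks occupy space until the scrubber catches up, and the redirect table is extra metadata a real filesystem would have to keep consistent on disk.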
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956960</id>
	<title>Re:Hash Collisions</title>
	<author>shutdown -p now</author>
	<datestamp>1257166500000</datestamp>
	<modclass>Informativ</modclass>
	<modscore>4</modscore>
	<htmltext><p>Before I left Acronis, I was the lead developer and designer for <a href="http://www.acronis.com/backup-recovery/advanced-server/deduplication.html" title="acronis.com">deduplication in Acronis Backup &amp; Recovery 10</a> [acronis.com]. We also used SHA256 there, and naturally the possibility of a hash collision was investigated. After we did the math, it turned out that you're about 10^6 times more likely to lose data because of hardware failure (even considering RAID) than you are to lose it because of a hash collision.</p></htmltext>
<tokenext>Before I left Acronis , I was the lead developer and designer for deduplication in Acronis Backup &amp; Recovery 10 [ acronis.com ] .
We also used SHA256 there , and naturally the possibility of a hash collision was investigated .
After we did the math , it turned out that you 're about 10 ^ 6 times more likely to lose data because of hardware failure ( even considering RAID ) than you are to lose it because of a hash collision .</tokentext>
<sentencetext>Before I left Acronis, I was the lead developer and designer for deduplication in Acronis Backup &amp; Recovery 10 [acronis.com].
We also used SHA256 there, and naturally the possibility of a hash collision was investigated.
After we did the math, it turned out that you're about 10^6 times more likely to lose data because of hardware failure (even considering RAID) than you are to lose it because of a hash collision.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720</parent>
</comment>
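For a sense of scale, here is a back-of-envelope version of that comparison. Every number below is an illustrative assumption of mine, not Acronis's data; the 10^6 figure above presumably came from far more conservative hardware estimates than these, which only widens the gap.

```python
from math import log10

blocks    = 2**35   # assumed store: ~34 billion blocks (a few PiB at 128 KiB)
hash_bits = 256

# standard birthday upper bound on the chance of ANY SHA256 collision
p_collision = blocks**2 / 2**(hash_bits + 1)   # ~5e-57

# assumed yearly chance of losing the data to hardware failure,
# after RAID has already absorbed single-disk failures
p_hardware = 1e-6

print(f"collision bound: {p_collision:.1e}")
print(f"hardware loss:   {p_hardware:.1e}")
print(f"hardware is ~10^{log10(p_hardware / p_collision):.0f} times more likely")
```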
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962566</id>
	<title>Combine with BitTorrent?</title>
	<author>Anonymous</author>
	<datestamp>1257258420000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Downloading a chunk with SHA.DEADBEAF...</p><p>oh... looky here... DEADBEAF is already on disk.</p><p>Done.</p></htmltext>
<tokenext>Downloading a chunk with SHA.DEADBEAF...oh ... looky here... DEADBEAF is already on disk.Done .</tokentext>
<sentencetext>Downloading a chunk with SHA.DEADBEAF...oh ... looky here... DEADBEAF is already on disk.Done.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956984</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>Anonymous</author>
	<datestamp>1257166560000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Plan 9 pioneered filesystems that do block-level deduplication in its backup filesystem.</p></htmltext>
<tokenext>Plan 9 pioneered filesystems that do block-level deduplication in its backup filesystem .</tokentext>
<sentencetext>Plan 9 pioneered filesystems that do block-level deduplication in its backup filesystem.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957388</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>TheSpoom</author>
	<datestamp>1257168480000</datestamp>
	<modclass>Interestin</modclass>
	<modscore>2</modscore>
	<htmltext><p>What I'm wondering about all of this is what happens when you edit one of the files?  Does it "reduplicate" them?  And if so, isn't that inefficient in terms of the time needed to update a large file (in that it would need to recopy the file over to another section of the disk in order to maintain the fact that there are two now-different copies)?</p></htmltext>
<tokenext>What I 'm wondering about all of this is what happens when you edit one of the files ?
Does it " reduplicate " them ?
And if so , is n't that inefficient in terms of the time needed to update a large file ( in that it would need to recopy the file over to another section of the disk in order to maintain the fact that there are two now-different copies ) ?</tokentext>
<sentencetext>What I'm wondering about all of this is what happens when you edit one of the files?
Does it "reduplicate" them?
And if so, isn't that inefficient in terms of the time needed to update a large file (in that it would need to recopy the file over to another section of the disk in order to maintain the fact that there are two now-different copies)?</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959214</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>Captain Segfault</author>
	<datestamp>1257177780000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Block based deduplication does not have that problem. Writing to a deduplicated block only requires a copy of that block.</p><p>This isn't actually a matter of ZFS being a "copy on write" filesystem. Any filesystem implementing block level deduplication needs to support copy on write for duplicate blocks, but it doesn't need to support copy on write for everything.</p></htmltext>
<tokenext>Block based deduplication does not have that problem .
Writing to a deduplicated block only requires a copy of that block.This is n't actually a matter of ZFS being a " copy on write " filesystem .
Any filesystem implementing block level deduplication needs to support copy on write for duplicate blocks , but it does n't need to support copy on write for everything .</tokentext>
<sentencetext>Block based deduplication does not have that problem.
Writing to a deduplicated block only requires a copy of that block.This isn't actually a matter of ZFS being a "copy on write" filesystem.
Any filesystem implementing block level deduplication needs to support copy on write for duplicate blocks, but it doesn't need to support copy on write for everything.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957388</parent>
</comment>
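A minimal sketch of that point in Python (a hypothetical structure, not ZFS internals): blocks live in a refcounted table, and writing to a shared block copies exactly that one block.

```python
import hashlib

class BlockTable:
    """Refcounted, content-addressed block store. Assumes no SHA256
    collisions (see the hash-collision thread elsewhere on this story)."""

    def __init__(self):
        self.store = {}  # digest -> (payload, refcount)

    def put(self, data: bytes) -> bytes:
        d = hashlib.sha256(data).digest()
        payload, refs = self.store.get(d, (data, 0))
        self.store[d] = (payload, refs + 1)
        return d         # the digest serves as the block pointer

    def overwrite(self, d: bytes, new_data: bytes) -> bytes:
        """Copy-on-write for exactly one block: drop one reference to the
        old contents, then store (or re-share) the new contents."""
        payload, refs = self.store[d]
        if refs == 1:
            del self.store[d]
        else:
            self.store[d] = (payload, refs - 1)
        return self.put(new_data)
```

Nothing else in the file moves: the other blocks keep their reference counts, which is exactly why editing one block of a large deduplicated file is cheap.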
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960128</id>
	<title>Billions of dollars</title>
	<author>Deton8</author>
	<datestamp>1257185880000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>I bet EMC is happy they just out-bid NetApp to the tune of $2.4 billion, for basically the same technology that Jeff Bonwick is giving away for free.</htmltext>
<tokenext>I bet EMC is happy they just out-bid NetApp to the tune of $ 2.4 billion , for basically the same technology that Jeff Bonwick is giving away for free .</tokentext>
<sentencetext>I bet EMC is happy they just out-bid NetApp to the tune of $2.4 billion, for basically the same technology that Jeff Bonwick is giving away for free.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957986</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>Anonymous</author>
	<datestamp>1257171000000</datestamp>
	<modclass>Informativ</modclass>
	<modscore>4</modscore>
	<htmltext>You recall wrong. NTFS has long supported both hard links and a mechanism called 'reparse points,' which are much more powerful than simple symlinks.</htmltext>
<tokenext>You recall wrong .
NTFS has long supported both hard links and a mechanism called 'reparse points, ' which are much more powerful than simple symlinks .</tokentext>
<sentencetext>You recall wrong.
NTFS has long supported both hard links and a mechanism called 'reparse points,' which are much more powerful than simple symlinks.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957456</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961342</id>
	<title>Re:Hash Collisions</title>
	<author>MrNemesis</author>
	<datestamp>1257243960000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Not disagreeing with you per se (I concede that the possibility of a hash collision is infinitesimal with SHA256) but even so wouldn't a collision be the worst kind of failure - namely silent data corruption?</p><p>Does anyone know if the ZFS code incorporates a mode where you can enforce checking of the blocks bit-for-bit in the event of the hashes being the same? More IO intensive but it's a checkbox for all those "data integrity is paramount" applications.</p></htmltext>
<tokenext>Not disagreeing with you per se ( I concede that the possibility of a hash collision is infinitesimal with SHA256 ) but even so would n't a collision be the worst kind of failure - namely silent data corruption ? Does anyone know if the ZFS code incorporates a mode where you can enforce checking of the blocks bit-for-bit in the event of the hashes being the same ?
More IO intensive but it 's a checkbox for all those " data integrity is paramount " applications .</tokentext>
<sentencetext>Not disagreeing with you per se (I concede that the possibility of a hash collision is infinitesimal with SHA256) but even so wouldn't a collision be the worst kind of failure - namely silent data corruption?Does anyone know if the ZFS code incorporates a mode where you can enforce checking of the blocks bit-for-bit in the event of the hashes being the same?
More IO intensive but it's a checkbox for all those "data integrity is paramount" applications.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956960</parent>
</comment>
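For the question above: the dedup property announced for ZFS does include exactly this knob. Per Bonwick's announcement post (linked elsewhere in this thread), a verify option forces a byte-for-byte comparison whenever two blocks hash the same; "tank" below is a placeholder pool name.

```sh
# SHA256 dedup with byte-for-byte verification on every hash match
zfs set dedup=verify tank

# or a weaker, faster checksum, safe only because verify backstops it
zfs set dedup=fletcher4,verify tank
```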
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29970150</id>
	<title>Re:There are three types of files.</title>
	<author>octal\_sio</author>
	<datestamp>1257249660000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Plan 9's filesystem has an append permission mode bit.</p><p>It also achieves deduplication in a simple way using <a href="http://en.wikipedia.org/wiki/Venti" title="wikipedia.org" rel="nofollow">Venti</a> [wikipedia.org]. Clever stuff.</p></htmltext>
<tokenext>Plan 9 's filesystem has an append permission mode bit.It also achieves deduplication in a simple way using Venti [ wikipedia.org ] .
Clever stuff .</tokentext>
<sentencetext>Plan 9's filesystem has an append permission mode bit.It also achieves deduplication in a simple way using Venti [wikipedia.org].
Clever stuff.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958384</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962330</id>
	<title>Re:Wake me when they build it into the hard disk</title>
	<author>julesh</author>
	<datestamp>1257256500000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p><i>Imagine the amount of stuff you could (unreliably) store on a hard disk if massive de-duplication was built into the drive electronics. It could even do this quietly in the background.</i></p><p>Not as good as installing extra processing power in your machine and doing it in the OS.  Honestly.  The primary advantage here isn't actually the saving of disk space.  Nobody really cares about that too much.</p><p>The main advantage is that if two processes have two files with identical blocks in them, and map those files into memory (or just read them so they're cached), if they're deduped you'll end up with both processes having copy-on-write references to the same memory block.  The big win here is in saving RAM, not disk space.  And that requires the OS to understand and be aware that the deduplication has happened.</p></htmltext>
<tokenext>Imagine the amount of stuff you could ( unreliably ) store on a hard disk if massive de-duplication was built into the drive electronics .
It could even do this quietly in the background.Not as good as installing extra processing power in your machine and doing it in the OS .
Honestly. The primary advantage here is n't actually the saving of disk space .
Nobody really cares about that too much.The main advantage is that if two processes have two files with identical blocks in them , and map those files into memory ( or just read them so they 're cached ) , if they 're deduped you 'll end up with both processes having copy-on-write references to the same memory block .
The big win here is in saving RAM , not disk space .
And that requires the OS to understand and be aware that the deduplication has happened .</tokentext>
<sentencetext>Imagine the amount of stuff you could (unreliably) store on a hard disk if massive de-duplication was built into the drive electronics.
It could even do this quietly in the background.Not as good as installing extra processing power in your machine and doing it in the OS.
Honestly.  The primary advantage here isn't actually the saving of disk space.
Nobody really cares about that too much.The main advantage is that if two processes have two files with identical blocks in them, and map those files into memory (or just read them so they're cached), if they're deduped you'll end up with both processes having copy-on-write references to the same memory block.
The big win here is in saving RAM, not disk space.
And that requires the OS to understand and be aware that the deduplication has happened.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740</id>
	<title>Any other file systems with that feature?</title>
	<author>Anonymous</author>
	<datestamp>1257165060000</datestamp>
	<modclass>None</modclass>
	<modscore>2</modscore>
	<htmltext><p>Are there any other filesystems with that feature?  If not, I'm very strongly considering writing my own.</p></htmltext>
<tokenext>Are there any other filesystems with that feature ?
If not , I 'm very strongly considering writing my own .</tokentext>
<sentencetext>Are there any other filesystems with that feature?
If not, I'm very strongly considering writing my own.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960814</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>Hurricane78</author>
	<datestamp>1257279180000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Yeah, but if you actually used them, you'd know that Windows neither has any support for them, nor are they anything other than an ugly hack. (After all, there's not much money in the pot, for something that is no feature in the UI anyway.)</p><p>Sadly...</p><p>But hey, I use Linux anyway.</p></htmltext>
<tokenext>Yeah , but if you actually used them , you 'd know that Windows neither has any support for them , nor are they anything other than an ugly hack .
( After all , there 's not much money in the pot , for something that is no feature in the UI anyway .
) Sadly...But hey , I use Linux anyway .</tokentext>
<sentencetext>Yeah, but if you actually used them, you'd know that Windows neither has any support for them, nor are they anything other than an ugly hack.
(After all, there's not much money in the pot, for something that is no feature in the UI anyway.
)Sadly...But hey, I use Linux anyway.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957986</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956856</id>
	<title>Re:Hash Collisions</title>
	<author>Rising Ape</author>
	<datestamp>1257165840000</datestamp>
	<modclass>Informativ</modclass>
	<modscore>2</modscore>
	<htmltext><p>The probability of a hash collision for a 256 bit hash (or even a 128 bit one) is negligible.</p><p>How negligible? Well, the probability of a collision is never more than N^2 / 2^h, where N is the number of blocks stored and h is the number of bits in the hash. So, if we have 2^64 blocks stored (a mere billion terabytes or so for 128 byte blocks), the probability of a collision is less than 2^(-128), or 10^(-38). Hardly worth worrying about.</p><p>And that's an upper limit, not the actual value.</p></htmltext>
<tokenext>The probability of a hash collision for a 256 bit hash ( or even a 128 bit one ) is negligible.How negligible ?
Well , the probability of a collision is never more than N ^ 2 / 2 ^ h , where N is the number of blocks stored and h is the number of bits in the hash .
So , if we have 2 ^ 64 blocks stored ( a mere billion terabytes or so for 128 byte blocks ) , the probability of a collision is less than 2 ^ ( -128 ) , or 10 ^ ( -38 ) .
Hardly worth worrying about.And that 's an upper limit , not the actual value .</tokentext>
<sentencetext>The probability of a hash collision for a 256 bit hash (or even a 128 bit one) is negligible.How negligible?
Well, the probability of a collision is never more than N^2 / 2^h, where N is the number of blocks stored and h is the number of bits in the hash.
So, if we have 2^64 blocks stored (a mere billion terabytes or so for 128 byte blocks) , the probability of a collision is less than 2^(-128), or 10^(-38).
Hardly worth worrying about.And that's an upper limit, not the actual value.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720</parent>
</comment>
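Making the arithmetic above explicit:

```latex
\Pr[\text{collision}] \;\le\; \frac{N^2}{2^h}
  \;=\; \frac{(2^{64})^2}{2^{256}}
  \;=\; 2^{-128} \;\approx\; 2.9 \times 10^{-39},
```

which is the quoted "less than 2^(-128), or 10^(-38)".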
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962340</id>
	<title>Re:Hash Collisions</title>
	<author>Anonymous</author>
	<datestamp>1257256620000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Might I point out that lost data and corrupted data are two very different beasts.</p></htmltext>
<tokenext>Might I point out that lost data and corrupted data are two very different beasts .</tokentext>
<sentencetext>Might I point out that lost data and corrupted data are two very different beasts.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956960</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959348</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>evilviper</author>
	<datestamp>1257178980000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><blockquote><div><p>the time needed to update a large file (in that it would need to recopy the file over to another section of the disk in order to maintain the fact that there are two now-different copies)?</p></div></blockquote><p>You're thinking of file-level "de-duplication".  But this is block-level.  So, if you make a small change, it doesn't have to write 500 blocks, just the one.</p><p>Everyone else already mentioned ZFS is CoW, so I'll leave it at that.</p></div>
	</htmltext>
<tokenext>the time needed to update a large file ( in that it would need to recopy the file over to another section of the disk in order to maintain the fact that there are two now-different copies ) ? You 're thinking of file-level " de-duplication " .
But this is block-level .
So , if you make a small change , it does n't have to write 500 blocks , just the one.Everyone else already mentioned ZFS is CoW , so I 'll leave it at that .</tokentext>
<sentencetext>the time needed to update a large file (in that it would need to recopy the file over to another section of the disk in order to maintain the fact that there are two now-different copies)?You're thinking of file-level "de-duplication".
But this is block-level.
So, if you make a small change, it doesn't have to write 500 blocks, just the one.Everyone else already mentioned ZFS is CoW, so I'll leave it at that.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957388</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956866</id>
	<title>Re:Hash Collisions</title>
	<author>pclminion</author>
	<datestamp>1257165900000</datestamp>
	<modclass>Funny</modclass>
	<modscore>2</modscore>
	<htmltext><p>Suppose you can tolerate a chance of collision of 10^-18 per-block. Given a 256-bit hash, it would take 4.8e29 blocks to achieve this collision probability. Supposing a block size of 512 bytes, that's 223517417907714843750 terabytes.</p><p>Now, supposing you have a 223517417907714843750 terabyte drive, and you can NOT tolerate a collision probability of 10^-18, then you can just do a bit-for-bit check of the colliding blocks before deciding if they are identical or not.</p></htmltext>
<tokenext>Suppose you can tolerate a chance of collision of 10 ^ -18 per-block .
Given a 256-bit hash , it would take 4.8e29 blocks to achieve this collision probability .
Supposing a block size of 512 bytes , that 's 223517417907714843750 terabytes.Now , supposing you have a 223517417907714843750 terabyte drive , and you can NOT tolerate a collision probability of 10 ^ -18 , then you can just do a bit-for-bit check of the colliding blocks before deciding if they are identical or not .</tokentext>
<sentencetext>Suppose you can tolerate a chance of collision of 10^-18 per-block.
Given a 256-bit hash, it would take 4.8e29 blocks to achieve this collision probability.
Supposing a block size of 512 bytes, that's 223517417907714843750 terabytes.Now, supposing you have a 223517417907714843750 terabyte drive, and you can NOT tolerate a collision probability of 10^-18, then you can just do a bit-for-bit check of the colliding blocks before deciding if they are identical or not.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720</parent>
</comment>
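Those figures check out under the usual birthday approximation; a quick reproduction (my arithmetic, with binary terabytes, which is evidently what the poster used):

```python
from math import sqrt

h = 256      # hash width in bits
p = 1e-18    # tolerated probability of any collision
# birthday approximation: p ~ N^2 / 2^(h+1), so N ~ sqrt(p * 2^(h+1))
n_blocks = sqrt(p * 2 ** (h + 1))
print(f"{n_blocks:.1e} blocks")   # ~4.8e+29
# 512-byte blocks, converted to binary TB; matches the figure above
# once N is rounded to 4.8e29 first, as the poster evidently did
print(f"{4.8e29 * 512 / 2**40:.6e} TB")   # ~2.235174e+20
```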
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29964594</id>
	<title>Re:More reason to be a ZFS fanboy</title>
	<author>Anonymous</author>
	<datestamp>1257268980000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>I usually tie shrinking in with vdev removal.  Both of these are a large business requirement, and likely less important to smaller shops or home usage.</p><p>Both fulfill similar roles when dealing with expensive SAN storage.  SAN storage is commonly shared out as LUNs (chunks of storage) and then parceled up on the client system.</p><p>Common large enterprise activities include migrating SAN:<br>
&nbsp; &nbsp; - as part of the continuous hardware lifecycle,<br>
&nbsp; &nbsp; - due to problems,<br>
&nbsp; &nbsp; - for performance,<br>
&nbsp; &nbsp; - and so on.</p><p>* Assuming the LUNs from the two devices are *exactly* the same size, you can do a replace in ZFS.  But this isn't ideal, and often (at least here) LUN sizes are close, but not exact.  (BTW Growing during migration is not a good option here).<br>
&nbsp; &nbsp; + Some SAN has restrictions on LUN sizes, so migrating can involve a change in the overall number of LUNs ( 4 x 50G  --&gt; 2 x 100G)</p><p>* Temporary growth (high load) can be *very* expensive to compensate for on hundreds or thousands of systems.  Being able to grow for exceptional circumstances, then shrink again later, can save millions.</p><p>And yes, these can be worked around.  But the workarounds are often awkward, and hard to justify, when a paid product like Veritas obviously provides the capability.</p><p>These workarounds often require downtime - any downtime can be expensive in large organizations.</p><p>I prefer ZFS, and Veritas licenses are a rip-off, but in a business case ZFS currently means regular downtime when storage changes.</p></htmltext>
<tokenext>I usually tie shrinking in with vdev removal .
Both of these are a large business requirement , and likely less important to smaller shops or home usage.Both fulfill similar roles when dealing with expensive SAN storage .
SAN storage is commonly shared out as LUNs ( chunks of storage ) and then parceled up on the client system.Common large enterprise activities include migrating SAN :     - as part of the continuous hardware lifecycle ,     - due to problems ,     - for performance ,     - and so on .
* Assuming the LUNs from the two devices are * exactly * the same size , you can do a replace in ZFS .
But this is n't ideal , and often ( at least here ) LUN sizes are close , but not exact .
( BTW Growing during migration is not a good option here ) .
    + Some SAN has restrictions on LUN sizes , so migrating can involve a change in the overall number of LUNs ( 4 x 50G -- &gt; 2 x 100G ) * Temporary growth ( high load ) can be * very * expensive to compensate for on hundreds or thousands of systems .
Being able to grow for exceptional circumstances , then shrink again later , can save millions.And yes , these can be worked around .
But the workarounds are often awkward , and hard to justify , when a paid product like Veritas obviously provides the capability.These workarounds often require downtime - any downtime can be expensive in large organizations.I prefer ZFS , and Veritas licenses are a rip-off , but in a business case ZFS currently means regular downtime when storage changes .</tokentext>
<sentencetext>I usually tie shrinking in with vdev removal.
Both of these are a large business requirement, and likely less important to smaller shops or home usage.Both fulfill similar roles when dealing with expensive SAN storage.
SAN storage is commonly shared out as LUNs (chunks of storage) and then parceled up on the client system.Common large enterprise activities include migrating SAN:
    - as part of the continuous hardware lifecycle,
    - due to problems,
    - for performance,
    - and so on.
* Assuming the LUNs from the two devices are *exactly* the same size, you can do a replace in ZFS.
But this isn't ideal, and often (at least here) LUN sizes are close, but not exact.
(BTW Growing during migration is not a good option here).
    + Some SAN has restrictions on LUN sizes, so migrating can involve a change in the overall number of LUNs ( 4 x 50G  --&gt; 2 x 100G)* Temporary growth (high load) can be *very* expensive to compensate for on hundreds or thousands of systems.
Being able to grow for exceptional circumstances, then shrink again later, can save millions.And yes, these can be worked around.
But the workarounds are often awkward, and hard to justify, when a paid product like Veritas obviously provides the capability.These workarounds often require downtime - any downtime can be expensive in large organizations.I prefer ZFS, and Veritas licenses are a rip-off, but in a business case ZFS currently means regular downtime when storage changes.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959644</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957518</id>
	<title>Re:Wake me when they build it into the hard disk</title>
	<author>c6gunner</author>
	<datestamp>1257169020000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p><div class="quote"><p>This could trigger a similar situation where there was suddenly not enough room to store the same amount of data that was already on the device. (For some values of "suddenly" and "already").</p></div><p>Yes, but what's the likelihood of that occurring?  We're talking about block level duplication here.  If you have two identical files and you add a bit to the end of one, you're not creating a duplicate fi;e - you're just adding a few blocks while still referencing the original de-dupped file.  Now, if you were doing file-level duplication it might be an issue, but this way<nobr> <wbr></nobr>... I can't see it ever being a problem unless your array is already at 99.9\% percent capacity (and that's just a bad idea in general).</p></div>
	</htmltext>
<tokenext>This could trigger a similar situation where there was suddenly not enough room to store the same amount of data that was already on the device .
( For some values of " suddenly " and " already " ) .Yes , but what 's the likelihood of that occurring ?
We 're talking about block level duplication here .
If you have two identical files and you add a bit to the end of one , you 're not creating a duplicate file - you 're just adding a few blocks while still referencing the original de-dupped file .
Now , if you were doing file-level duplication it might be an issue , but this way ... I ca n't see it ever being a problem unless your array is already at 99.9 % capacity ( and that 's just a bad idea in general ) .</tokentext>
<sentencetext>This could trigger a similar situation where there was suddenly not enough room to store the same amount of data that was already on the device.
(For some values of "suddenly" and "already").Yes, but what's the likelihood of that occurring?
We're talking about block level duplication here.
If you have two identical files and you add a bit to the end of one, you're not creating a duplicate file - you're just adding a few blocks while still referencing the original de-dupped file.
Now, if you were doing file-level duplication it might be an issue, but this way ... I can't see it ever being a problem unless your array is already at 99.9% capacity (and that's just a bad idea in general).
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961748</id>
	<title>Re:This is good news...</title>
	<author>BrentH</author>
	<datestamp>1257249840000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>Thing is, there's no good reason not to do it. CPUs have more cycles to spare than HDs have bandwidth (don't forget that aspect of dedup!) or bytes. The state of filesystems in 2009 is basically the same as it was 15 years ago, while ZFS shows that you can do lots of little things that make life easier, simpler, more efficient, more secure, etc. etc. Why not have that in 2009? The machines can do it, ZFS is the software that can do it, why not?</htmltext>
<tokenext>Thing is , there 's no good reason not to do it .
CPUs have more cycles to spare than HDs have bandwidth ( do n't forget that aspect of dedup !
) or bytes .
The state of filesystems in 2009 is basically the same as it was 15 years ago , while ZFS shows that you can do lots of little things that make life easier , simpler , more efficient , more secure , etc. etc .
Why not have that in 2009 ?
The machines can do it , ZFS is the software that can do it , why not ?</tokentext>
<sentencetext>Thing is, there's no good reason not to do it.
CPUs have more cycles to spare than HDs have bandwidth (don't forget that aspect of dedup!
) or bytes.
The state of filesystems in 2009 is basically the same as it was 15 years ago, while ZFS shows that you can do lots of little things that make life easier, simpler, more efficient, more secure, etc. etc.
Why not have that in 2009?
The machines can do it, ZFS is the software that can do it, why not?</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957720</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957584</id>
	<title>I Heard ISPs Were Doing This</title>
	<author>sexconker</author>
	<datestamp>1257169320000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>I Heard ISPs Were Doing This With Broadband.<br>Simply duplicate your advertised pipe across 100 subscribers.</p><p>If they want to access it at the same time, just shift stuff around.</p><p>If they want to access it at the same time, and you don't have room to shift stuff around, just impose caps and bill them progressively out the ass.</p></htmltext>
<tokenext>I Heard ISPs Were Doing This With Broadband.Simply duplicate your advertised pipe across 100 subscribers.If they want to access it at the same time , just shift stuff around.If they want to access it at the same time , and you do n't have room to shift stuff around , just impose caps and bill them progressively out the ass .</tokentext>
<sentencetext>I Heard ISPs Were Doing This With Broadband.Simply duplicate your advertised pipe across 100 subscribers.If they want to access it at the same time, just shift stuff around.If they want to access it at the same time, and you don't have room to shift stuff around, just impose caps and bill them progressively out the ass.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958048</id>
	<title>I tried this on my RAID system</title>
	<author>ljw1004</author>
	<datestamp>1257171360000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>I tried this on my RAID-1 system and it got converted to RAID-0.</p></htmltext>
<tokenext>I tried this on my RAID-1 system and it got converted to RAID-0 .</tokentext>
<sentencetext>I tried this on my RAID-1 system and it got converted to RAID-0.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957334</id>
	<title>Re:Hash Collisions</title>
	<author>Just Some Guy</author>
	<datestamp>1257168300000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p><div class="quote"><p>Surely with high amounts of data (that zfs is supposed to be able to handle), a hash collision may occur?</p></div><p>The birthday paradox says you'd have to look at 2^(n/2) candidates, on average, to find a collision for a given n-bit hash.  In this case, that means you'd have to look at about 2^128 objects to find a collision with a particular one.</p><p>On my home server, the default block size is 128KB.  With a terabyte drive, that gives about 8.4 million blocks.</p><p>GmPy says the likelihood of an event with probably of 1/(2^128) <em>not</em> happening 8.4 million times (well, 1024^4/(128*1024) times) in a row is 0.99999999999999999999999999999997534809671184338108088348233.  In other words, that's how likely you are to fill a 1TB drive with 128KB blocks without a single hash collision.</p><p>I can live with that.</p></div>
	</htmltext>
<tokenext>Surely with high amounts of data ( that zfs is supposed to be able to handle ) , a hash collision may occur ? The birthday paradox says you 'd have to look at 2 ^ ( n/2 ) candidates , on average , to find a collision for a given n-bit hash .
In this case , that means you 'd have to look at about 2 ^ 128 objects before you 'd expect any two of them to collide.On my home server , the default block size is 128KB .
With a terabyte drive , that gives about 8.4 million blocks.GmPy says the likelihood of an event with probability 1/ ( 2 ^ 128 ) not happening 8.4 million times ( well , 1024 ^ 4/ ( 128 * 1024 ) times ) in a row is 0.99999999999999999999999999999997534809671184338108088348233 .
In other words , that 's how likely you are to fill a 1TB drive with 128KB blocks without a single hash collision.I can live with that .</tokentext>
<sentencetext>Surely with high amounts of data (that zfs is supposed to be able to handle), a hash collision may occur?The birthday paradox says you'd have to look at 2^(n/2) candidates, on average, to find a collision for a given n-bit hash.
In this case, that means you'd have to look at about 2^128 objects before you'd expect any two of them to collide.On my home server, the default block size is 128KB.
With a terabyte drive, that gives about 8.4 million blocks.GmPy says the likelihood of an event with probability 1/(2^128) not happening 8.4 million times (well, 1024^4/(128*1024) times) in a row is 0.99999999999999999999999999999997534809671184338108088348233.
In other words, that's how likely you are to fill a 1TB drive with 128KB blocks without a single hash collision.I can live with that.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720</parent>
</comment>
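The same computation reproduces without GmPy using only the standard library's decimal module:

```python
from decimal import Decimal, getcontext

getcontext().prec = 70
p_hit   = Decimal(1) / Decimal(2) ** 128   # chance one comparison collides
trials  = 1024**4 // (128 * 1024)          # 1 TB of 128 KB blocks = 8,388,608
p_clean = (Decimal(1) - p_hit) ** trials   # probability of zero collisions

print(trials)   # 8388608
print(p_clean)  # 0.9999999999999999999999999999999753480967118433810808834823...
```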
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959094</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>Captain Segfault</author>
	<datestamp>1257176940000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p><div class="quote"><p>Are there any other filesystems with that feature?</p></div><p> <a href="http://en.wikipedia.org/wiki/Write\_Anywhere\_File\_Layout" title="wikipedia.org" rel="nofollow">WAFL</a> [wikipedia.org].</p></div>
	</htmltext>
<tokenext>Are there any other filesystems with that feature ?
WAFL [ wikipedia.org ] .</tokentext>
<sentencetext>Are there any other filesystems with that feature?
WAFL [wikipedia.org].
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29963144</id>
	<title>Re:Hash Collisions</title>
	<author>Anonymous</author>
	<datestamp>1257262080000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Please provide the math, and most importantly, the critical assumptions. (such as disk size, hardware failure rates, etc). While I think you may have been correct when you were at Acronis, when we reach 10 TB drives that last on average 5 years, I believe I can create a case in which you are 10^6 times more likely to lose data to a hash function and software errors in de-duplication than good old-fashioned redundancy.</p></htmltext>
<tokenext>Please provide the math , and most importantly , the critical assumptions .
( such as disk size , hardware failure rates , etc ) .
While I think you may have been correct when you were at Acronis , when we reach 10 TB drives that last on average 5 years , I believe I can create a case in which you are 10 ^ 6 times more likely to lose data to a hash function and software errors in de-duplication than good old-fashioned redundancy .</tokentext>
<sentencetext>Please provide the math, and most importantly, the critical assumptions.
(such as disk size, hardware failure rates, etc).
While I think you may have been correct when you were at Acronis, when we reach 10 TB drives that last on average 5 years, I believe I can create a case in which you are 10^6 times more likely to lose data to a hash function and software errors in de-duplication than good old-fashioned redundancy.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956960</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957660</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>TheRaven64</author>
	<datestamp>1257169560000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>ZFS is copy on write, so every time you write a block it generates a new copy then decrements the reference count of the old copy.  The 'reduplication' doesn't require any additional support; it will work automatically.  Of course, you also want to check if the new block can be deduplicated...</htmltext>
<tokenext>ZFS is copy on write , so every time you write a block it generates a new copy then decrements the reference count of the old copy .
The 'reduplication ' does n't require any additional support , it will work automatically .
Of course , you also want to check if the new block can be deduplicated.. .</tokentext>
<sentencetext>ZFS is copy on write, so every time you write a block it generates a new copy then decrements the reference count of the old copy.
The 'reduplication' doesn't require any additional support, it will work automatically.
Of course, you also want to check if the new block can be deduplicated...</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957388</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960086</id>
	<title>Re:Wake me when they build it into the hard disk</title>
	<author>wsloand</author>
	<datestamp>1257185340000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>But the design delays the point at which you would have to buy more disk space.  The problem you're referring to is a problem for a specific disk usage scenario.  Not all problems are the same, and if you're planning to mass-edit identical files that are a) large enough to make a meaningful impact on your disk usage and b) being edited in a non-uniform way, then don't use the de-duplication feature or plan ahead.</p></htmltext>
<tokenext>But the design delays the point at which you would have to buy more disk space .
The problem you 're referring to is a problem for a specific disk usage scenario .
Not all problems are the same , and if you 're planning to mass-edit identical files that are a ) large enough to make a meaningful impact on your disk usage and b ) being edited in a non-uniform way , then do n't use the de-duplication feature or plan ahead .</tokentext>
<sentencetext>But the design delays the point at which you would have to buy more disk space.
The problem you're referring to is a problem for a specific disk usage scenario.
Not all problems are the same, and if you're planning to mass-edit identical files that are a) large enough to make a meaningful impact on your disk usage and b) being edited in a non-uniform way, then don't use the de-duplication feature or plan ahead.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29967668</id>
	<title>Re:This is good news...</title>
	<author>stefanlasiewski</author>
	<datestamp>1257240480000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>In the case of Deduplication, Open Source has been lagging far behind commercial alternatives. Deduplication has been available from DataDomain, Netapp and other vendors for several years now.</p><p>DataDomains are a great alternative to tape storage. Several tapes were ruined, but I never had a problem retrieving data from a DataDomain.</p><p>With ZFS, maybe I can finally have my cheap Dedup server at home.</p></htmltext>
<tokenext>In the case of Deduplication , Open Source has been lagging far behind commercial alternatives .
Deduplication has been available from DataDomain , Netapp and other vendors for several years now.DataDomains are a great alternative to tape storage .
Several tapes were ruined , but I never had a problem retrieving data from a DataDomain.With ZFS , maybe I can finally have my cheap Dedup server at home .</tokentext>
<sentencetext>In the case of Deduplication, Open Source has been lagging far behind commercial alternatives.
Deduplication has been available from DataDomain, Netapp and other vendors for several years now.DataDomains are a great alternative to tape storage.
Several tapes were ruined, but I never had a problem retrieving data from a DataDomain.With ZFS, maybe I can finally have my cheap Dedup server at home.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956770</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957240</id>
	<title>Re:This is good news...</title>
	<author>Anonymous</author>
	<datestamp>1257167820000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p><div class="quote"><p>Use open source, get cutting edge things.</p></div><p>Cutting edge is nice for the functionality; unfortunately it more often than not comes with unintended functionality. I like standing back a bit - not too much mind you, but enough to avoid the bleeding edge.</p></div>
	</htmltext>
<tokenext>Use open source , get cutting edge things.Cutting edge is nice for the functionality ; unfortunately it more often than not comes with unintended functionality .
I like standing back a bit - not too much mind you , but enough to avoid the bleeding edge .</tokentext>
<sentencetext>Use open source, get cutting edge things.Cutting edge is nice for the functionality; unfortunately it more often than not comes with unintended functionality.
I like standing back a bit - not too much mind you, but enough to avoid the bleeding edge.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956770</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959800</id>
	<title>Re:Hash Collisions</title>
	<author>Anonymous</author>
	<datestamp>1257183240000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>All due respect to Wallace Shawn, just because the chances of something occurring are inconceivably small, that doesn't mean it won't happen.  I don't want there to be "almost no chance" that my recent tax records will be corrupted by a block of data from a photograph of my recent trip to Bora-bora; I want there to be "no chance".  Luckily, if collisions are going to be rare, the extra investment of a bit-for-bit check is probably not all that expensive for the system to do.</htmltext>
<tokenext>All due respect to Wallace Shawn , just because the chances of something occurring are inconceivably small , that does n't mean it wo n't happen .
I do n't want there to be " almost no chance " that my recent tax records will be corrupted by a block of data from a photograph of my recent trip to Bora-bora ; I want there to be " no chance " .
Luckily , if collisions are going to be rare , the extra investment of a bit-for-bit check is probably not all that expensive for the system to do .</tokentext>
<sentencetext>All due respect to Wallace Shawn, just because the chances of something occurring are inconceivably small, that doesn't mean it won't happen.
I don't want there to be "almost no chance" that my recent tax records will be corrupted by a block of data from a photograph of my recent trip to Bora-bora; I want there to be "no chance".
Luckily, if collisions are going to be rare, the extra investment of a bit-for-bit check is probably not all that expensive for the system to do.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956866</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957694</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>Anonymous</author>
	<datestamp>1257169680000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p><div class="quote"><p>Windows Storage Server 2003  (yes, yes I know its from Microsoft) shipped with this feature (that is called Single Instance Storage)<br><a href="http://blogs.technet.com/josebda/archive/2008/01/02/the-basics-of-single-instance-storage-sis-in-wss-2003-r2-and-wudss-2003.a" title="technet.com" rel="nofollow">http://blogs.technet.com/josebda/archive/2008/01/02/the-basics-of-single-instance-storage-sis-in-wss-2003-r2-and-wudss-2003.a</a> [technet.com] </p></div><p>Not quite. From the above link it works at the file level:</p><p><div class="quote"><p>The files don&rsquo;t need to be on the same folder, have the same name or have the same date, but they do need to be in the same volume, have <strong>exactly the same size</strong> and the <strong>contents of both need to be exactly the same</strong>.</p></div><p>ZFS' dedupe (and similar technologies like NetApp's A-SIS) work a the block level. From one of the leads of ZFS:</p><p><div class="quote"><p>Data can be deduplicated at the level of files, blocks, or bytes.</p><p>File-level assigns a hash signature to an entire file. File-level dedup has the lowest overhead when the natural granularity of data duplication is whole files, but it also has significant limitations: any change to any block in the file requires recomputing the checksum of the whole file, which means that if even one block changes, any space savings is lost because the two versions of the file are no longer identical. This is fine when the expected workload is something like JPEG or MPEG files, but is completely ineffective when managing things like virtual machine images, which are mostly identical but differ in a few blocks.</p><p>Block-level dedup has somewhat higher overhead than file-level dedup when whole files are duplicated, but unlike file-level dedup, it handles block-level data such as virtual machine images extremely well. Most of a VM image is duplicated data -- namely, a copy of the guest operating system -- but some blocks are unique to each VM. With block-level dedup, only the blocks that are unique to each VM consume additional storage space. All other blocks are shared. [...]</p><p>ZFS provides block-level deduplication because this is the finest granularity that makes sense for a general-purpose storage system.</p></div><p>http://blogs.sun.com/bonwick/en\_US/entry/zfs\_dedup</p></div>
	</htmltext>
<tokenext>Windows Storage Server 2003 ( yes , yes I know it 's from Microsoft ) shipped with this feature ( that is called Single Instance Storage ) http : //blogs.technet.com/josebda/archive/2008/01/02/the-basics-of-single-instance-storage-sis-in-wss-2003-r2-and-wudss-2003.a [ technet.com ] Not quite .
From the above link it works at the file level : The files do n't need to be on the same folder , have the same name or have the same date , but they do need to be in the same volume , have exactly the same size and the contents of both need to be exactly the same.ZFS ' dedupe ( and similar technologies like NetApp 's A-SIS ) work at the block level .
From one of the leads of ZFS : Data can be deduplicated at the level of files , blocks , or bytes.File-level assigns a hash signature to an entire file .
File-level dedup has the lowest overhead when the natural granularity of data duplication is whole files , but it also has significant limitations : any change to any block in the file requires recomputing the checksum of the whole file , which means that if even one block changes , any space savings is lost because the two versions of the file are no longer identical .
This is fine when the expected workload is something like JPEG or MPEG files , but is completely ineffective when managing things like virtual machine images , which are mostly identical but differ in a few blocks.Block-level dedup has somewhat higher overhead than file-level dedup when whole files are duplicated , but unlike file-level dedup , it handles block-level data such as virtual machine images extremely well .
Most of a VM image is duplicated data -- namely , a copy of the guest operating system -- but some blocks are unique to each VM .
With block-level dedup , only the blocks that are unique to each VM consume additional storage space .
All other blocks are shared .
[ ... ] ZFS provides block-level deduplication because this is the finest granularity that makes sense for a general-purpose storage system.http : //blogs.sun.com/bonwick/en \ _US/entry/zfs \ _dedup</tokentext>
<sentencetext>Windows Storage Server 2003  (yes, yes I know its from Microsoft) shipped with this feature (that is called Single Instance Storage)http://blogs.technet.com/josebda/archive/2008/01/02/the-basics-of-single-instance-storage-sis-in-wss-2003-r2-and-wudss-2003.a [technet.com] Not quite.
From the above link it works at the file level:The files don’t need to be on the same folder, have the same name or have the same date, but they do need to be in the same volume, have exactly the same size and the contents of both need to be exactly the same.ZFS' dedupe (and similar technologies like NetApp's A-SIS) work a the block level.
From one of the leads of ZFS:Data can be deduplicated at the level of files, blocks, or bytes.File-level assigns a hash signature to an entire file.
File-level dedup has the lowest overhead when the natural granularity of data duplication is whole files, but it also has significant limitations: any change to any block in the file requires recomputing the checksum of the whole file, which means that if even one block changes, any space savings is lost because the two versions of the file are no longer identical.
This is fine when the expected workload is something like JPEG or MPEG files, but is completely ineffective when managing things like virtual machine images, which are mostly identical but differ in a few blocks.Block-level dedup has somewhat higher overhead than file-level dedup when whole files are duplicated, but unlike file-level dedup, it handles block-level data such as virtual machine images extremely well.
Most of a VM image is duplicated data -- namely, a copy of the guest operating system -- but some blocks are unique to each VM.
With block-level dedup, only the blocks that are unique to each VM consume additional storage space.
All other blocks are shared.
[...]ZFS provides block-level deduplication because this is the finest granularity that makes sense for a general-purpose storage system.http://blogs.sun.com/bonwick/en\_US/entry/zfs\_dedup
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956802</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958582</id>
	<title>Re:Wake me when they build it into the hard disk</title>
	<author>seifried</author>
	<datestamp>1257174060000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>It's not just storage, it's about caching in RAM. If my Linux box caches, say, one gig of data that happens to be shared amongst multiple (nearly) identical VMs, I will see a huge performance increase vs. trying to cache 20 gigs of data (one for each of the 20 VMs). If it's the exact same data, why would I want multiple copies floating around unless I explicitly ask for them (e.g. RAID, Time Machine, backups)?
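<p>As a rough illustration (hypothetical Python, not how the ZFS ARC is actually implemented; it just shows the idea of a content-addressed cache), keying cached blocks by their hash means 20 near-identical VMs reading the same guest-OS block populate one cache entry instead of 20:</p><blockquote><div><pre>
import hashlib

cache = {}  # SHA-256 digest -> cached block

def cached_read(image, offset, size=64 * 1024):
    """Read one block, caching identical content only once across all images."""
    block = bytes(image[offset:offset + size])
    digest = hashlib.sha256(block).digest()
    # every image whose block matches this content shares this one entry
    return cache.setdefault(digest, block)
</pre></div></blockquote></htmltext>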
<tokenext>It 's not just storage , it 's about caching in ram .
If my Linux box caches say one gig of data that happens to be shared amongst multiple ( nearly ) identical VM 's I will see a huge performance increase vs. trying to cache 20 gigs of data ( one for each of the 20 VM 's ) .
If it 's the exact same data why would I want multiple copies floating around unless I explicitly ask for it ( i.e .
RAID , time machine , backups , etc .
) .</tokentext>
<sentencetext>It's not just storage, it's about caching in ram.
If my Linux box caches say one gig of data that happens to be shared amongst multiple (nearly) identical VM's I will see a huge performance increase vs. trying to cache 20 gigs of data (one for each of the 20 VM's).
If it's the exact same data why would I want multiple copies floating around unless I explicitly ask for it (i.e.
RAID, time machine, backups, etc.
).</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957336</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960628</id>
	<title>Re:Hash Collisions</title>
	<author>Hurricane78</author>
	<datestamp>1257190740000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>ZFS offers error scrubbing and repair. So the likelihood of losing data to a hardware failure goes way down, to nearly zero. (Your HDD would have to fail big time for it to pose any risk.)</p><p>But I don't think that scrubbing protects from hash collisions. Rather the opposite...</p>
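<p>Conceptually a scrub does something like this (a simplified sketch, assuming a mirrored pair and a checksum stored out-of-band in the block pointer; the real scrub code is far more involved): re-read each copy, verify it against the stored checksum, and heal bad copies from a good one. Note that it checks stored bytes against their checksum; it says nothing about two different blocks hashing alike.</p><blockquote><div><pre>
import hashlib

def scrub_block(copies, stored_checksum):
    """Toy scrub of one mirrored block: find a good copy, rewrite the bad ones."""
    good = None
    for data in copies:
        if hashlib.sha256(data).digest() == stored_checksum:
            good = data
            break
    if good is None:
        raise IOError("unrecoverable block: no copy matches its checksum")
    for i, data in enumerate(copies):
        if data != good:
            copies[i] = good  # repair the corrupt copy from the good one
    return good
</pre></div></blockquote></htmltext>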
<tokenext>ZFS offers error scrubbing and repair .
So the likeliness to lose data from a hardware failure goes way down , to nearly zero .
( Your HDD would have to fail big time , for it to pose any risk .
) But I do n't think that scrubbing protects from hash collisions .
Rather the opposite.. .</tokentext>
<sentencetext>ZFS offers error scrubbing and repair.
So the likeliness to lose data from a hardware failure goes way down, to nearly zero.
(Your HDD would have to fail big time, for it to pose any risk.
)But I don't think that scrubbing protects from hash collisions.
Rather the opposite...</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956960</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29967492</id>
	<title>Re:More reason to be a ZFS fanboy</title>
	<author>Big Boss</author>
	<datestamp>1257239580000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>I built a new server when I installed OpenSolaris, so I just booted the new server and used NFS to copy the data over. It worked really well. raidz is working really well for me, you should set up a test array and try it out. Even as file devices just for testing.</p></htmltext>
<tokenext>I built a new server when I installed OpenSolaris , so I just booted the new server and used NFS to copy the data over .
It worked really well .
raidz is working really well for me , you should set up a test array and try it out .
Even as file devices just for testing .</tokentext>
<sentencetext>I built a new server when I installed OpenSolaris, so I just booted the new server and used NFS to copy the data over.
It worked really well.
raidz is working really well for me, you should set up a test array and try it out.
Even as file devices just for testing.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960186</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961332</id>
	<title>Re:This is good news...</title>
	<author>Anonymous</author>
	<datestamp>1257243840000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>ZFS has "license problems" since the license it's under is incompatible with the GPL.<br>And it's troublesome :(</p></htmltext>
<tokenext>ZFS is under " license problems " since the license its under is incompatible with GPL.And its troublesome : (</tokentext>
<sentencetext>ZFS is under "license problems" since the license its under is incompatible with GPL.And its troublesome :(</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959438</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29964336</id>
	<title>I'm super happy about this!</title>
	<author>Mysticalfruit</author>
	<datestamp>1257267840000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>As someone who's got a HUGE amount of data currently in ZFS (and a lot of it is redundant!) I can't wait to get my hands on this!  I figure on my backup server alone it's going to save me tens of TBs' worth of space.<br><br>I just wish there were more details on which release of OpenSolaris or Solaris this is going to be in, or which patch sets will include it!</htmltext>
<tokenext>As someone whose got a HUGE amount of data currently in ZFS ( and a lot of it is redudant !
) I ca n't wait to get my hands on this !
I figure along on my backup server it 's going to save me 10 's of TB 's worth of space.I just wish there was more details on what release of Open Solaris or Solaris this is going to be in , or patch sets that 'll include this !</tokentext>
<sentencetext>As someone whose got a HUGE amount of data currently in ZFS (and a lot of it is redudant!
) I can't wait to get my hands on this!
I figure along on my backup server it's going to save me 10's of TB's worth of space.I just wish there was more details on what release of Open Solaris or Solaris this is going to be in, or patch sets that'll include this!</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957780</id>
	<title>SAN, ZFS with dedupe is not a backup system</title>
	<author>caseih</author>
	<datestamp>1257169980000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Don't mistake in-filesystem deduplication and snapshots for a backup system.  It's most certainly not a backup, and if you treat it as such you will eventually be very sorry.  A SAN with ZFS, snapshots, and deduplication features is at best an archive, which is distinct in form and purpose from a backup.  Still very useful, though.  Ideally you have both archive and backup systems.  To get a feel for the difference, consider that an archive is for when a user says, "I overwrote a file last week sometime.  Can you recover the version before I made this change or saved over this file?"  A backup, by contrast, is for recovering an entire system after a catastrophic failure (like a SAN dying).  Very distinct things.  Both are useful.</p><p>I get strange looks when I tell people that a Time Capsule is not a backup.  Nor is a single Time Machine external disk.  Now 2, 3 or even 4 external disks could constitute a backup (and, as a bonus with Time Machine, an archive also).</p></htmltext>
<tokenext>Do n't mistake in-filesystem deduplication and snapshots for a backup system .
It 's most certainly not backup and if you treat it as such you will eventually be very sorry .
A SAN with ZFS , snapshots , and deduplication features is at best an archive , which is distinct in form and purpose from a backup .
Still very useful , though .
Ideally you have both archive and backup systems .
To get a feel for the difference , consider that an archive is for when a user says , " I overwrote a file last week sometime .
Can you recover the version before I made this change or saved over this file ?
" Whereas a backup is for recovering an entire system from when there 's a catastrophic failure ( like a SAN dying ) .
Very distinct things .
Both are useful.I get strange looks when I tell people that a Time Capsule is not a backup .
Nor is a single Time Machine external disk .
Now 2 , 3 or even 4 external disks could constitute a backup ( and as a bonus with Time Machine an archive also ) .</tokentext>
<sentencetext>Don't mistake in-filesystem deduplication and snapshots for a backup system.
It's most certainly not backup and if you treat it as such you will eventually be very sorry.
A SAN with ZFS, snapshots, and deduplication features is at best an archive, which is distinct in form and purpose from a backup.
Still very useful, though.
Ideally you have both archive and backup systems.
To get a feel for the difference, consider that an archive is for when a user says, "I overwrote a file last week sometime.
Can you recover the version before I made this change or saved over this file?
"  Whereas a backup is for recovering an entire system from when there's a catastrophic failure (like a SAN dying).
Very distinct things.
Both are useful.I get strange looks when I tell people that a Time Capsule is not a backup.
Nor is a single Time Machine external disk.
Now 2, 3 or even 4 external disks could constitute a backup (and as a bonus with Time Machine an archive also).</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956752</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956810</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>jack2000</author>
	<datestamp>1257165480000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>Meet NTFS: it has a feature named SIS, for Single Instance Storage.
There's a service known as the SIS groveler; it scans your files and links them if they are duplicates, and it does that for parts of your files as well.</htmltext>
<tokenext>Meet NTFS , it has this thing named SiS for Single Instance Storage .
There 's a service known as the SiS groveller , it scans your files and links them if they are duplicate , it does that for parts of your files aswell .</tokentext>
<sentencetext>Meet NTFS, it has this thing named SiS for Single Instance Storage.
There's a service known as the SiS groveller, it scans your files and links them if they are duplicate, it does that for parts of your files aswell.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956676</id>
	<title>This is good news...</title>
	<author>Anonymous</author>
	<datestamp>1257164700000</datestamp>
	<modclass>Offtopic</modclass>
	<modscore>1</modscore>
	<htmltext><p>...and would normally make me happy; except I'm a Mac user. Still good news, but could've been better for a certain sub-set of the population, darn it. </p><p>File systems are one area where computer technology is lagging, comparatively speaking, so good to see innovation such as this.</p></htmltext>
<tokenext>...and would normally make me happy ; except I 'm a Mac user .
Still good news , but could 've been better for a certain sub-set of the population , darn it .
File systems are one area where computer technology is lagging , comparatively speaking , so good to see innovation such as this .</tokentext>
<sentencetext>...and would normally make me happy; except I'm a Mac user.
Still good news, but could've been better for a certain sub-set of the population, darn it.
File systems are one area where computer technology is lagging, comparatively speaking, so good to see innovation such as this.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957970</id>
	<title>Re:Wake me when they build it into the hard disk</title>
	<author>drsmithy</author>
	<datestamp>1257170880000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p> <i>I'd certainly expect that. I don't quite get what people are so desperate to de-duplicate anyway. A stripped VM os image is less than a gigabyte, you can fit 150 of them on a drive that costs less than $100. </i>
</p><p>Firstly, because dedup gives you the space savings without the hassle of "stripping" the VM image.
<br>Secondly, because dedup also delivers other advantages by reducing physical disk IOs, improving cache efficiency and reducing replication traffic.
<br>Thirdly, because enterprise storage costs a lot more than that, especially once you account for backups.
</p><p> <i>I can't really see many situations where the extra complexity and cost would end up actually saving money.</i>
</p><p>NetApp have quite a few white papers and blogs.  The most high profile winner is virtualisation, of course, but things like SAN-booted OS images, mailboxes, backups and data replication also see huge benefits.</p></htmltext>
<tokenext>I 'd certainly expect that .
I do n't quite get what people are so desperate to de-duplicate anyway .
A stripped VM os image is less than a gigabyte , you can fit 150 of them on a drive that costs less than $ 100 .
Firstly , because dedup gives you the space savings without the hassle of " stripping " the VM image .
Secondly , because dedup also delivers other advantages by reducing physical disk IOs , improving cache efficiency and reducing replication traffic .
Thirdly , because enterprise storage costs a lot more than that , especially once you account for backups .
I ca n't really see many situations where the extra complexity and cost would end up actually saving money .
NetApp have quite a few white papers and blogs .
The most high profile winner is virtualisation , of course , but things like SAN-booted OS images , mailboxes , backups and data replication also see huge benefits .</tokentext>
<sentencetext> I'd certainly expect that.
I don't quite get what people are so desperate to de-duplicate anyway.
A stripped VM os image is less than a gigabyte, you can fit 150 of them on a drive that costs less than $100.
Firstly, because dedup gives you the space savings without the hassle of "stripping" the VM image.
Secondly, because dedup also delivers other advantages by reducing physical disk IOs, improving cache efficiency and reducing replication traffic.
Thirdly, because enterprise storage costs a lot more than that, especially once you account for backups.
I can't really see many situations where the extra complexity and cost would end up actually saving money.
NetApp have quite a few white papers and blogs.
The most high profile winner is virtualisation, of course, but things like SAN-booted OS images, mailboxes, backups and data replication also see huge benefits.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957336</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961068</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>bertok</author>
	<datestamp>1257239760000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p><div class="quote"><p>Are there any other filesystems with that feature?  If not, I'm very strongly considering writing my own.</p></div><p>I was actually thinking the same kind of thing a few years back, but I did some back-of-the-envelope maths and realized that a de-dupe filesystem is actually quite hard to implement.</p><p>A naive implementation is simple, but <i>slow</i>. The issue is that the hash codes are basically random, so you have to store all of them in memory, or suffer horrendously expensive random disk lookups, which can't be cached easily.</p><p>Imagine this scenario: if you use SHA-256, then that's 32 bytes per hash code, minimum. If you take a single 2TB SATA disk, and carve it up into (relatively large) 64 KB blocks, then you have about 32M blocks, or about 1GB of raw hash code data that you have to keep in RAM, all at once, <i>ignoring overheads</i>, which are substantial. In practice, expect that to be more like 2 to 4GB. Sure, that's only about 0.05% of the original disk capacity, but that's just <i>one disk</i>! A Sun Thumper has 48 SATA disks in a single chassis, or about 80 TB usable after overheads, which adds up to at least 40 GB of hash code data, or more like 80-100 GB for a typical naive implementation. That's a lot of data to be keeping in the kernel, and would require 128GB of physical memory in the server if you also wanted some room for file data caches and whatnot.</p><p>Real-world de-dupe filers often use several fancy algorithms at once to reduce effective RAM requirements, but it takes a lot of work. For example, some filers use hierarchical hashes, others use <a href="http://en.wikipedia.org/wiki/Bloom_filter" title="wikipedia.org">Bloom Filters</a> [wikipedia.org], and I've heard of filers that partition the hash table and use file identification heuristics to load likely partitions on demand.</p>
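<p>The arithmetic is easy to reproduce (a worked check under the same assumptions: 32-byte SHA-256 digests, 64 KB blocks, and no per-entry overhead, which a real dedup table very much has):</p><blockquote><div><pre>
def raw_hash_table_bytes(capacity_bytes, block_size=64 * 1024, digest_size=32):
    """Raw digest payload for one device: one digest per block, nothing else."""
    return (capacity_bytes // block_size) * digest_size

TB = 10**12
print(raw_hash_table_bytes(2 * TB) / 1e9)   # one 2 TB disk: ~1 GB of digests
print(raw_hash_table_bytes(80 * TB) / 1e9)  # ~80 TB Thumper: ~39 GB of digests
</pre></div></blockquote>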
	</htmltext>
<tokenext>Are there any other filesystems with that feature ?
If not , I 'm very strongly considering writing my own.I was actually thinking the same kind of thing a few years back , but I did some back-of-the-envelope maths and realized that a de-dupe filesystem is actually quite hard to implement.A naive implementation is simple , but slow .
The issue is that the hash codes are basically random , so you have to store all of them in memory , or suffer horrendously expensive random disk lookups , which ca n't be cached easily.Imagine this scenario : If you use SHA-256 , then that 's 32 bytes per has code , minimum .
If you take a single 2TB SATA disk , and carve it up into ( relatively large ) 64 KB blocks , then you have 16M blocks , or 512MB of raw hash code data that you have to keep in RAM , all at once , ignoring overheads , which are substantial .
In practice , expect that to be more like 1 or 2GB .
Sure , that 's only 0.1 \ % of the original disk capacity , but that 's just one disk !
A SUN thumper has 48 SATA disks in a single chassis , or about 80 TB usable after overheads , which adds up to at least 40 GB of hash code data , or more like 80-100 GB for a typical naive implementation .
That 's a lot of data to be keeping in the kernel , and would require 128GB of physical memory in the server if you also wanted some room for file data caches and whatnot.Real world de-dupe filers often use several fancy algorithms at once to reduce effective RAM requirements , but it takes a lot of work .
For example , some filers use hierarchical hashes , others use Bloom Filters [ wikipedia.org ] , and I 've heard of filers that partition the hashtable and use file identification heuristics to load likely partitions on demand .</tokentext>
<sentencetext>Are there any other filesystems with that feature?
If not, I'm very strongly considering writing my own.I was actually thinking the same kind of thing a few years back, but I did some back-of-the-envelope maths and realized that a de-dupe filesystem is actually quite hard to implement.A naive implementation is simple, but slow.
The issue is that the hash codes are basically random, so you have to store all of them in memory, or suffer horrendously expensive random disk lookups, which can't be cached easily.Imagine this scenario: If you use SHA-256, then that's 32 bytes per has code, minimum.
If you take a single 2TB SATA disk, and carve it up into (relatively large) 64 KB blocks, then you have 16M blocks, or 512MB of raw hash code data that you have to keep in RAM, all at once, ignoring overheads, which are substantial.
In practice, expect that to be more like 1 or 2GB.
Sure, that's only 0.1\% of the original disk capacity, but that's just one disk!
A SUN thumper has 48 SATA disks in a single chassis, or about 80 TB usable after overheads, which adds up to at least 40 GB of hash code data, or more like 80-100 GB for a typical naive implementation.
That's a lot of data to be keeping in the kernel, and would require 128GB of physical memory in the server if you also wanted some room for file data caches and whatnot.Real world de-dupe filers often use several fancy algorithms at once to reduce effective RAM requirements, but it takes a lot of work.
For example, some filers use hierarchical hashes, others use Bloom Filters [wikipedia.org], and I've heard of filers that partition the hashtable and use file identification heuristics to load likely partitions on demand.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956686</id>
	<title>First posts!</title>
	<author>Anonymous</author>
	<datestamp>1257164700000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>I wrote two first posts, but I guess /. is on ZFS now.</p></htmltext>
<tokenext>I wrote two first posts , but I guess / .
is on ZFS now .</tokentext>
<sentencetext>I wrote two first posts, but I guess /.
is on ZFS now.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958186</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>644bd346996</author>
	<datestamp>1257172020000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>I use hard links frequently on my NTFS filesystem (albeit created from within cygwin bash). NTFS also supports symbolic links and mount points these days, although Microsoft clearly has no interest in exposing those features to consumers.</p></htmltext>
<tokenext>I use hard links frequently on my NTFS filesystem ( albeit created from within cygwin bash ) .
NTFS also supports symbolic links and mount points these days , although Microsoft clearly has no interest in exposing those features to consumers .</tokentext>
<sentencetext>I use hard links frequently on my NTFS filesystem (albeit created from within cygwin bash).
NTFS also supports symbolic links and mount points these days, although Microsoft clearly has no interest in exposing those features to consumers.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957456</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958578</id>
	<title>Re:More reason to be a ZFS fanboy</title>
	<author>Drishmung</author>
	<datestamp>1257174060000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>And a 255 byte filename limit. Not 255 unicode characters, 255 bytes. ReiserFS got this right. Btrfs alas gets it wrong. (Just call me picky)</htmltext>
<tokenext>And a 255 byte filename limit .
Not 255 unicode characters , 255 bytes .
ReiserFS got this right .
Btrfs alas gets it wrong .
( Just call me picky )</tokentext>
<sentencetext>And a 255 byte filename limit.
Not 255 unicode characters, 255 bytes.
ReiserFS got this right.
Btrfs alas gets it wrong.
(Just call me picky)</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957072</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956786</id>
	<title>Re:Hash Collisions</title>
	<author>Anonymous</author>
	<datestamp>1257165360000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>If blocks that are supposedly from different files have the same block data, does it really matter if it's marked redundant?</p><p>Not only that, do you really think a SHA256 hash collision can occur? And even if it can, the hash table is there to save CPU time: it gives a quick check instead of comparing every block to be written against all the data already stored. If two blocks somehow do have the same hash, the data SHOULD be compared byte by byte, and only THEN marked redundant.</p>
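<p>A minimal sketch of that write path (hypothetical Python; ZFS's dedup=verify option behaves this way in spirit, but this is not its implementation): the cheap hash lookup filters first, and the expensive byte-for-byte comparison runs only on a hash hit.</p><blockquote><div><pre>
import hashlib

table = {}  # SHA-256 digest -> stored block

def dedup_write(block, verify=True):
    """Return the block this write resolves to: a shared copy or a new one."""
    digest = hashlib.sha256(block).digest()
    existing = table.get(digest)
    if existing is None:
        table[digest] = block   # first time this content is seen: store it
        return block
    if verify and existing != block:
        # a genuine SHA-256 collision; astronomically unlikely, but a real
        # system must store this block separately rather than corrupt data
        return block
    return existing             # duplicate content: share the existing copy
</pre></div></blockquote></htmltext>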
<tokenext>If blocks that are supposedly from different files have the same block data , does it really matter if it 's marked redundant ? Not only that , do you really think a SHA256 hash collision can occur ?
And even if it does , for the sake of CPU time , a hash table is made for a quick check rather than checking every piece of data from the to be written and already available data to see if there is a copy in situations as this .
If somehow they have the same hash , it SHOULD be checked to see if it is the same data byte by byte , THEN marked redundant .</tokentext>
<sentencetext>If blocks that are supposedly from different files have the same block data, does it really matter if it's marked redundant?Not only that, do you really think a SHA256 hash collision can occur?
And even if it does, for the sake of CPU time, a hash table is made for a quick check rather than checking every piece of data from the to be written and already available data to see if there is a copy in situations as this.
If somehow they have the same hash, it SHOULD be checked to see if it is the same data byte by byte, THEN marked redundant.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957752</id>
	<title>Re:Wake me when they build it into the hard disk</title>
	<author>PRMan</author>
	<datestamp>1257169860000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>It would be great for ISPs, where each of their user instances have files in common.  Also, for a backup drive for user PCs, where each user has the OS and probably a lot of documents in common.</htmltext>
<tokenext>It would be great for ISPs , where each of their user instances have files in common .
Also , for a backup drive for user PCs , where each user has the OS and probably a lot of documents in common .</tokentext>
<sentencetext>It would be great for ISPs, where each of their user instances have files in common.
Also, for a backup drive for user PCs, where each user has the OS and probably a lot of documents in common.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957336</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956760</id>
	<title>Re:Hash Collisions</title>
	<author>Score Whore</author>
	<datestamp>1257165180000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Yeah. If you are concerned by the fact that a block might be 128 KB and the hashed value is only 256 bits, then an option like:</p><blockquote><div><p>zfs set dedup=verify tank</p></div></blockquote><p>Might be helpful.</p>
	</htmltext>
<tokenext>Yeah .
If you are concerned by the fact that a block might be 128 KB and the hashed value is only 256 bits , then an option like : zfs set dedup = verify tankMight be helpful .</tokentext>
<sentencetext>Yeah.
If you are concerned by the fact that a block might be 128 KB and the hashed value is only 256 bits, then an option like:zfs set dedup=verify tankMight be helpful.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962098</id>
	<title>Re:Hash Collisions</title>
	<author>samjam</author>
	<datestamp>1257254040000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>And when you've got 10^6 customers (and your sales people REALLY want to make it come true) or customers with 10^6 more files than most, it gets quite likely that a few of them are going to get "strange corruptions" which:<br>1) you won't be able to detect the cause of<br>2) everybody will think is bad memory/cables/software<br>but really it will be your fault.</p><p>10^6 is a small number.</p><p>I caught someone using MD5 instead of RC5 to "encrypt" personal database keys once; not only were the chances of collision less than what you cite, the harm from collision was minimal (it was statistical research data for trend recognition), but the real key had fewer bits than the MD5 output, so I think there was not actually any collision at all.</p>
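<p>For scale, the birthday bound is easy to check (a worked example, assuming uniformly random digests: the chance that any two of n blocks share a 256-bit hash is roughly n^2 / 2^257):</p><blockquote><div><pre>
from math import log2

def collision_odds_log2(n, bits=256):
    """log2 of the approximate chance that n random digests collide."""
    return 2 * log2(n) - (bits + 1)  # birthday approximation: n^2 / 2^(bits+1)

print(collision_odds_log2(10**6))   # about -217: one chance in ~2^217
print(collision_odds_log2(10**12))  # a trillion blocks: still about 2^-177
</pre></div></blockquote><p>Sam</p></htmltext>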
<tokenext>And when you 've got 10 ^ 6 customers ( and your sales people REALLY want to make it come true ) or customers with 10 ^ 6 more files than most , it gets quite likely that a few of them are going to get " strange corruptions " which : 1 ) you wo n't be able to detect the cause of2 ) everybody will think is bad memory/cables/softwarebut really it will be your fault.10 ^ 6 is a small number.I caught someone using MD5 instead of RC5 to " encrypt " personal database keys once ; not only were the chances of collision less that what you cite , the harm from collision was minimal ( it was statistical research data for trend recognition ) but they real key had less bits than MD5 output , so I think there was not actually any collision at all.Sam</tokentext>
<sentencetext>And when you've got 10^6 customers (and your sales people REALLY want to make it come true) or customers with 10^6 more files than most, it gets quite likely that a few of them are going to get "strange corruptions" which:1) you won't be able to detect the cause of2) everybody will think is bad memory/cables/softwarebut really it will be your fault.10^6 is a small number.I caught someone using MD5 instead of RC5 to "encrypt" personal database keys once; not only were the chances of collision less that what you cite, the harm from collision was minimal (it was statistical research data for trend recognition) but they real key had less bits than MD5 output, so I think there was not actually any collision at all.Sam</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956960</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29965698</id>
	<title>Re:More reason to be a ZFS fanboy</title>
	<author>Anonymous</author>
	<datestamp>1257273660000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>I am building a new home NAS and I have been seriously considering ZFS. However, I most likely won't go this route because there are ZFS failure modes that are catastrophic.<br><a href="http://www.opensolaris.org/jive/thread.jspa?threadID=108213&amp;tstart=0" title="opensolaris.org" rel="nofollow">http://www.opensolaris.org/jive/thread.jspa?threadID=108213&amp;tstart=0</a> [opensolaris.org]</p><p>I really don't want to lose all my data in the filesystem because the machine locked up at the wrong time. I may reconsider ZFS once automated recovery tools become available.</p></htmltext>
<tokenext>I am building a new home NAS and I have been seriously considering zfs .
However , I most likely wo n't go this route because there exist zfs failures that are catastrophic.http : //www.opensolaris.org/jive/thread.jspa ? threadID = 108213&amp;tstart = 0 [ opensolaris.org ] I really do n't want to loose all my data in the filesystem because the machine locked up at the wrong time .
I may reconsider zfs once automated recovery tools become available .</tokentext>
<sentencetext>I am building a new home NAS and I have been seriously considering zfs.
However, I most likely won't go this route because there exist zfs failures that are catastrophic.http://www.opensolaris.org/jive/thread.jspa?threadID=108213&amp;tstart=0 [opensolaris.org]I really don't want to loose all my data in the filesystem because the machine locked up at the wrong time.
I may reconsider zfs once automated recovery tools become available.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957072</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796</id>
	<title>Wake me when they build it into the hard disk</title>
	<author>Anonymous</author>
	<datestamp>1257165480000</datestamp>
	<modclass>Interestin</modclass>
	<modscore>4</modscore>
	<htmltext><p>Imagine the amount of stuff you could (unreliably) store on a hard disk if massive de-duplication were built into the drive electronics.  It could even do this quietly in the background.</p><p>I say unreliably, because years ago we had a Novell server that used an automated compression scheme.  Eventually, the drive got full anyway, and we had to migrate to a larger disk.</p><p>But since the copy operation de-compressed files on the fly, we couldn't copy: any attempt to reference several large compressed files instantly consumed all remaining space on the drive.  What ensued was a nightmare of copying and deleting files, beginning with the smallest and working our way up to the largest.  It took over a day of manual effort before we freed up enough space to mass-move the remaining files.</p><p>De-duplication is pretty much the same thing: compression by recording and eliminating duplicates.  But any minor automated update of some files runs the risk of changing them such that what was a duplicate must now be stored separately.</p><p>This could trigger a similar situation where there was suddenly not enough room to store the same amount of data that was already on the device.  (For some values of "suddenly" and "already").</p><p>For archival stuff or OS components (executables, source code, etc.) which virtually never change, this would be great.</p><p>But there is hell to pay somewhere down the road.</p></htmltext>
<tokenext>Imagine he amount of stuff you could ( unreliably ) store on a hard disk if massive de-duplication was built into the drive electronics .
It could even do this quietly in the background.I say unreliably , because years ago we had a Novell server that used an automated compression scheme .
Eventually , the drive got full anyway , and we had to migrate to a larger disk.But since the copy operation de-compressed files on the fly we could n't copy because any attempt to reference several large compressed files instantly consumed all remaining space on the drive .
What ensued was a nightmare of copy and delete files beginning with the smallest , and working our way up to the largest .
It took over a day of manual effort before we freed up enough space to mass-move the remaining files.De-duplication is pretty much the same thing , compression by recording and eliminating duplicates .
But any minor automated update of some files runs the risk of changing them such that what was a duplicate , must now be stored separately.This could trigger a similar situation where there was suddenly not enough room to store the same amount of data that was already on the device .
( For some values of " suddenly " and " already " ) .For archival stuff or OS components ( executables , and source code etc ) which virtually never change this would be great.But there is a hell to pay somewhere down the road .</tokentext>
<sentencetext>Imagine he amount of stuff you could (unreliably) store on a hard disk if massive de-duplication was built into the drive electronics.
It could even do this quietly in the background.I say unreliably, because years ago we had a Novell server that used an automated compression scheme.
Eventually, the drive got full anyway, and we  had to migrate to a larger disk.But since the copy operation de-compressed files on the fly we couldn't copy because any attempt to reference several large compressed files instantly consumed all remaining space on the drive.
What ensued was a nightmare of copy and delete files beginning with the smallest, and working our way up to the largest.
It took over a day of manual effort before we freed up enough space to mass-move the remaining files.De-duplication is pretty much the same thing, compression by recording and eliminating duplicates.
But any minor automated update of some files runs the risk of changing them such that what was a duplicate, must now be stored separately.This could trigger a similar situation where there was suddenly not enough room to store the same amount of data that was already on the device.
(For some values of "suddenly" and "already").For archival stuff or OS components (executables, and source code etc) which virtually never change this would be great.But there is a hell to pay somewhere down the road.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29963514</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>JoeMerchant</author>
	<datestamp>1257264180000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p><div class="quote"><p> the disk was thrashed like crazy</p></div><p>Isn't that the sign of an "advanced" OS?  Each new version of Windows has progressively thrashed my hard drive more until I finally got Vista Ultimate and now the hard drive never stops.</p>
	</htmltext>
<tokenext>the disk was thrashed like crazyIs n't that the sign of an " advanced " OS ?
Each new version of Windows has progressively thrashed my hard drive more until I finally got Vista Ultimate and now the hard drive never stops .</tokentext>
<sentencetext> the disk was thrashed like crazyIsn't that the sign of an "advanced" OS?
Each new version of Windows has progressively thrashed my hard drive more until I finally got Vista Ultimate and now the hard drive never stops.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958518</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956770</id>
	<title>Re:This is good news...</title>
	<author>Anonymous</author>
	<datestamp>1257165300000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>4</modscore>
	<htmltext><blockquote><div><p>...and would normally make me happy; except I'm a Mac user. Still good news, but could've been better for a certain sub-set of the population, darn it.</p></div></blockquote><p>Use open source, get cutting edge things.</p>
	</htmltext>
<tokenext>...and would normally make me happy ; except I 'm a Mac user .
Still good news , but could 've been better for a certain sub-set of the population , darn it.Use open source , get cutting edge things .</tokentext>
<sentencetext>...and would normally make me happy; except I'm a Mac user.
Still good news, but could've been better for a certain sub-set of the population, darn it.Use open source, get cutting edge things.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956676</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958598</id>
	<title>Re:More reason to be a ZFS fanboy</title>
	<author>Anonymous</author>
	<datestamp>1257174180000</datestamp>
	<modclass>Informativ</modclass>
	<modscore>3</modscore>
	<htmltext><p>How alarmist and uninformed; borderline FUD.  The reality is as follows...</p><p>First, you can't remove a vdev yet, but development is in progress, and support is expected very soon now.  Same with crypto.</p><p>Second, mistakenly typing add instead of attach will produce a warning that the specified redundancy is different, and the command will refuse to add it.</p><p>Third, yes, you can't expand the width of a RAID-Z.  You can still grow it, though, by replacing its drives with larger ones.  Once the block pointer rewrite work is merged, removal will be possible, and expansion won't be far off either.</p><p>Fourth, vdevs no longer autoexpand by default.  If you want that behavior, you can set the autoexpand property to on.</p><p>Last, there was no such assumption; it is simply a matter of priorities.  If it were an easier problem, it would have been done long ago, but I'm happy to be patient, knowing that it will be done right.  Most everyone who has seriously used ZFS will understand that the advantages will hugely outweigh these minor nits, which are easily worked around.</p></htmltext>
<tokenext>How alarmist and uninformed ; borderline FUD .
The reality is as follows...First , you ca n't remove a vdev yet , but development is in progress , and support is expected very soon now .
Same with crypto.Second , mistakenly typing add instead of attach will result in a warning that the specified redundancy is different , and refuse to add it.Third , yes , you ca n't expand the width of a RAID-Z .
You can still grow it though , by replacing it with larger drives .
Once the block pointer rewrite work is merged , removal will be possible , and expansion wo n't be far off either.Forth , vdevs no longer autoexpand by default .
If you want that behavior , you can to set the autoexpand property to yes.Last , there was no such assumption , it is simply a matter of priorities .
If it were an easier problem , it would have been done long ago , but I 'm happy to be patient , knowing that it will be done right .
Most everyone who has seriously used ZFS will understand that the advantages will hugely outweigh these minor nits , which are easily worked around .</tokentext>
<sentencetext>How alarmist and uninformed; borderline FUD.
The reality is as follows...First, you can't remove a vdev yet, but development is in progress, and support is expected very soon now.
Same with crypto.Second, mistakenly typing add instead of attach will result in a warning that the specified redundancy is different, and refuse to add it.Third, yes, you can't expand the width of a RAID-Z.
You can still grow it though, by replacing it with larger drives.
Once the block pointer rewrite work is merged, removal will be possible, and expansion won't be far off either.Forth, vdevs no longer autoexpand by default.
If you want that behavior, you can to set the autoexpand property to yes.Last, there was no such assumption, it is simply a matter of priorities.
If it were an easier problem, it would have been done long ago, but I'm happy to be patient, knowing that it will be done right.
Most everyone who has seriously used ZFS will understand that the advantages will hugely outweigh these minor nits, which are easily worked around.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957072</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29973000</id>
	<title>Re:Wake me when they build it into the hard disk</title>
	<author>ZorkZero</author>
	<datestamp>1257265320000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>What you seem to be describing is file-level deduplication, which is not what is being described here.</p></htmltext>
<tokenext>What you seem to be describing is file-level deduplication , which is not what is being described here .</tokentext>
<sentencetext>What you seem to be describing is file-level deduplication, which is not what is being described here.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958320</id>
	<title>Cause and effect</title>
	<author>scanrate</author>
	<datestamp>1257172740000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>And the modification of a duplicate block will generate not only a copy-on-write fault but also a law suit by whoever owns the COW patent.</p></htmltext>
<tokenext>And the modification of a duplicate block will generate not only a copy-on-write fault but also a law suit by whoever owns the COW patent .</tokentext>
<sentencetext>And the modification of a duplicate block will generate not only a copy-on-write fault but also a law suit by whoever owns the COW patent.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958516</id>
	<title>Re:Open Source Cures Cancer</title>
	<author>Anonymous</author>
	<datestamp>1257173700000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>You can run lots of Windows software, including Photoshop, Excel, AutoCAD, and many games, using Wine.</p></htmltext>
<tokenext>You can run lots of windows software including photoshop , excel , autocad , and many games using Wine .</tokentext>
<sentencetext>You can run lots of windows software including photoshop, excel, autocad, and many games using Wine.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957442</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960288</id>
	<title>Re:I worked with De-duplication</title>
	<author>Anonymous</author>
	<datestamp>1257187260000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>You need to read the posts way up above that show the math for the probability. Given the advantages that de-dupe offers, it's nearly always worth it.</p></htmltext>
<tokenext>You need to read the posts way up above that show the math for the probability .
Given the advantages that de-dupe offers , it 's nearly always worth it .</tokentext>
<sentencetext>You need to read the posts way up above that show the math for the probability.
Given the advantages that de-dupe offers, it's nearly always worth it.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957772</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29988350</id>
	<title>Re:More reason to be a ZFS fanboy</title>
	<author>duffbeer703</author>
	<datestamp>1256994240000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Another example: Your company gets sued and you have to capture user data from 200 computers. The lawyers do their thing a year later and tell you that you can throw out data from 150 of them. Now you have 10TB of empty SAN that costs you $27/GB/mo.</p></htmltext>
<tokenext>Another example : Your company gets sued and you have to capture user data from 200 computers .
The lawyers do their thing a year later and tell you that you can throw out data from 150 of them .
Now you have 10TB of empty SAN that costs you $ 27/GB/mo .</tokentext>
<sentencetext>Another example: Your company gets sued and you have to capture user data from 200 computers.
The lawyers do their thing a year later and tell you that you can throw out data from 150 of them.
Now you have 10TB of empty SAN that costs you $27/GB/mo.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959644</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956802</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>iMaple</author>
	<datestamp>1257165480000</datestamp>
	<modclass>Informativ</modclass>
	<modscore>5</modscore>
	<htmltext><p>Windows Storage Server 2003  (yes, yes I know it's from Microsoft) shipped with this feature (that is called Single Instance Storage)<br><a href="http://blogs.technet.com/josebda/archive/2008/01/02/the-basics-of-single-instance-storage-sis-in-wss-2003-r2-and-wudss-2003.a" title="technet.com">http://blogs.technet.com/josebda/archive/2008/01/02/the-basics-of-single-instance-storage-sis-in-wss-2003-r2-and-wudss-2003.a</a> [technet.com]</p></htmltext>
<tokenext>Windows Storage Server 2003 ( yes , yes I know its from Microsoft ) shipped with this feature ( that is called Single Instance Storage ) http : //blogs.technet.com/josebda/archive/2008/01/02/the-basics-of-single-instance-storage-sis-in-wss-2003-r2-and-wudss-2003.a [ technet.com ]</tokentext>
<sentencetext>Windows Storage Server 2003  (yes, yes I know its from Microsoft) shipped with this feature (that is called Single Instance Storage)http://blogs.technet.com/josebda/archive/2008/01/02/the-basics-of-single-instance-storage-sis-in-wss-2003-r2-and-wudss-2003.a [technet.com]</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956906</id>
	<title>Re:This is good news...</title>
	<author>MBCook</author>
	<datestamp>1257166080000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>It's neat. I can see it being rather useful for our systems at work to de-duplicate our VMs (and perhaps our DB files, since we have replicated slaves). Network storage (where multiple users may have their own copies of static documents that they've never edited) could benefit, perhaps email storage as well.
</p><p>Personally though, I don't think there is too much on my hard drive that would benefit from this. I would <i>love</i> for OS X to get the built-in checksumming that ZFS has, so it can detect silent corruption that may have happened during a bad boot, power loss, etc. when I try to read the file later.
</p><p>It's pretty obvious that HFS+ will have to be replaced soon, and Apple is reportedly working on it (since they ditched ZFS). I'd really like the checksumming, at this point (having so much cheap storage and extra CPU cycles) it should be a gimme.</p></htmltext>
<tokenext>It 's neat .
I can see it being rather useful for our systems at work to de-duplicate our VMs ( and perhaps our DB files , since we have replicated slaves ) .
Network storage ( where multiple users may have their own copies of static documents that they 've never edited ) could benefit , perhaps email storage as well .
Personally though , I do n't think there is too much on my hard drive that would benefit from this .
I would love for OS X to get the built in checksumming that ZFS has so it can detect silent corruption that may have happened during a bad boot/power loss etc when I try to read the file later .
It 's pretty obvious that HFS + will have to be replaced soon , and Apple is reportedly working on it ( since they ditched ZFS ) .
I 'd really like the checksumming , at this point ( having so much cheap storage and extra CPU cycles ) it should be a gim me .</tokentext>
<sentencetext>It's neat.
I can see it being rather useful for our systems at work to de-duplicate our VMs (and perhaps our DB files, since we have replicated slaves).
Network storage (where multiple users may have their own copies of static documents that they've never edited) could benefit, perhaps email storage as well.
Personally though, I don't think there is too much on my hard drive that would benefit from this.
I would love for OS X to get the built in checksumming that ZFS has so it can detect silent corruption that may have happened during a bad boot/power loss etc when I try to read the file later.
It's pretty obvious that HFS+ will have to be replaced soon, and Apple is reportedly working on it (since they ditched ZFS).
I'd really like the checksumming, at this point (having so much cheap storage and extra CPU cycles) it should be a gimme.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956676</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29967664</id>
	<title>Re:BTRFS is better</title>
	<author>Anonymous</author>
	<datestamp>1257240480000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>"putting aside some of the enterprise features that most of us don't need" - Oh you need them, you just very likely don't ever <em>see</em> them.  I work for the largest storage vendor on the planet and your bank account lives on enterprise storage as do your medical records, insurance, etc.<br><br><br>I really look forward to seeing btrfs and other ideas like <a href="http://www.lessfs.com/" title="lessfs.com" rel="nofollow">lessfs</a> [lessfs.com] move forward and hope that people will be amazed at what they're actually doing because it is really, <em>really</em> hard to get this stuff right so that you get your data back.  Deduplication introduces brain-hurting complexities and challenges.</p></htmltext>
<tokenext>" putting aside some of the enterprise features that most of us do n't need " - Oh you need them , you just very likely do n't ever see them .
I work for the largest storage vendor on the planet and your bank account lives on enterprise storage as do your medical records , insurance , etc.I really look forward to seeing btrfs and other ideas like lessfs [ lessfs.com ] move forward and hope that people will be amazed at what they 're actually doing because it is really , really hard to get this stuff right so that you get your data back .
Deduplication introduces brain-hurting complexities and challenges .</tokentext>
<sentencetext>"putting aside some of the enterprise features that most of us don't need" - Oh you need them, you just very likely don't ever see them.
I work for the largest storage vendor on the planet and your bank account lives on enterprise storage as do your medical records, insurance, etc.
I really look forward to seeing btrfs and other ideas like lessfs [lessfs.com] move forward and hope that people will be amazed at what they're actually doing because it is really, really hard to get this stuff right so that you get your data back.</sentencetext>
Deduplication introduces brain-hurting complexities and challenges.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959318</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957690</id>
	<title>well ...</title>
	<author>wsanders</author>
	<datestamp>1257169680000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>There are enough tales of woe in the discussion groups about ZFS file systems that have melted down on people that I would not start shorting the midrange storage companies' stock just yet. I myself have an 18TB ZFS filesystem on an X4540, and it was brought to a standstill a few weeks ago by one dead SATA disk. Didn't lose any data, and it might be buggy hardware and drivers, but still, Sun support had no explanation. That should not happen!</p><p>I'm still a ZFS fanboy though - for about $1 per GB, how can you lose? The host is a backup / virtual tape library server so it's not super high availability, and it's hella fast. No problem stuffing data into it at 2 x 1000baseT wire speed.</p></htmltext>
<tokenext>There are enough tales of woe in the discussion groups of ZFS file systems that have melted down on people that I would not start shorting the midrange storage companies ' stock just yet .
I myself have an 18TB ZFS filesystem on a X4540 and it was brought to a standstill a few weeks ago by one dead SATA disk .
Did n't lose any data , and it might be buggy hardware and drivers , but still , Sun support had no explanation .
That should not happen !
I 'm still a ZFS fanboy though - for about $ 1 per GB , how can you lose ?
The host is a backup / virtual tape library server so it 's not super high availability , and it 's hella fast .
No problem stuffing data into it at 2 X 1000baseT wire speed .</tokentext>
<sentencetext>There are enough tales of woe in the discussion groups of ZFS file systems that have melted down on people that I would not start shorting the midrange storage companies' stock just yet.
I myself have an 18TB ZFS filesystem on a X4540 and it was brought to a standstill a few weeks ago by one dead SATA disk.
Didn't lose any data, and it might be buggy hardware and drivers, but still, Sun support had no explanation.
That should not happen!
I'm still a ZFS fanboy though - for about $1 per GB, how can you lose?
The host is a backup / virtual tape library server so it's not super high availability, and it's hella fast.
No problem stuffing data into it at 2 X 1000baseT wire speed.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956752</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29965556</id>
	<title>Re:There are three types of files.</title>
	<author>davecb</author>
	<datestamp>1257273060000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>There are also "Brinch Hansen" files, named after Per Brinch Hansen, who implemented them in (from memory) the R2000 OS. They were write-once, read-once, and they disappeared "soon" after the reader read the block.</p><p>They were used for large-scale intercommunication, rather like a pipe or queue, but larger and on-disk. That allowed one to pick up and continue from where you left off if your program or OS crashed, modulo some definition of "soon".</p><p>--dave</p></htmltext>
<tokenext>There are also " brinch hansen " files , named after Per Brinch Hansen , who implemented them in ( from memory ) the R2000 OS .
They were write-once , read-once , and they disappeared " soon " after the reader read the block .
They were used for large-scale intercommunication , rather like a pipe or queue , but larger and on-disk .
That allowed one to pick up and continue from where you left off if your program or OS crashed , modulo some definition of " soon " .
--dave</tokentext>
<sentencetext>There are also "brinch hansen" files, named after Per Brinch Hansen, who implemented them in (from memory) the R2000 OS.
They were write-once, read-once, and they disappeared "soon" after the reader read the block.
They were used for large-scale intercommunication, rather like a pipe or queue, but larger and on-disk.
That allowed one to pick up and continue from where you left off if your program or OS crashed, modulo some definition of "soon".
--dave</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958384</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956668</id>
	<title>Does that mean...</title>
	<author>Anonymous</author>
	<datestamp>1257164640000</datestamp>
	<modclass>Funny</modclass>
	<modscore>4</modscore>
	<htmltext><p>Duplicate slashdot articles will be links back to the original one?</p></htmltext>
<tokenext>Duplicate slashdot articles will be links back to the original one ?</tokentext>
<sentencetext>Duplicate slashdot articles will be links back to the original one?</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957456</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>Anonymous</author>
	<datestamp>1257168780000</datestamp>
	<modclass>Informativ</modclass>
	<modscore>4</modscore>
	<htmltext><p>From that link: It is file-based and a service indexes it (whereas in ZFS it is block-based and on-the-fly). And they first introduced it in Windows 2000 Server. Amazing. I'm sure it is an ugly hack since Windows has no soft/hard-links IIRC.</p></htmltext>
<tokenext>From that link : It is file-based and a service indexes it ( whereas in ZFS it is block-based and on-the-fly ) .
And they first introduced it in Windows 2000 Server .
Amazing .
I 'm sure it is an ugly hack since Windows has no soft/hard-links IIRC .</tokentext>
<sentencetext>From that link: It is file-based and a service indexes it (whereas in ZFS it is block-based and on-the-fly).
And they first introduced it in Windows 2000 Server.
Amazing.
I'm sure it is an ugly hack since Windows has no soft/hard-links IIRC.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956802</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957452</id>
	<title>Re:Hash Collisions</title>
	<author>Anonymous</author>
	<datestamp>1257168780000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Apparently you suck at math as much as in life.</p></htmltext>
<tokenext>Apparently you suck at math as much as in life .</tokentext>
<sentencetext>Apparently you suck at math as much as in life.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957658</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>PRMan</author>
	<datestamp>1257169560000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>And worse... What happens when you go through a set of files A and change a single IP address in each of them, defeating the deduplication, while filesets B &amp; C still point to the same set? Now you have just increased your disk space usage by 200% while not increasing the "size" of the files at all.</p><p>This will be extremely counter-intuitive when you run out of disk space by globally changing "192.168.1.1" to "192.168.1.2" in a huge set of files.</p></htmltext>
<tokenext>And worse... What happens when you go through a set of files A and change a single IP address in each of them , defeating the deduplication , while filesets B &amp; C still point to the same set ?
Now you have just increased your disk space usage by 200 % while not increasing the " size " of the files at all .
This will be extremely counter-intuitive when you run out of disk space by globally changing " 192.168.1.1 " to " 192.168.1.2 " in a huge set of files .</tokentext>
<sentencetext>And worse... What happens when you go through a set of files A and change a single IP address in each of them, defeating the deduplication, while filesets B &amp; C still point to the same set?
Now you have just increased your disk space usage by 200% while not increasing the "size" of the files at all.
This will be extremely counter-intuitive when you run out of disk space by globally changing "192.168.1.1" to "192.168.1.2" in a huge set of files.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957388</parent>
</comment>
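A toy sketch of the block accounting behind this scenario, assuming GNU coreutils (split --filter) and hypothetical file names; it counts unique 4 KiB block hashes the way a block-level dedup table would:

# one 1 MiB payload, copied into three "filesets"
dd if=/dev/urandom of=payload bs=4096 count=256 2>/dev/null
cp payload A; cp payload B; cp payload C

# unique 4 KiB blocks across all three copies: 256, not 768
cat A B C | split -b 4096 --filter=sha256sum | sort -u | wc -l

# flip one byte in A: only the touched block stops deduplicating (257),
# but do that once per file across a huge fileset and the pool fills
# up while the file "sizes" never change
printf X | dd of=A bs=1 seek=100 conv=notrunc 2>/dev/null
cat A B C | split -b 4096 --filter=sha256sum | sort -u | wc -l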
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957720</id>
	<title>Re:This is good news...</title>
	<author>Trepidity</author>
	<datestamp>1257169800000</datestamp>
	<modclass>Informativ</modclass>
	<modscore>4</modscore>
	<htmltext><p>If you're running a normal desktop or laptop, this isn't likely to be of great use in any case. There's non-negligible overhead in the deduplication process, and drive space at consumer-level sizes is dirt-cheap, so it's only really worth doing this if you have a <i>lot</i> of block-level duplicate data. That might be the case if e.g. you have 30 VMs on the same machine, each with a separate install of the same OS, but is unlikely to be the case on a normal Mac laptop.</p></htmltext>
<tokenext>If you 're running a normal desktop or laptop , this is n't likely to be of great use in any case .
There 's non-negligible overhead in the deduplication process , and drive space at consumer-level sizes is dirt-cheap , so it 's only really worth doing this if you have a lot of block-level duplicate data .
That might be the case if e.g. you have 30 VMs on the same machine , each with a separate install of the same OS , but is unlikely to be the case on a normal Mac laptop .</tokentext>
<sentencetext>If you're running a normal desktop or laptop, this isn't likely to be of great use in any case.
There's non-negligible overhead in the deduplication process, and drive space at consumer-level sizes is dirt-cheap, so it's only really worth doing this if you have a lot of block-level duplicate data.
That might be the case if e.g. you have 30 VMs on the same machine, each with a separate install of the same OS, but is unlikely to be the case on a normal Mac laptop.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956676</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961888</id>
	<title>Re:There are three types of files.</title>
	<author>Anonymous</author>
	<datestamp>1257251340000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>But when it dips to the fs level, it merely signifies. From the db level, yes it does.<br>We have a world of inodes pointing to a universe of blocks. Any intelligence put in to make it worthy of business is special.</p></htmltext>
<tokenext>But when it dips to the fs level , it merely signifies .
From the db level , yes it does .
We have a world of inodes pointing to a universe of blocks .
Any intelligence put in to make it worthy of business is special .</tokentext>
<sentencetext>But when it dips to the fs level, it merely signifies.
From the db level, yes it does.
We have a world of inodes pointing to a universe of blocks.
Any intelligence put in to make it worthy of business is special.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958384</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959944</id>
	<title>so when the fanboys are done jizzing</title>
	<author>Anonymous</author>
	<datestamp>1257184320000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>It's a filesystem. It stores files. Efficiently. Uhhh.... that's cool and all.<br>Quit jizzing. Realize the practical benefit to society, meditate on it, and then go back to that righteous ftp client you were writing.</p></htmltext>
<tokenext>It 's a filesystem .
It stores files .
Efficiently .
Uhhh.... that 's cool and all .
Quit jizzing .
Realize the practical benefit to society , meditate on it , and then go back to that righteous ftp client you were writing .
 </tokentext>
<sentencetext>It's a filesystem.
It stores files.
Efficiently. Uhhh.... that's cool and all.
Quit jizzing.
Realize the practical benefit to society, meditate on it,  and then go back to that righteous ftp client you were writing.
 </sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959316</id>
	<title>Re:This is good news...</title>
	<author>evilviper</author>
	<datestamp>1257178680000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><blockquote><div><p>it's only really worth doing this if you have a lot of block-level duplicate data. That might be the case if e.g. you have 30 VMs on the same machine</p></div></blockquote><p>...or if you, I dunno, <b>EVER BACK UP YOUR DAMN SYSTEM!!!</b></p><p>Duplicate data is exactly what filesystem snapshots are all about.  Try backing up your data every day, for 10 days, keeping all the changed versions of files...  Gee, do I want to buy a HDD that is 10X as large as all the rest of my storage space combined, or do I want one that is 1.5X as large?  Tough choice.</p><p>I've been doing this for a long time with rsync and hard links.  I'm not sure how reliable and well-performing this is going to be built into the filesystem, and at the block level, but I'm damn sure happy to try it out... at home, at work, everywhere.</p>
	</htmltext>
<tokenext>it 's only really worth doing this if you have a lot of block-level duplicate data .
That might be the case if e.g. you have 30 VMs on the same machine ...or if you , I dunno , EVER BACK UP YOUR DAMN SYSTEM ! ! !
Duplicate data is exactly what filesystem snapshots are all about .
Try backing up your data every day , for 10 days , keeping all the changed versions of files... Gee , do I want to buy a HDD that is 10X as large as all the rest of my storage space combined , or do I want one that is 1.5X as large ?
Tough choice .
I 've been doing this for a long time with rsync and hard links .
I 'm not sure how reliable and well-performing this is going to be built into the filesystem , and at the block level , but I 'm damn sure happy to try it out... at home , at work , everywhere .</tokentext>
<sentencetext>it's only really worth doing this if you have a lot of block-level duplicate data.
That might be the case if e.g. you have 30 VMs on the same machine ...or if you, I dunno, EVER BACK UP YOUR DAMN SYSTEM!!!
Duplicate data is exactly what filesystem snapshots are all about.
Try backing up your data every day, for 10 days, keeping all the changed versions of files... Gee, do I want to buy a HDD that is 10X as large as all the rest of my storage space combined, or do I want one that is 1.5X as large?
Tough choice.
I've been doing this for a long time with rsync and hard links.
I'm not sure how reliable and well-performing this is going to be built into the filesystem, and at the block level, but I'm damn sure happy to try it out... at home, at work, everywhere.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957720</parent>
</comment>
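The rsync-and-hard-links scheme the parent describes, as a minimal sketch with hypothetical paths; --link-dest makes every unchanged file in today's snapshot a hard link into yesterday's, so each additional day costs only the changed files:

SRC=/home/
DEST=/backups/$(date +%F)

# unchanged files become hard links against the previous snapshot
rsync -a --link-dest=/backups/latest "$SRC" "$DEST"

# repoint 'latest' at the snapshot just taken
ln -sfn "$DEST" /backups/latest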
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956752</id>
	<title>More reason to be a ZFS fanboy</title>
	<author>BitZtream</author>
	<datestamp>1257165120000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>3</modscore>
	<htmltext><p>I'm wondering how long it's going to take for them to do something with ZFS that actually makes me slow down my overwhelming ZFS fanboyism.</p><p>I just love these guys.</p><p>My virtual machine NFS server is going to have to get this as soon as FBSD imports it, and I'll no longer have to worry about having backup software (like BackupPC, good stuff btw) that does this.</p><p>I don't use high-end SANs, but it would seem to me that they are rapidly losing any particular advantage over a Solaris or FBSD file server.</p></htmltext>
<tokenext>I 'm wondering how long it 's going to take for them to do something with ZFS that actually makes me slow down my overwhelming ZFS fanboyism .
I just love these guys .
My virtual machine NFS server is going to have to get this as soon as FBSD imports it , and I 'll no longer have to worry about having backup software ( like BackupPC , good stuff btw ) that does this .
I do n't use high-end SANs , but it would seem to me that they are rapidly losing any particular advantage over a Solaris or FBSD file server .</tokentext>
<sentencetext>I'm wondering how long it's going to take for them to do something with ZFS that actually makes me slow down my overwhelming ZFS fanboyism.
I just love these guys.
My virtual machine NFS server is going to have to get this as soon as FBSD imports it, and I'll no longer have to worry about having backup software (like BackupPC, good stuff btw) that does this.
I don't use high-end SANs, but it would seem to me that they are rapidly losing any particular advantage over a Solaris or FBSD file server.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957810</id>
	<title>Re:Wake me when they build it into the hard disk</title>
	<author>Anonymous</author>
	<datestamp>1257170220000</datestamp>
	<modclass>Informativ</modclass>
	<modscore>1</modscore>
	<htmltext><blockquote><div><p>I say unreliably, because years ago we had a Novell server that used an automated compression scheme. Eventually, the drive got full anyway, and we had to migrate to a larger disk.<br> <br>But since the copy operation de-compressed files on the fly we couldn't copy, because any attempt to reference several large compressed files instantly consumed all remaining space on the drive. What ensued was a nightmare of copying and deleting files beginning with the smallest, and working our way up to the largest. It took over a day of manual effort before we freed up enough space to mass-move the remaining files.</p></div></blockquote><p>This is because you didn't use NetWare's tools to copy the files - the command-line NCOPY, for example, with /Ror and /RU (available when file compression was introduced with NetWare 4) would have copied the files in their compressed format, avoiding this (Link: <a href="http://support.novell.com/techcenter/articles/ana19940603.html" title="novell.com" rel="nofollow">http://support.novell.com/techcenter/articles/ana19940603.html</a> [novell.com]). Using the Novell Client for Windows, I'd imagine that its Explorer shell integration would give you GUI tools, too, though I no longer have a NetWare server to verify this, and always preferred the command line anyway :).<br> <br>No offense, but the scenario you describe is the result of ignorance, not poor design.</p>
	</htmltext>
<tokenext>I say unreliably , because years ago we had a Novell server that used an automated compression scheme .
Eventually , the drive got full anyway , and we had to migrate to a larger disk .
But since the copy operation de-compressed files on the fly we could n't copy because any attempt to reference several large compressed files instantly consumed all remaining space on the drive .
What ensued was a nightmare of copy and delete files beginning with the smallest , and working our way up to the largest .
It took over a day of manual effort before we freed up enough space to mass-move the remaining files.This is because you did n't use NetWare 's tools to copy the files - the command line NCOPY , for example , with /Ror and /RU ( available when file compression was introduced with NetWare 4 ) would have copied the files in their compressed format , avoiding this ( Link : http : //support.novell.com/techcenter/articles/ana19940603.html [ novell.com ] ) .
Using the Novell Client for Windows , I 'd imagine that its Explorer shell integration would give you GUI tools , too , though I no longer have a NetWare server to verify this , and always preferred the command line anyway : ) .
No offense , but the scenario you describe is the result of ignorance , not poor design .</tokentext>
<sentencetext>I say unreliably, because years ago we had a Novell server that used an automated compression scheme.
Eventually, the drive got full anyway, and we had to migrate to a larger disk.
But since the copy operation de-compressed files on the fly we couldn't copy because any attempt to reference several large compressed files instantly consumed all remaining space on the drive.
What ensued was a nightmare of copy and delete files beginning with the smallest, and working our way up to the largest.
It took over a day of manual effort before we freed up enough space to mass-move the remaining files.This is because you didn't use NetWare's tools to copy the files - the command line NCOPY, for example, with /Ror and /RU (available when file compression was introduced with NetWare 4) would have copied the files in their compressed format, avoiding this (Link: http://support.novell.com/techcenter/articles/ana19940603.html [novell.com]).
Using the Novell Client for Windows, I'd imagine that its Explorer shell integration would give you GUI tools, too, though I no longer have a NetWare server to verify this, and always preferred the command line anyway :).
No offense, but the scenario you describe is the result of ignorance, not poor design.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960882</id>
	<title>Not quite the wonderful thing it appears to be</title>
	<author>pjr.cc</author>
	<datestamp>1257280320000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>2</modscore>
	<htmltext><p>De-dupe has been around for a while and has some advantages and quite a few negatives... First off, I'd be interested to see how many patent trolls this might stumble over. But de-dupe has always gone hand in hand with backups and golden images. EMC, HDS and co never did a good job supporting golden images, but other storage vendors have done well with it (3PAR, Compellent, EqualLogic).</p><p>For the uninitiated, golden images usually consist of building a machine on a SAN, and then using that one image to power many machines (i.e. the same blocks on disk). It then usually just stores deltas from the golden image for each machine... it's got its advantages and disadvantages, much like de-dupe.</p><p>Now, the reason for its use is simple - "pay less for storage" - which sounds dumb in this day and age (with 1TB drives costing virtually nothing), but the reality is that in the SAN world 1TB drives cost a fortune, and wherever you use de-dupe or golden images you usually use the fastest (and smallest) disks you can get your hands on. (If you don't understand why this is, see the Backblaze article from a little while ago - ultimately, putting more space in a bit of SAN storage kit is freaking expensive.) In the enterprise world, it's almost impossible to step away from SAN storage (unless you're Google or Backblaze).</p><p>The big problem with de-dupe (and why it's primarily used for backups, and primarily only disk-based backups) is how it affects the storage. If you suddenly have one hot spot, even on fast disk, the storage starts grinding to a halt (even when considering caching) because lots of things start accessing the same blocks on the disk. This is not a problem for backups because it's usually a once-written, rarely-read scenario. On file servers and databases, it's a performance killer (something akin to raid5/6 in software). But de-dupe is fantastic for archival storage!... De-dupe and performance often tend to be a self-fulfilling prophecy though, simply because data that is duplicated is often duplicated because it's heavily accessed. Take email as a good example. Joe sends out an email with an attachment of some form (perhaps it's a document template, but it really doesn't matter so long as he's sending it to a large number of people); all those people save the attachment and probably make some edits. This introduces the next load of pain: fragmentation. All those deltas from the original now need to be saved "somewhere else", and meanwhile all these people are accessing not just the de-duped blocks but the fragmented changes (consider the kernel source for Linux as well - tonnes of branches of code that would possibly get de-duped and fragmented). Databases are another great example. Often in tablespaces there is quite a load of block-aligned duplicate data; often this is the nature of how databases store data. Sometimes this data can be quite critical to their function, and to have a database slamming the same blocks (again with small fragmented changes) is pain personified.</p><p>Still, I wonder how many patents Sun is likely to trip up on... I see this being non-fun, as there are many people who make serious cash from de-dupe at various levels....</p></htmltext>
<tokenext>De-dupe has been around for a while and has some advantages and quite a few negatives... First off , I 'd be interested to see how many patent trolls this might stumble over .
But de-dupe has always gone hand in hand with backups and golden images .
EMC , HDS and co never did a good job supporting golden images , but other storage vendors have done well with it ( 3PAR , Compellent , EqualLogic ) .
For the uninitiated , golden images usually consist of building a machine on a SAN , and then using that one image to power many machines ( i.e. the same blocks on disk ) .
It then usually just stores deltas from the golden image for each machine... it 's got its advantages and disadvantages , much like de-dupe .
Now , the reason for its use is simple - " pay less for storage " - which sounds dumb in this day and age ( with 1TB drives costing virtually nothing ) , but the reality is that in the SAN world 1TB drives cost a fortune , and wherever you use de-dupe or golden images you usually use the fastest ( and smallest ) disks you can get your hands on .
( If you do n't understand why this is , see the Backblaze article from a little while ago - ultimately , putting more space in a bit of SAN storage kit is freaking expensive . )
In the enterprise world , it 's almost impossible to step away from SAN storage ( unless you 're Google or Backblaze ) .
The big problem with de-dupe ( and why it 's primarily used for backups , and primarily only disk-based backups ) is how it affects the storage .
If you suddenly have one hot spot , even on fast disk , the storage starts grinding to a halt ( even when considering caching ) because lots of things start accessing the same blocks on the disk .
This is not a problem for backups because it 's usually a once-written , rarely-read scenario .
On file servers and databases , it 's a performance killer ( something akin to raid5/6 in software ) .
But de-dupe is fantastic for archival storage !
De-dupe and performance often tend to be a self-fulfilling prophecy though , simply because data that is duplicated is often duplicated because it 's heavily accessed .
Take email as a good example .
Joe sends out an email with an attachment of some form ( perhaps it 's a document template , but it really does n't matter so long as he 's sending it to a large number of people ) ; all those people save the attachment and probably make some edits .
This introduces the next load of pain : fragmentation .
All those deltas from the original now need to be saved " somewhere else " , and meanwhile all these people are accessing not just the de-duped blocks but the fragmented changes ( consider the kernel source for Linux as well - tonnes of branches of code that would possibly get de-duped and fragmented ) .
Databases are another great example .
Often in tablespaces there is quite a load of block-aligned duplicate data ; often this is the nature of how databases store data .
Sometimes this data can be quite critical to their function , and to have a database slamming the same blocks ( again with small fragmented changes ) is pain personified .
Still , I wonder how many patents Sun is likely to trip up on... I see this being non-fun , as there are many people who make serious cash from de-dupe at various levels .</tokentext>
<sentencetext>De-dupe has been around for a while and has some advantages and quite a few negatives... First off, I'd be interested to see how many patent trolls this might stumble over.
But de-dupe has always gone hand in hand with backups and golden images.
EMC, HDS and co never did a good job supporting golden images, but other storage vendors have done well with it (3PAR, Compellent, EqualLogic).
For the uninitiated, golden images usually consist of building a machine on a SAN, and then using that one image to power many machines (i.e. the same blocks on disk).
It then usually just stores deltas from the golden image for each machine... it's got its advantages and disadvantages, much like de-dupe.
Now, the reason for its use is simple - "pay less for storage" - which sounds dumb in this day and age (with 1TB drives costing virtually nothing), but the reality is that in the SAN world 1TB drives cost a fortune, and wherever you use de-dupe or golden images you usually use the fastest (and smallest) disks you can get your hands on.
(If you don't understand why this is, see the Backblaze article from a little while ago - ultimately, putting more space in a bit of SAN storage kit is freaking expensive.)
In the enterprise world, it's almost impossible to step away from SAN storage (unless you're Google or Backblaze).
The big problem with de-dupe (and why it's primarily used for backups, and primarily only disk-based backups) is how it affects the storage.
If you suddenly have one hot spot, even on fast disk, the storage starts grinding to a halt (even when considering caching) because lots of things start accessing the same blocks on the disk.
This is not a problem for backups because it's usually a once-written, rarely-read scenario.
On file servers and databases, it's a performance killer (something akin to raid5/6 in software).
But de-dupe is fantastic for archival storage!...
De-dupe and performance often tend to be a self-fulfilling prophecy though, simply because data that is duplicated is often duplicated because it's heavily accessed.
Take email as a good example.
Joe sends out an email with an attachment of some form (perhaps it's a document template, but it really doesn't matter so long as he's sending it to a large number of people); all those people save the attachment and probably make some edits.
This introduces the next load of pain: fragmentation.
All those deltas from the original now need to be saved "somewhere else", and meanwhile all these people are accessing not just the de-duped blocks but the fragmented changes (consider the kernel source for Linux as well - tonnes of branches of code that would possibly get de-duped and fragmented).
Databases are another great example.
Often in tablespaces there is quite a load of block-aligned duplicate data; often this is the nature of how databases store data.
Sometimes this data can be quite critical to their function, and to have a database slamming the same blocks (again with small fragmented changes) is pain personified.
Still, I wonder how many patents Sun is likely to trip up on... I see this being non-fun, as there are many people who make serious cash from de-dupe at various levels....</sentencetext>
</comment>
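The golden-image pattern maps directly onto ZFS snapshots and clones; a minimal sketch with a hypothetical pool layout, where each clone is copy-on-write and stores only its deltas from the master:

# build the master image once, then freeze it
zfs snapshot tank/golden@v1

# each machine gets a copy-on-write clone; shared blocks exist once
zfs clone tank/golden@v1 tank/vm01
zfs clone tank/golden@v1 tank/vm02

# per-machine usage is only the blocks that machine has changed
zfs list -o name,used,refer tank/golden tank/vm01 tank/vm02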
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29972298</id>
	<title>Re:There are three types of files.</title>
	<author>Anonymous</author>
	<datestamp>1257259620000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>O_APPEND most certainly is enforced.  Sure you can seek, but any time you write it'll seek back to the end before doing that.</p><p>If the file offset were used normally for O_APPEND files, it would be broken; if any other process appends more data before your write, you would no longer be at the end of the file.  Fortunately, it works correctly, and multiple processes can share a log file.  Not that it's usually a good idea, but it works.</p></htmltext>
<tokenext>O_APPEND most certainly is enforced .
Sure you can seek , but any time you write it 'll seek back to the end before doing that .
If the file offset were used normally for O_APPEND files , it would be broken ; if any other process appends more data before your write , you would no longer be at the end of the file .
Fortunately , it works correctly , and multiple processes can share a log file .
Not that it 's usually a good idea , but it works .</tokentext>
<sentencetext>O_APPEND most certainly is enforced.
Sure you can seek, but any time you write it'll seek back to the end before doing that.
If the file offset were used normally for O_APPEND files, it would be broken; if any other process appends more data before your write, you would no longer be at the end of the file.
Fortunately, it works correctly, and multiple processes can share a log file.
Not that it's usually a good idea, but it works.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958384</parent>
</comment>
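A quick shell demonstration of the same guarantee (the shell's >> redirection opens with O_APPEND), using a hypothetical log name; the two writers interleave but never overwrite one another:

# two concurrent writers appending to one shared file
( for i in $(seq 1 1000); do echo "writer1 line $i"; done >> shared.log ) &amp;
( for i in $(seq 1 1000); do echo "writer2 line $i"; done >> shared.log ) &amp;
wait

# every write landed at the then-current end of file: 2000 lines survive
wc -l shared.log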
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957072</id>
	<title>Re:More reason to be a ZFS fanboy</title>
	<author>Anonymous</author>
	<datestamp>1257167160000</datestamp>
	<modclass>Informativ</modclass>
	<modscore>5</modscore>
	<htmltext><p>How about this: you can't remove a top-level vdev without destroying your storage pool. That means that if you accidentally use the "zpool add" command instead of "zpool attach" to add a new disk to a mirror, you are in a world of hurt.</p><p>How about this: after years of ZFS being around, you still can't add or remove disks from a RAID-Z.</p><p>How about this: If you have a mirror between two devices of different sizes, and you remove the smaller one, you won't be able to add it back. The vdev will autoexpand to fill the larger disk, even if no data is actually written, and the disk that was just a moment ago part of the mirror is now "too small".</p><p>How about this: the whole system was designed with the implicit assumption that your storage needs would only ever grow, with the result that in nearly all cases it's impossible to ever scale a ZFS pool down.</p></htmltext>
<tokenext>How about this : you ca n't remove a top-level vdev without destroying your storage pool .
That means that if you accidentally use the " zpool add " command instead of " zpool attach " to add a new disk to a mirror , you are in a world of hurt .
How about this : after years of ZFS being around , you still ca n't add or remove disks from a RAID-Z .
How about this : If you have a mirror between two devices of different sizes , and you remove the smaller one , you wo n't be able to add it back .
The vdev will autoexpand to fill the larger disk , even if no data is actually written , and the disk that was just a moment ago part of the mirror is now " too small " .
How about this : the whole system was designed with the implicit assumption that your storage needs would only ever grow , with the result that in nearly all cases it 's impossible to ever scale a ZFS pool down .</tokentext>
<sentencetext>How about this: you can't remove a top-level vdev without destroying your storage pool.
That means that if you accidentally use the "zpool add" command instead of "zpool attach" to add a new disk to a mirror, you are in a world of hurt.
How about this: after years of ZFS being around, you still can't add or remove disks from a RAID-Z.
How about this: If you have a mirror between two devices of different sizes, and you remove the smaller one, you won't be able to add it back.
The vdev will autoexpand to fill the larger disk, even if no data is actually written, and the disk that was just a moment ago part of the mirror is now "too small".
How about this: the whole system was designed with the implicit assumption that your storage needs would only ever grow, with the result that in nearly all cases it's impossible to ever scale a ZFS pool down.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956752</parent>
</comment>
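The add-versus-attach footgun in concrete terms, sketched with a hypothetical pool and Solaris-style device names:

# intended: make c1t2d0 another side of the mirror containing c1t1d0
zpool attach tank c1t1d0 c1t2d0

# one word different: this adds c1t2d0 as a brand-new, unmirrored
# top-level vdev, which (at the time) could not be removed without
# destroying and rebuilding the pool
zpool add tank c1t2d0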
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957336</id>
	<title>Re:Wake me when they build it into the hard disk</title>
	<author>Znork</author>
	<datestamp>1257168300000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p><i>But there is a hell to pay somewhere down the road.</i></p><p>I'd certainly expect that. I don't quite get what people are so desperate to de-duplicate anyway. A stripped VM OS image is less than a gigabyte; you can fit 150 of them on a drive that costs less than $100. You'd have to have vast ranges of perfectly synchronized virtual machines before you'd have made back even the cost of the time spent listening to the sales pitch.</p><p>I can't really see many situations where the extra complexity and cost would end up actually saving money. The few I can see would be where somebody's been tricked into buying such excruciatingly expensive SAN storage that they can barely afford to store anything on it any more, or situations where their storage is a complete mess and they can't use more intelligent means of not storing the same thing many times (snapshots, shared file systems, overlay devices, etc). In those cases it seems there would be more to gain by solving the actual problem than tacking another patch onto the stack. Storage, for most purposes, is dirt cheap today.</p></htmltext>
<tokenext>But there is a hell to pay somewhere down the road .
I 'd certainly expect that .
I do n't quite get what people are so desperate to de-duplicate anyway .
A stripped VM os image is less than a gigabyte , you can fit 150 of them on a drive that costs less than $ 100 .
You 'd have to have vast ranges of perfectly synchronized virtual machines before you 'd have made back even the cost of the time spent listening to the sales pitch .
I ca n't really see many situations where the extra complexity and cost would end up actually saving money .
The few I can see it would be where somebody 's been tricked into buying such excruciatingly expensive SAN storage that they can barely afford to store anything on it any more , or situations where their storage is a complete mess and they ca n't use more intelligent means of not storing the same thing many times ( snapshots , shared file systems , overlay devices , etc ) .
In those cases it seems there would be more to gain by solving the actual problem than tacking another patch onto the stack .
Storage , for most purposes , is dirt cheap today .</tokentext>
<sentencetext>But there is a hell to pay somewhere down the road.
I'd certainly expect that.
I don't quite get what people are so desperate to de-duplicate anyway.
A stripped VM os image is less than a gigabyte, you can fit 150 of them on a drive that costs less than $100.
You'd have to have vast ranges of perfectly synchronized virtual machines before you'd have made back even the cost of the time spent listening to the sales pitch.
I can't really see many situations where the extra complexity and cost would end up actually saving money.
The few I can see it would be where somebody's been tricked into buying such excruciatingly expensive SAN storage that they can barely afford to store anything on it any more, or situations where their storage is a complete mess and they can't use more intelligent means of not storing the same thing many times (snapshots, shared file systems, overlay devices, etc).
In those cases it seems there would be more to gain by solving the actual problem than tacking another patch onto the stack.
Storage, for most purposes, is dirt cheap today.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959188</id>
	<title>Re:Wake me when they build it into the hard disk</title>
	<author>Anonymous</author>
	<datestamp>1257177540000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>"Eventually, the drive got full anyway, and we had to migrate to a larger disk." - Incorrect, you could have added the disk and expanded the volume, you didn't "have" to migrate to a new disk.</p><p>"But since the copy operation de-compressed files on the fly we couldn't copy because any attempt to reference several large compressed files instantly consumed all remaining space on the drive" - There is a set parameter to control when decompression occurs, the default is something like two accesses within a day will cause decompression.  But these setting are on the commit to disk, the decompression still occurs from a client perspective as it does the decompression in memory.</p><p>"But there is a hell to pay somewhere down the road." - Very likely true.  With compression when you do migrate off to a uncompressed volume, you are going to add additional cpu cycles for decompression.  Depending on how much compression, number of files, io of system, cpu, will determine how much time it is going to cost you.   I tend to recommend compression for user data, as a huge percentage of user data doesn't get accessed very often, so there is a big gain in disk savings, very infrequent decompresses, and having already compressed data help increase the effective backup rate.</p></htmltext>
<tokenext>" Eventually , the drive got full anyway , and we had to migrate to a larger disk .
" - Incorrect , you could have added the disk and expanded the volume , you did n't " have " to migrate to a new disk .
" But since the copy operation de-compressed files on the fly we could n't copy because any attempt to reference several large compressed files instantly consumed all remaining space on the drive " - There is a set parameter to control when decompression occurs , the default is something like two accesses within a day will cause decompression .
But these settings apply on the commit to disk ; the decompression still occurs from a client perspective , as it is done in memory .
" But there is a hell to pay somewhere down the road .
" - Very likely true .
With compression , when you do migrate off to an uncompressed volume , you are going to add additional CPU cycles for decompression .
How much compression , the number of files , system I/O , and CPU will determine how much time it is going to cost you .
I tend to recommend compression for user data , as a huge percentage of user data does n't get accessed very often , so there is a big gain in disk savings , very infrequent decompresses , and having already-compressed data helps increase the effective backup rate .</tokentext>
<sentencetext>"Eventually, the drive got full anyway, and we had to migrate to a larger disk.
" - Incorrect, you could have added the disk and expanded the volume, you didn't "have" to migrate to a new disk.
"But since the copy operation de-compressed files on the fly we couldn't copy because any attempt to reference several large compressed files instantly consumed all remaining space on the drive" - There is a set parameter to control when decompression occurs, the default is something like two accesses within a day will cause decompression.
But these settings apply on the commit to disk; the decompression still occurs from a client perspective, as it is done in memory.
"But there is a hell to pay somewhere down the road.
" - Very likely true.
With compression when you do migrate off to a uncompressed volume, you are going to add additional cpu cycles for decompression.
Depending on how much compression, number of files, io of system, cpu, will determine how much time it is going to cost you.
I tend to recommend compression for user data, as a huge percentage of user data doesn't get accessed very often, so there is a big gain in disk savings, very infrequent decompresses, and having already compressed data help increase the effective backup rate.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796</parent>
</comment>
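The same trade-off expressed on the filesystem this thread is about, as a sketch with a hypothetical dataset name (lzjb being the lightweight default algorithm of this era):

# enable cheap compression on the user-data dataset
zfs set compression=lzjb tank/home

# later, see what it actually bought you
zfs get compressratio tank/home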
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957400</id>
	<title>Re:More reason to be a ZFS fanboy</title>
	<author>Anonymous</author>
	<datestamp>1257168480000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>You make some good points about ZFS annoyances.</p><p>I've seen some recent activity around the first limitation you mention (i.e. you can't remove a top-level vdev), so hopefully we'll see a fix soon.</p><p>You may have missed that there's now a ZFS property you can set to control whether pools automatically expand into free space.  Note that previously autoexpansion could only happen if you gave ZFS entire disks without partitions.</p></htmltext>
<tokenext>You make some good points about ZFS annoyances .
I 've seen some recent activity around the first limitation you mention ( i.e. you ca n't remove a top-level vdev ) , so hopefully we 'll see a fix soon .
You may have missed that there 's now a ZFS property you can set to control whether pools automatically expand into free space .
Note that previously autoexpansion could only happen if you gave ZFS entire disks without partitions .</tokentext>
<sentencetext>You make some good points about ZFS annoyances.
I've seen some recent activity around the first limitation you mention (i.e. you can't remove a top-level vdev), so hopefully we'll see a fix soon.
You may have missed that there's now a ZFS property you can set to control whether pools automatically expand into free space.
Note that previously autoexpansion could only happen if you gave ZFS entire disks without partitions.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957072</parent>
</comment>
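The property being referred to is most likely zpool's autoexpand, sketched here with a hypothetical pool name:

# off by default: swapping both sides of a mirror for bigger disks
# no longer grows the pool behind your back
zpool get autoexpand tank

# opt in explicitly when the extra space is wanted
zpool set autoexpand=on tank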
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957026</id>
	<title>Nice, but can it ...</title>
	<author>Anonymous</author>
	<datestamp>1257166800000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>... strategically populate the available space with duplicates of commonly read blocks, for increased fault tolerance and performance?</p></htmltext>
<tokenext>... strategically populate the available space with duplicates of commonly read blocks , for increased fault tolerance and performance ?</tokentext>
<sentencetext>... strategically populate the available space with duplicates of commonly read blocks, for increased fault tolerance and performance?</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958136</id>
	<title>File-level? Block-level?</title>
	<author>Anonymous</author>
	<datestamp>1257171780000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Pssht! Not good enough, Sun. I require bit-level. I won't be satisfied until I can create a zpool wherein all my data are deduped down to one 1 and one 0.</p></htmltext>
<tokenext>Pssht !
Not good enough , Sun .
I require bit-level .
I wo n't be satisfied until I can create a zpool wherein all my data are deduped down to one 1 and one 0 .</tokentext>
<sentencetext>Pssht!
Not good enough, Sun.
I require bit-level.
I won't be satisfied until I can create a zpool wherein all my data are deduped down to one 1 and one 0.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957012</id>
	<title>Re:Wake me when they build it into the hard disk</title>
	<author>dgatwood</author>
	<datestamp>1257166740000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>That's just classic bad design.  There's no reason for the decompressed files to exist on disk at all just to decompress them.  The software should have decompressed to RAM on the fly instead of storing the decompressed files as temp files on the hard drive.  It's all probably because they made a poor attempt at shoehorning compression into a VFS layer that was too block-centric.  Classic bad design all around.</p></htmltext>
<tokenext>That 's just classic bad design .
There 's no reason for the decompressed files to exist on disk at all just to decompress them .
The software should have decompressed to RAM on the fly instead of storing the decompressed files as temp files on the hard drive .
It 's all probably because they made a poor attempt at shoehorning compression into a VFS layer that was too block-centric .
Classic bad design all around .</tokentext>
<sentencetext>That's just classic bad design.
There's no reason for the decompressed files to exist on disk at all just to decompress them.
The software should have decompressed to RAM on the fly instead of storing the decompressed files as temp files on the hard drive.
It's all probably because they made a poor attempt at shoehorning compression into a VFS layer that was too block-centric.
Classic bad design all around.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796</parent>
</comment>
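The decompress-to-RAM approach the parent describes, sketched with a hypothetical file name; the decompressed bytes flow through the pipe to the consumer and never land on disk:

# stream decompression: no decompressed temp file is ever written
gzip -dc big.log.gz | grep -c ERROR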
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958192</id>
	<title>Re:Wake me when they build it into the hard disk</title>
	<author>Anonymous</author>
	<datestamp>1257172020000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Because when Novell fails, all subsequent technologies that sound similar will also fail.</p></htmltext>
<tokenext>Because when Novell fails , all subsequent technologies that sound similar will also fail .</tokentext>
<sentencetext>Because when Novell fails, all subsequent technologies that sound similar will also fail.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960662</id>
	<title>Re:More reason to be a ZFS fanboy</title>
	<author>Hurricane78</author>
	<datestamp>1257191100000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>And how about this: The Linux FUSE ZFS implementation (the only one on Linux) eats half of your (not-newest-generation) processor's cores and 600MB of RAM for breakfast. Yes, that's right. It uses <em>that</em> many resources.</p><p>Although I must say, for my archive, it's still worth it. Because it's the only thing that can protect my data from the data corruption that happens more and more often with "modern" HDDs.</p></htmltext>
<tokenext>And how about this : The Linux FUSE ZFS implementation ( the only one on Linux ) eats half of your ( not the newest generation ) processor cores and 600MB RAM for breakfast .
Yes , that 's right .
It uses that many resources .
Although I must say , for my archive , it 's still worth it .
Because it 's the only thing that can protect my data from the data corruption that happens more and more often with " modern " HDDs .</tokentext>
<sentencetext>And how about this: The Linux FUSE ZFS implementation (the only one on Linux) eats half of your (not the newest generation) processor cores and 600MB RAM for breakfast.
Yes, that's right.
It uses that many resources.
Although I must say, for my archive, it's still worth it.
Because it's the only thing that can protect my data from the data corruption that happens more and more often with "modern" HDDs.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956752</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957772</id>
	<title>I worked with De-duplication</title>
	<author>Anonymous</author>
	<datestamp>1257169980000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>You are losing reliability. Some hashes will collide on some computer somewhere.<br>The idea is that if you assume that blocks on the HD are random, then the odds of hitting a hash collision are tiny.<br>But data is not random - humans and programs make it non-random!</p><p>Here is an example:<br>What are the odds that 256 people going across the street will all be men?<br>That would be 2^-256 - that will never happen.<br>But guess what? Imagine that you see a parade and 300 marines are marching by...<br>It just happened.<br>Do you want to bet your server data on that?</p></htmltext>
<tokenext>You are losing reliability .
Some hashes will collide on some computer somewhere .
The idea is that if you assume that blocks on the HD are random , then the odds of hitting a hash collision are tiny .
But data is not random - humans and programs make it non-random !
Here is an example : What are the odds that 256 people going across the street will all be men ?
That would be 2 ^ -256 - that will never happen .
But guess what ?
Imagine that you see a parade and 300 marines are marching by...
It just happened .
Do you want to bet your server data on that ?</tokentext>
<sentencetext>You are losing reliability.
Some hashes will collide on some computer somewhere.
The idea is that if you assume that blocks on the HD are random, then the odds of hitting a hash collision are tiny.
But data is not random - humans and programs make it non-random!
Here is an example: What are the odds that 256 people going across the street will all be men?
That would be 2^-256 - that will never happen.
But guess what?
Imagine that you see a parade and 300 marines are marching by...
It just happened.
Do you want to bet your server data on that?</sentencetext>
</comment>
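For scale, the standard birthday bound on an accidental SHA256 collision, which holds only under the assumption the parent is attacking - that hash outputs behave like uniform random values:

$$p_{\text{collision}} \approx \frac{n^2}{2^{257}}, \qquad n = 2^{38} \text{ blocks (a pebibyte of 4 KiB blocks)} \;\Rightarrow\; p \approx 2^{-181}$$

Deliberately constructed duplicates are another matter entirely, which is exactly why a verify-on-match mode exists.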
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961166</id>
	<title>What if...</title>
	<author>azav</author>
	<datestamp>1257241020000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>What if that single block goes bad?</p></htmltext>
<tokenext>What if that single block goes bad ?</tokentext>
<sentencetext>What if that single block goes bad?</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960672</id>
	<title>Re:Hash Collisions</title>
	<author>Anonymous</author>
	<datestamp>1257191160000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>They should be, but it is not by default.  You have to enable it in the filesystem options, which kind of makes no sense.</p><p>The compare is pretty cheap: when you digest a block that has a hash match, it'll be waiting in memory, so you only need to read the target block to do the compare.</p><p>It really makes no sense that they'd use an expensive hash that has a "really, really low chance of collision" instead of a cheap hash and a direct compare that has no algorithmic chance of collision.</p></htmltext>
<tokenext>They should be , but it is not by default .
You have to enable it in the filesystem options , which kind of makes no sense .
The compare is pretty cheap : when you digest a block that has a hash match , it 'll be waiting in memory , so you only need to read the target block to do the compare .
It really makes no sense that they 'd use an expensive hash that has a " really , really low chance of collision " instead of a cheap hash and a direct compare that has no algorithmic chance of collision .</tokentext>
<sentencetext>They should be, but is not by default.
You have to enable it in the filesystem options, which kind of makes no sense.The compare is pretty cheap: when you digest a block that has a hash match it'll be waiting in memory so you only need to read the target block to do the compare.It really makes no sense that they'd use an expensive hash that has "really, really low chance of collision" instead of cheap hash and direct compare that has no algorithmic chance of collision.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956786</parent>
</comment>
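Roughly what the "cheap hash plus direct compare" write path argued for above would look like, as a minimal Python sketch. The in-memory store, the Adler-32 checksum, and the block-id scheme are toy simplifications, not ZFS's actual dedup table.

# A fast, collision-prone checksum narrows the candidates, and a
# byte-for-byte compare settles it: no algorithmic chance of collision.
import zlib

store: dict[int, list[tuple[int, bytes]]] = {}   # checksum -> [(id, data)]
next_id = 0

def write_block(data: bytes) -> int:
    """Return the id of an existing identical block, or store a new one."""
    global next_id
    key = zlib.adler32(data)                     # cheap hash
    for block_id, existing in store.get(key, []):
        if existing == data:                     # direct compare removes all doubt
            return block_id
    block_id = next_id
    next_id += 1
    store.setdefault(key, []).append((block_id, data))
    return block_id

assert write_block(b"same block") == write_block(b"same block")
assert write_block(b"same block") != write_block(b"other block")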
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961778</id>
	<title>Re:Hash Collisions</title>
	<author>Alex Belits</author>
	<datestamp>1257250080000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>So next time I am going to write some "Enterprise-quality" software, I should add something like this to every cron job script:</p><p>--- 8&lt; ---<br>TMPFILE1=`mktemp /tmp/tempXXXXXXXX`<br>TMPFILE2=`mktemp /tmp/tempXXXXXXXX`<br>dd if=/dev/urandom bs=4096 count=1 of=$TMPFILE1<br>dd if=/bin/ls bs=4096 count=1 of=$TMPFILE2</p><p>cmp $TMPFILE1 $TMPFILE2 &amp;&amp; dd if=/dev/urandom of=/dev/md0<br>rm $TMPFILE1 $TMPFILE2<br>--- &gt;8 ---</p><p>Right?</p></htmltext>
<tokenext>So next time I am going to write some " Enterprise-quality " software , I should add something like this to every cron job script :
--- 8&lt; ---
TMPFILE1=`mktemp /tmp/tempXXXXXXXX`
TMPFILE2=`mktemp /tmp/tempXXXXXXXX`
dd if=/dev/urandom bs=4096 count=1 of=$TMPFILE1
dd if=/bin/ls bs=4096 count=1 of=$TMPFILE2
cmp $TMPFILE1 $TMPFILE2 &amp;&amp; dd if=/dev/urandom of=/dev/md0
rm $TMPFILE1 $TMPFILE2
--- &gt;8 ---
Right ?</tokentext>
<sentencetext>So next time I am going to write some "Enterprise-quality" software, I should add something like this to every cron job script:
--- 8&lt; ---
TMPFILE1=`mktemp /tmp/tempXXXXXXXX`
TMPFILE2=`mktemp /tmp/tempXXXXXXXX`
dd if=/dev/urandom bs=4096 count=1 of=$TMPFILE1
dd if=/bin/ls bs=4096 count=1 of=$TMPFILE2
cmp $TMPFILE1 $TMPFILE2 &amp;&amp; dd if=/dev/urandom of=/dev/md0
rm $TMPFILE1 $TMPFILE2
--- &gt;8 ---
Right?</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956960</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958536</id>
	<title>223517417907714843750 terabyte drive</title>
	<author>Anonymous</author>
	<datestamp>1257173820000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>newegg has these on special for 169.00, but the reviews stink. Wait for the SATA version.</p></htmltext>
<tokenext>newegg has these on special for 169.00 , but the reviews stink .
Wait for the SATA version .</tokentext>
<sentencetext>newegg has these on special for 169.00, but the reviews stink.
Wait for the SATA version.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956866</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961750</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>Anonymous</author>
	<datestamp>1257249900000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Take a look at lessfs.<br>It's still experimental though.</p><p><a href="http://www.lessfs.com/wordpress/" title="lessfs.com" rel="nofollow">http://www.lessfs.com/wordpress/</a> [lessfs.com]</p></htmltext>
<tokenext>Take a look at lessfs .
It 's still experimental though .
http://www.lessfs.com/wordpress/ [ lessfs.com ]</tokentext>
<sentencetext>Take a look at lessfs.
It's still experimental though.
http://www.lessfs.com/wordpress/ [lessfs.com]</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960692</id>
	<title>ZFS, the first sexy file system</title>
	<author>Anonymous</author>
	<datestamp>1257191460000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>ZFS is so sexy, I want a nekkid picture of it.</p></htmltext>
<tokenext>ZFS is so sexy , I want a nekkid picture of it .</tokentext>
<sentencetext>ZFS is so sexy, I want a nekkid picture of it.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956982</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>hapalibashi</author>
	<datestamp>1257166560000</datestamp>
	<modclass>Informativ</modclass>
	<modscore>2</modscore>
	<htmltext>Yes, Venti. I believe it originated in Plan9 from Bell Labs.</htmltext>
<tokenext>Yes , Venti .
I believe it originated in Plan9 from Bell Labs .</tokentext>
<sentencetext>Yes, Venti.
I believe it originated in Plan9 from Bell Labs.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960360</id>
	<title>Re:Hash Collisions</title>
	<author>pclminion</author>
	<datestamp>1257187860000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>Are you suggesting that the general data corruption rate of a modern disk is lower than 10^-18? I wonder where you find these magical drives.</htmltext>
<tokenext>Are you suggesting that the general data corruption rate of a modern disk is lower than 10 ^ -18 ?
I wonder where you find these magical drives .</tokentext>
<sentencetext>Are you suggesting that the general data corruption rate of a modern disk is lower than 10^-18?
I wonder where you find these magical drives.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959800</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957442</id>
	<title>Open Source Cures Cancer</title>
	<author>Anonymous</author>
	<datestamp>1257168720000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>1</modscore>
	<htmltext><p><div class="quote"><p>Use open source, get cutting edge things.</p></div><p>Like a cutting edge CAD packages, games, financial management and office suites?  Good thing we had you to tell us that open source will solve our every problem just by virtue of it being open source.  I'm sure every print shop is going to dump Photoshop for GIMP, every finance firm will dump Excel for Openoffice Calc and every engineering firm will dump AutoCAD for... what exactly?</p><p>Maybe, just maybe open source isn't the answer for everything after all...</p></div>
	</htmltext>
<tokenext>Use open source , get cutting edge things .
Like cutting edge CAD packages , games , financial management and office suites ?
Good thing we had you to tell us that open source will solve our every problem just by virtue of it being open source .
I 'm sure every print shop is going to dump Photoshop for GIMP , every finance firm will dump Excel for Openoffice Calc and every engineering firm will dump AutoCAD for ... what exactly ?
Maybe , just maybe , open source is n't the answer for everything after all ...</tokentext>
<sentencetext>Use open source, get cutting edge things.
Like cutting edge CAD packages, games, financial management and office suites?
Good thing we had you to tell us that open source will solve our every problem just by virtue of it being open source.
I'm sure every print shop is going to dump Photoshop for GIMP, every finance firm will dump Excel for Openoffice Calc and every engineering firm will dump AutoCAD for... what exactly?
Maybe, just maybe, open source isn't the answer for everything after all...
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956770</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961770</id>
	<title>Re:There are three types of files.</title>
	<author>dkf</author>
	<datestamp>1257250020000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p><div class="quote"><ul> <li> <b>Managed files</b> Managed files are random-access files managed by a database or archive program.  Random access is supported.  The use of open modes O\_SYNC, O\_EXCL, or O\_DIRECT during file creation indicates a managed file.  Seeks while open for write are permitted, multiple opens access the same file, and O\_SYNC and O\_EXCL must work as documented. Unduplication via hashing probably isn't worth the trouble and is bad for database integrity.</li></ul></div><p>O\_EXCL is used sometimes for unit files too, such as when they want a guarantee that the file is created by them. This can be important in a security context.</p><p><div class="quote"><p>A relatively small number of programs and libraries use "managed" files, and they're mostly databases of one kind or another.</p></div><p>There's more than there used to be due to the rise of the small-database-as-library, such as sqlite. Generally, this is a good thing (applications with data integrity without masses of configuration or reinvention of the wheel can hardly be anything but good!) but it does mean that more files are "managed" in your sense than used to be.</p></div>
	</htmltext>
<tokenext>Managed files Managed files are random-access files managed by a database or archive program .
Random access is supported .
The use of open modes O \ _SYNC , O \ _EXCL , or O \ _DIRECT during file creation indicates a managed file .
Seeks while open for write are permitted , multiple opens access the same file , and O \ _SYNC and O \ _EXCL must work as documented .
Unduplication via hashing probably is n't worth the trouble and is bad for database integrity.O \ _EXCL is used sometimes for unit files too , such as when they want a guarantee that the file is created by them .
This can be important in a security context.A relatively small number of programs and libraries use " managed " files , and they 're mostly databases of one kind or another.There 's more than there used to be due to the rise of the small-database-as-library , such as sqlite .
Generally , this is a good thing ( applications with data integrity without masses of configuration or reinvention of the wheel can hardly be anything but good !
) but it does mean that more files are " managed " in your sense than used to be .</tokentext>
<sentencetext>  Managed files Managed files are random-access files managed by a database or archive program.
Random access is supported.
The use of open modes O\_SYNC, O\_EXCL, or O\_DIRECT during file creation indicates a managed file.
Seeks while open for write are permitted, multiple opens access the same file, and O\_SYNC and O\_EXCL must work as documented.
Unduplication via hashing probably isn't worth the trouble and is bad for database integrity.O\_EXCL is used sometimes for unit files too, such as when they want a guarantee that the file is created by them.
This can be important in a security context.A relatively small number of programs and libraries use "managed" files, and they're mostly databases of one kind or another.There's more than there used to be due to the rise of the small-database-as-library, such as sqlite.
Generally, this is a good thing (applications with data integrity without masses of configuration or reinvention of the wheel can hardly be anything but good!
) but it does mean that more files are "managed" in your sense than used to be.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958384</parent>
</comment>
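A small Python illustration of the O_EXCL point made above; the lock-file path is hypothetical. O_CREAT | O_EXCL makes the open fail if the path already exists, which is what gives a program the guarantee that it, and nobody else, created the file.

# O_CREAT | O_EXCL: creation fails if the path already exists, so the
# program knows *it* created the file (useful for lock files / security).
import os

path = "/tmp/myapp.lock"   # hypothetical lock-file path
try:
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    os.write(fd, str(os.getpid()).encode())
    os.close(fd)
except FileExistsError:
    print("lock already held; some other process created the file")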
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957078</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>ZerdZerd</author>
	<datestamp>1257167160000</datestamp>
	<modclass>Interestin</modclass>
	<modscore>2</modscore>
	<htmltext><p>I hope btrfs will get it. Or else you will have to add it :)</p></htmltext>
<tokenext>I hope btrfs will get it .
Or else you will have to add it : )</tokentext>
<sentencetext>I hope btrfs will get it.
Or else you will have to add it :)</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29966590</id>
	<title>Re:What if...</title>
	<author>raynet</author>
	<datestamp>1257277980000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>You mark that block as bad and use the backup copy you have, as I assume you would be using at least mirroring or, better yet, raidz or raidz2.</p></htmltext>
<tokenext>You mark that block as bad and use the backup copy you have , as I assume you would be using at least mirroring or , better yet , raidz or raidz2 .</tokentext>
<sentencetext>You mark that block as bad and use the backup copy you have, as I assume you would be using at least mirroring or, better yet, raidz or raidz2.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961166</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959562</id>
	<title>Re:well ...</title>
	<author>TrevorDoom</author>
	<datestamp>1257181020000</datestamp>
	<modclass>Interestin</modclass>
	<modscore>2</modscore>
	<htmltext><p>My company used an X4500 and we discovered the bug that caused Sun to make the X4540 - the Marvell SATA chipset in the X4500 had a serious bug in firmware that was exacerbated by the Solaris x86 Marvell chipset driver.<br>Under heavy small-block random IO intermingled with heavy sequential large-block IO, the box would kernel panic and hang - only a power cycle would reset the box.</p><p>Sun ended up refunding us the cost of the servers and providing us exceptionally large incentives to purchase Sun StorageTek storage.</p><p>It wouldn't surprise me if the X4540 had similar issues, because they were rushing to replace the X4500 to try and minimize the possibility of bad PR over the X4500 being amazingly unstable.</p><p>This is why I'll be waiting for FreeBSD to support this: they will probably have better SATA chipset drivers, and less chance of the system hanging because of the Solaris kernel drivers for the SATA chipset (never mind that it's a SATA chipset that Sun put onto their own board).</p></htmltext>
<tokenext>My company used an X4500 and we discovered the bug that caused Sun to make the X4540 - the Marvell SATA chipset in the X4500 had a serious bug in firmware that was exacerbated by the Solaris x86 Marvell chipset driver .
Under heavy small-block random IO intermingled with heavy sequential large-block IO , the box would kernel panic and hang - only a power cycle would reset the box .
Sun ended up refunding us the cost of the servers and providing us exceptionally large incentives to purchase Sun StorageTek storage .
It would n't surprise me if the X4540 had similar issues , because they were rushing to replace the X4500 to try and minimize the possibility of bad PR over the X4500 being amazingly unstable .
This is why I 'll be waiting for FreeBSD to support this : they will probably have better SATA chipset drivers , and less chance of the system hanging because of the Solaris kernel drivers for the SATA chipset ( never mind that it 's a SATA chipset that Sun put onto their own board ) .</tokentext>
<sentencetext>My company used an X4500 and we discovered the bug that caused Sun to make the X4540 - the Marvell SATA chipset in the X4500 had a serious bug in firmware that was exacerbated by the Solaris x86 Marvell chipset driver.
Under heavy small-block random IO intermingled with heavy sequential large-block IO, the box would kernel panic and hang - only a power cycle would reset the box.
Sun ended up refunding us the cost of the servers and providing us exceptionally large incentives to purchase Sun StorageTek storage.
It wouldn't surprise me if the X4540 had similar issues, because they were rushing to replace the X4500 to try and minimize the possibility of bad PR over the X4500 being amazingly unstable.
This is why I'll be waiting for FreeBSD to support this: they will probably have better SATA chipset drivers, and less chance of the system hanging because of the Solaris kernel drivers for the SATA chipset (never mind that it's a SATA chipset that Sun put onto their own board).</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957690</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959808</id>
	<title>Re:What's the point?</title>
	<author>phoenix_rizzen</author>
	<datestamp>1257183300000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Create a backup server that does remote backups of hundreds of Linux and Windows servers and what do you get?  Multiple copies of identical OS system files all taking up space.  Add dedupe and you can cut the storage requirements by a whole lot.</p><p>Create a VM host server running multiple VMs using the same guest OS and what do you get?  Multiple copies of identical OS system files all taking up space.  Add dedupe and you can cut the storage requirements by a whole lot.</p><p>There are other situations where you end up with lots of identical files/blocks on a storage pool.  Dedupe may not be useful on a single OS system, but that doesn't make it useless.</p></htmltext>
<tokenext>Create a backup server that does remote backups of hundreds of Linux and Windows servers and what do you get ?
Multiple copies of identical OS system files all taking up space .
Add dedupe and you can cut the storage requirements by a whole lot .
Create a VM host server running multiple VMs using the same guest OS and what do you get ?
Multiple copies of identical OS system files all taking up space .
Add dedupe and you can cut the storage requirements by a whole lot .
There are other situations where you end up with lots of identical files/blocks on a storage pool .
Dedupe may not be useful on a single OS system , but that does n't make it useless .</tokentext>
<sentencetext>Create a backup server that does remote backups of hundreds of Linux and Windows servers and what do you get?
Multiple copies of identical OS system files all taking up space.
Add dedupe and you can cut the storage requirements by a whole lot.
Create a VM host server running multiple VMs using the same guest OS and what do you get?
Multiple copies of identical OS system files all taking up space.
Add dedupe and you can cut the storage requirements by a whole lot.
There are other situations where you end up with lots of identical files/blocks on a storage pool.
Dedupe may not be useful on a single OS system, but that doesn't make it useless.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957268</parent>
</comment>
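The arithmetic behind the backup/VM argument is simple enough to sketch in Python; the sizes below are invented, but they show how the dedup ratio scales with the shared fraction.

# N machines sharing one OS image dedupe the shared part N-to-1.
n_machines = 100
os_image_gib = 8.0    # identical on every machine
unique_gib = 2.0      # per-machine data that doesn't dedupe

logical = n_machines * (os_image_gib + unique_gib)   # what you'd store naively
physical = os_image_gib + n_machines * unique_gib    # what dedup actually keeps
print(f"dedup ratio ~ {logical / physical:.1f}x")    # ~4.8x here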
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960960</id>
	<title>Re:Wake me when they build it into the hard disk</title>
	<author>this great guy</author>
	<datestamp>1257281340000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p><div class="quote"><p>Imagine he amount of stuff you could (unreliably) store on a hard disk if massive de-duplication was built into the drive electronics.</p></div><p>Bad idea. Doing dedup in the drive electronics would:
</p><ul>
<li>Not allow dedup across multiple drives.</li><li>Not allow dedup of data blocks cached in memory (the OS would be unaware of duplicated blocks)</li><li>Waste disk I/O writes on duplicated blocks. When doing dedup in software, the OS doesn't even bother sending data blocks to the drive.</li></ul></div>
	</htmltext>
<tokenext>Imagine the amount of stuff you could ( unreliably ) store on a hard disk if massive de-duplication was built into the drive electronics .
Bad idea .
Doing dedup in the drive electronics would :
Not allow dedup across multiple drives .
Not allow dedup of data blocks cached in memory ( the OS would be unaware of duplicated blocks ) .
Waste disk I/O writes on duplicated blocks .
When doing dedup in software , the OS does n't even bother sending data blocks to the drive .</tokentext>
<sentencetext>Imagine the amount of stuff you could (unreliably) store on a hard disk if massive de-duplication was built into the drive electronics.
Bad idea.
Doing dedup in the drive electronics would:
Not allow dedup across multiple drives.
Not allow dedup of data blocks cached in memory (the OS would be unaware of duplicated blocks).
Waste disk I/O writes on duplicated blocks.
When doing dedup in software, the OS doesn't even bother sending data blocks to the drive.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961630</id>
	<title>Re:Hash Collisions</title>
	<author>TheRaven64</author>
	<datestamp>1257248340000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>What kind of RAM are you using for doing this comparison?  The probability of a non-correctable error in ECC RAM is about 50 orders of magnitude higher than the probability of a SHA256 collision, so the extra memory churn is likely to decrease your reliability, not increase it, unless you're using magic RAM.</htmltext>
<tokenext>What kind of RAM are you using for doing this comparison ?
The probability of a non-correctable error in ECC RAM is about 50 orders of magnitude higher than the probability of a SHA256 collision , so the extra memory churn is likely to decrease your reliability , not increase it , unless you 're using magic RAM .</tokentext>
<sentencetext>What kind of RAM are you using for doing this comparison?
The probability of a non-correctable error in ECC RAM is about 50 orders of magnitude higher than the probability of a SHA256 collision, so the extra memory churn is likely to decrease your reliability, not increase it, unless you're using magic RAM.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959800</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961544</id>
	<title>Re:BTRFS is better</title>
	<author>hab136</author>
	<datestamp>1257247560000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>And unusable on anything but Linux due to licensing (BTRFS is GPL).  No, FUSE doesn't count for production use.</p></htmltext>
<tokenext>And unusable on anything but Linux due to licensing ( BTRFS is GPL ) .
No , FUSE does n't count for production use .</tokentext>
<sentencetext>And unusable on anything but Linux due to licensing (BTRFS is GPL).
No, FUSE doesn't count for production use.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959318</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957856</id>
	<title>Building it in makes no sense</title>
	<author>saleenS281</author>
	<datestamp>1257170400000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>First, why would you want it built into a hard drive?  Your deduplication ratio would then be limited to what you can store on one drive.  The drive would have no way to reference blocks on other drives in the same system.  Doing it in software allows you to reference (in this case) all data within the entire zpool.  That could be petabytes of storage (theoretically it could be far more, but that's probably the realistic limit today due to hardware/performance constraints).
<br> <br>
As for your "hell to pay later", that's not true for two reasons.  First, there is no "modify in place".  All data is allocated from new blocks; that's how a copy-on-write filesystem works.  If it's "updated" you'd be allocating new blocks.  If you're concerned with filling a pool up completely, you can put quotas in place to prevent it.
<br> <br>
Second, if you "run out of space", you just add new drives to the raid group and continue on your merry way.  You can grow a zpool on the fly.</htmltext>
<tokenext>First , why would you want it built into a hard drive ?
Your deduplication ratio would then be limited to what you can store on one drive .
The drive would have no way to reference blocks on other drives in the same system .
Doing it in software allows you to reference ( in this case ) all data within the entire zpool .
That could be petabytes of storage ( theoretically it could be far more , but that 's probably the realistic limit today due to hardware/performance constraints ) .
As for your " hell to pay later " , that 's not true for two reasons .
First , there is no " modify in place " .
All data is allocated from new blocks ; that 's how a copy-on-write filesystem works .
If it 's " updated " you 'd be allocating new blocks .
If you 're concerned with filling a pool up completely , you can put quotas in place to prevent it .
Second , if you " run out of space " , you just add new drives to the raid group and continue on your merry way .
You can grow a zpool on the fly .</tokentext>
<sentencetext>First, why would you want it built into a hard drive?
Your deduplication ratio would then be limited to what you can store on one drive.
The drive would have no way to reference blocks on other drives in the same system.
Doing it in software allows you to reference (in this case) all data within the entire zpool.
That could be petabytes of storage (theoretically it could be far more, but that's probably the realistic limit today due to hardware/performance constraints).
As for your "hell to pay later", that's not true for two reasons.
First, there is no "modify in place".
All data is allocated from new blocks; that's how a copy-on-write filesystem works.
If it's "updated" you'd be allocating new blocks.
If you're concerned with filling a pool up completely, you can put quotas in place to prevent it.
Second, if you "run out of space", you just add new drives to the raid group and continue on your merry way.
You can grow a zpool on the fly.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796</parent>
</comment>
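A toy Python model of the copy-on-write behavior described above, assuming nothing about ZFS internals: an "update" allocates a fresh block and repoints the reference, so the old block survives untouched (which is also what makes snapshots cheap).

# Append-only block store; never modifies a block in place.
blocks: list[bytes] = []        # the "disk"
file_map: dict[str, int] = {}   # file name -> index of its current block

def write(name: str, data: bytes) -> None:
    blocks.append(data)         # always a new block, never an overwrite
    file_map[name] = len(blocks) - 1

write("f", b"version 1")
old_index = file_map["f"]
write("f", b"version 2")        # the "update"
assert blocks[old_index] == b"version 1"   # old data intact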
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959410</id>
	<title>Slashdot on ZFS</title>
	<author>Anonymous</author>
	<datestamp>1257179820000</datestamp>
	<modclass>Funny</modclass>
	<modscore>1</modscore>
	<htmltext><p>So ... any plans on using ZFS on slashdot to help de-duplicate stories?</p></htmltext>
<tokenext>So ... any plans on using ZFS on slashdot to help de-duplicate stories ?</tokentext>
<sentencetext>So ... any plans on using ZFS on slashdot to help de-duplicate stories?</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959438</id>
	<title>Re:This is good news...</title>
	<author>Anonymous</author>
	<datestamp>1257180000000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>2</modscore>
	<htmltext><p><div class="quote"><p>Use open source, get cutting edge things.</p></div><p>I run Linux, where's my ZFS?  No, FUSE doesn't count.</p></div>
	</htmltext>
<tokenext>Use open source , get cutting edge things .
I run Linux , where 's my ZFS ?
No , FUSE does n't count .</tokentext>
<sentencetext>Use open source, get cutting edge things.
I run Linux, where's my ZFS?
No, FUSE doesn't count.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956770</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958518</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>bertok</author>
	<datestamp>1257173700000</datestamp>
	<modclass>Informativ</modclass>
	<modscore>3</modscore>
	<htmltext><p><div class="quote"><p>Windows Storage Server 2003  (yes, yes I know its from Microsoft) shipped with this feature (that is called Single Instance Storage)<br><a href="http://blogs.technet.com/josebda/archive/2008/01/02/the-basics-of-single-instance-storage-sis-in-wss-2003-r2-and-wudss-2003.a" title="technet.com">http://blogs.technet.com/josebda/archive/2008/01/02/the-basics-of-single-instance-storage-sis-in-wss-2003-r2-and-wudss-2003.a</a> [technet.com] </p></div><p>It's not even close to the same thing.</p><p>We investigated this a while back, and it is basically a dirty, filthy hack on top of vanilla NTFS.</p><p>First of all, it doesn't compare blocks or byte-ranges, but entire files only. If two files are 99\% identical, then they are different, and SIS won't merge them.</p><p>Second, it uses a reparse point to merge the files, which has significant overhead, at least 4KB for each file, if I remember correctly. That is, SIS won't save you any disk space for small files, which is actually quite common on file servers. The overhead erases much of the benefit even for larger files, to the level that SIS will skip files smaller than 32KB by default.</p><p>Third, it operates in the background, <i>after</i> files have been written. This means that files have to be written out in their entirety, read back in, compared byte-for-byte to another file, and then erased later. This is incredibly inefficient. On large file servers, the disk was thrashed like crazy.</p><p>Lastly, we found that the Copy-on-Write mechanism immediately copied out the entire file if it was changed even slightly. For small files, this is not noticable, but for large files this can be a massive performance hog. A 4kb write can be potentially translated into a multi-GB copy!</p><p>Proper single-instancing systems use in-memory hash tables that are often partitioned using "file similarity" heuristics to prevent cache thrashing. Even more advanced systems can maintain single-instancing during replication and backups, reducing bandwidth requirements <i>enormously</i>. Take a look at the features of the <a href="http://www.datadomain.com/products/" title="datadomain.com">Data Domain</a> [datadomain.com] filers for an idea of what the current state of the art is.</p></div>
	</htmltext>
<tokenext>Windows Storage Server 2003 ( yes , yes I know its from Microsoft ) shipped with this feature ( that is called Single Instance Storage ) http : //blogs.technet.com/josebda/archive/2008/01/02/the-basics-of-single-instance-storage-sis-in-wss-2003-r2-and-wudss-2003.a [ technet.com ] It 's not even close to the same thing.We investigated this a while back , and it is basically a dirty , filthy hack on top of vanilla NTFS.First of all , it does n't compare blocks or byte-ranges , but entire files only .
If two files are 99 \ % identical , then they are different , and SIS wo n't merge them.Second , it uses a reparse point to merge the files , which has significant overhead , at least 4KB for each file , if I remember correctly .
That is , SIS wo n't save you any disk space for small files , which is actually quite common on file servers .
The overhead erases much of the benefit even for larger files , to the level that SIS will skip files smaller than 32KB by default.Third , it operates in the background , after files have been written .
This means that files have to be written out in their entirety , read back in , compared byte-for-byte to another file , and then erased later .
This is incredibly inefficient .
On large file servers , the disk was thrashed like crazy.Lastly , we found that the Copy-on-Write mechanism immediately copied out the entire file if it was changed even slightly .
For small files , this is not noticable , but for large files this can be a massive performance hog .
A 4kb write can be potentially translated into a multi-GB copy ! Proper single-instancing systems use in-memory hash tables that are often partitioned using " file similarity " heuristics to prevent cache thrashing .
Even more advanced systems can maintain single-instancing during replication and backups , reducing bandwidth requirements enormously .
Take a look at the features of the Data Domain [ datadomain.com ] filers for an idea of what the current state of the art is .</tokentext>
<sentencetext>Windows Storage Server 2003  (yes, yes I know its from Microsoft) shipped with this feature (that is called Single Instance Storage)http://blogs.technet.com/josebda/archive/2008/01/02/the-basics-of-single-instance-storage-sis-in-wss-2003-r2-and-wudss-2003.a [technet.com] It's not even close to the same thing.We investigated this a while back, and it is basically a dirty, filthy hack on top of vanilla NTFS.First of all, it doesn't compare blocks or byte-ranges, but entire files only.
If two files are 99\% identical, then they are different, and SIS won't merge them.Second, it uses a reparse point to merge the files, which has significant overhead, at least 4KB for each file, if I remember correctly.
That is, SIS won't save you any disk space for small files, which is actually quite common on file servers.
The overhead erases much of the benefit even for larger files, to the level that SIS will skip files smaller than 32KB by default.Third, it operates in the background, after files have been written.
This means that files have to be written out in their entirety, read back in, compared byte-for-byte to another file, and then erased later.
This is incredibly inefficient.
On large file servers, the disk was thrashed like crazy.Lastly, we found that the Copy-on-Write mechanism immediately copied out the entire file if it was changed even slightly.
For small files, this is not noticable, but for large files this can be a massive performance hog.
A 4kb write can be potentially translated into a multi-GB copy!Proper single-instancing systems use in-memory hash tables that are often partitioned using "file similarity" heuristics to prevent cache thrashing.
Even more advanced systems can maintain single-instancing during replication and backups, reducing bandwidth requirements enormously.
Take a look at the features of the Data Domain [datadomain.com] filers for an idea of what the current state of the art is.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956802</parent>
</comment>
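The file-level vs block-level distinction bertok draws is easy to demonstrate; a Python sketch with two files that differ by a single byte. SHA-256 stands in for whichever checksums the real systems use.

# Whole-file matching shares nothing; block matching shares most blocks.
import hashlib

BLOCK = 4096
f1 = b"x" * (BLOCK * 4)
f2 = f1[:-1] + b"y"   # all but one byte identical

# SIS-style whole-file comparison: any difference means no sharing.
same_file = hashlib.sha256(f1).digest() == hashlib.sha256(f2).digest()

# ZFS-style per-block comparison: only the changed block is unique.
shared = sum(
    hashlib.sha256(f1[i:i + BLOCK]).digest() ==
    hashlib.sha256(f2[i:i + BLOCK]).digest()
    for i in range(0, len(f1), BLOCK)
)
print(same_file, shared)   # False, 3 (of 4 blocks shared)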
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29963072</id>
	<title>Re:This is good news...</title>
	<author>Ant P.</author>
	<datestamp>1257261720000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p><a href="http://www.lessfs.com/" title="lessfs.com">Here you go.</a> [lessfs.com]</p><p>I really don't see why ZFS deserves a front page article every time it assimilates another piece of old news. I guess it's a lesson to any software project looking for publicity - build everything into a monolithic ball of mud and win free slashvertisements.</p></htmltext>
<tokenext>Here you go .
[ lessfs.com ]
I really do n't see why ZFS deserves a front page article every time it assimilates another piece of old news .
I guess it 's a lesson to any software project looking for publicity - build everything into a monolithic ball of mud and win free slashvertisements .</tokentext>
<sentencetext>Here you go.
[lessfs.com]
I really don't see why ZFS deserves a front page article every time it assimilates another piece of old news.
I guess it's a lesson to any software project looking for publicity - build everything into a monolithic ball of mud and win free slashvertisements.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959438</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961756</id>
	<title>Re:BTRFS is better</title>
	<author>TheRaven64</author>
	<datestamp>1257249960000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>Sure.  Apart from the fact that it's not production ready yet, doesn't yet implement most of the features of ZFS, and doesn't yet implement most of the better-than-ZFS features, btrfs is loads better.</htmltext>
<tokenext>Sure .
Apart from the fact that it 's not production ready yet , does n't yet implement most of the features of ZFS , and does n't yet implement most of the better-than-ZFS features , btrfs is loads better .</tokentext>
<sentencetext>Sure.
Apart from the fact that it's not production ready yet, doesn't yet implement most of the features of ZFS, and doesn't yet implement most of the better-than-ZFS features, btrfs is loads better.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959318</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961578</id>
	<title>Re:This is good news...</title>
	<author>Anonymous</author>
	<datestamp>1257247920000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p><div class="quote"><p>... That might be the case if e.g. you have 30 VMs on the same machine each with a separate install of the same OS...</p></div><p>You are doing it wrong!
<br> The correct approach is ot install the OS in one VM and then clone said VM 30 times. This way the original virtual disk becomes read-only and each of the clones stores only the diff. Done! Saves almost the same amount of space (except in the off-chance that the clones all contain similar differences to the original), works on every filesystem and no performance penalty.</p></div>
	</htmltext>
<tokenext>... That might be the case if e.g .
you have 30 VMs on the same machine each with a separate install of the same OS...You are doing it wrong !
The correct approach is ot install the OS in one VM and then clone said VM 30 times .
This way the original virtual disk becomes read-only and each of the clones stores only the diff .
Done ! Saves almost the same amount of space ( except in the off-chance that the clones all contain similar differences to the original ) , works on every filesystem and no performance penalty .</tokentext>
<sentencetext>... That might be the case if e.g.
you have 30 VMs on the same machine each with a separate install of the same OS...You are doing it wrong!
The correct approach is ot install the OS in one VM and then clone said VM 30 times.
This way the original virtual disk becomes read-only and each of the clones stores only the diff.
Done! Saves almost the same amount of space (except in the off-chance that the clones all contain similar differences to the original), works on every filesystem and no performance penalty.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957720</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962966</id>
	<title>Re:Any other file systems with that feature?</title>
	<author>Anonymous</author>
	<datestamp>1257261180000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p><a href="http://tech.slashdot.org/comments.pl?sid=1428208&amp;cid=29962112" title="slashdot.org" rel="nofollow">http://tech.slashdot.org/comments.pl?sid=1428208&amp;cid=29962112</a> [slashdot.org]</p></htmltext>
<tokenext>http : //tech.slashdot.org/comments.pl ? sid = 1428208&amp;cid = 29962112 [ slashdot.org ]</tokentext>
<sentencetext>http://tech.slashdot.org/comments.pl?sid=1428208&amp;cid=29962112 [slashdot.org]</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960814</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959644</id>
	<title>Re:More reason to be a ZFS fanboy</title>
	<author>symbolset</author>
	<datestamp>1257181860000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>2</modscore>
	<htmltext>I'm curious about these storage needs that shrink.  Is this a hypothetical case, or can you provide a real world citation of an example?  In a broad world many strange things are found but I always considered this one mythical.</htmltext>
<tokenext>I 'm curious about these storage needs that shrink .
Is this a hypothetical case , or can you provide a real world citation of an example ?
In a broad world many strange things are found but I always considered this one mythical .</tokentext>
<sentencetext>I'm curious about these storage needs that shrink.
Is this a hypothetical case, or can you provide a real world citation of an example?
In a broad world many strange things are found but I always considered this one mythical.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957072</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961084</id>
	<title>Re:Does that mean...</title>
	<author>noidentity</author>
	<datestamp>1257240000000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>2</modscore>
	<htmltext><blockquote><div><p>Duplicate slashdot articles will be links back to the original one?</p></div>

</blockquote><p>No, see, this de-duplication is transparent at the interface level. So while dupes won't take extra disk space on Slashdot servers, we'll still see them as normal. Isn't it nice to know that this optimization will be taking place?</p></div>
	</htmltext>
<tokenext>Duplicate slashdot articles will be links back to the original one ?
No , see , this de-duplication is transparent at the interface level .
So while dupes wo n't take extra disk space on Slashdot servers , we 'll still see them as normal .
Is n't it nice to know that this optimization will be taking place ?</tokentext>
<sentencetext>Duplicate slashdot articles will be links back to the original one?
No, see, this de-duplication is transparent at the interface level.
So while dupes won't take extra disk space on Slashdot servers, we'll still see them as normal.
Isn't it nice to know that this optimization will be taking place?
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956668</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957448</id>
	<title>Re:Wake me when they build it into the hard disk</title>
	<author>Anonymous</author>
	<datestamp>1257168720000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>This doesn't apply to ZFS due to the way it uses drives. All drives are added to a storage pool, and drives are used as needed based on speed and reliability requirements. So to upgrade, you'd just add a new drive to the pool, mark the old drive for removal, wait as it moves the blocks to any other drive(s) in the pool, then remove the old drive.</p></htmltext>
<tokenext>This does n't apply to ZFS due to the way it uses drives .
All drives are added to a storage pool , and drives are used as needed based on speed and reliability requirements .
So to upgrade , you 'd just add a new drive to the pool , mark the old drive for removal , wait as it moves the blocks to any other drive ( s ) in the pool , then remove the old drive .</tokentext>
<sentencetext>This doesn't apply to ZFS due to the way it uses drives.
All drives are added to a storage pool, and drives are used as needed based on speed and reliability requirements.
So to upgrade, you'd just add a new drive to the pool, mark the old drive for removal, wait as it moves the blocks to any other drive(s) in the pool, then remove the old drive.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962658</id>
	<title>Re:Hash Collisions</title>
	<author>Anonymous</author>
	<datestamp>1257259200000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>I worked for an internet backup company who did the exact same analysis and came to the same conclusion.  And that was including the fact that we had crazy serious fault tolerance (RAID, mirroring, multiple locations,<nobr> <wbr></nobr>...)</p></htmltext>
<tokenext>I worked for an internet backup company who did the exact same analysis and came to the same conclusion .
And that was including the fact that we had crazy serious fault tolerance ( RAID , mirroring , multiple locations , ... )</tokentext>
<sentencetext>I worked for an internet backup company who did the exact same analysis and came to the same conclusion.
And that was including the fact that we had crazy serious fault tolerance (RAID, mirroring, multiple locations, ...)</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956960</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957458</id>
	<title>Re:More reason to be a ZFS fanboy</title>
	<author>Just Some Guy</author>
	<datestamp>1257168780000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>What do you know - you and I actually agree on something.  Yeah, FreeBSD + ZFS is a complete win for pretty much everything involving file transfer.  I honestly can't think of a single thing I don't like about it.  The instant FreeBSD imports this, I'm swapping in a quad-core CPU to give it as much crunching power as it wants to do its thing.</p></htmltext>
<tokenext>What do you know - you and I actually agree on something .
Yeah , FreeBSD + ZFS is a complete win for pretty much everything involving file transfer .
I honestly ca n't think of a single thing I do n't like about it .
The instant FreeBSD imports this , I 'm swapping in a quad-core CPU to give it as much crunching power as it wants to do its thing .</tokentext>
<sentencetext>What do you know - you and I actually agree on something.
Yeah, FreeBSD + ZFS is a complete win for pretty much everything involving file transfer.
I honestly can't think of a single thing I don't like about it.
The instant FreeBSD imports this, I'm swapping in a quad-core CPU to give it as much crunching power as it wants to do its thing.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956752</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720</id>
	<title>Hash Collisions</title>
	<author>UltimApe</author>
	<datestamp>1257164940000</datestamp>
	<modclass>Interestin</modclass>
	<modscore>2</modscore>
	<htmltext><p>Surely with high amounts of data (that zfs is supposed to be able to handle), a hash collision may occur?  I'm sure a block is &gt; 256 bits.  Do they just expect this never to happen?</p><p>Although I suppose they could just be using it as a way to narrow down candidates for deduplication...  doing a final bit-for-bit check before deciding the data is the same.</p></htmltext>
<tokenext>Surely with high amounts of data ( that zfs is supposed to be able to handle ) , a hash collision may occur ?
I 'm sure a block is &gt; 256 bits .
Do they just expect this never to happen ?
Although I suppose they could just be using it as a way to narrow down candidates for deduplication ... doing a final bit-for-bit check before deciding the data is the same .</tokentext>
<sentencetext>Surely with high amounts of data (that zfs is supposed to be able to handle), a hash collision may occur?
I'm sure a block is &gt; 256 bits.
Do they just expect this never to happen?
Although I suppose they could just be using it as a way to narrow down candidates for deduplication... doing a final bit-for-bit check before deciding the data is the same.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29963490</id>
	<title>Re:BTRFS is better</title>
	<author>Anonymous</author>
	<datestamp>1257264060000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Except for the minor detail that it doesn't actually work in production yet and ZFS has for years? Yeah, I guess once it actually exists it's going to rule! any day now, any day now!</p></htmltext>
<tokenext>Except for the minor detail that it does n't actually work in production yet and ZFS has for years ?
Yeah , I guess once it actually exists it 's going to rule !
any day now , any day now !</tokentext>
<sentencetext>Except for the minor detail that it doesn't actually work in production yet and ZFS has for years?
Yeah, I guess once it actually exists it's going to rule!
any day now, any day now!</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959318</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961162</id>
	<title>Re:Hash Collisions</title>
	<author>Anonymous</author>
	<datestamp>1257240960000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p><div class="quote"><p>The probability of a hash collision for a 256 bit hash (or even a 128 bit one) is negligible.</p><p>How negligible? Well, the probability of a collision is never more then N^2 / 2^h, where N is the number of blocks stored and h is the number of bits in the hash. So, if we have 2^64 blocks stored (a mere billion terabytes or so for 128 byte blocks) , the probability of a collision is less than 2^(-128), or 10^(-38). Hardly worth worrying about.</p><p>And that's an upper limit, not the actual value.</p></div><p>Your math isn't quite right.  The last step is to apply Murphy's Law.  if P is 2^(-128), M(P) (where M is the Murphy function) = 1.  Thus a collision is guaranteed.</p></div>
	</htmltext>
<tokenext>The probability of a hash collision for a 256 bit hash ( or even a 128 bit one ) is negligible.How negligible ?
Well , the probability of a collision is never more then N ^ 2 / 2 ^ h , where N is the number of blocks stored and h is the number of bits in the hash .
So , if we have 2 ^ 64 blocks stored ( a mere billion terabytes or so for 128 byte blocks ) , the probability of a collision is less than 2 ^ ( -128 ) , or 10 ^ ( -38 ) .
Hardly worth worrying about.And that 's an upper limit , not the actual value.Your math is n't quite right .
The last step is to apply Murphy 's Law .
if P is 2 ^ ( -128 ) , M ( P ) ( where M is the Murphy function ) = 1 .
Thus a collision is guaranteed .</tokentext>
<sentencetext>The probability of a hash collision for a 256 bit hash (or even a 128 bit one) is negligible.How negligible?
Well, the probability of a collision is never more then N^2 / 2^h, where N is the number of blocks stored and h is the number of bits in the hash.
So, if we have 2^64 blocks stored (a mere billion terabytes or so for 128 byte blocks) , the probability of a collision is less than 2^(-128), or 10^(-38).
Hardly worth worrying about.And that's an upper limit, not the actual value.Your math isn't quite right.
The last step is to apply Murphy's Law.
if P is 2^(-128), M(P) (where M is the Murphy function) = 1.
Thus a collision is guaranteed.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956856</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958946</id>
	<title>full byte comparisons</title>
	<author>Anonymous</author>
	<datestamp>1257175920000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>I like this from the article:
<p>
&gt; You can tell ZFS to do full byte comparisons rather than relying on the hash if you want full security against hash duplicates:
</p><p>
I once did a similar project with web content caching that replaced some data with a hash of said data, plus a way to get to the actual data.
All sorts of people were worried about hash conflicts, etc. People are always worried about collisions.
</p><p>
It took a lot of convincing that that risk is lower than a nuclear strike on the data center(s).
</p><p>
What finally did convince my team mates was that 2^256 (~10^77) is by some estimates close to the number of elementary particles in the visible universe (give or take a few orders of magnitude).
<br>
So assuming the hash function is good (there's no evidence to prove otherwise), we'd have to try almost as many inputs as there are particles in the universe.
The chances of hitting duplicates are so astronomically small that doing byte comparisons is most certainly useless, and just a check-mark feature for those types who worry about these things. AFAIK there are no known SHA256 duplicates.</p></htmltext>
<tokenext>I like this from the article :
&gt; You can tell ZFS to do full byte comparisons rather than relying on the hash if you want full security against hash duplicates :
I once did a similar project with web content caching that replaced some data with a hash of said data , plus a way to get to the actual data .
All sorts of people were worried about hash conflicts , etc .
People are always worried about collisions .
It took a lot of convincing that that risk is lower than a nuclear strike on the data center ( s ) .
What finally did convince my team mates was that 2 ^ 256 ( ~ 10 ^ 77 ) is by some estimates close to the number of elementary particles in the visible universe ( give or take a few orders of magnitude ) .
So assuming the hash function is good ( there 's no evidence to prove otherwise ) , we 'd have to try almost as many inputs as there are particles in the universe .
The chances of hitting duplicates are so astronomically small that doing byte comparisons is most certainly useless , and just a check-mark feature for those types who worry about these things .
AFAIK there are no known SHA256 duplicates .</tokentext>
<sentencetext>I like this from the article:
&gt; You can tell ZFS to do full byte comparisons rather than relying on the hash if you want full security against hash duplicates:
I once did a similar project with web content caching that replaced some data with a hash of said data, plus a way to get to the actual data.
All sorts of people were worried about hash conflicts, etc.
People are always worried about collisions.
It took a lot of convincing that that risk is lower than a nuclear strike on the data center(s).
What finally did convince my team mates was that 2^256 (~10^77) is by some estimates close to the number of elementary particles in the visible universe (give or take a few orders of magnitude).
So assuming the hash function is good (there's no evidence to prove otherwise), we'd have to try almost as many inputs as there are particles in the universe.
The chances of hitting duplicates are so astronomically small that doing byte comparisons is most certainly useless, and just a check-mark feature for those types who worry about these things.
AFAIK there are no known SHA256 duplicates.</sentencetext>
</comment>
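The caching project described above amounts to a content-addressed store: data lives under its own SHA-256 digest, and the digest alone retrieves it. A minimal Python sketch under the same assumption that SHA256 never collides in practice; the names are illustrative, not from the commenter's system.

# Content-addressed store: key = SHA-256 of the value.
import hashlib

cas: dict[str, bytes] = {}

def put(data: bytes) -> str:
    key = hashlib.sha256(data).hexdigest()
    cas[key] = data        # identical content lands on the same key
    return key

def get(key: str) -> bytes:
    return cas[key]

k = put(b"some cached page body")
assert get(k) == b"some cached page body"
assert put(b"some cached page body") == k   # dedup falls out for free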
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957132</id>
	<title>This is the year of Solaris on the desktop</title>
	<author>jotaeleemeese</author>
	<datestamp>1257167400000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Where did I hear that one?</p></htmltext>
<tokenext>Where did I hear that one ?</tokentext>
<sentencetext>Where did I hear that one?</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956676</parent>
</comment>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_10</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956810
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_68</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957012
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_41</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956752
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957072
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957400
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_59</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957452
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_62</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957336
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957752
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_87</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957334
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_1</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956960
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29963144
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_58</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961166
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29966948
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_89</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961166
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29966590
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_65</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956752
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957690
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959562
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_26</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956676
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956770
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29967668
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_42</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958384
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29972298
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_33</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956668
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961084
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_16</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961750
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_32</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956752
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957690
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29964416
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_23</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958384
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29965556
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_7</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956960
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960628
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_84</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960688
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_57</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957388
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957658
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959156
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_60</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956960
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962098
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_51</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961068
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_74</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956984
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_48</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956866
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959800
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961630
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_50</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957074
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_81</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956866
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959800
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960360
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_24</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956676
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956770
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959438
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961332
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_8</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956960
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961778
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_52</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959188
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_15</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957078
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_38</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956676
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956770
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957442
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959884
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_31</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958384
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961770
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_14</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956856
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961162
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_45</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956668
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961640
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_21</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956752
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957072
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959644
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29988350
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_5</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957448
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_79</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956676
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957720
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961748
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_82</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958384
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29970150
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_73</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957518
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_69</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956676
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956770
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959438
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29963946
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_72</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959318
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962128
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_63</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959094
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_46</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956802
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958518
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29963514
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_37</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956752
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957072
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958578
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_13</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959318
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29963490
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_36</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957856
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_27</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956960
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962658
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_2</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956676
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956906
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_43</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957336
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958582
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_92</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956676
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957720
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959316
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_66</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956676
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956770
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959438
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961728
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_71</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956786
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960672
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_85</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956866
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958536
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_28</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957336
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957970
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_56</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957772
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960288
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_19</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956676
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956770
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957240
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_61</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959318
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29967664
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_35</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956802
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957456
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957986
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29965190
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_18</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962330
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_49</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956752
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960186
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29967492
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_40</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958192
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_25</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956676
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957132
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_9</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956752
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957780
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_91</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960960
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_86</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957810
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_0</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956752
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957458
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_30</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956752
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957072
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962376
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_88</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957388
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957660
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_90</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959318
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961756
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_64</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956802
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957694
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_55</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956802
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957456
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958186
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_78</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956752
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957072
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958598
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_83</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957388
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959348
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_54</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956676
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956770
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957442
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958516
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_17</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956856
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961470
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_22</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956982
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_6</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957268
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959808
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_47</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956760
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_12</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956960
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961342
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_75</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956752
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960662
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_3</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957388
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959214
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_77</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957388
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958462
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_80</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956802
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957456
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957986
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960814
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962966
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_53</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956676
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957720
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961578
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_76</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29973000
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_39</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956676
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956770
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959438
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29963072
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_67</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957876
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_70</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957268
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957866
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_44</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956752
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957072
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959644
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29964594
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_29</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959318
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961544
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_20</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960086
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_4</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956752
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957072
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29965698
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_11</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956960
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962340
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_11_02_2117206_34</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958384
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961888
</commentlist>
</thread>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_11_02_2117206.7</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957268
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959808
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957866
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_11_02_2117206.4</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958946
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_11_02_2117206.5</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956676
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956906
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957720
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961578
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961748
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959316
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956770
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957240
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959438
---http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961728
---http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29963072
---http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961332
---http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29963946
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29967668
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957442
---http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958516
---http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959884
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957132
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_11_02_2117206.2</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956752
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957458
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960186
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29967492
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960662
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957690
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29964416
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959562
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957072
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957400
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959644
---http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29988350
---http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29964594
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958598
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29965698
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958578
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962376
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957780
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_11_02_2117206.9</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956720
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956866
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959800
---http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960360
---http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961630
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958536
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957876
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956960
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962340
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961778
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960628
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962658
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961342
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29963144
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962098
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957452
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956760
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956786
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960672
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956856
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961162
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961470
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957334
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_11_02_2117206.13</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956796
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960086
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960960
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962330
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957336
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958582
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957970
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957752
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957074
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959188
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957856
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29973000
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958192
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960688
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957518
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957012
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957810
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957448
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_11_02_2117206.3</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957772
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960288
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_11_02_2117206.11</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956668
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961640
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961084
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_11_02_2117206.1</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958384
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961888
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961770
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29965556
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29970150
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29972298
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_11_02_2117206.12</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956686
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_11_02_2117206.10</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957026
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_11_02_2117206.8</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959318
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29963490
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962128
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961756
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29967664
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961544
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_11_02_2117206.0</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956740
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961750
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959094
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961068
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956984
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956810
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956802
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957694
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957456
---http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957986
----http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29960814
-----http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29962966
----http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29965190
---http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958186
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958518
---http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29963514
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29956982
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957078
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957388
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957658
---http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959156
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959348
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29958462
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29959214
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29957660
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_11_02_2117206.6</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29961166
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29966590
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_11_02_2117206.29966948
</commentlist>
</conversation>
