<article>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#article10_03_28_0052234</id>
	<title>Open Source Deduplication For Linux With Opendedup</title>
	<author>timothy</author>
	<datestamp>1269790260000</datestamp>
	<htmltext>tazzbit writes <i>"The storage vendors have been crowing about data deduplication technology for some time now, but a new open source project, <a href="http://www.opendedup.org/">Opendedup</a>, brings it to Linux and its hypervisors &mdash; KVM, Xen and VMware. The new deduplication-based file system called SDFS (GPL v2) is scalable to <a href="http://www.cio.com.au/article/340870/open_source_deduplication_software_released_linux">eight petabytes of capacity with 256 storage engines</a>, which can each store up to 32TB of deduplicated data. Each volume can be up to 8 exabytes and the number of files is limited by the underlying file system. Opendedup runs in user space, making it platform independent, easier to scale and cluster, and it can integrate with other user space services like Amazon S3."</i></htmltext>
<tokentext>tazzbit writes " The storage vendors have been crowing about data deduplication technology for some time now , but a new open source project , Opendedup , brings it to Linux and its hypervisors    KVM , Xen and VMware .
The new deduplication-based file system called SDFS ( GPL v2 ) is scalable to eight petabytes of capacity with 256 storage engines , which can each store up to 32TB of deduplicated data .
Each volume can be up to 8 exabytes and the number of files is limited by the underlying file system .
Opendedup runs in user space , making it platform independent , easier to scale and cluster , and it can integrate with other user space services like Amazon S3 .
"</tokentext>
<sentencetext>tazzbit writes "The storage vendors have been crowing about data deduplication technology for some time now, but a new open source project, Opendedup, brings it to Linux and its hypervisors — KVM, Xen and VMware.
The new deduplication-based file system called SDFS (GPL v2) is scalable to eight petabytes of capacity with 256 storage engines, which can each store up to 32TB of deduplicated data.
Each volume can be up to 8 exabytes and the number of files is limited by the underlying file system.
Opendedup runs in user space, making it platform independent, easier to scale and cluster, and it can integrate with other user space services like Amazon S3.
"</sentencetext>
</article>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645694</id>
	<title>Re:A hypothetical question.</title>
	<author>drsmithy</author>
	<datestamp>1269806580000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p> <i>Note that you can't *change* the file (because that would just split the files up again), but being able to read the file (when you couldn't before) or knowing that another copy exists elsewhere can be very useful knowledge.</i>
</p><p>If you can "generate a file" that can be deduplicated, then by definition you already know about the data in that file.</p></htmltext>
<tokentext>Note that you ca n't * change * the file ( because that would just split the files up again ) , but being able to read the file ( when you could n't before ) or knowing that another copy exists elsewhere can be very useful knowledge .
If you can " generate a file " that can be deduplicated , then by definition you already know about the date in that file .</tokentext>
<sentencetext> Note that you can't *change* the file (because that would just split the files up again), but being able to read the file (when you couldn't before) or knowing that another copy exists elsewhere can be very useful knowledge.
If you can "generate a file" that can be deduplicated, then by definition you already know about the date in that file.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645068</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644940</id>
	<title>redundant if saving large amounts of data to SAN</title>
	<author>Anonymous</author>
	<datestamp>1269709440000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>If you are storing that amount of data wouldn't you use a SAN and don't most already have data de-duplication technology?  I suppose this project will be pillaged by all of the backup appliance MFG's and those who build consumer grade NAS devices</p></htmltext>
<tokentext>If you are storing that amount of data would n't you use a SAN and do n't most already have data de-duplication technology ?
I suppose this project will be pillaged by all of the backup appliance MFG 's and those who build consumer grade NAS devices</tokentext>
<sentencetext>If you are storing that amount of data wouldn't you use a SAN and don't most already have data de-duplication technology?
I suppose this project will be pillaged by all of the backup appliance MFG's and those who build consumer grade NAS devices</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645010</id>
	<title>Re:Or get inline deduplication</title>
	<author>mrsteveman1</author>
	<datestamp>1269710160000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Plus you get the "real" ZFS, zones, and tightly integrated, bootable system rollbacks using zfs clones<nobr> <wbr></nobr>:)</p></htmltext>
<tokentext>Plus you get the " real " ZFS , zones , and tightly integrated , bootable system rollbacks using zfs clones : )</tokentext>
<sentencetext>Plus you get the "real" ZFS, zones, and tightly integrated, bootable system rollbacks using zfs clones :)</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644972</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646482</id>
	<title>Clueless OP</title>
	<author>Anonymous</author>
	<datestamp>1269780540000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p><b>but a new open source project, Opendedup, brings it to Linux and its hypervisors &mdash; KVM, Xen and VMware. The new deduplication-based file system called SDFS (GPL v2) </b> </p><p>Firstly, vmWare's hypervisor isn't based on Linux.  It is a proprietary kernel.  The Service Console in vmWare is a custom Linux based on Redhat, but the hypervisor itself is not Linux.</p><p>Secondly, vmWare uses their own proprietary VMFS filesystem that allows multiple physical servers access to the same SAN-attached LUN.  It can also use NFS for VM storage.  It does NOT support the use of SDFS.</p></htmltext>
<tokentext>but a new open source project , Opendedup , brings it to Linux and its hypervisors    KVM , Xen and VMware .
The new deduplication-based file system called SDFS ( GPL v2 ) Firstly , vmWare 's hypervisor is n't based on Linux .
It is a proprietary kernel .
The Service Console in vmWare is a custom Linux based on Redhat , but the hypervisor itself is not Linux.Secondly , vmWare uses their own proprietary VMFS filesystem that allows multiple physical servers access to the same SAN-attached LUN .
It can also use NFS for VM storage .
It does NOT support the use of SDFS .</tokentext>
<sentencetext>but a new open source project, Opendedup, brings it to Linux and its hypervisors — KVM, Xen and VMware.
The new deduplication-based file system called SDFS (GPL v2)  Firstly, vmWare's hypervisor isn't based on Linux.
It is a proprietary kernel.
The Service Console in vmWare is a custom Linux based on Redhat, but the hypervisor itself is not Linux.Secondly, vmWare uses their own proprietary VMFS filesystem that allows multiple physical servers access to the same SAN-attached LUN.
It can also use NFS for VM storage.
It does NOT support the use of SDFS.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646422</id>
	<title>Re:This is for hard disks</title>
	<author>DarkOx</author>
	<datestamp>1269779280000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p><div class="quote"><p>Does software like ESX and others (Xen etc) perform this in memory already for running VMs? I.e. if you have 2 Windows VMs it will only store one copy of the libs etc in the hosts memory ?</p> </div><p>I don't know about Xen but VMWare will do that.</p><p><div class="quote"><p>is there easy way to get multiple machines running 'as one' to pool resources for running a vm setup? Does openmosix do that?</p></div><p>I am not entirely certain what you mean by 'as one' to pool resources.  Openmosix more or less is a load distributor that dispatches jobs across hosts.  I am not sure what advantage you would gain by virtualizing the hosts other than granularity.</p></div>
	</htmltext>
<tokentext>Does software like ESX and others ( Xen etc ) perform this in memory already for running VMs ?
I.e. if you have 2 Windows VMs it will only store one copy of the libs etc in the hosts memory ?
I do n't know about Xen but VMWare will do that.is there easy way to get multiple machines running 'as one ' to pool resources for running a vm setup ?
Does openmosix do that ? I am not entirely certain what you mean by 'as one ' to pool resources .
Openmosix more or less is a load distributor that dispatches jobs across hosts .
I am not sure what advantage you would gain by virtualizing the hosts other than granularity .</tokentext>
<sentencetext>Does software like ESX and others (Xen etc) perform this in memory already for running VMs?
I.e. if you have 2 Windows VMs it will only store one copy of the libs etc in the hosts memory ?
I don't know about Xen but VMWare will do that.is there easy way to get multiple machines running 'as one' to pool resources for running a vm setup?
Does openmosix do that?I am not entirely certain what you mean by 'as one' to pool resources.
Openmosix more or less is a load distributor that dispatches jobs across hosts.
I am not sure what advantage you would gain by virtualizing the hosts other than granularity.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644786</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645502</id>
	<title>Re:This just gave me a good idea!</title>
	<author>Anonymous</author>
	<datestamp>1269716880000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>look into fdupes for finding/dealing w/ dupes.</p></htmltext>
<tokentext>look into fdupes for finding/dealing w/ dupes .</tokentext>
<sentencetext>look into fdupes for finding/dealing w/ dupes.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31647372</id>
	<title>Two open-source de-duplication systems</title>
	<author>Anonymous</author>
	<datestamp>1269792300000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>I've never used opendedup but I have been using lessfs http://www.lessfs.com/wordpress/ to store backups of virtual servers.</p><p>So now we have two choices for open source de-duplication!</p></htmltext>
<tokentext>I 've never used opendedup but I have been using lessfs http : //www.lessfs.com/wordpress/ to store backups of virtual servers.So now we have two choices for open source de-duplication !</tokentext>
<sentencetext>I've never used opendedup but I have been using lessfs http://www.lessfs.com/wordpress/ to store backups of virtual servers.So now we have two choices for open source de-duplication!</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646212</id>
	<title>Re:deduplication</title>
	<author>nacturation</author>
	<datestamp>1269774600000</datestamp>
	<modclass>Funny</modclass>
	<modscore>4</modscore>
	<htmltext><p><div class="quote"><p>What kind of lame recursive acronym is "deduplication"? I'm flummoxed in any attempt to decipher it.</p></div><p>Deduplication Eases Disk Utilization Purposefully Linking Information Common Among Trusted Independent Operating Nodes</p></div>
	</htmltext>
<tokentext>What kind of lame recursive acronym is " deduplication " ?
I 'm flummoxed in any attempt to decipher it.Deduplication Eases Disk Utilization Purposefully Linking Information Common Among Trusted Independent Operating Nodes</tokentext>
<sentencetext>What kind of lame recursive acronym is "deduplication"?
I'm flummoxed in any attempt to decipher it.Deduplication Eases Disk Utilization Purposefully Linking Information Common Among Trusted Independent Operating Nodes
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644898</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645160</id>
	<title>Patent 5,813,008</title>
	<author>Anonymous</author>
	<datestamp>1269712140000</datestamp>
	<modclass>Offtopic</modclass>
	<modscore>0</modscore>
	<htmltext><p>September 22, 1998.<br><a href="http://patft.uspto.gov/netacgi/nph-Parser?Sect2=PTO1&amp;Sect2=HITOFF&amp;p=1&amp;u=/netahtml/PTO/search-bool.html&amp;r=1&amp;f=G&amp;l=50&amp;d=PALL&amp;RefSrch=yes&amp;Query=PN/5813008" title="uspto.gov" rel="nofollow">Single instance storage of information</a> [uspto.gov]</p></htmltext>
<tokentext>September 22 , 1998.Single instance storage of information [ uspto.gov ]</tokentext>
<sentencetext>September 22, 1998.Single instance storage of information [uspto.gov]</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644772</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645554</id>
	<title>New use for an old algorithm?</title>
	<author>Squatting_Dog</author>
	<datestamp>1269718020000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Isn't this just an application of 'tokenizing' as it is used in compression of data streams? Build an index of unique(read non-repetitive) data segments and store the (smaller)index and resulting data?</p><p>This has been around for some time....hard to believe that this use has just come to light.</p></htmltext>
<tokentext>Is n't this just an application of 'tokenizing ' as it is used in compression of data streams ?
Build an index of unique ( read non-repetitive ) data segments and store the ( smaller ) index and resulting data ? This has been around for some time....hard to believe that this use has just come to light .</tokentext>
<sentencetext>Isn't this just an application of 'tokenizing' as it is used in compression of data streams?
Build an index of unique(read non-repetitive) data segments and store the (smaller)index and resulting data?This has been around for some time....hard to believe that this use has just come to light.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645068</id>
	<title>Re:A hypothetical question.</title>
	<author>tlhIngan</author>
	<datestamp>1269710700000</datestamp>
	<modclass>Interesting</modclass>
	<modscore>2</modscore>
	<htmltext><blockquote><div><p>I appreciate any deduplication solution for linux for sure, but isnt any deplucation creating a lot of shared ressources which could be possibly exploited for attacks (e.g. on the privacy of other users)?</p></div></blockquote><p>Most likely in the implementation itself, not the de-duplication process.</p><p>Let's say user A and B have some file in common. Without de-duplication, the file exists on both home directories. With de-duplication, one copy of the file exists for both users. Now, if there is an exploit such that you could find out if this has happened, then user A or B will know that the other has a copy of the same file. That knowledge could be useful.</p><p>Ditto on critical system files - if you could generate a file and have it match a protected system file, this might be useful to exploit the system. E.g.,<nobr> <wbr></nobr>/etc/shadow (which isn't normally world-readable). If you can find a way to tell the deduplication happens, you can get access to these critical files for other purposes.</p><p>Note that you can't *change* the file (because that would just split the files up again), but being able to read the file (when you couldn't before) or knowing that another copy exists elsewhere can be very useful knowledge. But the de-duplication mechanism must inadvertently reveal when this happens.</p></div>
	</htmltext>
<tokentext>I appreciate any deduplication solution for linux for sure , but isnt any deplucation creating a lot of shared ressources which could be possibly exploited for attacks ( e.g .
on the privacy of other users ) ? Most likely in the implementation itself , not the de-duplication process.Let 's say user A and B have some file in common .
Without de-duplication , the file exists on both home directories .
With de-duplication , one copy of the file exists for both users .
Now , if there is an exploit such that you could find out if this has happened , then user A or B will know that the other has a copy of the same file .
That knowledge could be useful.Ditto on critical system files - if you could generate a file and have it match a protected system file , this might be useful to exploit the system .
E.g. , /etc/shadow ( which is n't normally world-readable ) .
If you can find a way to tell the deduplication happens , you can get access to these critical files for other purposes.Note that you ca n't * change * the file ( because that would just split the files up again ) , but being able to read the file ( when you could n't before ) or knowing that another copy exists elsewhere can be very useful knowledge .
But the de-duplication mechanism must inadvertently reveal when this happens .</tokentext>
<sentencetext>I appreciate any deduplication solution for linux for sure, but isnt any deplucation creating a lot of shared ressources which could be possibly exploited for attacks (e.g.
on the privacy of other users)?Most likely in the implementation itself, not the de-duplication process.Let's say user A and B have some file in common.
Without de-duplication, the file exists on both home directories.
With de-duplication, one copy of the file exists for both users.
Now, if there is an exploit such that you could find out if this has happened, then user A or B will know that the other has a copy of the same file.
That knowledge could be useful.Ditto on critical system files - if you could generate a file and have it match a protected system file, this might be useful to exploit the system.
E.g., /etc/shadow (which isn't normally world-readable).
If you can find a way to tell the deduplication happens, you can get access to these critical files for other purposes.Note that you can't *change* the file (because that would just split the files up again), but being able to read the file (when you couldn't before) or knowing that another copy exists elsewhere can be very useful knowledge.
But the de-duplication mechanism must inadvertently reveal when this happens.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644794</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645128</id>
	<title>Re:How useful is this in realistic scenarios?</title>
	<author>Lorens</author>
	<datestamp>1269711480000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>A major use case is NAS for users. Think of all those multi-megabyte files, stored individually by thousands of users.</p><p>However, normally deduplication is block level, under the filesystem, invisible to the user. This is implemented by NetApp SANs, for instance. After having RTFA, OpenDedup seems to be file-level, running between the user and an underlying file system. I'm not sure it's a good idea.</p></htmltext>
<tokentext>A major use case is NAS for users .
Think of all those multi-megabyte files , stored individually by thousands of users.However , normally deduplication is block level , under the filesystem , invisible to the user .
This is implemented by NetApp SANs , for instance .
After having RTFA , OpenDedup seems to be file-level , running between the user and an underlying file system .
I 'm not sure it 's a good idea .</tokentext>
<sentencetext>A major use case is NAS for users.
Think of all those multi-megabyte files, stored individually by thousands of users.However, normally deduplication is block level, under the filesystem, invisible to the user.
This is implemented by NetApp SANs, for instance.
After having RTFA, OpenDedup seems to be file-level, running between the user and an underlying file system.
I'm not sure it's a good idea.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645012</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646562</id>
	<title>Re:Yea, I RTFA, but...</title>
	<author>Anonymous</author>
	<datestamp>1269782460000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Find and sha are good for filtering out non-dupes, and the odds are very good that they're not collisions, but why take the chance?  You still have to read every file in once to hash it, what's the big deal about doing a cmp on the fraction of the files that hash-match?  It can't make your run take more than twice as long, and if you really did have *every* file matching, you'd want that anyway.</p><p>Also, such a script as you describe exists in the linux world, FSlint, It's definitely something to run on your staging directory before creating a dvd image (if you're using a format that understands hard links, that is.  I forget if ISO9660 does.)</p></htmltext>
<tokentext>Find and sha are good for filtering out non-dupes , and the odds are very good that they 're not collisions , but why take the chance ?
You still have to read every file in once to hash it , what 's the big deal about doing a cmp on the fraction of the files that hash-match ?
It ca n't make your run take more than twice as long , and if you really did have * every * file matching , you 'd want that anyway.Also , such a script as you describe exists in the linux world , FSlint , It 's definitely something to run on your staging directory before creating a dvd image ( if you 're using a format that understands hard links , that is .
I forget if ISO9660 does .
)</tokentext>
<sentencetext>Find and sha are good for filtering out non-dupes, and the odds are very good that they're not collisions, but why take the chance?
You still have to read every file in once to hash it, what's the big deal about doing a cmp on the fraction of the files that hash-match?
It can't make your run take more than twice as long, and if you really did have *every* file matching, you'd want that anyway.Also, such a script as you describe exists in the linux world, FSlint, It's definitely something to run on your staging directory before creating a dvd image (if you're using a format that understands hard links, that is.
I forget if ISO9660 does.
)</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645058</parent>
</comment>
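The point in the comment above (hash first as a cheap filter, then compare bytes before trusting a match) can be sketched in a few lines of Python. This is an illustration only, not part of the thread; the function names and the idea of wrapping it as a helper are assumptions.

```python
# Sketch (illustrative names): treat two files as duplicates only if their
# digests match AND a full byte-for-byte comparison (the "cmp" step) agrees.
import filecmp
import hashlib
from pathlib import Path

def sha1sum(path: Path, bufsize: int = 1 << 20) -> str:
    """SHA-1 of a file, read in chunks so large files are not loaded into memory."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def confirmed_duplicate(a: Path, b: Path) -> bool:
    """Cheap hash filter first, then an exact comparison of the contents."""
    if sha1sum(a) != sha1sum(b):
        return False
    return filecmp.cmp(a, b, shallow=False)
```

Because the exact comparison only runs on files that already hash-match, the extra cost is limited to the small fraction of candidates the comment mentions.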
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31683992</id>
	<title>Re:deduplication</title>
	<author>Dr. Zed</author>
	<datestamp>1270067160000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p><a href="http://dictionary.reference.com/browse/de-" title="reference.com" rel="nofollow">de-</a> [reference.com] <a href="http://dictionary.reference.com/browse/duplication" title="reference.com" rel="nofollow">duplication</a> [reference.com]</p></htmltext>
<tokentext>de- [ reference.com ] duplication [ reference.com ]</tokentext>
<sentencetext>de- [reference.com] duplication [reference.com]</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644898</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645058</id>
	<title>Re:Yea, I RTFA, but...</title>
	<author>dlgeek</author>
	<datestamp>1269710580000</datestamp>
	<modclass>Informative</modclass>
	<modscore>3</modscore>
	<htmltext>You could easily write a script to do that using find, sha1sum or md5sum, sort and link. It would probably only take about 5-10 minutes to write but you most likely don't want to do that. When you modify one item in a hard linked pair, the other one is edited as well, whereas a copy doesn't do this. Unless you are sure your data is immutable, this will lead to problems down the road.<br> <br>
Deduplication systems pay attention to this and maintain independent indexes to do copy-on-write and the like to preserve the independence of each reference.</htmltext>
<tokentext>You could easily write a script to do that using find , sha1sum or md5sum , sort and link .
It would probably only take about 5-10 minutes to write but you most likely do n't want to do that .
When you modify one item in a hard linked pair , the other one is edited as well , whereas a copy does n't do this .
Unless you are sure your data is immutable , this will lead to problems down the road .
Deduplication systems pay attention to this and maintain independent indexes to do copy-on-write and the like to preserve the independence of each reference .</tokentext>
<sentencetext>You could easily write a script to do that using find, sha1sum or md5sum, sort and link.
It would probably only take about 5-10 minutes to write but you most likely don't want to do that.
When you modify one item in a hard linked pair, the other one is edited as well, whereas a copy doesn't do this.
Unless you are sure your data is immutable, this will lead to problems down the road.
Deduplication systems pay attention to this and maintain independent indexes to do copy-on-write and the like to preserve the independence of each reference.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644966</parent>
</comment>
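For illustration, here is a rough Python sketch of the kind of find/sha1sum/sort/link script described in the comment above. It is not from the thread: the root path is a placeholder, the helper names are invented, and, per the comment's warning, linking like this is only safe for data you treat as immutable.

```python
# Rough sketch: recover space by hard-linking byte-identical files under one
# directory tree. Illustrative only; run it solely on data you never modify
# in place, since all linked names share the same bytes afterwards.
import hashlib
import os
from collections import defaultdict
from pathlib import Path

def sha256sum(path: Path, bufsize: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def hardlink_duplicates(root: Path) -> None:
    # Group regular files by size first, so only potential matches get hashed.
    by_size = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file() and not p.is_symlink():
            by_size[p.stat().st_size].append(p)

    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[sha256sum(p)].append(p)
        for group in by_hash.values():
            keep, *dupes = group
            for dupe in dupes:
                if dupe.stat().st_ino == keep.stat().st_ino:
                    continue  # already hard-linked to the kept copy
                tmp = dupe.parent / (dupe.name + ".dedup-tmp")
                os.link(keep, tmp)     # make the new link first...
                os.replace(tmp, dupe)  # ...then atomically swap it into place

if __name__ == "__main__":
    hardlink_duplicates(Path("/path/to/staging"))  # placeholder path
```

A more careful version would also byte-compare each pair before linking (as in the earlier sketch) and skip files on different filesystems, since hard links cannot span them.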
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31654670</id>
	<title>Re:This is for hard disks</title>
	<author>dchaffey</author>
	<datestamp>1269858600000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>VMware's technology you are referring to is called Transparent Page Sharing, if you want to look it up.<br>To my knowledge they're the only major Hypervisor to have this for Windows VMs, and it is a huge contributor to their VM density leadership; I'm not sure if other Linux-based Hypervisors implement something for Linux VMs</htmltext>
<tokentext>VMware 's technology you are referring to is called Transparent Page Sharing , if you want to look it up.To my knowledge they 're the only major Hypervisor to have this for Windows VMs , and it is a huge contributor to their VM density leadership ; I 'm not sure if other Linux-based Hypervisors implement something for Linux VMs</tokentext>
<sentencetext>VMware's technology you are referring to is called Transparent Page Sharing, if you want to look it up.To my knowledge they're the only major Hypervisor to have this for Windows VMs, and it is a huge contributor to their VM density leadership; I'm not sure if other Linux-based Hypervisors implement something for Linux VMs</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644786</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31676942</id>
	<title>Re:Yea, I RTFA, but...</title>
	<author>rawler</author>
	<datestamp>1269941520000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>There is one danger with hardlinks that should not be forgotten. Hardlinks are not copy-on-write (and AFAIK, can't be made COW?), which means that if files get linked in the de-duplication-process, updates to either file will contaminate the other.</p><p>A practical example where this WOULD be a definite problem could be a double-buffered application, that for consistency always keeps a "backup" of it's config. During idle, this file could be identical to the "live" file, and hard-linking them could completely destroy the consistency feature of the app.</p><p>Another scenario would be having a file on your desktop of some family photo you want to mess around with, also in archive. Hardlink them, and editing the one on the desktop will overwrite the one in the archive. (Under some conditions, I.E. no move-operations done by the editing app)</p></htmltext>
<tokentext>There is one danger with hardlinks that should not be forgotten .
Hardlinks are not copy-on-write ( and AFAIK , ca n't be made COW ?
) , which means that if files get linked in the de-duplication-process , updates to either file will contaminate the other.A practical example where this WOULD be a definite problem could be a double-buffered application , that for consistency always keeps a " backup " of it 's config .
During idle , this file could be identical to the " live " file , and hard-linking them could completely destroy the consistency feature of the app.Another scenario would be having a file on your desktop of some family photo you want to mess around with , also in archive .
Hardlink them , and editing the one on the desktop will overwrite the one in the archive .
( Under some conditions , I.E .
no move-operations done by the editing app )</tokentext>
<sentencetext>There is one danger with hardlinks that should not be forgotten.
Hardlinks are not copy-on-write (and AFAIK, can't be made COW?
), which means that if files get linked in the de-duplication-process, updates to either file will contaminate the other.A practical example where this WOULD be a definite problem could be a double-buffered application, that for consistency always keeps a "backup" of it's config.
During idle, this file could be identical to the "live" file, and hard-linking them could completely destroy the consistency feature of the app.Another scenario would be having a file on your desktop of some family photo you want to mess around with, also in archive.
Hardlink them, and editing the one on the desktop will overwrite the one in the archive.
(Under some conditions, I.E.
no move-operations done by the editing app)</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644966</parent>
</comment>
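To make the contamination risk described above concrete, here is a small self-contained Python demonstration; it is not from the thread and the file names are invented. An in-place write through one hard-linked name is visible through the other name, while replacing the file via write-and-rename (as many editors and the double-buffering pattern do) breaks the link and leaves the other name untouched.

```python
# Demonstration of hard-link behaviour, with made-up file names.
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as d:
    live = Path(d) / "config.live"
    backup = Path(d) / "config.backup"
    live.write_text("setting = 1\n")
    os.link(live, backup)          # "de-duplicate" the two identical copies

    # In-place edit: both names point at the same inode, so both change.
    with live.open("a") as f:
        f.write("setting = 2\n")
    assert backup.read_text() == "setting = 1\nsetting = 2\n"

    # Write-and-rename replacement: "live" now refers to a fresh inode,
    # and the old contents survive only under the "backup" name.
    tmp = Path(d) / "config.tmp"
    tmp.write_text("setting = 3\n")
    os.replace(tmp, live)
    assert live.read_text() == "setting = 3\n"
    assert backup.read_text() == "setting = 1\nsetting = 2\n"
```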
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644812</id>
	<title>Hasn't this been posted before?</title>
	<author>Anonymous</author>
	<datestamp>1269708000000</datestamp>
	<modclass>Funny</modclass>
	<modscore>5</modscore>
	<htmltext>Just wondering...</htmltext>
<tokentext>Just wondering.. .</tokentext>
<sentencetext>Just wondering...</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644966</id>
	<title>Yea, I RTFA, but...</title>
	<author>mrsteveman1</author>
	<datestamp>1269709800000</datestamp>
	<modclass>Interesting</modclass>
	<modscore>2</modscore>
	<htmltext><p>......from what i can tell, this is NOT a way to deduplicate existing filesystems or even layer it on top of existing data, but a new filesystem operating perhaps like eCryptfs, storing backend data on an existing filesystem in some FS-specific format.</p><p>So, having said that, does anyone know if there is a good way to resolve EXISTING duplicate files on Linux using hard links? For every identical pair found, a+b, b is deleted and instead hardlinked to a? I know there are plenty of duplicate file finders (fdupes, some windows programs, etc), but they're all focused on deleting things rather than simply recovering space using hardlinks.</p></htmltext>
<tokentext>......from what i can tell , this is NOT a way to deduplicate existing filesystems or even layer it on top of existing data , but a new filesystem operating perhaps like eCryptfs , storing backend data on an existing filesystem in some FS-specific format.So , having said that , does anyone know if there is a good way to resolve EXISTING duplicate files on Linux using hard links ?
For every identical pair found , a + b , b is deleted and instead hardlinked to a ?
I know there are plenty of duplicate file finders ( fdupes , some windows programs , etc ) , but they 're all focused on deleting things rather than simply recovering space using hardlinks .</tokentext>
<sentencetext>......from what i can tell, this is NOT a way to deduplicate existing filesystems or even layer it on top of existing data, but a new filesystem operating perhaps like eCryptfs, storing backend data on an existing filesystem in some FS-specific format.So, having said that, does anyone know if there is a good way to resolve EXISTING duplicate files on Linux using hard links?
For every identical pair found, a+b, b is deleted and instead hardlinked to a?
I know there are plenty of duplicate file finders (fdupes, some windows programs, etc), but they're all focused on deleting things rather than simply recovering space using hardlinks.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645392</id>
	<title>Re:This just gave me a good idea!</title>
	<author>CAIMLAS</author>
	<datestamp>1269715020000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Two things to look into:</p><p><a href="http://www.mikerubel.org/computers/rsync_snapshots/" title="mikerubel.org">rsync snapshots</a> [mikerubel.org]<br><a href="http://rsnapshot.org/" title="rsnapshot.org">rsnapshot, for a better rsync snapshot</a> [rsnapshot.org]</p></htmltext>
<tokentext>Two things to look into : rsync snapshots [ mikerubel.org ] rsnapshot , for a better rsync snapshot [ rsnapshot.org ]</tokentext>
<sentencetext>Two things to look into:rsync snapshots [mikerubel.org]rsnapshot, for a better rsync snapshot [rsnapshot.org]</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646894</id>
	<title>Re:How useful is this in realistic scenarios?</title>
	<author>marvin2k</author>
	<datestamp>1269787620000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>Who does 2GB OS installs especially in a 200+ VM environment? That's insane.
I agree that deduplication is a nice addition to the virtual tool-set but it only seems to really add a benefit to very specific environments. If I have optimized OS installs and the VMs run completely different data-sets from different organizations then the cost (both money and system resources) of deduplication seems to outweigh the benefit of saving a few G especially in a world where HDs come in 2TB sizes.</htmltext>
<tokentext>Who does 2GB OS installs especially in a 200 + VM environment ?
That 's insane .
I agree that deduplication is a nice addition to the virtual tool-set but it only seems to really add a benefit to very specific environments .
If I have optimized OS installs and the VMs run completely different data-sets from different organizations then the cost ( both money and system resources ) of deduplication seems to outweigh the benefit of saving a few G especially in a world where HDs come in 2TB sizes .</tokentext>
<sentencetext>Who does 2GB OS installs especially in a 200+ VM environment?
That's insane.
I agree that deduplication is a nice addition to the virtual tool-set but it only seems to really add a benefit to very specific environments.
If I have optimized OS installs and the VMs run completely different data-sets from different organizations then the cost (both money and system resources) of deduplication seems to outweigh the benefit of saving a few G especially in a world where HDs come in 2TB sizes.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645736</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644904</id>
	<title>Excellent!</title>
	<author>Anonymous</author>
	<datestamp>1269708900000</datestamp>
	<modclass>Flamebait</modclass>
	<modscore>-1</modscore>
	<htmltext><p>Once again the laziness of software types requires the ingenuity of massive hardware to compensate. Keep it up programmers! Soon you can have a retarded clam programming stuff that runs on a 6500THz processor and you'll STILL blame the <i>computer</i> for being slow!</p></htmltext>
<tokentext>Once again the laziness of software types requires the ingenuity of massive hardware to compensate .
Keep it up programmers !
Soon you can have a retarded clam programming stuff that runs on a 6500THz processor and you 'll STILL blame the computer for being slow !</tokentext>
<sentencetext>Once again the laziness of software types requires the ingenuity of massive hardware to compensate.
Keep it up programmers!
Soon you can have a retarded clam programming stuff that runs on a 6500THz processor and you'll STILL blame the computer for being slow!</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645552</id>
	<title>apologies to LL Cool J</title>
	<author>Anonymous</author>
	<datestamp>1269718020000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>stoolpigeon asks: Are you taking my deduplication investigation seriously or are you disrespecting my deduplication investigation?</p><p>-- minor misquote of LL Cool J speaking to Robin Wright in the movie Toys</p></htmltext>
<tokentext>stoolpigeon asks : Are you taking my deduplication investigation seriously or are you disrespecting my deduplication investigation ? -- minor misquote of LL Cool J speaking to Robin Wright in the movie Toys</tokentext>
<sentencetext>stoolpigeon asks: Are you taking my deduplication investigation seriously or are you disrespecting my deduplication investigation?-- minor misquote of LL Cool J speaking to Robin Wright in the movie Toys</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644772</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31665938</id>
	<title>Re:Yea, I RTFA, but...</title>
	<author>jc42</author>
	<datestamp>1269878340000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p><i>So, having said that, does anyone know if there is a good way to resolve EXISTING duplicate files on Linux using hard links?</i></p><p>Yeah; I was a bit disappointed to find that the "dedupe" software talked about here doesn't seem to do that.  The intent here seems to be to handle editing one of the "dupes" by splitting it apart into a new file, so that the others don't change.  This is pretty much the opposite of what I find that I usually want.</p><p>Actually, I've written a couple of programs (in different langauges) to do linking of identical files for some time.  One is about 25 years old, and arose in a project where we were having a lot of problems with software that "broke" hard links when changes were made to a file.  This was shooting down our use of multiply-linked files to classify files in multiple ways by linking them into several appropriate directories.  So we worked on software to hunt down the problems and fix them.  Since we conceptualized the problem as "broken links", we called our operation "relinking".  We coded up several algorithms to do the job, and pitted them against each other.  We were a bit bemused to find that there wasn't really that much difference between them.<nobr> <wbr></nobr>;-)  I kept a couple of them.</p><p>Anyway, I see that a few others have written similar tools.  The problem finding them seems to be the different terminology that different developers have used.  The "merge" term makes sense if you think about some other reasons you might want to do it.</p><p>Does anyone have any knowledge of other terms that might be used to google for such software?  It might be interesting to find out how many times people have reinvented this particular wheel under different names.</p></htmltext>
<tokentext>So , having said that , does anyone know if there is a good way to resolve EXISTING duplicate files on Linux using hard links ? Yeah ; I was a bit disappointed to find that the " dedupe " software talked about here does n't seem to do that .
The intent here seems to be to handle editing one of the " dupes " by splitting it apart into a new file , so that the others do n't change .
This is pretty much the opposite of what I find that I usually want.Actually , I 've written a couple of programs ( in different langauges ) to do linking of identical files for some time .
One is about 25 years old , and arose in a project where we were having a lot of problems with software that " broke " hard links when changes were made to a file .
This was shooting down our use of multiply-linked files to classify files in multiple ways by linking them into several appropriate directories .
So we worked on software to hunt down the problems and fix them .
Since we conceptualized the problem as " broken links " , we called our operation " relinking " .
We coded up several algorithms to do the job , and pitted them against each other .
We were a bit bemused to find that there was n't really that much difference between them .
; - ) I kept a couple of them.Anyway , I see that a few others have written similar tools .
The problem finding them seems to be the different terminology that different developers have used .
The " merge " term makes sense if you think about some other reasons you might want to do it.Does anyone have any knowledge of other terms that might be used to google for such software ?
It might be interesting to find out how many times people have reinvented this particular wheel under different names .</tokentext>
<sentencetext>So, having said that, does anyone know if there is a good way to resolve EXISTING duplicate files on Linux using hard links?Yeah; I was a bit disappointed to find that the "dedupe" software talked about here doesn't seem to do that.
The intent here seems to be to handle editing one of the "dupes" by splitting it apart into a new file, so that the others don't change.
This is pretty much the opposite of what I find that I usually want.Actually, I've written a couple of programs (in different langauges) to do linking of identical files for some time.
One is about 25 years old, and arose in a project where we were having a lot of problems with software that "broke" hard links when changes were made to a file.
This was shooting down our use of multiply-linked files to classify files in multiple ways by linking them into several appropriate directories.
So we worked on software to hunt down the problems and fix them.
Since we conceptualized the problem as "broken links", we called our operation "relinking".
We coded up several algorithms to do the job, and pitted them against each other.
We were a bit bemused to find that there wasn't really that much difference between them.
;-)  I kept a couple of them.Anyway, I see that a few others have written similar tools.
The problem finding them seems to be the different terminology that different developers have used.
The "merge" term makes sense if you think about some other reasons you might want to do it.Does anyone have any knowledge of other terms that might be used to google for such software?
It might be interesting to find out how many times people have reinvented this particular wheel under different names.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644966</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644918</id>
	<title>Let's get down to brass tacks.</title>
	<author>jtownatpunk.net</author>
	<datestamp>1269709080000</datestamp>
	<modclass>Redundant</modclass>
	<modscore>0</modscore>
	<htmltext><p>Does this mean I'll finally be able to store my entire porn collection on a single volume?</p></htmltext>
<tokentext>Does this mean I 'll finally be able to store my entire porn collection on a single volume ?</tokentext>
<sentencetext>Does this mean I'll finally be able to store my entire porn collection on a single volume?</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645664</id>
	<title>Re:redundant if saving large amounts of data to SA</title>
	<author>afidel</author>
	<datestamp>1269719700000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>Not every SAN has dedupe, for instance my HP EVA doesn't. Also many of the lowend Netapp boxes have too anemic processors to be able to do dedupe. Most of the lowend iSCSI boxes also lack dedupe.</htmltext>
<tokentext>Not every SAN has dedupe , for instance my HP EVA does n't .
Also many of the lowend Netapp boxes have too anemic processors to be able to do dedupe .
Most of the lowend iSCSI boxes also lack dedupe .</tokentext>
<sentencetext>Not every SAN has dedupe, for instance my HP EVA doesn't.
Also many of the lowend Netapp boxes have too anemic processors to be able to do dedupe.
Most of the lowend iSCSI boxes also lack dedupe.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644940</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31647044</id>
	<title>Re:Yea, I RTFA, but...</title>
	<author>Ant P.</author>
	<datestamp>1269789480000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p><a href="http://code.google.com/p/hardlinkpy/" title="google.com">http://code.google.com/p/hardlinkpy/</a> [google.com]</p></htmltext>
<tokentext>http : //code.google.com/p/hardlinkpy/ [ google.com ]</tokentext>
<sentencetext>http://code.google.com/p/hardlinkpy/ [google.com]</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644966</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646340</id>
	<title>Re:How useful is this in realistic scenarios?</title>
	<author>jabuzz</author>
	<datestamp>1269777600000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Yeah, but the problem there is the cost. We run on 17GB boot disks, so your 200VM's would require under 4TB of disk to store. I am sorry but 4TB of storage is peanuts and I can do that easily with a low end DS3400.</p><p>Now the million dollar question to ask is how much does your dedupe solution cost? The reason being any dedupe that is supported against a virtualization solution we have looked at costs more than just buying the frigging disk. One then has to question the point of bothering with the extra layer of complexity.</p><p>The level of dedupe in bulk storage is likely to be low as well, besides which the cost of dedupe on a couple hundred TB of disks is ridiculous.  Even for backup one has to wonder as well, tape is again really cheap, and dedupe for hundreds of TB is bloody expensive.</p></htmltext>
<tokentext>Yeah , but the problem there is the cost .
We run on 17GB boot disks , so your 200VM 's would require under 4TB of disk to store .
I am sorry but 4TB of storage is peanuts and I can do that easily with a low end DS3400.Now the million dollar question to ask is how much does your dedupe solution cost ?
The reason being any dedupe that is supported against a virtualization solution we have looked at costs more than just buying the frigging disk .
One then has to question the point of bothering with the extra layer of complexity.The level of dedupe in bulk storage is likely to be low as well , besides which the cost of dedupe on a couple hundred TB of disks is ridiculous .
Even for backup one has to wonder as well , tape is again really cheap , and dedupe for hundreds of TB is bloody expensive .</tokentext>
<sentencetext>Yeah, but the problem there is the cost.
We run on 17GB boot disks, so your 200VM's would require under 4TB of disk to store.
I am sorry but 4TB of storage is peanuts and I can do that easily with a low end DS3400.Now the million dollar question to ask is how much does your dedupe solution cost?
The reason being any dedupe that is supported against a virtualization solution we have looked at costs more than just buying the frigging disk.
One then has to question the point of bothering with the extra layer of complexity.The level of dedupe in bulk storage is likely to be low as well, besides which the cost of dedupe on a couple hundred TB of disks is ridiculous.
Even for backup one has to wonder as well, tape is again really cheap, and dedupe for hundreds of TB is bloody expensive.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645736</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31647708</id>
	<title>ZFS deduplication</title>
	<author>Anonymous</author>
	<datestamp>1269795060000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>'nuff said.</p></htmltext>
<tokentext>'nuff said .</tokentext>
<sentencetext>'nuff said.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646354</id>
	<title>Re:This just gave me a good idea!</title>
	<author>bokmann</author>
	<datestamp>1269777960000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>For more good ideas like this, watch this screencast from pragmatic TV.</p><p><a href="http://bit.ly/Pk3z3" title="bit.ly">http://bit.ly/Pk3z3</a> [bit.ly]</p><p>Jim Weirich explains how git (the version control tool) works from the ground up, and in doing so, builds a hypothetical system that sounds like what you are trying to do.</p></htmltext>
<tokentext>For more good ideas like this , watch this screencast from pragmatic TV.http : //bit.ly/Pk3z3 [ bit.ly ] Jim Weirich explains how git ( the version control tool ) works from the ground up , and in doing so , builds a hypothetical system that sounds like what you are trying to do .</tokentext>
<sentencetext>For more good ideas like this, watch this screencast from pragmatic TV.http://bit.ly/Pk3z3 [bit.ly]Jim Weirich explains how git (the version control tool) works from the ground up, and in doing so, builds a hypothetical system that sounds like what you are trying to do.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646280</id>
	<title>See Also LESSFS</title>
	<author>sharper56</author>
	<datestamp>1269776220000</datestamp>
	<modclass>Interesting</modclass>
	<modscore>4</modscore>
	<htmltext><p>Another nice OpenSource FS De-Dup project to look into is LESSFS.</p><p>Block-level de-dup and good speed. Also offers per block encryption and compression.</p><p>I'm using it to back up VMs. 2TB of raw VMs plus 60 days of changes store down to 300GB.  Write to de-dup FS is &gt; 50MB/s.</p></htmltext>
<tokentext>Another nice OpenSource FS De-Dup project to look into is LESSFS.Block-level de-dup and good speed .
Also offers per block encryption and compression.I 'm using it to back up VMs .
2TB of raw VMs plus 60 days of changes store down to 300GB .
Write to de-dup FS is &gt; 50MB/s .</tokentext>
<sentencetext>Another nice OpenSource FS De-Dup project to look into is LESSFS.Block-level de-dup and good speed.
Also offers per block encryption and compression.I'm using it to back up VMs.
2TB of raw VMs plus 60 days of changes store down to 300GB.
Write to de-dup FS is &gt; 50MB/s.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644772</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646234</id>
	<title>Re:This just gave me a good idea!</title>
	<author>nacturation</author>
	<datestamp>1269775200000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Deduplicated backups: <a href="http://backuppc.sourceforge.net/info.html" title="sourceforge.net">http://backuppc.sourceforge.net/info.html</a> [sourceforge.net]</p></htmltext>
<tokentext>Deduplicated backups : http : //backuppc.sourceforge.net/info.html [ sourceforge.net ]</tokentext>
<sentencetext>Deduplicated backups: http://backuppc.sourceforge.net/info.html [sourceforge.net]</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31647252</id>
	<title>Re: Let's get down to brass tacks.</title>
	<author>Anonymous</author>
	<datestamp>1269791280000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>I can give you hard figures here. I've been developing a deduplication algorithm and using my porn collection as test data. Out of 3600 images, I can confirm that I have two copies of one image. I can also confirm that there is very little variation in JPEG quantization tables.</p></htmltext>
<tokentext>I can give you hard figures here .
I 've been developing a deduplication algorithm and using my porn collection as test data .
Out of 3600 images , I can confirm that I have two copies of one image .
I can also confirm that there is very little variation in JPEG quantization tables .</tokentext>
<sentencetext>I can give you hard figures here.
I've been developing a deduplication algorithm and using my porn collection as test data.
Out of 3600 images, I can confirm that I have two copies of one image.
I can also confirm that there is very little variation in JPEG quantization tables.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645002</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645408</id>
	<title>See also: LessFS</title>
	<author>kb1</author>
	<datestamp>1269715140000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>The LessFS project also deserves mention: <a href="http://www.lessfs.com/" title="lessfs.com" rel="nofollow">http://www.lessfs.com/</a> [lessfs.com] . Just think of the effect of combining a deduplication system with an iSCSI shared virtual tape library like <a href="http://sites.google.com/site/linuxvtl2/" title="google.com" rel="nofollow">http://sites.google.com/site/linuxvtl2/</a> [google.com]</htmltext>
<tokentext>The LessFS project also deserves mention : http : //www.lessfs.com/ [ lessfs.com ] .
Just think of the effect of combining a deduplication system with an iSCSI shared virtual tape library like http : //sites.google.com/site/linuxvtl2/ [ google.com ]</tokentext>
<sentencetext>The LessFS project also deserves mention: http://www.lessfs.com/ [lessfs.com] .
Just think of the effect of combining a deduplication system with an iSCSI shared virtual tape library like http://sites.google.com/site/linuxvtl2/ [google.com]</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31647878</id>
	<title>This has saved me loads of time</title>
	<author>MarkH</author>
	<datestamp>1269796380000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>I had 3 backups of home data of about 300GB each.</p><p>Each one was almost but not quite the same due to some rather poor backup policies on my part.</p><p>I was able to dedup each backup to get them small enough to combine, and then dedup the combo.</p><p>Left with one pure 150GB combo. Rsync is amazing.</p></htmltext>
<tokenext>I had 3 backups of home data of about 300gbytes each.Each one was almost but not quite the same due to some rather poor backup policies on mypart.I was able to dedup per backup to get them small enough to combine and dedup the combo.Left with one pure 150gbytes combo .
Rsync is amazing</tokentext>
<sentencetext>I had 3 backups of home data of about 300gbytes each.Each one was almost but not quite the same due to some rather poor backup policies on mypart.I was able to dedup per backup to get them small enough to combine and dedup the combo.Left with one pure 150gbytes combo.
Rsync is amazing</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644786</id>
	<title>This is for hard disks</title>
	<author>ZERO1ZERO</author>
	<datestamp>1269707760000</datestamp>
	<modclass>Interestin</modclass>
	<modscore>2</modscore>
	<htmltext>Does software like ESX and others (Xen etc.) perform this in memory already for running VMs? I.e. if you have 2 Windows VMs, will it only store one copy of the libs etc. in the host's memory?

<p>
Also, is there an easy way to get multiple machines running 'as one' to pool resources for running a VM setup? Does OpenMosix do that?</p></htmltext>
<tokenext>Does software like ESX and others ( Xen etc ) perform this in memory already for running VMs ?
I.e. if you have 2 Windows VMs it will only store one copy of the libs etc in the hosts memory ?
Also , is there easy way to get multiple machines running 'as one ' to pool resources for running a vm setup ?
Does openmosix do that ?</tokentext>
<sentencetext>Does software like ESX and others (Xen etc) perform this in memory already for running VMs?
I.e. if you have 2 Windows VMs it will only store one copy of the libs etc in the hosts memory ?
Also, is there easy way to get multiple machines running 'as one' to pool resources for running a vm setup?
Does openmosix do that?</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31655446</id>
	<title>It's called "rsnapshot"</title>
	<author>Walles</author>
	<datestamp>1269868440000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>There's a program that automates what you describe, and it's called "RSnapshot":<br><a href="http://rsnapshot.org/" title="rsnapshot.org">http://rsnapshot.org/</a> [rsnapshot.org]</p><p>If you have a system that isn't always up you want something like this to launch it:<br><a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=27;filename=run-rsnapshot;att=1;bug=523923" title="debian.org">http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=27;filename=run-rsnapshot;att=1;bug=523923</a> [debian.org]</p></htmltext>
<tokenext>There 's a program that automates what you describe , and it 's called " RSnapshot " : http : //rsnapshot.org/ [ rsnapshot.org ] If you have a system that is n't always up you want something like this to launch it : http : //bugs.debian.org/cgi-bin/bugreport.cgi ? msg = 27 ; filename = run-rsnapshot ; att = 1 ; bug = 523923 [ debian.org ]</tokentext>
<sentencetext>There's a program that automates what you describe, and it's called "RSnapshot":http://rsnapshot.org/ [rsnapshot.org]If you have a system that isn't always up you want something like this to launch it:http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=27;filename=run-rsnapshot;att=1;bug=523923 [debian.org]</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31648310</id>
	<title>Re:How useful is this in realistic scenarios?</title>
	<author>Eil</author>
	<datestamp>1269799560000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><blockquote><div><p>My personal experience with it will begin in a few months, when we get our new Celerra installed<nobr> <wbr></nobr>:-)</p></div></blockquote><p>My condolences to you, sir.</p>
	</htmltext>
<tokenext>My personal experience with it will begin in a few months , when we get our new Celerra installed : - ) My condolences to you , sir .</tokentext>
<sentencetext>My personal experience with it will begin in a few months, when we get our new Celerra installed :-)My condolences to you, sir.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645350</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645098</id>
	<title>Re:Yea, I RTFA, but...</title>
	<author>symbolset</author>
	<datestamp>1269711060000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>There are security problems with this.  The duplicate files might have different metadata - for example, access privileges.
</p><p>For real (block level) deduplication, try lessfs or zfs.</p></htmltext>
<tokenext>There are security problems with this .
The duplicate files might have different metadata - for example , access privileges .
For real ( block level ) deduplication , try lessfs or zfs .</tokentext>
<sentencetext>There are security problems with this.
The duplicate files might have different metadata - for example, access privileges.
For real (block level) deduplication, try lessfs or zfs.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644966</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644898</id>
	<title>deduplication</title>
	<author>hduff</author>
	<datestamp>1269708840000</datestamp>
	<modclass>Offtopic</modclass>
	<modscore>0</modscore>
	<htmltext><p>What kind of lame recursive acronym is "deduplication"?</p><p>I'm flummoxed in any attempt to decipher it.</p></htmltext>
<tokenext>What kind of lame recursive acronym is " deduplication " ? I 'm flummoxed in any attempt to decipher it .</tokentext>
<sentencetext>What kind of lame recursive acronym is "deduplication"?I'm flummoxed in any attempt to decipher it.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646538</id>
	<title>Re:This just gave me a good idea!</title>
	<author>TheRaven64</author>
	<datestamp>1269782040000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>You might like to take a look at <a href="http://www.peereboom.us/epitome/man/epitome.1.html" title="peereboom.us">Epitome</a> [peereboom.us], which supports CAS, DEDUP, SIS and remote backup.</htmltext>
<tokenext>You might like to take a look at Epitome [ peereboom.us ] , which supports CAS , DEDUP , SIS and remote backup .</tokentext>
<sentencetext>You might like to take a look at Epitome [peereboom.us], which supports CAS, DEDUP, SIS and remote backup.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644902</id>
	<title>Re:This is for hard disks</title>
	<author>TooMuchToDo</author>
	<datestamp>1269708900000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>Both VMware and KVM can do this. Not sure about Xen. Google "memory deduplication $VM_TECH"</htmltext>
<tokenext>Both VMware and KVM can do this .
Not sure about Xen .
Google " memory deduplication $ VM \ _TECH "</tokentext>
<sentencetext>Both VMware and KVM can do this.
Not sure about Xen.
Google "memory deduplication $VM\_TECH"</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644786</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31649118</id>
	<title>Re:Confusing summary</title>
	<author>s1acker</author>
	<datestamp>1269804960000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>When you create an SDFS volume, you specify an arbitrary volume size - 8EB must be the maximum size you can specify.  I'm guessing the current implementation is only able to deal with 32TB worth of deduplicated chunks - so if you have 8EB of data on which you can get a 250000:1 deduplication ratio, then you could fill up the 8EB volume.</htmltext>
<tokenext>when you create an sdfs volume , you specify an arbitrary volume size - 8EB must be the maximum size you can specify .
I 'm guessing the current implementation is only able to deal with 32TB worth of deduplicated chunks - so if you have 8EB of data which you can get a 250000 : 1 deduplication ratio on , then you could fill up the 8EB volume .</tokentext>
<sentencetext>when you create an sdfs volume, you specify an arbitrary volume size - 8EB must be the maximum size you can specify.
I'm guessing the current implementation is only able to deal with 32TB worth of deduplicated chunks - so if you have 8EB of data which you can get a 250000:1 deduplication ratio on, then you could fill up the 8EB volume.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645108</parent>
</comment>
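A quick back-of-the-envelope check of the ratio mentioned in the comment above (a sketch, not taken from the SDFS documentation):

```python
# 8 EB of logical data backed by 32 TB of unique (deduplicated) chunks.
logical_bytes = 8 * 2**60    # 8 EB
unique_bytes = 32 * 2**40    # 32 TB
print(logical_bytes // unique_bytes)  # 262144 -> roughly the 250000:1 ratio quoted
```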
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645064</id>
	<title>Re:Hasn't this been posted before?</title>
	<author>Hurricane78</author>
	<datestamp>1269710700000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Well, at least this comment has been posted before.</p><p>Dude, you&rsquo;re only piling it up. Like with trolling: If you react to it, you only make it worse.</p><p>And because I&rsquo;m not better, I&rsquo;m now gonna end it, by stating that: yes, yes, I&rsquo;m also not making it better. ^^<br>Oh wait... now I am!<nobr> <wbr></nobr>:)</p></htmltext>
<tokenext>Well , at least this comment has been posted before.Dude , you    re only piling it up .
Like with trolling : If you react to it , you only make it worse.And because I    m not better , I    m now gon na end it , by stating that : yes , yes , I    m also not making it better .
^ ^ Oh wait... now I am !
: )</tokentext>
<sentencetext>Well, at least this comment has been posted before.Dude, you’re only piling it up.
Like with trolling: If you react to it, you only make it worse.And because I’m not better, I’m now gonna end it, by stating that: yes, yes, I’m also not making it better.
^^Oh wait... now I am!
:)</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644812</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31654998</id>
	<title>ROTFL</title>
	<author>Anonymous</author>
	<datestamp>1269863040000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>ROTFL! Filesystem in userland with Java. Buhhaha! It's only valuable as a PoC. Trading some MB of storage for a huge performance impact doesn't sound like a good trade-off<nobr> <wbr></nobr>:)</p></htmltext>
<tokenext>ROTFL !
Filesystem in userland with Java .
Buhhaha ! The only valuable as a PoC .
Trading some MB of storage for hedge performance impact does n't sound like a good trade-off : )</tokentext>
<sentencetext>ROTFL!
Filesystem in userland with Java.
Buhhaha! The only valuable as a PoC.
Trading some MB of storage for hedge performance impact doesn't sound like a good trade-off :)</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645282</id>
	<title>Re:How useful is this in realistic scenarios?</title>
	<author>Anonymous</author>
	<datestamp>1269713940000</datestamp>
	<modclass>Informativ</modclass>
	<modscore>3</modscore>
	<htmltext><p>You cut up a large file into lots of chunks of whatever size, let's say 64KB each.  Then, you look at the chunks.  If you have two chunks that are the same, you remove the second one, and just place a pointer to the first one.  Data deduplication is much more complicated than that in real life, but basically, the more data you have, or the smaller the chunks you look at, the more likely you are to find duplication, or collisions.  (How many Word documents have the same few words in a row?  Remove every repeat of the phrase "and then the" and replace it with a pointer, if you will.)</p><p>This is also similar to WAN acceleration, which at a high enough level is just deduplicating traffic that the network would have to transmit.</p><p>It is amazing how much space you can free up when you're not just looking at the file level.  This has become very big in recent years, because storage has exploded and processors are finally fast enough to do this in real-time.</p></htmltext>
<tokenext>If you cut up a large file into lots of chunks of whatever size , lets say 64KB each .
Then , you look at the chunks .
If you have two chunks that are the same , you remove the second one , and just place a pointer to the first one .
Data Deduplication is much more complicated than that in real life , but basically , the more data you have , or the smaller the chunks you look at , the more likely you are to have duplication , or collisions .
( how many word documents have a few words in a row ?
remove every repeat of the phrase " and then the " and replace it with a pointer , if you will ) .This is also similar to WAN acceleration , which at a high enough level , is just deduplicating traffic that the network would have to transmit.It is amazing how much space you can free up , when your not just looking at the file level .
This has become very big in recent years , cause storage has exploded , and processors are finally fast enough to do this in real-time .</tokentext>
<sentencetext>If you cut up a large file into lots of chunks of whatever size, lets say 64KB each.
Then, you look at the chunks.
If you have two chunks that are the same, you remove the second one, and just place a pointer to the first one.
Data Deduplication is much more complicated than that in real life, but basically, the more data you have, or the smaller the chunks you look at, the more likely you are to have duplication, or collisions.
(how many word documents have a few words in a row?
remove every repeat of the phrase "and then the" and replace it with a pointer, if you will).This is also similar to WAN acceleration, which at a high enough level, is just deduplicating traffic that the network would have to transmit.It is amazing how much space you can free up, when your not just looking at the file level.
This has become very big in recent years, cause storage has exploded, and processors are finally fast enough to do this in real-time.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645012</parent>
</comment>
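A minimal sketch of the fixed-size chunking idea described in the comment above; real deduplication systems use variable-size chunks, persistent indexes and stronger collision handling, so this is purely illustrative:

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # 64KB chunks, matching the example above

def dedup_store(path, store):
    """Split a file into fixed-size chunks, keep each unique chunk once,
    and return a list of 'pointers' (chunk hashes) describing the file."""
    pointers = []
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            key = hashlib.sha256(chunk).hexdigest()
            store.setdefault(key, chunk)  # second and later copies are not stored
            pointers.append(key)
    return pointers

def rebuild(pointers, store):
    """Reassemble the original bytes from the pointer list."""
    return b"".join(store[key] for key in pointers)
```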
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645026</id>
	<title>Both VMware and KVM can do thi</title>
	<author>Anonymous</author>
	<datestamp>1269710280000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Both VMware and KVM can do this. Not sure about Xen. Google "memory deduplication $VM_TECH"</p></htmltext>
<tokenext>Both VMware and KVM can do this .
Not sure about Xen .
Google " memory deduplication $ VM \ _TECH " China Mobile Phones [ chinamobilephones.org ] Chinese Girls [ chinese-girls.org ]</tokentext>
<sentencetext>Both VMware and KVM can do this.
Not sure about Xen.
Google "memory deduplication $VM\_TECH" China Mobile Phones [chinamobilephones.org]Chinese Girls [chinese-girls.org]</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645002</id>
	<title>Re:Let's get down to brass tacks.</title>
	<author>SanityInAnarchy</author>
	<datestamp>1269710160000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Well, just how repetitive is your porn collection?</p></htmltext>
<tokenext>Well , just how repetitive is your porn collection ?</tokentext>
<sentencetext>Well, just how repetitive is your porn collection?</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644918</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31725792</id>
	<title>for memory</title>
	<author>h00manist</author>
	<datestamp>1270409220000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>VMware does share memory pages.
<a href="http://fedoraproject.org/wiki/Features/KSM#KSM" title="fedoraproject.org">KSM</a> [fedoraproject.org] appears to offer that now too, though I haven't read much about it.
Unix/Linux uses this very well in multi-user setups, especially in <a href="http://www.stgraber.org/2010/02/21/ltsp-52-out" title="stgraber.org">LTSP</a> [stgraber.org], where users running the same program share the memory.  I don't know if Windows Terminal Server does it nowadays - it didn't when I used it, several versions ago.</htmltext>
<tokenext>vmware does share memory pages .
KSM [ fedoraproject.org ] appears to have that now too , have n't read much about it - unix-linux uses this very well in multiuser , especially in LTSP [ stgraber.org ] , where users running the same program share the memory .
I do n't know if windows terminal server does it nowadays - it did n't when I used it , several versions ago .</tokentext>
<sentencetext>vmware does share memory pages.
KSM [fedoraproject.org] appears to have that now too, haven't read much about it -
unix-linux uses this very well in multiuser, especially in LTSP [stgraber.org], where users running the same program share the memory.
I don't know if windows terminal server does it nowadays - it didn't when I used it, several versions ago.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644786</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31651190</id>
	<title>Re:Or get inline deduplication</title>
	<author>tbuskey</author>
	<datestamp>1269776760000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>ZFS dedupe in OpenSolaris is also Open Source.</p><p>I've gotten 11% dedup savings on 1.04 TB of a 1.82 TB volume.<br>Add compression savings and ECC (so bad bits don't happen silently).</p><p>I'm hoping it will be in btrfs so Linux will have it.</p></htmltext>
<tokenext>ZFS dedupe in OpenSolaris is also Open Source.I 've gotten 11 \ % dedup savings on 1.04 TB of a 1.82 TB volume.Add compresson savings and ECC ( so bad bits do n't happen silently ) .I 'm hoping it will be in btrfs so Linux will have it .</tokentext>
<sentencetext>ZFS dedupe in OpenSolaris is also Open Source.I've gotten 11\% dedup savings on 1.04 TB of a 1.82 TB volume.Add compresson savings and ECC (so bad bits don't happen silently).I'm hoping it will be in btrfs so Linux will have it.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644972</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646134</id>
	<title>Re:This just gave me a good idea!</title>
	<author>Fruit</author>
	<datestamp>1269772260000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><a href="http://git.fruit.je/src?a=blob\_plain;f=findlinks/findlinks" title="fruit.je">here you go</a> [fruit.je]<nobr> <wbr></nobr>:)</htmltext>
<tokenext>here you go [ fruit.je ] : )</tokentext>
<sentencetext>here you go [fruit.je] :)</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645272</id>
	<title>It finally happened</title>
	<author>thesymbolicfrog</author>
	<datestamp>1269713820000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>I stopped being able to read English.  WTF does any of that mean?  Is it written in moonspeak?</p></htmltext>
<tokenext>I stopped being able to read English .
WTF does any of that mean ?
Is it written in moonspeak ?</tokentext>
<sentencetext>I stopped being able to read English.
WTF does any of that mean?
Is it written in moonspeak?</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645852</id>
	<title>Collision</title>
	<author>Anonymous</author>
	<datestamp>1269809700000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>just make sure it checks for collisions<br><a href="http://en.wikipedia.org/wiki/Collision_(computer_science)" title="wikipedia.org" rel="nofollow">http://en.wikipedia.org/wiki/Collision_(computer_science)</a> [wikipedia.org]</p></htmltext>
<tokenext>just make sure it checks for collisionshttp : //en.wikipedia.org/wiki/Collision \ _ ( computer \ _science ) [ wikipedia.org ]</tokentext>
<sentencetext>just make sure it checks for collisionshttp://en.wikipedia.org/wiki/Collision\_(computer\_science) [wikipedia.org]</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644972</id>
	<title>Or get inline deduplication</title>
	<author>anilg</author>
	<datestamp>1269709860000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>with <a href="http://www.nexentastor.org/projects/site/wiki/CommunityEdition" title="nexentastor.org">NexentaStor CE</a> [nexentastor.org], which is based on OpenSolaris b134. It's free.. and has an excellent Storage WebUI.<nobr> <wbr></nobr>/plug</p><p>For a detailed explanation of OpenSolaris dedup see this <a href="http://blogs.sun.com/bonwick/entry/zfs_dedup" title="sun.com">blog entry</a> [sun.com].</p><p>~Anil</p></htmltext>
<tokenext>with NexentaStor CE [ nexentastor.org ] , which is based on OpenSolaris b134 .
It 's free.. and has an excellent Storage WebUI .
/plugFor a detailed explanation of OpenSolaris dedup see this blog entry [ sun.com ] . ~ Anil</tokentext>
<sentencetext>with NexentaStor CE [nexentastor.org], which is based on OpenSolaris b134.
It's free.. and has an excellent Storage WebUI.
/plugFor a detailed explanation of OpenSolaris dedup see this blog entry [sun.com].~Anil</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645082</id>
	<title>Re:Yea, I RTFA, but...</title>
	<author>Lorens</author>
	<datestamp>1269710880000</datestamp>
	<modclass>Interestin</modclass>
	<modscore>2</modscore>
	<htmltext><p>I wrote fileuniq (http://sourceforge.net/projects/fileuniq/) exactly for this reason. You can symlink or hardlink, decide how identical a file must be (timestamp, uid...), or delete.</p><p>It's far from optimized, but I accept patches<nobr> <wbr></nobr>:-)</p></htmltext>
<tokenext>I wrote fileuniq ( http : //sourceforge.net/projects/fileuniq/ ) exactly for this reason .
You can symlink or hardlink , decide how identical a file must be ( timestamp , uid... ) , or delete.It 's far from optimized , but I accept patches : - )</tokentext>
<sentencetext>I wrote fileuniq (http://sourceforge.net/projects/fileuniq/) exactly for this reason.
You can symlink or hardlink, decide how identical a file must be (timestamp, uid...), or delete.It's far from optimized, but I accept patches :-)</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644966</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645596</id>
	<title>Re:How useful is this in realistic scenarios?</title>
	<author>Spad</author>
	<datestamp>1269718620000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>All good dedupe systems are block-level, not file-level, so you don't just save where whole files are identical but on *any* identical data that's on the disks.</p><p>If you're running VMs with the same OS you'll probably find that close to 70% of the data can be de-duplicated - and that's before you consider things like farms of clustered servers where you have literally identical configs, or fileservers with lots of idiots saving 40 "backup" copies of the same 2GB Access database just in case they need it.</p><p>Our deduped backup array is currently storing ~70TB of backups on 10TB of raw space and it's only about 40% full - to me, that's useful.</p></htmltext>
<tokenext>All good dedupe systems are block-level , not file-level so you do n't just save where whole files are identical but on * any * identical data that 's on the disks.If you 're running VMs with the same OS you 'll probably find that close to 70 \ % of the data can be de-duplicated - and that 's before you consider things like farms of clustered servers where you have literally identical config or fileservers with lots of idiots saving 40 " backup " copies of the same 2Gb access database just in case they need it.Our deduped backup array is currently storing ~ 70Tb of backups on 10Tb of raw space and it 's only about 40 \ % full - to me , that 's useful .</tokentext>
<sentencetext>All good dedupe systems are block-level, not file-level so you don't just save where whole files are identical but on *any* identical data that's on the disks.If you're running VMs with the same OS you'll probably find that close to 70\% of the data can be de-duplicated - and that's before you consider things like farms of clustered servers where you have literally identical config or fileservers with lots of idiots saving 40 "backup" copies of the same 2Gb access database just in case they need it.Our deduped backup array is currently storing ~70Tb of backups on 10Tb of raw space and it's only about 40\% full - to me, that's useful.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645012</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645736</id>
	<title>Re:How useful is this in realistic scenarios?</title>
	<author>Anonymous</author>
	<datestamp>1269807180000</datestamp>
	<modclass>Informativ</modclass>
	<modscore>3</modscore>
	<htmltext><p> <i>I wonder how much this approach really buys you in "normal" scenarios especially given the CPU and disk I/O cost involved in finding and maintaining the de-duplicated blocks. There may be a few very specific examples where this could really make a difference but can someone enlighten me how this is useful on say a physical system with 10 Centos VMs running different apps or similar apps with different data? You might save a few blocks because of the shared OS files but if you did a proper minimal OS install then the gain hardly seems to be worth the effort.</i>
</p><p>Assume 200 VMs at, say, 2GB per OS install.  Allowing for some uniqueness, you'll probably end up using something in the ballpark of 20-30GB of "real" space to store 400GB of "virtual" data.  That's a *massive* saving, not only in disk space, but also in IOPS, since any well-engineered system will carry that deduplication through to the cache layer as well.
</p><p>Deduplication is *huge* in virtual environments.  The other big place it provides benefits, of course, is D2D backups.</p></htmltext>
<tokenext>I wonder how much this approach really buys you in " normal " scenarios especially given the CPU and disk I/O cost involved in finding and maintaining the de-duplicated blocks .
There may be a few very specific examples where this could really make a difference but can someone enlighten me how this is useful on say a physical system with 10 Centos VMs running different apps or similar apps with different data ?
You might save a few blocks because of the shared OS files but if you did a proper minimal OS install then the gain hardly seems to be worth the effort .
Assume 200 VMs at , say , 2GB per OS install .
Allowing for some uniqueness , you 'll probably end up using something in the ballpark of 20-30GB of " real " space to store 400GB of " virtual " data .
That 's a * massive * saving , not only disk space , but also in IOPS , since any well-engineered system will carry that deduplication through to the cache layer as well .
Deduplication is * huge * in virtual environments .
The other big place it provides benefits , of course , is D2D backups .</tokentext>
<sentencetext> I wonder how much this approach really buys you in "normal" scenarios especially given the CPU and disk I/O cost involved in finding and maintaining the de-duplicated blocks.
There may be a few very specific examples where this could really make a difference but can someone enlighten me how this is useful on say a physical system with 10 Centos VMs running different apps or similar apps with different data?
You might save a few blocks because of the shared OS files but if you did a proper minimal OS install then the gain hardly seems to be worth the effort.
Assume 200 VMs at, say, 2GB per OS install.
Allowing for some uniqueness, you'll probably end up using something in the ballpark of 20-30GB of "real" space to store 400GB of "virtual" data.
That's a *massive* saving, not only disk space, but also in IOPS, since any well-engineered system will carry that deduplication through to the cache layer as well.
Deduplication is *huge* in virtual environments.
The other big place it provides benefits, of course, is D2D backups.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645012</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646890</id>
	<title>Re:A hypothetical question.</title>
	<author>Anonymous</author>
	<datestamp>1269787560000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p><div class="quote"><p>Most likely in the implementation itself, not the de-duplication process.</p></div><p>I disagree, I think there is a flaw in the process not just an implementation. In fact the attack seams trivial if you have access to the partition at a time when there is no/little other disk access to it:<br>1)get disk usage for partition<br>2)write file to disk<br>3)see if disk usage has changed</p><p>If the OS lies about disk usage then you end up in a mess once your disk gets full (either actually full or prematurely full due to the lying), the only 1/2 sensible way i can think of working around this is to only present users with df info for their quotas.</p><p>Fortunately it only seams plausible useful in two scenarios:<br>1) verifying that there is a file on the disk, you suspect of being there<br>2) getting the contents of a small file by bruteforce, for<nobr> <wbr></nobr>/etc/shadow even with the speed up you get by circumventing logins throttling, you are now effectively bruteforcing all users passwords simultaneously, but for some suitably limited config file it is actually plausible (cookies, magic cookies, etc all seam possible but perhaps too slow)</p></div>
	</htmltext>
<tokenext>Most likely in the implementation itself , not the de-duplication process.I disagree , I think there is a flaw in the process not just an implementation .
In fact the attack seams trivial if you have access to the partition at a time when there is no/little other disk access to it : 1 ) get disk usage for partition2 ) write file to disk3 ) see if disk usage has changedIf the OS lies about disk usage then you end up in a mess once your disk gets full ( either actually full or prematurely full due to the lying ) , the only 1/2 sensible way i can think of working around this is to only present users with df info for their quotas.Fortunately it only seams plausible useful in two scenarios : 1 ) verifying that there is a file on the disk , you suspect of being there2 ) getting the contents of a small file by bruteforce , for /etc/shadow even with the speed up you get by circumventing logins throttling , you are now effectively bruteforcing all users passwords simultaneously , but for some suitably limited config file it is actually plausible ( cookies , magic cookies , etc all seam possible but perhaps too slow )</tokentext>
<sentencetext>Most likely in the implementation itself, not the de-duplication process.I disagree, I think there is a flaw in the process not just an implementation.
In fact the attack seams trivial if you have access to the partition at a time when there is no/little other disk access to it:1)get disk usage for partition2)write file to disk3)see if disk usage has changedIf the OS lies about disk usage then you end up in a mess once your disk gets full (either actually full or prematurely full due to the lying), the only 1/2 sensible way i can think of working around this is to only present users with df info for their quotas.Fortunately it only seams plausible useful in two scenarios:1) verifying that there is a file on the disk, you suspect of being there2) getting the contents of a small file by bruteforce, for /etc/shadow even with the speed up you get by circumventing logins throttling, you are now effectively bruteforcing all users passwords simultaneously, but for some suitably limited config file it is actually plausible (cookies, magic cookies, etc all seam possible but perhaps too slow)
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645068</parent>
</comment>
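A rough illustration of the three-step probe described in the comment above, assuming (hypothetically) that the deduplicating filesystem reflects deduplicated writes in its ordinary free-space numbers; a real implementation may defer deduplication or report usage differently:

```python
import os
import shutil

def probably_already_stored(candidate_path, dest_path, mount_point):
    """Write a candidate file and check whether reported free space shrinks
    by much less than the file's size - the signal described in the comment."""
    before = shutil.disk_usage(mount_point).free   # 1) get disk usage
    shutil.copyfile(candidate_path, dest_path)     # 2) write file to disk
    os.sync()                                      # flush so usage is updated
    after = shutil.disk_usage(mount_point).free    # 3) see if usage changed
    written = os.path.getsize(candidate_path)
    return (before - after) < written / 2          # grew far less than expected
```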
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644900</id>
	<title>I was worried for a second there...</title>
	<author>Anonymous</author>
	<datestamp>1269708900000</datestamp>
	<modclass>Troll</modclass>
	<modscore>-1</modscore>
	<htmltext><p>I thought it said "Open Source Decapitation".</p></htmltext>
<tokenext>I thought it said " Open Source Decapitation " .</tokentext>
<sentencetext>I thought it said "Open Source Decapitation".</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645712</id>
	<title>Look at StoreBackup</title>
	<author>bradley13</author>
	<datestamp>1269806880000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>We're a bit off topic here, seeing as this has nothing to do with <i>file systems</i>, but being off-topic is on-topic for<nobr> <wbr></nobr>/.

</p><p>Anyhow: StoreBackup is a great backup system that automatically detects duplicates.</p></htmltext>
<tokenext>We 're a bit off topic here , seeing as this has nothing to do with file systems , but being off-topic is on-topic for / .
Anyhow : StoreBackup is a great backup system that automatically detects duplicates .</tokentext>
<sentencetext>We're a bit off topic here, seeing as this has nothing to do with file systems, but being off-topic is on-topic for /.
Anyhow: StoreBackup is a great backup system that automatically detects duplicates.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646498</id>
	<title>Re:How useful is this in realistic scenarios?</title>
	<author>RAMMS+EIN</author>
	<datestamp>1269780960000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>``You might save a few blocks because of the shared OS files but if you did a proper minimal OS install then the gain hardly seems to be worth the effort.''</p><p>Right, but note the if. Most of the places where I've seen virtualization used have most of the VMs running instances of a proprietary operating system which shall remain unnamed. Together with other components that tend to be common, the amount of data that is common among instances can easily be over 10 GB per instance.</p><p>There is certainly a more efficient way to deal with &gt; 10GB of common data per instance than storing the same data multiple times, and deduplication is one way to do things more efficiently.</p></htmltext>
<tokenext>` ` You might save a few blocks because of the shared OS files but if you did a proper minimal OS install then the gain hardly seems to be worth the effort .
''Right , but note the if .
Most of the places where I 've seen virtualization used have most of the VMs running instances of a proprietary operating system which shall remain unnamed .
Together with other components that tend to be common , the amount of data that is common among instances can easily be over 10 GB per instance.There is certainly a more efficient way to deal with &gt; 10GB of common data per instance than storing the same data multiple times , and deduplication is one way to do things more efficiently .</tokentext>
<sentencetext>``You might save a few blocks because of the shared OS files but if you did a proper minimal OS install then the gain hardly seems to be worth the effort.
''Right, but note the if.
Most of the places where I've seen virtualization used have most of the VMs running instances of a proprietary operating system which shall remain unnamed.
Together with other components that tend to be common, the amount of data that is common among instances can easily be over 10 GB per instance.There is certainly a more efficient way to deal with &gt; 10GB of common data per instance than storing the same data multiple times, and deduplication is one way to do things more efficiently.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645012</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645146</id>
	<title>"support for deduplication at 4K block sizes"</title>
	<author>Anonymous</author>
	<datestamp>1269711780000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Why would anyone keep their blocks so cold?</p></htmltext>
<tokenext>Why would anyone keep their blocks so cold ?</tokentext>
<sentencetext>Why would anyone keep their blocks so cold?</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645244</id>
	<title>Re:This just gave me a good idea!</title>
	<author>Anonymous</author>
	<datestamp>1269713460000</datestamp>
	<modclass>Informativ</modclass>
	<modscore>1</modscore>
	<htmltext><p><a href="http://en.wikipedia.org/wiki/Venti" title="wikipedia.org" rel="nofollow">http://en.wikipedia.org/wiki/Venti</a> [wikipedia.org]</p></htmltext>
<tokenext>http : //en.wikipedia.org/wiki/Venti [ wikipedia.org ]</tokentext>
<sentencetext>http://en.wikipedia.org/wiki/Venti [wikipedia.org]</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646594</id>
	<title>Re:How useful is this in realistic scenarios?</title>
	<author>MMC Monster</author>
	<datestamp>1269782940000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>This doesn't even save a single hard drive at current storage densities.<nobr> <wbr></nobr>:-(</p></htmltext>
<tokenext>This does n't even save a single hard drive at current storage densities .
: - (</tokentext>
<sentencetext>This doesn't even save a single hard drive at current storage densities.
:-(</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645736</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645024</id>
	<title>Offtopic?</title>
	<author>SanityInAnarchy</author>
	<datestamp>1269710280000</datestamp>
	<modclass>Informativ</modclass>
	<modscore>3</modscore>
	<htmltext><p>If you'd mentioned the fact that this appears to be written in Java, you <i>might</i> have a point. But despite this, and the fact that it's in userland, they seem to be getting pretty decent performance out of it.</p><p>And keep in mind, all of this is to support <i>reducing</i> the amount of storage required on a hard disk, and it's a fairly large programming effort to do so. Seems like this entire project is just the opposite of what you claim -- it's software types doing <i>extra work</i> so they can spend less on storage.</p></htmltext>
<tokenext>If you 'd mentioned the fact that this appears to be written in Java , you might have a point .
But despite this , and the fact that it 's in userland , they seem to be getting pretty decent performance out of it.And keep in mind , all of this is to support reducing the amount of storage required on a hard disk , and it 's a fairly large programming effort to do so .
Seems like this entire project is just the opposite of what you claim -- it 's software types doing extra work so they can spend less on storage .</tokentext>
<sentencetext>If you'd mentioned the fact that this appears to be written in Java, you might have a point.
But despite this, and the fact that it's in userland, they seem to be getting pretty decent performance out of it.And keep in mind, all of this is to support reducing the amount of storage required on a hard disk, and it's a fairly large programming effort to do so.
Seems like this entire project is just the opposite of what you claim -- it's software types doing extra work so they can spend less on storage.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644904</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645316</id>
	<title>Re:This just gave me a good idea!</title>
	<author>Anonymous</author>
	<datestamp>1269714240000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>This script identifies duplicate files in one swift line, using only find, xargs, md5sum and awk:</p><p>http://www.gedankenverbrechen.org/~tk/finddupes.sh</p></htmltext>
<tokenext>This script identifies duplicate files in one swift line , using only find , xargs , md5sum and awk : http : //www.gedankenverbrechen.org/ ~ tk/finddupes.sh</tokentext>
<sentencetext>This script identifies duplicate files in one swift line, using only find, xargs, md5sum and awk:http://www.gedankenverbrechen.org/~tk/finddupes.sh</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646472</id>
	<title>Re:This just gave me a good idea!</title>
	<author>david.given</author>
	<datestamp>1269780360000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>You may want to look at <a href="http://rsnapshot.org/" title="rsnapshot.org">rsnapshot</a> [rsnapshot.org]. It's a very small shell script that pretty much duplicates the functionality of Apple's Time Machine. Each backup becomes a timestamped directory containing all the data in the backup; files that haven't changed from backup to backup are hardlinked together, so they only get stored once (per-file deduplication). This makes incremental backups very cheap, while also avoiding the need for specialised backup restoration software. It all works through the magic of rsync.

<p>On my system, each incremental backup of a 24GB dataset occupies about 600MB (depending on how many files have changed). And each incremental backup is a complete, uncompressed copy of the dataset, making extracting files trivial!

</p><p>It'll also back up across the network with ssh, so you can back up remote servers; it'll even back up Windows machines. It does proper backup rotation (I store two weeks' worth of daily backups, then a couple of weekly backups, then monthly). It's totally awesome.</p></htmltext>
<tokenext>You may want to look at rsnapshot [ rsnapshot.org ] .
It 's a very small shell script that pretty much duplicates the functionality of Apple 's Time Machine .
Each backup becomes a timestamped directory containing all the data in the backup ; files that have n't changed from backup to backup are hardlinked together , so they only get stored once ( per-file deduplication ) .
This makes incremental backups very cheap , while also avoiding the need for specialised backup restoration software .
It all works through the magic of rsync .
On my system , each incremental backup of a 24GB dataset occupies about 600MB ( depending how many files have changed ) .
And each incremental backup is a complete , uncompressed copy of the dataset , making extracting files trivial !
It 'll also backup across the network with ssh , so you can back up remote servers ; it 'll even back up Windows machines .
It does proper backup rotation ( I store two weeks ' worth of daily backups , then a a couple of weekly backups , then monthly ) .
It 's totally awesome .</tokentext>
<sentencetext>You may want to look at rsnapshot [rsnapshot.org].
It's a very small shell script that pretty much duplicates the functionality of Apple's Time Machine.
Each backup becomes a timestamped directory containing all the data in the backup; files that haven't changed from backup to backup are hardlinked together, so they only get stored once (per-file deduplication).
This makes incremental backups very cheap, while also avoiding the need for specialised backup restoration software.
It all works through the magic of rsync.
On my system, each incremental backup of a 24GB dataset occupies about 600MB (depending how many files have changed).
And each incremental backup is a complete, uncompressed copy of the dataset, making extracting files trivial!
It'll also backup across the network with ssh, so you can back up remote servers; it'll even back up Windows machines.
It does proper backup rotation (I store two weeks' worth of daily backups, then a a couple of weekly backups, then monthly).
It's totally awesome.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014</parent>
</comment>
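A toy version of the hardlink trick rsnapshot relies on (not rsnapshot's actual code; it only shows how unchanged files can be stored once across snapshots):

```python
import filecmp
import os
import shutil

def snapshot(src, prev_snap, new_snap):
    """Copy a tree into new_snap, but hardlink any file that is unchanged
    since prev_snap so it takes no extra space (per-file deduplication)."""
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        os.makedirs(os.path.join(new_snap, rel), exist_ok=True)
        for name in files:
            src_file = os.path.join(root, name)
            old_file = os.path.join(prev_snap, rel, name)
            new_file = os.path.join(new_snap, rel, name)
            if os.path.isfile(old_file) and filecmp.cmp(src_file, old_file, shallow=False):
                os.link(old_file, new_file)       # unchanged: share the existing copy
            else:
                shutil.copy2(src_file, new_file)  # new or changed: store a fresh copy
```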
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014</id>
	<title>This just gave me a good idea!</title>
	<author>thePowerOfGrayskull</author>
	<datestamp>1269710220000</datestamp>
	<modclass>Interestin</modclass>
	<modscore>3</modscore>
	<htmltext>Actually, just the title did it.  I've historically had a bad habit of backing things up by taking tar/gzs of directory structures, giving them an obscure name, and putting them onto network storage.   Or sometimes just copying directory structures without zipping first.  Needless to say, this makes for a huge mess.
<p>
Just occurred to me that it would not be difficult to write a quick script to extract everything into its own tree; run sha1sum on all files; and identify duplicate files automatically; probably in just one or two lines.
</p><p>
So in other words -- thanks Slashdot! The otherwise unintelligible summary did me a world of good -- mostly because there was no context as to what the hell it was talking about, so I had to supply my own definition...</p></htmltext>
<tokenext>Actually , just the title did it .
I 've historically had a bad habit of backing things up by taking tar/gzs of directory structures , giving them an obscure name , and putting them onto network storage .
Or sometimes just copying directory structures without zipping first .
Needless to say , this makes for a huge mess .
Just occurred to me that it would not be difficult to write a quick script to extract everything into its own tree ; run sha1sum on all files ; and identify duplicate files automatically ; probably in just one or two lines .
So in other words -- thanks Slashdot !
The otherwise unintelligible summary did me a world of good -- mostly because there was no context as to what the hell it was talking about , so I had to supply my own definition.. .</tokentext>
<sentencetext>Actually, just the title did it.
I've historically had a bad habit of backing things up by taking tar/gzs of directory structures, giving them an obscure name, and putting them onto network storage.
Or sometimes just copying directory structures without zipping first.
Needless to say, this makes for a huge mess.
Just occurred to me that it would not be difficult to write a quick script to extract everything into its own tree; run sha1sum on all files; and identify duplicate files automatically; probably in just one or two lines.
So in other words -- thanks Slashdot!
The otherwise unintelligible summary did me a world of good -- mostly because there was no context as to what the hell it was talking about, so I had to supply my own definition...</sentencetext>
</comment>
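A sketch of the duplicate-finding script described in the comment above (hash every file, then group paths by digest); the one-liners linked elsewhere in the thread do essentially the same thing with shell tools:

```python
import hashlib
import os
import sys
from collections import defaultdict

def find_duplicates(root):
    """Group files under root by the SHA-1 of their contents and return
    only the digests that map to more than one path."""
    by_hash = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                by_hash[hashlib.sha1(f.read()).hexdigest()].append(path)
    return {digest: paths for digest, paths in by_hash.items() if len(paths) > 1}

if __name__ == "__main__":
    for digest, paths in find_duplicates(sys.argv[1]).items():
        print(digest, *paths)
```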
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31650172</id>
	<title>Re:How useful is this in realistic scenarios?</title>
	<author>jabuzz</author>
	<datestamp>1269770040000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>A NetApp is not a SAN, it is a fancy dressed up NAS that you pay through the nose for.</p></htmltext>
<tokenext>A NetApp is not a SAN , it is a fancy dressed up NAS that you pay through the nose for .</tokentext>
<sentencetext>A NetApp is not a SAN, it is a fancy dressed up NAS that you pay through the nose for.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645128</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645350</id>
	<title>Re:How useful is this in realistic scenarios?</title>
	<author>jdoverholt</author>
	<datestamp>1269714660000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>If you look at the sales materials from any of the big vendors (EMC, I'm looking at you), even a single system image shows reduction in size through block-level deduplication--even more through variable-sized blocks.  I can't recall the exact numbers, I'm at the end of a terribly long week, but I think it was somewhere around 10-30% reduction in the day-0 backup size.  Subsequent days typically see a &gt;95% reduction.<br> <br>

All sales literature, mind you.  My personal experience with it will begin in a few months, when we get our new Celerra installed<nobr> <wbr></nobr>:-)<br> <br>

P.S. Remember that a project such as this is good because it offers high-dollar features to low-dollar players who enjoy tinkering in their basements.  Such was the goal of Linux in the first place.  It's how, on a three-figure budget, a dedicated nerd can set up a several-terabyte file server with software RAID-6 protection and (soon) data deduplication--stuff you'd pay EMC 100-1000 times as much for.</htmltext>
<tokenext>If you look at the sales materials from any of the big vendors ( EMC , I 'm looking at you ) , even a single system image shows reduction in size through block-level deduplication--even more through variable-sized blocks .
I ca n't recall the exact numbers , I 'm at the end of a terribly long week , but I think it was somewhere around 10-30 \ % reduction in the day-0 backup size .
Subsequent days typically see a &gt; 95 \ % reduction .
All sales literature , mind you .
My personal experience with it will begin in a few months , when we get our new Celerra installed : - ) P.S .
Remember that a project such as this is good because it offers high-dollar features to low-dollar players who enjoy tinkering in their basements .
Such was the goal of Linux in the first place .
It 's how , on a three-figure budget , a dedicated nerd can set up a several-terabyte file server with software RAID-6 protection and ( soon ) data deduplication--stuff you 'd pay EMC 100-1000 times as much for .</tokentext>
<sentencetext>If you look at the sales materials from any of the big vendors (EMC, I'm looking at you), even a single system image shows reduction in size through block-level deduplication--even more through variable-sized blocks.
I can't recall the exact numbers, I'm at the end of a terribly long week, but I think it was somewhere around 10-30\% reduction in the day-0 backup size.
Subsequent days typically see a &gt;95\% reduction.
All sales literature, mind you.
My personal experience with it will begin in a few months, when we get our new Celerra installed :-) 

P.S.
Remember that a project such as this is good because it offers high-dollar features to low-dollar players who enjoy tinkering in their basements.
Such was the goal of Linux in the first place.
It's how, on a three-figure budget, a dedicated nerd can set up a several-terabyte file server with software RAID-6 protection and (soon) data deduplication--stuff you'd pay EMC 100-1000 times as much for.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645012</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645734</id>
	<title>Re:I was worried for a second there...</title>
	<author>GNUALMAFUERTE</author>
	<datestamp>1269807120000</datestamp>
	<modclass>Offtopic</modclass>
	<modscore>-1</modscore>
	<htmltext><p><a href="http://xkcd.com/225/" title="xkcd.com" rel="nofollow">http://xkcd.com/225/</a> [xkcd.com]</p></htmltext>
<tokenext>http : //xkcd.com/225/ [ xkcd.com ]</tokentext>
<sentencetext>http://xkcd.com/225/ [xkcd.com]</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644900</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646374</id>
	<title>Re:This just gave me a good idea!</title>
	<author>slashflood</author>
	<datestamp>1269778260000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>One word: <a href="http://backuppc.sourceforge.net/info.html" title="sourceforge.net">BackupPC</a> [sourceforge.net].</htmltext>
<tokenext>One word : BackupPC [ sourceforge.net ] .</tokentext>
<sentencetext>One word: BackupPC [sourceforge.net].</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644772</id>
	<title>In case you don't know much about it</title>
	<author>stoolpigeon</author>
	<datestamp>1269707520000</datestamp>
	<modclass>Informativ</modclass>
	<modscore>5</modscore>
	<htmltext><p><a href="http://en.wikipedia.org/wiki/Data\_deduplication" title="wikipedia.org">Data deduplication</a> [wikipedia.org]<br>( I don't )</p></htmltext>
<tokenext>Data deduplication [ wikipedia.org ] ( I do n't )</tokentext>
<sentencetext>Data deduplication [wikipedia.org]( I don't )</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646496</id>
	<title>Re:How useful is this in realistic scenarios?</title>
	<author>DarkOx</author>
	<datestamp>1269780900000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Ok, I'll bite.</p><p>It's a real rarity, in any of the enterprise environments that I have ever seen, for minimal OS installs to be the mode of operation on application servers (Unix and the like); and I have never seen it on Windows-based application servers.  I am not even certain I agree that it's such a good idea.  Sure, all the daemons not in use should not be started, and ideally should have had their execute bits turned off to avoid mistakes, but when things go wrong it's often helpful to have full platform availability.</p><p>So in lots of SAN-based storage scenarios I suspect there is a great deal more than a few blocks to be saved on OS files alone.</p><p>Now for an application, think about your typical corporate mail server, where users usually send 100 people a copy of the same spreadsheet; times a thousand spreadsheets, times a few hundred users.  Yeah, it would be nice if you could get them to use the collaboration application or at least the file server, but that will never happen.  Exchange prior to 2k10 did that type of dedupe in the information store, but not any more.  Let's assume you have 5 or 7TB of online mail storage.  An often quoted figure is that 30% will be duplicate in the environment I described.  SAN storage is still expensive enough that if you can cut that mail store down by a TB, that is meaningful savings.  If that is not reason enough for you, imagine you are doing some kind of SAN-level replication to a hot site.  The less data you need to move, the less connectivity you need to have to do it; at least here in the States, D3s are not inexpensive.  Even if you are just scratching tape backups every night, cutting down the size of the snapshot in any way possible is a big win; anyone who has ever been stressed to figure out backup windows will tell you that.</p></htmltext>
<tokenext>Ok I 'll bite.Its real rarity in any of the enterprise environments that I have ever seen for minimal OS install installs to be the mode of operation on application servers ( Unix and like ) ; and I have never seen in on Windows based application servers .
I am not even certain I agree that its such a good idea .
Sure all the daemons not in use should not be started and ideally have had their execute bits turned off to avoid mistakes but when things go wrong its often helpful to have full platform availability.So in lots of SAN based storage scenarios I suspect there is a great deal more than a few blocks to be saved on OS files alone.Now for an application think about your typical corporate mail server , where users usually send 100 people a copy of the same speadsheet ; times a thousand speadsheets , times a few hundred users .
Yea it would be nice if you could get them to use the collaboration application or at least the file server but that will never happen .
Exchange prior to 2k10 did that type of dedupe in the information store , but not any more .
Lets assume you have a 5 or 7TB of online mail storage .
An often quoted figure is 30 \ % will be duplicate in the environment I described .
SAN storage is still expensive enough that if you can cut that mail store down by a TB that is meaningful savings .
If that is not reason enough for you imaging you are doing some kind of SAN level replication to a hot site .
The less data you need to move the less connectivity you need to have to do it ; at least in the States here D3s are not inexpensive .
Even if you are just scratching tape backups every night cutting down the size of the snap shot in anyway possible is a big win , anyone who has even been stressed to figure out backup windows will tell you that .</tokentext>
<sentencetext>OK, I'll bite. It is a real rarity, in any of the enterprise environments I have ever seen, for minimal OS installs to be the mode of operation on application servers (Unix and the like), and I have never seen it on Windows-based application servers.
I am not even certain I agree that it is such a good idea.
Sure, all the daemons not in use should not be started, and ideally should have had their execute bits turned off to avoid mistakes, but when things go wrong it is often helpful to have the full platform available. So in lots of SAN-based storage scenarios I suspect there is a great deal more than a few blocks to be saved on OS files alone. Now for an application, think about your typical corporate mail server, where users usually send 100 people a copy of the same spreadsheet; times a thousand spreadsheets, times a few hundred users.
Yeah, it would be nice if you could get them to use the collaboration application or at least the file server, but that will never happen.
Exchange prior to 2k10 did that type of dedupe in the information store, but not any more.
Let's assume you have 5 or 7TB of online mail storage.
An often-quoted figure is that 30\% will be duplicate in the environment I described.
SAN storage is still expensive enough that if you can cut that mail store down by a TB, that is meaningful savings.
If that is not reason enough for you, imagine you are doing some kind of SAN-level replication to a hot site.
The less data you need to move, the less connectivity you need to have to do it; at least here in the States, D3s are not inexpensive.
Even if you are just doing scratch tape backups every night, cutting down the size of the snapshot in any way possible is a big win, as anyone who has ever been stressed to figure out backup windows will tell you.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645012</parent>
</comment>
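To put the figures above in perspective, a minimal back-of-the-envelope sketch in Python, using only the comment's own assumptions (a 5-7TB mail store and a 30% duplicate rate; neither number is measured here):

import_note = "pure arithmetic, no external libraries needed"

def dedup_savings_tb(store_tb, duplicate_ratio=0.30):
    """Return (saved_tb, remaining_tb) for a given duplicate ratio."""
    saved = store_tb * duplicate_ratio
    return saved, store_tb - saved

for size_tb in (5, 7):
    saved, remaining = dedup_savings_tb(size_tb)
    # 5 TB -> ~1.5 TB saved; 7 TB -> ~2.1 TB saved
    print(f"{size_tb} TB store: ~{saved:.1f} TB saved, ~{remaining:.1f} TB stored")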
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645108</id>
	<title>Confusing summary</title>
	<author>Brian Gordon</author>
	<datestamp>1269711180000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><blockquote><div><p>The new deduplication-based file system called SDFS (GPL v2) is scalable to eight petabytes of capacity with 256 storage engines, which can each store up to 32TB of deduplicated data. Each volume can be up to 8 exabytes</p></div></blockquote><p>Can anyone offer wisdom on what the volume size is supposed to signify, being different from the maximum size that SDFS is scalable to?</p></htmltext>
<tokenext>The new deduplication-based file system called SDFS ( GPL v2 ) is scalable to eight petabytes of capacity with 256 storage engines , which can each store up to 32TB of deduplicated data .
Each volume can be up to 8 exabytesCan anyone offer wisdom on what the volume size is supposed to signify , being different from the maximum size that SDFS is scalable to ?</tokentext>
<sentencetext>The new deduplication-based file system called SDFS (GPL v2) is scalable to eight petabytes of capacity with 256 storage engines, which can each store up to 32TB of deduplicated data.
Each volume can be up to 8 exabytes. Can anyone offer wisdom on what the volume size is supposed to signify, being different from the maximum size that SDFS is scalable to?
	</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646128</id>
	<title>Great idea</title>
	<author>fearlezz</author>
	<datestamp>1269772080000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Too bad it's just another new filesystem. I would have preferred integration into (some future version of) EXTn or BTRFS.<br>Not only would that mean it becomes more widely available, it also means you don't have to miss all the nice features of these filesystems. You may even be able to use it out of the box.</p></htmltext>
<tokenext>Too bad it 's just another new filesystem .
I would have preferred integration into ( some future version of ) EXTn or BTRFS.Not only would that mean it gets more widely available , it also means you do n't have to miss al the nice functions of these filesystems .
You may even be able to use it out of the box .</tokentext>
<sentencetext>Too bad it's just another new filesystem.
I would have preferred integration into (some future version of) EXTn or BTRFS. Not only would that mean it becomes more widely available, it also means you don't have to miss all the nice features of these filesystems.
You may even be able to use it out of the box.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646088</id>
	<title>Re:Yea, I RTFA, but...</title>
	<author>TarpaKungs</author>
	<datestamp>1269771300000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>FSLint is very good.

<a href="http://www.pixelbeat.org/fslint/" title="pixelbeat.org">http://www.pixelbeat.org/fslint/</a> [pixelbeat.org]</htmltext>
<tokenext>FSLint is very good .
http : //www.pixelbeat.org/fslint/ [ pixelbeat.org ]</tokentext>
<sentencetext>FSLint is very good.
http://www.pixelbeat.org/fslint/ [pixelbeat.org]</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644966</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31651432</id>
	<title>Re:Hasn't this been posted before?</title>
	<author>Anonymous</author>
	<datestamp>1269778680000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Yes - Slashdot swapped to this software and they've saved 30\% storage space with their duplicate posts<nobr> <wbr></nobr>/run and duck</p></htmltext>
<tokenext>Yes - Slashdot swapped to this software and they 've saved 30 \ % storage space with their duplicate posts /run and duck</tokentext>
<sentencetext>Yes - Slashdot swapped to this software and they've saved 30\% storage space with their duplicate posts /run and duck</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644812</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645338</id>
	<title>Off-site replication</title>
	<author>Dishwasha</author>
	<datestamp>1269714540000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>One of the biggest targets for data de-duplication is efficient off-site replication, which you see in the EMC Avamar product line.  This is advantageous when your WAN links aren't fast enough for synchronous replication and a scheduled asynchronous replication would take too long.  I'd like to see the SDFS storage engine be intelligent enough to snapshot the data; then, when the next "backup/replication" occurs, it gathers up the hashes of all the blocks that have changed since the snapshot was created, communicates those hashes to the off-site system, and transfers just the blocks that currently don't have a matching hash on the target system.  The target system receives a complete hash-table update of the snapshot block difference from the source, and then both systems merge their snapshots and take a new snapshot to get ready for the next replication cycle.</p></htmltext>
<tokenext>One of the biggest targets for data de-duplication is for efficient off-site replication which you see in the EMC Avamar product line .
This is advantageous when your WAN links are n't fast enough so that you ca n't do synchronous replication and a scheduled asynchronous replication would take too long .
I 'd like to see the SDSF storage engine be intelligent enough to snapshot the data , then when the next " backup/replication " occurs , it gathers up all the hashes of the blocks that have changed since the snapshot was created , communicates those hashes to the off-site system , and then transfer just the blocks that currently do n't have a comparable hash on the target system , the target system receives a complete hash table update of the snapshot block difference from the source , and then both systems merge their snapshots and then take a new snapshot to get ready for the next replication cycle .</tokentext>
<sentencetext>One of the biggest targets for data de-duplication is efficient off-site replication, which you see in the EMC Avamar product line.
This is advantageous when your WAN links aren't fast enough for synchronous replication and a scheduled asynchronous replication would take too long.
I'd like to see the SDFS storage engine be intelligent enough to snapshot the data; then, when the next "backup/replication" occurs, it gathers up the hashes of all the blocks that have changed since the snapshot was created, communicates those hashes to the off-site system, and transfers just the blocks that currently don't have a matching hash on the target system. The target system receives a complete hash-table update of the snapshot block difference from the source, and then both systems merge their snapshots and take a new snapshot to get ready for the next replication cycle.</sentencetext>
</comment>
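A minimal, self-contained Python sketch of the hash-exchange replication idea described above. The block stores are plain dictionaries and the SHA-256 content hash is an assumption; neither the SDFS storage engine, its snapshots, nor any wire protocol is modelled here:

import hashlib

def block_hash(data: bytes) -> str:
    # content hash used to identify a block on both sides
    return hashlib.sha256(data).hexdigest()

def replicate(changed_blocks, target_store):
    """changed_blocks: block contents changed since the last snapshot.
    target_store: dict of hash -> bytes standing in for the off-site copy."""
    by_hash = {block_hash(b): b for b in changed_blocks}
    # step 1: exchange only the hashes and learn which blocks are missing remotely
    missing = [h for h in by_hash if h not in target_store]
    # step 2: transfer just the missing blocks
    for h in missing:
        target_store[h] = by_hash[h]
    return len(missing), len(by_hash)

target = {}
sent, unique = replicate([b"mailbox-1", b"config", b"mailbox-1"], target)
print(f"transferred {sent} of {unique} unique changed blocks")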
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645696</id>
	<title>Re:A hypothetical question.</title>
	<author>GNUALMAFUERTE</author>
	<datestamp>1269806580000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Leaving aside vulnerabilities in any particular implementation, the only possible attack vector I see would be a brute-force approach. Basically, a user in one VM creates files of some small size n, covering all possible combinations of n bytes (of course, this would only be feasible for very small files, but /etc/shadow is usually small enough, and so is everything in $HOME/.ssh/). Eventually, the user would create a file that matches a copy on another VM. Of course, this would be useless without a way to check whether another file was matched and deduplication took place. If the deduplication solution has any guest-side software (like VMware Tools), and that tool shares this kind of information with other systems, it might be possible, but that's a big might.</p><p>Any reasonably implemented deduplication solution should be 100\% transparent to the guest, and very secure.</p><p>And, to all the people talking about "shared resources": deduplication doesn't create "shared resources". Deduplication is not similar to symbolic links (ln -s). If you want to compare it to links, you have to compare it to hard links, and then to hard links that are automatically dereferenced, creating a new copy of the file with all its blocks as soon as the user wants to write to that file. Remember, as soon as the file changes on any given guest, the information is not the same anymore, and so that file is not de-duplicated anymore. A user can change his copy of the file, not other people's files.</p></htmltext>
<tokenext>Leaving aside vulnerabilities on any particular implementation , the only possible attack vector I see would be a bruteforce approach .
Basically , a user in one VM creates random n bytes size files with all possible combinations of files of that size ( off course , this would only be feasible for very small files , but /etc/shadow is usually small enough , and so is everything on $ HOME/.ssh/ ) .
Eventually , the user would create a file that would match a copy on another VM .
Off course , this would be useless without a way to check if another file was matched and deduplication took place .
If the deduplication solution has any virtual guest software ( like vmware tools ) , and that tool shares this kind of information with other systems , it might be possible , but that 's a big might.Any reasonably implemented deduplication solution should be 100 \ % transparent to the guest , and very secure.And , to all the people talking about " shared resources " , deduplication does n't create " shared resources " .
Deduplication is not similar to symbolic links ( ln -s ) .
If you want to compare it to links , you have to compare it to hard links , and that would be hard links that automatically dereferenced and created a new copy of the file with all the blocks as soon as the user wanted to write to that file .
Remember , as soon as the file changes on any given guest , the information is not the same anymore , and so that file is not de-duplicated anymore .
A user can change his copy of the file , not other people 's files .</tokentext>
<sentencetext>Leaving aside vulnerabilities in any particular implementation, the only possible attack vector I see would be a brute-force approach.
Basically, a user in one VM creates files of some small size n, covering all possible combinations of n bytes (of course, this would only be feasible for very small files, but /etc/shadow is usually small enough, and so is everything in $HOME/.ssh/).
Eventually, the user would create a file that matches a copy on another VM.
Of course, this would be useless without a way to check whether another file was matched and deduplication took place.
If the deduplication solution has any guest-side software (like VMware Tools), and that tool shares this kind of information with other systems, it might be possible, but that's a big might. Any reasonably implemented deduplication solution should be 100\% transparent to the guest, and very secure. And, to all the people talking about "shared resources": deduplication doesn't create "shared resources".
Deduplication is not similar to symbolic links (ln -s).
If you want to compare it to links, you have to compare it to hard links, and then to hard links that are automatically dereferenced, creating a new copy of the file with all its blocks as soon as the user wants to write to that file.
Remember, as soon as the file changes on any given guest, the information is not the same anymore, and so that file is not de-duplicated anymore.
A user can change his copy of the file, not other people's files.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644794</parent>
</comment>
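Some rough keyspace arithmetic behind the "very small files" caveat above, as a Python sketch; the candidate counts are exact, but the attacker's write rate is an arbitrary illustrative assumption used only to show how quickly exhaustive generation becomes hopeless as file size grows:

SECONDS_PER_YEAR = 365 * 24 * 3600
FILES_PER_SECOND = 1000  # assumed attacker write rate, purely illustrative

for n_bytes in (2, 4, 8, 16):
    candidates = 256 ** n_bytes                    # every possible n-byte file
    years = candidates / FILES_PER_SECOND / SECONDS_PER_YEAR
    print(f"{n_bytes:>2}-byte files: {candidates:.2e} candidates, ~{years:.2e} years")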
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644992</id>
	<title>It wants Java :-(</title>
	<author>Anonymous</author>
	<datestamp>1269710040000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext>I wonder how well it performs, or if this is just functionality for demonstration purposes?</htmltext>
<tokenext>I wonder how well it performs , or if this is just functionality for demonstration purposes ?</tokentext>
<sentencetext>I wonder how well it performs, or if this is just functionality for demonstration purposes ?</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645012</id>
	<title>How useful is this in realistic scenarios?</title>
	<author>marvin2k</author>
	<datestamp>1269710220000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>Given that most disk space is usually swallowed by application data, and that data is rarely identical to the data on another system (why would you have two systems then?), I wonder how much this approach really buys you in "normal" scenarios, especially given the CPU and disk I/O cost involved in finding and maintaining the de-duplicated blocks.
There may be a few very specific examples where this could really make a difference, but can someone enlighten me how this is useful on, say, a physical system with 10 CentOS VMs running different apps or similar apps with different data? You might save a few blocks because of the shared OS files, but if you did a proper minimal OS install then the gain hardly seems worth the effort.</htmltext>
<tokenext>Given that usually most of the disk space is swallowed by the data of an application and that data rarely is identical to the data on another system ( why would you have two systems then ?
) I wonder how much this approach really buys you in " normal " scenarios especially given the CPU and disk I/O cost involved in finding and maintaining the de-duplicated blocks .
There may be a few very specific examples where this could really make a difference but can someone enlighten me how this is useful on say a physical system with 10 Centos VMs running different apps or similar apps with different data ?
You might save a few blocks because of the shared OS files but if you did a proper minimal OS install then the gain hardly seems to be worth the effort .</tokentext>
<sentencetext>Given that most disk space is usually swallowed by application data, and that data is rarely identical to the data on another system (why would you have two systems then?), I wonder how much this approach really buys you in "normal" scenarios, especially given the CPU and disk I/O cost involved in finding and maintaining the de-duplicated blocks.
There may be a few very specific examples where this could really make a difference, but can someone enlighten me how this is useful on, say, a physical system with 10 CentOS VMs running different apps or similar apps with different data?
You might save a few blocks because of the shared OS files, but if you did a proper minimal OS install then the gain hardly seems worth the effort.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645386</id>
	<title>Re:Patent 5,813,008</title>
	<author>pem</author>
	<datestamp>1269714960000</datestamp>
	<modclass>Interestin</modclass>
	<modscore>2</modscore>
	<htmltext>A good lawyer could probably argue that this doesn't apply.
<p>
Claim 1(a) requires "dividing an information item into a common portion and a unique portion".
</p><p>
It may be that the patent covers the case where the unique portion is empty, but then again maybe not, <b>especially</b> if the computer never takes the step to find out!  In other words, if you treat every item as a common item (even if there is only one copy), there is a good chance the patent might not apply.
</p><p>
(There is also a good chance that the patent is written the way it is specifically because it doesn't apply to that case -- it may be that there is prior art in one of the referenced patents.)</p></htmltext>
<tokenext>A good lawyer could probably argue that this does n't apply .
Claim 1 ( a ) requires " dividing an information item into a common portion and a unique portion " .
It may be that the patent covers the case where the unique portion is empty , but then again maybe not , especially if the computer never takes the step to find out !
In other words , if you treat every item as a common item ( even if there is only one copy ) , there is a good chance the patent might not apply .
( There is also a good chance that the patent is written the way it is specifically because it does n't apply to that case -- it may be that there is prior art in one of the referenced patents .
)</tokentext>
<sentencetext>A good lawyer could probably argue that this doesn't apply.
Claim 1(a) requires "dividing an information item into a common portion and a unique portion".
It may be that the patent covers the case where the unique portion is empty, but then again maybe not, especially if the computer never takes the step to find out!
In other words, if you treat every item as a common item (even if there is only one copy), there is a good chance the patent might not apply.
(There is also a good chance that the patent is written the way it is specifically because it doesn't apply to that case -- it may be that there is prior art in one of the referenced patents.
)</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645160</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645092</id>
	<title>User Land?  Come on!</title>
	<author>Gazzonyx</author>
	<datestamp>1269711000000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><div class="quote"><p>[...] Opendedup runs in user space, making it platform independent, easier to scale and cluster, [...]</p></div><p>... and slow, prone to locking issues, etc.  There's a reason no one runs ZFS over FUSE; why would we do it with this?</p></htmltext>
<tokenext>[ ... ] Opendedup runs in user space , making it platform independent , easier to scale and cluster , [ ... ] ... and slow , prone to locking issues , etc .
There 's a reason no one runs ZFS over FUSE , why would we do it with this ?</tokentext>
<sentencetext>[...] Opendedup runs in user space, making it platform independent, easier to scale and cluster, [...]... and slow, prone to locking issues, etc.
There's a reason no one runs ZFS over FUSE; why would we do it with this?
	</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645152</id>
	<title>Re:How useful is this in realistic scenarios?</title>
	<author>snikulin</author>
	<datestamp>1269711900000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Well, a really good and useful "home" scenario is a system backup of multiple computers with the same OS.<br>The OS itself plus common software takes at least 20-30 GB per installation these days.</p><p>My WHS server (which does support de-dup in the form of <a href="http://en.wikipedia.org/wiki/Single\_Instance\_Store" title="wikipedia.org" rel="nofollow">Single-instance storage</a> [wikipedia.org]) keeps full backups (3 months' worth) of my seven Windows home computers in about 60 GB.</p><p>Unfortunately, SIS does not work for WHS shared folders, so the rsync backups of my two Linux machines (my version control &amp; gallery servers) over SMB are not de-duplicated by WHS.</p><p>I could probably save only /etc, /var and /srv of each server, but so far I back up everything.</p></htmltext>
<tokenext>Well , a really good and useful " home " scenario is a system backup of multiple computers with the same OS.OS itself plus common software takes at least 20-30 GB per installation these days.My WHS ( which does support de-dup in form of Single-instance storage [ wikipedia.org ] ) server keeps full backup ( 3-months worth ) of my seven Windows home computers on about 60 GB.Unfortunately SIS does not work for WHS shared folders , so my two Linux machines ' ( my version control &amp; gallery servers ) rsync backups over SMB are not de-duplicated by WHS.I could probably save only /etc , /var and /srv of each server , but so far I backup everything .</tokentext>
<sentencetext>Well, a really good and useful "home" scenario is a system backup of multiple computers with the same OS. The OS itself plus common software takes at least 20-30 GB per installation these days. My WHS server (which does support de-dup in the form of Single-instance storage [wikipedia.org]) keeps full backups (3 months' worth) of my seven Windows home computers in about 60 GB. Unfortunately, SIS does not work for WHS shared folders, so the rsync backups of my two Linux machines (my version control &amp; gallery servers) over SMB are not de-duplicated by WHS. I could probably save only /etc, /var and /srv of each server, but so far I back up everything.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645012</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644794</id>
	<title>A hypothetical question.</title>
	<author>drolli</author>
	<datestamp>1269707820000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>I appreciate any deduplication solution for Linux, for sure, but isn't any deduplication creating a lot of shared resources which could possibly be exploited for attacks (e.g. on the privacy of other users)?</htmltext>
<tokenext>I appreciate any deduplication solution for linux for sure , but isnt any deplucation creating a lot of shared ressources which could be possibly exploited for attacks ( e.g .
on the privacy of other users ) ?</tokentext>
<sentencetext>I appreciate any deduplication solution for Linux, for sure, but isn't any deduplication creating a lot of shared resources which could possibly be exploited for attacks (e.g. on the privacy of other users)?</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645046</id>
	<title>This workgeat,nfc...</title>
	<author>Anonymous</author>
	<datestamp>1269710520000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>I usedthichnlgywrp!</p></htmltext>
<tokenext>I usedthichnlgywrp !</tokentext>
<sentencetext>I usedthichnlgywrp!</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646294</id>
	<title>Re:A hypothetical question.</title>
	<author>kitgerrits</author>
	<datestamp>1269776580000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Deduplication often relies on copy-on-write to maintain separate versions after deduplication.<br>Once a block is deduplicated between users A, B and C into a shared block Z and user B changes his file, the filesystem will record the change in a new block and point user B's file at that new block instead.<br>Other security issues (permissions) should be handled by the filesystem table, not the physical file.</p></htmltext>
<tokenext>Deduplication often relies on copy-on-write to maintain seperate versions after deduplication.Once a block is deduplicated between users A , B and C into file Z and user B changes his file , the filesystem will record the change and point user B to block Z instead.Other security issues ( permissions ) should be handled by the filesystem table , not the physical file .</tokentext>
<sentencetext>Deduplication often relies on copy-on-write to maintain separate versions after deduplication. Once a block is deduplicated between users A, B and C into a shared block Z and user B changes his file, the filesystem will record the change in a new block and point user B's file at that new block instead. Other security issues (permissions) should be handled by the filesystem table, not the physical file.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644794</parent>
</comment>
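A toy Python sketch of the copy-on-write behaviour described above: two files share one deduplicated block until one of them is rewritten, and only the writer is re-pointed at a fresh block. The structures are invented for illustration and are not the Opendedup/SDFS implementation:

import hashlib

blocks = {}   # content hash -> bytes: each unique block is stored once
files = {}    # file name -> list of block hashes

def write(name, data: bytes):
    h = hashlib.sha256(data).hexdigest()
    blocks.setdefault(h, data)          # deduplicate identical contents
    files[name] = [h]                   # (re)point the file at its block

write("userA/doc", b"same contents")
write("userB/doc", b"same contents")        # shares A's block
assert files["userA/doc"] == files["userB/doc"]

write("userB/doc", b"B's edited copy")      # copy-on-write: new block for B only
assert files["userA/doc"] != files["userB/doc"]
assert blocks[files["userA/doc"][0]] == b"same contents"   # A is untouched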
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646494</id>
	<title>Re:In case you don't know much about it</title>
	<author>vrmlguy</author>
	<datestamp>1269780780000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Here's another explanation: <a href="http://storagezilla.typepad.com/storagezilla/2009/02/unified-storage-file-system-deduplication.html" title="typepad.com">http://storagezilla.typepad.com/storagezilla/2009/02/unified-storage-file-system-deduplication.html</a> [typepad.com]</p><p>There's a table about half-way down showing the differences between file-level dedup (elimination of duplicate files), fixed-block dedup (elimination of duplicate blocks as stored on the disk, which is what Opendedup is doing), and variable-block dedup (which handles non-block-aligned data, such as when you insert or delete something at the start of a large file).  File-level dedup is (almost) drop-dead easy: you just take a checksum of every file and link those that match to a single copy.  (Handling file updates can be problematic, though.  You want your deduped files to be read-only.)  Fixed-block dedup is almost as easy, since a file is just a list of blocks.  You use FUSE to turn those blocks into fixed-length files, which are then themselves deduped.  This fixes the file-update problem, since each update creates a new block.</p><p>Variable-block dedup looks for special groups of bytes to divide a file into chunks (like using newlines to divide a text file into lines).  These chunks are then deduped as above.  If you aren't careful, you can waste space (since the chunks aren't exact multiples of the disk's block size).  Random seeks can be harder, since you can't just multiply the block number by the block size to find a location.</p></htmltext>
<tokenext>Here 's another explaination : http : //storagezilla.typepad.com/storagezilla/2009/02/unified-storage-file-system-deduplication.html [ typepad.com ] There 's a table about half-way down showing the differences between file-level dedup ( elimination of duplicate files ) , fixed block dedup ( elimilation of duplicate blocks as stored on the disk , which is what Opendedup is doing ) , and variable block dedup ( which handles non-block aligned data , such as when you insert or delete someting at the start of a large file ) .
File level dedup is ( almost ) drop dead easy , you just take a checksum of every file and link those that match to a single copy .
( Handling file updates can be problematic , though .
You want your deduped files to be read-only .
) Fixed block is almost as easy , since a file is just a list of blocks .
You use FUSE to turn those blocks into fixed length files , which are then themselves deduped .
This fixes the file-update problem , since each update creates a new block.Variable block dedup looks for special groups of bytes to divided a file into chunks ( like using newlines to divide a text file into lines ) .
These chunks are then dedups as above .
If you are n't careful , you can waste space ( since the blocks are n't exactly multiples of the disk 's block size ) .
Random seeks can be harder , since you ca n't multiply the block number by the block size to find a location .</tokentext>
<sentencetext>Here's another explanation: http://storagezilla.typepad.com/storagezilla/2009/02/unified-storage-file-system-deduplication.html [typepad.com] There's a table about half-way down showing the differences between file-level dedup (elimination of duplicate files), fixed-block dedup (elimination of duplicate blocks as stored on the disk, which is what Opendedup is doing), and variable-block dedup (which handles non-block-aligned data, such as when you insert or delete something at the start of a large file).
File-level dedup is (almost) drop-dead easy: you just take a checksum of every file and link those that match to a single copy.
(Handling file updates can be problematic, though.
You want your deduped files to be read-only.)
Fixed-block dedup is almost as easy, since a file is just a list of blocks.
You use FUSE to turn those blocks into fixed-length files, which are then themselves deduped.
This fixes the file-update problem, since each update creates a new block. Variable-block dedup looks for special groups of bytes to divide a file into chunks (like using newlines to divide a text file into lines).
These chunks are then deduped as above.
If you aren't careful, you can waste space (since the chunks aren't exact multiples of the disk's block size).
Random seeks can be harder, since you can't just multiply the block number by the block size to find a location.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644772</parent>
</comment>
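A minimal Python sketch of the "drop-dead easy" file-level dedup described above: checksum every file under a directory and hard-link exact duplicates to a single copy. The path and the dry_run flag are illustrative, and, as the comment notes, files deduplicated this way should then be treated as read-only:

import hashlib
import os

def dedup_tree(root, dry_run=True):
    seen = {}                                        # checksum -> first path seen
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            if digest in seen:
                print(f"duplicate: {path} == {seen[digest]}")
                if not dry_run:
                    os.remove(path)
                    os.link(seen[digest], path)      # replace duplicate with a hard link
            else:
                seen[digest] = path

dedup_tree("/tmp/example", dry_run=True)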
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645264</id>
	<title>Re:Excellent!</title>
	<author>az1324</author>
	<datestamp>1269713700000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>They are very poor programmers.  Almost nothing works in retarded clam shell (rcsh).</p></htmltext>
<tokenext>They are very poor programmers .
Almost nothing works in retarded clam shell ( rcsh ) .</tokentext>
<sentencetext>They are very poor programmers.
Almost nothing works in retarded clam shell (rcsh).</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644904</parent>
</comment>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_42</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645502
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_27</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645696
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644794
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_29</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31650172
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645128
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645012
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_32</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645694
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645068
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644794
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_2</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645392
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_24</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645734
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644900
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_49</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31647252
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645002
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644918
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_40</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646472
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_31</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31665938
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644966
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_14</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645316
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_30</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646498
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645012
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_21</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646340
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645736
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645012
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_8</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646422
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644786
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_46</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645664
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644940
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_19</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645712
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_50</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31654670
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644786
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_22</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646354
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_5</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645064
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644812
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_13</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645596
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645012
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_36</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645152
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645012
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_12</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646088
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644966
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_43</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31676942
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644966
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_6</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645082
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644966
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_44</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645264
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644904
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_35</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645386
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645160
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644772
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_1</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31725792
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644786
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_3</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31648310
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645350
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645012
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_11</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646562
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645058
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644966
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_34</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646594
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645736
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645012
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_25</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646134
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_41</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31649118
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645108
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_26</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645024
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644904
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_9</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646280
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644772
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_0</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31655446
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_28</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646890
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645068
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644794
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_33</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31647044
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644966
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_47</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645010
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644972
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_18</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645098
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644966
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_23</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645244
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_48</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31651190
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644972
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_39</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31651432
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644812
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_52</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645552
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644772
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_7</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646212
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644898
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_15</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644902
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644786
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_38</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646374
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_17</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645282
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645012
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_20</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646234
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_45</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31683992
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644898
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_16</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646538
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_10</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646894
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645736
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645012
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_4</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646294
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644794
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_51</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646494
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644772
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_03_28_0052234_37</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646496
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645012
</commentlist>
</thread>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_03_28_0052234.6</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644918
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645002
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31647252
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_03_28_0052234.4</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644904
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645024
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645264
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_03_28_0052234.8</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645852
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_03_28_0052234.17</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644772
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646280
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645160
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645386
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645552
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646494
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_03_28_0052234.2</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644966
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645058
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646562
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646088
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31665938
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645098
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31676942
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645082
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31647044
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_03_28_0052234.15</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645014
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646234
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645502
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645244
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646354
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646472
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31655446
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646538
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645712
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645316
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646374
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645392
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646134
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_03_28_0052234.5</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644898
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646212
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31683992
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_03_28_0052234.3</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645092
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_03_28_0052234.13</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644794
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646294
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645696
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645068
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645694
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646890
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_03_28_0052234.11</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645554
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_03_28_0052234.9</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644812
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645064
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31651432
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_03_28_0052234.7</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644786
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31654670
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646422
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644902
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31725792
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_03_28_0052234.16</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644940
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645664
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_03_28_0052234.0</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645408
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_03_28_0052234.14</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644972
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645010
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31651190
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_03_28_0052234.1</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645012
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645350
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31648310
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645282
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645596
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646498
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646496
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645736
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646894
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646594
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31646340
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645128
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31650172
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645152
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_03_28_0052234.12</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31644900
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645734
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_03_28_0052234.10</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31645108
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_03_28_0052234.31649118
</commentlist>
</conversation>
