<article>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#article10_02_23_1826226</id>
	<title>How Twitter Is Moving To the Cassandra Database</title>
	<author>kdawson</author>
	<datestamp>1266951300000</datestamp>
	<htmltext>MyNoSQL has up an interview with Ryan King on how <a href="http://nosql.mypopescu.com/post/407159447/cassandra-twitter-an-interview-with-ryan-king">Twitter is transitioning to the Cassandra database</a>. Here's some <a href="http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/">detailed background</a> on <a href="http://incubator.apache.org/cassandra/">Cassandra</a>, which aims to "bring together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model." Before settling on Cassandra, the Twitter team looked into: <i>"...HBase, Voldemort, MongoDB, MemcacheDB, Redis, Cassandra, HyperTable, and probably some others I'm forgetting. ... We're currently moving our largest (and most painful to maintain) table &mdash; the statuses table, which contains all tweets and retweets. ... Some side notes here about importing. We were originally trying to use the BinaryMemtable interface, but we actually found it to be too fast &mdash; it would saturate the backplane of our network. We've switched back to using the Thrift interface for bulk loading (and we still have to throttle it). The whole process takes about a week now. With infinite network bandwidth we could do it in about 7 hours on our current cluster."</i> Relatedly, an anonymous reader notes that the upcoming <a href="http://nosqlboston.eventbrite.com/">NoSQL Live</a> conference, which will take place in Boston March 11th, has announced their lineup of speakers and panelists including Ryan King and folks from LinkedIn, StumbleUpon, and Rackspace.</htmltext>
<tokentext>MyNoSQL has up an interview with Ryan King on how Twitter is transitioning to the Cassandra database .
Here 's some detailed background on Cassandra , which aims to " bring together Dynamo 's fully distributed design and Bigtable 's ColumnFamily-based data model .
" Before settling on Cassandra , the Twitter team looked into : " ...HBase , Voldemort , MongoDB , MemcacheDB , Redis , Cassandra , HyperTable , and probably some others I 'm forgetting .
... We 're currently moving our largest ( and most painful to maintain ) table    the statuses table , which contains all tweets and retweets .
... Some side notes here about importing .
We were originally trying to use the BinaryMemtable interface , but we actually found it to be too fast    it would saturate the backplane of our network .
We 've switched back to using the Thrift interface for bulk loading ( and we still have to throttle it ) .
The whole process takes about a week now .
With infinite network bandwidth we could do it in about 7 hours on our current cluster .
" Relatedly , an anonymous reader notes that the upcoming NoSQL Live conference , which will take place in Boston March 11th , has announced their lineup of speakers and panelists including Ryan King and folks from LinkedIn , StumbleUpon , and Rackspace .</tokentext>
<sentencetext>MyNoSQL has up an interview with Ryan King on how Twitter is transitioning to the Cassandra database.
Here's some detailed background on Cassandra, which aims to "bring together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model.
" Before settling on Cassandra, the Twitter team looked into: "...HBase, Voldemort, MongoDB, MemcacheDB, Redis, Cassandra, HyperTable, and probably some others I'm forgetting.
... We're currently moving our largest (and most painful to maintain) table — the statuses table, which contains all tweets and retweets.
... Some side notes here about importing.
We were originally trying to use the BinaryMemtable interface, but we actually found it to be too fast — it would saturate the backplane of our network.
We've switched back to using the Thrift interface for bulk loading (and we still have to throttle it).
The whole process takes about a week now.
With infinite network bandwidth we could do it in about 7 hours on our current cluster.
" Relatedly, an anonymous reader notes that the upcoming NoSQL Live conference, which will take place in Boston March 11th, has announced their lineup of speakers and panelists including Ryan King and folks from LinkedIn, StumbleUpon, and Rackspace.</sentencetext>
</article>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249250</id>
	<title>Re:pfffft twatter tweeter</title>
	<author>AndrewNeo</author>
	<datestamp>1266958560000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>4</modscore>
	<htmltext><p>I think their point is not <i>everything</i> needs an RDBMS, whereas before it was the 'go to' method of storing data.</p></htmltext>
<tokentext>I think their point is not everything needs an RDBMS , whereas before it was the 'go to ' method of storing data .</tokentext>
<sentencetext>I think their point is not everything needs an RDBMS, whereas before it was the 'go to' method of storing data.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248780</id>
	<title>Re:hmmm</title>
	<author>Anonymous</author>
	<datestamp>1266957180000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><div class="quote"><p>facebook uses casandra</p></div><p>Bye bye Twitter, it was nice knowing you.</p><p>I'd rather not get tweets from last week showing up as "latest".</p></htmltext>
<tokentext>facebook uses casandraBye bye Twitter , it was nice knowing you.I 'd rather not get tweets from last week showing up as " latest " .</tokentext>
<sentencetext>facebook uses casandraBye bye Twitter, it was nice knowing you.I'd rather not get tweets from last week showing up as "latest".
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248442</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252952</id>
	<title>Re:Twitter needs scalability experts</title>
	<author>Anonymous</author>
	<datestamp>1266930120000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>From detailed background on cassandra link:</p><blockquote><div><p> <b>Multi-datacenter awareness</b>: you can adjust your node layout to ensure that if one datacenter burns in a fire, an alternative datacenter will have at least one full copy of every record.</p></div></blockquote><p>I didn't know that IP tunnels were able to transport EMP's. Awesome!</p></htmltext>
<tokentext>From detailed background on cassandra link : Multi-datacenter awareness : you can adjust your node layout to ensure that if one datacenter burns in a fire , an alternative datacenter will have at least one full copy of every record.I did n't know that IP tunnels were able to transport EMP 's .
Awesome !</tokentext>
<sentencetext>From detailed background on cassandra link: Multi-datacenter awareness: you can adjust your node layout to ensure that if one datacenter burns in a fire, an alternative datacenter will have at least one full copy of every record.I didn't know that IP tunnels were able to transport EMP's.
Awesome!
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250930</id>
	<title>Re:And this is front page news, why?</title>
	<author>u38cg</author>
	<datestamp>1266921720000</datestamp>
	<modclass>Interesting</modclass>
	<modscore>2</modscore>
	<htmltext>Does Twitter really have loads which are more difficult to manage than, say, the BBC, CNN, Google, or Wikipedia?  I would have thought serving up a fairly straightforward page, a stylesheet, a background image and the tweets or twits or whatever they're called can't be that difficult compared to, say, Facebook.</htmltext>
<tokentext>Does Twitter really have loads which are more difficult to manage than , say , the BBC , CNN , Google , or Wikipedia ?
I would have thought serving up a fairly straightforward page , a stylesheet , a background image and the tweets or twits or whatever they 're called ca n't be that difficult compared to , say , Facebook .</tokentext>
<sentencetext>Does Twitter really have loads which are more difficult to manage than, say, the BBC, CNN, Google, or Wikipedia?
I would have thought serving up a fairly straightforward page, a stylesheet, a background image and the tweets or twits or whatever they're called can't be that difficult compared to, say, Facebook.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248994</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31253390</id>
	<title>Re:Don't want to install Cassandra</title>
	<author>turbidostato</author>
	<datestamp>1266932160000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>I hear Cassandra is really a trojan. Can anyone verify? I don't want a trojan on my "computer....."</p><p>But, but... what if I gift it you?  I swear I'm not Trojan but Greek.</p></htmltext>
<tokentext>I hear Cassandra is really a trojan .
Can anyone verify ?
I do n't want a trojan on my " computer..... " But , but... what if I gift it you ?
I swear I 'm not Trojan but Greek .</tokentext>
<sentencetext>I hear Cassandra is really a trojan.
Can anyone verify?
I don't want a trojan on my "computer....."But, but... what if I gift it you?
I swear I'm not Trojan but Greek.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249820</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250706</id>
	<title>NiG6a</title>
	<author>Anonymous</author>
	<datestamp>1266920940000</datestamp>
	<modclass>Troll</modclass>
	<modscore>-1</modscore>
	<htmltext>Preferrably with an Inw eternity...Romeo IS DYING LIKE THE</htmltext>
<tokentext>Preferrably with an Inw eternity...Romeo IS DYING LIKE THE</tokentext>
<sentencetext>Preferrably with an Inw eternity...Romeo IS DYING LIKE THE</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251274</id>
	<title>Re:pfffft twatter tweeter</title>
	<author>roman_mir</author>
	<datestamp>1266922740000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Let me put it this way, it will make it perfectly clear: if twoofter is regenerating every page on every hit and they are running into issues with speed, then their problem is not their data storage, it's their design.  Now that it is clear they don't care about data consistency, I have the solution for them.</p><p>They just need to regenerate the pages once in a few minutes on some large node and then push the static content to their webservers.  Done.  And that's why they sometimes pay me the big bucks<nobr> <wbr></nobr>:) to think of the obvious.</p></htmltext>
<tokentext>Let me put it this way , it will make it perfectly clear : if twoofter is regenerating every page on every hit and they are running into issues with speed , then their problem is not their data storage , it 's their design .
Now that it is clear they do n't care about data consistency , I have the solution for them.They just need to regenerate the pages once in a few minutes on some large node and then push the static content to their webservers .
Done. And that 's why they sometimes pay me the big bucks : ) to think of the obvious .</tokentext>
<sentencetext>Let me put it this way, it will make it perfectly clear: if twoofter is regenerating every page on every hit and they are running into issues with speed, then their problem is not their data storage, it's their design.
Now that it is clear they don't care about data consistency, I have the solution for them.They just need to regenerate the pages once in a few minutes on some large node and then push the static content to their webservers.
Done.  And that's why they sometimes pay me the big bucks :) to think of the obvious.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249250</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31253408</id>
	<title>Re:pfffft twatter tweeter</title>
	<author>Doomdark</author>
	<datestamp>1266932280000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><i>
Is there really a huge issue with rdbms speeds? Well if there is something there, that's what needs to be looked at. If RDBMSs are not fast enough, that's just an opportunity to work more on them to speed them up.
</i>
<p>
What makes you think this has not been done? Sometimes combination of arrogance and ignorance here is amazing. Very bright minds are working on all kinds of approaches; and of course Oracle (et al) are working on their set of tools to improve them as well.
</p><p>
In reality it is ALL about different compromises. RDBMS pay hefty price for ACID, and that is ok if that is what you absolutely need. But there is no way to horizontally scale them efficiently (or, after some point, at all). This can be solved by rethinking what your actual requirements are -- if you can loosen some of the requirements by adopting "eventual consistency", you can get much better scalability and availability. You can not just add more boxes to your Oracle cluster: you need bigger box(es). Period. But you can easily add new hosts on your no-sql clusters (depends on system, for some its easier than others; but this is big focus for all of them).
It is not even so much about speed (of individual requests) but throughtput, and ability to incrementally increase it as needed.
</p><p>
There are certainly cases where you'd rather want full ACID set for authoritative data. And then there are many cases -- not just read-only/caching -- where it is acceptable to have intermediate inconsistent states. For Amazon Dynamo was used for shopping carts, for example. Oracle database was not cost-effective, and by cost I don't mean license costs, but maintenance (and license, h/w etc). It was designed to solve a specific problem. Other companies are building similar solutions.</p></htmltext>
<tokentext>Is there really a huge issue with rdbms speeds ?
Well if there is something there , that 's what needs to be looked at .
If RDBMSs are not fast enough , that 's just an opportunity to work more on them to speed them up .
What makes you think this has not been done ?
Sometimes combination of arrogance and ignorance here is amazing .
Very bright minds are working on all kinds of approaches ; and of course Oracle ( et al ) are working on their set of tools to improve them as well .
In reality it is ALL about different compromises .
RDBMS pay hefty price for ACID , and that is ok if that is what you absolutely need .
But there is no way to horizontally scale them efficiently ( or , after some point , at all ) .
This can be solved by rethinking what your actual requirements are -- if you can loosen some of the requirements by adopting " eventual consistency " , you can get much better scalability and availability .
You can not just add more boxes to your Oracle cluster : you need bigger box ( es ) .
Period. But you can easily add new hosts on your no-sql clusters ( depends on system , for some its easier than others ; but this is big focus for all of them ) .
It is not even so much about speed ( of individual requests ) but throughtput , and ability to incrementally increase it as needed .
There are certainly cases where you 'd rather want full ACID set for authoritative data .
And then there are many cases -- not just read-only/caching -- where it is acceptable to have intermediate inconsistent states .
For Amazon Dynamo was used for shopping carts , for example .
Oracle database was not cost-effective , and by cost I do n't mean license costs , but maintenance ( and license , h/w etc ) .
It was designed to solve a specific problem .
Other companies are building similar solutions .</tokentext>
<sentencetext>
Is there really a huge issue with rdbms speeds?
Well if there is something there, that's what needs to be looked at.
If RDBMSs are not fast enough, that's just an opportunity to work more on them to speed them up.
What makes you think this has not been done?
Sometimes combination of arrogance and ignorance here is amazing.
Very bright minds are working on all kinds of approaches; and of course Oracle (et al) are working on their set of tools to improve them as well.
In reality it is ALL about different compromises.
RDBMS pay hefty price for ACID, and that is ok if that is what you absolutely need.
But there is no way to horizontally scale them efficiently (or, after some point, at all).
This can be solved by rethinking what your actual requirements are -- if you can loosen some of the requirements by adopting "eventual consistency", you can get much better scalability and availability.
You can not just add more boxes to your Oracle cluster: you need bigger box(es).
Period. But you can easily add new hosts on your no-sql clusters (depends on system, for some its easier than others; but this is big focus for all of them).
It is not even so much about speed (of individual requests) but throughtput, and ability to incrementally increase it as needed.
There are certainly cases where you'd rather want full ACID set for authoritative data.
And then there are many cases -- not just read-only/caching -- where it is acceptable to have intermediate inconsistent states.
For Amazon Dynamo was used for shopping carts, for example.
Oracle database was not cost-effective, and by cost I don't mean license costs, but maintenance (and license, h/w etc).
It was designed to solve a specific problem.
Other companies are building similar solutions.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249816</id>
	<title>Re:And this is front page news, why?</title>
	<author>Monkeedude1212</author>
	<datestamp>1266917580000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>2</modscore>
	<htmltext><p>I suppose then why would we care if any site made any random change to any part of its infrastructure?</p><p>Twitter is a -very- busy site.</p><p>They are changing their infrastructure to accomodate. Here's what they looked at, here is what they chose. If you are looking for something with equal performance, you don't have to shop around.</p></htmltext>
<tokentext>I suppose then why would we care if any site made any random change to any part of its infrastructure ? Twitter is a -very- busy site.They are changing their infrastructure to accomodate .
Here 's what they looked at , here is what they chose .
If you are looking for something with equal performance , you do n't have to shop around .</tokentext>
<sentencetext>I suppose then why would we care if any site made any random change to any part of its infrastructure?Twitter is a -very- busy site.They are changing their infrastructure to accomodate.
Here's what they looked at, here is what they chose.
If you are looking for something with equal performance, you don't have to shop around.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248842</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251208</id>
	<title>Re:pfffft twatter tweeter</title>
	<author>Anonymous</author>
	<datestamp>1266922560000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>In principle, a database itself has no cost associated with integrity compared to Cassandra or the others. If you do away with foreign keys, the only "slowdown" would be due to primary/unique key constraints, which *any* map type storage with incur, because checking unicity is O(1) if you're indexing at the same time. Now, there *is* a cost associated with transactional integrity, but that is a latency and not throughput problem. To simplify, if you require transactional integrity, you need to flush and thus wait for the seek and the platter to rotate to wherever the data needs to be written. If matters little if you're committing 1 or 1000 transactions, once you're there the disk bandwidth can take it, it's getting there that is the issue, and that means latency (ignoring SSDs).</p><p>Why does it matter then?</p><p>Because every single DB interface in existence is synchronous. So while the DB can handle 1M TXs a second, that would require 10000 threads on the web side, each working for a negligible amount of time and waiting for 10ms. And that doesn't work.</p><p>Since they can't fix that, the only option is to have the request complete in<nobr> <wbr></nobr>.1 ms. Then you only need 100 threads. But then you need to do away with transactional integrity. They decided they could do away with that. Fair enough. But the problem there is not RDBMS themselves, it's the sorry APIs and drivers we have to work with.</p></htmltext>
<tokentext>In principle , a database itself has no cost associated with integrity compared to Cassandra or the others .
If you do away with foreign keys , the only " slowdown " would be due to primary/unique key constraints , which * any * map type storage with incur , because checking unicity is O ( 1 ) if you 're indexing at the same time .
Now , there * is * a cost associated with transactional integrity , but that is a latency and not throughput problem .
To simplify , if you require transactional integrity , you need to flush and thus wait for the seek and the platter to rotate to wherever the data needs to be written .
If matters little if you 're committing 1 or 1000 transactions , once you 're there the disk bandwidth can take it , it 's getting there that is the issue , and that means latency ( ignoring SSDs ) .Why does it matter then ? Because every single DB interface in existence is synchronous .
So while the DB can handle 1M TXs a second , that would require 10000 threads on the web side , each working for a negligible amount of time and waiting for 10ms .
And that does n't work.Since they ca n't fix that , the only option is to have the request complete in .1 ms. Then you only need 100 threads .
But then you need to do away with transactional integrity .
They decided they could do away with that .
Fair enough .
But the problem there is not RDBMS themselves , it 's the sorry APIs and drivers we have to work with .</tokentext>
<sentencetext>In principle, a database itself has no cost associated with integrity compared to Cassandra or the others.
If you do away with foreign keys, the only "slowdown" would be due to primary/unique key constraints, which *any* map type storage with incur, because checking unicity is O(1) if you're indexing at the same time.
Now, there *is* a cost associated with transactional integrity, but that is a latency and not throughput problem.
To simplify, if you require transactional integrity, you need to flush and thus wait for the seek and the platter to rotate to wherever the data needs to be written.
If matters little if you're committing 1 or 1000 transactions, once you're there the disk bandwidth can take it, it's getting there that is the issue, and that means latency (ignoring SSDs).Why does it matter then?Because every single DB interface in existence is synchronous.
So while the DB can handle 1M TXs a second, that would require 10000 threads on the web side, each working for a negligible amount of time and waiting for 10ms.
And that doesn't work.Since they can't fix that, the only option is to have the request complete in .1 ms. Then you only need 100 threads.
But then you need to do away with transactional integrity.
They decided they could do away with that.
Fair enough.
But the problem there is not RDBMS themselves, it's the sorry APIs and drivers we have to work with.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249450</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248370</id>
	<title>Cassandra, eh?</title>
	<author>maugle</author>
	<datestamp>1266956100000</datestamp>
	<modclass>Funny</modclass>
	<modscore>4</modscore>
	<htmltext>I hear Cassandra can even predict when disastrous system failures are going to occur!  Unfortunately, for some reason nobody ever believes the warnings.</htmltext>
<tokentext>I hear Cassandra can even predict when disastrous system failures are going to occur !
Unfortunately , for some reason nobody ever believes the warnings .</tokentext>
<sentencetext>I hear Cassandra can even predict when disastrous system failures are going to occur!
Unfortunately, for some reason nobody ever believes the warnings.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249292</id>
	<title>Re:hmmm</title>
	<author>clarkkent09</author>
	<datestamp>1266958680000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>Yeah, but in those cases if something horrible happens and all data gets deleted, nothing of value will be lost, whereas with slashdot........ok never mind</htmltext>
<tokentext>Yeah , but in those cases if something horrible happens and all data gets deleted , nothing of value will be lost , whereas with slashdot........ok never mind</tokentext>
<sentencetext>Yeah, but in those cases if something horrible happens and all data gets deleted, nothing of value will be lost, whereas with slashdot........ok never mind</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248442</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248482</id>
	<title>Re:Don't believe them!</title>
	<author>Push Latency</author>
	<datestamp>1266956460000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>I took an axe to my last Cassandra cluster and feel quite better now.</htmltext>
<tokentext>I took an axe to my last Cassandra cluster and feel quite better now .</tokentext>
<sentencetext>I took an axe to my last Cassandra cluster and feel quite better now.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248230</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251430</id>
	<title>Java / JVM Wins Again ...</title>
	<author>zuperduperman</author>
	<datestamp>1266923340000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>It's fascinating how after initially being a posterboy for the post-Java revolution Twitter is gradually moving their architecture to the JVM, piece by piece.   I think it's actually a credit to them that they seem to have level heads and are evaluating technology on it's merits (where as if you talk to most of the ruby / python crowd they would rather stick toothpicks in their eyes than endorse a solution that involves java).</p></htmltext>
<tokentext>It 's fascinating how after initially being a posterboy for the post-Java revolution Twitter is gradually moving their architecture to the JVM , piece by piece .
I think it 's actually a credit to them that they seem to have level heads and are evaluating technology on it 's merits ( where as if you talk to most of the ruby / python crowd they would rather stick toothpicks in their eyes than endorse a solution that involves java ) .</tokentext>
<sentencetext>It's fascinating how after initially being a posterboy for the post-Java revolution Twitter is gradually moving their architecture to the JVM, piece by piece.
I think it's actually a credit to them that they seem to have level heads and are evaluating technology on it's merits (where as if you talk to most of the ruby / python crowd they would rather stick toothpicks in their eyes than endorse a solution that involves java).</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31257162</id>
	<title>Re:pfffft twatter tweeter</title>
	<author>Anonymous</author>
	<datestamp>1265107500000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Cause SQL doesnt scale proper.<br>Check out Ars Technica's feature about NoSQL :</p><p><a href="http://arstechnica.com/business/data-centers/2010/02/-since-the-rise-of.ars" title="arstechnica.com" rel="nofollow">http://arstechnica.com/business/data-centers/2010/02/-since-the-rise-of.ars</a> [arstechnica.com]</p></htmltext>
<tokentext>Cause SQL doesnt scale proper.Check out Ars Technica 's feature about NoSQL : http : //arstechnica.com/business/data-centers/2010/02/-since-the-rise-of.ars [ arstechnica.com ]</tokentext>
<sentencetext>Cause SQL doesnt scale proper.Check out Ars Technica's feature about NoSQL :http://arstechnica.com/business/data-centers/2010/02/-since-the-rise-of.ars [arstechnica.com]</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31254618</id>
	<title>Re:pfffft twatter tweeter</title>
	<author>foxylad</author>
	<datestamp>1266939720000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>&gt; Is there really a huge issue with rdbms speeds? Well if there is something there, that's what needs to be looked at. If RDBMSs are not fast enough, that's just an opportunity to work more on them to speed them up.</p><p>To my mind, it's scaling rather than speed that is the issue. Having seen an RDBMS web app grow in popularity until we needed two DB machines, I have some inkling of how painful that transition is. So now I use Appengine when I can, which scales completely painlessly. There is a trade-off because you have to un-learn long-held habits (like normalisation), but if my app hits Oprah I'll be listening to champagne corks popping, not processors.</p></htmltext>
<tokentext>&gt; Is there really a huge issue with rdbms speeds ?
Well if there is something there , that 's what needs to be looked at .
If RDBMSs are not fast enough , that 's just an opportunity to work more on them to speed them up.To my mind , it 's scaling rather than speed that is the issue .
Having seen an RDBMS web app grow in popularity until we needed two DB machines , I have some inkling of how painful that transition is .
So now I use Appengine when I can , which scales completely painlessly .
There is a trade-off because you have to un-learn long-held habits ( like normalisation ) , but if my app hits Oprah I 'll be listening to champagne corks popping , not processors .</tokentext>
<sentencetext>&gt; Is there really a huge issue with rdbms speeds?
Well if there is something there, that's what needs to be looked at.
If RDBMSs are not fast enough, that's just an opportunity to work more on them to speed them up.To my mind, it's scaling rather than speed that is the issue.
Having seen an RDBMS web app grow in popularity until we needed two DB machines, I have some inkling of how painful that transition is.
So now I use Appengine when I can, which scales completely painlessly.
There is a trade-off because you have to un-learn long-held habits (like normalisation), but if my app hits Oprah I'll be listening to champagne corks popping, not processors.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252622</id>
	<title>Re:And this is front page news, why?</title>
	<author>DragonWriter</author>
	<datestamp>1266928620000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><blockquote><div><p>Why is it that whenever twitter makes any random change to some part of its infrastructure that we need a front page story about it?</p></div></blockquote><p>Because in some areas Twitter is at an extreme of scale, so what they are doing to deal with that extreme of scale (even if it isn't necessarily always the <i>ideal</i> choice) is usually interesting since, if you are looking for things that have been done in production to deal with the kind of scaling they experience, there aren't a lot of other data points to find.</p>
	</htmltext>
<tokenext>Why is it that whenever twitter makes any random change to some part of its infrastructure that we need a front page story about it ?
Because in some areas Twitter is at an extreme of scale , so what they are doing to deal with that extreme of scale ( even if it is n't necessarily always the ideal choice ) is usually interesting since , if you are looking for things that have been done in production to deal with the kind of scaling they experience , there are n't a lot of other data points to find .</tokentext>
<sentencetext>Why is it that whenever twitter makes any random change to some part of its infrastructure that we need a front page story about it?
Because in some areas Twitter is at an extreme of scale, so what they are doing to deal with that extreme of scale (even if it isn't necessarily always the ideal choice) is usually interesting since, if you are looking for things that have been done in production to deal with the kind of scaling they experience, there aren't a lot of other data points to find.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248842</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31255320</id>
	<title>Re:pfffft twatter tweeter</title>
	<author>maraist</author>
	<datestamp>1266944760000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>RDBMSs are optimized for READS, not writes.  You can produce a 1,000-machine MySQL-InnoDB cluster that will be faster than memcached and be fully ACID compliant.  But you'll only ever have 1 write node.  You CAN do sharded masters with interleaved auto-incremented values, but then your foreign keys are totally out the window - as is your ACIDity.  Oracle has clustered lock managers, but is very quickly going to max out its scalability - especially if it's limited to a single SAN.<br> <br>Relatively expensive 15,000 RPM disks are going to max out near 15,000 random seeks per second.  RAID-10 or even RAID-50 (if you're sadistic) is only going to give you a small constant multiplier on this performance.  And if you're maxing out said items, then the SCSI queueing and multi-gig RAID-controller memory cards will buy you mere seconds of peak performance... utterly useless for sustained writes.<br> <br>SSDs alleviate some of the disk-based limitations, EXCEPT that you are constrained by three factors: 1) disk size, 2) SSDs like large block sizes, 3) SSDs can't write to one location too often. Thus modern high-performance SSDs do address remapping, which eventually degrades the overall performance.  And ironically, while random I/O is faster on SSDs than disks, linear writes seem to be faster on disks than SSDs.  Obviously SSDs are still in their infancy.  The game may change any year now.<br> <br>The two core write-scaling problems above are the inter-table dependencies (the foreign keys) and the random I/O necessary for disk hash-table or B+Tree backing-store layouts (factorial layouts alleviate this somewhat).  This also applies to reads when you do large M-way joins of multi-giga-record tables.  You're essentially requiring over a billion disk seeks to satisfy a single query - completely unmanageable.  
Yes, there are ways to re-architect the product to mitigate this type of query (denormalization, externalized batch journaling, etc.) - but your argument was that RDBMSs solve problems - this is a problem that requires hackery to avoid the intrinsic flaw in the RDBMS foreign-key/join-key architecture.<br> <br>Google's BigTable paradigm does away with the need for foreign keys by simply providing a 1-to-many relationship as a 3rd dimension on the simple flat table.  Yes, this doesn't solve 4D problems, but MOST RDBMSs could be done away with by simply living in this 3D space.<br> <br>You could achieve this pattern with existing RDBMSs simply by storing a hash-map in a blob column.  But this would not be efficient at all.  You'd have to lock the entire row and rewrite the entire blob to change a single value.<br> <br>BigTable gives you MVCC on each 3D key-value pair with locking at the primary key of the row.  It's a column-oriented database (of which there are many among RDBMSs), but almost all the real meat is in this 3rd dimension, which is stored in a versioned, replicated, append-only manner.<br> <br>The append-only immutable data naturally fits a read-scaling model... once saved to disk, you replicate the recordset to dozens, if not hundreds or thousands, of machines (typically on a copy-on-cache-miss model when hitting one of a thousand servers).  You then leverage the map-reduce model to make sure you catch any writing nodes for the given column of interest; then, on the reduce, you choose the newest version.  Thus you have consistency (unlike some scalable approaches that do an 'eventually consistent' model).<br> <br>Because of the map-reduced MVCC, you can then shard out writes to random machines... it literally doesn't matter where the inserts/updates/deletes get written, because on the next read only the newest version will be passed to the client.  
There is some contention in centrally managing which nodes are doing writes, but at least you can spread writes across a dozen machines per column... and with, say, a dozen columns, that means spreading writes across a hundred nodes.  And with a dozen tables, you're over a thousand write nodes (across multiple data-centers or at least isolated networks) (though obviously you</htmltext>
<tokenext>RDBMS 's are optimized for READS , not writes .
You can produce a 1,000 machine mysql-INNODB cluster that will be faster than memcached and be fully ACID compliant .
But you 'll only ever have 1 write node .
You CAN do sharded masters with interleaved auto-incremented values , but then your foreign keys are totally out the window - as is your ACIDity .
Oracle has clustered lock managers , but very quickly is going to max out its scalability - especially if it 's limited to a single SAN .
Relatively expensive 15,000 RPM disks are going to max out near 15,000 random seeks per second .
RAID-10 or even RAID-50 ( if you 're sadistic ) is only going to give you a small constant multiplier to this performance .
And if you 're maxing out said items , then the SCSI queueing and multi-gig RAID-controller memory cards will buy you mere seconds of peak-performance.. Utterly useless in sustained writes .
SSD 's alleviate some of the disk-based limitations , EXCEPT that you are constrained by three factors.. 1 ) Disk size 2 ) SSD 's like large block-sizes 3 ) SSD 's ca n't write to one location too often .
Thus the modern high performance SSDs do address remapping , which eventually degrades the overall performance .
And ironically , while random-IO is faster on SSD 's than disks , linear writes seem to be faster on disks than SSDs .
Obviously SSD 's are still in their infancy .
The game may change any year now .
The two core write-scaling problems above are the inter-table dependencies ( the foreign keys ) and the random-IO necessary for diskhashtable or B + Tree backingstore layouts ( factorial-layouts alleviate this somewhat ) .
This also applies to read when you do large M-way joins of multi-giga-record tables .
You 're essentially requiring over a billion disk seeks to satisfy a single query - completely unmanageable .
Yes there were ways to re-architect the product to mitigate this type of query ( denormalization , externalized batch journaling , etc ) - but your argument was that RDBMS 's solve problems - this is a problem that requires hackery to avoid the intrinsic flaw in RDBMS 's foreign key/join-key architecture .
Google 's BigTable paradigm does away with the need for foreign keys by simply providing a 1-to-many relationship as a 3'rd dimension to the simple flat table .
Yes this does n't solve 4D problems , but MOST RDBMS 's could be done away with by simply living in this 3D space .
You could achieve this pattern with existing RDBMS 's simply by storing a hash-map in a blob column .
But this would not be efficient at all .
You 'd have to lock the entire row and rewrite the entire blob to change a single value .
BigTable gives you MVCC on each 3D key-value pair with locking to the primary-key of the row .
It 's a column-oriented database ( of which there are many in RDBMSs ) , but almost all the real meat is in this 3'rd dimension which is stored in a versioned , replicated , append-only manner .
The append-only immutable data naturally fits a read scaling model.. Once saved to disk , you replicate the recordset to dozens , if not hundreds or thousands of machines ( typically on a copy-on-cache-miss model when hitting one of a thousand servers ) .
You then leverage the map-reduce model , to make sure you catch any writing nodes for the given column of interest , then on the reduce , you choose the newest version .
Thus you have consistency ( unlike some scalable approaches that do an 'eventually consistent ' model ) .
Because of the map-reduced MVCC , you can then shard out writes to random machines.. It literally does n't matter where the inserts/updates/deletes get written to , because on the next read , only the newest version will be passed to the client .
There is some contention in centrally managing which nodes are doing writes , but at least you can spread writes to at least a dozen machines per column.. And with say a dozen columns , that means spreading writes across a hundred nodes .
And with a dozen tables , you 're over a thousand write nodes ( across multiple data-centers or at least isolated networks ) ( though obviously you</tokentext>
<sentencetext>RDBMS's are optimized for READS, not writes.
You can produce a 1,000 machine mysql-INNODB cluster that will be faster than memcached and be fully ACID compliant.
But you'll only ever have 1 write node.
You CAN do sharded masters with interleaved auto-incremented values, but then your foreign keys are totally out the window - as is your ACIDity.
Oracle has clustered lock managers, but very quickly is going to max out its scalability - especially if it's limited to a single SAN.
Relatively expensive 15,000 RPM disks are going to max out near 15,000 random seeks per second.
RAID-10 or even RAID-50 (if you're sadistic) is only going to give you a small constant multiplier to this performance.
And if you're maxing out said items, then the SCSI queueing and multi-gig RAID-controller memory cards will buy you mere seconds of peak-performance.. Utterly useless in sustained writes.
SSD's alleviate some of the disk-based limitations, EXCEPT that you are constrained by three factors.. 1) Disk size 2) SSD's like large block-sizes  3) SSD's can't write to one location too often.
Thus the modern high performance SSDs do address remapping, which eventually degrades the overall performance.
And ironically, while random-IO is faster on SSD's than disks, linear writes seem to be faster on disks than SSDs.
Obviously SSD's are still in their infancy.
The game may change any year now.
The two core write-scaling problems above are the inter-table dependencies (the foreign keys) and the random-IO necessary for diskhashtable or B+Tree backingstore layouts (factorial-layouts alleviate this somewhat).
This also applies to read when you do large M-way joins of multi-giga-record tables.
You're essentially requiring over a billion disk seeks to satisfy a single query - completely unmanageable.
Yes there were ways to re-architect the product to mitigate this type of query (denormalization, externalized batch journaling, etc) - but your argument was that RDBMS's solve problems - this is a problem that requires hackery to avoid the intrinsic flaw in RDBMS's foreign key/join-key architecture.
Google's BigTable paradigm does away with the need for foreign keys by simply providing a 1-to-many relationship as a 3'rd dimension to the simple flat table.
Yes this doesn't solve 4D problems, but MOST RDBMS's could be done away with by simply living in this 3D space.
You could achieve this pattern with existing RDBMS's simply by storing a hash-map in a blob column.
But this would not be efficient at all.
You'd have to lock the entire row and rewrite the entire blob to change a single value.
BigTable gives you MVCC on each 3D key-value pair with locking to the primary-key of the row.
It's a column-oriented database (of which there are many in RDBMSs), but almost all the real meat is in this 3'rd dimension which is stored in a versioned, replicated, append-only manner.
The append-only immutable data naturally fits a read scaling model.. Once saved to disk, you replicate the recordset to dozens, if not hundreds or thousands of machines (typically on a copy-on-cache-miss model when hitting one of a thousand servers).
You then leverage the map-reduce model, to make sure you catch any writing nodes for the given column of interest, then on the reduce, you choose the newest version.
Thus you have consistency (unlike some scalable approaches that do an 'eventually consistent' model).
Because of the map-reduced MVCC, you can then shard out writes to random machines.. It literally doesn't matter where the inserts/updates/deletes get written to, because on the next read, only the newest version will be passed to the client.
There is some contention in centrally managing which nodes are doing writes, but at least you can spread writes to at least a dozen machines per column.. And with say a dozen columns, that means spreading writes across a hundred nodes.
And with a dozen tables, you're over a thousand write nodes (across multiple data-centers or at least isolated networks) (though obviously you</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31254070</id>
	<title>Re:Twitter needs scalability experts</title>
	<author>Eil</author>
	<datestamp>1266935820000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><blockquote><div><p>I laugh at how none of the requirements included disaster recovery. No single point of failure does not preclude failing at every point simultaneously. EMP bomb at your primary datacenter anyone?</p></div></blockquote><p>1) They never said they didn't plan for disaster recovery. It's silly to deride them for not discussing the entirety of their backups and disaster recovery efforts when the whole topic of the article was their move to Cassandra as a primary data store.</p><p>2) Disaster recovery looks at realistic threat scenarios. Fire, sabotage, natural disaster, and so on. "EMP bomb at your primary datacenter" is wholly unlikely. Nobody can plan for failure at all points simultaneously because "all points" includes everything in their entire operation including backups and redundant systems. What do you want them to do, make hourly offline backups and bury the tapes under a mountain in China? The point of DR is to make your systems diverse, redundant, and operable against a broad category of general failures. Not fully invulnerable to every random specific <a href="http://en.wikipedia.org/wiki/Movie_plot_threat" title="wikipedia.org">movie plot threat</a> [wikipedia.org] someone happens to come up with.</p>
	</htmltext>
<tokenext>I laugh at how none of the requirements included disaster recovery .
No single point of failure does not preclude failing at every point simultaneously .
EMP bomb at your primary datacenter anyone ?
1 ) They never said they did n't plan for disaster recovery .
It 's silly to deride them for not discussing the entirety of their backups and disaster recovery efforts when the whole topic of the article was their move to Cassandra as a primary data store .
2 ) Disaster recovery looks at realistic threat scenarios .
Fire , sabotage , natural disaster , and so on .
" EMP bomb at your primary datacenter " is wholly unlikely .
Nobody can plan for failure at all points simultaneously because " all points " includes everything in their entire operation including backups and redundant systems .
What do you want them to do , make hourly offline backups and bury the tapes under a mountain in China ?
The point of DR is to make your systems diverse , redundant , and operable against a broad category of general failures .
Not fully invulnerable to every random specific movie plot threat [ wikipedia.org ] someone happens to come up with .</tokentext>
<sentencetext>I laugh at how none of the requirements included disaster recovery.
No single point of failure does not preclude failing at every point simultaneously.
EMP bomb at your primary datacenter anyone?
1) They never said they didn't plan for disaster recovery.
It's silly to deride them for not discussing the entirety of their backups and disaster recovery efforts when the whole topic of the article was their move to Cassandra as a primary data store.
2) Disaster recovery looks at realistic threat scenarios.
Fire, sabotage, natural disaster, and so on.
"EMP bomb at your primary datacenter" is wholly unlikely.
Nobody can plan for failure at all points simultaneously because "all points" includes everything in their entire operation including backups and redundant systems.
What do you want them to do, make hourly offline backups and bury the tapes under a mountain in China?
The point of DR is to make your systems diverse, redundant, and operable against a broad category of general failures.
Not fully invulnerable to every random specific movie plot threat [wikipedia.org] someone happens to come up with.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251402</id>
	<title>Re:Twitter needs scalability experts</title>
	<author>ryansking</author>
	<datestamp>1266923220000</datestamp>
	<modclass>Informativ</modclass>
	<modscore>2</modscore>
	<htmltext>You're right, I failed to mention disaster recovery &ndash; it was something we looked at; it's just been a while since we went through the evaluation process, so I've forgotten a few things.

We actually liked Cassandra for DR scenarios &ndash; the snapshot functionality makes backups relatively straightforward, plus multi-DC support will make operational continuity in the case of losing a whole DC a possibility.</htmltext>
<tokenext>You 're right , I failed to mention disaster recovery &ndash; it was something we looked at ; it 's just been a while since we went through the evaluation process , so I 've forgotten a few things .
We actually liked Cassandra for DR scenarios &ndash; the snapshot functionality makes backups relatively straightforward , plus multi-DC support will make operational continuity in the case of losing a whole DC a possibility .</tokentext>
<sentencetext>You're right, I failed to mention disaster recovery &ndash; it was something we looked at; it's just been a while since we went through the evaluation process, so I've forgotten a few things.
We actually liked Cassandra for DR scenarios &ndash; the snapshot functionality makes backups relatively straightforward, plus multi-DC support will make operational continuity in the case of losing a whole DC a possibility.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250500</id>
	<title>Re:Cassandra, eh?</title>
	<author>Anonymous</author>
	<datestamp>1266920220000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>You mean the Fail Whale Watching component?</p></htmltext>
<tokenext>You mean the Fail Whale Watching component ?</tokentext>
<sentencetext>You mean the Fail Whale Watching component?</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248370</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248654</id>
	<title>Huzzzah!</title>
	<author>Anonymous</author>
	<datestamp>1266956880000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>I look forward to a brand new twitter that randomly doesn't display expected data and sometimes doesn't take my status updates!</p></htmltext>
<tokenext>I look forward to a brand new twitter that randomly does n't display expected data and sometimes does n't take my status updates !</tokentext>
<sentencetext>I look forward to a brand new twitter that randomly doesn't display expected data and sometimes doesn't take my status updates!</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104</id>
	<title>Twitter needs scalability experts</title>
	<author>Heretic2</author>
	<datestamp>1266918660000</datestamp>
	<modclass>Interestin</modclass>
	<modscore>5</modscore>
	<htmltext><p>I love how ass backwards twitter has always been with learning how to scale their 90s infrastructure up. I remember when they called out the Ruby community because they didn't understand MySQL replication and memcached.</p><p>I guess without a profit model they couldn't use a real RDBMS like Oracle. EFD (Enterprise Flash Drive) support anyone?  11g supports EFD on native SSD block-levels.  Write scale?  How about 1+ million transactions/sec on a single node Oracle DB using &lt;$100K worth of equipment and licenses? Anyway, I've built HUGE databases for a long time, odds are most of you have interfaced with them. Just because it's free and open-source doesn't make it cheap.</p><p>I love FOSS don't get me wrong, but best-in-class is best-in-class.  I only use FOSS when it happens to be best-in-class.  I laugh at how none of the requirements included disaster recovery.  No single point of failure does not preclude failing at every point simultaneously.  EMP bomb at your primary datacenter anyone?</p></htmltext>
<tokenext>I love how ass backwards twitter has always been with learning how to scale their 90s infrastructure up .
I remember when they called out the Ruby community because they did n't understand MySQL replication and memcached .
I guess without a profit model they could n't use a real RDBMS like Oracle .
EFD ( Enterprise Flash Drive ) support anyone ?
11g supports EFD on native SSD block-levels .
Write scale ?
How about 1 + million transactions/sec on a single node Oracle DB using &lt;$100K worth of equipment and licenses ?
Anyway , I 've built HUGE databases for a long time , odds are most of you have interfaced with them .
Just because it 's free and open-source does n't make it cheap .
I love FOSS do n't get me wrong , but best-in-class is best-in-class .
I only use FOSS when it happens to be best-in-class .
I laugh at how none of the requirements included disaster recovery .
No single point of failure does not preclude failing at every point simultaneously .
EMP bomb at your primary datacenter anyone ?</tokentext>
<sentencetext>I love how ass backwards twitter has always been with learning how to scale their 90s infrastructure up.
I remember when they called out the Ruby community because they didn't understand MySQL replication and memcached.
I guess without a profit model they couldn't use a real RDBMS like Oracle.
EFD (Enterprise Flash Drive) support anyone?
11g supports EFD on native SSD block-levels.
Write scale?
How about 1+ million transactions/sec on a single node Oracle DB using &lt;$100K worth of equipment and licenses?
Anyway, I've built HUGE databases for a long time, odds are most of you have interfaced with them.
Just because it's free and open-source doesn't make it cheap.
I love FOSS don't get me wrong, but best-in-class is best-in-class.
I only use FOSS when it happens to be best-in-class.
I laugh at how none of the requirements included disaster recovery.
No single point of failure does not preclude failing at every point simultaneously.
EMP bomb at your primary datacenter anyone?</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249014</id>
	<title>Re:And this is front page news, why?</title>
	<author>TheTyrannyOfForcedRe</author>
	<datestamp>1266957900000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>It's interesting because Twitter is one of the Big Guys and it's cool to know what the Big Guys are up to.  Also, a lot of us maintain Twitter based websites and/or apps.</htmltext>
<tokenext>It 's interesting because Twitter is one of the Big Guys and it 's cool to know what the Big Guys are up to .
Also , a lot of us maintain Twitter based websites and/or apps .</tokentext>
<sentencetext>It's interesting because Twitter is one of the Big Guys and it's cool to know what the Big Guys are up to.
Also, a lot of us maintain Twitter based websites and/or apps.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248842</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248230</id>
	<title>Don't believe them!</title>
	<author>Anonymous</author>
	<datestamp>1266955620000</datestamp>
	<modclass>Funny</modclass>
	<modscore>4</modscore>
	<htmltext>They keep saying that the Cassandra database is better, but somehow I don't believe them.  I can't imagine they know what they're talking about.  Maybe in the long-term they'll be proven right but I really don't think they are.  I don't know why, though...<p>
heh heh heh.</p></htmltext>
<tokenext>They keep saying that the Cassandra database is better , but somehow I do n't believe them .
I ca n't imagine they know what they 're talking about .
Maybe in the long-term they 'll be proven right but I really do n't think they are .
I do n't know why , though.. . heh heh heh .</tokentext>
<sentencetext>They keep saying that the Cassandra database is better, but somehow I don't believe them.
I can't imagine they know what they're talking about.
Maybe in the long-term they'll be proven right but I really don't think they are.
I don't know why, though...
heh heh heh.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31261902</id>
	<title>he he he, hit the soft spot, did I, twater?</title>
	<author>roman_mir</author>
	<datestamp>1265137020000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>http://slashdot.org/~roman_mir/comments - I imagine the twater storm of moderation points was spent well this time; every single post I had on this issue was above 3 points, and now, within 1 hour, all comments were moderated down.  To me that's just funny - someone does not like the truth.</p><p>I just wonder: is it the twater birds, or does it have something to do with the nosql ideologists?</p></htmltext>
<tokenext>http : //slashdot.org/ ~ roman_mir/comments - I imagine the twater storm of moderation points was spent well this time , every single post I had on this issue was above 3 points and now within 1 hour , all comments were moderated down .
To me that 's just funny - someone does not like the truth .
I just wonder is it the twater birds or does it have something to do with the nosql ideologists ?</tokentext>
<sentencetext>http://slashdot.org/~roman_mir/comments - I imagine the twater storm of moderation points was spent well this time, every single post I had on this issue was above 3 points and now within 1 hour, all comments were moderated down.
To me that's just funny - someone does not like the truth.
I just wonder is it the twater birds or does it have something to do with the nosql ideologists?</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252814</id>
	<title>Re:And this is front page news, why?</title>
	<author>DragonWriter</author>
	<datestamp>1266929460000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><blockquote><div><p>Does Twitter really have loads which are more difficult to manage than, say, the BBC, CNN, Google, or Wikipedia?</p></div></blockquote><p>(1) In some measures, probably;<br>(2) When Google or Wikipedia makes announcements about technology (whether it's a "change" or not) they use in their backend, that's often a front-page story on Slashdot, too. The BBC and CNN don't, AFAIK, tend to make big public announcements about back-end technology.</p><blockquote><div><p>I would have thought serving up a fairly straightforward page, a stylesheet, a background image and the tweets or twits or whatever they're called can't be that difficult compared to, say, Facebook.</p></div></blockquote><p>Processing the tweets is the big scale issue at Twitter, AFAIK, and while Facebook does something similar with its status updates, ISTR that the scale at Twitter is bigger. But it's not really a big issue either way, as when Facebook talks about their technology backend, that also gets attention from Slashdot.</p>
	</htmltext>
<tokenext>Does Twitter really have loads which are more difficult to manage than , say , the BBC , CNN , Google , or Wikipedia ?
( 1 ) In some measures , probably ; ( 2 ) When Google or Wikipedia makes announcements about technology ( whether it 's a " change " or not ) they use in their backend , that 's often a front-page story on Slashdot , too .
The BBC and CNN do n't , AFAIK , tend to make big public announcements about back-end technology .
I would have thought serving up a fairly straightforward page , a stylesheet , a background image and the tweets or twits or whatever they 're called ca n't be that difficult compared to , say , Facebook .
Processing the tweets is the big scale issue at Twitter , AFAIK , and while Facebook does something similar with its status updates , ISTR that the scale at Twitter is bigger .
But it 's not really a big issue either way , as when Facebook talks about their technology backend , that also gets attention from Slashdot .</tokentext>
<sentencetext>Does Twitter really have loads which are more difficult to manage than, say, the BBC, CNN, Google, or Wikipedia?
(1) In some measures, probably; (2) When Google or Wikipedia makes announcements about technology (whether it's a "change" or not) they use in their backend, that's often a front-page story on Slashdot, too.
The BBC and CNN don't, AFAIK, tend to make big public announcements about back-end technology.
I would have thought serving up a fairly straightforward page, a stylesheet, a background image and the tweets or twits or whatever they're called can't be that difficult compared to, say, Facebook.
Processing the tweets is the big scale issue at Twitter, AFAIK, and while Facebook does something similar with its status updates, ISTR that the scale at Twitter is bigger.
But its not really a big issue either way, as when Facebook talks about their technology backend, that also gets attention from Slashdot.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250930</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252550</id>
	<title>Re:Java / JVM Wins Again ...</title>
	<author>DragonWriter</author>
	<datestamp>1266928260000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><blockquote><div><p>It's fascinating how after initially being a posterboy for the post-Java revolution Twitter is gradually moving their architecture to the JVM piece by piece.</p></div></blockquote><p>I think it's fascinating, too -- but probably in a very different way than you do. You seem to think that it is a repudiation of some mythical "post-Java revolution", when in many ways I think it is a validation of exactly the approach that was common to pushing Ruby, Python, and similar languages as more agile alternatives to Java. The appeal of tools noted for their suitability for rapid development of software that works and is maintainable, even if it isn't going to set any kind of performance records, is that they support getting new functionality (and, thus, often new <i>businesses</i>) off the ground, and support the kind of rapid change that is often necessary when a product is first exposed to a mass market and gets used in new and unexpected (by the developers) ways. The right time to optimize performance is often once the concept is validated; trying to do too much of that too early means you lose agility in the introduction and early development of the product.</p><p>Shifting, component by component, to more "enterprisey" solutions as a service/product matures is entirely consistent with that understanding.</p><blockquote><div><p>(where as if you talk to most of the ruby / python crowd they would rather stick toothpicks in their eyes than endorse a solution that involves java).</p></div></blockquote><p>I don't think that's particularly true. Sure, some of the people in any language community are going to be partisans for that language exclusively, but the Ruby community (which I'm more familiar with than the Python community) seems particularly friendly to Java as a platform, and to Ruby being used in the role of a "glue" language instead of an exclusive language.</p><p>In the case of the Ruby community, I think that the appearance of anti-Java sentiment there stems largely from the early days of <i>Rails</i>, where lots of people were pushing Rails by extolling (often in a rather hyperbolic manner) its virtues as compared to enterprise-oriented, XML-configuration-heavy, Java frameworks.</p>
	</htmltext>
<tokenext>It 's fascinating how after initially being a posterboy for the post-Java revolution Twitter is gradually moving their architecture to the JVM piece by piece.I think its fascinating , too -- but probably in a very different way than you do .
You seem to think that it is a repudiation of some mythical " post-Java revolution " , when in many ways I think it is a validation of exactly the approach that was common to pushing Ruby , Python , and similar languages as more agile alternatives to Java .
The appeal of tools noted for their suitability for rapid development of software that works and is maintainable , even if it is n't going to set any kind of performance records , is that it supports getting new functionality ( and , thus , often new businesses ) of the ground , and supports the kind of rapid change that is often necessary when a product is first exposed to a mass market , gets used in new and unexpected ( by the developers ) ways , etc. , and that the right time to optimize performance is often once the concept is validated , and trying to do too much of that too early means you lose agility in introduction and early development of the product.Shifting , component by component , to more " enterprisey " solutions as a service/product matures is entirely consistent with that understanding .
( where as if you talk to most of the ruby / python crowd they would rather stick toothpicks in their eyes than endorse a solution that involves java ) .I do n't think that 's particularly true .
Sure , some of the people in the any language community are going to be partisans for that language exclusively , but the Ruby community ( which I 'm more familiar with than the Python community ) seems particularly friendly to Java as a platform , and to Ruby being used in the role of a " glue " language instead of an exclusive language.In the case of the Ruby community , I think that the appearance of anti-Java sentiment there stems largely from the the early days of Rails , where lots of people were pushing Rails by extolling ( often in a rather hyperbolic manner ) its virtues as compared to enterprise-oriented , XML-configuration-heavy , Java frameworks .</tokenext>
<sentencetext>It's fascinating how after initially being a posterboy for the post-Java revolution Twitter is gradually moving their architecture to the JVM piece by piece.I think its fascinating, too -- but probably in a very different way than you do.
You seem to think that it is a repudiation of some mythical "post-Java revolution", when in many ways I think it is a validation of exactly the approach that was common to pushing Ruby, Python, and similar languages as more agile alternatives to Java.
The appeal of tools noted for their suitability for rapid development of software that works and is maintainable, even if it isn't going to set any kind of performance records, is that they support getting new functionality (and, thus, often new businesses) off the ground, and support the kind of rapid change that is often necessary when a product is first exposed to a mass market, gets used in new and unexpected (by the developers) ways, etc., and that the right time to optimize performance is often once the concept is validated, and trying to do too much of that too early means you lose agility in introduction and early development of the product.
Shifting, component by component, to more "enterprisey" solutions as a service/product matures is entirely consistent with that understanding.
(where as if you talk to most of the ruby / python crowd they would rather stick toothpicks in their eyes than endorse a solution that involves java).
I don't think that's particularly true.
Sure, some of the people in any language community are going to be partisans for that language exclusively, but the Ruby community (which I'm more familiar with than the Python community) seems particularly friendly to Java as a platform, and to Ruby being used in the role of a "glue" language instead of an exclusive language.
In the case of the Ruby community, I think that the appearance of anti-Java sentiment there stems largely from the early days of Rails, where lots of people were pushing Rails by extolling (often in a rather hyperbolic manner) its virtues as compared to enterprise-oriented, XML-configuration-heavy, Java frameworks.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251430</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31253880</id>
	<title>Re:Twitter needs scalability experts</title>
	<author>Bazouel</author>
	<datestamp>1266934800000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>I am curious what someone with your experience thinks of PostgreSQL. Would you say that it can scale properly, as Oracle does?</p><p>This is a genuine question, as I am deciding between the two for my startup. Even though I have already done my investigations, one more opinion cannot hurt :) Assuming my current DB design holds, it will have about 50 tables, most having fewer than 10,000 records and some having a few million records (they will be partitioned). The volume of reads will be much higher than writes. Write queries will involve mostly 1-2 tables and short transactions. Typical read queries will require many joins (though most can be cached or materialized, as the data is quite stale).</p></htmltext>
<tokenext>I am curious what someone with your experience thinks of PostgreSQL ?
Would you say that it can scale properly as Oracle does ? This is a genuine question as I am pondering between both for my startup .
Even thought I already done my investigations , one more opinion can not hurt : ) Assuming my current DB design holds , it will have about 50 tables , most having less than 10,000 records and some having few millions records ( they will be partitioned ) .
The volume of reads will be much higher than writes .
Write queries will involve mostly 1-2 tables and short transactions .
Typical read queries will require many joins ( thought most can be cached or materialized as the data is quite stale ) .</tokenext>
<sentencetext>I am curious what someone with your experience thinks of PostgreSQL ?
Would you say that it can scale properly, as Oracle does?
This is a genuine question, as I am deciding between the two for my startup.
Even though I have already done my investigations, one more opinion cannot hurt :) Assuming my current DB design holds, it will have about 50 tables, most having fewer than 10,000 records and some having a few million records (they will be partitioned).
The volume of reads will be much higher than writes.
Write queries will involve mostly 1-2 tables and short transactions.
Typical read queries will require many joins (though most can be cached or materialized, as the data is quite stale).</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252518</id>
	<title>Re:Too bad they don't know about TPF/ZTPF and TPFDB/ACPD</title>
	<author>einhverfr</author>
	<datestamp>1266928140000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Teradata seems to win typical OLTP and OLAP benchmarks.  I would think for airline reservations and such that would be my choice of platform.</p></htmltext>
<tokenext>Teradata seems to win typical OLTP and OLAP benchmarks .
I would think for airline reservations and such that would be my choice of platform .</tokenext>
<sentencetext>Teradata seems to win typical OLTP and OLAP benchmarks.
I would think for airline reservations and such that would be my choice of platform.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250600</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250600</id>
	<title>Too bad they don't know about TPF/ZTPF and TPFDB/ACPDB</title>
	<author>Anonymous</author>
	<datestamp>1266920640000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>It's always funny to read things written by people who obviously are inexperienced with high-volume transaction processing in the mainframe environment. The systems behind airline, rail, and hotel reservations, as well as emergency response messaging, are often built on IBM mainframes using TPF/ZTPF as the operating system and TPFDB (formerly known as ACPDB) as the underlying database. If someone would take the time to study TPFDB, they would notice its nonrelational character, as well as some interesting similarities to what the Cassandra developers unknowingly chose to do. By the way, these systems are happily handling 10K-12K transactions per second without bunny-farm racks of servers.</p><p>Sometimes progress is not always about what will be done, but understanding the benefits of older things that have been done.</p></htmltext>
<tokenext>It 's always funny to read things written by people who obviously are inexperienced with high volume transaction processing in the mainframe environment .
The systems behind airline , rail , and hotel reservations as well as emergency response messaging often are built on IBM mainframes using TPF/ZTPF as the operating system andTPFDB ( formerly known as ACPDB ) as the underlying database .
If someone would take the time to study TPFDB , they would notice its nonrelational character , as well as some interesting similarities to what the Cassandra developers unknowingly chose to do .
By the way , these systems are happily handling 10K-12K transactions per second without bunny farm racks of servers.Sometimes progress is not always about what will be done , but understanding the benefits of older things that have been done .</tokenext>
<sentencetext>It's always funny to read things written by people who obviously are inexperienced with high volume transaction processing in the mainframe environment.
The systems behind airline, rail, and hotel reservations as well as emergency response messaging often are built on IBM mainframes using TPF/ZTPF as the operating system and TPFDB (formerly known as ACPDB) as the underlying database.
If someone would take the time to study TPFDB, they would notice its nonrelational character, as well as some interesting similarities to what the Cassandra developers unknowingly chose to do.
By the way, these systems are happily handling 10K-12K transactions per second without bunny farm racks of servers.
Sometimes progress is not always about what will be done, but understanding the benefits of older things that have been done.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251350</id>
	<title>Re:Cassandra, eh?</title>
	<author>Hurricane78</author>
	<datestamp>1266923040000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Disastrous failure? Twitter? There&rsquo;s at least one joke in there somewhere. ^^</p></htmltext>
<tokenext>Disastrous failure ?
Twitter ? There 's at least one joke in there somewhere .
^ ^</tokenext>
<sentencetext>Disastrous failure?
Twitter? There’s at least one joke in there somewhere.
^^</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248370</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252652</id>
	<title>Open Source Parallel Databases</title>
	<author>cervo</author>
	<datestamp>1266928740000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>A lot of the complaints from the NoSQL camp seem to be that DBMSes are too slow and SQL is too hard.  And yet a lot of them invent query languages similar to SQL.  Supposedly Oracle scales up really well.  There is a paper that compares MapReduce to parallel databases, and Hadoop takes a huge beating from the RDBMSes in performance.  Now the funny thing is that Oracle was not included, yet most contend that if you pay enough, Oracle scales really well.  DB2 also scales: in 1999 I worked at a place with terabytes of database space, and they had a few nodes running DB2 on AIX boxes that seemed to be getting adequate performance.
<br> <br>
Most open source databases seem unable to compete with the likes of the commercial parallel databases, but it seems like an open source parallel database would do a lot to silence many NoSQL critics.  There is still the complaint about needing to define a schema; however, if you are not exploring the data and are processing the same data over and over again, it seems like a good idea to define a schema anyway, so that you can better detect files that don't conform.</htmltext>
<tokenext>A lot of the complaints from NoSQL seem to be regarding DBMSses being too slow and SQL being too hard .
And yet a lot of them invent query languages/query languages similar to SQL .
Supposedly Oracle scales up really well .
There is a paper that compares mapreduce to parallel databases and Hadoop takes a huge beating via the RDBMSes in performance .
Now the funny thing is that Oracle was not included , yet most content that if you pay enough Oracle scales really well .
DB2 also scales , because in 1999 I worked at a place with terabytes of database space and they had a few nodes running DB2 on AIX boxes and seemed to be getting adequate performance .
But most open sources databases seem to not be able to compete with the likes of the commercial parallel databases .
But it seems like an open source parallel database would do a lot to silence many nosql critics .
There is still the complaint about needing to define a schema , however if you are not exploring the data and are processing the same data over and over again , it seems like a good idea to define a schema anyway , that way you can better detect files that do n't conform .</tokenext>
<sentencetext>A lot of the complaints from NoSQL seem to be regarding DBMSses being too slow and SQL being too hard.
And yet a lot of them invent query languages/query languages similar to SQL.
Supposedly Oracle scales up really well.
There is a paper that compares mapreduce to parallel databases and Hadoop takes a huge beating via the RDBMSes in performance.
Now the funny thing is that Oracle was not included, yet most content that if you pay enough Oracle scales really well.
DB2 also scales, because in 1999 I worked at a place with terabytes of database space and they had a few nodes running DB2 on AIX boxes and seemed to be getting adequate performance.
But most open sources databases seem to not be able to compete with the likes of the commercial parallel databases.
But it seems like an open source parallel database would do a lot to silence many nosql critics.
There is still the complaint about needing to define a schema, however if you are not exploring the data and are processing the same data over and over again, it seems like a good idea to define a schema anyway, that way you can better detect files that don't conform.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249390</id>
	<title>Re:pfffft twatter tweeter</title>
	<author>Anonymous</author>
	<datestamp>1266915840000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>1</modscore>
	<htmltext><div class="quote"><p>Is there really a huge issue with rdbms speeds?  Well if there is something there, that's what needs to be looked at.  If RDBMSs are not fast enough, that's just an opportunity to work more on them to speed them up.</p></div><p>Surely that's the point. It isn't practical to scale RDBMSs up to the sort of scale you need for a huge website such as Amazon. The requirement to continue to meet all of the constraints of the relational model makes it very hard to split databases over a large cluster without a lock-bound hell. There are two solutions to this: either you spend a vast amount of effort trying to get the relational model to scale a bit, or you bite the bullet and relax the relational model's constraints.</p><p>Don't get me wrong - there are good reasons why the relational model has constraints in the data model to ensure ACID qualities. However, beyond a certain point it is easier to deal with the problems that come from using a different model than it is to stretch a conventional RDBMS and deal with the problems of keeping multiple distributed copies of data consistent.</p><p>Take the collection of user reviews and product pictures on a large site like Amazon. Does this need the analytical power of an RDBMS? No. Does it need something a lot more advanced than "flat files or an in-memory hash-map" in order to scale to heavy loads across multiple continents? Yes. That's the sort of thing NoSQL databases are working on.</p><p>In general your attitude reminds me of the people who thought personal computers would always be toys. "Proper work" would be done on mainframes/supercomputers and trivial office tasks may as well be done on paper. Well, mainframes/supercomputers are still faster than personal computers, but few people would claim the PC had no impact on the office.</p>
	</htmltext>
<tokenext>Is there really a huge issue with rdbms speeds ?
Well if there is something there , that 's what needs to be looked at .
If RDBMSs are not fast enough , that 's just an opportunity to work more on them to speed them up.Surely that 's the point .
It is n't possible to practically scale RDBMSs up to the sort of scale you need for a huge website such as Amazon .
The requirement to continue to meet all of the constraints of the relational model makes it very hard to split databases over a large cluster without a lock-bound hell .
There are two solutions to this - either you spend a vast amount of effort trying to get the relational model to scale a bit , or you bite the bullet and relax the relational model 's constraints.Do n't get me wrong - there are good reasons why the relational model has constraints in the data model to ensure ACID qualities .
However beyond a certain point it is easier to deal with the problems that come from using a different model than it is to stretch a conventional RDBMs and deal with the problems of keeping multiple distributed copies of data consistent.Take the collection of user reviews and product pictures on a large site like Amazon .
Does this need the analytical power of a RDBMS ?
No. Does it need something a lot more advanced then " flat files or an in memory hash-map " in order to scale to heavy loads across multiple continents ?
Yes. That 's the sort of thing NoSQL databases are working on.In general your attitude reminds me of the people who thought personal computers would always be toys .
" Proper work " would be done on mainframes/supercomputers and trivial office tasks may as well be done on paper .
Well , mainframes / supercomputers are still faster than personal computers , but few people would claim the PC had no impact on the office .</tokenext>
<sentencetext>Is there really a huge issue with rdbms speeds?
Well if there is something there, that's what needs to be looked at.
If RDBMSs are not fast enough, that's just an opportunity to work more on them to speed them up.
Surely that's the point.
It isn't possible to practically scale RDBMSs up to the sort of scale you need for a huge website such as Amazon.
The requirement to continue to meet all of the constraints of the relational model makes it very hard to split databases over a large cluster without a lock-bound hell.
There are two solutions to this - either you spend a vast amount of effort trying to get the relational model to scale a bit, or you bite the bullet and relax the relational model's constraints.
Don't get me wrong - there are good reasons why the relational model has constraints in the data model to ensure ACID qualities.
However, beyond a certain point it is easier to deal with the problems that come from using a different model than it is to stretch a conventional RDBMS and deal with the problems of keeping multiple distributed copies of data consistent.
Take the collection of user reviews and product pictures on a large site like Amazon.
Does this need the analytical power of an RDBMS?
No.
Does it need something a lot more advanced than "flat files or an in memory hash-map" in order to scale to heavy loads across multiple continents?
Yes.
That's the sort of thing NoSQL databases are working on.
In general your attitude reminds me of the people who thought personal computers would always be toys.
"Proper work" would be done on mainframes/supercomputers and trivial office tasks may as well be done on paper.
Well, mainframes / supercomputers are still faster than personal computers, but few people would claim the PC had no impact on the office.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252274</id>
	<title>Re:Intersystems Cach&#233;</title>
	<author>Anonymous</author>
	<datestamp>1266926940000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>How does the write performance scale across several servers ("scale horizontally")? That is what NoSQL is all about.</p></htmltext>
<tokenext>How does the write performance scale across several servers ( " scale horizontally " ) ?
That is what NoSQL is all about .</tokenext>
<sentencetext>How does the write performance scale across several servers ("scale horizontally")?
That is what NoSQL is all about.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249600</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31253252</id>
	<title>Re:network issues?</title>
	<author>geniusj</author>
	<datestamp>1266931560000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>I haven't checked, but I'd bet that BinaryMemtable uses UDP, which, combined with its speed, could easily cause significant network saturation.</p></htmltext>
<tokenext>I have n't checked , but I 'd bet that BinaryMemtable uses UDP , when combined with the fast speed , could easily cause significant network saturation. .</tokenext>
<sentencetext>I haven't checked, but I'd bet that BinaryMemtable uses UDP, which, combined with its speed, could easily cause significant network saturation.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249966</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249938</id>
	<title>Re:Cassandra, eh?</title>
	<author>idontgno</author>
	<datestamp>1266918060000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>And, of course, when the system failure strikes, Cassandra will be blamed, not the underlying issues Cassandra warned of.</htmltext>
<tokenext>And , of course , when the system failure strikes , Cassandra will be blamed , not the underlying issues Cassandra warned of .</tokenext>
<sentencetext>And, of course, when the system failure strikes, Cassandra will be blamed, not the underlying issues Cassandra warned of.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248370</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251816</id>
	<title>Re:Too bad they don't know about TPF/ZTPF and TPFDB/ACPD</title>
	<author>Anonymous</author>
	<datestamp>1266924780000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Nice point.  Thanks for this.  Data processing/transactions are not really my area of expertise, but I've always worked with the thought that nothing I'm doing is new on a technical level.  This goes to show it.  What the F/OSS community should focus on, be it through research groups, is human-computer interaction.  This is a relatively new field of study - maybe 20 years old - and there's a lot less catching up to do.  My conspiracy-theory hat of yesteryear would probably take a stab that this is why Oracle cut funding to the accessibility projects of Sun/GNOME: just to extend the gap between free and commercial HCI offerings.</p></htmltext>
<tokenext>Nice point .
Thanks for this .
Data processing/transaction is not really my area of expertise , but I 've always worked with the thought that nothing I 'm doing is new on a technical level .
This goes to show it .
What the F/OSS community should focus on , be it through research groups is the human computer interaction .
This is a relatively new field of study - maybe 20 years old , and there 's a lot less catch-up .
My conspiracy theory hat of yester-year would probably take a stab that this is why oracle cut funding to the accessibility projects of sun/gnome .
Just to extend the gap between free and commercial HCI offerings .</tokenext>
<sentencetext>Nice point.
Thanks for this.
Data processing/transaction is not really my area of expertise, but I've always worked with the thought that nothing I'm doing is new on a technical level.
This goes to show it.
What the F/OSS community should focus on, be it through research groups, is human-computer interaction.
This is a relatively new field of study - maybe 20 years old, and there's a lot less catch-up.
My conspiracy theory hat of yesteryear would probably take a stab that this is why Oracle cut funding to the accessibility projects of Sun/GNOME.
Just to extend the gap between free and commercial HCI offerings.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250600</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250684</id>
	<title>Re:Twitter needs scalability experts</title>
	<author>mini me</author>
	<datestamp>1266920880000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Based on the information given, they are seeing about half a million transactions per second with this setup, but there's no word on what their cluster consists of. If it is just a handful of generic PCs, $100,000 for your setup looks pretty expensive.</p></htmltext>
<tokenext>They are seeing about 1/2 million transactions per second with this setup based on the information given , but no word of what their cluster consists of .
If it is just a handful of generic PCs , $ 100,000 for your setup looks pretty expensive .</tokenext>
<sentencetext>They are seeing about 1/2 million transactions per second with this setup based on the information given, but no word of what their cluster consists of.
If it is just a handful of generic PCs, $100,000 for your setup looks pretty expensive.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252746</id>
	<title>Re:pfffft twatter tweeter</title>
	<author>DragonWriter</author>
	<datestamp>1266929220000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><blockquote><div><p>I think their point is not everything needs an RDBMS, whereas before it was the 'go to' method of storing data.</p></div></blockquote><p>Except, of course, that it never was the "go to" method of storing data. There was no point in history where RDBMSes were anywhere close to the exclusive method of persisting data. Non-relational document-oriented storage has pretty much always dominated in the era in which relational databases existed, whether it was proprietary binary document formats, fairly direct text-based document formats, or highly structured (XML, etc.) text-based document formats.</p>
	</htmltext>
<tokenext>I think their point is not everything needs an RDBMS , whereas before it was the 'go to ' method of storing data.Except , of course , that it never was the " go to " method of storing data .
There was no point in history where RDBMS 's were anywhere close to the exclusive method of persisting data .
Non-relational document-oriented storage has pretty much always dominated in the era in which relational databases existed , whether it was proprietary binary document formats , fairly direct text-based document formats , or highly structures ( XML , etc .
) text-based document formats .</tokenext>
<sentencetext>I think their point is not everything needs an RDBMS, whereas before it was the 'go to' method of storing data.Except, of course, that it never was the "go to" method of storing data.
There was no point in history where RDBMS's were anywhere close to the exclusive method of persisting data.
Non-relational document-oriented storage has pretty much always dominated in the era in which relational databases existed, whether it was proprietary binary document formats, fairly direct text-based document formats, or highly structures (XML, etc.
) text-based document formats.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249250</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31255304</id>
	<title>Re:pfffft twatter tweeter</title>
	<author>artsrc</author>
	<datestamp>1266944580000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><tt>&gt; The more interesting aspect of all of this 'NoSQL' movement is how they believe that if they achieve some speed improvement against some relational databases, how that makes them so much better.<br><br>Or the most interesting aspect of the NoSQL movement is that many of the most successful web companies have rejected the SQL orthodoxy and achieved great success.&nbsp; As someone in a conservative, SQL-only environment this is interesting.<br><br>&gt; Is there really a huge issue with rdbms speeds?<br><br>There have always been issues with database speed; we have plenty.&nbsp; Some are best solved by adding an index, caching some results or re-writing a query.&nbsp; Some might be best solved by switching to Cassandra or using the file system.<br></tt></htmltext>
<tokenext>&gt; The more interesting aspect of all of this 'NoSQL ' movement is how they believe that if they achieve some speed improvement against some relational databases , how that makes them so much better.Or the most interesting aspect of the NoSQL movement is that many of the most successful web companies have rejected the SQL orthodoxy and achieved great success.   As someone in a conservative , SQL only , environment this is interesting. &gt; Is there really a huge issue with rdbms speeds ? There has always been issues with database speed , we have plenty.   Some are best solved by adding an index , caching some results or re-writing a query.   Some might be best solved by switching to Cassandra or using the file system .</tokenext>
<sentencetext>&gt; The more interesting aspect of all of this 'NoSQL' movement is how they believe that if they achieve some speed improvement against some relational databases, how that makes them so much better.
Or the most interesting aspect of the NoSQL movement is that many of the most successful web companies have rejected the SQL orthodoxy and achieved great success.  As someone in a conservative, SQL-only environment this is interesting.
&gt; Is there really a huge issue with rdbms speeds?
There have always been issues with database speed; we have plenty.  Some are best solved by adding an index, caching some results or re-writing a query.  Some might be best solved by switching to Cassandra or using the file system.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250970</id>
	<title>Re:pfffft twatter tweeter</title>
	<author>roman\_mir</author>
	<datestamp>1266921780000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>1</modscore>
	<htmltext><p>You know, the truth is, most data is still stored in individual files, not in databases.  So RDBMSs were always a very niche thing used for projects because they are understood and it's easier to develop for them if you really have massive data requirements.</p><p>Files - that's what many projects even today use, not databases.  This is basically what they are going back to - files with whatever window dressing on top - a facade of hashes, it's all key/value pairs.  It is, my friends, the old old idea of property files.</p><p>I mean, really, I wrote a system in August that uses property files for storage as a database.  Property file as a database - because it works.  But that's a storage method.  So in the NoSQL space they also do clustering by replication across nodes, but it does not really matter much if the data is all the same on all nodes.</p><p>But you can do the same with an RDBMS, really, you can skip the principles of ACID and replicate across nodes and hope that it's good enough.  Maybe the implementation for things like 'Cassandra' allows faster replication than what is normally done in an RDBMS, but just you wait and see how the RDBMSs of tomorrow provide a few flags to do the same thing in some 'partial ACID mode' with quick replication.</p><p>This is intended for applications that do not really care about consistency of data - Google does not care.  Twewter does not care.  Amazon has to jump through more hoops I am sure than Tweeter, because real money is involved.</p></htmltext>
<tokenext>You know , the truth is , most data is still stored in individual files , not in databases .
So RDBMSs were always a very niche thing used for projects because they are understood and it 's easier to develop for them if you really have massive data requirements.Files - that 's what many projects even today use , not databases .
This is basically what they are going back to - files with whatever window dressing on top - a facade of hashes , it 's all key/value pairs .
It is , my friends , the old old idea of property files.I mean , really , I wrote a system in August that uses property files for storage as a database .
Property file as a database - because it works .
But that 's a storage method .
So in the NoSQL space they also do clustering by replication across nodes , but it does not really matter much if the data is all the same on all nodes.But you can do the same with an RDBMS , really , you can skip the principles of ACID and replicate across nodes and hope that it 's good enough .
Maybe the implementation for things like 'Cassandra ' allows faster replication than what is normally done in an RDBMS , but just you wait and see how the RDBMSs of tomorrow provide a few flags to do the same thing in some 'partial ACID mode ' with quick replication.This is intended for applications that do not really care about consistency of data - Google does not care .
Twewter does not care .
Amazon has to jump through more hoops I am sure than Tweeter , because real money is involved .</tokenext>
<sentencetext>You know, the truth is, most data is still stored in individual files, not in databases.
So RDBMSs were always a very niche thing used for projects because they are understood and it's easier to develop for them if you really have massive data requirements.
Files - that's what many projects even today use, not databases.
This is basically what they are going back to - files with whatever window dressing on top - a facade of hashes, it's all key/value pairs.
It is, my friends, the old old idea of property files.
I mean, really, I wrote a system in August that uses property files for storage as a database.
Property file as a database - because it works.
But that's a storage method.
So in the NoSQL space they also do clustering by replication across nodes, but it does not really matter much if the data is all the same on all nodes.
But you can do the same with an RDBMS, really, you can skip the principles of ACID and replicate across nodes and hope that it's good enough.
Maybe the implementation for things like 'Cassandra' allows faster replication than what is normally done in an RDBMS, but just you wait and see how the RDBMSs of tomorrow provide a few flags to do the same thing in some 'partial ACID mode' with quick replication.
This is intended for applications that do not really care about consistency of data - Google does not care.
Twewter does not care.
Amazon has to jump through more hoops I am sure than Tweeter, because real money is involved.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249250</parent>
</comment>
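The property-files-as-a-database idea in the comment above is easy to make concrete. Here is a minimal hypothetical sketch in Python (not code from Twitter, Cassandra, or the commenter's system): a flat file of key=value lines loaded into a dict and rewritten on every write.

```python
# Minimal property-file key/value store -- a hypothetical illustration
# of "files with a facade of hashes", not any real system's code.

class PropertyFileStore:
    """Persists a flat dict of string pairs as key=value lines."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        try:
            with open(path, encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if line and not line.startswith("#"):
                        key, _, value = line.partition("=")
                        self.data[key] = value
        except FileNotFoundError:
            pass  # first run: start with an empty store

    def get(self, key, default=None):
        return self.data.get(key, default)

    def put(self, key, value):
        self.data[key] = value
        self._flush()

    def _flush(self):
        # Rewrite the whole file on every put: simple, not fast.
        with open(self.path, "w", encoding="utf-8") as f:
            for key, value in self.data.items():
                f.write(f"{key}={value}\n")
```

This is exactly the key/value facade the comment describes; what a store like Cassandra layers on top is partitioning, replication, and safe concurrent access.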
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31254744</id>
	<title>Re:pfffft twatter tweeter</title>
	<author>Anonymous</author>
	<datestamp>1266940500000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext>A good commenting system is hierarchical.  Guess what? Hierarchical databases existed years before SQL.</htmltext>
<tokenext>A good commenting system is hierarchical .
Guess what ?
Hierarchical databases existed years before SQL .</tokenext>
<sentencetext>A good commenting system is hierarchical.
Guess what?
Hierarchical databases existed years before SQL.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251480</parent>
</comment>
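The hierarchy this comment alludes to is just parent pointers, much like the parent links attached to every comment in this corpus. A small hypothetical Python sketch (names and record shape are illustrative, not from any real system) turns flat (id, parent_id, title) records into an indented thread:

```python
# Hypothetical sketch: flat comment records with parent pointers,
# rendered depth-first as a threaded (hierarchical) view.

from collections import defaultdict

def render_thread(records, parent=None, depth=0, index=None):
    """records: list of (id, parent_id, title); parent_id is None at top level.
    Returns one indented line per comment, depth-first, in input order."""
    if index is None:
        index = defaultdict(list)  # parent_id -> child records
        for rec in records:
            index[rec[1]].append(rec)
    lines = []
    for cid, _parent, title in index[parent]:
        lines.append("  " * depth + title)
        lines.extend(render_thread(records, cid, depth + 1, index))
    return lines
```

The same parent-pointer shape maps naturally onto a hierarchical or key/value store, which is the commenter's point about pre-SQL databases.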
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248994</id>
	<title>Re:And this is front page news, why?</title>
	<author>Anonymous</author>
	<datestamp>1266957840000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>5</modscore>
	<htmltext><p>Scaling. If something turns out to be robust and fast enough for Twitter, it is definitely of interest to anyone working on significantly large and busy websites.</p></htmltext>
<tokenext>Scaling .
If something turns out to be robust and fast enough for Twitter , it is definitely of interest to anyone working on significantly large and busy websites .</tokenext>
<sentencetext>Scaling.
If something turns out to be robust and fast enough for Twitter, it is definitely of interest to anyone working on significantly large and busy websites.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248842</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249142</id>
	<title>Re:Don't believe them!</title>
	<author>mariushm</author>
	<datestamp>1266958260000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>For some reason my mind went to Cassandra Crossing (http://en.wikipedia.org/wiki/The\_Cassandra\_Crossing)</p></htmltext>
<tokenext>For some reason my mind went to Cassandra Crossing ( http : //en.wikipedia.org/wiki/The \ _Cassandra \ _Crossing )</tokenext>
<sentencetext>For some reason my mind went to Cassandra Crossing (http://en.wikipedia.org/wiki/The\_Cassandra\_Crossing)</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248230</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31263688</id>
	<title>Re:Twitter needs scalability experts</title>
	<author>magus\_melchior</author>
	<datestamp>1265101380000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><blockquote><div><p>EMP bomb at your primary datacenter anyone?</p></div></blockquote><p>I'm pretty sure that's what Faraday cages are for. I know that EMP bombs (AKA nuke detonation in the upper atmosphere) are a favorite doomsday scenario, but with the right electrical hardening (re: Switzerland), they're pretty easy to defend against.</p><p>Now, fires (and by "fire" I mean something like "thermite"), floods, dirty bombs, and earthquakes, on the other hand...</p>
	</htmltext>
<tokenext>EMP bomb at your primary datacenter anyone ? I 'm pretty sure that 's what Faraday cages are for .
I know that EMP bombs ( AKA nuke detonation in the upper atmosphere ) is a favorite doomsday scenario , but with the right electrical hardening ( re : Switzerland ) , they 're pretty easy to defend against.Now , fires ( and by " fire " I mean something like " thermite " ) , floods , dirty bombs , and earthquakes , on the other hands.. .</tokenext>
<sentencetext>EMP bomb at your primary datacenter anyone?
I'm pretty sure that's what Faraday cages are for.
I know that EMP bombs (AKA nuke detonation in the upper atmosphere) are a favorite doomsday scenario, but with the right electrical hardening (re: Switzerland), they're pretty easy to defend against.
Now, fires (and by "fire" I mean something like "thermite"), floods, dirty bombs, and earthquakes, on the other hand...
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030</id>
	<title>pfffft twatter tweeter</title>
	<author>roman\_mir</author>
	<datestamp>1266957900000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>2</modscore>
	<htmltext><p>who cares what twuufter is running off.</p><p>The more interesting aspect of all of this 'NoSQL' movement is how they believe that if they achieve some speed improvement against some relational databases, how that makes them so much better.</p><p>If you don't really need a database to run your 'website', then who cares if you use flat files or an in memory hashmap for all your data needs?  Databases are not being replaced by NoSQL in projects that need databases. The projects that may not have ever needed databases may benefit by this NoSQL idea, but if you actually need a database... well, you better be really good at working around all kinds of problems that this will create for you.</p><p>I think that relational databases are good at what they do and that many projects may not need them, but if you do need them on the back end, you will end up with them on the back end.  Of course there may be some caching/hashmaps/files on the front end but at the back stuff will be sorted out within a real datastore.</p><p>Is there really a huge issue with rdbms speeds?  Well if there is something there, that's what needs to be looked at.  If RDBMSs are not fast enough, that's just an opportunity to work more on them to speed them up.</p></htmltext>
<tokenext>who cares what twuufter is running off.The more interesting aspect of all of this 'NoSQL ' movement is how they believe that if they achieve some speed improvement against some relational databases , how that makes them so much better.If you do n't really need a database to run your 'website ' , then who cares if you use flat files or an in memory hashmap for all your data needs ?
Databases are not being replaced by NoSQL in projects that need databases .
The projects that may not have ever needed databases may benefit by this NoSQL idea , but if you actually need a database... well , you better be really good at working around all kinds of problems that this will create for you.I think that relational databases are good at what they do and that many projects may not need them , but if you do need them on the back end , you will end up with them on the back end .
Of-course there maybe some caching/hashmaps/files on the front end but at the back stuff will be sorted out within a real datastore.Is there really a huge issue with rdbms speeds ?
Well if there is something there , that 's what needs to be looked at .
If RDBMSs are not fast enough , that 's just an opportunity to work more on them to speed them up .</tokenext>
<sentencetext>who cares what twuufter is running off.
The more interesting aspect of all of this 'NoSQL' movement is how they believe that if they achieve some speed improvement against some relational databases, how that makes them so much better.
If you don't really need a database to run your 'website', then who cares if you use flat files or an in memory hashmap for all your data needs?
Databases are not being replaced by NoSQL in projects that need databases.
The projects that may not have ever needed databases may benefit by this NoSQL idea, but if you actually need a database... well, you better be really good at working around all kinds of problems that this will create for you.
I think that relational databases are good at what they do and that many projects may not need them, but if you do need them on the back end, you will end up with them on the back end.
Of course there may be some caching/hashmaps/files on the front end but at the back stuff will be sorted out within a real datastore.
Is there really a huge issue with rdbms speeds?
Well if there is something there, that's what needs to be looked at.
If RDBMSs are not fast enough, that's just an opportunity to work more on them to speed them up.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250618</id>
	<title>Re:Twitter needs scalability experts</title>
	<author>Anonymous</author>
	<datestamp>1266920700000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><blockquote><div><p>...real RDBMS like Oracle...</p></div></blockquote><p>

Holy fuck, the right tool for the right job, please? Oracle does some things for some markets <i>really well</i> but for the rest of us who don't need such a high degree of transactional safety that $90k + two-node RAC price tag might just end up taking your great web 3.0 business through development, maybe early beta before you begin liquidating assets. That's per-processor licensing too on a database that scales <i>vertically</i> well (very well really) but not horizontally well (sharding anyone?) so the better your project does the more processor licenses you'll be looking at, and using higher-priced hardware to do it too (because cheap boxes scale best <i>horizontally</i>).<br> <br>
So sure, if you can afford it get the big iron and with any luck your industry works with the kind of margins you'll never even need to know the cost of going that way. Personally, after having done the Sun/Oracle thing I hope to never find myself sitting at a business meeting trying to figure out how we can meet capacity demands after we've run out of money paying for high priced hardware and license fees.
<br> <br>
I'm glad products like Oracle's exist, even somewhat impressed by them, but not every project will need them and there are very real cost considerations that should also be taken into account. Know your business and for the love of God, do a thorough survey of all the available tools before you commit to one.</p>
	</htmltext>
<tokenext>...real RDBMS like Oracle.. . Holy fuck , the right tool for the right job , please ?
Oracle does somethings for some markets really well but for the rest of us who do n't need such a high degree of transactional safety that $ 90k + two-node RAC price tag might just end up taking your great web 3.0 business through development , maybe early beta before you begin liquidating assets .
That 's per-processor licensing too on a database that scales vertically well ( very well really ) but not horizontally well ( sharding anyone ?
) so the better your project does the more processor licenses you 'll be looking at , and using higher-priced hardware to do it too ( because cheap boxes scale best horizontally ) .
So sure , if you can afford it get the big iron and with any luck your industry works with the kind of margins you 'll never even need to know the cost of going that way .
Personally , after having done the Sun/Oracle thing I hope to never find myself sitting at a business meeting trying to figure out how we can meet capacity demands after we 've run out of money paying for high priced hardware and license fees .
I 'm glad products like Oracles exist , even somewhat impressed by them , but not every project will need them and there are very real cost considerations that should also be taken into account .
Know your business and for the love of God , do a thorough survey of all the available tools before you commit to one .</tokenext>
<sentencetext>...real RDBMS like Oracle...

Holy fuck, the right tool for the right job, please?
Oracle does some things for some markets really well but for the rest of us who don't need such a high degree of transactional safety that $90k + two-node RAC price tag might just end up taking your great web 3.0 business through development, maybe early beta before you begin liquidating assets.
That's per-processor licensing too on a database that scales vertically well (very well really) but not horizontally well (sharding anyone?
) so the better your project does the more processor licenses you'll be looking at, and using higher-priced hardware to do it too (because cheap boxes scale best horizontally).
So sure, if you can afford it get the big iron and with any luck your industry works with the kind of margins you'll never even need to know the cost of going that way.
Personally, after having done the Sun/Oracle thing I hope to never find myself sitting at a business meeting trying to figure out how we can meet capacity demands after we've run out of money paying for high priced hardware and license fees.
I'm glad products like Oracle's exist, even somewhat impressed by them, but not every project will need them and there are very real cost considerations that should also be taken into account.
Know your business and for the love of God, do a thorough survey of all the available tools before you commit to one.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249160</id>
	<title>I'm Reluctant</title>
	<author>Anonymous</author>
	<datestamp>1266958260000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>1</modscore>
	<htmltext><p>I'm reluctant to believe that Twitter is a good technology bellwether. Twitter seems to have so many technology issues, fail whales, outages, security breaches...</p><p>So, I'm left wondering: what does this move say? Does it say that Cassandra is so bad that Twitter is using it? Or does it say that a fail whale population boom is imminent?</p></htmltext>
<tokenext>I 'm reluctant to believe that Twitter is a good technology bellwether .
Twitter seems to have so many technology issues , fail whales , outages , security breeches...SO , I 'm left wondering ; what does this move say ?
Does it say that Cassandra is so bad that Twitter is using it ?
Or does it say that a fail whale population boom is imminent ?</tokenext>
<sentencetext>I'm reluctant to believe that Twitter is a good technology bellwether.
Twitter seems to have so many technology issues, fail whales, outages, security breaches...
So, I'm left wondering: what does this move say?
Does it say that Cassandra is so bad that Twitter is using it?
Or does it say that a fail whale population boom is imminent?</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31253742</id>
	<title>Re:Twitter needs scalability experts</title>
	<author>Anonymous</author>
	<datestamp>1266933960000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>I lost interest in Oracle when I had to develop for it and learned that it can't support index names of more than 32 characters in length. Oh, and needing a degree to figure out licensing costs doesn't help either; best hope for a non-crooked VAR to set you straight.</htmltext>
<tokenext>I lost interest in Oracle when I had to develop for it and learned that it ca n't support index names of more than 32 characters in length .
Oh , and needing a degree to figure out licensing costs does n't help either ; best hope for a non-crooked VAR to set you straight .</tokenext>
<sentencetext>I lost interest in Oracle when I had to develop for it and learned that it can't support index names of more than 32 characters in length.
Oh, and needing a degree to figure out licensing costs doesn't help either; best hope for a non-crooked VAR to set you straight.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31255622</id>
	<title>Re:Open Source Parallel Databases</title>
	<author>maraist</author>
	<datestamp>1266947700000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>[complaining that] "SQL being too hard"? Well, one can assume you can ignore this class of amateurs - there's no lack of free learning tools for SQL - and it's dirt simple.<br> <br>"And yet a lot of them invent query languages/query languages similar to SQL. "  - See, I think you're magically associating two classes of programmers.  There are people, like myself, who love the expressiveness of SQL over virtually any other language for data-set manipulation. Thus we would like, as an option, to utilize SQL on even a simple key-value store.  And there are tons of noSQL solutions that provide SQL front ends (hell, there are SQL front-ends to CSV stores).  You typically lose efficiency at this point, but for rarely run reports, the lack of bugs makes it worth it.<br> <br>"Hadoop takes a huge beating via the RDBMSes in performance"  - By Hadoop, I assume you mean HBase which runs on top of several layers of technologies - the lowest of which is Hadoop.  Naturally this layering produces inefficiencies.  Consequently, things like HyperTable came about as functional equivalents of HBase without all the layers (and written in raw C I might add).  When people say scales well, they typically mean runs slowly on a given node..  And thus something like HBase requires several dozen machines before it can overtake an optimized single-node (mysql/Oracle/what-have-you). Then when you jack up the performance of the single node-cluster (Oracle RAC), you need a lot more machines before you can overtake.  This may not make sense for 90\% of companies out there - having 1,000 machines just isn't practical and the maintenance costs will be killer in year-2.  For Google it made absolute sense..  They simply can't make a single DB configuration go fast enough.  
And therein is the driving model that noSQL is trying to replicate.<br> <br>"Now the funny thing is that Oracle was not included" - yeah funny how Oracle has a clause in their license that says you may NEVER publish performance results.. Guess why.. Makes it easier to suck suckers into paying $100k, only to find that a mysql setup is faster on the same hardware.  Yes, you can spend $200k on Oracle and have it faster than mysql will ever be, but you didn't budget for that when you were suckered in.<br> <br>"1999 I worked at a place with terabytes of database space"  - A petabyte of archive data is not the same as 100 gigabytes of actively manipulated data when you can only get 32Gig of RAM on the box (such that any random indexed lookup is almost guaranteed to hit the disk).  It's somewhat easy to add disks to a virtual server cloud.. Use iSCSI via 100 mounted partitions into a tablespace that spans all 100 partitions (in linearly appended mode) - you can do this with mysql today. Not sure what the limits of LVM are, but you could do it that way too.  Not too expensive either if you use cheap 2TB disks in sufficiently raided configurations.  This used to be mainframe class (tons of IO with fully redundant hardware was their mantra).  But my point is there are problem-spaces that make this not scale with RDBMS - unless you treat the offending table as a simple key-value store, such that you can shard it - thereby not properly utilizing the RDBMS.<br> <br>"But it seems like an open source parallel database would do a lot to silence many nosql critics" - you're not going to silence people that think of data as simple key-value pairs, or highly specialized full-text-searching (which is related to but independent of RDBMS activity).  Or even as log-file-processing (such as apache page-view reporting).  These are things that RDBMS isn't the best solution for.  It CAN do these things, which is why it's become the multi-use hammer.  
But, when batch processing 10 million records a day, I have the choice of having a 30-minute load time in the RDBMS and a pretty hefty sustained load over several hours (due to the random-seeks that can't fit in memory).  Or I can just store to a flat text file CSV, and maintain a cursor (I mean file-handle) to the last read item, and both load and process the entire</htmltext>
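The flat-file scheme sketched at the end of this comment (append to a CSV, keep a file-handle "cursor" to the last read item) can be made concrete. A hypothetical Python sketch, not the commenter's actual system: a writer appends rows to the log, and each batch run resumes from a byte offset saved after the previous run.

```python
# Hypothetical sketch: append-only CSV log plus a persisted byte-offset
# cursor, so a batch job only touches rows added since its previous run.

import csv
import io
import os

def append_rows(log_path, rows):
    """Writer side: append records to the flat-file log."""
    with open(log_path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

def process_new_rows(log_path, cursor_path):
    """Batch side: parse only rows past the saved offset, then advance it."""
    offset = 0
    if os.path.exists(cursor_path):
        with open(cursor_path, encoding="utf-8") as f:
            offset = int(f.read() or 0)
    with open(log_path, newline="", encoding="utf-8") as f:
        f.seek(offset)           # jump to where the last run stopped
        chunk = f.read()         # everything appended since then
        offset = f.tell()
    with open(cursor_path, "w", encoding="utf-8") as f:
        f.write(str(offset))     # persist the cursor for the next run
    return list(csv.reader(io.StringIO(chunk)))
```

Because the log is append-only, each batch run is a sequential read of just the new tail, which is the commenter's point about avoiding the RDBMS's random seeks.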
<tokenext>[ complaining that ] " SQL being too hard " ?
Well , one can assume you can ignore this class of amateurs - there 's no lack of free learning tools for SQL - and it 's dirt simple .
" And yet a lot of them invent query languages/query languages similar to SQL .
" - See , I think you 're magically associating two classes of programmers .
There are people , like myself that love the expressiveness of SQL over virtually any other language for data-set manipulation .
Thus we would like as an optional to utilize SQL on even a simple key-value store .
And there are tons of noSQL solutions that provide SQL front ends ( hell , there are SQL front-ends to CSV stores ) .
You typically lose efficiency at this point , but for rarely run reports , the lack of bugs makes it worth it .
" Hadoop takes a huge beating via the RDBMSes in performance " - By Hadoop , I assume you mean HBase which runs on top of several layers of technologies - the lowest of which is Hadoop .
Naturally this layering produces inefficiencies .
Consequently , things like HyperTable came about as functional equivalents of HBase without all the layers ( and written in raw C I might add ) .
When people say scales well , they typically mean runs slowly on a given node.. And thus something like HBase requires several dozen machines before it can overtake an optimized single-node ( mysql/Oracle/what-have-you ) .
Then when you jack up the performance of the single node-cluster ( Oracle RAC ) , you need a lot more machines before you can overtake .
This may not make sense for 90 \ % of companies out there - having 1,000 machines just is n't practical and the maintenance costs will be killer in year-2 .
For Google it made absolute sense.. They simply ca n't make a single DB configuration go fast enough .
And therein is the driving model that noSQL is trying to replicate .
" Now the funny thing is that Oracle was not included " - yeah funny how Oracle has a clause in their license that says you may NEVER publish performance results.. Guess why.. Makes it easier to suck suckers in to paying $ 100k , only to find that a mysql setup is faster on the same hardware .
Yes , you can spend $ 200k on Oracle and have it faster than mysql will ever be , but you did n't budge for that when you were suckered in .
" 1999 I worked at a place with terabytes of database space " - A peta-byte of archive data is not the same as a 100 gigabyte of actively manipulated data when you can only get 32Gig of RAM on the box ( such that any random indexed lookup is almost guaranteed to hit the disk ) .
It 's somewhat easy to add disks to a virtual server cloud.. Use iSCSI via 100 mounted partitions into a tablespace that spans all 100 partitions ( in linearlyly appended mode ) - you can do this with mysql today .
Not sure what the limits of LVM are , but you could do it that way too .
Not too expensive either if you use cheap 2TB disks in sufficiently raided configurations .
This use to be mainframe class ( tons of IO with fully redundant hardware was their mantra ) .
But my point is there are problem-spaces that make this not scale with RDBMS - unless you treat the offending table as a simple key-value store , such that you can shard it - thereby not properly utilizing the RDBMS .
" But it seems like an open source parallel database would do a lot to silence many nosql critics " - you 're not going to silence people that think of data as simple key-value pairs , or highly specialized full-text-searching ( which is related to but independent of RDBMS activity ) .
Or even as log-file-processing ( such as apache page-view reporting ) .
These are things that RDBMS is n't the best solution for .
It CAN do these things , which is why it 's become the multi-use hammer .
But , when batch processing 10 million records a day , I have the choice of having a 30 minute load time in the RDBMS and a pretty hefty sustained load over several hours ( due to the random-seeks that ca n't fit in memory ) .
Or I can just store to a flat text file CSV , and maintain a cursor ( I mean file-handle ) to the last read item , and both load and process the entire</tokentext>
<sentencetext>[complaining that] "SQL being too hard"?
Well, one can assume you can ignore this class of amateurs - there's no lack of free learning tools for SQL - and it's dirt simple.
"And yet a lot of them invent query languages/query languages similar to SQL.
"  - See, I think you're magically associating two classes of programmers.
There are people, like myself, who love the expressiveness of SQL over virtually any other language for data-set manipulation.
Thus we would like the option to utilize SQL on even a simple key-value store.
And there are tons of noSQL solutions that provide SQL front ends (hell, there are SQL front-ends to CSV stores).
You typically lose efficiency at this point, but for rarely run reports, the lack of bugs makes it worth it.
" Hadoop takes a huge beating via the RDBMSes in performance"  - By Hadoop, I assume you mean HBase which runs on top of several layers of technologies - the lowest of which is Hadoop.
Naturally this layering produces inefficiencies.
Consequently, things like HyperTable came about as functional equivalents of HBase without all the layers (and written in raw C I might add).
When people say scales well, they typically mean runs slowly on a given node..  And thus something like HBase requires several dozen machines before it can overtake an optimized single-node (mysql/Oracle/what-have-you).
Then when you jack up the performance of the single node-cluster (Oracle RAC), you need a lot more machines before you can overtake.
This may not make sense for 90% of companies out there - having 1,000 machines just isn't practical and the maintenance costs will be killer in year-2.
For Google it made absolute sense..  They simply can't make a single DB configuration go fast enough.
And therein is the driving model that noSQL is trying to replicate.
"Now the funny thing is that Oracle was not included" - yeah funny how Oracle has a clause in their license that says you may NEVER publish performance results.. Guess why.. Makes it easier to suck suckers in to paying $100k, only to find that a mysql setup is faster on the same hardware.
Yes, you can spend $200k on Oracle and have it faster than mysql will ever be, but you didn't budge for that when you were suckered in.
"1999 I worked at a place with terabytes of database space"  - A peta-byte of archive data is not the same as a 100 gigabyte of actively manipulated data when you can only get 32Gig of RAM on the box (such that any random indexed lookup is almost guaranteed to hit the disk).
It's somewhat easy to add disks to a virtual server cloud.. Use iSCSI via 100 mounted partitions into a tablespace that spans all 100 partitions (in linearly appended mode) - you can do this with mysql today.
Not sure what the limits of LVM are, but you could do it that way too.
Not too expensive either if you use cheap 2TB disks in sufficiently raided configurations.
This used to be mainframe class (tons of IO with fully redundant hardware was their mantra).
But my point is there are problem-spaces that make this not scale with RDBMS - unless you treat the offending table as a simple key-value store, such that you can shard it - thereby not properly utilizing the RDBMS.
"But it seems like an open source parallel database would do a lot to silence many nosql critics" - you're not going to silence people that think of data as simple key-value pairs, or highly specialized full-text-searching (which is related to but independent of RDBMS activity).
Or even as log-file-processing (such as apache page-view reporting).
These are things that RDBMS isn't the best solution for.
It CAN do these things, which is why it's become the multi-use hammer.
But, when batch processing 10 million records a day, I have the choice of having a 30 minute load time in the RDBMS and a pretty hefty sustained load over several hours (due to the random-seeks that can't fit in memory).
Or I can just store to a flat text file CSV, and maintain a cursor (I mean file-handle) to the last read item, and both load and process the entire</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252652</parent>
</comment>
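The flat-file alternative sketched above (append tweets/records to a CSV and keep a cursor to the last item read, so each batch run only touches new data) can be illustrated in a few lines. A minimal sketch, not the commenter's actual code; the file names (`events.csv`, `events.cursor`) and function names are hypothetical:

```python
import csv
import io
import os
import tempfile

# Hypothetical paths for the flat-file store and the saved read position.
_dir = tempfile.mkdtemp()
LOG_PATH = os.path.join(_dir, "events.csv")
CURSOR_PATH = os.path.join(_dir, "events.cursor")

def append_records(rows):
    # Sequential append: no index maintenance, no random seeks.
    with open(LOG_PATH, "a", newline="") as f:
        csv.writer(f).writerows(rows)

def process_new_records(handler):
    # Resume from the persisted cursor and hand only unseen rows to handler.
    offset = 0
    if os.path.exists(CURSOR_PATH):
        with open(CURSOR_PATH) as f:
            offset = int(f.read() or 0)
    with open(LOG_PATH, newline="") as f:
        f.seek(offset)
        chunk = f.read()      # everything written since the last run
        offset = f.tell()     # new cursor position
        for row in csv.reader(io.StringIO(chunk)):
            handler(row)
    with open(CURSOR_PATH, "w") as f:
        f.write(str(offset))

append_records([["1", "hello"], ["2", "world"]])
seen = []
process_new_records(seen.append)   # picks up both rows
append_records([["3", "again"]])
process_new_records(seen.append)   # picks up only the new row
```

The point of the technique is that both the writer and the reader do purely sequential I/O, which is why the commenter can process the full day's batch without the random-seek load an indexed table would incur.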
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251860</id>
	<title>They considered Voldemort</title>
	<author>Anonymous</author>
	<datestamp>1266924900000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>But found that its backup policy required horcruxes.</p></htmltext>
<tokenext>But found that its backup policy required horcruxes .</tokentext>
<sentencetext>But found that its backup policy required horcruxes.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249450</id>
	<title>Re:pfffft twatter tweeter</title>
	<author>azmodean+1</author>
	<datestamp>1266916080000</datestamp>
	<modclass>Interesting</modclass>
	<modscore>4</modscore>
	<htmltext><p>I think you're missing the point here, the problem with RDBMSs isn't that they are "slow" per-se, which implies that they just need some good ol' fashioned optimization.  The problem is that there is a cost associated with the data integrity guarantees they make (usually appears in scalability bottlenecks rather than in pure computational inefficiencies), regardless of how good the implementation is, and if you don't need some of those guarantees, you can dispense with them and end up with better performance (again, this typically means better scalability).  Additionally, this is the kind of bottleneck that you just can't throw more resources at.  Sure you can find the bottleneck and beef up that particular component to do more transactions/second, but at a certain point you've isolated the bottleneck on a world-class server that is doing nothing but that, and it's still a bottleneck.  At that point (preferably long before you reach that point) you have to look at transitioning to an infrastructure that makes some kind of tradeoff that allows the removal of the bottleneck, which is what NoSQL does.

</p><p>I doubt Twitter wants very many RDBMS-type data coherency guarantees at all.  160-character text strings with a similarly-sized amount of metadata, and no real-time delivery guarantees?  Sounds like their database can get pretty inconsistent without messing things up badly.  It seems to me they would be well served by using a database that offers just what they want/need in that area and better performance.

</p><p>Oh and this:</p><div class="quote"><p>Is there really a huge issue with rdbms speeds?</p></div><p>yes, and what are you smoking that you would even ask this question?</p>
	</htmltext>
<tokenext>I think you 're missing the point here , the problem with RDBMSs is n't that they are " slow " per-se , which implies that they just need some good ol ' fashioned optimization .
The problem is that there is a cost associated with the data integrity guarantees they make ( usually appears in scalability bottlenecks rather than in pure computational inefficiencies ) , regardless of how good the implementation is , and if you do n't need some of those guarantees , you can dispense with them and end up with better performance ( again , this typically means better scalability ) .
Additionally , this is the kind of bottleneck that you just ca n't throw more resources at .
Sure you can find the bottleneck and beef up that particular component to do more transactions/second , but at a certain point you 've isolated the bottleneck on a world-class server that is doing nothing but that , and it 's still a bottleneck .
At that point ( preferably long before you reach that point ) you have to look at transitioning to an infrastructure that makes some kind of tradeoff that allows the removal of the bottleneck , which is what NoSQL does .
I doubt Twitter wants very many RDBMS-type data coherency guarantees at all .
160-character text strings with a similarly-sized amount of metadata , and no real-time delivery guarantees ?
Sounds like their database can get pretty inconsistent without messing things up badly .
It seems to me they would be well served by using a database that offers just what they want/need in that area and better performance .
Oh and this : Is there really a huge issue with rdbms speeds ? yes , and what are you smoking that you would even ask this question ?</tokentext>
<sentencetext>I think you're missing the point here, the problem with RDBMSs isn't that they are "slow" per-se, which implies that they just need some good ol' fashioned optimization.
The problem is that there is a cost associated with the data integrity guarantees they make (usually appears in scalability bottlenecks rather than in pure computational inefficiencies), regardless of how good the implementation is, and if you don't need some of those guarantees, you can dispense with them and end up with better performance (again, this typically means better scalability).
Additionally, this is the kind of bottleneck that you just can't throw more resources at.
Sure you can find the bottleneck and beef up that particular component to do more transactions/second, but at a certain point you've isolated the bottleneck on a world-class server that is doing nothing but that, and it's still a bottleneck.
At that point (preferably long before you reach that point) you have to look at transitioning to an infrastructure that makes some kind of tradeoff that allows the removal of the bottleneck, which is what NoSQL does.
I doubt Twitter wants very many RDBMS-type data coherency guarantees at all.
160-character text strings with a similarly-sized amount of metadata, and no real-time delivery guarantees?
Sounds like their database can get pretty inconsistent without messing things up badly.
It seems to me they would be well served by using a database that offers just what they want/need in that area and better performance.
Oh and this: "Is there really a huge issue with rdbms speeds?" Yes, and what are you smoking that you would even ask this question?
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252514</id>
	<title>Re:Twitter needs scalability experts</title>
	<author>Anonymous</author>
	<datestamp>1266928140000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Cool! Let's design twitter in a slashdot forum! That sounds fun!</p><p>EFD's are 10x cheaper now than when Twitter started. But, we'll ignore that. You need two nodes for fail-over, probably. So, $200k. You also need some hardware and licenses for test and development environments. So, probably $250k minimum?</p><p>Twitter is doing ~50 million tweets per day. That's about 600 tweets per second, probably with peak load around 6k/s. I'm just guessing about the peak load. So, your point load capability of 1 million simple row writes per second with no indexing is extreme overkill. But, you aren't measuring the right things. The critical problem is read throughput and latency when under write load. Twitter currently supports an average page read load of 1600/s. So, there are probably bursty point loads approaching 10k/s.</p><p>You need to write those raw tweets into a FollowTweet table like |ID|UserID|Date|AuthorID|AuthorName|TweetID|Tweet|, or you'll end up doing essentially random searches across the core tweet table for every page view. The dominant query pattern for this table is</p><p>SELECT TOP 50 * FROM FollowTweet WHERE UserID = @pUserID ORDER BY Date DESC</p><p>You probably need a clustered covered index to make that query fast or you'll do a lot of random seeks on read. I've already denormalized that a bit for you to get rid of the join against the core Tweet table. You can argue about that if you like. Test both if you disagree.</p><p>So, what's your write rate to a table that looks like that while maintaining a clustered covered index, after writing 50 million tweets per day for 6 months? It's far lower than 1 million transactions/sec. But, I'm curious about the exact number.</p><p>Twitter also provides free text search. 
So, add in a free text search index on tweets, ordered by time of post descending.</p><p>Now that you're maintaining a free text index on the core Tweet table, how many writes per second can you sustain, after writing 50 million tweets per day for 6 months?</p><p>Much more importantly, how many user queries per second can you sustain against the FollowTweet table, and what is the 99.9% latency of a read? And, how many free text searches per second can you sustain, and what is the 99.9% latency of a free text search? And, how long does it take to completely write a tweet and make it available to all followers and in the free text index?</p><p>I'm sure you know that latencies are much more important than throughput to the user's subjective experience. I bet you can't sustain the necessary write load on a single node while maintaining good read latency. So, you're going to need some read only mirror nodes to scale reads. So, add that hardware and license cost to your system.</p><p>I could, however, take 12 commodity 1u boxes with cheap disks and 8GB ram, install mongo on them and turn them into mirrored pairs, and handle all of the load I just described with consistently low latency. That'd be about 133/s direct key lookups per mongo box for a page view. And, 100/s writes per mongodb box. Each mongodb box should scale to 10x that load without worry, so we should be fairly safe for point load. I'd have to see the real twitter numbers to be sure. They do seem to have high burst peaks. We can snapshot the mongodbs for disaster recovery, and I'm mirrored for fast failover if a single node goes down.</p><p>You're going to need another 6-12 front end boxes to render pages, with either storage system.</p><p>The total cost for the release environment machines would be ~$24k from a reputable server hardware builder. So, we're looking at $24k for the NoSql FOSS design, vs. $200k for the Oracle, big iron design. 
We're both ignoring network equipment, bandwidth, and hosting costs.</p><p>Also, with an oracle design, you get to spend your $500k a year on operational specialists with pagers. With FOSS, you can probably spend $240k a year on that staff.</p></htmltext>
<tokenext>Cool !
Let 's design twitter in a slashdot forum !
That sounds fun !
EFD 's are 10x cheaper now than when Twitter started .
But , we 'll ignore that .
You need two nodes for fail-over , probably .
So , $ 200k . You also need some hardware and licenses for test and development environments .
So , probably $ 250k minimum ?
Twitter is doing ~ 50 million tweets per day .
That 's about 600 tweets per second , probably with peak load around 6k/s .
I 'm just guessing about the peak load .
So , your point load capability of 1 million simple row writes per second with no indexing is extreme overkill .
But , you are n't measuring the right things .
The critical problem is read throughput and latency when under write load .
Twitter currently supports an average page read load of 1600/s .
So , there are probably bursty point loads approaching 10k/s .
You need to write those raw tweets into a FollowTweet table like | ID | UserID | Date | AuthorID | AuthorName | TweetID | Tweet | , or you 'll end up doing essentially random searches across the core tweet table for every page view .
The dominant query pattern for this table is SELECT TOP 50 * FROM FollowTweet WHERE UserID = @ pUserID ORDER BY Date DESC .
You probably need a clustered covered index to make that query fast or you 'll do a lot of random seeks on read .
I 've already denormalized that a bit for you to get rid of the join against the core Tweet table .
You can argue about that if you like .
Test both if you disagree .
So , what 's your write rate to a table that looks like that while maintaining a clustered covered index , after writing 50 million tweets per day for 6 months ?
It 's far lower than 1 million transactions/sec .
But , I 'm curious about the exact number .
Twitter also provides free text search .
So , add in a free text search index on tweets , ordered by time of post descending .
Now that you 're maintaining a free text index on the core Tweet table , how many writes per second can you sustain , after writing 50 million tweets per day for 6 months ?
Much more importantly , how many user queries per second can you sustain against the FollowTweet table , and what is the 99.9 % latency of a read ?
And , how many free text searches per second can you sustain , and what is the 99.9 % latency of a free text search ?
And , how long does it take to completely write a tweet and make it available to all followers and in the free text index ?
I 'm sure you know that latencies are much more important than throughput to the user 's subjective experience .
I bet you ca n't sustain the necessary write load on a single node while maintaining good read latency . So , you 're going to need some read only mirror nodes to scale reads .
So , add that hardware and license cost to your system .
I could , however , take 12 commodity 1u boxes with cheap disks and 8GB ram , install mongo on them and turn them into mirrored pairs , and handle all of the load I just described with consistently low latency .
That 'd be about 133/s direct key lookups per mongo box for a page view .
And , 100/s writes per mongodb box .
Each mongodb box should scale to 10x that load without worry , so we should be fairly safe for point load .
I 'd have to see the real twitter numbers to be sure .
They do seem to have high burst peaks .
We can snapshot the mongodbs for disaster recovery , and I 'm mirrored for fast failover if a single node goes down .
You 're going to need another 6-12 front end boxes to render pages , with either storage system .
The total cost for the release environment machines would be ~ $ 24k from a reputable server hardware builder .
So , we 're looking at $ 24k for the NoSql FOSS design , vs. $ 200k for the Oracle , big iron design .
We 're both ignoring network equipment , bandwidth , and hosting costs .
Also , with an oracle design , you get to spend your $ 500k a year on operational specialists with pagers .
With FOSS , you can probably spend $ 240k a year on that staff .</tokentext>
<sentencetext>Cool!
Let's design twitter in a slashdot forum!
That sounds fun!
EFD's are 10x cheaper now than when Twitter started.
But, we'll ignore that.
You need two nodes for fail-over, probably.
So, $200k.
You also need some hardware and licenses for test and development environments.
So, probably $250k minimum?
Twitter is doing ~50 million tweets per day.
That's about 600 tweets per second, probably with peak load around 6k/s.
I'm just guessing about the peak load.
So, your point load capability of 1 million simple row writes per second with no indexing is extreme overkill.
But, you aren't measuring the right things.
The critical problem is read throughput and latency when under write load.
Twitter currently supports an average page read load of 1600/s.
So, there are probably bursty point loads approaching 10k/s.
You need to write those raw tweets into a FollowTweet table like |ID|UserID|Date|AuthorID|AuthorName|TweetID|Tweet|, or you'll end up doing essentially random searches across the core tweet table for every page view.
The dominant query pattern for this table is: SELECT TOP 50 * FROM FollowTweet WHERE UserID = @pUserID ORDER BY Date DESC.
You probably need a clustered covered index to make that query fast or you'll do a lot of random seeks on read.
I've already denormalized that a bit for you to get rid of the join against the core Tweet table.
You can argue about that if you like.
Test both if you disagree.
So, what's your write rate to a table that looks like that while maintaining a clustered covered index, after writing 50 million tweets per day for 6 months?
It's far lower than 1 million transactions/sec.
But, I'm curious about the exact number.
Twitter also provides free text search.
So, add in a free text search index on tweets, ordered by time of post descending.
Now that you're maintaining a free text index on the core Tweet table, how many writes per second can you sustain, after writing 50 million tweets per day for 6 months?
Much more importantly, how many user queries per second can you sustain against the FollowTweet table, and what is the 99.9% latency of a read?
And, how many free text searches per second can you sustain, and what is the 99.9% latency of a free text search?
And, how long does it take to completely write a tweet and make it available to all followers and in the free text index?
I'm sure you know that latencies are much more important than throughput to the user's subjective experience.
I bet you can't sustain the necessary write load on a single node while maintaining good read latency. So, you're going to need some read only mirror nodes to scale reads.
So, add that hardware and license cost to your system.
I could, however, take 12 commodity 1u boxes with cheap disks and 8GB ram, install mongo on them and turn them into mirrored pairs, and handle all of the load I just described with consistently low latency.
That'd be about 133/s direct key lookups per mongo box for a page view.
And, 100/s writes per mongodb box.
Each mongodb box should scale to 10x that load without worry, so we should be fairly safe for point load.
I'd have to see the real twitter numbers to be sure.
They do seem to have high burst peaks.
We can snapshot the mongodbs for disaster recovery, and I'm mirrored for fast failover if a single node goes down.
You're going to need another 6-12 front end boxes to render pages, with either storage system.
The total cost for the release environment machines would be ~$24k from a reputable server hardware builder.
So, we're looking at $24k for the NoSql FOSS design, vs. $200k for the Oracle, big iron design.
We're both ignoring network equipment, bandwidth, and hosting costs.
Also, with an oracle design, you get to spend your $500k a year on operational specialists with pagers.
With FOSS, you can probably spend $240k a year on that staff.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104</parent>
</comment>
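The load figures quoted in the comment above are internally consistent and can be checked with back-of-envelope arithmetic. A quick sketch, assuming (as the comment implies) 12 boxes arranged as 6 mirrored pairs, reads spread evenly across all boxes, and every write applied to both members of one pair:

```python
# Back-of-envelope check of the capacity figures quoted above.
TWEETS_PER_DAY = 50_000_000   # "~50 million tweets per day"
PAGE_READS_PER_SEC = 1_600    # "average page read load of 1600/s"
BOXES = 12                    # "12 commodity 1u boxes"
PAIRS = BOXES // 2            # mirrored pairs: each write hits both members

writes_per_sec = TWEETS_PER_DAY / 86_400        # seconds per day
reads_per_box = PAGE_READS_PER_SEC / BOXES      # reads spread over all boxes
writes_per_box = writes_per_sec / PAIRS         # writes spread over pairs

print(f"{writes_per_sec:.0f} tweets/s")     # ~579/s, the "about 600"
print(f"{reads_per_box:.0f} reads/s/box")   # ~133/s, matching the comment
print(f"{writes_per_box:.0f} writes/s/box") # ~96/s, the "100/s writes"
```

Note the peak figures (6k/s writes, ~10k/s reads) are roughly 10x these averages, which is why the comment argues each box needs 10x headroom to be "fairly safe for point load".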
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248810</id>
	<title>Re:Don't believe them!</title>
	<author>sconeu</author>
	<datestamp>1266957300000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Damn... you beat me to it.  I was going to say, "Cassandra?  I don't believe it!"</p></htmltext>
<tokenext>Damn... you beat me to it .
I was going to say , " Cassandra ?
I do n't believe it !
"</tokentext>
<sentencetext>Damn... you beat me to it.
I was going to say, "Cassandra?
I don't believe it!
"</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248230</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249600</id>
	<title>Intersystems Cach&#233;</title>
	<author>paugq</author>
	<datestamp>1266916620000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>They should move to <a href="http://intersystems.com/" title="intersystems.com">Intersystems</a> [intersystems.com] <a href="http://intersystems.com/cache/" title="intersystems.com">Cach&#233;</a> [intersystems.com]. SQL, objects, XML and even MUMPS. It will make SQL and NoSQL fans equally happy. And it's damn fast. Much leaner than Oracle, DB2 or Informix, too. Excellent support. Extremely good. Not cheap, though.</htmltext>
<tokenext>They should move to Intersystems [ intersystems.com ] Caché [ intersystems.com ] .
SQL , objects , XML and even MUMPS .
It will make SQL and NoSQL fans equally happy .
And it 's damn fast .
Much leaner than Oracle , DB2 or Informix , too .
Excellent support .
Extremely good .
Not cheap , though .</tokentext>
<sentencetext>They should move to Intersystems [intersystems.com] Caché [intersystems.com].
SQL, objects, XML and even MUMPS.
It will make SQL and NoSQL fans equally happy.
And it's damn fast.
Much leaner than Oracle, DB2 or Informix, too.
Excellent support.
Extremely good.
Not cheap, though.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250556</id>
	<title>Speed *isn't* scalability</title>
	<author>Colin Smith</author>
	<datestamp>1266920520000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Speed is latency. (how long it takes)<br>Scalability is throughput. (how many concurrent). Or put another way: speed is the quality, throughput is the width.</p><div class="quote"><p>who cares what twuufter is running off.</p></div><p>Well, developers, and their managers do. They're nothing if not fashion victims.</p><p>RDBMS aren't the be all and end all of scalability (or speed, they perform a shit load of management functions you may or may not need). While attempting to scale conventional rdbms you get into write consistency problems, lookup performance problems unless you specifically design your data structures properly. You end up fighting with the relational data model.</p><p>Most developers never even think about it, they just develop against their local mysql install and are overjoyed that their app actually runs. Not all apps even need an rdbms. I've seen apps with a single table, two columns, one of which is a key and it's running on an rdbms, because that's what you do... The words WTF sprang to mind.</p>
	</htmltext>
<tokenext>Speed is latency .
( how long it takes ) Scalability is throughput .
( how many concurrent ) .
Or put another way : speed is the quality , throughput is the width .
" who cares what twuufter is running off . " Well , developers , and their managers do .
They 're nothing if not fashion victims .
RDBMS are n't the be all and end all of scalability ( or speed , they perform a shit load of management functions you may or may not need ) .
While attempting to scale conventional rdbms you get into write consistency problems , lookup performance problems unless you specifically design your data structures properly .
You end up fighting with the relational data model .
Most developers never even think about it , they just develop against their local mysql install and are overjoyed that their app actually runs .
Not all apps even need an rdbms .
I 've seen apps with a single table , two columns , one of which is a key and it 's running on an rdbms , because that 's what you do... The words WTF sprang to mind .</tokentext>
<sentencetext>Speed is latency.
(how long it takes)Scalability is throughput.
(how many concurrent).
Or put another way: speed is the quality, throughput is the width.
"Who cares what twuufter is running off." Well, developers, and their managers do.
They're nothing if not fashion victims.
RDBMS aren't the be all and end all of scalability (or speed, they perform a shit load of management functions you may or may not need).
While attempting to scale conventional rdbms you get into write consistency problems, lookup performance problems unless you specifically design your data structures properly.
You end up fighting with the relational data model.
Most developers never even think about it, they just develop against their local mysql install and are overjoyed that their app actually runs.
Not all apps even need an rdbms.
I've seen apps with a single table, two columns, one of which is a key and it's running on an rdbms, because that's what you do... The words WTF sprang to mind.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249966</id>
	<title>Re:network issues?</title>
	<author>KermodeBear</author>
	<datestamp>1266918180000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Could someone explain to me why this kind of speed would be a problem? It seems to me that if BinaryMemtable is so incredibly fast that other things become a bottleneck, then you're in a great position. You have something very fast for storing and retrieving data - you just need to get bigger, faster pipes.</p></htmltext>
<tokenext>Could someone explain to me why this kind of speed would be a problem ?
It seems to me that if BinaryMemtable is so incredibly fast that other things become a bottleneck , then you 're in a great position .
You have something very fast for storing and retrieving data - you just need to get bigger , faster pipes .</tokentext>
<sentencetext>Could someone explain to me why this kind of speed would be a problem?
It seems to me that if BinaryMemtable is so incredibly fast that other things become a bottleneck, then you're in a great position.
You have something very fast for storing and retrieving data - you just need to get bigger, faster pipes.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248592</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31263716</id>
	<title>Re:network issues?</title>
	<author>magus\_melchior</author>
	<datestamp>1265101560000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>I was thinking that myself; if their backplane is being saturated, surely there's a way to throttle the import process using the datacenter network hardware, QoS, or something similar? For that matter, why don't they have a redundant network so that the production net isn't impacted by datacenter ops (I know, I know... cost)?</p></htmltext>
<tokenext>I was thinking that myself ; if their backplane is being saturated , surely there 's a way to throttle the import process using the datacenter network hardware , QoS , or something similar ?
For that matter , why do n't they have a redundant network so that the production net is n't impacted by datacenter ops ( I know , I know... cost ) ?</tokentext>
<sentencetext>I was thinking that myself; if their backplane is being saturated, surely there's a way to throttle the import process using the datacenter network hardware, QoS, or something similar?
For that matter, why don't they have a redundant network so that the production net isn't impacted by datacenter ops (I know, I know... cost)?</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248592</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249820</id>
	<title>Don't want to install Cassandra</title>
	<author>einhverfr</author>
	<datestamp>1266917580000</datestamp>
	<modclass>Funny</modclass>
	<modscore>2</modscore>
	<htmltext><p>I hear Cassandra is really a trojan.  Can anyone verify?  I don't want a trojan on my computer.....</p></htmltext>
<tokenext>I hear Cassandra is really a trojan .
Can anyone verify ?
I do n't want a trojan on my computer.... .</tokentext>
<sentencetext>I hear Cassandra is really a trojan.
Can anyone verify?
I don't want a trojan on my computer.....</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250796</id>
	<title>Re:network issues?</title>
	<author>Bill, Shooter of Bul</author>
	<datestamp>1266921180000</datestamp>
	<modclass>Informative</modclass>
	<modscore>2</modscore>
	<htmltext>Yes and no. They are specifically talking about importing their data into Cassandra, which will be a one-time event, not worth upgrading the network bandwidth for. They need to throttle it to allow for more time sensitive traffic to use the bandwidth. The bandwidth to the database in normal use will be much, much less than the import bandwidth.</htmltext>
<tokenext>Yes and no .
They are specifically talking about importing their data into Cassandra , which will be a one-time event , not worth upgrading the network bandwidth for .
They need to throttle it to allow for more time sensitive traffic to use the bandwidth .
The bandwidth to the database in normal use will be much , much less than the import bandwidth .</tokentext>
<sentencetext>Yes and no.
They are specifically talking about importing their data into Cassandra, which will be a one-time event, not worth upgrading the network bandwidth for.
They need to throttle it to allow for more time sensitive traffic to use the bandwidth.
The bandwidth to the database in normal use will be much, much less than the import bandwidth.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249966</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250644</id>
	<title>Re:pfffft twatter tweeter</title>
	<author>tokul</author>
	<datestamp>1266920760000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><blockquote><div><p>Is there really a huge issue with rdbms speeds? Well if there is something there, that's what needs to be looked at. If RDBMSs are not fast enough, that's just an opportunity to work more on them to speed them up.</p></div>
</blockquote><p>WW2 and Korea called.
</p><p>Is there really a huge issue with those propeller plane speeds? We can always speed them up, right? The fastest prop planes reach 850-870 km/h. The Me-262 reached 900 km/h. The MiG-15 went to 1075 km/h.
</p><p>If other tools are faster and better than an rdbms, then why should people waste their time with the slower option?</p>
	</htmltext>
<tokenext>Is there really a huge issue with rdbms speeds ?
Well if there is something there , that 's what needs to be looked at .
If RDBMSs are not fast enough , that 's just an opportunity to work more on them to speed them up .
WW2 and Korea called .
Is there really a huge issue with those propeller plane speeds ?
We can always speed them up , right ?
The fastest prop planes reach 850-870 km/h .
The Me-262 reached 900 km/h .
The MiG-15 went to 1075 km/h .
If other tools are faster and better than an rdbms , then why should people waste their time with the slower option ?</tokentext>
<sentencetext>Is there really a huge issue with rdbms speeds?
Well if there is something there, that's what needs to be looked at.
If RDBMSs are not fast enough, that's just an opportunity to work more on them to speed them up.
WW2 and Korea called.
Is there really a huge issue with those propeller plane speeds?
We can always speed them up, right?
The fastest prop planes reach 850-870 km/h.
The Me-262 reached 900 km/h.
The MiG-15 went to 1075 km/h.
If other tools are faster and better than an rdbms, then why should people waste their time with the slower option?
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248442</id>
	<title>hmmm</title>
	<author>Anonymous</author>
	<datestamp>1266956340000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext>Facebook uses Cassandra, Digg uses Cassandra, Twitter is moving to Cassandra. Maybe in 5 years Slashdot will get with it.</htmltext>
<tokenext>Facebook uses Cassandra , Digg uses Cassandra , Twitter is moving to Cassandra .
Maybe in 5 years Slashdot will get with it .</tokentext>
<sentencetext>Facebook uses Cassandra, Digg uses Cassandra, Twitter is moving to Cassandra.
Maybe in 5 years Slashdot will get with it.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248592</id>
	<title>network issues?</title>
	<author>Anonymous</author>
	<datestamp>1266956760000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>4</modscore>
	<htmltext><i>We were originally trying to use the BinaryMemtable interface, but we actually found it to be too fast &mdash; it would saturate the backplane of our network.</i> <p>First time I have ever heard anyone say that a database was too fast. Maybe there are network problems that also need to be addressed.</p></htmltext>
<tokenext>We were originally trying to use the BinaryMemtable interface , but we actually found it to be too fast ; it would saturate the backplane of our network .
First time I have ever heard anyone say that a database was too fast .
Maybe there are network problems that also need to be addressed .</tokentext>
<sentencetext>We were originally trying to use the BinaryMemtable interface, but we actually found it to be too fast; it would saturate the backplane of our network.
First time I have ever heard anyone say that a database was too fast.
Maybe there are network problems that also need to be addressed.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248842</id>
	<title>And this is front page news, why?</title>
	<author>Lunix Nutcase</author>
	<datestamp>1266957360000</datestamp>
	<modclass>Interesting</modclass>
	<modscore>2</modscore>
	<htmltext><p>Why is it that whenever twitter makes any random change to some part of its infrastructure that we need a front page story about it?</p></htmltext>
<tokenext>Why is it that whenever twitter makes any random change to some part of its infrastructure that we need a front page story about it ?</tokentext>
<sentencetext>Why is it that whenever twitter makes any random change to some part of its infrastructure that we need a front page story about it?</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31256316</id>
	<title>Re:pfffft twatter tweeter</title>
	<author>DragonWriter</author>
	<datestamp>1266955020000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><blockquote><div><p>If you don't really need a database to run your 'website', then who cares if you use flat files or an in memory hashmap for all your data needs?</p></div></blockquote><p>There is a difference between needing a structured storage mechanism (database) and needing a database that implements the relational model and provides ACID guarantees. Further, many non-relational databases provide specific, weaker forms of ACID guarantees that are better than (say) naive flat file storage would, while providing better scalability in certain applications than existing RDBMS products.</p><p>There's certainly a lot of work going on on providing better scalability for relational databases providing ACID guarantees, too, and as that progresses (because strong ACID guarantees do have value), RDBMS's may be better in some of the roles that "NoSQL" products are good for now. There are challenges to scalability with ACID guarantees, and maybe even some hard barriers, so at best it's going to be easier to build scalable products with weaker guarantees in the near future. And real apps need real solutions now, not solutions that might materialize years down the line.</p><blockquote><div><p>Is there really a huge issue with rdbms speeds?</p></div></blockquote><p>Yes, in certain applications with certain workloads there is. Otherwise people would just use existing products.</p>
	</htmltext>
<tokenext>If you do n't really need a database to run your 'website ' , then who cares if you use flat files or an in memory hashmap for all your data needs ? There is a difference between needing a structured storage mechanism ( database ) and needing a database that implements the relational model and provides ACID guarantees .
Further , many non-relational databases provide specific , weaker forms of ACID guarantees that are better than ( say ) naive flat file storage would , while providing better scalability in certain applications than existing RDBMS products.There 's certainly a lot of work going on on providing better scalability for relational databases providing ACID guarantees , too , and as that progresses ( because strong ACID guarantees do have value ) , RDBMS 's may be better in some of the roles that " NoSQL " products are good for now .
There are challenges to scalability with ACID guarantees , and maybe even some hard barriers , so at best it 's going to be easier to build scalable products with weaker guarantees in the near future .
And real apps need real solutions now , not solutions that might materialize years down the line.Is there really a huge issue with rdbms speeds ? Yes , in certain applications with certain workloads there is .
Otherwise people would just use existing products .</tokentext>
<sentencetext>If you don't really need a database to run your 'website', then who cares if you use flat files or an in memory hashmap for all your data needs?There is a difference between needing a structured storage mechanism (database) and needing a database that implements the relational model and provides ACID guarantees.
Further, many non-relational databases provide specific, weaker forms of ACID guarantees that are better than (say) naive flat file storage would, while providing better scalability in certain applications than existing RDBMS products.There's certainly a lot of work going on on providing better scalability for relational databases providing ACID guarantees, too, and as that progresses (because strong ACID guarantees do have value), RDBMS's may be better in some of the roles that "NoSQL" products are good for now.
There are challenges to scalability with ACID guarantees, and maybe even some hard barriers, so at best it's going to be easier to build scalable products with weaker guarantees in the near future.
And real apps need real solutions now, not solutions that might materialize years down the line.Is there really a huge issue with rdbms speeds?Yes, in certain applications with certain workloads there is.
Otherwise people would just use existing products.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251406</id>
	<title>Re:Twitter needs scalability experts</title>
	<author>codepunk</author>
	<datestamp>1266923280000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>I love Oracle, it is a fine database, but would I personally buy it? Nope, but as long as it is OPM (Other People's Money) I am perfectly fine with it. Now say I was designing something like a medical records system, Oracle would be a no-brainer. Missing a couple of tweets here and there, who is really going to care?</p></htmltext>
<tokenext>I love Oracle , it is a fine database , but would I personally buy it ?
Nope , but as long as it is OPM ( Other People 's Money ) I am perfectly fine with it .
Now say I was designing something like a medical records system , Oracle would be a no-brainer .
Missing a couple of tweets here and there , who is really going to care ?</tokentext>
<sentencetext>I love Oracle, it is a fine database, but would I personally buy it?
Nope, but as long as it is OPM (Other People's Money) I am perfectly fine with it.
Now say I was designing something like a medical records system, Oracle would be a no-brainer.
Missing a couple of tweets here and there, who is really going to care?</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31255048</id>
	<title>Amazing</title>
	<author>caller9</author>
	<datestamp>1266942720000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Cassandra has the goods for high availability and is optimized for non-financial data.</p><p>That said, I am amazed at how much time, money, and effort has gone into Twitter.</p><p>Now a distributed scalable super duper database will keep track of who is pooping. <a href="http://poop.obtoose.com/" title="obtoose.com">http://poop.obtoose.com/</a> [obtoose.com]</p></htmltext>
<tokenext>Cassandra has the goods for high availability and is optimized for non-financial data .
That said , I am amazed at how much time , money , and effort has gone into Twitter .
Now a distributed scalable super duper database will keep track of who is pooping .
http : //poop.obtoose.com/ [ obtoose.com ]</tokentext>
<sentencetext>Cassandra has the goods for high availability and is optimized for non-financial data.
That said, I am amazed at how much time, money, and effort has gone into Twitter.
Now a distributed scalable super duper database will keep track of who is pooping.
http://poop.obtoose.com/ [obtoose.com]</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31253876</id>
	<title>Re:pfffft twatter tweeter</title>
	<author>Eil</author>
	<datestamp>1266934740000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Just like there is no universal programming language for every type of software, there is no universal database engine for every type of data storage.</p></htmltext>
<tokenext>Just like there is no universal programming language for every type of software , there is no universal database engine for every type of data storage .</tokentext>
<sentencetext>Just like there is no universal programming language for every type of software, there is no universal database engine for every type of data storage.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252822</id>
	<title>How hard can it be?</title>
	<author>FloydTheDroid</author>
	<datestamp>1266929520000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>I was going to try to write something funny about Twitter only needing three tables to run and how hard it is to change, but then I thought about how much money they're going to make off those three tables, and I started to cry.</htmltext>
<tokenext>I was going to try to write something funny about Twitter only needing three tables to run and how hard it is to change , but then I thought about how much money they 're going to make off those three tables , and I started to cry .</tokentext>
<sentencetext>I was going to try to write something funny about Twitter only needing three tables to run and how hard it is to change, but then I thought about how much money they're going to make off those three tables, and I started to cry.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252634</id>
	<title>Re:Twitter needs scalability experts</title>
	<author>lawpoop</author>
	<datestamp>1266928680000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><div class="quote"><p>I only use FOSS when it happens to be best-in-class</p></div><p>Just curious, what FOSS have/do you use?</p>
	</htmltext>
<tokenext>I only use FOSS when it happens to be best-in-class
Just curious , what FOSS have/do you use ?</tokentext>
<sentencetext>I only use FOSS when it happens to be best-in-class
Just curious, what FOSS have/do you use?
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31257468</id>
	<title>Re:Too bad they dont about TPF/ZTPF and TPFDB/ACPD</title>
	<author>TheSunborn</author>
	<datestamp>1265111040000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>The problem is that 10K-12K transactions is 1/100 of what Twitter needs.</p></htmltext>
<tokenext>The problem is that 10K-12K transactions is 1/100 of what Twitter needs .</tokentext>
<sentencetext>The problem is that 10K-12K transactions is 1/100 of what Twitter needs.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250600</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250150</id>
	<title>Re:network issues?</title>
	<author>Anonymous</author>
	<datestamp>1266918840000</datestamp>
	<modclass>Funny</modclass>
	<modscore>1</modscore>
	<htmltext><blockquote><div><p>you just need to get bigger, faster pipes</p></div></blockquote><p>That's what <em>she</em> said!</p>
	</htmltext>
<tokenext>you just need to get bigger , faster pipes
That 's what she said !</tokentext>
<sentencetext>you just need to get bigger, faster pipes
That's what she said!
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249966</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251480</id>
	<title>Re:pfffft twatter tweeter</title>
	<author>Knowbuddy</author>
	<datestamp>1266923520000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>I don't think you understand the niche that NoSQL databases are trying to fill.</p><blockquote><div><p>The more interesting aspect of all of this 'NoSQL' movement is how they believe that if they achieve some speed improvement against some relational databases, how that makes them so much better.</p></div></blockquote><p>It's not a black and white, panacea-type situation.  Relational databases are good at some things, non-relational databases are good at others.  Where non-relational databases are better is at solving very specific problems, many of which happen to map directly to the needs of web developers.</p><p>A Viper is a fun car to take you to and from work, but it's probably not the best to shuttle around a little league baseball team--that's what minivans are for.  (Whether the Viper is the relational or non-relational database in the analogy is up to you.)</p><p>I teach a course titled <em>Advanced Database Concepts</em>, so I'll give you the same example I give my students: blogs.  It's the sort of canonical example--I didn't make it up.</p><p>To show a blog's home page, you need a list of recent posts.  Each post is probably associated with a category, maybe some tags, and an author.  Just to get that data, you're looking at joining 3 tables: Posts, Categories, and Users.  What if you want a comment count?  That's another join, and the query just got hairier--do you do a simple aggregation (join then group), or do you see that might be inefficient and so transform it into a harder-to-read-but-more-efficient subquery?  That might even involve a fifth join, if you have registered user accounts and avatars for your commenters.</p><p>All of which is fine and good until you're running LiveJournal or WordPress.com and you have millions of bloggers generating hundreds of millions of posts and who knows how many comments.  With beefy machines and proper indexes you're probably okay<nobr> <wbr></nobr>... 
but I wouldn't want to be the DBA who had to tell management that a new column needed to be added to any of those tables.</p><p>Enter NoSQL/non-relational databases: why not fetch everything with just one query?  (I'd show you some JSON, as that's what many of the NoSQL databases speak, but the<nobr> <wbr></nobr>/. filter considers it too much junk.)  You put your comments in the same document as your posts, and the replies to those comments in child arrays, and the user info right inside the comments.  If your users can't change their username, this isn't a bad solution.  There are other tricks, but the point is that you reduce everything down to a single denormalized query.</p><p>This design makes it trivially easy to build data-driven web pages, as effectively every web language has a JSON deserializer.  No ORM impedance mismatch, and you get horizontal scalability pretty much for free.</p><blockquote><div><p>If you don't really need a database to run your 'website', then who cares if you use flat files or an in memory hashmap for all your data needs?</p></div></blockquote><p>Because it's still a database, even if it's non-relational.  You're still doing inserts and updates and deletes, you just get a nice hunk of denormalized clay to play with instead of the normalized rigidity of Tinker Toys.</p><blockquote><div><p>I think that relational databases are good at what they do and that many projects may not need them, but if you do need them on the back end, you will end up with them on the back end.</p></div></blockquote><p>But that's the point I think you're missing: until relatively recently, relational databases were the only game in town.  Relational databases are ubiquitous because they solved the problems of the 60s-90s.  They aren't going anywhere, as those types of problems (financial, transactional, etc) aren't going anywhere.  But now we have a relatively new class of problems (graphs, etc) that need to be nailed down just as thoroughly.  
Many web applications are straining to fit within the relational model, and this explosion of NoSQL software is because people are realizing that all that straining can't be good for them.</p><blockquote><div><p>Is there really a huge issue with rdbms speeds?</p></div></blockquote><p>Others may disagree with me, but I make the point that it's not the speed that is the problem--it's the scalability.  Sure, you can keep throwing memory and horsepower at the problem and hope that covers it.  Partitioning data across database servers is a tough problem, with no perfect solutions.  But why keep fighting on the bleeding edge of technology like that?  NoSQL databases are designed to be easily horizontally scalable from the start (in many cases you wouldn't believe how easy it is), even though they may lack the huge feature set of an Oracle or Teradata solution.</p><p>But just to reiterate: non-relational databases aren't a panacea, either.  Each type of database has its good points and bad points.  Each solves a set of problems, some of which overlap and some don't.  The reason you should care is because now you have more options--you're not stuck trying to wedge your system into a relational model if you don't want to.  And isn't<nobr> <wbr></nobr>/. all about freedom of choice?</p>
	</htmltext>
<tokenext>I do n't think you understand the niche that NoSQL databases are trying to fill.The more interesting aspect of all of this 'NoSQL ' movement is how they believe that if they achieve some speed improvement against some relational databases , how that makes them so much better.It 's not a black and white , panacea-type situation .
Relational databases are good at some things , non-relational databases are good at others .
Where non-relational databases are better is at solving very specific problems , many of which happen to map directly to the needs of web developers.A Viper is a fun car to take you to and from work , but it 's probably not the best to shuttle around a little league baseball team--that 's what minivans are for .
( Whether the Viper is the relational or non-relational database in the analogy is up to you .
) I teach a course titled Advanced Database Concepts , so I 'll give you the same example I give my students : blogs .
It 's the sort of canonical example--I did n't make it up.To show a blog 's home page , you need a list of recent posts .
Each post is probably associated with a category , maybe some tags , and an author .
Just to get that data , you 're looking at joining 3 tables : Posts , Categories , and Users .
What if you want a comment count ?
That 's another join , and the query just got hairier--do you do a simple aggregation ( join then group ) , or do you see that might be inefficient and so transform it into a harder-to-read-but-more-efficient subquery ?
That might even involve a fifth join , if you have registered user accounts and avatars for your commenters.All of which is fine and good until you 're running LiveJournal or WordPress.com and you have millions of bloggers generating hundreds of millions of posts and who knows how many comments .
With beefy machines and proper indexes you 're probably okay ... but I would n't want to be the DBA who had to tell management that a new column needed to be added to any of those tables.Enter NoSQL/non-relational databases : why not fetch everything with just one query ?
( I 'd show you some JSON , as that 's what many of the NoSQL databases speak , but the / .
filter considers it too much junk .
) You put your comments in the same document as your posts , and the replies to those comments in child arrays , and the user info right inside the comments .
If your users ca n't change their username , this is n't a bad solution .
There are other tricks , but the point is that you reduce everything down to a single denormalized query.This design makes it trivially easy to build data-driven web pages , as effectively every web language has a JSON deserializer .
No ORM impedance mismatch , and you get horizontal scalability pretty much for free.If you do n't really need a database to run your 'website ' , then who cares if you use flat files or an in memory hashmap for all your data needs ? Because it 's still a database , even if it 's non-relational .
You 're still doing inserts and updates and deletes , you just get a nice hunk of denormalized clay to play with instead of the normalized rigidity of Tinker Toys.I think that relational databases are good at what they do and that many projects may not need them , but if you do need them on the back end , you will end up with them on the back end.But that 's the point I think you 're missing : until relatively recently , relational databases were the only game in town .
Relational databases are ubiquitous because they solved the problems of the 60s-90s .
They are n't going anywhere , as those types of problems ( financial , transactional , etc ) are n't going anywhere .
But now we have a relatively new class of problems ( graphs , etc ) that need to be nailed down just as thoroughly .
Many web applications are straining to fit within the relational model , and this explosion of NoSQL software is because people are realizing that all that straining ca n't be good for them.Is there really a huge issue with rdbms speeds ? Others may disagree with me , but I make the point that it 's not the speed that is the problem--it 's the scalability .
Sure , you can keep throwing memory and horsepower at the problem and hope that covers it .
Partitioning data across database servers is a tough problem , with no perfect solutions .
But why keep fighting on the bleeding edge of technology like that ?
NoSQL databases are designed to be easily horizontally scalable from the start ( in many cases you would n't believe how easy it is ) , even though they may lack the huge feature set of an Oracle or Teradata solution.But just to reiterate : non-relational databases are n't a panacea , either .
Each type of database has its good points and bad points .
Each solves a set of problems , some of which overlap and some do n't .
The reason you should care is because now you have more options--you 're not stuck trying to wedge your system into a relational model if you do n't want to .
And is n't / .
all about freedom of choice ?</tokentext>
<sentencetext>I don't think you understand the niche that NoSQL databases are trying to fill.The more interesting aspect of all of this 'NoSQL' movement is how they believe that if they achieve some speed improvement against some relational databases, how that makes them so much better.It's not a black and white, panacea-type situation.
Relational databases are good at some things, non-relational databases are good at others.
Where non-relational databases are better is at solving very specific problems, many of which happen to map directly to the needs of web developers.A Viper is a fun car to take you to and from work, but it's probably not the best to shuttle around a little league baseball team--that's what minivans are for.
(Whether the Viper is the relational or non-relational database in the analogy is up to you.
)I teach a course titled Advanced Database Concepts, so I'll give you the same example I give my students: blogs.
It's the sort of canonical example--I didn't make it up.To show a blog's home page, you need a list of recent posts.
Each post is probably associated with a category, maybe some tags, and an author.
Just to get that data, you're looking at joining 3 tables: Posts, Categories, and Users.
What if you want a comment count?
That's another join, and the query just got hairier--do you do a simple aggregation (join then group), or do you see that might be inefficient and so transform it into a harder-to-read-but-more-efficient subquery?
That might even involve a fifth join, if you have registered user accounts and avatars for your commenters.All of which is fine and good until you're running LiveJournal or WordPress.com and you have millions of bloggers generating hundreds of millions of posts and who knows how many comments.
With beefy machines and proper indexes you're probably okay ... but I wouldn't want to be the DBA who had to tell management that a new column needed to be added to any of those tables.Enter NoSQL/non-relational databases: why not fetch everything with just one query?
(I'd show you some JSON, as that's what many of the NoSQL databases speak, but the /.
filter considers it too much junk.
)  You put your comments in the same document as your posts, and the replies to those comments in child arrays, and the user info right inside the comments.
If your users can't change their username, this isn't a bad solution.
There are other tricks, but the point is that you reduce everything down to a single denormalized query.This design makes it trivially easy to build data-driven web pages, as effectively every web language has a JSON deserializer.
No ORM impedance mismatch, and you get horizontal scalability pretty much for free.If you don't really need a database to run your 'website', then who cares if you use flat files or an in memory hashmap for all your data needs?Because it's still a database, even if it's non-relational.
You're still doing inserts and updates and deletes, you just get a nice hunk of denormalized clay to play with instead of the normalized rigidity of Tinker Toys.I think that relational databases are good at what they do and that many projects may not need them, but if you do need them on the back end, you will end up with them on the back end.But that's the point I think you're missing: until relatively recently, relational databases were the only game in town.
Relational databases are ubiquitous because they solved the problems of the 60s-90s.
They aren't going anywhere, as those types of problems (financial, transactional, etc) aren't going anywhere.
But now we have a relatively new class of problems (graphs, etc) that need to be nailed down just as thoroughly.
Many web applications are straining to fit within the relational model, and this explosion of NoSQL software is because people are realizing that all that straining can't be good for them.
Is there really a huge issue with RDBMS speeds?
Others may disagree with me, but my point is that it's not the speed that's the problem--it's the scalability.
Sure, you can keep throwing memory and horsepower at the problem and hope that covers it.
Partitioning data across database servers is a tough problem, with no perfect solutions.
But why keep fighting on the bleeding edge of technology like that?
NoSQL databases are designed to be easily horizontally scalable from the start (in many cases you wouldn't believe how easy it is), even though they may lack the huge feature set of an Oracle or Teradata solution.
But just to reiterate: non-relational databases aren't a panacea, either.
Each type of database has its good points and bad points.
Each solves a set of problems, some of which overlap and some don't.
The reason you should care is because now you have more options--you're not stuck trying to wedge your system into a relational model if you don't want to.
And isn't /. all about freedom of choice?
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030</parent>
</comment>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_33</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249014
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248842
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_10</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252952
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_12</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252634
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_26</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250970
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249250
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_8</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250500
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248370
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_1</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249816
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248842
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_41</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31253252
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249966
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248592
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_28</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249938
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248370
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_31</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31253742
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_19</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251816
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250600
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_24</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250556
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_22</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31255304
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_7</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248482
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248230
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_47</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252518
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250600
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_6</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31255320
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_37</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31261902
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_18</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31253408
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_21</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249390
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_17</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250618
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_20</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31254070
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_5</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251406
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_11</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251350
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248370
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_45</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250684
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_36</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250644
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_0</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31263716
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248592
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_35</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31257162
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_40</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250150
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249966
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248592
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_16</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251402
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_29</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31253390
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249820
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_32</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31254744
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251480
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_34</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252274
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249600
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_48</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248780
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248442
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_25</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252622
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248842
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_9</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31254618
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_49</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249292
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248442
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_27</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248810
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248230
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_4</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251208
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249450
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_30</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31255622
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252652
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_44</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251274
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249250
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_46</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249142
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248230
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_23</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31256316
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_14</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252550
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251430
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_13</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31253880
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_15</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31257468
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250600
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_3</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31253876
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_38</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252814
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250930
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248994
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248842
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_2</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252514
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_43</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252746
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249250
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_39</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250796
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249966
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248592
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_10_02_23_1826226_42</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31263688
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104
</commentlist>
</thread>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_02_23_1826226.3</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252652
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31255622
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_02_23_1826226.1</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251430
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252550
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_02_23_1826226.4</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250600
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31257468
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252518
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251816
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_02_23_1826226.12</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248842
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249014
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252622
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248994
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250930
---http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252814
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249816
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_02_23_1826226.2</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249160
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_02_23_1826226.10</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249820
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31253390
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_02_23_1826226.11</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248230
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249142
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248810
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248482
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_02_23_1826226.7</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249600
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252274
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_02_23_1826226.13</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250104
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31263688
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250618
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252514
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31254070
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252952
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251406
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251402
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31253742
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31253880
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252634
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250684
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_02_23_1826226.5</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248654
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_02_23_1826226.8</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248370
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249938
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250500
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251350
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_02_23_1826226.6</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248592
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249966
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250796
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250150
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31253252
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31263716
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_02_23_1826226.9</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248442
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249292
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31248780
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation10_02_23_1826226.0</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249030
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31261902
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249390
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31257162
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250644
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249250
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31252746
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251274
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250970
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31255304
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31256316
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31253408
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251480
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31254744
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31253876
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31249450
--http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31251208
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31250556
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31254618
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment10_02_23_1826226.31255320
</commentlist>
</conversation>
