<article>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#article09_10_13_0114230</id>
	<title>Getting Students To Think At Internet Scale</title>
	<author>kdawson</author>
	<datestamp>1255424880000</datestamp>
<htmltext><a href="http://hughpickens.com/" rel="nofollow">Hugh Pickens</a> writes <i>"The NY Times reports that researchers and workers in fields as diverse as biotechnology, astronomy, and computer science will soon find themselves overwhelmed with information &mdash; so the next generation of computer scientists will have to <a href="http://www.nytimes.com/2009/10/12/technology/12data.html">learn to think in terms of Internet scale</a> of petabytes of data. For the most part, university students have used rather modest computing systems to support their studies, but these machines fail to churn through enough data to really challenge and train young minds to ponder the mega-scale problems of tomorrow. 'If they imprint on these small systems, that becomes their frame of reference and what they're always thinking about,' said Jim Spohrer, a director at IBM's Almaden Research Center. This year, the National Science Foundation funded 14 universities that want to teach their students how to grapple with big data questions. Students are beginning to work with data sets like the <a href="http://www.lsst.org/lsst/science/datamgmt_products">Large Synoptic Survey Telescope</a>, the largest public data set in the world. The telescope takes detailed images of large chunks of the sky and produces about 30 terabytes of data each night. 'Science these days has basically turned into a data-management problem,' says Jimmy Lin, an associate professor at the University of Maryland."</i></htmltext>
<tokentext>Hugh Pickens writes " The NY Times reports that researchers and workers in fields as diverse as biotechnology , astronomy , and computer science will soon find themselves overwhelmed with information    so the next generation of computer scientists will have to learn to think in terms of Internet scale of petabytes of data .
For the most part , university students have used rather modest computing systems to support their studies , but these machines fail to churn through enough data to really challenge and train young minds to ponder the mega-scale problems of tomorrow .
'If they imprint on these small systems , that becomes their frame of reference and what they 're always thinking about, ' said Jim Spohrer , a director at IBM 's Almaden Research Center .
This year , the National Science Foundation funded 14 universities that want to teach their students how to grapple with big data questions .
Students are beginning to work with data sets like the Large Synoptic Survey Telescope , the largest public data set in the world .
The telescope takes detailed images of large chunks of the sky and produces about 30 terabytes of data each night .
'Science these days has basically turned into a data-management problem, ' says Jimmy Lin , an associate professor at the University of Maryland .
"</tokentext>
<sentencetext>Hugh Pickens writes "The NY Times reports that researchers and workers in fields as diverse as biotechnology, astronomy, and computer science will soon find themselves overwhelmed with information — so the next generation of computer scientists will have to learn to think in terms of Internet scale of petabytes of data.
For the most part, university students have used rather modest computing systems to support their studies, but these machines fail to churn through enough data to really challenge and train young minds to ponder the mega-scale problems of tomorrow.
'If they imprint on these small systems, that becomes their frame of reference and what they're always thinking about,' said Jim Spohrer, a director at IBM's Almaden Research Center.
This year, the National Science Foundation funded 14 universities that want to teach their students how to grapple with big data questions.
Students are beginning to work with data sets like the Large Synoptic Survey Telescope, the largest public data set in the world.
The telescope takes detailed images of large chunks of the sky and produces about 30 terabytes of data each night.
'Science these days has basically turned into a data-management problem,' says Jimmy Lin, an associate professor at the University of Maryland.
"</sentencetext>
</article>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729765</id>
	<title>A fantastic idea</title>
	<author>Anonymous</author>
	<datestamp>1255430820000</datestamp>
	<modclass>Interesting</modclass>
	<modscore>2</modscore>
<htmltext><p>This is a great idea.<br>
&nbsp; &nbsp; Even in business we often hit problems with systems designed by people who just don't think about real-world data volumes. I work in the ERP vendor space (SAP, Oracle, PeopleSoft and so on), and their in-house systems aren't designed to simulate real-world data, so their performance is shocking when you load real throughput into them. So many times have I seen graduates think Microsoft systems can take enterprise volumes of data, and then be shocked when they build something that collapses under a few terabytes or so! I'm used to having to post millions of transactions a day, and there isn't an MS system in the world that deals with that. No offence to MS: we use Excel for reporting and drilldowns, and Access a lot, but understanding the limitations of a tool, what it can really handle and scale to, is essential. As is understanding what large data volumes actually are these days!</p><p>I know of a large bank that put in an ERP system using Intel and MS SQL Server (with lots of press). We were a bit shocked, actually, because that bank was larger than we were, and we had mainframes struggling to cope with our transaction load.<br>In fact, I was hauled over the coals for the cost of our hardware, so I investigated. The Intel/MS solution failed so miserably that they quietly shut it down and moved back to their mainframe, with no press! It wasn't able to cope with the merest fraction of the load, and couldn't have. However, the people involved had no conception of what large meant (they thought a faster processor was all you needed; it never occurred to them that you get something for all the extra money you pay for a mainframe!)</p><p>I think this is a terrific idea, but not only at Internet scale: they should teach this so the students understand these concepts for any large corporation they may work for!</p></htmltext>
<tokentext>This is a great idea     Even in business we often hit problems with systems that are designed by people that just dont think about real world data volumes .
I work in the ERP vendor SPACE ( SAP , ORACLE , PEOPLESOFT and so on ) and their inhouse systems arent designed to simulate real world data and so their performance is shocking when you load real throughput into them .
AND so many times have I seen graduates think Microsoft systems can take enterprise volumes of data - and are shocked when the build something that collapses under a few terabytes or so !
Im used to having to post millions of transactions a day and there isnt an MS system in the world that deals with that .
No offence to MS - we use excel for reporting and drilldowns and access a lot but understanding the limitations of the tools what it can really handle and scale to is essential .
As well as understanding what large data volumes actually are these days ! I know of a large bank that put in an ERP system using INTEL and MS SQL SERVER ( with LOTS of press ) .
We were a bit shocked actually because that bank was larger than we were and we had mainframes struggling to cope with our transaction load.In fact I was hauled over the coals for the cost of our hardware - so i investigate .
The INTEL / MS solution failed so miserably they quietly shut it down and moved back to their mainframe - no press ! .
It wasnt able to cope with the merest fraction of the load and couldnt have .
However the people involved had no conception of what large meant ( and they thought that a faster processor was all you needed - it never occurred to them you get something for all the extra money you pay for in a mainframe !
) I think this is a terrific idea - but not only a the whole internet but they should teach this so the students understand these concepts for any large corporation they may work for !</tokentext>
<sentencetext>This is a great idea
    Even in business we often hit problems with systems that are designed by people that just dont think about real world data volumes.
I work in the ERP vendor SPACE  (SAP, ORACLE, PEOPLESOFT and so on)  and their inhouse systems arent designed to simulate real world  data and so  their performance is shocking when you load  real throughput into them.
AND so many times have I seen graduates think Microsoft systems can take enterprise volumes of data - and are shocked when the build something that collapses under a few terabytes  or so !
Im used to having to post millions of transactions a day and there isnt an MS system in the world that deals with that.
No offence to MS -  we use excel for reporting and drilldowns and  access a lot but understanding the limitations of the tools what it can really handle and scale to is essential.
As well as understanding what large data volumes actually are these days !I know of a large bank that put in an ERP system using INTEL and MS SQL SERVER (with LOTS of press).
We were a bit shocked actually because that bank was larger than we were and we had mainframes struggling to cope with our transaction load.In fact I was hauled over the coals for the cost of our hardware - so i investigate.
The INTEL / MS solution failed so miserably they quietly shut it down and moved back to their mainframe - no press !.
It wasnt able to cope with the merest fraction of the load and couldnt have.
However the people involved had no conception of what large meant ( and they thought that a faster processor was all you needed - it never occurred to them you get something for all the extra money you pay for in a mainframe !
)I think this is a terrific idea - but not only a the whole internet but they should teach this so the students understand these concepts  for any large corporation they may work for !</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730087</id>
	<title>data reduction is its own discipline</title>
	<author>petes_PoV</author>
	<datestamp>1255435440000</datestamp>
	<modclass>Troll</modclass>
	<modscore>0</modscore>
	<htmltext>A degree course is the first step, not the final result in a worthwhile scientific education. You don't expect to teach every student every technique they might use in every job they could get. Most of them won't even go into research - so there is a lot of waste teaching people skills that only a few will need. Far better to focus on the foundations (which could well include the basics of data analysis), rather than spending time on the ins and outs of products that are in use today - and will therefore be obsolete when they graduate and need to use that skill.
<p>
You could very well argue that it's not even a scientist's job to turn petabytes of data into kilobytes of information - that's a technician's role. Scientists are there to create the knowledge, not do the lab assistant's job.</p></htmltext>
<tokentext>A degree course is the first step , not the final result in a worthwhile scientific education .
You do n't expect to teach every student every technique they might use in every job they could get .
Most of them wo n't even go into research - so there is a lot of waste teaching people skills that only a few will need .
Far better to focus on the foundations ( which could well include the basics of data analysis ) , rather than spending time on the ins and outs of products that are in use today - and will therefore be obsolete when they graduate and need to use that skill .
You could very well argue that it 's not even a scientists job to turn petabytes of data into kilobytes of information - that 's a technicians role .
Scientists are there to create the knowledge , not do the lab assistant 's job .</tokentext>
<sentencetext>A degree course is the first step, not the final result in a worthwhile scientific education.
You don't expect to teach every student every technique they might use in every job they could get.
Most of them won't even go into research - so there is a lot of waste teaching people skills that only a few will need.
Far better to focus on the foundations (which could well include the basics of data analysis), rather than spending time on the ins and outs of products that are in use today - and will therefore be obsolete when they graduate and need to use that skill.
You could very well argue that it's not even a scientists job to turn petabytes of data into kilobytes of information - that's a technicians role.
Scientists are there to create the knowledge, not do the lab assistant's job.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29738995</id>
	<title>As simple as possible, but no simpler</title>
	<author>Anonymous</author>
	<datestamp>1255436820000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Hopefully the instructors are being a bit more sensible than the summary implies and are teaching students that problems at different scales require different approaches to finding solutions. For a small embedded system, simplicity and efficiency are key. Too many levels of abstraction and caching and you will have a lousy system that barely runs on the target processor. At the opposite end of the scale, appropriate abstractions and caching are absolutely essential in order to effectively manage complex systems with large numbers of transactions or large volumes of data (or both). Keep things too simple and the system will fail to scale adequately.</p><p>For any given system you want to try to hit that sweet spot of engineering design: keeping things as simple as possible, but no simpler.</p></htmltext>
<tokentext>Hopefully the instructors are being a bit more sensible than the summary implies and are teaching students that problems at different scales require different approaches to finding solutions .
For a small embedded system , simplicity and efficiency are key .
Too many levels of abstraction and caching and you will have a lousy system that barely runs on the target processor .
At the opposite end of the scale , appropriate abstractions and caching are absolutely essential in order to effectively manage complex systems with large numbers of transactions or large volumes of data ( or both ) .
Keep things too simple and the system will fail to scale adequately.For any given system you want to try to hit that sweet spot of engineering design : keeping things as simple as possible , but no simpler .</tokentext>
<sentencetext>Hopefully the instructors are being a bit more sensible than the summary implies and are teaching students that problems at different scales require different approaches to finding solutions.
For a small embedded system, simplicity and efficiency are key.
Too many levels of abstraction and caching and you will have a lousy system that barely runs on the target processor.
At the opposite end of the scale, appropriate abstractions and caching are absolutely essential in order to effectively manage complex systems with large numbers of transactions or large volumes of data (or both).
Keep things too simple and the system will fail to scale adequately.For any given system you want to try to hit that sweet spot of engineering design: keeping things as simple as possible, but no simpler.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29733247</id>
	<title>Re:Indeed</title>
	<author>BJ_Covert_Action</author>
	<datestamp>1255455120000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
<htmltext><div class="quote"><p>Unsurprisingly, all of it can't be stored. There's a dedicated group of people whose only job is to make sure that only relevant information is extracted, and another small group whose only job is to make sure that all this information can be stored, accessed, and processed at large scales.</p></div><p>
I didn't know they needed Perl coders at CERN. No wonder everyone is afraid of the LHC destroying the world...
<br> <br>
=P</p>
	</htmltext>
<tokentext>Unsurprisingly , all of it ca n't be stored .
There 's a dedicated group of people whose only job is to make sure that only relevant information is extracted , and another small group whose only job is to make sure that all this information can be stored , accessed , and processed at large scales .
I did n't know they needed perl coders at CERN .
No wonder everyone is afraid of the LHC destroying the world.. . = P</tokentext>
<sentencetext>Unsurprisingly, all of it can't be stored.
There's a dedicated group of people whose only job is to make sure that only relevant information is extracted, and another small group whose only job is to make sure that all this information can be stored, accessed, and processed at large scales.
I didn't know they needed perl coders at CERN.
No wonder everyone is afraid of the LHC destroying the world...
 
=P
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729861</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730943</id>
	<title>Past Time to Stop Using int</title>
	<author>Anonymous</author>
	<datestamp>1255443840000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>Is there a single intro to programming book that uses long in favor of int?  Just like double has replaced float for almost all numerical calculations, we need long to replace int.</htmltext>
<tokentext>Is there a single intro to programming book that uses long in favor of int ?
Just like double has replaced float for almost all numerical calculations , we need long to replace int .</tokentext>
<sentencetext>Is there a single intro to programming book that uses long in favor of int?
Just like double has replaced float for almost all numerical calculations, we need long to replace int.</sentencetext>
</comment>
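The int-versus-long point in the comment above can be made concrete. This sketch (class and variable names are mine, purely illustrative) shows a 32-bit int silently wrapping around while a 64-bit long comfortably counts byte totals at the scale the article mentions:

```java
public class IntVsLong {
    public static void main(String[] args) {
        // A 32-bit int tops out at 2,147,483,647 and silently wraps on overflow:
        int counter = Integer.MAX_VALUE;
        counter += 1;
        System.out.println(counter);      // prints -2147483648

        // A 64-bit long can count ~30 TB (one LSST night, per the summary) in bytes:
        long nightlyBytes = 30L * 1024 * 1024 * 1024 * 1024;
        System.out.println(nightlyBytes); // prints 32985348833280
    }
}
```

Note the `30L` literal: written as plain `30 * 1024 * ...`, the multiplication would be done in 32-bit int arithmetic and overflow before the widening to long.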
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29741651</id>
	<title>Re:Data management problem</title>
	<author>jawahar</author>
	<datestamp>1255463040000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
<htmltext><blockquote><div><p>Getting Students To Think At Internet Scale</p></div></blockquote><p>

After reading the headline, I thought this was an extension to <a href="http://www.kegel.com/c10k.html" title="kegel.com" rel="nofollow">http://www.kegel.com/c10k.html</a> [kegel.com]</p>
	</htmltext>
<tokentext>Getting Students To Think At Internet Scale After reading the headline , I thought this is an extension to http : //www.kegel.com/c10k.html [ kegel.com ]</tokentext>
<sentencetext>Getting Students To Think At Internet Scale

After reading the headline, I thought this is an extension to http://www.kegel.com/c10k.html [kegel.com]
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729651</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729909</id>
	<title>Well...</title>
	<author>DavidR1991</author>
	<datestamp>1255432740000</datestamp>
	<modclass>None</modclass>
	<modscore>2</modscore>
	<htmltext><p>If you swap the focus from smaller size problems to the mega-scale problems, then you get a bunch of students who can only do mega-scale problems (reverse of the trend the article talks about)</p><p>Here's the rub: It's easier to scale up than it is to scale down. Most big problems are made up of lots of little problems. Little problems are rarely made up of mega-scale problems...</p><p>I think what they need to do is to keep the focus on the small/'regular' stuff, but <i>also</i> show how their knowledge applies to the "big stuff" (so they can 'see' problems from both ends) - not just focus on one or the other</p></htmltext>
<tokentext>If you swap the focus from smaller size problems to the mega-scale problems , then you get a bunch of students who can only do mega-scale problems ( reverse of the trend the article talks about ) Here 's the rub : It 's easier to scale up than it is to scale down .
Most big problems are made up of lots of little problems .
Little problems are rarely made up of mega-scale problems...I think what they need to do is to keep the focus on the small/'regular ' stuff , but also show how their knowledge applies to the " big stuff " ( so they can 'see ' problems from both ends ) - not just focus on one or the other</tokentext>
<sentencetext>If you swap the focus from smaller size problems to the mega-scale problems, then you get a bunch of students who can only do mega-scale problems (reverse of the trend the article talks about)Here's the rub: It's easier to scale up than it is to scale down.
Most big problems are made up of lots of little problems.
Little problems are rarely made up of mega-scale problems...I think what they need to do is to keep the focus on the small/'regular' stuff, but also show how their knowledge applies to the "big stuff" (so they can 'see' problems from both ends) - not just focus on one or the other</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730005</id>
	<title>Work at enterprise...</title>
	<author>SharpFang</author>
	<datestamp>1255434240000</datestamp>
	<modclass>Interesting</modclass>
	<modscore>2</modscore>
<htmltext><p>It was a very surprising experience, moving from small services where you get maybe 10 hits per minute, to a corporation that receives several thousand hits per second.</p><p>There was a layer of cache between each of the 4 application layers (database, back-end, front-end and adserver), and whenever a generic cache wouldn't cut it, a custom one was applied. On my last project there, the dedicated caching system could reduce some 5000 hits per second to 1 database query per 5 seconds. That was way overengineered even for our needs, but it was a pleasure watching the backend compress several thousand requests into one, and the frontend split into pieces of "very strong cache, keep in browser cache for weeks", "strong caching, refresh once per 15 min site-wide", "weak caching, refresh site-wide every 30s" and "no caching, per-visitor data", with the first being some 15K of Javascript, the second about 5K of generic content data, the third about 100 bytes of immediate reports and the last some 10 bytes of user prefs and choices.</p></htmltext>
<tokentext>It was a very surprising experience , moving from small services where you get 10 hits per minute maybe , to a corporation that receives several thousands hits per second.There was a layer of cache between each of 4 application layers ( database , back-end , front-end and adserver ) , and whenever a generic cache would n't cut it , a custom one was applied .
On my last project there , the dedicated caching system could reduce some 5000 hits per second to 1 database query per 5 seconds - way overengineered even for our needs but it was a pleasure watching the backend compressing several thousands requests into one , and the frontend split into pieces of " very strong cache , keep in browser cache for weeks " , " strong caching , refresh once/15 min site-wide " , " weak caching , refresh site-wide every 30s " and " no caching , per visitor data " with the first being some 15K of Javascript , the second about 5K of generic content data , the third about 100 bytes of immediate reports and the last some 10 bytes of user prefs and choices .</tokentext>
<sentencetext>It was a very surprising experience, moving from small services where you get 10 hits per minute maybe, to a corporation that receives several thousands hits per second.There was a layer of cache between each of 4 application layers (database, back-end, front-end and adserver), and whenever a generic cache wouldn't cut it, a custom one was applied.
On my last project there, the dedicated caching system could reduce some 5000 hits per second to 1 database query per 5 seconds - way overengineered even for our needs but it was a pleasure watching the backend compressing several thousands requests into one, and the frontend split into pieces of "very strong cache, keep in browser cache for weeks", "strong caching, refresh once/15 min site-wide", "weak caching, refresh site-wide every 30s" and "no caching, per visitor data" with the first being some 15K of Javascript, the second about 5K of generic content data, the third about 100 bytes of immediate reports and the last some 10 bytes of user prefs and choices.</sentencetext>
</comment>
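The collapse the commenter describes, thousands of hits per second reduced to one database query per TTL window, can be sketched with a minimal time-based cache. This is an illustrative sketch only; `TtlCache` and all names are mine, not from the commenter's actual system:

```java
import java.util.function.Supplier;

// Minimal time-based cache: many requests per TTL window collapse into a
// single call to the expensive backend query.
class TtlCache<T> {
    private final Supplier<T> backend;  // the expensive query
    private final long ttlMillis;       // how long a result stays fresh
    private T cached;
    private long fetchedAt;

    TtlCache(Supplier<T> backend, long ttlMillis) {
        this.backend = backend;
        this.ttlMillis = ttlMillis;
    }

    synchronized T get() {
        long now = System.currentTimeMillis();
        if (cached == null || now - fetchedAt >= ttlMillis) {
            cached = backend.get();     // only hit the backend on a miss or expiry
            fetchedAt = now;
        }
        return cached;
    }
}

public class CacheDemo {
    public static void main(String[] args) {
        final int[] backendCalls = {0};
        // 5-second TTL: 5000 simulated hits produce a single backend query.
        TtlCache<Integer> cache = new TtlCache<>(() -> ++backendCalls[0], 5_000);
        for (int i = 0; i < 5_000; i++) cache.get();
        System.out.println("backend queries: " + backendCalls[0]); // prints 1
    }
}
```

With `ttlMillis = 5000` this matches the ratio the comment cites: roughly 5000 hits per second served from cache against one database query every 5 seconds.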
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730181</id>
	<title>Re:Students don't need to think at internet scale</title>
	<author>Strange Ranger</author>
	<datestamp>1255436640000</datestamp>
	<modclass>Informative</modclass>
	<modscore>2</modscore>
<htmltext><i>I don't see the problem.</i> <br> <br>^Maybe this illustrates the point?<br> <br>Really, really big numbers can be hard for the human brain to get a grip on.  But more to the point, operating at large scales presents problems unique to the scale.  Think of baking cookies.  Doing this in your kitchen is a familiar thing to most people.  But the kitchen method doesn't translate well to an industrial scale.  Keebler doesn't use a million-gallon bowl and cranes with giant beaters on the end.  They don't have ovens the size of cruise ships.  Just because you can make awesome cookies in your kitchen doesn't qualify you one bit to work for Keebler.  <br>
Whether it's cookies or scientific inquiry, it's a good idea to prepare students to process things on the appropriate scale.</htmltext>
<tokentext>I do n't see the problem .
^ Maybe this illustrates the point ?
Really really big numbers can be hard for the human brain to get a grip on .
But more to the point , operating at large scales presents problems unique to the scale .
Think of baking cookies .
Doing this in your kitchen is a familiar thing to most people .
But the kitchen method does n't translate well to an industrial scale .
Keebler does n't use a million gallon bowl and cranes with giant beaters on the end .
They do n't have ovens the size of a cruise ships .
Just because you can make awesome cookies in your kitchen does n't qualify you one bit to work for Keebler .
Whether it 's cookies or scientific inquiry it 's a good idea to prepare students to process things on the appropriate scale .</tokentext>
<sentencetext>I don't see the problem.
^Maybe this illustrates the point?
Really really big numbers can be hard for the human brain to get a grip on.
But more to the point, operating at large scales presents problems unique to the scale.
Think of baking cookies.
Doing this in your kitchen is a familiar thing to most people.
But the kitchen method doesn't translate well to an industrial scale.
Keebler doesn't use a million gallon bowl and cranes with giant beaters on the end.
They don't have ovens the size of a cruise ships.
Just because you can make awesome cookies in your kitchen doesn't qualify you one bit to work for Keebler.
Whether it's cookies or scientific inquiry it's a good idea to prepare students to process things on the appropriate scale.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729777</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730389</id>
	<title>Since I was very young</title>
	<author>C0quette</author>
	<datestamp>1255439220000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>Some have the attitude for juggling with exabytes. Since I was very young I've realized I never wanted to be human size. So I avoid the crowds and traffic jams. They just remind me of how small I am. Because of this longing in my heart I'm going to start the growing art. I'm going to grow now and never stop. Think like a mountain, grow to the top. Tall, I want to be tall. As big as a wall. And if I'm not tall, then I will crawl. With concentration, my size increased. And now I'm fourteen stories high, at least. Empire State Human! Just a born kid, I'll go to Egypt to be the pyramids. Brick by brick. Stone by stone. Growing till I'm fully grown. Fetch more water. Fetch more sand. Biggest person in the land. The Human League.</htmltext>
<tokentext>Some have the attitude for juggling with exabytes .
Since I was very young I 've realized I never wanted to be human size .
So I avoid the crowds and traffic jams .
They just remind me of how small I am .
Because of this longing in my heart I 'm going to start the growing art .
I 'm going to grow now and never stop .
Think like a mountain , grow to the top .
Tall , I want to be tall .
As big as a wall .
And if I 'm not tall , then I will crawl .
With concentration , my size increased .
And now I 'm fourteen stories high , at least .
Empire State Human !
Just a born kid , I 'll go to Egypt to be the pyramids .
Brick by brick .
Stone by stone .
Growing till I 'm fully grown .
Fetch more water .
Fetch more sand .
Biggest person in the land .
The Human League .</tokentext>
<sentencetext>Some have the attitude for juggling with exabytes.
Since I was very young I've realized I never wanted to be human size.
So I avoid the crowds and traffic jams.
They just remind me of how small I am.
Because of this longing in my heart I'm going to start the growing art.
I'm going to grow now and never stop.
Think like a mountain, grow to the top.
Tall, I want to be tall.
As big as a wall.
And if I'm not tall, then I will crawl.
With concentration, my size increased.
And now I'm fourteen stories high, at least.
Empire State Human!
Just a born kid, I'll go to Egypt to be the pyramids.
Brick by brick.
Stone by stone.
Growing till I'm fully grown.
Fetch more water.
Fetch more sand.
Biggest person in the land.
The Human League.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29733741</id>
	<title>Re:everybody can</title>
	<author>CompMD</author>
	<datestamp>1255457400000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
<htmltext><p>Managing large amounts of data was a problem for the chief engineer on a project I worked on.  This guy had a PhD in Aerospace Engineering and lots of professional and academic honors.  I was running a wind tunnel test that was capturing 8 24-bit signals at 10 kHz and writing the data to a CSV.  Now, he bought good hardware, but refused to pay for decent analysis software, mainly because he didn't know any.  So I had to write a program to break up the data into files small enough that Excel could open them, and then he could work with them.  I volunteered to write something with a database backend and use gnuplot to graph the data, but noooooo, that would take precious engineering time.</p><p>Long story short, he ended up spending more time figuring out how to screw with Excel than he spent actually figuring out what the data meant.  Of course, the customer had to pay for his lack of competence.  I'm so glad I don't have to deal with that guy any more.</p></htmltext>
<tokentext>Managing large amounts of data was a problem for the chief engineer for a project I worked on .
This guy had a PhD in Aerospace Engineering and lots of professional and academic honors .
I was running a wind tunnel test that was capturing 8 24-bit signals at 10kHz and writing the data to a csv .
Now , he bought good hardware , but refused to pay for decent analysis software , mainly because he did n't know any .
So I had to write a program to break up the data into files small enough that Excel could open them , and then he could work with them .
I volunteered to write something with a database backend and use gnuplot to graph data , but noooooo , that would take precious engineering time .
Long story short , he ended up spending more time figuring out how to screw with Excel than he spent actually figuring out what the data meant .
Of course , the customer had to pay for his lack of competence .
I 'm so glad I do n't have to deal with that guy any more .</tokentext>
<sentencetext>Managing large amounts of data was a problem for the chief engineer for a project I worked on.
This guy had a PhD in Aerospace Engineering and lots of professional and academic honors.
I was running a wind tunnel test that was capturing 8 24-bit signals at 10kHz and writing the data to a csv.
Now, he bought good hardware, but refused to pay for decent analysis software, mainly because he didn't know any.
So I had to write a program to break up the data into files small enough that Excel could open them, and then he could work with them.
I volunteered to write something with a database backend and use gnuplot to graph data, but noooooo, that would take precious engineering time.
Long story short, he ended up spending more time figuring out how to screw with Excel than he spent actually figuring out what the data meant.
Of course, the customer had to pay for his lack of competence.
I'm so glad I don't have to deal with that guy any more.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729745</parent>
</comment>
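The workaround described in the comment above can be sketched in a few lines. This is a hedged illustration, not the poster's actual program: the function name, file-naming scheme, and the 65,000-row chunk size (chosen to stay under the 65,536-row sheet limit of Excel 97-2003, the Excel of that era) are all assumptions.

```python
# Sketch of the kind of chunking tool described above: split a large CSV
# into pieces small enough for Excel 97-2003 (65,536-row sheet limit).
# Function name and file-naming scheme are hypothetical, not from the post.
import csv

ROWS_PER_FILE = 65_000  # stay under Excel's 65,536-row limit

def split_csv(src_path, dest_prefix, rows_per_file=ROWS_PER_FILE):
    """Write src_path out as dest_prefix_000.csv, dest_prefix_001.csv, ...
    Repeats the header row in every chunk; returns the number of files written."""
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        chunk, dest, writer = 0, None, None
        count = rows_per_file  # force a new file before the first data row
        for row in reader:
            if count >= rows_per_file:
                if dest:
                    dest.close()
                dest = open(f"{dest_prefix}_{chunk:03d}.csv", "w", newline="")
                writer = csv.writer(dest)
                writer.writerow(header)  # header repeated in every chunk
                chunk += 1
                count = 0
            writer.writerow(row)
            count += 1
        if dest:
            dest.close()
    return chunk
```

Streaming row by row like this keeps memory flat regardless of input size, which matters when the capture file is far larger than RAM.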
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730289</id>
	<title>Re:Data management problem</title>
	<author>FlyingBishop</author>
	<datestamp>1255438140000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>3</modscore>
	<htmltext><p>It's also useless to say 'hey I'm analyzing this graph' if you're analyzing it wrong. I think you're missing the big picture. It's incredibly naive to think that the fundamental laws are simple enough to be grasped without massive datasets. It is possible, but all the data gathered thus far suggests that the fundamental laws of nature will not be found by someone staring at an equation on a whiteboard until it clicks. That is why Cern's data capacity is measured in terabytes, and they want to grow it as much as possible. That's why we have so much genetic data.</p><p>Scientific method and principles count, but they are not enough.</p></htmltext>
<tokentext>It 's also useless to say 'hey I 'm analyzing this graph ' if you 're analyzing it wrong .
I think you 're missing the big picture .
It 's incredibly naive to think that the fundamental laws are simple enough to be grasped without massive datasets .
It is possible , but all the data gathered thus far suggests that the fundamental laws of nature will not be found by someone staring at an equation on a whiteboard until it clicks .
That is why Cern 's data capacity is measured in terabytes , and they want to grow it as much as possible .
That 's why we have so much genetic data .
Scientific method and principles count , but they are not enough .</tokentext>
<sentencetext>It's also useless to say 'hey I'm analyzing this graph' if you're analyzing it wrong.
I think you're missing the big picture.
It's incredibly naive to think that the fundamental laws are simple enough to be grasped without massive datasets.
It is possible, but all the data gathered thus far suggests that the fundamental laws of nature will not be found by someone staring at an equation on a whiteboard until it clicks.
That is why Cern's data capacity is measured in terabytes, and they want to grow it as much as possible.
That's why we have so much genetic data.
Scientific method and principles count, but they are not enough.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729651</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29732991</id>
	<title>School Should Focus on Basics</title>
	<author>Anonymous</author>
	<datestamp>1255453920000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Unfortunately businesses want to turn the U.S.A. education system into a head start training program. The problem is if you focus on specific technologies or techniques what is a student going to do when the skills are obsolete and they get "right-sized" out of the market. A solid understanding of basic principles and techniques for problem solving would go a long way to getting our level of education up where it should be. Then turn around and offer some cool tools and resources for projects, extra-curricular, or extra-credit. If a college or high school wants to design a special class to learn about how to use newer tools and newer tech, that is great but if the people in the class haven't mastered the basics of written or verbal communication, it is going to be a very very short class.</p></htmltext>
<tokentext>Unfortunately businesses want to turn the U.S.A. education system into a head start training program .
The problem is if you focus on specific technologies or techniques what is a student going to do when the skills are obsolete and they get " right-sized " out of the market .
A solid understanding of basic principles and techniques for problem solving would go a long way to getting our level of education up where it should be .
Then turn around and offer some cool tools and resources for projects , extra-curricular , or extra-credit .
If a college or high school wants to design a special class to learn about how to use newer tools and newer tech , that is great but if the people in the class have n't mastered the basics of written or verbal communication , it is going to be a very very short class .</tokentext>
<sentencetext>Unfortunately businesses want to turn the U.S.A. education system into a head start training program.
The problem is if you focus on specific technologies or techniques what is a student going to do when the skills are obsolete and they get "right-sized" out of the market.
A solid understanding of basic principles and techniques for problem solving would go a long way to getting our level of education up where it should be.
Then turn around and offer some cool tools and resources for projects, extra-curricular, or extra-credit.
If a college or high school wants to design a special class to learn about how to use newer tools and newer tech, that is great but if the people in the class haven't mastered the basics of written or verbal communication, it is going to be a very very short class.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729651</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730247</id>
	<title>Re:Data management problem</title>
	<author>Interoperable</author>
	<datestamp>1255437420000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>4</modscore>
	<htmltext><p>Yeah no kidding. I don't know if maybe that quote ('Science these days has basically turned into a data-management problem') was taken out of context, but I'm surprised a professor would say something that ignorant. I recently did a Master's in physics and it certainly didn't involve huge quantities of data; I ended up transferring much of my data off a spectrum analyzer with a floppy drive. (When we lost the GPIB transfer script I thought it would take too long to learn the HP libraries to rewrite it. That was a mistake, after 4 hours of shoving floppies in the drive I sat down and wrote a script in 2 hours, ah well.)</p><p>But the point is, a 400 data point trace may be exactly what you need to get the information you're looking for. Just because we can collect and process huge quantities of data doesn't mean that all science requires you to do so, nor is simply handling the data the critical part of analyzing it.</p>
	</htmltext>
<tokentext>Yeah no kidding .
I do n't know if maybe that quote ( 'Science these days has basically turned into a data-management problem ' ) was taken out of context , but I 'm surprised a professor would say something that ignorant .
I recently did a Master 's in physics and it certainly did n't involve huge quantities of data ; I ended up transferring much of my data off a spectrum analyzer with a floppy drive .
( When we lost the GPIB transfer script I thought it would take too long to learn the HP libraries to rewrite it .
That was a mistake , after 4 hours of shoving floppies in the drive I sat down and wrote a script in 2 hours , ah well .
) But the point is , a 400 data point trace may be exactly what you need to get the information you 're looking for .
Just because we can collect and process huge quantities of data does n't mean that all science requires you to do so , nor is simply handling the data the critical part of analyzing it .</tokentext>
<sentencetext>Yeah no kidding.
I don't know if maybe that quote ('Science these days has basically turned into a data-management problem') was taken out of context, but I'm surprised a professor would say something that ignorant.
I recently did a Master's in physics and it certainly didn't involve huge quantities of data; I ended up transferring much of my data off a spectrum analyzer with a floppy drive.
(When we lost the GPIB transfer script I thought it would take too long to learn the HP libraries to rewrite it.
That was a mistake, after 4 hours of shoving floppies in the drive I sat down and wrote a script in 2 hours, ah well.
)But the point is, a 400 data point trace may be exactly what you need to get the information you're looking for.
Just because we can collect and process huge quantities of data doesn't mean that all science requires you to do so, nor is simply handling the data the critical part of analyzing it.
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729651</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29740369</id>
	<title>Data management issue</title>
	<author>Anonymous</author>
	<datestamp>1255447800000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>That's why I run "Einstein@home" to help with the search for neutron stars using LIGO (gravitational wave detector) data. If every geek gave up some hard drive space and processor time on all their boxes...</p><p>Ram</p></htmltext>
<tokentext>That 's why I run " Einstein @ home " to help with the search for neutron stars using LIGO ( gravitational wave detector ) data .
If every geek gave up some hard drive space and processor time on all their boxes ...
Ram</tokentext>
<sentencetext>That's why I run "Einstein@home" to help with the search for neutron stars using LIGO (gravitational wave detector) data.
If every geek gave up some hard drive space and processor time on all their boxes...
Ram</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29734029</id>
	<title>Re:Indeed</title>
	<author>jtownatpunk.net</author>
	<datestamp>1255458480000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>This is nothing new.  I worked at a university back in the early 90s and the center for remote sensing and optics was pulling in more data every single day than most department servers could hold.  Their setup was both amazing and frightening.  Just a massive pile of machines with saturated SCSI controllers.  One of their big projects was to build a 4tb array.  But 9.6 gig drives were just trickling into the market at that time.  You'd need over 400 of those just to provide 4tb of raw storage.  Nevermind parity and redundancy.  And even if they did manage to design the system, the cost...</p><p>But my point is that scientists and their support groups have been managing large sets of data for as long as there's been scientists generating data to manage.  We've ramped up the capacity and efficiency of our storage technology and they've ramped up the amount of data they collect and the amount of processing they do to it.</p></htmltext>
<tokentext>This is nothing new .
I worked at a university back in the early 90s and the center for remote sensing and optics was pulling in more data every single day than most department servers could hold .
Their setup was both amazing and frightening .
Just a massive pile of machines with saturated SCSI controllers .
One of their big projects was to build a 4tb array .
But 9.6 gig drives were just trickling into the market at that time .
You 'd need over 400 of those just to provide 4tb of raw storage .
Nevermind parity and redundancy .
And even if they did manage to design the system , the cost ...
But my point is that scientists and their support groups have been managing large sets of data for as long as there 's been scientists generating data to manage .
We 've ramped up the capacity and efficiency of our storage technology and they 've ramped up the amount of data they collect and the amount of processing they do to it .</tokentext>
<sentencetext>This is nothing new.
I worked at a university back in the early 90s and the center for remote sensing and optics was pulling in more data every single day than most department servers could hold.
Their setup was both amazing and frightening.
Just a massive pile of machines with saturated SCSI controllers.
One of their big projects was to build a 4tb array.
But 9.6 gig drives were just trickling into the market at that time.
You'd need over 400 of those just to provide 4tb of raw storage.
Nevermind parity and redundancy.
And even if they did manage to design the system, the cost...
But my point is that scientists and their support groups have been managing large sets of data for as long as there's been scientists generating data to manage.
We've ramped up the capacity and efficiency of our storage technology and they've ramped up the amount of data they collect and the amount of processing they do to it.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729861</parent>
</comment>
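The drive count in the comment above is easy to verify. A quick back-of-the-envelope check in Python, assuming decimal (marketing) units as drive makers use:

```python
# Back-of-the-envelope check of the claim above: how many 9.6 GB drives
# would a 4 TB raw array need? Decimal units (1 TB = 1000 GB) assumed.
target_tb = 4
drive_gb = 9.6
drives = target_tb * 1000 / drive_gb  # raw capacity only
print(round(drives))  # well over 400, before any parity or redundancy
```

At roughly 417 drives before RAID overhead, "over 400 of those" checks out.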
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29733987</id>
	<title>Re:The Petabyte Problem</title>
	<author>zrq</author>
	<datestamp>1255458360000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><blockquote><div><p>"Computer, show me all ship-like objects, in any profile.  Ah, there it is."</p></div>
</blockquote><p>

We are working on it: <a href="http://www.ivoa.net/pub/info/" title="ivoa.net">IVOA</a> [ivoa.net].</p>
	</htmltext>
<tokentext>Computer , show me all ship-like objects , in any profile .
Ah , there it is . "
We are working on it : IVOA [ ivoa.net ] .</tokentext>
<sentencetext>Computer, show me all ship-like objects, in any profile.
Ah, there it is."


We are working on it: IVOA [ivoa.net].
	</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729881</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730465</id>
	<title>2first</title>
	<author>Anonymous</author>
	<datestamp>1255440060000</datestamp>
	<modclass>Offtopic</modclass>
	<modscore>-1</modscore>
	<htmltext>the reaper BSD's not so bad.  To the A 7ull-time GNNA and Michael Smith faster chip and some of the Rivalry. While available to the project to</htmltext>
<tokentext>the reaper BSD 's not so bad .
To the A 7ull-time GNNA and Michael Smith faster chip and some of the Rivalry .
While available to the project to</tokentext>
<sentencetext>the reaper BSD's not so bad.
To the A 7ull-time GNNA and Michael Smith faster chip and some of the Rivalry.
While available to the project to</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729833</id>
	<title>Re:Data management problem</title>
	<author>Hognoxious</author>
	<datestamp>1255431480000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Quite.  Dr Snow didn't need squillobytes of data to discover the cause of cholera, just a few hundred cases, some keen observation and a bit of intuition.</p></htmltext>
<tokentext>Quite .
Dr Snow did n't need squillobytes of data to discover the cause of cholera , just a few hundred cases , some keen observation and a bit of intuition .</tokentext>
<sentencetext>Quite.
Dr Snow didn't need squillobytes of data to discover the cause of cholera, just a few hundred cases, some keen observation and a bit of intuition.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729651</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730011</id>
	<title>Re:The LSST?</title>
	<author>Shag</author>
	<datestamp>1255434300000</datestamp>
	<modclass>Interesting</modclass>
	<modscore>2</modscore>
	<htmltext><p>What aallan said - although, 2015?  I thought the big projects (LSST, EELT, TMT) were all setting a 2018 target now.</p><p>I went to a talk a month and a half ago by LSST's lead camera scientist (Steve Kahn) and LSST is at this point very much vaporware (as in, they've got some of the money, and some of the parts, but are nowhere near having all the money or having it all built.) Even Pan-STARRS, which is only supposed to crank out 10TB a night, only has 1 of 4 planned scopes built (they're building a second), and has been having optical quality problems with that one. By the time kids born at the turn of the century are leaving high school, though, yes, we <i>do</i> expect things like these to be up and running.</p><p>But at the risk of sounding like that one college that publishes a list every year of what the freshman class of that year does and doesn't know, kids born around the turn of the century (my daughter is one) don't have the "OMG a TB!" mentality that we grownups have. The smallest capacity hard-drive my daughter will probably remember was 5 gigs - and that was in an iPod.  Things like 64-bit, gigahertz speeds, multiprocessing, fast ethernet, wifi, home broadband... always been there.  DVD-R media has, to her knowledge, <i>always</i> been there.  (I did once have to explain to her that CDs used to be the size of platters and made of black plastic, after she found some Queensr&#255;che vinyl.)</p><p>She's ten now, and you can put a half-terabyte or more in a laptop, so while the idea of some big scientific project spitting out 50 or 60 laptops worth of data in a night is clearly a lot of data, it's not something that can't be envisioned.</p></htmltext>
<tokentext>What aallan said - although , 2015 ?
I thought the big projects ( LSST , EELT , TMT ) were all setting a 2018 target now .
I went to a talk a month and a half ago by LSST 's lead camera scientist ( Steve Kahn ) and LSST is at this point very much vaporware ( as in , they 've got some of the money , and some of the parts , but are nowhere near having all the money or having it all built .
) Even Pan-STARRS , which is only supposed to crank out 10TB a night , only has 1 of 4 planned scopes built ( they 're building a second ) , and has been having optical quality problems with that one .
By the time kids born at the turn of the century are leaving high school , though , yes , we do expect things like these to be up and running .
But at the risk of sounding like that one college that publishes a list every year of what the freshman class of that year does and does n't know , kids born around the turn of the century ( my daughter is one ) do n't have the " OMG a TB !
" mentality that we grownups have .
The smallest capacity hard-drive my daughter will probably remember was 5 gigs - and that was in an iPod .
Things like 64-bit , gigahertz speeds , multiprocessing , fast ethernet , wifi , home broadband... always been there .
DVD-R media has , to her knowledge , always been there .
( I did once have to explain to her that CDs used to be the size of platters and made of black plastic , after she found some Queensr   che vinyl .
) She 's ten now , and you can put a half-terabyte or more in a laptop , so while the idea of some big scientific project spitting out 50 or 60 laptops worth of data in a night is clearly a lot of data , it 's not something that ca n't be envisioned .</tokentext>
<sentencetext>What aallan said - although, 2015?
I thought the big projects (LSST, EELT, TMT) were all setting a 2018 target now.
I went to a talk a month and a half ago by LSST's lead camera scientist (Steve Kahn) and LSST is at this point very much vaporware (as in, they've got some of the money, and some of the parts, but are nowhere near having all the money or having it all built.
) Even Pan-STARRS, which is only supposed to crank out 10TB a night, only has 1 of 4 planned scopes built (they're building a second), and has been having optical quality problems with that one.
By the time kids born at the turn of the century are leaving high school, though, yes, we do expect things like these to be up and running.
But at the risk of sounding like that one college that publishes a list every year of what the freshman class of that year does and doesn't know, kids born around the turn of the century (my daughter is one) don't have the "OMG a TB!
" mentality that we grownups have.
The smallest capacity hard-drive my daughter will probably remember was 5 gigs - and that was in an iPod.
Things like 64-bit, gigahertz speeds, multiprocessing, fast ethernet, wifi, home broadband... always been there.
DVD-R media has, to her knowledge, always been there.
(I did once have to explain to her that CDs used to be the size of platters and made of black plastic, after she found some Queensrÿche vinyl.
)She's ten now, and you can put a half-terabyte or more in a laptop, so while the idea of some big scientific project spitting out 50 or 60 laptops worth of data in a night is clearly a lot of data, it's not something that can't be envisioned.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729755</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29741063</id>
	<title>Hey I knew that!</title>
	<author>Whiteox</author>
	<datestamp>1255454340000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>It's like this:<br>Learn to play all the campaigns on Age of Empires II of which there is a population limit of 75.<br>Repeat for a number of years until you are perfect and the most efficient.<br>Then go play a network AOEII game with a pop cap of 200 and you will invariably lose because you can't get your head around it.<br>The game is simple, yet hard to manipulate when scaled up and takes a lot more effort to win. And that's only changing one variable.</p></htmltext>
<tokentext>It 's like this : Learn to play all the campaigns on Age of Empires II of which there is a population limit of 75 .
Repeat for a number of years until you are perfect and the most efficient .
Then go play a network AOEII game with a pop cap of 200 and you will invariably lose because you ca n't get your head around it .
The game is simple , yet hard to manipulate when scaled up and takes a lot more effort to win .
And that 's only changing one variable .</tokentext>
<sentencetext>It's like this:Learn to play all the campaigns on Age of Empires II of which there is a population limit of 75.Repeat for a number of years until you are perfect and the most efficient.Then go play a network AOEII game with a pop cap of 200 and you will invariably lose because you can't get your head around it.The game is simple, yet hard to manipulate when scaled up and takes a lot more effort to win.
And that's only changing one variable.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729799</id>
	<title>Do we all work with all the data in internet?</title>
	<author>Anonymous</author>
	<datestamp>1255431060000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Now you are focusing in a problem of small area (big sets of data), which is ok in itself.</p><p>Just don't forget that small scale makes all the difference.</p></htmltext>
<tokentext>Now you are focusing in a problem of small area ( big sets of data ) , which is ok in itself .
Just do n't forget that small scale makes all the difference .</tokentext>
<sentencetext>Now you are focusing in a problem of small area (big sets of data), which is ok in itself.Just don't forget that small scale makes all the difference.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729861</id>
	<title>Indeed</title>
	<author>Anonymous</author>
	<datestamp>1255432020000</datestamp>
	<modclass>Interesting</modclass>
	<modscore>5</modscore>
	<htmltext>I worked for one of the detectors at CERN, and I strongly agree with the notion of Science being a data management problem.

We (intend to<nobr> <wbr></nobr>:-) pull a colossal amount of data from the detectors (about 40 TB/sec in case of the experiment I was working for). Unsurprisingly, all of it can't be stored. There's a dedicated group of people whose only job is to make sure that only relevant information is extracted, and another small group whose only job is to make sure that all this information can be stored, accessed, and processed at large scales. In short, there is a lot that happens with the data before it is even seen by a physicist.

Having said that, I agree that very few people have a real appreciation and/or understanding of these kinds of systems and even fewer have the required depth of knowledge to build them. But this tends to be a highly specialized area, and I can't imagine it's easy to study it as a generic subject.</htmltext>
<tokentext>I worked for one of the detectors at CERN , and I strongly agree with the notion of Science being a data management problem .
We ( intend to : - ) pull a colossal amount of data from the detectors ( about 40 TB/sec in case of the experiment I was working for ) .
Unsurprisingly , all of it ca n't be stored .
There 's a dedicated group of people whose only job is to make sure that only relevant information is extracted , and another small group whose only job is to make sure that all this information can be stored , accessed , and processed at large scales .
In short , there is a lot that happens with the data before it is even seen by a physicist .
Having said that , I agree that very few people have a real appreciation and/or understanding of these kinds of systems and even fewer have the required depth of knowledge to build them .
But this tends to be a highly specialized area , and I ca n't imagine it 's easy to study it as a generic subject .</tokentext>
<sentencetext>I worked for one of the detectors at CERN, and I strongly agree with the notion of Science being a data management problem.
We (intend to :-) pull a colossal amount of data from the detectors (about 40 TB/sec in case of the experiment I was working for).
Unsurprisingly, all of it can't be stored.
There's a dedicated group of people whose only job is to make sure that only relevant information is extracted, and another small group whose only job is to make sure that all this information can be stored, accessed, and processed at large scales.
In short, there is a lot that happens with the data before it is even seen by a physicist.
Having said that, I agree that very few people have a real appreciation and/or understanding of these kinds of systems and even fewer have the required depth of knowledge to build them.
But this tends to be a highly specialized area, and I can't imagine it's easy to study it as a generic subject.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730009</id>
	<title>Sooo true!</title>
	<author>psnyder</author>
	<datestamp>1255434300000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><div class="quote"><p>'If they imprint on these small systems, that becomes their frame of reference and what they're always thinking about,' said Jim Spohrer</p></div><p>That is SOOO true!  I mean, I was brought up on my Commodore 64, and I have NO IDEA how to contemplate petabytes of data!  (What does that EVEN MEAN?!?)  I still don't see why ANYONE would need more than 64kB of memory.</p>
	</htmltext>
<tokentext>'If they imprint on these small systems , that becomes their frame of reference and what they 're always thinking about , ' said Jim Spohrer
That is SOOO true !
I mean , I was brought up on my Commodore 64 , and I have NO IDEA how to contemplate petabytes of data !
( What does that EVEN MEAN ? ! ?
) I still do n't see why ANYONE would need more than 64kB of memory .</tokentext>
<sentencetext>'If they imprint on these small systems, that becomes their frame of reference and what they're always thinking about,' said Jim SpohrerThat is SOOO true!
I mean, I was brought up on my Commodore 64, and I have NO IDEA how to contemplate petabytes of data!
(What does that EVEN MEAN?!?
)  I still don't see why ANYONE would need more than 64kB of memory.
	</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729865</id>
	<title>Internet scale of petabytes of data...</title>
	<author>Anonymous</author>
	<datestamp>1255432020000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>As an Internet user, I really can't imagine how I can download / upload petabytes of data, in my whole life.</htmltext>
<tokentext>As an Internet user , I really ca n't imagine how I can download / upload petabytes of data , in my whole life .</tokentext>
<sentencetext>As an Internet user, I really can't imagine how I can download / upload petabytes of data, in my whole life.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729819</id>
	<title>Wrong</title>
	<author>Hognoxious</author>
	<datestamp>1255431240000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>2</modscore>
	<htmltext>Summary uses data and information as if they are synonyms.   They are not.</htmltext>
<tokentext>Summary uses data and information as if they are synonyms .
They are not .</tokentext>
<sentencetext>Summary uses data and information as if they are synonyms.
They are not.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729747</id>
	<title>Generation R</title>
	<author>Anonymous</author>
	<datestamp>1255430580000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>If we are the generation Y, they will be the Generation R - from Ritalin</p></htmltext>
<tokentext>If we are the generation Y , they will be the Generation R - from Ritalin</tokentext>
<sentencetext>If we are the generation Y, they will be the Generation R - from Ritalin</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729755</id>
	<title>The LSST?</title>
	<author>aallan</author>
	<datestamp>1255430640000</datestamp>
	<modclass>Informative</modclass>
	<modscore>4</modscore>
	<htmltext><p> <em>Students are beginning to work with data sets like the Large Synoptic Survey Telescope, the largest public data set in the world. The telescope takes detailed images of large chunks of the sky and produces about 30 terabytes of data each night.</em> </p><p>Err no it doesn't, and no they aren't. The telescope hasn't been built yet? First light isn't scheduled until late in 2015.</p><p>

Al.</p></htmltext>
<tokenext>Students are beginning to work with data sets like the Large Synoptic Survey Telescope , the largest public data set in the world .
The telescope takes detailed images of large chunks of the sky and produces about 30 terabytes of data each night .
Err no it does n't , and no they are n't .
The telescope has n't been built yet ?
First light is n't scheduled until late in 2015 .
Al .</tokentext>
<sentencetext> Students are beginning to work with data sets like the Large Synoptic Survey Telescope, the largest public data set in the world.
The telescope takes detailed images of large chunks of the sky and produces about 30 terabytes of data each night.
Err no it doesn't, and no they aren't.
The telescope hasn't been built yet?
First light isn't scheduled until late in 2015.
Al.</sentencetext>
</comment>
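The 30 TB/night figure quoted above implies a yearly volume that is easy to sanity-check; a minimal sketch, assuming decimal units and observation every night of the year:

```python
# Back-of-envelope: yearly LSST data volume at 30 TB (decimal) per night,
# assuming the telescope observes every night of the year.
TB = 10**12
PB = 10**15
yearly_bytes = 30 * TB * 365
print(yearly_bytes / PB)  # -> 10.95 (about 11 PB per year)
```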
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29738431</id>
	<title>Datasets of interest</title>
	<author>xenocide2</author>
	<datestamp>1255433460000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Part of the problem is that young students fresh out of high school have no pet datasets. For many, they're buying a new laptop for college and keeping, at most, their music. Chat logs, banking, browsing history; it hasn't occurred to them to keep these things. Hell, I doubt many CS students make backups of their own computers. I know I didn't.</p><p>Without a personal dataset of interest to maintain and process, you'll find little demand from students for classes on large dataset computations. Unless they enjoy astronomy, or biology, or whatever, in which case they're likely in a different major. If we want to train CS majors to help in other fields, we need to promote and identify personal data first.</p></htmltext>
<tokenext>Part of the problem is that young students fresh out of high school have no pet datasets .
For many , they 're buying a new laptop for college and keeping , at most , their music .
Chat logs , banking , browsing history ; it has n't occurred to them to keep these things .
Hell , I doubt few CS students make backups of their own computers .
I know I did n't.Without a personal dataset of interest to maintain and process , you 'll find little demand from students for classes on large dataset computations .
Unless they enjoy astronomy , or biology , or whatever , in which case they 're likely in a different major .
If we want to train CS majors to help in other fields , we need to promote and identify personal data first .</tokentext>
<sentencetext>Part of the problem is that young students fresh out of high school have no pet datasets.
For many, they're buying a new laptop for college and keeping, at most, their music.
Chat logs, banking, browsing history; it hasn't occurred to them to keep these things.
Hell, I doubt many CS students make backups of their own computers.
I know I didn't.
Without a personal dataset of interest to maintain and process, you'll find little demand from students for classes on large dataset computations.
Unless they enjoy astronomy, or biology, or whatever, in which case they're likely in a different major.
If we want to train CS majors to help in other fields, we need to promote and identify personal data first.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730307</id>
	<title>Re:Students don't need to think at internet scale</title>
	<author>Yvanhoe</author>
	<datestamp>1255438320000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>Shhhh, let them start their One Supercomputer Per Child program. It can only be good.</htmltext>
<tokenext>Shhhh , let them start their One Supercomputer Per Child program .
It can only be good .</tokentext>
<sentencetext>Shhhh, let them start their One Supercomputer Per Child program.
It can only be good.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729777</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29746403</id>
	<title>Pedants should be Pedantic</title>
	<author>BonysGambit</author>
	<datestamp>1255542000000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>When we speak of "Science" in a general sense, it's about using the Scientific Method to pursue a goal or enhance our knowledge. This has nothing to do with the size of the data accumulated to perform the task.
These days, all of us are learning to think at "Internet Scale." Join Facebook and "befriend" 200 million people. Enroll in LinkedIn and you have 40 million possible connections. National debts are measured in numbers with more zeros than ever used before to describe money.
In other words, every field of human endeavour these days presents its own data management problem.
If I may introduce the crass topic of business into such a rarified air of Science; in today's Inbound Marketing arena, the volume of data being accumulated about Visitors to one's website, some of whom become Prospects and then Clients, is literally Internet sized.
So what's a person to do? Same thing we've always done - automate to handle it. We have used technology and tools to overcome human limitations since the first ape used a bone as a hammer (if you liked the movie 2001's analogy). So marketers today can use Sales and Marketing Automation to reduce huge data sets to usable and understandable sizes, in the same way that any other field will employ computer methods to do the same.
Data management problems, in other words, are a field unto themselves, requiring specialists such as DBAs, Hardware and Software Engineers. Not Scientists in the general sense, but specialists.
There's more on these ideas at <a href="http://www.inbound-marketing-automation.ca/blog/" title="inbound-ma...omation.ca" rel="nofollow">http://www.inbound-marketing-automation.ca/blog/</a> [inbound-ma...omation.ca]</htmltext>
<tokenext>When we speak of " Science " in a general sense , it 's about using the Scientific Method to pursue a goal or enhance our knowledge .
This has nothing to do with the size of the data accumulated to perform the task .
These days , all of us are learning to think at " Internet Scale .
" Join Facebook and " befriend " 200 million people .
Enroll in LinkedIn and you have 40 million possible connections .
National debts are measured in numbers with more zeros than ever used before to describe money .
In other words , every field of human endeavour these days , presents its own data management problem .
If I may introduce the crass topic of business into such a rarified air of Science ; in today 's Inbound Marketing arena , the volume of data being accumulated about Visitors to one 's website , some of whom become Prospects and then Clients , is literally Internet sized .
So what 's a person to do ?
Same thing we 've always done - automate to handle it .
We have used technology and tools to overcome human limitations since the first ape used a bone as a hammer ( if you liked the movie 2001 's analogy ) .
So marketers today can use Sales and Marketing Automation to reduce huge data sets to usable and understandable sizes , in the same way that any other field will employ computer methods to do the same .
Data management problems , in other words , are a field unto themselves , requiring specialists such as DBAs , Hardware and Software Engineers .
Not Scientists in the general sense , but specialists .
There 's more on these ideas at http : //www.inbound-marketing-automation.ca/blog/ [ inbound-ma...omation.ca ]</tokentext>
<sentencetext>When we speak of "Science" in a general sense, it's about using the Scientific Method to pursue a goal or enhance our knowledge.
This has nothing to do with the size of the data accumulated to perform the task.
These days, all of us are learning to think at "Internet Scale."
Join Facebook and "befriend" 200 million people.
Enroll in LinkedIn and you have 40 million possible connections.
National debts are measured in numbers with more zeros than ever used before to describe money.
In other words, every field of human endeavour these days presents its own data management problem.
If I may introduce the crass topic of business into such a rarified air of Science; in today's Inbound Marketing arena, the volume of data being accumulated about Visitors to one's website, some of whom become Prospects and then Clients, is literally Internet sized.
So what's a person to do?
Same thing we've always done - automate to handle it.
We have used technology and tools to overcome human limitations since the first ape used a bone as a hammer (if you liked the movie 2001's analogy).
So marketers today can use Sales and Marketing Automation to reduce huge data sets to usable and understandable sizes, in the same way that any other field will employ computer methods to do the same.
Data management problems, in other words, are a field unto themselves, requiring specialists such as DBAs, Hardware and Software Engineers.
Not Scientists in the general sense, but specialists.
There's more on these ideas at http://www.inbound-marketing-automation.ca/blog/ [inbound-ma...omation.ca]</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729651</id>
	<title>Data management problem</title>
	<author>Anonymous</author>
	<datestamp>1255429140000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>5</modscore>
	<htmltext>Science has always been about extracting knowledge from thoughtfully-generated and -processed data. Managing enormous datasets is not science per se, it's computer engineering. It's useless to say 'hey I'm processing 30 TB' if you're processing them wrong. Scientific method and principles are what count, and they don't change.</htmltext>
<tokenext>Science has always been about extracting knowledge from thoughtfully-generated and -processed data .
Managing enormous datasets is not science per se , it 's computer engineering .
It 's useless to say 'hey I 'm processing 30 TB ' if you 're processing them wrong .
Scientific method and principles are what count , and they do n't change .</tokentext>
<sentencetext>Science has always been about extracting knowledge from thoughtfully-generated and -processed data.
Managing enormous datasets is not science per se, it's computer engineering.
It's useless to say 'hey I'm processing 30 TB' if you're processing them wrong.
Scientific method and principles are what count, and they don't change.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729827</id>
	<title>Not until Internets are improved...</title>
	<author>Anonymous</author>
	<datestamp>1255431360000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>... and opened up for anyone to use, as well as more datasets opened freely for anyone to use.</p><p>These 2 things are holding back innovation in so many areas.<br>Damn ISPs and their laze. (read: greed)</p></htmltext>
<tokenext>... and opened up for anyone to use , as well as more datasets opened freely for anyone to use.These 2 things are holding back innovation in so many areas.Damn ISPs and their laze .
( read : greed )</tokentext>
<sentencetext>... and opened up for anyone to use, as well as more datasets opened freely for anyone to use.These 2 things are holding back innovation in so many areas.Damn ISPs and their laze.
(read: greed)</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29736613</id>
	<title>Proof of Ignorance</title>
	<author>Anonymous</author>
	<datestamp>1255426620000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>This overwhelming data issue points to a basic fact. The universe contains a sum of information that we may label X. Humanity at its best operates with way less than 1% of X which defines our species as being better than 99% lost in ignorance. In essence the noble human mind operates with, in effect, an intelligence that might as well be as low as the common Earth worm. Providing the entire universe with a humorous display as we have all kinds of social kinkiness in assigning our notions of intellectual and academic abilities to our fellow dumb as a rock humans.</p></htmltext>
<tokenext>This overwhelming data issue points to a basic fact .
The universe contains a sum of information that we may label X. Humanity at its best operates with way less than 1 \ % of X which defines our species as being better than 99 \ % lost in ignorance .
In essence the noble human mind operates with , in effect , an intelligence that might as well be as low as the common Earth worm .
Providing the entire universe with a humorous display as we have all kinds of social kinkiness in assigning our notions of intellectual and academic abilities to our fellow dumb as a rock humans .</tokentext>
<sentencetext>         This overwhelming data issue points to a basic fact.
The universe contains a sum of information that we may label X. Humanity at its best operates with way less than 1\% of X which defines our species as being better than 99\% lost in ignorance.
In essence the noble human mind operates with, in effect, an intelligence that might as well be as low as the common Earth worm.
Providing the entire universe with a humorous display as we have all kinds of social kinkiness in assigning our notions of intellectual and academic abilities to our fellow dumb as a rock humans.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729777</id>
	<title>Students don't need to think at internet scale</title>
	<author>Anonymous</author>
	<datestamp>1255430880000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>2</modscore>
	<htmltext><p>They just need to think. That's what they study for (ideally). Thinking people with open minds can tackle anything, including the "scale of the internet".</p><p>When I was in high school, I used a slide rule. When I entered university, I got me a calculator. Did maths or problem solving abilities change or improve because of the calculator? No. Students today can jolly well learn about networking on small LANs, or learn to manage small datasets on aging university computers; so long as what they learn is good, they'll be able to transpose their knowledge on a vaster scale, or invent the next Big Thing. I don't see the problem.</p></htmltext>
<tokenext>They just need to think .
That 's what they study for ( ideally ) .
Thinking people with open minds can tackle anything , including the " scale of the internet " .When I was in high school , I used a slide rule .
When I entered university , I got me a calculator .
Did maths or problem solving abilities change or improve because of the calculator ?
no. Student today can jolly well learn about networking on small LANs , or learn to manage small datasets on aging university computers , so long as what they learn is good , they 'll be able to transpose their knowledge on a vaster scale , or invent the next Big Thing .
I do n't see the problem .</tokentext>
<sentencetext>They just need to think.
That's what they study for (ideally).
Thinking people with open minds can tackle anything, including the "scale of the internet".
When I was in high school, I used a slide rule.
When I entered university, I got me a calculator.
Did maths or problem solving abilities change or improve because of the calculator? No.
Students today can jolly well learn about networking on small LANs, or learn to manage small datasets on aging university computers; so long as what they learn is good, they'll be able to transpose their knowledge on a vaster scale, or invent the next Big Thing.
I don't see the problem.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729745</id>
	<title>everybody can</title>
	<author>Fotograf</author>
	<datestamp>1255430580000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext>Everybody can capture a ridiculous amount of data; doing it smartly and managing it is what makes a genius.</htmltext>
<tokenext>everybody can capture ridiculous amount of data , do it smart and manage them is what makes a genius .</tokentext>
<sentencetext>Everybody can capture a ridiculous amount of data; doing it smartly and managing it is what makes a genius.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729817</id>
	<title>Why?</title>
	<author>benjamindees</author>
	<datestamp>1255431240000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Add me to the list of people who think this is a solution in search of a problem.</p><p>Oh, who the hell am I kidding.  I'm sure the problem they have in mind has something to do with spying on people.</p></htmltext>
<tokenext>Add me to the list of people who think this is a solution in search of a problem.Oh , who the hell am I kidding .
I 'm sure the problem they have in mind has something to do with spying on people .</tokentext>
<sentencetext>Add me to the list of people who think this is a solution in search of a problem.Oh, who the hell am I kidding.
I'm sure the problem they have in mind has something to do with spying on people.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29731399</id>
	<title>Re:The LSST?</title>
	<author>oneiros27</author>
	<datestamp>1255446660000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>That was my first thought in reading this, too.</p><p>There *are* large data systems online now, even if they're not of the scope of LSST.  The big difference is that the EOS-DIS (earth science) has funding to cover IT stuff like building giant unified data centers (I think they pull 2TB/day<nobr> <wbr></nobr>... per satellite), while the rest of us in the "space sciences" are trying to figure out how to get enough bandwidth to serve our data, and using various distributed data systems (PDS, the VxOs, etc.).  Once SDO finally launches (early next year?), we'll be generating over 2TB/day of useful data products (4TB/day of raw data), which is much larger than solar physics has been dealing with.</p><p>Oh<nobr> <wbr></nobr>... and to make things fun -- as someone else commented about today's hard drive sizes -- because of requirements to get things certified by required deadlines, and planning for procurement lag, plus whatever launch delays (or construction delays for LSST), the data systems might be 3+ years old by the time there's first light.</p><p>(disclaimer -- if it wasn't obvious, I actually work with these 'big science' data systems)</p></htmltext>
<tokenext>That was my first thought in reading this , too.There * are * large data systems online now , even if they 're not of the scope of LSST .
The big difference is that the EOS-DIS ( earth science ) has funding to cover it stuff like building giant unified data centers ( I think they pull 2TB/day ... per satellite ) , while the rest of us us in the " space sciences " are trying to figure out how to get enough bandwidth to serve our data , and using various distributed data systems ( PDS , the VxOs , etc. ) .
Once SDO finally launches ( early next year ?
) , we 'll be generating over 2TB/day of useful data products ( 4TB/day of raw data ) , which is much larger than solar physics has been dealing with.Oh ... and to make things fun -- as someone else commented about today 's hard drive sizes -- because of requirements to get things certified by required deadlines , and planning for procurement lag , plus whatever launch delays ( or construction delays for LSST ) the data systems might be 3 + years old by the time there 's first light .
( disclaimer -- if it was n't obvious , I actually work with these 'big science ' data systems )</tokentext>
<sentencetext>That was my first thought in reading this, too.
There *are* large data systems online now, even if they're not of the scope of LSST.
The big difference is that the EOS-DIS (earth science) has funding to cover IT stuff like building giant unified data centers (I think they pull 2TB/day ... per satellite), while the rest of us in the "space sciences" are trying to figure out how to get enough bandwidth to serve our data, and using various distributed data systems (PDS, the VxOs, etc.).
Once SDO finally launches (early next year?), we'll be generating over 2TB/day of useful data products (4TB/day of raw data), which is much larger than solar physics has been dealing with.
Oh ... and to make things fun -- as someone else commented about today's hard drive sizes -- because of requirements to get things certified by required deadlines, and planning for procurement lag, plus whatever launch delays (or construction delays for LSST), the data systems might be 3+ years old by the time there's first light.
(disclaimer -- if it wasn't obvious, I actually work with these 'big science' data systems)</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729755</parent>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729881</id>
	<title>The Petabyte Problem</title>
	<author>ghostlibrary</author>
	<datestamp>1255432320000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>4</modscore>
	<htmltext><p>I wrote up some notes from a NASA lunch meeting on this, titled (not too originally, I admit) 'The Petabyte Problem'.  It's at<br><a href="http://www.scientificblogging.com/daytime_astronomer/petabyte_problem" title="scientificblogging.com">http://www.scientificblogging.com/daytime_astronomer/petabyte_problem</a> [scientificblogging.com].  It's not just a question of thinking on the 'Internet scale', but about massive data handling in general.</p><p>What makes it different from previous eras (where MB was big, where GB was big) is that, before, the storage was expensive, yes, but bandwidth wasn't as much of a trouble for transmitting, if even locally.  You could store MBs or GBs on tape, ship it, and extract the data rapidly -- bus and LAN speeds were high.  Now, with PB, there's so much data that even if you ship a rack of TB drives and hook it up locally, you can't run a program on it in reasonable time.  Particularly for browsing or inquiries.</p><p>So we're having to rely much more on metadata or abstractions to sort out which data we can then process further.</p></htmltext>
<tokenext>I wrote up some notes from a NASA lunch meeting on this , titled ( not too originally , I admit ) 'The Petabyte Problem' .
It 's athttp : //www.scientificblogging.com/daytime \ _astronomer/petabyte \ _problem [ scientificblogging.com ] .
It 's not just a question of thinking on the 'Internet scale ' , but about massive data handling in general.What makes it different from previous eras ( where MB was big , where GB was big ) is that , before , the storage was expensive , yes , but bandwidth was n't as much of a trouble for transmitting , if even locally .
You could store MBs or GBs on tape , ship it , and extract the data rapidly-- bus and LAN speeds were high .
Now , with PB , there 's so much data that even if you ship a rack of TB drives and hook it up locally , you ca n't run a program on it in reasonable time .
Particularly for browsing or inquiries.So we 're having to rely much more on metadata or abstractions to sort out which data we can then process further .</tokentext>
<sentencetext>I wrote up some notes from a NASA lunch meeting on this, titled (not too originally, I admit) 'The Petabyte Problem'.
It's at http://www.scientificblogging.com/daytime_astronomer/petabyte_problem [scientificblogging.com].
It's not just a question of thinking on the 'Internet scale', but about massive data handling in general.
What makes it different from previous eras (where MB was big, where GB was big) is that, before, the storage was expensive, yes, but bandwidth wasn't as much of a trouble for transmitting, if even locally.
You could store MBs or GBs on tape, ship it, and extract the data rapidly -- bus and LAN speeds were high.
Now, with PB, there's so much data that even if you ship a rack of TB drives and hook it up locally, you can't run a program on it in reasonable time.
Particularly for browsing or inquiries.
So we're having to rely much more on metadata or abstractions to sort out which data we can then process further.</sentencetext>
</comment>
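The shipped-rack point above can be made concrete with rough numbers; a minimal sketch, assuming ~100 MB/s sequential read per drive (an illustrative figure, not from the comment):

```python
# Rough scan time for 1 PB of data, assuming ~100 MB/s sequential read
# per drive (illustrative assumption; real throughput varies widely).
PB = 10**15
bytes_per_sec = 100 * 10**6      # one drive
days_one_drive = PB / bytes_per_sec / 86400
print(round(days_one_drive))           # -> 116 (days on a single drive)
print(round(days_one_drive / 100, 1))  # -> 1.2 (days across 100 drives)
```

Even spread across a hundred parallel drives, a full scan is a day-scale job, which is why the comment points at metadata and abstractions for browsing.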
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729993</id>
	<title>Re:Internet scale of petabytes of data...</title>
	<author>Anonymous</author>
	<datestamp>1255434120000</datestamp>
	<modclass>None</modclass>
	<modscore>0</modscore>
	<htmltext><p>Get a better ISP<nobr> <wbr></nobr>:)</p><p>A 3mbit/s connection averaging 80% utilization is roughly 1GB/hour.  Downloading 1 petabyte (10^15) at this rate takes 10^6 hours.  This is 42 kilodays, or 114 years.  You can do it, if you start young enough and live long enough.</p></htmltext>
<tokenext>Get a better ISP : ) A 3mbit/s connection averaging 80 \ % utilization is roughly 1GB/hour .
Downloading 1 petabyte ( 10 ^ 15 ) at this rate takes 10 ^ 6 hours .
This is 42 kilodays , or 114 years .
You can do it , if you start young enough and live long enough .</tokentext>
<sentencetext>Get a better ISP :)
A 3mbit/s connection averaging 80% utilization is roughly 1GB/hour.
Downloading 1 petabyte (10^15) at this rate takes 10^6 hours.
This is 42 kilodays, or 114 years.
You can do it, if you start young enough and live long enough.</sentencetext>
	<parent>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729865</parent>
</comment>
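The parent's arithmetic holds up under its own rounding (a decimal petabyte at roughly 1 GB per hour); a minimal check:

```python
# Verify the parent comment's estimate: 1 PB downloaded at ~1 GB/hour.
PB = 10**15
GB = 10**9
hours = PB / GB                      # 10^6 hours at 1 GB/hour
print(int(hours))                    # -> 1000000
print(round(hours / 24 / 1000))      # -> 42 (kilodays)
print(round(hours / 24 / 365.25))    # -> 114 (years)
```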
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29740263</id>
	<title>Bandwidth isn't the only issue with Internet Scale</title>
	<author>GrpA</author>
	<datestamp>1255446960000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><p>Working with a small firewalled service provider that is reasonably large in terms of IP Allocation (Over half a million addresses) I'm constantly amazed that none of the design engineers I encounter seem to envision the number of sessions a firewall has to cope with.</p><p>It's frustrating that we keep encountering firewalls with 10 Gbps + claimed throughput that fall over at barely more than 100 Mbps due to resource exhaustion and then the vendor engineers try to tell us that's because we aren't bonding the NICs.</p><p>It seems that no matter how often I explain it to them, they just can't get their heads around the idea that our problem isn't bandwidth, it's number of sessions.</p><p>The scale of the Internet isn't just measured in X x bits per second. There are other dimensions to it as well.</p><p>GrpA</p></htmltext>
<tokenext>Working with a small firewalled service provider that is reasonably large in terms of IP Allocation ( Over half a million addresses ) I 'm constantly amazed that none of the design engineers I encounter seem to envision the number of sessions a firewall has to cope with.It 's frustrating that we keep encountering firewalls with 10 Gbps + claimed throughput that fall over at barely more than 100 Mbps due to resource exhaustion and then the vendor engineers try to tell us that 's because we are n't bonding the NICs.It seems that no matter how often I explain it to them , they just ca n't get their heads around the idea that our problem is n't bandwidth , it 's number of sessions.The scale of the Internet is n't just measured in X x bits per second .
There are other dimensions to it as well.GrpA</tokentext>
<sentencetext>Working with a small firewalled service provider that is reasonably large in terms of IP Allocation (over half a million addresses), I'm constantly amazed that none of the design engineers I encounter seem to envision the number of sessions a firewall has to cope with.
It's frustrating that we keep encountering firewalls with 10 Gbps+ claimed throughput that fall over at barely more than 100 Mbps due to resource exhaustion, and then the vendor engineers try to tell us that's because we aren't bonding the NICs.
It seems that no matter how often I explain it to them, they just can't get their heads around the idea that our problem isn't bandwidth, it's the number of sessions.
The scale of the Internet isn't just measured in bits per second.
There are other dimensions to it as well.
GrpA</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729997</id>
	<title>IBM</title>
	<author>sdiz</author>
	<datestamp>1255434180000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><blockquote><div><p>... a director at IBM's Almaden Research Center</p></div></blockquote><p>

He is just trying to sell some mainframe computer.</p>
	</htmltext>
<tokenext>... a director at IBM 's Almaden Research Center He is just trying to sell some mainframe computer .</tokentext>
<sentencetext>... a director at IBM's Almaden Research Center

He is just trying to sell some mainframe computer.
	</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729875</id>
	<title>Huge Misstatement</title>
	<author>Anonymous</author>
	<datestamp>1255432260000</datestamp>
	<modclass>Insightful</modclass>
	<modscore>3</modscore>
	<htmltext><i>"Science these days has basically turned into a data-management problem," says Jimmy Lin.</i>
<br> <br>
This is about the grossest misstatement of the issue that I could imagine. Science is not a data-management problem at all. But it does, and will, most certainly, depend on data management. They are two very different things, no matter how closely they must work together.</htmltext>
<tokenext>" Science these days has basically turned into a data-management problem , " says Jimmy Lin .
This is about the grossest misstatement of the issue that I could imagine .
Science is not a data-management problem at all .
But it does , and will , most certainly , depend on data management .
They are two very different things , no matter how closely they must work together .</tokentext>
<sentencetext>"Science these days has basically turned into a data-management problem," says Jimmy Lin.
This is about the grossest misstatement of the issue that I could imagine.
Science is not a data-management problem at all.
But it does, and will, most certainly, depend on data management.
They are two very different things, no matter how closely they must work together.</sentencetext>
</comment>
<comment>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730053</id>
	<title>Needle in the haystack ...</title>
	<author>foobsr</author>
	<datestamp>1255434780000</datestamp>
	<modclass>None</modclass>
	<modscore>1</modscore>
	<htmltext><i>'Science these days has basically turned into a data-management problem,'</i>
<br> <br>
The assumption here is that with 'size of data-set approaching infinity' the probability of finding a random result is approaching 1. Ph.D. students might like that.
<br> <br>
CC.</htmltext>
<tokentext>'Science these days has basically turned into a data-management problem, ' The assumption here is that with 'size of data-set approaching infinity ' the probability of finding a random result is approaching 1 .
Ph.D. students might like that .
CC .</tokentext>
<sentencetext>'Science these days has basically turned into a data-management problem,'
 
The assumption here is that with 'size of data-set approaching infinity' the probability of finding a random result is approaching 1.
Ph.D. students might like that.
CC.</sentencetext>
</comment>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_10_13_0114230_4</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29741651
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729651
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_10_13_0114230_12</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29731399
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729755
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_10_13_0114230_8</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29733987
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729881
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_10_13_0114230_10</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730307
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729777
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_10_13_0114230_13</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730181
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729777
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_10_13_0114230_1</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730247
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729651
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_10_13_0114230_5</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29734029
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729861
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_10_13_0114230_11</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29732991
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729651
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_10_13_0114230_2</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730289
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729651
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_10_13_0114230_9</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29733247
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729861
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_10_13_0114230_3</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729993
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729865
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_10_13_0114230_0</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730011
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729755
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_10_13_0114230_7</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729833
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729651
</commentlist>
</thread>
<thread>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#thread_09_10_13_0114230_6</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29733741
http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729745
</commentlist>
</thread>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_10_13_0114230.10</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729861
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29734029
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29733247
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_10_13_0114230.9</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729875
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_10_13_0114230.7</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729755
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730011
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29731399
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_10_13_0114230.11</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729651
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29732991
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730289
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29741651
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729833
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730247
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_10_13_0114230.8</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729881
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29733987
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_10_13_0114230.6</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729865
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729993
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_10_13_0114230.5</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729745
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29733741
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_10_13_0114230.0</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729909
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_10_13_0114230.3</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729777
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730181
-http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730307
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_10_13_0114230.1</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729747
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_10_13_0114230.4</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29730943
</commentlist>
</conversation>
<conversation>
	<id>http://www.semanticweb.org/ontologies/ConversationInstances.owl#conversation09_10_13_0114230.2</id>
	<commentlist>http://www.semanticweb.org/ontologies/ConversationInstances.owl#comment09_10_13_0114230.29729819
</commentlist>
</conversation>
