From pjk25@cornell.edu Thu Apr 13 02:20:51 2006 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on sundial.cs.cornell.edu X-Spam-Status: No, score=-2.1 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.1.0 X-Spam-Level: Received: from iago.cs.cornell.edu (iago.cs.cornell.edu [128.84.96.10]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.25) with ESMTP id k3D6Ko202022 for ; Thu, 13 Apr 2006 02:20:50 -0400 (EDT) Received: from authusersmtp.mail.cornell.edu ([128.253.83.141]) by iago.cs.cornell.edu with Microsoft SMTPSVC(6.0.3790.1830); Thu, 13 Apr 2006 02:20:01 -0400 Received: from [10.0.1.201] (cpe-69-207-37-155.twcny.res.rr.com [69.207.37.155]) (authenticated bits=0) by authusersmtp.mail.cornell.edu (8.13.1/8.12.10) with ESMTP id k3D6K041015463 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NOT) for ; Thu, 13 Apr 2006 02:20:00 -0400 (EDT) Mime-Version: 1.0 (Apple Message framework v749.3) To: egs+summary@cs.cornell.edu Message-Id: Content-Type: multipart/signed; micalg=sha1; boundary=Apple-Mail-5--901715817; protocol="application/pkcs7-signature" From: Philip Kuryloski Subject: PAPER 21 Date: Thu, 13 Apr 2006 02:20:08 -0400 X-Mailer: Apple Mail (2.749.3) X-OriginalArrivalTime: 13 Apr 2006 06:20:01.0401 (UTC) FILETIME=[4BBD6E90:01C65EC2] --Apple-Mail-5--901715817 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed DIGITAL FOUNTAIN: The Digital Fountain is designed to efficiently multicast large files from a single source to many others in a non real-time scenario such as software distribution. This is achieved encoding the file such that the encoded packets can be continuously broadcasted and any listener who receives a set of these packets equal in data length to the original source file will be able to reconstruct it. The Digital Fountain encoding is achieved through the use of Tornado codes, which can be encoded and decoded rapidly without excessive computational power at the cost of some slight inefficiency, due to the fact that this process is achieved through repeated xor operations. The perfectly efficient Reed-Solomon codes result in a complex system of linear equations that must be solved, requiring considerably more computational power. These Tornado codes are then used to "stretch" the original source by some integer factor, and the resulting coded data is sent and resent in order. The authors use a stretch factor of 2 and Tornado Z codes with an efficiency of 1.054 compared to Reed-Solomon codes. 16MB blocks can be encoded and decoded in less than 4 and 3 seconds, respectively, using Tornado Z codes, an operation which would take over 30,000 & 13,000 seconds using Reed-Solomon codes. The authors also compare their scheme to schemes which interleave erasure coding schemes across multiple blocks, finding that it is difficult to properly choose the extent to which the blocks must be interleaved for optimal system performance in the face of various packet loss rates. They find the Tornado coding to be more efficient and orders of magnitude faster to decode. The authors also suggest the use of layered multicast, where hosts with greater bandwidth where lower levels receive subsets of the data sent to higher levels. what is the right stretch factor? not for live streams software distribution TORNADO CODES: The authors suggest the use of a many-to-many multicast scenario by allowing clients to download from a number of mirror sites simultaneously. Normally this would require prohibitively expensive coordination between mirrors and clients, however the authors use Tornado coded data to eliminate the need for feedback in the protocol. NETWORK CODING: The purpose of this system is the same as that addressed by the previous two papers. The approach differs, however, in that while in the previous two solutions the source tornado codes the content, while in this case the collective of nodes downloading the file tornado code content, achieving what the authors call network coding. As it the case in the previous solution, the tornado coding, or more generally Forward Error Correction codes, eliminate the need for extensive coordination between nodes to forward the appropriate pieces of data to each other. Instead, the data is network coded and is more likely to be useful to others. Network coding provides better performance than server encoded schemes, and also provides more robustness for situations in where the server may not remain online for an extended period of time. Network coding reduces the rarity of any given packet and ensuring that some packets can be lost but the original data will still be available. In the network coding scheme, nodes always forward a linear combination of the block which they have received, rather than choosing a particular block. If a node downstream is in need of at least one of the combined blocks, the linear combination of blocks will be useful. As a result receiving nodes must solve a system of equations to decode the original source data. This allows incentive mechanisms such as BitTorrent's to work more effectively, as it allows nodes to trade data more easily. Experimentation shows that network coding is most effective when the network is heterogenous: in a truly random graph/topology, nodes can share effectively without too much risk of duplicate data. The system also handles churn gracefully. While it results in high complexity than source coded schemes, network coding provides the best efficiency in a content distribution system as it applies the benefit of FEC codes at all available instances in the system. The previous two schemes provided important strides forward with the use of more efficient FEC codes. The second demonstrates that FEC codes can reduce the amount of coordination necessary in such a system. --Apple-Mail-5--901715817 Content-Transfer-Encoding: base64 Content-Type: application/pkcs7-signature; name=smime.p7s Content-Disposition: attachment; filename=smime.p7s MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIGFjCCAs8w ggI4oAMCAQICAw+L7TANBgkqhkiG9w0BAQQFADBiMQswCQYDVQQGEwJaQTElMCMGA1UEChMcVGhh d3RlIENvbnN1bHRpbmcgKFB0eSkgTHRkLjEsMCoGA1UEAxMjVGhhd3RlIFBlcnNvbmFsIEZyZWVt YWlsIElzc3VpbmcgQ0EwHhcNMDUwOTI2MTc1NzM0WhcNMDYwOTI2MTc1NzM0WjBDMR8wHQYDVQQD ExZUaGF3dGUgRnJlZW1haWwgTWVtYmVyMSAwHgYJKoZIhvcNAQkBFhFwamsyNUBjb3JuZWxsLmVk dTCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBALFkgyhHSufUWaYxKh+wvUSDmrM8cViE JjeRS7Ssdd5tf0ckH2iwktNuSkxSsavsmAY+8zahwJjwk/JWTVyOGW/QsjgA5zoTJeAz4ah/QcKZ hou20lN6NvlFZWA43b4/jwtpVVa2RMS1fitkEs7YA9N16akGyXCpJR2i6EVTk7tx8/zf7i7bqg4t tmbJaySQyMQ4QV1O+F00m+zms0WZN5XRDqPwU2/WZfUE5BK/pGLkkFheBGSJJssuOsct8ctup0AI fJLlLZZhBCEdNeM2x9KfQEm+Tk3Ty0zl0pOewe7oW9vgwBJ2LwTVurzQ7qXeq1VhkDmkQJOwjcxM ssGAPXMCAwEAAaMuMCwwHAYDVR0RBBUwE4ERcGprMjVAY29ybmVsbC5lZHUwDAYDVR0TAQH/BAIw ADANBgkqhkiG9w0BAQQFAAOBgQBanW/NR5+pfeGOS7lM21kLObfzzGKtHvTFZ/RS0cSgWSKaCZfx aLLhqC9EFFFxh0b4wn0zCTv4CQWhrpaPZZC7oroP70kqWypQdjFbQ2rlLrVVS8pE4gtjZnRPFMr0 BEH+1K7kWB6kTHvg2eI1EotCI92yARGzlzKrXjPonHppijCCAz8wggKooAMCAQICAQ0wDQYJKoZI hvcNAQEFBQAwgdExCzAJBgNVBAYTAlpBMRUwEwYDVQQIEwxXZXN0ZXJuIENhcGUxEjAQBgNVBAcT CUNhcGUgVG93bjEaMBgGA1UEChMRVGhhd3RlIENvbnN1bHRpbmcxKDAmBgNVBAsTH0NlcnRpZmlj YXRpb24gU2VydmljZXMgRGl2aXNpb24xJDAiBgNVBAMTG1RoYXd0ZSBQZXJzb25hbCBGcmVlbWFp bCBDQTErMCkGCSqGSIb3DQEJARYccGVyc29uYWwtZnJlZW1haWxAdGhhd3RlLmNvbTAeFw0wMzA3 MTcwMDAwMDBaFw0xMzA3MTYyMzU5NTlaMGIxCzAJBgNVBAYTAlpBMSUwIwYDVQQKExxUaGF3dGUg Q29uc3VsdGluZyAoUHR5KSBMdGQuMSwwKgYDVQQDEyNUaGF3dGUgUGVyc29uYWwgRnJlZW1haWwg SXNzdWluZyBDQTCBnzANBgkqhkiG9w0BAQEFAAOBjQAwgYkCgYEAxKY8VXNV+065yplaHmjAdQRw nd/p/6Me7L3N9VvyGna9fww6YfK/Uc4B1OVQCjDXAmNaLIkVcI7dyfArhVqqP3FWy688Cwfn8R+R NiQqE88r1fOCdz0Dviv+uxg+B79AgAJk16emu59l0cUqVIUPSAR/p7bRPGEEQB5kGXJgt/sCAwEA AaOBlDCBkTASBgNVHRMBAf8ECDAGAQH/AgEAMEMGA1UdHwQ8MDowOKA2oDSGMmh0dHA6Ly9jcmwu dGhhd3RlLmNvbS9UaGF3dGVQZXJzb25hbEZyZWVtYWlsQ0EuY3JsMAsGA1UdDwQEAwIBBjApBgNV HREEIjAgpB4wHDEaMBgGA1UEAxMRUHJpdmF0ZUxhYmVsMi0xMzgwDQYJKoZIhvcNAQEFBQADgYEA SIzRUIPqCy7MDaNmrGcPf6+svsIXoUOWlJ1/TCG4+DYfqi2fNi/A9BxQIJNwPP2t4WFiw9k6GX6E sZkbAMUaC4J0niVQlGLH2ydxVyWN3amcOY6MIE9lX5Xa9/eH1sYITq726jTlEBpbNU1341YheILc IRk13iSx0x1G/11fZU8xggLnMIIC4wIBATBpMGIxCzAJBgNVBAYTAlpBMSUwIwYDVQQKExxUaGF3 dGUgQ29uc3VsdGluZyAoUHR5KSBMdGQuMSwwKgYDVQQDEyNUaGF3dGUgUGVyc29uYWwgRnJlZW1h aWwgSXNzdWluZyBDQQIDD4vtMAkGBSsOAwIaBQCgggFTMBgGCSqGSIb3DQEJAzELBgkqhkiG9w0B BwEwHAYJKoZIhvcNAQkFMQ8XDTA2MDQxMzA2MjAwOFowIwYJKoZIhvcNAQkEMRYEFBOAmS/KtGVk u4NwOniMoiLGlt9fMHgGCSsGAQQBgjcQBDFrMGkwYjELMAkGA1UEBhMCWkExJTAjBgNVBAoTHFRo YXd0ZSBDb25zdWx0aW5nIChQdHkpIEx0ZC4xLDAqBgNVBAMTI1RoYXd0ZSBQZXJzb25hbCBGcmVl bWFpbCBJc3N1aW5nIENBAgMPi+0wegYLKoZIhvcNAQkQAgsxa6BpMGIxCzAJBgNVBAYTAlpBMSUw IwYDVQQKExxUaGF3dGUgQ29uc3VsdGluZyAoUHR5KSBMdGQuMSwwKgYDVQQDEyNUaGF3dGUgUGVy c29uYWwgRnJlZW1haWwgSXNzdWluZyBDQQIDD4vtMA0GCSqGSIb3DQEBAQUABIIBAJhOmTCnk30O FqEDo6K2wn+YeOzwmdHljHlil2bayNiSR3joE65p2w48wJQFwCqRJHH/HQsEcLjHLA1vaG2UexVJ ksPz/TeArbE+eCeAdgC3mZfe4Ce1uONkPxzUn9BUkwcgWwJU57ctLVUfZyeb3AYKaZ0CxxSbXGPN MQacY1ohKwyJimOZ9T02uECIWufQiRX4iw9X89Y/bcTVJbm1iamrYOFLrRvUzlerMd+CHzxen1U4 /QsVPS8hOx0/B0SuKoqrZusC3UtSUL58klL3eq2cXLKSU0Mtg+RGEio8AT5YVE81dfyDCJSQ2TbJ 0IDpsNXKiDVFMXDOUBcUIv/RaJEAAAAAAAA= --Apple-Mail-5--901715817-- From niranjan.sivakumar@gmail.com Thu Apr 13 11:03:48 2006 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on sundial.cs.cornell.edu X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,HTML_00_10, HTML_MESSAGE autolearn=no version=3.1.0 X-Spam-Level: Received: from penguin.cs.cornell.edu (penguin.cs.cornell.edu [128.84.96.11]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.25) with ESMTP id k3DF3l201234 for ; Thu, 13 Apr 2006 11:03:47 -0400 (EDT) Received: from xproxy.gmail.com ([66.249.82.194]) by penguin.cs.cornell.edu with Microsoft SMTPSVC(6.0.3790.1830); Thu, 13 Apr 2006 10:57:54 -0400 Received: by xproxy.gmail.com with SMTP id s19so1171296wxc for ; Thu, 13 Apr 2006 07:57:54 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:to:subject:mime-version:content-type; b=GNNTL/lJgK/SyZ76gEa3ntK5qBI2ck8tTKhH6L5zO4ZkDD/xdNn4QyVWSGfjPN7uwAQVt+PRm6DDjg4jo4fK3lYWoCvKs7TSO/4sQ8h2D90hRnGip2bGjzJHa9bnYkDANlayOm8+StTcNCq5nf0YiTLftTKfpKTtTeMeTrIGvwE= Received: by 10.70.59.18 with SMTP id h18mr721056wxa; Thu, 13 Apr 2006 07:51:20 -0700 (PDT) Received: by 10.70.125.19 with HTTP; Thu, 13 Apr 2006 07:51:20 -0700 (PDT) Message-ID: Date: Thu, 13 Apr 2006 10:51:20 -0400 From: "Niranjan Sivakumar" To: egs+summary@cs.cornell.edu Subject: PAPER 21 MIME-Version: 1.0 X-Security: message sanitized on sundial.cs.cornell.edu See http://www.impsec.org/email-tools/sanitizer-intro.html for details. $Revision: 1.148 $Date: 2004-12-19 11:59:17-08 X-Security: The postmaster has not enabled quarantine of poisoned messages. Content-Type: multipart/alternative; boundary="----=_Part_18623_28691456.1144939880520" X-OriginalArrivalTime: 13 Apr 2006 14:57:54.0857 (UTC) FILETIME=[A4F68990:01C65F0A] ------=_Part_18623_28691456.1144939880520 Content-Type: text/plain; charset=ISO-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Niranjan Sivakumar A Digital Fountain Approach to Reliable Distribution of Bulk Data Accessing Multiple Mirror Sites in Parallel: Using Tornado Codes to Speed Up Downloads Network Coding for Large Scale Content Distribution The Digital Fountain is based on the concept of a fountain or stream of dat= a from which clients can receive some number of packets that are not necessarily the actual pieces of the content asynchronously and reconstruct the content that they sought. While this is not perfectly achievable, it i= s possible to use erasure encoding to approximate this behavior to some level= . The Digital Fountain compared a few different erasure coding techniques. Reed-Solomon erasure codes require that a client obtains some set of original and redundant packets such that the total size equals that of the original data in order to reconstruct it. However, the drawback is that complex calculations are required for reconstruction. An alternate scheme, Tornado coding, allows for much simpler calculations (based on xor) at the expense of requiring a slightly larger set of data to perform the reconstruction. The paper suggests that the total stretching should be kep= t to a small multiple of the original size in order to reduce reception of redundant packets. Tornado coding was also compared to interleaved erasure coding and shown to be more efficient. Tornado codin scales better to larger files than interleaved coding. Two examples were provided for implementations of a multicasting system using tornado codes. One approach was multi-layered multicasting. In this system, data is sent to multiple multicast layers and recipients can subscribe to some subset of these layers. A recipient's subscription level is based on their network performance. Another application was to setup a number of mirror server and allow clients to download data from the servers without complex coordination. Each server transmits its encoded data and recipients can grab packets from any set of the servers that they desire until they have enough information to reconstruct what is desired. Network Coding deals with a similar issue to the digital fountain but allow= s coding to occur not only at the data source, but also at in clients in orde= r to facilitate collaborative content distribution. The system is designed similarly to that of BitTorrent, where clients can initially contact a tracker to bootstrap themselves into the network. Network Coding occurs by having nodes send linear combinations of blocks that they currently store. An original file can be reconstructed when a node receives a number of packets equal to the size of the original file that have associated coefficient vectors that are linearly independent to each other. There is = a possibility of getting redundant blocks in this system, but it is quite small. The system also provides some incentive mechanisms to prevent leeching. This system provides priority exchanges over free uploads and a tit-for-tat mechanism like in BitTorrent. While both of these systems are quite interesting and seem to be workable a= t some level, it seems that there hasn't been a strong showing of the need fo= r such coding mechanisms. The kind of loss described in the network models does not appear to be a huge issue in current network implementations. Network Coding provides a system that appears to be more complex than the fountain approach. The structure is similar to that of BitTorrent, but there doesn't seem to be a very persuasive reason as to why this extra complexity should be taken on instead of just using BitTorrent. ------=_Part_18623_28691456.1144939880520 Content-Type: text/html; charset=ISO-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Niranjan Sivakumar

A Digital Fountain Approach to Reliable Distribut= ion of Bulk Data
Accessing Multiple Mirror Sites in Parallel:  Usin= g Tornado Codes to Speed Up Downloads
Network Coding for Large Scale Con= tent Distribution

The Digital Fountain is based on the concept of a fountain or strea= m of data from which clients can receive some number of packets that are no= t necessarily the actual pieces of the content asynchronously and reconstru= ct the content that they sought.  While this is not perfectly achievab= le, it is possible to use erasure encoding to approximate this behavior to = some level.

The Digital Fountain compared a few different erasure coding techni= ques.  Reed-Solomon erasure codes require that a client obtains some s= et of original and redundant packets such that the total size equals that o= f the original data in order to reconstruct it.  However, the drawback= is that complex calculations are required for reconstruction.  An alt= ernate scheme, Tornado coding, allows for much simpler calculations (based = on xor) at the expense of requiring a slightly larger set of data to perfor= m the reconstruction.  The paper suggests that the total stretching sh= ould be kept to a small multiple of the original size in order to reduce re= ception of redundant packets.  Tornado coding was also compared to int= erleaved erasure coding and shown to be more efficient.  Tornado codin= scales better to larger files than interleaved coding.

Two examples were provided for implementations of a multicasting sy= stem using tornado codes.  One approach was multi-layered multicasting= .  In this system, data is sent to multiple multicast layers and recip= ients can subscribe to some subset of these layers.  A recipient's sub= scription level is based on their network performance.  Another applic= ation was to setup a number of mirror server and allow clients to download = data from the servers without complex coordination.  Each server trans= mits its encoded data and recipients can grab packets from any set of the s= ervers that they desire until they have enough information to reconstruct w= hat is desired.

Network Coding deals with a similar issue to the digital fountain b= ut allows coding to occur not only at the data source, but also at in clien= ts in order to facilitate collaborative content distribution.  The sys= tem is designed similarly to that of BitTorrent, where clients can initiall= y contact a tracker to bootstrap themselves into the network.  Network= Coding occurs by having nodes send linear combinations of blocks that they= currently store.  An original file can be reconstructed when a node r= eceives a number of packets equal to the size of the original file that hav= e associated coefficient vectors that are linearly independent to each othe= r.  There is a possibility of getting redundant blocks in this system,= but it is quite small.  The system also provides some incentive mecha= nisms to prevent leeching.  This system provides priority exchanges ov= er free uploads and a tit-for-tat mechanism like in BitTorrent.

While both of these systems are quite interesting and seem to be wo= rkable at some level, it seems that there hasn't been a strong showing of t= he need for such coding mechanisms.  The kind of loss described in the= network models does not appear to be a huge issue in current network imple= mentations.  Network Coding provides a system that appears to be more = complex than the fountain approach.  The structure is similar to that = of BitTorrent, but there doesn't seem to be a very persuasive reason as to = why this extra complexity should be taken on instead of just using BitTorre= nt. ------=_Part_18623_28691456.1144939880520-- From nsg7@cornell.edu Thu Apr 13 11:43:43 2006 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on sundial.cs.cornell.edu X-Spam-Status: No, score=-2.2 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.1.0 X-Spam-Level: Received: from iago.cs.cornell.edu (iago.cs.cornell.edu [128.84.96.10]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.25) with ESMTP id k3DFhg210586 for ; Thu, 13 Apr 2006 11:43:43 -0400 (EDT) Received: from postoffice10.mail.cornell.edu ([132.236.56.14]) by iago.cs.cornell.edu with Microsoft SMTPSVC(6.0.3790.1830); Thu, 13 Apr 2006 11:41:46 -0400 Received: from webmail.cornell.edu (hermes21.mail.cornell.edu [132.236.56.20]) by postoffice10.mail.cornell.edu (8.12.10/8.12.6) with ESMTP id k3DFfiOu011782 for ; Thu, 13 Apr 2006 11:41:45 -0400 (EDT) Received: from 132.236.227.119 by webmail.cornell.edu with HTTP; Thu, 13 Apr 2006 11:41:45 -0400 (EDT) Message-ID: <1360.132.236.227.119.1144942905.squirrel@webmail.cornell.edu> Date: Thu, 13 Apr 2006 11:41:45 -0400 (EDT) Subject: PAPER 21 From: "Nicholas S Gerner" To: egs+summary@cs.cornell.edu User-Agent: SquirrelMail/1.4.5 MIME-Version: 1.0 Content-Type: text/plain;charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Priority: 3 (Normal) Importance: Normal X-OriginalArrivalTime: 13 Apr 2006 15:41:46.0594 (UTC) FILETIME=[C599BC20:01C65F10] Tornado codes are a family of data encodings which are efficient to encode and decode and allow receivers to reconstruct source data when any (1+epsilon)k packets are received (where there are k original source packets). The advantage of Tornado codes are that they provide an efficient approximation to an ideal digital fountain where any k packets from an infinite number of distinct encoding packets can be used to retrieve the original k packets of source data. Specifically, Tornado Z encoding is compared to Solomon-Reed encoding which takes quadratic time in the length of the source data to encode or decode. While Tornado encoding introduces some decoding inefficiency ((1+epsilon)k packets are needed to reconstruct source data), the run-time of encoding and decoding of Tornado make it much more feasible for large file distribution schemes. When coupled with multicast, a digital fountain provides a channel to which clients can tune-in and with only slightly more packets than the unencoded source would have required can download the source file, regardless of which specific packets are retrieved. This model introduces packet distinctiveness problems in real-world implementations (where there aren't an infinite number of distinct encoding packets and useless duplicates may be received). Tornado is shown to outperform alternate digital fountain approximations in these settings also either when holding decoding inefficiency constant across both system and measuring decoding time and when holding decoding time constant and measuring decoding inefficiency. Tornado is also shown applied to a many-to-many file distribution system (parallel downloading from mirrors). In such a system mirrors all provide a multicast channel broadcasting Tornado encoded packets. Peers receive these packets from several mirrors. While not scaling linearly (optimally) with the number of parallel mirrors (due to packet distinctiveness problems and Tornado decoding inefficiency), the speedup is still very significant. Tornado proposes a tradeoff between de/encoding cost and decoding efficiency (number of packets required to retrieve the original k source packets). If such a trade-off is an inherrent problem it seems that some model could theoretically show the trade-off. Since the contribution of Tornado is an encoding which makes a good compromise in this trade-off it's critical that such a trade-off exist. "Network Coding for Large Scale Content Distribution" presents an evaluation of network coding across a BitTorrent like network in which every peer re-encodes packets when uploading to peers. In this way almost all packets are unique and of use to each peer. Peers can decode any k linearly independent packets to retrieve the original k source packets. This scheme is compared to using no-encoding or a fixed source encoding of k+l packets at the server and is shown to outperform both in terms of finish time. This is because the peers are able to retrieve k distinct packets sooner using network coding than with source coding or no encoding at all. Furthermore, this scheme does not require global knowledge or rarest-first policies since the network coding scheme creates packets which are likely to be unique at each peer automatically, without the need for global coordination. Network coding, however, does not examine the cost of encoding or decoding exposed by Tornado. The network coding scheme involves solving a system of linear equations which Tornado shows to be prohibitively expensive for large file sizes. From sh366@cornell.edu Thu Apr 13 12:22:44 2006 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on sundial.cs.cornell.edu X-Spam-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.1.0 X-Spam-Level: Received: from penguin.cs.cornell.edu (penguin.cs.cornell.edu [128.84.96.11]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.25) with ESMTP id k3DGMh222109 for ; Thu, 13 Apr 2006 12:22:43 -0400 (EDT) Received: from postoffice10.mail.cornell.edu ([132.236.56.14]) by penguin.cs.cornell.edu with Microsoft SMTPSVC(6.0.3790.1830); Thu, 13 Apr 2006 12:20:15 -0400 Received: from orpheus3.dataserver.cornell.edu (orpheus3.dataserver.cornell.edu [128.253.161.167]) by postoffice10.mail.cornell.edu (8.12.10/8.12.6) with ESMTP id k3DGKDES006561 for ; Thu, 13 Apr 2006 12:20:14 -0400 (EDT) Message-ID: <974740351.1144945213559.JavaMail.webber@orpheus3.dataserver.cornell.edu> Date: Thu, 13 Apr 2006 12:20:13 -0400 (EDT) From: Huang Shiang-Jia To: egs+summary@cs.cornell.edu Subject: PAPER 21 Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Mailer: uPortal WEB email client 3.0 X-OriginalArrivalTime: 13 Apr 2006 16:20:15.0278 (UTC) FILETIME=[25AEF4E0:01C65F16] * Avalanche is a large scale content distribution scheme based on network coding whose goal is to improve the scalability and efficiency of existing systems. * It works in a way similar to BitTorrent, but for each node, the decision to propagate blocks is made based on only local information. It is thus more scalable. * It does so by transmitting random linear combinations of the blocks along with the coefficients of this combination, rather than each peer simply transmitting blocks. * The randomization of the coding process makes the scheduling of block propagation much easier. The content distribution is therefore more efficient. * The experiments show that network coding perform well when nodes have heterogeneous access capacities, when node arrivals and departures are unsynchronized, when there are bottlenecks in the overlay topology and when there are mechanisms to discourage free-riders. [Issues] The encoding/decoding processes cause overhead and delay. Besides, malicious nodes may introduce arbitrary blocks to the system to make the reconstruction of files difficult. * This paper presents a scheme for many-to-many content distributions (multiple clients to access data from multiple mirror sites simultaneously). In this way, downloads speed up and the loads of servers are more balanced. * To avoid the complex client-server negotiation, it proposes a feedback-free protocol based on Tornado codes. * Tornado codes are a class of erasure codes that support error-correcting and are fast in encoding/decoding processes. Tornado codes are fixed rate and use sparse bipartite graphs to trade encoding and decoding speed for reception overhead. * This paper shows that Tornado codes provide a closer approximation to a multicast protocol than previous Reed-Soloman erasure codes. * In its protocol, packets are routed diversely from endpoints to endpoints. The source delivers the packets generated by digital fountain along multiple paths. The destination can reconstruct the data provided a sufficient number of packets. From victoria@cs.hmc.edu Thu Apr 13 12:22:43 2006 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on sundial.cs.cornell.edu X-Spam-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.1.0 X-Spam-Level: Received: from penguin.cs.cornell.edu (penguin.cs.cornell.edu [128.84.96.11]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.25) with ESMTP id k3DGMh222107 for ; Thu, 13 Apr 2006 12:22:43 -0400 (EDT) Received: from turing.cs.hmc.edu ([134.173.42.99]) by penguin.cs.cornell.edu with Microsoft SMTPSVC(6.0.3790.1830); Thu, 13 Apr 2006 12:19:39 -0400 Received: by turing.cs.hmc.edu (Postfix, from userid 34382) id 0D59353230; Thu, 13 Apr 2006 09:19:38 -0700 (PDT) Date: Thu, 13 Apr 2006 09:19:37 -0700 From: Victoria Krafft To: egs+summary@cs.cornell.edu Subject: PAPER 21 Message-ID: <20060413161937.GA21323@cs.hmc.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-OriginalArrivalTime: 13 Apr 2006 16:19:39.0309 (UTC) FILETIME=[103E85D0:01C65F16] All of these papers attempt to improve the speed and reliability of downloading large files from the Internet. They focus on various content encoding schemes, which allow them to increase the speed which downloads occur at, while still getting the correct data. In the digital fountain approach, the source is constantly sending out the data. Any client which wishes to obtain the data need only get enough packets from that stream, and it will be able to reconstruct the data. Ideally, the client would only need enough packets to equal the size of the data being sent. A fairly efficient approximation to this fountain can be built using erasure codes. The Reed-Solomon erasure codes are too expensive to encode and decode, so the authors use Tornado codes, which require more packets to reassemble the data, but can be encoded and decoded quickly. This approach can also be used when multiple servers distribute the file, allowing for many-to-many content distribution. Another approach to this problem is network coding. With digital fountains, the data is only encoded by the server. Network encoding allows nodes between the server and the client to encode the information they have received, increasing the information available to the client. The intermediary nodes send out a linear combination of existing blocks of data, which will reduce the problem of rare blocks, and increase the potential amount of information available to the client nodes. Effectively, nodes which have already received some information become content distributors. This makes the network much more resilient, and improves its behavior when faced with flash crowds or network partitions. Their experimental results show that this is much more efficient than only encoding at the server, or not encoding at all. -- Victoria Krafft From kelvinso@gmail.com Thu Apr 13 14:01:11 2006 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on sundial.cs.cornell.edu X-Spam-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.1.0 X-Spam-Level: Received: from penguin.cs.cornell.edu (penguin.cs.cornell.edu [128.84.96.11]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.25) with ESMTP id k3DI1A215244 for ; Thu, 13 Apr 2006 14:01:10 -0400 (EDT) Received: from wproxy.gmail.com ([64.233.184.235]) by penguin.cs.cornell.edu with Microsoft SMTPSVC(6.0.3790.1830); Thu, 13 Apr 2006 14:00:43 -0400 Received: by wproxy.gmail.com with SMTP id i3so166004wra for ; Thu, 13 Apr 2006 11:00:42 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; b=a5VtNtHZo1kQnn75OKqodXRqoOEjG1TeV5tKPoWHKpPb2lqbO8vVXBs5Y+O//gRUvp0oRACWCIhVd5FmfLi8bVxCA7HMw3fU+uh4kIk5lbmvVLliWBc2M3QUV2r3kI71g/6L0WCCGpwr2dPGyR4PXUxAZd4UgctzdWv4Of7fKUQ= Received: by 10.54.103.7 with SMTP id a7mr546213wrc; Thu, 13 Apr 2006 09:08:10 -0700 (PDT) Received: by 10.54.79.14 with HTTP; Thu, 13 Apr 2006 09:08:10 -0700 (PDT) Message-ID: <6e1ca4560604130908r72216c98n3d7e2ffc54f14701@mail.gmail.com> Date: Thu, 13 Apr 2006 12:08:10 -0400 From: "Chiu Wah Kelvin So" To: egs+summary@cs.cornell.edu Subject: Paper 21 MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Disposition: inline X-OriginalArrivalTime: 13 Apr 2006 18:00:43.0174 (UTC) FILETIME=[2E96E460:01C65F24] Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by sundial.cs.cornell.edu id k3DI1A215244 Both the papers, "A Digital Fountain Approach to Reliable Distribution of Bulk Data" and "Accessing Multiple Mirror Sites in Parallel: Using Tornado Codes to Speed up Downloads," uses idea of tornado codes to speed up the data dissemination. Digital Fountain is an ideal solution where a server injects a stream of distinct encoding packets into the network, and clients can reconstruct the source data using any k packets. Reed-Solomon uses technique of linear combination of k variables to form n packets. Clients only need to receive k of the n packets to reconstruct the source data. However, one of the drawbacks of Reed-Solomon Code is that it takes long time to encode and decode. Therefore, it limits the number of pieces Reed-Solomon Code can construct. Second technique that builds on top of Reed-Solomon Code is to use interleaving. Interleaved codes are constructed by breaking the file into smaller blocks, and each smaller block is coded using Reed-Solomon Code. This approach adds decoding inefficiency to it because now each block has to receive k pieces and all the pieces after k pieces don't contribute any to the source data. Tornado code has a magnitude order of lower computation overhead and is a better approximation of a digital fountain. Tornado code breaks up a source data into n pieces and clients can resemble the packet using any (1 + e)k pieces with a minor decoding inefficiency, e is usually around 0.05. And the encoding and decoding is a lot faster than the previous approaches. In the simulations, it shows that it is an order of magnitude faster than the standard erasure codes. Finally, it describes how to use the technique to build reliable data dissemination and parallel downloads from multiple mirror servers. In parallel downloads, it shows how to tradeoff between the number of servers and stretch factor for reasonable duplicate packets. The third paper, "Network Coding for Large Scale Content Distribution," presents how to use network coding to improve content distribution under BitTorrent similar environment. One of the drawbacks of current BitTorrent system is that it doesn't use any kind of coding. Therefore, if some nodes disappear with the pieces of data that no one else has, none of the nodes in BitTorrent can reconstruct the source data. With network coding, each node will re-encode the data using all the pieces that they currently own. Using re-encoding in each of the node, the diversity of the encoded data will increase and clients only need k distinct encoded data, where they are linear independent of each other, to reconstruct the source data. This paper shows that it improves by more than 20-30% with network coding compared to coding at the server only, and more than a few factor improvements over unencoded information. However, in this paper, it ignores the computation overhead of encoding and decoding of data over and over. It may be a large overhead for large-size data because the network coding technique that they use is similar to Reed-Solomon code which requires heavy computation overhead. From gp72@cornell.edu Sat May 13 17:57:56 2006 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on sundial.cs.cornell.edu X-Spam-Status: No, score=-2.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.1.0 X-Spam-Level: Received: from exchfe1.cs.cornell.edu (exchfe1.cs.cornell.edu [128.84.97.27]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.25) with ESMTP id k4DLvt216050 for ; Sat, 13 May 2006 17:57:55 -0400 (EDT) Received: from penguin.cs.cornell.edu ([128.84.96.11]) by exchfe1.cs.cornell.edu with Microsoft SMTPSVC(6.0.3790.1830); Sat, 13 May 2006 17:56:59 -0400 Received: from postoffice10.mail.cornell.edu ([132.236.56.14]) by penguin.cs.cornell.edu with Microsoft SMTPSVC(6.0.3790.1830); Sat, 13 May 2006 17:56:54 -0400 Received: from webmail.cornell.edu (hermes21.mail.cornell.edu [132.236.56.20]) by postoffice10.mail.cornell.edu (8.12.10/8.12.6) with ESMTP id k4DLuret019722; Sat, 13 May 2006 17:56:53 -0400 (EDT) Received: from 128.84.98.245 by webmail.cornell.edu with HTTP; Sat, 13 May 2006 17:56:54 -0400 (EDT) Message-ID: <1923.128.84.98.245.1147557414.squirrel@webmail.cornell.edu> In-Reply-To: <1255.128.84.98.245.1147497043.squirrel@webmail.cornell.edu> References: <2537.128.84.98.245.1147406914.squirrel@webmail.cornell.edu> <3469.128.84.98.245.1147411785.squirrel@webmail.cornell.edu> <4952.128.84.98.245.1147484280.squirrel@webmail.cornell.edu> <1157.128.84.98.245.1147492519.squirrel@webmail.cornell.edu> <1255.128.84.98.245.1147497043.squirrel@webmail.cornell.edu> Date: Sat, 13 May 2006 17:56:54 -0400 (EDT) Subject: PAPER 21 From: "Gopal Parameswaran" To: egs+summary@cs.cornell.edu Cc: gp72@cornell.edu User-Agent: SquirrelMail/1.4.5 MIME-Version: 1.0 Content-Type: text/plain;charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Priority: 3 (Normal) Importance: Normal X-OriginalArrivalTime: 13 May 2006 21:56:54.0836 (UTC) FILETIME=[25F36740:01C676D8] Tornado Codes and Digital Fountain In this paper the authors provides an alternate solution to existing solutions to the problem of feedback implosion when the packet loss in the network is high. Most solutions are based on transmitting redundant packets of data and each receiver listens till it gets all the packets of data. The solutions suggested uses erasure codes that are faster since erasure codes help in transforming a message into blocks of data where a subset of blocks can be used to recover the compete data. In Digital fountain the server transmits k packets of encoded data that each client receives and once the client has received exactly k packets of the data it would be sufficient for decoding the message and hence the client can stop listening to the data stream. Thus of the n encoding packets only k packets are required where n/k is called the stretch factor of the erasure code used. The repeated broadcast of the packets with each cycle ensures that the client would be able to recover the data in a lossy environment. However as the author points out this method has a serious drawback in a high loss environment in that it could be possible that certain receivers keep on getting packets that it already has and is waiting for the last packets from the last few blocks it still needs to reconstruct. Thus the client may not receive all the k packets need for message reconstruction but under experimentation it was estimated that if n=2k then even for high loss rates the efficiency of the system was acceptable. However the computational intensity of the encoding and decoding operations was a limitations and the authors uses the Tornado codes that are a faster method of encoding and decoding which are better than the Reed-Solomon codes which uses a set of linear equations whereas in Tornado codes the average number of variables per equation is small and thus reduces the computation significantly. The tornado codes are based on exclusively OR operations and involves first solving the variables in the equations for the packets that they have received and using a substitution rule to gather the values of missing variables from the next packet as they arrive. The decoding in this operations stops once enough packets are received to reconstruct the message. Network Coding for Large Scale Content Distribution The authors suggests an alternate method of coding using network coding where the generation of the blocks are randomized and ensures no order required for transmission of the packets and improves the performance where the nodes in the network needs to make block forwarding based on the local information available at that node. The Co-operative distribution solution suggested by the authors uses the concept of local encoding at the nodes in the interiors of the network and is called Network coding and is similar to that of bittorrent in operation but proposes a more efficient solution. When nodes request for a particular file on the server the server distributes the random blocks to the different nodes and also passes information of a random set of the distributed nodes to each of the nodes and the nodes then collectively download the blocks that they do not have from each other. Thus the users knows only a subset of the nodes as its neighbors and when neighbors leave the system then the node can make request for additional neighbors or when a node needs to improve its parallel download. Thus unlike in Digital Fountain the load is taken off the server and distributed to the clients who co-operate among themselves. In network coding random coefficients are picked that are then used to multiply each of the blocks and the results of the multiplication added together. The results of the multiplication and the coefficients are transmitted. The file can be recovered after receiving k blocks with associated coefficients that are linearly independent of reach other.