The Problem

Peer-to-peer filesharing is one of the most significant new features of the Internet. Many networks, however, are rife with pollution: content that is not authentic. Some examples of what we consider pollution are files with:

  • damaged, corrupt, or missing contents;
  • dangerous content, such as viruses;
  • patently incorrect title, file type, or other metadata;
  • misleading metadata designed to confound user searches.

Recent studies indicate that much of this pollution is deliberate. Although most pollution today is benign, the potential security risk is very real as polluters become more sophisticated.

How is pollution managed today?

A wide range of approaches has been used to deal with pollution. At one extreme, each user is left to fend for herself. Files must be downloaded one at a time and examined locally for authenticity. In addition to the high bandwidth cost of this approach, each user must duplicate the entire pollution detection and filtering effort of every other like-minded user.

At the other extreme, the network might impose a single choke point, or very few, where incoming content can be inspected for authenticity. These choke points serve as gateways for incoming content, and could be staffed by a small circle of expert human operators who examine each file before allowing it to pass into the network. Beyond the disadvantages inherent in any centralized design, it is not clear that this approach could withstand deliberate, sustained, and sophisticated pollution attacks of the kind seen in some existing peer-to-peer networks. In addition, each user must place all her trust in a small group of participants whose judgment may or may not match her own.

Most deployed peer-to-peer networks rely on ad hoc approaches to combat the problem of pollution. The typical approach is to rank search results by how frequently each result is found in the network. Under the assumption that frequently-shared files are likely to be authentic, the user can judge which version of a file to select among potentially many search results. Alternatively, a search result might be ranked by the quality of the peer offering the file, where "quality" is measured by the number of files the peer shares, its bandwidth, and so on. Unfortunately, an attacker bent on polluting the network can easily fool such approaches. This is especially true when participants are anonymous. An attacker can easily register thousands of bogus (virtual) clients in the network, making polluted versions of files seem even more popular than the authentic versions.
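The weakness of frequency-based ranking can be sketched in a few lines of Python. The file names and counts below are purely illustrative, not taken from any real network:

```python
from collections import Counter

def rank_by_frequency(search_results):
    """Rank file versions by how many peers offer each one (most common first)."""
    counts = Counter(search_results)
    return [version for version, _ in counts.most_common()]

# Forty honest peers offer the authentic version; a single polluter
# registers a thousand bogus virtual clients, each offering a polluted copy.
honest_offers = ["song-authentic.mp3"] * 40
sybil_offers = ["song-polluted.mp3"] * 1000

ranking = rank_by_frequency(honest_offers + sybil_offers)
# The polluted version now tops the ranking, despite being inauthentic.
```

Because creating a virtual client costs the attacker almost nothing, the popularity signal is trivial to inflate; the same objection applies to per-peer "quality" metrics such as files shared or bandwidth.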

What can be done?

The problem of detecting and filtering pollution can be quite difficult, and the problem will become even harder as attackers become more sophisticated and determined. Peer-to-peer networks are currently experiencing only the beginnings of what promises to be a prolonged "arms race" against polluters.

We believe that what is needed is a solution providing at least the following properties:

  • allows users to gauge file authenticity with confidence;
  • avoids centralized components during normal operation;
  • works on a per-file basis, not a per-peer or per-author basis;
  • avoids globally "pre-trusted" peers or the need for global consensus;
  • works in both structured and unstructured filesharing networks.

The Credence project addresses these requirements.

Credence Project Page

Computer Science Department
Cornell University