1. How bad is the pollution problem on peer-to-peer filesharing networks?

Researchers who examined pollution in Kazaa estimate that over 50% of the copies of recent files are polluted. Spammers have also begun targeting P2P filesharing systems. As of 3/10/2005, for instance, every image and movie query on the Gnutella network receives at least one false response from a spammer: a file whose name is set to match the query but whose contents advertise a "free iPod".

The web went through a similar phase. When search engines ranked web pages by the frequency and placement of phrases appearing in them, spammers would create web pages containing an entire dictionary just to appear in response to unrelated searches. The same is now happening in P2P networks: files are created with long, misleading lists of keywords, and mislabeled files are introduced on purpose to make good content harder to find. On the web, such deceptive labeling was not brought under control until Google's PageRank algorithm. Credence is similar in spirit to PageRank, applied to P2P networks.

2. How does Credence help?

Credence lets users vote for files whose descriptions they judge to be accurate and against files whose descriptions are inaccurate or misleading (spam). From these votes it computes the trustworthiness of each file, so that downloaders can pick good files before downloading them.

3. How does Credence know who is trustworthy and who is a spammer?

Initially, it doesn't. As you vote on files, Credence stores your votes and discovers the set of peers whose votes are correlated with yours. It also communicates with those peers to learn about other peers with whom they, in turn, are correlated. The outcome of this computation is a numerical value for each file appearing in query results, reflecting the probability that the file is trustworthy.
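For readers curious about what "correlated" might mean here, the following Python sketch computes a Pearson correlation between two peers' votes on the files both have voted on, counting a thumbs-up as +1 and a thumbs-down as -1. It is an illustration under stated assumptions, not Credence's code: the function name `vote_correlation`, the `min_overlap` threshold, and the choice of Pearson correlation are our own, and Credence's actual measure may differ.

```python
from math import sqrt
from typing import Dict, Optional

# Illustrative vote record: file identifier (e.g. a SHA1 hash) -> +1 (thumbs-up) or -1 (thumbs-down).
Votes = Dict[str, int]


def vote_correlation(mine: Votes, theirs: Votes, min_overlap: int = 2) -> Optional[float]:
    """Pearson correlation of two peers' votes over the files both have voted on.

    Hypothetical helper, not Credence's API. Returns None when the peers share
    fewer than `min_overlap` files, or when either peer voted the same way on
    every shared file (the correlation is undefined in that case).
    """
    common = sorted(set(mine) & set(theirs))
    if len(common) < min_overlap:
        return None
    xs = [mine[f] for f in common]
    ys = [theirs[f] for f in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)
```

A peer that always agrees with you scores 1.0, one that always disagrees scores -1.0, and a peer with no files in common with you yields `None`, which is one reason a fresh installation cannot rate anything yet.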

If you vote thumbs-up for good files and thumbs-down for bad files, you will be grouped with the vast majority of users, who also vote honestly. You will then compute a high trustworthiness metric for all files that this (potentially very large) group has voted on. If you vote inaccurately (i.e., you are a spammer), you will compute a low trustworthiness metric for legitimate, non-spam files, and honest users will assign a low trustworthiness coefficient to your opinion. It is thus in your best interest to vote honestly.
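To make this concrete, here is a hypothetical Python sketch of how per-peer correlation weights could be turned into a per-file score between 0 and 1. The function `file_trust` and its rule of giving zero weight to uncorrelated or anti-correlated peers are assumptions made for illustration; Credence's real computation is more involved.

```python
from typing import Dict, Optional


def file_trust(votes_on_file: Dict[str, int], peer_weight: Dict[str, float]) -> Optional[float]:
    """Estimate in [0, 1] that a file is accurately described (hypothetical helper).

    votes_on_file maps peer -> +1/-1 for this one file; peer_weight maps
    peer -> how well that peer's past votes correlate with ours.
    """
    total = 0.0
    weight_sum = 0.0
    for peer, vote in votes_on_file.items():
        # Assumption: uncorrelated or anti-correlated peers get zero weight,
        # so a spammer's opinion simply stops counting.
        w = max(peer_weight.get(peer, 0.0), 0.0)
        total += w * vote
        weight_sum += w
    if weight_sum == 0.0:
        return None  # nobody we trust has voted on this file
    return (total / weight_sum + 1.0) / 2.0  # map [-1, 1] onto [0, 1]
```

With weights `{"alice": 0.9, "bob": 0.8, "spammer": -1.0}`, a good file voted up by alice and bob and down by the spammer still scores 1.0, because the spammer's weight is clamped to zero.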

4. I just downloaded Credence. Why does it not provide any information on the trustworthiness of files?

Credence needs time, and your votes, to figure out which files are trustworthy for you. You must vote thumbs-up and thumbs-down on at least some files so that the system can determine which other peers you are correlated with.

If you want to speed this process up, you can inherit the correlations of an existing peer.

Bootstrapping a large-scale reputation system with a peer-to-peer implementation is a difficult problem; to our knowledge, no one has done it yet. So we hope that you will participate in this ongoing experiment. In the worst case, you will still have all the other features of LimeWire available to you. Our simulations indicate that it may take up to two weeks for a node to start seeing accurate trustworthiness estimates. They also indicate that the wait pays off: there is a phase transition at which the system suddenly becomes able to compute accurate trustworthiness measures for a large percentage of files.

5. Does Credence provide file recommendations?

No. Credence is intended to be a reputation system, not a recommendation system. If you need recommendations, we suggest using Amazon, whose recommendations work perfectly well; P2P pollution is a separate, real problem. Because Credence is not a recommendation system, your thumbs-up and thumbs-down decisions should be based on an objective evaluation of whether a file's description matches its contents, not on matters of taste. We hope that this objective evaluation criterion, inherently shared by all members of the community, will produce an easy-to-identify group of honest users (and potentially many other groups of spammers and malicious users, who may end up trusting each other).

6. I hate the music group X. Should I vote thumbs-down for their songs?

No. See the previous question: your votes should simply reflect whether the file's description is accurate and whether its contents are intact. Voting thumbs-down on a perfectly good file may cause your node to be lumped in with spammers and reduce the effectiveness of Credence for you (i.e., you will likely see more spam in your searches).

7. Can a group of spammers game the Credence algorithm by voting thumbs-up for each other's spam?

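No. The trustworthiness computation is designed to prevent such attacks: each node weighs other peers' votes by how well they correlate with its own, so spammers who vote up each other's files do not thereby gain the trust of honest users.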

8. What happens when a large number of spammers vote up each other's spam? Can they fool the reputation system?

No. Credence's reputation computation is similar to Google's PageRank, but more general: every node computes its own ranking based on its own votes. Reputation flows from a given good node along trust edges toward other nodes. Spammers can form tight cliques in which everyone votes for each other's spam, but because no trust flows into the clique from honest nodes, honest users will deem the entire clique untrustworthy. Conversely, if anyone in the spammer clique does a search, they will see each other's spam ranked high.
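The idea of reputation flowing outward from your own node can be illustrated with personalized PageRank, a standard variant in which every random walk restarts at a chosen source node. The Python sketch below uses it as a stand-in; it is not Credence's actual algorithm, and the graph, node names, and `damping` value are hypothetical. Because no edge leads from the honest peers into the spammer clique, the clique receives exactly zero trust from `alice`, while from the point of view of `s1` the other spammers rank highly.

```python
from typing import Dict, List


def propagate_trust(edges: Dict[str, List[str]], source: str,
                    damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Personalized PageRank: trust restarts at `source` and flows along edges.

    edges[u] lists the peers u trusts (hypothetically, peers whose votes
    correlate with u's). This is a stand-in, not Credence's computation.
    """
    nodes = set(edges) | {v for outs in edges.values() for v in outs} | {source}
    trust = {n: 0.0 for n in nodes}
    trust[source] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        nxt[source] = 1.0 - damping  # each round, part of the trust restarts at us
        for node, outs in edges.items():
            if outs and trust[node] > 0.0:
                share = damping * trust[node] / len(outs)
                for target in outs:
                    nxt[target] += share
        trust = nxt
    return trust


GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "carol"],
    "carol": ["alice"],
    # A spammer clique that endorses itself; pointing at honest peers does not help it.
    "s1": ["s2", "s3", "bob"],
    "s2": ["s1", "s3"],
    "s3": ["s1", "s2"],
}

if __name__ == "__main__":
    from_alice = propagate_trust(GRAPH, "alice")
    from_s1 = propagate_trust(GRAPH, "s1")
    print(from_alice["s2"], from_s1["s2"] > 0)  # prints: 0.0 True
```

Each node runs the computation from its own position in the graph, which is what makes the ranking personal rather than global.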

9. How is Credence any different from previous work on reputation systems ?

There has been much work in academia on reputation systems. Credence differs from this work in three significant ways:

  • Credence derives all of its data from votes on objects and does not require the user to vote on other peers. Individuals in filesharing networks are typically anonymous, can acquire multiple identities, and can change their behavior over time, making systems based on voting on peers, as opposed to objects, non-starters in the peer-to-peer domain.
  • Credence has a distributed design. Many other successful reputation systems, such as those used by eBay and Amazon, rely on a central host for calculating reputations, or else rely on a set of trusted entities. Credence is entirely decentralized and does not assume that there are well-known, fully trustworthy nodes.
  • Credence has actually been implemented in full. Previous work has been mostly of academic interest. We implemented Credence as an add-on to the popular LimeWire filesharing software, and kept it backwards compatible with the Gnutella network. It acts as an overlay on top of the Gnutella overlay. We are not aware of any actual large-scale deployments of peer-to-peer reputation systems in the past.

10. I have downloaded and installed Credence. What do I do now?

When Credence first starts, it will begin the process of finding other Credence users on the network, and discovering reputation information about popular files in the network. You don't have to do anything for this to happen -- it begins automatically in the background as you are searching for and downloading files on the Gnutella network using Credence/LimeWire.

The key features offered by Credence are the ability to get reliable ratings for files in the search window before you download them, and to vote on files from the library window after you download them. For best results, you should vote on as many files as possible. This helps other users in the network by providing information about the files, but more importantly, it helps calibrate Credence's rating mechanism to your voting habits. So it is important to vote honestly on your files (so that you get back honest ratings on other files), and to vote on both popular, common files and less popular, rare files.

11. Why do some files show ratings in the left column of the search window, but others do not? How can I get ratings for all files in the search results window?

In the background, Credence explores the network to discover information about other Credence users and their voting habits. In the process, it also records ratings information about files that it comes across. Ratings information gathered from your previous searches is also saved temporarily, along with your own voting history. This saved information is displayed automatically in the search window during new searches, so you can see the (partial) ratings information about some search results without having to do anything extra.

Other files might not yet have any information on record, however, and will show a blank in the ratings column. For these, you can click the Ratings button in the search window. This brings up a screen with more detailed information, which Credence gathers by actively querying the Gnutella network. This process may take some time, and does use up some network bandwidth, so you probably only want to do this for files you are interested in downloading.

Lastly, some files will turn up no information even if you click on the Ratings button from the search window. This is usually because no Credence users have yet voted on the file. If you decide to download the file anyway, be sure to vote on it to help expand the Credence network!
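The split between saved ratings and on-demand lookups described above can be sketched as a small Python class. Everything here is hypothetical: `RatingsStore`, its method names, and the injected `query_network` function stand in for whatever LimeWire/Credence actually does, and the expiry of temporarily saved data is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical network lookup: given a SHA1, return a rating in [0, 1],
# or None if no Credence user has voted on that file.
NetworkQuery = Callable[[str], Optional[float]]


@dataclass
class RatingsStore:
    query_network: NetworkQuery
    saved: Dict[str, float] = field(default_factory=dict)

    def column_rating(self, sha1: str) -> Optional[float]:
        """What the ratings column shows: saved data only, no network traffic."""
        return self.saved.get(sha1)

    def ratings_button(self, sha1: str) -> Optional[float]:
        """Actively query the network: slower, and uses bandwidth."""
        fresh = self.query_network(sha1)
        if fresh is not None:
            self.saved[sha1] = fresh  # later searches can show it without a query
            return fresh
        return self.saved.get(sha1)  # None if there is still no information
```

In this sketch the ratings column never touches the network, the Ratings button always does, and a file nobody has voted on comes back as `None` (a blank) either way.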

12. How do I vote for a file? The search window has only a Ratings button, but no voting button.

You can only vote on files after you download them. Once you have downloaded a file, go to the library window, select the file, and click either the thumbs up or thumbs down button.

You can also vote by right-clicking on the file in the library window and selecting Rate Up or Rate Down.

13. Voting on files one at a time is tedious. Is there a better way?

Yes. If you have a lot of files that you want to vote on, you can select them all in the library window and click thumbs up or thumbs down. This will vote on all of the selected files. You can use the normal selection techniques: use Ctrl-A to select all files; use Ctrl-click to select an additional file; use Shift-click to select a whole range of files.

The latest version of Credence also gives you a chance to vote files down when you delete them. So if your normal habit is to delete bad files, you will get a chance to vote thumbs down when you click the Delete button. When using this feature, be sure to vote down files only if they are corrupt, poor quality, or otherwise broken. Don't vote files down just because you dislike the music, picture, or whatever (see Question 6).

14. When I try to vote on a file, I get an error saying "The SHA1 hash is not available for this file" and explaining that my vote cannot be recorded. What is a SHA1 hash, and what can I do to avoid this problem?

The SHA1 hash is a fingerprint of the file contents used to identify the file in Credence. When you vote on a file, your vote records the SHA1 hash of the file so that other Credence users know what file the vote was for.
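As an illustration of such a fingerprint, the Python sketch below hashes a file's contents with SHA1 and formats the result in the Base32 `urn:sha1:` notation used by Gnutella's HUGE extension. Whether Credence stores hashes in exactly this form is an assumption, and the function names are hypothetical. Because only the contents are hashed, renaming a file does not change its identifier, so votes follow the data rather than the (possibly misleading) filename.

```python
import base64
import hashlib
from typing import BinaryIO


def sha1_urn(stream: BinaryIO, chunk_size: int = 1 << 16) -> str:
    """Return the Base32 'urn:sha1:' identifier of a binary stream's contents."""
    digest = hashlib.sha1()
    # Read in chunks so large media files need not fit in memory;
    # hashing every byte is also why a big shared folder takes a while.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    # A 20-byte SHA1 digest encodes to exactly 32 Base32 characters, no padding.
    return "urn:sha1:" + base64.b32encode(digest.digest()).decode("ascii")


def sha1_urn_for_path(path: str) -> str:
    """Convenience wrapper: fingerprint the file at `path`."""
    with open(path, "rb") as f:
        return sha1_urn(f)
```

For example, an empty file always maps to `urn:sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ`, whatever it is called.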

LimeWire computes the SHA1 hash for all files in your Shared Folder automatically, in the background. Depending on the size and number of files you have shared, however, this can take some time, so the SHA1 hash for some files might not have been computed yet. You don't have to do anything special to fix this error: just leave LimeWire running for a while. LimeWire will eventually compute all of the hashes, allowing you to vote on the file. In the meantime, you can continue to search for files, get ratings for them, download files, and vote on other files.

15. I have a question that is not answered here. What do I do?

You can connect with us and the rest of the Credence user community through our online forums.

Thank you for helping us out with a very interesting experiment: the first large-scale deployment of a peer-to-peer reputation mechanism. In return, we hope that Credence will help reduce pollution in your filesharing experience.

Credence Project Page


Computer Science Department
Cornell University