After all deliverables have been submitted to CMS, we will ask you to fill out an online form to officially evaluate your teammates. The purpose of this peer evaluation is to evaluate team citizenship, not technical capability.

You will rate each team member, including yourself, on the scale shown below. These ratings should reflect each individual’s participation, effort, and sense of team responsibility. The scale is as follows:

You will also write brief comments to justify your ratings. Your comments will not be revealed to your teammates. Only the professor will see your comments.

The ratings you give your peers and yourself will be transformed into a numeric score for each team member. That score will be used for the 20% of the final project devoted to peer evaluations. So, the ratings you submit will not be directly revealed to your team members, but some function of them will be.


Just do your best to honestly assess yourself and your teammates. The rating system we are using is robust and works well in practice. We have used it for several years in CS 3110. It has also been studied academically; see the citations at the end of the document for details.

The rest of this document describes how scores will be calculated. The details are here as reassurance, rather than because you need to know them. It’s fine to stop reading here.

Calculation of Scores

Each qualitative rating will be transformed into a quantitative rating as follows:

Suppose that a team of three people submits the following ratings:

Name      Vote 1   Vote 2   Vote 3
David     87.5     100      87.5
Anne      100      100      87.5
Michael   75       75       75

Each vote was submitted by one of the team members, providing a rating of all three team members (including themself). For example, maybe David submitted vote 2, in which he gives himself and Anne ratings of 100 but gives Michael a rating of 75. It doesn’t matter who submitted which vote for the calculations we’re about to describe. A four-person team would, of course, have an additional row and an additional column.

The individual rating for a team member is their average quantitative rating, including their own self-rating. These are the individual ratings for our example team:

Name      Individual Rating
David     91.67
Anne      95.83
Michael   75

The team rating is the average of all the quantitative ratings for all team members. Our example team has nine quantitative ratings, and the average of them is 87.5, so that is the team rating.

The individual adjustment factor (henceforth, factor) is an individual’s rating divided by the team rating. The factor is capped at 1.05. The teamwork score is the factor times 20, rounded to the nearest integer. For our example team, the factors and teamwork scores are as follows:

Name      Factor   Score
David     1.047    21
Anne      1.050    21
Michael   0.857    17

That teamwork score is what will be used for the 20-point Peer Evaluation component of the final project grade. Note that it’s possible for some team members to get a small bonus of 1 point.
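The whole calculation can be sketched in a few lines of Python. This is an illustrative sketch, not the course’s actual grading code; the function name and data layout are invented for this example.

```python
# Illustrative sketch of the peer-evaluation calculation described above.
# (Not the course's actual grading code; names are invented here.)

def teamwork_scores(votes):
    """votes[j][i] is the rating that vote j assigns to team member i."""
    n = len(votes)
    # Individual rating: average of all ratings a member receives,
    # including their own self-rating.
    individual = [sum(vote[i] for vote in votes) / n for i in range(n)]
    # Team rating: average of every rating submitted (n * n of them).
    team = sum(sum(vote) for vote in votes) / (n * n)
    # Factor: individual rating divided by team rating, capped at 1.05.
    factors = [min(r / team, 1.05) for r in individual]
    # Teamwork score: factor times 20, rounded to the nearest integer.
    return [round(f * 20) for f in factors]

# The example team above; rows are votes, columns are David, Anne, Michael.
votes = [
    [87.5, 100.0, 75.0],   # Vote 1
    [100.0, 100.0, 75.0],  # Vote 2
    [87.5, 87.5, 75.0],    # Vote 3
]
print(teamwork_scores(votes))  # [21, 21, 17]
```

Note that Anne’s factor of 95.83 / 87.5 ≈ 1.095 is what triggers the 1.05 cap in this example.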

How this Worked in the Past

Similar calculations have been used for several years in 3110. In that course, it resulted in a mean factor of 1.010 and standard deviation of 0.091. So, only in rather extreme situations would anyone lose more than about 5 points from their teamwork score. We therefore recommend that, instead of trying to overanalyze or game this calculation, you simply fill out the evaluations as honestly as you can.

Some Examples

Next, we discuss some situations that might arise and how this scoring system handles them.

Everyone gives the same rating to everyone. Then everyone gets a teamwork score of 20. For example:

Name      Vote 1   Vote 2   Vote 3   Individual   Factor   Score
David     75       75       75       75           1        20
Anne      75       75       75       75           1        20
Michael   75       75       75       75           1        20
Team: 75

Note that it doesn’t matter whether everyone used 75 or 100 or 25 for their votes. As long as everyone agrees, everyone gets the score of 20.
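That invariance is easy to check with a quick sketch (illustrative only): with unanimous ratings, every individual rating equals the team rating, so every factor is exactly 1.

```python
# Check (illustratively) that a unanimous rating v always yields a
# score of 20, whatever common value v is used: every individual
# rating and the team rating all equal v, so each factor is
# v / v = 1, and 1 * 20 = 20.
for v in (25, 75, 100):
    individual = team = v             # all nine ratings equal v
    factor = min(individual / team, 1.05)
    print(round(factor * 20))         # prints 20 each time
```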

One person dislikes the rest of the team. Then the other team members’ scores go down, but not by much.

Name      Vote 1   Vote 2   Vote 3   Individual   Factor   Score
David     100      100      0        66.67        0.857    17
Anne      100      100      0        66.67        0.857    17
Michael   100      100      100      100          1.05     21
Team: 77.78

Whoever submitted Vote 3 (probably Michael) has caused David and Anne’s scores to go down by 3 points. Relative to their final grade in the entire course, that makes little difference.

The rest of the team dislikes one person. That person’s score goes down by about half.

Name      Vote 1   Vote 2   Vote 3   Individual   Factor   Score
David     100      100      100      100          1.05     21
Anne      100      100      100      100          1.05     21
Michael   0        0        100      33.33        0.429    9
Team: 77.78

It looks like David and Anne don’t like Michael. He loses 11 points. This might be enough to reduce his final grade by 1/3 letter grade (e.g., A to A-), but no more than that. This is an extreme situation, because it makes Michael’s factor go down so low. In such cases, the professor will read the written comments provided by the other team members to see whether they justify lowering Michael’s score. If the professor thinks the other team members have been too critical, then the professor could raise Michael’s factor.

The dislike is mutual. Then the outcome doesn’t change by much.

Name      Vote 1   Vote 2   Vote 3   Individual   Factor   Score
David     100      100      0        66.67        1.05     21
Anne      100      100      0        66.67        1.05     21
Michael   0        0        100      33.33        0.6      12
Team: 55.56

This time Michael dislikes David and Anne, too. Their scores remain unchanged; his goes up a little.

Team members fail to provide ratings. If a team member fails to vote, that person’s vote (a column in the table) will be filled in automatically. A 0 will be imputed for any team member who didn’t vote (including themself), and a 25 for those who did. For example, suppose that Michael failed to vote. Then his vote (#3 below) would be filled in with a 25 for David and Anne and a 0 for himself:

Name      Vote 1   Vote 2   Vote 3   Individual   Factor   Score
David     100      100      25       75           1.038    21
Anne      100      100      25       75           1.038    21
Michael   100      100      0        66.67        0.923    18
Team: 72.22

This results in about a 10% deduction for Michael.
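The fill-in rule can be sketched as follows. This is an illustrative sketch; the function name and the 0-based indexing are invented for this example.

```python
# Illustrative sketch of the imputed vote for a member who failed to
# vote. Rule from above: the imputed vote gives 25 to each teammate
# who did vote and 0 to anyone who didn't (the non-voter themself
# always gets 0).

def impute_vote(voters, n):
    """Imputed vote on a team of n members (0-indexed), where
    `voters` is the set of members who did submit a vote."""
    return [25 if i in voters else 0 for i in range(n)]

# Michael (index 2) failed to vote; David (0) and Anne (1) did.
print(impute_vote({0, 1}, 3))  # [25, 25, 0]
```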


The core of this rating scheme is based on the one used in Cornell CS 3110. It has been examined and found to be highly useful and infrequently problematic in three academic publications: