

	      Algorithms for a failure detector service
	      -----------------------------------------

The FD service consists of a coordinator and normal members. The
coordinator is an elected member of the group and sends periodic heartbeat
messages to all other group members. When a member does not respond with an
acknowledgement within a timeout, it is suspected and the FD listeners are
notified, so they can take corrective action (e.g. removing the suspected
member from their group membership). If the coordinator itself dies, some
other member has to take over its role. To avoid a sophisticated
election/voting protocol, the next member in the member list assumes the
role of coordinator. This is possible because all members see the member
list in the same order. The algorithms for the coordinator and for normal
members are outlined below.
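
The succession rule can be sketched in a few lines. This is only an
illustration: the function name next_coordinator is hypothetical, and
wrapping around to the head of the list when the last member fails is an
assumption that the text does not state.

```python
def next_coordinator(members, failed_coordinator):
    """Return the member that follows failed_coordinator in the shared list.

    Every member holds the same ordered list, so every member computes the
    same successor without exchanging any votes.
    """
    if len(members) < 2:
        return None  # nobody left to take over
    idx = members.index(failed_coordinator)  # raises ValueError if unknown
    # Assumption: wrap around to the head of the list after the last member.
    return members[(idx + 1) % len(members)]
```

For example, with the list ["A", "B", "C"] and a failed coordinator "A",
every member independently picks "B".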

Coordinator:

- wake up every n seconds
- if membership size is <= 1: go to sleep again
- else:
  - create a copy of the member list
  - send heartbeat to all members of the group (except itself)
  - for each ack sent by a member:
    - remove that member from the copy list
  - when the copy list becomes empty or the response timeout is reached:
    - if there are still members left in the copy list:
      - send suspect message to all listeners
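
One round of the coordinator loop above might look like the following
sketch. The names heartbeat_round, send_heartbeat and poll_ack are
hypothetical stand-ins for whatever transport is used. The sketch also
assumes that the coordinator leaves itself out of the copy list, since it
never sends a heartbeat to itself and would otherwise never be removed.

```python
import time


def heartbeat_round(members, me, send_heartbeat, poll_ack, timeout,
                    clock=time.monotonic):
    """Run one heartbeat round and return the members to be suspected.

    send_heartbeat(dest) and poll_ack(max_wait) are hypothetical callbacks;
    poll_ack returns the sender of the next ack, or None if none arrived.
    """
    if len(members) <= 1:
        return []  # nobody to monitor: go back to sleep

    # Copy of the member list (assumption: without the coordinator itself).
    pending = [m for m in members if m != me]
    for m in pending:
        send_heartbeat(m)

    deadline = clock() + timeout
    while pending and clock() < deadline:
        sender = poll_ack(deadline - clock())
        if sender in pending:
            pending.remove(sender)  # this member answered in time

    # Whatever is left did not ack before the timeout.
    return pending
```

The caller would then send a suspect message to all listeners if the
returned list is not empty, and sleep until the next round.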



Normal members:

- set time t = current time
- start timer (set to timeout)

- on receipt of a heartbeat message:
  - return response to sender


- if timer goes off:
  - if I'm next in the list: assume the role of coordinator
  - otherwise:
    - send NewCoordinator message to all members of the group (except myself)



- on receipt of a NewCoordinator message:
  - if I'm next in the list: assume the role of coordinator


- become the new coordinator:
  - create coordinator object and start it
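
The event handlers of a normal member can be sketched as a small class.
All names here (Member, send, the "ack" and "NewCoordinator" strings) are
hypothetical. Timer handling is left to the caller, and becoming
coordinator is reduced to setting a flag instead of creating and starting
a coordinator object.

```python
class Member:
    """Sketch of a normal member reacting to FD events."""

    def __init__(self, me, members, coordinator, send):
        self.me = me
        self.members = list(members)   # same order on every member
        self.coordinator = coordinator
        self.send = send               # hypothetical: send(dest, message)
        self.is_coordinator = (me == coordinator)

    def _i_am_next(self):
        # Assumption: "next" means the entry after the current coordinator,
        # wrapping around at the end of the list.
        idx = self.members.index(self.coordinator)
        return self.members[(idx + 1) % len(self.members)] == self.me

    def on_heartbeat(self, sender):
        self.send(sender, "ack")       # return response to sender

    def on_timer_expired(self):
        if self._i_am_next():
            self.become_coordinator()
        else:
            for m in self.members:
                if m != self.me:
                    self.send(m, "NewCoordinator")

    def on_new_coordinator(self):
        if self._i_am_next():
            self.become_coordinator()

    def become_coordinator(self):
        # Stand-in for "create coordinator object and start it".
        self.coordinator = self.me
        self.is_coordinator = True
```

A member that is not next in line only broadcasts NewCoordinator; the
member that is next in line takes over either when its own timer fires or
when that broadcast reaches it.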

