This paper describes a technique for providing reliable delivery on top of a unreliabile multicast. The principal problem dealt with in the paper is avoiding the explosion of messages that would result by adapting TCP-like loss recovery to this many-receiver world. The first change necessary is switching to negative-acknowledgments. To make sure receivers are aware of which messages have been sent, periodic reports of the extent of messages (expressed as sequence numbers for each sender) are multicasted to the group by each participant. To prevent the size of each these reception reports from exploding, these reception reports only refer to a particular application-level entity being operated upon (a whiteboard "page" in the application in the paper).
From this reception information, participants can decide when to send a negative acknowledgment (which the paper calls a "repair request"). To prevent many repair requests from being sent at once (due to the failure of a link relatively "early" in the multicast tree), senders wait a random amount of time proportional to their distance from the original sender before sending the repair request. If, before this timer expires, they see another repair request for the same block, they backoff their timer exponentially (and use the same strategy to resend a transmission if the timer expires). Any host which has the data (all that received it are assumed to buffer the data), can send a retransmission, but uses a similar timer mechanism to lower the liklihood of duplicate retransmissions.
To select the random backoff values, the authors propose a heuristic scheme using feedback from the numbher of seen repair requests and retransmissions and some manually sent clamps. The authors evaluated this scheme in simulations over random multicast trees with up to 5000 nodes.
Presumably because it was divised with all participants being "first-class" — equally able to send messages, not distinguished by the strucutre of the mutlicast tree, and able to leave or join the group at will — this scheme requires nodes to track and resend a disappointing amount of state. Each node needs to keep estimates of the round-trip-time to each other node in the multicast group. To make these estimates, each node must send periodic reception reports. Without a means to summarize these reports (besides switching "pages"), the size of each report is proportional to the number of senders. (The authors do propose "local recovery" to limit the scope of retransmissions and associated messages, but this scheme does not reduce the need Though this amount state and messages are probably manageable for a low-bandwidth stream with thousands of recipients, it does not sound like scaling this scheme to larger data sizes and more recipients will be comfortable.
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment