9.3 Failure-tolerant BoT
Bag of Tasks revisited
- pvm_notify() can be used to build a failure-tolerant BoT
- If a worker fails, the master would normally block
forever waiting for that worker's response
- With notification enabled, the notify message is received
in place of the response, allowing the master to re-schedule
the failed worker's task
- Can use PvmHostAdd notifications to respond to new resources
- The state diagrams are modified like so:
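
In code, the master's modified message handling might look like the sketch below. It is a stand-alone simulation, not real PVM code: the `Master` and `Msg` types, the tag values and all function names are hypothetical. A real master would request notification with `pvm_notify(PvmTaskExit, TAG_TASK_EXIT, nworkers, tids)` after spawning its workers, and would obtain each message from `pvm_recv(-1, -1)` rather than from the caller.

```c
#include <assert.h>

#define MAX_WORKERS 8
#define MAX_TASKS   64

/* Hypothetical message tags. In PVM, TAG_TASK_EXIT would be the msgtag
   handed to pvm_notify(); TAG_RESULT the tag workers use in pvm_send(). */
enum { TAG_RESULT = 1, TAG_TASK_EXIT = 2 };

/* One incoming message as the master sees it after a receive. */
typedef struct {
    int tag;     /* TAG_RESULT or TAG_TASK_EXIT */
    int worker;  /* worker that replied, or worker that exited */
} Msg;

typedef struct {
    int bag[MAX_TASKS];        /* tasks waiting to be issued (a stack) */
    int nbag;
    int current[MAX_WORKERS];  /* task held by each worker, -1 if idle */
    int alive[MAX_WORKERS];
    int nworkers;
    int done;                  /* number of completed tasks */
} Master;

/* Hand waiting tasks to every live, idle worker. */
static void dispatch(Master *m) {
    for (int w = 0; w < m->nworkers && m->nbag > 0; w++) {
        if (m->alive[w] && m->current[w] < 0) {
            m->current[w] = m->bag[--m->nbag];
        }
    }
}

static void master_init(Master *m, int ntasks, int nworkers) {
    assert(ntasks <= MAX_TASKS && nworkers <= MAX_WORKERS);
    m->nbag = 0;
    m->done = 0;
    m->nworkers = nworkers;
    for (int t = 0; t < ntasks; t++) m->bag[m->nbag++] = t;
    for (int w = 0; w < nworkers; w++) {
        m->alive[w] = 1;
        m->current[w] = -1;
    }
    dispatch(m);
}

/* Process one message: a result completes a task; an exit notification
   returns the lost task to the bag so another worker can take it. */
static void master_handle(Master *m, Msg msg) {
    int w = msg.worker;
    if (w < 0 || w >= m->nworkers || !m->alive[w]) return;
    if (msg.tag == TAG_RESULT) {
        if (m->current[w] >= 0) {
            m->done++;
            m->current[w] = -1;
        }
    } else if (msg.tag == TAG_TASK_EXIT) {
        m->alive[w] = 0;
        if (m->current[w] >= 0) {
            m->bag[m->nbag++] = m->current[w];  /* re-schedule */
            m->current[w] = -1;
        }
    }
    dispatch(m);
}
```

The master never waits on one particular worker: whichever message arrives next, result or exit notification, moves the computation forward. A PvmHostAdd notification could be handled as a third tag in `master_handle()` that spawns a worker on the new host and marks another slot as alive.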