The Communication Complexity of Fault-Tolerant Distributed Computation of Aggregate Functions
COM1 Level 2
SR8, COM1-02-08
closeAbstract:
Multi-party communication complexity involves distributed computation of a function over inputs held by multiple distributed players. A key focus of distributed computing research, since the very beginning, has been to tolerate failures. It is thus natural to ask ``{\em If we want to compute a certain function while tolerating a certain number of failures, what will the communication complexity be?}''
This thesis centers on the above question. Specifically, we consider a system of $N$ nodes which are connected by edges and then form some topology. Each node holds an input and the goal is for a special {\em root} node to learn some certain function over all inputs.
All nodes in the system except the root node may experience {\em crash} failures, with the total number of edges incidental to failed nodes being upper bounded by $f$. This thesis makes the following contributions:
1) We prove that there exists an {\em exponential gap} between the non-fault-tolerant and fault-tolerant communication complexity of {\sc Sum};
2) We prove near-optimal lower and upper bounds on the fault-tolerant communication complexity of general {\em commutative and associative aggregates} (such as {\sc Sum});
3) We introduce a new two-party problem {\sc UnionSizeCP} which comes with a novel {\em cycle promise}. Such a problem is the key enabler of our lower bounds on the fault-tolerant communication complexity of {\sc Sum}.
We further prove that this cycle promise and {\sc UnionSizeCP} likely play a fundamental role in reasoning about fault-tolerant communication complexity of many functions beyond {\sc Sum}.