  * **Robust Distributed Machine Learning**: With the proliferation of big datasets and models, Machine Learning is becoming distributed. Following the standard parameter server model, the learning phase is carried out by two categories of machines: parameter servers and workers. Any of these machines could behave arbitrarily (i.e., be Byzantine), affecting the convergence of the model during the learning phase. Our goal in this project is to build a system that is robust against Byzantine behavior of both parameter servers and workers. Our first prototype, AggregaThor (https://www.sysml.cc/doc/2019/54.pdf), describes the first scalable robust Machine Learning framework. It fixed a severe vulnerability in TensorFlow and showed how to make TensorFlow even faster while remaining robust. Contact [[https://people.epfl.ch/arsany.guirguis|Arsany Guirguis]] or [[https://people.epfl.ch/sebastien.rouault|Sébastien Rouault]] for more information.
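For concreteness, below is a minimal sketch of one classical Byzantine-resilient aggregation rule, the coordinate-wise median. It is an illustration only, not necessarily the rule AggregaThor uses (see the paper above), and the toy worker gradients are made up.

<code python>
import numpy as np

def coordinate_wise_median(gradients):
    """Aggregate worker gradients with a per-coordinate median.

    Unlike plain averaging, the median of each coordinate cannot be
    dragged arbitrarily far by a minority of Byzantine workers.
    """
    return np.median(np.stack(gradients), axis=0)

# Toy run: 4 honest workers around the true gradient [1, 1], 1 attacker.
honest = [np.array([0.9, 1.1]), np.array([1.0, 1.0]),
          np.array([1.1, 0.9]), np.array([1.0, 1.05])]
byzantine = [np.array([1e6, -1e6])]
print(np.mean(np.stack(honest + byzantine), axis=0))  # average is hijacked
print(coordinate_wise_median(honest + byzantine))     # median stays near [1, 1]
</code>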
  
  * **Stochastic gradient: (artificial) reduction of the variance/norm ratio for adversarial distributed SGD**: One computationally efficient and non-intrusive line of defense for adversarial distributed SGD (e.g., one parameter server distributing the gradient estimation across several, possibly adversarial, workers) relies on the honest workers sending back gradient estimates with sufficiently low variance, an assumption that is sometimes hard to satisfy in practice.
 One solution could be to (drastically) increase the batch size at the workers, but doing so may well defeat the very purpose of distributing the computation.
 In this project, we propose two approaches that you can choose to explore (you may also propose a different one) to (artificially) reduce the variance/norm ratio of the stochastic gradients while keeping the benefits of distribution.
 The first, speculative approach boils down to "intelligent" coordinate selection.
 The second makes use of some kind of "momentum" at the workers. Illustrative sketches of both approaches are given after the references below.
 [1] "Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent" (https://papers.nips.cc/paper/6617-machine-learning-with-adversaries-byzantine-tolerant-gradient-descent)
 [2] "Federated Learning: Strategies for Improving Communication Efficiency" (https://arxiv.org/abs/1610.05492)
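The project text deliberately leaves the "intelligent" coordinate selection open. As one possible starting point, here is a sketch of simple top-k magnitude selection, in the spirit of the sparsification techniques surveyed in [2]; the function name and the choice of k are illustrative assumptions, not part of the project statement.

<code python>
import numpy as np

def top_k_coordinates(gradient, k):
    """Keep only the k largest-magnitude coordinates of the gradient.

    Dropping small, noise-dominated coordinates is one way to raise the
    norm of the transmitted estimate relative to its variance.
    """
    sparse = np.zeros_like(gradient)
    idx = np.argpartition(np.abs(gradient), -k)[-k:]  # indices of top-k entries
    sparse[idx] = gradient[idx]
    return sparse

g = np.array([0.01, -2.0, 0.03, 1.5, -0.02])
print(top_k_coordinates(g, k=2))  # -> [ 0.  -2.   0.   1.5  0. ]
</code>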
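Similarly, the "momentum" variant is not pinned down in the description. A minimal sketch of one natural reading, a worker that transmits an exponential moving average of its stochastic gradients instead of the raw gradients, follows; the class, the decay parameter beta, and the toy noise model are assumptions for illustration.

<code python>
import numpy as np

class MomentumWorker:
    """Worker sending an exponential moving average of its gradients.

    Averaging over past stochastic gradients lowers their variance while,
    for a slowly-changing true gradient, roughly preserving their norm,
    which reduces the variance/norm ratio seen by the server.
    """
    def __init__(self, dim, beta=0.9):
        self.beta = beta
        self.momentum = np.zeros(dim)

    def estimate(self, stochastic_gradient):
        self.momentum = (self.beta * self.momentum
                         + (1.0 - self.beta) * stochastic_gradient)
        return self.momentum

# Toy run: noisy estimates of a constant true gradient of ones,
# with unit per-coordinate noise variance.
rng = np.random.default_rng(0)
worker = MomentumWorker(dim=10)
sent = [worker.estimate(np.ones(10) + rng.normal(0.0, 1.0, 10))
        for _ in range(500)]
# Per-coordinate variance of what is sent, after warm-up: roughly
# (1 - beta) / (1 + beta) of the raw variance, i.e. about 0.05 here.
print(np.var(np.stack(sent[100:]), axis=0).mean())
</code>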