  * **[[Distributed ML|Distributed Machine Learning]]**: contact [[http://people.epfl.ch/georgios.damaskinos|Georgios Damaskinos]] for more information.
  
  * **Robust Distributed Machine Learning**: With the proliferation of big datasets and models, Machine Learning is becoming distributed. Following the standard parameter-server model, the learning phase is carried out by two categories of machines: parameter servers and workers. Any of these machines could behave arbitrarily (i.e., be Byzantine), affecting the convergence of the model during learning. Our goal in this project is to build a system that is robust against Byzantine behavior of both parameter servers and workers (a toy sketch of robust aggregation follows this list). Our first prototype, AggregaThor (https://www.sysml.cc/doc/2019/54.pdf), describes the first scalable robust Machine Learning framework. It fixed a severe vulnerability in TensorFlow and showed how to make TensorFlow even faster while remaining robust. The practical work will be done on TensorFlow or PyTorch. Contact [[https://people.epfl.ch/arsany.guirguis|Arsany Guirguis]] or [[https://people.epfl.ch/sebastien.rouault|Sébastien Rouault]] for more information.
  
  * **Consistency in global-scale storage systems**: We offer several projects in the context of storage systems, ranging from implementation of social applications (similar to [[http://retwis.redis.io/|Retwis]] or [[https://github.com/share/sharejs|ShareJS]]) to recommender systems, static content storage services (à la [[https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Beaver.pdf|Facebook's Haystack]]), or experimenting with well-known cloud serving benchmarks (such as [[https://github.com/brianfrankcooper/YCSB|YCSB]]); please contact [[http://people.epfl.ch/dragos-adrian.seredinschi|Adrian Seredinschi]] or [[https://people.epfl.ch/karolos.antoniadis|Karolos Antoniadis]] for further information.
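
The sketch below illustrates the core idea behind the robust distributed ML project: instead of plainly averaging worker gradients, the parameter server can apply a robust aggregation rule such as the coordinate-wise median, so a minority of Byzantine workers cannot drag the update arbitrarily far. This is an illustration only, under assumed names and a deliberately simple rule; it is not AggregaThor's actual implementation.

<code python>
# Toy sketch of Byzantine-robust gradient aggregation at a parameter server.
# Assumption: the coordinate-wise median stands in for the more elaborate
# aggregation rules studied in the actual project.
import numpy as np

def average(gradients):
    # Standard aggregation: a single Byzantine worker can move the result arbitrarily.
    return np.mean(gradients, axis=0)

def coordinate_wise_median(gradients):
    # Robust aggregation: stays within the honest range as long as honest workers form a majority.
    return np.median(gradients, axis=0)

rng = np.random.default_rng(0)
true_gradient = np.array([1.0, -2.0, 0.5])
honest = [true_gradient + rng.normal(0.0, 0.1, size=3) for _ in range(4)]  # 4 honest workers
byzantine = [np.full(3, 1e6)]                                              # 1 Byzantine worker
gradients = np.stack(honest + byzantine)

print("mean  :", average(gradients))                 # dominated by the attacker
print("median:", coordinate_wise_median(gradients))  # close to the true gradient
</code>

In each coordinate, the median lies within the range spanned by the honest gradients whenever a majority of workers is honest, which is the basic property that more refined robust aggregation rules build on.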