Differences
This shows you the differences between two versions of the page.
education [2020/11/24 12:01] fablpd |
education [2020/11/24 12:02] fablpd |
||
---|---|---|---|
Line 42: | Line 42: | ||
* **Robust Distributed Machine Learning**: With the proliferation of big datasets and models, Machine Learning is becoming distributed. In the standard parameter server model, the learning phase is carried out by two categories of machines: parameter servers and workers. Any of these machines could behave arbitrarily (i.e., be Byzantine), which affects model convergence during the learning phase. Our goal in this project is to build a system that is robust against Byzantine behavior of both parameter servers and workers. Our first prototype, [[https://mlsys.org/Conferences/2019/doc/2019/54.pdf|AggregaThor]], is the first scalable, robust Machine Learning framework. It fixed a severe vulnerability in TensorFlow and showed how to make TensorFlow faster while remaining robust. Contact [[https://people.epfl.ch/arsany.guirguis|Arsany Guirguis]] for more information. | * **Robust Distributed Machine Learning**: With the proliferation of big datasets and models, Machine Learning is becoming distributed. In the standard parameter server model, the learning phase is carried out by two categories of machines: parameter servers and workers. Any of these machines could behave arbitrarily (i.e., be Byzantine), which affects model convergence during the learning phase. Our goal in this project is to build a system that is robust against Byzantine behavior of both parameter servers and workers. Our first prototype, [[https://mlsys.org/Conferences/2019/doc/2019/54.pdf|AggregaThor]], is the first scalable, robust Machine Learning framework. It fixed a severe vulnerability in TensorFlow and showed how to make TensorFlow faster while remaining robust. Contact [[https://people.epfl.ch/arsany.guirguis|Arsany Guirguis]] for more information. | ||
+ | |||
* **Consistency in global-scale storage systems**: We offer several projects in the context of storage systems, ranging from implementing social applications (similar to [[http://retwis.redis.io/|Retwis]] or [[https://github.com/share/sharejs|ShareJS]]) to building recommender systems and static content storage services (à la [[https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Beaver.pdf|Facebook's Haystack]]), or experimenting with well-known cloud serving benchmarks (such as [[https://github.com/brianfrankcooper/YCSB|YCSB]]). Please contact [[http://people.epfl.ch/dragos-adrian.seredinschi|Adi Seredinschi]] or [[https://people.epfl.ch/karolos.antoniadis|Karolos Antoniadis]] for further information. | * **Consistency in global-scale storage systems**: We offer several projects in the context of storage systems, ranging from implementing social applications (similar to [[http://retwis.redis.io/|Retwis]] or [[https://github.com/share/sharejs|ShareJS]]) to building recommender systems and static content storage services (à la [[https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Beaver.pdf|Facebook's Haystack]]), or experimenting with well-known cloud serving benchmarks (such as [[https://github.com/brianfrankcooper/YCSB|YCSB]]). Please contact [[http://people.epfl.ch/dragos-adrian.seredinschi|Adi Seredinschi]] or [[https://people.epfl.ch/karolos.antoniadis|Karolos Antoniadis]] for further information. |
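
To make the Byzantine problem in the Robust Distributed Machine Learning project concrete, here is a minimal sketch of why a parameter server that simply averages worker gradients is fragile. It swaps in the coordinate-wise median as an illustrative robust aggregation rule; this is an assumption chosen for brevity, not the rule used by AggregaThor, and the function names and gradient values are hypothetical.

```python
from statistics import mean, median


def aggregate_mean(gradients):
    """Coordinate-wise average of worker gradients (not Byzantine-robust)."""
    return [mean(coord) for coord in zip(*gradients)]


def aggregate_median(gradients):
    """Coordinate-wise median of worker gradients (illustrative robust rule)."""
    return [median(coord) for coord in zip(*gradients)]


# Four honest workers report similar gradients (made-up values).
honest = [[1.0, -2.0], [1.1, -1.9], [0.9, -2.1], [1.0, -2.0]]
# One Byzantine worker reports an arbitrary, very large gradient.
byzantine = [[1e6, 1e6]]
gradients = honest + byzantine

avg = aggregate_mean(gradients)    # dominated by the single Byzantine value
med = aggregate_median(gradients)  # stays close to the honest gradients

print("mean:  ", avg)
print("median:", med)
```

A single arbitrary worker is enough to pull the average far from the honest gradients, while the median stays among them as long as honest workers form a majority. This sketch covers Byzantine workers only; tolerating Byzantine parameter servers, as the project aims to, needs additional mechanisms that are not shown here.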