Software projects developed at DCL

DCL has a GitHub page where most new software projects are published:


Designed for the DSN '21 paper: “Garfield: System Support for Byzantine Machine Learning.”

Garfield is a library that transparently makes machine learning (ML) applications, initially built with popular (but fragile) frameworks such as TensorFlow and PyTorch, Byzantine-resilient. Garfield relies on a novel object-oriented design that reduces the coding effort and addresses the vulnerability of the shared-graph architecture followed by classical ML frameworks. It encompasses various communication patterns and supports computations on CPUs and GPUs, allowing us to address the general question of the practical cost of Byzantine resilience in ML applications. Garfield has been thoroughly evaluated with three main ML architectures: (1) a single server with multiple workers, (2) several servers and workers, and (3) peer-to-peer settings. Using Garfield, we highlight interesting facts about the cost of Byzantine resilience. In particular, (1) Byzantine resilience, unlike crash resilience, induces an accuracy loss, (2) the throughput overhead comes more from communication than from robust aggregation, and (3) tolerating Byzantine servers costs more than tolerating Byzantine workers. The source code of Garfield was evaluated by experts from C4DT@EPFL, and the open-source version has also been used in other projects.



Designed for the Middleware '20 paper: “FeGAN: Scaling Distributed GANs.”

FeGAN is a system for training generative adversarial networks (GANs) in the federated learning setup. FeGAN has a scalable design while also being robust to non-iid data (i.e., it tolerates skewed distributions of data across devices). FeGAN makes three important design choices to achieve its goals: (1) co-locating the discriminator and generator networks on all devices, (2) using balanced sampling, and (3) using KL-weighting. The first decision promotes the scalability of FeGAN and reduces the probability of falling into the vanishing-gradients problem. Balanced sampling prevents FeGAN from falling into the mode-collapse problem, while KL-weighting is designed to resist learning divergence. Unlike existing distributed GAN approaches, FeGAN scales to hundreds of devices. Moreover, FeGAN achieves a 5x throughput gain while using 1.5x less bandwidth compared to its state-of-the-art competitor, MD-GAN. It also speeds up training by 2.6x compared to the celebrated Federated Averaging (FedAvg) algorithm. The source code of FeGAN was evaluated by experts and was awarded ACM badges for being functional and reusable.
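The intuition behind KL-weighting can be sketched as follows (an illustrative sketch under our own assumptions, not FeGAN's actual formula or API): devices whose local label distribution diverges from the global distribution, as measured by KL divergence, receive smaller aggregation weights. The exponential weighting below is our illustrative choice.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete label distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def kl_weights(local_label_dists, global_label_dist):
    """One weight per device: near-iid devices get larger weights.

    The exp(-KL) form is illustrative only; FeGAN's exact
    weighting is defined in the Middleware '20 paper.
    """
    kls = np.array([kl_divergence(d, global_label_dist)
                    for d in local_label_dists])
    w = np.exp(-kls)          # penalize divergent devices
    return w / w.sum()        # normalize to a convex combination
```

For example, a device holding a balanced sample of all labels would dominate a device holding 99% of a single label.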



Designed for the MLSYS '19 paper: “AggregaThor: Byzantine Machine Learning via Robust Gradient Aggregation.”

AggregaThor is the first scalable Byzantine-resilient framework for distributed machine learning applications. AggregaThor is built on top of TensorFlow while achieving transparency: applications built with TensorFlow do not need to change their interfaces to be made Byzantine-resilient. AggregaThor uses the parameter server architecture, and it adds two main layers to vanilla TensorFlow: (1) the aggregation layer and (2) the communication layer. The former uses a statistically-robust gradient aggregation rule, called Multi-Krum, to robustly aggregate workers' gradients, ensuring convergence of training even in the presence of malicious workers. The communication layer enables users to experiment with an unreliable transport layer (i.e., UDP), which achieves better performance than vanilla TensorFlow in highly-saturated networks. The source code of AggregaThor was evaluated by experts and was awarded ACM badges for being functional and reusable.
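The Multi-Krum rule mentioned above scores each of the n submitted gradients by the sum of squared distances to its n - f - 2 closest neighbors (f being the assumed bound on Byzantine workers) and averages the m lowest-scoring gradients. A minimal NumPy sketch (function and parameter names are ours, not AggregaThor's API):

```python
import numpy as np

def multi_krum(gradients, f, m):
    """Average the m gradients with the lowest Krum scores.

    gradients: list of 1-D numpy arrays, one per worker
    f: assumed upper bound on the number of Byzantine workers
    m: number of gradients kept (m = 1 is plain Krum)
    """
    n = len(gradients)
    assert n >= 2 * f + 3, "Krum requires n >= 2f + 3 workers"
    # Pairwise squared Euclidean distances between gradients.
    dists = np.array([[np.sum((gi - gj) ** 2) for gj in gradients]
                      for gi in gradients])
    scores = []
    for i in range(n):
        # Sum of distances to the n - f - 2 closest other gradients.
        closest = np.sort(np.delete(dists[i], i))[: n - f - 2]
        scores.append(closest.sum())
    best = np.argsort(scores)[:m]
    return np.mean([gradients[i] for i in best], axis=0)
```

With six honest workers clustered around the true gradient and one outlier, the outlier's score is dominated by its large distances and it is never selected.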



Designed for the PODC '18 paper: “Locking Timestamps Versus Locking Objects.”

The purpose of this library is to showcase the potential practical benefits of Multiversion Timestamp Locking (MVTL), a new family of concurrency control algorithms that operate at the granularity of individual points of logical time, instead of considering entire objects or versions of objects. Our library provides implementations of a key-value store supporting distributed transactions. We provide multiple concurrency control options: (1) MVTIL (multiversion timestamp interval locking), a protocol based on MVTL, (2) MVTO (multiversion timestamp ordering), and (3) 2PL (two-phase locking). The library also contains scripts that allow the deployment and testing of the various implementations on Amazon EC2.
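The core idea, locking intervals of logical time on an object rather than the whole object, can be illustrated with a toy interval lock (a hypothetical sketch under our own assumptions, not the library's actual API): two transactions conflict only when their timestamp intervals on the same object overlap.

```python
class TimestampIntervalLock:
    """Toy per-object lock over intervals of logical time.

    Illustrates the MVTL granularity only; the real MVTIL protocol
    in the library is considerably more involved.
    """

    def __init__(self):
        self.locked = []  # list of (lo, hi, owner), hi inclusive

    def try_lock(self, lo, hi, owner):
        """Grab [lo, hi] if it overlaps no currently held interval."""
        for a, b, _ in self.locked:
            if lo <= b and a <= hi:  # interval overlap test
                return False
        self.locked.append((lo, hi, owner))
        return True

    def unlock(self, owner):
        """Release every interval held by this owner."""
        self.locked = [iv for iv in self.locked if iv[2] != owner]
```

Note that a transaction locking timestamps [1, 5] does not block another transaction operating at timestamps [6, 9] on the same object, which whole-object locking would.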



Designed for the ATC '18 paper: “Log-Free Concurrent Data Structures.”

libNVRAM is a suite of libraries aimed at durable and concurrent data structures, in particular in the context of upcoming byte-addressable non-volatile memory technologies. The suite contains nv-structs, a set of lock-free data structures designed for non-volatile RAM; the link cache, a volatile buffer designed to improve data structure performance; nv-epochs, a durable memory manager; and nv-jemalloc, a version of jemalloc enhanced to simulate NVRAM latencies and write-backs. Additionally, libNVRAM includes nv-memcached, a durable implementation of Memcached.

Details & Code


Designed for the EuroSys '17 paper: “FloDB: Unlocking Memory in Persistent Key-Value Stores.”

FloDB is an LSM memory-component architecture that allows throughput to scale on modern multicore machines with ample memory. The main idea underlying FloDB is essentially to bootstrap the traditional LSM architecture by adding a small in-memory buffer layer on top of the memory component. This buffer offers low-latency operations, masking the write latency of the sorted memory component. Integrating this buffer into the classic LSM memory component to obtain FloDB is not trivial and requires revisiting the algorithms of the user-facing LSM operations (search, update, scan). FloDB's two layers can be implemented with state-of-the-art, highly-concurrent data structures. This way, as we show in the paper, FloDB eliminates significant synchronization bottlenecks in classic LSM designs, while offering a rich LSM API.
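The two-layer idea can be sketched as follows (a sequential toy under our own assumptions, not FloDB's concurrent implementation): writes land in a small unsorted buffer and are drained in batches into the sorted component, while reads consult the buffer first.

```python
import bisect

class TwoLayerMemTable:
    """Toy two-layer LSM memory component in the spirit of FloDB.

    Top layer: small unsorted dict absorbing writes at low latency.
    Bottom layer: sorted keys + value map (FloDB uses a concurrent
    skip list). Real FloDB drains concurrently and its scan
    algorithm is subtler than the eager drain used here.
    """

    def __init__(self, buffer_capacity=4):
        self.buffer = {}
        self.capacity = buffer_capacity
        self.sorted_keys = []
        self.sorted_vals = {}

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.capacity:
            self._drain()

    def get(self, key):
        # The buffer holds the freshest updates, so check it first.
        if key in self.buffer:
            return self.buffer[key]
        return self.sorted_vals.get(key)

    def _drain(self):
        # Batch-move buffered entries into the sorted component.
        for k, v in self.buffer.items():
            if k not in self.sorted_vals:
                bisect.insort(self.sorted_keys, k)
            self.sorted_vals[k] = v
        self.buffer.clear()

    def scan(self, lo, hi):
        # Scans must see both layers; draining first is the easy way.
        self._drain()
        i = bisect.bisect_left(self.sorted_keys, lo)
        j = bisect.bisect_right(self.sorted_keys, hi)
        return [(k, self.sorted_vals[k]) for k in self.sorted_keys[i:j]]
```

The buffer masks the insertion cost of the sorted layer exactly as the paragraph above describes, at the price of more complex search and scan paths.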

Details & Code


Designed for the EuroSys '17 paper: “Abstracting Multi-Core Topologies with MCTOP.”

MCTOP is an abstraction of multi-core topologies augmented with important low-level hardware information, such as memory bandwidths and communication latencies. We automatically generate MCTOP using libmctop, our library that leverages the determinism of cache-coherence protocols to infer the topology of multi-cores using only latency measurements. MCTOP enables developers to accurately define performance policies, expressing high-level semantics that utilize the low-level performance details of multi-cores. This way, MCTOP enables the design of easy, portable, and efficient optimizations.
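The inference step can be illustrated with a toy version (our own sketch, not the libmctop algorithm): given a matrix of pairwise core-to-core latencies, cores whose mutual latency falls below a threshold are grouped into the same level of the hierarchy, e.g., the same socket.

```python
def infer_groups(latency, threshold):
    """Group cores whose pairwise latency is below a threshold.

    latency: n x n matrix of core-to-core latencies (e.g., cycles)
    Returns a group id per core. This is a simplified stand-in for
    libmctop's inference, which derives the full multi-level
    topology from latency measurements alone.
    """
    n = len(latency)
    parent = list(range(n))

    def find(x):
        # Union-find with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if latency[i][j] <= threshold:
                parent[find(i)] = find(j)  # merge low-latency pairs
    return [find(i) for i in range(n)]
```

On a hypothetical two-socket machine where intra-socket latency is about 40 cycles and cross-socket latency about 120, a threshold between the two cleanly separates the sockets.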

Details & Code


Designed for the Middleware '16 paper: “Locking Made Easy.”

GLS is a middleware that makes lock-based programming simple and effective. GLS offers the classic lock-unlock interface of locks. However, in contrast to classic lock libraries, GLS does not require any effort from the programmer for allocating and initializing locks, nor for selecting the appropriate locking strategy. With GLS, all these intricacies of locking are hidden from the programmer. GLS is based on GLK, a generic lock algorithm that dynamically adapts to the contention level on the lock object. GLK is able to deliver the best performance among simple spinlocks, scalable queue-based locks, and blocking locks. Furthermore, GLS offers several debugging options for easily detecting various lock-related issues, such as deadlocks.
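The "no allocation, no initialization" idea can be sketched as follows (an illustrative sketch with names of our own choosing, not the GLS API): the middleware keeps an internal table mapping protected objects to lazily created locks, so the programmer only ever says lock(object) and unlock(object).

```python
import threading

class ObjectLockTable:
    """Toy sketch of GLS-style locking: locks are created on demand,
    keyed by the object being protected.

    Keying by id() is only safe here because the table keeps no
    reference to the object; a real implementation (in C) keys by
    the memory address of the protected data. GLK's adaptive
    spin/queue/block strategy is not modeled; a plain mutex
    stands in for it.
    """

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def _lock_for(self, obj):
        key = id(obj)
        with self._guard:
            if key not in self._locks:
                self._locks[key] = threading.Lock()  # lazy init
            return self._locks[key]

    def lock(self, obj):
        self._lock_for(obj).acquire()

    def unlock(self, obj):
        self._lock_for(obj).release()
```

The programmer-facing win is that a lock never has to be declared next to the data it protects; the table supplies one the first time the data is locked.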

Details & Code


Designed for the ATC '16 paper: “Unlocking Energy.”

LOCKIN is a new locking library that includes more than 10 state-of-the-art lock algorithms. For simplicity, LOCKIN offers these algorithms as C header files, so that selecting the algorithm to use can be as simple as setting one configuration flag. LOCKIN can also be set to overload the default pthread mutex lock interface, in order to allow for easily modifying locks in pthread-based systems. Most importantly, LOCKIN includes MUTEXEE, our optimized version of the pthread mutex algorithm. MUTEXEE delivers significant performance and energy-efficiency benefits over mutex.

Details & Code


Designed for the PPoPP '16 paper: “Optimistic concurrency with OPTIK.”

OPTIK is a new practical design pattern for designing and implementing fast and scalable concurrent data structures. OPTIK relies on the commonly-used technique of version numbers for detecting conflicting concurrent operations. We implement the OPTIK pattern using the novel concept of OPTIK locks. We publish the code for OPTIK locks and several data structure implementations using OPTIK in ASCYLIB (see below). Using OPTIK, we have designed one array-map, two linked lists, three hash tables that use the array-map and the linked lists, a skip list, and a binary search tree.
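The pattern behind OPTIK locks can be sketched as follows (a Python toy under our own naming, not the published C implementation): the lock word itself is the version number, so an operation can traverse optimistically, then atomically validate that the version is unchanged and acquire the lock in a single step.

```python
import threading

class VersionLock:
    """Toy OPTIK-style lock: the lock word doubles as a version.

    Even value = unlocked, odd = locked; every release bumps the
    version, so a stale optimistic read is detected at lock time.
    A mutex stands in for the compare-and-swap used in the real
    (lock-word-based) implementation.
    """

    def __init__(self):
        self._version = 0
        self._cas_guard = threading.Lock()

    def read_version(self):
        # Called at the start of the optimistic phase.
        return self._version

    def try_lock_at(self, observed):
        """Validate-and-lock: succeed only if nothing changed since
        `observed` was read and the lock is currently free."""
        with self._cas_guard:
            if self._version != observed or observed % 2 == 1:
                return False
            self._version += 1  # odd: locked
            return True

    def unlock(self):
        with self._cas_guard:
            self._version += 1  # even again: unlocked, new version
```

A concurrent update between the optimistic read and try_lock_at bumps the version, so validation fails and the operation simply restarts, which is exactly the conflict detection the pattern relies on.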

Details & Code


Designed for the PPoPP '16 paper: “ESTIMA: Extrapolating ScalabiliTy of In-Memory Applications.”

ESTIMA is an easy-to-use tool for extrapolating the scalability of in-memory applications. It is designed to perform a simple yet important task: given an application on a small machine with a handful of cores, it extrapolates its performance to a larger machine with more cores. ESTIMA uses stalled cycles, both from hardware and software, reported by the application and runtime libraries. It automates the measurement and extrapolation process, requiring minimal input from the user.

Details & Code


Designed for the ASPLOS '15 paper: “Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures.”

ASCYLIB is a concurrent-search data-structure (CSDS) library. It contains over 30 implementations of linked lists, hash tables, skip lists, and binary search trees (BST). ASCYLIB contains sequential, lock-based, and lock-free implementations for each data structure. ASCYLIB works on x86, SPARC, and Tilera architectures and contains tests to evaluate the throughput, latency, latency distribution, and energy efficiency of the included data structures.

Details & Code


Designed for the Middleware '14 paper: “Consensus Inside.”

Scaling to a large number of cores with non-uniform communication latency and unpredictable response time may call for viewing a modern many-core architecture as a distributed system. In this view, the cores replicate shared data and ensure consistency among replicas through a message-passing based agreement protocol. We perform an in-depth study of message-passing agreement on many-cores, with a particular focus on the possibility of such a protocol being non-blocking. We identify a number of optimizations that are specific to the many-core environment and introduce 1Paxos, a new non-blocking agreement protocol that takes up the challenges of this environment.



Designed for the SOSP '13 paper: “Everything You Always Wanted to Know About Synchronization but Were Afraid to Ask.”

SSYNC is a cross-platform synchronization suite; it works on x86_64, SPARC, and Tilera processors. SSYNC contains libslock, a library that abstracts lock algorithms behind a common interface, and libssmp, a library with fine-tuned implementations of message passing for each of the supported platforms. SSYNC also includes microbenchmarks for measuring the latencies of cache coherence, locks, and message passing, as well as ssht, a cache-efficient hash table.

Details & Code


Designed for the EuroSys '12 paper: “TM2C: a Software Transactional Memory for Many-Cores.”

TM2C is the first software transactional memory protocol for many-core systems. TM2C exploits network-on-chip communication to grant access to shared data through efficient message passing. In particular, it allows visible read accesses and hence effective distributed contention management with eager conflict detection. TM2C comes with FairCM, a companion contention manager that ensures starvation-freedom, i.e., the eventual termination of every transaction. TM2C has been ported to Intel's SCC, i386, x86_64, SPARC, and Tilera processors.

Details & Code

STMs and STM benchmarks

LPD has designed various state-of-the-art STM systems (SwissTM, eSTM) and STM benchmarks (STMBench7, Synchrobench, LeeTM).

Details & Code