Differences

This shows you the differences between two versions of the page.

education [2021/10/08 12:52] fablpd
education [2022/02/25 11:02] fablpd
Line 56: Line 56:
 \\ \\
  
-  * **GANs with Transformers**: Since their introduction in 2017, Transformer architectures have revolutionized NLP machine learning models. Thanks to the scalability of self-attention-only architectures, models can now scale to trillions of parameters, allowing human-like capacities of text generation. However, they are not without their own shortcomings, notably due to their max-likelihood training mode over data that contains potentially undesirable statistical associations. An alternative approach to generative learning - Generative Adversarial Networks (GANs) - performs remarkably well when it comes to images, but has until recently struggled with text, due to its sequential and discrete nature, which is not compatible with the gradient back-propagation GANs need to train. Some of those issues have been solved, but a major one remains: scalability, due to the use of RNNs instead of pure self-attention architectures. Previously, we were able to show that it is impossible to trivially replace RNN layers with Transformer layers (https://arxiv.org/abs/2108.12275, presented at RANLP 2021). This project will build on those results and attempt to create stable Transformer-based text GANs using the tricks known to stabilize Transformer training, or attempt to theoretically demonstrate the inherent instability of Transformer-derived architectures in the adversarial regime. You will need solid background knowledge of linear algebra and acquaintance with the theory of machine learning, specifically neural networks, as well as experience with scientific computing in Python, ideally with PyTorch. Experience with NLP is desirable, but not required.
+  * **Hijacking proof-of-work to make it useful: a distributed gradient-free learning approach**: Proof-of-work blockchains - notably Bitcoin and Ethereum - reach a probabilistic consensus about the contents of the blockchain through a mechanism of probabilistic leader election. Every contributor to the consensus tries to solve a puzzle, and the first one to succeed is elected leader, allowed to create the next block and publicly add information to it. The puzzle needs to be hard to solve and easy to verify, solvable only by random guessing with no shortcuts, and its difficulty must be tunable so that nodes don't find answers simultaneously and fork the chain in two by taking different leaderships. Partial cryptographic hash reversal has traditionally been a perfect candidate for such a puzzle, but it has no interest outside of being a challenge for the blockchain. And with 100-300 PetaFLOP/s (drawing 100 TWh/y) of general-purpose computational power tied into the Ethereum blockchain alone as of early 2022, the waste of computational resources and energy is colossal. While the interest of blockchains and the suitability of proof-of-work as a mechanism to run them are widely debated, it remains to this day the mechanism behind the two largest ones. We try to make at least some of that work useful by injecting a "try" step of a (1,λ)-ES evolutionary search algorithm into the hash computation loop, slowing it down and making it do something useful during the slowdown period. This class of evolutionary search algorithms achieves good performance on black-box optimization tasks (sometimes exceeding RL approaches on traditionally RL problems), is embarrassingly parallel, fits the requirements for a proof-of-work function well, and can be empirically optimized to minimize the waste of computational resources during a training run (see the sketch after this description).
+However, in its current state, the (1,λ)-ES-based useful proof-of-work has only been shown to work in cases where the data used for the training tasks can be fully replicated among the nodes. For numerous applications, this is not an option. Finding ways to solve that problem, from both a theoretical and an experimental perspective, will be the goal of this project.
+You will need solid skills in Python (Rust and WebAssembly are a plus) and a basic understanding of distributed algorithms and of machine learning concepts. Some familiarity with blockchains and black-box optimization is a plus, but not a requirement. Contact Andrei Kucharavy (andrei.kucharavy@epfl.ch) for more information.
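
To make the mechanism above concrete, here is a minimal, self-contained Python sketch of injecting a (1,λ)-ES "try" step into a hash computation loop. It is an illustration under simplifying assumptions, not the project's implementation: the function names (`fitness`, `es_try`, `mine_block`) and the toy sphere objective are made up for the example, and the commitment and verification of the ES results inside blocks are omitted.

<code python>
# Toy sketch of useful proof-of-work: one (1,lambda)-ES "try" step is
# interleaved with every hash attempt. All names here are illustrative.
import hashlib
import struct

import numpy as np


def fitness(theta: np.ndarray) -> float:
    """Toy black-box objective (negated sphere function). A real run
    would score e.g. a model on training data replicated among nodes."""
    return -float(np.sum(theta ** 2))


def es_try(parent: np.ndarray, sigma: float, rng: np.random.Generator):
    """One 'try' step of a (1,lambda)-ES: sample a single offspring of
    the parent and score it. Each try is independent of the others,
    which is what makes the scheme embarrassingly parallel."""
    candidate = parent + sigma * rng.standard_normal(parent.shape)
    return candidate, fitness(candidate)


def mine_block(header: bytes, difficulty_bits: int, parent: np.ndarray,
               sigma: float = 0.1, lam: int = 64):
    """Toy mining loop: the injected ES try slows hashing down and does
    useful optimization work during the slowdown period. Once lam
    offspring are scored, comma selection replaces the parent."""
    target = 1 << (256 - difficulty_bits)
    rng = np.random.default_rng()
    offspring = []
    nonce = 0
    while True:
        # Useful work: one (1,lambda)-ES try per hash attempt.
        offspring.append(es_try(parent, sigma, rng))
        if len(offspring) == lam:
            # Comma selection: the parent is discarded unconditionally.
            parent = max(offspring, key=lambda pair: pair[1])[0]
            offspring.clear()
        # Ordinary proof-of-work step: partial hash reversal.
        digest = hashlib.sha256(header + struct.pack("<Q", nonce)).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce, parent  # block found; best parameters so far
        nonce += 1


# Example use: low difficulty so the toy loop terminates quickly.
nonce, theta = mine_block(b"block header", difficulty_bits=16,
                          parent=np.zeros(8))
</code>

In this toy version every node could replicate the data behind `fitness`; the open problem the project targets is precisely what to do when that replication is not an option.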