Programming Replicated, Distributed Systems

Concurrency makes everything hard

For many programmers, writing efficient concurrent datastructures is an exercise in scouring github and hoping that code quality and number of “stars” really do correlate. When it comes time to write your own concurrent datastructures, queue many sleepless nights spent worrying about whether this or that particular concurrency primitive is fast enough, or correct enough, to make the code work at the speeds you need. If there is any solace felt, it comes from the once-safe assumption of strong consistency; that, no matter how many races we might have missed, at least the result of the program will be some valid interleaving of concurrent threads.

This assumption is wrong.

Because hardware isn’t as helpful as it once was. Modern processor and language memory models reduce the available consistency assumptions so much that it’s possible to read values which were never written, justify your own reads, and arbitrarily miss updates from concurrent threads. Simple patterns, like initializing an object and then handing it off directly to a newly-spawned thread, are simply not guaranteed to work.

And distribution makes it harder.

In a distributed setting, the costs of using stronger consistency models grow exponentially with the number of replicas in your system (essential for fault-tolerance) and the distance between these replicas (essential for global performance and availability). While there are a lot of good people doing excellent work to try and bring strongly-consistent shared data to a geo-distributed setting, the fundamentals are still not in our favor. And, far more urgently, major parts of the web have already moved to weaker consistency models.

We’re going to fix that. Or at least help.

This project aims to bring the power of programming languages to bear on the world of weakly-consistent distributed programming. Our secret sauce is information flow, a key technology borrowed from the security literature and applied with new life here. Using information flow, it’s possible to ensure that weakly-consistent observations can never unduly influence strongly-consistent computations; it makes the points of crossover explicit, statically requiring that programmers upgrade the consistency of their observations when appropriate.

This technique is powerful; for more details, see our paper from PLDI 2018.

And we’re not done yet.

Our new project Gallifrey continues this legacy with a new programming language for the distributed world. Ready more at its project page!

Avatar
Matthew Milano
PhD Candidate, Programming Languages and Systems

Posts

Presenting MixT, our language for sanely mixing distinct consistency levels within a single transaction. Now in Tech Report form!

CONTINUE READING

Publications

Programming efficient distributed, concurrent systems requires new abstractions that go beyond traditional sequential programming. But programmers already have trouble getting sequential code right, so simplicity is essential. The core problem is that low-latency, high-availability access to data requires replication of mutable state. Keeping replicas fully consistent is expensive, so the question is how to expose asynchronously replicated objects to programmers in a way that allows them to reason simply about their code. We propose an answer to this question in our ongoing work designing a new language, Gallifrey, which provides orthogonal replication through restrictions with merge strategies, contingencies for conflicts arising from concurrency, and branches, a novel concurrency control construct inspired by version control, to contain provisional behavior.
3rd Summit on Advances in Programming Languages (SNAPL 2019), 2019

Programming concurrent, distributed systems is hard—especially when these systems mutate shared, persistent state replicated at geographic scale. To enable high availability and scalability, a new class of weakly consistent data stores has become popular. However, some data needs strong consistency. To manipulate both weakly and strongly consistent data in a single transaction, we introduce a new abstraction: mixed-consistency transactions, embodied in a new embedded language, MixT. Programmers explicitly associate consistency models with remote storage sites; each atomic, isolated transaction can access a mixture of data with different consistency models. Compile-time information-flow checking, applied to consistency models, ensures that these models are mixed safely and enables the compiler to automatically partition transactions. New run-time mechanisms ensure that consistency models can also be mixed safely, even when the data used by a transaction resides on separate, mutually unaware stores. Performance measurements show that despite their stronger guarantees, mixed-consistency transactions retain much of the speed of weak consistency, significantly outperforming traditional serializable transactions.
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

Talks

Presenting MixT, a domain-specific programming language for mixed-consistency transactions
Next
Previous