Green threads (user space threads or fibers) are a simple way to write concurrent programs without dealing with the operating system. Rather than forking multiple OS threads or processes, a program can use lightweight green threads—which are scheduled in user space—to manage concurrency in a cooperative (rather than preemptive) way.
Green threads are also a perfect match for evented I/O. Both
event-based I/O and green threads are focused on doing many things at the same
time without incurring the overhead or complexity of OS-based multitasking.
Imagine a Web server that has to deal with multiple requests at the same time.
Most of the server’s time is spent waiting for I/O operations to complete
(reading from a socket, writing to a socket, reading from a file on disk).
Evented I/O lets the server move on with its work while it’s waiting for I/O and
deal with I/O events as they arrive (using
select, for example).
Green threads can help manage the state for
each HTTP connection: one green thread can be assigned to each request. Avoiding
the use of OS threads for this task simplifies the implementation and reduces
the overhead of heavy system calls.
Green threads also resemble coroutines in that both are pieces of code that stop and start at arbitrary, programmer-specified points. Python, as of version 2.5, supports coroutines natively. Bluelet is a Python module that implements green threads, complete with simple evented socket I/O support, using the language’s built-in coroutines. It represents a simple (500-LOC as of this writing), pure-Python, lightweight implementation of green threading that makes it easy to write concurrent socket applications. It’s an alternative to the excellent greenlet library, which is much more powerful but relies on a C extension to the Python interpreter.
In a nutshell, Bluelet lets you write concurrent socket programs that look sequential. And it does it with style.
An Example: The Echo Server
An “echo” server is a canonical stupid example for demonstrating socket programming. It simply accepts connections, reads lines, and writes everything it reads back to the client.
Here’s an example using plain Python sockets:
import socket listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM) listener.bind(('', 4915)) listener.listen(1) while True: sock, addr = listener.accept() while True: data = sock.recv(1024) if not data: break sock.sendall(data)
The code is very simple, but its synchronousness has a major problem: the server can accept only one connection at a time. This won’t do even for very small server applications.
One solution to this problem is to fork several operating system threads or processes that each run the same synchronous code. This, however, quickly becomes complex and makes the application harder to manage. Python’s asyncore module provides a way to write asynchronous servers that accept multiple connections in the same OS thread:
import asyncore import socket class Echoer(asyncore.dispatcher_with_send): def handle_read(self): data = self.recv(1024) self.send(data) class EchoServer(asyncore.dispatcher): def __init__(self): asyncore.dispatcher.__init__(self) self.create_socket(socket.AF_INET, socket.SOCK_STREAM) self.bind(('', 4915)) self.listen(1) def handle_accept(self): sock, addr = self.accept() handler = Echoer(sock) server = EchoServer() asyncore.loop()
Asynchronous or evented I/O lets the thread run a single
select() loop to
handle all connections and send callbacks when events (such as accepts and data
packets) occur. However, the code becomes much more complex: the execution of a
simple echo server gets broken up into smaller methods and the control flow
becomes hard to follow.
Bluelet (like other coroutine-based async I/O libraries) lets you write code that looks sequential but acts concurrent. Like so:
import bluelet def echoer(conn): while True: data = yield conn.recv(1024) if not data: break yield conn.sendall(data) bluelet.run(bluelet.server('', 4915, echoer))
Except for the yield keyword, note that this code appears very similar to our first, sequential version. (Bluelet also takes care of the boilerplate socket setup code.) This works because echoer is a Python coroutine: everywhere it says yield, it temporarily suspends its execution. Bluelet’s scheduler then takes over and waits for events, just like asyncore. When a socket event happens, the coroutine is resumed at the point it yielded. So there’s no need to break up your code; it can all appear as a single code block. Neat!
You can find the echo-server example in Bluelet’s GitHub repository (see the sequential version, the asyncore version, a straightforward Bluelet version, and a more complicated version that shows more of Bluelet’s inner workings). I’ve written a couple more examples that illustrate how Bluelet can be used to accomplish useful things (at least more useful than echoing).
httpd.py is a simple HTTP server that scales up the pattern we used in the echo server above. It shows in more detail how to use Bluelet’s asynchronous socket API.
crawler.py shows how Bluelet can be used to write concurrent network clients (as opposed to servers). The program just fetches tweets from Twitter’s REST API for five users. The file shows the same task implemented in four ways: sequentially, with OS threads, with OS processes, and with Bluelet. It even does a performance comparison of the various approaches, which shows that, for network-bound workloads like this one, green threads are about as fast as OS-level multitasking.
Bluelet’s Programming Model
Bluelet’s README has more detail on its API. Bluelet includes just a
few expressions meant to be used with Python’s
yield expression in
coroutines—together, these are enough to add concurrency to a sequential
socket program (and maybe more!).
Give Bluelet a try and let me know how it works out for you!
Bluelet: “Easy concurrency without all the messy parallelism.”