WebSockets in Python for Practitioners

May 2022

Finding a Python package for WebSocket connectivity often turns out to be harder than expected. Since we like to make challenging things a bit easier for the people around us, we have put together a practitioner's guide to using WebSockets in Python without async.

Avoiding async

But why avoid async? Simply put, because the async paradigm in Python can prove challenging even for people who are familiar with event-loop-based languages. It really is a hard-to-love trick, and once you add the async keyword to your codebase, it is going to spread.

This is by design: it increases performance and enforces rules which eventually make life easier. For server apps and microservices, the resulting performance can be outstanding. It is great for the language as a whole, as long as the async functions are short, fast and block only on input/output.

It turns out that most functions written in Python are not quite like that. A lot of people love the language for its ability to simplify the numerical heavy lifting in fields like image processing, data science and machine learning. That kind of work is by no means input/output bound and, on top of everything, it has a lengthy execution time.
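To make the point concrete, here is a minimal sketch using asyncio from the standard library. The names ticker and heavy are made up for illustration. The heavy coroutine never awaits, so the event loop cannot switch to anything else until it returns, and the ticker is starved in the meantime.

```python
import asyncio

async def ticker(log, n):
    # A well-behaved coroutine: it yields to the event loop on every step.
    for i in range(n):
        log.append(f"tick {i}")
        await asyncio.sleep(0)

async def heavy(log):
    # A CPU-bound coroutine: there is no await inside the computation,
    # so nothing else can run on the loop until it is done.
    log.append("heavy start")
    total = sum(i * i for i in range(200_000))
    log.append("heavy end")
    return total

async def main():
    log = []
    await asyncio.gather(ticker(log, 3), heavy(log))
    return log

if __name__ == "__main__":
    # No tick can appear between "heavy start" and "heavy end".
    print(asyncio.run(main()))
```

With a computation that takes seconds instead of milliseconds, every other connection served by the same loop would stall for just as long.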

The challenge

Consider the following example in which a WebSocket connection does both request/response and state maintenance for an application such as a chat. How would you approach starting to prototype this in Jupyter?

# jupyter cell 1
sock = socket.open( 'endpoint' )
sock.send( 'query ring' )
sock.recv() # N unrelated
sock.recv() # 'response ring + data' 

# update chat messages
while( 'when does this stop?' ): 
    sock.recv() # M unrelated

# jupyter cell 2
# unreachable
sock.send( 'query users' )

One method would be to build a single cell by interleaving the recv calls with the legitimate cell contents. If managing the state that way solves your problem, go for it.

A solution

A solution is to handle both the receives and the sends within a separate thread. Some coroutine (async) code is required to make this work; otherwise the order of the sends and receives would need to be known beforehand.

To isolate the async code from the rest of the implementation, you can use the curio package, which supports running an event loop in a thread and provides a great UniversalQueue object for communication between the async thread and the main, synchronous one. Framing of the data packets and handshaking are performed using wsproto.

In this scenario, the data and events are passed to the client using callbacks. This is not the most 'Pythonic' way of doing things, but definitely not the worst.
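The general shape of this approach can be sketched with the standard library alone. Note the assumptions: this sketch swaps curio for asyncio, replaces the real connection and wsproto framing with an in-process echo, and the names BackgroundLoop and FakeSocket are hypothetical. It only illustrates how an event loop in a side thread can hand data back to synchronous code through a callback.

```python
import asyncio
import threading

class BackgroundLoop:
    """Runs an asyncio event loop in a daemon thread (stand-in for curio)."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self.thread.start()

    def submit(self, coro):
        # Schedule a coroutine from synchronous code; returns a concurrent Future.
        return asyncio.run_coroutine_threadsafe(coro, self.loop)

    def stop(self):
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()

class FakeSocket:
    """Hypothetical socket: echoes every message back through a callback."""

    def __init__(self, onmessage):
        self.onmessage = onmessage
        self._bg = BackgroundLoop()
        # Create the queue on the loop thread so it is bound to that loop.
        self._outbox = self._bg.submit(self._make_queue()).result()
        self._task = self._bg.submit(self._run())

    async def _make_queue(self):
        return asyncio.Queue()

    async def _run(self):
        while True:
            msg = await self._outbox.get()
            if msg is None:            # sentinel: close requested
                break
            await asyncio.sleep(0)     # stand-in for the network round trip
            self.onmessage("echo: " + msg)  # called on the background thread

    def send(self, msg):
        # Safe to call from the main thread; never blocks on the network.
        self._bg.loop.call_soon_threadsafe(self._outbox.put_nowait, msg)

    def close(self):
        self.send(None)
        self._task.result(timeout=5)   # wait for queued messages to be handled
        self._bg.stop()

if __name__ == "__main__":
    received = []
    sock = FakeSocket(received.append)
    sock.send("query ring")
    sock.send("query users")
    sock.close()
    print(received)
```

Because send only enqueues the message, a notebook cell that calls it returns immediately, and a later cell can still send another query while replies keep arriving through the callback.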

Introducing nwebsocket

The 'n' part stands for normal or node-like. This package replicates the API surface of the WebSocket client found in your browser. So, do Python practitioners still have to deal with the async compromises?

A transitional approach

A transitional approach is to build on the magic WebSocket class, which can run callbacks. The first thing you need to do to make your life easier is to isolate the callbacks right away and place the received data into instance variables. This can be done by distinguishing between message types, as seen in the example below:

import json

from nwebsocket import WebSocket

class ChatProtocol( WebSocket ):
    def __init__( self, url ):
        super().__init__( url )

        # store a list of received chat messages
        self.messages = []

    def onmessage( self, m ):
        # if string, parse the JSON and store its 'data' field
        if isinstance( m, str ):
            self.messages.append( json.loads( m )[ 'data' ] )
        # if bytes, reflect the payload back to the server
        else:
            self.send( m )

    def post( self, message ):
        # send a chat message as a JSON string
        self.send( json.dumps( dict( type = 'message', data = message ) ) )

In this scenario you can use any ChatProtocol instance to send messages and read the current messages list from it. The messages property is changed only from inside this class. Any other logic like connection state can also be stored within the class.

Keep track of query responses

If the endpoint supports some form of request/response messaging, keeping track of which response belongs to which request can be tricky. A solution to this is to find a property which acts like a reflect header: whatever information you send in it to the endpoint is reflected back to your client. This way you can determine precisely when the query has finished and what the server response is.
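The bookkeeping for this can be sketched as follows. The QueryTracker class and the 'id' and 'data' field names are assumptions made for illustration; a real endpoint will have its own name for the reflected property. The send argument is any callable that ships a string, for example the send method of a WebSocket instance.

```python
import itertools
import json
import threading

class QueryTracker:
    """Matches responses to requests through a reflected 'id' field (assumed name)."""

    def __init__(self, send):
        self._send = send                 # callable that ships a string to the endpoint
        self._ids = itertools.count(1)
        self._pending = {}
        self._lock = threading.Lock()     # onmessage may run on another thread

    def query(self, payload):
        # Register the query before sending, so a fast reply cannot be missed.
        qid = next(self._ids)
        slot = {"done": threading.Event(), "response": None}
        with self._lock:
            self._pending[qid] = slot
        self._send(json.dumps({"id": qid, "data": payload}))
        return slot

    def onmessage(self, raw):
        # Returns True if the message answered a pending query, False otherwise.
        msg = json.loads(raw)
        with self._lock:
            slot = self._pending.pop(msg.get("id"), None)
        if slot is None:
            return False                  # unrelated message, handle it elsewhere
        slot["response"] = msg.get("data")
        slot["done"].set()
        return True
```

The caller can then block on slot["done"].wait(timeout) and read slot["response"], while unrelated messages keep flowing through the normal callback path.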

Is this method faster?

It depends. Roundtrip numbers on localhost seem to sit fairly low under Windows.

Results on Unix systems seem to be at least one order of magnitude faster, sitting at about 3 thousand round-trip queries per second.

Conclusions

In the end, you have to try it for yourself and, for this reason, we'll finish this article with a guideline on when to use async in Python.

Guidelines:

In a perfect world there would be a perfect solution for running hardware-intensive tasks on the event loop but, unfortunately, none has been found yet.

We recommend using async if you:

demand high throughput
have a processing task which is relatively fast
are prepared to handle the learning curve

It is not advised to use it to parallelize hardware-intensive, long-running functions. For this, there are better existing alternatives, such as multiprocessing, or load balancing and auto-scaling.

Otherwise: