19 Oct, 2021

WebSockets: The Powerful Protocol

Just a basic overview on how WebSockets work.

Needs an update.

This piece of writing might be outdated, or might have incorrect information at certain places. I might work on a newer version of this post.

The WebSocket protocol was described in the specification RFC 6455, and …….

Eh, feels like reading straight out of a textbook, right?

Let’s make it a bit more exciting.

What is a WebSocket?

A WebSocket is a persistent connection between a client and server. WebSockets provide a bidirectional, full-duplex communication channel over a single TCP connection, which is set up through an HTTP handshake. At its core, the WebSocket protocol facilitates message passing between a client and server.

Why do we need WebSockets?

Before diving into the need for WebSockets, let’s see what challenges we face while building real-time web apps.

The web was built around the idea that a client would request data from a server, and the server would fulfill those requests. Web applications are growing faster and consume more data than ever (a great talk on web bloat). The traditional HTTP model has drawbacks that make it a poor fit for real-time web apps. For example, in its simplest form (without keep-alive), each request opens and closes a new TCP connection.

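To overcome this drawback, long polling comes into play. In long polling, the client sends an HTTP request with a long timeout period, and the server holds that request open until it has data to send back. Once the client gets a response (or the request times out), it immediately sends a new request.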

While this looks good, it is still a workaround. Long polling is problematic when there is no data available to send back to the client: the server has to hold resources for the whole length of the poll (until the timeout).

Another drawback is that it carries the overhead of HTTP. With every single HTTP request, headers and cookie data are transferred to the server, which in turn increases latency. This would be a bad choice for someone who might want to develop real-time games/web apps, as reducing latency is crucial.

This is where WebSockets come into play. They provide a persistent connection between a client and server, which both parties can use to start sending data at any time.

How the Protocol Works

This article won’t cover how framing is done in WebSockets, and other details about the protocol. Read RFC 6455 for more info.

The client establishes a WebSocket connection through a process known as the WebSocket handshake. This process starts with the client sending a regular HTTP request to the server. An Upgrade header is included in this request that informs the server that the client wishes to establish a WebSocket connection.
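One detail of the handshake that is easy to try out: per RFC 6455, the client also sends a random Sec-WebSocket-Key header, and the server proves it understood the upgrade by replying with a Sec-WebSocket-Accept header derived from that key. The sketch below computes that value; the function name acceptKey is an assumption for this example, while the GUID and the sample key come from the RFC.

```go
package main

import (
	"crypto/sha1"
	"encoding/base64"
	"fmt"
)

// websocketGUID is the fixed string defined in RFC 6455 for the handshake.
const websocketGUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

// acceptKey (hypothetical name) derives the Sec-WebSocket-Accept value
// from the client's Sec-WebSocket-Key: base64(sha1(key + GUID)).
func acceptKey(clientKey string) string {
	sum := sha1.Sum([]byte(clientKey + websocketGUID))
	return base64.StdEncoding.EncodeToString(sum[:])
}

func main() {
	// Sample key from RFC 6455, section 1.3.
	fmt.Println(acceptKey("dGhlIHNhbXBsZSBub25jZQ=="))
	// prints "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
}
```

In practice a WebSocket library does this for you; the snippet is only meant to show that the handshake is ordinary HTTP plus a small hash.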

WebSockets do not use the http:// or https:// scheme. Instead, they use ws: (or wss: for a secure WebSocket). The remainder of the URI is the same as an HTTP URI (host, port, path and query parameters).
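Because the rest of the URI looks like an HTTP URI, Go's net/url package can split a WebSocket URI into its parts. This is a small sketch; the wsParts type, the parseWS function, and the example.com URI are made up for illustration. The default ports (80 for ws, 443 for wss) are the ones defined in RFC 6455.

```go
package main

import (
	"fmt"
	"net/url"
)

// wsParts (hypothetical type) holds the pieces of a WebSocket URI.
type wsParts struct {
	Secure bool
	Host   string
	Port   string
	Path   string
	Query  string
}

// parseWS splits a ws:// or wss:// URI and fills in the default port.
func parseWS(raw string) (wsParts, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return wsParts{}, err
	}
	p := wsParts{Host: u.Hostname(), Port: u.Port(), Path: u.Path, Query: u.RawQuery}
	switch u.Scheme {
	case "ws":
		if p.Port == "" {
			p.Port = "80"
		}
	case "wss":
		p.Secure = true
		if p.Port == "" {
			p.Port = "443"
		}
	default:
		return wsParts{}, fmt.Errorf("not a WebSocket URI: scheme %q", u.Scheme)
	}
	return p, nil
}

func main() {
	p, err := parseWS("wss://example.com/chat?room=1")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", p)
}
```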

Once the handshake is complete, the initial HTTP connection is replaced by a WebSocket connection that uses the same underlying TCP/IP connection.

Now, either party can transfer data. Data is transferred through a WebSocket as messages, each of which consists of one or more frames containing the data being sent.

In order to ensure the message can be properly reconstructed when it reaches the receiver, each frame is prefixed with 2–14 bytes of data about the payload: 2 fixed bytes, up to 8 bytes of extended payload length, and a 4-byte masking key on frames sent by the client.

Here is a preview of a frame as described in RFC 6455:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+
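To see how the diagram maps onto bytes, here is a rough sketch of a header parser in Go. The frameHeader type and the parseHeader and unmask functions are names invented for this example (they are not part of any library), and the sample frame is the masked "Hello" text frame from RFC 6455, section 5.7.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// frameHeader (hypothetical type) holds the fields shown in the diagram.
type frameHeader struct {
	Fin        bool
	Opcode     byte
	Masked     bool
	PayloadLen uint64
	MaskKey    [4]byte
	HeaderLen  int // number of bytes the header itself occupies
}

var errShort = errors.New("not enough bytes for a complete frame header")

// parseHeader decodes the frame header at the start of b.
func parseHeader(b []byte) (frameHeader, error) {
	var h frameHeader
	if len(b) < 2 {
		return h, errShort
	}
	h.Fin = b[0]&0x80 != 0    // first bit of byte 0
	h.Opcode = b[0] & 0x0F    // low 4 bits of byte 0
	h.Masked = b[1]&0x80 != 0 // first bit of byte 1
	h.PayloadLen = uint64(b[1] & 0x7F)
	n := 2

	switch h.PayloadLen {
	case 126: // real length is in the next 2 bytes
		if len(b) < n+2 {
			return h, errShort
		}
		h.PayloadLen = uint64(binary.BigEndian.Uint16(b[n:]))
		n += 2
	case 127: // real length is in the next 8 bytes
		if len(b) < n+8 {
			return h, errShort
		}
		h.PayloadLen = binary.BigEndian.Uint64(b[n:])
		n += 8
	}
	if h.Masked {
		if len(b) < n+4 {
			return h, errShort
		}
		copy(h.MaskKey[:], b[n:n+4])
		n += 4
	}
	h.HeaderLen = n
	return h, nil
}

// unmask XORs the payload with the 4-byte masking key, in place.
func unmask(payload []byte, key [4]byte) {
	for i := range payload {
		payload[i] ^= key[i%4]
	}
}

func main() {
	frame := []byte{0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58}
	h, err := parseHeader(frame)
	if err != nil {
		panic(err)
	}
	payload := append([]byte(nil), frame[h.HeaderLen:]...)
	unmask(payload, h.MaskKey)
	fmt.Printf("fin=%v opcode=%d len=%d payload=%q\n", h.Fin, h.Opcode, h.PayloadLen, payload)
}
```

The sketch skips the reserved bits and fragmentation, so treat it as a reading aid for the diagram rather than a complete implementation.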

To close a WebSocket connection, one side sends a close frame and the other side replies with a close frame of its own. Once both parties have sent and received a close frame, the TCP connection is torn down. In the normal case, the server is the one that closes the TCP connection first.
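For the curious, a close frame is just a regular frame with opcode 0x8. According to RFC 6455, its payload may carry a 2-byte status code (1000 means normal closure) followed by a short reason. The sketch below builds an unmasked close frame, as a server would send it; the closeFrame function is a name invented for this example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// closeFrame (hypothetical helper) builds an unmasked close frame:
// FIN=1, opcode 0x8, then a big-endian status code and optional reason.
func closeFrame(code uint16, reason string) ([]byte, error) {
	payload := make([]byte, 2+len(reason))
	binary.BigEndian.PutUint16(payload, code)
	copy(payload[2:], reason)

	// Control frames may carry at most 125 bytes of payload.
	if len(payload) > 125 {
		return nil, errors.New("close reason too long for a control frame")
	}
	header := []byte{0x88, byte(len(payload))}
	return append(header, payload...), nil
}

func main() {
	frame, err := closeFrame(1000, "bye")
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", frame)
}
```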

A Simple Demo

We’ll be using Go for writing the application, and the golang.org/x/net/websocket package for WebSocket connections.

Plain TCP (and an HTTP body on top of it) is a byte stream, whereas WebSockets are frame-based. We need to prepare a block of data and send it as a set of frames. The websocket package provides Message to achieve this. Message has two methods, Send and Receive, which take the WebSocket connection as their first parameter.

Let’s start with a simple example, where the server sends 5 messages to the client, and the client echoes each received message back to the server.

Here is the server code:

// server.go

package main

import (
	"fmt"
	"net/http"

	"golang.org/x/net/websocket"
)

func SendMessages(ws *websocket.Conn) {
	// send 5 messages
	for i := 0; i < 5; i++ {
		message := "New event!"

		// send message
		fmt.Println("Sending message to client: " + message)
		err := websocket.Message.Send(ws, message)
		if err != nil {
			fmt.Println("Could not send message: " + err.Error())
			break
		}

		// receive response
		var response string
		err = websocket.Message.Receive(ws, &response)
		if err != nil {
			fmt.Println("Could not receive response: " + err.Error())
			break
		}

		fmt.Println("Received response from client: " + response)
	}
}

func main() {
	http.Handle("/", websocket.Handler(SendMessages))
	err := http.ListenAndServe(":8080", nil)
	if err != nil {
		panic(err)
	}
}

Here is the code for the client:

// client.go

package main

import (
	"fmt"
	"io"
	"os"

	"golang.org/x/net/websocket"
)

func main() {
	// handle invalid usage
	if len(os.Args) != 2 {
		fmt.Println("Usage: ", os.Args[0], "ws://host:port")
		os.Exit(1)
	}

	// get url
	url := os.Args[1]

	// initiate connection
	conn, err := websocket.Dial(url, "", "http://localhost:8080")
	if err != nil {
		panic(err)
	}

	// receive messages
	var message string
	for {
		err := websocket.Message.Receive(conn, &message)
		if err != nil {
			// io.EOF is returned once the server has closed the connection
			if err == io.EOF {
				// graceful shutdown
				break
			}

			fmt.Println("Could not receive message: " + err.Error())
			break
		}

		fmt.Println("Received message from server: " + message)

		// send message
		err = websocket.Message.Send(conn, message)
		if err != nil {
			fmt.Println("Could not return message: " + err.Error())
			break
		}
	}

	os.Exit(0)
}

Let’s run the app:

WebSockets Preview — Connecting a client

Notice how the server responds when another client connects:

Web-based Demo:

In this demo, we’ll build a server that serves an HTML page; the page opens a WebSocket to the same server and displays the values it receives. The server fetches random values using the random.org API.

Let’s write the code for the server:

// server.go

package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"

	"golang.org/x/net/websocket"
)

func GetValue(ws *websocket.Conn) {
	for {
		// get random number
		resp, err := http.Get("https://www.random.org/integers/?num=1&min=1&max=100&col=1&base=10&format=plain&rnd=new")
		if err != nil {
			panic(err)
		}

		val, err := io.ReadAll(resp.Body)
		// close the body on every iteration so connections are not leaked
		resp.Body.Close()
		if err != nil {
			panic(err)
		}

		// send message
		message := strings.TrimSpace(string(val))
		fmt.Println("Sending " + message + " as message to WebSocket")
		err = websocket.Message.Send(ws, message)
		if err != nil {
			panic(err)
		}

		// sleep for 2 seconds
		time.Sleep(2 * time.Second)

		// receive response
		var response string
		err = websocket.Message.Receive(ws, &response)
		if err != nil {
			panic(err)
		}
		fmt.Println("Received back from client: " + response)
	}
}

func main() {
	// NOTE: replace it with your own path
	var ROOT_DIR = "/home/user/path"

	server := http.FileServer(http.Dir(ROOT_DIR))
	http.Handle("/num", websocket.Handler(GetValue))
	http.Handle("/", server)

	err := http.ListenAndServe(":8080", nil)
	if err != nil {
		panic(err)
	}
}

Here’s the HTML code:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/aura.css/aura.css"/>
    <title>Websocket Demo</title>

    <script type = "text/javascript">
        function WebSocketTest(){
            if("WebSocket" in window){
                alert("WebSocket is supported!");
                var ws = new WebSocket("ws://localhost:8080/num");

                ws.onopen = function(){
                    alert("WebSocket is opened!");
                }

                ws.onmessage = function(e){
                    var msg = e.data;
                    document.getElementById("display").textContent = msg;
                    ws.send("Message received.");
                }

                ws.onclose = function(){
                    alert("WS is closed.");
                }
            } else {
                alert("WebSocket is not supported!")
            }
        }
    </script>
</head>
<body>
    <header>
        <h1 style = "color:mediumseagreen"><u>WebSocket Test</u></h1>
        <h1 id = "display">None</h1>
        <a href="javascript:WebSocketTest()">Run WebSocket Test.</a>
    </header>
</body>
</html>

A WebSocket object has four events:

open: Connection established
message: Data received
error: WebSocket error
close: Connection closed

We use the browser’s WebSocket API to handle these events. The page above registers handlers for open, message, and close; an onerror handler can be added in the same way.

Now, we’ll run our server using go run server.go.

On opening http://localhost:8080/websocket.html, we should see our app running. On inspecting the Network tab, we can see that the num WebSocket is running:

Let’s see the terminal for client messages:

The web app also alerts when the connection is open/closed.

Can I use WebSockets?

WebSockets are supported in almost all modern web browsers. For more info on browser support, check out: Can I use Web Sockets?

Final Thoughts

WebSockets are powerful: the ability to open bidirectional, low-latency connections creates many opportunities for building real-time web applications. Some apps already use WebSockets, including Swiggy, and many other real-time apps rely on them as well.

After reading this article, I hope you’re now more interested in WebSockets! 😁