Async Rust Isn't Bad: You Are

There have been quite a few articles in the past year or so about the downsides of using Rust and introducing the async keyword into your code base.

It generally boils down to two things:

  • async is invasive. The path of least resistance is to make your entire code base async, rather than just the parts that actually need to be.
  • This "function coloring" is at odds with the stdlib, resulting in 3rd party crates for nearly everything.

The problem escalates when you get to crates: the crates.io ecosystem is a disaster. You either use Boost: Rust Edition, or take your chances on smol and find out weeks later that a lib you want to use, like several others, leaks tokio types into its public API, and now you face the New Rust Dilemma: Rewrite It In Another Runtime?

How Did We Get Here?

I'm putting my money on the never-ending Next Hotness mindset of the npm install app world making its way into Rust and cargo, combined with bullshit programmer propaganda.

From Are We Web Yet?:

"There are some awesome Rust and WebAssembly projects out there. For example, Yew and Seed let you create front-end web apps with Rust in a way that feels almost like React.js."

Uhm, why? Is that really what is holding you back from making a website? You'd really love to, but can't possibly type <html> in an editor because the Blazingly Fast™ purist in you would die inside?

When Rust 1.0 finally launched, it seemed like every other week there was some HN post about not having async, and that this was what was holding it back. Rust needed to be on the web, and to be on the web, you had to be async. Of course, all these comments and pressure came from a crowd that will never actually build something with more than 12 users, or anything more substantial than what is needed to fill up a Medium blog post with slick code snippets and Blazingly Fast™ Marketing Propaganda.

And now we're here, where having the async word on your repo is some shiny badge of honor instead of what it should be: a warning that the problem is so complex, or the hardware so constrained, that async was the only way out. Instead, if you look around Rust GitHub repos, it quickly becomes apparent that it is more important to have your program made with async than the actual real shit that matters: what is the resource usage like? How much CPU does this take (yes, I know hardware is far enough along and Electron has made people ignore RAM, but these are the things that actually matter)? How many concurrent users can I expect from my machine? How many write ops/sec can I expect? Not that your shit is async.

You Don't Need It

Rust didn't ship with async, yet it was perfectly workable for asynchronous programming even then. Sure, back then you had to import libc, get the file descriptor, and mark it O_NONBLOCK, but that ended when Rust added set_nonblocking to TcpStream in v1.9.0. You wanna know what else didn't ship with async? The thing all your bullshit async websocket servers run on: Linux. Last time I checked, Linux, and the POSIX world as a whole, seems to be doing just fine running the entire internet without it. signal, timerfd, epoll, and kqueue exist. Guess what? That's all tokio and these runtimes are doing. They can't magically put something to sleep. The kernel is the one keeping your CPU free when your read isn't doing anything, not tokio. What else isn't async? I dunno, maybe the thing most of the internet runs on top of Linux: Java 8. Shocker, I know. Reality is a bitch sometimes. Just remember, if async really was the answer server side, Node.js would have already won. Last time I checked, if you are using Node.js as your backend, you aren't actually doing anything at scale, or Blazingly Fast™.

After a few years, Rust released async functions. The wait was over. Instant confusion: no one seemed to grasp what they were actually trying to do. What is this? I can't just put async on my function and have it be fast now? I need an executor? What is that? How do you write an execu-oh wait: tokio! It was then that the Rust Standard Library: Async Edition was born. The crates.io ecosystem has since been split, and anything related to network i/o is async. Want to know how badly tokio has actually spread into the wider Rust ecosystem? Of course you do - crates.io provides a reverse dependents search where you can view a crate, then view all other crates that depend on it. Super neat feature, try it with tokio. At the time of writing this post, crates.io throws a 500. I would guess because OOM? Timeout? The point is: it is absolutely absurd how invasive tokio has become in the wider Rust ecosystem.

If you are starting your project with async and not writing your own executor: stop. You don't need it. If you are wondering whether you need async for your application: you don't. Those who need async already know they need it, why, and how they are going to build up their own specific executor for their needs. The real secret is that the majority of these types of applications should be on systems without threads, the tiny embedded world. That's who really needs this. Not you on an infinite-core virtual machine who doesn't even know how your code gets to the CPU to execute and probably can't figure it out without docker and some 3rd party service to deploy it there.

Frameworks are a system, Neo. Those frameworks are your enemy.

Proof is in the Pudding

I've talked a bunch of shit, now I ought to back it up with something, otherwise I'll just be an old man yelling at the cloud and the young kids with blue hair. Let's do some asynchronous network programming with the quintessential Webscale™ setup: a websocket server.

We have two programs; they each do the same thing:

  • TLS handshake
  • Websocket handshake
  • Once connected, clients send 10 KB of JSON every 1 sec.
  • Server reads in data, deserializes the JSON into its struct repr.

All experiments are on a Raspberry Pi 4. Server A is using the stdlib, no async keyword. Server B is using tokio + async keyword.

The Pudding

The client bench example is in the webscale project. Cert generation script is also included.

| Item            | Webscale | Webscale Tokio |
|-----------------|----------|----------------|
| Compile time    | 5m 43s   | 6m 31s         |
| Binary size     | 13 MB    | 14 MB          |
| Runtime memory* | 4 MB     | 5.2 MB         |
| Max clients**   | ~25k     | ~25k           |

No practical difference. They are the same. Outside of realizing you can get the same "performance" without the async keyword, can we take a minute and salute the hardware world and Linux a bit? This little $40 computer powered by a USB wall wart is capable of handling more concurrent users than you'll ever see on your lambda-docker-serverless-cloud-edge-compute app.

*Runtime memory is from heaptrack: 100 clients sending 10 KB every 1 sec for 5 mins.

**I gave up putting in work to try and get the rpi to not kill the program after around 25k concurrent connections. If someone wants to figure it out, feel free to send a PR.

Start Being Good: Linux + epoll

Learn about your system. Stop being a framework developer. Before long, Rust will be headed down the path of the JS world: Now Hiring: Tokio Developer. Tokio and Node.js are built around an event loop. With tokio, it is backed by mio. With Node, it is backed by libuv. All they do is wrap up what the system provides for an event loop: epoll. Sure, there is kqueue for the BSDs and IOCP on Windows, but be honest: are you really gonna go deploy something on a server that is running macOS?

Take this event loop skeleton and modify it for your needs. Maybe you have giant writes instead of mainly reads and you need to also listen for when the write buffer is available? Same epoll, different flags.

// NOTE: assumes parking_lot::Mutex (lock() returns the guard directly,
// no .unwrap()) and the `epoll` crate.
fn event_loop(epoll_fd: RawFd) {
    fn contains_close_event(e: epoll::Events) -> bool {
        (e & (epoll::Events::EPOLLERR
            | epoll::Events::EPOLLHUP
            | epoll::Events::EPOLLRDHUP))
            .bits()
            > 0
    }

    fn contains_read_event(e: epoll::Events) -> bool {
        (e & epoll::Events::EPOLLIN).bits() > 0
    }

    let pool = threadpool::ThreadPool::new(10);
    let mut scratch: [epoll::Event; 10] = unsafe { mem::zeroed() };
    loop {
        let nevents = match epoll::wait(epoll_fd, -1, &mut scratch) {
            Ok(amt) => amt,
            Err(e) => {
                error!("epoll wait: {e}");
                return;
            }
        };

        let mut process = Vec::<(
            Arc<Mutex<tungstenite::WebSocket<Connection>>>,
            Vec<tungstenite::Message>,
        )>::new();

        for event in &scratch[0..nevents] {
            let flags = epoll::Events::from_bits_retain(event.events);
            let conn =
                event.data as *const Mutex<tungstenite::WebSocket<Connection>>;
            let conn = unsafe { Arc::from_raw(conn) };
            if contains_close_event(flags) {
                close_connection(conn);
                continue;
            } else if contains_read_event(flags) {
                let mut buf = Vec::<tungstenite::Message>::new();
                let mut error = false;
                loop {
                    let result = { conn.lock().read() };
                    match result {
                        Ok(msg) => buf.push(msg),
                        Err(e) => match e {
                            tungstenite::Error::Io(e) => {
                                if e.kind() == io::ErrorKind::WouldBlock {
                                    break;
                                }

                                error!("recv: {e}");
                                error = true;
                                break;
                            }
                            e => {
                                error!("recv: {e}");
                                error = true;
                                break;
                            }
                        },
                    }
                }

                if error {
                    close_connection(conn);
                    continue;
                }

                process.push((conn.clone(), buf));
                let _forget = Arc::into_raw(conn);
            }
        }

        for (conn, msgs) in process {
            pool.execute(move || process_msgs(conn, msgs));
        }
    }
}

fn close_connection(conn: Arc<Mutex<tungstenite::WebSocket<Connection>>>) {
    let mut ws = conn.lock();
    let _ = ws.get_mut().shutdown();
}

fn process_msgs(
    conn: Arc<Mutex<tungstenite::WebSocket<Connection>>>,
    msgs: Vec<tungstenite::Message>,
) {
    for msg in msgs {
        todo!("look ma, no async keyword here")
    }
}

When you take Rust and start using the actual system (Rust is a systems language after all), you will also grow your programming knowledge bank. You gotta do some unsafe shit in this world. Be warned though, do not fall victim to the Zero Unsafe Rust™ charlatans' badge of honor either. Rust has unsafe for a reason, don't be scared to use it. Instead, learn why and how to use it, so you know when it is worth it and when it's not. Everything is a tradeoff. Programming is just tradeoffs, all the way down.

Once you understand how to do this stuff at the OS level, you can take that knowledge to any language or framework later. That is actually important shit to learn. Be an engineer. When you are a framework developer, you can only solve problems the framework solves for you. Once you know how systems work, you can engineer a solution for yourself.