Go Concurrency [2/2]: Work Stealing, the Netpoller, and Async Preemption

Part 2 of 2: this post builds on the GMP model from Part 1.

The GMP model explains the structure of Go’s scheduler. This post explains how it stays efficient under real-world load, by way of three mechanisms that run silently behind every Go program:

Work stealing – balances uneven workloads across processors
The Netpoller – handles network I/O without blocking OS threads
Async preemption – stops CPU-heavy goroutines from starving others

Once you understand these three, the scheduler stops feeling magical and starts feeling inevitable.

1. Work Stealing: No Idle Processors

Every P maintains its own Local Run Queue (LRQ), a fixed-size ring buffer of up to 256 goroutines. When a goroutine becomes runnable, Go places it into the LRQ of the current P rather than a shared global queue. This matters for performance: local queues require no global lock, produce less cache contention, and exhibit better CPU locality.
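A minimal sketch of such a fixed-size ring buffer, assuming a single thread and plain strings in place of goroutines. The `runQueue` type and its methods are hypothetical names for illustration; the real runtime queue is lock-free and updates its head and tail indices atomically.

```go
package main

import "fmt"

const queueSize = 256 // capacity of a local run queue, as described above

// runQueue is a toy, single-threaded ring buffer (hypothetical type).
// head and tail only ever increase; slots are addressed modulo queueSize.
type runQueue struct {
	head, tail uint32
	buf        [queueSize]string
}

// put appends g at the tail and reports false when the queue is full.
func (q *runQueue) put(g string) bool {
	if q.tail-q.head == queueSize {
		return false
	}
	q.buf[q.tail%queueSize] = g
	q.tail++
	return true
}

// get removes and returns the entry at the head, if there is one.
func (q *runQueue) get() (string, bool) {
	if q.head == q.tail {
		return "", false
	}
	g := q.buf[q.head%queueSize]
	q.head++
	return g, true
}

func main() {
	var q runQueue
	accepted := 0
	for i := 0; i < 300; i++ {
		if q.put(fmt.Sprintf("g%d", i)) {
			accepted++
		}
	}
	first, _ := q.get()
	fmt.Println("accepted:", accepted, "first out:", first)
}
```

The toy simply rejects a goroutine when the buffer is full. The runtime does not drop work in that situation; it moves a batch of goroutines from the full local queue to the global queue instead.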

The problem local queues create is uneven load. One P can drain its queue while another is backed up with hundreds of runnable goroutines. Go’s answer is work stealing: an idle P hunts for work rather than parking.

When a P’s LRQ is empty, it searches in this order:

Its own LRQ
The Global Run Queue (GRQ) – goroutines not yet assigned to any P
The Netpoller – goroutines whose network I/O just completed
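Another busy P’s LRQ – stealing half of the goroutines queued there

The steal-half detail is deliberate. Taking half the victim’s queue in one batch evens out the two Ps in a single operation, so the thief does not have to come back for more straight away. In the current runtime the batch is claimed from the head of the victim’s ring buffer with an atomic compare-and-swap, so neither P takes a lock, and victims are tried in a randomised order so that idle Ps don’t all converge on the same one.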
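The search order above can be sketched as a single-threaded toy model. Everything in it is an assumption made for illustration, not the runtime’s code: the `Sched`, `P` and `G` types, the `ready` list standing in for the Netpoller, and plain slices in place of ring buffers. The steal takes half of the victim’s queue, rounded up, and takes it from the front of the slice purely for simplicity.

```go
package main

import "fmt"

// G stands in for a goroutine; only its name matters here.
type G string

// P is a toy processor with a local run queue.
type P struct {
	lrq []G
}

// Sched is a toy scheduler: a global queue, the goroutines a poller
// has made runnable, and every P.
type Sched struct {
	grq   []G
	ready []G // hypothetical stand-in for the Netpoller's ready list
	ps    []*P
}

// pop removes and returns the first element of q.
func pop(q *[]G) (G, bool) {
	if len(*q) == 0 {
		return "", false
	}
	g := (*q)[0]
	*q = (*q)[1:]
	return g, true
}

// findRunnable follows the search order from the text and reports
// where the goroutine came from.
func (s *Sched) findRunnable(p *P) (G, string) {
	if g, ok := pop(&p.lrq); ok {
		return g, "local"
	}
	if g, ok := pop(&s.grq); ok {
		return g, "global"
	}
	if g, ok := pop(&s.ready); ok {
		return g, "netpoll"
	}
	for _, victim := range s.ps {
		if victim == p || len(victim.lrq) == 0 {
			continue
		}
		n := len(victim.lrq) - len(victim.lrq)/2 // half, rounded up
		stolen := victim.lrq[:n]
		victim.lrq = victim.lrq[n:]
		// Run the first stolen goroutine now, queue the rest locally.
		p.lrq = append(p.lrq, stolen[1:]...)
		return stolen[0], "stolen"
	}
	return "", "idle"
}

func main() {
	idle := &P{}
	busy := &P{lrq: []G{"g1", "g2", "g3", "g4"}}
	s := &Sched{grq: []G{"g-global"}, ready: []G{"g-net"}, ps: []*P{idle, busy}}

	for i := 0; i < 5; i++ {
		g, from := s.findRunnable(idle)
		fmt.Printf("%-8s from %s\n", g, from)
	}
}
```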

[Figure: Go work stealing]

The 61-tick rule: preventing GRQ starvation

If a P’s LRQ never ran dry, goroutines sitting in the GRQ could wait indefinitely, because the GRQ is only consulted once the local queue is empty. To prevent this, runtime/proc.go has a hard-coded rule: every 61st scheduling tick, a P checks the GRQ before its own LRQ.

The number 61 is prime. The usual explanation is that a prime interval is less likely to fall into step with periodic patterns in the program, while an interval this long keeps Ps from hammering the lock-protected shared queue. Small detail, large consequence.
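To see why the rule matters, here is a toy simulation of one P whose local queue never empties because its two goroutines keep re-queuing themselves. The function name, the string queues and the tick counter starting at 1 are all assumptions of the sketch; only the interval of 61 comes from the text.

```go
package main

import "fmt"

const globalCheckInterval = 61 // the interval described above

// ticksUntilGlobalRuns returns the scheduling tick at which a goroutine
// waiting in the GRQ first gets to run, or -1 if it never runs within
// maxTicks. With useRule false, the P only ever serves its own LRQ.
func ticksUntilGlobalRuns(useRule bool, maxTicks int) int {
	lrq := []string{"a", "b"} // these two keep re-queuing themselves
	grq := []string{"waiting"}

	for tick := 1; tick <= maxTicks; tick++ {
		if useRule && tick%globalCheckInterval == 0 && len(grq) > 0 {
			return tick // the GRQ goroutine gets the P on this tick
		}
		// Run the head of the LRQ; it immediately becomes runnable again.
		g := lrq[0]
		lrq = append(lrq[1:], g)
	}
	return -1
}

func main() {
	fmt.Println("with the rule:   ", ticksUntilGlobalRuns(true, 1000))
	fmt.Println("without the rule:", ticksUntilGlobalRuns(false, 1000))
}
```

With the rule, the waiting goroutine runs on tick 61. Without it, the function returns -1: the local queue is never empty, so the global queue is never consulted.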

2. The Netpoller: Network I/O Without Blocking Threads

In Part 1 we covered blocking syscalls: file I/O blocks the M, so a P-handoff is required, and another M picks up the P and keeps running. The OS thread is genuinely blocked for the duration.

Network I/O is handled completely differently, and understanding why is one of the more valuable things you can take from this series.

Modern operating systems expose scalable I/O notification APIs – epoll on Linux, kqueue on macOS and the BSDs, IOCP on Windows. Instead of blocking a thread until data arrives, you register interest in a file descriptor and the OS tells you when it’s ready. (IOCP reports completed operations rather than readiness, but the effect for Go code is the same.) Go’s Netpoller wraps these OS primitives behind a single internal interface.

Think of it like waiting for food delivery. One option: stand frozen at the door until the doorbell rings. The other: go about your day, and let the doorbell tell you when it’s time. The first option is what a thread-per-connection design does with network I/O: the OS thread literally stops and waits. Go uses the second. The OS is the doorbell.

Here’s what actually happens when your goroutine calls conn.Read():

Phase 1: Register and park. The socket was put into non-blocking mode and registered with the OS poller when the connection was created. If the read finds no data ready, the runtime records the goroutine as waiting on that socket and parks it: the goroutine is removed from the run queue entirely. It’s not scheduled anywhere. It’s suspended.

Phase 2: M keeps going. The OS thread that was running your goroutine is now free. It immediately picks up the next G from its P’s local run queue. From the OS’s perspective the thread never stopped; it just switched tasks. The thread is not waiting. Only the goroutine is.

Phase 3: OS signals readiness. Data arrives. The next time the runtime polls, the OS reports the socket as readable. The Netpoller marks the goroutine runnable and places it back on a run queue.

Phase 4: Any P resumes it. Whichever processor is free next picks up that goroutine and runs it, not necessarily on the same M that started it. conn.Read() returns data to your code as if it had been a simple blocking call all along.

The distinction that follows from this explains a huge amount of Go’s network performance:

Operation	What blocks	What Go does
File I/O	The M (OS thread)	P-handoff. Another M takes the P
Network I/O	The G (goroutine)	Park G only. M keeps running

File I/O lands in a different row because the kernel doesn’t offer a readiness notification for disk reads the same way it does for sockets. The OS thread genuinely blocks inside the kernel, which is why the P-handoff is necessary: the blocked M has to detach its P so another M can keep running other goroutines. With network I/O, the M never blocks, so no handoff is needed.

That’s why Go servers can sustain massive concurrency on relatively few OS threads. A Go HTTP server handling 50,000 simultaneous connections doesn’t need 50,000 OS threads waiting on sockets. It needs enough threads to keep the CPUs busy doing actual compute work. A goroutine parked on network I/O uses no thread and no CPU time, only its own stack, which starts at a few kilobytes.
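A rough way to observe this, assuming loopback TCP is available and the process may open a few hundred file descriptors. `parkReaders` is a hypothetical helper: it blocks one goroutine in Read per connection and then asks the standard `threadcreate` profile how many OS threads the process has created. The exact number varies by machine and Go version, so treat the output as indicative only.

```go
package main

import (
	"fmt"
	"net"
	"runtime/pprof"
	"sync"
	"time"
)

// parkReaders opens n loopback TCP connections, blocks one goroutine in
// Read on each, and returns the number of OS threads created so far.
func parkReaders(n int) (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()

	var wg sync.WaitGroup
	var conns []net.Conn
	defer func() {
		// Closing the connections makes every pending Read return.
		for _, c := range conns {
			c.Close()
		}
		wg.Wait()
	}()

	for i := 0; i < n; i++ {
		client, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			return 0, err
		}
		server, err := ln.Accept()
		if err != nil {
			client.Close()
			return 0, err
		}
		conns = append(conns, client, server)

		wg.Add(1)
		go func() {
			defer wg.Done()
			buf := make([]byte, 1)
			server.Read(buf) // the client never writes: this goroutine parks
		}()
	}

	time.Sleep(100 * time.Millisecond) // give the readers time to park
	return pprof.Lookup("threadcreate").Count(), nil
}

func main() {
	const readers = 300
	threads, err := parkReaders(readers)
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Printf("%d goroutines blocked in Read, %d OS threads created\n", readers, threads)
}
```

If each blocked Read held an OS thread, the thread count would be at least the number of readers. On a typical machine it stays much closer to the number of CPUs.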

The boundary to remember: file I/O blocks the M, network I/O parks the G. Everything downstream of that distinction (thread count, memory use, throughput under load) follows from it.

[Figure: the four phases of the Go Netpoller]

3. Async Preemption: Stopping a Goroutine That Won’t Yield

Before Go 1.14, the scheduler was cooperative. Goroutines yielded at natural pause points: function calls, channel operations, syscalls. For the vast majority of programs this worked fine, because function calls are ubiquitous.

The failure case was a tight compute loop:

func hotLoop() {
    for {
        // heavy computation, no function calls, no yield points
    }
}

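This goroutine would hold its P indefinitely. With a single P, every other goroutine starved. With several Ps, the loop could still stall the whole program, because a stop-the-world garbage collection has to wait for every running goroutine to reach a yield point. The scheduler had no mechanism to interrupt running code; it could only wait for a voluntary yield that was never coming.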

sysmon and SIGURG

Go 1.14 introduced asynchronous preemption, driven by a background thread called sysmon (system monitor). sysmon itself predates 1.14. It runs outside the normal scheduler loop, without a P and without appearing in goroutine counts, and one of its jobs is to watch for goroutines that have been running too long.

sysmon wakes up at intervals of no more than 10ms and checks each P. If the goroutine on a P has been running for more than 10ms without yielding, sysmon sends a SIGURG signal to the M running it (this is the mechanism on Unix-like systems). SIGURG was chosen deliberately: it was historically used for out-of-band TCP data and almost no real application uses it, so intercepting it doesn’t interfere with user code.

On receiving SIGURG, the M:

Pauses the goroutine at its current instruction, provided the runtime can safely stop it there
Saves the full register state
Moves the goroutine to the tail of the GRQ
Runs the scheduler, which picks the next G as usual, starting with its LRQ

[Figure: async preemption]

A compute-heavy goroutine still runs for substantial stretches; up to 10ms uninterrupted is real throughput. But it no longer has the power to freeze an entire processor. The scheduler forcibly rebalances execution.

How It All Fits Together

Mechanism	Problem it solves
P-handoff (Part 1)	Blocking syscalls freeze the M
Work stealing	Uneven load leaves some Ps idle
Netpoller	Network I/O would block OS threads
Async preemption	CPU-bound goroutines starve others

None of this requires anything from you as the programmer. The scheduler is watching load distribution, parking goroutines on I/O, and preempting runaway compute, constantly, transparently, on every goroutine you launch.

Seeing It in Action: `go tool trace`

The best way to make this concrete is go tool trace. Add tracing to your program:

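package main

import (
    "log"
    "os"
    "runtime/trace"
)

func main() {
    // Create the file the trace will be written to.
    f, err := os.Create("trace.out")
    if err != nil {
        log.Fatalf("create trace file: %v", err)
    }
    defer f.Close()

    // Start tracing. Stop flushes the remaining data before main returns.
    if err := trace.Start(f); err != nil {
        log.Fatalf("start trace: %v", err)
    }
    defer trace.Stop()

    // your code here
}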

Then run:

go run main.go
go tool trace trace.out

Once you open the trace viewer, you’ll start recognising patterns:

Goroutines disappearing and reappearing → Netpoller park/wakeup
Goroutines jumping between P lanes → work stealing
Periodic short interruptions during CPU loops → async preemption
Threads detaching from Ps and reattaching → P-handoff
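If you don’t have a program at hand, a small CPU-bound workload is enough to produce something worth looking at. This is a sketch under assumptions: `busy` and `captureTrace` are hypothetical helpers, the iteration count is arbitrary, and the trace is collected in memory before being written to trace.out.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/trace"
	"sync"
)

// busy is a small CPU-bound task with no I/O.
func busy(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// captureTrace runs the given number of worker goroutines under the
// execution tracer and returns the raw trace data.
func captureTrace(workers int) ([]byte, error) {
	var buf bytes.Buffer
	if err := trace.Start(&buf); err != nil {
		return nil, err
	}

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			busy(5_000_000)
		}()
	}
	wg.Wait()

	trace.Stop() // flushes everything into buf
	return buf.Bytes(), nil
}

func main() {
	data, err := captureTrace(8)
	if err != nil {
		fmt.Println("trace failed:", err)
		return
	}
	if err := os.WriteFile("trace.out", data, 0o644); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Printf("wrote %d bytes to trace.out\n", len(data))
}
```

Open the result with go tool trace trace.out and look at how the workers are spread across the P lanes.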

At that point the scheduler stops being abstract. You can watch it make decisions in real time, and every decision will make sense.

Write straightforward Go. The runtime earns its keep.
