Skip to main content

Watcher

Api::watch() simply terminates when the connection drops and does not handle resourceVersion expiration. watcher() layers a state machine on top to provide automatic reconnection, initial list loading, and error recovery as a Stream.

Role of the Watcher

The limitations of Api::watch() can be summarized as follows:

LimitationDescription
Terminates on disconnectMust be manually restarted
No resourceVersion expiration handlingErrors out on 410 Gone responses
No initial listwatch only sees changes "from now on"

watcher() handles all of these automatically:

use kube::runtime::{watcher, WatchStreamExt};

let wc = watcher::Config::default();
let stream = watcher(api, wc)
.default_backoff() // Exponential backoff retry on errors
.applied_objects(); // Convert Event<K> to K stream

State Machine

The watcher's internals are a state machine that transitions through five states.

What each state does:

StateBehaviorAPI Call
EmptyInitial state or reset after error. Branches based on the configured strategy.--
InitPagePaginated LIST call (page_size=500). Emits Event::InitApply(obj) for each page. Fetches the next page using continue_token.LIST ?limit=500&continue=...
InitialWatchStarts a WATCH with sendInitialEvents=true. The server sends existing objects one by one and signals completion with a Bookmark.WATCH ?sendInitialEvents=true
InitListedInitial list complete. Emits Event::InitDone and starts a WATCH with the last resourceVersion.WATCH ?resourceVersion=...
WatchingNormal watch state. Added/Modified become Event::Apply, Deleted becomes Event::Delete. Returns to Empty on 410 Gone or disconnection.-- (maintains existing connection)

Two Initial List Strategies

ListWatch (Default)

The default strategy that works on all Kubernetes versions.

  1. LIST call (page_size=500, same as client-go)
  2. Exhaust all pages using continue_token
  3. Start WATCH with the last resourceVersion
// Default -- uses ListWatch strategy
let wc = watcher::Config::default();
Initial LIST Memory

On large clusters, the entire resource set is loaded page by page. If the object count is high, memory usage can spike significantly during the initial LIST.

StreamingList (K8s 1.27+)

An efficient strategy available from Kubernetes 1.27 onwards.

  1. WATCH + sendInitialEvents=true + resourceVersionMatch=NotOlderThan
  2. Server sends existing objects one by one as Added events
  3. Bookmark signals initial list completion
// Use StreamingList strategy
let wc = watcher::Config::default().streaming_lists();

Unlike LIST, objects are processed one at a time, resulting in better memory efficiency. However, it requires Kubernetes 1.27+ (WatchList feature gate).

Event Abstraction

The watcher transforms Kubernetes WatchEvents into higher-level Events.

pub enum Event<K> {
Init, // Re-list started
InitApply(K), // Each object in the initial list
InitDone, // Initial list complete
Apply(K), // Added/Modified during watch
Delete(K), // Deleted during watch
}

Mapping from Kubernetes WatchEvent:

WatchEventDuring initial listDuring watch
AddedInitApply(K)Apply(K)
Modified--Apply(K)
Deleted--Delete(K)
BookmarkInternal handling (resourceVersion update)Internal handling

The reason Init/InitApply/InitDone exist separately is for the reflector's atomic swap pattern. A new buffer is created at Init, filled with InitApply, and swapped in all at once at InitDone.

Error Recovery and Backoff

All errors from the watcher are treated as retriable.

default_backoff

let stream = watcher(api, wc)
.default_backoff(); // Exponential backoff: 1s -> 2s -> 4s -> ... -> 60s (max)

The backoff resets upon receiving a successful event.

Never use without backoff

Using the watcher without backoff causes the stream to terminate on the first error. This can lead to a production incident where the entire Controller stops.

If an API server outage occurs without backoff, tight-loop retries can cause CPU/memory to spike out of control.

410 Gone Handling

The Kubernetes API server retains resourceVersion history for only about 5 minutes. If the delay exceeds this, it returns a 410 Gone response.

When the watcher receives 410 Gone, it returns to the Empty state and performs a full re-list. Enabling bookmarks causes the resourceVersion to be periodically updated, significantly reducing the frequency of 410 responses.

watcher::Config

let wc = watcher::Config::default()
.labels("app=myapp") // label selector
.fields("status.phase=Running") // field selector
.timeout(300) // watch timeout (server-side)
.page_size(100) // ListWatch page size
.streaming_lists() // StreamingList strategy
.any_semantic() // ListSemantic::Any
.disable_bookmarks(); // Disable bookmarks (on by default)
OptionDefaultDescription
labelsNoneNarrow watch scope with a label selector
fieldsNoneField selector
timeout295sServer-side watch timeout
page_size500ListWatch page size
streaming_lists()offEnable StreamingList strategy
any_semantic()offListSemantic::Any -- allows cached data (faster but possibly stale)
disable_bookmarks()--Disable Bookmark events (on by default; disabling increases 410 frequency)

Things to Know

Watch events are not guaranteed to be delivered

DELETE events can be lost during network disconnections. A re-list will recover the final state, but intermediate events are permanently lost. Use finalizers when deletion detection is critical.

Memory spikes during re-list

When a re-list occurs on a large cluster, old data + JSON buffer + deserialized objects all exist in memory simultaneously. Memory usage can temporarily reach 2-3x the normal level. Using StreamingList can mitigate this.

410 comes quickly without bookmarks

For quiet resources with infrequent changes, the resourceVersion doesn't get updated and expires quickly. Bookmarks periodically update the resourceVersion to prevent 410 Gone. Since bookmarks are enabled by default, you don't need to worry about this unless you explicitly disable them.