Use computer_action when an agent needs to interact with the browser like a user: click, type, scroll, drag, copy, paste, or inspect the screen. Pass one action for a simple operation, or batch several actions into one call to reduce latency.
Include a screenshot as the last action when you need to see the result. screenshot, read_clipboard, and get_mouse_position return data, so whichever of them you include must be the last action in the batch.
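
The ordering rule above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the tool itself: validate_actions and DATA_RETURNING are hypothetical names introduced here, and the check assumes each action is a dictionary with a "type" key, as in the request examples on this page.

```python
# Action types that return data, as listed in the text above.
DATA_RETURNING = {"screenshot", "read_clipboard", "get_mouse_position"}


def validate_actions(actions):
    """Return the batch unchanged, or raise ValueError if it breaks the ordering rule.

    Hypothetical client-side helper; the service may enforce more rules than this.
    """
    # The actions parameter must hold one or more entries.
    if not actions:
        raise ValueError("actions must contain at least one action")
    # Every action except the final one must be a non-data-returning type.
    for index, action in enumerate(actions[:-1]):
        if action["type"] in DATA_RETURNING:
            raise ValueError(
                f"{action['type']} at index {index} must be the last action"
            )
    return actions
```

Running the check before each call surfaces a misplaced screenshot or read_clipboard locally instead of after a round trip.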

Parameters

| Parameter | Description |
| --- | --- |
| session_id | Browser session ID. Required. |
| actions | Ordered list of one or more actions to perform. Required. |

Action types

| Type | Description |
| --- | --- |
| click_mouse | Click at x, y. Supports button, click_type, num_clicks, hold_keys. |
| move_mouse | Move the cursor to x, y. |
| type_text | Type text, with optional inter-key delay. |
| press_key | Press keys (X11 keysym names or combos like Ctrl+t, Return). |
| scroll | Scroll at x, y by delta_x/delta_y (positive = right/down). |
| drag_mouse | Drag along a path of [x, y] points. |
| set_cursor | Show or hide the cursor (hidden). |
| sleep | Wait duration_ms between steps when the page needs time to react. |
| write_clipboard | Write text to the browser session clipboard. |
| read_clipboard | Read the browser session clipboard. Must be the last action if included. |
| screenshot | Capture the page, optionally limited to a region. |
| get_mouse_position | Return the current cursor position. |

Recommended workflow

  1. Start with screenshot so the agent can see the viewport and coordinate space.
  2. Batch related pointer and keyboard actions together.
  3. Add sleep between actions that trigger navigation, animation, or async UI updates.
  4. End with screenshot, read_clipboard, or get_mouse_position only when you need returned data.
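
The workflow above can be sketched as a small payload builder. This is an illustration under assumptions: action and build_request are hypothetical helpers, the nesting of parameters under a key named after the action type mirrors the request examples on this page, and the scroll field names (x, y, delta_x, delta_y) are taken from the action table. The coordinates and the 500 ms pause are arbitrary.

```python
import json


def action(action_type, **params):
    """Build one action entry; parameters nest under a key named after the type."""
    entry = {"type": action_type}
    if params:
        entry[action_type] = params
    return entry


def build_request(session_id, actions):
    """Wrap an ordered list of actions in the request shape used on this page."""
    return {"session_id": session_id, "actions": list(actions)}


# One batch: scroll down, give the page time to react, then capture the result.
request = build_request(
    "browser_abc123",  # placeholder session ID
    [
        action("scroll", x=640, y=400, delta_x=0, delta_y=600),
        action("sleep", duration_ms=500),
        action("screenshot"),
    ],
)

print(json.dumps(request, indent=2))
```

Keeping the scroll, the pause, and the screenshot in one call follows steps 2 through 4: related actions are batched, the sleep covers the async update, and the only data-returning action comes last.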

Search from the page

{
  "session_id": "browser_abc123",
  "actions": [
    { "type": "click_mouse", "click_mouse": { "x": 420, "y": 300 } },
    { "type": "type_text", "type_text": { "text": "kernel browsers" } },
    { "type": "press_key", "press_key": { "keys": ["Return"] } },
    { "type": "sleep", "sleep": { "duration_ms": 1000 } },
    { "type": "screenshot" }
  ]
}

Clipboard example

{
  "session_id": "browser_2vDb5kRmZ4nP8xQ1cA7",
  "actions": [
    { "type": "write_clipboard", "write_clipboard": { "text": "https://kernel.sh/docs" } },
    { "type": "press_key", "press_key": { "keys": ["Ctrl+l"] } },
    { "type": "press_key", "press_key": { "keys": ["Ctrl+v"] } },
    { "type": "press_key", "press_key": { "keys": ["Return"] } },
    { "type": "screenshot" }
  ]
}
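
The reverse direction, reading text out of the page, might look like the sketch below. It assumes that Ctrl+a and Ctrl+c are accepted as key combos in the same way as Ctrl+l and Ctrl+v above, and that the browser's copy lands in the session clipboard that read_clipboard returns; neither is confirmed here. The session ID and the 200 ms pause are placeholders.

```python
import json

# Select everything in the focused element, copy it, then read the clipboard.
# read_clipboard returns data, so it is placed last in the batch.
copy_request = {
    "session_id": "browser_abc123",  # placeholder session ID
    "actions": [
        {"type": "press_key", "press_key": {"keys": ["Ctrl+a"]}},
        {"type": "press_key", "press_key": {"keys": ["Ctrl+c"]}},
        {"type": "sleep", "sleep": {"duration_ms": 200}},
        {"type": "read_clipboard"},
    ],
}

# Serialize to the JSON body that would be passed to computer_action.
payload = json.dumps(copy_request, indent=2)
print(payload)
```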