stemado/oto

OTO

Stop clicking.


OTO ("auto") is browser automation for Claude Code. You describe what you need done on a website and Claude opens a browser, figures out the page, and does it.

No scripts. No selectors. You talk, Claude browses.

Scrape data from any site

Hacker News scraping demo

Export a workflow, then schedule it

Export and schedule demo

Discover the APIs hiding under the UI

Network discovery demo

How It Works

OTO scouts the page structure, then acts — the same way you'd inspect a page in DevTools before clicking anything:

  1. Scout — compact structural overview (~200 tokens, not a raw DOM dump)
  2. Find — search for elements by text, type, or selector
  3. Act — click, type, select, navigate
  4. Scout again — see what changed

Most browser tools for AI take full-page screenshots and have the model interpret pixels. A single Playwright MCP screenshot costs ~124,000 tokens, and the model still has to guess at selectors from what it sees. OTO reads the DOM directly and returns a compact report (~200 tokens) with exact CSS selectors. In the benchmarks below, complete tasks used 97-98% fewer tokens than that screenshot baseline.
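To make the idea of a compact scout concrete, here is a minimal sketch using only Python's standard library. It is not OTO's implementation: the `scout` function, the `ScoutParser` class, and the set of tags treated as interactive are all assumptions chosen for illustration. It shows how a page can be reduced to element counts plus a few exact selectors instead of a raw DOM dump.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical set of tags treated as interactive; not taken from OTO.
INTERACTIVE = {"a", "button", "input", "select", "textarea", "form", "iframe"}


class ScoutParser(HTMLParser):
    """Counts interactive elements and records a selector for each one with an id."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()
        self.selectors = []

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE:
            return
        self.counts[tag] += 1
        element_id = dict(attrs).get("id")
        if element_id:
            self.selectors.append(f"{tag}#{element_id}")


def scout(html: str) -> dict:
    """Return a small structural summary of the page instead of the raw markup."""
    parser = ScoutParser()
    parser.feed(html)
    return {"counts": dict(parser.counts), "selectors": parser.selectors}


PAGE = """
<form id="login">
  <input id="user" type="text">
  <input id="pass" type="password">
  <button id="submit">Sign in</button>
</form>
<a href="/help">Help</a>
"""

report = scout(PAGE)
print(report)
```

The summary grows with the number of interactive elements rather than with the size of the markup, which is the property that keeps a scout report small.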

Landscape Analysis

Point OTO at any website and get a structured report that maps out the product:

/landscape https://example.com --depth standard

OTO browses the site autonomously — navigating pages, extracting vocabulary, capturing screenshots — and produces a four-pillar report:

  1. Language & Terminology — domain nouns, verbs, labels, naming patterns
  2. Product & Feature Profile — what the product does, core features, conventions
  3. Design & UX Context — layout model, visual hierarchy, interaction patterns, annotated screenshots
  4. Behavioral Flows — step-by-step narration of key user workflows

Three depth levels control scope: light (content sites, ~1 min), standard (established products with 1-2 flows), and deep (complex products with full observations and 2-3 flows). Output lands in landscape/{product-name}/ with the report and all screenshots.

Export and Schedule

Walk through a workflow once, conversationally. Then capture it:

/export enrollment

OTO produces a self-contained directory at workflows/enrollment/ with:

  • A standalone Python script using botasaurus-driver with human-like timing
  • A portable workflow JSON for use with any executor
  • requirements.txt and .env.example for credentials
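As a rough illustration of what "portable workflow JSON for use with any executor" could mean in practice, the sketch below replays a list of steps and expands `${ENV_VAR}` references at run time. The JSON schema, the `replay` and `resolve` functions, the handler names, and the `PORTAL_USER` variable are all hypothetical; the actual export format may differ.

```python
import json
from string import Template

# Hypothetical workflow document; the real exported schema is not shown here.
WORKFLOW_JSON = """
{
  "name": "enrollment",
  "steps": [
    {"action": "navigate", "url": "https://example.com/login"},
    {"action": "type", "selector": "#username", "value": "${PORTAL_USER}"},
    {"action": "click", "selector": "#submit"}
  ]
}
"""


def resolve(value: str, env: dict) -> str:
    """Expand ${NAME} references; raises KeyError if a variable is missing."""
    return Template(value).substitute(env)


def replay(workflow: dict, env: dict, handlers: dict) -> None:
    """Dispatch each step to the handler registered for its action."""
    for step in workflow["steps"]:
        params = {k: resolve(v, env) for k, v in step.items() if k != "action"}
        handlers[step["action"]](**params)


# Stand-in handlers that only record what they were asked to do.
log = []
handlers = {
    "navigate": lambda url: log.append(("navigate", url)),
    "type": lambda selector, value: log.append(("type", selector, len(value))),
    "click": lambda selector: log.append(("click", selector)),
}

# A real script would pass os.environ (loaded from .env) instead of a literal dict.
replay(json.loads(WORKFLOW_JSON), {"PORTAL_USER": "alice"}, handlers)
print(log)
```

Keeping the steps as data and the browser calls as handlers is what would let the same file run under different executors.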

Then schedule it:

/schedule enrollment

OTO detects your OS and creates the right scheduled task — Windows Task Scheduler, macOS launchd, or Linux cron.
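The OS detection step can be sketched in a few lines of Python. This is an assumption about the general approach, not OTO's code: `pick_scheduler`, `cron_line`, and the `run.py` file name are hypothetical, and only the cron case is spelled out.

```python
import platform

# Values returned by platform.system() mapped to the schedulers named above.
SCHEDULERS = {
    "Windows": "Windows Task Scheduler",
    "Darwin": "macOS launchd",
    "Linux": "Linux cron",
}


def pick_scheduler(system=None) -> str:
    """Choose a scheduler for the given (or current) operating system."""
    system = system or platform.system()
    try:
        return SCHEDULERS[system]
    except KeyError:
        raise RuntimeError(f"no scheduler mapping for {system!r}") from None


def cron_line(minute: int, hour: int, command: str) -> str:
    """Build a crontab entry that runs the command once a day."""
    return f"{minute} {hour} * * * {command}"


print(pick_scheduler())
# run.py is a placeholder name for the exported script.
print(cron_line(0, 6, "python workflows/enrollment/run.py"))
```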

Credential Safety

fill_secret reads credentials from .env server-side and types them directly into form fields. Claude only sees "chars_typed": 22 — never the actual value. Exported scripts use ${ENV_VAR} references. Authorization and Cookie headers are scrubbed from network logs before they reach the conversation.
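The pattern described here, where the secret is used server-side and only metadata is returned, might look roughly like the following. This is a sketch under stated assumptions rather than OTO's code: `parse_env`, the `type_into_field` callback, `scrub_headers`, the `[REDACTED]` placeholder, and the `PORTAL_PASSWORD` variable are all hypothetical.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def fill_secret(name: str, env: dict, type_into_field) -> dict:
    """Type the secret into the page and return only its length."""
    secret = env[name]
    type_into_field(secret)  # the value goes to the browser, never to the caller
    return {"chars_typed": len(secret)}


SENSITIVE_HEADERS = {"authorization", "cookie"}


def scrub_headers(headers: dict) -> dict:
    """Replace sensitive header values before a request is logged."""
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


env = parse_env('PORTAL_PASSWORD="correct-horse-battery!"\n')
typed = []  # stands in for the browser's form field
result = fill_secret("PORTAL_PASSWORD", env, typed.append)
print(result)
print(scrub_headers({"Authorization": "Bearer abc", "Accept": "text/html"}))
```

The return value is the only thing that would reach the conversation, so the model can confirm the field was filled without ever holding the password.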

2FA Support

OTO's get_2fa_code polls Twilio's SMS API for OTP codes — Claude clicks "Send Code" in the browser, the tool watches for the SMS, extracts the code, and types it in. Requires a Twilio account with an SMS number set as the 2FA recipient.
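The poll-and-extract loop can be sketched without touching the network. In the example below, `fetch_messages` is a hypothetical callable that returns recent SMS bodies (in a real setup it would wrap a call to Twilio's API), and the 4-to-8-digit pattern is an assumption about what OTP codes look like.

```python
import re
import time

# Assumed OTP shape: a standalone run of 4 to 8 digits.
OTP_PATTERN = re.compile(r"(?<!\d)(\d{4,8})(?!\d)")


def extract_code(body: str):
    """Return the first OTP-looking number in an SMS body, or None."""
    match = OTP_PATTERN.search(body)
    return match.group(1) if match else None


def wait_for_code(fetch_messages, timeout=60.0, interval=2.0, sleep=time.sleep):
    """Poll fetch_messages() until a code appears or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        for body in fetch_messages():
            code = extract_code(body)
            if code:
                return code
        if time.monotonic() >= deadline:
            return None
        sleep(interval)


# Fake inbox: empty on the first poll, then the SMS arrives.
batches = iter([[], ["Your verification code is 482913."]])
code = wait_for_code(lambda: next(batches), sleep=lambda seconds: None)
print(code)
```

Injecting `fetch_messages` and `sleep` keeps the loop testable and leaves the choice of SMS provider outside the polling logic.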

Anti-Detection

OTO uses Botasaurus under the hood, which handles browser fingerprinting and detection evasion automatically. Sites that block Selenium and Playwright see a normal browser session.

Comparison

| | OTO | Playwright MCP | Chrome Extension MCP | Selenium / scripts |
|---|---|---|---|---|
| Works on sites you don't control | Yes | Limited (your own app) | Limited (your active session) | Blocked by detection |
| Page discovery | Compact scout (~200 tokens) | Full screenshot (~124,000 tokens) | You provide selectors | You provide selectors |
| Credential safety | Never in conversation | Plaintext in context | Plaintext in context | In your script |
| Anti-detection | Built-in | None | None | None |
| 2FA | Built-in | No | No | You build it |
| Export to script | One command | No | No | You write it |
| Cross-platform scheduling | One command | No | No | You configure it |

Benchmarks

| Task | OTO tokens | Playwright MCP tokens | Reduction | Wall-clock | Success |
|---|---|---|---|---|---|
| Fact lookup (Wikipedia) | ~1,264 | ~124,000 | 98% fewer | 11.0s | 3/3 |
| Form fill + verify (httpbin) | ~3,799 | ~124,000 | 97% fewer | 25.2s | 3/3 |

Claude Sonnet 4.6, 3 runs each, wall-clock = browser time only (excludes model reasoning). Playwright MCP baseline is a single full-page screenshot. Full results: v0.2
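The reduction column follows from the token counts. A small sketch of the arithmetic, using the figures from the table (the function name is just for illustration):

```python
BASELINE_TOKENS = 124_000  # single full-page screenshot, per the table


def reduction_percent(oto_tokens: int, baseline: int = BASELINE_TOKENS) -> float:
    """Percentage of tokens saved relative to the screenshot baseline."""
    return (1 - oto_tokens / baseline) * 100


for task, tokens in [("Fact lookup", 1264), ("Form fill + verify", 3799)]:
    print(f"{task}: {reduction_percent(tokens):.2f}% fewer tokens")
```

The table reports these as approximate whole-number percentages.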

Security

  • Credential isolation: fill_secret reads from .env server-side; passwords never enter the conversation
  • Header redaction: Authorization, Cookie, and API key headers are scrubbed from network logs
  • Export scrubbing: credentials are parameterized as environment variable references
  • URL validation: blocks file://, javascript://, cloud metadata endpoints, and localhost (localhost can be allowed by opting in via OTO_ALLOW_LOCALHOST)
  • Path traversal protection: validates all file paths
  • Invisible character stripping: removes zero-width Unicode that could hide prompt injection
  • Content boundary markers: wraps web-sourced data to distinguish data from instructions
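Two of these checks, URL validation and invisible character stripping, are easy to sketch with the standard library. The code below is an illustration under assumptions, not OTO's implementation: the function names, the specific metadata hosts, and the list of zero-width characters are chosen for the example, and the `allow_localhost` parameter stands in for whatever the OTO_ALLOW_LOCALHOST setting controls.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}
# Well-known cloud metadata endpoints; an assumed, non-exhaustive list.
METADATA_HOSTS = {"169.254.169.254", "metadata.google.internal"}
# Zero-width characters mapped to None so str.translate deletes them.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def is_allowed_url(url: str, allow_localhost: bool = False) -> bool:
    """Reject non-web schemes, metadata endpoints, and (by default) loopback hosts."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    host = (parts.hostname or "").lower()
    if not host or host in METADATA_HOSTS:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        is_loopback = host == "localhost"
    else:
        if ip.is_link_local:
            return False
        is_loopback = ip.is_loopback
    return allow_localhost or not is_loopback


def strip_invisible(text: str) -> str:
    """Remove zero-width characters that could hide injected instructions."""
    return text.translate(ZERO_WIDTH)


for candidate in ["https://example.com", "file:///etc/passwd", "http://localhost:8000"]:
    print(candidate, is_allowed_url(candidate))
print(strip_invisible("ig\u200bnore previous"))
```

A production validator would also need to resolve hostnames before checking addresses, since a public name can point at an internal IP.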

Install

Prerequisites

  1. Claude Code installed and working
  2. Python 3.11+ (python --version)
  3. uv (pip install uv)
  4. Google Chrome
  5. Node.js

Two commands

/plugin marketplace add stemado/oto
/plugin install oto@oto

Restart Claude Code. Done.

Updating

/plugin marketplace update oto
/plugin install oto@oto

Tools

| Tool | What it does |
|---|---|
| launch_session | Open a browser (headed or headless, optional proxy) |
| scout_page_tool | Structural page overview: iframes, shadow DOM, element counts (~200 tokens) |
| find_elements | Search for elements by text, type, or CSS selector |
| execute_action_tool | Click, type, select, navigate, scroll, hover, wait |
| fill_secret | Type credentials from .env without exposing them in conversation |
| get_2fa_code | Retrieve a 2FA OTP code from Twilio SMS |
| execute_javascript | Run arbitrary JS in the page context |
| take_screenshot | Capture the page for Claude to see |
| inspect_element | Deep-inspect visibility, overlays, shadow DOM, ARIA |
| process_download | Convert and move downloaded files |
| get_session_history | Export the full session as a structured workflow log |
| monitor_network | Watch HTTP traffic to discover API endpoints under the UI |
| record_video | Record the browser session as MP4 |
| close_session | Close the browser and release resources |

Slash Commands

| Command | What it does |
|---|---|
| /oto:demo | Run a full capabilities demo |
| /oto:scout <url> | Launch a browser and scout any page |
| /oto:export [name] | Export the current session as a replayable workflow |
| /oto:schedule [list\|name\|delete] | Schedule an exported workflow |
| /oto:report [bug\|feature\|friction] | File a GitHub issue with diagnostics |
| /oto:landscape <url> | Run a structured landscape analysis of a product or website |
| /oto:benchmark | Run performance benchmarks |

License

MIT