Browser Service

TL;DR

The Browser service automates Camoufox browsers on remote machines. Launch headless or visible browsers, navigate pages, interact with forms (click, type, select), extract data, capture screenshots, intercept network requests, and generate PDFs. Includes a Capabilities API for organized access to scrolling, input, timing, DOM parsing, network capture, and visual debugging features.

Automate browsers on remote machines using the SDK.

What are the prerequisites?

The remote machine must have:

  • CMDOP agent installed
  • Camoufox browser (bundled or installed separately)

How do I launch a browser?

```python
from cmdop import AsyncCMDOPClient

async with AsyncCMDOPClient.remote(api_key="cmd_xxx") as client:
    await client.terminal.set_machine("my-server")

    # Launch browser
    browser = await client.browser.launch()
```

What launch options are available?

```python
browser = await client.browser.launch(
    headless=True,                            # Run without display
    proxy="http://proxy:8080",                # Route traffic through proxy
    user_data_dir="/tmp/browser-profile",     # Persist browser data across sessions
    viewport={"width": 1920, "height": 1080}  # Set browser window size
)
```

How do I navigate pages?

```python
# Navigate to URL
page = await browser.new_page()
await page.goto("https://example.com")

# Wait for load
await page.wait_for_load_state("networkidle")

# Get current URL
url = page.url
print(f"Current URL: {url}")
```

How do I interact with page elements?

How do I click elements?

```python
# Click by selector
await page.click("button.submit")

# Click by text
await page.click("text=Sign In")

# Click at coordinates
await page.click(position={"x": 100, "y": 200})
```

How do I type text?

```python
# Type in input
await page.fill("input[name='email']", "user@example.com")
await page.fill("input[name='password']", "password")

# Type with delay (simulate human)
await page.type("input[name='search']", "query", delay=100)
```

How do I select dropdown options?

```python
# Select dropdown option by value attribute
await page.select_option("select#country", "US")

# Select dropdown option by visible label text
await page.select_option("select#country", label="United States")
```

How do I handle checkboxes?

```python
# Check a checkbox element
await page.check("input[type='checkbox']")

# Uncheck a checkbox element
await page.uncheck("input[type='checkbox']")
```

How do I wait for elements?

```python
# Wait for element
await page.wait_for_selector(".result")

# Wait for element to be visible
await page.wait_for_selector(".modal", state="visible")

# Wait for element to disappear
await page.wait_for_selector(".loading", state="hidden")

# Wait with timeout (milliseconds)
await page.wait_for_selector(".data", timeout=10000)
```

How do I extract data from pages?

```python
# Get text content
text = await page.text_content(".article")

# Get all matching elements
items = await page.query_selector_all(".product")
for item in items:
    name = await item.text_content(".name")
    price = await item.text_content(".price")
    print(f"{name}: {price}")

# Get attribute
href = await page.get_attribute("a.link", "href")

# Get input value
value = await page.input_value("input#search")
```

How do I capture screenshots?

```python
# Full page screenshot
await page.screenshot(path="./screenshot.png")

# Element screenshot
element = await page.query_selector(".chart")
await element.screenshot(path="./chart.png")

# Get screenshot as bytes
screenshot = await page.screenshot()
```

How do I work with network requests?

How do I intercept requests?

```python
# Define a route handler that blocks image requests
async def handle_route(route):
    if route.request.resource_type == "image":
        await route.abort()      # Block the image request
    else:
        await route.continue_()  # Allow all other requests

# Apply the route handler to all URLs
await page.route("**/*", handle_route)
```

How do I capture network traffic?

```python
# Store captured network requests in a list
requests = []

# Define a listener that records each request's URL and method
def on_request(request):
    requests.append({
        "url": request.url,
        "method": request.method
    })

# Attach the listener to the page's request event
page.on("request", on_request)

# Navigation triggers the listener for every outgoing request
await page.goto("https://example.com")
print(f"Captured {len(requests)} requests")
```
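Once captured, the request records are plain dicts, so they can be analyzed with the standard library alone. As one sketch (the `count_by_method` helper is ours, not part of the SDK), tallying requests per HTTP method:

```python
from collections import Counter

def count_by_method(requests: list[dict]) -> dict[str, int]:
    """Tally captured request records by HTTP method."""
    return dict(Counter(r["method"] for r in requests))

captured = [
    {"url": "https://example.com/", "method": "GET"},
    {"url": "https://example.com/api/data", "method": "GET"},
    {"url": "https://example.com/api/save", "method": "POST"},
]
print(count_by_method(captured))  # {'GET': 2, 'POST': 1}
```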

How do I wait for a response?

```python
# Wait for a response matching the URL pattern while clicking
async with page.expect_response("**/api/data") as response_info:
    await page.click("button.load")  # This click triggers the API call

# Retrieve the matched response and parse its JSON body
response = await response_info.value
data = await response.json()
```

How do I execute JavaScript?

```python
# Run JavaScript and return the page title
result = await page.evaluate("document.title")

# Pass a Python argument into the JavaScript function
result = await page.evaluate(
    "(selector) => document.querySelectorAll(selector).length",
    ".item"  # This value is passed as the 'selector' parameter
)

# Modify the page by scrolling to the bottom
await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
```

How do I work with multiple pages?

```python
# Open a new page
page2 = await browser.new_page()
await page2.goto("https://other-site.com")

# Iterate over all open pages
pages = browser.pages
for page in pages:
    print(page.url)
```

How do I generate PDFs?

```python
# Generate PDF with default settings
await page.pdf(path="./page.pdf")

# Generate PDF with custom format, background colors, and margins
await page.pdf(
    path="./page.pdf",
    format="A4",                            # Paper size format
    print_background=True,                  # Include CSS background colors/images
    margin={"top": "1cm", "bottom": "1cm"}  # Page margins
)
```

How do I handle authentication?

```python
# Create a browser context with HTTP basic auth credentials
context = await browser.new_context(
    http_credentials={
        "username": "user",
        "password": "pass"
    }
)

# Pages opened in this context auto-send auth headers
page = await context.new_page()
```

How do I manage cookies?

```python
# Get cookies
cookies = await context.cookies()

# Set cookies
await context.add_cookies([{
    "name": "session",
    "value": "abc123",
    "domain": "example.com"
}])

# Clear cookies
await context.clear_cookies()
```
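Because cookies are plain dicts, they can be persisted between runs with the standard library. A minimal sketch (the `save_cookies`/`load_cookies` helpers and the file path are ours, not SDK features):

```python
import json
import tempfile
from pathlib import Path

def save_cookies(cookies: list[dict], path: str) -> None:
    """Write a cookie list to a JSON file."""
    Path(path).write_text(json.dumps(cookies, indent=2))

def load_cookies(path: str) -> list[dict]:
    """Read a cookie list back from a JSON file."""
    return json.loads(Path(path).read_text())

cookies = [{"name": "session", "value": "abc123", "domain": "example.com"}]
path = str(Path(tempfile.gettempdir()) / "cookies.json")
save_cookies(cookies, path)
assert load_cookies(path) == cookies
```

On a later run, the saved list can be restored with `await context.add_cookies(load_cookies(path))` after creating the context.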

Error Handling

```python
from cmdop.exceptions import BrowserError, TimeoutError

try:
    # Attempt to click with a 5-second timeout
    await page.click("button.submit", timeout=5000)
except TimeoutError:
    # Raised when the element isn't found within the timeout
    print("Button not found within timeout")
except BrowserError as e:
    # Catch-all for other browser-related errors
    print(f"Browser error: {e}")
```
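Transient timeouts are often worth retrying rather than failing outright. A generic async retry wrapper in plain Python can handle this; this is our own sketch (the SDK may offer built-in retry options we are not assuming here):

```python
import asyncio

async def retry(action, attempts: int = 3, delay: float = 1.0,
                exceptions: tuple = (Exception,)):
    """Run an async action, retrying on the given exception types."""
    for attempt in range(1, attempts + 1):
        try:
            return await action()
        except exceptions:
            if attempt == attempts:
                raise  # Out of retries: propagate the error
            await asyncio.sleep(delay)

# Example: an action that fails twice, then succeeds
calls = {"n": 0}

async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("not yet")
    return "ok"

print(asyncio.run(retry(flaky, attempts=5, delay=0.01)))  # ok
```

In the browser context this might wrap a click, e.g. `await retry(lambda: page.click("button.submit", timeout=5000), exceptions=(TimeoutError,))`.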

How do I close the browser?

```python
# Close page
await page.close()

# Close browser
await browser.close()
```

What does a web scraping example look like?

```python
async def scrape_products(url: str):
    # Connect to remote CMDOP agent
    async with AsyncCMDOPClient.remote(api_key="cmd_xxx") as client:
        await client.terminal.set_machine("scraper-server")

        # Launch headless browser for background scraping
        browser = await client.browser.launch(headless=True)
        page = await browser.new_page()

        # Navigate and wait for product elements to render
        await page.goto(url)
        await page.wait_for_selector(".product")

        # Extract name and price from each product element
        products = []
        items = await page.query_selector_all(".product")
        for item in items:
            name = await item.text_content(".name")
            price = await item.text_content(".price")
            products.append({"name": name, "price": price})

        # Clean up browser resources
        await browser.close()
        return products
```
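The returned product dicts can then be written out with the standard library. A minimal sketch (the `write_products_csv` helper and output path are ours):

```python
import csv
import tempfile
from pathlib import Path

def write_products_csv(products: list[dict], path: str) -> None:
    """Write scraped product dicts to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "price"])
        writer.writeheader()
        writer.writerows(products)

products = [{"name": "Widget", "price": "$9.99"},
            {"name": "Gadget", "price": "$19.99"}]
csv_path = str(Path(tempfile.gettempdir()) / "products.csv")
write_products_csv(products, csv_path)
print(Path(csv_path).read_text().splitlines()[0])  # name,price
```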

What is the Capabilities API?

The browser session object (referred to as `session` in the examples below) exposes capability namespaces for organized access:

How does the Scroll capability work?

```python
# Scroll down
await session.scroll.js("down", pixels=500)

# Scroll to bottom
await session.scroll.to_bottom()

# Scroll to element
await session.scroll.to_element(".footer")

# Get scroll info
info = await session.scroll.info()
print(f"Position: {info.position}, Max: {info.max_scroll}")

# Infinite scroll data collection
async for data in session.scroll.collect():
    process(data)
```

How does the Input capability work?

```python
# Click via JavaScript
await session.input.click_js("button.submit")

# Press keys
await session.input.key("Escape")
await session.input.key("Enter")

# Click all matching elements
await session.input.click_all(".checkbox")

# Hover
await session.input.hover(".tooltip-trigger")
```

How does the Timing capability work?

```python
# Wait milliseconds
await session.timing.wait(500)

# Wait seconds
await session.timing.seconds(2)

# Random delay (human-like)
await session.timing.random(min_ms=200, max_ms=500)

# Set timeout for operations
await session.timing.timeout(10000)
```

How does the DOM capability work?

```python
# Get raw HTML
html = await session.dom.html()

# Get text content
text = await session.dom.text()

# Get BeautifulSoup object
soup = await session.dom.soup()
titles = soup.select("h2.title")

# Parse structured data
data = await session.dom.parse()

# Select elements
elements = await session.dom.select(".product")

# Close modal popups
await session.dom.close_modal()

# Extract with patterns
results = await session.dom.extract({
    "title": "h1",
    "price": ".price",
    "description": ".desc"
})
```

How does the Fetch capability work?

```python
# Fetch JSON from page context
data = await session.fetch.json("/api/data")

# Execute multiple requests
results = await session.fetch.all([
    {"url": "/api/users"},
    {"url": "/api/products"},
])
```

How does the Network capability work?

```python
# Enable network capture
await session.network.enable(
    max_exchanges=500,
    max_response_size=5_000_000
)

# Navigate and capture
await session.navigate("https://example.com")

# Get all captured requests
exchanges = await session.network.get_all()
for ex in exchanges:
    print(f"{ex.method} {ex.url} -> {ex.status}")

# Filter by pattern
api_calls = await session.network.filter(
    url_pattern="/api/",
    resource_types=["xhr", "fetch"]
)

# Get last request
last = await session.network.last()

# Get statistics
stats = await session.network.stats()
print(f"Total: {stats.total_requests}, Bytes: {stats.total_bytes}")

# Clear captured data
await session.network.clear()

# Disable capture
await session.network.disable()
```

How does the Visual capability work?

```python
# Show toast message
await session.visual.toast("Processing...")

# Countdown timer
await session.visual.countdown(seconds=5, message="Loading")

# Visual click indicator
await session.visual.click(x=100, y=200)

# Move cursor indicator
await session.visual.move(x=300, y=400)

# Highlight element
await session.visual.highlight(".important")
await session.visual.hide_highlight()

# Clear mouse trail
await session.visual.clear_trail()
```

How do I use the Network Analyzer?

Discover API endpoints from network traffic:

```python
from cmdop.helpers import NetworkAnalyzer

# Create analyzer bound to the current browser session
analyzer = NetworkAnalyzer(session)

# Navigate to URL and capture network traffic for 30 seconds
snapshot = await analyzer.capture(
    url="https://example.com/products",
    wait_seconds=30,      # Duration to record traffic
    url_pattern="/api/",  # Only capture URLs containing /api/
    same_origin=True,     # Exclude third-party requests
    min_size=100,         # Ignore responses smaller than 100 bytes
)

# Identify the most data-rich API endpoint from captured traffic
best = snapshot.best_api()
if best:
    print(f"API: {best.url}")
    print(f"Items: {best.item_count}")
    print(f"Fields: {best.item_fields}")

    # Generate ready-to-use reproduction code for the endpoint
    print(best.to_curl())   # Output as curl command
    print(best.to_httpx())  # Output as Python httpx code

# Access all captured browser state alongside network data
print(f"Cookies: {snapshot.cookies}")
print(f"Local Storage: {snapshot.local_storage}")
print(f"Total Requests: {snapshot.total_requests}")
```
