Browser Service

TL;DR

The Browser service automates Camoufox browsers on remote machines. Launch headless or visible browsers, navigate pages, interact with forms (click, type, select), extract data, capture screenshots, intercept network requests, and generate PDFs. Includes a Capabilities API for organized access to scrolling, input, timing, DOM parsing, network capture, and visual debugging features.

Automate browsers on remote machines using the SDK.

What are the prerequisites?

The remote machine must have:

  • CMDOP agent installed
  • Camoufox browser (bundled or installed separately)

How do I launch a browser?

```python
from cmdop import AsyncCMDOPClient

async with AsyncCMDOPClient.remote(api_key="cmd_xxx") as client:
    await client.terminal.set_machine("my-server")

    # Launch browser
    browser = await client.browser.launch()
```

What launch options are available?

```python
browser = await client.browser.launch(
    headless=True,                            # Run without display
    proxy="http://proxy:8080",                # Route traffic through proxy
    user_data_dir="/tmp/browser-profile",     # Persist browser data across sessions
    viewport={"width": 1920, "height": 1080}  # Set browser window size
)
```

How do I navigate pages?

```python
# Navigate to URL
page = await browser.new_page()
await page.goto("https://example.com")

# Wait for load
await page.wait_for_load_state("networkidle")

# Get current URL
url = page.url
print(f"Current URL: {url}")
```

How do I interact with page elements?

How do I click elements?

```python
# Click by selector
await page.click("button.submit")

# Click by text
await page.click("text=Sign In")

# Click at coordinates
await page.click(position={"x": 100, "y": 200})
```

How do I type text?

```python
# Type in input
await page.fill("input[name='email']", "user@example.com")
await page.fill("input[name='password']", "password")

# Type with delay (simulate human)
await page.type("input[name='search']", "query", delay=100)
```

How do I select dropdown options?

```python
# Select dropdown option by value attribute
await page.select_option("select#country", "US")

# Select dropdown option by visible label text
await page.select_option("select#country", label="United States")
```

How do I handle checkboxes?

```python
# Check a checkbox element
await page.check("input[type='checkbox']")

# Uncheck a checkbox element
await page.uncheck("input[type='checkbox']")
```

How do I wait for elements?

```python
# Wait for element
await page.wait_for_selector(".result")

# Wait for element to be visible
await page.wait_for_selector(".modal", state="visible")

# Wait for element to disappear
await page.wait_for_selector(".loading", state="hidden")

# Wait with timeout (milliseconds)
await page.wait_for_selector(".data", timeout=10000)
```

How do I extract data from pages?

```python
# Get text content
text = await page.text_content(".article")

# Get all matching elements
items = await page.query_selector_all(".product")
for item in items:
    name = await item.text_content(".name")
    price = await item.text_content(".price")
    print(f"{name}: {price}")

# Get attribute
href = await page.get_attribute("a.link", "href")

# Get input value
value = await page.input_value("input#search")
```

How do I capture screenshots?

```python
# Full page screenshot
await page.screenshot(path="./screenshot.png")

# Element screenshot
element = await page.query_selector(".chart")
await element.screenshot(path="./chart.png")

# Get screenshot as bytes
screenshot = await page.screenshot()
```

How do I work with network requests?

How do I intercept requests?

```python
# Define a route handler that blocks image requests
async def handle_route(route):
    if route.request.resource_type == "image":
        await route.abort()      # Block the image request
    else:
        await route.continue_()  # Allow all other requests

# Apply the route handler to all URLs
await page.route("**/*", handle_route)
```

How do I capture network traffic?

```python
# Store captured network requests in a list
requests = []

# Define a listener that records each request's URL and method
def on_request(request):
    requests.append({
        "url": request.url,
        "method": request.method
    })

# Attach the listener to the page's request event
page.on("request", on_request)

# Navigation triggers the listener for every outgoing request
await page.goto("https://example.com")
print(f"Captured {len(requests)} requests")
```
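Once captured, the request records are plain dicts, so they can be analyzed with the standard library alone. As one sketch (the `count_by_method` helper is ours, not part of the SDK), tallying requests per HTTP method:

```python
from collections import Counter

def count_by_method(requests: list[dict]) -> dict[str, int]:
    """Tally captured request records by HTTP method."""
    return dict(Counter(r["method"] for r in requests))

captured = [
    {"url": "https://example.com/", "method": "GET"},
    {"url": "https://example.com/api/data", "method": "GET"},
    {"url": "https://example.com/api/save", "method": "POST"},
]
print(count_by_method(captured))  # {'GET': 2, 'POST': 1}
```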

How do I wait for a response?

```python
# Wait for a response matching the URL pattern while clicking
async with page.expect_response("**/api/data") as response_info:
    await page.click("button.load")  # This click triggers the API call

# Retrieve the matched response and parse its JSON body
response = await response_info.value
data = await response.json()
```

How do I execute JavaScript?

```python
# Run JavaScript and return the page title
result = await page.evaluate("document.title")

# Pass a Python argument into the JavaScript function
result = await page.evaluate(
    "(selector) => document.querySelectorAll(selector).length",
    ".item"  # This value is passed as the 'selector' parameter
)

# Modify the page by scrolling to the bottom
await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
```

How do I work with multiple pages?

```python
# Open a new page
page2 = await browser.new_page()
await page2.goto("https://other-site.com")

# Iterate over all open pages
pages = browser.pages
for page in pages:
    print(page.url)
```

How do I generate PDFs?

```python
# Generate PDF with default settings
await page.pdf(path="./page.pdf")

# Generate PDF with custom format, background colors, and margins
await page.pdf(
    path="./page.pdf",
    format="A4",                            # Paper size format
    print_background=True,                  # Include CSS background colors/images
    margin={"top": "1cm", "bottom": "1cm"}  # Page margins
)
```

How do I handle authentication?

```python
# Create a browser context with HTTP basic auth credentials
context = await browser.new_context(
    http_credentials={
        "username": "user",
        "password": "pass"
    }
)

# Pages opened in this context auto-send auth headers
page = await context.new_page()
```

How do I manage cookies?

```python
# Get cookies
cookies = await context.cookies()

# Set cookies
await context.add_cookies([{
    "name": "session",
    "value": "abc123",
    "domain": "example.com"
}])

# Clear cookies
await context.clear_cookies()
```
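Because cookies are plain dicts, they can be persisted between runs with the standard library. A minimal sketch (the `save_cookies`/`load_cookies` helpers and the file path are ours, not SDK features):

```python
import json
import tempfile
from pathlib import Path

def save_cookies(cookies: list[dict], path: str) -> None:
    """Write a cookie list to a JSON file."""
    Path(path).write_text(json.dumps(cookies, indent=2))

def load_cookies(path: str) -> list[dict]:
    """Read a cookie list back from a JSON file."""
    return json.loads(Path(path).read_text())

cookies = [{"name": "session", "value": "abc123", "domain": "example.com"}]
path = str(Path(tempfile.gettempdir()) / "cookies.json")
save_cookies(cookies, path)
assert load_cookies(path) == cookies
```

On a later run, the saved list can be restored with `await context.add_cookies(load_cookies(path))` after creating the context.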

Error Handling

```python
from cmdop.exceptions import BrowserError, TimeoutError

try:
    # Attempt to click with a 5-second timeout
    await page.click("button.submit", timeout=5000)
except TimeoutError:
    # Raised when the element isn't found within the timeout
    print("Button not found within timeout")
except BrowserError as e:
    # Catch-all for other browser-related errors
    print(f"Browser error: {e}")
```
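Transient timeouts are often worth retrying rather than failing outright. A generic async retry wrapper in plain Python can handle this; this is our own sketch (the SDK may offer built-in retry options we are not assuming here):

```python
import asyncio

async def retry(action, attempts: int = 3, delay: float = 1.0,
                exceptions: tuple = (Exception,)):
    """Run an async action, retrying on the given exception types."""
    for attempt in range(1, attempts + 1):
        try:
            return await action()
        except exceptions:
            if attempt == attempts:
                raise  # Out of retries: propagate the error
            await asyncio.sleep(delay)

# Example: an action that fails twice, then succeeds
calls = {"n": 0}

async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("not yet")
    return "ok"

print(asyncio.run(retry(flaky, attempts=5, delay=0.01)))  # ok
```

In the browser context this might wrap a click, e.g. `await retry(lambda: page.click("button.submit", timeout=5000), exceptions=(TimeoutError,))`.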

How do I close the browser?

```python
# Close page
await page.close()

# Close browser
await browser.close()
```

What does a web scraping example look like?

```python
async def scrape_products(url: str):
    # Connect to remote CMDOP agent
    async with AsyncCMDOPClient.remote(api_key="cmd_xxx") as client:
        await client.terminal.set_machine("scraper-server")

        # Launch headless browser for background scraping
        browser = await client.browser.launch(headless=True)
        page = await browser.new_page()

        # Navigate and wait for product elements to render
        await page.goto(url)
        await page.wait_for_selector(".product")

        # Extract name and price from each product element
        products = []
        items = await page.query_selector_all(".product")
        for item in items:
            name = await item.text_content(".name")
            price = await item.text_content(".price")
            products.append({"name": name, "price": price})

        # Clean up browser resources
        await browser.close()
        return products
```
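The returned product dicts can then be written out with the standard library. A minimal sketch (the `write_products_csv` helper and output path are ours):

```python
import csv
import tempfile
from pathlib import Path

def write_products_csv(products: list[dict], path: str) -> None:
    """Write scraped product dicts to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "price"])
        writer.writeheader()
        writer.writerows(products)

products = [{"name": "Widget", "price": "$9.99"},
            {"name": "Gadget", "price": "$19.99"}]
csv_path = str(Path(tempfile.gettempdir()) / "products.csv")
write_products_csv(products, csv_path)
print(Path(csv_path).read_text().splitlines()[0])  # name,price
```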

What is the Capabilities API?

The browser session object (referred to as `session` in the examples below) exposes capability namespaces for organized access:

How does the Scroll capability work?

```python
# Scroll down
await session.scroll.js("down", pixels=500)

# Scroll to bottom
await session.scroll.to_bottom()

# Scroll to element
await session.scroll.to_element(".footer")

# Get scroll info
info = await session.scroll.info()
print(f"Position: {info.position}, Max: {info.max_scroll}")

# Infinite scroll data collection
async for data in session.scroll.collect():
    process(data)
```

How does the Input capability work?

```python
# Click via JavaScript
await session.input.click_js("button.submit")

# Press keys
await session.input.key("Escape")
await session.input.key("Enter")

# Click all matching elements
await session.input.click_all(".checkbox")

# Hover
await session.input.hover(".tooltip-trigger")
```

How does the Timing capability work?

```python
# Wait milliseconds
await session.timing.wait(500)

# Wait seconds
await session.timing.seconds(2)

# Random delay (human-like)
await session.timing.random(min_ms=200, max_ms=500)

# Set timeout for operations
await session.timing.timeout(10000)
```

How does the DOM capability work?

```python
# Get raw HTML
html = await session.dom.html()

# Get text content
text = await session.dom.text()

# Get BeautifulSoup object
soup = await session.dom.soup()
titles = soup.select("h2.title")

# Parse structured data
data = await session.dom.parse()

# Select elements
elements = await session.dom.select(".product")

# Close modal popups
await session.dom.close_modal()

# Extract with patterns
results = await session.dom.extract({
    "title": "h1",
    "price": ".price",
    "description": ".desc"
})
```

How does the Fetch capability work?

```python
# Fetch JSON from page context
data = await session.fetch.json("/api/data")

# Execute multiple requests
results = await session.fetch.all([
    {"url": "/api/users"},
    {"url": "/api/products"},
])
```

How does the Network capability work?

```python
# Enable network capture
await session.network.enable(
    max_exchanges=500,
    max_response_size=5_000_000
)

# Navigate and capture
await session.navigate("https://example.com")

# Get all captured requests
exchanges = await session.network.get_all()
for ex in exchanges:
    print(f"{ex.method} {ex.url} -> {ex.status}")

# Filter by pattern
api_calls = await session.network.filter(
    url_pattern="/api/",
    resource_types=["xhr", "fetch"]
)

# Get last request
last = await session.network.last()

# Get statistics
stats = await session.network.stats()
print(f"Total: {stats.total_requests}, Bytes: {stats.total_bytes}")

# Clear captured data
await session.network.clear()

# Disable capture
await session.network.disable()
```

How does the Visual capability work?

```python
# Show toast message
await session.visual.toast("Processing...")

# Countdown timer
await session.visual.countdown(seconds=5, message="Loading")

# Visual click indicator
await session.visual.click(x=100, y=200)

# Move cursor indicator
await session.visual.move(x=300, y=400)

# Highlight element
await session.visual.highlight(".important")
await session.visual.hide_highlight()

# Clear mouse trail
await session.visual.clear_trail()
```

How do I use the Network Analyzer?

Discover API endpoints from network traffic:

```python
from cmdop.helpers import NetworkAnalyzer

# Create analyzer bound to the current browser session
analyzer = NetworkAnalyzer(session)

# Navigate to URL and capture network traffic for 30 seconds
snapshot = await analyzer.capture(
    url="https://example.com/products",
    wait_seconds=30,      # Duration to record traffic
    url_pattern="/api/",  # Only capture URLs containing /api/
    same_origin=True,     # Exclude third-party requests
    min_size=100,         # Ignore responses smaller than 100 bytes
)

# Identify the most data-rich API endpoint from captured traffic
best = snapshot.best_api()
if best:
    print(f"API: {best.url}")
    print(f"Items: {best.item_count}")
    print(f"Fields: {best.item_fields}")

    # Generate ready-to-use reproduction code for the endpoint
    print(best.to_curl())   # Output as curl command
    print(best.to_httpx())  # Output as Python httpx code

# Access all captured browser state alongside network data
print(f"Cookies: {snapshot.cookies}")
print(f"Local Storage: {snapshot.local_storage}")
print(f"Total Requests: {snapshot.total_requests}")
```
