Extract
TL;DR
The extract service uses AI to pull structured data from URLs or raw text content, returning results shaped by a Zod schema (fully typed) or a plain JSON Schema (untyped). Define your output shape with Zod, point it at a URL or paste in text, and get typed data back. Every extraction includes usage metrics (input tokens, output tokens, and latency). Complex nested schemas with arrays and optional fields are supported.
How do I set up the extract service?
import { CMDOPClient } from '@cmdop/node';
import { z } from 'zod';
// Connect to the cloud relay (no machine needed -- extraction runs server-side)
const client = await CMDOPClient.remote({ apiKey: 'cmdop_xxx' });
How do I extract data with a Zod schema?
// Define a Zod schema describing the shape of data to extract
const ProductInfo = z.object({
  name: z.string(),
  price: z.number(),
  currency: z.string(),
  features: z.array(z.string()),
  inStock: z.boolean(),
});
// Run extraction against a URL -- AI reads the page and fills the schema
const result = await client.extract.runSchema({
  prompt: 'Extract product information',
  schema: ProductInfo,
  url: 'https://example.com/product/123',
});
// result.data is fully typed as { name: string; price: number; ... }
console.log(result.data.name);
console.log(result.data.price);
console.log(result.data.features);
How do I extract data with a JSON Schema?
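A JSON Schema describes the output shape but, unlike a Zod schema, gives no compile-time type for the result. One way to work with the untyped data is a runtime type guard. The sketch below is illustrative only: `Contact` and `isContact` are hypothetical names, not part of @cmdop/node, and the shape assumes the contact schema used in this section's example (name and email required, phone optional).

```typescript
// Hypothetical shape matching a contact JSON Schema:
// name and email are required, phone is optional.
interface Contact {
  name: string;
  email: string;
  phone?: string;
}

// Runtime type guard: narrows an unknown value to Contact.
function isContact(value: unknown): value is Contact {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v['name'] === 'string' &&
    typeof v['email'] === 'string' &&
    (v['phone'] === undefined || typeof v['phone'] === 'string')
  );
}

// A hand-written value standing in for untyped extraction output.
const untyped: unknown = { name: 'Jane Roe', email: 'jane@example.com' };
if (isContact(untyped)) {
  // Inside this branch, untyped is narrowed to Contact.
  console.log(untyped.name, untyped.email);
}
```

The guard only checks field types; it does not validate the email format, which the JSON Schema itself expresses via format: 'email'.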
// Use a plain JSON Schema object instead of Zod (output is untyped)
const result = await client.extract.runSchema({
  prompt: 'Extract contact details',
  jsonSchema: {
    type: 'object',
    properties: {
      name: { type: 'string' },
      email: { type: 'string', format: 'email' },
      phone: { type: 'string' },
    },
    required: ['name', 'email'],
  },
  content: 'John Doe, [email protected], +1-555-0123', // Raw text input
});
How do I extract from raw text content?
// Extract from a raw text string instead of a URL
const result = await client.extract.runSchema({
  prompt: 'Extract all dates mentioned',
  schema: z.object({
    dates: z.array(z.object({
      date: z.string(), // The extracted date string
      context: z.string(), // Surrounding context explaining the date
    })),
  }),
  content: longTextContent, // Pass raw text via the content parameter
});
How do I access extraction metrics?
Every extraction result includes usage metrics:
const result = await client.extract.runSchema({
  prompt: 'Extract product info',
  schema: ProductInfo,
  url: 'https://example.com/product',
});
// Access token usage and latency from the metrics object
console.log(result.metrics.inputTokens); // Tokens sent to the AI model
console.log(result.metrics.outputTokens); // Tokens generated by the AI model
console.log(result.metrics.latencyMs); // Total extraction time in milliseconds
What parameters does runSchema() accept?
runSchema(options)
| Parameter | Type | Description |
|---|---|---|
| options.prompt | string | Extraction instructions |
| options.schema | ZodType | Zod schema for typed output |
| options.jsonSchema | object | JSON Schema (alternative to Zod) |
| options.url | string | URL to extract from |
| options.content | string | Raw content to extract from |
Provide either url or content, not both.
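The either/or rule for url and content can be checked locally before a request is sent. The following is a minimal sketch, not part of @cmdop/node: `RunSchemaInput` and `checkSource` are hypothetical names, and rejecting the case where neither option is set is an assumption (the rule above only rules out passing both).

```typescript
// Hypothetical local type mirroring the two source options from the table.
interface RunSchemaInput {
  url?: string;
  content?: string;
}

// Returns which source is set; throws if both or neither are provided.
function checkSource(input: RunSchemaInput): 'url' | 'content' {
  const hasUrl = typeof input.url === 'string' && input.url.length > 0;
  const hasContent =
    typeof input.content === 'string' && input.content.length > 0;

  if (hasUrl && hasContent) {
    throw new Error('Provide either url or content, not both.');
  }
  if (!hasUrl && !hasContent) {
    // Assumption: a call with no source at all is treated as an error.
    throw new Error('Provide one of url or content.');
  }
  return hasUrl ? 'url' : 'content';
}
```

Calling `checkSource(options)` before `runSchema(options)` surfaces a misconfigured call without spending a request.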
Result
| Field | Type | Description |
|---|---|---|
| data | T | Typed extracted data |
| metrics.inputTokens | number | Input tokens used |
| metrics.outputTokens | number | Output tokens used |
| metrics.latencyMs | number | Extraction latency in ms |
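When several extractions run in one job, the per-call metrics can be rolled up into totals. This is a rough sketch under stated assumptions: `ExtractMetrics`, `MetricsSummary`, and `summarizeMetrics` are hypothetical names defined locally, and the only thing taken from the SDK is the three metric field names listed in the table above.

```typescript
// Hypothetical local type mirroring the metrics fields in the Result table.
interface ExtractMetrics {
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
}

interface MetricsSummary {
  calls: number;
  totalInputTokens: number;
  totalOutputTokens: number;
  averageLatencyMs: number;
}

// Sums token counts and averages latency over a list of metrics objects.
function summarizeMetrics(all: ExtractMetrics[]): MetricsSummary {
  const totals = all.reduce(
    (acc, m) => ({
      input: acc.input + m.inputTokens,
      output: acc.output + m.outputTokens,
      latency: acc.latency + m.latencyMs,
    }),
    { input: 0, output: 0, latency: 0 },
  );
  return {
    calls: all.length,
    totalInputTokens: totals.input,
    totalOutputTokens: totals.output,
    // Avoid dividing by zero when no extractions were recorded.
    averageLatencyMs: all.length === 0 ? 0 : totals.latency / all.length,
  };
}
```

A typical use would be collecting `result.metrics` from each call into an array and passing that array to `summarizeMetrics` at the end of the job.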