MetaVision API Documentation
Introduction
MetaVision is a multimodal AI analysis engine with a built-in content optimization pipeline and optional audio transcription. All media passes through the pipeline before vision-language model (VLM) analysis, to improve analysis quality and token efficiency.
MetaVision uses an asynchronous, queue-based architecture: you submit a job, then poll for results.
1. Send a POST request to /api/v1/analyze. You receive a 202 Accepted with a job_id.
2. The optimization pipeline processes the media (status: optimizing). Metadata, including codec info, face detection results, and thumbnails, is extracted.
3. If you set transcribe: true, audio is transcribed with optional diarization (status: transcribing).
4. VLM agents analyze the optimized content (status: analyzing).
5. When the status is completed, the response includes the full analysis results, transcript, pipeline metadata, and asset URLs.
Authentication
All API requests require a Bearer token:
Authorization: Bearer YOUR_API_KEY
Submit Analysis
POST /api/v1/analyze
Media Source (one required)
| Parameter | Type | Description |
|---|---|---|
| media | string | A publicly accessible HTTP URL pointing to media. |
| media_base64 | string | Raw base64-encoded media data. Max 2MB. Requires media_type. |
| media_type | string | Required. The type of media: image, video, or audio. |
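As an illustration of the media_base64 option, the sketch below encodes in-memory bytes and builds a request body. The helper name build_base64_payload is hypothetical, and the code assumes the 2MB limit applies to the encoded string and means 2 × 1024 × 1024 bytes; the source does not specify either detail.

```python
import base64

# Assumption: "Max 2MB" is measured on the encoded string and means 2 MiB.
MAX_BASE64_BYTES = 2 * 1024 * 1024


def build_base64_payload(raw: bytes, media_type: str) -> dict:
    """Build an /api/v1/analyze request body from in-memory media bytes."""
    if media_type not in ("image", "video", "audio"):
        raise ValueError(f"unsupported media_type: {media_type!r}")
    encoded = base64.b64encode(raw).decode("ascii")
    if len(encoded) > MAX_BASE64_BYTES:
        raise ValueError("encoded media exceeds 2MB; host the file and pass its URL in media instead")
    # media_type is mandatory alongside media_base64
    return {"media_base64": encoded, "media_type": media_type}


payload = build_base64_payload(b"abc", "image")
print(payload)
```

Larger files are better submitted through the media URL parameter, since base64 encoding inflates the size by roughly a third.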
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| categories | array[string] | (All) | List of Category IDs to execute. |
| custom_category | string | null | Custom analysis instruction (max 2000 chars). |
| system_prompt | string | null | Override system prompt. Use {{CATEGORY_PROMPT}} placeholder. |
| model | string | null | Force a specific Agent UUID. |
| detail | string | "balanced" | Quality profile ID controlling optimization and billing. |
| transcribe | boolean | false | Enable audio transcription (audio/video only). Billed as flat +1× price. |
| transcribe_diarize | boolean | false | Enable speaker diarization in transcription. |
| transcribe_align | boolean | false | Enable word-level timestamp alignment in transcription. |
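Several of these constraints can be checked on the client before a request is sent, which avoids a round trip that would end in a 400. The following is a minimal sketch; validate_request is a hypothetical helper, and it covers only the rules stated in the tables above, not everything the server may enforce.

```python
def validate_request(payload: dict) -> list[str]:
    """Return a list of problems found in an /api/v1/analyze request body."""
    problems = []
    if not payload.get("media") and not payload.get("media_base64"):
        problems.append("provide media or media_base64")
    media_type = payload.get("media_type")
    if media_type not in ("image", "video", "audio"):
        problems.append("media_type must be image, video, or audio")
    custom = payload.get("custom_category")
    if custom is not None and len(custom) > 2000:
        problems.append("custom_category exceeds 2000 characters")
    # Transcription applies to audio and video only
    if payload.get("transcribe") and media_type == "image":
        problems.append("transcribe is only valid for audio or video")
    return problems


print(validate_request({"media": "https://example.com/photo.jpg",
                        "media_type": "image", "transcribe": True}))
```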
Submit Response (202 Accepted)
{
  "job_id": "a1b2c3d4-...",
  "status": "queued",
  "created_at": "2024-03-20T10:00:00.000Z",
  "poll_url": "/api/v1/jobs/a1b2c3d4-..."
}
Poll for Results
GET /api/v1/jobs/{job_id}
Response Statuses
| Status | Description |
|---|---|
| queued | Job is waiting in the queue. Keep polling. |
| optimizing | Media is being processed by the optimization pipeline. Keep polling. |
| transcribing | Audio is being transcribed. Keep polling. |
| analyzing | Optimized content is being analyzed by VLM agents. Keep polling. |
| completed | Analysis complete. Response includes meta, data, pipeline_info, and pipeline_assets. |
| failed | Job failed. Response includes error. |
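The status table maps naturally onto a polling loop that keeps going for in-progress statuses and stops on a terminal one. The sketch below is one possible shape, not the official client: wait_for_job, its timeout, and the injected fetch_job callable are all hypothetical, and treating cancelled as terminal is an assumption based on the cancel response shown later in this document.

```python
import time
from typing import Callable

IN_PROGRESS = {"queued", "optimizing", "transcribing", "analyzing"}
# "cancelled" is not in the status table; including it here is an assumption.
TERMINAL = {"completed", "failed", "cancelled"}


def wait_for_job(fetch_job: Callable[[], dict], interval: float = 2.0,
                 timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_job() until the job reaches a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        status = job.get("status")
        if status in TERMINAL:
            return job
        if status not in IN_PROGRESS:
            raise RuntimeError(f"unexpected job status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout} seconds")
        sleep(interval)


# Demo with canned responses instead of real HTTP calls
responses = iter([{"status": "queued"}, {"status": "analyzing"},
                  {"status": "completed", "data": {}}])
final = wait_for_job(lambda: next(responses), sleep=lambda seconds: None)
print(final["status"])  # completed
```

In real use, fetch_job would wrap a GET to /api/v1/jobs/{job_id} with the Authorization header. Passing the fetcher in keeps the loop testable without network access.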
Completed Response
{
  "job_id": "a1b2c3d4-...",
  "status": "completed",
  "meta": {
    "request_id": "a1b2c3d4-...",
    "model_name": "general-v2",
    "execution_time": 8.45,
    "successful_categories": 2,
    "failed_categories": 0,
    "total_tokens_in": 1500,
    "total_tokens_out": 300,
    "estimated_cost": 0.0045
  },
  "data": {
    "title": { "result": "Sunset over a mountain range" },
    "custom_category": { "result": "The mood is serene." },
    "transcript": {
      "segments": [
        { "start": 0.0, "end": 2.5, "text": "Hello world", "speaker": "SPEAKER_01" }
      ],
      "detected_language": "en",
      "language_probability": 0.997
    }
  },
  "pipeline_info": {
    "metadata": {
      "streams": [
        { "codec_type": "video", "codec_name": "h264", "width": 1280, "height": 720 },
        { "codec_type": "audio", "codec_name": "aac", "sample_rate": "44100", "channels": 2 }
      ],
      "format": { "format_name": "mov,mp4", "duration": "219.286", "size": "82989735" }
    },
    "faces": {
      "faces": [
        { "id": 1, "file": "face_001.jpg", "appearances": 3, "first_seen_time": "00:00:04" }
      ]
    },
    "download_info": {
      "title": "Video Title",
      "description": "Video description...",
      "thumbnail": "https://example.com/thumb.jpg",
      "duration": 219.0,
      "webpage_url": "https://example.com/video"
    }
  },
  "pipeline_assets": {
    "thumbnail.jpg": "https://s3.../thumbnail.jpg",
    "waveform.png": "https://s3.../waveform.png",
    "faces/face_001.jpg": "https://s3.../faces/face_001.jpg",
    "thumbnails/0001.jpg": "https://s3.../thumbnails/0001.jpg"
  }
}
The transcript key only appears in data when transcribe: true was set. The pipeline_info object contains media metadata extracted by the optimization pipeline (codec info, face detection, source metadata). The pipeline_assets object contains URLs to generated assets like thumbnails, waveform, and face crops.
Account Balance
Retrieve your usage statistics grouped by time period.
GET /api/v1/account/balance
Optional Query Parameters
| Parameter | Type | Description |
|---|---|---|
| from | string | Start date for custom range (YYYY-MM-DD) |
| to | string | End date for custom range (YYYY-MM-DD) |
Response
{
  "api_key_prefix": "e59e5c0d-...",
  "description": "Production App",
  "overall": {
    "total_requests": 1250,
    "total_success": 1200,
    "total_failed": 50,
    "total_billed": 12.50
  },
  "today": { ... },
  "week": { ... },
  "month": { ... },
  "breakdown_by_type": [
    { "media_type": "image", "request_count": 800, "success_count": 790, "failed_count": 10, "total_billed": 4.00 },
    { "media_type": "video", "request_count": 450, "success_count": 410, "failed_count": 40, "total_billed": 8.50 }
  ],
  "custom_range": null
}
When from and to are provided, the custom_range field contains the aggregated stats for that date range.
Pricing
Retrieve your configured per-category prices and available quality profiles.
GET /api/v1/account/prices
Response
{
  "price_per_image": 0.005,
  "price_per_video": 0.01,
  "price_per_audio": 0.008,
  "quality_profiles": [
    { "id": "low", "name": "Low", "billing_multiplier": 1 },
    { "id": "balanced", "name": "Balanced", "billing_multiplier": 1 },
    { "id": "high", "name": "High", "billing_multiplier": 2 },
    { "id": "ultra", "name": "Ultra", "billing_multiplier": 4 }
  ],
  "billing_note": "Each category is billed at price_per_type × quality_multiplier. Transcription adds a flat 1× price_per_type regardless of quality level."
}
Request History
List your past analysis requests with pagination and filters. You can also fetch full results or cancel queued jobs.
List Requests
GET /api/v1/account/requests
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| page | integer | 1 | Page number |
| limit | integer | 50 | Results per page (max 100) |
| status | string | all | Filter: success, partial, failed |
| media_type | string | all | Filter: image, video, audio |
| from | string | - | Start date (YYYY-MM-DD) |
| to | string | - | End date (YYYY-MM-DD) |
Response
{
  "requests": [
    {
      "request_id": "a1b2c3d4-...",
      "status": "success",
      "media_type": "video",
      "categories": "[\"title\",\"description\"]",
      "quality_profile": "balanced",
      "billed_amount": 0.02,
      "execution_time": 8.4,
      "successful_categories": 2,
      "failed_categories": 0,
      "tokens_in": 1500,
      "tokens_out": 300,
      "agent_name": "general-v3",
      "created_at": "2024-03-20T10:00:00.000Z"
    }
  ],
  "total": 156,
  "page": 1,
  "limit": 50
}
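Because the list response carries total, page, and limit, a client can walk every page until it has seen total records. The generator below sketches that loop; iter_requests and the injected fetch_page callable are hypothetical names, and the stop condition assumes total counts all records matching the current filters.

```python
from typing import Callable, Iterator


def iter_requests(fetch_page: Callable[[int, int], dict], limit: int = 50) -> Iterator[dict]:
    """Yield every request record, fetching pages until `total` is reached."""
    page = 1
    seen = 0
    while True:
        body = fetch_page(page, limit)
        items = body.get("requests", [])
        for item in items:
            yield item
        seen += len(items)
        # Stop on an empty page too, in case total changes while iterating
        if not items or seen >= body.get("total", 0):
            return
        page += 1


def fake_fetch(page: int, limit: int) -> dict:
    """Stand-in for GET /api/v1/account/requests?page=...&limit=..."""
    records = [{"request_id": f"req-{i}"} for i in range(5)]
    start = (page - 1) * limit
    return {"requests": records[start:start + limit], "total": len(records),
            "page": page, "limit": limit}


print([r["request_id"] for r in iter_requests(fake_fetch, limit=2)])
```

Note that categories arrives as a JSON-encoded string in this listing, so json.loads is needed to turn it into a list.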
Get Request Detail
GET /api/v1/account/requests/{request_id}
Returns the full analysis result including all category data, pipeline info, and assets. Checks active jobs first (KV), then falls back to the log database (D1) for completed historical requests.
Response (Completed)
{
  "request_id": "a1b2c3d4-...",
  "status": "completed",
  "media_type": "video",
  "agent_name": "general-v3",
  "categories": ["title", "description"],
  "quality_profile": "balanced",
  "created_at": "...",
  "meta": {
    "execution_time": 8.45,
    "tokens_in": 1500,
    "tokens_out": 300,
    "billed_amount": 0.02,
    "successful_categories": 2,
    "failed_categories": 0
  },
  "data": {
    "title": { "result": "..." },
    "description": { "result": "..." }
  },
  "pipeline_info": { ... },
  "pipeline_assets": {
    "thumbnail.jpg": "https://s3.../thumbnail.jpg",
    "waveform.png": "https://s3.../waveform.png"
  }
}
Cancel a Queued Job
POST /api/v1/account/requests/{request_id}/cancel
Cancel a job that is still in queued status. Jobs that have already started processing cannot be cancelled.
Response
{
  "success": true,
  "job_id": "a1b2c3d4-...",
  "status": "cancelled"
}
Transcription
MetaVision includes an integrated speech transcription pipeline for audio and video content. When enabled, the transcript is automatically provided to VLM agents for richer context, and included in the response.
Features
- Automatic language detection
- Word-level timestamp alignment (optional)
- Speaker diarization with speaker identification (optional)
- Configurable voice activity detection
How it works
Set transcribe: true on your request. The pipeline runs after media optimization and before VLM analysis. The transcript text is included in the VLM prompt so the model has both the media and the spoken content.
Billing
Transcription is billed as a flat 1× price_per_type per request, regardless of the quality profile multiplier. For example, if you request 3 categories with a 2× quality profile plus transcription, billing = (3 × price × 2) + (1 × price).
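The billing rule can be written as a small function. This is a sketch of the formula as stated above, with estimate_cost as a hypothetical helper; the sample numbers reuse price_per_video (0.01) and the high profile multiplier (2) from the prices response.

```python
def estimate_cost(price_per_type: float, num_categories: int,
                  billing_multiplier: float, transcribe: bool = False) -> float:
    """Estimate the charge for one request from the documented billing rule."""
    # Each category is billed at price × quality multiplier
    analysis = num_categories * price_per_type * billing_multiplier
    # Transcription is a flat 1× price, independent of the quality profile
    transcription = price_per_type if transcribe else 0.0
    return round(analysis + transcription, 6)


# 3 categories on a video with the 2× profile, plus transcription:
# (3 × 0.01 × 2) + (1 × 0.01)
print(estimate_cost(0.01, 3, 2, transcribe=True))  # 0.07
```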
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| transcribe | boolean | false | Enable transcription. Only for audio and video. |
| transcribe_diarize | boolean | false | Assign speaker IDs to segments. |
| transcribe_align | boolean | false | Enable word-level timestamp alignment. |
Transcript Response Format
{
  "transcript": {
    "segments": [
      {
        "start": 0.0,
        "end": 2.5,
        "text": "Hello, how are you?",
        "speaker": "SPEAKER_01",
        "words": [
          {"word": "Hello,", "start": 0.1, "end": 0.5, "speaker": "SPEAKER_01"},
          {"word": "how", "start": 0.6, "end": 0.9, "speaker": "SPEAKER_01"}
        ]
      }
    ],
    "detected_language": "en",
    "language_probability": 0.997,
    "speakers": {
      "SPEAKER_01": {"name": "SPEAKER_01", "time": 2.5}
    }
  }
}
The words array is only present when transcribe_align: true. The speaker field and speakers object are only present when transcribe_diarize: true.
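Since speaker and words are conditional, code that reads a transcript should treat them as optional. The sketch below renders segments as timestamped lines and adds a speaker prefix only when one is present; format_transcript is a hypothetical helper and the output layout is an arbitrary choice.

```python
def format_transcript(transcript: dict) -> list[str]:
    """Render transcript segments as '[start-end] SPEAKER: text' lines."""
    lines = []
    for seg in transcript.get("segments", []):
        speaker = seg.get("speaker")  # absent unless transcribe_diarize was true
        prefix = f"{speaker}: " if speaker else ""
        lines.append(f"[{seg['start']:.1f}-{seg['end']:.1f}] {prefix}{seg['text']}")
    return lines


sample = {
    "segments": [
        {"start": 0.0, "end": 2.5, "text": "Hello, how are you?", "speaker": "SPEAKER_01"},
        {"start": 2.5, "end": 4.0, "text": "Fine, thanks."},
    ],
    "detected_language": "en",
}
for line in format_transcript(sample):
    print(line)
```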
Pipeline Info & Assets
When media is processed through the optimization pipeline, the completed response includes two additional fields:
pipeline_info
A JSON object containing rich metadata extracted during optimization. The structure varies by media type and source, but commonly includes:
- metadata.streams[]: Video and audio stream details (codec, resolution, framerate, sample rate, channels, bitrate)
- metadata.format: Container format info (format name, duration, file size, overall bitrate, tags/title/description)
- faces: Face detection results including unique face IDs, appearance counts, and timestamps of first/last appearance
- download_info: Source URL metadata (title, description, thumbnail, webpage URL, duration) when media was downloaded from a web URL
Not all fields are present for every request. Image analysis won't have video streams or faces. Direct URL uploads may not have download_info. Always check for the existence of fields before accessing them.
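One way to follow that advice is to read every level with a default so that missing sections produce None or an empty result rather than a KeyError. The following sketch uses the field names from the sample response; summarize_pipeline_info and the shape of its return value are hypothetical.

```python
from typing import Optional


def summarize_pipeline_info(info: Optional[dict]) -> dict:
    """Pull a few common values out of pipeline_info, tolerating missing fields."""
    info = info or {}
    metadata = info.get("metadata") or {}
    streams = metadata.get("streams") or []
    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    has_size = bool(video) and "width" in video and "height" in video
    duration = (metadata.get("format") or {}).get("duration")
    faces = (info.get("faces") or {}).get("faces") or []
    return {
        "resolution": (video["width"], video["height"]) if has_size else None,
        # duration is a string in the sample response, so convert it
        "duration_seconds": float(duration) if duration is not None else None,
        "face_count": len(faces),
        "source_title": (info.get("download_info") or {}).get("title"),
    }


# An image job may return no streams, faces, or download_info at all
print(summarize_pipeline_info({"metadata": {"format": {"format_name": "png"}}}))
```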
pipeline_assets
A map of filename → URL for generated assets. Common assets include:
| Key Pattern | Description |
|---|---|
| thumbnail.jpg | Main thumbnail image |
| thumbnails/0001.jpg … | Scene thumbnails extracted from video |
| waveform.png | Audio waveform visualization |
| faces/face_001.jpg … | Cropped face images from face detection |
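The key patterns in the table make it straightforward to sort the flat pipeline_assets map into groups. The sketch below does this by exact name and prefix; group_assets and its output keys are hypothetical, and any key that matches no listed pattern is kept under other rather than dropped.

```python
def group_assets(assets: dict) -> dict:
    """Group a pipeline_assets map by the documented key patterns."""
    groups = {"thumbnail": None, "waveform": None,
              "scene_thumbnails": [], "faces": [], "other": {}}
    for key in sorted(assets):  # sorted so numbered files keep their order
        url = assets[key]
        if key == "thumbnail.jpg":
            groups["thumbnail"] = url
        elif key == "waveform.png":
            groups["waveform"] = url
        elif key.startswith("thumbnails/"):
            groups["scene_thumbnails"].append(url)
        elif key.startswith("faces/"):
            groups["faces"].append(url)
        else:
            groups["other"][key] = url
    return groups


assets = {
    "thumbnail.jpg": "https://s3.../thumbnail.jpg",
    "faces/face_001.jpg": "https://s3.../faces/face_001.jpg",
    "thumbnails/0001.jpg": "https://s3.../thumbnails/0001.jpg",
}
print(group_assets(assets))
```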
Quality Profiles
The detail parameter selects a quality profile that controls how media is optimized and how billing is calculated.
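Each category is billed at billing_multiplier × price_per_type, where price_per_type is price_per_image, price_per_video, or price_per_audio depending on the media. Transcription is always billed at a flat 1× price_per_type regardless of the quality profile.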
Analysis Categories
Available Models
Error Handling
| Status | Code | Description |
|---|---|---|
| 400 | invalid_request | Malformed JSON, missing media, invalid URL or base64, invalid model UUID, invalid quality profile, or transcription requested for image media. |
| 400 | invalid_categories | Requested categories do not exist or are disabled. |
| 401 | invalid_api_key | Missing or incorrect Authorization header. |
| 403 | account_disabled | API key disabled. |
| 404 | not_found | Job not found or expired. |
| 500 | internal_error | Server, pipeline, or transcription configuration error. |
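A client can turn this table into a lookup that decides what to do next. The sketch below is illustrative only: classify_error is a hypothetical helper, the action strings are suggestions derived from the descriptions above, and treating only 5xx responses as retryable is an assumption rather than documented behavior. How the code value appears in the error response body is not specified here, so the function takes it as a plain argument.

```python
# Suggested follow-up per (HTTP status, error code); wording is illustrative.
ERROR_ACTIONS = {
    (400, "invalid_request"): "fix the request body before resubmitting",
    (400, "invalid_categories"): "check the requested category IDs",
    (401, "invalid_api_key"): "check the Authorization header",
    (403, "account_disabled"): "the API key is disabled",
    (404, "not_found"): "the job does not exist or has expired",
    (500, "internal_error"): "server-side error; try again later",
}


def classify_error(status: int, code: str) -> dict:
    """Map an error response to a suggested action and a retry decision."""
    action = ERROR_ACTIONS.get((status, code), "unrecognized error; inspect the response body")
    # Assumption: client errors (4xx) will not succeed on retry, server errors might
    return {"status": status, "code": code, "action": action, "retryable": status >= 500}


print(classify_error(404, "not_found"))
```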
Code Examples
Python
import time

import requests
API_KEY = "your_api_key"
BASE = "https://metavision.vip.api.efficientstack.com/api/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
# URL-based submission with transcription
payload = {
"media": "https://example.com/video.mp4",
"media_type": "video",
"categories": ["title", "description"],
"detail": "balanced",
"transcribe": True,
"transcribe_diarize": True,
"transcribe_align": False
}
resp = requests.post(f"{BASE}/analyze", json=payload, headers=HEADERS)
resp.raise_for_status()  # stop early on 4xx/5xx instead of polling a missing job
job = resp.json()
print(f"Job: {job['job_id']}")
# Poll every 2 seconds until the job completes or fails
while True:
    result = requests.get(f"{BASE}/jobs/{job['job_id']}", headers=HEADERS).json()
    print(f"Status: {result['status']}")
    if result["status"] == "completed":
        print("Results:", result["data"])
        if "transcript" in result["data"]:
            print("Transcript:", result["data"]["transcript"])
        if result.get("pipeline_info"):
            print("Pipeline Info:", result["pipeline_info"])
        if result.get("pipeline_assets"):
            print("Assets:", result["pipeline_assets"])
        break
    elif result["status"] == "failed":
        print("Error:", result.get("error"))
        break
    time.sleep(2)
# Check account balance
balance = requests.get(f"{BASE}/account/balance", headers=HEADERS).json()
print(f"Total billed: ${balance['overall']['total_billed']}")
# Check prices
prices = requests.get(f"{BASE}/account/prices", headers=HEADERS).json()
print(f"Image price: ${prices['price_per_image']}/category")
# Browse request history
history = requests.get(f"{BASE}/account/requests?limit=10", headers=HEADERS).json()
for req in history["requests"]:
    print(f"{req['request_id'][:12]}... {req['status']} {req['media_type']}")
# Get full result for a past request (includes pipeline_info and pipeline_assets)
detail = requests.get(f"{BASE}/account/requests/{job['job_id']}", headers=HEADERS).json()
print("Full result:", detail)
# Cancel a queued job
cancel = requests.post(f"{BASE}/account/requests/{job['job_id']}/cancel", headers=HEADERS).json()
print("Cancel:", cancel)
JavaScript
const API_KEY = 'your_api_key';
const BASE = 'https://metavision.vip.api.efficientstack.com/api/v1';
const HEADERS = {'Authorization': `Bearer ${API_KEY}`, 'Content-Type': 'application/json'};
async function analyzeMedia() {
  const resp = await fetch(`${BASE}/analyze`, {
    method: 'POST', headers: HEADERS,
    body: JSON.stringify({
      media: 'https://example.com/video.mp4',
      media_type: 'video',
      categories: ['title', 'description'],
      detail: 'balanced',
      transcribe: true,
      transcribe_diarize: true
    })
  });
  const job = await resp.json();
  // Poll every 2 seconds until the job completes or fails
  while (true) {
    const result = await (await fetch(`${BASE}/jobs/${job.job_id}`, {headers: HEADERS})).json();
    if (result.status === 'completed') {
      console.log('Data:', result.data);
      console.log('Pipeline Info:', result.pipeline_info);
      console.log('Assets:', result.pipeline_assets);
      return result;
    }
    if (result.status === 'failed') throw new Error(result.error);
    await new Promise(r => setTimeout(r, 2000));
  }
}
// Account balance
async function getBalance() {
  return (await fetch(`${BASE}/account/balance`, {headers: HEADERS})).json();
}
// Prices
async function getPrices() {
  return (await fetch(`${BASE}/account/prices`, {headers: HEADERS})).json();
}
// Request history
async function getHistory(page = 1) {
  return (await fetch(`${BASE}/account/requests?page=${page}`, {headers: HEADERS})).json();
}
// Request detail (includes pipeline_info and pipeline_assets)
async function getRequestDetail(id) {
  return (await fetch(`${BASE}/account/requests/${id}`, {headers: HEADERS})).json();
}
// Cancel a queued job
async function cancelJob(id) {
  return (await fetch(`${BASE}/account/requests/${id}/cancel`, {method: 'POST', headers: HEADERS})).json();
}
// Log failures instead of leaving the promise rejection unhandled
analyzeMedia().catch(console.error);
cURL
# Submit analysis
JOB=$(curl -s -X POST "https://metavision.vip.api.efficientstack.com/api/v1/analyze" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"media":"https://example.com/video.mp4","media_type":"video","categories":["title"],"detail":"balanced","transcribe":true,"transcribe_diarize":true}')
JOB_ID=$(echo "$JOB" | jq -r '.job_id')
# Poll every 2 seconds until the job completes or fails
while true; do
  RESULT=$(curl -s "https://metavision.vip.api.efficientstack.com/api/v1/jobs/$JOB_ID" -H "Authorization: Bearer YOUR_API_KEY")
  STATUS=$(echo "$RESULT" | jq -r '.status')
  echo "Status: $STATUS"
  if [ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ]; then echo "$RESULT" | jq .; break; fi
  sleep 2
done
# View pipeline info from completed result
echo "$RESULT" | jq '.pipeline_info'
echo "$RESULT" | jq '.pipeline_assets'
# Account balance
curl -s "https://metavision.vip.api.efficientstack.com/api/v1/account/balance" -H "Authorization: Bearer YOUR_API_KEY" | jq .
# Account balance with date range
curl -s "https://metavision.vip.api.efficientstack.com/api/v1/account/balance?from=2024-01-01&to=2024-03-31" -H "Authorization: Bearer YOUR_API_KEY" | jq .
# Prices
curl -s "https://metavision.vip.api.efficientstack.com/api/v1/account/prices" -H "Authorization: Bearer YOUR_API_KEY" | jq .
# Request history
curl -s "https://metavision.vip.api.efficientstack.com/api/v1/account/requests?page=1&limit=10" -H "Authorization: Bearer YOUR_API_KEY" | jq .
# Request detail (includes pipeline_info and pipeline_assets)
curl -s "https://metavision.vip.api.efficientstack.com/api/v1/account/requests/$JOB_ID" -H "Authorization: Bearer YOUR_API_KEY" | jq .
# Cancel a queued job
curl -s -X POST "https://metavision.vip.api.efficientstack.com/api/v1/account/requests/$JOB_ID/cancel" -H "Authorization: Bearer YOUR_API_KEY" | jq .