This document provides detailed instructions for setting up, testing, and running the multimodal agents framework. It covers everything from API key configuration to hardware requirements and troubleshooting.
- Prerequisites
- Environment Setup
- API Key Configuration
- Hardware Requirements
- Running the Examples
- Testing Individual Components
- Common Issues & Troubleshooting
- Development Workflow
- Cost Management
- Production Deployment
| Software | Version | Purpose | Installation |
|---|---|---|---|
| Python | 3.11+ | Runtime | brew install python@3.11 or pyenv |
| pip/uv | Latest | Package manager | Comes with Python / pip install uv |
| ffmpeg | 6.0+ | Audio/video processing | brew install ffmpeg |
| PortAudio | Latest | Microphone access | brew install portaudio |
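The prerequisites in the table can also be checked from Python. This is a minimal standard-library sketch. `check_prerequisites` is a hypothetical helper and is not part of the framework. PortAudio is a shared library rather than a command, so this sketch does not check it. Use the `brew list portaudio` command that follows, or try importing `sounddevice`.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 11), tools=("ffmpeg",)):
    """Return {name: bool} for the Python version and each command-line tool."""
    results = {"python": sys.version_info[:2] >= min_python}
    for tool in tools:
        # shutil.which returns None when the executable is not found on PATH
        results[tool] = shutil.which(tool) is not None
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```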
# Check Python version
python3 --version # Should be 3.11+
# Check ffmpeg
ffmpeg -version
# Check PortAudio (for microphone)
brew list portaudio  # On macOS
# Install Homebrew if not present
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install system dependencies
brew install python@3.11 ffmpeg portaudio
# Grant camera/microphone permissions
# System Settings (System Preferences on older macOS) > Privacy & Security > Camera/Microphone
# Add Terminal.app or your IDE
# Ubuntu/Debian
sudo apt update
sudo apt install python3.11 python3.11-venv python3-pip ffmpeg portaudio19-dev libportaudio2
# Fedora
sudo dnf install python3.11 ffmpeg portaudio-devel
# Install uv
pip install uv
# Create virtual environment
cd multimodal-python-stack
uv venv
# Activate
source .venv/bin/activate # macOS/Linux
# or
.venv\Scripts\activate # Windows
# Install dependencies
uv pip install -r requirements.txt
# Alternatively, with plain pip:
# Create virtual environment
python3.11 -m venv .venv
# Activate
source .venv/bin/activate
# Upgrade pip
pip install --upgrade pip
# Install dependencies
pip install -r requirements.txt
# Alternatively, with Poetry:
# Install poetry
pip install poetry
# Install dependencies
poetry install
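# Activate shell (on Poetry 2.x, `poetry shell` requires the poetry-plugin-shell plugin; `poetry env activate` is the built-in alternative)
poetry shell
# Test imports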
python -c "
from src.core.agent import AgentLoop
from src.models import create_model
from src.inputs import WebcamInput
print('All imports successful!')
"
cp .env.example .env
Edit .env with your actual API keys:
# ===========================================
# MODEL PROVIDERS
# ===========================================
# OpenAI - Required for GPT-4o and Whisper transcription
# Get key: https://platform.openai.com/api-keys
OPENAI_API_KEY=sk-proj-...
# Anthropic - Required for Claude models
# Get key: https://console.anthropic.com/settings/keys
ANTHROPIC_API_KEY=sk-ant-api03-...
# Google - Required for Gemini models
# Get key: https://aistudio.google.com/app/apikey
GOOGLE_API_KEY=AIza...
# Groq - Required for fast Llama inference
# Get key: https://console.groq.com/keys
GROQ_API_KEY=gsk_...
# Fireworks - Required for FireLLaVA
# Get key: https://fireworks.ai/api-keys
FIREWORKS_API_KEY=fw_...
# Together - Required for Together models
# Get key: https://api.together.xyz/settings/api-keys
TOGETHER_API_KEY=...
# ===========================================
# TOOL INTEGRATIONS
# ===========================================
# Slack Webhooks - For alert examples
# Create: https://api.slack.com/messaging/webhooks
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/T.../B.../...
# Notion - For run-sheet logging examples
# Create integration: https://www.notion.so/my-integrations
# Then share a database with your integration
NOTION_API_KEY=secret_...
NOTION_DATABASE_ID=...
# Test OpenAI
python -c "
import os
from dotenv import load_dotenv
from openai import OpenAI
load_dotenv()
client = OpenAI()
response = client.chat.completions.create(
model='gpt-4o-mini',
messages=[{'role': 'user', 'content': 'Say hello'}],
max_tokens=10
)
print('OpenAI:', response.choices[0].message.content)
"
# Test Anthropic
python -c "
import os
from dotenv import load_dotenv
from anthropic import Anthropic
load_dotenv()
client = Anthropic()
response = client.messages.create(
model='claude-3-5-haiku-latest',
max_tokens=10,
messages=[{'role': 'user', 'content': 'Say hello'}]
)
print('Anthropic:', response.content[0].text)
"
# Test Google
python -c "
import os
from dotenv import load_dotenv
import google.generativeai as genai
load_dotenv()
genai.configure(api_key=os.getenv('GOOGLE_API_KEY'))
model = genai.GenerativeModel('gemini-1.5-flash')
response = model.generate_content('Say hello')
print('Google:', response.text)
"
- Go to https://platform.openai.com/signup
- Create account or sign in
- Navigate to API Keys: https://platform.openai.com/api-keys
- Click "Create new secret key"
- Copy the key (starts with `sk-proj-`)
- Add billing: https://platform.openai.com/account/billing
Pricing (as of 2025):
- GPT-4o: $2.50/1M input, $10.00/1M output
- GPT-4o-mini: $0.15/1M input, $0.60/1M output
- Whisper: $0.006/minute
- Go to https://console.anthropic.com/
- Create account or sign in
- Navigate to API Keys: https://console.anthropic.com/settings/keys
- Click "Create Key"
- Copy the key (starts with `sk-ant-`)
Pricing:
- Claude 3.5 Sonnet: $3.00/1M input, $15.00/1M output
- Claude 3.5 Haiku: $0.80/1M input, $4.00/1M output
- Go to https://aistudio.google.com/
- Sign in with Google account
- Click "Get API key" in the top right
- Create API key for a new or existing project
- Copy the key (starts with `AIza`)
Pricing:
- Gemini 1.5 Flash: $0.075/1M input, $0.30/1M output (cheapest!)
- Gemini 1.5 Pro: $1.25/1M input, $5.00/1M output
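The prices above are quoted per million tokens. A single request therefore costs `tokens / 1,000,000 × price`, summed over input and output. Here is a small sketch that uses rates copied from this section. They are hard-coded and will drift as providers change pricing. `request_cost` is a hypothetical helper.

```python
# USD per 1M tokens (input, output), copied from the pricing lists in this section
PRICES_PER_1M = {
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-5-sonnet": (3.00, 15.00),
    "claude-3-5-haiku": (0.80, 4.00),
    "gemini-1.5-flash": (0.075, 0.30),
    "gemini-1.5-pro": (1.25, 5.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request: tokens / 1M * price, for input plus output."""
    price_in, price_out = PRICES_PER_1M[model]
    return input_tokens / 1_000_000 * price_in + output_tokens / 1_000_000 * price_out
```

For example, consider a Gemini 1.5 Flash request with 1,000 input tokens and 100 output tokens. It costs 1,000/1M × $0.075 + 100/1M × $0.30 = $0.000105.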
- Go to https://console.groq.com/
- Create account or sign in
- Navigate to API Keys: https://console.groq.com/keys
- Click "Create API Key"
- Copy the key (starts with `gsk_`)
Pricing:
- Llama 3.2 90B Vision: $0.11/1M tokens
- Llama 3.2 11B Vision: $0.05/1M tokens
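An offline check on the key prefixes listed above can catch a key pasted into the wrong variable before any API call is made. This is a sketch, and `check_keys` is a hypothetical helper. It only checks the format, not whether the provider accepts the key. The OpenAI check uses the shorter `sk-` prefix so that older, non-project keys also pass.

```python
import os
from typing import Dict, Mapping, Optional

# Prefixes from the provider steps above ("sk-" also accepts older OpenAI keys)
EXPECTED_PREFIXES = {
    "OPENAI_API_KEY": "sk-",
    "ANTHROPIC_API_KEY": "sk-ant-",
    "GOOGLE_API_KEY": "AIza",
    "GROQ_API_KEY": "gsk_",
}


def check_keys(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Classify each provider key as 'ok', 'missing', or 'unexpected prefix'."""
    env = os.environ if env is None else env
    report = {}
    for name, prefix in EXPECTED_PREFIXES.items():
        value = env.get(name, "").strip()
        if not value:
            report[name] = "missing"
        elif not value.startswith(prefix):
            report[name] = "unexpected prefix"
        else:
            report[name] = "ok"
    return report
```

Call `load_dotenv()` first so that `.env` values are present in `os.environ`, then run `print(check_keys())`.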
- Go to https://api.slack.com/apps
- Click "Create New App" > "From scratch"
- Name it (e.g., "Multimodal Agent") and select workspace
- Go to "Incoming Webhooks" in sidebar
- Toggle "Activate Incoming Webhooks" ON
- Click "Add New Webhook to Workspace"
- Select the channel for alerts
- Copy the webhook URL
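You can confirm that the webhook works independently of the framework. Slack incoming webhooks accept a POST with a JSON body containing a `text` field. This is a standard-library sketch. `build_slack_request` and `send_slack_message` are hypothetical helpers and are not the framework's `SlackAlertTool`.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request with the JSON body {"text": ...}."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_message(webhook_url: str, text: str, timeout: float = 10.0) -> int:
    """Send the message and return the HTTP status (Slack answers 200 with body "ok")."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Usage: `send_slack_message(os.environ["SLACK_WEBHOOK_URL"], "Test from setup")` should post to the channel you selected when creating the webhook.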
- Go to https://www.notion.so/my-integrations
- Click "New integration"
- Name it (e.g., "Multimodal Agent")
- Select workspace
- Copy the "Internal Integration Token" (starts with `secret_`; newer integrations may use `ntn_`)
Database Setup:
- Create a new Notion database with these properties:
  - Title (title type)
  - Status (select: pending, in_progress, completed, blocked)
  - Notes (rich text)
  - Timestamp (date)
  - Tags (multi-select, optional)
- Click "..." menu > "Add connections" > Select your integration
- Copy the database ID from the URL (the 32-character hex string before `?v=`):
  - URL: https://www.notion.so/myworkspace/abc123def456...
  - Database ID: abc123def456...
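The database-ID step above can be automated. In a database URL, the path ends with the 32-character ID, which may be preceded by the page title. The view ID is carried in the `?v=` query parameter. This is a standard-library sketch, and `notion_database_id` is a hypothetical helper.

```python
import re
from urllib.parse import urlparse

_HEX32 = re.compile(r"[0-9a-fA-F]{32}")


def notion_database_id(url: str) -> str:
    """Return the 32-character database ID from a Notion database URL."""
    # Take the last path segment; urlparse moves "?v=<view id>" into .query
    segment = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    # Titled URLs look like "Run-Sheet-<id>": drop dashes, keep the last 32 chars
    candidate = segment.replace("-", "")[-32:]
    if not _HEX32.fullmatch(candidate):
        raise ValueError(f"no 32-character database ID found in {url!r}")
    return candidate.lower()
```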
| Component | Requirement | Notes |
|---|---|---|
| CPU | 4 cores | For video encoding/decoding |
| RAM | 8 GB | 16 GB recommended |
| Storage | 1 GB free | For dependencies and temp files |
| Camera | USB or built-in | For webcam examples |
| Microphone | Any | For audio examples |
Built-in webcams work out of the box; the device ID is typically 0.
from src.inputs import WebcamInput
webcam = WebcamInput(device_id=0)
# List available cameras (macOS)
system_profiler SPCameraDataType
# List available cameras (Linux; v4l2-ctl is in the v4l-utils package)
v4l2-ctl --list-devices
# An external camera is usually device_id=1
webcam = WebcamInput(device_id=1)
from src.inputs import RTSPInput
# Common RTSP URL formats:
# Hikvision: rtsp://admin:password@192.168.1.100:554/Streaming/Channels/101
# Dahua: rtsp://admin:password@192.168.1.100:554/cam/realmonitor?channel=1&subtype=0
# Generic: rtsp://user:pass@ip:port/path
camera = RTSPInput(
    url="rtsp://admin:password@192.168.1.100:554/stream",
    fps=1.0,
    auto_reconnect=True,
)
# Quick camera test
import cv2
cap = cv2.VideoCapture(0)  # Try 0, 1, 2...
if cap.isOpened():
    ret, frame = cap.read()
    print(f"Camera working: {ret}, Frame shape: {frame.shape if ret else 'N/A'}")
    cap.release()
else:
    print("Camera not found")
- System Preferences > Privacy & Security > Microphone
- Enable for Terminal.app or your IDE
import sounddevice as sd
print(sd.query_devices())
Output (`>` marks the default input device, `<` the default output device):
> 0 MacBook Pro Microphone, Core Audio (1 in, 0 out)
< 1 MacBook Pro Speakers, Core Audio (0 in, 2 out)
  2 External USB Mic, Core Audio (1 in, 0 out)
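Device indices can shift when hardware is plugged in or removed, so looking a microphone up by name is sturdier than hard-coding an index. This sketch relies on the `name` and `max_input_channels` keys in the device dicts returned by `sounddevice.query_devices()`. `find_input_device` is a hypothetical helper.

```python
from typing import Any, Mapping, Sequence


def find_input_device(devices: Sequence[Mapping[str, Any]], name_fragment: str) -> int:
    """Return the index of the first input-capable device whose name contains name_fragment."""
    for index, device in enumerate(devices):
        # Skip output-only devices such as speakers
        if name_fragment.lower() in device["name"].lower() and device["max_input_channels"] > 0:
            return index
    raise LookupError(f"no input device matching {name_fragment!r}")
```

Usage: `MicrophoneInput(device_id=find_input_device(sd.query_devices(), "USB Mic"))`.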
Use the index as device_id:
from src.inputs import MicrophoneInput
mic = MicrophoneInput(device_id=0) # Built-in mic
mic = MicrophoneInput(device_id=2)  # External USB mic
import sounddevice as sd
import numpy as np
duration = 3 # seconds
sample_rate = 16000
print("Recording...")
audio = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=1)
sd.wait()
print(f"Recorded {len(audio)} samples, max amplitude: {np.max(np.abs(audio)):.4f}")
Requirements: OpenAI API key, webcam
python examples/01_basic_webcam.py
What it does:
- Captures frames from webcam every 3 seconds
- Sends to GPT-4o-mini for description
- Prints observations to console
Expected output:
Starting basic webcam agent...
Press Ctrl+C to stop
[Agent] I see a person sitting at a desk with a laptop, appears to be in a home office setting.
[Agent] The scene is similar, the person is now typing on the keyboard.
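The basic webcam agent sends a frame every 3 seconds rather than every captured frame. This guide does not show how the framework schedules that. The following is an illustrative sketch of interval-based throttling with a monotonic clock. `FrameThrottle` is a hypothetical name, and the clock is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Optional


class FrameThrottle:
    """Let a frame through at most once per interval_s seconds."""

    def __init__(self, interval_s: float = 3.0, clock: Callable[[], float] = time.monotonic):
        self.interval_s = interval_s
        self.clock = clock
        self._last: Optional[float] = None

    def should_send(self) -> bool:
        now = self.clock()
        # Send the first frame immediately, then only once the interval has elapsed
        if self._last is None or now - self._last >= self.interval_s:
            self._last = now
            return True
        return False
```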
Requirements: Anthropic or OpenAI API key, webcam or RTSP camera, Slack webhook (optional)
# With webcam (demo mode)
python examples/02_security_monitor.py
# With RTSP camera
RTSP_URL="rtsp://user:pass@192.168.1.100:554/stream" python examples/02_security_monitor.py
What it does:
- Monitors camera feed every 5 seconds
- Detects people, unusual activity, hazards
- Sends Slack alerts when something is detected
Expected output:
==================================================
Security Monitor
==================================================
✓ Anthropic (claude-3-5-haiku)
Slack alerts enabled
Using webcam as demo...
Monitoring started. Press Ctrl+C to stop.
--------------------------------------------------
[Observation] The frame shows an empty room with a desk and chair. No people or unusual activity detected.
--------------------------------------------------
[Observation] A person has entered the frame from the left side. They appear to be walking toward the desk.
[Alert Triggered] send_slack_alert: Person detected entering monitored area
[Alert Sent] ✓
--------------------------------------------------
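The `[Alert Triggered] send_slack_alert: ...` line reflects a tool call. The model names a tool and supplies arguments, and the agent runs the matching handler. Here is a sketch of that dispatch step under stated assumptions. `ToolRegistry` and `fake_slack_alert` are hypothetical. The `message`/`severity` arguments mirror the `SlackAlertTool.execute` call shown later in this guide, and the fake handler makes no network request.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

Handler = Callable[..., Awaitable[Any]]


class ToolRegistry:
    """Maps tool names (as the model emits them) to async handler functions."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        self._handlers[name] = handler

    async def dispatch(self, name: str, arguments: Dict[str, Any]) -> Any:
        handler = self._handlers.get(name)
        if handler is None:
            raise KeyError(f"model requested unknown tool {name!r}")
        # The model supplies arguments as a JSON object; pass them as keyword arguments
        return await handler(**arguments)


async def fake_slack_alert(message: str, severity: str = "warning") -> str:
    # Stand-in for a real webhook call: returns the text that would be posted
    return f"[{severity}] {message}"


if __name__ == "__main__":
    registry = ToolRegistry()
    registry.register("send_slack_alert", fake_slack_alert)
    print(asyncio.run(registry.dispatch(
        "send_slack_alert", {"message": "Person detected entering monitored area"})))
```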
Requirements: Google or OpenAI API key, webcam, Notion API (optional)
python examples/03_quality_inspector.py
What it does:
- Simulates manufacturing line inspection
- Analyzes each frame for defects
- Logs all inspections to Notion
- Triggers PLC reject on failures (simulated)
Expected output:
============================================================
Manufacturing Quality Inspector
============================================================
Using Gemini 1.5 Flash (cost-optimized)
Notion logging disabled (set NOTION_API_KEY and NOTION_DATABASE_ID)
PLC control enabled (simulation mode)
------------------------------------------------------------
Quality inspection started. Press Ctrl+C to stop.
------------------------------------------------------------
[Inspector] Analyzing product in frame. The item appears to be...
[✓ PASSED] Inspection #2024-01-15T10:23:45
[Stats] Total: 1 | Passed: 1 | Failed: 0 | Pass Rate: 100.0% | Rate: 12.0/min
[Inspector] Analyzing product in frame. I notice a visible scratch...
[✗ FAILED] Inspection #2024-01-15T10:23:47
Reason: Surface scratch detected on upper left quadrant, approximately 2cm long
[PLC] Reject triggered - Register 100
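The `[Stats]` line combines running counters with a throughput figure. The example's exact formula is not shown. This sketch assumes the rate is inspections per elapsed minute, which reproduces the `Rate: 12.0/min` above for one inspection 5 seconds into the run. `InspectionStats` is a hypothetical name.

```python
import time
from typing import Callable


class InspectionStats:
    """Running pass/fail counters and throughput for an inspection loop."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.started = clock()
        self.passed = 0
        self.failed = 0

    def record(self, passed: bool) -> None:
        if passed:
            self.passed += 1
        else:
            self.failed += 1

    def summary(self) -> str:
        total = self.passed + self.failed
        pass_rate = 100.0 * self.passed / total if total else 0.0
        elapsed_min = (self.clock() - self.started) / 60
        rate = total / elapsed_min if elapsed_min > 0 else 0.0
        return (f"Total: {total} | Passed: {self.passed} | Failed: {self.failed} | "
                f"Pass Rate: {pass_rate:.1f}% | Rate: {rate:.1f}/min")
```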
Requirements: OpenAI API key (for GPT-4o and Whisper), webcam, microphone
python examples/04_meeting_assistant.py
What it does:
- Records audio continuously
- Captures video periodically (every 30s)
- Transcribes speech with Whisper
- Extracts action items and decisions
- Logs to Notion, sends summaries to Slack
Expected output:
============================================================
Meeting Assistant
============================================================
Using GPT-4o for meeting analysis
Notion action items enabled
Slack summaries enabled
Using microphone + webcam
------------------------------------------------------------
Meeting recording started. Press Ctrl+C to end meeting.
------------------------------------------------------------
[Assistant] The meeting has begun. I can see 3 people in the room...
📋 ACTION ITEM: [ACTION] John to prepare Q4 budget proposal
[Duration: 5 min | Action Items: 1 | Decisions: 0]
✅ DECISION: [DECISION] Team agreed to postpone launch to March
[Duration: 8 min | Action Items: 1 | Decisions: 1]
📤 Slack Summary Sent
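The output above suggests that the model tags items with `[ACTION]` and `[DECISION]`. Assuming the model is prompted to emit one tagged item per line, a small regex can pull the items out of a response. `extract_items` is a hypothetical helper, not the example's code.

```python
import re
from typing import Dict, List

# One tagged item per line, e.g. "[ACTION] John to prepare Q4 budget proposal"
TAG_RE = re.compile(r"^\s*\[(ACTION|DECISION)\]\s*(.+?)\s*$", re.MULTILINE)


def extract_items(text: str) -> Dict[str, List[str]]:
    """Group tagged lines from a model response into action items and decisions."""
    items: Dict[str, List[str]] = {"ACTION": [], "DECISION": []}
    for tag, body in TAG_RE.findall(text):
        items[tag].append(body)
    return items
```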
Requirements: API keys for providers you want to test
python examples/05_benchmark_providers.py
What it does:
- Tests all available providers
- Runs standardized scenarios
- Generates latency and cost tables
- Saves results to JSON
Expected output:
======================================================================
Multimodal Model Benchmark
======================================================================
Checking available providers...
✓ OpenAI (gpt-4o, gpt-4o-mini)
✓ Anthropic (claude-3-5-haiku, claude-3-5-sonnet)
✓ Google (gemini-1.5-flash)
✗ Groq (set GROQ_API_KEY)
✗ Fireworks (set FIREWORKS_API_KEY)
✗ Together (set TOGETHER_API_KEY)
Running benchmarks on 5 models...
Scenarios: single_frame, multi_frame, detailed_analysis, tool_calling
----------------------------------------------------------------------
Starting benchmarks (this may take a few minutes)...
----------------------------------------------------------------------
Benchmarking openai/gpt-4o-mini - single_frame
Benchmarking openai/gpt-4o-mini - multi_frame
...
======================================================================
Results
======================================================================
LATENCY (p50, milliseconds)
----------------------------------------------------------------------
| Provider | Model | single_frame | multi_frame | tool_calling |
|---|---|---:|---:|---:|
| openai | gpt-4o-mini | 423ms | 612ms | 489ms |
| openai | gpt-4o | 834ms | 1156ms | 923ms |
| anthropic | claude-3-5-haiku-latest | 367ms | 542ms | 421ms |
...
Results saved to: benchmarks/results/benchmark_results.json
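The LATENCY table reports p50, the median of repeated runs. The guide does not show how `05_benchmark_providers.py` collects its samples. This sketch uses hypothetical helpers and is synchronous for simplicity, although the framework's model calls are async. It shows the measurement and why p50 is a useful summary: a single slow request does not move the median the way it moves the mean.

```python
import statistics
import time
from typing import Callable, List


def measure_latencies_ms(call: Callable[[], object], runs: int = 5) -> List[float]:
    """Time `call` `runs` times with a high-resolution clock; return milliseconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


def p50(latencies_ms: List[float]) -> float:
    """Median latency: robust to one slow outlier, unlike the mean."""
    return statistics.median(latencies_ms)
```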
# Test webcam
import asyncio
from src.inputs import WebcamInput
async def test_webcam():
    webcam = WebcamInput(device_id=0, fps=1.0)
    count = 0
    async for frame in webcam.stream():
        print(f"Frame {count}: shape={frame.shape}, source={frame.source}")
        count += 1
        if count >= 3:
            break
    await webcam.close()
asyncio.run(test_webcam())
# Test microphone
import asyncio
from src.inputs import MicrophoneInput
async def test_mic():
    mic = MicrophoneInput(sample_rate=16000, chunk_duration=2.0)
    count = 0
    async for chunk in mic.stream():
        print(f"Chunk {count}: samples={len(chunk.data)}, duration={chunk.duration_seconds:.2f}s")
        count += 1
        if count >= 3:
            break
    await mic.close()
asyncio.run(test_mic())
# Test model with a single frame
import asyncio
import numpy as np
from src.models import create_model
from src.core.types import Frame
async def test_model(provider, model_id):
    model = create_model(provider, model_id)
    # Create a test frame (random-noise RGB image)
    test_frame = Frame(
        data=np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8),
        source="test",
    )
    print(f"Testing {provider}/{model_id}...")
    async for event in model.analyze(
        frames=[test_frame],
        audio_transcript=None,
        tools=[],
        context=[],
        system_prompt="Describe this image briefly.",
    ):
        print(f"  Event: {type(event).__name__}")
        if hasattr(event, "content"):
            print(f"  Content: {event.content[:100]}...")
# Test each provider
asyncio.run(test_model("openai", "gpt-4o-mini"))
asyncio.run(test_model("anthropic", "claude-3-5-haiku-latest"))
asyncio.run(test_model("google", "gemini-1.5-flash"))
# Test Slack tool (requires SLACK_WEBHOOK_URL)
import asyncio
import os
from dotenv import load_dotenv
from src.tools import SlackAlertTool
load_dotenv()
async def test_slack():
    tool = SlackAlertTool(
        webhook_url=os.getenv("SLACK_WEBHOOK_URL"),
        default_channel="#test-alerts",
    )
    result = await tool.execute(
        message="Test alert from multimodal agent",
        severity="info",
    )
    print(f"Slack result: {result}")
asyncio.run(test_slack())
# Test Notion tool (requires NOTION_API_KEY and NOTION_DATABASE_ID)
import asyncio
import os
from dotenv import load_dotenv
from src.tools import NotionRunSheetTool
load_dotenv()
async def test_notion():
    tool = NotionRunSheetTool(
        api_key=os.getenv("NOTION_API_KEY"),
        database_id=os.getenv("NOTION_DATABASE_ID"),
    )
    result = await tool.execute(
        title="Test Entry from Multimodal Agent",
        status="completed",
        notes="This is a test entry created by the setup script.",
    )
    print(f"Notion result: {result}")
asyncio.run(test_notion())
from src.memory import SlidingWindowMemory
from src.core.types import Message
from datetime import datetime
memory = SlidingWindowMemory(max_messages=5)
# Add 7 messages to a window that holds 5
for i in range(7):
    memory.add(Message(
        role="user" if i % 2 == 0 else "assistant",
        content=f"Message {i}",
        timestamp=datetime.now(),
    ))
# Check what's retained
context = memory.get_context()
print(f"Messages in memory: {len(context)}")
for msg in context:
    print(f"  {msg.role}: {msg.content}")
Problem: cv2.VideoCapture cannot open the camera (isOpened() returns False)
cap = cv2.VideoCapture(0)
print(cap.isOpened())  # False
Solutions:
- Check permissions (macOS):
  - System Preferences > Privacy & Security > Camera
  - Enable for Terminal.app
- Try different device IDs:
    import cv2
    for i in range(5):
        cap = cv2.VideoCapture(i)
        if cap.isOpened():
            print(f"Camera found at index {i}")
        cap.release()
- Check if camera is in use by another app:
    # macOS
    lsof | grep -i camera
- Restart camera service (macOS):
    sudo killall VDCAssistant
    sudo killall AppleCameraAssistant
Problem: sounddevice.PortAudioError
Solutions:
- Install PortAudio:
    # macOS
    brew install portaudio
    # Then reinstall sounddevice
    pip uninstall sounddevice
    pip install sounddevice
- Check permissions (macOS):
  - System Preferences > Privacy & Security > Microphone
  - Enable for Terminal.app
- List devices and use explicit ID:
    import sounddevice as sd
    print(sd.query_devices())
    # Use the correct index in MicrophoneInput(device_id=X)
Problem: openai.AuthenticationError
Solution: Check API key is set correctly:
import os
from dotenv import load_dotenv
load_dotenv()
print(f"Key starts with: {os.getenv('OPENAI_API_KEY', '')[:10]}...")
Problem: anthropic.RateLimitError
Solution: Add delays between requests or upgrade plan:
import asyncio
await asyncio.sleep(1)  # Add between requests
Problem: google.api_core.exceptions.ResourceExhausted
Solution: Gemini has strict rate limits. Add delays:
config = AgentConfig(frame_interval_ms=5000)  # Slow down
Problem: ModuleNotFoundError: No module named 'src'
Solution: Run from project root or add to path:
import sys
sys.path.insert(0, '/path/to/multimodal-python-stack')
Or set PYTHONPATH:
export PYTHONPATH="${PYTHONPATH}:/path/to/multimodal-python-stack"
Problem: High memory usage with video
Solution: Reduce frame size and buffer:
webcam = WebcamInput(
    max_size=256,  # Smaller frames
    fps=0.5,  # Slower capture
)
config = AgentConfig(
    max_frames=2,  # Keep fewer frames
    max_context_messages=10,  # Smaller context
)
Problem: Slow response times
Solutions:
- Use faster models:
    model = create_model("groq", "llama-3.2-11b-vision-preview")  # Fastest
    model = create_model("google", "gemini-1.5-flash")  # Fast and cheap
- Reduce frame size:
    webcam = WebcamInput(max_size=256)
- Use low detail mode (OpenAI):
    model = OpenAIVisionModel(model_id="gpt-4o-mini", image_detail="low")
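A smaller `max_size` shrinks each uploaded frame. For providers that bill by image size, it can also reduce input tokens. This guide does not document the framework's resizing rule. The sketch below assumes that `max_size` caps the longer side while preserving aspect ratio. `fit_within` is a hypothetical helper.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_size: int) -> Tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_size, keeping the aspect ratio."""
    longest = max(width, height)
    if longest <= max_size:
        return width, height  # already small enough; never upscale
    scale = max_size / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With OpenCV, `h, w = frame.shape[:2]` followed by `cv2.resize(frame, fit_within(w, h, 256), interpolation=cv2.INTER_AREA)` applies the result. Note that `cv2.resize` expects the size as `(width, height)`.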
# Install dev dependencies
pip install pytest pytest-asyncio
# Run all tests
pytest tests/
# Run specific test
pytest tests/test_core.py -v
# Run with coverage
pip install pytest-cov
pytest --cov=src tests/
# Install ruff
pip install ruff
# Format code
ruff format .
# Check linting
ruff check .
# Fix auto-fixable issues
ruff check --fix .
# Install mypy
pip install mypy
# Run type checker
mypy src/
- Create `src/models/newprovider.py`:
from src.models.base import VisionLanguageModel, ModelInfo
class NewProviderVisionModel(VisionLanguageModel):
    provider = "newprovider"
    MODELS = {
        "model-name": ModelInfo(
            model_id="model-name",
            provider="newprovider",
            display_name="Model Name",
            max_images=10,
            supports_video=False,
            supports_tools=True,
            cost_per_1k_input=0.001,
            cost_per_1k_output=0.002,
            context_window=128000,
        ),
    }
    async def analyze(self, frames, audio_transcript, tools, context, system_prompt):
        # Implementation: yield events; callers consume analyze() with `async for`
        ...
- Register in `src/models/__init__.py`:
from src.models.newprovider import NewProviderVisionModel
PROVIDERS = {
    # ...existing providers...
    "newprovider": NewProviderVisionModel,
}
- Add API key to `.env.example`:
NEWPROVIDER_API_KEY=...
- Create `src/tools/newtool.py`:
from src.tools.base import Tool
from src.core.types import ToolResult
class NewTool(Tool):
    name = "new_tool"
    description = "Does something useful"
    parameters = {
        "type": "object",
        "properties": {
            "arg1": {"type": "string", "description": "First argument"},
        },
        "required": ["arg1"],
    }
    async def execute(self, arg1: str, **kwargs) -> ToolResult:
        # Implementation
        return ToolResult(output={"result": "success"})
- Register in `src/tools/__init__.py`:
from src.tools.newtool import NewTool
__all__ = [..., "NewTool"]
Estimate session cost from the model's per-token prices:
from src.models import create_model
model = create_model("openai", "gpt-4o-mini")
# Estimate cost for a session
frames_per_minute = 12 # 1 frame every 5 seconds
minutes = 60
total_frames = frames_per_minute * minutes
# ~85 tokens per low-detail image (gpt-4o's rate; gpt-4o-mini counts each image
# as many more tokens, so treat this as a lower bound for mini)
# ~50 tokens output per response
tokens_in = total_frames * 85
tokens_out = total_frames * 50
cost_in = (tokens_in / 1000) * model.cost_per_1k_input_tokens
cost_out = (tokens_out / 1000) * model.cost_per_1k_output_tokens
total_cost = cost_in + cost_out
print(f"Estimated cost for 1 hour: ${total_cost:.2f}")
| Provider | Model | Est. Cost/Hour |
|---|---|---|
| Google | gemini-1.5-flash | $0.03 |
| Groq | llama-3.2-11b | $0.05 |
| OpenAI | gpt-4o-mini | $0.15 |
| Anthropic | claude-3.5-haiku | $0.60 |
| OpenAI | gpt-4o | $3.00 |
| Anthropic | claude-3.5-sonnet | $4.50 |
class BudgetTracker:
    def __init__(self, budget_usd: float):
        self.budget = budget_usd
        self.spent = 0.0
    def add_cost(self, cost: float):
        self.spent += cost
        # Check the hard limit first so an overrun raises instead of only warning
        if self.spent > self.budget:
            raise RuntimeError(f"Budget exceeded: ${self.spent:.2f}/${self.budget:.2f}")
        if self.spent > self.budget * 0.8:
            print(f"WARNING: 80% of budget used (${self.spent:.2f}/${self.budget:.2f})")
# Usage
tracker = BudgetTracker(budget_usd=10.0)
# Call tracker.add_cost() after each API call
# Dockerfile
FROM python:3.11-slim
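# Install system dependencies
# (libgl1 replaces libgl1-mesa-glx, which current Debian-based python:*-slim images no longer provide)
RUN apt-get update && apt-get install -y --no-install-recommends \
    ffmpeg \
    libportaudio2 \
    libgl1 \
    libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*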
WORKDIR /app
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy source code
COPY src/ src/
COPY examples/ examples/
# Run
CMD ["python", "examples/02_security_monitor.py"]
# Build and run
docker build -t multimodal-agent .
docker run --env-file .env --device /dev/video0 multimodal-agent  # --device camera passthrough requires a Linux host
# Production .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
# Logging
LOG_LEVEL=INFO
LOG_FORMAT=json
# Rate limiting
MAX_REQUESTS_PER_MINUTE=60
# Cost controls
BUDGET_USD_PER_HOUR=5.0
# Monitoring
SENTRY_DSN=https://...
# healthcheck.py
import asyncio
from src.models import create_model
async def check_health():
    checks = {}
    # Check OpenAI (only creates the model object; no API request is sent)
    try:
        create_model("openai", "gpt-4o-mini")
        checks["openai"] = "ok"
    except Exception as e:
        checks["openai"] = f"error: {e}"
    # Check camera
    try:
        import cv2
        cap = cv2.VideoCapture(0)
        checks["camera"] = "ok" if cap.isOpened() else "error: not found"
        cap.release()
    except Exception as e:
        checks["camera"] = f"error: {e}"
    return checks
if __name__ == "__main__":
    results = asyncio.run(check_health())
    for check, status in results.items():
        print(f"{check}: {status}")
import structlog
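import logging
# structlog hands events to stdlib logging here, so configure it; otherwise
# filter_by_level drops INFO events (the root logger defaults to WARNING)
logging.basicConfig(level=logging.INFO, format="%(message)s")
# Configure structured logging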
structlog.configure(
processors=[
structlog.stdlib.filter_by_level,
structlog.stdlib.add_logger_name,
structlog.stdlib.add_log_level,
structlog.processors.TimeStamper(fmt="iso"),
structlog.processors.JSONRenderer()
],
wrapper_class=structlog.stdlib.BoundLogger,
context_class=dict,
logger_factory=structlog.stdlib.LoggerFactory(),
)
logger = structlog.get_logger()
# Usage
logger.info("agent_started", model="gpt-4o-mini", input="webcam")
logger.info("frame_processed", latency_ms=423, tokens=156)
logger.warning("rate_limit_approaching", requests=58, limit=60)
logger.error("api_error", provider="openai", error="timeout")
# Basic webcam
python examples/01_basic_webcam.py
# Security monitor
python examples/02_security_monitor.py
# Quality inspector
python examples/03_quality_inspector.py
# Meeting assistant
python examples/04_meeting_assistant.py
# Benchmarks
python examples/05_benchmark_providers.py
| Variable | Required For | Example |
|---|---|---|
| OPENAI_API_KEY | OpenAI, Whisper | sk-proj-... |
| ANTHROPIC_API_KEY | Anthropic | sk-ant-... |
| GOOGLE_API_KEY | Gemini | AIza... |
| GROQ_API_KEY | Groq | gsk_... |
| SLACK_WEBHOOK_URL | Slack alerts | https://hooks.slack.com/... |
| NOTION_API_KEY | Notion logging | secret_... |
| NOTION_DATABASE_ID | Notion logging | abc123... |
| RTSP_URL | IP cameras | rtsp://user:pass@ip:port/path |
| Use Case | Recommended | Command |
|---|---|---|
| Cheapest | Gemini Flash | create_model("google", "gemini-1.5-flash") |
| Fastest | Groq Llama | create_model("groq", "llama-3.2-11b-vision-preview") |
| Best balance | GPT-4o-mini | create_model("openai", "gpt-4o-mini") |
| Best quality | GPT-4o | create_model("openai", "gpt-4o") |
| Best reasoning | Claude Sonnet | create_model("anthropic", "claude-3-5-sonnet-latest") |