Jarvis — AI Assistant on Reachy Mini

An embodied AI assistant inspired by Jarvis from Iron Man, running on the Reachy Mini robot with the OpenAI Agents SDK as its brain.

Design Principles

  1. Latency is king. First audible response within 300-600 ms (filler or real); the full answer streams after. People forgive dumb; they don't forgive slow.

  2. Presence, not request/response. A continuous 30Hz "presence loop" runs independent of the LLM — breathing, micro-nods, gaze tracking. The robot feels alive even when silent.

  3. Embodiment as policy, not library calls. The LLM outputs an "embodiment plan" (intent, prosody, motion primitives) with each response. A renderer maps those to physical behavior. No random "play happy" uncanny valley.

  4. Barge-in. The user can interrupt at any time: TTS stops immediately and the new utterance is captured. This one behavior does more than any other to make the conversation feel real.

  5. Guardrails. Destructive smart home actions use dry-run by default. Everything is audit-logged. Permissions model for sensitive operations.

  6. Honest state broadcasting. It's obvious when Jarvis is listening vs idle vs muted, through posture and behavior, not just an LED.
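
Principle 3 can be sketched as a small data structure plus a renderer. This is a minimal illustration, not the project's actual schema: the field names, the `PRIMITIVES` table, and the joint names are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical motion primitives: each name expands to (joint, degrees) targets.
PRIMITIVES = {
    "nod": [("head_pitch", 8.0), ("head_pitch", 0.0)],
    "tilt": [("head_roll", 10.0)],
    "lean_in": [("body_lean", 5.0)],
}


@dataclass
class EmbodimentPlan:
    intent: str                  # e.g. "confirm", "apologize"
    prosody: str = "neutral"     # hint passed along to the TTS voice
    motions: list[str] = field(default_factory=list)


def render(plan: EmbodimentPlan) -> list[tuple[str, float]]:
    """Expand named primitives into joint targets, skipping unknown names."""
    commands: list[tuple[str, float]] = []
    for name in plan.motions:
        commands.extend(PRIMITIVES.get(name, []))
    return commands


plan = EmbodimentPlan(intent="confirm", prosody="warm", motions=["nod", "wave"])
print(render(plan))  # "wave" is not a known primitive, so only the nod is rendered
```

Keeping the plan declarative means the LLM never issues raw motor commands; anything it names that the renderer does not recognize is simply dropped.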

Architecture

┌───────────────────────────────────────────────────────────────────┐
│                      PRESENCE LOOP (30Hz)                         │
│  Always running. Receives lightweight signals, outputs motion.    │
│                                                                   │
│  Signals in:              States:                                 │
│    vad_energy ──┐          IDLE     → breathing, drift            │
│    doa_angle  ──┤          LISTENING → orient, micro-nods, lean   │
│    face_pos   ──┼────────► THINKING → look away, processing anim  │
│    llm_state  ──┤          SPEAKING → stable gaze, intent motion  │
│    embody_cmd ──┘          MUTED    → privacy posture             │
└───────────────────────────────────────────────────────────────────┘
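
The state selection and fixed-rate tick in the diagram above could look roughly like this. It is a sketch under assumptions: the priority order between states, the `muted` flag, the `threshold` value, and the `read_signals` callback are invented for illustration, and only a subset of the signals is used.

```python
import time
from enum import Enum


class State(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"
    MUTED = "muted"


def next_state(muted: bool, llm_state: str, vad_energy: float,
               threshold: float = 0.5) -> State:
    """Pick a presence state from lightweight signals (priority order is assumed)."""
    if muted:
        return State.MUTED
    if llm_state == "speaking":
        return State.SPEAKING
    if llm_state == "thinking":
        return State.THINKING
    if vad_energy >= threshold:
        return State.LISTENING
    return State.IDLE


def run(ticks: int, hz: float = 30.0,
        read_signals=lambda: (False, "idle", 0.0)) -> list[State]:
    """Fixed-rate loop: sleep until the next deadline so drift does not accumulate."""
    period = 1.0 / hz
    deadline = time.monotonic()
    history = []
    for _ in range(ticks):
        history.append(next_state(*read_signals()))
        deadline += period
        time.sleep(max(0.0, deadline - time.monotonic()))
    return history


print(run(3))
```

Scheduling against an absolute deadline, rather than sleeping a fixed `1/hz` after each tick, keeps the average rate steady even when a tick runs long.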

┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│   Audio Input    │     │   Agent Brain    │     │   Audio Output   │
│                  │     │                  │     │                  │
│  Mic → VAD ──────┼────►│  Agent SDK       │────►│  Stream TTS      │
│       ↓          │     │  + MCP tools:    │     │  (ElevenLabs)    │
│  Whisper STT ────┼────►│    embody/robot  │     │                  │
│                  │     │    smart_home/*  │     │  Barge-in:       │
│  Barge-in: ◄─────┼─────│    todoist/*     │◄────│  VAD interrupts  │
│  stop TTS        │     │    memory/*      │     │  playback        │
└──────────────────┘     └──────────────────┘     └──────────────────┘
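
The barge-in path (VAD interrupts playback) can be illustrated with plain asyncio. This is only a sketch of the cancellation pattern: `speak` stands in for streaming TTS playback, and the `voice_detected` event stands in for the VAD signal; neither is the project's real API.

```python
import asyncio


async def speak(chunks: list[str], played: list[str]) -> None:
    """Stand-in for streaming TTS playback: one chunk per short sleep."""
    for chunk in chunks:
        await asyncio.sleep(0.01)
        played.append(chunk)


async def speak_with_barge_in(chunks: list[str],
                              voice_detected: asyncio.Event) -> list[str]:
    """Play chunks until playback finishes or the VAD event fires."""
    played: list[str] = []
    playback = asyncio.create_task(speak(chunks, played))
    interrupt = asyncio.create_task(voice_detected.wait())
    await asyncio.wait({playback, interrupt}, return_when=asyncio.FIRST_COMPLETED)
    # Cancel whichever task is still pending, then let both settle.
    for task in (playback, interrupt):
        if not task.done():
            task.cancel()
    await asyncio.gather(playback, interrupt, return_exceptions=True)
    return played


async def demo() -> list[str]:
    vad = asyncio.Event()
    # Simulate the user starting to talk 25 ms into playback.
    asyncio.get_running_loop().call_later(0.025, vad.set)
    return await speak_with_barge_in(list("abcdefghij"), vad)


played = asyncio.run(demo())
print(f"played {len(played)} of 10 chunks before the interruption")
```

The list of chunks that actually played is returned so the caller knows how much of the answer the user heard before cutting in.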

┌──────────────────┐     ┌──────────────────┐
│  Face Tracker    │     │   Audit Log      │
│  YOLOv8 → face   │────►│ ~/.jarvis/audit  │
│  position signal │     │   .jsonl         │
│                  │     │ (rotating)       │
└──────────────────┘     └──────────────────┘
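
An append-only JSONL audit log like the one in the diagram takes only a few lines to write. The record fields and the tool name below are illustrative assumptions, and rotation is left out; the demo writes to a temporary directory rather than `~/.jarvis/`.

```python
import json
import tempfile
import time
from pathlib import Path


def audit(path: Path, tool: str, args: dict, dry_run: bool, outcome: str) -> dict:
    """Append one JSON object per line; field names here are illustrative."""
    record = {"ts": time.time(), "tool": tool, "args": args,
              "dry_run": dry_run, "outcome": outcome}
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "audit.jsonl"
    # "smart_home.light_off" is a made-up tool name for the demo.
    audit(log, "smart_home.light_off", {"entity": "light.kitchen"}, True, "simulated")
    audit(log, "smart_home.light_off", {"entity": "light.kitchen"}, False, "ok")
    lines = log.read_text(encoding="utf-8").splitlines()
    print(len(lines), json.loads(lines[0])["dry_run"])
```

One object per line keeps the file greppable and lets a reader recover every complete record even if the process dies mid-write.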

Components

Component      | Technology                       | Purpose
Presence Loop  | Custom 30Hz controller           | Continuous micro-behaviors, state machine
Brain          | OpenAI Agents SDK + custom tools | Reasoning, conversation, embodiment plans
Speech-to-Text | Whisper (local, faster-whisper)  | Transcribe user speech
Text-to-Speech | ElevenLabs API (streaming)       | Jarvis voice, sentence-level streaming
VAD            | Silero VAD                       | End-of-utterance + barge-in detection
Face Tracking  | YOLOv8 (ultralytics)             | Face position → presence loop signal
Robot Control  | Reachy Mini SDK                  | Head (6DOF), body, antennas, emotions
Smart Home     | Home Assistant REST API          | Lights, climate, media (with dry-run + audit)
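
Sentence-level streaming for TTS generally means buffering LLM tokens and handing each sentence to the synthesizer as soon as it is complete. A minimal sketch of that chunking step, assuming a simple punctuation-based boundary rule (the regex and function name are not taken from the project):

```python
import re
from typing import Iterable, Iterator

# A sentence ends at ., ! or ? followed by whitespace (a deliberately naive rule).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for done in parts[:-1]:
            if done.strip():
                yield done.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()


print(list(sentences(["Lights ", "are on. ", "Anything", " else?"])))
# ['Lights are on.', 'Anything else?']
```

A rule this simple will also split after abbreviations such as "Dr. ", so a production version would likely need a smarter boundary check.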

Setup

cd jarvis
uv sync
cp .env.example .env
# Fill in: OPENAI_API_KEY, ELEVENLABS_API_KEY
# Optional: HASS_URL, HASS_TOKEN for smart home
# Optional: HOME_PERMISSION_PROFILE=readonly (state only) or control (default)
# Optional: HOME_REQUIRE_CONFIRM_EXECUTE=true (require confirm=true on all executes)
# Optional: HOME_CONVERSATION_ENABLED=true (enable HA conversation intent tool)
# Optional: HOME_CONVERSATION_PERMISSION_PROFILE=readonly|control (default readonly)
# Optional: SAFE_MODE_ENABLED=true (force mutating actions into restricted/dry-run behavior)
# Optional: TODOIST_API_TOKEN / TODOIST_PROJECT_ID / TODOIST_PERMISSION_PROFILE
# Optional: NOTION_API_TOKEN / NOTION_DATABASE_ID (integration_hub notion notes backend)
# Optional: TODOIST_TIMEOUT_SEC=10.0 / PUSHOVER_TIMEOUT_SEC=10.0
# Optional: PUSHOVER_API_TOKEN / PUSHOVER_USER_KEY / NOTIFICATION_PERMISSION_PROFILE
# Optional: NUDGE_POLICY=interrupt|defer|adaptive / NUDGE_QUIET_HOURS_START / NUDGE_QUIET_HOURS_END
# Optional: EMAIL_SMTP_HOST / EMAIL_FROM / EMAIL_DEFAULT_TO / EMAIL_PERMISSION_PROFILE / EMAIL_TIMEOUT_SEC
# Optional: WEATHER_UNITS=metric|imperial / WEATHER_TIMEOUT_SEC
# Optional: WEBHOOK_ALLOWLIST=example.com,api.example.com / WEBHOOK_AUTH_TOKEN / WEBHOOK_TIMEOUT_SEC
# Optional: SLACK_WEBHOOK_URL / DISCORD_WEBHOOK_URL
# Optional: PERSONA_STYLE=terse|composed|friendly|jarvis / BACKCHANNEL_STYLE=quiet|balanced|expressive
# Optional: IDENTITY_ENFORCEMENT_ENABLED / IDENTITY_DEFAULT_USER / IDENTITY_DEFAULT_PROFILE
# Optional: IDENTITY_USER_PROFILES / IDENTITY_TRUSTED_USERS
# Optional: IDENTITY_REQUIRE_APPROVAL / IDENTITY_APPROVAL_CODE
# Optional: PLAN_PREVIEW_REQUIRE_ACK=true (require preview_token before risky execute tools)
# Optional: MEMORY_RETENTION_DAYS / AUDIT_RETENTION_DAYS (0 disables pruning)
# Optional: MEMORY_PII_GUARDRAILS_ENABLED=true|false
# Optional: MEMORY_ENCRYPTION_ENABLED / AUDIT_ENCRYPTION_ENABLED / JARVIS_DATA_KEY
# Optional: WAKE_MODE / WAKE_CALIBRATION_PROFILE / WAKE_WORDS / WAKE_WORD_SENSITIVITY / VOICE_TIMEOUT_PROFILE
# Optional: STT_FALLBACK_ENABLED / WHISPER_MODEL_FALLBACK / TTS_FALLBACK_TEXT_ONLY
# Optional: OPENAI_ROUTER_MODEL / ROUTER_TIMEOUT_SEC / POLICY_ROUTER_MIN_CONFIDENCE
# Optional: INTERRUPTION_ROUTER_TIMEOUT_SEC / INTERRUPTION_RESUME_MIN_CONFIDENCE
# Optional: SEMANTIC_TURN_ENABLED / SEMANTIC_TURN_ROUTER_TIMEOUT_SEC / SEMANTIC_TURN_MIN_CONFIDENCE
# Optional: SEMANTIC_TURN_EXTENSION_SEC / SEMANTIC_TURN_MAX_TRANSCRIPT_CHARS
# Optional: MODEL_FAILOVER_ENABLED / MODEL_SECONDARY_MODE / WATCHDOG_* / TURN_TIMEOUT_ACT_SEC / STARTUP_STRICT
# Optional: OPERATOR_SERVER_ENABLED / OPERATOR_SERVER_HOST / OPERATOR_SERVER_PORT / OPERATOR_AUTH_MODE / OPERATOR_AUTH_TOKEN
# Optional: WEBHOOK_INBOUND_ENABLED / WEBHOOK_INBOUND_TOKEN
# Optional: RECOVERY_JOURNAL_PATH / DEAD_LETTER_QUEUE_PATH (interrupted-action + failed-outbound journals)
# Optional: EXPANSION_STATE_PATH / RELEASE_CHANNEL_CONFIG_PATH (roadmap + release-channel persistence/check config)
# Optional: NOTES_CAPTURE_DIR / QUALITY_REPORT_DIR (integration capture + report artifact locations)
# Optional: OBSERVABILITY_* (DB/state/event paths, burst threshold, snapshot interval)
# Optional: SKILLS_ENABLED / SKILLS_DIR / SKILLS_ALLOWLIST / SKILLS_REQUIRE_SIGNATURE / SKILLS_SIGNATURE_KEY
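
To make the interaction between the smart-home flags above more concrete, here is one way the gating decision could be expressed. The precedence order, the return labels, and the assumed defaults for `SAFE_MODE_ENABLED` and `HOME_REQUIRE_CONFIRM_EXECUTE` are guesses for illustration; only the `control` default for `HOME_PERMISSION_PROFILE` comes from the notes above.

```python
def resolve_home_action(env: dict, mutating: bool,
                        dry_run: bool = True, confirm: bool = False) -> str:
    """Return 'read', 'deny', 'dry_run', 'needs_confirm', or 'execute'."""

    def flag(name: str) -> bool:
        return env.get(name, "false").strip().lower() == "true"

    if not mutating:
        return "read"                # state reads pass under any profile
    if env.get("HOME_PERMISSION_PROFILE", "control") == "readonly":
        return "deny"                # readonly profile: state only
    if dry_run or flag("SAFE_MODE_ENABLED"):
        return "dry_run"             # dry-run is the default for mutating calls
    if flag("HOME_REQUIRE_CONFIRM_EXECUTE") and not confirm:
        return "needs_confirm"
    return "execute"


print(resolve_home_action({}, mutating=True))                 # dry_run
print(resolve_home_action({}, mutating=True, dry_run=False))  # execute
```

Because `dry_run` defaults to `True`, a live execute only happens when the caller opts out of the dry run explicitly and no stricter flag overrides it.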

Smart home safety defaults:

First-Time Operator Checklist

  1. Copy .env.example to .env, then set required keys: OPENAI_API_KEY and ELEVENLABS_API_KEY.
  2. If using integrations, set both values for each pair:
    • HASS_URL and HASS_TOKEN
    • PUSHOVER_API_TOKEN and PUSHOVER_USER_KEY
  3. Choose explicit permission profiles before first run:
    • HOME_PERMISSION_PROFILE=readonly (recommended first boot)
    • TODOIST_PERMISSION_PROFILE=readonly
    • NOTIFICATION_PERMISSION_PROFILE=off
  4. Run local validation gates:
    • make check
    • make test-faults
  5. Start in simulation mode and confirm no startup warnings are emitted:
    • uv run python -m jarvis --sim --no-vision
  6. If Home Assistant is enabled, run a dry_run=true smart-home request first before any live execute.
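
Steps 1 and 2 of the checklist are easy to check mechanically before first boot. The helper below is a hypothetical pre-flight script, not part of the repository; it only encodes the required keys and the two pairs named above.

```python
from typing import Mapping

REQUIRED = ["OPENAI_API_KEY", "ELEVENLABS_API_KEY"]
PAIRS = [("HASS_URL", "HASS_TOKEN"), ("PUSHOVER_API_TOKEN", "PUSHOVER_USER_KEY")]


def config_problems(env: Mapping[str, str]) -> list[str]:
    """List missing required keys and half-configured integration pairs."""
    problems = [f"{key} is required" for key in REQUIRED if not env.get(key)]
    for a, b in PAIRS:
        has_a, has_b = bool(env.get(a)), bool(env.get(b))
        if has_a != has_b:
            present, missing = (a, b) if has_a else (b, a)
            problems.append(f"{present} is set but {missing} is missing")
    return problems


sample = {"OPENAI_API_KEY": "sk-test", "ELEVENLABS_API_KEY": "el-test",
          "HASS_URL": "http://homeassistant.local:8123"}
print(config_problems(sample))
# ['HASS_URL is set but HASS_TOKEN is missing']
```

Passing `os.environ` instead of the sample dict would run the same check against the real environment.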

Usage

# Full Jarvis experience
uv run python -m jarvis

# Without face tracking (audio only)
uv run python -m jarvis --no-vision

# Text output instead of TTS (debugging)
uv run python -m jarvis --no-tts

# Simulation mode (no robot connected)
uv run python -m jarvis --sim

# Verbose logging
uv run python -m jarvis --debug

# Create a backup bundle (memory, audit logs, runtime state, operator settings)
uv run python -m jarvis --backup ~/.jarvis/backups/jarvis-$(date +%Y%m%d-%H%M%S).tar.gz

# Restore from a backup bundle (overwrite existing files)
uv run python -m jarvis --restore ~/.jarvis/backups/jarvis-20260227-120000.tar.gz --force

# Open operator console
open http://127.0.0.1:8765

Developer Checks

# Full lint + full test suite
make check

# Fast local regression pass
make test-fast

# Simulation-focused validation pass
make test-sim

# Fault-injection oriented subset (network, HTTP, summary, and storage taxonomy)
make test-faults

# Soak/stability subset
make test-soak

# Extended soak profile (simulation + fault profiles + checkpoint/retry validation)
make test-soak-extended

# Personality A/B drift checks (brevity + confirmation friction)
make test-personality

# Deployment/security gate (lint + tests + fault subset + workflow pin checks)
make security-gate

# Combined release-readiness gate (lint + acceptance + release checks + strict eval)
make readiness

# Marker-based subsets
uv run pytest -q -m fast
uv run pytest -q -m fault
uv run pytest -q -m slow

Equivalent scripts are available under scripts/.

CI runs the same lint + test gates on every push and pull request via ci.yml. Workflow linting and YAML hygiene run via workflow-sanity.yml. Nightly soak coverage is scheduled in nightly-soak.yml. Readiness-gate automation is scheduled/on-demand in jarvis-readiness.yml.

CI Workflow Intent and Failure Routing

Workflow            | Intent | Failure routing (first stop)
ci.yml / lint       | Static checks (ruff) | src/, tests/, and Python style issues in the failing path
ci.yml / tests      | Full regression (pytest) | Failing test module and corresponding implementation area
ci.yml / faults     | Fault-injection taxonomy + error-path contract | tests/test_tools_services.py fault tests and src/jarvis/tools/services.py normalization paths
workflow-sanity.yml | Workflow hygiene (actionlint, tabs, script executability/shebang) | .github/workflows/* and scripts/*.sh
shellcheck.yml      | Shell script linting | scripts/*.sh syntax/quoting/safety
security.yml        | Scheduled/PR CodeQL scan | Security findings in SARIF report; route by file ownership
nightly-soak.yml    | Long-run stability signal | tests/test_main_audio.py -k soak, audio/runtime regressions

Project Structure

jarvis/
├── pyproject.toml
├── .env.example
├── ~/.jarvis/audit.jsonl      # Auto-created audit log (runtime path)
├── src/
│   └── jarvis/
│       ├── __main__.py        # Entry point + conversation loop
│       ├── config.py          # Settings & env vars
│       ├── brain.py           # OpenAI Agents SDK orchestrator
│       ├── observability.py   # Telemetry store + metrics export
│       ├── operator_server.py # Local operator dashboard/API
│       ├── skills.py          # Local skill discovery + lifecycle
│       ├── presence.py        # 30Hz presence loop (the soul)
│       ├── tools/
│       │   ├── robot.py       # embody, play_emotion, play_dance
│       │   ├── services.py    # shared runtime/helpers + MCP tool registry
│       │   └── services_domains/
│       │       ├── home.py         # home_orchestrator domain handler
│       │       ├── planner.py      # planner_engine domain handler
│       │       ├── integrations.py # integration_hub domain handler
│       │       ├── comms.py        # channel/email/todoist/pushover handlers
│       │       ├── governance.py   # skills_governance + quality_evaluator + embodiment_presence
│       │       └── trust.py        # proactive_assistant + memory_governance + identity_trust
│       ├── audio/
│       │   ├── vad.py         # Silero voice activity detection
│       │   ├── stt.py         # faster-whisper transcription
│       │   └── tts.py         # ElevenLabs synthesis
│       ├── vision/
│       │   └── face_tracker.py  # YOLOv8 detection → presence signals
│       └── robot/
│           └── controller.py  # Reachy Mini SDK wrapper