Add paths provider and Paths sensor #2611
Merged
Introduce a PathsProvider that subscribes to the om/paths topic via zenoh, deserializes CDR-encoded Paths messages, classifies path indices into movement options (turn left, move forwards, turn right, move back) and generates a natural-language assessment string. Expose methods to retrieve the latest lidar string, valid paths and movement options, and add graceful stop/cleanup. Add a PathsSensor plugin that registers as "Paths", polls the provider on a fixed cadence, converts raw assessments into timestamped messages, maintains a bounded history and formats the latest buffer for prompts. Unit tests cover payload deserialization and movement-string generation.
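The classification step described above can be sketched as follows. This is a minimal illustration, not the PathsProvider code from this PR: the bucket boundaries in `optionForIndex` (0-2, 3-5, 6-8, 9), the function names, and the wording of the assessment string are all assumptions made for the example. Only the four movement option names come from the description.

```go
package main

import (
	"fmt"
	"strings"
)

// Movement option names as listed in the PR description.
const (
	TurnLeft     = "turn left"
	MoveForwards = "move forwards"
	TurnRight    = "turn right"
	MoveBack     = "move back"
)

// optionForIndex maps a path index to a movement option.
// ASSUMPTION: these index ranges are hypothetical and chosen for illustration;
// the real provider may bucket indices differently.
func optionForIndex(i int) (string, bool) {
	switch {
	case i >= 0 && i <= 2:
		return TurnLeft, true
	case i >= 3 && i <= 5:
		return MoveForwards, true
	case i >= 6 && i <= 8:
		return TurnRight, true
	case i == 9:
		return MoveBack, true
	}
	return "", false
}

// MovementOptions returns the distinct options covered by the given path
// indices, always in the fixed order left, forwards, right, back.
// Unknown indices are ignored.
func MovementOptions(paths []int) []string {
	seen := map[string]bool{}
	for _, p := range paths {
		if opt, ok := optionForIndex(p); ok {
			seen[opt] = true
		}
	}
	var out []string
	for _, opt := range []string{TurnLeft, MoveForwards, TurnRight, MoveBack} {
		if seen[opt] {
			out = append(out, opt)
		}
	}
	return out
}

// Assessment builds a short natural-language summary of the valid options.
// ASSUMPTION: the sentence wording is made up for this sketch.
func Assessment(paths []int) string {
	opts := MovementOptions(paths)
	if len(opts) == 0 {
		return "No safe movement options are available."
	}
	return "Safe movement options: " + strings.Join(opts, ", ") + "."
}

func main() {
	fmt.Println(Assessment([]int{1, 4, 9}))
}
```

Returning the options in a fixed order keeps the generated string stable between ticks, which makes it easier to unit-test and to diff in prompts.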
openminddev added a commit that referenced this pull request on Jun 9, 2026
#2616) * Add Go runtime and plugin scaffolding Introduce a new Go workspace for the om1 project: add Makefile, cmd/main, go.mod and go.sum. Implement core internal packages for runtime functionality and plugins: config loader/types, actions (connectors, orchestrator, schema generation and tests), backgrounds (orchestrator and registry), inputs (sensors and orchestrator), llm interfaces, fuser (prompt fusion + KB), hooks runner, http client, and plugin entrypoints. Includes unit tests for action schema generation and action orchestrator tick/stop behavior. Provides plumbing for registering/loading plugins and basic orchestration patterns (concurrent/sequential/dependency modes). * Add GoogleASR, WebSocket client, refactor inputs Add a reconnecting WebSocket client and a new GoogleASR input plugin (PortAudio + WS) and include a conversation.json5 example. Refactor the inputs API: Sensor interface changed (Listen/Poll/RawToText/FormattedLatestBuffer/Stop), Message shape renamed, and Orchestrator no longer stores buffers internally (uses sensors' FormattedLatestBuffer). Update Fuser.Fuse to accept sensor buffer slices. Update runtime to pass sensor buffers and simplify LLM orchestration: Orchestrator now wraps 'llm' field and manages history; call sites adjusted. Rename and consolidate speak action plugin to speak/elevenlabs_tts (types and logging names updated) and add actions package entrypoint; remove old move and speak packages. Update telemetry API: IOProvider.RecordTick signature simplified. Add go.mod dependencies for portaudio and gorilla/websocket. Improve JSON5 loader to quote unquoted keys. Note: these changes introduce several breaking API changes (Sensor, Fuser.Fuse, RecordTick, action connector types) that require updates across callers. 
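As a rough illustration of the refactored inputs API, the sketch below declares a Sensor interface with the five method names listed above and a trivial implementation. Only the method names come from the commit message. The signatures, the `Message` fields, the `staticSensor` type, and the `name: text` output format are assumptions for this example and may not match the repository.

```go
package main

import (
	"context"
	"fmt"
	"strings"
	"time"
)

// Message is a hypothetical timestamped sensor reading.
type Message struct {
	Timestamp time.Time
	Text      string
}

// Sensor uses the method names from the commit message.
// ASSUMPTION: every signature here is a guess made for illustration.
type Sensor interface {
	Listen(ctx context.Context) error
	Poll(ctx context.Context) (string, error)
	RawToText(raw string) *Message
	FormattedLatestBuffer() string
	Stop() error
}

// staticSensor is a hypothetical sensor that always reports the same reading
// and keeps a bounded history. It is not safe for concurrent use.
type staticSensor struct {
	name    string
	reading string
	maxLen  int
	history []Message
}

// Compile-time check that staticSensor satisfies Sensor.
var _ Sensor = (*staticSensor)(nil)

// Listen is a no-op here; a real sensor might start a polling loop.
func (s *staticSensor) Listen(ctx context.Context) error { return nil }

// Poll returns the fixed reading unless the context is already cancelled.
func (s *staticSensor) Poll(ctx context.Context) (string, error) {
	if err := ctx.Err(); err != nil {
		return "", err
	}
	return s.reading, nil
}

// RawToText trims the raw reading, stores it as a timestamped message, and
// drops the oldest entries once the history exceeds maxLen.
func (s *staticSensor) RawToText(raw string) *Message {
	text := strings.TrimSpace(raw)
	if text == "" {
		return nil
	}
	msg := Message{Timestamp: time.Now(), Text: text}
	s.history = append(s.history, msg)
	if len(s.history) > s.maxLen {
		s.history = s.history[len(s.history)-s.maxLen:]
	}
	return &msg
}

// FormattedLatestBuffer formats the most recent message for a prompt.
func (s *staticSensor) FormattedLatestBuffer() string {
	if len(s.history) == 0 {
		return ""
	}
	return s.name + ": " + s.history[len(s.history)-1].Text
}

// Stop clears the history.
func (s *staticSensor) Stop() error {
	s.history = nil
	return nil
}

func main() {
	s := &staticSensor{name: "Paths", reading: " move forwards ", maxLen: 2}
	raw, err := s.Poll(context.Background())
	if err != nil {
		fmt.Println("poll failed:", err)
		return
	}
	s.RawToText(raw)
	fmt.Println(s.FormattedLatestBuffer())
}
```

Keeping the buffer inside the sensor matches the note that the Orchestrator no longer stores buffers itself and instead reads each sensor's FormattedLatestBuffer.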
* Add zenoh-c support and TTS speaking flag Makefile: add automatic download/installation of zenoh-c, set CGO flags and platform-specific DYLD/LD env handling, and propagate the library path to build, run, lint, test, fmt, vet, and dependency targets; expand help text and fix install target. Providers: add a new atomic Speaking flag (internal/providers/tts_state.go) to indicate when TTS is streaming audio. ElevenLabs TTS: set providers.Speaking true before synthesis/playback and false after to mark active playback; remove an extra enqueue log line. Google ASR: skip audio capture while providers.Speaking is set to avoid ASR picking up playback audio. These changes ensure the zenoh-c runtime library is available during development and CI, and avoid input capture interfering with TTS playback. * Ignore Zenoh runtime directory in .gitignore Add an entry to .gitignore to exclude the Zenoh runtime/cache directory (go/.zenoh-c/) and a comment marker. This prevents local Zenoh state from being accidentally committed. * Refactor plugins, schema exports, and logging Expose and refactor internal plugin/schema APIs, tighten logging, and remove unused test code. Key changes: - Actions: clarify AgentAction fields, introduce Factory/Register/Load patterns for connectors. - Schema: export InterfaceSpec, InterfaceRegistry and schema helpers (BuildSchema, BuildPropertySchema, KindToJSONType) and wire BuildSchemaForAction to use them. - Backgrounds: add Factory registry, Register and Load helpers and an UnknownPluginError type. - Providers: remove the HistoryManager/related LLM history code and unused imports. - LLM/Runtime/Inputs/Plugins: remove debug prints, adjust buffering and minor formatting/locking fixes. - Google ASR: add Time to ASRMessage, reduce stats ticker interval (30s→15s), improve latency logging, and tidy audio packaging/stream handling. - Deleted obsolete tests (actions/orchestrator_test.go, actions/schema_test.go). 
These changes prepare the codebase for external use of schema/registry helpers, improve observability, and clean up dead code. * Add zenoh session, global logger, and arm_g1 plugin Introduce a global logger package and replace ad-hoc zap creation across plugins: internal/logger provides Set/Get for a shared *zap.Logger; main sets the global logger and passes the logger into runtime. Add a CGo zenoh wrapper (internal/zenoh/session.go) that exposes Open/Put/Close to publish raw bytes via zenoh-c (requires zenoh-c headers/libs). Add a new arm_g1 Zenoh action plugin (plugins/actions/arm_g1_zenoh) that registers the arm_g1 interface and publishes Unitree G1 arm requests to a Zenoh topic, including a CDR little-endian serializer for Unitree requests. Wire the new plugin by importing it in plugins/actions/actions.go. Update existing plugins (emotion, speak/elevenlabs_tts, inputs/google_asr) to use logger.Get() instead of creating new zap instances and add an info log when enqueueing TTS text. * Use ./config paths in go/Makefile Change config path references in go/Makefile from ../config/*.json5 to ./config/*.json5 for the run, dev, and list-configs targets so configuration files are resolved relative to the go directory when invoking these commands. * Switch to zenoh-go client (remove cgo) Replace the old cgo-based zenoh-c wrapper with the pure-Go github.com/eclipse-zenoh/zenoh-go client. Implement a new Session using zenoh-go (Open with optional endpoint, Put, Close) and add a Publish helper. Simplify the arm_g1/zenoh connector to use the new session API (no config parsing, rely on default), and add debug log-level to the Makefile dev run. go.mod was updated to include the zenoh-go dependency and related indirect changes. * Add context cancellation to ws client and fix locks ws: add context with cancel to Client, use DialContext so in-progress TLS handshakes are interrupted on Close, and call cancel() from Close. Make Close idempotent for stopCh. 
Improve read/write loops: early-return on stopCh, log read loop stop, treat normal close as non-error, and fix conn mutex handling in writeMessage with deferred unlock. google_asr: Fix Stop() to avoid calling Stream methods while holding the sensor mutex by capturing and nil-ing paStream under lock, unlocking, then stopping/closing the stream. These changes improve shutdown correctness and avoid deadlocks during teardown. * Add Unitree G1 config; update loader & plugin Add a new Unitree G1 conversation JSON5 config (go/config/unitree_g1_conversation.json5) providing a greeting mode, agent settings, actions and connectors. Fix stripJSON5 in go/internal/config/loader.go to remove backslash-newline continuations before splitting and preserve trailing-comma cleanup. Update plugin imports and package: change actions import from arm_g1_zenoh to arm_g1 and rename file/package go/plugins/actions/arm_g1_zenoh/... to go/plugins/actions/arm_g1/zenoh.go with package name adjusted accordingly. * Add zenoh Publisher, use in arm_g1, fix ticks Introduce a Publisher wrapper in the zenoh session package with DeclarePublisher/Put/Drop and adjust Session.Close to call Close(nil). Update arm_g1 connector to open a zenoh session with optional endpoint, declare a publisher for sport requests, use the publisher for puts, log/handle failures gracefully, and drop the publisher on Stop. Fix CDR alignment, buffer sizing and padding in serializeUnitreeRequest and standardize the JSON parameter formatting to match the Python connector. Convert Tick implementations (emotion, arm_g1, elevenlabs TTS) to block on ctx.Done() and simplify Stop to properly cleanup publishers and sessions. * Add Zenoh CDR helpers and ASR Zenoh publish Introduce CDR serialization helpers and integrate Zenoh publishing for ASR, plus add WS reconnect behavior. - Add internal/zenoh/cdr.go with AppendInt32LE, AppendUint32LE, AppendInt64LE and AppendCDRString helpers for little-endian CDR encoding. 
- Use the new zenoh helpers in plugins/actions/arm_g1 to replace local byte-append helpers. - Integrate Zenoh into GoogleASR: add ZenohEndpoint config, open a session and declare a publisher during sensor init, publish serialized ASR text (serializeASRText) on buffer flush, and clean up publisher/session on stop. serializeASRText builds a CDR LE payload including a timestamp, UUID frame_id, and the text. - Update internal/ws client Connect to support automatic reconnect when cfg.Reconnect is true, retrying every 5s and honoring context cancellation. - Add usage of github.com/google/uuid in ASR serialization. * Switch emotion action connector to zenoh Replace the old log-based emotion action with a Zenoh-backed connector. Configs (conversation.json5 and unitree_g1_conversation.json5) now reference connector "zenoh" for the emotion action. The previous emotion/log connector implementation was removed and a new emotion/zenoh connector was added which opens a Zenoh session, declares a publisher on topic "om/avatar/request", and publishes serialized AvatarFaceRequest payloads (CDR little-endian encoding with timestamp, request ID, code, and face_text). The new connector gracefully handles Zenoh unavailability, logs publish results, and cleans up publisher/session on Stop. * Serialize face_text with explicit length and NUL Replace zenohsession.AppendCDRString with manual serialization: append a NUL (0x00) to faceText, write the byte length as a little-endian uint32, then append the bytes. This ensures the CDR string is encoded with an explicit length and null terminator for compatibility with the Zenoh consumer. * Add Emotion action input and export connector Introduce an Emotion enum and EmotionInput struct for the emotion action, including EnumValues enumerating supported expressions (happy, confused, curious, excited, sad, think). Register the emotion interface with a descriptive message and register the Zenoh connector. 
Rename and export constructor from newZenohConnector to NewZenohConnector and update its registration. Also remove redundant comment lines in arm_g1/zenoh.go. * Fix CDR padding in avatar request serialization Adjust serializeAvatarRequest to follow CDR alignment rules: align before fields (no trailing padding after request_id since next field is int8) and insert padding before the face_text uint32 length. Implement request_id encoding without post-padding (write length + bytes explicitly), increase buffer capacity, and clarify comments about wire layout. Also remove a redundant comment in arm_g1 zenoh publisher code. * Add AvatarProvider and zenoh subscriber support Introduce a new AvatarProvider (go/internal/providers/avatar.go) that manages zenoh publishers/subscribers for om/avatar/request and om/avatar/response, handles CDR (pycdr2-compatible) serialization/deserialization for avatar commands and health checks, and exposes SendAvatarCommand. Extend zenoh session (go/internal/zenoh/session.go) with Subscriber type, DeclareSubscriber and Drop to support incoming messages. Update emotion plugin (go/plugins/actions/emotion/zenoh.go) to use the AvatarProvider singleton instead of managing its own zenoh session/publisher and remove duplicate serialization logic; Stop() is simplified accordingly. The provider handles unavailable zenoh sessions gracefully and replies to STATUS requests with health responses. * Clean up avatar provider comments and logs Add a descriptive comment for NewAvatarProvider, simplify the handleRequest comment, and remove debug log statements for health-check requests/responses to reduce log noise. Minor whitespace cleanup; no functional changes to request handling or publishing behavior. * Add ElevenLabs TTS provider and lifecycle hooks Introduce a centralized ElevenLabs TTS provider and lifecycle hook support across the runtime. 
- Add go/internal/providers/elevenlabs.go: a singleton ElevenLabsProvider that handles HTTP synthesis, persistent ffplay streaming, queueing, silence handling and lifecycle for TTS playback. - Extend hooks runner (go/internal/hooks/hooks.go): support templated variables, improved command execution with captured stdout/stderr, message handler that forwards messages to the ElevenLabs provider, helper funcs for template formatting and safe string extraction. - Wire lifecycle hooks into runtime and mode manager (go/internal/runtime/*): global hooks are created and invoked for OnStartup, OnEntry and OnExit with context payloads; time-based transitions fire OnTimeout hooks; Transition now includes reason and transition context. - Refactor speak action (go/plugins/actions/speak/elevenlabs_tts.go): remove duplicated ffplay/http logic and use the new providers.ElevenLabs singleton, simplifying connector and lifecycle. - Improve WebSocket client resilience (go/internal/ws/client.go): better logging, non-blocking read/reconnect behavior and a reconnect helper to re-dial when enabled. - Add example lifecycle_hooks to config (go/config/unitree_g1_conversation.json5) demonstrating on_startup message handler using elevenlabs. These changes centralize TTS handling, reduce duplicated code, and add lifecycle hook templating and global hook handling to support safe automated startup/transition messages. * Export constructors and add docs to runtime/plugins Export and document several constructors and runtime types, and add brief lifecycle comments across runtime and plugin code. Renamed newModeSetup to NewModeSetup and updated runtime to use the exported constructor; added docs for loadComponents, toRuntimeConfig, buildMeta, addMeta, collectSchemas, mergePrompt, and toolCallsToMaps. Introduced ModeState/ModeManager comments and a NewModeManager constructor doc. 
In plugin connectors: export and rename connector constructors (NewArmG1ZenohConnector, NewEmotionZenohConnector, NewElevenLabsTTS), add Tick/Stop/no-op lifecycle comments, and simplify arm_g1 connector by removing the customActionMap and sending actions directly. Also removed an extraneous package doc comment in ws/client.go. These changes improve API visibility and add documentation for maintainability; note the behavioral change in arm_g1/zenoh where actions are no longer remapped. * Refactor orchestrators, plugins, and logger APIs Introduce clearer constructors, helper functions, and plugin registries across packages and update callers accordingly. Key changes: move logger builder to internal/logger (BuildLogger) and use it from main; rename fuser.New -> NewFuser and hooks.New -> NewHooks and update runtime callers; add doc comments and small API helpers for actions/inputs/backgrounds/llm (Call/Result types, registry Factory types, Register/Load helpers, UnknownPluginError.Error implementations); add orchestration helper methods (runConcurrent/runSequential/runWithDeps, SetSchemas/FunctionSchemas/Reset) and small runtime adjustments. These changes improve naming consistency, visibility of helpers, plugin loading ergonomics, and overall code readability without changing core behavior. * Add OpenAI helpers and Gemini latency logging Introduce utility functions and logging to the LLM plugin common code: add parseOpenAIResponse, buildMessages, remarshal helper, and logResponseLatency to capture latency and proxy/upstream headers. Import net/http, time, logger and zap for these features. Export and rename Gemini constructor to NewGemini, update llm registration accordingly, export FunctionSchemas and SetSchemas, and wire request timing + logResponseLatency into Gemini's Call to record response metrics. These changes improve observability and provide small API/visibility refactors for the Gemini LLM integration. 
* Add context-aware mode transitions & PortAudio refcount Introduce context-aware mode transitions and a safe PortAudio reference counter. - Config: change default mode to "conversation", add an "approaching" mode and a set of transition_rules to unitree_g1_conversation.json5; add go/config/memory to .gitignore. - Types: extend TransitionRule with ContextConditions to support context-aware transitions. - ModeManager: subscribe to a Zenoh topic (om/mode/context) for best-effort user context updates, store userContext, add CheckTransitions with ordered checks (time-based, context-aware, input-triggered), and implement helpers to evaluate context conditions and priorities. Add Close and UpdateUserContext helpers. - Runtime: clone per-mode LLM config before adding metadata and error if no LLM configured; tidy lifecycle/orchestrator comments and ensure manager.Close() on shutdown. - Audio: add providers/portaudio.go implementing a process-wide reference-counted PortAudio wrapper; integrate it into GoogleASRInput (Acquire/Release, captureDone coordination, safer stream stop/close) to avoid terminating PortAudio while others still use it. - Misc: add github.com/google/uuid to go.mod; small comment and code cleanups across files. These changes enable context-driven transitions, prevent concurrent PortAudio termination races, and avoid mutating shared LLM configuration maps. * Add greeting conversation and IO providers Introduce greeting conversation state machine, IO provider, and related plumbing for TTS/status publishing and background detection. Adds a ConfidenceCalculator-driven GreetingConversationStateMachineProvider and IOProvider singletons, ElevenLabs greeting connector, TTS text normalization and CDR status serialization, and ApproachingPerson background task. Replaces the old slim providers.go, adds util.ToFloat/FloatFrom and updates manager to use it, increments tick counter in runtime, and registers the new plugins in main/actions. 
Also includes a small, likely accidental edit to google_asr.go. * Rename speak action and update connector Change the agent action in unitree_g1_conversation config from "speak" to "greeting_conversation" (name and llm_label) and switch the connector from "elevenlabs_tts" to "greeting_conversation_elevenlabs" to match the plugin. Also remove an extraneous debug log line in greeting_conversation_elevenlabs.Tick to reduce noisy logging. * Add face presence provider and refactor zenoh Introduce a FacePresence provider and sensor: new providers/face_presence.go and inputs/face_presence.go implement fetching /who, snapshot shaping, and a sensor that polls and formats presence lines. Add util.Sleep to support context-aware sleeps. Refactor zenoh session handling: make Open use defaults and local-network discovery (OpenWithOptions/openClient/openDiscovery), add a default endpoint constant, and improve logging/error messages. Update dependent code to the new APIs: AvatarProvider no longer takes an endpoint and callers use providers.Avatar(), various plugins (arm_g1, emotion, greeting_conversation, google_asr, approaching_person) now call zenoh.Open() without endpoint and use util.Sleep where appropriate. Remove an unused local sleep helper and drop ZenohEndpoint config field from greeting_conversation. Overall: new face-presence feature plus cleanup and more robust zenoh session management. * Use in-process ModeContext for transitions Introduce a ModeContextProvider singleton as an in-process, best-effort bus for user-context updates that drive context-aware mode transitions. Replace previous Zenoh-based context propagation: ModeManager no longer subscribes to the Zenoh topic and its Close() is simplified; runtime now consumes providers.ModeContext().Updates() and uses a new scheduleTransition helper to enqueue mode changes. 
Update plugins to publish context updates via providers.ModeContext() (ApproachingPerson and greeting_conversation) and adjust imports/cleanup accordingly. Also add ApproachingPerson to the conversation config and adjust transition rules to target conversation mode; remove a reconnect log line from the websocket client. * Move Go app to repository root; remove Python src Large repo reorganization: moved the Go project from the go/ subdirectory into the repository root (Makefile, cmd/, internal/, plugins/, go.mod/sum, etc.), preserving file contents. Removed legacy Python src/ tree, many config JSON5 files and tests, and deleted .gitmodules. Updated .pre-commit-config.yaml and .github/copilot-instructions.md as part of the cleanup. This simplifies the repository layout and centralizes the Go application. * Migrate CI & Docker to Go build with zenoh-c Replace Python-centric CI and runtime with a Go-focused build pipeline and zenoh-c integration. GitHub Actions workflows updated to setup-go, cache/download zenoh-c, run make build/test/lint (go vet, golangci-lint, go test) and verify the produced binary; many Python/uv/CycloneDDS steps removed. Dockerfile switched to a multi-stage Go builder image that builds the om1 binary, bundles the zenoh-c shared libs, and provides a slimmer runtime image with a simplified entrypoint. Makefile BUILD_DIR path fixed and docker-compose env/command switched to OM1_CONFIG with updated defaults. .gitignore cleaned to remove Python artifacts and track .zenoh-c and build outputs. * Swap client/discovery fallback logic Align OpenWithOptions behavior with the updated LocalNetwork comment: when LocalNetwork is true, try connecting to the local router (openClient) and fall back to discovery; when false, try discovery first and fall back to client connect. Also updated the LocalNetwork doc comment and related warning log messages to reflect the corrected semantics. 
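The swapped fallback order can be sketched as below. This is a simplified stand-in, not the repository's OpenWithOptions: the `opener` type, the `openWithFallback`, `succeed`, and `fail` helpers, and the use of a string in place of a real session are all hypothetical. The only behaviour taken from the commit message is the ordering: local router first when LocalNetwork is true, discovery first otherwise.

```go
package main

import (
	"errors"
	"fmt"
)

// opener is a hypothetical stand-in for a function that opens a session.
// A string label replaces the real session type to keep the sketch small.
type opener func() (string, error)

// openWithFallback tries one strategy and falls back to the other.
// With localNetwork set it tries the client (local router) connection first;
// otherwise it tries discovery first.
func openWithFallback(localNetwork bool, client, discovery opener) (string, error) {
	first, second := discovery, client
	if localNetwork {
		first, second = client, discovery
	}
	s, err1 := first()
	if err1 == nil {
		return s, nil
	}
	s, err2 := second()
	if err2 == nil {
		return s, nil
	}
	// Report both failures so the caller can see why each attempt failed.
	return "", errors.Join(err1, err2)
}

// succeed and fail build openers with fixed outcomes for the demo.
func succeed(name string) opener {
	return func() (string, error) { return name, nil }
}

func fail(name string) opener {
	return func() (string, error) { return "", errors.New(name + " unavailable") }
}

func main() {
	s, err := openWithFallback(true, fail("client"), succeed("discovery"))
	fmt.Println(s, err)
}
```

Joining the two errors rather than returning only the last one keeps the first failure visible in logs, which is useful when both the router and discovery are down.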
* Add function hook registry and hooks Introduce a function-type hook registry and several lifecycle hook implementations. Added internal/hooks/registry.go to register and look up module.function hooks and extended Runner.execute to run "function" hooks via executeFunction. Implemented greeting hooks (greeting_start_hook, greeting_end_hook) and person-follow hooks (start/switch/stop/set_mode) in internal/hooks, including TTS integration and HTTP interactions with vision/follow services. Extracted ElevenLabs config construction into elevenLabsConfigFrom and switched lifecycle code to use it. Exposed GreetingConversation state accessors (TurnCount, MaxTurnCount) in the providers, and applied small cleanups to plugin comments and status serialization comment removal. * Migrate LLM plugins to Go (providers + DualLLM) (#2592) * Support multiple llm plugins * Add support for dual.go and test cases * Remove some comments * Fix lint issues and install portaudio in github workflow * Update test cases * Add Prometheus metrics server and record metrics Introduce internal/metrics package that registers Prometheus histograms and gauges for LLM, ASR and HTTP timing metrics and exposes a /metrics endpoint on :9090 via StartServer. Integrate metrics lifecycle into cmd/main.go (start and graceful shutdown). Instrument Google ASR plugin to record speech duration, utterance-end and final transcript latencies (observeASR helper). Instrument LLM response path to log and emit LLM round-trip latency and HTTP timing metrics (recordResponseLatency and RecordHTTPTiming). Update go.mod to add prometheus client and bump Go version. * Ignore coverage.out and remove .python-version Add coverage.out to .gitignore to prevent committing test coverage output. Remove the .python-version file (3.12) to stop pinning the project's Python version in the repository. 
* Add TTS time-to-first-audio metrics Introduce TTS latency metrics to track time-to-first-audio: a histogram (om1_tts_latency_seconds) and a gauge (om1_tts_latency_last_seconds) in internal/metrics with labels model and endpoint. Register the metrics in init and instrument ElevenLabsProvider.synthesize to record the latency from request start to the first audio chunk, updating both the histogram and the last-value gauge. * Record HTTP timing metrics in ElevenLabs Add a metrics.RecordHTTPTiming call in synthesize to capture request host, path, method, status and timing-related headers (X-Proxy-Parse-Ms, X-Upstream-Total-Ms, X-Upstream-TTFB-Ms, X-Proxy-Total-Ms) for observability. The call is placed after the response is validated and before streaming the body so proxy/upstream latency details are emitted to metrics without changing streaming behavior. * Migrate om1 Grafana dashboard to Grafana v13 Convert grafana/dashboards/om1-dashboard.json to the Grafana 13 elements-based schema. Legacy panels, targets and fieldConfig blocks were replaced with elements.panel-* entries using PanelQuery/DataQuery and unified vizConfig (stat, timeseries, heatmap, piechart). Annotations were updated and cursorSync enabled; Prometheus expressions and legend formats were preserved while thresholds, units and display options were refined. Dashboard/visualization versions set to 13.0.1+security-01 for compatibility. * Remove .typos.toml Delete the .typos.toml typos-checker configuration file. The removed file contained ignore regexes, custom word mappings (OT, WAAS, ser, asend) and path excludes for src/unitree/ and src/ubtech/. * Add greeting modes, people TTS & convo history Add a greeting_conversation.json5 config to define approaching, greeting, and conversation modes with lifecycle hooks and transition rules. Extend the ElevenLabs provider to support per-utterance voice overrides (introduce ttsRequest, change queue type, AddTextWithVoice, and synthesize to accept voiceID). 
Add a new people-aware ElevenLabs TTS connector (plugins/actions/speak/elevenlabs_people_tts.go) that selects voice IDs based on FacePresence and configured name->voice mappings. Add a ConversationHistory input sensor (plugins/inputs/conversation_history.go) that polls IO for recent ASR lines, maintains a bounded sliding-window of messages, and formats history for LLM prompts. * Add Kokoro TTS provider and connector Introduce a new Kokoro TTS provider (internal/providers/kokoro.go) that streams PCM audio from an OpenAI-compatible Kokoro endpoint into a persistent ffplay process. The provider is a singleton with a queued, non-blocking AddText API, handles ffplay lifecycle, records TTS/HTTP metrics, and exposes configurable defaults (base URL, voice/model, output format, rate). Add a speak connector (plugins/actions/speak/kokoro_tts.go) to enqueue text for playback with optional silence-rate handling and configuration parsing. Also remove the legacy speak/elevenlabs_people_tts.go connector (voice-per-person logic) as part of this change. * Extract Google ASR common code, add RTSP input Refactor Google ASR implementation by extracting shared websocket/packaging/metrics/zenoh logic into a new google_asr_common.go (googleASRCommon). The existing microphone-based GoogleASR sensor now embeds and reuses the common code, simplifying google_asr.go. A new GoogleASRRTSPInput (google_asr_rtsp.go) was added to stream PCM from an RTSP source (via ffmpeg) and forward chunks through the shared common code. This removes duplicated functionality, centralizes websocket/zenoh handling and statistics, and enables multiple audio sources (mic and RTSP) to reuse the same ASR plumbing. * Add doc comment for GoogleASRConfig Add a brief comment above the GoogleASRConfig type to explain that it configures the local microphone-sourced Google ASR sensor. Improves code readability and intent for future maintainers. 
* Add knowledgebase support with HNSW embeddings Introduce a knowledgebase package that provides an HTTP embedder and an HNSW-backed KnowledgeBase (embedding + nearest-neighbor search) for RAG. Wire KB into the runtime: new KBSpec fields, runtime initializes the KB when configured, and fuser now accepts a logger and queries the KB using the latest voice input (with warnings logged on failures). Add on-disk KB artifacts (graph/json), remove legacy FAISS/PKL, and update go.mod/go.sum with required dependencies for coder/hnsw and related packages. Also include a small zenoh log message tweak and conversation config defaults for the KB (base URL, name, min_score). * Remove redundant comment about bufio.Reader Delete a two-line comment in loadGraph that explained wrapping *os.File with bufio.NewReader for hnsw.Import. The code behavior is unchanged — graph.Import still uses bufio.NewReader(f) — the comment was considered unnecessary. * Add LLM-driven greeting hook and config fixes Introduce LLM-based greeting generation for the greeting_start_hook: add prompt/defaults, generateGreeting and greetingLLM helpers, and a static fallback when LLM or face snapshot fails. Update greeting hook to log results and use TTS provider accordingly, and import time and the internal llm package. Also load lifecycle hook metadata into HandlerConfig during mode setup, and fix initializeMode to pass the correct modeConfig.cfg.LifecycleHooks to hooks.NewHooks. These changes improve greeting naturalness and ensure lifecycle hooks are initialized with meta. * Add tracing, sensor tick opt-in, and audio warmup Introduce a lightweight tracing facility and related runtime changes; add optional sensor tick behavior; and prime ElevenLabs audio on startup. - Add providers.Tracer (singleton) which writes JSONL traces to traces/YYYY-MM-DD.jsonl, supports Enable/Disable/Gauge/SetGeneration and daily rotation. 
- Wire tracer into Runtime: Runtime now holds a tracer, mode setup propagates SystemConfig.UseTracer to RuntimeConfig, runtime enables tracer for modes that opt in, increments generation per cortex loop, and records LLM prompt + structured tool-call output each tick. Tracer is stopped on runtime shutdown. - Add traceOutput helper to convert llm.Response tool calls to maps used for tracing. - Change cortex loop timing to use a per-iteration timer (allows immediate wake/reset semantics) and track a generation integer for tracing and logging. - Add TickTrigger interface and triggersTick helper so sensors can opt in to waking the cortex loop; Orchestrator.runSensor now checks triggersTick before signaling TickNow. google_asr_common now implements TriggersTick() and returns true. - ElevenLabsProvider: add warmupSilenceMs constant and warmUp() to prime audio device by playing a short silence burst (called before processing queue), plus minor playback flow adjustments. These changes improve observability of LLM interactions, avoid unnecessary cortex wake-ups by non-triggering sensors, and reduce first-play audio glitches by warming up the audio device. * Rename OM1_CONFIG to OM1_COMMAND and tidy Dockerfiles Update Dockerfile and docker-compose.yml to use OM1_COMMAND instead of OM1_CONFIG: the entrypoint now reads OM1_COMMAND (and passes it as -config to the binary), and docker-compose exposes OM1_COMMAND as the env var and removes the redundant command override. Also remove some Dockerfile comments and the mDNS nsswitch modification to simplify the image. Note: deployments should update any references to OM1_CONFIG to OM1_COMMAND to keep runtime behavior unchanged. * Update base image and add PortAudio deps Bump the build base image from golang:1.22-bookworm to golang:1.26-bookworm and install PortAudio packages. 
Added portaudio19-dev to the builder stage so native PortAudio bindings can be compiled, and added libportaudio2 to the runtime stage to provide the PortAudio runtime library. * Add ElevenLabs People TTS connector Introduce ElevenLabsPeopleConnector (plugins/actions/speak/elevenlabs_people_tts.go). Registers the action "speak/elevenlabs_people_tts" and extends ElevenLabsConnector to choose a voice based on the closest person detected by the FacePresence input (parses "Closest: <Name>"). Configuration supports "voice_id" (default) and "voice_ids" (person->voice map). Honors existing silence_rate throttling and enqueues text for ElevenLabs, falling back to the default voice when no matching person is found. * update config prompt for microsoft build (#2598) * update config prompt for microsoft build * add contect about microsoft build in the prompt * update prompt * add a new config * Trim trailing spaces in Microsoft Build greeting Remove stray trailing space characters from two identical lines in config/greeting_microsoft_build.json5 (Microsoft Build event description). This is a formatting cleanup only and does not change content or behavior. * Add knowledge base Prometheus metrics and dashboard panels (#2599) * Add knowledge base prometheus metrics and update grafana dashboard * Shorten RecordKBQuery comment Replace the verbose multi-line comment for RecordKBQuery with a concise one-line summary. Removed parameter-level details about embedSeconds/querySeconds and embedding-step behavior; no functional code changes. --------- Co-authored-by: openmindev <147775420+openminddev@users.noreply.github.com> * Remove CycloneDDS config and env/volume Delete cyclonedds/cyclonedds.xml and remove the related Docker Compose settings: ROS_DOMAIN_ID, CYCLONEDDS_HOME, CYCLONEDDS_URI, and CMAKE_PREFIX_PATH environment variables, as well as the ./cyclonedds volume mount. Cleans up unused CycloneDDS configuration and its mounting from the compose service. 
* Remove config schema generator workflow and script Delete the GitHub Actions workflow and Python script that generated and uploaded the OM1 configuration schema. Removed .github/workflows/generate-config-schema.yml (CI job that ran scripts/generate_schema.py, uploaded to S3 and invalidated CloudFront) and scripts/generate_schema.py (AST-based scanner that produced OM1_config_schema.json5 from inputs, LLMs, backgrounds, actions, hooks, and transition rules). This cleans up the repo by removing the automated schema generation pipeline. * Add extensive unit tests across packages Add a broad set of unit tests to improve coverage and validate behavior across the codebase. Tests added for internal packages: actions (loading, orchestrator, schema), backgrounds, config loader, fuser, hooks (runner and registry), httpclient, inputs (orchestrator and sensors), knowledgebase (embedding), llm (adapter and orchestrator), logger, metrics, and providers (face presence). Also update several plugin test files for LLM adapters. go.mod updated to include github.com/kylelemons/godebug as an indirect dependency. These tests exercise orchestration, plugin loading, schema generation, config parsing, background loops, input handling, KB integration, metrics recording, and error/panic recovery behavior. * Cleanup: ignore Close errors and remove unused field Small cleanup changes across tests and knowledgebase code: remove an unused `internal` field from moveInput, replace direct defer Close() calls with explicit ignores (e.g. `defer func() { _ = f.Close() }()`) to satisfy linters/static checks, and drop an obsolete test comment in qwen_test. These are non-functional cleanup changes to silence warnings and tidy tests. * Add ResetUserContext and reset on mode init Introduce ModeManager.ResetUserContext to clear the userContext map under mutex protection. Call ResetUserContext from Runtime.initializeMode to ensure per-user context is reset when a mode is initialized. 
Also remove a couple of now-unnecessary comments in manager.go. * Migrate ElevenLabs ASR to Go (mic + RTSP inputs) (#2597) * Migrate ElevenLabs ASR to Go (mic + RTSP inputs) Port the ElevenLabs ASR feature from the legacy Python plugins to Go, mirroring the existing Google ASR triple (common + mic + rtsp): - ElevenLabsASRInput: local microphone capture via PortAudio - ElevenLabsASRRTSPInput: RTSP audio decoded via ffmpeg - elevenlabsASRCommon: shared websocket streaming, transcript buffering, latency metrics, and zenoh broadcast on om/asr/text Faithful to the Python original: ElevenLabs short language codes (auto/en/zh/...), partial->speech-start / committed->final-transcript protocol, CJK-aware length filter (rune count, not byte count), and om1_asr_latency metrics labeled model="elevenlabs". Reuses the package-level helpers shared with Google ASR (AudioMetadata, ASRMessage, ASRStatistics, cjkRegex, serializeASRText) so the working Google path is untouched. Adds unit tests (testify/require) covering transcript acceptance, language mapping, audio packaging wire format, websocket message handling, and buffer flush semantics. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Fix issues * Merge asr_common * Fix test cases * Refine comments and document ASR wire layout Clean up and clarify comments across multiple files: remove Python-centric references and streamline wording in background orchestrator, avatar CDR comments, greeting conversation, Kokoro TTS, approaching person, and face presence. Explicitly document the CDR wire layout for ASRText in plugins/inputs/asr_common.go and clarify ModeManager highestPriorityTarget behavior (ties preserve config order). These are documentation/comment-only changes and do not alter runtime behavior. 
--------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: openmindev <147775420+openminddev@users.noreply.github.com> * Add integration tests and testdata Add an integration test suite and supporting fixtures for end-to-end runtime verification. Introduces test/integration/*.go (harness, config runner, and a fuzz driver) that provide a test harness with mock inputs, a scripted LLM, and a recording action connector; helpers run the runtime and assert recorded actions. Adds multiple JSON5 test cases under test/integration/testdata/test_cases and a fuzz case used by the fuzz driver exercising mode transitions, time-based/context-aware/input-triggered transitions, multi-action outputs, and negative/no-op scenarios. Also updates the Makefile help text and adds a test-integration target that runs the integration tests with the integration build tag (and ensures zenoh-c is downloaded before running). * Remove URID from TestBuildMeta expectations Update TestBuildMeta to reflect that SystemConfig and buildMeta no longer include URID. The test setup no longer provides URID and the expected meta map removes the "URID" key to match current behavior. * Refactor TTS into internal/providers/tts with interrupt handling (#2601) * Refactor TTS into internal/providers/tts Move ElevenLabs and Kokoro TTS implementations into a new internal/providers/tts package and introduce a shared ttsBase/player to consolidate common playback logic. Deleted the old providers elevenlabs.go and kokoro.go, renamed tts state files and added an Interrupt atomic flag. Updated imports and callers across hooks, plugins (greeting_conversation, speak), and input sensors to reference the new tts package and to support interrupting TTS when user speech is detected. This centralizes TTS behavior, reduces duplication, and adds TTS interrupt handling for inputs. 
* TTS: add generation-based interrupt handling Introduce a generation counter to TTS to avoid playing utterances queued before an interrupt. Changes: - Add generation atomic counter and RequestInterrupt() which increments generation and sets Interrupt. - Add generation field to ttsRequest and stamp new requests with generation.Load() when enqueuing. - Skip processing queued requests whose generation is older than the current generation in processAudio(), instead of draining the whole queue. - Remove the previous drainQueue implementation. - Update ASR input (asr_common) to call tts.RequestInterrupt() when TTS should be interrupted. This prevents stale queued utterances from playing after an interrupt (e.g., user speech) while keeping the queue intact for newer requests. * Log websocket message payloads as text or bytes Enhance ws client logging in readLoop: instead of logging only the byte length, log the actual payload. Text messages are logged as strings, and non-text messages are logged as byte slices (using zap.ByteString). Message type field and onMessage behavior are unchanged; this improves debugging visibility of incoming payloads. * Remove Microsoft greeting config; update Unitree Delete the Microsoft Build greeting config and simplify environment example. Removed config/greeting_microsoft_build.json5 and trimmed .env.example (removed several unused integration placeholders and URID). Updated config/unitree_g1_conversation.json5: changed default agent_name from Pam to Iris, removed the approaching mode, expanded allowed physical actions, enabled TTS interrupt for ASR input, replaced the greeting_conversation action with a speak action using elevenlabs_tts (added voice_id), cleared lifecycle hooks, and reset transition_rules to an empty list. These changes streamline the conversation config and remove event-specific settings. 
* Add new arm action enum values Extend ArmAction.EnumValues with additional gesture/action identifiers to support more arm behaviors: come_closer, flexible, hands_up, heart, push, rotate_hands, salute, shrug, speack_action_extended, and speak_action. This expands the set of supported actions for the ARM G1 plugin (note: speack_action_extended retains the existing spelling from the diff). * Handle interrupts while finishing ffplay Add interrupt handling during TTS playback teardown. After synthesis, processAudio now checks for an Interrupt and calls handleInterrupt(). finishPlayback was rewritten to poll (50ms) for global Interrupts while waiting for ffplay, killing the ffplay process early if interrupted and preserving the existing 10s timeout. Also ensure p.ffplay is cleared on exit. This makes external interrupts terminate ffplay promptly and avoids leaving the process reference set. * Add model param and escape API key in Google ASR Expose an optional `model` config for Google ASR (regular and RTSP) and include it as a query parameter on the WebSocket URL when set. Also URL-escape the `api_key` when constructing the WS endpoint to avoid malformed URLs. Updated the API docs to list the new `model` parameter. * Require ASR transcripts to have 3+ words Tighten acceptASRTranscript heuristic: change the non-CJK minimum word count from >1 to >2 (i.e., require at least three words). This reduces short or ambiguous transcripts being accepted and cuts down on noise; CJK handling remains unchanged. * Update Unitree G1 identity, gestures, and TTS Change default robot name from Pam to Iris and streamline identity text. Expand allowed physical actions and add detailed behavior guidelines/gesture mappings to improve interaction (e.g., shake_hand, face_wave, salute, heart, shrug, come_closer, rotate_hand, flexible, speak_action(_extended), etc.). Add a new agent input entry (arm_g1, llm_label: robot_action, connector: zenoh). Add "model: \"long\"" to the TTS configuration. 
Also simplify some conversation prompt content and remove prior event/company-specific guidance. * Migrate documentation to Golang (#2596) * update readme for Golang migration - initial commit * update contribution.md * update config.md * update input.md * update introduction.md * update new_mode.md for Golang migration * update example docs files for Golang migration * update intro and get started docs for Golang migration * update config and input docs for Golang migration * update llm and action docs for Golang migration * update project structure docs for Golang migration * update trouble shooting guide for Golang migration * update remaining docs for Golang migration * updated docs * updated make commands and steps in readme and getting started * Adjust ASR tests for three-word transcript Update ASR unit tests to expect a three-word transcript "hello there world" instead of the previous two-word string. In plugins/inputs/elevenlabs_asr_test.go: add a "three english words" case (accepted) and change the "two english words" case to be rejected; update the committed message assertion to match the new three-word transcript. In plugins/inputs/google_asr_test.go: update the expected parsed reply to "hello there world" and ensure speech timing state is validated accordingly. These changes align test expectations with updated transcript handling logic. * Add GitHub Actions binary release workflow Add a new .github/workflows/binary-release.yml workflow to build and publish OM1 binaries. The workflow supports workflow_dispatch inputs (version, publish) and automatic behavior for tag pushes and nightly builds, sets ZENOH_C_VERSION, caches zenoh-c, and builds for linux-amd64, linux-arm64, darwin-arm64, darwin-amd64 and windows-amd64. Artifacts are packaged (tar.gz on Unix, zip on Windows) with bundled zenoh-c libraries and adjusted rpaths (patchelf / install_name_tool), checksums are generated, and artifacts are uploaded. 
When publish is enabled the job flattens artifacts, manages a nightly tag, and creates/updates a GitHub Release with the built files. * Update CI push trigger to 'go' branch Change the GitHub Actions binary-release workflow to trigger on pushes to the 'go' branch instead of 'main'. Tag-based triggers and manual workflow_dispatch remain unchanged. Aligns the release workflow with the repository's branch naming. * Switch macOS runner and remove Windows build job Update .github/workflows/binary-release.yml to use the macos-15-intel runner for the darwin-amd64 job (was macos-13). Remove the entire Windows build job (build-windows) including MSYS2 setup, zenoh-c download, build and packaging steps, and adjust the release job dependencies to no longer require build-windows. * Update nightly release name Change the release name used by softprops/action-gh-release for nightly builds from 'Nightly (latest main)' to 'Development Build' in .github/workflows/binary-release.yml. No other workflow behavior was modified. * Format README note and link MIT license Update README.md for clarity and readability: convert the plain Note line into a markdown admonition ([!NOTE]) for consistent styling, and replace repeated plain-text MIT License mentions with links to the project's LICENSE file (./LICENSE) to make it easier for readers to view the full license. * Condense README license section Replace a verbose, repetitive paragraph describing the MIT License with a concise single-line reference to the LICENSE file in README.md to reduce verbosity and duplication. * Add release environment and summary step Set job environment name and url based on the release version (nightly -> development, otherwise production) and wire the release URL to the environment. Add an id to the GitHub release step so its outputs can be referenced. Append a new step that writes a formatted release summary to $GITHUB_STEP_SUMMARY including release URL, channel, commit SHA and listed release assets. 
* Use 'staging' env for nightly releases Update .github/workflows/binary-release.yml to set environment.name to 'staging' when needs.setup.outputs.version == 'nightly' (previously 'development'), falling back to 'production' otherwise. This routes nightly release artifacts to the staging environment. * Add parallel asr (#2604) * Migrate Riva to Golang * Add parallel ASR * Remove testing functions * Fix merge conflicts * Shorten comments * Run make fmt * Optimize ASR folder structure * Rename ASR model to provider and refactor aggregator to sensor core (#2606) * Rename ASR model->provider and refactor aggregator Replace the 'Model' identifier with 'Provider' across ASR configs, streams, metrics, logs, and tests. Refactor asrAggregator into asrSensorCore (renaming constructor/newAggregator to newSensorCore and updating receivers and methods). Update transcriberStream to use provider and adjust onTranscript signature, metric labels, logging keys, and parallel-ASR dedup state (lastModel -> lastProvider). Also include minor comment and formatting tweaks. * Add VSCode Go settings and stop ignoring .vscode Remove .vscode from .gitignore and add .vscode/settings.json to commit VS Code Go configuration. The new settings configure go test flags (-p 8, -v), set CGO include/lib paths and runtime library paths to the local .zenoh-c directory, and enable the "integration" build tag so VS Code can build and run integration tests that rely on the native zenoh-c library. --------- Co-authored-by: openminddev <147775420+openminddev@users.noreply.github.com> * Update README (#2608) * Migrate Riva to Golang * Add parallel ASR * Remove testing functions * Fix merge conflicts * Shorten comments * Run make fmt * Optimize ASR folder structure * Rename ASR model->provider and refactor aggregator Replace the 'Model' identifier with 'Provider' across ASR configs, streams, metrics, logs, and tests. 
Refactor asrAggregator into asrSensorCore (renaming constructor/newAggregator to newSensorCore and updating receivers and methods). Update transcriberStream to use provider and adjust onTranscript signature, metric labels, logging keys, and parallel-ASR dedup state (lastModel -> lastProvider). Also include minor comment and formatting tweaks. * Add VSCode Go settings and stop ignoring .vscode Remove .vscode from .gitignore and add .vscode/settings.json to commit VS Code Go configuration. The new settings configure go test flags (-p 8, -v), set CGO include/lib paths and runtime library paths to the local .zenoh-c directory, and enable the "integration" build tag so VS Code can build and run integration tests that rely on the native zenoh-c library. * Add Go vs Python feature comparison to README Introduce a new "Go vs. Python Feature Comparison" section in README.md that documents current parity between the Go and Python runtimes. Adds a capabilities table showing which features are available or under development in Go (hardware connectors, VLMs, sensors, messaging, simulators, full autonomy, etc.), plus a note recommending the Python runtime for features still marked as under development and links to the Python runtime and contributing guidance. --------- Co-authored-by: Shicai He <94800998+shicaih@users.noreply.github.com> * Add VLM video stream provider and utils Introduce a new internal/providers/vlm package that implements video capture and utilities. Adds Frame (with custom JSON/base64 marshaling), streamBase (lifecycle, buffering, drop counting), and helpers splitJPEGStream and jpegQScale. Implements VideoStream (camera capture via ffmpeg) and VideoRTSPStream (RTSP capture with reconnect logic and ffmpeg arg builders). Adds video device enumeration for Linux and macOS (avfoundation) and unit tests covering JPEG splitting, qscale mapping, Frame JSON, stream lifecycle, and defaults. 
This enables capturing MJPEG frames from local devices or RTSP sources and provides safe, buffered delivery to consumers. * Add VLM input plugin and response latency metrics Introduce a Visual Language Model (VLM) input plugin (camera + RTSP) with a vision client, sensor implementation, Gemini defaults and unit tests; register the plugin in inputs. Add Prometheus VLM metrics and a generic RecordResponseLatency helper in internal/metrics and switch existing OpenAI-compatible LLM providers to use it (removing duplicated latency/logging code). Improve video capture handling: run-restart loop for VideoStream, lower default JPEG quality, add camera retry delay and util.Sleep usage, and minor RTSP stream cleanup. Update README and conversation config to surface VLMGemini support. * Update README VLM support status Clarify Visual Language Models (VLM) support in the README: update the table note to indicate OpenAI and Gemini VLMs are supported (removing the previous note about lack of Go support). This aligns the documentation with current capabilities. * Add VLM describer, latest frame cache & greeting Introduce an internal VLM describer (internal/providers/vlm) to call vision chat-completions with optional image attachment and record metrics. Add a singleton LatestFrame provider (internal/providers/latest_frame.go) with tests to store and retrieve the most-recent JPEG frame and a freshness check. Integrate vision-based greeting into the greeting hook: attempt a vision describe using the latest frame (with fall back to text-only LLM), and add related defaults and helpers. Add util.FirstNonEmpty and its tests, refactor the vlm input to use the new describer and to populate the LatestFrame, and remove the old vlm client implementation. * Remove DecodeFormat and add RTSPURL default Remove the DecodeFormat field and its default from the VideoRTSPStream implementation and tests (internal/providers/vlm). 
Update VideoRTSPStreamConfig and NewVideoRTSPStream to no longer handle decode format. Add a plugin-level default RTSP URL and ensure NewRTSPSensor fills cfg.RTSPURL when empty (plugins/inputs/vlm), and adjust the constructor call to match the simplified config. Update tests to drop the DecodeFormat assertion. * Track pending TTS requests and wait for playback Add an atomic pending counter and Busy() helper so Busy reflects queued-but-unplayed speech. Increment pending when requests are enqueued and decrement when handled. Refactor player loop into handleRequest to centralize synthesis/playback, ensure Speaking is set/cleared reliably, handle pre-roll silence, errors and interrupts, and require ffplay availability. In greeting_conversation: introduce ttsPollInterval, use tts.Busy() instead of fragile flags, remove pendingFinishedUpdate, add switchWhenTTSDone to wait (with timeout) for TTS to drain before switching modes, and simplify waitingOnTTS logic. These changes prevent mode switches while queued TTS remains and add a timeout to avoid stalling. * Add GreetingStatus sensor and final-turn guidance Add a new GreetingStatus input sensor and expose final-turn guidance from the greeting state machine. - config/greeting_conversation.json5: register the new GreetingStatus input in the conversation config. - internal/providers/greeting_conversation_state.go: add finalTurnGuidance constant and an EndingGuidance() method (mutex-protected) that returns guidance when the conversation is concluding or about to hit max turns. - plugins/inputs/greeting_status.go: new sensor that registers as "GreetingStatus", retrieves the GreetingConversationStateMachineProvider, and exposes the EndingGuidance via FormattedLatestBuffer(). Other sensor methods are present as no-ops/stand-ins. This enables the runtime to surface a short LLM guidance for the final exchange so the assistant can produce a brief, warm goodbye and mark the conversation as finished. 
* Add VLMGeminiRTSP and adjust ending logic Register VLMGeminiRTSP in the greeting conversation config (added to two component lists). Adjust EndingGuidance in the state machine: update the comment, remove special-case checks for finished/concluding states, and change the trigger condition to only return finalTurnGuidance when turnCount+1 > maxTurnCount (tightens the ending logic/addresses an off-by-one and redundant state handling). * Remove three persona IDs from greeting config Delete rubail, samantha, and david persona ID entries from config/greeting_conversation.json5. This cleans up the persona mapping in the greeting conversation configuration by removing these (presumably obsolete or unused) entries. * Add VLMGeminiRTSP to unitree convo config Update config/unitree_g1_conversation.json5: remove the `model: "long"` line from the existing block and append a new object `{ type: "VLMGeminiRTSP" }` to the handlers array. This enables integration of a Gemini RTSP VLM entry in the conversation configuration. * Add go2 odom zenoh provider and CDR helpers Introduce a Unitree GO2 odometry Zenoh provider and tests, plus CDR (de)serialization helpers. Adds internal/providers/unitree/go2/odom_zenoh.go implementing OdomZenohProvider that subscribes to a PoseStamped topic, decodes CDR-encoded payloads, computes pose/yaw/movement/body state and exposes a Position snapshot; and internal/providers/unitree/go2/odom_zenoh_test.go with unit tests for deserialization, quaternion->euler and odom processing. Also extends internal/zenoh/cdr.go with ReadFloat64LE and AppendFloat64LE utilities and clarifies AppendCDRString behavior. * Add Unitree Go2 odom sensor and format tweaks Register and implement a new UnitreeGo2Odom input sensor (with tests) that reads odometry from the unitree/go2 provider and produces human-readable messages. Update plugins/inputs to import the unitree/go2 plugin so it is built. 
Standardize message formatting across inputs: conversation_history, face_presence, and vlm now include the descriptor and use a consistent quoted format; tests adjusted accordingly. Minor whitespace/clarity tweaks in internal/providers/unitree/go2/odom_zenoh.go (no behavior changes). * Add zenoh probe tool and fix go2 odom decoding Add a new zenoh diagnostic tool and Makefile target: introduce cmd/zenohprobe (standalone probe for subscribing to zenoh keys, hex-dumping payloads and optionally decoding PoseStamped) and expose it via a new `make probe` target. Update internal providers/unitree/go2/odom_zenoh.go: change default odom topic to "odom", add a rate-limited debug log (15s) to reduce spamming, and improve CDR decoding for the Go2 odometry payload by accounting for child_frame_id and updating comments to reflect nav_msgs/Odometry layout. Minor safety checks and alignment handling included. * Update go2 odom logs: add z and body_attitude Remove the verbose Info log emitted when movement is detected to reduce log noise, and add the z coordinate and body_attitude field to the periodic debug log so diagnostics include full pose information. * Add paths provider and Paths sensor (#2611) Introduce a PathsProvider that subscribes to the om/paths topic via zenoh, deserializes CDR-encoded Paths messages, classifies path indices into movement options (turn left, move forwards, turn right, move back) and generates a natural-language assessment string. Expose methods to retrieve the latest lidar string, valid paths and movement options, and add graceful stop/cleanup. Add a PathsSensor plugin that registers as "Paths", polls the provider on a fixed cadence, converts raw assessments into timestamped messages, maintains a bounded history and formats the latest buffer for prompts. Unit tests cover payload deserialization and movement-string generation. 
* Add Unitree Go2 config and update Paths comments Add a new autonomy config for Unitree Go2 (config/unitree_go2_autonomy.json5) defining a 'Bits' agent: system prompts, inputs (Unitree odom, Google ASR, VLM Gemini RTSP, Paths), Gemini LLM settings, and actions for TTS and emotion. Also refine comments in plugins/inputs/paths.go to clarify RawToText now documents appending assessments to the message buffer and FormattedLatestBuffer description is simplified to state it returns the most recent buffered assessment. * Add Unitree Go2 autonomy plugin and rename arm connector (#2612) * Add Unitree Go2 autonomy and rename arm connector Introduce a new Unitree Go2 autonomy plugin and related movement primitives, and standardize connector naming/logging. - Add a full autonomy connector for Unitree Go2 (plugins/actions/unitree/go2/autonomy): movement planning, cmd_vel serialization, AI status request/response, guard watcher, and unit tests. - Register and wire the new autonomy and unitree arm plugins in plugins/actions/actions.go. - Rename/move the G1 arm connector from arm_g1 to unitree_g1_arm (including tests) and update config references in greeting/unitree configs. - Add PathsProvider Movement() and Movement struct (internal/providers/paths.go) to expose derived path options programmatically. - Add util.StringFrom helper (internal/util/convert.go) for safe string extraction from decoded JSON. - Standardize logger contexts (logger.Named) and clean up log message prefixes across multiple connectors/background tasks. These changes add autonomous movement capabilities, improve APIs for path/movement data, and align connector names used in configs and registration. * dualLLM: add logger fallback and use it Add a logger() helper to dualLLM that returns the existing logger or a default named "DualLLM". 
Replace direct d.log references with d.logger() at the sub-call failure, race completion, and quality-evaluation fallback sites to avoid nil logger panics and ensure consistent logger naming. * Rename move_go2_autonomy to unitree_go2_autonomy Update config/unitree_go2_autonomy.json5: changed the action entry name from "move_go2_autonomy" to "unitree_go2_autonomy" to align the config entry with the file/connector naming and avoid naming inconsistencies. * Add stall detection and progress tracking for moves Introduce progress-based convergence checks and timeouts for move commands. Shorten tick interval (100ms -> 50ms) and add stallTimeout, commandTimeout, turnProgressEps and driveProgressEps constants. Extend moveCommand with timing/progress fields (started, lastImprove, bestGap) and add markPhase/recordProgress helpers. Initialize progress tracking in queue and replace legacy movementAttempts/gapPrevious counters with recordProgress checks in tickTurn and tickDrive, aborting on stall or command timeout. Adjust turn/drive logic accordingly and remove obsolete counters from moveConnector. * Remove redundant comment in move.go Delete an unnecessary comment in plugins/actions/unitree/go2/autonomy/move.go that described an "overshoot" nudge; no functional changes were made to the logic, just cleaned up the source for clarity. * Add Cloud Sim support (#2613) * Add Unitree Go2 autonomy and rename arm connector Introduce a new Unitree Go2 autonomy plugin and related movement primitives, and standardize connector naming/logging. - Add a full autonomy connector for Unitree Go2 (plugins/actions/unitree/go2/autonomy): movement planning, cmd_vel serialization, AI status request/response, guard watcher, and unit tests. - Register and wire the new autonomy and unitree arm plugins in plugins/actions/actions.go. - Rename/move the G1 arm connector from arm_g1 to unitree_g1_arm (including tests) and update config references in greeting/unitree configs. 
- Add PathsProvider Movement() and Movement struct (internal/providers/paths.go) to expose derived path options programmatically. - Add util.StringFrom helper (internal/util/convert.go) for safe string extraction from decoded JSON. - Standardize logger contexts (logger.Named) and clean up log message prefixes across multiple connectors/background tasks. These changes add autonomous movement capabilities, improve APIs for path/movement data, and align connector names used in configs and registration. * dualLLM: add logger fallback and use it Add a logger() helper to dualLLM that returns the existing logger or a default named "DualLLM". Replace direct d.log references with d.logger() at the sub-call failure, race completion, and quality-evaluation fallback sites to avoid nil logger panics and ensure consistent logger naming. * Rename move_go2_autonomy to unitree_go2_autonomy Update config/unitree_go2_autonomy.json5: changed the action entry name from "move_go2_autonomy" to "unitree_go2_autonomy" to align the config entry with the file/connector naming and avoid naming inconsistencies. * Add stall detection and progress tracking for moves Introduce progress-based convergence checks and timeouts for move commands. Shorten tick interval (100ms -> 50ms) and add stallTimeout, commandTimeout, turnProgressEps and driveProgressEps constants. Extend moveCommand with timing/progress fields (started, lastImprove, bestGap) and add markPhase/recordProgress helpers. Initialize progress tracking in queue and replace legacy movementAttempts/gapPrevious counters with recordProgress checks in tickTurn and tickDrive, aborting on stall or command timeout. Adjust turn/drive logic accordingly and remove obsolete counters from moveConnector. * Remove redundant comment in move.go Delete an unnecessary comment in plugins/actions/unitree/go2/autonomy/move.go that described an "overshoot" nudge; no functional changes were made to the logic, just cleaned up the source for clarity. 
* Add cloud session and hybrid zenoh backend Introduce a cloudsession package (client, session, topics) implementing a WebSocket-based cloud broker client with binary/JSON frames and tests. Add zenoh backend abstractions: a Session interface, concrete local zenoh backend, cloud backend adapter, and a hybrid session that routes IsCloudTopic topics to the cloud broker. Wire default zenoh options via SetDefaultOptions and use SystemConfig.UseSim/APIKey in runtime to enable the hybrid/cloud mode. Update various providers and plugins to depend on the zenoh Session/Publisher/Subscriber interfaces instead of concrete types. * Generalize meta map and include UseSim flag Change buildMeta to return map[string]any (instead of map[string]string) and update its comment to reference system-level values. Add the "use_sim" boolean to the metadata when SystemConfig.UseSim is true. Update addMeta to accept meta as map[string]any and adjust tests to the new type, adding a test that verifies use_sim is omitted when false. * Add simulator note and use_sim config Update README to consolidate simulator status: add a Simulators row noting Gazebo/Isaac Sim support and two Zenoh backend types, and remove the duplicate/older entry. Add use_sim setting to unitree_go2_autonomy.json5 (env var USE_SIM, default false) so the runtime can be toggled to use simulators. * Add Unitree Go2 location action connector Introduce a new actions connector for Unitree Go2 that saves/records the robot's current location via an HTTP POST to a map/orchestrator endpoint and announces success via ElevenLabs TTS. Adds configuration parsing with sensible defaults (base URL, map name, TTS defaults, timeout) and a Connector implementation (Connect, Tick, Stop). Registers the interface and the action in plugins/actions/actions.go and includes comprehensive unit tests for config pars…
openminddev added a commit that referenced this pull request Jun 11, 2026
* Add Go runtime and plugin scaffolding
Introduce a new Go workspace for the om1 project: add Makefile, cmd/main, go.mod and go.sum. Implement core internal packages for runtime functionality and plugins: config loader/types, actions (connectors, orchestrator, schema generation and tests), backgrounds (orchestrator and registry), inputs (sensors and orchestrator), llm interfaces, fuser (prompt fusion + KB), hooks runner, http client, and plugin entrypoints. Includes unit tests for action schema generation and action orchestrator tick/stop behavior. Provides plumbing for registering/loading plugins and basic orchestration patterns (concurrent/sequential/dependency modes).
* Add GoogleASR, WebSocket client, refactor inputs
Add a reconnecting WebSocket client and a new GoogleASR input plugin (PortAudio + WS) and include a conversation.json5 example. Refactor the inputs API: Sensor interface changed (Listen/Poll/RawToText/FormattedLatestBuffer/Stop), Message shape renamed, and Orchestrator no longer stores buffers internally (uses sensors' FormattedLatestBuffer). Update Fuser.Fuse to accept sensor buffer slices. Update runtime to pass sensor buffers and simplify LLM orchestration: Orchestrator now wraps 'llm' field and manages history; call sites adjusted. Rename and consolidate speak action plugin to speak/elevenlabs_tts (types and logging names updated) and add actions package entrypoint; remove old move and speak packages. Update telemetry API: IOProvider.RecordTick signature simplified. Add go.mod dependencies for portaudio and gorilla/websocket. Improve JSON5 loader to quote unquoted keys. Note: these changes introduce several breaking API changes (Sensor, Fuser.Fuse, RecordTick, action connector types) that require updates across callers.
* Add zenoh-c support and TTS speaking flag
Makefile: add automatic download/installation of zenoh-c, set CGO flags and platform-specific DYLD/LD env handling, and propagate the library path to build, run, lint, test, fmt, vet, and dependency targets; expand help text and fix install target.
Providers: add a new atomic Speaking flag (internal/providers/tts_state.go) to indicate when TTS is streaming audio.
ElevenLabs TTS: set providers.Speaking true before synthesis/playback and false after to mark active playback; remove an extra enqueue log line.
Google ASR: skip audio capture while providers.Speaking is set to avoid ASR picking up playback audio.
These changes ensure the zenoh-c runtime library is available during development and CI, and avoid input capture interfering with TTS playback.
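As an illustration of the Speaking flag described above, the sketch below gates audio capture on an atomic boolean. Only the name Speaking and its set-before/clear-after behavior come from the commit message; the use of atomic.Bool and the playback and captureChunk helpers are assumptions for this example.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Speaking stands in for the flag in internal/providers/tts_state.go.
// Its concrete type there is not stated; atomic.Bool is an assumption.
var Speaking atomic.Bool

// playback (hypothetical) marks the flag for the duration of synthesis/playback.
func playback(synthesize func()) {
	Speaking.Store(true)
	defer Speaking.Store(false)
	synthesize()
}

// captureChunk (hypothetical) forwards a chunk to ASR unless TTS is playing.
// It reports whether the chunk was forwarded.
func captureChunk(chunk []byte, forward func([]byte)) bool {
	if Speaking.Load() {
		return false // drop audio that would contain our own playback
	}
	forward(chunk)
	return true
}

func main() {
	forwarded := 0
	forward := func([]byte) { forwarded++ }

	playback(func() { captureChunk([]byte{1}, forward) }) // dropped
	captureChunk([]byte{2}, forward)                      // forwarded
	fmt.Println(forwarded)                                // prints 1
}
```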
* Ignore Zenoh runtime directory in .gitignore
Add an entry to .gitignore to exclude the Zenoh runtime/cache directory (go/.zenoh-c/) and a comment marker. This prevents local Zenoh state from being accidentally committed.
* Refactor plugins, schema exports, and logging
Expose and refactor internal plugin/schema APIs, tighten logging, and remove unused test code. Key changes:
- Actions: clarify AgentAction fields, introduce Factory/Register/Load patterns for connectors.
- Schema: export InterfaceSpec, InterfaceRegistry and schema helpers (BuildSchema, BuildPropertySchema, KindToJSONType) and wire BuildSchemaForAction to use them.
- Backgrounds: add Factory registry, Register and Load helpers and an UnknownPluginError type.
- Providers: remove the HistoryManager/related LLM history code and unused imports.
- LLM/Runtime/Inputs/Plugins: remove debug prints, adjust buffering and minor formatting/locking fixes.
- Google ASR: add Time to ASRMessage, reduce stats ticker interval (30s→15s), improve latency logging, and tidy audio packaging/stream handling.
- Deleted obsolete tests (actions/orchestrator_test.go, actions/schema_test.go).
These changes prepare the codebase for external use of schema/registry helpers, improve observability, and clean up dead code.
* Add zenoh session, global logger, and arm_g1 plugin
Introduce a global logger package and replace ad-hoc zap creation across plugins: internal/logger provides Set/Get for a shared *zap.Logger; main sets the global logger and passes the logger into runtime. Add a CGo zenoh wrapper (internal/zenoh/session.go) that exposes Open/Put/Close to publish raw bytes via zenoh-c (requires zenoh-c headers/libs). Add a new arm_g1 Zenoh action plugin (plugins/actions/arm_g1_zenoh) that registers the arm_g1 interface and publishes Unitree G1 arm requests to a Zenoh topic, including a CDR little-endian serializer for Unitree requests. Wire the new plugin by importing it in plugins/actions/actions.go. Update existing plugins (emotion, speak/elevenlabs_tts, inputs/google_asr) to use logger.Get() instead of creating new zap instances and add an info log when enqueueing TTS text.
* Use ./config paths in go/Makefile
Change config path references in go/Makefile from ../config/*.json5 to ./config/*.json5 for the run, dev, and list-configs targets so configuration files are resolved relative to the go directory when invoking these commands.
* Switch to zenoh-go client (remove cgo)
Replace the old cgo-based zenoh-c wrapper with the pure-Go github.com/eclipse-zenoh/zenoh-go client. Implement a new Session using zenoh-go (Open with optional endpoint, Put, Close) and add a Publish helper. Simplify the arm_g1/zenoh connector to use the new session API (no config parsing, rely on default), and add debug log-level to the Makefile dev run. go.mod was updated to include the zenoh-go dependency and related indirect changes.
* Add context cancellation to ws client and fix locks
ws: add context with cancel to Client, use DialContext so in-progress TLS handshakes are interrupted on Close, and call cancel() from Close. Make Close idempotent for stopCh. Improve read/write loops: early-return on stopCh, log read loop stop, treat normal close as non-error, and fix conn mutex handling in writeMessage with deferred unlock. google_asr: Fix Stop() to avoid calling Stream methods while holding the sensor mutex by capturing and nil-ing paStream under lock, unlocking, then stopping/closing the stream. These changes improve shutdown correctness and avoid deadlocks during teardown.
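A minimal sketch of the idempotent Close described above, assuming sync.Once is used to guard the channel close (the commit does not say which mechanism the real client uses). The Client type here is reduced to the fields needed for the illustration.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Client is a reduced, hypothetical stand-in for the ws client.
type Client struct {
	ctx    context.Context
	cancel context.CancelFunc
	stopCh chan struct{}
	once   sync.Once
}

func NewClient() *Client {
	ctx, cancel := context.WithCancel(context.Background())
	return &Client{ctx: ctx, cancel: cancel, stopCh: make(chan struct{})}
}

// Close cancels the context (which would interrupt a dial made with
// DialContext) and closes stopCh exactly once, so repeated calls are safe.
func (c *Client) Close() {
	c.once.Do(func() {
		c.cancel()
		close(c.stopCh)
	})
}

// stopped reports whether Close has been called, without blocking.
func (c *Client) stopped() bool {
	select {
	case <-c.stopCh:
		return true
	default:
		return false
	}
}

func main() {
	c := NewClient()
	c.Close()
	c.Close() // no-op; without the guard this would panic on a closed channel
	fmt.Println(c.stopped(), c.ctx.Err())
}
```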
* Add Unitree G1 config; update loader & plugin
Add a new Unitree G1 conversation JSON5 config (go/config/unitree_g1_conversation.json5) providing a greeting mode, agent settings, actions and connectors. Fix stripJSON5 in go/internal/config/loader.go to remove backslash-newline continuations before splitting and preserve trailing-comma cleanup. Update plugin imports and package: change actions import from arm_g1_zenoh to arm_g1 and rename file/package go/plugins/actions/arm_g1_zenoh/... to go/plugins/actions/arm_g1/zenoh.go with package name adjusted accordingly.
* Add zenoh Publisher, use in arm_g1, fix ticks
Introduce a Publisher wrapper in the zenoh session package with DeclarePublisher/Put/Drop and adjust Session.Close to call Close(nil). Update arm_g1 connector to open a zenoh session with optional endpoint, declare a publisher for sport requests, use the publisher for puts, log/handle failures gracefully, and drop the publisher on Stop. Fix CDR alignment, buffer sizing and padding in serializeUnitreeRequest and standardize the JSON parameter formatting to match the Python connector. Convert Tick implementations (emotion, arm_g1, elevenlabs TTS) to block on ctx.Done() and simplify Stop to properly cleanup publishers and sessions.
* Add Zenoh CDR helpers and ASR Zenoh publish
Introduce CDR serialization helpers and integrate Zenoh publishing for ASR, plus add WS reconnect behavior.
- Add internal/zenoh/cdr.go with AppendInt32LE, AppendUint32LE, AppendInt64LE and AppendCDRString helpers for little-endian CDR encoding.
- Use the new zenoh helpers in plugins/actions/arm_g1 to replace local byte-append helpers.
- Integrate Zenoh into GoogleASR: add ZenohEndpoint config, open a session and declare a publisher during sensor init, publish serialized ASR text (serializeASRText) on buffer flush, and clean up publisher/session on stop. serializeASRText builds a CDR LE payload including a timestamp, UUID frame_id, and the text.
- Update internal/ws client Connect to support automatic reconnect when cfg.Reconnect is true, retrying every 5s and honoring context cancellation.
- Add usage of github.com/google/uuid in ASR serialization.
* Switch emotion action connector to zenoh
Replace the old log-based emotion action with a Zenoh-backed connector. Configs (conversation.json5 and unitree_g1_conversation.json5) now reference connector "zenoh" for the emotion action. The previous emotion/log connector implementation was removed and a new emotion/zenoh connector was added which opens a Zenoh session, declares a publisher on topic "om/avatar/request", and publishes serialized AvatarFaceRequest payloads (CDR little-endian encoding with timestamp, request ID, code, and face_text). The new connector gracefully handles Zenoh unavailability, logs publish results, and cleans up publisher/session on Stop.
* Serialize face_text with explicit length and NUL
Replace zenohsession.AppendCDRString with manual serialization: append a NUL (0x00) to faceText, write the byte length as a little-endian uint32, then append the bytes. This ensures the CDR string is encoded with an explicit length and null terminator for compatibility with the Zenoh consumer.
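The encoding described above (NUL appended, then a little-endian uint32 byte length, then the bytes) can be sketched as follows. The function name appendLengthPrefixedString is hypothetical, and alignment padding before the length field is deliberately left out of this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendLengthPrefixedString (hypothetical helper) writes s as:
//   uint32 little-endian length (including the trailing NUL), bytes, 0x00.
func appendLengthPrefixedString(buf []byte, s string) []byte {
	data := append([]byte(s), 0x00)
	buf = binary.LittleEndian.AppendUint32(buf, uint32(len(data)))
	return append(buf, data...)
}

func main() {
	fmt.Printf("% x\n", appendLengthPrefixedString(nil, "hi"))
	// prints: 03 00 00 00 68 69 00
}
```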
* Add Emotion action input and export connector
Introduce an Emotion enum and EmotionInput struct for the emotion action, including EnumValues enumerating supported expressions (happy, confused, curious, excited, sad, think). Register the emotion interface with a descriptive message and register the Zenoh connector. Rename and export constructor from newZenohConnector to NewZenohConnector and update its registration. Also remove redundant comment lines in arm_g1/zenoh.go.
* Fix CDR padding in avatar request serialization
Adjust serializeAvatarRequest to follow CDR alignment rules: align before fields (no trailing padding after request_id since next field is int8) and insert padding before the face_text uint32 length. Implement request_id encoding without post-padding (write length + bytes explicitly), increase buffer capacity, and clarify comments about wire layout. Also remove a redundant comment in arm_g1 zenoh publisher code.
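To illustrate the "align before the field, not after it" rule mentioned above, the sketch below writes an int8 followed by a uint32. The helper names are hypothetical, and offsets are measured from the start of the buffer; any encapsulation header or the real AvatarFaceRequest field layout is out of scope here.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// align pads buf with zero bytes until len(buf) is a multiple of n.
func align(buf []byte, n int) []byte {
	for len(buf)%n != 0 {
		buf = append(buf, 0)
	}
	return buf
}

// appendInt8 needs no padding: a 1-byte field may start at any offset.
func appendInt8(buf []byte, v int8) []byte {
	return append(buf, byte(v))
}

// appendUint32 pads to a 4-byte boundary first, then writes little-endian.
func appendUint32(buf []byte, v uint32) []byte {
	return binary.LittleEndian.AppendUint32(align(buf, 4), v)
}

func main() {
	buf := appendInt8(nil, 7)  // offset 0, one byte
	buf = appendUint32(buf, 5) // three padding bytes, then the value
	fmt.Printf("% x\n", buf)
	// prints: 07 00 00 00 05 00 00 00
}
```

Padding is inserted only when the next field requires it, which is why a string followed by an int8 needs no trailing padding.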
* Add AvatarProvider and zenoh subscriber support
Introduce a new AvatarProvider (go/internal/providers/avatar.go) that manages zenoh publishers/subscribers for om/avatar/request and om/avatar/response, handles CDR (pycdr2-compatible) serialization/deserialization for avatar commands and health checks, and exposes SendAvatarCommand. Extend zenoh session (go/internal/zenoh/session.go) with Subscriber type, DeclareSubscriber and Drop to support incoming messages. Update emotion plugin (go/plugins/actions/emotion/zenoh.go) to use the AvatarProvider singleton instead of managing its own zenoh session/publisher and remove duplicate serialization logic; Stop() is simplified accordingly. The provider handles unavailable zenoh sessions gracefully and replies to STATUS requests with health responses.
* Clean up avatar provider comments and logs
Add a descriptive comment for NewAvatarProvider, simplify the handleRequest comment, and remove debug log statements for health-check requests/responses to reduce log noise. Minor whitespace cleanup; no functional changes to request handling or publishing behavior.
* Add ElevenLabs TTS provider and lifecycle hooks
Introduce a centralized ElevenLabs TTS provider and lifecycle hook support across the runtime.
- Add go/internal/providers/elevenlabs.go: a singleton ElevenLabsProvider that handles HTTP synthesis, persistent ffplay streaming, queueing, silence handling and lifecycle for TTS playback.
- Extend hooks runner (go/internal/hooks/hooks.go): support templated variables, improved command execution with captured stdout/stderr, message handler that forwards messages to the ElevenLabs provider, helper funcs for template formatting and safe string extraction.
- Wire lifecycle hooks into runtime and mode manager (go/internal/runtime/*): global hooks are created and invoked for OnStartup, OnEntry and OnExit with context payloads; time-based transitions fire OnTimeout hooks; Transition now includes reason and transition context.
- Refactor speak action (go/plugins/actions/speak/elevenlabs_tts.go): remove duplicated ffplay/http logic and use the new providers.ElevenLabs singleton, simplifying connector and lifecycle.
- Improve WebSocket client resilience (go/internal/ws/client.go): better logging, non-blocking read/reconnect behavior and a reconnect helper to re-dial when enabled.
- Add example lifecycle_hooks to config (go/config/unitree_g1_conversation.json5) demonstrating on_startup message handler using elevenlabs.
These changes centralize TTS handling, reduce duplicated code, and add lifecycle hook templating and global hook handling to support safe automated startup/transition messages.
* Export constructors and add docs to runtime/plugins
Export and document several constructors and runtime types, and add brief lifecycle comments across runtime and plugin code. Renamed newModeSetup to NewModeSetup and updated runtime to use the exported constructor; added docs for loadComponents, toRuntimeConfig, buildMeta, addMeta, collectSchemas, mergePrompt, and toolCallsToMaps. Introduced ModeState/ModeManager comments and a NewModeManager constructor doc. In plugin connectors: export and rename connector constructors (NewArmG1ZenohConnector, NewEmotionZenohConnector, NewElevenLabsTTS), add Tick/Stop/no-op lifecycle comments, and simplify arm_g1 connector by removing the customActionMap and sending actions directly. Also removed an extraneous package doc comment in ws/client.go. These changes improve API visibility and add documentation for maintainability; note the behavioral change in arm_g1/zenoh where actions are no longer remapped.
* Refactor orchestrators, plugins, and logger APIs
Introduce clearer constructors, helper functions, and plugin registries across packages and update callers accordingly. Key changes: move logger builder to internal/logger (BuildLogger) and use it from main; rename fuser.New -> NewFuser and hooks.New -> NewHooks and update runtime callers; add doc comments and small API helpers for actions/inputs/backgrounds/llm (Call/Result types, registry Factory types, Register/Load helpers, UnknownPluginError.Error implementations); add orchestration helper methods (runConcurrent/runSequential/runWithDeps, SetSchemas/FunctionSchemas/Reset) and small runtime adjustments. These changes improve naming consistency, visibility of helpers, plugin loading ergonomics, and overall code readability without changing core behavior.
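The Factory/Register/Load pattern with an UnknownPluginError, as named above, might be shaped like this sketch. The type names follow the commit message, but the signatures, the Plugin interface and the error text are assumptions for illustration.

```go
package main

import "fmt"

// Plugin is a reduced, hypothetical plugin interface.
type Plugin interface{ Name() string }

// Factory builds a plugin from its decoded config (assumed signature).
type Factory func(cfg map[string]any) (Plugin, error)

// UnknownPluginError is returned by Load for names that were never registered.
type UnknownPluginError struct{ Name string }

func (e UnknownPluginError) Error() string { return "unknown plugin: " + e.Name }

var factories = map[string]Factory{}

// Register associates a name with a factory, typically from an init function.
func Register(name string, f Factory) { factories[name] = f }

// Load looks up the factory by name and constructs the plugin.
func Load(name string, cfg map[string]any) (Plugin, error) {
	f, ok := factories[name]
	if !ok {
		return nil, UnknownPluginError{Name: name}
	}
	return f(cfg)
}

type echo struct{}

func (echo) Name() string { return "echo" }

func main() {
	Register("echo", func(map[string]any) (Plugin, error) { return echo{}, nil })

	p, _ := Load("echo", nil)
	fmt.Println(p.Name())

	_, err := Load("missing", nil)
	fmt.Println(err)
}
```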
* Add OpenAI helpers and Gemini latency logging
Introduce utility functions and logging to the LLM plugin common code: add parseOpenAIResponse, buildMessages, remarshal helper, and logResponseLatency to capture latency and proxy/upstream headers. Import net/http, time, logger and zap for these features. Export and rename Gemini constructor to NewGemini, update llm registration accordingly, export FunctionSchemas and SetSchemas, and wire request timing + logResponseLatency into Gemini's Call to record response metrics. These changes improve observability and provide small API/visibility refactors for the Gemini LLM integration.
* Add context-aware mode transitions & PortAudio refcount
Introduce context-aware mode transitions and a safe PortAudio reference counter.
- Config: change default mode to "conversation", add an "approaching" mode and a set of transition_rules to unitree_g1_conversation.json5; add go/config/memory to .gitignore.
- Types: extend TransitionRule with ContextConditions to support context-aware transitions.
- ModeManager: subscribe to a Zenoh topic (om/mode/context) for best-effort user context updates, store userContext, add CheckTransitions with ordered checks (time-based, context-aware, input-triggered), and implement helpers to evaluate context conditions and priorities. Add Close and UpdateUserContext helpers.
- Runtime: clone per-mode LLM config before adding metadata and error if no LLM configured; tidy lifecycle/orchestrator comments and ensure manager.Close() on shutdown.
- Audio: add providers/portaudio.go implementing a process-wide reference-counted PortAudio wrapper; integrate it into GoogleASRInput (Acquire/Release, captureDone coordination, safer stream stop/close) to avoid terminating PortAudio while others still use it.
- Misc: add github.com/google/uuid to go.mod; small comment and code cleanups across files.
These changes enable context-driven transitions, prevent concurrent PortAudio termination races, and avoid mutating shared LLM configuration maps.
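A process-wide reference counter of the kind described for PortAudio could be sketched as below. The Acquire/Release names come from the commit message; the refCounted type and its injected initialize/terminate callbacks are assumptions so that the example runs without the PortAudio library.

```go
package main

import (
	"fmt"
	"sync"
)

// refCounted (hypothetical) initializes a shared resource on the first
// Acquire and terminates it on the last Release.
type refCounted struct {
	mu         sync.Mutex
	count      int
	initialize func() error
	terminate  func() error
}

func (r *refCounted) Acquire() error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.count == 0 {
		if err := r.initialize(); err != nil {
			return err // count stays at zero on failure
		}
	}
	r.count++
	return nil
}

func (r *refCounted) Release() error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.count == 0 {
		return nil // unbalanced Release is ignored
	}
	r.count--
	if r.count == 0 {
		return r.terminate()
	}
	return nil
}

func main() {
	inits, terms := 0, 0
	audio := &refCounted{
		initialize: func() error { inits++; return nil },
		terminate:  func() error { terms++; return nil },
	}
	_ = audio.Acquire() // first user initializes
	_ = audio.Acquire() // second user only bumps the count
	_ = audio.Release() // still in use
	_ = audio.Release() // last user terminates
	fmt.Println(inits, terms) // prints 1 1
}
```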
* Add greeting conversation and IO providers
Introduce greeting conversation state machine, IO provider, and related plumbing for TTS/status publishing and background detection. Adds a ConfidenceCalculator-driven GreetingConversationStateMachineProvider and IOProvider singletons, ElevenLabs greeting connector, TTS text normalization and CDR status serialization, and ApproachingPerson background task. Replaces the old slim providers.go, adds util.ToFloat/FloatFrom and updates manager to use it, increments tick counter in runtime, and registers the new plugins in main/actions. Also includes a small, likely accidental edit to google_asr.go.
* Rename speak action and update connector
Change the agent action in unitree_g1_conversation config from "speak" to "greeting_conversation" (name and llm_label) and switch the connector from "elevenlabs_tts" to "greeting_conversation_elevenlabs" to match the plugin. Also remove an extraneous debug log line in greeting_conversation_elevenlabs.Tick to reduce noisy logging.
* Add face presence provider and refactor zenoh
Introduce a FacePresence provider and sensor: new providers/face_presence.go and inputs/face_presence.go implement fetching /who, snapshot shaping, and a sensor that polls and formats presence lines. Add util.Sleep to support context-aware sleeps. Refactor zenoh session handling: make Open use defaults and local-network discovery (OpenWithOptions/openClient/openDiscovery), add a default endpoint constant, and improve logging/error messages. Update dependent code to the new APIs: AvatarProvider no longer takes an endpoint and callers use providers.Avatar(), various plugins (arm_g1, emotion, greeting_conversation, google_asr, approaching_person) now call zenoh.Open() without endpoint and use util.Sleep where appropriate. Remove an unused local sleep helper and drop ZenohEndpoint config field from greeting_conversation. Overall: new face-presence feature plus cleanup and more robust zenoh session management.
* Use in-process ModeContext for transitions
Introduce a ModeContextProvider singleton as an in-process, best-effort bus for user-context updates that drive context-aware mode transitions. Replace previous Zenoh-based context propagation: ModeManager no longer subscribes to the Zenoh topic and its Close() is simplified; runtime now consumes providers.ModeContext().Updates() and uses a new scheduleTransition helper to enqueue mode changes. Update plugins to publish context updates via providers.ModeContext() (ApproachingPerson and greeting_conversation) and adjust imports/cleanup accordingly. Also add ApproachingPerson to the conversation config and adjust transition rules to target conversation mode; remove a reconnect log line from the websocket client.
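One way to read "in-process, best-effort bus" is a buffered channel with a non-blocking send, sketched below. Only Updates() is named in the commit message; the ModeContextBus type, the Publish method and the drop-when-full behavior are assumptions for illustration.

```go
package main

import "fmt"

// ModeContextBus is a hypothetical stand-in for the ModeContextProvider.
type ModeContextBus struct {
	updates chan map[string]any
}

func NewModeContextBus(size int) *ModeContextBus {
	return &ModeContextBus{updates: make(chan map[string]any, size)}
}

// Publish never blocks the caller: if the buffer is full the update is
// dropped and false is returned.
func (b *ModeContextBus) Publish(update map[string]any) bool {
	select {
	case b.updates <- update:
		return true
	default:
		return false
	}
}

// Updates exposes the receive side for the runtime loop.
func (b *ModeContextBus) Updates() <-chan map[string]any {
	return b.updates
}

func main() {
	bus := NewModeContextBus(1)
	fmt.Println(bus.Publish(map[string]any{"person_approaching": true})) // true
	fmt.Println(bus.Publish(map[string]any{"person_approaching": true})) // false, buffer full
	fmt.Println(<-bus.Updates())
}
```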
* Move Go app to repository root; remove Python src
Large repo reorganization: moved the Go project from the go/ subdirectory into the repository root (Makefile, cmd/, internal/, plugins/, go.mod/sum, etc.), preserving file contents. Removed legacy Python src/ tree, many config JSON5 files and tests, and deleted .gitmodules. Updated .pre-commit-config.yaml and .github/copilot-instructions.md as part of the cleanup. This simplifies the repository layout and centralizes the Go application.
* Migrate CI & Docker to Go build with zenoh-c
Replace Python-centric CI and runtime with a Go-focused build pipeline and zenoh-c integration. GitHub Actions workflows updated to setup-go, cache/download zenoh-c, run make build/test/lint (go vet, golangci-lint, go test) and verify the produced binary; many Python/uv/CycloneDDS steps removed. Dockerfile switched to a multi-stage Go builder image that builds the om1 binary, bundles the zenoh-c shared libs, and provides a slimmer runtime image with a simplified entrypoint. Makefile BUILD_DIR path fixed and docker-compose env/command switched to OM1_CONFIG with updated defaults. .gitignore cleaned to remove Python artifacts and track .zenoh-c and build outputs.
* Swap client/discovery fallback logic
Align OpenWithOptions behavior with the updated LocalNetwork comment: when LocalNetwork is true, try connecting to the local router (openClient) and fall back to discovery; when false, try discovery first and fall back to client connect. Also updated the LocalNetwork doc comment and related warning log messages to reflect the corrected semantics.
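The ordering described above can be sketched as a small helper that picks which opener to try first. openWithFallback and the opener type are hypothetical, and a session is represented by a plain string so the example runs without zenoh.

```go
package main

import (
	"errors"
	"fmt"
)

var errUnavailable = errors.New("unavailable")

// opener stands in for openClient/openDiscovery; the string is a fake session.
type opener func() (string, error)

// openWithFallback tries the client connection first when localNetwork is
// true, and discovery first otherwise, falling back to the other on error.
func openWithFallback(localNetwork bool, openClient, openDiscovery opener) (string, error) {
	first, second := openDiscovery, openClient
	if localNetwork {
		first, second = openClient, openDiscovery
	}
	if s, err := first(); err == nil {
		return s, nil
	}
	return second()
}

func main() {
	client := func() (string, error) { return "", errUnavailable }
	discovery := func() (string, error) { return "discovery", nil }

	s, err := openWithFallback(true, client, discovery)
	fmt.Println(s, err) // client fails, discovery is used
}
```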
* Add function hook registry and hooks
Introduce a function-type hook registry and several lifecycle hook implementations. Added internal/hooks/registry.go to register and look up module.function hooks and extended Runner.execute to run "function" hooks via executeFunction. Implemented greeting hooks (greeting_start_hook, greeting_end_hook) and person-follow hooks (start/switch/stop/set_mode) in internal/hooks, including TTS integration and HTTP interactions with vision/follow services. Extracted ElevenLabs config construction into elevenLabsConfigFrom and switched lifecycle code to use it. Exposed GreetingConversation state accessors (TurnCount, MaxTurnCount) in the providers, and applied small cleanups to plugin comments and status serialization comment removal.
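A registry keyed by module.function, as described above, might look like the following sketch. The HookFunc signature, the RegisterHook/LookupHook names and the "greeting" module name are assumptions; only the module.function key shape and the greeting_start_hook name come from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// HookFunc is an assumed signature for a function-type lifecycle hook.
type HookFunc func(ctx map[string]any) error

var (
	hooksMu sync.RWMutex
	hooks   = map[string]HookFunc{}
)

// RegisterHook stores fn under the key "module.function".
func RegisterHook(module, function string, fn HookFunc) {
	hooksMu.Lock()
	defer hooksMu.Unlock()
	hooks[module+"."+function] = fn
}

// LookupHook resolves a "module.function" name to its hook, if registered.
func LookupHook(name string) (HookFunc, bool) {
	hooksMu.RLock()
	defer hooksMu.RUnlock()
	fn, ok := hooks[name]
	return fn, ok
}

func main() {
	RegisterHook("greeting", "greeting_start_hook", func(ctx map[string]any) error {
		fmt.Println("greeting started for", ctx["name"])
		return nil
	})
	if fn, ok := LookupHook("greeting.greeting_start_hook"); ok {
		_ = fn(map[string]any{"name": "Ada"})
	}
}
```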
* Migrate LLM plugins to Go (providers + DualLLM) (#2592)
* Support multiple llm plugins
* Add support for dual.go and test cases
* Remove some comments
* Fix lint issues and install portaudio in github workflow
* Update test cases
* Add Prometheus metrics server and record metrics
Introduce internal/metrics package that registers Prometheus histograms and gauges for LLM, ASR and HTTP timing metrics and exposes a /metrics endpoint on :9090 via StartServer. Integrate metrics lifecycle into cmd/main.go (start and graceful shutdown). Instrument Google ASR plugin to record speech duration, utterance-end and final transcript latencies (observeASR helper). Instrument LLM response path to log and emit LLM round-trip latency and HTTP timing metrics (recordResponseLatency and RecordHTTPTiming). Update go.mod to add prometheus client and bump Go version.
* Ignore coverage.out and remove .python-version
Add coverage.out to .gitignore to prevent committing test coverage output. Remove the .python-version file (3.12) to stop pinning the project's Python version in the repository.
* Add TTS time-to-first-audio metrics
Introduce TTS latency metrics to track time-to-first-audio: a histogram (om1_tts_latency_seconds) and a gauge (om1_tts_latency_last_seconds) in internal/metrics with labels model and endpoint. Register the metrics in init and instrument ElevenLabsProvider.synthesize to record the latency from request start to the first audio chunk, updating both the histogram and the last-value gauge.
* Record HTTP timing metrics in ElevenLabs
Add a metrics.RecordHTTPTiming call in synthesize to capture request host, path, method, status and timing-related headers (X-Proxy-Parse-Ms, X-Upstream-Total-Ms, X-Upstream-TTFB-Ms, X-Proxy-Total-Ms) for observability. The call is placed after the response is validated and before streaming the body so proxy/upstream latency details are emitted to metrics without changing streaming behavior.
* Migrate om1 Grafana dashboard to Grafana v13
Convert grafana/dashboards/om1-dashboard.json to the Grafana 13 elements-based schema. Legacy panels, targets and fieldConfig blocks were replaced with elements.panel-* entries using PanelQuery/DataQuery and unified vizConfig (stat, timeseries, heatmap, piechart). Annotations were updated and cursorSync enabled; Prometheus expressions and legend formats were preserved while thresholds, units and display options were refined. Dashboard/visualization versions set to 13.0.1+security-01 for compatibility.
* Remove .typos.toml
Delete the .typos.toml typos-checker configuration file. The removed file contained ignore regexes, custom word mappings (OT, WAAS, ser, asend) and path excludes for src/unitree/ and src/ubtech/.
* Add greeting modes, people TTS & convo history
Add a greeting_conversation.json5 config to define approaching, greeting, and conversation modes with lifecycle hooks and transition rules. Extend the ElevenLabs provider to support per-utterance voice overrides (introduce ttsRequest, change queue type, AddTextWithVoice, and synthesize to accept voiceID). Add a new people-aware ElevenLabs TTS connector (plugins/actions/speak/elevenlabs_people_tts.go) that selects voice IDs based on FacePresence and configured name->voice mappings. Add a ConversationHistory input sensor (plugins/inputs/conversation_history.go) that polls IO for recent ASR lines, maintains a bounded sliding-window of messages, and formats history for LLM prompts.
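The bounded sliding window kept by the ConversationHistory sensor can be illustrated with a small slice-based buffer. The history type, its methods and the newline-joined format are assumptions; the real sensor's message type and prompt format are not shown in the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// history (hypothetical) keeps at most max of the most recent messages.
type history struct {
	max  int
	msgs []string
}

func newHistory(max int) *history { return &history{max: max} }

// add appends a message and drops the oldest ones beyond the window size.
func (h *history) add(msg string) {
	h.msgs = append(h.msgs, msg)
	if len(h.msgs) > h.max {
		h.msgs = h.msgs[len(h.msgs)-h.max:]
	}
}

// format joins the window into one block of text for the prompt.
func (h *history) format() string {
	return strings.Join(h.msgs, "\n")
}

func main() {
	h := newHistory(2)
	h.add("user: hello")
	h.add("user: what is your name?")
	h.add("user: where are we?")
	fmt.Println(h.format()) // only the last two lines remain
}
```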
* Add Kokoro TTS provider and connector
Introduce a new Kokoro TTS provider (internal/providers/kokoro.go) that streams PCM audio from an OpenAI-compatible Kokoro endpoint into a persistent ffplay process. The provider is a singleton with a queued, non-blocking AddText API, handles ffplay lifecycle, records TTS/HTTP metrics, and exposes configurable defaults (base URL, voice/model, output format, rate). Add a speak connector (plugins/actions/speak/kokoro_tts.go) to enqueue text for playback with optional silence-rate handling and configuration parsing. Also remove the legacy speak/elevenlabs_people_tts.go connector (voice-per-person logic) as part of this change.
* Extract Google ASR common code, add RTSP input
Refactor Google ASR implementation by extracting shared websocket/packaging/metrics/zenoh logic into a new google_asr_common.go (googleASRCommon). The existing microphone-based GoogleASR sensor now embeds and reuses the common code, simplifying google_asr.go. A new GoogleASRRTSPInput (google_asr_rtsp.go) was added to stream PCM from an RTSP source (via ffmpeg) and forward chunks through the shared common code. This removes duplicated functionality, centralizes websocket/zenoh handling and statistics, and enables multiple audio sources (mic and RTSP) to reuse the same ASR plumbing.
* Add doc comment for GoogleASRConfig
Add a brief comment above the GoogleASRConfig type to explain that it configures the local microphone-sourced Google ASR sensor. Improves code readability and intent for future maintainers.
* Add knowledgebase support with HNSW embeddings
Introduce a knowledgebase package that provides an HTTP embedder and an HNSW-backed KnowledgeBase (embedding + nearest-neighbor search) for RAG. Wire KB into the runtime: new KBSpec fields, runtime initializes the KB when configured, and fuser now accepts a logger and queries the KB using the latest voice input (with warnings logged on failures). Add on-disk KB artifacts (graph/json), remove legacy FAISS/PKL, and update go.mod/go.sum with required dependencies for coder/hnsw and related packages. Also include a small zenoh log message tweak and conversation config defaults for the KB (base URL, name, min_score).
* Remove redundant comment about bufio.Reader
Delete a two-line comment in loadGraph that explained wrapping *os.File with bufio.NewReader for hnsw.Import. The code behavior is unchanged — graph.Import still uses bufio.NewReader(f) — the comment was considered unnecessary.
* Add LLM-driven greeting hook and config fixes
Introduce LLM-based greeting generation for the greeting_start_hook: add prompt/defaults, generateGreeting and greetingLLM helpers, and a static fallback when LLM or face snapshot fails. Update greeting hook to log results and use TTS provider accordingly, and import time and the internal llm package. Also load lifecycle hook metadata into HandlerConfig during mode setup, and fix initializeMode to pass the correct modeConfig.cfg.LifecycleHooks to hooks.NewHooks. These changes improve greeting naturalness and ensure lifecycle hooks are initialized with meta.
* Add tracing, sensor tick opt-in, and audio warmup
Introduce a lightweight tracing facility and related runtime changes; add optional sensor tick behavior; and prime ElevenLabs audio on startup.
- Add providers.Tracer (singleton) which writes JSONL traces to traces/YYYY-MM-DD.jsonl, supports Enable/Disable/Gauge/SetGeneration and daily rotation.
- Wire tracer into Runtime: Runtime now holds a tracer, mode setup propagates SystemConfig.UseTracer to RuntimeConfig, runtime enables tracer for modes that opt in, increments generation per cortex loop, and records LLM prompt + structured tool-call output each tick. Tracer is stopped on runtime shutdown.
- Add traceOutput helper to convert llm.Response tool calls to maps used for tracing.
- Change cortex loop timing to use a per-iteration timer (allows immediate wake/reset semantics) and track a generation integer for tracing and logging.
- Add TickTrigger interface and triggersTick helper so sensors can opt in to waking the cortex loop; Orchestrator.runSensor now checks triggersTick before signaling TickNow. google_asr_common now implements TriggersTick() and returns true.
- ElevenLabsProvider: add warmupSilenceMs constant and warmUp() to prime audio device by playing a short silence burst (called before processing queue), plus minor playback flow adjustments.
These changes improve observability of LLM interactions, avoid unnecessary cortex wake-ups by non-triggering sensors, and reduce first-play audio glitches by warming up the audio device.
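The opt-in described above is a natural fit for an optional interface checked with a type assertion. The TickTrigger and triggersTick names come from the commit message; the two sensor types and the helper's parameter type are hypothetical.

```go
package main

import "fmt"

// TickTrigger is implemented by sensors that want to wake the cortex loop.
type TickTrigger interface {
	TriggersTick() bool
}

// triggersTick reports whether s opted in; sensors that do not implement
// TickTrigger never trigger a tick.
func triggersTick(s any) bool {
	t, ok := s.(TickTrigger)
	return ok && t.TriggersTick()
}

// asrSensor (hypothetical) opts in, as google_asr_common is described to do.
type asrSensor struct{}

func (asrSensor) TriggersTick() bool { return true }

// pollingSensor (hypothetical) does not implement the interface at all.
type pollingSensor struct{}

func main() {
	fmt.Println(triggersTick(asrSensor{}))     // true
	fmt.Println(triggersTick(pollingSensor{})) // false
}
```

Because the interface is optional, existing sensors keep compiling unchanged and default to not waking the loop.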
* Rename OM1_CONFIG to OM1_COMMAND and tidy Dockerfiles
Update Dockerfile and docker-compose.yml to use OM1_COMMAND instead of OM1_CONFIG: the entrypoint now reads OM1_COMMAND (and passes it as -config to the binary), and docker-compose exposes OM1_COMMAND as the env var and removes the redundant command override. Also remove some Dockerfile comments and the mDNS nsswitch modification to simplify the image. Note: deployments should update any references to OM1_CONFIG to OM1_COMMAND to keep runtime behavior unchanged.
* Update base image and add PortAudio deps
Bump the build base image from golang:1.22-bookworm to golang:1.26-bookworm and install PortAudio packages. Added portaudio19-dev to the builder stage so native PortAudio bindings can be compiled, and added libportaudio2 to the runtime stage to provide the PortAudio runtime library.
* Add ElevenLabs People TTS connector
Introduce ElevenLabsPeopleConnector (plugins/actions/speak/elevenlabs_people_tts.go). Registers the action "speak/elevenlabs_people_tts" and extends ElevenLabsConnector to choose a voice based on the closest person detected by the FacePresence input (parses "Closest: <Name>"). Configuration supports "voice_id" (default) and "voice_ids" (person->voice map). Honors existing silence_rate throttling and enqueues text for ElevenLabs, falling back to the default voice when no matching person is found.
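Voice selection from a "Closest: <Name>" line might be sketched as follows. The voiceFor helper and the sample presence text are hypothetical; only the "Closest: <Name>" marker, the person-to-voice map and the fallback to a default voice are taken from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// voiceFor (hypothetical) scans the presence text for "Closest: <Name>" and
// returns the mapped voice, or def when no line or mapping matches.
func voiceFor(presence string, voiceIDs map[string]string, def string) string {
	for _, line := range strings.Split(presence, "\n") {
		name, ok := strings.CutPrefix(strings.TrimSpace(line), "Closest: ")
		if !ok {
			continue
		}
		if v, found := voiceIDs[strings.TrimSpace(name)]; found {
			return v
		}
	}
	return def
}

func main() {
	voices := map[string]string{"Ada": "voice-ada"}
	fmt.Println(voiceFor("People: 2\nClosest: Ada", voices, "voice-default"))
	fmt.Println(voiceFor("Closest: Bob", voices, "voice-default"))
}
```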
* update config prompt for microsoft build (#2598)
* update config prompt for microsoft build
* add context about microsoft build in the prompt
* update prompt
* add a new config
* Trim trailing spaces in Microsoft Build greeting
Remove stray trailing space characters from two identical lines in config/greeting_microsoft_build.json5 (Microsoft Build event description). This is a formatting cleanup only and does not change content or behavior.
* Add knowledge base Prometheus metrics and dashboard panels (#2599)
* Add knowledge base prometheus metrics and update grafana dashboard
* Shorten RecordKBQuery comment
Replace the verbose multi-line comment for RecordKBQuery with a concise one-line summary. Removed parameter-level details about embedSeconds/querySeconds and embedding-step behavior; no functional code changes.
---------
Co-authored-by: openmindev <147775420+openminddev@users.noreply.github.com>
* Remove CycloneDDS config and env/volume
Delete cyclonedds/cyclonedds.xml and remove the related Docker Compose settings: ROS_DOMAIN_ID, CYCLONEDDS_HOME, CYCLONEDDS_URI, and CMAKE_PREFIX_PATH environment variables, as well as the ./cyclonedds volume mount. Cleans up unused CycloneDDS configuration and its mounting from the compose service.
* Remove config schema generator workflow and script
Delete the GitHub Actions workflow and Python script that generated and uploaded the OM1 configuration schema. Removed .github/workflows/generate-config-schema.yml (CI job that ran scripts/generate_schema.py, uploaded to S3 and invalidated CloudFront) and scripts/generate_schema.py (AST-based scanner that produced OM1_config_schema.json5 from inputs, LLMs, backgrounds, actions, hooks, and transition rules). This cleans up the repo by removing the automated schema generation pipeline.
* Add extensive unit tests across packages
Add a broad set of unit tests to improve coverage and validate behavior across the codebase. Tests added for internal packages: actions (loading, orchestrator, schema), backgrounds, config loader, fuser, hooks (runner and registry), httpclient, inputs (orchestrator and sensors), knowledgebase (embedding), llm (adapter and orchestrator), logger, metrics, and providers (face presence). Also update several plugin test files for LLM adapters. go.mod updated to include github.com/kylelemons/godebug as an indirect dependency. These tests exercise orchestration, plugin loading, schema generation, config parsing, background loops, input handling, KB integration, metrics recording, and error/panic recovery behavior.
* Cleanup: ignore Close errors and remove unused field
Small cleanup changes across tests and knowledgebase code: remove an unused `internal` field from moveInput, replace direct defer Close() calls with explicit ignores (e.g. `defer func() { _ = f.Close() }()`) to satisfy linters/static checks, and drop an obsolete test comment in qwen_test. These are non-functional cleanup changes to silence warnings and tidy tests.
* Add ResetUserContext and reset on mode init
Introduce ModeManager.ResetUserContext to clear the userContext map under mutex protection. Call ResetUserContext from Runtime.initializeMode to ensure per-user context is reset when a mode is initialized. Also remove a couple of now-unnecessary comments in manager.go.
* Migrate ElevenLabs ASR to Go (mic + RTSP inputs) (#2597)
* Migrate ElevenLabs ASR to Go (mic + RTSP inputs)
Port the ElevenLabs ASR feature from the legacy Python plugins to Go,
mirroring the existing Google ASR triple (common + mic + rtsp):
- ElevenLabsASRInput: local microphone capture via PortAudio
- ElevenLabsASRRTSPInput: RTSP audio decoded via ffmpeg
- elevenlabsASRCommon: shared websocket streaming, transcript buffering,
latency metrics, and zenoh broadcast on om/asr/text
Faithful to the Python original: ElevenLabs short language codes
(auto/en/zh/...), partial->speech-start / committed->final-transcript
protocol, CJK-aware length filter (rune count, not byte count), and
om1_asr_latency metrics labeled model="elevenlabs".
Reuses the package-level helpers shared with Google ASR (AudioMetadata,
ASRMessage, ASRStatistics, cjkRegex, serializeASRText) so the working
Google path is untouched.
Adds unit tests (testify/require) covering transcript acceptance,
language mapping, audio packaging wire format, websocket message
handling, and buffer flush semantics.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Fix issues
* Merge asr_common
* Fix test cases
* Refine comments and document ASR wire layout
Clean up and clarify comments across multiple files: remove Python-centric references and streamline wording in background orchestrator, avatar CDR comments, greeting conversation, Kokoro TTS, approaching person, and face presence. Explicitly document the CDR wire layout for ASRText in plugins/inputs/asr_common.go and clarify ModeManager highestPriorityTarget behavior (ties preserve config order). These are documentation/comment-only changes and do not alter runtime behavior.
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: openmindev <147775420+openminddev@users.noreply.github.com>
* Add integration tests and testdata
Add an integration test suite and supporting fixtures for end-to-end runtime verification. Introduces test/integration/*.go (harness, config runner, and a fuzz driver) that provide a test harness with mock inputs, a scripted LLM, and a recording action connector; helpers run the runtime and assert recorded actions. Adds multiple JSON5 test cases under test/integration/testdata/test_cases and a fuzz case used by the fuzz driver exercising mode transitions, time-based/context-aware/input-triggered transitions, multi-action outputs, and negative/no-op scenarios. Also updates the Makefile help text and adds a test-integration target that runs the integration tests with the integration build tag (and ensures zenoh-c is downloaded before running).
* Remove URID from TestBuildMeta expectations
Update TestBuildMeta to reflect that SystemConfig and buildMeta no longer include URID. The test setup no longer provides URID and the expected meta map removes the "URID" key to match current behavior.
* Refactor TTS into internal/providers/tts with interrupt handling (#2601)
* Refactor TTS into internal/providers/tts
Move ElevenLabs and Kokoro TTS implementations into a new internal/providers/tts package and introduce a shared ttsBase/player to consolidate common playback logic. Deleted the old providers elevenlabs.go and kokoro.go, renamed tts state files and added an Interrupt atomic flag. Updated imports and callers across hooks, plugins (greeting_conversation, speak), and input sensors to reference the new tts package and to support interrupting TTS when user speech is detected. This centralizes TTS behavior, reduces duplication, and adds TTS interrupt handling for inputs.
* TTS: add generation-based interrupt handling
Introduce a generation counter to TTS to avoid playing utterances queued before an interrupt. Changes:
- Add generation atomic counter and RequestInterrupt() which increments generation and sets Interrupt.
- Add generation field to ttsRequest and stamp new requests with generation.Load() when enqueuing.
- Skip processing queued requests whose generation is older than the current generation in processAudio(), instead of draining the whole queue.
- Remove the previous drainQueue implementation.
- Update ASR input (asr_common) to call tts.RequestInterrupt() when TTS should be interrupted.
This prevents stale queued utterances from playing after an interrupt (e.g., user speech) while keeping the queue intact for newer requests.
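The generation scheme described above can be sketched as follows. This is a minimal illustration, not the code from the diff: the `player` type, its buffered-channel queue, and `drainPlayable` are hypothetical stand-ins, and only the names `generation`, `Interrupt`, `ttsRequest`, and `RequestInterrupt` come from the commit message.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ttsRequest is a queued utterance stamped with the generation that was
// current when it was enqueued.
type ttsRequest struct {
	text       string
	generation uint64
}

// player is a hypothetical stand-in for the TTS player in this log.
type player struct {
	generation atomic.Uint64
	interrupt  atomic.Bool
	queue      chan ttsRequest
}

func newPlayer(size int) *player {
	return &player{queue: make(chan ttsRequest, size)}
}

// Enqueue stamps the request with the current generation.
func (p *player) Enqueue(text string) {
	p.queue <- ttsRequest{text: text, generation: p.generation.Load()}
}

// RequestInterrupt bumps the generation and raises the interrupt flag.
func (p *player) RequestInterrupt() {
	p.generation.Add(1)
	p.interrupt.Store(true)
}

// drainPlayable returns the texts that would be played, skipping any
// request that was queued before the latest interrupt.
func (p *player) drainPlayable() []string {
	var played []string
	for {
		select {
		case req := <-p.queue:
			if req.generation < p.generation.Load() {
				continue // stale: enqueued before the interrupt
			}
			played = append(played, req.text)
		default:
			return played
		}
	}
}

func main() {
	p := newPlayer(8)
	p.Enqueue("old one")
	p.Enqueue("old two")
	p.RequestInterrupt()
	p.Enqueue("fresh")
	fmt.Println(p.drainPlayable())
}
```

Because stale entries are skipped one at a time as they are dequeued, a request enqueued after the interrupt is never discarded, which a whole-queue drain could not guarantee.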
* Log websocket message payloads as text or bytes
Enhance ws client logging in readLoop: instead of logging only the byte length, log the actual payload. Text messages are logged as strings, and non-text messages are logged as byte slices (using zap.ByteString). Message type field and onMessage behavior are unchanged; this improves debugging visibility of incoming payloads.
* Remove Microsoft greeting config; update Unitree
Delete the Microsoft Build greeting config and simplify environment example. Removed config/greeting_microsoft_build.json5 and trimmed .env.example (removed several unused integration placeholders and URID). Updated config/unitree_g1_conversation.json5: changed default agent_name from Pam to Iris, removed the approaching mode, expanded allowed physical actions, enabled TTS interrupt for ASR input, replaced the greeting_conversation action with a speak action using elevenlabs_tts (added voice_id), cleared lifecycle hooks, and reset transition_rules to an empty list. These changes streamline the conversation config and remove event-specific settings.
* Add new arm action enum values
Extend ArmAction.EnumValues with additional gesture/action identifiers to support more arm behaviors: come_closer, flexible, hands_up, heart, push, rotate_hands, salute, shrug, speack_action_extended, and speak_action. This expands the set of supported actions for the ARM G1 plugin (note: speack_action_extended retains the existing spelling from the diff).
* Handle interrupts while finishing ffplay
Add interrupt handling during TTS playback teardown. After synthesis, processAudio now checks for an Interrupt and calls handleInterrupt(). finishPlayback was rewritten to poll (50ms) for global Interrupts while waiting for ffplay, killing the ffplay process early if interrupted and preserving the existing 10s timeout. Also ensure p.ffplay is cleared on exit. This makes external interrupts terminate ffplay promptly and avoids leaving the process reference set.
* Add model param and escape API key in Google ASR
Expose an optional `model` config for Google ASR (regular and RTSP) and include it as a query parameter on the WebSocket URL when set. Also URL-escape the `api_key` when constructing the WS endpoint to avoid malformed URLs. Updated the API docs to list the new `model` parameter.
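A sketch of how such a WebSocket URL might be assembled with the standard `net/url` package. The function name `buildWSURL` and the endpoint used in `main` are hypothetical; only the `api_key` and `model` parameter names come from the commit message.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildWSURL is a hypothetical helper: it attaches api_key (always) and
// model (only when set) as query parameters, letting net/url do the escaping.
func buildWSURL(base, apiKey, model string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("api_key", apiKey)
	if model != "" {
		q.Set("model", model)
	}
	u.RawQuery = q.Encode() // escapes reserved characters in the values
	return u.String(), nil
}

func main() {
	// The endpoint below is a placeholder, not the real service URL.
	s, err := buildWSURL("wss://example.invalid/asr", "a b&c", "long")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```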
* Require ASR transcripts to have 3+ words
Tighten acceptASRTranscript heuristic: change the non-CJK minimum word count from >1 to >2 (i.e., require at least three words). This reduces short or ambiguous transcripts being accepted and cuts down on noise; CJK handling remains unchanged.
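A rough sketch of the tightened heuristic, under stated assumptions. The non-CJK rule (more than two words) follows the commit message; the CJK branch is a guess, since the log only says it counts runes rather than bytes and is unchanged. `acceptTranscript`, `containsCJK`, and the `minCJKRunes` threshold are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// minCJKRunes is a hypothetical threshold; the real value is not given in this log.
const minCJKRunes = 2

// containsCJK reports whether s has any Han, Hiragana, Katakana, or Hangul rune.
func containsCJK(s string) bool {
	for _, r := range s {
		if unicode.In(r, unicode.Han, unicode.Hiragana, unicode.Katakana, unicode.Hangul) {
			return true
		}
	}
	return false
}

// acceptTranscript mirrors the described rule: non-CJK text needs more than
// two words; CJK text is measured in runes (assumed threshold), not bytes.
func acceptTranscript(text string) bool {
	text = strings.TrimSpace(text)
	if text == "" {
		return false
	}
	if containsCJK(text) {
		return utf8.RuneCountInString(text) >= minCJKRunes
	}
	return len(strings.Fields(text)) > 2
}

func main() {
	for _, s := range []string{"hello there", "hello there world", "你好"} {
		fmt.Printf("%q accepted=%v\n", s, acceptTranscript(s))
	}
}
```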
* Update Unitree G1 identity, gestures, and TTS
Change default robot name from Pam to Iris and streamline identity text. Expand allowed physical actions and add detailed behavior guidelines/gesture mappings to improve interaction (e.g., shake_hand, face_wave, salute, heart, shrug, come_closer, rotate_hand, flexible, speak_action(_extended), etc.). Add a new agent input entry (arm_g1, llm_label: robot_action, connector: zenoh). Add "model: \"long\"" to the TTS configuration. Also simplify some conversation prompt content and remove prior event/company-specific guidance.
* Migrate documentation to Golang (#2596)
* update readme for Golang migration - initial commit
* update contribution.md
* update config.md
* update input.md
* update introduction.md
* update new_mode.md for Golang migration
* update example docs files for Golang migration
* update intro and get started docs for Golang migration
* update config and input docs for Golang migration
* update llm and action docs for Golang migration
* update project structure docs for Golang migration
* update troubleshooting guide for Golang migration
* update remaining docs for Golang migration
* updated docs
* updated make commands and steps in readme and getting started
* Adjust ASR tests for three-word transcript
Update ASR unit tests to expect a three-word transcript "hello there world" instead of the previous two-word string. In plugins/inputs/elevenlabs_asr_test.go: add a "three english words" case (accepted) and change the "two english words" case to be rejected; update the committed message assertion to match the new three-word transcript. In plugins/inputs/google_asr_test.go: update the expected parsed reply to "hello there world" and ensure speech timing state is validated accordingly. These changes align test expectations with updated transcript handling logic.
* Add GitHub Actions binary release workflow
Add a new .github/workflows/binary-release.yml workflow to build and publish OM1 binaries. The workflow supports workflow_dispatch inputs (version, publish) and automatic behavior for tag pushes and nightly builds, sets ZENOH_C_VERSION, caches zenoh-c, and builds for linux-amd64, linux-arm64, darwin-arm64, darwin-amd64 and windows-amd64. Artifacts are packaged (tar.gz on Unix, zip on Windows) with bundled zenoh-c libraries and adjusted rpaths (patchelf / install_name_tool), checksums are generated, and artifacts are uploaded. When publish is enabled the job flattens artifacts, manages a nightly tag, and creates/updates a GitHub Release with the built files.
* Update CI push trigger to 'go' branch
Change the GitHub Actions binary-release workflow to trigger on pushes to the 'go' branch instead of 'main'. Tag-based triggers and manual workflow_dispatch remain unchanged. Aligns the release workflow with the repository's branch naming.
* Switch macOS runner and remove Windows build job
Update .github/workflows/binary-release.yml to use the macos-15-intel runner for the darwin-amd64 job (was macos-13). Remove the entire Windows build job (build-windows) including MSYS2 setup, zenoh-c download, build and packaging steps, and adjust the release job dependencies to no longer require build-windows.
* Update nightly release name
Change the release name used by softprops/action-gh-release for nightly builds from 'Nightly (latest main)' to 'Development Build' in .github/workflows/binary-release.yml. No other workflow behavior was modified.
* Format README note and link MIT license
Update README.md for clarity and readability: convert the plain Note line into a markdown admonition ([!NOTE]) for consistent styling, and replace repeated plain-text MIT License mentions with links to the project's LICENSE file (./LICENSE) to make it easier for readers to view the full license.
* Condense README license section
Replace a verbose, repetitive paragraph describing the MIT License with a concise single-line reference to the LICENSE file in README.md to reduce verbosity and duplication.
* Add release environment and summary step
Set job environment name and url based on the release version (nightly -> development, otherwise production) and wire the release URL to the environment. Add an id to the GitHub release step so its outputs can be referenced. Append a new step that writes a formatted release summary to $GITHUB_STEP_SUMMARY including release URL, channel, commit SHA and listed release assets.
* Use 'staging' env for nightly releases
Update .github/workflows/binary-release.yml to set environment.name to 'staging' when needs.setup.outputs.version == 'nightly' (previously 'development'), falling back to 'production' otherwise. This routes nightly release artifacts to the staging environment.
* update readme to include go and python differences
* update readme
* Add parallel asr (#2604)
* Migrate Riva to Golang
* Add parallel ASR
* Remove testing functions
* Fix merge conflicts
* Shorten comments
* Run make fmt
* Optimize ASR folder structure
* Rename ASR model to provider and refactor aggregator to sensor core (#2606)
* Rename ASR model->provider and refactor aggregator
Replace the 'Model' identifier with 'Provider' across ASR configs, streams, metrics, logs, and tests. Refactor asrAggregator into asrSensorCore (renaming constructor/newAggregator to newSensorCore and updating receivers and methods). Update transcriberStream to use provider and adjust onTranscript signature, metric labels, logging keys, and parallel-ASR dedup state (lastModel -> lastProvider). Also include minor comment and formatting tweaks.
* Add VSCode Go settings and stop ignoring .vscode
Remove .vscode from .gitignore and add .vscode/settings.json to commit VS Code Go configuration. The new settings configure go test flags (-p 8, -v), set CGO include/lib paths and runtime library paths to the local .zenoh-c directory, and enable the "integration" build tag so VS Code can build and run integration tests that rely on the native zenoh-c library.
---------
Co-authored-by: openminddev <147775420+openminddev@users.noreply.github.com>
* Update README (#2608)
* Migrate Riva to Golang
* Add parallel ASR
* Remove testing functions
* Fix merge conflicts
* Shorten comments
* Run make fmt
* Optimize ASR folder structure
* Rename ASR model->provider and refactor aggregator
Replace the 'Model' identifier with 'Provider' across ASR configs, streams, metrics, logs, and tests. Refactor asrAggregator into asrSensorCore (renaming constructor/newAggregator to newSensorCore and updating receivers and methods). Update transcriberStream to use provider and adjust onTranscript signature, metric labels, logging keys, and parallel-ASR dedup state (lastModel -> lastProvider). Also include minor comment and formatting tweaks.
* Add VSCode Go settings and stop ignoring .vscode
Remove .vscode from .gitignore and add .vscode/settings.json to commit VS Code Go configuration. The new settings configure go test flags (-p 8, -v), set CGO include/lib paths and runtime library paths to the local .zenoh-c directory, and enable the "integration" build tag so VS Code can build and run integration tests that rely on the native zenoh-c library.
* Add Go vs Python feature comparison to README
Introduce a new "Go vs. Python Feature Comparison" section in README.md that documents current parity between the Go and Python runtimes. Adds a capabilities table showing which features are available or under development in Go (hardware connectors, VLMs, sensors, messaging, simulators, full autonomy, etc.), plus a note recommending the Python runtime for features still marked as under development and links to the Python runtime and contributing guidance.
---------
Co-authored-by: Shicai He <94800998+shicaih@users.noreply.github.com>
* Add VLM video stream provider and utils
Introduce a new internal/providers/vlm package that implements video capture and utilities.
Adds Frame (with custom JSON/base64 marshaling), streamBase (lifecycle, buffering, drop counting), and helpers splitJPEGStream and jpegQScale. Implements VideoStream (camera capture via ffmpeg) and VideoRTSPStream (RTSP capture with reconnect logic and ffmpeg arg builders). Adds video device enumeration for Linux and macOS (avfoundation) and unit tests covering JPEG splitting, qscale mapping, Frame JSON, stream lifecycle, and defaults.
This enables capturing MJPEG frames from local devices or RTSP sources and provides safe, buffered delivery to consumers.
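One common way to split a concatenated MJPEG byte stream is to scan for the JPEG start-of-image (`FF D8`) and end-of-image (`FF D9`) markers. The sketch below assumes that approach; it borrows the name `splitJPEGStream` from the commit message, but the signature and behavior are guesses, and a naive marker scan like this can mis-split JPEGs that embed thumbnails.

```go
package main

import (
	"bytes"
	"fmt"
)

var (
	soi = []byte{0xFF, 0xD8} // JPEG start-of-image marker
	eoi = []byte{0xFF, 0xD9} // JPEG end-of-image marker
)

// splitJPEGStream returns every complete frame found in buf plus the
// unconsumed tail, which the caller should prepend to the next read.
func splitJPEGStream(buf []byte) (frames [][]byte, rest []byte) {
	for {
		start := bytes.Index(buf, soi)
		if start < 0 {
			// Keep a trailing 0xFF: it may be the first half of a marker.
			if n := len(buf); n > 0 && buf[n-1] == 0xFF {
				return frames, buf[n-1:]
			}
			return frames, nil
		}
		end := bytes.Index(buf[start+2:], eoi)
		if end < 0 {
			return frames, buf[start:] // incomplete frame: wait for more data
		}
		end += start + 4 // offset of SOI + SOI length + EOI length
		frames = append(frames, append([]byte(nil), buf[start:end]...))
		buf = buf[end:]
	}
}

func main() {
	data := []byte{0x00, 0xFF, 0xD8, 0x01, 0x02, 0xFF, 0xD9, 0xFF, 0xD8, 0x03}
	frames, rest := splitJPEGStream(data)
	fmt.Println(len(frames), len(rest))
}
```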
* update readme
* Add VLM input plugin and response latency metrics
Introduce a Visual Language Model (VLM) input plugin (camera + RTSP) with a vision client, sensor implementation, Gemini defaults and unit tests; register the plugin in inputs. Add Prometheus VLM metrics and a generic RecordResponseLatency helper in internal/metrics and switch existing OpenAI-compatible LLM providers to use it (removing duplicated latency/logging code). Improve video capture handling: run-restart loop for VideoStream, lower default JPEG quality, add camera retry delay and util.Sleep usage, and minor RTSP stream cleanup. Update README and conversation config to surface VLMGemini support.
* Update README VLM support status
Clarify Visual Language Models (VLM) support in the README: update the table note to indicate OpenAI and Gemini VLMs are supported (removing the previous note about lack of Go support). This aligns the documentation with current capabilities.
* Add VLM describer, latest frame cache & greeting
Introduce an internal VLM describer (internal/providers/vlm) to call vision chat-completions with optional image attachment and record metrics. Add a singleton LatestFrame provider (internal/providers/latest_frame.go) with tests to store and retrieve the most-recent JPEG frame and a freshness check. Integrate vision-based greeting into the greeting hook: attempt a vision describe using the latest frame (with fallback to text-only LLM), and add related defaults and helpers. Add util.FirstNonEmpty and its tests, refactor the vlm input to use the new describer and to populate the LatestFrame, and remove the old vlm client implementation.
* Remove DecodeFormat and add RTSPURL default
Remove the DecodeFormat field and its default from the VideoRTSPStream implementation and tests (internal/providers/vlm). Update VideoRTSPStreamConfig and NewVideoRTSPStream to no longer handle decode format. Add a plugin-level default RTSP URL and ensure NewRTSPSensor fills cfg.RTSPURL when empty (plugins/inputs/vlm), and adjust the constructor call to match the simplified config. Update tests to drop the DecodeFormat assertion.
* Track pending TTS requests and wait for playback
Add an atomic pending counter and Busy() helper so Busy reflects queued-but-unplayed speech. Increment pending when requests are enqueued and decrement when handled. Refactor player loop into handleRequest to centralize synthesis/playback, ensure Speaking is set/cleared reliably, handle pre-roll silence, errors and interrupts, and require ffplay availability. In greeting_conversation: introduce ttsPollInterval, use tts.Busy() instead of fragile flags, remove pendingFinishedUpdate, add switchWhenTTSDone to wait (with timeout) for TTS to drain before switching modes, and simplify waitingOnTTS logic. These changes prevent mode switches while queued TTS remains and add a timeout to avoid stalling.
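The pending-counter idea can be illustrated with a small sketch. The `player` type, its channel queue, and the `play` callback are hypothetical; the commit message only names the pending counter, `Busy()`, `handleRequest`, and the Speaking flag.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// player is a hypothetical stand-in for the TTS player.
type player struct {
	pending  atomic.Int64 // requests enqueued but not yet fully handled
	speaking atomic.Bool
	queue    chan string
}

func newPlayer(size int) *player {
	return &player{queue: make(chan string, size)}
}

// Enqueue counts the request before it becomes visible in the queue, so
// Busy never reports false while speech is still waiting to be played.
func (p *player) Enqueue(text string) {
	p.pending.Add(1)
	p.queue <- text
}

// handleRequest plays one queued request and always clears its state.
func (p *player) handleRequest(play func(string)) {
	text := <-p.queue
	defer p.pending.Add(-1)
	p.speaking.Store(true)
	defer p.speaking.Store(false)
	play(text)
}

// Busy reports whether anything is queued or currently playing.
func (p *player) Busy() bool {
	return p.pending.Load() > 0 || p.speaking.Load()
}

func main() {
	p := newPlayer(4)
	p.Enqueue("hello")
	fmt.Println("busy after enqueue:", p.Busy())
	p.handleRequest(func(s string) { fmt.Println("playing:", s) })
	fmt.Println("busy after playback:", p.Busy())
}
```

A caller that wants to switch modes only once speech has drained can poll `Busy()` on an interval and give up after a timeout, which is the pattern the commit describes for `switchWhenTTSDone`.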
* Add GreetingStatus sensor and final-turn guidance
Add a new GreetingStatus input sensor and expose final-turn guidance from the greeting state machine.
- config/greeting_conversation.json5: register the new GreetingStatus input in the conversation config.
- internal/providers/greeting_conversation_state.go: add finalTurnGuidance constant and an EndingGuidance() method (mutex-protected) that returns guidance when the conversation is concluding or about to hit max turns.
- plugins/inputs/greeting_status.go: new sensor that registers as "GreetingStatus", retrieves the GreetingConversationStateMachineProvider, and exposes the EndingGuidance via FormattedLatestBuffer(). Other sensor methods are present as no-ops/stand-ins.
This enables the runtime to surface a short LLM guidance for the final exchange so the assistant can produce a brief, warm goodbye and mark the conversation as finished.
* Add VLMGeminiRTSP and adjust ending logic
Register VLMGeminiRTSP in the greeting conversation config (added to two component lists). Adjust EndingGuidance in the state machine: update the comment, remove special-case checks for finished/concluding states, and change the trigger condition to only return finalTurnGuidance when turnCount+1 > maxTurnCount (tightens the ending logic/addresses an off-by-one and redundant state handling).
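The revised trigger condition reads roughly as below. The `stateMachine` type and the guidance wording are placeholders; only `EndingGuidance`, `finalTurnGuidance`, `turnCount`, and `maxTurnCount` appear in the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// finalTurnGuidance is placeholder text; the real wording is not shown in this log.
const finalTurnGuidance = "This is the final exchange: say a brief, warm goodbye."

// stateMachine is a hypothetical reduction of the greeting state machine.
type stateMachine struct {
	mu           sync.Mutex
	turnCount    int
	maxTurnCount int
}

// EndingGuidance returns guidance only when the next turn would exceed
// maxTurnCount, with no special cases for finished or concluding states.
func (s *stateMachine) EndingGuidance() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.turnCount+1 > s.maxTurnCount {
		return finalTurnGuidance
	}
	return ""
}

func main() {
	s := &stateMachine{maxTurnCount: 3}
	for turn := 0; turn <= 3; turn++ {
		s.turnCount = turn
		fmt.Printf("turn %d: %q\n", turn, s.EndingGuidance())
	}
}
```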
* Remove three persona IDs from greeting config
Delete rubail, samantha, and david persona ID entries from config/greeting_conversation.json5. This cleans up the persona mapping in the greeting conversation configuration by removing these (presumably obsolete or unused) entries.
* Add VLMGeminiRTSP to unitree convo config
Update config/unitree_g1_conversation.json5: remove the `model: "long"` line from the existing block and append a new object `{ type: "VLMGeminiRTSP" }` to the handlers array. This enables integration of a Gemini RTSP VLM entry in the conversation configuration.
* Add go2 odom zenoh provider and CDR helpers
Introduce a Unitree GO2 odometry Zenoh provider and tests, plus CDR (de)serialization helpers. Adds internal/providers/unitree/go2/odom_zenoh.go implementing OdomZenohProvider that subscribes to a PoseStamped topic, decodes CDR-encoded payloads, computes pose/yaw/movement/body state and exposes a Position snapshot; and internal/providers/unitree/go2/odom_zenoh_test.go with unit tests for deserialization, quaternion->euler and odom processing. Also extends internal/zenoh/cdr.go with ReadFloat64LE and AppendFloat64LE utilities and clarifies AppendCDRString behavior.
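For orientation, here is a sketch of little-endian float64 helpers and the standard quaternion-to-yaw conversion. The helper names match the commit message, but their signatures here are assumptions, `yawFromQuaternion` is a hypothetical name, and CDR alignment padding is deliberately left out.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// AppendFloat64LE appends v to buf as 8 little-endian bytes.
// (Signature assumed; CDR alignment padding is not handled here.)
func AppendFloat64LE(buf []byte, v float64) []byte {
	return binary.LittleEndian.AppendUint64(buf, math.Float64bits(v))
}

// ReadFloat64LE reads a float64 at off and returns the next offset.
func ReadFloat64LE(buf []byte, off int) (float64, int, error) {
	if off < 0 || off+8 > len(buf) {
		return 0, off, errors.New("short buffer")
	}
	bits := binary.LittleEndian.Uint64(buf[off : off+8])
	return math.Float64frombits(bits), off + 8, nil
}

// yawFromQuaternion returns rotation about the Z axis in radians,
// using the usual ZYX Euler convention.
func yawFromQuaternion(x, y, z, w float64) float64 {
	return math.Atan2(2*(w*z+x*y), 1-2*(y*y+z*z))
}

func main() {
	buf := AppendFloat64LE(nil, 1.5)
	v, next, err := ReadFloat64LE(buf, 0)
	fmt.Println(v, next, err)

	// A quaternion for a 90 degree turn about Z.
	h := math.Sqrt(0.5)
	fmt.Printf("yaw=%.4f rad\n", yawFromQuaternion(0, 0, h, h))
}
```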
* Add Unitree Go2 odom sensor and format tweaks
Register and implement a new UnitreeGo2Odom input sensor (with tests) that reads odometry from the unitree/go2 provider and produces human-readable messages. Update plugins/inputs to import the unitree/go2 plugin so it is built. Standardize message formatting across inputs: conversation_history, face_presence, and vlm now include the descriptor and use a consistent quoted format; tests adjusted accordingly. Minor whitespace/clarity tweaks in internal/providers/unitree/go2/odom_zenoh.go (no behavior changes).
* Add zenoh probe tool and fix go2 odom decoding
Add a new zenoh diagnostic tool and Makefile target: introduce cmd/zenohprobe (standalone probe for subscribing to zenoh keys, hex-dumping payloads and optionally decoding PoseStamped) and expose it via a new `make probe` target.
Update internal/providers/unitree/go2/odom_zenoh.go: change default odom topic to "odom", add a rate-limited debug log (15s) to reduce spamming, and improve CDR decoding for the Go2 odometry payload by accounting for child_frame_id and updating comments to reflect nav_msgs/Odometry layout. Minor safety checks and alignment handling included.
* Update go2 odom logs: add z and body_attitude
Remove the verbose Info log emitted when movement is detected to reduce log noise, and add the z coordinate and body_attitude field to the periodic debug log so diagnostics include full pose information.
* Add paths provider and Paths sensor (#2611)
Introduce a PathsProvider that subscribes to the om/paths topic via zenoh, deserializes CDR-encoded Paths messages, classifies path indices into movement options (turn left, move forwards, turn right, move back) and generates a natural-language assessment string. Expose methods to retrieve the latest lidar string, valid paths and movement options, and add graceful stop/cleanup. Add a PathsSensor plugin that registers as "Paths", polls the provider on a fixed cadence, converts raw assessments into timestamped messages, maintains a bounded history and formats the latest buffer for prompts. Unit tests cover payload deserialization and movement-string generation.
* Add Unitree Go2 config and update Paths comments
Add a new autonomy config for Unitree Go2 (config/unitree_go2_autonomy.json5) defining a 'Bits' agent: system prompts, inputs (Unitree odom, Google ASR, VLM Gemini RTSP, Paths), Gemini LLM settings, and actions for TTS and emotion. Also refine comments in plugins/inputs/paths.go: the RawToText comment now documents appending assessments to the message buffer, and the FormattedLatestBuffer description is simplified to state that it returns the most recent buffered assessment.
* Add Unitree Go2 autonomy plugin and rename arm connector (#2612)
* Add Unitree Go2 autonomy and rename arm connector
Introduce a new Unitree Go2 autonomy plugin and related movement primitives, and standardize connector naming/logging.
- Add a full autonomy connector for Unitree Go2 (plugins/actions/unitree/go2/autonomy): movement planning, cmd_vel serialization, AI status request/response, guard watcher, and unit tests.
- Register and wire the new autonomy and unitree arm plugins in plugins/actions/actions.go.
- Rename/move the G1 arm connector from arm_g1 to unitree_g1_arm (including tests) and update config references in greeting/unitree configs.
- Add PathsProvider Movement() and Movement struct (internal/providers/paths.go) to expose derived path options programmatically.
- Add util.StringFrom helper (internal/util/convert.go) for safe string extraction from decoded JSON.
- Standardize logger contexts (logger.Named) and clean up log message prefixes across multiple connectors/background tasks.
These changes add autonomous movement capabilities, improve APIs for path/movement data, and align connector names used in configs and registration.
* dualLLM: add logger fallback and use it
Add a logger() helper to dualLLM that returns the existing logger or a default named "DualLLM". Replace direct d.log references with d.logger() at the sub-call failure, race completion, and quality-evaluation fallback sites to avoid nil logger panics and ensure consistent logger naming.
* Rename move_go2_autonomy to unitree_go2_autonomy
Update config/unitree_go2_autonomy.json5: changed the action entry name from "move_go2_autonomy" to "unitree_go2_autonomy" to align the config entry with the file/connector naming and avoid naming inconsistencies.
* Add stall detection and progress tracking for moves
Introduce progress-based convergence checks and timeouts for move commands. Shorten tick interval (100ms -> 50ms) and add stallTimeout, commandTimeout, turnProgressEps and driveProgressEps constants. Extend moveCommand with timing/progress fields (started, lastImprove, bestGap) and add markPhase/recordProgress helpers. Initialize progress tracking in queue and replace legacy movementAttempts/gapPrevious counters with recordProgress checks in tickTurn and tickDrive, aborting on stall or command timeout. Adjust turn/drive logic accordingly and remove obsolete counters from moveConnector.
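The progress check can be sketched as follows. Field and helper names follow the commit message, but the timeout values, the error variables, and the exact method signatures are assumptions made for illustration; time is passed in explicitly to keep the sketch deterministic.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical values; the real constants are not given in this log.
const (
	stallTimeout   = 2 * time.Second
	commandTimeout = 15 * time.Second
)

var (
	errStalled        = errors.New("move stalled: no progress")
	errCommandTimeout = errors.New("move timed out")
)

// moveCommand carries the timing/progress fields named in the commit.
type moveCommand struct {
	started     time.Time
	lastImprove time.Time
	bestGap     float64
}

// markPhase resets progress tracking at the start of a turn or drive phase.
func (c *moveCommand) markPhase(now time.Time, gap float64) {
	c.started, c.lastImprove, c.bestGap = now, now, gap
}

// recordProgress notes an improvement when the remaining gap shrinks by
// more than eps, and reports an error on stall or overall timeout.
func (c *moveCommand) recordProgress(now time.Time, gap, eps float64) error {
	if gap < c.bestGap-eps {
		c.bestGap = gap
		c.lastImprove = now
	}
	switch {
	case now.Sub(c.started) > commandTimeout:
		return errCommandTimeout
	case now.Sub(c.lastImprove) > stallTimeout:
		return errStalled
	}
	return nil
}

func main() {
	t0 := time.Unix(0, 0)
	c := &moveCommand{}
	c.markPhase(t0, 1.0)
	fmt.Println(c.recordProgress(t0.Add(1*time.Second), 0.5, 0.01))
	fmt.Println(c.recordProgress(t0.Add(4*time.Second), 0.5, 0.01))
}
```

Tracking the best gap seen so far, rather than comparing against the previous tick, means small oscillations around the target do not reset the stall timer.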
* Remove redundant comment in move.go
Delete an unnecessary comment in plugins/actions/unitree/go2/autonomy/move.go that described an "overshoot" nudge; no functional changes were made to the logic, just cleaned up the source for clarity.
* Add Cloud Sim support (#2613)
* Add Unitree Go2 autonomy and rename arm connector
Introduce a new Unitree Go2 autonomy plugin and related movement primitives, and standardize connector naming/logging.
- Add a full autonomy connector for Unitree Go2 (plugins/actions/unitree/go2/autonomy): movement planning, cmd_vel serialization, AI status request/response, guard watcher, and unit tests.
- Register and wire the new autonomy and unitree arm plugins in plugins/actions/actions.go.
- Rename/move the G1 arm connector from arm_g1 to unitree_g1_arm (including tests) and update config references in greeting/unitree configs.
- Add PathsProvider Movement() and Movement struct (internal/providers/paths.go) to expose derived path options programmatically.
- Add util.StringFrom helper (internal/util/convert.go) for safe string extraction from decoded JSON.
- Standardize logger contexts (logger.Named) and clean up log message prefixes across multiple connectors/background tasks.
These changes add autonomous movement capabilities, improve APIs for path/movement data, and align connector names used in configs and registration.
* dualLLM: add logger fallback and use it
Add a logger() helper to dualLLM that returns the existing logger or a default named "DualLLM". Replace direct d.log references with d.logger() at the sub-call failure, race completion, and quality-evaluation fallback sites to avoid nil logger panics and ensure consistent logger naming.
* Rename move_go2_autonomy to unitree_go2_autonomy
Update config/unitree_go2_autonomy.json5: changed the action entry name from "move_go2_autonomy" to "unitree_go2_autonomy" to align the config entry with the file/connector naming and avoid naming inconsistencies.
* Add stall detection and progress tracking for moves
Introduce progress-based convergence checks and timeouts for move commands. Shorten tick interval (100ms -> 50ms) and add stallTimeout, commandTimeout, turnProgressEps and driveProgressEps constants. Extend moveCommand with timing/progress fields (started, lastImprove, bestGap) and add markPhase/recordProgress helpers. Initialize progress tracking in queue and replace legacy movementAttempts/gapPrevious counters with recordProgress checks in tickTurn and tickDrive, aborting on stall or command timeout. Adjust turn/drive logic accordingly and remove obsolete counters from moveConnector.
* Remove redundant comment in move.go
Delete an unnecessary comment in plugins/actions/unitree/go2/autonomy/move.go that described an "overshoot" nudge; no functional changes were made to the logic, just cleaned up the source for clarity.
* Add cloud session and hybrid zenoh backend
Introduce a cloudsession package (client, session, topics) implementing a WebSocket-based cloud broker client with binary/JSON frames and tests. Add zenoh backend abstractions: a Session interface, concrete local zenoh backend, cloud backend adapter, and a hybrid session that routes IsCloudTopic topics to the cloud broker. Wire default zenoh options via SetDefaultOptions and use SystemConfig.UseSim/APIKey in runtime to enable the hybrid/cloud mode. Update various providers and plugins to depend on the zenoh Session/Publisher/Subscriber interfaces instead of concrete types.
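The routing idea behind the hybrid session can be shown with a reduced sketch. The one-method `Session` interface, the `recordingSession` test double, and the `cloud/` prefix rule inside `isCloudTopic` are all hypothetical; the commit message does not say how `IsCloudTopic` decides, nor what the real interface looks like.

```go
package main

import (
	"fmt"
	"strings"
)

// Session is a reduced, hypothetical version of the zenoh Session interface.
type Session interface {
	Publish(topic string, payload []byte) error
}

// recordingSession remembers which topics were published through it.
type recordingSession struct {
	topics []string
}

func (r *recordingSession) Publish(topic string, _ []byte) error {
	r.topics = append(r.topics, topic)
	return nil
}

// isCloudTopic uses an assumed prefix rule purely for illustration.
func isCloudTopic(topic string) bool {
	return strings.HasPrefix(topic, "cloud/")
}

// hybridSession sends cloud topics to the broker and everything else
// to the local backend.
type hybridSession struct {
	local Session
	cloud Session
}

func (h *hybridSession) Publish(topic string, payload []byte) error {
	if isCloudTopic(topic) {
		return h.cloud.Publish(topic, payload)
	}
	return h.local.Publish(topic, payload)
}

func main() {
	local, cloud := &recordingSession{}, &recordingSession{}
	var s Session = &hybridSession{local: local, cloud: cloud}
	_ = s.Publish("om/paths", nil)
	_ = s.Publish("cloud/sim/cmd_vel", nil)
	fmt.Println("local:", local.topics, "cloud:", cloud.topics)
}
```

Because callers depend only on the interface, providers and plugins do not need to know which backend ends up carrying a given topic.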
* Generalize meta map and include UseSim flag
Change buildMeta to return map[string]any (instead of map[string]string) and update its comment to reference system-level values. Add the "use_sim" boolean to the metadata when SystemConfig.UseSim is true. Update addMeta to accept meta as map[string]any and adjust tests to the new type, adding a test that verifies use_sim is omitted when false.
* Add simulator note and use_sim config
Update README to consolidate simulator status: add a Simulators row noting Gazebo/Isaac Sim support and two Zenoh backend types, and remove the duplicate/older entry. Add use_sim setting to unitree_go2_autonomy.json5 (env var USE_SIM, default false) so the runtime can be toggled to use simulators.
* Add Unitree Go2 location action connector
Introduce a new actions connector for Unitree Go2 that saves/records the robot's current location via an HTTP POST to a map/orchestrator endpoint and announces success via ElevenLabs TTS. Adds configuration parsing with sensible defaults (base URL, map name, TTS defaults, timeout) and a Connector implementation (Connect, Tick, Stop). Registers the interface and the action in plugins/actions/actions.go and include…