fix: Enhance STT provider management and configuration in setup utilities by 0xrushi · Pull Request #320 · SimpleOpenSoftware/chronicle

0xrushi · 2026-02-28T10:28:18Z

Introduced functions to load and retrieve STT provider details from a new YAML catalog, improving the setup wizard's flexibility.
Updated wizard.py to utilize the new catalog for provider selection, enhancing user experience during transcription provider setup.
Refactored existing provider mapping logic to streamline the process and improve maintainability.
Added a new configuration file stt-providers-catalog.yml to define STT providers, their capabilities, and setup instructions, ensuring a centralized management approach.
Enhanced error handling and logging for better visibility during provider selection and configuration processes.

Summary by CodeRabbit

New Features
- Added dynamic speech-to-text provider catalog system supporting deepgram, parakeet, vibevoice, qwen3-asr, and smallest.
- Setup wizard now features provider-specific configuration guidance and auto-detection capabilities.
- Introduced support for streaming and local ASR features across providers with customizable environment variables.
Tests
- Updated test utilities to support catalog-based provider lookups.

- Introduced functions to load and retrieve STT provider details from a new YAML catalog, improving the setup wizard's flexibility. - Updated `wizard.py` to utilize the new catalog for provider selection, enhancing user experience during transcription provider setup. - Refactored existing provider mapping logic to streamline the process and improve maintainability. - Added a new configuration file `stt-providers-catalog.yml` to define STT providers, their capabilities, and setup instructions, ensuring a centralized management approach. - Enhanced error handling and logging for better visibility during provider selection and configuration processes.

coderabbitai · 2026-02-28T10:28:24Z

📝 Walkthrough

Walkthrough

Introduces a catalog-based system for speech-to-text providers, replacing hardcoded provider configurations with dynamic, YAML-driven catalog entries. Adds utility functions to load and query provider metadata, and updates the setup wizard to derive provider choices, defaults, and configuration options from the catalog.

Changes

Cohort / File(s)	Summary
Catalog Infrastructure `config/stt-providers-catalog.yml`	New YAML catalog defining five STT providers (deepgram, parakeet, vibevoice, qwen3-asr, smallest) with metadata including display names, capabilities, environment variables, defaults, and setup notes.
Utility Functions `setup_utils.py`	Three new functions added: `load_stt_provider_catalog()` reads the YAML catalog, `get_stt_provider_by_id()` queries providers by ID, and `read_config_yml()` reads the config file. Functions are duplicated in the diff.
Wizard Provider Integration `wizard.py`	Replaces static provider mappings with catalog-driven logic; imports and caches catalog data, updates `get_existing_stt_provider()` and `get_existing_stream_provider()` to resolve provider IDs from catalog, and refactors provider selection menus to derive choices, descriptions, and defaults from catalog entries.
Setup Initialization `backends/advanced/init.py`	Imports catalog utilities and refactors `setup_transcription()` to dynamically fetch provider metadata (display_name, env_vars, setup_notes) from catalog, build interactive provider choices, and apply provider-specific configuration with derived streaming URLs and environment variable handling.
Image Inventory `scripts/pull-images.sh`	Adds new image mapping entry for `chronicle-asr-qwen3-vllm`.
Test Support `tests/unit/test_wizard_strixhalo.py`	Updates setup_utils mock to expose catalog loading and provider lookup functions (`load_stt_provider_catalog`, `get_stt_provider_by_id`, `read_config_yml`) for test wizard setup.

Sequence Diagram

sequenceDiagram
    participant User
    participant Wizard as Setup Wizard
    participant Catalog as Catalog System
    participant Config as Config/Env
    
    User->>Wizard: Start setup
    Wizard->>Catalog: load_stt_provider_catalog()
    Catalog-->>Wizard: Provider list with metadata
    Wizard->>User: Display provider choices (from catalog)
    User->>Wizard: Select provider
    Wizard->>Catalog: get_stt_provider_by_id()
    Catalog-->>Wizard: Provider details (env_vars, defaults, setup_notes)
    Wizard->>User: Prompt for provider-specific vars (from catalog metadata)
    User->>Wizard: Enter configuration values
    Wizard->>Config: Apply defaults & env vars (from provider metadata)
    Config-->>Wizard: Config updated
    Wizard->>User: Display success with provider display_name

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly describes the main change: introducing catalog-based STT provider management to improve setup utility flexibility and maintainability.
Docstring Coverage	✅ Passed	Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

- Included "chronicle-asr-qwen3-vllm:chronicle-asr-qwen3-vllm" in the pull-images.sh script to support the latest ASR model integration. - This addition enhances the flexibility and capabilities of the ASR services within the Docker environment.

scripts/pull-images.sh

AnkushMalaker · 2026-03-02T05:43:59Z

@coderabbitai review

coderabbitai · 2026-03-02T05:44:08Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backends/advanced/init.py`:
- Around line 537-543: The CLI menu currently prints the custom-endpoint text
when handling the user input stored in choice, but the branching logic treats
choice == "3" as skip and checks choice == "4" for skipping, making the skip
branch unreachable and conflating "skip" with "custom endpoint"; update the
choice-to-action mapping so the printed menu entries, the conditional checks on
the choice variable (e.g., where it prints the OpenAI-Compatible custom endpoint
message), and the skip branch condition (currently checking choice == "4") are
consistent — either move the custom-endpoint print/handling into the branch that
matches the menu number for custom endpoint or change the skip branch to check
the correct value, and ensure --llm-provider none is handled by the skip branch
accordingly so each numeric option (skip vs custom endpoint) maps uniquely to
its handling code.
- Line 789: Three string literals in backends/advanced/init.py are prefixed with
f although they contain no placeholders (Ruff F541). Remove the leading f from
the literal "[green]✅[/green] Neo4j: password configured via wizard" and the
other two f-prefixed strings at the same block (the ones on lines corresponding
to 820 and 825 in the diff) so they become normal string literals; this will
resolve the F541 lint error without changing runtime behavior.

In `@setup_utils.py`:
- Around line 480-482: yaml.safe_load(catalog_path) may return None for
empty/malformed YAML so calling data.get("providers", []) can raise
AttributeError; update the code that opens catalog_path (the block using
yaml.safe_load and variable data) to defensively handle non-dict results by
treating data as an empty dict when it's None or not a mapping (e.g., check
isinstance(data, dict) or coalesce with {}), then return data.get("providers",
[]) or an empty list; ensure you still surface parsing errors appropriately if
needed.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between a0cc2ea and c809500.

📒 Files selected for processing (6)

backends/advanced/init.py
config/stt-providers-catalog.yml
scripts/pull-images.sh
setup_utils.py
tests/unit/test_wizard_strixhalo.py
wizard.py

coderabbitai · 2026-03-02T05:50:51Z

backends/advanced/init.py

+            self.console.print(
+                "[blue][INFO][/blue] OpenAI-Compatible custom endpoint selected"
+            )
+            self.console.print(
+                "This works with any provider that exposes an OpenAI-compatible API"
+            )
            self.console.print("(e.g., Groq, Together AI, LM Studio, vLLM, etc.)")


⚠️ Potential issue | 🔴 Critical

Fix LLM option mapping: “skip” and “custom endpoint” are currently conflated.

Line 537 makes choice "3" execute custom endpoint setup, but the menu/CLI mapping still treats "3" as skip. This makes the choice == "4" skip branch effectively unreachable and can mis-handle --llm-provider none.

💡 Proposed fix

- choice = {"openai": "1", "ollama": "2", "none": "3"}.get(provider, "1") + choice = {"openai": "1", "ollama": "2", "none": "4"}.get(provider, "1") @@ choices = { "1": "OpenAI (GPT-4, GPT-3.5 - requires API key)", "2": "Ollama (local models - runs locally)", - "3": "Skip (no memory extraction)", + "3": "OpenAI-Compatible custom endpoint", + "4": "Skip (no memory extraction)", }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backends/advanced/init.py` around lines 537 - 543, The CLI menu currently prints the custom-endpoint text when handling the user input stored in choice, but the branching logic treats choice == "3" as skip and checks choice == "4" for skipping, making the skip branch unreachable and conflating "skip" with "custom endpoint"; update the choice-to-action mapping so the printed menu entries, the conditional checks on the choice variable (e.g., where it prints the OpenAI-Compatible custom endpoint message), and the skip branch condition (currently checking choice == "4") are consistent — either move the custom-endpoint print/handling into the branch that matches the menu number for custom endpoint or change the skip branch to check the correct value, and ensure --llm-provider none is handled by the skip branch accordingly so each numeric option (skip vs custom endpoint) maps uniquely to its handling code.

coderabbitai · 2026-03-02T05:50:51Z

backends/advanced/init.py

        if neo4j_password:
-            self.console.print(f"[green]✅[/green] Neo4j: password configured via wizard")
+            self.console.print(
+                f"[green]✅[/green] Neo4j: password configured via wizard"


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, check if the file exists and get its structure git ls-files | grep -E "(backends|init\.py)" | head -20

Repository: SimpleOpenSoftware/chronicle

Length of output: 788

🏁 Script executed:

# Now let's examine the specific lines in the file if [ -f "backends/advanced/init.py" ]; then echo "=== File exists, checking lines 789, 820, 825 ===" sed -n '788,790p' "backends/advanced/init.py" | cat -n sed -n '819,821p' "backends/advanced/init.py" | cat -n sed -n '824,826p' "backends/advanced/init.py" | cat -n else echo "File not found" fi

Repository: SimpleOpenSoftware/chronicle

Length of output: 556

🏁 Script executed:

# Let's also verify these are f-strings without placeholders using Python AST python3 <<'EOF' import ast from pathlib import Path path = Path("backends/advanced/init.py") if path.exists(): content = path.read_text(encoding="utf-8") tree = ast.parse(content) lines_to_check = [789, 820, 825] for node in ast.walk(tree): if isinstance(node, ast.JoinedStr): if node.lineno in lines_to_check: # Check if it has any formatted values (placeholders) has_placeholders = any(isinstance(v, ast.FormattedValue) for v in node.values) print(f"Line {node.lineno}: has_placeholders={has_placeholders}") print(f" Values: {node.values}") # Also check string content from source lines = content.split('\n') if node.lineno <= len(lines): print(f" Source: {lines[node.lineno - 1].strip()}") else: print("File not found") EOF

Repository: SimpleOpenSoftware/chronicle

Length of output: 542

Remove redundant f prefixes to resolve Ruff F541.

Lines 789, 820, and 825 contain f-strings without placeholders; this is flagged as F541 by Ruff and can break lint-gated pipelines.

Proposed fix

- self.console.print( - f"[green]✅[/green] Neo4j: password configured via wizard" - ) + self.console.print( + "[green]✅[/green] Neo4j: password configured via wizard" + ) @@ - self.console.print( - f"[green]✅[/green] Obsidian: enabled (configured via wizard)" - ) + self.console.print( + "[green]✅[/green] Obsidian: enabled (configured via wizard)" + ) @@ - self.console.print( - f"[blue][INFO][/blue] Obsidian: disabled (configured via wizard)" - ) + self.console.print( + "[blue][INFO][/blue] Obsidian: disabled (configured via wizard)" + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

f"[green]✅[/green] Neo4j: password configured via wizard"

"[green]✅[/green] Neo4j: password configured via wizard"

🧰 Tools

🪛 Ruff (0.15.2)

[error] 789-789: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backends/advanced/init.py` at line 789, Three string literals in backends/advanced/init.py are prefixed with f although they contain no placeholders (Ruff F541). Remove the leading f from the literal "[green]✅[/green] Neo4j: password configured via wizard" and the other two f-prefixed strings at the same block (the ones on lines corresponding to 820 and 825 in the diff) so they become normal string literals; this will resolve the F541 lint error without changing runtime behavior.

coderabbitai · 2026-03-02T05:50:52Z

setup_utils.py

+    with open(catalog_path, "r") as f:
+        data = yaml.safe_load(f)
+    return data.get("providers", [])


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # First, run the verification script from the review comment python - <<'PY' import yaml data = yaml.safe_load("") # empty YAML content print("safe_load output:", data, type(data).__name__) try: print("Result of .get():", data.get("providers", [])) except Exception as exc: print("current pattern fails with:", type(exc).__name__, str(exc)) PY

Repository: SimpleOpenSoftware/chronicle

Length of output: 187

🏁 Script executed:

# Find and read setup_utils.py, focusing on lines 480-482 find . -name "setup_utils.py" -type f | head -5

Repository: SimpleOpenSoftware/chronicle

Length of output: 88

🏁 Script executed:

# Read the actual code at lines 480-482 sed -n '475,490p' ./setup_utils.py

Repository: SimpleOpenSoftware/chronicle

Length of output: 577

Handle empty or malformed catalog YAML safely before accessing .get().

At lines 481-483, data.get("providers", []) will crash with an AttributeError when yaml.safe_load(f) returns None (which occurs with empty YAML content). This breaks setup flows unexpectedly.

Proposed fix

def load_stt_provider_catalog(catalog_path=None): """...""" import yaml if catalog_path is None: catalog_path = Path(__file__).parent / "config" / "stt-providers-catalog.yml" with open(catalog_path, "r", encoding="utf-8") as f: - data = yaml.safe_load(f) - return data.get("providers", []) + data = yaml.safe_load(f) or {} + return data.get("providers", [])

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@setup_utils.py` around lines 480 - 482, yaml.safe_load(catalog_path) may return None for empty/malformed YAML so calling data.get("providers", []) can raise AttributeError; update the code that opens catalog_path (the block using yaml.safe_load and variable data) to defensively handle non-dict results by treating data as an empty dict when it's None or not a mapping (e.g., check isinstance(data, dict) or coalesce with {}), then return data.get("providers", []) or an empty list; ensure you still surface parsing errors appropriately if needed.

coderabbitai · 2026-03-02T05:50:52Z

wizard.py

+            # Derive ASR provider name from catalog
+            provider_entry = get_stt_provider_by_id(
+                transcription_provider, _get_catalog()
+            )
+            if provider_entry and provider_entry.get("local_asr_provider"):
+                if hardware_profile == "strixhalo":
+                    asr_provider = (
+                        provider_entry.get("local_asr_provider_strixhalo")
+                        or provider_entry["local_asr_provider"]
+                    )
+                else:
+                    asr_provider = provider_entry["local_asr_provider"]
                cmd.extend(["--provider", asr_provider])


⚠️ Potential issue | 🟠 Major

ASR provider derivation should fall back to streaming_provider when batch is cloud.

Line 425 currently resolves ASR provider from transcription_provider only. If batch is cloud and streaming is local, asr-services gets auto-added but --provider is not preselected.

💡 Proposed fix

- # Derive ASR provider name from catalog - provider_entry = get_stt_provider_by_id( - transcription_provider, _get_catalog() - ) + # Derive ASR provider from batch provider first, then streaming provider + catalog = _get_catalog() + provider_source = transcription_provider + provider_entry = get_stt_provider_by_id(provider_source, catalog) + if not (provider_entry and provider_entry.get("local_asr_provider")): + provider_source = streaming_provider + provider_entry = get_stt_provider_by_id(provider_source, catalog) if provider_entry and provider_entry.get("local_asr_provider"): if hardware_profile == "strixhalo": asr_provider = ( provider_entry.get("local_asr_provider_strixhalo") or provider_entry["local_asr_provider"] ) else: asr_provider = provider_entry["local_asr_provider"] cmd.extend(["--provider", asr_provider]) console.print( - f"[blue][INFO][/blue] Pre-selecting ASR provider: {asr_provider} (from wizard choice: {transcription_provider})" + f"[blue][INFO][/blue] Pre-selecting ASR provider: {asr_provider} (from wizard choice: {provider_source})" )

0xrushi marked this pull request as draft February 28, 2026 10:28

0xrushi added 2 commits February 28, 2026 20:48

Merge branch 'dev' into fix/singleconfig

a2d87cc

0xrushi commented Mar 1, 2026

View reviewed changes

scripts/pull-images.sh Show resolved Hide resolved

0xrushi marked this pull request as ready for review March 1, 2026 01:51

coderabbitai bot reviewed Mar 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Enhance STT provider management and configuration in setup utilities#320

fix: Enhance STT provider management and configuration in setup utilities#320
0xrushi wants to merge 3 commits intoSimpleOpenSoftware:devfrom
0xrushi:fix/singleconfig

0xrushi commented Feb 28, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Feb 28, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Estimated Code Review Effort

Uh oh!

Uh oh!

AnkushMalaker commented Mar 2, 2026

Uh oh!

coderabbitai bot commented Mar 2, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 2, 2026

Uh oh!

coderabbitai bot Mar 2, 2026

Uh oh!

coderabbitai bot Mar 2, 2026

Uh oh!

coderabbitai bot Mar 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	f"[green]✅[/green] Neo4j: password configured via wizard"
	"[green]✅[/green] Neo4j: password configured via wizard"

Conversation

0xrushi commented Feb 28, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated Code Review Effort

Uh oh!

Uh oh!

AnkushMalaker commented Mar 2, 2026

Uh oh!

coderabbitai bot commented Mar 2, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

0xrushi commented Feb 28, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 28, 2026 •

edited

Loading