fix: Enhance STT provider management and configuration in setup utilities#320
fix: Enhance STT provider management and configuration in setup utilities#3200xrushi wants to merge 3 commits intoSimpleOpenSoftware:devfrom
Conversation
- Introduced functions to load and retrieve STT provider details from a new YAML catalog, improving the setup wizard's flexibility. - Updated `wizard.py` to utilize the new catalog for provider selection, enhancing user experience during transcription provider setup. - Refactored existing provider mapping logic to streamline the process and improve maintainability. - Added a new configuration file `stt-providers-catalog.yml` to define STT providers, their capabilities, and setup instructions, ensuring a centralized management approach. - Enhanced error handling and logging for better visibility during provider selection and configuration processes.
📝 WalkthroughWalkthroughIntroduces a catalog-based system for speech-to-text providers, replacing hardcoded provider configurations with dynamic, YAML-driven catalog entries. Adds utility functions to load and query provider metadata, and updates the setup wizard to derive provider choices, defaults, and configuration options from the catalog. Changes
Sequence DiagramsequenceDiagram
participant User
participant Wizard as Setup Wizard
participant Catalog as Catalog System
participant Config as Config/Env
User->>Wizard: Start setup
Wizard->>Catalog: load_stt_provider_catalog()
Catalog-->>Wizard: Provider list with metadata
Wizard->>User: Display provider choices (from catalog)
User->>Wizard: Select provider
Wizard->>Catalog: get_stt_provider_by_id()
Catalog-->>Wizard: Provider details (env_vars, defaults, setup_notes)
Wizard->>User: Prompt for provider-specific vars (from catalog metadata)
User->>Wizard: Enter configuration values
Wizard->>Config: Apply defaults & env vars (from provider metadata)
Config-->>Wizard: Config updated
Wizard->>User: Display success with provider display_name
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
- Included "chronicle-asr-qwen3-vllm:chronicle-asr-qwen3-vllm" in the pull-images.sh script to support the latest ASR model integration. - This addition enhances the flexibility and capabilities of the ASR services within the Docker environment.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backends/advanced/init.py`:
- Around line 537-543: The CLI menu currently prints the custom-endpoint text
when handling the user input stored in choice, but the branching logic treats
choice == "3" as skip and checks choice == "4" for skipping, making the skip
branch unreachable and conflating "skip" with "custom endpoint"; update the
choice-to-action mapping so the printed menu entries, the conditional checks on
the choice variable (e.g., where it prints the OpenAI-Compatible custom endpoint
message), and the skip branch condition (currently checking choice == "4") are
consistent — either move the custom-endpoint print/handling into the branch that
matches the menu number for custom endpoint or change the skip branch to check
the correct value, and ensure --llm-provider none is handled by the skip branch
accordingly so each numeric option (skip vs custom endpoint) maps uniquely to
its handling code.
- Line 789: Three string literals in backends/advanced/init.py are prefixed with
f although they contain no placeholders (Ruff F541). Remove the leading f from
the literal "[green]✅[/green] Neo4j: password configured via wizard" and the
other two f-prefixed strings at the same block (the ones on lines corresponding
to 820 and 825 in the diff) so they become normal string literals; this will
resolve the F541 lint error without changing runtime behavior.
In `@setup_utils.py`:
- Around line 480-482: yaml.safe_load(catalog_path) may return None for
empty/malformed YAML so calling data.get("providers", []) can raise
AttributeError; update the code that opens catalog_path (the block using
yaml.safe_load and variable data) to defensively handle non-dict results by
treating data as an empty dict when it's None or not a mapping (e.g., check
isinstance(data, dict) or coalesce with {}), then return data.get("providers",
[]) or an empty list; ensure you still surface parsing errors appropriately if
needed.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (6)
backends/advanced/init.pyconfig/stt-providers-catalog.ymlscripts/pull-images.shsetup_utils.pytests/unit/test_wizard_strixhalo.pywizard.py
| self.console.print( | ||
| "[blue][INFO][/blue] OpenAI-Compatible custom endpoint selected" | ||
| ) | ||
| self.console.print( | ||
| "This works with any provider that exposes an OpenAI-compatible API" | ||
| ) | ||
| self.console.print("(e.g., Groq, Together AI, LM Studio, vLLM, etc.)") |
There was a problem hiding this comment.
Fix LLM option mapping: “skip” and “custom endpoint” are currently conflated.
Line 537 makes choice "3" execute custom endpoint setup, but the menu/CLI mapping still treats "3" as skip. This makes the choice == "4" skip branch effectively unreachable and can mis-handle --llm-provider none.
💡 Proposed fix
- choice = {"openai": "1", "ollama": "2", "none": "3"}.get(provider, "1")
+ choice = {"openai": "1", "ollama": "2", "none": "4"}.get(provider, "1")
@@
choices = {
"1": "OpenAI (GPT-4, GPT-3.5 - requires API key)",
"2": "Ollama (local models - runs locally)",
- "3": "Skip (no memory extraction)",
+ "3": "OpenAI-Compatible custom endpoint",
+ "4": "Skip (no memory extraction)",
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backends/advanced/init.py` around lines 537 - 543, The CLI menu currently
prints the custom-endpoint text when handling the user input stored in choice,
but the branching logic treats choice == "3" as skip and checks choice == "4"
for skipping, making the skip branch unreachable and conflating "skip" with
"custom endpoint"; update the choice-to-action mapping so the printed menu
entries, the conditional checks on the choice variable (e.g., where it prints
the OpenAI-Compatible custom endpoint message), and the skip branch condition
(currently checking choice == "4") are consistent — either move the
custom-endpoint print/handling into the branch that matches the menu number for
custom endpoint or change the skip branch to check the correct value, and ensure
--llm-provider none is handled by the skip branch accordingly so each numeric
option (skip vs custom endpoint) maps uniquely to its handling code.
| if neo4j_password: | ||
| self.console.print(f"[green]✅[/green] Neo4j: password configured via wizard") | ||
| self.console.print( | ||
| f"[green]✅[/green] Neo4j: password configured via wizard" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, check if the file exists and get its structure
git ls-files | grep -E "(backends|init\.py)" | head -20Repository: SimpleOpenSoftware/chronicle
Length of output: 788
🏁 Script executed:
# Now let's examine the specific lines in the file
if [ -f "backends/advanced/init.py" ]; then
echo "=== File exists, checking lines 789, 820, 825 ==="
sed -n '788,790p' "backends/advanced/init.py" | cat -n
sed -n '819,821p' "backends/advanced/init.py" | cat -n
sed -n '824,826p' "backends/advanced/init.py" | cat -n
else
echo "File not found"
fiRepository: SimpleOpenSoftware/chronicle
Length of output: 556
🏁 Script executed:
# Let's also verify these are f-strings without placeholders using Python AST
python3 <<'EOF'
import ast
from pathlib import Path
path = Path("backends/advanced/init.py")
if path.exists():
content = path.read_text(encoding="utf-8")
tree = ast.parse(content)
lines_to_check = [789, 820, 825]
for node in ast.walk(tree):
if isinstance(node, ast.JoinedStr):
if node.lineno in lines_to_check:
# Check if it has any formatted values (placeholders)
has_placeholders = any(isinstance(v, ast.FormattedValue) for v in node.values)
print(f"Line {node.lineno}: has_placeholders={has_placeholders}")
print(f" Values: {node.values}")
# Also check string content from source
lines = content.split('\n')
if node.lineno <= len(lines):
print(f" Source: {lines[node.lineno - 1].strip()}")
else:
print("File not found")
EOFRepository: SimpleOpenSoftware/chronicle
Length of output: 542
Remove redundant f prefixes to resolve Ruff F541.
Lines 789, 820, and 825 contain f-strings without placeholders; this is flagged as F541 by Ruff and can break lint-gated pipelines.
Proposed fix
- self.console.print(
- f"[green]✅[/green] Neo4j: password configured via wizard"
- )
+ self.console.print(
+ "[green]✅[/green] Neo4j: password configured via wizard"
+ )
@@
- self.console.print(
- f"[green]✅[/green] Obsidian: enabled (configured via wizard)"
- )
+ self.console.print(
+ "[green]✅[/green] Obsidian: enabled (configured via wizard)"
+ )
@@
- self.console.print(
- f"[blue][INFO][/blue] Obsidian: disabled (configured via wizard)"
- )
+ self.console.print(
+ "[blue][INFO][/blue] Obsidian: disabled (configured via wizard)"
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| f"[green]✅[/green] Neo4j: password configured via wizard" | |
| "[green]✅[/green] Neo4j: password configured via wizard" |
🧰 Tools
🪛 Ruff (0.15.2)
[error] 789-789: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backends/advanced/init.py` at line 789, Three string literals in
backends/advanced/init.py are prefixed with f although they contain no
placeholders (Ruff F541). Remove the leading f from the literal
"[green]✅[/green] Neo4j: password configured via wizard" and the other two
f-prefixed strings at the same block (the ones on lines corresponding to 820 and
825 in the diff) so they become normal string literals; this will resolve the
F541 lint error without changing runtime behavior.
| with open(catalog_path, "r") as f: | ||
| data = yaml.safe_load(f) | ||
| return data.get("providers", []) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, run the verification script from the review comment
python - <<'PY'
import yaml
data = yaml.safe_load("") # empty YAML content
print("safe_load output:", data, type(data).__name__)
try:
print("Result of .get():", data.get("providers", []))
except Exception as exc:
print("current pattern fails with:", type(exc).__name__, str(exc))
PYRepository: SimpleOpenSoftware/chronicle
Length of output: 187
🏁 Script executed:
# Find and read setup_utils.py, focusing on lines 480-482
find . -name "setup_utils.py" -type f | head -5Repository: SimpleOpenSoftware/chronicle
Length of output: 88
🏁 Script executed:
# Read the actual code at lines 480-482
sed -n '475,490p' ./setup_utils.pyRepository: SimpleOpenSoftware/chronicle
Length of output: 577
Handle empty or malformed catalog YAML safely before accessing .get().
At lines 481-483, data.get("providers", []) will crash with an AttributeError when yaml.safe_load(f) returns None (which occurs with empty YAML content). This breaks setup flows unexpectedly.
Proposed fix
def load_stt_provider_catalog(catalog_path=None):
"""..."""
import yaml
if catalog_path is None:
catalog_path = Path(__file__).parent / "config" / "stt-providers-catalog.yml"
with open(catalog_path, "r", encoding="utf-8") as f:
- data = yaml.safe_load(f)
- return data.get("providers", [])
+ data = yaml.safe_load(f) or {}
+ return data.get("providers", [])🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@setup_utils.py` around lines 480 - 482, yaml.safe_load(catalog_path) may
return None for empty/malformed YAML so calling data.get("providers", []) can
raise AttributeError; update the code that opens catalog_path (the block using
yaml.safe_load and variable data) to defensively handle non-dict results by
treating data as an empty dict when it's None or not a mapping (e.g., check
isinstance(data, dict) or coalesce with {}), then return data.get("providers",
[]) or an empty list; ensure you still surface parsing errors appropriately if
needed.
| # Derive ASR provider name from catalog | ||
| provider_entry = get_stt_provider_by_id( | ||
| transcription_provider, _get_catalog() | ||
| ) | ||
| if provider_entry and provider_entry.get("local_asr_provider"): | ||
| if hardware_profile == "strixhalo": | ||
| asr_provider = ( | ||
| provider_entry.get("local_asr_provider_strixhalo") | ||
| or provider_entry["local_asr_provider"] | ||
| ) | ||
| else: | ||
| asr_provider = provider_entry["local_asr_provider"] | ||
| cmd.extend(["--provider", asr_provider]) |
There was a problem hiding this comment.
ASR provider derivation should fall back to streaming_provider when batch is cloud.
Line 425 currently resolves ASR provider from transcription_provider only. If batch is cloud and streaming is local, asr-services gets auto-added but --provider is not preselected.
💡 Proposed fix
- # Derive ASR provider name from catalog
- provider_entry = get_stt_provider_by_id(
- transcription_provider, _get_catalog()
- )
+ # Derive ASR provider from batch provider first, then streaming provider
+ catalog = _get_catalog()
+ provider_source = transcription_provider
+ provider_entry = get_stt_provider_by_id(provider_source, catalog)
+ if not (provider_entry and provider_entry.get("local_asr_provider")):
+ provider_source = streaming_provider
+ provider_entry = get_stt_provider_by_id(provider_source, catalog)
if provider_entry and provider_entry.get("local_asr_provider"):
if hardware_profile == "strixhalo":
asr_provider = (
provider_entry.get("local_asr_provider_strixhalo")
or provider_entry["local_asr_provider"]
)
else:
asr_provider = provider_entry["local_asr_provider"]
cmd.extend(["--provider", asr_provider])
console.print(
- f"[blue][INFO][/blue] Pre-selecting ASR provider: {asr_provider} (from wizard choice: {transcription_provider})"
+ f"[blue][INFO][/blue] Pre-selecting ASR provider: {asr_provider} (from wizard choice: {provider_source})"
)
wizard.pyto utilize the new catalog for provider selection, enhancing user experience during transcription provider setup.stt-providers-catalog.ymlto define STT providers, their capabilities, and setup instructions, ensuring a centralized management approach.Summary by CodeRabbit
New Features
Tests