docs: add streaming transcription API spec #28
Draft
Leftium wants to merge 9 commits into cjpais:main
- Define StreamingTranscriptionEngine trait (pull-based core)
- Define StreamingTranscriptionSource struct (high-level callback API)
- Define PushSource trait + PushAdapter for push-based backends
- Define Transcript type with rich metadata (timing, speaker, confidence)
- Architecture diagram with existing vs planned components
- Migration path from legacy transcribe-rs
- Remove SherpaEngine, ElevenLabsSource, ElevenLabsEngine to reduce clutter
- Keep ParakeetEngine, OpenAISource, OpenAIEngine, WhisperEngine as representative examples
- Expand 3rd party API list: add Vosk, Deepgram, Azure, AssemblyAI, etc.
- Trim to top 3 per category (Pull: NeMo/Vosk/sherpa-onnx, Push: Deepgram/OpenAI/ElevenLabs, Batch: whisper.cpp)
- Add counts to category labels (e.g., "Push Streaming (9)")
- Add collapsible table listing all compatible APIs with streaming/transport/deployment info
- Mark unconnected APIs with * and legend entry explaining omission

- Add separate diagrams for Batch (existing) and Streaming (planned)
- Keep Combined View showing both together
- Split consumer apps into Handy + Whispering* (representative example)
- Replace text legend with compact mermaid diagram showing styles
Address feedback that Rust-specific details felt overconfident. All code examples now use pseudocode (STRUCT, INTERFACE, METHOD), with a disclaimer that implementation details are left to the implementer.
- Split transcription-rs.md into overview (diagrams, sub-spec list) and appendix (details)
- Add A/B/C/D labels to diagrams as map for upcoming sub-specs
- Reorder: Overview → Sub-spec list → Diagrams
- Bold labels in Mermaid diagrams
- Update legend to show rename notation [OldName]

- (A) Transcript Type
- (B) StreamingTranscriptionEngine (Low-Level)
- (C) StreamingTranscriptionSource (High-Level Adapter)

- (A) Transcript Type: required
- (B) StreamingTranscriptionEngine (Low-Level): required
- (C) StreamingTranscriptionSource (High-Level Adapter): optional
- (D) PushAdapter: optional

- Overview now explains why (gaps in transcribe-rs) then what's new
- Legend clarified: [Existing] vs NewName [OldName] notation
- Removed Omitted* from legend (stars already explain this)
- (A) Transcript Type: required → highly recommended
- Tightened adapter explanation
API design for streaming transcription support, related to #4.
This spec:
Rationale:
The final results of (realtime) streaming transcription and batch transcription are very similar, so they should share the same return type; streaming transcription just adds a few optional fields. A shared type makes it easier for crate consumers to support both streaming and batch transcription, or simply to switch between them.
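To make the shared-return-type idea concrete, here is a minimal Rust sketch. It is an illustration under assumptions, not the spec's definition: the field names (`start`, `end`, `speaker`, `confidence`, `is_final`), the constructors, and the `display_line` helper are all hypothetical. Only the type name `Transcript` and the metadata categories (timing, speaker, confidence) come from the commit notes above.

```rust
use std::time::Duration;

/// Hypothetical shared result type for batch and streaming transcription.
/// Field names are illustrative assumptions, not taken from the spec.
#[derive(Debug, Clone, PartialEq)]
pub struct Transcript {
    /// Transcribed text; always present.
    pub text: String,
    /// Optional timing metadata.
    pub start: Option<Duration>,
    pub end: Option<Duration>,
    /// Optional speaker label.
    pub speaker: Option<String>,
    /// Optional confidence score.
    pub confidence: Option<f32>,
    /// Assumed streaming-only field: `Some(false)` marks an interim result.
    /// Batch results leave it as `None`.
    pub is_final: Option<bool>,
}

impl Transcript {
    /// Build a batch-style result: text only, every optional field unset.
    pub fn batch(text: impl Into<String>) -> Self {
        Transcript {
            text: text.into(),
            start: None,
            end: None,
            speaker: None,
            confidence: None,
            is_final: None,
        }
    }

    /// Build an interim streaming result that may still be revised.
    pub fn interim(text: impl Into<String>, start: Duration) -> Self {
        Transcript {
            start: Some(start),
            is_final: Some(false),
            ..Transcript::batch(text)
        }
    }
}

/// Consumer code written once against the shared type works for both modes:
/// interim streaming results get a trailing ellipsis, everything else is
/// shown as-is.
pub fn display_line(t: &Transcript) -> String {
    match t.is_final {
        Some(false) => format!("{}...", t.text),
        _ => t.text.clone(),
    }
}
```

Because every streaming-specific field is an `Option`, a consumer that only reads `text` needs no changes when switching from batch to streaming, and one that wants richer output can opt in field by field.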
On the other hand, a new API is required for requesting streaming transcription: