docs: add streaming transcription API spec #28

Draft
Leftium wants to merge 9 commits into cjpais:main from Leftium:docs/streaming-api-spec

Conversation

Leftium commented Jan 29, 2026

API design for streaming transcription support, related to #4.


This spec:

  • Defines a new streaming transcription API.
  • Defines a new common return type that is suitable for both streaming and batch transcription.
  • Refactors the existing batch APIs for consistency (and performance).
  • Deprecates the existing API (which becomes a thin wrapper over the new API).

Rationale:

The final results of (real-time) streaming transcription and batch transcription are very similar, so they should share the same return type; streaming transcription just adds a few optional fields. A shared type makes it easier for crate consumers to support both modes, or simply to switch between them.
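As a rough illustration of the shared return type, here is a minimal Rust sketch. The type name `Transcript` and the metadata categories (timing, speaker, confidence) come from this PR's commit notes; every field name, the `batch` constructor, and the `render` helper are assumptions made up for this example, not the spec's actual definitions.

```rust
use std::time::Duration;

/// Hypothetical shared result type. Field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct Transcript {
    /// Transcribed text.
    pub text: String,
    /// False for interim (partial) streaming results; batch results are always final.
    pub is_final: bool,
    /// Optional metadata that a streaming (or richer batch) backend may fill in.
    pub start: Option<Duration>,
    pub end: Option<Duration>,
    pub speaker: Option<String>,
    pub confidence: Option<f32>,
}

impl Transcript {
    /// Build a batch-style result: final, with no optional fields set.
    pub fn batch(text: impl Into<String>) -> Self {
        Transcript {
            text: text.into(),
            is_final: true,
            start: None,
            end: None,
            speaker: None,
            confidence: None,
        }
    }
}

/// Consumer code that does not care which mode produced the result.
pub fn render(t: &Transcript) -> String {
    // Mark interim results so the caller can tell they may still change.
    let marker = if t.is_final { "" } else { " ..." };
    match t.start {
        Some(start) => format!("[{:.1}s] {}{}", start.as_secs_f32(), t.text, marker),
        None => format!("{}{}", t.text, marker),
    }
}
```

Because the streaming-only information lives in optional fields, a consumer written against `render` handles a batch result and an interim streaming result with the same code path.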

On the other hand, a new API is required for requesting streaming transcription:

  • Input (audio samples) must be decoupled from output (transcription results).
  • The API must accept a series of inputs (essentially small batch transcription requests).
  • The API should provide interim results with partial transcriptions.
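The three requirements above could be sketched as a pull-based interface along the following lines. The trait name `StreamingTranscriptionEngine` appears in this PR's commit notes; the method names (`push_samples`, `finish`, `next_transcript`), the reduced `Transcript` struct, and the toy `CountingEngine` are hypothetical and exist only to show the shape of the interaction.

```rust
use std::collections::VecDeque;

/// Reduced, hypothetical result type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Transcript {
    pub text: String,
    pub is_final: bool,
}

/// Hypothetical pull-based interface: input and output are separate calls.
pub trait StreamingTranscriptionEngine {
    /// Feed one chunk of audio samples; returns nothing (input is decoupled from output).
    fn push_samples(&mut self, samples: &[f32]);
    /// Signal that no more audio will arrive.
    fn finish(&mut self);
    /// Pull the next available result, interim or final, if there is one.
    fn next_transcript(&mut self) -> Option<Transcript>;
}

/// Toy engine that "transcribes" by counting samples. Not a real backend.
pub struct CountingEngine {
    total: usize,
    pending: VecDeque<Transcript>,
}

impl CountingEngine {
    pub fn new() -> Self {
        CountingEngine { total: 0, pending: VecDeque::new() }
    }
}

impl StreamingTranscriptionEngine for CountingEngine {
    fn push_samples(&mut self, samples: &[f32]) {
        self.total += samples.len();
        // Each chunk queues an interim (partial) result.
        self.pending.push_back(Transcript {
            text: format!("{} samples so far", self.total),
            is_final: false,
        });
    }

    fn finish(&mut self) {
        // End of input queues the single final result.
        self.pending.push_back(Transcript {
            text: format!("{} samples total", self.total),
            is_final: true,
        });
    }

    fn next_transcript(&mut self) -> Option<Transcript> {
        self.pending.pop_front()
    }
}

/// Pull every result that is currently available.
pub fn drain(engine: &mut impl StreamingTranscriptionEngine) -> Vec<Transcript> {
    let mut out = Vec::new();
    while let Some(t) = engine.next_transcript() {
        out.push(t);
    }
    out
}
```

A caller would push chunks as audio arrives, call `drain` whenever it wants updates, and call `finish` once at the end. Whether the real spec polls like this or uses callbacks is left to the spec itself; the commit notes mention both a pull-based core and a higher-level callback API.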

Leftium marked this pull request as draft January 29, 2026 23:49
Leftium force-pushed the docs/streaming-api-spec branch 5 times, most recently from 0977438 to 7d9a21a on February 1, 2026 06:16
- Define StreamingTranscriptionEngine trait (pull-based core)
- Define StreamingTranscriptionSource struct (high-level callback API)
- Define PushSource trait + PushAdapter for push-based backends
- Define Transcript type with rich metadata (timing, speaker, confidence)
- Architecture diagram with existing vs planned components
- Migration path from legacy transcribe-rs
Leftium force-pushed the docs/streaming-api-spec branch from 7d9a21a to 7d5fe35 on February 5, 2026 05:22
- Remove SherpaEngine, ElevenLabsSource, ElevenLabsEngine to reduce clutter
- Keep ParakeetEngine, OpenAISource, OpenAIEngine, WhisperEngine as representative examples
- Expand 3rd party API list: add Vosk, Deepgram, Azure, AssemblyAI, etc.
- Trim to top 3 per category (Pull: NeMo/Vosk/sherpa-onnx, Push: Deepgram/OpenAI/ElevenLabs, Batch: whisper.cpp)
- Add counts to category labels (e.g., "Push Streaming (9)")
- Add collapsible table listing all compatible APIs with streaming/transport/deployment info
- Mark unconnected APIs with * and legend entry explaining omission
- Add separate diagrams for Batch (existing) and Streaming (planned)
- Keep Combined View showing both together
- Split consumer apps into Handy + Whispering* (representative example)
- Replace text legend with compact mermaid diagram showing styles
Address feedback that Rust-specific details felt overconfident.
All code examples now use pseudocode (STRUCT, INTERFACE, METHOD)
with a disclaimer noting implementation details are left to the
implementer.
- Split transcription-rs.md into overview (diagrams, sub-spec list) and appendix (details)
- Add A/B/C/D labels to diagrams as map for upcoming sub-specs
- Reorder: Overview → Sub-spec list → Diagrams
- Bold labels in Mermaid diagrams
- Update legend to show rename notation [OldName]
- (A) Transcript Type
- (B) StreamingTranscriptionEngine (Low-Level)
- (C) StreamingTranscriptionSource (High-Level Adapter)
- (A) Transcript Type — required
- (B) StreamingTranscriptionEngine (Low-Level) — required
- (C) StreamingTranscriptionSource (High-Level Adapter) — optional
- (D) PushAdapter — optional
- Overview now explains why (gaps in transcribe-rs) then what's new
- Legend clarified: [Existing] vs NewName [OldName] notation
- Removed Omitted* from legend (stars already explain this)
- (A) Transcript Type: required → highly recommended
- Tightened adapter explanation
Leftium force-pushed the docs/streaming-api-spec branch from da42e03 to 2e0f9be on February 5, 2026 08:59