docs: add streaming transcription API spec by Leftium · Pull Request #28 · cjpais/transcribe-rs

Leftium · 2026-01-29T22:27:16Z

API design for streaming transcription support, related to #4.

This spec:

- Defines a new streaming transcription API.
- Defines a new common return type that is suitable for both streaming and batch transcription.
- Refactors the existing batch APIs for consistency (and performance).
- Deprecates the existing API (which becomes a thin wrapper over the new API).

Rationale:

The final results of (real-time) streaming transcription and batch transcription are very similar, so they should share the same return type; streaming transcription just adds some extra optional fields. This makes it easier for crate consumers to support both streaming and batch transcription, or simply to switch between them.
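As a rough illustration of the shared return type, here is a minimal Rust sketch. The field names, the `is_final` flag, and the helper functions are assumptions made for this example and are not taken from the spec; only the idea of one result type with optional timing, speaker, and confidence metadata comes from the PR.

```rust
/// Hypothetical shared result type (field names are illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub struct Transcript {
    pub text: String,
    /// Assumed flag: false while a streaming result may still be revised.
    /// Batch results are always final.
    pub is_final: bool,
    /// Optional metadata; a backend that cannot supply a value leaves it as None.
    pub start_ms: Option<u64>,
    pub end_ms: Option<u64>,
    pub speaker: Option<String>,
    pub confidence: Option<f32>,
}

impl Transcript {
    /// A finished result, as produced by batch transcription or at the end of a stream segment.
    pub fn finalized(text: impl Into<String>) -> Self {
        Self {
            text: text.into(),
            is_final: true,
            start_ms: None,
            end_ms: None,
            speaker: None,
            confidence: None,
        }
    }

    /// An interim streaming result that may be replaced by a later one.
    pub fn interim(text: impl Into<String>) -> Self {
        Self { is_final: false, ..Self::finalized(text) }
    }
}

/// Consumer code that works unchanged for both modes: keep only the final text.
pub fn final_text(results: &[Transcript]) -> String {
    results
        .iter()
        .filter(|t| t.is_final)
        .map(|t| t.text.as_str())
        .collect::<Vec<_>>()
        .join(" ")
}
```

With a type like this, a consumer that only cares about the finished text can treat a batch result and a stream of interim and final results the same way.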

On the other hand, a new API is required for requesting streaming transcription:

- Input (audio samples) must be decoupled from output (transcription results).
- It must support a series of inputs (essentially small batch transcription requests).
- It should provide interim results with partial transcriptions.
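The three requirements above can be sketched as a pull-based interface. This is only an illustration: the trait name `StreamingEngine`, its methods, the `Partial` type, and the toy `CountingEngine` are all hypothetical, and the real design is left to the spec and its implementer.

```rust
use std::collections::VecDeque;

/// Hypothetical result type: interim results have `is_final == false`.
#[derive(Debug, Clone, PartialEq)]
pub struct Partial {
    pub text: String,
    pub is_final: bool,
}

/// Hypothetical pull-based interface. Feeding audio and reading results
/// are separate calls, so input is decoupled from output.
pub trait StreamingEngine {
    /// Accept one chunk in a series of inputs.
    fn push_samples(&mut self, samples: &[f32]);
    /// Return the next available result, if any, without blocking.
    fn poll(&mut self) -> Option<Partial>;
    /// Signal end of input so the engine can emit a final result.
    fn finish(&mut self);
}

/// Toy stand-in for a real model: it "transcribes" by counting samples.
pub struct CountingEngine {
    total: usize,
    pending: VecDeque<Partial>,
}

impl CountingEngine {
    pub fn new() -> Self {
        Self { total: 0, pending: VecDeque::new() }
    }
}

impl StreamingEngine for CountingEngine {
    fn push_samples(&mut self, samples: &[f32]) {
        self.total += samples.len();
        // Each chunk queues one interim result.
        self.pending.push_back(Partial {
            text: format!("{} samples", self.total),
            is_final: false,
        });
    }

    fn poll(&mut self) -> Option<Partial> {
        self.pending.pop_front()
    }

    fn finish(&mut self) {
        self.pending.push_back(Partial {
            text: format!("{} samples", self.total),
            is_final: true,
        });
    }
}
```

The caller may push several chunks before polling, or poll after every chunk; the interface does not force a one-to-one pairing of input and output.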

- Define StreamingTranscriptionEngine trait (pull-based core)
- Define StreamingTranscriptionSource struct (high-level callback API)
- Define PushSource trait + PushAdapter for push-based backends
- Define Transcript type with rich metadata (timing, speaker, confidence)
- Architecture diagram with existing vs planned components
- Migration path from legacy transcribe-rs
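To illustrate how a push-based backend might be bridged to a pull-based core, here is a minimal sketch built on a standard-library channel. `PushSource` and `PushAdapter` are names from the commit message, but every method signature below, the use of `String` as the result type, and the toy `EchoBackend` are assumptions made for this example.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical push-style backend: it decides when results are produced
/// and hands them to a sink instead of returning them.
pub trait PushSource {
    fn send_audio(&mut self, samples: &[f32], sink: &Sender<String>);
}

/// Adapter that exposes a push backend through pull-style polling.
pub struct PushAdapter<S: PushSource> {
    source: S,
    // The sender is kept alive here so the channel never disconnects.
    tx: Sender<String>,
    rx: Receiver<String>,
}

impl<S: PushSource> PushAdapter<S> {
    pub fn new(source: S) -> Self {
        let (tx, rx) = channel();
        Self { source, tx, rx }
    }

    /// Forward audio to the backend; any results it pushes land in the channel.
    pub fn push_samples(&mut self, samples: &[f32]) {
        self.source.send_audio(samples, &self.tx);
    }

    /// Non-blocking pull: returns None when no result is waiting.
    pub fn poll(&mut self) -> Option<String> {
        self.rx.try_recv().ok()
    }
}

/// Toy backend that pushes one message per audio chunk.
pub struct EchoBackend;

impl PushSource for EchoBackend {
    fn send_audio(&mut self, samples: &[f32], sink: &Sender<String>) {
        // Ignore the send error: it can only occur if the receiver was dropped.
        let _ = sink.send(format!("chunk of {}", samples.len()));
    }
}
```

A real push backend (for example one driven by a network connection) would typically push from another thread or task; the channel keeps that detail hidden from code that only polls.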

- Remove SherpaEngine, ElevenLabsSource, ElevenLabsEngine to reduce clutter
- Keep ParakeetEngine, OpenAISource, OpenAIEngine, WhisperEngine as representative examples
- Expand 3rd party API list: add Vosk, Deepgram, Azure, AssemblyAI, etc.
- Trim to top 3 per category (Pull: NeMo/Vosk/sherpa-onnx, Push: Deepgram/OpenAI/ElevenLabs, Batch: whisper.cpp)
- Add counts to category labels (e.g., "Push Streaming (9)")
- Add collapsible table listing all compatible APIs with streaming/transport/deployment info
- Mark unconnected APIs with * and legend entry explaining omission

- Add separate diagrams for Batch (existing) and Streaming (planned)
- Keep Combined View showing both together
- Split consumer apps into Handy + Whispering* (representative example)
- Replace text legend with compact mermaid diagram showing styles

Address feedback that Rust-specific details felt overconfident. All code examples now use pseudocode (STRUCT, INTERFACE, METHOD) with a disclaimer noting implementation details are left to the implementer.

- Split transcription-rs.md into overview (diagrams, sub-spec list) and appendix (details)
- Add A/B/C/D labels to diagrams as map for upcoming sub-specs
- Reorder: Overview → Sub-spec list → Diagrams
- Bold labels in Mermaid diagrams
- Update legend to show rename notation [OldName]

- (A) Transcript Type
- (B) StreamingTranscriptionEngine (Low-Level)
- (C) StreamingTranscriptionSource (High-Level Adapter)

- (A) Transcript Type (required)
- (B) StreamingTranscriptionEngine (Low-Level) (required)
- (C) StreamingTranscriptionSource (High-Level Adapter) (optional)
- (D) PushAdapter (optional)
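For (C), a high-level callback API could look roughly like the sketch below. Only the name `StreamingTranscriptionSource` and the idea of a callback-based wrapper come from the PR; the `feed` and `finish` methods, the `(text, is_final)` callback shape, and the sample-counting stand-in for a real engine are assumptions for illustration.

```rust
/// Hypothetical high-level wrapper: the caller feeds audio and receives
/// results through a callback instead of polling.
pub struct StreamingTranscriptionSource<F: FnMut(&str, bool)> {
    /// Called as `on_result(text, is_final)` for every result.
    on_result: F,
    /// Stand-in for real engine state: this sketch only counts samples.
    total_samples: usize,
}

impl<F: FnMut(&str, bool)> StreamingTranscriptionSource<F> {
    pub fn new(on_result: F) -> Self {
        Self { on_result, total_samples: 0 }
    }

    /// Feed one chunk of audio; an interim result is delivered immediately.
    pub fn feed(&mut self, samples: &[f32]) {
        self.total_samples += samples.len();
        let text = format!("{} samples", self.total_samples);
        (self.on_result)(&text, false);
    }

    /// End the stream; consumes the source and delivers one final result.
    pub fn finish(mut self) {
        let text = format!("{} samples", self.total_samples);
        (self.on_result)(&text, true);
    }
}
```

Because `finish` takes `self` by value, the source cannot be fed after the final result has been delivered; whether the real API should enforce this at the type level is a design choice left open here.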

- Overview now explains why (gaps in transcribe-rs) then what's new
- Legend clarified: [Existing] vs NewName [OldName] notation
- Removed Omitted* from legend (stars already explain this)
- (A) Transcript Type: required → highly recommended
- Tightened adapter explanation

Leftium mentioned this pull request Jan 29, 2026

feature request: On the fly transcription #4

Open

Leftium marked this pull request as draft January 29, 2026 23:49

Leftium force-pushed the docs/streaming-api-spec branch 5 times, most recently from 0977438 to 7d9a21a Compare February 1, 2026 06:16

Leftium force-pushed the docs/streaming-api-spec branch from 7d9a21a to 7d5fe35 Compare February 5, 2026 05:22

Leftium added 8 commits February 5, 2026 15:13

docs(spec): convert Rust code to language-agnostic pseudocode

ff98a71


docs: draft sub-spec summaries for (A), (B), and (C)

a0d154f


docs: complete sub-spec summaries (A-D) with required/optional tags

8f52a20


docs: clarify diagram arrow labels with "returns" prefix

2e0f9be

Leftium force-pushed the docs/streaming-api-spec branch from da42e03 to 2e0f9be Compare February 5, 2026 08:59

Leftium wants to merge 9 commits into cjpais:main from Leftium:docs/streaming-api-spec
