MAITO’s Local AI translation uses “streaming”: translation results are displayed as they’re generated, word by word. This article explains how it works and why.
What Is Streaming Translation?
Traditional (DeepL):
- You click Translate
- Wait…
- Complete translation appears at once
Streaming (Local AI):
- You click Translate
- Words appear gradually, one by one
- You see progress in real-time
- Complete translation builds up
It’s like watching someone type the translation live.
Why Stream?
Technical Reason
AI models generate text sequentially, one token (word/word-part) at a time. Streaming shows this natural process as it happens.
Model Process:
- Reads input: “Hello, how are you?”
- Generates token 1: “Hallo”
- Generates token 2: “,”
- Generates token 3: “ wie”
- Generates token 4: “ geht”
- And so on…
Streaming displays each token immediately as generated.
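The token-by-token process above can be pictured with a small Python sketch. This is only an illustration: `fake_model` is a hypothetical stand-in that replays a fixed token list, and none of these names come from MAITO itself.

```python
from typing import Iterator, List


def fake_model(tokens: List[str]) -> Iterator[str]:
    """Hypothetical stand-in for a model: hands out one token at a time."""
    for token in tokens:
        yield token


def stream_translation(tokens: List[str]) -> List[str]:
    """Return what the screen would show after each generated token."""
    shown = ""
    frames = []
    for token in fake_model(tokens):
        shown += token        # append the new token to the visible text
        frames.append(shown)  # a real UI would redraw the output here
    return frames


if __name__ == "__main__":
    # Tokens taken from the example above.
    for frame in stream_translation(["Hallo", ",", " wie", " geht"]):
        print(frame)
```

Each printed line is one "frame" of the display, so the output grows from “Hallo” to “Hallo, wie geht” one token at a time.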
User Experience Benefits
Immediate Feedback
- See translation start instantly
- Know something is happening
- No “frozen” feeling
Progress Visibility
- Watch translation build up
- Percentage completion shown
- Estimated time remaining
- Tokens/second metric
Early Cancellation
- See if translation is going wrong
- Cancel mid-way if needed
- Don’t waste time on bad output
Engaging Experience
- More interactive feel
- Like real-time assistance
- Less boring than waiting
It's Not Slower
Streaming doesn’t make translation slower; it only shows the output as it is produced. The total time is the same whether or not you watch the stream. DeepL results appear all at once because DeepL’s API returns the complete translation in a single response.
What You See
During streaming translation, MAITO displays:
Progress Percentage
Translation progress: 47%
- Shows how much is complete
- Updates continuously
Tokens Per Second
12.5 tokens/sec
- Translation speed metric
- Higher = faster
- Indicates system performance
Estimated Time
~25s remaining
- Prediction based on progress
- Updates as translation continues
- Helps you plan
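As a rough idea of how figures like these can be derived, here is a minimal Python sketch. It assumes the expected number of output tokens is known or estimated up front; that assumption, the `StreamStats` type, and the `compute_stats` helper are all hypothetical and not taken from MAITO.

```python
from dataclasses import dataclass


@dataclass
class StreamStats:
    percent: float         # share of the expected tokens produced so far
    tokens_per_sec: float  # average generation speed since the start
    seconds_left: float    # estimate based on the average speed


def compute_stats(tokens_done: int, tokens_expected: int, elapsed_sec: float) -> StreamStats:
    """Derive progress figures from simple counters (hypothetical helper)."""
    if tokens_expected <= 0 or elapsed_sec <= 0:
        return StreamStats(0.0, 0.0, float("inf"))  # nothing measurable yet
    done = min(tokens_done, tokens_expected)        # never report more than 100%
    rate = tokens_done / elapsed_sec
    remaining = tokens_expected - done
    eta = remaining / rate if rate > 0 else float("inf")
    return StreamStats(100.0 * done / tokens_expected, rate, eta)


if __name__ == "__main__":
    print(compute_stats(tokens_done=250, tokens_expected=500, elapsed_sec=20.0))
```

With 250 of an expected 500 tokens produced in 20 seconds, this gives 50% progress, 12.5 tokens/sec, and about 20 seconds remaining. Because the estimate uses the average speed so far, it changes as the translation continues.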
Cancel Button
- Stop translation anytime
- Useful if output is clearly wrong
- Frees resources immediately
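One common way to make a stream cancellable is to check a shared flag between tokens. The Python sketch below shows that idea under stated assumptions: `cancellable_stream` is a hypothetical helper, and how MAITO actually handles cancellation is not described in this article.

```python
import threading
from typing import Iterable, Iterator


def cancellable_stream(tokens: Iterable[str], cancel: threading.Event) -> Iterator[str]:
    """Yield tokens until the cancel flag is set (hypothetical helper)."""
    for token in tokens:
        if cancel.is_set():
            return  # stop generating; no further tokens are produced
        yield token


if __name__ == "__main__":
    cancel = threading.Event()
    shown = []
    for token in cancellable_stream(["Hallo", ",", " wie", " geht"], cancel):
        shown.append(token)
        if len(shown) == 2:
            cancel.set()  # simulates the user pressing Cancel
    print("".join(shown))
```

The flag is checked before each token, so work stops at the next token boundary after Cancel is pressed and the text shown so far stays on screen.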
Streaming vs All-At-Once
| Aspect | Streaming (Local AI) | All-At-Once (DeepL) |
|---|---|---|
| Feedback | Immediate | Delayed |
| Progress | Visible | Hidden |
| Cancel | Anytime | Before completion only |
| Experience | Interactive | Traditional |
| Speed Metric | Real-time tokens/sec | No metric |
| UX | Modern, engaging | Classic, simple |
Technical Details
Intelligent Chunking
For longer texts, MAITO:
- Intelligently splits at paragraph/sentence boundaries
- Translates each chunk
- Streams each chunk’s translation
- Assembles final result seamlessly
You see streaming for each chunk, along with overall progress. Chunking happens automatically in the background, so there’s no practical text length limit.
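Splitting at paragraph and sentence boundaries can be sketched in a few lines of Python. This is an illustration of the general technique, not MAITO’s algorithm: the `split_into_chunks` name, the `max_chars` limit, and the simple punctuation-based sentence rule are all assumptions.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 1000) -> List[str]:
    """Split at blank lines first, then at sentence ends if a paragraph is too long."""
    chunks: List[str] = []
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if len(paragraph) <= max_chars:
            chunks.append(paragraph)
            continue
        # Paragraph too long: pack whole sentences into chunks of up to max_chars.
        # A single sentence longer than max_chars is kept whole, not cut mid-sentence.
        current = ""
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            if current and len(current) + 1 + len(sentence) > max_chars:
                chunks.append(current)
                current = sentence
            else:
                current = f"{current} {sentence}" if current else sentence
        if current:
            chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(split_into_chunks("One. Two. Three.\n\nShort paragraph.", max_chars=10))
```

Each chunk can then be translated and streamed in order, with the translated chunks joined again at the end.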
Token Definition
Token = Basic unit of text
- Often a word or word part
- “Hello” = 1 token
- “Translation” might be 1-2 tokens
- Roughly 4 characters = 1 token on average (varies by language and model)
Why Tokens Matter:
- AI models think in tokens
- Speed measured in tokens/second
- Character limits converted to token limits
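The 4-characters-per-token rule of thumb makes it easy to estimate token counts. The Python sketch below shows the arithmetic only; the function names are hypothetical, and a real tokenizer will give different counts for individual words.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (rule of thumb, not a tokenizer)."""
    if not text:
        return 0
    return max(1, round(len(text) / chars_per_token))


def char_limit_to_token_limit(char_limit: int, chars_per_token: float = 4.0) -> int:
    """Convert a character limit into an approximate token limit."""
    return int(char_limit / chars_per_token)


if __name__ == "__main__":
    sample = "x" * 2000
    print(estimate_tokens(sample))          # estimate for 2,000 characters
    print(char_limit_to_token_limit(2000))  # matching token limit
```

Under this rule, a 2,000-character text comes out at about 500 tokens. The estimate is an average over longer text and should not be trusted for single words.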
Streaming Protocol
Technically, MAITO uses:
- IAsyncEnumerable<T> in C#
- Yields translation chunks asynchronously
- UI updates on each chunk
- Smooth, responsive experience
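The same pattern can be illustrated in Python, where an async generator consumed with `async for` plays a role similar to `IAsyncEnumerable<T>` consumed with `await foreach` in C#. This is a sketch of the pattern, not MAITO’s code: `translate_stream`, `run_translation`, and the `on_update` callback are hypothetical names.

```python
import asyncio
from typing import AsyncIterator, Callable, List


async def translate_stream(tokens: List[str]) -> AsyncIterator[str]:
    """Hypothetical async producer: yields one piece of translation at a time."""
    for token in tokens:
        await asyncio.sleep(0)  # hand control back to the event loop between pieces
        yield token


async def run_translation(tokens: List[str], on_update: Callable[[str], None]) -> str:
    """Consume the stream and notify the UI callback after each piece."""
    text = ""
    async for piece in translate_stream(tokens):
        text += piece
        on_update(text)  # a real app would refresh the output box here
    return text


if __name__ == "__main__":
    asyncio.run(run_translation(["Hallo", ",", " wie", " geht"], print))
```

Because the producer yields control between pieces, the interface stays responsive while the translation is still being generated.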
Benefits for Different Users
Casual Users
- Engaging: Fun to watch translation appear
- Reassuring: Know it’s working
- Informative: See if speed is acceptable
Professional Users
- Efficient: Cancel bad translations early
- Informative: Performance metrics help diagnose issues
- Productive: Can start reading while translation continues
Technical Users
- Diagnostic: Tokens/sec reveals system performance
- Benchmarking: Easy to compare speeds
- Transparent: Understand what’s happening
Reading While Streaming
For long texts, you can start reading the beginning while the end is still being translated. This makes even slower systems feel more responsive!
When Streaming Isn’t Used
Streaming only applies to Local AI translation. DeepL doesn’t stream because:
- Cloud round-trip latency makes streaming impractical
- DeepL’s API returns complete translations
- Network buffering would make streaming jerky
Related Articles
- What Is Local Translation - How Local AI works
- Why Is Local Slow - Performance factors
- Benchmark Performance - Measure tokens/sec