Understanding Streaming Translation

Learn how MAITO's streaming translation provides real-time feedback as text is translated word-by-word

Pangaia Software 4 min read

MAITO’s Local AI translation uses “streaming”: it displays translation results as they’re generated, word by word. This article explains how streaming works and why MAITO uses it.

What Is Streaming Translation?

Traditional (DeepL):

  1. You click Translate
  2. Wait…
  3. Complete translation appears at once

Streaming (Local AI):

  1. You click Translate
  2. Words appear gradually, one by one
  3. You see progress in real-time
  4. Complete translation builds up

It’s like watching someone type the translation live.

Why Stream?

Technical Reason

AI models generate text sequentially, one token (word/word-part) at a time. Streaming shows this natural process as it happens.

Model Process:

  • Reads input: “Hello, how are you?”
  • Generates token 1: “Hallo”
  • Generates token 2: “,”
  • Generates token 3: “ wie”
  • Generates token 4: “ geht”
  • And so on…

Streaming displays each token immediately as generated.
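The loop below is a minimal Python sketch of that idea, not MAITO's actual code. `generate_tokens` is a hypothetical stand-in for the model and replays only the four tokens listed above; `stream_translation` appends each token to the visible text as soon as it arrives.

```python
from typing import Iterator


def generate_tokens() -> Iterator[str]:
    """Hypothetical stand-in for the model: yields tokens one at a time."""
    # Only the four tokens from the example above; a real model would
    # compute each token from the input and the tokens generated so far.
    yield from ["Hallo", ",", " wie", " geht"]


def stream_translation(tokens: Iterator[str]) -> str:
    """Show the translation growing token by token and return the final text."""
    shown = ""
    for token in tokens:
        shown += token   # append the newest token to what is already visible
        print(shown)     # stand-in for refreshing the translation box
    return shown


result = stream_translation(generate_tokens())
```

Each `print` call shows the text one token longer than before, which is the "typing live" effect described earlier.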

User Experience Benefits

Immediate Feedback

  • See translation start instantly
  • Know something is happening
  • No “frozen” feeling

Progress Visibility

  • Watch translation build up
  • Percentage completion shown
  • Estimated time remaining
  • Tokens/second metric

Early Cancellation

  • See if translation is going wrong
  • Cancel mid-way if needed
  • Don’t waste time on bad output

Engaging Experience

  • More interactive feel
  • Like real-time assistance
  • Less boring than waiting

It's Not Slower

Streaming doesn’t make translation slower; it simply shows the process as it happens. The total time is the same whether you see streaming or not. DeepL doesn’t stream because cloud latency makes it impractical.
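To see why the total time is unchanged, here is a small Python sketch with made-up numbers (500 output tokens at 12.5 tokens/sec). It is a simplified model that assumes a constant generation speed and ignores the time the model needs to read the input; the function names are illustrative only.

```python
def total_time_s(total_tokens: int, tokens_per_sec: float) -> float:
    """Time until the whole translation exists; the same with or without streaming."""
    return total_tokens / tokens_per_sec


def time_to_first_output_s(total_tokens: int, tokens_per_sec: float, streaming: bool) -> float:
    """Time until the user sees anything at all."""
    # With streaming the first token is shown as soon as it exists;
    # without it, nothing is shown until the last token is done.
    tokens_needed = 1 if streaming else total_tokens
    return tokens_needed / tokens_per_sec


TOKENS = 500   # assumed output length
SPEED = 12.5   # assumed tokens per second

print(f"Total time:              {total_time_s(TOKENS, SPEED):.2f}s")
print(f"First output, streaming: {time_to_first_output_s(TOKENS, SPEED, True):.2f}s")
print(f"First output, at once:   {time_to_first_output_s(TOKENS, SPEED, False):.2f}s")
```

Only the wait before the first visible output changes; the finish time stays the same.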

What You See

During streaming translation, MAITO displays:

Progress Percentage

Translation progress: 47%

  • Shows how much is complete
  • Updates continuously

Tokens Per Second

12.5 tokens/sec

  • Translation speed metric
  • Higher = faster
  • Indicates system performance

Estimated Time

~25s remaining

  • Prediction based on progress
  • Updates as translation continues
  • Helps you plan
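The three numbers above can be derived from a token count and a stopwatch. The Python sketch below shows one plausible way to do it; how MAITO actually computes them is not documented here. It assumes an estimate of the total output length (`tokens_expected`) is available, and the sample values are invented to match the figures shown above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamStats:
    tokens_done: int       # tokens generated so far
    tokens_expected: int   # assumption: an estimate of the final output length
    elapsed_s: float       # seconds since translation started

    @property
    def tokens_per_sec(self) -> float:
        return self.tokens_done / self.elapsed_s if self.elapsed_s > 0 else 0.0

    @property
    def percent(self) -> float:
        if self.tokens_expected <= 0:
            return 0.0
        # Cap at 100 in case the output runs longer than estimated.
        return min(100.0, 100.0 * self.tokens_done / self.tokens_expected)

    @property
    def eta_s(self) -> Optional[float]:
        rate = self.tokens_per_sec
        if rate <= 0:
            return None   # no speed measured yet, so no prediction
        remaining = max(0, self.tokens_expected - self.tokens_done)
        return remaining / rate


stats = StreamStats(tokens_done=282, tokens_expected=600, elapsed_s=22.56)
print(f"Translation progress: {stats.percent:.0f}%")
print(f"{stats.tokens_per_sec:.1f} tokens/sec")
if stats.eta_s is not None:
    print(f"~{stats.eta_s:.0f}s remaining")
```

Because the estimate is recalculated from the current speed, it shifts as the translation continues, which matches the behaviour described above.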

Cancel Button

  • Stop translation anytime
  • Useful if output is clearly wrong
  • Frees resources immediately
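Cancellation fits naturally into a streaming loop because there is a checkpoint before every token. The Python sketch below illustrates the pattern with a `threading.Event` as the cancel flag; it is an assumption about the general approach, not MAITO's implementation, and `demo_tokens` simulates a user pressing Cancel partway through.

```python
import threading
from typing import Iterator


def stream_until_cancelled(tokens: Iterator[str], cancel: threading.Event) -> str:
    """Append tokens to the visible text until the cancel flag is set."""
    shown = ""
    for token in tokens:
        if cancel.is_set():   # checked before each token is displayed
            break
        shown += token
        print(shown)
    return shown


cancel = threading.Event()


def demo_tokens() -> Iterator[str]:
    """Hypothetical token source that simulates a Cancel click before token 3."""
    for index, token in enumerate(["Hallo", ",", " wie", " geht"]):
        if index == 2:
            cancel.set()      # stand-in for the user pressing Cancel
        yield token


partial = stream_until_cancelled(demo_tokens(), cancel)
```

The loop stops at the next token boundary, so only the text shown before the cancel is kept.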

Streaming vs All-At-Once

| Aspect | Streaming (Local AI) | All-At-Once (DeepL) |
| --- | --- | --- |
| Feedback | Immediate | Delayed |
| Progress | Visible | Hidden |
| Cancel | Anytime | Before completion only |
| Experience | Interactive | Traditional |
| Speed Metric | Real-time t/s | No metric |
| UX | Modern, engaging | Classic, simple |

Technical Details

Intelligent Chunking

For longer texts, MAITO:

  1. Intelligently splits at paragraph/sentence boundaries
  2. Translates each chunk
  3. Streams each chunk’s translation
  4. Assembles final result seamlessly

You see streaming for each chunk, along with overall progress. Chunking is completely transparent, so there’s no practical text length limit.
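As an illustration of step 1, the Python sketch below splits text at paragraph boundaries first and falls back to sentence boundaries only when a paragraph is too long. The `max_chars` limit and the simple sentence pattern are assumptions made for this example; MAITO's real splitting rules may differ.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 1000) -> List[str]:
    """Split at blank lines, then at sentence ends for oversized paragraphs."""
    chunks: List[str] = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if len(paragraph) <= max_chars:
            chunks.append(paragraph)   # short paragraph: keep it whole
            continue
        current = ""
        # Naive sentence split: whitespace that follows ., ! or ?
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            if current and len(current) + 1 + len(sentence) > max_chars:
                chunks.append(current)
                current = sentence
            else:
                current = f"{current} {sentence}" if current else sentence
        if current:
            chunks.append(current)
    return chunks


sample = "One. Two.\n\nThree is long. Four is long too."
chunks = split_into_chunks(sample, max_chars=20)
print(chunks)
```

A single sentence longer than `max_chars` is kept whole in this sketch rather than being cut mid-sentence.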

Token Definition

Token = Basic unit of text

  • Often a word or word part
  • “Hello” = 1 token
  • “Translation” might be 1-2 tokens
  • Roughly 4 characters = 1 token (a rule of thumb for English text; the exact ratio depends on the language and the model)

Why Tokens Matter:

  • AI models think in tokens
  • Speed measured in tokens/second
  • Character limits converted to token limits
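The 4-characters-per-token rule of thumb is enough for a rough estimate. The Python sketch below applies it in both directions; the function names are illustrative, and a real tokenizer will give different counts for individual words.

```python
import math

CHARS_PER_TOKEN = 4   # rule of thumb from the list above, not an exact figure


def estimate_tokens(text: str) -> int:
    """Rough token count for a piece of text."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def char_limit_to_token_limit(char_limit: int) -> int:
    """Convert a character limit into an approximate token limit."""
    return char_limit // CHARS_PER_TOKEN


print(estimate_tokens("Hello, how are you?"))   # 19 characters
print(char_limit_to_token_limit(4000))
```

Treat the result as an approximation only: the estimate works on averages, so short words and non-English text can deviate noticeably.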

Streaming Protocol

Technically, MAITO streams with IAsyncEnumerable in C#:

  • The translator yields translation chunks asynchronously
  • The UI updates on each chunk
  • The result is a smooth, responsive experience
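MAITO itself is written in C#, so the following is only an analogy: Python's async generators play a similar role to `IAsyncEnumerable`, and `async for` corresponds to C#'s `await foreach`. `translate_stream` is a hypothetical stand-in for the local model, replaying the four example tokens from earlier in this article.

```python
import asyncio
from typing import AsyncIterator


async def translate_stream(text: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for the local model: yields chunks asynchronously."""
    for piece in ["Hallo", ",", " wie", " geht"]:
        # Hand control back to the event loop, as a real model call would
        # while it computes the next chunk.
        await asyncio.sleep(0)
        yield piece


async def run_translation() -> str:
    shown = ""
    async for piece in translate_stream("Hello, how are you?"):
        shown += piece
        print(shown)   # stand-in for the UI update on each chunk
    return shown


final_text = asyncio.run(run_translation())
```

Because the consumer awaits each chunk instead of blocking, the interface stays free to repaint and respond to input between chunks.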

Benefits for Different Users

Casual Users

  • Engaging: Fun to watch translation appear
  • Reassuring: Know it’s working
  • Informative: See if speed is acceptable

Professional Users

  • Efficient: Cancel bad translations early
  • Informative: Performance metrics help diagnose issues
  • Productive: Can start reading while translation continues

Technical Users

  • Diagnostic: Tokens/sec reveals system performance
  • Benchmarking: Easy to compare speeds
  • Transparent: Understand what’s happening

Reading While Streaming

For long texts, you can start reading the beginning while the end is still being translated. This makes even slower systems feel more responsive!

When Streaming Isn’t Used

Streaming only applies to Local AI translation. DeepL doesn’t stream because:

  • Cloud round-trip latency makes streaming impractical
  • DeepL’s API returns complete translations
  • Network buffering would make streaming jerky
