Benchmark Model Performance

Learn how to test and compare translation model performance in MAITO to find the fastest option for your system


MAITO includes built-in benchmarking tools to measure translation speed and help you choose the best model for your hardware.

What Is Benchmarking?

Benchmarking tests how fast a model translates on your specific hardware by:

  • Translating sample text
  • Measuring tokens per second (t/s)
  • Calculating estimated translation time
  • Providing a performance rating

Benchmark Duration: 30 seconds per model

Understanding Performance Metrics

Tokens Per Second (t/s)

The primary performance metric:

  • Higher is better
  • 1 token ≈ 4 characters ≈ 3/4 of a word
  • 10 t/s ≈ 7-8 words/second (see the worked example below)
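
As a quick illustration of these rules of thumb (they are approximations only; exact token counts depend on each model's tokenizer and on the language):

```python
# Rule-of-thumb conversions from the list above (approximations only;
# exact token counts depend on the model's tokenizer and the language).
CHARS_PER_TOKEN = 4        # 1 token ≈ 4 characters
WORDS_PER_TOKEN = 0.75     # 1 token ≈ 3/4 of a word

def approx_tokens(text: str) -> float:
    """Very rough token estimate from the character count."""
    return len(text) / CHARS_PER_TOKEN

def words_per_second(tokens_per_second: float) -> float:
    """Convert a t/s measurement into approximate words per second."""
    return tokens_per_second * WORDS_PER_TOKEN

print(words_per_second(10))  # 7.5 -> roughly 7-8 words per second
```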

Performance Ratings

MAITO assigns ratings based on t/s:

Rating | Tokens/Sec | Translation Speed | User Experience
Excellent | 20+ t/s | Near-instant | Smooth, responsive
Good | 10-20 t/s | Very fast | Minor wait, feels quick
Fair | 5-10 t/s | Moderate | Noticeable delay, usable
Poor | 2-5 t/s | Slow | Significant wait
Very Slow | <2 t/s | Very slow | Use for short texts only

What's Good Performance?

10+ t/s is considered good performance for most users. At this speed, a 200-word paragraph translates in about 20-30 seconds, which feels responsive for everyday use.
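
In other words, the rating is simply a threshold lookup on the measured t/s. The sketch below mirrors the thresholds in the table above; it is an illustration, not MAITO's actual code:

```python
def performance_rating(tokens_per_second: float) -> str:
    """Map a measured t/s value to the rating bands in the table above
    (a sketch of the published thresholds, not MAITO's actual code)."""
    if tokens_per_second >= 20:
        return "Excellent"
    if tokens_per_second >= 10:
        return "Good"
    if tokens_per_second >= 5:
        return "Fair"
    if tokens_per_second >= 2:
        return "Poor"
    return "Very Slow"

print(performance_rating(12.8))  # "Good"
```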

Running a Benchmark

Via Onboarding

During initial setup with auto-configuration:

  1. MAITO automatically benchmarks the downloaded model
  2. Results appear after model download
  3. Shows the conclusion only, without technical details
  4. You can proceed to finish setup

Via Settings Page

To benchmark any downloaded model:

  1. Open Settings → Translation tab
  2. Select Local On-Device Translation
  3. Scroll to Installed Models section
  4. Find the model you want to benchmark
  5. Click Run Quick Benchmark button
  6. Wait ~30 seconds for results
  7. Review performance metrics

Benchmark All Models

If you’ve downloaded multiple models, benchmark each one to find which performs best on your hardware. Results can vary significantly!

Benchmark Process

What Happens During Benchmark

  1. Model Initialization (2-3 seconds)

    • Loads model into memory
    • Initializes translation service
  2. Sample Translation (30 seconds)

    • Translates standardized test text
    • Measures throughput
    • Counts tokens generated
  3. Results Calculation

    • Computes tokens/second (sketched in the example after this list)
    • Assigns performance rating
    • Estimates real-world translation times
  4. Display Results

    • Shows rating with color coding
    • Displays exact t/s measurement
    • Provides context for the numbers
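
Conceptually, the calculation in step 3 is just tokens generated divided by elapsed time. The sketch below illustrates the idea with a hypothetical translate_stream generator standing in for the model; it is not MAITO's implementation:

```python
import time

def quick_benchmark(translate_stream, duration_s: float = 30.0) -> float:
    """Measure throughput: count tokens produced over ~30 seconds and divide
    by the elapsed time. `translate_stream` is a hypothetical generator that
    yields one token per iteration; this is a sketch, not MAITO's code."""
    tokens = 0
    start = time.monotonic()
    for _ in translate_stream:
        tokens += 1
        if time.monotonic() - start >= duration_s:
            break
    elapsed = time.monotonic() - start
    return tokens / elapsed if elapsed > 0 else 0.0
```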

Canceling Benchmark

You can cancel anytime:

  • Click Cancel button
  • Benchmark stops immediately
  • No results recorded
  • Model remains available for use

Interpreting Results

Example Results

Excellent Performance:

Rating: Excellent (23.5 tokens/sec)
Estimated: 200 words in ~15 seconds

Good Performance:

Rating: Good (12.8 tokens/sec)
Estimated: 200 words in ~25 seconds

Fair Performance:

Rating: Fair (6.3 tokens/sec)
Estimated: 200 words in ~50 seconds

Poor Performance:

Rating: Poor (3.1 tokens/sec)
Estimated: 200 words in ~2 minutes

Color Coding

The conclusion is color-coded for quick assessment:

  • Green: Excellent/Good (>10 t/s)
  • Orange: Fair (5-10 t/s)
  • Red: Poor/Very Slow (<5 t/s)

(Screenshot: benchmark results)

Comparing Models

Side-by-Side Comparison

To compare multiple models:

  1. Download the models you want to compare
  2. Benchmark the first model → note results
  3. Benchmark the second model → note results
  4. Compare t/s measurements

Example Comparison:

  • Rosetta 4B CPU: 8.5 t/s (Good)
  • Rosetta 4B DML: 15.2 t/s (Good)
  • Winner: DML version is 79% faster on this GPU system (calculation shown below)
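
The "79% faster" figure is a simple ratio of the two measurements:

```python
def percent_faster(baseline_tps: float, candidate_tps: float) -> float:
    """How much faster the candidate model is, relative to the baseline."""
    return (candidate_tps / baseline_tps - 1) * 100

print(round(percent_faster(8.5, 15.2)))  # 79 -> "79% faster"
```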

Factors Affecting Performance

Different models perform differently due to:

  • Model Optimization: CPU vs GPU optimized
  • Model Size: Larger models usually slower
  • Quantization: INT4 models faster than FP16
  • Hardware Match: Right model for your hardware

Hardware Matters

The “best” model depends on YOUR hardware. A GPU-optimized model may be slower than a CPU model if you don’t have a compatible GPU!

Real-World Performance

Benchmark vs Real Usage

Benchmark results give a good indication of real-world performance:

If Benchmark Shows 10 t/s (estimator sketch after this list):

  • 100-word text: ~10-15 seconds
  • 200-word text: ~20-30 seconds
  • 500-word text: ~50-75 seconds
  • 1000-word text: ~2-2.5 minutes
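
These estimates follow from the token-to-word rule of thumb: convert the word count to tokens, then divide by the measured t/s. A rough estimator, for illustration only:

```python
def estimated_seconds(word_count: int, tokens_per_second: float) -> float:
    """Rough translation-time estimate using the ~4/3 tokens-per-word rule."""
    tokens = word_count * 4 / 3
    return tokens / tokens_per_second

for words in (100, 200, 500, 1000):
    print(words, "words ->", round(estimated_seconds(words, 10)), "seconds")
# 100 -> 13 s, 200 -> 27 s, 500 -> 67 s, 1000 -> 133 s (about 2.2 minutes)
```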

Variables That Affect Real Speed:

  • Text complexity (technical terms slower)
  • Language pair (some pairs faster than others)
  • System load (other apps running)
  • Memory pressure (low RAM slows down)

Streaming Translation

During actual translation, you’ll see:

  • Live tokens/second counter
  • Progress percentage
  • Estimated time remaining
  • Text appearing gradually

Performance should match benchmark results within ~10-20%.
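
The live counter and time-remaining estimate rest on the same arithmetic: tokens generated so far divided by elapsed time, extrapolated to the expected total. A conceptual sketch (not MAITO's actual implementation; the expected token count would itself be an estimate derived from the source text):

```python
def progress_and_eta(tokens_done: int, tokens_expected: int, elapsed_s: float):
    """Derive live t/s, progress %, and time remaining from token counts.
    Conceptual sketch only; `tokens_expected` would itself be estimated
    from the length of the source text."""
    tps = tokens_done / elapsed_s if elapsed_s > 0 else 0.0
    progress = 100 * tokens_done / tokens_expected
    remaining_s = (tokens_expected - tokens_done) / tps if tps > 0 else float("inf")
    return tps, progress, remaining_s

print(progress_and_eta(150, 300, 15.0))  # (10.0, 50.0, 15.0): 10 t/s, 50% done, ~15 s left
```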

Optimizing Performance

If Performance Is Poor (<5 t/s)

  1. Try CPU-Optimized Model:

    • Download Rosetta 4B CPU
    • Often faster on older systems
  2. Close Background Apps:

    • Free up CPU and RAM
    • Close your browser and any unused programs
  3. Check System Load:

    • Open Task Manager
    • Verify MAITO has resources available
  4. Restart Computer:

    • Clears memory leaks
    • Frees system resources
  5. Consider Hardware Upgrade:

    • Add more RAM (8GB minimum recommended)
    • Upgrade CPU for better performance
  6. Use DeepL Instead:

    • If local AI is too slow
    • DeepL works on any hardware

Performance Expectations

On a typical mid-range laptop (i5/i7 CPU, 8GB RAM), expect 8-12 t/s with Rosetta CPU models. High-end desktops with GPUs can achieve 20+ t/s with DML models.

If Performance Is Good (10+ t/s)

Congratulations! Your system runs local translation well:

  • Local AI is viable for everyday use
  • Translation feels responsive
  • You can translate longer texts comfortably
  • Consider downloading additional models to compare

Troubleshooting Benchmark

Benchmark Stuck or Frozen

Solutions:

  1. Wait 60 seconds (model initialization can take time)
  2. Cancel and retry
  3. Restart MAITO
  4. Check Task Manager for hung processes

Benchmark Shows 0.0 t/s

Cause: Benchmark failed or was canceled.

Solutions:

  1. Ensure sufficient RAM available (close other apps)
  2. Verify model downloaded completely
  3. Try different model
  4. Check error logs: %APPDATA%\Pangaia Software\MAITO\logs

Benchmark Much Slower Than Expected

Possible Causes:

  • Other applications consuming resources
  • System thermal throttling (laptop overheating)
  • Wrong model for your hardware (GPU model on CPU-only system)
  • Insufficient RAM (system swapping to disk)

Solutions:

  1. Close all other applications
  2. Ensure laptop is plugged in (not on battery)
  3. Let system cool down if hot
  4. Try CPU-optimized model instead
  5. Add more RAM if constantly swapping

When to Benchmark

Always Benchmark:

  • After downloading a new model
  • After system hardware upgrade
  • After Windows updates
  • When performance feels slower than before

Optional Benchmark:

  • Periodically to track performance over time
  • When comparing models
  • When troubleshooting speed issues

No Need to Benchmark:

  • Repeatedly for the same model/hardware
  • Before every translation
  • Daily or weekly (results don’t change unless hardware changes)

What’s Next?

After benchmarking:

Need Help?

For benchmark issues:
