Benchmark Model Performance

Learn how to test and compare translation model performance in MAITO to find the fastest option for your system


MAITO includes built-in benchmarking tools to measure translation speed and help you choose the best model for your hardware.

What Is Benchmarking?

Benchmarking tests how fast a model translates on your specific hardware by:

  • Translating sample text
  • Measuring tokens per second (t/s)
  • Calculating estimated translation time
  • Providing a performance rating

Benchmark Duration: 30 seconds per model

Understanding Performance Metrics

Tokens Per Second (t/s)

The primary performance metric:

  • Higher is better
  • 1 token ≈ 4 characters ≈ 3/4 of a word
  • 10 t/s ≈ 7-8 words/second (see the worked example below)
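
As a quick illustration of these rules of thumb (they are approximations only; exact token counts depend on each model's tokenizer and on the language):

```python
# Rule-of-thumb conversions from the list above (approximations only;
# exact token counts depend on the model's tokenizer and the language).
CHARS_PER_TOKEN = 4        # 1 token ≈ 4 characters
WORDS_PER_TOKEN = 0.75     # 1 token ≈ 3/4 of a word

def approx_tokens(text: str) -> float:
    """Very rough token estimate from the character count."""
    return len(text) / CHARS_PER_TOKEN

def words_per_second(tokens_per_second: float) -> float:
    """Convert a t/s measurement into approximate words per second."""
    return tokens_per_second * WORDS_PER_TOKEN

print(words_per_second(10))  # 7.5 -> roughly 7-8 words per second
```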

Performance Ratings

MAITO assigns ratings based on t/s:

Rating | Tokens/Sec | Translation Speed | User Experience
Excellent | 20+ t/s | Near-instant | Smooth, responsive
Good | 10-20 t/s | Very fast | Minor wait, feels quick
Fair | 5-10 t/s | Moderate | Noticeable delay, usable
Poor | 2-5 t/s | Slow | Significant wait
Very Slow | <2 t/s | Very slow | Use for short texts only

What's Good Performance?

10+ t/s is considered good performance for most users. At this speed, a 200-word paragraph translates in about 20-30 seconds, which feels responsive for everyday use.
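
In other words, the rating is simply a threshold lookup on the measured t/s. The sketch below mirrors the thresholds in the table above; it is an illustration, not MAITO's actual code:

```python
def performance_rating(tokens_per_second: float) -> str:
    """Map a measured t/s value to the rating bands in the table above
    (a sketch of the published thresholds, not MAITO's actual code)."""
    if tokens_per_second >= 20:
        return "Excellent"
    if tokens_per_second >= 10:
        return "Good"
    if tokens_per_second >= 5:
        return "Fair"
    if tokens_per_second >= 2:
        return "Poor"
    return "Very Slow"

print(performance_rating(12.8))  # "Good"
```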

Running a Benchmark

Via Onboarding

During initial setup with auto-configuration:

  1. MAITO automatically benchmarks the downloaded model
  2. Results appear after model download
  3. Shows the conclusion only, without technical details
  4. You can proceed to finish setup

Via Settings Page

To benchmark any downloaded model:

  1. Open Settings → Translation tab
  2. Select Local On-Device Translation
  3. Scroll to Installed Models section
  4. Find the model you want to benchmark
  5. Click Run Quick Benchmark button
  6. Wait ~30 seconds for results
  7. Review performance metrics

Benchmark All Models

If you’ve downloaded multiple models, benchmark each one to find which performs best on your hardware. Results can vary significantly!

Benchmark Process

What Happens During Benchmark

  1. Model Initialization (2-3 seconds)

    • Loads model into memory
    • Initializes translation service
  2. Sample Translation (30 seconds)

    • Translates standardized test text
    • Measures throughput
    • Counts tokens generated
  3. Results Calculation

    • Computes tokens/second (sketched in the example after this list)
    • Assigns performance rating
    • Estimates real-world translation times
  4. Display Results

    • Shows rating with color coding
    • Displays exact t/s measurement
    • Provides context for the numbers
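
Conceptually, the calculation in step 3 is just tokens generated divided by elapsed time. The sketch below illustrates the idea with a hypothetical translate_stream generator standing in for the model; it is not MAITO's implementation:

```python
import time

def quick_benchmark(translate_stream, duration_s: float = 30.0) -> float:
    """Measure throughput: count tokens produced over ~30 seconds and divide
    by the elapsed time. `translate_stream` is a hypothetical generator that
    yields one token per iteration; this is a sketch, not MAITO's code."""
    tokens = 0
    start = time.monotonic()
    for _ in translate_stream:
        tokens += 1
        if time.monotonic() - start >= duration_s:
            break
    elapsed = time.monotonic() - start
    return tokens / elapsed if elapsed > 0 else 0.0
```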

Canceling Benchmark

You can cancel anytime:

  • Click Cancel button
  • Benchmark stops immediately
  • No results recorded
  • Model remains available for use

Interpreting Results

Example Results

Excellent Performance:

Rating: Excellent (23.5 tokens/sec)
Estimated: 200 words in ~15 seconds

Good Performance:

Rating: Good (12.8 tokens/sec)
Estimated: 200 words in ~25 seconds

Fair Performance:

Rating: Fair (6.3 tokens/sec)
Estimated: 200 words in ~50 seconds

Poor Performance:

Rating: Poor (3.1 tokens/sec)
Estimated: 200 words in ~2 minutes

Color Coding

The conclusion is color-coded for quick assessment:

  • Green: Excellent/Good (>10 t/s)
  • Orange: Fair (5-10 t/s)
  • Red: Poor/Very Slow (<5 t/s)

(Screenshot: benchmark results)

Comparing Models

Side-by-Side Comparison

To compare multiple models:

  1. Download the models you want to compare
  2. Benchmark the first model → note results
  3. Benchmark the second model → note results
  4. Compare t/s measurements

Example Comparison:

  • Rosetta 4B CPU: 8.5 t/s (Good)
  • Rosetta 4B DML: 15.2 t/s (Good)
  • Winner: DML version is 79% faster on this GPU system (calculation shown below)
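
The "79% faster" figure is a simple ratio of the two measurements:

```python
def percent_faster(baseline_tps: float, candidate_tps: float) -> float:
    """How much faster the candidate model is, relative to the baseline."""
    return (candidate_tps / baseline_tps - 1) * 100

print(round(percent_faster(8.5, 15.2)))  # 79 -> "79% faster"
```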

Factors Affecting Performance

Different models perform differently due to:

  • Model Optimization: CPU vs GPU optimized
  • Model Size: Larger models usually slower
  • Quantization: INT4 models faster than FP16
  • Hardware Match: Right model for your hardware

Hardware Matters

The “best” model depends on YOUR hardware. A GPU-optimized model may be slower than a CPU model if you don’t have a compatible GPU!

Real-World Performance

Benchmark vs Real Usage

Benchmark results give a good indication of real-world performance:

If Benchmark Shows 10 t/s (estimator sketch after this list):

  • 100-word text: ~10-15 seconds
  • 200-word text: ~20-30 seconds
  • 500-word text: ~50-75 seconds
  • 1000-word text: ~2-2.5 minutes
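
These estimates follow from the token-to-word rule of thumb: convert the word count to tokens, then divide by the measured t/s. A rough estimator, for illustration only:

```python
def estimated_seconds(word_count: int, tokens_per_second: float) -> float:
    """Rough translation-time estimate using the ~4/3 tokens-per-word rule."""
    tokens = word_count * 4 / 3
    return tokens / tokens_per_second

for words in (100, 200, 500, 1000):
    print(words, "words ->", round(estimated_seconds(words, 10)), "seconds")
# 100 -> 13 s, 200 -> 27 s, 500 -> 67 s, 1000 -> 133 s (about 2.2 minutes)
```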

Variables That Affect Real Speed:

  • Text complexity (technical terms slower)
  • Language pair (some pairs faster than others)
  • System load (other apps running)
  • Memory pressure (low RAM slows down)

Streaming Translation

During actual translation, you’ll see:

  • Live tokens/second counter
  • Progress percentage
  • Estimated time remaining
  • Text appearing gradually

Performance should match benchmark results within ~10-20%.
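
The live counter and time-remaining estimate rest on the same arithmetic: tokens generated so far divided by elapsed time, extrapolated to the expected total. A conceptual sketch (not MAITO's actual implementation; the expected token count would itself be an estimate derived from the source text):

```python
def progress_and_eta(tokens_done: int, tokens_expected: int, elapsed_s: float):
    """Derive live t/s, progress %, and time remaining from token counts.
    Conceptual sketch only; `tokens_expected` would itself be estimated
    from the length of the source text."""
    tps = tokens_done / elapsed_s if elapsed_s > 0 else 0.0
    progress = 100 * tokens_done / tokens_expected
    remaining_s = (tokens_expected - tokens_done) / tps if tps > 0 else float("inf")
    return tps, progress, remaining_s

print(progress_and_eta(150, 300, 15.0))  # (10.0, 50.0, 15.0): 10 t/s, 50% done, ~15 s left
```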

Optimizing Performance

If Performance Is Poor (<5 t/s)

  1. Try CPU-Optimized Model:

    • Download Rosetta 4B CPU
    • Often faster on older systems
  2. Close Background Apps:

    • Free up CPU and RAM
    • Close your browser and any unused programs
  3. Check System Load:

    • Open Task Manager
    • Verify MAITO has resources available
  4. Restart Computer:

    • Clears memory leaks
    • Frees system resources
  5. Consider Hardware Upgrade:

    • Add more RAM (8GB minimum recommended)
    • Upgrade CPU for better performance
  6. Use DeepL Instead:

    • If local AI is too slow
    • DeepL works on any hardware

Performance Expectations

On a typical mid-range laptop (i5/i7 CPU, 8GB RAM), expect 8-12 t/s with Rosetta CPU models. High-end desktops with GPUs can achieve 20+ t/s with DML models.

If Performance Is Good (10+ t/s)

Congratulations! Your system runs local translation well:

  • Local AI is viable for everyday use
  • Translation feels responsive
  • You can translate longer texts comfortably
  • Consider downloading additional models to compare

Troubleshooting Benchmark

Benchmark Stuck or Frozen

Solutions:

  1. Wait 60 seconds (model initialization can take time)
  2. Cancel and retry
  3. Restart MAITO
  4. Check Task Manager for hung processes

Benchmark Shows 0.0 t/s

Cause: Benchmark failed or was canceled.

Solutions:

  1. Ensure sufficient RAM available (close other apps)
  2. Verify model downloaded completely
  3. Try different model
  4. Check error logs: %APPDATA%\Pangaia Software\MAITO\logs

Benchmark Much Slower Than Expected

Possible Causes:

  • Other applications consuming resources
  • System thermal throttling (laptop overheating)
  • Wrong model for your hardware (GPU model on CPU-only system)
  • Insufficient RAM (system swapping to disk)

Solutions:

  1. Close all other applications
  2. Ensure laptop is plugged in (not on battery)
  3. Let system cool down if hot
  4. Try CPU-optimized model instead
  5. Add more RAM if constantly swapping

When to Benchmark

Always Benchmark:

  • After downloading a new model
  • After system hardware upgrade
  • After Windows updates
  • When performance feels slower than before

Optional Benchmark:

  • Periodically to track performance over time
  • When comparing models
  • When troubleshooting speed issues

No Need to Benchmark:

  • Repeatedly for the same model/hardware
  • Before every translation
  • Daily or weekly (results don’t change unless hardware changes)

What’s Next?

After benchmarking:

Need Help?

For benchmark issues:
