MAITO includes built-in benchmarking tools to measure translation speed and help you choose the best model for your hardware.
What Is Benchmarking?
Benchmarking tests how fast a model translates on your specific hardware by:
- Translating sample text
- Measuring tokens per second (t/s)
- Calculating estimated translation time
- Providing performance rating
Benchmark Duration: 30 seconds per model
Understanding Performance Metrics
Tokens Per Second (t/s)
The primary performance metric:
- Higher is better
- 1 token ≈ 4 characters ≈ 3/4 of a word
- 10 t/s ≈ 7-8 words/second
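As a quick sanity check on those approximations, the conversion from tokens/second to words/second is straightforward. A minimal sketch (illustrative only, using the 3/4-word-per-token figure above):

```python
# Rough conversion based on the approximation "1 token ~ 3/4 of a word".
WORDS_PER_TOKEN = 0.75

def words_per_second(tokens_per_second: float) -> float:
    """Approximate translation speed in words per second."""
    return tokens_per_second * WORDS_PER_TOKEN

print(words_per_second(10.0))  # 7.5, i.e. roughly 7-8 words/second
```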
Performance Ratings
MAITO assigns ratings based on t/s:
| Rating | Tokens/Sec | Translation Speed | User Experience |
|---|---|---|---|
| Excellent | 20+ t/s | Near-instant | Smooth, responsive |
| Good | 10-20 t/s | Very fast | Minor wait, feels quick |
| Fair | 5-10 t/s | Moderate | Noticeable delay, usable |
| Poor | 2-5 t/s | Slow | Significant wait |
| Very Slow | <2 t/s | Very slow | Use for short texts only |
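The rating bands in this table are simple thresholds on the measured t/s value. A minimal sketch of that mapping (illustrative only, not MAITO's internal code):

```python
def performance_rating(tokens_per_second: float) -> str:
    """Map a measured t/s value onto the rating bands from the table above."""
    if tokens_per_second >= 20:
        return "Excellent"
    if tokens_per_second >= 10:
        return "Good"
    if tokens_per_second >= 5:
        return "Fair"
    if tokens_per_second >= 2:
        return "Poor"
    return "Very Slow"

print(performance_rating(12.8))  # Good
```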
What's Good Performance?
10+ t/s is considered good performance for most users. At this speed, a 200-word paragraph translates in about 20-30 seconds, which feels responsive for everyday use.
Running a Benchmark
Via Onboarding
During initial setup with auto-configuration:
- MAITO automatically benchmarks the downloaded model
- Results appear after model download
- Shows a simple conclusion without technical details
- You can proceed to finish setup
Via Settings Page
To benchmark any downloaded model:
- Open Settings → Translation tab
- Select Local On-Device Translation
- Scroll to Installed Models section
- Find the model you want to benchmark
- Click Run Quick Benchmark button
- Wait ~30 seconds for results
- Review performance metrics
Benchmark All Models
If you’ve downloaded multiple models, benchmark each one to find which performs best on your hardware. Results can vary significantly!
Benchmark Process
What Happens During Benchmark
1. Model Initialization (2-3 seconds)
   - Loads the model into memory
   - Initializes the translation service
2. Sample Translation (30 seconds)
   - Translates a standardized test text
   - Measures throughput
   - Counts tokens generated
3. Results Calculation (see the sketch after this list)
   - Computes tokens/second
   - Assigns a performance rating
   - Estimates real-world translation times
4. Display Results
   - Shows the rating with color coding
   - Displays the exact t/s measurement
   - Provides context for the numbers
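Conceptually, the measurement step amounts to timing a fixed sample translation and dividing the number of generated tokens by the elapsed time. A minimal sketch, assuming a hypothetical translate_stream function that yields one token at a time (this is not MAITO's actual API):

```python
import time

def measure_tokens_per_second(translate_stream, sample_text: str) -> float:
    """Time a sample translation and return the measured tokens/second.

    translate_stream is a hypothetical generator yielding one token at a time.
    """
    tokens = 0
    start = time.perf_counter()
    for _token in translate_stream(sample_text):
        tokens += 1
    elapsed = time.perf_counter() - start
    return tokens / elapsed if elapsed > 0 else 0.0
```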
Canceling Benchmark
You can cancel anytime:
- Click Cancel button
- Benchmark stops immediately
- No results recorded
- Model remains available for use
Interpreting Results
Example Results
Excellent Performance:
- Rating: Excellent (23.5 tokens/sec)
- Estimated: 200 words in ~15 seconds
Good Performance:
- Rating: Good (12.8 tokens/sec)
- Estimated: 200 words in ~25 seconds
Fair Performance:
- Rating: Fair (6.3 tokens/sec)
- Estimated: 200 words in ~50 seconds
Poor Performance:
- Rating: Poor (3.1 tokens/sec)
- Estimated: 200 words in ~2 minutes
Color Coding
The conclusion is color-coded for quick assessment:
- Green: Excellent/Good (>10 t/s)
- Orange: Fair (5-10 t/s)
- Red: Poor/Very Slow (<5 t/s)
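Like the rating, the color is just a threshold check on the same t/s value (illustrative sketch):

```python
def conclusion_color(tokens_per_second: float) -> str:
    """Color of the benchmark conclusion, per the thresholds above."""
    if tokens_per_second > 10:
        return "green"   # Excellent / Good
    if tokens_per_second >= 5:
        return "orange"  # Fair
    return "red"         # Poor / Very Slow
```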

Comparing Models
Side-by-Side Comparison
To compare multiple models:
- Download both models you want to compare
- Benchmark the first model → note results
- Benchmark the second model → note results
- Compare t/s measurements
Example Comparison:
- Rosetta 4B CPU: 8.5 t/s (Good)
- Rosetta 4B DML: 15.2 t/s (Good)
- Winner: DML version is 79% faster on this GPU system
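The percentage quoted above is simply the ratio of the two measurements. A quick check of the arithmetic:

```python
cpu_tps = 8.5   # Rosetta 4B CPU result from the example above
dml_tps = 15.2  # Rosetta 4B DML result

speedup_percent = (dml_tps / cpu_tps - 1) * 100
print(f"DML is {speedup_percent:.0f}% faster")  # DML is 79% faster
```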
Factors Affecting Performance
Different models perform differently due to:
- Model Optimization: CPU vs GPU optimized
- Model Size: Larger models usually slower
- Quantization: INT4 models faster than FP16
- Hardware Match: Right model for your hardware
Hardware Matters
The “best” model depends on YOUR hardware. A GPU-optimized model may be slower than a CPU model if you don’t have a compatible GPU!
Real-World Performance
Benchmark vs Real Usage
Benchmark results are a good indicator of real-world performance (see the sketch after the list below for the underlying arithmetic):
If Benchmark Shows 10 t/s:
- 100-word text: ~10-15 seconds
- 200-word text: ~20-30 seconds
- 500-word text: ~50-75 seconds
- 1000-word text: ~2-2.5 minutes
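These figures follow from the token-to-word ratio mentioned earlier (roughly 4/3 tokens per word); the ranges above are a little wider because real translations also include a few seconds of model start-up. A rough sketch of the arithmetic:

```python
TOKENS_PER_WORD = 4 / 3  # rough inverse of "1 token ~ 3/4 of a word"

def estimate_seconds(word_count: int, tokens_per_second: float) -> float:
    """Back-of-the-envelope translation time for a text of word_count words."""
    return word_count * TOKENS_PER_WORD / tokens_per_second

for words in (100, 200, 500, 1000):
    print(words, round(estimate_seconds(words, 10.0)))  # ~13, 27, 67, 133 seconds
```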
Variables That Affect Real Speed:
- Text complexity (technical terms slower)
- Language pair (some pairs faster than others)
- System load (other apps running)
- Memory pressure (low RAM slows down)
Streaming Translation
During actual translation, you’ll see:
- Live tokens/second counter
- Progress percentage
- Estimated time remaining
- Text appearing gradually
Performance should match benchmark results within ~10-20%.
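Those live readouts can be derived from the same quantities the benchmark uses: tokens generated so far, elapsed time, and an expected total token count. A hypothetical sketch of how such a progress readout could be computed (not MAITO's actual implementation):

```python
import time

def progress_readout(tokens_done: int, tokens_expected: int, start_time: float) -> dict:
    """Live tokens/second, progress percentage, and estimated time remaining."""
    elapsed = time.perf_counter() - start_time
    tps = tokens_done / elapsed if elapsed > 0 else 0.0
    percent = 100.0 * tokens_done / tokens_expected
    eta = (tokens_expected - tokens_done) / tps if tps > 0 else float("inf")
    return {"tokens_per_second": tps, "percent": percent, "eta_seconds": eta}
```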
Optimizing Performance
If Performance Is Poor (<5 t/s)
1. Try a CPU-Optimized Model:
   - Download Rosetta 4B CPU
   - Often faster on older systems
2. Close Background Apps:
   - Free up CPU and RAM
   - Close your browser and any unused programs
3. Check System Load:
   - Open Task Manager
   - Verify MAITO has resources available
4. Restart Your Computer:
   - Clears memory leaks
   - Frees system resources
5. Consider a Hardware Upgrade:
   - Add more RAM (8GB minimum recommended)
   - Upgrade the CPU for better performance
6. Use DeepL Instead:
   - If local AI is too slow
   - DeepL works on any hardware
Performance Expectations
On a typical mid-range laptop (i5/i7 CPU, 8GB RAM), expect 8-12 t/s with Rosetta CPU models. High-end desktops with GPUs can achieve 20+ t/s with DML models.
If Performance Is Good (10+ t/s)
Congratulations! Your system runs local translation well:
- Local AI is viable for everyday use
- Translation feels responsive
- You can translate longer texts comfortably
- Consider downloading additional models to compare
Troubleshooting Benchmark
Benchmark Stuck or Frozen
Solutions:
- Wait 60 seconds (model initialization can take time)
- Cancel and retry
- Restart MAITO
- Check Task Manager for hung processes
Benchmark Shows 0.0 t/s
Cause: Benchmark failed or was canceled.
Solutions:
- Ensure sufficient RAM available (close other apps)
- Verify model downloaded completely
- Try different model
- Check the error logs in %APPDATA%\Pangaia Software\MAITO\logs
Benchmark Much Slower Than Expected
Possible Causes:
- Other applications consuming resources
- System thermal throttling (laptop overheating)
- Wrong model for your hardware (GPU model on CPU-only system)
- Insufficient RAM (system swapping to disk)
Solutions:
- Close all other applications
- Ensure laptop is plugged in (not on battery)
- Let system cool down if hot
- Try CPU-optimized model instead
- Add more RAM if constantly swapping
When to Benchmark
Recommended Benchmark Times
Always Benchmark:
- After downloading a new model
- After system hardware upgrade
- After Windows updates
- When performance feels slower than before
Optional Benchmark:
- Periodically to track performance over time
- When comparing models
- When troubleshooting speed issues
No Need to Benchmark:
- Repeatedly for the same model/hardware
- Before every translation
- Daily or weekly (results don’t change unless hardware changes)
What’s Next?
After benchmarking:
- Download More Models - Try alternatives
- Why Is Local Slow - Understand performance factors
- Performance Optimization - Advanced tips
- Model Selection Guide - Detailed comparisons
Need Help?
For benchmark issues:
- Check Troubleshooting Guide
- Review Device Assessment
- Contact support: contact@pangaia.software