📊 Summary Statistics
[Summary cards: Total Models, Tool Support, Response Handling, Reliability, Fastest Model, Average Response Time]
[Chart: Model Performance History (Last 30 Days), Performance Trends]
🔍 Models with Tool Support
| Model | Tool Support | Response Handling | Call Time | Response Time | Total Time | Details |
|---|---|---|---|---|---|---|
| ibm/granite-3-8b-instruct | ✅ Reliable (50/50) | ⚠️ Partial (48/50) | 0.95s | 1.14s | 2.08s | Consistent success across all iterations |
| ibm/granite-4-h-small | ✅ Reliable (50/50) | ⚠️ Partial (49/50) | 0.47s | 0.37s | 0.85s | Consistent success across all iterations |
| meta-llama/llama-3-2-11b-vision-instruct | ✅ Reliable (50/50) | ✅ Correct (50/50) | 0.48s | 0.30s | 0.78s | Consistent success across all iterations |
| meta-llama/llama-3-2-90b-vision-instruct | ✅ Reliable (50/50) | ⚠️ Partial (40/50) | 1.31s | 0.70s | 2.01s | Inconsistent results across 50 iterations |
| meta-llama/llama-3-3-70b-instruct | ✅ Reliable (50/50) | ✅ Correct (50/50) | 1.17s | 1.18s | 2.35s | Consistent success across all iterations |
| meta-llama/llama-4-maverick-17b-128e-instruct-fp8 | ✅ Reliable (50/50) | ✅ Correct (50/50) | 0.44s | 0.36s | 0.81s | Consistent success across all iterations |
| mistral-large-2512 | ✅ Reliable (50/50) | ❌ Never Handles (0/50) | 0.32s | 0.10s | 0.42s | Inconsistent results across 50 iterations |
| mistralai/mistral-medium-2505 | ⚠️ Unreliable (49/50) | ⚠️ Partial (48/49) | 0.55s | 0.33s | 0.87s | Consistent success across all iterations |
| mistralai/mistral-small-3-1-24b-instruct-2503 | ✅ Reliable (50/50) | ⚠️ Partial (45/50) | 0.49s | 0.31s | 0.79s | Consistent success across all iterations |
| openai/gpt-oss-120b | ✅ Reliable (50/50) | ✅ Correct (50/50) | 0.52s | 0.50s | 1.02s | Consistent success across all iterations |
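The status labels in the table appear to follow directly from the success counts. The sketch below shows one way such labels could be derived. The function names and the exact thresholds are assumptions inferred from the rows above (50/50 reads as Reliable or Correct, 0/50 as Never Handles, anything in between as Unreliable or Partial); the actual benchmark may apply different rules.

```python
def _check(successes: int, attempts: int) -> None:
    # Guard against counts that cannot occur in a real run.
    if attempts <= 0 or not 0 <= successes <= attempts:
        raise ValueError("expected 0 <= successes <= attempts and attempts > 0")


def tool_support_label(successes: int, attempts: int) -> str:
    """Hypothetical rule: every tool call must succeed to count as Reliable."""
    _check(successes, attempts)
    status = "Reliable" if successes == attempts else "Unreliable"
    return f"{status} ({successes}/{attempts})"


def response_handling_label(successes: int, attempts: int) -> str:
    """Hypothetical rule: all correct, none correct, or somewhere in between."""
    _check(successes, attempts)
    if successes == attempts:
        status = "Correct"
    elif successes == 0:
        status = "Never Handles"
    else:
        status = "Partial"
    return f"{status} ({successes}/{attempts})"


if __name__ == "__main__":
    # Counts taken from the mistralai/mistral-medium-2505 row.
    print(tool_support_label(49, 50))
    print(response_handling_label(48, 49))
```

Under these assumed rules a single failed iteration is enough to lose the top label, which matches how the 49/50 and 48/50 rows are reported above.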
📋 Latest Test Results
Results from the most recent test execution (2026-04-04)
Test Details
Consistent success across all iterations: 8 entries
Inconsistent results across 50 iterations: 2 entries
Failed probe iterations (0/5 successes): 16 entries
❌ Models Without Tool Support (16 models)
These models do not support tool calling and are listed here for reference.