📊 Summary Statistics
Total Models
Tool Support
Response Handling
Reliability
Fastest Model
Average Response Time
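The summary cards above carry no values in this copy. As a rough sketch only, the figures could be recomputed from the rows of the tool-support table. The definitions here are assumptions, not the dashboard's documented method: "fastest" is taken as the lowest call time plus response time, and the average is taken over the Response Time column. The names `ROWS` and `summarize` are hypothetical.

```python
from statistics import mean

# (model, call_time_s, response_time_s), copied from the tool-support table.
ROWS = [
    ("ibm/granite-3-8b-instruct", 35.94, 34.78),
    ("ibm/granite-4-h-small", 0.49, 0.36),
    ("meta-llama/llama-3-2-11b-vision-instruct", 0.57, 0.32),
    ("meta-llama/llama-3-2-90b-vision-instruct", 1.20, 0.58),
    ("meta-llama/llama-3-3-70b-instruct", 1.21, 1.06),
    ("meta-llama/llama-4-maverick-17b-128e-instruct-fp8", 0.46, 0.35),
    ("mistral-large-2512", 0.32, 0.08),
    ("mistralai/mistral-medium-2505", 0.58, 0.36),
    ("mistralai/mistral-small-3-1-24b-instruct-2503", 0.48, 0.31),
    ("openai/gpt-oss-120b", 0.68, 0.75),
]


def summarize(rows):
    """Derive summary figures from table rows (assumed definitions, see lead-in)."""
    # Assumption: the fastest model is the one with the lowest call + response time.
    fastest = min(rows, key=lambda r: r[1] + r[2])
    return {
        "models_with_tool_support": len(rows),
        "fastest_model": fastest[0],
        # Assumption: the average is taken over the Response Time column only.
        "average_response_time_s": mean(r[2] for r in rows),
    }


summary = summarize(ROWS)
```

Note that the single slow row (`ibm/granite-3-8b-instruct`) dominates a plain mean, so a median may describe typical latency better if the dashboard's own definition allows it.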
Model Performance History (Last 30 Days)
Performance Trends
🔍 Models with Tool Support
| Model | Tool Support | Response Handling | Call Time | Response Time | Total Time | Details |
|---|---|---|---|---|---|---|
| ibm/granite-3-8b-instruct | ✅ Reliable (50/50) | ⚠️ Partial (48/50) | 35.94s | 34.78s | 70.73s | Consistent success across all iterations |
| ibm/granite-4-h-small | ✅ Reliable (50/50) | ⚠️ Partial (47/50) | 0.49s | 0.36s | 0.85s | Consistent success across all iterations |
| meta-llama/llama-3-2-11b-vision-instruct | ⚠️ Unreliable (45/50) | ✅ Correct (45/45) | 0.57s | 0.32s | 0.89s | Consistent success across all iterations |
| meta-llama/llama-3-2-90b-vision-instruct | ✅ Reliable (50/50) | ⚠️ Partial (47/50) | 1.20s | 0.58s | 1.77s | Consistent success across all iterations |
| meta-llama/llama-3-3-70b-instruct | ✅ Reliable (50/50) | ✅ Correct (50/50) | 1.21s | 1.06s | 2.27s | Consistent success across all iterations |
| meta-llama/llama-4-maverick-17b-128e-instruct-fp8 | ✅ Reliable (50/50) | ✅ Correct (50/50) | 0.46s | 0.35s | 0.81s | Consistent success across all iterations |
| mistral-large-2512 | ✅ Reliable (50/50) | ❌ Never Handles (0/50) | 0.32s | 0.08s | 0.40s | Inconsistent results across 50 iterations |
| mistralai/mistral-medium-2505 | ✅ Reliable (50/50) | ✅ Correct (50/50) | 0.58s | 0.36s | 0.93s | Consistent success across all iterations |
| mistralai/mistral-small-3-1-24b-instruct-2503 | ⚠️ Unreliable (49/50) | ⚠️ Partial (47/49) | 0.48s | 0.31s | 0.79s | Consistent success across all iterations |
| openai/gpt-oss-120b | ✅ Reliable (50/50) | ✅ Correct (50/50) | 0.68s | 0.75s | 1.43s | Consistent success across all iterations |
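The status labels in the table appear to follow simple count-based rules. The sketch below reproduces them under assumptions inferred from the rows alone, not from the test harness itself: Tool Support is "Reliable" only when every call succeeds, and Response Handling is "Correct" when all attempts are handled, "Never Handles" when none are, and "Partial" otherwise. The Response Handling denominator seems to equal the number of successful tool calls (for example 45/45 after 45/50). Both function names are hypothetical.

```python
def tool_support_label(successes: int, attempts: int) -> str:
    """Tool Support column label (assumed rule: every call must succeed)."""
    mark = "✅ Reliable" if successes == attempts else "⚠️ Unreliable"
    return f"{mark} ({successes}/{attempts})"


def response_handling_label(handled: int, attempts: int) -> str:
    """Response Handling column label (assumed three-way rule)."""
    if handled == attempts:
        mark = "✅ Correct"
    elif handled == 0:
        mark = "❌ Never Handles"
    else:
        mark = "⚠️ Partial"
    return f"{mark} ({handled}/{attempts})"


# Counts taken from the meta-llama/llama-3-2-11b-vision-instruct row:
# 45 of 50 tool calls succeeded, and all 45 responses were handled.
tool_label = tool_support_label(45, 50)
handling_label = response_handling_label(45, 45)
```

The sketch does not cover the case of zero attempts, which does not occur in the table.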
📋 Latest Test Results
Results from the most recent test execution (2026-05-19)
Test Details
| Details | Models |
|---|---|
| Consistent success across all iterations | 9 |
| Inconsistent results across 50 iterations | 1 |
| Failed probe iterations (0/5 successes) | 16 |
❌ Models Without Tool Support (16 models)
These models do not support tool calling and are listed here for reference.