📊 Summary Statistics
Metrics: Total Models, Tool Support, Response Handling, Reliability, Fastest Model, Average Response Time
[Chart: Model Performance History (Last 30 Days), Performance Trends]
🔍 Models with Tool Support
Model | Tool Support | Response Handling | Call Time | Response Time | Total Time | Details |
---|---|---|---|---|---|---|
ibm/granite-3-2-8b-instruct | ✅ Reliable (50/50) | ✅ Correct (50/50) | 0.54s | 0.26s | 0.80s | Consistent success across all iterations |
ibm/granite-3-2b-instruct | ⚠️ Unreliable (49/50) | ✅ Correct (49/49) | 0.45s | 0.30s | 0.75s | Consistent success across all iterations |
ibm/granite-3-3-8b-instruct | ⚠️ Unreliable (22/50) | ⚠️ Partial (21/22) | 0.41s | 0.10s | 0.51s | Inconsistent results across 50 iterations |
ibm/granite-3-8b-instruct | ✅ Reliable (50/50) | ⚠️ Partial (47/50) | 0.43s | 0.52s | 0.95s | Consistent success across all iterations |
ibm/granite-4-h-small | ✅ Reliable (50/50) | ⚠️ Partial (49/50) | 0.81s | 0.53s | 1.34s | Consistent success across all iterations |
meta-llama/llama-3-2-11b-vision-instruct | ⚠️ Unreliable (48/50) | ✅ Correct (48/48) | 0.44s | 0.26s | 0.70s | Consistent success across all iterations |
meta-llama/llama-3-2-90b-vision-instruct | ✅ Reliable (50/50) | ⚠️ Partial (38/50) | 1.04s | 0.52s | 1.56s | Inconsistent results across 50 iterations |
meta-llama/llama-3-3-70b-instruct | ✅ Reliable (50/50) | ✅ Correct (50/50) | 0.73s | 0.92s | 1.65s | Consistent success across all iterations |
meta-llama/llama-3-405b-instruct | ⚠️ Unreliable (49/50) | ✅ Correct (49/49) | 0.90s | 0.35s | 1.25s | Consistent success across all iterations |
meta-llama/llama-4-maverick-17b-128e-instruct-fp8 | ✅ Reliable (50/50) | ✅ Correct (50/50) | 0.54s | 0.61s | 1.15s | Consistent success across all iterations |
mistralai/mistral-medium-2505 | ✅ Reliable (50/50) | ✅ Correct (50/50) | 1.21s | 0.33s | 1.54s | Consistent success across all iterations |
mistralai/mistral-small-3-1-24b-instruct-2503 | ✅ Reliable (50/50) | ✅ Correct (50/50) | 0.45s | 0.27s | 0.72s | Consistent success across all iterations |
openai/gpt-oss-120b | ⚠️ Unreliable (43/50) | ✅ Correct (43/43) | 1.41s | 0.88s | 2.30s | Inconsistent results across 50 iterations |
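
The status labels in the table appear to follow from the counts in parentheses, and Total Time matches Call Time plus Response Time to within rounding. The following is a minimal sketch of that reading, not the test harness itself. It assumes that "Reliable" means every iteration produced a tool call and that "Correct" means every checked response was handled correctly. The `ModelResult` record, its field names, and the three helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    # Hypothetical record; the field names are illustrative only.
    model: str
    tool_ok: int          # iterations with a successful tool call
    iterations: int       # iterations attempted
    handled_ok: int       # responses handled correctly
    handled_total: int    # responses that were checked
    call_time: float      # seconds
    response_time: float  # seconds


def tool_support_label(r: ModelResult) -> str:
    # Assumed rule: only a perfect run counts as reliable.
    status = "Reliable" if r.tool_ok == r.iterations else "Unreliable"
    return f"{status} ({r.tool_ok}/{r.iterations})"


def response_handling_label(r: ModelResult) -> str:
    # Assumed rule: any mishandled response downgrades the label to Partial.
    status = "Correct" if r.handled_ok == r.handled_total else "Partial"
    return f"{status} ({r.handled_ok}/{r.handled_total})"


def total_time(r: ModelResult) -> float:
    # Total Time as the sum of the two phases, rounded to two decimals.
    return round(r.call_time + r.response_time, 2)


# Counts and timings copied from the ibm/granite-3-3-8b-instruct row.
row = ModelResult("ibm/granite-3-3-8b-instruct", 22, 50, 21, 22, 0.41, 0.10)
print(tool_support_label(row))       # Unreliable (22/50)
print(response_handling_label(row))  # Partial (21/22)
print(total_time(row))               # 0.51
```

Under this reading a single failed iteration is enough to move a model from "Reliable" to "Unreliable", which is why rows at 49/50 and 22/50 carry the same label.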
📋 Latest Test Results
Results from the most recent test execution (2025-10-22)
Test Details
Details: Consistent success across all iterations (10 models)
Details: Inconsistent results across 50 iterations (3 models)
Details: Failed probe iterations (0/5 successes) (20 models)
❌ Models Without Tool Support (20 models)
These models do not support tool calling and are listed here for reference.
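
The "0/5 successes" entries in the results suggest that a short probe run comes before the full 50-iteration test. The sketch below shows one way such a gate could work. It assumes that a model with zero probe successes is classified as lacking tool support and skipped, and that the full run does not reuse the probe iterations. Neither detail is confirmed by the report. `evaluate` and `call_model` are hypothetical names; `call_model` stands in for one real request and returns True when the model emits a valid tool call.

```python
from typing import Callable

PROBE_ITERATIONS = 5   # taken from "0/5 successes" in the results
FULL_ITERATIONS = 50   # taken from "50 iterations" in the results


def evaluate(call_model: Callable[[], bool]) -> dict:
    """Run a short probe, then the full test only if the probe succeeded once."""
    probe_successes = sum(call_model() for _ in range(PROBE_ITERATIONS))
    if probe_successes == 0:
        # Assumed rule: no probe success means the model has no tool support.
        return {
            "tool_support": False,
            "successes": 0,
            "iterations": PROBE_ITERATIONS,
        }
    successes = sum(call_model() for _ in range(FULL_ITERATIONS))
    return {
        "tool_support": True,
        "successes": successes,
        "iterations": FULL_ITERATIONS,
    }


# Stub callables in place of real model requests.
print(evaluate(lambda: False))
print(evaluate(lambda: True))
```

A gate like this keeps unsupported models from consuming 50 requests each, at the cost of misclassifying a model whose success rate is low enough to fail five probes in a row.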