📊 Summary Statistics
Total Models
Tool Support
Response Handling
Reliability
Fastest Model
Average Response Time
Model Performance History (Last 30 Days)
Performance Trends
🔍 Models with Tool Support
Model | Tool Support | Response Handling | Call Time | Response Time | Total Time | Details
---|---|---|---|---|---|---
ibm/granite-3-2-8b-instruct | ✅ Reliable (50/50) | ⚠️ Partial (49/50) | 0.46s | 0.26s | 0.72s | Inconsistent results across 50 iterations |
ibm/granite-3-2b-instruct | ⚠️ Unreliable (49/50) | ⚠️ Partial (47/49) | 0.49s | 0.34s | 0.82s | Inconsistent results across 50 iterations |
ibm/granite-3-3-8b-instruct | ⚠️ Unreliable (25/50) | ✅ Correct (25/25) | 0.55s | 0.28s | 0.83s | Inconsistent results across 50 iterations |
ibm/granite-3-8b-instruct | ✅ Reliable (50/50) | ⚠️ Partial (49/50) | 0.54s | 0.47s | 1.01s | Inconsistent results across 50 iterations |
meta-llama/llama-3-2-11b-vision-instruct | ✅ Reliable (50/50) | ✅ Correct (50/50) | 0.46s | 0.27s | 0.73s | Consistent success across all iterations |
meta-llama/llama-3-2-3b-instruct | ✅ Reliable (50/50) | ⚠️ Partial (28/50) | 0.45s | 0.96s | 1.41s | Inconsistent results across 50 iterations |
meta-llama/llama-3-2-90b-vision-instruct | ✅ Reliable (50/50) | ✅ Correct (50/50) | 1.10s | 0.69s | 1.79s | Consistent success across all iterations |
meta-llama/llama-3-3-70b-instruct | ✅ Reliable (50/50) | ✅ Correct (50/50) | 0.89s | 0.88s | 1.77s | Consistent success across all iterations |
meta-llama/llama-3-405b-instruct | ✅ Reliable (50/50) | ✅ Correct (50/50) | 0.91s | 0.37s | 1.28s | Consistent success across all iterations |
meta-llama/llama-4-maverick-17b-128e-instruct-fp8 | ✅ Reliable (50/50) | ✅ Correct (50/50) | 0.46s | 0.35s | 0.81s | Consistent success across all iterations |
mistralai/mistral-large | ⚠️ Unreliable (46/50) | ⚠️ Partial (45/46) | 0.78s | 0.40s | 1.17s | Inconsistent results across 50 iterations |
mistralai/mistral-medium-2505 | ✅ Reliable (50/50) | ⚠️ Partial (48/50) | 0.65s | 0.37s | 1.02s | Inconsistent results across 50 iterations |
mistralai/mistral-small-3-1-24b-instruct-2503 | ✅ Reliable (50/50) | ⚠️ Partial (46/50) | 0.46s | 0.30s | 0.76s | Inconsistent results across 50 iterations |
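The summary figures named at the top of this report (fastest model, average response time) can be recomputed from the table rows. The following is a minimal sketch, not the dashboard's own code: the `Row` type and the `parse_row` and `summarize` helpers are hypothetical names, "fastest" is assumed to mean lowest Total Time, and only three rows from the table are included to keep the example short.

```python
import re
from dataclasses import dataclass
from statistics import mean


@dataclass
class Row:
    model: str
    tool_ok: int
    tool_total: int
    resp_ok: int
    resp_total: int
    call_s: float
    response_s: float
    total_s: float


# Matches the "(49/50)" style counts in the status columns.
COUNT = re.compile(r"\((\d+)/(\d+)\)")


def parse_row(line: str) -> Row:
    """Parse one pipe-delimited table row into a Row."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    model, tool, resp, call, response, total = cells[:6]
    tool_ok, tool_total = map(int, COUNT.search(tool).groups())
    resp_ok, resp_total = map(int, COUNT.search(resp).groups())

    def seconds(cell: str) -> float:
        # "0.46s" -> 0.46
        return float(cell.rstrip("s"))

    return Row(model, tool_ok, tool_total, resp_ok, resp_total,
               seconds(call), seconds(response), seconds(total))


def summarize(rows: list[Row]) -> dict:
    """Assumed definitions: fastest = lowest total time; consistent = no failed iteration."""
    fastest = min(rows, key=lambda r: r.total_s)
    consistent = [r.model for r in rows
                  if r.tool_ok == r.tool_total and r.resp_ok == r.resp_total]
    return {
        "fastest_model": fastest.model,
        "mean_response_s": mean(r.response_s for r in rows),
        "fully_consistent": consistent,
    }


# Three rows copied from the table above.
SAMPLE = """\
ibm/granite-3-2-8b-instruct | ✅ Reliable (50/50) | ⚠️ Partial (49/50) | 0.46s | 0.26s | 0.72s | Inconsistent results across 50 iterations
meta-llama/llama-3-2-11b-vision-instruct | ✅ Reliable (50/50) | ✅ Correct (50/50) | 0.46s | 0.27s | 0.73s | Consistent success across all iterations
meta-llama/llama-3-2-3b-instruct | ✅ Reliable (50/50) | ⚠️ Partial (28/50) | 0.45s | 0.96s | 1.41s | Inconsistent results across 50 iterations
"""

rows = [parse_row(line) for line in SAMPLE.splitlines()]
summary = summarize(rows)
print(summary)
```

Feeding all thirteen rows through `parse_row` gives the same statistics for the full table. If the dashboard ranks "fastest" by call time or response time instead, only the `key` passed to `min` needs to change.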
📋 Latest Test Results
Results from the most recent test execution (2025-08-02)
Test Details
Details: Consistent success across all iterations (5 models)
Details: Inconsistent results across 50 iterations (8 models)
Details: Failed probe iterations (0/5 successes) (25 models)
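The status labels in this report appear to follow directly from the success counts. The sketch below reproduces that mapping as inferred from the rows shown here; it is an assumption, not the test harness's actual logic. The function names are hypothetical, and the rule that any successful probe promotes a model to the full run is a guess, since the report only shows the 0/5 case.

```python
PROBE_ITERATIONS = 5   # inferred from "0/5 successes"
FULL_ITERATIONS = 50   # inferred from the "(n/50)" counts


def supports_tools(probe_successes: int) -> bool:
    # Assumption: at least one successful probe counts as tool support.
    # The report only shows models that failed all 5 probes.
    return probe_successes > 0


def tool_support_label(ok: int, total: int) -> str:
    # Every "Reliable" row in the table is 50/50; anything less is "Unreliable".
    status = "✅ Reliable" if ok == total else "⚠️ Unreliable"
    return f"{status} ({ok}/{total})"


def response_label(ok: int, total: int) -> str:
    # The denominator is the number of successful tool calls, e.g. 47/49.
    status = "✅ Correct" if ok == total else "⚠️ Partial"
    return f"{status} ({ok}/{total})"


def details(tool_ok: int, tool_total: int, resp_ok: int, resp_total: int) -> str:
    if tool_ok == tool_total and resp_ok == resp_total:
        return "Consistent success across all iterations"
    return f"Inconsistent results across {tool_total} iterations"


# Example: the ibm/granite-3-3-8b-instruct row (25/50 tool calls, 25/25 responses).
print(tool_support_label(25, 50))
print(response_label(25, 25))
print(details(25, 50, 25, 25))
```

Under this reading, a model with perfect response handling can still be reported as inconsistent when some of its tool calls fail, which matches the ibm/granite-3-3-8b-instruct row.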
❌ Models Without Tool Support (25 models)
These models do not support tool calling and are listed here for reference.