
Cluster Benchmarks

sloth-kubernetes includes comprehensive benchmarking tools to measure and analyze your cluster's performance across network, storage, CPU, and memory dimensions.

Overview

The benchmark system provides:

  • Multiple benchmark types: Network, storage, CPU, memory, or all combined
  • Reference comparisons: Results compared against Kubernetes best practices
  • Report saving: Save results to JSON for historical tracking
  • Report comparison: Compare benchmark results over time
  • Quick mode: Fast performance overview

Commands

All benchmark commands require the stack name as the first argument. The kubeconfig is automatically retrieved from the Pulumi stack.

benchmark run

Execute comprehensive benchmarks on your cluster.

# Run all benchmarks
sloth-kubernetes benchmark run my-cluster

# Run specific benchmark type
sloth-kubernetes benchmark run my-cluster --type network

# Run with verbose output
sloth-kubernetes benchmark run my-cluster --type storage --verbose

# Save results to JSON file
sloth-kubernetes benchmark run my-cluster --output json --save results.json

# Compare with previous results
sloth-kubernetes benchmark run my-cluster --compare previous-results.json

Flags:

| Flag | Description | Default |
|------|-------------|---------|
| --type | Benchmark type (network, storage, cpu, memory, all) | all |
| --output | Output format (text, json, compact) | text |
| --verbose, -v | Show verbose output with details | false |
| --nodes | Specific nodes to benchmark (comma-separated) | - |
| --save | Save results to file | - |
| --compare | Compare with previous results file | - |

benchmark quick

Run a quick set of essential benchmarks for a fast performance overview.

sloth-kubernetes benchmark quick my-cluster

# Output as JSON
sloth-kubernetes benchmark quick my-cluster --output json

Flags:

| Flag | Description | Default |
|------|-------------|---------|
| --output | Output format (text, json, compact) | compact |

Example output:

═══════════════════════════════════════════════════════════════
QUICK BENCHMARK
═══════════════════════════════════════════════════════════════

Overall Score: 85.2/100 Grade: B+

Category Scores:
Network: 88.0
Storage: 82.5
CPU: 86.3
Memory: 84.0

Key Metrics:
[OK] Pod-to-Pod Latency 0.45 ms
[OK] Storage IOPS 15420 ops
[OK] API Server Latency 12 ms

Top Recommendation: Consider increasing storage IOPS for better performance

benchmark report

Display a benchmark report from a previously saved JSON file.

sloth-kubernetes benchmark report results.json

benchmark compare

Compare two benchmark result files and show differences.

sloth-kubernetes benchmark compare baseline.json current.json

Flags:

| Flag | Description | Default |
|------|-------------|---------|
| --output | Output format (text, json) | text |

Example output:

Benchmark Comparison
======================================================================
baseline.json vs current.json

Overall Score: 78.5 -> 85.2 (+6.7)
Grade: B -> B+

Category Comparison
--------------------------------------------------
Category        baseline    current     Change
--------------------------------------------------
Network         75.0        88.0        +13.0
Storage         80.0        82.5        +2.5
CPU             78.0        86.3        +8.3
Memory          81.0        84.0        +3.0

Metric Changes
----------------------------------------------------------------------
Metric                  baseline        current         Change
----------------------------------------------------------------------
Pod-to-Pod Latency      0.65 ms         0.45 ms         -30.8%
Storage IOPS            12500 ops       15420 ops       +23.4%
API Server Latency      18 ms           12 ms           -33.3%
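
The percentages in the Change column of the sample output are consistent with a plain relative change, (current - baseline) / baseline. The sketch below reproduces them with awk; `pct_change` is a hypothetical helper name, and the formula is inferred from the sample numbers rather than taken from the tool itself.

```shell
#!/usr/bin/env bash
# Relative change between two measurements, printed as a signed percentage.
# Assumption: this mirrors the Change column in the sample output above;
# it is not taken from the sloth-kubernetes source.
pct_change() {
  local baseline="$1" current="$2"
  awk -v b="$baseline" -v c="$current" 'BEGIN {
    if (b == 0) { print "n/a"; exit }   # avoid division by zero
    printf "%+.1f%%\n", (c - b) / b * 100
  }'
}

pct_change 0.65 0.45     # Pod-to-Pod Latency  -> -30.8%
pct_change 12500 15420   # Storage IOPS        -> +23.4%
pct_change 18 12         # API Server Latency  -> -33.3%
```

Note that a negative change is an improvement for latency metrics and a regression for throughput-style metrics such as IOPS.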

Benchmark Types

Network Benchmarks

Measures network performance within the cluster.

sloth-kubernetes benchmark run my-cluster --type network

Metrics measured:

  • Pod-to-Pod Latency: Round-trip time between pods on different nodes
  • DNS Resolution: Time to resolve cluster DNS names
  • Throughput: Network bandwidth between pods
  • CNI Performance: Container Network Interface efficiency

Reference values:

  • Pod-to-Pod Latency: < 1 ms (excellent), < 5 ms (good), < 10 ms (acceptable)
  • DNS Resolution: < 5 ms (excellent), < 20 ms (good), < 50 ms (acceptable)
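
To make the tiers concrete, here is a minimal sketch that rates a lower-is-better measurement against the three thresholds listed above. The function name and the "outside reference range" label are assumptions made for illustration; the tool may label such values differently.

```shell
#!/usr/bin/env bash
# Rate a lower-is-better metric against three ascending thresholds.
# Usage: rate_lower_is_better VALUE EXCELLENT GOOD ACCEPTABLE
# The fallback label is hypothetical; the reference values define no fourth tier.
rate_lower_is_better() {
  local value="$1" excellent="$2" good="$3" acceptable="$4"
  awk -v v="$value" -v e="$excellent" -v g="$good" -v a="$acceptable" 'BEGIN {
    if      (v < e) print "excellent"
    else if (v < g) print "good"
    else if (v < a) print "acceptable"
    else            print "outside reference range"
  }'
}

rate_lower_is_better 0.45 1 5 10    # pod-to-pod latency in ms -> excellent
rate_lower_is_better 12   5 20 50   # DNS resolution in ms     -> good
```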

Storage Benchmarks

Measures storage I/O performance.

sloth-kubernetes benchmark run my-cluster --type storage

Metrics measured:

  • IOPS: Input/Output Operations Per Second (4K random read/write)
  • Sequential I/O: Large block sequential read/write throughput
  • Random I/O: Small block random read/write latency
  • PVC Provisioning: Time to provision a PersistentVolumeClaim

Reference values:

  • IOPS: > 10000 (excellent), > 5000 (good), > 1000 (acceptable)
  • Sequential throughput: > 500 MB/s (excellent), > 200 MB/s (good)

CPU Benchmarks

Measures compute performance and scheduling efficiency.

sloth-kubernetes benchmark run my-cluster --type cpu

Metrics measured:

  • CPU Efficiency: Processing power utilization
  • Scheduling Latency: Time for scheduler to place pods
  • API Server Response: Kubernetes API responsiveness

Reference values:

  • Scheduling Latency: < 100 ms (excellent), < 500 ms (good), < 1 s (acceptable)
  • API Server Response: < 50 ms (excellent), < 200 ms (good), < 500 ms (acceptable)

Memory Benchmarks

Measures memory performance and utilization.

sloth-kubernetes benchmark run my-cluster --type memory

Metrics measured:

  • Memory Utilization: Cluster-wide memory usage efficiency
  • Memory Bandwidth: Read/write throughput to memory
  • etcd Memory Usage: Memory consumed by etcd cluster

Reference values:

  • Memory Utilization: 60-80% (optimal), < 60% (underutilized), > 90% (pressure)
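
The utilization bands can be read as a simple range check. The sketch below is one possible interpretation, with a hypothetical function name; the reference values say nothing about the range above 80% and up to 90%, so the sketch reports that range as "unspecified" rather than guessing a label.

```shell
#!/usr/bin/env bash
# Map a cluster-wide memory utilization percentage to the documented bands.
# Assumption: 60 and 80 are treated as inclusive bounds of the optimal band.
memory_band() {
  local pct="$1"
  awk -v p="$pct" 'BEGIN {
    if      (p < 60)  print "underutilized"
    else if (p <= 80) print "optimal"
    else if (p > 90)  print "pressure"
    else              print "unspecified"   # above 80 and up to 90 is not covered
  }'
}

memory_band 72   # -> optimal
memory_band 45   # -> underutilized
memory_band 95   # -> pressure
```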

Output Formats

Text (Default)

Human-readable detailed report with color-coded status.

sloth-kubernetes benchmark run my-cluster --output text

JSON

Machine-readable format for automation and storage.

sloth-kubernetes benchmark run my-cluster --output json

Example JSON structure:

{
  "timestamp": "2024-01-15T10:30:00Z",
  "cluster_name": "production",
  "results": [
    {
      "name": "Pod-to-Pod Latency",
      "category": "network",
      "score": 0.45,
      "unit": "ms",
      "status": "passed",
      "reference": 1.0,
      "details": "Measured across 3 node pairs"
    }
  ],
  "summary": {
    "overall_score": 85.2,
    "grade": "B+",
    "network_score": 88.0,
    "storage_score": 82.5,
    "cpu_score": 86.3,
    "memory_score": 84.0,
    "recommendations": [
      "Consider SSD storage for better IOPS"
    ]
  }
}
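
A saved report can be queried with jq. The following is a minimal sketch that assumes the file follows the example structure shown here (the `results[].name`, `.score`, `.unit`, `.status` and `summary.recommendations` fields); the helper names are hypothetical, and real reports may contain additional fields.

```shell
#!/usr/bin/env bash
# Print every result whose status is anything other than "passed".
# Only the "passed" value appears in the example, so the filter avoids
# assuming the exact spelling of the other status values.
non_passing_results() {
  jq -r '.results[]
         | select(.status != "passed")
         | "\(.name): \(.score) \(.unit) [\(.status)]"' "$1"
}

# Print the recommendations from the summary, one per line.
recommendations() {
  jq -r '.summary.recommendations[]' "$1"
}

# Example (assumes results.json was written with --save):
#   non_passing_results results.json
#   recommendations results.json
```

Filtering on "not passed" keeps the query useful even if the tool reports several non-passing states.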

Compact

Condensed summary view.

sloth-kubernetes benchmark run my-cluster --output compact

Examples

Baseline Performance Check

Establish a performance baseline for your cluster:

# Run full benchmark suite and save with today's date in the filename
sloth-kubernetes benchmark run my-cluster --save baseline-$(date +%Y%m%d).json

# View the saved report (substitute the filename produced above)
sloth-kubernetes benchmark report baseline-20240115.json

Performance Monitoring

Regular performance monitoring workflow:

# Weekly performance check
sloth-kubernetes benchmark run my-cluster \
  --save weekly-$(date +%Y%m%d).json \
  --compare baseline.json

Node-Specific Benchmarks

Benchmark specific nodes to identify issues:

# Benchmark only worker nodes
sloth-kubernetes benchmark run my-cluster --nodes worker-1,worker-2,worker-3

# Benchmark a specific problematic node
sloth-kubernetes benchmark run my-cluster --nodes worker-5 --verbose

Pre-Production Validation

Validate cluster performance before production deployment:

# Run comprehensive benchmarks
sloth-kubernetes benchmark run my-cluster --verbose --output text

# Quick sanity check
sloth-kubernetes benchmark quick my-cluster

CI/CD Integration

Automated performance testing in pipelines:

#!/bin/bash
# benchmark-check.sh
set -euo pipefail

STACK_NAME="production"
THRESHOLD=70  # minimum acceptable overall score

sloth-kubernetes benchmark run "$STACK_NAME" --output json --save results.json

# Extract the overall score from the saved report
SCORE=$(jq -r '.summary.overall_score' results.json)

# Fail the pipeline if the score is below the threshold
if (( $(echo "$SCORE < $THRESHOLD" | bc -l) )); then
  echo "Performance below threshold: $SCORE"
  exit 1
fi

echo "Performance check passed: $SCORE"

Scoring System

Overall Score

The overall score (0-100) is a weighted average of category scores:

  • Network: 25%
  • Storage: 25%
  • CPU: 25%
  • Memory: 25%
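
As a worked example, the category scores from the quick benchmark sample (88.0, 82.5, 86.3, 84.0) combine to the 85.2 shown there. The sketch below spells out the arithmetic with awk; `overall_score` is a hypothetical helper, and rounding to one decimal place is an assumption based on the sample output.

```shell
#!/usr/bin/env bash
# Weighted average of the four category scores, each weighted 25%.
# Usage: overall_score NETWORK STORAGE CPU MEMORY
overall_score() {
  awk -v n="$1" -v s="$2" -v c="$3" -v m="$4" 'BEGIN {
    printf "%.1f\n", 0.25 * n + 0.25 * s + 0.25 * c + 0.25 * m
  }'
}

overall_score 88.0 82.5 86.3 84.0   # -> 85.2
```

With equal weights this is the plain mean of the four categories, so a weak score in one category lowers the overall score by a quarter of its shortfall.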

Grades

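| Score Range | Grade | Interpretation |
|-------------|-------|----------------|
| 90-100 | A | Excellent performance |
| 80-89 | B | Good performance |
| 70-79 | C | Acceptable performance |
| 60-69 | D | Below average, improvements recommended |
| 0-59 | F | Poor performance, action required |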
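
The grade bands translate into a simple lookup. This sketch makes two assumptions: the lower bound of each range is the cutoff (so a fractional score such as 89.5 falls in B), and only the base letter is returned. The sample outputs on this page show a modifier ("B+"), but the table does not document how modifiers are assigned, so the sketch does not attempt to reproduce them.

```shell
#!/usr/bin/env bash
# Map an overall score (0-100) to the base letter grade from the table.
# grade_for_score is a hypothetical helper, not part of the CLI.
grade_for_score() {
  awk -v s="$1" 'BEGIN {
    if      (s >= 90) print "A"
    else if (s >= 80) print "B"
    else if (s >= 70) print "C"
    else if (s >= 60) print "D"
    else              print "F"
  }'
}

grade_for_score 85.2   # -> B
grade_for_score 59     # -> F
```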

Status Indicators

  • [OK] / PASSED: Metric within acceptable range
  • [WARN]: Metric approaching limits
  • [FAIL] / FAILED: Metric outside acceptable range

Troubleshooting

Benchmark Fails to Connect

If benchmarks fail to connect to the cluster:

# Verify stack and connectivity
sloth-kubernetes kubectl my-cluster get nodes

Low Network Scores

Common causes and solutions:

  • High latency: Check CNI configuration, consider network policy optimization
  • Low throughput: Verify MTU settings, check for bandwidth throttling
  • DNS issues: Review CoreDNS configuration and resources

Low Storage Scores

Common causes and solutions:

  • Low IOPS: Consider SSD storage, check for noisy neighbors
  • Slow provisioning: Review storage class settings, check CSI driver logs

Inconsistent Results

For more reliable results:

# Run multiple times and compare
for i in 1 2 3; do
  sloth-kubernetes benchmark run my-cluster --save "run-$i.json"
done

# Compare runs
sloth-kubernetes benchmark compare run-1.json run-3.json
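
To judge how noisy the runs are, it can help to look at the spread of the overall scores rather than a single pairwise comparison. The sketch below is one way to summarize them; `score_spread` is a hypothetical helper, and the scores in the example call are made-up illustrative values, not measured results.

```shell
#!/usr/bin/env bash
# Summarize a list of overall scores: count, mean, min, max, and spread.
# Usage: score_spread SCORE [SCORE ...]
score_spread() {
  [ "$#" -gt 0 ] || { echo "usage: score_spread SCORE [SCORE ...]" >&2; return 1; }
  printf '%s\n' "$@" | awk '
    { v = $1 + 0 }                      # force numeric interpretation
    NR == 1 { min = v; max = v }
    { sum += v; if (v < min) min = v; if (v > max) max = v }
    END {
      printf "runs=%d mean=%.1f min=%.1f max=%.1f spread=%.1f\n",
             NR, sum / NR, min, max, max - min
    }'
}

# Illustrative values only:
score_spread 84.1 85.2 86.0
# To feed it from the saved runs (assumes the JSON structure documented above):
#   score_spread $(jq '.summary.overall_score' run-1.json run-2.json run-3.json)
```

A small spread suggests the benchmark is stable enough to compare against a baseline; a large one suggests running more iterations or checking for competing workloads first.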