# Production Deployment
Using async-inspect in production environments requires careful configuration to minimize overhead while maintaining useful observability.
## Production Configuration

### Recommended Settings

```rust
use async_inspect::{Inspector, Config};

#[tokio::main]
async fn main() {
    let inspector = Inspector::new(Config {
        // Sample 1% of tasks to reduce overhead
        sampling_rate: 0.01,
        // Limit memory usage
        max_tasks: 10_000,
        max_events: 100_000,
        // Disable expensive features
        capture_backtraces: false,
        track_allocations: false,
        // Production mode optimizations
        mode: async_inspect::Mode::Production,
        ..Default::default()
    });

    // Your application code
}
```
## Configuration Modes

async-inspect provides three operational modes:

### Development Mode (Default)

```rust
Config {
    mode: Mode::Development,
    sampling_rate: 1.0, // Track all tasks
    capture_backtraces: true,
    track_allocations: true,
    ..Default::default()
}
```

- Use when: Local development, debugging
- Overhead: High (5-15%)
- Features: All enabled
### Production Mode

```rust
Config {
    mode: Mode::Production,
    sampling_rate: 0.01, // Track 1% of tasks
    capture_backtraces: false,
    track_allocations: false,
    ..Default::default()
}
```

- Use when: Live production traffic
- Overhead: Low (<1%)
- Features: Basic metrics only
### Analysis Mode

```rust
Config {
    mode: Mode::Analysis,
    sampling_rate: 0.1, // Track 10% of tasks
    capture_backtraces: true,
    track_allocations: false,
    ..Default::default()
}
```

- Use when: Performance investigation
- Overhead: Medium (2-5%)
- Features: Detailed profiling
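If you want to switch between these modes without recompiling, one option is to pick the mode from an environment variable at startup. A minimal sketch; `ASYNC_INSPECT_MODE` is a name invented for this example, not a variable the crate is documented to read, and `Config::from_env()` (covered under Environment Variables below) may already fit your needs:

```rust
use async_inspect::{Config, Mode};

// Choose an operational mode from a (hypothetical) environment variable.
fn config_for_environment() -> Config {
    let mode = match std::env::var("ASYNC_INSPECT_MODE").as_deref() {
        Ok("production") => Mode::Production,
        Ok("analysis") => Mode::Analysis,
        _ => Mode::Development,
    };
    Config { mode, ..Default::default() }
}
```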
## Sampling Strategies

### Fixed Rate Sampling

Track a fixed percentage of tasks:

```rust
Config {
    sampling_rate: 0.01, // 1%
    ..Default::default()
}
```
### Adaptive Sampling

Automatically adjust the sampling rate based on load:

```rust
use async_inspect::sampling::AdaptiveSampler;

let sampler = AdaptiveSampler::new()
    .min_rate(0.001)        // 0.1% minimum
    .max_rate(0.1)          // 10% maximum
    .target_overhead(0.02); // 2% overhead target

Config {
    sampler: Box::new(sampler),
    ..Default::default()
}
```
### Custom Sampling

Implement custom sampling logic:

```rust
use async_inspect::sampling::Sampler;

struct CustomSampler;

impl Sampler for CustomSampler {
    fn should_sample(&self, task_name: &str) -> bool {
        // Sample all API endpoints
        if task_name.contains("api::") {
            return true;
        }
        // Sample 1% of background tasks
        task_name.contains("background::") && rand::random::<f64>() < 0.01
    }
}
```
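To activate it, pass the sampler through the same `sampler` field used in the adaptive example above:

```rust
Config {
    sampler: Box::new(CustomSampler),
    ..Default::default()
}
```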
## Memory Management

### Setting Limits

Prevent unbounded memory growth:

```rust
use std::time::Duration;

Config {
    max_tasks: 10_000,   // Keep last 10k tasks
    max_events: 100_000, // Keep last 100k events
    // Auto-cleanup old data
    cleanup_interval: Duration::from_secs(60),
    task_retention: Duration::from_secs(300), // 5 minutes
    ..Default::default()
}
```
### Memory Monitoring

Track memory usage:

```rust
let stats = inspector.memory_stats();
println!("Tasks: {} / {}", stats.task_count, stats.max_tasks);
println!("Events: {} / {}", stats.event_count, stats.max_events);
println!("Memory: {} MB", stats.total_bytes / 1_048_576);

// Alert if approaching limits
if stats.task_count as f64 / stats.max_tasks as f64 > 0.9 {
    eprintln!("WARNING: Approaching task limit");
}
```
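For continuous monitoring rather than a one-off check, the same stats can be polled from a background task. A sketch combining `memory_stats()` with the periodic-export pattern shown later on this page:

```rust
use async_inspect::Inspector;
use std::sync::Arc;
use std::time::Duration;

// Poll memory stats every 30 seconds and warn when the task
// buffer is more than 90% full.
fn spawn_memory_watchdog(inspector: Arc<Inspector>) {
    tokio::spawn(async move {
        let mut interval = tokio::time::interval(Duration::from_secs(30));
        loop {
            interval.tick().await;
            let stats = inspector.memory_stats();
            if stats.task_count as f64 > stats.max_tasks as f64 * 0.9 {
                eprintln!(
                    "WARNING: task buffer at {} / {}",
                    stats.task_count, stats.max_tasks
                );
            }
        }
    });
}
```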
## Performance Overhead

### Benchmarks

Typical overhead by mode:
| Mode | Sampling | Overhead | Use Case |
|---|---|---|---|
| Development | 100% | 5-15% | Local dev |
| Analysis | 10% | 2-5% | Debugging production |
| Production | 1% | <1% | Always-on monitoring |
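These numbers depend heavily on the workload. To get figures for your own application, time a representative batch of tasks once with instrumentation compiled in and once without (see the feature-flag setup under Best Practices). A rough harness:

```rust
use std::time::{Duration, Instant};

// Spawn `n` trivial tasks and time how long they take to complete.
// Compare a build with the "inspect" feature against one without.
async fn bench_spawn(n: usize) -> Duration {
    let start = Instant::now();
    let handles: Vec<_> = (0..n)
        .map(|_| tokio::spawn(async { tokio::task::yield_now().await }))
        .collect();
    for handle in handles {
        let _ = handle.await;
    }
    start.elapsed()
}
```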
### Reducing Overhead

- Disable expensive features:

  ```rust
  Config {
      capture_backtraces: false, // Expensive
      track_allocations: false,  // Very expensive
      ..Default::default()
  }
  ```

- Use conditional compilation:

  ```rust
  #[cfg_attr(feature = "inspect", async_inspect::trace)]
  async fn my_function() {
      // Only traced when the "inspect" feature is enabled
  }
  ```

- Use selective tracing:

  ```rust
  // Only trace critical paths
  #[async_inspect::trace]
  async fn handle_request() {}

  // Skip tracing for hot loops
  async fn process_batch() {
      // Not traced
  }
  ```
## Integration with Monitoring Systems

### Prometheus Export

Expose metrics for Prometheus scraping:

```rust
use async_inspect::integrations::PrometheusExporter;

let exporter = PrometheusExporter::new(inspector.clone());
exporter.start_server("0.0.0.0:9090").await?;
```

Metrics exposed:

- `async_inspect_tasks_total` - Total tasks created
- `async_inspect_tasks_by_state` - Tasks by state (running, blocked, completed)
- `async_inspect_task_duration_seconds` - Task duration histogram
- `async_inspect_events_total` - Total events
- `async_inspect_deadlocks_detected` - Deadlock count
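A minimal Prometheus scrape configuration for the endpoint started above (the job name and target host are illustrative):

```yaml
scrape_configs:
  - job_name: "async-inspect"
    scrape_interval: 15s
    static_configs:
      - targets: ["myapp:9090"]
```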
### OpenTelemetry Export

Send traces to an OpenTelemetry collector:

```rust
use async_inspect::integrations::OtelExporter;

let exporter = OtelExporter::new(
    inspector.clone(),
    "http://localhost:4317", // OTLP endpoint
)?;
exporter.start().await?;
```
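The endpoint can also come from the `ASYNC_INSPECT_OTLP_ENDPOINT` variable listed under Environment Variables below, falling back to a local collector:

```rust
let endpoint = std::env::var("ASYNC_INSPECT_OTLP_ENDPOINT")
    .unwrap_or_else(|_| "http://localhost:4317".to_string());
let exporter = OtelExporter::new(inspector.clone(), &endpoint)?;
exporter.start().await?;
```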
### Custom Export

Export to custom backends:

```rust
use async_inspect::export::JsonExporter;
use std::time::Duration;

// Export every 60 seconds
tokio::spawn(async move {
    let mut interval = tokio::time::interval(Duration::from_secs(60));
    loop {
        interval.tick().await;
        let json = JsonExporter::new(&inspector).export();
        // Send to your backend
        send_to_backend(&json).await;
    }
});
```
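`send_to_backend` is left to you. One possible sketch, using the `reqwest` crate and a placeholder ingest URL:

```rust
// Hypothetical backend submission; the URL is a placeholder.
async fn send_to_backend(json: &str) {
    let client = reqwest::Client::new();
    let result = client
        .post("https://metrics.example.com/ingest") // placeholder URL
        .header("content-type", "application/json")
        .body(json.to_string())
        .send()
        .await;
    if let Err(err) = result {
        eprintln!("export failed: {err}");
    }
}
```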
## Health Checks

### Readiness Probe

Check that async-inspect is healthy:

```rust
use std::sync::Arc;

async fn health_check(inspector: Arc<Inspector>) -> Result<(), String> {
    let stats = inspector.memory_stats();

    // Check memory limits
    if stats.task_count >= stats.max_tasks {
        return Err("Task limit reached".to_string());
    }

    // Check for deadlocks
    if !inspector.deadlocks().is_empty() {
        return Err("Deadlocks detected".to_string());
    }

    Ok(())
}
```
### Liveness Probe

Ensure async-inspect is responding:

```rust
async fn liveness_check(inspector: Arc<Inspector>) -> bool {
    // Calling into the inspector proves it is still responsive;
    // the count itself is not meaningful for liveness.
    let _ = inspector.task_count();
    true
}
```
## Best Practices

### 1. Start with Low Sampling

Begin with 1% sampling and increase only if needed:

```rust
// Production
Config { sampling_rate: 0.01, ..Default::default() }

// Investigation
Config { sampling_rate: 0.1, ..Default::default() }

// Critical issue
Config { sampling_rate: 1.0, ..Default::default() }
```
### 2. Use Feature Flags

Enable async-inspect only when needed:

```toml
# Cargo.toml
[dependencies]
async-inspect = { version = "0.1", optional = true }

[features]
inspect = ["async-inspect"]
```

```rust
#[cfg_attr(feature = "inspect", async_inspect::trace)]
async fn handler() {}
```
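With this setup, the instrumentation is compiled in only when the feature is requested:

```bash
# Instrumented build
cargo build --release --features inspect

# Regular build: async-inspect is not compiled in at all
cargo build --release
```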
### 3. Monitor Performance Impact

Track overhead in production:

```rust
use std::time::Instant;

let start = Instant::now();
// Your code
let duration = start.elapsed();
metrics::histogram!("request_duration", duration.as_secs_f64());
```
### 4. Set Up Alerts

Alert on anomalies:

```yaml
# Prometheus alerts
groups:
  - name: async_inspect
    rules:
      - alert: HighTaskCount
        expr: async_inspect_tasks_by_state{state="running"} > 1000
        for: 5m
      - alert: DeadlockDetected
        expr: async_inspect_deadlocks_detected > 0
      - alert: SlowTasks
        expr: |
          histogram_quantile(0.99,
            rate(async_inspect_task_duration_seconds_bucket[5m])
          ) > 10
```
### 5. Rotate Export Files

Prevent disk space issues:

```rust
use async_inspect::export::CsvExporter;

// Export with a timestamp in the filename
let filename = format!(
    "async-inspect-{}.csv",
    chrono::Utc::now().format("%Y%m%d-%H%M%S")
);
CsvExporter::new(&inspector).export_to_file(&filename)?;

// Clean up old files (one possible helper is sketched below)
cleanup_old_exports("async-inspect-*.csv", 7)?; // Keep 7 days
```
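`cleanup_old_exports` is not part of async-inspect. A minimal sketch, assuming the `glob` crate for pattern matching:

```rust
use glob::glob;
use std::fs;
use std::time::{Duration, SystemTime};

// Delete files matching `pattern` whose modification time is
// older than `keep_days` days.
fn cleanup_old_exports(pattern: &str, keep_days: u64) -> Result<(), Box<dyn std::error::Error>> {
    let cutoff = SystemTime::now() - Duration::from_secs(keep_days * 24 * 60 * 60);
    for entry in glob(pattern)? {
        let path = entry?;
        if fs::metadata(&path)?.modified()? < cutoff {
            fs::remove_file(&path)?;
        }
    }
    Ok(())
}
```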
## Environment Variables

Configure via environment variables:

```bash
# Sampling rate
export ASYNC_INSPECT_SAMPLING=0.01

# Memory limits
export ASYNC_INSPECT_MAX_TASKS=10000
export ASYNC_INSPECT_MAX_EVENTS=100000

# Export endpoint
export ASYNC_INSPECT_OTLP_ENDPOINT=http://localhost:4317

# Enable/disable
export ASYNC_INSPECT_ENABLED=true
```

```rust
let config = Config::from_env();
let inspector = Inspector::new(config);
```
## Docker Deployment

### Dockerfile

```dockerfile
FROM rust:1.70 AS builder
WORKDIR /app
COPY . .
RUN cargo build --release --features inspect

FROM debian:bookworm-slim
COPY --from=builder /app/target/release/myapp /usr/local/bin/

# Expose Prometheus metrics
EXPOSE 9090

ENV ASYNC_INSPECT_SAMPLING=0.01
ENV ASYNC_INSPECT_ENABLED=true

CMD ["myapp"]
```
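Because configuration is read from the environment, settings can be overridden at run time without rebuilding the image:

```bash
docker run -p 9090:9090 \
  -e ASYNC_INSPECT_SAMPLING=0.05 \
  myapp:latest
```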
### Kubernetes

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: async-inspect-config
data:
  sampling-rate: "0.01"
  max-tasks: "10000"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: myapp:latest
          env:
            - name: ASYNC_INSPECT_SAMPLING
              valueFrom:
                configMapKeyRef:
                  name: async-inspect-config
                  key: sampling-rate
          ports:
            - containerPort: 9090
              name: metrics
```
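If the readiness and liveness checks from Health Checks above are exposed over HTTP, the container can declare them as probes. The paths below are illustrative; serve them however your application exposes health endpoints. These keys go under the container spec:

```yaml
livenessProbe:
  httpGet:
    path: /health/live   # illustrative path
    port: 9090
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health/ready  # illustrative path
    port: 9090
  periodSeconds: 10
```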
## Security Considerations

### 1. Sensitive Data

Avoid logging sensitive information:

```rust
#[async_inspect::trace(skip_args)]
async fn process_payment(card_number: &str) {
    // card_number is not logged
}
```
### 2. Access Control

Restrict the metrics endpoint:

```rust
// Require authentication
async fn metrics_handler(auth: Auth) -> Result<String, Error> {
    auth.require_admin()?;
    Ok(PrometheusExporter::render_metrics())
}
```
### 3. Rate Limiting

Prevent abuse of export endpoints:

```rust
use std::time::Duration;
use tower::limit::{rate::Rate, RateLimit};

// `metrics_handler` must implement `tower::Service` to be wrapped
let metrics_service = RateLimit::new(
    metrics_handler,
    Rate::new(10, Duration::from_secs(60)), // 10 requests per minute
);
```
## Troubleshooting Production Issues

See the Troubleshooting Guide for common production issues and solutions.