Why use this? Triton Inference Server has many tuning knobs — instance counts, dynamic batching, batch sizes, framework-specific accelerators — and finding the right combination manually is tedious.