Serving ML at scale without burning the budget
Jul 12, 2024
Draft notes on caching, batching, and autoscaling.