I think https://groq.com/ is a good example of the latency trade-off, though they trade it off against the number of requests per minute (they're updating that limit soon).
As far as metrics go, throughput is usually measured in tokens/s, and smaller models are much faster. However, their accuracy is significantly reduced in comparison.
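If you want to measure that yourself, a minimal sketch (the `generate` callable here is a hypothetical stand-in for whatever client/SDK you actually use):

```python
import time

def tokens_per_second(generate, prompt):
    """Time a generation call and report throughput in tokens/s.

    `generate` is a placeholder for your real client call (e.g. an
    OpenAI-compatible SDK); it should return a sequence of tokens.
    """
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Toy stand-in model: pretends to emit 50 tokens at ~1 ms each.
def fake_generate(prompt):
    out = []
    for _ in range(50):
        time.sleep(0.001)
        out.append("tok")
    return out

print(f"{tokens_per_second(fake_generate, 'hi'):.0f} tokens/s")
```

In practice you'd also want to separate time-to-first-token (latency) from steady-state tokens/s (throughput), since providers optimize those differently.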
Pretty much what you'd expect 🙂