The Fast and the Curious: Load Testing Llama 3.1 with vLLM

Throughput Scaling of Llama 3.1 8B Under Various Quantization Methods on an NVIDIA A6000
Throughput Scaling of Llama 3.1 70B Under Various Quantization Methods on 4 x NVIDIA A6000