The fast and the curious: load testing Llama 3.1 with vLLM

Nov 5, 2024

Throughput Scaling of Llama 3.1 8B Under Various Quantization Methods on an NVIDIA A6000

Throughput Scaling of Llama 3.1 70B Under Various Quantization Methods on 4× NVIDIA A6000