About
Goal

Find out how many HTTP requests per second the LLM (Large Language Model) service can handle.

Background

Large Language Models (LLMs) have become an integral part of modern applications. Unlike traditional databases, LLMs are known for slower response times. When building a solution on top of an LLM service, we need to know how many requests per second the service can handle, so that we can design the system and plan capacity and resources according to the expected load.

Disclaimer

It is important to note that the tests conducted here are not typical performance tests used to evaluate the behavior of the models themselves. Instead, they focus on the service's ability to handle concurrent requests.

Tooling

The load testing was performed using the following tools:

Apache Bench: a benchmarking tool for web servers, used to simulate multiple concurrent requests.
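Before running a tool like Apache Bench (a typical invocation is `ab -n 200 -c 10 <url>`, where `-n` is the total number of requests and `-c` the concurrency level), it can help to set a rough expectation for the result. The sketch below is a back-of-the-envelope estimate based on Little's law, not part of the actual test setup; the latency and concurrency numbers are purely illustrative.

```python
# Back-of-the-envelope capacity estimate using Little's law:
# sustainable throughput (req/s) ~= concurrency / average latency (s).
# All numbers here are illustrative assumptions, not measurements.

def estimated_rps(concurrency: int, avg_latency_s: float) -> float:
    """Estimate the requests per second a service can sustain when it
    handles `concurrency` requests in parallel and each request takes
    `avg_latency_s` seconds on average."""
    return concurrency / avg_latency_s

if __name__ == "__main__":
    # e.g. 10 parallel requests against an LLM that averages 2 s per response
    print(f"{estimated_rps(10, 2.0):.1f} req/s")  # → 5.0 req/s
```

Comparing such an estimate against the measured numbers is a quick sanity check: if Apache Bench reports far fewer requests per second than the latency and concurrency would suggest, the bottleneck is likely outside the model itself (queuing, connection limits, and so on).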