Load balancing OpenAI API calls with LiteLLM

If you have ever built an application that calls Azure OpenAI models, you know there is a limit on the number of calls you can make per minute. Take a look at the settings of a GPT model below:

GPT deployment settings

Above, the tokens per minute (TPM)…
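To make the load-balancing idea from the title concrete, here is a minimal sketch in plain Python. It is not LiteLLM's API and not necessarily how LiteLLM does it internally. It only illustrates the principle of rotating calls across several deployments so that no single one exceeds its per-minute limit. The `Deployment` and `RoundRobinBalancer` classes, the deployment names, and the requests-per-minute numbers are all hypothetical.

```python
import itertools
import time
from dataclasses import dataclass, field


@dataclass
class Deployment:
    name: str        # hypothetical deployment name
    rpm_limit: int   # assumed requests-per-minute quota for this deployment
    sent: list = field(default_factory=list)  # timestamps of recent requests

    def has_capacity(self, now: float) -> bool:
        # Forget requests older than 60 seconds, then compare against the quota.
        self.sent = [t for t in self.sent if now - t < 60.0]
        return len(self.sent) < self.rpm_limit


class RoundRobinBalancer:
    """Rotate over deployments, skipping any that used up their quota."""

    def __init__(self, deployments):
        self.deployments = list(deployments)
        self._cycle = itertools.cycle(self.deployments)

    def pick(self, now=None):
        now = time.monotonic() if now is None else now
        # Try each deployment at most once per call.
        for _ in range(len(self.deployments)):
            deployment = next(self._cycle)
            if deployment.has_capacity(now):
                deployment.sent.append(now)  # record the request we are about to make
                return deployment
        return None  # every deployment is at its limit; the caller should wait
```

A caller would ask the balancer for a deployment before each request and back off when `pick()` returns `None`. A real setup would also need to account for token usage, not just request counts, since the limits are expressed in tokens per minute as well.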
