

If clients start to send too many requests, their connection gets throttled. This means processing slows down, but the client is not disconnected: the connection stays open, which helps keep errors down. There is a trade-off, though. Connections are held longer and risk timing out, and long-held connections can open a vector for Denial-of-Service attacks. Sometimes clients simply cannot control the API calls they emit, and sometimes you want to let a single client exceed its nominal limit because your system has spare bandwidth or is idle. This is what an API burst allows: the client may send a certain amount of traffic above its usual limit. For example, you allow your client 20 TPS, but it sends 30 transactions which process very fast, and maybe your systems are able to consume that load. From an implementation perspective, Leaky Bucket is a well-known algorithm here.

API quotas, in contrast, address the commercial aspect and the long-term consumption of calls and data. They usually describe an allowed number of calls over longer intervals; for example, your API quota might be 5,000 calls per month. Remember that a quota can be combined with a rate limit or throttling setup. To enforce an API quota, you need to identify the client or consumer, which is why the term user quota (also known as organization quota) is used. Usually, this is where API Management solutions help: consumers come in and select a plan which has a quota attached, and quite often an SLA is attached as well, defining the response times and availability of the service. This is important from the consumer side, but it is also important for the provider to keep an eye on when the API is your product. Looking at API quotas in more detail, you can imagine setting a limit not only per overall client or consumer but also per consuming application (e.g., per API key); this is what we call an application quota. You could go even further and limit certain methods or calls.
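The per-consumer monthly quota described above can be sketched as a simple counter keyed by API key and billing period. This is a minimal illustration under assumptions, not a production pattern: the `QuotaTracker` name and the in-memory storage are invented here, and a real API Management layer would persist the counters and answer an exhausted quota with HTTP 429.

```python
from collections import defaultdict
from datetime import datetime, timezone

class QuotaTracker:
    """Minimal in-memory quota check: N calls per API key per calendar month."""

    def __init__(self, monthly_limit: int = 5000):
        self.monthly_limit = monthly_limit
        self.counters = defaultdict(int)  # (api_key, "YYYY-MM") -> call count

    def allow(self, api_key: str) -> bool:
        period = datetime.now(timezone.utc).strftime("%Y-%m")
        key = (api_key, period)
        if self.counters[key] >= self.monthly_limit:
            return False  # quota exhausted for this billing period
        self.counters[key] += 1
        return True

tracker = QuotaTracker(monthly_limit=3)
print([tracker.allow("consumer-app-1") for _ in range(4)])  # [True, True, True, False]
```

Because the counter key includes the API key, the same structure also covers the application-quota case: each consuming application gets its own independent monthly budget.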
Discover how APIs are the keys that unlock modern IT and digital success. So what are API throttling, API quota, API rate limiting, and API burst? If you start looking at an end-to-end scenario, you first have an overall limit on the calls your backend can process per time unit. This is often measured in TPS (Transactions Per Second); for example, your backend might be able to process 2,000 TPS. Sometimes, systems also have a physical limit on the amount of data that can be transferred, measured in bytes. Often, multiple clients share an overall rate limit on what they are allowed to send; this is called Application Rate Limiting.
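An application-wide rate limit of this kind can be sketched with a fixed-window counter, deliberately simpler than the Leaky Bucket algorithm mentioned later. Everything here is illustrative: the `AppRateLimiter` name, the injectable `clock`, and the one-second window are assumptions for the sketch, not any specific product's implementation.

```python
import time

class AppRateLimiter:
    """Fixed-window limiter: at most max_tps requests per one-second window,
    shared across every client of the application (illustrative sketch)."""

    def __init__(self, max_tps: int, clock=time.time):
        self.max_tps = max_tps
        self.clock = clock               # injectable for deterministic testing
        self.window = int(self.clock())  # current one-second window
        self.count = 0                   # requests seen in this window

    def allow(self) -> bool:
        now = int(self.clock())
        if now != self.window:           # a new second started: reset counter
            self.window = now
            self.count = 0
        if self.count >= self.max_tps:
            return False                 # shared application limit reached
        self.count += 1
        return True

# Hypothetical usage with a fake clock so the behavior is deterministic:
t = [100.0]
limiter = AppRateLimiter(max_tps=2, clock=lambda: t[0])
print(limiter.allow(), limiter.allow(), limiter.allow())  # True True False
t[0] = 101.0                             # next second: the window resets
print(limiter.allow())                   # True
```

With `AppRateLimiter(max_tps=2000)` this would cap the whole backend at roughly 2,000 requests in any wall-clock second; note that fixed windows allow short bursts across a window boundary, which is exactly the weakness smoother algorithms like Leaky Bucket address.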

In the world of APIs, nobody gives direct access to their resources, because you never know how much your services are going to be used. If you start thinking about limiting access to your APIs, a lot of things come to mind. Note that a single API request can be subjected to multiple throttling policies, and batch requests, such as scaling a virtual machine scale set, can be charged multiple counts. There will be a separate x-ms-ratelimit-remaining-resource header for each policy. Here is a sample response to a delete virtual machine scale set request.
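To make the per-policy headers concrete, here is a hedged sketch of how a client might collect the remaining counts. The header name x-ms-ratelimit-remaining-resource and the "Policy;remaining" value shape follow Azure's documented convention, but the sample values and the `remaining_quotas` helper below are invented for illustration and are not the actual sample response referred to above.

```python
def remaining_quotas(headers: list[tuple[str, str]]) -> dict[str, int]:
    """Collect remaining-call counts, one per throttling policy.
    Each header value is assumed to look like '<policy>;<remaining>'."""
    quotas = {}
    for name, value in headers:
        if name.lower() == "x-ms-ratelimit-remaining-resource":
            policy, _, remaining = value.rpartition(";")
            quotas[policy] = int(remaining)
    return quotas

# Invented example values in the documented format (not the real sample):
sample_headers = [
    ("x-ms-ratelimit-remaining-resource", "Microsoft.Compute/DeleteVMScaleSet3Min;107"),
    ("x-ms-ratelimit-remaining-resource", "Microsoft.Compute/DeleteVMScaleSet30Min;587"),
]
print(remaining_quotas(sample_headers))
```

A well-behaved client would watch these remaining counts and back off before any one policy reaches zero, rather than waiting for a 429 response.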
