Efficient handling of API Throttling in Python with Tenacity

Efficient client-side handling of API Throttling in Python with Tenacity

This post assumes familiarity with the Python Requests library.

Nowadays, APIs are everywhere. They are a very practical and efficient way to retrieve structured data.

But one day, your beloved API may return an unexpected response with a 429 status code. Like the vast majority of production APIs, yours probably has a throttling policy, and you have exceeded your allocated quota. A requiem for a dream...

From this point, you have three options.

The first one is the wallet option. If the API provider allows it, you can simply upgrade your plan to unlock a larger call quota.
You can optimize your software to reduce the number of calls. You may not need as many calls as you have been making.
You can optimize your software to make as many calls as your current plan allows. This means you will run into quota-exceeded responses, and your application will have to handle them without crashing. The Python library Tenacity will help you achieve this.

Discovering Tenacity

Tenacity is a Python library, forked from the older Retrying library, that lets you "retry" actions. When you have to call a function that may fail from time to time, such as an API call made at the limit of your quota, you simply wrap the call in a Retrying object from Tenacity.

# api.py
import os
from typing import List, Dict

import requests
from requests.models import Response

API_BASE_URL = os.environ.get("API_BASE_URL")


def retrieve_users_list(token) -> List[Dict]:
    response: Response = requests.get(
        f"{API_BASE_URL}/users/", 
        headers={"Authorization": f"Bearer {token}"}
    )
    response.raise_for_status()
    return response.json()

This is a typical piece of code using Python's requests to call an API. If all goes well, the list of users is returned to the caller; if not (because the quota was exceeded, for instance), an HTTPError is raised.
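
To see what the caller actually receives on failure, here is a minimal sketch that builds a throttled response by hand (no network involved) and reads the status code back from the exception. The helper name status_of_failed_call is made up for this illustration.

```python
import requests
from requests.exceptions import HTTPError
from requests.models import Response


def status_of_failed_call(response: Response) -> int:
    """Return the HTTP status carried by the HTTPError, or 0 if none was raised."""
    try:
        response.raise_for_status()
    except HTTPError as error:
        # The failed response travels with the exception.
        return error.response.status_code
    return 0


# Hand-built response standing in for a throttled call.
throttled = Response()
throttled.status_code = 429

status = status_of_failed_call(throttled)
is_throttled = status == requests.codes.too_many_requests
```

The fact that the response is attached to the exception is what lets a retry strategy tell a throttled call apart from any other failure.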

Let's see what tenacity can do for us.

from tenacity import Retrying

from .api import retrieve_users_list


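# Recent Tenacity releases call the Retrying object directly;
# older releases spelled this Retrying().call(...).
users = Retrying()(retrieve_users_list, token="my_token")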

Here, the Retrying object calls retrieve_users_list for us and automatically retries the call if any exception is raised. This is pretty much the equivalent of a while loop that retries the call until it succeeds.
This is a good start, but we now have to design a retrying strategy to avoid a potentially infinite retry loop.
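
To make the while loop comparison concrete, here is a rough hand-written equivalent of that default behavior. It is only an illustration of the idea, not Tenacity's actual implementation, and call_until_success and flaky_call are hypothetical names.

```python
def call_until_success(func, *args, **kwargs):
    """Hand-written stand-in for the default behavior: retry forever."""
    while True:
        try:
            return func(*args, **kwargs)
        except Exception:
            # No condition, no pause, no limit: just try again.
            continue


# Hypothetical flaky function: fails twice, then succeeds.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = call_until_success(flaky_call)
```

If flaky_call never succeeded, this loop would spin forever, which is exactly the problem a retrying strategy has to solve.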

A retrying strategy

To build an efficient retrying strategy, you have to ask the right questions:

When should I retry my call?
How much time should I wait between two calls?
When should I stop retrying?

Keeping in mind that we want to cope with an API throttling policy, we want to retry our call only if it was throttled by the API provider. By the same logic, we want to wait exactly the amount of time needed for our quota to be restored. Finally, we do not want to retry more than a fixed number of times.

Tenacity has a lot of built-in features to configure its retrying behavior. The first one we will use is the retry option. With a simple subclass of Tenacity's retry_base class, we can define a condition that decides whether to retry after a call.

from requests.exceptions import HTTPError
from requests.status_codes import codes
from tenacity import Retrying
from tenacity.retry import retry_base


def is_throttling_related_exception(e: Exception) -> bool:
    # check if the exception is a requests one,
    # and if the status_code is a throttling related one.
    return (
        isinstance(e, HTTPError)
        and e.response is not None
        and e.response.status_code == codes.too_many_requests
    )


class retry_if_throttling(retry_base):
    def __call__(self, retry_state) -> bool:
        # if the call failed (raised an exception)
        if retry_state.outcome.failed:
            exception = retry_state.outcome.exception()
            return is_throttling_related_exception(exception)
        # the call succeeded: nothing to retry
        return False


users = Retrying(
    retry=retry_if_throttling()
)(retrieve_users_list, token="my_token")

For each call to retrieve_users_list, the Retrying object checks whether the retry_if_throttling condition is met and retries accordingly.
First we check whether an exception was raised. Then we check whether the raised exception is an HTTPError and whether its status_code is 429 (TOO_MANY_REQUESTS).

Now we have to write a condition that decides how long to wait between two calls. We could retry as soon as possible, but that consumes bandwidth and is not very eco-friendly. Moreover, some providers may ban users who retry too often. Here we assume we are dealing with an API whose response headers indicate when our quota resets. We are going to read that time from the headers and compute the time to wait in seconds.

import arrow
from requests.models import Response
from tenacity.wait import wait_base


class wait_until_quota_restore(wait_base):
    @staticmethod
    def get_wait_time_from_response(response: Response) -> float:
        reset_time_str = response.headers["x-quota-resets-on"]
        reset_time = arrow.get(reset_time_str)
        wait_interval = reset_time - arrow.utcnow()
        # total_seconds() keeps the sign and the days, unlike .seconds;
        # never return a negative wait if the quota has already reset
        return max(0.0, wait_interval.total_seconds())

    def __call__(self, retry_state) -> float:
        if retry_state.outcome.failed:
            exception = retry_state.outcome.exception()

            if is_throttling_related_exception(exception):
                return self.get_wait_time_from_response(exception.response)

        # if this is an unknown exception, retry immediately
        return 0


users = Retrying(
    retry=retry_if_throttling(),
    wait=wait_until_quota_restore()
)(retrieve_users_list, token="my_token")
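
One detail is worth checking when computing the wait: on a timedelta, .seconds is only the seconds component, so a reset time that is already in the past yields a huge positive value instead of a negative one. The sketch below uses the standard datetime module rather than arrow, and assumes the header carries an ISO 8601 timestamp; the seconds_until helper is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def seconds_until(reset_time_str: str, now: datetime) -> float:
    """Seconds to wait until the reset time, never negative."""
    reset_time = datetime.fromisoformat(reset_time_str)
    return max(0.0, (reset_time - now).total_seconds())


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# Quota resets in 90 seconds: wait 90 seconds.
in_future = seconds_until("2024-01-01T12:01:30+00:00", now)

# Quota already reset 5 seconds ago: do not wait at all.
in_past = seconds_until("2024-01-01T11:59:55+00:00", now)

# What timedelta.seconds reports for the same "5 seconds ago" interval.
naive_past = timedelta(seconds=-5).seconds
```

Here naive_past is 86395, almost a full day of waiting for a quota that is already available, which is why total_seconds() clamped at zero is the safer choice.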

We can improve this by adding a few random seconds to the wait time, to make sure the call succeeds even if your server's clock and the API's clock are slightly out of sync.

from tenacity.wait import wait_random

users = Retrying(
    retry=retry_if_throttling(),
    wait=(
        wait_until_quota_restore() + wait_random(min=1, max=3)
    )
)(retrieve_users_list, token="my_token")

Then we add a stop condition to avoid retrying indefinitely if the call keeps failing. Let's decide that we won't call the API more than 10 times.

from tenacity.stop import stop_after_attempt

users = Retrying(
    retry=retry_if_throttling(),
    stop=stop_after_attempt(max_attempt_number=10),
    wait=(
        wait_until_quota_restore() + wait_random(min=1, max=3)
    )
)(retrieve_users_list, token="my_token")

You can also decide to stop retrying after a certain amount of time, using the stop_after_delay hook.
Tenacity offers many options for building the strategy that best fits your needs and the specifics of the API you want to use.

Finally, let's create a Python decorator to keep our code clean, especially if we use multiple endpoints of the same API.

# decorators.py
from functools import wraps

import arrow
from requests.exceptions import HTTPError
from requests.models import Response
from requests.status_codes import codes
from tenacity import Retrying
from tenacity.retry import retry_base
from tenacity.stop import stop_after_attempt
from tenacity.wait import wait_base, wait_random


def is_throttling_related_exception(e: Exception) -> bool:
    # check if the exception is a requests one,
    # and if the status_code is a throttling related one.
    return (
        isinstance(e, HTTPError)
        and e.response is not None
        and e.response.status_code == codes.too_many_requests
    )


class retry_if_throttling(retry_base):
    def __call__(self, retry_state) -> bool:
        # if the call failed (raised an exception)
        if retry_state.outcome.failed:
            exception = retry_state.outcome.exception()
            return is_throttling_related_exception(exception)
        # the call succeeded: nothing to retry
        return False


class wait_until_quota_restore(wait_base):
    @staticmethod
    def get_wait_time_from_response(response: Response) -> float:
        reset_time_str = response.headers["x-quota-resets-on"]
        reset_time = arrow.get(reset_time_str)
        wait_interval = reset_time - arrow.utcnow()
        # never return a negative wait if the quota has already reset
        return max(0.0, wait_interval.total_seconds())

    def __call__(self, retry_state) -> float:
        if retry_state.outcome.failed:
            exception = retry_state.outcome.exception()

            if is_throttling_related_exception(exception):
                return self.get_wait_time_from_response(exception.response)

        # if this is an unknown exception, retry immediately
        return 0


def api_retry(func):
    @wraps(func)  # keep the name and docstring of the decorated function
    def wrapper(*args, **kwargs):
        return Retrying(
            retry=retry_if_throttling(),
            stop=stop_after_attempt(max_attempt_number=10),
            wait=(
                wait_until_quota_restore() + wait_random(min=1, max=3)
            )
        )(func, *args, **kwargs)

    return wrapper



# api.py
import os
from typing import List, Dict

import requests
from requests.models import Response

from .decorators import api_retry

API_BASE_URL = os.environ.get("API_BASE_URL")


@api_retry
def retrieve_users_list(token) -> List[Dict]:
    response: Response = requests.get(
        f"{API_BASE_URL}/users/", 
        headers={"Authorization": f"Bearer {token}"}
    )
    response.raise_for_status()
    return response.json()
    
    
@api_retry
def retrieve_groups_list(token) -> List[Dict]:
    pass

This example shows how to use some of the hooks provided by Tenacity. Feel free to explore the others in the documentation, or build your own as we did with the wait_until_quota_restore wait hook.
You can even add logging to the Retrying object using the before= and before_sleep= hooks!

Bonus: an alternate wait strategy

There are a lot of different APIs, and some of them won't give you the quota reset time in the response headers. Let's see how we can adapt our code to use the quota stated in the API documentation (the most common case).
We assume that retrieve_users_list allows 30 calls per minute and retrieve_groups_list allows 10 calls per 45 seconds.

# decorators.py

class wait_until_quota_restore(wait_base):
    def __init__(self, max_call_number: int, max_call_number_interval: int):
        self.max_call_number = max_call_number
        self.max_call_number_interval = max_call_number_interval       
  
    def __call__(self, retry_state) -> float:
        if retry_state.outcome.failed:
            exception = retry_state.outcome.exception()
            
            if is_throttling_related_exception(exception):
                return self.max_call_number_interval / self.max_call_number
                
        # if this is an unknown exception, retry immediately
        return 0
        
        
def api_retry(max_call_number: int, max_call_number_interval: int):
    """
    This endpoint allows `max_call_number` calls per `max_call_number_interval` seconds.
    """
    
    def decorator(func):
        def wrapper(*args, **kwargs):
            return Retrying(
                retry=retry_if_throttling(),
                stop=stop_after_attempt(max_attempt_number=10),
                wait=(
                    wait_until_quota_restore(max_call_number, max_call_number_interval) 
                    + wait_random(min=1, max=3)
                )
            )(func, *args, **kwargs)

        return wrapper

    return decorator
    
    
# api.py

@api_retry(max_call_number=30, max_call_number_interval=60)
def retrieve_users_list(token) -> List[Dict]:
    pass
    
    
@api_retry(max_call_number=10, max_call_number_interval=45)
def retrieve_groups_list(token) -> List[Dict]:
    pass

We rewrote our wait_until_quota_restore wait hook to compute the wait time before the next call from the restore rate given in the API documentation. Then we extended our decorator as well, so that the restore rate can be defined per endpoint using the max_call_number and max_call_number_interval arguments.
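
To make the arithmetic concrete, here is the base wait this strategy produces for the two endpoints assumed above, before the random 1 to 3 seconds are added. The seconds_between_calls helper is hypothetical and simply mirrors the division done in the wait hook.

```python
def seconds_between_calls(max_call_number: int, max_call_number_interval: float) -> float:
    """Average time one call 'costs' under the documented quota."""
    return max_call_number_interval / max_call_number


# retrieve_users_list: 30 calls per 60 seconds
users_wait = seconds_between_calls(30, 60)

# retrieve_groups_list: 10 calls per 45 seconds
groups_wait = seconds_between_calls(10, 45)
```

This gives 2.0 seconds for users and 4.5 seconds for groups, so with the random extra the actual pauses fall between 3 and 5 seconds and between 5.5 and 7.5 seconds respectively.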

Conclusion

Tenacity will help you quickly write code that handles the throttling strategy chosen by your API provider.
If you want to learn a bit more about the different rate limiting mechanisms, you can head over to this article.