Efficient client-side handling of API Throttling in Python with Tenacity
This post assumes familiarity with the Python Requests library.
Nowadays, APIs are everywhere. They are a very practical and efficient way to retrieve a structured set of data.
But some day, your beloved API may return an unexpected response with a 429 status code. Like the large majority of production APIs, yours probably has a throttling policy, and you have surely exceeded your allocated quota. A requiem for a dream...
Beyond this point, there are three solutions.
- The first one is the "wallet" one. If the provider of the API allows it, you can simply upgrade your plan to unlock a larger call quota.
- You can optimize your software to reduce the number of calls. Your needs may not require as many calls as you have been making.
- You can optimize your software to execute the maximum number of calls to the API that your current plan allows. This means you will encounter a "quota exceeded" response, and your application will have to handle it without crashing. The Python library Tenacity will help you achieve this.
Discovering Tenacity
Tenacity is a Python library, forked from the old Retrying library, that allows you to "retry" actions. When you have to call a function that may fail sometimes, like an API call at the limit of your quota, you simply wrap your call in a Retrying object from Tenacity.
# api.py
import os
from typing import List, Dict

import requests
from requests.models import Response

API_BASE_URL = os.environ.get("API_BASE_URL")


def retrieve_users_list(token) -> List[Dict]:
    response: Response = requests.get(
        f"{API_BASE_URL}/users/",
        headers={"Authorization": f"Bearer {token}"}
    )
    # raise an HTTPError for any 4xx or 5xx status code
    response.raise_for_status()
    return response.json()
This is a typical piece of code using Python's requests to call an API. If all goes well, the list of users is returned to the caller; if not (because of an exceeded quota?), an HTTPError is raised.
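To see the failure path without reaching a real API, here is a minimal offline sketch. It builds a bare `Response` object by hand and sets its status code, which is an assumption made for illustration only (`make_fake_response` and `raises_http_error` are hypothetical helpers, not part of the original code).

```python
import requests
from requests.models import Response


def make_fake_response(status_code: int) -> Response:
    """Build a bare Response offline (hypothetical helper for this demo)."""
    response = Response()
    response.status_code = status_code
    return response


def raises_http_error(status_code: int) -> bool:
    """Return True when raise_for_status() raises for this status code."""
    try:
        make_fake_response(status_code).raise_for_status()
    except requests.exceptions.HTTPError as error:
        # The original response stays reachable on the exception object.
        return error.response.status_code == status_code
    return False


throttled = raises_http_error(429)
success = raises_http_error(200)
```

The response attached to the exception is what lets the caller inspect the status code and the headers after the failure.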
Let's see what tenacity can do for us.
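from tenacity import Retrying
from .api import retrieve_users_list

# The Retrying object is called directly with the function and its arguments
# (older Tenacity releases also exposed this as Retrying().call(...)).
users = Retrying()(retrieve_users_list, token="my_token")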
Here, the Retrying object will call retrieve_users_list for us and automatically retry the call if any exception is raised. This is pretty much the equivalent of a while loop, retrying the call until it succeeds.
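As a rough illustration of that equivalence, the loop below mimics what a default retry does: no wait, no stop condition. This is a plain Python sketch, not Tenacity's actual implementation, and `call_until_success` and `flaky_call` are hypothetical names.

```python
def call_until_success(func, *args, **kwargs):
    """Naive stand-in for a default retry: loop until func stops raising."""
    while True:
        try:
            return func(*args, **kwargs)
        except Exception:
            # No wait and no stop condition: a permanent failure loops forever.
            continue


attempts = {"count": 0}


def flaky_call() -> str:
    """Hypothetical function that fails twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


result = call_until_success(flaky_call)
```

The comment inside the `except` branch points at the weakness: nothing ever ends the loop if the call never succeeds.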
This is a good start, but now we have to design a retrying strategy to avoid a potentially infinite retry loop.
A retrying strategy
To build an efficient retrying strategy, you have to ask the right questions:
- When should I retry my call?
- How much time should I wait between two calls?
- When should I stop retrying?
Keeping in mind that we want to overcome an API throttling policy, we want to retry our call only if it was throttled by the provider of the API. By the same logic, we want to wait exactly the amount of time needed to restore our quota. Finally, we do not want to retry more than a fixed number of times.
Tenacity has a lot of built-in features to configure its retrying behavior. The first one we will use is the retry option. By subclassing tenacity's retry_base class, we can define a condition that decides whether we should retry after a call.
from requests.exceptions import HTTPError
from requests.status_codes import codes
from tenacity.retry import retry_base


def is_throttling_related_exception(e: Exception) -> bool:
    # check if the exception is a requests one,
    # and if the status_code is a throttling related one.
    return (
        isinstance(e, HTTPError)
        and e.response is not None
        and e.response.status_code == codes.too_many_requests
    )


class retry_if_throttling(retry_base):
    def __call__(self, retry_state) -> bool:
        # if the call failed (raised an exception)
        if retry_state.outcome.failed:
            exception = retry_state.outcome.exception()
            return is_throttling_related_exception(exception)
        # the call succeeded: nothing to retry
        return False


users = Retrying(
    retry=retry_if_throttling()
)(retrieve_users_list, token="my_token")
For each call to retrieve_users_list, the Retrying object will check whether the retry_if_throttling condition is met and retry accordingly.
First we check whether an exception has been raised. Then we check whether the raised exception is an HTTPError and whether its status_code is 429 (TOO_MANY_REQUESTS).
Now we have to write a condition that decides how long we wait between two calls. We could retry as soon as possible, but that consumes bandwidth and is not very eco-friendly. Moreover, some providers ban users who retry too often. Here we assume we are dealing with an API whose response headers indicate the reset time of our quota. We are going to read that time from the headers and compute the time to wait in seconds.
import arrow
from requests.models import Response
from tenacity.wait import wait_base


class wait_until_quota_restore(wait_base):
    @staticmethod
    def get_wait_time_from_response(response: Response) -> int:
        reset_time_str = response.headers["x-quota-resets-on"]
        reset_time = arrow.get(reset_time_str)
        wait_interval = reset_time - arrow.utcnow()
        # total_seconds() covers the whole interval; never return a negative wait
        return max(0, int(wait_interval.total_seconds()))

    def __call__(self, retry_state) -> int:
        if retry_state.outcome.failed:
            exception = retry_state.outcome.exception()
            if is_throttling_related_exception(exception):
                return self.get_wait_time_from_response(exception.response)
        # if this is an unknown exception, retry immediately
        return 0


users = Retrying(
    retry=retry_if_throttling(),
    wait=wait_until_quota_restore()
)(retrieve_users_list, token="my_token")
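The header arithmetic can be checked in isolation. The sketch below uses only the standard library and assumes the x-quota-resets-on header holds an ISO 8601 timestamp with an explicit UTC offset; the real format depends on your API. `seconds_until_reset` is a hypothetical helper, and its `now` parameter exists only to make the example deterministic.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional


def seconds_until_reset(
    headers: Mapping[str, str], now: Optional[datetime] = None
) -> float:
    """Seconds to wait until the quota reset time, never negative."""
    now = now or datetime.now(timezone.utc)
    reset_time = datetime.fromisoformat(headers["x-quota-resets-on"])
    remaining = (reset_time - now).total_seconds()
    # A reset time already in the past means we can retry right away.
    return max(0.0, remaining)


now = datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
future = {"x-quota-resets-on": "2020-01-01T12:00:30+00:00"}
past = {"x-quota-resets-on": "2020-01-01T11:59:00+00:00"}

wait_future = seconds_until_reset(future, now)
wait_past = seconds_until_reset(past, now)

# Pitfall: .seconds is only one component of a timedelta, so a negative
# interval of one minute reports 86340 instead of -60.
misleading = timedelta(seconds=-60).seconds
```

Using `total_seconds()` and clamping at zero avoids waiting almost a full day when the two clocks disagree by a few seconds.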
We can improve this by adding a few random seconds to the wait time, to make sure the call will succeed even if the clocks of your server and the API's server are slightly out of sync.
from tenacity.wait import wait_random

users = Retrying(
    retry=retry_if_throttling(),
    wait=(
        wait_until_quota_restore() + wait_random(min=1, max=3)
    )
)(retrieve_users_list, token="my_token")
Then we will add a stop condition to avoid retrying indefinitely if the call keeps failing. Let's decide that we won't call the API more than 10 times.
from tenacity.stop import stop_after_attempt

users = Retrying(
    retry=retry_if_throttling(),
    stop=stop_after_attempt(max_attempt_number=10),
    wait=(
        wait_until_quota_restore() + wait_random(min=1, max=3)
    )
)(retrieve_users_list, token="my_token")
You can also decide to stop retrying after a certain amount of time, using the stop_after_delay hook.
Tenacity offers a lot of options to build the strategy that best fits your needs and the specifics of the API you want to use.
Finally, let's create a Python decorator to keep our code clean, especially if we use multiple endpoints from the same API.
# decorators.py
from functools import wraps

import arrow
from requests.exceptions import HTTPError
from requests.models import Response
from requests.status_codes import codes
from tenacity import Retrying
from tenacity.retry import retry_base
from tenacity.stop import stop_after_attempt
from tenacity.wait import wait_base, wait_random


def is_throttling_related_exception(e: Exception) -> bool:
    # check if the exception is a requests one,
    # and if the status_code is a throttling related one.
    return (
        isinstance(e, HTTPError)
        and e.response is not None
        and e.response.status_code == codes.too_many_requests
    )


class retry_if_throttling(retry_base):
    def __call__(self, retry_state) -> bool:
        # if the call failed (raised an exception)
        if retry_state.outcome.failed:
            exception = retry_state.outcome.exception()
            return is_throttling_related_exception(exception)
        # the call succeeded: nothing to retry
        return False


class wait_until_quota_restore(wait_base):
    @staticmethod
    def get_wait_time_from_response(response: Response) -> int:
        reset_time_str = response.headers["x-quota-resets-on"]
        reset_time = arrow.get(reset_time_str)
        wait_interval = reset_time - arrow.utcnow()
        # total_seconds() covers the whole interval; never return a negative wait
        return max(0, int(wait_interval.total_seconds()))

    def __call__(self, retry_state) -> int:
        if retry_state.outcome.failed:
            exception = retry_state.outcome.exception()
            if is_throttling_related_exception(exception):
                return self.get_wait_time_from_response(exception.response)
        # if this is an unknown exception, retry immediately
        return 0


def api_retry(func):
    @wraps(func)  # keep the name and docstring of the decorated function
    def wrapper(*args, **kwargs):
        return Retrying(
            retry=retry_if_throttling(),
            stop=stop_after_attempt(max_attempt_number=10),
            wait=(
                wait_until_quota_restore() + wait_random(min=1, max=3)
            )
        )(func, *args, **kwargs)
    return wrapper
# api.py
import os
from typing import List, Dict

import requests
from requests.models import Response

from .decorators import api_retry

API_BASE_URL = os.environ.get("API_BASE_URL")


@api_retry
def retrieve_users_list(token) -> List[Dict]:
    response: Response = requests.get(
        f"{API_BASE_URL}/users/",
        headers={"Authorization": f"Bearer {token}"}
    )
    response.raise_for_status()
    return response.json()


@api_retry
def retrieve_groups_list(token) -> List[Dict]:
    pass
This example shows how to use some of the hooks provided by Tenacity. Feel free to explore the others in the documentation, or build your own like we did with the wait_until_quota_restore wait hook.
You can even add logging to the Retrying object using the before= and before_sleep= hooks!
Bonus: an alternate wait strategy
There are a lot of different APIs, and some of them won't provide the quota reset time in the response headers. Let's see how we can adapt our code to use the quota specified in the API documentation (the most common case).
We assume that retrieve_users_list allows 30 calls per minute and retrieve_groups_list allows 10 calls per 45 seconds.
# decorators.py
class wait_until_quota_restore(wait_base):
    def __init__(self, max_call_number: int, max_call_number_interval: int):
        self.max_call_number = max_call_number
        self.max_call_number_interval = max_call_number_interval

    def __call__(self, retry_state) -> float:
        if retry_state.outcome.failed:
            exception = retry_state.outcome.exception()
            if is_throttling_related_exception(exception):
                # average time, in seconds, needed to regain one call
                return self.max_call_number_interval / self.max_call_number
        # if this is an unknown exception, retry immediately
        return 0


def api_retry(max_call_number: int, max_call_number_interval: int):
    """
    This endpoint allows `max_call_number` calls per
    `max_call_number_interval` seconds.
    """
    def decorator(func):
        @wraps(func)  # requires `from functools import wraps`
        def wrapper(*args, **kwargs):
            return Retrying(
                retry=retry_if_throttling(),
                stop=stop_after_attempt(max_attempt_number=10),
                wait=(
                    wait_until_quota_restore(max_call_number, max_call_number_interval)
                    + wait_random(min=1, max=3)
                )
            )(func, *args, **kwargs)
        return wrapper
    return decorator
# api.py
@api_retry(max_call_number=30, max_call_number_interval=60)
def retrieve_users_list(token) -> List[Dict]:
    pass


@api_retry(max_call_number=10, max_call_number_interval=45)
def retrieve_groups_list(token) -> List[Dict]:
    pass
We rewrote our wait_until_quota_restore wait hook to compute the wait time from the restore rate given in the API documentation. We then parameterized our decorator as well, so that the restore rate can be defined per endpoint using the max_call_number and max_call_number_interval arguments.
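The wait time behind this strategy is a simple division, worked out below for the two quotas assumed above. `seconds_per_call` is a hypothetical helper used only to make the arithmetic explicit.

```python
def seconds_per_call(max_call_number: int, max_call_number_interval: float) -> float:
    """Average time, in seconds, needed to regain one call of quota."""
    return max_call_number_interval / max_call_number


# 30 calls per 60 seconds: one call regained every 2 seconds
users_wait = seconds_per_call(30, 60)

# 10 calls per 45 seconds: one call regained every 4.5 seconds
groups_wait = seconds_per_call(10, 45)
```

This assumes the provider restores the quota gradually. If the quota is instead reset all at once at the end of each interval, waiting for a single slot may not be enough, and waiting for the full interval would be the safer choice.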
Conclusion
Tenacity will help you quickly build code to handle the throttling strategy chosen by your API provider.
If you want to learn a bit more about the different rate limiting mechanisms, you can head over to this article.
Thomas Berdy