3 Dangerous Mistakes to Avoid: Enqueued Jobs
As a developer, you will inevitably come across a situation where you need to enqueue work asynchronously. While job queues are a quintessential tool, they come with a few traps that are incredibly easy to fall into. Here are 3 that you will hopefully never fall into after reading this!
Disclaimer: I am a Django developer, so my examples might be more geared towards Python/Django use-cases.
1) Enqueueing Inside an Atomic Block
This is just one of many ways to shoot yourself in the foot with an atomic block 🙃
Let’s say you run an eCommerce site and a customer is placing an order. You want to send an email to the “seller” telling them that someone purchased an item they are selling, so you enqueue a piece of code to do so.
However, if you’ve worked on an eCommerce site before, you’ll know that the “Place Order” step is not as simple as it seems. There are usually multiple steps that we want to happen atomically, so if any of the operations fail, everything else should get rolled back too. For example, if the credit card payment fails, we want to roll back the order creation.
Here lies the danger of enqueueing in an atomic block: if we enqueue a job inside an atomic block and a later portion of the block fails, the database transaction rolls back, but our enqueued job stays in the queue and will still run.
Here, the stakes are not that high, and the worst case is just sending an erroneous email to a seller. You might be able to see how this becomes very problematic in other situations though, especially when money or sensitive data gets involved.
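Here is a minimal sketch of the problem using Django’s transaction.atomic (create_order, send_seller_email, and charge_credit_card are hypothetical helpers):

from django.db import transaction
import django_rq

def place_order(cart):
    with transaction.atomic():
        order = create_order(cart)  # hypothetical helper
        # DANGER: the job goes into the queue immediately, long before
        # the transaction commits.
        django_rq.get_queue("emails").enqueue(send_seller_email, order.pk)
        # If charging fails and raises, the order is rolled back,
        # but the enqueued email job is not!
        charge_credit_card(order)  # hypothetical helper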
Solution
Simply move the enqueue outside of the atomic block, so it only runs once the block has successfully completed!
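Continuing the sketch above, the enqueue simply moves after the with block:

def place_order(cart):
    with transaction.atomic():
        order = create_order(cart)
        charge_credit_card(order)
    # The transaction committed successfully; now it is safe to enqueue.
    django_rq.get_queue("emails").enqueue(send_seller_email, order.pk)

Django also offers transaction.on_commit(callback), which registers a callback to run only if the surrounding transaction commits; wrapping the enqueue in it achieves the same effect when you cannot easily restructure the code.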
2) Enqueueing Sensitive Data
This one is not an immediately obvious mistake, but it is a serious security issue.
If you have worked with third-party libraries before, then something like this should be a familiar sight:
import third_party_lib

api_client = third_party_lib.create_client()
api_client.api_key = get_env_variable("STRIPE_API_SECRET_KEY")
Third-party libraries will often create some form of “client”, an object that you attach credentials to and use to make calls to their API. Let’s say we want to do something asynchronously with this client.
Using django-rq syntax as an example, here is a sample of something that might seem reasonable, but is definitely not recommended.
import django_rq

api_client = third_party_lib.create_client()
api_client.api_key = get_env_variable("STRIPE_API_SECRET_KEY")

queue = django_rq.get_queue(queue_name)
queue.enqueue(api_client.do_something, args, kwargs)
To answer why this is bad, we have to know what happens when you enqueue a job. Looking at the django-rq source code can provide some insight.
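django-rq delegates to the underlying rq library. As a simplified paraphrase (not the exact source, which varies by rq version), the function-resolution logic in rq’s Job.create looks something like this:

import inspect

# Simplified paraphrase of rq's Job.create, not the exact source.
# `func` is whatever you passed to enqueue(); `job` is the Job being built.
if inspect.ismethod(func):
    job._instance = func.__self__  # the bound object is stored on the job!
    job._func_name = func.__name__
elif inspect.isfunction(func) or inspect.isbuiltin(func):
    job._func_name = f"{func.__module__}.{func.__qualname__}"
elif isinstance(func, str):
    job._func_name = func  # a dotted-path string is stored as-is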
There is more to it, but what we mainly care about are the first two conditions. When we pass a “method” (i.e. a function bound to an instance of a class), rq will assign job._instance to retain the object.
Now, remember what we just assigned to that “object” (i.e. our api_client)? Our very secret, sensitive API key…oops 😬. This job (with the api_client object and the secret key in tow) will get written into a datastore like Redis, usually in an unencrypted form!
Solution
Make it a habit to only enqueue class methods, which get serialized as just a dotted-path string to the function (e.g. “app.payments.utils.payment_utils.PaymentUtils.some_class_method”).
Then, read and assign the API key inside the class method, so you’re reading the sensitive data straight from your environment variables, never letting it get written into an insecure datastore.
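A minimal sketch of that pattern (the module path, the PaymentUtils class, and helpers like get_env_variable are placeholders carried over from the examples above; this also assumes an rq version whose string resolution can traverse a class attribute):

# app/payments/utils/payment_utils.py
import third_party_lib

class PaymentUtils:
    @classmethod
    def do_something(cls, *args, **kwargs):
        api_client = third_party_lib.create_client()
        # The secret is read at execution time, inside the worker.
        # It is never serialized into Redis alongside the job.
        api_client.api_key = get_env_variable("STRIPE_API_SECRET_KEY")
        return api_client.do_something(*args, **kwargs)

# Wherever you enqueue, pass the dotted path so only a string is stored:
queue = django_rq.get_queue(queue_name)
queue.enqueue(
    "app.payments.utils.payment_utils.PaymentUtils.do_something",
    args, kwargs,
)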
3) Not Considering Race Conditions
Let’s say we are Netflix and have a recurring job that checks for any users with upcoming subscription renewals, meaning we need to charge them and renew their subscription for another month.
In a large system, it is likely that we would enqueue this job quite frequently and even across multiple workers.
We need to be careful that two enqueued jobs do not pick up the same task simultaneously; in this case, two jobs renewing the same customer’s subscription at the same time would result in duplicate charges.
Solution
There are a few ways around this, one being a shared cache “lock”. If there is a shared cache that all workers can read from and write to, we can put the database PKs of objects being updated into it, so that other workers know which renewals are already in progress and can skip them.
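As a minimal sketch, here is what that could look like with Django’s cache framework (charge_customer and extend_subscription are hypothetical helpers; this assumes a shared backend like Redis or Memcached, where cache.add is atomic):

from django.core.cache import cache

LOCK_TTL_SECONDS = 600  # assumption: a renewal finishes well within 10 minutes

def renew_subscription(subscription_pk):
    lock_key = f"renewal-lock:{subscription_pk}"
    # cache.add is set-if-not-exists: it returns False when the key already
    # exists, meaning another worker is already renewing this subscription.
    if not cache.add(lock_key, "in-progress", timeout=LOCK_TTL_SECONDS):
        return
    try:
        charge_customer(subscription_pk)      # hypothetical helper
        extend_subscription(subscription_pk)  # hypothetical helper
    finally:
        cache.delete(lock_key)

The timeout matters here: if a worker dies mid-renewal, the lock simply expires instead of blocking that customer’s renewals forever.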
Hopefully, you are now aware of these various pitfalls when enqueueing async jobs. I have tried to provide somewhat realistic examples, but this is of course just a high-level article, and your specific situation may vary!
I’d love to hear your feedback in the comments :) Thanks for reading!