Dogfooding our own rate-limited API

Javascript, Rest

Javascript Problem Overview


Overview:

My company has developed a rate-limited API. Our goal is twofold:

  • A: Create a strong developer ecosystem around our product.
  • B: Demonstrate the power of our API by using it to drive our own application.

Clarification: Why rate-limit at all?

We rate-limit our API because we sell it as an addition to our product. Anonymous access to our API has a very low threshold for API calls per hour, whereas our paid customers are permitted upwards of 1000 calls per hour or more.

The Problem:

Our rate-limited API is great for the developer ecosystem, but in order to dogfood it, our own application can't be subject to the same rate limiting. The front end of our application is all JavaScript, making direct Ajax calls to the API.

So the question is:

> How do you secure an API so that rate limiting can be lifted for your own client, in such a way that the lifting can't be easily spoofed?

Explored Solutions (and why they didn't work)

  1. Verify the referrer against the host header. -- Flawed because the referrer is easily faked.

  2. Use an HMAC to create a signature based off the request and a shared secret, then verify the request on the server. -- Flawed because the secret and algorithm would be easily determined by looking into the front end JavaScript.

  3. Proxy the request and sign it in the proxy. -- Still flawed, as the proxy itself exposes the API.

The Question:

I am looking to the brilliant minds on Stack Overflow to present alternate solutions. How would you solve this problem?

Javascript Solutions


Solution 1 - Javascript

Since your own JavaScript client is accessing the API directly, anyone's going to be able to look at what it's doing and mimic it, including using the same API key. You can try to make it more difficult, such as by obfuscating your code or putting various hurdles in the way, but you and the person you're trying to restrain have fundamentally the same access. Instead of trying to create a difference in privileges, you'll need to construct a system where it's totally OK for an unofficial client to use all the access in its scope, but the system is arranged in such a way that official use across all clients is greater.

This is often done with per-user access tokens, as opposed to one token for the entire application. Each token's limit should be plenty for typical use of your API, but restrictive for someone trying to abuse it. For example, 100 calls per minute might be more than enough to support typical browsing, but if I want to scrape you, I can't do it effectively on that budget.
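A per-user limit like this could be sketched as a token bucket keyed by user ID. The capacity and refill numbers below are illustrative, not from the answer:

```javascript
// Sketch of a per-user token bucket: each user ID gets its own bucket,
// so one abusive user exhausts only their own budget.
class TokenBucket {
  constructor(capacity, refillPerSecond) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }
  take() {
    // Refill proportionally to the time elapsed since the last call.
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;  // request allowed
    }
    return false;   // over this user's limit
  }
}

const buckets = new Map();

// E.g. 100 calls with a refill of 100 per minute, per user.
function allowRequest(userId, capacity = 100, refillPerSecond = 100 / 60) {
  if (!buckets.has(userId)) {
    buckets.set(userId, new TokenBucket(capacity, refillPerSecond));
  }
  return buckets.get(userId).take();
}
```

In production the buckets would live in shared storage (e.g. Redis) rather than in-process memory, but the shape of the check is the same.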

There will always be an arms race - I can get around the limit by creating lots of bot user accounts. That, though, is a pretty solved problem if you just add a captcha to your signup flow, at a tiny bit of expense to the real human. When you get into these scenarios, everything's just a tradeoff between convenience and restriction. You'll never find something totally bulletproof, so focus on making it good enough and wait until someone exploits you to learn where the holes were.

Solution 2 - Javascript

If this is causing you a problem, it will cause your putative ecosystem of developers a problem (e.g. when they try to develop an alternative UI). If you are really eating your own dog food, make the API (and the rate limiting) work for your application. Here are some suggestions:

  • Do not rate limit by IP address. Rather, rate limit by something associated with the user, e.g. their user ID. Apply the rate limit at the authentication stage.

  • Design your API so that users do not need to call it continuously (e.g. give a list call that returns many results, rather than a repeated call that returns one item each time)

  • Design your web app with the same constraints you expect your developer ecosystem to have, i.e. ensure you can design it within reasonable throttling rates.

  • Ensure your back end is scalable (horizontally preferably) so you don't need to impose throttling at levels so low it actually causes a problem to a UI.

  • Ensure your throttling has the ability to cope with bursts, as well as limiting longer term abuse.

  • Ensure your throttling performs sensible actions tailored to the abuse you are seeking to remove. For instance, consider queuing or delaying mild abusers rather than refusing the connection. Most web front ends will only open four simultaneous connections, so if you delay an attempt to open a fifth, you'll only hit the case where someone is using a CLI at the same time as the web client (or two web clients). If you delay the n-th gap-free API call rather than failing it, the end user will see things slow down rather than break. If you combine this with queuing at most N API calls at once, you will only hit people who are parallelising large numbers of API calls, which is probably not the behaviour you want to support - e.g. 100 simultaneous API calls followed by an hour's gap is normally far worse than 100 sequential API calls spread over an hour.
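The delay-instead-of-reject idea in the last bullet might be sketched as a simple policy function; all thresholds here are illustrative:

```javascript
// Sketch: instead of rejecting the n-th back-to-back call, delay it.
// Returns 0 (pass through), a delay in ms (queue it), or -1 (refuse).
function throttleDelayMs(pendingCalls, freeSlots = 4, stepMs = 250, maxQueued = 20) {
  if (pendingCalls < freeSlots) {
    return 0;   // browser-typical parallelism passes untouched
  }
  if (pendingCalls >= freeSlots + maxQueued) {
    return -1;  // far beyond the queue: refuse the connection
  }
  // Growing delay: the UI slows down gradually instead of breaking.
  return (pendingCalls - freeSlots + 1) * stepMs;
}
```

A normal web client (four or fewer in-flight calls) sees no delay at all; a scraper firing 30 calls in parallel gets some delayed and the rest refused.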

Did this not answer your question? Well, if you really need to do what you are asking, rate-limit at the authentication stage and apply a different rate limit based on the group your user fits into. If you are using one set of credentials (used by your devs and QA team), you get a higher rate limit. But you can immediately see why this will inevitably lead you to your ecosystem seeing issues that your dev and QA team do not see.

Solution 3 - Javascript

Buy your product. Become a paid customer of yourself.

"Anonymous access to our API has a very low threshold for API calls per hour, whereas our paid customers are permitted upwards of 1000 calls per hour or more."

This also helps test the system from a customer's perspective.

Solution 4 - Javascript

Unfortunately, there is no perfect solution to this.

The general approach typically combines three things:

  • Provide a (spoofable) way for clients to identify themselves, e.g. an identifier, a version, and an API key.

  • Have clients register information about themselves that can be used to limit access. For example, if the client is a server in a given IP address range, only allow callers from that range; if the client is JavaScript delivered only to a specific category of browser, only allow HTTP requests that specify the matching user-agent strings.

  • Use machine learning/pattern recognition to detect anomalous usage that is likely a spoofed client, then reject traffic from those spoofed clients (or confirm with clients that the usage is indeed not coming from the legitimate client, replace their spoofable credentials, and disallow further traffic using the older spoofed credentials).

You can make spoofing slightly more difficult by using multiple layers of keys. For example, you give out a longer-lived credential that lives on a server (and can only be used from a limited set of IP address ranges). That credential is used to make an API call that records information about the client (e.g. the user agent) and returns a shorter-lived client-side key, which is embedded in the JavaScript for client-side API requests. This, too, is imperfect (a spoofer could issue the same server call to obtain the credential), but it becomes harder if the returned API key is embedded in obfuscated (and frequently changing) JavaScript or HTML, which makes it difficult to reliably extract from the response. It also makes spoofing easier to detect: the client-side key is now tied to a particular client (a specific user agent, perhaps even a specific cookie jar), so reuse in another client stands out, and the expiration limits the duration for which a spoofed key may be reused.

Solution 5 - Javascript

Assuming the app in question must be publicly open, you don’t have much choice:

Pick another way to demonstrate the power of your API. For example, write such an app and share its source, but don’t actually run that code. Make sure it’s well-documented though, so that anyone can deploy it and see it working (subject to throttling).

The app you run would need to be refactored to avoid client-side API requests and be rendered mostly on the server. You can still dogfood your API, just not in an obvious way: make secure requests to the throttle-free API from the server side.

Adjust rate limitation to allow for your app to work and invest into performance optimization to handle the load.

And yeah, have the core API throttle-free in the first place, and keep it inside a private network. Throttle in a separate publicly accessible layer.

Solution 6 - Javascript

Can you stand up a separate instance of the UI and throttle-free API, and then restrict access to IP addresses coming from your organisation?

E.g., deploy the whole thing behind your corporate firewall, and attach the application to the same database as the public-facing instance if you need to share data between instances.

Solution 7 - Javascript

You could try to generate a unique session ID, bound to a certain IP address/user and with a limited time to live. When a user downloads your application's front-end JavaScript, inject the generated session ID into the source code. The session ID is then attached to every request to your API, and the rate limit is lifted for it.

The ID cannot simply be copied for spoofing, because it is only valid for a single IP address and user, and only for a limited amount of time. So an adversary would have to call your page and extract the key from your JavaScript source, or intercept the Ajax request, every time a new user wants to use it.

Another Option:

Set up a proxy for your own application and use obfuscation. The Ajax requests to the proxy use different names from the real API calls, and the proxy translates them back. So your application would not call getDocument on your real API; instead it calls getFELSUFDSKJE on your proxy, and the proxy translates that back to getDocument and forwards it to the actual rate-limited API.

Your actual API will not rate-limit requests by the proxy.

And so that other people don't use your proxy for their own applications, you change the obfuscation scheme daily. The obfuscated call names can be generated automatically in your JavaScript source code and configured in the proxy.

A client wishing to abuse this would also need to keep up with your changing obfuscation to use your proxy. And you can still use referrer headers and similar for logging, so you can find people using your proxy, or catch them when the obfuscation scheme changes.

Solution 8 - Javascript

  • Whitelist source IP addresses

  • Use a VPN, whitelist VPN members

  • Proxy solution or browser addon that adds HTTP headers should be fine if you can secure the proxy and aren't concerned about MITM attacks sniffing the traffic

  • Any solution involving secrets can mitigate the impact of leaks by rotating secrets on a daily basis

Solution 9 - Javascript

Set up multiple accounts, and pick one of them at random on every request, or change which one you use every hour or so. This way you distribute the load over n accounts, giving you up to n times higher limits.

If this sort of account sharing isn't allowed for your customers and you try to detect it, be careful not to accidentally shut yourself down in the process.
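The account rotation could be sketched like this (shown round-robin for determinism, though the answer also suggests picking at random or switching hourly; the keys are placeholders):

```javascript
// Sketch of rotating across n API accounts: each request uses the next
// key in sequence, spreading load evenly over all accounts.
function makeKeyRotator(apiKeys) {
  let i = 0;
  return () => apiKeys[i++ % apiKeys.length];
}

// Placeholder credentials for three hypothetical accounts.
const nextKey = makeKeyRotator(['key-a', 'key-b', 'key-c']);
```

Each API request would then attach `nextKey()` as its credential, so no single account ever carries more than 1/n of the traffic.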

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Jason Waldrip | View Question on Stackoverflow
Solution 1 - Javascript | Kristján | View Answer on Stackoverflow
Solution 2 - Javascript | abligh | View Answer on Stackoverflow
Solution 3 - Javascript | jkdev | View Answer on Stackoverflow
Solution 4 - Javascript | Michael Aaron Safyan | View Answer on Stackoverflow
Solution 5 - Javascript | Anton Strogonoff | View Answer on Stackoverflow
Solution 6 - Javascript | user41871 | View Answer on Stackoverflow
Solution 7 - Javascript | Falco | View Answer on Stackoverflow
Solution 8 - Javascript | the8472 | View Answer on Stackoverflow
Solution 9 - Javascript | Filip Haglund | View Answer on Stackoverflow