The Software Supply Chain's Achilles Heel

The pains of token leaks

Mar 03, 2023

What is the most common and problematic attack vector that haunts software engineers?

Simple old Token Leaks.

Nothing sexy, eh? Just someone stealing a thing, and using embarrassingly simple techniques to do so (we'll get into how later on).

So what are tokens? You can think of them as passwords for computers.

API tokens have become an essential part of modern software development, especially when it comes to building applications that need to interact with other services. In simple terms, an API token is a unique identifier generated by a service provider, which allows developers to authenticate themselves and access specific resources or perform certain programmatic actions. They are hard for humans to memorize because they are long and bear no discernible relation to human words, save for a small identifier often placed at the start of the string.

For example, a GitHub Personal Access Token:

ghp_a9J5h2o2g2yZ0pOlBfXz8iGb9UJgOr09p9Ik

API tokens are typically used in social media platforms, public services (weather reports), and many types of common cloud services. They are usually required for accessing protected resources or performing actions that require authorization.

The process of obtaining an API token may vary depending on the service, but it typically involves registering as a developer, creating an API key, and configuring access permissions. Once you have the API token, you can include it in your application's requests to authenticate and authorize the API calls.

There are different types of API tokens, including OAuth tokens and JSON Web Tokens (JWTs), as well as tokens exchanged through the Simple Authentication and Security Layer (SASL) framework. OAuth tokens are widely used for third-party authentication and authorization, while JWTs are becoming increasingly popular for web applications due to their compact size, ease of use, and ability to carry additional metadata. SASL mechanisms, on the other hand, are typically used by email and instant messaging protocols such as SMTP, IMAP, and XMPP.
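To make the "additional metadata" point concrete, here is a minimal sketch of reading the claims a JWT carries. The demo token is built inline with a placeholder signature, and the helper name `decodeJwtClaims` and the claim values (`sub`, `scope`, `exp`) are assumptions for illustration only. It decodes but does not verify the signature, so the output should never be trusted for authorization.

```javascript
// Hypothetical helper: decode (NOT verify) the header and payload of a JWT.
function decodeJwtClaims(token) {
    const parts = token.split('.');
    if (parts.length !== 3) {
        throw new Error('Expected three dot-separated segments');
    }
    // Each of the first two segments is base64url-encoded JSON.
    const decode = (segment) =>
        JSON.parse(Buffer.from(segment, 'base64url').toString('utf8'));
    return { header: decode(parts[0]), payload: decode(parts[1]) };
}

// Build a demo token so the example is self-contained (signature is a placeholder).
const encode = (obj) => Buffer.from(JSON.stringify(obj)).toString('base64url');
const demoToken = [
    encode({ alg: 'HS256', typ: 'JWT' }),
    encode({ sub: 'user-123', scope: 'read:data', exp: 1700000000 }),
    'placeholder-signature'
].join('.');

const claims = decodeJwtClaims(demoToken);
console.log(claims.payload.scope);
```

Anyone holding the token can read these claims in the same way, which is one more reason a leaked JWT is a problem even before it is replayed.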

If a token is leaked and obtained before it expires, a hacker can access all the resources that token has been granted.

Let’s take a look at some recent examples:

Twitter Hack (2020) - In July 2020, attackers used social engineering to obtain employee credentials and gain access to Twitter's internal administrative tools. They used that access to take over high-profile Twitter accounts, including those belonging to Barack Obama, Elon Musk, and Bill Gates, and posted tweets promoting a Bitcoin scam. The attackers collected over $100,000 worth of Bitcoin before the hack was stopped.
Facebook/Cambridge Analytica Scandal (2018) - In 2018, it was revealed that the political consulting firm Cambridge Analytica had obtained the personal data of millions of Facebook users without their consent. This was made possible by a third-party quiz app whose access to Facebook's API allowed it to harvest data not only from the users who installed it but also from their friends, without those friends' permission.
Equifax Data Breach (2017) - In 2017, Equifax, one of the largest credit reporting agencies in the world, suffered a massive data breach that exposed the personal information of over 143 million consumers. The breach was caused by an unpatched vulnerability in the Apache Struts framework used by one of Equifax's web applications, which allowed hackers to get into its systems and steal sensitive data.
Uber Data Breach (2016) - In 2016, Uber suffered a data breach that exposed the personal information of over 57 million users and drivers. The breach was caused by AWS access keys left in a private GitHub repository, which allowed hackers to access Uber's cloud storage and steal sensitive data.
GitHub OAuth Token Theft (2022) - In April 2022, GitHub disclosed that attackers had used stolen OAuth user tokens issued to cloud provider Heroku and CI service Travis CI to download data, including private repositories, from dozens of organizations.

So what protections are in place?

Tokens are typically sent to an API service over HTTP in a request header. A common header to use is Authorization, as outlined in RFC 6750:

GET /resource HTTP/1.1
Host: server.example.com
Authorization: Bearer eyJhbGciOiJIUzI1NiIXVCJ9TJV...r7E20RMHrHDcEfxjoYZgeFONFh7HgQ

This is an acceptable approach as long as the transport is HTTPS, which prevents the token from being sniffed on a cleartext HTTP connection.
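One way to make the HTTPS requirement hard to forget is to refuse to attach a token to anything else. The sketch below assumes a hypothetical helper named `buildAuthorizedRequest` and a placeholder token; it only prepares the request options and does not send anything.

```javascript
// Hypothetical helper: build fetch options with a Bearer token,
// refusing any URL that is not HTTPS.
function buildAuthorizedRequest(url, token) {
    const parsed = new URL(url); // throws on malformed or relative URLs
    if (parsed.protocol !== 'https:') {
        throw new Error(`Refusing to send a token over ${parsed.protocol}`);
    }
    return {
        url: parsed.toString(),
        options: { headers: { Authorization: `Bearer ${token}` } }
    };
}

// 'demo-token' is a placeholder, not a real credential.
const request = buildAuthorizedRequest('https://server.example.com/resource', 'demo-token');
console.log(request.options.headers.Authorization);
```

The returned `url` and `options` can then be passed straight to `fetch(request.url, request.options)`.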

Another common method is to deliver the token in a cookie, using the Set-Cookie response header with the HttpOnly and Secure attributes.

HttpOnly, despite its confusing name, does not mean "only use HTTP". It is an attribute which enforces that a cookie cannot be accessed through client-side script (such as JavaScript). The cookie and its value are stored within a protected space, unlike browser stores such as Local or Session Storage. Even if a cross-site scripting (XSS) flaw exists, and a user accidentally follows a link that exploits this flaw, the browser will not reveal the cookie to the injected script. Secure is an attribute that tells the browser to send the cookie only over HTTPS.
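As a rough illustration of what the server sends, the sketch below formats a Set-Cookie header value carrying both attributes. The helper name `buildSessionCookie`, the cookie name, and the 15-minute Max-Age are assumptions chosen for the example; a real application would normally let its web framework build this header.

```javascript
// Hypothetical helper: format a Set-Cookie header value for a session token.
function buildSessionCookie(name, value, maxAgeSeconds) {
    return [
        `${name}=${encodeURIComponent(value)}`,
        `Max-Age=${maxAgeSeconds}`,
        'Path=/',
        'HttpOnly', // not readable from client-side script
        'Secure'    // only sent over HTTPS
    ].join('; ');
}

// 'demo-token' is a placeholder value, not a real credential.
const setCookieValue = buildSessionCookie('session', 'demo-token', 900);
console.log(setCookieValue);
```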

Most often, code will tell the browser to include the authentication cookie in the final request. For example, with the commonly used JavaScript HTTP client library axios, simply setting withCredentials: true will result in the browser delivering the token on your behalf.

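import axios from 'axios';

// withCredentials asks the browser to attach cookies for the target origin
await axios.get('https://example.com/api/v1/auth/verify', {
    withCredentials: true
});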

API tokens can be either permanent or temporary. Permanent tokens remain valid until they are explicitly revoked or deactivated, while temporary tokens have an expiration date and are automatically invalidated after a specific time. This makes temporary tokens a more secure option for applications that require a high level of security. Generally, the shorter-lived the token, the better.
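The difference between the two kinds of token comes down to two checks: has it been revoked, and has it expired? Here is a minimal in-memory sketch of both, assuming hypothetical helpers `issueToken` and `isTokenValid` and an illustrative 15-minute lifetime. A real service would persist this state and compare tokens in constant time.

```javascript
const { randomBytes } = require('node:crypto');

// Hypothetical helper: issue a random token that expires ttlMs after `now`.
function issueToken(ttlMs, now = Date.now()) {
    return {
        value: randomBytes(32).toString('hex'),
        expiresAt: now + ttlMs
    };
}

// A token is valid only if it has not been revoked and has not expired.
function isTokenValid(token, revokedValues, now = Date.now()) {
    return !revokedValues.has(token.value) && now < token.expiresAt;
}

const revokedValues = new Set();
const shortLived = issueToken(15 * 60 * 1000, 0); // 15-minute lifetime, clock starts at 0

console.log(isTokenValid(shortLived, revokedValues, 60 * 1000));      // one minute in
console.log(isTokenValid(shortLived, revokedValues, 16 * 60 * 1000)); // after expiry
```

A permanent token is the same record without `expiresAt`, which leaves revocation as the only way to stop a leaked copy from working.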

So what is the issue then?

Well, it’s a human problem.

Looking at most token leaks, a human is typically at fault. Two of the most common causes are plain old phishing and hard-coding tokens in source code.

You can have all the protections in the world, including MFA, but if a human is tricked into granting access or into posting tokens publicly, it's game over. And by humans we are not talking about the non-technical; we are talking about software engineers.

Here is a classic example, where CircleCI had to warn users not to fall for a phishing campaign. Emails were sent out en masse to CircleCI users telling them an action was required in relation to a change in privacy policy.

The email looks fairly innocent on the face of it; however, clicking on the link granted an OAuth2 token to the attackers, which in turn meant they could access private resources, such as a company's codebase.

This was not helped by CircleCI not having DMARC enabled, which allowed attackers to spoof the sender address as circleci.com (something they have since fixed).

The other issue is hardcoded tokens.

Tokens (and secrets in general) are hardcoded into source code so that the code can access a service of some sort. This happens when someone is developing prototype code and mistakenly pushes it to a repository before sanitizing it, or when the developer is simply unaware of the security risks. For a roundup of how widespread this can be, check out GitGuardian's State of Secrets Sprawl report.

Here is a common example using Node's fetch HTTP client:

fetch('https://example.com/api/data', {
    headers: {
        // Hard-coded authentication token (don't do this)
        'Authorization': 'Bearer 1234567890abcdefg'
    }
})

How to address this issue?

First, use an environment variable set on your local machine:

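// Read the token from the environment instead of the source code
const apiKey = process.env.API_KEY;
if (!apiKey) {
    throw new Error('API_KEY environment variable is not set');
}

fetch('https://example.com/api/data', {
    headers: {
        'Authorization': `Bearer ${apiKey}`
    }
})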

If your code or application is run somewhere else, use the secrets feature that GitHub and others provide. These let you set environment variables in secure stores that are not exposed to external systems.

Another option is to restrict where a token can be used from, such as the originating IP address (or subnet). This way access is not wide open to anyone who holds the token, but to do this, you first need the provider to offer that feature.
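Providers that offer this usually do the check on their side, but the logic is simple enough to sketch. The example below tests whether an IPv4 address falls inside an allowed subnet written in CIDR notation. The helper names and the subnet (a reserved documentation range) are assumptions for illustration, and IPv6 is not handled.

```javascript
// Convert a dotted-quad IPv4 address to an unsigned 32-bit integer.
function ipv4ToInt(ip) {
    const parts = ip.split('.');
    const valid = parts.length === 4 &&
        parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
    if (!valid) {
        throw new Error(`Invalid IPv4 address: ${ip}`);
    }
    return parts.reduce((acc, p) => acc * 256 + Number(p), 0);
}

// Hypothetical helper: is `ip` inside the `cidr` block (e.g. "203.0.113.0/24")?
function ipInCidr(ip, cidr) {
    const [base, bitsText] = cidr.split('/');
    if (!/^\d{1,2}$/.test(bitsText) || Number(bitsText) > 32) {
        throw new Error(`Invalid prefix length in: ${cidr}`);
    }
    // Two addresses share a block if they match once the host bits are dropped.
    const blockSize = 2 ** (32 - Number(bitsText));
    return Math.floor(ipv4ToInt(ip) / blockSize) === Math.floor(ipv4ToInt(base) / blockSize);
}

const allowedSubnet = '203.0.113.0/24'; // stands in for an office or CI subnet
console.log(ipInCidr('203.0.113.42', allowedSubnet));
console.log(ipInCidr('198.51.100.7', allowedSubnet));
```

With a rule like this in place, a token stolen from a laptop is of little use to an attacker connecting from elsewhere.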

Ultimately leaked tokens are often difficult to detect, so it’s worthwhile instilling good habits. There are likely thousands of compromises that have never been detected. In many cases, companies do not even know that their tokens have been compromised until it is too late.
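One such habit is scanning code for token-shaped strings before it leaves your machine. The sketch below looks for classic GitHub personal access tokens, relying on the `ghp_` prefix followed by 36 alphanumeric characters. The helper name `findLeakedTokens` is hypothetical, the pattern covers only this one token type, and dedicated secret scanners cover far more.

```javascript
// Classic GitHub personal access tokens: "ghp_" plus 36 alphanumeric characters.
const GITHUB_PAT_PATTERN = /\bghp_[A-Za-z0-9]{36}\b/g;

// Hypothetical helper: report 1-based line numbers and redacted matches.
function findLeakedTokens(sourceText) {
    const findings = [];
    sourceText.split('\n').forEach((line, index) => {
        for (const match of line.matchAll(GITHUB_PAT_PATTERN)) {
            // Keep only the first 8 characters so the report itself leaks nothing.
            findings.push({ line: index + 1, token: `${match[0].slice(0, 8)}...` });
        }
    });
    return findings;
}

// The second line contains the example token shown earlier in this post.
const sample = [
    'const apiKey = process.env.API_KEY;',
    "const oops = 'ghp_a9J5h2o2g2yZ0pOlBfXz8iGb9UJgOr09p9Ik';"
].join('\n');

const findings = findLeakedTokens(sample);
console.log(findings);
```

Run against staged files in a pre-commit hook, a check like this catches the mistake before the token ever reaches a repository.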

What is the real solution to this quandary? We don't know yet. I have been noodling on whether something like zero-knowledge (ZK) proofs could be used, so that the whole token never needs to be revealed.

Have any more ideas? I would love to hear them.

decodebytes
