Advertisement Playback Proof Protocol (A3P)

Overview

The Advertisement Playback Proof Protocol (A3P for short) is a simple-to-implement protocol that aims to prevent user-side scripts, tools or other modifications (e.g. content or ad blockers, DNS sinkholes, "patched" applications) from blocking the display or playback of media such as advertisements, warnings or chaptered content in the form of videos or images. It does this by injecting special tokens into the media stream and requiring users (or rather clients) to calculate a hash over the entire media and submit it for verification. A client that passes verification receives a playback token, signed by the server with a secret key known only to the server.

Background

Modern video and audio streaming platforms such as YouTube and Spotify rely on ad revenue, and users have found various ways to avoid ad playback. Ad-powered sites have fought the circumvention of advertisement display and playback (ad viewing) for years. Traditionally, they try to detect client-side modifications in the client's code, or try to load resources that content blockers, proxies or DNS filters usually block. A possible alternative would be to implement a Digital Rights Management (DRM) solution; however, that creates its own challenges, such as compatibility and cost.

Goals

Prevent abuse of client-side playback of media
Verify that the user has actually streamed (or loaded) the entirety of a given advertisement
Prevent skipping segments or speeding up playback in the case of video or audio
Minimize the overhead for legitimate users
Ability to work when the client is running in a computationally limited environment (e.g. older, slower hardware)
Support for current media streaming protocols
Simple implementation on both server and client side

Threat Model

Threat	Protection mechanism
User skips ahead	Time-locked token sequence
User uses adblocker/DNS sinkhole	Tokens hidden in media chunks, session-specific
Token replay/sharing	Session binding via signature
Fake playback scripts	Playback speed tracking and chunk-by-chunk validation

Components

0. Prerequisites

The user has to be uniquely identified by a session mechanism (for example, a session cookie, bearer token or something similar)
The advert or other similar media needs to have a unique identifier associated with it
The target media that should be accessible only after the playback of the advert needs to have a unique identifier associated with it

1. Playback Initialization

When the client initially requests access to a piece of media for playback or display, the server generates a signature. The signature should be bound to any application specific data that uniquely identifies the actual playback:

a unique identifier for the currently authenticated user,
the prerequisite advert's unique identifier,
the target media's unique identifier,
the number of verification tokens that will be injected into the stream.

This data has to be hashed, together with a server-side secret, using a hashing algorithm of your choice to obfuscate its contents; the result is used as the signature. The reason for hashing instead of using a format like JSON Web Tokens is to prevent the client from reading the contents (in particular the token count) and ending the stream prematurely once all tokens have been received. The goal is to force the client into streaming all of the video chunks to collect all verification tokens.

Example implementation of the signature:

server_secret = "my secret string"
timestamp = get_unix_epoch()
signature = hash(user_id + advert_id + media_id + token_count + timestamp + server_secret) + "." + timestamp
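
The pseudocode above can be made concrete in several ways. The following is a minimal Python sketch, assuming SHA-256 as the "hashing algorithm of your choice" and a "|" delimiter between fields (neither is mandated by the protocol); the function name make_signature and the sample identifiers are hypothetical.

```python
import hashlib
import time
from typing import Optional


def make_signature(user_id: str, advert_id: str, media_id: str,
                   token_count: int, server_secret: str,
                   timestamp: Optional[int] = None) -> str:
    """Return "<hex digest>.<timestamp>", mirroring the pseudocode above."""
    if timestamp is None:
        timestamp = int(time.time())  # Unix epoch in whole seconds
    # Assumption: fields are joined with "|" so that ("ab", "c") and
    # ("a", "bc") cannot collapse into the same input string.
    material = "|".join(
        [user_id, advert_id, media_id, str(token_count), str(timestamp), server_secret]
    )
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()
    return f"{digest}.{timestamp}"


# Fixed timestamp only so that the example is reproducible.
signature = make_signature("user-42", "ad-7", "video-99", 5,
                           "my secret string", timestamp=1700000000)
print(signature)
```

The timestamp stays readable after the dot so that the server can later recompute the digest and check how much time has passed.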

The server should send the hashed signature back to the client in an easily readable format.

2. Ad Streaming

The advertisement should be streamed in chunks of random length, determined by the server before streaming or on the fly. Each real media chunk should be followed by a new verification token.
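
One way to pick random chunk lengths before streaming is sketched below in Python. It treats the advert as an opaque byte string, which is a simplification: real streaming protocols usually split media at segment or keyframe boundaries. The function name split_into_random_chunks is hypothetical.

```python
import random


def split_into_random_chunks(media: bytes, token_count: int,
                             rng: random.Random) -> list:
    """Split media into token_count chunks of random, non-zero length."""
    if not 1 <= token_count <= len(media):
        raise ValueError("token_count must be between 1 and len(media)")
    # Pick token_count - 1 distinct cut points strictly inside the media.
    cuts = sorted(rng.sample(range(1, len(media)), token_count - 1))
    bounds = [0, *cuts, len(media)]
    return [media[start:end] for start, end in zip(bounds, bounds[1:])]


media = bytes(range(100))  # stand-in for real advert bytes
chunks = split_into_random_chunks(media, 5, random.Random())
print([len(chunk) for chunk in chunks])
```

Because each chunk is followed by exactly one token, the number of chunks equals the token count that was bound into the signature.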

A verification token should consist of the signature with the token's sequence number appended to the end. The token should be base64 encoded. Example for the first token:

token_1 = base64(signature + "." + "1")
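
As an illustration, the token construction and the interleaving of media chunks with tokens could look like the Python sketch below. The ("media", ...) and ("token", ...) frame tags are an assumption made for this example; the protocol does not prescribe how a client tells tokens apart from media. The signature value is a placeholder, not a real digest.

```python
import base64


def make_token(signature: str, index: int) -> str:
    """base64(signature + "." + index), with index starting at 1."""
    return base64.b64encode(f"{signature}.{index}".encode("utf-8")).decode("ascii")


def interleave(chunks, signature):
    """Yield each media chunk followed by its verification token."""
    for index, chunk in enumerate(chunks, start=1):
        yield ("media", chunk)
        yield ("token", make_token(signature, index))


signature = "3f2a.1700000000"  # placeholder value for illustration only
stream = list(interleave([b"aaa", b"bb"], signature))
for kind, payload in stream:
    print(kind, payload)
```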

When the client detects that the received chunk is not media but a token, it should hash the bytes of the most recently received media chunks, append the token in plain text, then base64 encode the result and save the new token for later use, as in the following example:

previous_tokens = ["token1", "token2", ...]

encoded_token = base64(hash(media_chunks_bytes) + "." + token)
previous_tokens.append(encoded_token)
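
A runnable Python sketch of this client-side step follows. It assumes that frames arrive tagged as ("media", bytes) or ("token", str), that SHA-256 is the chosen hash, and that "the last media chunks" means the media received since the previous token; all three are interpretations for the sake of the example, not requirements of the protocol.

```python
import base64
import hashlib


def collect_tokens(stream):
    """Consume tagged frames and return the encoded tokens to send back."""
    hasher = hashlib.sha256()
    collected = []
    for kind, payload in stream:
        if kind == "media":
            hasher.update(payload)  # accumulate media bytes since the last token
        else:
            digest = hasher.hexdigest()
            encoded = base64.b64encode(f"{digest}.{payload}".encode("utf-8"))
            collected.append(encoded.decode("ascii"))
            hasher = hashlib.sha256()  # start fresh for the next chunk
    return collected


frames = [("media", b"aaa"), ("token", "T1"), ("media", b"bb"), ("token", "T2")]
for token in collect_tokens(frames):
    print(token)
```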

Inspired by JSON Web Tokens, a dot is used as a delimiter to ease parsing later on.

3. Verification

When the client detects that the server has ended the stream, it has to send all of the buffered verification tokens back to the server. The method of communication is up to the developer of the application. The verification server then has to verify each token sent by the client by repeating the calculations the client performed.

The server has to:

Identify the target media and advertisement (e.g. via URL parameters or as fields in the payload) and verify that both are valid.
Identify the user that is associated with the session (e.g. via cookie or JWT) and verify their validity (authentication) and access to the target media (authorization).
Require the client to send the original signature.
Reconstruct the signature and individual verification tokens based on the identifiers and the signature sent by the client and verify that they match.
Verify that the difference between the signature's timestamp and the server's current time is at least the advert's duration in seconds, allowing a margin of error of a few seconds (e.g. 3 seconds) to account for rounding differences between implementations, and no more than an upper bound deemed reasonable by the application developer (e.g. 60 minutes).
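
The checks listed above could be combined roughly as in the following Python sketch. It assumes SHA-256 with "|"-joined fields for the signature, that the server still knows (or can regenerate) the exact media chunks it streamed, and that the token count equals the number of chunks. The function names and default values are hypothetical.

```python
import base64
import hashlib
import hmac


def _b64(text):
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def _same(a, b):
    # Constant-time comparison of two strings.
    return hmac.compare_digest(a.encode("utf-8"), b.encode("utf-8"))


def sign(user_id, advert_id, media_id, token_count, timestamp, secret):
    # Assumed construction: SHA-256 over the "|"-joined fields.
    material = "|".join([user_id, advert_id, media_id,
                         str(token_count), str(timestamp), secret])
    return f"{hashlib.sha256(material.encode('utf-8')).hexdigest()}.{timestamp}"


def expected_tokens(signature, chunks):
    """Recompute what an honest client would send back for each chunk."""
    tokens = []
    for index, chunk in enumerate(chunks, start=1):
        server_token = _b64(f"{signature}.{index}")
        tokens.append(_b64(f"{hashlib.sha256(chunk).hexdigest()}.{server_token}"))
    return tokens


def verify(signature, client_tokens, *, user_id, advert_id, media_id, chunks,
           secret, ad_duration, now, margin=3, max_age=3600):
    try:
        timestamp = int(signature.rpartition(".")[2])
    except ValueError:
        return False
    # 1. The signature must match what the server would have issued.
    if not _same(signature, sign(user_id, advert_id, media_id,
                                 len(chunks), timestamp, secret)):
        return False
    # 2. Enough time must have passed to play the advert, but not too much.
    elapsed = now - timestamp
    if elapsed < ad_duration - margin or elapsed > max_age:
        return False
    # 3. Every token must match, in order.
    expected = expected_tokens(signature, chunks)
    return len(client_tokens) == len(expected) and all(
        _same(got, want) for got, want in zip(client_tokens, expected))


chunks = [b"first chunk", b"second chunk"]
sig = sign("user-42", "ad-7", "video-99", len(chunks), 1700000000, "my secret string")
tokens = expected_tokens(sig, chunks)  # what an honest client would submit
common = dict(user_id="user-42", advert_id="ad-7", media_id="video-99",
              chunks=chunks, secret="my secret string", ad_duration=30)
print(verify(sig, tokens, now=1700000030, **common))  # True: full advert watched
print(verify(sig, tokens, now=1700000010, **common))  # False: submitted too early
```
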

Should any of these conditions fail, the server has to reject the request and deny access to the target media. If a client fails verification multiple times (the exact number is left up to the implementer), rate limiting should be applied to that client in order to prevent brute forcing of the verification.
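
The rate limiting mentioned above can take many forms. As one possible sketch in Python, a sliding-window counter of failed verifications per session is shown below; the class name FailureLimiter and the thresholds (5 failures within 600 seconds) are hypothetical values chosen for illustration.

```python
import time
from collections import defaultdict, deque


class FailureLimiter:
    """Block a session after max_failures failed verifications within window seconds."""

    def __init__(self, max_failures=5, window=600.0):
        self.max_failures = max_failures
        self.window = window
        self._failures = defaultdict(deque)  # session id -> failure timestamps

    def record_failure(self, session_id, now=None):
        now = time.monotonic() if now is None else now
        self._failures[session_id].append(now)

    def is_blocked(self, session_id, now=None):
        now = time.monotonic() if now is None else now
        attempts = self._failures[session_id]
        while attempts and now - attempts[0] > self.window:
            attempts.popleft()  # forget failures older than the window
        return len(attempts) >= self.max_failures


limiter = FailureLimiter(max_failures=3, window=60.0)
for moment in (0.0, 1.0, 2.0):
    limiter.record_failure("session-abc", now=moment)
print(limiter.is_blocked("session-abc", now=3.0))    # blocked: 3 recent failures
print(limiter.is_blocked("session-abc", now=100.0))  # not blocked: failures expired
```

A server would call is_blocked before running verification and record_failure whenever verification rejects a request.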

Considerations

The more tokens the server injects into the stream, the less likely it is that the client can successfully guess when the last token was or will be injected. Implementers have to take performance into consideration when determining the upper limit of tokens to inject into any given stream: the shorter the advert and the more tokens are injected, the more likely buffering issues become on the client side, especially on devices with poor performance. One solution is to set the lower and upper bounds for the random number of verification tokens programmatically, based on the length of the advert and possibly on other information that reveals more about the client's computational capabilities.
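
As a rough illustration of deriving the bounds from the advert's length, the Python sketch below aims for at most one token per two seconds of media, with a floor of 3 and a ceiling of 30 tokens. These numbers and the function names are hypothetical and would need tuning for a real deployment; client capability is not modelled here.

```python
import random


def token_count_bounds(ad_seconds, min_spacing=2.0, floor=3, ceiling=30):
    """Return (lower, upper) bounds for the number of tokens in one advert."""
    # Aim for at most one token per min_spacing seconds, capped at ceiling,
    # but never fewer than floor tokens even for very short adverts.
    upper = max(floor, min(ceiling, int(ad_seconds // min_spacing)))
    lower = max(floor, upper // 2)
    return lower, upper


def pick_token_count(ad_seconds, rng=random):
    lower, upper = token_count_bounds(ad_seconds)
    return rng.randint(lower, upper)  # inclusive on both ends


print(token_count_bounds(5))    # (3, 3)
print(token_count_bounds(30))   # (7, 15)
print(token_count_bounds(120))  # (15, 30)
```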

Benefits

Strong resistance to traditional methods of circumvention of display or playback of adverts
Crowdsourced, cached or reused tokens are ineffective
Performance overhead should be minimal on modern client devices
Could be combined with other solutions like DRM or signed URLs

Limitations

Increased development time and complexity on both client and server side
Extra latency for verification before the target media is accessible

References

US Patent US7836511B2