Implementing AVAssetResourceLoaderDelegate: a How-To Guide

TL;DR

See the code samples for all this on GitHub.

Meat & Potatoes

I’m writing a podcast app — I’m calling it ‘sodes — both as a way to let off steam and so that I can have the fussy-casual podcast app I’ve always wanted. Most podcast apps pre-download a full queue of episodes before you listen to them, and offer settings to manage how many episodes are downloaded, how often, and what to do with them when finished. ‘Sodes will be streaming-only. I think managing downloads is an annoying vestigial trait from when iPods synced via iTunes. I only listen to a handful of podcasts, and never from a place that doesn’t have Internet access. I’d rather never futz with toggles and checkmarks or police disk usage.

Most other apps do have optional streaming-only modes which, as far as I know[1], are implemented as follows:

In other words, even though you may be using a streaming-only mode, your app might be downloading the episode twice. It’s a little sneaky, but it’s a perfectly sensible compromise. If the parallel download succeeds it means the current episode won’t need to be re-buffered during a future session. AVFoundation does not persist streaming buffers across app sessions. Since it’s not uncommon for a podcast MP3 to be encoded at ~60 megabytes an hour, resuming playback from a cached file can dramatically reduce data usage over time, especially if it takes several sessions for someone to finish listening to an episode.

I could use that same dual-download pattern with ‘sodes, but I wondered if it would be possible to eliminate the need for a parallel download without also having to re-download the same streaming buffer with every new app session. After some digging, I found an obscure corner of AVFoundation which will allow me to do exactly that. There’s a protocol called:

AVAssetResourceLoaderDelegate

It lets your code take the reins for individual buffer requests when streaming audio or video with an AVPlayer. When setting up an AVURLAsset to stream, you can set the asset’s resource loader’s delegate to a conforming class of your own:

let redirectUrl = {I’ll get into this below…}
let asset = AVURLAsset(url: redirectUrl)
asset.resourceLoader.setDelegate(self, queue: loaderQueue)

Your custom resource loader delegate is given an opportunity to handle each individual request for a range of bytes from the streamed asset, which means you could load that data from anywhere: from the network if you don’t already have the bytes, or by reading it from a local file if you do.

A proper implementation of AVAssetResourceLoaderDelegate is hard to get correct. The actual code you write needn’t be extraordinary. What’s hard is that the documentation is spotty, the protocol method names are misleading, the required url manipulation is bizarre, and the order of events at run-time isn’t obvious. There are still aspects of it that I don’t fully understand, but what follows is a record of what I’ve learned so far.

Note: there are portions of AVAssetResourceLoaderDelegate that are only applicable to streamed media that requires expiring forms of authentication. Those are outside the scope of this post since I don’t need to use them for streaming a podcast episode.

Basics of a Streaming Session

When you add an AVPlayerItem to an AVPlayer, the player prepares its playback pipeline. If that item’s asset points to a remotely-hosted media file, the player will want to acquire a sufficient buffer of a portion of that file so that playback can continue without stalling. The internal structure of the relationship between AVPlayer, AVPlayerItem, and AVURLAsset is not publicly exposed. But it is clear that AVPlayer fills its buffer with the help of AVURLAsset’s resourceLoader property, an instance of AVAssetResourceLoader. The resource loader is provided by AVFoundation and cannot be changed. The resource loader fulfills the AVPlayer’s requests for both content information about the media as well as requests for specific byte-ranges of the media data.

AVAssetResourceLoaderDelegate

AVAssetResourceLoader has an optional delegate property that must conform to AVAssetResourceLoaderDelegate. If your app provides a delegate for the resource loader, the loader will give its delegate an opportunity to handle all content info requests and data requests for its asset. If the delegate reports back that it can handle a given request, the resource loader relinquishes control of that request and waits for the delegate to signal that the request finished.

For our purposes, there are two delegate methods we need to implement:

func resourceLoader(_ resourceLoader: AVAssetResourceLoader, shouldWaitForLoadingOfRequestedResource loadingRequest: AVAssetResourceLoadingRequest) -> Bool

func resourceLoader(_ resourceLoader: AVAssetResourceLoader, didCancel loadingRequest: AVAssetResourceLoadingRequest)

The first method should return true if the receiver can handle the loading request. The method name is confusing at first glance since it’s written from the perspective of the resource loader (“should wait”) as opposed to the delegate (“can handle”), but it makes enough sense. The delegate is returning true if the resource loader should wait for the delegate to signal that the request has been completed. It is from inside this method that the delegate will kick off the asynchronous work needed to satisfy the request.

The second method is called whenever a loading request is cancelled. This is easy enough to reproduce. If you start playback from the beginning of a file, and then scrub far ahead into the timeline, there’s no longer a need to fill up the earlier buffer so the request for that initial range of data will be cancelled in order to spawn a new request starting from the scrubbed-to point.

Both delegate methods will be called on the dispatch queue you provide when setting the resource loader’s delegate:

asset.resourceLoader.setDelegate(self, queue: loaderQueue)

I recommend that you use something other than the main queue so that loading request work never competes with the UI thread. I also recommend using a serial queue so that you don’t have to juggle concurrent procedures within your delegate.

AVAssetResourceLoadingRequest

The AVAssetResourceLoadingRequest class represents either a request for content information about the asset or a request for a specific range of bytes in the asset’s remotely-hosted file. You can determine which kind of request it is by inspecting the following two properties:

var contentInformationRequest: AVAssetResourceLoadingContentInformationRequest?

var dataRequest: AVAssetResourceLoadingDataRequest?

If there is a non-nil content information request, then the loading request is a content info request. If there is a non-nil data request and if the content info request is nil, then the loading request is a data request. It’s crucial to note here that content info requests are always accompanied by a data request for the first two bytes of the file. The actual received bytes are not used by the resource loader.

My implementation of resourceLoader(shouldWaitForLoadingOfRequestedResource:) looks like this:

if loadingRequest.contentInformationRequest != nil {
    return handleContentInfoRequest(for: loadingRequest)
} else if loadingRequest.dataRequest != nil {
    return handleDataRequest(for: loadingRequest)
} else {
    return false
}

I perform the work specific to either kind of request in those two private convenience methods.

Content Info Requests

Handling a content info request is straightforward. Create a URLRequest for the original url using a GET verb and set the value of the byte range header to the loading request’s dataRequest’s byte range:

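// originalUrl is a placeholder name for the asset's real http(s) url;
// dataRequest is the unwrapped loadingRequest.dataRequest.
var request = URLRequest(url: originalUrl)
let lower = dataRequest.requestedOffset
// requestedOffset is an Int64 and requestedLength is an Int, so convert.
let upper = lower + Int64(dataRequest.requestedLength) - 1
let rangeHeader = "bytes=\(lower)-\(upper)"
request.setValue(rangeHeader, forHTTPHeaderField: "Range")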

You may wonder why I’m not using a HEAD request instead. I’m following Apple’s lead. Their engineers have their well-considered reasons. My educated guess is that if you request a byte range, the response header field Content-Range will contain a value for the expected content length of the entire file. This value wouldn’t be present in a HEAD response header. A range of two bytes is the smallest valid range, which helps avoid unnecessary data transfer.

Hang onto a strong reference to the loading request and the loading request’s contentInformationRequest. After receiving a response back from the server, you must update the content info request’s properties:

let infoRequest = loadingRequest.contentInformationRequest

infoRequest.contentType = {the content type, e.g. “public.mp3” for an MP3 file}

infoRequest.contentLength = {the expected length of the whole file from the Content-Range header, if present}

infoRequest.isByteRangeAccessSupported = {whether the server supports byte range access}
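
As a rough sketch of where the contentLength value could come from: a Content-Range response header looks like "bytes 0-1/61234567", and the number after the slash is the length of the whole file. The helper name totalLength(fromContentRange:) is my own invention for illustration, not part of AVFoundation or Foundation.

```swift
import Foundation

/// Extracts the total resource length from a Content-Range header value
/// such as "bytes 0-1/61234567". Returns nil when the value is malformed
/// or the server reports an unknown length ("bytes 0-1/*").
/// Hypothetical helper, named for illustration only.
func totalLength(fromContentRange value: String) -> Int64? {
    // The complete length is everything after the last "/".
    guard let slash = value.lastIndex(of: "/") else { return nil }
    let lengthText = value[value.index(after: slash)...]
        .trimmingCharacters(in: .whitespaces)
    // Int64("*") is nil, which covers the unknown-length case.
    guard let length = Int64(lengthText), length >= 0 else { return nil }
    return length
}

let exampleLength = totalLength(fromContentRange: "bytes 0-1/61234567")
```

The header value itself would come from the HTTPURLResponse your URLSession hands back. A nil result is a reasonable cue to finish the loading request with an error rather than guess at a length.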

Warning: do not pass the two requested bytes of data to the loading request’s dataRequest. This will lead to an undocumented bug where no further loading requests will be made, stalling playback indefinitely.

After updating those three values on the content info request, mark the associated loading request as finished:

loadingRequest.finishLoading()

If you get an error when trying to fetch the content info, mark the loading request as finished with an error:

loadingRequest.finishLoading(with: error)

While your delegate is handling the content info request, it is unlikely that any other requests will be started. Your request could be cancelled during this time if the player happens to cancel playback. Since you’re holding onto a strong reference to the loading request, you should take care to cancel any URLSessionTasks and relinquish references to the loading request when it’s cancelled as well as when it’s finished.

Assuming you fetched the content info successfully, calling finishLoading() will trigger the resource loader to follow up with the first genuine data request.

Data Requests

For a given asset, the resource loader will only make one content info request but will make one or more data requests (instances of AVAssetResourceLoadingDataRequest). If the host server does not support byte range requests, there will be one data request for the full file:

// dataRequest is optional, so unwrap it before inspecting it.
if let dataRequest = loadingRequest.dataRequest,
    dataRequest.requestsAllDataToEndOfResource {
    // It’s requesting the entire file, assuming
    // that dataRequest.requestedOffset is 0
}

The iTunes podcast registry will reject any podcast feed whose host server doesn’t support byte range requests. Thus in practice it’s probably hard to find a podcast host server that doesn’t support byte range requests. It’s not a terrible idea for a podcast-specific implementation of AVAssetResourceLoaderDelegate to always fail if you determine that the host server doesn’t support byte range requests. This will spare you the additional headache of handling the edge cases where either the full file is being requested or the length of the file exceeds the maximum length that can be expressed in an NSInteger using the current architecture (this can happen on 32-bit systems). See the documentation for AVAssetResourceLoadingDataRequest for more information about these edge cases.

Most of the time your data requests will be for a specific byte range:

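// Inside handleDataRequest(for:), which returns a Bool.
guard let dataRequest = loadingRequest.dataRequest else { return false }
let lower = dataRequest.requestedOffset
// requestedLength is an Int, so convert before adding to the Int64 offset.
let upper = lower + Int64(dataRequest.requestedLength) - 1
// upper is the last requested byte, so the range must be closed.
let range = lower...upper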

A simplistic implementation would make a GET request with the Range header set to the requested byte range, download the data using URLSessionDownloadTask, and pass the result to the loading request as follows:

// dataRequest is optional, hence the optional chaining.
loadingRequest.dataRequest?.respond(with: data)
loadingRequest.finishLoading()

A problem with this implementation is that the request doesn’t receive data progressively, but rather in one big bolus at the tail end of the URL task. The respond(with: data) method is designed to be called numerous times, progressively adding more and more data as it is received. AVPlayer will base its calculations about whether or not playback is likely to keep up on the rate at which data is passed to the data request via respond(with: data). For this reason, I recommend using a URLSession configured with a URLSessionDataDelegate and downloading the data with a URLSessionDataTask, so that the data delegate can pass chunks of data to the loading request’s data request as each chunk is received:

func urlSession(_ session: URLSession, dataTask: URLSessionDataTask, didReceive data: Data) {
    loadingRequest.dataRequest?.respond(with: data)
}

When the URLSessionDataTask finishes successfully or with an error, finish the loading request accordingly:

loadingRequest.finishLoading()
// or
loadingRequest.finishLoading(with: error)

If the user starts skipping or scrubbing around in the file, or if the network conditions change dramatically, the resource loader may elect to cancel an active request. Your delegate implementation should cancel any URLSessionTasks still in progress. In practice, requests can be started and cancelled in rapid succession. Failure to properly cancel network requests can degrade overall streaming performance very quickly.
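
One way to keep the cancellation bookkeeping honest is a small registry that pairs each loading request with its in-flight work. This is only a sketch: the Cancellable protocol and the InFlightWork class are hypothetical names of mine, and the class assumes it is only ever touched from the serial loader queue, so it does no locking.

```swift
import Foundation

/// Anything that can be cancelled. URLSessionTask already has a matching
/// cancel() method, so it could adopt this with an empty extension.
protocol Cancellable: AnyObject {
    func cancel()
}

/// Holds strong references to each request and its in-flight work,
/// keyed by the request's object identity. Hypothetical helper type.
final class InFlightWork<Request: AnyObject> {
    private var entries: [ObjectIdentifier: (request: Request, work: Cancellable)] = [:]

    var count: Int { return entries.count }

    /// Call when kicking off the network work for a request.
    func start(_ work: Cancellable, for request: Request) {
        entries[ObjectIdentifier(request)] = (request, work)
    }

    /// Call from the didCancel delegate method: cancels the work
    /// and releases both references.
    func cancel(_ request: Request) {
        entries.removeValue(forKey: ObjectIdentifier(request))?.work.cancel()
    }

    /// Call after finishing the request, with or without an error:
    /// releases both references without cancelling anything.
    func finish(_ request: Request) {
        entries.removeValue(forKey: ObjectIdentifier(request))
    }
}
```

In a real delegate, Request would be AVAssetResourceLoadingRequest and the work would be the URLSessionTask started for it.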

URL Manipulation

I’ve skipped over an important part of implementing AVAssetResourceLoaderDelegate. Your delegate will never be given an opportunity to handle a loading request if the AVURLAsset’s url uses an http or https url scheme. In order to get the resource loader to use your delegate, you must initialize the AVURLAsset using a url that has a custom scheme:

// URL(string:) returns an optional; force-unwrapping is fine for a literal.
let url = URL(string: "myscheme://example.com/audio.mp3")!
let asset = AVURLAsset(url: url)

What I recommend doing is prefixing the existing scheme with a custom prefix:

myschemehttps://example.com/audio.mp3

This is a non-destructive edit that can be removed later. Otherwise it would be more difficult to determine whether to use http or https when handling the loading request.

Your resource loader delegate implementation should check for the presence of your custom url scheme prefix when determining whether or not it can handle a loading request. If so, you’ll strip the prefix from the loading request’s url purely as an implementation detail, using the original url value when fulfilling the loading request. The resource loader doesn’t need to know that you’re manipulating the url in this way.
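
Here is a minimal sketch of the prefixing and stripping using URLComponents. The "myscheme" prefix and both function names are placeholders of mine; the only real requirement is that the resulting scheme is valid and isn’t http or https.

```swift
import Foundation

/// Placeholder prefix, chosen for illustration.
let customSchemePrefix = "myscheme"

/// "https://example.com/a.mp3" becomes "myschemehttps://example.com/a.mp3".
func addingCustomScheme(to url: URL) -> URL? {
    guard var components = URLComponents(url: url, resolvingAgainstBaseURL: false),
        let scheme = components.scheme else { return nil }
    components.scheme = customSchemePrefix + scheme
    return components.url
}

/// Reverses addingCustomScheme(to:). Returns nil when the prefix is absent
/// (or nothing follows it), which a delegate can treat as "not my request".
func removingCustomScheme(from url: URL) -> URL? {
    guard var components = URLComponents(url: url, resolvingAgainstBaseURL: false),
        let scheme = components.scheme,
        scheme.hasPrefix(customSchemePrefix),
        scheme.count > customSchemePrefix.count else { return nil }
    components.scheme = String(scheme.dropFirst(customSchemePrefix.count))
    return components.url
}
```

With this shape, the delegate can return false from the "should wait" method whenever removingCustomScheme(from:) returns nil, and otherwise use the returned url for the real network request.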

Warning: if you forget to modify the url scheme for the AVURLAsset, your delegate method implementations will never be called.

Special Implementation in ’sodes

My resource loader delegate for ’sodes will be optimized for podcast streaming. When it receives a data request from a resource loader, it will first check a locally-cached “scratch file” to see if any portions of the requested byte range have already been downloaded and written to the scratch file during a previous request. For any overlapping ranges, the pre-cached data will be passed to the data request. For all the gaps, the data will first be downloaded from the internet, and then both written to the scratch file and passed to the data request. In this way, I can download each byte only one time, even across multiple app sessions.[2]
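
The gap-finding step can be sketched as a pure function. This is not the code from ’sodes, just an illustration: the function name is hypothetical, ranges are half-open (upper bound excluded), and it assumes the list of cached ranges is already sorted and non-overlapping.

```swift
/// Returns the parts of `requested` that are not covered by `cached`.
/// Assumes `cached` is sorted by lowerBound and contains no overlaps.
/// Hypothetical helper, named for illustration only.
func gaps(in requested: Range<Int64>, notCoveredBy cached: [Range<Int64>]) -> [Range<Int64>] {
    var result: [Range<Int64>] = []
    // Everything before `cursor` is either outside the request or accounted for.
    var cursor = requested.lowerBound
    for range in cached {
        if range.upperBound <= cursor { continue }             // entirely behind the cursor
        if range.lowerBound >= requested.upperBound { break }  // entirely past the request
        if range.lowerBound > cursor {
            // Bytes between the cursor and this cached range are missing.
            result.append(cursor..<range.lowerBound)
        }
        cursor = range.upperBound
        if cursor >= requested.upperBound { break }
    }
    if cursor < requested.upperBound {
        // Trailing bytes after the last relevant cached range are missing.
        result.append(cursor..<requested.upperBound)
    }
    return result
}
```

Each returned gap becomes a network request; everything else in the requested range can be read straight from the scratch file.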

As byte ranges are downloaded, I write the data to the scratch file using NSFileHandle. If written successfully, I annotate the downloaded range in a plist stored in the same directory as the scratch file. The plist gets updated at regular intervals during a download session. I combine all the contiguous or overlapping downloaded byte ranges when generating the plist, so that it’s easy to parse for gaps in the scratch file when servicing future data requests. The plist is necessary because I am not aware of any facility in Foundation that can determine whether a given file contains ranges of “empty” data. Indeed, an “empty” range might not even contain all zeroes. I take great pains to ensure that the loaded byte range plist is only updated after the data has been successfully written to the scratch file. I’d rather err on the side of having data the plist doesn’t know about, rather than the plist reporting that there is a range of data that hasn’t actually been downloaded.
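
Combining contiguous or overlapping ranges is a standard sort-and-sweep. The sketch below is an illustration rather than the actual ’sodes code; the function name is hypothetical and the ranges are half-open, so two ranges that merely touch (0..<100 and 100..<200) are treated as contiguous.

```swift
/// Merges overlapping or contiguous byte ranges into the smallest
/// equivalent list, sorted by lowerBound. Empty ranges are dropped.
/// Hypothetical helper, named for illustration only.
func merged(_ ranges: [Range<Int64>]) -> [Range<Int64>] {
    var result: [Range<Int64>] = []
    let sorted = ranges.sorted { $0.lowerBound < $1.lowerBound }
    for range in sorted where !range.isEmpty {
        if let last = result.last, range.lowerBound <= last.upperBound {
            // Overlapping or touching: extend the previous range.
            let upper = max(last.upperBound, range.upperBound)
            result[result.count - 1] = last.lowerBound..<upper
        } else {
            // A gap precedes this range, so it starts a new entry.
            result.append(range)
        }
    }
    return result
}
```

Any holes left between the merged ranges are exactly the parts of the scratch file that still need downloading.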

GitHub

I’ve posted to GitHub a slightly modified version of the actual code that will go into ’sodes. You can see it here. It’s MIT licensed, so feel free to re-use any of it as you see fit. It is not intended for use as a re-usable framework since it’s heavily optimized for the needs of ’sodes. The Xcode project has a framework called SodesAudio which has all the AVFoundation-related code, including my resource loader delegate. There’s also an example host app that plays an episode of Exponent on launch. The example app has simple playback controls, and also a text view that prints out the loaded byte ranges that have been written to the scratch file. The ranges are updated as more data is received.


  1. If you make or know of an app that solves this problem in a different way, I’m anxious to hear about it. 

  2. If the user jumps around between multiple episodes, this will negate that effort. I could guard against this by keeping more than one scratch file around in the cache, but for now I’m only keeping a single scratch file around, so that I can minimize disk usage. Disk space tends to be more constrained on iOS devices than network bandwidth. 

|  3 Sep 2016




How to Pause and Resume Your App’s Audio in Response to Turn-by-Turn Notifications and Other Audio from Other Apps

I’m writing this down if only so I can reference it myself later. Maybe this will help you, dear reader, in some not-too-distant future. If you are making an iOS app that plays spoken-word audio (like, say, a podcast app) and you want your audio to pause when another app plays some audio (turn-by-turn directions notifications, etc.), then here is the TL;DR of what you need to do:

  1. Your app’s audio session should use an appropriate category, most likely: AVAudioSessionCategoryPlayback.

  2. Your app’s audio session should use the AVAudioSessionModeSpokenAudio mode, which signals to AVFoundation that your audio session is primarily spoken word content which should ideally be interrupted by certain categories of audio from other apps’ audio sessions.

  3. Your app should register for the .AVAudioSessionInterruption notification (contrary to the documentation, don’t assume it’s fired on the main queue; be safe). In your handler for the notification, first check the user info to see whether the value for AVAudioSessionInterruptionTypeKey is .began or .ended. If it’s .began, then your audio player is likely already paused. You should update any other model or UI state accordingly. If it’s .ended, then you should check the value for AVAudioSessionInterruptionOptionKey to see if it contains .shouldResume. If it does, you should resume playback. AVFoundation will not resume playback on your behalf.

Sadly, even if you do all these things, the final result is partially out of your control. The other app that triggered the interruption must configure its audio session with the option:

AVAudioSessionCategoryOptionInterruptSpokenAudioAndMixWithOthers

which signifies that it should mix with other audio sessions besides spoken audio, which should be interrupted. Apple’s Maps app does this. Google Maps does not. 😞

For more information, read this programming guide.

|  30 Jul 2016




The Value of Grey Thinking

This idea should be taught as part of a standard college, or even high school, curriculum:

But the fact is, reality is all grey area. All of it. There are very few black and white answers and no solutions without second-order consequences.

This fundamental truth is easy to grasp in theory and hard to use in practice, every day. It takes a substantial deprogramming to realize that life is all grey, that all reality lies on a continuum. This is why quantitative and scale-based thinking is so important. But most don’t realize that quantitative thinking isn’t really about math; it’s about the idea that The dose makes the poison.

Excerpted from The Value of Grey Thinking at Farnam Street Blog.

|  5 Jul 2016




How to Make a Sticker Pack for iOS 10 Messages

TL;DR - No programming needed.


Screenshot of some stickers from the Whisper app Jamin Guy and I used to make for App.net. It’d be fun to be able to use these again.

  1. Make your sticker images.
  2. Open Xcode, choose the Sticker Pack template project.
  3. Drag and drop your images into the default Sticker Pack asset folder.
  4. Name your images and give them accessibility descriptions.
  5. Choose whether you want the layout to use a small, medium, or large grid presentation.[1]
  6. Done.[2]

  1. This doesn’t seem to affect the actual size of the stickers in the chat area or in the grid, only whether the grid is spaced for small, medium, or large stickers (e.g. 4, 3, or 2 across on my iPhone 6s Plus). This might change in future betas. 

  2. Okay, you’ll also need to make some new app icons. There are about a dozen new app icon sizes needed for Messages apps. But that’s just more design work, not programming. 

|  21 Jun 2016




Apple TV Focus Animation Coordinator Bug

If you know how to fix this, please ping me on Twitter.

Since tvOS 9.2, the addCoordinatedAnimations(_:completion:) method of UIFocusAnimationCoordinator has exhibited a strange behavior. Assume you have a collection view whose cells use either affine or 3D scale transforms to scale-up their contents when focused:

override func didUpdateFocusInContext(context: UIFocusUpdateContext, withAnimationCoordinator coordinator: UIFocusAnimationCoordinator) {
    coordinator.addCoordinatedAnimations({
        if self.focused {
            self.focusEffectsContainer.transform = CGAffineTransformMakeScale(1.158, 1.158)
        } else {
            self.focusEffectsContainer.transform = CGAffineTransformIdentity
        }
    }, completion: nil)
}

If you scroll through the collection view very quickly, cells that acquire and then immediately lose focus before being scrolled into the visible bounds will appear to be scaled-down for a few moments before returning to their identity transform states.

Check out a video of the bug here. You can also download the sample project here.

My best guess is that UIFocusAnimationCoordinator’s addCoordinatedAnimations method is using some unclamped animation parameters, which causes the return to an identity transform to over-downscale (like a bouncy animation). This, in conjunction with the (new since 9.2) delay applied to coordinated animations for offscreen views, might be causing the shrunken appearance of the cells awaiting the rest of their unfocusing animations.

I can “resolve” the issue by making nested UIView animation block calls from inside the addCoordinatedAnimations animation block argument, passing animation options that override the inherited curve and duration. But this results in sloppy looking animations all the time, even though it resolves the fast-scrolling transform edge case. I’d prefer to find a workaround that continues to use addCoordinatedAnimations the way it was intended.

|  14 Jun 2016