
PFSeq is a precision-timing audio sequencer module for Android. It is useful for music apps that require precise, rhythmic audio timing.
While working on the Sample Metronome app, I found that precise audio timing is a challenge, because it is difficult to guarantee that code will execute at an exact time. Trying to control sound timing by sending a play command with a timer produced unsatisfactory results. I tried the Android SDK audio player classes MediaPlayer, SoundPool, and AudioTrack, which is the high-performance, low-level one. Even with the target times being calculated correctly, the audible sound would not always play exactly on its mark. The deviations were small, less than 10 milliseconds by one measurement, but the ear can notice them, and they would be worse on some devices.
People solve this by running an AudioTrack in streaming mode and measuring out the audio data written to the stream buffer. Silence can be written to the stream in between audio items.
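For illustration, here is a minimal sketch of that streaming-mode approach using the standard AudioTrack API (this is not PFSeq’s internal code; the clip data and silence length are assumed inputs):

import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioTrack;

public class StreamingSketch {
    static final int SAMPLE_RATE = 44100;

    // Write measured-out silence, then a clip, to a streaming AudioTrack.
    // Because every silent frame is actually written, the clip lands at an
    // exact frame offset instead of "whenever a play command happens to run".
    public static void play(short[] clipPcm, int framesOfLeadingSilence) {
        int minBuf = AudioTrack.getMinBufferSize(SAMPLE_RATE,
                AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, SAMPLE_RATE,
                AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT,
                minBuf, AudioTrack.MODE_STREAM);
        track.play();

        short[] silence = new short[framesOfLeadingSilence]; // zero-filled PCM
        track.write(silence, 0, silence.length);             // blocks until there is room
        track.write(clipPcm, 0, clipPcm.length);
    }
}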
Accomplishing this would take a non-trivial amount of code, particularly because of some of the requirements of the Sample Metronome app, such as being able to change content and tempo while playing. For that reason, the timing engine was built as a reusable module that could easily be incorporated into other apps. Even though the initial use case was a metronome app that required only one track, the natural choice when building a sequencer was to make it handle multiple simultaneous tracks.
Position of audio items is calculated in one dimension: time. All audio item time positions are calculated relative to the tempo start time, with adequate arithmetic precision, and with minimal calculations, so that the values stay accurate for an extended duration. Time values are in nanoseconds (billionths of a second), and the audio sample rate is 44,100 frames per second, so time position is always higher resolution than the audio data. Time converts cleanly to frames (AKA samples*), because the sample rate is an integer. This means that PFSeq is precise to the frame. PFSeq has been tested against a leading digital audio workstation, and the ticks stayed perfectly synced for an extended period of time.
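As a hedged illustration of that arithmetic (the method and constant names here are mine, not PFSeq’s), a nanosecond position converts to a frame index exactly, provided the multiplication happens before the division:

// Sketch of the time-to-frame arithmetic at 44,100 frames per second.
static final long NANOS_PER_SECOND = 1_000_000_000L;
static final int FRAMES_PER_SECOND = 44100;

static long nanosToFrames(long nanos) {
    // Split off whole seconds so the multiplication cannot overflow a long
    // over long run times; the result is still the exact (floored) frame index.
    long wholeSeconds = nanos / NANOS_PER_SECOND;
    long remainderNanos = nanos % NANOS_PER_SECOND;
    return wholeSeconds * FRAMES_PER_SECOND
            + (remainderNanos * FRAMES_PER_SECOND) / NANOS_PER_SECOND;
}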
The module creates one control thread and, for each track, a work thread. The control thread rapidly iterates through a loop in the “contentWriting” Runnable, which is the central brain. This loop wheels and deals segments of audio data, either silence or content, to all tracks, sending them to the respective tracks’ work threads to be written to the tracks’ buffers.
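As a rough sketch of that threading shape (the classes and names below are my illustration, not the module’s actual ones), the control loop decides whether each track needs silence or content next and queues the segment for that track’s own work thread, since writing to an AudioTrack buffer can block:

import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Stand-in for a track's work thread: it drains queued segments and writes
// them to that track's AudioTrack buffer (the write call may block).
class TrackWorkQueue {
    final BlockingQueue<short[]> pendingSegments = new LinkedBlockingQueue<>();
}

class ControlLoopSketch implements Runnable {
    final List<TrackWorkQueue> tracks;
    volatile boolean playing = true;

    ControlLoopSketch(List<TrackWorkQueue> tracks) { this.tracks = tracks; }

    @Override
    public void run() {
        while (playing) {
            for (TrackWorkQueue track : tracks) {
                // Decide whether the next segment is silence or clip content,
                // then hand it to the track's work thread.
                short[] segment = nextSegmentFor(track); // hypothetical helper
                track.pendingSegments.offer(segment);
            }
        }
    }

    short[] nextSegmentFor(TrackWorkQueue track) {
        return new short[0]; // placeholder for the silence-or-content decision
    }
}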
Audio clips sent to be written to an AudioTrack buffer often need to be abridged so that they end before the anticipated subsequent item. Abridging is currently the most expensive operation in the main control loop, because it means allocating a new array for the audio data and iterating through arithmetic on the last several frames to produce a brief fade-out, which is necessary to prevent an audible click. Even so, the module is performant: the Sample Metronome app can be cranked up to dozens of ticks per second, which sounds like a buzz.
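A truncate-and-fade might look roughly like the following (mono 16-bit PCM assumed; the function and parameter names are illustrative, not the module’s):

// Truncate a clip to newLengthFrames and apply a short linear fade-out so the
// cut does not end in an abrupt discontinuity.
static short[] abridgeWithFade(short[] clip, int newLengthFrames, int fadeFrames) {
    short[] out = new short[newLengthFrames];
    System.arraycopy(clip, 0, out, 0, newLengthFrames);

    int fadeStart = Math.max(0, newLengthFrames - fadeFrames);
    for (int i = fadeStart; i < newLengthFrames; i++) {
        // Gain ramps linearly from ~1.0 at fadeStart down to 0.0 at the last frame.
        double gain = (double) (newLengthFrames - 1 - i) / (newLengthFrames - fadeStart);
        out[i] = (short) Math.round(out[i] * gain);
    }
    return out;
}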
PFSeq was built for apps like loopers, sequencers, samplers, and drum machines. Rhythmic audio content is the focus of the module’s functionality, so the module includes a class, PFSeqTimeOffset, which defines a data model for tempo-independent durations in a way that is consistent with musical notation and MIDI grids. The PFSeqTimeOffset class is used to specify the position of an audio item relative to another time position, such as the beginning of a musical bar. PFSeqTimeOffset has 2 modes: percent and fractional. Fractional mode can be used to accomplish quarter notes, triplets, etc. Percent mode would be used if a user specified a position on a UI component that did not snap to a fractional position. Further illustration of the rhythmic data model usage is provided in comments at the top of the PFSeqTimeOffset Java class.
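To make the data model concrete, here is a hedged arithmetic sketch (not the PFSeqTimeOffset API itself) of how a fractional position within a bar maps to nanoseconds at a given tempo:

// Illustrative only: position an item numerator/denominator of the way through
// a bar. For example, 2/12 of a 4/4 bar is the third note of the first beat's
// eighth-note triplet.
static long fractionalOffsetNanos(int numerator, int denominator, double bpm, int beatsPerBar) {
    long nanosPerBeat = Math.round(60_000_000_000.0 / bpm); // 60 seconds / BPM, in ns
    long nanosPerBar = nanosPerBeat * beatsPerBar;
    return nanosPerBar * numerator / denominator;
}
// At 120 BPM in 4/4: fractionalOffsetNanos(2, 12, 120.0, 4) = 333,333,333 ns.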
PFSeq is an AAR module that includes a Service class, which is the sequencer itself, and an Activity class, which is extended by any activities that interact with the sequencer. Both of these classes are abstract, so that app developers are directed to the abstract methods, which are where app-specific code is supplied.
The sequencer Service class, named PFSeq, only has one abstract method:
The PFSeqActivity class has 3 abstract methods:
PFSeqConfig is a class where configurable sequencer settings are stored as members. Defaults are overridden by constructor arguments. Before the sequencer can play, it must be set up by calling its method setUpSequencer(PFSeqConfig config):
@Override
public void onConnect() {
    if (!getSeq().isSetUp()) {
        boolean success = getSeq().setUpSequencer(new PFSeqConfig(null, null, null, null));
    }
    …
}
All that remains is to specify the audio content:
@Override
public void onConnect() {
    …
    PFSeqTrack track = new PFSeqTrack(getSeq(), "metronome");
    PFSeqClip clip = new PFSeqClip(getSeq(), new File([your file location]));
    PFSeqTimeOffset timeOffset = PFSeqTimeOffset.make(0, MODE_FRACTIONAL, 0, 4, 0, false, 0);
    PFSeqPianoRollItem item1 = new PFSeqPianoRollItem(getSeq(), clip, "item 1", timeOffset);
    track.addPianoRollItem(item1);
    getSeq().addTrack(track);
    …
}
Now you can call play() and it should work.
Mapping frames to time – Mapping frames to time, meaning mapping system time to the first frame played, is accomplished with an AudioTimestamp object. An AudioTrack instance provides an AudioTimestamp only after it is playing, and an AudioTrack won’t play until sufficient data has been written to its buffer. So, after the sequencer is told to play, we fill the buffer with silence and start playing, so that we can get the AudioTimestamp, which gives us a precise mapping of the exact time that audio begins playing. The module’s current default buffer size corresponds to half a second of audio, so that is the delay between the user pressing play and audio content beginning. Having a solid frames-to-time mapping is key because it allows the module to sync multiple tracks.
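A sketch of how such a mapping can then be used (illustrative; not the module’s exact code): AudioTrack.getTimestamp() pairs a frame position with a system time in nanoseconds, so any future frame index can be converted to a clock time.

import android.media.AudioTimestamp;
import android.media.AudioTrack;

class FrameTimeMapper {
    // Returns the estimated system nano time at which frameIndex will play,
    // or null if the track cannot provide a timestamp yet (e.g. not playing).
    static Long nanoTimeOfFrame(AudioTrack track, long frameIndex, int sampleRate) {
        AudioTimestamp ts = new AudioTimestamp();
        if (!track.getTimestamp(ts)) {
            return null;
        }
        long framesAhead = frameIndex - ts.framePosition;
        return ts.nanoTime + (framesAhead * 1_000_000_000L) / sampleRate;
    }
}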
Syncing multiple tracks – After tracks have frame-to-time mapping, they can be synced. After they are synced, content can start playing. The code for these 3 stages is organized into 3 Runnables: silenceUntilMapped, syncTracks, and contentWriting. They each run once, in the control thread, one after the other.
The syncTracks Runnable has a loop in which silence is written to all tracks until every track’s buffer is sufficiently full and no writes are pending. Then we stop sending silence and determine the soonest nanosecond that is writable for all tracks. A track’s soonest writable nanosecond is the system time at the end of its written data. This differs per track, so we take the latest such time across all the tracks and fill the gap with silence for any track whose buffered data ends sooner. At that point, all tracks have silence written up to the same point in time and nothing written to their buffers after that. Now the tracks are synced, and actual audio content can start getting written to the AudioTrack buffers.
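In outline, the sync step looks something like this (the interface and names below are my illustration, not PFSeq’s classes): find the latest end-of-written-data time across all tracks, then pad every other track with silence up to that same nanosecond.

import java.util.List;

class SyncSketch {
    interface TrackState {
        long endOfWrittenDataNanos();          // system time where written data ends
        void writeSilenceUntil(long nanoTime); // pad the buffer with zero-valued frames
    }

    static void syncTracks(List<TrackState> tracks) {
        long latestEnd = Long.MIN_VALUE;
        for (TrackState t : tracks) {
            latestEnd = Math.max(latestEnd, t.endOfWrittenDataNanos());
        }
        for (TrackState t : tracks) {
            // A track already ending at latestEnd writes nothing extra.
            t.writeSilenceUntil(latestEnd);
        }
    }
}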
Changing tempo while playing – Timing accuracy is achieved by keeping count of frames written to an AudioTrack. If you know how many frames have been written, you know there have been no underruns, and you can map frames to time, then you know which frame to write to in order for a sound to play at a desired moment. PFSeq guards against unnoticed underruns by regularly checking AudioTrack.getUnderrunCount(), stopping itself and sending an error message if an underrun has occurred.
This setup presented a challenge for real-time tempo changing. Because audio item placement is calculated relative to the tempo start time and relative to the tempo, changing the tempo while playing would make it seem to the user as if their place in the sequence of audio items had shifted. The solution was to rewrite history: when changing tempo, we calculate where the tempo start time would have been if the tempo had always been the new tempo, and set it to that. The result is that the tempo can be changed, while playing, without the current place getting shifted.
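The arithmetic behind that history rewrite can be sketched like this (the names are mine; the real module works in its own units): keep the number of elapsed beats constant while swapping in the new beat length, and move the tempo start time so that the present moment lands on the same beat.

static long newTempoStartNanos(long oldTempoStartNanos, long nowNanos,
                               double oldBpm, double newBpm) {
    double oldNanosPerBeat = 60_000_000_000.0 / oldBpm;
    double newNanosPerBeat = 60_000_000_000.0 / newBpm;
    double beatsElapsed = (nowNanos - oldTempoStartNanos) / oldNanosPerBeat;
    // Place the start time so the same number of beats have "elapsed" at the new tempo.
    return nowNanos - Math.round(beatsElapsed * newNanosPerBeat);
}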
* The term “sample” can be used to refer to three distinct things in the context of this module: an audio clip (as in a sampler), a frame of audio data, or a single per-channel value within a frame. You could say that a sample consists of a sequence of samples, each of which, if stereo, includes 2 samples. 🙂