Thursday, March 1, 2007

GDAM: time access

Once the beat (tempo and phase) of a song has been mapped via the Beat Calculator, time access can be very sophisticated. A number of these techniques still seem to be unique to GDAM, so I am describing them here in the hope that other software will offer these powerful features.

Basic precision: position math is done inside the audio thread with sample accuracy. Although file playback may not start immediately (e.g. the hard drive takes time to spin up), the server keeps track of how far behind it has fallen and discards a little data to catch up. This makes the song start in synch regardless of file access/encoder delay.
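As a rough illustration of the catch-up idea, here is a minimal Python sketch. It is not GDAM's actual code: the function names and the idea of measuring both events on a shared audio-clock sample counter are assumptions made for the example.

```python
def samples_to_skip(request_sample: int, first_data_sample: int) -> int:
    """How many decoded samples to discard to make up for startup lag.

    Both arguments are readings of the same (hypothetical) audio-clock
    sample counter: one taken when playback was requested, one taken
    when the first decoded data actually became available.
    """
    return max(0, first_data_sample - request_sample)


def catch_up(decoded, request_sample: int, first_data_sample: int):
    """Drop the leading samples so playback lands where an instant start would be."""
    lag = samples_to_skip(request_sample, first_data_sample)
    return decoded[lag:]
```

If the data arrives 3 samples late, the first 3 decoded samples are thrown away and playback resumes as though it had started on time.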

Seeking smoothly: when you seek within a song, GDAM starts a new copy at the desired position. While that starts up, the current copy keeps playing. Once the new copy has started, data is discarded to make up for the small startup delay, a quick crossfade occurs, and the old copy is dropped. This keeps position seeks smooth while honoring the *exact* timing of the request.
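The crossfade step could look something like the following Python sketch. The linear fade shape and the function name are assumptions for illustration; the post does not say which fade curve GDAM uses.

```python
def linear_crossfade(old, new):
    """Blend two equal-length sample blocks: old fades out while new fades in.

    A linear ramp is assumed here purely for simplicity.
    """
    n = len(old)
    if n != len(new):
        raise ValueError("blocks must have the same length")
    if n == 1:
        return [new[0]]
    out = []
    for i in range(n):
        t = i / (n - 1)  # 0.0 at the first sample, 1.0 at the last
        out.append((1.0 - t) * old[i] + t * new[i])
    return out
```

The first output sample is entirely the old copy and the last is entirely the new one, so the old copy can be discarded right after the block.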

Relative seeking: the client requests a relative seek, usually by an amount of time calculated precisely from the BPM to be a whole number of beats. The jump is executed using the smooth seek functionality to keep sample accuracy. The medium-scale seek buttons seek anywhere from 1 to 32 beats forwards or backwards. Because the audio engine accounts for startup delay, the beat is *perfectly* maintained when relative seeking in this way. This gives incredible control over playback of the song: press "back 4 beats" every four beats, and you are looping a perfect bar. Because you are seeking by an amount, rather than to a position, you can time the jump however you want (at any point in the bar, not just on a bar boundary) while keeping the beat.
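The arithmetic behind a beat-sized relative seek is simple enough to sketch in Python. The function names and the 44100 Hz default sample rate are assumptions for the example, not details taken from GDAM.

```python
def beats_to_samples(beats: float, bpm: float, sample_rate: int = 44100) -> int:
    """Convert a number of beats (negative = backwards) to a sample offset."""
    seconds = beats * 60.0 / bpm
    return round(seconds * sample_rate)


def relative_seek(position: int, beats: float, bpm: float,
                  sample_rate: int = 44100) -> int:
    """Return the new sample position after seeking by `beats`, clamped at 0."""
    return max(0, position + beats_to_samples(beats, bpm, sample_rate))
```

At 120 BPM a beat lasts half a second, so "back 4 beats" is `relative_seek(position, -4, 120.0)`, a jump of 88200 samples at 44100 Hz.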

Jumping to any point in a song while keeping the beat: the client has an approximate idea of where song playback is. It is not sample accurate, but it is within 1/10th of a beat. The user can click on any point in a song and jump there without dropping the beat. The client calculates the difference between where playback is and where you want it to be, rounds that difference to a whole number of beats or bars, then issues a relative seek of that amount. This makes music completely time-accessible: it can be rearranged at will in real time, without dropping the beat.
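The rounding step can be sketched in Python as follows. This is an illustration under assumptions: positions are expressed in beats, a bar is taken to be 4 beats, and the function name is made up for the example.

```python
import math


def beat_locked_jump(current_beat: float, target_beat: float,
                     beats_per_bar: int = 4, snap_to_bar: bool = False) -> int:
    """Return the relative seek, in whole beats, closest to the requested jump.

    current_beat may be approximate; because the result is a whole number
    of beats (or bars), the phase within the beat is left untouched.
    """
    unit = beats_per_bar if snap_to_bar else 1
    diff = target_beat - current_beat
    # floor(x + 0.5) rounds halves upward, avoiding round()'s ties-to-even.
    return math.floor(diff / unit + 0.5) * unit
```

For example, with playback at roughly beat 17.3 and a click at beat 64.9, the difference of 47.6 beats rounds to a 48-beat relative seek. Playback lands near beat 65.3 rather than exactly on the click, which is the price of staying on the beat.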

Index points: when beatmatching, you can go beyond simple tempo and phase information and add index points at certain places within the song, such as verse, chorus, breakdown, and outro. These are clearly marked in the timeline, and you can jump to them in beat. When I'm playing a hip-hop song, I'll cut every chorus to half size and skip over the guest verse. I can edit a song down to just the meaty core, or do long remixes by extending each instrumental moment into 8 bars of beat juggling and effects.

Sub-beat time access: this is usually used to nudge a song into synch with another audio source. There is a ruler representing one beat; click anywhere along it to seek by an amount between 1/2 beat back and 1/2 beat forward. The closer to the ruler's center mark, the smaller the seek. When I hear my song playing out of synch, I know from the rhythmic relationship (1/4 or 1/16 beat ahead or behind) where on the ruler to click. This allows me to correct instantly, rather than nudging incrementally or guessing how far to seek.
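One way to map a click on the ruler to a seek amount is sketched below in Python. The linear mapping, the 0.0 to 1.0 click coordinate, the function name, and the 44100 Hz default are all assumptions; the post only says that the ruler spans 1/2 beat back to 1/2 beat forward with the smallest seeks near the center.

```python
def ruler_click_to_samples(x: float, bpm: float, sample_rate: int = 44100) -> int:
    """Map a click position on the one-beat ruler to a seek offset in samples.

    x is the click position as a fraction of the ruler's width:
    0.0 = left edge (1/2 beat back), 0.5 = center (no seek),
    1.0 = right edge (1/2 beat forward).
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be between 0.0 and 1.0")
    beats = x - 0.5
    return round(beats * 60.0 / bpm * sample_rate)
```

A click three quarters of the way along the ruler asks for a seek of 1/4 beat forward; a click at the center asks for no seek at all.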

Video Playback: volume compression

I live in an apartment with thin walls, and I enjoy my TV and movies late at night. The dynamic range is way too high: some dialogue is barely audible at full volume, but sudden action sends me into a frantic dive for the volume control. Volume normalization can mitigate this somewhat at extreme settings, but it is imprecise and inaccessible.

A better model: an interactive volume mode, with simple controls that adjust an expander / compressor / limiter so that dynamics can be shaped intuitively. If I can't hear the dialogue, I indicate "too quiet". This boosts the input volume or otherwise adjusts the dynamics to bring up the quiet parts. After the first car crash, I indicate "too loud". This looks at the maximum output volume during the last 10 seconds, and adjusts the compressor/limiter to guarantee that nothing will produce output louder than 90% of that. The system would quickly become tuned to the content and my environment.
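To make the idea concrete, here is a minimal Python sketch of the two controls. Everything in it is an assumption for illustration: the class and method names, the 1.25x boost step, tracking peaks per block, and above all the hard clip, which stands in for a real compressor/limiter only to keep the example short.

```python
from collections import deque


class InteractiveVolume:
    """Toy model of "too quiet" / "too loud" controls (not a real limiter)."""

    def __init__(self, window_blocks: int, boost_step: float = 1.25):
        # window_blocks should cover about 10 seconds of audio blocks.
        self.recent_peaks = deque(maxlen=window_blocks)
        self.boost_step = boost_step
        self.gain = 1.0
        self.ceiling = 1.0  # full scale

    def process(self, block):
        """Apply gain, clamp to the ceiling, and remember the block's output peak."""
        out = [max(-self.ceiling, min(self.ceiling, s * self.gain)) for s in block]
        self.recent_peaks.append(max((abs(s) for s in out), default=0.0))
        return out

    def too_quiet(self):
        """Boost the input volume; the ceiling still caps the loud parts."""
        self.gain *= self.boost_step

    def too_loud(self):
        """Cap future output at 90% of the loudest recent output."""
        if self.recent_peaks:
            self.ceiling = 0.9 * max(self.recent_peaks)
```

After a block peaking at 0.8, pressing "too loud" sets the ceiling to about 0.72, and nothing after that comes out louder. A real implementation would smooth the gain changes instead of clipping.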