Eventfulness for Interactive Video Alignment

1Cornell University, 2University of Michigan, 3Meta AI

Eventfulness represents the likelihood that different moments in video are the intended targets of synchronization tasks.

Abstract

Humans are remarkably sensitive to the alignment of visual events with other stimuli, which makes synchronization one of the hardest tasks in video editing.

A key observation of our work is that most of the alignment we do involves salient localizable events that occur sparsely in time. By learning how to recognize these events, we can greatly reduce the space of possible synchronizations that an editor or algorithm has to consider. Furthermore, by learning descriptors of these events that capture additional properties of visible motion, we can build active tools that adapt their notion of eventfulness to a given task as they are being used. Rather than learning an automatic solution to one specific problem, our goal is to make a much broader class of interactive alignment tasks significantly easier and less time-consuming.
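To make this concrete, the sketch below shows one simple way to reduce a dense per-frame eventfulness curve to a sparse set of candidate synchronization points via peak picking. The function name, threshold, and minimum spacing here are illustrative assumptions, not the implementation used in the paper:

```python
import numpy as np
from scipy.signal import find_peaks

def candidate_sync_points(eventfulness, fps, threshold=0.5, min_gap_s=0.25):
    """Reduce a dense per-frame eventfulness curve to sparse candidate
    synchronization times (hypothetical helper; parameters illustrative)."""
    peaks, _ = find_peaks(
        eventfulness,
        height=threshold,                        # keep only salient events
        distance=max(1, int(min_gap_s * fps)),   # enforce temporal sparsity
    )
    return peaks / fps                           # event times in seconds

# Example: a 5-second curve with three salient events
fps = 30
t = np.arange(0, 5, 1 / fps)
curve = sum(np.exp(-(t - c) ** 2 / 0.002) for c in (1.0, 2.4, 4.1))
print(candidate_sync_points(curve, fps))  # approximately [1.0, 2.4, 4.1]
```

An editor or algorithm then only needs to consider these few candidate times rather than every frame.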

We show that a suitable visual event descriptor can be learned entirely from stochastically generated synthetic video. We then demonstrate the usefulness of learned and adaptive eventfulness by integrating it into novel interactive tools for applications including audio-driven time warping of video and the extraction and application of sound effects across different videos.
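As a rough illustration of what such training data can look like, the generator below produces a stochastic clip in which a moving square abruptly changes velocity at a few random frames, and labels those frames as events. This is our own minimal sketch of the idea; the paper's actual synthetic-video generator may differ substantially:

```python
import numpy as np

def synthetic_clip(n_frames=90, size=64, n_events=3, seed=None):
    """Generate one stochastic training clip: a square moving with constant
    velocity that abruptly changes direction at a few random frames.
    Those abrupt changes are the ground-truth events (hedged sketch only)."""
    rng = np.random.default_rng(seed)
    frames = np.zeros((n_frames, size, size), dtype=np.float32)
    labels = np.zeros(n_frames, dtype=np.float32)
    pos = rng.uniform(8, size - 8, size=2)
    vel = rng.uniform(-2, 2, size=2)
    event_frames = set(rng.choice(np.arange(10, n_frames - 10),
                                  size=n_events, replace=False))
    for f in range(n_frames):
        if f in event_frames:
            vel = rng.uniform(-3, 3, size=2)   # sudden, localizable change
            labels[f] = 1.0
        pos = np.clip(pos + vel, 4, size - 4)
        r, c = int(pos[0]), int(pos[1])
        frames[f, r - 3:r + 3, c - 3:c + 3] = 1.0  # draw the square
    return frames, labels

frames, labels = synthetic_clip(seed=0)
print(labels.nonzero()[0])  # frames labeled as events
```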

Interactive Tools

We demonstrate eventfulness in three novel interactive tools. The first two are designed to help extract sound effects from video and apply them to other videos. The third is designed to help time-warp one video based on events in another video or audio signal, which we use to perform dancification. In all of these applications, the role of eventfulness is to reduce the space of synchronization events that a user must search through for each task.


Foley Application

Eventfulness can be used to insert sound effects at selected events in a video. Users navigate the timeline and select the events where sound effects should be added; for each selected event, they can also choose which sound to add.
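A minimal sketch of the underlying mixing step: given the event times a user has selected, the helper below overlays a sound-effect waveform onto the video's audio track at each event. All names and parameters here are illustrative assumptions, not the tool's actual interface:

```python
import numpy as np

def add_foley(track, effect, event_times, sr=44100, gain=0.8):
    """Mix a sound-effect waveform into an audio track at each selected
    event time. Illustrative sketch; a real tool would also handle
    resampling, fades, and per-event effect choice."""
    out = track.astype(np.float32).copy()
    for t in event_times:
        start = int(round(t * sr))
        end = min(start + len(effect), len(out))
        if start < end:
            out[start:end] += gain * effect[: end - start]
    return np.clip(out, -1.0, 1.0)  # avoid clipping after mixing
```

Per-event sound choice could be supported by passing (time, effect) pairs instead of a single effect.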

Time-warp Based Video Dancification

Our second application addresses the challenging task of dynamically warping a video into alignment with a target signal. We focus on scenarios similar to the dancification explored in Visual Beat, where discrete visual events are synchronized with discrete events in a target signal (e.g., aligning visual events with beats in a piece of music).
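One plausible backend for this kind of alignment is a piecewise-linear time warp that places each detected visual event at its paired beat and resamples the frame timeline accordingly. The sketch below assumes events and beats are already sorted and paired one-to-one, which is our simplification rather than the paper's method:

```python
import numpy as np

def warp_frame_indices(n_frames, fps, event_times, beat_times):
    """Build a piecewise-linear time warp that places each visual event
    at its paired music beat, then resample the input frame timeline.
    Assumes sorted, one-to-one paired events and beats (our simplification)."""
    duration = n_frames / fps
    # Anchor the warp at the clip boundaries.
    out_anchors = np.concatenate(([0.0], beat_times, [duration]))
    src_anchors = np.concatenate(([0.0], event_times, [duration]))
    out_t = np.arange(n_frames) / fps
    src_t = np.interp(out_t, out_anchors, src_anchors)  # output -> input time
    return np.clip(np.round(src_t * fps).astype(int), 0, n_frames - 1)

# Each output frame i shows input frame idx[i]; events now land on beats.
idx = warp_frame_indices(150, 30, event_times=[1.0, 3.2], beat_times=[1.5, 3.0])
```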

Video-video Alignment

We can also visually align two videos that have similar eventfulness curves.
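As one plausible way to compute such an alignment, the sketch below applies classic dynamic time warping to two eventfulness curves to recover a frame-to-frame correspondence. This is an illustrative assumption about the matching backend, not necessarily the tool's implementation:

```python
import numpy as np

def dtw_align(curve_a, curve_b):
    """Classic dynamic time warping between two eventfulness curves,
    returning a list of matched frame pairs (one plausible backend)."""
    n, m = len(curve_a), len(curve_b)
    cost = np.abs(np.asarray(curve_a)[:, None] - np.asarray(curve_b)[None, :])
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    i, j, path = n, m, []
    while i > 0 and j > 0:  # backtrack along the cheapest path
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```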