Localization of Salient Events

We present methods to discover regions of emotionally salient events in audio-visual data. We demonstrate that different modalities, such as the upper face, lower face, and speech, express emotion with different timings and on different time scales, varying with the emotion type. We further extend this idea to another aspect of human behavior: action events in videos. We show how transition patterns between events can be used to automatically segment and classify action events.