SciPort RLP

Breaking down violence detection: Combining divide-et-impera and coarse-to-fine strategies

Neurocomputing. Vol. 208. Amsterdam: Elsevier 2016, pp. 225-237

Year of publication: 2016

ISBN/ISSN: 1872-8286

Publication type: Journal article

Language: English

DOI/URN: 10.1016/j.neucom.2016.05.050


Verified by: Library

Abstract

In today’s society, where audio-visual content is ubiquitous, violence detection in movies and Web videos has become a decisive functionality, e.g., for providing automated youth protection services. In this paper, we concentrate on two important aspects of video content analysis: time efficiency and modeling of concepts (in this case, violence modeling). Traditional approaches to violent scene detection build on audio or visual features to model violence as a single concept in the feature space. Such modeling does not always provide a faithful representation of violence in terms of audio-visual features, as violence is not necessarily located compactly in the feature space. Consequently, in this paper, we aim to close this gap. To this end, we present a solution which uses audio-visual features (MFCC-based audio and advanced motion features) and propose to model violence by means of multiple (sub)concepts. To cope with the heavy computations induced by the use of motion features, we perform a coarse-to-fine analysis, starting with a coarse-level analysis with time-efficient audio features and continuing with a fine-level analysis with advanced features when necessary. The results demonstrate the potential of the proposed approach on the standardized datasets of the latest editions of the MediaEval Affect in Multimedia: Violent Scenes Detection (VSD) task of 2014 and 2015. (C) 2016 Elsevier B.V. All rights reserved.
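The coarse-to-fine idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the thresholds `low` and `high`, the toy scorers, and the choice of taking the maximum over subconcept scores are all assumptions made for illustration only. The sketch scores a segment with cheap (audio-like) subconcept scorers first and runs the expensive (motion-like) scorers only when the coarse result is ambiguous.

```python
from typing import Callable, Dict, Sequence, Tuple

# A scorer maps a segment's features to a violence score in [0, 1].
Scorer = Callable[[Dict[str, float]], float]


def coarse_to_fine_score(
    segment: Dict[str, float],
    coarse_scorers: Sequence[Scorer],
    fine_scorers: Sequence[Scorer],
    low: float = 0.3,   # assumed threshold: at or below this, accept "non-violent"
    high: float = 0.7,  # assumed threshold: at or above this, accept "violent"
) -> Tuple[float, bool]:
    """Return (score, used_fine) for one segment.

    Each scorer stands for one hypothetical violence subconcept; the
    segment score is the maximum over subconcepts (an assumption).
    """
    coarse = max(scorer(segment) for scorer in coarse_scorers)
    # A confident coarse decision skips the costly fine-level analysis.
    if coarse <= low or coarse >= high:
        return coarse, False
    # Ambiguous coarse score: fall back to the expensive scorers.
    fine = max(scorer(segment) for scorer in fine_scorers)
    return fine, True


# Toy scorers standing in for audio-based and motion-based models.
def loud_audio(segment: Dict[str, float]) -> float:
    return segment["audio"]


def fast_motion(segment: Dict[str, float]) -> float:
    return segment["motion"]


if __name__ == "__main__":
    for seg in ({"audio": 0.9, "motion": 0.1}, {"audio": 0.5, "motion": 0.8}):
        print(seg, coarse_to_fine_score(seg, [loud_audio], [fast_motion]))
```

In this sketch the thresholds control the trade-off the abstract describes: a wider ambiguous band sends more segments to the fine level, which costs more computation but relies less on the cheap features.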