The term ``content processing'' has been around for a few years [Karjalainen, 1999, Chiariglione, 2000, Camurri, 1999], but its meaning is still unclear and a matter of controversy. When we talk about content analysis, content browsing, content indexing, content processing or content transformation, we are usually referring to the higher-level information carried by a signal produced by an audiovisual source.
Even though this pseudo-definition is conservative in its scope, it already includes a crucial and sometimes controversial term: higher-level. The label implies a comparison with something else, and that something else is usually the signal processing level. But what about semantic features that can be extracted (more or less directly) from the signal itself? Is color a low-level descriptor or a high-level semantic feature of a visual object? Should we consider pitch a higher-level feature, as opposed to its signal-processing counterpart, fundamental frequency? How can we distinguish the abstraction level implied by a perceptual feature such as loudness from that of a feature with a heavier semantic load, such as genre?
In our opinion, deciding whether something is content requires looking at its functional value. We will therefore use the word content for any piece of information related to the audio source that is in any way meaningful to the targeted user (that is, that carries semantic information for that user). The description of that content can then be thought of as a hierarchy with different levels of abstraction, any of which may be useful to some users. Consider, for instance, how different the content description of a song would be if the targeted user were a naive listener rather than an expert musicologist. Even a low-level descriptor such as the spectral envelope of a signal can be seen as a particular level of content description, one targeted at the signal processing engineer. We subscribe to the previously discussed MPEG-7 definition of a descriptor as ``a distinctive characteristic of the data which signifies something to somebody''.
In the following sections we will also see how this notion of content fits well into an object-oriented metamodel.