On Sound Objects

As mentioned in the introduction to this chapter, the main goal of the content transmission metamodel is to analyze the signal, identify sound objects, and describe them appropriately. Before going deeper into the modules that make up the metamodel, however, we need a clear idea of what we mean by a Sound Object.

Perhaps the most commonly accepted definition of a Sound Object is the one related to Pierre Schaeffer's theories [Schaeffer, 1966]. In [Chion, 1983], a Sound Object is defined as ``any sound phenomenon or event perceived as a coherent whole (...) regardless its source or meaning''. Although this definition may be useful from a psychoacoustical or perceptual point of view, it is of little use from an implementation or engineering point of view.

Other explanations of an object from a multimedia point of view result in definitions with a narrower scope (see [Tolonen, 2000] for an example of the use of objects from a physical models perspective). In MPEG-7's Multimedia Description Scheme an object is defined as follows: ``A perceivable object is an entity that exists, i.e. has temporal and spatial extent, in a narrative world (e.g. Tom's piano). An abstract object is the result of applying abstraction to a perceivable object (e.g. any piano)''.

In this section we intend to give a clear definition of what we mean by a Sound Object. To do so, we rely on definitions given to similar concepts in other areas, in particular the Object-Oriented (OO) paradigm as described in section 1.1. It is interesting to note the strong relation that has traditionally existed between OO technologies and computer music or sound signal processing [Pope, 1991a]. As a matter of fact, the definition we will later introduce can be seen as a superset and conceptual enhancement of previously introduced concepts (see [Scaletti and Hebel, 1991], for example).

As already noted in section 1.2.3, Alan Kay includes in his definition of OO the maxim that ``everything is an object'' [Kay, 1993]. Following this same idea, when dealing with Object-Oriented Sound Processing, everything must be thought of as an object: a sound stream is an object, a track is an object, a musical note is an object, and an instrument is an object. These objects have different properties and relate to one another in different ways.

Let us look at a basic example. In a sound stream we have a number of tracks, one of which contains a trumpet performance. In this track, some notes may differ while others are identical (same pitch, same loudness, same attack type). Thus, at first sight, we might distinguish four different kinds of objects:

  1. The whole sound stream
  2. Our set of tracks (out of which we concentrate on the one with the trumpet performance)
  3. The instrument in that track (trumpet)
  4. Any number of notes in the track

As a first and basic interpretation, Figure 5.2 illustrates a possible UML object diagram of the system.

Figure 5.2: UML object diagram of a simple audio stream

On the other hand, in section 1.1.1 we defined a class as an abstract representation of a set of objects that share identical behavior. Following the previous example, we could define how the classes Sound_Stream, Audio_Track, Instrument and so on should behave. The UML class diagram of the previous example would become the one depicted in Figure 5.3, which should be read: ``a Sound Stream is made up of any number of Tracks (a Track can only belong to a single Stream); an Audio Track is related to a single Instrument and an Instrument can be recorded into different Tracks; an Audio Track is also made up of any number of notes, which have an association relation with the instrument that produced them; Trumpet is a particular case of an Instrument (it behaves like an Instrument but may add specific behavior); and Mono Audio Track and Stereo Audio Track are particular cases of Audio Tracks, in which a Stereo Audio Track must contain two Mono Audio Tracks and a Mono Audio Track can be contained in at most one Stereo Audio Track.''

Figure 5.3: UML simplified class diagram representing an audio stream
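The reading of Figure 5.3 given above can be sketched in code. The following Python fragment is only an illustration under our own assumptions: the names add_note, add_track, stereo_parent and channels are hypothetical and do not come from the diagram, and the multiplicity constraints are enforced in the simplest possible way.

```python
class Instrument:
    def __init__(self, name):
        self.name = name


class Trumpet(Instrument):
    """Particular case of Instrument; it could add trumpet-specific behavior."""

    def __init__(self):
        super().__init__("trumpet")


class Note:
    def __init__(self, pitch, instrument):
        self.pitch = pitch
        self.instrument = instrument  # association with the producing instrument


class AudioTrack:
    def __init__(self, instrument):
        self.instrument = instrument  # a track is related to a single instrument
        self.notes = []               # a track is made up of any number of notes
        self.stream = None            # owning stream (hypothetical attribute)

    def add_note(self, pitch):        # hypothetical helper
        note = Note(pitch, self.instrument)
        self.notes.append(note)
        return note


class MonoAudioTrack(AudioTrack):
    def __init__(self, instrument):
        super().__init__(instrument)
        self.stereo_parent = None     # contained in at most one stereo track


class StereoAudioTrack(AudioTrack):
    def __init__(self, instrument, left, right):
        super().__init__(instrument)
        if left is right:
            raise ValueError("a stereo track needs two distinct mono tracks")
        for channel in (left, right):
            if channel.stereo_parent is not None:
                raise ValueError("mono track already belongs to a stereo track")
        left.stereo_parent = right.stereo_parent = self
        self.channels = (left, right)  # exactly two mono tracks


class SoundStream:
    def __init__(self):
        self.tracks = []              # any number of tracks

    def add_track(self, track):       # hypothetical helper
        if track.stream is not None:
            raise ValueError("track already belongs to a stream")
        track.stream = self
        self.tracks.append(track)
```

In this sketch an Instrument may be shared by several tracks, whereas trying to add a track to a second stream, or to reuse a mono track in a second stereo track, raises an error.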

When declaring a class, we must ask ourselves what its behavior should be and declare methods for that purpose. The class SoundStream, for example, might have methods such as AddAudioTrack() and FindInstrument(). We should also identify the attributes that will be used to distinguish the state of two objects belonging to the same class. For example, we should identify the attributes that allow us to distinguish two different instruments (trumpet and piano). We may end up with a diagram similar to the one depicted in Figure 5.4.

Figure 5.4: UML class diagram representing an audio stream
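As a minimal sketch of what such methods and attributes might look like, consider the following Python fragment. The methods add_audio_track and find_instrument stand for AddAudioTrack() and FindInstrument(); their signatures, and the attributes name and family, are assumptions made for illustration and are not taken from Figure 5.4.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Instrument:
    name: str    # hypothetical attribute
    family: str  # hypothetical attribute, e.g. "brass" or "keyboard"


@dataclass
class AudioTrack:
    instrument: Instrument


@dataclass
class SoundStream:
    tracks: List[AudioTrack] = field(default_factory=list)

    def add_audio_track(self, track: AudioTrack) -> None:
        """Counterpart of AddAudioTrack(); the signature is assumed."""
        self.tracks.append(track)

    def find_instrument(self, name: str) -> Optional[Instrument]:
        """Counterpart of FindInstrument(); lookup by name is assumed."""
        for track in self.tracks:
            if track.instrument.name == name:
                return track.instrument
        return None


trumpet = Instrument(name="trumpet", family="brass")
piano = Instrument(name="piano", family="keyboard")

stream = SoundStream()
stream.add_audio_track(AudioTrack(trumpet))
stream.add_audio_track(AudioTrack(piano))
```

Here trumpet and piano are two objects of the same class whose state differs only in the values of their attributes, which is what allows find_instrument to tell them apart.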

The previous diagram, though, does not explicitly show our first hypothesis that everything is a ``Sound Object''. That is, all the elements are objects, but no explicit relation between them shows that they share some commonalities. The only missing link is the fact that every class in our model should be a subclass of the SoundObject superclass. The diagram would then become the one illustrated in Figure 5.5 (previously introduced methods and attributes are not shown for simplicity).

Figure 5.5: The ``everything is a sound object'' UML class diagram
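The common superclass can be sketched as follows. This Python fragment is an illustration only: the describe method is a hypothetical example of shared behavior, and the class bodies are left empty, as the diagram likewise omits methods and attributes.

```python
class SoundObject:
    """Common superclass of every class in the model."""

    def describe(self):
        # Hypothetical shared behavior: report the kind of sound object.
        return type(self).__name__


class SoundStream(SoundObject):
    pass


class AudioTrack(SoundObject):
    pass


class Instrument(SoundObject):
    pass


class Trumpet(Instrument):
    pass


class Note(SoundObject):
    pass


elements = [SoundStream(), AudioTrack(), Trumpet(), Note()]
everything_is_a_sound_object = all(isinstance(e, SoundObject) for e in elements)
```

Any code written against SoundObject can then handle a stream, a track, an instrument or a note uniformly.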

Note that, following this model, we will treat every bit of audio content as an object in its own right. For that reason, as we will later see, ``content description'' is very much related to ``object identification''.

It is also important to note that Sound Objects in the Object-Oriented Content Transmission Metamodel are in fact Processing Data objects of the DSPOOM metamodel (see section 4.1.2).

2004-10-18