The Coding Step (Content Description)

In the coding step, all the content information extracted and organized into objects in the previous analysis step needs to be encoded in an appropriate format. Ideally, both binary and text-based versions of the format should be available in order to offer coding and transmission efficiency as well as readability. It is also important that the coding scheme supports the way the output of our analysis block is organized. In that sense, it is necessary to use a highly structured language, one that can describe a tree-like data structure and also supports object-oriented concepts.

Perhaps the first idea that comes to mind is to use UML as a way of describing our content. UML is indeed a highly structured language and supports all object-oriented concepts, and it would be an excellent choice for describing our Sound Classes. It is less appropriate, however, if what we want is to describe the state of our objects/instances or, in other words, to make our objects persistent. On the other hand, and as described in section 1.4.3, a fairly direct relation can be established between in-memory objects declared in any object-oriented language and their persistent representation or metadata.

There are many examples of coding schemes used for encoding metadata or, more precisely, audiovisual content descriptions, perhaps the most ambitious being MPEG-7 (see section 1.4.2). Although MPEG-7 is focused on search and retrieval issues, its actual encoding of the audiovisual content description is flexible enough to be used in applications such as the ones envisioned in this thesis [Ebrahimi and Christopoulos, 1998, Lindsay and Kriechbaum, 1999].

Our Content Transmission metamodel does not enforce a particular description scheme such as MPEG-7, but it does recommend XML as an appropriate encoding format. Besides, our Sound Objects are in fact DSPOOM Processing Data objects and their attached features are DSPOOM Descriptors. If CLAM is used as the implementation of the metamodel, all such objects have automatic built-in XML persistence (see section 3.2.2). Note that, as a matter of fact, XML Schema is the language used for structuring our content, that is, for defining Sound Classes, while the actual output of the analysis, the content of the identified objects, is a standard XML document.
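As a simple illustration, the following sketch shows how a Sound Class might be structured with XML Schema and how a concrete Sound Object could then be serialized as a plain XML document. All element and attribute names (SoundObject, SemanticLabel, Descriptor) are hypothetical and do not correspond to the actual CLAM serialization; they are only meant to convey the tree-like, object-oriented organization of the description.

  <!-- Hypothetical XML Schema fragment defining a Sound Class -->
  <xs:complexType name="SoundObject">
    <xs:sequence>
      <!-- High-level, semantic information -->
      <xs:element name="SemanticLabel" type="xs:string"/>
      <!-- Low-level features attached to the object -->
      <xs:element name="Descriptor" minOccurs="0" maxOccurs="unbounded">
        <xs:complexType>
          <xs:attribute name="name" type="xs:string"/>
          <xs:attribute name="value" type="xs:double"/>
        </xs:complexType>
      </xs:element>
      <!-- Child objects: the description is a tree -->
      <xs:element name="SoundObject" type="SoundObject"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <!-- A possible instance: the actual output of the analysis step -->
  <SoundObject>
    <SemanticLabel>Crash cymbal hit</SemanticLabel>
    <Descriptor name="Loudness" value="0.8"/>
    <Descriptor name="SpectralCentroid" value="5200.0"/>
  </SoundObject>

Note how nesting SoundObject elements inside one another is what gives the description its tree-like shape, mirroring the composition relations between the analyzed objects.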

On the other hand, the encoding step must also be in charge of deciding the degree of abstraction to be applied to the output of the content extraction step. This decision must be taken on the basis of the application and the user's requirements, although it will obviously affect the data transmission rate. The encoder must decide what level of the content tree should actually be encoded, depending on the degree of concreteness demanded of the transmission process, a degree that will usually be fixed by the particular characteristics of the receiver. If only high-level semantic information is encoded, the receiver will be forced to use more of its 'artificial imagination' (see next section). The lower the level of the encoded information, the more 'real-world knowledge' the receiver should have.
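To make this trade-off more tangible, the following sketch (again using the hypothetical element names introduced above) shows the same sound encoded at two different levels of the content tree: one version carries only the semantic level, the other also includes some low-level descriptors.

  <!-- Higher abstraction: only the semantic level of the tree is encoded -->
  <SoundObject>
    <SemanticLabel>Door slam, wooden, indoors</SemanticLabel>
  </SoundObject>

  <!-- Lower abstraction: low-level descriptors are encoded as well -->
  <SoundObject>
    <SemanticLabel>Door slam, wooden, indoors</SemanticLabel>
    <Descriptor name="Duration" value="0.35"/>
    <Descriptor name="LogAttackTime" value="-2.1"/>
    <Descriptor name="SpectralCentroid" value="850.0"/>
  </SoundObject>

The first version is cheaper to transmit but leaves more decisions to the receiving end; the second constrains the synthesis on the receiving side at the cost of a higher data rate.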

Another issue, which will not be dealt with here, is how this textual information could be compressed and transformed into a more efficient binary format suitable for transmission. Suffice it to say that several solutions to this problem already exist, such as MPEG-7's Binary Format [Manjunath et al., 2002].
