So far, we have assumed that all sites want to receive media data in
the same format. However, this may not always be appropriate.
Consider the case where participants in one area are connected
through a low-speed link to the majority of the conference
participants, who enjoy high-speed network access. Instead of forcing
everyone to use a lower-bandwidth, reduced-quality audio encoding, an
RTP-level relay called a mixer may be placed near the low-bandwidth
area. This mixer resynchronizes incoming audio packets to reconstruct
the constant 20 ms spacing generated by the sender, mixes these
reconstructed audio streams into a single stream, translates the
audio encoding to a lower-bandwidth one, and forwards the
lower-bandwidth packet stream across the low-speed link. These packets
might be unicast to a single recipient or multicast on a different
address to multiple recipients. The RTP header includes a means for
mixers to identify the sources that contributed to a mixed packet so
that correct talker indication can be provided at the receivers.
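The mixing step described above can be illustrated with a minimal
sketch. It assumes the incoming streams have already been
resynchronized into time-aligned frames of 16-bit linear PCM, and it
omits the translation to a lower-bandwidth encoding. The names Frame,
MixedFrame and mix are hypothetical and chosen for illustration only;
"ssrc" and "csrc" stand for the source identifiers carried in the RTP
header, and the limit of 15 reflects the 4-bit count of contributing
sources in that header.

```python
from dataclasses import dataclass
from typing import List

MAX_CSRC = 15  # the RTP header can identify at most 15 contributing sources


@dataclass
class Frame:
    """Hypothetical container for one resynchronized 20 ms audio frame."""
    ssrc: int            # identifier of the source that produced the frame
    samples: List[int]   # 16-bit linear PCM samples (assumed encoding)


@dataclass
class MixedFrame:
    """Hypothetical container for the mixer's output frame."""
    ssrc: int            # the mixer's own source identifier
    csrc: List[int]      # sources that contributed to this frame
    samples: List[int]


def mix(frames: List[Frame], mixer_ssrc: int) -> MixedFrame:
    """Sum time-aligned frames sample by sample, clipping to 16 bits."""
    if not frames:
        raise ValueError("need at least one frame to mix")
    length = len(frames[0].samples)
    if any(len(f.samples) != length for f in frames):
        raise ValueError("frames must cover the same interval")
    mixed = [
        max(-32768, min(32767, sum(f.samples[i] for f in frames)))
        for i in range(length)
    ]
    # Record who contributed so receivers can indicate the talker.
    contributors = [f.ssrc for f in frames][:MAX_CSRC]
    return MixedFrame(ssrc=mixer_ssrc, csrc=contributors, samples=mixed)
```

A receiver of the mixed stream sees the mixer as the packet's source,
so the list of contributors is the only way it can tell which
participants were talking in a given frame.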

Some of the intended participants in the audio conference may be
connected with high-bandwidth links but might not be directly
reachable via IP multicast. For example, they might be behind an
application-level firewall that will not let any IP packets pass. For
these sites, mixing may not be necessary, in which case another type
of RTP-level relay called a translator may be used. Two translators
are installed, one on either side of the firewall, with the outside
one funneling all multicast packets it receives through a secure
connection to the translator inside the firewall. The translator
inside the firewall then resends them as multicast packets to a
multicast group restricted to the site's internal network.
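One detail the translator pair has to settle is how packet boundaries
survive the trip through the connection. The sketch below assumes,
purely for illustration, that the secure connection is a byte stream
and that each RTP packet is prefixed with a 16-bit length in network
byte order. This specification does not prescribe that framing, and
the function names frame_packet and unframe_packets are hypothetical.
Encryption and the multicast socket handling are left out.

```python
import struct
from typing import List, Tuple


def frame_packet(packet: bytes) -> bytes:
    """Outside translator: prefix one packet with its 16-bit length."""
    if len(packet) > 0xFFFF:
        raise ValueError("packet too large for a 16-bit length prefix")
    return struct.pack("!H", len(packet)) + packet


def unframe_packets(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Inside translator: split received bytes back into whole packets.

    Returns the complete packets found and the unconsumed remainder,
    which the caller keeps until more bytes arrive.
    """
    packets = []
    offset = 0
    while len(buffer) - offset >= 2:
        (length,) = struct.unpack_from("!H", buffer, offset)
        if len(buffer) - offset - 2 < length:
            break  # the rest of this packet has not arrived yet
        start = offset + 2
        packets.append(buffer[start:start + length])
        offset = start + length
    return packets, buffer[offset:]
```

Each packet recovered by unframe_packets would then be sent unchanged
to the site's internal multicast group, so the packets reach internal
receivers exactly as the original senders produced them.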

Mixers and translators may be designed for a variety of purposes. An
example is a video mixer that scales the images of individual people
in separate video streams and composites them into one video stream
to simulate a group scene. Other examples of translation include the
connection of a group of hosts speaking only IP/UDP to a group of
hosts that understand only ST-II, or the packet-by-packet encoding
translation of video streams from individual sources without
resynchronization or mixing. Details of the operation of mixers and
translators are given in Section 7.

3. Definitions

RTP payload: The data transported by RTP in a packet, for example
audio samples or compressed video data. The payload format and
interpretation are beyond the scope of this document.

RTP packet: A data packet consisting of the fixed RTP header, a
possibly empty list of contributing sources (see below), and the
payload data. Some underlying protocols may require an
encapsulation of the RTP packet to be defined. Typically one
packet of the underlying protocol contains a single RTP packet,
but several RTP packets may be contained if permitted by the
encapsulation method (see Section 10).
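
The three parts of an RTP packet named in this definition can be
separated with a short parser. The sketch below is an illustration
only: it assumes the 12-octet fixed header layout given later in this
specification (version, padding, extension and CSRC count in the first
octet, marker and payload type in the second, then sequence number,
timestamp and SSRC), and the names RtpPacket and parse_rtp are
hypothetical.

```python
import struct
from typing import List, NamedTuple


class RtpPacket(NamedTuple):
    version: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    csrc: List[int]   # possibly empty list of contributing sources
    payload: bytes


def parse_rtp(data: bytes) -> RtpPacket:
    """Split one RTP packet into fixed header, CSRC list and payload."""
    if len(data) < 12:
        raise ValueError("shorter than the 12-octet fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack_from("!BBHII", data, 0)
    has_padding = bool(b0 & 0x20)
    has_extension = bool(b0 & 0x10)
    csrc_count = b0 & 0x0F
    offset = 12 + 4 * csrc_count
    if len(data) < offset:
        raise ValueError("truncated CSRC list")
    csrc = [int.from_bytes(data[12 + 4 * i:16 + 4 * i], "big")
            for i in range(csrc_count)]
    if has_extension:
        # Skip the header extension: 16 bits of profile data, then a
        # 16-bit length counted in 32-bit words.
        if len(data) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack_from("!H", data, offset + 2)
        offset += 4 + 4 * ext_words
    end = len(data)
    if has_padding and end > offset:
        end -= data[-1]  # the last octet counts the padding octets
    if end < offset:
        raise ValueError("inconsistent padding or extension length")
    return RtpPacket(b0 >> 6, bool(b1 & 0x80), b1 & 0x7F,
                     seq, ts, ssrc, csrc, data[offset:end])
```

A packet sent directly by a source carries an empty CSRC list; a
packet produced by a mixer lists the sources that were mixed.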