Published on: Friday 30th April 1999 By: Janus Boye
When you read a book, it is fairly easy to know when you are getting close to the end. You can feel with your fingers that the book is running out of paper, and when there are no more pages left, you know that you have reached the end. Because of the intertwingled nature of hypertext, this is not how things work when you move around through hyperlinks. In hypertext fiction, you might get lost, you might never reach the end, or you might get to the end without having met all the characters.
This article is going to cover how time, more than 10 years after the invention of the World Wide Web, is slowly becoming a first-class citizen of the Web. I'll try to cover 3 exciting technologies that have come out in the last two years: SMIL (Synchronized Multimedia Integration Language), ASF (Advanced Streaming Format) and HTML+Time (Timed Interactive Multimedia Extensions). In the conclusion, I'll share some thoughts on how HTML, the main building block of the World Wide Web, and the Web itself are being changed by the addition of the time dimension.
This article is only a high-level discussion of the benefits (and drawbacks) of these technologies. It is not meant to be very technical, but to serve as a short introduction to the topic and to give you some ideas on where the technology could be used (or abused). Please consult the useful links section if you want a more technical introduction to the different technologies.
Before I move on to describing how you can add time to your Web page and use these exciting new technologies, it is important first to figure out why and when adding time would be a good idea.
Currently the main difference between the well-known invention called TV and the more recent, less well-known invention called the World Wide Web is that users interact much more actively with the newer medium. If you start adding the time dimension to your Web pages, they will in turn become more like TV. Your users will be able to sit back from the keyboard and let go of the mouse (for a while) while your content is being presented.
For some content, like a presentation, this might not be a bad idea. You could combine text with time-based media such as audio, video and animation to create a multimedia presentation. As an example, you could apply for a job with some text covering your resume, some video from your last speech that would start after the 5th paragraph had been displayed, some audio where you present yourself, and an animation to show your skills. All this could be linked together, so that the company you send it to could choose either to sit back and enjoy the presentation or to walk through the links in the HTML, or even do both!
Unlike old-media productions, you do not need an expensive studio or time on cable networks. You basically just need your PC and a network connection to broadcast your work to the world using these new technologies.
Do your users want something like TV though? As always with new technologies, you should be careful to use them only when they are appropriate. Certainly in a presentation it could be appropriate, and in hypertext fiction it could also be useful to walk your users through a default path. In other cases, such as email, news, and e-commerce, other parameters such as accessibility, load time, and the trustworthiness of the page are much more important.
A main stumbling block before the Web can become like TV is the bandwidth problem. You will never want to watch WebTV on a 33.6 kbps modem! With text-only presentations that use simple events, this is not that big an issue, but if you start to integrate multimedia, don't leave your users waiting forever to get something they might not want. If you offer something that takes a long time to download, make sure to give your users a very good description of what they are getting and why it is worth the wait.
The new technologies covered in this article will make it possible for you to create extremely complicated multimedia content on the Web. Not only does the Web provide you with a universal information space, it now also lets you create synchronized multimedia. Technology will let you listen to radio on your handheld device, view previews of Star Wars Episode I on your PC, watch a press conference on CNN, explore animated 3D worlds and much more.
One of the main advantages of the technologies that came out in the last 2 years is that they use streaming technology. This makes it possible even for users on slow network connections to gain access to advanced multimedia content optimized for their connection. Previously you would have a link to an AVI file here, a WAVE file there, an MPEG video here and so on, and it would all take a very long time to download. Worse, you had to wait for the entire file to download before playback could begin. Streaming technology, on the other hand, lets multimedia servers send content in a continuous stream that can be decoded and played back shortly after being received. To use an analogy, streaming is to downloading as drinking milk straight out of the carton is to first pouring yourself a glass and then drinking. You don't have to wait, you can drink more than a glass without missing a beat, and there are no dishes to wash. What is important is that streaming opens the door for unlimited-length media.
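To make the difference concrete, here is a rough sketch in Python of how long a user waits before playback can begin in each case. The clip size, the start-up buffer size and the function names are my own illustrative assumptions, and the sketch ignores whether the clip's bit rate actually fits the connection.

```python
from typing import Iterator


def stream_chunks(total_bytes: int, chunk_bytes: int) -> Iterator[int]:
    """Yield chunk sizes one at a time, the way a media server sends a stream."""
    sent = 0
    while sent < total_bytes:
        size = min(chunk_bytes, total_bytes - sent)
        sent += size
        yield size


def seconds_until_playback(total_bytes: int, bytes_per_second: float,
                           buffer_bytes: int, streaming: bool) -> float:
    """Estimate the wait before playback can begin.

    A download waits for the whole file; a stream waits only for a small
    start-up buffer (both figures are illustrative assumptions).
    """
    needed = min(buffer_bytes, total_bytes) if streaming else total_bytes
    return needed / bytes_per_second


# Hypothetical clip: 2 MB fetched over a 33.6 kbps modem (4,200 bytes/second).
CLIP_BYTES = 2_000_000
MODEM_BYTES_PER_SECOND = 33_600 / 8

download_wait = seconds_until_playback(CLIP_BYTES, MODEM_BYTES_PER_SECOND,
                                       20_000, streaming=False)
stream_wait = seconds_until_playback(CLIP_BYTES, MODEM_BYTES_PER_SECOND,
                                     20_000, streaming=True)
print(f"download: {download_wait:.0f} s, streaming: {stream_wait:.1f} s")
```

Because `stream_chunks` is a generator, it never needs to know the full length in advance, which is the property that makes unlimited-length media possible.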
Let us now move on to briefly present the different ways in which you can add time to your Web pages.
Early work on SMIL (Synchronized Multimedia Integration Language) began back in December 1995, and the SMIL recommendation (http://www.w3.org/TR/REC-smil) came out on June 15, 1998. The standard was backed by big companies such as RealNetworks, Apple, Lucent/Bell Labs, Philips, Digital Equipment Corp., and Alcatel. Microsoft helped develop and author the standard, but just before the standard moved to the recommendation level at W3C, Microsoft chose to back out. Netscape was also a part of the working group that developed SMIL, but also dropped out. Microsoft later came out with HTML+Time. More about that later.
Even though SMIL does not end with ML (like all the other markup languages), it relies on XML, and is thus a markup language. This means that SMIL is not a complex programming language or, even worse, some proprietary technology -- you could even create the layout and design of SMIL presentations using a text editor!
SMIL, which is written as an XML application (see XML - What's in it for us?), is a language meant to schedule multimedia presentations where audio, video, text and graphics are combined in real time. All the different media elements are referenced from the SMIL file, similar to the way an HTML page references its images, applets, and other elements. You can, for example, control the precise time a sentence is shown and make it coincide with a video clip or a sound clip. Another good example could be a slide show followed by some text or sound.
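As a small illustration of how a SMIL file references and schedules its media, the following Python sketch builds a minimal presentation in which a caption appears five seconds into a video. The element and attribute names follow my reading of SMIL 1.0; the file names and region ids are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_presentation() -> str:
    """Return a minimal SMIL document as a string (file names are made up)."""
    smil = ET.Element("smil")

    # The head describes where things appear on screen.
    layout = ET.SubElement(ET.SubElement(smil, "head"), "layout")
    ET.SubElement(layout, "root-layout", width="320", height="300")
    ET.SubElement(layout, "region", id="video_area",
                  top="0", left="0", width="320", height="240")
    ET.SubElement(layout, "region", id="caption_area",
                  top="240", left="0", width="320", height="60")

    # The body describes when things happen. Children of <par> play in
    # parallel; the caption starts 5 seconds in and stays for 10 seconds.
    par = ET.SubElement(ET.SubElement(smil, "body"), "par")
    ET.SubElement(par, "video", src="speech.rm", region="video_area")
    ET.SubElement(par, "text", src="caption.txt", region="caption_area",
                  begin="5s", dur="10s")

    return ET.tostring(smil, encoding="unicode")


print(build_presentation())
```

Note that the media themselves stay in separate files; the SMIL document only says where and when each one plays.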
Another cool thing in SMIL is that you can set language choices for your presentation, enabling you to create the time line and layout just once while still serving an international audience (see the Tags for the Identification of Languages at http://www.ietf.org/rfc/rfc1766.txt).
Compared to older formats, such as AVI or MPEG, SMIL presentations let text appear outside the media files. This is particularly good for search engines, since they will be able to index this text as well, and thus make it easier for users to find exactly what they need. The separation of text from video and audio also increases accessibility, as users with text-only browsers are still able to access parts of the content.
Familiar-looking buttons such as stop, fast-forward, and rewind have also been built into the language, as well as additional functions such as random access (i.e. the presentation can be started anywhere) and slow motion.
The first commercial player/browser of SMIL to arrive was RealNetworks' RealPlayer G2 (http://www.real.com/g2). RealNetworks has implemented a large subset of the SMIL 1.0 specification in G2, but chose not to implement the interactive part of SMIL, which lets you create hyperlinked media elements (e.g. a linked table of contents). Hopefully this will be implemented in future versions of the RealPlayer. As an authoring tool, Allaire's HomeSite 4.0 (http://www.allaire.com/Products/HomeSite) has built-in support for some SMIL elements.
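The language choice can be pictured as a list of alternatives from which the player picks the first one that suits the user. The Python sketch below imitates that selection for a SMIL-style switch element. The attribute name system-language is taken from my reading of SMIL 1.0, the file names are hypothetical, and the matching rule is simplified to an exact comparison.

```python
import xml.etree.ElementTree as ET

# Hypothetical alternatives: Danish and French narration, English as fallback.
SWITCH_XML = """
<switch>
  <audio src="intro-da.rm" system-language="da"/>
  <audio src="intro-fr.rm" system-language="fr"/>
  <audio src="intro-en.rm"/>
</switch>
"""


def pick_source(switch_xml: str, user_language: str) -> str:
    """Return the src of the first alternative acceptable to the user."""
    switch = ET.fromstring(switch_xml)
    for child in switch:
        wanted = child.get("system-language")
        if wanted is None:
            # No language test on this alternative: it acts as the default.
            return child.get("src")
        languages = [code.strip() for code in wanted.split(",")]
        if user_language in languages:
            return child.get("src")
    raise LookupError("no alternative matches and no default was given")


print(pick_source(SWITCH_XML, "fr"))
print(pick_source(SWITCH_XML, "de"))
```

The time line and layout around the switch stay the same for every audience; only the selected media file differs.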
Once you've written some SMIL content, you also need to host it somewhere. HTTP was found not to be the best protocol to use on the server side, since it does not understand the temporal nature of streamed presentations and as a result cannot reliably serve files on a timed basis. So even though you can serve SMIL from HTTP servers, RTSP (Real Time Streaming Protocol) was built to understand time or, more specifically, time-stamps. RTSP also supports VCR-like functionality and multicasting, and is used in, among other products, the Netscape Media Server. See a review at http://serverwatch.internet.com/dtreview-nsmedia.html. To use an analogy, Web servers are to media servers as bicycles are to cars for getting around. Sure, you could take your bike on the highway, but you would be better off with your car.
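To show what "understanding time" looks like on the wire, here is a sketch in Python that formats an RTSP PLAY request starting 90 seconds into a clip. The request layout follows my understanding of the text-based RTSP format; the URL, sequence number and session id are hypothetical, and a real client would first have to set up the session with the server.

```python
def rtsp_play_request(url: str, cseq: int, session: str,
                      start_seconds: float) -> str:
    """Format a PLAY request that starts at a given point in the timeline."""
    lines = [
        f"PLAY {url} RTSP/1.0",
        f"CSeq: {cseq}",
        f"Session: {session}",
        # Normal play time (npt) range: from start_seconds to the end.
        f"Range: npt={start_seconds:g}-",
        "",  # blank line that ends the header block
        "",
    ]
    return "\r\n".join(lines)


request = rtsp_play_request("rtsp://media.example.com/talk.rm",
                            cseq=4, session="12345678", start_seconds=90)
print(request)
```

The Range header is what gives the VCR-like behaviour: asking for a different start time is the equivalent of fast-forward or rewind.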
Perhaps one of the best features of SMIL is that, since it is a text-based language, you can create code on the fly using a database. This is how many Web pages are already created today, and it allows you to offer personalized streaming multimedia.
Last, but not least, it is important to note that SMIL is not intended to replace any of the existing multimedia environments out there. Rather, it is a universal glue for joining all kinds of different formats and types of media in interesting and, even more important, useful ways, using a vendor-neutral language.
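As a sketch of what generating SMIL from a database could look like, the Python example below reads a playlist from an in-memory SQLite table and writes out a sequence of clips for one user. The table layout, user names and file names are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def personalized_smil(connection: sqlite3.Connection, user: str) -> str:
    """Build a SMIL sequence from the clips stored for one user."""
    rows = connection.execute(
        "SELECT src, seconds FROM clips WHERE user = ? ORDER BY position",
        (user,),
    ).fetchall()

    smil = ET.Element("smil")
    # Children of <seq> play one after another.
    seq = ET.SubElement(ET.SubElement(smil, "body"), "seq")
    for src, seconds in rows:
        ET.SubElement(seq, "video", src=src, dur=f"{seconds}s")
    return ET.tostring(smil, encoding="unicode")


# Hypothetical data: each user has an ordered playlist of clips.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE clips (user TEXT, position INTEGER, src TEXT, seconds INTEGER)"
)
connection.executemany(
    "INSERT INTO clips VALUES (?, ?, ?, ?)",
    [
        ("anna", 2, "sports.rm", 30),
        ("anna", 1, "headlines.rm", 60),
        ("ben", 1, "weather.rm", 20),
    ],
)

print(personalized_smil(connection, "anna"))
```

Each user gets a different document from the same template, which is exactly how database-driven HTML pages are produced.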
As opposed to SMIL, which is a plain-text markup language, ASF (Advanced Streaming Format) is a binary, object-based file format. ASF is supposed to replace the older AVI (Audio Video Interleave) format, and it is backed very strongly by Microsoft, but also by about 100 other companies including IBM.
Where AVI (and also QuickTime, MPEG-1 and MPEG-2) was mainly built for local playback, and not designed to handle the precarious network pipes that characterize the Internet today, ASF is, just like SMIL, designed for efficient media playback over networks.
The design roots of ASF stem from the RIFF format (for example, AVI and WAVE), which IBM and Microsoft defined over a decade ago. Several different companies later defined other file formats (e.g. QuickTime from Apple) to address the limitations within RIFF. The goal of ASF, when Microsoft released the first draft of the specification in August 1997, was to build a highly flexible and expandable multimedia file format to simultaneously support the diverse needs of local playback (for example, CD-ROM, DVD, and hard disk), HTTP playback, and media server streaming, all utilizing the experience of RIFF and the other file formats created to address its limitations.
This continues to be the goal today, as you are slowly starting to see ASF files on more and more sites. When the first draft came out in 1997, Microsoft made their NetShow streaming server and client products ASF capable, and the newer Windows Media Player (http://microsoft.com/windows/mediaplayer) also supports ASF. The Windows Media Player comes built into Windows 98 and Microsoft's Internet Explorer 4 and 5, so many users already have the player.
Under the hood, an ASF file is based on a sequence of objects. While some objects, such as the header object that provides global information about the content in the file, are required, others are not. The file format is extensible, as any object type can be extended by adding sub-objects, and servers that do not recognize a sub-object can safely ignore it, just as browsers safely ignore any HTML tags that they do not understand.
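The idea of safely skipping unknown objects can be sketched in a few lines of Python. The sketch assumes a simplified layout in which every object starts with a 16-byte identifier followed by a 64-bit little-endian size that covers the whole object. That is how I understand the general shape of ASF objects, but the identifiers and payloads below are made up and this is not a parser for real ASF files.

```python
import struct
import uuid
from typing import Callable, Dict, List

OBJECT_HEADER_BYTES = 24  # 16-byte identifier + 8-byte size

# Hypothetical identifiers, not the ones used by real ASF files.
HEADER_ID = uuid.UUID(int=1)
DATA_ID = uuid.UUID(int=2)
FUTURE_ID = uuid.UUID(int=99)  # an object type this reader does not know


def pack_object(object_id: uuid.UUID, payload: bytes) -> bytes:
    """Serialize one object: identifier, total size, then the payload."""
    size = OBJECT_HEADER_BYTES + len(payload)
    return object_id.bytes_le + struct.pack("<Q", size) + payload


def walk_objects(data: bytes,
                 handlers: Dict[uuid.UUID, Callable[[bytes], str]]) -> List[str]:
    """Visit every object in turn, skipping the ones nobody handles."""
    results = []
    offset = 0
    while offset + OBJECT_HEADER_BYTES <= len(data):
        object_id = uuid.UUID(bytes_le=data[offset:offset + 16])
        (size,) = struct.unpack_from("<Q", data, offset + 16)
        if size < OBJECT_HEADER_BYTES or offset + size > len(data):
            raise ValueError("corrupt object size")
        payload = data[offset + OBJECT_HEADER_BYTES:offset + size]
        handler = handlers.get(object_id)
        if handler is not None:
            results.append(handler(payload))
        # Known or not, the size field tells us where the next object starts.
        offset += size
    return results


stream = (pack_object(HEADER_ID, b"title=Concert")
          + pack_object(FUTURE_ID, b"something new")
          + pack_object(DATA_ID, b"\x00" * 10))
handlers = {
    HEADER_ID: lambda payload: "header: " + payload.decode("ascii"),
    DATA_ID: lambda payload: f"data: {len(payload)} bytes",
}
print(walk_objects(stream, handlers))
```

Because every object announces its own size, a reader can step over an object it has never seen without understanding its contents.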
One of the interesting aspects of ASF is that it supports component download and prioritization. Component download is achieved by storing stream-specific information about playback components (for example, decompressors and renderers) in the file header. If the user does not have the needed version, the user can be offered a download option, or the download can even happen automatically, so that the user does not have to worry. The prioritization capabilities in ASF allow authors to establish priorities among the different data streams. For example, in a broadcast from a BB King concert, the sound might be more important than the video, which in turn might be more important than the lyrics.
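One way to picture prioritization is as a choice of which streams to keep when the connection cannot carry them all. The Python sketch below keeps streams in priority order for as long as they fit. The bit rates and the selection rule are my own assumptions for illustration, not the mechanism defined by the ASF specification.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Stream:
    name: str
    priority: int  # 1 is the most important
    kbps: float    # hypothetical bit rate


def select_streams(streams: Iterable[Stream], available_kbps: float) -> List[str]:
    """Keep the most important streams that fit into the available bandwidth."""
    chosen = []
    remaining = available_kbps
    for stream in sorted(streams, key=lambda s: s.priority):
        if stream.kbps <= remaining:
            chosen.append(stream.name)
            remaining -= stream.kbps
    return chosen


# The concert example: sound first, then video, then lyrics.
concert = [
    Stream("video", priority=2, kbps=100.0),
    Stream("lyrics", priority=3, kbps=1.0),
    Stream("audio", priority=1, kbps=16.0),
]

print(select_streams(concert, available_kbps=28.0))
print(select_streams(concert, available_kbps=200.0))
```

On the slow connection the video is dropped while the sound and the lyrics still get through, which is the behaviour the author of the content asked for.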
In comparison to SMIL (and HTML+Time - more on that later), ASF does not describe where different media streams should appear on the screen. At first glance, this might seem to be a shortcoming of ASF, but considering ASF's focus on storage and transmission, this is actually one of ASF's strengths, since it helps make ASF more efficient and much less complicated, as ASF does not worry about layout information in each and every stream.
Lastly, ASF also supports the language capabilities that SMIL offers, thus allowing you to create truly international multimedia content.
Citing failings with SMIL, Microsoft teamed up with Compaq and Macromedia to develop HTML+Time (Timed Interactive Multimedia Extensions), which was submitted to the W3C in September 1998. HTML+Time mainly tries to extend SMIL into the browser, without the need for any media server. It extends HTML by adding a set of time-based attributes to its existing tag set.
The two main priorities in HTML+Time were to apply time attributes to any arbitrary HTML element, and to use these same attributes to describe how that element integrates with other multimedia elements in the presentation. This functionality includes specifying when a streaming media element is supposed to begin, or how long it should last.
The trouble with SMIL that HTML+Time tries to fix is that SMIL elements work in their own environment. HTML+Time takes this to the next level by letting all the elements in a page interact with each other. As an example of this, HTML+Time lets you apply different timing to individual bullets in an HTML list, where SMIL does not. HTML+Time also lets Web developers specify how long elements in basic HTML pages should remain on a page before being replaced by other elements, for instance in a succession of images.
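The bullet example can be sketched as follows. This Python function writes an HTML list in which each bullet carries its own start time. I am assuming the t:begin attribute name from my reading of the HTML+Time submission; whether and how a given browser honours it depends on its implementation, so treat the markup as illustrative.

```python
from html import escape
from typing import Sequence


def timed_list(items: Sequence[str], seconds_apart: int) -> str:
    """Return an HTML list whose bullets are scheduled one after another."""
    lines = ["<ul>"]
    for index, item in enumerate(items):
        begin = index * seconds_apart
        # t:begin is an assumed HTML+Time attribute name (see the lead-in).
        lines.append(f'  <li t:begin="{begin}">{escape(item)}</li>')
    lines.append("</ul>")
    return "\n".join(lines)


print(timed_list(["SMIL", "ASF", "HTML+Time"], seconds_apart=3))
```

The point is that the timing sits directly on ordinary HTML elements, rather than in a separate presentation file.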
Since HTML+Time is just an extension to HTML 4.0, it is, like SMIL, based on plain text. It uses HTML for display, CSS (see An Introduction to Cascading Style Sheets) for positioning and style reuse, XML for semantic data, and Namespaces for qualifying XML tags and HTML attributes. Through the HTML DOM (Document Object Model - see A Gift of "Life" : The Document Object Model), all the elements in a page can interact with each other, and participate in the presentation.
The capability to accelerate or decelerate a presentation has been downplayed in HTML+Time. In the current submission, it is only mentioned in Appendix D (http://www.w3.org/TR/1998/NOTE-HTMLplusTIME-19980918#AppendixD), which states that the capability to adjust pace should be minimal. The W3C staff comment (http://www.w3.org/Submission/1998/14/Comment) on HTML+Time strongly disagrees with this recommendation for usability reasons, and it will be interesting to see whether this changes in future versions.
One major advantage of HTML+Time is that it has already been built into a browser, namely Microsoft Internet Explorer 5. The implementation in IE5 is experimental, but it lets you start working with HTML+Time right away. You can go right ahead and add images, video and sounds to your HTML page, and using HTML+Time you can synchronize them over a specified amount of time, just by adding a few new attributes to existing HTML elements and a few new XML-based elements.
It is interesting to note that while the Web evolved around a set of standards (HTML, HTTP and TCP/IP; for more info see WWW - How it all began) that made Web servers and browsers spring up like mushrooms, multimedia content has emerged on a different path. Instead of using an established framework of data formats and protocols, early developers found themselves in the Wild West. This put several incompatible formats on the stage, such as ActiveX, VRML, and QuickTime, and it is only in the last two years that standard languages and protocols have started to appear.
The new technologies all aim at combining many old analog media into one new digital medium, and they are all independent of their environment, meaning that content can be created on any machine and any operating system, served from any server, and later received and played back on any client machine.
While SMIL and HTML+Time are built more for the Web than ASF is, SMIL has the problem of different proprietary techniques being worked into different browsers, just as we've seen happen to HTML in the last few years. To be fair to SMIL, you should think of it as HTML 1.0: a proof of concept more than a full-fledged system. SMIL currently does not work well with non-linear presentations, and its ability to skip around in the timeline is buggy at best.
Even though SMIL can include ASF files, make sure to find out which media formats your chosen SMIL player supports before you start developing SMIL presentations.
As previously stated HTML+Time was built to solve some of the outstanding issues with SMIL. Realistically HTML+Time will not replace SMIL. Where SMIL offers the potential for lightweight clients and support of legacy browser classes as well as next generation browsers, HTML+Time is intended for applications that require next generation browsers for interpretation.
Adding time to Web pages moves the Web much closer to movie making. When the very first movies came out towards the end of the 19th century, they were made by cameramen, because they understood the equipment. It took several years before the director role was invented. The director is the person who has to bring all the different skills together to make a successful movie, one that fully utilizes the medium, and who understands more than just the technical aspect of it.
Currently most Web pages are not only created by people with a strong technical background, but too often also managed by people with a technical background. Before the Web will truly take off, and thus before time will become a first-class citizen on the Web, we need to re-invent the Director role and put it to work on the Web.
With all the standards in place, creating a streaming presentation is no longer an all-or-nothing decision. You do not have to commit to a vendor. You can mix and match as much as you want, and your users will be able to access and view your presentation using a different browser, even on a different operating system.
Today streaming presentations are already on some 400,000 Web pages. As it becomes easier and easier for everybody to broadcast, expect to see nothing less than a multimedia revolution, and millions of streaming presentations on the Web.