Published on: Saturday 16th May 1998 By: Janus Boye
RDF - the Resource Description Framework - is a foundation for processing metadata; it provides interoperability between applications that exchange machine-understandable information on the Web. RDF emphasizes facilities to enable automated processing of Web resources. RDF metadata can be used in a variety of application areas, for example: in resource discovery, to provide better search engine capabilities; in cataloging, for describing the content and content relationships available at a particular web site, page or digital library; by intelligent software agents, to facilitate knowledge sharing and exchange; in content rating; in describing collections of pages that represent a single logical "document"; for describing intellectual property rights of Web pages; and in many others. RDF with digital signatures will be key to building the "Web of Trust" for electronic commerce, collaboration, and other applications.
Before we move on to the details, I need to tell you that RDF is still a draft document, and a draft is a work in progress. The Working Groups working on RDF at the W3C have not yet reached full consensus on all parts of RDF, and are continuing to refine the draft.
Even though it might be a bit premature to tell the story of a technology that has barely started, I'll give it a shot.
No one individual or organization invented RDF; it is very much a collaborative design effort. RDF started as an extension of the PICS content description technology. It now also draws upon the XML design, and technology submissions such as Microsoft's XML-Data paper, SiteMap proposals, and the Dublin Core/Warwick Framework have also influenced the RDF design. Later on I'll show you a Dublin Core example.
Development of PICS was motivated by the anticipation of restrictions on the Internet, such as some recent US legislation (the Communications Decency Act and its subsequent overturning by the US Supreme Court). More on PICS later.
One of the earliest and most important metadata systems on the Web is the Meta Content Framework (MCF), a specification first introduced by Apple Computer in September 1996 and still in use by hundreds of websites today. MCF was developed as a Navigator plug-in called "HotSauce" that was limited to providing site map applications, since it was not extensible. Netscape was among the first industry partners to support MCF, and initially extended it to use XML. Netscape later submitted a formal proposal to the W3C in June 1997, and helped form a W3C working group on RDF in September 1997.
Since then, the working group has been drawing on Netscape's MCF proposal, as well as the W3C Recommendation PICS, to define a new framework for viewing, manipulating and associating networked collections of information. Several existing W3C activities, including submissions on managing personal user preferences through OPS (Open Profiling Standard), defining push content channels using CDF (Channel Definition Format), as well as parental controls described by PICS, are among the more narrowly focused applications now addressed by RDF.
Let's start off by talking about metadata. Metadata is "data about data" or "information describing content." In HTML we have:
<meta name="keywords" content="rdf,xml,w3c">
and also the:
<meta name="description" content="This page is about RDF">
The first line lists keywords for the page, which search engines can use. The next line gives a description of the site: "This page is about RDF". This is also used by search engines; when they display search results, they most often show the description line.
In the context of RDF, metadata is "data describing web resources". RDF uses XML as the encoding syntax for the metadata. The resources being described by RDF are, in general, anything that can be named via a URI. The broad goal of RDF is to define a mechanism for describing resources that makes no assumptions about a particular application domain, nor defines the semantics of any application domain. The definition of the mechanism should be domain neutral, yet the mechanism should be suitable for describing information about any domain.
If you are having trouble with "information overload", you should look into metadata, as it will give you more control over content. A big problem with HTML is that there are too many different interfaces to metadata information. On one page, an author might use the following piece of code:
<meta name="Author" content="Janus Boye">
and some other author who wanted to display the same information could instead use:
<meta name="AuthorName" content="Janus Boye">
This shows some of the problems that search engines are currently facing. There is no standard. What is really meant by Author? What is AuthorName? What's the difference? And which Janus Boye are we talking about? The Janus Boye from Denmark, or some other Janus Boye?
If you are a publisher (and isn't everybody on the Internet a publisher?), you will want to look into metadata, so that you can provide more information about your content. Today there is no complete and standard way to describe all aspects of website content.
There is currently also much redundancy: describing site content requires multiple standards and multiple files. Current Internet metadata is not widely used by publishers, which is perhaps caused by its lack of extensibility.
The systems currently in use are proprietary, incompatible, and not widely supported by software vendors.
Another point that makes metadata interesting for publishers is that RDF introduces a uniform query capability for resource discovery. This could give publishers much more information about their competition, without drowning them in "information overload".
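To make the naming problem concrete, here is a minimal sketch in Python of a naive indexer that reads meta tags from the two pages above. The class and function names (MetaCollector, read_meta) are made up for illustration, and the sketch assumes an indexer that only knows the key "Author"; it is not how any particular search engine works.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name is not None:
            self.meta[name] = attributes.get("content", "")


def read_meta(html):
    """Return a dict of the meta name/content pairs found in an HTML string."""
    collector = MetaCollector()
    collector.feed(html)
    return collector.meta


# The two competing conventions from the text.
page_a = '<head><meta name="Author" content="Janus Boye"></head>'
page_b = '<head><meta name="AuthorName" content="Janus Boye"></head>'

meta_a = read_meta(page_a)
meta_b = read_meta(page_b)

# An indexer that only looks for "Author" finds the author of one page only.
authors = [meta.get("Author") for meta in (meta_a, meta_b)]
print(authors)
```

Both pages carry the same information, but without an agreed vocabulary the software has no way of knowing that the two names mean the same thing.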
PICS is a mechanism for communicating ratings of web pages from a server to clients; these ratings, or rating labels, contain information about the content of web pages: for example, whether a particular page contains a peer-reviewed research article, or was authored by an accredited researcher, or contains sex, nudity, violence, foul language etc. Instead of being a fixed set of criteria, PICS introduced a general mechanism for creating rating systems. Different organizations could rate content based on their own objectives and values, and users - for example, parents worried about their children's web usage - could set their browser to filter out any web pages not matching their own criteria.
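The filtering idea can be sketched in a few lines of Python. This is only an illustration of the principle: the category names, the numeric levels and the function is_allowed are assumptions made up for this example, and real PICS labels use their own label syntax and rating services.

```python
def is_allowed(label, limits):
    """Return True if every rated category stays within the user's limit.

    label  -- ratings attached to a page, e.g. {"violence": 1}
    limits -- highest level the user accepts per category
    Categories the user has set no limit for are not restricted.
    """
    for category, level in label.items():
        if category in limits and level > limits[category]:
            return False
    return True


# Hypothetical limits set by a parent in the browser.
parental_limits = {"violence": 1, "language": 2}

# Hypothetical labels issued by a rating organization for two pages.
research_article = {"violence": 0, "language": 0}
action_game_site = {"violence": 3, "language": 1}

print(is_allowed(research_article, parental_limits))
print(is_allowed(action_game_site, parental_limits))
```

Because the limits are just data, two users can apply different values to the same labels, which is the point of PICS being a general mechanism rather than a fixed set of criteria.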
One of the requirements for the RDF design is that it be able to express everything that a PICS-1.1 (Platform for Internet Content Selection) label can express, and that it be possible to automatically translate PICS-1.1 labels into RDF format without loss of information. Any future technical work on PICS will evolve it toward using RDF. The W3C PICS Interest Group is chartered to decide when this transition is appropriate. PICS-1.1 will remain a supported W3C Recommendation for software and Web content that use it, for as long as the market demands.
It is expected that PICS-1.1 and an equivalent expression of PICS ratings in RDF will both be useful for quite some time.
A sitemap is only one of the many things RDF offers, but it is very easy to implement.
As RDF is able to solve the complex problem of managing information across multiple yet incompatible file formats, you'll be able to automatically generate a sitemap in software that uses RDF. This could be a sitemap of your desktop PC that would give you an easy-to-use interface to unify all of the information you need, regardless of whether it resides on the Internet, a local network, in a legacy database, in an e-mail thread, or on your hard drive. This could be the true Web-desktop integration.
The SiteMap could also be an automatically updated, around-the-clock SiteMap of your site. RDF would then integrate the metadata into the sitemap, giving the navigational look and feel you need, plus all the information you want. When you move your mouse over a link, you could have a pop-up window with a description of the site, and the SiteMap could also provide search capabilities and much more.
The SiteMap would be written using XML syntax in standard ASCII text, either using an editor or special tools. It could be presented either as an applet or as a combination of HTML, GIF and JPEG files. It would use the "text/xml" MIME type, and would also have the capability to reference additional site maps (it might have one high-level file, then separate files for each area of the site).
Without trying to get too complicated, I'll spend this chapter telling you about the different RDF components. I'll finish off with a nice looking example, so please stay with me.
At the core of RDF we have the RDF Data Model for representing named properties and their values. These properties serve both to represent attributes of resources (and in this sense correspond to usual attribute-value pairs) and to represent relationships between resources. The RDF data model is a syntax-independent way of representing RDF expressions.
The RDF Syntax is for expressing and transporting this metadata in a manner that maximizes the interoperability of independently developed web servers and clients. The syntax uses the eXtensible Markup Language (XML).
Last, but not least, RDF Schemas are a collection of information about classes of RDF nodes, including properties and relations. RDF schemas are specified using a declarative representation language influenced by ideas from knowledge representation (e.g., semantic nets, frames, and predicate logic), as well as database schema representation models such as binary relational models and graph data models.
RDF in itself does not contain any predefined vocabularies for authoring metadata. It is expected, though, that standard vocabularies will emerge; after all, this is a core requirement for large-scale interoperability. Anyone can design a new vocabulary; the only requirement for using it is that a designating URI is included in the metadata instances using this vocabulary.
Without going further into the core, or starting to talk about Nodes, PropertyTypes, or Triples, I'll show you an example instead that illustrates all these things:
What this picture tells you, is that "John Smith is the Author of the document whose URL is http://www.bar.com/some.doc". In RDF syntax this would be:
<?xml:namespace name="http://docs.r.us.com/bibliography-info/" as="BIB"?>
<?xml:namespace name="http://www.w3.org/TR/WD-rdf-syntax#" as="RDF"?>
<RDF:RDF>
  <RDF:Description RDF:HREF="http://www.bar.com/some.doc">
    <BIB:Author>John Smith</BIB:Author>
  </RDF:Description>
</RDF:RDF>
The syntax above represents the named properties and their values (the Data Model), using the schemas named in the first 2 lines, which provide more information about the different classes (Author and Description).
The first 2 lines tell the user agent that we'll be using the schemas (or vocabularies) from the 2 URLs. The first URL is on our own server, and contains information about the tags that we've created and added to RDF. The next line is the W3C schema, and it contains the tags that are recommended by the W3C. The schemas tell the browser which tags are legal and what they mean, and therefore they are very important.
After that, the actual RDF code starts. The Description line tells the user agent that we are describing the document at http://www.bar.com/some.doc. The Author line tells the browser that John Smith wrote it. The last 2 lines close the RDF, in the same way you would close HTML code.
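As a rough illustration of how software might turn such a description back into the data model, here is a small Python sketch that reads the John Smith example and produces (resource, property, value) triples. One assumption to be aware of: the draft's <?xml:namespace ...?> lines are replaced here by ordinary xmlns: attributes so that a standard XML parser accepts the text, and the function name extract_triples is made up for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax#"
BIB_NS = "http://docs.r.us.com/bibliography-info/"

# Same content as the example above, with the namespaces declared as
# xmlns: attributes (an assumption made so the parser can read it).
DOC = f"""
<RDF:RDF xmlns:RDF="{RDF_NS}" xmlns:BIB="{BIB_NS}">
  <RDF:Description RDF:HREF="http://www.bar.com/some.doc">
    <BIB:Author>John Smith</BIB:Author>
  </RDF:Description>
</RDF:RDF>
"""


def extract_triples(xml_text):
    """Return a list of (resource, property, value) tuples."""
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        resource = description.get(f"{{{RDF_NS}}}HREF")
        for prop in description:
            # ElementTree writes tags as "{namespace}LocalName".
            namespace, local_name = prop.tag[1:].split("}")
            value = (prop.text or "").strip()
            triples.append((resource, namespace + local_name, value))
    return triples


triples = extract_triples(DOC)
for triple in triples:
    print(triple)
```

The property is reported with its full schema URL in front, which is what lets a program tell our own Author tag apart from an Author tag defined in somebody else's vocabulary.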
As you'll see in the next chapter, the code is not much different in Dublin Core.
One obvious application for RDF is in the description of web pages. This is one of the basic functions of the Dublin Core (DC) initiative. The Dublin Core is a set of 15 metadata elements (such as Title, Subject, Publisher etc.) used to describe resources on the Web. Dublin Core has gathered experts from the library world and from the networking and digital library research communities. Dublin Core is intended to be usable by non-catalogers as well as by those with experience of formal resource description models.
Dublin Core is currently being used in many places, and is one of the foundations that RDF is building on.
I'll now show you an example that uses the RDF syntax to encode Dublin Core metadata within HTML documents.
An inline Dublin Core example of this article would then have the following syntax:
<head>
<xml>
<?namespace href = "http://www.w3.org/schemas/rdf-schema" as = "RDF">
<?namespace href = "http://www.purl.org/RDF/DC/" as = "DC">
<RDF:RDF>
  <RDF:Assertion
    RDF:HREF = "...uri...."
    DC:Title = "...value..."
    DC:Creator= "...value..." />
</RDF:RDF>
</xml>
</head>
A real world example would then look like this:
<head>
<xml>
<?namespace href = "http://www.w3.org/schemas/rdf-schema" as = "RDF">
<?namespace href = "http://www.purl.org/RDF/DC/" as = "DC">
<RDF:RDF>
  <RDF:Description
    RDF:HREF="http://purl.org/metadata/dublin_core_elements"
    DC:Title = "The RDF article"
    DC:Creator = "Janus Boye"
    DC:Subject = "RDF, metadata, w3c"
    DC:Description = "This document tries to give some sort of idea, of what RDF has to offer you"
    DC:Publisher = "Internet Related Technologies"
    DC:Format = "text/html"
    DC:Type = "Technical Report"
    DC:Language = "en"
    DC:Date = "1998-05-05" />
</RDF:RDF>
</xml>
</head>
If you wanted to store the above text in a separate file, you could do so, and instead include the following piece of code in your HTML page:
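If you have many pages to describe, you would probably generate this markup rather than type it. The following Python sketch builds the Description element from a plain mapping of Dublin Core element names to values. The function name dublin_core_description and the example URL are made up for illustration, and the output follows the draft syntax shown above, which may well change.

```python
from xml.sax.saxutils import quoteattr


def dublin_core_description(uri, elements):
    """Build an RDF:Description element with one DC: attribute per element.

    uri      -- the resource being described
    elements -- mapping of Dublin Core element names to values
    """
    attributes = [f"RDF:HREF={quoteattr(uri)}"]
    for name, value in elements.items():
        # quoteattr adds the quotes and escapes characters such as & and <.
        attributes.append(f"DC:{name}={quoteattr(value)}")
    return "<RDF:Description\n    " + "\n    ".join(attributes) + " />"


# Hypothetical page URL; the element values are taken from the example above.
record = dublin_core_description(
    "http://www.example.org/articles/rdf.htm",
    {
        "Title": "The RDF article",
        "Creator": "Janus Boye",
        "Language": "en",
        "Date": "1998-05-05",
    },
)
print(record)
```

Keeping the metadata in a mapping like this also makes it easy to reuse the same values elsewhere, for example in ordinary HTML meta tags.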
<head> <LINK REL="meta" HREF="http://www.irt.org/articles/intro.rtf"> </head>
Once the web has been sufficiently "populated" with rich metadata, what can we expect? First, searching on the web will become easier as search engines have more information available, and thus searching can be more focused. Doors will also be opened for automated software agents to roam the web, looking for information for us or transacting business on our behalf. The web of today, the vast unstructured mass of information, may in the future be transformed into something more manageable - and thus something more useful.
In the future non-PC devices could also benefit, such as set-top electronic program guides defined in RDF.
The interest from the large browser vendors gives us hope that large-scale development of tools that understand RDF will take place; this in turn should lead to the widespread adoption of RDF on the web.
Netscape has announced that RDF will be a key part of the "Aurora" component of version 5 of Netscape Communicator. This component is still in beta testing, but was previewed at Seybold '97. "Aurora" will help users organize and manage all their information, allowing them to integrate content from all over the place, and I think this is a nice way of illustrating how RDF will change the flow of information.
For Internet content providers or corporate developers who manage Intranets, using RDF can help provide a simple solution to a complex problem. For example, a developer will be able to create and deploy a simple RDF file that indexes a Web site, enabling users to see an entire map of the site. This could be viewed in a customizable Java applet that would allow users to integrate their own information, such as bookmarks, local files and much more.
RDF is also not limited to providing site maps or channel definitions. It can be used for any metadata application. Since there are currently no predefined vocabularies, you could build your sitemap in MCF today, and convert to RDF tomorrow.
In short, RDF has the power to elevate the status of the web from machine-readable to something we might call machine-understandable, and to do for applications what HTML did for content.