
Perspectives of XML in E-Commerce


Published on: Sunday 3rd December 2000 By: Pankaj Kamthan and Hsueh-Ieng Pai

Introduction
Advantages of XML Towards E-Commerce
- Advantages to Businesses
- Advantages to Consumers
The Infrastructure for XML in E-Commerce
Applications of XML in E-Commerce
Issues Facing Deployment of XML in E-Commerce
Conclusion
References
Appendix : "Webcasting" in E-Commerce Using XML-Based Channels

Introduction

Electronic Commerce (E-Commerce) includes all aspects of business and market processes enabled by the Internet and World Wide Web (Web) technologies. In the past few years, E-Commerce has benefited tremendously from the Web, with several companies offering products/services via the Internet. Conversely, E-Commerce has been a major factor in the evolution of the Web itself. Recent surveys predict that the dramatic growth of both Business-to-Business (B2B) and Business-to-Consumer (B2C) E-Commerce will continue.

The Extensible Markup Language (XML) is a meta-language that provides the syntax for markup languages. Since XML became a standard in the form of a W3C Recommendation about two years ago, applications based on it have grown rapidly. It is therefore useful to examine the current state of the interplay between XML and E-Commerce, as well as the problems that need to be overcome to maintain this synergy.

In the next section, we discuss the advantages that XML offers to businesses involved in E-Commerce and to their customers.

Advantages of XML Towards E-Commerce

XML offers several advantages both to businesses and to their customers.

Advantages to Businesses

From a business's perspective, XML offers the following benefits:

Standardization. Standardization in information representation and transfer is crucial to both B2B and B2C E-Commerce. XML is a platform-independent, application-independent, and vendor-neutral mechanism. XML relies on other technologies, in particular SGML for syntax, URIs for name identifiers, EBNF for grammar, and Unicode for character encoding, all of which are standards.
Manageability. Because the data is independent of any particular platform, application or vendor, it can be transformed to produce different types of output for different media (Web browser, paper, CD-ROM) without modifying the original content. When modifications are required, only the original version of the content needs to be edited before republishing to the various target media. This leads to efficiency and ease of maintenance, without the inherent problems of version control and the effort required to modify medium-specific document versions. It also allows authors to concentrate on authoring rather than formatting, and thus to be more productive.
Longevity. Proprietary/binary formats only last as long as the systems that support them. XML data exists as plain text, which gives it a longer life span, with future readability and reuse. Even if a system becomes obsolete, the data will live on and remain accessible in the long term.
Business-to-Business Communication. Conducting E-Commerce requires communicating with other companies, which often poses a challenge. XML simplifies business-to-business communication, particularly in vertical industries, for the following reasons: (1) The only thing that has to be mutually agreed upon is the XML vocabulary that will be used to represent data. (2) Neither company has to know how the other's back-end systems (platforms, operating systems, programming languages) are organized, which avoids extra technical burden and preserves each company's privacy. All that is required is that each company develop the mapping that transforms XML documents into the internal format used by its back-end systems. (3) An XML-based solution is scalable: when another partner is added, the host company does not need to interact with the new partner's systems. All that is required is that the new partner follow the protocol (the XML vocabulary).
Freedom of Extensibility. XML, as a meta-language, provides a standard framework for creating business-oriented markup vocabularies. This was much more difficult to do with its predecessor, the Standard Generalized Markup Language (SGML), due to its complexity and the lack of widely available technical expertise.
Development. XML is simpler than SGML and relatively easy to implement. This has the following practical implications: (1) One of the prime advantages of XML is that generic programs, such as parsers, can be developed that work with any XML vocabulary. Even though the programs themselves could be platform-dependent, the XML documents are plain text, and thus (if they are in "pure" ASCII) platform-independent. A programmer can therefore use tools from different vendors to process the same data. With several XML processors already available, the programmer is also freed from writing parsers from scratch and can concentrate on other aspects of the application domain. (2) XML is often referred to as "portable data" in the sense that its parsing is application-independent: one XML parser can read every well-formed XML document. (3) The time and effort spent setting up the programming environment (APIs, libraries, modules, software) can be reused across different XML vocabularies. (For example, once the Apache Web server is customized with Xerces-J, it can be used for multiple purposes.) (4) The skills attained in a project on one specific vocabulary are often transferable to others. (For example, having learnt how to write XSLT style sheets for transforming SVG to XHTML, the transition to transforming XML-RPC to SOAP is not steep.)
Structure-Presentation Decoupling and Authoring. Vocabularies based on XML syntax can keep the structure of the content separate from its presentation. This has long-term advantages, particularly for document maintenance and for automated machine processing (for which presentation is irrelevant). It is a major improvement over HTML, which allows structure and presentation to be mixed, both implicitly and explicitly (it has presentation-oriented tags). This capability has a number of far-reaching benefits: (1) With XML, part of the processing burden can be shifted from the server to the client side. Browsers can do much of the work: the content can be manipulated and rearranged (for example, items in an XML-encoded catalog can be sorted by price, availability, and size), calculations can be performed to generate extra content on the fly, and style sheets can provide the presentation. (2) When supported by sufficient prose and documentation, XML documents can provide useful semantic clues, making searching, indexing, and locating information more accurate.
Human-to-Machine and Machine-to-Machine Interfaces. XML provides both machine-to-machine and human-to-machine interfaces via its "data-document dichotomy": XML is a data representation that has the characteristics of a document. Such an XML document could be a file, a record in a relational database, an object delivered by an Object Request Broker, or a stream of bytes arriving at a network socket. This concept is very powerful because it implies that the same information can be processed (data view) or presented (document view) in the same application at the same time. Even though large documents are expected to be processed by machines, they remain human-readable.
Internationalization. A major advantage of conducting business on the Web is that it broadens the customer base globally, without the need for physical office locations. However, in order to communicate, a business must still "speak" the language of the region in question. With the Unicode support in XML, Web sites can be multilingual. XML also provides a way to signal which language and encoding are being used: the xml:lang attribute and the encoding declaration.
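To make points (2) and (3) of Business-to-Business Communication and the catalog-sorting example more concrete, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. The shared vocabulary (`catalog`, `item`, `name`, `price`, the `sku` attribute) and the internal `InternalItem` record are hypothetical; a real partner agreement would define its own vocabulary, and each partner would write its own mapping.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from decimal import Decimal
from typing import List

# A catalog in a hypothetical vocabulary agreed upon by trading partners.
CATALOG_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <item sku="A-100"><name>Desk lamp</name><price>24.50</price></item>
  <item sku="B-200"><name>Office chair</name><price>129.00</price></item>
  <item sku="C-300"><name>Stapler</name><price>8.75</price></item>
</catalog>"""


@dataclass
class InternalItem:
    """The company's own back-end representation; partners never see it."""
    sku: str
    title: str
    price_cents: int


def to_internal(xml_bytes: bytes) -> List[InternalItem]:
    """Map documents in the shared vocabulary onto the internal format."""
    root = ET.fromstring(xml_bytes)
    return [
        InternalItem(
            sku=item.get("sku"),
            title=item.findtext("name"),
            # Decimal avoids binary floating-point rounding in money values.
            price_cents=int(Decimal(item.findtext("price")) * 100),
        )
        for item in root.findall("item")
    ]


def sort_by_price(items: List[InternalItem]) -> List[InternalItem]:
    """Rearrange the same content without touching the source document."""
    return sorted(items, key=lambda it: it.price_cents)


for it in sort_by_price(to_internal(CATALOG_XML)):
    print(it.sku, it.title, it.price_cents)
```

A partner with a different back end would keep the same `CATALOG_XML` and write only its own `to_internal` equivalent.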

Advantages to Consumers

From a customer's perspective, XML offers the following benefits:

Personalization. Currently, there are many proprietary ways of accessing information in catalogs, stock data, and so on. XML improves the presentation of information for its end use because it separates the structure of data from the way the data is presented. When information is marked up in XML, "smart" agents can customize what is viewed according to the preferences of the end user and the capabilities of the client device. An example of this is the use of the Directory Services Markup Language (DSML), which enables companies to use directory information from, and exchange directory information with, their customers and partners, regardless of the specific directories at the remote Web sites.
Accessibility. The Web is increasingly accessed by customers using a variety of devices with different screen capabilities and computing power: desktop computers, personal digital assistants, cellular phones, and so on. It is important for every business to take the user's environment into account, which previously has only been possible using ad hoc and inexact techniques, such as client-side scripts or HTTP headers. XML, being platform-independent, scales well to provide information on these devices without the user having to make adjustments at his/her end. This convenience to the customer can translate into success for the business.
Efficient Resource Discovery. Searching is one of the major activities of users on the Web, and it has also been a source of frustration for various reasons: imprecise indexing, primarily due to the robot's lack of semantic interpretation of information, combined with general query strings, leads to an overwhelming number of irrelevant results, and the number of documents is growing exponentially. It is vital that business information made available be found by the intended audience. On a local scale, such as a Web site on an intranet, the results of queries over XML-encoded information can be quite precise. (A global demonstration of precise query results is yet to be seen and depends on several factors, including the proliferation of XML documents and indexing robots' "knowledge" of the semantics of the respective vocabularies.)
Old is Gold. XML provides companies with opportunities for customer services that did not exist previously. Corporate data that was previously stored in disparate sources and considered non-integrable can be transformed into XML. Consolidating different data sources opens the door for companies to make a variety of such data available to customers, giving customers a powerful way to transact, manage and share data over the Web.
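A minimal sketch of the "Old is Gold" idea: rows exported from a legacy system (here a hypothetical comma-separated order export with made-up column names) are wrapped in XML so that they can be consolidated with other sources and offered to customers. Python's standard csv and xml.etree.ElementTree modules are used; the `orders`/`order` vocabulary is an assumption for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical export from a legacy order system.
LEGACY_ORDERS = """order_id,customer,total
1001,Acme Ltd,250.00
1002,Globex,99.95
"""


def rows_to_xml(csv_text: str) -> bytes:
    """Turn each CSV row into an <order> element with customer and total children."""
    root = ET.Element("orders")
    for row in csv.DictReader(io.StringIO(csv_text)):
        order = ET.SubElement(root, "order", id=row["order_id"])
        ET.SubElement(order, "customer").text = row["customer"]
        ET.SubElement(order, "total").text = row["total"]
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


print(rows_to_xml(LEGACY_ORDERS).decode("utf-8"))
```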

In the next section, we take a look at the core components that are necessary to apply XML in the context of E-Commerce.

The Infrastructure for XML in E-Commerce

There are three major building blocks for the realization of XML in E-Commerce: XML vocabularies, protocols, and software.

XML Vocabularies for E-Commerce

By an XML vocabulary, we mean a language based on XML syntax. Several vocabularies have been created for different areas of business processes: accounting, billing, supply chain, syndication, and so on. They provide both the syntax and the semantics according to which documents can be created. Not all of these efforts are standards, and some lack maturity and technical quality; nevertheless, many hold promise and reflect a strong commitment to open standards initiatives.

Protocols for Commercial XML Data Interchange

XML does not by itself provide a mechanism for transporting and interchanging data objects. An increasingly important part of using XML in E-Commerce is the ability to interact with remote applications. This kind of interaction is part of machine-to-machine communication and is commonly modeled as a Remote Procedure Call (RPC), in which the client passes in parameters and then gets some kind of result in return. For example, for two servers exchanging XML data to confirm the price of an item or to buy a product, a mutually agreed-upon protocol is necessary.

Like HTML documents, XML documents can be transmitted using the Hypertext Transfer Protocol (HTTP). The advantage of this combination is that it is non-proprietary and Web-centric. However, the current version of HTTP lacks certain features required for business transactions and messaging, such as state persistence and a means of invoking remote procedures. Another open problem in this area is how to standardize a serialization and transport layer for XML messaging and Remote Procedure Calls. These problems are being addressed by several initiatives that define distributed object protocols for communicating with remote applications: the next generation of HTTP (HTTP-NG), XML-RPC, the Simple Object Access Protocol (SOAP), and XML-CORBA. All of these protocols provide a similar service, allowing a client to issue an RPC to a server application and then receive a response.
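XML-RPC encodes a call as a small XML document carried in the body of an HTTP POST. Python's standard xmlrpc.client module can produce and parse these documents without any network traffic, which makes it a convenient way to look at the wire format. The method name `catalog.getPrice` and its parameters are hypothetical; a real client would send the call through `xmlrpc.client.ServerProxy` to a partner's server URL.

```python
import xmlrpc.client

# Client side: encode a call to a hypothetical remote method as a <methodCall>.
request_xml = xmlrpc.client.dumps(("A-100", 3), methodname="catalog.getPrice")
print(request_xml)

# Server side: decode the document back into a method name and Python values.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)

# Server side again: encode the single return value as a <methodResponse>.
response_xml = xmlrpc.client.dumps((73.5,), methodresponse=True)

# Client side: decode the response; loads() returns (params, None) for responses.
(price,), _ = xmlrpc.client.loads(response_xml)
print(price)
```

Both ends agree only on the method name and parameter types; neither needs to know the other's platform or language.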

To "do" something with XML requires the use of a language with programmatic capabilities or software. This is the topic of the next section.

XML Software : To "Do Something" with XML

Software is the Soul of E-Business.
- IBM

XML provides only syntax, not presentation or behaviour. Associating presentation or behaviour with XML requires additional mechanisms, such as style sheets for presentation and applets or scripts for behaviour. For example, XML documents can be presented using Cascading Style Sheets (CSS) or the Extensible Stylesheet Language (XSL). Documents can be manipulated from Java or ECMAScript (a standardization of JavaScript) through the Document Object Model (DOM), a standard interface that allows programs and scripts to dynamically access and update the content, structure and style of documents.
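The DOM interfaces are language-neutral. As a sketch, here is the access-and-update pattern using Python's xml.dom.minidom, a lightweight DOM implementation in the standard library, in place of Java or ECMAScript; the element names are hypothetical.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<catalog><item sku='A-100'><price>24.50</price></item></catalog>"
)

# Access: read a value through the standard DOM node interfaces.
price_node = doc.getElementsByTagName("price")[0]
old_price = price_node.firstChild.data  # content of the text node

# Update content: change the text of an existing node.
price_node.firstChild.data = "22.00"

# Update structure: create a new element and attach it to the item.
item = doc.getElementsByTagName("item")[0]
stock = doc.createElement("inStock")
stock.appendChild(doc.createTextNode("yes"))
item.appendChild(stock)

print(old_price, doc.documentElement.toxml())
```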

The category of conforming XML processors is broad: it includes XML syntax checkers, parsers, and vocabulary-specific renderers. Some of them are standalone, while others are integrated into larger software, such as a Web server, an application server, or a full-blown browser.
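The simplest processor in this list is a syntax checker. A minimal sketch of a well-formedness checker built on the standard library's expat-based xml.etree.ElementTree parser follows; note that it checks well-formedness only and does not validate documents against a DTD, which this parser does not do.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def check_well_formed(text: str) -> Optional[Tuple[int, int, str]]:
    """Return None if text is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


for sample in ("<order><item/></order>", "<order><item></order>"):
    print(repr(sample), "->", check_well_formed(sample) or "well-formed")
```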

One of the reasons for the widespread use and success of XML is that a large base of free and Open Source software is available in system programming languages (such as C/C++ and Java) and scripting languages (such as Perl, Python and Tcl). This gives businesses an opportunity to experiment with and compare products without incurring major costs. They are not locked into proprietary software which, once bought, could later turn out to be less useful than expected.

Commercial software support for applying XML in E-Commerce is also maturing, and several implementations are available. The reasons for a business to consider commercial software over non-commercial software are that it is usually more stable and robust, better documented, and backed by good customer support.

Who Wants to be XML-Aware? : XML And "Perfect" Marriages

In the global environment of the Web, platform independence and multilingual support are basic requirements for the information being delivered. Therefore, even though any programming language can be used to process XML data, and processors have in fact been written in a variety of languages, some languages do have an advantage over others. The important point to note is that these requirements determine the language of choice, not the other way around.

One such language is Java, because of its strong synergy with XML when cross-platform applications are required and its built-in support for Unicode. Assertions such as "XML and Java - The Perfect Marriage" may hold true in Las Vegas, but in the programming world the pairing is a matter of being in the right place (Sun Microsystems has supported the XML effort since its inception), at the right time (Java predates XML), in the right environment (the Web), and among the right people (the SGML community, with several years of academic and industry experience).

Applications of XML in E-Commerce

This section puts "XML to work" by examining diverse areas of E-Commerce where XML can be deployed. Some of the earliest applications of XML to E-Commerce were suggested by the "Father of XML."

Web Publishing

The starting point for any business involved in E-Commerce is a presence on the Web. There are at least two possible approaches for a business to make its XML data available on the Web:

Publishing XML Data Natively. The data can be associated with CSS or XSL style sheets for presentation. Using CSS is currently preferable, because XSL implementations in renderers are still immature.
Transforming the XML Data to XHTML. As mentioned earlier, an advantage of XML is that data can be created in one format and then repurposed in formats suitable for different media. This can be accomplished by transforming native XML data to the Extensible Hypertext Markup Language (XHTML) 1.0, a reformulation of HTML in XML. The transformation itself can be done via scripting languages or by using XSL Transformations (XSLT), a language for transforming XML documents into other XML documents.
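XSLT is the declarative route, but Python's standard library has no XSLT processor, so the sketch below shows the scripting-language route instead: it reads native catalog data (a hypothetical vocabulary) and builds an XHTML element tree in the XHTML namespace. For brevity it omits the DOCTYPE that a complete XHTML 1.0 document would carry.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)  # write XHTML elements without a prefix

# Native XML data in a hypothetical catalog vocabulary.
CATALOG = b"""<catalog>
  <item sku="A-100"><name>Desk lamp</name><price>24.50</price></item>
  <item sku="C-300"><name>Stapler</name><price>8.75</price></item>
</catalog>"""


def q(tag: str) -> str:
    """Qualify a local name with the XHTML namespace ({uri}name form)."""
    return "{%s}%s" % (XHTML_NS, tag)


def catalog_to_xhtml(xml_bytes: bytes) -> str:
    """Produce XHTML markup; the source document itself is left unchanged."""
    catalog = ET.fromstring(xml_bytes)
    html = ET.Element(q("html"))
    head = ET.SubElement(html, q("head"))
    ET.SubElement(head, q("title")).text = "Product catalog"
    body = ET.SubElement(html, q("body"))
    table = ET.SubElement(body, q("table"))
    for item in catalog.findall("item"):
        row = ET.SubElement(table, q("tr"))
        ET.SubElement(row, q("td")).text = item.findtext("name")
        ET.SubElement(row, q("td")).text = item.findtext("price")
    return ET.tostring(html, encoding="unicode")


print(catalog_to_xhtml(CATALOG))
```

Because the transformation only reads `CATALOG`, the same data can still be published natively or fed to another transformation.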

The advantages of the second approach are:

HTML was designed primarily for presentation on the Web. XHTML 1.0 derives its semantics from HTML, and therefore inherits all the powerful features that made HTML the lingua franca of Web publishing.
The original XML data remains unchanged. Therefore, it can be [re]used for other tasks, including publishing it natively, if needed.
Compatibility of XHTML 1.0 with existing HTML browsers is readily achievable by following a small set of guidelines. In fact, support for XHTML 1.0 is available in several widely used browsers.
The support for forms that is required for user interaction, such as query input in search engines and product selection in shopping carts, is currently experimental in XML. For these tasks, until support for XML forms stabilizes, XHTML 1.0, which uses HTML syntax for forms, is a natural choice.

The existence of these two approaches is due more to the current state of XML rendering support than to choice.

One of the major advantages of XML is that it can function as a low-level format for back-end storage and retrieval. This is the topic of the next section.

Content Management

At the lowest level of any content management task is the nature of the data itself, which is often stored in proprietary database management systems (DBMS). XML offers a variety of choices for the back-end format. The type of business determines the appropriate XML vocabulary: Accounting (Extensible Financial Reporting Markup Language), Banking (Bank Internet Payment System), Content Syndication (The Information and Content Exchange Protocol), Directory Services (Directory Services Markup Language), Human Resources (Human Resources Markup Language), Insurance (Life), Real Estate (Real Estate Listing Management System), and so on. If a suitable vocabulary is unavailable for a business, a generic vocabulary oriented towards databases could be used. An example is the Cold Fusion Markup Language, which requires the Cold Fusion Server.

Management systems based on some of the vocabularies outlined above are starting to become available. Alternatively, a generic content management system that supports any XML-based format for storage, such as OmniMark, could be used. OmniMark is designed specifically for preparing, organizing, manipulating, and distributing information in text-based and binary formats. It includes a powerful pattern-matching language, advanced hypertext link manipulation abilities, a sophisticated rule-based language for processing XML documents, and seamless integration with external systems.

The advantage in all these cases is that, besides being used internally, the data can also be published on the Web if required.

A business can always design an XML-based markup language specific to its own needs. This task should be seen as nontrivial: it requires careful consideration and incurs cost.

XML is not only for presentation, but also for machine-to-machine communication. One of the major areas of application of this is in enterprise data interchange in B2B E-Commerce, and is the subject of the next section.

Data Interchange in a Distributed Networking Environment

In a TCP/IP-based computer networking environment, data communication takes place in the form of "packets." Data communication is serial (data packets are sent in a sequence, one after another), which requires data to be serialized. The serialized data has to be sent in some notation that the receiving end understands. XML provides a standard notation for serialized data. There are other alternatives, such as Abstract Syntax Notation One (ASN.1, an ISO standard) or Electronic Data Interchange (EDI) formats (ANSI ASC X12, UN/CEFACT). ASN.1 data is usually exchanged in compact binary encodings, and EDI messages, although text, are terse, delimiter-separated and hard to interpret without the governing standard; XML, a self-describing text-based format, has numerous advantages over both.
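As a minimal sketch of XML as a serialization notation, the sender below turns an in-memory record into a UTF-8 byte stream (the bytes TCP would carry in packets) and the receiver rebuilds it, knowing only the vocabulary. The `patient` vocabulary and the record values are hypothetical; the non-ASCII name shows that Unicode text survives the round trip.

```python
import xml.etree.ElementTree as ET


def serialize_patient(record: dict) -> bytes:
    """Sender side: turn an in-memory record into a UTF-8 XML byte stream."""
    root = ET.Element("patient", id=record["id"])
    ET.SubElement(root, "name").text = record["name"]
    ET.SubElement(root, "birthDate").text = record["birth_date"]
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def deserialize_patient(data: bytes) -> dict:
    """Receiver side: rebuild the record from the bytes using only the vocabulary."""
    root = ET.fromstring(data)
    return {
        "id": root.get("id"),
        "name": root.findtext("name"),
        "birth_date": root.findtext("birthDate"),
    }


record = {"id": "P-17", "name": "Zoë Müller", "birth_date": "1961-04-02"}
wire = serialize_patient(record)  # the bytes that would be sent over the network
print(wire.decode("utf-8"))
print(deserialize_patient(wire) == record)
```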

Some of the earliest suggested applications of XML were in the area of medical systems. Hospitals, clinics and pharmaceutical agencies maintain medical records of patients that contain medical histories and billing data. The records are stored in databases with proprietary file formats, which poses several problems:

In their current state, the medical records cannot be made available on the Web. This prevents medical institutions from taking advantage of Web technology for managing them.
Data from one database often cannot be interchanged with another, either within the same country or across continents. This results in duplication of data, unnecessary extra phases in data entry and delivery (such as first printing the data and then re-entering it manually), and paperwork.
Conducting clinical research requires a large amount of medical data (to infer any reasonable statistics). Such research at times involves doctors from different countries. If different agencies in different countries use different formats, the data interchange problem becomes astronomical.

A technically feasible but monetarily impractical solution for seamless interchange of medical records is to replace the existing heterogeneous systems with a single standard system. An alternative solution is to adopt a single industry-wide XML vocabulary that serves as the single output format for all exporting systems and the single input format for all importing systems.

Data interchange takes place in several other contexts such as in industries involved in brokerage, syndication, supply-chain, and so on. We consider the case of supply-chain in the next section.

Supply-Chain Integration

A supply-chain is a collection of interdependent steps that, when followed, accomplish a certain objective such as meeting customer requirements. According to the Electronic Commerce FAQ, supply-chain management is a generic term that encompasses the coordination of order generation, order taking, and order fulfillment/distribution of products, services, or information. There are often numerous "components" in a supply-chain, for example, manufacturers and parts suppliers, parcel shippers, senders and receivers, wholesalers and retailers. EDI has traditionally been the de facto data format for invoices, purchase orders, and other items. However, EDI has proved expensive, particularly because it relies on Value-Added Networks (VANs): vendor-proprietary, specialized computer networks that are beyond the reach of small and medium-size companies.

The use of XML as the data format, together with intranets/extranets as a networking infrastructure that leverages Web technology, offers tremendous benefits to all the business components involved in the supply-chain process: cost reduction, a common data format, and a greater possibility of including partners from small and medium-size companies. Furthermore, processing XML data does not require proprietary software, and documents can (after validation, if necessary) automatically be stored in a database without much human intervention. Thus, XML provides a path to the seamless exchange of reusable business documents of different types, resulting in frictionless E-Commerce across multiple trading communities.
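A minimal sketch of storing incoming documents without human intervention, using Python's standard sqlite3 and xml.etree.ElementTree modules. The purchase-order vocabulary and the table layout are hypothetical, and because the standard-library parser does not validate against a DTD, a few explicit structural checks stand in for validation here.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A purchase order in a hypothetical supply-chain vocabulary.
PO = b"""<purchaseOrder number="PO-5531">
  <buyer>Acme Ltd</buyer>
  <line sku="A-100" quantity="4"/>
  <line sku="C-300" quantity="10"/>
</purchaseOrder>"""


def store_order(conn: sqlite3.Connection, xml_bytes: bytes) -> None:
    """Check a purchase order and insert it into the database in one transaction."""
    root = ET.fromstring(xml_bytes)  # raises ET.ParseError if not well-formed
    # Minimal structural checks; a production system would validate against a DTD or schema.
    number, buyer = root.get("number"), root.findtext("buyer")
    if root.tag != "purchaseOrder" or not number or not buyer:
        raise ValueError("document is not a complete purchase order")
    lines = [(number, ln.get("sku"), int(ln.get("quantity"))) for ln in root.findall("line")]
    with conn:  # commits on success, rolls back if an insert fails
        conn.execute("INSERT INTO orders (number, buyer) VALUES (?, ?)", (number, buyer))
        conn.executemany(
            "INSERT INTO order_lines (number, sku, quantity) VALUES (?, ?, ?)", lines
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (number TEXT PRIMARY KEY, buyer TEXT)")
conn.execute("CREATE TABLE order_lines (number TEXT, sku TEXT, quantity INTEGER)")
store_order(conn, PO)
print(conn.execute("SELECT SUM(quantity) FROM order_lines").fetchone())
```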

Several vocabularies for supply-chain data are available. One choice is the XML Common Business Library (xCBL), an open XML-based vocabulary for the cross-industry exchange of business documents such as product descriptions, purchase orders, invoices, and shipping schedules. For businesses already using EDI standards, xCBL provides a transition path to an XML-based commerce capability.

Marketing Products/Services

Advertising and promoting new products on the Web form the core of a company's E-Commerce marketing activity. It is extremely important that the graphics used to represent these products be accurate, widely accessible and aesthetically pleasing. Most Web graphics today are in raster formats such as GIF or JPEG, which have several limitations: they can be slow to transmit, tend not to scale without loss of quality, and are "unintelligent" (they do not carry information that could be searched). Scalable Vector Graphics (SVG) is a language for describing two-dimensional vector and mixed vector/raster graphics in XML that works well across platforms, output resolutions, colour spaces, and a range of available bandwidths.

SVG in E-Commerce

There are several scenarios for using SVG in business graphics. Companies that design logos often develop multiple copies of templates, one for internal use (such as in TIFF format) and another for Web delivery (such as in GIF), and repurpose them accordingly. With SVG, multiple copies are unnecessary, which saves space. Furthermore, modification becomes a simple exercise in search and replace, which can be carried out quickly and efficiently. This saves time and effort, and can thus reduce cost.
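Because an SVG logo is XML text, a colour change can be made by plain search and replace, or, as in this minimal sketch, by editing the relevant attributes through a parser so that matching text elsewhere in the file is left alone. The logo itself is hypothetical; the namespace URI is the standard SVG one.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep SVG as the default namespace on output

# A hypothetical company logo: a filled circle with the company initial.
LOGO = f"""<svg xmlns="{SVG_NS}" width="100" height="100">
  <circle cx="50" cy="50" r="40" fill="#003399"/>
  <text x="50" y="58" font-size="24" text-anchor="middle" fill="#ffffff">A</text>
</svg>"""


def recolor(svg_text: str, old: str, new: str) -> str:
    """Replace a fill colour wherever it occurs, editing attributes rather than raw text."""
    root = ET.fromstring(svg_text)
    for el in root.iter():
        if el.get("fill") == old:
            el.set("fill", new)
    return ET.tostring(root, encoding="unicode")


print(recolor(LOGO, "#003399", "#cc0000"))
```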

Some companies offer real-time map generation services to travellers. The major requirements here are that the maps be vivid, accurate and timely, for quick interpretation at the user's end. In the past, raster graphics have been used to deliver maps over the Web. SVG, being a vector format, can show the desired destinations with pinpoint accuracy and, because of its scalability, also has the potential to be delivered on mobile products, such as a car phone.

SMIL in E-Commerce

Using animated banners on the Web is a popular way for businesses to advertise their products/services. These banners have evolved from simple GIF animations of two or more frames containing text-based graphics, to ones containing product pictures and slogans, to extremely sophisticated ones, such as those using Macromedia Flash technology. A lot of effort goes into optimizing these for delivery. The development cycle and some of this effort can be reduced by using SVG for the graphics. The animation component can be provided by the Synchronized Multimedia Integration Language (SMIL), an XML-based language for integrating a set of independent multimedia objects into a synchronized multimedia presentation. Using SMIL, an author can describe the temporal behaviour and the on-screen layout of the presentation and associate hyperlinks with media objects. SMIL can also be used to create entire product presentations with audio commentaries synchronized with the corresponding graphics or video.
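As a sketch of the SMIL side of a banner, the function below generates a presentation that shows a series of frames one after another in a single screen region, using the SMIL 1.0 element names `root-layout`, `region`, `seq` and `img` and the `dur` attribute. The frame file names and the default banner size are hypothetical, and whether a particular player renders SVG frames inside SMIL is an assumption to check.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def smil_banner(frames: Iterable[Tuple[str, int]], width: int = 468, height: int = 60) -> str:
    """Build a SMIL presentation that shows each (src, seconds) frame in sequence."""
    smil = ET.Element("smil")
    layout = ET.SubElement(ET.SubElement(smil, "head"), "layout")
    ET.SubElement(layout, "root-layout", width=str(width), height=str(height))
    ET.SubElement(layout, "region", id="banner", left="0", top="0",
                  width=str(width), height=str(height))
    # <seq> plays its children one after another, each for its own duration.
    seq = ET.SubElement(ET.SubElement(smil, "body"), "seq")
    for src, seconds in frames:
        ET.SubElement(seq, "img", src=src, region="banner", dur=f"{seconds}s")
    return ET.tostring(smil, encoding="unicode")


print(smil_banner([("product.svg", 3), ("slogan.svg", 2)]))
```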

Customer Service : Content Syndication and Resource Discovery

Selling a product or service is only the beginning; providing support to the customer's satisfaction is one of the key factors for an E-Commerce-oriented business. This has traditionally been done using mailing lists, which only provide text-based descriptions. XML provides better ways of accomplishing this by using channels.

Channels provide a mechanism for automated delivery (broadcast) of personalized and up-to-date information via the Web. Unlike typical surfing, which relies on a pull method of transferring interactive Web pages, channels use push technologies.

CDF, RSS, OCS and RDF

Channel Definition Format (CDF) and Rich Site Summary (RSS) are two XML-based vocabularies for channels that can be used to deliver news about the company and its products/services. The Appendix presents two (fictitious) examples. The customer experience can be further enhanced by tailoring channels, in a standard manner, to the user's preferences and other characteristics (interests, profession, age, accessibility needs). Special-purpose content syndication formats, such as Open Content Syndication (OCS), can also be used with the Resource Description Framework (RDF) together with the Dublin Core metadata schema.
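As a sketch of what a channel file might contain, the function below builds a feed using the element names of Netscape's RSS 0.91 (`rss`, `channel`, `title`, `link`, `description`, `language`, `item`). The company, URLs and news item are fictitious.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def rss_channel(title: str, link: str, description: str,
                items: Iterable[Tuple[str, str, str]], language: str = "en-us") -> str:
    """Build an RSS 0.91-style channel; items are (title, link, description) tuples."""
    rss = ET.Element("rss", version="0.91")
    channel = ET.SubElement(rss, "channel")
    for tag, text in (("title", title), ("link", link),
                      ("description", description), ("language", language)):
        ET.SubElement(channel, tag).text = text
    for item_title, item_link, item_desc in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
        ET.SubElement(item, "description").text = item_desc
    return ET.tostring(rss, encoding="unicode")


feed = rss_channel(
    "Example Corp News", "http://www.example.com/",
    "Product announcements and support notices",
    [("Firmware 2.1 released", "http://www.example.com/support/fw21.html",
      "Fixes the power-saving issue reported by customers.")],
)
print(feed)
```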

Channels only inform the customer; they do not make the content of an entire site directly accessible. Once the customer visits the business site in pursuit of a desired resource, the issue for the business is to provide a mechanism for timely resource discovery.

A standard mechanism for both content selection and resource discovery is provided by RDF, which has an XML syntax and is a foundation for processing metadata (structured data about data). It provides interoperability between applications that exchange machine-understandable information on the Web. Use of RDF also facilitates searching, by helping authors describe their documents in ways that browsers, search engines, and robots can "understand." As a result, users can have better Web document discovery services available to them. XML-based metadata has played a key role in automating customer services.
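A minimal sketch, using the standard library's ElementTree, of an RDF/XML description of a product page with Dublin Core properties (`dc:title`, `dc:creator`, `dc:subject`, `dc:language`). The namespace URIs are the published RDF and Dublin Core Element Set 1.1 ones; the page URL and values are made up for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def describe(url: str, **dc_fields: str) -> str:
    """Describe one resource with Dublin Core properties in RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": url})
    for name, value in dc_fields.items():
        ET.SubElement(desc, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


PAGE = "http://www.example.com/products/lamp.html"
metadata = describe(PAGE, title="Desk lamp: product page", creator="Example Corp",
                    subject="lighting; office furniture", language="en")
print(metadata)
```

A robot that understands the Dublin Core namespace can index `dc:subject` directly instead of guessing from the page text.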

Issues Facing Deployment of XML in E-Commerce

The state of XML is not as glorious as it is sometimes portrayed by trade publications. The integration of XML into all arenas of E-Commerce faces several problems, ranging from technical to social to political, eventually leading to a lack of standardization in several phases of business processes. Thus, using XML in the E-Commerce "gold rush" can be promising, but can also lead to various pitfalls.

This section points out some of the major issues facing broad deployment of XML-related technologies in E-Commerce, and suggests some solutions to overcome them. Areas where further research and development are needed are also pointed out.

Seeing is Believing : Rendering Support for XML

Publishing XML documents on the Web requires adequate rendering support. However, even after two years of existence as a W3C Recommendation, XML has had less than satisfactory rendering support.

Generic XML browsers, such as JUMBO and Microsoft Internet Explorer, are of limited use. They are syntax-sensitive, but without knowledge of the semantics of an XML vocabulary they are unable to render the corresponding documents. It may also be over-optimistic to expect a single browser to support all sorts of combinations of vocabularies, when historically even support for HTML or CSS has been incomplete and/or laden with proprietary extensions. Standalone vocabulary-specific renderers have limited capabilities because they do not implement other features that users have become accustomed to.

Therefore, it is quite likely that development of browser plug-ins will be the key to rendering support in the short-term as they can be developed independently (of the browser vendor) using the plug-in Application Programming Interface (API) and have shorter development cycles than a full-blown browser. In fact, SVG and SMIL plug-ins and ActiveX controls for Netscape Communicator and Microsoft Internet Explorer, respectively, are already available.

XML File Sizes and Performance

XML documents can be verbose. For example, XML counterparts of EDI messages and SVG graphics can be prohibitively large. This has a direct impact on performance in terms of delivery and rendering at the user's end.

Disk space is getting cheaper, and network bandwidth and CPUs are getting cheaper and faster, so performance will become less of an issue. In addition, HTTP/1.1 can compress data on the fly. XMLZip provides another solution to this issue. Working through the XML DOM API, it compresses XML files at the level of nodes in the document. On the client side, the specific node the user is referencing can then be selected and decompressed on its own, rather than decompressing the entire document.
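The difference between whole-document and node-level compression can be sketched in Python using only the standard library. This is a minimal illustration of the node-level idea under our own assumptions, not XMLZip's actual format or API. The purchase-order document and the helper names (compress_document, compress_by_node, fetch_node) are hypothetical, and zlib stands in for whatever compressor a real tool would use.

```python
import zlib
import xml.etree.ElementTree as ET
from typing import Dict

# A deliberately verbose, hypothetical purchase order with 200 line items.
ORDER_XML = "<order>" + "".join(
    f"<item><sku>SKU-{i:04d}</sku>"
    f"<description>Widget model {i}</description>"
    f"<quantity>1</quantity></item>"
    for i in range(200)
) + "</order>"


def compress_document(xml_text: str) -> bytes:
    """Compress the whole document as one stream (similar in spirit to HTTP deflate coding)."""
    return zlib.compress(xml_text.encode("utf-8"), 9)


def compress_by_node(xml_text: str) -> Dict[int, bytes]:
    """Hypothetical node-level scheme: compress each child of the root element separately."""
    root = ET.fromstring(xml_text)
    return {
        index: zlib.compress(ET.tostring(child, encoding="utf-8"), 9)
        for index, child in enumerate(root)
    }


def fetch_node(compressed_nodes: Dict[int, bytes], index: int) -> ET.Element:
    """Decompress only the requested node; the other nodes stay compressed."""
    return ET.fromstring(zlib.decompress(compressed_nodes[index]))


if __name__ == "__main__":
    whole = compress_document(ORDER_XML)
    nodes = compress_by_node(ORDER_XML)
    print("raw bytes:           ", len(ORDER_XML.encode("utf-8")))
    print("whole-document bytes:", len(whole))
    print("per-node total bytes:", sum(len(blob) for blob in nodes.values()))
    print(fetch_node(nodes, 42).findtext("sku"))  # SKU-0042
```

The sketch shows the trade-off. Compressing each node separately gives up the redundancy shared across nodes, so the per-node total is larger than the whole-document result. In exchange, a client can decompress a single node on demand.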

In any case, it will remain an important authoring practice to follow the guidelines for minimizing file sizes. These include using style sheets instead of inline style attributes, using whitespace judiciously, and writing floating-point numbers with no more digits than are needed.
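The following Python sketch illustrates the last two guidelines. It trims SVG-style path data to a fixed number of decimal places and collapses redundant whitespace. The function name trim_precision and the default of one decimal place are our own assumptions; the appropriate precision depends on the drawing. The simple regular expression also does not handle exponents or numbers written without a leading digit (such as .5).

```python
import re

# Decimal numbers such as 10, -3.5 or 20.123456. Assumption for this sketch:
# no exponents (1e-3) and no leading-dot forms (.5).
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def trim_precision(path_data: str, digits: int = 1) -> str:
    """Round every number in SVG-style path data and collapse runs of whitespace."""

    def shorten(match: "re.Match[str]") -> str:
        text = f"{round(float(match.group()), digits):.{digits}f}"
        if "." in text:
            # Drop trailing zeros after the decimal point, then a bare point: "10.0" -> "10"
            text = text.rstrip("0").rstrip(".")
        return "0" if text == "-0" else text

    rounded = NUMBER.sub(shorten, path_data)
    return " ".join(rounded.split())


if __name__ == "__main__":
    print(trim_precision("M 10.000000   20.123456 L 30.987654 40.5"))
```

For example, trim_precision("M 10.000000   20.123456 L 30.987654 40.5") returns "M 10 20.1 L 31 40.5". The path shrinks by roughly half with no visible change at typical screen resolutions.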

Accessibility and URL Persistence : Lost and Not Found

According to one of its design goals, XML was developed for use on the Internet. However, URLs, the mechanism for identifying XML resources, lack persistence. URLs move or disappear, or resources may be inaccessible for several other reasons. As a result, there is no guarantee that an XML resource can be accessed. Some partial solutions to improve access are currently available. Examples include indirection through the PURL system and resolving resource locations through Public Identifiers, as in SGML CATALOGs. However, these solutions lack the scalability and software support needed for general use. There are also efforts to make hyperlinks robust, but these schemes have limitations. For example, they work only for documents indexed by Web search engines, and they do not work with binary resources (which exist in large numbers and in various forms on the Web). Moreover, a reader may unknowingly be accessing an unofficial "copy", and these approaches offer no support for identifying the canonical version of a document. Therefore, the problem of robust network access remains with XML vocabularies (as it does with HTML).
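The partial solutions mentioned above shift the burden rather than remove it. The following minimal Python sketch shows resolution through a catalog of public identifiers, with a PURL-style redirect table as fallback. The public identifier, URLs, and table contents are hypothetical. Real SGML/XML catalogs also have a richer syntax than a dictionary. The point is only that every entry is one more mapping someone must keep up to date.

```python
from typing import Dict, Optional

# Hypothetical catalog in the spirit of an SGML CATALOG "PUBLIC" entry:
# a public identifier maps to a locally maintained copy of the resource.
CATALOG: Dict[str, str] = {
    "-//Example Corp//DTD Purchase Order 1.0//EN": "file:///opt/dtds/purchase-order-1.0.dtd",
}

# Hypothetical PURL-style table: a stable name redirects to the resource's current location.
REDIRECTS: Dict[str, str] = {
    "http://purl.example.org/po/1.0": "http://www.example.com/schemas/2000/po.dtd",
}


def resolve(public_id: Optional[str], system_id: str) -> str:
    """Return the location to fetch: catalog entry first, then one redirect, else the URL as given."""
    if public_id is not None and public_id in CATALOG:
        return CATALOG[public_id]
    return REDIRECTS.get(system_id, system_id)


if __name__ == "__main__":
    print(resolve("-//Example Corp//DTD Purchase Order 1.0//EN", "http://www.example.com/po.dtd"))
    print(resolve(None, "http://purl.example.org/po/1.0"))
    print(resolve(None, "http://www.example.com/moved.dtd"))  # no mapping: may well be a dead link
```

The last call shows the scalability concern. Each resolver consults only its own tables, so a resource that moves without those tables being updated is lost just as before.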

Legal and Ethical Issues

The risks associated with E-Commerce can be broadly classified into the categories of security, integrity, and privacy. Mutual trust is important for both B2B and B2C E-Commerce. Companies are reluctant to share information with other companies whose credibility they cannot assess. Customers are uncomfortable buying from a business they are unfamiliar with, and it has been shown that they rank transaction security and privacy as very high priorities. These long-standing concerns have been identified as among the most important factors in the success of a business involved in E-Commerce. Failing to address them can lead to a decline in Web shopping, and thus to lost business. XML-based solutions must therefore address them in order to be deployed successfully.

Security is Not an Option. A central concern in the development of E-Commerce on the Web is the trust that can be placed in the provenance, reliability, and security of information available over the Internet. One element of trust in E-Commerce is the ability to reliably associate a statement with the person or organization that made it. The Digital Signature Initiative provides a mechanism for signing documents and metadata in order to establish who made a statement.
Internet Privacy, Public Concern. In E-Commerce, a business Web site constantly needs to gain information about its customers, while those individuals need to control the release of this information to others. The Platform for Privacy Preferences (P3P) addresses both needs. It meets the data privacy expectations of consumers on the Web while ensuring that the medium remains available and productive for E-Commerce. It enables communication about data privacy practices between customers and business Web sites, as well as enhanced user control over the use and disclosure of personal information. P3P uses RDF as a format both for making privacy statements and for exchanging data under user control. Business Web sites can thus use P3P to increase the confidence users place in their services. The W3C Note Using P3P for E-Commerce defines how the Electronic Commerce Modeling Language (ECML) can be used within P3P. It also identifies privacy and security guidelines that companies can optionally adopt to make E-Commerce safer for both consumers and merchants.
Content Control. E-Commerce practices are neither completely censored nor completely democratized. Therefore, content targeted at one group may be unsuitable, or even illegal to access, for another. The Platform for Internet Content Selection (PICS) is a pair of protocols that allows labels to be applied to Internet content. The PICSRules specification provides a format for filtering preferences that machines can understand and users can exchange, so that preferences can easily be installed or sent to search engines. Businesses can use these protocols to design and distribute labels reflecting their views about their content, and thus gain consumer trust. For example, PICS provides a decentralized way for parents to control children's access to content requiring discretion. Parents are therefore more likely to let their children shop on a business Web site if they see that the site is appropriately labelled. RDF labels supersede the PICS effort in that they can express everything that PICS labels can, with additional features. Still, until browser support for RDF becomes stable, PICS can be used (in, say, XHTML 1.0 documents).
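A filter of the kind PICS enables can be sketched in a few lines of Python. The sketch assumes the rating has already been extracted from a PICS label as category/value pairs. An example is the RSACi rating service's (n 0 s 0 v 0 l 0) ratings for nudity, sex, violence, and language. Parsing the full PICS label syntax and evaluating PICSRules are out of scope. The function names and the household limits are hypothetical.

```python
from typing import Dict


def parse_ratings(rating_text: str) -> Dict[str, int]:
    """Parse a ratings list such as 'n 0 s 0 v 2 l 0' into {'n': 0, 's': 0, 'v': 2, 'l': 0}."""
    tokens = rating_text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("ratings must come in category/value pairs")
    return {tokens[i]: int(tokens[i + 1]) for i in range(0, len(tokens), 2)}


def is_allowed(rating_text: str, limits: Dict[str, int]) -> bool:
    """Allow content only if every limited category is rated and within its limit.

    A category missing from the label counts as blocked: a conservative policy
    assumed for this sketch.
    """
    ratings = parse_ratings(rating_text)
    return all(
        category in ratings and ratings[category] <= limit
        for category, limit in limits.items()
    )


# Hypothetical household settings chosen by a parent.
PARENTAL_LIMITS = {"n": 0, "s": 0, "v": 1, "l": 1}

if __name__ == "__main__":
    print(is_allowed("n 0 s 0 v 0 l 0", PARENTAL_LIMITS))  # a labelled shopping page
    print(is_allowed("n 0 s 0 v 2 l 0", PARENTAL_LIMITS))  # violence rating above the limit
```

Treating unrated categories as blocked rewards sites that label themselves completely, which is exactly the incentive for businesses described above.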

So Many Vocabularies, So Little Standardization

XML-based message standards are today proliferating across the world, just as relational databases proliferated from the 1980s within individual companies. [...] This is a complexity trap at least as large, and as dangerous, as the complexity trap in multiple relational databases. In twenty years we failed to solve the relational complexity trap. How will we fare with the much bigger XML complexity trap?
- Robert Worden, in XML E-Business Standards: Promises and Pitfalls

XML is an open standard, and innovation in business is a natural process. As a result, several XML vocabularies for business have appeared in different contexts. They were created by businesses that do not necessarily share common goals. Unfortunately, there is not just one XML-based vocabulary emerging, but many: for different industry sectors, and even several within the same sector. Some of these initiatives are proofs of concept that lack robust testing, and some are politically motivated and serve conflicting interests. Most currently lack wide acceptance and standardization. Though this has given users a plethora of choices, it has led to a state of chaos rather than to convergence and interoperability.

There is no predetermined guideline as to what constitutes an element or an attribute. In the absence of standardization, vocabularies can overlap. For example, element and attribute names in one vocabulary may be identical to those in another. The situation gets worse when identically named attributes in two vocabularies carry different meanings, or differently named ones carry the same meaning, causing inconsistency and confusion. This also raises the danger of creating isolated islands of data in proprietary formats that are inaccessible to other applications, contrary to the XML philosophy.

Solutions to this problem include one or more of the following:

XML Transformations. Transforming one schema into another solves the problem of getting locked into a single proprietary format. Some software tools that do this are already available. For example, XML Authority can translate between several schema languages, including DTDs, XML-Data Reduced (XDR), and XML Schema. Scripts written in, for example, Perl, Python, or Tcl, or XSLT style sheets can also be used for XML transformations. It will be crucial to ensure that these transformation processes lose minimal (ideally, zero) information. A major obstacle to the transformation approach is the "n-squared problem". Here, n vocabularies would require P(n,2) = n(n-1) transformations (one for each ordered pair of vocabularies), which grows quadratically as n increases. In any case, a strategy for large-scale transformation of documents is still desirable.
"Mix and Match" XML Documents. XML Namespaces provide a simple method for qualifying element and attribute names used in XML documents by associating them with namespaces identified by URI references. This makes it possible to create documents which can include elements and attributes from multiple vocabularies without the possibility of a name collision.
XML Schema Registries/Repositories. Registries/repositories for XML vocabularies and their respective schemas (see, for example, XML.ORG or BizTalk) are an important step towards interoperability in E-Commerce business processes. However, for their utility to be widely realized, they will need to be more than "static" repositories that merely cite and catalog. One desirable requirement is that they serve as a real-time source for locating schemas during key activities such as transaction processing; otherwise one may be using a non-canonical copy.
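The following Python sketch uses the standard library's ElementTree to show how namespaces keep two same-named elements apart within one document. The two vocabularies, their namespace URIs, and the element content are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs for two unrelated vocabularies that both use "order".
PO_NS = "http://example.com/ns/purchase-order"
LAW_NS = "http://example.org/ns/court-records"

DOCUMENT = f"""<envelope xmlns:po="{PO_NS}" xmlns:law="{LAW_NS}">
  <po:order id="PO-1001"><po:total currency="USD">250.00</po:total></po:order>
  <law:order docket="2000-CV-17">Injunction granted</law:order>
</envelope>"""

root = ET.fromstring(DOCUMENT)

# The parser expands every prefixed name to "{namespace-URI}local-name",
# so the two "order" elements end up with distinct names.
for child in root:
    print(child.tag)

# Lookups go through a prefix-to-URI mapping; the prefixes need not match the document's.
NAMESPACES = {"po": PO_NS, "law": LAW_NS}
purchase = root.find("po:order", NAMESPACES)
ruling = root.find("law:order", NAMESPACES)
print(purchase.get("id"), purchase.find("po:total", NAMESPACES).text)
print(ruling.get("docket"), ruling.text)
```

The prefixes po and law are only local shorthands. The namespace URI each prefix is bound to is what makes the names distinct. That is why the lookup mapping could use any prefixes, as long as they map to the right URIs.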

Whether a robust, open framework for XML-based E-Commerce vocabularies can be created remains to be seen. One promising initiative on the horizon is ebXML.

Semantic Opaqueness

XML processors have no means of validating semantics, even if the semantics are described informally in an XML schema. Semantic validation can nevertheless be important in several situations, such as when two XML servers communicate without human intervention. XML, however, does not express semantics by itself.

The meanings of element and attribute names are derived from the natural language(s) they are based upon. It is true that HTML tag names seldom correlate with the information being marked up, which results in a loss of information. A vocabulary based on XML syntax avoids that loss. However, this does not imply that we have now "gained" semantics. For example, a tag <p> could mean a poster, a plate, or even a paragraph (as in HTML). Even "complete" natural-language words used as element names can be ambiguous to humans. Does <order> relate to law, to mathematical number theory, or to sales? If it relates to business, which vocabulary does it come from? Thus, <order> may be human-readable, and occasionally human-interpretable, but it is meaningless to computers. The terms used in an XML schema are derived from a natural language. The "meaning" assigned to them rests on the intuitive assumption that the schema developer used the words the way we would expect. Natural languages may be interpreted differently across regions and cultures. Therefore, any interpretation made without supporting prose (documentation) and context (namespaces) can be misleading.

Semantic transparency in XML-based vocabularies in general, and in business vocabularies in particular, is vital for building interoperable systems for a collaborative industry endeavor. Without it, interoperability will not be achieved, even if XML vocabularies are standardized. Its absence will also impede related activities such as transparent machine-to-machine data interchange and resource discovery.

One solution to the problem of semantic ambiguity is the use of shared ontologies. An ontology formalizes the concepts that are noteworthy to a community in such a way that everyone has a common level of understanding upon which future information exchange can be made. In the realm of E-Commerce, the role of defining shared ontologies for XML objects has been initiated by Ontology.Org.
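The core idea of a shared ontology can be sketched in Python as a lookup from vocabulary terms to agreed concepts. This flattening is our own simplifying assumption: real ontologies also capture class hierarchies and relations between concepts. The concept identifiers and namespace URIs are hypothetical and are not taken from Ontology.Org. Each term is keyed by its namespace URI and local name. Identical local names from different vocabularies can therefore map to different concepts, and different names can map to the same one.

```python
from typing import Dict, Tuple

# A vocabulary term: (namespace URI, local name).
Term = Tuple[str, str]

PO_ORDER: Term = ("http://example.com/ns/purchase-order", "order")
PROCUREMENT_REQUEST: Term = ("http://example.net/ns/procurement", "purchaseRequest")
COURT_ORDER: Term = ("http://example.org/ns/court-records", "order")

# Hypothetical shared ontology: community-agreed concept identifiers.
ONTOLOGY: Dict[Term, str] = {
    PO_ORDER: "commerce:PurchaseOrder",
    PROCUREMENT_REQUEST: "commerce:PurchaseOrder",
    COURT_ORDER: "law:CourtOrder",
}


def concept_of(term: Term) -> str:
    """Look up the shared concept for a term; unmapped terms remain semantically opaque."""
    try:
        return ONTOLOGY[term]
    except KeyError:
        namespace, local_name = term
        raise LookupError(f"no shared meaning recorded for {{{namespace}}}{local_name}") from None


def same_meaning(a: Term, b: Term) -> bool:
    """Two terms are interchangeable only if both map to the same shared concept."""
    return concept_of(a) == concept_of(b)


if __name__ == "__main__":
    print(same_meaning(PO_ORDER, PROCUREMENT_REQUEST))  # different names, same concept
    print(same_meaning(PO_ORDER, COURT_ORDER))          # same name, different concepts
```

The lookup only helps if the community maintains and agrees on the table. That is the consensus problem this section raises.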

The issues of democratization and consensus on common semantics raised here are similar to those raised in the previous section.

Conclusion

XML can offer tremendous potential for businesses, developers, and consumers involved in E-Commerce. However, as with any business application, XML by itself is not a prescription for a successful E-Commerce venture. It is only a standard foundation on which solutions can be built. These solutions will be realized only when the challenges currently being faced are taken into consideration, along with leveraging XML technology in its different forms.

References

Electronic Commerce FAQ - Center for Research in Electronic Commerce, University of Texas, Austin, USA. "Electronic Commerce FAQ presents an overview of the electronic marketplace and its broader impacts on business organization, market processes and economic issues such as pricing and product choice strategies."
Robust Hyperlinks Cost Just Five Words Each - By Thomas A. Phelps and Robert Wilensky, University of California, Berkeley, USA.
XML.ORG - XML.ORG is a source of information about the application of XML in industrial and commercial settings, and serves as a reference repository for specific XML standards such as vocabularies, DTDs, schemas, and namespaces.
BizTalk.org - BizTalk is an industry initiative supported by a wide range of organizations with the goal of driving the rapid, consistent adoption of XML to enable electronic commerce and application integration. See also BizTalk section of the Microsoft site.
XML, Java, and the Future of the Web - By Jon Bosak, Sun Microsystems. Applications of XML towards Data Interchange, Web Agents, and User Selection.
W3C and Electronic Commerce - Thierry Michel (Editor). W3C Note, January 7, 2000. This document describes the current W3C activities related to Electronic Commerce for the purpose of assessing the Consortium's future role in ecommerce-related work.
Extensible Stylesheet Language (XSL) - Sharon Adler et al. (Editors). W3C Working Draft, March 1, 2000.
Document Object Model (DOM) Level 1 - Vidur Apparao, et al. (Editors). W3C Recommendation, October 1, 1998.
Extensible Markup Language (XML) 1.0 Specification - Tim Bray, Jean Paoli, C. M. Sperberg-McQueen (Editors). W3C Recommendation, February 10, 1998.
Namespaces in XML - Tim Bray, Dave Hollander, Andrew Layman (Editors). W3C Recommendation, January 14, 1999.
Cascading Style Sheets, level 2 (CSS2) Specification - Bert Bos, Håkon Wium Lie, Chris Lilley, Ian Jacobs (Editors). W3C Recommendation, May 12, 1998.
XSL Transformations (XSLT) Version 1.0 - James Clark (Editor). W3C Recommendation, November 16, 1999.
XML-Signature Syntax and Processing - Donald Eastlake, Joseph Reagle, David Solo (Editors). W3C Working Draft, February 28, 2000.
Channel Definition Format (CDF) - Castedo Ellerman. W3C Note, March 10, 1997.
Scalable Vector Graphics (SVG) - Jon Ferraiolo (Editor). W3C Working Draft, March 3, 2000.
Synchronized Multimedia Integration Language (SMIL) 1.0 Specification - Philipp Hoschka (Editor). W3C Recommendation, June 15, 1998.
Resource Description Framework (RDF) Model and Syntax Specification - Ora Lassila, Ralph R. Swick (Editors). W3C Recommendation, February 22, 1999.
The Platform for Privacy Preferences 1.0 (P3P1.0) Specification - Massimo Marchiori (Editor). W3C Working Draft, February 11, 2000.
Using P3P for E-Commerce - Joe Coco, Saul Klein, Dan Schutzer, San-Yuan Yen, Alan Slater (Editors). W3C Note, November 29, 1999.
XHTML™ 1.0: The Extensible HyperText Markup Language, A Reformulation of HTML 4.0 in XML 1.0 - Steven Pemberton, et al. (Editors). W3C Recommendation, January 26, 2000.

Appendix : "Webcasting" in E-Commerce Using XML-Based Channels

The following two examples are fictitious in the sense that the addresses (URLs) they provide do not physically exist (and if they do, that is purely coincidental).

Related items

XML Conformance : The Burden of Proof

XML Entities and their Applications

XMLization of Graphics

XML Euphoria in Perspective

XML and CSS : Structured Markup with Display Semantics

XML Namespaces : Universal Identification in XML Markup

The Emperor has New Clothes : HTML Recast as an XML Application

XML - What's in it for us?
