Published on: Sunday 3rd December 2000 By: Pankaj Kamthan and Hsueh-Ieng Pai
Electronic Commerce (E-Commerce) includes all aspects of business and market processes enabled by the Internet and the World Wide Web (Web) technologies. In the past few years, E-Commerce has tremendously benefited from the Web, with several companies offering products/services via the Internet. Conversely, E-Commerce has been a major factor in the evolution of the Web itself. Recent surveys predict that the trend in the dramatic growth of both Business-to-Business (B2B) and Business-to-Consumer (B2C) E-Commerce will continue.
The Extensible Markup Language (XML) is a meta-language which provides the syntax for markup languages. Since XML became a standard in the form of a W3C Recommendation about two years ago, applications based on it have grown rapidly. It is therefore useful to examine the current state of the interplay between XML and E-Commerce, as well as the problems that need to be overcome to maintain this synergy.
In the next section, we discuss the advantages that XML offers to businesses involved in E-Commerce.
XML offers several advantages both to businesses and to their customers.
From a business's perspective, XML offers the following benefits:
From a customer's perspective, XML offers the following benefits:
In the next section, we take a look at the core components that are necessary to apply XML in the context of E-Commerce.
There are three major building blocks for the realization of XML in E-Commerce: XML vocabularies, protocols, and software.
By an XML vocabulary, we mean a language based on XML syntax. Several vocabularies have been created in different areas of business processes: accounting, billing, supply chain, syndication, and so on. They provide both the syntax and the semantics on which documents can be based. Not all of these efforts are standards. Although some of them lack maturity and technical quality, many hold promise and reflect a strong commitment to open standards initiatives.
XML does not by itself provide a mechanism of how the data objects are to be transported and interchanged. An increasingly important part of using XML in E-Commerce is the ability to interact with remote applications. This kind of interaction is part of machine-to-machine communication and is commonly modeled as a Remote Procedure Call (RPC), in which the client passes in parameters and then gets some kind of result in return. For example, for two servers exchanging XML data to confirm a price for an item or buy a product, a mutually agreed-upon protocol is necessary.
Like HTML, XML documents can be transmitted using the Hypertext Transfer Protocol (HTTP). The advantage of using this combination is that it is non-proprietary and Web-centric. However, the current version of HTTP lacks certain features required for business transactions and messaging, such as state persistence and the ability to invoke remote procedures. Another problem in this area is how to standardize a serialization and transport layer for XML messaging/Remote Procedure Calls. These problems are being addressed in several initiatives that use distributed object protocols to communicate with remote applications: the next generation of HTTP (HTTP-NG), XML-RPC, Simple Object Access Protocol (SOAP), and XML-CORBA. All of these protocols provide a similar service, allowing a client to issue an RPC to a server application and then receive a response.
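As a rough illustration of the request/response pattern described above, the following sketch uses Python's standard `xmlrpc.client` module to build and decode XML-RPC messages without any network traffic. The method name `catalog.confirmPrice`, the item code, and the price are hypothetical and chosen only for the example.

```python
import xmlrpc.client

# Client side: serialize a call to a (hypothetical) price-confirmation method.
request_xml = xmlrpc.client.dumps(("SKU-1001", 2), methodname="catalog.confirmPrice")

# Server side: decode the XML payload back into parameters and a method name.
params, method = xmlrpc.client.loads(request_xml)

# Server side: serialize the result as an XML-RPC method response.
response_xml = xmlrpc.client.dumps((19.95,), methodresponse=True)

# Client side: decode the response; a response carries exactly one value.
(result,), _ = xmlrpc.client.loads(response_xml)

print(request_xml)
print(method, params, result)
```

In a real deployment the two XML payloads would travel as the body of an HTTP POST and its reply; here they are only passed between variables.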
To "do" something with XML requires the use of a language with programmatic capabilities or software. This is the topic of the next section.
Software is the Soul of E-Business.
An XML conforming processor is a broad category that includes XML syntax checkers, parsers, and vocabulary-specific renderers. Some of them may be standalone, while others may be integrated into larger software, such as a Web server, an application server, or a full-blown browser.
One of the reasons for the widespread use and success of XML is that a large base of free and Open Source software is available in system programming languages (such as C/C++ and Java) and scripting languages (such as Perl, Python, and Tcl). This gives businesses an opportunity to experiment with and compare products without incurring major costs. They are not locked into proprietary software which, once bought, could later turn out to be less useful than expected.
Commercial software support for the application of XML in E-Commerce is also maturing, and several implementations are available. The reasons for a business to consider commercial software over non-commercial software are that it is usually more stable, more robust, and better documented, and has good customer support.
In the global environment of the Web, platform independence and multilingual support are basic requirements for the information being delivered. Therefore, even though any programming language can be used to process XML data, and processors have actually been written in a variety of languages, some languages do have an advantage over others. The important point is that the features mentioned above determine the language of choice, not the other way around.
One such language is Java, because of its strong synergy with XML when cross-platform applications are required and its built-in support for Unicode. Assertions such as "XML and Java - The Perfect Marriage" may hold true in Las Vegas, but in the programming world it is just a matter of being at the right place (Sun Microsystems supported the XML effort since its inception), at the right time (Java predates XML), in the right environment (the Web), and among the right people (SGML community with several years of academic and industry experience).
The starting point for any business involved in E-Commerce is a presence on the Web. There are at least two possible approaches for a business to make their XML data available on the Web:
The advantages of the second approach are:
These different approaches exist more because of the current state of XML rendering support than by choice.
One of the major advantages of XML is that it can function as a low-level format for back-end storage and retrieval. This is the topic of the next section.
At the lowest level of any content management task is the nature of the data itself, which is often stored in proprietary database management systems (DBMS). XML offers a variety of choices for the back-end format. The type of business determines the appropriate XML vocabulary: Accounting (Extensible Financial Reporting Markup Language), Banking (Bank Internet Payment System), Content Syndication (The Information and Content Exchange Protocol), Directory Services (Directory Services Markup Language), Human Resources (Human Resources Markup Language), Insurance (Life), Real Estate (Real Estate Listing Management System), and so on. If a suitable vocabulary-of-choice is unavailable for a business, a generic vocabulary with an inclination towards databases could be used. An example is the Cold Fusion Markup Language, use of which requires the use of the Cold Fusion Server.
Management systems based on some of the vocabularies outlined above are starting to become available. Besides that, a generic content management system that supports any XML-based format for storage, such as OmniMark, could also be used. OmniMark is designed specifically for preparing, organizing, manipulating, and distributing information in text-based and binary formats. OmniMark includes a powerful pattern matching language, advanced hypertext link manipulation abilities, a sophisticated rule-based language for processing XML documents, and seamless integration with external systems.
The advantage in all these cases is that besides internal use, the data could also be published on the Web, if required.
There is always the possibility for a business to design an XML-syntax-based markup language specific to their own business. This task should be seen as nontrivial, and one that requires careful consideration and cost.
XML is not only for presentation, but also for machine-to-machine communication. One major application area is enterprise data interchange in B2B E-Commerce, which is the subject of the next section.
In a TCP/IP-based computer networking environment, data communication takes place in the form of "packets." The nature of data communication is serial (data packets are sent in a sequence, one after another). This requires data to be serialized. The serialized data has to be sent in some notation that the receiving end understands. XML provides a standard notation for serialized data. There are alternatives for serialized data, such as Abstract Syntax Notation 1 (ISO standard ASN.1) or Electronic Data Interchange (EDI). But XML is a text-based format, which has numerous advantages over ASN.1 or EDI (ANSI ASC X12, UN/CEFACT), which are binary formats.
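To make the idea of XML as a serialization notation concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The `order` structure and the element and attribute names are assumptions made up for the example; they do not come from any published vocabulary.

```python
import xml.etree.ElementTree as ET

def order_to_xml(order: dict) -> bytes:
    """Serialize an in-memory order into UTF-8 encoded XML bytes."""
    root = ET.Element("order", id=order["id"])
    ET.SubElement(root, "customer").text = order["customer"]
    items = ET.SubElement(root, "items")
    for sku, quantity in order["items"]:
        ET.SubElement(items, "item", sku=sku, quantity=str(quantity))
    return ET.tostring(root, encoding="utf-8")

def xml_to_order(data: bytes) -> dict:
    """Rebuild the in-memory order from the XML bytes on the receiving end."""
    root = ET.fromstring(data)
    return {
        "id": root.get("id"),
        "customer": root.findtext("customer"),
        "items": [(i.get("sku"), int(i.get("quantity"))) for i in root.iter("item")],
    }

order = {"id": "PO-42", "customer": "Café Olé", "items": [("SKU-1001", 2), ("SKU-2002", 1)]}
wire_bytes = order_to_xml(order)       # what would be sent over the network
received = xml_to_order(wire_bytes)    # what the receiver reconstructs
print(wire_bytes.decode("utf-8"))
```

Because the notation is text, the bytes on the wire can be inspected by a person as well as parsed by the receiver, and the non-ASCII customer name survives the round trip.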
One of the earliest suggested applications of XML was in the area of medical systems. Hospitals, clinics and pharmaceutical agencies maintain medical records of patients that contain information on medical histories and billing data. The records are stored in databases of proprietary file formats, which poses several problems:
A technically feasible but monetarily impractical solution for seamless interchange of medical records is to replace the existing heterogeneous systems with a single standard system. An alternative solution is to adopt a single industry-wide XML vocabulary that serves as the single output format for all exporting systems and the single input format for all importing systems.
Data interchange takes place in several other contexts such as in industries involved in brokerage, syndication, supply-chain, and so on. We consider the case of supply-chain in the next section.
A supply-chain is a collection of interdependent steps that, when followed, accomplish a certain objective such as meeting customer requirements. According to the Electronic Commerce FAQ, supply-chain management is a generic term that encompasses the coordination of order generation, order taking, and offer fulfillment/distribution of products, services, or information. There are often numerous "components" in a supply-chain, for example, manufacturers and parts suppliers, parcel shippers, senders and receivers, wholesalers and retailers. EDI has traditionally been used as the de facto data format for invoices, purchase orders, and other items. EDI has proved to be expensive, particularly because it relies on Value-Added Networks (VANs), is vendor-proprietary, and involves the use of specialized computer networks that are beyond the reach of small and medium-size companies.
The use of XML as the data format, and of intranets/extranets as the networking infrastructure that leverages Web technology, offers tremendous benefits to all the business components involved in the supply-chain process: cost reduction, a common data format, and a greater possibility of partners from small and medium-size companies. Furthermore, the XML data format does not require proprietary software for processing, and documents can (after validation, if necessary) be stored automatically in a database without much human intervention. Thus, XML provides a path to the seamless exchange of reusable business documents of different types, resulting in frictionless E-Commerce across multiple trading communities.
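The "validate, then store" flow mentioned above can be sketched with Python's standard library alone. Everything specific here is an assumption for illustration: the `purchaseOrder` element names, the table layout, and the very simple structural check, which stands in for real DTD or schema validation (not performed by `xml.etree.ElementTree`).

```python
import sqlite3
import xml.etree.ElementTree as ET

REQUIRED_FIELDS = ("orderId", "buyer", "total")  # hypothetical vocabulary

def store_order(conn: sqlite3.Connection, xml_text: str) -> None:
    """Check an incoming purchase order and insert it into the database."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    if root.tag != "purchaseOrder":
        raise ValueError(f"unexpected root element <{root.tag}>")
    values = {}
    for name in REQUIRED_FIELDS:
        text = root.findtext(name)
        if text is None or not text.strip():
            raise ValueError(f"missing <{name}>")
        values[name] = text.strip()
    conn.execute(
        "INSERT INTO orders (order_id, buyer, total) VALUES (?, ?, ?)",
        (values["orderId"], values["buyer"], float(values["total"])),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, buyer TEXT, total REAL)")

incoming = (
    "<purchaseOrder><orderId>PO-42</orderId>"
    "<buyer>Acme Corp</buyer><total>99.50</total></purchaseOrder>"
)
store_order(conn, incoming)
rows = conn.execute("SELECT order_id, buyer, total FROM orders").fetchall()
print(rows)
```

A document that fails the check is rejected before it reaches the database, which is the point at which human intervention would be needed.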
Several vocabularies for supply-chain data are available. One choice is the Common Business Library (xCBL), an open XML-based vocabulary for the cross-industry exchange of business documents such as product descriptions, purchase orders, invoices, and shipping schedules. For businesses already using EDI standards, xCBL provides a transition path to an XML-based commerce capability.
Advertising and promoting new products on the Web form the core of a company's marketing activity involved in E-Commerce. It is extremely important that the graphics used to represent these products are accurate, widely accessible and aesthetically pleasing. Most Web graphics today are in raster graphical formats such as GIF or JPEG, which have several limitations: they are slow to transmit, tend not to scale without loss of data, and are "unintelligent" in that they do not "carry" information that could be searched. Scalable Vector Graphics (SVG) is a language for describing two-dimensional vector and mixed vector/raster graphics in XML that works well across platforms, output resolutions, colour spaces, and a range of available bandwidths.
There are several scenarios of use of SVG for business graphics. Companies that design logos often develop multiple copies of templates, one for internal use (such as in TIFF format) and the other for Web delivery (such as in GIF), that are repurposed accordingly. With SVG, multiple copies are unnecessary, which saves space. Furthermore, modification becomes a simple exercise in search and replace, which can be carried out both quickly and efficiently. This saves time and effort, and thus can result in reduced cost.
There are companies that offer real-time map generation services to travellers. The major requirements here are that the maps be vivid, accurate, and timely, for quick interpretation at the user's end. In the past, raster graphics have been used for delivery of maps over the Web. SVG, being a vector format, can provide pinpoint accuracy for the desired destinations, and because of its scalability, also has the potential to be delivered on products for mobile access, such as a car phone.
Using animated banners on the Web is a popular way for businesses to advertise their products/services. These banners have evolved from simple GIF animations of two or more frames containing text-based graphics, to ones containing product pictures and slogans, to extremely sophisticated ones, such as those using Macromedia Flash technology. A lot of effort goes into optimizing these for delivery. The development cycle and some of the effort can be reduced with the use of SVG for graphics. The animation component can be provided by the Synchronized Multimedia Integration Language (SMIL), which is a language based on XML that allows integrating a set of independent multimedia objects into a synchronized multimedia presentation. Using SMIL, an author can describe the temporal behavior and the layout of the presentation on a screen and associate hyperlinks with media objects. SMIL can also be used to create entire product presentations with audio commentaries that synchronize with the corresponding graphics or video.
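The "search and replace" style of modification can be sketched as follows, assuming a tiny made-up logo and colour values. Because SVG is XML, a general-purpose XML library (here Python's `xml.etree.ElementTree`) is enough to change every occurrence of a brand colour.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep the output free of ns0: prefixes

# A hypothetical two-element logo; real logos would be larger.
LOGO = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="120" height="40">'
    '<rect width="120" height="40" fill="#003366"/>'
    '<text x="10" y="26" fill="#ffffff">Acme</text>'
    "</svg>"
)

def recolor(svg_text: str, old: str, new: str) -> str:
    """Replace one fill colour with another on every element that uses it."""
    root = ET.fromstring(svg_text)
    for element in root.iter():
        if element.get("fill") == old:
            element.set("fill", new)
    return ET.tostring(root, encoding="unicode")

updated = recolor(LOGO, "#003366", "#990000")
print(updated)
```

The same source file can then be served on the Web or used internally, since nothing in it is tied to a particular output resolution.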
Selling a product or a service marks only the beginning; providing support to the customer's satisfaction is one of the key factors of an E-Commerce-oriented business. This has been done traditionally using mailing lists, which only provide a text-based description. XML provides better ways of accomplishing this by using channels.
Channels provide a mechanism for automated delivery (broadcast) of personalized and up-to-date information via the Web. Unlike typical surfing, which relies on a pull method of transferring interactive Web pages, channels use push technologies.
Channel Definition Format (CDF) and Rich Site Summary (RSS) are two XML-based vocabularies for channels that can be used for delivery of news related to the company and its products/services. The Appendix presents two (fictitious) examples. Customer experience can be further enhanced by tailoring content to the user's preferences and other characteristics (interest, profession, age, accessibility-specific) in a standard manner. Special purpose content syndication formats, such as Open Content Syndication (OCS), can also be used with the Resource Description Framework (RDF) together with the Dublin Core metadata schema.
Channels only inform the customer; they do not make the content of an entire site directly accessible. Once the customer visits the business site in pursuit of a desirable resource, the issue for businesses is to provide a mechanism for timely resource discovery.
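As a small sketch of how channel content might be consumed, the following reads headlines from an RSS document with Python's standard library. The feed is fictitious (the `example.com` addresses and the news items are invented), and the structure shown is a simplified RSS 0.91-style channel.

```python
import xml.etree.ElementTree as ET

# A fictitious company news channel.
FEED = """<rss version="0.91">
  <channel>
    <title>Example Company News</title>
    <link>http://www.example.com/</link>
    <description>Product announcements</description>
    <item>
      <title>New widget released</title>
      <link>http://www.example.com/news/widget</link>
    </item>
    <item>
      <title>Support hours extended</title>
      <link>http://www.example.com/news/support</link>
    </item>
  </channel>
</rss>"""

def headlines(feed_xml: str) -> list:
    """Return (title, link) pairs for every item in the channel."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [(item.findtext("title"), item.findtext("link"))
            for item in channel.findall("item")]

news = headlines(FEED)
for title, link in news:
    print(f"{title}: {link}")
```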
A standard mechanism for both content selection and resource discovery is provided by the RDF, an XML vocabulary which is a foundation for processing metadata (structured data about data). It provides interoperability between applications that exchange machine-understandable information on the Web. Use of RDF also facilitates searching, by helping authors to describe their documents in ways that browsers, search engines, and robots can "understand." As a result, users can have better Web document discovery services available to them. XML-based metadata has played a key role in automating customer services.
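To illustrate how such metadata can support resource discovery, here is a minimal sketch that looks up resources by subject in an RDF description using Dublin Core elements. The described resource and its property values are invented, and the code treats RDF/XML as ordinary XML via `xml.etree.ElementTree`; it is not a full RDF processor.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Fictitious metadata for two product pages.
METADATA = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://www.example.com/products/widget">
    <dc:title>Widget 2000</dc:title>
    <dc:creator>Example Company</dc:creator>
    <dc:subject>widgets</dc:subject>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.example.com/products/gadget">
    <dc:title>Gadget Pro</dc:title>
    <dc:creator>Example Company</dc:creator>
    <dc:subject>gadgets</dc:subject>
  </rdf:Description>
</rdf:RDF>"""

def find_by_subject(rdf_xml: str, subject: str) -> list:
    """Return (resource URL, title) pairs whose dc:subject matches."""
    root = ET.fromstring(rdf_xml)
    matches = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        if description.findtext(f"{{{DC_NS}}}subject") == subject:
            matches.append((description.get(f"{{{RDF_NS}}}about"),
                            description.findtext(f"{{{DC_NS}}}title")))
    return matches

found = find_by_subject(METADATA, "widgets")
print(found)
```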
The state of XML is not as glorious as sometimes portrayed by trade publications. The integration of XML in all arenas of E-Commerce faces several problems, ranging from technical to social to political, which eventually lead to a lack of standardization in several phases of business processes. Thus, the use of XML in the E-Commerce "gold rush" can be promising, but can also lead to various pitfalls.
This section points out some of the major issues faced by broad deployment of XML-related technologies in E-Commerce, and suggests some solutions to overcome them. Areas where further research and development efforts are needed are also pointed out.
Publishing XML documents on the Web requires support for rendering. However, even though XML has existed as a W3C Recommendation for two years, rendering support for it has been less than satisfactory.
Generic XML browsers, such as JUMBO and Microsoft Internet Explorer, have limited use. They are syntax-sensitive, but without the knowledge of the semantics of an XML vocabulary, are unable to render corresponding documents. It may also be over-optimistic to expect a single browser to support all sorts of combinations of vocabularies, when it is known historically that support even for HTML or CSS has been incomplete and/or with proprietary extensions. Standalone renderers that are vocabulary-specific have limited capabilities because they do not implement other features that the users have become accustomed to.
Therefore, it is quite likely that development of browser plug-ins will be the key to rendering support in the short-term as they can be developed independently (of the browser vendor) using the plug-in Application Programming Interface (API) and have shorter development cycles than a full-blown browser. In fact, SVG and SMIL plug-ins and ActiveX controls for Netscape Communicator and Microsoft Internet Explorer, respectively, are already available.
XML documents can be verbose. For example, XML counterparts of EDI messages and SVG graphics can be prohibitively large. This has a direct impact on performance in terms of delivery and rendering at the user's end.
With disk space getting cheaper, network bandwidth getting cheaper and faster, and CPUs getting cheaper and faster, performance will become less of an issue. In addition, HTTP/1.1 can compress data on the fly. XMLZip provides another solution to this issue. Using the XML DOM API, XML files can be compressed based on the node level in the XML document. On the client side, an XML file can be selected and uncompressed according to the specific node the user is referencing, rather than uncompressing the entire document.
In any case, it will remain an important authoring practice to adhere to the guidelines for minimizing file sizes, such as using style sheets (instead of inline style attributes), using whitespace judiciously, and using an appropriate number of digits in floating-point numbers.
According to one of its design goals, XML was developed for use on the Internet. However, URLs, the mechanism for identifying XML resources, lack persistence. URLs move or disappear, or may just be inaccessible for several other reasons. As a result, there is no guarantee that an XML resource can be accessed. Though some partial solutions to improve access are currently available (such as indirection through the PURL system and resource location through the use of Public Identifiers, as in SGML CATALOGs), they lack the scalability and software support needed for general use. There are efforts toward robustness in hyperlinks, but these schemes suffer from limitations: they work only for documents indexed by Web search engines and do not work with binary resources (which exist in significantly large numbers and in various forms on the Web). Also, there exists the possibility that an unofficial "copy" is being accessed and read; these approaches offer no support for identifying the canonical version of a document. Therefore, the problem of robust network access remains associated with XML vocabularies (as it does with HTML).
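The effect of compression on verbose, repetitive markup can be seen in a short experiment. This sketch uses Python's `gzip` module as a stand-in for the kind of compression an HTTP/1.1 content coding can apply; it does not reproduce XMLZip's node-level approach. The catalogue structure is invented for the example.

```python
import gzip

def build_catalog(count: int) -> bytes:
    """Build a deliberately repetitive XML catalogue as UTF-8 bytes."""
    items = "".join(
        f'<item><sku>SKU-{i:04d}</sku><price currency="USD">9.99</price></item>'
        for i in range(count)
    )
    return f"<catalog>{items}</catalog>".encode("utf-8")

raw = build_catalog(500)
packed = gzip.compress(raw)          # what could be sent over the network
restored = gzip.decompress(packed)   # what the receiver recovers

print(f"uncompressed: {len(raw)} bytes")
print(f"compressed:   {len(packed)} bytes")
```

Repeated tag names are exactly the kind of redundancy that general-purpose compressors remove well, which is why the compressed form is much smaller than the original while decompressing to identical bytes.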
The risks associated with E-Commerce can be broadly classified into the categories of security, integrity, and privacy. Mutual trust is important both for B2B and B2C E-Commerce. Companies are reluctant to share information with other companies whose credibility they are unaware of. Customers are uncomfortable buying from a business they are unfamiliar with, and it has been shown that they consider transaction security and their privacy as very high priority. These traditional problems in E-Commerce have been pointed out to be among the most important factors in the success of a business involved in E-Commerce. A lack of security, integrity, or privacy can result in a decline in Web shopping, and thus lost business. Thus, XML must cope with these problems in order to be successfully deployed.
XML-based message standards are today proliferating across the world, just as
relational databases proliferated from the 1980s within individual companies. [...] This
is a complexity trap at least as large, and as dangerous, as the complexity trap in
multiple relational databases. In twenty years we failed to solve the relational
complexity trap. How will we fare with the much bigger XML complexity trap?
- Robert Worden, in XML E-Business Standards: Promises and Pitfalls
XML is an open standard, and innovation in business is a natural process. As a result, several XML vocabularies for business have appeared in different contexts, created by businesses that do not necessarily have common goals. Unfortunately, there is not just one XML-based vocabulary emerging, but many for different industry sectors, and even several within the same sector. Some of these initiatives are proofs of concept that lack robust testing, some are politically motivated and have conflicting interests, and most currently lack wide acceptance and standardization. Though this has given the user a plethora of choices, it has led to a state of chaos rather than convergence and interoperability.
There is no predetermined guideline as to what constitutes an element or an attribute. In the absence of standardization, overlaps are possible, such as element and attribute names in one vocabulary being identical to those in another. The situation can get worse if an attribute name in one vocabulary has a meaning identical to that of an attribute name in another vocabulary, causing inconsistency and confusion. This also raises the potential danger of creating isolated islands of data in proprietary formats that are inaccessible to other applications, contrary to the XML philosophy.
Some solutions to this problem are (a combination of one or more) of the following:
Whether the goal of creating a robust, open framework for E-Commerce vocabularies based on XML can be achieved remains to be seen. One prospective initiative on the horizon is ebXML.
XML processors have no means of validating semantics even if they are declared informally in an XML schema. Semantic validation, however, can be important in several situations, such as, when two XML servers are communicating without human intervention. XML, however, does not express semantics by itself.
The meanings of element and attribute names are derived from the natural language(s)
they are based upon. It is true that the use of tag names in HTML, which seldom have any
correlation with the information being marked up, results in a loss of information, and
that a vocabulary based on XML syntax avoids this loss. However, that does not imply
that we have now "gained" semantics. For example, a tag <p>
could mean a poster, a plate, or even a paragraph (in HTML). From a human perspective,
even "complete" natural language words used as element names can be ambiguous: does
<order> mean something related to law, mathematical number theory, or sales?
If it is related to business, from which vocabulary? Thus, XML markup may
be human-readable, and occasionally human-interpretable, but is meaningless to computers.
The terms used in an XML schema are derived from a natural language. The
"meaning" assigned to the terms is based on intuition by assuming that the
developer of the schema used the words in the way we would expect. Natural languages may
have different interpretations across different regions and different cultures. Therefore,
any interpretations, in absence of supporting prose (documentation) and context
(namespaces), are misleading.
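The role of namespaces as context can be shown with a short sketch. The two namespace URIs and the document are invented for the example; the point is only that a processor can tell the two `order` elements apart by namespace, not that it thereby "understands" either of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces for a sales vocabulary and a legal vocabulary.
NAMESPACES = {
    "sales": "http://www.example.com/ns/sales",
    "law": "http://www.example.com/ns/law",
}

DOCUMENT = """<records xmlns:sales="http://www.example.com/ns/sales"
                       xmlns:law="http://www.example.com/ns/law">
  <sales:order>PO-42</sales:order>
  <law:order>Court order 7</law:order>
</records>"""

root = ET.fromstring(DOCUMENT)

# The same local name, "order", is selected separately per namespace.
sales_orders = [e.text for e in root.findall("sales:order", NAMESPACES)]
legal_orders = [e.text for e in root.findall("law:order", NAMESPACES)]

print(sales_orders)
print(legal_orders)
```

What each kind of order actually means still has to come from the documentation of the respective vocabulary; the namespace only identifies which documentation applies.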
The need for semantic transparency in XML-based vocabularies in general, and businesses in particular, is vital for building interoperable systems for a collaborative industry endeavor. Without it, despite the standardization of XML vocabularies, interoperability will not be achieved. Furthermore, it will have an impeding effect on related activities such as transparent machine-to-machine data interchange and resource discovery.
One solution to the problem of semantic ambiguity is the use of shared ontologies. An ontology formalizes the concepts that are noteworthy to a community in such a way that everyone has a common level of understanding upon which future information exchange can be made. In the realm of E-Commerce, the role of defining shared ontologies for XML objects has been initiated by Ontology.Org.
The issues of democratization and consensus on common semantics raised here are similar to those of the previous section.
XML can offer tremendous potential for businesses, developers, and consumers involved in E-Commerce. However, as with any business application, XML does not by itself offer a prescription for a successful E-Commerce venture. It is only a standard foundation on which solutions can be built. The potential of these solutions will only be realized when, along with leveraging XML technology in its different forms, the challenges currently being faced are taken into consideration.
The following two examples are fictitious in the sense that the addresses (URLs) they provide do not physically exist (and if they do, that is purely coincidental).