
D-Lib Magazine

July/August 2005
Volume 11 Number 7/8

ISSN 1082-9873

Border Crossings
Reflections on a Decade of Metadata Consensus Building
Stuart L. Weibel
Senior Research Scientist
OCLC Research
<weibel@oclc.org>

In June of this year, I performed my final official duties as part of the Dublin Core Metadata
Initiative management team. It is a happy irony to affix a seal on that service in this journal, as
both D-Lib Magazine and the Dublin Core celebrate their tenth anniversaries. This essay is a
personal reflection on some of the achievements and lessons of that decade.

The OCLC-NCSA Metadata Workshop took place in March of 1995, and as we tried to understand
what it meant and who would care, D-Lib Magazine came into being and offered a natural venue
for sharing our work [16]. I recall a certain skepticism when Bill Arms said "We want D-Lib to be
the first place people look for the latest developments in digital library research." These were the
early days in the evolution of electronic publishing, and the goal was ambitious. By any measure, a
decade of high-quality electronic publishing is an auspicious accomplishment, and D-Lib (and its
host, CNRI) deserve congratulations for having achieved their goal. I am grateful to have been a
contributor.

That first DC workshop led to further workshops, a community, a variety of standards in several
countries, an ISO standard, a conference series, and an international consortium. Looking back on
this evolution is both satisfying and wistful. While I am pleased that the achievements are
substantial, the unmet challenges also provide a rich till in which to cultivate insights on the
development of digital infrastructure.

The Achievements

When we started down the metadata garden path, the term itself was new to most. The known Web
was less than a million pages, people tried to bribe their way into sold-out Web conferences, and
the term 'search engine' was as yet unfamiliar outside of research labs. The OCLC-NCSA Metadata
Workshop brought practitioners and theoreticians together to identify approaches to improve
discovery. In two and a half days, an eclectic Gang of 52 (we affectionately described ourselves as
'geeks, freaks, and people with sensible shoes') brought forward a core element set upon which
many resource description efforts have since been based.

The goal was simple, modular, extensible metadata – a starting place for more elaborate
description schemes. From the thirteen original elements we grew to a core of fifteen, and later
elaborated the means for refining those rough categories. In recent years much work has been done
on the modular and extensible aspects, as application profiles have emerged to bring together terms
from separate vocabularies [9].
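As a minimal sketch of what "simple, modular, extensible" can mean in practice, the Python below lists the fifteen core elements and shows one way a refined description might be reduced to them. The `REFINES` table is only an illustrative subset of DCMI refinements, and `dumb_down`, `record`, and the `shoeSize` term are hypothetical names introduced for this example, not part of any DCMI specification.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})

# A few element refinements, mapped to the element each one refines.
# Illustrative subset only.
REFINES = {
    "alternative": "title",
    "abstract": "description",
    "created": "date",
    "modified": "date",
}


def dumb_down(record):
    """Reduce a list of (term, value) pairs to the fifteen core elements.

    Refinements are replaced by the element they refine. Terms that are
    neither core elements nor known refinements are dropped.
    """
    simple = []
    for term, value in record:
        core = term if term in DC_ELEMENTS else REFINES.get(term)
        if core is not None:
            simple.append((core, value))
    return simple


record = [
    ("title", "Border Crossings"),
    ("creator", "Stuart L. Weibel"),
    ("created", "2005-07"),  # refinement of "date"
    ("abstract", "A personal reflection on a decade of Dublin Core."),
    ("shoeSize", "sensible"),  # hypothetical local term, not a DC term
]
simple_record = dumb_down(record)
```

A consumer that understands only the core set can still use `simple_record`, while the refined terms remain available to applications that know them. The reduction loses detail ("created" becomes a bare "date") but not validity.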

A Consensus Community

The workshop series coalesced as a community of people from many countries and many domains,
drawn by the appeal of a simple metadata standard. Openness was the Prime Directive, and early
progress was often marked by the contentious debate of consensus building. But our belief that
value would emerge from many voices informed our deliberations, and still does. Not without
difficulty: in one early meeting, participants spent an hour of scarce plenary time talking about
Type before realizing that the librarians and the computer scientists had been talking about
completely different concepts. Crossing borders is often difficult.

This open, inclusive approach to problem solving helped the Dublin Core community to frame the
metadata conversation for the past decade. The Dublin Core brand has been for some years the first
link returned for the Google search term "metadata", and for a time, it outranked all other results
for the search "Dublin" (as of this writing, it is #6). With only moderate irony, we might say "I feel
lucky!"

Process

As a workshop series evolved into a set of standards and a community, the need for rules and
governance evolved as well. DCMI developed a process for evaluating proposed changes and
bringing them into conformance with the overall standard [5]. The DCMI Usage Board comprises
knowledgeable, experienced metadata experts from five countries who exercise
editorial guidance over the evolution of DCMI terms and their conformance with the DCMI
Abstract Model [13].

This model itself is among the most important of the achievements of the Initiative, representing as
it does the convergence of theory and practice over a decade of vigorous debate and practical
implementation. It emerged from early intuition and experience, informed by an evolving sense of
grammatical structure [2,6] and further refined by a long co-evolution with the W3C's Resource
Description Framework (RDF) and the Semantic Web.

At a higher level, DCMI has a Board of Trustees [1], who oversee operations and do strategic
planning, and an Affiliate Program and governance structure that distributes the cost of the
initiative and assures that the needs of stakeholders are accommodated [3]. At the time of this
writing, there are four national DCMI Affiliates and several more in discussion.

Internationalization

The global nature of the Web demands commitment to internationalization. The difficulties of
achieving system interoperability in multiple languages are immense, and still only partially solved
(anyone used IRIs recently?). Nonetheless, DCMI has succeeded in attracting translations of its
basic terms in 25 languages and offers a multilingual registry infrastructure of global reach [14].
The venues for the workshops and conferences have been chosen to make the Initiative accessible
to people in as many places as possible. Workshops and conferences are held in the Americas,
Europe, and Austral-Asia on a rotating basis, and Dublin Core principals have given talks on every
continent save Antarctica. This policy of international inclusion has been a philosophic mainstay
for the Initiative, attracting long-term participation from around the world.

Where we were confused

Confusions and unmet challenges are both interesting and instructive. A few of these are historical
curiosities, and interesting mostly as a source of wry humility. Others represent unsolved dilemmas
that remain prominent challenges for the metadata world in general.

Author-created Metadata

The idea of user-created metadata is seductive. Creating metadata early in the life cycle of an
information asset makes sense, and who should know the content better than its creator? Creators
also have the incentive of their work being more easily found – who wouldn't want to spend an
extra few minutes with so much already invested?

The answer is that almost nobody will spend the time, and probably the majority of those who do
are in the business of creating metadata-spam. Creating good quality metadata is challenging, and
users are unlikely to have the knowledge or patience to do it very well, let alone fit it into an
appropriate context with related resources. Our expectations to the contrary seem touchingly naïve
in retrospect.

The challenge of creating cost-effective metadata remains prominent. As Erik Duval pointed out in
his DC-2004 keynote, 'Librarians don't scale' [7]. We need automated (or at least, hybrid) means
for creating metadata that is both useful and inexpensive.

What is metadata for?

Another naïve assumption was that metadata would be the primary key to discovery on the Web.
While one may quibble about the effectiveness of unstructured search for some purposes, it is the
dominant idiom of discovery for Web resources, and may be expected to remain so. What then, is
metadata for?

There are many answers to this question, though given the high stakes in the search domain, expect
these answers to shift and weave for the foreseeable future. Searching the so-called 'dark web'
remains a function of gated access, and metadata is a central feature of such access. One might
simply say – harvest and index. OCLC's exposure of WorldCat assets in search engines such as
Google and Yahoo is exemplary of this approach [11]. Indexed metadata terms connect users to the
location of the physical assets via holdings records, but it is reasonable to ask... would simple, full-
text indexing of these assets be better still? We may argue the fine points today but in the future,
we'll know the answer, for the day of digitization is fast upon us.

Structured metadata remains important in organizing and managing intellectual assets. The
Canadian Government's approach to managing electronic information illustrates this strategy [4].
Metadata becomes the linkage relating content, legislative mandates, reporting requirements,
intended audience, and various other management functions. One does not achieve this sort of
functionality with unstructured full text.

The International Press Telecommunications Council is exploring embedding Dublin Core in their
new generation of news standards [17]. No domain is more digital now than this one. If you want
to know the value of structured metadata, look to the requirements and business cases in such
communities [10].

Similarly, in the management of intellectual property rights, well-structured data is essential, and
as these requirements become ubiquitous, the creation and management of metadata will be central
to the story.

Metadata for images is a critical use. Association of images with text makes them discoverable.
When the asset is a stand-alone image, metadata is the primary avenue by which it can be
accessed. Picture Australia is an early and enduring (and widely copied) model in this area,
showing how a photo archive can become a primary cultural heritage asset through the addition of
systematic search tools and Web accessibility [12].

There is much talk these days of taxonomies and their strengths and deficiencies, and in fact the
emergence of 'folksonomies' hints at a sea change in the use of vocabularies to improve
organization and discovery [9]. The Dublin Core community has struggled with the role of
controlled vocabularies, how to declare and use them, and how important (or impotent?) they
might be. The notion that uncontrolled vocabularies – community-based, emergent vocabularies –
might play an important role in aggregation and discovery occasions a certain discomfort for those
schooled in formal information management. Whether it is just the latest fad, or an important
emerging trend, remains to be seen.

A Major Unmet Challenge

Entropy is an arrow. In the absence of constant care and fussing, our greatest successes break
down. Failures, however, remain potent without much attention, retaining their power to impede.

One of the yet-unsolved problems in the metadata community is the railroad gage dilemma. The
first editor of D-Lib, Amy Friedlander, introduced me to the notion of train gages as metaphor for
interoperability challenges [8]. Last year I rode that metaphor from Beijing to Ulan Bator,
Mongolia. A cursory knowledge of Asian history reminds us that relations between Mongolia and
China have been less-than-cordial from time to time, and this history remains manifest at the Gobi
border crossing today. In the dark of night, the Beijing train of the Trans-Siberian Railway pulls
into a longhouse of clanking and switching as the entire train is raised on hydraulic jacks. Chinese
bogeys (wheel carriages) are rolled out, and Mongolian bogeys of a different gage are rolled in.
Border guards with comically high hats (and un-comical sidearms) work their way through the
train cars in the manner of border guards everywhere. After a couple of hours, the train is rolling
through the Gobi anew.

It is a fascinating display of technological diplomacy – a kind of Maginot Line that helps those on
both sides of the border sleep better. These images belong to a Bogart movie or a Clancy novel, but
their abstraction pervades the metadata arena.

Stacked bogeys, ready to be rolled into use. Photo by Stuart Weibel.


A railroad car raised on one of dozens of hydraulic jacks that raise an entire train at once for
the exchange of bogeys. Photo by Stuart Weibel.

We load our metadata into structures in one domain and when we cross borders we unload it,
repackage it, massage it to something slightly different, and suffer a measure of broken semantics
in the bargain. We're running on different gages of track, manifested in different data models,
slightly divergent semantics, and driven by related, but meandering, often poorly-understood
functional requirements. Crosswalks are the hydraulic jacks – quieter, but no more efficient than
the clanking and grinding in the train longhouse.
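A crosswalk of this kind can be sketched in a few lines of Python. Everything here is hypothetical: the source field names, the `CROSSWALK` table, and the sample values are invented for illustration and do not describe any real schema or published mapping. The point is only to show where the "broken semantics" come from.

```python
# Hypothetical crosswalk from a richer local schema to simple Dublin Core.
# The field names on the left are invented for illustration.
CROSSWALK = {
    "main_title": "title",
    "author": "creator",
    "illustrator": "contributor",
    "editor": "contributor",
    "date_issued": "date",
    "date_digitized": "date",
}


def crosswalk(record):
    """Map a {field: value} record to a list of (element, value) pairs.

    Returns the mapped pairs and the source fields that had no target.
    """
    mapped, dropped = [], []
    for field, value in record.items():
        target = CROSSWALK.get(field)
        if target is None:
            dropped.append(field)
        else:
            mapped.append((target, value))
    return mapped, dropped


source = {
    "main_title": "An Example Title",
    "author": "A. N. Author",
    "date_issued": "1995",
    "date_digitized": "2005",
    "reading_level": "adult",  # no counterpart in the target vocabulary
}
mapped, dropped = crosswalk(source)

# Two distinct source dates now share one element, so the distinction
# between them cannot be recovered from the output alone.
dates = [value for element, value in mapped if element == "date"]
```

In this sketch `reading_level` is lost outright, and the two dates survive only as indistinguishable values of `date`. Mapping back across the border cannot restore either, which is the inefficiency the hydraulic jacks stand for.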

Metadata standards specify the means to make (mostly) straightforward assertions about resources.
Many of these assertions are as simple as attribute-value pairs. Others are more complex, involving
dependencies or hierarchies. None are so complicated that they cannot be accommodated within a
common formal model. Yet we do not have such a model in place. Why?

• NIH (Not Invented Here) Syndrome is often blamed for disparities that emerge in solutions
from separate domains targeted at similar problems. Certainly our propensity to like our
own ideas better than those of others plays a role, but my view is that it is not such a large
role.
• Developments take place in parallel. It is unusual to have the luxury of waiting to see how
another group is approaching a particular problem before tackling it yourself. It is quite
hard enough to know what is happening in one's own community, let alone to follow related
developments in others, whose differences in terminology obscure what we need to know.
• The functional requirements of various metadata standards are often ambiguous and always
focused slightly differently. DCMI focuses on simple, extensible, high-level metadata.
IEEE LOM (Learning Object Metadata) also concerns itself with discovery metadata, but
focuses more strongly on educational process descriptors. MPEG is about media, where
technical image metadata is central, and intellectual property rights management is crucial.
MODS is grounded firmly in the legacies of MARC (and the world's largest installed base
of resource discovery systems).
• The cost of collaboration – in intellectual as well as financial terms – is high. People have
to know and trust one another, which generally requires face-to-face engagement:
transporting ourselves and our ideas to other time zones, surviving frequent-flyer flus,
finding the means to support travel costs, and missing our children's baseball games.
• The problems are more complicated than we imagine at the outset. The recent approval of
the Dublin Core Abstract Model by DCMI is the culmination of a journey that began
almost at the outset of the Initiative. Early attempts, under the guise of the DC Data Model
Working Group, rank among my most contentious professional experiences. To borrow
from the oldest joke of the Dismal Profession, put all the data modelers in the world end to
end, and you won't reach a conclusion (we did, but it took ten years to manage it).

The idea of achieving similar consensus across communities with their own legacies of such
conflict is daunting in the extreme, though recent discussions on this topic with colleagues in
another metadata community remind me that hopefulness and optimism are as much a part of our
domain as contention [18].

Collaboration and consensus in the digital environment

The Web demands an international, multicultural approach to standards and infrastructure. The
costs in time and treasure are substantial, and the results are uncertain. Paying for collaboration
that spans national boundaries, language barriers, and the often-divergent interests of different
domains is a major part of these challenges. Doing this while sustaining forward progress and
attracting a suitable mix of contributors, reviewers, implementers, and practitioners, is particularly
difficult.

A recent presentation by Google's Adam Bosworth, referenced in the Blandiose blog [15], makes
for provocative reading for those debating the costs and benefits of heavy-weight versus light-
weight standards. The tension between these approaches sharpens designers and practitioners (and
especially, entrepreneurs), to the eventual benefit of users. Any standards activity ignores this
balancing act at its peril.

As we try to foment change and react to it at once, we are like Escher's Hands – designing the
future as it, in turn, designs us... except that there are often implements other than pencils in those
hands. Ever try explaining what you do for a living to your mother? In the Internet standards arena,
conveying an appropriate balance of glee, terror, satisfaction, frustration, and pure wonder is no
easy task. I just tell her I'm not a real librarian, but I play one on the Internet. It seems enough.

Acknowledgements

I wish to acknowledge my personal debt to uncountable colleagues in the Dublin Core community,
and my deep sense of gratitude for the opportunity to have played the role I have. The patience,
forbearance, and generosity of OCLC management in supporting my efforts, and DCMI in general,
have been singular and essential.

Thomas Baker reviewed and improved this manuscript with several insightful suggestions.

Amy Friedlander and Bonnie Wilson, successive editors of D-Lib, have made me look better than I
am in these pages for 10 years. Congratulations to them and to all who have helped make this
journal (and its authors) what they are.

References and Notes

[1] About the Initiative
DCMI Website, accessed June 23, 2005
<http://dublincore.org/about/>.

[2] Baker, Thomas
"A Grammar of Dublin Core"
D-Lib Magazine, October 2000
Volume 6 Number 10
<doi:10.1045/october2000-baker>.

[3] DCMI Affiliate Program
DCMI Website, accessed June 23, 2005
<http://dublincore.org/about/affiliates/>.

[4] Committee of Federal Metadata Experts Metadata Action Team,
Council of Federal Libraries
Government of Canada Metadata Implementation Guide For Web Resources
3rd edition - July 2004
<http://www.collectionscanada.ca/6/37/s37-4016-e.html>.

[5] DCMI Usage Board
DCMI Usage Board Mission and Principle
DCMI Website, June 11, 2003
<http://dublincore.org/usage/documents/mission/>.

[6] DCMI Usage Board
DCMI Grammatical Principles
DCMI Website, 2003-11-18
<http://dublincore.org/usage/documents/principles/>.

[7] Duval, Erik and Wayne Hodgins
"Making metadata go away: Hiding everything but the benefits"
Keynote address at DC-2004
Shanghai, China, October 2004
<http://students.washington.edu/jtennis/dcconf/Paper_15.pdf>.

[8] Friedlander, Amy
Emerging Infrastructure: The Growth of Railroads
Infrastructure History Series, CNRI, 1995
<http://www.cnri.reston.va.us/series.html#rail>.

[9] Mathes, Adam
Folksonomies - Cooperative Classification and Communication Through Shared Metadata
Computer Mediated Communication - LIS590CMC, Graduate School of Library and Information
Science, University of Illinois Urbana-Champaign
December 2004
<http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html>.

[10] News Architecture Version 1.0 Metadata Framework Business Requirements
IPTC Standards Draft, 2005
<http://iptc.org/pdl.php?fn=DRAFT-NAR_1.0-spec-NMDF-BusReq_34.pdf>.

[11] Open Worldcat Program
OCLC Website, accessed June 23, 2005
<http://www.oclc.org/worldcat/open/default.htm>.

[12] Picture Australia
Hosted by the National Library of Australia
Website accessed June 23, 2005
<http://www.pictureaustralia.org/>.

[13] Powell, Andy; Mikael Nilsson, Ambjörn Naeve, and Pete Johnston
DCMI Abstract Model
DCMI Website, 2005-03-07
<http://dublincore.org/documents/abstract-model/>.

[14] Wagner, Harry and Stuart Weibel
"The Dublin Core Metadata Registry: Requirements, Implementation, and Experience"
Journal of Digital Information
Accepted for publication, May 2005.

[15] "Web of Data"
Blandiose blog, 2005-04-21
<http://www.blandiose.org/index.php?s=bosworth&submit=Search>.

[16] Weibel, Stuart
"Metadata: the Foundations of Resource Discovery"
D-Lib Magazine, July 1995
Volume 1 Number 1
<doi:10.1045/july95-weibel>.

[17] Wolf, Misha
DC in XHTML2
Semantic Web and DC-General Mailing lists, June 7, 2005
<http://lists.w3.org/Archives/Public/semantic-web/2005Jun/0058.html>.

[18] The author has been party to discussions with Erik Duval and Wayne Hodgins of the IEEE
LOM effort centered around the possibility of cross-standard data modeling that might promote
convergence among various metadata activities. The means and methods for carrying such work
forward are presently undetermined.

Copyright © 2005 OCLC Online Computer Library Center, Inc.
