
Metadata discovery - Wikipedia, the free encyclopedia

Metadata discovery
From Wikipedia, the free encyclopedia

In metadata, metadata discovery is the process of using automated tools to discover the semantics of a data element in data sets. This process usually ends with a set of mappings between the data source elements and a centralized metadata registry. Metadata discovery is also known as metadata scanning.

Contents
1 Data source formats for metadata discovery
2 A taxonomy of metadata matching algorithms
2.1 Lexical Matching
2.2 Semantic Matching
2.3 Statistical Matching
3 Vendors
4 Research
5 See also
6 References

Data source formats for metadata discovery


Data sets may be in a variety of different forms, including:
1. Relational databases
2. Spreadsheets
3. XML files
4. Web services
5. Software source code such as Fortran, Jovial, COBOL, Assembler, RPG, PL/1, EasyTrieve, Java, C# or C++ classes, and hundreds of other software languages
6. Unstructured text documents such as Microsoft Word or PDF files

A taxonomy of metadata matching algorithms


Automated metadata discovery falls into three distinct categories: lexical, semantic, and statistical matching.

Lexical Matching
1. Exact match - where data element linkages are made based on the exact name of a column in a database, the name of an XML element, or a label on a screen. For example, if a database column has the name "PersonBirthDate" and a data element in a metadata registry also has the name "PersonBirthDate", automated tools can infer that the database column has the same semantics (meaning) as the data element in the metadata registry.
2. Synonym match - where the discovery tool is given not just a single name but a set of synonyms.
3. Pattern match - where the tool is given a set of lexical patterns that it can match. For example, the tool may search for "*gender*" or "*sex*".
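
The three lexical strategies can be sketched in a few lines of Python. This is a minimal illustration rather than any vendor's implementation: the `REGISTRY` contents, the synonym sets, and the `lexical_match` function are all hypothetical, and the wildcard patterns are interpreted with the standard library's `fnmatch` module.

```python
from fnmatch import fnmatchcase

# Hypothetical metadata registry: element name -> synonyms and wildcard patterns.
REGISTRY = {
    "PersonBirthDate": {
        "synonyms": {"BirthDate", "DateOfBirth", "DOB"},
        "patterns": ["*birth*date*"],
    },
    "PersonGenderCode": {
        "synonyms": {"Gender", "Sex"},
        "patterns": ["*gender*", "*sex*"],
    },
}


def lexical_match(source_name):
    """Return (registry_element, match_kind) for a source name, or None."""
    lowered = source_name.lower()

    # 1. Exact match: the source name is itself a registry element name.
    if source_name in REGISTRY:
        return source_name, "exact"

    # 2. Synonym match: the source name is a known synonym (case-insensitive).
    for element, rules in REGISTRY.items():
        if lowered in {s.lower() for s in rules["synonyms"]}:
            return element, "synonym"

    # 3. Pattern match: the source name fits one of the wildcard patterns.
    for element, rules in REGISTRY.items():
        if any(fnmatchcase(lowered, p) for p in rules["patterns"]):
            return element, "pattern"

    return None
```

In this sketch a pattern such as "*sex*" also matches a column named "MiddlesexCounty", so pattern matches are probably best treated as candidates for human review rather than as confirmed mappings.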

Semantic Matching
Semantic matching attempts to use semantics to associate target data with registered data elements.
1. Semantic similarity - This algorithm relies on a database of conceptual nearness between words. For example, the WordNet system can rank how close words are conceptually to each other: the terms "Person", "Individual" and "Human" may be highly similar concepts.
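
The idea of conceptual nearness can be illustrated with a small path-based similarity measure. The sketch below does not call WordNet: the `PARENT` table is a hand-made, hypothetical "is-a" hierarchy standing in for a real lexical database, and `path_similarity` and `rank_candidates` are illustrative names. The score is 1 / (1 + number of links on the path joining two terms through their nearest common ancestor).

```python
# Hypothetical "is-a" links (child -> parent). A real tool would query a
# lexical database such as WordNet instead of this hand-made table.
PARENT = {
    "organism": "entity",
    "person": "organism",
    "individual": "person",
    "human": "person",
    "vehicle": "entity",
    "car": "vehicle",
}


def ancestors(term):
    """Return the chain [term, parent, grandparent, ...] up to the root."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_similarity(a, b):
    """Score in (0, 1]; 1.0 for identical terms, 0.0 if no common ancestor."""
    chain_a = ancestors(a.lower())
    chain_b = ancestors(b.lower())
    depth_in_b = {node: i for i, node in enumerate(chain_b)}
    for steps_a, node in enumerate(chain_a):
        if node in depth_in_b:
            # Path length = links climbed from a + links climbed from b.
            return 1.0 / (1 + steps_a + depth_in_b[node])
    return 0.0


def rank_candidates(source_term, registry_terms):
    """Order registry terms from most to least similar to the source term."""
    return sorted(
        registry_terms,
        key=lambda term: path_similarity(source_term, term),
        reverse=True,
    )
```

With this toy hierarchy, "Human" ranks "Person" ahead of "Car", which mirrors the kind of ranking the text attributes to WordNet.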

Statistical Matching
Statistical matching uses statistics about the data source's data itself to derive similarities with registered data elements.
1. Distinct value analysis - By analyzing all the distinct values in a column, its similarity to a registered data element can be estimated. For example, if a column has only the two distinct values 'male' and 'female', it could be mapped to 'PersonGenderCode'.
2. Data distribution analysis - By analyzing the distribution of values within a single column and comparing this distribution with those of known data elements, a semantic linkage could be inferred.
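
Both statistical techniques can be sketched as simple scoring functions. The example below is one possible reading, not a prescribed method: the `REGISTRY_PROFILES` table and its frequencies are invented for illustration, distinct value analysis is approximated with Jaccard overlap of value sets, and data distribution analysis with the total variation distance between value frequencies.

```python
from collections import Counter

# Hypothetical reference profiles: registry element -> expected value frequencies.
REGISTRY_PROFILES = {
    "PersonGenderCode": {"male": 0.5, "female": 0.5},
    "OrderStatusCode": {"open": 0.2, "shipped": 0.7, "cancelled": 0.1},
}


def jaccard(values, reference):
    """Distinct value analysis: overlap of the two sets of distinct values."""
    a, b = set(values), set(reference)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def distribution_distance(values, reference):
    """Data distribution analysis: total variation distance between the
    observed frequencies and a reference profile (0 = identical, 1 = disjoint)."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 1.0
    observed = {value: count / total for value, count in counts.items()}
    keys = set(observed) | set(reference)
    return 0.5 * sum(
        abs(observed.get(k, 0.0) - reference.get(k, 0.0)) for k in keys
    )


def best_match(column_values):
    """Return (element, overlap, distance) for the best profile, or None."""
    normalized = [str(v).strip().lower() for v in column_values]
    if not normalized:
        return None
    # Prefer high value overlap first, then low distribution distance.
    scored = [
        (jaccard(normalized, profile),
         -distribution_distance(normalized, profile),
         name)
        for name, profile in REGISTRY_PROFILES.items()
    ]
    overlap, neg_distance, name = max(scored)
    return (name, overlap, -neg_distance) if overlap > 0 else None
```

For a column holding only 'male' and 'female' in equal numbers, `best_match` selects 'PersonGenderCode' with full overlap and zero distance, matching the example in the text. A skewed column would keep the same overlap but report a larger distance.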

Vendors
The following vendors (listed in alphabetical order) provide metadata discovery and metadata mapping software and solutions:
Esquire Innovations (see [7] (http://www.esqinc.com/section/products/2/iscrub.html) )
IBM
InfoLibrarian Corporation (see [1] (http://www.infolibcorp.com/scanners.html) )
Masai Technologies (see [2] (http://www.masaitechnologies.com/) )
Revelytix (see [3] (http://www.revelytix.com/) )
Silver Creek Systems (see [4] (http://www.silvercreeksystems.com/) )
Sypherlink: Harvester (see [5] (http://www.sypherlink.com/products/index.asp) )
Unicorn Systems (see [6] (http://www.unicorn.com/products/unicornsystem/scanners.htm) )

Research
INDUS project at the Iowa State University (see [7] (http://www.cild.iastate.edu/software/indus.html) )
Mercury - A Distributed Metadata Management and Data Discovery System developed at the Oak Ridge National Laboratory DAAC (see [8] (http://mercury.ornl.gov) ) [1]

See also
metadata
data mapping
data warehouse
semantic web
Defense Discovery Metadata Specification

References
1. ^ Devarakonda, R., Palanisamy, G., Wilson, B., and Green, J., "Mercury: reusable metadata management, data discovery and access system", Earth Science Informatics (Springer Berlin / Heidelberg) 3 (1): 87-94, doi:10.1007/s12145-010-0050-7 (http://dx.doi.org/10.1007%2Fs12145-010-0050-7)

Massive Data Analysis Systems (http://www.sdsc.edu/MDAS/Reports/MDAS.Final.SciTech/techreport97.1/techreport.html) by San Diego Supercomputer Center, June 1997
IBM Whitepaper on Enterprise Metadata Discovery (http://public.dhe.ibm.com/software/dw/library/jemd/EnterpriseMetadataDiscovery_v0.12.pdf)
White Paper on Metadata Management (http://esqinc.com/Content/WhitePapers/Managing-Metadata.php) by Esquire Innovations (http://esqinc.com/)

Retrieved from "http://en.wikipedia.org/w/index.php?title=Metadata_discovery&oldid=482655385"
Categories: Metadata
This page was last modified on 19 March 2012 at 02:53. Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. See Terms of use for details. Wikipedia is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.
