A Reference Architecture
April 2013
1 Abstract
Master Data Management (MDM) has matured as an enterprise strategy for data management over the
past decade. Product Information Management (PIM) represents a set of functions to manage and
publish product information, including classification and catalog management. Driving factors such as
data quality, efficiency, consolidation, completeness of data and the creation of a best-of-breed record
are making MDM a true central hub in the enterprise today.
Both MDM and PIM are well-traveled paths today; in most cases, however, these paths have been
mutually exclusive. MDM has mostly been deployed for data domains such as Consumers, Suppliers,
Businesses, Locations, Employees and Stores. For the Product domain, PIM has mostly been used. The
integration between PIM and MDM has been either missing or minimal.
Recently, Informatica MDM and Heiler PIM have come together under one umbrella. This document,
MDM-PIM Integration Patterns, is an early asset created as part of a broader set of solutions involving
Heiler PIM and Informatica MDM. Through this blueprint, the authors intend to provide a working
guideline for utilizing MDM and PIM together for a greater benefit. This blueprint is intended to provide
the implementation perspective which will hopefully influence not only the Retail enterprise architecture
but also the developer/implementation community.
Key words: MDM, PIM, Informatica MDM, Heiler PIM, Retail product data management.
2 Introduction
2.1 The role of PIM
Product Information Management (PIM) is the central product information platform for e-commerce
and multichannel commerce. The solution provides distributors and manufacturers with a central
Master Data Management (MDM) solution for all product data in a central data source for all
communication channels and languages.
Key elements of the solution include the import (Onboarding) and central mastering of highly structured
product data and the corresponding media assets (Digital Asset Management). This means that centrally
managed product data can be adapted to suit the needs of various target systems (Multichannel), such
as online shops or print systems, when it is exported in data formats on the basis of CSV or XML.
Enterprise PIM comprises the following three functional areas that support Product Information
Management along the supply chain from onboarding of product data to the multichannel export into
various target systems:
1. Product Manager is the central platform of Enterprise PIM.
The client server architecture based on the Eclipse Rich Client technology (ERCP) enables the
import of product data into the PIM system, its mastering and its export into various target
systems. Additionally, PIM Web Access allows product data to be maintained online via a
lightweight web interface.
2. Supplier Exchange is the self-service supplier portal of Enterprise PIM and enables data suppliers
to provide their product data via an easy to use web interface on their own.
3. Media Portal is the integrated Digital Asset Management (DAM) solution in Enterprise PIM and
enables the structured management and reuse of media that are part of product data.
The consolidated feed is sent to the downstream BI/reporting system for integrated reporting and
analytics.
The MDM-MDE system also helps to cleanse and de-duplicate Customer and Supplier records in PIM. The
standardized and de-duplicated feed of mastered records is sent back to PIM along with their
cross-references; by processing this feed, PIM is able to identify the duplicates and remove them.
Batch integration
For bulk transfer of records batch integration is used. CSV files are used to transfer data to and from
PIM.
Near real-time integration
The near real-time integration uses message queues to exchange XML messages between PIM and
MDM-MDE. This mechanism is event driven and asynchronous.
Real-time integration
Composite web services are used to look up duplicate Customer and Supplier records in MDM-MDE and
then, optionally, to import the duplicate record directly from the MDM-MDE hub. This is an in-process
integration which is synchronous.
3 Reference Architecture
3.1 Architecture Diagram
Product Manager
imported data before it is merged into the master catalog. Alternatively it is also possible to import
product data directly into the master catalog.
Data Mastering
Product Manager is responsible for the centralized control of all data sources and data consumers and
the internal organization of all product information. Therefore it represents the management cockpit for
company-wide Master Data Management of product data. Flexible data modeling allows implementing
a consistent data model and, on this basis, to integrate heterogeneous data landscapes.
The functions for data mastering include mechanisms to maintain and administrate all of the relevant
data for Product Information Management. The maintenance of product and item properties, associated
media objects, attributes and references as well as the structuring of assortments and the
administration of multilingual texts are only a few of the possibilities that come with Product Manager.
The possibility to define individual layouts for the user interface to align the views and perspectives to
the business processes of the company and thereby having a use case oriented and efficient way to
maintain mass data is an important concept of the software.
To support data governance guidelines, the change history can additionally be tracked (Audit Trail).
All relevant information regarding changes to datasets is stored.
Essential constituents for Data Mastering:
a. Processing of product & item information, structure systems & attributes, customers & suppliers
b. Maintenance of language specific data
c. Item assortments & prices, references (e.g. for cross-selling & up-selling), localization
d. Search functionality
e. Possibility of a multichannel preview to review products, items or structure groups via mouse
click in the view of the specific publication channel
f. Possibility to customize the user interface to align it to the individual business processes and use
cases of the company for efficient data mastering
g. Mass data updates (e.g. via cumulated data maintenance and intelligent search and replace
mechanisms)
a. Central cockpit for managing channels using export format templates and profiles
b. Export editor for creating export format templates
c. Execute single or recurring exports (immediately or at a scheduled point in time)
d. Criteria for sorting export files are definable on multiple levels
e. Data validations can be defined at field level for each specific export format template, and thus
also for each channel
f. Export functions enable manipulations and transformations of product data
g. Support of long-tail and shadow assortment strategies by the ability to export from the central
master catalog and also from staging catalogs
PIM Web Access
PIM Web Access is a lightweight web application for data mastering in Enterprise PIM. It supports a
collaborative approach by enabling a large group of users, including product managers, sales teams,
copywriters and content creators, marketing or publishing teams, to participate in the PIM process.
PIM Web Access especially addresses users that are responsible for specific tasks or processes like the
creation of marketing texts. The focus is on a simple and intuitive usability.
Essential constituents for basic functionalities:
a. Creation of products and items
b. Classification of products to one or more structure systems
c. Viewing and maintaining fields of a product or item (e.g. texts)
d. Maintenance of attributes
e. Viewing and managing tasks (workflow)
f. Online translation
g. Viewing of media assets and references
h. Channel approvals
Supplier Exchange
Supplier Exchange is an integral part of Enterprise PIM and is responsible for data onboarding. The
web-based portal defines a data adoption process for suppliers.
Users of Product Manager can define and provide descriptions, format definitions, and sample data for
Supplier Exchange so that data suppliers are able to upload their product data and corresponding media
assets on their own. Supplier Exchange supports the import of product data using any of the formats
supported by Product Manager (CSV, XML or Microsoft Excel).
After data suppliers have uploaded product data into the system, it is automatically checked against
defined rules. Supplier Exchange then provides a report on the quality of the data, after which the data
suppliers have the opportunity to make corrections. Thereupon, data suppliers can choose to have the
product data checked again, after which it is then transferred to Product Manager.
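The check-and-report cycle described above can be sketched as follows. This is a minimal illustration only: the field names (`sku`, `name`, `price`), the rule set and the `QualityReport` type are hypothetical and do not reflect how rules are actually configured in Supplier Exchange.

```python
from dataclasses import dataclass, field
from typing import Callable


def _is_positive_number(value) -> bool:
    """Return True if the value parses as a number greater than zero."""
    try:
        return float(value) > 0
    except (TypeError, ValueError):
        return False


# Hypothetical rules: each rule name maps to a predicate over one uploaded row.
RULES: dict[str, Callable[[dict], bool]] = {
    "sku is present": lambda row: bool(str(row.get("sku", "")).strip()),
    "name is present": lambda row: bool(str(row.get("name", "")).strip()),
    "price is a positive number": lambda row: _is_positive_number(row.get("price")),
}


@dataclass
class QualityReport:
    total_rows: int = 0
    # Each failure is (1-based row number, name of the violated rule).
    failures: list[tuple[int, str]] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.failures


def check_upload(rows: list[dict]) -> QualityReport:
    """Check every uploaded row against every rule and collect the failures."""
    report = QualityReport(total_rows=len(rows))
    for index, row in enumerate(rows, start=1):
        for rule_name, rule in RULES.items():
            if not rule(row):
                report.failures.append((index, rule_name))
    return report
```

A supplier would correct the rows listed in `failures` and submit the upload again; only an upload whose report has `passed` set would be handed over to Product Manager.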
With Supplier Exchange data suppliers can make their data and corresponding media objects available
independently via web interface. Supplier Exchange has a social user interface that enables cost-efficient
communication between data suppliers and the licensee in one central location. This can lead to a
higher efficiency when adding a large number of suppliers and large assortments, and enables faster
product launches with quality-assured supplier data.
Administrative tasks for Supplier Exchange can be carried out online. For example, an administrator can
invite new data suppliers and define the data formats that will be available to them. Individual data
import operations can be checked and approved. In addition, a dashboard provides statistics about the
import processes of the individual data suppliers.
Heiler Media Portal
Heiler Media Portal enables the management of unstructured data like images, graphics, documents,
audio and video in a media-neutral format so that they can be readily located, and provided
automatically for a specific publication channel.
The web-based Heiler Media Portal is based on flexible metadata modeling functionalities. Metadata of
any media objects can be managed in multiple languages, as can the various categories and hierarchical
groups.
Essential constituents for basic functionalities:
a. Media asset onboarding
Match - duplicate records are identified by matching them with each other; a set of configurable match
rules is used for this purpose. Sophisticated, population-based matching algorithms come
prepackaged with the product.
Enrichment - data can be enriched or augmented with data from third-party data providers such as D&B
and Acxiom. Informatica MDM Hub provides out-of-the-box integration with major third-party data
providers within its user interface.
Consolidation - Once duplicate records are identified, the best attributes from the matched records are
consolidated to create the Best Version of the Truth. This reconciliation process, achieved within the
Informatica Trust Framework and governed by configured business rules, provides the best attributes
from contributing systems.
Hierarchy Management - Relating people and organizations is a key requirement for many
organizations. Informatica MDM Hub's Hierarchy Management capabilities let users group people
into households and companies into corporate hierarchies. Hierarchies can include entities from
different domains, creating a holistic multi-domain view.
Synchronization - One common goal of sharing the data in Informatica MDM Hub is to synchronize it
with contributing source systems as well as downstream systems. Informatica MDM Hub can be
configured to handle these synchronizations in real time, near-real time, or batch mode.
The entity Article is representative of the entities Variant and Product as well. These entities are
all based on the same type and physical data model. Which information is provided and maintained
at which level is decided at a more logical, business-driven level.
The next figure shows a more detailed but still high-level view of the entity Article and its sub-entities.
Entity: Article
Sub-entities: ArticleDetails, ArticleTrading, ArticleSupplierRelation, ArticleReference,
ArticleSurcharges, ArticleLogistic, ArticleLang, ArticleAttribute, ArticleStructureMap,
ArticleSpecialTreatment, ArticleMediaAssetMap, ArticlePrice, ArticleExtension
[Figure: entity-relationship diagram of the data model, with 1-to-many (1..*) cardinalities. Entities
shown: Location, Employee, Store, Customer, Supplier, Article. Sub-entities shown: Location Address,
Telephone, Employee Address, Employee Hierarchy, Employee Assignment, Employee Status, Electronic
Address, Store Location, Store Status, Merchant Account, Customer Relationship, Marketing/Campaign,
Contact Preference, Tender Account, Customer Loyalty, Customer Wishlist, Customer Status, Customer
Address, Alternate ID, Supplier Address, Postal Address, Product Classification, Article Lang, Article
Detail, Article Classification, Article Reference. Relationships shown: Location Article Relationship,
Customer Product Classification Relationship, Customer Article Relationship, Supplier Article
Relationship.]
The next figure shows the sub-data model that is used for providing data to MDM.
Attribute groups shown in the figure:
- First Name, Last Name, Phone, Email, Alternate Identifier (Tax ID, SSN), Postal Address
- Organization Name, Industry, Postal Address, Website, Alternate Identifier (DUNS No.)
- Name, Location Type, Location Address
The following diagram shows a map of all the source systems and the entities and relationships to which
they contribute.
[Figure: source system contribution map.
Source systems: CRM, ERP, ecommerce application, Social, POS, PIM, Analytics.
Entities: Customer, Supplier, Employee, Location, Product/Article.
Relationships: Customer to Store, Customer to Product Classification, Customer to Employee, Customer
to Customer, Store to Location, Store to Employee, Employee to Employee, Article (Product) to Location,
Article (Product) Reference, Supplier Article.]
Trust is configured for each of these systems and for each of the entities. In case of a conflict, these
trust rules identify the surviving values.
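To illustrate how trust rules can resolve a conflict, the sketch below picks, per attribute, the value from the most trusted contributing system. The score table, source names and attributes are hypothetical, and a static score per source and attribute is a simplifying assumption; the actual trust configuration in the MDM hub is richer than this.

```python
# Hypothetical trust scores (0-100) per source system and attribute.
TRUST = {
    "CRM": {"email": 90, "phone": 70},
    "POS": {"email": 40, "phone": 80},
    "ecommerce": {"email": 60, "phone": 50},
}


def surviving_values(records: dict[str, dict[str, str]]) -> dict[str, str]:
    """Pick, per attribute, the non-empty value from the most trusted source.

    `records` maps a source system name to the attribute values it contributed.
    Sources or attributes missing from TRUST are treated as having a score of 0.
    """
    survivors: dict[str, str] = {}
    best_score: dict[str, int] = {}
    for source, attributes in records.items():
        for attribute, value in attributes.items():
            if not value:
                continue  # an empty value never survives
            score = TRUST.get(source, {}).get(attribute, 0)
            if attribute not in best_score or score > best_score[attribute]:
                best_score[attribute] = score
                survivors[attribute] = value
    return survivors
```

Because trust is evaluated per attribute rather than per record, the surviving record can combine an email address from one system with a phone number from another.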
Hierarchies and Relationships
The MDM-MDE hub masters relationships between available entities. Relationships can be created and
maintained by using the Hierarchy Manager console. Some possible relationships maintained by this
solution are:
- Customer to Store
- Customer to Product Classification
- Customer to Employee (e.g. Personal Shopper)
- Customer to Customer (e.g. Friend or Family)
- Store to Location
- Store to Employee
- Employee to Employee (e.g. Reports to)
- Article (Product) to Location
- Article (Product) Reference
- Supplier Article
Relationships and hierarchies created in PIM will be imported to MDE and consolidated with MDE
hierarchies and relationships. This consolidated view is visible from MDE Hierarchy Manager Console.
Data Stewardship UI
Informatica Data Director is used as the Data Stewardship console. The console is preconfigured with
Search, Create and Update functionality on all Subject Areas.
Additionally, duplicate detection and fuzzy search capabilities are present. Task and workflow
capabilities are enabled as well.
A set of predefined charts and graphs is added to the dashboard part of the UI. Here are a few
example charts -
[Figure: batch integration architecture. PIM side: Export engine, Export Template, Import engine,
Import Mapping, Hot folder, Repository. Exchange: XML/CSV files processed by Informatica DI. MDM side:
Landing, Views, Hub Server, Cleanse Match Server, Hub Store.]
Batch Integration
An export template of PIM defines the layout and content of the CSV files for the data transfer. The
export engine of PIM provides the article data according to the template's definition as a delta for the
articles changed since the last MDM update (or alternatively a full load for the initial transfer).
A set of PowerCenter mappings brings data from PIM into MDM landing tables. These mappings process
the data files and transform the records as necessary before inserting them into the MDM landing tables.
A set of custom scripts invokes the MDM batch jobs (stage, load, match & merge) in sequence (parent
first) and supplies the necessary parameters to the batch jobs.
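The parent-first sequencing can be sketched as a topological ordering of the tables, with the batch steps run in order for each table. The parent map below is an assumption derived from the entity names used in this document, and `run_job` is a hypothetical callback standing in for whatever mechanism actually invokes an MDM batch job.

```python
from graphlib import TopologicalSorter

# Hypothetical parent relationships: each table lists the parents that must be
# processed before it.
PARENTS = {
    "Article": [],
    "ArticleDetail": ["Article"],
    "ArticleLang": ["Article"],
    "ArticleReference": ["Article"],
    "ArticleClassification": ["Article"],
}

# Batch steps in the order named in the text.
JOB_STEPS = ["stage", "load", "match and merge"]


def plan_jobs(parents: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (table, step) pairs so that every parent table comes first."""
    order = TopologicalSorter(parents).static_order()
    return [(table, step) for table in order for step in JOB_STEPS]


def run_plan(plan, run_job) -> None:
    """Call run_job(table, step) for each planned job; stop at the first failure."""
    for table, step in plan:
        if not run_job(table, step):
            raise RuntimeError(f"{step} job failed for {table}")
```

Stopping at the first failure keeps child tables from being loaded against a parent that did not complete, which is the point of running the jobs parent first.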
Custom scripts are used to identify the newly inserted, modified and merged records in MDM, which
are then exported back to the PIM system via PowerCenter mappings.
The incoming CSV files are recognized via a hot folder plugin and automatically transferred to the PIM
repository by the import engine. A previously defined import mapping on the PIM side is used for that.
In batch mode, the following data entities are transferred using related CSV files.
Entity                   File
Article                  Article.csv
ArticleDetail            ArticleDetail.csv
ArticleLang              ArticleLang.csv
ArticleReference         ArticleReference.csv
ArticleClassification    ArticleClassification.csv
The CSV files are encoded as UTF-8. A semicolon (;) is used as the column separator and rows are
separated by CR/LF. Article.csv contains one row per Article, providing the unique identifiers of the
Article. The related files provide 0 to many rows per Article (ArticleDetail.csv: 0 to 1). In the related
files, the first two columns are the foreign keys representing the unique identifiers of the Article the
record belongs to.
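The file conventions above (UTF-8, semicolon separator, CR/LF row ends, two leading key columns) can be sketched with Python's standard `csv` module. The sample values and the absence of a header row are assumptions made for illustration; the actual column layout is defined by the PIM export template.

```python
import csv
import io


def write_rows(rows: list[list[str]]) -> bytes:
    """Serialize rows with ';' as separator and CR/LF row ends, encoded as UTF-8."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=";", lineterminator="\r\n")
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def read_rows(data: bytes) -> list[list[str]]:
    """Parse a UTF-8 encoded, semicolon-separated file back into rows."""
    text = io.StringIO(data.decode("utf-8"), newline="")
    return list(csv.reader(text, delimiter=";"))


def group_by_article(rows: list[list[str]]) -> dict[tuple[str, str], list[list[str]]]:
    """Group the rows of a related file by the two leading foreign-key columns."""
    grouped: dict[tuple[str, str], list[list[str]]] = {}
    for row in rows:
        grouped.setdefault((row[0], row[1]), []).append(row[2:])
    return grouped
```

Using a CSV library rather than splitting on `;` matters here because field values such as descriptions may themselves contain semicolons, which the writer quotes and the reader unquotes.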
Batch Integration data flow
[Figure: batch integration data flow between PIM, PowerCenter, MDM landing, MDM, job control and the
outbound view. Steps shown:
1. Export file with change data (change data capture)
2. Load MDM landing
3. Standardization and transformation
4. Consolidation
5. Publish
6. Check delta
7. Merge records in PIM
8. Generate export file]
[Figure: near real-time integration architecture. Components and labels shown: Trigger/Audit, Notify
Changes, Repository, Write, Custom JMS plugin, XML/HTTP, Asynchronous Composite Service, Services
Integration Framework, Cleanse Match Server, Hub Server, Trigger, JMS Queue (XML), Hub Store.]
Composite Service
Composite services developed on MDM-MDE execute CRUD operations. These services are built using
the MDM-MDE SIF API and Data Services and are transactional. The composite services include Read,
Create, Update and Delete operations for each of the main entities, and can be executed in a
synchronous or asynchronous manner.
PIM to MDM-MDE interface
A set of triggers in the PIM system monitors data changes. Custom implementations of these triggers
invoke the appropriate MDM-MDE composite service and propagate the changes to MDM-MDE.
MDM-MDE to PIM interface
Message triggers are also configured in MDM-MDE to monitor data changes in MDM and place an XML
document containing the changes on a JMS queue. A custom JMS plugin developed on PIM consumes
these messages, processes them and makes the appropriate changes in the PIM system.
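The consuming side of this interface can be sketched as a loop that takes XML change messages off a queue and applies them to a local store. This is only an illustration: an in-process `queue.Queue` stands in for the JMS queue, the message layout (`<change entity=... action=... id=...>` with `<field>` children) is hypothetical, and a dictionary stands in for the PIM repository.

```python
import queue
import xml.etree.ElementTree as ET


def parse_change(message: str) -> dict:
    """Parse one XML change message (hypothetical layout) into a dictionary."""
    root = ET.fromstring(message)
    return {
        "entity": root.get("entity"),
        "action": root.get("action"),
        "id": root.get("id"),
        "fields": {f.get("name"): (f.text or "") for f in root.findall("field")},
    }


def apply_change(store: dict, change: dict) -> None:
    """Apply a parsed change to the store, keyed by (entity, id)."""
    key = (change["entity"], change["id"])
    if change["action"] == "delete":
        store.pop(key, None)
    else:
        # "create" and "update" are both treated as an upsert.
        store.setdefault(key, {}).update(change["fields"])


def drain(messages: queue.Queue, store: dict) -> int:
    """Process all messages currently queued; return how many were applied."""
    processed = 0
    while True:
        try:
            message = messages.get_nowait()
        except queue.Empty:
            return processed
        apply_change(store, parse_change(message))
        processed += 1
```

Treating create and update as the same upsert makes the consumer tolerant of a message being delivered more than once, which is a common concern with asynchronous, event-driven delivery.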
Real-time Lookup and Import
[Figure: real-time lookup and import architecture. Components and labels shown: IDC, PIM Console, IDD
Console, PIM, MDM, XML,JSON/HTTP, Web service, Search Engine, Read/write, Repository, Composite
Service, Services Integration Framework, Cleanse Match Server, Hub Server, Hub Store.]
Duplicate Prevention
While a user adds a new Supplier or Customer using the PIM console, a fuzzy search is executed using the
composite services on MDM-MDE, and potential duplicates are identified and displayed to the user. The
user can choose to import the duplicate record directly from the MDM-MDE hub into PIM.
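The lookup step can be illustrated with a simple string-similarity search. This sketch uses Python's `difflib.SequenceMatcher` purely as a stand-in; it is not the matching algorithm used by MDM-MDE, and the record identifiers, names and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case the name, drop commas and periods, and collapse whitespace."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_candidates(new_name: str, existing: dict[str, str], threshold: float = 0.8):
    """Return (score, record_id, name) for existing records at or above the threshold.

    Results are sorted with the best match first, so they can be shown to the
    user as potential duplicates before the new record is saved.
    """
    scored = [
        (similarity(new_name, name), record_id, name)
        for record_id, name in existing.items()
    ]
    return sorted((c for c in scored if c[0] >= threshold), reverse=True)
```

If the user accepts one of the returned candidates, the mastered record would be imported from the hub instead of creating a new one.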
Extended product information
An extended set of product information is pulled from PIM using web services and then displayed in a
custom tab in IDD. This enables users with the right privileges to see all relevant product information
directly, without having to switch user interfaces.
The PIM Service API splits into several APIs covering different functional areas.
The List API provides searching, reporting, navigation and result listing for nearly all objects of the
PIM core. Based on a query, a homogeneous list of objects of a certain type is returned. The queries are
parameterized. The resulting list is a two-dimensional table with a given set of columns that controls
which information the result contains. The operations supported for the main data entities are Create,
Read, Update (Write), Delete.
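The idea of a parameterized query that returns a two-dimensional table with a chosen set of columns can be sketched as below. This is a conceptual illustration only and not the PIM Service API's actual interface: `ListResult`, `by_field`, `run_list_query` and the sample field names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ListResult:
    columns: list[str]
    rows: list[list[object]]  # one inner list per object, aligned with columns


def by_field(field_name: str, value: object) -> Callable[[dict], bool]:
    """Build a query predicate from a parameter: field_name must equal value."""
    return lambda obj: obj.get(field_name) == value


def run_list_query(
    objects: Iterable[dict],
    predicate: Callable[[dict], bool],
    columns: list[str],
) -> ListResult:
    """Return the matching objects as a table restricted to the given columns."""
    rows = [[obj.get(column) for column in columns] for obj in objects if predicate(obj)]
    return ListResult(columns=list(columns), rows=rows)
```

Letting the caller choose the columns keeps the result homogeneous and limits the payload to what a given screen or report needs.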
The Media API provides access to all kinds of media assets in the PIM. All derivatives and previews of
the media asset are accessible.
The Management API provides access to control the server-side management processes such as Import,
Merge, Export, and others. Typically these processes can be started or scheduled, the process
status can be evaluated, and the process results can be requested and accessed.
[Figure: metadata introspection architecture. PIM side: Meta API (web services), Meta-data Lookup,
Repository. Exchange: Metadata XML, Compare, Change XML. MDM side: Metadata Introspection Tool,
Service Integration Framework, Cleanse Match Server, Hub Server, Hub Store.]
Metadata Introspection
4 Conclusion
MDM and PIM integration techniques help to create a connected and more informed MDM, while
helping PIM to provide much more than just Product data by adding more context to it. These
integration techniques are first steps towards more sophisticated and powerful synchronization
mechanisms between PIM and MDM, making the Product data more valuable.
4.1 Benefits