
SEG 7430 Information Technology Management

Managing Information Resources


• Data
– consists of facts, the raw material
• Information
– data in context
– its meaning depends on the surrounding circumstances or usage
• Knowledge
– information with direction or intent
– facilitates a decision or an action

Information/Knowledge Management

[Figure: Data → Information → Knowledge → Decision making. Search engines and filtering turn data into information; analysis tools and AI algorithms turn information into knowledge.]

How to design an information and knowledge management system that is scalable and adaptable?

Chinese Medicine

[Figure: sources of Chinese Medicine (CM) information in China, Taiwan, Hong Kong and elsewhere: laws & regulations, CM-related organizations, clinical records, universities, Web sites, databases, research papers, journals, the Internet, dictionaries, news, books and encyclopedias. Issues highlighted: reliability, scalability, maintainability.]

[Figure: CM Digital Library. Source databases (CM literature database, CM literature dictionary, herbal dictionary, online herbal database, clinical CM reports database, R&D database, TCM manufacturers/distributors database) feed the CM Digital Library, its CMED library, contents and information infrastructure. Stakeholders in the CM community: manufacturers, government, companies, citizens, researchers/academia and CM practitioners. Labelled interactions include endorsement, e-commerce, information searching, searching for/providing information, and prescriptions.]

CMED – a set of open standards

1. The meta-information standard for defining new drugs, prescriptions, medical cases, treatment methods, etc.

2. The information exchange standard for integrating multiple CM databases.

3. The information search standard for retrieving relevant CM information.


Biological Information Management: A Case Study


• Data maintained by different groups – need to talk to
various sources differently.
• Text, sequence, 3-D structure – what is relevant?
• Data from different sources in different format – how to
integrate?
• Data can provide knowledge – how to extract the embedded
knowledge?

Current Practices

[Figure: current practices
Databases: PubMed (text DB), GenBank (sequence DB), PDB (3D structure DB)
Analysis tools: BLAST
Software: GCG
Systems: Entrez, UMLS]

The Current Model

[Figure: public data sources, private data sources, internal data and unexplored data, with Entrez as the access point to the public data sources]


The New Model


[Figure: BioSIFTER spanning public data sources, private data sources, internal data and unexplored data]

Information Management

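Data → Information → Knowledge

$$D \xrightarrow{\;f_D\;} I \xrightarrow{\;f_I\;} K, \qquad f_D : D \to R, \qquad f_I : I \to K_s$$

$$D \subset \mathbb{R}^n, \qquad I \subset \mathbb{R}^m, \qquad K \subset \mathbb{R}^p, \qquad p \ll m \ll n$$

D: original data; R: relevance value; $f_D$: personalized profile
I: information; $K_s$: knowledge structure; $f_I$: personalized profile
K: knowledge; $\mathbb{R}^n, \mathbb{R}^m, \mathbb{R}^p$: data, information and knowledge spaces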
Features of BioSIFTER
• Active: Surveys online resources and gathers information
• Adaptive: Cognizant of the interests and requirements of users
• Scalable: New document sources can be easily
added
• Automatic thesaurus discovery
• Multi-format Information Integration
• Knowledge discovery

BioSIFTER

• Acquisition
• File, Known Sources
• Representation
• Vector-space -- tf-idf
• Classification
• Maximin, Centroids, Sample Documents
• User Profiling
• Reinforcement Learning
• Presentation
• GUI


Document Representation and Vector Space Model

• Identify the concepts that describe the content of the given document
• Convert a document to a numeric or symbolic form
• Documents are vectors of weighted terms, the terms being defined in a thesaurus
• Weights: tf (term frequency) × idf (inverse document frequency); simple and effective
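A minimal sketch of the vector-space representation, assuming whitespace-tokenized documents and the common weighting w = tf × log(N/df). BioSIFTER's exact tf-idf variant and its thesaurus-based term selection are not given here, so the function name `tfidf_vectors` and the weighting formula are illustrative assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one sparse {term: weight} vector per document.

    Weight = raw term frequency * log(N / df), one common tf-idf variant
    (assumed here for illustration).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: count * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


if __name__ == "__main__":
    docs = [
        "protein sequence alignment".split(),
        "protein structure prediction".split(),
        "gene sequence database".split(),
    ]
    for vec in tfidf_vectors(docs):
        print({t: round(w, 3) for t, w in sorted(vec.items())})
```

A term that occurs in every document gets weight log(1) = 0, which is how idf discounts words that do not distinguish one document from another.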

Clustering
• Maximin-Distance: unsupervised clustering algorithm based on the
document set
• Distance Metric: Cosine similarity measure (Salton, 1983)
• At each step, the point whose distance to its nearest centroid is largest is chosen; it is added as a new centroid if this distance is larger than a threshold
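The maximin step above can be sketched as follows, using 1 minus cosine similarity as the distance. Starting from the first document, using a fixed threshold, and representing documents as sparse {term: weight} dictionaries (such as tf-idf vectors) are assumptions of this sketch, not details taken from BioSIFTER.

```python
import math


def cosine_similarity(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def distance(u: dict[str, float], v: dict[str, float]) -> float:
    # 0 for vectors pointing the same way, 1 for orthogonal (no shared terms).
    return 1.0 - cosine_similarity(u, v)


def maximin_cluster(vectors: list[dict[str, float]], threshold: float):
    """Maximin-distance clustering.

    Returns (centroids, labels): centroids are indices of the input points
    chosen as cluster seeds; labels[i] indexes into centroids.
    """
    if not vectors:
        return [], []
    centroids = [0]  # assumption: seed with the first document
    while len(centroids) < len(vectors):
        candidates = [i for i in range(len(vectors)) if i not in centroids]
        # Distance from each remaining point to its nearest centroid.
        nearest = {
            i: min(distance(vectors[i], vectors[c]) for c in centroids)
            for i in candidates
        }
        # Max of the mins: the point least covered by existing centroids.
        best = max(candidates, key=nearest.__getitem__)
        if nearest[best] <= threshold:
            break
        centroids.append(best)
    # Assign every point to its nearest centroid.
    labels = [
        min(range(len(centroids)), key=lambda k: distance(vec, vectors[centroids[k]]))
        for vec in vectors
    ]
    return centroids, labels
```

Because the candidate is always the point farthest from every existing centroid, each new cluster is seeded in the least-covered region of the document set; raising the threshold yields fewer, broader clusters.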

User Profiling
• Learn user interest levels for given categories
• Relies on relevance feedback from user
• Uses a simple reinforcement learning algorithm known as Pursuit Learning
• maintains an action probability vector and a vector of estimated relevance probabilities
• both these vectors are updated continuously
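The text names Pursuit Learning without giving its update rules. The sketch below follows the standard pursuit scheme from the learning-automata literature, with categories as actions and the user's relevance feedback as the reward: the relevance estimate of the chosen category is updated as a running average, then the action probability vector moves a step of size λ toward the category with the highest estimate. The class name, the learning rate, and the simulated user in the usage section are hypothetical.

```python
import random


class PursuitProfile:
    """Pursuit-learning sketch for a user's interest in document categories.

    p[i]: probability of presenting category i (action probability vector).
    d[i]: estimated probability that category i is judged relevant.
    """

    def __init__(self, n_categories: int, learning_rate: float = 0.05, seed: int = 0):
        self.p = [1.0 / n_categories] * n_categories
        self.d = [0.0] * n_categories
        self.rewards = [0] * n_categories
        self.trials = [0] * n_categories
        self.lr = learning_rate
        self.rng = random.Random(seed)

    def choose(self) -> int:
        """Sample a category according to the action probability vector."""
        return self.rng.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, category: int, relevant: bool) -> None:
        # 1. Update the relevance estimate of the chosen category (running mean).
        self.trials[category] += 1
        self.rewards[category] += int(relevant)
        self.d[category] = self.rewards[category] / self.trials[category]
        # 2. Pursue the best-estimated category (ties go to the lowest index):
        #    p <- (1 - lr) * p + lr * e_best, which keeps sum(p) == 1.
        best = max(range(len(self.d)), key=self.d.__getitem__)
        self.p = [(1 - self.lr) * pi for pi in self.p]
        self.p[best] += self.lr


if __name__ == "__main__":
    true_relevance = [0.2, 0.8, 0.5]  # hypothetical user, unknown to the learner
    profile = PursuitProfile(len(true_relevance), learning_rate=0.01, seed=42)
    user = random.Random(7)
    for _ in range(3000):
        c = profile.choose()
        profile.update(c, user.random() < true_relevance[c])
    print([round(x, 3) for x in profile.p])
```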

Managing Information – Value, Usage and Sharing
• Information is viewed as an asset.
• Information should be treated differently from the
traditional assets of labor and capital

• Value issues
• Usage issues
• Sharing issues

Value Issues
• Information’s value depends on the recipient and the context.
• Most people cannot put a value on a piece of information
until they have seen it.
• The only practical way to establish the value of
information is to establish a price for it and see if anyone
buys (Davenport).
– Charging for the information itself, not the technology or provider
– Charging for the document rather than a smaller unit
– Charging for length or time or number of users
– Charging by value rather than cost


• Tools to increase the value of information


– Information maps
• textual charts, diagrammatic maps that point to the location of
information
• E.g., IBM created a guide to market information
– managers can find out where to get quick answers to their ad hoc
questions
– less money spent on duplicate information
– increased understanding of the kinds of questions people typically ask
– Information guides
• people who know where desired information can be found
– e.g., a librarian
• E.g., Hallmark Cards created a guide position in its business units to help employees find computer-based information


– Business documents
• provide organization and context
• uncover what documents an organization needs
• Dean Witter discovered that its brokers used the same documents over and over.
– It put these documents on CD-ROM, kept them on local servers, and updated them monthly
– Groupware
• getting greater value out of less structured information
• allows people to share information across distance in a more
structured manner than electronic mail
• Lotus Notes
• eases discussion and aids the distribution of information

Usage Issues
• Preserve information’s complexity
– information should not be simplified to fit the technology
• People do not share information easily
– value grows when information is shared
– culture blocks sharing
• Technology does not change culture
– changing the information culture requires
• changing basic behaviors, values, attitudes and management expectations

Sharing Issues
• Technical solutions do not address the sharing issues
– much work has been done on information architecture, but the information is probably outdated before it is used.
– Managers get 2/3 of their information from conversation and 1/3
from documents, almost none from computer systems
– Who has legitimate need of the information? Who determines it?
• The touchy subject of information sharing is brought out into the open
– Information sharing is not always good. Forcing employees to
share information with others above them can lead to intrusive
management.
• Unlimited information sharing does not work
– Sharing of corporate performance figures is beneficial
• even when corporate performance is poor

Four Types of Information
• Record-based, e.g., database
– structured
• Document-based, e.g., reports, opinions, e-mail, proposals
– less structured

• Internal
• External

Examples:
– Internal record-based: traditional MIS
– External record-based: public databases
– Internal document-based: word processing, records management
– External document-based: corporate library, Web sites

• Internal record-based information


– getting record-based information into shape
• clean up the pool of data
• ensure the data streams that feed the pool input clean data
• Internal document-based information
– a document is a semiformal package of information with some
organizational impact that is filed, transmitted, and consequently
maintained
– electronic document management includes a variety of
technologies
• document and image processing, text retrieval, hypertext and hypermedia, EDI, micrographics and desktop publishing


• External record-based information


– traditionally, users themselves have managed the acquisition of information from external databases
– companies are beginning to coordinate their use of such external services, as well as to combine internal and external information
• External document-based information
– least manageable form of information


• Internet Securities, Inc.


– founded in June 1994 by two brothers
• Gary Mueller
• George Mueller
– provides hard-to-find financial, business and political information
to business professionals on a subscription basis
– What differentiates ISI from other information providers?
• Concentrates solely on information from emerging markets
• information is primarily available on the WWW
– US$15 to 20 million in subscription revenue


– Categories of information
• News
– signs agreements with the newswire services and the leading news periodicals
– articles appear on the service the evening before they are published
• Company Financial Information
– provides basic financial statements for public companies from public and private
sources
– also provides data for unlisted companies and selected private companies
• Industry and Analyst Reports
– signs agreements with leading brokerages in emerging markets
• Equity and Fixed-Income
– covers daily updates of the equity and debt markets
– sources include local stock exchanges and top domestic and international banks
• Macroeconomic data
– spreadsheet of basic economic statistics, cross-sectional and historical, acquired
from government and private sources
• Surveys, General Reports and Other Useful Resources
– reports on countries and industries


– Differentiating features
• comprehensive information
• exclusive online source
– only available online via ISI
• presentation
– easy-to-use, graphical interface, “Point and Click” navigation
• searchability
– proprietary search engine searches entire database of text, spreadsheets, graphs, and
other files
• local language support
– supports multiple languages
– access information in other languages
– search and display in local language
• flexible access
– as long as you have an Internet connection, you can download documents
• access to archived data
– in original format
• relatively inexpensive
– compared to traditional data sources such as periodicals


• Database administrators manage
– DBMSs and their use
– all computerized data resources of an organization
• Problem:
– incompatible data definitions
• from application to application, department to department, site to site, and division to division
– data showing up in different files with
• different names for the same data
• same name for different data items
• same data in different files with different update cycles
– such data may be acceptable for routine information processing,
but is not acceptable for management uses
– Management cannot get consistent views across the enterprise
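Assuming each department's field definitions have already been tagged with a shared concept identifier (a hypothetical input; establishing that mapping is the hard, manual part), the first two problems above (different names for the same data, the same name for different data) can be detected mechanically. The function name and tuple format are illustrative only; the third problem, different update cycles, is not covered.

```python
from collections import defaultdict


def find_definition_conflicts(definitions):
    """Flag naming conflicts across departmental data definitions.

    definitions: iterable of (department, field_name, concept) tuples, where
    `concept` is a hypothetical canonical identifier for what the field means.
    Returns (synonyms, homonyms):
      synonyms: concept -> set of different names used for the same data
      homonyms: name -> set of different data items sharing that name
    """
    names_by_concept = defaultdict(set)
    concepts_by_name = defaultdict(set)
    for _dept, name, concept in definitions:
        names_by_concept[concept].add(name)
        concepts_by_name[name].add(concept)
    synonyms = {c: names for c, names in names_by_concept.items() if len(names) > 1}
    homonyms = {n: concepts for n, concepts in concepts_by_name.items() if len(concepts) > 1}
    return synonyms, homonyms


if __name__ == "__main__":
    defs = [
        ("Sales", "cust_no", "customer_id"),
        ("Billing", "client_id", "customer_id"),  # different name, same data
        ("Sales", "date", "order_date"),
        ("Shipping", "date", "ship_date"),        # same name, different data
    ]
    synonyms, homonyms = find_definition_conflicts(defs)
    print("synonyms:", synonyms)
    print("homonyms:", homonyms)
```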
Role of data administration - 1
• Clean up data definitions
– get rid of redundancies and inconsistencies among definitions
• two or more names should not exist for the same data item
• the same name should not be used for two or more different data
items
– data administrators design
• standard data definitions to reconcile conflicting user needs
• data integrity process to flag suspected data and guard against
inaccurate, invalid, or missing data polluting the pool of correct data
– data administrators train users on the meaning and proper use of the data
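A data integrity process of the kind described above might screen incoming records against the standard definitions and divert suspect records, with reasons, instead of letting them into the shared pool. The `FieldStandard` structure, the field names, and the validity rules below are hypothetical examples.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FieldStandard:
    """A standard data definition for one field (hypothetical structure)."""
    name: str
    required: bool
    check: Callable[[object], bool]  # validity rule for non-missing values


def screen_records(records, standards):
    """Split incoming records into clean ones and flagged (suspect) ones.

    Flagged records are returned with the reasons they failed, so they can be
    reviewed instead of polluting the pool of correct data.
    """
    clean, flagged = [], []
    for rec in records:
        problems = []
        for std in standards:
            value = rec.get(std.name)
            if value is None:
                if std.required:
                    problems.append(f"missing {std.name}")
            elif not std.check(value):
                problems.append(f"invalid {std.name}: {value!r}")
        if problems:
            flagged.append((rec, problems))
        else:
            clean.append(rec)
    return clean, flagged


if __name__ == "__main__":
    standards = [
        FieldStandard("customer_id", True, lambda v: isinstance(v, int) and v > 0),
        FieldStandard("country", True, lambda v: v in {"US", "HK", "CN"}),
        FieldStandard("discount", False, lambda v: 0 <= v <= 0.5),
    ]
    incoming = [
        {"customer_id": 17, "country": "HK"},
        {"customer_id": -3, "country": "HK"},
        {"country": "UK", "discount": 0.9},
    ]
    clean, flagged = screen_records(incoming, standards)
    print(len(clean), "clean record(s)")
    for rec, reasons in flagged:
        print(rec, reasons)
```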

Role of data administration - 2
• Control shared data
– data used by two or more units are considered shared data
– controversy
• essentially all the data in the organization should be under the control
of data administration
– even if some data is currently not being used across organizational
boundaries, it may be in the future
• each organizational unit should have authority over its own data; only data that flow to other units need to be standardized
– it is impractical to standardize all the data because doing so imposes unreasonable rigidities
– data administrators should confront this issue and decide how broadly or narrowly to define shared data
– data administrators also analyze the impact of proposed changes to programs that use shared data

Role of data administration - 3
• Manage data distribution
– distributed data is geographically dispersed data
– it poses significant challenges to data administrators
– current practice
• use the single-master-file concept
• distribute copies that do not need to be kept synchronized
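The current practice above (one master file, distributed copies that are not kept synchronized) can be modelled in a few lines. The class names and the refresh policy are assumptions; the point is that a site reads its local copy, which may be stale until the next scheduled refresh.

```python
import copy


class MasterFile:
    """Single authoritative copy; all updates go here (hypothetical sketch)."""

    def __init__(self):
        self.data = {}
        self.version = 0

    def update(self, key, value):
        self.data[key] = value
        self.version += 1


class LocalCopy:
    """A site's read-only copy, refreshed on a schedule rather than kept in sync."""

    def __init__(self, master: MasterFile):
        self.master = master
        self.refresh()

    def refresh(self):
        # Snapshot the master; between refreshes this copy may be out of date.
        self.data = copy.deepcopy(self.master.data)
        self.version = self.master.version

    def read(self, key):
        return self.data.get(key)

    def is_stale(self) -> bool:
        return self.version < self.master.version


if __name__ == "__main__":
    master = MasterFile()
    master.update("price:A100", 9.99)
    site = LocalCopy(master)
    master.update("price:A100", 10.49)               # change made only at the master
    print(site.read("price:A100"), site.is_stale())  # prints: 9.99 True
    site.refresh()                                   # e.g. a nightly refresh
    print(site.read("price:A100"), site.is_stale())  # prints: 10.49 False
```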

Role of data administration - 4
• Maintain data quality
– data administrators develop policy and procedures to maintain data
quality
– put the owners of the data in charge of editing and verifying data
accuracy and quality
• difficulty: how to identify the owner
– put the processes in place
• ensure the correct data is being input

Data dictionaries
• Data dictionaries are systems and procedures (manual or automated) for storing and handling an organization’s data definitions
• today referred to as metadata
• data dictionaries eliminate errors of understanding,
ambiguities, and difficulties in interpreting data
• Ideal sequence
– 1. Set up the data administration function
– 2. Develop data standards
– 3. Purchase and install a DBMS
– 4. Install a data dictionary as the first database application
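One way to make step 4 concrete is to keep the data dictionary as an ordinary table in the DBMS and compare it against the DBMS's own catalog metadata. This sketch uses Python's built-in `sqlite3` module and SQLite's `PRAGMA table_info`; the table, column, and element names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The data dictionary is itself stored as a table in the DBMS.
conn.execute(
    "CREATE TABLE data_dictionary ("
    " element TEXT PRIMARY KEY, data_type TEXT NOT NULL,"
    " definition TEXT NOT NULL, owner TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO data_dictionary VALUES (?, ?, ?, ?)",
    [
        ("customer_id", "INTEGER", "Unique identifier assigned to a customer", "Sales"),
        ("order_date", "TEXT", "Date the order was accepted (ISO 8601)", "Sales"),
    ],
)

# An application table with one column nobody has defined.
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT, cust_no INTEGER)")


def undefined_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """Columns of `table` that have no entry in the data dictionary."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    # The table name is interpolated directly: fine for a sketch, not for untrusted input.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    defined = {row[0] for row in conn.execute("SELECT element FROM data_dictionary")}
    return [c for c in columns if c not in defined]


print(undefined_columns(conn, "orders"))  # ['cust_no']
```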

Case study: Monsanto
• Monsanto is a US$9 billion provider of agricultural
products, pharmaceuticals, food ingredients, and
chemicals.
• Monsanto has a tradition of being decentralized
• three large enterprise IT projects
– redevelop operational and financial transaction systems
– develop a knowledge-management architecture, including a data warehouse
– link transaction and decision support systems via common master data, known as Enterprise Reference Data (ERD)


[Figure: Enterprise Reference Data shared by the transaction systems and the data warehouse]


• Enterprise Reference Data (ERD)


– repository for most master table information in the company
• the information includes vendors, customers, suppliers, people,
materials, finance, and control tables
– the purpose of ERD is to enable integration
• vertical integration enables closer coordination with suppliers
• horizontal integration across business units enables team marketing,
leveraged purchasing, and interplant manufacturing


• Getting data into shape


– created a formal department - ERD Stewardship
• independent of MIS
• set data standards and enforce quality, acting as the “data police”
– entity specialists are the key managers with the greatest stake in the quality of the data
• e.g., the vice president of purchasing is the specialist for vendor data
– analysts
• people who formerly maintained local data now maintain a global resource

Three-level Database Model
• Level 1
– external, conceptual or local level
• contains various user views of the corporate data used by application
programs
• is not concerned with how the data is physically stored or what data is used by other applications
• Level 2
– logical or enterprise-data level
• encompasses all of the organization’s relevant data under the control of data administrators
• data and relationships are represented in one or more DBMSs
• Level 3
– physical or storage level
• specifies the way the data is physically stored
• a data record consists of its data fields plus some implementation data (pointers and flag fields)
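The three levels can be illustrated with Python's built-in `sqlite3` module: a base table stands for the logical (enterprise) level, views stand for external user views, and SQLite's own storage engine plays the physical level. The table, view, and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Level 2 (logical / enterprise): the organization's employee data, defined once.
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL,"
    " dept TEXT NOT NULL, salary REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
    [("Chan", "R&D", 52000.0), ("Lee", "Sales", 48000.0)],
)

# Level 1 (external / local): per-application views of the same data.
conn.execute("CREATE VIEW phone_directory AS SELECT name, dept FROM employee")
conn.execute("CREATE VIEW payroll AS SELECT emp_id, salary FROM employee")

# Level 3 (physical / storage): SQLite decides page and record layout;
# adding an index changes how data is stored and accessed, not the views.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

# The directory application sees names and departments only, never salaries.
print(conn.execute("SELECT * FROM phone_directory ORDER BY name").fetchall())
# [('Chan', 'R&D'), ('Lee', 'Sales')]
```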
Three Traditional Data Models
• Hierarchical model
– each element is subordinate to another in a strict hierarchical
manner
– relationships are represented as parents and children
– a data item has only one parent
– relationships are stated explicitly by pointers stored with the data
• Network model
– each data item has one or more parents
– relationships are stated explicitly by pointers stored with the data
• Relational model
– relationships are not stated by pointers; they are implied by matching data values (keys) in related tables
– stores data in tables
• each row is called a tuple, representing an individual entity (e.g., a person)
• each column represents an attribute of the entities
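The contrast between stored pointers and value-based relationships can be sketched in Python: the hierarchical record holds a reference to its single parent, while the relational version keeps only matching `dept_id` values and recovers the relationship with a join at query time. Names and data are illustrative.

```python
import sqlite3


class Node:
    """Hierarchical-model record: explicit pointers to one parent and its children."""

    def __init__(self, name: str):
        self.name = name
        self.parent = None   # exactly one parent (None for the root)
        self.children = []   # a network-model record could instead keep a list of parents

    def add_child(self, child: "Node") -> "Node":
        child.parent = self  # pointer stored with the data
        self.children.append(child)
        return child


# Hierarchical: follow the stored pointer from employee to department.
dept = Node("R&D")
alice = dept.add_child(Node("Alice"))
print(alice.parent.name)  # R&D

# Relational: no stored pointers; the relationship exists only because
# employee.dept_id and dept.dept_id hold matching values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE employee (name TEXT, dept_id INTEGER REFERENCES dept(dept_id))")
conn.execute("INSERT INTO dept VALUES (1, 'R&D')")
conn.execute("INSERT INTO employee VALUES ('Alice', 1)")
row = conn.execute(
    "SELECT d.name FROM employee AS e JOIN dept AS d ON e.dept_id = d.dept_id "
    "WHERE e.name = 'Alice'"
).fetchone()
print(row[0])  # R&D
```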
Object-oriented database system
• Each object combines:
1. A piece of data
2. Methods - procedures that can perform work on that data
3. Attributes describing the data
4. Relationships between this object and others
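A Python class can mirror the four parts listed above, with the caveat that these objects live only in memory; an object-oriented DBMS would additionally store and retrieve such objects persistently. The `Image` and `Document` classes are hypothetical examples.

```python
class Document:
    def __init__(self, title: str):
        self.title = title
        self.images = []  # relationship: a document contains images


class Image:
    def __init__(self, pixels: bytes, width: int, height: int, fmt: str):
        self.pixels = pixels                      # 1. the piece of data itself
        self.width, self.height = width, height   # 3. attributes describing the data
        self.fmt = fmt
        self.appears_in = []                      # 4. relationships to other objects

    # 2. methods: procedures that perform work on this object's data
    def size_in_bytes(self) -> int:
        return len(self.pixels)

    def aspect_ratio(self) -> float:
        return self.width / self.height

    def attach_to(self, doc: Document) -> None:
        # Maintain the relationship in both directions.
        self.appears_in.append(doc)
        doc.images.append(self)


if __name__ == "__main__":
    img = Image(b"\x00" * 12, width=4, height=3, fmt="gray8")
    report = Document("Annual report")
    img.attach_to(report)
    print(img.size_in_bytes(), [d.title for d in img.appears_in])
```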

“True” Distributed Database
• 12 rules by Chris Date
1. Local autonomy
2. No reliance on a central site. All sites are equal and none relies on a master site for processing or communications
3. Continuous operation. Installation at one site does not affect operations at
another.
4. Location independence. Users do not need to know where data is
physically stored.
5. Fragmentation independence. Users are able to act as if the data were not fragmented.
6. Replication independence. Relations can be represented at the physical
level by multiple, distinct, stored copies at distinct sites.
7. Distributed query processing
8. Distributed transaction management. A single transaction can execute code at multiple sites, causing updates at multiple sites.
9. Hardware independence
10. Operating system independence
11. Network independence
12. Database independence
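Rules 4 and 5 (location and fragmentation independence) can be illustrated with an in-memory sketch: a relation split by region across hypothetical sites, plus a catalog the query function consults so that callers never name a site. In a real distributed DBMS this routing happens inside the distributed query processor (rule 7); the site names, catalog, and function are assumptions.

```python
# Hypothetical sites, each holding one horizontal fragment of CUSTOMER,
# split by region. Callers never address a site directly.
SITES = {
    "hong_kong": [{"id": 1, "name": "Wong", "region": "HK"}],
    "new_york": [{"id": 2, "name": "Smith", "region": "US"}],
}

# Fragmentation catalog: which site stores which region's rows.
CATALOG = {"HK": "hong_kong", "US": "new_york"}


def select_customers(predicate=lambda row: True, region=None):
    """Query CUSTOMER as if it were one unfragmented, local relation.

    If the region is known, only the site holding that fragment is
    consulted; otherwise every fragment is scanned and the results combined.
    """
    sites = [CATALOG[region]] if region is not None else list(SITES)
    result = []
    for site in sites:
        # In a real system this would be a request sent to `site`.
        result.extend(row for row in SITES[site] if predicate(row))
    return sorted(result, key=lambda row: row["id"])


if __name__ == "__main__":
    print(select_customers())             # both fragments, one logical relation
    print(select_customers(region="US"))  # routed to a single site
```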
Alternatives to True Distributed Databases
• Download data files
• copies of data stored at nodes
• not fully synchronized databases
• client/server databases
• federated databases
