SEG 7430 Information Technology Management
Information/Knowledge Management
[Figure: CM (Chinese Medicine) Digital Library. Sources shown: laws and regulations (China, Taiwan, HK, others); CM-related organizations; clinical records; university web sites; databases, research papers, and journals; manufacturers. Qualities shown: reliability, scalability, maintainability. Users shown: government, community, companies, citizens, researchers/academia, CM practitioners. Other labels: e-Commerce, CMED Library contents, information infrastructure, searching and providing information, prescriptions.]
Current Practices
[Figure: tools, software, and systems in current use: BLAST, GCG, Entrez, UMLS]
The Current Model
[Figure: Entrez, internal data, unexplored data]
[Figure: internal data, unexplored data, BioSIFTER]
Information Management
$D \rightarrow I \rightarrow K$
$f_D : D \rightarrow R, \quad f_I : I \rightarrow K_s$
$R^n, \quad R^m, \quad R^p$
BioSIFTER
• Acquisition
– File, Known Sources
• Representation
– Vector-space, tf-idf
• Classification
– Maximin, Centroids, Sample Documents
• User Profiling
– Reinforcement Learning
• Presentation
– GUI
• Identify the concepts that describe the content of the given document
• Convert a document to a numeric or symbolic form
• Documents are vectors of weighted terms, defined in a thesaurus
• Weights combine tf (term frequency) and idf (inverse document frequency);
simple and effective
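
The representation step can be sketched as follows. This is a minimal illustration, not BioSIFTER's actual code: the function name `tf_idf_vectors`, the sample documents, the whitespace tokenizer, and the particular weighting `tf * log(N / df)` are all assumptions, since the slides do not say which tf-idf variant is used.

```python
import math
from collections import Counter


def tf_idf_vectors(documents, thesaurus):
    """Return one weighted-term vector per document, ordered like `thesaurus`."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each thesaurus term.
    df = {term: sum(1 for tokens in tokenized if term in tokens)
          for term in thesaurus}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vector = []
        for term in thesaurus:
            tf = counts[term]
            # Assumed variant: idf = log(N / df); unseen terms get weight 0.
            idf = math.log(n_docs / df[term]) if df[term] else 0.0
            vector.append(tf * idf)
        vectors.append(vector)
    return vectors


# Hypothetical sample data, for illustration only.
docs = [
    "protein sequence alignment protein",
    "gene sequence database",
    "protein structure database",
]
terms = ["protein", "gene", "sequence", "database"]
vectors = tf_idf_vectors(docs, terms)
for doc, vector in zip(docs, vectors):
    print(doc, [round(w, 3) for w in vector])
```

A term that appears often in one document but in few documents overall receives a high weight, which is why the scheme works well despite its simplicity.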
Clustering
• Maximin-Distance: unsupervised clustering algorithm based on the
document set
• Distance Metric: Cosine similarity measure (Salton, 1983)
• The point with the largest distance from the existing centroids is chosen
and added as a new centroid if that distance exceeds a threshold
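
A rough sketch of the maximin-distance idea follows. Several details are assumptions not stated on the slide: the first document seeds the centroid list, "distance" is taken as 1 minus cosine similarity, and each document is finally assigned to its nearest centroid. The names `maximin_centroids` and `assign` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    # Assumption: distance is defined as 1 - cosine similarity.
    return 1.0 - cosine_similarity(a, b)


def maximin_centroids(vectors, threshold):
    """Pick centroids by repeatedly adding the point farthest from all centroids."""
    centroids = [vectors[0]]  # assumption: seed with the first document
    while len(centroids) < len(vectors):
        # Distance from each point to its nearest existing centroid.
        nearest = [min(cosine_distance(v, c) for c in centroids) for v in vectors]
        farthest = max(range(len(vectors)), key=lambda i: nearest[i])
        if nearest[farthest] <= threshold:
            break  # no point is far enough away to justify a new cluster
        centroids.append(vectors[farthest])
    return centroids


def assign(vectors, centroids):
    """Label each vector with the index of its nearest centroid."""
    return [min(range(len(centroids)),
                key=lambda j: cosine_distance(v, centroids[j]))
            for v in vectors]


# Hypothetical two-term document vectors.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
centroids = maximin_centroids(points, threshold=0.5)
labels = assign(points, centroids)
print(centroids, labels)
```

The threshold controls how many clusters emerge: a lower value creates more, finer-grained categories.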
User Profiling
• Learn user interest levels for given categories
• Relies on relevance feedback from user
• Uses a simple reinforcement learning algorithm (known as Pursuit Learning)
– maintains an action probability vector and an estimated relevance
probability vector
– both vectors are updated continuously
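
The two vectors can be illustrated with a generic pursuit-style update. This is a sketch under assumptions, not the BioSIFTER implementation: the class name `PursuitLearner`, the learning rate, the simulated relevance probabilities, and the choice to update on every feedback event are all illustrative. Each category is treated as an action; the relevance estimate is the fraction of shown documents the user marked relevant, and the action probabilities are moved toward the category with the highest current estimate.

```python
import random


class PursuitLearner:
    def __init__(self, n_categories, learning_rate=0.1):
        self.rate = learning_rate
        # Action probability vector: starts uniform.
        self.action_probs = [1.0 / n_categories] * n_categories
        # Counters behind the estimated relevance probability vector.
        self.shown = [0] * n_categories
        self.relevant = [0] * n_categories
        self.estimates = [0.0] * n_categories

    def choose(self, rng=random):
        """Sample a category according to the action probabilities."""
        indices = range(len(self.action_probs))
        return rng.choices(indices, weights=self.action_probs)[0]

    def update(self, category, was_relevant):
        """Fold one piece of relevance feedback into both vectors."""
        self.shown[category] += 1
        self.relevant[category] += 1 if was_relevant else 0
        self.estimates[category] = self.relevant[category] / self.shown[category]
        # Pursue the category that currently looks most relevant.
        best = max(range(len(self.estimates)), key=lambda i: self.estimates[i])
        self.action_probs = [
            p + self.rate * ((1.0 if i == best else 0.0) - p)
            for i, p in enumerate(self.action_probs)
        ]


# Simulated user with hypothetical per-category relevance probabilities.
rng = random.Random(0)
true_relevance = [0.2, 0.8, 0.5]
learner = PursuitLearner(n_categories=3, learning_rate=0.05)
for _ in range(500):
    category = learner.choose(rng)
    learner.update(category, rng.random() < true_relevance[category])
print([round(p, 3) for p in learner.action_probs])
```

Because the update is a convex combination of the old vector and a unit vector, the action probabilities always sum to 1.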
Managing Information – Value, Usage and Sharing
• Information is viewed as an asset.
• Information should be treated differently from the
traditional assets of labor and capital
• Value issues
• Usage issues
• Sharing issues
Value Issues
• Information’s value depends on the recipient and the context.
• Most people cannot put a value on a piece of information
until they have seen it.
• The only practical way to establish the value of
information is to establish a price for it and see if anyone
buys (Davenport).
– Charging for the information itself, not the technology or provider
– Charging for the document rather than a smaller unit
– Charging for length or time or number of users
– Charging by value rather than cost
– Business documents
• provide organization and context
• uncover what documents an organization needs
• Dean Witter discovered that its brokers used the same documents over
and over.
– It put these documents on CD-ROM, kept them on local servers, and
updated them monthly
– Groupware
• getting greater value out of less structured information
• allows people to share information across distance in a more
structured manner than electronic mail
• Lotus Notes
• eases discussion and aids distribution of information
Usage Issues
• Preserve information’s complexity
– information should not be simplified to be made to fit the
technology
• People do not share information easily
– value grows when information is shared
– culture blocks sharing
• Technology does not change culture
– changing the information culture requires changing basic behaviors,
values, attitudes, and management expectations
Sharing Issues
• Technical solutions do not address the sharing issues
– much work has been done on information architecture, but the
information is probably outdated before it is used
– Managers get 2/3 of their information from conversation and 1/3
from documents, almost none from computer systems
– Who has legitimate need of the information? Who determines it?
• The touchy subject of information sharing is brought out into the open
– Information sharing is not always good. Forcing employees to
share information with others above them can lead to intrusive
management.
• Unlimited information sharing does not work
– Sharing of corporate performance figures is beneficial
• even when corporate performance is poor
Four Types of Information
• Record-based, e.g., database
– structured
• Document-based, e.g., reports, opinions, e-mail, proposals
– less structured
• Internal
• External
                  Internal                External
Record-based      Traditional MIS         Public databases
Document-based    Word processing,        Corporate library,
                  records management      Web sites
– Categories of information
• News
– signs agreements with the newswire services and the leading news periodicals
– articles appear on the service the evening before they are published
• Company Financial Information
– provides basic financial statements for public companies from public and private
sources
– also provides data for unlisted companies and selected private companies
• Industry and Analyst Reports
– signs agreements with leading brokerages in emerging markets
• Equity and Fixed-Income
– covers daily updates of the equity and debt markets
– sources include local stock exchanges and top domestic and international banks
• Macroeconomic data
– spreadsheet of basic economic statistics, cross-sectional and historical, acquired
from government and private sources
• Surveys, General Reports and Other Useful Resources
– reports of countries and industries
– Differentiating features
• comprehensive information
• exclusive online source
– only available online via ISI
• presentation
– easy-to-use, graphical interface, “Point and Click” navigation
• searchability
– proprietary search engine searches entire database of text, spreadsheets, graphs, and
other files
• local language support
– supports multiple languages
– access information in other languages
– search and display in local language
• flexible access
– as long as you have connection to Internet, you can download documents
• access to archived data
– in original format
• relatively inexpensive
– compared with traditional data sources such as periodicals
Role of data administration - 2
• Control shared data
– data used by two or more units are considered shared data
– controversy
• essentially all the data in the organization should be under the control
of data administration
– even if some data is currently not being used across organizational
boundaries, it may be in the future
• each organizational unit should have authority over its data; only data that
flows to other units needs to be standardized
– it is impractical to standardize all the data because it imposes
unreasonable rigidities
– the data administrator should confront this issue and decide how broadly
or narrowly to define shared data
– the data administrator also analyzes the impact of proposed changes to
programs that use shared data
Role of data administration - 3
• Manage data distribution
– distributed data is geographically dispersed data
– poses significant challenges to the data administrator
– current practice
• use the single master file concept
• distribute copies that do not need to be kept synchronized
Role of data administration - 4
• Maintain data quality
– data administrators develop policy and procedures to maintain data
quality
– put the owners of the data in charge of editing and verifying data
accuracy and quality
• difficulty: how to identify the owner
– put the processes in place
• ensure the correct data is being input
Data dictionaries
• Data dictionaries are systems and procedures (manual or
automated) for storing and handling an organization’s data
definitions
• referred to as metadata today
• data dictionaries eliminate errors of understanding,
ambiguities, and difficulties in interpreting data
• Ideal sequence
– 1. Set up the data administration function
– 2. Develop data standards
– 3. Purchase and install a DBMS
– 4. Install a data dictionary as the first database application
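
As a toy illustration of what an automated data dictionary stores, the sketch below keeps one definition per data element and checks records against those definitions. Every name here (`DataElement`, `DataDictionary`, `customer_id`, the owner strings) is hypothetical; real dictionary products hold far richer metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    name: str          # the agreed, organization-wide name
    data_type: type    # the agreed representation
    description: str   # the agreed meaning, to remove ambiguity
    owner: str         # the unit responsible for the element


class DataDictionary:
    def __init__(self):
        self._elements = {}

    def define(self, element):
        # Refuse duplicates so each name has exactly one definition.
        if element.name in self._elements:
            raise ValueError(f"{element.name!r} is already defined")
        self._elements[element.name] = element

    def lookup(self, name):
        return self._elements[name]

    def validate(self, record):
        """Return a list of problems found in `record` (empty if none)."""
        problems = []
        for field_name, value in record.items():
            element = self._elements.get(field_name)
            if element is None:
                problems.append(f"{field_name}: not defined in the dictionary")
            elif not isinstance(value, element.data_type):
                problems.append(
                    f"{field_name}: expected {element.data_type.__name__}")
        return problems


dictionary = DataDictionary()
dictionary.define(DataElement("customer_id", int, "Unique customer number", "Sales"))
dictionary.define(DataElement("customer_name", str, "Legal name of customer", "Sales"))
print(dictionary.validate({"customer_id": "42", "nickname": "Al"}))
```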
Case study: Monsanto
• Monsanto is a US$9 billion provider of agricultural
products, pharmaceuticals, food ingredients, and
chemicals.
• Monsanto has a tradition of being decentralized
• three large enterprise IT projects
– redevelop operational and financial transaction systems
– develop a knowledge-management architecture, including a data
warehouse
– link transaction and decision support systems via common master
data, known as Enterprise Reference Data (ERD)
[Figure: Enterprise Reference Data, Transaction Systems, Data Warehouse]
Three-level Database Model
• Level 1
– external, conceptual or local level
• contains various user views of the corporate data used by application
programs
• no concern for how the data is physically stored or what data is used
by other applications
• Level 2
– logical or enterprise-data level
• encompasses all of the organization’s relevant data under the control of
data administrators
• data and relationships are represented by one or more DBMS
• Level 3
– physical or storage level
• specifies the way the data is physically stored
• a data record consists of its data fields plus some implementation data
(pointers and flag fields)
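
The separation between levels can be seen in a small SQLite session. This is only a sketch: the `employee` table, the `phone_directory` view, and the sample rows are made up. The view plays the role of a Level 1 user view, the table definition stands in for the Level 2 enterprise-data description, and the Level 3 storage layout is left entirely to the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Level 2: the enterprise-wide definition of the data.
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Chan", "Sales", 52000.0), (2, "Lee", "Research", 61000.0)],
)

# Level 1: one application's view, which hides the salary column.
conn.execute(
    "CREATE VIEW phone_directory AS SELECT name, department FROM employee"
)

# The application queries its view without knowing how or where
# the rows are physically stored (Level 3 is the DBMS's concern).
directory_rows = conn.execute(
    "SELECT name, department FROM phone_directory ORDER BY name"
).fetchall()
print(directory_rows)
```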
Three Traditional Data Models
• Hierarchical model
– each element is subordinate to another in a strict hierarchical
manner
– relationships are represented as parents and children
– a data item has only one parent
– relationships are stated explicitly by pointers stored with the data
• Network model
– each data item has one or more parents
– relationships are stated explicitly by pointers stored with the data
• Relational model
– relationships are stated implicitly through data values, not pointers
– stores data in tables
• each row is called a tuple, representing an individual entity (e.g., a
person)
• each column represents an attribute of the entities
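
The contrast between explicit and value-based relationships can be sketched as follows. The nested dictionary only imitates a hierarchical parent/child structure, and the table and column names are hypothetical. In the relational half, the only link between an employee and a department is a matching `dept_id` value, which the join uses at query time.

```python
import sqlite3

# Hierarchical style: each child is stored under exactly one parent,
# so the relationship is fixed by the structure itself.
department_tree = {
    "name": "Research",
    "employees": [{"name": "Lee"}, {"name": "Wong"}],
}

# Relational style: two independent tables related only by shared values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, name TEXT, "
    "dept_id INTEGER REFERENCES department(dept_id))"
)
conn.execute("INSERT INTO department VALUES (10, 'Research')")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Lee", 10), (2, "Wong", 10)],
)

# Each row is a tuple; the join matches rows on the dept_id column values.
joined = conn.execute(
    "SELECT e.name, d.name FROM employee AS e "
    "JOIN department AS d ON e.dept_id = d.dept_id ORDER BY e.emp_id"
).fetchall()
print(joined)
```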
Object-oriented database system
An object contains:
1. A piece of data
2. Methods - procedures that can perform work on that data
3. Attributes describing the data
4. Relationships between this object and others
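
The four parts listed above map naturally onto a class in an object-oriented language. The sketch below is an in-memory Python illustration only; the `Document` class and its fields are hypothetical, and no object database is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    content: str                                # 1. the piece of data
    title: str                                  # 3. attributes describing it
    author: str
    cites: list = field(default_factory=list)   # 4. relationships to other objects

    def word_count(self):                       # 2. a method that works on the data
        return len(self.content.split())

    def cite(self, other):
        """Record a relationship from this document to another one."""
        self.cites.append(other)


paper = Document("Maximin clustering of documents", "Clustering", "A. Author")
survey = Document("A survey of text filtering systems", "Survey", "B. Author")
survey.cite(paper)
print(survey.word_count(), survey.cites[0].title)
```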
“True” Distributed Database
• 12 rules by Chris Date
1. Local autonomy
2. No reliance on a central site. All sites are equal and none relies on a master
site for processing or communications
3. Continuous operation. Installation at one site does not affect operations at
another.
4. Location independence. Users do not need to know where data is
physically stored.
5. Fragmentation independence. Users are able to act as if the data was not
fragmented.
6. Replication independence. Relations can be represented at the physical
level by multiple, distinct, stored copies at distinct sites.
7. Distributed query processing
8. Distributed transaction management. Single transaction is able to execute
code at multiple sites, causing updates at multiple sites.
9. Hardware independence
10. Operating system independence
11. Network independence
12. Database independence
Alternatives to True Distributed Databases
• Download data files
• Copies of data stored at nodes
• Not fully synchronized databases
• Client/server databases
• Federated databases