
Distributed database overview

A distributed database consists of a collection of data whose different parts are under the control of separate DBMSs running on independent computer systems. All the computers are interconnected, and each system has autonomous processing capability serving local applications. Each system also participates in the execution of one or more global applications, which require data from more than one site. The distributed nature of the database is hidden from users, and this transparency manifests itself in a number of ways. Although there are a number of advantages to using a distributed DBMS, there are also a number of problems and implementation issues. Finally, data in a distributed DBMS can be (a) partitioned, (b) replicated, or both.

(a) Data partitioning. In a distributed DBMS a relational table may be broken up into two or more non-overlapping partitions or slices. A table may be partitioned horizontally, vertically, or in a combination of both, and partitions may in turn be replicated. Partitioning is transparent to users, but it complicates concurrency control and catalogue management.

(b) Data replication. In a distributed DBMS a relational table or a partition may be replicated, with copies distributed throughout the database. Replication is likewise transparent to users, but it complicates update propagation and concurrency control.
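As a toy illustration of these two kinds of fragmentation (not tied to any particular DBMS), the Python sketch below partitions an invented Employee table horizontally by region and vertically by column group; the table, column names, and split rules are all assumptions made for the example.

    # Hypothetical Employee rows: (emp_id, name, region, salary)
    employees = [
        (1, "Rossi",   "north", 52000),
        (2, "Bianchi", "south", 47000),
        (3, "Verdi",   "north", 61000),
    ]

    # Horizontal partitioning: non-overlapping row slices, split here by region.
    north_slice = [row for row in employees if row[2] == "north"]
    south_slice = [row for row in employees if row[2] == "south"]

    # Vertical partitioning: non-overlapping column groups, each keeping the
    # key (emp_id) so the original rows can be rejoined.
    public_part  = [(emp_id, name)   for emp_id, name, _, _   in employees]
    payroll_part = [(emp_id, salary) for emp_id, _, _, salary in employees]

    # Rejoining the vertical partitions on the key reconstructs each row.
    rejoined = {emp_id: [name] for emp_id, name in public_part}
    for emp_id, salary in payroll_part:
        rejoined[emp_id].append(salary)

Note that each vertical partition retains the key, which is what makes the slices non-overlapping yet rejoinable.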

Distributed database advantages


There are a number of advantages to using a distributed DBMS. These include the following:

Capacity and incremental growth
Reliability and availability
Efficiency and flexibility
Sharing

Capacity and incremental growth

As the organisation grows, new sites can be added with little or no upheaval to the DBMS. Compare this to a centralised system, where growth entails hardware and software upgrades that affect the entire database.

Reliability and availability

Even when a portion of the system (i.e. a local site) is down, the overall system remains available. With replicated data, the failure of one site still allows access to a replicated copy of the data from another site, and the remaining sites continue to function (a small failover sketch appears after the last advantage below). This greater accessibility enhances the reliability of the system.

Efficiency and flexibility

Data is physically stored close to its anticipated point of use. Hence if usage patterns change, data can be dynamically moved or replicated to where it is most needed.

Sharing

Users at a given site are able to access data stored at other sites while retaining control over the data at their own site.
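The availability argument can be made concrete with a minimal Python sketch, using invented site names and an in-memory stand-in for a replicated table: a read falls back to another site's copy when the preferred site is down.

    # Replicated copies of the same table at three hypothetical sites.
    replicas = {
        "milan":  {"accounts": [("A-1", 100), ("A-2", 250)]},
        "rome":   {"accounts": [("A-1", 100), ("A-2", 250)]},
        "naples": {"accounts": [("A-1", 100), ("A-2", 250)]},
    }
    available = {"milan": False, "rome": True, "naples": True}  # milan is down

    def read_table(table, preferred_site):
        """Return the table from the preferred site, or from any live replica."""
        sites = [preferred_site] + [s for s in replicas if s != preferred_site]
        for site in sites:
            if available[site]:
                return replicas[site][table]
        raise RuntimeError("no replica of %r is reachable" % table)

    rows = read_table("accounts", "milan")  # succeeds via rome despite the failure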

Distributed database problems


The disadvantages of the distributed approach to DBMS implementation are its cost and complexity. A distributed system that hides its distributed nature from the end user is more complex than a centralised system. Increased complexity means that the acquisition and maintenance costs of the system are higher than those for a centralised DBMS. The parallel nature of the system means that errors are harder to avoid, and errors in applications are harder to pinpoint. In addition, the distributed system, by its very nature, entails a large communication overhead in coordinating messages between the different sites.
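This coordination overhead is easiest to see in a commit protocol. The text does not name a specific protocol, so as an illustrative stand-in the Python sketch below counts the messages of a two-phase-commit-style exchange for a single distributed transaction; the sites and vote logic are invented.

    def two_phase_commit(participants):
        """Count the messages exchanged to commit one distributed transaction."""
        messages = 0
        # Phase 1: coordinator asks every site to prepare; each site votes.
        votes = []
        for site in participants:
            messages += 1                      # PREPARE  coordinator -> site
            votes.append(site["can_commit"])
            messages += 1                      # VOTE     site -> coordinator
        decision = "COMMIT" if all(votes) else "ABORT"
        # Phase 2: coordinator broadcasts the decision; each site acknowledges.
        for _ in participants:
            messages += 2                      # DECISION + ACK
        return decision, messages

    sites = [{"can_commit": True}, {"can_commit": True}, {"can_commit": True}]
    decision, n = two_phase_commit(sites)      # 3 sites -> 12 messages

Even in this simplified form, the message count grows linearly with the number of sites for every transaction committed.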

Centralized Data Processing


Centralized computers, processing, data, control, and support. What are the advantages?

Economies of scale (equipment and personnel)
Lack of duplication
Ease in enforcing standards and security

Distributed Data Processing (DDP)


Computers are dispersed throughout the organization
Allows greater flexibility in meeting individual needs
More redundancy
Greater efficiency
Requires data communication

Why is DDP Increasing?


Dramatically reduced workstation costs
Improved user interfaces and desktop power
Ability to share data across multiple servers

Source: www.egr.uri.edu/ime/Course/IME_220/Distributed_Data_Processing.html

The term Distributed Data Processing (DDP) refers to the deployment of related computing tasks across one or more discrete computing systems, or nodes. Geographical dispersion of the nodes is common, but it is not an essential part of the definition. Most commonly, the term refers to the use of databases maintained by organizations.

Cost reduction is one commonly cited reason for employing DDP, since DDP can reduce the expense of central processing units. Most commonly, the central unit stores data and manages access to that data by the nodes. This is the client/server model: a central unit, the server, stores and manages the data, while the nodes act on it, typically using locally loaded software. This has the effect of putting the data under the control of end users, subject to restrictions imposed by the server. Such distribution of control has another important economic effect: it places cost control responsibility in the hands of individuals and departments. That is, individuals and departments so empowered become responsible for the efficient (or inefficient) use of the computing resources they control.
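A minimal Python sketch of the client/server division of labour described above, with an invented API and invented access rules: the server owns the data and enforces restrictions, while a client node only issues requests and computes on the returned copy locally.

    class Server:
        """Stores the data and mediates every access to it."""
        def __init__(self):
            self._tables = {"orders": [(1, "widget", 3), (2, "gadget", 1)]}
            self._acl = {"alice": {"orders"}, "bob": set()}  # invented restrictions

        def query(self, user, table):
            if table not in self._acl.get(user, set()):
                raise PermissionError(f"{user} may not read {table}")
            return list(self._tables[table])  # a copy: the node works on local data

    class Client:
        """A node: fetches rows from the server, then processes them locally."""
        def __init__(self, user, server):
            self.user, self.server = user, server

        def total_quantity(self, table):
            rows = self.server.query(self.user, table)
            return sum(qty for _, _, qty in rows)  # local computation on the copy

    server = Server()
    print(Client("alice", server).total_quantity("orders"))  # 4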

Related to this is the perception by users that all services are local to their individual workstation. Modern client/server implementations often permit the integration of server data with familiar desktop software such as Microsoft Office, Lotus 1-2-3, etc. Because many potential employees are familiar with these applications, further cost savings may come from increased job satisfaction and productivity, and from the shorter and easier training of new users. A corollary is that familiarity with such desktop applications has become a minimum requirement even for many entry-level jobs.

It is important to remember that distributed data processing is not decentralized data processing. Decentralized data processing would not justify the implementation of client/server technology, with its cost and complexity. In fact, organizations may implement a client/server model in order to get away from decentralization, which may have occurred as a result of the proliferation of desktop workstations.

Logical integration of an organization's data resources is a central concern of DDP systems and their administrators. This is especially important where the system makes copies of data and distributes them to multiple locations. While this can improve performance and application reliability, it complicates the task of maintaining database integrity. The Remote Procedure Call (RPC) facility of client/server technology can enforce a measure of uniformity and integrity across multiple, distributed databases. An RPC is a stored procedure on a local server which, triggered by activity on that local server, takes action on a remote database server (sketched at the end of this section). Standardization on Structured Query Language (SQL) can also further these ends. The application and enforcement of formal rules for the use of data resources, known as data administration, is probably the most effective tool for maintaining integrity across distributed databases.

Data consists of facts about objects that are of interest to the organization. As such, data is valuable. It should reflect the real world, and it should permit the meaningful representation of facts. This means that it must be authentic, authoritative, accurate, shared, secure, and intelligible.

The database administrator develops a unified, logical view of the organization's data. This logical data model is the glue that holds the database together. The model includes the users' view of the data, that is, the ways in which different groups of users view and manipulate the data that is of interest to them. The entity-relationship model is one widely used technique for logical data modeling: one can characterize objects and facts as entities, relationships, or attributes. Once the administrator has worked out the logical model, it guides the selection and implementation of the database, both its overall structure and the details of implementation. Data administration also records and publishes the facts and information that it learns or discovers about the data; this information is the metadata.
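The trigger-driven RPC mechanism described above can be sketched as follows in Python; the site objects and trigger wiring are invented for illustration, and a real system would use the DBMS's own stored-procedure and RPC machinery.

    class DatabaseSite:
        def __init__(self, name):
            self.name, self.rows, self.triggers = name, {}, []

        def update(self, key, value):
            self.rows[key] = value
            for trigger in self.triggers:  # activity on this site fires triggers
                trigger(key, value)

    def make_replication_rpc(remote_sites):
        """A stored-procedure-like callback that acts on remote database servers."""
        def rpc(key, value):
            for site in remote_sites:
                site.rows[key] = value     # push the same change to each copy
        return rpc

    local = DatabaseSite("local")
    remotes = [DatabaseSite("branch-1"), DatabaseSite("branch-2")]
    local.triggers.append(make_replication_rpc(remotes))

    local.update("cust-42", "new address")  # all three copies now agree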
The implementation of distributed data processing is a complex and ambitious undertaking. In the best case, data administration can ensure that the use of data facilitates the overall goals of an organization.

Bibliography:

Gerald Bernbom, "Data Administration and Distributed Data Processing", Cause/Effect, Vol. 14, No. 4, Winter 1992. Accessed 26 September 2000. Available from http://www.lessaworld.com/dm/dm02.txt; Internet.

"Distributed Data Processing". Accessed 26 September 2000. Available from http://www.bus.okstate.edu/lhammer/AISweb/AIS9.HTM; Internet.
