Comparison of
Oracle, MySQL and PostgreSQL DBMS

in the context of ALICE needs
Wiktor Peryt, Warsaw University of Technology, Faculty of Physics
We took the following approach. First, we determined which DBMS features are important from the point of view of such a large experiment. We chose the following features:

Elementary features
- basic data types
- SQL language features
- declarative integrity constraints
- programming abstractions
- automatic generation of identifiers
- national characters support

Transactions and multi-user access
- transactions
- locks
- multi-user access

Programming in database
- stored procedures
- triggers

Elements of database administration
- access control
- backup copies
- data migration

Portability and scalability
- portability of DBMS
- scalability

Performance and VLDB (Very Large Databases)
- query optimization
- structures supporting query optimization
- support for analytical processing
- allocation of disk space
- data size limits
- VLDB implementations

Distributed databases
- access to multiple databases
- heterogeneous systems support

Special data types
- large objects in database
- post-relational extensions
- support for special data types

Application development and interfaces
- embedded SQL
- standard interfaces, additional interfaces
- interoperability with Web technology
- XML, CASE

Reliability
- failure recovery

Commercial issues
- technical support available
- market position

Having completed step one, we carried out the subsequent work in three subgroups, each dealing with only one DBMS. The members of each subgroup had their own practical experience with the DBMS investigated by their subgroup. This procedure allowed us to verify the information contained in manuals and other available documentation (for instance, on the Internet). As a result, three extended documents were created, devoted to Oracle, MySQL and PostgreSQL.
Konrad Bohuszewicz, Maciej Czyzowicz, Michal Janik, Dawid Jarosz, Piotr Mazan, Marcin Mierzejewski, Mikolaj Olszewski, Wiktor S. Peryt, Sylwester Radomski, Piotr Szarwas, Tomasz Traczyk, Dominik Tukendorf, Jacek Wojcieszuk
undergraduate student, undergraduate student, Ph.D. student, undergraduate student, undergraduate student, undergraduate student, undergraduate student, undergraduate student, Ph.D. student, undergraduate student, undergraduate student
Faculty of Electronics and Information Technology, Faculty of Mathematics and Information Sciences, Faculty of Physics
About Comparison
- discussion by all people involved in this task
- the compilation was made by Dr. Tomasz Traczyk
- the compilation was circulated within the whole group several times to make sure we avoided omissions and mistakes
- this version of the document is accepted by all co-authors
- we consider it a quite comprehensive and objective comparison
- it also contains "weights", which we call "importance", differentiated for the Central database and for Lab-participants
- the Central database should be a kind of data warehouse at CERN, containing all the data, including data transferred periodically from Lab-participants
- the term "Lab-participants" denotes smaller databases in labs involved in ALICE experiment preparation
- a few explanations of terminology used in the database domain are also included, to make this document easy to comprehend for non-specialists
Summary
                                        Importance              Assessment
Category / Problem                      Central   Lab-          MySQL  Oracle8  PostgreSQL
                                        database  participants

Elementary features
  Basic data types                      C         C             B      C        A
  SQL                                   B         B             C      B        B
  Declarative constraints               B         B             C      A        A
  Programming abstractions              A         C             D      A        C
  Generation of ids                     C         C             C      A        A
  National chars                        B         C             B      A        B
Transactions
  Transactions                          A         C             D      A        A
  Locks                                 A         C             D      A        A
  Multiuser access                      A         D             C      A        C
Programming in DB
  Stored procedures and triggers        B         C             D      A        A
Administration
  Access control                        B         D             A      A        B
  Backup                                A         C             C      A        C
  Data migration                        C         C             A      B        A
Portability and scalability
  Portability                           B         C             B      A        B
  Scalability                           A         C             B      A        C
Performance and VLDB
  Query optimization                    A         C             B      A        B
  Structures supporting optimization    B         D             D      A        B
  Support for OLAP                      B         D             D      A        D
  Allocation of the disk space          A         C             C      A        C
  Size limits                           A         B             B      A        C
  VLDB implementations                  A         C             D      A        B
  Access to multiple databases          C         D             C      A        C
  Heterogeneous systems support         B         D             D      B        D
Special data types
  Large objects                         B         B             B      A        C
  Post-relational extensions            C         C             D      A        B
  Support for special data types        C         C             D      A        C
Application development and interfaces
  Embedded SQL                          C         C             D      A        B
  Standard interfaces                   B         C             B      A        B
  Additional interfaces                 A         A             A      A        A
  Web technology                        A         A             B      A        B
  XML                                   B         C             D      A        D
  CASE                                  B         C             D      A        D
Reliability
  Recovery                              A         B             C      A        C
Commercial issues
  Prices                                C         A             A      D        A
  Technical support                     A         B             C      B        D
  Position on the market                A         C             D      A        D
Our preliminary conclusions:

For the Central Data Repository for ALICE at CERN:
- only Oracle can be taken into account seriously

For Lab-participants (mainly for production phase databases):
- Oracle is also the best, but using MySQL or PostgreSQL is possible
- the choice between them is not obvious at the moment

Some extended tests of MySQL and PostgreSQL performance, stability etc. with real data for STAR SSD are still in progress in Warsaw. They will be published in 1-2 weeks on the website:
http://ITS_DB_ALICE.if.pw.edu.pl
The document "Comparison of Oracle, MySQL and PostgreSQL DBMS" is available at the same address.
Questions for ALICE
How to start with databases for ALICE and how to manage the project?
- General concept of system architecture
- Databases in production phase
- Software technologies recommended
- DBMS platform choice
- How to proceed?
Database types for ALICE

The following main categories of information should go into databases:
- production and assembly phase measurements and descriptive data: ProdPhase database
- calibration data: Calibration database
- configuration data: Configuration database
- detector condition data: Condition database
- run log data: RunLog database
- geometry data (?): Geometry database or part of Calibration DB (?)
- some others? ... to be defined later, during "phase one" work
Databases contents (1)

ProdPhase database
- all information coming from test-beds, manufacturers, assembly processes, object flow between manufacturers and labs, etc.

RunLog database
- stores the summary information describing the contents of an experimental run and points to the locations where detailed information associated with the run is stored

An example of a Web-based interface developed for STAR by Sylwester Radomski (undergraduate student from the Faculty of Physics, WUT) can be seen at http://www.star.bnl.gov -> Computing, as the first item in the table New.
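The RunLog idea above (a run summary plus a pointer to where the detailed data live) can be illustrated with a minimal sketch. It uses Python with the standard sqlite3 module purely for illustration; the table name, the column names and the example location string are assumptions and do not come from the STAR or ALICE schemas.

```python
import sqlite3


def create_run_log(conn):
    """Create a hypothetical run_log table: one summary row per run."""
    conn.execute(
        """CREATE TABLE run_log (
               run_id          INTEGER PRIMARY KEY,
               start_time      TEXT NOT NULL,
               end_time        TEXT,
               n_events        INTEGER,
               detail_location TEXT NOT NULL  -- where the detailed data are stored
           )"""
    )


def register_run(conn, start_time, end_time, n_events, detail_location):
    """Insert a run summary and return the generated run identifier."""
    cursor = conn.execute(
        "INSERT INTO run_log (start_time, end_time, n_events, detail_location) "
        "VALUES (?, ?, ?, ?)",
        (start_time, end_time, n_events, detail_location),
    )
    conn.commit()
    return cursor.lastrowid


def locate_details(conn, run_id):
    """Return the location of the detailed data for a run, or None if unknown."""
    row = conn.execute(
        "SELECT detail_location FROM run_log WHERE run_id = ?", (run_id,)
    ).fetchone()
    return row[0] if row else None
```

The run log itself stays small, because the bulky detailed information is only referenced, not stored in the table.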
Why database in production phase?

The environment in which the archive facility operates is composed of many sources of information. We have to deal with data:
- produced by various test-bench systems
- entered manually by operators
- submitted by collaborating institutes and companies

Usually there are a number of distinct data formats, and files are stored in many locations.

Consequently, without a database it is not only hard to locate the right piece of information but also to ensure the safety and good quality of the data.
Goals for production phase database
- secure archiving of all the test results in a repository
- easy availability of information on the location of objects (in the geographic sense: manufacturers, labs), which makes arranging the assembly easier
- the possibility of automatic assignment of quality attributes according to well-defined criteria
- statistical analysis of quality that can be made easily and at any time
- preparing data for future on-line use by slow-control, DCS and DAQ
- easy access to all data during the production and assembly phase
- in the future: easy access to all data during the experiment run
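The goal of assigning quality attributes automatically according to well-defined criteria can be sketched as follows. This is only an illustration in Python: the measured quantities (leakage current, number of dead channels), the thresholds and the grade names are all hypothetical and are not taken from any ALICE or STAR specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One quality class with hypothetical upper limits."""
    grade: str
    max_leakage_current_uA: float
    max_dead_channels: int


# Ordered from the strictest to the loosest class (assumed values).
CRITERIA = (
    Criterion("A", 1.0, 0),
    Criterion("B", 5.0, 3),
    Criterion("C", 20.0, 10),
)


def assign_quality(leakage_current_uA, dead_channels, criteria=CRITERIA):
    """Return the first (best) grade whose limits the object satisfies."""
    for criterion in criteria:
        if (leakage_current_uA <= criterion.max_leakage_current_uA
                and dead_channels <= criterion.max_dead_channels):
            return criterion.grade
    return "REJECT"
```

Keeping the criteria in a data structure rather than in code branches means they can later be stored in the database and changed without touching the assignment logic.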
DB production phase
Basic requirements:
- data should be stored in a central repository to make management and maintenance easy and reliable
- access to the data should be assured for everybody who participates in tests during the production phase, i.e. software allowing the use of Web browsers is necessary
- registration of objects should be possible manually (by an operator with suitable privileges) as well as automatically (for example from a LabVIEW application or other software)
- the software should allow even inexperienced users to create (SQL) queries to the database
From the point of view of domain experts ... (1)
- there is an ever-increasing demand for centralized storage of data and for consistent, easy-to-use search and retrieval facilities
- experts want to be able to retrieve and analyze the information in a user-friendly way, regardless of its origin
- they do not want to be forced to perform several queries just because the data in question were taken by different data acquisition systems
From the point of view of domain experts ... (2)
- they wish to do statistics on data sets spanning months (and more) without having to browse tens of subdirectories on backup storage devices
- usually they prefer to use industry-standard, versatile software tools to process and analyze data
- they certainly would not mind being able to automate their routine, everyday tasks
Their task is to look at the information, not to look for it
Requirements addressed to software developers
- we should address those issues by providing a modular framework for archiving and for platform-independent retrieval of data in a heterogeneous distributed computing environment
- our database system must be open enough to follow the inevitable evolution of information gathering systems related to the development of particular detectors
- we should be able to cope with fast-evolving Internet technologies in order to take full advantage of the facilities they provide
DB for STAR - software technologies used

Use of:
- PHP4 software running on the server side
- C/C++ for the API
- JAVA + SWING & JDBC for applications requiring more interactivity (JDBC = JAVA DataBase Connectivity)

seems to be the right choice of tools for client-side software development
On-the-fly plots creation ...
From SQL query to plot ...

Generation of plots and histograms from the database and putting them on the Web. Attempt made by S. Radomski.

Data chain:
- The HTTP server (Apache - Tomcat) calls a servlet (dbPlot) with one parameter: the SQL query.
- The servlet in the HTTP server connects to a ROOT-based server through a socket and sends the query.
- "ROOT server" here means a ROOT script which handles connections and drives the dbPlot class.
- dbPlot::Init() reuses the existing connection to the database, or creates a new one if none exists.
- dbPlot::TakeData() sends the query to the DB and takes the data using the TSQLServer class.
- dbPlot::TakeData() takes the data from TSQLResult and puts them into a TNtuple. This function can recognise and parse the 'private' format of data stored in a BLOB.
- dbPlot::PlotData() calls TTree->Draw() with the proper parameters.
- dbPlot::Style() sets colors and labels.
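The Init / TakeData / PlotData steps of the chain can be mimicked in a few lines. The following is a hypothetical Python analogue, not the ROOT-based dbPlot class: it uses the standard sqlite3 module instead of TSQLServer, a plain list instead of a TNtuple, and a text histogram instead of TTree->Draw(). Method names and defaults are chosen for the example only.

```python
import sqlite3


class DbPlot:
    """Hypothetical Python analogue of the dbPlot steps; not the ROOT class."""

    _connection = None  # shared, so repeated requests reuse one connection

    @classmethod
    def init(cls, database=":memory:"):
        """Reuse the existing database connection or create a new one."""
        if cls._connection is None:
            cls._connection = sqlite3.connect(database)
        return cls._connection

    def __init__(self):
        self.values = []

    def take_data(self, query):
        """Send the query and keep the first column of every row."""
        rows = self.init().execute(query).fetchall()
        self.values = [float(row[0]) for row in rows]
        return self.values

    def histogram(self, n_bins=10):
        """Count values in n_bins equal-width bins between min and max."""
        counts = [0] * n_bins
        if not self.values:
            return counts
        low, high = min(self.values), max(self.values)
        width = (high - low) / n_bins or 1.0  # avoid zero width for constant data
        for value in self.values:
            index = min(int((value - low) / width), n_bins - 1)
            counts[index] += 1
        return counts

    def plot_data(self, n_bins=10):
        """Render the histogram as text, one line per bin."""
        return "\n".join(
            f"bin {i:2d} | " + "#" * count
            for i, count in enumerate(self.histogram(n_bins))
        )
```

In a Web setting, a request handler would call `take_data()` with the submitted query and return the rendered plot; the shared connection is what saves the reconnect cost on every request.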
Performance and problems ...

- One histogram takes about 1-2 s.
- The slowest element in the chain is convert, which makes use of GhostScript. Creating PostScript and then converting it to PNG is overcomplicated: a rather simple TGraph with ~758 lines takes ~10 s.
- ROOT cannot generate GIF in -b mode.
- Problems with memory deallocation in ROOT: after about 100 plots ROOT crashes.
- Modification of Draw() in TTree: for a 1-D histogram, Draw() always makes 100 bins. If the data have their own grid (measurement precision), plots look terrible, especially when histogramming integers. A small modification in TTreePlayer makes it possible to recognize whether the data are gridded and to set the number of bins accordingly.
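The binning problem with gridded data can be illustrated independently of ROOT. The sketch below is a simple Python heuristic, not the actual TTreePlayer modification: it takes the smallest spacing between distinct values as the candidate grid step, checks that every value lies on that grid, and if so uses one bin per grid point. The function names, the tolerance and the fallback of 100 bins are assumptions for the example.

```python
def detect_grid_step(values, tolerance=1e-6):
    """Return the grid spacing if all values lie on a regular grid, else None.

    Heuristic: the smallest gap between distinct values is the candidate step.
    """
    unique = sorted(set(values))
    if len(unique) < 2:
        return None
    step = min(b - a for a, b in zip(unique, unique[1:]))
    for value in unique:
        k = (value - unique[0]) / step
        if abs(k - round(k)) > tolerance:
            return None  # value falls between grid points
    return step


def choose_bins(values, default_bins=100, max_bins=1000):
    """Use one bin per grid point for gridded data, else the default count."""
    step = detect_grid_step(values)
    if step is None:
        return default_bins
    n_bins = int(round((max(values) - min(values)) / step)) + 1
    return n_bins if n_bins <= max_bins else default_bins
```

With integer data between 1 and 5, for example, this gives 5 bins instead of 100, so no bin is empty merely because it falls between two possible values.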
Typical architecture for local site/lab i.e. Lab-participant
- measurements are performed on a dedicated computer
- data are transferred over the local Ethernet network to the database
- users can access the measurements by means of JAVA applets or PHP applications
- the graphical user interface makes the construction of complex queries easy even for users with no database experience
- another capability of the applet is the visualisation of selected data
- using JAVA, JDBC and PHP makes it possible to access the database over the Internet or a local network with the user's favourite browser
[Figure: JAVA applet and PHP applications; DB server (daemon) with repository; ROOT or AliROOT; LabVIEW application; DUT]
Production phase database for ALICE

[Figure: Lab 1 (somewhere in Europe), Lab 2, Lab 3, Lab 4, ..., Lab n, with JAVA applet, PHP applications, MySQL server (daemon), MySQL repository, ROOT or AliROOT, LabVIEW application and DUT; at CERN: Data Archive Server Library, ORACLE server (daemon), AliROOT, application services, data services, DATA repository, DBMS]
Central data warehouse at CERN

three tier architecture
[Figure: client layer modules and applications (filters, generic data loader, custom data loader; interactive software: WWW browsers, JAVA applets, PHP, HTML, command line utilities etc.); application services (Data Archive Server Library, AliROOT); data services (ORACLE server, DATA repository, DBMS)]
- one can easily distinguish the three logical tiers, in line with present tendencies: the client layer, the application services layer and the data services layer
- each layer contains several components (not all shown in the picture)
- the top level is the layer containing client applications, responsible for data transfer into the database and for visualisation
- the middle layer is composed of application services; this layer knows the logical structure and physical locations of the data
- the bottom layer contains the data and the Database Management System
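The separation of the three tiers can be sketched in a few lines of Python. The class and function names below are hypothetical and only illustrate the division of responsibilities: the data service owns the DBMS, the application service knows which store holds which lab's data, and the client only formats results.

```python
import sqlite3


class DataService:
    """Bottom tier: owns one DBMS connection and the raw storage."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE measurements (object_id TEXT, lab TEXT, value REAL)"
        )

    def insert(self, object_id, lab, value):
        self.conn.execute(
            "INSERT INTO measurements VALUES (?, ?, ?)", (object_id, lab, value)
        )

    def select_by_object(self, object_id):
        return self.conn.execute(
            "SELECT lab, value FROM measurements WHERE object_id = ?", (object_id,)
        ).fetchall()


class ApplicationService:
    """Middle tier: knows the logical structure and where each lab's data live."""

    def __init__(self, services_by_lab):
        self._services = dict(services_by_lab)

    def store(self, object_id, lab, value):
        if lab not in self._services:
            raise ValueError(f"no data service registered for lab {lab!r}")
        self._services[lab].insert(object_id, lab, value)

    def history(self, object_id):
        rows = []
        for service in self._services.values():
            rows.extend(service.select_by_object(object_id))
        return sorted(rows)


def client_report(app, object_id):
    """Top tier: presentation only; it never talks to a DBMS directly."""
    return [f"{lab}: {value:.2f}" for lab, value in app.history(object_id)]
```

Because clients depend only on `ApplicationService`, a lab's store could be moved or replaced by another DBMS without changing any client code.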
How to start with databases for ALICE?
- The project should be managed in a few phases.
- The project is large, so I strongly suggest applying a methodology proven in a "commercial environment".
How to manage the project? (1)

Phase 1: strategy (or planning) for the WHOLE project:
- determination of the scope of the project
- partitioning into subsystems (natural way: subdetectors, but not only)
- formulation of general models which could be applied
- creation of a list of actors/participants
- approximate time schedule for particular tasks
- initial choice of software technologies
How to manage the project? (2)

Successive phases should be performed in a "spiral cycle". This simply means that particular subsystems are elaborated successively. Each subsystem must go through the following phases:
- analysis/conceptual design
- software design
- development (programming)
- implementation

Improvements and corrections in earlier completed subsystems must continue during the work on successive subsystems. Simultaneous work on several subsystems is good practice.
How to manage the project? (3)
- work on a pilot project in parallel to the main one, using the same software technology; it should contain the most urgent things
- efficiency tests; creation of "simulated data" with volumes similar to the expected ones
- creation of "conceptual models" during the analysis phase is necessary before the design of subsystems; an appropriate formalism and CASE-class tools are needed for that. For Linux, UML (Unified Modelling Language) is an appropriate option
- elaboration, for the whole project, of standards such as: system of keys, terminology, security, access rights etc.
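An efficiency test with simulated data, as suggested above, can start from something as small as the following Python sketch. The table layout, the value ranges and the use of SQLite are assumptions for illustration; a real test would target the candidate DBMS and use volumes close to the expected ones.

```python
import random
import sqlite3
import time


def load_simulated_data(n_rows, seed=0):
    """Fill an in-memory table with n_rows simulated measurements.

    Returns the connection and the time spent on the bulk insert (seconds).
    """
    rng = random.Random(seed)  # fixed seed keeps test runs reproducible
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE measurements (object_id INTEGER, channel INTEGER, value REAL)"
    )
    rows = [
        (rng.randrange(1000), rng.randrange(768), rng.gauss(0.0, 1.0))
        for _ in range(n_rows)
    ]
    start = time.perf_counter()
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", rows)
    return conn, time.perf_counter() - start


def time_query(conn, sql):
    """Run a query and return its result together with the elapsed time."""
    start = time.perf_counter()
    result = conn.execute(sql).fetchall()
    return result, time.perf_counter() - start
```

Repeating the measurement for increasing `n_rows` shows how load and query times scale before any real data exist.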
First steps ...

- Start to formally organize a central database group for ALICE.
- After that, begin phase 1 of the project, i.e. the strategy for the WHOLE project/experiment.

Partial, highest-priority tasks for this group:
- determination of the scope of the project
- formulation of general models which could be applied
- creation of a list of actors/participants (including 1-2 representatives from each subdetector!)
- initial choice of software technologies which could be used
- partitioning into subsystems
- analysis/conceptual design
