We have taken the following approach: first of all we determined what features of DBMS are important from the point of view of such a large experiment.
Programming in database

Performance and VLDB
- query optimization
- structures supporting query optimization
- support for analytical processing
- allocation of disk space
- data size limits
- VLDB implementations

Distributed databases
- access to multiple databases
- heterogeneous systems support

- large objects in database
- post-relational extensions
- support for special data types

- embedded SQL
- standard interfaces, additional interfaces
- interoperability with Web technology
- XML, CASE

Reliability
- failure recovery

Commercial issues
- technical support available
- market position

Having completed step one, we carried out the subsequent work in three subgroups, each dealing with only one DBMS. The members of each subgroup had practical experience with the DBMS investigated by their subgroup. This procedure allowed us to verify the information contained in manuals and other available documentation (for instance on the Internet). As a result, three extended documents were created, devoted to Oracle, MySQL and PostgreSQL.
Konrad Bohuszewicz, Maciej Czyzowicz, Michal Janik, Dawid Jarosz, Piotr Mazan, Marcin Mierzejewski, Mikolaj Olszewski, Wiktor S. Peryt, Sylwester Radomski, Piotr Szarwas, Tomasz Traczyk, Dominik Tukendorf, Jacek Wojcieszuk
undergraduate student, undergraduate student, Ph.D. student, undergraduate student, undergraduate student, undergraduate student, undergraduate student, undergraduate student, Ph.D. student, undergraduate student, undergraduate student
Faculty of Electronics and Information Technology, Faculty of Mathematics and Information Sciences, Faculty of Physics
About Comparison
- the comparison was discussed by all the people involved in this task
- the compilation was made by Dr. Tomasz Traczyk
- the compilation was circulated within the whole group a few times to make sure we avoided omissions or mistakes
- this version of the document is accepted by all co-authors
- we consider it a quite comprehensive and objective comparison
- it also contains a kind of "weights", which we call "importance", differentiated for the Central database and the Lab-participants
- the Central database should be a kind of data warehouse at CERN, containing all the data, including data transferred periodically from the Lab-participants
- the term "Lab-participants" denotes the smaller databases in the labs involved in the preparation of the ALICE experiment
- a few explanations of terminology used in the database domain are also included, to make the document easy to comprehend for non-specialists
Summary
                                                           Importance                          Assessment
Category               Problem                             Central database  Lab-participants  MySQL  Oracle8  PostgreSQL
Elementary features    Basic data types                    C                 C                 B      C        A
Elementary features    SQL                                 B                 B                 C      B        B
Elementary features    Declarative constraints             B                 B                 C      A        A
Elementary features    Programming abstractions            A                 C                 D      A        C
Elementary features    Generation of ids                   C                 C                 C      A        A
Elementary features    National chars                      B                 C                 B      A        B
Transactions           Transactions                        A                 C                 D      A        A
Transactions           Locks                               A                 C                 D      A        A
Transactions           Multiuser access                    A                 D                 C      A        C
Programming in DB      Stored procedures and triggers      B                 C                 D      A        A
Administration         Access control                      B                 D                 A      A        B
Administration         Backup                              A                 C                 C      A        C
Administration         Data migration                      C                 C                 A      B        A
                       Portability                         B                 C                 B      A        B
                       Scalability                         A                 C                 B      A        C
Performance and VLDB   Query optimization                  A                 C                 B      A        B
Performance and VLDB   Structures supporting optimization  B                 D                 D      A        B
Performance and VLDB   Support for OLAP                    B                 D                 D      A        D
Performance and VLDB   Allocation of the disk space        A                 C                 C      A        C
Performance and VLDB   Size limits                         A                 B                 B      A        C
Performance and VLDB   VLDB implementations                A                 C                 D      A        B
Distributed databases  Access to multiple databases        C                 D                 C      A        C
Distributed databases  Heterogeneous systems support       B                 D                 D      B        D
                       Large objects                       B                 B                 B      A        C
                       Post-relational extensions          C                 C                 D      A        B
                       Support for special data types      C                 C                 D      A        C
                       Embedded SQL                        C                 C                 D      A        B
                       Standard interfaces                 B                 C                 B      A        B
                       Additional interfaces               A                 A                 A      A        A
                       Web technology                      A                 A                 B      A        B
                       XML                                 B                 C                 D      A        D
                       CASE                                B                 C                 D      A        D
Reliability            Recovery                            A                 B                 C      A        C
Commercial issues      Prices                              C                 A                 A      D        A
Commercial issues      Technical support                   A                 B                 C      B        D
Commercial issues      Position on the market              A                 C                 D      A        D
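One way the "importance" weights could be combined with the assessment grades is an importance-weighted mean. The sketch below is only an illustration: the slides define the letters A to D but no numeric scale, so the mapping A=4 ... D=1 is an assumption, as are the names `GRADE`, `ROWS` and `weighted_scores`. Only three rows of the summary (with the Central database importance) are used, so the result says nothing about the overall comparison.

```python
# Hypothetical numeric scale; the source gives only the letters A-D.
GRADE = {"A": 4, "B": 3, "C": 2, "D": 1}

# (problem, importance for Central database, MySQL, Oracle8, PostgreSQL)
# Three rows taken from the summary; the selection is arbitrary.
ROWS = [
    ("Transactions",      "A", "D", "A", "A"),
    ("Prices",            "C", "A", "D", "A"),
    ("Technical support", "A", "C", "B", "D"),
]
SYSTEMS = ("MySQL", "Oracle8", "PostgreSQL")


def weighted_scores(rows):
    """Return the importance-weighted mean grade for each DBMS."""
    total_weight = sum(GRADE[row[1]] for row in rows)
    scores = {}
    for column, name in enumerate(SYSTEMS):
        # Weight each assessment grade by the importance of its problem.
        weighted = sum(GRADE[row[1]] * GRADE[row[2 + column]] for row in rows)
        scores[name] = weighted / total_weight
    return scores


if __name__ == "__main__":
    for name, score in weighted_scores(ROWS).items():
        print(f"{name}: {score:.2f}")
```

Replacing the second column with the Lab-participants importance would give the corresponding score for the smaller lab databases.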
Our preliminary conclusions: for Central Data Repository for ALICE at CERN:
http://ITS_DB_ALICE.if.pw.edu.pl
The document "Comparison of Oracle, MySQL and PostgreSQL DBMS" is available at the same address.
How to start with databases for ALICE and how to manage the project?
- General concept of system architecture
- Databases in production phase
- Software technologies recommended
- DBMS platform choice
- How to proceed?
Databases contents
(1)
ProdPhase database
all information coming from test-beds, from manufacturers, assembly processes, object flow between manufacturers and labs, etc.
RunLog database
to store the summary information describing the contents of an experimental run and to point to the locations where the detailed information associated with the run is stored
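As an illustration of the RunLog idea (a run summary plus pointers to where the detailed data live), here is a minimal sketch. Everything in it is an assumption: the table and column names are hypothetical, and SQLite from the Python standard library merely stands in for whichever DBMS is finally chosen.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption.
SCHEMA = """
CREATE TABLE run (
    run_id      INTEGER PRIMARY KEY,
    started_at  TEXT NOT NULL,
    summary     TEXT NOT NULL
);
CREATE TABLE run_data_location (
    run_id   INTEGER NOT NULL REFERENCES run(run_id),
    location TEXT NOT NULL
);
"""


def create_runlog(conn):
    """Create the two illustrative RunLog tables."""
    conn.executescript(SCHEMA)


def add_run(conn, run_id, started_at, summary, locations):
    """Store one run summary together with pointers to its detailed data."""
    conn.execute("INSERT INTO run VALUES (?, ?, ?)", (run_id, started_at, summary))
    conn.executemany(
        "INSERT INTO run_data_location VALUES (?, ?)",
        [(run_id, location) for location in locations],
    )


def locations_for_run(conn, run_id):
    """Return the locations of the detailed data for one run, sorted by name."""
    rows = conn.execute(
        "SELECT location FROM run_data_location WHERE run_id = ? ORDER BY location",
        (run_id,),
    )
    return [row[0] for row in rows]
```

Keeping the locations in a separate table lets one run point to any number of files or databases without changing the summary record.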
An example of a Web-based interface, developed for STAR by Sylwester Radomski (undergraduate student from the Faculty of Physics, WUT), can be seen at http://www.star.bnl.gov -> Computing; it is the first item in the table New.
The environment in which the archive facility operates is composed of many sources of information. We have to deal with data:
- produced by various test-bench systems
- entered manually by operators
- submitted by collaborating institutes and companies
Usually there are a number of distinct data formats, and files are stored in many locations.
It is hard not only to locate the right piece of information but also to ensure the safety and good quality of the data.
- secure archiving of all the test results in the repository
- easy availability of information on the location of objects (in the geographic sense: manufacturers, labs), which makes arranging the assembly easier
- the possibility of automatically assigning quality attributes according to well-defined criteria
- statistical analysis of quality that can be made easily and at any time
- preparing data for future on-line use by slow control, DCS and DAQ
- easy access to all data during the production and assembly phase
- in the future, easy access to all data during the experiment run
DB production phase
Basic requirements:
- data should be stored in a central repository to make management and maintenance easy and reliable
- access to the data should be assured for everybody who participates in tests during the production phase, i.e. software allowing the use of Web browsers is necessary
- registration of objects should be possible manually (by an operator with suitable privileges) as well as automatically (for example from a LabVIEW application or other software)
- the software should allow even inexperienced users to create (SQL) queries to the database
- there is an ever-increasing demand for centralized storage of data and for consistent, easy-to-use search and retrieval facilities
- experts want to be able to retrieve and analyze the information in a user-friendly way, regardless of its origin
- they do not want to be forced to perform several queries just because the data in question were taken by different data acquisition systems
- they wish to do statistics on data sets spanning months (and more) without having to browse tens of subdirectories on backup storage devices
- usually they prefer to use industry-standard, versatile software tools to process and analyze data
- they certainly would not mind being able to automate their routine, everyday tasks
- we should address those issues by providing a modular framework for archiving and for platform-independent retrieval of data in a heterogeneous distributed computing environment
- our database system must be open enough to follow the inevitable evolution of information gathering systems related to the development of the particular detectors
- we should be able to cope with fast-evolving new Internet technologies in order to take full advantage of the facilities they provide
- PHP4 for software running on the server side
- C/C++ for the API
- JAVA + SWING & JDBC for applications requiring more interactivity (JDBC = Java Database Connectivity)
This seems to be the right choice of tools for client-side software development.
- measurements are performed on a dedicated computer
- data are transferred over the local Ethernet network to the database
- users can access the measurements by means of JAVA applets or PHP applications
- a graphical user interface makes the construction of complex queries easy, even for users with no database experience
- another capability of this applet is the visualisation of selected data
- using JAVA, JDBC and PHP makes it possible to access the database over the Internet or the local network with the user's favourite browser
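The idea of letting users with no database experience build queries can be sketched as a function that turns form choices into a parameterized SQL statement. This is a minimal illustration in Python rather than the JAVA/PHP tools named above; the column whitelist, the table name `measurement` and the function `build_query` are all hypothetical.

```python
# Hypothetical whitelist of columns a user may filter on in a query form.
ALLOWED_COLUMNS = {"lab", "object_type", "quality"}


def build_query(table, filters):
    """Build a parameterized SELECT from {column: value} form choices.

    Values are never pasted into the SQL text; they are returned
    separately as parameters for the database driver to bind.
    """
    if not table.isidentifier():
        raise ValueError(f"invalid table name: {table!r}")
    clauses, params = [], []
    # Sort the columns so the generated SQL is deterministic.
    for column, value in sorted(filters.items()):
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column!r}")
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


if __name__ == "__main__":
    print(build_query("measurement", {"lab": "Lab 2", "quality": "good"}))
```

Restricting the column names to a whitelist and binding the values as parameters keeps user input out of the SQL text, which matters once the form is reachable from a Web browser.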
[Figure: system architecture diagrams. Labels: JAVA applet; PHP applications; DB server (daemon); repository; DUT; Lab 2, Lab 3, Lab 4, ..., Lab n; Data Archive Server Library; ORACLE server (daemon); MySQL repository; AliROOT; DBMS; CERN; DATA repository; interactive software: WWW browsers, JAVA applets, PHP, HTML, command line utilities, etc.]
One can easily distinguish three logical tiers, in line with present tendencies: the client layer, the application services layer and the data services layer. Each layer contains several components (not all are shown in the picture).
- the top level is a layer containing client applications, responsible for data transfer into the database and for visualisation
- the middle layer is composed of application services; this layer knows the logical structure and physical locations of the data
- the bottom layer contains the data and the Database Management System
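The separation of the three tiers can be sketched in a few lines. This is an assumption-laden illustration, not the project's design: the class names, the `measurement` table and the use of in-memory SQLite databases as stand-ins for the lab and CERN repositories are all hypothetical.

```python
import sqlite3


class DataServices:
    """Bottom tier: holds the data and the DBMS connections."""

    def __init__(self):
        self._repositories = {}

    def add_repository(self, name):
        # An in-memory SQLite database stands in for a real repository.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE measurement (object_id TEXT, value REAL)")
        self._repositories[name] = conn

    def connection(self, name):
        return self._repositories[name]


class ApplicationServices:
    """Middle tier: knows the structure and the location of the data."""

    def __init__(self, data):
        self._data = data
        self._location = {}  # object_id -> repository name (one per object here)

    def store(self, repository, object_id, value):
        conn = self._data.connection(repository)
        conn.execute("INSERT INTO measurement VALUES (?, ?)", (object_id, value))
        self._location[object_id] = repository

    def values_for(self, object_id):
        conn = self._data.connection(self._location[object_id])
        rows = conn.execute(
            "SELECT value FROM measurement WHERE object_id = ? ORDER BY rowid",
            (object_id,),
        )
        return [row[0] for row in rows]


def client_report(app, object_id):
    """Top tier: a client that talks only to the application services."""
    values = app.values_for(object_id)
    mean = sum(values) / len(values)
    return f"{object_id}: {len(values)} measurement(s), mean {mean:.2f}"
```

The client never learns which repository holds an object; only the middle tier keeps that mapping, so repositories can be moved without touching client code.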
The project should be managed in a few phases. The project is large, so I strongly suggest applying the following:
(1)
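- formulation of general models which could be applied
- partitioning into subsystems (the natural way: subdetectors, but not only)
- creation of a list of actors/participants
- approximate time schedule for particular tasks
- initial choice of software technologies which could be used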
(2)
It simply means that particular subsystems are elaborated successively. Each subsystem must go through the following phases:
- analysis/conceptual design
- software design
- development (programming)
- implementation

Improvements and corrections in earlier completed subsystems must continue during the work on successive subsystems. Simultaneous work on several subsystems is good practice.
(3)
- work on a pilot project in parallel to the main one; it should use the same software technology and contain the most urgent things
- efficiency tests; creation of "simulated data" with volumes similar to the expected ones
- creation of "conceptual models" during the analysis phase is necessary before the design of subsystems; an appropriate formalism and CASE-class tools are needed for that. For Linux, UML (Unified Modelling Language) is an appropriate option
- elaboration, for the whole project, of standards such as: system of keys, terminology, security, access rights, etc.
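An efficiency test on "simulated data" could start from something like the sketch below. It is only an illustration under stated assumptions: the `measurement` table, the value distributions and the function names are hypothetical, and in-memory SQLite stands in for the DBMS under test, so the timings would not transfer to Oracle, MySQL or PostgreSQL.

```python
import random
import sqlite3
import time


def load_simulated_data(conn, n_rows, seed=0):
    """Fill a hypothetical measurement table with n_rows random rows."""
    rng = random.Random(seed)  # fixed seed so test runs are repeatable
    conn.execute("CREATE TABLE measurement (object_id INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO measurement VALUES (?, ?)",
        ((rng.randrange(1000), rng.gauss(0.0, 1.0)) for _ in range(n_rows)),
    )
    conn.commit()


def time_query(conn, sql):
    """Run one query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_simulated_data(connection, 100_000)
    _, seconds = time_query(
        connection,
        "SELECT object_id, AVG(value) FROM measurement GROUP BY object_id",
    )
    print(f"aggregate over 100000 rows took {seconds:.3f} s")
```

For a meaningful test, `n_rows` would be scaled up to the expected data volume and the queries replaced by those the subsystems actually need.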
Start by formally organizing a central database group for ALICE. After that, begin phase 1 of the project, i.e. the strategy for the WHOLE project/experiment. Highest-priority tasks for this group (a partial list):
- determination of the scope of the project
- formulation of general models which could be applied
- creation of a list of actors/participants (including 1-2 representatives from each subdetector!)
- initial choice of software technologies which could be used
- partitioning into subsystems
- analysis/conceptual design