
In order to store and process the data for the Glasshouse project for the client
Kevin Hayes, the functional requirements for the data were obtained in an
interview. The team were given the following edge cases for the final project. The
maximum utilisation of the database is the input of 32,000 entries every five
minutes from the microcontroller. The database would need to store, at
minimum, the weight, moisture, temperature and light levels for each plant. The
database would also have to record which plants belong to which experiments
and the dates by which they should be monitored. The data from this datastore
needs to be displayed quickly and responsively on a web-based platform,
supporting 10 concurrent users at first but with future scalability to thousands
of concurrent users.
Some quick estimation suggests the maximum size of each data entry, assuming
the worst-case scenario, is 52 bytes. With the stated edge case of 35,000 sensors
polled every five minutes, this is approximately 10.08 million insert operations
daily and a maximum possible growth of roughly 0.5 GB a day. Over a scale of
years this adds up, and since the nature of the research requires old data to be
retained, Big Data solutions may be suitable for use in this project.
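The figures above can be reproduced with a few lines of arithmetic; the constant names below are illustrative, with the values taken from the estimates in this section:

```python
# Back-of-the-envelope check of the daily load estimates above.
SENSORS = 35_000        # edge case: number of sensors
POLL_INTERVAL_MIN = 5   # one reading per sensor every five minutes
ENTRY_BYTES = 52        # worst-case size of a single entry

polls_per_day = 24 * 60 // POLL_INTERVAL_MIN   # 288 polls a day
inserts_per_day = SENSORS * polls_per_day      # 10,080,000 inserts
bytes_per_day = inserts_per_day * ENTRY_BYTES  # 524,160,000 bytes

print(inserts_per_day)      # 10080000
print(bytes_per_day / 1e9)  # 0.52416 (GB per day)
```

This confirms the quoted ~10.08M daily inserts and ~0.5 GB daily growth.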
Three main solutions were compared and assessed for their suitability in this
project: MongoDB, Hadoop (Spark & Hive) and MySQL, a traditional RDBMS.
Documentation was reviewed and prepared benchmarks were performed on a
dummy suite of data. All of the data store solutions mentioned above showed
similar performance on a table of roughly 10,000 rows. However, as the table
grew in size, MySQL insert operations became more and more costly unless its
constraints were relaxed, whereas MongoDB and Hadoop both handled the
inserts with much greater efficiency. From these tests it was determined that a
traditional RDBMS was not the correct choice for our client.
One of the key features of our designed solution is the ability to display the
collected data on a web server. MongoDB has libraries which allow direct
querying of the datastore from PHP, JSP and many other languages, whereas
with Hadoop the required outputs would need to be computed and then exported
into a traditional DBMS before being displayed on the web server. At the end of
this investigation two viable solutions remained, and they are compared below.
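As a rough illustration of the direct-querying path, a reading could be built and queried from Python with pymongo. The database, collection and field names below are assumptions made for this sketch, not the project's actual schema:

```python
# Hypothetical document schema for a single sensor reading; the field
# names match the requirements (weight, moisture, temperature, light)
# but are otherwise illustrative.
def make_reading(plant_id, weight, moisture, temperature, light):
    """Build one sensor-reading document for insertion."""
    return {
        "plant_id": plant_id,
        "weight": weight,
        "moisture": moisture,
        "temperature": temperature,
        "light": light,
    }

def experiment_query(experiment_id):
    """Filter matching all readings tagged with one experiment."""
    return {"experiment_id": experiment_id}

# With a running MongoDB instance the documents could be written and
# read back directly from the web tier, e.g.:
#   from pymongo import MongoClient
#   col = MongoClient()["glasshouse"]["readings"]
#   col.insert_one(make_reading("p1", 410.2, 0.31, 21.5, 780))
#   for doc in col.find(experiment_query("exp-1")):
#       ...
```

The same documents can be served to the web platform as JSON with no intermediate export step, which is the advantage over the Hadoop route described above.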
DBMS                   Responsiveness   Integration   Scalability   Ease of Setup   Cost   Final
Hadoop (Spark/Hive)                                                                        X
MongoDB                                                                                    X
Overall, it was determined that MongoDB is the most suitable DBMS for this
project. Furthermore, Big Data techniques such as aggregations can be applied
to this datastore to further improve responsiveness, while also allowing some
data-warehousing calculations to be performed. MongoDB allows for some
scalability, and if a fully fledged Big Data solution is required in future, it is
relatively easy to export MongoDB data into another system.
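One aggregation of the kind mentioned above might pre-compute per-plant averages. The pipeline below is a sketch against the same assumed field names as earlier; a live deployment would run it with `collection.aggregate(...)`:

```python
# Sketch of a MongoDB aggregation pipeline that averages each plant's
# temperature and moisture readings (field names are assumptions).
def plant_average_pipeline():
    return [
        {"$group": {
            "_id": "$plant_id",                        # one bucket per plant
            "avg_temperature": {"$avg": "$temperature"},
            "avg_moisture": {"$avg": "$moisture"},
            "readings": {"$sum": 1},                   # count of samples
        }},
        {"$sort": {"_id": 1}},                         # stable ordering
    ]

# With a running instance:
#   from pymongo import MongoClient
#   col = MongoClient()["glasshouse"]["readings"]
#   for row in col.aggregate(plant_average_pipeline()):
#       ...
```

Running such pipelines periodically and caching their output is one way the warehousing-style calculations could be served to the web platform without touching the raw 10M-row daily stream.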
