Need for RTQ (real-time query)?
• Hive-based queries are too slow for interactive analysis because they must be translated into jobs for the batch-oriented MapReduce programming framework.
• Another alternative, moving some of the data into a data mart, has generally meant accessing only a summarized subset of the data, and the summarization may have discarded the signal along with the noise.
• HBase has also been insufficient for analytics because it is designed around simple create, read, update, and delete (CRUD) operations rather than analytical operations such as aggregation.
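The data-mart point above can be made concrete with a small sketch. The latency values and the 500 ms alert threshold are hypothetical assumptions, not figures from the slides. The idea is that a summary keeping only an hourly average can hide a short anomaly that is plainly visible in the raw data.

```python
from statistics import mean

# Hypothetical per-minute request latencies (ms) for one hour, with one short spike.
raw = [20.0] * 60
raw[30] = 900.0  # a single anomalous minute: the "signal"

# A data mart that keeps only an hourly summary stores just the average.
hourly_summary = mean(raw)

def has_spike(values, threshold_ms=500.0):
    """Return True if any value exceeds the (assumed) alert threshold."""
    return any(v > threshold_ms for v in values)

print(f"hourly average: {hourly_summary:.1f} ms")
print("spike visible in raw data:", has_spike(raw))
print("spike visible in summary:", has_spike([hourly_summary]))
```

The average comes out around 34.7 ms, which looks healthy. Only a query over the full-fidelity data reveals the 900 ms minute.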
Cost savings
o Reduce duplicate storage with specialized systems
o Reduce data movement for interactive analysis
o Leverage existing tools and employee skills
Discoverability
o Single metadata repository for unified business views
o Supports familiar SQL and existing BI/discovery tools
o Enables more users to interact with data
● High performance:
○ C++ instead of Java
○ Runtime code generation
○ Completely new execution engine that doesn't build on MapReduce
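"Runtime code generation" means the engine turns a query expression into specialized code once, instead of walking a generic expression tree for every row. A production engine would emit native machine code. This sketch only illustrates the idea with Python's built-in `compile()` and `exec()`. The predicate, column names, and helper functions are hypothetical.

```python
import operator

# Hypothetical predicate "price > 100 AND qty < 5" as an expression tree.
PREDICATE = ("and", (">", "price", 100), ("<", "qty", 5))

OPS = {">": operator.gt, "<": operator.lt}

def interpret(expr, row):
    """Interpreted evaluation: walk the tree again for every row."""
    if expr[0] == "and":
        return interpret(expr[1], row) and interpret(expr[2], row)
    op, column, constant = expr
    return OPS[op](row[column], constant)

def to_source(expr):
    """Translate the tree into a Python expression string once."""
    if expr[0] == "and":
        return f"({to_source(expr[1])} and {to_source(expr[2])})"
    op, column, constant = expr
    return f"(row[{column!r}] {op} {constant!r})"

def generate(expr):
    """Code generation: compile a specialized function at runtime."""
    src = f"def predicate(row):\n    return {to_source(expr)}\n"
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["predicate"]

rows = [{"price": 150, "qty": 2}, {"price": 50, "qty": 1}, {"price": 200, "qty": 9}]
compiled = generate(PREDICATE)
print([interpret(PREDICATE, r) for r in rows])  # [True, False, False]
print([compiled(r) for r in rows])              # [True, False, False]
```

Both paths return the same answers. The generated function skips the per-row dispatch on node types, and that overhead is what code generation is meant to remove.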
● The initial design focuses on nested data types, with the goal of scaling to 10,000 servers, petabytes of data, and trillions of records processed in seconds.
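A back-of-the-envelope calculation shows what these scale targets imply for each server. The 10-second query time is an assumption standing in for "seconds", and the work is assumed to be spread perfectly evenly across servers.

```python
# Targets quoted on the slide; the query time is an assumed value.
servers = 10_000
records = 1_000_000_000_000  # one trillion records
query_seconds = 10           # assumption: "in seconds" taken as 10 s

records_per_server = records // servers
per_server_rate = records_per_server / query_seconds

print(f"{records_per_server:,} records per server")    # 100,000,000 records per server
print(f"{per_server_rate:,.0f} records/s per server")  # 10,000,000 records/s per server
```

Even under ideal partitioning, each server would need to process about ten million records per second. This helps explain the emphasis on a C++ engine and generated code instead of a batch framework.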
Copyright ©2012 Cloudwick Technologies
● Storm: Twitter has open-sourced Storm, its distributed, fault-tolerant, real-time computation system, on GitHub. Storm was developed by BackType and is written mostly in Clojure.
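Storm structures a computation as a topology of spouts, which emit streams of tuples, and bolts, which process them. The sketch below mimics that flow with a streaming word count in plain Python. It is not Storm's API: `sentence_spout`, `split_bolt`, and `CountBolt` are hypothetical names, and distribution and fault tolerance are not modeled.

```python
from collections import Counter
from typing import Iterable, Iterator

def sentence_spout(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical spout: emits one tuple (a sentence) at a time."""
    for line in lines:
        yield line

def split_bolt(sentences: Iterator[str]) -> Iterator[str]:
    """Hypothetical bolt: splits each sentence into lowercase words."""
    for sentence in sentences:
        for word in sentence.lower().split():
            yield word

class CountBolt:
    """Hypothetical stateful bolt: keeps running word counts."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def process(self, words: Iterator[str]) -> Counter:
        for word in words:
            self.counts[word] += 1  # updated as each tuple arrives, not per batch
        return self.counts

stream = ["the cow jumped", "over the moon"]
counter = CountBolt()
result = counter.process(split_bolt(sentence_spout(stream)))
print(result["the"])  # 2
```

Generators keep the pipeline incremental: each word is counted as soon as its sentence is emitted. This is the per-tuple processing model that distinguishes a real-time system from MapReduce's batch jobs.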