August 30, 2011

Apache Hadoop

Udbhav Garg VIII- A

Overview
Apache Hadoop is a collection of software libraries, released under a free license, that provides a defined application programming interface (API) for data-intensive distributed applications. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
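The MapReduce model mentioned above can be illustrated with a small word count. This is only a single-process sketch in plain Python, not Hadoop's API: the function names map_phase, shuffle, reduce_phase and word_count are hypothetical and chosen for illustration. In a real cluster the framework runs the map and reduce steps on many nodes and does the grouping between them.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(docs))
```

Because each map call looks at one record and each reduce call looks at one key, the work can be split across machines without the functions knowing about each other, which is what lets the model scale to many nodes.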

Main Features
Developer(s): Apache Software Foundation
Stable release: 0.20.203 / May 11, 2011; 3 months ago
Preview release: 0.21.0 / August 23, 2010; 11 months ago
Development status: Active
Written in: Java
Operating system: Cross-platform
Type: Distributed File System
License: Apache License 2.0
Website: http://hadoop.apache.org/

File System Supported


By May 2011, the list of supported filesystems included:
- HDFS: Hadoop's own rack-aware filesystem. It is designed to scale to tens of petabytes of storage and runs on top of the filesystems of the underlying operating systems.
- Amazon S3 filesystem: targeted at clusters hosted on the Amazon Elastic Compute Cloud server-on-demand infrastructure.

- CloudStore (previously Kosmos Distributed File System), which is rack-aware.
- FTP filesystem: stores all its data on remotely accessible FTP servers.
- Read-only HTTP and HTTPS file systems.

Commercial Application
As of October 2009, commercial applications of Hadoop included:
- Log and/or clickstream analysis of various kinds
- Marketing analytics
- Machine learning
- Sophisticated data mining
- Image processing
- Processing of XML messages
- Web crawling and/or text processing
- General archiving

Prominent users
Prominent users of Hadoop include:
- Yahoo!: On February 19, 2008, Yahoo! Inc. launched what it claimed was the world's largest Hadoop production application.
- Facebook: In 2010, Facebook claimed to have the largest Hadoop cluster in the world, with 21 PB of storage. On July 27, 2011, it announced that the data had grown to 30 PB.

Supported Products
Commercially supported Hadoop-related products

A number of companies offer commercial implementations of Hadoop and/or provide support for it:
- Cloudera offers CDH (Cloudera's Distribution including Apache Hadoop) and Cloudera Enterprise.
- IBM offers InfoSphere BigInsights, based on Hadoop, in both a basic and an enterprise edition.

- In March 2011, Platform Computing announced support for the Hadoop MapReduce API in its Symphony software.
- In May 2011, MapR Technologies, Inc. announced the availability of its distributed filesystem and MapReduce engine, the MapR Distribution for Apache Hadoop.
- Silicon Graphics International offers Hadoop-optimized solutions based on the SGI Rackable and CloudRack server lines, with implementation services.

- In June 2011, Yahoo! and Benchmark Capital formed Hortonworks Inc., whose focus is on making Hadoop more robust and easier to install, manage and use for enterprise users.

Thank you
