Copyright
Copyright 2011 Gluster, Inc. This is a preliminary document and may be changed substantially prior to final commercial release of the software described herein.
Table of Contents
1. About this Guide
   1.1. Disclaimer
   1.2. Audience
   1.3. Prerequisite
   1.4. Terms
   1.5. Typographical Conventions
   1.6. Feedback
2. Introducing Hadoop Compatible Storage of GlusterFS
   2.1. Architecture Overview
   2.2. Advantages
3. Preparing to Install Hadoop Compatible Storage
   3.1. Pre-requisites
   3.2. Dependencies
4. Installing and Configuring Hadoop Compatible Storage
5. Starting and Stopping the Hadoop MapReduce Daemon on GlusterFS
   5.1. Starting and Stopping MapReduce Daemon
6. Troubleshooting Hadoop Compatible Storage
   6.1. Time Sync
   6.2. Socket Creation Errors
7. Creating GlusterFS Volumes
   7.1. Creating Distributed Striped Replicated Volumes
   7.2. Creating Striped Replicated Volumes
8. Managing Your Gluster Filesystem
1.1. Disclaimer
Gluster, Inc. has designated English as the official language for all of its product documentation and other documentation, as well as all our customer communications. All documentation prepared or delivered by Gluster will be written, interpreted and applied in English, and English is the official and controlling language for all our documents, agreements, instruments, notices, disclosures and communications, in any form, electronic or otherwise (collectively, the Gluster Documents). Any customer, vendor, partner or other party who requires a translation of any of the Gluster Documents is responsible for preparing or obtaining such translation, including associated costs. However, regardless of any such translation, the English language version of any of the Gluster Documents prepared or delivered by Gluster shall control for any interpretation, enforcement, application or resolution.
1.2. Audience
This guide is intended for Apache Hadoop users who want to use GlusterFS as the filesystem for Hadoop.
1.3. Prerequisite
This document assumes that you are familiar with the Linux operating system, general filesystem concepts, GlusterFS, Apache Hadoop, and the MapReduce framework.
1.4. Terms
Term: Description
master: Manages the scheduling of jobs, assigns tasks to slaves, monitors tasks, and re-executes failed tasks.
slave: Program which submits a job to the master.
job: A set of map and/or reduce tasks, coordinated by the master. When the master receives a job, it assigns a unique name to the job and assigns the tasks to workers until they are all completed.
map: The first phase of a job, in which tasks are usually scheduled on the same node where their input data is hosted, so that local computation can be performed. Generally there is one map task per input.
reduce: An individual task in this phase usually has access to all values for a given key produced by the map phase.
MapReduce: A paradigm and associated framework for distributed computing, which decouples application code from the core challenges of fault tolerance and data locality.
task: A unit of work provided to a worker.
worker: Responsible for carrying out a task. A job specifies the executable that is the worker. Workers are scheduled to run on nodes close to the data they are supposed to process.
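The relationship between the map phase, the reduce phase, and a job can be illustrated with a small single-process word count. This is only a sketch of the paradigm described in the terms above, not the Hadoop API: the function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical, and the scheduling, data locality, and fault tolerance that the master provides are not modelled.

```python
from collections import defaultdict


def map_phase(line):
    """Map task: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the values emitted by all map tasks by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce task: receives all values for one key and combines them."""
    return key, sum(values)


def run_job(lines):
    """A 'job' here is simply one map task per input line plus one reduce per key."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["GlusterFS stores data", "Hadoop reads data"]))
```

In a real cluster each call to map_phase would typically run as a separate task on the node that hosts its input, which is the locality property the definitions refer to.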
1.5. Typographical Conventions
Square Brackets
Curly Brackets
1.6. Feedback
Gluster welcomes your comments and suggestions on the quality and usefulness of its documentation. If you find any errors or have any other suggestions, write to us at docfeedback@gluster.com and include the chapter, section, and page number, if available.
Gluster offers a range of resources related to Gluster software:
- Discuss technical problems and solutions on the Discussion Forum (http://community.gluster.org)
- Get hands-on step-by-step tutorials (http://www.gluster.com/community/documentation/index.php/Main_Page)
2.2. Advantages
The following are the advantages of Hadoop Compatible Storage with GlusterFS:
- Provides simultaneous file-based and object-based access within Hadoop.
- Eliminates the centralized metadata server.
- Provides compatibility with MapReduce applications, so existing applications do not need to be rewritten.
- Provides a fault tolerant filesystem.
3.1. Pre-requisites
The following are the pre-requisites to install and configure GlusterFS with Hadoop Compatible Storage:
- Hadoop 0.20.2 is installed, configured, and running on all the machines in the cluster.
- Java Runtime Environment
- Maven (mandatory only if you are building the plugin from the source)
- JDK (mandatory only if you are building the plugin from the source)
Source code is available at https://github.com/gluster/hadoop-glusterfs.
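The pre-requisites above can be checked before installation with a short script. The following is a minimal sketch, not part of the Gluster tooling. It assumes the tools are reachable on the PATH under the command names hadoop, java, mvn, and javac, which may not hold on every system (for example, Hadoop is often invoked from $HADOOP_HOME/bin instead). It only detects presence and does not verify that the Hadoop version is 0.20.2.

```python
import shutil

# Assumed command names; adjust them to match the local installation.
RUNTIME_TOOLS = ["hadoop", "java"]   # needed on every machine
BUILD_TOOLS = ["mvn", "javac"]       # needed only when building the plugin from source


def missing_tools(tools, which=shutil.which):
    """Return the tools from the list that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


def report(build_from_source=False):
    """Return one human-readable line per missing pre-requisite."""
    wanted = RUNTIME_TOOLS + (BUILD_TOOLS if build_from_source else [])
    return [f"missing pre-requisite: {tool}" for tool in missing_tools(wanted)]


if __name__ == "__main__":
    for line in report(build_from_source=True):
        print(line)
```

The which parameter exists only so that the lookup can be replaced in tests; by default it uses shutil.which.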
3.2. Dependencies
The following package will be installed when you install Hadoop Compatible Storage on Gluster: getfattr
7. (Optional) To install Hadoop Compatible Storage in a different location, run the following command:
# rpm -ivh --nodeps --prefix /usr/local/glusterfs/hadoop glusterfs-hadoop-0.20.2-0.1.x86_64.rpm
8. Edit the conf/core-site.xml file. The following is a sample conf/core-site.xml file:

<configuration>
  <property>
    <name>fs.glusterfs.impl</name>
    <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>glusterfs://fedora1:9000</value>
  </property>
  <property>
    <name>fs.glusterfs.volname</name>
    <value>hadoopvol</value>
  </property>
  <property>
    <name>fs.glusterfs.mount</name>
    <value>/mnt/glusterfs</value>
  </property>
  <property>
    <name>fs.glusterfs.server</name>
    <value>fedora2</value>
  </property>
  <property>
    <name>quick.slave.io</name>
    <value>Off</value>
  </property>
</configuration>

The following are the configurable fields:

Property Name: fs.default.name
Default Value: glusterfs://fedora1:9000
Description: Any hostname in the cluster as the server and any port number.

Property Name: fs.glusterfs.volname
Default Value: hadoopvol
Description: GlusterFS volume to mount.

Property Name: fs.glusterfs.mount
Default Value: /mnt/glusterfs
Description: The directory used to fuse mount the volume.

Property Name: fs.glusterfs.server
Default Value: fedora2
Description: Any hostname or IP address on the cluster except the client/master.
Property Name: quick.slave.io
Default Value: Off
Description: Performance tunable option. If this option is set to On, the plugin tries to perform I/O directly from the disk filesystem (such as ext3 or ext4) on which the file resides. This should improve read performance and let jobs run faster. Note: This option has not been widely tested.
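After editing conf/core-site.xml, it can be useful to confirm that the file parses and that the GlusterFS properties are present. The sketch below uses only the Python standard library. Treating the five listed properties as required, and expecting fs.default.name to start with glusterfs://, are assumptions drawn from the sample file rather than documented rules of the plugin; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: these properties from the sample file are treated as required.
REQUIRED_PROPERTIES = (
    "fs.glusterfs.impl",
    "fs.default.name",
    "fs.glusterfs.volname",
    "fs.glusterfs.mount",
    "fs.glusterfs.server",
)


def read_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name:
            props[name] = value
    return props


def check_config(props):
    """Return a list of problems found; an empty list means no issue was detected."""
    problems = [
        f"missing property: {name}"
        for name in REQUIRED_PROPERTIES
        if not props.get(name)
    ]
    default_fs = props.get("fs.default.name", "")
    if default_fs and not default_fs.startswith("glusterfs://"):
        problems.append("fs.default.name should use the glusterfs:// scheme")
    return problems


def check_file(path):
    """Convenience wrapper: read a core-site.xml file from disk and check it."""
    with open(path, encoding="utf-8") as handle:
        return check_config(read_properties(handle.read()))
```

For example, check_file("conf/core-site.xml") run from the Hadoop installation directory returns the list of detected problems. Values are stripped of surrounding whitespace when read, so an entry such as " fedora2" is reported as "fedora2".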
9. Create soft links in Hadoop's library and configuration directories for the files installed in Step 7, using the following command:
# ln -s <existing file> <link location>
For example,
# ln -s /usr/local/lib/glusterfs-0.20.2-0.1.jar $HADOOP_HOME/lib/glusterfs-0.20.2-0.1.jar
# ln -s /usr/local/lib/conf/core-site.xml $HADOOP_HOME/conf/core-site.xml
10. (Optional) Instead of repeating the above steps, you can run the following command on the Hadoop master to build the plugin and deploy it along with the core-site.xml file:
# build-deploy-jar.py -d $HADOOP_HOME -c
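When the soft links from step 9 have to be created on several machines, the two ln -s commands can be wrapped in a small helper. This is a minimal sketch under stated assumptions, not the behaviour of build-deploy-jar.py: the function name link_into_hadoop is hypothetical, and the choice to replace a stale symlink while refusing to overwrite a regular file is a design decision made for this example.

```python
import os


def link_into_hadoop(source, hadoop_home, subdir):
    """Create <hadoop_home>/<subdir>/<basename of source> as a symlink to source.

    Returns the path of the link. An existing symlink is replaced if it points
    elsewhere; an existing regular file is never overwritten.
    """
    link_path = os.path.join(hadoop_home, subdir, os.path.basename(source))
    if os.path.islink(link_path):
        if os.readlink(link_path) == source:
            return link_path  # already linked to the right file
        os.remove(link_path)  # stale link: remove it before re-creating
    elif os.path.exists(link_path):
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    os.symlink(source, link_path)
    return link_path


# Usage with the paths from the example commands (requires HADOOP_HOME to be set):
#   home = os.environ["HADOOP_HOME"]
#   link_into_hadoop("/usr/local/lib/glusterfs-0.20.2-0.1.jar", home, "lib")
#   link_into_hadoop("/usr/local/lib/conf/core-site.xml", home, "conf")
```

If $HADOOP_HOME/conf already contains a regular core-site.xml file, the helper raises FileExistsError instead of replacing it, so the existing file should be moved aside first.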