http://www.ccs.neu.edu/home/matthias/HtDP2e/index.html
Replace words in vi (Linux: replace in editor)
:%s/foobar/hadoop/g
:%s/\/dfsdata//g
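The same substitutions can be run non-interactively with sed; a minimal sketch (the file name and its contents are made up for illustration):

```shell
# hypothetical demo file for illustration
printf 'foobar lives here\npath is /dfsdata/dn1\n' > /tmp/replace-demo.txt

# equivalent of :%s/foobar/hadoop/g
sed -i 's/foobar/hadoop/g' /tmp/replace-demo.txt

# equivalent of :%s/\/dfsdata//g
sed -i 's/\/dfsdata//g' /tmp/replace-demo.txt

cat /tmp/replace-demo.txt
```

Handy when the same edit has to be applied across every node's config files instead of one file in vi.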
2012-02-21 01:36:55,819 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException:
Datanode state: LV = -19 CTime = 1328715960120 is newer than the
namespace state: LV = -19 CTime = 0
Configuration
Important to know the current features and how the new API is coming
Job
Inputformats
Mapperclass
Reducerclass
Outputformat
Key and value
Java reflection
Amount of reads you do, amount of writes you do
Network bandwidth
Number of songs per artist
Write your own object
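The "number of songs per artist" job above can be prototyped at the shell before writing the MapReduce version; a sketch with hypothetical artist<TAB>song input, where cut plays the mapper, sort the shuffle, and uniq -c the reducer:

```shell
# hypothetical input: one "artist<TAB>song" pair per line
printf 'beatles\thelp\nbeatles\tyesterday\ndylan\thurricane\n' > /tmp/songs.txt

# map (emit artist key) | shuffle (sort by key) | reduce (count per key)
cut -f1 /tmp/songs.txt | sort | uniq -c | awk '{print $2, $1}' > /tmp/song-counts.txt

cat /tmp/song-counts.txt
```

The real job has the same shape: the mapper emits (artist, 1), and the reducer sums the values per artist.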
LZO rpms http://pkgs.repoforge.org/lzo/
Linux networking
http://www.linuxhomenetworking.com/wiki/index.php/Main_Page
crontab
*/10 * * * * netstat -plten >> /root/netstat.log 2>&1
*/10 * 3 * * netstat -plten 2>&1 | mail -s "cronjob output" avinashy@cloudwick.com
# umount /media/disk/
umount: /media/disk: device is busy
umount: /media/disk: device is busy
The first thing you'll probably do is close down all your terminals and xterms, but
here's a better way. You can use the fuser command to find out which process is
keeping the device busy:
# fuser -m /dev/sdc1
/dev/sdc1: 538
# ps auxw|grep 538
donncha 538 0.4 2.7 219212 56792 ? SLl Feb11 11:25 rhythmbox
Mount problem
mount -o remount,rw /
Edit vi /etc/fstab
/dev/sda1
/storage/data1
reboot
clusteradmin ALL=(ALL) NOPASSWD: ALL
Linux commands
http://support.nagios.com/knowledgebase/faqs/index.php?option=com_content&view=article&id=52&catid=35&faq_id=305&expand=false&showdesc=true
http://yahoo.github.com/hadoop-common/installing.html
export PATH=$PATH:/usr/bin/
export JAVA_HOME=/usr/java/jdk1.7.0/
export PATH=$PATH:$JAVA_HOME/bin
rpm -i --force jdk.1.6.
if java -version shows 1.4
then
rm /usr/bin/java
ln -s /usr/java/jdk..1.6/bin/java /usr/bin/java
LABEL Centos
MENU LABEL Centos
KERNEL images/centos/x86_64/5.6/vmlinuz
append vga=normal initrd=images/centos/x86_64/5.6/initrd.img
ramdisk_size=32768
ksdevice=eth0 ks=ftp://192.168.1.45/install/ks/ks.cfg
Avinash.ldif
# avinash, localdomain.com
#dn: uid=root,ou=People,dc=localdomain,dc=com
#uid: root
#cn: admin
#objectClass: account
#objectClass: posixAccount
#objectClass: top
#objectClass: shadowAccount
#userPassword: {SSHA}PCHPZji+1m+sX0HwudP+UEqL9RZ4CXNR
#shadowLastChange: 15221
#shadowMin: 0
#shadowMax: 99999
#shadowWarning: 7
#loginShell: /bin/bash
#uidNumber: 0
#gidNumber: 0
#homeDirectory: /root
#gecos: root
dn: uid=arun,ou=People,dc=localdomain,dc=com
Ldap Authentication
<Directory "/var/www/html">
AuthType Basic
AuthName "enter your login id"
AuthBasicProvider ldap
AuthzLDAPAuthoritative off
AuthLDAPURL ldap://192.168.1.45:389/dc=localdomain,dc=com?uid?sub
require valid-user
Options None
</Directory>
Ganglia
Download EPEL(extra packages for enterprise linux)
[user@host ~]$ sudo rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm
puppet
http://www.linuxforu.com/how-to/puppet-show-automating-unixadministration/
http://library.linode.com/application-stacks/puppet/installation#sph_configuring-puppet
Download EPEL if your Linux doesn't already have it:
rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm
client
yum install puppet --enablerepo=epel
yum install ruby-rdoc
vi /etc/sysconfig/puppet
file { "/avi":
source => "/etc/httpd/conf/httpd.conf",
recurse => "true"
Restart puppetmaster
puppetd --server puppet.example.com --waitforcert 60 --test
If there are certificate request errors under /var/lib/puppet/ssl/certs or certificate_requests, we
have to delete the files in the certs folder on both client and server
http://ankitasblogger.blogspot.com/2011/01/hadoop-cluster-setup.html
Hadoop cluster setup
http://www.mazsoft.com/blog/post/2009/11/19/setting-up-hadoophive-cluster-on-Centos5.aspx
Install hadoop from the Cloudera tarball or rpm; I recommend the
tarball
Install java through rpm
Copy hadoop to /usr/local
cp -r hadoop-0.20.2-cdh3u2 /usr/local
Export java path
export JAVA_HOME=/usr/java/jdk
change in core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.127:8020</value>
</property>
change in hdfs-site.xml
/storage/name (sda), /storage1/name (sdb)
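A sketch of the matching hdfs-site.xml entry, assuming the two directories above are the namenode metadata locations (the namenode writes a copy of its metadata to each comma-separated directory):

```xml
<property>
  <name>dfs.name.dir</name>
  <value>/storage/name,/storage1/name</value>
</property>
```

Putting the two copies on separate disks (sda, sdb) is what protects the metadata from a single disk failure.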
ssh b@ip
or
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@192.168.1.12x
ssh centos1
ssh centos2
...
ssh between jobtracker and datanodes
namenode conf files
nano conf/masters
secondary namenode
centos1
nano conf/slaves
datanodes
secondary namenode
vi masters
secondary namenode ip addr
slaves
datanodes ip addr
change the ownership of the hadoop tarball to hadoop:hadoop on all the
servers
ssh-keygen -t dsa
ssh-copy-id -i /home/hadoop/.ssh/id_dsa.pub hadoop@localhost
for each server
bin/start-dfs.sh on the namenode
bin/start-mapred.sh on the jobtracker
Cluster layout (sketch):
MASTER: NameNode, Secondary NameNode (SN), JobTracker (JT)
SLAVES: DataNode (DN) + TaskTracker (TT) on each of the 4 slave nodes
masters file: SN IP
slaves file: the 4 DN IPs (the slaves themselves need no masters/slaves files)
/etc/puppet/manifests/site.pp
node 'hostname' {
include ganglia
include ganglia::copy_conf
include ganglia::copy_services
}
#Ganglia service
class ganglia{
package { 'rpm wget ftp://192.168.1.102/ganglia-gmond-3.0.7-1.el5.x86_64.rpm':
ensure => installed
}
}
This worked with another approach:
class ganglia{
exec{"gmond":
command => "/usr/bin/wget ftp://192.168.1.146/ganglia-gmond3.0.7-1.el5.x86_64.rpm",
cwd => "/root",
creates => "/root/ganglia-gmond-3.0.7-1.el5.x86_64.rpm",
}
}
class ganglia{
package { 'ganglia':
ensure => installed,
}
service { 'ganglia':
ensure => running,
enable => true,
require => Package['ganglia'],
}
package { 'yum':
ensure => installed,
}
}
http://www.unixmen.com/linux-tutorials/1591-install-puppet-master-and-client-in-ubuntu
http://tech.mangot.com/
class ganglia{
package { "ganglia":
ensure => installed
}
package { "ganglia-gmond":
ensure => installed
}
service { "gmond":
ensure => running,
subscribe => File["/etc/init.d/gmond"],
enable => true,
require => File["gmond"]
}
}
Another .pp
service { "pakiti":
enable => "true",
name => "pakiti",
start => "/etc/init.d/pakiti start",
status => "/etc/init.d/pakiti status",
stop => "/etc/init.d/pakiti stop",
ensure => "running",
hasstatus => "true",
require => Package["pakiti-client"],
}
package { "ganglia":
=> "true",
#start
ensure
=> "running",
package { "ganglia":
ensure => installed
}
package { "ganglia-gmond":
ensure => installed
}
include ganglia::copy_conf
include ganglia::copy_services
}
service { "gmond":
enable
=> "true",
#start
ensure
=> "running",
class ganglia::copy_services{
file { 'gmond':
path => '/etc/init.d/gmond',
content =>
template('/etc/puppet/modules/ganglia/templates/services/gmond.erb'),
ensure => file,
owner => "root",
group => "root",
mode => 777,
}
}
class ganglia::copy_conf{
file { 'gmond.conf':
path => '/etc/gmond.conf',
ensure => file,
content =>
template('/etc/puppet/modules/ganglia/templates/conf/gmond.conf.erb'),
owner => "root",
group => "root",
}
}
Errors: when you get a certificate error, remove the requests and certs in
/var/lib/puppet/ssl/certs
/var/lib/puppet/ssl/certificate_requests
or else reinstall and run puppet again.
http://groups.google.com/group/puppetusers/browse_thread/thread/1b4f4edf1d328b4d?pli=1
Hadoop with puppet
http://itand.me/using-puppet-to-manage-users-passwords-and-ss
http://duxklr.blogspot.com/2011/05/using-puppet-to-manage-users-groupsand.html
define add_user($username, $uid, $gid){
user { $username:
ensure => present,
home => "/home/$username",
shell => "/bin/bash",
uid => $uid,
}
group { $username:
ensure => present,
gid => $gid,
require => User[$username],
}
file { "/home/$username/":
ensure => directory,
owner => $username,
group => $username,
mode => 750,
}
}
class user::virtual {
@user { "avinash":
home => "/home/avinash",
ensure => "present",
groups => ["root","avinash"],
uid => "504",
password => "centos",
comment => "User",
shell => "/bin/bash",
managehome => "true",
}
}
http://marksallee.wordpress.com/2010/08/25/create-a-puppet-test-networkwith-virtualbox/
puppet with hadoop
class hadoop{
exec{"hadoop-tar":
command => "/usr/bin/wget ftp://192.168.1.127/hadoop-0.20.2cdh3u2.tar.gz",
cwd => "/home/hadoop",
creates => "/home/hadoop/hadoop-0.20.2-cdh3u2.tar.gz",
}
exec {"hadooptar":
command => "/bin/tar -xvvf hadoop-0.20.2-cdh3u2.tar.gz",
cwd => "/home/hadoop",
creates => "/home/hadoop/hadoop-0.20.2-cdh3u2/",
}
# a fuller example, including permissions and ownership
file { "/storage":
ensure => "directory",
owner => "hadoop",
group => "hadoop",
mode => 750,
}
}
future
http://bitfieldconsulting.com/puppet-and-mysql-create-databases-and-users
mysql
http://blog.gurski.org/index.php/2010/01/28/automatic-monitoring-withpuppet-and-nagios/
nagios
pig
tar
LoadFunc: Reverse
Name: avinash, sharad
o/p: avinash -> hsaniva
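The reversal the UDF performs can be sanity-checked from the shell with the standard rev utility, which reverses each input line character by character:

```shell
# rev reverses each line character by character
reversed=$(echo avinash | rev)
echo "$reversed"
```

Useful for quickly checking what the Pig job should emit for a given input.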
Set paths for hadoop, hbase, java
Copy the mysql jdbc driver .jar to sqoop/lib/
Create user account for hadoop
Password for hadoop in mysql
use database;
Create a table in mysql
create table tname(id int, name char(20), primary key(id));
insert into tname values()
GRANT ALL ON mysql.* TO 'hadoop'@'localhost';
sqoop import --connect jdbc:mysql://192.168.1.56/<mysql> --username root --password centos --table <sqooptable>
bin/pig -x local
REGISTER /home/hadoop/Pigex.jar
a = load '/home/hadoop/y.txt' as (number,age,year);
b = foreach a generate number,age,year;
drbd http://www.cloudera.com/blog/2009/07/hadoop-ha-configuration/
touch /home/hadoop/excludes
<property>
<name>dfs.hosts.exclude</name>
<value>/home/hadoop/excludes</value>
<final>true</final>
</property>
<property>
<name>mapred.hosts.exclude</name>
<value>/home/hadoop/excludes</value>
<final>true</final>
</property>
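A sketch of the excludes file itself: plain hostnames, one per line (these names are hypothetical, following the centos1/centos2 naming used above). After adding a host, run `hadoop dfsadmin -refreshNodes` so the namenode begins decommissioning it.

```
centos3
centos4
```

The same file serves both properties above, so the host is excluded from HDFS and MapReduce together.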
#start httpd
#stop iptables
In the worst case:
export ANT_LIB=/usr/share/ant/lib
bin/hive --service hwi
https://cwiki.apache.org/confluence/display/Hive/HiveWebInterface
http://localhost:9999/hwi
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.231:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/lib/hadoop-0.20/cache/${user.name}</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<!-- specify this so that running 'hadoop namenode -format' formats the
right dir -->
<name>dfs.name.dir</name>
<value>/var/lib/hadoop-0.20/cache/hadoop/dfs/name</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
<property>
<name>topology.script.file.name</name>
<value>/home/hadoop/topology.py</value>
</property>
</configuration>
topology.py:
import sys

DEFAULT_RACK = '/default/rack0'
RACK_MAP = {
    '208.94.2.10': '/datacenter1/rack0',
    '1.2.3.4': '/datacenter1/rack0',
    '1.2.3.5': '/datacenter1/rack0',
    '1.2.3.6': '/datacenter1/rack0',
    '10.2.3.4': '/datacenter2/rack0',
}
if len(sys.argv) == 1:
    print(DEFAULT_RACK)
else:
    print(" ".join(RACK_MAP.get(i, DEFAULT_RACK) for i in sys.argv[1:]))
SaaS
PaaS
IaaS
Namenode is in safemode
NFS (per the given doc): server = backup node, client = namenode
#wget ftp://192.168.1.32/jdk-6u25-linux-x64-rpm.bin
#chmod 755 jdk-6u25-linux-x64-rpm.bin
#./jdk-6u25-linux-x64-rpm.bin
export JAVA_HOME=/usr/java/jdk1.6.0_25
export PATH=$PATH:$JAVA_HOME/bin
hadoop log retention
IIRC, "mapred.userlog.retain.hours" (24h default) controls this in my
environment and it seems to work fine on my cluster. Are you sure you
have task logs older than 24h lying around? It might even be a bug that
was fixed in the subsequent 0.20 releases that went out
recently.
Thanks for the reply. I realized that the property you mentioned
was missing in my mapred-site.xml.
I added the entry and it works just fine.
Was my assumption that "hadoop.tasklog.logsRetainHours" in
log4j.properties will do the same wrong? What is this property for in that
case?
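Based on the thread above, the mapred-site.xml entry would look like this (24 hours is the default; setting it explicitly is what fixed the poster's cluster):

```xml
<property>
  <name>mapred.userlog.retain.hours</name>
  <value>24</value>
</property>
```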
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote
$HADOOP_NAMENODE_OPTS"
HADOOP_NAMENODE_OPTS="-Xmx500m" will set it to 500MB. The "OPTS" here
> refers to JVM options. -Xmx is a common JVM option to set the maximum
> heap.
Set dfs.replication=2;
Increase the heap size of the task JVMs with the
mapred.child.java.opts property.
The default setting is -Xmx200m, which gives each task 200 MB of memory.
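For example, to give each task a 512 MB heap instead (the value is illustrative), in mapred-site.xml:

```xml
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
```

Remember the node must have enough RAM for (map slots + reduce slots) x this heap, on top of the datanode and tasktracker daemons.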
Datanode summary
http://192.168.1.123:50075/blockScannerReport?listblocks
46122674
ntp
server 0.us.pool.ntp.org
server 1.us.pool.ntp.org
server 2.us.pool.ntp.org
server 3.us.pool.ntp.org
service ntpd start
chkconfig ntpd on
iptables -I INPUT -p udp --dport 123 -j ACCEPT
iptables -L
ntpq -p