1. What configuration enables automatic failover in Hadoop HA?
 2. In the configuration files, what is the difference between the mapred.* and mapreduce.* property prefixes?

3. What is the relationship between the two NameNodes in Hadoop HA?



-- Hadoop version: 2.4.0 

-- Package name: hadoop-2.4.0.tar.gz, or the source release hadoop-2.4.0-src.tar.gz (I built Hadoop, HBase, and Hive from source) 



--  LZO support; see: 
 

                   http://slaytanic.blog.51cto.com/2057708/1162287/ 

                   http://hi.baidu.com/qingchunranzhi/item/3662ed5ed29d37a1adc85709 


-- Install the following RPM packages: 

yum -y install openssh* 

yum -y install man* 

yum -y install compat-libstdc++-33* 

yum -y install libaio-0.* 

yum -y install libaio-devel* 

yum -y install sysstat-9.* 

yum -y install glibc-2.* 

yum -y install glibc-devel-2.* glibc-headers-2.* 

yum -y install ksh-2* 

yum -y install libgcc-4.* 

yum -y install libstdc++-4.* 

yum -y install libstdc++-4.*.i686* 

yum -y install libstdc++-devel-4.* 

yum -y install gcc-4.*x86_64* 

yum -y install gcc-c++-4.*x86_64* 

yum -y install elfutils-libelf-0*x86_64* elfutils-libelf-devel-0*x86_64* 

yum -y install elfutils-libelf-0*i686* elfutils-libelf-devel-0*i686* 

yum -y install libtool-ltdl*i686* 

yum -y install ncurses*i686* 

yum -y install ncurses* 

yum -y install readline* 

yum -y install unixODBC* 

yum -y install zlib 

yum -y install zlib* 

yum -y install openssl* 

yum -y install patch 

yum -y install git 

yum -y install lzo-devel zlib-devel gcc autoconf automake libtool 

yum -y install lzop 

yum -y install lrzsz 


yum -y install nc 

yum -y install glibc 

yum -y install java-1.7.0-openjdk 

yum -y install gzip 

yum -y install zlib 

yum -y install gcc 

yum -y install gcc-c++ 

yum -y install make 

yum -y install protobuf 

yum -y install protoc 

yum -y install cmake 

yum -y install openssl-devel 

yum -y install ncurses-devel 

yum -y install unzip 

yum -y install telnet 

yum -y install telnet-server 

yum -y install wget 

yum -y install svn 

yum -y install ntpdate 



-- Hive installation; see: 
http://kicklinux.com/hive-deploy/ 





-- ################################################################################## -- 

-- Five servers in total, as follows: 

------------------------------------------------------------------------------------------------------------------ 
|    IP address    |      Hostname      | NameNode | JournalNode | DataNode | Zookeeper | Hbase       | Hive       | 
------------------------------------------------------------------------------------------------------------------ 
| 192.168.117.194  | funshion-hadoop194 | Yes      | Yes         | No       | Yes       | Yes         | No         | 
------------------------------------------------------------------------------------------------------------------ 
| 192.168.117.195  | funshion-hadoop195 | Yes      | Yes         | No       | Yes       | Yes         | No         | 
------------------------------------------------------------------------------------------------------------------ 
| 192.168.117.196  | funshion-hadoop196 | No       | Yes         | Yes      | Yes       | Yes (Master)| Yes (MySQL)| 
------------------------------------------------------------------------------------------------------------------ 
| 192.168.117.197  | funshion-hadoop197 | No       | Yes         | Yes      | Yes       | Yes         | No         | 
------------------------------------------------------------------------------------------------------------------ 
| 192.168.117.198  | funshion-hadoop198 | No       | Yes         | Yes      | Yes       | Yes         | No         | 
------------------------------------------------------------------------------------------------------------------ 


-- ################################################################################## -- 

---------------------------------------------------------------------------------------- 

--  Configure Linux and install the JDK 


-- Reference: Installing the Java JDK on Linux (Ubuntu), setting environment variables, and running a small test program 


-- ################################################################################## -- 

---------------------------------------------------------------------------------------- 

-- Step 1. Set up passwordless SSH login for the hadoop user
 

-- References: 

Passwordless mutual SSH login between Linux (Ubuntu) hosts: a reliable guide 


CentOS 6.4: an illustrated guide to two-way passwordless SSH login 



-- ################################################################################## -- 

---------------------------------------------------------------------------------------- 

-- Step 2. ZooKeeper configuration (use an odd number of ZK nodes; I used 5)
 

-- Reference: A detailed walkthrough of installing a ZooKeeper cluster 


-- ################################################################################## -- 

---------------------------------------------------------------------------------------- 

-- Step 3. Hadoop cluster configuration: 

 -- Step 3.1 vi $HADOOP_HOME/etc/hadoop/slaves
 

funshion-hadoop196 

funshion-hadoop197 

funshion-hadoop198 









-- Step 3.2 vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh  (add the JAVA_HOME environment variable and the native library path)
 

export JAVA_HOME=/usr/java/latest 


export LD_LIBRARY_PATH=/usr/local/hadoop/lzo/lib 


export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native 

export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib/native" 



-- Note: the contents of ${HADOOP_PREFIX}/lib/native are as follows: 

[hadoop@funshion-hadoop194 native]$ pwd 

/usr/local/hadoop/lib/native 

[hadoop@funshion-hadoop194 native]$ ls -l 

total 8640 

-rw-r--r--. 1 hadoop hadoop 2850660 Jun  9 14:58 hadoop-common-2.4.0.jar 

-rw-r--r--. 1 hadoop hadoop 1509888 Jun  9 14:58 hadoop-common-2.4.0-tests.jar 

-rw-r--r--. 1 hadoop hadoop  178637 Jun  9 14:58 hadoop-lzo-0.4.20-SNAPSHOT.jar 

-rw-r--r--. 1 hadoop hadoop  145385 Jun  9 14:58 hadoop-nfs-2.4.0.jar 

-rw-r--r--. 1 hadoop hadoop  983042 Jun  6 19:36 libhadoop.a 

-rw-r--r--. 1 hadoop hadoop 1487284 Jun  6 19:36 libhadooppipes.a 

lrwxrwxrwx. 1 hadoop hadoop      18 Jun  6 19:42 libhadoop.so -> libhadoop.so.1.0.0 

-rwxr-xr-x. 1 hadoop hadoop  586664 Jun  6 19:36 libhadoop.so.1.0.0 

-rw-r--r--. 1 hadoop hadoop  582040 Jun  6 19:36 libhadooputils.a 

-rw-r--r--. 1 hadoop hadoop  298178 Jun  6 19:36 libhdfs.a 

lrwxrwxrwx. 1 hadoop hadoop      16 Jun  6 19:42 libhdfs.so -> libhdfs.so.0.0.0 

-rwxr-xr-x. 1 hadoop hadoop  200026 Jun  6 19:36 libhdfs.so.0.0.0 

drwxrwxr-x. 2 hadoop hadoop    4096 Jun  6 20:37 Linux-amd64-64 






-- Step 3.3 vi $HADOOP_HOME/etc/hadoop/core-site.xml
 -- (Note: fs.defaultFS is identical on both NameNodes; in fact core-site.xml is exactly the same on all five machines) 


<configuration> 

        <property> 

                <name>fs.defaultFS</name> 

                <value>hdfs://mycluster</value> 

        </property> 

        <property> 

                <name>dfs.ha.fencing.methods</name> 

                <value>sshfence</value> 

        </property> 

        <property> 

                <name>dfs.ha.fencing.ssh.private-key-files</name> 

                <value>/home/hadoop/.ssh/id_rsa_nn2</value> 

        </property> 

        <property> 

                <name>ha.zookeeper.quorum</name> 

                <value>funshion-hadoop194:2181,funshion-hadoop195:2181,funshion-hadoop196:2181,funshion-hadoop197:2181,funshion-hadoop198:2181</value> 

        </property> 


        <property> 

                <name>io.compression.codecs</name> 

                <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value> 

        </property> 

        <property> 

                <name>io.compression.codec.lzo.class</name> 

                <value>com.hadoop.compression.lzo.LzoCodec</value> 

        </property> 

        <property> 

                <name>io.file.buffer.size</name> 

                <value>131072</value> 

        </property> 

        <property> 

                <name>hadoop.tmp.dir</name> 

                <value>/home/hadoop/tmp</value> 

                <description>A base for other temporary directories.</description> 

        </property> 

        <property> 

                <name>hadoop.proxyuser.hadoop.hosts</name> 

                <value>*</value> 

        </property> 

        <property> 

                <name>hadoop.proxyuser.hadoop.groups</name> 

                <value>*</value> 

        </property> 

        <property> 

                <name>hadoop.native.lib</name> 

                <value>true</value> 

        </property> 

        <property> 

                <name>ha.zookeeper.session-timeout.ms</name> 

                <value>60000</value> 

                <description>ms</description> 

        </property> 

        <property> 

                <name>ha.failover-controller.cli-check.rpc-timeout.ms</name> 

                <value>60000</value> 

        </property> 

        <property> 

                <name>ipc.client.connect.timeout</name> 

                <value>20000</value> 

        </property> 

</configuration> 



-- Note: the id_rsa_nn2 value of dfs.ha.fencing.ssh.private-key-files is a private key (a copy of the id_rsa file under /home/hadoop/.ssh/, with permissions set to 600); see the sketch after the property below. 

        <property> 

                <name>dfs.ha.fencing.ssh.private-key-files</name> 

                <value>/home/hadoop/.ssh/id_rsa_nn2</value> 

        </property> 
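
-- A minimal sketch (paths taken from the note above) of preparing that key on each NameNode host: 

cp /home/hadoop/.ssh/id_rsa /home/hadoop/.ssh/id_rsa_nn2 
chmod 600 /home/hadoop/.ssh/id_rsa_nn2 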






-- Step 3.4 vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
 

<configuration> 

        <property> 

                <name>dfs.nameservices</name> 

                <value>mycluster</value> 

        </property> 

        <property> 

                <name>dfs.ha.namenodes.mycluster</name> 

                <value>nn1,nn2</value> 

        </property> 

        <property> 

                <name>dfs.namenode.rpc-address.mycluster.nn1</name> 

                <value>funshion-hadoop194:8020</value> 

        </property> 

        <property> 

                <name>dfs.namenode.rpc-address.mycluster.nn2</name> 

                <value>funshion-hadoop195:8020</value> 

        </property> 

        <property> 

                <name>dfs.namenode.servicerpc-address.mycluster.nn1</name> 

                <value>funshion-hadoop194:53310</value> 

        </property> 

        <property> 

                <name>dfs.namenode.servicerpc-address.mycluster.nn2</name> 

                <value>funshion-hadoop195:53310</value> 

        </property> 

        <property> 

                <name>dfs.namenode.http-address.mycluster.nn1</name> 

                <value>funshion-hadoop194:50070</value> 

        </property> 

        <property> 

                <name>dfs.namenode.http-address.mycluster.nn2</name> 

                <value>funshion-hadoop195:50070</value> 

        </property> 

        <property> 

                <name>dfs.namenode.shared.edits.dir</name> 

                <value>qjournal://funshion-hadoop194:8485;funshion-hadoop195:8485;funshion-hadoop196:8485;funshion-hadoop197:8485;funshion-hadoop198:8485/mycluster</value> 

        </property> 

        <property> 

                <name>dfs.journalnode.edits.dir</name> 

                <value>/home/hadoop/mydata/journal</value> 

        </property> 

        <property> 

                <name>dfs.client.failover.proxy.provider.mycluster</name> 

                <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value> 

        </property> 

        <property> 

                <name>dfs.ha.automatic-failover.enabled</name> 

                <value>true</value> 

        </property> 


        <property> 

                <name>dfs.namenode.name.dir</name> 

                <value>file:///home/hadoop/mydata/name</value> 

        </property> 

        <property> 

                <name>dfs.datanode.data.dir</name> 

                <value>file:///home/hadoop/mydata/data</value> 

        </property> 

        <property> 

                <name>dfs.replication</name> 

                <value>2</value> 

        </property> 

        <property> 

                <name>dfs.image.transfer.bandwidthPerSec</name> 

                <value>1048576</value> 

        </property> 

</configuration> 
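
-- A quick sanity check (a sketch, assuming the configuration above is in place) that the HA settings are being picked up: 

$HADOOP_HOME/bin/hdfs getconf -confKey dfs.nameservices            # expected: mycluster 
$HADOOP_HOME/bin/hdfs getconf -confKey dfs.ha.namenodes.mycluster  # expected: nn1,nn2 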






-- Step 3.5 vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
 

<configuration> 

        <property> 

                <name>mapreduce.jobhistory.address</name> 

                <value>funshion-hadoop194:10020</value> 

        </property> 

        <property> 

                <name>mapreduce.jobhistory.webapp.address</name> 

                <value>funshion-hadoop194:19888</value> 

        </property> 

        <property> 

                <name>mapreduce.map.output.compress</name> 

                <value>true</value> 

        </property> 

        <property> 

                <name>mapreduce.map.output.compress.codec</name> 

                <value>com.hadoop.compression.lzo.LzoCodec</value> 

        </property> 

        <property> 

                <name>mapred.child.env</name> 

                <value>LD_LIBRARY_PATH=/usr/local/hadoop/lib/native</value> 

        </property> 

        <property> 

                <name>mapred.child.java.opts</name> 

                <value>-Xmx2048m</value> 

        </property> 

        <property> 

                <name>mapred.reduce.child.java.opts</name> 

                <value>-Xmx2048m</value> 

        </property> 

        <property> 

                <name>mapred.map.child.java.opts</name> 

                <value>-Xmx2048m</value> 

        </property> 

        <property> 

                <name>mapred.remote.os</name> 

                <value>Linux</value> 

                <description>Remote MapReduce framework's OS, can be either Linux or Windows</description> 

        </property> 

</configuration> 


-- Note: 1. Property names starting with mapred. are the deprecated form; prefer the mapreduce. form. 

            For example, mapred.compress.map.output should now be written as mapreduce.map.output.compress. 

            See http://hadoop.apache.org/docs/r2 ... /mapred-default.xml for the full list. 

      That said, a few property names apparently have not been renamed, e.g. mapred.child.java.opts and mapred.child.env. 


-- Note: /usr/local/hadoop/lib/native contains the following: 

[hadoop@funshion-hadoop194 sbin]$ ls -l /usr/local/hadoop/lib/native 

total 12732 

-rw-r--r-- 1 hadoop hadoop 2850900 Jun 20 19:22 hadoop-common-2.4.0.jar 

-rw-r--r-- 1 hadoop hadoop 1509411 Jun 20 19:22 hadoop-common-2.4.0-tests.jar 

-rw-r--r-- 1 hadoop hadoop  178559 Jun 20 18:38 hadoop-lzo-0.4.20-SNAPSHOT.jar 

-rw-r--r-- 1 hadoop hadoop 1407039 Jun 20 19:25 hadoop-yarn-common-2.4.0.jar 

-rw-r--r-- 1 hadoop hadoop  106198 Jun 20 18:37 libgplcompression.a 

-rw-r--r-- 1 hadoop hadoop    1124 Jun 20 18:37 libgplcompression.la 

-rwxr-xr-x 1 hadoop hadoop   69347 Jun 20 18:37 libgplcompression.so 

-rwxr-xr-x 1 hadoop hadoop   69347 Jun 20 18:37 libgplcompression.so.0 

-rwxr-xr-x 1 hadoop hadoop   69347 Jun 20 18:37 libgplcompression.so.0.0.0 

-rw-r--r-- 1 hadoop hadoop  983042 Jun 20 18:10 libhadoop.a 

-rw-r--r-- 1 hadoop hadoop 1487284 Jun 20 18:10 libhadooppipes.a 

lrwxrwxrwx 1 hadoop hadoop      18 Jun 20 18:27 libhadoop.so -> libhadoop.so.1.0.0 

-rwxr-xr-x 1 hadoop hadoop  586664 Jun 20 18:10 libhadoop.so.1.0.0 

-rw-r--r-- 1 hadoop hadoop  582040 Jun 20 18:10 libhadooputils.a 

-rw-r--r-- 1 hadoop hadoop  298178 Jun 20 18:10 libhdfs.a 

lrwxrwxrwx 1 hadoop hadoop      16 Jun 20 18:27 libhdfs.so -> libhdfs.so.0.0.0 

-rwxr-xr-x 1 hadoop hadoop  200026 Jun 20 18:10 libhdfs.so.0.0.0 

-rw-r--r-- 1 hadoop hadoop  906318 Jun 20 19:17 liblzo2.a 

-rwxr-xr-x 1 hadoop hadoop     929 Jun 20 19:17 liblzo2.la 

-rwxr-xr-x 1 hadoop hadoop  562376 Jun 20 19:17 liblzo2.so 

-rwxr-xr-x 1 hadoop hadoop  562376 Jun 20 19:17 liblzo2.so.2 

-rwxr-xr-x 1 hadoop hadoop  562376 Jun 20 19:17 liblzo2.so.2.0.0 






-- Step 3.6 vi $HADOOP_HOME/etc/hadoop/yarn-site.xml
 

<configuration> 

        <property> 

                <name>yarn.resourcemanager.connect.retry-interval.ms</name> 

                <value>60000</value> 

        </property> 

        <property> 

                <name>yarn.resourcemanager.ha.enabled</name> 

                <value>true</value> 

        </property> 

        <property> 

                <name>yarn.resourcemanager.cluster-id</name> 

                <value>rm-cluster</value> 

        </property> 

        <property> 

                <name>yarn.resourcemanager.ha.rm-ids</name> 

                <value>rm1,rm2</value> 

        </property> 

        <property> 

                <name>yarn.resourcemanager.ha.id</name> 

                <value>rm1</value> 

        </property> 

        <property> 

                <name>yarn.resourcemanager.hostname.rm1</name> 

                <value>funshion-hadoop194</value> 

        </property> 

        <property> 

                <name>yarn.resourcemanager.hostname.rm2</name> 

                <value>funshion-hadoop195</value> 

        </property> 

        <property> 

                <name>yarn.resourcemanager.recovery.enabled</name> 

                <value>true</value> 

        </property> 

        <property> 

                <name>yarn.resourcemanager.store.class</name> 

                <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value> 

        </property> 

        <property> 

                <name>yarn.resourcemanager.zk-address</name> 

                <value>funshion-hadoop194:2181,funshion-hadoop195:2181,funshion-hadoop196:2181,funshion-hadoop197:2181,funshion-hadoop198:2181</value> 

        </property> 

        <property> 

                <name>yarn.resourcemanager.address.rm1</name> 

                <value>${yarn.resourcemanager.hostname.rm1}:23140</value> 

        </property> 

        <property> 

                <name>yarn.resourcemanager.scheduler.address.rm1</name> 

                <value>${yarn.resourcemanager.hostname.rm1}:23130</value> 

        </property> 

        <property> 

                <name>yarn.resourcemanager.webapp.https.address.rm1</name> 

                <value>${yarn.resourcemanager.hostname.rm1}:23189</value> 

        </property> 

        <property> 

                <name>yarn.resourcemanager.webapp.address.rm1</name> 

                <value>${yarn.resourcemanager.hostname.rm1}:23188</value> 

        </property> 

        <property> 

                <name>yarn.resourcemanager.resource-tracker.address.rm1</name> 

                <value>${yarn.resourcemanager.hostname.rm1}:23125</value> 

        </property> 

        <property> 

                <name>yarn.resourcemanager.admin.address.rm1</name> 

                <value>${yarn.resourcemanager.hostname.rm1}:23141</value> 

        </property> 


        <property> 

                <name>yarn.resourcemanager.address.rm2</name> 

                <value>${yarn.resourcemanager.hostname.rm2}:23140</value> 

        </property> 

        <property> 

                <name>yarn.resourcemanager.scheduler.address.rm2</name> 

                <value>${yarn.resourcemanager.hostname.rm2}:23130</value> 

        </property> 

        <property> 

                <name>yarn.resourcemanager.webapp.https.address.rm2</name> 

                <value>${yarn.resourcemanager.hostname.rm2}:23189</value> 

        </property> 

        <property> 

                <name>yarn.resourcemanager.webapp.address.rm2</name> 

                <value>${yarn.resourcemanager.hostname.rm2}:23188</value> 

        </property> 

        <property> 

                <name>yarn.resourcemanager.resource-tracker.address.rm2</name> 

                <value>${yarn.resourcemanager.hostname.rm2}:23125</value> 

        </property> 

        <property> 

                <name>yarn.resourcemanager.admin.address.rm2</name> 

                <value>${yarn.resourcemanager.hostname.rm2}:23141</value> 

        </property> 


        <property> 

                <name>yarn.resourcemanager.scheduler.class</name> 

                <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value> 

        </property> 

        <property> 

                <name>yarn.scheduler.fair.allocation.file</name> 

                <value>${yarn.home.dir}/etc/hadoop/fairscheduler.xml</value> 

        </property> 

        <property> 

                <name>yarn.nodemanager.local-dirs</name> 

                <value>/home/hadoop/logs/yarn_local</value> 

        </property> 

        <property> 

                <name>yarn.nodemanager.log-dirs</name> 

                <value>/home/hadoop/logs/yarn_log</value> 

        </property> 

        <property> 

                <name>yarn.nodemanager.remote-app-log-dir</name> 

                <value>/home/hadoop/logs/yarn_remotelog</value> 

        </property> 

        <property> 

                <name>yarn.app.mapreduce.am.staging-dir</name> 

                <value>/home/hadoop/logs/yarn_userstag</value> 

        </property> 

        <property> 

                <name>mapreduce.jobhistory.intermediate-done-dir</name> 

                <value>/home/hadoop/logs/yarn_intermediatedone</value> 

        </property> 

        <property> 

                <name>mapreduce.jobhistory.done-dir</name> 

                <value>/var/lib/hadoop/dfs/yarn_done</value> 

        </property> 


        <property> 

                <name>yarn.log-aggregation-enable</name> 

                <value>true</value> 

        </property> 

        <property> 

                <name>yarn.nodemanager.resource.memory-mb</name> 

                <value>2048</value> 

        </property> 

        <property> 

                <name>yarn.nodemanager.vmem-pmem-ratio</name> 

                <value>4.2</value> 

        </property> 

        <property> 

                <name>yarn.nodemanager.resource.cpu-vcores</name> 

                <value>2</value> 

        </property> 

        <property> 

                <name>yarn.nodemanager.aux-services</name> 

                <value>mapreduce_shuffle</value> 

        </property> 

        <property> 

                <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name> 

                <value>org.apache.hadoop.mapred.ShuffleHandler</value> 

        </property> 

        <property> 

                <description>Classpath for typical applications.</description> 

                <name>yarn.application.classpath</name> 

                <value> 

                        $HADOOP_HOME/etc/hadoop, 

                        $HADOOP_HOME/share/hadoop/common/*, 

                        $HADOOP_HOME/share/hadoop/common/lib/*, 

                        $HADOOP_HOME/share/hadoop/hdfs/*, 

                        $HADOOP_HOME/share/hadoop/hdfs/lib/*, 

                        $HADOOP_HOME/share/hadoop/mapreduce/*, 

                        $HADOOP_HOME/share/hadoop/mapreduce/lib/*, 

                        $HADOOP_HOME/share/hadoop/yarn/*, 

                        $HADOOP_HOME/share/hadoop/yarn/lib/* 

                </value> 

        </property> 

</configuration> 



-- Note: of the two NameNode hosts, funshion-hadoop194 uses the configuration above as-is; 

--       on funshion-hadoop195 only one change is needed: set the yarn.resourcemanager.ha.id property to rm2 (see the sketch below). 
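
-- A minimal sketch of making that single change on funshion-hadoop195; it assumes the file was first copied there unchanged, and that <value>rm1</value> occurs only in the yarn.resourcemanager.ha.id property (true for the file above): 

ssh hadoop@funshion-hadoop195 \
    "sed -i 's#<value>rm1</value>#<value>rm2</value>#' /usr/local/hadoop/etc/hadoop/yarn-site.xml" 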






-- Step 3.7 vi $HADOOP_HOME/etc/hadoop/fairscheduler.xml
 

<?xml version="1.0"?> 

<allocations> 

        <queue name="news"> 

                <minResources>1024 mb, 1 vcores </minResources> 

                <maxResources>1536 mb, 1 vcores </maxResources> 

                <maxRunningApps>5</maxRunningApps> 

                <minSharePreemptionTimeout>300</minSharePreemptionTimeout> 

                <weight>1.0</weight> 

                <aclSubmitApps>root,yarn,search,hdfs</aclSubmitApps> 

        </queue> 

        <queue name="crawler"> 

                <minResources>1024 mb, 1 vcores</minResources> 

                <maxResources>1536 mb, 1 vcores</maxResources> 

        </queue> 

        <queue name="map"> 

                <minResources>1024 mb, 1 vcores</minResources> 

                <maxResources>1536 mb, 1 vcores</maxResources> 

        </queue> 

</allocations> 
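
-- For example (a sketch using the stock examples jar; the jar path and the root.news queue-name resolution are assumptions, not something verified here), a job can be submitted into the "news" queue defined above like this: 

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar \
    pi -Dmapreduce.job.queuename=root.news 10 100 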


-- ################################################################################## -- 


-- Sync the configuration to the other nodes: 

scp -r /usr/local/hadoop/etc/hadoop/* hadoop@funshion-hadoop195:/usr/local/hadoop/etc/hadoop/ 

scp -r /usr/local/hadoop/etc/hadoop/* hadoop@funshion-hadoop196:/usr/local/hadoop/etc/hadoop/ 

scp -r /usr/local/hadoop/etc/hadoop/* hadoop@funshion-hadoop197:/usr/local/hadoop/etc/hadoop/ 

scp -r /usr/local/hadoop/etc/hadoop/* hadoop@funshion-hadoop198:/usr/local/hadoop/etc/hadoop/ 








-- Step 4. Create the required directories
 

mkdir ~/logs 

mkdir ~/mydata 


-- Note: the subdirectories under mydata are created automatically; there is no need to create them. 


-- Create these two directories on every machine in the cluster, and sync all files under $HADOOP_HOME/etc/hadoop to each node (a sketch follows). 
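
-- A minimal sketch of doing both from funshion-hadoop194 (hostnames from the table above; assumes the passwordless SSH set up in Step 1): 

for host in funshion-hadoop195 funshion-hadoop196 funshion-hadoop197 funshion-hadoop198; do 
    ssh hadoop@${host} "mkdir -p ~/logs ~/mydata" 
    scp -r /usr/local/hadoop/etc/hadoop/* hadoop@${host}:/usr/local/hadoop/etc/hadoop/ 
done 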


-- ################################################################################## -- 






-- Step 5. Start ZooKeeper and the JournalNodes, then format and start the Hadoop cluster
 -- Step 5.1 Start ZooKeeper (the ZK ensemble is the five servers funshion-hadoop194, funshion-hadoop195, funshion-hadoop196, funshion-hadoop197, and funshion-hadoop198) 

[hadoop@funshion-hadoop194 bin]$ /usr/local/zookeeper/bin/zkServer.sh start 

[hadoop@funshion-hadoop195 bin]$ /usr/local/zookeeper/bin/zkServer.sh start 

[hadoop@funshion-hadoop196 bin]$ /usr/local/zookeeper/bin/zkServer.sh start 

[hadoop@funshion-hadoop197 bin]$ /usr/local/zookeeper/bin/zkServer.sh start 

[hadoop@funshion-hadoop198 bin]$ /usr/local/zookeeper/bin/zkServer.sh start 


-- You can check the status of each ZooKeeper node as follows: 

/usr/local/zookeeper/bin/zkServer.sh status 


-- Then, on one of the NameNode hosts, run the following command to initialize the HA state (znode) in ZooKeeper: 

[hadoop@funshion-hadoop194 bin]$ cd $HADOOP_HOME 

[hadoop@funshion-hadoop194 hadoop]$ ./bin/hdfs zkfc -formatZK 


-- Note: the commands to stop or restart ZooKeeper are similar: 

/usr/local/zookeeper/bin/zkServer.sh stop 

/usr/local/zookeeper/bin/zkServer.sh restart 
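
-- A small sketch (assumes the passwordless SSH from Step 1) for checking all five ZooKeeper nodes from one host: 

for host in funshion-hadoop194 funshion-hadoop195 funshion-hadoop196 funshion-hadoop197 funshion-hadoop198; do 
    echo "== ${host} ==" 
    ssh hadoop@${host} "/usr/local/zookeeper/bin/zkServer.sh status" 
done 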






-- Step 5.2 Start the JournalNode process (run separately on funshion-hadoop194, funshion-hadoop195, funshion-hadoop196, funshion-hadoop197, and funshion-hadoop198):
 [hadoop@funshion-hadoop194 bin]$ $HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode 

[hadoop@funshion-hadoop195 bin]$ $HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode 

[hadoop@funshion-hadoop196 bin]$ $HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode 

[hadoop@funshion-hadoop197 bin]$ $HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode 

[hadoop@funshion-hadoop198 bin]$ $HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode 


-- Step 5.3 Format and start the Hadoop cluster:
 -- Run on funshion-hadoop194: 

[hadoop@funshion-hadoop194 bin]$ $HADOOP_HOME/bin/hdfs namenode -format mycluster 

[hadoop@funshion-hadoop194 bin]$ $HADOOP_HOME/sbin/hadoop-daemon.sh start namenode 


-- After the previous step completes, run on funshion-hadoop195: 

[hadoop@funshion-hadoop195 bin]$ $HADOOP_HOME/bin/hdfs namenode -bootstrapStandby 

[hadoop@funshion-hadoop195 bin]$ $HADOOP_HOME/sbin/hadoop-daemon.sh start namenode 


-- After that, you can run $HADOOP_HOME/sbin/start-all.sh on one of the NameNodes to start the DataNodes and the YARN daemons. 


-- Because automatic failover is configured, you cannot manually switch the NameNodes between the active and standby roles. 


-- You can check each service's role with haadmin: 


[hadoop@funshion-hadoop194 lab]$ $HADOOP_HOME/bin/hdfs haadmin -getServiceState nn1 

standby 

[hadoop@funshion-hadoop194 lab]$ $HADOOP_HOME/bin/hdfs haadmin -getServiceState nn2 

active 

[hadoop@funshion-hadoop194 lab]$ 


-- From the following hdfs-site.xml settings, nn1 is the NameNode service on funshion-hadoop194 and nn2 is the one on funshion-hadoop195: 


        <property> 

                <name>dfs.namenode.rpc-address.mycluster.nn1</name> 

                <value>funshion-hadoop194:8020</value> 

        </property> 

        <property> 

                <name>dfs.namenode.rpc-address.mycluster.nn2</name> 

                <value>funshion-hadoop195:8020</value> 

        </property> 



-- So we can try killing nn2 (the NameNode process currently in the active state) and then check whether nn1's role changes: 

[hadoop@funshion-hadoop195 bin]$ jps 

3199 JournalNode 

3001 NameNode 

1161 QuorumPeerMain 

3364 DFSZKFailoverController 

4367 Jps 

[hadoop@funshion-hadoop195 bin]$ kill -9 3001 

[hadoop@funshion-hadoop195 bin]$ jps 

3199 JournalNode 

1161 QuorumPeerMain 

3364 DFSZKFailoverController 

4381 Jps 

[hadoop@funshion-hadoop195 bin]$ $HADOOP_HOME/bin/hdfs haadmin -getServiceState nn1 

active 

[hadoop@funshion-hadoop195 bin]$ $HADOOP_HOME/sbin/hadoop-daemon.sh start namenode 

starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-funshion-hadoop195.out 

[hadoop@funshion-hadoop195 bin]$ $HADOOP_HOME/bin/hdfs haadmin -getServiceState nn1 

active 

[hadoop@funshion-hadoop195 bin]$ $HADOOP_HOME/bin/hdfs haadmin -getServiceState nn2 

standby 


-- You can even reboot the host whose NameNode is active (a full OS reboot) and watch whether the standby NameNode is promoted to active correctly. 

-- You can go further and reboot the active NameNode's OS while jobs are running, to test how robust the two-node failover really is. A simple way to watch the roles during such a test is sketched below. 
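
-- A simple watch loop (uses only the haadmin command shown above) to run from another host during the test: 

while true; do 
    printf "nn1=%s  nn2=%s\n" \
        "$($HADOOP_HOME/bin/hdfs haadmin -getServiceState nn1 2>/dev/null)" \
        "$($HADOOP_HOME/bin/hdfs haadmin -getServiceState nn2 2>/dev/null)" 
    sleep 5 
done 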


-- Cluster web UI: 

-- http://funshion-hadoop194:50070/dfshealth.html#tab-overview 


-- ################################################################################## -- 





-- Step 6. Upload test data:
 





-- Step 6.1 Install wget, create the working directories, and create the shell script that uploads the data:
 [root@funshion-hadoop194 ~]# yum -y install wget 

[hadoop@funshion-hadoop194 ~]$  

[hadoop@funshion-hadoop194 ~]$ mkdir -p /home/hadoop/datateam/ghh/lab 

[hadoop@funshion-hadoop194 ~]$ mkdir -p /home/hadoop/log_catch/down 

[hadoop@funshion-hadoop194 ~]$ mkdir -p /home/hadoop/log_catch/put 

[hadoop@funshion-hadoop194 ~]$ mkdir -p /home/hadoop/log_catch/zip 

[hadoop@funshion-hadoop194 ~]$ vi /home/hadoop/datateam/ghh/lab/log_catch_hour_lzo.sh 


#!/bin/bash 


function f_show_info() 

{ 

        printf "%20s = %s\n" "$1" "$2" 

        return 0 

} 


function f_catch_all_day_log() 

{ 

        local str_date="" 

        local year="" 

        local month="" 

        local day="" 


        for(( str_date=${g_start_date};${str_date}<=${g_end_date}; str_date=$(date -d "+1 day ${str_date}" +%Y%m%d ) )) 

        do 

                year=$(date -d "${str_date}" +%Y ) 

                month=$(date -d "${str_date}" +%m ) 

                day=$(date -d "${str_date}" +%d ) 

                f_catch_all_log ${year} ${month} ${day} 

        done 

} 


function f_catch_all_log() 

{ 

        local year="$1" 

        local month="$2" 

        local day="$3" 

        local hour="" 

        local date_hour="" 

        local date_dir="" 

        local hdfs_dir="" 

        local g_hdfs_dir="" 

        local hdfs_file="" 

        local url="" 

        local i=0 

        local nRet=0 


        for(( i=${g_start_hour};i<=${g_end_hour};i++ )); 

        do 

                hour=$(printf "%02d" "$i") 

                date_hour="${year}${month}${day}${hour}" 

                date_dir="${year}/${month}/${day}" 

                hdfs_dir="${year}/${month}/${day}/${hour}" 

                g_hdfs_dir="${g_hdfs_path}/${hdfs_dir}" 

                hdfs_file="${g_hdfs_path}/${hdfs_dir}/BeiJing_YiZhuang_CTC_${date_hour}.lzo" 


                url="${g_url}/${date_dir}/BeiJing_YiZhuang_CTC_${date_hour}.gz" 

                f_show_info "url" "${url}" 

                f_show_info "hdfs" "${hdfs_file}" 

                f_catch_log "${url}" "${hdfs_file}" "${g_hdfs_dir}" 


                hdfs_file="${g_hdfs_path}/${hdfs_dir}/BeiJing_ShangDi_CNC_${date_hour}.lzo" 

                url="${g_url}/${date_dir}/BeiJing_ShangDi_CNC_${date_hour}.gz" 

                f_show_info "url" "${url}" 

                f_show_info "hdfs" "${hdfs_file}" 

                f_catch_log "${url}" "${hdfs_file}" "${g_hdfs_dir}" 

        done 


        return $nRet 

} 


function f_catch_log() 

{ 

        local tmp_name=$( uuidgen | sed 's/-/_/g' ) 

        local local_down_file="${g_local_down_path}/${tmp_name}" 

        local local_zip_file="${g_local_zip_path}/${tmp_name}" 

        local local_put_file="${g_local_put_path}/${tmp_name}" 

        local log_url="$1" 

        local hdfs_file="$2" 

        local nRet=0 


        if [[ 0 == $nRet ]];then 

                wget -O "${local_down_file}" "${log_url}" 

                nRet=$? 

        fi 


        if [[ 0 == $nRet ]];then 

                gzip -cd "${local_down_file}" | lzop -o "${local_zip_file}" 

                nRet=$? 

        fi 


#       if [[ 0 == $nRet ]];then 

#               gzip -cd "${local_down_file}" > "${local_zip_file}" 

#               nRet=$? 

#       fi 


        if [[ 0 == $nRet ]];then 

                mv "${local_zip_file}" "${local_put_file}" 

                hdfs dfs -mkdir -p "${g_hdfs_dir}" 

                hdfs dfs -put "${local_put_file}" "${hdfs_file}" 

                nRet=$? 

        fi 


        if [[ 0 == $nRet ]];then 

                hadoop jar /usr/local/hadoop/lib/native/hadoop-lzo-0.4.20-SNAPSHOT.jar com.hadoop.compression.lzo.LzoIndexer "${hdfs_file}" 

                nRet=$? 

        fi 


        rm -rf "${local_down_file}" "${local_put_file}" "${local_zip_file}" 


        return $nRet 

} 



-- 





# shell begins here 

# 


g_local_down_path="/home/hadoop/log_catch/down" 

g_local_zip_path="/home/hadoop/log_catch/zip" 

g_local_put_path="/home/hadoop/log_catch/put" 


g_start_date="" 

g_end_date="" 

g_start_hour=0 

g_end_hour=0 

g_hdfs_path="" 

g_url="" 


nRet=0 


if [[ 0 == $nRet ]];then 

        if [[ $# -ne 6 ]];then 

                f_show_info "cmd format" "sh ./log_catch.sh 'url' 'hdfs_path' 'start_date' 'end_date' 'start_hour' 'end_hour'" 

                nRet=1 

        else 

                g_url="$1" 

                g_hdfs_path="$2" 

                g_start_date="$3" 

                g_end_date="$4" 

                g_start_hour="$5" 

                g_end_hour="$6" 

        fi 

fi 


if [[ 0 == $nRet ]];then 

        f_catch_all_day_log 

        nRet=$? 

fi 


exit $nRet 






-- Step 6.2 Run the script to upload the data:
 [hadoop@funshion-hadoop194 ~]$ nohup sh /home/hadoop/datateam/ghh/lab/log_catch_hour_lzo.sh 'http://192.168.116.61:8081/website/pv/2' 'hdfs://mycluster/dw/logs/web/origin/pv/2' 20140524 20140525 0 23 & 


-- nohup sh /home/hadoop/datateam/ghh/lab/log_catch_hour_lzo.sh 'http://192.168.116.61:8081/website/pv/2' 'hdfs://mycluster/dw/logs/web/origin/pv/2' 20140525 20140525 3 23 & 


-- These scripts just pull log data from our company's Oxeye system, so you can skip this step. 
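
-- If you do run it, a quick way to check that an hour's .lzo files and their .index files landed (path assumed from the invocation above): 

hdfs dfs -ls hdfs://mycluster/dw/logs/web/origin/pv/2/2014/05/24/00 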


-- ################################################################################## -- 





-- Step 7. Install Hive (on the 196 machine; integrated with HBase, built from source)
 -- (Strictly speaking, it would probably make more sense to install HBase first and then Hive.) 


-- References: https://cwiki.apache.org/conflue ... iorto0.13onHadoop23 

          http://www.hadoopor.com/thread-5470-1-1.html 

          
 

          
 

          http://www.micmiu.com/bigdata/hive/hive-hbase-integration/ 


-- Download and build the source as follows: 

mkdir -p /opt/software/hive_src 

cd /opt/software/hive_src/ 

svn checkout  http://svn.apache.org/repos/asf/hive/trunk/  hive_trunk 

cd /opt/software/hive_src/hive_trunk 



-- After downloading, check pom.xml in the hive_trunk directory: the hadoop-23.version property already points to Hadoop 2.4.0, so nothing needs to be changed and we can build it directly: 

<hadoop-23.version>2.4.0</hadoop-23.version> 


-- If the versions are not what you want, you can override them on the command line (or edit the corresponding hadoop, hbase, and zookeeper versions in pom.xml). 

-- The version-related properties I ended up using are: 

    <hadoop-23.version>2.4.0</hadoop-23.version> 

    <hbase.hadoop1.version>0.98.3-hadoop1</hbase.hadoop1.version> 

    <hbase.hadoop2.version>0.98.3-hadoop2</hbase.hadoop2.version> 

    <zookeeper.version>3.4.6</zookeeper.version> 


-- Finally, start the build: 

cd /opt/software/hive_src/hive_trunk 

mvn clean package -DskipTests -Phadoop-2,dist 


[INFO] Hive .............................................. SUCCESS [  6.481 s] 

[INFO] Hive Ant Utilities ................................ SUCCESS [  4.427 s] 

[INFO] Hive Shims Common ................................. SUCCESS [  2.418 s] 

[INFO] Hive Shims 0.20 ................................... SUCCESS [  1.284 s] 

[INFO] Hive Shims Secure Common .......................... SUCCESS [  2.466 s] 

[INFO] Hive Shims 0.20S .................................. SUCCESS [  0.961 s] 

[INFO] Hive Shims 0.23 ................................... SUCCESS [  3.247 s] 

[INFO] Hive Shims ........................................ SUCCESS [  0.364 s] 

[INFO] Hive Common ....................................... SUCCESS [  5.259 s] 

[INFO] Hive Serde ........................................ SUCCESS [  7.428 s] 

[INFO] Hive Metastore .................................... SUCCESS [ 27.000 s] 

[INFO] Hive Query Language ............................... SUCCESS [ 51.924 s] 

[INFO] Hive Service ...................................... SUCCESS [  6.037 s] 

[INFO] Hive JDBC ......................................... SUCCESS [ 14.293 s] 

[INFO] Hive Beeline ...................................... SUCCESS [  1.406 s] 

[INFO] Hive CLI .......................................... SUCCESS [ 10.297 s] 

[INFO] Hive Contrib ...................................... SUCCESS [  1.418 s] 

[INFO] Hive HBase Handler ................................ SUCCESS [ 33.679 s] 

[INFO] Hive HCatalog ..................................... SUCCESS [  0.443 s] 

[INFO] Hive HCatalog Core ................................ SUCCESS [  8.040 s] 

[INFO] Hive HCatalog Pig Adapter ......................... SUCCESS [  1.795 s] 

[INFO] Hive HCatalog Server Extensions ................... SUCCESS [  2.007 s] 

[INFO] Hive HCatalog Webhcat Java Client ................. SUCCESS [  1.548 s] 

[INFO] Hive HCatalog Webhcat ............................. SUCCESS [ 11.718 s] 

[INFO] Hive HCatalog Streaming ........................... SUCCESS [  1.845 s] 

[INFO] Hive HWI .......................................... SUCCESS [  1.246 s] 

[INFO] Hive ODBC ......................................... SUCCESS [  0.626 s] 

[INFO] Hive Shims Aggregator ............................. SUCCESS [  0.192 s] 

[INFO] Hive TestUtils .................................... SUCCESS [  0.324 s] 

[INFO] Hive Packaging .................................... SUCCESS [01:21 min] 

[INFO] ------------------------------------------------------------------------ 

[INFO] BUILD SUCCESS 

[INFO] ------------------------------------------------------------------------ 

[INFO] Total time: 04:53 min 

[INFO] Finished at: 2014-06-22T11:58:05+08:00 

[INFO] Final Memory: 147M/1064M 

[INFO] ------------------------------------------------------------------------ 


-- The apache-hive-0.14.0-SNAPSHOT-bin.tar.gz file under /opt/software/hive_src/hive_trunk/packaging/target is the install package we need (this version has not been officially released yet). 






-- Step 7.1 Install MySQL (on the 194 machine) and create a database named hive in MySQL to hold the Hive metastore:
 

-- Install the following RPM packages: 

rpm -ivh MySQL-client-5.6.17-1.linux_glibc2.5.x86_64.rpm 

rpm -ivh MySQL-devel-5.6.17-1.linux_glibc2.5.x86_64.rpm 

rpm -ivh MySQL-embedded-5.6.17-1.linux_glibc2.5.x86_64.rpm 

rpm -e --nodeps mysql-libs-5.1.66-2.el6_3.x86_64 

rpm -ivh MySQL-server-5.6.17-1.linux_glibc2.5.x86_64.rpm 

rpm -ivh MySQL-shared-5.6.17-1.linux_glibc2.5.x86_64.rpm 

rpm -ivh MySQL-shared-compat-5.6.17-1.linux_glibc2.5.x86_64.rpm 

rpm -ivh MySQL-test-5.6.17-1.linux_glibc2.5.x86_64.rpm 



A RANDOM PASSWORD HAS BEEN SET FOR THE MySQL root USER ! 

You will find that password in '/root/.mysql_secret'. 


You must change that password on your first connect, 

no other statement but 'SET PASSWORD' will be accepted. 

See the manual for the semantics of the 'password expired' flag. 


Also, the account for the anonymous user has been removed. 


In addition, you can run: 


  /usr/bin/mysql_secure_installation 


which will also give you the option of removing the test database. 

This is strongly recommended for production servers. 


See the manual for more instructions. 


Please report any problems at  http://bugs.mysql.com/ 


The latest information about MySQL is available on the web at 


   http://www.mysql.com 


Support MySQL by buying support/licenses at  http://shop.mysql.com 


New default config file was created as /usr/my.cnf and 

will be used by default by the server when you start it. 

You may edit this file to change server settings 


-- View the random root password generated during installation: 

[root@funshion-hadoop194 ~]# more /root/.mysql_secret 

# The random password set for the root user at Mon Jun  9 18:18:48 2014 (local time): QVkyOjwSlAEiPaeT 


-- Log in to MySQL, change the root password, and create the hive database and user: 

[root@funshion-hadoop194 ~]# service mysql start 

Starting MySQL... SUCCESS!  


-- Enable the mysql service at boot: 

chkconfig mysql on 


[root@funshion-hadoop194 ~]# mysql -uroot -pQVkyOjwSlAEiPaeT 

Warning: Using a password on the command line interface can be insecure. 

Welcome to the MySQL monitor.  Commands end with ; or \g. 

Your MySQL connection id is 1 

Server version: 5.6.17 


Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved. 


Oracle is a registered trademark of Oracle Corporation and/or its 

affiliates. Other names may be trademarks of their respective 

owners. 


Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. 


mysql> SET PASSWORD = PASSWORD('bee56915'); 

Query OK, 0 rows affected (0.00 sec) 


mysql> flush privileges; 

Query OK, 0 rows affected (0.00 sec) 


mysql> CREATE DATABASE `hive` /*!40100 DEFAULT CHARACTER SET utf8 */; 

Query OK, 1 row affected (0.00 sec) 


mysql> CREATE USER 'hive'@'funshion-hadoop196' IDENTIFIED BY 'bee56915'; 

Query OK, 0 rows affected (0.00 sec) 



GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%' Identified by 'bee56915';  

GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'localhost' Identified by 'bee56915';  

GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'127.0.0.1' Identified by 'bee56915';   

GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'funshion-hadoop196' Identified by 'bee56915';  
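
-- A quick check (hostnames and password assumed from above; requires a mysql client on funshion-hadoop196) that the hive user can reach the metastore database: 

ssh hadoop@funshion-hadoop196 "mysql -h funshion-hadoop194 -uhive -pbee56915 -e 'SHOW DATABASES;'" 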






-- Step 7.2 Extract the Hive install package into /usr/local and add the Hive environment variables:
 [root@funshion-hadoop194 ~]# cd /opt/software 

[root@funshion-hadoop194 software]# ls -l|grep hive 

-rw-r--r--.  1 root root  65662469 May 15 14:04 hive-0.12.0-bin.tar.gz 

[root@funshion-hadoop194 software]# tar -xvf ./hive-0.12.0-bin.tar.gz 

[root@funshion-hadoop194 software]#  mv hive-0.12.0-bin /usr/local 

[root@funshion-hadoop194 software]# cd /usr/local 

[root@funshion-hadoop194 local]# chown -R hadoop.hadoop ./hive-0.12.0-bin 

[root@funshion-hadoop194 local]# ln -s hive-0.12.0-bin hive 


[hadoop@funshion-hadoop194 local]$ vi ~/.bash_profile 

export HIVE_HOME=/usr/local/hive 

export PATH=$PATH:$HIVE_HOME/bin 


[hadoop@funshion-hadoop194 local]$ source ~/.bash_profile 






-- Step 7.3 Run the Hive metastore schema creation script in the hive database in MySQL:
 

[hadoop@funshion-hadoop194 mysql]$ mysql -uroot -pbee56915 

Warning: Using a password on the command line interface can be insecure. 

Welcome to the MySQL monitor.  Commands end with ; or \g. 

Your MySQL connection id is 3 

Server version: 5.6.17 MySQL Community Server (GPL) 


Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved. 


Oracle is a registered trademark of Oracle Corporation and/or its 

affiliates. Other names may be trademarks of their respective 

owners. 


Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. 


mysql> use hive; 

Database changed 

mysql> source /usr/local/hive/scripts/metastore/upgrade/mysql/hive-schema-0.14.0.mysql.sql 


... 


mysql> show tables; 

+---------------------------+ 

| Tables_in_hive            | 

+---------------------------+ 

| BUCKETING_COLS            | 

| CDS                       | 

| COLUMNS_V2                | 

| DATABASE_PARAMS           | 

| DBS                       | 

| DB_PRIVS                  | 

| DELEGATION_TOKENS         | 

| GLOBAL_PRIVS              | 

| IDXS                      | 

| INDEX_PARAMS              | 

| MASTER_KEYS               | 

| NUCLEUS_TABLES            | 

| PARTITIONS                | 

| PARTITION_EVENTS          | 

| PARTITION_KEYS            | 

| PARTITION_KEY_VALS        | 

| PARTITION_PARAMS          | 

| PART_COL_PRIVS            | 

| PART_COL_STATS            | 

| PART_PRIVS                | 

| ROLES                     | 

| ROLE_MAP                  | 

| SDS                       | 

| SD_PARAMS                 | 

| SEQUENCE_TABLE            | 

| SERDES                    | 

| SERDE_PARAMS              | 

| SKEWED_COL_NAMES          | 

| SKEWED_COL_VALUE_LOC_MAP  | 

| SKEWED_STRING_LIST        | 

| SKEWED_STRING_LIST_VALUES | 

| SKEWED_VALUES             | 

| SORT_COLS                 | 

| TABLE_PARAMS              | 

| TAB_COL_STATS             | 

| TBLS                      | 

| TBL_COL_PRIVS             | 

| TBL_PRIVS                 | 

| TYPES                     | 

| TYPE_FIELDS               | 

| VERSION                   | 

+---------------------------+ 

41 rows in set (0.00 sec) 


mysql> grant all privileges on hive.* to 'hive'@'funshion-hadoop196'; 

Query OK, 0 rows affected (0.00 sec) 


mysql> flush privileges; 

Query OK, 0 rows affected (0.00 sec) 


mysql> exit 

Bye 

[hadoop@funshion-hadoop194 mysql]$  






-- Step 7.4 Edit the Hive configuration files:
 

[hadoop@funshion-hadoop194 mysql]$ cd $HIVE_HOME/conf 

[hadoop@funshion-hadoop194 conf]$ ls -l 

total 92 

-rw-rw-r--. 1 hadoop hadoop 81186 Oct 10  2013 hive-default.xml.template 

-rw-rw-r--. 1 hadoop hadoop  2378 Oct 10  2013 hive-env.sh.template 

-rw-rw-r--. 1 hadoop hadoop  2465 Oct 10  2013 hive-exec-log4j.properties.template 

-rw-rw-r--. 1 hadoop hadoop  2870 Oct 10  2013 hive-log4j.properties.template 


[hadoop@funshion-hadoop194 conf]$ mv hive-env.sh.template hive-env.sh 

[hadoop@funshion-hadoop194 conf]$ mv hive-default.xml.template hive-site.xml 


---------------------------------------- 

-- 7.4.1 Edit $HIVE_HOME/bin/hive-config.sh and add the following environment variables:
 [hadoop@funshion-hadoop194 conf]$ vi $HIVE_HOME/bin/hive-config.sh 


export JAVA_HOME=/usr/java/latest 

export HIVE_HOME=/usr/local/hive 

export HADOOP_HOME=/usr/local/hadoop 






-- 7.4.2 Fix line 2002 of $HIVE_HOME/conf/hive-site.xml:
 Without this fix Hive reports an error, because the template ships a malformed closing tag; edit hive-site.xml to fix it (in vi you can find it with /auth). 


-- Original: 

<value>auth</auth> 


-- Change it to: 

<value>auth</value> 





-- 7.4.3 Modify the following properties in $HIVE_HOME/conf/hive-site.xml:
 

-- 7.4.3.1 

-- Original: 

<property> 

  <name>javax.jdo.option.ConnectionURL</name> 

  <value>jdbc:derby:;databaseName=metastore_db;create=true</value> 

  <description>JDBC connect string for a JDBC metastore</description> 

</property> 


-- Change to: 

<property> 

  <name>javax.jdo.option.ConnectionURL</name> 

  <value>jdbc:mysql://funshion-hadoop194:3306/hive?createDatabaseIfNotExist=true</value> 

  <description>JDBC connect string for a JDBC metastore</description> 

</property> 



-- 7.4.3.2 

-- Original: 

<property> 

  <name>javax.jdo.option.ConnectionDriverName</name> 

  <value>org.apache.derby.jdbc.EmbeddedDriver</value> 

  <description>Driver class name for a JDBC metastore</description> 

</property> 


-- Change to: 

<property> 

  <name>javax.jdo.option.ConnectionDriverName</name> 

  <value>com.mysql.jdbc.Driver</value> 

  <description>Driver class name for a JDBC metastore</description> 

</property> 


-- 7.4.3.3 

-- Original: 

<property> 

  <name>javax.jdo.option.ConnectionUserName</name> 

  <value>APP</value> 

  <description>username to use against metastore database</description> 

</property> 


-- Change to: 

<property> 

  <name>javax.jdo.option.ConnectionUserName</name> 

  <value>hive</value> 

  <description>username to use against metastore database</description> 

</property> 



-- 7.4.3.4 

-- Original: 

<property> 

  <name>javax.jdo.option.ConnectionPassword</name> 

  <value>mine</value> 

  <description>password to use against metastore database</description> 

</property> 


-- Change to: 

<property> 

  <name>javax.jdo.option.ConnectionPassword</name> 

  <value>bee56915</value> 

  <description>password to use against metastore database</description> 

</property> 


-- 7.4.3.5 

-- Original: 

<property> 

  <name>hive.metastore.warehouse.dir</name> 

  <value>/user/hive/warehouse</value> 

  <description>location of default database for the warehouse</description> 

</property> 


-- Change to: 

<property> 

  <name>hive.metastore.warehouse.dir</name> 

  <value>hdfs://mycluster:8020/user/hive/warehouse</value> 

  <description>location of default database for the warehouse</description> 

</property> 


-- 7.4.3.6 

-- Original: 

<property> 

  <name>hive.exec.scratchdir</name> 

  <value>/tmp/hive-${user.name}</value> 

  <description>Scratch space for Hive jobs</description> 

</property> 


-- Change to: 

<property> 

  <name>hive.exec.scratchdir</name> 

  <value>hdfs://mycluster:8020/tmp/hive-${user.name}</value> 

  <description>Scratch space for Hive jobs</description> 

</property> 



-- Also add: 

<property> 

  <name>hbase.zookeeper.quorum</name> 

  <value>funshion-hadoop194,funshion-hadoop195,funshion-hadoop196,funshion-hadoop197,funshion-hadoop198</value> 

</property> 


<property> 

<name>hive.aux.jars.path</name> 

  <value> 

file:///usr/local/hive/lib/hive-ant-0.14.0-SNAPSHOT.jar, 

file:///usr/local/hive/lib/protobuf-java-2.5.0.jar, 

file:///usr/local/hbase/lib/hbase-server-0.98.3-hadoop2.jar, 

file:///usr/local/hbase/lib/hbase-client-0.98.3-hadoop2.jar, 

file:///usr/local/hbase/lib/hbase-common-0.98.3-hadoop2.jar, 

file:///usr/local/hbase/lib/hbase-common-0.98.3-hadoop2-tests.jar, 

file:///usr/local/hbase/lib/hbase-protocol-0.98.3-hadoop2.jar, 

file:///usr/local/hbase/lib/htrace-core-2.04.jar, 

file:///usr/local/hive/lib/zookeeper-3.4.6.jar, 

file:///usr/local/hive/lib/guava-11.0.2.jar</value> 

</property> 


-- The layout above is just for readability; in practice use the form below, with all the jars on a single line: 

<property> 

<name>hive.aux.jars.path</name> 

  <value>file:///usr/local/hive/lib/hive-ant-0.14.0-SNAPSHOT.jar,file:///usr/local/hbase/lib/hbase-server-0.98.3-hadoop2.jar,file:///usr/local/hbase/lib/hbase-client-0.98.3-hadoop2.jar,file:///usr/local/hbase/lib/hbase-common-0.98.3-hadoop2.jar,file:///usr/local/hbase/lib/hbase-common-0.98.3-hadoop2-tests.jar,file:///usr/local/hbase/lib/hbase-protocol-0.98.3-hadoop2.jar,file:///usr/local/hbase/lib/htrace-core-2.04.jar,file:///usr/local/hive/lib/zookeeper-3.4.6.jar</value> 

</property> 


-- First, replace the HBase jars under hive/lib with the ones from your installed HBase; the following jars are needed: 

hbase-client-0.98.2-hadoop2.jar 

hbase-common-0.98.2-hadoop2.jar 

hbase-common-0.98.2-hadoop2-tests.jar 

hbase-protocol-0.98.2-hadoop2.jar 

htrace-core-2.04.jar 

hbase-server-0.98.2-hadoop2.jar 


Also make sure the cluster's ZooKeeper nodes are listed in hive-site.xml (as already added above): 

<property> 

<name>hbase.zookeeper.quorum</name> 

<value>(all node hostnames, comma-separated)</value> 

</property> 



-- In addition, before creating any Hive databases or tables, you must create /tmp and /user/hive/warehouse on HDFS (the latter is the directory specified by hive.metastore.warehouse.dir) and set both to chmod g+w. The commands are: 

$ $HADOOP_HOME/bin/hadoop fs -mkdir /tmp 

$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse 

$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp 

$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse 





-- Step 7.5 Start and log in to Hive, and create a Hive table
 

14/06/16 18:58:50 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead 


-- Starting the server: 

# bin/hive --service hiveserver -hiveconf hbase.zookeeper.quorum=funshion-hadoop194,funshion-hadoop195,funshion-hadoop196,funshion-hadoop197,funshion-hadoop198 & 

# bin/hive -hiveconf hbase.zookeeper.quorum=funshion-hadoop194,funshion-hadoop195,funshion-hadoop196,funshion-hadoop197,funshion-hadoop198 & 

# bin/hive -hiveconf hive.root.logger=DEBUG,console -hiveconf hbase.master=funshion-hadoop194:60010 


# bin/hive -hiveconf hbase.master=funshion-hadoop194:60010 --auxpath /usr/local/hive/lib/hive-ant-0.13.1.jar,/usr/local/hive/lib/protobuf-java-2.5.0.jar,/usr/local/hive/lib/hbase-client-0.98.3-hadoop2.jar, \ 

/usr/local/hive/lib/hbase-common-0.98.3-hadoop2.jar,/usr/local/hive/lib/zookeeper-3.4.6.jar,/usr/local/hive/lib/guava-11.0.2.jar 


#bin/hive -hiveconf hbase.zookeeper.quorum=node1,node2,node3 


-- Client login: 

$HIVE_HOME/bin/hive -h127.0.0.1 -p10000 

$HIVE_HOME/bin/hive -hfunshion-hadoop194 -p10000 

$HIVE_HOME/bin/hive -p10000 


[hadoop@funshion-hadoop194 lib]$ hive --service hiveserver &  


[hadoop@funshion-hadoop194 lib]$ hive 

14/06/10 16:56:59 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive 

14/06/10 16:56:59 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize 

14/06/10 16:56:59 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize 

14/06/10 16:56:59 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack 

14/06/10 16:56:59 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node 

14/06/10 16:56:59 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces 

14/06/10 16:56:59 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative 


Logging initialized using configuration in jar:file:/usr/local/hive-0.12.0-bin/lib/hive-common-0.12.0.jar!/hive-log4j.properties 

SLF4J: Class path contains multiple SLF4J bindings. 

SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.4.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] 

SLF4J: Found binding in [jar:file:/usr/local/hive-0.12.0-bin/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] 

SLF4J: See  http://www.slf4j.org/codes.html#multiple_bindings  for an explanation. 

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 

hive> show databases; 

OK 

Failed with exception java.io.IOException:java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork! 


-- If you hit an error like the one above, add the following environment variables to ~/.bash_profile: 


export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64 


export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/lib/native/hadoop-lzo-0.4.20-SNAPSHOT.jar 


-- Hive client login: 

[hadoop@funshion-hadoop194 bin]$  $HIVE_HOME/bin/hive -h127.0.0.1 -p10000 

14/06/10 17:13:17 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive 

14/06/10 17:13:17 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize 

14/06/10 17:13:17 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize 

14/06/10 17:13:17 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack 

14/06/10 17:13:17 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node 

14/06/10 17:13:17 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces 

14/06/10 17:13:17 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative 


Logging initialized using configuration in jar:file:/usr/local/hive-0.12.0-bin/lib/hive-common-0.12.0.jar!/hive-log4j.properties 

SLF4J: Class path contains multiple SLF4J bindings. 

SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.4.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] 

SLF4J: Found binding in [jar:file:/usr/local/hive-0.12.0-bin/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] 

SLF4J: See  http://www.slf4j.org/codes.html#multiple_bindings  for an explanation. 

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 

[127.0.0.1:10000] hive> create database web; 

OK 

[127.0.0.1:10000] hive> CREATE EXTERNAL TABLE pv2( 

                      >   protocol string,  

                      >   rprotocol string,  

                      >   time int,  

                      >   ip string,  

                      >   fck string,  

                      >   mac string,  

                      >   userid string,  

                      >   fpc string,  

                      >   version string,  

                      >   sid string,  

                      >   pvid string,  

                      >   config string,  

                      >   url string,  

                      >   referurl string,  

                      >   channelid string,  

                      >   vtime string,  

                      >   ext string,  

                      >   useragent string,  

                      >   step string,  

                      >   sestep string,  

                      >   seidcount string,  

                      >   ta string) 

                      > PARTITIONED BY (  

                      >   year string,  

                      >   month string,  

                      >   day string, 

                      >   hour string) 

                      > ROW FORMAT DELIMITED  

                      >   FIELDS TERMINATED BY '\t'  

                      > STORED AS INPUTFORMAT  

                      >   'com.hadoop.mapred.DeprecatedLzoTextInputFormat'  

                      > OUTPUTFORMAT  

                      >   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 

                      > LOCATION 

                      >   'hdfs://mycluster/dw/logs/web/origin/pv/2'; 

OK 

[127.0.0.1:10000] hive> desc pv2; 

OK 

protocol                    string                      None                 

rprotocol                   string                      None                 

time                        int                         None                 

ip                          string                      None                 

fck                         string                      None                 

mac                         string                      None                 

userid                      string                      None                 

fpc                         string                      None                 

version                     string                      None                 

sid                         string                      None                 

pvid                        string                      None                 

config                      string                      None                 

url                         string                      None                 

referurl                    string                      None                 

channelid                   string                      None                 

vtime                       string                      None                 

ext                         string                      None                 

useragent                   string                      None                 

step                        string                      None                 

sestep                      string                      None                 

seidcount                   string                      None                 

ta                          string                      None                 

year                        string                      None                 

month                       string                      None                 

day                         string                      None                 

hour                        string                      None                 

                   

# Partition Information                   

# col_name                    data_type                   comment              

                   

year                        string                      None                 

month                       string                      None                 

day                         string                      None                 

hour                        string                      None                 

[127.0.0.1:10000] hive>   


-- After creating the web database, a new web.db directory appears under hdfs://mycluster/user/hive/warehouse: 


[hadoop@funshion-hadoop194 bin]$ hdfs dfs -ls hdfs://mycluster/user/hive/warehouse 

Found 1 items 

drwxr-xr-x   - hadoop supergroup          0 2014-06-10 17:13 hdfs://mycluster/user/hive/warehouse/web.db 





-- Step 7.5 Add partitions to the Hive table (with the HA nameservice "mycluster", the LOCATION must not include an RPC port; a loop sketch for adding all 24 hourly partitions follows the examples below)
 ALTER TABLE pv2 ADD PARTITION(year = '2014', month = '05', day = '24', hour='00') 

LOCATION 'hdfs://mycluster/dw/logs/web/origin/pv/2/2014/05/24/00'; 


ALTER TABLE pv2 ADD PARTITION(year = '2014', month = '05', day = '24', hour='01') 

LOCATION 'hdfs://mycluster/dw/logs/web/origin/pv/2/2014/05/24/01'; 


ALTER TABLE pv2 ADD PARTITION(year = '2014', month = '05', day = '24', hour='02') 

LOCATION 'hdfs://mycluster/dw/logs/web/origin/pv/2/2014/05/24/02'; 


select * from pv2 limit 10; 

select count(*) from pv2 where year='2014' and month='05' and day='24' and hour='00'; 
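-- Adding partitions hour by hour gets repetitive; a minimal sketch that loops over all 24 hours of one day with hive -e (assumes the corresponding HDFS directories already exist):

for h in $(seq -w 0 23); do
  hive -e "USE web; ALTER TABLE pv2 ADD IF NOT EXISTS PARTITION(year='2014', month='05', day='24', hour='$h') LOCATION 'hdfs://mycluster/dw/logs/web/origin/pv/2/2014/05/24/$h';"
done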


-- ###################################################################################################### -- 


------------------------- 

-- References for building and installing HBase from source: 



-- Synchronize clocks across the nodes 

yum install ntpdate 


crontab -e 

30 5 * * * cd /usr/sbin;./ntpdate 192.168.111.17>/dev/null 
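-- To spot-check that the five nodes really stay in sync (a sketch; assumes passwordless ssh for the hadoop user):

for h in funshion-hadoop194 funshion-hadoop195 funshion-hadoop196 funshion-hadoop197 funshion-hadoop198; do
  echo -n "$h: "; ssh $h date +%s   # epoch seconds should differ by at most a second or two
done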


http://www.micmiu.com/bigdata/hbase/hbase-build-for-hadoop2/ 

http://blog.chinaunix.net/xmlrpc ... 4212789&uid=9162199 

http://hbase.apache.org/book/configuration.html#ftn.d3246e665 

http://www.myexception.cn/open-source/1472081.html 


-- mvn -f pom.xml.hadoop2 install -DskipTests assembly:single -Prelease 


select year(to_date('2011-12-08 10:03:01')) from userinfo limit 1; 

select hour('2011-12-08 10:03:01') from userinfo limit 1; 

--------------------------------------------------------------------------------- 

2. HBase Compile 


In hbase-0.98.3/pom.xml, change the Hadoop version from 2.2.0 to 2.4.0. 


Modify the following two places: 

tar -xvf  

-- 1. 

<hadoop-two.version>2.4.0</hadoop-two.version> 


-- 2. 

<artifactId>hadoop-common</artifactId> 


<version>2.4.0</version> 



mvn clean package -DskipTests 

However, that did not produce a binary package; after digging through the docs, the proper packaging procedure is as follows. 

HBase can be packaged against either hadoop1 or hadoop2; we need hadoop2, so first generate the pom.xml.hadoop2 file and then build: 


bash ./dev-support/generate-hadoopX-poms.sh 0.98.3 0.98.3-hadoop2 

MAVEN_OPTS="-Xmx3g" mvn -f pom.xml.hadoop2 clean install -DskipTests -Prelease 

MAVEN_OPTS="-Xmx3g" mvn -f pom.xml.hadoop2 install -DskipTests site assembly:single -Prelease 



The build takes ten-odd minutes and the build tree grows to 100+ MB; the packaged tarball ends up at: 

hbase-0.98.3/hbase-assembly/target/hbase-0.98.3-hadoop2-bin.tar.gz 



-- If you see download output like the following (Hadoop 2.4.0 artifacts being pulled), the version change took effect: 


Downloading:  https://repository.apache.org/co ... /httpcore-4.2.4.pom 

Downloaded:  https://repository.apache.org/co ... /httpcore-4.2.4.pom  (6 KB at 0.7 KB/sec) 

Downloading:  https://repository.apache.org/co ... ents-core-4.2.4.pom 

Downloaded:  https://repository.apache.org/co ... ents-core-4.2.4.pom  (12 KB at 12.8 KB/sec) 

Downloading:  https://repository.apache.org/co ... ient-core-2.4.0.pom 

Downloaded:  https://repository.apache.org/co ... ient-core-2.4.0.pom  (4 KB at 1.9 KB/sec) 

Downloading:  https://repository.apache.org/co ... ce-client-2.4.0.pom 

Downloaded:  https://repository.apache.org/co ... ce-client-2.4.0.pom  (7 KB at 5.5 KB/sec) 

Downloading:  https://repository.apache.org/co ... rn-common-2.4.0.pom 

Downloaded:  https://repository.apache.org/co ... rn-common-2.4.0.pom  (9 KB at 7.8 KB/sec) 

Downloading:  https://repository.apache.org/co ... doop-yarn-2.4.0.pom 

Downloaded:  https://repository.apache.org/co ... doop-yarn-2.4.0.pom  (4 KB at 3.4 KB/sec) 

Downloading:  https://repository.apache.org/co ... -yarn-api-2.4.0.pom 

Downloaded:  https://repository.apache.org/co ... -yarn-api-2.4.0.pom  (5 KB at 5.1 KB/sec) 

Downloading:  https://repository.apache.org/co ... ient-core-2.4.0.jar 

Downloading:  https://repository.apache.org/co ... rn-common-2.4.0.jar 

Downloading:  https://repository.apache.org/co ... -yarn-api-2.4.0.jar 

Downloaded:  https://repository.apache.org/co ... ient-core-2.4.0.jar  (1459 KB at 370.7 KB/sec) 

Downloaded:  https://repository.apache.org/co ... -yarn-api-2.4.0.jar  (1601 KB at 367.9 KB/sec) 

Downloaded:  https://repository.apache.org/co ... rn-common-2.4.0.jar  (1375 KB at 307.4 KB/sec) 

[INFO]  
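-- A sketch of deploying the built tarball (the /usr/local install prefix matches the rest of this setup; paths and ownership are assumptions, adjust to your layout):

tar -zxf hbase-assembly/target/hbase-0.98.3-hadoop2-bin.tar.gz -C /usr/local/
ln -s /usr/local/hbase-0.98.3-hadoop2 /usr/local/hbase
chown -R hadoop:hadoop /usr/local/hbase-0.98.3-hadoop2    # run as root if needed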



-- vi hbase-site.xml 


<configuration> 

        <property> 

                <name>hbase.master</name> 

                <value>funshion-hadoop196:60010</value> 

        </property> 

        <property> 

                <name>hbase.rootdir</name> 

                <value>hdfs://mycluster/user/hbase</value> 

        </property> 

        <property> 

                <name>hbase.cluster.distributed</name> 

                <value>true</value> 

        </property> 

        <property> 

                <name>hbase.zookeeper.property.clientPort</name> 

                <value>2181</value> 

        </property> 

        <property> 

                <name>hbase.zookeeper.quorum</name> 

                <value>funshion-hadoop194,funshion-hadoop195,funshion-hadoop196,funshion-hadoop197,funshion-hadoop198</value> 

        </property> 

        <property> 

                <name>hbase.tmp.dir</name> 

                <value>/home/hadoop/tmp/hbase</value> 

        </property> 

</configuration> 
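-- The same hbase-site.xml (plus conf/regionservers) has to land on every node; a minimal sketch, assuming HBase is installed at the same $HBASE_HOME path everywhere and passwordless ssh is set up:

for h in funshion-hadoop195 funshion-hadoop196 funshion-hadoop197 funshion-hadoop198; do
  scp $HBASE_HOME/conf/hbase-site.xml $HBASE_HOME/conf/regionservers $h:$HBASE_HOME/conf/
done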



-- vi hbase-env.sh 


export JAVA_HOME=/usr/java/latest 
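-- Since ZooKeeper here is an independently managed quorum (QuorumPeerMain runs on every node), hbase-env.sh usually also needs the following line; this is an assumption based on the setup above, not something shown in the original file:

export HBASE_MANAGES_ZK=false   # HBase must not start/stop its own ZooKeeper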


-- Log in to funshion-hadoop196 (the HBase Master node) 

cd $HBASE_HOME/bin 

./start-hbase.sh 

-- Then check whether the HMaster process has started: 

[hadoop@funshion-hadoop196 bin]$ jps 

4574 HMaster 

1443 QuorumPeerMain 

3420 NameNode 

3896 ResourceManager 

5134 Jps 

3623 JournalNode 

4972 Main 

3806 DFSZKFailoverController 


-- On the other regionserver nodes, check whether the HRegionServer process has started: 

[hadoop@funshion-hadoop195 logs]$ jps 

1267 QuorumPeerMain 

1964 JournalNode 

2345 Jps 

1891 NameNode 

2089 DFSZKFailoverController 


-- As shown above, the HRegionServer process is not running; run the following command to start it (same for the other nodes): 

./hbase-daemon.sh start regionserver 


[hadoop@funshion-hadoop195 bin]$ jps 

2703 Jps 

1267 QuorumPeerMain 

2398 HRegionServer 

1964 JournalNode 

1891 NameNode 

2089 DFSZKFailoverController 


-- If HRegionServer on the other nodes still fails to start, copy hdfs-site.xml and core-site.xml from $HADOOP_HOME/etc/hadoop into hbase/conf (a copy-loop sketch follows below). 
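-- A sketch of that copy across all five nodes (same assumptions as before: identical install paths everywhere and passwordless ssh):

for h in funshion-hadoop194 funshion-hadoop195 funshion-hadoop196 funshion-hadoop197 funshion-hadoop198; do
  scp $HADOOP_HOME/etc/hadoop/hdfs-site.xml $HADOOP_HOME/etc/hadoop/core-site.xml $h:$HBASE_HOME/conf/
done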


--------------------------------------------------------------------------------- 

-- Log in and test that HBase works correctly: 

[hadoop@funshion-hadoop196 logs]$ hbase shell 

2014-06-16 11:03:24,416 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 

2014-06-16 11:03:24,494 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 

2014-06-16 11:03:24,560 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 

2014-06-16 11:03:24,636 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 

2014-06-16 11:03:24,686 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 

HBase Shell; enter 'help<RETURN>' for list of supported commands. 

Type "exit<RETURN>" to leave the HBase Shell 

Version 0.98.3-hadoop2, rUnknown, Thu Jun 12 16:40:37 CST 2014 


hbase(main):001:0> status 

4 servers, 0 dead, 0.5000 average load 


hbase(main):002:0> create 'testtable','colfam1' 

0 row(s) in 3.4710 seconds 


=> Hbase::Table - testtable 

hbase(main):003:0> list 'testtable' 

TABLE                                                                                                                                                                                    

testtable                                                                                                                                                                                

1 row(s) in 0.1590 seconds 


=> ["testtable"] 

hbase(main):004:0> put 'testtable','myrow-1','colfam1:q1','value-1' 

0 row(s) in 0.4770 seconds 


hbase(main):005:0> put 'testtable','myrow-2','colfam1:q2','value-2' 

0 row(s) in 0.0440 seconds 


hbase(main):006:0> put 'testtable','myrow-2','colfam1:q3','value-3' 

0 row(s) in 0.0370 seconds 


hbase(main):007:0> scan 'testtable' 

ROW                                             COLUMN+CELL                                                                                                                              

myrow-1                                        column=colfam1:q1, timestamp=1402888334639, value=value-1                                                                                

myrow-2                                        column=colfam1:q2, timestamp=1402888343658, value=value-2                                                                                

myrow-2                                        column=colfam1:q3, timestamp=1402888350278, value=value-3                                                                                

2 row(s) in 0.2360 seconds 


hbase(main):008:0> get 'testtable','myrow-1' 

COLUMN                                          CELL                                                                                                                                     

colfam1:q1                                     timestamp=1402888334639, value=value-1                                                                                                   

1 row(s) in 0.1040 seconds 


hbase(main):009:0> delete 'testtable','myrow-2','colfam1:q2' 

0 row(s) in 0.1320 seconds 


hbase(main):010:0> scan 'testtable' 

ROW                                             COLUMN+CELL                                                                                                                              

myrow-1                                        column=colfam1:q1, timestamp=1402888334639, value=value-1                                                                                

myrow-2                                        column=colfam1:q3, timestamp=1402888350278, value=value-3                                                                                

2 row(s) in 0.0650 seconds 


hbase(main):012:0> disable 'testtable' 

0 row(s) in 1.8050 seconds 


hbase(main):013:0> drop 'testtable' 

0 row(s) in 0.7200 seconds 


hbase(main):014:0> exit 

[hadoop@funshion-hadoop194 logs]$ 



-- Post-install URL: http://funshion-hadoop196:60010/master-status 


-- ###################################################################################################### -- 









-- Reference yarn-site.xml / mapred-site.xml snippet (the master-hadoop hostnames and /var/lib/hadoop paths come from a generic template, not from this cluster): 

<property> 

    <name>yarn.resourcemanager.resource-tracker.address</name> 

    <value>master-hadoop:58031</value> 

  </property> 

  <property> 

    <name>yarn.resourcemanager.address</name> 

    <value>master-hadoop:58032</value> 

  </property> 

  <property> 

    <name>yarn.resourcemanager.scheduler.address</name> 

    <value>master-hadoop:58030</value> 

  </property> 

  <property> 

    <name>yarn.resourcemanager.admin.address</name> 

    <value>master-hadoop:58033</value> 

  </property> 

  <property> 

    <name>yarn.resourcemanager.webapp.address</name> 

    <value>master-hadoop:58088</value> 

  </property> 

  <property> 

    <description>Classpath for typical applications.</description> 

    <name>yarn.application.classpath</name> 

    <value> 

        $HADOOP_CONF_DIR, 

        $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*, 

        $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*, 

        $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*, 

        $YARN_HOME/*,$YARN_HOME/lib/* 

    </value> 

  </property> 

  <property> 

    <name>yarn.nodemanager.aux-services</name> 

    <value>mapreduce_shuffle</value> 

  </property> 

  <property> 

    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name> 

    <value>org.apache.hadoop.mapred.ShuffleHandler</value> 

  </property> 

  <property> 

    <name>yarn.nodemanager.local-dirs</name> 

    <value>/var/lib/hadoop/dfs/yarn/local</value> 

  </property> 

  <property> 

    <name>yarn.nodemanager.log-dirs</name> 

    <value>/var/lib/hadoop/dfs/yarn/logs</value> 

  </property> 

  <property> 

    <description>Where to aggregate logs</description> 

    <name>yarn.nodemanager.remote-app-log-dir</name> 

    <value>/var/lib/hadoop/dfs/yarn/remotelogs</value> 

  </property> 

  <property> 

    <name>yarn.app.mapreduce.am.staging-dir</name> 

    <value>/var/lib/hadoop/dfs/yarn/userstag</value> 

  </property> 

  <property> 

    <name>mapreduce.jobhistory.intermediate-done-dir</name> 

    <value>/var/lib/hadoop/dfs/yarn/intermediatedone</value> 

  </property> 

  <property> 

    <name>mapreduce.jobhistory.done-dir</name> 

    <value>/var/lib/hadoop/dfs/yarn/done</value> 

  </property> 

  <property> 

    <name>yarn.log-aggregation-enable</name> 

    <value>true</value> 

  </property> 


--------------------------- 

<property> 

    <description>Classpath for typical applications.</description> 

    <name>yarn.application.classpath</name> 

    <value> 

        $HADOOP_CONF_DIR, 

        $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*, 

        $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*, 

        $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*, 

        $YARN_HOME/*,$YARN_HOME/lib/* 

    </value> 

  </property> 

Is it actually necessary to configure this? 



-- Hadoop reformat (wipe old data on every node before re-initializing; a per-node loop sketch follows the command list below) 


rm -rf ~/mydata/* 

rm -rf ~/mycluster/* 

rm -rf /usr/local/zookeeper/var/data/zookeeper* 

rm -rf /usr/local/zookeeper/var/data/ver* 

rm -rf /usr/local/zookeeper/var/datalog/* 

rm -rf /tmp/hadoop* 

rm -rf /tmp/hbase* 

rm -rf /tmp/hive* 

rm -rf /tmp/yarn* 

rm -rf ~/tmp/* 

rm -rf ~/logs/* 

mkdir -p /home/hadoop/logs/yarn_local 

mkdir -p /home/hadoop/logs/yarn_log 

mkdir -p /home/hadoop/logs/yarn_remotelog 

mkdir -p /home/hadoop/logs/yarn_userstag 

mkdir -p /home/hadoop/logs/yarn_intermediatedone 

mkdir -p /home/hadoop/logs/yarn_done 
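-- These cleanup/mkdir steps have to run on every node; a sketch that drives the main ones over ssh (destructive, so double-check the paths first; the remaining rm -rf lines above can be added to the remote command the same way):

for h in funshion-hadoop194 funshion-hadoop195 funshion-hadoop196 funshion-hadoop197 funshion-hadoop198; do
  ssh $h 'rm -rf ~/mydata/* ~/tmp/* ~/logs/* /tmp/hadoop* /tmp/hbase* /tmp/hive* /tmp/yarn*;
          mkdir -p /home/hadoop/logs/{yarn_local,yarn_log,yarn_remotelog,yarn_userstag,yarn_intermediatedone,yarn_done}'
done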


        <property> 

                <name>yarn.nodemanager.local-dirs</name> 

                <value>/home/hadoop/logs/yarn_local</value> 

        </property> 

        <property> 

                <name>yarn.nodemanager.log-dirs</name> 

                <value>/home/hadoop/logs/yarn_log</value> 

        </property> 

        <property> 

                <name>yarn.nodemanager.remote-app-log-dir</name> 

                <value>/home/hadoop/logs/yarn_remotelog</value> 

        </property> 

        <property> 

                <name>yarn.app.mapreduce.am.staging-dir</name> 

                <value>/home/hadoop/logs/yarn_userstag</value> 

        </property> 

        <property> 

                <name>mapreduce.jobhistory.intermediate-done-dir</name> 

                <value>/home/hadoop/logs/yarn_intermediatedone</value> 

        </property> 

        <property> 

                <name>mapreduce.jobhistory.done-dir</name> 

                <value>/home/hadoop/logs/yarn_done</value> 

        </property> 



vi /usr/local/zookeeper/var/data/myid 
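-- Each ZooKeeper node needs its own myid; a sketch that writes them in one pass (the node-to-id mapping below is an assumption and must match the server.N entries in zoo.cfg):

i=1
for h in funshion-hadoop194 funshion-hadoop195 funshion-hadoop196 funshion-hadoop197 funshion-hadoop198; do
  ssh $h "echo $i > /usr/local/zookeeper/var/data/myid"
  i=$((i+1))
done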



-- After Hadoop, HBase and Hive are started, a namenode node should end up with the following processes (if the other namenode node is missing a ResourceManager, start it manually): 

[hadoop@funshion-hadoop194 bin]$ jps 


25583 JournalNode             -- JournalNode process (on every node) 

21911 QuorumPeerMain          -- ZooKeeper quorum process (on every node) 

25380 NameNode                -- Hadoop NameNode process (on both namenode nodes) 

26261 HMaster                 -- HBase Master process (only on the HBase Master node) 

25860 ResourceManager         -- Hadoop ResourceManager process (on both namenode nodes) 

25769 DFSZKFailoverController -- Hadoop automatic-failover controller (this is what does the switching when a failure happens; on both namenode nodes) 



-- After Hadoop, HBase and Hive are started, a datanode node ends up with the following processes: 


[hadoop@funshion-hadoop195 ~]$ jps 

32090 HRegionServer 

31854 JournalNode 

31781 NameNode 

31978 DFSZKFailoverController 

32189 Jps 

30808 QuorumPeerMain 
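-- To compare the process lists on all five nodes in one pass (a sketch; assumes passwordless ssh and that jps is on the remote PATH):

for h in funshion-hadoop194 funshion-hadoop195 funshion-hadoop196 funshion-hadoop197 funshion-hadoop198; do
  echo "== $h =="; ssh $h jps
done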


-- Related URLs after installation: 


-- Hadoop cluster: 

http://funshion-hadoop194:23188/cluster 

-- HBase cluster: 

http://funshion-hadoop196:60010/master-status 





# export HADOOP_MAPARED_HOME=${HADOOP_DEV_HOME} 

# export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME} 

# export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME} 

# export YARN_HOME=${HADOOP_DEV_HOME} 

# export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop 

# export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop 

# export YARN_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop 


export HADOOP_MAPARED_HOME=${HADOOP_DEV_HOME}/share/hadoop/mapreduce 

export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}/share/hadoop/common 

export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}/share/hadoop/hdfs 

export YARN_HOME=${HADOOP_DEV_HOME}/share/hadoop/yarn 

export HADOOP_YARN_HOME=${HADOOP_DEV_HOME}/share/hadoop/yarn 

export HADOOP_CLIENT_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop 

export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop 

export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop 

export YARN_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop 


-- DFSZKFailoverController (ZKFC) log excerpt on funshion-hadoop195, taken while its local NameNode (port 53310) was down: 

2014-06-20 11:49:37,137 INFO org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election 

2014-06-20 11:49:37,168 INFO org.apache.zookeeper.ZooKeeper: Session: 0x146b75b9d960000 closed 

2014-06-20 11:49:37,169 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x146b75b9d960000 

2014-06-20 11:49:37,169 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down 

2014-06-20 11:49:39,180 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: funshion-hadoop195/192.168.117.195:53310. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS) 

2014-06-20 11:49:39,183 WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at funshion-hadoop195/192.168.117.195:53310: Call From funshion-hadoop195/192.168.117.195 to funshion-hadoop195:53310 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:   
http://wiki.apache.org/hadoop/ConnectionRefused 

2014-06-20 11:49:41,188 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: funshion-hadoop195/192.168.117.195:53310. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS) 

2014-06-20 11:49:41,191 WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at funshion-hadoop195/192.168.117.195:53310: Call From funshion-hadoop195/192.168.117.195 to funshion-hadoop195:53310 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:   
http://wiki.apache.org/hadoop/ConnectionRefused 


-- #################################################################################################################### -- 






-- Finally, test whether the Hive/HBase integration works: 


1. First, create an HBase-backed table in Hive: 

[hadoop@funshion-hadoop196 ~]$ hive 


Logging initialized using configuration in jar:file:/usr/local/apache-hive-0.14.0-SNAPSHOT-bin/lib/hive-common-0.14.0-SNAPSHOT.jar!/hive-log4j.properties 

hive (default)> use web; 

OK 

Time taken: 0.813 seconds 

hive (web)> CREATE TABLE hive_table_1(key int, value string) 

          > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 

          > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,id:1") TBLPROPERTIES ("hbase.table.name"="hive_table_1"); 

OK 

Time taken: 4.471 seconds 

hive (web)> desc hive_table_1; 

OK 

key                         int                         from deserializer    

value                       string                      from deserializer    

Time taken: 0.605 seconds, Fetched: 2 row(s) 


2. Then check in the hbase shell that table hive_table_1 exists, and insert one row: 

[hadoop@funshion-hadoop196 ~]$ hbase shell 

2014-06-22 13:19:35,775 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 

2014-06-22 13:19:35,865 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 

2014-06-22 13:19:35,955 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 

2014-06-22 13:19:36,028 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 

2014-06-22 13:19:36,120 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 

HBase Shell; enter 'help<RETURN>' for list of supported commands. 

Type "exit<RETURN>" to leave the HBase Shell 

Version 0.98.3-hadoop2, rUnknown, Thu Jun 12 16:40:37 CST 2014 


hbase(main):001:0> status 

2014-06-22 13:19:42,761 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 

3 servers, 1 dead, 1.3333 average load 


hbase(main):002:0> satus 

NameError: undefined local variable or method `satus' for #<Object:0x2dbb822> 


hbase(main):003:0> status 

4 servers, 0 dead, 1.0000 average load 


hbase(main):004:0> list 

TABLE                                                                                                                                                                            

hive_table_1                                                                                                                                                                     

testtable                                                                                                                                                                        

2 row(s) in 0.2430 seconds 


=> ["hive_table_1", "testtable"] 

hbase(main):005:0> put 'hive_table_1', "1", "id:1", "1" 

0 row(s) in 0.8620 seconds 


3. Then query from Hive to check that the row is visible: 

hive (web)> select * from hive_table_1; 

OK 

1        1 

Time taken: 0.761 seconds, Fetched: 1 row(s)
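-- To close the loop you can also write into the HBase-backed table from the Hive side and then scan it again in the hbase shell (a sketch; the key/value below are made up, and it assumes the hive-hbase-handler jars are available to the MapReduce job):

hive -e "USE web; INSERT INTO TABLE hive_table_1 SELECT 2, 'value-from-hive' FROM hive_table_1 LIMIT 1;"
# then, in the hbase shell:  scan 'hive_table_1'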