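Hadoop and Hive Environment Setup in Detail

I. Hadoop Environment Setup

First, download the Hadoop package hadoop-0.20.2.tar.gz from the Apache website.

Extract hadoop-0.20.2.tar.gz with the following command: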

tar zxvf hadoop-0.20.2.tar.gz

Note the flags: use xvf for a plain .tar archive and zxvf for a gzip-compressed .tar.gz archive.
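As a minimal sketch of the flag rule above, the helper below picks the tar flags from the file name. The function name tar_flags is made up for illustration; it is not part of tar or Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: choose tar flags from the archive's file name.
tar_flags() {
    case "$1" in
        *.tar.gz|*.tgz) echo "zxvf" ;;   # gzip-compressed tar archive
        *.tar)          echo "xvf"  ;;   # plain, uncompressed tar archive
        *)              echo "unknown"; return 1 ;;
    esac
}

tar_flags hadoop-0.20.2.tar.gz   # prints zxvf
```

It could then be used as: tar "$(tar_flags hadoop-0.20.2.tar.gz)" hadoop-0.20.2.tar.gz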

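If the archive is not recognized or cannot be extracted during installation, the cause is most likely a permissions problem. The fix is to change the file's permissions with the following command: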

chmod 777 hadoop-0.20.2.tar.gz

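Here 777 grants read, write, and execute permission to every user.

If extraction still fails with an error such as "Archive contains obsolescent base-64 headers; Error exit delayed from previous errors", the archive itself is most likely damaged. Many people download the package on Windows and then upload it to Linux over FTP or a similar channel, which can easily corrupt it. It is better to download it directly on the Linux machine, with the following command: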

wget http://labs.renren.com/apache-mirror/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz

This downloads the file straight into the current directory.
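To catch a damaged archive before extracting it, one option is to test the gzip stream first. The sketch below assumes gzip is installed; the function name archive_ok is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the gzip stream passes its integrity test.
archive_ok() {
    gzip -t "$1" 2>/dev/null
}

if archive_ok hadoop-0.20.2.tar.gz; then
    tar zxvf hadoop-0.20.2.tar.gz
else
    # Also reached when the file is missing from the current directory.
    echo "archive is missing or damaged, download it again" >&2
fi
```

If the check fails, downloading the file again with wget is usually quicker than trying to repair it.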

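Once the files are ready, we edit the configuration to get a simple Hadoop instance running.

First, go into the hadoop-0.20.2/conf directory, which holds the configuration files.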

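Start with masters and slaves, which specify the IP addresses of the master and slave nodes. We use a single machine as the example here, so enter the current machine's IP address in both files.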
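For a single-machine setup, both files end up holding the same single line. A minimal sketch, assuming the machine's IP is 192.168.216.57 (the address used in the configuration later in this article; substitute your own):

```shell
#!/bin/sh
# Assumed values for illustration: replace with your machine's IP and install path.
NODE_IP=192.168.216.57
CONF_DIR=hadoop-0.20.2/conf

# mkdir -p only keeps this sketch runnable outside an unpacked Hadoop tree.
mkdir -p "$CONF_DIR"

# Each file lists one host per line; on a single machine both list the same IP.
echo "$NODE_IP" > "$CONF_DIR/masters"
echo "$NODE_IP" > "$CONF_DIR/slaves"
```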

Next, edit mapred-site.xml as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://192.168.216.57:8012</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.
    Pass in the jobtracker hostname via the
    -Dhadoop.jobtracker=JOBTRACKER_HOST java option.
    </description>
  </property>
</configuration>


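The mapred.job.tracker property is the key setting here. MapReduce splits a job into n tasks, and the JobTracker at this address schedules them.

Next comes core-site.xml, configured as follows: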

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://cap216057.sqa:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/admin/tmp/</value>
    <description>A base for other temporary directories. Set to a
    directory off of the user's home directory for the simple test.
    </description>
  </property>

</configuration>


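This file mainly configures the file system. Note that the value of fs.default.name must use a hostname, not an IP address. To look up the hostname, use the following commands: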

cd /

cd etc

vi hosts
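The same lookup can be done without opening an editor. The sketch below prints the first hostname mapped to a given IP in a hosts-format file. The function name host_for_ip is hypothetical, and the pairing of 192.168.216.57 with cap216057.sqa in the comment is an assumption based on the values used in this article.

```shell
#!/bin/sh
# Hypothetical helper: print the first hostname listed for an IP.
# $1 = IP address, $2 = hosts file (defaults to /etc/hosts).
host_for_ip() {
    awk -v ip="$1" '$1 == ip { print $2; exit }' "${2:-/etc/hosts}"
}

# Example: look up the loopback address if the hosts file is readable.
# On the article's machine, host_for_ip 192.168.216.57 would presumably
# print cap216057.sqa (an assumption, not confirmed by the source).
if [ -r /etc/hosts ]; then
    host_for_ip 127.0.0.1
fi
```

The printed name is what goes into the fs.default.name value, in the form hdfs://hostname:port.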
