Install RHadoop with Hadoop 2.2 – Red Hat Linux


Prerequisite

Hadoop 2.2 is already installed. Apply the installation steps below on every Hadoop node.
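Because the same steps have to be repeated on every node, a small loop over the hostnames can save some typing. The following is only a sketch: the node list file, passwordless SSH for the current user, and the DRY_RUN switch are assumptions that are not part of the original setup. By default it only prints what it would run.

```shell
# Sketch: print (or run) the same command on every Hadoop node.
# Assumptions (hypothetical, not from the original post): a file with one
# hostname per line and passwordless SSH for the current user.
run_on_nodes() {
    nodes_file=$1
    shift
    # Read hostnames line by line, skipping blank lines and '#' comments.
    while IFS= read -r node || [ -n "$node" ]; do
        case $node in
            ''|'#'*) continue ;;
        esac
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default: only show what would be executed.
            echo "ssh $node $*"
        else
            # -n keeps ssh from consuming the rest of the node list on stdin.
            ssh -n "$node" "$@"
        fi
    done < "$nodes_file"
}

# Usage (nodes.txt is a hypothetical file name):
#   run_on_nodes nodes.txt sudo yum install -y R
#   DRY_RUN=0 run_on_nodes nodes.txt sudo yum install -y R
```

Keeping the dry run as the default makes it harder to run a command on the whole cluster by accident.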

 

Step 1. Install R (with yum)

[hadoop@c0046220 yum.repos.d]$ sudo yum update

 

[hadoop@c0046220 yum.repos.d]$ yum search r-project

 

[hadoop@c0046220 yum.repos.d]$ sudo yum install R

...

Installed:

R.x86_64 0:3.0.2-1.el6

 

Dependency Installed:

R-core.x86_64 0:3.0.2-1.el6 R-core-devel.x86_64 0:3.0.2-1.el6 R-devel.x86_64 0:3.0.2-1.el6 R-java.x86_64 0:3.0.2-1.el6

R-java-devel.x86_64 0:3.0.2-1.el6 bzip2-devel.x86_64 0:1.0.5-7.el6_0 fontconfig-devel.x86_64 0:2.8.0-3.el6 freetype-devel.x86_64 0:2.3.11-14.el6_3.1

java-1.6.0-openjdk-devel.x86_64 1:1.6.0.0-1.62.1.11.11.90.el6_4 kpathsea.x86_64 0:2007-57.el6_2 libRmath.x86_64 0:3.0.2-1.el6 libRmath-devel.x86_64 0:3.0.2-1.el6

libXft-devel.x86_64 0:2.3.1-2.el6 libXmu.x86_64 0:1.1.1-2.el6 libXrender-devel.x86_64 0:0.9.7-2.el6 libicu.x86_64 0:4.2.1-9.1.el6_2

netpbm.x86_64 0:10.47.05-11.el6 netpbm-progs.x86_64 0:10.47.05-11.el6 pcre-devel.x86_64 0:7.8-6.el6 psutils.x86_64 0:1.17-34.el6

tcl.x86_64 1:8.5.7-6.el6 tcl-devel.x86_64 1:8.5.7-6.el6 tex-preview.noarch 0:11.85-10.el6 texinfo.x86_64 0:4.13a-8.el6

texinfo-tex.x86_64 0:4.13a-8.el6 texlive.x86_64 0:2007-57.el6_2 texlive-dvips.x86_64 0:2007-57.el6_2 texlive-latex.x86_64 0:2007-57.el6_2

texlive-texmf.noarch 0:2007-38.el6 texlive-texmf-dvips.noarch 0:2007-38.el6 texlive-texmf-errata.noarch 0:2007-7.1.el6 texlive-texmf-errata-dvips.noarch 0:2007-7.1.el6

texlive-texmf-errata-fonts.noarch 0:2007-7.1.el6 texlive-texmf-errata-latex.noarch 0:2007-7.1.el6 texlive-texmf-fonts.noarch 0:2007-38.el6 texlive-texmf-latex.noarch 0:2007-38.el6

texlive-utils.x86_64 0:2007-57.el6_2 tk.x86_64 1:8.5.7-5.el6 tk-devel.x86_64 1:8.5.7-5.el6 zlib-devel.x86_64 0:1.2.3-29.el6

 

Complete!

 

Validation:

[hadoop@c0046220 yum.repos.d]$ R

 

R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"

Copyright (C) 2013 The R Foundation for Statistical Computing

Platform: x86_64-redhat-linux-gnu (64-bit)

 

R is free software and comes with ABSOLUTELY NO WARRANTY.

You are welcome to redistribute it under certain conditions.

Type 'license()' or 'licence()' for distribution details.

 

Natural language support but running in an English locale

 

R is a collaborative project with many contributors.

Type 'contributors()' for more information and

'citation()' on how to cite R or R packages in publications.

 

Type 'demo()' for some demos, 'help()' for on-line help, or

'help.start()' for an HTML browser interface to help.

Type 'q()' to quit R.

 

>
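The same check can be scripted instead of read off an interactive session. The helper below is a sketch (the function name is made up for illustration) that extracts the version number from the banner line shown above.

```shell
# Sketch: extract the version number from R's banner line.
r_version_from_banner() {
    # Expects a line such as: R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
    # Prints nothing if the line does not look like an R banner.
    printf '%s\n' "$1" | sed -n 's/^R version \([0-9][0-9.]*\).*/\1/p'
}

# Usage on a node where R is installed:
#   r_version_from_banner "$(R --version | head -n 1)"
```

An empty result suggests that R is not on the PATH or that the banner format differs from the one shown here.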

 

 

Step 2. Install RHadoop

2.1 Getting RHadoop Packages

Download the rhdfs, rhbase, and rmr2 packages listed at https://github.com/RevolutionAnalytics/RHadoop/wiki/Downloads. The commands below fetch them into /tmp/RHadoop with wget.

[hadoop@c0046220 RHadoop]$ cd /tmp

[hadoop@c0046220 tmp]$ mkdir RHadoop

[hadoop@c0046220 tmp]$ cd RHadoop

[hadoop@c0046220 RHadoop]$ wget https://raw.githubusercontent.com/RevolutionAnalytics/rhdfs/master/build/rhdfs_1.0.8.tar.gz

[hadoop@c0046220 RHadoop]$ wget https://raw.githubusercontent.com/RevolutionAnalytics/rmr2/3.1.0/build/rmr2_3.1.0.tar.gz

 

[hadoop@c0046220 RHadoop]$ wget https://raw.githubusercontent.com/RevolutionAnalytics/rhbase/master/build/rhbase_1.2.0.tar.gz
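A failed download can leave a missing or zero-byte file that only shows up later as a confusing install.packages() error. As a hedged sketch (the function name is hypothetical), the archives can be checked before moving on:

```shell
# Sketch: confirm that the expected package archives exist and are non-empty.
check_tarballs() {
    dir=$1
    shift
    missing=0
    for f in "$@"; do
        # -s is true only for files that exist and have a size above zero.
        if [ -s "$dir/$f" ]; then
            echo "ok: $f"
        else
            echo "missing or empty: $f"
            missing=1
        fi
    done
    # Non-zero exit status if any archive failed the check.
    return "$missing"
}

# Usage with the file names from the wget commands above:
#   check_tarballs /tmp/RHadoop rhdfs_1.0.8.tar.gz rmr2_3.1.0.tar.gz rhbase_1.2.0.tar.gz
```

This only checks presence and size; it does not verify that an archive is a valid R package.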

 

2.2 Install R packages that RHadoop depends on.

[hadoop@c0046220 java]$ echo $JAVA_HOME

/usr/java/jdk1.8.0_05

 

[hadoop@c0046220 java]$ sudo -i

[root@c0046220 ~]# export JAVA_HOME=/usr/java/jdk1.8.0_05

[root@c0046220 ~]# R CMD javareconf

[root@c0046220 ~]# R

...

> .libPaths();

[1] "/usr/lib64/R/library" "/usr/share/R/library"

 

> install.packages(c("rJava", "Rcpp", "RJSONIO", "bitops", "digest", "functional", "stringr", "plyr", "reshape2", "caTools"))

> # caTools (already included in the list above) is needed for rmr2
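R CMD javareconf and the rJava build both rely on JAVA_HOME pointing at a full JDK, and sudo -i starts a fresh root environment, which is why the variable is exported again above. A quick sanity check before running javareconf might look like this sketch (the function name is an assumption made for illustration):

```shell
# Sketch: check that a directory looks like a JDK (has both java and javac).
looks_like_jdk() {
    home=$1
    [ -n "$home" ] && [ -x "$home/bin/java" ] && [ -x "$home/bin/javac" ]
}

# Usage:
#   looks_like_jdk "$JAVA_HOME" && echo "JAVA_HOME looks usable"
```

A directory with java but no javac is usually a JRE, which is likely not enough for compiling rJava.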

 

2.3 Install RHadoop

Set environment variables

[hadoop@c0046220 ~]$ vi ~/.bashrc

# set HADOOP locations for RHADOOP

export HADOOP_CMD=$HADOOP_HOME/bin/hadoop

export HADOOP_STREAMING=/opt/hadoop/hadoop-2.2.0/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar

[hadoop@c0046220 ~]$ source .bashrc
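rmr2 and rhdfs read HADOOP_CMD and HADOOP_STREAMING at load time, so a typo in either path tends to surface only later inside R. The sketch below (a hypothetical helper, not part of RHadoop) checks both variables from the shell after sourcing .bashrc:

```shell
# Sketch: verify the two environment variables that RHadoop relies on.
check_hadoop_env() {
    status=0
    # HADOOP_CMD must point at an executable file.
    if [ ! -x "${HADOOP_CMD:-}" ]; then
        echo "HADOOP_CMD is unset or not executable: ${HADOOP_CMD:-<unset>}"
        status=1
    fi
    # HADOOP_STREAMING must point at an existing jar file.
    if [ ! -f "${HADOOP_STREAMING:-}" ]; then
        echo "HADOOP_STREAMING is unset or not a file: ${HADOOP_STREAMING:-<unset>}"
        status=1
    fi
    return "$status"
}

# Usage:
#   check_hadoop_env && echo "Hadoop environment looks fine"
```

Note that the HADOOP_CMD line in .bashrc expands $HADOOP_HOME, so that variable has to be set earlier in the file or elsewhere in the environment.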

 

[hadoop@c0040084 R]$ sudo -i

[root@c0040084 ~]# R

...

> Sys.setenv(HADOOP_HOME="/opt/hadoop/hadoop-2.2.0");

> Sys.setenv(HADOOP_CMD="/opt/hadoop/hadoop-2.2.0/bin/hadoop");

> Sys.setenv(HADOOP_STREAMING="/opt/hadoop/hadoop-2.2.0/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar");

> install.packages(pkgs="/tmp/RHadoop/rhdfs_1.0.8.tar.gz",repos=NULL);

> install.packages(pkgs="/tmp/RHadoop/rmr2_3.1.0.tar.gz",repos=NULL);

 

Step 3. Validation

Load and initialize the rhdfs package, then run a few simple commands:

library(rhdfs)

hdfs.init()

hdfs.ls("/")

[hadoop@c0046220 ~]$ R

...

> library(rhdfs)

Loading required package: rJava

...

Be sure to run hdfs.init()

> hdfs.init()

14/05/15 10:02:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

> hdfs.ls("/")

  permission  owner      group size          modtime    file
1 drwxr-xr-x hadoop supergroup    0 2014-05-14 03:05   /apps
2 drwxr-xr-x hadoop supergroup    0 2014-05-12 09:40   /data
3 drwxr-xr-x hadoop supergroup    0 2014-05-12 09:45 /output
4 drwxrwx--- hadoop supergroup    0 2014-05-15 10:02    /tmp
5 drwxr-xr-x hadoop supergroup    0 2014-05-14 05:48   /user
6 drwxr-xr-x hadoop supergroup    0 2014-05-13 06:43    /usr
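To confirm that rhdfs sees the same filesystem as the Hadoop command line, the listing can be compared with the output of hadoop fs -ls /. The helper below is a sketch (its name is made up) that prints only the path column so the two listings are easy to compare by eye.

```shell
# Sketch: list top-level HDFS paths via the hadoop CLI, for comparison
# with the "file" column of hdfs.ls("/") in R.
hdfs_root_paths() {
    # Entry lines of `hadoop fs -ls` have 8 fields; the "Found N items"
    # header has fewer and is skipped. Paths containing spaces are not handled.
    "${HADOOP_CMD:-hadoop}" fs -ls / | awk 'NF >= 8 { print $NF }'
}

# Usage:
#   hdfs_root_paths
```

It falls back to a hadoop binary on the PATH when HADOOP_CMD is not set.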

 

Load the rmr2 package, then run a few simple commands:

library(rmr2)

from.dfs(to.dfs(1:100))

from.dfs(mapreduce(to.dfs(1:100)))

[hadoop@c0046220 ~]$ R

...

> library(rmr2)

Loading required package: Rcpp

Loading required package: RJSONIO

Loading required package: bitops

Loading required package: digest

Loading required package: functional

Loading required package: reshape2

Loading required package: stringr

Loading required package: plyr

Loading required package: caTools

 

> from.dfs(to.dfs(1:100))

...

$key

NULL

 

$val

[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36

[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54

[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72

[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90

[91] 91 92 93 94 95 96 97 98 99 100

 

> from.dfs(mapreduce(to.dfs(1:100)))

...

$key

NULL

 

$val

[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36

[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54

[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72

[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90

[91] 91 92 93 94 95 96 97 98 99 100

 

 

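A word count example with rmr2:

library(rmr2)

# Input file on HDFS
input <- '/user/hadoop/tmp.txt'

wordcount = function(input, output = NULL, pattern = " ") {
  # Map: split each line on the pattern and emit (word, 1) pairs
  wc.map = function(., lines) {
    keyval(unlist(strsplit(x = lines, split = pattern)), 1)
  }

  # Reduce: sum the counts for each word
  wc.reduce = function(word, counts) {
    keyval(word, sum(counts))
  }

  # combine = TRUE uses the reduce function as the combiner;
  # output = NULL lets rmr2 write to a temporary HDFS location
  mapreduce(input = input, output = output, input.format = "text",
            map = wc.map, reduce = wc.reduce, combine = TRUE)
}

wordcount(input)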

 

> library(rmr2)

> input<- '/user/hadoop/tmp.txt'

> wordcount = function(input, output = NULL, pattern = " "){

+ wc.map = function(., lines) {

+ keyval(unlist( strsplit( x = lines,split = pattern)),1)

+ }

+

+ wc.reduce =function(word, counts ) {

+ keyval(word, sum(counts))

+ }

+

+ mapreduce(input = input ,output = output, input.format = "text",

+ map = wc.map, reduce = wc.reduce,combine = T)

+ }

>

> wordcount(input)

...

14/05/15 10:18:40 INFO mapreduce.Job: Job job_1399887026053_0013 completed successfully

14/05/15 10:18:40 INFO mapreduce.Job: Counters: 45

File System Counters

FILE: Number of bytes read=11018

FILE: Number of bytes written=278566

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=2004

HDFS: Number of bytes written=11583

HDFS: Number of read operations=9

HDFS: Number of large read operations=0

HDFS: Number of write operations=2

Job Counters

Failed reduce tasks=1

Launched map tasks=2

Launched reduce tasks=2

Data-local map tasks=2

Total time spent by all maps in occupied slots (ms)=23412

Total time spent by all reduces in occupied slots (ms)=13859

Map-Reduce Framework

Map input records=24

Map output records=112

Map output bytes=10522

Map output materialized bytes=11024

Input split bytes=208

Combine input records=112

Combine output records=114

Reduce input groups=105

Reduce shuffle bytes=11024

Reduce input records=114

Reduce output records=112

Spilled Records=228

Shuffled Maps =2

Failed Shuffles=0

Merged Map outputs=2

GC time elapsed (ms)=569

CPU time spent (ms)=3700

Physical memory (bytes) snapshot=574214144

Virtual memory (bytes) snapshot=6258499584

Total committed heap usage (bytes)=365953024

Shuffle Errors

BAD_ID=0

CONNECTION=0

IO_ERROR=0

WRONG_LENGTH=0

WRONG_MAP=0

WRONG_REDUCE=0

File Input Format Counters

Bytes Read=1796

File Output Format Counters

Bytes Written=11583

rmr

reduce calls=110

14/05/15 10:18:40 INFO streaming.StreamJob: Output directory: /tmp/file612355aa2e35

function ()

{

fname

}

<environment: 0x37d70d0>

>

>

> from.dfs("/tmp/file612355aa2e35")

$key

[1] "-"

[2] "of"

[3] "Hong"

[4] "Paul's"

[5] "School"

[6] "College"

[7] "Graduate"

...
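For a small input file, the counts returned by the job can be cross-checked against a purely local count. The pipeline below is a sketch under the assumption that a local copy of the input is available; the function name and the local file name are made up for illustration. It drops empty tokens, whereas strsplit can yield empty strings for consecutive spaces, so counts for the empty key may differ.

```shell
# Sketch: count space-separated words in a local file.
local_wordcount() {
    # Put each space-separated token on its own line, drop empty tokens,
    # count occurrences, and print "word<TAB>count" sorted by word.
    tr ' ' '\n' < "$1" |
        awk 'length($0) > 0 { c[$0]++ } END { for (w in c) printf "%s\t%d\n", w, c[w] }' |
        LC_ALL=C sort
}

# Usage (tmp_local.txt is a hypothetical local file name):
#   "$HADOOP_CMD" fs -cat /user/hadoop/tmp.txt > tmp_local.txt
#   local_wordcount tmp_local.txt
```

Sorting with LC_ALL=C keeps the order independent of the locale, which makes the output easier to diff between nodes.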

References

https://s3.amazonaws.com/RHadoop/RHadoop2.0.2u2_Installation_Configuration_for_RedHat.pdf

http://cran.r-project.org/doc/manuals/r-devel/R-admin.html#Installing-R-under-Unix_002dalikes

 

http://www.rdatamining.com/tutorials/rhadoop

http://blog.fens.me/rhadoop-rhadoop/

http://datamgmt.com/installing-r-and-rstudio-on-redhat-or-centos-linux/

 

https://github.com/RevolutionAnalytics/RHadoop/wiki

https://github.com/RevolutionAnalytics/RHadoop/wiki/Which-Hadoop-for-rmr
