Errors and solutions when importing data from Oracle into Hive with Sqoop 1.4.4


While importing data with the command below, the following errors came up:

sqoop import --hive-import --connect jdbc:oracle:thin:@192.168.29.16:1521/testdb --username NAME --password PASS --verbose -m 1 --table T_USERINFO

Error 1: File does not exist: hdfs://opt/sqoop-1.4.4/lib/commons-io-1.4.jar

FileNotFoundException: File does not exist: hdfs://opt/sqoop-1.4.4/lib/commons-io-1.4.jar
at org.apache ... ...
at org.apache ... ...

Cause analysis:

Thanks to Daniel Koverman's answer: http://stackoverflow.com/questions/19375784/sqoop-jar-files-not-found

It is common for Hadoop services to look for jars in HDFS because all nodes in the cluster can access files in HDFS. This is important if the MapReduce job being kicked off by the Hadoop service, in this case Sqoop, has a dependence on those jars. Remember, the Mappers are running on a DataNode, not the NameNode even though you are (probably) running the Sqoop command from the NameNode. Putting the jars on HDFS is not the only possible solution to this problem, but it is a sensible one.

Now we can deal with the actual error. At least one, but probably all, of your Mappers are unable to find a jar they need. That means that either the jar does not exist or the user trying to access them does not have the required permissions. First check if the file exists by running hadoop fs -ls home/SqoopUser/sqoop-1.4.3-cdh4.4.0/sqoop-1.4.3-cdh4.4.0.jar by a user with superuser privileges on the cluster. If it does not exist, put it there with hadoop fs -put {jarLocationOn/NameNode/fileSystem/sqoop-1.4.3-cdh4.4.0.jar} /home/SqoopUser/sqoop-1.4.3-cdh4.4.0/sqoop-1.4.3-cdh4.4.0.jar.

Solution:

Put the jar files named in the error message into the same locations on HDFS; if the corresponding directories do not exist in the file system, create them first. In my case, hdfs://master:8020/ was missing the various jars that live in the local /opt/sqoop-1.4.4/lib/ folder, so my approach was to put the entire local /opt/sqoop-1.4.4/lib folder into hdfs://master:8020/.

<!--List the existing directory tree on HDFS. This is a recursive listing; if there are many files, drop the -R flag and browse level by level instead.-->
hadoop fs -ls -R /

<!--Create the same directory structure-->
hadoop fs -mkdir /opt
hadoop fs -mkdir /opt/sqoop-1.4.4

<!--Put the local /opt/sqoop-1.4.4/lib directory into /opt/sqoop-1.4.4 on HDFS-->
hadoop fs -put /opt/sqoop-1.4.4/lib /opt/sqoop-1.4.4/

<!--List again to confirm the copy succeeded-->
hadoop fs -ls -R /opt/sqoop-1.4.4
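To spot-check a single jar afterwards, you can also list the exact file named in the error message (a small sketch; commons-io-1.4.jar is the jar from my error, so adjust the path to whichever jar your message names):

<!--Verify the specific jar now exists on HDFS-->
hadoop fs -ls /opt/sqoop-1.4.4/lib/commons-io-1.4.jar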
      
    

Error 2: java.lang.ClassNotFoundException: Class U_BASICINFO not found

For the table being imported into Hive, the error says the corresponding .class and .jar files cannot be found:

      
java.lang.Exception: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class U_BASICINFO not found
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class U_BASICINFO not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
    at org.apache.sqoop.mapreduce.db.DBConfiguration.getInputClass(DBConfiguration.java:394)
    at .....

Cause analysis: unknown for now.

Solution:

Thanks to user236575's answer: http://stackoverflow.com/questions/21599785/sqoop-not-able-to-import-table/21626010#21626010

By default, while importing a table Sqoop generates a Java source file for that table along with the compiled .class and packaged .jar files. The .java file is saved under sqoop/bin, while the .class and .jar files end up in a corresponding folder under /tmp/sqoop-hduser/compile/.

My fix was to find the .class and .jar files of the table being imported and copy them into the sqoop/bin directory and into /user/USERNAME/ on HDFS (later testing showed that copying the .class and .jar into sqoop/bin alone is enough for the import to succeed).
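To figure out which temporary folder under the compile directory holds those files, listing by modification time usually makes the most recent Sqoop run obvious (a sketch; it assumes Sqoop was run as root, so the output lands under /tmp/sqoop-root/compile/):

<!--The newest directory corresponds to the latest sqoop import-->
ls -lt /tmp/sqoop-root/compile/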

<!--Copy into the sqoop/bin directory-->
cp /tmp/sqoop-root/compile/<the temporary folder containing the needed class and jar files>/* /opt/sqoop-1.4.4/bin/

<!--Put into /user/USERNAME/ on HDFS-->
hadoop fs -put /tmp/sqoop-root/compile/<the temporary folder containing the needed class and jar files>/* /user/root/

Error 3: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://user/root/... ...

Solution:

After an import command for a table has run once, running it again may raise this error. Just go into HDFS and delete the corresponding file or directory.

hadoop fs -rm /user/USERNAME/*
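If what is left behind is a whole output directory rather than plain files, a recursive delete may be needed (a sketch; T_USERINFO is just the example table from the import command above, and the default output directory is named after the table):

<!--Remove the leftover output directory for the table-->
hadoop fs -rm -r /user/root/T_USERINFO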
      
    

Error 4: java.sql.SQLException: ORA-01017: invalid username/password; logon denied when importing data with Sqoop

Cause: Oracle 11g treats logon credentials as case-sensitive, so case sensitivity needs to be switched off in the Oracle database.

Solution:

1. Log in to the database and run: alter system set sec_case_sensitive_logon=false (see the sketch after this list).

2. Or create a new user with the username and password entirely in uppercase or entirely in lowercase (the username and password had to be uppercase in the Sqoop command here, but it is not clear whether they are ultimately passed to the database in uppercase or lowercase, so you may have to try both before you know which works).
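A quick way to check and change the setting from SQL*Plus (a sketch; it assumes you can connect as SYSDBA on the database host):

sqlplus / as sysdba
SQL> show parameter sec_case_sensitive_logon
SQL> alter system set sec_case_sensitive_logon=false;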

Error 5: INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s)

13/12/14 20:12:07 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/12/14 20:12:08 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/12/14 20:12:09 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/12/14 20:12:10 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/12/14 20:12:11 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/12/14 20:12:12 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/12/14 20:12:13 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/12/14 20:12:14 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032.

Cause: the Hadoop cluster probably has only DFS running; YARN was never started.

Solution: start Hadoop with start-all.sh, or start it with the combination of start-dfs.sh and start-yarn.sh.
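To confirm YARN is actually up after starting it, checking the running Java daemons is usually enough (a sketch; it assumes the JDK's jps command is on the PATH):

<!--ResourceManager should appear here (and NodeManager on the worker nodes) alongside the HDFS daemons-->
jps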

 
