Written by Deepak Vohra

Oracle Big Data Connectors 3.0.0 became available recently (https://blogs.oracle.com/datawarehousing/entry/announcing_big_data_appliance_3). Oracle Loader for Hadoop (OLH), one of the Big Data Connectors, is a tool for loading data into Oracle Database from HDFS. OLH supports several types of data sources: delimited text files, Avro files, Apache Hive tables, Oracle NoSQL Database, and custom input formats. The output need not be an Oracle Database table; it can also be a delimited text file or a data pump file. OLH makes use of Hadoop to prepare and load the data and is essentially a Map Reduce application. In this article we shall load data into Oracle Database from HDFS using OLH. The article has the following sections.

Setting the Environment
Creating the Target Oracle Database Table
Creating the Data to Load in HDFS
Connecting with Oracle Database from VirtualBox
Configuring Map Reduce Job in XML Configuration File
Loading HDFS Data to Oracle Database

Setting the Environment

We shall load data into Oracle Database 12c, which may be downloaded from http://www.oracle.com/technetwork/database/enterprise-edition/downloads/index.html. In the article Oracle Database 12c is installed on Windows OS and OLH is run on Ubuntu on Oracle VM VirtualBox. Download VirtualBox 4.3.10 for Windows hosts from https://www.virtualbox.org/wiki/Downloads. Download ubuntu-12.10-desktop-i386 ISO from http://virtualboxes.org/images/ubuntu/. Install VirtualBox and create a Virtual Machine for Ubuntu.

Start the virtual machine and log in to Ubuntu. Open a terminal and switch to the root user with the following command.

su -l root

Next, we shall install CDH4.6 Hadoop ecosystem and Oracle Loader for Hadoop. To download the CDH4.6 tar.gz file run the following command.

root>wget http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.6.0.tar.gz

The output from the command indicates that the hadoop-2.0.0-cdh4.6.0.tar.gz file gets downloaded.

  

Extract the hadoop-2.0.0-cdh4.6.0.tar.gz file to a directory with the following command.

root>tar -xvf hadoop-2.0.0-cdh4.6.0.tar.gz

The CDH4.6 files get extracted.

  

We also need to install Java 7. As the root user, first refresh the package lists with the following command.

root>apt-get update

Then install the OpenJDK 7 JDK.

root>apt-get install openjdk-7-jdk

Select Y when prompted "Do you want to continue [Y/n]". Java 7 gets installed.

Next, download Oracle Loader for Hadoop Release 3.0.0 (oraloader-3.0.0.x86_64.zip) from http://www.oracle.com/technetwork/database/database-technologies/bdc/big-data-connectors/downloads/index.html. Unzip the file to a directory; two files get extracted: oraloader-3.0.0-h1.x86_64.zip and oraloader-3.0.0-h2.x86_64.zip. The oraloader-3.0.0-h1.x86_64.zip file is for Apache Hadoop 1.x and oraloader-3.0.0-h2.x86_64.zip is for CDH4 and CDH5. As we are using CDH4.6, extract oraloader-3.0.0-h2.x86_64.zip on Ubuntu as the root user with the following command.

root>unzip oraloader-3.0.0-h2.x86_64.zip

Oracle Loader for Hadoop 3.0.0 gets extracted to oraloader-3.0.0-h2 directory.

Set the environment variables as user root with the following command.

root>vi ~/.bashrc

Set the following environment variables in the vi editor.

export HADOOP_COMMON_HOME=/root/hadoop-2.0.0-cdh4.6.0
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
export HADOOP_HOME=/root/hadoop-2.0.0-cdh4.6.0
export OLH_HOME=/root/oraloader-3.0.0-h2
export HADOOP_CONF_DIR=$HADOOP_COMMON_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_COMMON_HOME/bin:$HADOOP_COMMON_HOME/sbin
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$OLH_HOME/jlib/*

Exit the root user session and start a new root session, or run source ~/.bashrc, to activate the environment variable settings.

Creating the Target Oracle Database Table

As we shall be loading HDFS data into Oracle Database, we need to create a target table in Oracle Database. Start SQL*Plus and run the following SQL script to create the table CATALOG in the OE schema.

CREATE TABLE OE.Catalog(CatalogId INTEGER
PRIMARY KEY, Journal VARCHAR2(25), Publisher VARCHAR2(25),
Edition VARCHAR2(25), Title VARCHAR2(45), Author VARCHAR2(25));

The database table OE.CATALOG gets created and its description may be output using the DESC OE.CATALOG command.

 

Creating the Data to Load in HDFS

We have created a target Oracle Database table. Next, create the data in HDFS to load into it. Create the following catalog.txt file in the root user's home directory. Fields are delimited by ',' and rows are delimited by the '\n' character.

1,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,Database Resource Manager,Kimberly Floss
2,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,From ADF UIX to JSF,Jonas Jacobi
3,Oracle Magazine,Oracle Publishing,March-April 2005,Starting with Oracle ADF,Steve Muench
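The layout of these records can be sanity-checked with a short script (illustrative Python, not part of the OLH workflow): six comma-separated fields per newline-terminated row.

```python
# Illustrative check of the catalog.txt layout: ',' delimits fields,
# '\n' delimits rows, and every row has six fields.
records = (
    "1,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,Database Resource Manager,Kimberly Floss\n"
    "2,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,From ADF UIX to JSF,Jonas Jacobi\n"
    "3,Oracle Magazine,Oracle Publishing,March-April 2005,Starting with Oracle ADF,Steve Muench\n"
)

rows = [line.split(",") for line in records.splitlines()]
assert len(rows) == 3 and all(len(row) == 6 for row in rows)
print(rows[0][4])  # prints "Database Resource Manager"
```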

Create a directory /catalog in HDFS to load data into with the following command.

hdfs dfs -mkdir /catalog

Add the catalog.txt file to the /catalog directory with the following command.

hdfs dfs -put catalog.txt /catalog/catalog.txt

The catalog.txt file gets added to HDFS. The target database table has been created and the data to be loaded has been added to HDFS. But before we can start loading the data we need to configure the VirtualBox network settings to be able to connect to Oracle Database on the host, and configure the OLH job in a job configuration file.

 

Connecting with Oracle Database from VirtualBox

If OLH and Oracle Database were running on the same machine we would use "localhost" or the machine's IPv4 address as the host name. But because OLH is installed on the VirtualBox Ubuntu VM and Oracle Database is running on the host machine, the host name is neither localhost nor the VM's IPv4 address. We shall use the default gateway and the default route to connect to Oracle Database. To display information about the default gateway/route, run the following command as root on the Ubuntu VM.

netstat -rn

The default gateway information gets displayed. The connection requests not local to Ubuntu are routed using the default gateway 10.0.2.2. A destination of 0.0.0.0 is the default route and a route lookup that does not match any other route falls back to the default route.
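The fallback to the default route can be sketched as follows; this is a simplified longest-prefix-match illustration in Python (not part of the OLH workflow), using the routing table from the netstat output.

```python
import ipaddress

# Simplified routing table from the netstat -rn output: 10.0.2.0/24 is the
# local subnet (no gateway) and destination 0.0.0.0 (the 0.0.0.0/0 network)
# is the default route via 10.0.2.2.
routes = [
    (ipaddress.ip_network("10.0.2.0/24"), None),      # local, no gateway
    (ipaddress.ip_network("0.0.0.0/0"), "10.0.2.2"),  # default route
]

def gateway_for(dest):
    """Return the gateway for dest using longest-prefix match; an address
    that matches no other route falls back to the default route."""
    addr = ipaddress.ip_address(dest)
    best = max((net for net, gw in routes if addr in net),
               key=lambda net: net.prefixlen)
    return dict(routes)[best]

print(gateway_for("192.168.1.5"))  # prints "10.0.2.2" (default route)
```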

  

We shall use the default gateway 10.0.2.2 as the host name when connecting to Oracle Database on the host. We also need to enable NAT (Network Address Translation) to be able to connect to the host from the VirtualBox VM. Select Settings in Oracle VM VirtualBox Manager, select Network, set the enabled network adapter's Attached to: to NAT, and click OK. NAT is also the default setting.

  

Configuring Map Reduce Job in XML Configuration File

We need to configure the Map Reduce job that runs Oracle Loader for Hadoop in a job configuration file. Create an OraLoadJobConf.xml file (any file name may be used) in the root user's home directory. Job configuration is specified using properties in the job configuration file. The HDFS directories (comma-separated) from which data is to be loaded are specified with the mapred.input.dir property.

<property>
   <name>mapred.input.dir</name>
   <value>/catalog</value>
</property>

The output directory for the Map Reduce job must be specified using the mapreduce.output.fileoutputformat.outputdir property (or the deprecated mapred.output.dir). The output directory must not be created beforehand, as it is created automatically by the Map Reduce job. If a directory with the same name as used in a previous OLH run is to be used again, the directory from the previous run must first be removed.

<property>
   <name>mapreduce.output.fileoutputformat.outputdir</name>
   <value>oraloadout</value>
</property>

The input format is specified using the mapreduce.inputformat.class property. The following input formats are supported.

 

oracle.hadoop.loader.lib.input.DelimitedTextInputFormat
   Loads data from a delimited text file.

oracle.hadoop.loader.lib.input.RegexInputFormat
   Loads data from files which are more complex than delimited text files.

oracle.hadoop.loader.lib.input.HiveToAvroInputFormat
   Loads data from a Hive table.

oracle.hadoop.loader.lib.input.AvroInputFormat
   Loads data from binary Avro data files.

oracle.kv.hadoop.KVAvroInputFormat
   Loads data from Oracle NoSQL Database.

As we shall be loading data from the delimited text file catalog.txt in the HDFS directory /catalog, specify the input format as follows in the job configuration file.

<property>
   <name>mapreduce.inputformat.class</name>
   <value>oracle.hadoop.loader.lib.input.DelimitedTextInputFormat</value>
</property>

With the delimited text file input the following properties may optionally be used to further describe the input file.

oracle.hadoop.loader.input.fieldTerminator
   A character that indicates the end of an input field. Default value is ','.

oracle.hadoop.loader.input.initialFieldEncloser
   A character that indicates the beginning of a field. No default value is defined.

oracle.hadoop.loader.input.trailingFieldEncloser
   A character that indicates the end of a field. No default value is defined.

oracle.hadoop.loader.input.fieldNames
   A comma-separated list of names for the input fields. Data in the delimited text file is read and assigned the names in the list in the same order; for example, the first field in the data file gets the field name that is first in the list. The default field names are F0, F1, F2, ... Fn. We shall demonstrate loading data with custom field names.

Specify the following properties for the delimited text file. The field terminator is specified in \uHHHH format, where HHHH is the character's UTF-16 code; \u002C is ','. The field names in the oracle.hadoop.loader.input.fieldNames property match the column names in the CATALOG table.

<property>
   <name>oracle.hadoop.loader.input.fieldTerminator</name>
   <value>\u002C</value>
</property>

<property>
   <name>oracle.hadoop.loader.input.fieldNames</name>
   <value>CATALOGID,JOURNAL,PUBLISHER,EDITION,TITLE,AUTHOR</value>
</property>
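As a quick check of these two properties, the snippet below (illustrative Python, not OLH code) decodes the \u002C terminator and assigns the configured field names to one record in order.

```python
# The \uHHHH terminator and the ordered field names from the job
# configuration, applied to one input record (illustrative only).
terminator = "\u002C"  # U+002C is ','
assert terminator == ","

field_names = "CATALOGID,JOURNAL,PUBLISHER,EDITION,TITLE,AUTHOR".split(",")
record = "1,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,Database Resource Manager,Kimberly Floss"

# Fields are assigned names in list order, as with the fieldNames property.
fields = dict(zip(field_names, record.split(terminator)))
print(fields["TITLE"])  # prints "Database Resource Manager"
```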

OLH supports the following output formats.

oracle.hadoop.loader.lib.output.JDBCOutputFormat
   Loads the data into a target table. Makes use of JDBC batching for optimization; if an error occurs during the loading, the rows that have already been loaded stay loaded and the remaining rows are not loaded.

oracle.hadoop.loader.lib.output.OCIOutputFormat
   Loads data into a partitioned table using one or more reducers.

oracle.hadoop.loader.lib.output.DelimitedTextOutputFormat
   Loads data into a delimited text file.

oracle.hadoop.loader.lib.output.DataPumpOutputFormat
   Loads data into a data pump file.

As we shall be loading data into a non-partitioned Oracle Database table, specify the output format as follows.

<property>
   <name>mapreduce.job.outputformat.class</name>
   <value>oracle.hadoop.loader.lib.output.JDBCOutputFormat</value>
</property>

The number of rows inserted per batch may optionally be specified with the oracle.hadoop.loader.connection.defaultExecuteBatch property; the default setting is 100.
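To see why a mid-load failure leaves earlier rows in the table, consider a toy simulation of per-batch commits (illustrative Python, not OLH code; batch_size plays the role of defaultExecuteBatch).

```python
# Toy simulation (not OLH code) of per-batch loading: each successful
# batch is committed, so a failure part-way through leaves the rows from
# earlier batches loaded and the remaining rows unloaded.
def load(rows, table, batch_size, fail_on=None):
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        if fail_on in batch:
            raise RuntimeError("batch update failed")  # batch not committed
        table.extend(batch)  # commit this batch

table = []
try:
    load(list(range(10)), table, batch_size=3, fail_on=7)
except RuntimeError:
    pass
print(table)  # prints [0, 1, 2, 3, 4, 5]: the first two batches stay loaded
```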

The input fields are mapped to the target database table using either automatic mapping or manual mapping. For automatic mapping the following are required.

  1. All columns in the target table are to be loaded.
  2. Input field names in the IndexedRecord input record match the target column names. IndexedRecord is the object used in the input format implementation.
  3. All fields that are mapped to DATE columns can be parsed using the same Java date format.

We are using a built-in (not custom) input format, DelimitedTextInputFormat, with which the IndexedRecord input object is used transparently and the default field names are F0, F1, ... Fn. For automatic mapping to work, the target table would therefore need columns named F0, F1, F2, ... Fn. As we created the CATALOG table with other column names, manual mapping is used, which requires the oracle.hadoop.loader.input.fieldNames property to be specified.

<property>
    <name>oracle.hadoop.loader.input.fieldNames</name>
     <value>CATALOGID,JOURNAL,PUBLISHER,EDITION,TITLE,AUTHOR</value>
</property>
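Why manual mapping is needed here can be seen from the default names DelimitedTextInputFormat assigns (a small illustration in Python, not OLH code).

```python
# Default field names are F0, F1, ... Fn, so automatic mapping would only
# work if the target table's columns were named F0, F1, ... (illustrative).
record = "1,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,Database Resource Manager,Kimberly Floss"
default_names = ["F%d" % i for i in range(len(record.split(",")))]
print(default_names)  # prints ['F0', 'F1', 'F2', 'F3', 'F4', 'F5']

# CATALOG's columns share no names with the defaults, hence manual mapping.
target_columns = ["CATALOGID", "JOURNAL", "PUBLISHER", "EDITION", "TITLE", "AUTHOR"]
assert not set(default_names) & set(target_columns)
```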

The target database table must be specified for the input fields to be mapped to Oracle Database columns. The oracle.hadoop.loader.loaderMap.targetTable property specifies the target table.

<property>
   <name>oracle.hadoop.loader.loaderMap.targetTable</name>
   <value>CATALOG</value>
</property>    

The connection information must also be configured. The following properties are provided for specifying the connection parameters.

oracle.hadoop.loader.connection.url
   The connection URL used to connect to Oracle Database.
   Setting: jdbc:oracle:thin:@${HOST}:${TCPPORT}/${SERVICE_NAME}

TCPPORT
   The port number to connect to.
   Setting: 1521

HOST
   The host name.
   Setting: 10.0.2.2

SERVICE_NAME
   The Oracle Database service name.
   Setting: ORCL

oracle.hadoop.loader.connection.user
   The user name or schema name.
   Setting: OE

oracle.hadoop.loader.connection.password
   Password.
   Setting: OE
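The ${HOST}, ${TCPPORT}, and ${SERVICE_NAME} placeholders in the connection URL are resolved from the properties of the same names. A minimal sketch of the substitution (illustrative Python, not OLH code):

```python
from string import Template

# How the placeholders in oracle.hadoop.loader.connection.url resolve
# against the HOST, TCPPORT, and SERVICE_NAME property settings above.
url = Template("jdbc:oracle:thin:@${HOST}:${TCPPORT}/${SERVICE_NAME}").substitute(
    HOST="10.0.2.2", TCPPORT="1521", SERVICE_NAME="ORCL")
print(url)  # prints "jdbc:oracle:thin:@10.0.2.2:1521/ORCL"
```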

The OraLoadJobConf.xml configuration file is listed below.

<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
 
<!--                          Input settings                             -->
 
 <property>
   <name>mapreduce.inputformat.class</name>
   <value>oracle.hadoop.loader.lib.input.DelimitedTextInputFormat</value>
 </property>
 
 <property>
   <name>mapred.input.dir</name>
   <value>/catalog</value>
 </property>
 
 <property>
   <name>oracle.hadoop.loader.input.fieldTerminator</name>
   <value>\u002C</value>
 </property>
 <property>
   <name>oracle.hadoop.loader.input.fieldNames</name>
   <value>CATALOGID,JOURNAL,PUBLISHER,EDITION,TITLE,AUTHOR</value>
 </property>
 
<!--                          Output settings                    -->         
 <property>
   <name>mapreduce.job.outputformat.class</name>
   <value>oracle.hadoop.loader.lib.output.JDBCOutputFormat</value>
 </property>
   
 <property>
   <name>mapreduce.output.fileoutputformat.outputdir</name>
   <value>oraloadout</value>
 </property>
 
<!--                          Table information     -->                       
 
 <property>
   <name>oracle.hadoop.loader.loaderMap.targetTable</name>
   <value>CATALOG</value>
 </property>      
   
 
 
<!--                          Connection information        -->              
 
<property>
  <name>oracle.hadoop.loader.connection.url</name>
  <value>jdbc:oracle:thin:@${HOST}:${TCPPORT}/${SERVICE_NAME}</value>
</property>
 
<property>
  <name>TCPPORT</name>
  <value>1521</value>
</property>
 
<property>
  <name>HOST</name>
  <value>10.0.2.2</value>
</property>
 
<property>
 <name>SERVICE_NAME</name>
 <value>ORCL</value>
</property>
 
<property>
  <name>oracle.hadoop.loader.connection.user</name>
  <value>OE</value>
</property>
 
<property>
  <name>oracle.hadoop.loader.connection.password</name>
  <value>OE</value>
</property>
 
</configuration>

Loading HDFS Data to Oracle Database

Having configured the Map Reduce job for OLH, we shall next run OLH to load data from HDFS into Oracle Database 12c. The database must be running before the load is started. Oracle Loader for Hadoop is run as a Hadoop job with the oracle.hadoop.loader.OraLoader class, using the following command syntax.

bin/hadoop jar oraloader.jar oracle.hadoop.loader.OraLoader -libjars
CustomInputFormat.jar -conf job-Conf.xml
-fs [<local|namenode:port>]
-jt [<local|jobtracker:port>]

The -libjars option specifies additional JARs required to run the hadoop command. The -conf option specifies the job configuration file. The -fs option specifies the Hadoop NameNode address and port, and the -jt option specifies the Hadoop JobTracker address and port. The -fs and -jt options are not required.

Run the Oracle Loader for Hadoop with the following command.

hadoop jar $OLH_HOME/jlib/oraloader.jar oracle.hadoop.loader.OraLoader -conf OraLoadJobConf.xml -libjars $OLH_HOME/jlib/oraloader.jar

Oracle Loader for Hadoop 3.0.0 gets started.

  

The Map Reduce job runs and data from HDFS gets loaded into Oracle Database 12c.

  

The complete output from the OLH job is listed:

root@ubuntu-VirtualBox:~# hadoop jar $OLH_HOME/jlib/oraloader.jar   oracle.hadoop.loader.OraLoader   -conf OraLoadJobConf.xml -libjars $OLH_HOME/jlib/oraloader.jar   
Oracle Loader for Hadoop Release 3.0.0 - Production

Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved.  
 
2014-04-28 23:17:19,987 INFO  loader.OraLoader (OraLoader.java:run(402)) - Oracle Loader for Hadoop Release 3.0.0 - Production
Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved.
2014-04-28 23:17:19,998 INFO  loader.OraLoader (OraLoader.java:run(403)) - Built-Against: hadoop-2.2.0-cdh5.0.0-beta-2 hive-0.12.0-cdh5.0.0-beta-2 avro-1.7.3 jackson-1.8.8  
2014-04-28 23:17:24,650 INFO  loader.OraLoader (OraLoader.java:run(466)) - oracle.hadoop.loader.loadByPartition is disabled because table: CATALOG is not partitioned
2014-04-28 23:17:24,776 INFO  output.DBOutputFormat (DBOutputFormat.java:checkJobInput(176)) - Setting reduce tasks speculative execution to false for : oracle.hadoop.loader.lib.output.JDBCOutputFormat
 
2014-04-28 23:17:26,458 INFO  loader.OraLoader (OraLoader.java:run(561)) - Sampling time=0D:0h:0m:0s:54ms (54 ms)
2014-04-28 23:17:26,463 INFO  loader.OraLoader (OraLoader.java:monitorAndPrintJob(648)) - Submitting OraLoader job OraLoader  
2014-04-28 23:17:26,978 INFO  jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2014-04-28 23:17:29,136 INFO  input.FileInputFormat (FileInputFormat.java:listStatus(245)) - Total input paths to process : 1
2014-04-28 23:17:29,566 INFO  mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(371)) - number of splits:1   
2014-04-28 23:17:30,669 INFO  mapreduce.JobSubmitter (JobSubmitter.java:printTokens(454)) - Submitting tokens for job: job_local953624128_0001
2014-04-28 23:17:33,549 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:symlink(202)) - Creating symlink: /root/build/test/mapred/local/-3565277347236921770/tableMetadata-a7bcca5d-d4bf-4304-b36a-ee2c7f7484f6 <- /root/tableMetadata-a7bcca5d-d4bf-4304-b36a-ee2c7f7484f6
2014-04-28 23:17:34,799 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setup(166)) - Localized file:/root/olhcache/tableMetadata-a7bcca5d-d4bf-4304-b36a-ee2c7f7484f6 as file:/root/build/test/mapred/local/-3565277347236921770/tableMetadata-a7bcca5d-d4bf-4304-b36a-ee2c7f7484f6
2014-04-28 23:17:38,335 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:symlink(202)) - Creating symlink: /root/build/test/mapred/local/-4312484096159851235/ora-hadoop-common.jar <- /root/ora-hadoop-common.jar
2014-04-28 23:17:38,814 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setup(166)) - Localized file:/root/olhcache/lib-800eb9f4-605a-401d-815e-f3b164d2b3e1/ora-hadoop-common.jar as file:/root/build/test/mapred/local/-4312484096159851235/ora-hadoop-common.jar
2014-04-28 23:17:39,743 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:symlink(202)) - Creating symlink: /root/build/test/mapred/local/-6470313814398159350/ojdbc6.jar <- /root/ojdbc6.jar
2014-04-28 23:17:39,811 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setup(166)) - Localized file:/root/olhcache/lib-800eb9f4-605a-401d-815e-f3b164d2b3e1/ojdbc6.jar as file:/root/build/test/mapred/local/-6470313814398159350/ojdbc6.jar
2014-04-28 23:17:39,812 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:symlink(202)) - Creating symlink: /root/build/test/mapred/local/7406111300499153058/orai18n.jar <- /root/orai18n.jar
2014-04-28 23:17:39,847 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setup(166)) - Localized file:/root/olhcache/lib-800eb9f4-605a-401d-815e-f3b164d2b3e1/orai18n.jar as file:/root/build/test/mapred/local/7406111300499153058/orai18n.jar
2014-04-28 23:17:39,849 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:symlink(202)) - Creating symlink: /root/build/test/mapred/local/6823321701753283874/orai18n-utility.jar <- /root/orai18n-utility.jar
2014-04-28 23:17:39,890 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setup(166)) - Localized file:/root/olhcache/lib-800eb9f4-605a-401d-815e-f3b164d2b3e1/orai18n-utility.jar as file:/root/build/test/mapred/local/6823321701753283874/orai18n-utility.jar
2014-04-28 23:17:39,895 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:symlink(202)) - Creating symlink: /root/build/test/mapred/local/604237540348944088/orai18n-mapping.jar <- /root/orai18n-mapping.jar
2014-04-28 23:17:39,925 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setup(166)) - Localized file:/root/olhcache/lib-800eb9f4-605a-401d-815e-f3b164d2b3e1/orai18n-mapping.jar as file:/root/build/test/mapred/local/604237540348944088/orai18n-mapping.jar
2014-04-28 23:17:39,933 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:symlink(202)) - Creating symlink: /root/build/test/mapred/local/8785254478692374558/orai18n-collation.jar <- /root/orai18n-collation.jar
2014-04-28 23:17:39,969 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setup(166)) - Localized file:/root/olhcache/lib-800eb9f4-605a-401d-815e-f3b164d2b3e1/orai18n-collation.jar as file:/root/build/test/mapred/local/8785254478692374558/orai18n-collation.jar
2014-04-28 23:17:40,000 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:symlink(202)) - Creating symlink: /root/build/test/mapred/local/-1294049037210381348/oraclepki.jar <- /root/oraclepki.jar
2014-04-28 23:17:40,052 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setup(166)) - Localized file:/root/olhcache/lib-800eb9f4-605a-401d-815e-f3b164d2b3e1/oraclepki.jar as file:/root/build/test/mapred/local/-1294049037210381348/oraclepki.jar
2014-04-28 23:17:40,053 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:symlink(202)) - Creating symlink: /root/build/test/mapred/local/-732057438188758231/osdt_cert.jar <- /root/osdt_cert.jar
2014-04-28 23:17:40,084 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setup(166)) - Localized file:/root/olhcache/lib-800eb9f4-605a-401d-815e-f3b164d2b3e1/osdt_cert.jar as file:/root/build/test/mapred/local/-732057438188758231/osdt_cert.jar
2014-04-28 23:17:40,087 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:symlink(202)) - Creating symlink: /root/build/test/mapred/local/-5329476776964839866/osdt_core.jar <- /root/osdt_core.jar
2014-04-28 23:17:40,138 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setup(166)) - Localized file:/root/olhcache/lib-800eb9f4-605a-401d-815e-f3b164d2b3e1/osdt_core.jar as file:/root/build/test/mapred/local/-5329476776964839866/osdt_core.jar
2014-04-28 23:17:40,155 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:symlink(202)) - Creating symlink: /root/build/test/mapred/local/-3976616872240980722/commons-math-2.2.jar <- /root/commons-math-2.2.jar
2014-04-28 23:17:40,187 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setup(166)) - Localized file:/root/olhcache/lib-800eb9f4-605a-401d-815e-f3b164d2b3e1/commons-math-2.2.jar as file:/root/build/test/mapred/local/-3976616872240980722/commons-math-2.2.jar
2014-04-28 23:17:40,188 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:symlink(202)) - Creating symlink: /root/build/test/mapred/local/-1851794532904303080/jackson-core-asl-1.8.8.jar <- /root/jackson-core-asl-1.8.8.jar
2014-04-28 23:17:40,238 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setup(166)) - Localized file:/root/olhcache/lib-800eb9f4-605a-401d-815e-f3b164d2b3e1/jackson-core-asl-1.8.8.jar as file:/root/build/test/mapred/local/-1851794532904303080/jackson-core-asl-1.8.8.jar
2014-04-28 23:17:40,239 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:symlink(202)) - Creating symlink: /root/build/test/mapred/local/6900390032396025019/jackson-mapper-asl-1.8.8.jar <- /root/jackson-mapper-asl-1.8.8.jar
2014-04-28 23:17:40,281 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setup(166)) - Localized file:/root/olhcache/lib-800eb9f4-605a-401d-815e-f3b164d2b3e1/jackson-mapper-asl-1.8.8.jar as file:/root/build/test/mapred/local/6900390032396025019/jackson-mapper-asl-1.8.8.jar
2014-04-28 23:17:40,291 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:symlink(202)) - Creating symlink: /root/build/test/mapred/local/-4035464625512297029/avro-1.7.3.jar <- /root/avro-1.7.3.jar
2014-04-28 23:17:40,331 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setup(166)) - Localized file:/root/olhcache/lib-800eb9f4-605a-401d-815e-f3b164d2b3e1/avro-1.7.3.jar as file:/root/build/test/mapred/local/-4035464625512297029/avro-1.7.3.jar
2014-04-28 23:17:40,344 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:symlink(202)) - Creating symlink: /root/build/test/mapred/local/7454283881570874826/avro-mapred-1.7.3-hadoop1.jar <- /root/avro-mapred-1.7.3-hadoop1.jar
2014-04-28 23:17:40,393 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setup(166)) - Localized file:/root/olhcache/lib-800eb9f4-605a-401d-815e-f3b164d2b3e1/avro-mapred-1.7.3-hadoop1.jar as file:/root/build/test/mapred/local/7454283881570874826/avro-mapred-1.7.3-hadoop1.jar
2014-04-28 23:17:40,398 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:symlink(202)) - Creating symlink: /root/build/test/mapred/local/-7543421878704020629/oraloader.jar <- /root/oraloader.jar
2014-04-28 23:17:40,442 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setup(166)) - Localized file:/root/oraloader-3.0.0-h2/jlib/oraloader.jar as file:/root/build/test/mapred/local/-7543421878704020629/oraloader.jar
2014-04-28 23:17:40,645 WARN  conf.Configuration (Configuration.java:warnOnceIfDeprecated(981)) - mapred.cache.localFiles is deprecated. Instead, use mapreduce.job.cache.local.files
2014-04-28 23:17:40,841 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:makeClassLoader(235)) - file:/root/build/test/mapred/local/-4312484096159851235/ora-hadoop-common.jar
2014-04-28 23:17:40,851 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:makeClassLoader(235)) - file:/root/build/test/mapred/local/-6470313814398159350/ojdbc6.jar
2014-04-28 23:17:40,852 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:makeClassLoader(235)) - file:/root/build/test/mapred/local/7406111300499153058/orai18n.jar
2014-04-28 23:17:40,854 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:makeClassLoader(235)) - file:/root/build/test/mapred/local/6823321701753283874/orai18n-utility.jar
2014-04-28 23:17:40,871 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:makeClassLoader(235)) - file:/root/build/test/mapred/local/604237540348944088/orai18n-mapping.jar
2014-04-28 23:17:40,874 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:makeClassLoader(235)) - file:/root/build/test/mapred/local/8785254478692374558/orai18n-collation.jar
2014-04-28 23:17:40,875 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:makeClassLoader(235)) - file:/root/build/test/mapred/local/-1294049037210381348/oraclepki.jar
2014-04-28 23:17:40,885 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:makeClassLoader(235)) - file:/root/build/test/mapred/local/-732057438188758231/osdt_cert.jar
2014-04-28 23:17:40,887 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:makeClassLoader(235)) - file:/root/build/test/mapred/local/-5329476776964839866/osdt_core.jar
2014-04-28 23:17:40,887 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:makeClassLoader(235)) - file:/root/build/test/mapred/local/-3976616872240980722/commons-math-2.2.jar
2014-04-28 23:17:40,894 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:makeClassLoader(235)) - file:/root/build/test/mapred/local/-1851794532904303080/jackson-core-asl-1.8.8.jar
2014-04-28 23:17:40,894 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:makeClassLoader(235)) - file:/root/build/test/mapred/local/6900390032396025019/jackson-mapper-asl-1.8.8.jar
2014-04-28 23:17:40,895 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:makeClassLoader(235)) - file:/root/build/test/mapred/local/-4035464625512297029/avro-1.7.3.jar
2014-04-28 23:17:40,895 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:makeClassLoader(235)) - file:/root/build/test/mapred/local/7454283881570874826/avro-mapred-1.7.3-hadoop1.jar
2014-04-28 23:17:40,895 INFO  mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:makeClassLoader(235)) - file:/root/build/test/mapred/local/-7543421878704020629/oraloader.jar
2014-04-28 23:17:40,984 INFO  mapreduce.Job (Job.java:submit(1290)) - The url to track the job: http://localhost:8080/
2014-04-28 23:17:41,003 INFO  mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(321)) - OutputCommitter set in config null
2014-04-28 23:17:41,129 INFO  mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(339)) - OutputCommitter is oracle.hadoop.loader.lib.output.DBOutputCommitter
2014-04-28 23:17:41,233 INFO  mapred.LocalJobRunner (LocalJobRunner.java:run(387)) - Waiting for map tasks
2014-04-28 23:17:41,241 INFO  mapred.LocalJobRunner (LocalJobRunner.java:run(214)) - Starting task: attempt_local953624128_0001_m_000000_0
2014-04-28 23:17:41,599 INFO  mapred.Task (Task.java:initialize(565)) -  Using ResourceCalculatorProcessTree : [ ]
2014-04-28 23:17:41,624 INFO  mapred.MapTask (MapTask.java:runNewMapper(726)) - Processing split: file:/catalog/catalog.txt:0+270
2014-04-28 23:17:42,218 INFO  loader.OraLoader (OraLoader.java:monitorAndPrintJob(698)) - map 0% reduce 0%
2014-04-28 23:17:42,947 INFO  mapred.MapTask (MapTask.java:createSortingCollector(386)) - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2014-04-28 23:17:47,445 INFO  mapred.MapTask (MapTask.java:setEquator(1168)) - (EQUATOR) 0 kvi 26214396(104857584)
2014-04-28 23:17:47,450 INFO  mapred.MapTask (MapTask.java:init(960)) - mapreduce.task.io.sort.mb: 100
2014-04-28 23:17:47,451 INFO  mapred.MapTask (MapTask.java:init(961)) - soft limit at 83886080
2014-04-28 23:17:47,455 INFO  mapred.MapTask (MapTask.java:init(962)) - bufstart = 0; bufvoid = 104857600
2014-04-28 23:17:47,456 INFO  mapred.MapTask (MapTask.java:init(963)) - kvstart = 26214396; length = 6553600
2014-04-28 23:17:48,157 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(502)) -  
2014-04-28 23:17:48,175 INFO  mapred.MapTask (MapTask.java:flush(1425)) - Starting flush of map output
2014-04-28 23:17:48,181 INFO  mapred.MapTask (MapTask.java:flush(1444)) - Spilling map output
2014-04-28 23:17:48,183 INFO  mapred.MapTask (MapTask.java:flush(1445)) - bufstart = 0; bufend = 414; bufvoid = 104857600
2014-04-28 23:17:48,187 INFO  mapred.MapTask (MapTask.java:flush(1447)) - kvstart = 26214396(104857584); kvend = 26214384(104857536); length = 13/6553600
2014-04-28 23:17:48,328 INFO  mapred.MapTask (MapTask.java:sortAndSpill(1633)) - Finished spill 0
2014-04-28 23:17:48,428 INFO  mapred.Task (Task.java:done(979)) - Task:attempt_local953624128_0001_m_000000_0 is done. And is in the process of committing
2014-04-28 23:17:48,495 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(502)) - map
2014-04-28 23:17:48,497 INFO  mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local953624128_0001_m_000000_0' done.
2014-04-28 23:17:48,497 INFO  mapred.LocalJobRunner (LocalJobRunner.java:run(239)) - Finishing task: attempt_local953624128_0001_m_000000_0
2014-04-28 23:17:48,502 INFO  mapred.LocalJobRunner (LocalJobRunner.java:run(395)) - Map task executor complete.
2014-04-28 23:17:48,667 INFO  mapred.Task (Task.java:initialize(565)) -  Using ResourceCalculatorProcessTree : [ ]
2014-04-28 23:17:48,814 INFO  mapred.Merger (Merger.java:merge(568)) - Merging 1 sorted segments
2014-04-28 23:17:48,853 INFO  mapred.Merger (Merger.java:merge(667)) - Down to the last merge-pass, with 1 segments left of total size: 409 bytes
2014-04-28 23:17:48,863 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(502)) -  
2014-04-28 23:17:49,644 INFO  loader.OraLoader (OraLoader.java:monitorAndPrintJob(698)) - map 100% reduce 0%
2014-04-28 23:17:49,746 INFO  output.DBOutputFormat (DBOutputFormat.java:<init>(479)) - conf prop: defaultExecuteBatch: 100
2014-04-28 23:17:49,747 INFO  output.DBOutputFormat (DBOutputFormat.java:<init>(480)) - conf prop: loadByPartition: false
2014-04-28 23:17:49,758 WARN  conf.Configuration (Configuration.java:warnOnceIfDeprecated(981)) - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2014-04-28 23:17:49,898 INFO  output.DBOutputFormat (DBOutputFormat.java:getInsertStmt(759)) - Insert statement: INSERT INTO "OE"."CATALOG" ("CATALOGID", "JOURNAL", "PUBLISHER", "EDITION", "TITLE", "AUTHOR") VALUES (?, ?, ?, ?, ?, ?)
2014-04-28 23:17:51,692 ERROR output.JDBCOutputFormat (JDBCOutputFormat.java:executeBatch(431)) - JDBC BatchUpdateException in batch of size 4
2014-04-28 23:17:51,693 ERROR output.JDBCOutputFormat (JDBCOutputFormat.java:executeBatch(432)) - java.sql.BatchUpdateException: ORA-01400: cannot insert NULL into ("OE"."CATALOG"."CATALOGID")
2014-04-28 23:17:52,030 INFO  mapred.Task (Task.java:done(979)) - Task:attempt_local953624128_0001_r_000000_0 is done. And is in the process of committing
2014-04-28 23:17:52,035 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(502)) -  
2014-04-28 23:17:52,047 INFO  mapred.Task (Task.java:commit(1140)) - Task attempt_local953624128_0001_r_000000_0 is allowed to commit now
2014-04-28 23:17:52,059 INFO  output.JDBCOutputFormat (JDBCOutputFormat.java:commitTask(154)) - Committed work for task attempt attempt_local953624128_0001_r_000000_0
2014-04-28 23:17:52,071 INFO  output.FileOutputCommitter (FileOutputCommitter.java:commitTask(432)) - Saved output of task 'attempt_local953624128_0001_r_000000_0' to file:/root/oraloadout/_temporary/0/task_local953624128_0001_r_000000
2014-04-28 23:17:52,082 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(502)) - reduce > reduce
2014-04-28 23:17:52,086 INFO  mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local953624128_0001_r_000000_0' done.
2014-04-28 23:17:52,692 INFO  loader.OraLoader (OraLoader.java:monitorAndPrintJob(698)) - map 100% reduce 100%
2014-04-28 23:17:52,695 INFO  loader.OraLoader (OraLoader.java:monitorAndPrintJob(709)) - Job complete: OraLoader (job_local953624128_0001)
2014-04-28 23:17:52,701 INFO  loader.OraLoader (OraLoader.java:monitorAndPrintJob(711)) - Counters: 27
    File System Counters
        FILE: Number of bytes read=41834270
        FILE: Number of bytes written=42452558
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Map input records=4
        Map output records=4
        Map output bytes=414
        Map output materialized bytes=428
        Input split bytes=90
        Combine input records=0
        Combine output records=0
        Reduce input groups=1
        Reduce shuffle bytes=0
        Reduce input records=4
        Reduce output records=4
        Spilled Records=8
        Shuffled Maps =0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=429
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=245309440
    File Input Format Counters  
        Bytes Read=290
    File Output Format Counters  
        Bytes Written=1819

Subsequently, run a SELECT query in SQL*Plus to list the rows loaded from HDFS into Oracle Database using OLH.
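The query is a plain SELECT on the target table. Assuming the OE.CATALOG table created in the earlier section (column names as they appear in the INSERT statement in the job output above), it may be run as follows; it requires a live connection to the Oracle Database instance.

```sql
-- Run in SQL*Plus as a user with SELECT privilege on OE.CATALOG.
-- Column names match the INSERT statement shown in the job output.
SELECT catalogid, journal, publisher, edition, title, author
FROM   OE.CATALOG
ORDER  BY catalogid;
```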


If automatic mapping is used, in which the oracle.hadoop.loader.input.fieldNames property is not specified in the job configuration file, the default field names F0, F1, …, Fn are generated, and the Oracle Database table must have columns with those same names F0, F1, …, Fn.
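As an illustration, a target table for the six-field catalog input under automatic mapping would need columns named F0 through F5. The table name and column types below are hypothetical, not taken from the earlier sections.

```sql
-- Hypothetical target table for automatic field mapping:
-- column names must match the generated field names F0..F5.
CREATE TABLE OE.CATALOG_AUTO (
  F0 VARCHAR2(50),
  F1 VARCHAR2(100),
  F2 VARCHAR2(100),
  F3 VARCHAR2(50),
  F4 VARCHAR2(255),
  F5 VARCHAR2(100)
);
```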


The input file catalog.txt contains an empty line, which does not constitute a valid row. As indicated in the MapReduce output, an error is generated for the empty line:

2014-04-28 23:17:51,692 ERROR output.JDBCOutputFormat (JDBCOutputFormat.java:executeBatch(431)) - JDBC BatchUpdateException in batch of size 4
2014-04-28 23:17:51,693 ERROR output.JDBCOutputFormat (JDBCOutputFormat.java:executeBatch(432)) - java.sql.BatchUpdateException: ORA-01400: cannot insert NULL into ("OE"."CATALOG"."CATALOGID")

However, the rows that are valid according to the JDBCOutputFormat are loaded; only the empty line is rejected.
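One way to avoid the error altogether is to strip empty lines from the input file before copying it to HDFS. A minimal sketch with standard shell tools follows; the sample rows are illustrative, and catalog_clean.txt is a hypothetical output file name.

```shell
# Create an illustrative delimited file with an empty line in it,
# standing in for the catalog.txt from the earlier sections.
printf '1,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,Database Resource Manager,Kimberly Floss\n\n2,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,From ADF UIX to JSF,Jonas Jacobi\n' > catalog.txt

# Drop empty and whitespace-only lines before loading with OLH.
grep -v '^[[:space:]]*$' catalog.txt > catalog_clean.txt

cat catalog_clean.txt
```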

In this article we loaded HDFS data from a delimited text file into Oracle Database 12c using Oracle Loader for Hadoop.