By Deepak Vohra

 
Oracle Loader for Hadoop (OLH) loads data from several input sources, including HDFS files, Hive tables, Avro files, and Oracle NoSQL Database, into Oracle Database; it can also write to other output formats such as Data Pump files and delimited text files. Apache Solr is not one of the supported input formats, but a Hive table may be created over a Solr collection and the Hive table data loaded into Oracle Database using OLH. In this tutorial we shall first store data in Solr, then create a Hive table over the Solr collection, and finally load the Hive table data into Oracle Database using Oracle Loader for Hadoop 3.0.0.
 
This tutorial has the following sections.

  Setting the Environment
  Creating Oracle Database Table
  Configuring Solr
  Storing Data in Solr
  Creating Hive Table
  Loading Hive Table Data into Oracle Database
 
 

Setting the Environment

 
This tutorial uses Oracle Linux 6 running on Oracle VirtualBox 4.3, with Oracle Database 11g installed on Oracle Linux 6. We need to download and install the following software on Oracle Linux 6.
 
  1. Oracle Loader for Hadoop 3.0.0
  2. Oracle Database 11g
  3. Apache Solr 4.9
  4. Hadoop 2.0.0 CDH 4.6
  5. Hive 0.10.0 CDH 4.6
  6. Java 7
 
Create a directory /solr to install the software and set its permissions.
 
mkdir /solr
chmod -R 777 /solr
cd /solr
 
Download the Apache Solr 4.9 solr-4.9.0.tgz file from http://archive.apache.org/dist/lucene/solr/4.9.0/. Extract the tgz file to the /solr directory.
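The archive may be fetched with wget, as with the other downloads in this tutorial.

/solr>wget http://archive.apache.org/dist/lucene/solr/4.9.0/solr-4.9.0.tgz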
 
>cd /solr
>tar -xvf solr-4.9.0.tgz
 
Also download the Hive Storage Handler for Solr using the following command.
 
>git clone https://github.com/chimpler/hive-solr
 
The Hive storage handler for Solr gets downloaded into the hive-solr directory.
 
 
From the hive-solr directory, run the mvn package command to build the Hive Storage Handler for Solr jar file.
 
>cd hive-solr
>mvn package
 
The hive-solr-0.0.1-SNAPSHOT-jar-with-dependencies.jar gets generated in the target directory.
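The build may be verified by listing the target directory.

>ls target/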
 
 
Download Java 7 from http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html and extract the gz file to the /solr directory.
 
/solr>tar zxvf jdk-7u55-linux-i586.gz
 
Download Hadoop 2.0.0 CDH 4.6 and extract the tar.gz file to the /solr directory.
 
/solr>wget http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.6.0.tar.gz
/solr>tar -xvf hadoop-2.0.0-cdh4.6.0.tar.gz
 
Create symlinks for the Hadoop 2.0.0 bin directory and the conf directory.
 
/solr>ln -s /solr/hadoop-2.0.0-cdh4.6.0/bin /solr/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce2/bin
/solr>ln -s /solr/hadoop-2.0.0-cdh4.6.0/etc/hadoop /solr/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce2/conf
 
Download the Oracle Loader for Hadoop 3.0.0 from http://www.oracle.com/technetwork/database/database-technologies/bdc/big-data-connectors/downloads/index.html and extract the oraloader-3.0.0-h2.x86_64.zip file to the /solr directory.
 
/solr>unzip oraloader-3.0.0-h2.x86_64.zip
 
Download the Hive 0.10.0 CDH 4.6 file and extract the tar.gz file to the /solr directory.
 
/solr>wget http://archive.cloudera.com/cdh4/cdh/4/hive-0.10.0-cdh4.6.0.tar.gz
/solr>tar -xvf hive-0.10.0-cdh4.6.0.tar.gz
 
Set the fs.defaultFS and hadoop.tmp.dir configuration properties in the /solr/hadoop-2.0.0-cdh4.6.0/etc/hadoop/core-site.xml file.
 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://10.0.2.15:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:///var/lib/hadoop-0.20/cache</value>
  </property>
</configuration>
 
Create the Hadoop tmp directory and set its permissions.
 
mkdir -p /var/lib/hadoop-0.20/cache
chmod -R 777 /var/lib/hadoop-0.20/cache
 
Set the HDFS configuration properties in the /solr/hadoop-2.0.0-cdh4.6.0/etc/hadoop/hdfs-site.xml file.
 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.permissions.superusergroup</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/1/dfs/nn</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
 
Create the NameNode storage directory and set its permissions.
 
mkdir -p /data/1/dfs/nn
chmod -R 777 /data/1/dfs/nn
 
Create the Hive configuration file hive-site.xml by copying from the hive-default.xml.template file.
 
cp /solr/hive-0.10.0-cdh4.6.0/conf/hive-default.xml.template /solr/hive-0.10.0-cdh4.6.0/conf/hive-site.xml
 
Set the Hive configuration properties in the hive-site.xml file.
 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://10.0.2.15:8020/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:10000</value>
  </property>
</configuration>
 
Set the environment variables in the bash shell for Oracle Database, Hive, Hadoop, Apache Solr, Hive Storage Handler for Solr and Java.
 
vi ~/.bashrc
export HADOOP_PREFIX=/solr/hadoop-2.0.0-cdh4.6.0
export HADOOP_CONF=$HADOOP_PREFIX/etc/hadoop
export HIVE_HOME=/solr/hive-0.10.0-cdh4.6.0
export HIVE_CONF=$HIVE_HOME/conf
export OLH_HOME=/solr/oraloader-3.0.0-h2
export JAVA_HOME=/solr/jdk1.7.0_55
export ORACLE_HOME=/home/oracle/app/oracle/product/11.2.0/dbhome_1
export ORACLE_SID=ORCL
export HADOOP_MAPRED_HOME=/solr/hadoop-2.0.0-cdh4.6.0/bin
export HADOOP_HOME=/solr/hadoop-2.0.0-cdh4.6.0/share/hadoop/mapreduce2
export HADOOP_CLASSPATH=$HADOOP_HOME/*:$HADOOP_HOME/lib/*:$HIVE_HOME/lib/*:$OLH_HOME/jlib/*:$HIVE_CONF:/solr/hive-solr-0.0.1-SNAPSHOT-jar-with-dependencies.jar
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_MAPRED_HOME:$ORACLE_HOME/bin:$HIVE_HOME/bin
export CLASSPATH=$HADOOP_CLASSPATH
 
Close and restart the Linux command shell, or run source ~/.bashrc, for the environment variables to take effect. Copy the hive-solr-0.0.1-SNAPSHOT-jar-with-dependencies.jar file to the Hive lib directory.
 
/solr> cp hive-solr-0.0.1-SNAPSHOT-jar-with-dependencies.jar /solr/hive-0.10.0-cdh4.6.0/lib
 
Format the NameNode and start HDFS, which comprises the NameNode and the DataNode. As the NameNode and DataNode commands run in the foreground, run each in a separate terminal.
 
hadoop namenode -format
hadoop namenode
hadoop datanode
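 
That HDFS has started may be verified by listing the root directory.
 
hadoop dfs -ls hdfs://10.0.2.15:8020/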
 
Create the Hive warehouse directory and set its permissions.
 
hadoop dfs -mkdir hdfs://10.0.2.15:8020/user/hive/warehouse
hadoop dfs -chmod -R g+w hdfs://10.0.2.15:8020/user/hive/warehouse
 
We need to copy the Oracle Loader for Hadoop and Hive directories to HDFS so that they are available in the classpath at runtime when OLH is run. Create a directory /solr in HDFS and set its permissions.
 
hdfs dfs -mkdir hdfs://localhost:8020/solr
hdfs dfs -chmod -R g+w hdfs://localhost:8020/solr
 
Copy the OLH and Hive directories to the /solr directory in the HDFS.
 
hdfs dfs -put /solr/oraloader-3.0.0-h2 hdfs://localhost:8020/solr
hdfs dfs -put /solr/hive-0.10.0-cdh4.6.0 hdfs://localhost:8020/solr
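 
The copied directories may be verified with an ls command.
 
hdfs dfs -ls hdfs://localhost:8020/solr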
 

Creating Oracle Database Table

 
We shall load the following log data from Solr to Oracle Database.
 
Apr-8-2014-7:06:16-PM-PDT Notice WebLogicServer AdminServer BEA-000365 Server state changed to STANDBY
Apr-8-2014-7:06:17-PM-PDT Notice WebLogicServer AdminServer BEA-000365 Server state changed to STARTING
Apr-8-2014-7:06:18-PM-PDT Notice WebLogicServer AdminServer BEA-000365 Server state changed to ADMIN
Apr-8-2014-7:06:19-PM-PDT Notice WebLogicServer AdminServer BEA-000365 Server state changed to RESUMING
Apr-8-2014-7:06:20-PM-PDT Notice WebLogicServer AdminServer BEA-000331 Started WebLogic AdminServer
Apr-8-2014-7:06:21-PM-PDT Notice WebLogicServer AdminServer BEA-000365 Server state changed to RUNNING
Apr-8-2014-7:06:22-PM-PDT Notice WebLogicServer AdminServer BEA-000360 Server started in RUNNING mode
 
Create an Oracle Database table OE.WLSLOG using the following SQL script in SQL*Plus.
 
CREATE TABLE OE.wlslog (time_stamp VARCHAR2(255), category VARCHAR2(255),
  type VARCHAR2(255), servername VARCHAR2(255), code VARCHAR2(255),
  msg VARCHAR2(255));
 
The Oracle Database table gets created and its structure may be listed with the DESC command.
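For example, in SQL*Plus:
 
DESC OE.WLSLOG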
 
 

Configuring Solr

 
We shall store the log data in Solr in XML format with the fields time_stamp, category, type, servername, code, and msg. We need to declare the fields in the schema.xml file in the /solr/solr-4.9.0/example/solr/collection1/conf directory. Add the following <field/> elements to the schema.xml file. The fields may be indexed or non-indexed; we have set indexed to false.
 
<field name="time_stamp" type="string" indexed="false" stored="true" multiValued="false" />
<field name="category" type="string" indexed="false" stored="true" multiValued="false" />
<field name="type" type="string" indexed="false" stored="true" multiValued="false" />
<field name="servername" type="string" indexed="false" stored="true" multiValued="false" />
<field name="code" type="string" indexed="false" stored="true" multiValued="false" />
<field name="msg" type="string" indexed="false" stored="true" multiValued="false" />
 
The schema.xml file already defines a category field. As duplicate field definitions are not supported, remove the pre-existing category <field/> element. Documents in Solr require the id field by default, but the log data we shall be storing does not include an id field. For Solr to be able to store documents without the id field, some additional configuration is needed. Comment out the following elements in the schema.xml file.
 
<!--
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />-->
<!--
<uniqueKey>id</uniqueKey> -->
 
As documents stored in Solr automatically get a _version_ field, and the log data we shall be storing does not include a _version_ field, also comment out the _version_ field in the schema.xml file.
 
<!--<field name="_version_" type="long" indexed="true" stored="true"/>-->
 
If the id and _version_ fields are removed, we also need to comment out the following elements in the solrconfig.xml file in the /solr/solr-4.9.0/example/solr/collection1/conf directory. Note that when commenting out the elevator search component, the inner "pick a fieldType" comment must be removed, as XML comments do not nest.
 
<!-- <updateLog>
<str name="dir">${solr.ulog.dir:}</str>
</updateLog>-->
 
<!-- <requestHandler name="/get" class="solr.RealTimeGetHandler">
<lst name="defaults">
<str name="omitHeader">true</str>
<str name="wt">json</str>
<str name="indent">true</str>
</lst>
</requestHandler> -->
 
<!-- <searchComponent name="elevator" class="solr.QueryElevationComponent" >
<str name="queryFieldType">string</str>
<str name="config-file">elevate.xml</str>
</searchComponent>
 
<requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="df">text</str>
</lst>
<arr name="last-components">
<str>elevator</str>
</arr>
</requestHandler> -->
 
After making the configuration modifications, start the Solr server. If Solr was already running, restart it.
 
>cd /solr/solr-4.9.0/example/
>java -jar start.jar
 
Apache Solr server gets started.
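 
That the server is running may be verified with a ping request; a quick check, assuming the example configuration's default /admin/ping request handler.
 
>curl "http://localhost:8983/solr/collection1/admin/ping"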
 
 

Storing Data in Solr

 
In this section we shall store data in Solr using the Solr Admin Console. We shall store the log data in the following XML format consisting of seven documents.
 
<add>
<doc>
 
<field name="time_stamp">Apr-8-2014-7:06:16-PM-PDT</field>
<field name="category">Notice</field>
<field name="type">WebLogicServer</field>
<field name="servername">AdminServer</field>
<field name="code">BEA-000365</field>
<field name="msg">Server state changed to STANDBY</field>
</doc>
<doc>
<field name="time_stamp">Apr-8-2014-7:06:17-PM-PDT</field>
<field name="category">Notice</field>
<field name="type">WebLogicServer</field>
<field name="servername">AdminServer</field>
<field name="code">BEA-000365</field>
<field name="msg">Server state changed to STARTING</field>
</doc>
<doc>
<field name="time_stamp">Apr-8-2014-7:06:18-PM-PDT</field>
<field name="category">Notice</field>
<field name="type">WebLogicServer</field>
<field name="servername">AdminServer</field>
<field name="code">BEA-000365</field>
<field name="msg">Server state changed to ADMIN</field>
</doc>
<doc>
<field name="time_stamp">Apr-8-2014-7:06:19-PM-PDT</field>
<field name="category">Notice</field>
<field name="type">WebLogicServer</field>
<field name="servername">AdminServer</field>
<field name="code">BEA-000365</field>
<field name="msg">Server state changed to RESUMING</field>
</doc>
 
<doc>
<field name="time_stamp">Apr-8-2014-7:06:20-PM-PDT</field>
<field name="category">Notice</field>
<field name="type">WebLogicServer</field>
<field name="servername">AdminServer</field>
<field name="code">BEA-000331</field>
<field name="msg">Started WebLogic AdminServer</field>
</doc>
<doc>
<field name="time_stamp">Apr-8-2014-7:06:21-PM-PDT</field>
<field name="category">Notice</field>
<field name="type">WebLogicServer</field>
<field name="servername">AdminServer</field>
<field name="code">BEA-000365</field>
<field name="msg">Server state changed to RUNNING</field>
</doc>
<doc>
<field name="time_stamp">Apr-8-2014-7:06:22-PM-PDT</field>
<field name="category">Notice</field>
<field name="type">WebLogicServer</field>
<field name="servername">AdminServer</field>
<field name="code">BEA-000360</field>
<field name="msg">Server started in RUNNING mode</field>
</doc>
</add>
 
Open the Solr Admin Console at the URL http://localhost:8983/solr. In the Solr Admin Console select the Solr collection collection1. Click on Documents to select the /update request handler input form for adding new documents to Solr. Select Document Type as Solr Command and add the XML document previously listed to the Document(s) field. Click on the Submit Document button.
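 
Alternatively, the documents may be posted from the command line; a sketch using curl, assuming the XML listed above has been saved to a file wlslog.xml.
 
>curl "http://localhost:8983/solr/collection1/update?commit=true" -H "Content-Type: text/xml" --data-binary @wlslog.xml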
 
 
As indicated by the Response, the documents get stored in Solr.
 
 
Subsequently select the Query option or specify the URL http://localhost:8983/solr/#/collection1/query in the browser. The default query specified in the q field is *:*, which selects all documents. The request handler specified in the qt field is /select. Click on the Execute Query button to run the query.
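 
The same query may also be run over HTTP against the /select handler, for example with curl.
 
>curl "http://localhost:8983/solr/collection1/select?q=*:*&wt=json&indent=true"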
 
 
The response lists 7 documents in JSON format.
 
 
As we removed the id and _version_ fields, the documents do not contain those fields.
 
 

Creating Hive Table

 
In this section we shall create a Hive external table over the Solr collection. While OLH is able to load data from a Hive external table created over a JSON or XML file, it is not able to load data from a Hive external table created directly over a Solr collection. We shall therefore create a Hive external table over Solr and subsequently create another Hive table (managed or external) from the data in the first table; OLH loads from the second table. To create the Hive external table over Solr we shall use the com.chimpler.hive.solr.SolrStorageHandler storage handler class, with the following properties mapping the Solr data into Hive.
 
Property             Description                                            Value
solr.column.mapping  Specifies the mapping of Solr fields to Hive columns.  time_stamp,category,type,servername,code,msg
solr.url             Specifies the Solr collection URL.                     http://localhost:8983/solr/collection1
 
Start the Hive server with the following command.
 
>hive --service hiveserver
 
Start the Hive command shell with the following command.
 
>hive
 
Run the following command in Hive to add the Hive storage handler for Solr to the Hive shell classpath.
 
>ADD JAR hive-solr-0.0.1-SNAPSHOT-jar-with-dependencies.jar;
 
Run the following command in the Hive shell to create a Hive external table wlslog over the Solr collection1.
 
hive> create external table wlslog (
time_stamp STRING,
category STRING,
type STRING,
servername STRING,
code STRING,
msg STRING) stored by "com.chimpler.hive.solr.SolrStorageHandler"
with serdeproperties ("solr.column.mapping"="time_stamp,category,type,servername,code,msg")
tblproperties ("solr.url" = "http://localhost:8983/solr/collection1");
 
The Hive external table wlslog gets created in the default database. Run the following command to list the data in the Hive table wlslog.
 
hive>select * from wlslog;
 
The output from the various Hive shell commands is shown below. The SELECT command lists the Solr data.
 
 
As mentioned earlier, OLH is not able to load data directly from a Hive external table created over a Solr collection. We need to create another Hive table and load data into it from the first Hive table. Create another Hive table, solr, with the same columns and column types as the wlslog table, using the following command in the Hive shell.
 
create external table solr (
time_stamp STRING,
category STRING,
type STRING,
servername STRING,
code STRING,
msg STRING);
 
The second Hive table gets created.
 
 
Next, run the following command in Hive shell to load data from the wlslog table to the solr table.
 
hive>INSERT OVERWRITE TABLE solr SELECT * FROM wlslog;
 
Data from the wlslog table gets loaded into the solr table.
 
 
Subsequently run a select query on the solr table to list the data loaded from the Hive external table wlslog.
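 
hive>select * from solr;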
 
 

Loading Hive Table Data into Oracle Database

 
In this section we shall load data from the Hive table solr into the Oracle Database table OE.WLSLOG using Oracle Loader for Hadoop. As in earlier OLH tutorials, we need a configuration file for OLH. Specify the input format class as oracle.hadoop.loader.lib.input.HiveToAvroInputFormat, the Hive database name as default, and the Hive table name as solr. Specify the output format class as oracle.hadoop.loader.lib.output.JDBCOutputFormat and the target database table as OE.WLSLOG. The output directory of the MapReduce job and the connection parameters for Oracle Database are also specified in the following configuration file, OraLoadConf.xml.
 
<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
  <!-- Input settings -->
  <property>
    <name>mapreduce.inputformat.class</name>
    <value>oracle.hadoop.loader.lib.input.HiveToAvroInputFormat</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.input.hive.databaseName</name>
    <value>default</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.input.hive.tableName</name>
    <value>solr</value>
  </property>
  <!-- Output settings -->
  <property>
    <name>mapreduce.job.outputformat.class</name>
    <value>oracle.hadoop.loader.lib.output.JDBCOutputFormat</value>
  </property>
  <property>
    <name>mapreduce.output.fileoutputformat.outputdir</name>
    <value>oraloadout</value>
  </property>
  <!-- Table information -->
  <property>
    <name>oracle.hadoop.loader.loaderMap.targetTable</name>
    <value>OE.WLSLOG</value>
  </property>
  <!-- Connection information -->
  <property>
    <name>oracle.hadoop.loader.connection.url</name>
    <value>jdbc:oracle:thin:@${HOST}:${TCPPORT}:${SID}</value>
  </property>
  <property>
    <name>TCPPORT</name>
    <value>1521</value>
  </property>
  <property>
    <name>HOST</name>
    <value>localhost</value>
  </property>
  <property>
    <name>SID</name>
    <value>ORCL</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.connection.user</name>
    <value>OE</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.connection.password</name>
    <value>OE</value>
  </property>
</configuration>
 
Before running OLH, log out of the Hive shell and the SQL*Plus command-line utility, as OLH needs to connect to the Hive server and Oracle Database to load the data from Hive into Oracle Database. Run the following command to run OLH, with the configuration file specified in the -conf option.
 
hadoop jar $OLH_HOME/jlib/oraloader.jar oracle.hadoop.loader.OraLoader -conf OraLoadConf.xml -libjars $OLH_HOME/jlib/oraloader.jar
 
The Oracle Loader for Hadoop gets started.
 
 
The MapReduce job runs to load data from Hive to Oracle Database.
 
 
A more detailed output from the OLH command is listed:
 
[root@localhost solr]# hadoop jar $OLH_HOME/jlib/oraloader.jar oracle.hadoop.loader.OraLoader -conf OraLoadConf.xml -libjars $OLH_HOME/jlib/oraloader.jar
Oracle Loader for Hadoop Release 3.0.0 - Production
Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved.
14/09/04 17:45:00 INFO loader.OraLoader: Oracle Loader for Hadoop Release 3.0.0 - Production
Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved.
14/09/04 17:45:11 INFO loader.OraLoader: oracle.hadoop.loader.loadByPartition is disabled because table: WLSLOG is not partitioned
14/09/04 17:45:11 INFO loader.OraLoader: oracle.hadoop.loader.enableSorting disabled, no sorting key provided
14/09/04 17:45:11 INFO loader.OraLoader: Reduce tasks set to 0 because of no partitioning or sorting. Loading will be done in the map phase.
14/09/04 17:45:11 INFO output.DBOutputFormat: Setting map tasks speculative execution to false for : oracle.hadoop.loader.lib.output.JDBCOutputFormat
14/09/04 17:45:11 WARN conf.Configuration: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
14/09/04 17:45:16 INFO loader.OraLoader: Sampling time=0D:0h:0m:0s:121ms (121 ms)
14/09/04 17:45:16 INFO loader.OraLoader: Submitting OraLoader job OraLoader
14/09/04 17:45:16 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
14/09/04 17:45:20 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost:10000
14/09/04 17:45:20 INFO hive.metastore: Connected to metastore.
14/09/04 17:45:31 INFO mapred.FileInputFormat: Total input paths to process : 1
14/09/04 17:45:33 INFO mapreduce.JobSubmitter: number of splits:1
14/09/04 17:45:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local734080747_0001
14/09/04 17:45:49 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
14/09/04 17:45:49 INFO mapred.LocalJobRunner: OutputCommitter set in config null
14/09/04 17:45:50 INFO mapred.LocalJobRunner: OutputCommitter is oracle.hadoop.loader.lib.output.DBOutputCommitter
14/09/04 17:45:50 INFO mapred.LocalJobRunner: Waiting for map tasks
14/09/04 17:45:50 INFO mapred.LocalJobRunner: Starting task: attempt_local734080747_0001_m_000000_0
14/09/04 17:45:50 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
14/09/04 17:45:50 INFO mapred.MapTask: Processing split: hdfs://10.0.2.15:8020/user/hive/warehouse/solr/000000_0:0+717
14/09/04 17:45:51 INFO loader.OraLoader: map 0% reduce 0%
14/09/04 17:45:53 INFO output.DBOutputFormat: conf prop: defaultExecuteBatch: 100
14/09/04 17:45:53 INFO output.DBOutputFormat: conf prop: loadByPartition: false
14/09/04 17:45:53 INFO output.DBOutputFormat: Insert statement: INSERT INTO "OE"."WLSLOG" ("TIME_STAMP", "CATEGORY", "TYPE", "SERVERNAME", "CODE", "MSG") VALUES (?, ?, ?, ?, ?, ?)
14/09/04 17:45:53 INFO mapred.LocalJobRunner:
14/09/04 17:45:59 INFO mapred.LocalJobRunner: map
14/09/04 17:45:59 INFO loader.OraLoader: map 100% reduce 0%
14/09/04 17:46:01 INFO mapred.Task: Task:attempt_local734080747_0001_m_000000_0 is done. And is in the process of committing
14/09/04 17:46:01 INFO mapred.LocalJobRunner: map
14/09/04 17:46:01 INFO mapred.Task: Task attempt_local734080747_0001_m_000000_0 is allowed to commit now
14/09/04 17:46:01 INFO output.JDBCOutputFormat: Committed work for task attempt attempt_local734080747_0001_m_000000_0
14/09/04 17:46:01 INFO output.FileOutputCommitter: Saved output of task 'attempt_local734080747_0001_m_000000_0' to hdfs://10.0.2.15:8020/user/root/oraloadout/_temporary/0/task_local734080747_0001_m_000000
14/09/04 17:46:01 INFO mapred.LocalJobRunner: map
14/09/04 17:46:01 INFO mapred.Task: Task 'attempt_local734080747_0001_m_000000_0' done.
14/09/04 17:46:01 INFO mapred.LocalJobRunner: Finishing task: attempt_local734080747_0001_m_000000_0
14/09/04 17:46:01 INFO mapred.LocalJobRunner: Map task executor complete.
14/09/04 17:46:02 INFO loader.OraLoader: Job complete: OraLoader (job_local734080747_0001)
14/09/04 17:46:02 INFO loader.OraLoader: Counters: 23
File System Counters
FILE: Number of bytes read=10412615
FILE: Number of bytes written=11375011
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=10423679
HDFS: Number of bytes written=9769750
HDFS: Number of read operations=238
HDFS: Number of large read operations=0
HDFS: Number of write operations=36
Map-Reduce Framework
Map input records=7
Map output records=7
Input split bytes=1106
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=229
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=20312064
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=1622
[root@localhost solr]#
 
Subsequently run a SELECT query in SQL*Plus to list the data loaded into Oracle Database.
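 
SELECT * FROM OE.WLSLOG;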
 
 
The 7 rows of data loaded from the 7 documents in Solr get listed.
 
 
In this tutorial we loaded Solr data into Oracle Database using Oracle Loader for Hadoop 3.0.0.