Integrating Apache Solr Data into Oracle Database in Oracle Data Integrator 11g

Apache Solr is a search and indexing engine providing fast indexing and full-text search, and is the most commonly used search engine among NoSQL database management systems. In an earlier tutorial (http://www.toadworld.com/platforms/oracle/w/wiki/11014.loading-solr-data-into-oracle-database-with-oracle-loader-for-hadoop-3-0-0.aspx) we loaded data from Solr into Oracle Database using Oracle Loader for Hadoop (OLH) 3.0.0. That approach involved configuring and invoking the OLH tool from the command line. Oracle Data Integrator provides a user interface in which Apache Solr data may be loaded into Oracle Database using the integration knowledge module IKM File-Hive To Oracle. An IKM for direct integration from Apache Solr is not available, but a Hive table may be created over an Apache Solr collection and the IKM File-Hive To Oracle used to integrate the Hive table data into Oracle Database. In this tutorial we shall integrate Solr data into Oracle Database with Oracle Data Integrator (ODI) 11g.
 
 

Setting the Environment

 
We shall use the same environment as in the tutorial on loading Solr data with OLH. One additional piece of software, Oracle Data Integrator 11g, is required for this tutorial. Start ODI with the following commands.
 
>cd /home/dvohra/dbhome_1/oracledi/client
>sh odi.sh
 
In the ODI select Operator>Connect to Repository to connect to the repository.
 
 
In Oracle Data Integrator Login click on OK to login to ODI.
 
 
Start HDFS, the NameNode and the DataNode, with the following commands.
 
/solr>hadoop namenode
/solr>hadoop datanode
 
Start Hive server with the following command.
 
/solr>hive --service hiveserver
 
Change to the directory containing the example Solr instance and start the Solr server with the following commands.
 
/solr>cd /solr/solr-4.9.0/example/
/solr/solr-4.9.0/example/>java -jar start.jar
 
 

Creating the Hive and Oracle Database Tables

 
In this section we shall create the Hive table to load data from and the Oracle Database table to integrate data into. In the OLH-Solr tutorial we created a Hive external table over a Solr data collection using a Hive storage handler for Solr. While OLH and ODI do not support direct loading/integration from the Hive external table defined over the Solr collection, they may be used to load/integrate from a Hive managed table defined over the Hive external table. Next, create a Hive managed table into which we shall load data from the Hive external table wlslog, created in the OLH-Solr tutorial. Start the Hive shell with the following command.
 
>hive
 
Create a Hive managed table with the same columns as the Hive external table wlslog. Run the following Hive script in the Hive shell.
 
create table solr (
time_stamp STRING,
category STRING,
type STRING,
servername STRING,
code STRING,
msg STRING);
 
A Hive table gets created.
 
 
Run an INSERT ... SELECT statement in the Hive shell to load data from the wlslog table into the solr table. The wlslog table data gets loaded into the solr table.
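The exact load statement is not reproduced here; a minimal form, assuming the Hive external table from the OLH-Solr tutorial is named wlslog in the same (default) database, would be:

```sql
-- Copy all rows from the external table (defined over the Solr
-- collection) into the managed table solr; the column list is
-- spelled out to make the mapping explicit.
INSERT OVERWRITE TABLE solr
SELECT time_stamp, category, type, servername, code, msg
FROM wlslog;
```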
 
 
To create the target database table OE.WLSLOG run the following command in SQL*Plus.
 
CREATE TABLE OE.wlslog (
  time_stamp VARCHAR2(4000),
  category VARCHAR2(4000),
  type VARCHAR2(4000),
  servername VARCHAR2(4000),
  code VARCHAR2(4000),
  msg VARCHAR2(4000));
 
The Oracle Database table gets created and its structure may be listed with the DESC command.
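For example, in SQL*Plus (DESC is the SQL*Plus shorthand for DESCRIBE):

```sql
-- List the structure of the target table: column names,
-- nullability and data types.
DESC OE.WLSLOG
```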
 
 
 

Creating the Physical Architecture for Apache Solr

 
The physical architecture for Solr comprises a Hive technology data server and a physical schema defined over the default database in Hive. The Hive table is not defined in the physical architecture but in the logical architecture. In ODI select Topology>Physical Architecture>Technologies>Hive.
 
 
Right-click on the Hive technology and select New Data Server.
 
 
In the Data Server Definition specify a Name; the Technology is pre-selected as Hive.
 
 
Select the Flexfields tab. De-select the checkbox in the Default column and specify thrift://localhost:10000 in the Value column.
 
 
Select the JDBC tab. Select the JDBC Driver as org.apache.hadoop.hive.jdbc.HiveDriver and JDBC Url as jdbc:hive://localhost:10000/default. Click on Test Connection to test the connection.
 
 
Click on OK for the Confirmation dialog indicating that the data will be saved.
 
 
Click on OK in the Information dialog indicating that at least one physical schema should be registered for the data server.
 
 
Click on Test in the Test Connection dialog.
 
 
If the connection gets established, a Successful Connection message gets displayed in an Information dialog. Click on OK.
 
 
Click on Save. A data server for Solr gets created. The data server is not defined directly over the Solr search engine, as a Solr-based technology is not available in ODI; instead it is defined over a Hive database.
 
 
As prompted earlier we need to create a physical schema for the data server. Right-click on the Solr data server and select New Physical Schema.
 
 
In the Physical Schema Definition the Name is pre-specified as Solr.default. Specify default in the Schema (Schema) and Schema (Work Schema) fields.
 
 
Click on Save to save the physical schema definition.
 
 
An Information dialog prompts to specify a context for the physical schema. Click on OK. We shall specify a context in a subsequent section.
 
 
A physical schema Solr.default gets created.
 
 
 

Creating the Physical Architecture for Oracle Database

 
The physical architecture for Oracle Database also consists of a data server and a physical schema. Select Topology>Physical Architecture>Technologies>Oracle. Right-click on Oracle and select New Data Server.
 
 
In Data Server Definition specify a Name (OracleDatabase) and specify Instance as ORCL. The Technology is pre-selected as Oracle. Specify the User and Password for the schema in which the OE.WLSLOG target database table was created earlier.
 
 
Select the JDBC tab. Select the JDBC Driver as oracle.jdbc.OracleDriver and select the JDBC Url as jdbc:oracle:thin:@127.0.0.1:1521:ORCL. Click on Test Connection to test the JDBC connection.
 
 
Click on OK in a Confirmation dialog that indicates that the data will be saved.
 
 
Click on OK in an Information dialog that prompts that at least one physical schema should be registered with the data server. We shall register a physical schema after the data server has been created.
 
 
Click on Test in the Test Connection dialog.
 
 
If a connection with Oracle Database gets established a Successful Connection message gets displayed. Click on OK.
 
 
Click on Save to save the data server. A new data server, OracleDatabase, gets created in the Oracle technology.
 
 
Next, create the physical schema that was prompted for earlier. Right-click on the OracleDatabase data server and select New Physical Schema.
 
 
In Physical Schema Definition specify the Schema (Schema) and Schema (Work Schema) as OE. The Name is pre-specified as OracleDatabase.OE.
 
 
Click on Save to save the physical schema configuration. Click on OK in the Information dialog that prompts that a context should be specified for the physical schema. We shall specify a context in a subsequent section.
 
 
A new physical schema gets created in the OracleDatabase data server.
 
  

Creating the Logical Architecture for Apache Solr

 
The logical architecture is the logical, or abstract, interface to the physical architecture and comprises a logical schema. To create a logical schema for Solr select Topology>Logical Architecture>Technologies>Hive.
 
 
As a Solr-specific technology is not provided in ODI, we shall create a Hive technology based logical schema defined over the Hive technology based physical schema.
 
 
In Logical Schema Definition specify a Name. In the Physical Schemas column for the Global Context select Solr.default.
 
 
Click on Save. A new logical schema gets created in Hive technology.
 
 
 

Creating the Logical Architecture for Oracle Database

 
The logical schema is the abstract interface to the physical schema. In this section we shall define a logical schema for the Oracle Database physical schema defined earlier. A logical schema comprises a context; we were prompted that a context is required when creating the physical schemas. Select the Topology>Logical Architecture>Technologies>Oracle node.
 
 
Right-click on Oracle and select New Logical Schema.
 
 
In Logical Schema Definition specify a Name. In the Physical Schemas column for the Global Context select the OracleDatabase.OE physical schema created earlier.
 
 
Click on Save. A new logical schema OracleDatabase gets created.
 
 
 

Creating the Model for Solr

 
While the logical schema for Solr defines the abstract interface through which Oracle Data Integrator connects to the Solr data, we have still not defined a data model that includes the Hive table defined over the Solr data collection. Unless a data model is defined we won't know which Hive table to load data from. In this section we shall define a model for Solr. But first, create a model folder, which is not required but is recommended, especially if multiple models are to be created. Select Designer>Models and select New Model Folder from the drop-down list.
 
 
In Model Folder Definition specify a Name. Click on Save.
 
 
A model folder gets created. To add a model to the model folder right-click on the model folder and select New Model.
 
 
In Model Definition specify a Name (Solr) and select Technology as Hive. Select the Logical Schema Solr created earlier. Select Action Group as <Generic Action>. Click on Save to save the model definition.
 
 
Next, add a datastore to the model. Right-click on the Solr model and select New Datastore.
 
 
In Datastore Definition specify a Name and select Datastore Type as Table. In Resource Name specify solr, which is the Hive table created earlier from the Solr data.
 
 
Select the Columns tab. Click on Add Column to add columns in the datastore.
 
 
Add the columns TIME_STAMP, CATEGORY, TYPE, SERVERNAME, CODE and MSG, which correspond to the columns in the Hive table default.solr. Also specify the column Type and Logical Length for each column. Select the Not Null checkbox for each column.
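The column names and types to specify may be confirmed against the Hive table itself; in the Hive shell:

```sql
-- List the columns of the Hive table the datastore maps to;
-- each column in the solr table created earlier is of type string.
DESCRIBE default.solr;
```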
 
 
Click on Save to create the datastore. A datastore gets listed in the Solr model.
 
 
To display the data in the model right-click on the datastore and select View Data.
 
 
The data in the datastore gets displayed. It is the same data that was added to the Solr collection and later loaded into the Hive table solr from the Hive external table wlslog defined over the Solr collection.
 
 
 

Creating the Model for Oracle Database

 
We created a logical schema as an interface through which ODI may access Oracle Database, but we have not yet defined a data model for the Oracle Database table into which to load the Solr data. To create a model for Oracle Database select New Model from the drop-down list in Designer>Models.
 
 
In Model Definition specify a Name (OracleDatabase) and select Technology as Oracle. Select Logical Schema as OracleDatabase. Select the Oracle Default Action Group.
 
 
Select the Reverse Engineer tab. Specify the Mask as WLSLOG and the Characters to Remove from Table Alias also as WLSLOG. Select Types of objects to reverse-engineer as Table. Select Standard and Context as Global. All of these are pre-selected by default except the Mask and Characters to Remove from Table Alias. Click on Reverse Engineer.
 
 
Click on Yes in the Confirmation dialog that prompts that the data will be saved.
 
 
A new model WLSLOG gets reverse-engineered from Oracle Database table OE.WLSLOG.
 
 
The datastore should initially be empty, as we have yet to integrate data from Solr. Right-click on the WLSLOG datastore and select View Data.
 
 
The empty table for the WLSLOG datastore gets displayed.
 
 

Creating an Integration Project

 
Next, create an integration project for the integration of the Hive-Solr data into Oracle Database. Select Designer>Projects. Select New Project from the drop-down list.
 
 
In Project Definition specify a Name.
 
 
Click on Save to add the integration project.
 
 
As we shall be using the IKM File-Hive to Oracle to integrate the Hive data into Oracle Database we need to import the IKM into the integration project. Right-click on Knowledge Modules>Integration and select Import Knowledge Modules.
 
 
In Import Knowledge Modules select the IKM File-Hive To Oracle and click on OK.
An Import Report gets displayed. Click on Close. The IKM File-Hive To Oracle gets imported into the integration project.
 
 

Creating an Integration Interface

 
An integration project by itself does not contain enough configuration to integrate the Hive-Solr data; we also need to define an integration interface. Right-click on First Folder>Interfaces and select New Interface.
 
 
In Interface Definition specify a Name (Solr-OracleDatabase).
 
 
Select the Mapping tab, shown in the previous illustration. Two regions get displayed: one for the dataset of source datastores and another for the target datastore. Select the Solr datastore created from the Hive table solr and drag and drop it into the region for the source dataset.
 
 
The source dataset gets added. The diagram lists the columns in the source datastore.
 
 
Similarly, select the target datastore WLSLOG and drag and drop the datastore in the region for the target datastore.
 
 
Click on Yes in the Automap dialog.
 
 
The target datastore also gets added.
 
 
Click on the Quick-Edit tab. The source and target datastores are listed. For the Target Datastore WLSLOG select Staging Area in the Execute On column for each of the columns.
 
 
Select the Flow tab. The default flow diagram shows the flow of data into a staging area in the target datastore. We need the staging area to be in the source datastore.
 
 
Select the Overview tab. Select the Staging Area Different From Target checkbox. Select the Hive:Solr model datastore.
 
 
Select the Flow tab. The flow diagram now shows the staging area in the source datastore. The IKM Selector is set to IKM File-Hive To Oracle.
 
 
Click on Save to save the integration interface configuration. A new integration interface gets added.
 
 
 

Running the Interface

 
In this section we shall run the integration interface to integrate the Hive-Solr data into Oracle Database. Right-click on the Solr-OracleDatabase interface and select Execute.
 
 
Click on OK in the Execution dialog.
 
An Information dialog indicates that the Session has started. Click on OK.
 
Three MapReduce jobs run to integrate the Hive data into Oracle Database.
 
 
The OE.WLSLOG Oracle Database model datastore that was empty should have data integrated into it after the interface is run. Right-click on the WLSLOG datastore and select View Data.
 
 
The data integrated from Hive gets displayed.
 
 
The data integrated into Oracle Database may be selected using a SELECT query in SQL*Plus.
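For example, the following query in SQL*Plus lists the integrated rows:

```sql
-- Select the log records integrated from the Hive table solr
-- into the target table OE.WLSLOG.
SELECT time_stamp, category, type, servername, code, msg
FROM OE.WLSLOG;
```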
 
 
The 7 documents integrated from Solr via Hive get listed.
 
 
In this tutorial we integrated data from the Solr search engine into Oracle Database 11g via a Hive table, using Oracle Data Integrator 11g.