Indexing Oracle Database Table Data in Apache Solr

Oracle Community

Indexing Oracle Database Table Data in Apache Solr

By Deepak Vohra

 
Most applications store their related data in a relational database such as the Oracle Database. While the Oracle Database data may be accessed with a Oracle Database utility such as SQL*Plus using SQL statements such as the SELECT statement, the Apache Solr search engine provides a search platform for efficient full-text search and indexing. For the Oracle Database data to be used with Solr the database data has to be imported into Solr. The DataImportHandler is a configuration based Solr utility to import and index data from a relational database such as Oracle Database into Solr. Solr documents may be built by aggregating data from multiple columns and tables. DataImportHandler supports full and delta (partial) imports. In this tutorial we shall import and index Oracle Database data into Apache Solr. Subsequently the data imported into Solr may be queried using Solr Query parser. This tutorial has the following sections.
 

Setting the Environment

 
We need to download and install the following software:
 
  1. Oracle Database 11g or 12c from http://www.oracle.com/technetwork/database/enterprise-edition/downloads/index-092322.html.
  2. Apache Solr from http://lucene.apache.org/solr/downloads.html.
  3. Java 7 from http://www.oracle.com/technetwork/java/javase/downloads/jre7-downloads-1880261.html.
 
We have used Windows 7 OS in this tutorial. Download the solr-4.9.0.zip file for the Windows OS. For Linux download the .tgz file instead. Extract the solr-4.9.0.zip file to a directory, for example C:/Solr. The C:/Solr/solr-4.9.0/example directory contains an instance of the Jetty servlet container used to run Solr using an example configuration. To start Solr server run the following command from the C:/Solr/solr-4.9.0/example directory.
 
C:/Solr/solr-4.9.0/example >java -jar start.jar
 
The Solr Jetty server gets started.
 
 
The Solr search engine gets started and a Solr collection called collection1 gets created.
 
 

Creating Oracle Database Table

 
In this section we shall create an Oracle Database table and add the following data to the table to be later imported into Solr.
 
Apr-8-2014-7:06:16-PM-PDT Notice WebLogicServer AdminServer BEA-000365 Server state changed to STANDBY
Apr-8-2014-7:06:17-PM-PDT Notice WebLogicServer AdminServer BEA-000365 Server state changed to STARTING
Apr-8-2014-7:06:19-PM-PDT Notice WebLogicServer AdminServer BEA-000365 Server state changed to ADMIN
Apr-8-2014-7:06:20-PM-PDT Notice WebLogicServer AdminServer BEA-000365 Server state changed to RESUMING
Apr-8-2014-7:06:22-PM-PDT Notice WebLogicServer AdminServer BEA-000331 Started WebLogic AdminServer
Apr-8-2014-7:06:23-PM-PDT Notice WebLogicServer AdminServer BEA-000365 Server state changed to RUNNING
Apr-8-2014-7:06:24-PM-PDT Notice WebLogicServer AdminServer BEA-000360 Server started in RUNNING mode
 
Run the following command in SQL*Plus to create Oracle Database table OE.WLSLOG.
 
CREATE TABLE OE.WLSLOG (ID VARCHAR2(255) PRIMARY KEY, TIME_STAMP VARCHAR2(255), CATEGORY VARCHAR2(255), TYPE VARCHAR2(255), SERVERNAME VARCHAR2(255), CODE VARCHAR2(255), MSG VARCHAR2(255));
 
Run the following SQL statements to add data to the OE.WLSLOG table.
 
INSERT INTO OE.WLSLOG (ID, TIME_STAMP, CATEGORY, TYPE, SERVERNAME, CODE, MSG) values ('logentry1','Apr-8-2014-7:06:16-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000365','Server state changed to STANDBY');
INSERT INTO OE.WLSLOG (ID, TIME_STAMP, CATEGORY, TYPE, SERVERNAME, CODE, MSG) values ('logentry2','Apr-8-2014-7:06:17-PM-PDT','Notice','WebLogicServer','AdminServer','BEA-000365','Server state changed to STARTING');
INSERT INTO OE.WLSLOG (ID,TIME_STAMP, CATEGORY, TYPE, SERVERNAME, CODE, MSG) values ('logentry3','Apr-8-2014-7:06:18-PM-PDT', 'Notice', 'WebLogicServer', 'AdminServer', 'BEA-000365', 'Server state changed to ADMIN');
INSERT INTO OE.WLSLOG (ID,TIME_STAMP, CATEGORY, TYPE, SERVERNAME, CODE, MSG) values ('logentry4','Apr-8-2014-7:06:19-PM-PDT', 'Notice', 'WebLogicServer', 'AdminServer', 'BEA-000365', 'Server state changed to RESUMING');
INSERT INTO OE.WLSLOG (ID,TIME_STAMP, CATEGORY, TYPE, SERVERNAME, CODE, MSG) values ('logentry5','Apr-8-2014-7:06:20-PM-PDT', 'Notice', 'WebLogicServer', 'AdminServer', 'BEA-000361', 'Started WebLogic AdminServer');
INSERT INTO OE.WLSLOG (ID,TIME_STAMP, CATEGORY, TYPE, SERVERNAME, CODE, MSG) values ('logentry6','Apr-8-2014-7:06:21-PM-PDT', 'Notice', 'WebLogicServer', 'AdminServer', 'BEA-000365', 'Server state changed to RUNNING');
INSERT INTO OE.WLSLOG (ID,TIME_STAMP, CATEGORY, TYPE, SERVERNAME, CODE, MSG) values ('logentry7','Apr-8-2014-7:06:22-PM-PDT', 'Notice', 'WebLogicServer', 'AdminServer', 'BEA-000360', 'Server started in RUNNING mode');
 
The Oracle Database table OE.WLSLOG gets created and data gets added to the table.
 
 
Next, we shall configure Apache Solr to connect to Oracle Database and import the data into Solr.

Configuring the Oracle Database Datasource in Solr

 
In this section we shall configure the following:
 
  1. Copy the Oracle Database JDBC driver into the Solr classpath.
  2. Configure the Solr schema to add fields used in the document created in Solr from data imported from Oracle Database.
  3. Configure the Oracle Database data source in a Solr configuration file data-config.xml. The data source configuration includes the connection information such as the Oracle Database JDBC driver class, the connection URL, and username and password. The data-config.xml file also specifies the SQL query used to fetch data in full import and delta import. The mapping of the Oracle Database table columns to Solr document fields is also configured in data-config.xml.
  4. Configure the data-config.xml file in the solrconfig.xml file.
  5. Configure a request handler for the DataImportHandler in the solrconfig.xml file.
  6. Add the directory containing the DataImportHandler jar files to the classpath of Solr.
 
Download the Oracle Database JDBC driver JAR file ojdbc6.jar from http://www.oracle.com/technetwork/database/features/jdbc/index-091264.html. Copy the ojdbc6.jar to the C:\Solr\solr-4.9.0\dist directory.
 
As discussed in the introductory tutorial on Solr the field names to be used in a Solr document must be configured in the schema.xml file, which is in the C:\Solr\solr-4.9.0\example\solr\collection1\conf directory. Add <field/> elements for fields time_stamp, category, type, servername, code, and msg. The schema.xml must not contain duplicate field configurations. If any of these fields has been previously configured or is configured by default remove the duplicate configuration. The category field name is also configured by default in schema.xml. Remove the duplicate configuration for the category field. Add the following <field/> elements to schema.xml.
 
<field name="time_stamp" type="string" indexed="true" stored="true" multiValued="false" />
<field name="category" type="string" indexed="true" stored="true" multiValued="false" />
<field name="type" type="string" indexed="true" stored="true" multiValued="false" />
<field name="servername" type="string" indexed="true" stored="true" multiValued="false" />
<field name="code" type="string" indexed="true" stored="true" multiValued="false" />
<field name="msg" type="string" indexed="true" stored="true" multiValued="false" />
 
Next, create a data-config.xml in the C:/Solr/solr-4.9.0/example/solr/collection1/conf directory. The root element of data-config.xml is <dataConfig/>. The datasource may be configured in the data-config.xml file or the solrconfig.xml file. We shall be configuring the datasource in the data-config.xml file. Add a <dataSource/> tag within the root element <dataConfig/>. The type attribute in <dataSource/> tag specifies the type of the datasource. For a JDBC datasource specify type attribute as ‘JdbcDataSource’, which is also the default for type attribute. The name attribute may be specified and is required if using multiple datasources. The other attributes supported for a JdbcDataSource are the following.
 
Attribute
Description
Required/Optional
driver
The JDBC driver class
Required
url
The connection URL
Required
user
The user name
Optional
password
The password
Optional
jndiName
JNDI name for a pre-configured data source
Optional
batchSize
Batch size used in JDBC connection
Optional
convertType
Converts the type of the data read to the type of the target Solr data type (true/false); false by default.
Optional
autoCommit
Specifies auto commit mode (true/false); false by default.
Optional
readOnly
Enables read only mode. (true/false); false by default.
Optional
transactionIsolation
Specifies the transaction isolation level.
Optional
 
Specify the following dataSource element attributes for Oracle Database datasource.
 
Attribute
Value
name
jdbc (The value may vary)
driver
oracle.jdbc.OracleDriver
url
jdbc:oracle:thin:@127.0.0.1:1521/ORCL (The value may vary)
user
OE (The value may vary)
password
OE (The value may vary)
 
A Solr document to be created is configured using the <document/> element. A <document/> element may specify one or more <entity/> elements with each <entity/> element representing a database table or view. The only required attribute in <entity/> element is name. The processor attribute in <entity/> is required for non RDBMS datasource. The default processor being the RDBMS datasource processor SqlEntityProcessor. The SqlEntityProcessor processor requires the query attribute to be specified. The query attribute specifies the SQL query statement used to query the database. Each entity may specify one or fields to be created in the document. Each field is mapped from a result set column with the field name being the same as the column name, which is specified with the column attribute in the <field/> element. The name attribute in field element may be specified to specify the field name. We also configured the fields in the solrconfig.xml. Therefore we don’t need to configure the field type or other attributes such as whether the field is indexed, stored, and multi valued, all of which are inferred from solrconfig.xml. In the data-config.xml specify the column/field mapping for the Oracle Database table OE.wlslog columns ID, TIME_STAMP, CATEGORY, TYPE, SERVERNAME, CODE, and MSG. The data-config.xml is listed below.
 
<dataConfig>
<dataSource name="jdbc" driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:thin:@127.0.0.1:1521/ORCL" user="OE" password="OE"/>
<document>
<entity name="wlslog" query="select * from wlslog">
<field column="ID" name="id" />
<field column="TIME_STAMP" name="time_stamp" />
<field column="CATEGORY" name="category" />
<field column="TYPE" name="type" />
<field column="SERVERNAME" name="servername" />
<field column="CODE" name="code" />
<field column="MSG" name="msg" />
</entity>
</document>
</dataConfig>
 
We need to configure a request handler for data import in solrconfig.xml (
C:\Solr\solr-4.9.0\example\solr\collection1\conf directory). A request handler is configured using the <requestHandler/> element and specifies the url mapping used to invoke the request handler, and the request handler class. The request handler class for the data import request handler is org.apache.solr.handler.dataimport.DataImportHandler. In the solrconfig.xml specify the following <requestHandler/> element in which the url mapping used to invoke the handler is specifies with the name attribute as /dataimport and the request handler class is specified using the class attribute. The data-config.xml file relative path is also configured in the solrconfig.xml file.
 
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
 
The data import request handler class is in the solr-dataimporthandler-4.9.0.jar, which is in the C:/Solr/solr-4.9.0/dist directory. We had also copied the ojdbc6.jar to the C:/Solr/solr-4.9.0/dist directory. We need to add these jars to the classpath of Solr. Add the following <lib/> element in solrconfig.xml for the Solr classloader to load the jars in the C:/Solr/solr-4.9.0/dist directory.
 
<lib dir="../../../dist/" regex=".*\.jar" />
 
The Solr server is required to be restarted after any of the configuration files schema.xml, data-config.xml and solrconfig.xml is modified for the new configuration to take effect.
 

Running the Data Import in Solr

 
In this section we shall run the data import from Oracle Database to Solr. When the data is imported the data is indexed in Solr and becomes available for querying. Login to the Solr Admin Console with the url http://localhost:8983/solr/. Select the collection1 collection. Select Dataimport from the list of options. The /dataimport gets added to the url, which invokes the DataImportHandler request handler class. Incoming queries are dispatched to a request handler by name based on the path specified in the request. With /dataimport in the url path the request is dispatched to the /dataimport request handler, which is configured in the solrconfig.xml.
 
 
In the Command selection list select full-import for a full import of the database table using the query specified in the data-config.xml. The other option is delta-import for an incremental import for new insert/update made on a database table based on the timestamp for when the previous query was run on the database table. Click on Execute to start the import and indexing.
 
A Indexing message gets displayed to indicate the status of the import.
 
 
The indexing status does not get updated automatically. To update the Indexing status click on the Refresh Status button. If the indexing has completed the Indexing completed message gets displayed.
 

Querying Imported Data

 
Having index the Oracle Database data in Solr the indexed data may be queried. Click on Query option for the collection1. With the default query *:* (q field), which fetches all the documents click on the Query button.
 
 
The 7 documents indexed in Solr from Oracle Database get listed as indicated by the numFound field in the response. A Solr document was indexed for each row of data in Oracle Database table OE.WLSLOG.
 
 
Each document has a _version_ field for the update log associated with it.
 
 
In this tutorial we imported and indexed data from Oracle Database into Solr. Subsequently we queried the indexed data in Solr Admin Console.
22587 3 /
Follow / 13 May 2015 at 8:47am

Hi,

I have tried above post .it created only for id in index.but it not creating index for other field.its giveing log error likeThe field :middleName present in DataConfig does not have a counterpart in Solr Schema.please tell me what issue .

Thanks and regards,

Kannabiran

Follow / 19 Nov 2016 at 1:16am

Thank you.

Follow / 6 Apr 2017 at 7:39pm

Re kannabiran: The  field names to be used in a Solr document must be configured in the schema.xml file. Refer section Configuring the Oracle Database Datasource in Solr.