Getting Started with Apache Solr

Oracle Community


By Deepak Vohra

Apache Solr is an Apache Lucene-based enterprise search platform that provides features such as full-text search, near real-time indexing, and database integration. Apache Solr may be integrated with Oracle Database using the DataImportHandler. In this tutorial we introduce Apache Solr. This tutorial has the following sections.
-Setting the Environment
-Configuring the Solr Schema
-Indexing a Document in Solr
-Searching Documents in Solr

Setting the Environment

Apache Solr runs as a full-text search server within a servlet container. Any commonly used servlet container, such as WebLogic Server, WebSphere Application Server, JBoss Application Server, GlassFish Server, or Apache Tomcat, may be used; the default is Jetty, which is included with the Solr installation. We have used Oracle Linux 6 for this tutorial. The following software is required for running Solr.
-The Apache Solr binary distribution
-Java 7 or later.
Download Java 7 from
Extract the Java 7 tar file.
>tar zxvf jdk-7u55-linux-i586.gz
Set the JAVA_HOME environment variable to the extracted JDK directory in the ~/.bashrc file.
>vi ~/.bashrc
export JAVA_HOME=/json/jdk1.7.0_55
Create a directory for Solr and make it accessible to all users (permissions 777).
>mkdir /solr
>chmod -R 777 /solr
Download the Apache Solr binary distribution solr-4.9.0.tgz from
Untar the Solr binary distribution tgz file.
>cd /solr
>tar -xvf solr-4.9.0.tgz
Change directory (cd) to the /solr/solr-4.9.0/example/ directory and start the Apache Solr search server with the following command.
>cd /solr/solr-4.9.0/example/
>java -jar start.jar
The Jetty server starts, and the container configuration is loaded from the solr.xml configuration file.
A new Searcher is registered.
Open the Solr Admin console at the URL http://localhost:8983/solr/. The Admin Dashboard displays information such as the Solr version, the JVM, and system properties such as Physical Memory and Swap Space.
In the Core Selector, select the collection1 core. The Overview tab lists statistics about the Solr server and the data collection.
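Once the server is up, it can also be checked from a script by requesting the /select handler of collection1 and looking for an HTTP 200 response. The following is a minimal Python sketch, assuming the default example URL http://localhost:8983/solr/collection1 used in this tutorial; the helper name is_solr_up is hypothetical.

```python
import http.client
import urllib.request

# Assumed URL of the example collection1 core on the default Jetty port;
# rows=0 asks only for the match count, not the documents themselves.
SELECT_URL = "http://localhost:8983/solr/collection1/select?q=*:*&rows=0&wt=json"


def is_solr_up(url: str = SELECT_URL, timeout: float = 2.0) -> bool:
    """Hypothetical helper: return True if the URL answers with HTTP status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, and HTTP error statuses (urllib's
        # HTTPError is a subclass of OSError) all count as "not up".
        return False


print("Solr is running" if is_solr_up() else "Solr is not reachable")
```

Using the /select handler rather than a dedicated health endpoint keeps the check to a handler this tutorial already relies on.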

Configuring the Solr Schema

The Solr schema is specified in the schema.xml file in the /solr/solr-4.9.0/example/solr/collection1/conf directory. The schema defines the names of the fields in the documents indexed in Solr and the types of those fields. Every field used in a document indexed for search in Apache Solr must be declared in the schema.xml file. By default, each indexed document must include the id field. The main elements found in the schema.xml file are listed below.
-<field/>: Specifies a field that may be used in a document indexed in Solr. Field names must not be duplicated. For example:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
-<uniqueKey/>: Specifies the unique key field used to determine the uniqueness of a document. For example, <uniqueKey>id</uniqueKey> makes id the unique key; the id field is required by default.
-<fieldType/>: Specifies a field type that may be referenced in the type attribute of a field. For example:
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
The <field/> element supports several attributes; some of the common ones are listed below.
-name: The field name.
-default: The default value of the field.
-indexed: A boolean (true/false) indicating whether the field is indexed. Only an indexed field is searchable and sortable.
-stored: A boolean (true/false) indicating whether the field value can be retrieved in search results.
-multiValued: A boolean (true/false) indicating whether the field may have multiple values in a document.
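To make these rules concrete, the following Python sketch models them outside Solr for the id field and the log fields used in this tutorial: every field in a document must be declared, the required id field must be present, and a field declared with multiValued="false" may hold only one value. It is an illustration under these assumptions, not Solr's own validation code; the names DECLARED_FIELDS and check_document are hypothetical.

```python
# Hypothetical model of the schema rules described above (not Solr's code).
# Maps each declared field name to (required, multiValued).
DECLARED_FIELDS = {
    "id": (True, False),
    "time_stamp": (False, False),
    "category": (False, False),
    "type": (False, False),
    "servername": (False, False),
    "code": (False, False),
    "msg": (False, False),
}


def check_document(doc: dict) -> list:
    """Return a list of problems; an empty list means the document fits the model."""
    problems = []
    # Required fields (here only id) must be present.
    for name, (required, _) in DECLARED_FIELDS.items():
        if required and name not in doc:
            problems.append(f"missing required field: {name}")
    # Every field must be declared, and single-valued fields take one value.
    for name, value in doc.items():
        if name not in DECLARED_FIELDS:
            problems.append(f"undeclared field: {name}")
        elif isinstance(value, list) and not DECLARED_FIELDS[name][1] and len(value) > 1:
            problems.append(f"multiple values for single-valued field: {name}")
    return problems


print(check_document({"id": "wlslog1", "code": "BEA-000365"}))  # → []
print(check_document({"code": "BEA-000365", "level": "INFO"}))
# → ['missing required field: id', 'undeclared field: level']
```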
We shall store documents with the fields time_stamp, category, type, servername, code and msg, so these fields must be declared in the schema.xml file. Open the schema.xml file in the vi editor.
>cd /solr/solr-4.9.0/example/solr/collection1/conf
>vi schema.xml
Add <field/> elements for the fields time_stamp, category, type, servername, code and msg. A section of the modified schema.xml file is listed below.
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
  <!-- ... other declarations from the example schema omitted ... -->
  <field name="time_stamp" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="category" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="type" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="servername" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="code" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="msg" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="_version_" type="long" indexed="true" stored="true"/>
  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
  <!-- The id field is the unique key of each document -->
  <uniqueKey>id</uniqueKey>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
  <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
  <!-- ... other field types, including "long", omitted ... -->
</schema>
The Solr server must be restarted after the schema.xml file is modified.
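Because the six <field/> declarations for the log fields differ only in their name, they can also be generated instead of typed by hand. The Python sketch below uses the standard xml.etree.ElementTree module to produce declarations with the same attributes as the schema.xml section above; the helper name field_declarations is hypothetical, and the output still has to be pasted into schema.xml manually.

```python
import xml.etree.ElementTree as ET

# Field names used for the WebLogic log documents in this tutorial.
LOG_FIELDS = ["time_stamp", "category", "type", "servername", "code", "msg"]


def field_declarations(names, field_type="string"):
    """Hypothetical helper: build one <field/> element string per field name."""
    lines = []
    for name in names:
        # Same attributes as the hand-written declarations in schema.xml.
        elem = ET.Element("field", {
            "name": name,
            "type": field_type,
            "indexed": "true",
            "stored": "true",
            "multiValued": "false",
        })
        lines.append(ET.tostring(elem, encoding="unicode"))
    return lines


for line in field_declarations(LOG_FIELDS):
    print(line)
```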

Indexing a Document in Solr

Solr supports indexing structured documents in CSV, XML, JSON, and binary formats. In this section we shall index the following JSON documents.
{"time_stamp":"Apr-8-2014-7:06:16-PM-PDT","category": "Notice","type":"WebLogicServer","servername": "AdminServer","code":"BEA-000365","msg": "Server state changed to STANDBY" }
{"time_stamp":"Apr-8-2014-7:06:17-PM-PDT","category": "Notice","type":"WebLogicServer","servername": "AdminServer","code":"BEA-000365","msg": "Server state changed to STARTING" }
{"time_stamp":"Apr-8-2014-7:06:18-PM-PDT","category": "Notice","type":"WebLogicServer","servername": "AdminServer","code":"BEA-000360","msg": "Server started in RUNNING mode" }
We shall index the documents in XML format. Convert the preceding JSON documents to the following XML form. The root element must be <add/>, and each document must be enclosed in a <doc/> element. The id field is required.
<add>
<doc>
<field name="id">wlslog1</field>
<field name="time_stamp">Apr-8-2014-7:06:16-PM-PDT</field>
<field name="category">Notice</field>
<field name="type">WebLogicServer</field>
<field name="servername">AdminServer</field>
<field name="code">BEA-000365</field>
<field name="msg">Server state changed to STANDBY</field>
</doc>
<doc>
<field name="id">wlslog2</field>
<field name="time_stamp">Apr-8-2014-7:06:17-PM-PDT</field>
<field name="category">Notice</field>
<field name="type">WebLogicServer</field>
<field name="servername">AdminServer</field>
<field name="code">BEA-000365</field>
<field name="msg">Server state changed to STARTING</field>
</doc>
<doc>
<field name="id">wlslog3</field>
<field name="time_stamp">Apr-8-2014-7:06:18-PM-PDT</field>
<field name="category">Notice</field>
<field name="type">WebLogicServer</field>
<field name="servername">AdminServer</field>
<field name="code">BEA-000360</field>
<field name="msg">Server started in RUNNING mode</field>
</doc>
</add>
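The conversion from the JSON documents to the XML form does not have to be done by hand. The following Python sketch reads the three JSON documents (one object per line), assigns the ids wlslog1, wlslog2 and wlslog3 in order, following the id scheme of the XML above, and builds the <add/> message with xml.etree.ElementTree. The function name json_lines_to_solr_xml is hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# The three WebLogic log documents from this section, one JSON object per line.
JSON_LINES = """\
{"time_stamp":"Apr-8-2014-7:06:16-PM-PDT","category": "Notice","type":"WebLogicServer","servername": "AdminServer","code":"BEA-000365","msg": "Server state changed to STANDBY" }
{"time_stamp":"Apr-8-2014-7:06:17-PM-PDT","category": "Notice","type":"WebLogicServer","servername": "AdminServer","code":"BEA-000365","msg": "Server state changed to STARTING" }
{"time_stamp":"Apr-8-2014-7:06:18-PM-PDT","category": "Notice","type":"WebLogicServer","servername": "AdminServer","code":"BEA-000360","msg": "Server started in RUNNING mode" }
"""


def json_lines_to_solr_xml(text: str, id_prefix: str = "wlslog") -> str:
    """Hypothetical helper: convert JSON lines into a Solr <add> XML message."""
    add = ET.Element("add")
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    for number, record in enumerate(records, start=1):
        doc = ET.SubElement(add, "doc")
        # The id field is required by the schema, so generate one per document.
        ET.SubElement(doc, "field", name="id").text = f"{id_prefix}{number}"
        # Copy every JSON key/value pair into a <field name="..."> element.
        for name, value in record.items():
            ET.SubElement(doc, "field", name=name).text = str(value)
    return ET.tostring(add, encoding="unicode")


xml_message = json_lines_to_solr_xml(JSON_LINES)
print(xml_message)
```

Writing xml_message to a file named wlslog.xml gives a file in the same form as the XML listed above.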
In the Solr Admin console, select the Documents tab for collection1. In the Request-Handler field, select /update. The options available in Document Type are CSV, Document Builder, File Upload, JSON, Solr Command (raw XML or JSON), and XML. Select the Solr Command (raw XML or JSON) option. In the Document(s) field, add the XML document listed previously. Click on Submit Document.
If the documents are added to Solr, a Status: success message is returned.
The added documents are not available for search until they have been committed; restarting the Solr server, as done in this tutorial, also makes them available.
An alternative indexing method uses a Java utility, post.jar, to post documents from the command line. Store the XML document as wlslog.xml in the /solr/solr-4.9.0/example/exampledocs directory. With the Solr server running, cd to the /solr/solr-4.9.0/example/exampledocs directory and run the Java utility to post the document to Solr.
>cd /solr/solr-4.9.0/example/exampledocs
>java -jar post.jar wlslog.xml
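The wlslog.xml file is indexed.
If the id field is not specified in a document, the following error message is generated.
Restart the server after indexing for the documents to become available for search. Note that post.jar also sends a commit by default, which by itself makes the posted documents searchable.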
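post.jar is a client: it sends the XML to Solr's /update handler over HTTP. The same request can be made from Python with urllib. In this minimal sketch the update URL (the collection1 core on the default port), the commit=true parameter, and the helper names build_update_request and post_file are assumptions for illustration; commit=true asks Solr to commit after the update so that the documents become searchable.

```python
import urllib.request

# Assumed update URL of the example collection1 core; commit=true asks Solr
# to commit after applying the update.
UPDATE_URL = "http://localhost:8983/solr/collection1/update?commit=true"


def build_update_request(xml_body: str, url: str = UPDATE_URL) -> urllib.request.Request:
    """Hypothetical helper: prepare a POST of a Solr XML update message."""
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def post_file(path: str) -> int:
    """Hypothetical helper: send an XML file such as wlslog.xml, return the HTTP status."""
    with open(path, encoding="utf-8") as f:
        request = build_update_request(f.read())
    # Requires a running Solr server at UPDATE_URL.
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

For example, calling post_file("wlslog.xml") from the exampledocs directory should return 200 when Solr accepts the update.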

Searching Documents in Solr

In this section we shall query the documents indexed in the previous section. Open the URL http://localhost:8983/solr/#/collection1/query in a browser. Select /select as the Request-Handler, and specify the query in the q field. The default query, *:*, returns all the documents with all their fields.
The wt setting specifies the format of the response. Supported response formats include JSON, XML, and CSV; the default in the query page is JSON. Click on Execute Query to run the query.
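The Execute Query button sends an ordinary HTTP GET request to the /select handler with q and wt as request parameters, so the same query can be run outside the browser. A minimal Python sketch that builds such a URL, assuming the default host, port, and collection1 core; select_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed base URL of the collection1 core in the Solr 4.9 example setup.
BASE_URL = "http://localhost:8983/solr/collection1"


def select_url(query: str = "*:*", response_format: str = "json") -> str:
    """Hypothetical helper: build a /select URL for a query and response format."""
    # urlencode percent-encodes special characters such as '*' and ':'.
    params = {"q": query, "wt": response_format}
    return f"{BASE_URL}/select?{urlencode(params)}"


print(select_url())
print(select_url("code:BEA-000360", "xml"))
```

Opening either URL in a browser, or fetching it with urllib.request.urlopen, returns the same result as the Admin console query.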
The query result is returned and displayed as JSON. The numFound field in the response JSON object indicates the number of documents found, and the documents are returned in the JSON array docs.
All the fields in each document are returned by default.
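In a script, numFound and the docs array can be read from the response with Python's json module. The response in this sketch is a trimmed, hypothetical example that follows the structure described above (a response object holding numFound and docs); it is not output captured from a server, and summarize is a hypothetical helper name.

```python
import json

# Hypothetical, trimmed /select response in the JSON structure described above.
SAMPLE_RESPONSE = """
{"responseHeader": {"status": 0},
 "response": {"numFound": 3, "start": 0,
              "docs": [{"id": "wlslog1", "code": "BEA-000365"},
                       {"id": "wlslog2", "code": "BEA-000365"},
                       {"id": "wlslog3", "code": "BEA-000360"}]}}
"""


def summarize(response_text: str):
    """Return numFound and the ids of the returned documents."""
    result = json.loads(response_text)["response"]
    return result["numFound"], [doc["id"] for doc in result["docs"]]


print(summarize(SAMPLE_RESPONSE))  # → (3, ['wlslog1', 'wlslog2', 'wlslog3'])
```

numFound counts all matching documents, while docs holds at most rows documents (10 by default), so the two can differ for larger result sets.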
As an example, search for documents whose type field is WebLogicServer by specifying the query type:WebLogicServer. Because every document has WebLogicServer as its type value, all the documents are returned.
For an example of a query that does not return all documents, specify the query code:BEA-000360. Only one document is returned in the query response.
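Both fields are declared with the string type (solr.StrField), which indexes each value as a single, untokenized term, so a fielded query such as code:BEA-000360 matches only documents whose code value is exactly that string. The following Python sketch models this exact matching on the three documents; it only imitates these simple field:value queries and is not how Solr evaluates queries internally. match_count is a hypothetical name.

```python
# The three indexed documents, reduced to the fields used in the queries.
DOCS = [
    {"id": "wlslog1", "type": "WebLogicServer", "code": "BEA-000365"},
    {"id": "wlslog2", "type": "WebLogicServer", "code": "BEA-000365"},
    {"id": "wlslog3", "type": "WebLogicServer", "code": "BEA-000360"},
]


def match_count(query: str) -> int:
    """Count documents whose field equals the value in a 'field:value' query.

    Models exact matching on a string (StrField) field only; Solr's query
    parser supports far more syntax than this.
    """
    field, _, value = query.partition(":")
    return sum(1 for doc in DOCS if doc.get(field) == value)


print(match_count("type:WebLogicServer"))  # → 3
print(match_count("code:BEA-000360"))      # → 1
```

Because string values are not tokenized, a query for part of a value, such as code:BEA, matches no document.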
In this tutorial we introduced the Apache Solr search engine. We indexed three documents and then queried them. For the Solr Query Parser syntax, refer to
11 Jul 2016 at 4:52pm

Hi, for indexing solr, how did you convert the above json to solr xml?