Jon Morisi's SQL Blog: Oracle CLOB to Solr via the Data Import Handler

no copyright infringement is intended

In this example I build off of the previous example, by adding an Oracle CLOB field.

All in all, this is pretty simple and requires just a few updates:

Update the Oracle table to have a CLOB column
Update the collections data config file to use the ClobTransformer
Update the managed-schema to use a TextField data type

Update the Oracle table to have a CLOB column

CREATE TABLE "[schema]"."SOLR_TEST"
(
"ID" NUMBER generated always AS identity,
"TIME_STAMP" timestamp (6) DEFAULT sys_extract_utc(systimestamp),
"CATEGORY"   VARCHAR2(255 byte),
"TYPE"       VARCHAR2(255 byte),
"SERVERNAME" VARCHAR2(255 byte),
"CODE"       VARCHAR2(255 byte),
"MSG"        CLOB
);

Update the collections data config file to use the ClobTransformer

<dataconfig>
<datasource driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:thin:...[put your connections string here] />
<document>
<entity name = "SOLR_TEST" transformer="ClobTransformer" "query="SELECT ID, to_char(TIME_STAMP,'YYYY-MM-DD"T"HH24:MI:SS"Z"') AS TIME_STAMP, CATEGORY, TYPE, SERVERNAME, CODE, MSG FROM SOLR_TEST" deltaImportQuery="SELECT ID, to_char(TIME_STAMP,'YYYY-MM-DD"T"HH24:MI:SS"Z"') AS TIME_STAMP, CATEGORY, TYPE, SERVERNAME, CODE, MSG FROM SOLR_TEST WHERE ID = ${dataimporter.delta.ID}" deltaQuery="SELECT ID

FROM SOLR_TEST WHERE TIME_STAMP > TO_TIMESTAMP('${dataimporter.last_index_time}', 'YYYY-MM-DD HH24:MI:SS')">

<field column = "ID" name="id" />
<field column = "TIME_STAMP" name="time_stamp" />
<field column = "CATEGORY" name="category" />
<field column = "TYPE" name="type" />
<field column = "SERVERNAME" name="servername" />
<field column = "CODE" name="code" />
<field column = "MSG" name="msg" clob="true" sourceColName="MSG" />
</entity>
</document>

</dataconfig>

Update the managed-schema to use a TextField data type

...
<field name="msg" type="text_en" indexed="true" stored="true" multiValued="false" />
...
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.StandardTokenizerFactory" /> <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" /> <filter class="solr.LowerCaseFilterFactory" /> <filter class="solr.EnglishPossessiveFilterFactory" /> <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" /> <filter class="solr.PorterStemFilterFactory" /> </analyzer> <analyzer type="query"> <tokenizer class="solr.StandardTokenizerFactory" /> <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" /> <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" /> <filter class="solr.LowerCaseFilterFactory" /> <filter class="solr.EnglishPossessiveFilterFactory" /> <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" /> <filter class="solr.PorterStemFilterFactory" /> </analyzer> </fieldType>
...

Full disclosure, I built off of the "example-data-driven-schema" and the field type comes directly from there.

That's it! You can now run a full import, make some additions, and run a delta. The most difficult part of this process, for me, was getting 4000+ characters inserted into the clob field in the Oracle table. Here's what that looks like:

DECLARE
  my_clob CLOB;
BEGIN
  my_clob := '4000 plus characters...';
  INSERT INTO [SCHEMA].solr_test
              (
                          category,
                          TYPE,
                          servername,
                          code,
                          msg
              )
              VALUES
              (
                          'Manual',
                          'SQL Insert',
                          'hostname',
                          'BEA-100000',
                          my_clob
              );

END;

Jon Morisi's SQL Blog

Pages

10/11/2018

Oracle CLOB to Solr via the Data Import Handler

Update the Oracle table to have a CLOB column

Update the collections data config file to use the ClobTransformer

Update the managed-schema to use a TextField data type

1 comment: