Search Configuration

Influent's plugin-style framework allows you to integrate optional third-party search platforms into your app to enable:

  • Indexing, error correction and configurable matching for text searches on entity and transaction data
  • Transactional pattern searches that, starting with an example account or accounts, find other accounts with similar activity histories

The example Influent apps use Apache Solr and Graph Query-by-Example (QuBE) to implement these capabilities, respectively. The following sections describe how to install and configure these platforms.

Apache Solr

Using Apache Solr, you can configure text searches on entity and transaction data to adjust for misspellings and return results similar to the user's search criteria. The following sections describe how to install Solr and use the application to index your data.

NOTE: The following instructions describe how to create a single Solr core that indexes your entity and transaction data. If there is any overlap between your unique entity IDs and transaction IDs (e.g., both are sequential identifiers beginning at 1), you must create separate cores for your entity and transaction data.

Installing Solr

Create an instance of the Solr platform version 4.X.X (version 5.X is not currently supported by Influent). For more information on installing Solr, see the Tutorials and other documentation on the Apache Solr website.

If necessary, change the Solr home directory as described in the Solr Install wiki. This directory stores your Solr cores, which describe your Solr configuration and data schema. The default Solr home directory is the example/solr/ folder in your installation directory.

Creating a New Core Directory

In Solr, the core directory stores config and schema files that detail the fields in your transaction and/or entity tables.

NOTE: The following steps must be performed before you create a new core within the Solr Admin console.

To create a new core directory
  1. Browse to the example/solr/ folder in your Solr installation directory.
  2. Create a subfolder for the new Solr core that will index your transaction and/or entity data. Name this folder after your data source or simply call it influent.

    NOTE: We will use the latter for the purposes of this example.

  3. Create three subfolders within the influent/ folder:
    • Copy the following folders from example/example-DIH/solr/db/:
      • conf/: Contains Solr Data Import Handler config files useful for importing entity data.
      • lib/: Contains a version-specific HSQLDB JDBC driver. If your database was created with a different version or is a different format, copy the appropriate JDBC driver into this folder.
    • data/: Create this as an empty folder.

Defining the Solr Schema

The schema.xml file in the conf/ folder specifies the schema of the Solr table into which your transaction and/or entity details will be imported.

To edit the schema
  1. Open the schema.xml file in the conf/ folder you copied in the previous step.
  2. Add each of the transaction (from your raw data) and/or entity (from EntitySummary) columns you want to index in the following format.

    For more information about the schema.xml format, see the Solr SchemaXml wiki page.

    <field name="lenders_name" type="text_general" indexed="true"
     stored="true" required="true" multiValued="true"/>
    

    Where:

    Name | Required? | Description
    name | Yes | Unique alphanumeric string representing the field name. Cannot start with a digit.
    type | Yes | Indicates the field type (e.g., string, double or date). Use only values defined by the <fieldType> elements in the schema.xml file.

    For numeric fields that should be searchable using range queries (e.g., find all values from 0 to 10), specify a sortable numeric type such as sint, slong, sfloat or sdouble (or, if your schema.xml defines the Trie-based types instead, tint, tlong, tfloat or tdouble).

    NOTE: Depending on your version of Solr, the available <fieldType> elements in your schema.xml file may vary.

    Review the different elements to determine which types pass their values through analyzers and filters to enable synonym matching, stopwords and stemming. Recent versions of Solr, including 4.X, use text_general for this purpose.

    indexed | No | Indicates whether the field should be indexed so it can be searched or sorted (true/false).
    stored | No | Indicates whether the field can be retrieved (true/false).
    required | No | Indicates whether the field is required (true/false). Errors will be generated if the data you import does not contain values for any required fields.

    NOTE: Any fields marked as required will be expected for all transaction and entity types you import. Therefore, this attribute should only be used for fields that all types have in common.

    multiValued | No | Indicates whether the field can contain multiple values for a single entity (true/false).

    NOTE: Search result sorting is not currently supported for multivalued fields.

  3. Remove or comment out any <field> elements defined in the source file that you do not need.

  4. Save the schema.xml file.
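
As a rough sketch of the steps above, a set of <field> elements for transaction data might look like the following. All field names other than lenders_name are hypothetical. The example also assumes that your schema.xml defines the string, text_general, tdouble and tdate field types and declares id as its <uniqueKey>. Substitute the types that your version of Solr actually provides.

    <!-- Hypothetical fields: adjust the names and types to match your data -->
    <!-- Unique identifier that every imported record must supply -->
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <!-- Free-text field passed through analyzers and filters -->
    <field name="lenders_name" type="text_general" indexed="true" stored="true"/>
    <!-- Numeric field intended for range queries (assumes a tdouble type exists) -->
    <field name="amount" type="tdouble" indexed="true" stored="true"/>
    <!-- Date field (assumes a tdate type exists) -->
    <field name="transaction_date" type="tdate" indexed="true" stored="true"/>

Only id is marked as required here because it is the one field that every transaction and entity type would have in common.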

Choosing Fields to Import into Solr

The db-data-config.xml file in the conf/ folder defines how to select the transaction (from your raw data) and/or entity (from EntitySummary) fields that can be imported into Solr from your database.

For complete details on the data-config schema, see the Solr DataImportHandler wiki page.

To specify the fields you want to be able to import into Solr
  1. Edit the attributes of the <dataSource> element to specify the details of the data you want to import:
    Attribute | Description
    driver | Fully qualified class name of the JDBC driver for the database type (e.g., org.hsqldb.jdbcDriver)
    url | JDBC URL that specifies the location of the database
    user | Username required to connect to the database
    password | Password required to connect to the database
  2. Add one or more <entity> elements to the <document> element. Entities generally correspond to the different transaction and account types in Influent, each of which can have its own unique set of details. For each type, you should define the following attributes:
    Attribute | Description
    name | Unique name that describes the transaction or entity. Typically the name that will appear in Influent to describe the transaction or entity type.
    transformer | List of the transformers used to modify fields (if necessary). For more information on the available transformers, see the Transformers section of the Apache Solr DataImportHandler wiki page.
    query | SQL string used to retrieve information from the database containing the entity attributes you want to import

    Depending on your data source, you may need to specify different account types. For example, the Kiva application supports one transaction type (financial) and three different entity types (lenders, borrowers and partners).

  3. Use the query attribute at the <entity> level to select all the columns in your raw data and/or EntitySummary tables.

  4. Alternatively, add a set of <field> elements to each <entity> element to select individual transaction and/or entity details. Fields represent the entity attributes on which users of your Influent project can search. Each field can have its own unique set of attributes. Define the following attributes for each field you add:
    Attribute | Description
    column | Table column from which the field values should be imported
    name | If you want to rename or transform a field, use this attribute to enter a new name
    Transformer-specific attributes | If you are using a transformer to modify an existing field, make sure to set the appropriate transformer-specific attribute (e.g., template or clob).

    For more information on transformer attributes, see the Transformers section of the Apache Solr DataImportHandler wiki page.

    NOTE: You must create individual <field> elements for fields that you want to rename or transform.

  5. You can also define child <entity> elements to retrieve and join data from other tables not invoked in the parent <entity>.

  6. Save the db-data-config.xml file.

For example, the db-data-config.xml file for the Bitcoin application has:

  • One document (for transactions and accounts)
  • Two entities:
    • account, which represents Bitcoin accounts
    • financial, which represents transactions between Bitcoin accounts
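
A db-data-config.xml file with this shape might be sketched as follows. This is an illustration only, not the file shipped with the Bitcoin application. The connection details, the Transactions table and the EntityId, TransactionId and type columns are all hypothetical placeholders. A single-core layout like this one also assumes that your entity IDs and transaction IDs do not overlap.

    <dataConfig>
      <!-- Placeholder connection details: replace with your own database settings -->
      <dataSource driver="org.hsqldb.jdbcDriver"
                  url="jdbc:hsqldb:file:/path/to/your/database"
                  user="sa" password=""/>
      <document>
        <!-- Entity type: the query selects every column in EntitySummary -->
        <entity name="account"
                transformer="TemplateTransformer"
                query="SELECT * FROM EntitySummary">
          <!-- Renamed field: hypothetical EntityId column mapped to the id field -->
          <field column="EntityId" name="id"/>
          <!-- Transformed field: TemplateTransformer writes a literal value -->
          <field column="type" template="account"/>
        </entity>
        <!-- Transaction type: hypothetical Transactions table -->
        <entity name="financial" query="SELECT * FROM Transactions">
          <field column="TransactionId" name="id"/>
        </entity>
      </document>
    </dataConfig>

Columns returned by each query that are not listed as <field> elements are matched to Solr fields by name, so explicit <field> elements are only needed here for the renamed and transformed fields. Any field that a <field> element populates, such as type in this sketch, must also be defined in schema.xml.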

Adding a New Core

Once you have finished editing the db-data-config.xml file, you can add the new core in the Solr Admin console.

To add the new core
  1. Access the Solr Admin console and select Core Admin from the navigation menu.
  2. On the Core Admin page, click Add Core.
  3. On the New Core dialog:
    • Enter a unique name for the core. We recommend that you use the same name as the folder you created in your installation directory.
    • In the instanceDir field, enter the name of the folder you created in your installation directory.
  4. Click Add Core.

If Solr is able to successfully add the new core, the Core Admin page is refreshed to display its properties. Otherwise, an error message is displayed within the New Core dialog.

To address any errors
  1. Select Logging from the navigation menu to access details about the errors. Common errors include:
    • Missing dependencies in the conf/ directory. Review the schema.xml file in this folder to make sure you have copies of all the .txt and .xml files that it references, along with any other dependencies.
    • Incorrect file paths for the <lib> directives (plugins). Edit the solrconfig.xml file in the conf/ folder so that each <lib> path points to the correct location (typically the dist/ folder of your root Solr directory).
  2. Perform the appropriate corrective actions.
  3. Delete the core.properties file in the instanceDir/ folder before attempting to add the core again.
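
For reference, the parts of solrconfig.xml that most often need correcting after the conf/ folder is copied might look like the following sketch. The relative path assumes the core lives in example/solr/influent/, and the jar name pattern assumes a Solr 4.X release that names its jars solr-dataimporthandler-*.jar. Check the dist/ folder of your own installation for the actual file names.

    <!-- Load the Data Import Handler jars. The dir value is resolved relative
         to the core's instanceDir (assumed here to be example/solr/influent/) -->
    <lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar"/>

    <!-- Register the import handler and point it at the import configuration -->
    <requestHandler name="/dataimport"
                    class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">db-data-config.xml</str>
      </lst>
    </requestHandler>

Because the conf/ folder was copied from a directory that sits one level deeper in the installation, a <lib> path that worked in its original location can resolve to the wrong folder once the core is moved.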

Importing Your Data

Once you have configured the db-data-config.xml file and created the new core in the Solr Admin console, you can begin to import your data.

NOTE: Depending on the size of your data, you may need to allocate more memory to your Solr instance before attempting your import. For more information on memory considerations, see the Solr Performance Factors wiki.

To import your data
  1. Select your new core in the Solr Admin console navigation menu, then click Dataimport in the submenu.
  2. On the dataimport page:
    • Set the Command to full-import.
    • Select the Clean, Commit and Optimize options.
    • Use the Entity drop-down list to specify the entity type you want to import.
  3. Click Execute.

To verify that your data was imported successfully
  • Click Query under the core submenu in the navigation menu and perform several test queries on your data.

Deploying Solr

For information on deploying your new core and Solr in a servlet container, see the Solr Tomcat wiki.
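
As a minimal sketch of such a deployment, a Tomcat context fragment (typically saved as conf/Catalina/localhost/solr.xml in your Tomcat directory) can point the servlet container at the Solr web application and at the Solr home directory that contains your new core. Both paths below are hypothetical and must be replaced with the locations on your system.

    <!-- Hypothetical paths: docBase is your solr.war file and value is
         the Solr home directory that contains the influent/ core folder -->
    <Context docBase="/opt/solr/solr.war" crossContext="true">
      <Environment name="solr/home" type="java.lang.String"
                   value="/opt/solr/example/solr" override="true"/>
    </Context>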

Graph QuBE

Using the Graph Query-by-Example (QuBE) tool created by MIT Lincoln Labs (MIT-LL) in collaboration with Giant Oak, you can enable transactional pattern searching. The following sections describe how to install and configure Graph QuBE for Influent.

Installation

The process of installing and running the Graph QuBE tool is described in detail in the MIT-LL Graph QuBE User Manual. In general, this process requires you to:

  1. Download and build the Graph QuBE source code.
  2. Index your raw transaction data with Graph QuBE, which creates an H2 database with derived features for each of the transactions in your source data.
  3. Start the Graph QuBE server and connect it to the H2 database. The server exposes entity and pattern search capabilities through REST queries.

NOTE: New implementations of Graph QuBE require custom dataset bindings.

Next Steps

To connect your databases and indexed search data to Influent, see the Connecting Your Data to Influent topic.