Clustering Configuration

To organize and simplify the visual representation of branched transaction flow, Influent can dynamically group together similar entities into clusters. Clusters can contain individual accounts and/or other clusters to form a hierarchy.

The following sections describe aspects of Influent's hierarchical clustering algorithm that can be configured using the clusterer.config file in the src/main/resources/ folder of your project directory.

NOTE: For information on advanced clustering properties and concepts, see the Clustering Settings reference topic.

Hierarchical Grouping

The entity.clusterer.clusterfields value controls how Influent dynamically groups entities into hierarchical clusters. You can configure Influent to group entities on any of the fields in your entity data.

If you specify multiple fields, Influent considers each one in order. If matching values are found, the entities are grouped into a cluster and examined to determine whether further grouping is required. Influent then moves on to the next field in the hierarchy.
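The field-by-field ordering can be pictured with a minimal Python sketch. It is not Influent's implementation: the function name group_by_fields and the entity records are hypothetical, every field is grouped by exact match (Influent applies a strategy specific to each field type), and the sketch always continues to the next field rather than first checking whether further grouping is required.

```python
from collections import defaultdict

def group_by_fields(entities, fields):
    """Nest entities into clusters, one level per field, in the order given.

    Illustrative assumption: every field is grouped by exact value match.
    """
    if not fields:
        return entities  # no fields left: this is a leaf list of entities
    first, rest = fields[0], fields[1:]
    buckets = defaultdict(list)
    for entity in entities:
        buckets[entity.get(first)].append(entity)
    # Each bucket becomes a cluster that is grouped again on the next field
    return {value: group_by_fields(members, rest)
            for value, members in buckets.items()}

# Hypothetical entity records, not taken from any real data set
entities = [
    {"id": "a1", "TYPE": "lender", "GEO": "Kenya"},
    {"id": "a2", "TYPE": "lender", "GEO": "Kenya"},
    {"id": "a3", "TYPE": "lender", "GEO": "Peru"},
    {"id": "a4", "TYPE": "partner", "GEO": "Peru"},
]
tree = group_by_fields(entities, ["TYPE", "GEO"])
```

Here the lenders are clustered first, and only then split by location, so the order of the fields determines the shape of the hierarchy.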

To specify the fields on which to cluster your entities
  • Edit the entity.clusterer.clusterfields value to pass in FL_PropertyTag names and/or field names from your entity data in the following format. Each entry should be separated by a comma.

    <FL_PropertyTag | FIELDNAME>:<FIELD TYPE>
    

    Where valid FL_PropertyTag names include:

    ID, TYPE, NAME, LABEL, STAT, TEXT, STATUS, GEO, DATE, AMOUNT, COUNT, USD, 
    DURATION and TOPIC
    

    And valid Field Types include:

    Field Type     Group By
    geo            Hierarchical geographic location:
                     1. Continent
                     2. Region (e.g., South Asia or East Asia)
                     3. Country
                     4. Latitude/longitude (within a threshold distance)
    categorical    Exact categorical value. Applies only to fields with string values.
    label          Label value:
                     1. Alphabetical
                     2. Fuzzy string clustering
    numeric:K      Numeric values in bins (e.g., 0-99, 100-199, etc.), where K is an optional value (e.g., 50) that specifies the range of values in each bin. Defaults to 100.
    topic:K        Topic tags in distribution property bins, where K is an optional value between 0 (exactly the same) and 1 (completely different) that specifies the tolerance for bins. Defaults to 0.5.
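As a rough illustration of the entry format and of numeric binning, the following Python sketch parses a clusterfields value and computes the bin for an integer. The names parse_clusterfields and numeric_bin are hypothetical, the assumption that K is appended after a second colon (e.g., AMOUNT:numeric:50) is inferred from the format above rather than confirmed, and the treatment of bin boundaries is only one way to reproduce the 0-99, 100-199 example.

```python
FIELD_TYPES = {"geo", "categorical", "label", "numeric", "topic"}
DEFAULT_K = {"numeric": 100.0, "topic": 0.5}  # defaults stated in the table above

def parse_clusterfields(value):
    """Parse 'NAME:type[:K],...' into a list of (name, field_type, k) tuples."""
    specs = []
    for entry in value.split(","):
        parts = [p.strip() for p in entry.split(":")]
        if len(parts) not in (2, 3) or parts[1] not in FIELD_TYPES:
            raise ValueError(f"bad cluster field entry: {entry!r}")
        name, field_type = parts[0], parts[1]
        if len(parts) == 3:
            if field_type not in DEFAULT_K:
                raise ValueError(f"{field_type} does not take a K value")
            k = float(parts[2])
        else:
            k = DEFAULT_K.get(field_type)  # None for types without K
        specs.append((name, field_type, k))
    return specs

def numeric_bin(value, k=100):
    """Return the inclusive (low, high) bounds of the width-k bin for an integer."""
    low = (value // k) * k
    return (low, low + k - 1)

specs = parse_clusterfields("TYPE:categorical,AMOUNT:numeric:50")
bounds = numeric_bin(150)  # bin of width 100 that contains 150
```

With the default K of 100, a value of 150 falls in the 100-199 bin; with K set to 50, a value of 75 falls in the 50-99 bin.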

Example

In the Kiva application:

entity.clusterer.clusterfields = TYPE:categorical,GEO:geo,LABEL:label
  1. TYPE:categorical - Clusters entities based on a categorical grouping of the TYPE FL_PropertyTag. In the Kiva application, this property tag is mapped to an account type field which supports three values:
    • Lenders
    • Partners
    • Loans (borrowers)
  2. GEO:geo - Clusters entities based on a geographical (geo) grouping of the GEO FL_PropertyTag. In the Kiva application, this property tag is mapped to a derived field that supports hierarchical geographical data:
    1. Continent
    2. Region
    3. Country
    4. Similar latitude/longitude (within a threshold distance), if available
  3. LABEL:label - Clusters entities based on a label grouping of the LABEL FL_PropertyTag. In the Kiva application, this property tag is mapped to an account name field.

Maximum Cluster Size

You can control the maximum number of entities that a cluster can contain. Influent uses this setting to subdivide clusters until each cluster and all of its subclusters contain no more than the configured number of entities.

By default, this property is set to 6 entities.

To specify the maximum number of entities a cluster can contain
  • Set the entity.clusterer.maxclustersize property to the desired value.
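The effect of the size limit can be sketched in Python under stated assumptions. This is not Influent's algorithm: the function subdivide is hypothetical, it reads the limit as applying to the direct members of each cluster (accounts or subclusters), and it splits oversized clusters into contiguous chunks, whereas Influent subdivides on the configured cluster fields.

```python
def subdivide(members, max_size=6):
    """Nest members so that no cluster has more than max_size direct members.

    Clusters are plain lists; an oversized list is replaced by a list of
    smaller sub-lists, recursively. The default of 6 mirrors the documented
    default for entity.clusterer.maxclustersize.
    """
    if max_size < 2:
        raise ValueError("max_size must be at least 2")
    if len(members) <= max_size:
        return list(members)
    # Ceiling division: the chunk length that yields at most max_size chunks
    chunk_len = -(-len(members) // max_size)
    chunks = [members[i:i + chunk_len]
              for i in range(0, len(members), chunk_len)]
    return [subdivide(chunk, max_size) for chunk in chunks]

# 20 hypothetical account IDs become 5 subclusters of 4 accounts each
tree = subdivide(list(range(20)), max_size=6)
```

A smaller limit produces a deeper hierarchy with fewer cards per stack; a larger limit produces a flatter one.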

Distribution Summary Properties

In the Influent interface, each cluster is represented as a stack of cards. Each stack features a group of summary icons that indicate the distribution of accounts in the cluster that share certain properties. The summary icons also appear in the Details Pane for a selected cluster.

Cluster Stack Summary Distribution Icons

To specify the summary icons you want to appear on each cluster
  • Edit the entity.clusterer.clusterproperties value to pass in FL_PropertyTag names and/or field names from your entity data in the following format. Each entry should be separated by a comma.

    <FL_PropertyTag | FIELDNAME>:<DESCRIPTIVE NAME>:<true | false>
    

    Where:

    Component                       Description
    <FL_PropertyTag | FIELDNAME>    This property currently supports only:
                                      • SingleRange fields of type:
                                        • string
                                        • FL_GeoData
                                      • DistributionRange fields of type:
                                        • FL_TOPIC
    <DESCRIPTIVE NAME>              Brief alphanumeric string to display in the Cluster Member Summary section of the Details Pane.
    <true | false>                  Optional component that indicates whether to normalize the distribution to sum to 1.0. Defaults to false.

Example

In the Kiva application:

entity.clusterer.clusterproperties = TYPE:Kiva Account Type,GEO:Location,STATUS:Status,WARNING:Warnings

For each cluster, Kiva displays the following distribution summary information:

Property                 Summarizes                                        Label
TYPE:Kiva Account Type   Accounts belonging to each represented account    Kiva Account Type
                         type:
                           • Lender
                           • Partner
                           • Loan (borrower)
GEO:Location             Accounts located in each represented country      Location
STATUS:Status            Accounts that are closed or have defaulted        Status
WARNING:Warnings         Accounts associated with warnings (i.e., those    Warnings
                         with high delinquency or default rates)
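To make the entry format and the optional normalization flag concrete, here is a minimal Python sketch. The names parse_clusterproperty and summarize are hypothetical, and the assumption that a distribution is a count of accounts per property value is an illustration only; the way Influent actually computes the summary is not described here.

```python
from collections import Counter

def parse_clusterproperty(entry):
    """Parse 'NAME:Descriptive Name[:true|false]' into (name, label, normalize)."""
    parts = [p.strip() for p in entry.split(":")]
    if len(parts) not in (2, 3):
        raise ValueError(f"bad cluster property entry: {entry!r}")
    # The third component is optional and defaults to false
    normalize = len(parts) == 3 and parts[2].lower() == "true"
    return parts[0], parts[1], normalize

def summarize(values, normalize=False):
    """Count how many accounts share each value; optionally scale to sum to 1.0."""
    counts = Counter(values)
    if not normalize:
        return dict(counts)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

# Hypothetical account types for the members of one cluster
account_types = ["lender", "lender", "partner", "loan"]
name, label, normalize = parse_clusterproperty("TYPE:Kiva Account Type")
raw = summarize(account_types, normalize)
scaled = summarize(account_types, normalize=True)
```

With normalization off, the summary holds raw counts (two lenders, one partner, one loan); with it on, the same distribution is expressed as fractions that sum to 1.0.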

Next Steps

For information on configuring the Influent server, see the Server Configuration topic.

For more information on advanced clustering settings, see the Clustering Settings reference topic.