The ClusterSummary tables summarize entities with a large number of associated entities (e.g., an account owner with a large number of accounts). While the DataViewTables scripts in influent-spi/src/main/dataviews/ will create the ClusterSummary tables, you must choose whether to populate them.
For information on adding aggregate entity groups to your data to simplify the user experience and optimize application performance, see the following section. The size of your data will help determine what cluster summaries to generate.
For a diagram illustrating how the ClusterSummary tables relate to other Influent tables, see the Entity Relationships section.
While the records in your Transaction Flow tables are automatically generated using the information in your source database, you can add further records to the tables if your source data contains large groups of related entities. Aggregating the following types of entity groups can simplify the user experience and optimize application performance:
- Large groups of accounts owned by the same entity
- Large groups of accounts that branch off a single entity (e.g., if one entity has sent 100K transactions, you may want to create a single, clustered summary of the 100K recipients so they can be easily loaded in the transaction flow view)
NOTE: Aggregation of large entity groups into clusters is a custom process that must be specifically defined and coded for each Influent application. For the example applications, large entity groups are defined as those containing more than 1,000 related entities. Your implementation may vary depending on your source data and machine resources.
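Because the aggregation step is application-specific, the following is only a minimal sketch of how large owner groups might be identified. The function name, the `(account_id, owner_id)` input shape, and the constant name are assumptions made for illustration; only the 1,000-entity threshold comes from the example applications.

```python
from collections import defaultdict

# Threshold used by the example applications; adjust for your data and hardware.
LARGE_GROUP_THRESHOLD = 1000


def find_large_owner_groups(account_owner_pairs, threshold=LARGE_GROUP_THRESHOLD):
    """Group accounts by owner and keep only groups larger than `threshold`.

    `account_owner_pairs` is a hypothetical iterable of (account_id, owner_id)
    tuples taken from your source data.
    """
    groups = defaultdict(list)
    for account_id, owner_id in account_owner_pairs:
        groups[owner_id].append(account_id)
    # Only owners with more than `threshold` accounts are candidates for clustering.
    return {
        owner_id: accounts
        for owner_id, accounts in groups.items()
        if len(accounts) > threshold
    }
```

Each owner returned by such a function would be a candidate for an account owner record; the same grouping idea applies to recipients that branch off a single entity.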
There are two types of records that can be inserted into the Transaction Flow tables to summarize these types of clusters:
- Account Owner Records, which represent an aggregation of child entities owned by the same actor. Account owner records have an entity type (to or from) of O. They also require you to provide a link to the Account Owner's records in the Cluster Summary tables.
- Cluster Summary Records, which represent an aggregation of a group of entities. Cluster summary records have an entity type (to or from) of S.
The ClusterSummary table stores the properties (labels, entity attributes, branching permissions) of all the cluster summary and account owner entities in your data. Each cluster should have multiple entries in this table, each of which stores a different property.
All properties should be mapped to an FL_PropertyTag. For complete details of the available PropertyTags, see the DataEnums_vX.X.avdl file in influent-spi/src/main/avro/.
At a minimum, each cluster should have entries for the following properties:
- Child Entity Count: Total number of entities that belong to the group.
- Owner ID: The ID of the entity that owns the group of child entities.
- Account Type: Dataset-specific account type.
- Label: Name of the group.
Additional properties should be stored for any summary distribution icons you have enabled for entity displays in the Influent Flow view.
To view example ClusterSummary entries, see the Example Records section after the description of the table columns.
|EntityId||varchar(100)||Yes||Unique identifier of the cluster entity.|
|Property||varchar(50)||Yes||Name of summary property, which corresponds to an entity attribute (e.g., label or summary icon) that can be displayed on the stack that represents the account owner or cluster summary in the Influent Flow view.|
|Tag||varchar(50)||Yes||Tag associated with the corresponding property expressed as an FL_PropertyTag. For complete details of the available PropertyTags, see the DataEnums_vX.X.avdl file in influent-spi/src/main/avro/.|
|Type||varchar(50)||Yes||Data type of the corresponding property expressed as an FL_PropertyType: FLOAT, DOUBLE, INTEGER, LONG, BOOLEAN, STRING, DATE or GEO. For complete details of the available DataEnums, see the DataEnums_vX.X.avdl file in influent-spi/src/main/avro/.|
|Value||varchar(200)||Yes||String representation of the property value.|
|Stat||float||Yes||Associated stat for the property value, such as frequency or weight.|
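The column layout above can be sketched with Python's built-in `sqlite3` module. This is an illustration only, not the DataViewTables script: the `NOT NULL` constraints assume that "Yes" in the third column means the column is required, and the property names, tags, entity ID, and values in the usage lines are hypothetical placeholders. Check the real tag names against the DataEnums_vX.X.avdl file.

```python
import sqlite3

# Column names and types follow the table above; NOT NULL is an assumption.
CLUSTER_SUMMARY_DDL = """
CREATE TABLE IF NOT EXISTS ClusterSummary (
    EntityId varchar(100) NOT NULL,
    Property varchar(50)  NOT NULL,
    Tag      varchar(50)  NOT NULL,
    Type     varchar(50)  NOT NULL,
    Value    varchar(200) NOT NULL,
    Stat     float        NOT NULL
)
"""


def insert_properties(conn, entity_id, properties):
    """Insert one row per property for a single cluster entity.

    `properties` is a list of (property, tag, type, value, stat) tuples.
    Values are converted to strings because Value holds a string representation.
    """
    conn.executemany(
        "INSERT INTO ClusterSummary (EntityId, Property, Tag, Type, Value, Stat) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [
            (entity_id, prop, tag, typ, str(value), float(stat))
            for prop, tag, typ, value, stat in properties
        ],
    )


conn = sqlite3.connect(":memory:")
conn.execute(CLUSTER_SUMMARY_DDL)

# Hypothetical rows covering the four minimum properties for one cluster.
insert_properties(conn, "cluster.example.1", [
    ("count", "COUNT", "INTEGER", 1500, 0),
    ("ownerId", "ACCOUNT_OWNER", "STRING", "owner.example.1", 0),
    ("type", "TYPE", "STRING", "example-account-type", 0),
    ("label", "LABEL", "STRING", "Example group", 0),
])
```

Storing one property per row keeps the table schema fixed while letting each application decide which properties a cluster carries.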
Account owners should have a record with an ownerId property that associates the entity ID of the account owner to the record. In the following example, the s.partner.p10 entity is the owner of the accounts that belong to sp10:
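As a rough illustration of such a record, the sketch below builds the six column values as a Python dictionary. It assumes that sp10 is the EntityId of the account owner cluster and that the owner's entity ID goes in the Value column; the helper name, the `Stat` value of 0, and the tag string passed in the usage line are assumptions to verify against your own data and the DataEnums file.

```python
def owner_record(cluster_id, owner_entity_id, tag):
    """Build a ClusterSummary row linking a cluster to its owning entity.

    `tag` must be a valid FL_PropertyTag name; it is left to the caller
    because the correct value depends on the DataEnums version in use.
    """
    return {
        "EntityId": cluster_id,
        "Property": "ownerId",
        "Tag": tag,
        "Type": "STRING",
        "Value": owner_entity_id,
        "Stat": 0.0,  # Assumed neutral stat; no frequency or weight applies here.
    }


# The tag name below is an assumption, not confirmed by this document.
record = owner_record("sp10", "s.partner.p10", tag="ACCOUNT_OWNER")
```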
Cluster summaries that do not support branching should have a record with an UNBRANCHABLE property. By default, all other records are considered branchable.
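The default-branchable rule can be expressed as a small check. This is a sketch under the assumption that ClusterSummary rows are available as dictionaries keyed by column name; the function name and the sample entity IDs are hypothetical.

```python
def is_branchable(rows, entity_id):
    """Return False only if `entity_id` has a row whose Property is UNBRANCHABLE."""
    return not any(
        row["EntityId"] == entity_id and row["Property"] == "UNBRANCHABLE"
        for row in rows
    )


# Hypothetical rows: only cluster.a carries the UNBRANCHABLE property.
sample_rows = [
    {"EntityId": "cluster.a", "Property": "UNBRANCHABLE"},
    {"EntityId": "cluster.a", "Property": "label"},
    {"EntityId": "cluster.b", "Property": "label"},
]
```

With these rows, `is_branchable(sample_rows, "cluster.a")` is false, while `cluster.b` and any entity with no rows at all fall back to the branchable default.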
The ClusterSummaryMembers table stores a list of the entities that belong to each account owner or cluster summary. It is up to each application to determine which cluster summaries to generate based on the size of its data.
|SummaryId||varchar(100)||Yes||Unique identifier of the cluster entity.|
|EntityId||varchar(100)||Yes||Unique identifier of an individual entity that belongs to the corresponding cluster entity.|
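The membership table can be sketched the same way with `sqlite3`. As before, this is an illustration and not the DataViewTables script: the `NOT NULL` constraints assume "Yes" marks a required column, and the helper names and IDs are hypothetical.

```python
import sqlite3


def create_members_table(conn):
    """Create a ClusterSummaryMembers table with the two columns listed above."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS ClusterSummaryMembers (
            SummaryId varchar(100) NOT NULL,
            EntityId  varchar(100) NOT NULL
        )
        """
    )


def add_members(conn, summary_id, entity_ids):
    """Record that each entity in `entity_ids` belongs to the cluster `summary_id`."""
    conn.executemany(
        "INSERT INTO ClusterSummaryMembers (SummaryId, EntityId) VALUES (?, ?)",
        [(summary_id, entity_id) for entity_id in entity_ids],
    )


def members_of(conn, summary_id):
    """Return the sorted entity IDs that belong to one cluster."""
    cursor = conn.execute(
        "SELECT EntityId FROM ClusterSummaryMembers WHERE SummaryId = ? ORDER BY EntityId",
        (summary_id,),
    )
    return [row[0] for row in cursor.fetchall()]


members_conn = sqlite3.connect(":memory:")
create_members_table(members_conn)
# Hypothetical cluster and member IDs.
add_members(members_conn, "cluster.example.1", ["acct.3", "acct.1", "acct.2"])
```

Each row pairs one cluster with one member, so a cluster with 1,500 members contributes 1,500 rows.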
The following entity relationship diagram illustrates the order in which the ClusterSummary tables are built using the information in your source dataset. As each table is essentially a summary of your original data, each table is linked to every other table through the unique entity IDs in your dataset.