About Sparse structures
A Cube can store a value for any combination (cell) of members of its dimensions. For example, a Cube with three dimensions in its structure includes a cell for every combination of members of those three dimensions.
The virtual size of a Cube is the maximum number of cells it has. For example, if a Cube has the dimensions Month, Product and Customer in its structure and they contain, respectively, 24 months, 500 products, and 1,000 customers, the virtual size of the Cube is 24x500x1000=12,000,000 cells.
Normally, after data is loaded into a Cube, only a small fraction of its cells actually contain data. The ratio between the number of cells containing data and the total number of cells of the Cube (obtained by multiplying together the number of members of each dimension) is the Cube density.
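The arithmetic behind virtual size and density can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the figure of 240,000 populated cells is an assumed value, not one taken from a real Cube.

```python
from math import prod


def virtual_size(member_counts):
    """Maximum number of cells: the product of the member counts of all dimensions."""
    return prod(member_counts.values())


def density(populated_cells, member_counts):
    """Ratio of cells that contain data to the virtual size of the Cube."""
    return populated_cells / virtual_size(member_counts)


# Dimensions from the example above: 24 months, 500 products, 1,000 customers.
dimensions = {"Month": 24, "Product": 500, "Customer": 1000}
size = virtual_size(dimensions)  # 12,000,000 cells

# Assumed for illustration: 240,000 cells contain data after loading.
ratio = density(240_000, dimensions)  # 2% of the virtual size
```

A density this low means that most of the virtual cells are empty, which is the situation sparse management is designed for.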
Sparse management
Board does not create a cell for every possible combination of members of a Cube's dimensions. Several compression methods are available, and the most efficient one is sparse management.
To better understand how it works, let's imagine a food and beverages company which sells a large number of products to a large number of customers, such as small retail shops, restaurants, hotels, catering companies, hospitals, schools and the like.
In this scenario, a typical customer would probably order only a small selection of products from the entire stock, and that selection may vary depending on the type of customer: a hospital might buy different products than a hotel or a school.
If not every customer buys every product, then we say that the Customer and Product Entities are sparse. If customer C1 buys product P1, then the C1-P1 combination is called a "sparse combination".
A sparse structure is a combination of two or more hierarchically unrelated Entities for which the number of distinct combinations of existing values is much smaller than the total number of potential combinations.
Sparse combinations are created when data is loaded into Cubes and sparse structures are defined when creating Cube versions. When a sparse structure is defined, disk space is allocated only for the sparse combinations created during the loading process, so disk space overhead is minimal.
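The idea that space is allocated only for combinations that actually occur can be illustrated with a small Python sketch. This is a conceptual model only and does not describe Board's internal storage format; the record layout and all variable names are assumptions made for the example.

```python
# Each loaded record is (customer, product, month, value). Sample data is made up.
records = [
    ("C1", "P1", "2024-01", 120.0),
    ("C1", "P1", "2024-02", 80.0),
    ("C2", "P3", "2024-01", 45.0),
]

customers = ["C1", "C2", "C3"]
products = ["P1", "P2", "P3", "P4"]

# Storage is keyed by the sparse combination (customer, product).
# An entry is created only when data for that combination is loaded.
cube = {}
for customer, product, month, value in records:
    cube.setdefault((customer, product), {})[month] = value

sparse_combinations = len(cube)                           # combinations that exist
potential_combinations = len(customers) * len(products)   # combinations that could exist
```

Here only 2 of the 12 potential Customer-Product combinations receive an entry, so nothing is reserved for the remaining 10.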
To manage sparsity efficiently and effectively, administrators can monitor the structure of a Cube from the Cubes section. After selecting the appropriate Cube, click the Sparsity Analysis tab to monitor sparsities that contain huge numbers of combinations. There you can spot which Cubes use large sparsities and delete the unused sparse combinations in them to reduce disk space usage and speed up interaction with the affected Cubes.
- Sparse structures are shared among all Cubes: when a sparse structure is defined, it will also be used for any new Cube that has the same Entities as dimensions.
- Time Entities cannot be included in a sparse structure.
- Some degree of disk space compression also occurs on dimensions that are not part of a sparse structure.
- When Board is in charge of defining the dense/sparse structure, it puts as many Entities as possible in sparse mode while trying to keep the sparse structure pointer within 64 bits, based on the automatic or manual Max Item number set for each Entity. In other words, Board puts all Entities in sparse mode as long as the product of their Max Item numbers stays below the 64-bit limit. With automatic Max Item numbers, the affected Entities are treated as having a Max Item number greater than the current one, provided the product stays below the 64-bit limit.
- If the 64-bit pointer is not sufficient, Board will scale up to a 128-bit pointer. For Entities set as dense, however, there are no such limits.
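
The pointer-size rule in the last two notes can be sketched as follows. This is an illustration under assumptions, not Board's actual logic: the function name is hypothetical, and the "64-bit limit" is taken here to mean that the product of the Max Item numbers must stay below 2^64.

```python
from math import prod

# Assumption: the 64-bit limit is interpreted as 2**64 addressable combinations.
LIMIT_64_BIT = 2 ** 64


def pointer_width(max_items):
    """Return 64 or 128 depending on the product of the Max Item numbers
    of the Entities placed in the sparse structure."""
    combinations = prod(max_items.values())
    return 64 if combinations < LIMIT_64_BIT else 128


# Hypothetical Max Item numbers.
small = {"Customer": 100_000, "Product": 50_000}            # product = 5 * 10**9
large = {"Customer": 10**8, "Product": 10**7, "Order": 10**6}  # product = 10**21

small_width = pointer_width(small)   # fits in a 64-bit pointer
large_width = pointer_width(large)   # needs a 128-bit pointer
```

Because Python integers have arbitrary precision, the product is computed exactly even when it exceeds 64 bits.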
Sparse definition guidelines
When you create a Cube version with two or more dimensions, use the following guidelines to define a sparse structure:
- Ignore the time dimension
- Define the dimensions of the Cube version without setting sparse Entities
- Identify the two largest Entities in terms of number of members, then ask yourself the following question: "will every possible combination of both Entities correspond to a meaningful value?". If the answer is "no" then the combination of those two Entities should be defined as a sparse structure.
Subsequently, identify the next largest Entity and go through the same reasoning, treating the sparse structure as a single Entity. If the combination of the new Entity and the two Entities mentioned before is sparse, then add the new Entity to the sparse structure. Repeat this process for the other dimensions.
- In general, you should define a sparse structure when two Entities each have more than 1,000 members, or when one Entity has several thousand members and the other a few hundred.
- It's always advisable to set a sparse structure, even if it is not strictly needed: when in doubt, set the Entities as sparse to ensure better performance and to be less taxing on the hardware
- There is a dimensional limit when defining sparse structures. If you exceed the limit, a warning message appears on the Cube icon at the top of the corresponding version. For a Cube version that includes a sparse structure, the product of the Max Item numbers of dense Entities has no limit, while the product of the Max Item numbers of sparse Entities must be less than 7.8x10^28 (please note that this figure is an approximation)
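
The step-by-step reasoning in these guidelines can be approximated with a small Python sketch. It is a rough aid, not a Board feature: the function name, the record layout, and the 10% density threshold used to decide that a group is "sparse" are all assumptions, and the question of whether a combination is meaningful remains a business judgement.

```python
from math import prod


def suggest_sparse(member_counts, records, time_entities=("Month",), threshold=0.1):
    """Greedy sketch of the guidelines: ignore time Entities, start with the two
    largest Entities, then try adding each next largest one to the group."""
    candidates = sorted(
        (e for e in member_counts if e not in time_entities),
        key=member_counts.get,
        reverse=True,
    )

    def is_sparse(entities):
        # Distinct combinations found in the data vs. all potential combinations.
        existing = {tuple(r[e] for e in entities) for r in records}
        potential = prod(member_counts[e] for e in entities)
        return len(existing) / potential < threshold

    if len(candidates) < 2 or not is_sparse(candidates[:2]):
        return []
    sparse = candidates[:2]
    for entity in candidates[2:]:
        if is_sparse(sparse + [entity]):
            sparse.append(entity)
    return sparse


# Hypothetical member counts and a tiny made-up data sample.
member_counts = {"Month": 24, "Customer": 1000, "Product": 500, "Channel": 3}
records = [
    {"Month": "2024-01", "Customer": "C1", "Product": "P1", "Channel": "Retail"},
    {"Month": "2024-01", "Customer": "C2", "Product": "P3", "Channel": "Online"},
    {"Month": "2024-02", "Customer": "C1", "Product": "P1", "Channel": "Retail"},
]
suggestion = suggest_sparse(member_counts, records)
```

With this sample, the Customer-Product pair and the Customer-Product-Channel group both fall far below the assumed threshold, so all three Entities are suggested for the sparse structure while Month is left out.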