How to Cluster the MicroStrategy Intelligence Server - Part 1


This is the first in a two-part series of blog posts on MicroStrategy Intelligence Server clustering. 
MicroStrategy is one of the market-leading BI platforms. Customers implement MicroStrategy to take advantage of its advanced analytical and reporting features. However, with organizational growth comes a need for 'more': more reports, delivered faster. Clustering of MicroStrategy Intelligence Servers (henceforth referred to as I Server) is one of the strategies organizations use to meet this demand for growth. This post describes the concepts behind I Server clustering in MicroStrategy and how clustering is done in a UNIX/Linux environment.

What is Clustering of I Server?

Clustering refers to joining a number of I Servers together so that they act as a single I Server. The individual I Servers that are joined to form the cluster are called nodes. We will discuss here the in-built I Server clustering feature provided by MicroStrategy. Using this in-built clustering capability, at most four I Servers can be clustered.

What are the advantages of clustering?


  • Increased Availability: If one of the I Servers in a cluster fails, the other I Servers in the cluster take up its requests. This increases the overall availability of the BI environment.
  • Strategic Resource Usage: Projects can be distributed across the different I Server nodes, so not every project needs to run on every node.
  • Increased Performance: Multiple I Servers provide greater processing power and hence can contribute to increased performance.*
  • Scalability: A clustered I Server environment provides the flexibility to scale up the user base and reporting scope over time.

 * It has been observed in many scenarios that even though the clustered I Server environment is geared up to provide higher performance, the desired results are not achieved because of database-side limitations. Although the clustered I Server can accept more job requests, the inability of the database to process the increased load acts as a performance bottleneck for the system. The database must accordingly be scaled up to take complete advantage of the increased processing power of the I Server cluster.

Clustering also helps to implement the following strategies effectively in a BI environment.

Failover Support: 

Failover support ensures the availability of the BI environment even in case of sudden failure. Clustering of I Servers facilitates failover support in terms of:


  • Load Distribution – Whenever one of the nodes in the cluster fails, all new requests are directed to the other functional nodes.
  • Request Recovery – Whenever one of the nodes fails suddenly, all web users connected to that node are automatically redirected to another node. For users to be automatically logged in to the other node, the corresponding option must be enabled. Any reports the users were executing when the node failed are cancelled, however, and the users need to resubmit those jobs on the new node.



Load Balancing: 

The load balancing strategy helps maintain an even load on the I Server nodes so that no node is heavily overloaded. This makes the most efficient use of the available resources. 
MicroStrategy's load balancing algorithm is driven by the number of user sessions: through load balancing, MicroStrategy tries to achieve an even distribution of user sessions across the nodes.

A Load Balance Factor can be specified to define how heavily each I Server cluster node is loaded. If all nodes in a cluster are to be loaded equally, the Load Balance Factor of every node should be 1. The number of user connections on each node is in direct proportion to the Load Balance Factor defined for that node.

The following scenarios illustrate the resulting distribution of user sessions:
  • Scenario 1: Load Balance Factor of Node 1 = 1 and Node 2 = 1 – sessions are distributed evenly across the two nodes.
  • Scenario 2: Load Balance Factor of Node 1 = 2 and Node 2 = 1 – Node 1 receives roughly twice as many sessions as Node 2.
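The proportional behavior described above can be illustrated with a minimal sketch. This is not MicroStrategy's internal algorithm; it simply shows how routing each new session to the node with the lowest sessions-to-factor ratio yields a distribution proportional to the Load Balance Factors. Node names and session counts are illustrative.

```python
# Hypothetical sketch of session-count load balancing weighted by a
# Load Balance Factor. NOT MicroStrategy's actual implementation; it
# only illustrates the proportional distribution described above.

def assign_session(sessions, factors):
    """Pick the node whose sessions-to-factor ratio is currently lowest."""
    return min(factors, key=lambda node: sessions[node] / factors[node])

def distribute(n_sessions, factors):
    """Route n_sessions one at a time and return the final per-node counts."""
    sessions = {node: 0 for node in factors}
    for _ in range(n_sessions):
        sessions[assign_session(sessions, factors)] += 1
    return sessions

# Scenario 1: equal factors -> even split
print(distribute(100, {"node1": 1, "node2": 1}))   # {'node1': 50, 'node2': 50}
# Scenario 2: node1 has factor 2 -> roughly twice the sessions
print(distribute(99, {"node1": 2, "node2": 1}))    # {'node1': 66, 'node2': 33}
```

The key property is that each node's share of sessions converges to its factor divided by the sum of all factors.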

Project Distribution (Synchronous and Asynchronous cluster) and Project Failover: 

An I Server cluster can be configured as a synchronous or an asynchronous cluster.
In an environment where multiple projects are running, not all projects need to be loaded on all nodes of the I Server cluster; different projects may be loaded on different nodes. An asynchronous cluster is one that has different projects loaded on each node of the cluster. A synchronous cluster is one that has the same projects loaded on all nodes of the cluster. 

Asynchronous clustering provides the ability to use system resources flexibly, along with greater performance and scalability. If the node hosting a project fails, the project can be hosted on any other functional node of the I Server cluster, thus ensuring availability.

The current post is specific to a Synchronous Cluster environment.
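The project-failover behavior of an asynchronous cluster can be sketched as a simple re-hosting step. This is purely illustrative; the node and project names are hypothetical, and MicroStrategy performs this internally rather than through any such script.

```python
# Hypothetical sketch of asynchronous project distribution and project
# failover. Node and project names are illustrative only.

cluster = {
    "node1": {"Sales Analysis"},
    "node2": {"Finance Reporting"},  # asynchronous: nodes host different projects
}

def fail_node(cluster, failed):
    """Re-host a failed node's projects on the remaining functional nodes."""
    orphaned = cluster.pop(failed)
    for project in orphaned:
        # pick the least-loaded surviving node for each orphaned project
        target = min(cluster, key=lambda n: len(cluster[n]))
        cluster[target].add(project)
    return cluster

fail_node(cluster, "node2")
print(cluster)   # node1 now hosts both projects, preserving availability
```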

Before configuring the Cluster


Cluster Cache


In a single I Server environment, the caches (Intelligent Cube, report, and document caches) are all stored on the machine running the I Server. In a clustered environment, however, the caches must be made available across all nodes of the cluster. Similarly, whenever Intelligent Cubes are updated, the updated data must be available on all nodes of the cluster.


There are two approaches by which cache can be shared in a clustered environment:

Local Cache: Each node hosts its own Intelligent Cube and local cache files along with its own cache index file. The cache of each node is then shared so that it is accessible from all nodes in the cluster. 
If a user logs in to Node 1 and executes a report, the report cache is created on Node 1. If the user later logs in and gets connected to Node 2, the cache created on Node 1 is still available for access.


Central Cache: The cache is maintained in a centralized location, which is then accessed from all nodes in the cluster. The cache can be stored on a separate machine dedicated to storing cache. In this case only one cache index file is created.
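On UNIX/Linux, the shared cache location in either approach is typically exposed to the other nodes as a network file system. The sketch below shows one common way to do this with NFS; the hostname and paths are hypothetical, and the actual cache directory is whatever is configured for the Intelligence Server, not these example paths.

```shell
# Hypothetical configuration sketch: exposing a centralized cache
# location to all cluster nodes over NFS on Linux. "cachehost" and
# /opt/mstr/cache are illustrative names, not real defaults.

# On the machine hosting the centralized cache (cachehost), export the
# cache directory by adding a line like this to /etc/exports:
#   /opt/mstr/cache  node1(rw,sync)  node2(rw,sync)

# On each I Server node, create the mount point and mount the share:
sudo mkdir -p /opt/mstr/cache
sudo mount -t nfs cachehost:/opt/mstr/cache /opt/mstr/cache
```

The I Server on each node is then pointed at the same directory, so a single cache index file serves the whole cluster.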




Local Caching

Pros:
  • Allows faster read and write operations for cache files created by the local server.
  • Faster backup of the cache lookup table.
  • Allows most caches to remain accessible even if one node in a cluster goes offline.
Cons:
  • The local cache files may be temporarily unavailable if an Intelligence Server is taken off the network or powered down.
  • A document cache on one node may depend on a dataset that is cached on another node, creating a multi-node cluster dependency.

Centralized Caching

Pros:
  • Allows for an easier backup process.
  • Allows all cache files to be accessible even if one node in a cluster goes offline.
  • May better suit some security plans, because nodes using the network account access only one machine for files.
Cons:
  • All cache operations must go over the network if the shared location is not on one of the Intelligence Server machines.
  • Requires additional hardware if the shared location is not on an Intelligence Server.
  • All caches become inaccessible if the machine hosting the centralized caches goes offline.

As a best practice, the choice between local and centralized cache should be driven by cache usage. If the cache is to be used heavily, go for centralized cache. If cache usage is light, the creation and maintenance of a separate location for storing cache is hard to justify; in such a scenario, local cache should be the choice.

History List

If a database-based History List is used, the History List messages and their associated caches are stored in the database, which is accessible from all nodes of the cluster.
In the case of a file-based History List, the Inbox folder on each I Server contains the collection of History List messages for all users. The History List files and caches are created in the Inbox folder of the node to which the user connects. The Inbox folder of each node must therefore be shared with all the other nodes in the cluster, so that a user's History List files are available irrespective of the node to which the user connects.

MicroStrategy 9.2.x adds the option to define a 'User Affinity Cluster'. If this option is enabled, all sessions of a given user are preferentially connected to the same node of the cluster. When the user logs in for the first time, the user's History List is created on one of the cluster nodes, and all future logins of that user connect to the same node. With this option, History List resource usage and the overhead of keeping the History Lists on all nodes in sync can be minimized.

The ‘Subscription Load Balancing’ option is available only when ‘User Affinity Cluster’ is enabled. If ‘Subscription Load Balancing’ is enabled, the sessions created to execute users' History List subscriptions are logged in based on load balancing between the I Server nodes. Disabling this option, however, keeps those subscription executions on each user's affinity node, making full use of the ‘User Affinity Cluster’.
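The core idea of user affinity is a deterministic user-to-node mapping: the same user always lands on the same node. The sketch below illustrates this with a hash-based mapping; it is an assumption-laden illustration, not MicroStrategy's actual routing logic, and the node names are hypothetical.

```python
# Hypothetical sketch of the idea behind 'User Affinity Cluster': route
# every session of a given user to the same node, so that user's History
# List lives on one node only. Illustrative only; not MicroStrategy code.

import hashlib

NODES = ["node1", "node2"]   # illustrative cluster node names

def affinity_node(user):
    """Deterministically map a user name to one cluster node."""
    digest = hashlib.sha256(user.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Repeated logins by the same user always land on the same node, so the
# History List never needs to be synchronized across nodes for that user.
print(affinity_node("alice") == affinity_node("alice"))   # True
```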


If ‘User Affinity Cluster’ is not used, the cache backup frequency should be set to zero (0) to ensure that History List messages are synchronized across cluster nodes.



........To be continued