Supporting multitenancy in the application framework
Multitenancy is the characteristic of an application that allows a single deployment to be used by multiple organizations. Wikipedia provides a nice, high-level introduction.
The historical approach is to build multitenancy support into the application. This means every resource access by a user needs to take into account which tenant the user belongs to. This can be very complex, especially when the feature needs to be retrofitted into an existing code base.
A second approach is to use virtualization. Each tenant has its own operating system instance. This has the very significant benefit of not introducing complexity at the application level. But the complexity moves to the management of virtual images, especially if the application is distributed (tiered, or scaled), or if the number of tenants is high, and changes often. Also, in an environment with a large number of tenants, the cost of one VM per tenant (scaled) may be prohibitive.
A possible third approach - to be investigated here - is to build multitenancy support into the application framework. This frees application developers from having to worry about it, and does not have the cost and complexity associated with managing a large number of virtual instances. But, complexity is obviously added to the application framework, and applications deployed on the framework are constrained in the way they access resources; they HAVE to use the framework's APIs, and of course the framework APIs have to exist and be multitenant aware for every type of resource consumed by the app.
The following is nothing more than a brainstorm, in hopes of getting a clearer picture of what approach 3 means, in the case of a JEE web app. It contains no solutions. Analysis of each potential pitfall is superficial. The list of potential pitfalls is not exhaustive.
Request handling and tenant context
The tenant needs to be identified by the time the framework hands off the request to the application. This tenant context is then needed whenever a shared resource is accessed, so the resource provider knows which tenant is requesting access.
Ideally, the application code is not aware of the tenant context. At the same time, it must be available to all tenant-aware resource providers.
Thread-local storage works well enough in the synchronous model. When the application is asynchronous, a special executor service implementation must be used to ensure that the tenant context is stored along with the task when it is submitted, and that it is placed in the thread-local storage of the thread executing the task. A bit messy...
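As a rough illustration of the executor idea, here is a minimal sketch that uses only java.util.concurrent. The names TenantContext and TenantAwareExecutor are hypothetical, and representing the tenant as a plain String is an assumption made for brevity; a real framework would likely carry a richer context object.

```java
import java.util.concurrent.Executor;

/** Hypothetical holder for the tenant id bound to the current thread. */
final class TenantContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private TenantContext() {}

    static String get() { return CURRENT.get(); }
    static void set(String tenantId) { CURRENT.set(tenantId); }
    static void clear() { CURRENT.remove(); }
}

/** Hypothetical executor decorator that carries the submitter's tenant to the worker thread. */
final class TenantAwareExecutor implements Executor {
    private final Executor delegate;

    TenantAwareExecutor(Executor delegate) {
        this.delegate = delegate;
    }

    @Override
    public void execute(Runnable task) {
        // Capture the tenant on the submitting thread, at submission time.
        final String tenantId = TenantContext.get();
        delegate.execute(() -> {
            // Remember what the worker thread had, so pooled threads do not leak tenants.
            final String previous = TenantContext.get();
            TenantContext.set(tenantId);
            try {
                task.run();
            } finally {
                if (previous == null) {
                    TenantContext.clear();
                } else {
                    TenantContext.set(previous);
                }
            }
        });
    }
}
```

The framework would call TenantContext.set(...) when it identifies the tenant for a request and hand the application a TenantAwareExecutor instead of a raw thread pool. The restore step in the finally block matters because pooled threads are reused across tenants.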
Database Access and schema
The framework must expose a multitenancy-aware JPA implementation. How this gets done depends on schema choices. Multiple approaches can be investigated here:
* One persistence unit per tenant. Each persistence unit targets a distinct database instance. This may result in a large number of databases, and JPA scalability in terms of the number of persistence units would have to be assessed. In this case the JPA wrapper only has to control the creation of entity managers.
* One global database, tenant-specific tables. The JPA wrapper intercepts query and persist operations, and qualifies the table name. Database scalability in terms of the number of tables needs to be assessed. A mix of this and the above approach may also be considered.
* Global tables with a hidden tenant key. In this case the JPA wrapper intercepts queries and adds the tenant column (in WHERE conditions and INSERT statements). This seems like the most complex in terms of the framework, and introduces additional storage (the tenant column). This is the approach chosen by Grails.
Note: The request context (containing the tenant information) needs to be available to the JPA wrapper implementation when it creates or retrieves an entity manager. This information needs to be passed out of band (e.g. thread-local), which is an architectural constraint on the application.
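For the first option in the list above, the wrapper's job reduces to "look up the current tenant out of band, then return that tenant's factory". The sketch below shows only that lookup. PerTenantRegistry is a hypothetical name, the type parameter T stands in for something like an EntityManagerFactory, and no real JPA API is used, so this stays a plain-Java illustration rather than a working JPA wrapper.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical registry that lazily creates one instance of T per tenant. */
final class PerTenantRegistry<T> {
    private final Map<String, T> instances = new ConcurrentHashMap<>();
    private final Supplier<String> currentTenant; // out-of-band lookup, e.g. backed by a ThreadLocal
    private final Function<String, T> factory;    // creates the per-tenant instance on first use

    PerTenantRegistry(Supplier<String> currentTenant, Function<String, T> factory) {
        this.currentTenant = currentTenant;
        this.factory = factory;
    }

    /** Returns the instance for the tenant bound to the current request. */
    T forCurrentTenant() {
        String tenantId = currentTenant.get();
        if (tenantId == null) {
            throw new IllegalStateException("no tenant bound to the current request");
        }
        return instances.computeIfAbsent(tenantId, factory);
    }
}
```

Failing loudly when no tenant is bound is a deliberate choice in this sketch: silently falling back to a default database would be a cross-tenant data leak waiting to happen.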
System APIs
File system access, network IO, ...
Overriding file system features of java.io and java.nio packages, or preventing their direct use, is required to support multitenant filesystem access, as different tenants must transparently access different areas of the file system.
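One way a framework-provided file API could map tenants to distinct areas is sketched below with java.nio.file. The class name TenantFileSystem, the one-directory-per-tenant layout and the tenant id pattern are all assumptions made for illustration; the sketch also does not deal with symbolic links, which a real implementation would have to consider.

```java
import java.nio.file.Path;

/** Hypothetical helper that confines each tenant to its own directory under a base directory. */
final class TenantFileSystem {
    private final Path baseDir;

    TenantFileSystem(Path baseDir) {
        this.baseDir = baseDir.toAbsolutePath().normalize();
    }

    /** Maps an application-level relative path to the tenant's private area. */
    Path resolve(String tenantId, String relativePath) {
        // Assumed id format; rejects ids that could themselves contain path separators.
        if (tenantId == null || !tenantId.matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException("invalid tenant id: " + tenantId);
        }
        Path tenantRoot = baseDir.resolve(tenantId);
        Path resolved = tenantRoot.resolve(relativePath).normalize();
        // Reject "../" tricks and absolute paths that would leave the tenant's directory.
        if (!resolved.startsWith(tenantRoot)) {
            throw new SecurityException("path escapes tenant root: " + relativePath);
        }
        return resolved;
    }
}
```

The application would ask for "reports/q1.txt" and never see the tenant prefix, which is what makes the access transparent.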
Overriding network access features may be necessary in order to ensure fair access to network or thread resources.
Application configuration, monitoring
In the single tenant case, it is common practice to store application configuration in files accessible as resources. This does not work in a multitenant situation. Instead, applications need to consume a tenant-aware framework API.
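A tenant-aware configuration API might look roughly like the following. TenantConfiguration is a hypothetical name, and the choice of java.util.Properties with shared defaults plus per-tenant overrides is an assumption; the storage behind it could just as well be a database.

```java
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical configuration store: shared defaults with per-tenant overrides. */
final class TenantConfiguration {
    private final Properties defaults;
    private final Map<String, Properties> perTenant = new ConcurrentHashMap<>();

    TenantConfiguration(Properties defaults) {
        this.defaults = defaults;
    }

    /** Sets an override that is visible to one tenant only. */
    void set(String tenantId, String key, String value) {
        // Properties(defaults) makes lookups fall back to the shared defaults.
        perTenant.computeIfAbsent(tenantId, id -> new Properties(defaults))
                 .setProperty(key, value);
    }

    /** Returns the tenant's value, the shared default, or null if neither exists. */
    String get(String tenantId, String key) {
        Properties props = perTenant.get(tenantId);
        return (props != null ? props : defaults).getProperty(key);
    }
}
```

In a framework, the tenantId argument would normally not be passed by the application at all but taken from the request's tenant context.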
Creating and using a tenant-aware JMX provider may be worth investigating here.
Logging
The Java logging API supports the configuration of application-defined log appenders. In a multitenant framework, application developers may choose only from a list of tenant-aware appenders. The framework sets tenant-aware defaults.
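As one possible tenant-aware default, the sketch below uses java.util.logging, where the pluggable pieces are called handlers and formatters rather than appenders. TenantPrefixFormatter is a hypothetical name, and the output layout is an arbitrary choice for illustration.

```java
import java.util.function.Supplier;
import java.util.logging.Formatter;
import java.util.logging.LogRecord;

/** Hypothetical formatter that stamps every log line with the current tenant id. */
final class TenantPrefixFormatter extends Formatter {
    private final Supplier<String> currentTenant; // e.g. backed by the request's tenant context

    TenantPrefixFormatter(Supplier<String> currentTenant) {
        this.currentTenant = currentTenant;
    }

    @Override
    public String format(LogRecord record) {
        String tenantId = currentTenant.get();
        return "[tenant=" + (tenantId != null ? tenantId : "unknown") + "] "
                + record.getLevel() + " "
                + formatMessage(record)
                + System.lineSeparator();
    }
}
```

Note that the tenant is looked up when the record is formatted. If a handler formats records on a different thread than the one that logged them, the tenant would have to be captured earlier, which is the same asynchronous hand-off problem described for request handling.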
External Service Access
Requests to third-party services must appear as coming from a tenant. Authentication data, such as an OAuth access token cache, must be accessed in a tenant-aware fashion.
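A tenant-aware token cache could be as simple as keying entries by tenant first and by service second, as in this sketch. TenantTokenCache is a hypothetical name, tokens are treated as opaque strings, and passing the current time in explicitly is an assumption made to keep the example deterministic; no OAuth library is involved.

```java
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical cache of access tokens, partitioned by tenant and then by service. */
final class TenantTokenCache {
    private static final class Entry {
        final String token;
        final Instant expiresAt;

        Entry(String token, Instant expiresAt) {
            this.token = token;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Map<String, Entry>> byTenant = new ConcurrentHashMap<>();

    void put(String tenantId, String service, String token, Instant expiresAt) {
        byTenant.computeIfAbsent(tenantId, id -> new ConcurrentHashMap<>())
                .put(service, new Entry(token, expiresAt));
    }

    /** Returns the tenant's token for the service, or null if absent or expired at 'now'. */
    String get(String tenantId, String service, Instant now) {
        Map<String, Entry> tokens = byTenant.get(tenantId);
        if (tokens == null) {
            return null;
        }
        Entry entry = tokens.get(service);
        if (entry == null || !now.isBefore(entry.expiresAt)) {
            return null;
        }
        return entry.token;
    }
}
```

Partitioning by tenant at the top level means one tenant's token can never be returned for another tenant's request, even when both call the same third-party service.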
JMS
When sharing a JMS provider among tenants, the following issues should be considered:
* Namespace management. Two tenants should be able to use the same application-level topic/queue name.
* Fair resource utilization. One tenant's event activity should be constrained so as not to impact others.
The framework can inject a tenant-aware JMS client implementation, which could be a wrapper around an existing one. In order to ensure fair resource utilization, the JMS broker needs to be able to impose limits, and may need to be tenant-aware.
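The two concerns above can be sketched on the client side without touching the JMS API itself. TenantMessagingPolicy is a hypothetical name, the "tenant.<id>.<name>" naming scheme is an assumption, and the in-flight limit shown here is only cooperative: it does not replace limits enforced by the broker.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/** Hypothetical policy a tenant-aware JMS client wrapper could consult before sending. */
final class TenantMessagingPolicy {
    private final int maxInFlightPerTenant;
    private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();

    TenantMessagingPolicy(int maxInFlightPerTenant) {
        this.maxInFlightPerTenant = maxInFlightPerTenant;
    }

    /** Maps an application-level destination name to a tenant-qualified one (assumed scheme). */
    String physicalDestination(String tenantId, String logicalName) {
        return "tenant." + tenantId + "." + logicalName;
    }

    /** Returns false when the tenant already has its maximum number of sends in flight. */
    boolean tryAcquireSendPermit(String tenantId) {
        return permits.computeIfAbsent(tenantId, id -> new Semaphore(maxInFlightPerTenant))
                      .tryAcquire();
    }

    /** Must be called once for every successful tryAcquireSendPermit. */
    void releaseSendPermit(String tenantId) {
        Semaphore semaphore = permits.get(tenantId);
        if (semaphore != null) {
            semaphore.release();
        }
    }
}
```

A wrapper around the real JMS client would translate destination names with physicalDestination and bracket each send with the acquire and release calls.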
Libraries
Some library implementations may be incompatible with a multitenant environment. Things to watch out for include singleton configuration or data objects.
Bottom line (so far)
In most cases, Java allows an application to supply its own implementation of a particular API. For these cases, it is possible for the framework to provide tenant-aware implementations.
In a JEE application, the container must be tenant aware. This requires modifications of the container: the servlet container itself in case the servlet API is used directly, or the higher-level framework on which the application is built.
Base Java APIs (IO, net) are not easily pluggable. This requires that the framework expose alternate APIs.
In practice, applications running on a multitenant framework will be restricted in what they can use (code, resources), as everything they use must be multitenant compatible. How restricted will depend on the context. Google App Engine provides an idea of what these restrictions are.
Things may be less restricted in a more controlled environment in which the tenants are applications created by a known group, e.g. services created by a company and deployed on a common framework.