Introduction

Capacity Planning has been a hotbed of technical development over the past decade. As products like our Insights Capacity Planner begin to integrate machine learning algorithms into scenario modelling, it is interesting to speculate about where the technology as a whole could end up.

The challenges for the future

In our first blog we talked about how Capacity Management has been done in the past: focused mainly at the infrastructure level, on metrics that make sense to technical IT roles. In doing so we have left out the service and business tiers, and produced scheduled lists of future actions by means of manual, interactive processes (what-if tools).

Of course there are real benefits to that approach, but there is certainly more insight to be gained as technologies continue to improve.

Bringing the service and business layers into play

Capacity management tools will include different, interconnected views for the different tiers: infrastructure, service and business. Gartner supports this in its Market Guide for Capacity Management Tools (May 2016): "Organizations should invest in tools that have strong presentation and reporting tools that allow the user to have different views for the IT infrastructure, the services and the business."

These views should also include a main business dashboard, whose objective is to prove that the infrastructure actions taken in the past, and those planned for the future, have been or will be effective at optimizing costs while keeping business risk below acceptable levels.

The tools should model the numerical relationship between the service and component (infrastructure) tiers. Predictions and future planning are then no longer based on component utilization trends; instead they are driven by expected, planned or predicted service metrics.

For example, if you know your business and the services it relies on, you can plan to serve a given number of customers at a specified quality of service. Projecting that demand into the future, the models can then calculate, from those service metrics, how much capacity is needed in the affected components.

The challenge here, as we said, is that the relationship is tight but complex. A simple approach is to apply linear regression models between component metrics (predictors) and service metrics (targets). This is something that ITRS Insights Capacity Planner can do right now, with customizable functions. However, remember that this relationship is highly non-linear. There are other mathematical modelling techniques that connect service and component metrics, such as Queuing Networks and Queuing Petri Nets. These are powerful, but they are challenging to implement: they generally require intrusive instrumentation of the monitored systems, and they need powerful solver algorithms applied across large IT estates. As a result, we are talking about machine learning at big data scale.
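To make this concrete, here is a minimal sketch of the simple linear-regression approach, written in Python with scikit-learn. The metrics, numbers and SLA threshold are purely illustrative (they are not taken from Insights Capacity Planner), and a real model would need far more data and, as noted, would have to cope with the non-linearity.

```python
# Minimal sketch: relate a component metric (CPU utilization, the predictor)
# to a service metric (response time, the target) with linear regression,
# then invert the fitted line to estimate how much component headroom keeps
# the service inside a hypothetical SLA. All numbers are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

# Historical observations (hypothetical): CPU utilization (%) vs response time (ms)
cpu_util = np.array([[20], [35], [50], [65], [80]])
resp_ms = np.array([120, 150, 190, 260, 380])

model = LinearRegression().fit(cpu_util, resp_ms)

# Invert the fitted line: the utilization at which the predicted response
# time reaches the SLA threshold (say 300 ms)
sla_ms = 300
max_util = (sla_ms - model.intercept_) / model.coef_[0]
print(f"Keep CPU utilization below ~{max_util:.0f}% to stay within the {sla_ms} ms SLA")
```

Inverting the fitted line like this is what lets us go from a planned service level back to the component capacity we need; with non-linear models that inversion becomes a search rather than simple algebra, which is part of what makes the problem hard at scale.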

Automated planning

Automation will take over from manual what-if analysis and planning. Instead of the user iteratively trying a series of infrastructure changes around a future scenario and checking the impact at the service level, it will be possible simply to specify the future scenario and see where resources should be allocated as a result. The application will then provide an automated suggestion of actions for the future; in this automated context we can call them commands. As such, they can be presented through an API so that other automated systems can interact with the application.
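As an illustration of what such a command could look like, here is a hypothetical sketch in Python. The action names, fields and values are invented for the example and do not describe the actual Insights Capacity Planner API.

```python
# Hypothetical shape of a planner "command": a machine-readable action
# suggestion that another automated system could pick up over an API.
from dataclasses import dataclass, asdict
from datetime import date
import json

@dataclass
class CapacityCommand:
    action: str      # e.g. "add_vm", "resize_volume" (illustrative names)
    target: str      # component or cluster the action applies to
    quantity: int    # how many units to add or remove
    apply_by: date   # deadline derived from the predicted demand
    reason: str      # the service-level prediction that triggered it

cmd = CapacityCommand(
    action="add_vm",
    target="payments-cluster",
    quantity=2,
    apply_by=date(2017, 3, 1),
    reason="projected transaction volume exceeds capacity of current pool",
)

# Serialised form, as it might be returned by a REST endpoint
print(json.dumps(asdict(cmd), default=str, indent=2))
```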

Going Real-Time

The concept "real-time" is relative, depending on the granularity of the process. For us, in the context of capacity management, going "real-time" would mean to go from weeks/days into hours/minutes. Currently we are describing our IT infrastructure with a time granularity of one day (we gather all our data every night, by means of a batch process). Our predictions, decisions and actions are described in terms of days in the future. This "slow paced" environment gives headroom for prediction algorithms and other suggested complex mathematical calculations to be trained and applied.

But what if we wanted to speed up the model so that we could apply it to "elastic" environments in the cloud, where virtual machines are provisioned, started and stopped automatically as the supported services put more or less demand on the infrastructure? Conceptually, we are just making the processes faster. But this has a huge impact on the underlying implementation: as the modelling becomes more complex, the underpinning components will struggle to keep up with the increased speed.
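One way to picture that implementation shift is the move from a nightly batch re-fit to an incremental re-fit over a sliding window of recent samples. The sketch below is a hypothetical illustration only; the window size, refit interval and the simple least-squares model stand in for what would, in reality, be a streaming platform and far heavier machinery.

```python
# Hypothetical sketch: instead of refitting once a night on a full day's batch,
# keep a sliding window of recent (service metric, component metric) samples
# and refit every few minutes, so capacity decisions can track an elastic
# environment. Window size and numbers are illustrative only.
from collections import deque
import numpy as np

WINDOW = 720          # e.g. last 12 hours of 1-minute samples
samples = deque(maxlen=WINDOW)

def ingest(tx_per_sec, cpu_util):
    """Append the latest 1-minute sample to the sliding window."""
    samples.append((tx_per_sec, cpu_util))

def refit():
    """Refit a simple least-squares line cpu_util ~ a*tx_per_sec + b
    over the current window; in practice this would run every few minutes."""
    data = np.array(samples)
    a, b = np.polyfit(data[:, 0], data[:, 1], deg=1)
    return a, b

# Example: simulate a stream of samples, then refit
for minute in range(WINDOW):
    load = 100 + minute * 0.5                      # rising transaction rate
    ingest(load, 10 + 0.08 * load + np.random.randn())
slope, intercept = refit()
print(f"Estimated CPU% per tx/s: {slope:.3f} (intercept {intercept:.1f})")
```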

Now is the time to find out whether real-time capacity management can happen, and if it can, it presents a big challenge for real-time big data.


Tags: Insights