By TopDown Team
February 6th, 2017
Suez owns and operates sixteen water and waste utilities and operates ninety municipal water and waste water systems. In October of 2015 Suez implemented Hyperion Planning for budgeting. In February of 2016 scope was increased to include monthly reforecasts. As more actual data and more forecast scenarios were added the database grew.
Suez’s application soon reached 90 GB and like many large Planning applications aggregating the entire application was time consuming. Depending on whether new actuals were being loaded and which aggregations were included the process could take up to 12 hours. Suez administrators had to carefully plan when to execute aggregations to avoid taking the application off line and disrupting users. Back up and other administrative tasks were also very time consuming. So Suez collaborated with TopDown to optimize the aggregation process.
Suez wanted a thorough yet cost effective approach. TopDown proposed three-steps:
- Run a comprehensive set of tests to identify the most promising potential optimization actions
- Prioritize the candidate actions
- Implement the optimization actions that Suez selected
Candidate Optimization Actions
We wanted to ensure that we considered all effective actions. But given a tight time window we needed to identify the most promising actions quickly so we would have enough time to implement and test. TopDown’s approach was to assess the application based on a comprehensive list of twenty potential optimization actions. These ranged from simple low-risk steps such as avoiding loading zero-padded files to complex steps such as rewriting calc scripts. The list also included the usual suspects such as block size, cache settings, sparse/dense configuration and outline order as well as testing FIXPARRALLEL and CALCPARALLEL. We would not have time to test and validate all of these actions. So we prioritized based on three criteria; potential effectiveness, risk and effort. For each of the three criteria we ranked each candidate action on a scale of one through five, five being the easiest or safest or most effective. The ranking was very subjective but it gave us a framework for prioritization. Actions such as cache settings present no risk to the data and are very low effort. However, they are usually only low to moderate effectiveness. Some outline changes are low risk and low effort but these tend to be low effectiveness. Other outline changes might rank high on potential effectiveness but they pose higher risk to the data. Risk and effort are interdependent because higher risk means more testing time and error resolution time. Some calc script changes could be high effectiveness and moderate risk, for example reducing the number of passes and blocks processed. However, calc script changes are usually high risk and high effort.
No Low Hanging Fruit
Based on testing, conversations with the administrators and analysis of log files and database statistics we found there was no low hanging fruit. We ruled out any outline changes that would significantly affect the way users worked with the data such as eliminating dimensions, years or scenarios. The block size was reasonable at about 200k. Changing sparse /dense configuration changes calc order and that’s risky. It’s a Standard Multi-Currency application which stores FX rates at certain generation one intersections. That reduces the possibility of using non-stored members at generation one of sparse dimensions. Overall, we probably couldn’t change enough stored members to non-stored to make a difference.
The aggregation calc scripts were simple aggregations of the necessary sparse dimensions. Changing the aggregation order might help a bit but there was no opportunity to reduce passes or blocks processed.
Some actions are potentially high effectiveness, low risk and low effort such as eliminating zero-fill. This problem is rare but the impact can be huge. The biggest source of this problem is loading zero-padded files. Another less significant source is users submitting zeros instead of #missing. Analysis of the load files and spot checks of level 0 data showed no evidence of significant zero-fill.
I’m my experience I’ve found the calc cache rarely makes much difference. However, many others disagree and creating blocks during aggregations is where it is most likely to matter. Fortunately, it’s simple to check. An analysis of the log file showed the calc cache was in single bit-map mode indicating a higher Calc Cache could be better. Unfortunately, the calc cache was already maxed out.
The hit rate on the index cache was already .99. But the data cache hit rate was only .46 so raising the data cache might help with no risk and little effort. But that step wasn’t very effective. The database was badly fragmented. However running restructures before aggregating is time consuming and unlikely to result in a net reduction of process time.
We ran a series of automated tests and identified two alternative actions that combined FIXPARALLEL with selected dynamic calcs. These actions could potentially reduce agg times to about 20 minutes. That was a big improvement. But users would still have to wait for their aggregated results. The database would still be very big and cumbersome to manage.
We were running out of effective options. We needed actions that promised high effectiveness that could be completed within budget.
Converting to Hybrid
Initially we discounted Hybrid Aggregation because Suez was on 188.8.131.52.500. Common wisdom holds that’s too early to put into production because there are too many limitations. Note that we originally ranked Hybrid high in effectiveness but low in safety and ease of implementation. But if the application could be converted to Hybrid Aggregation the aggregation steps would be eliminated altogether saving hours and disk utilization would be reduced from 90 GB to under 400 MB. That was the brass ring.
Certain complex calculations and included in Hybrid major limitations. When we analyzed the calculations we found the only modifications required were two instances cross-dimensional operators and a set of time balance calculations. Hybrid was would much easier and lower risk than anticipated.
The basic conversion to Hybrid Aggregation includes adding the ASODYNAMICAGGINBSO line to the Essbase.cfg file and setting the sparse dimension aggregations to dynamic in the outline.
Cross dim Workarounds
Cross-dimensional operators in outline member formulas will cause the Hybrid query processor to failover to BSO and you will see a message to that effect in the application log file. This means the sparse aggregations that were converted to dynamic calcs will be very slow. The workaround is to move these calcs to level 0 calc script calculations. This is not possible if the calc must be processed at upper levels for example for certain rate or ratio calcs. However, this works if the level 0 results can be aggregated according to the outline. Generally speaking, if the calculations are purely additive this will work. If they mix addition and subtraction with multiplication and division they will not.
Here is one of two similar member formulas that needed to be converted. Notice that the formula is a ratio and therefor fails the above test. However, in this case the results were only used to produce other level 0 results and the upper level results are not needed.
|SA1001||SA1001 – Consumption /day (MG)||“SA1007″/”Budget”->”No_BU”->”NO_OU”->”No_Department”->”No_Projects”->”Local”->”No_DT”->”No_BC”->”No_CE”->”hsp_inputvalue”->”SA2000″;|
So this formula was added to the level 0 calc script which was set to run on save in the appropriate forms.
/* 08/10/2016 modified to add the formula for SA1001 for Hybrid Aggregation conversation */
Time Balance Workaround
As mentioned above 184.108.40.206.500 does not support Time Balance calculations. The following are examples of the formulas used to work around that limitation. The @ISMBR lists have been shortened for readability.
IF(@ISMBR("ConsumptionPercentage","99204000") @SUM ("Jan":"Dec")/12; ELSEIF(@ISMBR("SA1001","SA1301")) "Dec"; ELSE @SUM ("Jan":"Dec"); ENDIF
IF(@ISMBR("ConsumptionPercentage","99204000")) @SUM ("Oct":"Dec")/3; ELSEIF(@ISMBR("SA1001","SA1301")) "Dec"; ELSE @SUM ("Oct":"Dec"); ENDIF
/* simulate TB Avg */ IF(@ISMBR("ConsumptionPercentage","99204000")) @SUM ("Jan":"May")/12; /* simulate TB Last */ ELSEIF(@ISMBR("SA1001","SA1301")) "May"; ELSEIF(@ISMBR(@IDescendants("Balance Sheet"))) "BegBalance"+@SUM ("Jan":"May"); ELSE @SUM ("Jan":"May"); ENDIF
After converting to Hybrid Aggregation most of the reporting performance remained acceptable. Smart View queries for one business unit showed no noticeable change. Bus another business unit had Smart View queries went from a few seconds to 30 seconds or more. However, with no aggregation step necessary, the total time from data entry to final result was reduced by 15 to 60 minutes. In addition, the users no longer needed to launch the aggregation so one step was eliminated.
There was one important Smart View report that posed a problem. This report retrieved Essbase data at many aggregated levels and used the Excel Group feature to provide drill-down. This report was enormous and query times went from 30 seconds to 15 minutes. Most of the data in this report didn’t change so the solution was break it up into active and static sections and to refresh only the active section.
Several Financial Reporting reports also caused a failover to the BSO query processor, resulting in dismally slow performance. This was confirmed via the log message indicating hybrid failed. The solution was to change the tuning option from “Using WITH MEMBER” to “Using Separate Queries”.
In the end users didn’t notice much change in query times but after implementing Hybrid they no longer need to wait to see their aggregated results. Some users had been running their own aggregations and that step was eliminated.
The biggest win was the transformation for administrators. Prior to implementing Hybrid, database refreshes could consume entire weekends loading actuals, repopulating forecasts and running aggregations. In some cases, it wasn’t possible to complete the entire job during scheduled downtime so admin procedures could spill over into business hours. Implementing Hybrid eliminated that painful process freeing up admins to focus on new opportunities. With respect to the users, while there were some disadvantages these were more than offset by significant advantages. While it’s true that Hybrid Aggregation has limits, for many of our applications these limits may not be as significant as common wisdom suggested.