In 2016, the Federal Ministry for Economic Affairs and Energy (BMWi) launched the “Pilotprogramm Einsparzähler”. The program aims to foster digital innovations and to support the introduction of value-added services for energy efficiency based on smart meter data. smartB took part in this program because the German energy market lacked a tool capable of calculating trustworthy energy savings based on the IPMVP and ISO 50006.
This paper explains why the ECM-Tool was developed on the basis of the previously cited norms, what the current market situation regarding similar tools is and why smartB’s approach is better. Moreover, it describes the development of the tool and its technical elements in more detail, shows how to calculate energy savings and reinforces this with three real cases where the ECM-Tool was applied to verify achieved savings after an energy conservation measure.
2. Norms and Context: IPMVP and ISO 50006 to Achieve Efficiency First
The Federal Ministry for Economic Affairs and Energy (BMWi) set forth a strategy for energy efficiency policy in Germany in the 2016 “Green Paper on Energy Efficiency”, which has since received critical feedback in a stakeholder consultation process. The conclusions of this process will lead to a white paper on energy efficiency, soon to be published. The first thesis emphasizes the central goal of saving energy: “Efficiency first leads to a cost-optimal energy transition and reinforces renewable energy’s effect on decarbonisation. A unit of energy saved need not be produced, stored or transmitted over the grid” (BMWi, 2016). In the same year, in order to promote digital innovations in the field of energy efficiency and monitoring for all end customer sectors, BMWi launched a funding scheme called “Pilotprogramm Einsparzähler” (i.e. pilot program energy savings meter). The program promotes the development of digital platforms following the Efficiency First principle, focusing not on individual projects but on the establishment of a business model. As a condition sine qua non, the funding authority BMWi requires every project in the Pilotprogramm Einsparzähler to offer a product with hardware and software that presents the end customer with a transparent and reproducible quantification of energy efficiency. Comparing energy consumption before and after an energy conservation measure (ECM) allows for a quantification and verification of achieved energy savings as laid out in international standards for energy management such as the International Performance Measurement and Verification Protocol (IPMVP) as well as ISO 50001:2018, ISO 50006:2014 and ISO 50015:2014.
Measurement and verification (M&V) is a prerequisite for all performance-based energy-efficiency projects to assess and audit the quantitative outcomes of energy conservation measures (ECMs). The IPMVP provides a rigorous and yet flexible framework for evaluating the performance of an ECM and is the most prevalent M&V methodology employed worldwide. The related alternative standard increasingly applied in Germany and therefore the focus of this paper is the ISO 50006:2014. This norm standardizes process and methodology in energy management to provide evidence of improved energy-related performance. The consumption of essential significant energy uses has to be adjusted by factors such as weather or product properties.
Both the core concepts and the general methodologies in ISO and IPMVP are robust and show little distinction. The most significant drawback of these norms is the widely reported lack of guidance on the calculation process. This issue is more pronounced when performing M&V in industrial facilities, where the number of factors affecting energy performance complicates the modelling process. ISO 50006 might be considered a complement to IPMVP, although there is no official link between them. ISO 50015 in turn complements ISO 50001 and, like IPMVP, sets out to establish a common set of principles and guidelines for the measurement and verification of organisational energy performance.
ISO 50006 is the pertinent norm for smartB’s ECM-tool, since it guides organizations on measuring energy performance using energy baselines (EnB) and energy performance indicators (EnPI), which in turn is fundamental to managing energy performance with regard to ISO 50001 and ISO 50015. An EnPI defines a quantitative value or measure of energy performance for a given system’s significant energy use (SEU), which can be derived as (a) an absolute value, e.g. energy use in kWh over a certain time span; (b) a ratio of values, e.g. energy use in kWh per unit of output; (c) a statistical model, e.g. energy use as a regression function of weather conditions and output; or (d) an engineering model, e.g. energy use modelled with physical properties in any functional form. Options (a) and (b) have the clear advantage of simplicity, but they might fail to adequately capture multiple dependencies between energy use and relevant variables. Only (c) and (d) allow for the normalization of an SEU, i.e. the routine modification of energy data to account for changes in two or more relevant variables, so that energy performance can be compared under equivalent circumstances.
The EnB is the quantitative reference providing a basis for comparison of energy performance in the baseline period before an ECM versus the reporting period after an ECM. In the case of (c), a statistical model, the EnB consists of a set of parameters used to forecast the energy consumption that would have occurred in the reporting period had there not been an ECM, based on the functional relationship between energy use and relevant variables in the baseline period. Apart from normalizing with respect to two or more relevant variables, this forecasting method gives a hypothetical energy consumption curve in the reporting period. By controlling for the influence of relevant variables and static factors on a system’s energy performance, the statistical model facilitates the interpretation of a change in energy consumption as a causal effect of the ECM. The amount of energy saved can be derived from the area between two curves (i.e. ΔE in the following graph): the forecast energy consumption and the actual energy consumption in the reporting period after the ECM. Figure 1 visualizes this approach in two-dimensional space, simplified to show the relationship between energy consumption and one relevant variable © DIN EN ISO 50006:2014.
Figure 1. Normalization calculation process.
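The forecasting approach described above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the implementation of the ECM-tool: it uses a single relevant variable (as in Figure 1), plain least squares, and hypothetical data; the function names `fit_baseline` and `cumulated_savings` are chosen for this example only.

```python
from statistics import mean


def fit_baseline(x, y):
    """Least-squares fit of y = a + b * x on baseline-period data (the EnB)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def cumulated_savings(a, b, x_reporting, y_reporting):
    """Sum of (forecast - actual) over the reporting period, i.e. the area ΔE."""
    return sum((a + b * xi) - yi for xi, yi in zip(x_reporting, y_reporting))


# Hypothetical daily data: one relevant variable (x) and consumption in kWh (y).
baseline_x = [2.0, 4.0, 6.0, 8.0]
baseline_y = [120.0, 140.0, 160.0, 180.0]
reporting_x = [3.0, 5.0, 7.0]
reporting_y = [110.0, 125.0, 140.0]

a, b = fit_baseline(baseline_x, baseline_y)
delta_e = cumulated_savings(a, b, reporting_x, reporting_y)
print(f"EnB: y = {a:.1f} + {b:.1f} * x, savings = {delta_e:.1f} kWh")
```

The baseline parameters are estimated only from data before the ECM, so the forecast represents the consumption that would be expected in the reporting period under the observed conditions without the measure.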
3. A Glance at the Market for Energy Management Systems Supporting ISO 50006
3.1. German Market Participants
Energy management software is the key to keeping companies lean, efficient, and sustainable. Recent advancements in technology and IT infrastructure make implementing these solutions easier, and more attractive than ever1. Cloud-based energy management solutions, for instance, eliminate the majority of the local IT infrastructure traditionally necessary.
Nevertheless, even though technology trends favour energy management software, many solutions on the German market cannot calculate achieved savings according to the aforementioned international standards. The Federal Office for Economic Affairs and Export Control (BAFA), which also administers the Pilotprogramm Einsparzähler, provides a reference list of more than 200 software solutions available on the German market, all of them ISO 50001 certified. However, it is unknown which of these solutions also support ISO 50006 and the verification of achieved savings2. Therefore, without any claim to comprehensiveness, the following shortlist shows competitors of the ECM-tool in Germany:
ÖKOTEC, EneffCo: This web-based tool has the main advantage of being user-friendly and offering a clear dashboard, which makes energy savings calculations easy. It allows the user to evaluate the influence of external factors such as outside temperature, partial load and standby. In addition, the user can create different KPIs to measure changes in energy efficiency.
The tool has been developed to support the norms ISO 50001, ISO 50006 and ISO 50015, which is reflected in all modules of EneffCo.
Limòn, é.Visor: é.Visor is an integrated software solution for energy monitoring and energy management, which supports the user from data acquisition through analysis and reporting. The software supports comparative analyses (benchmarking) of energy data and the identification of potential savings according to ISO 50001. Moreover, its efficiency evaluation takes into account influencing factors that affect the specific energy consumption, so that the user can display and evaluate the energy demand as a function of these factors.
eSightenergy: Besides smartB’s ECM tool, the eSightenergy tool is the only software based on IPMVP on the German market. Its measurement and verification module puts considerable effort into building an accurate baseline and allows for the analysis of up to ten independent variables that influence energy consumption. A distinctive feature of eSightenergy is that savings can be compared over different time intervals and resolutions, with graphs that give additional insight into the savings, such as the CUSUM tool, which allows the user to analyze changes in savings trends.
Furthermore, the definition of targets makes every kind of deviation visible, so that the user can react and correct the problem that could have led to it.
3.2. International Market Participants
Outside of Germany these companies provide energy management solutions:
Wattics (USA): The Wattics solution for measurement and verification based on the IPMVP is one of the most robust on the international market. Wattics takes the user through a series of steps including the creation and description of a new project, the definition of the energy conservation measure, and the establishment of the baseline and reporting periods. Furthermore, the tool makes it possible to add routine adjustments (i.e. weather and production) and non-routine adjustments (unexpected changes in the system, maintenance stops, changes in the dimensions of the building etc.). After following all the steps and entering or uploading the necessary information, the user can visualize the energy model generated by the tool and the calculation of the savings. All relevant project information is shown in a customizable report, which makes it easier to share.
DEXMA (Spain): DEXMA offers a specific IPMVP module for monitoring energy savings by project or action. Its main disadvantage is that it cannot generate the energy model itself; in other words, Excel or other software is needed to obtain this model. Once the model formula is entered in the program, it offers different graphs to verify savings.
Retscreen (Canada): Retscreen is an energy management software that has been developed for the government of Canada in order to promote energy efficiency projects. It has a robust module for the calculation of energy savings, but its design is oriented towards experienced energy experts. A notable characteristic is the download of daily weather values from a NASA server for locations all around the world, such as air temperature, solar irradiation, humidity, pressure etc. Another useful feature is the automatic creation of cooling and heating degree days based on a given reference temperature.
During the creation of an energy model the user can choose not only linear forms but also polynomial, exponential and logarithmic ones, which makes the tool well suited for complex cases where a linear solution cannot always be found. Nevertheless, uploading data to the system is not easy and requires merging data from one database into another. Even though many graph options are included to visualize the data, their appearance is dated, since the software focuses on functionality.
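Degree days, as mentioned above, condense daily temperatures into a single relevant variable for heating or cooling demand. The following sketch uses a common simple definition based on daily mean temperatures; it is not taken from any of the tools discussed here, and the reference temperature of 18 ˚C as well as the function name `degree_days` are assumptions for illustration.

```python
def degree_days(daily_mean_temps_c, reference_temp_c=18.0):
    """Return (heating, cooling) degree days for a list of daily mean temperatures.

    Each day contributes its positive distance below the reference temperature
    to the heating degree days and its positive distance above it to the
    cooling degree days.
    """
    hdd = sum(max(0.0, reference_temp_c - t) for t in daily_mean_temps_c)
    cdd = sum(max(0.0, t - reference_temp_c) for t in daily_mean_temps_c)
    return hdd, cdd


# Hypothetical daily mean temperatures in ˚C.
temps = [5.0, 12.0, 18.0, 24.0]
hdd, cdd = degree_days(temps, reference_temp_c=18.0)
print(hdd, cdd)
```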
4. Unique Selling Proposition (USP)
As shown in the previous section, a variety of software solutions available on the market allow the user to calculate energy savings based either on the IPMVP or on the relevant ISO norms. Nevertheless, every one of them has its own limitations, which translate into a lack of flexibility and efficiency for the user.
Software offered by other market participants uses diverse M&V methods to calculate savings. In many cases, these methods are implementations of industry-standard approaches, such as those described in the IPMVP or those usually used for evaluating efficiency programs. Tools may differ in 1) whether they describe what they calculate as gross or net savings, 2) the regression approach used to calculate the energy model, 3) the method to determine savings, or 4) their ability to operate on whole buildings as well as on submetering data. In addition, some tools are programmed to report accuracy metrics such as baseline model goodness-of-fit or estimates of savings uncertainty.
Since the quantification of energy savings is often not the focus of existing energy management software, the respective modules are often complicated to use and neither user-friendly nor intuitive. In order to set up data handling and modelling to derive savings, the user needs a high level of expertise and has to read a manual and attend instruction classes or online seminars, which generates extra costs and barriers to entry. Usability benefits from seamless data integration via an application programming interface (API), which allows scaling the solution to many projects without manual data import and export from other sources. Tools should also allow the user to set the baseline period, the date of the ECM and the reporting period manually.
Data resolution can be an issue, since many solutions on the market do not support high frequency data. Two to fifteen minute intervals between observations are a common default resolution. Nevertheless, some cases, mostly production processes, mandate a higher data resolution in order to find better correlations and therefore more accurate energy models. At the same time, for many applications daily values for SEU, relevant variables and system output suffice to parameterize an ECM-model.
The cloud-based software solution of smartB applies improved data access and advanced analytics to automate and accelerate the M&V process. Allowing the use of higher resolution data and multiparameter models leads to a more precise model of an ECM’s impact on an energy system. smartB’s ECM-tool uses up-to-date database and programming technologies, without restrictions on the number of relevant variables or the data frequency, while at the same time updating the frontend and model output within a few seconds after changes by the user.
The ECM-tool allows the user to apply the best statistical model after interpretation of the significance and accuracy of predicted values shown in the model output. This requires some experience in regression modelling and knowledge about the informative value of t-statistics and p-values. The user can choose independent variables freely by adding or deleting them with a simple click, to see the model updating in real time. The approach to click and play with models and input data (i.e. stepwise regression) can be automated, to run the model including only relevant variables with high predictive power for energy consumption.
A visualization of data is fundamental to determining correlations and deriving energy insights. The ECM-tool was designed to condense all the information needed to determine and validate energy savings into one page. A compact presentation of the normalized savings together with the statistical properties of the applied model simplifies reproducibility and verification by a third party, which is a clear USP of our solution.
The goal of regression models in energy management is to forecast expected values for energy consumption based on the predictive power of relevant variables. The ECM-tool is designed to give a point estimate of achieved savings, defined as the cumulated difference between actual consumption and model prediction for the reporting period. Measurement errors and statistical modelling introduce uncertainty into point estimates for energy savings. Therefore, in order to quantify model accuracy, the ECM-tool shows the following set of indicators:
· Regression coefficients: The ECM-tool shows a table with variable names and coefficients with t-statistics and p-values.
· R2, the coefficient of determination, measures the proportion of variance in energy consumption (the dependent variable) that is predictable from the relevant (independent) variables. We can accept the model if R2 > 80%, although values between 80% and 90% still indicate suboptimal model fit; for better prediction accuracy the user should revisit the data input until R2 > 90%. There is no general critical value for R2, and in cases where simpler arithmetic could suffice (such as averages over time), regression models in the ECM-tool might show low R2 even though savings are correctly quantified. At the same time, R2 fails to inform the user about the range of likely outcomes for the point estimate of energy savings, which mandates the presentation of prediction intervals.
· Adjusted RMSE, the standard error of the estimate, is the standard deviation of the prediction errors from a regression, adjusted for the degrees of freedom of the model (sample size minus number of model coefficients). The (adjusted) RMSE is denoted in the same units as the dependent variable, which allows the calculation of prediction intervals. However, for the same reason, the RMSE does not indicate the relative precision of one model compared to another model.
· PI, prediction intervals for estimated savings: The ECM-tool shows prediction intervals with upper and lower bound, to give a straightforward indication of the precision and reliability of estimated savings. The tool applies the following formula under the assumption that N [reporting], the number of observations in the reporting period, is greater than 50 and that errors from the model are normally distributed. In that case, 95% (i.e. 1 − α) of the area under the normal distribution3 (i.e. prediction interval) lies within 1.96 standard deviations (i.e. RMSE) of the mean (i.e. estimated savings).
· CV (RMSE), the coefficient of variation of the root mean squared error, is calculated as the ratio of the root mean squared error to the mean of the dependent variable. Just like R2, CV (RMSE) is unitless; it is non-negative and for a useful model typically takes values well below one. Lower values of the CV indicate smaller residuals relative to the average level of the dependent variable and therefore better model fit.
With this set of USPs smartB’s ECM-tool solves the problem of calculating energy savings, which many energy consultants encounter. Office software such as Microsoft Excel, the favorite tool of many German engineers, provides similar functionality. However, the ECM-tool can process much more data significantly faster and visualizes data automatically. Furthermore, using the ECM-tool is more reliable and reproducible, since calculating point estimates and prediction intervals for accumulated energy savings manually can be cumbersome and is therefore prone to errors. Figure 2 shows a mock-up of the ECM-tool proposed above.
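The fit indicators listed above follow directly from the residuals of a model. The sketch below computes R2, the adjusted RMSE and CV (RMSE) from actual and predicted values; it is an illustration with hypothetical data, not the code of the ECM-tool. It assumes that CV (RMSE) is based on the adjusted RMSE, which the text does not state explicitly, and the function name `model_fit_metrics` is chosen for this example.

```python
from math import sqrt


def model_fit_metrics(actual, predicted, n_coefficients):
    """Return R2, adjusted RMSE and CV(RMSE) for a fitted regression model.

    n_coefficients counts all estimated coefficients including the intercept,
    so that the degrees of freedom are len(actual) - n_coefficients.
    """
    n = len(actual)
    y_bar = sum(actual) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in actual)
    adjusted_rmse = sqrt(ss_res / (n - n_coefficients))
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "adjusted_rmse": adjusted_rmse,        # same unit as the dependent variable
        "cv_rmse": adjusted_rmse / y_bar,      # unitless (assumption: adjusted RMSE)
    }


# Hypothetical daily consumption in kWh and the corresponding model predictions.
actual = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
predicted = [11.0, 11.0, 15.0, 15.0, 19.0, 19.0]
metrics = model_fit_metrics(actual, predicted, n_coefficients=2)
print(metrics)
```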
5. The Scope of the ECM-Tool
We separate three phases in energy management, of which only the second phase is the scope of the ECM-tool:
· Definition of energy system and energy performance indicators including data gathering as well as energy efficiency measure planning and implementation.
· Quantification of energy savings using energy baselines and a statistical model as well as the interpretation and reporting of energy efficiency improvements.
· Maintaining energy performance indicators and continual improvements of energy efficiency as well as preparation of management reviews and decision models.
Figure 2. ECM-tool Mockup.
5.1. Preparation by the User
System boundaries and model definition: Phase 1 requires preparation by the user before the ECM-tool can quantify the energy performance of a given system. The selection of time series to include in the data set is driven by theoretical considerations based on the type and boundary of the energy system which received an energy conservation measure. The approach of using regression models as EnPIs applies best to subsystems with clear boundaries between SEUs, such as lighting, an (electrical) heating register or a ventilation system. The regression model should include all relevant variables for all subsystems covered by the system boundaries. If the system boundaries include several SEUs, a relevant variable might not affect each SEU equally, which introduces noise and inaccuracy. Therefore, the fewer SEUs the system covers, the more accurately the regression model predicts energy savings. An energy system with suitable boundaries shows high correlation between energy consumption and relevant variables with high predictive power, as measured by high R2 and narrow prediction intervals4.
Date of ECM implementation: The core feature of the ECM-tool is to quantify savings as a direct consequence of an energy conservation measure implemented at a certain date. However, the ECM-tool can as well be applied to simply compare year over year energy consumption by arbitrarily setting the time frames for baseline and reporting periods accordingly.
EnPI definition: Whenever the ECM-tool is used to quantify savings, the EnPI is defined as the set of coefficients from the regression model over the baseline period. In the case of electricity, the SEU can be denoted in units of work (e.g. kWh) or units of power (e.g. kW). Since the ECM-tool shows quantified savings in the same unit as the dependent variable, kWh is preferred, which allows a direct interpretation of the output without manual follow-up calculations.
Suitable baseline period: The user should provide sufficient time frames for the baseline period and the reporting period, to derive reliable parameters. In general, a model predicts reliably, if relevant variables exhibit the same range of values during the baseline period as well as during the reporting period. In cases of a production process or a lighting system, data over some weeks of observations in the baseline period often fulfill this criterion. However, an accurate model of the correlations between an SEU and outside temperature in a thermal process (e.g. heating) should include warm and cold seasons to fully capture the system’s behavior over one full year. For instance, if the baseline period covers temperatures between 10˚C - 25˚C, a prediction of expected energy consumption for temperatures below 0˚C in the reporting period hinges on the assumption that the correlation between temperature and consumption extrapolates linearly. Therefore, whenever the energy system provides thermal energy the influence of local weather conditions (in terms of average temperature) must be tested and the baseline period adjusted accordingly.
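The range criterion described above can be checked before fitting a model. The sketch below lists the reporting-period values of a relevant variable that fall outside the range observed in the baseline period, i.e. the observations for which the model would have to extrapolate. The function name `outside_baseline_range` and the data are hypothetical and only serve as an illustration.

```python
def outside_baseline_range(baseline_values, reporting_values):
    """Return the reporting-period values not covered by the baseline range."""
    low, high = min(baseline_values), max(baseline_values)
    return [v for v in reporting_values if v < low or v > high]


# Hypothetical daily mean temperatures in ˚C.
baseline_temps = [10.0, 14.5, 19.0, 25.0]
reporting_temps = [-2.0, 8.0, 12.0, 24.0]

extrapolated = outside_baseline_range(baseline_temps, reporting_temps)
share = len(extrapolated) / len(reporting_temps)
print(f"{share:.0%} of reporting observations require extrapolation: {extrapolated}")
```

A high share suggests extending the baseline period, for example to cover both warm and cold seasons, before the savings estimate is interpreted.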
5.2. The Architecture of the ECM-Tool
In phase 1, the user prepares a .csv data-file, which meets the criteria defined above. In phase 2, the user runs a statistical analysis on this data to calculate savings and interacts with the ECM-tool on two main user interfaces: the landing page and the model page.
5.2.1. The Landing Page
The landing page shows the signup and an email verification feature using a web-token. The user opens an existing project or names and creates a new project by uploading one or more .csv-files containing time-series data. The tool shows an error, if it cannot read the .csv-file or if it is empty. With a click the user opens the project, which triggers the tool to check data for consistency and discard incomplete or non-numeric columns in order to create clean data to be used on the following screen for regression modelling.
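The consistency check described above could look roughly as follows with Pandas. This is a sketch under assumptions, not the actual code of the ECM-tool: it assumes that the first column of the .csv-file holds the timestamps, and the function name `load_clean_timeseries` as well as the column names are hypothetical.

```python
import io

import pandas as pd


def load_clean_timeseries(csv_text):
    """Parse .csv content and keep only complete, numeric columns."""
    try:
        # Assumption: the first column contains the timestamps.
        frame = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
    except pd.errors.EmptyDataError as exc:
        raise ValueError("the .csv-file is empty") from exc
    # Non-numeric entries become NaN, so text columns turn into all-NaN columns.
    numeric = frame.apply(pd.to_numeric, errors="coerce")
    # Discard every column with at least one missing or non-numeric value.
    return numeric.dropna(axis="columns", how="any")


sample = """timestamp,energy_kwh,temperature_c,status,flow
2017-01-01,700.5,3.2,ok,1.0
2017-01-02,690.1,4.0,ok,
2017-01-03,710.0,2.5,fault,2.0
"""
clean = load_clean_timeseries(sample)
print(list(clean.columns))
```

In this example the text column `status` and the incomplete column `flow` are discarded, so that only `energy_kwh` and `temperature_c` remain available for regression modelling.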
5.2.2. The Model Page
The model page shows four elements: data selection, timeframes, data visualization and model output.
Data selection: The user is presented with two lists, each containing all time-series available after data consistency checks and data cleansing. The user clicks to select the dependent variable (i.e. energy consumption) and the independent variables (i.e. temperature, system output, etc.) and the method for data aggregation and upsampling.
Timeframes: The user enters a total of six dates for the beginning and end of the baseline period, ECM-period and reporting period.
Data visualization: The ECM-tool visualizes all selected time-series with a color-legend from the beginning of the baseline to the end of the reporting period. The time frame of ECM-implementation is shaded grey.
Model output: The model output shows all relevant parameters from the regression necessary to quantify energy savings and judge the reliability of the model. On the one hand, the regression function shows the intercept and the marginal effects of the independent variables. On the other hand, the coefficient of variation of the root mean squared error CV (RMSE) and the R2 indicate the predictive power of the model. Prediction intervals at the 95% level around the point estimate of cumulated energy savings show the user directly how reliably the model quantifies savings.
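The six dates entered on the model page partition the observations into three periods. A minimal sketch of this assignment is given below; it assumes inclusive start and end dates, which the text does not specify, and the function name `assign_period` and all dates are hypothetical.

```python
from datetime import date


def assign_period(day, baseline, ecm, reporting):
    """Return the period label for a date, or None if it lies outside all periods.

    Each period is an inclusive (start, end) pair of dates.
    """
    periods = (("baseline", baseline), ("ecm", ecm), ("reporting", reporting))
    for label, (start, end) in periods:
        if start <= day <= end:
            return label
    return None


# Hypothetical timeframes: six dates in total.
baseline = (date(2017, 1, 1), date(2017, 3, 31))
ecm = (date(2017, 4, 1), date(2017, 4, 14))
reporting = (date(2017, 4, 15), date(2017, 6, 30))

print(assign_period(date(2017, 4, 5), baseline, ecm, reporting))
```

Only observations labelled as baseline enter the regression, while the savings are accumulated over observations labelled as reporting.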
5.3. Technical Implementation of the ECM-Tool
Data handling and parsing are implemented based on the programming language “Python 3.7”5 with the popular “Pandas 0.24”6 library for data manipulation and analysis. Pandas offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license. The intelligence of the ECM-tool relies on “scikit-learn 0.20”7, a free software machine learning library for the Python programming language.
Automated upsampling: If the user selects time-series with different sampling frequencies to include in the model, the tool aggregates data in order to create a balanced panel data set (i.e. upsampling to the longest interval between observations). The user selects either “sum” or “average” as the method of aggregation. This is relevant if, for example, energy consumption is measured in kWh per minute, but the temperature is available as an hourly average in ˚C. In this case the tool sums up 60 values of energy consumption in kWh per minute to generate kWh per hour (i.e. method “sum”). If energy consumption is denoted in kW, then the model should calculate hourly averages (i.e. method “average”). The regression model would then use hourly values for all variables. The tool does not support downsampling, since it yields unintended statistical properties in regression modelling.
Ridge regression: The ECM-tool applies an estimator called ridge regression, as implemented in the Python library “scikit-learn”8. Compared to ordinary least squares, regularization with ridge regression has some desirable properties. Ridge regression slightly shrinks the coefficients, which on the one hand introduces bias in the estimation, but on the other hand reduces model complexity and mitigates the effects of multicollinearity. At the same time, using cross-validation (i.e. out-of-sample testing), ridge regression calculates the specific weight of the regularization term that yields the optimal balance between the increase in bias of the coefficients and the decrease in overall variance of the model. A decrease in variance of the model, as measured by the RMSE, leads to narrower prediction intervals and therefore a more precise quantification of cumulated energy savings.
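The aggregation with the methods “sum” and “average” can be sketched with the Pandas `resample` method. Note that the Pandas documentation refers to aggregation to a longer interval as downsampling; the sketch follows the behaviour described above. The function name `aggregate_to_hourly` and the data are hypothetical, and the code is an illustration rather than the implementation in the ECM-tool.

```python
import pandas as pd


def aggregate_to_hourly(series, method):
    """Aggregate a time series to hourly values with method 'sum' or 'average'."""
    resampled = series.resample("60min")
    if method == "sum":          # e.g. energy in kWh per minute -> kWh per hour
        return resampled.sum()
    if method == "average":      # e.g. power in kW -> hourly average kW
        return resampled.mean()
    raise ValueError("method must be 'sum' or 'average'")


# Two hours of hypothetical minute-level data.
index = pd.date_range("2017-01-01 00:00", periods=120, freq="min")
energy_kwh = pd.Series([0.5] * 120, index=index)
power_kw = pd.Series([2.0] * 60 + [4.0] * 60, index=index)

hourly_energy = aggregate_to_hourly(energy_kwh, "sum")
hourly_power = aggregate_to_hourly(power_kw, "average")
print(hourly_energy.tolist(), hourly_power.tolist())
```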
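The following sketch illustrates ridge regression with a cross-validated regularization weight in scikit-learn and compares it with ordinary least squares. It is an assumption that the class `RidgeCV` with five folds and the listed candidate weights resembles the configuration of the ECM-tool; the data are synthetic and all variable names are chosen for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV

# Synthetic baseline data: consumption depends on temperature and system output.
rng = np.random.default_rng(0)
n = 200
temperature = rng.normal(10.0, 5.0, n)
output = rng.normal(100.0, 20.0, n)
X = np.column_stack([temperature, output])
y = 50.0 - 3.0 * temperature + 0.8 * output + rng.normal(0.0, 5.0, n)

# Candidate weights of the regularization term (hypothetical grid).
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]

# Cross-validation selects the weight with the best out-of-sample performance.
ridge = RidgeCV(alphas=alphas, cv=5).fit(X, y)
ols = LinearRegression().fit(X, y)

print("selected alpha:", ridge.alpha_)
print("ridge coefficients:", ridge.coef_)
print("OLS coefficients:  ", ols.coef_)
```

The ridge coefficient vector is never longer than the OLS coefficient vector, which reflects the shrinkage described above; with well-conditioned data such as in this example the difference is small.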
5.4. Integration of Results in the Context of the Organization
In phase 3 the user integrates insights from the ECM-tool in a continual process to improve energy efficiency organization-wide, as outlined in ISO 50001:2018. The ECM-tool supports the monitoring and verification of changes in energy performance of subsystems to judge the effectiveness of an ECM. The output easily translates into associated economic gain, by comparing investment costs and the monetary value of quantified savings under equivalent circumstances.
6. The Energy Performance Report
We present three anonymised cases to quantify achieved energy savings by using the ECM-tool. Each case requires specific preparation of the data set and has unique properties. All consumption data are used unaltered in this paper, were generated by real electricity meters in the field and are presented in their current form with the authorisation of the data owner. Each of the three cases presents particular features of the ECM-tool as well as a discussion of the quality and reliability of savings estimates.
6.1. Case A: Large Car Park
System boundaries: This case looks at a large car park in central Germany with daily consumption of electricity in kWh over the time frame 5th November 2016 until 31st July 2017. The system boundary includes two SEUs: the ventilation system and the lighting system, both of which run continuously (24/7) over the full observation period. Figure 3 below shows the system boundaries for Case A.
Figure 3. System boundaries for the large car park case.
ECM: The system received a total of three energy conservation measures. After an initial 32 days of baseline measurements the light sources were replaced by LED on 7th December 2016; the second and third ECM were improvements of parameters in the system control panel of the ventilation system on 8th February 2017 and 29th March 2017 respectively. The observation period is 269 days, of which 237 are the reporting period.
Relevant variables: Since the system does not include thermal processes and neither ventilation nor lighting had demand-sensitive control settings, the statistical model does not include any time series as independent variables. In this case, the regression estimator fits only constants to the dependent variable, which yields simple averages of the dependent variable, as the report and follow-up discussion will show.
The model: $Y = a + b_1 D_1 + b_2 D_2 + b_3 D_3$.
In order to separate the savings effects of the three ECMs, we use dummy variables to indicate the dates of implementation. Variable D1 equals zero for every day before the first ECM (i.e. exchange to LED on 7th December 2016) and one for every day after the first ECM. D2 and D3 are constructed similarly around the dates of the second and third ECM (i.e. 8th February 2017 and 29th March 2017). Therefore, the coefficients b1-b3 show the additional average savings induced by each of the measures.
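The interpretation of the dummy coefficients can be illustrated without any regression library. For unregularized least squares with nested step dummies, the intercept equals the average consumption before the first ECM and each coefficient equals the change in average consumption between consecutive segments; ridge regression shrinks these values slightly. The sketch below uses the ECM dates of Case A but hypothetical consumption values, it assumes that the dummy switches to one on the day of implementation, and the function names are chosen for this example.

```python
from datetime import date


def step_dummies(day, ecm_dates):
    """Return [D1, D2, ...]: 1 from the date of each ECM onwards, else 0."""
    return [1 if day >= ecm else 0 for ecm in ecm_dates]


def dummy_coefficients(days, consumption, ecm_dates):
    """Return the intercept a and the coefficients b1, b2, ... of the dummy model.

    Assumes that every segment between two ECMs contains at least one day.
    """
    segments = {}
    for day, value in zip(days, consumption):
        segments.setdefault(sum(step_dummies(day, ecm_dates)), []).append(value)
    means = [sum(segments[k]) / len(segments[k]) for k in sorted(segments)]
    return means[0], [means[i + 1] - means[i] for i in range(len(means) - 1)]


ecm_dates = [date(2016, 12, 7), date(2017, 2, 8), date(2017, 3, 29)]
days = [
    date(2016, 12, 5), date(2016, 12, 6),   # before any ECM
    date(2016, 12, 7), date(2016, 12, 8),   # after the first ECM
    date(2017, 2, 8), date(2017, 2, 9),     # after the second ECM
    date(2017, 3, 29), date(2017, 3, 30),   # after the third ECM
]
consumption = [700.0, 696.0, 380.0, 370.0, 290.0, 280.0, 240.0, 230.0]  # hypothetical kWh

a, b = dummy_coefficients(days, consumption, ecm_dates)
print(a, b)
```

Negative coefficients in this sketch correspond to a reduction in average daily consumption, i.e. to savings.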
6.1.2. ECM-Tool Application—The Performance Report
Elements from the ECM-tool can be seen in Table 1.
Table 1. Performance report for ECM-Tool.
6.1.3. Interpretation of the Model
As expected, the model based on dummies shows high variance in the prediction but sizable relative savings. We accept the model to reliably quantify energy savings.
Baseline period: Modelling the 24/7 car park is a very special case, since in order to quantify savings, the energy system is sufficiently specified without relevant variables or influencing factors. In regression modelling, the constant is interpreted as the expected value of the dependent variable if all independent variables are zero. Therefore, any least squares estimator yields the average of the dependent variable, if no further information is included. This is what we see in the intercept of the model: we expect an average of 698.396 kWh electricity consumption per day, before any ECM during the 32 days of baseline period between 5th November 2016 and 7th December 2016.
Estimated savings in the reporting period: The dummy variables extend the model to separate the marginal effects of each of the three ECMs. Since dummy variables are constants as well, the coefficients are interpreted as average savings after the ECMs. The first ECM, the exchange to LED, had a large savings effect of 322.982 kWh per day, which implies savings of around 46% compared to average consumption in the baseline period. The t-value of 24.755 indicates very high statistical significance. This estimate of marginal energy savings from the LED exchange persists for 237 days until the end of the observation period on 31st July 2017. Similarly, the coefficients for the two other ECMs show the marginal savings induced by the respective ECM. Therefore, once all three ECMs have been implemented (i.e. during the 125 days up to 31st July 2017), the energy system consumes 463.03 kWh (or 66%) per day less than in the baseline period. Over the full reporting period, the ECM-tool calculates sizable energy savings of 98,236 kWh.
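The day counts and relative savings quoted for Case A can be retraced with a few lines of arithmetic. The sketch below only uses dates and coefficients stated in the text; the variable names are chosen for this example.

```python
from datetime import date

start = date(2016, 11, 5)        # first day of the observation period
first_ecm = date(2016, 12, 7)    # exchange to LED
third_ecm = date(2017, 3, 29)    # last ECM on the ventilation system
end = date(2017, 7, 31)          # last day of the observation period

baseline_days = (first_ecm - start).days        # 5th November to 6th December
reporting_days = (end - first_ecm).days + 1     # 7th December to 31st July, inclusive
all_ecm_days = (end - third_ecm).days + 1       # 29th March to 31st July, inclusive

baseline_mean = 698.396   # kWh per day, intercept of the model
led_saving = 322.982      # kWh per day after the first ECM
total_saving = 463.03     # kWh per day after all three ECMs

led_share = led_saving / baseline_mean
total_share = total_saving / baseline_mean

print(baseline_days, reporting_days, all_ecm_days)
print(f"{led_share:.1%}, {total_share:.1%}")
```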
Quality of the model and uncertainty: The rather low R2 = 0.67 implies that only two thirds of the variation in energy consumption can be explained by variation in the independent variables (i.e. three dummies). Furthermore, CV (RMSE) = 0.31 shows poor model fit in terms of the relative sizes of the squared residuals and outcome values. R2 and CV (RMSE) already indicate high uncertainty of the prediction from this regression model based exclusively on dummy variables. Applying the formula shown above, the 95% prediction interval for savings over 237 days spans +/− 48,195.81 kWh (i.e. almost half the estimated overall savings). Even with high statistical uncertainty, we can accept the model to quantify savings.
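For readers who want to reproduce the two quality indicators, the sketch below computes R2 and CV (RMSE) from actual and predicted values. It uses common textbook definitions; the degrees-of-freedom adjustment (number of observations minus number of estimated coefficients) is an assumption and may differ from the ECM-tool's implementation. All data values are hypothetical.

```python
from math import sqrt
from statistics import mean

def r_squared(actual, predicted):
    """Share of the variation in `actual` that is explained by `predicted`."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def cv_rmse(actual, predicted, n_params):
    """RMSE, adjusted for n_params estimated coefficients, divided by the mean."""
    n = len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return sqrt(ss_res / (n - n_params)) / mean(actual)

# Hypothetical values in kWh per day; predictions are the three period means.
actual = [700.0, 650.0, 400.0, 380.0, 250.0, 230.0]
predicted = [675.0, 675.0, 390.0, 390.0, 240.0, 240.0]

print(round(r_squared(actual, predicted), 3))
print(round(cv_rmse(actual, predicted, n_params=3), 3))

# Half-width of the reported 95% interval relative to the estimated savings.
print(round(48195.81 / 98236, 2))
```

The last line confirms that the reported interval of +/− 48,195.81 kWh corresponds to roughly half of the estimated savings of 98,236 kWh.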
Comparison of ridge regression in the ECM-tool vs. OLS estimation: The following table shows the output of a standard ordinary least squares (OLS) estimator implemented in Excel. The coefficients are very similar and confirm the expected deviations from the ridge regression estimation: OLS shows larger coefficients in absolute values, but also higher variance of the estimation, as indicated by lower t-values of the coefficients. However, there is hardly any difference between ridge regression and OLS in terms of estimated savings. OLS calculates overall savings of 101,611 kWh versus 98,236 kWh in the ECM-tool, a deviation of 3.3% with the ECM-tool on the conservative end of the estimate. At the same time, since CV (RMSE) is rather high, the prediction interval is rather wide. The prediction intervals from OLS and the ECM-tool cannot be compared, since OLS in Excel does not compute adjusted RMSE. The relative savings effect also hardly differs: OLS yields savings of 474.318 kWh on average per day after all three ECMs have been implemented, in relation to 709.323 kWh per day in the baseline period. Therefore, OLS shows 66.8% savings compared to 66% with ridge regression in the ECM-tool. R2 and CV (RMSE) take the same values in both models.
Table 2 shows the most important statistics for the evaluation of the models.
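The shrinkage pattern described in the comparison of ridge regression and OLS can be illustrated with a single dummy variable. The sketch below is not the ECM-tool's code: it uses the closed-form solution of the standard ridge objective with an unpenalised intercept, hypothetical data, and an arbitrarily assumed penalty `alpha`.

```python
from statistics import mean

def fit_one_dummy(x, y, alpha=0.0):
    """Least squares with one predictor and an unpenalised intercept.

    alpha = 0 gives OLS; alpha > 0 adds an L2 penalty on the slope only.
    """
    x_avg, y_avg = mean(x), mean(y)
    s_xy = sum((xi - x_avg) * (yi - y_avg) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_avg) ** 2 for xi in x)
    slope = s_xy / (s_xx + alpha)
    intercept = y_avg - slope * x_avg
    return intercept, slope

# Hypothetical data: dummy = 0 before the ECM, 1 afterwards; consumption in kWh per day.
dummy = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
kwh = [705.0, 690.0, 712.0, 698.0, 380.0, 372.0, 391.0, 365.0, 377.0, 385.0]

ols_intercept, ols_slope = fit_one_dummy(dummy, kwh)
# alpha = 0.1 is an assumed value chosen for illustration only.
ridge_intercept, ridge_slope = fit_one_dummy(dummy, kwh, alpha=0.1)

print(round(ols_slope, 2), round(ridge_slope, 2))
```

With these numbers the ridge coefficient is smaller in absolute value than the OLS coefficient, so the ridge estimate of the savings is the more conservative of the two, in line with the pattern reported for Case A.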
6.2. Case B: Ventilation System in an Office Building
System boundaries: This case looks at the ventilation system of an office building in the north of Germany, with daily electricity consumption in kWh over the time frame from 11th September 2017 until 24th July 2019. The system boundaries, as seen in Figure 4, include the ventilation system, which uses electrical power for the operation of the fans, butterfly valves, valve drives and water pumps. All power consumers are high efficiency devices. Outside temperatures do not influence the power consumption of the system. The building is connected to a local heating and cooling network with generation at a central location in another building on site. The electricity uptake of the water pumps is negligible compared to other applications on the same meter and therefore cannot be detected separately. The ventilation system runs various predefined operating modes depending on the day of the week.
Table 2. Statistics for the models trained for Case A.
Figure 4. System boundaries for the ventilation system case.
ECM: The system received a total of two energy conservation measures. The first one, a reduction in the operating time of the system, was implemented on 2nd November 2018, leaving the following schedule:
Monday 03:35 am to 05:50 pm.
Tuesday-Friday 05:20 am to 05:50 pm.
Public holidays off.
The second ECM was a filter replacement, which had no significant effect on the energy consumption of the system. On the one hand, this is due to the regular replacement intervals of the filters; on the other hand, the efficiency improvement might be too small to be detected with the available data and the model presented here. One whole year of baseline data was used to calculate the energy model.
Relevant variables: Since this system is not influenced by weather or other relevant variables but depends strongly on the day of the week and holidays, it was necessary to include seasonal dummy variables to create an energy model. In order to describe the system, the following variables were created: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Public Holidays and special cases. To classify a data set with dummy variables (e.g. days of the week), a regression model needs n − 1 dummies for n categories. In effect, the constant term in the regression estimator serves as the reference case. In the model presented here, the reference day is Sunday.
The model: Y = a + b1D1 + b2D2 + b3D3 + b4D4 + b5D5 + b6D6 + b7D7.
The model includes dummy variables for weekdays (Monday through Friday), special cases and public holidays. By testing several possible models in stepwise regression manually, the output from the ECM-tool indicated that Saturday had no different average consumption than the reference day Sunday, based on marginal effects and their statistical significance (p-value = 0.44). Therefore only 7 out of 8 potential dummy variables were taken into consideration.
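The dummy coding described above can be sketched as follows. This is an illustration rather than the ECM-tool's implementation: the column names, the holiday calendar and the rule that special cases and public holidays take precedence over the weekday dummy are all assumptions.

```python
from datetime import date

# Column order of the seven dummies; Sunday (and Saturday) form the reference case.
DUMMY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "public_holiday", "special_case"]

def dummy_row(day, holidays=frozenset(), special_days=frozenset()):
    """Return the seven 0/1 dummies for one calendar day."""
    row = [0] * len(DUMMY_NAMES)
    if day in special_days:
        row[6] = 1
    elif day in holidays:
        row[5] = 1
    elif day.weekday() < 5:  # date.weekday(): Monday = 0 ... Sunday = 6
        row[day.weekday()] = 1
    return row

# Hypothetical holiday calendar for illustration.
holidays = {date(2018, 5, 1)}

print(dummy_row(date(2018, 4, 30)))           # a Monday
print(dummy_row(date(2018, 5, 1), holidays))  # a public holiday
print(dummy_row(date(2018, 5, 6)))            # a Sunday: all zeros, the reference case
```

A row of zeros means that the prediction for that day equals the constant term, which is how the reference case enters the model.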
6.2.2. ECM-Tool Application—The Performance Report
Elements from the ECM-tool can be seen below in Table 3.
6.2.3. Interpretation of the Model
The presented model shows very good fit and little variance in the prediction. Quantified savings are reliable and precise.
Baseline period: Even without weather influences on the system, the baseline period of one year of data between 1st November 2017 and 1st November 2018 yields a robust model with little variance that covers all holidays and special cases (i.e. interruption of the system due to maintenance).
Table 3. Performance report for Case B.
Estimated savings in the reporting period: The reporting period has a total of 263 days and starts one day after the reduction in operating time. It covers the interval from 3rd November 2018 to 24th July 2019; during this time, energy consumption decreased by around 20%. The total achieved savings of 5739.801 kWh +/− 369.784 kWh at a 95% level of confidence are a very good result. This indicates that the model closely resembles reality and is therefore able to predict precisely, with a narrow prediction interval of +/− 6.4% around the point estimate for savings.
Quality of the model and uncertainty: The high R2 = 0.97 implies that most of the variation in energy consumption can be explained by the variation in the independent variables (i.e. seven dummies). Furthermore, CV (RMSE) = 0.12 indicates a good model fit in terms of the relative sizes of the squared residuals and outcome values. In addition, the very low p-values of the coefficients (reported as <0.0000) imply that all variables in the model are statistically relevant. This model clearly yields a reliable and precise quantification of the ECM’s effect on energy consumption.
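As a rough illustration of how such a savings figure arises, the sketch below sums the daily differences between the consumption predicted by a baseline model and the measured consumption, and then relates the reported interval to the reported savings. The three-day excerpt is hypothetical; only the two Case B totals are taken from the text.

```python
def total_savings(predicted_baseline, measured):
    """Savings = consumption predicted by the baseline model minus measured consumption."""
    return sum(p - m for p, m in zip(predicted_baseline, measured))

# Hypothetical three-day excerpt in kWh per day.
predicted_baseline = [110.0, 108.0, 20.0]
measured = [88.0, 86.5, 20.0]
print(total_savings(predicted_baseline, measured))

# Relative half-width of the interval reported for Case B.
savings_kwh = 5739.801
half_width_kwh = 369.784
print(f"{half_width_kwh / savings_kwh:.1%}")
```

The second ratio reproduces the +/− 6.4% quoted for the prediction interval.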
6.3. Case C: Cooling System
System boundaries: Case C is based on a cooling system with suboptimal data availability. The SEU is measured in kWh electrical work, with weather data included as a relevant variable. The output of the cooling system in thermal energy is not available. Instead, we use the electrical power consumption of the connected secondary system, which takes the cooling energy as an input, as a proxy. Furthermore, there is little information about the context of the cooling system and the secondary system apart from the location in an industrial production setting. The reasons to include this example despite these serious drawbacks are twofold: On the one hand, this case shows that a statistical model with favorable properties can fail to convince the user of the reliability of potential savings if the physical properties of the system are not correctly defined. The interpretation of model output is only as good as the underlying assumptions for data generation. On the other hand, the ECM-tool provides efficient usability for multivariate time series regression analysis with stepwise regression, which this case demonstrates vividly. System boundaries are shown in Figure 5.
ECM: During the observation period there was no ECM implemented.
Relevant variables: The cooling system provides thermal energy, which mandates the inclusion of outside temperature to build a sufficient model. In total, there are five potentially relevant variables available in the data set: outside temperature in ˚C, cooling degree days and their squared and cubed terms (CDD, CDD2, CDD3), as well as the energy consumption of the secondary system. The cutoff temperature for CDD (17˚C) was selected to provide the best correlation between CDD and energy consumption, using a third-party statistics tool.
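The CDD-based candidate variables can be derived from the daily mean temperature as sketched below. The definition used here (daily mean temperature minus the cutoff, floored at zero) is a common convention and an assumption, since the exact CDD definition is not restated in this section; the function names and temperature values are hypothetical.

```python
def cooling_degree_days(avg_temp_c, cutoff_c=17.0):
    """Degrees by which the daily mean temperature exceeds the cutoff (0 if below)."""
    return max(0.0, avg_temp_c - cutoff_c)

def cdd_features(avg_temp_c, cutoff_c=17.0):
    """Return (CDD, CDD squared, CDD cubed) for one day."""
    cdd = cooling_degree_days(avg_temp_c, cutoff_c)
    return cdd, cdd ** 2, cdd ** 3

# Hypothetical daily mean outside temperatures in degrees Celsius.
for temp in (7.7, 17.0, 21.0):
    print(temp, cdd_features(temp))
```

Days at or below the cutoff contribute zero to all three terms, so the CDD variables carry no information on those days, whereas outside temperature varies on every day.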
The model: Y = a + b1 × (kWh of secondary system) + b2 × ˚C.
Figure 5. System boundaries for the cooling system case.
By analyzing the output from several models, CDD were discarded from the model for lack of additional explanatory power for energy consumption. The most intuitive gauge to guide this decision is R2. The sparsely parameterised model including only temperature and the secondary system shows R2 = 0.90. An alternative model using CDD (also squared and cubed) instead of outside temperature gave a maximum of R2 = 0.67. The improvement in explanatory power from including both temperature and CDD is negligible, with R2 increasing only in the fourth digit. At the same time, coefficients from regression models that include higher-order terms (e.g. cubed CDD) become considerably harder to interpret. Therefore, we select the model with only two variables: outside temperature and energy consumption of the secondary system as a proxy for cooling demand.
6.3.2. ECM-Tool Application—The Performance Report
Elements from the ECM-tool can be seen below in Table 4.
6.3.3. Interpretation of the Model
Baseline period and energy savings: The observation period covers one whole year, which ensures full coverage of seasonal effects and potential variance of outside temperature. For the lack of an ECM there is no reporting period and no savings in this case.
Table 4. Performance Report for Case C.
Marginal Effects: The coefficient on X1 implies that energy consumption of the cooling system increases by 0.086 kWh for every increase of 1 kWh in the secondary system. Since X2 is denoted in ˚C × 1000 (e.g. average daily outside temperature of 7.7˚C = 7700), the cooling system takes up an additional 85,346 kWh per day for every one degree increase in average daily outside temperature.
Quality of the model: From a statistical point of view, the data and model fulfill all requirements to serve as a reliable baseline to quantify savings. An R2 of 0.90 implies that a large share of the variation in energy consumption can be explained by the variation in the two independent variables. Furthermore, CV (RMSE) = 0.14 indicates a good model fit in terms of the relative sizes of the squared residuals and outcome values. However, the data shows very sizable consumption (with a daily average of > 2 million kWh in the cooling system) and rather little information about context and technical details. Without further information (e.g. a detailed schematic diagram), we do not recommend relying on savings calculated from this data. Furthermore, the user would need to check with the system operator whether static factors remain stable over time. Changes in system setup, such as the area to be cooled or the characteristics of the secondary system, would have to be identified. Above all, however, including a time series of the output of the cooling system in kWh thermal energy from a metering system would replace the (necessarily imperfect) proxy variable, kWh electricity uptake of the secondary system, and bring the statistical model closer to the physical properties of the underlying SEU.
The ECM-tool was developed with funding support from the Federal Ministry for Economic Affairs and Energy as part of the Pilotprogramm Einsparzähler. The tool addresses a problem that practitioners in the field of energy efficiency face: calculating savings from an energy conservation measure in a convenient, transparent and reproducible way. This paper describes the preparation and process to quantify energy savings based on three cases with real data from electricity meters. ISO 50006 provides the guidelines for the ECM-tool to focus on the usability of a multivariate regression model to compare consumption of an energy system before and after an ECM under equivalent circumstances.
The currently available minimum viable product of the ECM-tool covers modelling and quantification of savings. The assumption here is that the user presents a suitable time series data set for energy consumption and all relevant variables over a sufficient period of time. Even with little experience in regression modelling, the user can judge the statistical characteristics of the model from the output of the tool. However, the tool does not cover any meta-information on the energy system under observation or any plausibility checks.
With this paper, we close the chapter of ECM-tool development for the time being for lack of further funding. However, as this paper argues, there are few solutions on the market that address the specific problem of monitoring and verifying energy savings in a complex world. Potentially, the ECM-tool could enter the market as a stand-alone software or as a feature in an energy management software package with wider use.
This paper and the research behind it would not have been possible without the exceptional support of the colleagues of smartB Energy Management GmbH, who made the development of the first ECM-tool version possible with their knowledge and expertise. Moreover, we would like to thank Dipl.-Ing. Carsten Ernst and Stefan Bauer for their important contributions during the project.
4 For instance, consider system boundaries including lighting and a heating register, both on the same electricity meter. Since we expect a relevant influence of outside temperature on the electricity uptake of the heating register, the model should include average temperature as an explanatory variable. However, we expect no predictive power of outside temperature on the electricity consumption of the lighting system. Therefore, separate models for each SEU (e.g. heating, lighting, etc.) deliver more accurate quantifications of energy savings, which is only possible if each subsystem is measured with a separate electricity meter.
5 Python Release Python 3.7.0. Retrieved from https://www.python.org/downloads/release/python-370/.
6 pandas.DataFrame.isna. Retrieved from https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isna.html.
7 Scikit-Learn (n.d.). scikit-learn/scikit-learn. Retrieved from https://github.com/scikit-learn/scikit-learn/releases/tag/0.20.2.
8 Chapter 1.1.2 at https://scikit-learn.org/stable/modules/linear_model.html.
11 https://angular.io/
E.16. Release 9.6. Retrieved from https://www.postgresql.org/docs/9.6/release-9-6.html.
12 Amazon EC2 Instance Types, Amazon Web Services. Retrieved from https://aws.amazon.com/ec2/instance-types/.
13 Retrieved from http://releases.ubuntu.com/18.04/.
 Franconi, E., Gee, M., Goldberg, M., Granderson, J., Guiterman, T., Li, M. and Smith, B.A. (2017) The Status and Promise of Advanced M&V: An Overview of “M&V 2.0” Methods, Tools, and Applications. Berkeley Lab., Berkeley.
 Gallagher, C.V., Leahy, K., O’Donovan, P., Bruton, K. and O’Sullivan, D.T. (2018) Development and Application of a Machine Learning Supported Methodology for Measurement and Verification (M&V) 2.0. Energy and Buildings, 167, 8-22.