#119 “Real Time” Applications and Analytics

John Petze Sat 24 Mar 2012

We often encounter questions about the use of SkySpark with “real time” data. Through these conversations we have found that there is a significant variation in what people mean when they talk about real time data. We think that a few definitions can help clarify the topic and enable a clear understanding of how SkySpark can be used with “real time” data (it can by the way!).

Some Background

I had the opportunity early in my career to meet the man widely recognized as the inventor of microprocessor-based control. In the late 1960s, Dick Morley created the Modular Digital Controller, which became a very successful company – Modicon (now part of Schneider Electric). Dick was also involved in the founding of Andover Controls (where I started my career in automation), which applied digital controller technology to the emerging field of energy management and direct digital control in buildings. I clearly remember a meeting where he asked a group of us young, wide-eyed engineers if we knew the definition of “real time”. No one spoke up to venture a definition to the founder of the industry. We didn’t have to. He quickly supplied the answer: real time means fast enough for the application.

So if we are controlling a piece of industrial manufacturing equipment, that might mean we need a control loop response time of 10 milliseconds to meet the real time needs of the process. Many processes have even tighter real time requirements – perhaps on the order of microseconds.

For a VAV box controlling temperature of a room, the control response time of the temperature control loop might be 30 seconds, a minute or longer. On that same VAV box, though, the control loop responsible for managing the airflow volume to maintain an airflow setpoint might have a response time requirement of 1-2 seconds.

At the other end of the spectrum, I have been in meetings with people involved with utility applications and when they say real time they often mean 15-minute data updates and control response.

So let’s recap this first key point – real time response means fast enough for the intended application or process.

Resolution of Data – The Sampling Rate. Another aspect of “real time” we need to consider is the resolution of the sensor data. By this we mean how frequently a system senses (and records) data values. For example, many BAS controllers can store data values every second. In this case the data-sampling rate would be once per second.

The sampling rate of sensor data has a major impact on the volume of data created. For example, a history record of zone temperatures once per minute will generate 1,440 values per day. That same sensor recorded at a sampling rate of 1 second will generate 86,400 sensor values. Sixty times as much data!
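
The arithmetic above can be sketched in a few lines of Python (an illustration only; the function is ours, not part of SkySpark):

```python
# Samples generated per day by one sensor at a given sampling interval.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def samples_per_day(interval_seconds: int) -> int:
    """Number of recorded values per day for one sensor."""
    return SECONDS_PER_DAY // interval_seconds

per_minute = samples_per_day(60)  # 1,440 values per day
per_second = samples_per_day(1)   # 86,400 values per day
print(per_second // per_minute)   # 60 times as much data
```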

In most systems it’s common that the control response capability and sampling rate are equal. For example, a controller capable of providing 1 second control response will often allow sensor values to be recorded once per second. A major limiting factor, however, is that most controllers will run out of storage space if set to record sensor values every second. So in reality they are often configured to record values once per minute or less frequently.

Freshness of Data. The next concept to consider when discussing real time applications is the “freshness” of data or frequency of the updates. A common example of this would be the update frequency of sensor data on an operator screen like an equipment graphic. The capability of a system to deliver updates to the user is affected by a variety of factors including:

  • Bandwidth and speed of the communication network
  • Computing resources of the controller that can be dedicated to communicating data to the UI application
  • Efficiency of the software that presents that data to the user
  • Computing resources of the computer that is displaying the data

Different systems vary considerably in this area and it’s not uncommon for screen updates of 10 seconds to be considered “real time” updates. Going back to our definition of real time, the “application” here can be described as the ability (or need) of the human operator to read and respond to the changing values they see on the screen. Based on this definition a 10 second update frequency may be considered adequate.

The data source -- the controller -- may be operating its local control loops with a response time of 1 second or 100 milliseconds, but the fastest that new data will appear on the screen might be once every 10 seconds or longer. In some building automation systems it’s not uncommon to see screen update times as long as a minute! That might be perfectly acceptable for the “application” of the human user (although most people find that a bit frustrating).

Application of These Principles with Analytics

When planning the application of an analytics solution all of these factors need to be considered. For example, knowing the sampling rate of the data will enable us to understand the richness of the data that analytic rules will have to work with – think about the different level of insight provided by 15 minute interval meter data versus monthly energy data. The sampling rate will also tell us about the volume of data we will need to store and manage. (Remember, though, that the sampling rate doesn’t affect the rec count in SkySpark.)

In many applications there will be different data sources, each with its own sampling rate. SkySpark is specifically designed to do analytics across data with different sampling rates.
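
To make the idea concrete (this is our own rough sketch in plain Python, not SkySpark’s rollup machinery), one simple way to analyze across mixed sampling rates is to roll each series up to a common interval, carrying the last value forward:

```python
from datetime import datetime, timedelta

def rollup_last(series, start, end, interval):
    """Bucket (timestamp, value) pairs onto a common grid, keeping the
    last value seen in each bucket and carrying it into empty buckets."""
    out, last, i = [], None, 0
    series = sorted(series)
    t = start
    while t < end:
        nxt = t + interval
        while i < len(series) and series[i][0] < nxt:
            last = series[i][1]
            i += 1
        out.append((t, last))
        t = nxt
    return out

# 1-minute zone temps and 15-minute meter data rolled up to one
# 15-minute grid, so a rule can compare them row by row:
start = datetime(2012, 3, 24)
temps = [(start + timedelta(minutes=m), 70 + m) for m in range(30)]
kw    = [(start + timedelta(minutes=m), 12.0) for m in (0, 15)]
grid  = timedelta(minutes=15)
rows  = list(zip(rollup_last(temps, start, start + 2 * grid, grid),
                 rollup_last(kw,    start, start + 2 * grid, grid)))
```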

Next there is the question of how frequently data can be pulled from the source. Here again different data sources may have different update frequencies and SkySpark is designed to handle those situations as well.

SkySpark Capabilities for Real Time Data, Data Freshness and Analytic Rule Response. SkySpark is designed to handle the type of data produced by real time sensing and control systems – high sampling rate data. The Folio database readily stores data with time stamps down to the second. (For special applications it can be set up to support nanosecond resolution. Drop us a note if you’re working on an application that needs time stamp resolution faster than a second.) So on this count SkySpark can easily handle the data produced by your real time systems.

Data Freshness is related to the application, and not limited by SkySpark. The frequency with which you pull new data into the Folio database varies based on the needs of your application and limitations of your communication infrastructure. For example, you might have an application where you can only upload data from a site once per night at midnight (we’ve seen these restrictions placed on systems by IT departments). At that time, however, you might be pulling in data that has 1 second resolution. So in this case, the data freshness would be once every 24 hours, but the data sampling rate would be once per second. The sampling rate and the freshness are “decoupled”.
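
A small Python sketch of that decoupling (timestamps and values are made up for illustration; nothing here is SkySpark code):

```python
from datetime import datetime, timedelta

# A day's worth of 1-second-resolution samples, delivered in a single
# nightly batch at midnight:
midnight = datetime(2012, 3, 24, 0, 0, 0)
batch = [(midnight - timedelta(seconds=s), 70.0) for s in range(86400)]

resolution = batch[0][0] - batch[1][0]      # spacing between samples: 1 second
now = midnight + timedelta(hours=12)        # querying at noon, before the next upload
freshness = now - max(t for t, _ in batch)  # age of the newest sample: 12 hours
```

The sampling rate stays at one second no matter how stale the data is; the freshness is governed entirely by the upload schedule.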

As for the frequency of rule processing, SkySpark enables this setting to be tuned to maximize performance and meet the analysis needs of the particular application. Rule processing is decoupled from the sampling rate and freshness of the data. For details see Spark Ext Documentation and review the Cache Engine section.

Next, let’s consider the user of the analytics application. Using the same example, we might have a situation where building operators want to see their analytic results (sparks) every morning. They could subscribe to a daily digest notification, which will provide a daily summary email. When they view their sparks they will be seeing all of the issues detected in the last 24 hours of data. Their “response time” would be on the order of a day (about the same time frame as the data freshness).

Too slow you say? It depends on the application and needs of the user – there is no one size fits all answer. SkySpark doesn’t dictate these requirements. For example, we have seen some examples where the data sampling rate is once per minute, the data freshness is once per 15 minutes, and yet the update provided to the user (the building operator) is once per week or once per month. That’s right, building operators are informed of sparks once per week. Why might an operator find this level of “freshness” in sparks desirable?

It could be because the types of issues they are looking for can’t really be addressed more quickly. Perhaps they need to be planned into their facility maintenance schedule, which is set on a weekly basis. Or it could be because they have so many issues requiring attention that they simply can’t do anything with issues presented on a more frequent basis.

It could also be because the issues they are tracking take time to form – the patterns appear over a period of time. Some patterns form in a minute, some take hours, some take days and some might take a season or a year to form. This is a key point where people often confuse analytics with alarms. See previous blog post on Understanding the Difference between Alarms and Analytics. For example, the pattern that represents a defective temperature sensor might require 12 or 24 hours of operation to detect. It can’t be detected at any specific second in time -- it is detected by interpreting a pattern in data that appears after some period of time has elapsed.
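
As a hypothetical illustration of a rule that interprets a pattern over time rather than an instantaneous condition, consider flagging a temperature sensor as possibly defective when its readings flat-line for many hours (the window length and tolerance below are invented values, not SkySpark parameters):

```python
def stuck_sensor(values, window=12, tolerance=0.1):
    """values: consecutive hourly readings from one sensor.
    Flag the sensor when any `window` consecutive readings vary by
    less than `tolerance` degrees -- a pattern, not a point in time."""
    for i in range(len(values) - window + 1):
        span = values[i:i + window]
        if max(span) - min(span) < tolerance:
            return True
    return False

healthy = [70 + (h % 5) * 0.5 for h in range(24)]  # normal daily drift
stuck   = [70.3] * 15 + [71.0] * 9                 # flat-lined for 15 hours
```

No single reading in the `stuck` series is alarming on its own; only the 12-plus hours of unchanging values reveal the problem.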

All of this leads us to appreciate that there is yet another “decoupled” response loop involved in the use of analytics – that of the corrective action response time. The limiting factor here is typically the human systems that will respond to analytic results. An issue detected by SkySpark might not be able to be addressed until the next planned service call to the facility.

Data In Motion. Most of the analytics applications we have seen to date apply rules to historical data to find patterns and issues, and can be thought of as a form of data mining of "data at rest". There are other applications that need to deal with "data in motion" or streaming data. This is yet another aspect in the discussion of “real time”.

To date, applications of SkySpark have leveraged the time-series database as the store for historical data that will be subject to analytic rules. But SkySpark's Folio database is designed to handle real-time streaming data as well.

Typically this type of real-time data is independent of sampled historical data and is streamed straight from the controller to the server as fast as the communication infrastructure can handle (typically on the order of updates every 200 milliseconds to 10 seconds).

The Folio database infrastructure has been designed to support this use case and be used as a real-time sensor data platform. A key capability in this area is a feature we call “transient commits”. Transient commits allow real-time streams of sensor data to be applied to the tag database without the overhead of disk persistence. This design offers an elegant model where a single entity such as a sensor is modeled by one record in Folio containing both transient real-time data, such as its current value and status, along with its persistent tags such as its name, unit, and what it measures. This capability opens up opportunities to use SkySpark in applications where analysis of streaming data is required.
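
A conceptual sketch of that single-record model (class, field, and method names here are hypothetical; this is not the Folio API):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SensorRec:
    # Persistent tags: written to disk whenever they change.
    tags: dict = field(default_factory=dict)  # e.g. name, unit, what it measures
    # Transient state: updated by streaming commits, never persisted.
    cur_val: Optional[float] = None
    cur_status: str = "unknown"

    def transient_commit(self, value, status="ok"):
        """Apply a streaming update without touching persistent storage."""
        self.cur_val = value
        self.cur_status = status

rec = SensorRec(tags={"dis": "Zone Temp", "unit": "°F", "temp": True})
rec.transient_commit(72.5)  # real-time value lands on the same record as the tags
```

One record answers both “what is this point?” (persistent tags) and “what is it reading right now?” (transient state), without the streaming updates paying any disk-write cost.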

Hopefully this discussion has helped provide clarification on the topic of “real time” as it relates to data sampling rates, data upload frequency, and SkySpark’s ability to support real time applications.
