Continuous Query is a query executed by TDengine periodically with a sliding window, it is a simplified stream computing driven by timers, not by events. Continuous query can be applied to a table or a STable, and the result set can be passed to the application directly via call back function, or written into a new table in TDengine. The query is always executed on a specified time window (window size is specified by parameter interval), and this window slides forward while time flows (the sliding period is specified by parameter sliding).
Continuous query is defined by TAOS SQL, there is nothing special. One of the best applications is downsampling. Once it is defined, at the end of each cycle, the system will execute the query, pass the result to the application or write it to a database.
If historical data pints are inserted into the stream, the query won't be re-executed, and the result set won't be updated. If the result set is passed to the application, the application needs to keep the status of continuous query, the server won't maintain it. If application re-starts, it needs to decide the time where the stream computing shall be started.
How to use continuous query
Pass result set to application
Application shall use API taos_stream (details in connector section) to start the stream computing. Inside the API, the SQL syntax is:
SELECT aggregation FROM [table_name | stable_name] INTERVAL(window_size) SLIDING(period)
where the new keyword INTERVAL specifies the window size, and SLIDING specifies the sliding period. If parameter sliding is not specified, the sliding period will be the same as window size. The minimum window size is 10ms. The sliding period shall not be larger than the window size. If you set a value larger than the window size, the system will adjust it to window size automatically.
SELECT COUNT(*) FROM FOO_TABLE INTERVAL(1M) SLIDING(30S)
The above SQL statement will count the number of records for the past 1-minute window every 30 seconds.
Save the result into a database
If you want to save the result set of stream computing into a new table, the SQL shall be:
CREATE TABLE table_name AS SELECT aggregation from [table_name | stable_name] INTERVAL(window_size) SLIDING(period)
Also, you can set the time range to execute the continuous query. If no range is specified, the continuous query will be executed forever. For example, the following continuous query will be executed from now and will stop in one hour.
CREATE TABLE QUERY_RES AS SELECT COUNT(*) FROM FOO_TABLE WHERE TS > NOW AND TS <= NOW + 1H INTERVAL(1M) SLIDING(30S)
Manage the Continuous Query
Inside TDengine shell, you can use the command "show streams" to list the ongoing continuous queries, the command "kill stream" to kill a specific continuous query.
If you drop a table generated by the continuous query, the query will be removed too.
Time series data is a sequence of data points over time. Inside a table, the data points are stored in order of timestamp. Also, there is a data retention policy, the data points will be removed once their lifetime is passed. From another view, a table in DTengine is just a standard message queue.
To reduce the development complexity and improve data consistency, TDengine provides the pub/sub functionality. To publish a message, you simply insert a record into a table. Compared with popular messaging tool Kafka, you subscribe to a table or a SQL query statement, instead of a topic. Once new data points arrive, TDengine will notify the application. The process is just like Kafka.
TDengine allocates a fixed-size buffer in memory, the newly arrived data will be written into the buffer first. Every device or table gets one or more memory blocks. For typical IoT scenarios, the hot data shall always be newly arrived data, they are more important for timely analysis. Based on this observation, TDengine manages the cache blocks in First-In-First-Out strategy. If no enough space in the buffer, the oldest data will be saved into hard disk first, then be overwritten by newly arrived data. TDengine also guarantees every device can keep at least one block of data in the buffer.
By this design, the application can retrieve the latest data from each device super-fast, since they are all available in memory. You can use last or last_row function to return the last data record. If the super table is used, it can be used to return the last data records of all or a subset of devices. For example, to retrieve the latest temperature from thermometers in located Beijing, execute the following SQL
select last(*) from thermometers where location=’beijing’
Through this design, caching tools like Redis are no longer needed in the system, helping reduce the complexity of the system.
TDengine creates one or more virtual nodes(vnode) in each data node. Each vnode contains data for multiple tables and has its own buffer. The buffer of a vnode is fully separated from the buffer of another vnode, not shared. But the tables in a vnode share the same buffer.
System configuration parameter cacheBlockSize configures the cache block size in bytes, and another parameter cacheNumOfBlocks configures the number of cache blocks. The total memory for the buffer of a vnode is
cacheBlockSize * cacheNumOfBlocks. Another system parameter
numOfBlocksPerMeter configures the maximum number of cache blocks a table can use. When you create a database, you can specify these parameters.