Cayuga FAQ
Relating to Other Stream Systems
- Q: There are other stream processing systems referred to as DSMSes, or Stream Databases, including TelegraphCQ from UC Berkeley, STREAM from Stanford University, and Aurora/Borealis from MIT/Brown/Brandeis. How does Cayuga compare to these systems?
- A: Unlike these DSMSes, Cayuga falls into the category of Complex Event Processing (CEP).
In terms of application workload, Cayuga is designed for pattern matching queries. A typical such query is “notify me if event A is followed by event B within T seconds”. Although many (but not all) pattern matching queries are expressible in DSMSes, the processing efficiency is typically much worse, mainly due to the lack of multi-query optimization (MQO) strategies in the DSMS engines. For example, Cayuga on a single PC can scale up to hundreds of thousands of concurrently running pattern matching queries, whereas no DSMS implementation can currently scale up to more than a handful of concurrently running queries. In short, the main distinguishing feature of Cayuga is its scalability.
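As a rough illustration of the query quoted above ("event A is followed by event B within T seconds"), the following Python sketch matches that single pattern over an in-memory event list. It is not Cayuga's implementation or query language; the `Event` type, the `followed_by` function, and the assumption that events arrive in timestamp order are all made up for this example.

```python
from collections import deque
from typing import Iterable, Iterator, NamedTuple, Tuple


class Event(NamedTuple):
    kind: str         # event type, e.g. "A" or "B" (hypothetical schema)
    timestamp: float  # seconds


def followed_by(events: Iterable[Event], first: str, second: str,
                window: float) -> Iterator[Tuple[Event, Event]]:
    """Yield (a, b) pairs where a `second` event b arrives at most
    `window` seconds after a `first` event a.

    Assumes events are supplied in non-decreasing timestamp order.
    """
    pending = deque()  # `first` events that have not expired yet
    for ev in events:
        # Discard `first` events too old to match this or any later event.
        while pending and ev.timestamp - pending[0].timestamp > window:
            pending.popleft()
        if ev.kind == second:
            for a in pending:
                yield (a, ev)
        if ev.kind == first:
            pending.append(ev)


stream = [Event("A", 0.0), Event("C", 1.0), Event("B", 2.0),
          Event("A", 10.0), Event("B", 20.0)]
matches = list(followed_by(stream, "A", "B", window=5.0))
```

Here only the first A/B pair is reported, because the second B arrives 10 seconds after its A, outside the 5-second window. Note that even this tiny query is stateful: the matcher has to remember earlier A events.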
However, there is no free lunch: there exist queries expressible in DSMSes like STREAM that are not expressible in Cayuga. The idea is that Cayuga occupies a spot between simple pub/sub and the full power of SQL-like queries in DSMSes, and achieves greater expressiveness than pub/sub while retaining most of its advantages in terms of scalability.
- Q: The paper "Towards Expressive Publish/Subscribe Systems" documents Cayuga as "a solution for extended pub/sub applications". Do you see Cayuga being applied more to publish/subscribe settings?
- A: In our opinion, pub/sub and CEP are not disjoint streaming domains. Traditional pub/sub refers to stateless query workloads, which are indeed quite different from CEP workloads. However, Cayuga supports stateful pub/sub in a scalable fashion, and the same technology it uses can address CEP workloads very well.
- Q: Cayuga seems to be closer in spirit to XML filtering. Do you see Cayuga being deployed as the next generation of publish/subscribe filter to allow for stateful filtering instead of the current stateless filtering readily found in pub/sub systems?
- A: Regarding stateless XML pub/sub filtering, the architecture of the Cayuga engine is close to that of a standard XML pub/sub engine, in that both are based on finite state automata. In comparison, DSMSes have different engine architectures. However, Cayuga is not designed for XML pub/sub workloads, and there are various existing solutions for XML pub/sub.
- Q: Cayuga uses a processing model based upon nondeterministic finite state automata. Many systems do not use this approach. Are there any performance benefits for using this model over, for example, the box-and-arrows model used in the Aurora system? Is there any reason why two equivalent systems, one using an NFA model and one using a different model, will behave differently after deployment?
- A: The NFA processing model is amenable to MQO techniques, and is the key to the high scalability of Cayuga. When running a single query, the performance difference between the NFA model and another model, say the box-and-arrows model, should be negligible. However, when running a million queries, the performance difference will be significant, since it is not well understood how to perform MQO effectively on a box-and-arrows processing model, whereas it is for the NFA model.
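To give a feel for why sharing work across queries matters at this scale, here is a toy Python sketch. It is not Cayuga's MQO algorithm, and every name in it (`Event`, `Query`, `SharedMatcher`) is hypothetical. It only illustrates two general ideas: queries that begin the same way can share one piece of state, and an index on event type lets each incoming event touch only the queries it can affect, instead of all of them.

```python
from collections import defaultdict, deque
from typing import NamedTuple


class Event(NamedTuple):
    kind: str
    timestamp: float


class Query(NamedTuple):
    qid: int
    first: str     # kind of the first event in the pattern
    second: str    # kind of the event that completes the pattern
    window: float  # maximum gap in seconds


def _expire(buf, now, window):
    # Drop buffered events that are older than the window.
    while buf and now - buf[0].timestamp > window:
        buf.popleft()


class SharedMatcher:
    """Toy matcher for many 'first followed by second within window' queries.

    Queries with the same (first, window) share one buffer of pending
    events, and queries are indexed by the event kind that completes them.
    Assumes events arrive in non-decreasing timestamp order.
    """

    def __init__(self, queries):
        self.buffers = {}                   # (first, window) -> shared deque
        self.by_first = defaultdict(set)    # first kind -> buffer keys
        self.by_second = defaultdict(list)  # second kind -> queries
        for q in queries:
            key = (q.first, q.window)
            self.buffers.setdefault(key, deque())
            self.by_first[q.first].add(key)
            self.by_second[q.second].append(q)

    def process(self, ev):
        matches = []
        # Only queries completed by this event kind are examined.
        for q in self.by_second.get(ev.kind, ()):
            buf = self.buffers[(q.first, q.window)]
            _expire(buf, ev.timestamp, q.window)
            matches.extend((q.qid, a, ev) for a in buf)
        # Store the event once per shared buffer, not once per query.
        for key in self.by_first.get(ev.kind, ()):
            buf = self.buffers[key]
            _expire(buf, ev.timestamp, key[1])
            buf.append(ev)
        return matches


queries = [Query(i, "A", f"B{i}", 5.0) for i in range(1000)]
matcher = SharedMatcher(queries)
out = []
for ev in [Event("A", 0.0), Event("B7", 1.0), Event("B8", 9.0)]:
    out.extend(matcher.process(ev))
```

In this run the 1000 queries share a single buffer of pending A events, and each B event is checked against exactly one query. A naive engine that evaluates every query independently would do work proportional to the number of queries for every event.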
Installation
- Q: When I execute the build on Windows, the Xerces DLLs cannot be found.
- A: The DLL files are in the working directory of Cayuga, which is the top-level cayuga-system directory. You can invoke the Cayuga executable from that directory, or set the working directory option in Visual Studio to that directory, as described in the user manual.
Deployment
- Q: In field deployments of Cayuga, it might be necessary to change or add new data input schemas and output schemas on the fly. Can this be accomplished with Cayuga? If so, how can tests be performed using the system on available data sets in relational, CSV, plain text/flat file, and TimeML formats?
- A: Currently Cayuga can read stream data from disk files or TCP sockets. In either case there is a single stream data format that Cayuga understands, documented in the user manual. This implies that users will have to write their own input adapters to read streams in other formats. In the case of reading disk files, the input adapter can be executed offline to convert the stream data to the disk file format that Cayuga understands. In the case of reading TCP sockets, the input adapter needs to be executed online: it reads the input events in the custom format, converts them to the format that Cayuga understands, and sends the converted events to Cayuga via TCP.
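The two adapter modes described above could be structured roughly as in the Python sketch below, which takes CSV as the custom input format. Everything here is an assumption for illustration: the function names are hypothetical, and `format_event` is only a placeholder, since the actual event layout is the one documented in the Cayuga user manual and is not reproduced here. The host and port passed to `send_online` would likewise come from your own Cayuga configuration.

```python
import csv
import io
import socket
from typing import Iterator, TextIO


def format_event(row: dict) -> str:
    """PLACEHOLDER encoding of one event as one text line.

    This space-separated layout is an assumption for illustration only;
    replace it with the format documented in the Cayuga user manual.
    """
    return " ".join(str(value) for value in row.values()) + "\n"


def convert_csv(source: TextIO) -> Iterator[str]:
    """Read CSV rows (first line is the header) and yield encoded events."""
    for row in csv.DictReader(source):
        yield format_event(row)


def convert_offline(source: TextIO, dest: TextIO) -> None:
    """Offline mode: write the converted events to a file-like object."""
    dest.writelines(convert_csv(source))


def send_online(source: TextIO, host: str, port: int) -> None:
    """Online mode: convert events one by one and send them over TCP.

    `host` and `port` are hypothetical parameters for wherever the
    engine is listening.
    """
    with socket.create_connection((host, port)) as conn:
        for line in convert_csv(source):
            conn.sendall(line.encode("utf-8"))


# Offline conversion of a small in-memory CSV sample.
sample = io.StringIO("symbol,price,ts\nIBM,101.5,1\nMSFT,27.3,2\n")
converted = io.StringIO()
convert_offline(sample, converted)
```

Keeping the conversion in one generator (`convert_csv`) means the offline and online adapters differ only in where the converted lines are written.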
- Q: Is there a standard interface for connecting outside applications to Cayuga in the manner that software systems use ODBC to connect to a database and issue commands and queries?
- A: The main interface to Cayuga consists of the offline config file, where options can be set to control the behavior of Cayuga, and the online TCP messages sent to Cayuga, through which new queries can be added at run-time. Please see the manual for details.
There is no ODBC-like interface for Cayuga, but we think the TCP message-based interface should play the same role for stream processing. One limitation of the current Cayuga implementation is that the only type of command that Cayuga can process at run-time is adding a new query. We do not support replacing or deleting queries yet, since this would involve considerable work, and is somewhat akin to the problem of unloading a class from a Java VM.
- Q: How easy is it to incorporate Cayuga into both existing and future software systems? Can other systems utilize Cayuga as an event processing engine efficiently without the Cayuga system being obtrusive or apparent to the end user?
- A: Cayuga runs in its own process. The end user/application interacts with Cayuga by submitting queries (written in CEL or AIR) to it and receiving query result streams from it via TCP communication. This way Cayuga can serve as an event engine in a larger software environment without being obtrusive.
Trace Visualizer
- Q: Can the trace visualizer that Cayuga uses be applied to a real-time data feed, so that someone who has deployed the Cayuga system can observe the flow of data and determine how a particular piece of data is moving through the system? This might be especially interesting from a debugging standpoint: if the expected data flow is known, the processing can be observed to determine the correctness of the system.
- A: This is indeed the motivating scenario for our trace visualizer design. It can be used at Cayuga run-time as well as offline. Please check out our demo video to see how we use the visualizer while Cayuga is processing stream queries. Basically, this is done by having the Cayuga engine continuously write out checkpoints of its system state, together with a trace message log. The visualizer can then read this information asynchronously. Note that dumping such information at run-time will reduce the efficiency of the Cayuga engine by a factor of 5 or so.