"Tomi Maila"

Hybrid Dataflow – Convergence of Flow-Based Programming with Synchronous Dataflow

In my previous post I discussed the differences between two visual programming paradigms: synchronous dataflow programming and asynchronous flow-based programming. Although the two paradigms approach visual programming from slightly different perspectives, they converge on a subset of the visual programming domain. Let’s look at a very simple example: the multiplication of two constant numbers.

Flow diagram for multiplying two constants

The above diagram multiplies the two constants 7 and 5 and returns 35 when evaluated. The diagram is exactly the same in synchronous dataflow and in asynchronous flow-based programming; in both approaches it needs to be evaluated only once. Flow-based programming and synchronous dataflow can therefore be seen as converging into a single approach when only constant values are used. Of course, one cannot solve any real computational problem with only constant values.

What if one of the inputs to the multiplication primitive were an asynchronous stream of numeric values instead? For each numeric value in the stream, the primitive would multiply the stream value by the provided constant and stream the output values out as they are evaluated. This is the approach taken by flow-based programming dialects that allow a stream input to be replaced with a constant value. The approach is illustrated in the image below.

Flow diagram for multiplying a constant with a stream

I have used thick connector lines to illustrate streams and thin connector lines to illustrate one-time values. In the above example, one of the inputs (x) is treated as a stream and the other input (y) is treated as a one-time value.

Indeed, there is no reason why the two approaches to dataflow programming couldn’t coexist in the same flow-based programming language. We are already using a hybrid approach in the above diagram by connecting both a stream input and a constant value input to the same node. In a general convergent hybrid approach, each node with a stream input would continue executing as long as the stream is valid (i.e. the stream has not been closed by the runtime). The node would process the elements of the stream one at a time, in the conventional flow-based programming manner. Non-stream, one-time inputs would be treated in the synchronous dataflow manner. A one-time value input to a flow-based programming node would be treated as an infinite stream that always returns the same (not necessarily constant) value.
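
To make the last rule concrete, here is a minimal Python sketch, not tied to any particular flow-based programming runtime. The names as_stream and multiply are made up for this illustration, and a Python iterator stands in for a stream. The one-time input is wrapped into an endless stream that keeps returning the same value, so the node only ever has to deal with streams and stops as soon as the real stream closes.

```python
import itertools
from typing import Iterable, Iterator


def as_stream(value: float) -> Iterator[float]:
    """Model a one-time value as an endless stream repeating that value."""
    return itertools.repeat(value)


def multiply(xs: Iterable[float], ys: Iterable[float]) -> Iterator[float]:
    """Emit one product per pair of elements; stop when either stream closes."""
    for x, y in zip(xs, ys):
        yield x * y


if __name__ == "__main__":
    x = iter([1, 2, 3])  # a finite stand-in for the stream input x
    y = as_stream(5)     # the one-time input y, seen by the node as a stream
    print(list(multiply(x, y)))  # [5, 10, 15]
```

Because the one-time value never closes its stream, the lifetime of the node is decided entirely by the stream input x, which matches the rule described above.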

Consider the following example. Two input values are multiplied synchronously, and the product is then used as one of the multipliers of an asynchronous operation. This example is illustrated in the image below. The asynchronous operation multiplies the result of the leftmost operation by each value in the stream x’. The result is evaluated whenever a value arrives on the stream x’, just as in the previous examples.

Hybrid approach combining flow-based programming with synchronous dataflow programming.
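
The same diagram can be approximated in Python with asyncio. This is only a sketch under my own assumptions: the diagram does not name its nodes or input values, so multiply_values, multiply_stream, ticker and the numbers 7, 5 and 1, 2, 3 are all invented for the illustration. The synchronous node runs exactly once, and the asynchronous node produces an output each time a value arrives on the stream.

```python
import asyncio
from typing import AsyncIterator, Iterable, List


def multiply_values(a: float, b: float) -> float:
    """Synchronous dataflow node: runs once when both inputs are available."""
    return a * b


async def multiply_stream(xs: AsyncIterator[float], k: float) -> AsyncIterator[float]:
    """Flow-based node: multiplies every value arriving on xs by the one-time value k."""
    async for x in xs:
        yield x * k


async def ticker(values: Iterable[float], delay: float = 0.0) -> AsyncIterator[float]:
    """Hypothetical source that delivers values asynchronously, one at a time."""
    for v in values:
        await asyncio.sleep(delay)
        yield v


async def main() -> List[float]:
    k = multiply_values(7, 5)  # evaluated once, before any stream value arrives
    return [y async for y in multiply_stream(ticker([1, 2, 3]), k)]


if __name__ == "__main__":
    print(asyncio.run(main()))  # [35, 70, 105]
```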

Combining synchronous dataflow and asynchronous flow-based programming as first-class citizens in the same visual programming language has some clear advantages. Some problems are naturally easier to both comprehend and solve with a synchronous approach, and some flow-based programming design patterns would become simpler if synchronous dataflow were a first-class citizen of the language. This is especially true for low-level operations, where synchronous dataflow is often the way programmers naturally approach problems. The asynchronous approach works very well for orchestrating the system and is a natural fit for designing system architectures and defining dependencies.

Dataflow and Flow-Based Programming – Two Approaches To Visual Programming

In this blog I am going to cover topics around two visual programming paradigms, namely dataflow programming and flow-based programming. Strictly speaking, flow-based programming is one model of dataflow programming, but when people refer to flow-based programming they often mean asynchronous dataflow programming specifically, and when they refer to dataflow programming they mean synchronous dataflow programming. NoFlo is an example of the asynchronous flow-based programming model, whereas LabVIEW represents the synchronous dataflow approach. As most readers are familiar with at most one of the two paradigms, let me try to explain their similarities and differences.

In both programming models, applications are represented as graphs consisting of nodes and directed connections. The nodes represent the functional operations in the graph, and the connections define how data is passed between the operations. Variables become unnecessary, as data can be passed around purely by defining connections in the graph.

The synchronous dataflow programming approach more closely resembles the conventional text-based programming approach. Just as a conventional text-based program is executed once from top to bottom in sequential order, a synchronous dataflow diagram is executed once, starting with data on all of the diagram's input terminals. Each node or subdiagram in the diagram is executed once, when data on all of its input terminals becomes available. Once a node completes execution, its data becomes available on all of its output terminals and thus to all nodes directly downstream of it. This data then “flows” to the input terminals of the downstream nodes, allowing them to start executing, and the process continues in the same way until the whole diagram has completed. A special case is subdiagrams such as a for loop, which repeats its content multiple times, for example to iterate over an array input.
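
The firing rule described above can be sketched in a few lines of Python. This is a toy scheduler of my own and not how LabVIEW or any other tool is implemented: the dictionary layout and the name run_once are assumptions made for the illustration, each node has a single output, and ready nodes run one after another rather than in parallel.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical diagram format: node name -> (function, names of its inputs).
Node = Tuple[Callable[..., Any], List[str]]


def run_once(nodes: Dict[str, Node], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Execute every node exactly once, as soon as all of its inputs have data."""
    values = dict(inputs)   # data currently available on terminals
    pending = dict(nodes)   # nodes that have not executed yet
    while pending:
        ready = [name for name, (_, deps) in pending.items()
                 if all(dep in values for dep in deps)]
        if not ready:
            raise ValueError("some node can never receive all of its inputs")
        for name in ready:
            func, deps = pending.pop(name)
            # The output becomes available to all downstream nodes.
            values[name] = func(*(values[dep] for dep in deps))
    return values


if __name__ == "__main__":
    diagram = {
        "product": (lambda a, b: a * b, ["a", "b"]),
        "plus_one": (lambda p: p + 1, ["product"]),
    }
    print(run_once(diagram, {"a": 7, "b": 5})["plus_one"])  # 36
```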

Below is an example of a synchronous dataflow implementation that recursively lists all files in a folder hierarchy, using a List Files and Folders function starting from a source folder, and then logs the file names using a Log String function. The List Files and Folders function returns a list of files on the top output terminal and a list of folders on the bottom output terminal. Both terminals become available, populated with the files and folders in the Source folder, as soon as the List Files and Folders node finishes executing. To log the files in the Source folder, the Files list is iterated using a loop; each file path is converted to a string and logged, one at a time. Concurrently, all the subfolders under the Source folder are passed recursively to the function itself, again one at a time using another loop, to log all the files in the subfolders of the Source folder.

Logging all the files in a folder using dataflow programming approach
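
For readers more at home with text-based code, the diagram corresponds roughly to the following Python sketch. The function names mirror the node labels but are otherwise my own, and the log parameter is an assumption added so the output can be redirected. Unlike the diagram, where the two loops can run concurrently, this sketch runs them one after the other.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def list_files_and_folders(folder: Path) -> Tuple[List[Path], List[Path]]:
    """Return (files, folders) directly inside folder, like the two output terminals."""
    entries = sorted(folder.iterdir())
    files = [p for p in entries if p.is_file()]
    folders = [p for p in entries if p.is_dir()]
    return files, folders


def log_files(source: Path, log: Callable[[str], None] = print) -> None:
    """Log every file under source, recursing into each subfolder."""
    files, folders = list_files_and_folders(source)  # both lists are complete here
    for file_path in files:      # first loop: path to string, then log
        log(str(file_path))
    for subfolder in folders:    # second loop: pass each subfolder to the function itself
        log_files(subfolder, log)


if __name__ == "__main__":
    log_files(Path("."))
```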

The asynchronous flow-based programming approach more closely resembles event-based and message-passing programming models than the synchronous dataflow model does. The nodes in the diagram are connected by asynchronous messaging channels (e.g. queues), and every node is continuously waiting for messages to arrive at its inputs. Whereas in synchronous dataflow programming each node executes only once, in the flow-based programming model the nodes are constantly waiting for new asynchronous messages to arrive from other nodes.

Below is the same example as an asynchronous flow-based programming implementation: recursively listing all files in a folder hierarchy, again using a List Files and Folders function starting from a source folder, and then logging the file names using a Log String function. Now, as List Files and Folders is continuously waiting for inputs on its input terminal, it can asynchronously pass the subfolders directly to itself. As such, the List Files and Folders node recurses through the folder structure by itself, streaming the paths of all the files to the Path To String node, which in turn streams the paths converted to strings to a log function.

Logging all the files in a folder using flow-based programming approach
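
A rough Python approximation of this diagram uses asyncio queues as the connections between nodes. It is a sketch under several assumptions of mine and does not reflect how NoFlo or any other runtime works: the function names follow the node labels, the log node appends to a list instead of writing to a real log, and the shutdown logic in run exists only so that the example terminates, whereas real flow-based nodes would keep waiting for input. The file system calls are also blocking, which a production version would avoid.

```python
import asyncio
from pathlib import Path
from typing import List


async def list_files_and_folders(folders: asyncio.Queue, files: asyncio.Queue) -> None:
    """Wait for folders; stream files downstream and subfolders back to its own input."""
    while True:
        folder = await folders.get()
        for entry in sorted(folder.iterdir()):
            if entry.is_dir():
                folders.put_nowait(entry)  # the node feeds itself
            elif entry.is_file():
                files.put_nowait(entry)
        folders.task_done()


async def path_to_string(paths: asyncio.Queue, strings: asyncio.Queue) -> None:
    """Convert each incoming path to a string and pass it on."""
    while True:
        path = await paths.get()
        strings.put_nowait(str(path))
        paths.task_done()


async def log_string(strings: asyncio.Queue, sink: List[str]) -> None:
    """Stand-in for the log node: collect each incoming string."""
    while True:
        sink.append(await strings.get())
        strings.task_done()


async def run(source: Path) -> List[str]:
    folders, files, strings = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    logged: List[str] = []
    nodes = [
        asyncio.create_task(list_files_and_folders(folders, files)),
        asyncio.create_task(path_to_string(files, strings)),
        asyncio.create_task(log_string(strings, logged)),
    ]
    folders.put_nowait(source)
    # Wait until every queued message has been processed, then stop the nodes.
    await folders.join()
    await files.join()
    await strings.join()
    for node in nodes:
        node.cancel()
    await asyncio.gather(*nodes, return_exceptions=True)
    return logged


if __name__ == "__main__":
    for line in asyncio.run(run(Path("."))):
        print(line)
```

Note that there is no explicit recursion and no loop over a finished list anywhere: each node simply reacts to whatever arrives on its input queue.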

There are variations of both models, and I am not going to compare the variations in detail. The intent of this post is to paint a general picture of the two approaches with a simple but non-trivial example. Furthermore, both models have their benefits and shortcomings and can very well coexist in different parts of the same application, if the programming language supports it. What do you think are the benefits and shortcomings of each of the two models?

Visual programming blog is back


Several years ago I started a blog to cover various topics on LabVIEW and visual programming. My passion was to contribute to the visual programming community and bring visual programming to the mainstream. After an active first few years, I moved from Helsinki to the San Francisco Bay Area and somehow ended up not having enough time for the blog. It may have been a few years of silence, but now ExpressionFlow is back to cover exciting topics around visual programming and other software development related issues. The world of visual programming is changing rapidly, and flow-based programming in particular is gaining more traction due to NoFlo’s Kickstarter success. I am bringing ExpressionFlow back not only to follow the most exciting times in the evolution of visual programming but also to share fresh ideas and concepts on how to improve the visual programming experience. Welcome back and thanks for following!