In this post I am going to cover two visual programming paradigms: dataflow programming and flow-based programming. Strictly speaking, flow-based programming is one model of dataflow programming, but when people refer to flow-based programming they often mean asynchronous dataflow programming specifically, and when they refer to dataflow programming they mean synchronous dataflow programming. NoFlo is an example of the asynchronous flow-based programming model, whereas LabVIEW represents the synchronous dataflow approach. As most readers are familiar with at most one of the two paradigms, let me try to explain their similarities and differences.
In both programming models, applications are represented by graphs made up of nodes and directed connections. The nodes represent the functional operations in the graph, and the connections define how data is passed between those operations. Variables become unnecessary, as data can be passed around purely by defining connections in the graph.
The synchronous dataflow approach is somewhat closer to conventional text-based programming. In the same way that a conventional text-based program is executed once from top to bottom in sequential order, a synchronous dataflow diagram is executed once, starting with the data on all of the diagram's input terminals. Each node or subdiagram in the diagram is executed once, when data becomes available on all of its input terminals. Once a node completes execution, its results become available on all of its output terminals and thus to all nodes directly downstream from it. This data then “flows” to the input terminals of the downstream nodes, allowing them to start executing, and the process continues in the same way until the whole diagram has completed. A special case is subdiagrams such as a for loop, which repeats its contents multiple times, for example to iterate over an array input.
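To make the firing rule concrete, here is a minimal Python sketch of a synchronous dataflow executor. It is only an illustration under my own assumptions: the graph representation, the run_once function and the node names are hypothetical and are not taken from LabVIEW or any other tool. Each node fires exactly once, as soon as every one of its inputs has a value.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical mini-graph: each node is (function, names of its inputs).
# An input name refers to a diagram input or to another node's output.
Node = Tuple[Callable[..., object], List[str]]


def run_once(nodes: Dict[str, Node], diagram_inputs: Dict[str, object]) -> Dict[str, object]:
    """Execute every node exactly once, when all of its inputs are available."""
    values = dict(diagram_inputs)  # data currently available on the "wires"
    pending = dict(nodes)          # nodes that have not fired yet
    while pending:
        ready = [name for name, (_, deps) in pending.items()
                 if all(dep in values for dep in deps)]
        if not ready:
            raise ValueError("unsatisfied inputs or a cycle in the diagram")
        for name in ready:
            func, deps = pending.pop(name)
            # The node's output becomes available to everything downstream.
            values[name] = func(*(values[dep] for dep in deps))
    return values


graph = {
    "sum": (lambda a, b: a + b, ["a", "b"]),
    "double": (lambda s: s * 2, ["sum"]),
    "report": (lambda s, d: f"sum={s}, double={d}", ["sum", "double"]),
}
result = run_once(graph, {"a": 2, "b": 3})
print(result["report"])  # prints "sum=5, double=10"
```

Note that the nodes are not listed in any execution order. The order follows purely from the availability of data, which is the point of the model.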
Below is an example of a synchronous dataflow implementation that recursively lists all files in a folder hierarchy, using a List Files and Folders function starting from a source folder, and then logs the file names using a Log String function. The List Files and Folders function returns a list of files on the top output terminal and a list of folders on the bottom output terminal. Both terminals become available, populated with the files and folders in the Source folder, as soon as the List Files and Folders node finishes executing. To log the files in the Source folder, a loop iterates over the Files list. Each file path is converted to a string and logged, one at a time. Concurrently, all the subfolders of the Source folder are passed to the function itself, again one at a time using another loop, to recursively log all the files in those subfolders.
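For readers who think more easily in text, the diagram described above can be approximated with the following Python sketch. The function names simply mirror the node names in the description, and the log is an ordinary list, which is my own assumption. One difference to keep in mind: in the diagram the two loops can run concurrently, whereas this sketch runs them one after the other.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def list_files_and_folders(folder: Path) -> Tuple[List[Path], List[Path]]:
    """Return (files, folders) directly inside the given folder."""
    entries = sorted(folder.iterdir())
    files = [entry for entry in entries if entry.is_file()]
    folders = [entry for entry in entries if entry.is_dir()]
    return files, folders


def log_string(text: str, log: List[str]) -> None:
    """Stand-in for the Log String node: append the text to a list."""
    log.append(text)


def log_files_recursively(source: Path, log: List[str]) -> None:
    # Both outputs are available only after the node has finished executing.
    files, folders = list_files_and_folders(source)
    for path in files:        # first loop: convert each path to a string and log it
        log_string(str(path), log)
    for sub in folders:       # second loop: pass each subfolder to the function itself
        log_files_recursively(sub, log)


# Small demo on a temporary folder hierarchy.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_text("a")
    (root / "sub" / "b.txt").write_text("b")
    sync_log: List[str] = []
    log_files_recursively(root, sync_log)
    logged_names = [Path(p).name for p in sync_log]

print(logged_names)  # prints ['a.txt', 'b.txt']
```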
The asynchronous flow-based programming approach is closer to event-based and message-passing programming models than the synchronous dataflow model is. The nodes in the diagram are connected by asynchronous messaging channels (e.g. queues). Whereas in synchronous dataflow programming each node executes only once, in the flow-based programming model every node is continuously waiting for new asynchronous messages to arrive at its inputs from other nodes.
Below is the same example as an asynchronous flow-based programming implementation: recursively listing all files in a folder hierarchy, again using a List Files and Folders function starting from a source folder, and then logging the file names using a Log String function. Now, as List Files and Folders is continuously waiting for inputs on its input terminal, it can asynchronously pass the subfolders directly to itself. The List Files and Folders node thus recurses through the folder structure by itself, streaming the paths of all the files to the Path To String node, which in turn streams the paths, converted to strings, to a log function.
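The network described above can be sketched in Python with asyncio queues standing in for the connections. This is only one possible reading and not how NoFlo or any other flow-based tool is implemented. The queue wiring, the run_network helper and the shutdown logic (waiting for every queue to drain, then cancelling the nodes) are my own assumptions, and the sketch has no error handling.

```python
import asyncio
import tempfile
from pathlib import Path
from typing import List


async def list_files_and_folders(folders_in: asyncio.Queue, files_out: asyncio.Queue) -> None:
    while True:  # the node waits for messages forever
        folder = await folders_in.get()
        for entry in sorted(folder.iterdir()):  # blocking call, acceptable for a sketch
            if entry.is_dir():
                await folders_in.put(entry)     # feedback connection to its own input
            elif entry.is_file():
                await files_out.put(entry)
        folders_in.task_done()


async def path_to_string(paths_in: asyncio.Queue, strings_out: asyncio.Queue) -> None:
    while True:
        path = await paths_in.get()
        await strings_out.put(str(path))
        paths_in.task_done()


async def log_string(strings_in: asyncio.Queue, log: List[str]) -> None:
    while True:
        log.append(await strings_in.get())
        strings_in.task_done()


async def run_network(source: Path) -> List[str]:
    folders, paths, strings = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    log: List[str] = []
    nodes = [
        asyncio.create_task(list_files_and_folders(folders, paths)),
        asyncio.create_task(path_to_string(paths, strings)),
        asyncio.create_task(log_string(strings, log)),
    ]
    await folders.put(source)  # the single initial message
    # Wait for the connections to drain in upstream-to-downstream order.
    for queue in (folders, paths, strings):
        await queue.join()
    for node in nodes:         # the nodes never finish by themselves
        node.cancel()
    await asyncio.gather(*nodes, return_exceptions=True)
    return log


# Small demo on a temporary folder hierarchy.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_text("a")
    (root / "sub" / "b.txt").write_text("b")
    async_log = asyncio.run(run_network(root))
    streamed_names = sorted(Path(p).name for p in async_log)

print(streamed_names)  # prints ['a.txt', 'b.txt']
```

Notice that there is no recursive function call and no loop over a list of results. The recursion is expressed entirely by the connection that feeds subfolders back into the node's own input queue.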
There are variations of both models, and I am not going to compare them in detail. The intent of this post is to paint a general picture of the two approaches with a simple but non-trivial example. Furthermore, both models have their benefits and shortcomings and can very well coexist in different parts of the same application, if the programming language supports it. What do you think are the benefits and shortcomings of each of the two models?