Dataflow and Flow-Based Programming – Two Approaches To Visual Programming

In this blog post I am going to cover two visual programming paradigms: dataflow programming and flow-based programming. Strictly speaking, flow-based programming is one model of dataflow programming, but when people refer to flow-based programming they often mean asynchronous dataflow programming specifically, and when they refer to dataflow programming they often mean synchronous dataflow programming. NoFlo is an example of the asynchronous flow-based programming model, whereas LabVIEW represents the synchronous dataflow approach. As most readers are familiar with at most one of the two paradigms, let me try to explain the similarities and differences.

In both programming models, applications are represented by graphs made up of nodes and directed connections. The nodes represent the functional operations in the graph, and the connections define how data is passed between the operations. Variables become unnecessary, as data can be passed around purely by defining connections in the graph.

The synchronous dataflow approach more closely resembles conventional text-based programming. In the same way that a conventional text-based program is executed once from top to bottom in sequential order, a synchronous dataflow diagram is executed once, starting with data on all the input terminals of the diagram. Each node or subdiagram in the diagram is executed once, when data becomes available on all of its input terminals. Once a node completes execution, data becomes available on all of its output terminals and therefore to all nodes directly downstream of it. This data then “flows” to the input terminals of the downstream nodes, allowing them to start executing, and the process continues in the same way until the whole diagram has completed. A special case is a subdiagram such as a for loop, which repeats its contents multiple times, for example to iterate over an array input.
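To make the firing rule concrete, here is a minimal Python sketch of a synchronous dataflow executor. It is only an illustration under my own assumptions: the names `run_diagram`, `Node` and the wire names are hypothetical, and this is not how LabVIEW or any other product is implemented. Each node runs exactly once, as soon as all of its input wires hold data.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a node is a function plus the names of the
# wires it reads from and the wires it writes to.
Node = Tuple[Callable[..., tuple], List[str], List[str]]


def run_diagram(nodes: Dict[str, Node], inputs: Dict[str, object]) -> Dict[str, object]:
    """Run every node exactly once, as soon as all of its input wires hold data."""
    wires = dict(inputs)    # wire name -> value currently available on that wire
    pending = dict(nodes)   # nodes that have not executed yet
    while pending:
        # A node is ready when every one of its input wires carries data.
        ready = [name for name, (_, ins, _) in pending.items()
                 if all(w in wires for w in ins)]
        if not ready:
            raise RuntimeError(f"nodes can never fire: {sorted(pending)}")
        for name in ready:
            func, ins, outs = pending.pop(name)
            results = func(*(wires[w] for w in ins))
            # Outputs become available to the downstream nodes.
            wires.update(zip(outs, results))
    return wires


# Two-node diagram: (a, b) -> add -> sum -> double -> result
diagram = {
    "add":    (lambda a, b: (a + b,), ["a", "b"], ["sum"]),
    "double": (lambda s: (s * 2,),    ["sum"],    ["result"]),
}
result = run_diagram(diagram, {"a": 2, "b": 3})
print(result["result"])  # 10
```

The sketch runs ready nodes one after another; a real synchronous dataflow runtime would be free to run independent ready nodes in parallel.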

Below is an example of a synchronous dataflow implementation that recursively lists all files in a folder hierarchy, starting from a source folder, using a List Files and Folders function, and then logs the file names using a Log String function. The List Files and Folders function returns a list of files on the top output terminal and a list of folders on the bottom output terminal. Both terminals become available, populated with the files and folders in the Source folder, as soon as the List Files and Folders node finishes executing. To log the files in the Source folder, the Files list is iterated using a loop. Each file path is converted to a string and logged, one at a time. Concurrently, all the subfolders under the Source folder are passed to the function itself, again one at a time using another loop, to recursively log all the files in the subfolders of the Source folder.

Logging all the files in a folder using dataflow programming approach
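For readers who think in text-based code, the diagram described above corresponds roughly to the following Python sketch. The function names mirror the node names in the diagram, but their Python signatures are my own assumptions, and the log is a plain list so that the example is easy to check. One simplification: the two loops run one after the other here, whereas in the dataflow diagram they are free to run concurrently.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def list_files_and_folders(folder: Path) -> Tuple[List[Path], List[Path]]:
    """Return (files, folders) found directly inside `folder`."""
    entries = sorted(folder.iterdir())
    files = [e for e in entries if e.is_file()]
    folders = [e for e in entries if e.is_dir()]
    return files, folders


def log_string(text: str, log: List[str]) -> None:
    """Stand-in for the Log String node: append the text to a list."""
    log.append(text)


def log_all_files(source: Path, log: List[str]) -> None:
    """Log every file under `source`, recursing into subfolders."""
    files, folders = list_files_and_folders(source)
    for file_path in files:            # first loop: log each file
        log_string(str(file_path), log)
    for subfolder in folders:          # second loop: recursive call per subfolder
        log_all_files(subfolder, log)


# Small demonstration on a temporary folder hierarchy.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_text("a")
    (root / "sub" / "b.txt").write_text("b")
    logged: List[str] = []
    log_all_files(root, logged)
    names = [Path(p).name for p in logged]

print(names)  # ['a.txt', 'b.txt']
```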


The asynchronous flow-based programming approach more closely resembles event-based and message-passing programming models than the synchronous dataflow model does. The nodes in the diagram are connected by asynchronous messaging channels (e.g. queues), and every node is continuously waiting for messages to arrive at its inputs. Whereas in synchronous dataflow programming each node executes only once, in the flow-based programming model a node stays alive and handles each new asynchronous message as it arrives from other nodes.
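As a rough illustration of this model, the Python sketch below connects two nodes with `asyncio.Queue` channels. The `node` helper and the queue names are hypothetical, chosen only for this example; real flow-based runtimes such as NoFlo have their own component APIs. The point to notice is that each node loops forever, waiting for the next message, and has to be stopped from the outside.

```python
import asyncio


async def node(func, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Wait forever for messages, apply `func` to each one, forward the result."""
    while True:
        message = await inbox.get()
        await outbox.put(func(message))


async def main() -> list:
    # Three channels connect: source -> upper-case node -> exclaim node -> sink
    a, b, c = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    workers = [
        asyncio.create_task(node(str.upper, a, b)),
        asyncio.create_task(node(lambda s: s + "!", b, c)),
    ]
    for word in ["hello", "flow"]:
        await a.put(word)
    results = [await c.get() for _ in range(2)]
    # The nodes never finish on their own, so cancel them explicitly.
    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


results = asyncio.run(main())
print(results)  # ['HELLO!', 'FLOW!']
```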

Below is the same example as an asynchronous flow-based programming implementation: recursively listing all files in a folder hierarchy, again starting from a source folder using a List Files and Folders function, and then logging the file names using a Log String function. Now, as List Files and Folders is continuously waiting for inputs on its input terminal, it can asynchronously pass the subfolders directly to itself. The List Files and Folders node thus recurses through the folder structure by itself, streaming the paths of all the files to the Path To String node, which in turn streams the paths converted to strings to the log function.

Logging all the files in a folder using flow-based programming approach
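The flow-based diagram can be approximated in Python with `asyncio` queues as wires. This is a sketch under several assumptions of mine rather than a faithful rendering of any flow-based runtime: the node functions, the use of `Queue.join()` to detect that the graph has gone quiet, and the rule that an input packet is either a single folder or a list of folders are all choices made for this example. The queues are unbounded so that the node feeding itself can never block on its own input, and the directory listing is done with blocking calls for brevity.

```python
import asyncio
import tempfile
from pathlib import Path


async def list_files_and_folders(folders_in: asyncio.Queue, files_out: asyncio.Queue) -> None:
    while True:
        packet = await folders_in.get()
        # Assumption: a packet is either one folder or a list of folders.
        for folder in (packet if isinstance(packet, list) else [packet]):
            entries = sorted(folder.iterdir())
            for entry in entries:
                if entry.is_file():
                    await files_out.put(entry)
            subfolders = [e for e in entries if e.is_dir()]
            if subfolders:
                # The loop wire: the node sends subfolders back to its own input.
                await folders_in.put(subfolders)
        folders_in.task_done()


async def path_to_string(paths_in: asyncio.Queue, strings_out: asyncio.Queue) -> None:
    while True:
        path = await paths_in.get()
        await strings_out.put(str(path))
        paths_in.task_done()


async def log_string(strings_in: asyncio.Queue, log: list) -> None:
    while True:
        log.append(await strings_in.get())
        strings_in.task_done()


async def run_graph(source: Path) -> list:
    folders, paths, strings = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    log: list = []
    tasks = [
        asyncio.create_task(list_files_and_folders(folders, paths)),
        asyncio.create_task(path_to_string(paths, strings)),
        asyncio.create_task(log_string(strings, log)),
    ]
    await folders.put(source)  # the initial information packet
    # Nodes wait forever, so wait for each wire to drain, then stop the nodes.
    await folders.join()
    await paths.join()
    await strings.join()
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return log


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_text("a")
    (root / "sub" / "b.txt").write_text("b")
    flow_names = sorted(Path(p).name for p in asyncio.run(run_graph(root)))

print(flow_names)  # ['a.txt', 'b.txt']
```

Notice that there is no recursive function call anywhere; the recursion is expressed purely by a node sending packets to its own input. The price is that something outside the nodes has to decide when the graph is finished.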


There are variations of both models, and I am not going to compare the variations in detail. The intent of this post is to paint a general picture of the two approaches with a simple but non-trivial example. Furthermore, both models have their benefits and shortcomings and can very well coexist in different parts of the same application, if the programming language supports it. What do you think are the benefits and shortcomings of each of the two models?


7 Comments

Marcos

about 5 months ago

When I first learned LabVIEW, I remember thinking for loops and case statements were counter-intuitive. I expected something a lot more like flow-based programming. So, for data flow programming we have a very mature, robust, feature complete IDE and compiler called LabVIEW. Is there something equivalent for flow-based programming?

JackDunaway

about 5 months ago

Tomi, I'm happy to see your return on a newly-branded ExpressionFlow! Continuing to compare synchronous/asynchronous flow, it's helpful to consider which is better suited for the layer of abstraction and business domain in which you're designing.

_tl;dr Synchronous dataflow is good for modeling and implementing parallel, procedural business logic; asynchronous dataflow is a good model for designing concurrent systems and services; applications likely need both these things._

Considering synchronous dataflow, procedural business logic may be performed in parallel. Going back up to your top diagram, "List Files and Folders" represents a synchronous method; blocks on the diagram represent blocking functions/methods.

Considering asynchronous dataflow, this more closely represents actor-oriented or service-oriented design. Each block, such as "List Files and Folders", now represents a concurrent service with a request/reply API, behind which sits some sort of incoming message queue which dispatches handlers to perform appropriate business logic.

For practical implementations, I find it helpful to apply different semantics to each type of dataflow. Typically, sync flow is better suited within the same execution system providing higher levels of determinism, often by-value with a healthy dose of the functional paradigm, and thinking of execution as "parallel". Async flow is better suited for actor-oriented or service-oriented design of systems, with distributed state, typically referring to these actors/services/active objects by-reference as being "concurrent".

Asynchronous and synchronous dataflow rarely compete; their semantics simply tend to map based on the layer of abstraction. Look at the application from 10k feet, you'll see async dataflow; look at one of those blocks from 1k feet, you'll start to see synchronous dataflow (or, not dataflow at all, depending on the language).

Any language can provide asynchronous dataflow, so long as it provides facilities to build message transports and message handlers running concurrently (whether within the same context, or distributed). Which language or type of language is best suited for asynchronous dataflow? Simply, the one with the best/most active community developing those abilities! (Those abilities come either as frameworks, or first-class language features.)

What languages support the best synchronous dataflow? That question is trickier; personally, I've found success with visual representations like LabVIEW, where procedural syntax is presented as a diagram that looks like your model above. (LabVIEW was a tough sell and bumpy transition coming from more traditional, popular languages, but the syntax that so cleanly represents the semantics of the underlying computational model eventually won me over.)

Final thought: I see asynchronous flow as having a superset of the capabilities of synchronous flow diagrams -- asynchronous flow may be constrained to behave identically to synchronous flow. Consider an HTTP request (a form of asynchronous dataflow) encapsulated within a synchronous method in an API -- the application calling that method sees nothing but a blocking method, with the async flow abstracted from the calling application. Whether in the application codebase, a framework used by the application, or the language itself, even the tiniest unit of work (like "List Files and Folders") may be considered an Actor (Service, etc.), where the complexity of such a system is reduced as necessary by wrapping into a synchronous API.

Ideally, a language syntax expresses parallel/concurrent execution for both types of dataflow in a way that's natural for humans to synthesize, both for building procedural business logic and building systems. Ideally, the syntactical burden is minimized by a language or framework, enabling both types of dataflow implementations.

Martin Clausen

about 5 months ago

I think the asynchronous flow-based programming approach is much more intuitive. For me the whole metaphor of flow breaks down when introducing a "dead-end" function that represents a recursive call to an "up-stream" function.

Tomi Maila

about 5 months ago

Martin, perhaps my example of the recursive call wasn't the best. Typically, recursive calls in synchronous dataflow wouldn't be dead-end calls but would return some data, in exactly the same way as recursive calls to functions in other programming languages.

Yair

about 5 months ago

I see you're trying to torture the LV programmers by swapping the pink and cyan colors on the string and path types ;) Anyway, while I understand that the arrows in the second example represent async communication channels, I'm not entirely clear about the loop there - do the diagram and the list function in this example automatically know how to handle the two types (path and array of paths) or is this simply a detail you glossed over in order to simplify the diagram?

Tomi Maila

about 5 months ago

Yair, you raise a good point. In the second diagram, all wires are asynchronous streams. The List Files & Folders function would assume the input to be an asynchronous stream of information packages, each package being either a single folder or a list of folders. The function would be polymorphic in the sense that it could handle both types of information package, i.e. either a list (arriving through the loop wire) or a single folder (the initial information package).

