dataflow

Synchronous Functional Dataflow Programming & Closures of Named Functions

When I originally selected the name for this blog many years ago, I had the topic of this post in mind. For me, dataflow programming has always resembled functional programming, and many concepts that are natural in functional programming translate very well into the dataflow world. The key concept in functional programming is the idea that any expression can be passed around as a value within the application. Translating this idea into dataflow programming would mean that one could pass around subdiagrams of a dataflow diagram.

Passing around function references is nothing new to dataflow programming. For example, LabVIEW has supported references to named functions for a long time. However, passing around references to named functions isn’t very interesting, as the expressivity of named functions doesn’t adapt to the context.

To take advantage of the full power of functional programming, the concept of closures needs to be introduced to dataflow programming. A closure is a function definition or reference together with an environment that defines some of the input values of the function. When passing around functions in functional programming, one actually passes around closures, i.e. functions with their partially predetermined environments.

Let’s look at how this concept would adapt to synchronous dataflow programming languages such as LabVIEW. In the image below we illustrate the construction of a closure of a named function “add”.

Constructing a closure in functional synchronous dataflow programming, passing it around as a value and calling it using an apply method

We introduce a new closure structure (on the left) that constructs a closure of the function inside the structure. The environment of the closure is determined by connecting wires from outside the structure to one or more of the input terminals of the function within it. In the example above we have constructed a closure of an “add” function by fixing its first input to the constant value 3. Naturally, the value didn’t have to be constant; it could just as well be any value flowing in the wire. The output of the closure structure returns an expression representing a function definition f: f(x) = x + 3.

The closure can be called by passing it to an apply structure (on the right). The apply structure is a function that expects a function definition as an input and allows the programmer to bind the unbound terminals of that function. Executing the apply structure evaluates the function received on its function input terminal (the green terminal): the unbound terminals are bound to the values supplied at the time of execution, and the function is then evaluated.

In the above example, a closure representing a partial function is first constructed by binding the number three to one of the terminals of the add function, and this closure is then passed forward to an apply structure. The apply structure binds the other terminal to the value seven and evaluates the function. The result of the evaluation is 3 + 7 = 10.
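A rough text-based analogue of this closure-and-apply pipeline can be sketched in Python, where `functools.partial` plays the role of the closure structure and an ordinary call plays the role of the apply structure (the function name `add` follows the example; the rest of the naming is my own):

```python
from functools import partial

def add(x, y):
    """Named function; the closure structure captures one of its inputs."""
    return x + y

# Closure structure: bind the first input to 3, yielding f(x) = x + 3.
add_three = partial(add, 3)

# The closure can now be passed around as a value like any other datum.
# Apply structure: bind the remaining terminal (7) and evaluate.
result = add_three(7)
print(result)  # → 10
```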

Using a closure as in the above example is not very useful in itself; the example is provided for illustrative purposes. In future posts we will discuss design patterns for functional synchronous dataflow programming in more depth. We will also introduce closures of anonymous functions defined in place on the diagram. The principles introduced here can be extended to the flow-based programming paradigm, and I will discuss this in a future post.

Embedding synchronous dataflow into a flow-based programming diagram

In my previous post I discussed combining flow-based programming and synchronous dataflow in a single visual programming language. The goal of such a language would be to combine the best of both flow-based approaches, allowing the developer to choose the paradigm that best suits the problem at hand. Today I want to take a deeper look into this hybrid dataflow model and introduce a specific synchronous structure that, from the outside, appears and acts like a normal asynchronous flow-based programming node but internally encloses a synchronous dataflow diagram.

Consider an image processing task of smoothing an image by replacing each pixel value with the average of the rectangular 3×3 neighborhood of the original pixel. We want to implement such an algorithm in our hybrid flow-based programming application. The input and output of such a diagram need to be streams of images. Perhaps we have a flow-based programming diagram where we read images from a folder and pass them to the smoothing algorithm, which then passes them to a node that saves them back to disk.

A Flow-Based Programming application that iterates over all of the images in a folder and smooths them using a Smooth Image subdiagram.

Now let’s take a look at the implementation of the Smooth Image subdiagram from the flow-based programming example above. The Smooth Image subdiagram, illustrated in the image below, takes advantage of the proposed new synchronous structure. The synchronous structure in the subdiagram processes a stream of images, each of which is represented by a single information package (or an object in an OOP-flavored language). The images arrive asynchronously from the previous Read Image step. In the examples, asynchronous wires are drawn with dotted lines and synchronous wires with solid lines.

Embedding synchronous dataflow diagram into a flow-based programming diagram using a specific synchronous structure

The synchronous structure starts executing upon startup of the subdiagram and keeps executing until the input image stream is closed. The diagram waits for incoming images (information packages) in the image stream. Each image in the example is expected to be a two-dimensional array of numeric values representing the gray value of each pixel.

Every time an image arrives at the left connector of the synchronous structure, the content of the structure is executed exactly once in a synchronous dataflow manner. At the start of each execution of the content of the synchronous structure, the left-hand side image stream input takes the value of the received grayscale image. The image is then iterated with two nested for loop structures, the outer loop iterating over the rows of the image and the inner loop iterating over the columns. Within the inner loop, the 3×3 subarea around each pixel is averaged and the averaged value for each pixel is returned. The right-hand edges of the for loops reconstruct the averaged pixels into the shape of the original image, and the smoothed image is passed to the right edge of the synchronous structure. When its content has completed executing, the synchronous structure returns the image to the surrounding asynchronous diagram via its right-hand side connector.
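The per-pixel averaging performed by the two nested for loop structures could be sketched in Python roughly as follows; note that the handling of border pixels (averaging only the neighbors that fall inside the image) is my own assumption, as the post does not specify edge behavior:

```python
def smooth(image):
    """Replace each pixel with the average of its 3x3 neighborhood.

    image: two-dimensional list of numeric gray values.
    Out-of-range neighbors at the image borders are simply skipped.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):           # outer for loop: each row
        for c in range(cols):       # inner for loop: each column
            total, count = 0, 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += image[rr][cc]
                        count += 1
            out[r][c] = total / count
    return out
```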

This hybrid dataflow approach using the synchronous structure makes it easy to enhance flow-based programming with synchronous dataflow subdiagrams. This greatly increases the expressive power of visual dataflow and flow-based programming languages, equipping them both for complicated low-level operations requiring synchronization and for high-level orchestration, which is well expressed with flow-based programming diagrams.

Hybrid Dataflow – Convergence of Flow-Based Programming with Synchronous Dataflow

In my previous post I discussed the differences between two visual programming paradigms: synchronous dataflow programming and asynchronous flow-based programming. Although the two paradigms approach visual programming from slightly different perspectives, both converge on some subset of the visual programming domain. Let’s look at a very simple example of multiplying two constant numbers.

Flow diagram for multiplying two constants

The above diagram would multiply the two constant numbers 7 and 5 and return 35 when evaluated. The diagram would be exactly the same in synchronous dataflow and in asynchronous flow-based programming; in both approaches it needs to be evaluated only once. Flow-based programming and synchronous dataflow can thus be seen as converging into a single approach when only constant values are used. Of course, one cannot solve real computational problems with constant values alone.

What if one of the inputs to the multiplication primitive was an asynchronous stream of numeric values instead? For each numeric value in the stream, the primitive would multiply the stream value by the provided constant value and stream the output values out as they are evaluated. This is the approach that the flow-based programming paradigm takes, with dialects that support replacing streams with constant values. This approach is illustrated in the image below.
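As a rough text-based sketch of this behavior, the stream can be modeled as a Python iterator and the node as a generator that multiplies each arriving value by the one-time value (the names `multiply_stream`, `xs`, and `y` are my own):

```python
def multiply_stream(xs, y):
    """Multiply each value arriving on stream xs by the one-time value y,
    streaming results out as they are evaluated."""
    for x in xs:
        yield x * y

# Stream of values on x, constant 5 on y.
out = list(multiply_stream(iter([1, 2, 3]), 5))
print(out)  # → [5, 10, 15]
```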

Flow diagram for multiplying a constant with a stream

I have used thick connector lines to illustrate streams and thin connector lines to illustrate one-time values. In the above example, one of the inputs (x) is treated as a stream while the other input (y) is treated as a value.

Indeed, there is no reason why a hybrid of the two approaches to dataflow programming couldn’t exist in a single flow-based programming language. We are already using a hybrid approach in the above diagram by feeding both a stream input and a constant value input to the same node. In a general convergent hybrid approach, each node with a stream input would continue executing as long as the stream is valid (i.e. not closed by the runtime). The node would process each element in the stream one at a time, in the conventional flow-based programming manner. Non-stream one-time inputs would be treated in the synchronous dataflow way: a one-time value input to a flow-based programming node would be treated as an infinite stream that always returns the same (not necessarily constant) value.
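The idea that a one-time input behaves as an infinite stream of the same value can be sketched in Python using `itertools.repeat` (the node and stream names here are my own):

```python
import itertools

def multiply_node(x_stream, y_stream):
    """A node that pairs up its two input streams and multiplies element-wise;
    it keeps executing as long as both streams deliver values."""
    for x, y in zip(x_stream, y_stream):
        yield x * y

# A one-time value is modeled as an infinite stream of the same value.
xs = iter([1, 2, 3])          # asynchronous stream input
ys = itertools.repeat(5)      # one-time (not necessarily constant) input
results = list(multiply_node(xs, ys))
print(results)  # → [5, 10, 15]
```

Here `zip` stops once the finite stream closes, which mirrors the node ceasing to execute when its stream input is no longer valid.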

Consider the following example: synchronously multiply two input values, and use the result of that multiplication as one of the inputs to an asynchronous multiplication. This example is illustrated in the image below. The operation multiplies the result of the leftmost operation with each value in the stream x’. The result is evaluated whenever values arrive on the stream x’, just like in our past examples.

Hybrid dataflow approach combining flow-based programming paradigm with synchronous dataflow programming.

Combining synchronous dataflow and asynchronous flow-based programming as first-class citizens in the same visual programming language has some clear advantages. Some problems are naturally easier to both comprehend and solve with a synchronous approach, and some flow-based programming design patterns would become simpler if synchronous dataflow were a first-class citizen of the language. This is especially true for low-level operations, where synchronous dataflow is often the way programmers naturally approach problems. The asynchronous approach works very well for orchestrating the system and is very natural for designing system architectures and defining dependencies.

Dataflow and Flow-Based Programming – Two Approaches To Visual Programming

In this blog I am going to cover topics around two visual programming paradigms, namely dataflow programming and flow-based programming. Literally speaking, flow-based programming is one model of dataflow programming, but when people refer to flow-based programming they often mean asynchronous dataflow programming specifically, and when people refer to dataflow programming they mean synchronous dataflow programming. NoFlo would be an example of the asynchronous flow-based programming model, whereas LabVIEW represents the synchronous dataflow approach. As most readers are familiar with at most one of the two paradigms, let me try to explain the similarities and differences.

In both programming models, applications are represented by graphs consisting of nodes and directed connections. The nodes represent the functional operations in the graph, and the connections define how data is passed between the operations. Variables become unnecessary, as data can be passed around purely by defining connections in the graph.

The synchronous dataflow programming approach resembles the conventional text-based programming approach a little more closely. In the same way as a conventional text-based program is executed once from top to bottom in sequential order, a synchronous dataflow diagram is executed once, starting with data from all the input terminals of the diagram. Each node or subdiagram in the diagram is executed once, when the data on all of its input terminals becomes available. Once a node completes execution, the data becomes available on all of its output terminals, and thereby to all nodes directly downstream from it. This data then “flows” to the input terminals of the downstream nodes, allowing them to start executing, and the process continues in the same way until the whole diagram has completed. A special case is subdiagram structures such as the for loop, which repeats its content multiple times, for example to iterate over an array input.
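This firing rule, where each node executes exactly once as soon as all of its input values are available, can be sketched as a small interpreter in Python (the graph encoding and all names are my own, and loop subdiagrams are left out):

```python
def evaluate(nodes, edges, inputs):
    """Fire each node exactly once, as soon as all of its inputs are available.

    nodes:  {name: function}
    edges:  {name: [names of upstream nodes or input terminals]}
    inputs: {input terminal name: initial value}
    """
    values = dict(inputs)
    pending = set(nodes)
    while pending:
        # A node is ready when data on all of its input terminals is available.
        ready = [n for n in pending if all(u in values for u in edges[n])]
        if not ready:
            raise RuntimeError("cycle or unconnected input terminal")
        for n in ready:
            values[n] = nodes[n](*[values[u] for u in edges[n]])
            pending.remove(n)
    return values

# (a + b) * b with a = 2 and b = 3: "add" fires first, then "mul".
vals = evaluate(
    nodes={"add": lambda a, b: a + b, "mul": lambda s, b: s * b},
    edges={"add": ["a", "b"], "mul": ["add", "b"]},
    inputs={"a": 2, "b": 3},
)
print(vals["mul"])  # → 15
```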

Below is an example of a synchronous dataflow implementation of recursively listing all files in a folder hierarchy, using a List Files and Folders function starting from a source folder and then logging the file names using a Log String function. The List Files and Folders function returns a list of files on its top output terminal and a list of folders on its bottom output terminal. These terminals become available, populated with the lists of files and folders in the Source folder, immediately once the List Files and Folders node finishes executing. To log the files in the Source folder, the Files list is iterated using a loop. Each file path is converted to a string and logged one at a time. Concurrently, all the subfolders under the Source folder are recursively passed to the function itself, again one at a time using another loop, to recursively log all the files in the subfolders of the Source folder.

Logging all the files in a folder using dataflow programming approach

The asynchronous flow-based programming approach resembles event-based and message-passing programming models more than the synchronous dataflow model does. The nodes in the diagram are connected with asynchronous messaging channels (e.g. queues), and all of the nodes continuously wait for messages to arrive at their inputs. Whereas in synchronous dataflow programming each node executes only once, in the flow-based programming model the nodes are constantly waiting for new asynchronous messages to arrive from other nodes.

Below is the same example as an asynchronous flow-based programming implementation of recursively listing all files in a folder hierarchy, again using a List Files and Folders function starting from a source folder and then logging the file names using a Log String function. Now, as List Files and Folders is continuously waiting for inputs on its input terminal, it can asynchronously pass the subfolders directly to itself. As such, the List Files and Folders node recurses through the folder structure by itself, streaming the paths of all the files to the Path To String node, which further streams the paths, converted to strings, to the log function.

Logging all the files in a folder using flow-based programming approach

There are variations of both models, and I am not going to compare the variations in detail. The intent of this post is to paint a general picture of the two approaches with a simple but non-trivial example. Both models have their benefits and shortcomings and can very well co-exist in different parts of the same application, if supported by the programming language. What do you think are the benefits and shortcomings of each of the two models?