Unlimited parallelism & concurrency with recursive dataflow
In my previous post I introduced the design pattern of worker pool based on dataflow recursion. In this post the pattern is revisited. I explain the theoretical concept of unlimited parallelism and concurrency that is a consequence of recursion in dataflow based programming languages. Furthermore I present how worker threads can be reused to manage system resource consumption.
Before detailed study of the dataflow recursion based unlimited parallelism and concurrency, let’s first specify what we mean with these two concepts. Parallelism is all about performance whereas concurrency is about task independence. A parallel program is executed simultaneously by multiple processor cores for performance reasons. A concurrent program consists of multiple independent tasks that need to be executed all at the same time without one task blocking the execution of the others. Concurrent tasks need not to be executed in parallel by multiple processor cores but can be executed sequentially by some scheduling algorithm. Dataflow based language may not really distinguish between the two concepts. In LabVIEW based programs concurrent tasks are executed in parallel by the scheduling system if parallel threads are available.
In a dataflow program a task can be executed immediately when inputs of the task become valid. If the inputs of multiple tasks are ready at the same time, the tasks can be executed concurrently or in parallel. The execution system makes the decision when the task is actually executed. In eager-evaluations based scheduling the tasks are en-queued to the execution system immediately when the inputs become valid. In lazy-evaluation based scheduling the tasks are evaluated if they have a side effect or when another task that has a side effect depends on their result. A side effect is an operation that interacts with the world outside the program itself such as display, network, disk or measurement device. In the image below the data is flowing from left to right. The horizontal lines represent tasks and vertical lines represent data dependency between the tasks. During the execution of the diagram below, the number of concurrent tasks peaks at three.
If a single task is a function (subVI in LabVIEW) containing inner concurrent structure, then the inner concurrency of the function needs to be take into account when counting the concurrent tasks. In the diagram below there is a single function call within the outer diagram. The function call consists of three concurrent tasks and the total number of concurrent tasks in the diagram is five.
The concurrent evaluation becomes more interesting when we add a recursive call to the diagram itself. Then the number of concurrent tasks can increase to infinity. Each recursive call to the diagram itself adds extra concurrent tasks to the dataflow diagram. The theoretically unlimited concurrency of recursive function call can be controlled by stopping the recursion using control structures at some level.
As I explained in previous post, we can utilize this unlimited recursive concurrency to programmatically generate desired number of concurrent threads for task execution. The most obvious way to utilize recursive concurrency is to have a function to execute some specific task and concurrently call itself for executing other set of tasks. Combining this with the command design pattern or passing functions as value in functional programming, we can set-up a system for executing any number runtime specified tasks concurrently and parallel. There is no need to have language support for spawning new concurrent threads, the dataflow programming languages can support thread spawning within the dataflow paradigm as explained.
The pseudo-code below in imaginary parallel dataflow language illustrates parallel execution of a list of tasks. The code below is pure dataflow. However one can mix message passing concurrency to pass messages between the parallelly executing tasks or to create a parallel thread that spawns functions passed to it as messages. Pseudo-code added on Nov 12 13:41 UCT.
function parallelExecutor(Nil) {} // stop recursion if list is empty
function parallelExecutor(tasks : List[Function]) {
// let's make immutability explicit, not really necessary
val task = tasks.head; // immutable value task
// execute the first task in the tasks list
task.apply(); // execute task
// tasks.tail have no dataflow dependency on immutable value task
// therefore all functions that take tasks.tail as a parameter can
// be executed in parallel with task.apply()
parallelExecutor(tasks.tail) // execute rest of the tasks in parallel to task
}
I added an example implementation to my previous LabVIEW example project that supports recursive instance reusage for task execution after the previous task has been finished. The implementation is not based on pure dataflow but also utilizes asynchronous queues for passing information between the concurrently executing workers.
Worker Pool.zip 1.1
Please login or register now.
1 Comment
Make A CommentComments RSS Feed TrackBack URL
Leave a comment
You must be logged in to post a comment.

(4 votes, average: 4.50 out of 5)




November 12th, 2009 at 1:42 am
[...] Full Story [...]