Limits of for-loop parallelism, how parallel they really execute?

by Tomi Maila, Sep 15, 2009 at 7:21 pm

Today I attended the LabVIEW Developer Education Day in Helsinki (Espoo), Finland. The NI application engineer introduced the new parallel for-loop structure released in LabVIEW 2009. The idea is that for-loop iterations can run in parallel when they do not depend on one another. The concept is nice, and I have been waiting for NI to introduce it for some years. I was positively surprised to notice that LabVIEW 2009 ships with this new feature.

The parallel for-loop was not exactly what I had hoped for, though. I was hoping that the LabVIEW compiler would automatically parallelize parallelizable loops; after all, this is in theory a task that the compiler or the runtime environment could do. However, implementing such a compilation technique with the current LabVIEW runtime scheduler may have been too difficult. In the implementation introduced in LabVIEW 2009, the programmer configures for-loop parallelism by defining the number of instances that work on loop iterations in parallel. The number of instances can also be defined at runtime, or at least that is how I understood it.

So I decided to give parallel for-loops a try and test their limits. Do they really behave the same way as if you placed the same code in parallel on the block diagram, or are there shortcomings you should be aware of? The first test I decided to wire makes the loop iterations depend on one another through shared queues. I made two parallel for-loops. The first loop inserts elements into a first set of queues and then waits for the elements to appear on a second set of queues. The second loop gets the elements from the first set of queues and then inserts the same elements into the second set of queues. I set both loops to run in parallel and set the number of workers equal to the number of iterations. If the loop iterations really all run in parallel, the same way parallel code on a block diagram runs, the application will not hang in a deadlock. If the parallelism is not complete, however, the application will hang.

Parallel for-loops

So what is the result of the test? Well, this small test application hangs if you set the number of iterations high. The hanging threshold seems to depend on the number of parallel loop instances set at development time; the number of workers defined at runtime does not alone determine the parallelism. The result indicates that there is a difference between copying code to multiple parallel instances on the block diagram and relying on for-loop parallelism. If the code in your loop depends on shared data such as queues, data-value instances or notifiers, be aware of the possibility of a deadlock. Also, when you use somebody else's code in your parallel for-loop, think carefully about whether a deadlock is possible.
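Since a block diagram cannot be pasted as text, here is a rough Python sketch of the same experiment using threads and queues. It is only an imitation under stated assumptions: the function name `run_queue_test`, the round-robin split of iterations among workers, and the choice to let the second loop's workers visit their iterations in reverse order are all made up for illustration and say nothing about how the LabVIEW scheduler actually assigns iterations. A timeout stands in for the hang, so the sketch reports a deadlock instead of freezing.

```python
import queue
import threading


def run_queue_test(n_iterations, n_workers, timeout=0.5):
    """Return True if both loops finish, False if a worker timed out waiting."""
    first = [queue.Queue() for _ in range(n_iterations)]
    second = [queue.Queue() for _ in range(n_iterations)]
    stuck = threading.Event()

    def first_loop_body(i):
        # Insert into the first set of queues, then wait on the second set.
        first[i].put(i)
        second[i].get(timeout=timeout)

    def second_loop_body(i):
        # Take from the first set of queues and pass the element on.
        second[i].put(first[i].get(timeout=timeout))

    def worker(body, iterations):
        # One worker executes its share of the iterations sequentially.
        try:
            for i in iterations:
                body(i)
        except queue.Empty:
            stuck.set()  # a wait never completed: treat it as a deadlock

    threads = []
    for w in range(n_workers):
        # Assumption: iterations are dealt out round-robin to the workers.
        share = list(range(w, n_iterations, n_workers))
        threads.append(threading.Thread(target=worker, args=(first_loop_body, share)))
        # Assumption: the second loop visits the same share in reverse order.
        threads.append(threading.Thread(target=worker, args=(second_loop_body, share[::-1])))

    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return not stuck.is_set()


if __name__ == "__main__":
    print("4 iterations, 4 workers:", run_queue_test(4, 4))
    print("4 iterations, 2 workers:", run_queue_test(4, 2))
```

With as many workers as iterations, every iteration gets its own thread and the exchange completes. With fewer workers, a worker of the first loop waits on an element that the matching worker of the second loop would only deliver after its own earlier wait is satisfied, so both sides time out. If both loops visited their iterations in the same order, this particular sketch would not get stuck, so the hang depends on the order in which workers pick up iterations as well as on the worker count.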

Number-of-threads.vi

EDIT Sep 17, 2009: Mary Fletcher from NI R&D explained the implementation of parallel for-loops in more detail in the comments of this post. The number of loop instances specified in the loop configuration dialog is the maximum number of workers that can work in parallel on the loop iterations. The actual number of workers is set at runtime to the value of the P terminal if that is smaller than the maximum specified in the configuration dialog. If P is greater than the maximum, the maximum number of workers is used instead. If P is not connected, LabVIEW uses as many parallel workers as there are logical processors in the machine, but never more than the maximum specified in the configuration dialog. If there is a parallel loop within another parallel loop, only the outer loop is parallelized. This will change in LabVIEW 2009 SP 1, where LabVIEW will parallelize both loops, resulting in P*P’ workers for the inner loop. This limitation does not apply when the inner parallel code sits in a subVI called from within the parallel loop. If the number of workers specified at configuration time and the number specified at runtime are both equal to or greater than the number of iterations, all loop iterations execute truly in parallel, and you can safely use design patterns such as producer-consumer between loop iterations. Thank you, Mary, for this valuable information.
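The worker-count rules above can be condensed into a few lines of Python. This is only my restatement of Mary's explanation, not NI code: the function and parameter names are invented, `os.cpu_count()` stands in for "logical processors in the machine", and the pre-SP1 branch of `inner_loop_workers` is my reading of "only the outer loop is parallelized".

```python
import os


def effective_workers(max_instances, p=None, logical_processors=None):
    """Workers actually used by one parallel for-loop.

    max_instances: number entered in the configuration dialog (the cap).
    p: value wired to the P terminal, or None if P is not connected.
    """
    if p is None:
        # P not wired: use the number of logical processors instead.
        if logical_processors is None:
            logical_processors = os.cpu_count() or 1
        requested = logical_processors
    else:
        requested = p
    # The dialog box number is always a cap.
    return min(requested, max_instances)


def all_iterations_concurrent(n_iterations, max_instances, p):
    """True when every iteration gets its own worker."""
    return effective_workers(max_instances, p) >= n_iterations


def inner_loop_workers(p_outer, p_inner, sp1=True):
    """Instances working on an inner parallel loop nested directly in another.

    Assumption: before SP1 the inner loop runs sequentially inside each
    outer worker, so only p_outer instances touch inner-loop work.
    """
    return p_outer * p_inner if sp1 else p_outer


if __name__ == "__main__":
    print(effective_workers(max_instances=10, p=4))
    print(effective_workers(max_instances=10, p=20))
    print(all_iterations_concurrent(n_iterations=20, max_instances=10, p=20))
```

The last call corresponds to the situation in my test: 20 iterations against a cap of 10 instances, so some workers have to run more than one iteration in sequence.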

10 Comments

  • MaryFletcher Said:

    Interesting example on the potential dangers of using these types of objects in parallel. In this piece of code, a deadlock can occur when [N] is greater than the number of generated instances in the configuration dialog box (10 in your diagram). This causes some of the worker loops to execute more than one iteration sequentially. When [N] is 20, a worker from the top loop will operate on queues at indices 9 and 10 in that order, while a worker from the bottom loop will operate on queues 10 and 9 in that order. The same deadlock happens if you disable parallelism on both loops.

    The number you enter in the configuration dialog is the maximum number of loop instances, and the number you wire to [P] is the number of those that you want to use. The dialog box number is a cap.

    LabVIEW warns you about using queues, local variables, etc. in parallel for loops, assuming you have warnings enabled, but it doesn’t stop you. You can do some nifty things with these types of objects (see the example Parallel For Loop Iteration Order.vi), so we don’t forbid it.

  • MaryFletcher Said:

    By the way, if you want to easily find for loops that can be made parallel, you can use the “parallel for loop detector” in Tools>>Profile>>Find Parallelizable Loops. LabVIEW doesn’t automatically parallelize all of your for loops for you, since there can be a slight performance penalty on small loops.

  • Tomi Maila Said:

    Mary, I assume from your insider-sounding information that you work for NI R&D. Thanks for clarifying how parallel for loops function behind the scenes. I tried to find the information in the LabVIEW help but couldn’t, so I ended up testing the functionality.

    If I understood correctly, LabVIEW generates the number of loop instances specified at development time. The runtime scheduler then divides the loop iterations among these loop instances. The scheduler uses either all the parallel loop instances or the number of instances given by the value of the P terminal, whichever is smaller. If P is not connected, all loop instances are used. If the number of loop instances is larger than the number of iterations, all iterations are executed truly in parallel.

    What happens if there are parallel items within a single iteration? Are these also executed truly in parallel, even if the number of iterations is equal to the number of loop instances?

    I am also a little confused about the terminology. In the dialog window you specify something called loop instances, but with the P terminal you specify something called workers. Is there documentation that clarifies the difference between workers and loop instances?

  • MaryFletcher Said:

    Sorry for the terminology confusion. Loop “instances” and “workers” are the same thing.

    What you restated is correct, except that when you don’t wire anything to [P], we actually try to use as many workers as there are logical processors (cores) on your machine.

    Are you asking what happens when the loop contains code that could also run in parallel? Parallelism in the code of the loop body isn’t restrained by the number of workers you are using for the loop. Your computer may get overwhelmed by too much parallelism though.

  • Tomi Maila Said:

    Yes, I am asking what happens when the loop contains code that would also need to run in parallel. Say I had N workers working on a loop with N iterations, and each iteration consisted of two enqueue-dequeue pairs instead of one as in my present example code: would we have a deadlock? Well, I tested it and no deadlock occurs, which confirms true parallelism.

    What if I have a parallel loop inside a parallel loop? The outer loop has N1 workers and the inner loop has N2 workers. How many parallel instances are actually solving the problem of the inner loop? N1*N2?

  • MaryFletcher Said:

    In 2009 SP1, if you put a parallel loop with P workers inside a parallel loop with P’ workers, you would get P*P’ parallel instances solving the problem. (Without the service pack, only the outer loop will execute in parallel.)

  • Tomi Maila Said:

    I assume the P*P’ worker multiplication rule of LV 2009 SP 1 also applies to subVI calls from within a parallel loop if the subVI contains another parallel loop. And in the case of a recursive VI call, we could get a whole army of workers :) Indeed, this gives me a nice idea. With a recursive VI call, we can specify the number of workers at runtime. We simply call the VI itself recursively from within the loop until the required number of workers has been reached. Maybe, Mary, you should add this as a test case for LabVIEW 2009 SP 1: “Generating arbitrary number of workers with recursive SubVI call”.

  • MaryFletcher Said:

    Yep, the multiplication rule applies to code in SubVIs called from parallel for loops, even without the service pack. Thanks for the recursive parallelism test case idea.

  • AristosQueue Said:

    Tomi, in answer to your question, yes, Mary is part of LV R&D, and she is a primary developer of the parallel for loop, and one of our resident experts in parallel architectures.

  • Parallel For loop | ByteLABS Said:

    [...] article on expression flow, on this subject http://expressionflow.com/2009/09/15/limits-of-for-loop-parallelism-how-parallel-they-really-execute…. This entry was posted by admin on January 9, 2010 at 3:11 pm, and is filed under LabVIEW. [...]
