Using the Tasking Feature

Five Tips, Tricks and Idioms in Using the OpenMP 3.0 Tasking Feature

by Yuan Lin, Sun Microsystems OpenMP team

1. Use single to start parallel region with a root task

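To start a parallel region that has one initial root task, use the following idiom. The single construct lets exactly one thread run the root task, and nowait lets the other threads go straight to the barrier at the end of the parallel region, where they can execute the child tasks.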

#pragma omp parallel
{
   #pragma omp single nowait
   {
      /* this is the initial root task */

       #pragma omp task
       {
          /* this is first child task */
       }

       #pragma omp task
       {
          /* this is second child task */
       }
   }
}
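
As a minimal sketch of the idiom with real work in the child tasks, the function below runs two tasks under one root task and combines their results. The helpers first_job() and second_job() and the name run_two_children() are hypothetical stand-ins, not part of the original idiom.

```c
/* Hypothetical helpers standing in for real work. */
static int first_job(void)  { return 1; }
static int second_job(void) { return 2; }

/* Runs two child tasks under one root task and returns the sum of their results. */
int run_two_children(void)
{
    int r1 = 0, r2 = 0;

    #pragma omp parallel
    {
        #pragma omp single nowait
        {
            /* this is the initial root task */

            #pragma omp task shared(r1)
            r1 = first_job();

            #pragma omp task shared(r2)
            r2 = second_job();
        }
    }   /* implicit barrier: both child tasks have completed here */

    return r1 + r2;
}
```

Reading r1 and r2 is safe only after the parallel region ends, because that is where the child tasks are guaranteed to have finished.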

2. Avoid creating extra tasks: a child task already executes concurrently with its parent

Suppose foo() in the following piece of code is called within an OpenMP task, and you want A() and B() to execute concurrently by creating more tasks. How would you do that?

void foo ()
{
    A();
    B();
}

You could do the following:

void foo ()
{
    #pragma omp task
    A();

    #pragma omp task
    B();
}

But the following is often more efficient:

void foo ()
{
     #pragma omp task
     A();

     B();
}

There's no need to put B() in its own task, because it already runs in the task that executes foo().
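
If foo() needs the result of A() before it returns, a taskwait is required, because the child task may still be running when B() finishes. The following is a sketch under the assumption that A() and B() return int; the stub bodies are hypothetical.

```c
/* Hypothetical stand-ins for A() and B(). */
static int A(void) { return 10; }
static int B(void) { return 32; }

int foo(void)
{
    int a, b;

    #pragma omp task shared(a)
    a = A();               /* child task: may run on another thread */

    b = B();               /* runs in the current task, concurrently with A() */

    #pragma omp taskwait   /* a must be complete before it is read */
    return a + b;
}
```

Only one task is created here, so the runtime has one fewer task to allocate and schedule than in the two-task version.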

3. Create your own taskgroup construct

According to the OpenMP 3.0 specification, "the taskwait construct specifies a wait on the completion of child tasks generated since the beginning of the current task". What if you want to wait on the completion of only a subset of the child tasks, rather than all child tasks generated so far?

For example, suppose we have the following piece of code.

/* Compute f2 (A, f1 (B, C))
 */
int foo ()
{
    int a, b, c, x, y;

    a = A();
    b = B();
    c = C();
    x = f1(b, c);
    y = f2(a, x);

    return y;
}

Let's assume that A(), B(), C(), f1(), f2() can all be executed concurrently. However, because of the data flow, we cannot start executing f1() until B() and C() finish, and cannot start executing f2() until A() and f1() finish, as illustrated below.

     A -----------------+
     B ------+          |--> f2
             |---> f1 --+
     C ------+

If we create three tasks (one each for A(), B() and C()), then we want a point that waits for B() and C(), but not for A(), so that we can start f1() after this point. In other words, we want a taskgroup construct that allows us to do the following:

/* Compute f2 (A, f1 (B, C))
 */
int foo ()
{
    int a, b, c, x, y;

    #pragma omp task shared(a)
    a = A();

    #pragma omp _taskgroup_
    {
        #pragma omp task shared(b)
        b = B();

        #pragma omp task shared(c)
        c = C();
    }
    x = f1 (b, c);

    #pragma omp taskwait

    y = f2 (a, x);

    return y;
}

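However, OpenMP 3.0 tasking does not provide such a taskgroup construct. All is not lost: we can simulate the taskgroup construct by using a task construct with an if (0) clause and a taskwait construct, as illustrated below. With if (0), the encountering task is suspended until the new task completes, and the taskwait inside the new task waits only for the child tasks created within it.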

#pragma omp _taskgroup_             #pragma omp task if (0)
{                                   {
    ...                  ====>          ...
                                        #pragma omp taskwait
}                                   }

We can now write our code as follows:

/* Compute f2 (A, f1 (B, C))
 */
int foo ()
{
    int a, b, c, x, y;

    #pragma omp task shared(a)
    a = A();

    /* Simulated taskgroup: runs immediately and waits for B() and C() only */
    #pragma omp task if (0) shared (b, c, x)
    {
        #pragma omp task shared(b)
        b = B();

        #pragma omp task shared(c)
        c = C();

        #pragma omp taskwait
    }
    x = f1 (b, c);

    /* Wait for A() */
    #pragma omp taskwait

    y = f2 (a, x);

    return y;
}
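
For comparison, OpenMP 4.0 and later do provide a built-in taskgroup construct, which waits for the child tasks generated inside its region and their descendants. The sketch below assumes a compiler that implements OpenMP 4.0 or later. The stub bodies of A(), B(), C(), f1() and f2() and the driver run_foo() are hypothetical.

```c
/* Hypothetical stand-ins for the functions in the example above. */
static int A(void) { return 1; }
static int B(void) { return 2; }
static int C(void) { return 3; }
static int f1(int b, int c) { return b * c; }
static int f2(int a, int x) { return a + x; }

/* Compute f2 (A, f1 (B, C)) */
int foo(void)
{
    int a, b, c, x, y;

    #pragma omp task shared(a)
    a = A();

    #pragma omp taskgroup
    {
        #pragma omp task shared(b)
        b = B();

        #pragma omp task shared(c)
        c = C();
    }                      /* waits for B() and C(), but not for A() */
    x = f1(b, c);

    #pragma omp taskwait   /* waits for A() */
    y = f2(a, x);

    return y;
}

/* Hypothetical driver: calls foo() from a root task. */
int run_foo(void)
{
    int result = 0;

    #pragma omp parallel
    {
        #pragma omp single
        result = foo();
    }
    return result;
}
```

The task that computes A() is generated before the taskgroup region begins, so the end of the region does not wait for it.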

4. Do not use tasks where you can use worksharing for loops

In most programs, the iterations of an OpenMP worksharing for loop can be executed concurrently and still give the correct result. Therefore, if you convert a worksharing for loop into a loop with tasks (see the following example), you will probably still get the correct result.

/* An OpenMP worksharing for loop */
#pragma omp for
for (i=0; i<n; i++) {
    foo(i);
}

/* The above loop converted to use tasks */
#pragma omp single nowait
{
   for (i=0; i<n; i++) {
       #pragma omp task firstprivate(i)
       foo(i);
   }
}

But don't do that just because you can. A worksharing for loop (especially one that uses static scheduling) usually has lower overhead per iteration than a task loop.

Use a task construct for parallel while loops, such as pointer-chasing loops, that cannot be expressed efficiently using worksharing for loops. Do not replace worksharing for loops with task loops.
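
As an illustration of a pointer-chasing loop that suits tasks, the sketch below creates one task per node of a linked list. The node_t type and the process() and process_list() functions are hypothetical and serve only as an example.

```c
#include <stddef.h>

/* Hypothetical list node. */
typedef struct node {
    int value;
    struct node *next;
} node_t;

/* Hypothetical per-node work: doubles the stored value. */
static void process(node_t *p) { p->value *= 2; }

/* Walks the list in one thread and creates a task for each node. */
void process_list(node_t *head)
{
    #pragma omp parallel
    {
        #pragma omp single nowait
        {
            node_t *p;
            for (p = head; p != NULL; p = p->next) {
                /* each task gets its own copy of the current pointer */
                #pragma omp task firstprivate(p)
                process(p);
            }
        }
    }   /* implicit barrier: all nodes have been processed */
}
```

The trip count is not known before the loop runs, which is why a worksharing for loop cannot express this traversal directly.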

5. How to do reductions in tasks?

Say you have a set of items, some of which are 'good' and the others are 'bad'. You want to find out the number of 'good' items. The set is implemented using a linked list. The sequential loop is:

int count_good (item_t *item)
{
  int n = 0;
  while (item) {
       if (is_good(item))
          n ++;
       item = item->next;
  }
  return n;
}

You can use a task loop to parallelize the while loop. But what about the increment of n, which many tasks may now perform at the same time?

The atomic construct is probably the most efficient way of doing reductions in tasks.

int count_good (item_t *item)
{
   int n = 0;
   #pragma omp parallel
   {
      #pragma omp single nowait
      {
          while (item) {
               #pragma omp task firstprivate(item)
               {
                   if (is_good(item)) {
                      #pragma omp atomic
                      n ++;
                   }
               }
               item = item->next;
          }
      }
   }
   return n;
}

Or you can use a thread-specific variable to hold a per-thread partial sum, and then do a cross-thread reduction to get the total.

int count_good (item_t *item)
{
   int n = 0;
   int pn[P]; /* P must be at least the number of threads used. */
   #pragma omp parallel
   {
      pn[omp_get_thread_num()] = 0;
      /* No nowait here: the implicit barrier at the end of single
         ensures that all tasks have finished before pn[] is read below. */
      #pragma omp single
      {
          while (item) {
               #pragma omp task firstprivate(item)
               {
                   if (is_good(item)) {
                      pn[omp_get_thread_num()] ++;
                   }
               }
               item = item->next;
          }
      }
      #pragma omp atomic
      n += pn[omp_get_thread_num()];
   }
   return n;
}
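
One way to choose P is to size the array at run time with omp_get_max_threads(), which returns an upper bound on the team size of a parallel region that has no num_threads clause. The following is a sketch under that assumption. The item_t layout, the is_good() predicate, and the serial fallbacks for builds without OpenMP are hypothetical.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP support. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_max_threads(void) { return 1; }
#endif

/* Hypothetical item type and predicate: even values count as 'good'. */
typedef struct item {
    int value;
    struct item *next;
} item_t;

static int is_good(const item_t *item) { return item->value % 2 == 0; }

/* Returns the number of good items, or -1 if allocation fails. */
int count_good(item_t *item)
{
    int n = 0;
    int i;
    int nthreads = omp_get_max_threads();
    int *pn = calloc((size_t)nthreads, sizeof *pn);  /* zero-initialized */

    if (pn == NULL)
        return -1;

    #pragma omp parallel
    {
        #pragma omp single nowait
        {
            while (item) {
                #pragma omp task firstprivate(item)
                {
                    if (is_good(item))
                        pn[omp_get_thread_num()]++;
                }
                item = item->next;
            }
        }
    }   /* implicit barrier: every task has completed */

    /* Serial reduction of the per-thread partial counts. */
    for (i = 0; i < nthreads; i++)
        n += pn[i];

    free(pn);
    return n;
}
```

Summing after the parallel region avoids the atomic update entirely. Adjacent pn entries may share a cache line, so padding each entry could be worth measuring if the counter is updated very often.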



© 2010, Oracle Corporation and/or its affiliates