Monday, August 23, 2010

Testing Intel’s Parallel Studio

Part 1 – OpenMP

Intel claims to bring simplified “end-to-end parallelism to Microsoft Visual Studio C/C++ developers with Intel® Parallel Studio” [1].

I used a simple OpenMP parallel HelloWorld program to study this new tool, which comes as an add-on to Visual Studio 2008. The program listing is shown in Figure 1.

    #include <omp.h>
    #include <stdio.h>

    // a function to consume cpu time:
    void consume() {
        int i;
        long n=100000000;
        double s=0;
        for (i=1;i<n;i++)
            s+=(double)1/(double)i;
    }

    int main (int argc, char *argv[]) {
        int th_id, nthreads=5;
        omp_set_num_threads(nthreads);
        // each thread gets its own private copy of th_id
        #pragma omp parallel private(th_id)
        {
            th_id = omp_get_thread_num();
            consume();
            printf("Hello World from thread %d\n", th_id);
            // wait until every thread has printed its greeting
            #pragma omp barrier
            if ( th_id == 0 ) {
                nthreads = omp_get_num_threads();
                printf("There are %d threads\n",nthreads);
            }
        }
        return 0;
    }

Figure 1: A “HelloWorld” in C with OpenMP directives

After compiling the code I tested it with the Parallel Amplifier (Profiler).

My machine has two physical cores and, since it uses hyper-threading, it appears to have 4 logical cores. I executed the program with 4 threads. Figure 2 shows a screen dump of the profiler.


Figure 2: Visual Studio 2008 with Parallel Studio screen dump

Let’s zoom in on the lower-right plot:


Figure 3: A summary plot of the CPU resources utilization

Figure 3 shows that the 4 threads ran on the 4 logical cores and that CPU utilization was optimal.
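As a rough sketch of how a program could check the core count that the profiler plot is measured against, the C snippet below asks the OpenMP runtime for the number of logical processors and caps a requested thread count at that value. The helper names logical_processors() and choose_threads() are hypothetical and not part of the original program; omp_get_num_procs() is the standard OpenMP call, and the fallback value of 1 when OpenMP is disabled is an assumption of this sketch.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Number of logical processors visible to the OpenMP runtime.
   Returns 1 when the file is compiled without OpenMP support
   (an assumption of this sketch, not an OpenMP rule). */
int logical_processors(void) {
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;
#endif
}

/* Hypothetical helper: use the requested thread count, but never more
   threads than there are logical processors. */
int choose_threads(int requested) {
    int procs = logical_processors();
    assert(requested > 0);
    return requested < procs ? requested : procs;
}
```

On a machine like the one described above (two hyper-threaded cores), logical_processors() would be expected to report 4, so omp_set_num_threads(choose_threads(4)) would keep one thread per logical core.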

I would, however, recommend repeating every execution several times, since there is large variability between runs, and averaging the results might be a good idea.
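One possible way to follow this advice outside the profiler is to time the parallel region several times and look at the mean together with the fastest and slowest run. The sketch below is only an illustration: consume_n(), time_one_run() and time_runs() are hypothetical helpers modeled on consume() from Figure 1, and the clock() fallback without OpenMP measures CPU time rather than wall-clock time.

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Timestamp in seconds: wall-clock via OpenMP, clock() otherwise. */
double now_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Harmonic-sum workload modeled on consume() in Figure 1. The sum is
   returned so that the caller can use it and the loop is not dead code. */
double consume_n(long n) {
    double s = 0.0;
    long i;
    for (i = 1; i < n; i++)
        s += 1.0 / (double)i;
    return s;
}

/* Time one parallel region in which every thread runs the workload. */
double time_one_run(long n) {
    double t0 = now_seconds();
    #pragma omp parallel
    {
        /* volatile store keeps the computed sum observable */
        volatile double sink = consume_n(n);
        (void)sink;
    }
    return now_seconds() - t0;
}

/* Repeat the measurement and report the mean, fastest and slowest run. */
void time_runs(int runs, long n, double *mean, double *fastest, double *slowest) {
    double total = 0.0;
    int r;
    assert(runs > 0);
    for (r = 0; r < runs; r++) {
        double dt = time_one_run(n);
        if (r == 0 || dt < *fastest) *fastest = dt;
        if (r == 0 || dt > *slowest) *slowest = dt;
        total += dt;
    }
    *mean = total / runs;
}
```

A call such as time_runs(5, 100000000L, &mean, &fastest, &slowest) would repeat the Figure 1 workload five times; a large gap between the fastest and slowest values is a hint that a single run should not be trusted.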

Focusing now on the center of the Visual Studio screen (Figure 4), one can see the actual profiler analysis, which points to the bottleneck in the code: the function consume() in this case.


Figure 4: The Intel Parallel Studio Profiler

Of course there is much more to explore, but my first impression is that Parallel Amplifier can be helpful for OpenMP applications (and for TBB too). Now let’s look at MPI.

Part 2 – MPI

MPI was originally created for distributed-memory systems, but it can also be used on a single (SMP) machine by running multiple processes, where OpenMP would use multiple threads. I installed both MPICH2 and the Microsoft HPC Pack 2008 SDK on my computer.

Both can be set up inside Visual Studio [2]. By default, MPI programs are launched with the mpiexec executable.

There is no problem compiling the code with Visual Studio and running it from the command line, but then Parallel Studio is of no use. Therefore, one must execute the program from inside Visual Studio. This is doable: enter mpiexec and the number of processes, e.g. -n 4, in the right place in the project properties; see Figure 5.

Figure 5: Setting mpiexec and the number of tasks from inside Visual Studio.
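As a rough illustration of the settings described above, the Debugging page of the project properties might be filled in along the following lines. The mpiexec path is an assumption (a typical MPICH2 install location; adjust it to your installation), and quoting the target with the $(TargetPath) macro is my own choice here, not something taken from the figure.

```text
Command:            C:\Program Files\MPICH2\bin\mpiexec.exe   (assumed install path)
Command Arguments:  -n 4 "$(TargetPath)"
Working Directory:  $(TargetDir)
```

With settings like these, starting the project from Visual Studio launches mpiexec, which in turn starts 4 copies of the compiled program.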

But here is the catch: mpiexec is a precompiled executable, its source is not available, and it is not part of the build (project). Parallel Amplifier therefore cannot analyze the user program; it only “sees” mpiexec and reports only mpiexec’s CPU consumption, see Figure 6.


Figure 6: The profiler cannot penetrate beyond mpiexec and dive into the user’s code.

Therefore, unless Parallel Studio can somehow be made to skip mpiexec, it looks as though this tool cannot help much in analyzing MPI applications!

You are welcome to post your comments / corrections.


[1] Intel Parallel Studio home page:



Daniel Moth said...

Hi Guy

Regarding the launching of MPI, I have not tried it with Intel's tools, but Visual Studio has the ability to launch MPI programs as described here:



Guy Tel-Zur said...

In my post I mentioned that MPI can be executed from inside Visual Studio and I gave a reference for that [2].
The current version of Intel's Parallel Studio does not yet support VS2010, only VS2008. Their next release will support VS2010.
Thanks for your comment.