4.3. Using libtcr for computational problems

In some applications, such as image processing, a large number of operations must be performed on a huge data set. The convenience macro tc_parallel_for(3) can be used to express parallelism in the body of a for loop.
The macro performs a tc_thread_pool allocation, and the iterations of the parallelized for loop may then execute in parallel on all CPUs available to libtcr.
This parallel for loop should be used at the outermost level of the algorithm, because it adds a small overhead to every iteration: the increment of the loop variable and the test of the loop's condition execute under a spinlock.
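To make the source of that overhead concrete, the following is a minimal sketch of a loop whose condition test and increment run under a spinlock. It is written against plain POSIX threads and is not libtcr's implementation; the names shared_loop, loop_worker, run_shared_loop and store_square are hypothetical and exist only for this illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for pthread spinlocks in strict C modes */
#endif
#include <pthread.h>
#include <stddef.h>

#define MAX_HELPERS 15

/* Hypothetical names; an illustration only, not libtcr code. */
struct shared_loop {
    pthread_spinlock_t lock;            /* guards next */
    size_t next;                        /* next unclaimed iteration */
    size_t end;                         /* one past the last iteration */
    void (*body)(size_t i, void *arg);  /* loop body */
    void *arg;
};

static void *loop_worker(void *p)
{
    struct shared_loop *loop = p;

    for (;;) {
        size_t i;
        int more;

        /* The condition test and the increment run under the spinlock. */
        pthread_spin_lock(&loop->lock);
        i = loop->next;
        more = i < loop->end;
        if (more)
            loop->next++;
        pthread_spin_unlock(&loop->lock);

        if (!more)
            break;
        loop->body(i, loop->arg);       /* the body runs without the lock */
    }
    return NULL;
}

/* Run body(i, arg) for every i in [0, n) on up to nthreads threads. */
static int run_shared_loop(size_t n, int nthreads,
                           void (*body)(size_t, void *), void *arg)
{
    struct shared_loop loop = { .next = 0, .end = n, .body = body, .arg = arg };
    pthread_t helpers[MAX_HELPERS];
    int started = 0;

    if (pthread_spin_init(&loop.lock, PTHREAD_PROCESS_PRIVATE) != 0)
        return -1;

    /* The caller takes part as well, so start nthreads - 1 helpers. */
    while (started < nthreads - 1 && started < MAX_HELPERS) {
        if (pthread_create(&helpers[started], NULL, loop_worker, &loop) != 0)
            break;                      /* fewer helpers, less parallelism */
        started++;
    }

    loop_worker(&loop);

    for (int k = 0; k < started; k++)
        pthread_join(helpers[k], NULL);

    pthread_spin_destroy(&loop.lock);
    return 0;
}

/* Example body: each iteration writes only its own slot. */
static void store_square(size_t i, void *arg)
{
    unsigned long *out = arg;
    out[i] = (unsigned long)i * i;
}
```

Calling run_shared_loop(100, 4, store_square, out) fills out[i] with i * i for the first 100 indices. Every iteration pays for one lock and unlock pair, which is why a loop of this kind pays off only when each iteration does a substantial amount of work.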
When implementing recursive algorithms, one can use the tc_parallel(3) and tc_with(3) macros to express parallelism.
With both variants of expressing parallelism, one may observe a small delay at the beginning. The delay occurs because the other system-level worker threads first need to wake up, which happens through notifications sent over an eventfd. However, as soon as the number of available execution contexts exceeds the number of worker threads, the overhead of creating, destroying, and switching between the tc_thread contexts is very small compared to their pthread counterparts.
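The wake-up mechanism behind that initial delay can be sketched with a single sleeping thread and the Linux eventfd(2) interface. This is an assumption-laden illustration of the general technique, not libtcr's scheduler code; sleeper, sleeping_worker and wake_one_worker are hypothetical names.

```c
#include <pthread.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical names; an illustration only, not libtcr code. */
struct sleeper {
    int efd;         /* eventfd the worker sleeps on */
    uint64_t woken;  /* counter value the worker saw when it woke up */
};

static void *sleeping_worker(void *p)
{
    struct sleeper *s = p;
    uint64_t count = 0;

    /* read() blocks while the counter is zero; on success it returns
     * the counter value and resets the counter to zero. */
    if (read(s->efd, &count, sizeof(count)) == (ssize_t)sizeof(count))
        s->woken = count;
    return NULL;
}

/* Start one worker, wake it through the eventfd, and return the
 * counter value it observed (0 on error). */
static uint64_t wake_one_worker(void)
{
    struct sleeper s = { .efd = eventfd(0, 0), .woken = 0 };
    pthread_t tid;
    uint64_t one = 1;

    if (s.efd < 0)
        return 0;
    if (pthread_create(&tid, NULL, sleeping_worker, &s) != 0) {
        close(s.efd);
        return 0;
    }

    /* The write adds 1 to the counter and wakes the blocked reader. */
    if (write(s.efd, &one, sizeof(one)) != (ssize_t)sizeof(one))
        pthread_cancel(tid);  /* read() is a cancellation point */

    pthread_join(tid, NULL);
    close(s.efd);
    return s.woken;
}
```

The notification has to pass through the kernel before the sleeping thread is scheduled again, which is consistent with the short start-up delay described above. Once the worker threads are running, no such round trip is needed to pick up further work.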