Power Usage of Parallel Algorithms in C#

The Power Usage of Algorithms in C# blog presents power usage and efficiency of several C# sequential (single core) algorithms is shown. In this blog power usage and efficiency of parallel algorithms is explored.

Power Measurement Setup

The same setup is used as in Power Usage of Algorithms in C#.

Parallel Sorting

An array of 200 million 32-bit unsigned integers filled with random values is the input to all sorting algorithms, to avoid caching effects.

AlgorithmPerformance OrderPerformancePower
Parallel Merge SortNlgN122 – 13199 – 103
Parallel LSD Radix SortN528 – 596100 – 105

Performance units are millions of unsigned integers per second. Power units are watts, which show power in addition to the baseline power level of about 45 watts.

Parallel LSD Radix Sort scaled in performance about 4X from the sequential implementation in Power Usage of Algorithms in C#. Power usage increased from 30 watts for sequential algorithm to 100 watts for parallel algorithm – i.e., 3.3X – less than performance. Thus, the Parallel LSD Radix Sort algorithm continues to scale in performance with more cores, with better power efficiency then the sequential version.

Parallel Merge Sort uses the same amount of power as Parallel LSD Radix Sort, but runs at 1/4-th the performance. Thus, Parallel LSD Radix Sort is the highest performance and the most power efficient algorithm. It reaches nearly 6 million unsigned integers per watt, performing more power efficiently than the single core sequential version which sorted 4.4 million per watt.

Parallel Merge Sort is showing significantly more power efficiency when compared to its sequential version. The sequential version uses 3.5 watts per million integers. However, the parallel version uses less than one watt per million integers, being nearly 4X more power efficient.

All tests were performed on a Dell Alienware laptop with an Intel 12th Gen i7-12700H processor, which has 6 P-cores (performance cores) and 8 E-cores (efficient cores). It is unclear whether the additional power efficiency gains are coming from the parallel algorithms or from the use of the efficiency cores.

P-Cores Only

Dell Alienware laptop BIOS supports disabling all of the E-cores. The following Table shows performance and power measurements when using only the 6 P-cores of the laptop processor, which support hyperthreading:

AlgorithmPerformanceAdditional Power
Merge Sort825 – 30
Parallel Merge Sort73 – 9070 – 75
LSD Radix Sort161 – 17031 – 34
Parallel LSD Radix Sort434 – 45778 – 81

Parallel Merge Sort increased performance by 9X and power usage by 2.8X, providing substantially more power efficiency over the sequential version. Thus, the majority of increased power efficiency is from the Parallel Merge Sort algorithm.

Parallel LSD Radix Sort increased performance by 2.7X and power usage by 2.5X, providing nearly linear gain in both, and nearly equal power efficiency. Thus, the increased power efficiency, when using P-cores and E-cores, is from the use of E-cores and not the Parallel LSD Radix Sort algorithm.

E-Cores Only

Sadly, the Dell Alienware laptop BIOS does not allow running with E-cores only.

Parallel Summation

In the works…

Parallel Copy

In the works…

Conclusions

Several parallel algorithms and their power usage were explored. Parallel Merge Sort and Parallel LSD Radix Sort have higher power efficiency than their sequential versions. Both benefit from more power efficient E-cores within the laptop processor. Parallel Merge Sort benefits from a more power efficient parallel algorithm. Parallel LSD Radix Sort scales linearly with performance and power usage. Parallel LSD Radix Sort has the highest performance and the highest power efficiency.

One thought on “Power Usage of Parallel Algorithms in C#

Leave a comment