C++ includes a standard set of generic algorithms, which used to be called STL (Standard Template Library). On Windows, Microsoft provides parallel versions of standard algorithms, listed below with the first argument being std::. Also, Intel provides parallel version of standard algorithms, listed below with the first argument being dpl::. The following Table shows performance […]
Read more "C++ Parallel STL Benchmark"
This page provides additional resources, such as correction, additions, errata, and updates to the book. Contact Information email@example.com has been setup for correspondence about the book. Don’t hesitate to a-mail a note with questions, suggestions, corrections, improvements, or missing information.
Read more "Practical Parallel Algorithms Book Additional Resources"
This blog explores methods to reach maximum read bandwidth. This is a useful basic operation which limits performance of many algorithms serial or parallel. Knowing how to reach the maximum available read bandwidth is beneficial in many instances. One way to test performance of memory reading is to implement a Summation. Summing elements of an […]
Read more "Maximum Read Bandwidth"
Before synthesis, chip designers used to implement all aspects of chip design by hand. This included combinational logic implementation, storage elements such as SRAMs, flip-flops and latches. Netlist as well as place-and-route were also done fairly manually. High-level design implementations, either in C, Verilog or another high-level language, were manually translated into gates. When synthesis […]
Read more "When to Trust Chip Synthesis"
This blog progressively develops a simple parallel algorithm described in Chapter 6 of a soon to be released book, as a fun parallelism exercise. The goal of this exercise is to sort a large array of unsigned bytes as quickly as possible, using multiple cores if that provides performance improvement, in-place if possible, targeting the […]
Read more "Parallel Sort of Byte Arrays"
This a reprint of my September 1997 letter to IEEE Computer magazine’s Open Channel. It’s still relevant today. For instance, when designing add-in computer cards or chips that connect over PCI-express connection. Preface Back in the 1990’s when designing add-in cards for PC’s and Apple computers various busses such as PCI (precursor to PCI Express), […]
Read more "Two Writes Make a Read"
This blog is a repost of an article I wrote in Dr. Dobb’s Journal in November of 2009, since Dr. Dobb’s Journal has ceased operation and its website has broken since then. The algorithm developed is a novel In-Place Radix Sort, which sorts in Linear Time. The article is still available through the amazing web […]
Read more "In-Place N-bit Radix Sort"
This blog is a repost of an article I wrote in Dr. Dobb’s Journal in October of 2009, since Dr. Dobb’s Journal has ceased operation and its website has broken since then. The article is still available through the amazing web archival of the Wayback machine, as provided below: V.J. Duvanenko, “Algorithm Improvement through Performance […]
Read more "In-Place Binary Radix Sort"
Two truly simple sorting algorithms are shown in my previous blog https://duvanenko.tech.blog/2021/11/26/two-simple-sorting-algorithms/. These algorithms are both in-place and perform surprisingly well compared to standard C++ sort, especially when sufficient memory is available. In this blog, let’s see if we can raise the performance bar higher by adding slightly more complexity. Hybrid Bottom Up Merge Sort […]
Read more "Hybrid In-Place Merge Sort"
This is a re-print of my Dr. Dobb’s article from April of 2012. The original article used to be available: https://www.drdobbs.com/parallel/benchmarking-block-swapping-algorithms/232900395 It is still available https://web.archive.org/web/20130130063936/http://www.drdobbs.com/parallel/benchmarking-block-swapping-algorithms/232900395 Below is the recreated version, with code fixes to one of the algorithms. Swapping two elements within an array quickly is simple, requiring a single temporary storage element. Swapping […]
Read more "Benchmarking Block Swapping Algorithms"