
Compiler optimizations for improving performance of Harris Corner Detection Algorithm on multicore/SIMD CPUs


The objective of the assignment is to optimize and tune the Harris corner detection algorithm for performance using locality, SIMD and multicore parallelism transformations.
Using suitable compiler flags, transformations and optimizations, we obtain a speedup of 11.5x over the unparallelized reference implementation and 13.5x over OpenCV with the GCC 4.9.2 compiler, and a speedup of 11.3x over the unparallelized reference implementation and 14.6x over OpenCV with the ICC 15.0 compiler. All experiments were performed on an Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz [Haswell μarch, 4 cores, 64 KB L1 private / 256 KB L2 private / 8 MB L3 shared cache].
A detailed performance comparison of ICC vs. GCC, and of the individual contributions of vectorization and parallelism, is available in the report below.
The complete code for the project is available on GitHub: https://github.com/adarshpatil/e0255-opt-asst
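
To make the SIMD and multicore transformations mentioned above concrete, here is a minimal sketch of a Harris response kernel in C. It is not taken from the project repository: the function name `harris_response`, the 3x3 Sobel gradients, the 3x3 summation window and the two-stage structure are all assumptions chosen for illustration. The OpenMP pragmas mark the row loops as candidates for multicore execution and the unit-stride column loops as candidates for vectorization. The sketch applies no tiling or stage fusion, so it does not show the locality transforms.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper for illustration only (not the project's code).
 * img and out are non-overlapping row-major h x w float images.
 * Returns 0 on success, -1 if allocation fails.
 * A 2-pixel border of out is left at 0. */
int harris_response(const float *img, float *out, int h, int w, float k)
{
    assert(img && out && h >= 5 && w >= 5);
    size_t n = (size_t)h * (size_t)w;
    float *ixx = calloc(n, sizeof *ixx);
    float *ixy = calloc(n, sizeof *ixy);
    float *iyy = calloc(n, sizeof *iyy);
    if (!ixx || !ixy || !iyy) {
        free(ixx); free(ixy); free(iyy);
        return -1;
    }
    for (size_t i = 0; i < n; i++)
        out[i] = 0.0f;

    /* Stage 1: 3x3 Sobel gradients and their products.
     * Rows are independent (multicore); columns are unit-stride (SIMD). */
    #pragma omp parallel for
    for (int y = 1; y < h - 1; y++) {
        const float *up  = img + (size_t)(y - 1) * w;
        const float *mid = img + (size_t)y * w;
        const float *dn  = img + (size_t)(y + 1) * w;
        float *rxx = ixx + (size_t)y * w;
        float *rxy = ixy + (size_t)y * w;
        float *ryy = iyy + (size_t)y * w;
        #pragma omp simd
        for (int x = 1; x < w - 1; x++) {
            float gx = (up[x + 1] + 2.0f * mid[x + 1] + dn[x + 1])
                     - (up[x - 1] + 2.0f * mid[x - 1] + dn[x - 1]);
            float gy = (dn[x - 1] + 2.0f * dn[x] + dn[x + 1])
                     - (up[x - 1] + 2.0f * up[x] + up[x + 1]);
            rxx[x] = gx * gx;
            rxy[x] = gx * gy;
            ryy[x] = gy * gy;
        }
    }

    /* Stage 2: sum the products over a 3x3 window,
     * then compute det(M) - k * trace(M)^2. */
    #pragma omp parallel for
    for (int y = 2; y < h - 2; y++) {
        #pragma omp simd
        for (int x = 2; x < w - 2; x++) {
            float sxx = 0.0f, sxy = 0.0f, syy = 0.0f;
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    size_t i = (size_t)(y + dy) * w + (size_t)(x + dx);
                    sxx += ixx[i];
                    sxy += ixy[i];
                    syy += iyy[i];
                }
            }
            float trace = sxx + syy;
            out[(size_t)y * w + x] = (sxx * syy - sxy * sxy) - k * trace * trace;
        }
    }

    free(ixx); free(ixy); free(iyy);
    return 0;
}
```

With GCC the pragmas take effect only when compiling with `-fopenmp`; without it they are ignored and the code runs serially. On a test image whose lower-right quadrant is bright, the response should be positive at the corner of the quadrant, negative along its straight edges and zero in flat regions.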


Speedup and Execution time (in ms) by vectorization and locality transforms

                            OpenCV    Reference  Optimized  Speedup by locality transforms
No vectorization            3515.29   3767.32    2442.4     1.54x
Vectorization               3566.35   3035.41    930.90     3.26x
Speedup by vectorization    -         1.24x      2.62x


Speedup and Execution time (in ms) using ICC 15.0

                          OpenCV    Reference  Optimized  Speedup w.r.t. Reference
1 core                    3567.95   2755.83    904.61     3.04x
2 cores                   -         1617.88    355.724    4.54x
4 cores                   -         1444.89    243.19     5.94x
Speedup by parallelism    -         1.90x      3.72x


Speedup and Execution time (in ms) using GCC 4.9.2

                          OpenCV    Reference  Optimized  Speedup w.r.t. Reference
1 core                    3566.35   3035.41    930.90     3.26x
2 cores                   -         1990.6     422.54     4.71x
4 cores                   -         1940.92    264.73     7.34x
Speedup by parallelism    -         1.56x      3.52x
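
The speedup figures in the tables above are ratios of execution times. A minimal sketch of that arithmetic, assuming a hypothetical helper named `speedup` that is not part of the project code:

```c
#include <assert.h>

/* Hypothetical helper for illustration: speedup of an optimized run
 * over a baseline run, both measured in milliseconds. */
double speedup(double baseline_ms, double optimized_ms)
{
    assert(baseline_ms > 0.0 && optimized_ms > 0.0);
    return baseline_ms / optimized_ms;
}
```

For example, with the GCC 4.9.2 numbers, `speedup(3035.41, 264.73)` compares the 1-core reference against the 4-core optimized version and gives roughly 11.47, the 11.5x quoted in the summary. The same figure is the product of the single-core speedup (3.26x) and the speedup from parallelism on the optimized code (3.52x).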

