Parallel implementation of resampling methods for particle filtering on graphics processing units
Date of Award
Doctor of Philosophy (PhD)
Electrical and Computer Engineering
Buren E. Wells
Laurie L Joiner
Graphics processing units., Parallel programming (Computer science), Resampling (Statistics)
Particle filter techniques are common methods used to estimate the evolving state of nonlinear, non-Gaussian time-variant systems by utilizing a periodic sequence of noisy measurements. The accuracy of particle filtering has often been shown to be superior to other state estimation techniques, such as the extended Kalman filter (EKF), for many applications. Unfortunately, the high computational cost and highly nondeterministic run-time behavior of particle filters often precludes their use in hard, real-time environments where filter response must meet the strict timing requirements of the application. Particle filter algorithms are composed of three main stages: prediction, update, and resampling. General purpose graphics processing units (GPGPUs) have been successfully employed in previous research to accelerate the computation of both the prediction and update stages by exploiting their natural fine-grain parallelism. This research focuses on accelerating the resampling stage for GPGPU execution, which is much more difficult to parallelize due to its inherent sequentially. This dissertation introduces a novel GPGPU implementation of the systematic and stratified resampling algorithms that exploit the monotonically increasing nature of the prefix-sum and the evolutionary nature of the particle weighting process to allow the re-indexing portion of the algorithms to occur in a two phase, multi-threaded manner. This resulting measured factor of performance improvement for the systematic and stratified algorithms, when executed on a GPGPU, was 16x and 33x, respectively, over the serial implementations. This dissertation also describes the steps taken to optimize the naive parallel implementation to mitigate limitations such as memory dependencies stalls, low thread execution efficiency, and sub-optimal occupancy. These steps range from minimizing global memory accesses, utilizing shared memory, manually coding software prefetching, and grid-stride looping. In all, the optimizations provide a 2x - 4x improvement to execution times over the naive implementation of the resampling methods. Mathematical analysis is provided to describe algorithm efficiencies regarded to time, work, and space. L'Hôpital's rule shows that while time complexity can be reduced from O(N) → O(1), for the best case scenario, it costs double the amount of work. Lastly, multiple pseudorandom number generators (PRNGs) are evaluated with different implementation techniques to investigate statistical quality and timing performance as it pertains to particle filters.
Nicely, Matthew A., "Parallel implementation of resampling methods for particle filtering on graphics processing units" (2019). Dissertations. 181.