E-Book, English, 886 pages
GPU Computing Gems Emerald Edition
1st edition, 2011
ISBN: 978-0-12-384989-2
Publisher: Elsevier Science & Techn.
Format: EPUB
Copy protection: ePub watermark
Series: Applications of GPU Computing Series
GPU Computing Gems Emerald Edition offers practical techniques in parallel computing using graphics processing units (GPUs) to enhance scientific research. The first volume in Morgan Kaufmann's Applications of GPU Computing Series, this book presents the latest insights and research in computer vision, electronic design automation, and emerging data-intensive applications. It also covers life sciences, medical imaging, ray tracing and rendering, scientific simulation, signal and audio processing, statistical modeling, and video and image processing.

This book is intended to help those who face the challenge of programming systems to use GPUs effectively to meet efficiency and performance goals. It offers developers a window into diverse application areas and the opportunity to gain insights from others' algorithm work that they may apply to their own projects. Readers will learn from leading researchers in parallel programming, who have gathered their solutions and experience in one volume under the guidance of expert area editors. Each chapter is written to be accessible to researchers from other domains, allowing knowledge to cross-pollinate across the GPU spectrum. Many examples leverage NVIDIA's CUDA parallel computing architecture, the most widely adopted massively parallel programming solution. The insights, ideas, and practical hands-on skills in the book can be put to use immediately.

Computer programmers, software engineers, hardware engineers, and computer science students will find this volume a helpful resource. For the source code discussed throughout the book, the editors invite readers to the following website: ...
- Covers the breadth of industry, from scientific simulation and electronic design automation to audio/video processing, medical imaging, computer vision, and more
- Many examples leverage NVIDIA's CUDA parallel computing architecture, the most widely adopted massively parallel programming solution
- Offers insights and ideas as well as practical hands-on skills you can put to use immediately
Table of Contents
- Editors, Reviewers, and Authors
- Introduction
- Section 1: Scientific Simulation
  - Chapter 1. GPU-Accelerated Computation and Interactive Display of Molecular Orbitals
    - 1.1. Introduction, Problem Statement, and Context
    - 1.2. Core Method
    - 1.3. Algorithms, Implementations, and Evaluations
    - 1.4. Final Evaluation
    - 1.5. Future Directions
    - References
  - Chapter 2. Large-Scale Chemical Informatics on GPUs
    - 2.1. Introduction, Problem Statement, and Context
    - 2.2. Core Methods
    - 2.3. Gaussian Shape Overlay: Parallelization and Arithmetic Optimization
    - 2.4. LINGO: Algorithmic Transformation and Memory Optimization
    - 2.5. Final Evaluation
    - 2.6. Future Directions
    - Acknowledgments
    - References
  - Chapter 3. Dynamical Quadrature Grids: Applications in Density Functional Calculations
    - 3.1. Introduction
    - 3.2. Core Method
    - 3.3. Implementation
    - 3.4. Performance Improvement
    - 3.5. Future Work
    - References
  - Chapter 4. Fast Molecular Electrostatics Algorithms on GPUs
    - 4.1. Introduction, Problem Statement, and Context
    - 4.2. Core Method
    - 4.3. Algorithms, Implementations, and Evaluations
    - 4.4. Final Evaluation
    - 4.5. Future Directions
    - References
  - Chapter 5. Quantum Chemistry: Propagation of Electronic Structure on a GPU
    - 5.1. Problem Statement
    - 5.2. Core Technology and Algorithm
    - 5.3. The Key Insight on the Implementation—the Choice of Building Blocks
    - 5.4. Final Evaluation and Benefits
    - 5.5. Conclusions and Future Directions
    - Acknowledgments
    - References
  - Chapter 6. An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm
    - 6.1. Introduction, Problem Statement, and Context
    - 6.2. Core Methods
    - 6.3. Algorithms and Implementations
    - 6.4. Evaluation and Validation of Results, Total Benefits, and Limitations
    - 6.5. Future Directions
    - Acknowledgments
    - References
  - Chapter 7. Leveraging the Untapped Computation Power of GPUs: Fast Spectral Synthesis Using Texture Interpolation
    - 7.1. Background and Problem Statement
    - 7.2. Flux Calculation and Aggregation
    - 7.3. The GRASSY Platform
    - 7.4. Initial Testing
    - 7.5. Impact and Future Directions
    - Acknowledgments
    - References
  - Chapter 8. Black Hole Simulations with CUDA
    - 8.1. Introduction
    - 8.2. The Post-Newtonian Approximation
    - 8.3. Numerical Algorithm
    - 8.4. GPU Implementation
    - 8.5. Performance Results
    - 8.6. GPU Supercomputing Clusters
    - 8.7. Statistical Results for Black Hole Inspirals
    - 8.8. Conclusion
    - Acknowledgments
    - References
  - Chapter 9. Treecode and Fast Multipole Method for N-Body Simulation with CUDA
    - 9.1. Introduction
    - 9.2. Fast N-Body Simulation
    - 9.3. CUDA Implementation of the Fast N-Body Algorithms
    - 9.4. Improvements of Performance
    - 9.5. Detailed Description of the GPU Kernels
    - 9.6. Overview of Advanced Techniques
    - 9.7. Conclusions
    - References
  - Chapter 10. Wavelet-Based Density Functional Theory Calculation on Massively Parallel Hybrid Architectures
    - 10.1. Introduction, Problem Statement, and Context
    - 10.2. Core Method
    - 10.3. Algorithms, Implementations, and Evaluations
    - 10.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
    - 10.5. Conclusions and Future Directions
    - References
- Section 2: Life Sciences
  - Chapter 11. Accurate Scanning of Sequence Databases with the Smith-Waterman Algorithm
    - 11.1. Introduction, Problem Statement, and Context
    - 11.2. Core Method
    - 11.3. CUDA Implementation of the SW Algorithm for Identification of Homologous Proteins
    - 11.4. Discussion
    - 11.5. Final Evaluation
    - References
  - Chapter 12. Massive Parallel Computing to Accelerate Genome-Matching
    - 12.1. Introduction, Problem Statement, and Context
    - 12.2. Core Methods
    - 12.3. Algorithms, Implementations, and Evaluations
    - 12.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
    - 12.5. Future Directions
    - References
  - Chapter 13. GPU-Supercomputer Acceleration of Pattern Matching
    - 13.1. Introduction, Problem Statement, and Context
    - 13.2. Core Method
    - 13.3. Algorithms, Implementations, and Evaluations
    - 13.4. Final Evaluation
    - 13.5. Future Direction
    - Acknowledgments
    - Appendix
    - References
  - Chapter 14. GPU Accelerated RNA Folding Algorithm
    - 14.1. Problem Statement
    - 14.2. Core Method
    - 14.3. Algorithms, Implementations, and Evaluations
    - 14.4. Final Evaluation
    - 14.5. Future Directions
    - References
  - Chapter 15. Temporal Data Mining for Neuroscience
    - 15.1. Introduction
    - 15.2. Core Methodology
    - 15.3. GPU Parallelization: Algorithms and Implementations
    - 15.4. Experimental Results
    - 15.5. Discussion
    - References
- Section 3: Statistical Modeling
  - Chapter 16. Parallelization Techniques for Random Number Generators
    - 16.1. Introduction
    - 16.2. L'Ecuyer's Multiple Recursive Generator MRG32k3a
    - 16.3. Sobol Generator
    - 16.4. Mersenne Twister MT19937
    - 16.5. Performance Benchmarks
    - Acknowledgments
    - References
  - Chapter 17. Monte Carlo Photon Transport on the GPU
    - 17.1. Physics of Photon Transport
    - 17.2. Photon Transport on the GPU
    - 17.3. The Complete System
    - 17.4. Results and Evaluation
    - 17.5. Future Directions
    - References
  - Chapter 18. High-Performance Iterated Function Systems
    - 18.1. Problem Statement and Mathematical Background
    - 18.2. Core Technology
    - 18.3. Implementation
    - 18.4. Final Evaluation
    - 18.5. Conclusion
    - References
- Section 4: Emerging Data-Intensive Applications
  - Chapter 19. Large-Scale Machine Learning
    - 19.1. Introduction
    - 19.2. Core Technology
    - 19.3. GPU Algorithm and Implementation
    - 19.4. Improvements of Performance
    - 19.5. Conclusions and Future Work
    - Acknowledgments
    - References
  - Chapter 20. Multiclass Support Vector Machine
    - 20.1. Introduction, Problem Statement, and Context
    - 20.2. Core Method
    - 20.3. Algorithms, Implementations, and Evaluations
    - 20.4. Final Evaluation
    - 20.5. Future Direction
    - References
  - Chapter 21. Template-Driven Agent-Based Modeling and Simulation with CUDA
    - 21.1. Introduction, Problem Statement, and Context
    - 21.2. Final Evaluation and Validation of Results
    - 21.3. Conclusions, Benefits and Limitations, and Future Work
    - References
  - Chapter 22. GPU-Accelerated Ant Colony Optimization
    - 22.1. Introduction, Problem Statement, and Context
    - 22.2. Core Method
    - 22.3. Algorithms, Implementations, and Evaluations
    - 22.4. Final Evaluation
    - 22.5. Future Direction
    - Acknowledgments
    - References
- Section 5: Electronic Design Automation
  - Chapter 23. High-Performance Gate-Level Simulation with GP-GPUs
    - 23.1. Introduction
    - 23.2. Simulator Overview
    - 23.3. Compilation and Simulation
    - 23.4. Experimental Results
    - 23.5. Future Directions
    - Related Work
    - References
  - Chapter 24. GPU-Based Parallel Computing for Fast Circuit Optimization
    - 24.1. Introduction, Problem Statement, and Context
    - 24.2. Core Method
    - 24.3. Algorithms, Implementations, and Evaluations
    - 24.4. Final Evaluation
    - 24.5. Future Direction
    - References
- Section 6: Ray Tracing and Rendering
  - Chapter 25. Lattice Boltzmann Lighting Models
    - 25.1. Introduction, Problem Statement, and Context
    - 25.2. Core Methods
    - 25.3. Algorithms, Implementation, and Evaluation
    - 25.4. Final Evaluation
    - 25.5. Future Directions
    - 25.6. Derivation of the Diffusion Equation
    - Acknowledgments
    - References
  - Chapter 26. Path Regeneration for Random Walks
    - 26.1. Introduction
    - 26.2. Path Tracing as Case Study
    - 26.3. Random Walks in Path Tracing
    - 26.4. Implementation Details
    - 26.5. Results
    - 26.6. Discussion
    - Acknowledgments
    - References
  - Chapter 27. From Sparse Mocap to Highly Detailed Facial Animation
    - 27.1. System Overview
    - 27.2. Background
    - 27.3. Core Technology and Algorithms
    - 27.4. Future Directions
    - Acknowledgments
    - References
  - Chapter 28. A Programmable Graphics Pipeline in CUDA for Order-Independent Transparency
    - 28.1. Introduction, Problem Statement, and Context
    - 28.2. Core Method
    - 28.3. Algorithms, Implementations, and Evaluations
    - 28.4. Final Evaluation
    - 28.5. Future Direction
    - References
- Section 7: Computer Vision
  - Chapter 29. Fast Graph Cuts for Computer Vision
    - 29.1. Introduction, Problem Statement, and Context
    - 29.2. Core Method
    - 29.3. Algorithms, Implementations, and Evaluations
    - 29.4. Final Evaluation and Validation of Results
    - 29.5. Multilabel Graph Cuts
    - References
  - Chapter 30. Visual Saliency Model on Multi-GPU
    - 30.1. Introduction
    - 30.2. Visual Saliency Model
    - 30.3. GPU Implementation
    - 30.4. Results
    - 30.5. Conclusion
    - References
  - Chapter 31. Real-Time Stereo on GPGPU Using Progressive Multiresolution Adaptive Windows
    - 31.1. Introduction, Problem Statement, and Context
    - 31.2. Core Method
    - References
  - Chapter 32. Real-Time Speed-Limit-Sign Recognition on an Embedded System Using a GPU
    - 32.1. Introduction
    - 32.2. Methods
    - 32.3. Implementation
    - 32.4. Results and Discussion
    - 32.5. Conclusion and Future Work
    - References
  - Chapter 33. Haar Classifiers for Object Detection with CUDA
    - 33.1. Introduction
    - 33.2. Viola-Jones Object Detection Retrospective
    - 33.3. Object Detection Pipeline with NVIDIA CUDA
    - 33.4. Benchmarking and Implementation Details
    - 33.5. Future Direction
    - 33.6. Conclusion
    - References
- Section 8: Video and Image Processing
  - Chapter 34. Experiences on Image and Video Processing with CUDA and OpenCL
    - 34.1. Introduction, Problem Statement, and Background
    - 34.2. Core Technology or Algorithm
    - 34.3. Key Insights from Implementation and Evaluation
    - 34.4. Final Evaluation
    - References
  - Chapter 35. Connected Component Labeling in CUDA
    - 35.1. Introduction
    - 35.2. Core Algorithm
    - 35.3. CUDA Algorithm and Implementation
    - 35.4. Final Evaluation and Results
    - References
  - Chapter 36. Image De-Mosaicing
    - 36.1. Introduction, Problem Statement, and Context
    - 36.2. Core Method
    - 36.3. Algorithms, Implementations, and Evaluations
    - 36.4. Final Evaluation
    - References
- Section 9: Signal and Audio Processing
  - Chapter 37. Efficient Automatic Speech Recognition on the GPU
    - 37.1. Introduction, Problem Statement, and Context
    - 37.2. Core Methods
    - 37.3. Algorithms, Implementations, and Evaluations
    - 37.4. Conclusion and Future Directions
    - References
  - Chapter 38. Parallel LDPC Decoding
    - 38.1. Introduction, Problem Statement, and Context
    - 38.2. Core Technology
    - 38.3. Algorithms, Implementations, and Evaluations
    - 38.4. Final Evaluation
    - 38.5. Future Directions
    - References
  - Chapter 39. Large-Scale Fast Fourier Transform
    - 39.1. Introduction
    - 39.2. Memory Hierarchy of GPU Clusters
    - 39.3. Large-Scale Fast Fourier Transform
    - 39.4. Algebraic Manipulation of Array Dimensions
    - 39.5. Performance Results
    - 39.6. Conclusion and Future Work
    - References
- Section 10: Medical Imaging
  - Chapter 40. GPU Acceleration of Iterative Digital Breast Tomosynthesis
    - 40.1. Introduction
    - 40.2. Digital Breast Tomosynthesis
    - 40.3. Accelerating Iterative DBT using GPUs
    - 40.4. Conclusions
    - Acknowledgments
    - References
  - Chapter 41. Parallelization of Katsevich CT Image Reconstruction Algorithm on Generic Multi-Core Processors and GPGPU
    - 41.1. Introduction, Problem, and Context
    - 41.2. Core Methods
    - 41.3. Algorithms, Implementations, and Evaluations
    - 41.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
    - 41.5. Related Work
    - 41.6. Future Directions
    - 41.7. Summary
    - References
  - Chapter 42. 3-D Tomographic Image Reconstruction from Randomly Ordered Lines with CUDA
    - 42.1. Introduction
    - 42.2. Core Methods
    - 42.3. Implementation
    - 42.4. Evaluation and Validation of Results, Total Benefits, and Limitations
    - 42.5. Future Directions
    - References
  - Chapter 43. Using GPUs to Learn Effective Parameter Settings for GPU-Accelerated Iterative CT Reconstruction Algorithms
    - 43.1. Introduction, Problem Statement, and Context
    - 43.2. Core Method(s)
    - 43.3. Algorithms, Implementations, and Evaluations
    - 43.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
    - 43.5. Future Directions
    - References
  - Chapter 44. Using GPUs to Accelerate Advanced MRI Reconstruction with Field Inhomogeneity Compensation
    - 44.1. Introduction
    - 44.2. Core Method: Advanced Image Reconstruction Toolbox for MRI
    - 44.3. MRI Reconstruction Algorithms and Implementation on GPUs
    - 44.4. Final Results and Evaluation
    - 44.5. Conclusion and Future Directions
    - References
  - Chapter 45. ℓ1 Minimization in ℓ1-SPIRiT Compressed Sensing MRI Reconstruction
    - 45.1. Introduction, Problem Statement, and Context
    - 45.2. Core Methods (High Level Description)
    - 45.3. Algorithms, Implementations, and Evaluations (Detailed Description)
    - 45.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
    - 45.5. Discussion and Conclusion
    - References
  - Chapter 46. Medical Image Processing Using GPU-Accelerated ITK Image Filters
    - 46.1. Introduction
    - 46.2. Core Methods
    - 46.3. Implementation
    - 46.4. Results
    - 46.5. Future Directions
    - 46.6. Acknowledgments
    - References
  - Chapter 47. Deformable Volumetric Registration Using B-Splines
    - 47.1. Introduction
    - 47.2. An Overview of B-Spline Registration
    - 47.3. Implementation Details
    - 47.4. Results
    - 47.5. Conclusions
    - References
  - Chapter 48. Multiscale Unbiased Diffeomorphic Atlas Construction on Multi-GPUs
    - 48.1. Introduction, Problem Statement, and Context
    - 48.2. Core Methods
    - 48.3. Algorithms, Implementations, and Evaluations
    - 48.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
    - 48.5. Future Directions
    - Acknowledgments
    - References
  - Chapter 49. GPU-Accelerated Brain Connectivity Reconstruction and Visualization in Large-Scale Electron Micrographs
    - 49.1. Introduction
    - 49.2. Core Methods
    - 49.3. Implementation
    - 49.4. Results
    - 49.5. Future Directions
    - Acknowledgments
    - References
  - Chapter 50. Fast Simulation of Radiographic Images Using a Monte Carlo X-Ray Transport Algorithm Implemented in CUDA
    - 50.1. Introduction, Problem Statement, and Context
    - 50.2. Core Methods
    - 50.3. Algorithms, Implementations, and Evaluations
    - 50.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
    - 50.5. Future Directions
    - References
- Index