NVIDIA GPU Computing
GPU computing is the use of a GPU (graphics processing unit) for general-purpose scientific and engineering computing. It pairs a CPU and a GPU in a heterogeneous computing model: the sequential part of the application runs on the CPU, and the computationally intensive part runs on the GPU. From the user’s perspective, the application simply runs faster because the GPU accelerates the compute-heavy work.
GPU computing is enabled by the massively parallel architecture of NVIDIA’s GPUs, called the CUDA architecture. It consists of hundreds of processor cores that work together to process the application’s data set.
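As a minimal sketch of this division of labor, the CUDA C++ fragment below keeps setup, data transfer, and cleanup on the CPU and hands the per-element arithmetic to the GPU. The names saxpy_kernel and saxpy_on_gpu, the choice of a SAXPY operation, and the block size of 256 threads are illustrative assumptions, not something taken from NVIDIA's materials.

```cuda
#include <cuda_runtime.h>
#include <vector>

// GPU part: each thread computes one element of y = a * x + y.
__global__ void saxpy_kernel(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = a * x[i] + y[i];
    }
}

// CPU part (hypothetical helper): allocation, transfers, launch, copy-back.
// Returns false if any CUDA call fails, for example when no GPU is present.
bool saxpy_on_gpu(float a, const std::vector<float>& x, std::vector<float>& y) {
    if (x.size() != y.size()) return false;
    const int n = static_cast<int>(x.size());
    if (n == 0) return true;
    const size_t bytes = n * sizeof(float);

    float* d_x = nullptr;
    float* d_y = nullptr;
    if (cudaMalloc(&d_x, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_y, bytes) != cudaSuccess) { cudaFree(d_x); return false; }

    bool ok = cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const int threads = 256;                        // assumed block size
        const int blocks = (n + threads - 1) / threads; // enough blocks to cover n
        saxpy_kernel<<<blocks, threads>>>(n, a, d_x, d_y);
        // The device-to-host copy waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_x);
    cudaFree(d_y);
    return ok;
}
```

A caller would fill two std::vector<float> objects of equal length, call saxpy_on_gpu(a, x, y), and read the result back from y. Everything outside the kernel is ordinary sequential C++ running on the CPU.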
CUDA Parallel Architecture and Programming Model
The CUDA parallel hardware architecture is accompanied by the CUDA parallel programming model, which provides a set of abstractions for expressing fine-grained and coarse-grained data and task parallelism. The programmer can express this parallelism in high-level languages such as C, C++, and Fortran, or through driver APIs such as OpenCL and DirectX 11 Compute.
The CUDA parallel programming model guides programmers to partition a problem into coarse sub-problems that can be solved independently in parallel. Fine-grained parallelism within each sub-problem is then expressed so that the sub-problem can be solved cooperatively in parallel. The CUDA GPU architecture and the corresponding CUDA parallel computing model are now widely deployed, with hundreds of applications and nearly 1,000 published research papers.
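One way to picture this two-level decomposition is a sum over an array, sketched below in CUDA C++. Each thread block is a coarse sub-problem that sums its own slice independently, while the threads inside a block cooperate through shared memory to produce that slice's partial sum. The names block_sum_kernel, sum_on_gpu, and kThreadsPerBlock are hypothetical, and the tree reduction shown is only one of several possible strategies.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreadsPerBlock = 256;  // assumed; must be a power of two here

// Coarse grain: each block handles its own slice of the input independently.
// Fine grain: threads within a block cooperate through shared memory.
__global__ void block_sum_kernel(const float* in, float* block_sums, int n) {
    __shared__ float partial[kThreadsPerBlock];

    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    partial[tid] = (i < n) ? in[i] : 0.0f;  // pad the last block with zeros
    __syncthreads();

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            partial[tid] += partial[tid + stride];
        }
        __syncthreads();
    }

    if (tid == 0) {
        block_sums[blockIdx.x] = partial[0];
    }
}

// Host wrapper (hypothetical): launches one block per slice, then adds the
// per-block results on the CPU. Returns false if any CUDA call fails.
bool sum_on_gpu(const std::vector<float>& data, float* result) {
    const int n = static_cast<int>(data.size());
    if (n == 0) { *result = 0.0f; return true; }
    const int blocks = (n + kThreadsPerBlock - 1) / kThreadsPerBlock;

    float* d_in = nullptr;
    float* d_sums = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&d_sums, blocks * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }

    std::vector<float> sums(blocks);
    bool ok = cudaMemcpy(d_in, data.data(), n * sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        block_sum_kernel<<<blocks, kThreadsPerBlock>>>(d_in, d_sums, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(sums.data(), d_sums, blocks * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);
    cudaFree(d_sums);
    if (!ok) return false;

    float total = 0.0f;
    for (float s : sums) total += s;  // sequential final step on the CPU
    *result = total;
    return true;
}
```

Blocks never communicate with each other in this sketch, which is what lets the hardware schedule them in any order across the available cores. Only the threads inside a block synchronize, using __syncthreads().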
GPU Computing With CUDA
NVIDIA CUDA technology leverages the massively parallel processing power of NVIDIA GPUs. The CUDA architecture is a revolutionary parallel computing architecture that brings the performance of NVIDIA’s world-renowned graphics processor technology to general-purpose GPU computing. Applications that run on the CUDA architecture can take advantage of an installed base of over one hundred million CUDA-enabled GPUs in desktop and notebook computers, professional workstations, and supercomputer clusters.
With the CUDA architecture and tools, developers are achieving dramatic speedups in fields such as medical imaging and natural resource exploration, and creating breakthrough applications in areas such as image recognition and real-time HD video playback and encoding. CUDA enables this performance through standard APIs such as the soon-to-be-released OpenCL and DirectX Compute, and through high-level programming languages such as C/C++, Fortran, Java, Python, and the Microsoft .NET Framework.
CUDA: The Developer's View
The CUDA package includes three important components: the CUDA Driver API (also known as “Low-Level API”), the CUDA toolkit (the actual development environment including runtime libraries) and a Software Development Kit (CUDA SDK) with code examples.
The CUDA toolkit is essentially a C development environment. It includes the compiler (nvcc), an update of the PathScale C compiler, optimized FFT and BLAS libraries, a visual profiler (cudaprof), a gdb-based debugger (cuda-gdb), shared libraries for the runtime environment for CUDA programs (the “Runtime API”), and comprehensive documentation, including a developer’s manual.
The CUDA Developer SDK includes examples with source code for matrix calculations, pseudo-random number generators, image convolution, wavelet calculations, and more.
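To make the Runtime API a little more concrete, the sketch below uses a few of its query calls to report the installed driver and runtime versions and to list the available devices. The helper names query_cuda_versions and print_devices are assumptions made for illustration; the cuda* functions themselves come from the CUDA runtime.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Reads the driver and runtime versions. CUDA encodes each version as
// 1000 * major + 10 * minor. Returns false if either query fails.
bool query_cuda_versions(int* driver_version, int* runtime_version) {
    return cudaDriverGetVersion(driver_version) == cudaSuccess
        && cudaRuntimeGetVersion(runtime_version) == cudaSuccess;
}

// Prints the name and multiprocessor count of every CUDA device.
// Returns the number of devices, or -1 if a query fails.
int print_devices() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;

    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, d) != cudaSuccess) return -1;
        std::printf("device %d: %s, %d multiprocessors\n",
                    d, prop.name, prop.multiProcessorCount);
    }
    return count;
}
```

Saved in a file such as app.cu (a hypothetical name) together with a main function that calls these helpers, this would typically be built with a command along the lines of nvcc -o app app.cu.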








