The NVIDIA® Tesla® P100 GPU Accelerator for PCIe is a dual-slot, 10.5-inch PCI Express Gen3 card with a single NVIDIA® Pascal™ GP100 graphics processing unit (GPU). It uses a passive heat sink for cooling, which requires system airflow to operate the card properly within its thermal limits. The Tesla P100 PCIe supports double-precision (FP64), single-precision (FP32), and half-precision (FP16) compute tasks, unified virtual memory, and a page migration engine. For performance optimization, the NVIDIA GPU Boost™ feature is supported: by adjusting the GPU clock dynamically, the card achieves maximum performance within its power cap limit.
Tesla P100 PCIe boards are shipped with ECC enabled by default to protect the GPU's memory interface and the on-board memories. ECC protects the memory interface by detecting single-bit, double-bit, and all odd-bit errors; the GPU replays any memory transaction that has an ECC error until the data transfer is error-free. ECC protects the DRAM content by correcting single-bit errors and detecting double-bit errors; there is no replay associated with DRAM ECC errors. The Tesla P100 PCIe with HBM2 memory has native support for ECC and incurs no ECC overhead in either memory capacity or bandwidth. For more information on compute capabilities, HBM2, unified virtual memory, and the page migration engine, visit the official NVIDIA website.
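As a quick way to confirm how a board is configured, the ECC state and memory size can be read at runtime through the CUDA runtime API. The sketch below is illustrative only, assuming a system with the CUDA runtime installed; it simply queries cudaGetDeviceProperties for each visible device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess || count == 0) {
        std::printf("No CUDA device found: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop{};
        cudaGetDeviceProperties(&prop, dev);
        // Report name, compute capability, memory size, and whether ECC is on.
        std::printf("Device %d: %s (compute capability %d.%d)\n",
                    dev, prop.name, prop.major, prop.minor);
        std::printf("  Global memory : %.1f GB\n",
                    prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
        std::printf("  ECC enabled   : %s\n", prop.ECCEnabled ? "yes" : "no");
    }
    return 0;
}
```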
Exponential Performance Leap with Pascal Architecture
The NVIDIA Pascal™ architecture enables the Tesla P100 to deliver superior performance for HPC and hyperscale workloads. With more than 21 teraflops of FP16 performance, Pascal is optimized to drive exciting new possibilities in deep learning applications. Pascal also delivers over 5 teraflops of double-precision and over 10 teraflops of single-precision performance for HPC workloads.
Unprecedented Efficiency with CoWoS with HBM2
The Tesla P100 tightly integrates compute and data on the same package by adding CoWoS® (Chip-on-Wafer-on-Substrate) with HBM2 technology to deliver 3X memory performance over the NVIDIA Maxwell™ architecture. This provides a generational leap in time-to-solution for data-intensive applications.
Simpler Programming with Page Migration Engine
The Page Migration Engine frees developers to focus more on tuning for computing performance and less on managing data movement. Applications can now scale beyond the GPU's physical memory size to a virtually limitless amount of memory.
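To illustrate the programming model this enables, here is a minimal sketch using CUDA managed memory (cudaMallocManaged). The kernel, names, and sizes are hypothetical and the allocation is kept small for brevity, but on Pascal-class GPUs the same pattern can be used for allocations larger than the GPU's physical memory, with pages migrated on demand.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Simple kernel: scale every element in place.
__global__ void scale(float *data, size_t n, float factor) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    // On Pascal, managed allocations may exceed the GPU's physical memory;
    // the Page Migration Engine moves pages on demand as they are touched.
    const size_t n = 1 << 24;                        // ~16M floats, kept small here
    float *data = nullptr;
    cudaMallocManaged(&data, n * sizeof(float));

    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;   // touched first on the CPU

    const int block = 256;
    const int grid  = (int)((n + block - 1) / block);
    scale<<<grid, block>>>(data, n, 2.0f);           // pages migrate to the GPU
    cudaDeviceSynchronize();

    std::printf("data[0] = %f\n", data[0]);          // pages migrate back on access
    cudaFree(data);
    return 0;
}
```

No explicit cudaMemcpy calls appear anywhere: the single managed pointer is valid on both the CPU and the GPU, which is the data-movement bookkeeping the Page Migration Engine takes off the developer's plate.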
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.
In GPU-accelerated applications, the sequential part of the workload runs on the CPU – which is optimized for single-threaded performance – while the compute-intensive portion of the application runs on thousands of GPU cores in parallel. When using CUDA, developers program in popular languages such as C, C++, Fortran, Python, and MATLAB and express parallelism through extensions in the form of a few basic keywords.
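As a rough illustration of those extensions, the sketch below is a minimal vector addition; the names and sizes are arbitrary, but it shows the handful of keywords involved: __global__ marks code that runs on the GPU, blockIdx/threadIdx identify a thread, and the <<<grid, block>>> syntax launches the kernel across many threads.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// __global__ marks a function that runs on the GPU and is launched from the host.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per element
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *ha = (float*)malloc(bytes), *hb = (float*)malloc(bytes), *hc = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // <<<grid, block>>> is the launch syntax: thousands of threads run vecAdd in parallel.
    const int block = 256, grid = (n + block - 1) / block;
    vecAdd<<<grid, block>>>(da, db, dc, n);
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);

    std::printf("c[42] = %f\n", hc[42]);             // expect 126.0
    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

Everything else is ordinary C/C++; the only CUDA-specific parts are the kernel qualifier, the built-in thread indices, the launch syntax, and the runtime calls for memory management.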
The CUDA Toolkit from NVIDIA provides everything you need to develop GPU-accelerated applications, including GPU-accelerated libraries, a compiler, development tools, and the CUDA runtime.
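As one example of using a Toolkit library rather than writing kernels by hand, the sketch below calls cuBLAS to run a SAXPY (y = αx + y) on the GPU; the sizes and values are arbitrary placeholders.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main() {
    const int n = 1 << 20;
    float *hx = (float*)malloc(n * sizeof(float));
    float *hy = (float*)malloc(n * sizeof(float));
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx, *dy;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));

    cublasHandle_t handle;
    cublasCreate(&handle);
    cublasSetVector(n, sizeof(float), hx, 1, dx, 1);
    cublasSetVector(n, sizeof(float), hy, 1, dy, 1);

    const float alpha = 3.0f;
    cublasSaxpy(handle, n, &alpha, dx, 1, dy, 1);    // y = alpha * x + y on the GPU
    cublasGetVector(n, sizeof(float), dy, 1, hy, 1);

    std::printf("y[0] = %f\n", hy[0]);               // expect 5.0
    cublasDestroy(handle);
    cudaFree(dx); cudaFree(dy);
    free(hx); free(hy);
    return 0;
}
```

A file like this would typically be built with the Toolkit's nvcc compiler and linked against the library, e.g. nvcc saxpy.cu -lcublas.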
Performance Specification for NVIDIA Tesla P100 Accelerators
| | P100 for PCIe-Based Servers | P100 for NVLink-Optimized Servers |
|---|---|---|
| Double-Precision Performance | 4.7 teraflops | 5.3 teraflops |
| Single-Precision Performance | 9.3 teraflops | 10.6 teraflops |
| Half-Precision Performance | 18.7 teraflops | 21.2 teraflops |
| NVIDIA NVLink™ Interconnect Bandwidth | – | 160 GB/s |
| PCIe x16 Interconnect Bandwidth | 32 GB/s | 32 GB/s |
| CoWoS HBM2 Stacked Memory Capacity | 16 GB or 12 GB | 16 GB |
| CoWoS HBM2 Stacked Memory Bandwidth | 732 GB/s or 549 GB/s | 732 GB/s |

Both variants offer enhanced programmability with the Page Migration Engine and ECC protection for reliability, and both are server-optimized for data center deployment.