The NVIDIA® Tesla® P4 is a single-slot, low-profile, 6.6-inch PCI Express Gen3 GPU accelerator built around an NVIDIA® Pascal™ graphics processing unit (GPU). The Tesla P4 has 8 GB of GDDR5 memory and a 75 W maximum power limit. It is offered as a 75 W or 50 W passively cooled board that requires system airflow to keep the card operating within thermal limits. The Tesla P4 features optimized INT8 instructions aimed at deep learning inference computations. As a result, it delivers 22 TOPS (tera-operations per second) of inference performance, enabling smart, responsive artificial intelligence (AI) based services. For performance optimization, the board utilizes NVIDIA GPU Boost™, which dynamically adjusts the GPU clock to maximize performance within thermal limits.
Responsive Experience with Real-Time Inference
Responsiveness is key to user engagement for services such as interactive speech, visual search, Internet of Things (IoT), and video recommendations. As models increase in accuracy and complexity, CPUs are no longer capable of delivering a responsive user experience. The Tesla P4 delivers 22 TOPS of inference performance with INT8 operations.
50x Higher Throughput to Keep Up with Expanding Workloads
The volume of data generated every day in the form of sensor logs, images, videos, and records is economically impractical to process on CPUs. Volta-powered Tesla V100 GPUs give data centers a dramatic boost in throughput for deep learning workloads to extract intelligence from this tsunami of data. A server with a single Tesla V100 can replace up to 50 CPU-only servers for deep learning inference workloads, so you get dramatically higher throughput with lower acquisition cost.
A Dedicated Decode Engine for New AI-based Video Services
The Tesla P4 GPU can analyze up to 39 HD video streams in real time, powered by a dedicated hardware-accelerated decode engine that works in parallel with the NVIDIA® CUDA® cores performing inference. By integrating deep learning into the video pipeline, customers can offer new levels of smart, innovative video services that facilitate video search and other video-related services.
Unprecedented Efficiency for Low-Power Scale-out Servers
The ultra-efficient Tesla P4 GPU accelerates density-optimized scale-out servers with its small form factor and 50 W/75 W power footprint. It delivers an incredible 52X better energy efficiency than CPUs for deep learning inference workloads, so hyperscale customers can scale within their existing infrastructure and serve the exponential growth in demand for AI-based applications.
Faster Deployment With NVIDIA TensorRT™ and DeepStream SDK
NVIDIA TensorRT is a high-performance neural network inference engine for production deployment of deep learning applications. It includes libraries that streamline deep learning models for production deployment, taking neural networks usually trained in 32-bit or 16-bit floating point and optimizing them for reduced-precision INT8 operations on Tesla P4, or FP16 on Tesla V100. The NVIDIA DeepStream SDK taps into the power of Tesla GPUs to simultaneously decode and analyze video streams.
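As a rough illustration of the reduced-precision idea that TensorRT automates (this is not the TensorRT API, just a hand-written sketch of per-tensor symmetric INT8 quantization), a minimal CUDA kernel mapping FP32 activations to INT8 with a single scale factor might look like:

```cuda
#include <cstdio>
#include <cstdint>
#include <cmath>
#include <cuda_runtime.h>

// Quantize FP32 values to INT8 with one per-tensor scale:
//   q = clamp(round(x / scale), -127, 127)
__global__ void quantize_int8(const float* x, int8_t* q, float scale, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = roundf(x[i] / scale);
        v = fminf(fmaxf(v, -127.0f), 127.0f);
        q[i] = static_cast<int8_t>(v);
    }
}

int main() {
    const int n = 1024;
    float h_x[n];
    for (int i = 0; i < n; ++i) h_x[i] = sinf(i * 0.01f);  // dummy activations

    // Per-tensor symmetric scale chosen from the max absolute value
    // (TensorRT instead derives scales from a calibration dataset).
    float amax = 0.0f;
    for (int i = 0; i < n; ++i) amax = fmaxf(amax, fabsf(h_x[i]));
    float scale = amax / 127.0f;

    float* d_x; int8_t* d_q;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_q, n * sizeof(int8_t));
    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);

    quantize_int8<<<(n + 255) / 256, 256>>>(d_x, d_q, scale, n);

    int8_t h_q[n];
    cudaMemcpy(h_q, d_q, n * sizeof(int8_t), cudaMemcpyDeviceToHost);
    printf("x[100] = %f -> q[100] = %d (scale %f)\n", h_x[100], h_q[100], scale);

    cudaFree(d_x);
    cudaFree(d_q);
    return 0;
}
```

The INT8 values can then feed the P4's fast integer dot-product instructions; the accuracy cost depends on how well the scale captures the activation distribution, which is exactly what TensorRT's calibration step is for.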
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.
In GPU-accelerated applications, the sequential part of the workload runs on the CPU, which is optimized for single-threaded performance, while the compute-intensive portion of the application runs on thousands of GPU cores in parallel. When using CUDA, developers program in popular languages such as C, C++, Fortran, Python, and MATLAB, and express parallelism through extensions in the form of a few basic keywords.
The CUDA Toolkit from NVIDIA provides everything you need to develop GPU-accelerated applications. The CUDA Toolkit includes GPU-accelerated libraries, a compiler, development tools and the CUDA runtime.
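A minimal sketch of this model is the classic CUDA vector addition: the host (CPU) code handles setup and data movement sequentially, while a kernel marked with the `__global__` keyword runs once per element across thousands of GPU threads. The array size and launch configuration below are illustrative choices, not values from this document.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device code: each GPU thread adds exactly one element of the arrays.
__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host (CPU) setup: the sequential part of the workload.
    float *h_a = new float[n], *h_b = new float[n], *h_c = new float[n];
    for (int i = 0; i < n; ++i) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

    // Allocate device memory and copy the inputs across PCI Express.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements in parallel.
    vec_add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);

    // Copy the result back; cudaMemcpy also synchronizes with the kernel.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[100] = %f\n", h_c[100]);  // 100 + 200

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    delete[] h_a; delete[] h_b; delete[] h_c;
    return 0;
}
```

Compiled with the toolkit's `nvcc` compiler, the `<<<blocks, threads>>>` launch syntax and `__global__` qualifier are the "few basic keywords" the text refers to; everything else is ordinary C++.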
Performance Specifications for NVIDIA Tesla P4, P40 and V100 Accelerators
| | Tesla V100: The Universal Datacenter GPU | Tesla P4 for Ultra-Efficient Scale-Out Servers | Tesla P40 for Inference Throughput Servers |
|---|---|---|---|
| Single-Precision Performance (FP32) | 14 teraflops (PCIe), 15.7 teraflops (SXM2) | 5.5 teraflops | 12 teraflops |
| Half-Precision Performance (FP16) | 112 teraflops (PCIe), 125 teraflops (SXM2) | — | — |
| Integer Operations (INT8) | — | 22 TOPS* | 47 TOPS* |
| GPU Memory | 16 GB HBM2 | 8 GB GDDR5 | 24 GB GDDR5 |
| Memory Bandwidth | 900 GB/s | 192 GB/s | 346 GB/s |
| System Interface/Form Factor | Dual-slot, full-height PCI Express; SXM2/NVLink | Low-profile PCI Express | Dual-slot, full-height PCI Express |
| Max Power | 250 W (PCIe), 300 W (SXM2) | 50 W/75 W | 250 W |
| Hardware-Accelerated Video Engine | — | 1x decode engine, 2x encode engines | 1x decode engine, 2x encode engines |

*Tera-operations per second with boost clock enabled.