The NVIDIA® Tesla® P40 GPU accelerator works with NVIDIA Quadro vDWS software and is the first system to combine an enterprise-grade visual computing platform for simulation, HPC rendering, and design with virtual applications, desktops, and workstations. This gives organizations the freedom to virtualize both complex visualization and compute (CUDA and OpenCL) workloads. The NVIDIA® Tesla® P40 taps into the industry-leading NVIDIA Pascal™ architecture to deliver up to twice the professional graphics performance of the NVIDIA® Tesla® M60 (Refer to Performance Graph). With 24 GB of frame buffer and 24 NVENC encoder sessions, it supports 24 virtual desktops (1 GB profile) or 12 virtual workstations (2 GB profile), providing the best end-user scalability per GPU. This powerful GPU also supports eight different user profiles, so virtual GPU resources can be efficiently provisioned to meet the needs of the user. And it’s available in a wide variety of industry-standard servers.
Virtualize any Workload, Anywhere
With NVIDIA virtual GPU software and the NVIDIA Tesla P40, organizations can now virtualize high-end applications with large, complex datasets for rendering and simulation, as well as modern business applications. Resource allocation ensures that users have the right GPU acceleration for the task at hand. NVIDIA software shares the power of Tesla P40 GPUs across multiple virtual workstations, desktops, and apps. This means you can deliver an immersive user experience for everyone from office workers to mobile professionals to designers through virtual workspaces with improved management, security, and productivity.
Exceptional User Experience
Get the ultimate user experience for any workload or vGPU profile. NVIDIA Quadro vDWS software with the Tesla P40 GPU supports compute workloads (CUDA and OpenCL) on every vGPU, enabling professional design and engineering workflows at peak performance. The Tesla P40 delivers up to 2X the graphics performance of the M60 (Refer to Performance Graph). Users can count on consistent performance with the new resource scheduler, which provides deterministic QoS and eliminates the "noisy neighbor" problem.
Optimal Management and Monitoring
NVIDIA tools give you vGPU visibility at the host and guest level, with application-level monitoring capabilities. This lets IT intelligently design, manage, and support the end-user experience. End-to-end management and monitoring also deliver real-time insight into GPU performance. And integration with VMware vRealize Operations (vROps), Citrix Director, and XenCenter puts flexibility and control in the palm of your hand.
Flexible GPU Infrastructure
Support up to 50% more users per Pascal GPU than on a single Maxwell GPU, for scaling high-performance virtual graphics and compute. More granular user profiles allow more precise provisioning of vGPU resources, and larger profile sizes (up to 3X larger GPU framebuffer than the M60) support your most demanding users. The P40 brings utilization and flexibility to your NVIDIA Quadro vDWS solution, helping you drive down overall TCO.
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
In GPU-accelerated applications, the sequential part of the workload runs on the CPU, which is optimized for single-threaded performance, while the compute-intensive portion of the application runs on thousands of GPU cores in parallel. When using CUDA, developers program in popular languages such as C, C++, Fortran, Python, and MATLAB, and express parallelism through extensions in the form of a few basic keywords.
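As a minimal sketch of those extensions, the example below adds two vectors: the `__global__` keyword marks the function that runs on the GPU, and the `<<<blocks, threads>>>` syntax launches it across thousands of parallel threads while the setup code runs sequentially on the CPU. Names and sizes are illustrative, not from this datasheet.

```cuda
#include <cstdio>
#include <cstdlib>

// Kernel: each GPU thread computes one element of the result.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Sequential part: allocate and initialize host data on the CPU.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Allocate device memory and copy the inputs to the GPU.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Compute-intensive part: launch the kernel across many GPU threads.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Copy the result back to the host and inspect one element.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

Built with the CUDA Toolkit's compiler, e.g. `nvcc vec_add.cu -o vec_add`; the toolkit's runtime supplies `cudaMalloc`, `cudaMemcpy`, and the launch machinery.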
The CUDA Toolkit from NVIDIA provides everything you need to develop GPU-accelerated applications. The CUDA Toolkit includes GPU-accelerated libraries, a compiler, development tools and the CUDA runtime.
Performance Specifications for NVIDIA Tesla P4, P40 and V100 Accelerators
| | Tesla V100: The Universal Datacenter GPU | Tesla P4 for Ultra-Efficient Scale-Out Servers | Tesla P40 for Inference Throughput Servers |
|---|---|---|---|
| Single-Precision Performance (FP32) | 14 teraflops (PCIe) / 15.7 teraflops (SXM2) | 5.5 teraflops | 12 teraflops |
| Half-Precision Performance (FP16) | 112 teraflops (PCIe) / 125 teraflops (SXM2) | — | — |
| Integer Operations (INT8) | — | 22 TOPS* | 47 TOPS* |
| GPU Memory | 16 GB HBM2 | 8 GB | 24 GB |
| Memory Bandwidth | 900 GB/s | 192 GB/s | 346 GB/s |
| System Interface/Form Factor | Dual-Slot, Full-Height PCI Express Form Factor / SXM2 with NVLink | Low-Profile PCI Express Form Factor | Dual-Slot, Full-Height PCI Express Form Factor |
| Max Power | 250 W (PCIe) / 300 W (SXM2) | 50 W / 75 W | 250 W |
| Hardware-Accelerated Video Engine | — | 1x Decode Engine, 2x Encode Engines | 1x Decode Engine, 2x Encode Engines |

*Tera-Operations per Second with Boost Clock Enabled