Supermicro SYS-2029GP-TR 2U 6xNVIDIA Tesla V100/P40/P4 GPU Intel Scalable SP Xeon 2S 2TB 8×2.5 SAS/SATA SIOM R2000W BigData AI Deep Learning Server

X11 Servers Featuring New Intel Skylake Scalable Xeon® Processors

Supermicro’s new X11 servers are engineered to unleash the full performance and rich feature set of the new Intel® Xeon® Scalable processor family, supporting more cores and higher TDP envelopes of 205 watts and above, an increased number of memory channels with higher bandwidth, more PCI-E 3.0 lanes, 100G/40G/25G/10G Ethernet, 100G EDR InfiniBand (on select servers) and integrated Intel® Omni-Path Architecture networking fabrics. The elevated compute performance, density, I/O capacity, and efficiency are coupled with the industry’s most comprehensive support for NVMe NAND Flash and Intel® Optane SSDs for unprecedented application responsiveness and agility. For exact server specifications, please see the highlights below and also refer to the detailed technical specifications.

“At Supermicro, we understand that customers need the newest technologies as early as possible to drive leading performance and improved TCO. With the industry’s strongest and broadest product line, our designs not only take full advantage of Xeon Scalable Processors’ new features such as three UPI, faster DIMMs and more core count per socket, but they also fully support NVMe through unique non-blocking architectures to achieve the best data bandwidth and IOPS.  For instance, one Supermicro 2U storage server can deliver over 16 million IOPS!”

“Supermicro designs the most application-optimized GPU systems and offers the widest selection of GPU-optimized servers and workstations in the industry. Our high performance computing solutions enable deep learning, engineering and scientific fields to scale out their compute clusters to accelerate their most demanding workloads and achieve fastest time-to-results with maximum performance per watt, per square foot and per dollar. With our latest innovations incorporating the new NVIDIA V100 PCI-E and V100 SXM2 GPUs in performance-optimized 1U and 4U systems with next-generation NVLink, our customers can accelerate their applications and innovations to help solve the world’s most complex and challenging problems.”  

Charles Liang, President and CEO of Supermicro

Server Systems Management

Supermicro Server Manager (SSM) provides capabilities to monitor the health of server components, including memory, hard drives and RAID controllers. It enables datacenter administrators to monitor and manage power usage across all Supermicro servers, allowing users to maximize their CPU payload while mitigating the risk of a tripped circuit. Firmware upgrades on Supermicro servers now take just a couple of clicks: administrators can mount an ISO image on multiple servers and reboot the servers with those images. The tool also provides pre-defined reports and many more features that make managing Supermicro servers simpler. Download the SSM_brochure for more info, or download the Supermicro SuperDoctor® device monitoring and management software.
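Health polling of the kind SSM performs is typically built on IPMI sensor readings. As an illustrative sketch (not SSM's actual implementation, which is not public), the snippet below parses `ipmitool sdr`-style output with the Python standard library and flags unhealthy sensors; the sample readings and the `PS1 Status` fault are hypothetical:

```python
# Illustrative sketch only: parses "ipmitool sdr"-style output
# (the sample below is hypothetical) and flags any sensor whose
# status field is anything other than "ok".

SAMPLE_SDR = """\
CPU1 Temp        | 45 degrees C      | ok
CPU2 Temp        | 47 degrees C      | ok
FAN1             | 4200 RPM          | ok
12V              | 12.10 Volts       | ok
PS1 Status       | 0x01              | cr
"""

def parse_sdr(text):
    """Split each 'name | reading | status' row into a tuple."""
    rows = []
    for line in text.strip().splitlines():
        name, reading, status = (field.strip() for field in line.split("|"))
        rows.append((name, reading, status))
    return rows

def unhealthy(rows):
    """Return (name, status) for sensors not reporting 'ok'."""
    return [(name, status) for name, reading, status in rows if status != "ok"]

if __name__ == "__main__":
    readings = parse_sdr(SAMPLE_SDR)
    print(unhealthy(readings))  # -> [('PS1 Status', 'cr')]
```

In practice the raw text would come from `ipmitool` pointed at the server's BMC rather than a hard-coded sample.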

Technical Specifications

Product SKUs
SYS-2029GP-TR
  • SuperServer 2029GP-TR (Black)
Motherboard
Super X11DPG-SN
Processor/Cache
CPU
  • Dual Socket P (LGA 3647)
  • Intel® Xeon® Scalable Processors,
    Dual UPI up to 10.4GT/s
  • Support CPU TDP 70-205W
Cores
  • Up to 28 Cores with Intel® HT Technology
GPU Support
System Memory
Memory Capacity
  • 16 DIMM slots
  • Up to 2TB ECC 3DS LRDIMM, 2TB ECC RDIMM, DDR4 up to 2666MHz
Memory Type
  • 2666/2400/2133MHz ECC DDR4 SDRAM
On-Board Devices
Chipset
  • Intel® C621 chipset
SATA
  • SATA3 (6Gbps) with RAID 0, 1, 5, 10
IPMI
  • Support for Intelligent Platform Management Interface v.2.0
  • IPMI 2.0 with virtual media over LAN and KVM-over-LAN support
Graphics
  • ASPEED AST2500 BMC
Input / Output
SATA
  • 2 SATA3 (6Gbps) ports, 2 miniSAS ports
LAN
  • 1 RJ45 Dedicated IPMI LAN port
USB
  • 2 USB 3.0 ports (rear)
Video
  • 1 VGA port
Serial Header
  • 1 Fast UART 16550 header
System BIOS
BIOS Type
  • AMI 256Mb SPI Flash ROM
Management
Software
Power Configurations
  • ACPI / APM Power Management
PC Health Monitoring
CPU
  • Voltage monitoring for CPU cores, chipset and memory
  • 4+1 Phase-switching voltage regulator
FAN
  • Fans with tachometer monitoring
  • Status monitor for speed control
  • Pulse Width Modulated (PWM) fan connectors
Temperature
  • Monitoring for CPU and chassis environment
  • Thermal Control for fan connectors
Chassis
Form Factor
  • 2U Rackmount
Model
  • CSE-218GH-R2K03B
Dimensions and Weight
Height
  • 3.5″ (89mm)
Width
  • 17.2″ (437mm)
Depth
  • 31″ (787mm)
Package
  • 26.5″ (H) x 11″ (W) x 44.5″ (D)
Weight
  • Net Weight: 39.5 lbs (17.9 kg)
  • Gross Weight: 54 lbs (24.5 kg)
Available Color
  • Black
Front Panel
Buttons
  • Power On/Off button
  • System Reset button
LEDs
  • Power LED
  • Hard drive activity LED
  • Network activity LEDs
  • System Overheat LED / Fan fail LED /
    UID LED
Expansion Slots
PCI-Express
  • 6 PCI-E 3.0 x16 slots
  • 1 PCI-E 3.0 x8 (in x16, LP) slot
Drive Bays
Hot-swap
  • 10 Hot-swap 2.5″ drive bays
System Cooling
Fans
  • 5x 8cm Heavy duty fans with optimal fan speed control
Power Supply
2000W 1U Redundant Power Supplies with PMBus
Total Output Power
  • 1000W/1800W/1980W/2000W
Dimension
(W x H x L)
  • 73.5 x 40 x 265 mm
Input
  • 100-120Vac / 12.5-10.5A / 50-60Hz
  • 200-220Vac / 10-9.5A / 50-60Hz
  • 220-230Vac / 10-9.8A / 50-60Hz
  • 230-240Vac / 10-9.8A / 50-60Hz
  • 200-240Vac / 10-9.8A (UL/cUL only)
  • 200-240Vdc / 10-9.1A (CCC only)
+12V
  • Max: 83.3A / Min: 0.1A (100-120Vac)
  • Max: 150A / Min: 0.1A (200-220Vac)
  • Max: 165A / Min: 0.1A (220-230Vac)
  • Max: 166.7A / Min: 0.1A (230-240Vac)
  • Max: 166.7A / Min: 0.1A (200-240Vac) (UL/cUL only)
12V SB
  • Max: 3.5A / Min: 0A
Output Type
  • 25 Pairs Gold Finger Connector
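The +12V maximum-current figures above follow directly from the output-power tiers at a 12 V rail (I = P / V); the 0.1 A minimums are from the table, not derived. A quick sanity check:

```python
# Sanity-check the +12V maximum-current figures against the
# power-supply output tiers listed above: I = P / V at a 12 V rail.
tiers_w = {"100-120Vac": 1000, "200-220Vac": 1800,
           "220-230Vac": 1980, "230-240Vac": 2000}

for input_range, watts in tiers_w.items():
    amps = watts / 12
    print(f"{input_range}: {watts} W -> {amps:.1f} A")
# 1000 W -> 83.3 A, 1800 W -> 150.0 A, 1980 W -> 165.0 A,
# 2000 W -> 166.7 A, matching the +12V maximums in the table.
```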
Certification
  • Platinum Level Certified (94%+ efficiency)
Operating Environment
RoHS
  • RoHS Compliant
Environmental Spec.
  • Operating Temperature:
    10°C ~ 35°C (50°F ~ 95°F)
  • Non-operating Temperature:
    -40°C to 60°C (-40°F to 140°F)
  • Operating Relative Humidity:
    8% to 90% (non-condensing)
  • Non-operating Relative Humidity:
    5% to 95% (non-condensing)

Supermicro SYS-1029GP-TR 1U 3xNVIDIA Tesla V100/P40 GPU Skylake Scalable SP Xeon 2S 2TB 2×2.5 SAS/SATA SIOM R1600W BigData CAD O&G Deep Learning Server


Technical Specifications

Product SKUs
SYS-1029GP-TR
  • SuperServer 1029GP-TR (Black)
Motherboard
Super X11DPG-SN
Processor/Cache
CPU
  • Dual Socket P (LGA 3647)
  • Intel® Xeon® Scalable Processors,
    3 UPI up to 10.4GT/s
  • Support CPU TDP 70-165W*
Cores
  • Up to 28 Cores with Intel® HT Technology
Note: * 165W CPUs are supported at ambient temperatures up to 25°C.
GPU Support
System Memory
Memory Capacity
  • 16 DIMM slots
  • Up to 2TB ECC 3DS LRDIMM, 2TB ECC RDIMM, DDR4 up to 2666MHz
Memory Type
  • 2666/2400/2133MHz ECC DDR4 SDRAM
On-Board Devices
Chipset
  • Intel® C621 chipset
SATA
  • SATA3 (6Gbps) with RAID 0, 1, 5, 10
IPMI
  • Support for Intelligent Platform Management Interface v.2.0
  • IPMI 2.0 with virtual media over LAN and KVM-over-LAN support
Graphics
  • ASPEED AST2500 BMC
Input / Output
SATA
  • 2 SATA3 (6Gbps) ports, 2 miniSAS ports
LAN
  • 1 RJ45 Dedicated IPMI LAN port
USB
  • 2 USB 3.0 ports (rear)
Video
  • 1 VGA port
Serial Header
  • 1 Fast UART 16550 header
System BIOS
BIOS Type
  • AMI 32Mb SPI Flash ROM
Management
Software
Power Configurations
  • ACPI / APM Power Management
PC Health Monitoring
CPU
  • Voltage monitoring for CPU cores, chipset and memory
  • 4+1 Phase-switching voltage regulator
FAN
  • Fans with tachometer monitoring
  • Status monitor for speed control
  • Pulse Width Modulated (PWM) fan connectors
Temperature
  • Monitoring for CPU and chassis environment
  • Thermal Control for fan connectors
Chassis
Form Factor
  • 1U Rackmount
Model
  • CSE-118GH-R1K66B2
Dimensions and Weight
Height
  • 1.7″ (43mm)
Width
  • 17.2″ (437mm)
Depth
  • 30.6″ (777mm)
Weight
  • Net Weight: 45 lbs (15.9 kg)
  • Gross Weight: 58 lbs (21.8 kg)
Available Color
  • Black
Front Panel
Buttons
  • Power On/Off button
  • System Reset button
LEDs
  • Power LED
  • Hard drive activity LED
  • Network activity LEDs
  • System Overheat LED / Fan fail LED /
    UID LED
Expansion Slots
PCI-Express
  • 4 PCI-E 3.0 x16 (FHFL) slots
  • 1 PCI-E 3.0 x8 (in x16, LP) slot
Drive Bays
Hot-swap
  • 4 Hot-swap 2.5″ drive bays
System Cooling
Fans
  • 10 Heavy duty 4cm counter-rotating fans with air shroud & optimal fan speed control
Power Supply
1600W 1U Redundant Power Supplies with PMBus
Total Output Power
  • 1000W/1600W
Dimension
(W x H x L)
  • 73.5 x 40 x 265 mm
Input
  • 100-127Vac / 12.9A Max / 50-60Hz
  • 200-240Vac / 9.5A Max / 50-60Hz
+12V
  • Max: 82A / Min: 0.1A (100-127Vac)
  • Max: 132A / Min: 0.1A (200-240Vac)
12V SB
  • Max: 2A / Min: 0.2A
Output Type
  • 25 Pairs Gold Finger Connector
Certification
  • Platinum Level Certified (94%+ efficiency)
Operating Environment
RoHS
  • RoHS Compliant
Environmental Spec.
  • Operating Temperature:
    10°C ~ 35°C (50°F ~ 95°F)
  • Non-operating Temperature:
    -40°C to 60°C (-40°F to 140°F)
  • Operating Relative Humidity:
    8% to 90% (non-condensing)
  • Non-operating Relative Humidity:
    5% to 95% (non-condensing)

HPE ProLiant DL385 Gen10 878714-B21 AMD EPYC 7251 (Max 2) 16GB ECC E208i-a 8 SFF Max 3 AMD Radeon / NVIDIA V100 GPU Virtualization Deep Learning Server

Featuring New AMD EPYC™ 7000 Series Processors

HPE’s new AMD servers are engineered to unleash the full performance and rich feature set of the new AMD EPYC™ 7000 Series processor family, with more cores, more memory, more I/O, and more security. AMD EPYC powers your applications with up to 32 cores, 64 threads, 8 memory channels with up to 2 TB of memory per socket, and 128 PCIe 3.0 lanes, coupled with the industry’s first hardware-embedded x86 server security solution. With servers built on EPYC technology, cloud environments can drive greater scale and performance, virtualized datacenters can further increase consolidation ratios while delivering better-performing virtual machines, and Big Data and analytics environments can collect and analyze larger data sets much faster. High performance applications in research labs can solve complex problem sets in a significantly accelerated manner. EPYC, with all the critical compute, memory, I/O, and security resources brought together in the SoC in the right ratios, delivers industry-leading performance and enables lower TCO. With the flexibility to choose from 8 to 32 cores, EPYC enables you to deploy the right hardware platforms to meet your workload needs, from virtualized infrastructure to large-scale big-data and analytics platforms and legacy line-of-business applications. For exact server specifications, please see the highlights below and also refer to the detailed technical specifications.

Technical Specifications

Processor : AMD
Processor family : AMD EPYC™ 7000 Series
Processor cores available : 8 per processor
Processor cache : 32.00 MB L3
Processor : AMD EPYC™ 7251 (8 core, 2.1 GHz, 32 MB, 120W)
Processor number : 1 processor included
Processor speed : 2.1 GHz
Maximum memory : 4.0 TB with 128 GB DDR4 [2]
Memory slots : 32
Memory type : HPE DDR4 SmartMemory
Memory, standard : 16 GB (1x 16 GB) RDIMMs
Memory protection features : ECC
Drive type : 8 or 12 LFF SAS/SATA/SSD; 8, 10, 16, 18 or 24 SFF SAS/SATA/SSD; optional rear drives: 6 SFF, or 3 LFF plus 2 SFF; optional 24 SFF NVMe (NVMe support via Express Bay will limit maximum drive capacity)
Included hard drives : None ship standard, 8 SFF supported
Optical drive type : Optional; None ship standard
Infrastructure management : HPE iLO Standard with Intelligent Provisioning (embedded) and HPE OneView Standard (requires download) as standard; HPE iLO Advanced, HPE iLO Advanced Premium Security Edition, and HPE OneView Advanced as options
Power supply type : 1 HPE 500W Flexible Slot Power Supply
Expansion slots : 8, for detail descriptions reference the QuickSpecs
Network controller : HPE 1 Gb 331i Ethernet adapter 4-ports per controller
Storage controller : HPE Smart Array E208i-a SR Gen10 Controller
System fan features : Standard
Form factor : 2U
Warranty : 3/3/3 – Server Warranty includes three years of parts, three years of labor, three years of on-site support coverage. Additional information regarding worldwide limited warranty and technical support is available at: http://h20564.www2.hpe.com/hpsc/wc/public/home. Additional HPE support and service coverage for your product can be purchased locally. For information on availability of service upgrades and the cost for these service upgrades, refer to the HPE website at http://www.hpe.com/support.
Product dimensions (H x W x D) : 3.44 x 17.54 x 28.75 in
Weight : 32.6 lb
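The 4.0 TB maximum-memory figure in the specifications above follows directly from the slot count: 32 DIMM slots, each populated with a 128 GB DDR4 DIMM. A quick check:

```python
# Maximum memory for the DL385 Gen10 as listed above:
# 32 DIMM slots x 128 GB DDR4 DIMMs.
slots, dimm_gb = 32, 128
total_gb = slots * dimm_gb
print(total_gb, total_gb / 1024)  # 4096 GB -> 4.0 TB
```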

Dihuni OptiReady Supermicro 9029GP-TNVRT-V16-1 HGX-2 16x NVIDIA Tesla V100 SXM3 32GB NVLink2 GPU 2S Scalable Xeon NVMe 2x10GbE Deep Learning Server

NVSwitch and NVLink Performance

NVSwitch enables every GPU to communicate with every other GPU at full bandwidth of 2.4TB/sec to solve the largest of AI and HPC problems. Every GPU has full access to 0.5TB of aggregate HBM2 memory to handle the most massive of datasets. By enabling a unified server node, NVSwitch dramatically accelerates complex AI and HPC applications.

Designed for Next Generation AI

AI models are exploding in complexity and require large memory, multiple GPUs, and an extremely fast connection between the GPUs to work. With NVSwitch connecting all GPUs and unified memory, HGX-2 provides the power to handle these new models for faster training of advanced AI. A single HGX-2 replaces 300 CPU-powered servers, saving significant cost, space, and energy in the data center.
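The HGX-2 headline figures are consistent with sixteen V100 SXM3 GPUs: 16 × 32 GB of HBM2 gives the 0.5 TB of aggregate memory, 16 × 125 tensor TeraFLOPS gives 2 PetaFLOPS, and with each V100 exposing 6 NVLink links at 50 GB/s, half the total link bandwidth crossing the switch fabric matches the 2.4 TB/s figure. This is a back-of-envelope consistency check, not NVIDIA's official derivation:

```python
# Back-of-envelope check of the HGX-2 headline numbers.
gpus = 16
hbm2_gb = 32          # GB of HBM2 per V100 SXM3
tensor_tflops = 125   # Tensor Core deep learning TFLOPS per V100
nvlink_gbps = 6 * 50  # 6 NVLink links per GPU at 50 GB/s each

print(gpus * hbm2_gb)            # 512 GB  -> 0.5 TB aggregate memory
print(gpus * tensor_tflops)      # 2000 TFLOPS -> 2 PetaFLOPS
print(gpus * nvlink_gbps // 2)   # 2400 GB/s -> 2.4 TB/s across the fabric
```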

Ready for Max Density HPC

HPC applications require strong server nodes with the computing power to perform a massive number of calculations per second. Increasing the compute density of each node dramatically reduces the number of servers required, resulting in huge savings in cost, power, and space consumed in the data center. For HPC simulations, high-dimension matrix multiplication requires a processor to fetch data from many neighbors to facilitate computation, making GPUs connected by NVSwitch ideal. A single HGX-2 server replaces 60 CPU-only servers.


“Supermicro’s new SuperServer based on the HGX-2 platform will deliver more than double the performance of current systems, which will help enterprises address the rapidly expanding size of AI models that sometimes require weeks to train,” said Charles Liang, president and CEO of Supermicro. “Our new HGX-2 system will enable efficient training of complex models. It combines sixteen Tesla V100 32GB SXM3 GPUs connected via NVLink and NVSwitch to work as a unified 2 PetaFlop accelerator with half a terabyte of aggregate GPU memory to deliver unmatched compute power.”

Charles Liang, President and CEO of Supermicro



Dihuni Introduces Supermicro’s NVIDIA HGX-2 based 16 Tesla V100 32GB SXM3 GPU Server for Deep Learning, AI, HPC and IoT Predictive Analytics

We are pleased to announce our plans to introduce Supermicro’s upcoming NVIDIA® HGX-2 based cloud server platform, which the company describes as the world’s most powerful system for artificial intelligence (AI) and high-performance computing (HPC), capable of performing at 2 PetaFLOPS.

HPE NVIDIA Tesla V100 GPU 32GB HBM2 Volta CUDA PCIe for Accelerated Machine Deep Learning AI BigData Finance Oil Gas CAD HPC Physics Research

Double the Memory of the Previous-Generation V100

The Tesla V100 GPU, widely adopted by the world’s leading researchers, has received a 2x memory boost to handle the most memory-intensive deep learning and high performance computing workloads.

Now equipped with 32GB of memory, Tesla V100 GPUs will help data scientists train deeper and larger deep learning models that are more accurate than ever. They can also improve the performance of memory-constrained HPC applications by up to 50 percent compared with the previous 16GB version.

GroundBreaking Volta Architecture

By pairing CUDA Cores and Tensor Cores within a unified architecture, a single server with Tesla V100 GPUs can replace hundreds of commodity CPU servers for traditional HPC and deep learning.

Tensor Cores

Equipped with 640 Tensor Cores, Tesla V100 delivers 125 TeraFLOPS of deep learning performance. That’s 12X the Tensor FLOPS for DL training, and 6X the Tensor FLOPS for DL inference, compared to NVIDIA Pascal™ GPUs.

Next Generation NVLink

NVIDIA NVLink in Tesla V100 delivers 2X higher throughput compared to the previous generation. Up to eight Tesla V100 accelerators can be interconnected at up to 300 GB/s to unleash the highest application performance possible on a single server.

HBM2

With a combination of improved raw bandwidth of 900 GB/s and higher DRAM utilization efficiency at 95%, Tesla V100 delivers 1.5X higher memory bandwidth over Pascal GPUs as measured on STREAM.
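Both figures decompose simply: the 300 GB/s NVLink number is 6 links × 50 GB/s (bidirectional) per V100, and multiplying the 900 GB/s raw HBM2 bandwidth by the 95% DRAM utilization gives the deliverable bandwidth a STREAM-style benchmark would see. The 855 GB/s result is derived here for illustration, not quoted from NVIDIA:

```python
# V100 interconnect and memory-bandwidth figures from this section.
nvlink_links, gbps_per_link = 6, 50      # bidirectional GB/s per NVLink link
raw_hbm2_gbps, utilization = 900, 0.95   # raw bandwidth x DRAM efficiency

print(nvlink_links * gbps_per_link)      # 300 GB/s total NVLink bandwidth
print(raw_hbm2_gbps * utilization)       # 855.0 GB/s deliverable on STREAM
```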

Maximum Efficiency Mode

The new maximum efficiency mode allows data centers to achieve up to 40% higher compute capacity per rack within the existing power budget. In this mode, Tesla V100 runs at peak processing efficiency, providing up to 80% of the performance at half the power consumption.
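The perf-per-watt implication of that mode is easy to quantify: 80% of peak performance at 50% of the power is a 1.6× efficiency gain per GPU. (How much of that translates into rack-level capacity depends on what share of the rack's power budget the GPUs consume.)

```python
# Maximum efficiency mode: up to 80% of performance at half the power.
perf_fraction, power_fraction = 0.80, 0.50
print(perf_fraction / power_fraction)  # 1.6x performance per watt
```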

Programmability

Tesla V100 is architected from the ground up to simplify programmability. Its new independent thread scheduling enables finer-grain synchronization and improves GPU utilization by sharing resources among small jobs.

CUDA Ready

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.

In GPU-accelerated applications, the sequential part of the workload runs on the CPU – which is optimized for single-threaded performance – while the compute intensive portion of the application runs on thousands of GPU cores in parallel. When using CUDA, developers program in popular languages such as C, C++, Fortran, Python and MATLAB and express parallelism through extensions in the form of a few basic keywords.

The CUDA Toolkit from NVIDIA provides everything you need to develop GPU-accelerated applications. The CUDA Toolkit includes GPU-accelerated libraries, a compiler, development tools and the CUDA runtime.
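CUDA itself requires an NVIDIA GPU and the `nvcc` toolchain, but the host/device split described above can be sketched with nothing beyond the Python standard library: sequential setup runs on the CPU, then an elementwise "kernel" is mapped over many data items in parallel. This stands in for a CUDA kernel to illustrate the pattern; it is not CUDA code:

```python
# Stdlib-only sketch of the GPU-offload pattern described above:
# sequential host code prepares data, then an elementwise "kernel"
# is applied to every element in parallel. In real CUDA the kernel
# would run across thousands of GPU threads; here worker processes
# stand in for them.
from concurrent.futures import ProcessPoolExecutor

def saxpy_kernel(args):
    """Elementwise y = a*x + y, the classic CUDA first example."""
    a, x, y = args
    return a * x + y

def main():
    # Sequential host-side setup (runs on the CPU).
    a = 2.0
    xs = [float(i) for i in range(8)]
    ys = [1.0] * 8
    # Data-parallel portion: map the kernel over all elements.
    with ProcessPoolExecutor() as pool:
        out = list(pool.map(saxpy_kernel, [(a, x, y) for x, y in zip(xs, ys)]))
    print(out)  # [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]

if __name__ == "__main__":
    main()
```

In CUDA C the kernel would be marked `__global__` and launched over a grid of threads; the structure of the program (sequential host code around a data-parallel map) is the same.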

Performance Specifications for NVIDIA Tesla P4, P40 and V100 Accelerators

Tesla V100 (the universal datacenter GPU) vs. Tesla P4 (for ultra-efficient scale-out servers) vs. Tesla P40 (for inference throughput servers):
  • Single-Precision Performance (FP32): V100 14 TeraFLOPS (PCIe) / 15.7 TeraFLOPS (SXM2); P4 5.5 TeraFLOPS; P40 12 TeraFLOPS
  • Half-Precision Performance (FP16): V100 112 TeraFLOPS (PCIe) / 125 TeraFLOPS (SXM2)
  • Integer Operations (INT8): P4 22 TOPS*; P40 47 TOPS*
  • GPU Memory: V100 16/32 GB HBM2; P4 8 GB; P40 24 GB
  • Memory Bandwidth: V100 900 GB/s; P4 192 GB/s; P40 346 GB/s
  • System Interface/Form Factor: V100 dual-slot, full-height PCI Express or SXM2/NVLink; P4 low-profile PCI Express; P40 dual-slot, full-height PCI Express
  • Power: V100 250 W (PCIe) / 300 W (SXM2); P4 50 W / 75 W; P40 250 W
  • Hardware-Accelerated Video Engine: P4 1x decode engine, 2x encode engines; P40 1x decode engine, 2x encode engines

*Tera-Operations per Second with Boost Clock Enabled

HPE NVIDIA Tesla V100 16GB GPU HBM2 Volta CUDA PCIe for Accelerated Machine Deep Learning AI BigData Finance Oil Gas CAD HPC Physics Research


Lenovo ThinkSystem NVIDIA Tesla V100 GPU 16GB HBM2 Volta CUDA PCIe for Accelerated Machine Deep Learning AI Finance Oil Gas CAD HPC Physics Research

Lenovo ThinkSystem servers support GPU technology to accelerate different computing workloads, maximize performance for graphic design, virtualization, artificial intelligence and high performance computing applications in Lenovo servers.

The following table summarizes the server support for the GPUs. The numbers listed in the server columns represent the number of GPUs supported.


Lenovo ThinkSystem NVIDIA Tesla V100 GPU 32GB HBM2 Volta CUDA PCIe for Accelerated Machine Deep Learning AI Finance Oil Gas CAD HPC Physics Research

Lenovo ThinkSystem servers support GPU technology to accelerate different computing workloads, maximize performance for graphic design, virtualization, artificial intelligence and high performance computing applications in Lenovo servers.

The following table summarizes the server support for the GPUs. The numbers listed in the server columns represent the number of GPUs supported.

Double Memory than Previous Generation V100

The Tesla V100 GPU, widely adopted by the world’s leading researchers, has received a 2x memory boost to handle the most memory-intensive deep learning and high performance computing workloads.

Now equipped with 32GB of memory, Tesla V100 GPUs will help data scientists train deeper and larger deep learning models that are more accurate than ever. They can also improve the performance of memory-constrained HPC applications by up to 50 percent compared with the previous 16GB version.

Groundbreaking Volta Architecture

By pairing CUDA Cores and Tensor Cores within a unified architecture, a single server with Tesla V100 GPUs can replace hundreds of commodity CPU servers for traditional HPC and deep learning.

Tensor Core

Equipped with 640 Tensor Cores, Tesla V100 delivers 125 teraFLOPS of deep learning performance. That is 12x the Tensor FLOPS for DL training and 6x the Tensor FLOPS for DL inference compared to NVIDIA Pascal™ GPUs.

Next Generation NVLink

NVIDIA NVLink in Tesla V100 delivers 2x higher throughput compared to the previous generation. Up to eight Tesla V100 accelerators can be interconnected at up to 300 GB/s to unleash the highest application performance possible on a single server.

HBM2

With a combination of improved raw bandwidth of 900 GB/s and higher DRAM utilization efficiency at 95%, Tesla V100 delivers 1.5x higher memory bandwidth over Pascal GPUs as measured on STREAM.

Maximum Efficiency Mode

The new maximum efficiency mode allows data centers to achieve up to 40% higher compute capacity per rack within the existing power budget. In this mode, Tesla V100 runs at peak processing efficiency, providing up to 80% of the performance at half the power consumption.

Programmability

Tesla V100 is architected from the ground up to simplify programmability. Its new independent thread scheduling enables finer-grain synchronization and improves GPU utilization by sharing resources among small jobs.

CUDA Ready

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.


Dihuni OptiReady Supermicro 4029GP-TRT2-V100-1 4U 5x NVIDIA Tesla V100 32GB GPU 2S Xeon 4116 2.1GHz 128GB 250GBSSD 1TBHDD 2x10GbE Deep Learning Server


Support for 8 Double Width GPUs for Deep Learning

The 4029GP-TRT2 takes full advantage of the new Xeon Scalable processor family's PCIe lanes to support 8 double-width GPUs, delivering a very high-performance Artificial Intelligence and Deep Learning system suitable for autonomous cars, molecular dynamics, computational biology, fluid simulation, advanced physics, Internet of Things (IoT), and Big Data analytics. With NVIDIA Tesla cards, this server delivers unparalleled acceleration for compute-intensive applications.

Server Systems Management

Supermicro Server Manager (SSM) provides capabilities to monitor the health of server components including memory, hard drives and RAID controllers. It enables the datacenter administrator to monitor and manage power usage across all Supermicro servers, allowing users to maximize their CPU payload while mitigating the risk of a tripped circuit. Firmware upgrades on Supermicro servers are now easier, taking just a couple of clicks: administrators can mount an ISO image on multiple servers and reboot the servers with those images. The tool also provides pre-defined reports and many more features that make managing Supermicro servers simpler. Download the SSM_brochure for more info, or download the Supermicro SuperDoctor® device monitoring and management software.

Technical Specifications

Mfr Part # SYS-4029GP-TRT2
Motherboard Super X11DPG-OT-CPU
CPU Dual Socket P (LGA 3647); Intel® Xeon® Scalable Processors; Dual UPI up to 10.4GT/s; Supports CPU TDP 70-205W

2 x Intel Skylake Xeon Silver 4116 2.1 GHz 12-Core CPU Installed
Cores Up to 28 Cores with Intel® HT Technology
GPU / Coprocessor Support Please refer to: Compatible GPU list
Memory Capacity 24 DIMM slots; Up to 3TB ECC 3DS LRDIMM, 1TB ECC RDIMM, DDR4 up to 2666MHz

128 GB DDR4-2666MHz (32GB x 4) Installed

Memory Type 2666/2400/2133MHz ECC DDR4 SDRAM
Chipset Intel® C622 chipset
SATA SATA3 (6Gbps) with RAID 0, 1, 5, 10
Network Controllers Dual Port 10GbE from C622
IPMI Support for Intelligent Platform Management Interface v2.0 with virtual media over LAN and KVM-over-LAN support
Graphics ASPEED AST2500 BMC
SATA 10 SATA3 (6Gbps) ports
LAN 2 RJ45 10GBase-T LAN ports; 1 RJ45 Dedicated IPMI LAN port
USB 4 USB 3.0 ports (rear)
Video 1 VGA Connector
COM Port 1 COM port (rear)
BIOS Type AMI 32Mb SPI Flash ROM
Software Intel® Node Manager; IPMI 2.0; KVM with dedicated LAN; SSM, SPM, SUM; SuperDoctor® 5; Watchdog
CPU Monitors for CPU Cores, Chipset Voltages, Memory; 4+1 Phase-switching voltage regulator
FAN Fans with tachometer monitoring; Status monitor for speed control; Pulse Width Modulated (PWM) fan connectors
Temperature Monitoring for CPU and chassis environment; Thermal Control for fan connectors
Form Factor 4U Rackmountable; Rackmount Kit (MCP-290-00057-0N)
Model CSE-418GTS-R4000B
Height 7.0″ (178mm)
Width 17.2″ (437mm)
Depth 29″ (737mm)
Weight Net Weight: 80 lbs (36.2 kg); Gross Weight: 135 lbs (61.2 kg)
Available Colors Black
Hot-swap Up to 24 Hot-swap 2.5″ SAS/SATA drive bays; 8x 2.5″ drives supported natively

  • 1 x SamsungSM863a 240GB SATA 6Gb/s,VNAND,V48,2.5″,7mm SSD Installed
  • 1 x Seagate 2.5″ 1TB SATA 6Gb/s, 7.2K RPM, 4kN, 128MB HDD Installed
PCI-Express 11 PCI-E 3.0 x16 (FH, FL) slots; 1 PCI-E 3.0 x8 (FH, FL, in x16) slot

  • 5 x NVIDIA V100 32GB PCIe3.0 GPU Installed
Fans 8 Hot-swap 92mm cooling fans
Shrouds 1 Air Shroud (MCP-310-41808-0B)
Total Output Power 1000W/1800W/1980W/2000W
Dimension (W x H x L) 73.5 x 40 x 265 mm
Input 100-120Vac / 12.5-9.5A / 50-60Hz; 200-220Vac / 10-9.5A / 50-60Hz; 220-230Vac / 10-9.8A / 50-60Hz; 230-240Vac / 10-9.8A / 50-60Hz; 200-240Vac / 11.8-9.8A / 50-60Hz (UL/cUL only)
+12V Max: 83.3A / Min: 0A (100-120Vac); Max: 150A / Min: 0A (200-220Vac); Max: 165A / Min: 0A (220-230Vac); Max: 166.7A / Min: 0A (230-240Vac); Max: 166.7A / Min: 0A (200-240Vac) (UL/cUL only)
12Vsb Max: 2.1A / Min: 0A
Output Type 25 Pairs Gold Finger Connector
Certification Titanium Level
RoHS RoHS Compliant
Environmental Spec. Operating Temperature: 10°C ~ 35°C (50°F ~ 95°F); Non-operating Temperature: -40°C ~ 60°C (-40°F ~ 140°F); Operating Relative Humidity: 8% ~ 90% (non-condensing); Non-operating Relative Humidity: 5% ~ 95% (non-condensing)