This Lenovo reference architecture
describes an entry-level cluster architecture using Lenovo ThinkSystem compute
servers and ThinkSystem DM Series storage systems optimized for Artificial
Intelligence (AI) training workflows accelerated by GPUs. The architecture is suited to
small and medium-sized teams where most compute jobs are single-node (single- or
multi-GPU) or distributed over a few compute nodes.
This document covers testing and validation of a compute/storage
configuration consisting of four GPU-accelerated ThinkSystem SR670 servers and an
entry-level, 10GbE network-connected ThinkSystem DM storage system. The configuration provides an
efficient and cost-effective solution for small and medium-sized organizations
that are starting out with AI and require the enterprise-grade capabilities of ONTAP®
cloud-connected data storage available with DM Series storage.
This document is intended for data scientists and data engineers who are looking for
efficient ways to achieve deep learning (DL) and machine learning (ML) development
goals, enterprise architects who design solutions for the development of AI models
and software, and IT decision makers and business leaders who want to achieve the
fastest possible time to market from AI initiatives.
