In machine learning applications, managing model inference costs often becomes critical as business requirements expand. Minimizing inference costs without compromising performance is essential to improving ROI and sustaining a competitive edge in the market. AWS offers a wide variety of instances, based on CPUs as well as accelerators, along with tools optimized for different machine learning use cases.
AWS Inferentia is designed to provide high-performance inference in the cloud, to drive down the total cost of inference, and to make it easy for developers to integrate machine learning into their business applications. In this session, we will discuss strategies for optimizing the performance and cost of inference for deep learning models with AWS Inferentia.