AI Inference Workloads: Overcoming the Challenges of Taking AI to Production
AI and MLOps engineers in federal agencies often struggle to deploy models on GPUs, and most AI research initiatives never make it to production. Why? Researchers face bottlenecks caused by static GPU allocations, and mismatched technology stacks complicate moving models from training to production.
During this on-demand session, you will learn from our experts how to:
- Run multiple inference workloads on the same GPU using fractional GPUs
- Remove the bottlenecks that prevent almost 80% of workflows from reaching production
- Provision dynamic MIG (Multi-Instance GPU) slices for each new job on NVIDIA A100 GPUs
- Improve GPU utilization when running inference workloads
- Maintain high throughput and low latency for model serving
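To illustrate the fractional-GPU idea the session covers, below is a minimal sketch of a Kubernetes pod spec that asks the scheduler for half a GPU so two inference pods can share one physical device. The `gpu-fraction` annotation, the `runai-scheduler` name, and the Triton image tag are assumptions modeled on Run:ai-style fractional scheduling conventions, not details taken from this listing; check your platform's documentation for the exact keys.

```yaml
# Hypothetical example: an inference pod requesting a fractional GPU.
# The "gpu-fraction" annotation and scheduler name are assumptions based
# on Run:ai-style fractional scheduling; exact keys vary by version.
apiVersion: v1
kind: Pod
metadata:
  name: triton-inference
  annotations:
    gpu-fraction: "0.5"            # share one physical GPU between two pods
spec:
  schedulerName: runai-scheduler   # assumed scheduler name
  containers:
    - name: server
      image: nvcr.io/nvidia/tritonserver:23.10-py3   # illustrative tag
      args: ["tritonserver", "--model-repository=/models"]
```

With a spec like this, the scheduler (rather than a static per-node assignment) decides GPU placement at submission time, which is what lets utilization rise while throughput and latency targets for serving are preserved.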
Speaker and Presenter Information
Guy Salton, Global Head of SE, Run:ai
Relevant Government Agencies
Other Federal Agencies, Federal Government, State & Local Government
Event Type
On-Demand Webcast
This event has no exhibitor/sponsor opportunities
Cost
Complimentary: $0.00
Where
Free Webinar https://ca
Website
Click here to visit event website
Organizer
Run.AI Government Team at Carahsoft