Haowen Huang
Haowen Huang is currently a Senior Developer Advocate at Amazon Web Services (AWS). He has over 20 years of experience in the telecommunications, internet, and cloud computing industries. He has previously worked for companies such as Microsoft, Sun, and China Telecom. He currently focuses on creating and sharing technical content in the areas of generative AI, large language models (LLMs), machine learning, and data science, and empowering developers around the world.
Hong Kong
Company / Organisation – Amazon
Session
This talk covers optimizing Large Language Models (LLMs) with Python, from getting started to optimizing availability and throughput. It explores cutting-edge techniques such as model compilation, model compression, batched model inference, distributed training, and Large Model Inference (LMI) containers. It also presents practical examples of optimizing open-source models using LMI containers, Low-Rank Adaptation (LoRA), Fully Sharded Data Parallelism (FSDP), Paged Attention, Rolling Batch, and more.
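As a taste of one technique named above, here is a minimal NumPy sketch of the Low-Rank Adaptation (LoRA) idea: instead of updating a full weight matrix, train two small low-rank factors and add their product to the frozen weights. All names and dimensions below are illustrative, not taken from the talk's actual examples.

```python
import numpy as np

# LoRA sketch: W_eff = W + (alpha / r) * B @ A, with rank r << d.
# W stays frozen; only the small factors A and B are trained.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 4, 8

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                # zero-initialized: no change at start

def lora_forward(x):
    """Forward pass through the LoRA-adapted linear layer."""
    return x @ (W + (alpha / r) * B @ A).T

x = rng.normal(size=(2, d_in))
# With B zeroed, the adapted layer matches the frozen layer exactly.
assert np.allclose(lora_forward(x), x @ W.T)

# Trainable-parameter savings: d_out*d_in for full fine-tuning
# versus r*(d_in + d_out) for the LoRA factors.
full_params = d_out * d_in
lora_params = r * (d_in + d_out)
print(f"full: {full_params}, LoRA: {lora_params}")  # full: 4096, LoRA: 512
```

In practice this is handled by libraries such as Hugging Face PEFT rather than hand-rolled matrices, but the parameter-count arithmetic above is why LoRA makes fine-tuning large models tractable.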