“1530-1700 Workshop 4B: Do you know how well your model is doing? Evaluate your LLMs”
Cheuk Ting Ho;
Workshop
Large Language Models (LLMs) are becoming central to modern applications, yet effectively evaluating their performance remains a significant challenge. How do you objectively compare different models, benchmark the impact of fine-tuning, or ensure your LLM responses adhere to safety guidelines (guard-railing)? This hands-on workshop addresses these critical questions.
We will begin with an essential revision of the Hugging Face Transformers library, covering basic LLM inference and fine-tuning. The core of the workshop introduces Lighteval, an efficient and powerful LLM evaluation framework, and provides in-depth, hands-on practice with it. Participants will learn how to leverage Lighteval to compare various LLMs available on the Hugging Face Hub using a range of pre-built tasks and metrics.
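To preview the core idea behind benchmark comparison, here is a minimal, self-contained sketch (this is illustrative Python, not Lighteval's API): each model's outputs are scored against shared reference answers with a common metric, here exact-match accuracy. The model outputs and questions are hypothetical.

```python
# Illustrative sketch of metric-based model comparison (not Lighteval's API):
# score every model's outputs against the same references with one metric.

def exact_match_accuracy(predictions, references):
    """Fraction of predictions matching the reference exactly
    (case-insensitive, surrounding whitespace ignored)."""
    matches = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return matches / len(references)

# Hypothetical outputs from two models on the same three questions.
references  = ["Paris", "4", "blue"]
model_a_out = ["Paris", "4", "red"]
model_b_out = ["paris ", "five", "red"]

print(exact_match_accuracy(model_a_out, references))  # 2 of 3 correct
print(exact_match_accuracy(model_b_out, references))  # 1 of 3 correct
```

Frameworks like Lighteval automate exactly this loop at scale: running the prompts, collecting generations, and aggregating many such metrics across tasks.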
Finally, we will delve into advanced evaluation techniques, focusing on creating custom tasks and metrics tailored to unique, real-world application requirements. Participants will learn how to prepare custom datasets on the Hugging Face Hub and integrate them into Lighteval for precise, domain-specific evaluation. By the end of this workshop, you will possess the practical skills to rigorously evaluate, benchmark, and fine-tune your LLMs with confidence.
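As a flavour of the custom metrics covered above, the sketch below shows the kind of domain-specific scoring function one might register with an evaluation framework: a "keyword coverage" metric that rewards a response for mentioning required domain terms. The function name, dataset, and terms are all illustrative assumptions, not part of Lighteval's API.

```python
# Hypothetical custom metric: fraction of required domain terms that
# appear in a model's response (case-insensitive substring match).

def keyword_coverage(response: str, required_terms: list[str]) -> float:
    """Score a response by how many required terms it mentions."""
    text = response.lower()
    hits = sum(term.lower() in text for term in required_terms)
    return hits / len(required_terms) if required_terms else 0.0

# Example: a support answer that must mention dosage and drug interactions.
sample = "Take 200 mg with food; avoid combining with ibuprofen."
print(keyword_coverage(sample, ["200 mg", "ibuprofen", "pregnancy"]))  # 2 of 3 terms
```

A metric like this, paired with a custom dataset hosted on the Hugging Face Hub, is the building block for the precise, domain-specific evaluation the workshop works toward.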
“Generative AI & The Future of Python Developers”
Thu Ya Kyaw;
Keynote
AI is the hottest topic in tech right now, and Python is the undisputed language driving it. What does this mean for you? To stay ahead of the curve, we must move past vibe coding and building simple chatbots. We need to look ahead and apply real engineering skills. This keynote explores the practical path forward: from understanding Pydantic for robust data validation to mastering agentic skills by leveraging LangChain and Agent Development Kits.
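To make the Pydantic point concrete, here is a minimal sketch of the validation pattern the keynote alludes to: checking an LLM's structured output against a schema before the application trusts it. The schema and data are illustrative assumptions.

```python
# Minimal sketch: validate LLM-produced structured data with Pydantic
# before acting on it. The TaskPlan schema here is hypothetical.
from pydantic import BaseModel, ValidationError

class TaskPlan(BaseModel):
    action: str
    priority: int  # non-integer values like "high" are rejected

# Well-formed model output passes validation and is type-coerced.
good = TaskPlan.model_validate({"action": "send_email", "priority": 1})
print(good.priority)  # 1

# Malformed model output is caught instead of silently corrupting state.
try:
    TaskPlan.model_validate({"action": "send_email", "priority": "high"})
except ValidationError:
    print("rejected invalid LLM output")
```

This guard-at-the-boundary pattern is one of the "real engineering skills" the keynote argues for: generative output is treated as untrusted input until it validates.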