PyCon Hong Kong 2024

0.17 PyCon Hong Kong 2024 pyconhk2024 2024-11-16 2024-11-17 2 00:05 https://pretalx.com Asia/Hong_Kong LT9 Opening Remarks by PyCon HK Short talk 2024-11-16T10:00:00+08:00 10:00 00:10 Opening remarks and announcement pyconhk2024-56372-opening-remarks-by-pycon-hk /media/pyconhk2024/submissions/BUTWTL/logo_with_text_rOR5xUv.gif Scotty Kwok en false https://pretalx.com/pyconhk2024/talk/BUTWTL/ https://pretalx.com/pyconhk2024/talk/BUTWTL/feedback/ LT9 Opening Remarks by CityU HK Lightning talk 2024-11-16T10:10:00+08:00 10:10 00:10 Opening Speech by College of Computing CityU HK pyconhk2024-56575-opening-remarks-by-cityu-hk /media/pyconhk2024/submissions/QVV8XV/CityU_Logo_0uSHmkn.svg Dr. Ray Cheung en Opening Speech by College of Computing CityU HK false https://pretalx.com/pyconhk2024/talk/QVV8XV/ https://pretalx.com/pyconhk2024/talk/QVV8XV/feedback/ LT9 [Keynote] PyCon Hong Kong - the Story Lightning talk 2024-11-16T10:25:00+08:00 10:25 00:05 [Pre-recorded Video] Passionate volunteers started PyCon HK in Hong Kong in 2015, turning the local Python community toward its next decade. In this short video, Sammy tells the story of PyCon HK and how he got involved in it. pyconhk2024-54806--keynote-pycon-hong-kong-the-story Perspectives Sammy Fung en false https://pretalx.com/pyconhk2024/talk/XAFXDU/ https://pretalx.com/pyconhk2024/talk/XAFXDU/feedback/ LT9 [Sponsored Keynote] Large Language Models Optimization with Python Talk 2024-11-16T10:30:00+08:00 10:30 00:30 This talk will cover various aspects of optimizing Large Language Models (LLMs) with Python, including quick start, availability optimization, and throughput optimization. Explore cutting-edge techniques involved in areas such as model compilation, model compression, model inference batching, distributed training, and Large Model Inference (LMI) containers. 
Discover practical examples of optimizing some open-source models using techniques like LMI containers, Low-Rank Adaptation (LoRA), Fully Sharded Data Parallelism (FSDP), Paged Attention, Rolling Batch, and more. pyconhk2024-52540--sponsored-keynote-large-language-models-optimization-with-python LLM Haowen Huang en The talk will delve into techniques and strategies for optimizing Large Language Models (LLMs) with Python. It focuses on addressing the computational challenges associated with training and deploying these models efficiently. One key aspect discussed is model parallelism, which involves distributing the model across multiple devices or instances to overcome memory limitations. Tensor parallelism, a form of model parallelism, is explored, where individual tensors are split across devices. Pipeline parallelism, another technique, enables concurrent execution of different model components on separate devices. The talk will also cover distributed training strategies, such as data parallelism and tensor-parallel language models, which leverage multiple devices to accelerate training. Techniques for reducing memory footprint, like quantization, pruning, and distillation, are explored as means to optimize LLM deployment. Optimizations for inference are also discussed, including model compression methods like quantization-aware training, pruning, and distillation. Kernel fusion, a technique that combines multiple operations into a single optimized kernel, is highlighted for improving inference performance. Additionally, the document explores accelerated inference using hardware accelerators. The talk aims to provide guidance on leveraging Python's capabilities for efficient LLM training, deployment, and inference. It covers a range of strategies and techniques to address the computational challenges associated with these models, enabling researchers and practitioners to optimize LLMs for improved performance and cost-effectiveness. 
false https://pretalx.com/pyconhk2024/talk/W9X8DD/ https://pretalx.com/pyconhk2024/talk/W9X8DD/feedback/ LT9 Sign and verify Python package with Sigstore keyless signing Short talk 2024-11-16T11:10:00+08:00 11:10 00:15 Organizations find it challenging to ensure that the container image they are deploying is exactly what was produced in development and that nothing has changed before it runs in production. Cryptographic signing of container images helps to verify the integrity of the image and makes sure it has not been tampered with since its creation. Verification of the image signature also confirms that the expected software creator, whose identity was certified at the moment of signing, published the container image in their possession. In this presentation, I will use the open source project “Sigstore”, a cryptographic signing tool for improving software supply chain security. The Sigstore framework empowers software developers and consumers to securely sign and verify software artifacts. Signatures are generated with ephemeral signing keys so there’s no need to manage keys. Signing events are recorded in a tamper-resistant public log so software developers can audit signing events. pyconhk2024-56461-sign-and-verify-python-package-with-sigstore-keyless-signing Libraries / Tools /media/pyconhk2024/submissions/RUHCB8/images_qcKRBEw.png Frankie Ng zh-hant false https://pretalx.com/pyconhk2024/talk/RUHCB8/ https://pretalx.com/pyconhk2024/talk/RUHCB8/feedback/ LT9 Network Automation for Improved Efficiency using Ansible and Python Short talk 2024-11-16T11:25:00+08:00 11:25 00:15 This presentation highlights how network automation with Python and Ansible enhances management efficiency. It addresses the challenges of traditional manual network tasks, which are time-consuming, error-prone, and hard to scale. 
By showcasing Python's and Ansible's capabilities, the presentation demonstrates how automation leads to improved efficiency, reduced errors, and faster deployments, allowing network administrators to focus on critical tasks and ensuring a more reliable and secure network environment. pyconhk2024-53881-network-automation-for-improved-efficiency-using-ansible-and-python DevOps TIMOTHY LAM en This presentation explores how network automation can significantly enhance management efficiency. Traditional manual network configuration, provisioning, and maintenance are time-consuming, error-prone, and difficult to scale with network growth. Ansible, a popular open-source tool, is introduced as a key component for automating network configurations and workflows. We explain Ansible's agentless architecture and the use of playbooks—YAML files that define automated tasks for network devices. Through code examples and a sample playbook, we demonstrate how Python scripts can interact with network devices and be integrated into Ansible playbooks, enabling the automation of complex tasks efficiently. 
false https://pretalx.com/pyconhk2024/talk/NA7LPP/ https://pretalx.com/pyconhk2024/talk/NA7LPP/feedback/ LT9 Build AI-powered RAG application with MySQL9.0 Talk 2024-11-16T11:50:00+08:00 11:50 00:30 In this session, we will show you how you can quickly build an AI-powered RAG application with MySQL 9.0 and Python: * Build your own document repository in a vector store in MySQL 9.0 * Integrate an LLM into your application to process questions * Generate context-aware answers from your vector store pyconhk2024-55026-build-ai-powered-rag-application-with-mysql9-0 LLM Ryan Kuan en false https://pretalx.com/pyconhk2024/talk/YXQRCF/ https://pretalx.com/pyconhk2024/talk/YXQRCF/feedback/ LT9 Two roads diverged: the gap between web development and data science in Python Short talk 2024-11-16T12:25:00+08:00 12:25 00:15 While being a Data Scientist was once touted as the sexiest job of the 21st century, how different is the life of a Python Web Developer? The author, who has served in both roles, shares her take on the difference (or lack thereof) between the two professions. pyconhk2024-53231-two-roads-diverged-the-gap-between-web-development-and-data-science-in-python Perspectives Chan Sau Yee en ## Goal This talk aims to illustrate the differences and similarities between the Data Scientist and the Web Developer in Python. This talk will be of interest to anyone who is contemplating which path to take, or is simply curious about how Python is used in different capacities. ## Outline - Part 1 (5 min): Mutual Myths. In this section I will discuss some common misconceptions about being a Data Scientist or Web Developer. - Part 2 (5 min): Transferable Skills. In this section I will focus on the day-to-day responsibilities of the Web Developer, highlighting skills that are transferable or learnable by the Data Scientist. - Part 3 (5 min): Recommendations. 
In this section, I will attempt to make some actionable recommendations for those contemplating to switch to the Software Engineering path in Python. false https://pretalx.com/pyconhk2024/talk/THDBMS/ https://pretalx.com/pyconhk2024/talk/THDBMS/feedback/ LT9 End-to-end GPU Acceleration for scikit-learn and XGBoost Short talk 2024-11-16T12:40:00+08:00 12:40 00:15 With the ever-growing data size and the increasing complexity of data science workflows, high-performance computing becomes crucial for data scientists to tackle real-world problems. Attendees will learn to leverage RAPIDS projects with GPUs to accelerate and scale up scikit-learn and XGBoost model training workflows. pyconhk2024-55032-end-to-end-gpu-acceleration-for-scikit-learn-and-xgboost Performance Jiaming Yuan en This talk will explore GPU acceleration beyond deep learning models and provide an overview of GPU-accelerated data science workflows. Python’s rich ecosystem has made it one of the most popular programming languages today. RAPIDS offers a suite of open-source Python libraries and primitives to accelerate core data science libraries, including pandas, scikit-learn, and NetworkX, without requiring any code changes. Additionally, the latest XGBoost integrates with RAPIDS to deliver a fully accelerated model training experience. We will demonstrate how to enable a GPU-accelerated end-to-end pipeline for training scikit-learn and XGBoost models, highlighting the significant speed improvements for various scikit-learn estimators. Then, we will delve into new features that facilitate scaling XGBoost using the latest NVIDIA Grace Hopper superchip to handle large datasets. We can discuss some details about the implementation and share our experience with GPU acceleration. Finally, we will outline our roadmap for future developments. 
false https://pretalx.com/pyconhk2024/talk/HN3F8L/ https://pretalx.com/pyconhk2024/talk/HN3F8L/feedback/ LT9 Python to Deploy Enterprise-Grade Delta Lake on AWS Talk 2024-11-16T14:00:00+08:00 14:00 00:30 This session explores how enterprises can build robust transactional data lakes using the open-source Delta Lake format and Python tools. The presenters will first discuss the exponential growth in enterprise data volumes and how data lakes provide a compelling solution to cost-effectively retain and extract value from vast amounts of structured and unstructured data. The session outlines key limitations of traditional data lakes, such as the lack of database-like capabilities for efficient updates, maintaining performance at scale, and ensuring data consistency. To address these challenges, the session will showcase how the Delta Lake format, along with complementary Python tools like PySpark and Delta-rs, can be leveraged on AWS to build highly optimized and manageable data lake architectures. The presenters will dive into two real-world use cases, covering both large-scale batch processing and smaller-scale data workloads, highlighting best practices and architectural patterns for Python developers. pyconhk2024-57439-python-to-deploy-enterprise-grade-delta-lake-on-aws Libraries / Tools Jacky Kwok; Alan, Ka Hei Ng en false https://pretalx.com/pyconhk2024/talk/PCZC3H/ https://pretalx.com/pyconhk2024/talk/PCZC3H/feedback/ LT9 Autonomous AI Agents for Dummies Talk 2024-11-16T14:40:00+08:00 14:40 00:30 AI Agentic workflows will drive massive AI progress this year. This is what Professor Andrew Ng said about the rise of agents. With the growing popularity of large language models, Agents are what everyone is talking about. In simple terms, Agents can be defined as LLMs with the ability to self-reason and plan, just like humans. In my talk, I will focus on how to build an Autonomous Agentic workflow and the components required. 
Additionally, I will cover planning and reasoning techniques for Agentic prompting, such as ReAct and LATS, to motivate the audience to stay up to date with the Agentic world. pyconhk2024-52073-autonomous-ai-agents-for-dummies LLM Tarun Jain en ### Problem Statement Large Language Models (LLMs) like GPT-4 have several limitations that hinder their full potential. They often struggle with maintaining contextual understanding over extended conversations, leading to disjointed or repetitive interactions. Additionally, LLMs can lack accuracy, fail to provide real-time information, and lack the self-reasoning needed to decompose a task while planning it. This is where Agents come in. Unlike traditional LLMs, AI agents are designed to self-reason and plan, mimicking human cognitive processes. They can interact with their environment, make decisions, and take actions autonomously. This capability enables them to overcome some of the contextual and reasoning challenges that LLMs face, making them more suitable for complex, dynamic tasks. ### My talk will cover: - How do humans carry out a simple task? - Agentic Workflow and the major components required, including Task, Memory, Tools, Agents, LLM and so on. - Planning and Reasoning: here I will cover the Chain of Thought, ReAct, and LATS prompting techniques that are used to build Agentic workflows. - Conclusion false https://pretalx.com/pyconhk2024/talk/ES7DKX/ https://pretalx.com/pyconhk2024/talk/ES7DKX/feedback/ LT9 Accelerating Python's performance with C and Cython Short talk 2024-11-16T15:15:00+08:00 15:15 00:15 Although Python is popular and known for rapid development and a great ecosystem, performance may often be an issue when running certain types of tasks. For example, we ran into performance issues when we had to parse and chunk CSV files of arbitrary sizes. Utilising C for core functions and a wrapper with Cython, we were able to quickly build a lightweight Python module that improved the performance of our use case! 
Through this talk, we'd demonstrate how tools like Cython could make Python even more powerful. pyconhk2024-52959-accelerating-python-s-performance-with-c-and-cython Performance Leo Chen en false https://pretalx.com/pyconhk2024/talk/B9QPQF/ https://pretalx.com/pyconhk2024/talk/B9QPQF/feedback/ LT9 How to organize and deploy your Python applications with Docker Talk 2024-11-16T16:00:00+08:00 16:00 00:30 Deploying Python applications can be difficult. In this introductory talk, I will share what I learned to help you avoid common problems. We will learn how to set up your Python projects to make them easy to deploy in Docker containers, from simple scripts to medium-sized projects made of several packages. We will also look at how to use Docker Compose and understand its strengths and limits. By the end of this talk, you will know how to deploy your Python applications using Docker easily and effectively. pyconhk2024-53944-how-to-organize-and-deploy-your-python-applications-with-docker DevOps Cristiano Pierandrei en false https://pretalx.com/pyconhk2024/talk/TDYGRG/ https://pretalx.com/pyconhk2024/talk/TDYGRG/feedback/ LT9 Algorithmic Artistry: Musical Ideas for Pythonists Talk 2024-11-16T16:40:00+08:00 16:40 00:30 The almighty Python language has earned its widespread popularity. Can we build a music machine with Python that creates captivating music, with mind-blowing sounds and instruments we haven't heard before? There are numerous Python libraries for music manipulation and generation. In this talk, we will explore interactions ranging from sound synthesis to algorithmic music making, catering to both coding musicians and musical coders, all within the Python ecosystem! 
(Materials are available as a Colab Notebook: https://tinyurl.com/pycon24-music ) pyconhk2024-54239-algorithmic-artistry-musical-ideas-for-pythonists Perspectives Chuck-jee Chau en false https://pretalx.com/pyconhk2024/talk/ZB8QGA/ https://pretalx.com/pyconhk2024/talk/ZB8QGA/feedback/ LT9 Data Validation in Python Lightning talk 2024-11-16T17:20:00+08:00 17:20 00:05 For data engineers to set up reliable data pipelines, it is crucial to conduct validations at each step to ensure the high quality of the data. Great Expectations (GX) is an open source framework that provides an intuitive way -- Expectations -- to define and manage data quality. This lightning talk will explore how to leverage Great Expectations to automate data quality checks and increase transparency in your data pipeline. To demonstrate GX's features, we will explore two examples of Expectations (basic and conditional). pyconhk2024-52957-data-validation-in-python Lightning ⚡ Meixin Wang en false https://pretalx.com/pyconhk2024/talk/YPKQZM/ https://pretalx.com/pyconhk2024/talk/YPKQZM/feedback/ LT9 Async all the way: FastAPI and the ASGI era Lightning talk 2024-11-16T17:25:00+08:00 17:25 00:05 In the rapidly evolving world of Python web development, the emergence of frameworks like FastAPI has changed how software engineers build highly performant, asynchronous applications in Python. This lightning talk will explore how the FastAPI framework makes use of the power of an Asynchronous Server Gateway Interface (ASGI) web server to enable concurrent request handling, in contrast with the traditional Web Server Gateway Interface (WSGI) approach. Through a live demonstration, the differences between WSGI and ASGI applications will be illustrated. Synchronous applications can be blocked by long-running tasks such as I/O operations, whereas asynchronous web servers keep handling other requests concurrently while one task is waiting on I/O. 
By showcasing how FastAPI's design allows for concurrent processing of multiple requests without blocking, this talk will highlight the significant performance benefits and scalability advantages provided by ASGI-powered frameworks. pyconhk2024-52951-async-all-the-way-fastapi-and-the-asgi-era Lightning ⚡ Taemin Ha en false https://pretalx.com/pyconhk2024/talk/87VP87/ https://pretalx.com/pyconhk2024/talk/87VP87/feedback/ LT9 Creative Problem Solving with Graphs Lightning talk 2024-11-16T17:30:00+08:00 17:30 00:05 Graphs and networks are fundamental data structures that are rapidly growing in popularity with practicing engineers due to their use of simple elements like nodes and edges. Many real-world problems can be translated into graph problems, and we can use Python libraries, such as NetworkX, for creative problem-solving. Take a day-to-day example as simple as arranging desks in an office. Is there an optimal way to arrange people by organizational structure and/or proximity to their informal social network in order to facilitate a conducive workplace environment that also takes into account noise levels for at-desk meetings, etc.? If we model this as a graph data structure, with nodes being people with different properties and edges describing relationships, we can see things from different perspectives and come up with some innovative solutions to this problem. pyconhk2024-52958-creative-problem-solving-with-graphs Lightning ⚡ Xiao Ying en false https://pretalx.com/pyconhk2024/talk/3RF9DP/ https://pretalx.com/pyconhk2024/talk/3RF9DP/feedback/ LT9 Interactive game with Mediapipe Lightning talk 2024-11-16T17:35:00+08:00 17:35 00:05 From time to time we see interactive games at booths in shopping malls. This time we will talk about how Python plays a part in them. Everyone loves gaming, especially games that involve movement. In this talk, we share how we made our interactive game and other "teamlab" ideas with Python tools such as Mediapipe. 
pyconhk2024-48836-interactive-game-with-mediapipe Lightning ⚡ Judy Wong zh-hant false https://pretalx.com/pyconhk2024/talk/ZR3XP3/ https://pretalx.com/pyconhk2024/talk/ZR3XP3/feedback/ LT9 How many iPhones does it take to ship a new Python version? Lightning talk 2024-11-16T17:40:00+08:00 17:40 00:05 For people new to Python, it can be hard to grasp the difference between, say, Python 3.5 and Python 3.6. This lightning talk aims to conceptualise the effort it requires to ship a new Python version, by comparing that to the development of iPhone models. pyconhk2024-58841-how-many-iphones-does-it-take-to-ship-a-new-python-version- Lightning ⚡ Chan Sau Yee en false https://pretalx.com/pyconhk2024/talk/UPTBUH/ https://pretalx.com/pyconhk2024/talk/UPTBUH/feedback/ LT9 匯豐咩價位? / How much is HSBC now? Lightning talk 2024-11-16T17:45:00+08:00 17:45 00:05 Using Playwright, you can write a program to harvest the stock price of HSBC in 5 minutes' work. pyconhk2024-58842--how-much-is-hsbc-now- Lightning ⚡ Dr. Adrian Tam zh-hant false https://pretalx.com/pyconhk2024/talk/JC3QG3/ https://pretalx.com/pyconhk2024/talk/JC3QG3/feedback/ LT9 Closing remarks Short talk 2024-11-16T17:55:00+08:00 17:55 00:15 Closing remarks and announcements pyconhk2024-56373-closing-remarks /media/pyconhk2024/submissions/HTGZMQ/logo_with_text_XHGMjJQ.gif Scotty Kwok en false https://pretalx.com/pyconhk2024/talk/HTGZMQ/ https://pretalx.com/pyconhk2024/talk/HTGZMQ/feedback/ LT8 Taiwan LLM: Bridging the Linguistic Divide with a Culturally Aligned Language Model Talk 2024-11-16T11:10:00+08:00 11:10 00:30 [ONLINE presentation] This talk introduces TAIWAN-LLM, a pioneering Large Language Model specifically designed for Traditional Chinese as used in Taiwan. We'll discuss how TAIWAN-LLM addresses the underrepresentation of Traditional Chinese in existing language models, bridging the linguistic and cultural divide. 
The presentation will cover our approach to developing a culturally aligned model, including the use of a comprehensive Taiwanese corpus, instruction fine-tuning, and real user feedback incorporation. We'll share evaluation results demonstrating TAIWAN-LLM's superior performance in understanding and generating Traditional Chinese text compared to existing models. pyconhk2024-52838-taiwan-llm-bridging-the-linguistic-divide-with-a-culturally-aligned-language-model LLM /media/pyconhk2024/submissions/LY8XCV/DALLE_2023-12-25_22.58.49_-_A_cute_cartoon-style_llama_in_24XhaeG.png Yenting Lin en This will be an online presentation. In this presentation, we'll dive deep into the development and capabilities of TAIWAN-LLM, the first Large Language Model tailored for Traditional Chinese speakers in Taiwan. Key topics include: 1. The challenge of linguistic underrepresentation in existing LLMs 2. Our three-phase methodology: Continue-Pretraining, Supervised Fine-Tuning, and Feedback Supervised Fine-Tuning 3. The composition and curation of our Taiwanese corpus 4. Evaluation results on various NLP tasks, including contextual QA, summarization, and classification 5. Real-world applications and use cases of TAIWAN-LLM 6. The importance of culturally aligned language models for preserving linguistic diversity We'll also discuss the open-source release of TAIWAN-LLM and its potential impact on NLP research and applications for Traditional Chinese. false https://pretalx.com/pyconhk2024/talk/LY8XCV/ https://pretalx.com/pyconhk2024/talk/LY8XCV/feedback/ LT8 Hackman: IoT & membership platform on Raspberry Pi & Nix for Hong Kong's first Hackerspace Talk 2024-11-16T11:50:00+08:00 11:50 00:30 We build everything ourselves at Dim Sum Labs, Hong Kong's first Hackerspace. This includes Hackman, the IoT system that manages our membership and controls the space's door, lights, appliances, and electricity usage. Hackman is written in Python and runs entirely on a Raspberry Pi 5. 
We will bring a replica of Hackman and some attachments (we can't bring the door, though) and give a live demo. You will hear about how we use Nix, DevOps/GitOps, a staging Raspberry Pi, and software engineering practices that make it reliable. We will show how it can be easily extended using Redis. pyconhk2024-53872-hackman-iot-membership-platform-on-raspberry-pi-nix-for-hong-kong-s-first-hackerspace DevOps /media/pyconhk2024/submissions/K7VYAH/DSL-logo_EYz3d5p.png Nigel Choi en Hackman is a Hackerspace management system for Dim Sum Labs, Hong Kong's first Hackerspace. It is a Django application running on a Raspberry Pi 5 that manages our membership, controls access to the space, and much more. It serves a critical function for our space hence it must be reliable. Our members should easily add features to it while keeping the core stable. In this talk, we will talk about how we use various software engineering practices to make it stable and reliable while making it extensible. We'll discuss: * How the Django application is structured * How we use Redis as a means to easily extend the system * How we use Nix to make the operating system and the application easily reproducible * How we use DevOps/GitOps along with ample tests to manage software deployments Dim Sum Labs is Hong Kong’s first and longest-running Hackerspace since 2011, open to anyone interested in hacking: the intellectual challenge to creatively overcome or otherwise “hack” the limitations, capabilities, purposes, forms, etc. of virtually anything — or in other words: to mess around and build anything for fun. The members extend this ethos to its membership management system and the IoT in the space. For more information, visit us at https://www.dimsumlabs.com/ . Hackman is open source and can be accessed at https://github.com/dimsumlabs/hackman false https://pretalx.com/pyconhk2024/talk/K7VYAH/ https://pretalx.com/pyconhk2024/talk/K7VYAH/feedback/ LT8 如果HK Python User Group唔壯大，錦鯉就大鑊了! 
Talk 2024-11-16T12:25:00+08:00 12:25 00:30 As a middle-aged guy who has organized OSC / PyCon for many years, I would like to share how to run a local community, along with experiences from overseas, in the hope that more people will get involved so that I can have the chance to go back to being an attendee. pyconhk2024-53753-hk-python-user-group-- Perspectives /media/pyconhk2024/submissions/HZAX7P/0262039C-19CF-42A1-AA52-178E652AD20B_1_105_c_InLTyrF.jpeg Calvin Tsang zh-hant I will share more than ten years of experience with PyCon HK / OSHK, together with what I have seen at overseas PyCon APAC events in recent years, to understand where the local community falls short and what can be improved, so that the Python User Group can grow organically. This covers promotion, design, planning, and operations. false https://pretalx.com/pyconhk2024/talk/HZAX7P/ https://pretalx.com/pyconhk2024/talk/HZAX7P/feedback/ LT8 Numbast: Bridging the gap between CUDA C++ and Python Talk 2024-11-16T14:00:00+08:00 14:00 00:30 - Numba is a popular JIT compiler that translates Python code into optimized machine code for various hardware targets, and Numba-CUDA supports compilation of Python code for execution on NVIDIA devices. Whilst Numba-CUDA provides many basic accelerated programming function blocks out of the box, manually creating bindings for a CUDA device library is still laborious. - Numbast is an automatic device binding generation tool created by NVIDIA. Numbast provides an end-to-end binding generation mechanism that quickly bridges the gap between the CUDA ecosystem and Python CUDA. - In this talk, attendees will learn about recent progress in accelerated computing in Python with Numba-CUDA and the internal mechanisms of Numbast, and get hands-on experience crafting CUDA kernels in Numba-CUDA, as well as creating bindings with Numbast. - Additionally, we will provide an insight into how Numba is used across RAPIDS, NVIDIA’s accelerated computing solution that focuses on making accelerated computing more accessible to the general Python community. Time permitting, we will also introduce how user-defined functions (UDF) are used in `cudf.pandas`. pyconhk2024-55027-numbast-bridging-the-gap-between-cuda-c-and-python Performance Michael Wang en See Abstract. 
false https://pretalx.com/pyconhk2024/talk/UGFV3S/ https://pretalx.com/pyconhk2024/talk/UGFV3S/feedback/ LT8 Operate with Confidence -- OpenTelemetry in Python Talk 2024-11-16T14:40:00+08:00 14:40 00:30 In this era of microservices, the '3 Pillars of System Observability', a.k.a. **logging**, **metrics monitoring** and **traffic tracing**, are pivotal in giving developers quick feedback on the **performance** and **behaviour** of their application. With good observability practices, developers can not only understand the bottlenecks and stability of their app, but also benefit from a faster iteration cycle thanks to reliable feedback loops. This talk covers the use of the **OpenTelemetry** package, a well-known open-source Observability stack, in Python. OpenTelemetry is a **vendor-** and **tool-agnostic** Observability stack that integrates with a broad variety of Observability backends, including open source tools like **Jaeger** and **Prometheus**, as well as commercial offerings. By introducing the use of OpenTelemetry in Python, it is hoped that users will gain a better **understanding of the '3 Pillars of System Observability'**, **actively and confidently monitor** their Python workloads, and define a **suitable and meaningful Service Level Objective (SLO)** for their program. pyconhk2024-52489-operate-with-confidence-opentelemetry-in-python Libraries / Tools Alex Au en This talk will be broken down into the following sessions: - Introduction: 3 Pillars of System Observability (~5 mins) - OpenTelemetry on Python FastAPI and AWS Lambda, with Visualizations (~15 - 20 mins) - Why not just Logging? (~2 mins) - Why not just Metrics (with Prometheus)? 
(~2 mins) - Integration with other Cloud Monitoring Platforms (~2-3 mins) false https://pretalx.com/pyconhk2024/talk/9HHTWK/ https://pretalx.com/pyconhk2024/talk/9HHTWK/feedback/ LT8 Leveraging Multi-Models and Open WebUI to Mimic ChatGPT with Data Security Considerations Short talk 2024-11-16T15:15:00+08:00 15:15 00:15 In an era where data security and control are paramount, leveraging local and in-house AI solutions has become increasingly significant. This presentation will explore how to use Open WebUI to build on-device GPT models or in-house server-based GPT systems, offering robust alternatives to cloud-based AI solutions like Copilot, ChatGPT-4o. The focus will be on ensuring data remains local or within company control, addressing key security considerations. pyconhk2024-53649-leveraging-multi-models-and-open-webui-to-mimic-chatgpt-with-data-security-considerations LLM Dr. Chung Ng en The talk will cover the following aspects: ##### 1. Introduction to Open WebUI: ◦ Overview of Open WebUI, an extensible and feature-rich self-hosted WebUI designed to operate offline. ◦ Discussion on its capabilities to integrate various LLM runners, including Ollama and OpenAI-compatible APIs. ##### 2. Setting Up Open WebUI: ◦ Step-by-step guide on installing and configuring Open WebUI using Docker or Kubernetes for seamless deployment. ◦ Instructions on integrating GPU support for enhanced performance. ##### 3. Multi-Model Integration: ◦ Demonstration of how to leverage multiple models within Open WebUI, allowing for versatile and powerful interactions. ◦ Examples of using models such as LLaVA, Llama3, Phi-3 Mini, and more for diverse applications. ##### 4. Enhancing Functionality with Plugins: ◦ Introduction to the Pipelines Plugin Framework to incorporate custom logic and Python libraries. ◦ Examples of plugins for web search, document search, Discord integration, and more. ##### 5. 
Data Security and Control: ◦ Discussion on the importance of keeping data local or within company infrastructure. ◦ Best practices for ensuring data security and compliance while using in-house AI solutions. ##### 6. Building a Powerful Interface: ◦ Tips on extending Open WebUI to create a user interface similar to ChatGPT-4o. ◦ Leveraging features such as Markdown and LaTeX support, hands-free voice/video call, and retrieval-augmented generation (RAG) for a dynamic user experience. false https://pretalx.com/pyconhk2024/talk/GYCVTG/ https://pretalx.com/pyconhk2024/talk/GYCVTG/feedback/ LT8 Time to Skip Tedious Steps - Spare Efforts with PyTorch Lightning Talk 2024-11-16T16:00:00+08:00 16:00 00:30 With the rapid advancement in deep learning, models become super large and consume significant resources, making efficiency and simplicity more critical than ever. In this talk, we introduce PyTorch Lightning, a deep learning framework that emerges as a powerful tool that streamlines the process of building, training, and scaling models, allowing researchers and practitioners to focus on what truly matters: innovation. We will begin with an overview of PyTorch Lightning, discussing the key benefits it offers over traditional PyTorch. We will explore how PyTorch Lightning abstracts away the boilerplate code associated with model training, making it easier to implement and experiment with complex models. Then, we walk through the process of training a ResNet in PyTorch Lightning for image classification task and explore some advanced features in PyTorch Lightning. 
For those interested in revisiting the content from the talk, feel free to check out the links below: - GitHub Code Demo: [github/wyhwong/PyConHK2024-PyTorch-Lightning](https://github.com/wyhwong/PyConHK2024-PyTorch-Lightning) - Slides: [OneDrive](https://1drv.ms/p/c/7adfdf652c41fb6c/EZ-Bd0i38FJKmzw5_ZYT6UIBDWWKBVsM30SlBTa2R2Cx1A) pyconhk2024-52631-time-to-skip-tedious-steps-spare-efforts-with-pytorch-lightning Libraries / Tools Henry, Wai Yin Wong en false https://pretalx.com/pyconhk2024/talk/UVKEGD/ https://pretalx.com/pyconhk2024/talk/UVKEGD/feedback/ LT8 Simplifying Python Web App Operations: Automating K8s Ops with Open Source Talk 2024-11-16T16:40:00+08:00 16:40 00:30 After creating a great web app in Python, for example with Flask, the next hurdle on the way to production is making it available to users and operating it. And not just your app, but also ingress, the database, observability, and the list goes on. We will go through your options for simplifying the operations of your web app using open source tooling. This will include using k8s directly, Helm charts, IaC using Pulumi, and new tooling developed by Canonical using Juju. By the end of the talk you will have seen the benefits and drawbacks of each, which will help you make an informed decision on which tool best suits your needs! pyconhk2024-50952-simplifying-python-web-app-operations-automating-k8s-ops-with-open-source DevOps YangSoo Yoon en false https://pretalx.com/pyconhk2024/talk/WQJV7B/ https://pretalx.com/pyconhk2024/talk/WQJV7B/feedback/ LT7 Pydantic Logfire: Empowering Python Observability Talk 2024-11-16T11:10:00+08:00 11:10 00:30 Pydantic Logfire is an advanced observability platform tailored for Python applications, integrating seamlessly with the popular Pydantic library. Built on the principles of simplicity and power, Pydantic Logfire offers deep insights into application behavior through Python-centric telemetry, structured logging, and powerful SQL querying capabilities.
Leveraging OpenTelemetry for comprehensive instrumentation, it ensures Python developers can efficiently monitor, debug, and optimize their codebases. From small scripts to enterprise deployments, Pydantic Logfire transforms raw data into actionable insights, simplifying the observability journey for Python developers. pyconhk2024-52539-pydantic-logfire-empowering-python-observability Libraries / Tools HEMANGI en false https://pretalx.com/pyconhk2024/talk/FJLLGF/ https://pretalx.com/pyconhk2024/talk/FJLLGF/feedback/ LT7 High Throughput Python Talk 2024-11-16T11:50:00+08:00 11:50 00:30 Python is infamous for its slowness and the GIL problem. asyncio was introduced in Python 3.4 to allow non-blocking I/O, and concurrent.futures (available since Python 3.2) provides an easier syntax for writing parallel code. But still, Python is not super fast. In this talk, I will showcase 10+ ways of generating data in parallel in Python and compare their performance. The unfortunate conclusion is that no approach is always the best; even with the free-threaded build of Python 3.13, the silver bullet does not exist. pyconhk2024-53730-high-throughput-python Performance Dr. Adrian Tam en The talk focuses on CPython, and the experiments were run on Apple Silicon. The example task is generating a NumPy array of random floats, later extended to Python lists of floats. I will compare the data generation throughput of the multiprocessing, threading, and concurrent.futures modules in Python, as well as the external libraries numba and joblib. The key result highlights the trade-off between threading and multiprocessing: if you use multiprocessing to work around the GIL, you pay the price of inter-process communication overhead. Even with free-threaded Python 3.13, which lets you avoid the GIL, you can't find a solution that is always better.
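The threading versus multiprocessing trade-off described above can be sketched with the standard library alone. This is a minimal illustration, not the speaker's benchmark: the helper names `make_chunk` and `generate`, the chunk sizes, and the use of `random.Random` in place of NumPy are all assumptions made for the example.

```python
import random
import time
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def make_chunk(args: tuple[int, int]) -> list[float]:
    """Generate `size` random floats from a generator seeded per chunk (hypothetical helper)."""
    seed, size = args
    rng = random.Random(seed)
    return [rng.random() for _ in range(size)]


def generate(
    executor_cls: type[Executor],
    n_chunks: int,
    chunk_size: int,
    workers: int = 4,
) -> list[float]:
    """Generate n_chunks * chunk_size floats in parallel with the given executor class."""
    tasks = [(seed, chunk_size) for seed in range(n_chunks)]
    with executor_cls(max_workers=workers) as pool:
        # map() preserves task order; with a process pool every chunk is
        # pickled and sent back to the parent, which is the inter-process
        # communication cost the talk refers to.
        chunks = pool.map(make_chunk, tasks)
        return [x for chunk in chunks for x in chunk]


if __name__ == "__main__":
    # Timings depend on the machine and on the Python build, so none are quoted here.
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.perf_counter()
        data = generate(cls, n_chunks=8, chunk_size=200_000)
        elapsed = time.perf_counter() - start
        print(f"{cls.__name__}: {len(data):,} floats in {elapsed:.2f}s")
```

On a standard (GIL-enabled) CPython build the thread pool cannot run the pure-Python loop in parallel, while the process pool can but must copy every result between processes. Which one wins depends on chunk size and hardware, in line with the talk's conclusion that no approach is always best.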
false https://pretalx.com/pyconhk2024/talk/LNUWEC/ https://pretalx.com/pyconhk2024/talk/LNUWEC/feedback/ LT7 Local Knowledge Arena: LLM Showdown (Local 知識擂台LLM大格鬥) Talk 2024-11-16T12:25:00+08:00 12:25 00:30 [ONLINE presentation] In the LLM world, everyone uses GPT-4 as the gold standard. Are there cases where a smaller language model can outperform GPT-4 in terms of its richness in language expression and wealth of local knowledge? We want to present our discovery work on evaluating a suite of LLMs in the areas of Cantonese skills, Hong Kong local geography, and social knowledge. We will also demo our open-sourced platform so that others can play with the Cantonese, Hong Kong-specific chatbot arena. Come and take on the challenge with us! pyconhk2024-49762-local-llm LLM Winnie Yeung, Marcus Lau zh-hant This will be an online presentation. In the LLM world, everyone uses GPT-4 as the gold standard. Are there cases where a smaller language model can outperform GPT-4 in terms of its richness in language expression and wealth of local knowledge? We want to present our discovery work on evaluating a suite of LLMs in the areas of Cantonese skills, Hong Kong local geography, and social knowledge. We will also demo our open-sourced platform so that others can play with the Cantonese, Hong Kong-specific chatbot arena. Come and take on the challenge with us! false https://pretalx.com/pyconhk2024/talk/CEWPVB/ https://pretalx.com/pyconhk2024/talk/CEWPVB/feedback/ LT7 How do I debug my PySpark workloads? Talk 2024-11-16T14:00:00+08:00 14:00 00:30 PySpark is widely adopted for data analysis in distributed computing environments. It supports not only the standard DataFrame API but also Python User Defined Functions (UDFs), Python Data Sources, Python UDTFs, and more. However, debugging and profiling applications in such distributed environments is often challenging - you can't simply add a breakpoint and inspect variables in your IDE. In this presentation, I will demonstrate effective methods for debugging and profiling PySpark applications using existing tools.
These include profiling tools that utilize cProfile, a standard Python profiler, along with various tricks and best practices for monitoring and debugging PySpark applications. pyconhk2024-53294-how-do-i-debug-my-pyspark-workloads- Libraries / Tools Hyukjin Kwon, Allison Wang en false https://pretalx.com/pyconhk2024/talk/VRPMHX/ https://pretalx.com/pyconhk2024/talk/VRPMHX/feedback/ LT7 Spark-less local data stack in 2024 Talk 2024-11-16T14:40:00+08:00 14:40 00:30 In 2024, the Composable Data Stack is maturing, and mixing tools for different use cases keeps getting easier. As the capabilities of local data stacks continue to grow with advancements in tools like Polars and DuckDB, the necessity of using Spark for end users is increasingly being questioned. Traditionally, Spark has been regarded as the most mature and reliable data processing framework, making it a default choice for many. However, the landscape has evolved significantly by 2024, with numerous libraries now offering more efficient and versatile local data processing solutions. This presentation will explore these new alternatives, focusing on: SQLFrame: A framework providing a Spark DataFrame API that can interface with different computing engines. Ibis: A unified API that seamlessly integrates dataframes and databases, eliminating the need to commit to a single engine. SQLGlot: A powerful tool for transpiling SQL queries between different dialects, enhancing compatibility and flexibility. Our goal is not to declare the obsolescence of Spark but to highlight efficient alternatives that may be better suited for specific environments and use cases. Attendees will gain insights into how these modern tools can be leveraged to optimize their local data processing workflows, potentially reducing the need for Spark in certain scenarios.
pyconhk2024-50215-spark-less-local-data-stack-in-2024 Libraries / Tools Nok Lam Chan en false https://pretalx.com/pyconhk2024/talk/KZPSVD/ https://pretalx.com/pyconhk2024/talk/KZPSVD/feedback/ LT7 Power PyTorch Training with Centralized AI Data Lake and Advanced Data Selection Techniques Short talk 2024-11-16T15:15:00+08:00 15:15 00:15 AI data is often stored in separate silos: databases, Parquet/ORC files in cloud storage, and embeddings in vector databases, creating complexities in data management. To address this issue, the Lance columnar format is designed specifically for multimodal AI. It has a unique combination of capabilities, including fast scans and point queries, inline storage of large blobs, and zero-cost schema evolution, enabling the creation of a centralized, massive-scale, all-in-one data lake that can store all kinds of AI data (structured, unstructured, and embeddings) in one cohesive dataset. Lance-PyTorch Dataset utilizes Lance's embedded query engine. Written in Rust, it can quickly identify the most relevant and useful data for training without ever materializing such datasets using external systems. PyTorch training can leverage this unified data lake to seamlessly access and train from all data types, facilitating the creation of high-quality models. This approach allows organizations to train or fine-tune foundation models that encompass comprehensive organizational knowledge, and significantly accelerates the training process while maintaining model quality.
pyconhk2024-52735-power-pytorch-training-with-centralized-ai-data-lake-and-advanced-data-selection-techniques Libraries / Tools Yang Cen en false https://pretalx.com/pyconhk2024/talk/BHXJZA/ https://pretalx.com/pyconhk2024/talk/BHXJZA/feedback/ LT7 Enhancing Web Image Accessibility for Visually Impaired Individuals with Gemini Pro Vision and Google Cloud Platform Talk 2024-11-16T16:00:00+08:00 16:00 00:30 # Background Visually impaired individuals are often unable to access image information because websites do not adhere to W3C web accessibility initiatives. Currently, about 60% of websites lack meaningful alternate text for their images. Moreover, it is infeasible to retroactively add descriptive text to all existing websites manually. pyconhk2024-52103-enhancing-web-image-accessibility-for-visually-impaired-individuals-with-gemini-pro-vision-and-google-cloud-platform LLM /media/pyconhk2024/submissions/X9UZSA/Gemini_Pro_Vision_AI_Screen_Reader_Cover_4vOJl1M.png Cyrus Wong, Markus Tsang, YIU Kelvin zh-hant # GeProVis AI Screen Reader GeProVis is short for Gemini Pro Vision, and my students have significantly enhanced the conventional Google ChromeVox Screen Reader by incorporating the robust capabilities of Google Gemini Pro Vision. This blog post will focus on the details of Google Cloud Platform (GCP). In brief, ChromeVox can extract the image source URL and send it to GCP. In this talk, we will explain the technical details of the Python Google Cloud Function in this project.
### Tech Blog https://medium.com/google-developer-experts/enhancing-web-image-accessibility-for-visually-impaired-individuals-with-gemini-pro-vision-and-07190b97fc38 ### Story https://medium.com/google-developer-experts/hkiit-students-use-gemini-pro-vision-to-develop-ai-screen-reader-acace2a0f830 Hong Kong Google Cloud Summit 2024 GeProVis AI Screen Reader MD and GM, Google Hong Kong Michael Yue https://youtu.be/VqSYB62xrz8 ### Awards - City I&T Grand Challenge 2024 - Innovation Award - Google GDSC 2024 Solution Challenge Global Top 100 ### Speakers: Cyrus Wong - AWS ML Hero + Microsoft MVP - Azure AI + Google Developer Experts - GCP & AI/ML(GenAI) https://www.linkedin.com/in/cyruswong/ Hin Pak Markus Tsang (曾憲柏) - HKIIT Higher Diploma in Cloud and Data Centre Administration https://www.linkedin.com/in/hin-pak-markus-tsang-%E6%9B%BE%E6%86%B2%E6%9F%8F-327949b8/ Kelvin Yiu - AWS Cloud Club Captain & HKIIT Higher Diploma in Cloud and Data Centre Administration https://www.linkedin.com/in/kelvin-yiu-9a25b1290/ false https://pretalx.com/pyconhk2024/talk/X9UZSA/ https://pretalx.com/pyconhk2024/talk/X9UZSA/feedback/