Domagoj Marić
Domagoj Marić graduated from the Faculty of Electrical Engineering and Computing in Zagreb, where he initially specialized in information security and, towards the end of his studies, moved deeper into data science with a focus on web content extraction (web scraping and crawling). He began his career at Megatrend poslovna rješenja, where he developed Python applications and data solutions with a focus on virtual assistants and later became head of the data science department. He continued in that role at Comping. Today he works at Pontis Technology as AI customer delivery manager, leading the delivery of projects in natural language processing, computer vision, predictive analytics and generative AI. Alongside his data science responsibilities, he also has experience as a lecturer in programming and continues to teach.
Session
Generative and traditional AI both depend on data, and in a rapidly evolving environment the ability to extract data from the web efficiently has become indispensable for businesses and developers. This presentation delves into the methodology and tools of web crawling and web scraping and gives an overview of the ethical and legal side of the process, including best practices for crawling politely and efficiently and for using the collected data without violating privacy or intellectual property laws.