OxyCon 2024: key insights into the future of web scraping
The fifth edition of OxyCon, the flagship conference of the public web data collection industry, concluded last week with resounding success.
Organised by Oxylabs, a premium web intelligence collection platform, the event brought together data-gathering professionals and tech leaders for a day filled with insightful presentations, engaging discussions, and practical tips. The highlight of the conference was the introduction of Oxylabs' innovation, OxyCopilot, the first AI copilot for web scraping.
Julius Černiauskas, CEO of Oxylabs, welcomed the participants, emphasising the importance of collaboration and knowledge sharing within the web scraping community in the rapidly evolving industry landscape.
The company's Chief Customer Officer, Gabrielė Montvilė, provided a concise market overview, focusing on three crucial areas: AI adoption, ethical and compliant web scraping, and advancing anti-bot techniques.
"As all of these trends intersect, businesses must balance maintaining their own scrapers, their infrastructure, their costs with relying on third-party vendors and data-as-a-service solutions as they become more and more important today," said Montvilė.
Twelve experts shared their knowledge over the course of the conference in six presentations and a panel discussion. Žydrūnas Tamašauskas, CTO at Oxylabs, kicked off the presentations by offering practical advice on various aspects of scaling data collection infrastructure, including selecting proxy providers and managing loads.
Vilius Visockas, CEO of City Now, shared his experience of growing a real estate intelligence startup, highlighting the human and technical aspects of building scraping pipelines.
Tadas Gedgaudas, developer at Oxylabs, explained how to mimic mouse movements when scraping to emulate organic user behaviour using Bezier, Gaussian, and Perlin algorithms. Gedgaudas finished by introducing Oxy Mouse, an open-source tool that provides a hassle-free way to utilise this technique.
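As a rough illustration of the Bézier approach mentioned in the talk, the sketch below generates a curved pointer path between two screen coordinates. It is not Oxy Mouse's API or implementation: the function name `bezier_mouse_path` and its parameters (`steps`, `spread`, `seed`) are hypothetical, and the random placement of control points is an assumption made purely for demonstration.

```python
import random


def bezier_mouse_path(start, end, steps=50, spread=100.0, seed=None):
    """Return (x, y) points along a cubic Bezier curve from start to end.

    Hypothetical helper for illustration only; not part of any real library.
    """
    rng = random.Random(seed)
    (x0, y0), (x3, y3) = start, end

    # Two control points placed roughly a third and two thirds of the way
    # along the straight line, then randomly offset so the path bends.
    x1 = x0 + (x3 - x0) / 3 + rng.uniform(-spread, spread)
    y1 = y0 + (y3 - y0) / 3 + rng.uniform(-spread, spread)
    x2 = x0 + 2 * (x3 - x0) / 3 + rng.uniform(-spread, spread)
    y2 = y0 + 2 * (y3 - y0) / 3 + rng.uniform(-spread, spread)

    points = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        # Standard cubic Bezier formula evaluated at parameter t.
        x = u**3 * x0 + 3 * u**2 * t * x1 + 3 * u * t**2 * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u**2 * t * y1 + 3 * u * t**2 * y2 + t**3 * y3
        points.append((x, y))
    return points


path = bezier_mouse_path((100, 200), (640, 480), steps=20, seed=42)
```

Each point in `path` could then be passed to a browser automation tool's pointer-move call in sequence. Gaussian or Perlin noise, the other two techniques named in the talk, would typically be layered on top to vary the path further; that part is omitted here.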
In the following session, Paul Felby, CTO at Adthena, demonstrated how a multi-agent architecture can improve the performance of large language models (LLMs) used for knowledge mining. Aleksandras Šulženko continued the AI theme, explaining how the AI-layer architecture underlying Oxylabs' Web Scraper API, a unified web scraping platform, addresses the main challenges of web data collection.
Šulženko's presentation ended with a demonstration of how OxyCopilot can follow natural language prompts to build parsers in less than a minute. OxyCon attendees and viewers were the first to witness this latest major innovation of the public web data-gathering industry at work.
"If you're happy with what it does, feel free to use it. If not, rest assured that we analyse what prompts you submit and how well they work so that we are able to improve upon your experience and give you better results," Šulženko concluded the live demonstration.
The final session started with Nerijus Šveistys, senior legal counsel at Oxylabs, explaining the different approaches to AI regulation taken by the US, the European Union, and China, and what these differences mean for business and innovation. Šveistys reviewed the recent major lawsuits related to AI and data collection and stressed the importance of staying up to date with the evolving legal landscape, finishing with friendly advice: "Make sure to be mindful of these changes when doing anything in terms of data scraping for AI training, or simply make sure to register to OxyCon next year and we will make our best effort to update you."
The conference concluded with a panel discussion on advanced anti-blocking strategies, hosted by Oxylabs' COO Juras Juršėnas. The panellists were Brecht Stamper, Senior Crawler Engineer at Lighthouse Intelligence; Hocine Amrane, Agile and Rollout Director at Data Impact; Jonny Smyth, CTO at Ceartas DMCA; Paulius Gervė, Python developer at Oxylabs; and Carl Eklof, Senior Director at Wiser Solutions. They discussed the present and future of the web scraping cat-and-mouse game. The experts agreed that this game has no end in sight, as websites keep perfecting their anti-bot measures while data-driven businesses develop solutions to circumvent them.
The overwhelmingly positive feedback from attendees reflects the immense success of the fifth OxyCon, leaving participants eager for future editions. After Tamašauskas' presentation, Fabien Vauchelles, the creator of Scrapoxy, said: "I'm absolutely blown away by Žydrūnas' talk. He just dropped the latest insights on the web scraping stack requirements for 2024–2025, and it was pure gold."
Pierluigi Vinciguerra, Co-Founder and CTO at DataBoutique, commented: "The event was a fantastic showcase of the strides we're making in web scraping technology. It's clear that staying at the forefront of innovation is key in this ever-evolving field. Can't wait to see where we'll go from here." Vinciguerra, who is also the creator of The Web Scraping Club, highlighted presentations by Tamašauskas and Gedgaudas, and the introduction of OxyCopilot as stand-out points of the event.
Everyone in the web scraping community can benefit from the insights of OxyCon by watching free on-demand videos of the presentations. Register at the official conference website to access the videos.