Q&A with Oxylabs: the future of data analytics
Rytis Ulys has more than eight years of experience in various analytical and consulting roles in both startup businesses and enterprise-grade organisations. Currently, he leads a team of seven data professionals at Oxylabs, a web intelligence acquisition platform, having built one of the company’s core teams from scratch in just two years.
We spoke to him about the future of data analytics in business intelligence and the role of machine learning and AI.
What key trends do you see shaping the future of data analytics in business intelligence?
In little more than a decade, data analytics has gone through several major transformations. First, it became digitised. Second, we witnessed the emergence of ‘big data’ analytics, driven partly by digitisation and partly by massively improved storage and processing capabilities. Finally, in the last couple of years, analytics has been transformed once again by emerging generative AI models that can analyse data at a previously unseen scale and speed. Gen AI is becoming a data analyst’s personal assistant, taking over less exciting tasks – from basic code generation to data visualisation.
I believe the key effect of generative AI – and the main future trend for data analytics – is data democratisation. Recently, there’s been a lot of activity around ‘text-to-SQL’ products that run queries written in natural language, meaning that people without a data science background can dive deeper into data analysis.
However, we shouldn’t get carried away with the hype too quickly. These AI-powered tools are neither 100% accurate nor error-free, and spotting errors is harder for less experienced users. The holy grail of analytics is precision combined with a nuanced understanding of the business landscape – skills that are impossible to automate unless we reach some sort of ‘general’ AI.
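To make the democratisation idea concrete, here is a minimal sketch of a text-to-SQL flow with a basic guardrail, written in Python. The `llm_complete` callable is a placeholder for whichever LLM client an organisation uses, and the schema and single-SELECT check are illustrative assumptions – as noted above, generated queries still need validation before they touch real data.

```python
import sqlite3

# Illustrative schema shown to the model; a real deployment would pass its own.
SCHEMA = """
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer TEXT,
    amount REAL,
    created_at TEXT
);
"""

PROMPT = (
    "Given this SQLite schema:\n{schema}\n"
    "Write a single read-only SQL query that answers: {question}\n"
    "Return only the SQL."
)

def generate_sql(question: str, llm_complete) -> str:
    """Translate a natural-language question into SQL via an LLM.
    `llm_complete` is a stand-in for any prompt-in, text-out client."""
    return llm_complete(PROMPT.format(schema=SCHEMA, question=question)).strip()

def run_safely(conn: sqlite3.Connection, sql: str):
    """Guardrail: accept only a single SELECT statement, since
    model-generated SQL is neither 100% accurate nor error-free."""
    statement = sql.strip().rstrip(";")
    if ";" in statement or not statement.lower().startswith("select"):
        raise ValueError(f"Refusing to run generated SQL: {sql!r}")
    return conn.execute(statement).fetchall()
```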
The second trend that is critical for business data professionals is the move towards an umbrella-like AI system capable of integrating sales, employee, finance, and product analytics into a single solution. It could bring immense business value through cost savings (ditching separate software) and would also support data democratisation efforts.
Can you elaborate on the role of machine learning and AI in next-generation data analytics for businesses?
Generative AI drew a somewhat arbitrary line between next-gen analytics (powered by Gen AI) and ‘legacy’ AI systems (anything that came before Gen AI). Public discourse around AI often misses the fact that ‘traditional’ AI isn’t outdated legacy technology, that Gen AI is intelligent only on the surface, and that the two fields are actually complementary.
In my previous answer, I highlighted the main challenges of using generative AI models for business data analytics. Gen AI isn’t, strictly speaking, intelligence – it is a stochastic technology functioning on statistical probability, which is its ultimate limitation.
Increased data availability and innovative data scraping solutions were the main drivers behind the Gen AI ‘revolution’; however, further progress can’t be achieved by simply pouring in more data and computational power. To move towards ‘general’ artificial intelligence, developers will have to reconsider what ‘intelligence’ and ‘reasoning’ mean. Until this happens, there’s little chance that generative models will bring anything more substantial to data analytics than they already have.
That said, I don’t mean there are no methods to improve generative AI’s accuracy and make it better at domain-specific tasks. A number of applications already do this. For example, guardrails sit between an LLM and its users, ensuring the model’s outputs follow the organisation’s rules, while retrieval-augmented generation (RAG) is increasingly employed as an alternative to LLM fine-tuning. RAG builds on a set of technologies, such as vector databases (think Pinecone, Weaviate, Qdrant, etc.), frameworks (LlamaIndex, LangChain, Chroma), and semantic analysis and similarity search tools.
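As an illustration of the retrieval step behind RAG, here is a minimal sketch that uses an in-memory store and plain cosine similarity in place of a dedicated vector database such as Pinecone, Weaviate, or Qdrant. The `embed` and `llm_complete` callables are placeholders for an embedding model and an LLM client; the retrieve-then-ground pattern is what matters.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Standard similarity measure behind semantic search.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

class InMemoryVectorStore:
    """Toy stand-in for a vector database (Pinecone, Weaviate, Qdrant)."""

    def __init__(self):
        self.docs = []  # list of (text, embedding) pairs

    def add(self, text: str, vector: np.ndarray) -> None:
        self.docs.append((text, vector))

    def top_k(self, query_vec: np.ndarray, k: int = 3) -> list:
        ranked = sorted(self.docs,
                        key=lambda d: cosine_similarity(query_vec, d[1]),
                        reverse=True)
        return [text for text, _ in ranked[:k]]

def answer_with_rag(question: str, store: InMemoryVectorStore,
                    embed, llm_complete) -> str:
    """Retrieve the most similar documents, then have the model answer
    using only that retrieved context - the core of RAG."""
    context = "\n".join(store.top_k(embed(question)))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm_complete(prompt)
```

In production, the in-memory store would typically be replaced by one of the vector databases mentioned above, but the shape of the pipeline stays the same.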
How can businesses effectively harness big data to gain actionable insights and drive strategic decisions?
In today’s globalised digital economy, businesses have little choice but to make data-driven decisions, unless they operate in a very confined local market and are limited in size. To stay competitive, an increasing number of businesses are collecting not only the consumer data available from their owned channels but also publicly available information from the web for price intelligence, market research, competitor analysis, cybersecurity, and other purposes.
Up to a point, businesses might get by without data-backed decisions; however, as the pace of growth increases, companies that rely on gut feeling alone inevitably start lagging behind. Unfortunately, there is no universal approach to harnessing data effectively that suits every company. Any business has to start from the basics: first, define the business problem; second, answer, very specifically, what kind of data might help to solve it. Over 75% of the data businesses collect ends up as ‘dark data’. Thus, deciding what data you don’t need is no less important than deciding what data you do.
In what ways do you envision data visualisation evolving in the context of business intelligence and analytics?
Most data visualisation solutions today have AI-powered functionalities that give users a more dynamic view and enhanced accuracy. AI-driven automation also allows businesses to analyse patterns and generate insights from larger and more complex datasets while freeing analysts from mundane visualisation tasks.
I believe data visualisation solutions will have to evolve towards more democratic and beginner-friendly alternatives, bringing data insights beyond data teams and into sales, marketing, product, and client support departments. It is hard to tell, unfortunately, when we can expect such tools to arrive. Until now, the industry’s focus hasn’t been on finding the single best visualisation solution: there are many different tools on the market, each with its own advantages and disadvantages.
Could you discuss the importance of data privacy and security in the era of advanced analytics, and how businesses can ensure compliance while leveraging data effectively?
Data privacy and security were no less important before the era of advanced analytics. However, the increased scale and complexity of data collection and processing activities have also increased the risks of data mismanagement and sensitive data leaks. Today, the importance of proper data governance cannot be overstated: mistakes can lead to financial penalties, legal liability, reputational damage, and consumer distrust.
In some cases, companies deliberately ‘cut corners’ in order to cut costs or gain other business benefits, resulting in data mismanagement. In many cases, however, improper data conduct is unintentional.
Take the example of Gen AI developers, who need massive amounts of multifaceted data to train and test ML models. When collecting data at such a scale, it is easy for a company to miss that parts of these datasets contain personal data or copyrighted material that it wasn’t authorised to collect and process. Worse still, getting consent from the thousands of internet users who might technically be regarded as copyright owners is virtually impossible.
So, how can businesses ensure compliance? Again, it depends on the context, such as the company’s country of origin. The US, UK, and EU data regimes are quite different, with the EU’s being the most stringent. The newly released EU AI Act will have an additional effect on data governance, as it covers both developers and deployers of AI systems within the EU. Although generative models fall into the low-risk category, in certain cases they might still be subject to transparency requirements, obliging developers to disclose the sources of data their AI systems were trained on, as well as their data management procedures.
However, there are basic principles that apply to any company. First, thoroughly evaluate the nature of the data you are planning to fetch. Second, remember that more data doesn’t equal better data: deciding which data brings added value for the business, and omitting data that is excessive or unnecessary, goes a long way towards better compliance and fewer data management risks.
How can businesses foster a culture of data-driven decision-making throughout their organisations?
The first step is, of course, laying down the data foundation – building a Customer Data Platform (CDP) that integrates structured and cleaned data from the various sources the company uses. To be successful, such a platform must include no-code access to data for non-technical stakeholders, and this is not an easy task to achieve.
No-code access means that the chosen platform (or ‘solution’) must offer both an SQL interface for experienced data users and some sort of ‘drag and drop’ functionality for beginners. At Oxylabs, we chose Apache Superset to advance our self-service analytics. However, no solution fits every company or comes with only pros and no cons. Moreover, these solutions require well-documented data modelling.
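To show what ‘well-documented data modelling’ can look like in practice, here is a hypothetical sketch of a semantic-layer-style dataset definition in which every column exposed to self-service users must carry a plain-language description before the dataset is published to a BI tool. The structure is invented for illustration and is not Superset’s own API.

```python
from dataclasses import dataclass, field

@dataclass
class Column:
    name: str
    dtype: str
    description: str  # every exposed field needs a plain-language explanation

@dataclass
class Dataset:
    name: str
    sql: str  # the curated query non-technical users build dashboards on
    columns: list = field(default_factory=list)

    def validate(self) -> None:
        # Enforce the documentation rule before publishing to the BI tool.
        undocumented = [c.name for c in self.columns if not c.description.strip()]
        if undocumented:
            raise ValueError(f"Undocumented columns: {undocumented}")

revenue = Dataset(
    name="monthly_revenue",
    sql=("SELECT date_trunc('month', created_at) AS month, "
         "SUM(amount) AS revenue FROM orders GROUP BY 1"),
    columns=[
        Column("month", "date", "Calendar month the orders were created in"),
        Column("revenue", "numeric", "Sum of order amounts for the month"),
    ],
)
revenue.validate()  # raises if any exposed column lacks a description
```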
Once you have the necessary applications in place, the second big challenge is building the data literacy and confidence of non-technical users. This requires proper training to ensure that employees handle data, interpret it, and draw insights from it correctly. Why is this a challenge? Because it is a slow process, and it takes time away from the data teams.
Fostering a data-driven culture isn’t a one-off project – to turn data into action, you will need a culture shift inside the organisation, as well as constant monitoring and refinement efforts to ensure that non-technical employees feel confident about using data in everyday decisions. Management support and well-established cooperation between teams are key to making self-service analytics (or data democratisation, as it is often called) work for your company.