MOSTLY AI launches industry-grade open source toolkit for synthetic data creation

Anna Wood is the Editor of Startups Magazine. She joined…

The move will make the firm’s state-of-the-art synthetic data technology freely available, so that any developer, business, enterprise, or organisation can create synthetic data – artificial data that mimics real-world data – from their real-life data in situ, without having to engage MOSTLY AI as a paid partner.

The announcement comes as industry experts warn the AI sector is already running out of data to train its large models on – Elon Musk said this month that AI training data has already been ‘exhausted’. At the same time, reports indicate that data privacy and security issues concerning customer data are holding back the training and development of AI tools and products at the enterprise level (CSO).

MOSTLY AI is the global leader in privacy-preserving synthetic data. The firm raised $25 million in a Series B funding round in 2022 and has raised $31 million in total since its launch, counting Molten Ventures, Citi Ventures, 42CAP, and Earlybird among its investors.

The business, which launched in 2017, counts among its clients some of the world’s largest and most influential enterprises, including Citi Bank, the U.S. Department of Homeland Security, Erste Group, Telefonica, and two of the five largest US banks.

Gartner predicts that 75% of businesses will use generative AI to create synthetic customer data by 2026, up from 5% in 2023. The UK Government’s AI Opportunities Action Plan also calls for the exploration of synthetic data generation for constructing privacy-preserving versions of sensitive datasets, demonstrating a growing awareness of the technology’s potential.

MOSTLY AI’s synthetic data technology allows customer data to be realistically mimicked, retaining the statistical and analytical value of the data, but without any personal data points attached. The synthetic data then poses no privacy or security risk and can be shared without friction within and between organisations, transforming the ability of teams to develop and test new AI tools and data products.
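To give a rough sense of the general idea, the sketch below fits simple summary statistics to a small made-up table and then samples brand-new rows from them. It is a toy illustration only and not MOSTLY AI’s toolkit or method: the two-column "age and income" data, the Gaussian model, and the helper names `make_demo_rows`, `fit_gaussian` and `sample_synthetic` are all assumptions made for this example, and a model this simple offers no formal privacy guarantee.

```python
import math
import random
import statistics


def make_demo_rows(n, rng):
    """Build a made-up 'real' table of (age, income) rows (hypothetical data)."""
    rows = []
    for _ in range(n):
        age = rng.uniform(20, 65)
        income = 20000 + 800 * age + rng.gauss(0, 5000)
        rows.append((age, income))
    return rows


def fit_gaussian(rows):
    """Estimate means, standard deviations and correlation of the two columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return mx, my, sx, sy, cov / (sx * sy)


def sample_synthetic(params, n, rng):
    """Draw n new rows from a bivariate Gaussian with the fitted statistics."""
    mx, my, sx, sy, rho = params
    # Guard against tiny negative values caused by floating-point rounding.
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        y = my + sy * (rho * z1 + residual * z2)  # keeps the correlation
        rows.append((x, y))
    return rows


if __name__ == "__main__":
    rng = random.Random(0)
    real = make_demo_rows(1000, rng)
    synthetic = sample_synthetic(fit_gaussian(real), 1000, rng)
    print("real correlation:     ", round(fit_gaussian(real)[4], 3))
    print("synthetic correlation:", round(fit_gaussian(synthetic)[4], 3))
```

The synthetic rows are sampled from the fitted distribution rather than copied from the original table, which is the intuition behind keeping analytical value while dropping individual records. Production systems typically rely on far richer generative models and explicit privacy safeguards.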

MOSTLY AI argues that synthetic data, which is anonymous yet as analytically useful as real customer data, can break this AI innovation bottleneck.

Alexandra Ebert, Chief AI and Data Democratisation Officer at MOSTLY AI, said: “Enterprises across the world are stuck between a rock and a hard place. They know they need to rapidly innovate their AI capabilities to stay ahead of the curve – but they’re forced to lock up the customer data needed to do that for fear of breaking data privacy regulations.

“Both among C-suite executives and society more widely, the huge potential of AI to move the dial on some of the world’s most intractable issues is being held back by the inability of organisations to use their proprietary, sensitive data for AI training and development.

“Our mission has always been to empower every business and every individual with safe access to data. With the open-source release of our industry-proven synthetic data toolkit, we can unlock the ability of all businesses to harness the full power of their proprietary data with zero compromises on privacy.”
