Y Combinator’s Cosine unveils AI that outscores GPT-4 roughly 22-fold on SWE-Bench
Cosine, a human reasoning lab focused on developing artificial general developers, has announced a significant achievement: a 30% score on SWE-Bench, the industry’s standard benchmark for evaluating AI-driven software engineering skills.
This score represents a 56% relative improvement over the previous record of 19% held by Factory and a remarkable 2,196% leap over OpenAI's GPT-4, which scored just 1.31%. SWE-Bench evaluates AI models on their ability to handle real-world tasks like software architecture, debugging, and feature implementation in existing codebases.
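The percentages quoted above are relative gains, not percentage-point differences, and the arithmetic can be checked with a short sketch. It uses only the rounded scores given in this article (30%, 19% and 1.31%); the function and variable names are illustrative. With rounded inputs the results land close to, but not exactly on, the quoted 56% and 2,196%, which suggests the published figures were computed from unrounded scores.

```python
def relative_improvement(new: float, old: float) -> float:
    """Return the percentage increase of `new` over the baseline `old`."""
    if old == 0:
        raise ValueError("baseline score must be non-zero")
    return (new - old) / old * 100.0


# Rounded SWE-Bench scores as quoted in the article (percent of tasks solved).
# These are assumptions in the sense that the unrounded scores are not given.
COSINE, FACTORY, GPT4 = 30.0, 19.0, 1.31

vs_factory = relative_improvement(COSINE, FACTORY)  # gain over previous record
vs_gpt4 = relative_improvement(COSINE, GPT4)        # gain over GPT-4
ratio_vs_gpt4 = COSINE / GPT4                       # how many times GPT-4's score

print(f"Relative gain over Factory: {vs_factory:.1f}%")
print(f"Relative gain over GPT-4:   {vs_gpt4:.0f}%")
print(f"Ratio to GPT-4's score:     {ratio_vs_gpt4:.1f}x")
```

The last line gives a ratio of just under 23, which appears to be the basis for the "22x" comparison with GPT-4.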
This milestone marks the highest score ever recorded and comes on the heels of Cosine’s recent $2.5 million funding round, led by Uphonest and SOMA Capital, with contributions from Lakestar, Focal, and others.
Cosine is based between San Francisco and London. Its artificial developer, Genie, operates much like a highly skilled human developer: it can autonomously or collaboratively fix bugs, build features, and refactor code, among other tasks. Cosine’s focus on fine-tuning models to mimic human reasoning has allowed it to outperform competitors such as AWS’s Amazon Q Developer and Cognition’s Devin, both of which scored under 20% on the same benchmark. Notably, Cognition was recently valued at $2 billion following an investment from Peter Thiel’s Founders Fund.
Cosine is now offering this cutting-edge, human-like AI software developer to companies seeking to integrate autonomous AI talent into their teams without the challenges of traditional hiring.
“Our breakthrough in codifying human reasoning is allowing us to train AI models to operate far beyond the narrow range of tasks and tightly restricted prompts currently available to teams developing software,” commented Cosine CEO Alistair Pullen, who published and monetised his first software application at the age of nine.
“We’ve developed a product capable of beating OpenAI and others in completing complex software tasks – in a fraction of the time and money it has taken our competitors to achieve the same results. We’re on course to radically transform the way development and developers work,” continued COO Yang Li.
Cosine was founded in 2022. Its software grew out of the founders’ realisation that LLMs could perform complex coding tasks by imitating the behaviour of human software developers. As a result, it is uncannily ‘human’ in its approach to reasoning, and the founders’ primary goal is to create truly resilient AI capable of tackling open-ended problems across various domains.
“We are focused on creating a colleague, not a co-pilot,” commented Sam Stenner, CIO at Cosine. “After we figured out how to generate data sets that codify human reasoning which can then be used to train LLMs, we knew the potential for what we had built and worked with OpenAI to fine-tune their largest context window LLMs. We’re confident we now have the capabilities to consistently beat our own top score.”
“Cosine is not just improving AI; they’re fundamentally teaching AI to reason, providing companies with a true AI colleague,” said Ellen Ma, Partner at Uphonest Capital.
“I’ve seen thousands of AI startups, and no one has managed to capture human reasoning like Cosine. Genie is proof that their vision, strategy and team is perfect to get us closer to AGI,” said Ben Tossell, Founder of Ben’s Bites.