OpenAI O1: A new paradigm in AI
OpenAI has launched a new flagship model called O1 that can "reason" on its own, opening up a new paradigm for AI and LLMs.
🚀 OpenAI launches O1
OpenAI has just launched its new O1 model, which can "reason" on its own before answering a user's query, shattering benchmarks across the board on complex tasks.
The new model, code-named "Strawberry/Q*" internally, had been rumoured for a long while, even fuelling conspiracy theories like "What did Ilya see?" on Twitter. People had long suspected that it was a self-reasoning, self-improving model, and that has now come to light.
🙋♀️ How does it work?
OpenAI O1, or Strawberry, is a self-reasoning model that works through multiple reasoning steps before answering a question. It breaks a complex task down into steps and then tries to solve them one by one. It is also capable of self-critique, which means it can correct itself if it is heading in the wrong direction given the context.
This is very similar to how chain-of-thought (CoT) prompting works, but the key difference is that the CoT steps are themselves trained via reinforcement learning (RL), and this unlocks a new paradigm of scaling. Hence the naming reset from GPT-4o to "O1".
Earlier LLMs relied on a long pre-training step, where a large amount of compute was spent so that the LLM could build a world model and capture as much information as possible. Then at test time (i.e. when we ask it a question), it simply answered directly from what it had learned. With O1, the LLM instead takes multiple steps to reason over the input before giving an answer. For now the reasoning is comparatively short, around 10-20 steps taking 15-20 seconds, but OpenAI plans to scale this to hours, days and even weeks! Imagine asking an LLM to formulate a cure for cancer, and it reasons for weeks before giving its answer.
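The break-down, check and retry loop described above can be pictured with a toy sketch. This is only an illustration of the general idea, not OpenAI's method: `propose`, `critique` and `reason` are hypothetical names, the "model" is plain arithmetic, and the first-attempt mistake is simulated on purpose so that the critique has something to catch.

```python
# Toy illustration only: `propose` and `critique` are hypothetical stand-ins,
# not OpenAI's method or any real API.

def propose(step, attempt):
    """Produce a candidate result for one sub-step (a, op, b)."""
    a, op, b = step
    result = a + b if op == "+" else a * b
    # Simulated mistake: the first attempt at any multiplication is off by one.
    if op == "*" and attempt == 0:
        result += 1
    return result


def critique(step, result):
    """Check a candidate by undoing the operation (multiplication needs b != 0)."""
    a, op, b = step
    if op == "+":
        return result - b == a
    return b != 0 and result % b == 0 and result // b == a


def reason(steps, max_attempts=3):
    """Solve each sub-step in order, retrying whenever the critique rejects it."""
    answers, trace = [], []
    for step in steps:
        for attempt in range(max_attempts):
            result = propose(step, attempt)
            accepted = critique(step, result)
            trace.append((step, result, accepted))
            if accepted:
                answers.append(result)
                break
        else:
            raise RuntimeError(f"no accepted answer for step {step}")
    return answers, trace


if __name__ == "__main__":
    answers, trace = reason([(2, "+", 3), (4, "*", 5)])
    for step, result, accepted in trace:
        print(step, "->", result, "accepted" if accepted else "rejected, retrying")
    print("final answers:", answers)
```

The trace keeps the rejected attempt as well as the accepted one, which loosely mirrors the idea that the intermediate reasoning is part of the work and not only the final answer.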
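To get a feel for why spending more steps at answer time can pay off, here is a small analogy in code. It is only an analogy and says nothing about O1's internals: it uses Newton's method for square roots, and `refine_sqrt` and its `steps` budget are names made up for this sketch.

```python
# Analogy only: Newton's method for square roots, not anything O1 does internally.

def refine_sqrt(n, steps):
    """Estimate sqrt(n) for n > 0 using a fixed budget of refinement steps."""
    guess = n  # deliberately crude starting point
    for _ in range(steps):
        guess = 0.5 * (guess + n / guess)
    return guess


if __name__ == "__main__":
    for budget in (0, 1, 2, 5):
        estimate = refine_sqrt(2.0, budget)
        print(budget, "steps ->", estimate, "error", abs(estimate - 2.0 ** 0.5))
```

With a budget of zero the function just returns its first guess, much like answering directly; each extra step trades a little more compute for a smaller error.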
📊 Where does O1 stand?
In terms of benchmarks, O1 beats GPT-4o (and by extension Claude 3.5 Sonnet) by a wide margin on the top benchmarks for complex tasks. Complex tasks here means things like writing code, understanding and analyzing a PRD, going through a medical report or writing a novel. Basically, anything that needs critical thinking.
On the other hand, O1 shows no real gain in basic capabilities, and it sometimes performs even worse than GPT-4o on simple tasks like writing a personal message or editing a blog.
💥 How to try out O1!
Now, how can we actually use O1? At present, ChatGPT Plus users can use O1 directly in ChatGPT, but with very strict rate limits.
O1-preview: 30 requests per week
O1-mini: 50 requests per week
You can also check out O1 via Merlin Pro, with much better rate limits!
😮 The future
OpenAI O1 is a big step. It's not just a new model after GPT-4o but a new way of training LLMs and of thinking about compute. It also means there is a long runway for further performance gains: we are just scratching the surface with O1-preview, and there is a lot more to come over the next year.
The AI wars, which had been getting stagnant, are going to heat up again, with OpenAI re-establishing a strong lead.
Bhavesh Chaudhari
Programmer. Bringing ideas to life. Full Stack Web Developer.