
OpenAI O1: A new paradigm in AI

OpenAI has launched a new flagship model called O1 that can "reason" through a problem before it answers, opening a new paradigm in AI and LLMs.

🚀 OpenAI launches O1

OpenAI has just launched its new O1 model, which can "reason" through a problem before answering a user's query, shattering benchmarks across the board on complex tasks.

The new OpenAI model, code-named "Strawberry" internally (and reportedly also known as "Q*"), had been rumoured for a long while, even sparking conspiracy theories such as "What did Ilya see?" on Twitter. People had long suspected that it was a self-reasoning, self-improving model, and the launch has now confirmed the self-reasoning part.

🙋‍♀️ How does it work?

OpenAI O1 (Strawberry) is a self-reasoning model that works through multiple reasoning steps before answering a question. It breaks a complex task down into steps and then tries to solve them one by one. It can also critique its own work, which means it can correct itself when the context shows it is heading in the wrong direction.

This is very similar to how CoT (chain-of-thought) prompting works, but the key difference is that the model is trained with reinforcement learning (RL) to produce these chain-of-thought steps, rather than simply being prompted for them, and this unlocks a new way of scaling. That is also why OpenAI reset the naming to "O1" instead of continuing from GPT-4o.

Earlier LLMs relied on a long pre-training phase in which a large amount of compute was spent so that the model could build a world model and capture as much information as possible. At test time (i.e. when we ask it a question), the model then simply answers directly from what it has learned. With O1, the model instead takes multiple steps to reason about the input before giving an answer. For now, O1's reasoning runs are comparatively short, around 10-20 steps taking 15-20 seconds, but OpenAI plans to scale this to hours, days and even weeks. Imagine asking an LLM to formulate a cure for cancer and letting it reason for weeks before it answers!
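OpenAI has not published how O1 reasons internally, so what follows is only a toy analogy in Python, not O1's method. It swaps in a classic backtracking search to illustrate two ideas from this section: breaking a task into steps and backing out of a step that heads in the wrong direction, and spending more compute at answer time. The function `solve_with_budget`, the subset-sum puzzle and the step `budget` are all hypothetical illustrations.

```python
# Toy analogy (hypothetical): NOT O1's actual mechanism, which OpenAI has not published.
# A backtracking search with a "thinking budget" shows (1) step-by-step solving with
# self-correction and (2) how more compute at answer time can change the outcome.
from typing import List, Optional, Tuple


def solve_with_budget(
    numbers: List[int], target: int, budget: int
) -> Tuple[Optional[List[int]], int]:
    """Find a subset of positive `numbers` summing to `target`.

    Each attempted step (adding one number to the partial answer) costs one unit
    of budget. Returns (answer or None, steps used).
    """
    steps = 0

    def search(start: int, chosen: List[int], total: int) -> Optional[List[int]]:
        nonlocal steps
        if total == target:
            return list(chosen)  # reasoning chain reached a valid answer
        for i in range(start, len(numbers)):
            if steps >= budget:
                return None  # out of "thinking time"
            steps += 1
            chosen.append(numbers[i])
            # Self-check: overshooting the target is a wrong direction (numbers are positive)
            if total + numbers[i] <= target:
                found = search(i + 1, chosen, total + numbers[i])
                if found is not None:
                    return found
            chosen.pop()  # self-correct: undo this step and try another one
        return None

    return search(0, [], 0), steps


if __name__ == "__main__":
    numbers, target = [8, 6, 7, 5, 3, 10, 9], 15
    for budget in (3, 10):
        answer, used = solve_with_budget(numbers, target, budget)
        print(f"budget={budget}: answer={answer}, steps used={used}")
```

With a budget of 3 steps the search gives up without an answer; with a budget of 10 it backs out of the dead end 8 + 6 and returns [8, 7] after 8 steps. The analogy is loose: O1 learns its chain of thought through RL rather than running an explicit search like this one.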

📊 Where does O1 stand?

In terms of benchmarks, O1 shatters the top benchmarks for complex tasks when compared to GPT-4o (and, by extension, Claude 3.5 Sonnet). Complex tasks here include writing code, understanding and analyzing a PRD (product requirements document), going through a medical report or writing a novel: basically anything that needs critical thinking.

On the other hand, O1 brings no improvement on basic capabilities and sometimes performs even worse than GPT-4o on simple tasks like writing a personal message or editing a blog post.

💥 How to try out O1!

So how can you use O1? ChatGPT Plus users can already use it directly in ChatGPT, but with very strict rate limits:

O1-preview: 30 requests per week
O1-mini: 50 requests per week

You can also try O1 via Merlin Pro, with much more generous rate limits!

😮 The future

OpenAI O1 is a big step. It is not just the next model after GPT-4o but a new way of training LLMs and of thinking about compute. It also means there is a long runway for improving performance: we are only scratching the surface with O1-preview, and there is a lot more to come over the next year.

The AI race, which had started to stagnate, is set to heat up again, with OpenAI re-establishing its strong lead.
