The Exponential Growth of AI

Artificial intelligence is advancing at an unprecedented pace. Every few months we see new models that surpass their predecessors, but how can we measure this progress meaningfully? A new study proposes a fascinating metric: the length of tasks AI systems can complete autonomously.

A key finding

The study reports a striking pattern: the length of tasks AI models can complete, measured by how long those tasks take human experts, doubles approximately every 7 months. This trend has held consistently over the past 6 years.

If this progression continues, in less than five years we could see AI agents capable of independently completing much of the software work that currently takes days or weeks for humans.
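
To make the doubling claim concrete, here is a minimal sketch of the extrapolation. It assumes a clean exponential with a 7-month doubling time and takes as its starting point the roughly one-hour horizon the study reports for current models. The function name and its parameters are illustrative choices, not something taken from the study.

```python
def projected_horizon_hours(months_ahead: float,
                            start_hours: float = 1.0,
                            doubling_months: float = 7.0) -> float:
    """Task length (in human hours) reachable after `months_ahead` months.

    Assumes the horizon keeps doubling every `doubling_months` months,
    starting from `start_hours` (assumed to be about one hour today).
    """
    return start_hours * 2.0 ** (months_ahead / doubling_months)


# Print the projection at a few whole numbers of doublings.
for months in (7, 14, 28, 56):
    hours = projected_horizon_hours(months)
    print(f"{months:>2} months ahead: {hours:g} hours")
```

Under these assumptions, eight doublings (56 months, a little under five years) turn a one-hour task into a 256-hour one, which is several weeks of full-time human work.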

The paradox of current capabilities

Have you noticed that current AIs are extraordinary at some things but surprisingly limited at others? This research helps explain this apparent contradiction:

Strengths:

Current models outperform humans on text prediction and knowledge tasks
They achieve better results than experts on exam-type problems, at a fraction of the cost
They work as useful tools in many specific applications

Limitations:

They cannot carry out substantial projects on their own
They are unable to reliably handle even relatively simple computer-based work
They cannot directly replace human work in many contexts

Measuring progress through “task length”

The study shows that the time a human expert needs to complete a task strongly predicts the model’s success on that same task:

Tasks under 4 minutes: current models have nearly 100% success
Tasks over 4 hours: less than 10% success

For each model, we can characterize its capabilities by determining “the (human) length of tasks the model can successfully complete with an x% probability.”

The time horizon of current models

Using a 50% success probability as a reference, the study found that current best models (like Claude 3.7 Sonnet) have a “time horizon” of approximately one hour.

This explains why, despite their superhuman performance on many tests, these models are not yet robustly useful for automating parts of people’s daily work: they can sometimes complete tasks that take human experts hours, but they can only reliably complete tasks that take up to a few minutes.
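
The gap between a one-hour horizon at 50% success and a much shorter horizon at high reliability can be illustrated with a simple model. The sketch below assumes that success probability follows a logistic curve in the logarithm of task length, anchored at a 60-minute 50% horizon. The `slope` value is a made-up illustrative parameter, not a figure from the study, so the numbers it prints only show the shape of the effect.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def success_probability(task_minutes: float,
                        horizon_50_minutes: float = 60.0,
                        slope: float = 1.0) -> float:
    """Assumed logistic success curve over log2(task length).

    `slope` is hypothetical: it controls how quickly success falls off
    as tasks get longer than the 50% horizon.
    """
    x = slope * (math.log2(horizon_50_minutes) - math.log2(task_minutes))
    return 1.0 / (1.0 + math.exp(-x))


def time_horizon(success_rate: float,
                 horizon_50_minutes: float = 60.0,
                 slope: float = 1.0) -> float:
    """Task length (human minutes) completed with probability `success_rate`."""
    return horizon_50_minutes * 2.0 ** (-logit(success_rate) / slope)


# Higher required reliability means a shorter time horizon.
for rate in (0.5, 0.8, 0.95, 0.99):
    print(f"{rate:.0%} success: about {time_horizon(rate):.1f} minutes")
```

Whatever the true slope is, raising the required success rate always shrinks the horizon in this model, which matches the article's point that "sometimes completes hour-long tasks" and "reliably completes few-minute tasks" can both be true of the same system.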

What does this mean for the future?

If the trend of the past 6 years continues until the end of this decade, cutting-edge AI systems would be capable of carrying out month-long projects autonomously. This would have enormous implications, both in terms of potential benefits and risks.
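
The arithmetic behind this projection can be sketched in a few lines. The example assumes a current horizon of about one hour, a 7-month doubling time, and, as an assumption of this sketch, that a "month-long project" means roughly 167 hours of human working time (about four 40-hour weeks). The function name is illustrative.

```python
import math


def months_until_horizon(target_hours: float,
                         current_hours: float = 1.0,
                         doubling_months: float = 7.0) -> float:
    """Months needed for the horizon to grow from current to target length,
    assuming steady exponential growth with the given doubling time."""
    doublings = math.log2(target_hours / current_hours)
    return doublings * doubling_months


# Assumption: one month of full-time human work is about 167 hours.
WORK_MONTH_HOURS = 167.0

months = months_until_horizon(WORK_MONTH_HOURS)
print(f"Doublings needed: {math.log2(WORK_MONTH_HOURS):.1f}")
print(f"Time needed: {months:.0f} months ({months / 12:.1f} years)")
```

With these inputs the horizon needs a bit more than seven doublings, or a little over four years, which is why the trend points to the end of the decade.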

The steepness of the trend means our forecasts about when different capabilities will arrive are relatively robust even to large measurement errors. For example, if absolute measurements are off by a factor of 10, that only changes the arrival time by about 2 years, because a factor of 10 corresponds to roughly 3.3 doublings at 7 months each.
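
This robustness claim is easy to check. On an exponential trend, a multiplicative measurement error shifts the arrival date by a fixed number of doublings, regardless of the target. The sketch below assumes the 7-month doubling time; the function name is illustrative.

```python
import math


def arrival_shift_months(error_factor: float,
                         doubling_months: float = 7.0) -> float:
    """Shift in arrival time (months) if horizons are mis-measured
    by a multiplicative `error_factor`, on a steady exponential trend."""
    return math.log2(error_factor) * doubling_months


for factor in (2, 10, 100):
    shift = arrival_shift_months(factor)
    print(f"Off by {factor}x: about {shift:.1f} months ({shift / 12:.1f} years)")
```

A 10x error works out to about 23 months, in line with the "about 2 years" figure, and even a 100x error only doubles that shift.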

Conclusions

This research has important implications for AI benchmarks, forecasting, and risk management:

For evaluations: Measuring AI performance in terms of the length of tasks it can complete gives a meaningful interpretation of absolute performance, not just performance relative to other models.
For forecasting: There is a fairly robust exponential trend in a parameter that matters for real-world impact. This metric enables more accurate predictions about future capabilities.
For strategy: If you lead a team or company, you should start preparing for a future where AIs can handle increasingly longer and more complex projects.

How to prepare for this future?

Identify processes in your organization that could benefit from AI automation
Experiment early with current capabilities to be ready when more advanced ones arrive
Develop complementary skills that will be valuable alongside AI
Stay informed about advances in AI capabilities to anticipate changes in your industry

This research not only helps us understand current AI capabilities but also provides a roadmap for what’s to come. The question is no longer whether AI will be able to perform complex tasks, but when it will and how we will adapt to that change.

Are you prepared for a future where AI can handle projects that last weeks or months? How would that change your work or business?

Based on the research “Measuring AI Ability to Complete Long Tasks”
