2025 playbook for business AI success, from agents to evals

2025 is poised to be a pivotal year for business AI. Last year saw rapid innovation, and this year will see the same. This makes it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are five critical areas businesses should prioritize for their AI strategy this year.

1. Agents: the next generation of automation

AI agents are no longer theoretical. In 2025, they are becoming must-have tools for businesses looking to streamline operations and improve customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and seamlessly integrate tools and APIs.

At the start of 2024, agents weren't ready for prime time, making frustrating mistakes like hallucinating URLs. They started to get better as the underlying large language models themselves evolved.

“Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for companies, who recently reviewed 48 agents built last year. “Interestingly, what we built at the beginning of the year, most of those worked better at the end of the year simply because the models were better.” Witteveen shared this in the video podcast we filmed to discuss these five major trends in detail.

The models are getting better and hallucinating less, and they are also being trained to perform agentic tasks. Another area model providers are exploring is using an LLM as a judge: because models are getting cheaper (something we'll talk about below), companies can run three or more models, then have a judge model pick the best output before making a decision.
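The LLM-as-judge pattern described above can be sketched in a few lines. The model functions below are stand-ins, not real API calls; in practice each would invoke a different LLM provider, and the judge would itself be an LLM scoring candidates against a rubric rather than the simple length heuristic used here.

```python
# Minimal sketch of the LLM-as-judge pattern: generate candidate answers
# with several (cheap) models, then let a judge pick the best one.
# All model calls here are illustrative stand-ins for real LLM API calls.

def model_a(prompt: str) -> str:
    return "Paris is the capital of France."

def model_b(prompt: str) -> str:
    return "The capital of France is Paris, a city of about 2 million people."

def model_c(prompt: str) -> str:
    return "France."

def judge(prompt: str, candidates: list[str]) -> str:
    # A real judge would be another LLM call scoring each candidate
    # against a rubric; we fake it with a length heuristic.
    scored = [(len(c), c) for c in candidates]
    return max(scored)[1]

def best_of_n(prompt: str, models) -> str:
    candidates = [m(prompt) for m in models]
    return judge(prompt, candidates)

answer = best_of_n("What is the capital of France?", [model_a, model_b, model_c])
print(answer)
```

The design point is that generation and judging are decoupled, so cheaper models can be swapped in for candidates while a stronger model handles judging.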

Another part of the secret sauce? Retrieval-augmented generation (RAG), which allows agents to store and reuse knowledge efficiently, is improving. Imagine a travel agent bot that not only plans trips but books flights and hotels in real time based on updated preferences and budget.
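A toy version of the retrieval step in that travel-agent scenario looks like the sketch below. Real systems would use embeddings and a vector store; the keyword-overlap retriever and the stored preferences here are purely illustrative.

```python
# A toy retrieval-augmented generation (RAG) loop: retrieve the most
# relevant stored fact for a query, then stuff it into the prompt
# that would be sent to the LLM.

KNOWLEDGE = [
    "User prefers window seats and flights departing after 10am.",
    "User's budget for hotels is $150 per night.",
    "User is vegetarian.",
]

def retrieve(query: str, docs: list[str]) -> str:
    # Crude relevance score: count shared lowercase words.
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query: str) -> str:
    context = retrieve(query, KNOWLEDGE)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"

prompt = build_prompt("What is the hotel budget per night?")
print(prompt)
```

Because the knowledge lives outside the model, it can be updated (new budget, new preferences) without retraining anything.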

Takeaway: Businesses need to identify use cases where agents can deliver high ROI – be it in customer service, sales, or internal workflows. Tool use and advanced reasoning capabilities will define the winners in this space.

2. Evals: the foundation of reliable AI

Evaluations, or “evals,” are the backbone of any robust AI deployment. This is the process of choosing which LLM, among the hundreds now available, to use for your task. This is important for accuracy, but also for aligning AI outputs with business goals. A good eval ensures that a chatbot understands the tone, a recommendation system provides suitable options, and a predictive model avoids costly mistakes.

For example, a company’s evaluation for a customer support chatbot might include metrics for average resolution time, accuracy of responses, and customer satisfaction scores.
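The metrics just mentioned can be computed from conversation logs in a simple harness like the one below. The log format is made up for illustration; a real pipeline would pull these fields from a ticketing system or chat transcripts.

```python
# Sketch of a simple eval harness for a support chatbot, computing the
# three metrics mentioned above from (hypothetical) conversation logs.

conversations = [
    {"resolution_minutes": 4.0, "answer_correct": True,  "csat": 5},
    {"resolution_minutes": 9.5, "answer_correct": True,  "csat": 4},
    {"resolution_minutes": 6.0, "answer_correct": False, "csat": 2},
]

def run_evals(logs):
    n = len(logs)
    return {
        "avg_resolution_minutes": sum(c["resolution_minutes"] for c in logs) / n,
        "accuracy": sum(c["answer_correct"] for c in logs) / n,  # True counts as 1
        "avg_csat": sum(c["csat"] for c in logs) / n,
    }

report = run_evals(conversations)
print(report)
```

Running the same harness against outputs from several candidate LLMs is one concrete way to ground the model-selection decision in numbers.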

Many companies invest a lot of time in processing inputs and outputs so that they conform to a company's expectations and workflows, but this can be time-consuming and resource-intensive. As the models themselves get better, many companies are saving effort by relying on the models to do more of this work, so choosing the right one becomes even more important.

And this process forces clear communication and better decisions. If you are “more conscious of how to evaluate the output of something and what you want, not only does that make you better at using LLMs and AI, it actually makes you better with people,” Witteveen said. “If you can clearly tell someone: this is what I want, here's what I want it to look like, here's what I expect from it. When you get specific about that, people suddenly perform better too.”

Witteveen has noticed company managers and other developers telling him: “I'm better at giving directions to my team just from getting good at prompt engineering, or from learning how to write the correct evals for the models.”

By writing clear evals, businesses force themselves to clarify goals – a win for both humans and machines.

Takeaway: Creating high-quality evals is essential. Start with clear basics: accuracy of response, time to resolution, and alignment with business goals. This ensures that your AI not only works but aligns with your brand values.

3. Cost efficiency: scaling AI without breaking the bank

AI is getting cheaper, but strategic deployment remains key. Improvements at each level of the LLM chain lead to significant cost reductions. Intense competition among LLM providers, and from open-source rivals, has led to regular price cuts.

Meanwhile, post-training software techniques make LLMs more efficient.

Competition from new hardware vendors such as Groq, with its LPUs, and improvements by incumbent GPU provider Nvidia have dramatically reduced inference costs, making AI accessible for more use cases.

The real improvements come from optimizing the way models are set up to work in applications, which happens during inference, rather than during training, when the models are first built using data. Other techniques such as model distillation, along with hardware innovations, mean that companies can do more with less. The question is no longer whether you can afford AI (you can do most projects cheaper this year than six months ago) but how you scale it.

Takeaway: Conduct a cost efficiency analysis for your AI projects. Compare hardware options and explore techniques such as model distillation to reduce costs without compromising performance.

4. Memory personalization: tailoring AI to your users

Personalization is no longer optional; it's expected. In 2025, memory-powered AI systems are making this a reality. By remembering user preferences and past interactions, AI can deliver more tailored and effective experiences.

Memory personalization is not widely or openly discussed because users are often uncomfortable with AI applications storing personal information to improve the service. There are privacy concerns, and an ick factor when a model produces answers that show it knows things about you, for example how many kids you have, what you do for a living, and what your personal preferences are. OpenAI, for one, stores information about ChatGPT users in its memory feature, which can be turned off and deleted, although it is on by default.

While businesses building on OpenAI and other model providers cannot access that same user memory, what they can do is create their own memory systems using RAG, ensuring that the data is both secure and impactful. However, businesses must be very careful, balancing personalization with privacy.
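A business-side memory store built in this spirit might look like the sketch below: facts are only recorded for users who opt in, and can be deleted on request. The class and method names are illustrative, not any vendor's API.

```python
# Sketch of an opt-in user memory store a business might build alongside
# RAG: no consent, no storage; deletion honored on request.

class UserMemory:
    def __init__(self):
        self._store = {}        # user_id -> list of remembered facts
        self._opted_in = set()  # users who consented to memory

    def opt_in(self, user_id: str):
        self._opted_in.add(user_id)

    def remember(self, user_id: str, fact: str):
        if user_id in self._opted_in:  # silently drop without consent
            self._store.setdefault(user_id, []).append(fact)

    def recall(self, user_id: str) -> list[str]:
        return list(self._store.get(user_id, []))

    def forget(self, user_id: str):  # right-to-delete
        self._store.pop(user_id, None)
        self._opted_in.discard(user_id)

mem = UserMemory()
mem.remember("alice", "prefers aisle seats")  # dropped: not yet opted in
mem.opt_in("alice")
mem.remember("alice", "budget is $150/night")
print(mem.recall("alice"))
```

Recalled facts would then be injected into prompts via the RAG pipeline, so the model appears to “remember” the user without the provider ever holding that data.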

Takeaway: Create a clear strategy for memory personalization. Opt-in systems and transparent policies build trust while providing value.

5. Inference and test-time computation: the new efficiency and reasoning frontier

Inference is where AI meets the real world. In 2025, the focus is on making this process faster, cheaper and more powerful. Chain-of-thought reasoning, where models break down tasks into logical steps, is changing how businesses approach complex problems. Tasks that require deep reasoning, such as strategic planning, can now be tackled effectively by AI.
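At its simplest, chain-of-thought prompting just means asking the model to reason in explicit steps before answering. The template below is an illustrative assumption, not any vendor's recommended format.

```python
# Illustrative chain-of-thought style prompt: instead of asking for a
# one-shot answer, the prompt instructs the model to reason in numbered
# steps and mark the final answer so it can be parsed reliably.

def cot_prompt(question: str) -> str:
    return (
        "Answer the question below. First break the problem into numbered "
        "steps, then give the final answer on a line starting with 'Final:'.\n\n"
        f"Question: {question}"
    )

p = cot_prompt("A project needs 3 teams of 4 people; 7 are hired. How many more?")
print(p)
```

Newer reasoning models bake this behavior in at training and inference time, but the same structure applies: more intermediate computation is traded for more reliable answers.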

For example, OpenAI's o3-mini model is expected to be released later this month, followed by the full o3 model at a later date. These models introduce advanced reasoning capabilities that break down complex problems into manageable chunks, thereby reducing AI hallucinations and improving decision-making accuracy. The reasoning gains show up in areas such as math, coding, and scientific applications, where extra thinking helps; in other areas, such as language synthesis, improvements may be limited.

However, these improvements also come with increased computational demands and higher operational costs. The o3-mini is intended as a middle ground, containing costs while maintaining high performance.

Takeaway: Identify workflows that benefit from advanced inference techniques. Implementing chain-of-thought reasoning steps in your own company, and choosing optimized models, can give you an advantage here.

Conclusion: Turning insights into action

AI in 2025 is not just about adopting new tools; it’s about making strategic choices. Whether it’s deploying agents, refining evals, or scaling cost-efficiently, the path to success lies in thoughtful implementation. Businesses must embrace these trends with a clear, focused strategy.

For more detail on these trends, check out the full video podcast between Sam Witteveen and myself here:


