Corporate Gen AI Projects Should Plan For Failure — And That’s OK

The generative AI revolution kicked off by the launch of ChatGPT (then powered by GPT-3.5) in November 2022 has unleashed a frenzy of activity as corporations look to take advantage of the new technology. But building and maintaining a high-quality gen AI assistant presents a very different challenge than the standard corporate tech build. As we close in on the two-year anniversary of ChatGPT's launch, it's become clear that there's a very high chance your organization will fail at building a generative AI assistant. More specifically, there's a high likelihood your firm will make the wrong choices, requiring a significant rebuild of the AI assistant sometime within the next three years.

Let’s use two imaginary corporate initiatives at an airline to help you understand why gen AI builds are so different from other tech initiatives. In the first scenario, ImaginAiry Airlines has decided to build a new client-facing mobile app that lets customers manage their bookings. The typical path for a large organization to build a complex technology solution like a new mobile app follows three steps. First, the firm works to understand the business case, costs, requirements, etc. of this new app. Then, the business case is presented to senior stakeholders (usually in the form of one or more meetings with a “steerco”) for approval. Once the initiative and the budget are approved, the app is built. While not perfect, the standard approach generally works for more straightforward non-AI software development.

Now imagine a second scenario where that same airline wants to build a generative AI-based assistant that can provide customer service in a conversational manner. The budget, business case, and vendor(s) that airline leadership approves in 2024 have a high likelihood of being wrong within one to three years, because generative AI technology is evolving so rapidly. The typical corporate approach of approving a plan and then going heads down to build it is not well suited to gen AI.

There are three main risk factors that can derail your organization's generative AI initiative. First, your firm could pick the wrong large language model, or LLM, provider. Second, your organization could make the wrong choice between open-source and closed LLMs. Third, the technology is moving so quickly that there could be a breakthrough that upends the way generative AI assistants are built. Any of these scenarios would likely require your organization to significantly re-engineer previous work at best, or to completely scrap past efforts at worst.

I'll discuss each of these risk factors below, then share best practices for building a generative AI assistant given all of the uncertainties.

Your Organization Could Pick The Wrong LLM Vendor

To oversimplify, LLMs are the foundation that enables generative AI assistants to achieve general-purpose language generation. As of September 2024, most organizations are not going to build their own LLM. This means firms must evaluate numerous LLM options and choose an LLM provider to power their generative AI assistant.

The performance of the various LLMs is constantly changing. According to Milan De Reede, co-founder of NanoGPT, a service offering consumers access to 20 top AI models within a single portal, “the gen AI landscape changes rapidly as new models are released. Use cases which were not practical in state-of-the-art models suddenly become trivially easy with a new release. We have seen first-hand that our clients’ preferred gen AI model can change literally overnight as new models are released.”

The top LLMs of 2026 or 2027 could look very different from the leading LLMs of September 2024. While the LLM your organization selects in 2024 may be good enough, it’s also very possible that the LLM your firm chooses rapidly becomes significantly worse than the industry leader(s).

Your Firm Could Make The Wrong Choice Between Closed and Open-Source Models

Related to the challenges associated with selecting an LLM provider, your firm must also choose between open-source and closed LLMs. To broadly generalize, closed services (like the OpenAI models behind ChatGPT) are easier to implement but charge high fees, offer fewer customization options, and can present vendor lock-in challenges. In contrast, open-source LLMs (like Meta's Llama 3.1) are generally cheaper, provide greater transparency, and offer more customization options. The downside of deploying an open-source model is that it usually requires more engineering expertise and does not come with the robust support infrastructure that corporate clients expect.

This summary is an oversimplification, and there is some debate around what counts as open source. Regardless, your firm will be forced to choose between the two options. Some experts believe that closed models backed by large corporations with dedicated teams and huge resources will prove superior. Other experts think that the quality of generative AI models will converge over time and there won’t be a meaningful difference between the quality of expensive closed LLMs and open-source LLMs. In this scenario, open-source models with lower costs and more control would be superior. Only time will tell.

How painful would a switch from closed to open source (or vice versa) be for your organization? The answer depends on the size of your organization, existing engineering talent, and the complexity of your AI needs. A leading tech company with in-house data scientists and AI experts could make this change relatively quickly. On the other hand, it would be a lengthy and complex undertaking for organizations that lack top engineering talent and/or have very complex needs that span the globe (like an airline or an international insurance firm).

Technology Breakthroughs Could Drastically Change The Way We Build And Maintain Generative AI Assistants

To broadly generalize, the current best practice for building a generative AI assistant is to use a retrieval-augmented generation (RAG) architecture: the assistant first retrieves relevant content from the organization's own data, then passes that content to the LLM along with the user's question. This structure offers many advantages, including making it relatively easy to change LLM providers should the organization make the wrong choice. But this is the best practice as of September 2024. Researchers and companies on the cutting edge of AI are working on a variety of potential innovations that could change the way organizations build and maintain gen AI assistants.
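To make the provider-swapping point concrete, here is a minimal sketch in Python, not a production design. Everything in it is hypothetical and chosen for illustration: the `LLMProvider` interface, the `StubProvider` class (a stand-in for a real vendor SDK call), and the toy keyword-overlap `retrieve` function, which a real RAG system would replace with a vector search over a document store.

```python
from dataclasses import dataclass
from typing import Protocol


class LLMProvider(Protocol):
    """Hypothetical interface: a vendor adapter only has to implement complete()."""

    def complete(self, prompt: str) -> str: ...


class StubProvider:
    """Stand-in for a real vendor SDK (assumption); it simply echoes the prompt."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Toy retrieval: rank documents by the number of words shared with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


@dataclass
class Assistant:
    provider: LLMProvider  # the only vendor-specific dependency
    documents: list[str]   # the organization's own knowledge base

    def answer(self, question: str) -> str:
        # Retrieve first, then hand the retrieved context to the LLM.
        context = "\n".join(retrieve(question, self.documents))
        prompt = f"Context:\n{context}\nQuestion: {question}"
        return self.provider.complete(prompt)


if __name__ == "__main__":
    docs = [
        "Checked bags cost 30 dollars each.",
        "Flights can be rebooked up to 2 hours before departure.",
    ]
    assistant = Assistant(provider=StubProvider("vendor-a"), documents=docs)
    print(assistant.answer("How much do checked bags cost?"))

    # Switching vendors touches one line; retrieval and prompt logic are unchanged.
    assistant.provider = StubProvider("vendor-b")
    print(assistant.answer("How much do checked bags cost?"))
```

Because the retrieval and prompt-building code never refers to a specific vendor, replacing the provider is confined to a single adapter class. In practice, prompts often still need retuning for a new model, so the switch is cheaper but rarely free.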

Here are four examples of potential technology breakthroughs; the list is not comprehensive. First, a new approach that uses multiple AI models working together to check each other's outputs could greatly improve accuracy. Second, it may become cheaper and easier for organizations to build a proprietary, in-house LLM instead of relying on external LLMs. Third, breakthroughs in the way gen AI maintains memory could greatly improve conversational abilities. Fourth, neuro-symbolic AI could turn out to be the best approach for building gen AI assistants.

According to AI researcher Christos Ziakas, “each of these potential technology breakthroughs would probably replace the current RAG database + LLM call best practice for building generative AI assistants. For example, if Neuro-symbolic AI develops and becomes the superior technology, your organization would likely need to scrap much of your existing codebase. Neuro-symbolic AI is focused on integrating reasoning and business logic into the generative AI assistant, which would mark a significant shift in how these systems are developed and maintained.”

The Uncertainties Around Generative AI Mean Firms Must Develop New Ways of Working

How should organizations build gen AI assistants knowing that there’s a high likelihood they will make the wrong choices? All of these unknowns mean organizations must focus on the right operational processes and take a different approach to budgeting.

On the people and processes side, your firm cannot follow the typical approach of approving a business case and then going heads down to build. Your organization needs to set up a cross-functional team of senior stakeholders that meets on an ongoing basis to monitor the AI build and technology developments. Most large organizations already have a process and infrastructure in place for issues that need frequent and quick decisions from multiple senior stakeholders, such as pricing decisions or seasonal product launches. Your firm should set up similar operational and people processes that enable quick AI-related decisions.

On the budget side, the financing for the AI initiative should not be viewed as a fixed one-off investment in a new tech build. Your organization should also have a ready-made contingency budget and plan for changing course if one of the three previously mentioned scenarios happens.

While the exact budget and team structure will vary based on your organization and needs, in an ideal world your organization should budget for a robust dedicated team working on the generative AI assistant year after year. Depending on your organization's tech stack and existing legacy systems, it may be beneficial to opt for the more expensive AI build option that includes any needed modernization of your overall infrastructure.

According to Jeroen de Bel, founder and partner at Fincog, “the complexities and uncertainties involved in gen AI builds may seem intimidating, but it can be a catalyst for change at your organization. Building a gen AI assistant requires new ways of working and a nimbler engineering organization. Gen AI’s need for high-quality data can also drive a modernization of legacy infrastructure. AI assistants will only get more powerful over time, and your organization needs to make this investment to remain competitive in the long term.”
