From RAG to riches with industrial AI – what is retrieval augmented generation (RAG)?
Speaking at an event in London on Wednesday (July 10), Hewlett Packard Enterprise (HPE) presented its portfolio of joint AI solutions and integrations with Nvidia, along with its channel strategy and training regime, to UK journalists and analysts who did not make the trip to Las Vegas to witness its grand Discover 2024 jamboree in late June. It was a good show, with none of the dazzle but all of the content, designed to draw attention to the US firm’s credentials as an elite-level delivery partner for Industry 4.0 projects, now covering sundry enterprise AI interests.
Its new joint package with Nvidia, called Nvidia AI Computing by HPE, bundles and integrates the two firms’ respective AI-related technology offerings, in the form of Nvidia’s computing stack and HPE’s private cloud technology. They have been combined under the name HPE Private Cloud AI, available in the third quarter of 2024. The new portfolio solution offers support for inference, retrieval-augmented generation (RAG), and fine-tuning of AI workloads that utilise proprietary data, the pair said, as well as for data privacy, security, and governance requirements.
Matt Armstrong-Barnes, HPE’s chief technology officer for AI, paused during his presentation to explain the whole RAG thing. The message was that it is relatively new, in the circumstances, and very important; and that HPE, mob-handed with Nvidia (down to “cutting code with” it), has the tools to make it easy. HPE is peddling a line about “three clicks for instant [AI] productivity” – in part because of its RAG tools, plus other AI mechanics, and all the Nvidia graphics acceleration and AI microservices arrayed for power requirements across different HPE hardware stacks.
He explained: “Organisations are inferencing… and fine-tuning foundation models… [But] there is a middle ground where [RAG] plays a role – to bring gen AI techniques into [enterprise] organisations using [enterprise] data, with [appropriate] security and governance to manage it. That is the heartland… to tackle this type of [AI adoption] problem. Because AI, using algorithmic techniques to find hidden patterns in data, is different from generative AI, which is the creation of digital assets. And RAG brings these two technologies together.”
Which is a neat explanation, by itself. But there are colourful ones everywhere. Nvidia itself has a blog that imagines a judge in a courtroom, stuck on a case. An interpretation of its analogy is that the judge is the generative AI, and the courtroom (or the case being heard) is the algorithmic AI, and that some further “special expertise” is required to make a judgement on it; and so the judge sends the court clerk to a law library to search out rarefied precedents to inform the ruling. “The court clerk of AI is a process called RAG,” explains Nvidia.
“RAG is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources,” it writes. Any clearer? Well, in another useful blog, AWS imagines generative AI, or the large language models (LLMs) it is based on, as an “over-enthusiastic new employee who refuses to stay informed with current events but will always answer every question with absolute confidence”. In other words, it gets stuff wrong; if it does not know an answer, based on the limited historical data it has been trained on, then it is designed to lie.
AWS writes: “Unfortunately, such an attitude can negatively impact user trust and is not something you want your chatbots to emulate. RAG is one approach to solving some of these challenges. It redirects the LLM to retrieve relevant information from authoritative, predetermined knowledge sources. Organisations have greater control over the generated text output, and users gain insights into how the LLM generates the response.” In other words, RAG links LLM-based AI to external resources to pull in authoritative knowledge outside of its original training sources.
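For the technically minded, the pattern AWS describes can be sketched in a few lines of Python. Everything below – the toy knowledge base, the keyword scoring and the prompt template – is an illustrative stand-in rather than any vendor’s actual tooling; a real deployment would swap in an embedding model and a vector store. The point is only the shape of the flow: retrieve from an authoritative, predetermined source first, then hand that context to the model.

```python
# A minimal sketch of the retrieve-then-generate pattern AWS describes.
# The documents, scoring and prompt below are illustrative stand-ins,
# not any particular vendor's API.

KNOWLEDGE_BASE = {
    "returns-policy": "Customers may return unused items within 30 days of purchase.",
    "warranty": "All hardware carries a two-year limited warranty.",
    "support-hours": "Support is available 09:00-17:00 UK time, Monday to Friday.",
}

def retrieve(query: str, top_k: int = 1) -> list[tuple[str, str]]:
    """Score each document by crude keyword overlap and return the best matches."""
    query_terms = set(query.lower().split())
    scored = [
        (len(query_terms & set(text.lower().split())), doc_id, text)
        for doc_id, text in KNOWLEDGE_BASE.items()
    ]
    scored.sort(reverse=True)
    return [(doc_id, text) for _, doc_id, text in scored[:top_k]]

def build_prompt(query: str) -> str:
    """Prepend the retrieved, authoritative context to the user's question."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# The assembled prompt would then be sent to whichever LLM the organisation uses.
print(build_prompt("How long do I have to return an item?"))
```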
Importantly, general-purpose RAG “recipes” can be used by nearly any LLM to connect with practically any external resource, notes Nvidia. RAG is essential for AI in Industry 4.0, it seems – where off-the-shelf foundation models like GPT and Llama lack the appropriate knowledge to be helpful in most settings. In the broad enterprise space, LLMs need to be grounded in private, domain-specific data about products, systems, and policies, and also micro-managed and controlled to minimise and track hallucinations, bias, drift, and other dangers.
But they need the AI equivalent of a factory clerk – in the Industry 4.0 equivalent of our courtroom drama – to retrieve data from industrial libraries and digital twins, and suchlike. AWS writes: “LLMs are trained on vast volumes of data and use billions of parameters to generate original output for tasks like answering questions, translating languages, and completing sentences. RAG extends the… capabilities of LLMs to… an organisation’s internal knowledge base – all without the need to retrain the model. It is a cost-effective approach to improving LLM output.”
RAG techniques also provide guardrails and reduce hallucinations – and build trust in AI, ultimately, as AWS notes. Nvidia adds: “RAG gives models sources they can cite, like footnotes in a research paper, so users can check claims. That builds trust. What’s more, the technique can help models clear up ambiguity in a user query. It also reduces the possibility… [of] hallucination. Another advantage is it’s relatively easy. Developers can implement the process with as few as five lines of code [which] makes [it] faster and [cheaper] than retraining a model with additional datasets.”
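Nvidia’s “few lines of code” claim is not much of an exaggeration, at least for the retrieval step. Here is a hedged sketch, assuming an off-the-shelf open-source embedding library (sentence-transformers, with an arbitrary model name and invented documents); the retrieved passage keeps its source name, so the eventual answer can cite it, footnote-style.

```python
# Roughly the "few lines of code" Nvidia alludes to, for the retrieval step only.
# The model name and documents are illustrative; any embedding model would do.
from sentence_transformers import SentenceTransformer, util

sources = {
    "safety-manual.pdf": "Lockout-tagout is required before servicing the press.",
    "shift-handbook.pdf": "Night-shift handover notes are logged in the MES.",
}
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = model.encode(list(sources.values()))
query_embedding = model.encode("What must happen before maintenance on the press?")
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
best = list(sources)[int(scores.argmax())]
print(f"Cite: {best}")  # the passage and its source name go into the LLM prompt
```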
Back to Armstrong-Barnes, at the HPE event in London; he sums up: “RAG is about taking organisational data and putting it in a knowledge repository. [But] that knowledge repository doesn’t speak a language – so you need an entity that’s going to work with it to provide a linguistic interface and a linguistic response. That is how (why) we are bringing in RAG – to put LLMs together with knowledge repositories. This is really where organisations want to get to because if you use RAG, you have all of the control wrapped around how you bring LLMs into your organisation.”
He adds: “That’s really where we’ve been driving this co-development with Nvidia – [to provide] turnkey solutions that [enable] inferencing, RAG, and ultimately fine tuning into [enterprises].” Most of the rest of the London event explained how HPE, together with Nvidia, has the smarts and services to bring this to life for enterprises. The Nvidia and AWS blogs are very good, by the way; Nvidia relates the whole origin story, as well, and also links in the blog to a more technical description of RAG mechanics.
But the go-between clerk analogy is a good starting point. In the meantime, here is a taster from Nvidia’s technical notes.
“When users ask an LLM a question, the AI model sends the query to another model that converts it into a numeric format so machines can read it. The numeric version of the query is sometimes called an embedding or a vector. The embedding model then compares these numeric values to vectors in a machine-readable index of an available knowledge base. When it finds a match or multiple matches, it retrieves the related data, converts it to human-readable words and passes it back to the LLM.
“Finally, the LLM combines the retrieved words and its own response to the query into a final answer it presents to the user, potentially citing sources the embedding model found. In the background, the embedding model continuously creates and updates machine-readable indices, sometimes called vector databases, for new and updated knowledge bases as they become available.”
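For readers who want to see those steps end to end, here is a toy walk-through in Python. The “embedding” function and the tiny vector index are fabricated stand-ins, chosen only to make the flow visible; a production system would use a trained embedding model and a proper vector database, and the final step would hand the retrieved text to the LLM rather than print it.

```python
import math

# Toy walk-through of the steps in Nvidia's notes: convert the query to a vector,
# compare it against a machine-readable index, and retrieve the matching text.
# The vectors and embed() below are made up purely for illustration.

VECTOR_INDEX = [
    ([0.9, 0.1, 0.0], "Pump P-101 requires bearing inspection every 2,000 hours."),
    ([0.1, 0.8, 0.2], "Line 3 changeover takes roughly 45 minutes."),
]

def embed(query: str) -> list[float]:
    # Stand-in for the model that converts the query into a numeric format.
    return [0.8, 0.2, 0.1] if "pump" in query.lower() else [0.1, 0.7, 0.3]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve(query: str) -> str:
    # Compare the query vector to the vectors in the index; return the best match's text.
    query_vec = embed(query)
    return max(VECTOR_INDEX, key=lambda entry: cosine(query_vec, entry[0]))[1]

# The LLM would then combine this retrieved text with its own response to the query.
print(retrieve("When is the pump due for inspection?"))
```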