Interview on How to Apply Generative Artificial Intelligence

In this article, we interview Carlos Narciso, a Project Manager involved in various Artificial Intelligence projects, to discuss what is needed to apply Generative Artificial Intelligence: the architecture, the data, and technical concepts like RAG, LLM, hallucinations, fine-tuning...

Understanding Generative Artificial Intelligence:

1. Nowadays, everything revolves around AI: it is what we are all experiencing, and it is what is revolutionizing the technology sector. So, let's start from the beginning: how do these Generative Artificial Intelligence solutions work? What do they require?

These solutions aim to provide information to users in a conversational format. To achieve this, we first need to define the use case we want to address. Once it is defined, we must implement a RAG that contains precise, verifiable information that can be passed as conversational context to a natural language model so it can generate an appropriate response.

2. That sounds interesting, but for those with a less technical background, can you tell us what a RAG is?

A RAG architecture, short for Retrieval-Augmented Generation, is an advanced artificial intelligence technique that combines natural language processing (NLP) with information retrieval systems. This approach significantly improves text generation by allowing language models to rely on a specific database when constructing their responses. This "box" of information prevents the model from reaching for external data, thereby avoiding incorrect responses or hallucinations.
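To make the retrieval half of this concrete, below is a minimal sketch assuming a tiny in-memory corpus and a toy embed() function; in a real system the documents would live in a vector database and be embedded with a dedicated embedding model, but the idea of restricting the model to a known "box" of documents is the same.

import numpy as np

# A tiny, in-memory "knowledge box": in a real RAG these documents would sit
# in a vector store and be embedded with a dedicated embedding model.
documents = [
    "Our support desk is open Monday to Friday, 9:00 to 18:00.",
    "Refunds are processed within 14 days of the request.",
    "The enterprise plan includes single sign-on and audit logs.",
]

def embed(text):
    """Toy embedding: bag-of-words over the corpus vocabulary (illustration only)."""
    vocab = sorted({w for doc in documents for w in doc.lower().split()})
    vec = np.zeros(len(vocab))
    for w in text.lower().split():
        if w in vocab:
            vec[vocab.index(w)] += 1.0
    return vec

def retrieve(query, k=2):
    """Return the k documents most similar to the query (cosine similarity)."""
    q = embed(query)
    scores = []
    for doc in documents:
        d = embed(doc)
        denom = (np.linalg.norm(q) * np.linalg.norm(d)) or 1.0
        scores.append(float(q @ d) / denom)
    ranked = sorted(zip(scores, documents), reverse=True)
    return [doc for _, doc in ranked[:k]]

print(retrieve("how long do refunds take"))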

Large Language Models (LLMs), such as GPT, are high-capacity neural networks trained on huge amounts of text through self-supervised learning. These models have a very broad conversational context, enabling them to hold conversations on an almost unlimited range of topics. However, when faced with queries from specific domains, such as detailed information about a company, their answers may be inaccurate because they lack direct access to that data.

Relationship between RAG and LLM:

This is where the relationship between RAG and LLM comes into play.

Integrating RAG with an LLM is crucial to overcoming this limitation. While the LLM provides the ability to generate coherent, fluent dialogue on a wide range of topics, the RAG supplies the necessary "experience" in specific subjects. By instructing the LLM to generate its responses from the specific information provided by the RAG, a synergy is achieved that allows the model to act as an expert on the topic.

Essentially, the RAG directs the LLM to use only relevant, up-to-date information during response generation, narrowing the conversational context down to what is truly important.
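To make that division of labour concrete, here is a minimal sketch of the retrieve-then-generate loop, assuming the retrieve() helper from the earlier sketch and a placeholder call_llm() wrapper standing in for whichever language model is used; the point is only that the model is asked to answer from the supplied context rather than from its general training data.

def call_llm(prompt):
    """Placeholder for the real model call (OpenAI, Bedrock, Azure OpenAI, a local model...)."""
    return "<model response for: " + prompt[:60] + "...>"

def build_prompt(context_docs, question):
    """Insert the retrieved documents into the prompt as the only allowed context."""
    context = "\n".join("- " + doc for doc in context_docs)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        "Context:\n" + context + "\n\nQuestion: " + question + "\nAnswer:"
    )

def answer(question):
    context_docs = retrieve(question, k=3)        # RAG: fetch the relevant "experience"
    prompt = build_prompt(context_docs, question)
    return call_llm(prompt)                       # LLM: generate the conversational reply

print(answer("how long do refunds take"))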

3. What is its main advantage? And the most significant challenge?

Its main advantage is that it narrows down the conversational context and adapts it to provide the information that interests us at any given moment.

The main challenge for RAG + LLM architectures is understanding that response generation will depend entirely on the information contained in the RAG. In other words, that information must exist and must be up to date.

Additionally, having a proper data architecture is important, since it ensures the conversational context contains the necessary information and is as accurate as possible.

4. What is necessary for a correct data architecture?

Reliable, up-to-date data, and good data governance that makes it possible to provide context to the language models that use it.

Companies with a more mature data strategy will be able to generate better conversational context than those with a less mature one, although we can also build solutions whose conversational context is based on the information published on a website, or even on documentation of all kinds that the organization generates.

The RAG services offered by cloud providers supply the tools needed to integrate different data sources successfully and keep them up to date.
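As a rough illustration of what keeping the sources up to date involves, the sketch below chunks a set of documents and re-indexes a document only when its content has actually changed; the chunk size and the hashing strategy are arbitrary assumptions, and a managed cloud RAG service would handle most of this for you.

import hashlib

CHUNK_SIZE = 500  # characters per chunk; an arbitrary choice for this sketch

def chunk(text, size=CHUNK_SIZE):
    """Split a document into fixed-size chunks so each piece fits in the context."""
    return [text[i:i + size] for i in range(0, len(text), size)]

index = {}         # doc_id -> list of chunks (stand-in for a vector store)
fingerprints = {}  # doc_id -> content hash, used to detect stale entries

def refresh(doc_id, text):
    """Re-chunk and re-index a document only when its content has changed."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if fingerprints.get(doc_id) == digest:
        return  # already up to date, nothing to do
    fingerprints[doc_id] = digest
    index[doc_id] = chunk(text)  # in a real system: re-embed and upsert to the vector store

refresh("faq", "Refunds are processed within 14 days of the request. " * 20)
print(len(index["faq"]), "chunks indexed")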

5. A term that is often mentioned when talking about AI is "hallucinations."

These solutions are very confident and will always try to provide an answer. That can be a problem, because they may end up conveying outright incorrect information, which is what we call hallucinations.

6. How can they be avoided?

We always propose a strategy that is divided into two main blocks:

  • Narrowing down the conversational context, that is, the resources the RAG has available and that are passed to the language model to generate a response. When the RAG holds a lot of information, the resource it retrieves to pass on to the LLM may not be the one we want. That is why it is important to narrow the information down well and add new content gradually, so that the solution builds up more and more conversational context (a small sketch of this narrowing step follows this list).
  • Fine-tuning: unavoidable in this type of solution. It consists of adjusting the solution so that, when asked about a specific topic, it searches for the answer in the resources you designate. For this, it is important that these projects are carried out hand in hand with the client, accompanying them throughout the process to refine the solution.
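Picking up the first block, here is a minimal sketch of narrowing the context: only passages whose similarity to the question clears a threshold are passed on, and everything else stays out of the prompt. The threshold value and the scored_retrieve() helper are assumptions made for the illustration.

MIN_SCORE = 0.35  # arbitrary cut-off, tuned per use case during fine-tuning

def scored_retrieve(question):
    """Placeholder for a retriever that returns (similarity, passage) pairs."""
    return [
        (0.82, "Refunds are processed within 14 days of the request."),
        (0.41, "The enterprise plan includes single sign-on and audit logs."),
        (0.12, "Company picnic photos from 2019."),  # irrelevant: should never reach the LLM
    ]

def narrowed_context(question, max_passages=3):
    """Keep only passages that are relevant enough, and cap how many are sent."""
    candidates = scored_retrieve(question)
    relevant = [p for score, p in candidates if score >= MIN_SCORE]
    return relevant[:max_passages]

print(narrowed_context("how long do refunds take"))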
7. Fine-tuning sounds interesting. How can Generative AI models be adjusted to give more precise responses in the desired tone?

There are several strategies for refining the responses offered by these types of solutions. If you agree, let's discuss three:

  • Prioritizing results: the user's question first goes to the RAG, which searches its information sources. The answer may be found in several RAG resources, and some of them are more important than others, provided the response matches the real intent of the question. It is therefore important to prioritize the results that are finally sent to the LLM to generate the answer (a small re-ranking sketch appears after the prompt example below).
  • Correctly narrowing down the information contained in the RAG, which also means excluding certain information that could end up "confusing" the chosen language model, and manually adding information to the RAG that answers questions which cut across the organization and not just the conversational context. For example, if we build a chatbot about elective courses for a university, we want to include not only information about the courses themselves, but also cross-cutting information such as the study method, contact details, or other particulars of the organization.
  • Modifying the question: the question we ask the AI is called the prompt, and the result can change depending on how it is written. These solutions work by sending the LLM a prompt that usually looks something like this:

prompt_template = """
The following is a friendly conversation between a human and an AI.
The AI is communicative and provides many specific details about its context.
If the AI doesn't know the answer to a question, it honestly replies that it doesn't know.

{context}

Instruction: Based on the previous documents, provide a detailed response to {question}. Respond with "information not found" if it is not present in the document. Solution:
"""

Although there are more, I consider these refinement actions essential before deploying this type of solution.

Once made available to users, it will be important to open a period of active monitoring and continue applying fine-tuning.

8. Tell me about the implementation.

Although there are more options, I will show you what the architecture would be like with AWS and what it would be like with Azure.

With AWS:

[AWS reference architecture diagram]

With Azure:

[Azure reference architecture diagram]

It is important to understand that, although these are not excessively complex architectures, you need in-depth knowledge of the services involved, because the whole subsequent fine-tuning process can become extremely complex.
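Purely as orientation, here is what a call into the AWS flavour of this pattern could look like, using Bedrock Knowledge Bases as the managed RAG; the region, knowledge base ID and model ARN are placeholders, and the exact request shape should be checked against the current boto3 documentation. The Azure side follows the same retrieve-then-generate idea with its own services.

import boto3

# Sketch of querying a managed RAG on AWS (Bedrock Knowledge Bases).
# Identifiers below are placeholders; verify names and fields in the AWS docs.
client = boto3.client("bedrock-agent-runtime", region_name="eu-west-1")

response = client.retrieve_and_generate(
    input={"text": "When can I enrol in an elective course?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB_ID_PLACEHOLDER",
            "modelArn": "arn:aws:bedrock:eu-west-1::foundation-model/MODEL_PLACEHOLDER",
        },
    },
)
print(response["output"]["text"])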

Well, although we could spend all day talking about this, I think we have given a good overview of this type of solution, so if you agree, we will leave it here. Thank you very much for the chat!

If you want to watch this interview with Carlos Narciso, you can access the video here.