Three practical architectures for exploiting Large Language Models

With the “boom” of artificial intelligence, specifically in the field of natural language processing, more people and companies are interested in use cases that can improve their business. In this article I try to give you some architectures, some easier to implement than others, to help you understand them and inspire you to build your own solutions.

Using a mathematical representation of words and semantic search, you can build a solution with a Large Language Model (LLM) in just a few steps to improve the relevance and precision of interactions with your own data.

What is an embedding?

To put it simply, an embedding is like a magical tool that translates any piece of text, whatever language it was written in, into a universal language. This universal language isn’t Spanish, English, or Chinese; it’s a mathematical language in which ideas, concepts, and words are represented as points in a high-dimensional space.

One of the most famous examples of how word embeddings can capture semantic relationships is the analogy “king” is to “queen” as “man” is to “woman.” In the vector space, the difference between the embeddings of “king” and “queen” closely resembles the difference between “man” and “woman.” This means that by understanding the relationship between “king” and “queen,” the model can infer a similar relationship between “man” and “woman,” demonstrating an understanding of gender roles and titles that goes beyond simple word associations.
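As a small, hedged illustration, the Python sketch below uses the sentence-transformers library (an assumed choice; any embedding model would do) to encode the four words and check that “king” − “man” + “woman” lands closer to “queen” than to an unrelated word. How cleanly this works varies from model to model, so treat it as a sketch rather than a guarantee.

```python
# A minimal sketch of the "king − man + woman ≈ queen" analogy.
# Assumes the sentence-transformers package; any embedding model could be swapped in.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedding model
king, queen, man, woman = model.encode(["king", "queen", "man", "woman"])

def cosine(a, b):
    """Cosine similarity: close to 1 means very similar direction, close to 0 means unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

analogy = king - man + woman                      # move from "king" along the man→woman direction
print(cosine(analogy, queen))                     # expected to be relatively high
print(cosine(analogy, model.encode("banana")))    # expected to be much lower
```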

Figure 1 – Distance in a 3D plane of embeddings representation. Photo credit: Jaron Collis on Quora

Another intriguing example of word embeddings’ ability to capture semantic relationships appears when comparing capital cities to their respective countries. For instance, the relationship between “Madrid” and “Spain” mirrors the relationship between “Rome” and “Italy”. In the vector space, the model can deduce that the difference between the embeddings for “Madrid” and “Spain” is similar to the difference between “Rome” and “Italy”, capturing the “capital city is to country” relationship. This example showcases the model’s ability to understand geographical hierarchies and the concept of capital cities in relation to their countries, further illustrating the breadth of knowledge that can be encoded within word embeddings.

Figure 2 – Distance of embedding representations of capitals/countries. Photo credit: Michigan AI Lab

Through these examples, word embeddings demonstrate their remarkable capability to encode and retrieve complex semantic relationships from a vast pool of knowledge.

1. Integrate embeddings with LLMs

So now that we understand what an embedding is, we can apply this concept to convert our own private data into embeddings. Because embeddings are numerical representations rather than raw text, they can help protect the privacy and security of our data and can support compliance with data protection regulations such as the GDPR and HIPAA.

Figure 3 – Semantic search diagram

Here, you can see an overview of how this solution works. First of all, we have to create a vector database to store our private information. To do that, we need to gather our documents, which can be plain text files, PDFs, audio files, etc., and split them into small pieces, commonly referred to as chunks. Once we have our information partitioned, the next step is to convert those partitions into embeddings and store them in a vector database.

After the data preprocessing is complete, you will have all the knowledge from your data indexed in a vector database, ready to be used.
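As a rough sketch of this preprocessing step, the snippet below chunks a plain-text document, embeds each chunk, and stores the vectors in a FAISS index; the chunk size, the embedding model, the file name, and the choice of FAISS over any other vector database are all assumptions made for illustration.

```python
# Sketch of the ingestion pipeline: chunk -> embed -> store in a vector index.
# Chunk size, embedding model, file name, and FAISS are illustrative choices, not requirements.
import faiss
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, chunk_size: int = 500) -> list[str]:
    """Naive fixed-size chunking; real pipelines often split on sentences or paragraphs."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [open("my_private_doc.txt").read()]    # hypothetical source document
chunks = [c for doc in documents for c in chunk_text(doc)]

embeddings = model.encode(chunks, convert_to_numpy=True)
index = faiss.IndexFlatL2(embeddings.shape[1])     # exact similarity search over the vectors
index.add(embeddings)                              # this index plays the role of our vector database
```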

The next stage involves handling user queries, which begins when a user asks a question, prompting the AI system to spring into action. The first step in this process is to convert the user’s question into an embedding. This vector embedding captures the essence of the question. Then, the system works its magic by taking the vector embedding of the question and comparing it, through an algorithm that measures similarity, with the vector embeddings stored in the vector database. The result is the identification of the top k vectors that are deemed most relevant to the user’s query vector.
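Continuing the sketch above, the query side could look like this: the question is embedded with the same model and compared against the index, and k is simply how many chunks we want back (the question text and k = 3 are made up for the example).

```python
# Sketch of query handling, continuing from the ingestion snippet above
# (reuses `model`, `index`, and `chunks`; the question and k are examples).
question = "How do I reset my account password?"   # hypothetical user question
query_vec = model.encode([question], convert_to_numpy=True)

k = 3                                              # how many similar chunks to retrieve
distances, ids = index.search(query_vec, k)        # similarity search in the vector index
context_chunks = [chunks[i] for i in ids[0]]       # the most relevant pieces of our own data
```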

The final step involves feeding the query, along with the context provided by the k similar vectors identified during the semantic search of the database, into a Large Language Model (LLM) such as Llama2 or GPT-4. Consequently, the LLM generates an answer to the query using the information retrieved from the vector database.
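The generation step could then look roughly like the snippet below, here using the OpenAI client as one possible backend; the model name, the prompt wording, and the reuse of `question` and `context_chunks` from the previous sketch are assumptions, and a locally hosted Llama2 would play the same role.

```python
# Sketch of the generation step: combine the retrieved context with the user's question.
# Assumes the openai package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()
prompt = (
    "Answer the question using only the context below.\n\n"
    "Context:\n" + "\n---\n".join(context_chunks) +
    "\n\nQuestion: " + question
)
response = client.chat.completions.create(
    model="gpt-4",                      # any capable LLM could be swapped in here
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```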

2. Fine-tune an existing LLM

This solution takes all the knowledge already present in a pre-trained LLM and further trains it on your own specialized data. Imagine teaching an already educated adult new skills specific to your company’s operations rather than training a child from scratch.

This process allows the LLM to adapt its understanding and outputs to be more relevant and accurate for specific tasks or industries. For companies that use their own concepts, products, or customer service rules, fine-tuning ensures the AI’s answers align well with their specific needs and language.

Figure 4 – Fine-tuning process diagram

As you can see in the diagram above, we perform the fine-tuning process on a pre-trained LLM, so the first step is to choose one to start from. I’m sure you have already heard about pre-trained LLMs like GPT-4 from OpenAI, BERT from Google, or Llama2 from Meta, but every model has its strong and weak points, so it is very important to select one that is suited to our needs.

Once we have selected our pre-trained model, we have to carefully select our domain-specific data so that it represents the specific use case well. The dataset could be anything from customer service transcripts to technical manuals. This dataset is used to train the model and also to evaluate how well the fine-tuned model performs. Through this process of teaching the model with our data, it learns to predict and generate text that closely matches the domain-specific dataset, gradually improving its ability to answer appropriately in real-world scenarios.
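To make this more concrete, here is a deliberately small sketch of such a training loop using the Hugging Face transformers Trainer; the base model, the two toy transcripts, and the hyperparameters are placeholders for illustration, not recommendations.

```python
# Minimal fine-tuning sketch with Hugging Face transformers.
# Model name, data, and hyperparameters are illustrative placeholders only.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "gpt2"  # stand-in for a larger pre-trained LLM such as Llama2
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Hypothetical domain-specific examples (e.g., customer service transcripts).
texts = [
    "Customer: My order arrived damaged. Agent: I'm sorry, we'll ship a replacement today.",
    "Customer: How do I change my plan? Agent: You can change it under Account > Billing.",
]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=256), batched=True
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-llm", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # after training, the model better matches our domain's style and content
```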

At the end of this fine-tuning process, we have a fine-tuned LLM ready to answer our users’ questions about what we taught it.

3. Using a structured or unstructured database

Many companies have structured databases, where they store information in a fixed format, making it easy for computers to understand thanks to field names and metadata that can contain descriptions of those fields or IDs relating tables to one another. On the other hand, we have unstructured databases, which are more flexible and can handle various types of data; the structure depends on whether it is an email, a social media post, or data from different sensors. When these types of databases are used in conjunction with LLMs, we can create a solution that allows users to ask questions about that information and provides responses that are not only intelligent but also highly customized to the user’s needs.

Figure 5 – Database in conjunction with LLM flow

This solution consists of two principal steps. The first one is to connect a database with all your information to an LLM. This involves setting up a system where the model can generate a query based on the user’s question, retrieve the information, and use that information to inform its responses. For example, if a user asks about the sales of the company for the current month for product X, the LLM can pull the data directly from the company’s sales database to provide a specific answer. This setup requires careful planning to ensure the model can understand the structure of the database and how to extract the right information.
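A hedged sketch of that flow is shown below: the LLM receives the table schema and the user’s question, generates a SQL query, the query runs against the database, and the returned rows are used to phrase the final answer. The schema, table name, file name, and prompts are invented for the example.

```python
# Sketch of connecting an LLM to a structured database (text-to-SQL style).
# Table name, schema, file name, and prompts are hypothetical examples.
import sqlite3
from openai import OpenAI

client = OpenAI()
conn = sqlite3.connect("company.db")   # hypothetical sales database

schema = "sales(product TEXT, amount REAL, sale_date TEXT)"
question = "What were the sales of product X this month?"

# 1. Ask the LLM to translate the question into SQL, given the schema.
sql = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": f"Schema: {schema}\nWrite a single SQLite query answering: {question}\nReturn only SQL."}],
).choices[0].message.content

# 2. Execute the generated query (a real system should validate it first).
rows = conn.execute(sql).fetchall()

# 3. Let the LLM phrase the final answer using the retrieved rows.
answer = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": f"Question: {question}\nQuery result: {rows}\nAnswer in one sentence."}],
).choices[0].message.content
print(answer)
```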

The technical integration is not just about incorporating a database but also about designing the interaction flow, often called a pipeline. After establishing that flow, this solution involves essential steps like defining how and when the AI should access the database, what kind of information it can request depending on the user, and how to incorporate the retrieved data into its answers. These situations are typically addressed by programming the model to recognize certain types of questions or requests and decide what it does and how it does it, as sketched below. The main goal is to create a system where the user feels they are conversing with a knowledgeable assistant, rather than triggering automated lookups to a database.
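Purely as an illustration of that kind of routing, one simple option is to first ask the model to classify the request and then branch on the result; the two categories and the wording are made up for the example.

```python
# Illustrative routing step: decide whether a user request needs a database lookup.
from openai import OpenAI

client = OpenAI()

def route(question: str) -> str:
    """Ask the LLM to classify the request; the two categories are made-up examples."""
    label = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"Classify this request as DATABASE or GENERAL:\n{question}"}],
    ).choices[0].message.content.strip().upper()
    return "database" if "DATABASE" in label else "general"

if route("How many units of product X did we sell in March?") == "database":
    ...  # run the text-to-SQL pipeline sketched above
else:
    ...  # answer directly with the LLM, without touching the database
```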

Conclusion

There are many variations of these architectures, including combinations of the three mentioned here and more. However, I have tried to make this article accessible to anyone interested in these technologies who wants to learn a bit more, without using too many technical terms and techniques.

Also, the insights and architectures discussed in this article are not just guidelines but, I hope, a source of inspiration for businesses to explore, adapt, and innovate with Artificial Intelligence, paving the way for a more intelligent, effective, and responsive future.

Thank you for reading! 😄