Reasoning-Augmented Generation (ReAG): Definition, Techniques, Applications, and Comparison with RAG

Reasoning-Augmented Generation (ReAG) is an emerging approach in AI that integrates a language model’s reasoning process directly into the content generation pipeline, especially for knowledge-intensive tasks. In a traditional Retrieval-Augmented Generation (RAG) setup, a query is answered in two stages: first retrieving documents (often via semantic similarity search), then generating an answer from those documents (ReAG: Reasoning-Augmented Generation – Superagent). While effective, this approach can fail to capture deeper contextual links: it may retrieve text that looks similar to the query but misses the information that actually matters. ReAG was introduced to overcome these limitations by skipping the separate retrieval step entirely. Instead of relying on pre-indexed snippets or surface-level matches, ReAG feeds raw source materials (e.g., full-text files, web pages, or spreadsheets) directly into a large language model (LLM), letting the model itself determine what information is useful and why (ReAG: Reasoning-Augmented Generation – Superagent). The LLM evaluates the content holistically and synthesizes an answer in one pass, effectively treating information retrieval as part of its reasoning process.

This approach marks a significant shift in how AI systems handle external knowledge. ReAG’s purpose is to make AI-generated answers more context-aware, accurate, and logically consistent by leveraging the LLM’s inferencing ability on the fly. The model can infer subtle connections across entire documents rather than being constrained to whatever a search index deems relevant. This is especially important in complex NLP tasks where the relevant answer may be implicit or spread across different sections of text. By aligning the process more closely with how a human researcher works (skimming sources, discarding irrelevancies, and focusing on meaningful details), ReAG aims to produce results that are not only factually grounded but also nuanced in understanding. In the context of modern AI, ReAG represents a move toward making generative models “think before they speak,” injecting a reasoning step that improves reliability and depth. It is significant in NLP and AI as a method to reduce hallucinations, keep up with dynamic knowledge, and ultimately generate outputs that better reflect real-world information and logical relations.

Implementation Details

ReAG analyzes a user question and scans the provided documents to extract only the information needed to answer it. It employs two distinct language models. The first evaluates document relevancy, examining each document individually against the question and returning a structured JSON output indicating whether the content is relevant. Relevant segments are then collected and passed to the second model, which generates a concise and contextually accurate response. This approach ensures answers are precise and contextually grounded by systematically filtering out irrelevant data before generating a response.

Python
# ------------------------------
# 1. Package Installation (if needed)
# ------------------------------
#!pip install langchain langchain_community pymupdf pypdf openai langchain_openai

# ------------------------------
# 2. Imports
# ------------------------------
import os
import concurrent.futures
from pydantic import BaseModel, Field
from typing import List
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import JsonOutputParser
from langchain_community.document_loaders import PyMuPDFLoader
from langchain.schema import Document
from langchain_core.prompts import PromptTemplate
# Load environment variables from a .env file.
from dotenv import load_dotenv

# ------------------------------
# 3. Environment and Model Initialization
# ------------------------------
load_dotenv()

# Set your OpenAI API key as an environment variable.
#os.environ["OPENAI_API_KEY"] = "sk-<your-openai-api-key>"

# Initialize the general language model for question-answering.
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
)

# Initialize a second language model specifically for assessing document relevancy.
llm_relevancy = ChatOpenAI(
    model="o3-mini",
    reasoning_effort="medium",
    max_tokens=3000,
)

# ------------------------------
# 4. Prompt Templates
# ------------------------------

# System prompt to guide the relevancy extraction process.
REAG_SYSTEM_PROMPT = """
# Role and Objective
You are an intelligent knowledge retrieval assistant. Your task is to analyze provided documents or URLs to extract the most relevant information for user queries.

# Instructions
1. Analyze the user's query carefully to identify key concepts and requirements.
2. Search through the provided sources for relevant information and output the relevant parts in the 'content' field.
3. If you cannot find the necessary information in the documents, return 'isIrrelevant: true', otherwise return 'isIrrelevant: false'.

# Constraints
- Do not make assumptions beyond available data
- Clearly indicate if relevant information is not found
- Maintain objectivity in source selection
"""

# Prompt template for the retrieval-augmented generation (RAG) chain.
rag_prompt = """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
"""

# ------------------------------
# 5. Schema Definitions and JSON Parser Setup
# ------------------------------

# Define a schema for the expected JSON response from the relevancy analysis.
class ResponseSchema(BaseModel):
    content: str = Field(..., description="The page content of the document that is relevant or sufficient to answer the question asked")
    reasoning: str = Field(..., description="The reasoning for selecting the page content with respect to the question asked")
    is_irrelevant: bool = Field(..., description="True if the document content is not sufficient or relevant to answer the question, otherwise False")

# Wrapper model for the relevancy response.
class RelevancySchemaMessage(BaseModel):
    source: ResponseSchema

# Create a JSON output parser using the defined schema.
relevancy_parser = JsonOutputParser(pydantic_object=RelevancySchemaMessage)

# ------------------------------
# 6. Helper Functions
# ------------------------------

# Format a Document into a human-readable string that includes metadata.
def format_doc(doc: Document) -> str:
    # Use .get so documents missing a title or page number don't raise KeyError.
    return (
        f"Document_Title: {doc.metadata.get('title', 'Unknown')}\n"
        f"Page: {doc.metadata.get('page', 'N/A')}\n"
        f"Content: {doc.page_content}"
    )

# Define a helper function to process a single document.
def process_doc(doc: Document, question: str):
    # Format the document details.
    formatted_document = format_doc(doc)
    # Combine the system prompt with the document details.
    system = f"{REAG_SYSTEM_PROMPT}\n\n# Available source\n\n{formatted_document}"
    # Create a prompt instructing the model to determine the relevancy.
    prompt = f"""Determine if the 'Available source' content supplied is sufficient and relevant to ANSWER the QUESTION asked.
    QUESTION: {question}
    #INSTRUCTIONS TO FOLLOW
    1. Analyze the context provided thoroughly to check its relevancy in helping formulate a response for the QUESTION asked.
    2. STRICTLY PROVIDE THE RESPONSE IN A JSON STRUCTURE AS DESCRIBED BELOW:
        ```json
           {{"content":<<The page content of the document that is relevant or sufficient to answer the question asked>>,
             "reasoning":<<The reasoning for selecting the page content with respect to the question asked>>,
             "is_irrelevant":<<Specify 'True' if the content in the document is not sufficient or relevant. Specify 'False' if the page content is sufficient to answer the QUESTION>>
             }}
        ```
     """
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": prompt},
    ]
    # Invoke the relevancy language model.
    response = llm_relevancy.invoke(messages)
    #print(response.content)  # Debug output to review model's response.
    # Parse the JSON response.
    formatted_response = relevancy_parser.parse(response.content)
    return formatted_response

# Extract relevant context from the provided documents given a question, using parallel execution.
def extract_relevant_context(question, documents):
    results = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Submit all document processing tasks concurrently.
        futures = [executor.submit(process_doc, doc, question) for doc in documents]
        for future in concurrent.futures.as_completed(futures):
            try:
                result = future.result()
                results.append(result)
            except Exception as e:
                print(f"Error processing document: {e}")
    # Collect content from documents that are relevant.
    final_context = [
        item['content']
        for item in results
        if str(item['is_irrelevant']).lower() == 'false'
    ]
    return final_context

# Generate the final answer using the RAG approach.
def generate_response(question, final_context):
    # Create the prompt using the provided question and the retrieved context.
    prompt = PromptTemplate(template=rag_prompt, input_variables=["question", "context"])
    # Chain the prompt with the general language model.
    chain = prompt | llm
    # Invoke the chain to get the answer.
    response = chain.invoke({"question": question, "context": final_context})
    # Keep only the final paragraph in case the model prefixes extra text.
    answer = response.content.split("\n\n")[-1]
    return answer

# ------------------------------
# 7. Main Execution Block
# ------------------------------
if __name__ == "__main__":
    # Load the document from the given PDF URL.
    file_path = "https://www.binasss.sa.cr/int23/8.pdf"
    loader = PyMuPDFLoader(file_path)
    docs = loader.load()
    print(f"Loaded {len(docs)} documents.")
    #print("Metadata of the first document:", docs[0].metadata)

    # Example 1: Answer the question "What is Fibromyalgia?"
    question1 = "What is Fibromyalgia?"
    context1 = extract_relevant_context(question1, docs)
    print(f"Extracted {len(context1)} relevant context segments for the first question.")
    answer1 = generate_response(question1, context1)

    # Print the results.
    print("\n\nQuestion 1:", question1)
    print("Answer to the first question:", answer1)

    # Example 2: Answer the question "What are the causes of Fibromyalgia?"
    question2 = "What are the causes of Fibromyalgia?"
    context2 = extract_relevant_context(question2, docs)
    answer2 = generate_response(question2, context2)
    
    # Print the results.
    print("\nQuestion 2:", question2)
    print("Answer to the second question:", answer2)

The is_irrelevant field is a boolean flag that explicitly indicates whether a particular document (or segment) contains sufficient and relevant information to answer the user’s question. When is_irrelevant is True, the analyzed document does not provide adequate context or relevant content, and it is excluded from the final response. Conversely, when it is False, the document does include valuable content that directly addresses the user’s query, and it is included in the context that informs the model’s final generated answer.
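Because the relevancy model returns JSON, is_irrelevant may arrive as a Python bool or as a string depending on how the output is serialized – which is why extract_relevant_context compares str(item['is_irrelevant']).lower(). A minimal sketch of the filtering step, using hypothetical parsed results rather than real model output:

```python
# Hypothetical parsed relevancy results (the real ones come from the LLM).
results = [
    {"content": "Fibromyalgia is a chronic pain condition.",
     "reasoning": "Defines the condition.", "is_irrelevant": False},
    {"content": "",
     "reasoning": "Page covers billing codes only.", "is_irrelevant": True},
    {"content": "Symptoms include widespread pain and fatigue.",
     "reasoning": "Describes symptoms.", "is_irrelevant": "false"},  # string form
]

# Keep only segments flagged as relevant; str(...) handles bool or string values.
final_context = [
    item["content"]
    for item in results
    if str(item["is_irrelevant"]).lower() == "false"
]

print(len(final_context))  # 2 segments survive the filter
```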

I’ve set up a GitHub repository filled with all the code you need! https://github.com/LawrenceTeixeira/ReAG

Here’s a link to a Google Colab notebook where you can test it yourself: https://colab.research.google.com/drive/1UvX7n3693wpdNPyeGkx3lvWmUEWR16LW?usp=sharing

Superagent has also developed a ReAG SDK that you can use, available on GitHub: https://github.com/superagent-ai/reag

I also wrote a small Python script, shown below, to test the SDK:

Python
"""
This module demonstrates how to use the ReagClient to perform a query on a set of documents.
It sets up an asynchronous client with the model "ollama/deepseek-r1:7b" and queries it with a document.
"""

import asyncio
from reag.client import ReagClient, Document

async def main():
    """
    Main asynchronous function that:
      - Initializes a ReagClient with specified model parameters.
      - Creates a list of Document instances to be used in the query.
      - Sends a query ("Deep Research?") along with the documents.
      - Prints the response received from the query.
    
    The ReagClient is configured to use:
      - model: "ollama/deepseek-r1:7b"
      - model_kwargs: {"api_base": "http://localhost:11434"}
    """
    # Create an asynchronous context for the ReagClient
    async with ReagClient(
        model="ollama/deepseek-r1:7b",
        model_kwargs={"api_base": "http://localhost:11434"}
    ) as client:
        
        # Define a list of documents to be used in the query.
        docs = [
            Document(
                name="Deep Research",
                content=(
                    "The Future of Research Workflows: AI Deep Research Agents Bridging "
                    "Proprietary and Open-Source Solutions."
                ),
                metadata={
                    "url": "https://lawrence.eti.br/2025/02/08/the-future-of-research-workflows-ai-deep-research-agents-bridging-proprietary-and-open-source-solutions/",
                    "source": "web",
                },
            ),
        ]
        
        # Perform the query using the client, passing in the document list.
        response = await client.query("Deep Research?", documents=docs)
        
        # Output the query response.
        print(response)

if __name__ == "__main__":
    # Run the main asynchronous function using asyncio's event loop.
    asyncio.run(main())

Applications of ReAG

ReAG’s ability to combine on-the-fly knowledge retrieval with reasoning makes it powerful for real-world applications. Below are several domains and scenarios where ReAG can be particularly impactful:

  • AI-Assisted Writing and Content Generation: Creative and technical writing can benefit from ReAG through AI co-pilots that draft text and pull in relevant information as they write. For example, consider a content writer preparing an article on climate change. A ReAG-powered assistant could accept the draft or outline of the article and automatically fetch full-text reports, scientific studies, and news articles related to each section. As the model generates paragraphs, it can reason about these source documents to include accurate facts or even direct quotes, all within the generation process. This leads to more factually grounded content. In practice, tools for bloggers or journalists could use ReAG to generate first drafts of articles that come with in-line citations to source material (much like a well-researched Wikipedia entry). This goes beyond typical AI writing (which might regurgitate generic knowledge) by ensuring the content is backed by specific, up-to-date references. It’s like having a built-in research assistant. For instance, an AI writing an essay about renewable energy might internally read recent energy reports and weave in data about solar capacity growth or policy changes, correctly attributing them (What Is Retrieval-Augmented Generation aka RAG | NVIDIA Blogs). Such a system reduces the time humans spend searching for information and checking accuracy, thereby speeding up content creation while preserving quality.
  • Decision Support and Analytical Systems: In enterprise settings – from finance to law to healthcare – decision-makers often query large volumes of documents to arrive at conclusions. ReAG can power decision-making assistants that, given a complex question, will comb through company financial reports, market analysis PDFs, or policy documents and produce a well-reasoned answer or recommendation. For instance, a financial analyst might ask, “What were the main factors affecting our Q4 profits according to our internal reports?” Instead of just keyword-matching “Q4 profits” in a database, a ReAG system would read through all the quarterly reports, earnings call transcripts, and relevant news, then synthesize a coherent summary (perhaps noting, “Raw materials costs increased by 20% (see ProcurementReport.pdf), and sales in Europe declined (see SalesAnalysis.xlsx)”, with those references embedded). The advantage here is that the AI isn’t limited to pre-tagged data; it can catch subtle points in text, like a discussion of an issue that doesn’t explicitly mention “profits” but is contextually critical. In the legal domain, a ReAG-driven assistant could read through case law and legal briefs to answer a query like, “On what grounds have courts typically ruled on X in the past decade?” providing an answer and excerpts from the judgments. This application shows how ReAG can assist in high-stakes decision-making by providing a form of automated due diligence: it reasons over the same raw materials a human expert would, potentially surfacing insights that a more straightforward search might overlook. Companies are exploring such AI for internal knowledge management – imagine asking your company’s AI assistant a strategic question, and it reads through all relevant memos, emails, and reports to give you a cogent answer with reasoning.
  • Scientific Research and Literature Review: In science and academia, the volume of literature is massive and growing daily. Researchers can use ReAG to perform literature reviews or answer scientific questions by reading multiple papers or articles and synthesizing findings. For example, a biomedical researcher might ask, “What are the recent advancements in mRNA vaccine delivery methods?” A ReAG system could retrieve dozens of recent research papers and conference proceedings (without needing a pre-built database of them), have the LLM analyze each in terms of relevance (perhaps it finds five papers that truly address new delivery mechanisms), extract key experimental results or conclusions from those papers, and then generate a summary of the advancements. Crucially, because the model reads full papers, it can connect ideas – maybe paper A introduces a novel nanoparticle carrier, and paper B discusses improved immune response with a certain formulation; the AI could correlate these and highlight that both improved stability and immune response are being targeted by new delivery methods. Such a comprehensive synthesis would be difficult for a keyword search system. We already see early versions of this in tools like Elicit or Semantic Scholar’s AI, which attempt to answer questions from papers – moving forward, adopting ReAG means these tools wouldn’t rely solely on title/abstract matching but parse the papers’ content. In broader scientific research, ReAG can assist in interdisciplinary queries, too (reading economics papers and sociology studies to answer a cross-domain question, for instance). Improving how AI handles citations and context could help draft survey articles or related work sections for papers, ensuring the content is up-to-date with the latest findings (since the model can be fed the latest publications directly).
  • Knowledge Management for Dynamic Domains: Some industries, like news media or regulatory compliance, deal with continuously changing information. ReAG shines in dynamic data scenarios because it doesn’t require re-indexing documents when they change – it processes whatever is current at query time. An application here is in media monitoring or real-time intelligence. Suppose an analyst needs to know, “How was country X mentioned in global news concerning renewable energy investments last week?” A ReAG-powered system could fetch all relevant news articles from that week (perhaps via an RSS feed or API), then let the LLM review each article, pick out pertinent mentions of country X and renewable projects, and generate a concise report. The benefit is that even if the way the news is phrased varies (one article might not explicitly say “investment” but talks about “funding a solar plant”), the LLM’s reasoning can catch the connection. Similarly, for compliance, an AI assistant could read all new regulations or policy documents and answer, “Did any new rule this week affect data privacy measures?” by reading the raw text of those regulations to see if they touch data privacy. This ability to adapt to the latest information without manual reprocessing is crucial in fast-paced fields.
  • Multi-Modal Data Analysis: While most current discussions of ReAG involve text, the concept can extend to other data types if the LLM or associated tools can handle them. For instance, if an LLM can interpret images or tables (with the help of vision models or parsing tools), ReAG could feed text, figures, or spreadsheets into the model (ReAG: Reasoning-Augmented Generation – Superagent). Imagine a business intelligence assistant that, given a query, looks at slide decks (with charts), PDFs (with tables), and text reports – all together – and reasons across them. Asked, “What were the key performance drivers this month according to all department reports?”, such an AI could extract a trend from a sales graph image, a number from a finance Excel table, and a statement from HR’s memo, and synthesize an answer combining all three modalities. While true multimodal ReAG is still cutting-edge, the foundation is being laid by multi-modal LLMs (like GPT-4’s vision features or PaLM-E). The significance is that ReAG is not limited to pure text; any information the model can represent can be reasoned over. Early demonstrations show promise in combining text and tables for better answers – for example, an AI assistant reading an academic paper’s text and its embedded chart to fully answer a question about the paper.

Advantages & Challenges of ReAG

Like any innovative approach, ReAG has powerful advantages and notable challenges. It’s important to understand both sides when evaluating ReAG for use in AI systems.

Advantages

  • Deeper Contextual Understanding: Because ReAG involves an LLM reading entire documents, it can capture nuances and indirect references that keyword-based retrieval might miss. The model considers the full context of each source, enabling answers that truly address the query’s intent. For complex or open-ended queries, ReAG is more likely to find the needle in the haystack – e.g., identifying a relevant paragraph buried in a long report even if it doesn’t use the exact phrasing of the question. This leads to more accurate and nuanced responses. The answers can incorporate subtle connections (as in the earlier polar bear example, where a document about sea ice was recognized as relevant to a question on polar bear decline because the model inferred the relationship). This holistic comprehension mirrors human-level analysis and often provides better topic coverage in the final answer.
  • Reduced Need for Complex Infrastructure: Traditional RAG pipelines involve many moving parts – document chunkers, embedding generators, vector databases, retriever algorithms, rerankers, etc. ReAG drastically simplifies the architecture by offloading most of this work to the LLM. There’s no need to maintain an external index or database of embeddings, which eliminates classes of bugs like indexing errors or stale data. For developers, fewer components mean easier maintenance and integration. You essentially need the LLM and a way to feed it data; this can accelerate the development of knowledge-driven features. As noted in one analysis, ReAG replaces brittle retrieval systems with a leaner process and thus avoids issues of embedding mismatch or vector search quirks, letting “users query raw documents without wrestling with vector databases” (ReAG: Reasoning-Augmented Generation  – Superagent). In short, ReAG trades system complexity for an almost brute-force but straightforward approach: let the model do it. This also simplifies updates – you just provide new documents to the model rather than re-encoding and re-indexing everything.
  • Timely and Up-to-date Information: ReAG inherently works with the latest available documents at query time, so it naturally handles dynamic knowledge updates better. In domains where information changes frequently (news, financial filings, scientific discoveries), ReAG can pull in the most recent data without extra overhead. Traditional RAG might require periodic reprocessing of a corpus to stay current, which, if not done, results in the model using outdated info. With ReAG, if a document exists, it can be considered in answering the question. This makes it appealing for applications like live question-answering, monitoring events, or any scenario where you want the AI’s knowledge base to be as fresh as your data. For example, an AI assistant for a medical journal could answer a question about a very recent study as soon as that study’s text is available without waiting for an indexing pipeline to run.
  • Improved Logical Consistency and Evidence Use: By structuring the task so the model must extract supporting content before answering, ReAG encourages the model to stick to the evidence and maintain logical consistency. The model’s intermediate reasoning steps (deciding relevance, pulling facts) act as a form of chain of thought that grounds the final output. This tends to reduce hallucinations and unsupported statements, one of the plagues of pure generative models. Techniques combining reasoning and retrieval (like the ReAct approach) have demonstrated significantly lower hallucination rates because the model “checks” itself against real data. ReAG falls in this category – since the answer is explicitly based on snippets from sources, the likelihood that it will introduce a completely unfounded claim is lower. Additionally, the final answers can be accompanied by references to the source documents (as is often done in RAG and equally possible in ReAG), which adds transparency. Users can be shown which document and passage backs up a given part of the answer, boosting trust. This explainability – the model can point to why it answered a certain way – is a direct benefit of the reasoning-centric design.
  • Handles Multi-Hop Queries and Indirect Relationships: ReAG is particularly powerful for queries that require synthesizing information from multiple sources or following a line of reasoning through different pieces of data. Because the LLM effectively performs a custom analysis on each document, it can find and stitch together pieces of information that a simple retrieval might not connect. For instance, a question might require info from Document A and Document B combined – a ReAG system can read both fully and notice the link. Traditional RAG might pull those documents, but the model would see them only in isolation, as provided in the context. In ReAG, the model can infer relationships (“Document A’s finding X could be related to Document B’s statement Y”) during its reasoning stage, leading to a more coherent multi-hop answer. This makes it well-suited for complex Q&A tasks and decision support where reasoning across sources is required.
  • Flexibility with Data Modalities: Another advantage is that ReAG doesn’t rely on a single uniform embedding space, so you can plug in various data types as long as the model (or an adjunct tool) can handle them. You could directly feed text OCR from images, transcripts from audio, or data from spreadsheets. An LLM with vision or table-parsing abilities could process those formats as part of the reasoning. This flexibility is harder to achieve in standard pipelines, which usually handle one modality at a time (or require separate indices per modality). In ReAG, the developer’s job is simply to get the raw data in front of the model. This opens the door to rich multi-modal question answering without elaborate multi-modal indexing. For example, without special-case code, a ReAG system could consider an image’s caption text and the text around it in a document to answer a question about the image’s content – it’s all just “document text” to the model.
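To make the “leaner process” point concrete, the entire ReAG loop can be sketched in a few lines. Here assess_relevance and answer are hypothetical stand-ins for the two LLM calls (not a specific library API), with toy functions substituted so the sketch runs:

```python
from typing import Callable, List

def reag_answer(
    question: str,
    documents: List[str],
    assess_relevance: Callable[[str, str], str],  # returns relevant text or "" per document
    answer: Callable[[str, str], str],            # generates the final response
) -> str:
    """Minimal ReAG loop: no chunker, embedder, vector store, or reranker."""
    # 1. Let the model read each raw document and extract what matters.
    extracted = [assess_relevance(question, doc) for doc in documents]
    # 2. Drop documents the model judged irrelevant (empty extraction).
    context = [c for c in extracted if c]
    # 3. Synthesize the final answer from the surviving evidence.
    return answer(question, "\n\n".join(context))

# Toy stand-ins so the sketch runs without an LLM:
docs = ["Sea ice is shrinking.", "Recipe for soup.", "Polar bears hunt on sea ice."]
extract = lambda q, d: d if "ice" in d else ""
synthesize = lambda q, ctx: f"Answer based on: {ctx}"
print(reag_answer("Why are polar bears declining?", docs, extract, synthesize))
```

The point of the sketch is architectural: the only state is the raw documents themselves, so “updating the index” is just passing in new files on the next call.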

Challenges of implementing ReAG

  • High Computational Cost: The most cited drawback of ReAG is that it is computationally and financially expensive relative to traditional methods. Having a large language model read every document for every query is a heavy lift. If you have 100 documents and ask one question, the model might be invoked 101 times (100 for analysis + 1 for synthesis). In contrast, using pre-computed embeddings, a vector database could retrieve likely relevant chunks in milliseconds. For example, analyzing 100 research papers via ReAG means 100 separate LLM calls, whereas RAG might scan an index almost instantly to pull a few passages (ReAG: Reasoning-Augmented Generation – Superagent). This difference can translate to significant cost (if using paid API calls) and latency. Even with parallelization, the total compute is proportional to the number of documents times the cost per model inference. For large deployments, this doesn’t scale well: running ReAG on very large corpora (thousands or millions of documents) is currently impractical without introducing some shortcuts. The cost challenge is expected to ease over time as model efficiency improves – cheaper open-source models on powerful hardware or model compression techniques (quantization, distillation) can lower per-call cost (ReAG: Reasoning-Augmented Generation – Superagent). But for now, cost remains a barrier: developers must carefully decide when the improved answer quality is worth the extra compute. Some may opt to enable ReAG only for queries that truly need it (complex ones), while using cheaper retrieval for more straightforward questions.
  • Slower Response Time: Tied to cost is the issue of speed. Even if run in parallel, feeding and processing large documents has inherent latency. If each document takes one second for the model to process (which might be conservative depending on model size and document length), 100 documents in parallel still take roughly one second plus overhead – much slower than a typical search-engine lookup. For interactive applications like chatbots, users may notice this delay if the system is connected to many documents, which could be a drawback in time-sensitive scenarios (like real-time assistance). As the dataset grows, ReAG struggles: “Even with parallelization, ReAG [can] suffer with massive datasets. If you need real-time answers across millions of documents, a hybrid approach might work better, using RAG for initial filtering and ReAG for final analysis.” This highlights that pure ReAG doesn’t scale smoothly to huge data environments where sub-second retrieval is expected; a compromise is needed. Caching can alleviate this to an extent (e.g., if the same document is asked about repeatedly, one could cache its extracted summary), but caching is harder here than in RAG because what is extracted is query-dependent.
  • Context Window Limitations: While ReAG leverages large context windows, it’s still bounded by them. The final answer generation step might hit context limits if the relevant information is spread across too many documents or if individual documents are very large. Current frontier models offer context windows on the order of 32K–100K tokens (e.g., GPT-4 32K or Claude 100K), with experimental work pushing toward 1M (NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?). These are huge but not infinite – if a query truly needs content from dozens of lengthy documents, the model might not be able to consider all of them simultaneously when formulating an answer. This could force the system to drop some less relevant snippets or summarize them further, potentially losing detail. Moreover, the document analysis step itself is limited by context: if a single document exceeds what the model can read at once, you have to chunk it and possibly lose some cross-chunk reasoning. That reintroduces some of the chunking problems ReAG set out to avoid (though within one document, one could chunk with overlap or intelligent sectioning). Until models can handle arbitrarily long text (or a clever sliding-window approach is standardized), ReAG may face challenges with extremely large inputs.
  • Model Reliability and Hallucination: ReAG generally reduces hallucinations by grounding in documents, but it is not a panacea. The approach still relies heavily on the LLM’s judgment. If the model isn’t well-aligned or misinterprets instructions, it might flag an irrelevant document as relevant (picking up on a false cue) or vice versa. It might extract a passage that it thinks answers the question but is not fully correct or is taken out of context. During the final synthesis, there is also a risk that the model introduces information that “bridges” gaps in the sources but isn’t actually present (a form of hallucination). For example, if none of the documents explicitly state an answer, the model might try to deduce one and state it confidently, which could be wrong. In RAG, the separation of retrieval and generation sometimes makes it easier to spot when the model goes out of bounds (since you only give it certain passages; if it says something unrelated, you know it’s hallucinating). In ReAG, the model has more freedom, which is power but also risk. Tight prompting (e.g., instructing “only use the given content”) is important but not foolproof. Therefore, quality control remains challenging – one may need additional verification steps or a human in the loop for critical applications.
  • Scalability and Maintenance: A pure ReAG approach may become unwieldy for large knowledge bases. If an organization has a million documents, running them through an LLM for every query is simply not feasible. This leads to the likely need for hybrid systems, where some preprocessing or lightweight retrieval narrows the scope before using ReAG. Designing such hybrid systems introduces complexity that partly negates the advantage of simplicity. It becomes a challenge to find the optimal balance: too much pre-filtering might reintroduce the risk of missing relevant info (the very thing ReAG is meant to avoid), while too little makes it slow. Maintenance-wise, while it’s nice not to manage an index, one does have to maintain the prompt configurations and possibly update them as the model changes. If you switch to a new model with a different style, you might need to adjust how you extract content or instruct it to reason.
  • Resource Requirements: Because ReAG often requires running large models many times, it demands robust computational resources (GPUs, memory for significant contexts, etc.). Implementing ReAG at scale could be prohibitive for organizations without access to these. Even with cloud APIs, hitting rate or budget limits could be a concern. In contrast, a well-optimized vector search + a smaller model might run on a single server. Thus, adopting ReAG might necessitate an investment in higher-end AI infrastructure.
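The hybrid approach quoted above (RAG for initial filtering, ReAG for final analysis) can be sketched in a few lines. Everything here is illustrative: `embed`, `vector_index`, and `llm` are stand-ins for whatever embedding model, vector store, and chat-completion client you actually use, and the prompt wording and the `IRRELEVANT` convention are assumptions for the sketch, not part of any official API.

```python
# Hypothetical hybrid pipeline: a cheap RAG-style vector search narrows
# millions of documents to a shortlist, then ReAG-style full-document
# reasoning runs only on that shortlist.

def hybrid_answer(query, vector_index, embed, llm, shortlist_size=5):
    # Stage 1 (RAG-style): fast similarity search for candidate documents.
    candidates = vector_index.search(embed(query), top_k=shortlist_size)

    # Stage 2 (ReAG-style): the LLM reads each full candidate document,
    # judges relevance itself, and extracts the useful passages.
    extracts = []
    for doc in candidates:
        verdict = llm(
            f"Question: {query}\n\nDocument:\n{doc.text}\n\n"
            "Is this document relevant to the question? If yes, quote the "
            "passages that help answer it; if no, reply IRRELEVANT."
        )
        if "IRRELEVANT" not in verdict:
            extracts.append(verdict)

    # Final synthesis over the extracted evidence only.
    return llm(
        f"Question: {query}\n\nEvidence:\n" + "\n---\n".join(extracts) +
        "\n\nAnswer using only the evidence above."
    )
```

The design point is that stage 1 keeps latency and cost bounded as the corpus grows, while stage 2 preserves ReAG's deep reading on the documents that matter.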

Comparisons with Other Generative AI Models

ReAG introduces a distinct paradigm, and it’s useful to compare it with other prominent models and approaches in the generative AI landscape, namely standard large language models like GPT and T5, and the traditional Retrieval-Augmented Generation (RAG) pipeline. Below, we outline the key differences and characteristics:

  • Versus GPT (Generative Pre-trained Transformers): GPT models (such as OpenAI’s GPT-3 and GPT-4) are examples of large language models trained on broad internet text and can generate fluent responses. By themselves, GPT models do not use external documents at query time – they rely on the knowledge stored in their model parameters. GPT can answer based on what it remembers from training data but cannot fetch new information post-training. In practice, GPT-4 has impressive reasoning ability and can follow instructions, but if you ask it about very recent events or obscure facts not in its training set, it may fabricate answers (a.k.a. hallucinate) ([2005.11401] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks). ReAG addresses this limitation by always grounding answers in provided sources, effectively extending GPT’s capabilities with a reasoning-based retrieval of actual data. Another difference is pipeline complexity: GPT is straightforward (prompt in, completion out), whereas ReAG orchestrates multiple GPT (or similar LLM) calls plus logic to manage documents. In essence, ReAG can be seen as using GPT more smartly. It’s not a new model architecture but a methodology on top of models like GPT. In terms of output, a well-executed ReAG system will often be more factual and specific than a vanilla GPT because it has the relevant text on hand. GPT might give a very fluent answer drawing from its general knowledge, but that answer might miss recent details or specific figures, which ReAG could include by having read a source.
    On the other hand, GPT alone is typically faster and cheaper per query since it’s just one model run. Use-case distinction: GPT (without retrieval) is good for general-purpose tasks, creative writing, or well-known domains of knowledge; ReAG shines when up-to-date or source-specific information is needed with high fidelity. It’s worth noting that one can combine GPT with retrieval (that essentially becomes a RAG system). ReAG goes a step further – rather than retrieving small bits for GPT, it makes GPT (or any LLM) do the retrieval reasoning. So one might say ReAG is not competing against GPT, but rather leveraging GPT differently. For example, you could use GPT-4 as the engine inside a ReAG pipeline.
  • Versus T5 (Text-to-Text Transfer Transformer): T5 is another language model (from Google, introduced by Colin Raffel et al.) that treats every NLP task as a text-to-text problem. Like GPT, the base T5 model does not incorporate external data at inference time unless augmented. T5 (especially in large versions or variants like Flan-T5, which is instruction-tuned) can also generate and even provide some reasoning when prompted. However, T5’s knowledge is limited to its training data (e.g., up to 2019 for the original T5). Using T5 for question answering often required fine-tuning on task-specific data or using it as the generator in a RAG setup ([2005.11401] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks). Indeed, the original RAG paper used a sequence-to-sequence model (comparable to T5 or BART) as the parametric component ([2005.11401] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks). ReAG, by contrast, can use a model like T5 in a zero-shot way to answer questions about new documents by feeding the documents to T5 along with the query. One could imagine an implementation where Flan-T5 is prompted to perform the ReAG steps (it might not be as capable as GPT-4, but the method is the same). Key differences: T5 has an encoder-decoder architecture, which may handle long inputs differently (the encoder can read a lot of text, but there is still a limit), while GPT is decoder-only with large context windows in newer versions. From a user perspective, GPT and T5 without retrieval are similar – neither actively fetches external info. Thus, ReAG’s contrast with T5 is analogous to its contrast with GPT: ReAG ensures external knowledge integration via reasoning, whereas T5 alone would be stuck with static knowledge or require explicit retrieval. Compared to ReAG, GPT and T5 may generate less logically consistent answers on complex knowledge tasks, since they lack a built-in mechanism to verify against sources. For example, an un-augmented GPT/T5 might produce a plausible-sounding but inconsistent or partially incorrect answer to a tricky multi-part question, whereas ReAG would attempt to validate each part by reading the documents. Another point: T5 was designed to be fine-tuned on specific tasks, whereas ReAG is a prompting strategy that can work in a zero-shot or few-shot manner. This makes ReAG inherently more flexible – it doesn’t require training the model to use retrieval; it uses prompting to achieve the effect, and it is relatively model-agnostic (you could use GPT-4, T5, Llama-2, etc., as long as the model is strong at comprehension).
  • Versus RAG (Retrieval-Augmented Generation): RAG is the most direct predecessor to ReAG. In RAG systems (as formulated by Lewis et al., 2020), an external retriever (often a dense vector retriever using embeddings) fetches a few relevant text passages from a large corpus, and those passages are then given to the generative model to compose an answer. The key difference is how the relevant information is obtained: RAG relies on similarity search (the “court clerk” fetching documents by keywords, in an analogy (What Is Retrieval-Augmented Generation aka RAG | NVIDIA Blogs)), whereas ReAG relies on the LLM’s reasoning to evaluate content (acting like a “scholar” who reads everything and underlines the useful parts). As a result, ReAG can overcome some RAG limitations. RAG might miss relevant documents that don’t share obvious vocabulary with the query.
    In contrast, ReAG can catch those because the model reads the content and can infer relevance (for example, identifying a study about “lung disease trends” as relevant to air pollution impacts, which a pure semantic search might skip). Also, RAG’s retrieved chunks are often limited in size (to fit into the model’s input), which can lead to missing context (the “lost in the middle” problem, where crucial info isn’t in any single chunk). ReAG avoids this by letting the model see whole documents, preserving context and reducing the chance of overlooking middle sections. However, RAG has strengths in efficiency: retrieving vectors plus a quick generation is usually faster and cheaper than reading everything. Another difference is system complexity: RAG requires maintaining an index and sometimes training a retriever model; ReAG simplifies that setup at the expense of runtime complexity. There is also a difference in how answers are generated: in RAG, once the passages are retrieved, the model generates an answer (possibly using attention over those passages); in ReAG, the generation is tightly coupled with the reasoning that identified the passages. You can think of ReAG as doing much of what RAG does, but implicitly, through the model’s internal work rather than explicit external steps.
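The chunking problem mentioned above is easy to demonstrate concretely. This toy Python snippet (with a made-up two-sentence document) shows how a fact whose pieces straddle a chunk boundary is invisible to every individual chunk, while a reader of the full document, as in ReAG, still sees it:

```python
# Toy illustration of the chunking problem: fixed-size chunks can split a
# fact across a boundary so that no single retrieved chunk contains it.

def chunk(text, size):
    return [text[i:i + size] for i in range(0, len(text), size)]

doc = ("The project lead is Dana Reyes. "
       "She approved the budget increase in March.")

chunks = chunk(doc, 40)

# No single chunk links the name to the approval...
assert not any("Dana" in c and "approved" in c for c in chunks)
# ...but the full document does.
assert "Dana" in doc and "approved" in doc
```

Overlapping chunks mitigate this for nearby sentences, but not when the linked facts sit paragraphs apart, which is exactly the case ReAG's whole-document reading handles.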

The following table summarizes some of the key differences between a standard LLM (like GPT-4 or T5), a classic RAG pipeline, and the ReAG approach:

| Approach | External Knowledge Usage | Mechanism for Retrieval/Integration | Strengths | Limitations |
|---|---|---|---|---|
| GPT/T5 (Base LLM) | No external data at inference time; relies on parametric knowledge | N/A – directly generates from the prompt and its internal knowledge | Fluent, general-purpose generation; fast response (single step); no knowledge-base setup needed | Knowledge may be outdated or incomplete; prone to factual errors/hallucinations on specific data ([2005.11401] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks); cannot cite sources or update knowledge without retraining |
| RAG (Retrieval-Augmented Generation) | Yes, via a retriever (e.g., search index or vector DB) | Two-step: retrieve relevant text chunks (via embeddings or keyword search), then feed those chunks into the LLM for answer generation | Can provide up-to-date, specific info; more factual and can cite sources (What Is Retrieval-Augmented Generation aka RAG | NVIDIA Blogs); scales to large corpora | Similarity search may miss documents that don’t share surface vocabulary; chunking can lose surrounding context; an index must be built and maintained |
| ReAG (Reasoning-Augmented Generation) | Yes, raw documents are fed directly to the LLM | Unified reasoning+generation: the LLM reads full documents, determines relevance and extracts key info, then synthesizes the answer in one workflow | Deep understanding of context (reads whole docs); can catch subtle or indirect evidence (model infers relevance); simplified architecture (no separate search index); answers reflect nuanced details of sources | High computational cost (many LLM calls); slower on large document sets; requires large context windows and careful prompt management; the model’s reasoning must be trusted (mistakes are hard to debug) |

Table: Comparison of Standard LLM vs. Retrieval-Augmented Generation (RAG) vs. Reasoning-Augmented Generation (ReAG).

To summarize the differences in workflow and design, the following table contrasts RAG and ReAG on key architectural aspects:

| Aspect | Retrieval-Augmented Generation (RAG) | Reasoning-Augmented Generation (ReAG) |
|---|---|---|
| Knowledge Access | Indirect: pulls pre-indexed chunks from a vector store via similarity search | Direct: raw documents are fed to the LLM at query time |
| Data Preparation | Requires preprocessing: documents are chunked and indexed in a vector database with embeddings | No indexing needed; any new document can be fed in real time without reprocessing |
| Context Scope | LLM sees only the retrieved passages (a partial view of each document) and may miss cross-passage context if the information is split across chunks | LLM reads whole documents, preserving full context |
| Pipeline Complexity | More moving parts: embedding model, vector database, retriever, and generator | A simplified architecture with fewer components; relies mainly on the LLM’s reasoning loop |
| Scalability | Highly scalable to large corpora (millions of docs), since retrieval is fast and independent of LLM size | Scales poorly on its own; cost and latency grow with the number of documents processed per query |
| Data Freshness | Index must be rebuilt or updated as source data changes | Always works from up-to-date sources, since documents are read fresh at query time |
| Relevance Criterion | Similarity: retrieval can return “similar chunks” instead of relevant info, relying on surface-level matches | Relevance is judged by the LLM’s reasoning over the actual content |
| Supported Modalities | Primarily text (structured retrieval of text); handling images or tables requires separate pipelines or embeddings per modality | The LLM can interpret any modality it supports; text, tables, or images can be included directly if using a multimodal LLM |

Table 1: Architectural comparison of RAG vs. ReAG. RAG uses explicit retrieval (with embeddings and vector search) to provide the LLM with relevant snippets. In contrast, ReAG leverages the LLM to evaluate full documents and extract relevant information through reasoning. These differences lead to distinct trade-offs in system design and capabilities.

Conclusion

The future of ReAG looks bright, with many complementary developments addressing its current challenges and expanding its capabilities. A fitting summary from the Superagent blog: “ReAG isn’t about replacing RAG—it’s about rethinking how language models interact with knowledge.” This rethinking is an ongoing process. We will likely see ReAG evolve from a novel approach into a standard practice for building AI systems that require extensive information. As AI researchers often find, ideas that start out separate (retrieval vs. reasoning) eventually merge into unified systems for efficiency and performance. ReAG is a step in that direction – unifying retrieval with reasoning. The “holy grail” would be models that inherently know when and how to retrieve information and how to reason about it, all as part of their learned behavior; each of these advances moves us closer. In practical terms, we can expect future AI assistants to be far more adept at handling complex, information-rich queries, providing correct answers, and clearly explaining the thought process and sources behind them. In a world increasingly saturated with data, such reasoning-augmented AI will be invaluable for making sense of it all.

That’s it for today!

Sources:

ReAG: Reasoning-Augmented Generation  – Superagent

GitHub – superagent-ai/reag: Reasoning Augmented Generation

[2005.11401] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

What Is Retrieval-Augmented Generation aka RAG | NVIDIA Blog

NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?

https://medium.com/nerd-for-tech/fixing-rag-with-reasoning-augmented-generation-919939045789

Open WebUI and Free Chatbot AI: Empowering Corporations with Private Offline AI and LLM Capabilities

Artificial intelligence (AI) is reshaping how corporations function and interact with data in today’s digital landscape. However, with AI comes the challenge of securing corporate information and ensuring data privacy—especially when dealing with Large Language Models (LLMs). Public cloud-based AI services may expose sensitive data to third parties, making corporations wary of deploying models on external servers.

Open WebUI addresses this issue head-on by offering a self-hosted, offline, and highly extensible platform for deploying and interacting with LLMs. Built to run entirely offline, Open WebUI provides corporations with complete control over their AI models, ensuring data security, privacy, and compliance.

What is Open WebUI?

Open WebUI is a versatile, feature-rich, and user-friendly web interface for interacting with Large Language Models (LLMs). Initially launched as Ollama WebUI, Open WebUI is a community-driven, open-source platform enabling businesses, developers, and researchers to deploy, manage, and interact with AI models offline.

Open WebUI is designed to be extensible, supporting multiple LLM runners and integrating with different AI frameworks. Its clean, intuitive interface mimics popular platforms like ChatGPT, making it easy for users to communicate with AI models while maintaining full control over their data. By allowing businesses to self-host the web interface, Open WebUI ensures that no data leaves the corporate environment, which is crucial for organizations concerned with data privacy, security, and regulatory compliance.

Key Features of Open WebUI

1. Self-hosted and Offline Operation

Open WebUI is built to run in a self-hosted environment, ensuring that all data remains within your organization’s infrastructure. This feature is critical for companies handling sensitive information and those in regulated industries where external data transfers are a risk.

2. Extensibility and Model Support

Open WebUI supports various LLM runners, allowing businesses to deploy the language models that best meet their needs. This flexibility enables integration with custom models, including OpenAI-compatible APIs and models such as Ollama, GPT, and others. Users can also seamlessly switch between different models in real time to suit diverse use cases.

3. User-Friendly Interface

Designed to be intuitive and easy to use, Open WebUI features a ChatGPT-style interface that allows users to communicate with language models via a web browser. This makes it ideal for corporate teams who may not have a deep technical background but need to interact with LLMs for business insights, automation, or customer support.

4. Docker-Based Deployment

To ensure ease of setup and management, Open WebUI runs inside a Docker container. This provides an isolated environment, making it easier to deploy and maintain while ensuring compatibility across different systems. With Docker, corporations can manage their AI models and interfaces without disrupting their existing infrastructure.

5. Role-Based Access Control (RBAC)

To maintain security, Open WebUI offers granular user permissions through RBAC. Administrators can control who has access to specific models, tools, and settings, ensuring that only authorized personnel can interact with sensitive AI models.

6. Multi-Model Support

Open WebUI allows for concurrent utilization of multiple models, enabling organizations to harness the unique capabilities of different models in parallel. This is especially useful for businesses requiring a range of AI solutions from simple chat interactions to advanced language processing tasks.

7. Markdown and LaTeX Support

For enriched interaction, Open WebUI includes full support for Markdown and LaTeX, making it easier for users to create structured documents, write reports, and interact with AI using precise formatting and mathematical notation.

8. Retrieval-Augmented Generation (RAG)

Open WebUI integrates RAG technology, which allows users to feed documents into the AI environment and interact with them through chat. This feature enhances document analysis by enabling users to ask specific questions and retrieve document-based answers.

9. Custom Pipelines and Plugin Framework

The platform supports a highly modular plugin framework that allows businesses to create and integrate custom pipelines, tailor-made to their specific AI workflows. This enables the addition of specialized logic, ranging from AI agents to integration with third-party services, directly within the web UI.

10. Real-Time Multi-Language Support

For global organizations, Open WebUI offers multilingual support, enabling interaction with LLMs in various languages. This feature ensures that businesses can deploy AI solutions for different regions, enhancing both internal communication and customer-facing AI tools.

What Can Open WebUI Do?

Open WebUI Community

You can find good examples of models, prompts, tools, and functions at the Open WebUI Community.

Inside Open WebUI, as an admin you can configure many useful things in the Workspaces section. The possibilities here are nearly unlimited.

Why Corporations Should Consider Open WebUI

As businesses adopt AI to streamline operations and enhance decision-making, the need for secure, private, and controlled solutions is paramount. Open WebUI offers corporations the following distinct advantages:

1. Data Privacy and Compliance

By allowing organizations to run their AI models offline, Open WebUI ensures that no data leaves the corporate environment. This eliminates the risk of data exposure associated with cloud-based AI services. It also helps businesses stay compliant with data protection regulations such as GDPR, HIPAA, or CCPA.

2. Flexibility and Customization

Open WebUI’s extensibility makes it a highly flexible tool for enterprises. Businesses can integrate custom AI models, adapt the platform to meet unique needs, and deploy models specific to their industry or use case.

3. Cost Savings

For enterprises that require frequent AI model interactions, a self-hosted solution like Open WebUI can result in significant cost savings compared to paying for cloud-based API usage. Over time, this can reduce the operational cost of AI adoption.

4. Improved Control Over AI Systems

With Open WebUI, corporations have complete control over how their AI models are deployed, managed, and utilized. This includes controlling access, managing updates, and ensuring that AI models are used in compliance with corporate policies.

5. Azure OpenAI Support

Azure OpenAI Service ensures data privacy by not sharing your data with other customers or using it to improve models without your permission. It includes integrated content filtering to protect against harmful inputs and outputs, adheres to strict regulatory standards, and provides enterprise-grade security. Additionally, it features abuse monitoring to maintain safe and responsible AI use, making it a reliable choice for businesses prioritizing safety and privacy.

Installation and Setup

Getting started with Open WebUI is straightforward. Here are the basic steps:

1. Install Docker

Docker is required to deploy Open WebUI. If Docker isn’t already installed, it can be easily set up on your system. Docker provides an isolated environment to run applications, ensuring compatibility and security.

2. Launch Open WebUI

Using Docker, you can pull the Open WebUI image and start a container. The Docker command will depend on whether you are running the language model locally or connecting to a remote server.

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

3. Create an Admin Account

Once the web UI is running, the first user to sign up will be granted administrator privileges. This account will have comprehensive control over the web interface and the language models.

4. Connect to Language Models

You can configure Open WebUI to connect with various LLMs, including OpenAI or Ollama models. This can be done via the web UI settings, where you can specify API keys or server URLs for remote model access.
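Once connected, you can also talk to Open WebUI programmatically. Recent versions expose an OpenAI-compatible chat endpoint (commonly `/api/chat/completions`, authenticated with an API key created in the UI). The sketch below only builds the request; the URL, path, and model name are assumptions to verify against your own instance's documentation.

```python
# Build an OpenAI-style chat request for a local Open WebUI instance.
# BASE_URL, the endpoint path, and the model name are illustrative.
import json

BASE_URL = "http://localhost:3000"        # where the Docker container is mapped
API_KEY = "YOUR-OPEN-WEBUI-API-KEY"       # generated under the account settings

def build_chat_request(model, user_message):
    """Return the URL, headers, and OpenAI-style JSON body for a chat call."""
    url = f"{BASE_URL}/api/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return url, headers, json.dumps(body)

url, headers, payload = build_chat_request("llama3", "Summarize our Q3 report.")
# To send it, e.g.: requests.post(url, headers=headers, data=payload)
```

Because the body follows the OpenAI chat format, existing OpenAI client code can often be pointed at an Open WebUI instance with only a base-URL change.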

There are many other ways to deploy Open WebUI; you can explore them at this link.

Run AI Models Locally: Ollama Tutorial (Step-by-Step Guide + WebUI)

Open WebUI – Tutorial & Windows Install 

Free Chatbot AI: Easy Access to Open WebUI for Corporations

To make Open WebUI even more accessible, I have deployed a version called Free Chatbot AI. This platform serves as an easy-access solution for businesses and users who want to experience the power of Open WebUI without the need for complex setup or hosting infrastructure. Free Chatbot AI offers a user-friendly interface where users can interact with Large Language Models (LLMs) in real time, all while maintaining the key benefits of privacy and control.

Key Benefits of Free Chatbot AI for Corporations:
  1. Instant Access: Free Chatbot AI is pre-configured and hosted, allowing companies to quickly test and use AI models without worrying about setup or technical configurations.
  2. Data Privacy: Like the self-hosted version of Open WebUI, Free Chatbot AI ensures that sensitive information is protected. No data is sent to third-party servers, ensuring that interactions remain private and secure.
  3. Flexible Deployment: While Free Chatbot AI is an accessible hosted version, it also offers corporations the ability to experiment with LLMs before committing to a self-hosted deployment. This is perfect for businesses looking to try out AI capabilities before taking full control of their AI infrastructure.
  4. User-Friendly Interface: Built with a simple and intuitive design, Free Chatbot AI mirrors the same ease of use as Open WebUI. This makes it suitable for teams across the organization, from technical users to non-technical departments like customer support or HR, enhancing workflows with AI-powered insights and automation.
  5. No Setup Required: Free Chatbot AI eliminates the need for complex setup processes. Corporations can access the platform directly and begin leveraging the power of AI for their business operations immediately.
Use Cases for Free Chatbot AI:
  • Internal Team Collaboration: Free Chatbot AI enables teams to quickly interact with LLMs to generate ideas, draft content, or automate repetitive tasks such as writing summaries and answering FAQs.
  • AI-Assisted Customer Support: Businesses can test Free Chatbot AI to power customer support bots that deliver accurate, conversational responses to customer queries, all while maintaining data security.
  • Document Processing and Summarization: Teams can upload documents and let Free Chatbot AI generate summaries, extracting relevant information with ease, improving efficiency in knowledge management and decision-making.
How to access Free Chatbot AI?

First, click on this link and you have to create an account by clicking on Sign up.

Fill the fields below and click on Create Account.

After that, you have to select one of the models and have fun!

This is the home page.

You can create images by clicking on Image Gen.

You can type a prompt like “photorealistic image taken with Nikon Z50, 18mm lens, a vast and untouched wilderness, with a winding river flowing through a dense forest, showcasing the pristine beauty of untouched nature, aspect ratio 16:9”.

There are many more options to explore in Free Chatbot AI. Have fun, and good luck!

Conclusion

As AI becomes increasingly integral to business operations, ensuring data privacy and control has never been more important. Open WebUI offers corporations a secure, customizable, and user-friendly platform to deploy and interact with Large Language Models, entirely offline. With its range of features, from role-based access to multi-model support and flexible integrations, Open WebUI is the ideal solution for businesses looking to adopt AI while maintaining full control over their data and processes.

For companies aiming to harness the power of AI while ensuring compliance with industry regulations, Open WebUI is a game-changer, offering the perfect balance between innovation and security.

If you have any doubts about how to implement it in your company you can contact me at this link.

That’s it for today!

Sources

https://docs.openwebui.com

https://medium.com/@omargohan/open-webui-the-llm-web-ui-66f47d530107

https://medium.com/free-or-open-source-software/open-webui-how-to-build-and-run-locally-with-nodejs-8155c51bcb55

https://openwebui.com/#open-webui-community

Integrating Azure OpenAI with Native Vector Support in Azure SQL Databases for Advanced Search Capabilities and Data Insights

Azure SQL Database has taken a significant step forward by introducing native support for vectors, unlocking advanced capabilities for applications that rely on semantic search, AI, and machine learning. By integrating vector search into Azure SQL, developers can now store, search, and analyze vector data directly alongside traditional SQL data, offering a unified solution for complex data analysis and enhanced search experiences.

Vectors in Azure SQL Database

Vectors are numerical representations of objects like text, images, or audio. They are essential for applications involving semantic search, recommendation systems, and more. These vectors are typically generated by machine learning models, capturing the semantic meaning of the data they represent.

The new vector functionality in Azure SQL Database allows you to store and manage these vectors within a familiar SQL environment. This eliminates the need for separate vector databases, streamlining your application architecture and simplifying your data management processes.

Key Benefits of Native Vector Support in Azure SQL

  • Unified Data Management: Store and query both traditional and vector data in a single database, reducing complexity and maintenance overhead.
  • Advanced Search Capabilities: Perform similarity searches alongside standard SQL queries, leveraging Azure SQL’s sophisticated query optimizer and powerful enterprise features.
  • Optimized Performance: Vectors are stored in a compact binary format, allowing for efficient distance calculations and optimized performance on vector-related operations.

Embeddings: The Foundation of Vector Search

At the heart of vector search are embeddings—dense vector representations of objects, generated by deep learning models. These embeddings capture the semantic similarities between related concepts, enabling tasks such as semantic search, natural language processing, and recommendation systems.

For example, word embeddings can cluster related words like “computer,” “software,” and “machine,” while distant clusters might represent words with entirely different meanings, such as “lion,” “cat,” and “dog.” These embeddings are particularly powerful in applications where context and meaning are more important than exact keyword matches.

Azure OpenAI makes it easy to generate embeddings by providing pre-trained machine learning models accessible through REST endpoints. Once generated, these embeddings can be stored directly in an Azure SQL Database, allowing you to perform vector search queries to find similar data points.

You can explore how vector embeddings work by visiting this amazing website: Transformer Explainer. It offers an excellent interactive experience to help you better understand how Generative AI operates in general.

Vector Search Use Cases

Vector search is a powerful technique used to find vectors in a dataset that are similar to a given query vector. This capability is essential in various applications, including:

  • Semantic Search: Rank search results based on their relevance to the user’s query.
  • Recommendation Systems: Suggest related items based on similarity in vector space.
  • Clustering: Group similar items together based on vector similarity.
  • Anomaly Detection: Identify outliers in data by finding vectors that differ significantly from the norm.
  • Classification: Classify items based on the similarity of their vectors to predefined categories.

For instance, consider a semantic search application where a user queries for “healthy breakfast options.” A vector search would compare the vector representation of the query with vectors representing product reviews, finding the most contextually relevant items—even if the exact keywords don’t match.
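The ranking idea behind that example can be sketched in a few lines of plain Python. The tiny 3-dimensional "embeddings" below are made up for illustration (real models produce vectors with hundreds or thousands of dimensions), but the cosine-distance ranking is exactly what the database performs:

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real models produce ~1536 dimensions).
query = [0.9, 0.1, 0.0]            # "healthy breakfast options"
reviews = {
    "Great oatmeal, very nutritious": [0.8, 0.2, 0.1],
    "Fast shipping, sturdy box":      [0.1, 0.1, 0.9],
}

# Rank reviews by ascending cosine distance (smaller = more similar).
ranked = sorted(reviews, key=lambda r: cosine_distance(query, reviews[r]))
print(ranked[0])  # the oatmeal review wins despite sharing no query keywords
```

Note that the best match shares no keywords with the query; the similarity lives entirely in the vector space.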

Key Features of Native Vector Support in Azure SQL

Azure SQL’s native vector support introduces several new functions to operate on vectors, which are stored in a binary format to optimize performance. Here are the key functions:

  • JSON_ARRAY_TO_VECTOR: Converts a JSON array into a vector, enabling you to store embeddings in a compact format.
  • ISVECTOR: Checks whether a binary value is a valid vector, ensuring data integrity.
  • VECTOR_TO_JSON_ARRAY: Converts a binary vector back into a human-readable JSON array, making it easier to work with the data.
  • VECTOR_DISTANCE: Calculates the distance between two vectors using a chosen distance metric, such as cosine or Euclidean distance.

These functions enable powerful operations for creating, storing, and querying vector data in Azure SQL Database.
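To build intuition for why the binary format matters, here is an illustrative stand-in for the JSON-to-vector round trip in Python. The actual internal layout Azure SQL uses is not documented, so this is only a sketch of the idea: packing floats into fixed-width binary is far more compact than JSON text, and cheaper to compute distances over:

```python
import json
import struct

def json_array_to_vector(json_text):
    """Illustrative stand-in for JSON_ARRAY_TO_VECTOR: pack a JSON array of
    floats into a compact binary blob (4 bytes per single-precision value).
    The real Azure SQL internal layout is not documented; this only shows
    why the binary form is smaller than the JSON text."""
    values = json.loads(json_text)
    return struct.pack(f"<{len(values)}f", *values)

def vector_to_json_array(blob):
    """Illustrative stand-in for VECTOR_TO_JSON_ARRAY: unpack back to JSON."""
    values = struct.unpack(f"<{len(blob) // 4}f", blob)
    return json.dumps([round(v, 6) for v in values])

embedding_json = "[0.25, -0.5, 0.125]"
blob = json_array_to_vector(embedding_json)
print(len(blob))                   # 12 bytes, versus 20 characters of JSON
print(vector_to_json_array(blob))  # round-trips to the same values
```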

Example: Vector Search in Action

Let’s walk through an example of using Azure SQL Database to store and query vector embeddings. Imagine you have a table of customer reviews, and you want to find reviews that are contextually related to a user’s search query.

  1. Storing Embeddings as Vectors:
    After generating embeddings using Azure OpenAI, you can store these vectors in a VARBINARY(8000) column in your SQL table:
SQL
   ALTER TABLE [dbo].[FineFoodReviews] ADD [VectorBinary] VARBINARY(8000);
   UPDATE [dbo].[FineFoodReviews]
   SET [VectorBinary] = JSON_ARRAY_TO_VECTOR([vector]);

This allows you to store the embeddings efficiently, ready for vector search operations.

  2. Performing Similarity Searches:
    To find reviews that are similar to a user’s query, you can convert the query into a vector and calculate the cosine distance between the query vector and the stored embeddings:
SQL
   DECLARE @e VARBINARY(8000);
   EXEC dbo.GET_EMBEDDINGS @model = '<yourmodeldeploymentname>', @text = 'healthy breakfast options', @embedding = @e OUTPUT;

   SELECT TOP(10) ProductId,
                  Summary,
                  Text,
                  VECTOR_DISTANCE('cosine', @e, VectorBinary) AS Distance
   FROM dbo.FineFoodReviews
   ORDER BY Distance;

This query returns the top reviews that are contextually related to the user’s search, even if the exact words don’t match.

  3. Hybrid Search with Filters:
    You can enhance vector search by combining it with traditional keyword filters to improve relevance and performance. For example, you could filter reviews based on criteria like user identity, review score, or the presence of specific keywords, and then apply vector search to rank the results by relevance:
SQL
   -- Comprehensive query with multiple filters.
   SELECT TOP(10)
       f.Id,
       f.ProductId,
       f.UserId,
       f.Score,
       f.Summary,
       f.Text,
       VECTOR_DISTANCE('cosine', @e, VectorBinary) AS Distance,
       CASE 
           WHEN LEN(f.Text) > 100 THEN 'Detailed Review'
           ELSE 'Short Review'
       END AS ReviewLength,
       CASE 
           WHEN f.Score >= 4 THEN 'High Score'
           WHEN f.Score BETWEEN 2 AND 3 THEN 'Medium Score'
           ELSE 'Low Score'
       END AS ScoreCategory
   FROM FineFoodReviews f
   WHERE
       f.UserId NOT LIKE 'Anonymous%'  -- Exclude anonymous users
       AND f.Score >= 2               -- Score threshold filter
       AND LEN(f.Text) > 50           -- Text length filter for detailed reviews
       AND (f.Text LIKE '%gluten%' OR f.Text LIKE '%dairy%') -- Keyword filter
   ORDER BY
       Distance,  -- Order by cosine distance
       f.Score DESC, -- Secondary order by review score
       ReviewLength DESC; -- Tertiary order by review length

This query combines semantic search with traditional filters, balancing relevance and computational efficiency.
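The filter-then-rank pattern in that query can be mirrored in a short Python sketch: apply cheap metadata filters first, then rank only the survivors by vector distance. The review rows and 2-dimensional vectors below are made-up illustrative data:

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

reviews = [
    {"user": "Anonymous12", "score": 5, "text": "gluten free granola, loved it", "vec": [0.9, 0.1]},
    {"user": "alice",       "score": 4, "text": "gluten free granola, crunchy",  "vec": [0.8, 0.3]},
    {"user": "bob",         "score": 3, "text": "dairy free bar, quite tasty",   "vec": [0.7, 0.4]},
]

query_vec = [1.0, 0.0]

# Filters mirror the SQL WHERE clause: drop anonymous users and low scores.
candidates = [r for r in reviews
              if not r["user"].startswith("Anonymous") and r["score"] >= 2]

# Then rank only the filtered candidates by semantic similarity.
candidates.sort(key=lambda r: cosine_distance(query_vec, r["vec"]))
print(candidates[0]["user"])  # alice: closest surviving vector to the query
```

Filtering first keeps the expensive distance calculation restricted to rows that could actually qualify, which is the same efficiency argument made for the SQL version above.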

Leveraging REST Services for Embedding Generation

Azure OpenAI provides REST endpoints for generating embeddings, which can be consumed directly from Azure SQL Database using the sp_invoke_external_rest_endpoint system stored procedure. This integration enables seamless interaction between your data and AI models, allowing you to build intelligent applications that combine the power of machine learning with the familiarity of SQL.

Here’s a stored procedure example that retrieves embeddings from a deployed Azure OpenAI model and stores them in the database:

SQL
CREATE PROCEDURE [dbo].[GET_EMBEDDINGS]
(
    @model VARCHAR(MAX),
    @text NVARCHAR(MAX),
    @embedding VARBINARY(8000) OUTPUT
)
AS
BEGIN
    DECLARE @retval INT, @response NVARCHAR(MAX);
    DECLARE @url VARCHAR(MAX);
    DECLARE @payload NVARCHAR(MAX) = JSON_OBJECT('input': @text);

    SET @url = 'https://<resourcename>.openai.azure.com/openai/deployments/' + @model + '/embeddings?api-version=2023-03-15-preview';

    EXEC dbo.sp_invoke_external_rest_endpoint 
        @url = @url,
        @method = 'POST',   
        @payload = @payload,   
        @headers = '{"Content-Type":"application/json", "api-key":"<openAIkey>"}', 
        @response = @response OUTPUT;

    DECLARE @jsonArray NVARCHAR(MAX) = JSON_QUERY(@response, '$.result.data[0].embedding');
    SET @embedding = JSON_ARRAY_TO_VECTOR(@jsonArray);
END
GO

This stored procedure retrieves embeddings from the Azure OpenAI model and converts them into a binary format for storage in the database, making them available for similarity search and other operations.

Let’s implement an experiment with the Native Vector Support in Azure SQL

Azure SQL Database provides a seamless way to store and manage vector data even without a dedicated vector data type. Vectors—essentially lists of numbers—can be stored efficiently in a table, either with individual elements as columns or as serialized arrays, and column-store indexes can further speed up access. This approach ensures efficient storage and retrieval, making Azure SQL suitable for large-scale vector data management.

I used the Global News Dataset from Kaggle in my experiment.

First, you must create the columns to save the vector information. In my case, I created two columns: title_vector for the news title and content_vector for the news content. For this, I wrote a small Python script, but you can also do it directly from SQL using a cursor. It's important to know that you don't need to pay for any vector database, since the vector information is saved inside Azure SQL.

Python
from litellm import embedding
import pyodbc  # or another SQL connection library
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Set up OpenAI credentials from environment variables
os.environ['AZURE_API_KEY'] = os.getenv('AZURE_API_KEY')
os.environ['AZURE_API_BASE'] = os.getenv('AZURE_API_BASE')
os.environ['AZURE_API_VERSION'] = os.getenv('AZURE_API_VERSION')

# Connect to your Azure SQL database
conn = pyodbc.connect(f'DRIVER={{ODBC Driver 17 for SQL Server}};'
                      f'SERVER={os.getenv("DB_SERVER")};'
                      f'DATABASE={os.getenv("DB_DATABASE")};'
                      f'UID={os.getenv("DB_UID")};'
                      f'PWD={os.getenv("DB_PWD")}')

def get_embeddings(text):
    # Truncate the text to 8191 characters because of the
    # text-embedding-3-small OpenAI embedding model's input limit
    truncated_text = text[:8191]

    response = embedding(
        model="azure/text-embedding-3-small",
        input=truncated_text,
        api_key=os.getenv('AZURE_API_KEY'),
        api_base=os.getenv('AZURE_API_BASE'),
        api_version=os.getenv('AZURE_API_VERSION')
        )
        
    embeddings = response['data'][0]['embedding']
    return embeddings


def update_database(article_id, title_vector, content_vector):
    cursor = conn.cursor()

    # Convert vectors to strings
    title_vector_str = str(title_vector)
    content_vector_str = str(content_vector)

    # Update the SQL query to use the string representations
    cursor.execute("""
        UPDATE newsvector
        SET title_vector = ?, content_vector = ?
        WHERE article_id = ?
    """, (title_vector_str, content_vector_str, article_id))
    conn.commit()


def embed_and_update():
    cursor = conn.cursor()
    cursor.execute("SELECT article_id, title, full_content FROM newsvector where title_vector is null and full_content is not null and title is not null order by published asc")
    
    for row in cursor.fetchall():
        article_id, title, full_content = row
        
        print(f"Embedding article {article_id} - {title}")
        
        title_vector = get_embeddings(title)
        content_vector = get_embeddings(full_content)
        
        update_database(article_id, title_vector, content_vector)

embed_and_update()

These two columns will contain something like this: [-0.02232750505208969, -0.03755787014961243, -0.0066827102564275265…]
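A detail worth noting about the script above: update_database() stores each embedding with str(vector). For a list of plain Python floats, that text happens to be valid JSON as well, so it round-trips cleanly on the client side and JSON_ARRAY_TO_VECTOR can parse it on the SQL side:

```python
import json

# What update_database() writes into title_vector / content_vector:
embedding = [-0.02232750505208969, -0.03755787014961243, -0.0066827102564275265]

stored_text = str(embedding)        # the text that ends up in the column
restored = json.loads(stored_text)  # reading it back client-side is one call

print(stored_text[:40])
print(restored == embedding)        # the round trip is lossless
```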

Second, you must create a procedure in the Azure Database to transform the query into a vector embedding.

SQL
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[GET_EMBEDDINGS]
(
    @model VARCHAR(MAX),
    @text NVARCHAR(MAX),
    @embedding VARBINARY(8000) OUTPUT
)
AS
BEGIN
    DECLARE @retval INT, @response NVARCHAR(MAX);
    DECLARE @url VARCHAR(MAX);
    DECLARE @payload NVARCHAR(MAX) = JSON_OBJECT('input': @text);

    -- Set the @url variable with proper concatenation before the EXEC statement
    SET @url = 'https://<Your App>.openai.azure.com/openai/deployments/' + @model + '/embeddings?api-version=2024-02-15-preview';

    EXEC dbo.sp_invoke_external_rest_endpoint 
        @url = @url,
        @method = 'POST',   
        @payload = @payload,   
        @headers = '{"Content-Type":"application/json", "api-key":"<Your Azure OpenAI API Key>"}', 
        @response = @response OUTPUT;

    -- Use JSON_QUERY to extract the embedding array directly
    DECLARE @jsonArray NVARCHAR(MAX) = JSON_QUERY(@response, '$.result.data[0].embedding');

    
    SET @embedding = JSON_ARRAY_TO_VECTOR(@jsonArray);
END

I also created another procedure to search the dataset directly using the Native Vector Support in Azure SQL.

SQL
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

ALTER PROCEDURE [dbo].[SearchNewsVector] 
    @inputText NVARCHAR(MAX)
AS
BEGIN
    -- Drop the previous result table if it exists
    IF OBJECT_ID('dbo.result', 'U') IS NOT NULL
        DROP TABLE dbo.result;

	--Assuming you have a stored procedure to get embeddings for a given text
	DECLARE @e VARBINARY(8000);
	EXEC dbo.GET_EMBEDDINGS @model = 'text-embedding-3-small', @text = @inputText, @embedding = @e OUTPUT;

	SELECT TOP(10) 
       [article_id]
      ,[source_id]
      ,[source_name]
      ,[author]
      ,[title]
      ,[description]
      ,[url]
      ,[url_to_image]
      ,[content]
      ,[category]
      ,[full_content]
      ,[title_vector]
      ,[content_vector]
      ,[published]
      ,VECTOR_DISTANCE('cosine', @e, JSON_ARRAY_TO_VECTOR(content_vector)) AS cosine_distance
	into result
	FROM newsvector
	ORDER BY cosine_distance;
END

Finally, you can start querying your table using prompts instead of keywords. This is awesome!

Check out the app I developed with the Native Vector Support in Azure SQL, designed to help you craft prompts and evaluate the results against my newsvector dataset. To explore the app, click here.

Like always, I also created this GitHub repository with everything I did.

Signing up for the Azure SQL Database Native Vector Support Private Preview

You can sign up for the private preview at this link.

This article, published by Davide Mauri and Pooja Kamath at the Microsoft Build 2024 event, provides all the details:

Announcing EAP for Vector Support in Azure SQL Database – Azure SQL Devs’ Corner (microsoft.com)

Conclusion

The integration of Azure OpenAI with native vector support in Azure SQL Database unlocks new possibilities for applications that require advanced search capabilities and data analysis. By storing and querying vector embeddings alongside traditional SQL data, you can build powerful solutions that combine the best of both worlds—semantic understanding with the reliability and performance of Azure SQL.

This innovation simplifies application development, enhances data insights, and paves the way for the next generation of intelligent applications.

That’s it for today!

Sources

Azure SQL DB Vector Functions Private Preview | Data Exposed (youtube.com)

Announcing EAP for Vector Support in Azure SQL Database – Azure SQL Devs’ Corner (microsoft.com)

Initiating the Future: 2024 Marks the Beginning of AI Agents’ Evolution

As we navigate the dawn of the 21st century, the evolution of Artificial Intelligence (AI) presents an intriguing narrative of technological advancement and innovation. The concept of AI agents, once a speculative fiction, is now becoming a tangible reality, promising to redefine our interaction with technology. The discourse surrounding AI agents has been significantly enriched by the contributions of elite AI experts such as Andrej Karpathy, co-founder of OpenAI; Andrew Ng, creator of Google Brain; Arthur Mensch, CEO of Mistral AI; and Harrison Chase, founder of LangChain. Their collective insights, drawn from their pioneering work and shared at a recent Sequoia-hosted AI event, underscore the transformative potential of AI agents in pioneering the future of technology.

Exploring Gemini: Google Unveils Revolutionary AI Agents at Google Next 2024

At the recent Google Next 2024 event, held from April 9 to April 11 in Las Vegas, Google introduced a transformative suite of AI agents named Google Gemini, marking a significant advancement in artificial intelligence technology. These AI agents are designed to revolutionize various facets of business operations, enhancing customer service, improving workplace productivity, streamlining software development, and amplifying data analysis capabilities.

Elevating Customer Service: Google Gemini AI agents are set to transform customer interactions by providing seamless, consistent service across all platforms, including web, mobile apps, and call centers. By integrating advanced voice and video technologies, these agents offer a unified user experience that sets new standards in customer engagement, with capabilities like personalized product recommendations and proactive support.

Boosting Workplace Productivity: In workplace efficiency, Google Gemini’s AI agents integrate deeply with Google Workspace to assist with routine tasks, freeing employees to focus on strategic initiatives. This integration promises to enhance productivity and streamline internal workflows significantly.

Empowering Creative and Marketing Teams: For creative and marketing endeavors, Google Gemini provides AI agents that assist in content creation and tailor marketing strategies in real time. These agents leverage data-driven insights for a more personalized and agile approach, enhancing campaign creativity and effectiveness.

Advancing Data Analytics: Google Gemini’s data agents excel in extracting meaningful insights from complex datasets, maintaining factual accuracy, and enabling sophisticated analyses with tools like BigQuery and Looker. These capabilities empower organizations to make informed decisions and leverage data for strategic advantage.

Streamlining Software Development: Google Gemini offers AI code agents for developers that guide complex codebases, suggest efficiency improvements, and ensure adherence to best security practices. This facilitates faster and more secure software development cycles.

Enhancing System and Data Security: Recognizing the critical importance of security, Google Gemini includes AI security agents that integrate with Google Cloud to provide robust protection and ensure compliance with data regulations, thereby safeguarding business operations.

Collaboration and Integration: Google Gemini also emphasizes the importance of cooperation and integration, with tools like Vertex AI Agent Builder that allow businesses to develop custom AI agents quickly. This suite of AI agents is already being adopted by industry leaders such as Mercedes-Benz and Samsung, showcasing its potential to enhance customer experiences and refine operations. These partnerships highlight Google Gemini’s broad applicability and transformative potential across various sectors.

As AI technology evolves, Google Gemini AI Agents stand out as a pivotal development. They promise to reshape the future of business and technology by enhancing efficiency, fostering creativity, and supporting data-driven decision-making. The deployment of these agents at Google Next 2024 signals that this transformation is already underway.

The Paradigm Shift to Autonomous Agents

At the heart of this evolution is a shift from static, rule-based AI to dynamic, learning-based agents capable of more nuanced understanding and interaction with the world. Andrej Karpathy, renowned for his work at OpenAI, emphasizes the necessity of bridging the gap between human and model psychology, highlighting the unique challenges and opportunities in designing AI agents that can effectively mimic human decision-making processes. This insight into the fundamental differences between human and AI cognition underscores the complexities of creating agents that can navigate the world as humans do.

The Democratization of AI Technology

Andrew Ng, a stalwart in AI education and the mind behind Google Brain, argues for democratizing AI technology. He envisions a future where the development of AI agents becomes an essential skill akin to reading and writing. Ng’s perspective is not just about accessibility but about empowering individuals to leverage AI to create personalized solutions. This vision for AI agents extends beyond mere utility, suggesting a future where AI becomes a collaborative partner in problem-solving.

Bridging the Developer-User Divide

Arthur Mensch and Harrison Chase propose reducing the gap between AI developers and end-users. Mensch’s Mistral AI is pioneering in making AI more accessible to a broader audience, with tools like Le Chat to provide intuitive interfaces for interacting with AI technologies. Similarly, Chase’s work with LangChain underscores the importance of user-centric design in developing AI agents, ensuring that these technologies are not just powerful but also accessible and easy to use.

Looking Forward: The Impact on Society

The collective insights of these AI luminaries paint a future where AI agents become an integral part of our daily lives, transforming how we work, learn, and interact. The evolution of AI agents is not just a technical milestone but a societal shift, promising to bring about a new era of human-computer collaboration. As these technologies continue to advance, the work of Karpathy, Ng, Mensch, and Chase serves as both a blueprint and inspiration for the future of AI.

The architecture of an AI Agent

An AI agent is built with a complex structure designed to handle iterative, multi-step reasoning tasks effectively. Below are the four core components that constitute the backbone of an AI agent:

Agent Core

  • The core of an AI agent sets the foundation by defining its goals, objectives, and behavioral traits. It manages the coordination and interaction of other components and directs the large language models (LLM) by providing specific prompts or instructions.

Memory

  • Memory in AI agents serves dual purposes. It stores the short-term “train of thought” for ongoing tasks and maintains a long-term log of past actions, context, and user preferences. This memory system enables the agent to retrieve necessary information for efficient decision-making.

Tools

  • AI agents can access various tools and data sources that extend their capabilities beyond their initial training data. These tools include capabilities like web search, code execution, and access to external data or knowledge bases, allowing the agent to dynamically handle a wide range of inputs and outputs.

Planning

  • Effective planning is critical in breaking down complex problems into manageable sub-tasks or steps. AI agents employ task decomposition and self-reflection techniques to iteratively refine and enhance their execution plans, ensuring precise and targeted outcomes.
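The four components above can be sketched as a minimal agent loop. This toy skeleton is entirely illustrative—the class and method names are my own, not taken from any specific framework—but it shows how core, memory, tools, and planning fit together:

```python
# A toy agent skeleton mapping onto the four components above. All names
# are illustrative; real frameworks (LangChain, AutoGen, ...) differ.

class Agent:
    def __init__(self, goal, tools):
        self.goal = goal          # Agent Core: goals and behavioral traits
        self.tools = tools        # Tools: callables extending the agent
        self.memory = []          # Memory: log of past steps and outputs

    def plan(self, task):
        # Planning: naive decomposition into one sub-task per tool.
        return [(name, task) for name in self.tools]

    def run(self, task):
        results = []
        for tool_name, subtask in self.plan(task):
            output = self.tools[tool_name](subtask)
            self.memory.append((tool_name, subtask, output))  # remember the step
            results.append(output)
        return results

# Two stand-in "tools" (a real agent might call web search or a code runner).
tools = {
    "upper": lambda t: t.upper(),
    "count": lambda t: len(t.split()),
}

agent = Agent(goal="demo", tools=tools)
print(agent.run("summarize this article"))
print(len(agent.memory))  # one memory entry per executed sub-task
```

In a real system, the planning step would itself be an LLM call and the memory would persist across sessions, but the control flow is the same.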

Frameworks for Building AI Agents

The development of AI agents is supported by a variety of open-sourced frameworks that cater to different needs and scales:

Single-Agent Frameworks

  • LangChain Agents: Offers a comprehensive toolkit for building applications and agents powered by large language models.
  • LlamaIndex Agents: A framework specializing in question-and-answer agents that operate over specific data sources, using techniques like retrieval-augmented generation (RAG).
  • AutoGPT: An open-source community project, built on OpenAI’s GPT models, that enables semi-autonomous agents to execute tasks from text-based prompts alone.

Multi-Agent Frameworks

  • AutoGen: A Microsoft Research initiative that allows the creation of applications using multiple interacting agents, enhancing problem-solving capabilities.
  • Crew AI: Builds on the foundations of LangChain to support multi-agent frameworks where agents can collaborate to achieve complex tasks.

The Power of Multi-Agent Systems

Multi-agent systems represent a significant leap in artificial intelligence, transcending the capabilities of individual AI agents by leveraging their collective strength. These systems are structured to harness the unique abilities of different agents, thereby facilitating complex interactions and collaboration that lead to enhanced performance and innovative solutions.

Enhanced Capabilities Through Specialization and Collaboration

In multi-agent systems, each agent can specialize in a specific domain, bringing expertise and efficiency to its designated tasks. This specialization is akin to having a team of experts, each skilled in a different area, working together towards a common goal. For example, in content creation, one AI might focus on generating initial drafts while another specializes in stylistic refinement and editing. This division of labor not only speeds up the process but also improves the quality of the output.

Task Sharing and Scalability

Multi-agent systems excel in distributing tasks among various agents, allowing them to tackle larger and more complex projects than would be possible individually. This task sharing also makes the system highly scalable, as additional agents can be introduced to handle increased workloads or to bring new expertise to the team. For instance, in customer service, some agents could manage inquiries in various languages, while others could specialize in resolving specific issues, such as technical support or billing inquiries.

Iterative Feedback for Continuous Improvement

Another critical aspect of multi-agent systems is the iterative feedback loop established among the agents. Each agent’s output can serve as input for another, creating a continuous improvement cycle. For example, an AI that generates content might pass its output to another AI specialized in critical analysis, which then provides feedback. This feedback is used to refine subsequent outputs, leading to progressively higher-quality results.
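The generate → critique → refine loop described above can be sketched with two rule-based stand-ins for LLM agents. The "writer" and "critic" functions here are purely illustrative; the point is the bounded feedback cycle, where one agent's output becomes the other's input:

```python
# Sketch of the iterative feedback loop between two cooperating agents.

def writer(draft, feedback=None):
    """'Writer' agent: produces a draft, applying any feedback it received."""
    if feedback == "too long":
        return draft.rsplit(" ", 1)[0]   # crude revision: drop the last word
    return draft

def critic(draft, limit=4):
    """'Critic' agent: returns feedback, or None when satisfied."""
    return "too long" if len(draft.split()) > limit else None

draft = "a slightly verbose draft sentence here"
for _ in range(5):                       # bound the loop to avoid cycling forever
    feedback = critic(draft)
    if feedback is None:
        break
    draft = writer(draft, feedback)

print(draft)  # trimmed until the critic accepts it
```

With real LLM agents, the critic's feedback would be free-form text rather than a fixed label, but the bounded loop and the stop-when-satisfied condition carry over directly.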

Case Studies and Practical Applications

One practical example of a multi-agent system in action is in autonomous vehicle technology. Here, multiple AI agents operate simultaneously, one managing navigation, another monitoring environmental conditions, and others controlling the vehicle’s mechanics. These agents coordinate to navigate traffic, adjust to changing road conditions, and ensure passenger safety.

In more dynamic environments such as financial markets or supply chain management, multi-agent systems can adapt to rapid changes by redistributing tasks based on shifting priorities and conditions. This adaptability is crucial for maintaining efficiency and responsiveness in high-stakes or rapidly evolving situations.

Embracing the Future Together

As we stand on the brink of this new technological frontier, the contributions of Andrej Karpathy, Andrew Ng, Arthur Mensch, and Harrison Chase illuminate the path forward. Their visionary work not only showcases the potential of AI agents to transform industries, enhance productivity, and solve complex problems but also highlights the importance of ethical considerations, user-centric design, and accessibility in developing these technologies. The evolution of AI agents represents more than just a leap in computational capabilities; it signifies a paradigm shift towards a more integrated, intelligent, and intuitive interaction between humans and machines.

The future shaped by AI agents will be characterized by partnerships that extend beyond mere functionality to include creativity, empathy, and mutual growth. AI agents will not only perform tasks but also learn from and adapt to the needs of their human counterparts, offering personalized experiences and enabling a deeper connection to technology.

Fostering an environment of collaboration, innovation, and ethical responsibility is crucial as we embark on this journey. By doing so, we can ensure that the evolution of AI agents advances technological frontiers and promotes a more equitable, sustainable, and human-centric future. The work of Karpathy, Ng, Mensch, and Chase, among others, serves as a beacon, guiding us toward a future where AI agents empower every individual to achieve more, dream bigger, and explore further.

In conclusion, the evolution of AI agents is not just an exciting technological development; it is a call to action for developers, policymakers, educators, and individuals to come together and shape a future where technology amplifies our potential without compromising our values. As we continue to pioneer the future of technology, let us embrace AI agents as partners in our quest for a better, more innovative, and more inclusive world.

That’s it for today!

Sources

AI Agents: A Primer on Their Evolution, Architecture, and Future Potential – algorithmicscale

Google Gemini AI Agents unveiled at Google Next 2024 – Geeky Gadgets (geeky-gadgets.com)

Google Cloud debuts agent builder to ease GenAI adoption | Computer Weekly

(2) AI Agents – A Beginner’s Guide | LinkedIn