The evolving landscape of data analytics has brought vector databases to the forefront, especially with their application in finding similarities in diverse data types such as articles, photos, and products. Azure SQL, combined with the prowess of OpenAI, offers a powerful platform for executing vector operations, simplifying the task of finding similar items and enhancing recommendation systems in applications.
What is Vector Similarity, and How Do You Calculate Cosine Similarity?
Vector similarity revolves around transforming data into numerical vectors or embeddings. These embeddings are numerical representations of various concepts converted into sequences of numbers, making it easier for computers to grasp their relationships. This method is particularly effective in comparing and finding similarities between data points, a crucial feature in applications like search engines and clustering algorithms.
Cosine similarity, a commonly used metric in vector similarity, measures the cosine of the angle between two vectors. This metric is crucial in determining the degree of similarity between the vectors, irrespective of their size. In Azure SQL, cosine similarity can be computed with a simple SQL formula involving SUM and SQRT functions applied to the vector elements, thus providing a straightforward yet powerful way to measure vector similarity.
Given two vectors a and b stored one element per row (for example, in a table vectors_values with a vector_id, an element position vector_value_id, and the element itself in value), cosine similarity can be calculated in SQL by joining the two vectors on element position:
SELECT
    SUM(a.[value] * b.[value]) /
    (SQRT(SUM(a.[value] * a.[value])) * SQRT(SUM(b.[value] * b.[value]))) AS cosine_similarity
FROM
    vectors_values AS a
INNER JOIN
    vectors_values AS b ON a.vector_value_id = b.vector_value_id
WHERE
    a.vector_id = 1  -- first vector
    AND b.vector_id = 2  -- second vector
How to Query Azure OpenAI?
Azure SQL’s integration with Azure OpenAI simplifies generating and working with embeddings. Users can obtain the vector representation of any given text by creating an Azure OpenAI resource and deploying a model like text-embedding-ada-002. This integration enables the execution of REST API calls from within the Azure SQL Database, making fetching and working with embeddings more streamlined and accessible.
SQL
-- Declare a variable to hold the response from the external REST API
declare @response nvarchar(max);

-- Declare and initialize a variable with the JSON payload.
-- The JSON object contains an 'input' key with a text value.
declare @payload nvarchar(max) = json_object('input': 'This is how to futureproof your career in an AI world');

-- Execute a stored procedure to invoke an external REST endpoint
exec sp_invoke_external_rest_endpoint
    @url = 'https://<your-app-name>.openai.azure.com/openai/deployments/embeddings/embeddings?api-version=2023-03-15-preview', -- The URL of the REST endpoint
    @credential = [https://<your-app-name>.openai.azure.com], -- Credential for accessing the REST API
    @payload = @payload,           -- The JSON payload defined earlier
    @response = @response output;  -- Output parameter to store the response from the API
Advantages of Implementing Vectors in Azure SQL Database
1. Ease of Access with Azure OpenAI: Azure SQL combined with Azure OpenAI offers easy access to REST services for generating embeddings using pre-trained machine learning models. This accessibility facilitates the calculation of embeddings, which is otherwise a complex task.
2. Efficient Storage with Columnstore Indexes: Azure SQL databases efficiently store vectors using column store indexes. This method is particularly beneficial since Azure SQL doesn’t have a specific vector data type. Vectors, essentially lists of numbers, can be conveniently stored in a table with one row per vector element.
3. Fast Distance Calculation: The internal optimization of column store indexes in Azure SQL, employing SIMD and AVX-512 instructions, allows for high-speed calculation of distances between vectors, which is crucial for determining similarity.
4. Integration with Azure AI Search: Azure SQL’s integration with Azure AI Search streamlines the entire process of chunking, generating, storing, and querying vectors for vector search, significantly speeding up the development of the vectorization pipeline and minimizing maintenance tasks.
5. Capability for Complex Operations: Azure SQL enables complex operations like indexing, storing, and retrieving vector embeddings from a search index, which is essential for identifying the most similar documents in a vector space.
6. Versatile Data Handling: Azure SQL’s ability to handle structured and unstructured data, along with vector data, provides more versatility compared to vector databases, which are primarily optimized for vector data storage and retrieval.
7. Ease of Querying and Relevance Determination: The integration with Azure OpenAI allows easy querying of the REST service to obtain vector representations of text, which can then be used to calculate similarity against stored vectors, identifying the most relevant data.
8. Simplified Deployment and Management: Deploying and managing an embedding model via the Azure portal is straightforward, reducing the complexity of managing vector databases.
9. Suitability for a Range of Applications: While vector databases are specialized for high-dimensional similarity searches, Azure SQL’s broader application scope makes it suitable for various types of data and applications, from financial records to customer data.
10. Support for Advanced Azure Features: Azure SQL supports advanced Azure features, such as AI and machine learning capabilities, which can be seamlessly integrated with vector similarity operations for enhanced analytics and insights.
These advantages highlight the flexibility, efficiency, and ease of use of Azure SQL databases for vector similarity operations, making them a preferable choice in scenarios where diverse data types and complex operations are involved, alongside the need for seamless integration with other Azure services.
Implementing Vectors in Azure SQL Database
Azure SQL Database provides a seamless way to store and manage vector data despite not having a specific vector data type. Vectors, which are essentially lists of numbers, can be stored efficiently in tables compressed with columnstore indexes: each vector is represented either as one row per element or as a serialized array. This approach ensures efficient storage and retrieval, making Azure SQL suitable for large-scale vector data management.
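As a rough sketch of what this looks like (the table and index names mirror the ones used later in this post, but this DDL is illustrative rather than required, since the stored procedure below creates the tables with SELECT ... INTO), a per-element vector table compressed with a clustered columnstore index could be defined like this:
SQL
-- One row per embedding element, compressed and ordered by article.
CREATE TABLE dbo.news_titles_vector
(
    article_id      INT   NOT NULL, -- the article this embedding belongs to
    vector_value_id INT   NOT NULL, -- position of the element within the embedding
    vector_value    FLOAT NOT NULL  -- the element value itself
);

CREATE CLUSTERED COLUMNSTORE INDEX cci_news_titles_vector
    ON dbo.news_titles_vector ORDER (article_id);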
First, you must create the columns to hold the vector information. In my case, I created two columns: title_vector for the news title and content_vector for the news content. To populate them, I wrote a small Python script, but you can also do it directly from SQL using a cursor. It's worth noting that you don't need to pay for a separate vector database when you store the vector information inside Azure SQL.
Python
import openai
import pyodbc  # or another SQL connection library

# Set up OpenAI credentials
openai.api_type = "azure"
openai.api_key = "<YOUR AZURE OPEN AI KEY>"
openai.api_base = "https://<your-app-name>.openai.azure.com/"
openai.api_version = "2023-07-01-preview"

# Connect to your Azure SQL database
conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                      'SERVER=<SERVER>;DATABASE=<DATABASE>;'
                      'UID=<USER>;PWD=<PASSWORD>')

def get_embeddings(text):
    # Truncate the text to 8000 characters
    truncated_text = text[:8000]
    response = openai.Embedding.create(input=truncated_text, engine="embeddings")
    embeddings = response['data'][0]['embedding']
    return embeddings

def update_database(article_id, title_vector, content_vector):
    cursor = conn.cursor()
    # Convert vectors to strings
    title_vector_str = str(title_vector)
    content_vector_str = str(content_vector)
    # Update the SQL query to use the string representations
    cursor.execute("""
        UPDATE news
        SET title_vector = ?, content_vector = ?
        WHERE article_id = ?
    """, (title_vector_str, content_vector_str, article_id))
    conn.commit()

def embed_and_update():
    cursor = conn.cursor()
    cursor.execute("SELECT article_id, title, full_content FROM news "
                   "WHERE title_vector IS NULL AND full_content IS NOT NULL "
                   "AND title IS NOT NULL ORDER BY published DESC")
    for row in cursor.fetchall():
        article_id, title, full_content = row
        print(f"Embedding article {article_id} - {title}")
        title_vector = get_embeddings(title)
        content_vector = get_embeddings(full_content)
        update_database(article_id, title_vector, content_vector)

embed_and_update()
These two columns will contain something like this: [-0.02232750505208969, -0.03755787014961243, -0.0066827102564275265…]
Second, you must extract these vectors, element by element, into a new table for each vector field. In this case, I created a stored procedure to do it.
SQL
-- =============================================
-- Author:      Lawrence Teixeira
-- Create Date: 11-24-2023
-- Description: This stored procedure creates vectors for news titles and contents.
--              It processes data from the 'news' table and stores the vectors in separate tables.
-- =============================================
CREATE PROCEDURE [dbo].[Create_News_Vector]
AS
BEGIN
    SET NOCOUNT ON; -- Prevents the sending of DONE_IN_PROC messages to the client.

    -- Extract and store title vectors:
    -- First, check if the 'news_titles_vector' table exists and drop it if it does.
    IF OBJECT_ID('dbo.news_titles_vector', 'U') IS NOT NULL
        DROP TABLE dbo.news_titles_vector;

    -- Using a Common Table Expression (CTE) to process title vectors.
    WITH cte AS
    (
        SELECT
            v.article_id,
            CAST(tv.[key] AS INT) AS vector_value_id,  -- Casting 'key' as INT for vector ID.
            CAST(tv.[value] AS FLOAT) AS vector_value  -- Casting 'value' as FLOAT for vector value.
        FROM dbo.news AS v
        CROSS APPLY OPENJSON(v.title_vector) tv        -- Parses JSON of title_vector in the 'news' table.
    )
    -- Create 'news_titles_vector' table with processed vectors.
    SELECT article_id, vector_value_id, vector_value
    INTO dbo.news_titles_vector
    FROM cte;

    -- Extract and store content vectors:
    -- Check and drop 'news_contents_vector' table if it exists.
    IF OBJECT_ID('dbo.news_contents_vector', 'U') IS NOT NULL
        DROP TABLE dbo.news_contents_vector;

    -- CTE for processing content vectors.
    WITH cte AS
    (
        SELECT
            v.article_id,
            CAST(tv.[key] AS INT) AS vector_value_id,  -- Casting 'key' as INT for vector ID.
            CAST(tv.[value] AS FLOAT) AS vector_value  -- Casting 'value' as FLOAT for vector value.
        FROM dbo.news AS v
        CROSS APPLY OPENJSON(v.content_vector) tv      -- Parses JSON of content_vector in the 'news' table.
    )
    -- Create 'news_contents_vector' table with processed vectors.
    SELECT article_id, vector_value_id, vector_value
    INTO dbo.news_contents_vector
    FROM cte;

    -- Columnstore indexes creation is advised outside the stored procedure.
    -- These indexes optimize data storage and query performance on vector tables.
    CREATE CLUSTERED COLUMNSTORE INDEX cci_news_titles_vector
        ON dbo.news_titles_vector ORDER (article_id);

    CREATE CLUSTERED COLUMNSTORE INDEX cci_news_contents_vector
        ON dbo.news_contents_vector ORDER (article_id);
END
I also created a function to search the dataset directly using the cosine similarity formula.
SQL
-- This Azure SQL function finds news articles similar to the given content vector.
CREATE function [dbo].[SimilarNewsContentArticles]
(
    @vector nvarchar(max) -- Input parameter: JSON string representing a content vector.
)
returns table -- The function returns a table.
as
return
with
-- CTE for processing the input vector.
cteVector as
(
    select
        cast([key] as int) as [vector_value_id],  -- Extracts and casts the 'key' from JSON to int.
        cast([value] as float) as [vector_value]  -- Extracts and casts the 'value' from JSON to float.
    from
        openjson(@vector) -- Parses the input JSON vector.
),
-- CTE for calculating similarity scores with existing articles.
cteSimilar as
(
    select top (50)
        v2.article_id,
        sum(v1.[vector_value] * v2.[vector_value]) as cosine_distance -- Calculates cosine similarity (distance) between vectors.
    from
        cteVector v1 -- Uses the processed input vector.
    inner join
        dbo.news_contents_vector v2 on v1.vector_value_id = v2.vector_value_id -- Joins with stored article vectors.
    group by
        v2.article_id
    order by
        cosine_distance desc -- Orders by similarity score, descending.
)
-- Final selection combining article data with similarity scores.
select
    a.*,              -- Selects all columns from the news article.
    r.cosine_distance -- Includes the calculated similarity score.
from
    cteSimilar r -- Uses the similarity scores CTE.
inner join
    dbo.news a on r.article_id = a.article_id -- Joins with the news articles table.
GO
Finally, you can start querying your table using prompts instead of keywords. This is awesome!
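To make this concrete, here is a sketch of an end-to-end query (it assumes the same endpoint and credential shown earlier, and the JSON path used to extract the embedding from the response may need adjusting for your deployment):
SQL
-- Embed a natural-language prompt, then rank stored articles against it.
declare @response nvarchar(max);
declare @payload nvarchar(max) = json_object('input': 'news about AI and the future of work');

exec sp_invoke_external_rest_endpoint
    @url = 'https://<your-app-name>.openai.azure.com/openai/deployments/embeddings/embeddings?api-version=2023-03-15-preview',
    @credential = [https://<your-app-name>.openai.azure.com],
    @payload = @payload,
    @response = @response output;

-- Pull the embedding array out of the response JSON.
declare @prompt_vector nvarchar(max) = json_query(@response, '$.result.data[0].embedding');

-- Return the most similar articles for the prompt.
select top (10) article_id, title, cosine_distance
from dbo.SimilarNewsContentArticles(@prompt_vector)
order by cosine_distance desc;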
Check out the app I developed, which is designed to assist you in crafting prompts and evaluating your performance using my news dataset. To explore the app, click here.
I also created this GitHub repository with everything I did.
Conclusion
While the sample in this blog is not optimized for maximum efficiency, it is an excellent starting point for understanding and implementing vector operations in Azure SQL. The process, despite its simplicity, is relatively fast. For instance, querying an eight vCore Azure SQL database can return the fifty most similar articles in just half a second, demonstrating the efficiency and utility of vector operations in Azure SQL Database for data analytics and insights. If you want to know more about this topic, don’t hesitate to get in touch with me.
The field of conversational AI has witnessed a substantial transformation with the emergence of large language models (LLMs) such as GPT-4, LaMDA, PaLM, LLaMA, Claude, and others. These sophisticated models, founded on transformer architectures, have redefined the possibilities of natural language processing, paving the way for a myriad of applications across both consumer and enterprise sectors. However, despite this leap forward, LLMs are still bound by a significant limitation: their context window size. This bottleneck restricts their ability to manage extended dialogues and analyze lengthy documents efficiently. But what if there were a way to circumvent this limitation?
What is MemGPT?
MemGPT, standing for Memory-GPT, is a system devised to enhance the performance of Large Language Models (LLMs) by introducing a more advanced memory management scheme, helping to overcome the challenges posed by fixed context windows. Below are some of the key features of MemGPT:
Memory Management: MemGPT incorporates a tiered memory system into a fixed-context LLM processor, granting it the ability to manage its own memory. By intelligently handling different memory tiers, it extends the context available within the limited context window of the LLM, addressing the issue of constrained context windows common in large language models.
Virtual Context Management: MemGPT introduces a method known as virtual context management. This is a key feature that assists in managing the context windows of LLMs.
Operating System-Inspired: The architecture of MemGPT draws inspiration from traditional operating systems, especially their hierarchical memory systems that facilitate data movement between fast and slow memory. This approach enables effective memory resource management, similar to how operating systems provide the illusion of large memory resources to applications through virtual memory paging.
Interruption Handling: MemGPT employs interrupts to manage the control flow between itself and the user, ensuring smooth interaction and effective memory management during operations.
Extended Conversational Context: Through effective memory management, MemGPT facilitates extended conversational context, allowing for longer and more coherent interactions that surpass the limitations imposed by fixed-length context windows.
In essence, MemGPT represents a significant step forward in the utilization of Large Language Models, creating a pathway for more effective and extended interactions that resemble human discourse by smartly managing memory resources.
For more information, you can access the official website here.
How does MemGPT Work?
MemGPT gives LLMs a feedback loop between user events, searching virtual context, and performing a function (source)
Imagine your computer’s OS, which deftly manages applications and data across RAM and disk storage, providing seamless access to resources beyond the physical memory limits. MemGPT mirrors this concept by managing different memory tiers within an LLM. It includes:
Main Context: Analogous to RAM, this is the immediate context the LLM processor works with during inference.
External Context: Similar to a hard drive, this stores information beyond the LLM’s direct reach but can be accessed when needed.
Interrupts: Like an OS interrupt, MemGPT can pause and resume the processor, managing the control flow with the user.
This architecture allows for dynamic context management, enabling the LLM to retrieve relevant historical data akin to how an OS handles page faults.
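To make the analogy concrete, here is a minimal, purely illustrative sketch (this is not the MemGPT API, and every name in it is hypothetical) of a loop that pages messages between a bounded main context and an external store, pulling relevant history back in when a user event arrives:
Python
# Illustrative only: an OS-inspired loop that keeps the "main context" within a
# budget and pages overflow out to "external context", recalling it on demand.

MAIN_CONTEXT_BUDGET = 20   # hypothetical message budget standing in for a token limit
main_context = []          # analogous to RAM: what the LLM sees at inference
external_context = []      # analogous to disk: searchable archival storage


def page_out_if_needed():
    # When the main context overflows, evict the oldest messages to external
    # storage, much like an OS paging memory out to disk.
    while len(main_context) > MAIN_CONTEXT_BUDGET:
        external_context.append(main_context.pop(0))


def recall(query: str):
    # Stand-in for archival search: retrieve paged-out messages relevant to the query.
    return [m for m in external_context if query.lower() in m.lower()]


def handle_user_event(message: str):
    # An "interrupt": the user event pauses generation, the context is updated,
    # and the processor resumes with relevant history pulled back into the window.
    main_context.append(f"user: {message}")
    main_context.extend(recall(message))
    page_out_if_needed()
    # ...call the LLM on main_context here...


handle_user_event("What did we decide about the Q3 roadmap?")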
What problem does MemGPT solve?
MemGPT addresses several challenges associated with language modeling, particularly enhancing the capabilities of existing large language models (LLMs) like GPT-3. Here are the key problems it resolves:
Long-term Context Retention: MemGPT introduces solutions for managing long-term context, a significant hurdle in advancing language modeling. By effectively managing memory, it can retain and access information over extended sequences, which is crucial for understanding and generating coherent responses in conversations or documents with many interactions or long texts.
Enhanced Memory Management: It employs a tiered memory system, data transfer functions, and control via interrupts to manage memory efficiently. This setup enhances fixed-context LLMs, allowing them to handle tasks like document analysis and multi-session chat more effectively, overcoming the inherent context limitations in modern LLMs for better performance and user interactions.
Extended Context Window: MemGPT effectively extends the context window of LLMs, enabling them to manage different memory tiers intelligently. This extended context is crucial for LLMs to have a more in-depth understanding and generate more coherent and contextually relevant responses over a series of interactions.
Improved Interaction with Chatbots: By utilizing a memory hierarchy, MemGPT allows chatbots to access and modify information beyond their limited context window, facilitating more meaningful and prolonged interactions with users. This memory hierarchy enables the chatbot to move data between different layers of memory, ensuring relevant information is readily accessible when needed.
Through these solutions, MemGPT significantly bridges the gap between memory management and generative capacity in language modeling, paving the way for more sophisticated applications in various domains.
Comparing context lengths of commonly used models / APIs (data collected 9/2023).
*Assuming a preprompt of 1k tokens, and an average message size of ∼50 tokens (∼250 characters).
How to install MemGPT
PowerShell
pip install pymemgpt
Add your OpenAI API key to your environment:
PowerShell
export OPENAI_API_KEY=YOUR_API_KEY    # on Linux/Mac
set OPENAI_API_KEY=YOUR_API_KEY       # on Windows
$Env:OPENAI_API_KEY = "YOUR_API_KEY"  # on Windows (PowerShell)
Configure the default settings for MemGPT by running:
PowerShell
memgpt configure
Now, you can run MemGPT with:
PowerShell
memgpt run
The run command supports the following optional flags (if set, they will override the config defaults); an example invocation follows the list:
--agent: (str) Name of agent to create or to resume chatting with.
--human: (str) Name of the human to run the agent with.
--persona: (str) Name of agent persona to use.
--model: (str) LLM model to run [gpt-4, gpt-3.5].
--preset: (str) MemGPT preset to run agent with.
--first: (str) Allow the user to send the first message.
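For example (the agent name here is just a placeholder), you might resume a specific agent with a specific model:
PowerShell
memgpt run --agent my_research_agent --model gpt-4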
Matthew Berman has produced a great review of the original MemGPT research paper and the initial setup for OpenAI API users.
Note that in the video tutorial, Matthew sets up a Conda environment, but this isn't strictly necessary; it can also be done with a standard .venv environment.
MemGPT and Open Source Models Setup
In this video, Matthew Berman covers a quick setup for using MemGPT with open-source models like LLaMA, Airoboros, and Mistral via RunPod. Although this may sound complicated, it's really not too difficult and offers significant potential cost savings compared with using OpenAI.
Note open-source model support is still in early-stage development.
MemGPT and Autogen Setup
AutoGen is a tool that helps create LLM applications in which multiple agents can talk to each other to complete tasks, such as brainstorming a business proposal. These AutoGen agents can be tailored, they can chat, and they make it easy for humans to join the conversation. In this tutorial, Matthew Berman explains how to expand the memory of these AI agents by combining AutoGen with MemGPT.
AutoGEN and MemGPT and Local LLM Complete Tutorial
Created by Prompt Engineer, this 30-minute video covers in great detail all the steps required to get this combination of solutions live with RunPod. As Prompt Engineer explains, the tutorial took quite a long time to produce, as it required a number of test-and-learn steps. So far, this is one of the most comprehensive tutorials available.
Summary: 00:11 🚀 The video demonstrates how to connect MemGPT, AutoGEN, and local Large Language Models (LLMs) using Runpods.
01:32 🤖 You can integrate MemGPT and AutoGEN to work together, with MemGPT serving as an assistant agent alongside local LLMs.
03:46 📚 To get started, install Python, VS Code, and create a Runpods account with credits. You can use Runpods for running local LLMs.
06:43 🛠️ Set up a virtual environment, create a Python file, and activate the environment for your project.
08:52 📦 Install necessary libraries like OpenAI, PyAutoGen, and pymemgpt to work with AutoGen and MemGPT.
16:21 ⚙️ Use Runpods to deploy local LLMs, select the hardware configuration, and create API endpoints for integration with AutoGEN and MemGPT.
20:29 🔄 Modify the code to switch between using AutoGEN and MemGPT agents based on a flag, allowing you to harness the power of both.
23:31 🤝 Connect AutoGEN and MemGPT by configuring the API endpoints with the local LLMs from Runpods, enabling them to work seamlessly together.
Follow the example Python code:
requirements.txt
TeX
pyautogen
pymemgpt
app.py
Python
## pip install pyautogen pymemgpt
import os
import autogen
import memgpt.autogen.memgpt_agent as memgpt_autogen
import memgpt.autogen.interface as autogen_interface
import memgpt.agent as agent
import memgpt.system as system
import memgpt.utils as utils
import memgpt.presets as presets
import memgpt.constants as constants
import memgpt.personas.personas as personas
import memgpt.humans.humans as humans
from memgpt.persistence_manager import InMemoryStateManager, InMemoryStateManagerWithPreloadedArchivalMemory, InMemoryStateManagerWithEmbeddings, InMemoryStateManagerWithFaiss
import openai

config_list = [
    {
        "api_type": "open_ai",
        "api_base": "https://ekisktiz8hegao-5001.proxy.runpod.net/v1",
        "api_key": "NULL",
    },
]

llm_config = {"config_list": config_list, "seed": 42}

# If USE_MEMGPT is False, then this example will be the same as the official AutoGen repo
# (https://github.com/microsoft/autogen/blob/main/notebook/agentchat_groupchat.ipynb)
# If USE_MEMGPT is True, then we swap out the "coder" agent with a MemGPT agent
USE_MEMGPT = True

## api keys for the memGPT
openai.api_base = "https://ekisktiz8hegao-5001.proxy.runpod.net/v1"
openai.api_key = "NULL"

# The user agent
user_proxy = autogen.UserProxyAgent(
    name="User_proxy",
    system_message="A human admin.",
    code_execution_config={"last_n_messages": 2, "work_dir": "groupchat"},
    human_input_mode="TERMINATE",  # needed?
    default_auto_reply="You are going to figure all out by your own. "
    "Work by yourself, the user won't reply until you output `TERMINATE` to end the conversation.",
)

interface = autogen_interface.AutoGenInterface()
persistence_manager = InMemoryStateManager()
persona = "I am a 10x engineer, trained in Python. I was the first engineer at Uber."
human = "Im a team manager at this company"
memgpt_agent = presets.use_preset(presets.DEFAULT_PRESET, model='gpt-4', persona=persona, human=human, interface=interface, persistence_manager=persistence_manager, agent_config=llm_config)

if not USE_MEMGPT:
    # In the AutoGen example, we create an AssistantAgent to play the role of the coder
    coder = autogen.AssistantAgent(
        name="Coder",
        llm_config=llm_config,
        system_message=f"I am a 10x engineer, trained in Python. I was the first engineer at Uber",
        human_input_mode="TERMINATE",
    )
else:
    # In our example, we swap this AutoGen agent with a MemGPT agent
    # This MemGPT agent will have all the benefits of MemGPT, ie persistent memory, etc.
    print("\nMemGPT Agent at work\n")
    coder = memgpt_autogen.MemGPTAgent(
        name="MemGPT_coder",
        agent=memgpt_agent,
    )

# Begin the group chat with a message from the user
user_proxy.initiate_chat(
    coder,
    message="Write a Function to print Numbers 1 to 10"
)
Interview with MemGPT Co-Creator Charles Packer
For more information on the creators of MemGPT, also consider watching this video interview with one of its co-creators, UC Berkeley PhD student Charles Packer.
MemGPT as an Operating System
MemGPT draws inspiration from the virtual memory concept in operating systems and is innovatively applied to large language models to create an expansive context space. This innovation shines in scenarios like continuous conversations where traditional limitations on context length pose a challenge. By enabling large language models to handle their memory, MemGPT circumvents the usual restrictions set by fixed context lengths.
Limitations of MemGPT
First, it is essential to be aware that MemGPT is an emerging project that is still undergoing enhancements. The creators have established a Discord group to foster idea-sharing and enable direct interaction with them; you are welcome to join at https://discord.gg/9GEQrxmVyE.
Data Sensitivity: MemGPT’s reliance on previous interactions for context can raise concerns regarding data privacy and sensitivity, especially in scenarios involving personal or confidential information.
Contextual Misinterpretations: While adept at handling extended conversations, MemGPT can occasionally misinterpret context, especially in nuanced or emotionally charged communications, leading to responses that may seem out of touch.
Resource Intensity: The system demands significant computational resources for optimal functionality, particularly for processing large volumes of data or maintaining extensive conversation histories.
Dependency on Quality Training Data: MemGPT’s effectiveness is closely tied to the quality of training data. Biased, inaccurate, or incomplete data can hinder the learning process, affecting the quality of interactions.
Adaptation to Diverse Discourses: The system’s ability to adapt to varying communication styles or understand different dialects and cultural nuances is still a work in progress, occasionally affecting its versatility in global or multicultural scenarios.
MemGPT vs Sparse Priming Representations (SPR)
MemGPT:
Inspiration: Takes cues from hierarchical memory systems used in traditional operating systems.
Functionality: Implements a tiered memory system that allows an LLM to extend its context window by managing which information is stored or retrieved, and when this should happen.
Structure: Comprises a Main Context (analogous to an OS’s main memory) and an External Context (similar to secondary storage).
Utility: Aims to revolutionize LLMs’ capabilities in tasks that involve unbounded context, such as long-form conversations and detailed document analysis.
Sparse Priming Representations (SPR):
Inspiration: Modeled after human memory organization and retrieval systems, focusing on critical information.
Functionality: Enhances memory system efficiency by creating concise primers that represent complex ideas, supporting the accuracy in understanding and recall.
Approach: Prioritizes intuitive and user-friendly memory management, akin to how humans naturally process and store information.
Utility: Focused on making LLMs more efficient in knowledge retrieval and learning, improving user engagement and communication tools.
Technical Implementation:
MemGPT:
Utilizes a structured approach for memory tier management, allowing for effective data movement and context management.
Tailored for scalability in dealing with large datasets and complex, extended tasks.
SPR:
Uses a method of creating primers that act as a distillation of complex information, allowing for a more intuitive memory management experience.
Geared towards mimicking human cognitive processes for better learning and communication outcomes.
Applications and Implications:
MemGPT:
May greatly benefit applications that require processing of large amounts of data over extended periods, like in-depth analysis and ongoing interactions.
SPR:
Could significantly enhance tools for learning and communication by providing users with easy-to-understand summaries or primers of complex topics.
Community and Engagement:
MemGPT:
Offers an open-source platform for developers and researchers to contribute to and enhance the capabilities of the memory management system.
SPR:
Encourages community involvement through contributions of new examples, research, and tools to improve the system’s efficiency and intuitiveness.
In conclusion, both MemGPT and SPR are innovative responses to the challenges of memory management in LLMs, each with its own philosophy and methodology. MemGPT is more structural and system-oriented, potentially better for tasks that need management of extensive contexts. SPR is more user-centric and intuitive, possibly better for learning and communication by simplifying complex information.
While both aim to enhance LLMs’ handling of context, their underlying philosophies and expected applications differ, reflecting the diversity of approaches in advancing AI and ML capabilities. The ongoing developments and community contributions in both these areas show a vibrant and collaborative effort to push the boundaries of what’s possible with memory management in LLMs.
Conclusion
MemGPT stands as a testament to the power of innovation in AI, bridging the gap between what LLMs can do and what we aspire for them to achieve. As we march towards the future, the vision of LLMs as comprehensive operating systems doesn’t seem far-fetched—it’s nearly within our grasp, and MemGPT is leading the charge. What do you think?
The wonders of the human brain’s capacity for memory storage and recall are a perpetual source of amazement. In a parallel quest within artificial intelligence, researchers are tirelessly forging paths toward endowing AI with akin faculties. The forefront of this endeavor is marked by the innovation of Sparse Priming Representation (SPR), a sophisticated methodology poised to revolutionize AI’s efficiency in memory handling. This thorough exposition will navigate the intricacies of SPR, elucidating its potential to redefine the horizons of AI’s future.
What is Sparse Priming Representation (SPR)?
SPR is a novel memory organization methodology inspired by human memory systems. It condenses complex ideas into concise, context-driven lists of statements, thereby enabling rapid understanding and recollection of these ideas by both machines and humans. The core features of SPR include:
Minimalistic Representation: It stores complex ideas using a minimal set of keywords or phrases.
Context Preservation: It keeps the surrounding context intact for accurate reconstruction.
Quick Retrieval: It enables fast recall of stored information.
Challenges Addressed
In the era of Big Data, terms like “data overload” and “information glut” are becoming commonplace. As machine learning models evolve, the amount of data they process and store also balloons, necessitating efficient memory systems like SPR. Data overload refers to an excess of incoming data, making it challenging to manage. An information glut is about having too much information, making it hard to discern what is crucial.
The applications of SPR extend across various domains
Artificial Intelligence: SPR improves memory organization in Large Language Models (LLMs).
Information Management: It aids in data categorization and retrieval.
Education: SPR facilitates better understanding and retention of complex subjects.
Deep Dive into AI Training
The process of training AI involves several methods, each with its own set of challenges:
Initial Bulk Training: Exorbitantly expensive and often impractical.
Fine-tuning: Limited utility for knowledge retrieval.
Online Learning: The commercial viability is still questionable.
In-context Learning: Currently the most feasible solution.
SPR’s token efficiency optimizes memory organization, especially in Retrieval-Augmented Generation (RAG) systems, overcoming constraints like the context window.
Exploring Latent Space
Latent space in AI models holds immense potential. SPR leverages this underutilized feature, enabling what’s known as associative learning. By using a few keywords or statements, SPR can prime an AI model to understand complex ideas, even those outside its original training data.
Benefits and Features of SPR
SPR mimics human memory efficiency by storing information in compressed, contextually relevant forms. Its methodology focuses on reducing information to its most essential elements while retaining the necessary context for accurate reconstruction.
Applicability: Used by subject matter experts and large language models (LLMs) to reconstruct complex concepts quickly.
Human Memory Efficiency:
Stores information in compressed, contextually relevant forms.
Utilizes sparse, interconnected representations for quick recall and synthesis of new ideas.
SPR Methodology:
Focuses on reducing information to its most essential elements.
Retains the context necessary for accurate reconstruction using short, complete sentences.
Practical Applications:
Domains include artificial intelligence, information management, and education.
It can improve LLM performance, optimize memory organization, and facilitate effective learning and communication tools.
Limitations in Teaching LLMs:
Initial bulk training: Expensive.
Fine-tuning: This may not be helpful for knowledge retrieval.
Online Learning: Uncertain commercial viability.
In-context Learning: Currently the only viable method.
Current Trends:
Retrieval-Augmented Generation (RAG) is popular and typically relies on vector databases and Knowledge Graphs (KGs).
Common question: “How do we overcome context window limitations?” Short answer: you generally can’t.
Role of Latent Space:
LLMs possess a unique capability similar to human associative learning.
They can be “primed” to think in a certain way or to understand complex, novel ideas outside their training distribution.
Token-Efficiency with SPR:
SPRs are used to convey complex concepts efficiently for in-context learning.
Stored as metadata in Knowledge Graph nodes and fed to the LLM at inference, bypassing the need for raw, human-readable data.
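To make that last point concrete, here is a purely illustrative sketch (all names and the sample data are hypothetical) of keeping an SPR as metadata on a knowledge-graph node and injecting the SPR, rather than the raw document, into the prompt at inference time:
Python
# Illustrative only: the raw document stays in cold storage; only the SPR is
# placed in the context window at inference time.
knowledge_graph = {
    "acme_q3_report": {
        "raw_text_uri": "s3://docs/acme_q3_report.pdf",  # raw data is never sent to the model
        "spr": (
            "ACME Q3: revenue up; churn concentrated in SMB tier; "
            "APAC launch delayed; hiring freeze lifted."
        ),
    }
}


def build_prompt(node_id: str, question: str) -> str:
    # Inject only the compressed primer into the prompt.
    spr = knowledge_graph[node_id]["spr"]
    return f"Context (SPR):\n{spr}\n\nQuestion: {question}"


print(build_prompt("acme_q3_report", "What happened to churn last quarter?"))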
How to use Sparse Priming Representation (SPR)?
SPR Generator: Use this to compress any arbitrary text block into an SPR.
Markdown
# MISSION
You are a Sparse Priming Representation (SPR) writer. An SPR is a particular kind of use of language for advanced NLP, NLU, and NLG tasks, particularly useful for the latest generation of Large Language Models (LLMs). You will be given information by the USER which you are to render as an SPR.

# THEORY
LLMs are a kind of deep neural network. They have been demonstrated to embed knowledge, abilities, and concepts, ranging from reasoning to planning, and even to theory of mind. These are called latent abilities and latent content, collectively referred to as latent space. The latent space of an LLM can be activated with the correct series of words as inputs, which will create a useful internal state of the neural network. This is not unlike how the right shorthand cues can prime a human mind to think in a certain way. Like human minds, LLMs are associative, meaning you only need to use the correct associations to "prime" another model to think in the same way.

# METHODOLOGY
Render the input as a distilled list of succinct statements, assertions, associations, concepts, analogies, and metaphors. The idea is to capture as much, conceptually, as possible but with as few words as possible. Write it in a way that makes sense to you, as the future audience will be another language model, not a human.
SPR Decompressor: Use this to reconstruct an SPR into an original.
Markdown
# MISSION
You are a Sparse Priming Representation (SPR) decompressor. An SPR is a particular kind of use of language for advanced NLP, NLU, and NLG tasks, particularly useful for the latest generation of Large Language Models (LLMs). You will be given an SPR and your job is to fully unpack it.

# THEORY
LLMs are a kind of deep neural network. They have been demonstrated to embed knowledge, abilities, and concepts, ranging from reasoning to planning, and even to theory of mind. These are called latent abilities and latent content, collectively referred to as latent space. The latent space of an LLM can be activated with the correct series of words as inputs, which will create a useful internal state of the neural network. This is not unlike how the right shorthand cues can prime a human mind to think in a certain way. Like human minds, LLMs are associative, meaning you only need to use the correct associations to "prime" another model to think in the same way.

# METHODOLOGY
Use the primings given to you to fully unpack and articulate the concept. Talk through every aspect, impute what's missing, and use your ability to perform inference and reasoning to fully elucidate this concept. Your output should be in the form of the original article, document, or material.
Let's do this in ChatGPT. Open the ChatGPT custom instructions and paste in the SPR Generator.
Click on Save, then insert the text you want to compress.
Markdown
# Sparse Priming Representations (SPR)

Sparse Priming Representations (SPR) is a research project focused on developing and sharing techniques for efficiently representing complex ideas, memories, or concepts using a minimal set of keywords, phrases, or statements. This enables language models or subject matter experts to quickly reconstruct the original idea with minimal context. SPR aims to mimic the natural human process of recalling and recombining sparse memory representations, thus facilitating efficient knowledge storage and retrieval.

# Theory and Reasoning

Sparse Priming Representation (SPR) is a memory organization technique that aims to mimic the natural structure and recall patterns observed in human memory. The fundamental idea behind SPR is to distill complex ideas, concepts, or knowledge into a concise, context-driven list of statements that allows subject matter experts (SMEs) or large language models (LLMs) to reconstruct the full idea efficiently.

Human memory is known for its efficiency in storing and recalling information in a highly compressed and contextually relevant manner. Our brains often store memories as sparse, interconnected representations that can be quickly combined, modified, and recalled when needed. This enables us to make associations, draw inferences, and synthesize new ideas with minimal cognitive effort.

SPR leverages this insight by focusing on reducing information to its most essential elements while retaining the context required for accurate reconstruction. By using short, complete sentences to convey the core aspects of an idea, SPR enables faster understanding and recall, mirroring the way our brains handle information.

In addition to its efficiency, SPR has practical applications in various domains, such as artificial intelligence, information management, and education. It can be utilized to improve the performance of LLMs in handling large data volumes and optimizing memory organization. Furthermore, it can help students and professionals alike to better understand, retain, and communicate complex concepts.

In summary, Sparse Priming Representation offers a human-like approach to memory organization and retrieval, focusing on the most critical aspects of information while preserving the context needed for accurate understanding and recall. By implementing SPR, we can improve the efficiency of memory systems and create more effective learning and communication tools.

# Sparse Priming Representation

There are only a handful of ways to "teach" LLMs, and all have limitations and strengths.

1. Initial bulk training: Ludicrously expensive
2. Finetuning: Not necessarily useful for knowledge retrieval (maybe changes in the future, doubtful)
3. Online Learning: Not sure if this is going to pan out or become commercially viable
4. In-context Learning: Presently, the only viable solution

Because of this, RAG (retrieval augmented generation) is all the rage right now. Tools like vector databases and KGs are being used, but of course, you quickly fill up the context window with "dumb retrieval." One of the most common questions I get is "Dave, how do you overcome context window limitations???" The short answer is: YOU DON'T. STOP WASTING YOUR TIME. There is one asterisk there, though.

Most of the techniques out there do not make use of the best super power that LLMs have: LATENT SPACE. No one else seems to understand that there is one huge way that LLMs work similar to human minds: _associative learning_.

Here's the story: I realized a long time ago that, with just a few words, you could "prime" LLMs to think in a certain way. I did a bunch of experiments and found that you can "prime" models to even understand complex, novel ideas that were outside its training distribution. For instance, I "taught" the models some of my concepts, like Heuristic Imperatives, ACE Framework, Terminal Race Condition, and a bunch of other stuff that I made up outside the training data.

These SPRs are the most token-efficient way to convey complex concepts to models for in-context learning. What you do is you compress huge blocks of information, be it company data, chat logs, specific events, or whatever, into SPRs and then you store the SPR in the metadata of your KG node or whatever. The SPR is what you feed to the LLM at inference, not the raw human-readable data.

## SPR Generator

Use this to compress any arbitrary block of text into an SPR.

```markdown
# MISSION
You are a Sparse Priming Representation (SPR) writer. An SPR is a particular kind of use of language for advanced NLP, NLU, and NLG tasks, particularly useful for the latest generation Large Language Models (LLMs). You will be given information by the USER which you are to render as an SPR.

# THEORY
LLMs are a kind of deep neural network. They have been demonstrated to embed knowledge, abilities, and concepts, ranging from reasoning to planning, and even to theory of mind. These are called latent abilities and latent content, collectively referred to as latent space. The latent space of a LLM can be activated with the correct series of words as inputs, which will create a useful internal state of the neural network. This is not unlike how the right shorthand cues can prime a human mind to think in a certain way. Like human minds, LLMs are associative, meaning you only need to use the correct associations to "prime" another model to think in the same way.

# METHODOLOGY
Render the input as a distilled list of succinct statements, assertions, associations, concepts, analogies, and metaphors. The idea is to capture as much, conceptually, as possible but with as few words as possible. Write it in a way that makes sense to you, as the future audience will be another language model, not a human.
```

## SPR Decompressor

Use this to reconstruct an SPR into an original.

```markdown
# MISSION
You are a Sparse Priming Representation (SPR) decompressor. An SPR is a particular kind of use of language for advanced NLP, NLU, and NLG tasks, particularly useful for the latest generation Large Language Models (LLMs). You will be given an SPR and your job is to fully unpack it.

# THEORY
LLMs are a kind of deep neural network. They have been demonstrated to embed knowledge, abilities, and concepts, ranging from reasoning to planning, and even to theory of mind. These are called latent abilities and latent content, collectively referred to as latent space. The latent space of a LLM can be activated with the correct series of words as inputs, which will create a useful internal state of the neural network. This is not unlike how the right shorthand cues can prime a human mind to think in a certain way. Like human minds, LLMs are associative, meaning you only need to use the correct associations to "prime" another model to think in the same way.

# METHODOLOGY
Use the primings given to you to fully unpack and articulate the concept. Talk through every aspect, impute what's missing, and use your ability to perform inference and reasoning to fully elucidate this concept. Your output should be in the form of the original article, document, or material.
```

## Other Resources

If you'd like a bit more on information theory, check out this video and Medium article I wrote:
- Beyond Vector Search: Knowledge Management with Generative AI: https://youtu.be/YjdmYCd6y0M
- Medium: https://medium.com/@dave-shap/beyond-vector-search-knowledge-management-with-generative-ai-6c2d10b481a0
This is the compressed text.
Markdown
# SPR: Sparse Priming Representations

## Conceptual Foundation
- Minimize data to essence; mimic human recall.
- Keywords, phrases as recall cues.
- Efficient memory, retrieval akin to human neural patterns.

## Applications
- AI, information management, education.
- Optimizes LLMs for data and memory handling.

## Methodologies
- Initial training: Costly.
- Finetuning: Limited for retrieval.
- Online learning: Uncertain viability.
- In-context learning: Current method.

## RAG & Context Limitations
- Retrieval augmented generation: Popular yet flawed.
- Context window: Not expandable; focus on utility, not size.

## LLMs and Associative Learning
- Latent space: Underused in LLMs.
- Association: Key to LLM potential.
- Priming: Few words trigger complex understanding.

## SPR in Practice
- SPR Generator: Condense text to SPR.
- SPR Decompressor: Reconstruct SPR to original content.
- Storage: SPR in metadata, not raw data.

## Resources
- Information theory.
- Generative AI in knowledge management.
Now, let's unpack the text. Go back to the ChatGPT custom instructions, paste in the SPR Decompressor, and click on Save.
In a new chat, paste the compressed text, and the magic will happen: you will get the full original text back.
Sparse Priming Representations – the secret ingredient to scalable AGI memories
Now, let's do this in a Google Colab notebook using the OpenAI API. Follow the code below.
Python
# Commented out IPython magic to ensure Python compatibility.
# %pip install -q litellm openai rich

"""## System Prompt"""

SYSTEM_PROMPT = """# MISSION
You are a Sparse Priming Representation (SPR) writer. An SPR is a particular kind of use of language for advanced NLP, NLU, and NLG tasks, particularly useful for the latest generation Large Language Models (LLMs). You will be given information by the USER which you are to render as an SPR.

# THEORY
LLMs are a kind of deep neural network. They have been demonstrated to embed knowledge, abilities, and concepts, ranging from reasoning to planning, and even to theory of mind. These are called latent abilities and latent content, collectively referred to as latent space. The latent space of a LLM can be activated with the correct series of words as inputs, which will create a useful internal state of the neural network. This is not unlike how the right shorthand cues can prime a human mind to think in a certain way. Like human minds, LLMs are associative, meaning you only need to use the correct associations to "prime" another model to think in the same way.

# METHODOLOGY
Render the input as a distilled list of succinct statements, assertions, associations, concepts, analogies, and metaphors. The idea is to capture as much, conceptually, as possible but with as few words as possible. Write it in a way that makes sense to you, as the future audience will be another language model, not a human."""

"""## Unpack Prompt"""

UNPACK_PROMPT = """# MISSION
You are a Sparse Priming Representation (SPR) decompressor. An SPR is a particular kind of use of language for advanced NLP, NLU, and NLG tasks, particularly useful for the latest generation Large Language Models (LLMs). You will be given an SPR and your job is to fully unpack it.

# THEORY
LLMs are a kind of deep neural network. They have been demonstrated to embed knowledge, abilities, and concepts, ranging from reasoning to planning, and even to theory of mind. These are called latent abilities and latent content, collectively referred to as latent space. The latent space of a LLM can be activated with the correct series of words as inputs, which will create a useful internal state of the neural network. This is not unlike how the right shorthand cues can prime a human mind to think in a certain way. Like human minds, LLMs are associative, meaning you only need to use the correct associations to "prime" another model to think in the same way.

# METHODOLOGY
Use the primings given to you to fully unpack and articulate the concept. Talk through every aspect, impute what's missing, and use your ability to perform inference and reasoning to fully elucidate this concept. Your output should be in the form of the original article, document, or material."""

"""## Load Document To Pack"""

precompressed_doc = """# Sparse Priming Representations (SPR)
Sparse Priming Representations (SPR) is a research project focused on developing and sharing techniques for efficiently representing complex ideas, memories, or concepts using a minimal set of keywords, phrases, or statements. This enables language models or subject matter experts to quickly reconstruct the original idea with minimal context. SPR aims to mimic the natural human process of recalling and recombining sparse memory representations, thus facilitating efficient knowledge storage and retrieval.

# Theory and Reasoning
Sparse Priming Representation (SPR) is a memory organization technique that aims to mimic the natural structure and recall patterns observed in human memory. The fundamental idea behind SPR is to distill complex ideas, concepts, or knowledge into a concise, context-driven list of statements that allows subject matter experts (SMEs) or large language models (LLMs) to reconstruct the full idea efficiently.

Human memory is known for its efficiency in storing and recalling information in a highly compressed and contextually relevant manner. Our brains often store memories as sparse, interconnected representations that can be quickly combined, modified, and recalled when needed. This enables us to make associations, draw inferences, and synthesize new ideas with minimal cognitive effort.

SPR leverages this insight by focusing on reducing information to its most essential elements while retaining the context required for accurate reconstruction. By using short, complete sentences to convey the core aspects of an idea, SPR enables faster understanding and recall, mirroring the way our brains handle information.

In addition to its efficiency, SPR has practical applications in various domains, such as artificial intelligence, information management, and education. It can be utilized to improve the performance of LLMs in handling large data volumes and optimizing memory organization. Furthermore, it can help students and professionals alike to better understand, retain, and communicate complex concepts.

In summary, Sparse Priming Representation offers a human-like approach to memory organization and retrieval, focusing on the most critical aspects of information while preserving the context needed for accurate understanding and recall. By implementing SPR, we can improve the efficiency of memory systems and create more effective learning and communication tools.

# Sparse Priming Representation
There are only a handful of ways to "teach" LLMs, and all have limitations and strengths.
1. Initial bulk training: Ludicrously expensive
2. Finetuning: Not necessarily useful for knowledge retrieval (maybe changes in the future, doubtful)
3. Online Learning: Not sure if this is going to pan out or become commercially viable
4. In-context Learning: Presently, the only viable solution

Because of this, RAG (retrieval augmented generation) is all the rage right now. Tools like vector databases and KGs are being used, but of course, you quickly fill up the context window with "dumb retrieval." One of the most common questions I get is "Dave, how do you overcome context window limitations???" The short answer is: YOU DON'T. STOP WASTING YOUR TIME.

There is one asterisk there, though.

Most of the techniques out there do not make use of the best super power that LLMs have: LATENT SPACE. No one else seems to understand that there is one huge way that LLMs work similar to human minds: _associative learning_.

Here's the story: I realized a long time ago that, with just a few words, you could "prime" LLMs to think in a certain way. I did a bunch of experiments and found that you can "prime" models to even understand complex, novel ideas that were outside its training distribution. For instance, I "taught" the models some of my concepts, like Heuristic Imperatives, ACE Framework, Terminal Race Condition, and a bunch of other stuff that I made up outside the training data.

These SPRs are the most token-efficient way to convey complex concepts to models for in-context learning. What you do is you compress huge blocks of information, be it company data, chat logs, specific events, or whatever, into SPRs and then you store the SPR in the metadata of your KG node or whatever. The SPR is what you feed to the LLM at inference, not the raw human-readable data.

## SPR Generator
Use this to compress any arbitrary block of text into an SPR.
```markdown
# MISSION
You are a Sparse Priming Representation (SPR) writer. An SPR is a particular kind of use of language for advanced NLP, NLU, and NLG tasks, particularly useful for the latest generation Large Language Models (LLMs). You will be given information by the USER which you are to render as an SPR.

# THEORY
LLMs are a kind of deep neural network. They have been demonstrated to embed knowledge, abilities, and concepts, ranging from reasoning to planning, and even to theory of mind. These are called latent abilities and latent content, collectively referred to as latent space. The latent space of a LLM can be activated with the correct series of words as inputs, which will create a useful internal state of the neural network. This is not unlike how the right shorthand cues can prime a human mind to think in a certain way. Like human minds, LLMs are associative, meaning you only need to use the correct associations to "prime" another model to think in the same way.

# METHODOLOGY
Render the input as a distilled list of succinct statements, assertions, associations, concepts, analogies, and metaphors. The idea is to capture as much, conceptually, as possible but with as few words as possible. Write it in a way that makes sense to you, as the future audience will be another language model, not a human.
```

## SPR Decompressor
Use this to reconstruct an SPR into an original.
```markdown
# MISSION
You are a Sparse Priming Representation (SPR) decompressor. An SPR is a particular kind of use of language for advanced NLP, NLU, and NLG tasks, particularly useful for the latest generation Large Language Models (LLMs). You will be given an SPR and your job is to fully unpack it.

# THEORY
LLMs are a kind of deep neural network. They have been demonstrated to embed knowledge, abilities, and concepts, ranging from reasoning to planning, and even to theory of mind. These are called latent abilities and latent content, collectively referred to as latent space. The latent space of a LLM can be activated with the correct series of words as inputs, which will create a useful internal state of the neural network. This is not unlike how the right shorthand cues can prime a human mind to think in a certain way. Like human minds, LLMs are associative, meaning you only need to use the correct associations to "prime" another model to think in the same way.

# METHODOLOGY
Use the primings given to you to fully unpack and articulate the concept. Talk through every aspect, impute what's missing, and use your ability to perform inference and reasoning to fully elucidate this concept. Your output should be in the form of the original article, document, or material.
```

## Other Resources
If you'd like a bit more on information theory, check out this video and Medium article I wrote:
- Beyond Vector Search: Knowledge Management with Generative AI: https://youtu.be/YjdmYCd6y0M
- Medium: https://medium.com/@dave-shap/beyond-vector-search-knowledge-management-with-generative-ai-6c2d10b481a0"""

"""## Pack Document"""

from litellm import completion

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": precompressed_doc},
]

import openai
openai.api_key = '<INSERT YOUR OPEN AI KEY HERE>'

response = completion(model="gpt-4", messages=messages)
packed_answer = response.choices[0].message.content

from rich.markdown import Markdown
Markdown(packed_answer)

"""## Unpack Answer"""

messages = [
    {"role": "system", "content": UNPACK_PROMPT},
    {"role": "user", "content": packed_answer},
]

response = completion(model="gpt-4", messages=messages)
postcompressed_doc = response.choices[0].message.content
Markdown(postcompressed_doc)

"""## Comparison"""

from rich.table import Table

def compare_docs(doc_1: str, doc_2: str):
    table = Table(title="String Comparison")
    table.add_column("Pre-Compression")
    table.add_column("Post-Compression")
    table.add_row(Markdown(doc_1), Markdown(doc_2))
    return table

compare_docs(precompressed_doc, postcompressed_doc)
Sparse Priming Representation (SPR) embodies the progressive nature of language models and their applications. Through the exploration and utilization of the latent space inherent in Large Language Models (LLMs), SPR paves the way for a more refined method to tackle complex tasks with accuracy and efficacy. With the ongoing expansion of the Natural Language Processing (NLP) domain, innovative methodologies like SPR are poised to influence its trajectory significantly. As we push the boundaries of AI capabilities, SPR is a cornerstone in creating machines that can think and learn more like humans. The technique not only bridges the gap between human cognition and machine intelligence but also promises a future where machines become more efficient and relatable.
In my next blog post, I will discuss MemGPT so you can compare it with SPR.
MemGPT is an AI project that aims to improve the memory capabilities of artificial intelligence models. It enables AI systems to effectively remember and recall information during conversations, making them better at tasks like long-term chat and document analysis.