MemGPT: Unlimited Memory without Token Constraints for Generative AI Platforms like GPT-4, LaMDA, PaLM, LLaMA, Claude, and Others

The field of conversational AI has witnessed a substantial transformation with the emergence of large language models (LLMs) such as GPT-4, LaMDA, PaLM, LLaMA, Claude, and others. These sophisticated models, founded on transformer architectures, have redefined the possibilities of natural language processing, paving the way for a myriad of applications across both consumer and enterprise sectors. However, despite this leap forward, LLMs are still bound by a significant limitation—their context window size. This bottleneck restricts their ability to manage extended dialogues and analyze lengthy documents efficiently. But what if there were a way to circumvent this limitation?

What is MemGPT?

MemGPT, standing for Memory-GPT, is a system devised to enhance the performance of Large Language Models (LLMs) by introducing a more advanced memory management scheme, helping to overcome the challenges posed by fixed context windows. Below are some of the key features of MemGPT:

  1. Memory Management: MemGPT incorporates a tiered memory system into a fixed-context LLM processor, granting it the ability to manage its own memory. By intelligently handling different memory tiers, it extends the context available within the limited context window of the LLM, addressing the issue of constrained context windows common in large language models.
  2. Virtual Context Management: MemGPT introduces a method known as virtual context management, which gives the LLM the illusion of a much larger context window by paging information into and out of the fixed window as needed, much as virtual memory extends physical RAM.
  3. Operating System-Inspired: The architecture of MemGPT draws inspiration from traditional operating systems, especially their hierarchical memory systems that facilitate data movement between fast and slow memory. This approach enables effective memory resource management, similar to how operating systems provide the illusion of large memory resources to applications through virtual memory paging.
  4. Interruption Handling: MemGPT employs interrupts to manage the control flow between itself and the user, ensuring smooth interaction and effective memory management during operations.
  5. Extended Conversational Context: Through effective memory management, MemGPT facilitates extended conversational context, allowing for longer and more coherent interactions that surpass the limitations imposed by fixed-length context windows.

In essence, MemGPT represents a significant step forward in the utilization of Large Language Models, creating a pathway for more effective and extended interactions that resemble human discourse by smartly managing memory resources.

For more information, you can visit the official MemGPT website.

How does MemGPT Work?

MemGPT gives LLMs a feedback loop between user events, searching virtual context, and performing a function (source)

Imagine your computer’s OS, which deftly manages applications and data across RAM and disk storage, providing seamless access to resources beyond the physical memory limits. MemGPT mirrors this concept by coordinating different memory tiers within an LLM. It includes:

  1. Main Context: Analogous to RAM, this is the immediate context the LLM processor works with during inference.
  2. External Context: Similar to a hard drive, this stores information beyond the LLM’s direct reach but can be accessed when needed.
  3. Interrupts: Like an OS interrupt, MemGPT can pause and resume the processor, managing the control flow with the user.

This architecture allows for dynamic context management, enabling the LLM to retrieve relevant historical data akin to how an OS handles page faults.
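To make the analogy concrete, here is a minimal, hypothetical sketch of a two-tier context with eviction to external storage. The class and method names are illustrative, not MemGPT's actual API, and the word-count "tokenizer" is a deliberate simplification:

```python
# Sketch of MemGPT-style tiered context management (illustrative only).

class TieredContext:
    def __init__(self, max_main_tokens=8):
        self.max_main_tokens = max_main_tokens  # "RAM": the LLM's fixed window
        self.main_context = []                  # messages currently in the window
        self.external_context = []              # "disk": messages paged out

    def tokens(self, message):
        # Crude token estimate: one token per word
        return len(message.split())

    def add(self, message):
        self.main_context.append(message)
        # Evict the oldest messages to external storage when over budget,
        # like an OS paging memory out to disk
        while sum(self.tokens(m) for m in self.main_context) > self.max_main_tokens:
            self.external_context.append(self.main_context.pop(0))

ctx = TieredContext(max_main_tokens=8)
for msg in ["hello there", "how are you today", "tell me about paging", "ok"]:
    ctx.add(msg)

print(ctx.main_context)      # the most recent messages that still fit
print(ctx.external_context)  # older messages paged out
```

In the real system the LLM itself decides, via function calls, what to move between tiers; this sketch only shows the mechanical part of the analogy.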

What problem does MemGPT solve?

MemGPT addresses several challenges associated with language modeling, particularly enhancing the capabilities of existing large language models (LLMs) like GPT-3. Here are the key problems it resolves:

  1. Long-term Context Retention:
    MemGPT introduces solutions for managing long-term context, a significant hurdle in advancing language modeling. By effectively managing memory, it can retain and access information over extended sequences, which is crucial for understanding and generating coherent responses in conversations or documents with many interactions or long texts.
  2. Enhanced Memory Management:
    It employs a tiered memory system, data transfer functions, and control via interrupts to manage memory efficiently. This setup enhances fixed-context LLMs, allowing them to handle tasks like document analysis and multi-session chat more effectively, overcoming the inherent context limitations in modern LLMs for better performance and user interactions.
  3. Extended Context Window:
    MemGPT effectively extends the context window of LLMs, enabling them to manage different memory tiers intelligently. This extended context is crucial for LLMs to have a more in-depth understanding and generate more coherent and contextually relevant responses over a series of interactions.
  4. Improved Interaction with Chatbots:
    By utilizing a memory hierarchy, MemGPT allows chatbots to access and modify information beyond their limited context window, facilitating more meaningful and prolonged interactions with users. This memory hierarchy enables the chatbot to move data between different layers of memory, ensuring relevant information is readily accessible when needed.

Through these solutions, MemGPT significantly bridges the gap between memory management and generative capacity in language modeling, paving the way for more sophisticated applications in various domains.
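The "access information beyond the context window" idea above can be sketched as a retrieval step over the external tier. The scoring below is a toy keyword-overlap stand-in for MemGPT's real retrieval, which the system exposes to the LLM as callable functions:

```python
# Illustrative recall over "paged-out" history: score archived messages
# by keyword overlap with the query and return the best matches.

def recall(external_context, query, top_k=1):
    query_words = set(query.lower().split())
    scored = [
        (len(query_words & set(msg.lower().split())), msg)
        for msg in external_context
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only matches that share at least one word with the query
    return [msg for score, msg in scored[:top_k] if score > 0]

archive = [
    "user: my dog is called Rex",
    "user: I live in Berlin",
    "assistant: noted, Berlin is lovely",
]
print(recall(archive, "what is the name of my dog"))
```

A production system would use embeddings rather than word overlap, but the control flow is the same: on a "miss" in main context, search external context and page the result back in.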

Comparing context lengths of commonly used models / APIs (data collected 9/2023).

*Assuming a preprompt of 1k tokens, and an average message size of ∼50 tokens (∼250 characters).
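Under those assumptions (a 1k-token preprompt, ~50 tokens per message), the number of messages a fixed window can hold follows from simple arithmetic; the window sizes below are examples, not part of the original table:

```python
# How many ~50-token messages fit in a fixed context window,
# given a 1k-token preprompt (the article's assumptions)?

def max_messages(context_window, preprompt_tokens=1000, tokens_per_message=50):
    return (context_window - preprompt_tokens) // tokens_per_message

for window in (4096, 8192, 32768):  # e.g. 4k, 8k, and 32k context models
    print(window, max_messages(window))
```

Even a 32k window caps out at a few hundred messages, which is why paging older turns out to external storage matters for long-running conversations.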

How to install MemGPT

pip install pymemgpt

Add your OpenAI API key to your environment:

export OPENAI_API_KEY=YOUR_API_KEY # on Linux/Mac
$Env:OPENAI_API_KEY = "YOUR_API_KEY" # on Windows (PowerShell)

Configure default settings for MemGPT by running:

memgpt configure

Now, you can run MemGPT with:

memgpt run

The run command supports the following optional flags (if set, will override config defaults):

  • --agent: (str) Name of agent to create or to resume chatting with.
  • --human: (str) Name of the human to run the agent with.
  • --persona: (str) Name of agent persona to use.
  • --model: (str) LLM model to run [gpt-4, gpt-3.5].
  • --preset: (str) MemGPT preset to run agent with.
  • --first: (str) Allow the user to send the first message.
  • --debug: (bool) Show debug logs (default=False)
  • --no-verify: (bool) Bypass message verification (default=False)
  • --yes/-y: (bool) Skip confirmation prompt and use defaults (default=False)

You can run the following commands in the MemGPT CLI prompt:

  • /exit: Exit the CLI
  • /attach: Attach a loaded data source to the agent
  • /save: Save a checkpoint of the current agent/conversation state
  • /dump: View the current message log (see the contents of main context)
  • /memory: Print the current contents of agent memory
  • /pop: Undo the last message in the conversation
  • /heartbeat: Send a heartbeat system message to the agent
  • /memorywarning: Send a memory warning system message to the agent

You can find more information in the official GitHub repository.

MemGPT for OpenAI Setup

Matthew Berman has produced a great review of the original MemGPT research paper, along with the initial setup for OpenAI API users.

Note that in the video tutorial Matthew uses a Conda environment, but this isn’t strictly necessary; a standard .venv virtual environment works as well.

MemGPT and Open Source Models Setup

In this video, Matthew Berman covers a quick setup for using MemGPT with open-source models like LLaMA, Airoboros, and Mistral via RunPod. Although this may sound complicated, it’s really not too difficult, and it offers great potential cost savings versus using the OpenAI API.

Note open-source model support is still in early-stage development.

MemGPT and Autogen Setup

AutoGen is a tool for building LLM applications in which multiple agents converse with one another to complete tasks, such as brainstorming a business proposal. These AutoGen agents can be tailored, they can chat, and they easily let humans join the conversation. In this tutorial, Matthew Berman explains how to expand the memory of these AI agents by combining AutoGen with MemGPT.

AutoGEN and MemGPT and Local LLM Complete Tutorial

Created by Prompt Engineer, this 30-minute video covers in detail all the steps required to get this combination of tools running on RunPod. As Prompt Engineer explains, the tutorial took quite a long time to produce, as it required a number of trial-and-error steps. So far it is one of the most comprehensive tutorials available.

Summary: 00:11 🚀 The video demonstrates how to connect MemGPT, AutoGEN, and local Large Language Models (LLMs) using RunPod.

01:32 🤖 You can integrate MemGPT and AutoGEN to work together, with MemGPT serving as an assistant agent alongside local LLMs.

03:46 📚 To get started, install Python, VS Code, and create a RunPod account with credits. You can use RunPod for running local LLMs.

06:43 🛠️ Set up a virtual environment, create a Python file, and activate the environment for your project.

08:52 📦 Install the necessary libraries, such as openai, pyautogen, and pymemgpt, to work with AutoGEN and MemGPT.

16:21 ⚙️ Use RunPod to deploy local LLMs, select the hardware configuration, and create API endpoints for integration with AutoGEN and MemGPT.

20:29 🔄 Modify the code to switch between using AutoGEN and MemGPT agents based on a flag, allowing you to harness the power of both.

23:31 🤝 Connect AutoGEN and MemGPT by configuring the API endpoints with the local LLMs from RunPod, enabling them to work seamlessly together.

The example Python code follows:



## pip install pyautogen pymemgpt

import autogen
import memgpt.autogen.memgpt_agent as memgpt_autogen
import memgpt.autogen.interface as autogen_interface
import memgpt.presets as presets
from memgpt.persistence_manager import InMemoryStateManager

config_list = [
    {
        "api_type": "open_ai",
        "api_base": "",  # fill in your local LLM / RunPod endpoint URL
        "api_key": "NULL",
    },
]

llm_config = {"config_list": config_list, "seed": 42}

# If USE_MEMGPT is False, this example behaves like the official AutoGen example;
# if True, we swap out the "coder" agent for a MemGPT agent
USE_MEMGPT = True

# The user agent
user_proxy = autogen.UserProxyAgent(
    name="User_proxy",
    system_message="A human admin.",
    code_execution_config={"last_n_messages": 2, "work_dir": "groupchat"},
    human_input_mode="TERMINATE",  # needed?
    default_auto_reply="You are going to figure all out by your own. "
    "Work by yourself, the user won't reply until you output `TERMINATE` to end the conversation.",
)

interface = autogen_interface.AutoGenInterface()
persistence_manager = InMemoryStateManager()
persona = "I am a 10x engineer, trained in Python. I was the first engineer at Uber."
human = "I'm a team manager at this company."
memgpt_agent = presets.use_preset(
    presets.DEFAULT_PRESET,
    model="gpt-4",
    persona=persona,
    human=human,
    interface=interface,
    persistence_manager=persistence_manager,
    agent_config=llm_config,
)

if not USE_MEMGPT:
    # In the AutoGen example, an AssistantAgent plays the role of the coder
    coder = autogen.AssistantAgent(
        name="Coder",
        system_message="I am a 10x engineer, trained in Python. I was the first engineer at Uber.",
        llm_config=llm_config,
    )
else:
    # In our example, we swap this AutoGen agent with a MemGPT agent, which
    # gains all the benefits of MemGPT, i.e. persistent memory, etc.
    print("\nMemGPT Agent at work\n")
    coder = memgpt_autogen.MemGPTAgent(
        name="MemGPT_coder",
        agent=memgpt_agent,
    )

# Begin the chat with a message from the user
user_proxy.initiate_chat(
    coder,
    message="Write a function to print numbers 1 to 10",
)

Interview with MemGPT Co-Creator Charles Packer

For more information on the creators of MemGPT, also consider watching this video interview with one of its co-creators, UC Berkeley PhD student Charles Packer.

MemGPT as an Operating System

MemGPT draws inspiration from the virtual memory concept in operating systems and is innovatively applied to large language models to create an expansive context space. This innovation shines in scenarios like continuous conversations where traditional limitations on context length pose a challenge. By enabling large language models to handle their memory, MemGPT circumvents the usual restrictions set by fixed context lengths.

Limitations of MemGPT

Firstly, it’s essential to be aware that MemGPT is an emerging project under active development. The creators have established a Discord group to foster idea-sharing and enable direct interaction with them; you are welcome to join.

Data Sensitivity: MemGPT’s reliance on previous interactions for context can raise concerns regarding data privacy and sensitivity, especially in scenarios involving personal or confidential information.

Contextual Misinterpretations: While adept at handling extended conversations, MemGPT can occasionally misinterpret context, especially in nuanced or emotionally charged communications, leading to responses that may seem out of touch.

Resource Intensity: The system demands significant computational resources for optimal functionality, particularly for processing large volumes of data or maintaining extensive conversation histories.

Dependency on Quality Training Data: MemGPT’s effectiveness is closely tied to the quality of training data. Biased, inaccurate, or incomplete data can hinder the learning process, affecting the quality of interactions.

Adaptation to Diverse Discourses: The system’s ability to adapt to varying communication styles or understand different dialects and cultural nuances is still a work in progress, occasionally affecting its versatility in global or multicultural scenarios.

MemGPT vs Sparse Priming Representations (SPR)

MemGPT:
  • Inspiration: Takes cues from hierarchical memory systems used in traditional operating systems.
  • Functionality: Implements a tiered memory system that allows an LLM to extend its context window by managing which information is stored or retrieved, and when this should happen.
  • Structure: Comprises a Main Context (analogous to an OS’s main memory) and an External Context (similar to secondary storage).
  • Utility: Aims to revolutionize LLMs’ capabilities in tasks that involve unbounded context, such as long-form conversations and detailed document analysis.

Sparse Priming Representations (SPR):

  • Inspiration: Modeled after human memory organization and retrieval systems, focusing on critical information.
  • Functionality: Enhances memory system efficiency by creating concise primers that represent complex ideas, supporting the accuracy in understanding and recall.
  • Approach: Prioritizes intuitive and user-friendly memory management, akin to how humans naturally process and store information.
  • Utility: Focused on making LLMs more efficient in knowledge retrieval and learning, improving user engagement and communication tools.
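The primer idea can be illustrated with a toy distillation step. Real SPR uses an LLM to write the primer; the sketch below, with entirely hypothetical names, merely ranks sentences by how rare (and thus informative) their words are and keeps the top few:

```python
# Toy "sparse primer": keep the few sentences carrying the rarest words.
# This is a stand-in for SPR's LLM-generated primers, not the real method.
from collections import Counter

def make_primer(text, keep=2):
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    word_counts = Counter(w for s in sentences for w in s.lower().split())

    def rarity(sentence):
        # Average inverse frequency: rare words score high, common words low
        words = sentence.lower().split()
        return sum(1 / word_counts[w] for w in words) / len(words)

    kept = set(sorted(sentences, key=rarity, reverse=True)[:keep])
    # Preserve the original order of the kept sentences
    return ". ".join(s for s in sentences if s in kept) + "."

text = ("MemGPT manages tiered memory for language models. "
        "The weather was nice that day. "
        "Sparse primers compress complex ideas into short cues.")
print(make_primer(text, keep=2))
```

The point of the sketch is the shape of the operation: a primer is much shorter than the source text yet retains its most distinctive content, which is what makes it cheap to hold in an LLM's context.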

Technical Implementation:

MemGPT:
  • Utilizes a structured approach for memory tier management, allowing for effective data movement and context management.
  • Tailored for scalability in dealing with large datasets and complex, extended tasks.

SPR:
  • Uses a method of creating primers that act as a distillation of complex information, allowing for a more intuitive memory management experience.
  • Geared towards mimicking human cognitive processes for better learning and communication outcomes.

Applications and Implications:

MemGPT:
  • May greatly benefit applications that require processing of large amounts of data over extended periods, like in-depth analysis and ongoing interactions.

SPR:
  • Could significantly enhance tools for learning and communication by providing users with easy-to-understand summaries or primers of complex topics.

Community and Engagement:

MemGPT:
  • Offers an open-source platform for developers and researchers to contribute to and enhance the capabilities of the memory management system.

SPR:
  • Encourages community involvement through contributions of new examples, research, and tools to improve the system’s efficiency and intuitiveness.

In conclusion, both MemGPT and SPR are innovative responses to the challenges of memory management in LLMs, each with its own philosophy and methodology. MemGPT is more structural and system-oriented, potentially better for tasks that need management of extensive contexts. SPR is more user-centric and intuitive, possibly better for learning and communication by simplifying complex information.

While both aim to enhance LLMs’ handling of context, their underlying philosophies and expected applications differ, reflecting the diversity of approaches in advancing AI and ML capabilities. The ongoing developments and community contributions in both these areas show a vibrant and collaborative effort to push the boundaries of what’s possible with memory management in LLMs.


MemGPT stands as a testament to the power of innovation in AI, bridging the gap between what LLMs can do and what we aspire for them to achieve. As we march towards the future, the vision of LLMs as comprehensive operating systems doesn’t seem far-fetched—it’s nearly within our grasp, and MemGPT is leading the charge. What do you think?

That’s it for today!


cpacker/MemGPT: Teaching LLMs memory management for unbounded context 📚🦙

MemGPT: Overcoming Context Limitations for ChatGPT and Other LLMs for Document Chats & More


MemGPT paper (arXiv:2310.08560)

What is MemGPT AI and MemGPT Installation Tutorial 2023

Haly AI