AI Memory Features for Personalization
ChatGPT, Gemini, and Grok have added memory features for personalization. These features offer the promise of a personalized assistant, but also the peril of lost privacy.

Memory Features in AI Models
In the wake of the ChatGPT moment, it became apparent that we needed to give LLMs memory to make AI useful in most daily tasks. We needed fact-grounding to avoid LLM hallucinations. We needed memory of past interactions to personalize responses and avoid repeating the same instructions for recurring tasks. We needed to give AI models the understanding to call and use tools correctly.
There are several types of memory in AI models, each with a different role and utility:
Semantic memory encapsulates facts and knowledge, enabling AI to ground outputs in accurate information and structured knowledge. Semantic memory can be implemented by embedding data into high‑dimensional spaces for efficient Retrieval Augmented Generation (RAG), or by using web search to identify information to add to the context for an AI model.
Episodic memory is for personalization. Episodic memory logs interactions and events and tags memories with contextual cues, which can be retrieved later to improve responses and personalize user experiences. The memory features for personalization in ChatGPT, Gemini, and Grok are this type of memory.
Working memory is the current conversation (prompt and response) context and is realized in the context window. Context window size limits how much the model can process at once, and as context windows expand, more extended dialogues can be maintained. For example, an AI customer support agent may need to maintain working memory of the full extended chat session for the best grounded performance.
Parametric memory comprises information and associations learned from training data and embedded within neural network weights, which the AI model can recall instantly at inference time. For example, ask an AI model, “What is the capital of France?” and it responds correctly without needing external fact retrieval.
Procedural memory stores learned sequences of operations or tasks, granting AI agents the ability to execute complex, multi‑step functions without external instruction. This memory is often built through reinforcement or demonstration learning, capturing action policies rather than declarative facts.
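To make the semantic-memory mechanism above concrete, here is a minimal Python sketch of embedding-based retrieval, the core of RAG. It uses a toy hashed bag-of-words "embedding" purely as a stand-in for a real embedding model; the function names and vector scheme are illustrative assumptions, not any vendor's API:

```python
import hashlib
import math
from collections import Counter

def embed(text, dims=64):
    """Toy embedding: a hashed bag-of-words vector (stand-in for a real embedding model)."""
    vec = [0.0] * dims
    for word, count in Counter(text.lower().split()).items():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dims
        vec[bucket] += count
    return vec

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=2):
    """Rank documents by embedding similarity to the query and return the top k."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "The Eiffel Tower is a landmark in Paris",
    "Python is a popular programming language",
    "Paris is the capital of France",
]
top = retrieve("What is the capital of France", docs, k=1)
```

A production RAG system would swap in a learned embedding model and an approximate nearest-neighbor index, but the structure is the same: embed, score, rank, and feed the top results into the model's context.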
ChatGPT, Gemini and Grok have developed memory features for personalization. The features capture prior chat histories (episodic memory) and allow for storing and managing personalized information. We discuss these features below.
ChatGPT Memory Features: Cross-session Recall and Memory with Search
OpenAI first introduced memory for ChatGPT last year. In the original memory feature, users needed to explicitly give their ‘memories’ to ChatGPT directly, a controllable but manual process. This had the benefit of user control, but the inconvenience of manual instruction was a barrier to adoption. I didn’t find it useful enough to bother.
Cross-session recall automates the gathering of episodic memory from prior sessions. As of an April update from OpenAI, ChatGPT can reference all past conversations automatically (if the user turns on the feature). Making this a background activity transforms ChatGPT into a truly cross‑session personal assistant. The memory feature is conceptually simple but also powerful:
ChatGPT can now remember useful details between chats, making its responses more personalized and relevant.
Heavy users of ChatGPT who repeat similar requests and queries will find this extremely helpful. It can recall personal details, anecdotes, work tasks and topics, and personal preferences (such as goals, hobbies, and writing style), all without the user having to be explicit. The promise is a seamless, personalized AI interaction that improves with use.
Note also that if you are worried about privacy or letting personal information get out, you can turn the feature off. There is even a ‘temporary chat’ mode, akin to a browser’s incognito mode, where your chat won’t be saved to memory or chat history. What happens in “temporary chat” stays there.
Just last week, OpenAI introduced Memory with Search, which allows ChatGPT to incorporate details from stored memories when rewriting user prompts into web search queries. OpenAI shared a rather contrived example of utility:
For example, a vegan user in San Francisco will see “good vegan restaurants, San Francisco” instead of a generic “restaurants near me,” improving relevance and personalization.
Still, one can imagine this being useful for enterprise work use-cases or research applications, where you run repeated similar queries and might want some common assumptions or preferred sources for your domain.
Google Gemini Memory Features
Google has introduced two memory features to Gemini: referencing past chats, and Personalization. The features offer more natural and relevant responses:
You can already share your interests, preferences and important details with Gemini — like dietary restrictions or partners’ names — to allow for more natural conversations and relevant responses.
In February, Google expanded capabilities of Gemini to recall past relevant chats. In March, the “Saved info” memory feature was extended to all free users.
To use this feature to save and manage memories, users need to have a Google Account. Users can say “remember that” or click the “Saved info” button to store information, and users can later delete or manage it through the profile interface for their account. To look at your previous Gemini chat history, go to the MyActivity.google.com page for Gemini.
Google also added a “Personalization” feature in March. The feature is intriguing: it personalizes the AI model around your history, not just of AI prompts but also of Google searches. However, for now the feature is tied to Gemini 2.0 Flash Thinking only, so you cannot control which AI model to use with it.
xAI Grok Persistent Memory Feature
xAI recently announced that its Grok chatbot now includes a memory feature, storing user‑specific details across sessions to deliver more personalized recommendations. With memory enabled, Grok can recall past conversation insights, such as personal preferences, to tailor its suggestions and responses over time.
Memories are transparent. As with the memory features in other AI models, you can see what Grok has in its memory, and users can edit and delete (forget) memories.
How to Use Memory in AI
The memory features of ChatGPT and other AI models could turn them into true personal assistants. For professionals, executives, and entrepreneurs, this could mean an AI that serves as a personal chief of staff, brainstorming partner, customer support agent, or domain-specific knowledge oracle.
Memory is useful to you if it makes future AI model interactions more convenient or accurate. Thus, think of memory as less of a ‘scrapbook’ feature and more of a convenience feature to avoid future repetition in prompting AI models.
To accomplish this, give the AI specific information intended to help it serve you in future interactions as a personal assistant. Giving an AI personalized information is like “onboarding” an AI employee: you give it the knowledge to serve you better.
Conclusion - Give Memory a Chance
ChatGPT currently has the most advanced memory features. Gemini and Grok offer basic memory features. Claude 3.7 Sonnet doesn't have persistent memory now, but I suspect Anthropic will catch up.
I avoided using memory in ChatGPT because I didn’t feel the personalization benefits outweighed the setup inconvenience. However, having previous chat sessions saved in memory automatically makes it easy to share preferences, topic domains, and personal information for future personalized responses. The ease of use makes it worth a try.
If you are repeating prompts or information, personalization could be a time-saver for you. If you want a more personalized AI assistant, giving it personal information helps improve your interactions with it.
The ultimate promise of memory-based personalization is to enable an AI assistant that serves your needs seamlessly and without repeated detailed direction, because it ‘knows you so well.’
The downside to episodic memory and personalization is giving up privacy. Google already knows much about you. Your AI model may take it to the next level, knowing more about you than your spouse, the Government, and your best friend combined.
There are privacy and potential cyber-security concerns with personalization memory features. Personal info that leaks out could assist with social engineering-style phishing attacks or password hacking against you. If this is setting off alarm bells for you, turn it off or limit the personal information you share.
Coda: RAG and the Rise and Fall of Vector Databases
The era of million token context windows has raised the question: Is RAG dead?
RAG, Retrieval Augmented Generation, is an implementation of one type of memory – semantic memory. It’s been extremely useful in the LLM era where databases are much larger than context windows and semantic search is a useful method for finding relevant information. However, context windows have grown, while the challenge of finding relevant information has turned out to be more complex.
At the same time, the rise of memory features in AI models for personalization might well turn out to be a successful use-case for RAG, or at least for embedding information in a vector format.
Jo Kristian Bergum shared his perspectives on Retrieval Augmented Generation (RAG), search, and vector databases in the essay “The Rise and Fall of Vector Databases,” an extended post on X. His view is that the rise of vector databases (such as Pinecone) was driven by the need to connect language models like ChatGPT with external data. However, the need for a separate vector database infrastructure is fading as vector search capabilities are integrated into existing databases and search engines.
Bergum, who has 20 years of experience in search and search systems, believes the natural abstraction for connecting AI with knowledge is search. The incredible utility of accessing specific data to support an AI model response requires search: the effort to find the needle in the haystack of internet data.
Embeddings are crucial for representing data for AI models, but effective search requires considering other signals like freshness and authority in addition to similarity. This means that semantic similarity, which is what vector DBs run on, isn’t enough. Hybrid queries, combining vector search with metadata, are essential for best quality.
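To make the hybrid-query idea concrete, here is a hedged Python sketch of a ranking score that blends vector similarity with freshness and authority signals. The weights, half-life, and function names are illustrative assumptions, not Bergum's formulation or any search engine's actual scoring:

```python
from datetime import datetime, timezone

def freshness(published, now, half_life_days=30.0):
    """Exponential decay: a document loses half its freshness every half_life_days."""
    age_days = (now - published).total_seconds() / 86400
    return 0.5 ** (age_days / half_life_days)

def hybrid_score(similarity, published, now, authority,
                 w_sim=0.6, w_fresh=0.25, w_auth=0.15):
    """Blend embedding similarity with freshness and authority into one ranking score."""
    return (w_sim * similarity
            + w_fresh * freshness(published, now)
            + w_auth * authority)

now = datetime(2025, 5, 1, tzinfo=timezone.utc)
# Two candidate documents with identical embedding similarity (0.90):
stale = hybrid_score(0.90, datetime(2024, 5, 1, tzinfo=timezone.utc), now, authority=0.5)
fresh = hybrid_score(0.90, datetime(2025, 4, 28, tzinfo=timezone.utc), now, authority=0.5)
```

With identical semantic similarity, the recent document outranks the year-old one, which is exactly the behavior pure vector similarity cannot express.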
RAG is not dead. We still need to retrieve relevant data even in a world of large context windows, and data is abundant. As episodic memory expands to include volumes of AI model interaction history, RAG will be needed to refine down full histories to semantically relevant information for an ongoing chat session.
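As a sketch of that refinement step, the following Python whittles a chat history down to the turns relevant to the current message before they are placed into context. It uses simple word-overlap (Jaccard) similarity as a stand-in for embedding-based retrieval; the helper name and example history are hypothetical:

```python
def relevant_history(current_message, history, k=2):
    """Keep only the k past turns most similar to the current message.
    (A real system would use embeddings; word overlap is a stand-in.)"""
    query = set(current_message.lower().split())

    def overlap(turn):
        # Jaccard similarity between the turn's words and the query's words.
        words = set(turn.lower().split())
        return len(query & words) / max(len(query | words), 1)

    return sorted(history, key=overlap, reverse=True)[:k]

history = [
    "user: I'm vegan and live in San Francisco",
    "user: Help me draft the quarterly sales report",
    "user: My favorite cuisine is Thai food",
]
context = relevant_history("Recommend vegan restaurants in San Francisco", history, k=1)
```

Only the dietary-preference turn survives the cut, so the context window carries the relevant episodic memory rather than the full transcript.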
However, RAG and vector databases are not as connected as some believe. Embedding-based retrieval (the essence of RAG) can be integrated into primary databases, complementing traditional search. For some cases, a traditional database with vector search capabilities is sufficient, but for other applications, search quality requires both a dedicated search engine and vector similarity search pipelines.
What we label as "vector databases" are, in reality, search engines with vector capabilities. The market is already correcting this categorization—vector search providers rapidly add traditional search features while established search engines incorporate vector search capabilities.
The future of AI memory may be a hybridized form of RAG that collects both semantic memory for knowledge grounding and episodic memory of prior interactions for personalization. Both types of memory are fed into much larger context windows for highly accurate AI model responses and interactions.