Overview of chat vectorization

Chat vectorization searches for messages in your current chat’s history that seem relevant to your most recent messages. It temporarily shuffles the most relevant messages to the beginning or end of the chat history. This happens when the model’s reply to your last message is generated.

The messages at the start and end of the chat history tend to have the greatest impact on the model’s reply. Therefore, shuffling relevant messages to these locations can help the model focus on relevant information in its reply.

In particular, chat vectorization can find relevant messages that are too far back in the message history to fit into the request context. Shuffling these messages into context provides the model with information that it would not have otherwise.

Chat vectorization is a kind of retrieval-augmented generation (RAG). Retrieval-augmented generation improves the quality of a model’s responses by providing additional relevant information in the prompt.

  • Retrieval: the most recent messages are used to retrieve relevant past messages
  • Augmented: the model’s context is augmented by inserting past messages in a useful way
  • Generation: the model is instructed to use the past messages when generating the response
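The three steps above can be sketched as follows. This is a minimal illustration only: `vectorize`, `similarity`, and `generate` are hypothetical stand-ins for the vectorizing model, the vector comparison, and the text-generation model, not SillyTavern APIs.

```python
def rag_reply(recent, past, vectorize, similarity, generate, top_k=3):
    """Minimal retrieval-augmented generation flow (illustrative sketch)."""
    # Retrieval: the most recent messages become the search query
    query = vectorize("\n".join(recent))
    retrieved = sorted(past,
                       key=lambda m: similarity(vectorize(m), query),
                       reverse=True)[:top_k]
    # Augmented: the retrieved past messages are inserted into the prompt
    prompt = "[Past events]\n" + "\n".join(retrieved) + "\n\n" + "\n".join(recent)
    # Generation: the model replies using the augmented prompt
    return generate(prompt)

# Toy stand-ins to exercise the flow: identity "vectorize", word-overlap
# "similarity", and an echoing "generate".
reply = rag_reply(
    recent=["where is the treasure"],
    past=["the treasure is buried", "it rained", "nice weather"],
    vectorize=lambda t: t,
    similarity=lambda a, b: len(set(a.split()) & set(b.split())),
    generate=lambda prompt: prompt,
    top_k=1,
)
```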

Some terms

A vector is a list of numbers that can represent the themes, content, style, or other characteristics of a piece of text.

Vectorization is calculating the vector that represents a piece of text. This is done by a vectorizing model (also known as an embedding model). Just as text generation models make text from text, vectorizing models make vectors from text.

Vector search finds relevant results by comparing vectors rather than, say, keywords. If we calculate the vector for a search query, we can compare it to the stored vectors for a collection of pieces of text. This finds the texts in our collection that are most similar to the text in the search query. In the case of chat vectorization, the “search query” is the most recent 2 messages, and the “texts in our collection” are all the other messages in the chat.
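As a toy illustration, the comparison step can be sketched with cosine similarity, a common measure for comparing vectors. The hard-coded vectors below are hypothetical stand-ins for what a real vectorizing model would calculate:

```python
import math

def cosine_similarity(a, b):
    """Score how similar two vectors are: near 1.0 means very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical stored vectors for past chat messages (a real vectorizing
# model would calculate these from the message text).
stored = {
    "We buried the treasure under the old oak.": [0.9, 0.1, 0.0],
    "The weather was lovely yesterday.":         [0.1, 0.8, 0.1],
    "The map shows where the gold is hidden.":   [0.8, 0.0, 0.2],
}

# Hypothetical vector for the search query "Where is the treasure?"
query_vector = [0.85, 0.05, 0.1]

# Rank the stored messages by similarity to the query vector.
ranked = sorted(stored,
                key=lambda m: cosine_similarity(stored[m], query_vector),
                reverse=True)
```

Note that the map/gold message still ranks well above the weather message even though it shares no keywords with the query; that is the point of comparing vectors rather than keywords.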

Setting up

To enable Chat vectorization, select “Extensions” > “Vector Storage” > “Enabled for chat messages”.

Configure a vectorization source and vectorization model. Chat vectorization uses the same vector source as Data Bank, so you may have set this up already. The settings for the Vectorization Source and Vectorization Model are documented in Data Bank.

Chat vectorization uses the same vector storage as Data Bank; vector storage itself needs no separate setup or configuration. Vector Storage is also documented in Data Bank.

Chat vectorization does not use Data Bank to store the chat messages. The messages are stored in the chat.

Preparing chat messages for search (vector storage)

So that chat messages can be searched, a vector is calculated for each message and stored.

Vectorizing occurs in the background, whenever you send or receive a message.

Each message is stored individually, so that it can be found and shuffled individually during generation.

Large messages are split into “chunks” so that the model can be given the most relevant part of a long message. The chunk size is 400 characters. You can change this with “Chunk size (chars)”.

Messages are divided into chunks by finding a chunk boundary such as a paragraph break, line break, or space between words. This is so that all the chunks make sense, as far as possible. If your chat messages have some other way to mark natural splitting points, such as ----, you can add this to “Chunk boundary”. The setting for “Chunk boundary” is shared with Data Bank.
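Boundary-aware chunking can be sketched like this. This is a simplified illustration, not SillyTavern’s actual algorithm:

```python
def chunk_text(text, chunk_size=400, boundaries=("\n\n", "\n", " ")):
    """Split text into chunks of at most chunk_size characters,
    preferring to break at a paragraph break, line break, or space."""
    chunks = []
    while len(text) > chunk_size:
        window = text[:chunk_size]
        # Find the latest natural boundary inside the window.
        split_at = max(window.rfind(b) for b in boundaries)
        if split_at <= 0:
            split_at = chunk_size  # no boundary found: hard split
        chunks.append(text[:split_at].rstrip())
        text = text[split_at:].lstrip()
    if text:
        chunks.append(text)
    return chunks
```

For example, `chunk_text("para one\n\npara two", chunk_size=12)` breaks at the paragraph boundary rather than mid-word, yielding `["para one", "para two"]`.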

Vector storage controls

To calculate vectors for all messages in the current chat, without waiting for them to be processed in the background, choose “Vectorize All” from the settings.

To see how many messages in the current chat have been vectorized, choose “View Stats”. This displays the total number of vectors stored. It also indicates the specific chat messages that have been vectorized, by marking them with a green ball.

To remove all the vectors for messages in the current chat, choose “Purge Vectors”.

Note

The controls for “Vectorize All” and “Purge Vectors” within Chat vectorization only affect the stored vectors for the current chat. However, there are identical buttons in File vectorization that affect the vectors for files in Data Bank. Ensure that you are purging the vectors that you intend to purge.

Finding relevant messages (vector retrieval)

To find the most relevant messages in the chat history, the most recent messages are converted (vectorized) into a query vector. By default, the 2 most recent messages are used. To change this, change the value of “Query messages”. This value is also used when finding relevant content from Data Bank.

Past messages must have a relevance score of at least 25% to be included. You can change this with “Score threshold”. The setting for score threshold is shared with Data Bank.

The 3 most relevant messages from chat history are shuffled. You can change this with “Insert#”.

To avoid disturbing the most recent events in the chat, the 5 most recent messages are not shuffled. To change this, change the value of “Retain#”.
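Taken together, the threshold, insert count, and retain count might combine like this sketch. The parameter names mirror the settings, and the relevance scores are assumed to come from the vector search:

```python
def select_messages(history, scores, threshold=0.25, insert_count=3,
                    retain_count=5):
    """Choose which past messages get shuffled into context.

    `history` is the chat in order, oldest first; `scores` maps a
    message's index to its relevance score from the vector search.
    """
    # "Retain#": the most recent messages are never shuffled.
    candidates = range(max(0, len(history) - retain_count))
    # "Score threshold": drop messages below the minimum relevance.
    relevant = [i for i in candidates if scores.get(i, 0.0) >= threshold]
    # "Insert#": keep only the top-scoring messages, most relevant first.
    relevant.sort(key=lambda i: scores[i], reverse=True)
    return [history[i] for i in relevant[:insert_count]]

# With 8 messages, the last 5 are protected; message 1 falls below the
# 25% threshold, so only messages 0 and 2 are selected.
picked = select_messages([f"message {i}" for i in range(8)],
                         {0: 0.9, 1: 0.1, 2: 0.5})
```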

Shuffling messages (augmented generation)

The messages are shuffled to one of 3 places:

  • The top of the chat, after the Main Prompt / Story String (the default)
  • The top of the chat and before the Main Prompt / Story String
  • The end of the chat, before the last 2 messages (“In-chat @ Depth 2”). Since you just sent a message, this position is usually just before the previous reply from the model.

You can change this with “Injection Position” and “Depth”.

The messages are included in order of relevance, with more relevant messages shown after less relevant messages.

The name of the person or character who sent each message is included.

The messages are shown to the model as “past events”. This assists the model to understand that the messages contain information from a different point in the chat history than the point at which they are inserted. You can change this with “Injection Template”.
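The ordering, sender names, and “past events” framing described above can be sketched as follows. The template wording here is illustrative, not the exact default “Injection Template”:

```python
def inject_past_events(retrieved, template="[Past events]\n{messages}"):
    """Format retrieved messages for insertion into the prompt.

    `retrieved` is ordered most-relevant-first; it is reversed so that
    more relevant messages appear after less relevant ones, and each
    line carries the sender's name.
    """
    lines = [f"{name}: {text}" for name, text in reversed(retrieved)]
    return template.format(messages="\n".join(lines))

# Most relevant first going in; most relevant ends up last in the prompt.
formatted = inject_past_events([("Alice", "We buried the gold."),
                                ("Bob", "It rained all day.")])
```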

You can see the final prompt to the model using the Prompt Itemization popup, the terminal logs, or the browser console logs. The browser console logs are useful for understanding what all the steps in Chat vectorization are doing.

Vector summarization

The Vector summarization feature is experimental

Vector summarization does not create summaries of the chat. It does not turn the retrieved messages into summaries. It does not make the chat history shorter. It is not “like Summarize but better”.

Vector summarization is intended to make vector search of chat messages more effective. It does this by introducing a summarizing step prior to vectorizing. The summarizing step extracts the most important parts of the message, so that the resulting vector is a better indicator of what the message relates to.

Vector summarization may make vector search less effective.

To summarize the messages in the chat history, and generate a vector for each summarized message, choose “Summarize chat messages for vector generation”.

The summarized message does not replace the original message in chat. If a vector search matches the vector of a summarized message, the original message is retrieved from chat history and shuffled into context. The summarized versions of the messages are retained in Vector Storage, which may be of interest for debugging.
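The flow described above can be sketched as follows, where `summarize` and `vectorize` are hypothetical stand-ins for the summarizing and vectorizing models:

```python
def store_message(message, summarize, vectorize, store):
    """Summarize a message, vectorize the summary, and record both the
    summary and the original message (a simplified sketch)."""
    summary = summarize(message)
    store.append({
        "vector": vectorize(summary),  # compared against the query vector
        "summary": summary,            # retained in storage, handy for debugging
        "original": message,           # shuffled into context on a match
    })

# Toy models: the "summary" is just the first sentence, and the "vector"
# is a single number; real models are far more sophisticated.
store = []
store_message("The dragon attacked the village. It was a long day.",
              summarize=lambda m: m.split(". ")[0] + ".",
              vectorize=lambda s: [len(s)],
              store=store)
```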

To summarize the content of the messages used to search the chat history (the last 2 messages by default), choose “Summarize chat messages when sending”.

Each time a message is summarized for vectorizing, a separate request is made to the summarizing model. You can choose which summarizing source is used with “Summarize with”. Choosing “Main API” will generate the summaries using the same model and connection settings that you use for generating chat or text completions.

The request consists of the raw message content and an instruction about how the model should produce the summary. You can change the instruction with “Summary Prompt”.