Describe the bug
Conversational Summary Buffer memory summarises from all history messages present in the DB. This grows over time and at some point will exceed the LLM context, regardless of the context size. In my view, the role of summary buffer memory is to keep a summary of past messages plus only some of the most recent messages, so that the token size stays under control. For this reason, the summary buffer memory should keep its summary up to a point in time and rebuild it only from messages newer than the point at which the summary was created.
For example:
We have:
q1,r1, q2,r2, q3,r3, q4,r4, and now the user asks q5. The summary memory finds that q1 -- q4 exceed 2000 tokens and creates a summary.
From this moment on, that summary is recreated at every step, because the conversational summary buffer reads all messages from the DB (now q1 -- q5) and again finds that they exceed 2000 tokens.
After 40 pairs of messages, assuming an average of 500 tokens per pair, the text to be summarised grows to 20,000 tokens.
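The growth described above can be sketched with a small calculation. This is an illustration only; the token counts (`TOKENS_PER_PAIR`, `SUMMARY_TOKENS`, `KEPT_PAIRS`) are assumed values, not measurements from Flowise:

```typescript
const TOKENS_PER_PAIR = 500; // assumed average per question/answer pair
const SUMMARY_TOKENS = 300;  // assumed size of a rolling summary
const KEPT_PAIRS = 4;        // assumed pairs kept verbatim after the cutoff

// Current behaviour: every stored pair is re-read and re-summarised.
function tokensCurrent(pairs: number): number {
  return pairs * TOKENS_PER_PAIR;
}

// Proposed behaviour: one fixed-size summary plus only the pairs
// newer than the point where the summary was created.
function tokensProposed(pairs: number): number {
  return SUMMARY_TOKENS + Math.min(pairs, KEPT_PAIRS) * TOKENS_PER_PAIR;
}

console.log(tokensCurrent(40));  // 20000 tokens after 40 pairs
console.log(tokensProposed(40)); // 2300 tokens, roughly constant
```

With a cutoff, the input to summarisation stays roughly constant instead of growing linearly with conversation length.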
From my point of view, the behaviour should be:
when q5 arrives, q1 -- q4 are summarised;
next time, the conversational summary buffer memory reads only q5 from the DB, possibly together with the summary of q1 -- q4.
This way the text to be summarised grows very slowly in tokens.
I think that, in getMessages, the query should return only messages created after a specific point in time.
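As a rough sketch of that idea (this is not the actual Flowise `getMessages` signature; the `StoredMessage` shape and the `summarisedUpTo` cutoff are hypothetical):

```typescript
// Hypothetical message shape; Flowise's real schema may differ.
interface StoredMessage {
  role: 'user' | 'assistant';
  content: string;
  createdAt: Date;
}

// Return only messages created after the point where the last summary
// was made; with no summary yet, return the full history.
function getMessagesAfter(
  all: StoredMessage[],
  summarisedUpTo: Date | null
): StoredMessage[] {
  if (!summarisedUpTo) return all;
  return all.filter((m) => m.createdAt > summarisedUpTo);
}
```

In a DB-backed implementation the same cutoff would go into the query itself (e.g. a `WHERE createdAt > ?` condition), so old messages are never fetched at all.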
To Reproduce
Create a chatflow with an LLM with a very narrow context, say 4000 tokens. Add a summary buffer memory to it. Start a conversation with the model. After about 10 messages the LLM will complain that the context is exhausted.
Expected behavior
The summarisation should keep the conversation within a token size limit.
Screenshots
No response
Flow
No response
Use Method
None
Flowise Version
No response
Operating System
None
Browser
None
Additional context
No response