<https://arxiv.org/pdf/2304.11062.pdf|Some researchers> used R Cerebral Valley #05-ai-news

Jim Park

04/25/2023, 4:55 AM

Some researchers used the Recurrent Memory Transformer architecture to work with 2M tokens 🤯. I wonder if anyone here can validate the approach? It's all a bit over my head!

🙌🏾 1

Jim Park

04/25/2023, 4:23 PM

I read the paper and thought about it last night. This paper sets off my BS radar... I'm hoping someone can help me understand it better. So here's my thinking.

Let's presume I have a 4096-token limit. I then allocate 1024 tokens to a memory space, which I update as I parse/generate my sequence. It seems to me that the interesting part of the paper is how to manage that memory so that it produces the required outcome. If I have a particularly differentiated data point, like a note about a vivid purple cat in a sea of information about shades of gray, then perhaps I can get lucky and that data point persists across naive recursive steps (compress the information in this step and recurse). I don't know if I'm too naive and missed it, but I didn't get the sense that the paper discussed how memory might be tuned to the task. Like, if I have any real data, with real information density, how could I actually put it to use?

Take the example of a student: the student reads a long body of text and, in their head, generalizes the content into a running outline. On paper, they take notes that act as an "index" of highly relevant references. When the student writes the paper, they use that outline and sample their notes to come up with a hypothetical theme or point for their paper. Then they build an outline and gather resources to construct their paper. This sequence describes a hierarchy of working memory, much like the memory layers in CPU architecture. Could memory for LLMs have a similar design? What could the memory management coprocessor look like?
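To make the 4096/1024 split above concrete, here is a minimal toy sketch in Python of the "compress this step and recurse" loop. It is not the paper's method: the paper's memory is learned, while `keep_rarest` here is a hypothetical stand-in compressor invented for illustration, and the names `recurrent_pass`, `CONTEXT_LIMIT`, `MEMORY_SIZE`, and `SEGMENT_SIZE` are assumptions as well. It only shows the bookkeeping and why a single distinctive token can survive in repetitive data while information-dense data has to be dropped.

```python
from collections import Counter
from typing import Callable, List

CONTEXT_LIMIT = 4096                        # tokens visible per step (figure from the message above)
MEMORY_SIZE = 1024                          # tokens reserved for carried-over memory
SEGMENT_SIZE = CONTEXT_LIMIT - MEMORY_SIZE  # 3072 fresh tokens per step


def keep_rarest(window: List[str], budget: int) -> List[str]:
    """Hypothetical compressor: keep the `budget` least frequent tokens in the window."""
    counts = Counter(window)
    # sorted() is stable, so tokens with equal counts keep their original order.
    ranked = sorted(window, key=lambda tok: counts[tok])
    return ranked[:budget]


def recurrent_pass(
    tokens: List[str],
    compress: Callable[[List[str], int], List[str]] = keep_rarest,
    memory_size: int = MEMORY_SIZE,
    segment_size: int = SEGMENT_SIZE,
) -> List[str]:
    """Walk the sequence segment by segment, carrying a fixed-size memory forward."""
    memory: List[str] = []
    for start in range(0, len(tokens), segment_size):
        segment = tokens[start:start + segment_size]
        window = memory + segment               # never longer than CONTEXT_LIMIT
        memory = compress(window, memory_size)  # naive "compress and recurse" step
    return memory


# One distinctive token in a sea of gray survives every step.
noisy = ["gray"] * 10_000
noisy[5] = "purple_cat"
print("purple_cat" in recurrent_pass(noisy))    # True

# Information-dense input: every token is unique, so most of it must be dropped.
dense = [f"fact_{i}" for i in range(10_000)]
print("fact_5000" in recurrent_pass(dense))     # False
```

With this toy compressor the dense case simply keeps whatever arrived first, which is one way to picture the concern: once the memory budget is full, something has to decide what to evict, and that policy is where the task-specific tuning would live.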

Jim Park

04/25/2023, 5:38 PM

In short, I doubt the seemingly naive strategy (maybe I misunderstand it) would survive actual, information-dense data.

Alex Halliday

04/28/2023, 6:17 PM

The gap between sensational research / POC claims and production-ready applications in AI has never been bigger... thanks for the thoughts @Jim Park
