In my latest collaboration with Arize AI’s Aparna Dhinakaran, we set out to answer a question: given a large set of time series data in the context window, how well can LLMs detect anomalies or movements in that data? In other words, should you trust a stock-picking GPT-4 or Claude 3 agent with your money? To find out, we ran a series of experiments comparing how well large language models detect anomalous patterns in time series data.
You don’t want to miss these results (one model clearly stands out):
https://arize.com/blog-course/large-language-model-performance-in-time-series-analysis