So, I've got an agent in Unreal Engine making a call to a multimodal vision-text foundation model every 20 seconds or so. The model is asked "is this interesting or not?" If it's interesting, the agent sends instructions to the game AI to investigate the object; if not, it executes a turn and explores another area. I'm doing this for work, but I'm wondering if anyone else is doing this kind of embodied agent work in the game domain? Less LLMs for dialogue, more LLMs for behavior.
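
For anyone curious, the control loop is roughly this. This is a Python sketch rather than the actual UE code; the OpenAI-style vision endpoint and the `agent` bridge object are stand-ins, not what we actually run:

```python
import base64
import time

from openai import OpenAI  # assumes an OpenAI-style vision endpoint

client = OpenAI()

def frame_is_interesting(png_bytes: bytes) -> bool:
    """Ask the VLM whether the current view is worth investigating."""
    b64 = base64.b64encode(png_bytes).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Is anything in this scene interesting enough to investigate? Answer YES or NO."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return "YES" in resp.choices[0].message.content.upper()

# `agent` is a hypothetical bridge into the Unreal Engine side
# (screenshot capture plus commands handed off to the game AI).
while True:
    png = agent.capture_screenshot()
    if frame_is_interesting(png):
        agent.investigate_current_view()  # hand off to the game AI
    else:
        agent.turn_and_explore()          # turn and pick a new area
    time.sleep(20)                        # roughly one query per 20s
```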