# 06-technical-discussion
a
What are the workarounds for the OpenAI completion API's 4k-token limit? I am asking the LLM to summarize content in a particular format (i.e. a JSON schema), which takes 2k tokens in the prompt, not counting the actual text. I found that JSON is not very token-efficient and have already switched to YAML for both input and output, but I still need to keep the structured format. Sample of the schema:
```yaml
attr_1:
  type: number
  minimum: 0
  maximum: 30
attr_2:
  type: string
  enum: [value_1, value_2, etc]
```
and 50 more attributes like that. How are you optimizing long, structured schema prompts? What are the potential workarounds? Would love to hear from the community 🤖
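A quick way to sanity-check how much the YAML switch actually saves is to count tokens for both renderings. A minimal sketch, assuming tiktoken and PyYAML are installed and using only the two sample attributes above:

```python
# Minimal sketch: compare token counts of a JSON vs. YAML rendering of the schema.
# Assumes `tiktoken` and `PyYAML` are installed; the two-attribute schema below
# is just the sample from above, not the full 50-attribute schema.
import json
import yaml
import tiktoken

schema = {
    "attr_1": {"type": "number", "minimum": 0, "maximum": 30},
    "attr_2": {"type": "string", "enum": ["value_1", "value_2"]},
}

enc = tiktoken.get_encoding("cl100k_base")

json_text = json.dumps(schema, indent=2)
yaml_text = yaml.safe_dump(schema, sort_keys=False)

print("JSON tokens:", len(enc.encode(json_text)))
print("YAML tokens:", len(enc.encode(yaml_text)))
```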
d
If you need to read the entirety of the schema in a single prompt, then you could explore prompt compression. Example technique: if you have something that looks like "attr_2" repeated many times, tell the LLM that the entirety of "attr_2: {...}" will be referred to as "attribute_2_placeholder" from now on. You can also consider stripping irrelevant text on a trial-and-error basis. Another approach would be segmenting the input and summarizing recursively (Transformer-XL does something similar: https://arxiv.org/abs/1901.02860). That said, are you sure you need to have the entirety of the schema in a single prompt?
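A rough sketch of the placeholder idea, for illustration only; the `compress_schema` helper and the placeholder names are made up here, not an existing library:

```python
# Sketch of placeholder-style prompt compression for a repetitive YAML schema.
# Any attribute whose definition is identical to one already seen is replaced
# by a short placeholder, and a legend maps placeholders back to definitions.
import yaml

def compress_schema(schema: dict) -> tuple[dict, dict]:
    """Replace repeated attribute definitions with placeholders.

    Returns (compressed_schema, legend), where legend maps placeholder -> definition.
    """
    seen: dict[str, str] = {}      # serialized definition -> placeholder name
    legend: dict[str, dict] = {}
    compressed: dict = {}
    for name, definition in schema.items():
        key = yaml.safe_dump(definition, sort_keys=True)
        if key in seen:
            compressed[name] = seen[key]          # reuse the existing placeholder
        else:
            placeholder = f"def_{len(legend) + 1}"
            seen[key] = placeholder
            legend[placeholder] = definition
            compressed[name] = placeholder
    return compressed, legend

# Example: attr_2 and attr_3 share a definition, so it is sent only once.
schema = {
    "attr_1": {"type": "number", "minimum": 0, "maximum": 30},
    "attr_2": {"type": "string", "enum": ["value_1", "value_2"]},
    "attr_3": {"type": "string", "enum": ["value_1", "value_2"]},
}
compressed, legend = compress_schema(schema)
prompt = (
    "Definitions:\n" + yaml.safe_dump(legend, sort_keys=False) +
    "\nAttributes (values refer to the definitions above):\n" +
    yaml.safe_dump(compressed, sort_keys=False)
)
print(prompt)
```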
👀 1
a
thank you @Daniel Hsu, the prompt compression makes sense. I was able to compress it by 50% with a few small tweaks. As for Transformer-XL, are you aware of any implementations?
> That said, are you sure you need to have the entirety of the schema in a single prompt?
My goal is to evaluate products described in the documents against specific attributes and then rank the products based on those attributes in memory (for now). The same product might be mentioned in multiple documents, and I am trying to get an aggregate of the attributes across those documents. If I'm not passing the schema into the prompt, how would I get structured data in the output? What are the options here? So the simplified pipeline is: documents (multiple products per document) -> products with attributes -> user input on attributes (structured) -> ranking of products based on the attributes that matter. I am not sure if that's the best pipeline, but I'm curious if you have any thoughts on it.
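For context, a minimal sketch of the aggregation and ranking steps being described here, assuming per-document attribute extraction has already happened; all the names, the averaging rule, and the weighting choice are illustrative, not part of any existing API:

```python
# Sketch of the aggregation + ranking steps, assuming attribute extraction per
# document has already been done (e.g. via an LLM call).
from collections import defaultdict
from statistics import mean

# Per-document extractions: product -> {attribute -> value}
extractions = [
    {"product_a": {"attr_1": 12, "attr_2": "value_1"}},
    {"product_a": {"attr_1": 18}},
    {"product_b": {"attr_1": 25, "attr_2": "value_2"}},
]

# Aggregate numeric attributes by averaging across documents;
# for categorical attributes, keep the last value seen.
aggregated: dict[str, dict] = defaultdict(dict)
numeric_values: dict[tuple, list] = defaultdict(list)
for doc in extractions:
    for product, attrs in doc.items():
        for attr, value in attrs.items():
            if isinstance(value, (int, float)):
                numeric_values[(product, attr)].append(value)
            else:
                aggregated[product][attr] = value
for (product, attr), values in numeric_values.items():
    aggregated[product][attr] = mean(values)

# Rank products by the attributes the user cares about (here: attr_1, higher is better).
user_weights = {"attr_1": 1.0}

def score(attrs: dict) -> float:
    return sum(w * attrs.get(a, 0) for a, w in user_weights.items())

ranking = sorted(aggregated.items(), key=lambda kv: score(kv[1]), reverse=True)
print(ranking)
```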
d
Is the set of attributes static, i.e. a finite, well-defined set of possible attributes?
a
yes, attributes are the same across all products and they are well defined: ranges for numbers, possible values for enums, etc
d
Perhaps you could preprocess segments of the documents into just the attributes themselves, e.g. chunk the document first, then process each segment to extract all the attributes it exhibits. That'd compress your input drastically.
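A small sketch of that chunk-then-extract step, with the LLM call stubbed out; `call_llm` and the chunk size are placeholders, not a real client or a recommended setting:

```python
# Sketch of the chunk-then-extract preprocessing step. The completion call is a stub.
def call_llm(prompt: str) -> dict:
    """Stub: replace with an actual completion call plus YAML parsing of the response."""
    raise NotImplementedError

def chunk(text: str, max_chars: int = 4000) -> list[str]:
    """Naive fixed-size chunking; swap in sentence- or section-aware splitting as needed."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def extract_attributes_from_chunk(chunk_text: str) -> dict:
    """Ask the model for only the attributes visible in this chunk."""
    prompt = (
        "Extract any of the known product attributes present in the text below "
        "and return them as YAML. Omit attributes that do not appear.\n\n" + chunk_text
    )
    return call_llm(prompt)

def extract_attributes(document: str) -> dict:
    """Union of per-chunk extractions; later chunks overwrite earlier ones on conflict."""
    merged: dict = {}
    for part in chunk(document):
        merged.update(extract_attributes_from_chunk(part))
    return merged
```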
I believe this is the code for transformer-xl: https://github.com/kimiyoung/transformer-xl
❤️ 2
a
thank you Daniel!
v
this is exactly what i was looking for as well!
❤️ 1
h
Anthropic Claude now supports 100k tokens https://www.anthropic.com/index/100k-context-windows
d
whoa! I'm so curious how they did this...
Maybe they incorporated a chunking/summarizing step
h
I suspect something similar. There is also new research coming out that supports effectively unlimited context windows by offloading attention computation across all layers to a k-nearest-neighbor index: https://arxiv.org/pdf/2305.01625.pdf
😮 1
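Roughly, the idea in that paper is to put the long input's keys and values into a nearest-neighbor index and let each attention query retrieve only its top-k keys instead of attending over everything. A toy numpy sketch of that retrieve-then-attend step, not the paper's actual implementation:

```python
# Toy sketch of kNN-retrieved attention: keys/values for a very long input sit
# in an index; each query attends only over its top-k nearest keys.
import numpy as np

rng = np.random.default_rng(0)
d = 64             # head dimension
n_keys = 100_000   # keys for the entire long input
k = 32             # each query retrieves only this many keys

keys = rng.standard_normal((n_keys, d)).astype(np.float32)
values = rng.standard_normal((n_keys, d)).astype(np.float32)
query = rng.standard_normal(d).astype(np.float32)

# Retrieval step: in practice this is a kNN index (e.g. faiss); brute force here.
scores = keys @ query
top = np.argpartition(scores, -k)[-k:]

# Standard softmax attention, but only over the retrieved keys.
logits = scores[top] / np.sqrt(d)
weights = np.exp(logits - logits.max())
weights /= weights.sum()
output = weights @ values[top]
print(output.shape)  # (64,): one attended vector computed from k of the n_keys entries
```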