Caching LLM prompts is a great way to reduce expenses. It works by using vector search to find a previously seen prompt that is semantically similar to the incoming one and returning that prompt's cached response. If no sufficiently similar prompt is found, the request is passed on to the LLM provider to generate a fresh completion.
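At a high level, the flow is a cache-aside pattern. The sketch below is a minimal illustration, not a specific library's API; the embedding model, cache, and LLM client are passed in as callables so you can plug in whatever you use.

```python
from typing import Callable, Optional, Sequence

def get_completion(
    prompt: str,
    embed: Callable[[str], Sequence[float]],                   # prompt -> embedding vector
    cache_lookup: Callable[[Sequence[float]], Optional[str]],  # vector -> cached response, or None
    cache_store: Callable[[str, Sequence[float], str], None],  # save prompt, vector, response
    call_llm: Callable[[str], str],                            # prompt -> fresh completion
) -> str:
    vector = embed(prompt)
    cached = cache_lookup(vector)
    if cached is not None:
        return cached                          # cache hit: skip the paid LLM call
    response = call_llm(prompt)                # cache miss: go to the provider
    cache_store(prompt, vector, response)      # save for future similar prompts
    return response
```

With Redis as the cache, the vector index can be created like this: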
FT.CREATE llmcache ON JSON PREFIX 1 "llm:" SCHEMA $.prompt_vector_ AS prompt_vector_ VECTOR HNSW 6 TYPE FLOAT64 DIM 1536 DISTANCE_METRIC COSINE
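Assuming Redis Stack with the index above and the redis-py client, the lookup and store helpers might look like the following sketch. The `prompt`/`response` JSON fields and the 0.1 distance threshold are illustrative assumptions, not part of the index definition.

```python
import numpy as np
import redis
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379)

def cache_lookup(prompt_vector, max_distance=0.1):
    """Return the cached response for the nearest stored prompt,
    or None if nothing is within the distance threshold."""
    q = (
        Query("*=>[KNN 1 @prompt_vector_ $vec AS score]")
        .return_field("$.response", as_field="response")
        .return_field("score")
        .dialect(2)
    )
    params = {"vec": np.asarray(prompt_vector, dtype=np.float64).tobytes()}
    res = r.ft("llmcache").search(q, query_params=params)
    # COSINE returns a distance: 0 means identical, larger means less similar.
    if res.docs and float(res.docs[0].score) <= max_distance:
        return res.docs[0].response
    return None

def cache_store(key, prompt, prompt_vector, response):
    """Write a prompt/response pair under the llm: prefix so the index picks it up."""
    r.json().set(f"llm:{key}", "$", {
        "prompt": prompt,
        "prompt_vector_": list(prompt_vector),  # 1536 floats, matching DIM above
        "response": response,
    })
```

These two helpers slot directly into the `cache_lookup` and `cache_store` parameters of the `get_completion` sketch above.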
If you need help with any of these steps, you can reach out to the Relevance AI support team via live chat.
- Daniel, Founder @ Relevance AI