KV cache in transformers
“This will be a small experiment.” Those were my thoughts when I started working on it a month ago. It is a simple concept: cache previous results for future calculations, similar to memoization in dynamic programming in DSA. I could not have been further from the truth. Well, ...
Jul 24, 2025 · 3 min read
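The dynamic-programming analogy can be made concrete with a minimal sketch. It assumes a single attention head, no batching, and NumPy. The names `KVCache`, `decode_with_cache` and `decode_without_cache` are hypothetical and used only for illustration. During autoregressive decoding, the keys and values of earlier tokens do not change, so they can be stored once and reused instead of being recomputed for the whole prefix at every step.

```python
import numpy as np

def attention(q, K, V):
    # Scaled dot-product attention for one query: softmax(K q / sqrt(d)) V
    scores = K @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())  # subtract max for numerical stability
    weights /= weights.sum()
    return weights @ V

class KVCache:
    """Hypothetical minimal cache holding the keys/values of past tokens."""
    def __init__(self, d_head):
        self.K = np.empty((0, d_head))
        self.V = np.empty((0, d_head))

    def append(self, k, v):
        # vstack copies on every call; fine for a sketch, not for real workloads
        self.K = np.vstack([self.K, k])
        self.V = np.vstack([self.V, v])

def decode_with_cache(X, Wq, Wk, Wv):
    cache = KVCache(Wk.shape[1])
    outputs = []
    for x in X:
        # Only the new token's k and v are computed; earlier ones come from the cache
        cache.append(x @ Wk, x @ Wv)
        outputs.append(attention(x @ Wq, cache.K, cache.V))
    return np.array(outputs)

def decode_without_cache(X, Wq, Wk, Wv):
    outputs = []
    for t in range(1, len(X) + 1):
        # Recompute K and V for the entire prefix at every step
        prefix = X[:t]
        K, V = prefix @ Wk, prefix @ Wv
        outputs.append(attention(X[t - 1] @ Wq, K, V))
    return np.array(outputs)

rng = np.random.default_rng(0)
d_model, d_head, seq_len = 8, 4, 5
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))

cached = decode_with_cache(X, Wq, Wk, Wv)
recomputed = decode_without_cache(X, Wq, Wk, Wv)
print(np.allclose(cached, recomputed))  # both paths give the same attention outputs
```

Both functions produce the same outputs up to floating-point rounding. The cached version projects each token once instead of re-projecting the whole prefix at every step. That is the trade the KV cache makes: extra memory in exchange for less recomputation.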
