• vivendi@programming.dev
      20 hours ago

      This is unironically a technique for catching LLM errors and also for speeding up generation.

      For example, speculative decoding and mixture-of-experts architectures use this kind of setup, where one model drafts and another checks or routes.
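
      A minimal sketch of the draft-and-verify loop behind speculative decoding, just to make the idea concrete. `draft_model` and `target_model` are hypothetical callables standing in for a small, fast LLM and the large model; real implementations accept drafts probabilistically against the target's distribution and verify them in a single batched forward pass, but the greedy version below shows the structure.

      ```python
      # Toy sketch of speculative decoding: a cheap draft model proposes a
      # chunk of tokens, the expensive target model verifies them and keeps
      # only the prefix it agrees with.
      from typing import Callable, List

      def speculative_decode(
          prompt: List[int],
          draft_model: Callable[[List[int]], int],   # hypothetical: returns one next-token id
          target_model: Callable[[List[int]], int],  # hypothetical: returns one next-token id
          k: int = 4,                                # tokens drafted per round
          max_new_tokens: int = 64,
      ) -> List[int]:
          tokens = list(prompt)
          generated = 0
          while generated < max_new_tokens:
              # 1. The cheap draft model proposes k tokens greedily.
              draft, ctx = [], list(tokens)
              for _ in range(k):
                  t = draft_model(ctx)
                  draft.append(t)
                  ctx.append(t)

              # 2. The target model checks each drafted token; on the first
              #    disagreement its own token is kept and the rest of the
              #    draft is thrown away.
              for t in draft:
                  expected = target_model(tokens)
                  if expected == t:
                      tokens.append(t)         # draft token accepted
                  else:
                      tokens.append(expected)  # mismatch: target's token wins
                      generated += 1
                      break
                  generated += 1
                  if generated >= max_new_tokens:
                      break
          return tokens
      ```

      The speed-up comes from the fact that verifying k drafted tokens costs the big model roughly one forward pass instead of k sequential ones, while the acceptance check is also what catches the draft model's mistakes.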