• vivendi@programming.dev
    19 hours ago

    This is unironically a technique for catching LLM errors and also for speeding up generation.

    For example, speculative decoding and mixture-of-experts architectures use setups like this; a rough sketch of the speculative-decoding loop is below.
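
    Here is a minimal sketch of the speculative-decoding idea: a cheap draft model proposes several tokens, and the expensive target model verifies them, accepting each with probability min(1, p_target/p_draft). The `draft_probs`, `target_probs`, and toy vocabulary are hypothetical stand-ins for illustration, not any particular library's API.

    ```python
    import random

    # Toy vocabulary and models; illustrative only, not a real LLM API.
    VOCAB = ["the", "cat", "sat", "on", "mat", "."]

    def draft_probs(context):
        """Cheap 'draft' model: near-uniform guess over the vocabulary."""
        return {tok: 1.0 / len(VOCAB) for tok in VOCAB}

    def target_probs(context):
        """Expensive 'target' model: prefers plausible continuations."""
        weights = {tok: 1.0 for tok in VOCAB}
        if context and context[-1] == "the":
            weights["cat"] = 5.0
            weights["mat"] = 5.0
        total = sum(weights.values())
        return {tok: w / total for tok, w in weights.items()}

    def sample(probs):
        toks, ps = zip(*probs.items())
        return random.choices(toks, weights=ps, k=1)[0]

    def speculative_step(context, k=4):
        """One round: draft k tokens cheaply, then let the target model verify."""
        # 1. Draft phase: the small model proposes k tokens autoregressively.
        proposed, ctx = [], list(context)
        for _ in range(k):
            tok = sample(draft_probs(ctx))
            proposed.append(tok)
            ctx.append(tok)

        # 2. Verify phase: accept each token with prob min(1, p_target/p_draft).
        accepted, ctx = [], list(context)
        for tok in proposed:
            p_t = target_probs(ctx)[tok]
            p_d = draft_probs(ctx)[tok]
            if random.random() < min(1.0, p_t / p_d):
                accepted.append(tok)
                ctx.append(tok)
            else:
                # Rejected: resample from the residual target distribution, stop.
                residual = {t: max(target_probs(ctx)[t] - draft_probs(ctx)[t], 0.0)
                            for t in VOCAB}
                norm = sum(residual.values())
                if norm > 0:
                    residual = {t: p / norm for t, p in residual.items()}
                    accepted.append(sample(residual))
                break
        return accepted

    if __name__ == "__main__":
        context = ["the"]
        for _ in range(3):
            context.extend(speculative_step(context))
        print(" ".join(context))
    ```

    The verification step is where errors get caught: tokens the big model disagrees with are rejected and resampled, while agreed-upon tokens come out several at a time, which is where the speedup comes from.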