• jocanib@lemmy.world (OP)
    2 years ago

    In context. And that is exactly how they work. It’s just a statistical prediction model with billions of parameters.

    • keegomatic@kbin.social
      2 years ago

      That’s not really how LLMs work. You’re basically describing Markov chains. The statement “It’s just a statistical prediction model with billions of parameters” also applies to the human brain. An LLM is much more of a black box than you’re implying.

    • Zeth0s@lemmy.world
      2 years ago

      “regurgitate the next word most commonly used by humans in any given context” is not what it does. That would produce nonsensical text (you can try it yourself).
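
To try it, here is a minimal sketch of a generator that really does pick "the next most commonly used word". It is a toy bigram counter, not how any LLM works; the corpus and the function names `build_bigram_counts` and `most_common_next` are made up for illustration.

```python
from collections import Counter, defaultdict


def build_bigram_counts(text):
    """Count, for each word, which words follow it in the corpus."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def most_common_next(counts, start, length):
    """Repeatedly append the single most frequent follower of the last word."""
    output = [start]
    for _ in range(length):
        followers = counts.get(output[-1])
        if not followers:
            break  # the last word never had a follower in the corpus
        output.append(followers.most_common(1)[0][0])
    return output


# Tiny made-up corpus, chosen so that no counts are tied.
corpus = "the cat sat on the mat the cat sat on the sofa the cat ate the fish"
counts = build_bigram_counts(corpus)
generated = most_common_next(counts, "the", 8)
print(" ".join(generated))  # prints: the cat sat on the cat sat on the
```

With only the previous word as context and always taking the top count, the output falls into a loop almost immediately.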

      This is a summary of the method, as written by GPT-4:


      Sure, here is a detailed description of how text is generated with ChatGPT, which is based on the GPT architecture:

      1. Initial Prompt: The process begins with an input prompt. This could be something like “Tell me about the weather today” or any other string of text.
      2. Tokenization: The input text is broken down into smaller parts, called tokens, which can represent words, parts of words, or punctuation. GPT uses a byte pair encoding (BPE) tokenization, which essentially breaks down text into commonly occurring chunks.
      3. Embedding: Each token is then turned into a vector via an embedding. This vector captures semantic information about the token and serves as the input for the model.
      4. Processing the Input: The GPT model processes the input vectors sequentially with a stack of transformer layers. Each layer applies self-attention and feeds its output into the next layer.
      5. Self-Attention Mechanism: The self-attention mechanism in the Transformer model allows it to weigh the importance of different words when predicting the next word. For example, when trying to predict the last word in the sentence “The cat sat on the ____,” the words “cat” and “on” are likely to have more influence on the prediction than “The”. This weighing is learned during training and allows the model to generate more coherent and contextually appropriate responses.
      6. Output Layer: The output from the final transformer layer for the last input token goes through a linear layer followed by a softmax function, which turns it into a probability distribution over the possible next tokens in the vocabulary. Each possible next token is assigned a probability.
      7. Sampling with Temperature: The next token is chosen based on these probabilities. One common method is to sample from this distribution, which introduces some randomness into the process. The temperature parameter controls the amount of randomness: a higher temperature makes the distribution more uniform and the output more random, while a lower temperature makes the model more likely to choose the highest-probability token.
      8. Decoding: The chosen token is then decoded back into text and appended to the output.
      9. Next Iteration: The process then repeats for the next token: the model takes the output so far (including the newly-generated token), processes it, and generates probabilities for the next token. This continues until a maximum length is reached, or an end-of-sequence token is produced.
      10. Post-Processing: Any necessary post-processing is applied, such as cleaning up tokenization artifacts.
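
As a rough illustration of the self-attention step in the list above, here is a minimal sketch of scaled dot-product attention for a single query. The two-dimensional vectors are invented by hand so that "cat" and "on" score highest; in a real model the queries, keys, and values come from learned projections, and the name `attend` is just a label for this sketch.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # The output is a weighted average of the value vectors.
    mixed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, mixed


tokens = ["The", "cat", "sat", "on", "the"]
# Hand-picked toy vectors (an assumption, not learned values).
keys = [[0.1, 0.0], [1.0, 0.5], [0.4, 0.2], [0.8, 0.9], [0.1, 0.0]]
values = keys
query = [1.0, 1.0]  # stands in for the position predicting the next word

weights, mixed = attend(query, keys, values)
for token, weight in zip(tokens, weights):
    print(f"{token:>4}: {weight:.3f}")
```

The printed weights show how much each earlier token contributes to the mixed vector that later layers use.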

      In this way, the model generates a sequence of tokens, one at a time, based on the input prompt and the tokens it has generated so far. Please note that while this process typically uses sampling with a temperature parameter, other methods like beam search or top-k sampling can also be used to choose the next token. These methods have different trade-offs in terms of computational efficiency, diversity, and quality of output.
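
The output layer, temperature sampling, and iteration steps can be sketched in a few lines. This is a minimal sketch under stated assumptions: `toy_next_logits` is a hypothetical stand-in for a trained model, the vocabulary has only four token ids, and id 3 plays the role of the end-of-sequence token.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; temperature must be > 0."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, top_k=None, rng=random):
    """Sample one token id, optionally restricted to the top_k highest logits."""
    indices = list(range(len(logits)))
    if top_k is not None:
        indices = sorted(indices, key=lambda i: logits[i], reverse=True)[:top_k]
    probs = softmax_with_temperature([logits[i] for i in indices], temperature)
    return rng.choices(indices, weights=probs, k=1)[0]


def generate(next_logits, prompt, max_new_tokens, eos_id,
             temperature=1.0, top_k=None, rng=random):
    """Append one sampled token at a time until EOS or the length limit."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        token = sample_token(next_logits(tokens), temperature, top_k, rng)
        tokens.append(token)
        if token == eos_id:
            break
    return tokens


def toy_next_logits(tokens):
    """Hypothetical model: favors the id that follows the last token."""
    favored = (tokens[-1] + 1) % 4
    return [3.0 if i == favored else 0.0 for i in range(4)]


cold = softmax_with_temperature([2.0, 1.0, 0.0], 0.5)
hot = softmax_with_temperature([2.0, 1.0, 0.0], 2.0)
print(cold[0] > hot[0])  # lower temperature concentrates probability on the top token

sampled = generate(toy_next_logits, [0], 10, eos_id=3,
                   temperature=1.0, rng=random.Random(0))
print(sampled)
```

Passing `top_k=1` makes the choice deterministic (greedy), while higher temperatures spread probability over more tokens.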


      You are missing the key part, where the text is transformed into a vector space of “concepts” in which semantic relationships are represented. That is where the inference happens. The inference is not done on words to get the next most commonly used word; otherwise it wouldn’t work. You also missed the final sampling step, which introduces randomness into the word selection.
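
To make the "vector space" point concrete, here is a minimal sketch using cosine similarity. The three-dimensional vectors in `toy_embeddings` are made up by hand for illustration; real embeddings are learned during training and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-written toy vectors (an assumption, not real model weights).
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(toy_embeddings["cat"], toy_embeddings["dog"])
cat_car = cosine_similarity(toy_embeddings["cat"], toy_embeddings["car"])
print(cat_dog > cat_car)  # related words sit closer together in the space
```

The model operates on vectors like these rather than on raw word counts, which is why related words can influence a prediction even when they never appeared next to each other.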

      I don’t understand why you are so upset about a chain of complex mathematical functions that completes an input sentence. Why are you angry?

      • jocanib@lemmy.world (OP)
        2 years ago

        You’re agreeing with me but using more words.

        I’m more annoyed than upset. This technology is eating resources which are badly needed elsewhere and all we get in return is absolute junk which will infest the literature for decades to come.

        • Zeth0s@lemmy.world
          2 years ago

          I am not agreeing with you, because “regurgitate the next most commonly used word” is not what it does.

          That said, the technology is not doing anything wrong; the people using it are. The technology is a great achievement of humankind, possibly one of the greatest. If people decide to use it to print sh*t, that is the people’s fault. Quantum mechanics is one of the greatest achievements of humankind; if people decided to use it to kill people, that is the fault of the people. Many humans are simply shitty. Don’t blame a clever mathematical function and its clever implementation.