
What is GPT?


GPT = Generative Pre-trained Transformer

A family of large language models, built on the Transformer architecture, that generate text one token at a time.

  • Generative - it produces (generates) text.

  • Pre-trained - first trained on huge amounts of text with a next-token prediction objective, before any task-specific tuning.

  • Transformer - the underlying architecture.

How it works:

  • Text is broken down into tokens by a tokenizer, and each token is mapped to a vector (an embedding).

  • A stack of Transformer decoder layers applies self-attention and feed-forward networks to compute a probability distribution over the next token.

  • During inference, the model either samples from that distribution or picks the highest-probability token, appends it to the input, and repeats the process (see the sketch after this list).
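
To make the loop concrete, here is a minimal greedy-decoding sketch. It assumes Hugging Face's transformers library and the public GPT-2 checkpoint; treat it as an illustration of the idea, not production code.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    # Tokenize the prompt into a tensor of token ids.
    input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

    with torch.no_grad():
        for _ in range(10):                        # generate 10 new tokens
            logits = model(input_ids).logits       # shape: (1, seq_len, vocab_size)
            next_id = torch.argmax(logits[0, -1])  # greedy: most likely next token
            input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

    print(tokenizer.decode(input_ids[0]))

Swapping the argmax for sampling from the softmax of the logits gives the "samples" variant; real decoders also add tricks like temperature and top-p filtering.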

Why GPT is powerful

  • It is trained on massive text datasets, so it absorbs broad knowledge of language, facts, and style.

  • You can teach it new behavior at inference time simply by providing examples in the prompt (in-context, or "few-shot", learning), as shown below.
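
As an illustration of in-context learning, the snippet below packs a few worked examples into a single prompt; the prompt itself is invented for this post, not taken from any particular model's docs.

    # A made-up few-shot prompt: the three worked examples define
    # the task (English -> French) entirely through context.
    prompt = """Translate English to French.
    sea: mer
    cat: chat
    bread: pain
    cheese:"""

    # Fed to a GPT-style model, the most likely continuation is
    # " fromage": the model infers the pattern from the examples
    # alone, with no change to its weights.

No retraining happens here; the model simply continues the text in the most statistically plausible way, which is what makes prompting such a cheap way to "program" it.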
