Introduction

Tokenization: raw text is split into discrete units (tokens), typically subwords from a learned vocabulary such as byte-pair encoding (BPE), and each token is mapped to an integer ID the model consumes. A minimal sketch follows.
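
Below is a minimal, self-contained word-level tokenizer in Python. The vocabulary, the <unk> token, and the function names are toy assumptions for illustration; real LLM tokenizers use learned subword vocabularies with tens of thousands of entries.

```python
# Minimal word-level tokenizer sketch. The vocabulary and special token
# here are toy assumptions; production tokenizers learn subword merges
# (e.g. BPE) from the training corpus.
UNK = "<unk>"
VOCAB = [UNK, "the", "cat", "sat", "on", "mat"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}
ID_TO_TOKEN = {i: tok for tok, i in TOKEN_TO_ID.items()}

def tokenize(text: str) -> list[int]:
    """Split on whitespace and map each word to its integer ID."""
    return [TOKEN_TO_ID.get(w, TOKEN_TO_ID[UNK]) for w in text.lower().split()]

def detokenize(ids: list[int]) -> str:
    """Map IDs back to tokens and rejoin them."""
    return " ".join(ID_TO_TOKEN[i] for i in ids)

print(tokenize("The cat sat on the mat"))  # [1, 2, 3, 4, 1, 5]
print(detokenize([1, 2, 3]))               # "the cat sat"
```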

Workflow:

  1. Collect large text corpus
  2. Tokenize text → sequences of token IDs
  3. Pretrain transformer on next-token (causal) or masked-token prediction (see the sketch after this list)
  4. Fine-tune on task-specific datasets
  5. Deploy for inference → text generation / Q&A / summarization
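
As a complement to steps 3 and 5, here is a toy PyTorch sketch that pretrains a tiny causal transformer on random token IDs with next-token cross-entropy, then decodes greedily at inference. TinyLM, the sizes, and the random batch are illustrative assumptions, not a prescribed recipe.

```python
# Toy sketch of steps 3 and 5: pretrain a tiny causal transformer with
# next-token cross-entropy, then decode greedily. All sizes and data
# here are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB_SIZE, DIM, CONTEXT = 1000, 32, 8

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(VOCAB_SIZE, DIM)   # token embeddings
        self.pos = nn.Embedding(CONTEXT, DIM)      # learned positions
        layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, VOCAB_SIZE)     # logits over the vocabulary

    def forward(self, ids):
        pos = torch.arange(ids.size(1), device=ids.device)
        h = self.tok(ids) + self.pos(pos)
        # Causal mask: position t may attend only to positions <= t.
        mask = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        return self.head(self.blocks(h, mask=mask))

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

# One pretraining step on a random toy batch: position t predicts token t+1.
batch = torch.randint(0, VOCAB_SIZE, (4, CONTEXT + 1))
inputs, targets = batch[:, :-1], batch[:, 1:]
loss = loss_fn(model(inputs).reshape(-1, VOCAB_SIZE), targets.reshape(-1))
loss.backward()
opt.step()

# Greedy decoding at inference: repeatedly append the most likely next token.
@torch.no_grad()
def generate(model, ids, n_new):
    model.eval()  # disable dropout for deterministic decoding
    for _ in range(n_new):
        next_id = model(ids)[:, -1].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
    return ids

print(generate(model, torch.tensor([[1, 2]]), n_new=5))
```

Fine-tuning (step 4) reuses the same loss and training loop, just with task-specific input/target pairs in place of raw corpus batches.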