📄️ The Illustrated Transformer (Notes)
- Attention mechanism
📄️ Transformer Inference Arithmetic
As of 2022-03-30
📄️ Chat Template
https://github.com/openai/openai-python/blob/release-v0.28.0/chatml.md
📄️ Why Do LLMs Use a Decoder-Only Architecture?
1. Encoder-only models like BERT are ruled out: they are pretrained with masked language modeling, which makes them poorly suited to generation tasks.
📄️ Speculative Decoding
Speculative decoding uses two models: a small draft model that cheaply proposes tokens and a large target model that verifies them.
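A minimal sketch of the draft-and-verify loop, assuming hypothetical `draft_model` and `target_model` callables that map a token list to a next-token probability distribution (a 1-D PyTorch tensor over the vocabulary); this is an illustration under those assumptions, not the implementation from the linked note:

```python
import torch

def speculative_decode(draft_model, target_model, tokens, k=4):
    """Propose k tokens with the cheap draft model, then verify them
    with the expensive target model via rejection sampling."""
    # 1. Draft: sample k tokens autoregressively from the small model.
    draft_tokens, draft_probs = [], []
    ctx = list(tokens)
    for _ in range(k):
        p = draft_model(ctx)                # next-token distribution
        t = torch.multinomial(p, 1).item()
        draft_tokens.append(t)
        draft_probs.append(p[t].item())
        ctx.append(t)

    # 2. Verify: score the proposals with the target model. (A real
    #    implementation checks all k positions in one batched forward
    #    pass; this toy loops for clarity.)
    accepted = []
    ctx = list(tokens)
    for t, q in zip(draft_tokens, draft_probs):
        p = target_model(ctx)[t].item()
        # Accept with probability min(1, p_target / p_draft).
        if torch.rand(1).item() < min(1.0, p / q):
            accepted.append(t)
            ctx.append(t)
        else:
            # First rejection ends the run; a full implementation would
            # resample one token from the residual distribution here.
            break
    return accepted
```

The min(1, p_target / p_draft) acceptance rule is what makes the trick exact: accepted tokens are distributed as if they had been sampled from the target model directly, so quality is preserved while most decoding steps cost only a draft-model forward pass.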
📄️ ChatGPT Fine Tuning
When to Fine Tune?
📄️ Train Domain LLM with GRPO
Motivation