Build Your LLM

Animated walkthrough — tokenization · n-gram training · temperature sampling · text generation

Step 1 of 6
1 Use Case · 2 Data · 3 Tokenize · 4 Configure · 5 Train · 6 Generate
🎯 Choose Your Use Case
Select a domain for your LLM to learn from.
🧚 Fairy Tales: classic storytelling patterns, "Once upon a time" narratives with magical elements
🎬 Movie Reviews: film-critique vocabulary and sentiment patterns from editorial reviews
🍳 Cooking: culinary instructions with ingredient lists, methods, and techniques
Sports Commentary: athletic vocabulary, scores, and play-by-play match language
💻 Tech Docs: technical documentation language with APIs, functions, and system design
🌹 Poetry: lyrical verse with rhythmic patterns, metaphor, and expressive language
📚 Training Data
247 words of fairy-tale text loaded.
Total words: 247 · Unique words: 127 · Average frequency: 1.9x (total words ÷ unique words)
Once upon a time in a dark forest there lived a young girl named Rose. The forest was filled with ancient trees whose branches twisted toward the sky like grasping fingers. Rose was brave and curious and she ventured deeper into the forest each day. One morning she discovered a small cottage hidden behind a curtain of ivy. The door was painted red and the windows glowed with a warm golden light. She knocked three times and waited. A voice called out from within bidding her enter. Inside the cottage sat an old woman with silver hair and kind eyes. The woman offered Rose a bowl of warm soup and a place by the fire. They talked for many hours about the magic hidden within the forest. The old woman knew the secret paths and the ancient songs that made the trees dance. She told Rose that the forest was alive and that every creature within it had a story to tell. Rose listened carefully to each tale the old woman shared. As the sun began to set Rose thanked the woman and stepped back into the forest. The trees seemed to lean toward her as she walked their branches swaying gently as if in greeting. From that day forward Rose visited the cottage every week and learned the old ways of the forest folk. She became a keeper of stories a bridge between the ancient world and the new.
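The three statistics above are simple ratios over the corpus. A minimal sketch of how they could be computed, assuming a "word" is a lowercase run of letters and apostrophes (the page does not state its counting rules, so its figures may differ from what this produces); `corpus_stats` is a hypothetical helper name:

```python
import re

def corpus_stats(text: str) -> dict:
    """Total words, unique words, and average occurrences per unique word.

    Assumption (hypothetical rule): words are runs of letters/apostrophes,
    compared case-insensitively; punctuation is ignored.
    """
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    unique = len(set(words))
    avg = total / unique if unique else 0.0  # e.g. 247 / 127 ≈ 1.9
    return {"total": total, "unique": unique, "avg_frequency": avg}

stats = corpus_stats("Once upon a time in a dark forest there lived a young girl.")
print(f"{stats['total']} words, {stats['unique']} unique, {stats['avg_frequency']:.1f}x")
# 13 words, 11 unique, 1.2x
```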
🔤 Tokenization
Split the corpus into tokens the model can learn from.
Token count: 247 · Vocabulary size: 127
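Tokenization turns the text into a token sequence (the token count) and a mapping from each distinct token to an integer id (the vocabulary size). A minimal word-level sketch, assuming lowercasing and dropping punctuation; the walkthrough's actual tokenizer is not described, and `tokenize` / `build_vocab` are hypothetical names:

```python
import re

def tokenize(text: str) -> list[str]:
    # Assumption: word-level tokens, lowercased, punctuation dropped
    return re.findall(r"[a-z']+", text.lower())

def build_vocab(tokens: list[str]) -> dict[str, int]:
    # Give each distinct token an integer id in order of first appearance
    vocab: dict[str, int] = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

tokens = tokenize("The forest was alive. The trees seemed to lean.")
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
print(len(tokens), len(vocab))  # 9 8  (token count, vocabulary size)
print(ids)                      # [0, 1, 2, 3, 0, 4, 5, 6, 7]
```

Repeated words ("the") reuse the same id, which is why the vocabulary is smaller than the token count.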
⚙️ Configure Your LLM
Set the n-gram size, number of training epochs, and sampling temperature.
N-gram size (context window): Bigram (1-token context) · Trigram (2-token context) · 4-gram (3-token context)
Training epochs: 50
Temperature (sampling creativity): 0.8
📐 Bigram model: each word is predicted from the single preceding word. With 247 training tokens and Laplace smoothing, the estimated perplexity after 50 epochs is about 28.
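The settings above correspond to standard textbook formulas, sketched here assuming the tool uses plain add-one (Laplace) smoothing; C(·) is a count in the training tokens, C of a context means how often that context is followed by some token, |V| is the vocabulary size (127 here), h is the current context, and T is the temperature. If the Loss readout is the mean negative log-likelihood per token (an assumption; the page does not define it), Perplexity is exp(Loss).

```latex
% Laplace (add-one) smoothed n-gram estimate
P(w_i \mid w_{i-n+1}^{i-1}) = \frac{C(w_{i-n+1}^{i}) + 1}{C(w_{i-n+1}^{i-1}) + |V|}

% Bigram case (n = 2): e.g. a context word seen 3 times gives an unseen
% follower probability 1 / (3 + 127) = 1/130 \approx 0.0077
P(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i) + 1}{C(w_{i-1}) + |V|}

% Perplexity over N predicted tokens
\mathrm{PPL} = \exp\!\Big(-\frac{1}{N}\sum_{i=1}^{N} \log P(w_i \mid w_{i-n+1}^{i-1})\Big)

% Temperature-scaled sampling distribution (T > 0);
% T = 0.8 raises probabilities to the power 1.25, sharpening the distribution
p_T(w) = \frac{P(w \mid h)^{1/T}}{\sum_{w' \in V} P(w' \mid h)^{1/T}}
```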
🏋️ Train Your LLM
Building the n-gram probability table with Laplace smoothing.
Epoch: 0 / 50 · Loss · Perplexity
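A count-based n-gram table with Laplace smoothing can be built in a single pass over the tokens; in this sketch the counts are closed-form, so repeating the pass would not be needed (the page does not say what its 50 epochs do beyond that). The sketch counts (context, next-token) pairs and computes training-set perplexity as exp of the mean negative log-likelihood. `train_ngram`, `prob`, and `perplexity` are hypothetical names, and the toy corpus is illustrative only:

```python
import math
from collections import Counter, defaultdict

def train_ngram(tokens: list[str], n: int = 2):
    """Count (context -> next token) pairs; context = previous n-1 tokens."""
    counts: dict[tuple[str, ...], Counter] = defaultdict(Counter)
    for i in range(n - 1, len(tokens)):
        context = tuple(tokens[i - n + 1:i])
        counts[context][tokens[i]] += 1
    return counts, set(tokens)

def prob(counts, vocab, context, word) -> float:
    # Add-one (Laplace) smoothing: (C(context, word) + 1) / (C(context) + |V|)
    ctx = counts.get(tuple(context), Counter())
    return (ctx[word] + 1) / (sum(ctx.values()) + len(vocab))

def perplexity(counts, vocab, tokens: list[str], n: int = 2) -> float:
    # exp(mean negative log-likelihood) over every position with a full context
    positions = range(n - 1, len(tokens))
    if len(positions) == 0:
        raise ValueError("need at least n tokens")
    nll = -sum(math.log(prob(counts, vocab, tokens[i - n + 1:i], tokens[i]))
               for i in positions)
    return math.exp(nll / len(positions))

tokens = "the cat sat on the mat the cat ran".split()
counts, vocab = train_ngram(tokens, n=2)
print(prob(counts, vocab, ["the"], "cat"))  # (2 + 1) / (3 + 6) = 0.333...
print(perplexity(counts, vocab, tokens, n=2))
```

Smoothing is applied at lookup time rather than stored, so unseen pairs never need table entries.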
✨ Generate Text
Your LLM generates text token by token.
Once upon a
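Generation repeatedly samples the next token from the smoothed distribution for the current context and appends it. A minimal bigram sketch with temperature, assuming the probabilities are raised to 1/T and renormalized; `train_bigram`, `sample_next`, and `generate` are hypothetical names, and the seed only makes the demo reproducible:

```python
import random
from collections import Counter, defaultdict

def train_bigram(tokens: list[str]):
    """Count bigram transitions; returns (counts, sorted vocabulary)."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts, sorted(set(tokens))

def sample_next(counts, vocab, prev, temperature, rng):
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    ctx = counts.get(prev, Counter())
    total = sum(ctx.values())
    # Laplace-smoothed P(w | prev) for every word in the vocabulary
    probs = [(ctx[w] + 1) / (total + len(vocab)) for w in vocab]
    # Temperature: p ** (1/T); random.choices renormalizes the weights itself
    weights = [p ** (1.0 / temperature) for p in probs]
    return rng.choices(vocab, weights=weights, k=1)[0]

def generate(counts, vocab, prompt, max_new_tokens=10, temperature=0.8, seed=0):
    rng = random.Random(seed)  # seeded so the demo output is reproducible
    out = prompt.lower().split()
    for _ in range(max_new_tokens):
        # Bigram model: only the last token conditions the next choice
        out.append(sample_next(counts, vocab, out[-1], temperature, rng))
    return " ".join(out)

corpus = "once upon a time a girl lived in a forest and the forest was alive".split()
counts, vocab = train_bigram(corpus)
print(generate(counts, vocab, "Once upon a", max_new_tokens=8))
```

Lower temperatures push sampling toward the most frequent follower; higher ones spread probability toward rarer (including unseen, smoothed) continuations.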
Select a domain — your LLM will learn language patterns from that corpus.