Yet another basic AI glossary part 1
Mini essays, AI


AI & Machine Learning Glossary for Beginners

This is the base of what I need to learn to better understand all that „AI” and „LLM” talk. Feel free to go through all of it and dive deeper into those subjects. Defined here are AI concepts, ideas, math functions, slang, and anything else that might be helpful in better understanding „the whole lot”.

1. Logit

A logit is the raw output we get from a model before any normalization is applied, in particular before the softmax function. During classification, logits express the model's confidence in every possible output.

2. Logit Definition (Mathematical View)

In math, logits are real numbers output by an LLM's final layer (and we have many of those, one per token in the vocabulary). After applying the softmax function, logits become values between 0 and 1: probabilities that sum to 1.

3. Softmax function

Softmax converts a vector of logits into a probability distribution. It is a smooth, differentiable function, which is what makes it usable in gradient-based training. After applying it, each number becomes a value between 0 and 1, with all probabilities adding up to 1. The function itself is deterministic, so we end up with a table of values, and the tokens with the biggest probabilities are the candidates for the next output token.
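
As a quick sketch (plain Python, no libraries), softmax can be implemented like this:

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    # Subtract the max logit for numerical stability (does not change the result).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# The logit values here are made up for illustration.
probs = softmax([2.0, 1.0, 0.1])
```

The highest logit always gets the highest probability, and the outputs always sum to 1.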

4. Temperature (in AI sampling)

Temperature controls randomness (variability) in generative LLMs. Low temperature (e.g., 0.2) makes the model choose the most predictable words, giving more stable but repetitive results. High temperature (e.g., 1.0 or 1.5), on the other hand, gives the model permission to pick less likely suggestions, creating more varied output. We humans call it „creativity”.
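
A minimal sketch of how temperature is typically applied: the logits are divided by the temperature before softmax (the logit values below are made up for illustration):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/temperature, then apply softmax.

    Low temperature sharpens the distribution (more predictable picks);
    high temperature flattens it (more 'creative' picks).
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.2)  # near-greedy: top token dominates
hot = softmax_with_temperature(logits, 1.5)   # flatter: more randomness when sampling
```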

5. Top-k Sampling

Top-k sampling limits the model's choice to the k most likely tokens according to the probability distribution, introducing controlled randomness while maintaining coherence.

6. Top-p (Nucleus Sampling)

Top-p sampling (also called nucleus sampling) selects the smallest set of tokens (starting from the token with the highest probability) whose combined probability exceeds p (e.g., 0.9). This helps keep outputs diverse but contextually grounded.

When you define both, it becomes a matter of which threshold is satisfied first: whichever cuts off the candidate list earlier ends the selection of new tokens.

| Token | Probability | Top-k=2 | Top-p=0.8 | Cumulative Probability |
|-------|-------------|---------|-----------|------------------------|
| eat   | 0.40 | ✅ Yes | ✅ Yes | 0.40 |
| sleep | 0.25 | ✅ Yes | ✅ Yes | 0.65 |
| play  | 0.17 | ❌ No  | ✅ Yes | 0.82 |
| run   | 0.13 | ❌ No  | ❌ No  | 0.95 |
| jump  | 0.05 | ❌ No  | ❌ No  | 1.00 |
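
The two filters from the table can be sketched in a few lines of Python (the token probabilities are the illustrative values from the table):

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

def top_p_filter(probs, p):
    """Keep the smallest top-ranked set whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for token, prob in ranked:
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return kept

probs = {"eat": 0.40, "sleep": 0.25, "play": 0.17, "run": 0.13, "jump": 0.05}
# top_k_filter(probs, 2)   keeps eat, sleep
# top_p_filter(probs, 0.8) keeps eat, sleep, play (cumulative 0.82 >= 0.8)
```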

7. Random Forest

A random forest is an ensemble of multiple decision trees. Each tree votes independently and the results are summarized: the majority decides the classification. This approach improves accuracy while avoiding overfitting, by averaging predictions from multiple trees trained on different data samples.

```
RANDOM FOREST TREE #1 (Trained on Bootstrap Sample A)

Root Node (n=150 samples)
├── petal length ≤ 2.45 cm? [gini=0.667, samples=50/150]
│   ├── YES (42 samples) → petal width ≤ 1.75 cm? [gini=0.168]
│   │   ├── YES (37 samples) → setosa [100%, gini=0.0] ⭐ FINAL CLASS
│   │   └── NO (5 samples) → versicolor [93%, gini=0.124]
│   └── NO (8 samples) → petal width ≤ 1.75 cm? [gini=0.375]
│       ├── YES (4 samples) → versicolor [100%, gini=0.0]
│       └── NO (4 samples) → virginica [100%, gini=0.0]
└── petal length > 2.45 cm? [gini=0.500, samples=100/150]
    ├── petal width ≤ 1.75 cm? [gini=0.160, samples=54/100]
    │   ├── YES (48 samples) → versicolor [95%, gini=0.095]
    │   └── NO (6 samples) → virginica [100%, gini=0.0]
    └── petal width > 1.75 cm? [gini=0.032, samples=46/100]
        ├── petal length ≤ 4.95 cm? [gini=0.199, samples=24/46]
        │   ├── YES (12 samples) → versicolor [92%, gini=0.160]
        │   └── NO (12 samples) → virginica [100%, gini=0.0]
        └── petal length > 4.95 cm? → virginica [100%, samples=22]
```
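
A minimal sketch of the voting idea in plain Python. The three „trees” below are hypothetical hard-coded rules on the iris petal features, not actually trained models, but the majority vote works the same way:

```python
from collections import Counter

# Each "tree" is a hypothetical decision rule over (petal_length, petal_width).
def tree_1(petal_length, petal_width):
    if petal_length <= 2.45:
        return "setosa"
    return "versicolor" if petal_width <= 1.75 else "virginica"

def tree_2(petal_length, petal_width):
    if petal_width <= 0.8:
        return "setosa"
    return "versicolor" if petal_length <= 4.95 else "virginica"

def tree_3(petal_length, petal_width):
    if petal_length <= 2.5:
        return "setosa"
    return "versicolor" if petal_width <= 1.7 else "virginica"

def random_forest_predict(petal_length, petal_width):
    """Each tree votes independently; the majority class wins."""
    votes = [t(petal_length, petal_width) for t in (tree_1, tree_2, tree_3)]
    return Counter(votes).most_common(1)[0][0]
```

For example, a small flower (petal length 1.4 cm, width 0.2 cm) gets three „setosa” votes, so the forest predicts setosa.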

8. Euclidean Distance

Euclidean distance is the straight-line distance between two points in space. In vector math, it's used to evaluate how far apart data points or embeddings are: the further apart they are, the less likely they are connected in the given context.

Example: A = (1, 2), B = (4, 6), so Δx = 3 and Δy = 4: d = √[(4−1)² + (6−2)²] = √(9+16) = 5 units.
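
The worked example above can be checked with a small helper function (plain Python):

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

d = euclidean_distance((1, 2), (4, 6))  # √(3² + 4²) = 5.0
```

The same function works unchanged for high-dimensional embedding vectors, not just 2D points.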

9. Cosine Similarity

Cosine similarity measures how similar two vectors are by calculating the cosine of the angle between them. It ranges from -1 to 1, where 1 means identical direction (perfect similarity), 0 means the vectors are perpendicular (unrelated), and -1 means opposite directions. The closer the value is to 1, the more the vectors are aligned, occupying the same context space.

Example: for two vectors at an angle θ ≈ 20°, cos(θ) ≈ 0.94. Scale: 1 = identical | 0 = perpendicular | -1 = opposite.
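
A minimal implementation, assuming plain Python lists as vectors:

```python
import math

def cosine_similarity(a, b):
    """cos(θ) between two vectors: 1 = same direction, 0 = orthogonal, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Parallel vectors score 1, perpendicular ones 0, opposite ones -1.
same = cosine_similarity([1, 2], [2, 4])
perpendicular = cosine_similarity([1, 0], [0, 1])
opposite = cosine_similarity([1, 0], [-1, 0])
```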

10. Dot Product

The dot product of two vectors is the sum of the products of their corresponding elements. It reflects how much two vectors point in the same direction.
Formula: A · B = |A| |B| cos(θ)
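
A small numeric check of the formula, using made-up vectors A = [3, 4] and B = [4, 3]:

```python
import math

def dot(a, b):
    """Sum of the products of corresponding elements."""
    return sum(x * y for x, y in zip(a, b))

a, b = [3, 4], [4, 3]
elementwise = dot(a, b)                      # 3*4 + 4*3 = 24
norm_a = math.sqrt(dot(a, a))                # |A| = 5
norm_b = math.sqrt(dot(b, b))                # |B| = 5
cos_theta = elementwise / (norm_a * norm_b)  # so cos(θ) = 24 / 25 = 0.96
```

Rearranging the formula this way (cos θ = A·B / |A||B|) is exactly how cosine similarity is computed.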

11. Dot Product and Cosine Relationship

For unit vectors (where |A| = 1 and |B| = 1), the dot product directly equals the cosine of the angle between them: A · B = cos(θ).
In simple terms, this means the dot product measures directional similarity.

Three cases: acute angle (θ < 90°) → v₁ · v₂ > 0, cos θ > 0 | orthogonal (θ = 90°) → v₁ · v₂ = 0, cos θ = 0 | obtuse (θ > 90°) → v₁ · v₂ < 0, cos θ < 0.

12. Embedding

Embedding transforms any kind of data into a vector of numbers that represents „meaning”. Embeddings are key in search, recommendation systems, and semantic similarity.

„Hello world!” → [0.12, -0.05, 0.34, 0.21, -0.08]

| Word | Sample 5D Embedding | Intuition |
|------|---------------------|-----------|
| „Hello” | [0.42, 0.15, -0.03, 0.28, 0.11] | Greeting vector (warm, social dim high) |
| „world” | [-0.18, 0.07, 0.41, 0.14, -0.25] | Global/universal concept (broad dim high) |
| Average | [0.12, 0.11, 0.19, 0.21, -0.07] | Combined „welcoming to all” |

cos(„Hello world!”, „Hi everyone!”) ≈ 0.92 → Very similar (greetings)
cos(„Hello world!”, „Goodbye moon”) ≈ 0.15 → Unrelated
cos(„Hello world!”, „Shut up!”) ≈ -0.23 → Opposite sentiment
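
The „Average” row in the embedding table can be reproduced by averaging the two word vectors element-wise, a common (if crude) way to embed a whole phrase. The 5D values are illustrative, not real model outputs:

```python
hello = [0.42, 0.15, -0.03, 0.28, 0.11]
world = [-0.18, 0.07, 0.41, 0.14, -0.25]

# Element-wise mean of the two word embeddings.
phrase = [round((h + w) / 2, 2) for h, w in zip(hello, world)]
print(phrase)  # [0.12, 0.11, 0.19, 0.21, -0.07]
```

Real embedding models produce vectors with hundreds or thousands of dimensions, but the arithmetic is the same.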

13. Token

A token is a chunk of text (e.g., a word, part of a word, or punctuation) that an AI model processes. Large language models (LLMs) work by predicting one token at a time.

| Method | Example Tokens | Count | Notes |
|--------|----------------|-------|-------|
| Word-level | ["Hello", "world!"] | 2 | Simple split on spaces/punctuation |
| Subword (BPE) | ["Hel", "lo", "world", "!"] | 4 | Common in GPT models; merges frequent pairs |
| BERT WordPiece | ["hello", "world", "!"] | 3 | Handles unknowns via ## prefixes |
| Character | ["H","e","l","l","o"," ","w","o","r","l","d","!"] | 12 | Max granularity, good for spelling tasks |
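
The word-level and character-level rows are easy to reproduce in plain Python (real BPE/WordPiece tokenizers need a trained vocabulary, so they are only described in comments):

```python
text = "Hello world!"

# Word-level: naive split on whitespace (punctuation stays attached).
word_tokens = text.split()   # ['Hello', 'world!'] → 2 tokens

# Character-level: maximum granularity, one token per character.
char_tokens = list(text)     # 12 tokens, including the space

# Subword tokenizers (BPE, WordPiece) sit between these two extremes:
# they start from characters and merge frequent pairs into larger units.
```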

14. Instruction File

An instruction file defines how an AI model should behave, what tasks it should prioritize, or what tone to adopt. For example, it can set boundaries or style preferences for generation.


```yaml
title: "Pinia Stores Setup Guide – Vue 3 + Cache"
description: "Complete Vue.js Pinia with localStorage caching for cart and preferences. Refresh-proof state management."
keywords: "vue 3, pinia, localstorage cache, vue store, embeddings cache, woocommerce cart"
tech: "Vue 3.5+, Pinia 2.2+, Vite, pinia-plugin-persistedstate"
category: "Frontend"
```

15. Fine-tuning

Fine-tuning involves training a pre-existing model on specific data to adapt it for a particular task or domain, such as customer support or law-related text generation.

16. Prompt

A prompt is the input text or command given to an AI model. Well-crafted prompts guide the model to deliver precise, high-quality outcomes.

TASK

Refactor the legacy code to use the new library version while ensuring:

  1. Exact functionality is preserved (zero behavior change)
  2. Modern Vue 3 Composition API (no Options API)
  3. TypeScript interfaces (strict types)
  4. Pinia store integration (if state-related)
  5. Error handling (try/catch + user-friendly messages)
  6. Performance (no memory leaks, reactive cleanup)

CONSTRAINTS

  • No breaking changes to public API
  • Handle edge cases from old code
  • 100% backward compatible inputs/outputs
  • Remove deprecated methods
  • Add JSDoc comments

EXAMPLE: Lodash → Native (common case)

LEGACY:
```js
// Lodash 4.x
import _ from 'lodash'
const users = _.groupBy(data, 'status')
const active = _.filter(users.active, u => u.age > 18)
```

17. Context Window

The context window is the maximum amount of data, in tokens, a model can “remember” in a single conversation/session. Larger windows allow for longer, more coherent outputs, though too much data can spoil the answers. When the context window runs out of capacity, a shortening of the context might trigger: instead of 45 672 tokens, the model might keep only a summary of, for example, around 10 000 tokens. You might say it runs a garbage collection on unused data in the context.

| Model | Parameters | Size (GB, FP16) | Context Window (Input Tokens) | Output Tokens | Notes |
|-------|------------|-----------------|-------------------------------|---------------|-------|
| GPT-4o | ~1.76T | ~3.5 TB (cloud) | 128,000 | 4,096 | Default/main model in Copilot Chat |
| GPT-4.1 | ~1.8T | ~3.6 TB (cloud) | 128,000 | 4,096 | High-quality coding model |
| GPT-5 | 2T+ | ~4+ TB (cloud) | 128,000 | 8,192 | New flagship (preview models vary) |
| GPT-5 mini | ~100B | ~200 GB (cloud) | 128,000 | 4,096 | Fast/lightweight |
| GPT-5.1-Codex-Max | ~500B | ~1 TB (cloud) | 400,000 | 128,000 | Max context preview (code-focused) |
| Gemini 3 Pro (Preview) | 1.5T | ~3 TB (cloud) | 128,000 | 64,000 | Google model, codebase indexing |
| Gemini 2.5 Pro | ~500B | ~1 TB (cloud) | 64,000–128,000* | Varies | Often limited vs native 1M |
| Claude Sonnet 4 | 400B | ~800 GB (cloud) | 80,000–128,000* | 4,096 | Preview models capped; native up to 1M |
| o4-mini | ~50B | ~100 GB (cloud) | ~100,000 | 4,096 | Fast OpenAI variant |
| Qwen4 | 32B–72B | 64–144 GB (local) | 128K–1M | 8K–32K | Alibaba; excellent code/math; Ollama-friendly |
| DeepSeek R1 | 671B (37B active MoE) | ~1.3 TB (cloud/~74 GB active) | 128,000 | 32,768 | Reasoning/coding beast; distilled 7B–70B local |
| Ollama 3.2 (Llama 3.2 base) | 3B–405B* | 6–810 GB (local, quantized ~2–200 GB) | 128K (configurable) | Varies | Your local setup; num_ctx: 131072; vision support |

18. Inference

Inference refers to the process of running a trained model to generate predictions or responses, as distinct from training (which adjusts internal weights).

19. Model Parameters

Parameters are the internal values in a neural network (often millions or billions) that determine how inputs transform into outputs. They are adjusted during training and stay fixed during inference: when a trained model (like GPT-4o or a local Ollama Qwen model) generates output from new input, no learning happens, it only applies the patterns learned during training to fresh data.

20. Plan / Subscription Tier

Most AI platforms offer plans or tiers that define limits such as context length, tokens to use (input and output), model choice or request volume. Paid plans often allow access to more advanced models and APIs.

21. Vector Database

Vector databases store embeddings (numerical vectors from text/images) and enable similarity search using cosine distance. Perfect for semantic search.
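
A toy sketch of the idea: an in-memory dict of made-up embeddings plus a brute-force cosine search. The names (`index`, `search`, the `doc_*` ids) are hypothetical; production vector databases use approximate nearest-neighbor indexes (e.g. HNSW) instead of scanning everything:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy in-memory "vector database": document id → made-up 3D embedding.
index = {
    "doc_greeting": [0.9, 0.1, 0.0],
    "doc_farewell": [-0.8, 0.2, 0.1],
    "doc_weather":  [0.1, 0.9, 0.3],
}

def search(query_vec, top_n=1):
    """Brute-force similarity search: rank all stored vectors by cosine similarity."""
    scored = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return scored[:top_n]
```

In a real setup the query vector comes from the same embedding model that produced the stored vectors.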

22. Gradient Descent

Gradient descent is the core optimization algorithm that trains AI models: it updates model parameters step by step in the direction that reduces the error, iteratively minimizing a loss function to find the best weights. Think „hill descending” in parameter space.
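
A minimal sketch minimizing a one-dimensional toy function f(x) = (x − 3)², just to show the update rule (real training does the same thing over millions of parameters at once):

```python
def gradient_descent(start, learning_rate=0.1, steps=100):
    """Minimize f(x) = (x - 3)^2 by stepping against the gradient f'(x) = 2(x - 3)."""
    x = start
    for _ in range(steps):
        grad = 2 * (x - 3)          # derivative of the loss at the current point
        x -= learning_rate * grad   # step downhill, scaled by the learning rate
    return x

x_min = gradient_descent(start=10.0)  # converges toward the minimum at x = 3
```

The learning rate plays the same role here as in neural network training: too small and convergence is slow, too large and the steps overshoot the minimum.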

23. Loss Function

A loss function measures how far the model’s predictions deviate from the correct answers. Lower loss means better performance during training and stable output.

| Task | Loss Function | Formula | Example | When |
|------|---------------|---------|---------|------|
| Regression (numbers) | MSE | (pred − actual)² | Recipe IG: predict 45, actual 42 → loss=9 | Embeddings, prices |
| Regression (robust) | MAE | \|pred − actual\| | | |
| Classification (binary) | Cross-Entropy | −[y·log(pred) + (1−y)·log(1−pred)] | „Is fasting?” 1→0.9: loss=0.1 | Spam/not spam |
| Classification (multi) | Categorical Cross-Entropy | Extension of binary | Recipe tags | Multiple categories |
| Margin (SVM) | Hinge | max(0, 1 − margin) | Class separation | Robust classification |

Example (MSE vs MAE): actual = 42, prediction = 50 → error = 8, so MSE = 8² = 64 while MAE = |8| = 8.
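
MSE and MAE from the table, applied to the single-point examples above:

```python
def mse(preds, actuals):
    """Mean squared error: punishes large errors quadratically."""
    return sum((p - a) ** 2 for p, a in zip(preds, actuals)) / len(preds)

def mae(preds, actuals):
    """Mean absolute error: more robust to outliers."""
    return sum(abs(p - a) for p, a in zip(preds, actuals)) / len(preds)

print(mse([45], [42]))  # 9.0  (predict 45, actual 42)
print(mse([50], [42]))  # 64.0 (error of 8, squared)
print(mae([50], [42]))  # 8.0  (same error, taken as-is)
```

The gap between 64 and 8 for the same error of 8 is exactly why MSE reacts so strongly to outliers while MAE stays calm.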

Want more?

Read more!

Piotr Kowalski