Yet another basic AI glossary part 1
AI & Machine Learning Glossary for Beginners
This is the base of what I need to learn to better understand all that „AI” and „LLM” talk. Feel free to go through all of it and dive deeper into those subjects. Defined here are AI concepts, ideas, math functions, slang, and anything else that might be helpful in better understanding „the whole lot”.
1. Logit
A logit is the raw output we get from a model before any activation function is applied, i.e. before softmax. During classification, each logit shows the model’s unnormalized confidence about one possible output.
2. Logit Definition (Mathematical View)
In math, logits are real numbers output by an LLM’s final layer (one per possible output token, so there are many of them). After passing through the softmax function, the logits become values between 0 and 1: probabilities that sum to 1.
3. Softmax function
Converts a vector of logits into a probability distribution. It is smooth and differentiable, which makes it convenient for training. After applying the function, each number becomes a value between 0 and 1, with all probabilities adding up to 1. The transformation itself is deterministic, so we end up with a ranked table of values with the biggest probabilities (from which the next output token is chosen).
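A minimal softmax sketch in Python (the logit values are made up for illustration):

```python
import numpy as np

def softmax(logits):
    # Subtract the max logit for numerical stability (the result is unchanged)
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])  # raw outputs from the final layer
probs = softmax(logits)
print(probs)        # ~[0.659 0.242 0.099] -> ranked probabilities
print(probs.sum())  # 1.0
```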
4. Temperature (in AI sampling)
Temperature controls randomness (variability) in generative LLMs. Low temperature (e.g., 0.2) makes the model choose more predictable words, giving more stable but repetitive results. High temperature (e.g., 1.0 or 1.5), on the other hand, gives the model permission to choose from more distinct suggestions, creating more varied output. We humans call it „creativity”.
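The same softmax with a temperature knob, as a sketch (values are illustrative):

```python
import numpy as np

def sample_distribution(logits, temperature=1.0):
    # Dividing logits by T sharpens (T < 1) or flattens (T > 1) the distribution
    scaled = np.array(logits) / temperature
    exps = np.exp(scaled - np.max(scaled))
    return exps / exps.sum()

logits = [2.0, 1.0, 0.1]
print(sample_distribution(logits, 0.2))  # ~[0.993 0.007 0.000] -> predictable
print(sample_distribution(logits, 1.5))  # ~[0.557 0.286 0.157] -> flatter, more „creative”
```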
5. Top-k Sampling
Top-k sampling limits the model’s choice to the k most likely tokens according to the probability distribution, introducing controlled randomness while maintaining coherence.
6. Top-p (Nucleus Sampling)
Top-p sampling (also called nucleus sampling) selects the smallest set of tokens, starting from the token with the highest probability, whose combined probability exceeds p (e.g., 0.9). This helps keep outputs diverse but contextually grounded.
When you define both limits at once, it is a matter of which threshold gets satisfied first and ends the selection of candidate tokens, as the table and the sketch below show.
| Token | Probability | Top-k=2 | Top-p=0.8 | Cumulative Probability |
|---|---|---|---|---|
| eat | 0.40 | ✅ Yes | ✅ Yes | 0.40 |
| sleep | 0.25 | ✅ Yes | ✅ Yes | 0.65 |
| play | 0.17 | ❌ No | ✅ Yes | 0.82 |
| run | 0.13 | ❌ No | ❌ No | 0.95 |
| jump | 0.05 | ❌ No | ❌ No | 1.00 |
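Both filters from the table as a minimal numpy sketch (token probabilities copied from the table above):

```python
import numpy as np

tokens = ["eat", "sleep", "play", "run", "jump"]
probs  = np.array([0.40, 0.25, 0.17, 0.13, 0.05])

def top_k(probs, k):
    # Keep the k most likely tokens, zero out the rest, renormalize
    keep = np.argsort(probs)[::-1][:k]
    mask = np.zeros_like(probs)
    mask[keep] = probs[keep]
    return mask / mask.sum()

def top_p(probs, p):
    # Keep the smallest top set whose cumulative probability exceeds p
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1  # first index where cum > p
    mask = np.zeros_like(probs)
    mask[order[:cutoff]] = probs[order[:cutoff]]
    return mask / mask.sum()

print(top_k(probs, 2))   # only eat/sleep survive
print(top_p(probs, 0.8)) # eat/sleep/play survive (cumulative 0.82 > 0.8)
```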
7. Random Forest
A random forest is an ensemble of multiple decision trees. Each tree votes independently and the votes are tallied: the given majority decides the classification. This approach improves accuracy while avoiding overfitting, because predictions are averaged across trees trained on different bootstrap samples of the data (see the sketch after the diagram below).
RANDOM FOREST TREE #1 (Trained on Bootstrap Sample A)
Root Node (n=150 samples)
│
├── petal length (≤ 2.45 cm?) [gini=0.667, samples=50/150]
│ │
│ ├── YES (42 samples) → petal width (≤ 1.75 cm?) [gini=0.168]
│ │ │
│ │ ├── YES (37 samples) → setosa [100%, gini=0.0] ⭐ FINAL CLASS
│ │ └── NO (5 samples) → versicolor [93%, gini=0.124]
│ │
│ └── NO (8 samples) → petal width (≤ 1.75 cm?) [gini=0.375]
│ │
│ ├── YES (4 samples) → versicolor [100%, gini=0.0]
│ └── NO (4 samples) → virginica [100%, gini=0.0]
│
└── petal length (> 2.45 cm?) [gini=0.500, samples=100/150]
│
├── petal width (≤ 1.75 cm?) [gini=0.160, samples=54/100]
│ │
│ ├── YES (48 samples) → versicolor [95%, gini=0.095]
│ └── NO (6 samples) → virginica [100%, gini=0.0]
│
└── petal width (> 1.75 cm?) [gini=0.032, samples=46/100]
│
├── petal length (≤ 4.95 cm?) [gini=0.199, samples=24/46]
│ ├── YES (12 samples) → versicolor [92%, gini=0.160]
│ └── NO (12 samples) → virginica [100%, gini=0.0]
│
└── petal length (> 4.95 cm?) → virginica [100%, samples=22]
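A minimal scikit-learn sketch of the same iris setup as the diagram (hyperparameters are illustrative, not tuned):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 100 trees, each fit on a bootstrap sample; the majority vote classifies
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))  # accuracy, typically ~0.97-1.0 on iris
```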
8. Euclidean Distance
Euclidean distance is the straight-line distance between two points in space. In vector math, it’s used to evaluate how far apart data points or embeddings are: the further apart they are, the less related they are in the given context.
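In numpy, with made-up vectors:

```python
import numpy as np

a = np.array([0.12, -0.05, 0.34])
b = np.array([0.10,  0.02, 0.30])
dist = np.linalg.norm(a - b)  # sqrt(sum((a_i - b_i)^2))
print(dist)  # ~0.083
```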
9. Cosine Similarity
Cosine similarity measures how similar two vectors are by calculating the cosine of the angle between them. It ranges from -1 to 1, where 1 means identical direction (perfect similarity), 0 means unrelated (orthogonal) vectors, and negative values mean the vectors point in opposing directions of the context space.
10. Dot Product
The dot product of two vectors is the sum of the products of their corresponding elements. It reflects how much two vectors point in the same direction.
Formula: A·B = |A||B|·cos(θ)
11. Dot Product and Cosine Relationship
For unit vectors (where |A| = 1 and |B| = 1), the dot product directly equals the cosine of the angle between them:
A·B = cos(θ).
In simple terms, this means the dot product measures directional similarity.
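A numpy sketch tying the dot product and cosine together (the 5D vectors are made up):

```python
import numpy as np

a = np.array([0.42, 0.15, -0.03, 0.28, 0.11])
b = np.array([-0.18, 0.07, 0.41, 0.14, -0.25])

dot = np.dot(a, b)                                   # sum of elementwise products
cos = dot / (np.linalg.norm(a) * np.linalg.norm(b))  # cos(theta)

# Normalize to unit vectors: now the dot product IS the cosine
a_u, b_u = a / np.linalg.norm(a), b / np.linalg.norm(b)
print(np.isclose(np.dot(a_u, b_u), cos))  # True
```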
12. Embedding
The embedding process transforms any kind of data into a vector of numbers that represents its „meaning”. Embeddings are key in search, recommendation systems, and semantic similarity.
„Hello world!” → [0.12, -0.05, 0.34, 0.21, -0.08]
| Word | Sample 5D Embedding | Intuition |
|---|---|---|
| „Hello” | [0.42, 0.15, -0.03, 0.28, 0.11] | Greeting vector (warm, social dim high) |
| „world” | [-0.18, 0.07, 0.41, 0.14, -0.25] | Global/universal concept (broad dim high) |
| Average | [0.12, 0.11, 0.19, 0.21, -0.07] | Combined „welcoming to all” |
cos(„Hello world!”, „Hi everyone!”) ≈ 0.92 → Very similar (greetings)
cos(„Hello world!”, „Goodbye moon”) ≈ 0.15 → Unrelated
cos(„Hello world!”, „Shut up!”) ≈ -0.23 → Opposite sentiment
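A sketch using the sentence-transformers library (the model name is just one popular choice, and real similarity scores will differ from the illustrative numbers above):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, widely used embedding model
vecs = model.encode(["Hello world!", "Hi everyone!", "Goodbye moon"])

print(util.cos_sim(vecs[0], vecs[1]))  # high:  both are greetings
print(util.cos_sim(vecs[0], vecs[2]))  # lower: different meaning
```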
13. Token
A token is a chunk of text (e.g., a word, part of a word, or punctuation) that an AI model processes. Large language models (LLMs) work by predicting one token at a time.
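A sketch with OpenAI’s tiktoken tokenizer, one common way to see tokens in practice:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models
ids = enc.encode("Hello world!")
print(ids)                             # three token IDs
print([enc.decode([i]) for i in ids])  # ['Hello', ' world', '!']
```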
14. Instruction File
An instruction file defines how an AI model should behave, what tasks it should prioritize, or what tone to adopt. For example, it can set boundaries or style preferences for generation.
title: "Pinia Stores Setup Guide – Vue 3 + Cache"
description: "Complete Vue.js Pinia with localStorage caching for cart and preferences. Refresh-proof state management."
keywords: "vue 3, pinia, localstorage cache, vue store, embeddings cache, woocommerce cart"
tech: "Vue 3.5+, Pinia 2.2+, Vite, pinia-plugin-persistedstate"
category: "Frontend"
15. Fine-tuning
Fine-tuning involves training a pre-existing model on specific data to adapt it for a particular task or domain, such as customer support or law-related text generation.
16. Prompt
A prompt is the input text or command given to an AI model. Well-crafted prompts guide the model to deliver precise, high-quality outcomes.
TASK
Refactor the legacy code to use the new library version while:
- Preserving exact functionality (zero behavior change)
- Using the modern Vue 3 Composition API (no Options API)
- Adding TypeScript interfaces (strict types)
- Integrating with the Pinia store (if state-related)
- Handling errors (try/catch + user-friendly messages)
- Keeping performance tight (no memory leaks, reactive cleanup)
CONSTRAINTS
- No breaking changes to public API
- Handle edge cases from old code
- 100% backward compatible inputs/outputs
- Remove deprecated methods
- Add JSDoc comments
EXAMPLE: Lodash → Native (common case)
LEGACY:
```js
// Lodash 4.x
import _ from 'lodash'
const users = _.groupBy(data, 'status')
const active = _.filter(users.active, u => u.age > 18)
```
17. Context Window
The context window is the maximum amount of data, in tokens, a model can “remember” in a single conversation or session. Larger windows allow longer, more coherent outputs, though too much data can spoil the answers. When the context window runs out of capacity, context shortening may be triggered: instead of 45,672 tokens, the model might keep a summary of around 10,000 tokens. You might say it runs a garbage collect on unused data in the context.
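A toy sketch of that „garbage collect” idea (count_tokens is a hypothetical stand-in for a real tokenizer, and real systems summarize old turns rather than just dropping them):

```python
def trim_context(messages, max_tokens, count_tokens):
    # Drop the oldest messages until everything fits the window again
    while sum(count_tokens(m) for m in messages) > max_tokens and len(messages) > 1:
        messages.pop(0)
    return messages

count_tokens = lambda m: len(m.split())  # crude stand-in: words ~ tokens
history = ["first long message ...", "second message", "latest question"]
print(trim_context(history, max_tokens=6, count_tokens=count_tokens))
```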
| Model | Parameters | Size (FP16) | Context Window (Input Tokens) | Output Tokens | Notes |
|---|---|---|---|---|---|
| GPT-4o | ~1.76T | ~3.5 TB (cloud) | 128,000 | 4,096 | Default/main model in Copilot Chat |
| GPT-4.1 | ~1.8T | ~3.6 TB (cloud) | 128,000 | 4,096 | High-quality coding model |
| GPT-5 | 2T+ | ~4+ TB (cloud) | 128,000 | 8,192 | New flagship (preview models vary) |
| GPT-5 mini | ~100B | ~200 GB (cloud) | 128,000 | 4,096 | Fast/lightweight |
| GPT-5.1-Codex-Max | ~500B | ~1 TB (cloud) | 400,000 | 128,000 | Max context preview (code-focused) |
| Gemini 3 Pro (Preview) | 1.5T | ~3 TB (cloud) | 128,000 | 64,000 | Google model, codebase indexing |
| Gemini 2.5 Pro | ~500B | ~1 TB (cloud) | 64,000–128,000* | Varies | Often limited vs native 1M |
| Claude Sonnet 4 | 400B | ~800 GB (cloud) | 80,000–128,000* | 4,096 | Preview models capped; native up to 1M |
| o4-mini | ~50B | ~100 GB (cloud) | ~100,000 | 4,096 | Fast OpenAI variant |
| Qwen4 | 32B–72B | 64–144 GB (local) | 128K–1M | 8K–32K | Alibaba; excellent code/math; Ollama-friendly |
| DeepSeek R1 | 671B (37B active MoE) | ~1.3 TB (cloud/~74 GB active) | 128,000 | 32,768 | Reasoning/coding beast; distilled 7B–70B local |
| Ollama 3.2 (Llama 3.2 base) | 3B–405B* | 6–810 GB (local, quantized ~2–200 GB) | 128K (configurable) | Varies | Your local setup; num_ctx: 131072; vision support |
18. Inference
Inference refers to the process of running a trained model to generate predictions or responses, as distinct from training (which adjusts internal weights). No learning happens during inference: the model applies fixed knowledge to fresh data.
19. Model Parameters
Parameters are the internal values in a neural network (often millions or billions) that determine how inputs transform into outputs. They are adjusted during training, then stay fixed: when a trained model (like GPT-4o or a local Qwen model under Ollama) runs inference, it only applies the patterns encoded in these parameters.
20. Plan / Subscription Tier
Most AI platforms offer plans or tiers that define limits such as context length, token quotas (input and output), model choice, or request volume. Paid plans often allow access to more advanced models and APIs.
21. Vector Database
Vector databases store embeddings (numerical vectors derived from text or images) and enable similarity search using metrics such as cosine distance. Perfect for semantic search.
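What a vector database does conceptually, as a brute-force numpy toy (real engines such as FAISS or Chroma add indexes for speed; the data here is random):

```python
import numpy as np

# Toy in-memory "vector database": rows are stored document embeddings
db = np.random.rand(1000, 384)                   # 1000 docs, 384-dim embeddings
db /= np.linalg.norm(db, axis=1, keepdims=True)  # normalize rows once

query = np.random.rand(384)
query /= np.linalg.norm(query)

scores = db @ query                  # cosine similarity (unit vectors)
top5 = np.argsort(scores)[::-1][:5]  # indices of the 5 closest documents
print(top5, scores[top5])
```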
22. Gradient Descent
Gradient descent is the core optimization algorithm that trains AI models: it iteratively updates the parameters in the direction that reduces the error, step by step, minimizing a loss function until good weights are found. Think „hill descending” in parameter space.
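A worked one-parameter example: descending the loss f(w) = (w - 3)², whose minimum sits at w = 3:

```python
w = 0.0    # starting guess
lr = 0.1   # learning rate (step size)

for step in range(50):
    grad = 2 * (w - 3)  # derivative of the loss (w - 3)^2
    w -= lr * grad      # step downhill, against the gradient

print(w)  # ~3.0, the minimum
```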
23. Loss Function
A loss function measures how far the model’s predictions deviate from the correct answers. Lower loss means better performance during training and stable output.
| Task | Loss Function | Formula | Example | When |
|---|---|---|---|---|
| Regression (numbers) | MSE | (pred - actual)² | Recipe IG: predict 45, actual 42 → loss=9 | Embeddings, prices |
| Regression (robust) | MAE | \|pred - actual\| | predict 45, actual 42 → loss=3 | Outlier-heavy data |
| Classification (binary) | Cross-Entropy | -[y*log(pred) + (1-y)*log(1-pred)] | „Is fasting?” 1→0.9: loss=0.1 | Spam/not spam |
| Classification (multi) | Categorical Cross-Entropy | Extension of binary | Recipe tags | Multiple categories |
| Margin (SVM) | Hinge | max(0, 1 - margin) | Class separation | Robust classification |
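The table’s example numbers, reproduced in a few lines of Python:

```python
import numpy as np

pred, actual = 45.0, 42.0
mse = (pred - actual) ** 2  # 9.0 -> squared error punishes big misses
mae = abs(pred - actual)    # 3.0 -> linear, more robust to outliers

# Binary cross-entropy for the „Is fasting?" row: label y=1, prediction p=0.9
y, p = 1, 0.9
bce = -(y * np.log(p) + (1 - y) * np.log(1 - p))
print(mse, mae, round(bce, 3))  # 9.0 3.0 0.105
```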
Want more?
Read more!


