Thread Memory & Virtual Threads Internals

Previous lesson: ClassLoader & JIT. You have seen how the JVM loads and optimizes code. This lesson explains how each thread interacts with memory and how Virtual Threads work internally.

Thread Stack Memory

Each platform thread = 1 OS thread + 1 JVM stack:

Platform Thread #1:           Platform Thread #2:
┌─────────────────┐          ┌─────────────────┐
│ JVM Stack       │          │ JVM Stack       │
│ (~512KB-2MB)    │          │ (~512KB-2MB)    │
│ ┌─────────────┐ │          │ ┌─────────────┐ │
│ │ Frame: foo()│ │          │ │ Frame: bar()│ │
│ ├─────────────┤ │          │ ├─────────────┤ │
│ │ Frame: main │ │          │ │ Frame: run  │ │
│ └─────────────┘ │          │ └─────────────┘ │
└────────┬────────┘          └────────┬────────┘
         │                            │
         ▼                            ▼
┌──────────────────────────────────────────────┐
│              Shared Heap                     │
│  Objects, arrays, String Pool                │
└──────────────────────────────────────────────┘

Memory Cost per Thread

1 platform thread ≈ 512KB - 2MB of reserved stack memory (default ~1MB on most 64-bit JVMs, tunable with -Xss)

100 threads   × 1MB = 100MB   ← OK
1,000 threads × 1MB = 1GB     ← Significant
10,000 threads × 1MB = 10GB   ← For stacks alone!
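The arithmetic above can be reproduced with a small sketch. `StackCost` and its method names are hypothetical; the four-argument `Thread` constructor is a real API, but its `stackSize` argument is only a hint that the JVM may round or ignore.

```java
// Hypothetical helper that reproduces the stack-cost table above.
final class StackCost {
    /** Stack memory reserved (in bytes) for the given number of platform threads. */
    static long reservedBytes(int threads, long stackBytesPerThread) {
        return threads * stackBytesPerThread;
    }

    /** Creates (but does not start) a platform thread with a requested stack size. */
    static Thread withStackSize(Runnable task, long stackBytes) {
        // stackBytes is a hint: the JVM and OS may round it up or ignore it.
        return new Thread(null, task, "small-stack", stackBytes);
    }

    public static void main(String[] args) {
        long oneMb = 1L << 20;
        for (int n : new int[] {100, 1_000, 10_000}) {
            System.out.printf("%,d threads x 1MB = %,d MB%n", n, reservedBytes(n, oneMb) / oneMb);
        }
    }
}
```

Requesting a smaller stack per thread lowers the reservation, but the per-thread cost stays in the hundreds of kilobytes, which is why the model does not scale to very large thread counts.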

Thread scaling limit

Thread-per-request model (each HTTP request = 1 thread):

10,000 concurrent requests = 10,000 threads = ~10GB reserved for stacks alone
Heap memory for objects is not included
The OS also limits the number of native threads (commonly around 30,000 on a default Linux setup, depending on kernel and ulimit settings)

This is the problem Virtual Threads were designed to solve.

Java Memory Model (JMM)

The Problem: Visibility

Conceptually, each thread has its own working memory (CPU cache/registers), which is not always in sync with main memory (Heap):

Thread 1 (Core 1):        Thread 2 (Core 2):
┌───────────────┐         ┌───────────────┐
│ Working Memory│         │ Working Memory│
│ x = 42        │         │ x = 0 (?!)    │ ← Update not visible yet!
│ (CPU cache)   │         │ (CPU cache)   │
└───────┬───────┘         └───────┬───────┘
        │                         │
        ▼                         ▼
┌──────────────────────────────────────────┐
│           Main Memory (Heap)             │
│           x = 42                         │
└──────────────────────────────────────────┘

// Example: visibility problem
class SharedData {
    boolean running = true;  // Shared variable, not volatile

    void stop() {
        running = false;  // Written by thread 1
    }

    void run() {
        while (running) {   // Read by thread 2
            // May NEVER stop!
            // Thread 2 may keep using a stale value (cached, or hoisted out of the loop by the JIT)
        }
    }
}

Happens-Before Relationship

The JMM (Java Memory Model) defines happens-before rules: the conditions under which changes made by thread A are guaranteed to be visible to thread B. The rules in JLS §17.4.4 and §17.4.5 are commonly summarized as these 8:

Rule	Description	Example
1. Program Order	Within a single thread, an earlier action happens-before a later action	`x = 1; y = 2;` → the write to x happens-before the write to y
2. Monitor Lock	An unlock happens-before every subsequent lock on the same monitor	Thread A exits `synchronized(obj)` → Thread B enters `synchronized(obj)` and sees A's changes
3. Volatile Variable	A write to a volatile variable happens-before every subsequent read of that variable	Thread A writes `flag = true` (volatile) → Thread B reads `flag`, sees true and every change made before the write
4. Thread Start	`thread.start()` happens-before every action in the started thread	Main thread calls `t.start()` → Thread t sees every change made before start()
5. Thread Termination	Every action in a thread happens-before `join()` on that thread returns	Thread t finishes → `t.join()` returns → main thread sees all of t's changes
6. Interruption	`thread.interrupt()` happens-before the interrupted thread detects the interrupt	Thread A calls `t.interrupt()` → Thread t detects it via `isInterrupted()` or `InterruptedException`
7. Finalizer	The end of a constructor happens-before the start of `finalize()` for that object	The object is fully constructed before the GC runs its finalizer
8. Transitivity	If A hb B and B hb C, then A hb C	`x = 1` hb volatile write `y = 2` hb a read of y that sees 2 → that reader also sees x = 1
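Rules 4 and 5 can be checked without any volatile or lock. The sketch below uses a hypothetical class `StartJoinDemo`: the fields are plain, yet the result is deterministic because `start()` and `join()` create the happens-before edges.

```java
// Demonstrates rule 4 (Thread Start) and rule 5 (Thread Termination).
final class StartJoinDemo {
    private static int data;     // plain field: no volatile, no lock
    private static int result;   // plain field

    static int run() {
        data = 42;                              // written before start()
        Thread worker = new Thread(() -> {
            result = data + 1;                  // sees 42: start() happens-before this action
        });
        worker.start();
        try {
            worker.join();                      // the worker's actions happen-before join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result;                          // guaranteed to be 43
    }

    public static void main(String[] args) {
        System.out.println(run());              // prints 43
    }
}
```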

Transitivity: the most powerful rule

Transitivity lets happens-before relationships chain across multiple threads. For example:

// Shared fields: int x = 0; volatile boolean flag = false;

// Thread 1:
x = 42;              // (1)
flag = true;         // (2) volatile write; (1) happens-before (2) by program order

// Thread 2:
if (flag) {          // (3) volatile read; (2) happens-before (3) when it sees true
    print(x);        // Guaranteed to see x = 42 by transitivity: (1) hb (2) hb (3)
}

Without volatile, thread 2 may see flag = true while still seeing x = 0 (reordering).
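A runnable version of this chain might look like the sketch below. The class name `VolatilePublish` is hypothetical; the reader spins on the volatile flag, so once it observes `true` it is guaranteed to read `x == 42`.

```java
// Sketch: publishing a plain field through a volatile flag.
final class VolatilePublish {
    private int x = 0;                       // plain field
    private volatile boolean flag = false;   // volatile guard

    void writer() {
        x = 42;        // (1) plain write
        flag = true;   // (2) volatile write
    }

    int reader() {
        while (!flag) {            // (3) volatile read, repeated until it sees true
            Thread.onSpinWait();
        }
        return x;                  // 42, by (1) hb (2) hb (3)
    }

    static int run() {
        VolatilePublish shared = new VolatilePublish();
        new Thread(shared::writer).start();
        return shared.reader();
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints 42
    }
}
```

If `flag` were a plain field, the loop might never terminate, and even if it did, the read of `x` would have no visibility guarantee.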

Memory Barriers

The CPU and the compiler may reorder instructions for performance. Memory barriers prevent reordering across the barrier point:

The 4 types of memory barriers:

LoadLoad:   Load1; LoadLoad; Load2
            → Load1 completes before Load2

StoreStore: Store1; StoreStore; Store2
            → Store1 is visible before Store2

LoadStore:  Load1; LoadStore; Store2
            → Load1 completes before Store2

StoreLoad:  Store1; StoreLoad; Load2
            → Store1 is visible before Load2 (the most expensive: drains the store buffer)

Volatile mapping to barriers:

// Write volatile:
<stores before volatile>
StoreStore barrier
<volatile write>          // Conceptually: published to main memory
StoreLoad barrier         // Most expensive! Makes the write visible to subsequent reads

// Read volatile:
<volatile read>           // Conceptually: read from main memory
LoadLoad barrier
LoadStore barrier
<loads/stores after volatile>

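Why is StoreLoad the most expensive?

A StoreLoad barrier must drain the store buffer so that earlier stores become globally visible before any later load executes, which stalls the pipeline. LoadLoad and StoreStore only constrain ordering and, on x86, require no instruction at all.

This makes a volatile write noticeably slower than a plain write on x86. Figures of roughly 4-5x are often quoted, but the actual cost depends on the CPU and on contention.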

volatile Deep Dive

volatile provides 3 main guarantees:

1. Visibility: reads and writes go straight to main memory

class SharedData {
    volatile boolean running = true;  // volatile: reads/writes behave as if they go to main memory

    void stop() {
        running = false;  // Write is published to main memory
    }

    void run() {
        while (running) {   // Each iteration re-reads the current value
            // Now this WILL stop once running = false
        }
    }
}

2. Ordering Guarantees: preventing reordering

A volatile write acts as a release fence (no earlier read or write can be reordered after it). A volatile read acts as an acquire fence (no later read or write can be reordered before it):

int x = 0, y = 0;
volatile boolean flag = false;

// Thread 1:
x = 42;              // (1) Change made before the volatile write
y = 100;             // (2) Change made before the volatile write
flag = true;         // (3) Volatile write = release fence → (1), (2) are NOT reordered below it

// Thread 2:
if (flag) {          // (4) Volatile read = acquire fence
    print(x, y);     // (5), (6) are NOT reordered above (4)
    // Guaranteed to see x = 42, y = 100 (no reordering can leave x/y at 0)
}
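The release/acquire pairing that volatile provides implicitly can also be written out explicitly with `VarHandle.setRelease` and `VarHandle.getAcquire` (Java 9+). This is a sketch with a hypothetical class `ReleaseAcquireDemo`; these access modes are slightly weaker than volatile, so it illustrates the ordering idea rather than replacing the volatile version above.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Sketch: explicit release/acquire on a non-volatile flag.
final class ReleaseAcquireDemo {
    private int x, y;        // plain data
    private boolean flag;    // not volatile: accessed only through FLAG below

    private static final VarHandle FLAG;
    static {
        try {
            FLAG = MethodHandles.lookup()
                    .findVarHandle(ReleaseAcquireDemo.class, "flag", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void writer() {
        x = 42;
        y = 100;
        FLAG.setRelease(this, true);   // release: the writes to x and y cannot move below this
    }

    int reader() {
        while (!(boolean) FLAG.getAcquire(this)) {   // acquire: later reads cannot move above this
            Thread.onSpinWait();
        }
        return x + y;   // 142
    }

    static int run() {
        ReleaseAcquireDemo shared = new ReleaseAcquireDemo();
        new Thread(shared::writer).start();
        return shared.reader();
    }
}
```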

3. Atomic 64-bit Read/Write (long and double)

On a 32-bit JVM, a non-volatile long or double (64 bits) may be read or written as two separate 32-bit halves (a torn read/write); JLS §17.7 permits this. volatile prevents it:

long counter = 0;  // Not volatile: a thread may read the high 32 bits from the old value and the low 32 bits from the new one!

volatile long counter = 0;  // Atomic: the full 64 bits are read/written at once

volatile is NOT enough for compound operations

volatile only guarantees that a single read or a single write is atomic. It does NOT make compound operations atomic:

volatile int count = 0;
count++;  // ❌ NOT atomic! = read count → increment → write count
// 2 threads running count++ concurrently can lose an update

// ✅ AtomicInteger for compound operations
AtomicInteger count = new AtomicInteger(0);
count.incrementAndGet();  // Atomic read-modify-write (CAS or an equivalent hardware instruction)

Comparing volatile and AtomicInteger:

	volatile int	AtomicInteger
Single read/write	Atomic	Atomic
Visibility	Yes	Yes
count++	❌ Not atomic	✅ Atomic via incrementAndGet() (CAS)
compareAndSet	❌ Not available	✅ Available
Performance	Faster (plain read/write)	Slower under contention (CAS retries)
Use case	Flags, state variables	Counters, complex updates

// volatile: only for simple reads/writes
volatile boolean flag = true;
flag = false;  // OK

// AtomicInteger: for compound operations
AtomicInteger counter = new AtomicInteger(0);
counter.compareAndSet(0, 1);  // Atomic: if == 0 then set to 1
counter.addAndGet(5);          // Atomic: += 5
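The difference is easy to observe. In the sketch below (the class `LostUpdateDemo` is hypothetical), several threads increment a volatile int and an AtomicInteger the same number of times. The atomic counter always reaches the expected total, while the volatile counter may end up lower because concurrent `++` operations overwrite each other.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: lost updates with volatile++, none with AtomicInteger.
final class LostUpdateDemo {
    static volatile int volatileCount = 0;
    static final AtomicInteger atomicCount = new AtomicInteger();

    /** Returns {volatile result, atomic result} after all threads finish. */
    static int[] run(int threads, int perThread) {
        volatileCount = 0;
        atomicCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    volatileCount++;                // read-modify-write: updates can be lost
                    atomicCount.incrementAndGet();  // atomic read-modify-write
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] {volatileCount, atomicCount.get()};
    }

    public static void main(String[] args) {
        int[] r = run(4, 100_000);
        System.out.println("volatile: " + r[0] + ", atomic: " + r[1] + " (expected 400000)");
    }
}
```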

synchronized — Visibility + Mutual Exclusion

synchronized provides both visibility (happens-before) AND mutual exclusion (only 1 thread at a time):

class Counter {
    private int count = 0;

    // When a thread exits a synchronized block (conceptually):
    // 1. Working memory is flushed → main memory (all changes)
    // 2. The lock is released
    // When a thread enters a synchronized block (conceptually):
    // 1. The lock is acquired
    // 2. Cached values are invalidated → reads come from main memory

    synchronized void increment() {
        count++;  // Atomic: read + increment + write happen under the lock
    }

    synchronized int getCount() {
        return count;  // Reads the latest value written under the same lock
    }
}
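A quick way to confirm both guarantees is to hammer such a counter from several threads. The sketch below uses a hypothetical class `SyncCounterDemo` with a private lock object instead of synchronized methods, which is an equivalent design choice here.

```java
// Sketch: a counter guarded by one lock is exact under concurrent increments.
final class SyncCounterDemo {
    private final Object lock = new Object();
    private int count = 0;   // guarded by lock

    void increment() {
        synchronized (lock) {
            count++;          // mutual exclusion makes the read-modify-write atomic
        }
    }

    int get() {
        synchronized (lock) {
            return count;     // same lock → sees every earlier increment
        }
    }

    static int run(int threads, int perThread) {
        SyncCounterDemo counter = new SyncCounterDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```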

OCP Exam Tips

1. Double-Checked Locking (a classic trap)

Before Java 5 this pattern was broken regardless; from Java 5 on it is safe only if the field is declared volatile:

// ❌ Broken without volatile (in every Java version): a frequent exam question!
class Singleton {
    private static Singleton instance;

    public static Singleton getInstance() {
        if (instance == null) {              // Check 1 (no lock)
            synchronized (Singleton.class) {
                if (instance == null) {      // Check 2 (inside the lock)
                    instance = new Singleton();  // ⚠️ Another thread may see a partially constructed object!
                }
            }
        }
        return instance;
    }
}

The problem: new Singleton() consists of 3 steps:

1. Allocate memory
2. Initialize object
3. Assign reference to instance

The compiler or CPU may reorder these so that step 3 happens before step 2! Another thread then finds instance != null at Check 1 and returns an object that has not been fully initialized.

Fix: add volatile (Java 5+):

// ✅ Safe with volatile (since Java 5)
private static volatile Singleton instance;  // volatile prevents the reordering
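Put together, a complete version of the fixed pattern could look like the sketch below. The class name `LazySingleton` and its `createdAt` field are hypothetical; the local variable is a common refinement that keeps the fast path to a single volatile read.

```java
// Sketch: double-checked locking that is safe under the Java 5+ memory model.
final class LazySingleton {
    private static volatile LazySingleton instance;   // volatile is essential

    private final long createdAt;

    private LazySingleton() {
        createdAt = System.nanoTime();
    }

    static LazySingleton getInstance() {
        LazySingleton local = instance;           // Check 1: one volatile read, no lock
        if (local == null) {
            synchronized (LazySingleton.class) {
                local = instance;                 // Check 2: re-read under the lock
                if (local == null) {
                    local = new LazySingleton();
                    instance = local;             // volatile write publishes a fully built object
                }
            }
        }
        return local;
    }

    long createdAt() {
        return createdAt;
    }
}
```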

2. Publication Safety: Partially Constructed Objects

Without volatile/synchronized, another thread may see an object that is not fully initialized:

class Resource {
    int value;  // Not final

    public Resource(int value) {
        this.value = value;  // Step 1: write the field
    }                        // Step 2: the reference is returned
}

// Shared, non-volatile field: static Resource r;

// Thread 1:
r = new Resource(42);  // May be reordered → r is published before value = 42 is written

// Thread 2:
if (r != null) {
    print(r.value);  // May print 0 (the default value) instead of 42!
}

Fix: declare the reference volatile, publish it under synchronized, or make the field final.
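Two of these fixes are sketched below; all class and field names are hypothetical. `ImmutableResource` relies on the final-field guarantee (a thread that sees the reference also sees the initialized final field), while `MutableResource` relies on a volatile reference. The reader spins on the volatile reference, which also makes the earlier plain write visible.

```java
// Sketch: safe publication through a final field and through a volatile reference.
final class SafePublication {
    /** Option 1: immutable object; a racy reader may see null, but never value == 0. */
    static final class ImmutableResource {
        final int value;
        ImmutableResource(int value) { this.value = value; }
    }

    /** Option 2: mutable object; safe only if the reference is published safely. */
    static final class MutableResource {
        int value;
        MutableResource(int value) { this.value = value; }
    }

    static ImmutableResource viaFinal;             // plain reference, final field inside
    static volatile MutableResource viaVolatile;   // volatile reference

    static int run() {
        viaFinal = null;
        viaVolatile = null;
        Thread publisher = new Thread(() -> {
            viaFinal = new ImmutableResource(42);
            viaVolatile = new MutableResource(42);   // volatile write, done last
        });
        publisher.start();

        MutableResource m;
        while ((m = viaVolatile) == null) {          // volatile read
            Thread.onSpinWait();
        }
        // Seeing the volatile write also makes the earlier write to viaFinal visible.
        ImmutableResource f = viaFinal;
        return f.value + m.value;                    // 84
    }
}
```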

3. Thread.sleep() does NOT release locks

This is a common exam question:

synchronized (lock) {
    Thread.sleep(1000);  // ⚠️ HOLDS the lock for 1 second!
    // Every other thread that needs the lock is blocked for that second
}

// ✅ To pause while releasing the lock, call wait() inside synchronized
synchronized (lock) {
    lock.wait(1000);  // wait() releases the lock while waiting, sleep() does not!
}
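The sketch below (hypothetical class `WaitReleasesLock`) shows that `wait()` really gives up the monitor. The calling thread holds the lock when it starts the notifier, so the notifier can only enter its own synchronized block once `wait()` has released the lock. If `wait()` held the lock the way `sleep()` does, this code would deadlock.

```java
// Sketch: wait() releases the monitor, letting another thread enter and notify.
final class WaitReleasesLock {
    private final Object lock = new Object();
    private boolean ready = false;   // guarded by lock

    /** Returns true once the waiting thread has been woken by the notifier. */
    static boolean run() {
        WaitReleasesLock s = new WaitReleasesLock();
        Thread notifier = new Thread(() -> {
            synchronized (s.lock) {      // entered only after the waiter releases the lock
                s.ready = true;
                s.lock.notifyAll();
            }
        });
        synchronized (s.lock) {
            notifier.start();            // notifier blocks on the monitor until wait() below
            while (!s.ready) {           // loop guards against spurious wakeups
                try {
                    s.lock.wait();       // releases the lock while waiting
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return s.ready;
        }
    }
}
```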

4. volatile guarantees visibility, NOT atomicity

// Exam trick question:
volatile int count = 0;

// 10 threads concurrently:
count++;  // ❌ Not thread-safe! count++ = read + increment + write

// After all 10 threads finish, count may be < 10 because of the race condition

Virtual Threads Deep Dive (Java 21)

Platform Thread vs Virtual Thread

	Platform Thread	Virtual Thread
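Mapping	1:1 with an OS thread	M:N (many virtual threads scheduled onto a small pool of carrier threads)
Stack size	~1MB reserved up front	Starts at ~1KB, grows on demand (frames stored on the heap)
Creation	Expensive (kernel call; up to ~1ms)	Cheap (JVM-managed; ~1μs)
Count	Thousands to tens of thousands (limited by memory and the OS)	Millions
Scheduling	OS kernel scheduler	JVM scheduler (a dedicated ForkJoinPool)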

Mount/Unmount Mechanism

This is the core mechanism of Virtual Threads:

Carrier Thread Pool (ForkJoinPool, default size = number of CPU cores):
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│Carrier1│ │Carrier2│ │Carrier3│ │Carrier4│  (4 cores = 4 carriers)
└───┬────┘ └───┬────┘ └───┬────┘ └────────┘
    │          │          │
    ▼          ▼          ▼
  VT-1       VT-2       VT-3    VT-4, VT-5, ... (thousands, waiting to mount)
(mounted)  (mounted)  (mounted)  (unmounted, waiting)

Lifecycle:

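// Example: 10,000 concurrent HTTP requests
// (httpClient, request and process(...) are assumed to exist)
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    for (int i = 0; i < 10_000; i++) {
        executor.submit(() -> {
            // Each virtual thread:
            // 1. Mounts on a carrier thread
            HttpResponse<String> response =
                httpClient.send(request, HttpResponse.BodyHandlers.ofString());  // Blocking I/O
            // 2. Unmounts while blocked on I/O → carrier freed
            // 3. I/O done → mounts again (possibly on a different carrier)
            process(response.body());
            // 4. Task done → virtual thread ends
            return null;  // Makes the lambda a Callable, so checked exceptions are allowed
        });
    }
}
// 10,000 virtual threads, served by only ~4 carrier threads (= CPU cores)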
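For a version that runs without an HTTP client, the blocking call can be replaced by `Thread.sleep`. The sketch below assumes Java 21; the class `VirtualThreadLifecycle` is hypothetical, and the sleep merely stands in for blocking I/O.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: many virtual threads blocking at once on a handful of carriers (Java 21).
final class VirtualThreadLifecycle {
    /** Runs the given number of virtual threads, each blocking briefly; returns how many completed. */
    static int run(int tasks, long blockMillis) {
        AtomicInteger completed = new AtomicInteger();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    Thread.sleep(Duration.ofMillis(blockMillis));   // virtual thread unmounts here
                    if (Thread.currentThread().isVirtual()) {
                        completed.incrementAndGet();
                    }
                    return null;   // Callable, so InterruptedException may propagate
                });
            }
        }   // close() waits for every submitted task to finish
        return completed.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = run(10_000, 100);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " tasks in " + ms + " ms on "
                + Runtime.getRuntime().availableProcessors() + " cores");
    }
}
```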

Why Does This Save Memory?

Platform threads (10,000 requests):
  10,000 × ~1MB stack = ~10GB

Virtual threads (10,000 requests):
  10,000 × ~1KB stack = ~10MB   ← ~1000x less!
  + 4 carrier threads × ~1MB = ~4MB
  Total: ~14MB vs 10GB

Virtual Thread Stack

A virtual thread's stack is growable: it starts small (~1KB) and expands as needed. Its frames live on the heap, and the JVM copies them to and from the carrier's stack on mount/unmount (stack copying) instead of reserving a fixed-size stack.

Pinning: When a Virtual Thread Gets "Pinned"

In JDK 21, a synchronized block pins the virtual thread to its carrier, so it cannot unmount while inside the block (JEP 491 removes this limitation in JDK 24):

// ❌ Pinning: the virtual thread is pinned to its carrier
synchronized (lock) {
    // Blocking here → the carrier is blocked too
    // Other virtual threads CANNOT mount on this carrier
    Thread.sleep(1000);  // Carrier thread blocked for 1 second!
}

// ✅ Use ReentrantLock instead of synchronized
private final ReentrantLock lock = new ReentrantLock();  // declared as a field

lock.lock();
try {
    Thread.sleep(1000);  // Virtual thread unmounts → carrier freed!
} finally {
    lock.unlock();
}

Pinning reduces scalability

If every carrier thread is held by a pinned, blocked virtual thread, no other virtual thread can run and throughput drops to zero until one of them unblocks.

Fix: replace synchronized with ReentrantLock in code sections that perform blocking I/O.

Detect: run with -Djdk.tracePinnedThreads=full (JDK 21) to log a stack trace when a virtual thread blocks while pinned.
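The ReentrantLock fix can be exercised end to end with the sketch below (Java 21). The class `LockInsteadOfSynchronized` and its `handle` method are hypothetical, and the short sleep stands in for a blocking call made while the lock is held.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: guarding a blocking section with ReentrantLock so virtual threads can unmount.
final class LockInsteadOfSynchronized {
    private final ReentrantLock lock = new ReentrantLock();
    private int served = 0;   // guarded by lock

    void handle() throws InterruptedException {
        lock.lock();
        try {
            Thread.sleep(1);   // blocking while holding the lock: the virtual thread can still unmount
            served++;
        } finally {
            lock.unlock();
        }
    }

    static int run(int tasks) {
        LockInsteadOfSynchronized s = new LockInsteadOfSynchronized();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    s.handle();
                    return null;   // Callable, so InterruptedException may propagate
                });
            }
        }   // close() waits for all tasks
        s.lock.lock();
        try {
            return s.served;
        } finally {
            s.lock.unlock();
        }
    }
}
```

The lock still serializes the guarded section, so the win is not parallelism inside the lock: waiting virtual threads simply stop occupying carrier threads.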

When to Use Virtual Threads?

Use case	Virtual Threads?	Reason
I/O-bound (HTTP, DB, file)	Yes	Blocking I/O → unmount → carrier freed
CPU-bound (computation)	No	CPU tasks do not block → never unmount → no benefit
Thread-per-request (web server)	Yes	Scales to very large numbers of concurrent requests
Small thread pools (< 100 threads)	Not needed	Platform threads are enough
Pinning-heavy code (lots of synchronized + I/O)	With care	Pinning reduces the benefit

Common Mistakes

Mistake 1: Forgetting volatile on a shared boolean flag

// ❌ The thread may never see running = false
boolean running = true;

// ✅ volatile guarantees visibility
volatile boolean running = true;

Mistake 2: Using synchronized around I/O in virtual threads

// ❌ Pins the virtual thread → blocks the carrier
synchronized (lock) {
    database.query("SELECT ...");  // I/O inside synchronized
}

// ✅ ReentrantLock allows unmounting
lock.lock();
try {
    database.query("SELECT ...");
} finally {
    lock.unlock();
}

Mistake 3: Using virtual threads for CPU-intensive tasks

// ❌ Virtual thread for a CPU task → unnecessary overhead
// A CPU task does not block → never unmounts → no benefit
Executors.newVirtualThreadPerTaskExecutor()
    .submit(() -> computePI(1_000_000));  // CPU-bound

// ✅ Platform thread pool for CPU tasks
Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors())
    .submit(() -> computePI(1_000_000));

Exercises

Exercise 1: Visibility Problem `[Basic]`

Explain why the following program may run forever, then fix it:

public class VisibilityDemo {
    static boolean stop = false;

    public static void main(String[] args) throws Exception {
        Thread t = new Thread(() -> {
            while (!stop) { /* busy wait */ }
            System.out.println("Stopped!");
        });
        t.start();
        Thread.sleep(1000);
        stop = true;
        System.out.println("stop = true set");
    }
}

Solution

Why it runs forever: stop is not volatile, so nothing guarantees that Thread t ever sees the main thread's write. In practice the JIT may hoist the read of stop out of the loop, or the thread may keep using a stale value from its working memory.

Fix: add volatile:

static volatile boolean stop = false;

Exercise 2: Happens-Before Analysis `[Intermediate]`

Given the code below, what values can result take? Explain using happens-before.

int x = 0, y = 0, result = 0;

// Thread 1:
x = 1;
y = 1;

// Thread 2:
if (y == 1) {
    result = x;  // Can this be 0 or 1?
}

Solution

result can be 0 or 1.

result = 1: Thread 2 sees y == 1 and also sees the earlier write x = 1. This is the intuitive outcome, but it is not guaranteed.
result = 0: either Thread 2 does not see y == 1 (result keeps its initial value), or the compiler/CPU reorders Thread 1's writes so that y = 1 becomes visible before x = 1, and Thread 2 sees y == 1 while x is still 0.

There is no happens-before relationship between Thread 1 and Thread 2 (no volatile, synchronized, or join), so the JMM allows this reordering.

Fix: make y volatile → the write to y happens-before the read of y that sees 1 → x = 1 is guaranteed to be visible.

Exercise 3: Virtual Thread Performance `[Challenge]`

Write a benchmark comparing (a) 10,000 platform threads (cached pool) and (b) 10,000 virtual threads. Each thread sleeps for 1 second (simulating I/O). Measure the total completion time.

Hint

long start = System.currentTimeMillis();
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    for (int i = 0; i < 10_000; i++) {
        executor.submit(() -> { Thread.sleep(Duration.ofSeconds(1)); return null; });
    }
}
long elapsed = System.currentTimeMillis() - start;

Expected: virtual threads finish in ~1-2 seconds (10K threads unmounted while sleeping, only a few carriers needed). Platform threads: noticeably slower, or an OutOfMemoryError if the OS cannot create that many threads.
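One possible shape for the full benchmark is sketched below, assuming Java 21. The class `SleepBenchmark` and its `time` helper are hypothetical; the same helper times both executors so the two measurements are comparable. Timings vary by machine, so no expected numbers are shown.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: timing the same sleeping workload on virtual and platform threads.
final class SleepBenchmark {
    /** Submits sleeping tasks to the executor, waits for all of them, returns elapsed millis. */
    static long time(ExecutorService executor, int tasks, Duration sleep) {
        long start = System.nanoTime();
        try (executor) {   // close() blocks until every task has finished
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    Thread.sleep(sleep);   // simulated I/O
                    return null;
                });
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int tasks = 10_000;
        Duration sleep = Duration.ofSeconds(1);
        System.out.println("virtual:  "
                + time(Executors.newVirtualThreadPerTaskExecutor(), tasks, sleep) + " ms");
        System.out.println("platform: "
                + time(Executors.newCachedThreadPool(), tasks, sleep) + " ms");
    }
}
```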

Summary

Concept	Key point
Thread Stack	~1MB per platform thread, holds method call frames
Working Memory	CPU cache/registers: a thread may read and write a local copy rather than main memory
volatile	Reads/writes behave as if they go directly to main memory; guarantees visibility and ordering
synchronized	Visibility + mutual exclusion + happens-before
Happens-before	Rules that determine when a change is guaranteed to be visible
Virtual Threads	Lightweight (~1KB initial stack), mount/unmount on carriers
Pinning	A synchronized block pins a VT to its carrier (JDK 21) → use ReentrantLock instead

Further Reading

Oracle: Java Memory Model - JLS §17.4: the official JMM specification, including the happens-before rules
JSR-133 FAQ: an accessible explanation of the memory model from the JSR-133 authors
Oracle: Virtual Threads: the official guide to Virtual Threads
Baeldung: Java Memory Model: a hands-on JMM tutorial
