@maurycy maurycy commented Jan 31, 2026

The hint enables Transparent Huge Pages (THP) on systems where THP is set to madvise, which seems to be the default on Ubuntu and Fedora, at least according to this article.

More on THP:

Importantly, it seems to carry no SIGBUS risk. mimalloc seems to already do this with MIMALLOC_LARGE_OS_PAGES=1.
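For illustration, the hint boils down to a single madvise() call on a region that is already mapped. The sketch below shows the idea in Python rather than in the C code this PR touches. It assumes Linux and Python 3.8+; `ARENA_SIZE` and `map_arena_with_thp_hint` are hypothetical names, not part of the change.

```python
import mmap

# Hypothetical size for illustration: one 2 MiB huge page on x86-64.
ARENA_SIZE = 2 * 1024 * 1024


def map_arena_with_thp_hint(size=ARENA_SIZE):
    """Map an anonymous private region and hint that THP may back it.

    Returns (mapping, hinted); hinted is True only if madvise() accepted
    the hint. The mapping is usable either way.
    """
    arena = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    hinted = False
    # mmap.MADV_HUGEPAGE only exists where the platform defines it.
    advice = getattr(mmap, "MADV_HUGEPAGE", None)
    if advice is not None:
        try:
            arena.madvise(advice)
            hinted = True
        except OSError:
            # For example, a kernel built without THP support.
            pass
    return arena, hinted
```

Because the call is only advice, a refusal leaves the region backed by regular pages, so the fallback needs no extra handling.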

Reusing the benchmark from #144319:

bench_obmalloc.py
import sys, gc

def bench_small_object_churn():
    objs = []
    for _ in range(200_000): objs.append(bytearray(64))
    for _ in range(200_000): objs.append(bytearray(64)); objs.pop(0)

def bench_bulk_small_alloc():
    objs = [bytearray(48) for _ in range(1_000_000)]
    for o in objs: o[0] = 1

def bench_dict_churn():
    for _ in range(500_000): d = {"a": 1, "b": 2, "c": 3, "d": 4}; del d

def bench_mixed_sizes():
    sizes = [8, 16, 24, 32, 48, 64, 96, 128, 192, 256, 384, 512]
    objs = [bytearray(sizes[i % 12]) for i in range(500_000)]

def bench_fragmentation():
    objs = [bytearray(128) for _ in range(500_000)]
    for i in range(0, len(objs), 2): objs[i] = None
    for i in range(0, len(objs), 2): objs[i] = bytearray(128)

def bench_list_of_tuples():
    objs = [(i, i+1, i+2) for i in range(1_000_000)]

def bench_class_instances():
    class Pt:
        __slots__ = ('x', 'y', 'z')
        def __init__(s, x, y, z): s.x = x; s.y = y; s.z = z
    objs = [Pt(i, i+1, i+2) for i in range(500_000)]

def bench_arena_pressure():
    layers = [[bytearray(256) for _ in range(200_000)] for _ in range(10)]

def bench_random_walk():
    import random; random.seed(42)
    objs = [bytearray(64) for _ in range(1_000_000)]
    idx = list(range(len(objs))); random.shuffle(idx)
    for i in idx: objs[i][0] = i & 0xff

BENCHMARKS = dict(small_object_churn=bench_small_object_churn,
    bulk_small_alloc=bench_bulk_small_alloc, dict_churn=bench_dict_churn,
    mixed_sizes=bench_mixed_sizes, fragmentation=bench_fragmentation,
    list_of_tuples=bench_list_of_tuples, class_instances=bench_class_instances,
    arena_pressure=bench_arena_pressure, random_walk=bench_random_walk)

if __name__ == "__main__":
    gc.collect(); gc.disable(); BENCHMARKS[sys.argv[1]](); gc.enable()
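The script runs one benchmark per invocation, selected by its first argument. The description does not say how the timings were collected, so the driver below is only a sketch of one possible approach: it runs a command several times and keeps the best wall-clock time. `time_command` and `bench_command` are hypothetical helpers.

```python
import subprocess
import sys
import time


def time_command(cmd, repeats=5):
    """Run cmd `repeats` times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        best = min(best, time.perf_counter() - start)
    return best


def bench_command(name, script="bench_obmalloc.py", python=sys.executable):
    """Build the command line for one benchmark, e.g. 'fragmentation'."""
    return [python, script, name]
```

For example, `time_command(bench_command("fragmentation"))` would time one benchmark. The measurement includes interpreter startup, so it is only meaningful when both builds are compared the same way.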

on a system where THP is set to madvise:

[126] 2026-01-31T02:32:04.127734128+0100 maurycy@eiger /home/maurycy  % sudo cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
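The kernel marks the active mode with square brackets. A small sketch for reading that file programmatically, for instance to skip a benchmark run when the mode is not the expected one; `parse_thp_mode` and `current_thp_mode` are hypothetical helper names.

```python
import re

THP_ENABLED_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def parse_thp_mode(text):
    """Return the bracketed word from e.g. 'always [madvise] never', or None."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def current_thp_mode(path=THP_ENABLED_PATH):
    """Return the active THP mode, or None if the file cannot be read."""
    try:
        with open(path) as f:
            return parse_thp_mode(f.read())
    except OSError:
        return None
```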

The baseline is the main branch.

Wall-clock time

| Benchmark | Baseline | With MADV_HUGEPAGE | Change |
| --- | --- | --- | --- |
| fragmentation | 0.107s | 0.101s | -5.4% |
| bulk_small_alloc | 0.126s | 0.121s | -4.1% |
| class_instances | 0.078s | 0.076s | -2.9% |
| list_of_tuples | 0.102s | 0.101s | -1.2% |
| mixed_sizes | 0.085s | 0.084s | -1.1% |
| random_walk | 0.517s | 0.515s | -0.4% |
| arena_pressure | 0.325s | 0.326s | +0.3% |

dTLB load misses

| Benchmark | Baseline | With MADV_HUGEPAGE | Change |
| --- | --- | --- | --- |
| fragmentation | 123,390 | 99,413 | -19.4% |
| arena_pressure | 280,228 | 237,222 | -15.3% |
| bulk_small_alloc | 93,894 | 85,661 | -8.8% |
| list_of_tuples | 88,019 | 81,778 | -7.1% |
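The Change column is the relative difference against the baseline. A minimal sketch of that arithmetic, using the fragmentation row of the dTLB figures; `percent_change` is a hypothetical helper.

```python
def percent_change(baseline, patched):
    """Relative change of patched versus baseline, in percent."""
    return (patched - baseline) / baseline * 100.0


# fragmentation, dTLB load misses
print(f"{percent_change(123_390, 99_413):+.1f}%")  # -19.4%
```

The wall-clock percentages may not reproduce exactly from the rounded seconds shown, since they were presumably computed from unrounded timings.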

The improvement is smaller than with MAP_HUGETLB, probably because MADV_HUGEPAGE is only a hint, so khugepaged may not have kicked in yet.
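One way to check whether huge pages were actually used is to read the AnonHugePages field the kernel reports for a process. The sketch below assumes Linux with /proc/self/smaps_rollup available (4.14 or newer); `anon_huge_kib` and `self_anon_huge_kib` are hypothetical helpers, not something the benchmark script does.

```python
def anon_huge_kib(smaps_text):
    """Sum the AnonHugePages fields (in kB) found in smaps-style text."""
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith("AnonHugePages:"):
            # Line format: "AnonHugePages:      2048 kB"
            total += int(line.split()[1])
    return total


def self_anon_huge_kib(path="/proc/self/smaps_rollup"):
    """Return AnonHugePages for the current process in kB, or None if unreadable."""
    try:
        with open(path) as f:
            return anon_huge_kib(f.read())
    except OSError:
        return None
```

Calling `self_anon_huge_kib()` at the end of a benchmark and getting 0 would suggest the hint had not yet led to any huge-page backing.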

I noted no regression with THP=always.

The only thing I'm wondering about is whether and how it should be guarded. Enabling it by default seems risky, but it's not exactly --with-pymalloc-hugepages. That's why I'm opening this as a draft.

Meanwhile, I'm running the whole pyperformance suite to check for regressions. I will update this PR description once the results are known.
