Everything old is new again: memory optimization

4 days ago (nibblestew.blogspot.com)

98 comments

ibobev

I'm always confused as hell by how little insight we have into memory consumption.

I look at memory profiles of normal apps and often think "what is burning that memory?"

Modern compression works so well, so what's happening? Open your task manager, look through the apps, and you might ask yourself the same thing.

For example (let's ignore Chrome, MS Teams and all the other bloat), Sublime consumes 200 MB. I have 4 text files open. What is it doing?

It took YEARS for Chrome alone to implement tab suspend, despite everyone being aware of the issue and despite add-ons that could already do it.

I bought more RAM just for Chrome...

pjc50 4 hours ago
https://learn.microsoft.com/en-us/sysinternals/downloads/vmm... for an empty sublime text window gives me:
- 100MB 'image' (ie executable code; the executable itself plus all the OS libraries loaded.)
- 40MB heap
- 50MB "mapped file", mostly fonts opened with mmap() or the windows equivalent
- 45MB stack (each thread gets 2MB)
- 40MB "shareable" (no idea)
- 5MB "unusable" (appears to be address space that's not usable because of fragmentation, not actual RAM)
Generally if something's using a lot of RAM, the answer will be bitmaps of various sorts: draw buffers, decompressed textures, fonts, other graphical assets, and so on. In this case it's just allocated but not yet used heap+stacks, plus 100MB for the code.
Edit: I may be underestimating the role of binary code size. Visual Studio "devenv.exe" is sitting at 2GB of 'image'. Zoom is 500MB. VSCode is 300MB. Much of that is app-specific, not just Windows DLLs.
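To put rough numbers on the bitmap point, here is a minimal sketch of the arithmetic for one uncompressed buffer. The two old video modes are the ones mentioned elsewhere in this thread; the bit depths (4 bits per pixel for 16 colours, 8 for 256, 32 for RGBA), the two modern resolutions, and the helper name `framebuffer_bytes` are assumptions made for illustration.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Bytes needed for one uncompressed bitmap of the given size."""
    return width * height * bits_per_pixel // 8


# Assumed depths: 16 colours = 4 bits/pixel, 256 colours = 8, RGBA = 32.
modes = {
    "800x600, 16 colours": framebuffer_bytes(800, 600, 4),
    "320x200, 256 colours": framebuffer_bytes(320, 200, 8),
    "1920x1080, RGBA": framebuffer_bytes(1920, 1080, 32),
    "3840x2160, RGBA": framebuffer_bytes(3840, 2160, 32),
}

for name, size in modes.items():
    print(f"{name}: {size / 2**20:.2f} MiB")
```

Under those assumptions a single 4K RGBA buffer is well over a hundred times the size of the whole 800x600 16-colour screen, and an app with several draw buffers and decoded images holds many of them.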
- muskstinks 3 hours ago
  
  Thanks for the breakdown. I will play around with it later on my Windows machine.
  But isn't it crazy how we throw away so much memory just because of random buffers? It feels wrong to me.
  
- wat10000 3 hours ago
  
  Turning these numbers into "memory consumption" gets complicated to the point of being intractable.
  The portions that are allocated but not yet used might just be page table entries with no backing memory, making them free. Except for the memory tracking the page table entries. Almost free....
  A lot of "image" will be mmapped and clean. Anything you don't actually use from that will be similarly freeish. Anything that's constantly needed will use memory. Except if it's mapped into multiple processes, then it's needed but responsibility is spread out. How do you count an app's memory usage when there's a big chunk of code that needs to sit in RAM as long as any of a dozen processes are running? How do you count code that might be used sometime in the next few minutes or might not be depending on what the user does?
- Capricorn2481 4 hours ago
  
  But I have sublime text open with a hundred files and it's using 12mb.
  
inetknght 3 hours ago
> I look at memory profiles of normal apps and often think "what is burning that memory?"
As a corollary to this: I look at CPU utilization graphs. Programs are completely idle. "What is burning all that CPU?!"
I remember using a computer with RAM measured in two-digit amounts of MiB. CPU measured in low hundreds of MHz. It felt just as fast as modern computers, sometimes faster. Where is all of that extra RAM being used?! Where is all of that extra performance going?! There's no need for it!
- ThrowawayR2 2 hours ago
  
  Next time you see someone on HN blithely post "CPU / RAM is cheaper than developer time", it's them. Those are the coders who are collectively wasting our CPU and RAM.
- pjc50 2 hours ago
  
  > I remember using a computer with RAM measured in two-digit amounts of MiB
  Yes, so do I. It was limited to 800x600x16 color mode or 320x200x256. A significant amount of memory gets consumed by graphical assets, especially in web browsers which tend to keep uncompressed copies of images around so they can blit them into position.
  But a lot is wasted, often by routing things through single bottlenecks in the whole system. Antivirus programs. Global locks. Syncing to the filesystem at the wrong granularity. And so on.
- VorpalWay 1 hour ago
  
  I too wonder that. And it is true on an OS level as well. The only worthwhile change in desktop environments since the early 2000s has been search as you type launchers. Other than that I would happily use something equivalent to Windows XP or (more likely) Linux with KDE 3. It seems everything else since then has mostly been bloat and stylistic design changes. The latter being a waste of time in my opinion.
  Of course, some software other than desktop environments has seen important innovation, such as LSP support in IDEs, which saves every IDE from implementing support for every language. And SSDs were truly revolutionary in hardware, in making computers feel faster. Modern GPUs can push a lot more advanced graphics in games as well. And so on. My point above was just about your basic desktop environment. Unless you use a tiling window manager (which I tried but never liked), nothing much has happened for a very long time. So just leave it alone please.
- HPsquared 3 hours ago
  
  Work expands to fill the available time. This applies to CPU time just as it does to project management.
gwbas1c 4 hours ago

Basically, the short answer is that most memory managers allocate more memory than a process needs, and then reuse it.
I.e., in a JVM (Java) or dotnet (C#) process, the garbage collector allocates some memory from the operating system and keeps reusing it as it finds free memory and the program needs it.
These systems are built with the assumption that RAM is cheap and CPU cycles aren't, so they are highly optimized CPU-wise, but otherwise are RAM inefficient.
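A small-scale analogue of "allocate more than needed, then reuse it" can be observed in CPython's own `list`, which grows its backing array in steps rather than on every append. This is only a sketch of the general trade-off, not of how the JVM or dotnet heaps work; `growth_profile` is a hypothetical helper name, and the exact sizes are a CPython implementation detail that varies between versions.

```python
import sys


def growth_profile(n):
    """Record sys.getsizeof() of a list after each of n appends."""
    items = []
    sizes = []
    for i in range(n):
        items.append(i)
        # getsizeof reports the list object including its spare capacity.
        sizes.append(sys.getsizeof(items))
    return sizes


sizes = growth_profile(64)
print(f"64 appends, {len(set(sizes))} distinct sizes")
```

On CPython the number of distinct sizes is much smaller than the number of appends: most appends land in memory that was reserved earlier, which saves CPU time at the cost of holding RAM that is not strictly in use yet.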
ben-schaaf 2 hours ago

Completely agree, it would be very helpful to get even just a breakdown of what the RAM is being used for. It's unfortunately a lot of work to instrument.
> Sublime consumes 200 MB. I have 4 text files open. What is it doing?
To add to what others have said: Depending on the platform a good amount will be the system itself, various buffers and caches. If you have a folder open in the side bar, Sublime Text will track and index all the files in there. There's also no limit to undo history that is kept in RAM.
There's also the possibility that that 200MB includes the subprocesses, meaning the two python plugin hosts and any processes your plugins spawn - which can include heavy LSP servers.
senfiaj 4 hours ago

It's partly because there are layers of abstractions (frameworks, libraries / runtimes / VM, etc). Also, today's software often has other pressures, like development time, maintainability, security, robustness, accessibility, portability (OS / CPU architecture), etc. It's partly because the complexity / demand has increased.
https://waspdev.com/articles/2025-11-04/some-software-bloat-...
pjmlp 2 hours ago

It is a matter of tooling.
Visual Studio runs the memory profiler in debug mode right from the start; it is the default configuration, and you have to disable it yourself.
https://learn.microsoft.com/en-us/visualstudio/profiling/mem...
Orygin 5 hours ago

200 MB for Sublime does not seem so bad when compared to Postman using 4 GB on my machine...
veunes 5 hours ago

Part of the problem is that modern apps aren't really "one thing" anymore.
Capricorn2481 4 hours ago
> Sublime consumes 200 MB. I have 4 text files open. What is it doing?
Huh? Sublime Text? I have like 100 files open and it uses 12mb. Sublime is extremely lean.
Do you have plugins installed?
- muskstinks 3 hours ago
  
  I do not have plugins installed and I have only a handful of files open on macOS.
  Memory statistics show 200 MB and a peak of 750 MB in the past (for whatever reason).
  

1vuio0pswjnm7 2 hours ago

Been waiting for online commentary about programming to start acknowledging this situation as it pertains to writing programs

Memory and storage are not "cheap" anymore. Power may also rise in cost

Under these conditions, memory usage and binary size are irrefutably relevant^1

To some, this might feel like going backwards in time toward the mainframe era. Another current HN item with over 100 points, "Hold on to your hardware", reflects on how consumer hardware may change as a result

To me, the past was a time of greater software efficiency; arguably this was necessitated by cost. Perhaps higher costs in the present and future could lead to better software quality. But whether today's programmers are up for the challenge is debatable. It's like young people in finance whose only experience is in a world with "zero" interest rates. It's easier to whine about lowering rates than to adapt

With the money and political support available to "AI" companies, the incentive for efficiency of any kind is lacking. Perhaps their "no limits" operations, e.g., their effects on supply, may provide an incentive for others' efficiency

1. As an underpowered computer user that compiles own OS and writes own simple programs, I've always rejected large binary size and excessive memory use, even in times of "abundance"

canpan 6 hours ago

String views were a solid addition to C++. Still underutilized. It does not matter which language you are using when you make thousands of tiny memory allocations during parsing. https://en.cppreference.com/w/cpp/string/basic_string_view.h...

VorpalWay 6 hours ago
The issue with retrofitting things to an existing well established language is that those new features will likely be underutilized. Especially in other existing parts of the standard library, since changing those would break backwards compatibility. std::optional is another example of this, which is not used much in the C++ standard library, but would be much more useful if used across the board.
Contrast this with Rust, which had the benefit of being developed several decades later. Here Option and str (string views) were in the standard library from the beginning, and every library and application uses them as fundamental vocabulary types. This is combined with good support for chaining and working with these types (e.g. Option has map() to replace the content if it exists and just pass it along if None).
Retrofitting is hard, and I have no doubt there will be new ideas that can't really be retrofitted well into Rust in another decade or two as well. Hopefully at that point something new will come along that learned from the mistakes of the past.
- menaerus 4 hours ago
  
  Retrofitting new patterns or ideas is underutilized only when it is not worth the change. string_view example is trivial and anyone who cared enough about the extra allocations that could have happened already (no copy-elision taking place) rolled their own version of string_view or simply used char+len pattern. Those folks do not wait for the new standard to come along when they can already have the solution now.
  The std::optional example, OTOH, is also a bad one because it is heavily opinionated, and baking it into the API across the standard library would be the wrong choice.
  
pjc50 6 hours ago

C# gained similar benefits with Span<>/ReadOnlySpan<>. Essential for any kind of fast parser.
groundzeros2015 4 hours ago
In C you have char*
- rcxdude 4 hours ago
  
  Which isn't very good for substrings due to the null-termination requirement.
- kccqzy 4 hours ago
  
  And the type system does not tell you if you need to call free on this char* when you’re done with it.
- pjc50 4 hours ago
  
  In C you only have char*.

zahlman 2 hours ago

> This sounds like a job for Python. Indeed, an implementation takes fewer than 30 lines of code.

I don't know if the implementation is written in a "low-level" way to be more accessible to users of other programming languages, but it can certainly be done more simply leveraging the standard library:

  from collections import Counter
  import sys

  with open(sys.argv[1]) as f:
      words = Counter(word for line in f for word in line.split())

  for word, count in words.most_common():
      print(count, word)

At the very least, manually creating a (count, word) list from the dict items and then sorting and reversing it in-place is ignoring common idioms. `sorted` creates a copy already, and it can be passed a sort key and an option to sort in reverse order. A pure dict version could be:

  import sys

  with open(sys.argv[1]) as f:
    counts = {}
    for line in f:
      for word in line.split():
        counts[word] = counts.get(word, 0) + 1

  stats = sorted(counts.items(), key=lambda item: item[1], reverse=True)

  for word, count in stats:
      print(count, word)

(No, of course none of this is going to improve memory consumption meaningfully; maybe it's even worse, although intuitively I expect it to make very little difference either way. But I really feel like if you're going to pay the price for Python, you should get this kind of convenience out of it.)

Anyway, none of this is exactly revelatory. I was hoping we'd see some deeper investigation of what is actually being allocated. (Although I guess really the author's goal is to promote this Pystd project. It does look pretty neat.)
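For anyone who wants to look at what is actually being allocated, the standard library's `tracemalloc` is one place to start. A minimal sketch, assuming a tiny in-memory sample instead of the article's input file; `peak_python_allocations` is a hypothetical helper name, and `tracemalloc` only sees allocations made through Python's allocators, so the interpreter's baseline footprint is not included.

```python
import tracemalloc
from collections import Counter


def count_words(lines):
    """Same Counter-based tally as above, over any iterable of lines."""
    return Counter(word for line in lines for word in line.split())


def peak_python_allocations(func, *args):
    """Run func(*args) and return (result, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


sample = ["the quick brown fox", "the lazy dog", "the end"]
counts, peak = peak_python_allocations(count_words, sample)
print(counts.most_common(1), f"peak traced: {peak} bytes")
```

Swapping `get_traced_memory()` for `tracemalloc.take_snapshot().statistics("lineno")` gives a per-line breakdown, which is closer to the "deeper investigation" than a single peak number.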

gwbas1c 4 hours ago

A lot of frameworks that use variants of "mark and sweep" garbage collection instead of automatic reference counting are built with the assumption that RAM is cheap and CPU cycles aren't, so they are highly optimized CPU-wise, but otherwise are RAM inefficient.

I wonder if frameworks like dotnet or JVM will introduce reference counting as a way to lower the RAM footprint?
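As a point of comparison, CPython already combines the two approaches: objects are freed by reference counting the moment the last reference goes away, and a separate cycle collector handles what reference counting cannot. The sketch below shows that behaviour; it is specific to CPython (other Python implementations may free objects later), `Node` and `probe` are names made up for the example, and it says nothing about the multithreaded cost of updating the counters.

```python
import gc
import weakref


class Node:
    """Plain object that supports attributes and weak references."""


def probe():
    """Return (freed_by_refcount, cycle_survives, freed_by_collector)."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cycle collector from running on its own
    try:
        # Case 1: no cycle, so dropping the last reference frees the object.
        solo = Node()
        solo_ref = weakref.ref(solo)
        del solo
        freed_by_refcount = solo_ref() is None

        # Case 2: two objects that point at each other keep each other alive.
        a, b = Node(), Node()
        a.other, b.other = b, a
        cycle_ref = weakref.ref(a)
        del a, b
        cycle_survives = cycle_ref() is not None

        # An explicit collection finds and frees the unreachable cycle.
        gc.collect()
        freed_by_collector = cycle_ref() is None
    finally:
        if was_enabled:
            gc.enable()
    return freed_by_refcount, cycle_survives, freed_by_collector


print(probe())
```

The first case is the RAM-friendly property being asked about: memory comes back immediately instead of waiting for the next collection. The second case is why a tracing collector is still needed alongside it.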

pjc50 4 hours ago
Reference counting in multithreaded systems is much more expensive than it sounds because of the synchronization overhead. I don't see it coming back. I don't think it saves massive amounts of memory, either, especially given my observation with vmmap upthread that in many cases the code itself is a dominant part of the (virtual) memory usage.
- gwbas1c 34 minutes ago
  
  That's why Rust has Rc<> for single-threaded structs, and Arc<> for thread-safe structs.
- adrian_b 3 hours ago
  
  Incrementing or decrementing a shared counter is done with an atomic instruction, not with a locked critical section.
  This has negligible overhead in most cases. For instance, if the shared counter is already in some cache memory the overhead is smaller than a normal non-atomic access to the main memory. The intrinsic overhead of an atomic instruction is typically about the same as that of a simple memory access to data that is stored in the L3 cache memory, e.g. of the order of 10 nanoseconds at most.
  Moreover, many memory allocators use separate per-core memory heaps, so they avoid any accesses to shared memory that need atomic instructions or locking, except in the rare occasions when they interact with the operating system.
  
- zozbot234 4 hours ago
  
  If you use an ownership/lifetime system under the hood you only pay that synchronization overhead when ownership truly changes, i.e. when a reference is added or removed that might actually impact the object's lifecycle. That's a rare case with most uses of reference counting; most of the time you're creating a "sub"-reference and its lifetime is strictly bounded by some existing owning reference.
  
vaylian 4 hours ago

Unlikely. Maybe I'm overly optimistic, but I think it's fairly likely that the RAM situation will have sorted itself out in a few years. Adding reference counting to the JVM and .NET would also take considerable time.
It makes more sense for application developers to think about the unnecessary complexity that they add to software.
xyzzy_plugh 4 hours ago
That's not strictly true. Mark and sweep is tunable in ways ARC is not. You can increase frequency, reducing memory at the cost of increased compute, for example.
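The same kind of knob exists in Python's `gc` module, which makes for a quick way to try the trade-off. A minimal sketch, with the caveat that CPython's collector is a cycle collector layered on top of reference counting rather than a plain mark-and-sweep heap, so this is an analogy only; `collection_threshold` is a hypothetical helper, and the value 100 is an arbitrary choice.

```python
import gc
from contextlib import contextmanager


@contextmanager
def collection_threshold(threshold0):
    """Temporarily change how many net allocations trigger a collection."""
    previous = gc.get_threshold()
    # Only the first threshold is changed; the others keep their old values.
    gc.set_threshold(threshold0, *previous[1:])
    try:
        yield
    finally:
        gc.set_threshold(*previous)


with collection_threshold(100):
    # Inside the block the collector runs more often than by default:
    # more CPU spent collecting, less garbage sitting around in between.
    inside = gc.get_threshold()[0]

print(inside, gc.get_threshold()[0])
```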
- cogman10 2 hours ago
  
  M&S also doesn't necessitate having a moving and compacting GC. That's the thing that actually makes the JVM's heap greedy.
  Go also does M&S and yet uses less memory. Why? Because Go isn't compacting; it's instead calling malloc and free based on the results of each GC. This means that Go has slower allocation and a bigger risk of memory fragmentation, but it also keeps Go's memory usage lower than the JVM's.

griffindor 7 hours ago

Nice!

> Peak memory consumption is 1.3 MB. At this point you might want to stop reading and make a guess on how much memory a native code version of the same functionality would use.

I wish I knew the input size when attempting to estimate, but I suppose part of the challenge is also estimating the runtime's startup memory usage too.

> Compute the result into a hash table whose keys are string views, not strings

If the file is mmap'd, and the string view points into that, presumably decent performance depends on the page cache having those strings in RAM. Is that included in the memory usage figures?

Nonetheless, it's a nice optimization that the kernel chooses which hash table keys to keep hot.

The other perspective on this is that we sought out languages like Python/Ruby because the development cost was high, relative to the hardware. Hardware is now more expensive, but development costs are cheaper too.

The takeaway: expect more push towards efficiency!

pjc50 7 hours ago
>> Peak memory consumption is 1.3 MB. At this point you might want to stop reading and make a guess on how much memory a native code version of the same functionality would use.
At this point I'd make two observations:
- how big is the text file? I bet it's a megabyte, isn't it? Because the "naive" way to do it is to read the whole thing into memory.
- all these numbers are way too small to make meaningful distinctions. Come back when you have a gigabyte. It gets more interesting when the file doesn't fit into RAM at all.
The state of the art here is: https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times... , wherein our hero finds the terrible combination of putting the whole file in a single string and then running strlen() on it for every character.
- dgb23 6 hours ago
  
  > all these numbers are way too small to make meaningful distinctions. Come back when you have a gigabyte.
  I have to disagree. Bad performance is often the result of death by a thousand cuts. This function might be one among countless similarly inefficient library calls, programs and so on.
  
- kloop 5 hours ago
  
  > how big is the text file? I bet it's a megabyte, isn't it?
  The edit in the article says ~1.5kb
  
zozbot234 3 hours ago

> If the file is mmap'd, and the string view points into that, presumably decent performance depends on the page cache having those strings in RAM.
Not so much, because you only need some fraction of that memory when the program is actually running; the OS is free to evict it as soon as it needs the RAM for something else. Non-file-backed memory can only be evicted by swapping it out, and that's way more expensive.
veunes 5 hours ago

I suspect it'll be selective

tzot 7 hours ago

Well, we can use memoryview for the dict generation avoiding creation of string objects until the time for the output:

    import operator
    import re

    def count_words(filename):
        # Read the file once; every "word" below is a slice into this buffer.
        with open(filename, 'rb') as fp:
            data = memoryview(fp.read())
        word_counts = {}
        for match in re.finditer(br'\S+', data):
            # Read-only memoryview slices are hashable, so they work as keys.
            word = data[match.start():match.end()]
            word_counts[word] = word_counts.get(word, 0) + 1
        ranked = sorted(word_counts.items(), key=operator.itemgetter(1), reverse=True)
        for word, count in ranked:
            # A str object is only created here, once per distinct word.
            print(word.tobytes().decode(), count)

We could also use `mmap.mmap`.
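A minimal sketch of that `mmap.mmap` variant, assuming a UTF-8 file and CPython's prompt release of temporary objects; `count_words_mmap` is a name made up for the example. It returns the tally instead of printing it, and the explicit `release()` calls are there because a mapping cannot be closed while views into it are still exported.

```python
import mmap
import os
import re
from collections import Counter


def count_words_mmap(filename):
    """Return [(word, count), ...] sorted by descending count."""
    if os.path.getsize(filename) == 0:
        return []  # mmap refuses to map an empty file
    counts = Counter()
    with open(filename, "rb") as fp:
        with mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # A read-only mapping gives a read-only view with hashable slices.
            view = memoryview(mm)
            for match in re.finditer(rb"\S+", mm):
                counts[view[match.start():match.end()]] += 1
            # Copy each distinct word out of the mapping exactly once.
            result = [(key.tobytes().decode(), n)
                      for key, n in counts.most_common()]
            # Release every view so the mapping can be closed on exit.
            for key in counts:
                key.release()
            view.release()
    return result
```

Compared with `fp.read()`, the file contents stay in the page cache rather than in the process heap, so the kernel can drop and reload those pages as needed. Each distinct key is still a small `memoryview` object, so this is not allocation-free.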

akx 5 hours ago

This doesn't do the same thing though, since it's not Unicode aware.

    >>> 'x\u2009   a'.split()
    ['x', 'a']
    # incorrect; in bytes mode, `\S` doesn't know about unicode whitespace
    >>> list(re.finditer(br'\S+', 'x\u2009   a'.encode()))
    [<re.Match object; span=(0, 4), match=b'x\xe2\x80\x89'>, <re.Match object; span=(7, 8), match=b'a'>]
    # correct, in unicode mode
    >>> list(re.finditer(r'\S+', 'x\u2009   a'))
    [<re.Match object; span=(0, 1), match='x'>, <re.Match object; span=(5, 6), match='a'>]

est 2 hours ago

OP's .split_ascii() doesn't handle U+2009 either.
Edit: by OP's I mean the fully native C++ version using Pystd.

contravariant 4 hours ago

There's bound to be a way to turn a stream of bytes into a stream of unicode code points (at least I think that's what python is doing for strings). Though I'm explicitly not volunteering to write the code for it.
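For what it's worth, the standard library already has this in the form of incremental decoders. A minimal sketch, assuming UTF-8 input arriving in arbitrary byte chunks; `iter_code_points` is a hypothetical helper name.

```python
import codecs


def iter_code_points(chunks, encoding="utf-8"):
    """Yield one code point (a length-1 str) at a time from byte chunks."""
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        # Bytes of a character split across chunks are buffered internally.
        yield from decoder.decode(chunk)
    # Flush; raises UnicodeDecodeError if the input ended mid-character.
    yield from decoder.decode(b"", final=True)


# U+2009 (thin space) is three bytes in UTF-8, split here across two chunks.
chunks = [b"x\xe2", b"\x80\x89 a"]
print([hex(ord(c)) for c in iter_code_points(chunks)])
```

This still creates `str` objects for the decoded text, so it addresses the Unicode-awareness problem rather than the allocation one.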


contravariant 4 hours ago

For reasons I never quite understood, Python has a collections.Counter for the purpose of counting things. It's a bit cleaner.

fix4fun 7 hours ago

Digression: Nowadays, when RAM is expensive, good old zram is gaining popularity ;) Check trends.google.com: searches for it have doubled since 2025-09 ;)

bcjdjsndon 3 hours ago

A few things

- since GC languages became prevalent, and maybe high level programming in general, coders aren't as economical with their designs. Memory isn't something a coder should worry about, apparently.

- far more people code apps in web languages because they don't know anything else. These are anywhere from 5-10 levels of abstraction away from the metal, naturally inefficient.

- increasing scope... I can only describe this one by example: web browsers must implement so many standards that it's become a mammoth task, especially compared to the 90s. Same for compilers, OSes, heck, even computers themselves were all one-man jobs at some point, because things were simpler cos we knew less.

veunes 5 hours ago

Not "C++ everywhere again" but maybe "understanding memory again"

tombert 3 hours ago

I've been rewriting a lot of my stuff in Rust to save memory.

Rust is high-level enough to still be fun for me (tokio gives me most of the concurrency goodies I like), but the memory usage is often like 1/10th or less compared to what I would write in Clojure.

Even though I love me some lisp, pretty much all my Clojure utilities are in Rust land now.

yakkomajuri 2 hours ago

The abrupt ending was funny and then I realized the author is Finnish and it all made sense.

Nice post.

(P.S. I'm also Finnish)

dgb23 6 hours ago

Not a C++ programmer and I think the solution is neat.

But it's not necessarily an apples to apples comparison. It's not unfair to python because of the runtime overhead. It's unfair because it's a different algorithm with fundamentally different memory characteristics.

A fairer comparison would be to stream the file in C++ as well and maintain internal state for the count. For most people that would be the first/naive approach as well when they programmed something like this I think. And it would showcase what the actual overhead of the python version is.

zahlman 1 hour ago

> It's unfair because it's a different algorithm with fundamentally different memory characteristics. A fairer comparison would be to stream the file in C++ as well and maintain internal state for the count.
The C++ code is still building a tally by incrementing keys of a hash map one at a time, and then dumping (reversed) key/value pairs out into a list and sorting. The file is small and the Python code is GCing the `line` each time through the outer loop. At any rate it seems like a big chunk of the Python memory usage is just constant (sort of; stuff also gets lazily loaded) overhead of the Python runtime, so.
VorpalWay 6 hours ago
> A fairer comparison would be to stream the file in C++ as well and maintain internal state for the count.
Wouldn't memory mapping the data in Python be the more fair comparison? If the language doesn't support that, then this seems to absolutely be a fair comparison.
> For most people that would be the first/naive approach as well when they programmed something like this I think.
I disagree, my mind immediately goes to mmap when I have to deal with a single file that I have to read in its entirety. I think the non-obvious solution here is rather io_uring (which I would expect to be faster if dealing with lots of small files, as you can load them asynchronously and concurrently from the file system).
- dgb23 4 hours ago
  
  I'd make the bet that "most people" (who can program) would not think of mmap, but either about streaming or would even just load the whole thing into memory.
  Ask a bunch of coding agents and they will give you these two versions, which means it's likely that the LLMs have seen these way more often than the mmap version. Both Opus and GPT even pushed back when I asked for mmap, both said it would "add complexity".
  

callamdelaney 5 hours ago

I shove everything in memory, it's a design decision. Memory is still cheap, relatively.

est 7 hours ago

I think the Python version can be shortened to:

import sys
from collections import Counter

stats = Counter(x for l in open(sys.argv[1]) for x in l.split())

voidUpdate 7 hours ago
Would that decrease memory usage though?
- est 2 hours ago
  
  see https://news.ycombinator.com/item?id=47543756

90d 5 hours ago

Speaking about optimization, is Windows just too far gone at this point? It is comical the amount of resources it uses at "idle".

amelius 6 hours ago

> AI sociopaths have purchased all the world's RAM in order to run their copyright infringement factories at full blast

The ultimate bittersweet revenge would be to run our algorithms inside the RAM owned by these cloud companies. Should be possible using free accounts.

biorach 7 hours ago

"copyright infringement factories"

maipen 6 hours ago
Tells you right away where this is coming from.
- Dylan16807 4 hours ago
  
  Do you mean something specific? That sounds like a criticism, but with some blanks that need to be filled in.
  If you just mean they come across as annoyed by AI, that's true, but that's also way too wide of a category to infer basically anything else about them.
- muskstinks 6 hours ago
  
  The criticism is valid. The problem is how you value this criticism.
  I agree they are stealing it but I also see the benefit of it for society and for myself.
  Suckerberg downloaded terabytes of books for training, while people around me got sued to hell 20 years ago for downloading one mp3 file.
  
MrBuddyCasino 5 hours ago
I don't know how anyone can call the most amazing invention in computer science of the last 20 years "copyright infringement factories". We went from the ST:NG ship computer being futuristic tech to "we kinda have this now". It's like calling cars "air pollution factories", as if that was their only purpose and use.
A fundamentally anti-civilisational mindset.
- muskstinks 3 hours ago
  
  You can see both sides: criticize how it's done and still want the result of it.
  It's a little bit hypocritical, which often enough ends in realism, aka "okay, we clearly can't fight their copyright infringements because they are too powerful and too rich, but at least we can use the good side of it".
  Nothing, btw, forces all of this to happen THAT fast besides capitalism. We could slow down, we could do it better or more right.
- vor_ 5 hours ago
  
  I'm sorry, but you're acting obtuse if you pretend you don't know why they're being called that.
- ElectronCharge 3 hours ago
  
  LLMs are amazing technology. It's crazy to interact with something that knows a lot about effectively everything that's ever been written, as well as mimicking human cognition to a large degree.
  What LLMs are NOT is intelligent in the same way as a human, which is to say they are not "AGI". They may be loosely AGI-equivalent for certain tasks, software development being the poster child. LLMs have no equivalent of "judgement", and they lie ("hallucinate") with impunity if they don't know the answer. Even with coding, they'll often do the wrong thing, such as writing tests that don't test anything.
  It seems likely that LLMs will be one component of a truly conscious AI (AGI+), in the same way our subconscious facility to form sentences is part of our intelligence. We'll see how quickly the other pieces arrive, if ever.
- saintfire 4 hours ago
  
  The people pushing this technology, that accelerates climate change, have lobbied the government to circumvent typical roadblocks created by society to limit sensationalist development. Incidentally, the same people who talk about how dangerous AI will be for society, but don't worry, they're going to be the one to deliver it safely.
  Now, I don't believe AI will ever amount to enough to be a critical threat to human life, you know, beyond the immense amounts of wasted energy they propose to convert into something more useful, like a market crash or heat and noise, or both.
  Not sure how you can call someone opposed to any of that "anti-civilisational" matter-of-factly.

gostsamo 6 hours ago

> how much memory a native code version of the same functionality would use.

Native to what? How is C++ more native than Python?

VorpalWay 6 hours ago

Native code usually refers to code which is compiled to machine code (for the CPU it will run on) ahead of time, as opposed to code running in a byte code VM (possibly with JIT).
I would consider all of C, C++, Zig, Rust, Fortran etc to produce native binaries. While things like Cython exist, that wasn't what was used here (and for various reasons would likely still have more overhead than those I mentioned).
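To make the bytecode side of that distinction concrete, the standard `dis` module can show what CPython compiles a function into. A minimal sketch; the function `add` is made up for the example, and the exact opcode names differ between CPython versions, so the output is not shown here.

```python
import dis


def add(a, b):
    return a + b


# Instructions for CPython's bytecode interpreter, not CPU instructions.
opnames = [instr.opname for instr in dis.get_instructions(add)]
print(opnames)
```

A C++ compiler would turn the same function into a handful of machine instructions for the target CPU, with no interpreter loop in between; that is the sense of "native" here.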
fluoridation 4 hours ago

Native to the hardware platform.

yieldcrv 5 hours ago

as long as you know what architecture questions to ask, agentic coding can help with this next phase of optimization really quickly

delaying comp sci differentiation for a few months

I wonder if assembly based solutions will come into vogue