Build a Database in 3000 Lines with 0 Dependencies

1 year ago (build-your-own.org)

58 comments

not_a_boat

If you want 1 dependency rather than 0, there are various low-level libraries designed for building full-featured databases.

For one of these, RocksDB, one reference point is that CockroachDB was built on top of it for many years and through many successful Jepsen tests (until they transitioned to an in-house solution).

https://news.ycombinator.com/item?id=37552085

vrnvu 1 year ago

A similar resource I recently discovered that isn't that popular: https://github.com/pingcap/talent-plan/tree/master/courses/r...

Writing a Bitcask-like (KV WAL) DB in Rust. Really cool and simple ideas. The white paper is like 5 pages.

asa400 1 year ago

Bitcask is great, it's such an elegant idea. I've built a few different toy implementations in different languages and learned something new each time. YMMV based on how many deps you do or don't want to use, how complete you want to go, but it's a totally doable small-ish project.
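
For readers who haven't seen the idea, here is a minimal sketch of a Bitcask-style store in Python: an append-only log file plus an in-memory index from key to the value's position in that file. The class name `TinyCask`, the record layout, and the tombstone marker are assumptions made for this illustration and are not taken from the Bitcask paper or the linked course. Real implementations also add checksums, multiple data files, and compaction, all of which are left out here.

```python
import os
import struct
import tempfile

HEADER = struct.Struct("<II")  # key length, value length (layout assumed for this sketch)
TOMBSTONE = 0xFFFFFFFF         # value-length marker meaning "key deleted" (assumption)


class TinyCask:
    """Append-only log file plus an in-memory index (hypothetical toy class)."""

    def __init__(self, path):
        self.index = {}  # key -> (offset of value in file, value length)
        # "a+b" creates the file if needed; every write lands at the end.
        self.f = open(path, "a+b")
        self._rebuild_index()

    def _rebuild_index(self):
        # Replay the log from the start; later records shadow earlier ones.
        self.f.seek(0)
        pos = 0
        while True:
            header = self.f.read(HEADER.size)
            if len(header) < HEADER.size:
                break
            klen, vlen = HEADER.unpack(header)
            key = self.f.read(klen)
            value_pos = pos + HEADER.size + klen
            if vlen == TOMBSTONE:
                self.index.pop(key, None)
                pos = value_pos
            else:
                self.index[key] = (value_pos, vlen)
                self.f.seek(vlen, os.SEEK_CUR)  # skip the value bytes
                pos = value_pos + vlen

    def put(self, key, value):
        self.f.seek(0, os.SEEK_END)
        pos = self.f.tell()
        self.f.write(HEADER.pack(len(key), len(value)) + key + value)
        self.f.flush()
        self.index[key] = (pos + HEADER.size + len(key), len(value))

    def delete(self, key):
        # Deletes are also appends: a record with no value and a marker length.
        self.f.seek(0, os.SEEK_END)
        self.f.write(HEADER.pack(len(key), TOMBSTONE) + key)
        self.f.flush()
        self.index.pop(key, None)

    def get(self, key):
        entry = self.index.get(key)
        if entry is None:
            return None
        offset, length = entry
        self.f.seek(offset)
        return self.f.read(length)

    def close(self):
        self.f.close()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "data.log")
    db = TinyCask(path)
    db.put(b"a", b"1")
    db.put(b"b", b"2")
    db.put(b"a", b"3")   # newer record shadows the older one
    db.delete(b"b")
    db.close()
    db = TinyCask(path)  # reopening rebuilds the index from the log
    result = (db.get(b"a"), db.get(b"b"))
    db.close()
    return result
```

Reads cost one seek because the index already knows where each value lives, and the file is never modified in place; the price is that stale records accumulate until something compacts the log.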

andai 1 year ago

I looked into this at one point, I was typing out entire codebases for didactic purposes: SQLite 3 was 120,000 lines of code, but SQLite 2 was 12,000.

So for a bit more effort you get a battle tested real world thing!

cryptonector 1 year ago
The proprietary test suite for SQLite3 is much much larger still. The battle-testedness comes in great part from that.
- postalrat 1 year ago
  
  Is that where the 10x more lines came from? Writing more "testable" code?
  
ianmcgowan 1 year ago
Really puts the auto- in didact! Very curious to hear how this worked for you; it’s almost directly the opposite of the copilot approach.
I learned assembler by typing in listings from magazines and hand-disassembling and debugging on paper. Your approach seems similar in spirit, but who has the time these days?
- andai 1 year ago
  
  I learned this from Zed Shaw's Learn X The Hard Way books. He says this approach is mainstream in other disciplines, like music, languages, or martial arts.
  I also heard the philosopher Ken Wilber spent a few years (in what kids today call Monk Mode) writing out great books by hand.
  The main effect I noticed is that I rapidly gain muscle memory in a new programming language, library or codebase.
  The other effect is that I'm forced to round-trip every token through my brain, which is very helpful as my eyes tend to glaze over — often I'll be looking right at an obvious bug without seeing it.
  
- norir 1 year ago
  
  I program in neovim with no plugins, no autocomplete and no syntax highlighting. I type everything myself (though I will use copy and paste from time to time). There is a discipline to it that I find very beneficial. As a language designer, it also makes me think very carefully about the syntactic burden of languages that I design. It keeps my languages tight. One of the nice things about typing all of my own code without suggestions is that it eliminates many distractions. I may get some things wrong from time to time, but then I only have myself to blame. And I never waste time messing around with broken plugin configs or irritating syntax highlighting nits.
  It's not for everyone but I love it.
  
esmy 1 year ago
Wait, you took a repo and started typing it into the IDE? Could you please expand on what benefits you noticed and how it affected your understanding of the language? It sounds like a fascinating way to force attention onto the code in a way that simply reading it wouldn't.
- andai 1 year ago
  
  Yeah, I just open two panes in Sublime Text, with the source in one pane, and then I type it out verbatim in the other.
  I make an effort to keep the line numbers synced. Sometimes I skip long repetitive blocks or comments. But I do type out like 80% of the actual characters in the file.
  It's about 500 lines per hour for me, so I can estimate reasonably well how long it'll take.
  It's not necessarily an efficient thing to do — you'd get way more bang for your buck just poking around, asking questions, trying to make small changes. But for reasonably small projects, you can type it out in a few hours, or a day or two. Then you've "round-tripped" every single token through your brain (though sadly not with a meaningful amount of conscious reflection) -- unless you pause and ask questions along the way.
  See also my other comment above.
  
tonyedgecombe 1 year ago
>I was typing out entire codebases for didactic purposes
I've read about an author who did this (I can't remember their name right now), writing down the works of another author they wanted to learn from.
- bitdivision 1 year ago
  
  Hunter S. Thompson copied The Great Gatsby on his typewriter: https://news.ycombinator.com/item?id=24696790
frankie_t 1 year ago
can you elaborate on typing out for didactic purposes, please?
- t-3 1 year ago
  
  It's a new iteration on the ancient form of learning by copying. I've only ever seen people copy things by hand when they wanted to memorize something, though. I imagine the memory-enhancement effect of writing by hand is lost with a keyboard, but it's probably still more effective than reading alone.
867-5309 1 year ago
one does not simply type out 120 000 lines of code..
- tap-snap-or-nap 1 year ago
  
  Number of lines does not matter anymore.

cryptonector 1 year ago

Re: copy-on-write (CoW) B-tree vs append-only log + non-CoW B-tree, why not both?

I.e., just write one file (or several) as a B-tree + a log, appending to the log, and once in a while merging log entries into the B-tree in a CoW manner. Essentially that's what ZFS does, except it's optional when it really shouldn't be. The whole point of the log is to amortize the cost of the copy-on-write B-tree updates, because CoW B-tree updates incur a great deal of write amplification: every leaf node write also requires writing new copies of all the interior blocks above it. If you wait to accumulate a bunch of transactions, then when you finally merge them into the tree, many of those new interior nodes are shared across all the leaf nodes you touch. So just make the log a first-class part of the database.

Also, the log can include small indices of log entries since the last B-tree merge, and then you can accumulate even more transactions in the log before having to merge into the B-tree, thus further amortizing all that write amplification. This approaches an LSM, but with a B-tree at the oldest layer.
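
To make the amortization argument concrete, here is a rough back-of-the-envelope model in Python. It counts how many tree nodes a copy-on-write merge has to rewrite when leaf updates are applied one at a time versus in a single batch. The tree shape (complete, fixed fanout, fixed height), the uniformly random choice of leaves, and the function names are all assumptions of this sketch; it does not model ZFS or any particular database.

```python
import random


def cow_nodes_written(leaf_ids, fanout, height):
    """Count the distinct nodes one CoW merge rewrites for a batch of leaves.

    Model (assumed for this sketch): a complete tree with `height` levels,
    leaves numbered 0 .. fanout**(height-1) - 1, where the ancestor sitting
    `level` levels above leaf i is numbered i // fanout**level.
    """
    touched = set()
    for leaf in leaf_ids:
        # Level 0 is the leaf itself; level height-1 is the root.
        for level in range(height):
            touched.add((level, leaf // fanout ** level))
    return len(touched)


def compare(num_updates=1000, fanout=100, height=4, seed=0):
    """Return (nodes written one update at a time, nodes written in one batch)."""
    rng = random.Random(seed)
    num_leaves = fanout ** (height - 1)
    leaves = [rng.randrange(num_leaves) for _ in range(num_updates)]
    # Unbatched: every update rewrites its full root-to-leaf path.
    one_by_one = sum(cow_nodes_written([leaf], fanout, height) for leaf in leaves)
    # Batched: paths that share interior nodes rewrite them only once.
    batched = cow_nodes_written(leaves, fanout, height)
    return one_by_one, batched
```

In this model every unbatched update costs exactly `height` node writes, while a batch pays for each shared interior node once, so the saving grows as the batch gets larger and the upper levels of the tree saturate. Calling `compare()` with different `num_updates` values shows the trend.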

airstrike 1 year ago

I see what you did there... https://news.ycombinator.com/item?id=42711727

phoronixrly 1 year ago

Please tell me they did that on purpose as a response to this post. Top-tier banter!

qianli_cs 1 year ago

I've read a similar series from Phil back in 2020: "Writing a SQL database from scratch in Go" https://notes.eatonphil.com/database-basics.html

The code is available on GitHub: https://github.com/eatonphil/gosql (it's specifically a PostgreSQL implementation in Go).

It's cool to build a database in 3000 lines, but for a real production-ready database you'll need testing. Would love to see some coverage on correctness and reliability tests. For example, SQLite has about 590 times more test code than the library itself. (https://www.sqlite.org/testing.html)

anacrolix 1 year ago

I recently wrote a KV disk cache in Rust that uses the latest syscalls: https://github.com/anacrolix/possum

ComputerGuru 1 year ago

Hey, really nice work!

bArray 1 year ago

A tiny database under 256 lines of C: https://gitlab.com/danbarry16/u-database

The idea in mind was to use it for something like an RSS feed reader.

codr7 1 year ago

Here's a different approach in Common Lisp I wrote a while back:

https://github.com/codr7/whirlog

_zoltan_ 1 year ago

I started learning Velox last year, and it's a staggering amount of code. Sure, it has a ton of dependencies because it wants to support so many things, but I feel like the core itself is also very complex.

I'm not sold on complexity being a necessity in software engineering, as I'm sure a lot of you also aren't. Yet we see a lot of behemoth projects.

wwarren 1 year ago

Man I got the first edition of this book and it was so bad. Hopefully this is better…

teleforce 1 year ago

Currently it's on sale at Leanpub for $19.80, but apparently the ebook with the Golang code is $39:
https://leanpub.com/build_your_own_database_from_scratch/c/L...
To be honest, it seems a bit strange to pay for the code, since all the source code needed to build the database is literally inside the book.
miffy900 1 year ago
Can you elaborate on why it was so bad?
- wwarren 1 year ago
  
  Check the reviews online, but generally random bits of Go code are presented with no build-up or explanation. B Trees are implemented immediately and without explanation. The book is incredibly short and seems just like a loose collection of notes, rather than an actual book. Very interested to check out the new version

koushik_x 1 year ago

I've recently started to think about covering some advanced topics like compilers, databases, and interpreters. Do you know any good OSS repos where I can learn and build them from scratch on my own?

amelius 1 year ago

Waiting for a post where he writes a distributed database.

MarcelOlsz 1 year ago

You're going to feel awful silly when he does it in like, 3 lines.

swyx 1 year ago

related: https://dx.tips/oops-database

sandeep1998 1 year ago

Thank you
