The one billion row challenge or 1brc for short has been all over my internet bubble lately. I first heard about it on hackernews and while I was not immediately interested, it quickly dawned on me that there are some interesting challenges lurking in this.
Since, I’m still having a lot of fun writing in this dead, useless programing language, called Common Lisp, I figured I want to give it a try. I didn’t expect to land anywhere near the highly optimized Java solutions, but I was very curious how close I get and if my Common Lisp-Fu is strong enough already, to tackle this.
In this short post, I just want to briefly introduce the project, and come back to a more elaborate treatment of this, some time in the future. This project allowed me to play around with lparallel, which I was keen to explore for a while.
Results
My fastest solution managed to process the one billion rows in 28 seconds, on a relatively beefy Apple MacBook Pro M2. I did attempt to optimize further, but ultimately wasn’t successful.
I used mmap to optimize IO, lprallel to work on chunks in parallel, and tried to optimize the parsing and processing of the data. I didn’t go so far as to implement my own hash map.
Now I will say, that I know that SBCL is known to be quite a bit slower on ARM than on x86, so I imagine there are some gains to be made still. The same is true for mmap, which appears to be slower on macOS than on Linux / Unix. I would be curious to see, how fast it can be on the machines used for the Java solutions.
Goals
Solve the one billion row challenge in Common Lisp at all
Try to solve it with a runtime that feels acceptable
Decisions
Use memory mapped files to speed up IO
Use multiple threads to process chunks of the input in parallel