We have an application at work that, upon restarting intraday, needs to replay the market-data tickerplant log file and then subscribe to the live data feed. This, of course, takes quite a while. Fortunately, we subscribe only to L1 equity data, no futures: a universe of around 6k names, give or take.
If we want “real” data during development, it’s all the more frustrating to chew through millions of quote messages just to read the intermingled trade messages we’re actually interested in. Necessity being the mother of invention, I got to wondering how to make this a lot faster.
Wouldn’t it be nice if you could do this without copying any data at all?
TL;DR: I haven’t been able to do this; this post is about why I couldn’t make it work.
Very roughly, my plan was to write a library which KDB could load and which would scan the journal up to some sensible point: either (a) its size at the time of opening, or (b) the number of messages given by a vanilla tickerplant’s `.u.i` value. The library would not materialise any messages as `k0` structures but simply note the offset and length of any relevant messages.
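A minimal sketch of what I had in mind for the scan phase is below. Everything here is hypothetical: `msg_ref`, `scan_journal`, and in particular `msg_len`, which stands in for whatever decodes the length of a serialised message in the journal (the real logic depends on the kdb+ log format), are illustrative names, not a real API.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per message-of-interest: where it lives in the journal. */
typedef struct {
    uint64_t offset;   /* byte offset of the message in the journal */
    uint64_t length;   /* serialised length of the message          */
} msg_ref;

/* Hypothetical: decode the length of the serialised message at `p`,
   returning 0 if it is truncated. Depends on the kdb+ log format. */
extern size_t msg_len(const unsigned char *p, size_t avail);

/* Walk the mapped journal up to `limit` bytes (the file size at open,
   or the byte position implied by .u.i) and record the offset and
   length of each message that `keep` selects, e.g. trades. Nothing
   is ever materialised as a k0 structure. */
size_t scan_journal(const unsigned char *map, size_t limit,
                    int (*keep)(const unsigned char *msg, size_t len),
                    msg_ref *out, size_t max_out)
{
    size_t off = 0, n = 0;
    while (off < limit && n < max_out) {
        size_t len = msg_len(map + off, limit - off);
        if (len == 0)
            break;                            /* truncated tail: stop */
        if (keep(map + off, len))
            out[n++] = (msg_ref){ off, len };
        off += len;
    }
    return n;
}
```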
The magic would then be the creation of a memory-mapped temporary file, into which the messages of interest would be mapped from the source file. The sheer number of messages you’d potentially want to map is huge, and I fully expected the underlying system to buckle and refuse further service, even if you coalesced contiguous messages-of-interest. It didn’t take long to realise this was an academic exercise.
The reason you can’t do this is that a mapping must be placed at a page-aligned boundary. `man mmap` says: “The contents of a file mapping … are initialized using `length` bytes starting at offset `offset` in the file referred to by the file descriptor `fd`. `offset` must be a multiple of the page size as returned by `sysconf(_SC_PAGE_SIZE)`.” The simple fact is that not all the messages of interest start at a page boundary. So whether or not the Linux kernel could `mmap` N million small sections of one file into another will have to wait for another day.
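You can see the rule in action in a few lines of C. This is just a demonstration of the man-page constraint; the `journal.log` name is a stand-in for any handy file.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGE_SIZE);
    int fd = open("journal.log", O_RDONLY);   /* stand-in: any file */
    if (fd < 0) { perror("open"); return 1; }

    /* A message-of-interest rarely starts on a page boundary; an
       offset of 1 models that, and the kernel rejects it. */
    void *p = mmap(NULL, (size_t)page, PROT_READ, MAP_PRIVATE, fd, 1);
    if (p == MAP_FAILED)
        printf("offset 1: %s\n", strerror(errno));   /* EINVAL */

    /* The identical call at a page-aligned offset is fine. */
    p = mmap(NULL, (size_t)page, PROT_READ, MAP_PRIVATE, fd, (off_t)page);
    if (p != MAP_FAILED)
        printf("offset %ld: mapped fine\n", page);

    close(fd);
    return 0;
}
```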
This does of course lead one to ponder the most efficient way of doing the data copy that then seems inescapable. Do you use `io_uring`? Quite possibly. You still need to read the source file in userland to know the offsets and lengths of the data you want to copy. Do you `writev` multiple sections of data into the new file? Or do you `splice` or `sendfile` or `copy_file_range`? Well, if you’re going to use `io_uring`, you can’t reach for `sendfile` or `copy_file_range`: as at 2024-07-20 I can’t find a reference to an `io_uring_prep_sendfile` or an `io_uring_prep_copy_file_range` online. I can find the `io_uring_prep_splice` function though, and `io_uring_prep_writev`.

Certainly, if you want to use `splice` you almost have to use `io_uring`, since it would otherwise cost a system call per copy operation; worse, `splice` requires one end to be a pipe, so each region really means two operations, file to pipe and then pipe to file. If you want to use `writev`, you’re insulated from the system-call cost by the number of operations you can pack into each call; however, this copies data from your userland mapping back into the kernel. If you do use `splice`, you won’t have to copy data (albeit freshly faulted-in) across the userland boundary, but you do have to queue millions of prep-splice instructions. I haven’t tried it yet, but my money would be on `writev` winning the efficiency battle between the two.
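For what it’s worth, here is roughly how I’d expect the `writev` route to look: point iovecs straight into the source mapping and flush them in IOV_MAX-sized batches. `msg_ref` is the hypothetical (offset, length) record from the scan sketch above, and a real version would also loop on short writes.

```c
#include <limits.h>      /* IOV_MAX on Linux/glibc */
#include <stdint.h>
#include <stdio.h>
#include <sys/uio.h>

#ifndef IOV_MAX
#define IOV_MAX 1024     /* conservative fallback */
#endif

typedef struct { uint64_t offset, length; } msg_ref;   /* as above */

/* Copy every recorded region of the source mapping `src` into dst_fd,
   packing up to IOV_MAX regions into each writev call so the per-call
   cost is amortised over ~1024 messages. */
int copy_regions(const unsigned char *src,
                 const msg_ref *refs, size_t n, int dst_fd)
{
    struct iovec iov[IOV_MAX];
    size_t i = 0;
    while (i < n) {
        int batch = 0;
        while (batch < IOV_MAX && i < n) {
            iov[batch].iov_base = (void *)(src + refs[i].offset);
            iov[batch].iov_len  = refs[i].length;
            batch++;
            i++;
        }
        /* Brevity: a robust version retries short writes. */
        if (writev(dst_fd, iov, batch) < 0) {
            perror("writev");
            return -1;
        }
    }
    return 0;
}
```

The same batches could equally be queued through `io_uring_prep_writev` if you wanted to overlap the copying with the scan.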