
Comment by menaerus

5 months ago

If it isn't a vDSO call, I don't think a 50ns figure should be possible.

ori_b 5 months ago

No need to guess, it's 10 lines of code. And you can use bpftrace to watch the test program enter the kernel.

Using the libc wrapper will use the vDSO. Using syscall() will enter the kernel.

I haven't measured, but calling the vDSO should be closer to 5ns.

Someone else did more detailed measurements here:

https://arkanis.de/weblog/2017-01-05-measurements-of-system-...
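The comparison described above can be sketched roughly as follows. The thread does not say which call was measured, so `clock_gettime` is an assumption here, as are the helper names `now_ns`, `bench_libc`, `bench_raw` and `report`. It also assumes x86-64 Linux with a libc whose `clock_gettime` wrapper goes through the vDSO, which depends on the clock source.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Monotonic time in nanoseconds, used only to time the loops. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Average ns per call via the libc wrapper (assumed to be served by the vDSO). */
static double bench_libc(long iters) {
    assert(iters > 0);
    struct timespec ts;
    uint64_t start = now_ns();
    for (long i = 0; i < iters; i++)
        clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)(now_ns() - start) / (double)iters;
}

/* Average ns per call via syscall(), which enters the kernel every time. */
static double bench_raw(long iters) {
    assert(iters > 0);
    struct timespec ts;
    uint64_t start = now_ns();
    for (long i = 0; i < iters; i++)
        syscall(SYS_clock_gettime, CLOCK_MONOTONIC, &ts);
    return (double)(now_ns() - start) / (double)iters;
}

/* Print both averages side by side. */
static void report(long iters) {
    printf("libc wrapper: %.1f ns/call\n", bench_libc(iters));
    printf("syscall():    %.1f ns/call\n", bench_raw(iters));
}
```

Calling `report(10000000)` from a `main` and building with `cc -O2` should be enough to compare the two paths. The absolute numbers will vary with CPU, kernel version and mitigation settings, so treat them as indicative only.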

menaerus 5 months ago
50ns on a 3GHz CPU core is ~150 cycles. Pushing the registers to L1 cache and popping them back costs 5-10 cycles each. With 16 general purpose registers to handle on x86-64, this is already close to or even more than 150 cycles, no?
- ori_b 5 months ago
  
  When you measure, what numbers do you get?
  Also: register renaming is a thing, as are write combining and pipelining. You're not flushing to L1 synchronously for every register, or ordinary userspace function calls would regularly take hundreds of cycles for handling saved registers. They don't.
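
For reference, the arithmetic behind the 150-cycle objection can be written out as below. It uses only the figures quoted in the thread (50ns, 3GHz, 16 registers, 5-10 cycles per register) and assumes, as the objection implicitly does, that each register save completes before the next one starts.

```latex
\begin{align*}
\text{budget} &= 50\,\text{ns} \times 3\,\tfrac{\text{cycles}}{\text{ns}} = 150\ \text{cycles} \\
\text{serial save cost} &= 16 \times (5\text{ to }10)\ \text{cycles} = 80\text{ to }160\ \text{cycles}
\end{align*}
```

The second line only holds if the stores are fully serialized. That is the assumption the reply disputes: with pipelining, the per-register latencies overlap rather than add up.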
  