I have an Akka system that spawns an actor to serve each request. Inside the request actor it does CPU-bound computation and makes several non-blocking HTTP requests. Previously, it used the default dispatcher for both the request actor and all of the non-blocking HTTP requests.
When I separated the request actor onto its own dispatcher and the non-blocking HTTP requests onto another dedicated dispatcher, I got better throughput, lower latencies, and fewer HTTP request timeouts. However, I noticed it increased cpu-system utilization compared to the single default-dispatcher approach. I'm not sure where or how separating the dispatchers could increase cpu-system utilization. Are there any thoughts on what may be causing this increase, or a direction that would help my investigation? Thanks.
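For reference, a split along these lines might look like the following in `application.conf`. The dispatcher names, executor choices, and pool sizes here are hypothetical examples, not my actual settings:

```hocon
# Hypothetical sketch of two dedicated dispatchers.
request-actor-dispatcher {
  type = Dispatcher
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 4
    parallelism-factor = 1.0
    parallelism-max = 8
  }
  throughput = 5
}

http-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    fixed-pool-size = 16
  }
  throughput = 1
}
```

The request actor is then created with `.withDispatcher("request-actor-dispatcher")`, and the HTTP work runs on `http-dispatcher`.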
Increased CPU utilization is a good thing.
You say you are seeing increased throughput, decreased latency, and reduced timeouts. In other words, you are getting more work done. Since you are getting more work done, more CPU is being used. Your bottlenecks have been removed, so the CPU is free to run without interruption.
High CPU utilization means the CPU is doing useful work.
Yes, but the ratio of CPU utilization to TPS gain seemed odd. It looked like the system was using about 13% more CPU for only about a 9% TPS gain. More than a percentage point of CPU per point of TPS gained seems odd to me.
Specifically, cpu-user looks normal to me, but I think most of the CPU utilization increase came from an increase in cpu-system. So I'm wondering if too many dispatchers could cause an increase in cpu-system metrics. Maybe more context switching? Does context switching affect cpu-system metrics?
Hmm. If you are seeing most of the increase in system time rather than user time, that is slightly concerning. In theory, that could mean thread contention. But it’s hard to speculate. Especially since you say that a lot of the work is cpu intensive.
There’s not really any chance that too many dispatchers are a problem. Dispatchers are a fairly thin wrapper around underlying thread pools. But there’s a reasonable possibility you have too many threads overall.
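One quick sanity check on total thread count is the JVM's thread MXBean. This is a plain-JVM sketch, not Akka-specific; the class name is my own invention:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCheck {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        int cores = Runtime.getRuntime().availableProcessors();

        // Live and peak thread counts for this JVM process.
        System.out.println("Live threads: " + threads.getThreadCount());
        System.out.println("Peak threads: " + threads.getPeakThreadCount());
        System.out.println("Available cores: " + cores);

        // A live-thread count far above the core count, with most of the
        // work being CPU-bound, suggests the pools are oversubscribed.
    }
}
```

If the combined pool sizes of your dispatchers add up to several times the core count, that would be the first thing I'd look at.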
I think you have a few paths you can go down.
- Most of the time I’d be very happy to take a 9% TPS gain for merely 13% more CPU, especially if other resources are flat. (Especially memory, which is arguably the most precious.) So I think it’s entirely reasonable to just declare victory given that you have a fairly measurable TPS gain. My gut says this is still a very good outcome.
- If this is an expensive application to run, and CPU costs are a big factor, you could dig further in. Do some testing, especially with smaller thread counts in each dispatcher.
- Similarly, you could really dig into that system time. My current employer is Red Hat and I’ve seen some great performance work done by opening tickets with Red Hat, digging into the details of the system calls that are contributing to that system time. If you are willing to spend the time, you could probably get some incremental gains there via OS-level tracing tools like strace.
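If you do experiment with smaller thread counts, one way to structure it is to cap each dispatcher at an explicit fixed size and sweep that value while watching TPS and cpu-system. Again, names and sizes here are hypothetical:

```hocon
# Hypothetical experiment: cap the dispatcher at a small fixed size and
# re-run the load test at several sizes (e.g. 4, 8, 16), recording TPS
# and cpu-system at each step.
request-actor-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    fixed-pool-size = 4   # sweep this value between runs
  }
}
```

The size where cpu-system flattens out while TPS holds steady is a reasonable place to settle.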
To answer your direct question, I typically haven’t seen context switching result in high CPU. Usually I see it manifest as low CPU with more “wait” time. But it’s not inconceivable that too many threads could result in some thrashing.
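If you want a quick read on how much context switching your process is actually doing, Linux exposes per-process counters in `/proc`. This is a Linux-only sketch; it reads the current process's own status, so you'd substitute your JVM's PID for `self`:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CtxtSwitches {
    public static void main(String[] args) throws IOException {
        // /proc/<pid>/status reports cumulative context-switch counts.
        // voluntary_ctxt_switches: the process blocked or yielded (I/O, waits)
        // nonvoluntary_ctxt_switches: the scheduler preempted it (CPU pressure)
        for (String line : Files.readAllLines(Paths.get("/proc/self/status"))) {
            if (line.contains("ctxt_switches")) {
                System.out.println(line);
            }
        }
    }
}
```

Sampling those counters before and after a load test, and comparing the two dispatcher setups, would tell you fairly directly whether the split actually increased switching.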