Jobs with delay < -new Date().getTime() never complete and cause 100% CPU usage
Created by: harris-m
Description
If you add a job with a delay value less than the negative unix epoch, it never runs and bull thrashes constantly running updateDelaySet.
Minimal, Working Test code to reproduce the issue.
(An easy to reproduce test case will dramatically decrease the resolution time.)
var Queue = require("bull");
var testQueue = new Queue("test_queue", "redis://127.0.0.1:6379");
testQueue.process(function (job, done) {
console.log("job complete");
done();
});
// this works fine
// testQueue.add({}, { delay: -new Date().getTime() });
// subtract 1
// this never completes and locks CPU at 100%
testQueue.add({}, { delay: -new Date().getTime() - 1 });
Bull version
3.13.0
Additional information
I have created a pull request which fixes this issue: https://github.com/OptimalBits/bull/pull/1706
Explanation (as best as I could work out):
I noticed that my application using bull had maxed out CPU usage, on a node worker and redis. I used redis-cli monitor to investigate:
1587536426.277449 [0 172.17.0.1:47268] "evalsha" "8b912cdc5b4c20108ef73d952464fba3a7470d7b" "6" "bull:test_queue:delayed" "bull:test_queue:active" "bull:test_queue:wait" "bull:test_queue:priority" "bull:test_queue:paused" "bull:test_queue:meta-paused" "bull:test_queue:" "1587536426276" "e94d3501-42b5-459d-8f15-854d6900ef18"
1587536426.277481 [0 lua] "ZRANGEBYSCORE" "bull:test_queue:delayed" "0" "6502549202026496" "LIMIT" "0" "1000"
1587536426.277491 [0 lua] "ZRANGE" "bull:test_queue:delayed" "0" "0"
"WITHSCORES"
1587536426.277500 [0 lua] "PUBLISH" "bull:test_queue:delayed" "-3.1539841246358106e+17"
I found these lines repeating infinitely. I noticed the negative numbers and tried clamping my application's job's delays to 0, and the issue went away. Looking into bull's codebase, I found this line which led me to the core issue. delayedTimestamp is expected to be a positive value (a time in the future). However if it's a large enough negative value (less than negative unix epoch), delay is set to a value <= 0 and the function repeats, causing this infinite loop.