Messages stuck in queue, no corresponding queue_dst entry

We send approximately 4500 sms texts a day using the webhook API. Regularly I find messages stuck in the queue that will never send. They won’t send because there is no matching row in the queue_dst table. I need to know what would cause that and also what I can do to fix it. I have to clear out the queue table for entries older than an hour, if I don’t the whole system comoes to a halt with messages piling up in the queue.

can you somehow find a way to reproduce this issue ?

or at least provide me with playsms.log, the snapshot of your queue_dst and probably the queue, and also screenshot of View queue. or anything I can start to work with.

anton

I will do this, but I need to send it to you privately as there is private information in it. Is your gmail account still active? I did not receive a response from the last email I sent to it.

I sent you an email
THanks man!

Got your email.

Can you find something in your full playsms.log (all of them if you get them chunked by logrotate), please run this:

cat /var/log/playsms/playsms.log | grep "result queue_code" | grep "flag:0" 

AND

cat /var/log/playsms/playsms.log | grep "result queue_code" | grep "flag: " 

let me know the result, should be empty, if not then we’re close to finding the culprit

please run above commands carefully, I meant an empty space on flag is necessary.

anton

I saw your message but can’t find it now? Both of those queries returned nothing.

yes, my mistake, that won’t happened.

you mentioned about “non matching row in the queue_dst table”, can you explain that.

anton

also, can you search for this:

cat /var/log/playsms/playsms.log | grep "fail to finalize process queue_code"

anton

the query I used to find that there was no matching row in dst is:

`````select * from playsms_tblSMSOutgoing_queue q left join playsms_tblSMSOutgoing_queue_dst qd on q.id = qd.queue_id where q.queue_code = ‘11915e3c6791daf662630e1aca2554a8’`

where the queue code was one that was stuck in the queue

No results for that grep.

what was the result of this:

ok, can you grep this also:

cat /var/log/playsms/playsms.log | grep "11915e3c6791daf662630e1aca2554a8"

This row will have been deleted as I have a cron job that removes stuck items over an hour old. Otherwise the server breaks.

how about the log

cat /var/log/playsms/playsms.log | grep "11915e3c6791daf662630e1aca2554a8"

anton

Duh sorry. Too early in the morning. Grepping now
No results for that

so nothing huh, thats odd.

well, I can see what I should do for workaround, that is to remove the queue with flag=3 but with no member (dst) in flag=0

but we don’t know what causes that, and we need that to actually solve the problem, any thought ?

anton

I had log level set at 3, would that cause those greps to not return anything?

Anton,
I have found that recent hung queue entries are on the stoplist. Could it be that the stoplist is removing the _dst row but not the queue row?

Anton.
One of my programmers here showed me what may be an issue:

		if (blacklist_mobile_isexists(0, $c_sms_to)) {
			$continue = FALSE;
			
			_log("fail to send. mobile is in the blacklist mobile:" . $c_sms_to, 2, "sendsms");
		}
		
		if ($continue && ($smslog_id[$i] = sendsms_queue_push($queue_code, $c_sms_to))) {
			$ok[$i] = TRUE;
			$queue_count++;
			$sms_count += $count;
			$error_strings[$i] = sprintf(_('Message %s has been delivered to queue'), $smslog_id[$i]);
		} else {
			$ok[$i] = FALSE;
			$failed_queue_count++;
			$failed_sms_count++;
			$error_strings[$i] = sprintf(_('Send message to %s in queue %s has failed'), $c_sms_to, $queue_code);
		}
		$to[$i] = $c_sms_to;
		$queue[$i] = $queue_code;
		$counts[$i] = $count;
	}

They noted that this process never removes the queue record if it’s in the blacklist due to the ordering of the statements and comparators.

If its true that the cause of it than you try to send SMS to 10 destinations with one of them to blacklisted destination, and the queue will get stuck.

anton

Anton
I think we may have found the issue. Under heavy load, the function you use to create the queue_code may not be not random enough.

You use:

$queue_code = md5(uniqid($uid . $gpid, true));

where uniqid is basically a microsecond timestamp and uid and gpid may not change over a period of milliseconds. So it’s not really random.

SO I changed this variable to calculate this way:

$queue_code = md5(str_shuffle(uniqid($uid . $gpid, true)));

Which still isn’t really random, but may be pseudo-random enough for this purpose.