Noted previously in the link-blog — here are more details on the first known instance of the Year 2038 UNIX epoch rollover bug, where AOLServer installs hung due to a 32-year timeout value hitting the end-of-epoch.
It appears that it was caused by an ‘official workaround’ for an Oracle driver bug, where an infinite timeout was desired. Instead of implementing true support for infinite timeouts, the developer just used a very large value — one BILLION seconds, Dr. Evil-style. Unfortunately, this led to the overflow issue.
Here’s some key snippets from the mailing list thread:
On 17 May 2006, at 21:34, Dossy Shiobara wrote:
Dave Siktberg seems to have narrowed it down to 2006-05-12 21:25.
In what timezone? It sound like that could equate to “Sat May 13 02:27:28 BST 2006”, or 1147483648 seconds since epoch, which makes it exactly 1,000,000,000 seconds until expiry of 32 bit time. Coincidence? Seems too strange as to a computer that is not a nice round number.
I had problems starting at the exact same time but on Solaris, where they manifested as a EINVAL return from pthread_cond_tomedwait. After a day of tracing the problem with debug builds and working with my sysadmin to track what changed (of course, nothing had) I cam to the same 1 billion second issue.
Which coincidentally is the expiry time (MaxOpen and MaxIdle) set on my database connections. My system is ACS-derived, so I wouldn’t be surprised if these database settings are common in other ACS-derived systems.
The only bug is that Ns_CondTimedWait doesn’t do any wraparound on the time parameter. All the same, I’ve been enjoying telling people that I hit my first y2038 bug.
For those interested in ancient trivia, I think it was TWO bugs, one in the Oracle driver and/or OCI libraries (most likely OCI), and one in AOLserver. I think the workaround dates from before I ever used AOLserver, but I have these old comments in my AOLserver config file:
MaxIdle and MaxOpen:
Settings these to 1000000000 is a historical bug workaround. Could now probably set this to some normal number, or set to 0 to disable entirely. E.g., in this thread Rob Mayoff says:
http://www.arsdigita.com/bboard/q-and-a-fetch-msg?msg%5fid=000Ibq
It is a bug workaround. Many Linux users (including me) saw that when AOLserver tried to close a database connection, it would hang in the Oracle driver. So people started setting and MaxIdle to a very large number to keep connections from closing. You can also set them to zero, but at the time the bug was discovered, AOLserver had a bug that prevented you from setting them to zero.
I believe the bug was also seen, very rarely, on Solaris.
Curtis Galloway managed to get Oracle to investigate. They suggested to workarounds: use IPC or TCP to connect (which is what I do on my system), or set bequeath_detach=yes in sqlnet.ora.
2002/01/10 14:22 EST
Uselessly, the arsdigita thread URL is now a victim of needless website reorganisation, and redirects to their front page. Still, I think that’s enough info.
This is certainly going to be one of the first widely-recorded Y2038 rollover bugs, I think…