Saturday, March 10, 2012

ORA-0600: internal error code, arguments: [kksfbc-new-child-thresh-exceeded], [], [], [], [], [], [], [], [], [], [], []

Yesterday my client started getting frequent ORA-0600: internal error code, arguments: [kksfbc-new-child-thresh-exceeded], [], [], [], [], [], [], [], [], [], [], [] errors in their alert.log.
Database version - 11.1.0.7
PSU - 11.1.0.7.2
OS - Linux 5.2
Cluster - 3 node 11.1.0.7

This database was recently migrated from 10.2.0.4 to 11.1.0.7, and ever since then ORA-0600 errors have been appearing in the alert.log of all 3 instances. However, the error went unnoticed for a few days.

Yesterday the client reported that one of their nodes was not responding. When we checked the alert.log we found many ORA-4031: unable to allocate 32 bytes of shared memory ("shared pool", errors. Neither we nor the users were able to log in to the database, so because of the SLA the client rebooted the node and everything was fine again. We then checked the AWR reports for any contention in the shared pool or library cache. We did find contention on the library cache latch, and there were about 5 to 6 SQL statements with a very high number of parse calls, somewhere around 5000 per execution. All of these, however, were background queries.
We could have simply increased the shared pool size after seeing the ORA-4031, but we did a little more research.
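
The "parse calls per execution" figure is just a ratio of two counters, so it is easy to rank statements by it once the numbers are exported from an AWR report. Below is a minimal Python sketch of that arithmetic. The SqlStat type, the parse_heavy function, the threshold and the sample rows are all hypothetical and only illustrate the calculation; they are not the client's data.

```python
from typing import NamedTuple


class SqlStat(NamedTuple):
    # Hypothetical container for one statement's counters.
    sql_id: str
    parse_calls: int
    executions: int


def parse_heavy(stats, min_ratio=100.0):
    """Return (sql_id, parses per execution) pairs at or above min_ratio, worst first."""
    ranked = []
    for s in stats:
        # Guard against statements that were parsed but never executed.
        ratio = s.parse_calls / max(s.executions, 1)
        if ratio >= min_ratio:
            ranked.append((s.sql_id, ratio))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up sample rows, not taken from the client's AWR report.
sample = [
    SqlStat("aaaa1111", parse_calls=50_000, executions=10),
    SqlStat("bbbb2222", parse_calls=1_200, executions=1_200),
    SqlStat("cccc3333", parse_calls=9_000, executions=3),
]
print(parse_heavy(sample))
```

A statement that is parsed about once per execution drops out of the list, while anything parsed thousands of times per execution floats to the top.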

We checked the alert.log on all 3 instances and found that the ORA-0600 occurred a few hours before the ORA-4031.
So we first concentrated on the ORA-0600 and looked for its cause. When we searched Metalink, the very first document we came across was "ORA-600 [kksfbc-new-child-thresh-exceeded] [ID 285704.1]".
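
Checking which error appeared first in each alert.log can be scripted. Here is a rough Python sketch that walks through the log lines, remembers the most recent timestamp line, and records the first time each of the two errors shows up. It assumes the plain-text alert.log layout where a line such as "Sat Mar 10 03:10:05 2012" precedes the messages it applies to, and an English locale for day and month names. The function name first_occurrences and the sample lines (including the trace file path) are made up for illustration.

```python
import re
from datetime import datetime

# Assumed timestamp layout of the text alert.log, e.g. "Sat Mar 10 03:10:05 2012".
TS_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$")
# Matches ORA-600 / ORA-4031 with any number of leading zeros in the code.
ERR_RE = re.compile(r"ORA-0*(600|4031)\b")


def first_occurrences(lines):
    """Map error code ('600' or '4031') to the timestamp of its first occurrence."""
    current = None  # timestamp of the most recent timestamp line
    first = {}
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            current = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
            continue
        m = ERR_RE.search(line)
        if m and current is not None:
            # setdefault keeps only the earliest timestamp per error code.
            first.setdefault(m.group(1), current)
    return first


# Made-up sample lines, not copied from the client's alert.log.
sample_log = """\
Sat Mar 10 03:10:05 2012
Errors in file /hypothetical/trace/db1_ora_1234.trc:
ORA-00600: internal error code, arguments: [kksfbc-new-child-thresh-exceeded], [], []
Sat Mar 10 07:42:51 2012
ORA-04031: unable to allocate 32 bytes of shared memory
""".splitlines()

first = first_occurrences(sample_log)
if "600" in first and "4031" in first:
    print("ORA-600 first seen :", first["600"])
    print("ORA-4031 first seen:", first["4031"])
    print("gap                :", first["4031"] - first["600"])
```

Running the same function over the alert.log of each instance gives a quick side-by-side view of which error came first on every node.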

This doc shows that 11.1.0.7 is exposed to two bugs, 8865718 and 7626014.
Both bugs are related to MView refresh, so they are likely to be hit whenever an MView is refreshed. The bug description explains that an MView refresh creates a huge number of recursive child cursors which are not shareable.
We don't know the underlying reason for this, but we concluded that the ORA-0600 and ORA-4031 were related to each other. As the bug explains, a huge number of non-shareable recursive child cursors were being created, so library cache latch contention was likely to follow, and these cursors left very little memory in the library cache for other cursors.

Increasing the shared pool size would probably not have helped unless and until these bugs are fixed. So we are now planning to apply PSU 11.1.0.7.10 to this database and will watch for its stability.

Thanks

