今天发现一个报表数据库中SQL运行异常,简单记录一下问题的诊断和解决过程。
问题是在检查ALERT文件时发现的,一个过程运行时间太长而出现了ORA-1555错误。
错误信息:
ORA-01555 caused by SQL statement below (Query Duration=38751 sec, SCN: 0x0000.fe5b584a):
INSERT INTO MAN_ORDER_ITEM (
ID,
REQUEST_QTY,
SALER_ID,
PRODUCT_ID,
UNIT_PRICE,
CREATE_DATE,
ANSWER_DATE,
BUYER_ID
)
SELECT
A.RECORD_ID,
A.REQUEST_QTY,
A.SALER_ORGID,
A.PRODUCT_ID,
A.UNIT_PRICE,
A.CREATE_DATE,
NULL,
A.BUYER_ORGID
FROM ORD_ORDER_ITEM A
WHERE A.CREATE_DATE >= TO_DATE('2004-01-01 0:0:0', 'YYYY-MM-DD HH24:MI:SS')
AND A.CREATE_DATE < TRUNC(SYSDATE)
AND EXISTS (SELECT 1 FROM MAN_PRODUCT WHERE ID = A.PRODUCT_ID)
AND EXISTS (SELECT 1 FROM MAN_DEALER WHERE ID = A.SALER_ORGID)
AND EXISTS (SELECT 1 FROM MAN_BUYER WHERE ID = A.BUYER_ORGID)
由于这是个JOB调用,在失败后JOB会自动重试,于是从DBA_JOBS_RUNNING中查看相关的JOB和SESSION信息。
SQL> SELECT SID, JOB FROM DBA_JOBS_RUNNING;
SID JOB
---------- ----------
70 208
检查这个SESSION目录在执行什么SQL:
SQL> SELECT SQL_TEXT FROM V$SQL SQL, V$SESSION S
2 WHERE SQL.HASH_VALUE = S.SQL_HASH_VALUE
3 AND SQL.ADDRESS = S.SQL_ADDRESS
4 AND S.SID = 70;
SQL_TEXT
------------------------------------------------------------------------------
INSERT INTO MAN_ORDER_ITEM ( ID, REQUEST_QTY, SALER_ID, PRODUCT_ID, UNIT_PRICE, CREATE_DATE, ANSWER_DATE,
BUYER_ID ) SELECT A.RECORD_ID, A.REQUEST_QTY, A.SALER_ORGID, A.PRODUCT_ID, A.UNIT_PRICE, A.CREATE_DATE,
NULL, A.BUYER_ORGID FROM ORD_ORDER_ITEM A WHERE A.CREATE_DATE >= TO_DATE('2004-01-01 0:0:0', 'YYYY-MM-DD HH24:MI:S
S') AND A.CREATE_DATE < TRUNC(SYSDATE) AND EXISTS (SELECT 1 FROM MAN_PRODUCT WHERE ID = A.PRODUCT_ID) AND EXISTS (SEL
ECT 1 FROM MAN_DEALER WHERE ID = A.SALER_ORGID) AND EXISTS (SELECT 1 FROM MAN_BUYER WHERE ID = A.BUYER_ORGID)
从SQL上看,就是刚才失败的那个SQL语句,那么看看SESSION在等待什么:
SQL> SELECT SID, EVENT, P1TEXT, P1RAW, P2TEXT, P2, SECONDS_IN_WAIT FROM V$SESSION_WAIT
2 WHERE SID = 70;
SID EVENT P1TEXT P1RAW P2TEXT P2 SECONDS_IN_WAIT
------- -------------- -------- ---------------- ------- ----- ---------------
70 latch free address 00000004125AB718 number 98 330
通过观察发现,Session的等待事件一直是LATCH FREE。第一个感觉是可能和其他的进程产生了争用。
IXDBA.NET社区论坛查询一下Oracle等待的具体latch的类型。
SQL> SELECT LATCH#, NAME FROM V$LATCH WHERE LATCH# = 98;
LATCH# NAME
---------- ----------------------------------------------------------------
98 cache buffers chains
而查询V$LOCK和V$LATCHHOLDER视图,发现没有其他的进程对JOB运行构成影响:
SQL> SELECT SID, TYPE, ID1, ID2, LMODE, REQUEST, CTIME, BLOCK
2 FROM V$LOCK
3 WHERE SID > 8;
SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------- -- ---------- ---------- ---------- ---------- ---------- ----------
70 TM 35258 0 3 0 12072 0
70 JQ 0 208 6 0 12155 0
SQL> SELECT * FROM V$LATCHHOLDER;
no rows selected
SQL> SELECT * FROM V$LATCHHOLDER;
no rows selected
SQL> SELECT * FROM V$LATCHHOLDER;
no rows selected
SQL> SELECT * FROM V$LATCHHOLDER;
PID SID LADDR NAME
---------- ---------- ---------------- ------------------------------
15 70 0000000412564D98 cache buffers chains
SQL> SELECT * FROM V$LATCHHOLDER;