Friday, 3 June 2016

[firebird-support] Re: Where's my logic flawed?

 

03.06.2016 11:27, Svein Erling Tysvær wrote:
> This query:
> WITH TMP( TJENESTEENHETRESHID, TJENESTEENHETLOKAL, IMPORT_DATO ) AS
> ( SELECT TJENESTEENHETRESHID, TJENESTEENHETLOKAL, MAX( IMPORT_DATO )
> FROM NPR
> WHERE TJENESTEENHETRESHID > 0
> AND TJENESTEENHETLOKAL > ''
> GROUP BY 1, 2 ),
> TMP2( TJENESTEENHETRESHID, TJENESTEENHETLOKAL ) AS
> ( SELECT TJENESTEENHETRESHID, TJENESTEENHETLOKAL
> FROM TMP T
> WHERE NOT EXISTS( SELECT * FROM TMP T2
> WHERE T.TJENESTEENHETRESHID = T2.TJENESTEENHETRESHID
> AND T.IMPORT_DATO < T2.IMPORT_DATO ) )
> SELECT TJENESTEENHETRESHID, LIST( TJENESTEENHETLOKAL ) TJENESTEENHETLOKAL
> FROM TMP2
> GROUP BY 1
> HAVING COUNT( DISTINCT TJENESTEENHETLOKAL ) = 1
>
> generates this plan:
> PLAN SORT ((TMP2 T2 NPR NATURAL))
> PLAN SORT ((TMP2 T NPR NATURAL))
>
> NPR is a medium sized table (I'm guessing 20 million rows), but the
> result of the first CTE is only 2200 rows. None of the fields in this
> particular query are indexed. I tried to run the query, but decided to
> terminate it after about 2 hours 40 minutes.
> If I run the first CTE separately
>
> ( SELECT TJENESTEENHETRESHID, TJENESTEENHETLOKAL, MAX( IMPORT_DATO )
> FROM NPR
> WHERE TJENESTEENHETRESHID > 0
> AND TJENESTEENHETLOKAL > ''
> GROUP BY 1, 2 )
>
> it takes just over 16 minutes. Putting the result of this CTE in a
> temporary table and running the rest of the query against this temporary
> table only takes a couple of seconds (reducing the 2200 rows to 1600).
> The way I normally read plans seems to fail me this time - I don't quite
> understand why NPR is mentioned twice in the plan (I assume it has to do
> with my "two-level" GROUP BYs). Can anyone explain why? Or better: Does
> anyone understand why the query takes so long to execute?

It's slow because T2 is accessed naturally (i.e. a fullscan) for the
every record in T. NOT EXISTS is kinda nested loop join (anti-join,
actually), it's good only if the subquery is indexed. Otherwise, it
takes 16 minutes + (2200 * 16 minutes) to execute.

The question is why T2 does not use an index inside CTE. Condition
T.IMPORT_DATO < T2.IMPORT_DATO cannot use an index because it's
evaluated against an aggregate: IMPORT_DATO is actually
MAX(NPR.IMPORT_DATO). However, condition T.TJENESTEENHETRESHID =
T2.TJENESTEENHETRESHID should use an index on NPR.TJENESTEENHETRESHID,
if available. But I doubt it exists, otherwise the local condition
TJENESTEENHETRESHID > 0 would also be indexed in this case.

So the only way to run this query fast is to ensure
NPR.TJENESTEENHETRESHID is indexed (and it's the first segment, if the
index is compound).

Dmitry

__._,_.___

Posted by: Dmitry Yemanov <dimitr@users.sourceforge.net>
Reply via web post Reply to sender Reply to group Start a New Topic Messages in this topic (2)

Have you tried the highest rated email app?
With 4.5 stars in iTunes, the Yahoo Mail app is the highest rated email app on the market. What are you waiting for? Now you can access all your inboxes (Gmail, Outlook, AOL and more) in one place. Never delete an email again with 1000GB of free cloud storage.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Visit http://www.firebirdsql.org and click the Documentation item
on the main (top) menu.  Try FAQ and other links from the left-side menu there.

Also search the knowledgebases at http://www.ibphoenix.com/resources/documents/

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

.

__,_._,___

No comments:

Post a Comment