Loading...

musicbrainz-devel@musicbrainz.org

[Prev] Thread [Next]  |  [Prev] Date [Next]

Re: [mb-devel] Track mod search speed improvement Björn Krombholz Sat Aug 12 16:26:49 2006

On 2/14/06, Dave Evans <[EMAIL PROTECTED]> wrote:
> On Tue, Feb 14, 2006 at 06:39:24AM +0100, Björn Krombholz wrote:
> > It looks complicated but is necessary, the reason why it is so fscking
> > slow is a missing index.
> There is already an index (both for _open and _closed) on rowid.  It should be
> using that, right?

It does not, that's the main problem. And actually this index doesn't
make much sense. Because one is never interested in a single rowid
scan, because rowid is a mixture of ids of the normal entity tables,
meaning it can refer to a track.id or an album.id or even an
albumjoin.id.

You can only use a rowid in a useful way, if you know the
corresponding entity type = the .tab.

>  Well I'm no PG optimisation expert so I don't know why
> it's not using it ('cos it looks fairly obvious to me).  However this simple
> test might interest you.
>
> Total "explain" cost before:
>
>  Sort  (cost=205210.51..205316.02 rows=42206 width=551)
>
> mb=> select id from albumjoin where track = 5374651;
>    id
> ---------
>  5374612
> (1 row)
>
> Now replace "IN (select id from albumjoin where track = 5374651)" by "IN 
> (5374612)":

This doesn't really work, because the SELECT id FROM albumjoin could
return 20 or more results, although moving this nested query out and
adding the resulting array manually would increase the speed.

> New cost:
>
>  Sort  (cost=479.25..479.37 rows=50 width=551)

One part of the UNION is:

SELECT m.*, NOW()>m.expiretime AS expired, -3 AS vote
FROM moderation_closed m
WHERE
( (m.tab IN ('track', 'Track') AND m.rowid = 5374651)
  OR
  (m.tab IN ('albumjoin', 'AlbumJoin')
   AND
   m.rowid IN (
       SELECT id FROM albumjoin WHERE track = 5374651
   )
  )
)

This seems to be really too complex for pg's query planner. It doesn't
use the rowid index, because the planner can't translate the IN
(SELECT ...) to a simple filter list with (ORs and ANDs).

Instead it is sequentially reading the complete moderation_closed
table and comparing each row to the WHERE clause.

It does not split the "(m.tab IN ('track', 'Track') AND m.rowid =
5374651)" case where it could use the rowid index from the "(m.tab IN
('albumjoin', 'AlbumJoin') AND m.rowid IN (...))" case (where it would
use the rowid as well), because the two ORed conditions "look"
differently and the query planner analyzation has to stop at some
nesting level.

It doesn't know how many elements the might be in the "IN (SELECT
...)", but it does know this in your suggestion, where it can (and
does) simply translate the "m.rowid IN (5374612)" into "m.rowid =
5374612", so the 2 ORed clauses are identical again. It then simply
translates the boolean filter formula into it's disjunctive normal
form: "((m.tab = 'track') AND (m.rowid = 5374651)) OR ((m.tab =
'Track') AND ..." which couldn't be done if the number of results of
the nested "SELECT" in the "IN" operator is unknown at query
compilation time.

> If it's possible to achieve the desired effect without adding another index
> (when, as I've said, the existing index *should* be enough), then I'd rather
> not.

It's less adding "yet another index" than it would be a _replacing_
the current moderation_(closed|open)_idx_rowid indexes by combined
moderation_(closed|open)_idx_rowid_tab indexes, because as I tried to
explain above, the rowid is never used alone but always in conjunction
with an object type == .tab value.


And now I'm off for a week.
Bye
#Fuchs

_______________________________________________
MusicBrainz-devel mailing list
[EMAIL PROTECTED]
http://lists.musicbrainz.org/mailman/listinfo/musicbrainz-devel