Is there a way to optimize this aggregate postgreSQL statement by using CASE?

Posted on

Problem

I have a working SQL statement which shows the following:

  • Player
  • Champion
  • Role
  • Total Kill/Death Ratio
  • Total Kill+Assist/Death Ratio
  • Total Win %
  • Difference between Ratio from one month ago
  • Difference between KA Ratio from one month ago
  • Difference between win % from one month ago

However, I don’t think it’s quite as optimized as it could be:

with CTERatio as (

select tt.playerid, tt.champid, tt.roleid, 
    sum(tt.words) as kills, 
    sum(tt.words2) as deaths,
    sum(tt.words3) as assists,
    sum(tt.words)/sum(tt.words2) as ratio,
    (sum(tt.words)+sum(tt.words3))/sum(tt.words2) as kdar,
    avg(tt.complete) as win
from schema.records tt
group by tt.playerid, tt.champid, tt.roleid
having count(*) > 2
order by ratio desc

),

hist as (

select tt.playerid, tt.champid, tt.roleid, 
    sum(tt.words) as kills, 
    sum(tt.words2) as deaths,
    sum(tt.words3) as assists,
    sum(tt.words)/sum(tt.words2) as ratio,
    (sum(tt.words)+sum(tt.words3))/sum(tt.words2) as kdar,
    avg(tt.complete) as win
from schema.records tt
where tt.sys_time < date_trunc('day', NOW() - interval '1 month')
group by tt.playerid, tt.champid, tt.roleid
having count(*) > 2
order by ratio desc

)
select c.playerid as player, c.champid as champion, c.roleid as rolee, 
    round(c.ratio, 2) as ratio, round(c.kdar, 2) as kdar, round(c.win, 2) as win,
    round(c.ratio - h.ratio, 2) as ratiochange,
    round(c.kdar - h.kdar, 2) as kdarchange,
    round(c.win - h.win, 2) as winchange
from CTERatio c, hist h
where c.playerid = h.playerid
and c.champid = h.champid
and c.roleid = h.roleid
order by c.ratio desc

Is it possible I can incorporate using CASE instead of WITH? Such as:

avg(c.kills) - avg
(
case when tt.sys_time <  date_trunc('day', NOW() - interval '1 month') 
then h.words 
end
)

Solution

Naming conventions

I think your aliases are pretty cryptic.

This:

from schema.records tt

That:

from CTERatio c, hist h

It’s better to use aliases that mean something. tt, c, h… think of Mr. Maintainer going back over your code having to make changes to it.

I will compliment you on re-labeling column names words, words2, words3. Those are pretty bad column names and your re-labeling makes them much more meaningful.

Indenting/Formatting

Your CTEs are beautiful in how they are formatted. Your final select statement, not so much. How about this:

select 
    c.playerid as player, 
    c.champid as champion, 
    c.roleid as rolee, 
    round(c.ratio, 2) as ratio, 
    round(c.kdar, 2) as kdar, 
    round(c.win, 2) as win,
    round(c.ratio - h.ratio, 2) as ratiochange,
    round(c.kdar - h.kdar, 2) as kdarchange,
    round(c.win - h.win, 2) as winchange

Performance

This aggregate is expensive, especially used twice:

having count(*) > 2

This means the SQL engine will have to scan the whole table for each column to see if the count is greater than 2. Try to find a unique column to COUNT() on rather than the whole table, like having count(c.playerid) > 2

You have many sum() and round() aggregates, can we cut down on them somehow? It’s hard to give specifics since we don’t know what the data looks like, but try to limit those as much as you can.

Both of your CTEs seem identical. I’m curious as to why you are selecting the same dataset twice. You could do a self join instead.

Here is how I would write this script, bearing in mind I don’t know what the data looks like:

with CTERatio as (
-- calculate ratio
select 
    rec.playerid, 
    rec.champid, 
    rec.roleid, 
    sum(rec.words) as kills, 
    sum(rec.words2) as deaths,
    sum(rec.words3) as assists,
    sum(rec.words)/sum(rec.words2) as ratio,
    (sum(rec.words)+sum(rec.words3))/sum(rec.words2) as kdar,
    avg(rec.complete) as win
from schema.records as rec
group by rec.playerid, rec.champid, rec.roleid
having count(*) > 2
order by ratio desc

),
-- join ratio to itself to find highest ratio
select 
    cte_r1.playerid as player, 
    cte_r1.champid as champion, 
    cte_r1.roleid as rolee, 
    round(cte_r1.ratio, 2) as ratio, 
    round(cte_r1.kdar, 2) as kdar, 
    round(cte_r1.win, 2) as win,
    round(cte_r1.ratio - h.ratio, 2) as ratiochange,
    round(cte_r1.kdar - h.kdar, 2) as kdarchange,
    round(cte_r1.win - h.win, 2) as winchange
from CTERatio as cte_r1
  inner join CTERatio as cte_r2
    on cte_r1.playerid = cte_r2.playerid
    and cte_r1.champid = te_r2.champid
    and cte_r1.roleid = cte_r2.roleid
order by cte_r1.ratio desc
;

Leave a Reply

Your email address will not be published. Required fields are marked *