Analysis of multiple game scores

Posted on

Problem

I am trying to create a list based on some data, but the code I am using is very slow when I run it on large data. So I suspect I am not using all of the Python power for this task. Is there a more efficient and faster way of doing this in Python?

Here an explanantion of the code:

You can think of this problem as a list of games each with a list of participating teams and the scores for each team in the game. For each of the pairs in the current game it calculates the sum of the differences in score from the previous competitions (only for those competing!). Then it updates each pair in the current game with the difference in scores. Then it keeps track of the scores for each pair in each game and update this score as each game is played.

In the example below, based on some data, there are for-loops used to create a new variable list_zz.

The data and the for-loop code:

from collections import Counter, defaultdict
from itertools import combinations
import math

# test data
games = [['A', 'B'], ['B'], ['A', 'B', 'C', 'D', 'E'], ['B'], ['A', 'B', 'C'], ['A'], ['B', 'C'], ['A', 'B'], ['C', 'A', 'B'], ['A'], ['B', 'C']]

gamescores = [[1.0, 5.0], [3.0], [2.0, 7.0, 3.0, 1.0, 6.0], [3.0], [5.0, 2.0, 3.0], [1.0], [9.0, 3.0], [2.0, 7.0], [3.0, 6.0, 8.0], [2.0], [7.0, 9.0]]

list_zz= []

wd = defaultdict(Counter)
past_diffs = defaultdict(float)
this_diff = defaultdict(Counter)

for players, scores in zip(games, gamescores):
    if len(players) == 1:
        list_zz.append(math.nan)
        continue
        
    past_diffs.clear()
    this_diff.clear()
    
    for (player1, score1), (player2, score2) in combinations(zip(players, scores), 2):
        past_diffs[player1] += wd[player1][player2]
        past_diffs[player2] += wd[player2][player1]
        
        this_diff[player1][player2] = score1 - score2
        this_diff[player2][player1] = score2 - score1
        
    list_zz.extend(past_diffs[p] for p in players)
    
    for player in players:
        wd[player].update(this_diff[player])
        
print(list_zz)

Which looks like this:

[0.0,
 0.0,
 nan,
 -4.0,
 4.0,
 0.0,
 0.0,
 0.0,
 nan,
 -10.0,
 13.0,
 -3.0,
 nan,
 3.0,
 -3.0,
 -6.0,
 6.0,
 -10.0,
 -10.0,
 20.0,
 nan,
 14.0,
 -14.0]

Example to understand the code:
In the 5th game where A, B and C play, A gets -4 from the 1st game, 0 from the 2nd, -6 from the 3rd, and 0 from the 4th. Note that only A, B and C count in the 5th game. To be more clear A scores -4 in the 1st game, in the second he doesnt play so he scores 0, in the 3rd we count only the results for its competitors B and C which gives -6, and in the 4th he doesnt play so he gets 0. Notices that results are from the past games against current competitors.

If you could elaborate on the code to make it more efficient and execute faster, I would really appreciate it.

Solution

Solve with math

This is a math problem. Let say we have a competition: [a, b, c] score [5, 2, 10], this means that the scoring is:

abcaNaN35b3NaN8c58NaNres21113abcresaNaN352b3NaN811c58NaN13

As you should be able to see, you don’t need to calculate the sum again and again for each pair.

Solution:
For each team:
team’s score×number of teamstotal scoreteam’s score×number of teamstotal score.

score[a] =  5 * 3 - 17 =  -2
score[b] =  2 * 3 - 17 = -11
score[c] = 10 * 3 - 17 =  13

The time complexity of this is O(n)O(n). Calculate all pairs is O(n2)O(n2).

Some code

Here I will save the total score for each team (not the history of the scores _ the code change for that would not be big tho).

from collections import Counter, defaultdict

# test data
games = [['A', 'B'], ['B'], ['A', 'B', 'C', 'D', 'E'], ['B'], ['A', 'B', 'C'], ['A'], ['B', 'C'], ['A', 'B'],
         ['C', 'A', 'B'], ['A'], ['B', 'C']]

gamescores = [[1.0, 5.0], [3.0], [2.0, 7.0, 3.0, 1.0, 6.0], [3.0], [5.0, 2.0, 3.0], [1.0], [9.0, 3.0], [2.0, 7.0],
              [3.0, 6.0, 8.0], [2.0], [7.0, 9.0]]

wd = defaultdict(float)

for players, scores in zip(games, gamescores):
    if len(players) == 1:
        continue
    total_sum = sum(scores)
    for player, score in zip(players, scores):
        wd[player] = wd[player] + score * len(scores) - total_sum

print(wd)

Result

defaultdict(<class 'float'>, {'A': -12.0, 'B': 32.0, 'C': -17.0, 'D': -14.0, 'E': 11.0})

Edit: Grouping based on last results

OP clarified that each competition affect the total score taking from the previous competition because the grouping changes.

In the example, scores: [1.0, 5.0], [3.0], [2.0, 7.0, 3.0, 1.0, 6.0], [3.0], [5.0, 2.0, 3.0], [1.0], [9.0, 3.0], teams: [a,b], [b], [a,b,c,d,e], [b], [a,b,c],

A scores as:

1st game: -4
2nd game:  0 (he was not participant)
3rd game: -6 (because at 5th game, only A, B, C are competing)

For that, we can pre-process the groups to make sure that only the competitors of next game are considered.

Pre-processing idea

That is just an example how to solve the issue using the pre-processing. Notice that is a backward thinking. Next game determines which competitors matter in terms of scoring. Therefore, the processing is done in reverse order.

def pre_process(games, gamescores):
    last_game = {}
    result = []
    for game in zip(reversed(games), reversed(gamescores)):
        game_dict = dict(zip(game[0], game[1]))
        if len(game[0]) == 1:
            result.append(game_dict)
            continue
        if len(last_game) != 0:
            union_set = set(game_dict.keys()).intersection(set(last_game.keys()))
            last_game = game_dict
            game_dict = {k: game_dict[k] for k in union_set}
        else:
            last_game = game_dict
        result.append(game_dict)
    return result


pairs = list(reversed(pre_process(games, gamescores)))
wd = defaultdict(float)
for game in pairs:
    players = list(game.keys())
    scores = [game[k] for k in players]
    if len(players) == 1:
        continue

    total_sum = sum(scores)
    for player, score in zip(players, scores):
        wd[player] += score * len(scores) - total_sum
    print(wd)

Leave a Reply

Your email address will not be published. Required fields are marked *