Create a function in Python to check for the data errors

Posted on

Problem

I’m trying to create a function to check multiple conditions of data.
The function will return a list if the data is errors.

Here is my idea:

error_lists = []
def check(A, B, C, D, E):
    check condition 1 for A
    if there is a error -> error_list.append(1)
    check condition 2 for B
    if there is a error -> error_list.append(2)
    check condition 3 for C
    if there is a error -> error_list.append(3)
    check condition 4 for D
    if there is a error -> error_list.append(4)
    check condition 5 for E
    if there is a error -> error_list.append(5)
return error_list

If A, B, C do not meet the requirement, the function will return the error_list: [1,2,3]. Likewise, the function will return [2,5] if only B or E do not meet the requirement

Actually I’ve already written the code, however I feel that it need to be improved.

P_list = [35000, 50000, 80000, 120000, 150000, 180,000]
def data_check(Staff_id, A ,B, x, y, z, t, u, D):
    error_list = []
    wh = x + y
    tot = z + t
    C = [x, y, z, t, u]
    """First condition"""
    if A not in P_list:
        print(f"Error type 1 in the data of {staff_id} {A} is not True")
        error_list.append(1)
    """Second condition"""
    if B >= 48:
        print(f"Error type 2 in the data of {staff_id} {B} is not True")
        error_list.append(2)
    """Third condition"""
    for k in C:
        if k < 0:
            print(f"Error type 3 in the data of {staff_id} {k} is less than 0")
            error_list.append(3)
    """Fourth condition"""
    if D >= 3000000:
        print(f"Error type 4 in the data of {staff_id} D is not true")
        error_list.append(4)
    """Fifth condition"""
    if tot >= wh:
        print(f"Error type 5 in the data of {staff_id} E is not true")
        error_list.append(5)
    return print(error_list)

data_check("1a88s2s", 50000 ,71, 23, 28, 3,5, 12, 3500000)

Does anybody has any suggestion ?

Many thanks

Solution

  • P_list should be a set, not a list, based on its usage; it’s only used for membership checks. You can also pre-bind to the __contains__ member to get a function reference you can call, constraining the set to one purpose.
  • Add PEP484 type hints
  • C should be a tuple based on its usage since it’s immutable
  • Your print statements should not be mixed in with your logic, and should be separated to another method
  • Consider representing your errors not as free integers, but as constrained enumeration values
  • Use an any generator for your C negative check
  • 180,000 is likely incorrect and should be 180_000
  • return print is meaningless and will always return None; instead consider returning your error values or yielding them as a generator
  • staff_id shouldn’t be passed into data_check at all; it’s only used for formatting error messages which again can exist in a separate method
  • Your letter-salad variable names are helping no one. You need to spell out what these actually mean.
  • Moving your intermediate error-checking variables C, wh and tot closer to where they’re used is, I think, clearer

Suggested

import enum
from enum import Enum
from typing import Iterable

in_p_list = frozenset((35_000, 50_000, 80_000, 120_000, 150_000, 180_000)).__contains__


@enum.unique
class ErrorType(Enum):
    REJECTED_A = 1
    B_TOO_HIGH = 2
    NEGATIVE_C = 3
    D_TOO_HIGH = 4
    TOTAL_TOO_HIGH = 5


def data_check(
    A: int,
    B: int,
    x: int,
    y: int,
    z: int,
    t: int,
    u: int,
    D: int,
) -> Iterable[ErrorType]:
    if not in_p_list(A):
        yield ErrorType.REJECTED_A

    if B >= 48:
        yield ErrorType.B_TOO_HIGH

    C = (x, y, z, t, u)
    if any(k < 0 for k in C):
        yield ErrorType.NEGATIVE_C

    if D >= 3_000_000:
        yield ErrorType.D_TOO_HIGH

    wh = x + y
    tot = z + t
    if tot >= wh:
        yield ErrorType.TOTAL_TOO_HIGH


def print_errors(staff_id: str, errors: Iterable[ErrorType]) -> None:
    for error in errors:
        print(
            f'Error type {error.value} in the data of {staff_id}: {error.name}'
        )


def test() -> None:
    print_errors("1a88s2s", data_check(50000, 71, 23, 28, 3, 5, 12, 3500000))


if __name__ == '__main__':
    test()

Output

Error type 2 in the data of 1a88s2s: B_TOO_HIGH
Error type 4 in the data of 1a88s2s: D_TOO_HIGH

Adhere to PEP8

Like keep the style of identifiers consistent. data_check, Staff_id, A ,B, x, y, z, t, u, D – what style is common for all those names?

You can pass a list as an argument

No need to write so many arguments and gather them in a list – you can gather them when you call your function and pass only one c argument. The call will look like

data_check(staff_id, A, B,  [x, y, z, t, u], D)

Also you can gather all trailing arguments in a list with * (the call whouln’t change).

Use the loop

All the conditions are simple (one expression), all the messages are almost uniform. Thus, you can use the loop like this:

def data_check(staff_id, *args):
    conditions = [lambda x: x not in P_list,
                  lambda x: x>=48, #are you trying to compare with '0'?
                  lambda x: any(i<0 for i in x),
                  ...
                 ]
    result = []
    for i, (condition, arg) in enumerate(zip(conditions, args), 1):
        if not condition(arg):
            result.append(i)
            printf(f"Error type {i} in the data of {staff_id}: {arg} failed")
    return result

You can change lambdas to local functions; this can make the code more readable (with correct names):

def data_check(staff_id, *args):
    def not_in_p_list(x):
        return x not in P_list
    def greater_then_48(x):
        return x>48
    def any_less_then_0(x):
        return any(i<0 for i in x)
    conditions = [not_in_p_list,
                  greater_then_48,
                  any_less_then_0,
                  ...
                 ]

Of course, you can (and should) give conditions more informative names, like not_listed_in_registry or negative_weight.

Leave a Reply

Your email address will not be published. Required fields are marked *