# Create a function in Python to check for the data errors

Posted on

Problem

I’m trying to create a function to check multiple conditions of data.
The function will return a list if the data is errors.

Here is my idea:

``````error_lists = []
def check(A, B, C, D, E):
check condition 1 for A
if there is a error -> error_list.append(1)
check condition 2 for B
if there is a error -> error_list.append(2)
check condition 3 for C
if there is a error -> error_list.append(3)
check condition 4 for D
if there is a error -> error_list.append(4)
check condition 5 for E
if there is a error -> error_list.append(5)
return error_list
``````

If A, B, C do not meet the requirement, the function will return the error_list: [1,2,3]. Likewise, the function will return [2,5] if only B or E do not meet the requirement

Actually I’ve already written the code, however I feel that it need to be improved.

``````P_list = [35000, 50000, 80000, 120000, 150000, 180,000]
def data_check(Staff_id, A ,B, x, y, z, t, u, D):
error_list = []
wh = x + y
tot = z + t
C = [x, y, z, t, u]
"""First condition"""
if A not in P_list:
print(f"Error type 1 in the data of {staff_id} {A} is not True")
error_list.append(1)
"""Second condition"""
if B >= 48:
print(f"Error type 2 in the data of {staff_id} {B} is not True")
error_list.append(2)
"""Third condition"""
for k in C:
if k < 0:
print(f"Error type 3 in the data of {staff_id} {k} is less than 0")
error_list.append(3)
"""Fourth condition"""
if D >= 3000000:
print(f"Error type 4 in the data of {staff_id} D is not true")
error_list.append(4)
"""Fifth condition"""
if tot >= wh:
print(f"Error type 5 in the data of {staff_id} E is not true")
error_list.append(5)
return print(error_list)

data_check("1a88s2s", 50000 ,71, 23, 28, 3,5, 12, 3500000)

``````

Does anybody has any suggestion ?

Many thanks

Solution

• `P_list` should be a set, not a list, based on its usage; it’s only used for membership checks. You can also pre-bind to the `__contains__` member to get a function reference you can call, constraining the set to one purpose.
• `C` should be a tuple based on its usage since it’s immutable
• Your `print` statements should not be mixed in with your logic, and should be separated to another method
• Consider representing your errors not as free integers, but as constrained enumeration values
• Use an `any` generator for your `C` negative check
• `180,000` is likely incorrect and should be `180_000`
• `return print` is meaningless and will always return `None`; instead consider returning your error values or yielding them as a generator
• `staff_id` shouldn’t be passed into `data_check` at all; it’s only used for formatting error messages which again can exist in a separate method
• Your letter-salad variable names are helping no one. You need to spell out what these actually mean.
• Moving your intermediate error-checking variables `C`, `wh` and `tot` closer to where they’re used is, I think, clearer

## Suggested

``````import enum
from enum import Enum
from typing import Iterable

in_p_list = frozenset((35_000, 50_000, 80_000, 120_000, 150_000, 180_000)).__contains__

@enum.unique
class ErrorType(Enum):
REJECTED_A = 1
B_TOO_HIGH = 2
NEGATIVE_C = 3
D_TOO_HIGH = 4
TOTAL_TOO_HIGH = 5

def data_check(
A: int,
B: int,
x: int,
y: int,
z: int,
t: int,
u: int,
D: int,
) -> Iterable[ErrorType]:
if not in_p_list(A):
yield ErrorType.REJECTED_A

if B >= 48:
yield ErrorType.B_TOO_HIGH

C = (x, y, z, t, u)
if any(k < 0 for k in C):
yield ErrorType.NEGATIVE_C

if D >= 3_000_000:
yield ErrorType.D_TOO_HIGH

wh = x + y
tot = z + t
if tot >= wh:
yield ErrorType.TOTAL_TOO_HIGH

def print_errors(staff_id: str, errors: Iterable[ErrorType]) -> None:
for error in errors:
print(
f'Error type {error.value} in the data of {staff_id}: {error.name}'
)

def test() -> None:
print_errors("1a88s2s", data_check(50000, 71, 23, 28, 3, 5, 12, 3500000))

if __name__ == '__main__':
test()
``````

## Output

``````Error type 2 in the data of 1a88s2s: B_TOO_HIGH
Error type 4 in the data of 1a88s2s: D_TOO_HIGH
``````

Like keep the style of identifiers consistent. `data_check, Staff_id, A ,B, x, y, z, t, u, D` – what style is common for all those names?

### You can pass a list as an argument

No need to write so many arguments and gather them in a list – you can gather them when you call your function and pass only one `c` argument. The call will look like

``````data_check(staff_id, A, B,  [x, y, z, t, u], D)
``````

Also you can gather all trailing arguments in a list with * (the call whouln’t change).

### Use the loop

All the conditions are simple (one expression), all the messages are almost uniform. Thus, you can use the loop like this:

``````def data_check(staff_id, *args):
conditions = [lambda x: x not in P_list,
lambda x: x>=48, #are you trying to compare with '0'?
lambda x: any(i<0 for i in x),
...
]
result = []
for i, (condition, arg) in enumerate(zip(conditions, args), 1):
if not condition(arg):
result.append(i)
printf(f"Error type {i} in the data of {staff_id}: {arg} failed")
return result
``````

You can change lambdas to local functions; this can make the code more readable (with correct names):

``````def data_check(staff_id, *args):
def not_in_p_list(x):
return x not in P_list
def greater_then_48(x):
return x>48
def any_less_then_0(x):
return any(i<0 for i in x)
conditions = [not_in_p_list,
greater_then_48,
any_less_then_0,
...
]
``````

Of course, you can (and should) give conditions more informative names, like `not_listed_in_registry` or `negative_weight`.