Problem
I’ve got a project where I receive product data and need to extract the prices, which come in many different formats.
Some examples would be: US$17, USD17.00, 17,00€, 17€, GBP17, Only 17,-€, 17.000,00€, 17,000.00$ etc.
I started with one specific string-to-float function and kept adding code for each new case.
I’m sure the code looks horrible, and I can already see ways to improve it, but I wanted to get your opinions first.
import re

def convertPriceIntoFloat ( myString ):
    myString = myString.strip()
    # 1.298,90 €
    if "€" in myString and "." in myString and "," in myString:
        myString = (myString.replace('€', '')).strip()
        myString = (myString.replace('.', '')).strip()
        float_price = float(myString.replace(',', '.'))
        return(float_price)
    if "€" in myString and "*" in myString and "ab" in myString:
        myString = (myString.replace('€', '')).strip()
        myString = (myString.replace('*', '')).strip()
        myString = (myString.replace('ab', '')).strip()
        float_price = float(myString.replace(',', '.'))
        return(float_price)
    if "€" in myString and "ab" in myString:
        myString = (myString.replace('€', '')).strip()
        myString = (myString.replace('ab', '')).strip()
        if re.match(r'^\d{1,3}\.\d{3},\d{2}$', myString) is not None:
            # thousand EURO or more
            myString = (myString.replace('.', '')).strip()
            float_price = float(myString.replace(',', '.'))
        else:
            float_price = float(myString.replace(',', '.'))
        return(float_price)
    # 599,- €
    if ",-" in myString and "€" in myString:
        myString = (myString.replace('€', '')).strip()
        myString = (myString.replace(',-', '.00')).strip()
        if re.match(r'^\d{1,3}\.\d{3},\d{2}$', myString) is not None:
            # thousand EURO or more
            myString = (myString.replace('.', '')).strip()
            float_price = float(myString.replace(',', '.'))
        else:
            float_price = float(myString.replace(',', '.'))
        return(float_price)
    # ↵179,89 €↵*↵
    if "€" in myString and "*" in myString:
        myString = (myString.replace('€', '')).strip()
        myString = (myString.replace('*', '')).strip()
        if re.match(r'^\d{1,3}\.\d{3},\d{2}$', myString) is not None:
            # thousand EURO or more
            myString = (myString.replace('.', '')).strip()
            float_price = float(myString.replace(',', '.'))
        else:
            float_price = float(myString.replace(',', '.'))
        return(float_price)
    # ab 223,90 EUR
    if "EUR" in myString and "ab" in myString:
        myString = (myString.replace('EUR', '')).strip()
        myString = (myString.replace('ab', '')).strip()
        if re.match(r'^\d{1,3}\.\d{3},\d{2}$', myString) is not None:
            # thousand EURO or more
            myString = (myString.replace('.', '')).strip()
            float_price = float(myString.replace(',', '.'))
        else:
            float_price = float(myString.replace(',', '.'))
        return(float_price)
    if "EUR" in myString:
        # Euro
        myString = (myString.replace('EUR', '')).strip()
        if re.match(r'^\d{1,3}\.\d{3},\d{2}$', myString) is not None:
            # thousand EURO or more
            myString = (myString.replace('.', '')).strip()
            float_price = float(myString.replace(',', '.'))
        else:
            float_price = float(myString.replace(',', '.'))
        return(float_price)
    if "CHF" in myString:
        # CHF (Switzerland)
        myString = (myString.replace('CHF', '')).strip()
        if re.match(r'^\d{1,3}\.\d{3},\d{2}$', myString) is not None:
            # thousand francs or more
            myString = (myString.replace('.', '')).strip()
            float_price = float(myString.replace(',', '.'))
        else:
            float_price = float(myString.replace(',', '.'))
        return(float_price)
    if re.match(r'^\d{1,3}\.\d{3},\d{2}$', myString) is not None:
        # thousand EURO or more, coming in as a float already
        myString = (myString.replace('.', '')).strip()
        float_price = float(myString.replace(',', '.'))
        return(float_price)
    # 122,60 £
    if "£" in myString:
        # remove GB Pound sign
        myString = (myString.replace('£', '')).strip()
        if re.match(r'^\d{1,3}\.\d{3},\d{2}$', myString) is not None:
            # thousand GB Pounds or more
            myString = (myString.replace('.', '')).strip()
            float_price = float(myString.replace(',', '.'))
        # 122,60 £
        if re.match(r'^\d{1,3},\d{2}$', myString) is not None:
            #
            myString = (myString.replace('.', '')).strip()
            float_price = float(myString.replace(',', '.'))
        return(float_price)
    if "$" in myString:
        # US Dollar
        myString = (myString.replace('$', '')).strip()
        float_price = float(myString.replace(',', ''))
        return(float_price)
    if ",-" in myString:
        float_price = float(myString.replace(',-', '.00'))
        return(float_price)
    if re.match(r'^\d{1,3},\d{2}$', myString) is not None:
        float_price = float(myString.replace(',', '.'))
        return(float_price)
    if " " in myString and "€" in myString:
        # getPriceFromCommaString is defined elsewhere in the project (not shown)
        return ( getPriceFromCommaString ( myString ) )
    # UVP: 44,95 EURO
    if "UVP:" in myString and "EURO" in myString:
        myString = (myString.replace('UVP:', '')).strip()
        myString = (myString.replace('EURO', '')).strip()
        float_price = float(myString.replace(',', '.'))
        return(float_price)
    # 22,99 €
    # € 1.199,99
    if "€" in myString:
        myString = (myString.replace('€', '')).strip()
        if re.match(r'^\d{1,3}\.\d{3},\d{2}$', myString) is not None:
            # thousand EURO or more
            myString = (myString.replace('.', '')).strip()
            float_price = float(myString.replace(',', '.'))
        else:
            float_price = float(myString.replace(',', '.'))
        return(float_price)
    else:
        return(myString)
If anybody knows a Python library that does the same thing, I’d be happy to switch to it as well.
Solution
I agree that this is a little complicated.
I’d recommend you write a set of tests and let those guide the complexity of the code. The tests can be simple, for example:
assert convertPriceIntoFloat("1298,90") == 1298.9
assert convertPriceIntoFloat("1.298,90") == 1298.9
assert convertPriceIntoFloat("1.298,90 €") == 1298.9
...
Then start out with a simple float conversion in your code and see if that works; then add test cases, and only add code as you need it. If things do seem to be getting overly complicated, refactor: you’ll have tests that let you do that easily.
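As a sketch of that test-first loop, here is what the first step might look like with the standard `unittest` module. The function `convert_price_into_float` is a hypothetical minimal version, not the asker’s code: it assumes European formatting ('.' groups thousands, ',' marks decimals) and an optional euro sign, which is just enough to pass the three tests above.

```python
import unittest


def convert_price_into_float(text):
    """Hypothetical first version: only handles what the tests demand.

    Assumes '.' is a thousands separator, ',' is the decimal mark,
    and the only extra character is an optional euro sign.
    """
    cleaned = text.replace("€", "").strip()
    # Drop thousands separators, then turn the decimal comma into a point.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return float(cleaned)


class ConvertPriceTests(unittest.TestCase):
    def test_plain_comma_decimal(self):
        self.assertEqual(convert_price_into_float("1298,90"), 1298.9)

    def test_thousands_separator(self):
        self.assertEqual(convert_price_into_float("1.298,90"), 1298.9)

    def test_euro_suffix(self):
        self.assertEqual(convert_price_into_float("1.298,90 €"), 1298.9)
```

Running `python -m unittest` on the file executes the three tests. The next failing case you add (say, a `$` price with a ',' thousands separator) is what tells you the function needs another rule.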
Good luck.
-
From my untrained eye it looks like a simple regex would help ease the problem.
^(.*?)([\d.,]+)(.*)$
This works because it produces the following output:
>>> import re
>>> from pprint import pprint
>>> pprint([re.match(r'^(.*?)([\d.,]+)(.*)$', i).groups() for i in ('US$17', 'USD17.00', '17,00€', '17€', 'GBP17', 'Only 17,-€', '17.000,00€', '17,000.00$')])
[('US$', '17', ''),
 ('USD', '17.00', ''),
 ('', '17,00', '€'),
 ('', '17', '€'),
 ('GBP', '17', ''),
 ('Only ', '17,', '-€'),
 ('', '17.000,00', '€'),
 ('', '17,000.00', '$')]
-
Now that we have the money, all that is left is to convert it to a float.
Since the prices contain thousands separators, you can’t just use float. But if you pass the ‘thousand separator’ and the ‘decimal place’ to the function and use str.translate, you can convert the string into the form float expects.
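To show what the translation table does on its own: `str.maketrans(x, y, z)` maps each character of `x` to the character at the same position in `y`, and deletes every character in `z`. The two table names below are only for illustration.

```python
# European style: ',' is the decimal mark, '.' groups thousands.
european = str.maketrans(",", ".", ".")  # ',' -> '.', delete '.'
# US/UK style: '.' is the decimal mark, ',' groups thousands.
american = str.maketrans(".", ".", ",")  # '.' stays '.', delete ','

print("17.000,00".translate(european))
print("17,000.00".translate(american))
print(float("1.298,90".translate(european)))
```

Both of the first two calls produce the string `17000.00`, which `float` can parse directly; the last line prints `1298.9`.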
import re

def _extract_price(value):
    match = re.match(r'^(.*?)([\d.,]+)(.*)$', value)
    if match is None:
        raise ValueError("Can't extract price")
    return match.groups()

def _parse_price(price, thousand, decimal):
    trans = str.maketrans(decimal, '.', thousand)
    return float(price.translate(trans))

def parse_price(value):
    prefix, price, suffix = _extract_price(value)
    if '€' in prefix + suffix:
        thousand = '.'
        decimal = ','
    else:
        thousand = ','
        decimal = '.'
    return _parse_price(price, thousand, decimal)
>>> [parse_price(i) for i in ('US$17', 'USD17.00', '17,00€', '17€', 'GBP17', 'Only 17,-€', '17.000,00€', '17,000.00$')]
[17.0, 17.0, 17.0, 17.0, 17.0, 17.0, 17000.0, 17000.0]
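One caveat: `parse_price` only switches to the European convention when it sees '€', so other strings from the question, such as '122,60 £' or 'ab 223,90 EUR', are read with ',' as a thousands separator and come out as 12260.0 and 22390.0. A possible alternative, which is my own assumption rather than part of the approach above, is to infer the decimal mark from the shape of the number: if the last ',' or '.' is followed by one or two digits, treat it as the decimal mark; otherwise treat every separator as a thousands separator. `parse_price_by_shape` is a hypothetical name.

```python
import re

# Hypothetical helper: the first digit followed by digits/separators.
_NUMBER = re.compile(r"\d[\d.,]*")


def parse_price_by_shape(value):
    match = _NUMBER.search(value)
    if match is None:
        raise ValueError(f"Can't extract price from {value!r}")
    # A trailing separator, as in '17,-€' or '599,- €', carries no fraction.
    number = match.group().rstrip(".,")
    last_sep = max(number.rfind(","), number.rfind("."))
    if last_sep == -1:
        return float(number)
    fraction = number[last_sep + 1:]
    if len(fraction) in (1, 2):
        # Assumption: 1-2 trailing digits mean this separator is the decimal mark.
        integer = re.sub(r"[.,]", "", number[:last_sep])
        return float(f"{integer}.{fraction}")
    # Otherwise (e.g. '17.000') every separator groups thousands.
    return float(re.sub(r"[.,]", "", number))
```

The heuristic fails on prices written with three decimal places, and it does not report the currency at all, so treat it as a starting point for your own test cases rather than a complete rule.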
-
Quoting the question: “If anybody knows a Python library that does the same thing, I’d be happy to flip as well.”
I suggest you use Money Parser, a “price and currency parsing utility”. It provides methods to extract price and currency information from a raw string. There are many different price and currency formats that present values with separators, spacing, etc., and this library may help you parse such data.
Here are some examples of what it can do:
>>> price_str("1.298,90 €")
'1298.90'
>>> price_str("599,- €")
'599'
>>> price_str("↵179,89 €↵*↵")
'179.89'
>>> price_str("ab 223,90 EUR")
'223.90'
>>> price_str("122,60 £")
'122.60'
>>> price_str("UVP: 44,95 EURO")
'44.95'
>>> price_str("22,99 €")
'22.99'
>>> price_str(None, default='0')
'0'
>>> price_str("€ 1.199,99")
'1199.99'
Notes:
1. Open a command prompt and, if you have Python 3.4 or later, install the Money Parser module with pip install money-parser.
2. Open Python IDLE and import the function with from money_parser import price_str.
3. Try out an example from above to confirm that you get the results you want.
Hope this helps!