Function to extract float from different price patterns

Problem

I’ve got a project where I collect product data and need to extract the prices, which come in different formats.

Some examples would be: US$17, USD17.00, 17,00€, 17€, GBP17, Only 17,-€, 17.000,00€, 17,000.00$ etc.

I started with one specific string-to-float function and kept adding code for specific use cases.
I’m sure the code looks horrible, and I can already see ways to improve it, but I wanted to get your opinions first.

import re


def convertPriceIntoFloat ( myString ):
    myString = myString.strip()

    # 1.298,90 €
    if "€" in myString and "." in myString and "," in myString:
        myString = (myString.replace('€', '')).strip()
        myString = (myString.replace('.', '')).strip()
        float_price = float(myString.replace(',', '.'))
        return(float_price)
    if "€" in myString and "*" in myString and "ab" in myString:
        myString = (myString.replace('€', '')).strip()
        myString = (myString.replace('*', '')).strip()
        myString = (myString.replace('ab', '')).strip()
        float_price = float(myString.replace(',', '.'))
        return(float_price)
    if "€" in myString and "ab" in myString:
        myString = (myString.replace('€', '')).strip()
        myString = (myString.replace('ab', '')).strip()
        if re.match(r'^\d{1,3}\.\d{3},\d{2}$', myString) is not None:
            # thousand EURO or more
            myString = (myString.replace('.', '')).strip()
            float_price = float(myString.replace(',', '.'))
        else:    
            float_price = float(myString.replace(',', '.'))
        return(float_price)

    # 599,- €
    if ",-" in myString and "€" in myString:
        myString = (myString.replace('€', '')).strip()
        myString = (myString.replace(',-', '.00')).strip()
        if re.match(r'^\d{1,3}\.\d{3},\d{2}$', myString) is not None:
            # thousand EURO or more
            myString = (myString.replace('.', '')).strip()
            float_price = float(myString.replace(',', '.'))
        else:    
            float_price = float(myString.replace(',', '.'))
        return(float_price)

    # ↵179,89 €↵*↵
    if "€" in myString and "*" in myString:
        myString = (myString.replace('€', '')).strip()
        myString = (myString.replace('*', '')).strip()
        if re.match(r'^\d{1,3}\.\d{3},\d{2}$', myString) is not None:
            # thousand EURO or more
            myString = (myString.replace('.', '')).strip()
            float_price = float(myString.replace(',', '.'))
        else:    
            float_price = float(myString.replace(',', '.'))
        return(float_price)

    # ab 223,90 EUR
    if "EUR" in myString and "ab" in myString: 
        myString = (myString.replace('EUR', '')).strip()
        myString = (myString.replace('ab', '')).strip()
        if re.match(r'^\d{1,3}\.\d{3},\d{2}$', myString) is not None:
            # thousand EURO or more
            myString = (myString.replace('.', '')).strip()
            float_price = float(myString.replace(',', '.'))
        else:    
            float_price = float(myString.replace(',', '.'))
        return(float_price)

    if "EUR" in myString: 
        # GB Pound
        myString = (myString.replace('EUR', '')).strip()
        if re.match(r'^\d{1,3}\.\d{3},\d{2}$', myString) is not None:
            # thousand EURO or more
            myString = (myString.replace('.', '')).strip()
            float_price = float(myString.replace(',', '.'))
        else:    
            float_price = float(myString.replace(',', '.'))
        return(float_price)

    if "CHF" in myString: 
        # CHF Schweiz
        myString = (myString.replace('CHF', '')).strip()
        if re.match(r'^\d{1,3}\.\d{3},\d{2}$', myString) is not None:
            # thousand Franks or more
            myString = (myString.replace('.', '')).strip()
            float_price = float(myString.replace(',', '.'))
        else:    
            float_price = float(myString.replace(',', '.'))
        return(float_price)

    if re.match(r'^\d{1,3}\.\d{3},\d{2}$', myString) is not None:
        # thousand EURO or more, coming in as a float already
        myString = (myString.replace('.', '')).strip()
        float_price = float(myString.replace(',', '.'))
        return(float_price)

    # 122,60 £
    if "£" in myString: 
        # remove GB Pound sign
        myString = (myString.replace('£', '')).strip()

        if re.match(r'^\d{1,3}\.\d{3},\d{2}$', myString) is not None:
            # thousand GB Pounds or more
            myString = (myString.replace('.', '')).strip()
            float_price = float(myString.replace(',', '.'))
        # 122,60 £
        if re.match(r'^\d{1,3},\d{2}$', myString) is not None:
            # 
            myString = (myString.replace('.', '')).strip()
            float_price = float(myString.replace(',', '.'))
        return(float_price)
    if  "$" in myString: 
        # GB Pound
        myString = (myString.replace('$', '')).strip()
        float_price = float(myString.replace(',', ''))
        return(float_price)
    if ",-" in myString: 
        float_price = float(myString.replace(',-', '.00'))
        return(float_price)
    if re.match(r'^\d{1,3},\d{2}$', myString) is not None:
        float_price = float(myString.replace(',', '.'))
        return(float_price)
    if " " in myString and "&#8364" in myString:
        return ( getPriceFromCommaString ( myString ) )
    # UVP: 44,95 EURO
    if "UVP:" in myString and "EURO" in myString:
        myString = (myString.replace('UVP:', '')).strip()
        myString = (myString.replace('EURO', '')).strip()
        float_price = float(myString.replace(',', '.'))
        return(float_price)
    # 22,99 €
    # € 1.199,99
    if "€" in myString:
        myString = (myString.replace('€', '')).strip()
        if re.match('^d{1,3}.d{3},d{2}$', myString) is not None:
            # thousand EURO or more
            myString = (myString.replace('.', '')).strip()
            float_price = float(myString.replace(',', '.'))
        else:    
            float_price = float(myString.replace(',', '.'))
        return(float_price)
    else:
        return(myString)

If anybody knows a Python library that does the same thing, I’d be happy to flip as well.

Solution

I agree that this is a little complicated.

I’d recommend you write a set of tests, and let those guide the complexity of the code. The tests would be simple, like,

assert convertPriceIntoFloat("1298,90") == 1298.9
assert convertPriceIntoFloat("1.298,90") == 1298.9
assert convertPriceIntoFloat("1.298,90 €") == 1298.9
...

Then, start out with a simple float conversion in your code, and see if that works, then add test cases and only add code as you need it. If things do seem to be getting overly complicated, refactor… you’ll have tests that let you do that easily.
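As a rough illustration of that workflow, the sketch below keeps the test cases in a table and checks a deliberately tiny converter against them. The names convert_price, CASES and failing_cases are made up for this example, and the converter assumes European-style input ('.' for thousands, ',' for decimals), which is all the three cases above require.

```python
import re


def convert_price(text):
    """Hypothetical minimal converter: only as much logic as the cases need."""
    number = re.sub(r"[^\d.,]", "", text)    # drop currency symbols and spaces
    number = number.replace(".", "")          # assumption: '.' groups thousands
    return float(number.replace(",", "."))    # assumption: ',' is the decimal mark


# Each case pairs a raw input with the float it should produce.
CASES = [
    ("1298,90", 1298.9),
    ("1.298,90", 1298.9),
    ("1.298,90 €", 1298.9),
]


def failing_cases(convert=convert_price):
    """Return (input, expected, actual) for every case that does not pass."""
    results = [(raw, expected, convert(raw)) for raw, expected in CASES]
    return [r for r in results if r[1] != r[2]]
```

When a new price format turns up, add a row to CASES first, watch it appear in failing_cases(), and only then extend the converter.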

Good luck.

  • To my untrained eye, it looks like a simple regex would ease the problem.

    ^(.*?)([\d.,]+)(.*)$
    

    It works because it splits each example into a prefix, the number, and a suffix:

    >>> pprint([re.match(r'^(.*?)([\d.,]+)(.*)$', i).groups() for i in ('US$17', 'USD17.00', '17,00€', '17€', 'GBP17', 'Only 17,-€', '17.000,00€', '17,000.00$')])
    [('US$', '17', ''),
     ('USD', '17.00', ''),
     ('', '17,00', '€'),
     ('', '17', '€'),
     ('GBP', '17', ''),
     ('Only ', '17,', '-€'),
     ('', '17.000,00', '€'),
     ('', '17,000.00', '$')]
    
  • Now that we have the number, all that is left is to convert it to a float.

    Because the numbers contain thousands separators, you can’t just call float on them. If you pass the thousands separator and the decimal mark to a helper function and use str.translate, you can convert the string into a form that float accepts.

import re


def _extract_price(value):
    match = re.match(r'^(.*?)([\d.,]+)(.*)$', value)
    if match is None:
        raise ValueError("Can't extract price")
    return match.groups()


def _parse_price(price, thousand, decimal):
    trans = str.maketrans(decimal, '.', thousand)
    return float(price.translate(trans))


def parse_price(value):
    prefix, price, suffix = _extract_price(value)
    if '€' in prefix + suffix:
        thousand = '.'
        decimal = ','
    else:
        thousand = ','
        decimal = '.'
    return _parse_price(price, thousand, decimal)
>>> [parse_price(i) for i in ('US$17', 'USD17.00', '17,00€', '17€', 'GBP17', 'Only 17,-€', '17.000,00€', '17,000.00$')]
[17.0, 17.0, 17.0, 17.0, 17.0, 17.0, 17000.0, 17000.0]
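One limitation of choosing the separators by currency symbol is that an input such as "122,60 £" from the question contains no €, so the comma is treated as a thousands separator and the result is 12260.0. A possible alternative, sketched below, is to guess the separators from the number itself. The function name parse_number and the tie-breaking rule (a lone separator followed by exactly three digits counts as a thousands separator) are assumptions made for this example, not part of the answer above.

```python
def parse_number(price):
    """Convert a bare number such as '17.000,00' to float by guessing separators."""
    price = price.strip(".,")  # '17,' (left over from '17,-') becomes '17'
    last_dot, last_comma = price.rfind("."), price.rfind(",")
    if last_dot != -1 and last_comma != -1:
        # Both separators present: whichever comes last is the decimal mark.
        decimal = "." if last_dot > last_comma else ","
    elif last_dot == -1 and last_comma == -1:
        decimal = None  # plain integer, nothing to translate
    else:
        sep = "." if last_dot != -1 else ","
        digits_after = len(price) - price.rfind(sep) - 1
        # Assumption: a repeated separator, or one followed by exactly
        # three digits, groups thousands; otherwise it is the decimal mark.
        grouping = price.count(sep) > 1 or digits_after == 3
        decimal = None if grouping else sep
    if decimal is None:
        table = str.maketrans("", "", ".,")  # delete every separator
    else:
        thousand = "," if decimal == "." else "."
        table = str.maketrans(decimal, ".", thousand)
    return float(price.translate(table))
```

It is meant to receive the middle group returned by _extract_price. An input like "1.298" stays ambiguous, and this sketch reads it as 1298.0, so you may still want to fall back on the currency when you know it.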

In reply to “If anybody knows a Python library that does the same thing, I’d be happy to flip as well”:

I suggest you use Money Parser. Its description reads:

Money Parser is a price and currency parsing utility.

It provides methods to extract price and currency information from the
raw string.

There is a lot of different price and currency formats that present
values with separators, spacing, etc.

This library may help you to parse such data.

Here are some examples of what it can do –

>>> price_str("1.298,90 €")
'1298.90'

>>> price_str("599,- €")
'599'

>>> price_str("↵179,89 €↵*↵")
'179.89'

>>> price_str("ab 223,90 EUR")
'223.90'

>>> price_str("122,60 £")
'122.60'

>>> price_str("UVP: 44,95 EURO")
'44.95'

>>> price_str("22,99 €")
'22.99'

>>> price_str(None, default='0')
'0'

>>> price_str("€ 1.199,99")
'1199.99'
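Judging by the examples above, price_str returns the price as a string, so the final conversion is left to you. The sketch below uses string literals copied from those outputs rather than calling the library, and shows both float, which the question asks for, and decimal.Decimal, which may be preferable if the prices are later summed or compared exactly. The variable names are made up for this example.

```python
from decimal import Decimal

# Strings copied from the example outputs above; no library call is made here.
parsed = ["1298.90", "599", "179.89"]

# float is what the question asks for.
as_floats = [float(p) for p in parsed]

# Decimal keeps the digits exactly as written, which avoids binary
# rounding surprises when amounts are added up.
as_decimals = [Decimal(p) for p in parsed]
total = sum(as_decimals)
```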

NOTES –

If you have Python 3.4 or later, open a command prompt and install the Money Parser module with: pip install money-parser

Then, in IDLE or any other Python shell, import the function: from money_parser import price_str

Try one of the examples above to confirm that it gives the results you want.

Hope this helps!
