Python functions to fetch spreadsheets through SFTP and from Dropbox [closed]

Problem

I have written two functions: the first reads files from SFTP and the second reads files from Dropbox.

Both functions share similar code, such as validating the extension and reading the file into a DataFrame. That code could be extracted into a separate function, and I would like suggestions on how to do it.

Both functions are shown below:

class SftpHelper:
    def fetch_file_from_sftp(self, file_name, sheet_name=0):
      valid_extensions = ['csv', 'xls', 'xlsx']
      extension = file_name.split('.')[-1]
      sftp, transport = self.connect_to_sftp()
      remote_path = self.remote_dir + file_name
      data = io.BytesIO()
      sftp.getfo(remote_path, data, callback=None)
      if extension == 'csv':
          file_df = pd.read_csv(io.BytesIO(data.getvalue()))
      else:
          file_df = pd.read_excel(io.BytesIO(data.getvalue()), sheet_name=sheet_name)
      self.close_sftp_connection(sftp, transport)
      return file_df

class DropBoxHelper:
    def read_file_from_dropbox(self, file_name, sheet_name=0):
      valid_extensions = ['csv', 'xls', 'xlsx']
      extension = file_name.split('.')[-1]
      dbx = self.connect_to_dropbox()
      metadata,data=dbx.files_download(file_name)
      if extension == 'csv':
          file_df = pd.read_csv(io.BytesIO(data.content))
      else:
          file_df = pd.read_excel((io.BytesIO(data.content)), sheet_name=sheet_name)
      return file_df

Can anyone please help me extract the common logic into a separate function and then use it in my two functions?

Solution

Since you are already using classes here, you could derive from a base class that holds the shared behaviour and delegate the connection-specific behaviour to the derived classes:

import io
from os.path import splitext

import pandas as pd


class _RemoteHelper:
    def file_reader(self, file_name, sheet_name=0):
        # splitext() keeps the leading dot, e.g. 'report.csv' -> '.csv'
        _, extension = splitext(file_name)
        # Each subclass returns the raw bytes of the remote file
        data = self._internal_file_reader(file_name)

        if extension == '.csv':
            return pd.read_csv(io.BytesIO(data))
        else:
            return pd.read_excel(io.BytesIO(data), sheet_name=sheet_name)


class SftpHelper(_RemoteHelper):
    def _internal_file_reader(self, file_name):
        data = io.BytesIO()
        sftp, transport = self.connect_to_sftp()
        sftp.getfo(self.remote_dir + file_name, data, callback=None)
        self.close_sftp_connection(sftp, transport)
        return data.getvalue()


class DropBoxHelper(_RemoteHelper):
    def _internal_file_reader(self, file_name):
        dbx = self.connect_to_dropbox()
        _, data = dbx.files_download(file_name)
        return data.content

This has the neat advantage of harmonizing the interfaces across both classes: callers use file_reader() regardless of where the file comes from.
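
One practical consequence of a shared interface is that a test double can stand in for either remote source. The following is a minimal sketch and not part of the original answer: RemoteHelper, read_bytes and InMemoryHelper are hypothetical names, and the pandas parsing step is left out so that the example runs with the standard library alone. It also assumes that making the hook abstract is acceptable, so that a subclass which forgets to implement it fails early.

```python
import io
from abc import ABC, abstractmethod


class RemoteHelper(ABC):
    """Shared entry point; subclasses only supply the raw bytes."""

    def read_bytes(self, file_name):
        # Same public method for every source; wraps the bytes in a buffer
        # that a parser such as pd.read_csv could consume.
        return io.BytesIO(self._internal_file_reader(file_name))

    @abstractmethod
    def _internal_file_reader(self, file_name):
        """Return the content of file_name as bytes."""


class InMemoryHelper(RemoteHelper):
    """Hypothetical stand-in for an SFTP or Dropbox helper, for tests."""

    def __init__(self, files):
        self._files = files  # maps file name -> bytes

    def _internal_file_reader(self, file_name):
        return self._files[file_name]


helper = InMemoryHelper({'report.csv': b'a,b\n1,2\n'})
buffer = helper.read_bytes('report.csv')
```

Because the hook is declared abstract, instantiating RemoteHelper directly, or a subclass that does not define _internal_file_reader, raises a TypeError instead of failing later at call time.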

From looking at the code, some things seem off:

  1. valid_extensions is defined, but not used
  2. connect_to_sftp(), self.remote_dir, self.close_sftp_connection() and a few other methods/fields are used but never defined in the snippet, and the imports that provide io and pd are missing
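
Regarding the first point, here is a minimal sketch of how the list of valid extensions could actually be enforced. The names VALID_EXTENSIONS and validated_extension are hypothetical, and the choice to compare case-insensitively and raise ValueError is an assumption rather than something the question specifies.

```python
from os.path import splitext

# Extensions include the leading dot because that is what splitext() returns.
VALID_EXTENSIONS = {'.csv', '.xls', '.xlsx'}


def validated_extension(file_name):
    """Return the lower-cased extension of file_name, or raise ValueError."""
    _, extension = splitext(file_name)
    extension = extension.lower()
    if extension not in VALID_EXTENSIONS:
        raise ValueError(f'Unsupported file type: {file_name!r}')
    return extension
```

Calling this before opening a connection would reject an unsupported file without a network round trip.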

That being said, the core problem is addressed by creating a parent class which both your classes can inherit from. It’d look something like this:

import io

import pandas as pd


class FileHelper:
    def parse_fetched_file(self, file_name, content, sheet_name):
        # content is the raw bytes of the file, whatever its source
        extension = file_name.split('.')[-1]
        if extension == 'csv':
            return pd.read_csv(io.BytesIO(content))
        return pd.read_excel(io.BytesIO(content), sheet_name=sheet_name)


class SftpHelper(FileHelper):
    def fetch_file_from_sftp(self, file_name, sheet_name=0):
        sftp, transport = self.connect_to_sftp()
        remote_path = self.remote_dir + file_name
        data = io.BytesIO()
        sftp.getfo(remote_path, data, callback=None)
        # The SFTP download fills a BytesIO buffer, so pass its bytes
        file_df = super(SftpHelper, self).parse_fetched_file(file_name, data.getvalue(), sheet_name)
        self.close_sftp_connection(sftp, transport)
        return file_df


class DropBoxHelper(FileHelper):
    def read_file_from_dropbox(self, file_name, sheet_name=0):
        dbx = self.connect_to_dropbox()
        metadata, data = dbx.files_download(file_name)
        # The Dropbox download returns a response whose bytes are in .content
        return super(DropBoxHelper, self).parse_fetched_file(file_name, data.content, sheet_name)

I’m not 100% sure that it’s the most efficient syntax, but it gets the job done.
