Problem
I have written two functions: the first reads files from SFTP and the second reads files from Dropbox.
Both functions share some similar code, such as validating the extension and reading/saving the file, which could be extracted into a separate function. I would like suggestions on how to do this.
Here are both functions:
class SftpHelper:
    def fetch_file_from_sftp(self, file_name, sheet_name=0):
        valid_extensions = ['csv', 'xls', 'xlsx']
        extension = file_name.split('.')[-1]
        sftp, transport = self.connect_to_sftp()
        remote_path = self.remote_dir + file_name
        data = io.BytesIO()
        sftp.getfo(remote_path, data, callback=None)
        if extension == 'csv':
            file_df = pd.read_csv(io.BytesIO(data.getvalue()))
        else:
            file_df = pd.read_excel(io.BytesIO(data.getvalue()), sheet_name=sheet_name)
        self.close_sftp_connection(sftp, transport)
        return file_df

class DropBoxHelper:
    def read_file_from_dropbox(self, file_name, sheet_name=0):
        valid_extensions = ['csv', 'xls', 'xlsx']
        extension = file_name.split('.')[-1]
        dbx = self.connect_to_dropbox()
        metadata, data = dbx.files_download(file_name)
        if extension == 'csv':
            file_df = pd.read_csv(io.BytesIO(data.content))
        else:
            file_df = pd.read_excel(io.BytesIO(data.content), sheet_name=sheet_name)
        return file_df
Can anyone please help me extract the common logic into a separate function and then use it in my two functions?
Solution
Since you are already using classes here, you could derive both from a base class that holds the shared behaviour and delegate the connection-specific behaviour to the derived classes:
import io
from os.path import splitext

import pandas as pd


class _RemoteHelper:
    def file_reader(self, file_name, sheet_name=0):
        # splitext keeps the leading dot, e.g. 'report.csv' -> '.csv'
        _, extension = splitext(file_name)
        # Each subclass returns the raw file contents as bytes
        data = self._internal_file_reader(file_name)
        if extension == '.csv':
            return pd.read_csv(io.BytesIO(data))
        else:
            return pd.read_excel(io.BytesIO(data), sheet_name=sheet_name)


class SftpHelper(_RemoteHelper):
    def _internal_file_reader(self, file_name):
        data = io.BytesIO()
        sftp, transport = self.connect_to_sftp()
        sftp.getfo(self.remote_dir + file_name, data, callback=None)
        self.close_sftp_connection(sftp, transport)
        return data.getvalue()


class DropBoxHelper(_RemoteHelper):
    def _internal_file_reader(self, file_name):
        dbx = self.connect_to_dropbox()
        _, data = dbx.files_download(file_name)
        return data.content
This has the neat advantage of harmonizing the interface across both classes.
From looking at the code, some things seem off:
- valid_extensions is defined, but not used
- connect_to_sftp(), self.remote_dir, io.BytesIO(), sftp.getfo(), pd, self.close_sftp_connection() and a bunch of other functions/fields are not defined
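As a sketch of how valid_extensions could actually be put to use, the helper below normalizes the extension and rejects anything outside the list. The name get_validated_extension and the choice to raise ValueError are assumptions for illustration, not something the original code does.

```python
from os.path import splitext

VALID_EXTENSIONS = ('csv', 'xls', 'xlsx')


def get_validated_extension(file_name):
    # Hypothetical helper: returns the lowercase extension without the dot,
    # or raises ValueError if it is not in VALID_EXTENSIONS.
    _, extension = splitext(file_name)
    extension = extension.lstrip('.').lower()
    if extension not in VALID_EXTENSIONS:
        raise ValueError(f'Unsupported file extension: {extension!r}')
    return extension


print(get_validated_extension('report.CSV'))    # csv
print(get_validated_extension('data.v2.xlsx'))  # xlsx
```

Doing this check before connecting means an unsupported file is rejected without opening a connection at all.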
That being said, the core problem is addressed by creating a parent class which both your classes can inherit from. It’d look something like this:
import io

import pandas as pd


class FileHelper:
    def parse_fetched_file(self, file_name, data, sheet_name):
        # data is the raw file contents as bytes
        valid_extensions = ['csv', 'xls', 'xlsx']
        extension = file_name.split('.')[-1]
        if extension not in valid_extensions:
            raise ValueError('Unsupported file extension: ' + extension)
        if extension == 'csv':
            return pd.read_csv(io.BytesIO(data))
        return pd.read_excel(io.BytesIO(data), sheet_name=sheet_name)


class SftpHelper(FileHelper):
    def fetch_file_from_sftp(self, file_name, sheet_name=0):
        sftp, transport = self.connect_to_sftp()
        remote_path = self.remote_dir + file_name
        data = io.BytesIO()
        sftp.getfo(remote_path, data, callback=None)
        # Inherited method, so it can be called directly on self
        file_df = self.parse_fetched_file(file_name, data.getvalue(), sheet_name)
        self.close_sftp_connection(sftp, transport)
        return file_df


class DropBoxHelper(FileHelper):
    def read_file_from_dropbox(self, file_name, sheet_name=0):
        dbx = self.connect_to_dropbox()
        metadata, data = dbx.files_download(file_name)
        return self.parse_fetched_file(file_name, data.content, sheet_name)
I’m not 100% sure that it’s the most efficient syntax, but it gets the job done.
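One detail worth tightening in the SFTP method: if the download or the parsing raises, close_sftp_connection() is never reached. A minimal sketch of one way to guarantee cleanup is to wrap connect/close in a context manager. FakeSftpHelper and sftp_session are hypothetical names; the fake class only records calls so the example runs without a server, and in the real class the two methods would be your existing connect_to_sftp() and close_sftp_connection().

```python
from contextlib import contextmanager


class FakeSftpHelper:
    # Hypothetical stand-in: records calls instead of opening a real connection.
    def __init__(self):
        self.events = []

    def connect_to_sftp(self):
        self.events.append('connect')
        return 'sftp', 'transport'

    def close_sftp_connection(self, sftp, transport):
        self.events.append('close')

    @contextmanager
    def sftp_session(self):
        sftp, transport = self.connect_to_sftp()
        try:
            yield sftp
        finally:
            # Runs even if the body of the with-block raises.
            self.close_sftp_connection(sftp, transport)


helper = FakeSftpHelper()
try:
    with helper.sftp_session() as sftp:
        raise RuntimeError('simulated download/parse failure')
except RuntimeError:
    pass

print(helper.events)  # ['connect', 'close']
```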