I have a 'processing' function and a 'serializing' function. Currently the processor returns 4 different types of data structures to be serialized in different ways.
Looking for the best practice on how to handle this.
def process(input):
    ...
    return a, b, c, d
def serialize(a, b, c):
    ...
    # Different serialization pattern for each of a, b and c.
a, b, c, d = process(input)
serialize(a, b, c)
go_on_to_do_other_things(d)
That feels janky.
Should I instead use a class where a, b, c, d are member variables?
class VeryImportantDataProcessor:
    def process(self, input):
        self.a = ...
        self.b = ...
        ...

    def serialize(self):
        s3.write(self.a)
        convoluted_serialize(self.b)
        ...
vipd = VeryImportantDataProcessor()
vipd.process(input)
vipd.serialize()
Keen to hear your thoughts on what is best here!
Note that after processing and serializing, the code goes on to use variable d for further unrelated shenanigans. I'm not sure whether that changes anything.
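For what it's worth, a third shape I've been toying with is grouping the serialize-only values into a small dataclass and keeping d as a separate return value, since it lives a different life afterwards. This is just a sketch: ProcessedOutputs, the stand-in arithmetic and the prints are all placeholders, not the real pipeline code.

from dataclasses import dataclass
from typing import Any


@dataclass
class ProcessedOutputs:
    # The three values whose only job is to be serialized.
    a: Any
    b: Any
    c: Any


def process(raw):
    # Stand-in transformations; the real ones are the Spark-style manipulation.
    a, b, c, d = raw + 1, raw + 2, raw + 3, raw + 4
    return ProcessedOutputs(a, b, c), d


def serialize(outputs: ProcessedOutputs):
    # Each field would still get its own serialization pattern.
    print("writing a:", outputs.a)
    print("writing b:", outputs.b)
    print("writing c:", outputs.c)


outputs, d = process(10)
serialize(outputs)
# d carries on to the unrelated follow-up work.
print("continuing with d:", d)

The idea being that the values with a shared fate travel together, while d stays a plain return value.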
Do a, b and c get used outside of process and serialize? Or is the "point" of this code to return d, with serialization of some values as a side effect, and a, b and c migrated to the API by necessity of implementation rather than by design?

a, b, and c are processed products of a raw data stream, serialized for other live APIs to pull down for use in their different tasks. The process function here is essentially the SQL-like data manipulation in Spark; after this stage we're done with Spark processing. d is another related subset of the data, but it goes on to additional steps (ML model training).