🚅Search before asking
I have searched for issues similar to this one.
🚅Description
In the `transformer_opt` module, add an option to the transform method that writes the `np.ndarray` data output by the module to disk in npz format.
Compared with writing the entire output as a single csv file, this can save disk space. Because the `transformer_opt` module already processes the csv file in batches, writing one npz file per batch spares other modules from re-batching the data later. Per-batch files are also more convenient for parallel processing.
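A minimal sketch of the per-batch idea, assuming each batch is a float `np.ndarray`. The helper names `save_batches_npz` and `load_batches_npz`, the `chunk_00000.npz` style file naming, and the array key `data` are illustrative assumptions and not part of sdgx. The sketch uses `np.savez_compressed`; plain `np.savez` stores arrays uncompressed.

```python
import glob
import os
import tempfile

import numpy as np


def save_batches_npz(batches, output_dir):
    """Write each batch to its own compressed .npz file (hypothetical helper)."""
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for i, batch in enumerate(batches):
        # Zero-padded index keeps lexicographic order equal to write order.
        path = os.path.join(output_dir, f"chunk_{i:05d}.npz")
        np.savez_compressed(path, data=batch)
        paths.append(path)
    return paths


def load_batches_npz(output_dir):
    """Yield batches in write order; each file can also be loaded on its own."""
    for path in sorted(glob.glob(os.path.join(output_dir, "chunk_*.npz"))):
        with np.load(path) as npz:
            yield npz["data"]


if __name__ == "__main__":
    batches = [np.arange(6, dtype=float).reshape(3, 2), np.ones((2, 2))]
    with tempfile.TemporaryDirectory() as tmp:
        save_batches_npz(batches, tmp)
        restored = np.concatenate(list(load_batches_npz(tmp)), axis=0)
        print(restored.shape)
```

Because every file is self-contained, a downstream module could hand individual chunk paths to separate workers instead of re-reading and re-splitting one large csv.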
🏕Solution
Changes for this issue should be made in `sdgx/transform/transformer_opt.py`.
Find the `_synchronous_transform` method in class `DataTransformer` and add an `output_type` parameter that determines the storage type.
🍰Detail
For implementation details, please refer to the comments in the following code block:
```python
# ISSUE DESCRIPTION add the parameter `output_type` to
def _synchronous_transform(self, input_data_path,
                           column_transform_info_list,
                           output_path,
                           output_type):  # ISSUE DESCRIPTION new args
    """Method Description ... """
    loop = True
    # has_write_header = True
    # use iterator = True
    reader = pd.read_csv(input_data_path, iterator=True, chunksize=1000000)
    while loop:
        # Existing Code ...
        # ISSUE DESCRIPTION Some codes are omitted due to space reasons
        # ISSUE DESCRIPTION Add your code here
        chunk_array = np.concatenate(column_data_list, axis=1).astype(float)
        # file object
        f = open(output_path, 'a')
        np.savetxt(f, chunk_array, fmt="%g", delimiter=',')
        f.close()
    # end while
```
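One way the `output_type` branch could look, offered as a sketch and not as the actual sdgx implementation. The helper `write_chunk`, the `chunk_index` argument, the accepted values `"csv"` and `"npz"`, and the `_00000.npz` style suffix are assumptions made for illustration. The csv branch mirrors the existing `np.savetxt` append.

```python
import os
import tempfile

import numpy as np


def write_chunk(chunk_array, output_path, output_type, chunk_index):
    """Persist one transformed chunk (hypothetical helper, not sdgx API)."""
    if output_type == "csv":
        # Same behavior as the existing code: append rows to one csv file.
        with open(output_path, "a") as f:
            np.savetxt(f, chunk_array, fmt="%g", delimiter=",")
        return output_path
    if output_type == "npz":
        # np.savez_compressed overwrites its target, so each chunk
        # gets its own file derived from output_path.
        base, _ = os.path.splitext(output_path)
        chunk_path = f"{base}_{chunk_index:05d}.npz"
        np.savez_compressed(chunk_path, data=chunk_array)
        return chunk_path
    raise ValueError(f"unsupported output_type: {output_type!r}")


if __name__ == "__main__":
    chunk = np.array([[1.0, 2.5], [3.0, 4.0]])
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "transformed.csv")
        write_chunk(chunk, out, "csv", 0)
        npz_path = write_chunk(chunk, out, "npz", 0)
        with np.load(npz_path) as npz:
            print(np.array_equal(npz["data"], chunk))
```

Inside the `while` loop, a call like this would take the place of the `open` / `np.savetxt` / `close` lines, with a counter incremented once per chunk to supply `chunk_index`.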
🍰Example
TBD