[Good First Issue]🎆Add npz format output in transform module #33

MooooCat · 2023-10-26T03:02:44Z

🚅Search before asking

I have searched for issues similar to this one.

🚅Description

In the transformer_opt module, add an option to the transform method that writes the module's np.ndarray output to disk in npz format.

Compared with writing the whole output as a single csv file, npz output can save disk space. Because the transformer_opt module already processes the csv file in batches, writing one npz file per batch means other modules will not have to re-batch the data later. Per-batch files are also more convenient for parallel processing.
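To illustrate the point about avoiding repeated batching, here is a minimal sketch of how a downstream module might consume per-batch npz files. The helper name `iter_npz_batches`, the `*.npz` file pattern, and the archive key `data` are assumptions made for this example; none of them are defined in sdgx.

```python
import glob
import os

import numpy as np


def iter_npz_batches(directory, pattern="*.npz", key="data"):
    """Hypothetical helper: yield one array per batch file, in filename order."""
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        # np.load on an .npz file returns a lazy archive; indexing by key
        # reads that array fully into memory.
        with np.load(path) as archive:
            batch = archive[key]
        yield batch
```

Each file can be loaded independently of the others, so separate workers could each take a subset of the files without re-reading or re-chunking the original csv.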

🏕Solution

Changes for this issue should be made in sdgx/transform/transformer_opt.py.

Find the _synchronous_transform method in class DataTransformer and add an output_type parameter that determines the storage format.

🍰Detail

For implementation details, refer to the comments in the following code block:

# imports used by this excerpt
import numpy as np
import pandas as pd

# ISSUE DESCRIPTION add the parameter `output_type` to select the storage format
def _synchronous_transform(self, input_data_path,
                           column_transform_info_list,
                           output_path,
                           output_type):  # ISSUE DESCRIPTION new arg
    """Method Description ... """

    loop = True
    # has_write_header = True
    # iterator=True lets the csv be read chunk by chunk
    reader = pd.read_csv(input_data_path, iterator=True, chunksize=1000000)

    while loop:
        # Existing Code ...
        # ISSUE DESCRIPTION Some code is omitted for brevity

        # ISSUE DESCRIPTION Add your code here
        chunk_array = np.concatenate(column_data_list, axis=1).astype(float)
        # existing behavior: append this chunk to the csv output file
        with open(output_path, 'a') as f:
            np.savetxt(f, chunk_array, fmt="%g", delimiter=',')
    # end while
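As a rough sketch of what the new branch could look like, the helper below writes one batch as either csv or npz. It is only one possible approach, not the project's implementation: the function name `write_chunk`, the `chunk_index` argument, the `_00000.npz` file naming, and the archive key `data` are all assumptions made for illustration. It writes one file per batch because an npz archive cannot be appended to the way the csv file is.

```python
import os

import numpy as np


def write_chunk(chunk_array, output_path, output_type="csv", chunk_index=0):
    """Hypothetical helper: persist one transformed batch and return its path."""
    if output_type == "csv":
        # Same behavior as the existing code: append rows to a single csv file.
        with open(output_path, "a") as f:
            np.savetxt(f, chunk_array, fmt="%g", delimiter=",")
        return output_path
    if output_type == "npz":
        # Assumed naming scheme: <output stem>_<batch index>.npz, one per batch.
        root, _ = os.path.splitext(output_path)
        chunk_path = f"{root}_{chunk_index:05d}.npz"
        np.savez_compressed(chunk_path, data=chunk_array)
        return chunk_path
    raise ValueError(f"unsupported output_type: {output_type!r}")
```

Inside the while loop, a call such as `write_chunk(chunk_array, output_path, output_type, chunk_index)` could replace the open/savetxt lines, with `chunk_index` incremented once per batch.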

🍰Example

TBD

MooooCat added the good first issue, help wanted, and difficulty-easy labels on Oct 26, 2023

MooooCat changed the title ~~🎆 Add npz format output in transform module~~ [Good First Issue]🎆Add npz format output in transform module Oct 26, 2023