Install Python: If you haven't already, download and install the latest version of Python from the official website: https://www.python.org/downloads/
Install pandas: Open a terminal (Command Prompt on Windows or Terminal on macOS/Linux) and run the following command to install the pandas library, which the script depends on:
pip install pandas
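To confirm that pandas is visible to the same Python interpreter that will run the script, a small check like the following can help. The helper name pandas_available is only for illustration.

```python
import importlib.util


def pandas_available() -> bool:
    """Return True if the pandas package can be found by this Python interpreter."""
    return importlib.util.find_spec("pandas") is not None


if __name__ == "__main__":
    if pandas_available():
        import pandas as pd
        print("pandas", pd.__version__, "is installed")
    else:
        print("pandas is missing: run  pip install pandas")
```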
Save the script: Copy the Python script given at the end of this guide and save it as a .py file, for example, ConvertArray2MultipleRows.py.
Prepare the input CSV file: Make sure your input CSV file is formatted correctly, with array values enclosed in double quotes so that the commas inside the brackets are not read as column separators. For example:
name,age,colors
John,25,"[red,blue,green]"
Sarah,30,"[yellow,purple]"
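To see why the double quotes matter, here is a minimal sketch that parses the sample above with pandas. It uses io.StringIO as a stand-in for input.csv, so nothing needs to be saved to disk. Each bracketed array arrives as one string in the colors column.

```python
import io

import pandas as pd

# The sample input from above; io.StringIO stands in for input.csv
SAMPLE_CSV = '''name,age,colors
John,25,"[red,blue,green]"
Sarah,30,"[yellow,purple]"
'''

df = pd.read_csv(io.StringIO(SAMPLE_CSV))
# The double quotes keep each bracketed array inside a single "colors" cell
print(df)
print(repr(df.loc[0, "colors"]))  # '[red,blue,green]'
```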
Save the CSV file, for example, as input.csv.
Run the script: In the terminal, use the cd command to navigate to the folder where you saved the ConvertArray2MultipleRows.py script. For example:
cd path/to/your/script/folder
Replace path/to/your/script/folder with the actual path to the folder containing the script. Then execute the script by running the following command (on some macOS and Linux systems, use python3 instead of python):
python ConvertArray2MultipleRows.py
Provide the input and output file paths: The script first prompts you for the input CSV file path. Type the path and press Enter. For example:
Please enter input CSV file name(along with path): input.csv
Replace input.csv with the full path to the file if it is in a different folder. The program then converts the file and displays the number of arrays it found and converted, for example:
Total number of Arrays Found and Converted = 49841
Finally, when prompted, enter the name of the output file to create (including its path, if needed):
Please enter the output CSV file name: Output.csv
Check the output: The script writes the converted data to a new CSV file at the path you entered. Open that file to verify that each array element now appears on its own row, with the values of the other columns repeated.
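For the sample input shown earlier, the expected result is one row per color. The sketch below reproduces that transformation with pandas DataFrame.explode, which is the same method the script uses. The helper to_list is hypothetical and simplified: it assumes unquoted items without nested brackets.

```python
import io

import pandas as pd

SAMPLE_CSV = '''name,age,colors
John,25,"[red,blue,green]"
Sarah,30,"[yellow,purple]"
'''


def to_list(cell: str) -> list:
    """Turn '[a,b,c]' into ['a', 'b', 'c'] (simplified: assumes no nested brackets)."""
    return [item.strip() for item in cell.strip()[1:-1].split(",")]


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
df["colors"] = df["colors"].apply(to_list)
# explode() repeats name and age once for every element of the colors list
result = df.explode("colors").reset_index(drop=True)
print(result.to_csv(index=False))
```

This prints five data rows: John with red, blue and green, followed by Sarah with yellow and purple.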
Python Code (ConvertArray2MultipleRows.py)
import ast
import re

import pandas as pd

# Global counter: how many arrays were found and converted
v_occurrences = 0


# Check whether a cell holds an array, e.g. a list or a string such as "[red,blue]"
def is_array(value):
    return isinstance(value, (list, tuple)) or bool(re.fullmatch(r'\s*\[.*\]\s*', str(value)))


# Find the columns that contain at least one array value
def find_array_columns(df):
    return [column for column in df.columns if df[column].apply(is_array).any()]


# Convert an array cell to a Python list; wrap any other value in a one-item list
def convert_to_list(value):
    global v_occurrences
    if isinstance(value, (list, tuple)):
        v_occurrences += 1
        return list(value)
    if is_array(value) and re.search(r'\[.*?,.*?\]', str(value)):
        v_occurrences += 1
        text = str(value).strip()
        try:
            # Quoted items, e.g. "['red', 'blue']" or "[1, 2]"
            return list(ast.literal_eval(text))
        except (ValueError, SyntaxError):
            # Unquoted items, e.g. "[red,blue,green]": split on the commas instead
            return [item.strip() for item in text[1:-1].split(',')]
    return [value]


# Split the rows so that every array element gets its own row
def split_rows_by_array(df, array_columns):
    # Convert all array columns first, so each original array is counted exactly once
    for column in array_columns:
        df[column] = df[column].apply(convert_to_list)
    # Explode one column at a time; a row with arrays in several columns
    # yields every combination of their elements
    for column in array_columns:
        df = df.explode(column).reset_index(drop=True)
    return df


def main():
    # Get the input CSV file path from the user
    file_path = input("Please enter input CSV file name(along with path): ")
    # Read the CSV file
    df = pd.read_csv(file_path, low_memory=False)
    # Find the columns containing array values
    array_columns = find_array_columns(df)
    # Split the rows based on the array values in those columns
    new_df = split_rows_by_array(df, array_columns)
    print(f'Total number of Arrays Found and Converted = {v_occurrences}')
    # Get the output CSV file path from the user
    output_file_path = input("Please enter the output CSV file name: ")
    # Save the resulting DataFrame to the output CSV file, without the index column
    new_df.to_csv(output_file_path, index=False)


if __name__ == '__main__':
    main()
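Because the script explodes the array columns one after another, a row with arrays in two columns produces every combination of their elements. The small sketch below shows this. The sizes column and its values are hypothetical, and the cells are already converted to lists.

```python
import pandas as pd

# Hypothetical single row with arrays in two columns (already converted to lists)
df = pd.DataFrame({"name": ["John"], "colors": [["red", "blue"]], "sizes": [["S", "M", "L"]]})

# Sequential explode, as in split_rows_by_array: 2 colors x 3 sizes = 6 rows
for column in ["colors", "sizes"]:
    df = df.explode(column).reset_index(drop=True)

print(len(df))  # 6
print(df)
```

If you want to pair elements by position instead of combining them, pandas 1.3 and later can explode several columns at once with df.explode(["colors", "sizes"]). That approach requires the arrays in each row to have equal lengths.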