1. Install Python: If you haven't already, download and install the latest version of Python from the official website: https://www.python.org/downloads/

  2. Install pandas: Open a terminal (Command Prompt on Windows or Terminal on macOS/Linux) and run the following command to install the pandas library, which the script depends on:

    pip install pandas
  3. Save the script: Copy the provided Python script and save it as a .py file, for example, ConvertArray2MultipleRows.py

  4. Prepare the input CSV file: Make sure your input CSV file is formatted correctly, with array values enclosed in double quotes. For example:

    name,age,colors
    Akash,25,"[red,blue,green]"
    Laskshmi,30,"[yellow,purple]"

    Save the CSV file, for example, as input.csv.

  5. Run the script: In the terminal, navigate to the folder where you saved the ConvertArray2MultipleRows.py script using the cd command. For example:

    cd path/to/your/script/folder

    Replace path/to/your/script/folder with the actual path to the folder containing the script.

  6. Execute the script by running the following command in the terminal:

    python ConvertArray2MultipleRows.py
  7. Provide the input file path: The script will prompt you to enter the input CSV file path. Type the path and press Enter. For example:

    Please enter input CSV file name(along with path): input.csv

    Replace input.csv with the actual file path if the file is located in a different folder.

  8. Wait for the conversion: The script then converts the data and displays the number of arrays it found and converted, for example:

    Total number of Arrays Found and Converted = 49841
  9. Provide the output file path: When prompted, enter the name of the CSV file in which the result should be stored. For example:

    Please enter the output CSV file name: Output.csv
  10. Check the output: The script writes the converted data to a new CSV file at the path you specified. Open that file to verify the result.
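
To get a feel for what the output should look like, the sample input from step 4 can be run through a simplified version of the conversion. This is only a sketch: it keeps the CSV text in memory instead of a file, and it strips the brackets and splits on commas directly, which is an assumption that fits this sample rather than the script's own parsing logic.

```python
import io

import pandas as pd

# Sample input from step 4, held in memory instead of in input.csv.
csv_text = (
    "name,age,colors\n"
    'Akash,25,"[red,blue,green]"\n'
    'Laskshmi,30,"[yellow,purple]"\n'
)

# Thanks to the double quotes, each array is read as one text field.
df = pd.read_csv(io.StringIO(csv_text))

# Simplified parsing for this sample: drop the brackets, split on commas.
df["colors"] = df["colors"].str.strip("[]").str.split(",")

# Give every array element its own row, repeating the other columns.
result = df.explode("colors").reset_index(drop=True)

print(result.to_csv(index=False))
# name,age,colors
# Akash,25,red
# Akash,25,blue
# Akash,25,green
# Laskshmi,30,yellow
# Laskshmi,30,purple
```

Each array element ends up on its own row, with the name and age values repeated, so the two input rows become five output rows.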


Python Code (ConvertArray2MultipleRows.py)

    import ast
    import re

    import pandas as pd

    # Global counter to know how many arrays are present
    v_occurances = 0


    # Check whether an entry is an array: a list/tuple, or text of the form [...]
    def is_array(value):
        if isinstance(value, (list, tuple)):
            return True
        return re.fullmatch(r'\[.*\]', str(value).strip()) is not None


    # Find the columns that have arrays as values
    def find_array_columns(df):
        array_columns = []
        for column in df.columns:
            if df[column].apply(is_array).any():
                array_columns.append(column)
        return array_columns


    # Convert an array with two or more elements to a list
    def convert_to_list(value):
        global v_occurances
        if is_array(value) and re.search(r'\[.*?,.*?\]', str(value)):
            v_occurances = v_occurances + 1
            if isinstance(value, (list, tuple)):
                return list(value)
            text = value.strip()
            try:
                return ast.literal_eval(text)
            except (ValueError, SyntaxError):
                # Unquoted items such as [red,blue,green] are not Python
                # literals, so fall back to splitting on commas
                return [item.strip() for item in text[1:-1].split(',')]
        return [value]


    # Split the arrays so that each element gets its own row
    def split_rows_by_array(df, array_columns):
        for column in array_columns:
            df[column] = df[column].apply(convert_to_list)
            df = df.explode(column).reset_index(drop=True)
        return df


    def main():
        # Get the input CSV file path from the user
        file_path = input("Please enter input CSV file name(along with path): ")
        # Read the CSV file
        df = pd.read_csv(file_path, low_memory=False)
        # Find columns containing array values
        array_columns = find_array_columns(df)
        # Split the rows based on the array values in the identified columns
        new_df = split_rows_by_array(df, array_columns)
        print('Total number of Arrays Found and Converted = ', v_occurances)
        # Get the output CSV file path from the user
        output_file_path = input("Please enter the output CSV file name: ")

        # Save the resulting dataframe to the specified output CSV file
        new_df.to_csv(output_file_path, index=False)


    if __name__ == '__main__':
        main()
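
One detail worth knowing about the parsing step: `ast.literal_eval` only accepts Python literals, so it handles `[1,2,3]` or `["a","b"]` but raises `ValueError` on bare words such as `[red,blue,green]`. The sketch below shows one possible way to cope with both forms. The helper name `parse_array_text` is hypothetical and used only for illustration, and it assumes the text is already known to start with `[` and end with `]`.

```python
import ast


def parse_array_text(text):
    """Parse bracketed text such as "[1,2,3]" or "[red,blue]" into a list.

    Hypothetical helper for illustration only. Assumes the text is
    enclosed in square brackets.
    """
    text = text.strip()
    try:
        # Works when every element is a Python literal
        # (numbers, quoted strings, and so on).
        return list(ast.literal_eval(text))
    except (ValueError, SyntaxError):
        # Bare words are not literals, so split the bracket contents on commas.
        return [item.strip() for item in text[1:-1].split(",")]


print(parse_array_text("[1,2,3]"))           # [1, 2, 3]
print(parse_array_text('["a","b"]'))         # ['a', 'b']
print(parse_array_text("[red,blue,green]"))  # ['red', 'blue', 'green']
```

Note that the literal path keeps element types (the first call returns integers), while the fallback always returns strings.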