AWS Glue Fatal exception com.amazonaws.services.glue.readers unable to parse file data.csv

Error in AWS Glue: Fatal exception com.amazonaws.services.glue.readers unable to parse file data.csv 

Resolution: This error occurs when your CSV file either is not UTF-8 encoded, or is UTF-8 encoded but still contains special Unicode characters. The second case typically happens when you convert an Excel workbook to CSV with a plain right-click "Save As" to CSV. To see these special Unicode characters, open the file in Notepad++ and scan it.
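You can also do this check from a script instead of scanning the file in Notepad++. The following Python sketch lists every byte that is not valid UTF-8 and every non-ASCII character, together with its line number and position. The function name `find_suspect_characters` is hypothetical. Treating every non-ASCII character as suspect is an assumption: legitimate accented text will be reported too, so review the output before changing anything.

```python
import unicodedata
from pathlib import Path


def find_suspect_characters(path):
    """Return (line_no, position, description) tuples for a file.

    Flags bytes that are not valid UTF-8 and every non-ASCII character
    (accented letters, non-breaking spaces, curly quotes, a byte order mark).
    Positions are 1-based: a byte offset for invalid bytes, a character
    offset for successfully decoded characters.
    """
    problems = []
    raw = Path(path).read_bytes()
    for line_no, line in enumerate(raw.split(b"\n"), start=1):
        try:
            text = line.decode("utf-8")
        except UnicodeDecodeError as err:
            # Report only the first undecodable byte on this line, then move on.
            bad = line[err.start]
            problems.append((line_no, err.start + 1, f"invalid UTF-8 byte 0x{bad:02x}"))
            continue
        for pos, ch in enumerate(text, start=1):
            if ord(ch) > 127:
                # Fall back to a placeholder for characters without a Unicode name.
                name = unicodedata.name(ch, "UNNAMED CHARACTER")
                problems.append((line_no, pos, f"U+{ord(ch):04X} {name}"))
    return problems
```

For example, `for line_no, pos, what in find_suspect_characters("data.csv"): print(line_no, pos, what)` prints one line per finding. A U+FEFF ZERO WIDTH NO-BREAK SPACE at line 1, position 1 is usually a UTF-8 byte order mark rather than real data.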
There are two ways to convert XLSX to CSV UTF-8:
  1. Convert it from Excel
    • Open the xlsx/csv file in Excel, go to File > Save As, and choose CSV as the file type
    • In the Save As dialog, select Tools > Web Options
      • Go to Encoding and select "UTF-8"
    • Now upload the file to S3
    • Preview the file in S3
      • Select the file > Preview
    • If everything is fine, you will see the CSV data in the S3 preview
    • If your file still has special Unicode characters, S3 will show an error instead of the data
  2. Convert to UTF-8 programmatically (best approach)
    1. Write a Python script that reads your XLSX file. The xlrd library can do this only in versions below 2.0, because xlrd 2.0 and later read legacy .xls files only; for .xlsx, use openpyxl instead
    2. Write the rows out as CSV, specifying UTF-8 as the encoding explicitly
    3. Upload the file to S3; you should now be able to preview the CSV data
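The programmatic steps above can be sketched in Python as follows. This sketch uses openpyxl in place of xlrd, because xlrd 2.0 and later no longer open .xlsx files. The function names and the `data.xlsx` filename are hypothetical. The sketch reads cell values, not formulas, from one worksheet and writes them with `encoding="utf-8"`, which produces UTF-8 without a byte order mark.

```python
import csv


def read_xlsx_rows(xlsx_path, sheet_name=None):
    """Yield each row of a worksheet as a list of cell values.

    Requires openpyxl (pip install openpyxl). It is imported lazily so that
    write_csv_utf8 can be used without it.
    """
    from openpyxl import load_workbook

    # data_only=True returns cached cell values instead of formula strings.
    wb = load_workbook(xlsx_path, read_only=True, data_only=True)
    try:
        ws = wb[sheet_name] if sheet_name else wb.active
        for row in ws.iter_rows(values_only=True):
            # Empty cells come back as None; write them as empty CSV fields.
            yield ["" if v is None else v for v in row]
    finally:
        wb.close()


def write_csv_utf8(rows, csv_path):
    """Write rows to csv_path as UTF-8 (no byte order mark)."""
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(row)


def xlsx_to_csv_utf8(xlsx_path, csv_path, sheet_name=None):
    """Convert one worksheet of an .xlsx file to a UTF-8 CSV file."""
    write_csv_utf8(read_xlsx_rows(xlsx_path, sheet_name), csv_path)
```

For example, `xlsx_to_csv_utf8("data.xlsx", "data.csv")` converts the active sheet, and `sheet_name=` selects a different one. Non-text values such as dates are written using Python's default string form, which may differ from how Excel displays them.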
