AWS Glue crawler cannot extract CSV headers properly

Scenario: You have an UTF-8 encoded CSV stored at S3. You ran a Glue crawler to create a metadata table and further read the table in Athena. When you query the table in Athen, you don't get the headers instead every column is named as (col1, col2, col3,...so on)
Solution: 
  • AWS Glue has limitations with column headers because it expects the columns in hive format. you cannot use special characters (e.g: %) and spaces in the columns. 
  • Remove all the special characters and spaces from your columns. you can instead use underscore to separate spaces (e.g.: first_name, last_name)
  • Once you have renamed your columns, re-upload the csv in the S3 and re-run the Glue Crawler.
  • You will be able to see the table with proper headers


Comments