Skip to content

Latest commit

 

History

History
88 lines (74 loc) · 2.84 KB

athena.md

File metadata and controls

88 lines (74 loc) · 2.84 KB

Athena

Provides access to Amazon Athena service using SQL queries via a JDBC driver.

Link csvs from s3 with athena

Simple steps to link CSVs with Athena and get results in Datagrok:

  1. Upload CSVs to an S3 bucket. Note that Athena looks into S3 folder, not file, so if CSVs have different structure, they should be located in separate folders. For example:

     Bucket s3://athena-northwind/
       orders/
         orders.csv
       products/
         products.csv
    
  2. Create a bucket or folder in the existing bucket for Athena Output. For example:

    s3://athena-northwind/results/
    
  3. Create table in Athena console. UI builds SQL query for creating table in Athena. Following example for Northwind "products.csv":

    CREATE EXTERNAL TABLE IF NOT EXISTS northwind.products (
      `productid` int,
      `productname` string,
      `supplierid` int,
      `categoryid` int,
      `quantityperunit` string,
      `unitprice` double,
      `unitsinstock` int,
      `unitsonorder` int,
      `reorderlevel` int,
      `discontinued` int
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    WITH SERDEPROPERTIES (
      'serialization.format' = ',',
      'field.delim' = ','
    ) LOCATION 's3://athena-northwind/products/'
    TBLPROPERTIES (
      'has_encrypted_data' = 'false',
      'skip.header.line.count' = '1'
    );

    Tips:

    • To change CSV delimiter adjust "serialization.format" and "field.delim" parameters
    • To skip the header line add "skip.header.line.count" parameter
  4. Create a data connection in the Datagrok platform. The parameters may include: "server", "db", "port", "s3OutputLocation", "s3OutputEncOption", "accessKey", "secretKey", or "connString". For example:

    • Name: northwind
    • Server: athena.us-east-2.amazonaws.com
    • Db: northwind
    • Port: 443
    • S3 Output Location: s3://athena-northwind/results/
    • Access Key:
    • Secret Key:
Notes:
  * Server has the following format: "athena.<region>.amazonaws.com"
  * Do not forget "/" at "S3 Output Location" parameter end 
  1. Create a data query under the new connection. For example:

    SELECT * FROM northwind.products

    Notes:

    • Athena requires DB name before table name "northwind.products"

See also: