DBI_RS_Datalake

Overview

This process unloads data from Amazon Redshift and stores it in an AWS S3 data lake, so users can query it with Amazon Athena and Redshift Spectrum. The data is stored in S3 in Parquet format and can optionally be partitioned to improve end-user query performance. The -f, -s, and -b options are required. If you do not supply the -t option, all of the tables in the Redshift schema are replicated to AWS S3.
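The script's internals are not shown here. A common way to export a Redshift table to S3 as Parquet is Redshift's UNLOAD command with FORMAT AS PARQUET and, optionally, PARTITION BY. The following is a minimal sketch that assumes the script works this way. The function name build_unload_sql, the s3://<bucket>/<schema>/<table>/ key layout, and the default IAM role ARN are all hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the actual DBI_RS_Datalake.sh code.
# Builds a Redshift UNLOAD statement that writes one table to S3 as Parquet,
# optionally partitioned by a column. The S3 key layout and the default
# IAM role ARN are assumptions for illustration only.
build_unload_sql() {
  local schema="$1" table="$2" bucket="$3" partition="${4:-}"
  local iam_role="${5:-arn:aws:iam::123456789012:role/RedshiftUnloadRole}"

  local sql="UNLOAD ('SELECT * FROM ${schema}.${table}')
TO 's3://${bucket}/${schema}/${table}/'
IAM_ROLE '${iam_role}'
FORMAT AS PARQUET"

  # With PARTITION BY, Redshift writes Hive-style prefixes such as
  # .../order_date=2024-01-01/, which Athena and Spectrum can prune on.
  if [[ -n "$partition" ]]; then
    sql+=$'\n'"PARTITION BY (${partition})"
  fi

  # Print the finished statement, terminated with a semicolon.
  printf '%s;\n' "$sql"
}
```

For example, calling build_unload_sql sales orders my-bucket order_date prints an UNLOAD statement for sales.orders that ends with PARTITION BY (order_date). The printed SQL could then be run through any Redshift SQL client.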

The command-line syntax is as follows:

DBI_RS_Datalake.sh -f <configuration file name> -s <Redshift Schema> [-t <Redshift Table>] -b <Datalake S3 Bucket> [-p <Datalake Partition>]
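The option rules can be illustrated with a minimal getopts sketch. This is a hypothetical reconstruction that encodes only the documented behavior: -f, -s, and -b are required, and -t and -p are optional. It is not the script's actual source. The function name parse_args and the variable names it sets (CONFIG_FILE, SCHEMA, TABLE, BUCKET, PARTITION, TABLE_SCOPE) are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical option parser mirroring the documented command line.
# Required: -f (config file), -s (schema), -b (S3 bucket).
# Optional: -t (table; omitted means every table in the schema), -p (partition).
parse_args() {
  local OPTIND=1 opt   # reset getopts state on every call
  CONFIG_FILE="" SCHEMA="" TABLE="" BUCKET="" PARTITION=""

  while getopts ":f:s:t:b:p:" opt; do
    case "$opt" in
      f) CONFIG_FILE="$OPTARG" ;;
      s) SCHEMA="$OPTARG" ;;
      t) TABLE="$OPTARG" ;;
      b) BUCKET="$OPTARG" ;;
      p) PARTITION="$OPTARG" ;;
      :) echo "Option -$OPTARG requires a value" >&2; return 2 ;;
      *) echo "Unknown option -$OPTARG" >&2; return 2 ;;
    esac
  done

  # Enforce the three required options.
  if [[ -z "$CONFIG_FILE" || -z "$SCHEMA" || -z "$BUCKET" ]]; then
    echo "Usage: DBI_RS_Datalake.sh -f <config> -s <schema> [-t <table>] -b <bucket> [-p <partition>]" >&2
    return 1
  fi

  # Without -t, every table in the schema is in scope.
  if [[ -z "$TABLE" ]]; then
    TABLE_SCOPE="all tables in ${SCHEMA}"
  else
    TABLE_SCOPE="${SCHEMA}.${TABLE}"
  fi
}
```

For example, parse_args -f dbi.cfg -s sales -b my-bucket succeeds and sets TABLE_SCOPE to "all tables in sales". Omitting -b makes it print the usage line and return a non-zero status.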
