Comment by raw_anon_1111
6 hours ago
You realize “the pipeline” you have to build is literally just an Athena CTAS statement, “CREATE TABLE ... AS SELECT * FROM ...”. Yes, you can run this directly against S3 and it can produce one big file.
https://docs.aws.amazon.com/athena/latest/ug/ctas.html
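A minimal sketch of that CTAS, with made-up database, table, and bucket names (bucket_count = 1 is what forces a single output file):

    -- Athena CTAS: write the query result straight back to S3 as one file
    CREATE TABLE reporting.events_flat
    WITH (
      format            = 'PARQUET',
      external_location = 's3://my-export-bucket/events_flat/',
      bucketed_by       = ARRAY['event_id'],
      bucket_count      = 1   -- one bucket, so one output file
    ) AS
    SELECT *
    FROM raw_db.events;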
I have a sneaking suspicion that you are trying to use Redshift as a traditional OLTP database. Are you also normalizing your tables like an OLTP database instead of like an OLAP one?
https://fiveonefour.com/blog/OLAP-on-Tap-The-Art-of-Letting-...
And if you are using any OLAP database for OLTP, you’re doing it wrong. It’s also a simple “process” to move data back and forth with Aurora MySQL or Postgres: either federate your OLTP database with Athena (handwavy because I haven’t done it), or, the way I have done it, use one SELECT statement to export to S3 and another to load from S3 into your OLTP database.
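The Aurora MySQL flavor of that is roughly the following untested sketch; it assumes the cluster already has an IAM role attached that can read and write the (made-up) bucket, and the table names are placeholders:

    -- export out of the OLTP database to S3
    SELECT * FROM orders
    INTO OUTFILE S3 's3://my-export-bucket/orders/export'
    FORMAT CSV HEADER
    OVERWRITE ON;

    -- load files (e.g. results written back by Athena) from S3 into the OLTP database
    LOAD DATA FROM S3 PREFIX 's3://my-export-bucket/report_results/'
    INTO TABLE report_results
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES;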
And before you say you shouldn’t have to do this: you have always needed some process to take data from its normalized form to a denormalized form for reporting and analytics.
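In practice that process is always some variant of a flattening query like this (hypothetical tables, plain generic SQL):

    -- flatten normalized OLTP tables into one wide table for reporting
    CREATE TABLE reporting.order_facts AS
    SELECT o.order_id,
           o.created_at,
           c.customer_name,
           c.region,
           p.product_name,
           oi.quantity,
           oi.unit_price
    FROM orders o
    JOIN order_items oi ON oi.order_id = o.order_id
    JOIN customers   c  ON c.customer_id = o.customer_id
    JOIN products    p  ON p.product_id  = oi.product_id;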
Source: doing boring enterprise stuff, including databases, since 1996, plus eight years working with AWS services both outside AWS (startups and consulting companies) and inside AWS (Professional Services, no longer there).
Why are you doing this manually? There is a built-in way to stream from Kinesis Data Streams into Redshift:
https://docs.aws.amazon.com/streams/latest/dev/using-other-s...
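Per those docs, streaming ingestion is basically two statements run inside Redshift; the schema name, stream name, and IAM role ARN below are placeholders:

    -- map the Kinesis stream into Redshift
    CREATE EXTERNAL SCHEMA kinesis_schema
    FROM KINESIS
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-streaming-role';

    -- materialized view that ingests from the stream as it refreshes
    CREATE MATERIALIZED VIEW events_stream_mv AUTO REFRESH YES AS
    SELECT approximate_arrival_timestamp,
           partition_key,
           shard_id,
           sequence_number,
           JSON_PARSE(kinesis_data) AS payload
    FROM kinesis_schema."my-event-stream"
    WHERE CAN_JSON_PARSE(kinesis_data);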
Also, while you can have S3 directly as a destination for Redshift through the Glue Catalog, by default Redshift definitely doesn’t use S3.