Comment by raw_anon_1111
2 months ago
Also with Redshift: split the file up before ingestion into as many parts as you have nodes, or combine lots of small files into larger ones before putting them into S3, and/or use an Athena CTAS command to compact many small files into one big file.
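The pre-split step can be sketched roughly like this (a hypothetical local helper, not a Redshift API; it just cuts a line-oriented file into N parts so a parallel COPY can load one part per node/slice):

```python
import os

def split_for_copy(path, num_parts, out_dir):
    """Split a large line-oriented file into num_parts roughly equal
    pieces, splitting only on newline boundaries, so Redshift's COPY
    can load the pieces in parallel."""
    os.makedirs(out_dir, exist_ok=True)
    with open(path, "rb") as f:
        lines = f.readlines()
    chunk = -(-len(lines) // num_parts)  # ceiling division
    parts = []
    for i in range(num_parts):
        part_lines = lines[i * chunk:(i + 1) * chunk]
        if not part_lines:
            break  # fewer lines than parts requested
        part_path = os.path.join(out_dir, f"part_{i:04d}")
        with open(part_path, "wb") as out:
            out.writelines(part_lines)
        parts.append(part_path)
    return parts
```

The opposite direction (compacting many small files) is usually cheaper to do in Athena with CTAS than in your own code, since S3 listing and reads dominate.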
So in my other case, the whole thing was
Web crawler (internal customer website) using Playwright -> S3 -> SNS -> SQS -> Lambda (embed with Bedrock) -> S3 Vector Store.
Similar to what you said, I ran into Bedrock embedding service limits. Once I told it that, it knew how to adjust the Lambda concurrency limits. Of course, I also had to tell it to adjust the SQS poller so messages wouldn't back up in flight and then go to the DLQ without ever being processed.
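The tuning above is basically arithmetic, so here's a back-of-the-envelope sketch (all names and numbers are hypothetical; the 6x visibility-timeout rule is AWS's published guidance for Lambda SQS event sources):

```python
def tune_sqs_for_lambda(lambda_timeout_s, bedrock_rpm_quota,
                        calls_per_invocation, invocation_secs):
    """Pick a Lambda concurrency cap and SQS settings so messages
    don't sit in flight past their visibility timeout and get
    shunted to the DLQ before ever being processed."""
    # Cap concurrency so aggregate Bedrock calls/min stay under quota.
    invocations_per_min = 60 / invocation_secs
    concurrency = max(1, int(bedrock_rpm_quota /
                             (calls_per_invocation * invocations_per_min)))
    return {
        "reserved_concurrency": concurrency,
        # AWS guidance: queue visibility timeout >= 6x the function timeout.
        "visibility_timeout_s": 6 * lambda_timeout_s,
        # Redrive policy: allow several receives before dead-lettering.
        "max_receive_count": 5,
    }
```

You'd feed these values into the queue attributes and the function's reserved concurrency; the point is that the three knobs (concurrency, visibility timeout, redrive count) have to be set together, not independently.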