Comment by imcoconut
3 years ago
It's pretty simple, but on various consulting jobs I've had to build SQL databases, sometimes with lots of tables with lots of columns. Sometimes we switch from on-prem to cloud or vice versa, or switch from Postgres to SQL Server, etc. I have this toolkit that automates a lot of the tedious stuff. It lets me take pandas dataframes and do the following:
- auto detect and convert column types
- save as a parquet file in a folder
- then autogenerate a SQLAlchemy table/metadata file in Python for all tables, with sensible defaults for column types (e.g. 2x the longest string in a column for VARCHAR)
- build the db and all tables
- load data from the files into the tables
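The type-detection step above could look something like this; `infer_sqlalchemy_type` is a hypothetical name and the dtype-to-type mapping is only an illustrative sketch, not the actual toolkit's code:

```python
# Illustrative sketch: map a pandas Series to a SQLAlchemy column type,
# sizing VARCHAR at 2x the longest observed string for headroom.
import pandas as pd
from sqlalchemy import BigInteger, Boolean, DateTime, Float, String

def infer_sqlalchemy_type(series: pd.Series):
    """Pick a sensible SQLAlchemy type for a pandas Series."""
    if pd.api.types.is_bool_dtype(series):
        return Boolean()
    if pd.api.types.is_integer_dtype(series):
        return BigInteger()
    if pd.api.types.is_float_dtype(series):
        return Float()
    if pd.api.types.is_datetime64_any_dtype(series):
        return DateTime()
    # Fall back to VARCHAR sized at 2x the longest string in the column.
    longest = int(series.astype(str).str.len().max()) if len(series) else 0
    return String(max(2 * longest, 1))
```

A generator would then emit one `Column(name, infer_sqlalchemy_type(df[name]))` per dataframe column into the metadata file.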
This makes it really easy to bootstrap the entire db from a folder of parquet files for testing with SQLite, and then easy to move to prod on Postgres/SQL Server etc. Before I go to prod I still have to add constraints, keys, and indexes, but that doesn't take too long; for dev/testing the data's not too big, so performance doesn't really suffer from the lack of keys/constraints. Then we can use something like Alembic on the big SQLAlchemy table-definitions file to do db migrations.
It's kind of like https://github.com/agronholm/sqlacodegen, but solving the inverse problem: sqlacodegen generates models from an existing database, while this generates the database from data.
Basically it bootstraps the db and schemas and gets me like 95% of the way there. My quality of life is better with it.