Comment by kristianp
1 day ago
I tried "select * from items limit 10" and it is slowly iterating through the shards without returning. I got up to 60 shards before I stopped. Selecting just one shard makes that query return instantly. As mentioned elsewhere I think duckdb can work faster by only reading the part of a parquet file it needs over http.
I was getting an error that the users and user_domains tables aren't available, but you just need to change the shard filter to the user stats shard.
Doesn't `LIMIT` just limit the number of rows returned, rather than the number read and processed?
That depends on the query. SQLite tries to use LIMIT to restrict the amount of reading that it does. It is often successful at that. But some queries, by their very nature, logically require reading the whole input in order to compute the correct answer, regardless of whether or not there is a LIMIT clause.
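A pair of SQLite queries illustrating the difference (table and column names are made up):

```sql
-- SQLite can stop the scan as soon as 10 rows have been produced
SELECT * FROM items LIMIT 10;

-- logically requires the whole input: without an index on price, every
-- row must be examined before the 10 smallest can be known
SELECT * FROM items ORDER BY price LIMIT 10;
```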
That's what it does, but if I'm not mistaken (at least in my experience with MariaDB), it will also return as soon as it has produced enough rows to satisfy the limit, and won't try to process further rows. If you have an expensive subquery in the SELECT (...) AS `column_name`, it won't run that subquery for every row before returning the first 10 (when using LIMIT 10), unless you ORDER BY that `column_name`. Other components like the WHERE clause might also require reading every row before the ten matches are found. So: mostly yes, but not necessarily.
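A sketch of those two cases (the `users`/`orders` schema here is hypothetical):

```sql
-- the correlated subquery only runs for the 10 rows actually returned
SELECT u.id,
       (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) AS order_count
FROM users u
LIMIT 10;

-- ordering by the computed column forces the subquery to run for every
-- row before the first 10 can be chosen
SELECT u.id,
       (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) AS order_count
FROM users u
ORDER BY order_count
LIMIT 10;
```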
The LIMIT clause isn't official/standard ANSI SQL, so it's up to the RDBMS how to implement it. Your assumption is true for BigQuery (infamously) but not true for things like Snowflake, DuckDB, etc.
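For reference, the spelling the SQL standard did eventually adopt (SQL:2008) is FETCH FIRST, which each engine then maps onto its own execution strategy:

```sql
SELECT *
FROM items
ORDER BY id
FETCH FIRST 10 ROWS ONLY;
```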
That's odd. If it were a VFS, that's not what I'd expect to happen. Maybe it's not a VFS?
What is a VFS?
https://sqlite.org/vfs.html
While it was designed for OS portability, you can use it to convince SQLite to read from something other than a file on disk.
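For example, once an extension has registered a custom VFS, you can select it by name with SQLite's URI filename syntax (this requires URI filenames to be enabled; `http_vfs` here is a hypothetical VFS name, while the `vfs=` and `immutable=` URI parameters are standard SQLite):

```sql
-- open a database through the hypothetical "http_vfs" instead of the
-- default OS filesystem VFS
ATTACH DATABASE 'file:remote.db?vfs=http_vfs&immutable=1' AS remote;
SELECT * FROM remote.items LIMIT 10;
```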