We are following Ollama's design, though not verbatim, since mobile apps are sandboxed.
Phones are resource-constrained; we saw significant battery overhead with in-process HTTP listeners, so we stuck with simple stateful isolates in Flutter, and we are exploring a standalone server app that others can talk to for React.
For model sharing with the current setup:
iOS - We are working towards writing the model into an App Group container. It's tricky, but we are working around it.
Android - We are working towards prompting the user once for a SAF directory (e.g., /Download/llm_models), saving the model there, then publishing a ContentProvider URI for zero-copy reads (see the sketch below).
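To make the zero-copy part concrete: the usual mechanism is a read-only ContentProvider whose openFile hands out a ParcelFileDescriptor that callers can mmap. A minimal sketch, assuming a hypothetical authority (com.example.llm.models) and a hypothetical app-private llm_models directory; this is not the project's actual code, and a model saved under a SAF tree would additionally need proxying through the persisted document URI:

```kotlin
import android.content.ContentProvider
import android.content.ContentValues
import android.database.Cursor
import android.net.Uri
import android.os.ParcelFileDescriptor
import java.io.File

class ModelProvider : ContentProvider() {

    override fun onCreate(): Boolean = true

    // Expected URI shape: content://com.example.llm.models/<model-file-name>
    // (authority and layout are made up for this sketch).
    override fun openFile(uri: Uri, mode: String): ParcelFileDescriptor {
        val name = uri.lastPathSegment
            ?: throw IllegalArgumentException("missing model name: $uri")
        // Assumes the model was copied into app-private storage; a file saved
        // under a SAF tree would instead be proxied via its persisted URI.
        val file = File(File(context!!.filesDir, "llm_models"), name)
        // Handing back a read-only fd lets the caller mmap the weights
        // directly; no bytes are copied through this process.
        return ParcelFileDescriptor.open(file, ParcelFileDescriptor.MODE_READ_ONLY)
    }

    // The remaining ContentProvider methods are unused for a read-only
    // file-serving provider.
    override fun getType(uri: Uri): String = "application/octet-stream"
    override fun query(
        uri: Uri, projection: Array<String>?, selection: String?,
        selectionArgs: Array<String>?, sortOrder: String?
    ): Cursor? = null
    override fun insert(uri: Uri, values: ContentValues?): Uri? = null
    override fun delete(uri: Uri, selection: String?, selectionArgs: Array<String>?): Int = 0
    override fun update(
        uri: Uri, values: ContentValues?, selection: String?,
        selectionArgs: Array<String>?
    ): Int = 0
}
```

The provider would be declared in the manifest (exported, or guarded by a permission), and a client would call contentResolver.openFileDescriptor(Uri.parse("content://com.example.llm.models/model.gguf"), "r") and mmap the resulting descriptor.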
We are already writing more mobile-friendly kernels and Tensors. GGML/GGUF is widely supported, so porting it is an easy way to get started and collect feedback, but we will completely move away from it in < 2 months.
Anything else you would like to know?
How does writing a model into an App Group container enable your framework to give an app a local LLM server that 3rd party apps can make calls to on iOS?[^1]
How does writing a model into a shared directory on Android enable a local LLM server that 3rd party apps can make calls to?[^2]
How does writing your own kernels get you off GGUF in 2 months? GGUF is a storage format. You use kernels to do things with the numbers you get from it.
I thought GGUF was an advantage? Now it's something you're basically done using?
I don't think you should continue this conversation. As easy as it is to get your work out there, it's just as easy to build a record of stretching the truth over and over again.
Best of luck, and I mean it. Just, memento mori: be honest and humble along the way. This is something you will look back on in a year and grimace.
[^1] App Group containers only work between apps signed by the same Apple developer account. Additionally, that is shared storage, not a way to provide APIs to other apps.
[^2] SAF = Storage Access Framework; that is shared storage, not a way to provide APIs to other apps.