Thanks for the comment, but:
1) The commit history goes back to April.
2) The llama.cpp licence is included in the repo where necessary, as in Ollama, until the dependency is deprecated.
3) Flutter isolates behave like servers, and the Cactus code uses them that way.
We are following Ollama's design, but not verbatim due to apps being sandboxed.
Phones are resource-constrained: we saw significant battery overhead with in-process HTTP listeners, so we stuck with simple stateful isolates in Flutter, and we are exploring a standalone server app that others can talk to for React.
For model sharing with the current setup:
iOS - We are working towards writing the model into an App Group container; it is tricky, but we are working around it.
Android - We are working towards prompting the user once for a SAF directory (e.g., /Download/llm_models), saving the model there, then publishing a ContentProvider URI for zero-copy reads (rough sketch below).
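In case the SAF flow is unfamiliar, here is roughly what we mean. Illustrative only - the request code, file name, and helper functions are placeholders for this sketch, not Cactus APIs:

  import android.content.Intent
  import android.net.Uri
  import android.os.ParcelFileDescriptor
  import androidx.activity.ComponentActivity
  import androidx.documentfile.provider.DocumentFile

  // Placeholder request code, not part of Cactus.
  const val PICK_MODEL_DIR = 42

  // 1) Ask the user once for a directory via the Storage Access Framework.
  fun ComponentActivity.pickModelDirectory() {
      startActivityForResult(Intent(Intent.ACTION_OPEN_DOCUMENT_TREE), PICK_MODEL_DIR)
  }

  // 2) Persist the grant so it survives restarts, then write the model into the directory.
  fun ComponentActivity.saveModel(treeUri: Uri, modelBytes: ByteArray) {
      contentResolver.takePersistableUriPermission(
          treeUri,
          Intent.FLAG_GRANT_READ_URI_PERMISSION or Intent.FLAG_GRANT_WRITE_URI_PERMISSION
      )
      val dir = DocumentFile.fromTreeUri(this, treeUri) ?: return
      val model = dir.createFile("application/octet-stream", "model.gguf") ?: return
      contentResolver.openOutputStream(model.uri)?.use { it.write(modelBytes) }
  }

  // 3) A consuming app resolves the shared URI into a file descriptor,
  //    so the model is read in place rather than copied into its own sandbox.
  fun ComponentActivity.openSharedModel(sharedUri: Uri): ParcelFileDescriptor? =
      contentResolver.openFileDescriptor(sharedUri, "r")

The takePersistableUriPermission call is the important part: it keeps the grant alive across restarts, so the user is only prompted once.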
We are already writing more mobile-friendly kernels and tensors, but GGML/GGUF is widely supported, so porting it is an easy way to get started and collect feedback; we will move away from it completely in < 2 months.
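For anyone who has not poked at GGUF: part of why it is an easy starting point is that a downloaded model is trivial to sanity-check before handing it to the native runtime - every file starts with the ASCII magic "GGUF" followed by a little-endian uint32 format version. A minimal sketch (the accepted version range is an assumption about current GGUF revisions, not a Cactus API):

  import java.io.File
  import java.nio.ByteBuffer
  import java.nio.ByteOrder

  // Minimal header check: GGUF files begin with the ASCII magic "GGUF"
  // and a little-endian uint32 format version.
  fun looksLikeGguf(file: File): Boolean {
      val header = ByteArray(8)
      file.inputStream().use { if (it.read(header) != 8) return false }
      val magic = String(header, 0, 4, Charsets.US_ASCII)
      val version = ByteBuffer.wrap(header, 4, 4).order(ByteOrder.LITTLE_ENDIAN).int
      // The version range is an assumption about current GGUF revisions (v1-v3).
      return magic == "GGUF" && version in 1..3
  }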
Anything else you would like to know?
reminds me of
- "You are, undoubtedly, the worst pirate i have ever heard of" - "Ah, but you have heard of me"
Yes, we are indeed a young project. Not two weeks, but a couple of months. Welcome to AI; most projects are young :)
Yes, we are wrapping llama.cpp. For now. Ollama, too, began by wrapping llama.cpp. That is the mission of open-source software - to enable the community to build on each other's progress.
We're enabling the first cross-platform in-app inference experience for GGUF models, and we're soon shipping our own inference kernels, fully optimized for mobile, to speed up performance. Stay tuned.
PS - we're up to good (source: trust us)