Comment by behnamoh

1 year ago

I have Apple Silicon, and it's the worst when it comes to prompt processing time. Unless you stick to small contexts, it's not fast enough for any real work.

Apple should've invested more in bandwidth, but it's Apple, and it has lost its visionary. Imagine having 512GB on an M3 Ultra and not being able to load even a 70B model at a decent context window.



1ucky 1 year ago

Prompt processing is heavily compute-bound, so it depends mostly on raw compute throughput. Bandwidth mostly affects token generation speed.
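A back-of-envelope sketch of the compute-vs-bandwidth point. It assumes the common rule of thumb of ~2 FLOPs per parameter per prompt token for prefill, plus one full read of the weights per generated token for decode. The hardware numbers (`FLOPS`, `BANDWIDTH`) are hypothetical placeholders, not measured Apple specs. The estimate also ignores attention cost, which grows with context length and makes long prompts even slower.

```python
def prefill_seconds(params: float, prompt_tokens: int, flops_per_s: float) -> float:
    """Rough lower bound on prompt-processing time: ~2 FLOPs per parameter per token.
    Depends on compute throughput, not on memory bandwidth."""
    return 2.0 * params * prompt_tokens / flops_per_s


def decode_tokens_per_s(params: float, bytes_per_param: float, bandwidth_bytes_per_s: float) -> float:
    """Rough upper bound on generation speed: all weights are streamed from memory
    once per generated token, so bandwidth sets the ceiling."""
    return bandwidth_bytes_per_s / (params * bytes_per_param)


if __name__ == "__main__":
    # Hypothetical hardware figures, for illustration only (not measured specs).
    FLOPS = 30e12       # 30 TFLOP/s of usable compute
    BANDWIDTH = 800e9   # 800 GB/s memory bandwidth
    params = 70e9       # dense 70B model
    bpp = 0.5           # ~4-bit quantized weights

    print(f"prefill of 32k tokens: ~{prefill_seconds(params, 32_000, FLOPS):.0f} s")
    print(f"decode: ~{decode_tokens_per_s(params, bpp, BANDWIDTH):.1f} tok/s")
```

Doubling `BANDWIDTH` here doubles the decode ceiling but leaves the prefill estimate untouched, which is the asymmetry behind slow prompt processing on large contexts.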

mirekrusin 1 year ago

With 17B active params, an MoE should be much faster than a monolithic (dense) 70B, right?
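A rough sketch of why that should hold for generation, assuming decode is bandwidth-bound and each token only streams the *active* weights from memory. The bandwidth, 4-bit weights, and the 100B total-parameter MoE are hypothetical values for illustration. Routing overhead and attention are ignored, and all experts still have to fit in memory, so the footprint follows total params, not active params.

```python
def decode_upper_bound(active_params: float, bytes_per_param: float, bandwidth: float) -> float:
    """Tokens/s ceiling if each generated token streams all active weights from memory."""
    return bandwidth / (active_params * bytes_per_param)


def resident_gb(total_params: float, bytes_per_param: float) -> float:
    """GB needed just to hold the weights; every expert must stay resident."""
    return total_params * bytes_per_param / 1e9


if __name__ == "__main__":
    BANDWIDTH = 800e9  # hypothetical 800 GB/s
    BPP = 0.5          # hypothetical ~4-bit weights

    dense = decode_upper_bound(70e9, BPP, BANDWIDTH)  # dense 70B
    moe = decode_upper_bound(17e9, BPP, BANDWIDTH)    # MoE with 17B active
    print(f"dense 70B: ~{dense:.1f} tok/s, MoE 17B active: ~{moe:.1f} tok/s ({moe / dense:.2f}x)")

    # Hypothetical MoE with 100B total params: memory use is set by the total.
    print(f"MoE weights resident: ~{resident_gb(100e9, BPP):.0f} GB")
```

Under these assumptions, the speedup is just the ratio of active parameters (70/17, about 4x). Prefill FLOPs per token also scale roughly with active params, so prompt processing should benefit by a similar factor.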

nathancahill 1 year ago

Imagine