Comment by NitpickLawyer

7 hours ago

Really interesting resource, thanks for sharing! It was not on my radar.

> https://github.com/chloeli-15/model_spec_midtraining

I'm a bit confused about this part:

> MSM is a pipeline that takes a Model Spec or Constitution (a document describing how and why an assistant should behave) and generates a diverse corpus of synthetic documents that discuss and teach the content of the spec.

> ANTHROPIC_API_KEY=sk-ant-...

> # Optional but highly recommended: separate key for using the Anthropic Batch API for batch document generation (needed if USE_BATCH_API=true).

> # This will significantly reduce generation time for high-volume generation.

> ANTHROPIC_BATCH_API_KEY=sk-ant-...

Isn't this against Anthropic's ToS? I thought generating data to train other models was specifically disallowed. I get that this is a research effort, but still. Say you use this pipeline for something internal: wouldn't that violate the ToS and risk getting your account banned?