Translating Latin demonology manuals with GPT-4 and Claude

2 years ago (resobscura.substack.com)

Everyone is excited about LLMs' abilities to help with language learning, while completely ignoring the fact that for most people LLMs will make the learning unneeded. There will be fewer experts in the field, and we will therefore lose the part of language and foreign-literature understanding not captured by statistical models. Which is a huge part (subtle contexts in poetry, etc)

  • You aren't wrong, but this has been a dilemma with every new technology. The camera had that effect, modern metalworking had that effect, even tractors had that effect.

    It's definitely a problem we should be talking about, but we can't go back in time or remain frozen, the genie never goes back in the bottle. We have to move forward towards the future while salvaging the parts of the past we want to bring with us.

    • We need efficiency when we want to maximize comfort and minimize labor.

      But nothing forbids people from pursuing less efficient endeavors in their free time. There are people maintaining old cars and locomotives. There are people gardening or woodworking "inefficiently" for their own pleasure.

      What we remove is the need to force people to work in these fields. Whether we abandon them altogether depends solely on our culture.

      1 reply →

  • As someone who loves writing in 3 languages (French and German being my native languages, English the third), playing with Claude (and to a lesser degree GPT-4) has actually made me play with and investigate the nuances of languages so much more. It does a great job of course at stylistic transformations, which on their own are always stilted, but are great inspiration. But it also does a phenomenal job at explaining the nuances between languages, say when I want to explain why a certain German phrasing “feels” different to me.

    Certainly, seeing the number of people who never learn the language of the country they emigrated to, this is a problem we already have (and in my situation, never going to a country where I don’t speak the language).

    I think humans are going to continue nerding out with language just as much as they ever have. I do really think it’s an innate drive, and LLMs are a mind-blowing tool for doing just that.

  • > Which is a huge part (subtle contexts in poetry, etc)

    Certainly, but from what I remember of my GCSE English Literature back at the turn of the millennium, my fellow students and I didn't understand most of that subtlety even when it was famous poets in our native language.

    Shakespeare may be unsurprising in this regard given the age (why eye of newt and leg of toad? Some say common names of herbs, others that it's just some amusingly vulgar items), but we were also just as oblivious to the lived experience of being gassed in the trenches as per Dulce et Decorum Est or a cavalry charge as per The Charge of the Light Brigade in English as we would have been if this had been a second language.

  • I'm not sure about "unneeded". Important motivations why people learn a foreign language are because they want to speak with native speakers in that language without an intermediate (because they move to that country, or because they have a partner speaking that language), or because they find that language interesting/beautiful, or because they want to read or listen to original sources. Machine translation doesn't remove any of those motivations.

  • > while completely ignoring the fact that for most people LLMs will make the learning unneeded

    Good luck using an LLM to talk to strangers in a bar in a foreign language.

  • Most Latin texts have been translated once, and more than 100 years ago, while most ancient Arabic texts have never been translated. This is an old problem.

    I see AI as a savior here, especially for reconstructing old languages of which we have only a small amount of text preserved.

    • Which ancient Arabic texts do you mean?

      Most ancient texts (i.e. Egyptian papyri) are expected to be of too little use to even scan, let alone translate.

This Dan Brown/William Gibson crossover sucks my soul right out of the petabyte SSD I bought in a dark alley.

I've used ChatGPT 3.5 (not 4, too expensive) to translate most of the Latin writings of Jerome, Ambrose, and Ambrosiaster (from Migne's Patrologia Latina) - the translations have been put in this repo in the public domain:

https://github.com/HistoricalChristianFaith/Writings-Databas...

Some takeaways:

- ChatGPT did an excellent job with about 3 sentences max at a time. Exceeding 3 sentences would often cause it to truncate the response (e.g. translating only 3ish of 5 sentences) or to hallucinate extra material.

- ChatGPT would originally return the translation, sometimes randomly prefixed with a variant of "The translation is" and sometimes wrapped in quotes, other times not. Using the function interface to ChatGPT eliminated this problem.

- When it comes to quotations from Bible verses, ChatGPT sometimes "embellished" (not sure what else to call it). E.g. if part of Ephesians 2:7 is quoted in Latin, in the English ChatGPT would sometimes insert Ephesians 2:7-8 in full.
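The three-sentence ceiling above suggests batching the source text before each API call. Here is a minimal sketch of such a chunker (the regex-based sentence split is a naive assumption; Latin abbreviations and numerals would need smarter handling):

```python
import re

def chunk_sentences(text, max_sentences=3):
    """Split text into batches of at most `max_sentences` sentences,
    since longer requests tended to truncate or hallucinate."""
    # Naive split on ., !, or ? followed by whitespace.
    sentences = [s.strip()
                 for s in re.split(r'(?<=[.!?])\s+', text.strip())
                 if s.strip()]
    return [' '.join(sentences[i:i + max_sentences])
            for i in range(0, len(sentences), max_sentences)]
```

Each chunk then goes out as its own translation request, and the results are concatenated afterwards.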

  • I don't really understand what the value is in posting these kinds of takeaways about using GPT-3.5 here. GPT-4 is significantly better, and improved models are coming. There's just not much point in benchmarking 3.5 when every issue you've pointed out is likely solved by 4.

    • Average cost to translate an entire work in that repo via GPT3.5 = ~$2

      Average cost to translate an entire work in that repo via GPT4 = ~$40

      For a side-project, one is feasible to scale, and one ain't.
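For anyone curious how figures like those arise, the arithmetic is just token counts times per-token prices. A rough sketch using 2023-era list prices (both the prices and the 500K-token work size are assumptions; check current pricing):

```python
def estimate_cost(prompt_tokens, completion_tokens, in_price, out_price):
    """API cost in USD; prices are per 1K tokens."""
    return (prompt_tokens / 1000) * in_price + (completion_tokens / 1000) * out_price

WORK_TOKENS = 500_000  # a long patristic work; input and output roughly equal

# GPT-3.5-turbo: ~$0.0015/1K input, ~$0.002/1K output
gpt35 = estimate_cost(WORK_TOKENS, WORK_TOKENS, 0.0015, 0.002)  # ~$1.75
# GPT-4 (8k context): ~$0.03/1K input, ~$0.06/1K output
gpt4 = estimate_cost(WORK_TOKENS, WORK_TOKENS, 0.03, 0.06)      # ~$45
```

The roughly 20x price gap, not any quality argument, is what decides the model choice for a side project at this scale.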

This is really fun. Google Translate is incompetent at Latin, and I'm informed that so far ChatGPT still makes errors of grammar and word choice when generating Latin.

This experiment helps show we can use GPT-4/Claude to parse and summarize Latin, but doesn't yet show that we can rely on them to the level of a human expert.

I'm confident we'll get there pretty soon - and then will be able to rely on LLMs to generate Comprehensible Input and thereby greatly accelerate language learning.

  • I think that mostly depends on getting more high quality Latin into the training set, but I'm guessing the new amount of that being generated/discovered is relatively small. Then again, new techniques for training models could prove me wrong.

    • This is the result when I asked GPT-4 to write "a new, original passage in latin which is inspired by Del Rio's style. Make the Latin elegant and lucid, with witty turns of phrase," then asked Claude to translate it back to English. I am no Latin expert, but I know enough to say that this is clearly lucid, makes sense -- though overwrought -- and has no glaring errors. I will ask my classicist friends to weigh in on the matter of style though.

      GPT-4 improvising: "Quemadmodum divini philosophi docent, mysteria naturae introrsus latitant, neque enim sine causa Democritus 'naturam rerum' dixit esse clausam in occulto. Verum, si Angelorum bonorum opera ut in Psalmis celebratur, 'Benedicite, omnes Angeli Domini, Domino,' quo pacto maleficae entitates—daemones scilicet—invisibiles nec tamen inactiones sunt? Mihi consentaneum videtur quod Tertullianus et Ambrosius prodiderunt, angeli esse nuntios divinos, sed daemonas esse nuntios corruptionis. Ne confundamus! Nam verbum 'daemon' Graecis non semper malum significat, sed Romanis—o mores!—semper sinistram portendit.

      In huius argumenti speculo, miror qua de causa daemones semper in tenebris vel in locis desertis, vel in nocturna silentia se abscondant. An non, ut Seneca loquitur, 'magna vis veritatis est, quae se per se defendet'? Atqui daemones a veritate abhorrent, quasi nox a lucem fugiat.

      Ergo, si volumus a daemonum insidiis nosmetipsos protegere, debemus, ut Virgilius canit, 'forsan et haec olim meminisse juvabit,' non oblivisci quod lux aeterna sit telum acerrimum contra tenebras. Haec ita dicam, et iam concludo: sicut angeli ad caelum pertinent, sic daemones ad abyssum—et utrique suum locum noverunt. Caveamus igitur ne, dum caelum quaerimus, abyssum inveniamus."

      Claude translation: "As the divine philosophers teach, the mysteries of nature lie hidden within, for not without reason did Democritus call nature 'enclosed in the dark'. Yet, if by the work of good Angels as celebrated in the Psalms, 'Bless the Lord, all you Angels of the Lord,' in what manner are evil entities—namely demons—invisible yet not inactive? It seems fitting to me what Tertullian and Ambrose handed down, that angels are divine messengers, but demons are messengers of corruption. Let us not confuse the two! For the word 'daemon' does not always signify something evil for the Greeks, but for the Romans—oh the times!—it always portends something sinister. In the mirror of this argument, I wonder for what reason demons always hide themselves in shadows or deserted places or the silences of the night. Does not, as Seneca says, 'truth have great power, which will defend itself'? Yet demons shrink from the truth, as the night flees from the light. Therefore, if we wish to protect ourselves from the snares of demons, we must, as Virgil sings, 'perhaps one day it will please us to remember these things,' not forget that eternal light is the sharpest weapon against the darkness. I will say this much, and now conclude: just as angels belong to heaven, so demons belong to the abyss—and both know their proper places. Let us beware, therefore, lest while seeking heaven we find the abyss."

      4 replies →

    • The embedding method that nearly all LLMs use puts them at a severe disadvantage, because they can't 'see' the spelling of common words. That makes it hard to infer things like 'past-tense words end with an e'.

      With small modifications, the exact characters could be exposed to the model, in addition to the current tokens, but it would require a full retraining, which would cost $$$$$$$$.
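To make the point concrete, here is a toy greedy longest-match tokenizer (the vocabulary and IDs are invented for illustration): two Latin forms sharing the stem amav- can end up with no tokens in common, so the shared morphology is invisible at the model's input layer.

```python
# Invented subword vocabulary mapping pieces to opaque integer IDs.
vocab = {"amavit": 1001, "ama": 2002, "vimus": 2003}

def tokenize(word, vocab):
    """Greedy longest-match subword split -- the rough shape of BPE."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:       # take the longest known piece
                tokens.append(vocab[word[i:j]])
                i = j
                break
        else:
            tokens.append(ord(word[i]))  # unknown: fall back to one character
            i += 1
    return tokens

# tokenize("amavit", vocab)   -> [1001]
# tokenize("amavimus", vocab) -> [2002, 2003]  (no tokens shared with "amavit")
```

The model sees only the integer IDs, so nothing in its input hints that 1001 and 2002 begin with the same letters.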

      2 replies →

  • You get really good results if you prompt it with: “You’re an expert in Latin translation”.

"Many people equate the word "daemon" with the word "demon", implying some kind of satanic connection between UNIX and the underworld. This is an egregious misunderstanding. "Daemon" is actually a much older form of "demon"; daemons have no particular bias towards good or evil, but rather serve to help define a person's character or personality. The ancient Greeks' concept of a "personal daemon" was similar to the modern concept of a "guardian angel"—eudaemonia is the state of being helped or protected by a kindly spirit. As a rule, UNIX systems seem to be infested with both daemons and demons."

That naming convention might turn out to be more prescient than people thought. Can't wait until my Catholic school education pays off and I can chant at my computer in Latin.

  • Linux actually got the "demons" right: those manifest as usually hidden activity in someone's brain, and in special circumstances can take control over the entire system (e.g. with a deadlock if it's a neutral demon, or by other means if it's malicious). Those Greek daemons, in contrast, never possess or control anyone: they may inspire, but only if the subject is consciously seeking such inspiration ("the gates must be opened from within").

"It's clear that GPT-4 and Claude are skilled translators" on what basis? What makes a predictor LLM better at translating Latin than a system trained specifically for translation?

I'm sure they can do a decent job but it's weird to me that someone would leap to GPT-style tech despite its known tendency to hallucinate/make stuff up instead of translation-oriented tools like DeepL or Google Translate (I say this as someone who despises both of those tools due to their quality issues)

I can't imagine there are vast swaths of Latin in GPT's training set.

  • Well, that's just it - I use Google Translate all the time to translate historical texts, and for whatever reason, GPT-4 and Claude both work better. Since I deal with texts that feature archaic orthography like the long s (ſ) the main advantage over Google Translate is that LLMs can make educated guesses about what a word should be. But even in terms of pure translation ability — assuming all orthographic issues have been corrected — the LLMs do a better job in the languages I've tested and which I can read (early modern Portuguese, Spanish and French, plus Latin).

    The post by David Bell which I linked to gets into this for French - I agree with him that ChatGPT (I guess he was using GPT 3.5) has a tendency to "overtranslate." But it is super impressive as a translator overall IMO: https://davidabell.substack.com/p/playing-around-with-machin...

  • >on what basis? What makes a predictor LLM better at translating Latin than a system trained specifically for translation?

    They just are. Sure, it sounds a bit strange if you've never thought about it, but they are.

    >I'm sure they can do a decent job but it's weird to me that someone would leap to GPT-style tech despite its known tendency to hallucinate/make stuff up instead of translation-oriented tools like DeepL or Google Translate

    1. They don't just potentially do a decent job. For a couple dozen languages, GPT-4 is by far the best translator you can get your hands on. Google, Deepl are not as good.

    2. Tasks like summarization and translation have very low hallucination rates. Not something to be particularly worried about with languages that have sufficient presence in training.

    >I can't imagine there are vast swaths of Latin in GPT's training set.

    Doesn't matter. There is incredible generalization in predict-the-next-token models as far as proficiency is concerned. A model trained on 500B tokens of English and 50B tokens of French will not speak French like a model trained on only 50B tokens of French, but much, much better.

    https://arxiv.org/abs/2108.13349

    It also doesn't need to see translation pairs for every language pair in its corpus to learn how to translate that pair (but this is the case for traditional models too).

    • > a model trained on 500b tokens on English and 50b tokens of french will not speak french like a model trained on only 50b tokens of french but much much better.

      That's because French and English are reasonably similar, and share the same context.

      Whilst Latin and English are distantly related (Latin is more closely related to French), they do not share the same cultural context.

      Which version of Latin are you translating? Medieval?

      Whilst it's fun to do, and it has its place, there need to be massive caveats about accuracy.

      1 reply →

  • I haven't attempted Latin translations, but anything from my native language to English and back has been 100% perfect, miles better than anything Google Translate can do.

    • Latin is tricky, though. Google Translate is notoriously bad at Latin grammar, much worse than with most living languages.

      Not exactly sure why; maybe a small corpus, maybe because it's a pro-drop language without fixed word order and with a very complex set of conjugation rules.

      4 replies →

  • For some important context, the "Attention Is All You Need" paper that established the transformer architecture most LLMs use was explicitly about machine translation.

    The idea of using transformers for non-translation tasks was only briefly explored at the end of the paper, so it really shouldn't be surprising that LLMs are still good at translating.

    Yes, the hallucinations are less than ideal, but the extra freedom is part of what makes their translation abilities so good when they do get it right. And it's not like Google Translate is completely free of "hallucination"-type issues. It's well known that dedicated machine-translation models will assume (aka hallucinate) genders when going from non-gendered to gendered languages.

  • I don't think they're arguing that an LLM is better at translation than an actual translator, just that they are pretty good at it. DeepL and Google Translate definitely also make things up, though, so I don't think that's a good comparison...

    • > DeepL and Google Translate definitely also make things up

      I think what they make up is different, but this is a good point. They have a particularly odd tendency to either do something like autocorrect where it wasn't appropriate (translate a different word that is similar in spelling to the requested word), or to make up false friends, doing something like transliterate + then autocorrect in the target language.

      One example, which I blogged about 5 years ago but is still mistranslated, is the word "ribbit" (what a frog does): https://www.kmjn.org/notes/google_translates_ribbit.html

      In 2018, if you translated it to Greek with Google Translate, it gave you κουνέλι (kouneli), which is Greek for rabbit. A word that is one letter away from ribbit but not close to a similar meaning. When I tried it just now, it translates it to ραβδί (rabdi), which means stick and is completely unrelated to the correct answer, but I guess starts with similar letters as ribbit?

      1 reply →

  • It's a great question. But note that Google Translate is also trained on "predict the missing token": https://blog.research.google/2022/05/24-new-languages-google... / https://arxiv.org/abs/2205.03983 (search the blog post around “Surprisingly, this simple procedure produces high quality zero-shot translations.”)

    This was in May 2022, as part of Google Translate adding support for several low-resource languages (including Sanskrit). I was already very surprised that simply training on predicting tokens does translation so well — then a few months later ChatGPT came out, trained (roughly) the same way and doing a lot of things besides translation.

  • > What makes a predictor LLM better at translating Latin than a system trained specifically for translation?

    Contextual awareness that is baked into the models. Large language models are at their core transformation engines, and transforming text requires awareness of context. That alone makes LLMs great candidates for translation tasks.

  • I think it's probably great at Latin and Greek for reasons that should be obvious (plenty of public domain raw material, vast reams of scholarship dating back centuries). It's less good with some other languages, eg some Japanese companies have decided to train their own models due to dissatisfaction with ChatGPT's shortcomings.

  • Slightly unrelated, since each model is trained and tuned for specific task(s), but the original transformer architecture and paper were built with translation in mind. The original performance tests were language-translation benchmarks.

  • LLM-based translators use and add contextual information.

    They just choose better words when the original is ambiguous.

    Hallucination in translation tasks is quite low (much lower than in creative, fact-finding, or information-retrieval tasks).

  • Translation-specialised models like Google Translate don't actually understand what they're translating. But models like GPT do. This fact is intuitive and easy for anyone to test.

I tried using BingGPT to translate simple Chinese text from screenshots. The results were complete hallucinations, different each time for the same screenshot.

I wouldn't trust these translations at all.

  • That’s a completely different test. You’re using the vision multimodal ability to decipher Chinese script, essentially adding an OCR step to the process, and it’s not good at OCR of Chinese script.

    Try feeding it actual Chinese characters. From what I understand, it’s somewhat competent.

  • Image input in Bing basically can't handle non-English text. It has nothing to do with its Chinese translation ability, which is great.

I always envisioned that future hacking would be like whispering magic poems and spells, a.k.a. prompts, to AI systems. I know about prompt injection; however, this would raise things to a new level :)

GPT-4 does an OK job translating texts that aren't complex, but if you read the original and its translation side by side, you'll see that GPT-4 still makes dumb mistakes every few sentences, hallucinates stuff when it runs into cryptic words it's not familiar with, and sometimes omits important passages. GPT-4 is like a very productive, but clueless, newbie.

  • > gpt4 still makes dumb mistakes every few sentences, hallucinates stuff when it runs into cryptic words it's not familiar with, and sometimes omits important passages

    This, but in all domains where you are an expert: it becomes apparent that GPT-4 makes stupid mistakes, and makes them frequently. I can't seriously see it replacing humans soon; it is far from having four 9's of reliability.

    It probably needs 99.99% accuracy to work alone unsupervised by humans, because at each decision step it incurs the error rate again and again, so the chance of failure compounds with sequence length. Coupled with the "forward only" thinking pattern of LLMs, which doesn't allow backtracking and planning, this error rate kills autonomy.

    Recently, GPT-4-Vision was found to have poor OCR accuracy. Smart but stupid, same story.
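The compounding argument above is simple exponentiation: if each decision is independent, an error-free run of n steps has probability roughly p^n. A quick back-of-the-envelope (the step count of 1000 is an assumption for illustration):

```python
p = 0.999         # "three nines" accuracy per decision step
n = 1000          # decision steps in a long unsupervised task
success = p ** n  # probability the whole run is error-free
# Even at 99.9% per step, success is only about 0.37 over 1000 steps;
# at 99.99% per step (four nines) it rises to roughly 0.90.
```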

  • And a demon summoning is one of the worst imaginable places to have dumb mistakes in the instructions and incantations.

I always wondered why these texts are so difficult to interpret... why certain symbols, like the crow, have ambivalent meanings. In some cultures the crow is evil, while in others it's benevolent.

GPT-4 to the rescue; let's see what'll happen if everyone has the means to summon demons, curse others, and the like.

I've been working on a language-learning app, and GPT-4 has made things doable that didn't seem doable without it. For example, translating to lesser-known languages like Cree. Online you can find a dictionary of words or phrases, but I wasn't able to find a translator for arbitrary statements. It's not supported by Google Translate, nor Yandex, et al. But it's possible with GPT.

My strategy in automated translation is to translate, and then translate back to English. That way we can be fairly sure the translation is accurate. Of course if this app "has wings" I would open source the translations to allow corrections, and/or hire native translators for languages I don't speak (which is the vast majority of languages).

Last tip: I was able to make a pretty good automated system for this. What I did was spell out a monstrosity of a system prompt which ensures a few things. One, it will always give me the most descriptive romanization for non-ASCII languages. Two, it will give me output that is essentially .csv data. Three, if it encounters a made-up language like Pirate, it will try (and do a pretty good job). The rest is just parsing my final translation file to find prompt/language pairs which aren't saved yet and piping them to the monstrous prompt which queries for translations.
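Assuming the reply really is CSV-shaped, the parsing step can be as simple as this sketch (the three-column phrase/native-script/romanization layout is my assumption about the prompt's output format):

```python
import csv
import io

def parse_translation_csv(raw):
    """Parse a CSV-style model reply into (phrase, native_script,
    romanization) rows, skipping malformed lines."""
    rows = []
    for rec in csv.reader(io.StringIO(raw.strip())):
        if len(rec) == 3:
            rows.append(tuple(cell.strip() for cell in rec))
    return rows
```

Skipping malformed lines matters here, since the model occasionally wraps the data in prose despite the prompt.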

Then you can run a separate program for validation. See if it passes my game of telephone test.
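The telephone test above can be automated with a round-trip comparison. A sketch, where `translate(text, lang)` stands in for whatever LLM call you use and the 0.8 similarity threshold is an arbitrary assumption:

```python
from difflib import SequenceMatcher

def round_trip_check(phrase, translate, target_lang, threshold=0.8):
    """Translate to the target language and back, then compare the
    round trip against the original as a crude accuracy signal."""
    there = translate(phrase, target_lang)
    back = translate(there, "English")
    score = SequenceMatcher(None, phrase.lower(), back.lower()).ratio()
    return score >= threshold, there, back
```

Passing the check doesn't prove the translation is right (the model could make the same mistake in both directions), but failing it is a cheap red flag.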

This way I and others could learn common phrases in languages which are hard to otherwise access. It even works for dead languages/dialects/character-voices, etc. For me, it's pretty amazing actually. Please note that I never said perfect. But it's pretty damn close actually.

All in all this workflow allows what would have been unimaginable even two years ago.

In doing tests today I found it interesting/useful to note how GPT-4 is "thinking" about translating the word "settings". Here are its thoughts. This is not the exact system prompt I'm using for my app.

  system_prompt = "You are an expert in translating lesser-known languages. When translating you will include both the native writing system, and the romanization into the latin alphabet. When you romanize text you always include any accents or pronunciation marks."

  user_prompt = "Translate the following into Cree. [Hello, goodbye, settings]"

  # Output
  In Cree language, your words will be translated into:

  1. Hello - ᑌᔭᔭᑎ, romanized as "Tānsi"
  2. Goodbye - ᐊᔭᙱᐂᒥᑎ, romanized as "Ayāwāw"

  "Settings" is a bit more complex, because it implies technological context that doesn't necessarily have a direct equivalent in Cree. However, a possible option is:

  3. Settings - ᓂᐹᕗᓂᑕᐚᓇᐠ, romanized as "Nipāvunitawānāk", which might refer to "adjustments".

  • I really hope you check with native Cree speakers before you publish a system like this. Smaller language communities are already being hurt by low-quality generated text. See also https://wiki.apertium.org/wiki/Evaluation, but it would have to be a corpus you didn't find online, since otherwise GPT-4 has it in its training data and the numbers will not be valid for sentences that are not in that corpus.

    • At this point any output is considered experimental and would definitely need human intervention to achieve any kind of quality.

  • I think there's some hallucination creeping in here!

    (1) This HN discussion is the only Google search result for each of these syllabics strings.

    (2) I tried using https://syllabics.atlas-ling.ca/ to transliterate these to Roman letters, and none of these was transliterated in the same way as the GPT-4 output (although the third one is somewhat close).

    (3) I searched and found that "hello" in Cree is likely written ᑖᓂᓯ (not ᑌᔭᔭᑎ), while correctly romanized as "tān[i]si".

    Your approach is clever, but I think the language model is still ultimately overconfident (and confused) here somehow.

    • Yeah, for Cree it is definitely more suspect than trustworthy. Another thing I noticed was that on another attempt I actually received different translations, so... it's hard to say how this can be refined to be usable, or whether it can be at all.

      And wow, yes we are all alone on google results for those strings.

      EDIT 1: Another thought occurs to me: if it's getting the transliteration right but not the syllabics, maybe I should separate the tasks and go English -> transliteration -> syllabic. I will have to see if that approach works better.

      Another idea might be to use that syllabics site to bring it from transliteration -> syllabic. I noticed that they were correct if translated there.

      EDIT 2: By updating the system prompt I was able to get it to translate properly. I had to remind it to be correct!

        You are an expert in translating Cree. When translating you will include both the native writing system, and the romanization into the latin alphabet. When you romanize text you always include any accents or pronunciation marks. You use syllabics properly and in the modern usage
      
        Hello - ᑕᓂᓯ (Tânisi)
        Goodbye - ᐅᑲᕆ (Okaawii)
        Settings - ᐅᑌᕁ ᐟ (Otēw with Roman orthography)

      2 replies →

Sure, give an AI that is barely constrained by alignment prompts, and that flips when given the wrong prompt, arcane rites to summon demons. That will end well...

It makes a few mistakes still, and that makes it "just a tool"? Ok, how about in a few weeks when it stops making mistakes?

  • Compilers, calculators, hammers, spreadsheet software, watches don’t make mistakes either, and they are “just tools”.

Institutional review board time. We've already been warned that a computer merely enumerating the names of God can end existence. Then surely a computer can also summon ancient demons.

  • The author has no sense of self preservation!

    > Demonology is a topic I’ve been interested in ever since I read Stuart Clark’s brilliant Thinking with Demons. However, it’s also one I find intimidating to do real research in, since so many of the sources involved are written in a highly erudite form of Latin.

    Should be:

    > since I don't want to end up with a creature from the netherworld in my study.

From what I've gathered from Catholic exorcists, demons adopt different personas, and shouldn't be trusted about anything they say. The only questions the exorcist asks are those pertaining to the case, all in the interest of breaking the claim of demons and expelling them, in the name of Christ, the stronger man from the parable. As the Lord says, Satan is a liar and a murderer from the start, and when he lies, he speaks his native tongue. What I'm saying is, keeping a database of demons makes little sense.

Slightly off topic, but this is hilarious. We are already crafting "chants" and "spells" for LLMs, i.e. prompt engineering, and now we are teaching them demonology too? Some priest from the Middle Ages would have a heart attack.

Now I know what the AI apocalypse will look like: GPT-42 will summon hordes of demons from the pit of Hell to bring about the end of days. Who needs all those pesky nuclear codes when you can call upon Satan?

  • Technically, the origins of the Gnostic demiurge as a concept come remarkably close to the modern trend of AI.

    A lot of the imagery was inspired by the Orphic Phanes, a dual gendered being of creation and light that gestated in the cosmic egg of the universe.

    There were groups that combined the idea of an eventual creator with naturalism such that you had the idea of an original dual gendered primordial 'man' which then brought forth a 'son of man' that ended up creating us in the images of that original.

    They allegedly patterned this on the Phrygian mysteries of a spontaneous original being that was likened to a tumor springing into existence.

    They even tied this all into concepts around the dependence of the soul on the body, suggesting an original physical world and a non-physical replica created by the demiurge where death was functionally escapable.

    The later influence of Neoplatonism flips it all back to a perfect form original and corrupted physical incarnation - the version of Gnosticism most well known today.

    But for a tiny slice of time in antiquity you had people talking about the ideas of a naturally occurring man in a physical world giving rise to a creator of light which then recreated the physical original man in a non-physical twin of the cosmos in order to effectively allow them to escape death.

    Which is a pretty wild set of beliefs for the time, but less so in an age where we are moving towards humanity bringing forth AI literally in light (optoelectronics is a likely next hardware shift), creating digital twins of the world around us, and increasingly making that world inhospitable to our continued existence.

    Some of the nuances of what they were thinking about are pretty wild to consider in modern contexts.

    • Last year I partnered with a friend's company to try to build a general AI management platform, and I've always felt that this stuff is very occult in nature, so with all the buzz around new LLMs I leaned in really hard on the occult imagery for the design part of it. The project didn't continue, but I'm going to keep that in my pocket and try to reform it.

      I put the design docs up just for posterity.

      [1] project tombstone: https://orphanim.ai

      [2] Design and Style Guide https://docs.google.com/document/d/1JsEeok9NmRGDAF3o0AX7h9oz...