Comment by coyotespike

2 years ago

This is really fun. Google Translate is incompetent at Latin, and I'm informed that so far ChatGPT still makes errors of grammar and word choice when generating Latin.

This experiment helps show we can use GPT-4/Claude to parse and summarize Latin, but doesn't yet show that we can rely on them to the level of a human expert.

I'm confident we'll get there pretty soon - and then will be able to rely on LLMs to generate Comprehensible Input and thereby greatly accelerate language learning.

You can include in the prompt a requirement to highlight sections the LLM was not sure about or that need to be verified (a rough sketch of such a prompt appears at the end of this subthread).

  • Wouldn't that work just as well as including in the prompt a requirement for it to not make any mistakes?

    • With some LLMs, emphasizing the possibility and appropriateness of saying "I don't know" has reduced the frequency of hallucinations.

    • If you assume the LLM can gauge its “confidence” in the last n tokens it generated, which seems within the realm of reason (from a layman’s perspective), then I would think this idea would work better the significant majority of the time. It’s providing an additional dimension of context related to the output (which we’re assuming is sound, or at least not entirely nonsensical), which alone seems like enough of a justification to do this.

      It’s unclear (to me, at least) exactly what effect adding a “no mistake” requirement to the prompt would have on the LLM’s output. I could see it skipping ranges of tokens that it’s unsure about, which seems less desirable than having it provide a best guess and make clear that it’s only a guess; but I could also certainly see it operating as it otherwise would have without the “no mistake” instruction, giving the same dubious output to a user who may now have an unwarranted increase in confidence in the LLM’s output.

      I’ve spent a decent amount of free time doing what feels like coercing, tricking, or otherwise manipulating GPT-4 and Llama 2 into doing my bidding - with my bidding being mostly toy ideas for little tools that make random small tasks easier, plus one or two more interesting ideas that are fun to mess around with but would probably require some medical-grade antianxiety meds to even consider using in a real production setting (i.e. a universal ORM). Even though I’m not developing (or I guess we now call it prompt engineering) in a rigorous or serious way, I’ve found that making the LLM _actively_ reconsider and validate its output works very well, with the effectiveness being roughly a function of how actively you trick it into doing so.

      Giving a list of “be sure to consider these things” at the end of your prompt often works, but also very often doesn’t. Adding another step to the process you’re asking it to perform, composed of subtasks that map to that list of gotchas but reframed as actions it is required to carry out, is often the remedy when the simple suggestion list isn’t enough; it’s basically a more active variant of the same idea. Dialing it up a bit more, requiring it to report after each subtask, confirming it was indeed performed and summarizing what it found, makes the retrospective assessment even more actively involved, and has been a pretty damn reliable trick for ironing out kinks and known failure modes in my prompts.

      All that being said, I think the simple fact that you’re now actively requiring them to reflect on their confidence in their output, and therefore the correctness of their output, may lead to this idea improving the quality of output/results as an unintended side effect that would alone make it worth doing.
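
As a rough sketch of the kind of prompt described above, the following combines the “flag what you’re unsure about” requirement with explicit permission to say “I don’t know” and per-step confirmations. It is a sketch under assumptions, not something tested in this thread: the OpenAI Python client, the model name, the step list, and the [?] marker convention are all illustrative.

```python
# Rough sketch only: combines "flag uncertain sections", permission to say
# "I don't know", and per-step confirmations. The model name and step
# wording are assumptions, not anything anyone in this thread actually ran.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

latin_passage = "Quemadmodum divini philosophi docent, mysteria naturae introrsus latitant..."

prompt = f"""Translate the Latin passage below into English.

Work through these steps, and after completing each one write a one-line
confirmation of what you did and what you found:
1. List any words or constructions you are unsure about.
2. Translate the passage, marking every uncertain phrase with [?].
3. Re-read your translation against the Latin and list anything that still
   needs to be verified by a human expert.

If you cannot translate a phrase, say "I don't know" instead of guessing.

Latin passage:
{latin_passage}
"""

response = client.chat.completions.create(
    model="gpt-4",  # illustrative choice
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

The point is only the structure: explicit subtasks, a required confirmation after each, and explicit permission to defer, rather than a blanket “make no mistakes” instruction.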

I think that mostly depends on getting more high-quality Latin into the training set, but I'm guessing the amount of new Latin being generated or discovered is relatively small. Then again, new techniques for training models could prove me wrong.

  • This is the result when I asked GPT-4 to write "a new, original passage in latin which is inspired by Del Rio's style. Make the Latin elegant and lucid, with witty turns of phrase," then asked Claude to translate it back to English. I am no Latin expert, but I know enough to say that this is clearly lucid, makes sense -- though overwrought -- and has no glaring errors. I will ask my classicist friends to weigh in on the matter of style though.

    GPT-4 improvising: "Quemadmodum divini philosophi docent, mysteria naturae introrsus latitant, neque enim sine causa Democritus 'naturam rerum' dixit esse clausam in occulto. Verum, si Angelorum bonorum opera ut in Psalmis celebratur, 'Benedicite, omnes Angeli Domini, Domino,' quo pacto maleficae entitates—daemones scilicet—invisibiles nec tamen inactiones sunt? Mihi consentaneum videtur quod Tertullianus et Ambrosius prodiderunt, angeli esse nuntios divinos, sed daemonas esse nuntios corruptionis. Ne confundamus! Nam verbum 'daemon' Graecis non semper malum significat, sed Romanis—o mores!—semper sinistram portendit.

    In huius argumenti speculo, miror qua de causa daemones semper in tenebris vel in locis desertis, vel in nocturna silentia se abscondant. An non, ut Seneca loquitur, 'magna vis veritatis est, quae se per se defendet'? Atqui daemones a veritate abhorrent, quasi nox a lucem fugiat.

    Ergo, si volumus a daemonum insidiis nosmetipsos protegere, debemus, ut Virgilius canit, 'forsan et haec olim meminisse juvabit,' non oblivisci quod lux aeterna sit telum acerrimum contra tenebras. Haec ita dicam, et iam concludo: sicut angeli ad caelum pertinent, sic daemones ad abyssum—et utrique suum locum noverunt. Caveamus igitur ne, dum caelum quaerimus, abyssum inveniamus."

    Claude translation: "As the divine philosophers teach, the mysteries of nature lie hidden within, for not without reason did Democritus call nature 'enclosed in the dark'. Yet, if by the work of good Angels as celebrated in the Psalms, 'Bless the Lord, all you Angels of the Lord,' in what manner are evil entities—namely demons—invisible yet not inactive? It seems fitting to me what Tertullian and Ambrose handed down, that angels are divine messengers, but demons are messengers of corruption. Let us not confuse the two! For the word 'daemon' does not always signify something evil for the Greeks, but for the Romans—oh the times!—it always portends something sinister. In the mirror of this argument, I wonder for what reason demons always hide themselves in shadows or deserted places or the silences of the night. Does not, as Seneca says, 'truth have great power, which will defend itself'? Yet demons shrink from the truth, as the night flees from the light. Therefore, if we wish to protect ourselves from the snares of demons, we must, as Virgil sings, 'perhaps one day it will please us to remember these things,' not forget that eternal light is the sharpest weapon against the darkness. I will say this much, and now conclude: just as angels belong to heaven, so demons belong to the abyss—and both know their proper places. Let us beware, therefore, lest while seeking heaven we find the abyss."

    • This is very impressive!

      Problems I see in the Latin:

      (1) The sentence beginning "verum, si Angelorum" loses the thread in the middle (as you can also see in the English translation). The if-clause (protasis) has no main verb, while there is no then-clause (apodosis) at all.

      (2) I think "inactiones" is hallucinated. This word isn't found in Wiktionary, Lewis & Short's Latin Dictionary, or Souter's Glossary of Later Latin. If it existed, it also wouldn't be an adjective meaning "inactive".

      (3) In the indirect discourse governed by "prodiderunt", the subject should also be accusative ("angelos esse nuntios divinos").

      (4) Since "confundo" is generally transitive, it's a bit questionable to say "ne confundamus" without an object like "eos", although I'm not positive it's not idiomatic.

      (5) I would think "sinistrum" rather than "sinistram" if the intention is to refer to "something sinister" rather than "the left-hand side".

      (6) "nox a lucem fugiat" is wrong (the preposition "a" always governs the ablative, not the accusative), and should probably be either "nox lucem fugiat" (less likely to me "nox a luce fugiat"). For rhetorical parallelism with "a veritate" it would definitely be "a luce".

      (7) The orthography is also inconsistent in writing "iam" but then quoting the Aeneid as saying "juvabit" (if you write consonantal i as j, you would expect to do it consistently everywhere).

      Again, it's very impressive. That's a lot fewer mistakes than I would make writing a Latin text of this length!

  • The embedding method that nearly all LLMs use puts them at a severe disadvantage because they can't 'see' the spelling of common words. That makes it hard to infer things like 'past-tense words end with an e'.

    With small modifications, the exact characters could be exposed to the model in addition to the current tokens, but it would require a full retraining, which would cost $$$$$$$$. (A small tokenizer sketch below illustrates the point about hidden spelling.)
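
As a minimal illustration of the point about spelling being hidden from the model, here is a small sketch using the tiktoken library; the encoding name is an assumption (the one associated with GPT-4-era models), and the exact splits will vary by tokenizer.

```python
# Minimal sketch: show how a BPE tokenizer splits words into opaque token ids,
# so the model never directly sees the individual letters. The encoding name
# is an assumption; the specific splits will vary by tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for word in ["amavit", "amaverunt", "laudaverunt"]:
    ids = enc.encode(word)
    pieces = [enc.decode_single_token_bytes(i).decode("utf-8") for i in ids]
    print(f"{word!r} -> token ids {ids} -> pieces {pieces}")

# The model receives the ids, not the characters, so a shared ending like
# "-erunt" is only visible if the tokenizer happens to split it out as its
# own piece.
```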

You get really good results if you prompt it with: “You’re an expert in Latin translation”.
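
A minimal sketch of that persona prompt, again assuming the OpenAI Python client and an illustrative model name; the sample sentence is taken from the GPT-4 passage above.

```python
# Minimal sketch of the "expert persona" system prompt suggested above;
# the model name is an illustrative assumption.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",  # illustrative choice
    messages=[
        {"role": "system", "content": "You're an expert in Latin translation."},
        {"role": "user", "content": "Translate into idiomatic English: 'Quemadmodum divini philosophi docent, mysteria naturae introrsus latitant.'"},
    ],
)
print(response.choices[0].message.content)
```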