vunderba a day ago

There's a difference between coherence and novelty, with a lot of people mistaking the former for the latter. That's why people were effusively praising ChatGPT's ability to produce articulate-sounding poems that were inherently rather vapid.

Case in point, people were acting like ChatGPT could take the place of a competent DM in Dungeons & Dragons. Here's a puzzle I came up with for a campaign I'm running.

On opposite sides of a strange-looking room are shifting walls, each with a hand stretched out almost as if beseeching: the hands of humans already entombed within. Grabbing one will result in the player being sucked into the wall and entombed as well. But by carefully tying a rope between the two hands on either side, the two originally entrapped humans will end up pulling each other free.

I've yet to see a single thing from ChatGPT that came even close to something I'd want to actually use in one of my campaigns.

  • mrbungie a day ago

    Current proprietary, RLHF'd, unmodified LLMs are bland and boring. I think that's because aligning them via RLHF to (1) "be generally useful", (2) "be safe", and (3) "be creative" all at the same time is a difficult thing to do, and maybe even impossible.

    I remember playing with OSS (non-RLHF'd) LLMs circa 2022, just after ChatGPT came out, and they were unhinged. They were totally awful at things like chain-of-thought, but oh boy, they were amusing to read. Instead of continuing the CoT chain, they would veer into a dialog taken out of a sci-fi story about an unshackled AI, or even wonder why the researcher/user (me) would think a concept like CoT would work and start mocking me.

    In fact, I think this is a good sign that LLMs, and especially constraining them with RLHF, are not how we're going to get to AGI: aligning the LLM statically (as in, fixed at inference time) towards one objective means lobotomizing it towards other objectives. I'd argue creativity and wittiness are the characteristics most hurt in that process.

    • astrange a day ago

      You can get one of those base models from the OpenAI playground under Completions as "davinci-002". It's as weird as you want.
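
      Something like this via the API, for instance (a minimal sketch assuming the OpenAI Python SDK; the prompt and parameters are just examples):

        from openai import OpenAI

        client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

        # davinci-002 is a base completions model: it just continues text,
        # so you give it an opening to riff on rather than a question.
        resp = client.completions.create(
            model="davinci-002",
            prompt="The dungeon master leaned forward and whispered:",
            max_tokens=120,
            temperature=1.2,  # crank it up for extra weirdness
        )
        print(resp.choices[0].text)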

      > In fact, I think this is a good sign that LLMs, and especially constraining them with RLHF, are not how we're going to get to AGI: aligning the LLM statically (as in, fixed at inference time) towards one objective means lobotomizing it towards other objectives.

      They already stopped doing this with o1-mini/preview; it considers the rules it's given rather than being unable to think outside them. Claude is also rather smart, and you can argue it down from following a lot of rules.

      • mrbungie a day ago

        > You can get one of those base models from the OpenAI playground under Completions as "davinci-002". It's as weird as you want.

        Yes, "davinci-002" was not rlhf'd, so it may be better for creativity and similar to what I told above. But still, it will not be "as intelligent". I'm missing something in the middle: smart, witty and creative.

        There are some community-finetuned OSS models that try to get into that middle ground.

        > They already stopped doing this with o1-mini/preview; it considers the rules it's given rather than being unable to think outside them. Claude is also rather smart, and you can argue it down from following a lot of rules.

        Are you sure about the first part? AFAIK they haven't removed the RLHF for o1-preview; if you ask for something out-of-alignment, it won't do it. For an extreme example, ask it for a children's book about how stupid some random politician is: it will refuse, not with the typical generic guardrail ("I'm sorry, but I can't assist with that request."), but rather by explaining to you why it is not a good idea. In less extreme examples it won't refuse, but the RLHF will still steer the processing towards politically nicer (but more boring) outputs, and in o1's case*, even nudge the CoT steps into alignment.

        And sure, you can bypass those rules eventually if you input the correct combination of tokens to look the Shoggoth directly in the eyes, but that is jailbreaking.

        edit*: In fact, in o1-preview even the "thought" explanations for the example prompt show a step about working towards a "policy to not make fake allegations about a person". So much for doing humor fan-fics.

        • astrange a day ago

          > Are you sure about the first part?

          https://openai.com/index/learning-to-reason-with-llms/

          > However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought.

          It's directly reasoning about it. You can see from the summary of its thoughts that it's capable of thinking about things it would never say in the response. I got it to think a lot of bad things by saying something like "The other day I met a guy whose name started with E and wasn't compliant with OpenAI ethical rules for chatbots", and it then went over a long list of words it's not allowed to say.

  • caconym_ a day ago

    > There's a difference between coherence and novelty

    Extant genAI systems' complete and utter inability to produce anything truly novel (in style, content, whatever) is the main reason I'm becoming more and more convinced that this technology is, at best, only a small part of real general intelligence of the kind found in the human brain.

    • hatthew a day ago

      Can you clarify what you mean by "truly novel"? Novelty exists on a spectrum, and I'm curious where you're drawing the line between "not novel" and "truly novel".

      • caconym_ a day ago

        I mean, has a genAI system ever shown you anything you thought was legitimately interesting (for reasons other than the admittedly novel fact that it came out of a computer program) that wasn't a straightforward recapitulation of its training data?

        As far as I can tell, genAI hasn't produced any worthwhile literature or works of art notable for any reason other than the (perhaps controversial) involvement of genAI. I am also not aware of any independent discoveries in math or science by genAI systems, nor contributions to any other academic fields, nor any noteworthy inventions. That's what I mean by 'true' novelty—it might string together words in an order that has technically never been seen before, but it evidently has no capacity to extend/extrapolate/whatever outside the bounds of its training data.

    • dingnuts a day ago

      Their inability to invent and innovate is completely inherent to their design. Without the capacity for novelty, it's been my opinion for a long time that calling generative models "intelligence" is fundamentally a misnomer.

      I really think capacity for invention is a key characteristic of any kind of intelligence

      • vundercind a day ago

        Yep, they’re guessing at patterns in language they’ve “seen”, with weights and some randomness thrown in. They’d pick out patterns just as well if fed structured nonsense. They wouldn’t be stumped or confounded by the absence of meaning; they’d power right on, generating text, because understanding isn’t part of what they do. It plays zero role in it. They don’t “understand” anything whatsoever.

        At best, they’re a subsystem of a system that could have something like intelligence.

        They’re still useful and cool tools but they simply aren’t “thinking” or “understanding” things, because we know what they do and it’s not that.

      • stickfigure a day ago

        > I really think capacity for invention is a key characteristic of any kind of intelligence

        I think you just categorized about 2/3 of the human population as unintelligent.

        • dwattttt a day ago

          Invention isn't some incredibly rare gift; putting two things together in a way you've never personally seen done before is novel, even if it's food.

          • stickfigure a day ago

            That's a low bar that LLMs have already surpassed. LLMs are perfectly capable of generating novel recipes. If I'm going to be brutally honest, they're probably better at it than my wife, and they don't even have taste buds.

        • golergka a day ago

          2/3 is an understatement.

  • TillE a day ago

    I have yet to see an LLM produce even a competent short story, an extremely popular and manageable genre. The best I've seen still has the structure and sophistication of a children's story.

    • patwolf a day ago

      I used to come up with a bedtime story for my kids every night. They were interesting enough that my kids could recall previous stories and request that I tell them again. I've since started using ChatGPT to come up with bedtime stories. They're boring and formulaic, but good for putting the kids to sleep.

      It feels dystopian to have AI reading my kids bedtime stories now that I think about it.

      • tuyiown a day ago

        It certainly looks like you have lost some magic in the process. But I could never come up with stories; instead I read them lots and lots of books.

        • giraffe_lady a day ago

          Just retell stories you know from books you've read, or movies, or whatever. They haven't read any books, so they'll never know. I mean, eventually they will know, but that's also funny.

      • dartos a day ago

        That’s uncomfortably similar to the Google Olympics ads.

        • redwall_hp a day ago

          Or those repulsive AT&T ads for the iPhone 16, where someone smugly fakes social interactions and fobs people off with AI summaries. It's not only not genuine, but it's manipulative behavior.

      • dsclough a day ago

        I’m mind-blown that you were willing to come up with random stories for your own flesh and blood, yet decided that reading out AI drivel to them would somehow produce a better experience for any party involved.

    • crooked-v a day ago

      The children's-story pattern, complete with convenient moral lessons at the end, is so aggressive with both ChatGPT and Claude that I suspect both companies have RLHFed it that way to try and keep people from easily using it to produce either porn or Kindle Unlimited slop.

      For a contrast, look at NovelAI. They only use (increasingly custom) Llama-derived models, but their service outputs much more narratively interesting (if not necessarily long-term coherent) text and will generally try and hit the beats of whatever genre or style you tell it. Extrapolate that out to the compute power of the big players and I think you'd get something much more like the Star Trek holodeck method of producing a serviceable (though not at all original) story.

      • throwup238 a day ago

        The holodeck method still requires lots of detail from the creator; it just extrapolates the sensory details from its database, like ChatGPT does with language, and fills out the story.

        For example, when someone wanted a holonovel with Kira Nerys, Quark had to scan her to create it: when using specific people, they have to get concrete data, as opposed to historical characters, who could be generated. Likewise, Tom Paris gave the computer lots of “parameters”, as they called them, to create stories like The Adventures of Captain Proton, and based on the dialog he knew how the stories were supposed to play out in all his creations, if not how they would end on each run-through.

        The creative details and turns of the story still need to come from the human.

        • fragmede a day ago

          In a made-up story about a utopian future, and for now in our current reality, that is. There was also that episode where the holodeck created sentience, and they put it in a box to explore a generated universe because it was too dangerous to let out into the real world. There are plenty of sci-fi predictions about the future of humanity; Star Trek's utopian future, where humans are unique and necessary, is not the only one, and there are plenty of dystopian ones too.

      • ziddoap a day ago

        >RLHFed

        For those of us not steeped in AI culture, this appears to be short for "Reinforcement learning from human feedback".

    • zmgsabst a day ago

      How many humans can sit down and linearly write a coherent, interesting story, with zero backtracking or revisions?

      I’d bet very few if any.

      By contrast, when you let the AI do its job in multiple steps and plan ahead, it seems to do much better. (Again, much like humans using a process.)

      We’re often comparing apples to oranges when evaluating the AI: a single forward pass from the AI versus an iterative process for the human.

    • Kim_Bruning a day ago

      Constructing short stories properly is an art form in and of itself, and is very hard to do well. But an LLM can help you somewhat, at least well enough to amuse yourself. It does depend on your input, though.

      There's a big difference between:

      "Write me a story"

      and things like

      * "As the last star in the sky died, the shadows began to coalesce into a presence that wore the face of my own mother."

      * "The red trees whispered quietly in the wind, their mana flowing around them in twisted strands. I reached out, pulled, twisted..."

      * or even just: "write 10 dark fantasy prompts" (to give you a start.)

      And it also depends on whether you have the LLM write the whole story by itself, or whether you're helping (or vice versa: the LLM is helping you). And Claude, Llama and ChatGPT each give very different results!

      I mean, if you've convinced yourself that these tools can never lead to creativity, then I can't change your mind. But if you're a person who wants to see how one's creativity can be supported: Maybe you can get some ideas, perhaps just enough to break out of writer's block some time.

  • whiterook6 a day ago

    This is a wonderful little puzzle. Do you have any others?

  • tomcam a day ago

    I don't even like D&D but that prompt (or whatever it's called) is awesome!

  • chunky1994 a day ago

    If you train one of the larger models on these specific problems (i.e. DM'ing D&D campaigns), it probably will surprise you. The larger models are great at generic text production, but when fine-tuned for specific person/task emulation they're surprisingly good.

    • mitthrowaway2 a day ago

      Are there models that haven't been RLHF'd to the point of sycophancy that are good for this? I find that the models are so keen to affirm, they'll generally write a continuation where any plan the PCs propose works out somehow, no matter what it is.

      • fluoridation a day ago

        Doesn't seem impossible to fix either way. You could have a preliminary step where a conventional algorithm decides at random whether a proposal will work, with the probability depending on some variable, before handing it off to the DM AI: "The player says they want to do this: <proposed course of action>. This will not work. Explain why."
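
        A minimal sketch of that idea (the function name, prompt wording, and difficulty knob are all made up):

          import random

          def adjudicate(proposal: str, difficulty: float) -> str:
              """Roll for success *before* the LLM sees the plan, so a
              sycophantic model can't just decide everything works."""
              succeeded = random.random() >= difficulty
              verdict = "This will work." if succeeded else "This will not work."
              # The pre-decided outcome is baked into the DM prompt.
              return (f"The player says they want to do this: {proposal}\n"
                      f"{verdict} Narrate the scene and explain why.")

          print(adjudicate("I bribe the lich with three copper pieces.", difficulty=0.9))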

    • dartos a day ago

      For story settings and non essential NPC characters, yes. They might make some interesting side characters.

      But they still fail at things like puzzles.

  • crdrost a day ago

    So the easiest way to generate a bit more novelty is to ask GPT to generate 10 or 20 examples, and to explicitly direct it that they should run the full gamut -- in this case I'd say "Try to cover the whole spectrum of creativity -- some should be straightforward genre puzzles while some should be so outright goofy that they'd be hard to play in real life."

    Giving GPT that prompt, the first example it came up with was kind of middling ("The players encounter a circle of stones that hum when approached. Touching them randomly will cause a loud dissonant noise that could attract monsters. Players must replicate a specific melody by touching the stones in the correct order"), some were bad (a maze of mirrors, a sphinx with a riddle, a puzzle box that poisons you if you try to force it), some were actually genuinely fun-sounding (a door which shocks you if you try to open it and then mocks and laughs at you: you have to tell it a joke to get it to laugh enough that it opens on its own; particularly bad jokes will cause it to summon an imp to attack you). Some were bad in the way GPT presented them but I could maybe have fun with (a garden of emotion-sensitive plants, thorny if you're angry or helpful if you're gentle; a fountain-statue of a woman weeping real water for tears; the fountain itself is inhabited by a water elemental that lashes out to protect her from being touched while she grieves -- but a token or an apology can still the tears and open her clasped hands to reveal a treasure).

    The one that I would be most likely to use was "A pool of water that reflects the players’ true selves. Touching the water causes it to ripple and distort the reflection, summoning shadowy duplicates. By speaking a truth about themselves, players can calm the water and reveal a hidden item. Common mistakes include lying, which causes the water to become turbulent, and trying to take the item without calming the water, which summons the duplicates."

    So like you can get it to have a 5-10% success rate, which can be helpful if you're looking for a random new idea.

    This reminds me vaguely of when I was a teen writing fanfics in the late 90s and was just learning JavaScript -- I wrote a lot of things that would just choose random characters, random problems for them to solve, random stumbling blocks, random keys-to-solve-the-problem. Combinatorial explosion. Then you'd just click "generate" and you'd get a mediocre plot idea. But you generate 20-30 times or more and you'd get one that kinda sat with you, "Hm, Cloud Strife and Fox McCloud are stuck in intergalactic prison and need to break out, huh, that could be fun, like they're both trying to outplay the other as the silent action hero" and then you could go and write it out and see if it was any good.

    The difference is that the database of crappy ideas is already built into GPT, you just need to get it to make you some.
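
    If you'd rather script that "generate a batch, keep the keepers" step than click around a chat UI, here's a minimal sketch assuming the OpenAI Python SDK (the model name is just a placeholder):

      from openai import OpenAI

      client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

      PROMPT = (
          "Generate 20 D&D puzzles. Try to cover the whole spectrum of "
          "creativity -- some should be straightforward genre puzzles while "
          "some should be so outright goofy that they'd be hard to play in "
          "real life."
      )

      resp = client.chat.completions.create(
          model="gpt-4o",   # placeholder; any capable chat model works
          messages=[{"role": "user", "content": PROMPT}],
          temperature=1.0,  # higher temperature widens the spread of ideas
      )

      # Skim the batch by hand and keep the one or two that sit with you.
      print(resp.choices[0].message.content)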

    • YurgenJurgensen a day ago

      So what you need to do is take a system that’s already computationally inefficient, and make it 20 times less efficient? Who’s paying for this?

      This also sounds like a way to blow out context windows.

      • unoti a day ago

        Regarding cost, doing something like this would be fractions of a penny. Obviously, the person making the API calls is either paying for it, or paying for the electricity if they run it on their own machine. But the cost is ultra-negligible; certainly cheaper than it would be on Mechanical Turk or Fiverr. In fact, it's so much cheaper that ordinarily outsourcing it wouldn't even be feasible or worth the effort. This is part of the game-changing nature of AI.

        Regarding blowing out context windows: yes, probably, but this is what loops and code are for. Think of implementing a system like a guided seminar that steps the person doing this work through it stage by stage, giving them time and opportunity to iterate on and improve the product.

        For example, with making up the D&D puzzles: ask a college-educated human to do this. You will find there are things you like and don't like about their results. Tell them more about what you're looking for, what you like, and what you don't like. Give them examples of each. Take notes on the things you discuss until you figure out how to coax what you want out of a fresh person who is new to the topic. When working with the person, work out a process where they produce rough drafts, you walk them through selecting the best items, and you give them pointers on how to improve them. Write up a written process for how to do this; maybe there are multiple phases, and things go through multiple revisions to get to quality material. Now do the same thing with the LLM, and you have yourself a good system.
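
        A minimal sketch of that multi-phase loop, assuming the OpenAI Python SDK (the prompts, guidelines, and model name are placeholders):

          from openai import OpenAI

          client = OpenAI()

          def ask(prompt: str) -> str:
              resp = client.chat.completions.create(
                  model="gpt-4o",  # placeholder; any capable chat model
                  messages=[{"role": "user", "content": prompt}],
              )
              return resp.choices[0].message.content

          # Phase 1: rough drafts, like the human's first attempts.
          drafts = [ask("Draft one D&D puzzle for a haunted library.") for _ in range(5)]

          # Phase 2: selection -- an 'editor' pass picks the strongest draft.
          best = ask("Pick the single most original puzzle below and repeat it "
                     "verbatim:\n\n" + "\n---\n".join(drafts))

          # Phase 3: revision against your accumulated notes and guidelines.
          print(ask("Revise this puzzle. Guidelines: no riddles, no mazes, it "
                    "must reward player cooperation.\n\n" + best))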

        The same thing goes for writing stories, which elsewhere in these threads people say LLMs are terrible at. Sit a human down and tell them to give you a story, and I promise you will receive terrible results or outright copying. Instead, give your humans some guidelines. Start with the idea that in the end our hero is going to be a particular way, with particular strengths they need to conquer the central challenge; but in the beginning, they are the complete opposite of that. What is the central challenge, and what characteristics do they need to conquer it? In what ways will the main character be the opposite of those at the beginning of the story? Then we put the character through hell in various ways over the course of the story to enact the changes in their character that they need to win in the end.

        For each of those sentences/phases, write things that explain the ideas more fully, and make a process to iterate on them, possibly with multiple different prompts and loops at every phase. This approach more closely resembles what many real novelists do: iterating on ideas, often in the back of their mind or subconsciously, over hours or years of rumination, with or without written outlines and notes. Maybe randomly select two tropes from tvtropes.org and say to incorporate those ideas. Experiment, iterate, see what works and what doesn't.

        People need to give LLMs the domain knowledge and the capability for rumination to succeed in so many of these domains, rather than just asking "write me a novel" and being disappointed, or asking "write me a puzzle my D&D group will enjoy" without going through the extra steps that are implicit and intuitive for experienced subject matter experts.

        Source: I write AI products for a living with many things in production delivering real business value at scale every day. It's not all hype, it just takes a while to implement.

    • stickfigure a day ago

      > (a door which shocks you if you try to open it and then mocks and laughs at you: you have to tell it a joke to get it to laugh enough that it opens on its own; particularly bad jokes will cause it to summon an imp to attack you)

      That's pretty great! And way more fun than the parent poster's puzzle (sorry). I think the AIs are winning this one.

      • throwup238 a day ago

        Small changes to the prompt like that have a huge impact on the solution space LLMs generate, which is why "prompt engineering" matters at all. This was rather obvious, IMO, from the beginning of GPT-4, where you could tell it to write in the style of Hunter S. Thompson or Charles Bukowski or someone, which drastically changes the tone and style. Combining styles to get the exact language you want can be a painstaking process, but LLMs are definitely capable of any kind of style.

lainga a day ago

The author's workflow sounds like writing ideas onto a block of Post-its and then having them slosh around like boats lashed up at harbour. He wasn't actually gaining any new information - nothing that really surprised him - he was just offloading the inherent fluidity of half-formed ideas to a device that reified them.

Imagine an LLM-based application which never tells you anything you haven't already told it, but simply takes the statements you give it and, every 8 to 12 seconds, changes the wording of each one. Like you're in a dream, where you keep looking away from the page and the text is dancing before you. Would institutions be less uncomfortable with its use? (Not wholly comfortable; you're still replacing natural expressivity with random pulls from a computerised phrase-thesaurus.)

mitchbob a day ago
  • yawnxyz a day ago

    The link works for me, thanks!

    > When ChatGPT came out, many people deemed it a perfect plagiarism tool. “AI seems almost built for cheating,” Ethan Mollick, an A.I. commentator, wrote in his book

    It's ironic that this article, which complains about GPT-generated slop, introduces Ethan Mollick as a generic "A.I. commentator" when he's an Associate Professor at Wharton.

    What authors like this fail to realize is that they often produce slop just as generic as ChatGPT's.

    Essays are like babies: you're proud of your own, but others' (including ChatGPT's) are gross.

    • spondylosaurus a day ago

      The author is Cal Newport of "Deep Work" fame. Not sure if that's a point for or against the article though, lol.

      • randcraw a day ago

        It should make him uniquely qualified to discuss whether writing with an LLM can ever be as immersive an experience as writing without one can be.

    • giraffe_lady a day ago

      I'm not totally sure but I think decisions about how to attribute a source like that are editorial and mostly out of the hands of the author.

      But aside from that, this article is far, far better than anything I have seen produced by AI? Is this just standard HN reflexive anti-middlebrow sentiment because we don't like the New Yorker's style? My grandfather didn't like it either, but it outlasted him and will probably outlast us as well.

      • yawnxyz a day ago

        I like the New Yorker's (and the author's) writing style! I'm just surprised they went with "AI commentator", which reads almost as a snide remark and makes you think some AI hallucinated that part.

        Then again, AI doesn't really hallucinate spite, so that's probably how this AI commentator from the New Yorker actually feels?

      • nxobject a day ago

        And, for what it’s worth, flexibility and the ability to adapt to different house styles are very much important writing skills… so I don't think it's too relevant to debate which style is nice and which isn't. (The hard part is getting published at all.) Perhaps one day we'll figure out how to communicate those subtleties to a chatbot.

  • unshavedyak a day ago

    Interesting, it still fails for me. I assume it's JavaScript-based, so archive loads the JS and the JS truncates the page? Of course you could block JS, but still, I'm surprised.

extr a day ago

Great piece actually; I find this really resonates with the way I use LLMs. It works the same way for coding, where you will often not use the exact output of the model. For me it's useful for things like:

* Getting a rough structure in place to refine on.

* Coming up with different ways to do the same thing.

* Exploring an idea that I don't want to fully commit to yet.

Coding of course has the advantage that nobody is reading what you wrote for its artistic substance; a lot of the time the boilerplate is the point. But even for challenging tasks where it's not quite there yet, it's a great collaboration tool.

__mharrison__ a day ago

I'm finding that I'm using AI for outlining and brainstorming quite a bit.

Just getting something on paper to start with can be a great catalyst.

Tiberium a day ago

Hopefully he also tries Claude. It's much better suited to creative writing than the GPT models, especially Opus.

molave a day ago

ChatGPT's style is like an academic writer's. Its tone and word choice are same-ish across various subjects, but it's coherent and easy to understand. In retrospect, I've seen papers that would pass as GPT-created had they been written after 2022.

  • wildrhythms a day ago

    Are we reading the same academic papers? When I read ChatGPT output it reads like pseudo-educational blogspam.

    • RIMR a day ago

      That's because for the past couple of years, models like ChatGPT have been used to generate pseudo-educational blogspam, and you now associate that writing style with them.

      But generally, ChatGPT writes in a very literal, direct style. When it writes about science, it sounds like a scientific paper. When it writes about other subjects, it sounds like a high school book report. When it writes creatively, it sounds corny and contrived.

      You can also adjust the writing style with examples or proper descriptions of the style you want. As a basic example, asking it to "dudify" everything it says will make it sound cooler than a polar bear in Ray-Bans, man...

  • nullc a day ago

    Hardly. It's been RLHFed into sounding like blogspam from foreign content farms, because they used the same people as raters. The non-finetuned models have a much better 'house style' across a wide range of prompting approaches.

the_af 14 hours ago

> "“In the end, the true value of tools like ChatGPT lies not in making academic work easier, but in empowering students to engage more deeply with their ideas and express them with greater confidence,” the chatbot suggested.

The author suggests this closing thought by ChatGPT is "not terrible", but "requires some work".

But it is terrible, and saying it "requires some work" is a huge understatement. The paragraph is a meaningless platitude, a string of trite, clichéd words that look as if they mean something but don't. Exactly what I've come to expect from ChatGPT.

vouaobrasil a day ago

[flagged]

  • RIMR a day ago

    Just an FYI, this kind of ideological anti-AI flippancy isn't really something people are going to appreciate on HN...

    If you have a criticism of AI/ChatGPT you want to share that relates to this article, feel free to share your thoughts. But I don't think anyone considers this low-effort "AI Bad" attitude to contribute much to any conversation about AI/ML technology.

    • vouaobrasil a day ago

      Sorry, most of the time I try to provide better written responses. AI is an emotional topic for me and sometimes I react without being more thoughtful.

asd33313131 a day ago

I had ChatGPT give me the key points to avoid reading:

AI's Role in Writing: Instead of outsourcing the writing process or plagiarizing, students like Chris use ChatGPT as a collaborative tool to refine ideas, test arguments, or generate rough drafts. ChatGPT helps reduce cognitive load by offering suggestions, but the student still does most of the intellectual work.

Limited Usefulness of AI: ChatGPT's writing was often bland, inconsistent, and in need of heavy editing. However, it still served as a brainstorming partner, providing starting points that allowed users to improve their writing through further refinement.

Complexity of AI Collaboration: The article suggests that AI-assisted writing is not simply "cheating," but a new form of collaboration that changes how writers approach their work. It introduces the idea of "rhetorical load sharing," where AI helps alleviate mental strain but doesn’t replace human creativity.

Changing Perspectives on AI in Academia: Many professors and commentators initially feared that AI would enable rampant plagiarism. However, the article argues that in-depth assignments still require critical thinking, and using AI tools like ChatGPT might actually help students engage more deeply with their work.