• FaceDeer@fedia.io · 2 days ago

    Copyright, yes it’s a problem and should be fixed.

    No, this is just playing into another of the common anti-AI fallacies.

    Training an AI does not do anything that copyright is even concerned with, let alone prohibits. Copyright deals solely with the copying of specific expressions of ideas, not with the ideas themselves. When an AI trains on data it isn’t copying the data, the model doesn’t “contain” the training data in any meaningful sense. And the output of the AI is even further removed.

    People who insist that AI training is violating copyright are advocating for ideas and styles to be covered by copyright. Or rather by some other entirely new type of IP protection, since as I said this is nothing at all like what copyright already deals with. This would be an utterly terrible thing for culture and free expression in general if it were to come to pass.

    I get where this impulse comes from. Modern society has instilled a general sense that everything has to be “owned” by someone, even completely abstract things. Everyone thinks that they’re owed payment for everything that they can possibly demand payment for, even if it’s something that just yesterday they were doing purely for fun and releasing to the world without a care. There’s this base impulse of “mine! Therefore I must control it!” Ironically, it’s what leads to the capitalist hellscape so many people are decrying at the same time as they demand more.

    • Blue_Morpho@lemmy.world · 1 day ago

      When an AI trains on data it isn’t copying the data, the model doesn’t “contain” the training data in any meaningful sense.

      I’d say it can be a problem because there have been examples of getting AIs to spit out entire copyrighted passages. Furthermore, some works can have additional restrictions on their use. I couldn’t, for example, train an AI on Linux source code, have it spit out the exact source code, and then slap my own proprietary commercial license on it to bypass the GPL.

      • FaceDeer@fedia.io · 1 day ago

        I’d say it can be a problem because there have been examples of getting AIs to spit out entire copyrighted passages.

        Examples that have turned out to be either the result of great effort to force the output to be a copy, the result of poor training techniques that cause overfitting, or both.

        If this is really such a straightforward case of copyright violation, surely there are court cases where it’s been ruled to be so? People keep arguing legality without ever referencing case law, just news articles.

        Furthermore, some works can have additional restrictions on their use. I couldn’t, for example, train an AI on Linux source code, have it spit out the exact source code, and then slap my own proprietary commercial license on it to bypass the GPL.

        That’s literally still just copyright. There are no “additional restrictions” at play here.

          • FaceDeer@fedia.io · 1 day ago

            Yes, that’s what I said. There are no “additional restrictions” from putting a GPL license on something. The GPL works by granting rights that weren’t already present under default copyright. You can reject the GPL on an open-source piece of software if you want to, but then you lose the additional rights that the GPL gives you.

    • patatahooligan@lemmy.world · 1 day ago

      When an AI trains on data it isn’t copying the data, the model doesn’t “contain” the training data in any meaningful sense.

      And what’s your evidence for this claim? It seems to be false given the times people have tricked LLMs into spitting out verbatim or near-verbatim copies of training data. See this article as one of many examples out there.

      People who insist that AI training is violating copyright are advocating for ideas and styles to be covered by copyright.

      Again, what’s the evidence for this? Why do you think that of all the observable patterns, the AI will specifically copy “ideas” and “styles” but never copyrighted works of art? The examples from the above article contradict this as well. AIs don’t seem to be able to distinguish between abstract ideas like “plumbers fix pipes” and specific copyright-protected works of art. They’ll happily reproduce either one.

      • FaceDeer@fedia.io · 1 day ago

        That article is over a year old. The NYT case against OpenAI turned out to be quite flimsy; their evidence was heavily massaged. They picked an article of theirs that had been widely copied across the Internet (and was thus likely to be “overfit”, a training flaw that AI trainers actively avoid nowadays), gave ChatGPT the first 90% of the article, and told it to complete the rest. They tried over and over again until eventually something that closely resembled the remaining 10% came out, at which point they took a snapshot and went “aha, copyright violated!”

        They had to spend a lot of effort to get that flimsy case. It likely wouldn’t work on a modern AI; training techniques are much better now, overfitting is more carefully avoided, and synthetic data is widely used.
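
        As a rough illustration of the kind of probe described above, here is a hypothetical sketch (not OpenAI’s or the NYT’s actual setup; the small open gpt2 model and the placeholder passage are stand-ins I’m assuming for the example): feed a model most of a passage and measure how closely its continuation matches the held-out remainder.

        ```python
        # Hypothetical sketch of a prefix-completion probe: give a model ~90% of a
        # passage and check how similar its continuation is to the held-out 10%.
        # Assumes the Hugging Face `transformers` library; "gpt2" is a small stand-in model.
        from difflib import SequenceMatcher
        from transformers import pipeline

        article = "some long passage suspected to be memorized ..."  # placeholder text

        split = int(len(article) * 0.9)
        prefix, held_out = article[:split], article[split:]

        generator = pipeline("text-generation", model="gpt2")
        output = generator(prefix, max_new_tokens=200)[0]["generated_text"]
        completion = output[len(prefix):]

        # High similarity would suggest the passage was memorized (overfitting);
        # low similarity suggests the model is only continuing in a generic way.
        similarity = SequenceMatcher(None, completion, held_out).ratio()
        print(f"similarity to held-out text: {similarity:.2f}")
        ```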

        Why do you think that of all the observable patterns, the AI will specifically copy “ideas” and “styles” but never copyrighted works of art?

        Because it’s literally physically impossible. The classic example is Stable Diffusion 1.5, which has a model size of around 4 GB and was trained on over 5 billion images (the LAION-5B dataset). If it were actually storing the images it was trained on, it would be compressing each of them to under one byte of data.
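
        The back-of-the-envelope arithmetic behind that claim, using the round figures above rather than exact checkpoint or dataset sizes, looks like this:

        ```python
        # Rough arithmetic for the "under one byte per image" claim.
        # 4 GB and 5 billion are the round figures from the comment, not exact sizes.
        model_bytes = 4 * 1024**3        # ~4 GB model checkpoint
        training_images = 5_000_000_000  # ~5 billion images (LAION-5B scale)

        bytes_per_image = model_bytes / training_images
        print(f"{bytes_per_image:.2f} bytes per image")  # ~0.86 bytes

        # For comparison, assume a typical JPEG of a few hundred kilobytes:
        typical_jpeg_bytes = 300 * 1024
        print(f"implied compression ratio: ~{typical_jpeg_bytes / bytes_per_image:,.0f}x")
        ```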

        AIs don’t seem to be able to distinguish between abstract ideas like “plumbers fix pipes” and specific copyright-protected works of art.

        This is simply incorrect.

        • patatahooligan@lemmy.world · 1 day ago

          The NYT was just one example. The Mario examples didn’t require any such techniques. Not that it matters. Whether it’s easy or hard to reproduce such an example, it is definitive proof that the information can in fact be encoded in some way inside of the model, contradicting your claim that it is not.

          If it were actually storing the images it was trained on, it would be compressing each of them to under one byte of data.

          Storing a copy of the entire dataset is not a prerequisite to reproducing copyright-protected elements of someone’s work. Mario’s likeness itself is a protected work of art even if you don’t exactly reproduce any (let alone every) image that contained him in the training data. The possibility of fitting the entirety of the dataset inside a model is completely irrelevant to the discussion.

          This is simply incorrect.

          Yet evidence supports it, while you have presented none to support your claims.

          • FaceDeer@fedia.io · 1 day ago

            Learning what a character looks like is not a copyright violation. I’m not a great artist, but I could probably draw a picture that’s recognizably Mario; does that mean my brain is a violation of copyright somehow?

            Yet evidence supports it, while you have presented none to support your claims.

            I presented some; you actually referenced what I presented in the very comment where you’re saying I presented none.

            You can actually support your case very simply and easily. Just find the case law where AI training has been ruled a copyright violation. It’s been a couple of years now (as evidenced by the age of that news article you dug up), yet all the lawsuits are languishing or defunct.

            • patatahooligan@lemmy.world · 1 day ago

              Learning what a character looks like is not a copyright violation

              And nobody claimed it was. But you’re claiming that this knowledge cannot possibly be used to make a work that infringes on the original. This analogy about whether brains are copyright violations makes no sense and is not equivalent to your initial claim.

              Just find the case law where AI training has been ruled a copyright violation.

              But that’s not what I claimed is happening. It’s also not the opposite of what you claimed. You claimed that AI training is not even in the domain of copyright, which is different from something that is possibly in that domain, but is ruled to not be infringing. Also, this all started with you responding to another user saying the copyright situation “should be fixed”. As in, they (and I) don’t agree that the current situation is fair. A court ruling under current law cannot settle whether things should change. That makes no sense.

              Honestly, none of your responses have actually supported your initial position. You’re constantly moving to something else that sounds vaguely similar but is neither equivalent to what you said nor a direct response to my objections.

              • FaceDeer@fedia.io · 24 hours ago

                But you’re claiming that this knowledge cannot possibly be used to make a work that infringes on the original.

                I am not. The only thing I’ve been claiming is that AI training is not copyright violation, and the AI model itself is not copyright violation.

                As an analogy, you can use Photoshop to draw a picture of Mario. That does not mean that Photoshop is violating copyright by existing, and Adobe is not violating copyright by having created Photoshop.

                You claimed that AI training is not even in the domain of copyright, which is different from something that is possibly in that domain, but is ruled to not be infringing.

                I have no idea what this means.

                I’m saying that the act of training an AI does not perform any actions that are within the realm of the actions that copyright could actually say anything about. It’s like if there’s a law against walking your dog without a leash, and someone asks “but does it cover aircraft pilots’ licenses?” No, it doesn’t, because there’s absolutely no commonality between the two subjects. It’s nonsensical.

                Honestly, none of your responses have actually supported your initial position.

                I’m pretty sure you’re misinterpreting my position.

                The “copyright situation” regarding an actual literal picture of Mario doesn’t need to be fixed because it’s already quite clear. There’s nothing that needs to change to make an AI-generated image of Mario count as a copyright violation; that’s what the law already says, and AI’s involvement is irrelevant.

                When people talk about needing to “change copyright” they’re talking about making something that wasn’t illegal previously into something that is illegal after the change. That’s presumably the act of training or running an AI model. What else could they be talking about?

    • RandomVideos@programming.dev · 1 day ago

      If a larger YouTuber steals the script and content of a video from a smaller YouTuber, as far as I know, it wouldn’t be illegal. It would hurt the smaller YouTuber and benefit the larger one. It would make people mad if they found out about it, but there wouldn’t be people proposing to change copyright law to include ideas.

      I am using YouTubers as the example because this happened, a lot of people got angry, and it’s similar to the AI situation.
      People can complain that something unethical is legal without having to propose new, flawless copyright laws.

      • FaceDeer@fedia.io · 1 day ago

        Sure. But that’s not what’s happening when an AI is trained. It’s not “stealing” the script or content of the video; it’s analyzing them.

        • RandomVideos@programming.dev · 1 day ago

          By analyzing, isn’t it awarding points based on how well it can replicate the content, and then redoing it in an attempt to obtain more points?

          • FaceDeer@fedia.io · 1 day ago

            Very basically, yes. But the result is a model that doesn’t actually contain the training data; it’s far too small for that to be physically possible.
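
            In code terms, the loop being described is roughly the following. This is a minimal sketch with a toy PyTorch model and made-up sizes, not any particular production setup; the point is that only a fixed-size set of weights gets nudged, no matter how much data streams through.

            ```python
            # Minimal sketch of the "score the prediction, adjust, repeat" loop.
            # Toy PyTorch model; layer sizes and data are made up for illustration.
            import torch
            from torch import nn

            model = nn.Linear(8, 8)       # fixed-size weights: 8*8 + 8 = 72 parameters
            optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
            loss_fn = nn.MSELoss()        # the "points": lower loss = closer replication

            for step in range(1000):                 # stream as much data as you like...
                x = torch.randn(32, 8)               # stand-in for training examples
                target = x.roll(1, dims=1)           # stand-in for "what comes next"
                loss = loss_fn(model(x), target)     # how badly it replicated the target
                optimizer.zero_grad()
                loss.backward()                      # work out how to nudge the weights
                optimizer.step()                     # ...but only these 72 numbers change

            print(sum(p.numel() for p in model.parameters()))  # still 72, whatever it saw
            ```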

      • Blue_Morpho@lemmy.world · edited · 1 day ago

        Stealing a script and content is a copyright violation. It could lead to a lawsuit, but it’s usually not pursued because of the legal costs.

        • RandomVideos@programming.dev · 1 day ago

          By stealing, I meant rewriting it, following the same points but using different words. Is that illegal, or are you referring to completely copying the text?

          • Blue_Morpho@lemmy.world · 1 day ago

            I thought you meant actual copying, like when Linus Tech Tips used a Gamers Nexus script word for word.