A fiction analytics site called Prosecraft has drawn significant backlash after scanning and uploading thousands of books into a vast dataset without obtaining the authors' consent.
According to TechCrunch, Prosecraft, developed by cloud word processor Shaxpir, aimed to compile a repository of more than 27,000 books, which it compared, ranked, and analyzed based on the perceived "vividness" of their language.
Prosecraft Faces Backlash
Renowned authors, including Maureen Johnson and Celeste Ng, voiced their discontent with Prosecraft for using their literary works without proper authorization. According to reports, even books released as recently as a few weeks prior were not spared from this unauthorized scanning.
In the face of online criticism, the project's creator, Benji Smith, eventually took down the Prosecraft website, which had been operational since 2017. Smith said he had put considerable effort into the project, spending thousands of hours organizing, annotating, and refining text.
"But in the meantime, 'AI' became a thing. And the arrival of AI on the scene has been tainted by early use-cases that allow anyone to create zero-effort impersonations of artists, cutting those creators out of their own creative process," Smith added.
It is important to note that Prosecraft was not designed as a generative AI tool. However, concerns emerged within the author community regarding its potential to evolve into one.
Smith had amassed an extensive dataset comprising a quarter billion words from published books, primarily acquired through web crawling.
Prosecraft's methodology involved displaying two paragraphs from each book: one considered the "most passive" and another the "most vivid." The project then ranked the books on percentile scales for vividness, length, and passivity.
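For readers unfamiliar with percentile scales, the ranking step can be illustrated with a minimal Python sketch. Prosecraft's actual scoring method was not published in the reports cited here, so the function name `percentile_ranks`, the book titles, and the scores below are all hypothetical; the sketch only shows how a raw score might be converted into a percentile position within a collection.

```python
from bisect import bisect_left


def percentile_ranks(scores):
    """Map each title to the percentage of titles with a strictly lower score."""
    ordered = sorted(scores.values())
    total = len(ordered)
    # bisect_left counts how many scores fall strictly below the given one.
    return {
        title: 100.0 * bisect_left(ordered, score) / total
        for title, score in scores.items()
    }


# Hypothetical vividness scores, invented for illustration (not Prosecraft data).
vividness = {"Book A": 0.42, "Book B": 0.77, "Book C": 0.58, "Book D": 0.77}
ranks = percentile_ranks(vividness)

for title, rank in sorted(ranks.items()):
    print(f"{title}: percentile {rank:.0f}")
```

Under this definition, books with tied scores share the same percentile, and the same function could be applied separately to length or passivity scores.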
Authors expressed their frustration at this approach, asserting that writing style is a distinctive aspect that goes beyond rigid rules such as active or passive voice conventions. Smith outlined his rationale in a blog post.
He argued that by publishing summary statistics and small excerpts from the books, he was adhering to the spirit of the Fair Use doctrine, which, in his interpretation, did not require author consent.
However, some authors found that the excerpts on Prosecraft contained significant spoilers, adding another layer of discontent.
Writers Urge AI Companies to Honor Copyrights
Although Smith offered an apology, the authors continued to express their frustration over the situation. The widespread adoption of AI tools has contributed to an ongoing cycle of challenges for artists and writers.
After opting out of one database, they often find their work being used to train other AI models, and the frustration begins again.
Renowned authors Margaret Atwood and James Patterson are among the thousands of writers urging AI companies to honor copyrights.
In an open letter to generative AI leaders, the Authors Guild "calls on the CEOs of OpenAI, Alphabet, Meta, Stability AI, and IBM to obtain consent, credit, and fairly compensate writers" before incorporating copyrighted materials into their technologies.