We humans like to think we’re the only beings capable of creativity, but computers have been used as a generative force for decades, creating original pieces of writing, art, music, and design. This digital renaissance, powered by advancements in artificial intelligence and machine learning, has ushered in a new era where technology not only replicates but also innovates, blurring the lines between human and machine creativity. From algorithms that compose symphonies to software that drafts novels, the scope of computer-generated creativity is expanding, challenging our preconceived notions of artistry and originality.
A Brief Look Into the History of Creative AI
Generative Adversarial Networks (GANs) for image generation were introduced in 2014. Then, in 2016, DeepMind introduced WaveNet, a model for audio generation. The following year, a Google research team proposed the Transformer architecture for text understanding and generation, and it became the basis for the large language models we know today.
These research advances quickly translated into practical applications. In 2015, engineer and creative storyteller Samim trained a neural network on 14 million passages from romance novels and asked the model to generate original stories based on new images.
A year later, Flow Machines, a division of Sony, used an AI system trained on Beatles songs to generate its own song, “Daddy’s Car,” which eerily resembles the musical style of the British rock group. The team did the same with Bach’s music and was able to fool human evaluators, who had trouble differentiating between real Bach compositions and AI-generated imitations.
Then, in 2017, Autodesk, a leading producer of computer-aided design (CAD) software for industrial design, released Dreamcatcher, a program that generates thousands of possible design permutations based on initial constraints set by engineers. Dreamcatcher has produced bizarre yet highly effective designs that challenge traditional manufacturing assumptions and go beyond what human designers can ideate manually.
AI Text Generation
The recent advent of generative AI has sparked a renaissance in computational creativity. OpenAI’s ChatGPT is probably the most widely known example of AI’s text-generation power, but it has many strong competitors, including Anthropic’s Claude, Google’s Gemini, Meta’s Llama, and others.
These large language models (LLMs) possess the ability to craft text on virtually any subject, all while reflecting a tailored writing style. For example, imagine we task ChatGPT with writing a piece about artificial intelligence’s worldwide domination through authoring books, crafting images, and generating code – all in the dramatic style of a poetry slam. The resulting creation is quite impressive.
While this serves as a playful illustration, the potential applications of LLMs go well beyond simple entertainment:
- Marketing teams are already tapping into the creative power of ChatGPT and similar models to craft captivating stories, blog posts, social media content, and advertisements that echo a brand’s unique voice.
- Customer support teams utilize LLM-powered bots to offer round-the-clock assistance to their customers.
- In software development, new AI-assisted engineering workflows are taking shape, powered by generative AI coding tools. These tools offer code suggestions and complete functions, drawing on natural language prompts and existing codebases.
However, LLM-based applications are not without their pitfalls. Their performance can be erratic, leading to instances of ‘hallucination,’ where the model confidently produces false information. Several notable incidents have occurred: in one, a company was forced to honor a refund policy fabricated by its chatbot; in another, users were able to trick a chatbot into agreeing to sell them a car for $1. At this juncture, it’s imperative to consider these risks and, in high-stakes situations, to incorporate human oversight into the process. Yet, it’s clear that this technology is already significantly influencing business processes, with its impact set to increase further.
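As a rough illustration of what human oversight could look like in practice, here is a minimal Python sketch that holds back chatbot replies touching high-stakes topics until a person has reviewed them. Everything in it is an assumption made for illustration: the keyword list, the `Routed` type, and the `route_reply` function are hypothetical and not taken from any particular product or library.

```python
from dataclasses import dataclass

# Hypothetical list of terms that mark a reply as high-stakes.
# A real deployment would define its own policy, likely something
# more robust than keyword matching.
HIGH_STAKES_TERMS = ("refund", "price", "discount", "contract", "legal")


@dataclass
class Routed:
    reply: str
    needs_human_review: bool


def route_reply(draft_reply: str) -> Routed:
    """Flag a chatbot draft for human review if it mentions a high-stakes term."""
    text = draft_reply.lower()
    flagged = any(term in text for term in HIGH_STAKES_TERMS)
    return Routed(reply=draft_reply, needs_human_review=flagged)


if __name__ == "__main__":
    for draft in ("You are entitled to a full refund.", "Our store opens at 9 am."):
        result = route_reply(draft)
        status = "hold for review" if result.needs_human_review else "send"
        print(f"{status}: {result.reply}")
```

Keyword matching is deliberately simplistic here. The point is only that a reply which could commit the business to something gets a human check before it reaches the customer.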
AI Image Generation
While large language models are revolutionizing the field of text generation, providing novel tools and challenges to writers, diffusion models are making waves in the world of art and design.
Tools like Midjourney, Stable Diffusion by Stability AI, and DALL-E 3 by OpenAI can generate images so realistic they could be mistaken for actual photographs.
Industry titans like Adobe are also stepping up, placing an emphasis on the ethical and legal implications of AI-generated images. To assuage enterprise concerns about using AI-generated images, Adobe has restricted its training dataset to licensed Adobe Stock and public domain images. Moreover, the company provides IP indemnity for content created using select workflows in Firefly, its proprietary AI image generator. Others, including Google, Microsoft, and OpenAI, have followed Adobe’s example to ease the transition of enterprise customers to AI-generated content.
Despite significant advancements in AI image generation throughout 2023, the technology still faces notable limitations, akin to those experienced by LLMs. Chief among these are the tendencies of AI tools to deviate from the explicit instructions provided in prompts, to produce images with occasional artifacts, and to exhibit bias in whom they depict. Typically, AI image generators produce content that mirrors the image collections available online, which often feature aesthetically appealing, model-like individuals, predominantly white women and men. To achieve a more equitable representation, it is necessary to deliberately introduce diversity into the generated images. However, caution is advised to avoid the pitfalls of overcorrection, as evidenced by the controversy surrounding Google’s Gemini image generation. The tool faced criticism for refusing to generate images of white individuals, particularly white men, and for producing historically inaccurate representations, such as Black popes and female Nazi soldiers.
AI Video Generation
The year 2023 brought the first notable advances in text-to-video generation and editing, with pioneers like Runway leading the charge in creating new videos from text prompts and reference materials. However, the videos were limited to approximately four seconds in duration, were still of low quality, and exhibited significant issues with warping and morphing.
The year 2024 was anticipated to be a watershed moment for AI video generation, and it has already begun to fulfill those expectations. OpenAI recently unveiled Sora, its AI video generator, which, based on available demonstrations, significantly surpasses the capabilities of alternative tools developed by Runway, Pika Labs, Genmo, Google (Lumiere), Meta (Emu), and ByteDance (MagicVideo-V2).
While Sora distinguishes itself from its competitors, it remains inaccessible to the public, and the full scope of its capabilities has yet to be thoroughly evaluated beyond the sphere of meticulously crafted demonstrations.
Nonetheless, the technology’s capacity to transform various sectors, such as entertainment, filmmaking, and marketing, is immense. How AI-generated videos will be used in business, and what their primary challenges will be, remains to be seen. However, even now, there’s growing concern over the proliferation of deepfake videos online, as it becomes increasingly straightforward to produce convincing videos depicting events that never occurred.
The Boundless Horizon of AI Creativity
AI systems that create have taken center stage in recent years, expanding their influence across a multitude of sectors, from art, design, music, and entertainment to software development, education, and drug development. As these systems grow more sophisticated, they promise to redefine what’s possible, opening up new avenues for innovation and creativity. The fusion of artificial intelligence with human ingenuity has the potential to accelerate breakthroughs, solve complex problems, and craft experiences that were once unimaginable. As we stand on the brink of this new frontier, it is crucial to navigate the ethical implications and ensure that these technologies are used responsibly and for the greater good.