What You Will Learn
State-of-the-art technologies, and artificial intelligence in particular, are dramatically changing the nature of creative processes. Computers already take part in creative industries such as film, gaming, interactive apps, architecture, music, and fine arts. Used in the arts, the computer serves as a brush, a canvas, a musical instrument, an animation tool, and so on. However, we believe there is a more ambitious relationship between creativity and computers to explore. Instead of perceiving the computer as a machine that helps humans create something, we can treat it as a creative unit in its own right, one that can generate creative work automatically. This idea has given rise to a subfield of artificial intelligence called computational creativity.
This article explores the use of artificial intelligence in the computational creativity domain, specifically in visual arts and music. We will see how STN (Style Transfer Network) and GAN (Generative Adversarial Network) models are applied in these areas.
AI in Visual Arts
Have you ever wished you could paint like Van Gogh or Picasso? Computers can do it nowadays. The technique is called neural style transfer, and it is described in the paper “A Neural Algorithm of Artistic Style” by Gatys, Ecker, and Bethge; a related paper by the same authors, “Image Style Transfer Using Convolutional Neural Networks”, was published at CVPR. So, what is neural style transfer? It is an image optimization technique used for styling an image. It takes three images: an input image that you want to style, a style reference image, and a content image. The input image is then transformed so that it looks like the content image but is painted in the manner of the style image, bridging art and deep learning.
For example, let’s take an input image of a scene containing houses (A) and a splash of colors as the content image (C):
Figure: neural style transfer
The procedure above is neural style transfer. Is this magic? No, it is deep learning, and it needs no witchcraft: style transfer is a fun and interesting technique that showcases the capabilities of neural networks. The basic principle is to define two distance functions: a content distance, J_content, which measures how different the content of two images is, and a style distance, J_style, which measures how different two images are in terms of style.
Given three images, a desired content image, a desired style image, and the input image (initialized with the content image), the algorithm transforms the input image so as to minimize both its content distance to the content image and its style distance to the style image. In simple words, we give the NST network a base input image, the content image we want to reproduce, and the style image we want to use. The algorithm then repeatedly adjusts the pixels of the input image, using backpropagation to compute how each pixel affects the two distances (losses), until the output matches the content of the content image and the style of the style image.
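To make the optimization idea concrete, here is a minimal toy sketch in Python with NumPy. It is not the method from the paper: real NST measures both distances on feature maps from a pre-trained CNN and relies on automatic differentiation, whereas this sketch measures them directly on pixel values and uses hand-derived gradients. All function names, the learning rate, and the weights alpha and beta are assumptions made for illustration.

```python
import numpy as np


def gram(x):
    """Channel-by-channel correlation matrix of an image flattened to (channels, pixels)."""
    return x @ x.T / x.shape[1]


def content_distance(x, content):
    """Mean squared pixel difference between the input and the content image."""
    return np.mean((x - content) ** 2)


def style_distance(x, style):
    """Squared difference between the Gram matrices of the input and the style image."""
    return np.sum((gram(x) - gram(style)) ** 2)


def total_loss(x, content, style, alpha, beta):
    """Weighted sum of the two distances."""
    return alpha * content_distance(x, content) + beta * style_distance(x, style)


def toy_style_transfer(content, style, alpha=1.0, beta=1.0, lr=0.5, steps=200):
    """Gradient descent on the pixels of the input image (toy setting, no CNN)."""
    x = content.copy()  # the input image starts as a copy of the content image
    n_pixels = x.shape[1]
    for _ in range(steps):
        # Hand-derived gradients of the two distances with respect to the pixels.
        grad_content = 2.0 * (x - content) / x.size
        grad_style = 4.0 / n_pixels * (gram(x) - gram(style)) @ x
        x -= lr * (alpha * grad_content + beta * grad_style)
    return x


rng = np.random.default_rng(0)
content = rng.random((3, 64))  # stand-in for a 3-channel, 8x8 content image
style = rng.random((3, 64))    # stand-in for a 3-channel, 8x8 style image

before = total_loss(content, content, style, alpha=1.0, beta=1.0)
result = toy_style_transfer(content, style)
after = total_loss(result, content, style, alpha=1.0, beta=1.0)
print(f"loss before: {before:.4f}, loss after: {after:.4f}")
```

The combined loss should drop as the pixels are updated. Raising beta relative to alpha pushes the result further toward the style image at the expense of the content.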
Neural style transfer (NST) uses a pre-trained convolutional neural network and builds on top of it. Taking a neural network trained on one task and applying it to a new but similar task in the same domain is called transfer learning.
Neural Style Transfer (NST)
There are three steps involved in building the Neural Style Transfer (NST) algorithm:
- Build the content cost function J_content(C, G).
- Build the style cost function J_style(S, G).
- Put it together to obtain J(G) = α * J_content(C, G) + β * J_style(S, G).
Figure: the overall cost function of the neural style transfer algorithm, J(G) = α * J_content(C, G) + β * J_style(S, G)
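The three steps can be sketched in a few lines of NumPy. This sketch assumes the feature maps have already been extracted from one layer of a pre-trained CNN as arrays of shape (height, width, channels); the random arrays below are hypothetical stand-ins for them. The normalization constants follow one common convention and the values of α and β are arbitrary, since the article does not fix either.

```python
import numpy as np


def content_cost(a_C, a_G):
    """J_content(C, G): squared difference between content and generated activations."""
    h, w, c = a_C.shape
    return np.sum((a_C - a_G) ** 2) / (4 * h * w * c)


def gram_matrix(a):
    """Correlations between the channels of one layer's activations."""
    h, w, c = a.shape
    flat = a.reshape(h * w, c)  # one row per spatial position
    return flat.T @ flat        # shape (channels, channels)


def style_cost(a_S, a_G):
    """J_style(S, G): squared difference between the two Gram matrices."""
    h, w, c = a_S.shape
    diff = gram_matrix(a_S) - gram_matrix(a_G)
    return np.sum(diff ** 2) / (4 * (c ** 2) * (h * w) ** 2)


def total_cost(j_content, j_style, alpha, beta):
    """J(G) = alpha * J_content(C, G) + beta * J_style(S, G)."""
    return alpha * j_content + beta * j_style


rng = np.random.default_rng(1)
a_C = rng.standard_normal((4, 4, 3))  # hypothetical activations for content image C
a_S = rng.standard_normal((4, 4, 3))  # hypothetical activations for style image S
a_G = rng.standard_normal((4, 4, 3))  # hypothetical activations for generated image G

j_c = content_cost(a_C, a_G)
j_s = style_cost(a_S, a_G)
j = total_cost(j_c, j_s, alpha=10.0, beta=40.0)
print(j_c, j_s, j)
```

The Gram matrix discards where a feature occurs and keeps only which features occur together, which is why it is used here as a summary of style rather than content.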
AI in Music Generation
Recently, generative adversarial networks (GANs) have taken center stage in creative pursuits such as image enhancement and image generation. Music generation is another broad field where deep neural networks are gaining ground. In this part of the article, our objective is to investigate how GANs and LSTM (Long Short-Term Memory) networks can be used to create music that sounds as if it were made by a human artist.
Long Short-Term Memory
Several properties make LSTMs well suited to learning from music data. An LSTM is a special kind of recurrent neural network (RNN) that works with sequential data rather than purely spatial data, so it can model how the data evolves over time. This architecture shares its weights across time steps, which allows it to output coherent patterns. Additionally, it can remember and reuse earlier inputs over a vast span of time, and this kind of long-term memory is extremely useful in music generation. Thus, LSTMs let us maintain relative consistency throughout a song with a “theme” in mind, which is generated by the GAN.
Unlike plain recurrent neural networks (RNNs), which lack long-term memory, an LSTM acts like a pipeline that can carry useful features forward over remarkable gaps in time. It passes information from the past through a pipe that runs parallel to the network and uses a gating mechanism to select the information that goes in and out at each step. This mechanism, built from sigmoid activations multiplied element-wise with network values, performs three tasks to control the information flow: input, forget, and output. It determines which parts of the new input are relevant to the state, which parts of the previous state need to be forgotten, and which portion of the state gets carried forward. These mechanisms allow an LSTM to use complex yet adaptive parameters to produce well-crafted sequences that emulate patterns.
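The gating mechanism can be illustrated with a single LSTM time step written in NumPy. This is a minimal sketch of the standard LSTM equations, not a trained music model: the weights are random, the sizes are arbitrary, and the input sequence is a hypothetical stand-in for encoded notes.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4 * hidden, hidden + inputs); b has shape (4 * hidden,).
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0 * hidden:1 * hidden])  # forget gate: what to drop from the old state
    i = sigmoid(z[1 * hidden:2 * hidden])  # input gate: what new information to admit
    o = sigmoid(z[2 * hidden:3 * hidden])  # output gate: what part of the state to expose
    g = np.tanh(z[3 * hidden:4 * hidden])  # candidate values for the cell state
    c_t = f * c_prev + i * g               # cell state: the "pipe" carrying long-term memory
    h_t = o * np.tanh(c_t)                 # hidden state passed on to the next step
    return h_t, c_t


rng = np.random.default_rng(2)
n_inputs, n_hidden = 5, 8  # assumed sizes: 5 features per note, 8 memory units
W = rng.standard_normal((4 * n_hidden, n_hidden + n_inputs)) * 0.1
b = np.zeros(4 * n_hidden)

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
sequence = rng.standard_normal((16, n_inputs))  # stand-in for 16 encoded notes
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)
```

Because the same W and b are reused at every step, the weights are shared over time, and the cell state c is updated only through the forget and input gates, which is what lets information survive across many steps.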
Generative Adversarial Networks
On top of the LSTM model, we construct a generative adversarial network (GAN) that generates music from given inputs. As the name suggests, a GAN is a type of neural network that works in an adversarial way: two networks work against each other during the learning process. One is called the generator network and the other is called the discriminator network. The generator and the discriminator are trained and validated simultaneously in the adversarial learning process.
The discriminator is presented with examples of real data alongside fake data that the generator produces from random noise. For each set of samples seen during training, its task is to correctly classify the data as either real or fake music. On the flip side, the task of the generator network is to turn random noise into fake music that resembles real music closely enough to fool the discriminator into making classification mistakes (i.e. labeling music produced by the generator as “real” music). Set against each other in this way, the generator and discriminator networks compete until the generated music reaches the required quality.
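The competition between the two networks can be expressed through their loss functions. The sketch below assumes the discriminator outputs a probability that a sample is real, and it uses the binary cross-entropy discriminator loss together with the non-saturating generator loss, a common choice that the article itself does not specify. The score arrays are made-up values for illustration, not results from a trained model.

```python
import numpy as np


def discriminator_loss(d_real, d_fake, eps=1e-8):
    """Binary cross-entropy: real samples should score near 1, fake samples near 0."""
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))


def generator_loss(d_fake, eps=1e-8):
    """Non-saturating generator loss: low when the discriminator calls fakes 'real'."""
    return -np.mean(np.log(d_fake + eps))


# Hypothetical discriminator outputs: estimated probability that each sample is real music.
d_real = np.array([0.9, 0.8, 0.95])         # scores on real music
d_fake_caught = np.array([0.1, 0.2, 0.05])  # the discriminator spots the fakes
d_fake_fooled = np.array([0.9, 0.85, 0.8])  # the discriminator is fooled

print("discriminator loss, fakes caught:", discriminator_loss(d_real, d_fake_caught))
print("discriminator loss, fooled:      ", discriminator_loss(d_real, d_fake_fooled))
print("generator loss, fakes caught:    ", generator_loss(d_fake_caught))
print("generator loss, fooled:          ", generator_loss(d_fake_fooled))
```

When the discriminator is fooled, its loss rises while the generator's loss falls. In training, the two networks take alternating gradient steps on these opposing objectives.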
AI Audio Deepfakes
Audio deepfakes are another interesting application of GANs to temporal audio and voice data. In simple words, the technique is also called voice cloning. To clone someone’s voice in a TTS (text-to-speech) scenario, we need two inputs:
- The text we want to be read
- A sample of the voice that should read the given text
Such a system can be fun for kids, but it is also a threat because it can be misused. Technology giants such as Google are working on it and have achieved admirable results. The system can also have positive uses in the media and entertainment industry.
References
- Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. “A neural algorithm of artistic style.” arXiv preprint arXiv:1508.06576 (2015).
- Greff, Klaus, et al. “LSTM: A search space odyssey.” IEEE transactions on neural networks and learning systems 10 (2016): 2222-2232.
- Kietzmann, Jan, et al. “Deepfakes: Trick or treat?.” Business Horizons 2 (2020): 135-146.
- Goodfellow, Ian, et al. “Generative adversarial networks.” Communications of the ACM 11 (2020): 139-144.
- Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. “Image style transfer using convolutional neural networks.” Proceedings of the IEEE conference on computer vision and pattern recognition.