
Beginner’s Guide to Creating Original AI Music Through Neural Networks



Deep learning has improved many aspects of our lives, in ways both obvious and subtle. It plays a key role in processes such as movie recommendation systems, spam detection, and computer vision. Although there is ongoing discussion around deep learning as a black box and the difficulty of training, there is enormous potential for it in a wide variety of fields, including medicine, virtual assistants, and ecommerce.

One fascinating area where deep learning can play a role is at the intersection of art and technology. To explore this idea further, in this article we will look at machine learning music generation through deep learning, a field many assume is beyond the scope of machines (and yet another interesting area of fierce debate!).


  • Music Representation for Machine Learning Models
  • Music Dataset
  • Data Processing
  • Model Selection
    • Many-to-Many RNN
    • Time Distributed Dense Layer
    • Stateful
    • Dropout Layers
    • Softmax Layer
    • Optimizer
  • Generating Music
  • Summary

Music Representation for Machine Learning Models

We will be working with ABC music notation. ABC notation is a shorthand form of musical notation that uses the letters A through G to represent musical notes, and other elements to encode added values. These added values include sharps, flats, the length of a note, the key, and ornamentation.

This form of notation began as an ASCII character set code to facilitate music sharing online, adding a new and simple language for software developers designed for ease of use. Figure 1 is a snapshot of music in ABC notation.

Figure 1: A snapshot of music in ABC notation

Lines in part 1 of the music notation show a letter followed by a colon. These indicate various aspects of the tune, such as the index when there is more than one tune in a file (X:), the title (T:), the time signature (M:), the default note length (L:), the type of tune (R:), and the key (K:). The lines following the key designation represent the tune itself.
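As a concrete illustration, the following Python sketch pulls these header fields out of a short ABC tune (the tune and its field values are invented for illustration):

```python
# A short ABC tune, invented for illustration.
abc_tune = """X:1
T:Example Tune
M:4/4
L:1/8
R:jig
K:G
GABc dedB|dedB dedB|"""

header = {}
for line in abc_tune.splitlines():
    # Header lines are a single letter followed by a colon, e.g. "T:Example Tune".
    if len(line) > 1 and line[1] == ":":
        header[line[0]] = line[2:].strip()
    if line.startswith("K:"):
        break  # the tune body begins after the key (K:) line

print(header)
# {'X': '1', 'T': 'Example Tune', 'M': '4/4', 'L': '1/8', 'R': 'jig', 'K': 'G'}
```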

Music Dataset

In this article, we'll use the open-source data available in the ABC version of the Nottingham Music Database. It contains more than 1,000 folk tunes, the vast majority of which have been converted to ABC notation.

Data Processing

The data is currently in a character-based categorical format. In the data processing stage, we need to transform the data into an integer-based numerical format to prepare it for working with neural networks.

Figure 2: Snapshot of simple data processing

Here each character is mapped to a unique integer. This can be done using a single line of code. The 'text' variable is the input data.

char_to_idx = { ch: i for (i, ch) in enumerate(sorted(list(set(text)))) }

To train the model, we convert the entire text data into a numerical format using this vocabulary.

T = np.asarray([char_to_idx[c] for c in text], dtype=np.int32)
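Although the article does not show this step, a common way to prepare such an integer sequence for training is to pair each window of characters with the same window shifted by one, so the target at every position is the next character. The sketch below uses a toy sequence and an assumed SEQ_LENGTH of 4:

```python
import numpy as np

# Toy integer sequence standing in for the encoded text T.
T = np.asarray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=np.int32)
SEQ_LENGTH = 4  # assumed sequence length, for illustration only

# Inputs are the first SEQ_LENGTH characters of each window;
# targets are the same window shifted forward by one character.
num_seqs = (len(T) - 1) // SEQ_LENGTH
X = T[: num_seqs * SEQ_LENGTH].reshape(num_seqs, SEQ_LENGTH)
Y = T[1 : num_seqs * SEQ_LENGTH + 1].reshape(num_seqs, SEQ_LENGTH)

print(X[0], Y[0])  # [0 1 2 3] [1 2 3 4]
```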

Model Selection for Machine Learning Music Generation

In traditional machine learning models, we cannot store a model's previous stages. However, we can store previous stages with Recurrent Neural Networks (commonly known as RNNs).

An RNN has a repeating module that takes input from the previous stage and gives its output as input to the next stage. However, RNNs can only retain information from the most recent stage, so our network needs more memory to learn long-term dependencies. This is where Long Short-Term Memory networks (LSTMs) come to the rescue.

LSTMs are a special case of RNNs, with the same chain-like structure as RNNs but a different repeating module structure.

Figure 3: The workings of a standard LSTM
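As a rough, NumPy-only sketch of what that repeating module computes (random weights, purely illustrative, with the four gate weight matrices stacked into a single array W):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of an LSTM cell; W stacks the weights of all four gates."""
    z = np.concatenate([x, h_prev]) @ W + b       # pre-activations for all gates
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget, input, and output gates
    g = np.tanh(g)                                # candidate cell state
    c = f * c_prev + i * g                        # keep part of old state, add new
    h = o * np.tanh(c)                            # new hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 5
W = rng.normal(size=(n_in + n_hid, 4 * n_hid))
b = np.zeros(4 * n_hid)
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
print(h.shape)  # (5,)
```

The cell state c is what lets the LSTM carry information across many timesteps; the gates decide what to forget, what to write, and what to expose as output.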

We use an RNN here because:

  1. The length of the data doesn't need to be fixed. For each input, the data length can vary.
  2. Sequence memory is stored.
  3. Various combinations of input and output sequence lengths can be used.

In addition to the general RNN structure, we'll customize it to our use case with a few tweaks. We'll use a 'character RNN'. In character RNNs, the input, output, and transition output are in the form of characters.

Figure 4: Overview of a character RNN

Many-to-Many RNN

As we need our output generated at each timestep, we'll use a many-to-many RNN. To implement a many-to-many RNN, we need to set the parameter 'return_sequences' to True so that a character is generated at each timestep. You can get a better understanding of this by looking at Figure 5, below.

Figure 5: Structure of a many-to-many RNN

In the figure above, the blue units are the inputs, the yellow are the hidden units, and the green are the output units. This is a simple overview of a many-to-many RNN. For a more detailed look at RNN sequences, here's a helpful resource.
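The many-to-many pattern can be sketched in a few lines of NumPy (random weights, illustrative only): the hidden state carries memory forward, and one output is emitted per timestep, so the output sequence is as long as the input sequence.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, n_out, n_steps = 4, 6, 3, 5

Wxh = 0.1 * rng.normal(size=(n_in, n_hid))   # input-to-hidden weights
Whh = 0.1 * rng.normal(size=(n_hid, n_hid))  # hidden-to-hidden (recurrent) weights
Who = 0.1 * rng.normal(size=(n_hid, n_out))  # hidden-to-output weights

h = np.zeros(n_hid)
outputs = []
for x in rng.normal(size=(n_steps, n_in)):  # one input per timestep
    h = np.tanh(x @ Wxh + h @ Whh)          # hidden state remembers earlier steps
    outputs.append(h @ Who)                 # emit an output at every timestep

outputs = np.stack(outputs)
print(outputs.shape)  # (5, 3): one output vector per timestep
```

Setting return_sequences=True in Keras gives exactly this behavior: the layer returns the hidden output for every timestep rather than only the last one.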

Time Distributed Dense Layer

To process the output at each timestep, we create a time distributed dense layer. To achieve this, we add a time distributed dense layer on top of the outputs generated at each timestep.
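TimeDistributed amounts to applying one shared dense layer independently at every timestep. A quick NumPy check (arbitrary shapes, random values):

```python
import numpy as np

rng = np.random.default_rng(2)
batch, timesteps, features, vocab_size = 2, 5, 8, 10

hidden = rng.normal(size=(batch, timesteps, features))  # e.g. LSTM outputs
W = rng.normal(size=(features, vocab_size))             # one shared weight matrix
b = np.zeros(vocab_size)

# Dense layer applied to every timestep at once:
logits = hidden @ W + b
print(logits.shape)  # (2, 5, 10): a vocab-sized output at each timestep

# Identical to looping over timesteps with the same weights:
loop = np.stack([hidden[:, t] @ W + b for t in range(timesteps)], axis=1)
print(np.allclose(logits, loop))  # True
```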


Stateful

The output from one batch is passed to the next batch as input by setting the parameter 'stateful' to True. After combining all of these features, our model will look like the overview depicted in Figure 6, below.

Figure 6: Overview of the model architecture

The code snippet for the model architecture is as follows:

model = Sequential()
model.add(Embedding(vocab_size, 512, batch_input_shape=(BATCH_SIZE, SEQ_LENGTH)))
for i in range(3):
    model.add(LSTM(256, return_sequences=True, stateful=True))
    model.add(Dropout(0.2))  # dropout fraction of 0.2 is a typical choice
model.add(TimeDistributed(Dense(vocab_size)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

I highly recommend playing around with the layers to improve performance.

Dropout Layers

Dropout layers are a regularization technique that sets a fraction of the input units to zero at each update during training, to prevent overfitting. The fraction is determined by the parameter used with the layer.
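A minimal NumPy sketch of (inverted) dropout, the variant most frameworks use: a random fraction of units is zeroed and the survivors are rescaled so the expected activation is unchanged.

```python
import numpy as np

def dropout(x, rate, rng):
    """Zero a fraction `rate` of units; rescale the rest to preserve the mean."""
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

rng = np.random.default_rng(3)
x = np.ones(10000)
y = dropout(x, rate=0.2, rng=rng)

print((y == 0).mean())  # roughly 0.2 of the units are zeroed
print(y.mean())         # the mean stays close to 1.0
```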

Softmax Layer

The generation of music is a multi-class classification problem, where each class is a unique character from the input data. Hence, we use a softmax layer on top of our model and categorical cross-entropy as the loss function.

This layer gives the probability of each class. From the list of probabilities, we pick the one with the highest probability.

Figure 7
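In NumPy, the softmax over the model's final scores, and the sampling of the next character, look roughly like this (toy scores, illustrative only):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # toy scores, one per character class
probs = softmax(logits)
print(np.isclose(probs.sum(), 1.0))  # True: a valid probability distribution

# Draw the next character index in proportion to its probability:
rng = np.random.default_rng(4)
next_char = rng.choice(len(probs), p=probs)
print(0 <= next_char < len(probs))  # True
```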


Optimizer

To optimize our model, we use Adaptive Moment Estimation, commonly known as Adam, as it is a very good choice for RNNs.
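A condensed NumPy sketch of one Adam update (hyperparameters are the common defaults; the quadratic toy loss is invented for illustration):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: moving averages of the gradient and its square,
    with bias correction for the early steps."""
    m = b1 * m + (1 - b1) * grad        # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * grad ** 2   # second moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)           # bias-corrected estimates
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize the toy loss f(w) = w**2 starting from w = 1.0.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * w                        # gradient of w**2
    w, m, v = adam_step(w, grad, m, v, t)
print(abs(w) < 0.1)  # True: w has moved close to the minimum at 0
```

The per-parameter step size adapts to the gradient history, which is one reason Adam behaves well on recurrent networks.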

Figure 8: Snapshot of the model summary

Generating Music

Until now, we've created an RNN model and trained it on our input data. This model learned the patterns of the input data during the training phase. Let's call this model the 'trained model'.

The input size used in the trained model is the batch size. For music generation via machine learning, however, the input size is a single character. So we create a new model that is similar to the trained model, but with an input size of a single character, which is (1, 1). Into this new model, we load the weights from the trained model to replicate its characteristics.

model2 = Sequential()
model2.add(Embedding(vocab_size, 512, batch_input_shape=(1, 1)))
for i in range(3):
    model2.add(LSTM(256, return_sequences=True, stateful=True))
    model2.add(Dropout(0.2))
model2.add(TimeDistributed(Dense(vocab_size)))
model2.add(Activation('softmax'))
We load the weights of the trained model into the new model. This can be done using a single line of code.

model2.load_weights(os.path.join(MODEL_DIR, 'weights.100.h5'))


Figure 9: Snapshot of the model summary

In the process of music generation, the first character is chosen randomly from the unique set of characters, the next character is generated using the previously generated character, and so on. With this structure, we generate music.

Figure 10: Overview of the generation architecture

Here is the code snippet that helps us achieve this.

sampled = []

for i in range(1024):
    batch = np.zeros((1, 1))
    if sampled:
        batch[0, 0] = sampled[-1]
    else:
        batch[0, 0] = np.random.randint(vocab_size)
    result = model2.predict_on_batch(batch).ravel()
    sample = np.random.choice(range(vocab_size), p=result)
    sampled.append(sample)

# idx_to_char is the inverse of char_to_idx: {i: ch for ch, i in char_to_idx.items()}
print(''.join(idx_to_char[c] for c in sampled))

Here are a few generated pieces of music:

We generated these pleasing samples of music using a machine learning neural network known as an LSTM. With each generation, the patterns will be different, yet similar to the training data. These melodies can be used in a wide variety of ways:

  • To support artists' creativity through inspiration
  • As a productivity tool to develop new ideas
  • As additional tunes for artists' compositions
  • To complete an unfinished piece of work
  • As a standalone piece of music

However, this model can still be improved. Our training data consisted of a single instrument, the piano. One way we could improve our training data is by adding music from multiple instruments. Another would be to increase the range of genres of music, their rhythms, and their time signatures.

At present, our model generates a few false notes, and the music is not exceptional. We could reduce these errors and increase the quality of our music by enlarging our training dataset as detailed above.


Summary

In this article, we looked at how to process music for use with neural networks, the in-depth workings of deep learning models like RNNs and LSTMs, and we explored how tweaking a model can result in music generation. We can apply these concepts to any other system where we generate other formats of art, including generating landscape paintings or human portraits.

Thanks for reading! If you would like to experiment with this custom dataset yourself, you can download the annotated data here and view my code on GitHub.

If you'd like to read more of Ramya's technical articles, be sure to check out the related resources below. You can also sign up for the Lionbridge AI newsletter for technical articles delivered straight to your inbox.


Ramya Vidiyala

Ramya is a data nerd and a passionate writer who loves exploring and discovering meaningful insights from data. She writes articles on her Medium blog about ML and data science, where she shares her experiences to help readers understand concepts and solve problems. Reach out to her on Twitter (@ramya_vidiyala) to start a conversation!
