Basically, I wanted to implement some kind of model myself, and I decided on the LSTM (Long Short-Term Memory). It’s a bit early, and I don’t want to explain it all out, so I’ll just link a good explanation of LSTMs here.
Implementation is easy:
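Here’s a minimal sketch of what such a model can look like in PyTorch (the class name and layer sizes are just illustrative, not necessarily what I actually trained):

```python
# Minimal sketch of an LSTM next-event model (sizes are illustrative).
import torch.nn as nn

class MusicLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # event token -> vector
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)     # logits over the next event

    def forward(self, tokens, state=None):
        x = self.embed(tokens)            # (batch, seq_len, embed_dim)
        out, state = self.lstm(x, state)  # (batch, seq_len, hidden_dim)
        return self.head(out), state
```

The model just predicts the next event token at every step, so generating music is a matter of sampling one event at a time.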
The main question is how to represent the musical data and feed it into the model.
Here’s a chart showing the different events I recorded in my representation.
- Volume and tempo events set the volume and tempo for the sequence until a new volume or tempo is encountered.
- There are 128 note values in MIDI, so there are 128 note-on events and 128 note-off events.
- Pause events represent the time gap between one event and the next. There are 48 pause values, one for each multiple of 1/12 beat from 1/12 beat up to a full 4 beats; the 1/12-beat step covers both triplets and sixteenths (see the sketch after this list).
- Pedal is pretty self-explanatory.
- Start/End represents the start or end of a sequence.
- I included an artist ID in each sequence so the model could differentiate between different artists and generate in different styles.
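Since the pause events are the only slightly fiddly part, here’s roughly how the quantization works, with 48 steps of 1/12 beat each (a sketch, not my exact code):

```python
# Sketch: snap a time gap (in beats) to one of 48 pause values.
# Steps of 1/12 beat cover both triplets (1/3 beat) and sixteenths (1/4 beat).
def pause_index(gap_in_beats: float) -> int:
    steps = round(gap_in_beats * 12)   # nearest multiple of 1/12 beat
    steps = max(1, min(steps, 48))     # clamp to [1/12 beat, 4 beats]
    return steps - 1                   # index 0..47
```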
An example sequence might look something like this:
[326, 322, 8, 2, 92, 274, 220]
In a more readable format, it’s this:
['Beethoven', 'Start', '60bpm', 'Volume 2', 'Note 77 On', 'Wait 1/6 beat', 'Note 77 Off']
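Those integer IDs come from packing every event type into one flat vocabulary, with each type getting its own range of indices. The layout below is illustrative (the group sizes and offsets don’t exactly reproduce my numbering above), but the idea is the same:

```python
# Sketch: lay out all event types in one flat token vocabulary.
# Group sizes here are illustrative, not my exact counts.
GROUPS = [
    ("volume",    8),    # volume levels
    ("tempo",     8),    # tempo values
    ("note_on",   128),  # one per MIDI note value
    ("note_off",  128),
    ("pause",     48),   # 1/12-beat steps up to 4 beats
    ("pedal",     2),    # pedal down / up
    ("start_end", 2),    # sequence start / end
    ("artist",    4),    # one ID per artist in the dataset
]

OFFSETS, cursor = {}, 0
for name, size in GROUPS:
    OFFSETS[name] = cursor
    cursor += size
VOCAB_SIZE = cursor

def token(kind: str, index: int) -> int:
    """Map an event (e.g. note_on 77) to its integer token ID."""
    return OFFSETS[kind] + index

# token("note_on", 77) -> the ID standing for 'Note 77 On' under this layout
```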
An LSTM isn’t really the most advanced or effective choice for this problem, but it still works decently.
While training this model on the same Beethoven dataset I used for the Music Transformer, it reached about 75% train accuracy but only around 25% validation accuracy. This is a phenomenon called overfitting, where the model gets so focused on the training data that it doesn’t generalize to data it hasn’t seen. To combat this, I tried adding a dropout layer, which randomly drops neurons during training so the model can’t get stuck relying on any single pattern. Validation accuracy increased to around 35%, but it still wasn’t great.
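For reference, adding dropout to the sketch from earlier looks something like this, between the LSTM layers and before the output head (the 0.3 rate is illustrative, not a tuned value):

```python
# Sketch: the same model with dropout added (rate is illustrative).
import torch.nn as nn

class MusicLSTMWithDropout(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512,
                 num_layers=2, dropout=0.3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers,
                            batch_first=True, dropout=dropout)  # applied between LSTM layers
        self.drop = nn.Dropout(dropout)                         # applied before the output head
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        x = self.embed(tokens)
        out, state = self.lstm(x, state)
        return self.head(self.drop(out)), state
```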
Here are some samples:
The model doesn’t remember patterns well, and that’s pretty clear here: the output deteriorates and turns into pretty repetitive stuff. It also always seems to start with the same pattern, which is undesirable in my opinion.
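For context, generation is the usual autoregressive loop: seed the model with the artist ID and start token, sample the next event, feed it back in, repeat. A rough sketch (not my exact code; the temperature parameter is just one generic knob for adding variety):

```python
# Sketch: autoregressive sampling from the trained model (not my exact code).
import torch

@torch.no_grad()
def generate(model, seed_tokens, end_token, max_events=500, temperature=1.0):
    model.eval()
    tokens = list(seed_tokens)            # e.g. [artist_id, start_token]
    inp = torch.tensor([tokens])          # shape (1, seed_len)
    state = None
    for _ in range(max_events):
        logits, state = model(inp, state)
        probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
        next_tok = torch.multinomial(probs, 1).item()
        if next_tok == end_token:
            break
        tokens.append(next_tok)
        inp = torch.tensor([[next_tok]])  # only the new token; the LSTM state keeps the context
    return tokens
```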