Fugue Generation through Genetic Algorithms

CS 473 Artificial Intelligence Project

Fugue Generation with Genetic Algorithms

Eric Milkie, Joel Chestnutt

Cornell University

1 Introduction

Our problem was to be able to create original pieces of music, fugues, from scratch. Fugues have a recurring melody, known as the subject, which is repeated throughout the piece, and the other parts of the fugue (answer and counter-subjects) are then based around the subject, within certain rules and guidelines. To be able to create fugues, our program first had to generate a unique subject, and then be able to create an answer, and counter-subjects from the subject, and finally be able to put them all together into a complete fugue. Music composition is an area that computers have not had much success in, largely due to the fact that it’s hard to define exactly what "sounds good", and be able to create something that is different, yet which also sounds good. Our system is able to successfully generate subjects, and from there create the beginning of a fugue, but is unable to generate complete fugues, due to the complexity involved. And while many of the pieces generated by our system have been rated highly by our test listeners, they are still not competition for a human composer.

2 Problem Definition and Task

2.1 Task Definition

Our program will output a brief fugue, as well as report the fitness of the subject used, and report the results from our evaluation metrics. For inputs, we simply changed the values of the constants in the program to whatever we wished to try. We could can the size of the population in the genetic algorithm, the number of generations it will run, and we can adjust all the weights in our fitness function.

2.2 Algorithm Definition

There are two main parts to our program: subject generation, and fugue generation from the subject. The first part is accomplished through the use of a genetic algorithm that tries random subjects, and then mutates and crosses them to slowly improve them into something decent. For the genetic algorithm code, we used a pre-built genetic algorithm structure and customized it to work with music by extending it to use our data structure and fitness function. For MIDI file production, we used a Java music production system called JMusic, available for free.

GA data structure:

The entire subject is represented as a bit string, where every 9 bits is one "bin", which represents a 16^th note. The entire subject is 4 measures long, so the subject is 64 bins long. The advantage of this structure is that every subject in the population is exactly the same size, regardless of the rhythms or amount of notes contained in it.

Within each bin, the first 7 bits represent the pitch of the note, on a midi scale of 0 to 127. The last 2 bits in the bin represent the "style" of the bin. The style is either a link to the next bin, a rest, or the end of a note. In this way, several adjacent bins with the style of link combine to form longer notes. Since there are only 3 style possibilities, and 4 possible permutations of the 2 style bits, we allow 2 of the permutations to represent the link style, so that our initial random subjects, and mutations that occur will tend to favor linking notes together, so that we do not simply end up with lots of 16^th notes.

We wanted to allow any possible combination of bits to be a legal subject, so we had to define the following cases:

In a group of linked bins, the pitch values do not have to match. The pitch of the note is determined by the pitch value of the first bin in the group.
If a group of linked bins links to a rest, the bin before the rest counts as an end of the note instead, and the rest is not linked in.
Likewise, if the subject ends on a bin with a link style, it just counts as the end of the note (and the subject).

GA Fitness Function:

Our fitness function parses through the bins, picks out the notes, and looks at different features of the subject. It either encourages or discourages them by adding to or subtracting from the fitness value of the subject. Below are the features evaluated:

1. Similar pitch: To avoid the tune jumping around randomly between distant notes, each note is discouraged being different from its predecessor. There is no penalty for a difference of up to 2 whole steps away, but beyond that, the subject is penalized proportionally to the square of the difference between each note and its predecessor.

2. Similar note length: This test compares a note’s length to the length of the previous note in the subject, and encourages them to be alike. This helps lend continuity to the piece by encouraging 16^th or 8^th note runs, or slower quarter notes to be grouped together, rather than the rhythm switching around arbitrarily.

3. Power of 2: The next test is whether a note’s length is a power of 2 or not. If it is, the subject is rewarded. The goal of this reward is to encourage more 16^th, 8^th, quarter, and half notes in a piece, rather than dotted eighth and dotted quarter notes. These other notes will still appear, but will be less common, helping to piece to sound more regular.

4. Tie over a bar line: We discourage notes from being held over between measures (the last bin in a measure’s style tells us this). This helps keep the measures separate and distinct, as they should be.

5. Notes start on 1^st or 3^rd bin of a beat: By encouraging notes that start on the 1^st or 3^rd bins of a beat, we encourage a regular beat throughout the subject. This keeps the subject from sounding syncopated or lost, because it tends to come back to eighth not boundaries (the first and third bins in each set of 4).

6. In key: We reward notes for being in the key of C. This goes a long way toward making the subject sound better. It removes the majority of sharps and flats from the subject, making it much nicer to listen to.

7. Not a rest: Since rests have none of the penalties of the normal notes (pitch being the main penalty), they tend to accumulate in subjects. This test rewards a subject for each bin that is not a rest, thus combating the penalties of the notes and keeping the subject from becoming overrun with rests.

8. Note bonus: Because of the penalties that notes tend to receive, another pattern that pops up is that very long notes appear, often the entire length of a measure of more. This results in a rather boring subject. To fight this effect, we give a bonus to the subject for every note it contains. This helps to keep the average value of a note positive, and results in many more notes per fugue than when the average value of a note is negative.

9. Out of range penalty: Some subjects would not have pitches differing from each other largely, but they would all end up changing in the same direction, which would result in the subject going very high, or very low. This test checks to see if the pitch of a note is outside of our predefined range, and penalizes it if it is.

Using this fitness function, we run our genetic algorithm for several thousand generations, and then take the subject in our population with the highest fitness, and move on to the next part of our program with it.

Figure 1, example subject (from 5000-7.mid)

Generating the answer and counter-subjects

Before we move on to the answer and counter-subjects, our program first parses through the bins and creates a new representation of the subject in a format better suited to the rest of the program. It creates a Phrase object, which is made up of Note objects, each of which has a pitch and a rhythm value (quarter note, eighth note, etc.). The subject is one Phrase, as is the answer and each counter subject. These Phrases are then added to Parts. There are three Parts in a fugue: soprano, alto, and bass. A different instrument plays each Part. We then add the three Parts to a Score object, which we can then output to a midi file. To do all of this, though, we must first generate the Phrases we will use.

Generating the answer phrase:

The answer is simply the same rhythm as the subject, but with all the notes transposed down a third. This is simple to do once we have the subject.

Generating the counter-subjects:

The counter-subjects are designed to be played alongside the subject and answer, and should complement them nicely. To create a good counter-subject, we parse through the notes in the subject (or answer), looking at the different possible cases in each beat:

2 eighths: To complement this we decide randomly to either join the eighth notes into one quarter note, or split them into 4 sixteenth notes. If they are split into 16^ths, the new notes are created by randomly going up or down in pitch from the eighths.
4 sixteenths: If we have a run of 4 sixteenths, we put 2 eighth notes in the counter-subject, thus accenting the 1^st and 3^rd notes of the run.
1 quarter note: We break quarter notes into 2 eighth notes, in the same fashion that we broke the eighth notes in the first case above.
3 notes: When we have some configuration of 3 notes in a beat, we replace it in the counter-subject with one quarter note to run beside it.
There is also the case where the notes in the subject/answer do not evenly break on beat boundaries. In that case, we randomly decide whether to just place the notes into the counter-subject as they are in the subject/answer, or to break up the note crossing the beat boundary, so we can apply one of the 4 cases above to the beat.

Also, the final generated counter-subject is transposed up one third from the subject/answer it was generated from.

We need to generate 2 counter-subjects to write the fugue, but since our counter-subject generating algorithm has a significant amount of randomness built into it, we can just call it twice (once for the answer, once for the subject), and we get 2 different counter-subjects to use.

Once we have the subject, answer, and counter-subjects, we can then build our little fugue, using the rules of how a fugue should be arranged:

Soprano:	Subject	Counter-Subject 1	Counter-Subject 2
Alto:		Answer	Counter-Subject 1
Bass:			Subject

We then end our fugue with a static final chord (major triad in C). This gives the fugue closure, and works better than trying to encourage the genetic algorithm to produce fugues that cadence properly. This is because the combination of the three parts is not being done with a genetic algorithm.

3 Experimental Evaluation
3.1 Methodology

A major difficulty arises when one tries to tackle the problem of evaluating an artificial-intelligence-based music generation algorithm: what constitutes "good" music can vary widely from person to person. This is clearly illustrated by the fact that some people love classical music, some people like rap, and some people like both. When these people are asked why they like a certain genre of music, or a certain piece, they find it difficult to pin down concrete reasons, or the reasons they give are qualitative. Thus, quantitatively evaluating our fugue generator suffers from the same problems: people somewhat disagree on what makes music sound good, and what people use to describe good music is mostly qualitative.

The fugue format itself causes problems, since what exactly is and is not a fugue is subject to debate. Each of our generated fugues fits the loose technical definition of a fugue as originally presented by Bach, so this cannot be used as a measure of quality.

Despite these problems, we attempted to produce scores for each generated fugue. One metric we used was the combined distance in pitch of each of the penultimate notes in each part (soprano, alto, and bass) from their respective ultimate notes. This was a suitable metric because the penultimate notes were generated by the genetic algorithm but the ultimate notes were not; they were just tacked on the end in an attempt to make the fugue end on a perfect triad chord in the correct key. Therefore, the larger the "jump" (distance in pitch) between the next-to-last and last notes, the more jarring and bad the fugue would sound. At first, we made the fitness function favor the last note of the generated phrase being close to the (known) final note of the piece, but we discovered that it detracted from the overall fugue since it tended to make each section cadence instead of just the final section (this is unfuguelike and undesirable). This metric is thus rather contrived, since to obtain good scores we could just adjust the fitness function.

3.2 Results

Lacking any overall quality metric that would be easy to compute, we decided to rank the fugues we produced with a five point scale of goodness, with 1 being poorest and 5 being best. Data for the following tables is included in the appendix.

Several fugues for each generation-number were generated, and five people were asked to rate each of the fugues on the 1-5 scale. Their ratings for the fugues in each generation-number were averaged, and then all the subjects’ ratings were averaged together (this is what appears in the graph above). The graph shows the subjects’ general affinity for fugues generated with 5000 iterations over the other fugues.

This graph shows that 5000-iteration fugues also had the highest overall percentage of "4" or "5" ratings from all the subjects.

Additionally, we took data on fitness values produced for different generation (iteration) sizes. A table of this data is contained in the appendix. The following chart shows a steep rising slope for low numbers of generations, but then a leveling off after about 3000 generations. This suggests that running the algorithm for longer than 3000 gives only a trivial amount of improvement.

3.3 Discussion

Despite having seemingly introduced contrived adjustments to the fugue generator’s fitness function, we think our generator still produces good sounding fugues. The individual affinities we included in the fitness function, such as concerning pitch locality, rhythm patterns, et cetera, are not especially noticeable in the completed fugue unless they are pointed out beforehand.

4 Related Work

Little serious effort has been put into generating computer music "from scratch", as we have attempted to do here. Some use a provided chord progression to create a musical line; others may seed a genetic algorithm with actual musical phrases. Our project is the only one we know of that actually starts out with completely random notes in the initial population.

Other artificially generated music projects only attempt to make a simple musical piece and do not fit the rigid structure of a specific musical format such as the fugue. One particularly interesting attempt shows the musical piece graphically in a web browser as the genetic algorithm is generating it.*

5 Future Work

The fugue generator does not produce complete, robust fugues. We did not attempt to introduce episodes, which are loosely based on the fugue subject and drift from key to key. These do not lend themselves nicely to simple heuristics that can be simply incorporated into the fitness function. Perhaps a different tool than genetic algorithms would be better to use here.

The level of rhythmic and pitch complexity is rudimentary in our generated fugues. More could be done to add modulations, an appropriate tempo, and instrument ("program" in MIDI) selection.

6 Conclusion

We believe this project illustrates the power of using artificial intelligence techniques in a field normally reserved for humans with special inspiration – inspiration for composing music. Obviously, no one will be fooled that our generated fugues were composed outright by a human, but the potential is clearly there. This project adds legitimacy to the field, and while it will not be putting composers out of business any time soon, it could become a valuable composition tool.

Appendix

Four MIDI files have been included as example generated fugues. The first number in the filename is the number of iterations the genetic algorithm ran. The second number is simply a sequence label. Listen for each of the three parts to enter after each of four measures, and notice that each newly entering part has the original fugue theme introduced by the solitary clarinet at the beginning.

Average fitness function values for selected numbers of generations

generations	fitness
100	2178.1
200	3119.5
300	3233.7
400	3251.6
500	3312.1
600	3319.7
700	3330.5
800	3359.3
900	3393
1000	3387.9
2000	3470.4
3000	3548.9
4000	3561.4
5000	3594.5
6000	3619
7000	3649.9

Average scores (ranging from 1 to 5)

generations \| human subject #	1	2	3	4	5	average	% rated good (4 or 5)
1000	2.666	2.333	2.333	4	4	3.0664	0.466667
2000	2.333	2	3.333	2	2.666	2.4664	0.133333
3000	2.666	3.666	3.333	2.333	3	2.9996	0.333333
4000	4.666	3.666	3	2.333	3.666	3.4662	0.466667
5000	2.666	4.333	4.333	4	4	3.8664	0.8
6000	3.666	2.666	4.333	2.333	4	3.3996	0.466667
7000	3	2.333	2.666	3.666	2	2.733	0.333333