CS 473 Artificial Intelligence Project

Fugue Generation with Genetic Algorithms

Eric Milkie, Joel Chestnutt

Cornell University

 

1 Introduction

Our problem was to be able to create original pieces of music, fugues, from scratch. Fugues have a recurring melody, known as the subject, which is repeated throughout the piece, and the other parts of the fugue (answer and counter-subjects) are then based around the subject, within certain rules and guidelines. To be able to create fugues, our program first had to generate a unique subject, and then be able to create an answer, and counter-subjects from the subject, and finally be able to put them all together into a complete fugue. Music composition is an area that computers have not had much success in, largely due to the fact that it’s hard to define exactly what "sounds good", and be able to create something that is different, yet which also sounds good. Our system is able to successfully generate subjects, and from there create the beginning of a fugue, but is unable to generate complete fugues, due to the complexity involved. And while many of the pieces generated by our system have been rated highly by our test listeners, they are still not competition for a human composer.

 

2 Problem Definition and Task

 

2.1 Task Definition

Our program will output a brief fugue, as well as report the fitness of the subject used, and report the results from our evaluation metrics. For inputs, we simply changed the values of the constants in the program to whatever we wished to try. We could can the size of the population in the genetic algorithm, the number of generations it will run, and we can adjust all the weights in our fitness function.

 

2.2 Algorithm Definition

There are two main parts to our program: subject generation, and fugue generation from the subject. The first part is accomplished through the use of a genetic algorithm that tries random subjects, and then mutates and crosses them to slowly improve them into something decent. For the genetic algorithm code, we used a pre-built genetic algorithm structure and customized it to work with music by extending it to use our data structure and fitness function. For MIDI file production, we used a Java music production system called JMusic, available for free.

GA data structure:

The entire subject is represented as a bit string, where every 9 bits is one "bin", which represents a 16th note. The entire subject is 4 measures long, so the subject is 64 bins long. The advantage of this structure is that every subject in the population is exactly the same size, regardless of the rhythms or amount of notes contained in it.

Within each bin, the first 7 bits represent the pitch of the note, on a midi scale of 0 to 127. The last 2 bits in the bin represent the "style" of the bin. The style is either a link to the next bin, a rest, or the end of a note. In this way, several adjacent bins with the style of link combine to form longer notes. Since there are only 3 style possibilities, and 4 possible permutations of the 2 style bits, we allow 2 of the permutations to represent the link style, so that our initial random subjects, and mutations that occur will tend to favor linking notes together, so that we do not simply end up with lots of 16th notes.

We wanted to allow any possible combination of bits to be a legal subject, so we had to define the following cases:

GA Fitness Function:

Our fitness function parses through the bins, picks out the notes, and looks at different features of the subject. It either encourages or discourages them by adding to or subtracting from the fitness value of the subject. Below are the features evaluated:

1. Similar pitch: To avoid the tune jumping around randomly between distant notes, each note is discouraged being different from its predecessor. There is no penalty for a difference of up to 2 whole steps away, but beyond that, the subject is penalized proportionally to the square of the difference between each note and its predecessor.

2. Similar note length: This test compares a note’s length to the length of the previous note in the subject, and encourages them to be alike. This helps lend continuity to the piece by encouraging 16th or 8th note runs, or slower quarter notes to be grouped together, rather than the rhythm switching around arbitrarily.

3. Power of 2: The next test is whether a note’s length is a power of 2 or not. If it is, the subject is rewarded. The goal of this reward is to encourage more 16th, 8th, quarter, and half notes in a piece, rather than dotted eighth and dotted quarter notes. These other notes will still appear, but will be less common, helping to piece to sound more regular.

4. Tie over a bar line: We discourage notes from being held over between measures (the last bin in a measure’s style tells us this). This helps keep the measures separate and distinct, as they should be.

5. Notes start on 1st or 3rd bin of a beat: By encouraging notes that start on the 1st or 3rd bins of a beat, we encourage a regular beat throughout the subject. This keeps the subject from sounding syncopated or lost, because it tends to come back to eighth not boundaries (the first and third bins in each set of 4).

6. In key: We reward notes for being in the key of C. This goes a long way toward making the subject sound better. It removes the majority of sharps and flats from the subject, making it much nicer to listen to.

7. Not a rest: Since rests have none of the penalties of the normal notes (pitch being the main penalty), they tend to accumulate in subjects. This test rewards a subject for each bin that is not a rest, thus combating the penalties of the notes and keeping the subject from becoming overrun with rests.

8. Note bonus: Because of the penalties that notes tend to receive, another pattern that pops up is that very long notes appear, often the entire length of a measure of more. This results in a rather boring subject. To fight this effect, we give a bonus to the subject for every note it contains. This helps to keep the average value of a note positive, and results in many more notes per fugue than when the average value of a note is negative.

9. Out of range penalty: Some subjects would not have pitches differing from each other largely, but they would all end up changing in the same direction, which would result in the subject going very high, or very low. This test checks to see if the pitch of a note is outside of our predefined range, and penalizes it if it is.

Using this fitness function, we run our genetic algorithm for several thousand generations, and then take the subject in our population with the highest fitness, and move on to the next part of our program with it.

Figure 1, example subject (from 5000-7.mid)

Generating the answer and counter-subjects

Before we move on to the answer and counter-subjects, our program first parses through the bins and creates a new representation of the subject in a format better suited to the rest of the program. It creates a Phrase object, which is made up of Note objects, each of which has a pitch and a rhythm value (quarter note, eighth note, etc.). The subject is one Phrase, as is the answer and each counter subject. These Phrases are then added to Parts. There are three Parts in a fugue: soprano, alto, and bass. A different instrument plays each Part. We then add the three Parts to a Score object, which we can then output to a midi file. To do all of this, though, we must first generate the Phrases we will use.

Generating the answer phrase:

The answer is simply the same rhythm as the subject, but with all the notes transposed down a third. This is simple to do once we have the subject.

Generating the counter-subjects:

The counter-subjects are designed to be played alongside the subject and answer, and should complement them nicely. To create a good counter-subject, we parse through the notes in the subject (or answer), looking at the different possible cases in each beat:

Also, the final generated counter-subject is transposed up one third from the subject/answer it was generated from.

We need to generate 2 counter-subjects to write the fugue, but since our counter-subject generating algorithm has a significant amount of randomness built into it, we can just call it twice (once for the answer, once for the subject), and we get 2 different counter-subjects to use.

Once we have the subject, answer, and counter-subjects, we can then build our little fugue, using the rules of how a fugue should be arranged:

Soprano:

Subject

Counter-Subject 1

Counter-Subject 2

Alto:

 

Answer

Counter-Subject 1

Bass:

   

Subject

We then end our fugue with a static final chord (major triad in C). This gives the fugue closure, and works better than trying to encourage the genetic algorithm to produce fugues that cadence properly. This is because the combination of the three parts is not being done with a genetic algorithm.

 

3 Experimental Evaluation

3.1 Methodology

A major difficulty arises when one tries to tackle the problem of evaluating an artificial-intelligence-based music generation algorithm: what constitutes "good" music can vary widely from person to person. This is clearly illustrated by the fact that some people love classical music, some people like rap, and some people like both. When these people are asked why they like a certain genre of music, or a certain piece, they find it difficult to pin down concrete reasons, or the reasons they give are qualitative. Thus, quantitatively evaluating our fugue generator suffers from the same problems: people somewhat disagree on what makes music sound good, and what people use to describe good music is mostly qualitative.

The fugue format itself causes problems, since what exactly is and is not a fugue is subject to debate. Each of our generated fugues fits the loose technical definition of a fugue as originally presented by Bach, so this cannot be used as a measure of quality.

Despite these problems, we attempted to produce scores for each generated fugue. One metric we used was the combined distance in pitch of each of the penultimate notes in each part (soprano, alto, and bass) from their respective ultimate notes. This was a suitable metric because the penultimate notes were generated by the genetic algorithm but the ultimate notes were not; they were just tacked on the end in an attempt to make the fugue end on a perfect triad chord in the correct key. Therefore, the larger the "jump" (distance in pitch) between the next-to-last and last notes, the more jarring and bad the fugue would sound. At first, we made the fitness function favor the last note of the generated phrase being close to the (known) final note of the piece, but we discovered that it detracted from the overall fugue since it tended to make each section cadence instead of just the final section (this is unfuguelike and undesirable). This metric is thus rather contrived, since to obtain good scores we could just adjust the fitness function.

 

3.2 Results

Lacking any overall quality metric that would be easy to compute, we decided to rank the fugues we produced with a five point scale of goodness, with 1 being poorest and 5 being best. Data for the following tables is included in the appendix.

Several fugues for each generation-number were generated, and five people were asked to rate each of the fugues on the 1-5 scale. Their ratings for the fugues in each generation-number were averaged, and then all the subjects’ ratings were averaged together (this is what appears in the graph above). The graph shows the subjects’ general affinity for fugues generated with 5000 iterations over the other fugues.

This graph shows that 5000-iteration fugues also had the highest overall percentage of "4" or "5" ratings from all the subjects.

Additionally, we took data on fitness values produced for different generation (iteration) sizes. A table of this data is contained in the appendix. The following chart shows a steep rising slope for low numbers of generations, but then a leveling off after about 3000 generations. This suggests that running the algorithm for longer than 3000 gives only a trivial amount of improvement.

3.3 Discussion

Despite having seemingly introduced contrived adjustments to the fugue generator’s fitness function, we think our generator still produces good sounding fugues. The individual affinities we included in the fitness function, such as concerning pitch locality, rhythm patterns, et cetera, are not especially noticeable in the completed fugue unless they are pointed out beforehand.

 

4 Related Work

Little serious effort has been put into generating computer music "from scratch", as we have attempted to do here. Some use a provided chord progression to create a musical line; others may seed a genetic algorithm with actual musical phrases. Our project is the only one we know of that actually starts out with completely random notes in the initial population.

Other artificially generated music projects only attempt to make a simple musical piece and do not fit the rigid structure of a specific musical format such as the fugue. One particularly interesting attempt shows the musical piece graphically in a web browser as the genetic algorithm is generating it.*

 

5 Future Work

The fugue generator does not produce complete, robust fugues. We did not attempt to introduce episodes, which are loosely based on the fugue subject and drift from key to key. These do not lend themselves nicely to simple heuristics that can be simply incorporated into the fitness function. Perhaps a different tool than genetic algorithms would be better to use here.

The level of rhythmic and pitch complexity is rudimentary in our generated fugues. More could be done to add modulations, an appropriate tempo, and instrument ("program" in MIDI) selection.

 

6 Conclusion

We believe this project illustrates the power of using artificial intelligence techniques in a field normally reserved for humans with special inspiration – inspiration for composing music. Obviously, no one will be fooled that our generated fugues were composed outright by a human, but the potential is clearly there. This project adds legitimacy to the field, and while it will not be putting composers out of business any time soon, it could become a valuable composition tool.

 

Appendix

Four MIDI files have been included as example generated fugues. The first number in the filename is the number of iterations the genetic algorithm ran. The second number is simply a sequence label. Listen for each of the three parts to enter after each of four measures, and notice that each newly entering part has the original fugue theme introduced by the solitary clarinet at the beginning.

4000-6.mid

5000-7.mid

5000-8.mid

6000-8.mid

Average fitness function values for selected numbers of generations

generations

fitness

100

2178.1

200

3119.5

300

3233.7

400

3251.6

500

3312.1

600

3319.7

700

3330.5

800

3359.3

900

3393

1000

3387.9

2000

3470.4

3000

3548.9

4000

3561.4

5000

3594.5

6000

3619

7000

3649.9

Average scores (ranging from 1 to 5)

generations | human subject #

1

2

3

4

5

average

% rated good (4 or 5)

1000

2.666

2.333

2.333

4

4

3.0664

0.466667

2000

2.333

2

3.333

2

2.666

2.4664

0.133333

3000

2.666

3.666

3.333

2.333

3

2.9996

0.333333

4000

4.666

3.666

3

2.333

3.666

3.4662

0.466667

5000

2.666

4.333

4.333

4

4

3.8664

0.8

6000

3.666

2.666

4.333

2.333

4

3.3996

0.466667

7000

3

2.333

2.666

3.666

2

2.733

0.333333