Parallel Programming

For this assignment, you are expected to implement a parallel programming scheduler. The task of the scheduler is to decide how to use the inter-process communication most effectively to crack an RSA key using a brute force technique.

Cracking the RSA Key

An RSA key is just a large number that is a product of two large distinct primes, this number is the public key used for most encryption systems on the web today. Cracking the RSA key is a matter of finding the two prime numbers used to generate the key; normally these prime numbers are very large but for this assignment they will be small and therefore breakable.

The brute force technique is very simple algorithmic attack on the key, the algorithm tries every prime number combination on the key until it finds the two primes that are the greatest common divisor (kind of like trial and error).

Example:
N will be the variable of the key you are trying to crack.
P and Q will be the variables used to attack N.

In a small example of this attack a number N will be generated for you, like the number 77. Your program would then try to crack this number by searching all the prime numbers that could be the greatest common divisor of this number. First it might try 3 and 5 (3*5=15, 15!=77), then 3 and 7 (3*7=21, 21!=77) and so on until if find a combination of primes like 7 and 11 that produce N. So the solution would be P=7 and Q=11, this solves for our key N so we could now decrypt any messages that were encrypted using N.

Your algorithm doesn't need to check prime values greater than the square root of N.

So as you can see the "attack" is nothing more that trial and error division against the number N. This attack will take a long time for one process on one machine to complete, this is why you will be creating a parallel process program that can break the key. You program will be run only on one machine for testing, but because you are using the MPI interface it can be configured to run on a distributed network of machines.

MPI Programming

The MPI programming standard is for the Message Passing Interface, it's a simple way of passing data between programs through a standard API. We're using the MPI-1 standard which also has some things from MPI-2 in it, but you can use the MPI-1 Docs as your reference.

The LAM/MPI implementation is available on most Linux distributions if you'd like to install it yourself there are RPMS and ebuilds.

Here is an example of how to calculate π using the MPI toolkit. Note the for loop algorithm for splitting the calculations up among the processes. It would be a good idea to search google for more tutorials on MPI programming as you'll need to familiar with the MPI interface to complete this assignment.

		
#include "mpi.h" 
#include <math.h>

int main (int argc, char ** argv[]) { 

  int done = 0, n, myid, numprocs, i, rc; 
  double PI25DT = 3.141592653589793238462643; 
  double mypi, pi, h, sum, x, a;

  /*  MPI_Init always takes &argc and &argv and looks for
      MPI specific arguements like -np <num processors> */ 
  MPI_Init(&argc,&argv);

  /* this says how processes are running - related to 
     the -np ; if don't specify then just get 1  */  
  MPI_Comm_size(MPI_COMM_WORLD,&numprocs); 

  /* each process 0 to <numprocesses -1> gets a unique id */
  MPI_Comm_rank(MPI_COMM_WORLD,&myid); 
  
  printf("this is node %d of %d total\n", myid, numprocs);

  while (!done) 
    { 
      /* only process 0 asks the user for input */
      if (myid == 0) { 
	printf("Enter the number of intervals: (0 quits) "); 
	scanf("%d",&n); 
      } 

      /* process 0 sends the value of n to all other processes */  
      MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD); 

      /* if user entered 0 then break */
      if (n == 0) break; 
      
      /* each process takes a different portion of the calculation */
      h = 1.0 / (double) n; 
      sum = 0.0; 

      /* Example if n is 10 and numprocs is 5: 
         0 will take 1,6
         1 will take 2,7
	 2 will take 3,8
	 4 will take 4,9
	 5 will take 5,10
	 */ 
      for (i = myid + 1; i <= n; i += numprocs) { 
	x = h * ((double)i - 0.5); 
	sum += 4.0 / (1.0 + x*x); 
      } 

      mypi = h * sum;

      printf("node %d: local pi value is %.16f\n", myid, mypi);

      /* this sums together every processes mypi variable and
         puts it into the the pi variable */  
      MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

      /* only process 0 prints it out */
      if (myid == 0) 
	printf("pi is approximately %.16f, Error is %.16f\n", 
	       pi, fabs(pi - PI25DT)); 

    } 

    MPI_Finalize();
}

You can save this code to a file, call it pi.c. To compile this code use the mpicc command like this: mpicc pi.c. Now when you want to run the code use mpirun like this: mpirun -np 5 a.out. This will run 5 processes of the program running in parallel. This method of finding π here is explained in this email thread.

You'll need to run the command lamboot before you can use mpirun

Try printing out values like myid, numprocs, mypi, and pi at different sections of the code to get an understanding for how this works.

The mpirun -np 5 prog command causes 5 processes to be spawned off running the prog command. The processes are run on the same machine, but they could be spread to other machines and would be completely unaware of the change.

The MPI toolkit works similarly to how the pthread library works, with blocking and non-blocking calls. However MPI is be done entirely with different processes instead of different threads, the advantage here is that unlike threads these different processes can be run across the network to other machines and still communicate as if they were all on a single machine.

Important Note: The MPI libraries for this assignment are only installed in csug04.csuglab.cornell.edu and csug05.csuglab.cornell.edu. Attempts to use other Computer Science Undergraduate Lab Linux machines are unlikely to be successful.

The Assignment

Write a program using the MPI toolkit which distributes the attack on an RSA key between several n processes.

You will be given a function gen_num() that will generate an N value for you. Then it's up to you to determine how you will distribute the attack against N. You'll need to read the docs on GNU MP to see how to use the mpz_t types, they can't just be printed out or manipulated with normal arithmetic operations. The web page describes how to use the functions and almost all the library functions you'll need are already in the gen_num() function.

Take the mpi tar file and unpack somewhere. tar -xvf mpi.tar. You can test out a few different MPI programs in there simply using the make command. Try out these examples included:

make simple
make run-simple
make pi
make run-pi

You'll be working with the popsys.c file in this assignment. There are comments provided in the code for you to work with. The makefile will allow you to make and make run to test your popsys.c program.

Write Up

Answer the following questions:

Discuss what portions of the code are parallelized and which portions still execute on a single machine. Are there any calculations that are performed on all machines that could be performed on just one of the machines?
Can you achive superlinear speedup on this assignment? Why or why not? How fast does it run on 1 machines? on 2-5 machines? Calculate the speed-up.
On a single machine does it do you any good to use multiple processes?
If possible, try running it on some slower machines and some faster machines. How many slow machines does it take to run as fast as 1 fast machine?
As you increase number of MOD_SIZE bits from X to Y in N, how much longer does it take to crack it? (You can test this by passing in different MOD_SIZE values on the command line like this: mpirun -np 10 popsys 10)
How much does the crack time vary on N of same bits (lucky or not?) For computing the speedup, how important is to use the same N for the different runs?

To Hand in

You'll need to hand in the following:

tar file with your source files and Makefile.
Writeup detailed above