HL Class Task: Merging two sorted files

One of the past paper questions in the File Organization presentation asks what you would do if you had 10,000 large data objects that you wanted to sort, but you couldn’t read them all into RAM at the same time because they are too big.

The answer to this is to read small batches of data into RAM, sort them there using some sorting algorithm like quicksort, and then write them back to the disk in order. Do this repeatedly and you will have, say, 100 batches of 100 sorted objects. Then you can take pairs of 100 objects and merge them into 200 sorted objects, then take pairs of 200 objects and merge them into 400 objects, 800, 1600, and so on until the whole data set is merged.

The point is that you can merge two sorted lists without reading the lists into RAM first. This is a useful technique.In fact it is one of the HL program dossier mastery factors.

Imagine you have list A and list B. This is the algorithm to merge them in ascending order:

Read an item from A and an item from B off the disk
While there are still items
   Compare the two items you have
   If item A < item B then 
        write item A to the disk
        read an item from A
   If item B < item A
        write item B to the disk
        read an item from B
Loop

In this way you only ever have two items in RAM at the same time.

Task:

  • Create a class FileMerger to perform this merge on the two lists given below. You will need to save the lists to two separate .txt files.
  • The class should have three private class variables of type File called inputFileOne, inputFileTwo and outputFile.
  • The class should have a constructor which takes two file paths and initializes the files to be sorted (ie creates new File objects and assigns them to inputFileOne and inputFileTwo).
  • The class should have a method merge, which merges the inputFileOne and inputFileTwo into outputFile.
  • The file path of outputFile should be a class string constant.

You should email this work to me as a java file (code) and jpeg image (sample output) before the beginning of next Wednesday’s lesson (April 18). The lists you need are below.

Save this data to a file called BlockE.txt

Belmonte, Joseph_Kirby
Cruz, Ma. Victoria
De Villa, Nathan Dela
Doherty, Camila
Evangelista, Antonio
Leber, Todd_Justin
Linell, Shay
Mohd Farid Shah, Aina
Nathan, Abigail_Rose
Nishimori, Megumi_Karla
Ohyama, Shun
Panlilio, Emmanuel_Mikel
Sitorus, Anggiat_Bright
Stray, Carl_Fredrik
Suzuki, Hirokazu
Symes, Alannah
Villafuerte, Vincenzo Renato
Zakaria, Muhammad

Save this data to a file called BlockH.txt

Alterado, Apple
Bengzon, Leandro
Cheng, Chelsea_Anne
Choi, Jun_Hwan
Choudhary, Pratik
Guzman, Mariel_Andrea
Klingaman, Adam_Christophe
Lee, Yong_Seok
Leone, Marco
Mendoza, Reine_Christine
Mitra, Sougata
Paterno, Margarita
Quah, Isabel Pei-Yi
R Pannirsilvam, Sanjeevan
Shull, Erin
Sicat, Matthew
Toppari, Rasmus_Eemeli
Truby, Karissa_Bristyn
Wright, Lillian_Rose
Young, Daniel

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s