# (Java) Fast code to compare two Strings to see if one is contained in the other



## Lenny1882 (Nov 22, 2007)

I thought I'd try my hand at the famous infinite monkey theorem, and have built a Java program that randomly generates characters and outputs them to a file. I've got a dictionary of 58,000 words which I check against the string in

turn to see if they are contained in it.

I've managed to optimise the code that generates characters so that it generates one million characters in 0.4s (on my machine - on others it may vary) and can do so for as long as desired (achieved by writing the string of characters

in groups of 175 to a file, clearing the string, and repeating the process), but for the million characters, the original algorithms to compare the dictionary words with the generated characters took over a minute - I see little point

in having a program that can generate one million characters in 0.4s if it takes a minute to check them. I later hit upon the problem that if I tried to read a significant number of characters (~13,850,000 --- 13.6mb) from the text

file I'd run out of heap space (schoolboy error, I know).

I've spent about twelve hours trying to get the comparison part of the code running as quickly as the generating part:

- Split the generated characters into a string of a set length (from 175 characters to 50,000). Once the string was full, I checked the dictionary words against it, recorded any matches, cleared the string, and repeated the process.

- Split the generated characters into strings of a set length (again, 175 - 50,000), add the strings to an ArrayList when full (I also tried t with an array - divide the number of generated characters by the length of each string and

add 1 if (mod1) == 0 to get the dimension of the array), and comparing each dictionary word with each element in the ArrayList once all the generated characters had been added to a string (much like above, except the comparison comes

once everything is done, rather than on the fly)

I've tried variations on the same themes, but to no avail. Instead, the comparisons seem to have become slower. I'm going to try comparing the characters as they're generated (I doubt it'll do much, but it doesn't hurt to try).

So, the question: Any ideas as to how I can speed it up? Examples of code would be great if you have them (doesn't need to be in Java - I'm only doing it in Java as it's the language we use at Uni, and the program can then be used as

my [formative] personal project), but a good explanation should be enough for me to work it out if you don't.

Here are the main bits of code (I think I've taken out all of the UI elements). Sorry that it's a bit messy - I haven't had the need to clean it up (by all means, suggest things that can be changed if you really have nothing else to

do!):


```
import java.util.*;	//For Random Number Generator (RNG), ArrayList
import java.io.*;		//For file reading/writing
import javax.swing.*;		//For UI objects
import java.awt.*;		//For UI objects


public class RandomChar {
	
	public static String s = "";
		//String the generated characters will be added to
	

	/**
	 * Main method
	 * 
	 * Call: Get the NUMBER of CHARACTERS to be calculated
	 * Call: Get the NAME of the MONKEY
	 * 
	 * Call: GENERATE the CHARACTERS
	 * Call: WRITE the generated STRING to a file
	 * Call: COMPARE generated STRING with dictionary
	 * 
	 * @throws IOException
	 */
	public static void main(String[] args) throws IOException {
				
		int x = getNumChars();
		String name = getMonkeyName();

		generateChars(x);
		compare(x, name);
		
		System.out.println("Done");
	}
	
	
	/**
	 * Finds and returns the directory that the executable JAR
	 * file is in.
	 * 
	 * @return String
	 * @throws IOException
	 */
	public static String getDir() throws IOException {
		File dir1 = new File(".");		
		return(dir1.getCanonicalPath());
	}
	
	
	/**
	 * Gets NUMBER of CHARACTERS to GENERATE from User
	 * Validates
	 * 
	 * @return integer
	 */
	public static int getNumChars() {
		
		boolean error = true;
		String sNumChar = "";
		int iNumChar = 0;
		
		do{
    		try{
    			sNumChar = JOptionPane.showInputDialog( . . . );
	        	iNumChar = Integer.parseInt(sNumChar);
	        }catch(NumberFormatException e){
	        	if(sNumChar == null){
	        		iNumChar = 10000;
	        	}else{
	        		JOptionPane.showMessageDialog( error message );	
	        	}
	        }
                	
	        if((iNumChar > 2147483647) || (iNumChar < 10) && iNumChar != 0){
	        	JOptionPane.showMessageDialog( error message );
	        }else if((iNumChar <= 2147483647) && (iNumChar >= 10)){
	        	error = false;
	        }
		}while(error);
		
		return iNumChar;
	}
	
	
	/**
	 * Gets MONKEY NAME from User
	 * 
	 * @return String
	 */
	public static String getMonkeyName() {
		
		String name = JOptionPane.showInputDialog( . . . );
		
		if((name == null) || (name.equals(""))){
			name = "Your monkey";
		}
		
		return name;
	}
	
	
	/**
	 * Calls: RNG to GENERATE CHARACTERS
	 * Calls: makeSentence to ADD CHARACTERS to STRING
	 * 
	 * @param iNumChars
	 */
	public static void generateChars(int iNumChars) throws IOException {
		
		BufferedWriter writer = new BufferedWriter(new FileWriter(
                  new File(getDir() + "\\temp.txt")));
			
		int i = 0;
		
		for(int j = iNumChars; j > 0; j--){
			String a = nextChar();
			makeSentence(a);
			i++;
			
			if(i == 175){
				writer.write(s);
				s = "";
				i = 0;
			}
		}
		
		writer.write(s);
		writer.close();
	}
	
	
	/**
	 * Uses RNG to GENERATE NUMBER between 0 and 94
	 * Adds 32 to NUMBER GENERATED so between 32 and 126
	 * Converts decimal number to ASCII equivalent
	 * 
	 * Dec:  32 == ASCII: SPACE
	 * Dec: 126 == ASCII: ~
	 * 
	 * http://www.asciitable.com/
	 * 
	 * @return String
	 */
	public static String nextChar() {
		Random r = new Random();
		int i = r.nextInt(95) + 32;
		
		String b = "" + (char)i + "";
				
		return b;
	}
	
	
	/**
	 * Adds LAST CHARACTER GENERATED to STRING
	 * 
	 * @param a
	 */
	public static void makeSentence(String a) {
		s += "" + a + "";
	}

}
```
The comparison code (shown below, with bits of different things commented out) uses BufferedReaders to read the generated characters from the file they're written to (adding characters to strings and keeping them in memory are fine,

until the strings become millions of characters long), and indexOf() to check if the dictionary word (retrieved from dictionary.txt) is within the string.

The code in there works to a degree (it checks and takes its time about it, but it works. It just doesn't write the found words to the text document. I can fix that in my own time).


```
/**
	 * Gets LINE from dictionary.txt
	 * Each line is a single word
	 * Compares LINE with STRING - is LINE contained in STRING?
	 * 		- returns -1 if FALSE
	 * 		- returns other value (0? 1?) if TRUE
	 * 
	 * If LINE in STRING, appends written FILE
	 * 
	 * @throws IOException
	 */
	public static void compare(int iNumChar, String n) throws IOException {
		
		File temp = new File(getDir() + "\\temp.txt");
		File temp2 = new File(getDir() + "\\temp2.txt");
		File monkey = new File(getDir() + "\\monkey.txt");
		File dictionary = new File(getDir() + "\\dictionary.txt");
		
		int counter = 0;
		String line = "";
		
		BufferedReader in;
		BufferedReader mnky = new BufferedReader(new FileReader(temp));
		
		BufferedWriter writer = new BufferedWriter(new FileWriter(monkey));
		BufferedWriter writer2 = new BufferedWriter(new FileWriter(temp2));
		
		writer.write(n + " has typed: ");
		writer.newLine(); writer.newLine();
		
		try{
			in = new BufferedReader(new FileReader(dictionary));
		}catch(FileNotFoundException e){
			JOptionPane.showMessageDialog( File not found error message );
			System.exit(1);
		}
		
		in = new BufferedReader(new FileReader(dictionary));
		
		char t;
		String g = "";
		int j = 0;
		
		ArrayList<String> chars = new ArrayList<String>();
		
		int a = 0;
		
		for(int i = 0; i < iNumChar; i++){
			
			t = (char) mnky.read();
			
			a = Integer.valueOf(t).intValue();
			
			if((a > 64) && (a < 91) || (a > 96) && (a < 123)){
				g += t;
				g = g.toLowerCase();
				j++;
			}else{
				if(j > 2){
					chars.add(g);
				}
				
				g = "";
				j = 0;
			}
			
//			g += t;
//			j++;
//			
//			if((j > (block - 6)) && (j <= block)){
//				r += t;
//			}
//			
//			if(j == block){
//				
//				writer.write(g);
//				
//				if(i > block){
//					String tmp = g;
//					g = "" + r + tmp + "";
//				}
//				
//				g = g.toLowerCase();
//				while((line = in.readLine()) != null){
//					if(line.length() > 2){
//						if(g.indexOf(line) != -1){
//							writer2.write(line);
//							writer2.newLine();
//							counter++;							
//						}
//					}	
//				}
//				
//				
//				j = 0;
//				g = "";
//				r = "";
//				
//				in.close();
//				in = new BufferedReader(new FileReader(dictionary));
//			}
		}
		
		writer.write(g);
		
		writer.newLine(); writer.newLine();
		writer.write("Within this mess, our team of specialists found:");
		writer.newLine(); writer.newLine();

		Iterator<String> it = chars.iterator();
		
		int i = 0;
		
		boolean found = false;
		
		ArrayList<String> dict = new ArrayList<String>();
		
		while(in.ready()){
			line = in.readLine();
			if(line.length() > 2){
				dict.add(line);
				i++;
			}
		}
		
		for(String word : dict){
			for(String string : chars){
				if(!found){
					if(string.indexOf(word) != -1){
						writer2.write(word);
						writer2.newLine();
						counter++;
						found = true;
					}
				}
			}
			found = false;
		}
		
		if(counter > 1){
			writer.write("That's a whole " + counter + " words!");
		}else if(counter == 1){
			writer.write("That's a whole " + counter + " word!");
		}else if(counter == 0){
			writer.write("Absolutely nothing. =(");
		}
		
		writer.close();
	}
```
A 'working' copy with the heap problem can be found here if you want to watch it run.

If you want to try the code yourself, you'll need dictionary.txt, which is in the RAR file linked to above.

Thanks.

EDIT: Sorry about the code boxes being a bit wide. If I put code on new lines it's harder to follow, so I'll leave them as they are.

EDIT2: Suppose I should have mentioned that I'm going to give threads a go tomorrow - split the dictionary into a number of smaller pieces, and compare the smaller pieces with the generated string all at the same time.


----------



## Lenny1882 (Nov 22, 2007)

Well... anyone?

Multiple threads made no difference to the speed, alas, so I'm back to square one.


----------



## lotuseclat79 (Sep 12, 2003)

Hi Lenny1882,

If you are using Shared Memory with multiple processors, then if you split the difference evenly between the threads (for as many processors as you use), you would probably see a linear speed up or thereabouts.

Note: Access to the shared memory data structure would be controlled by a lock.

If I were comparing two strings to see if one were contained in the other there are a number of ways to approach the problem. Scan left-right, or right-left and if you were able to use four threads with two processors, launch a left-right and right-left scan of each string against the other - first one done completes and ends the search. Better performance with two strings and four processors.

It also does not make sense to scan for a larger string in a smaller string - so, that choice would be excluded based on length of the strings.

-- Tom


----------

