LimeWire Collection Component API

org.limewire.collection
Class ApproximateMatcher

java.lang.Object
  extended by org.limewire.collection.ApproximateMatcher

public final class ApproximateMatcher
extends Object

Determines if two strings are considered "approximately equal" if one string can be transformed into the other string through some series of inserts, deletes and substitutions.

The approximate matcher has options to ignore case and whitespace. Also, ApproximateMatcher has switches to compare strings backwards and reuse a buffer (which can make ApproximateMatcher perform better). However, the switches do not affect the matching methods directly; the switches only affect the results of the process(String) method. process method preprocesses strings before passing to match(String, String). A typical use:

       String s1, s2;
       ApproximateMatcher matcher=new ApproximateMatcher();
       matcher.setIgnoreCase(true);
       matcher.setCompareBackwards(true);
       String s1p=matcher.process(s1);         //pre-process s1
       String s2p=matcher.process(s2);         //pre-process s2
       int matches=matcher.match(s1p, s2p);    //compare processed strings
       ...

This design (calling process and then match) reduces the pre-processing overhead when a string is matched against many other strings. Preprocessing is required to support the ignoreWhitespace method option; it is not possible to do the k-difference dynamic programming algorithm efficiently in one pass.

Note this class is not thread-safe when you use the buffering constructor, ApproximateMatcher(int).

    ApproximateMatcher am = new ApproximateMatcher();
    am.setIgnoreCase(true);
    String s1 = am.process("Ireland");
    String s2 = am.process("Iceland");

    System.out.println("Number of insertions, deletions or replacements " +am.match(s1, "land"));

    if(am.matches(s1, s2, 1))
        System.out.println(s1+ " and " + s2 + " match.");
    Output:
        Number of insertions, deletions or replacements 3
        ireland and iceland match.


Constructor Summary
ApproximateMatcher()
           
ApproximateMatcher(int size)
          Like ApproximateMatcher() except that the new matcher can compare strings of the given size without any significant allocations.
 
Method Summary
 int match(String s1, String s2)
          Returns the edit distance between s1 and s2.
 boolean matches(String s1, String s2, float precision)
          Returns true if s1 can be transformed into s2 without changing more than the given fraction of s1's letters.
 boolean matches(String s1, String s2, int maxOps)
          Returns true if the edit distance between s1 and s2 is less than or equal to maxOps.
 String process(String s)
          Returns a version of s suitable for passing to match(..).
 void setCompareBackwards(boolean compareBackwards)
           
 void setIgnoreCase(boolean ignoreCase)
           
 void setIgnoreWhitespace(boolean ignoreWhitespace)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ApproximateMatcher

public ApproximateMatcher()

ApproximateMatcher

public ApproximateMatcher(int size)
Like ApproximateMatcher() except that the new matcher can compare strings of the given size without any significant allocations. This is a useful optimization if you need to make many comparisons with one matcher. The matcher will still be able to compare larger strings, but it will require an allocation. The buffer is not released until this is garbage collected. This method breaks thread safety; only one match(..) call can be done at a time with a matcher created by this constructor.

Method Detail

setIgnoreCase

public void setIgnoreCase(boolean ignoreCase)
Parameters:
ignoreCase - true if and only if case should be ignored when matching processed strings. Default value is false.

setIgnoreWhitespace

public void setIgnoreWhitespace(boolean ignoreWhitespace)
Parameters:
ignoreWhitespace - true if and only if the characters ' ' and '_' should be ignored when matching processed strings. Default value is false.

setCompareBackwards

public void setCompareBackwards(boolean compareBackwards)
Parameters:
compareBackwards - true if and only if the comparison should be done backwards when matching processed strings. This is solely an optimization if you expect more differences at the end of the word than the beginning. Default value is false.

process

public String process(String s)
Returns a version of s suitable for passing to match(..). This means that s could be stripped of whitespace, lower-cased, or reversed depending on the calls to setIgnoreWhitespace, setIgnoreWhitespace, and setCompareBackwards. The returned value may be == to s.


match

public final int match(String s1,
                       String s2)
Returns the edit distance between s1 and s2. That is, returns the number of insertions, deletions, or replacements necessary to transform s1 into s2. A value of 0 means the strings match exactly.

If you want to ignore case or whitespace, or compare backwards, s1 and s2 should be the return values of a call to process(..).


matches

public final boolean matches(String s1,
                             String s2,
                             int maxOps)
Returns true if the edit distance between s1 and s2 is less than or equal to maxOps. That is, returns true if s1 can be transformed into s2 through no more than maxOps insertions, deletions, or replacements. This method is generally more efficient than match(..) if you only care whether two strings approximately match.

If you want to ignore case or whitespace, or compare backwards, s1 and s2 should be the return values of a call to process(..).


matches

public final boolean matches(String s1,
                             String s2,
                             float precision)
Returns true if s1 can be transformed into s2 without changing more than the given fraction of s1's letters. For example, matches(1.) is the same as an exact comparison, while matches(0.) always returns true as long as |s1|>=|s2|. matches(0.9) means "s1 and s2 match pretty darn closely".

If you want to ignore case or whitespace, or compare backwards, s1 and s2 should be the return values of a call to process(..).


LimeWire Collection Component API

Copyright © 2008 Lime Wire LLC. All Rights Reserved.