-
Notifications
You must be signed in to change notification settings - Fork 13
usage.coffeescript
Home > Usage > CoffeeScript / JavaScript
Once built, the compiled library consists of three files:
- Consists of the Levenshtein transducer and its direct dependencies.
- Consists of the memoized, Levenshtein distance functions.
- Consists of every file in the library.
The file you are most likely interested in is levenshtein-transducer.js
, which
is the main subject of this document.
Of course, you may use the CoffeeScript files. Here is a brief overview of the structure of the source code:
-
collection/
- Data structures used by the transducer-
dawg.coffee
- Directed Acyclic Word Graph (i.e. MA-FSA, or Minimal Acyclic, Finite-State Automaton) -
max-heap.coffee
- Max heap, used as a priority queue.
-
-
levenshtein/
- Files specifically relating to Levenshtein distance-
builder.coffee
- Constructs Levenshtein transducers. -
distance.coffee
- Memoized functions for finding Levenshtein distance in the supported algorithms -
transducer.coffee
- Driver for the Levenshtein transducer
-
-
util/
- Utility functions, used during testing.-
mutate.coffee
- Randomly-mutates a term -
operations.coffee
- Set of elementary operations that can be performed on a term -
permutations.coffee
- Generates all permutations of a list of elements -
truth-table.coffee
- Generates a truth table, which is used when selecting combinations of parameters.
-
Each file composes its own module within the levenshtein
namespace.
To use the library on your website, reference the desired file from the
<head/>
of your document, like so:
<!DOCTYPE html>
<html>
<head>
<!-- stuff ... -->
<script type="text/javascript"
src="http://dylon.github.com/liblevenshtein/javascripts/2.0.4/levenshtein-transducer.min.js">
</script>
<!-- more stuff ... -->
</head>
<body>
<!-- yet another fancy document ... -->
</body>
</html>
Once the script loads, you should construct a transducer via the Builder Api. I'll give you a sample and then a more in-depth breakdown into its functionality (assuming jQuery is being used):
$(function ($) {
"use strict";
// Maximum number of spelling errors we will allow the spelling candidates to
// have, with regard to the query term.
var MAX_EDIT_DISTANCE = 2;
var completion_list = getCompletionList(); // fictitious method
var builder = new levenshtein.Builder()
.dictionary(completion_list, false) // generate spelling candidates from unsorted completion_list
.algorithm("transposition") // use Levenshtein distance extended with transposition
.sort_candidates(true) // sort the spelling candidates before returning them
.case_insensitive_sort(true) // ignore character-casing while sorting terms
.include_distance(false) // just return the ordered terms (drop the distances)
.maximum_candidates(10); // only want the top-10 candidates
var transducer = builder.transducer();
var $queryTerm = $('#query-term-input-field');
$queryTerm.keyup(function (event) {
var candidates, term = $.trim($queryTerm.val());
if (term) {
// NOTE: You should ALWAYS include the maximum edit distance !!!
candidates = transducer.transduce(term, MAX_EDIT_DISTANCE);
printAutoComplete(candidates); // print the list of completions
} else {
clearAutoComplete(); // user has cleared the search box
}
return true;
});
});
This will give the user autocompletion hints (up to 10 of them) as he types in the search box.
A complete breakdown of the available, builder attributes follows:
dictionary(Array, Boolean) : levenshtein.Builder
- Sets the dictionary for the transducer to the list of items. The list is assumed to be unsorted unless you specify that it is sorted by passing true as the second parameter.
- You can pass an instance of
levenshtein.Dawg
, but that is more of a low-level, data structure. Lists should suite most needs.
algorithm(String) : levenshtein.Builder
- Type of Levenshtein algorithm to use for selecting spelling candidates.
- These are the available algorithms:
"standard"
- Specifies the standard, Levenshtein algorithm should be used, which consists of a penalty of 1 unit for each of the following elementary operations: insertion, deletion and substitution.
"transposition"
- Similar to the
"standard"
algorithm, but also treats transpositions as elementary operations. - In the standard algorithm, a transposition would incur a penalty of 2 units.
- Since transpositions are among the most common spelling errors, it makes sense to only penalize 1 unit per transposition.
"merge_and_split"
- Similar to the
"standard"
algorithm, but also treams merges and splits as elements operations. - A merge of two characters occurs when they should have been one, e.g.
you have read,
"cl"
, when you should have read,"d"
. - A split of a character occurs when it should have been two characters,
e.g. you have read,
"d"
, when you should have read,"cl"
. - This algorithm is useful in OCR applications, where computers may not be able to correctly interpret a character in an image.
sort_candidates(Boolean) : levenshtein.Builder
- Whether the spelling candidates should be sorted before they are returned.
case_insensitive_sort(Boolean) : levenshtein.Builder
- Whether the casing of the terms should be ignored while they are being sorted (if they are being sorted).
include_distance(Boolean) : levenshtein.Builder
- Whether to include the distance from the query term of each spelling candidate.
- If this is true, the spelling candidates are returned as pairs with their
corresponding distances, like so:
[["a", 1], ["bc", 2], ["abd", 3]]
maximum_candidates(Number) : levenshtein.Builder
- Whatever is specified here is the maximum number of spelling candidates that
should be returned, each time
levenshtein.Transducer::transduce(...)
is invoked. - If it is left unspecified, every spelling candidate within the tolerated edit distance from the query term will be returned.
custom_comparator(Function) : levenshtein.Builder
- Useful if you want to change the sort order of spelling candidates (especially if you are restricting the number returned)
- This is an arity-2 function that accepts two spelling candidates with their associated distances (pairs, like above). It should return a value less than 1 if the first candidate is less than the second, a value of 0 if they are equal, and a value greater than 0 if the first candidate is greater than the second.
custom_transform(Function) : levenshtein.Builder
- Useful if you want to perform some sort of custom transformation on the spelling candidates before they are returned, such as converting them to all-caps, etc.
- This is an arity-1 function that accepts a [candidate, distance] pair. It should return whatever you want the corresponding element in the results to be.
default_edit_distance(Number) : levenshtein.Builder
- Specifies the default edit distance to use when one is not passed as the
second parameter to
levenshtein.Transducer::transduce(term,n)
.
Here is an example that randomizes the sort order and capitalizes the characters of the spelling candidates:
var builder = new levenshtein.Builder()
.dictionary(completion_list)
.algorithm("transposition")
//.sort_candidates(true) // only useful if you don't specify your own comparator
//.case_insensitive_sort(true) // only useful if you don't specify your own comparator
//.include_distance(false) // only useful if you don't specify your own transform
.maximum_candidates(10)
.custom_comparator(function (a,b) {
//var term_a = a[0], distance_a = a[1];
//var term_b = b[0], distance_b = b[1];
return Math.random() - 0.5;
})
.custom_transform(function (candidate) {
var term = candidate[0], distance = candidate[1];
return term.toUpperCase();
});
Note that each time you set a property on a levenshtein.Builder
, a new
instance is returned. This reduces bugs and lets you create builder-templates,
but can be a source of confusion if you try to configure a builder, one property
at a time:
// ---------------------------------------- //
// EXAMPLE OF SOMETHING NOT TO DO !!! //
// ---------------------------------------- //
// DO NOT DO THIS !!!
var builder = new levenshtein.Builder();
builder.dictionary(completion_list);
builder.algorithm("transposition");
builder.sort_candidates(true);
builder.case_insensitive_sort(true);
builder.include_distance(false);
builder.maximum_candidates(10);
// YOU HAVE BEEN WARNED !!!
If you do the above, the original builder
object is not modified. At the end
of all the statements, builder
will still have its default values. You can do
the following, if you prefer:
// This is okay, and probably looks familiar to my fellow functional peeps :)
var builder = new levenshtein.Builder();
builder = builder.dictionary(completion_list);
builder = builder.algorithm("transposition");
builder = builder.sort_candidates(true);
builder = builder.case_insensitive_sort(true);
builder = builder.include_distance(false);
builder = builder.maximum_candidates(10);
This syntax comes in helpful when you want to pass builder
to a function that
builds-upon it: builder = determineTransducerParams(builder)
.
liblevenshtein is maintained by @dylon