Text::Ngrams: flexible n-gram analysis (for characters, words, and more).
Text::Ngrams Ranking & Summary

License:
Perl Artistic License

Price:
FREE

Publisher Name:
Simon Cozens

Publisher web site:
http://search.cpan.org/~simon/Sub-Versive-0.01/Versive.pm

Text::Ngrams Description

Text::Ngrams provides flexible n-gram analysis (for characters, words, and more).

SYNOPSIS

For default character n-gram analysis of a string:

    use Text::Ngrams;
    my $ng3 = Text::Ngrams->new;
    $ng3->process_text('abcdefg1235678hijklmnop');
    print $ng3->to_string;
    my @ngramsarray = $ng3->get_ngrams;

One can also feed tokens manually:

    use Text::Ngrams;
    my $ng3 = Text::Ngrams->new;
    $ng3->feed_tokens('a');
    $ng3->feed_tokens('b');
    $ng3->feed_tokens('c');
    $ng3->feed_tokens('d');
    $ng3->feed_tokens('e');
    $ng3->feed_tokens('f');
    $ng3->feed_tokens('g');
    $ng3->feed_tokens('h');

We can choose n-grams of various sizes, e.g.:

    my $ng = Text::Ngrams->new( windowsize => 6 );

or different types of n-grams, e.g.:

    my $ng = Text::Ngrams->new( type => 'byte' );
    my $ng = Text::Ngrams->new( type => 'word' );
    my $ng = Text::Ngrams->new( type => 'utf8' );

To process a list of files:

    $ng->process_files('somefile.txt', 'otherfile.txt');

This module implements text n-gram analysis, supporting several types of analysis, including character and word n-grams.

The module Text::Ngrams is very flexible. For example, it allows a user to manually feed a sequence of arbitrary tokens. It handles several types of tokens (character, word), and also allows a lot of flexibility in how tokens are automatically recognized and fed, and in how they are combined into an n-gram. It counts the frequencies of all n-grams up to the specified maximum length. The output format is meant to be human-readable while also being loadable by the module.

The module can be used from the command line through the script ngrams.pl provided with the package.

Limitations:

· If a user customizes a type, a resulting n-gram may be ambiguous, so that two different n-grams are counted as one. With predefined types of n-grams, this should not happen. For example, if a user chooses that a token can contain a space, and uses space as an n-gram separator, then a trigram like "x x x x" is ambiguous.

· Method process_file does not handle multi-line tokens by default. This can be fixed, but it does not seem to be worth the code complication. There are various ways around this if one really needs such tokens. One way is to preprocess them. Another way is to read as much text as necessary at a time and then use process_text, which does handle multi-line tokens.

Requirements:

· Perl
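The ambiguity described under Limitations can be illustrated with a small standalone sketch in plain Perl. It does not use Text::Ngrams and is not how the module is implemented. The helper count_ngrams and the variable names are hypothetical, introduced only for this illustration. The sketch assumes that an n-gram is keyed by joining its tokens with the separator, and that all n-grams from length 1 up to a maximum length are counted.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Ngrams): count all n-grams of
# length 1..$max over a token list, joining tokens with $sep to form keys.
sub count_ngrams {
    my ($max, $sep, @tokens) = @_;
    my %count;
    for my $n (1 .. $max) {
        # Slide a window of $n tokens across the sequence.
        for my $i (0 .. @tokens - $n) {
            my $key = join $sep, @tokens[$i .. $i + $n - 1];
            $count{$n}{$key}++;
        }
    }
    return \%count;
}

# Two different token sequences, each containing one token with a space.
my $first  = count_ngrams(3, ' ', 'x x', 'x', 'x');
my $second = count_ngrams(3, ' ', 'x', 'x x', 'x');

# Each sequence yields exactly one trigram. Both trigrams get the same
# key, so the two distinct trigrams cannot be told apart once written out.
my ($key_first)  = keys %{ $first->{3} };
my ($key_second) = keys %{ $second->{3} };
print "ambiguous trigram: '$key_first'\n" if $key_first eq $key_second;
```

Choosing a separator that cannot occur inside a token (for example a tab, when tokens may contain spaces) avoids the collision in this sketch.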