kgrams

kgrams provides tools for training and evaluating k-gram language models, including several probability smoothing methods, perplexity computations, random text generation and more. It is based on a C++ back-end, which makes kgrams fast, coupled with an accessible R API that aims to streamline the process of model building. It is suitable for small- and medium-sized NLP experiments, baseline model building, and pedagogical purposes.

For beginners

If you have no idea what k-gram models are and didn’t get here by accident, you can check out my hands-on tutorial post on k-gram language models using R at DataScience+.

Installation

Released version

You can install the latest release of kgrams from CRAN with:

install.packages("kgrams")

Development version

You can install the development version from my R-universe with:

install.packages("kgrams", repos = "https://vgherard.r-universe.dev/")

Example

This example shows how to train a modified Kneser-Ney 4-gram model on Shakespeare’s play “Much Ado About Nothing” using kgrams.

library(kgrams)
# Get k-gram frequency counts from text, for k = 1:4
freqs <- kgram_freqs(kgrams::much_ado, N = 4)
# Build modified Kneser-Ney 4-gram model, with discount parameters D1, D2, D3.
mkn <- language_model(freqs, smoother = "mkn", D1 = 0.25, D2 = 0.5, D3 = 0.75)
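
To give a feel for what "k-gram frequency counts" means, here is a minimal base-R sketch that is independent of kgrams and is not how the package computes counts (kgrams relies on its C++ back-end). The helpers `count_kgrams()` and `mle_prob()` and the toy sentence are hypothetical, introduced only for illustration. The sketch counts unigrams and bigrams, then forms the unsmoothed (maximum-likelihood) estimate of a word continuation probability.

```r
# Hypothetical helper (not part of kgrams): count all k-grams of order k
# in a character vector of tokens. Returns a named integer vector.
count_kgrams <- function(tokens, k) {
  n <- length(tokens)
  if (n < k) return(integer(0))
  starts <- seq_len(n - k + 1)
  # Paste together the k consecutive tokens starting at each position
  grams <- vapply(
    starts,
    function(i) paste(tokens[i:(i + k - 1)], collapse = " "),
    character(1)
  )
  counts <- table(grams)
  setNames(as.integer(counts), names(counts))
}

# Toy corpus, already tokenized on whitespace
tokens <- strsplit("the cat sat on the mat the cat ran", " ")[[1]]
unigrams <- count_kgrams(tokens, 1)
bigrams <- count_kgrams(tokens, 2)

# Hypothetical helper: unsmoothed estimate of P(word | context),
# i.e. count(context word) / count(context), for a one-word context.
mle_prob <- function(word, context) {
  key <- paste(context, word)
  num <- if (key %in% names(bigrams)) bigrams[[key]] else 0
  num / unigrams[[context]]
}

mle_prob("cat", "the")  # 2 of the 3 occurrences of "the" precede "cat"
mle_prob("dog", "the")  # unseen bigram: the unsmoothed estimate is zero
```

The zero estimate for the unseen bigram is the problem that smoothing methods such as modified Kneser-Ney are designed to address.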

We can now use this language_model to compute sentence and word continuation probabilities:

# Compute sentence probabilities
probability(c("did he break out into tears ?",
              "we are predicting sentence probabilities ."
              ), 
            model = mkn
            )
#> [1] 2.466856e-04 1.184963e-20
# Compute word continuation probabilities
probability(c("tears", "pieces") %|% "did he break out into", model = mkn)
#> [1] 9.389238e-01 3.834498e-07

Here are some sentences sampled from the language model’s distribution at temperatures t = c(1, 0.1, 10):

# Sample sentences from the language model at different temperatures
set.seed(840)
sample_sentences(model = mkn, n = 3, max_length = 10, t = 1)
#> [1] "i have studied eight or nine truly by your office [...] (truncated output)"
#> [2] "ere you go : <EOS>"                                                        
#> [3] "don pedro welcome signior : <EOS>"
sample_sentences(model = mkn, n = 3, max_length = 10, t = 0.1)
#> [1] "i will not be sworn but love may transform me [...] (truncated output)" 
#> [2] "i will not fail . <EOS>"                                                
#> [3] "i will go to benedick and counsel him to fight [...] (truncated output)"
sample_sentences(model = mkn, n = 3, max_length = 10, t = 10)
#> [1] "july cham's incite start ancientry effect torture tore pains endings [...] (truncated output)"   
#> [2] "lastly gallants happiness publish margaret what by spots commodity wake [...] (truncated output)"
#> [3] "born all's 'fool' nest praise hurt messina build afar dancing [...] (truncated output)"
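
The effect of temperature seen above can be illustrated with a small base-R sketch. It assumes the common convention in which each probability is raised to the power 1/t and the result is renormalized; this is an illustration of that convention, not the package's actual sampling code. The helper `apply_temperature()` and the toy distribution are hypothetical.

```r
# Hypothetical helper (not part of kgrams): rescale a probability vector
# by a temperature t > 0, assuming p_t is proportional to p^(1/t).
apply_temperature <- function(p, t) {
  stopifnot(t > 0, all(p >= 0))
  # Work in log space and subtract the maximum for numerical stability
  logits <- log(p) / t
  w <- exp(logits - max(logits))
  w / sum(w)
}

# Toy next-word distribution (made-up numbers)
p <- c(tears = 0.90, pieces = 0.09, laughter = 0.01)

# One column per temperature: t = 1 leaves p unchanged, t = 0.1 concentrates
# mass on the most likely word, t = 10 flattens the distribution.
sapply(c(1, 0.1, 10), function(t) round(apply_temperature(p, t), 3))
```

Under this convention, low temperatures yield repetitive, high-probability sentences, while high temperatures yield nearly uniform draws from the vocabulary, consistent with the samples shown above.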

Getting Help

For further help, you can consult the reference page of the kgrams website or open an issue on the GitHub repository of kgrams. A vignette on the website illustrates the process of building language models in depth.
