Improve wordlists, replacements (0.7.16).
finnbear committed Dec 1, 2023
1 parent 808dc8a commit 9cf04fe
Showing 11 changed files with 102 additions and 10 deletions.
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,7 +1,7 @@
[package]
name = "rustrict"
authors = ["Finn Bear"]
-version = "0.7.15"
+version = "0.7.16"
edition = "2021"
license = "MIT OR Apache-2.0"
repository = "https://github.com/finnbear/rustrict/"
9 changes: 6 additions & 3 deletions Makefile
@@ -1,10 +1,13 @@
.PHONY: fuzz

all: test

downloads:
wget -O test.csv https://raw.githubusercontent.com/vzhou842/profanity-check/master/profanity_check/data/clean_data.csv
-wget -O src/dictionary.txt https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt
-wget -O src/dictionary_common.txt https://raw.githubusercontent.com/first20hours/google-10000-english/master/google-10000-english.txt
-wget -O src/unicode_confusables.txt https://www.unicode.org/Public/security/14.0.0/confusables.txt
+wget -O src/dictionary.txt https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt
+wget -O src/dictionary_common.txt https://raw.githubusercontent.com/first20hours/google-10000-english/master/google-10000-english.txt
+wget -O src/unicode_confusables.txt https://www.unicode.org/Public/security/14.0.0/confusables.txt
# TODO: ttf fonts

false_positives:
cargo run --bin false_positive_finder --release --features censor,regex,indicatif,rayon,find_false_positives
2 changes: 1 addition & 1 deletion README.md
@@ -177,7 +177,7 @@ is used as a dataset. Positive accuracy is the percentage of profanity detected

| Crate | Accuracy | Positive Accuracy | Negative Accuracy | Time |
|-------|----------|-------------------|-------------------|------|
-| [rustrict](https://crates.io/crates/rustrict) | 87.90% | 93.33% | 86.55% | 8s |
+| [rustrict](https://crates.io/crates/rustrict) | 87.88% | 93.33% | 86.52% | 8s |
| [censor](https://crates.io/crates/censor) | 76.16% | 72.76% | 77.01% | 23s |

## Development
3 changes: 3 additions & 0 deletions src/dictionary_blacklist.txt
@@ -428,11 +428,13 @@ ll
lo
losers
loss ser
love minors
love slavery
lowlifes
m
male squirting
masochists
massive wood
masturbate(.*)
maya sol
meat beating
@@ -592,6 +594,7 @@ retardation
retarded
retards
rim job
romances
rs
s
sa
3 changes: 2 additions & 1 deletion src/dictionary_extra.txt
@@ -203,4 +203,5 @@ wouldn't it
yass
yesturday
zenga
-zubr east
+zubr east
+zuck
37 changes: 37 additions & 0 deletions src/false_positives.txt
@@ -568,9 +568,12 @@ antiabortion
antifascist
antigropelos
antiracism
antiromance
antisex
antislavery
antispastic
antonio ga
antonio ger
ants croat
ants perm
anusim
@@ -2633,6 +2636,7 @@ chief luck
chief rick
chigger
chiggers
child stew
childrens croatia
childrens permalink
childrens permanent
@@ -2675,6 +2679,7 @@ chinkiest
chinking
chinkle
chinky
chiromance
chitties
cho ad
cho bag
@@ -2989,6 +2994,7 @@ commisce
commise
commissar
commission
commissionary
commissive
commissoria
commissural
@@ -3205,6 +3211,7 @@ courts croat
courts perm
covary
coverslut
cow girl
cow ward
cowardice
cowardish
@@ -3213,6 +3220,7 @@ cowardly
cowardness
cowardy
cowboy friend
cowgirls
cowsucker
coxitis
crack er
@@ -6557,6 +6565,7 @@ ka strate
kabobs
kabonga
kafirin
kakke
kalanchoe
kaneshite
kansas hole
@@ -7412,6 +7421,8 @@ mass sess
massachusetts croat
massachusetts perm
masses
massive wooden
massive woods
master balt
master bat
master batter
@@ -7636,6 +7647,7 @@ mismenstruation
miss croat
miss perm
missile nintendo
missionary
mistful
mitchell
mix ger
@@ -8533,6 +8545,7 @@ outer vs
outers croat
outers perm
outligger
outromance
outsuck
outwatch
outwater
@@ -9873,6 +9886,7 @@ pronunciative
pronunciator
pronymph
propagandic
proromance
proslave
proslavery
prospects croat
@@ -10662,6 +10676,28 @@ roll experiment
roll expert
rollers croat
rollers perm
roman ce
roman kee
roman keith
roman kelk
roman kell
roman ken
roman kep
roman ker
roman kevin
roman key
roman xerox
romancealist
romancean
romanced
romanceful
romanceish
romanceless
romancelet
romancelike
romancemonger
romanceproof
romancer
rong ger
ronga
roofing
@@ -14412,4 +14448,5 @@ zoophilies
zooxanthella
zope nascar
zope nasdaq
zuck
zwanziger
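
Note on the README benchmark row changed in this commit: a minimal sketch of how the three accuracy columns relate to one another. It assumes positive accuracy is the share of profane samples detected (as the README hunk context states) and negative accuracy is the share of clean samples left unflagged (an assumption, since the hunk context is truncated). The `Confusion` struct and the counts in `main` are hypothetical and are not taken from rustrict's test harness.

```rust
/// Hypothetical confusion-matrix counts for a profanity classifier.
/// Not part of rustrict; names are illustrative only.
#[derive(Debug, Clone, Copy)]
struct Confusion {
    true_pos: u64,  // profane samples that were flagged
    false_neg: u64, // profane samples that were missed
    true_neg: u64,  // clean samples that were left alone
    false_pos: u64, // clean samples that were wrongly flagged
}

impl Confusion {
    /// Share of profane samples detected (returns NaN if there are none).
    fn positive_accuracy(&self) -> f64 {
        self.true_pos as f64 / (self.true_pos + self.false_neg) as f64
    }

    /// Share of clean samples not flagged (assumed definition).
    fn negative_accuracy(&self) -> f64 {
        self.true_neg as f64 / (self.true_neg + self.false_pos) as f64
    }

    /// Share of all samples classified correctly.
    fn accuracy(&self) -> f64 {
        let correct = self.true_pos + self.true_neg;
        let total = correct + self.false_neg + self.false_pos;
        correct as f64 / total as f64
    }
}

fn main() {
    // Made-up counts, chosen only to illustrate the arithmetic.
    let c = Confusion {
        true_pos: 90,
        false_neg: 10,
        true_neg: 340,
        false_pos: 60,
    };
    println!("accuracy:          {:.2}%", 100.0 * c.accuracy());
    println!("positive accuracy: {:.2}%", 100.0 * c.positive_accuracy());
    println!("negative accuracy: {:.2}%", 100.0 * c.negative_accuracy());
}
```

In the README row above, positive accuracy stays at 93.33% while negative accuracy moves from 86.55% to 86.52%, which is consistent with overall accuracy moving from 87.90% to 87.88%.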