pattern.es
The pattern.es module contains a fast part-of-speech tagger for Spanish (it identifies nouns, adjectives, verbs, etc. in a sentence) and tools for Spanish verb conjugation and noun singularization & pluralization.
It can be used by itself or with other pattern modules: web | db | en | search | vector | graph.
The functions in this module take the same parameters and return the same values as their counterparts in pattern.en. Refer to the documentation there for more details.
For Spanish nouns there is singularize() and pluralize(). The implementation is slightly less robust than the English version (accuracy 94% for singularization and 78% for pluralization).
>>> from pattern.es import singularize, pluralize
>>>
>>> print(singularize('gatos'))
gato
>>> print(pluralize('gato'))
gatos
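The lower accuracy for pluralization is easier to understand if you try a few spelling rules by hand. The sketch below is a hypothetical toy, `toy_pluralize()`. It is not the pattern.es implementation, which may use different rules and exceptions. It applies four textbook rules: -z becomes -ces, -ión becomes -iones, a final vowel takes -s, and a final consonant takes -es.

```python
# Hypothetical toy pluralizer: a few textbook rules only,
# not the rules or lexicon that pattern.es actually uses.

def toy_pluralize(noun):
    """Return a plural for a Spanish noun using simple suffix rules."""
    if noun.endswith("z"):
        return noun[:-1] + "ces"       # luz -> luces
    if noun.endswith("ión"):
        return noun[:-3] + "iones"     # canción -> canciones (accent dropped)
    if noun[-1] in "aeiouáéó":
        return noun + "s"              # gato -> gatos, café -> cafés
    return noun + "es"                 # ciudad -> ciudades


for word in ["gato", "luz", "canción", "ciudad", "café"]:
    print(word, "->", toy_pluralize(word))
```

Some nouns do not change in the plural (la crisis, las crisis), and this toy gets them wrong. Cases like these are why a real pluralizer needs exception lists on top of rules.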
For Spanish verbs there is conjugate(), lemma(), lexeme() and tenses(). The lexicon for verb conjugation contains about 600 common Spanish verbs, compiled by Fred Jehle. For unknown verbs it falls back to a rule-based approach with an accuracy of about 84%.
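The "lexicon first, rules as a fallback" strategy can be sketched in a few lines. Several things here are assumptions made only for illustration: the tiny `LEXICON` dictionary stands in for the Jehle verb list, `present_forms()` is a hypothetical function, and the rules cover only regular -ar/-er/-ir verbs in the present indicative.

```python
# Hypothetical sketch of "lexicon first, rules as fallback".
# LEXICON stands in for the ~600-verb Jehle list; the rules cover only
# regular -ar/-er/-ir verbs in the present indicative.

LEXICON = {
    # infinitive -> [1sg, 2sg, 3sg, 1pl, 2pl, 3pl] present indicative
    "ser": ["soy", "eres", "es", "somos", "sois", "son"],
    "ir": ["voy", "vas", "va", "vamos", "vais", "van"],
}

ENDINGS = {
    "ar": ["o", "as", "a", "amos", "áis", "an"],
    "er": ["o", "es", "e", "emos", "éis", "en"],
    "ir": ["o", "es", "e", "imos", "ís", "en"],
}


def present_forms(infinitive):
    """Return the six present-indicative forms of a Spanish verb."""
    if infinitive in LEXICON:
        return LEXICON[infinitive]  # known verb: use the stored forms
    stem, ending = infinitive[:-2], infinitive[-2:]
    # Unknown verb: assume it is regular and attach the suffixes to the stem
    return [stem + e for e in ENDINGS[ending]]


print(present_forms("ser"))     # irregular, found in the lexicon
print(present_forms("hablar"))  # regular, generated by rules
```

A stem-changing verb such as pensar comes out as penso instead of pienso. Errors like this keep a purely rule-based fallback well below 100% accuracy.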
Spanish verbs have more tenses than English verbs. In particular, the plural differs for each person, and there are additional forms for the FUTURE and CONDITIONAL tenses, the IMPERATIVE and SUBJUNCTIVE moods, and the PERFECTIVE aspect:
>>> from pattern.es import conjugate
>>> from pattern.es import INFINITIVE, PRESENT, PAST, SG, SUBJUNCTIVE, PERFECTIVE
>>>
>>> print(conjugate('soy', INFINITIVE))
ser
>>> print(conjugate('soy', PRESENT, 1, SG, mood=SUBJUNCTIVE))
sea
>>> print(conjugate('soy', PAST, 3, SG))
era
>>> print(conjugate('soy', PAST, 3, SG, aspect=PERFECTIVE))
fue
For PAST tense + PERFECTIVE aspect we can also use PRETERITE. For PAST tense + IMPERFECTIVE aspect we can also use IMPERFECT:
>>> from pattern.es import conjugate
>>> from pattern.es import IMPERFECT, PRETERITE, SG
>>>
>>> print(conjugate('soy', IMPERFECT, 3, SG))
era
>>> print(conjugate('soy', PRETERITE, 3, SG))
fue
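One way to think of PRETERITE and IMPERFECT is as names for (tense, aspect) pairs. In the examples above, conjugate('soy', PAST, 3, SG) returns era, which suggests that IMPERFECTIVE is the default aspect; the sketch below assumes this. It is a hypothetical illustration: plain strings stand in for the pattern.es constants, and `expand()` is not part of the library.

```python
# Hypothetical sketch: plain strings stand in for the pattern.es constants,
# and expand() is an illustration, not part of the library.

PAST, PERFECTIVE, IMPERFECTIVE = "past", "perfective", "imperfective"
PRETERITE, IMPERFECT = "preterite", "imperfect"

# Each shorthand names a (tense, aspect) pair.
SHORTHANDS = {
    PRETERITE: (PAST, PERFECTIVE),    # fue
    IMPERFECT: (PAST, IMPERFECTIVE),  # era
}


def expand(tense, aspect=IMPERFECTIVE):
    """Return an explicit (tense, aspect) pair for a tense or shorthand."""
    return SHORTHANDS.get(tense, (tense, aspect))


print(expand(PRETERITE))         # same pair as PAST with PERFECTIVE
print(expand(PAST, PERFECTIVE))
```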
The conjugate() function takes the following optional parameters:
Tense | Person | Number | Mood | Aspect | Alias | Example |
--- | --- | --- | --- | --- | --- | --- |
INFINITIVE | None | None | None | None | "inf" | ser |
PRESENT | 1 | SG | INDICATIVE | IMPERFECTIVE | "1sg" | yo __soy__ |
PRESENT | 2 | SG | INDICATIVE | IMPERFECTIVE | "2sg" | tú __eres__ |
PRESENT | 3 | SG | INDICATIVE | IMPERFECTIVE | "3sg" | él __es__ |
PRESENT | 1 | PL | INDICATIVE | IMPERFECTIVE | "1pl" | nosotros __somos__ |
PRESENT | 2 | PL | INDICATIVE | IMPERFECTIVE | "2pl" | vosotros __sois__ |
PRESENT | 3 | PL | INDICATIVE | IMPERFECTIVE | "3pl" | ellos __son__ |
PRESENT | None | None | INDICATIVE | PROGRESSIVE | "part" | siendo |
PRESENT | 2 | SG | IMPERATIVE | IMPERFECTIVE | "2sg!" | sé |
PRESENT | 2 | PL | IMPERATIVE | IMPERFECTIVE | "2pl!" | sed |
PRESENT | 1 | SG | SUBJUNCTIVE | IMPERFECTIVE | "1sg?" | yo __sea__ |
PRESENT | 2 | SG | SUBJUNCTIVE | IMPERFECTIVE | "2sg?" | tú __seas__ |
PRESENT | 3 | SG | SUBJUNCTIVE | IMPERFECTIVE | "3sg?" | él __sea__ |
PRESENT | 1 | PL | SUBJUNCTIVE | IMPERFECTIVE | "1pl?" | nosotros __seamos__ |
PRESENT | 2 | PL | SUBJUNCTIVE | IMPERFECTIVE | "2pl?" | vosotros __seáis__ |
PRESENT | 3 | PL | SUBJUNCTIVE | IMPERFECTIVE | "3pl?" | ellos __sean__ |
PAST | 1 | SG | INDICATIVE | IMPERFECTIVE | "1sgp" | yo __era__ |
PAST | 2 | SG | INDICATIVE | IMPERFECTIVE | "2sgp" | tú __eras__ |
PAST | 3 | SG | INDICATIVE | IMPERFECTIVE | "3sgp" | él __era__ |
PAST | 1 | PL | INDICATIVE | IMPERFECTIVE | "1ppl" | nosotros __éramos__ |
PAST | 2 | PL | INDICATIVE | IMPERFECTIVE | "2ppl" | vosotros __erais__ |
PAST | 3 | PL | INDICATIVE | IMPERFECTIVE | "3ppl" | ellos __eran__ |
PAST | None | None | INDICATIVE | PROGRESSIVE | "ppart" | sido |
PAST | 1 | SG | INDICATIVE | PERFECTIVE | "1sgp+" | yo __fui__ |
PAST | 2 | SG | INDICATIVE | PERFECTIVE | "2sgp+" | tú __fuiste__ |
PAST | 3 | SG | INDICATIVE | PERFECTIVE | "3sgp+" | él __fue__ |
PAST | 1 | PL | INDICATIVE | PERFECTIVE | "1ppl+" | nosotros __fuimos__ |
PAST | 2 | PL | INDICATIVE | PERFECTIVE | "2ppl+" | vosotros __fuisteis__ |
PAST | 3 | PL | INDICATIVE | PERFECTIVE | "3ppl+" | ellos __fueron__ |
PAST | 1 | SG | SUBJUNCTIVE | IMPERFECTIVE | "1sgp?" | yo __fuera__ |
PAST | 2 | SG | SUBJUNCTIVE | IMPERFECTIVE | "2sgp?" | tú __fueras__ |
PAST | 3 | SG | SUBJUNCTIVE | IMPERFECTIVE | "3sgp?" | él __fuera__ |
PAST | 1 | PL | SUBJUNCTIVE | IMPERFECTIVE | "1ppl?" | nosotros __fuéramos__ |
PAST | 2 | PL | SUBJUNCTIVE | IMPERFECTIVE | "2ppl?" | vosotros __fuerais__ |
PAST | 3 | PL | SUBJUNCTIVE | IMPERFECTIVE | "3ppl?" | ellos __fueran__ |
FUTURE | 1 | SG | INDICATIVE | IMPERFECTIVE | "1sgf" | yo __seré__ |
FUTURE | 2 | SG | INDICATIVE | IMPERFECTIVE | "2sgf" | tú __serás__ |
FUTURE | 3 | SG | INDICATIVE | IMPERFECTIVE | "3sgf" | él __será__ |
FUTURE | 1 | PL | INDICATIVE | IMPERFECTIVE | "1plf" | nosotros __seremos__ |
FUTURE | 2 | PL | INDICATIVE | IMPERFECTIVE | "2plf" | vosotros __seréis__ |
FUTURE | 3 | PL | INDICATIVE | IMPERFECTIVE | "3plf" | ellos __serán__ |
CONDITIONAL | 1 | SG | INDICATIVE | IMPERFECTIVE | "1sg->" | yo __sería__ |
CONDITIONAL | 2 | SG | INDICATIVE | IMPERFECTIVE | "2sg->" | tú __serías__ |
CONDITIONAL | 3 | SG | INDICATIVE | IMPERFECTIVE | "3sg->" | él __sería__ |
CONDITIONAL | 1 | PL | INDICATIVE | IMPERFECTIVE | "1pl->" | nosotros __seríamos__ |
CONDITIONAL | 2 | PL | INDICATIVE | IMPERFECTIVE | "2pl->" | vosotros __seríais__ |
CONDITIONAL | 3 | PL | INDICATIVE | IMPERFECTIVE | "3pl->" | ellos __serían__ |
Instead of optional parameters, a single short alias, or PARTICIPLE or PAST+PARTICIPLE, can also be given. With no parameters, the infinitive form of the verb is returned.
Reference: Jehle, F. (2012). Spanish Verb Forms. Retrieved from: http://users.ipfw.edu/jehle/verblist.htm.
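The short aliases in the table follow a small grammar:

- person;
- a core that encodes number and tense: sg/pl for the present, sgp/ppl for the past, sgf/plf for the future, sg->/pl-> for the conditional;
- an optional suffix: + for PERFECTIVE, ? for SUBJUNCTIVE, ! for IMPERATIVE.

The hypothetical `parse_alias()` below decodes them into the table's five columns. It reproduces the table's conventions and is not the parser pattern.es uses internally.

```python
# Hypothetical parser for the short aliases in the conjugate() table.
# It mirrors the table's conventions; it is not pattern.es's own parser.

SPECIAL = {
    "inf": ("infinitive", None, None, None, None),
    "part": ("present", None, None, "indicative", "progressive"),
    "ppart": ("past", None, None, "indicative", "progressive"),
}

# Core of the alias -> (tense, number); the table writes past plural as "ppl".
CORE = {
    "sg": ("present", "sg"), "pl": ("present", "pl"),
    "sgp": ("past", "sg"), "ppl": ("past", "pl"),
    "sgf": ("future", "sg"), "plf": ("future", "pl"),
    "sg->": ("conditional", "sg"), "pl->": ("conditional", "pl"),
}


def parse_alias(alias):
    """Return (tense, person, number, mood, aspect) for an alias like '3sgp+'."""
    if alias in SPECIAL:
        return SPECIAL[alias]
    person, rest = int(alias[0]), alias[1:]
    mood, aspect = "indicative", "imperfective"
    if rest.endswith("+"):
        aspect, rest = "perfective", rest[:-1]
    elif rest.endswith("?"):
        mood, rest = "subjunctive", rest[:-1]
    elif rest.endswith("!"):
        mood, rest = "imperative", rest[:-1]
    tense, number = CORE[rest]
    return (tense, person, number, mood, aspect)


for alias in ["3sgp+", "1ppl?", "2sg!", "1pl->", "inf"]:
    print(alias, parse_alias(alias))
```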
Spanish adjectives inflect with an -o, -a, -os, -as, or -es suffix (e.g., curioso → los gatos curiosos) depending on gender and number. You can get the base form with the predicative() function, or vice versa with attributive(). For predicative(), a statistical approach is used with an accuracy of 93%. For attributive(), you need to supply the gender (MALE, FEMALE, NEUTRAL and/or PLURAL).
>>> from pattern.es import attributive, predicative
>>> from pattern.es import FEMALE, PLURAL
>>>
>>> print(predicative('curiosos'))
curioso
>>> print(attributive('curioso', gender=FEMALE))
curiosa
>>> print(attributive('curioso', gender=FEMALE+PLURAL))
curiosas
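The basic agreement rules behind attributive() can be sketched as follows. The code starts from a masculine singular base form. For the feminine it swaps a final -o for -a. For the plural it adds -s after a vowel or -es after a consonant. This is a hypothetical toy: `toy_attributive()` and its boolean `female`/`plural` flags stand in for pattern's gender constants. It also ignores spelling changes such as feliz → felices, which a real implementation has to handle.

```python
# Hypothetical, rule-based sketch of adjective agreement; pattern.es's
# attributive() may use different rules and handles more cases.

def toy_attributive(adjective, female=False, plural=False):
    """Inflect a masculine singular adjective such as 'curioso'."""
    form = adjective
    if female and form.endswith("o"):
        form = form[:-1] + "a"   # curioso -> curiosa; grande, azul stay the same
    if plural:
        if form[-1] in "aeiou":
            form += "s"          # curiosa -> curiosas, grande -> grandes
        else:
            form += "es"         # azul -> azules
    return form


print(toy_attributive("curioso", female=True, plural=True))
print(toy_attributive("azul", female=True, plural=True))
```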
For parsing there is parse(), parsetree() and split(). The parse() function annotates words in the given string with their part-of-speech tags (e.g., NN for nouns and VB for verbs). The parsetree() function takes a string and returns a tree of nested objects (Text → Sentence → Chunk → Word). The split() function takes the output of parse() and returns a Text. See the pattern.en documentation for how to manipulate Text objects.
>>> from pattern.es import parse, split
>>>
>>> s = parse('El gato negro se sienta en la estera.')
>>> for sentence in split(s):
...     print(sentence)
Sentence('El/DT/B-NP/O gato/NN/I-NP/O negro/JJ/I-NP/O se/PRP/B-NP/O sienta/VB/B-VP/O en/IN/B-PP/B-PNP la/DT/B-NP/I-PNP estera/NN/I-NP/I-PNP ././O/O')
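Each token in the tagged string above has four slash-separated fields: word, part-of-speech tag, chunk tag, and prepositional-noun-phrase (PNP) tag. The chunk tags use a B-/I- scheme to mark where a phrase begins and continues. The sketch below reads that format and groups noun phrases. It is a hypothetical illustration: `read_tagged()` and `noun_phrases()` are not pattern.es functions, and the sketch assumes that no word contains a literal "/". For real work, use split() or parsetree().

```python
# Hypothetical helpers for the word/POS/chunk/PNP format shown above.
# Assumes words contain no "/"; not a replacement for pattern.es's split().

def read_tagged(tagged):
    """Return a list of (word, pos, chunk, pnp) tuples."""
    return [tuple(token.split("/")) for token in tagged.split()]


def noun_phrases(tokens):
    """Group words into noun phrases using the B-NP / I-NP chunk tags."""
    phrases, inside = [], False
    for word, pos, chunk, pnp in tokens:
        if chunk == "B-NP":
            phrases.append([word])      # a new NP starts here
            inside = True
        elif chunk == "I-NP" and inside:
            phrases[-1].append(word)    # continue the current NP
        else:
            inside = False              # any other chunk ends the NP
    return [" ".join(p) for p in phrases]


s = ("El/DT/B-NP/O gato/NN/I-NP/O negro/JJ/I-NP/O se/PRP/B-NP/O "
     "sienta/VB/B-VP/O en/IN/B-PP/B-PNP la/DT/B-NP/I-PNP "
     "estera/NN/I-NP/I-PNP ././O/O")
print(noun_phrases(read_tagged(s)))
```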
The parser is trained on the Spanish portion of Wikicorpus, using 1.5M words from the tagged sections 10,000–15,000. The accuracy is around 92%. The original Parole tagset is mapped to the Penn Treebank tagset. If you need to work with the original tags, you can call parse() with the optional parameter tagset="parole".
Reference: Reese, S., Boleda, G., Cuadros, M., Padró, L., Rigau, G. (2010). Wikicorpus: A Word-Sense Disambiguated Multilingual Wikipedia Corpus. Proceedings of LREC'10.
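Mapping Parole tags onto Penn Treebank tags mostly means collapsing detailed morphological codes into coarser categories. The sketch below uses a partial, hypothetical table keyed on the first two characters of an EAGLES-style Parole tag (for example, NCMS000 for a masculine singular common noun, as used by tools such as FreeLing). The Penn equivalents are approximations chosen for this sketch. Main verbs map to a coarse VB, as in the example output above. pattern.es performs its own mapping, which may differ.

```python
# Hypothetical, partial Parole (EAGLES) -> Penn Treebank mapping, keyed on
# the first two characters of the tag; pattern.es uses its own mapping.

PAROLE_TO_PENN = {
    "NC": "NN",   # common noun
    "NP": "NNP",  # proper noun
    "AQ": "JJ",   # qualifying adjective
    "VM": "VB",   # main verb
    "VA": "VB",   # auxiliary verb
    "DA": "DT",   # article
    "DD": "DT",   # demonstrative determiner
    "SP": "IN",   # preposition
    "PP": "PRP",  # personal pronoun
    "RG": "RB",   # adverb
    "CC": "CC",   # coordinating conjunction
    "Fp": ".",    # period
}


def to_penn(parole_tag):
    """Map a Parole tag to a Penn tag; unknown tags are returned unchanged."""
    return PAROLE_TO_PENN.get(parole_tag[:2], parole_tag)


for tag in ["NCMS000", "VMIP3S0", "DA0MS0", "Fp"]:
    print(tag, "->", to_penn(tag))
```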
There is no sentiment() function for Spanish yet.