Math Concept Identification and NLI Math

A repo for experiments in "math concept identification" using the TAC corpus and the nLab corpus, plus first thoughts about NLI (natural language inference) for mathematics.

The TAC corpus can be found at https://github.com/ToposInstitute/tac-corpus.

A selection of 436 sentences from the TAC corpus (some are empty), chosen by size (not too big, not too small) and lack of LaTeX, is in https://github.com/ToposInstitute/tac-corpus/blob/main/golden-attempt/examples.txt. It is repeated here for convenience, both as Experiment2.txt in the folder Experiment436 and as the file 436sentences.txt.
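The selection criteria above (moderate size, no LaTeX) could be approximated with a filter like the sketch below. This is not the script that produced the 436 sentences: the word-count thresholds `MIN_WORDS` and `MAX_WORDS`, the LaTeX heuristic, and the function names are all assumptions made for illustration.

```python
import re

# Hypothetical thresholds: the README only says "not too big, not too small".
MIN_WORDS = 5
MAX_WORDS = 40

# Rough heuristic for LaTeX residue: a dollar sign, a backslash command,
# or a superscript/subscript marker. This is an assumption, not the real rule.
LATEX_PATTERN = re.compile(r"\$|\\[A-Za-z]+|[\^_]")


def looks_like_latex(sentence):
    """Return True if the sentence appears to contain LaTeX markup."""
    return LATEX_PATTERN.search(sentence) is not None


def keep_sentence(sentence, min_words=MIN_WORDS, max_words=MAX_WORDS):
    """Keep sentences of moderate length that contain no LaTeX."""
    n_words = len(sentence.split())
    return min_words <= n_words <= max_words and not looks_like_latex(sentence)


def select_sentences(sentences):
    """Filter an iterable of sentences, stripping surrounding whitespace."""
    return [s.strip() for s in sentences if keep_sentence(s)]
```

Applied to a list of sentences, `select_sentences` drops anything containing `$`, a backslash command, `^` or `_`, and anything outside the assumed length range.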

The nLab corpus (from around 2020) is at https://github.com/ToposInstitute/nlab-corpus.

Short guidelines for annotation by mathematicians, already agreed upon:

Try to treat math concepts as black boxes, as much as possible.
Use the singular, rather than the plural, for concepts. Use lowercase for concepts, as much as possible.
If one has a long span that is a concept, e.g. “enriched accessible categories”, we should also list the sensible subspans like “accessible category”.
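The guidelines above can be sketched in code as a small normalization helper. This is only an illustration under stated assumptions: `singularize` is a deliberately naive rule (irregular plurals would need a lookup table), and `candidate_subspans` assumes that the head noun comes last, so it proposes only suffixes of the span. Deciding which candidates are actually "sensible" remains the annotator's job.

```python
def singularize(word):
    """Naive singularization of one word (hypothetical helper, not exhaustive)."""
    if word.endswith("ies") and len(word) > 3:
        return word[:-3] + "y"          # categories -> category
    if word.endswith(("ss", "us", "is", "os")):
        return word                     # class, locus, basis, topos stay as-is
    if word.endswith("s"):
        return word[:-1]                # functors -> functor
    return word


def normalize_concept(span):
    """Lowercase the span and singularize its last word (the assumed head noun)."""
    words = span.lower().split()
    if not words:
        return ""
    words[-1] = singularize(words[-1])
    return " ".join(words)


def candidate_subspans(span):
    """List the normalized span and its shorter suffixes as candidate concepts."""
    words = normalize_concept(span).split()
    return [" ".join(words[i:]) for i in range(len(words))]
```

For example, `candidate_subspans("Enriched accessible categories")` yields `["enriched accessible category", "accessible category", "category"]`, which includes the subspan "accessible category" mentioned in the guideline.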

Some of the sentences contain no mathematical concepts at all, e.g. "Further applications are given."
