Using OntoGPT outside BioInformatics #471

Open
jamesfebin opened this issue Nov 8, 2024 · 6 comments
Labels
question (Further information is requested), template (A request for a new or modified template)

Comments

@jamesfebin

jamesfebin commented Nov 8, 2024

I am trying to use OntoGPT in a domain outside of bioinformatics. For now I am trying something simple: extracting the names of people from a given text.

I have a dumb question.

The values are predefined in most of the templates I have seen (e.g., vbo_names). When I modify one and use it, OntoGPT doesn't add the extracted values to the OWL output, even though the template is a valid LinkML file. For example, only the last value in a list of people's names is added. It also logs messages like:

INFO:root:Cannot determine axiom type for full_name, unprocessed=[Literal(v='Febin John James')]

Here is the custom template I made:

id: https://w3id.org/linkml/examples/personinfo
name: personinfo
prefixes:
  linkml: https://w3id.org/linkml/
imports:
  - linkml:types
default_range: string

classes:
  Person:
    attributes:
      full_name:
  Container:
    tree_root: true
    attributes:
      persons:
        multivalued: true
        inlined_as_list: true
        range: Person

Is there a more generic template I can use in this case?

@caufieldjh
Member

Hi @jamesfebin - OntoGPT can do this, and your template is a great start - it just needs some more details for the LLM to work with.

(The imports should also include core, as this defines the main OntoGPT types.)

So if the input text is something like this:

In a surprise move, the city council of Oakdale voted to approve a new development project led by prominent businesswoman, Emily-Jane Lee. The project, which will bring a new shopping center and several restaurants to the downtown area, has been met with both excitement and skepticism from local residents. Council members, including Chairperson Maria Rodriguez, Vice Chair John Michael Davis Jr., and Councilor Sofia Patel, cited the potential economic benefits and job creation as key factors in their decision. However, some residents, such as longtime Oakdale resident and activist, Ava Morales, have expressed concerns about the impact on traffic and local small businesses. Despite these concerns, project investor, Julian Styles, remains confident that the development will be a success and a boon to the community.

Then a template like this should work:

id: https://w3id.org/linkml/examples/personinfo
name: personinfo
prefixes:
  linkml: https://w3id.org/linkml/
imports:
  - linkml:types
  - core
default_range: string

classes:

  Container:
    tree_root: true
    attributes:
      persons:
        description: >-
          A semicolon-delimited list of people named in the text.
        multivalued: true
        inlined_as_list: true
        range: Person

  Person:
    description: >-
      A person.
    attributes:
      full_name:
        description: >-
          The full name of the person.
        range: string

Run something like ontogpt extract -t personinfo.yaml -i input.txt and you should get a result like:

---
input_text: In a surprise move, the city council of Oakdale voted to approve a new
  development project led by prominent businesswoman, Emily-Jane Lee. The project,
  which will bring a new shopping center and several restaurants to the downtown area,
  has been met with both excitement and skepticism from local residents. Council members,
  including Chairperson Maria Rodriguez, Vice Chair John Michael Davis Jr., and Councilor
  Sofia Patel, cited the potential economic benefits and job creation as key factors
  in their decision. However, some residents, such as longtime Oakdale resident and
  activist, Ava Morales, have expressed concerns about the impact on traffic and local
  small businesses. Despite these concerns, project investor, Julian Styles, remains
  confident that the development will be a success and a boon to the community.
raw_completion_output: 'persons: Emily-Jane Lee; Maria Rodriguez; John Michael Davis
  Jr.; Sofia Patel; Ava Morales; Julian Styles;'
prompt: |+
  Split the following piece of text into fields in the following format:

  full_name: <The full name of the person.>


  Text:
  Julian Styles

  ===

extracted_object:
  persons:
    - full_name: Emily-Jane Lee
    - full_name: Maria Rodriguez
    - full_name: John Michael Davis Jr.
    - full_name: Sofia Patel
    - full_name: Ava Morales
    - full_name: Julian Styles
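
The output above suggests a two-step flow: the top-level completion returns the persons as a semicolon-delimited list, and each item is then sent through a second prompt built from the Person attribute descriptions (the prompt shown appears to be the one for the last item, Julian Styles). This is why the descriptions in the template matter. The following is a minimal Python sketch of that flow under those assumptions, not OntoGPT's actual code; `build_prompt` and `split_list_completion` are hypothetical names used only for illustration.

```python
def build_prompt(attributes: dict[str, str], text: str) -> str:
    """Build a field-extraction prompt from attribute descriptions.

    Mirrors the layout of the `prompt` field in the output above.
    """
    fields = "\n".join(f"{name}: <{desc}>" for name, desc in attributes.items())
    return (
        "Split the following piece of text into fields in the following format:\n\n"
        f"{fields}\n\n\n"
        f"Text:\n{text}\n\n===\n\n"
    )


def split_list_completion(completion: str, field: str) -> list[str]:
    """Split a 'field: a; b; c;' completion into its non-empty items."""
    prefix = f"{field}:"
    if not completion.startswith(prefix):
        raise ValueError(f"completion does not start with {prefix!r}")
    body = completion[len(prefix):]
    # A trailing semicolon yields an empty last item, which is dropped here.
    return [item.strip() for item in body.split(";") if item.strip()]


# The raw completion from the output above, joined onto one line.
completion = (
    "persons: Emily-Jane Lee; Maria Rodriguez; John Michael Davis Jr.; "
    "Sofia Patel; Ava Morales; Julian Styles;"
)
names = split_list_completion(completion, "persons")

# Each name would get its own sub-prompt; only the last one is built here.
prompt = build_prompt({"full_name": "The full name of the person."}, names[-1])
print(names)
print(prompt)
```

An attribute without a description would give the model nothing but the bare field name to work with, which is consistent with the advice to add more detail to the original template.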

@jamesfebin
Author

Thank you, @caufieldjh. I am able to generate the YAML file.

However, I get the following when I request OWL output, and it doesn't generate a valid .owl file.

INFO:root:Output format: owl
INFO:linkml.generators.pythongen:TRUE: OCCURS SAME: Container == Person owning: Container
INFO:linkml.generators.pythongen:TRUE: OCCURS SAME: Container == Person owning: Container
INFO:linkml.generators.pythongen:FALSE: OCCURS BEFORE: Any == Any owning: ExtractionResult
INFO:linkml.generators.pythongen:FALSE: OCCURS BEFORE: Any == Any owning: ExtractionResult
INFO:linkml.generators.pythongen:TRUE: OCCURS SAME: TextWithTriples == Publication owning: TextWithTriples
INFO:linkml.generators.pythongen:FALSE: OCCURS BEFORE: Triple == Triple owning: TextWithTriples
INFO:linkml.generators.pythongen:TRUE: OCCURS SAME: TextWithTriples == Publication owning: TextWithTriples
INFO:linkml.generators.pythongen:FALSE: OCCURS BEFORE: Triple == Triple owning: TextWithTriples
INFO:linkml.generators.pythongen:TRUE: OCCURS SAME: TextWithEntity == Publication owning: TextWithEntity
INFO:linkml.generators.pythongen:TRUE: OCCURS SAME: TextWithEntity == Publication owning: TextWithEntity
INFO:root:Subject=None
INFO:root:Subject=None
INFO:root:Cannot determine axiom type for full_name, unprocessed=[Literal(v='Emily-Jane Lee')]
INFO:root:Subject=None
INFO:root:Cannot determine axiom type for full_name, unprocessed=[Literal(v='Maria Rodriguez')]
INFO:root:Subject=None
INFO:root:Cannot determine axiom type for full_name, unprocessed=[Literal(v='John Michael Davis Jr.')]
INFO:root:Subject=None
INFO:root:Cannot determine axiom type for full_name, unprocessed=[Literal(v='Sofia Patel')]
INFO:root:Subject=None
INFO:root:Cannot determine axiom type for full_name, unprocessed=[Literal(v='Ava Morales')]
INFO:root:Subject=None
INFO:root:Cannot determine axiom type for full_name, unprocessed=[Literal(v='Julian Styles')]
INFO:root:Cannot determine axiom type for persons, unprocessed=[]
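
Most of the lines in a log like the one above are pythongen INFO noise; the informative ones are the "Cannot determine axiom type" messages, which name the slot that could not be turned into an OWL axiom. As a rough aid for reading such logs, here is a small Python sketch that groups those messages by slot. The regular expressions assume the message format exactly as pasted above, and `unmapped_slots` is a hypothetical helper, not part of OntoGPT or LinkML.

```python
import re

# Patterns assume the log format exactly as pasted above.
AXIOM_MESSAGE = re.compile(
    r"Cannot determine axiom type for (?P<slot>\w+), unprocessed=\[(?P<values>.*)\]"
)
# Only handles single-quoted values; a name containing an apostrophe
# would be printed differently and is not covered by this sketch.
LITERAL_VALUE = re.compile(r"Literal\(v='([^']*)'\)")


def unmapped_slots(log_text: str) -> dict[str, list[str]]:
    """Map each slot named in an axiom-type message to its unprocessed values."""
    slots: dict[str, list[str]] = {}
    for line in log_text.splitlines():
        match = AXIOM_MESSAGE.search(line)
        if match:
            values = LITERAL_VALUE.findall(match["values"])
            slots.setdefault(match["slot"], []).extend(values)
    return slots


# A shortened excerpt of the log above.
log = """\
INFO:root:Subject=None
INFO:root:Cannot determine axiom type for full_name, unprocessed=[Literal(v='Emily-Jane Lee')]
INFO:root:Subject=None
INFO:root:Cannot determine axiom type for full_name, unprocessed=[Literal(v='Maria Rodriguez')]
INFO:root:Cannot determine axiom type for persons, unprocessed=[]
"""
summary = unmapped_slots(log)
print(summary)
```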

@caufieldjh
Member

Generating OWL requires a few more format-specific details so the OWL interpreter knows how to define relationships the LinkML format doesn't identify.
Try this:

id: https://w3id.org/linkml/examples/personinfo
name: personinfo
prefixes:
  linkml: https://w3id.org/linkml/
  personinfo: https://w3id.org/linkml/examples/personinfo/
imports:
  - linkml:types
  - core
default_range: string

default_prefix: personinfo

classes:

  Container:
    tree_root: true
    attributes:
      persons:
        description: >-
          A semicolon-delimited list of people named in the text.
        multivalued: true
        inlined_as_list: true
        annotations:
          owl: ObjectProperty, ObjectSomeValuesFrom
        range: Person

  Person:
    is_a: NamedEntity
    description: >-
      A person.
    attributes:
      full_name:
        description: >-
          The full name of the person.
        range: string
      id:
        description: >-
          A unique identifier for the person.
          This is their full name without spaces
          or special characters.
        identifier: true
        range: string

That should generate OWL like this:

Prefix( owl: = <http://www.w3.org/2002/07/owl#> )
Prefix( rdf: = <http://www.w3.org/1999/02/22-rdf-syntax-ns#> )
Prefix( rdfs: = <http://www.w3.org/2000/01/rdf-schema#> )
Prefix( xsd: = <http://www.w3.org/2001/XMLSchema#> )
Prefix( xml: = <http://www.w3.org/XML/1998/namespace> )
Prefix( linkml: = <https://w3id.org/linkml/> )
Prefix( personinfo: = <https://w3id.org/linkml/examples/personinfo/> )
Prefix( shex: = <http://www.w3.org/ns/shex#> )
Prefix( schema: = <http://schema.org/> )
Prefix( NCIT: = <http://purl.obolibrary.org/obo/NCIT_> )
Prefix( RO: = <http://purl.obolibrary.org/obo/RO_> )
Prefix( biolink: = <https://w3id.org/biolink/vocab/> )
Prefix( core: = <http://w3id.org/ontogpt/core/> )

Ontology( <https://w3id.org/linkml/examples/personinfo>
    AnnotationAssertion( rdfs:label personinfo:EmilyJaneLee "Emily-Jane Lee" )
    AnnotationAssertion( rdfs:label personinfo:MariaRodriguez "Maria Rodriguez" )
    AnnotationAssertion( rdfs:label personinfo:JohnMichaelDavisJr "John Michael Davis Jr" )
    AnnotationAssertion( rdfs:label personinfo:SofiaPatel "Sofia Patel" )
    AnnotationAssertion( rdfs:label personinfo:AvaMorales "Ava Morales" )
    AnnotationAssertion( rdfs:label personinfo:JulianStyles "Julian Styles" )
    SubClassOf( None     ObjectSomeValuesFrom( personinfo:persons personinfo:EmilyJaneLee ) )
    SubClassOf( None     ObjectSomeValuesFrom( personinfo:persons personinfo:MariaRodriguez ) )
    SubClassOf( None     ObjectSomeValuesFrom( personinfo:persons personinfo:JohnMichaelDavisJr ) )
    SubClassOf( None     ObjectSomeValuesFrom( personinfo:persons personinfo:SofiaPatel ) )
    SubClassOf( None     ObjectSomeValuesFrom( personinfo:persons personinfo:AvaMorales ) )
    SubClassOf( None     ObjectSomeValuesFrom( personinfo:persons personinfo:JulianStyles ) )
)
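
The identifiers in the output above (personinfo:EmilyJaneLee, personinfo:JohnMichaelDavisJr, and so on) follow the id description in the template: the full name without spaces or special characters. In the actual run the LLM produces these ids from that description. The Python sketch below is only a deterministic equivalent that can be used to check the ids; `name_to_id` is a hypothetical helper, and it assumes ASCII names (letters outside A-Z, a-z and 0-9 would be dropped).

```python
import re


def name_to_id(full_name: str) -> str:
    """Remove everything except ASCII letters and digits from a full name."""
    return re.sub(r"[^A-Za-z0-9]", "", full_name)


for name in ["Emily-Jane Lee", "John Michael Davis Jr.", "Sofia Patel"]:
    # The default_prefix in the template is personinfo.
    print(f"personinfo:{name_to_id(name)}")
```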

@jamesfebin
Author

Thank you again, @caufieldjh.

However, when I import this into Protégé or another OWL visualizer, I get an error. Can you point me to any documentation or resources so I can study and solve these issues myself? (That is, how to write a YAML file that generates OWL data models.)

[Image: Screenshot 2024-11-08 at 9 51 21 PM]

@caufieldjh
Member

Hi @jamesfebin, OntoGPT uses LinkML tools for generating OWL (and other serializations) so you may find these docs helpful: https://linkml.io/linkml/generators/owl.html
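
The thread does not record what the error in the screenshot said. One thing that may be worth checking before loading the file into an OWL tool: the generated output earlier in this thread contains axioms of the form SubClassOf( None ... ), and the log shows Subject=None. A bare None is neither a prefixed name nor an IRI, so a parser may reject it; whether that is the cause here is an assumption. The Python sketch below lists the lines with such a subject; `lines_with_none_subject` is a hypothetical helper, not part of OntoGPT or LinkML.

```python
import re

# Matches a SubClassOf axiom whose first argument is the bare token None.
NONE_SUBJECT = re.compile(r"^\s*SubClassOf\(\s*None\b")


def lines_with_none_subject(ofn_text: str) -> list[int]:
    """Return 1-based line numbers of SubClassOf axioms with a None subject."""
    return [
        number
        for number, line in enumerate(ofn_text.splitlines(), start=1)
        if NONE_SUBJECT.match(line)
    ]


# A shortened excerpt of the OWL functional syntax output shown earlier.
sample = """\
Ontology( <https://w3id.org/linkml/examples/personinfo>
    AnnotationAssertion( rdfs:label personinfo:AvaMorales "Ava Morales" )
    SubClassOf( None     ObjectSomeValuesFrom( personinfo:persons personinfo:AvaMorales ) )
)
"""
flagged = lines_with_none_subject(sample)
print(flagged)
```

If lines are flagged, the subject of the persons relationship is missing, which would point back at how the root Container class is defined in the template rather than at the OWL tool.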

caufieldjh added the question and template labels Nov 10, 2024
@cmungall
Member

cmungall commented Nov 11, 2024 via email
