[BUG] Register binary attribute in collection.xconf.xsd #5432

Open
joewiz opened this issue Aug 29, 2024 · 3 comments · May be fixed by #5436
Labels
bug (issue confirmed as bug), Lucene (issue is related to Lucene or its integration)

Comments

@joewiz
Member

joewiz commented Aug 29, 2024

Describe the bug

PR #4541 added support for indexing fields as binary values, controlled by a new @binary attribute on the <field> element. The PR neglected to register this new attribute in collection.xconf.xsd.

As a result, eXide, for example, flags uses of this attribute. See #5431 (comment).

Expected behavior

The xconf schema should define the attribute and its allowed values.

To Reproduce

Validate this document from the linked issue against collection.xconf.xsd:

<collection xmlns="http://exist-db.org/collection-config/1.0">
  <index xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- Full-text indexing with Lucene -->
    <lucene>
      <!-- Elements upon which to build an index. -->
      <text qname="div">
        <field name="sortable" expression="./test/string()" type="xs:string" binary="yes"/>
      </text>
    </lucene>
  </index>
</collection>
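
The validation step above could be scripted with the JDK's built-in `javax.xml.validation` API. This is only a sketch: the class name `XconfValidationSketch` and the two command-line paths are illustrative assumptions, not part of eXist-db.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration; not part of eXist-db.
class XconfValidationSketch {

    // Returns null when the document validates, otherwise the first error message.
    // A schema that itself fails to compile is reported the same way.
    static String validate(Source schemaSource, Source documentSource) throws IOException {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(schemaSource);
            Validator validator = schema.newValidator();
            // With the default error handler, the first validation error is thrown.
            validator.validate(documentSource);
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed arguments: args[0] = path to collection.xconf.xsd,
        //                    args[1] = path to the collection.xconf document to check.
        String error = validate(new StreamSource(new File(args[0])),
                                new StreamSource(new File(args[1])));
        System.out.println(error == null ? "valid" : "invalid: " + error);
    }
}
```

Run against a schema that does not declare `binary` on `<field>`, a validator of this kind would be expected to reject the attribute, which matches what eXide flags.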

Via the linked issue:

If I try to apply the collection.xconf with eXide this error occurs:

Failed to apply configuration: DocValuesField "sortable" appears more than once in this document (only one value is allowed per field)

Screenshots

n/a

Context (please always complete the following information)

  • Build: eXist-6.2.0
  • Java: 1.8.0_422
  • OS: Ubuntu 22.04.4 LTS - Linux 6.8.0-40-generic amd64

Additional context

  • How is eXist-db installed? n/a
  • Any custom changes in e.g. conf.xml? n/a
@line-o added the bug and Lucene labels on Aug 29, 2024
@joewiz
Member Author

joewiz commented Aug 29, 2024

I'd be happy to submit a PR adding field/@binary to https://github.com/eXist-db/exist/blob/develop/schema/collection.xconf.xsd#L179-L186.

I just need to know the allowed values.

The example code on the Lucene documentation page shows binary="yes", but the tests added in the PR all use binary="true". I take it from https://github.com/eXist-db/exist/pull/4541/files#diff-bdeb3fc906efd9177af97cd1351231dc0e9a9aa14e477246b82b57483d6618e2R113 that both "true" and "yes" are accepted. It seems that any other value is treated as false (the same as omitting the attribute).

(The other attributes in this schema that take yes - e.g., caseOpt - don't offer true|false as options. So the binary attribute appears to be an outlier in accepting this larger set of values in this schema.)
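
For reference, the interpretation described above ("true" or "yes" enables the flag, anything else disables it) can be sketched in Java as follows. The class and method names are hypothetical, and this is not the code from the PR. In particular, the exact-match comparison is an assumption: whether the real implementation ignores case is not established in this thread.

```java
// Hypothetical illustration of the described behaviour; not eXist-db's actual code.
class BinaryFlagSketch {

    // Treats "true" and "yes" as enabling binary indexing.
    // Any other value, including a missing attribute (null), counts as disabled.
    // Assumption: the comparison is case-sensitive.
    static boolean isBinaryEnabled(String attributeValue) {
        if (attributeValue == null) {
            return false;
        }
        return attributeValue.equals("true") || attributeValue.equals("yes");
    }

    public static void main(String[] args) {
        String[] samples = {"yes", "true", "no", "false", "1", null};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isBinaryEnabled(sample));
        }
    }
}
```

Under this reading, a schema enumeration limited to yes, true, no and false would be stricter than the code, since a value such as "1" would be rejected by validation but treated as false at runtime.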

If that's a correct interpretation of the code, I'd add a binaryOpt entry to fieldAttrType, defined as follows:

<xs:attributeGroup name="binaryOpt">
    <xs:attribute name="binary" use="optional">
        <xs:simpleType>
            <xs:restriction base="xs:token">
                <xs:enumeration value="yes">
                    <xs:annotation>
                        <xs:documentation>Index as a binary field</xs:documentation>
                    </xs:annotation>
                </xs:enumeration>
                <xs:enumeration value="true">
                    <xs:annotation>
                        <xs:documentation>Index as a binary field</xs:documentation>
                    </xs:annotation>
                </xs:enumeration>
                <xs:enumeration value="no">
                    <xs:annotation>
                        <xs:documentation>Do not index as a binary field</xs:documentation>
                    </xs:annotation>
                </xs:enumeration>
                <xs:enumeration value="false">
                    <xs:annotation>
                        <xs:documentation>Do not index as a binary field</xs:documentation>
                    </xs:annotation>
                </xs:enumeration>
            </xs:restriction>
        </xs:simpleType>
    </xs:attribute>
</xs:attributeGroup>

Is this a correct interpretation of what the code supports / implies?

@dizzzz
Member

dizzzz commented Aug 30, 2024

As I read the code, you are correct. @adamretter

@joewiz
Member Author

joewiz commented Aug 30, 2024

@dizzzz Thanks!

@joewiz linked a pull request on Aug 30, 2024 that will close this issue