Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #91: Add improved event validator API #94

Merged
merged 7 commits into from
Jun 18, 2024

Conversation

rmouritzen-splunk
Copy link
Contributor

@rmouritzen-splunk rmouritzen-splunk commented Jun 1, 2024

This addresses issue #91.

This API is at POST /api/v2/validate. It is separate from the existing POST /api/validate as the response has a different structure.

This v2 event validator checks all existing aspects of the input event based on the schema.

The input is the same as it was: either a single event JSON object, or a list of events as a JSON array of objects. The response will either be a JSON object or JSON array of objects. The content of the response objects is shown here:

// When payload is an object, response is one object as shown below
// When payload is an array of objects, response is an array of objects, one for each requested event,
// The response object has the form shown here:
{
  // "error" is used for simple global errors; other keys will not be present.
  // When present, the response status code will be 400 - Bad Request.
  // Example value: "Unexpected primitive type. The request JSON must be an object or array of objects."
  "error": "<global error message>",

  "error_count": 1,
  "warning_count": 1,

  // Event's metadata.uid is included if valid value is available (it is set, and it is a string)
  "uid": "<event's metadata.uid>",

  // "errors" holds a list of validation failures; the variations are shown in the list below
  "errors": [
    // general pattern; extra keys exist for specific errors
    {
      "error": "<error type>",      // fixed easily machine readable error code
      "message": "<error message>"  // message is always present
      // error-specific keys will also be present
    }
  ],
  // "warnings" holds a list of validation warnings; the variations are shown in the list below
  "warnings": [
    // general pattern; extra keys exist for specific warnings
    {
      "warning": "<warning type>",  // fixed easily machine readable warning code
      "message": "<warning message>"
      // warning-specific keys will also be present
    }
  ]
}

@rmouritzen-splunk rmouritzen-splunk requested review from rroupski, zschmerber, floydtree and mikeradka and removed request for rroupski June 1, 2024 04:05
@rmouritzen-splunk
Copy link
Contributor Author

rmouritzen-splunk commented Jun 1, 2024

For reviewers, I'm more interested in feedback on use of the new API than a code review. That said, I do think the swagger doc should show the response format.

@floydtree
Copy link
Contributor

A few observations from a quick test

  1. Observables, sibling name validation -
   {
      "error": "attribute_enum_sibling_incorrect",
      "message": "Attribute \"observables[0].type\" enum sibling value \"IP Address\" is incorrect for enum \"observables[0].type_id\" value 2; expected \"IP Address (Dictionary Type)\", got \"IP Address\".",
      "value": "IP Address",
      "attribute": "type",
      "attribute_path": "observables[0].type",
      "expected_value": "IP Address (Dictionary Type)"
    },

Probably not directly related to the validator itself, but I don't think IP Address (Dictionary Type) should be the expected value of type field. It should simply be IP Address. I get that we added it the framework to distinguish different observable types, but it doesn't make sense to be included and expected in the transformed data. Perhaps in the validator we can ignore absence of (Dictionary Type) and other such type values for observables

  1. Validation for unmapped -
{
      "error": "attribute_unknown",
      "message": "Unknown attribute at \"unmapped.sublocation_id\"; attribute \"sublocation_id\" is not defined in object \"object\".",
      "attribute": "sublocation_id",
      "object_name": "object",
      "attribute_path": "unmapped.sublocation_id"
    },

I don't see a need to validate contents of the unmapped object. As seen above, the validation error attribute_unknown is contextually incorrect for a column like unmapped. Fields in unmapped will always be unknown, by virtue of the fields being unmappable.

  1. Perhaps make it configurable to show warnings in the validator output, and show only errors by default? With the warnings section in the validator response, the output is bound to be quite verbose for most of the OCSF users.

@rmouritzen-splunk
Copy link
Contributor Author

I noticed the unmapped issue myself already. This happens because the object type object has no attributes, which normally would mean "no attributes are allowed". I don't want to hard-code handling of object, so a new rule is needed, I think. Something like this (which would need to be documented):

If the final type for an object definition has no attributes, the definition allows any attributes. Conversely, once any attribute is defined, only those attributes are allowed. The current usage of this is the object type, which is directly used by the unmapped and xattributes attributes.

The observables case is also tricky. I need to ponder that one for a bit to avoid having another special-case.

@dkolbly
Copy link

dkolbly commented Jun 4, 2024

For whatever it's worth, I hard-coded support for object and json_t in my validator.

I am also not sure that allowing that generically for any object is wise; if anything, it might make more sense to make that a property of the attribute (i.e., unmapped) rather than the type (object) because there is more semantic weight at that level.

@rmouritzen-splunk
Copy link
Contributor Author

rmouritzen-splunk commented Jun 4, 2024

For whatever it's worth, I hard-coded support for object and json_t in my validator.

Explicitly handling json_t is required.

I think handling object can be done by checking for a class/object definition with no attributes. At least, that's what I'm trying out. This would be a new OCSF schema rule:

A class or object that defines no attributes means the attribute are open -- any attributes of any type can be added. The open attributes property is not inherited, however. A class or object that defines attributes and also extends a base class or object with no attributes only allows those defined attributes.

That is a terse way to say it. When documented, examples would be used to clarify the concept. In the core schema, the object object type defines no attributes, and when used directly in the unmapped and xattributes cases specifically means any attributes can be added. However the object type is also used as a base for all of the other objects defined in the core schema, and in those derived objects the attributes are closed (no extra attributes are meant to be allowed).

(BTW: I actually don't know why object is used as the ultimate base object type in the core schema. I don't see any value to this.)

This isn't the only path forward, though. Here are the alternatives I see:

  1. Add a new "open" attributes property for classes and objects. This is equivalent to JSON Schema's additionalProperties set to true. This property would be inheritable. We would add this to the object class, and then remove object as a base from all other classes. I don't think this would be a breaking change since the "compiled" schema ends up the same.
  2. (@dkolbly's variation) Add a new attribute-level property to specify an open object. This is the same as 1 except at the attribute level instead of the class/object level. Seems less tricky, as Donovan points out above.
  3. Add a new object_t type. This means an object (a map or associative array) with arbitrary attributes. The unmapped and xattributes types would be changed from object to object_t.
  4. Consider object a special-case. The rule is "attributes of typeobject are open-ended". (I don't like this because object is defined by the core schema, and it is defined as having no attributes. Why is it special?)
  5. The proposal above: any class or object defined with no attributes means its attributes are open, though this property isn't inherited. This generalizes the exists way object is used so it isn't a special-case.

IMHO, the existing use of object as open-ended is basically a bug -- an object_t type should have been introduced. But we can solve this, I think.

@dkolbly, @pagbabian-splunk, and others, thoughts?

…dation of observables. Add (limited) observable validation.
…vent bundle with /api/v2/validate_bundle. Remove try/rescue pattern from schema_controller.ex as it returns exception message, leaking internal implementation details (a security problem). Improve Swagger docs for new APIs.
@rmouritzen-splunk
Copy link
Contributor Author

Last commit moves validation of from POST /api/v2/validate to use of event bundle with POST /api/v2/validate_bundle.

That should be the last change, barring further issues found or a change related to how object is validated (see above).

@dkolbly
Copy link

dkolbly commented Jun 6, 2024

Excellent analysis of variant approaches @rmouritzen-splunk !

I didn't even think of your number 3, object_t. And I agree with your comment about using object as an open-ended collection of attributes is basically a bug.

Especially when considered in the context of json_t, I think object_t makes the most sense; basically json_t represents a generic JSON value, and object_t represents a JSON object (as opposed to value).

That helps steer people away from the open-ended objects as much as possible, makes it more obvious when they are using them, and makes things cleaner (I think) from an encoding perspective (certainly it does for protobuf encoding).

Copy link

@dkolbly dkolbly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To the extent I can read elixir code, looks good to me!

@rmouritzen-splunk rmouritzen-splunk merged commit 109a3da into ocsf:main Jun 18, 2024
1 check passed
@rmouritzen-splunk rmouritzen-splunk deleted the issue91 branch June 18, 2024 00:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants