Issue #91: Add improved event validator API #94

rmouritzen-splunk · 2024-06-01T04:04:52Z

This addresses issue #91.

This API is at POST /api/v2/validate. It is separate from the existing POST /api/validate as the response has a different structure.

This v2 event validator checks all existing aspects of the input event based on the schema.

The input is the same as it was: either a single event JSON object, or a list of events as a JSON array of objects. The response will either be a JSON object or JSON array of objects. The content of the response objects is shown here:

// When payload is an object, response is one object as shown below
// When payload is an array of objects, response is an array of objects, one for each requested event,
// The response object has the form shown here:
{
  // "error" is used for simple global errors; other keys will not be present.
  // When present, the response status code will be 400 - Bad Request.
  // Example value: "Unexpected primitive type. The request JSON must be an object or array of objects."
  "error": "<global error message>",

  "error_count": 1,
  "warning_count": 1,

  // Event's metadata.uid is included if valid value is available (it is set, and it is a string)
  "uid": "<event's metadata.uid>",

  // "errors" holds a list of validation failures; the variations are shown in the list below
  "errors": [
    // general pattern; extra keys exist for specific errors
    {
      "error": "<error type>",      // fixed easily machine readable error code
      "message": "<error message>"  // message is always present
      // error-specific keys will also be present
    }
  ],
  // "warnings" holds a list of validation warnings; the variations are shown in the list below
  "warnings": [
    // general pattern; extra keys exist for specific warnings
    {
      "warning": "<warning type>",  // fixed easily machine readable warning code
      "message": "<warning message>"
      // warning-specific keys will also be present
    }
  ]
}

rmouritzen-splunk · 2024-06-01T17:00:59Z

For reviewers, I'm more interested in feedback on use of the new API than a code review. That said, I do think the swagger doc should show the response format.

floydtree · 2024-06-03T19:23:14Z

A few observations from a quick test

Observables, sibling name validation -

   {
      "error": "attribute_enum_sibling_incorrect",
      "message": "Attribute \"observables[0].type\" enum sibling value \"IP Address\" is incorrect for enum \"observables[0].type_id\" value 2; expected \"IP Address (Dictionary Type)\", got \"IP Address\".",
      "value": "IP Address",
      "attribute": "type",
      "attribute_path": "observables[0].type",
      "expected_value": "IP Address (Dictionary Type)"
    },

Probably not directly related to the validator itself, but I don't think IP Address (Dictionary Type) should be the expected value of type field. It should simply be IP Address. I get that we added it the framework to distinguish different observable types, but it doesn't make sense to be included and expected in the transformed data. Perhaps in the validator we can ignore absence of (Dictionary Type) and other such type values for observables

Validation for unmapped -

{
      "error": "attribute_unknown",
      "message": "Unknown attribute at \"unmapped.sublocation_id\"; attribute \"sublocation_id\" is not defined in object \"object\".",
      "attribute": "sublocation_id",
      "object_name": "object",
      "attribute_path": "unmapped.sublocation_id"
    },

I don't see a need to validate contents of the unmapped object. As seen above, the validation error attribute_unknown is contextually incorrect for a column like unmapped. Fields in unmapped will always be unknown, by virtue of the fields being unmappable.

Perhaps make it configurable to show warnings in the validator output, and show only errors by default? With the warnings section in the validator response, the output is bound to be quite verbose for most of the OCSF users.

rmouritzen-splunk · 2024-06-03T19:41:44Z

I noticed the unmapped issue myself already. This happens because the object type object has no attributes, which normally would mean "no attributes are allowed". I don't want to hard-code handling of object, so a new rule is needed, I think. Something like this (which would need to be documented):

If the final type for an object definition has no attributes, the definition allows any attributes. Conversely, once any attribute is defined, only those attributes are allowed. The current usage of this is the object type, which is directly used by the unmapped and xattributes attributes.

The observables case is also tricky. I need to ponder that one for a bit to avoid having another special-case.

dkolbly · 2024-06-04T15:36:30Z

For whatever it's worth, I hard-coded support for object and json_t in my validator.

I am also not sure that allowing that generically for any object is wise; if anything, it might make more sense to make that a property of the attribute (i.e., unmapped) rather than the type (object) because there is more semantic weight at that level.

rmouritzen-splunk · 2024-06-04T15:44:19Z

For whatever it's worth, I hard-coded support for object and json_t in my validator.

Explicitly handling json_t is required.

I think handling object can be done by checking for a class/object definition with no attributes. At least, that's what I'm trying out. This would be a new OCSF schema rule:

A class or object that defines no attributes means the attribute are open -- any attributes of any type can be added. The open attributes property is not inherited, however. A class or object that defines attributes and also extends a base class or object with no attributes only allows those defined attributes.

That is a terse way to say it. When documented, examples would be used to clarify the concept. In the core schema, the object object type defines no attributes, and when used directly in the unmapped and xattributes cases specifically means any attributes can be added. However the object type is also used as a base for all of the other objects defined in the core schema, and in those derived objects the attributes are closed (no extra attributes are meant to be allowed).

(BTW: I actually don't know why object is used as the ultimate base object type in the core schema. I don't see any value to this.)

This isn't the only path forward, though. Here are the alternatives I see:

Add a new "open" attributes property for classes and objects. This is equivalent to JSON Schema's additionalProperties set to true. This property would be inheritable. We would add this to the object class, and then remove object as a base from all other classes. I don't think this would be a breaking change since the "compiled" schema ends up the same.
(@dkolbly's variation) Add a new attribute-level property to specify an open object. This is the same as 1 except at the attribute level instead of the class/object level. Seems less tricky, as Donovan points out above.
Add a new object_t type. This means an object (a map or associative array) with arbitrary attributes. The unmapped and xattributes types would be changed from object to object_t.
Consider object a special-case. The rule is "attributes of typeobject are open-ended". (I don't like this because object is defined by the core schema, and it is defined as having no attributes. Why is it special?)
The proposal above: any class or object defined with no attributes means its attributes are open, though this property isn't inherited. This generalizes the exists way object is used so it isn't a special-case.

IMHO, the existing use of object as open-ended is basically a bug -- an object_t type should have been introduced. But we can solve this, I think.

@dkolbly, @pagbabian-splunk, and others, thoughts?

…dation of observables. Add (limited) observable validation.

…vent bundle with /api/v2/validate_bundle. Remove try/rescue pattern from schema_controller.ex as it returns exception message, leaking internal implementation details (a security problem). Improve Swagger docs for new APIs.

rmouritzen-splunk · 2024-06-06T00:29:13Z

Last commit moves validation of from POST /api/v2/validate to use of event bundle with POST /api/v2/validate_bundle.

That should be the last change, barring further issues found or a change related to how object is validated (see above).

dkolbly · 2024-06-06T15:24:54Z

Excellent analysis of variant approaches @rmouritzen-splunk !

I didn't even think of your number 3, object_t. And I agree with your comment about using object as an open-ended collection of attributes is basically a bug.

Especially when considered in the context of json_t, I think object_t makes the most sense; basically json_t represents a generic JSON value, and object_t represents a JSON object (as opposed to value).

That helps steer people away from the open-ended objects as much as possible, makes it more obvious when they are using them, and makes things cleaner (I think) from an encoding perspective (certainly it does for protobuf encoding).

dkolbly

To the extent I can read elixir code, looks good to me!

Issue ocsf#91: Add improved event validator API

188ebeb

rmouritzen-splunk requested review from rroupski, zschmerber, floydtree and mikeradka and removed request for rroupski June 1, 2024 04:05

rmouritzen-splunk requested a review from dkolbly June 3, 2024 18:37

floydtree requested a review from pagbabian-splunk June 3, 2024 19:24

rmouritzen-splunk added 2 commits June 4, 2024 20:09

Change observable type_id caption so they match up better during vali…

58b44fd

…dation of observables. Add (limited) observable validation.

rmouritzen-splunk added 3 commits June 6, 2024 12:20

Bump Docker elixir image version from 1.16.2 to 1.16.3

caf98fb

Update Elixir dependencies

6d8de1f

Update Docker image for Elixir for final image (missed that last time)

12857b8

dkolbly approved these changes Jun 7, 2024

View reviewed changes

Dependency updates

0a565da

rmouritzen-splunk merged commit 109a3da into ocsf:main Jun 18, 2024
1 check passed

rmouritzen-splunk deleted the issue91 branch June 18, 2024 00:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue #91: Add improved event validator API #94

Issue #91: Add improved event validator API #94

rmouritzen-splunk commented Jun 1, 2024 •

edited

Loading

rmouritzen-splunk commented Jun 1, 2024 •

edited

Loading

floydtree commented Jun 3, 2024

rmouritzen-splunk commented Jun 3, 2024

dkolbly commented Jun 4, 2024

rmouritzen-splunk commented Jun 4, 2024 •

edited

Loading

rmouritzen-splunk commented Jun 6, 2024

dkolbly commented Jun 6, 2024

dkolbly left a comment

Issue #91: Add improved event validator API #94

Issue #91: Add improved event validator API #94

Conversation

rmouritzen-splunk commented Jun 1, 2024 • edited Loading

rmouritzen-splunk commented Jun 1, 2024 • edited Loading

floydtree commented Jun 3, 2024

rmouritzen-splunk commented Jun 3, 2024

dkolbly commented Jun 4, 2024

rmouritzen-splunk commented Jun 4, 2024 • edited Loading

rmouritzen-splunk commented Jun 6, 2024

dkolbly commented Jun 6, 2024

dkolbly left a comment

Choose a reason for hiding this comment

rmouritzen-splunk commented Jun 1, 2024 •

edited

Loading

rmouritzen-splunk commented Jun 1, 2024 •

edited

Loading

rmouritzen-splunk commented Jun 4, 2024 •

edited

Loading