
Add support for multiple data types (remainder of #196) #203

Merged Sep 25, 2020


@hcho3 (Collaborator) commented Sep 25, 2020

Addresses #95 and #111.
Follow-up to #198, #199, #201

Trying again, since #130 failed. This time, I made the Model class polymorphic, which minimizes the amount of pointer indirection.

Summary: Model is an opaque container that wraps the polymorphic handle ModelImpl<ThresholdType, LeafOutputType>, which in turn stores the list of trees Tree<ThresholdType, LeafOutputType>. To unbox the Model container and obtain the underlying ModelImpl<ThresholdType, LeafOutputType>, call Model::Dispatch(<lambda expression>).

This PR also upgrades the codebase to C++14 to gain access to generic lambdas, which proved very useful in the dispatching logic for the polymorphic Model class.

  • Turn the Model and Tree classes into template classes
  • Revise the string templates so that correct data types are used in the generated C code
  • Rewrite the model builder class
  • Revise the zero-copy serializer
  • Create an abstract matrix class that supports multiple data types (float32, float64 for now).
  • Move the DMatrix class to the runtime.
  • Extend the DMatrix class so that it can hold float32 and float64.
  • Redesign the C runtime API using the DMatrix class.
  • Ensure accurate predictions for scikit-learn models. To achieve the best results, use float32 for the input matrix and float64 for the split thresholds and leaf outputs.
  • Revise the JVM runtime.

@hcho3 merged this pull request into dmlc:multi_type_refactor_breakup Sep 25, 2020
@hcho3 deleted the multi_type_support2 branch September 25, 2020 09:37
hcho3 added a commit that referenced this pull request Oct 9, 2020