
Set a more efficient serializer #28

Open
JulienPeloton opened this issue Jun 5, 2018 · 2 comments

@JulienPeloton
Member

We need to introduce the KryoSerializer.

@JulienPeloton
Member Author

In my experience, simply registering classes (the current implementation) does not bring any improvement with Spark 2+:

```scala
/** Main App */
import org.apache.spark.SparkConf
val conf = new SparkConf()
conf.registerKryoClasses(
  Array(classOf[MyClass1] /* , etc. */)
)
```
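A quick way to check whether the registered classes are the ones Kryo is actually handling is to set `spark.kryo.registrationRequired` to `true`: Kryo then throws an exception for any unregistered class instead of silently writing its fully qualified class name. This is a minimal sketch assuming spark-core 2.x on the classpath; `RegisteredPoint`, `UnregisteredPoint` and `RegistrationCheck` are hypothetical names used only for illustration.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer

// Hypothetical classes, used only to illustrate the check.
case class RegisteredPoint(x: Double, y: Double, z: Double)
case class UnregisteredPoint(x: Double, y: Double, z: Double)

object RegistrationCheck {
  // With registrationRequired=true, Kryo rejects any class that was not
  // registered instead of writing its full class name into the output.
  private val conf = new SparkConf(false)
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryo.registrationRequired", "true")
    .registerKryoClasses(Array(classOf[RegisteredPoint]))

  // Returns true if Kryo serializes the object, false if it rejects the class.
  def serializes(obj: AnyRef): Boolean = {
    val instance = new KryoSerializer(conf).newInstance()
    try {
      instance.serialize(obj)
      true
    } catch {
      // Kryo 3/4 signal an unregistered class with IllegalArgumentException.
      case _: IllegalArgumentException => false
    }
  }
}
```

In a real job, the same setting makes any class that slips through unregistered fail loudly, which makes it easier to see which objects actually go through Kryo.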

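As far as I understand, Spark 2+ already uses the Kryo serializer internally for many common types (since 2.0.0, when shuffling RDDs of simple types, arrays of simple types, or strings), but custom classes still require a dedicated serializer, e.g.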

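```scala
/** File SpatialSerializer.scala */
import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.Serializer
import com.esotericsoftware.kryo.io.Input
import com.esotericsoftware.kryo.io.Output

// Kryo serializers are typed by the class they handle
class SpatialSerializer extends Serializer[MyClass1] {
  // ...

  override def write(kryo: Kryo, output: Output, obj: MyClass1): Unit = {
    // ... write the fields of obj to output
  }

  override def read(kryo: Kryo, input: Input, clazz: Class[MyClass1]): MyClass1 = {
    // ... read the fields back in the same order they were written
    ???
  }
}
```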
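For illustration, here is a minimal filled-in version of such a serializer for a hypothetical `Point3D(x, y, z)` case class (not a class from this project), with a round trip through a bare `Kryo` instance. It assumes the Kryo 3/4 `Serializer` API shipped with Spark 2.x, where `read` receives a `Class[T]`; Kryo 5 changed that signature.

```scala
import java.io.ByteArrayOutputStream
import com.esotericsoftware.kryo.{Kryo, Serializer}
import com.esotericsoftware.kryo.io.{Input, Output}

// Hypothetical spatial type, used only for illustration.
case class Point3D(x: Double, y: Double, z: Double)

// Writes the three coordinates as raw doubles: no field names, no class metadata.
class Point3DSerializer extends Serializer[Point3D] {
  override def write(kryo: Kryo, output: Output, p: Point3D): Unit = {
    output.writeDouble(p.x)
    output.writeDouble(p.y)
    output.writeDouble(p.z)
  }

  // Fields are read back in exactly the order they were written
  // (Scala evaluates constructor arguments left to right).
  override def read(kryo: Kryo, input: Input, clazz: Class[Point3D]): Point3D =
    Point3D(input.readDouble(), input.readDouble(), input.readDouble())
}

object Point3DRoundTrip {
  // Serializes p with the custom serializer, reads it back,
  // and returns the decoded point together with the number of bytes written.
  def roundTrip(p: Point3D): (Point3D, Int) = {
    val kryo = new Kryo()
    kryo.register(classOf[Point3D], new Point3DSerializer())

    val bytes = new ByteArrayOutputStream()
    val output = new Output(bytes)
    kryo.writeObject(output, p)
    output.close() // flushes the Kryo buffer into `bytes`

    val input = new Input(bytes.toByteArray)
    val back = kryo.readObject(input, classOf[Point3D])
    input.close()
    (back, bytes.size())
  }
}
```

Writing only the raw coordinates is what makes a hand-written serializer compact; the price is that `write` and `read` must be kept in sync by hand whenever the class changes.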

and then

```scala
/** File spark3dKryoRegistrator.scala */
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

class spark3dKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // Instantiate your serializer
    val serializer = new SpatialSerializer()

    // Register your classes together with their serializer
    kryo.register(classOf[MyClass1], serializer)
    // etc...
  }
}
```

and finally

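```scala
/** Main App */
import org.apache.spark.SparkConf
val conf = new SparkConf()
// Use Kryo instead of Java serialization, with the custom registrator
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.kryo.registrator", classOf[spark3dKryoRegistrator].getName)
```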

...

It would be good to have this implemented and benchmarked.
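As a possible starting point for the benchmark, here is a minimal sketch that measures only serialized size (not speed) for three setups: Java serialization, Kryo without registration, and Kryo with `registerKryoClasses`. It assumes spark-core 2.x on the classpath and uses a hypothetical `Point3D` payload and a hypothetical `SerializerSizeBenchmark` object rather than the project's classes; a throughput comparison would need a proper harness such as JMH.

```scala
import java.nio.ByteBuffer
import org.apache.spark.SparkConf
import org.apache.spark.serializer.{JavaSerializer, KryoSerializer, Serializer}

// Hypothetical payload standing in for the project's spatial classes.
case class Point3D(x: Double, y: Double, z: Double)

object SerializerSizeBenchmark {
  // Serializes `data` with one serializer, checks the round trip,
  // and returns the serialized size in bytes.
  def serializedSize(serializer: Serializer, data: Array[Point3D]): Int = {
    val instance = serializer.newInstance()
    val bytes: ByteBuffer = instance.serialize(data)
    val size = bytes.remaining()
    val back = instance.deserialize[Array[Point3D]](bytes)
    require(back.sameElements(data), "round trip changed the data")
    size
  }

  // Sizes for Java serialization, Kryo without registration,
  // and Kryo with the payload classes registered up front.
  def sizes(data: Array[Point3D]): Map[String, Int] = {
    val base = new SparkConf(false)
    val registered = new SparkConf(false)
      .registerKryoClasses(Array(classOf[Point3D], classOf[Array[Point3D]]))
    Map(
      "java" -> serializedSize(new JavaSerializer(base), data),
      "kryo" -> serializedSize(new KryoSerializer(base), data),
      "kryo+registration" -> serializedSize(new KryoSerializer(registered), data)
    )
  }

  def main(args: Array[String]): Unit = {
    val data = Array.tabulate(10000)(i => Point3D(i, i * 0.5, -i))
    sizes(data).foreach { case (name, n) => println(s"$name: $n bytes") }
  }
}
```

Running `SerializerSizeBenchmark.main` prints one line per setup. The actual numbers depend on the Spark and Kryo versions and on the payload, so they should be measured on the project's real classes rather than assumed.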

This was referenced Jun 6, 2018