Apache Spark – Sparse Vector vs Dense Vector

Unless I’ve completely misunderstood your question, the MLlib data types documentation illustrates this fairly clearly:

import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

// Create a dense vector (1.0, 0.0, 3.0).
Vector dv = Vectors.dense(1.0, 0.0, 3.0);
// Create a sparse vector (1.0, 0.0, 3.0) by specifying the indices and values of its nonzero entries.
Vector sv = Vectors.sparse(3, new int[] {0, 2}, new double[] {1.0, 3.0});

The first argument of Vectors.sparse is the size of the vector, the second is an array of the indices of the nonzero entries, and the third is an array of the values at those indices.
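If you want to read those pieces back out of an existing vector, the SparseVector class exposes them directly. A minimal sketch (the cast is safe here because Vectors.sparse returns a SparseVector):

import java.util.Arrays;

import org.apache.spark.mllib.linalg.SparseVector;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

Vector sv = Vectors.sparse(3, new int[] {0, 2}, new double[] {1.0, 3.0});

// Cast to SparseVector to get at the size, indices, and values it was built from.
SparseVector s = (SparseVector) sv;
System.out.println(s.size());                      // 3
System.out.println(Arrays.toString(s.indices()));  // [0, 2]
System.out.println(Arrays.toString(s.values()));   // [1.0, 3.0]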

A sparse vector is appropriate when most of the values in the vector are zero, because only the nonzero entries and their indices need to be stored. A dense vector stores every element explicitly and is the natural choice when most of the values are nonzero.
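To make the difference concrete, here is a small sketch that builds the vector from the documentation example both ways. The two objects print differently (the exact toString format may vary a little between Spark versions), but they describe the same values:

import java.util.Arrays;

import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

// Dense: stores all three values, including the zero.
Vector dense = Vectors.dense(1.0, 0.0, 3.0);
// Sparse: stores only the size plus the nonzero entries at indices 0 and 2.
Vector sparse = Vectors.sparse(3, new int[] {0, 2}, new double[] {1.0, 3.0});

System.out.println(dense);   // [1.0,0.0,3.0]
System.out.println(sparse);  // (3,[0,2],[1.0,3.0])

// Both expand to the same double[] when materialised.
System.out.println(Arrays.equals(dense.toArray(), sparse.toArray()));  // true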

If you need to create a sparse vector from the dense vector you specified, use the following syntax:

import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

// Equivalent to the dense vector (0.0, 3.0, 0.0, 4.0): size 4, nonzero entries at indices 1 and 3.
Vector sparseVector = Vectors.sparse(4, new int[] {1, 3}, new double[] {3.0, 4.0});
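If the dense vector is only known at runtime, you can build the sparse form yourself by collecting the nonzero entries. The helper below is just an illustrative sketch, not part of the MLlib API (newer Spark versions also provide a toSparse() method on vectors that does much the same thing):

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

// Hypothetical helper: gather the indices and values of the nonzero entries,
// then pass them to Vectors.sparse together with the original size.
static Vector denseToSparse(Vector dense) {
    double[] all = dense.toArray();
    List<Integer> idx = new ArrayList<>();
    List<Double> val = new ArrayList<>();
    for (int i = 0; i < all.length; i++) {
        if (all[i] != 0.0) {
            idx.add(i);
            val.add(all[i]);
        }
    }
    return Vectors.sparse(all.length,
            idx.stream().mapToInt(Integer::intValue).toArray(),
            val.stream().mapToDouble(Double::doubleValue).toArray());
}

// denseToSparse(Vectors.dense(0.0, 3.0, 0.0, 4.0)) produces the same vector as sparseVector above.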
