How to Split a Vector Column into Rows in PySpark?

Patricia Arquette
Release: 2024-10-31 20:10:01

Splitting a Vector Column into Columns in PySpark

A common PySpark task is splitting a column of vector values into one column per dimension. The right approach depends on your Spark version:

Spark 3.0.0 and Above

Spark 3.0.0 introduced the vector_to_array function, simplifying this process:

<code class="python">from pyspark.ml.functions import vector_to_array

df = df.withColumn("xs", vector_to_array("vector"))</code>

You can then select the desired columns:

<code class="python">from pyspark.sql.functions import col
df.select(["word"] + [col("xs")[i] for i in range(3)])</code>
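Columns selected this way get auto-generated names such as `xs[0]`. As a minimal sketch, assuming the array column is called `xs` as above, the hypothetical helper `element_exprs` (not part of PySpark) builds SQL expression strings that could be passed to `DataFrame.selectExpr` to give each element a readable alias:

```python
def element_exprs(array_col, size, prefix=None):
    """Build Spark SQL expressions such as 'xs[0] AS xs_0', one per array index.

    Hypothetical helper for illustration only; it just formats strings.
    """
    prefix = prefix or array_col
    return [f"{array_col}[{i}] AS {prefix}_{i}" for i in range(size)]


exprs = element_exprs("xs", 3)

# Assumed usage with the DataFrame from the example above:
# df.selectExpr("word", *exprs)
```

The number of elements (3 here) still has to be known in advance, just as with `range(3)` in the original selection.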

Spark Versions Before 3.0.0

Approach 1: Converting to RDD

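<code class="python">def extract(row):
    # Flatten one row into a tuple: the word, then each vector element as a float
    return (row.word, ) + tuple(row.vector.toArray().tolist())

# Only the first column is named; the vector values will be named _2, _3, ...
df.rdd.map(extract).toDF(["word"])</code>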
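To see what `extract` produces without starting Spark, the sketch below calls it on a stand-in row. `FakeVector`, `FakeArray`, and the `Row` namedtuple are hypothetical substitutes for a real ML vector and a Spark `Row`; they mimic only the `toArray().tolist()` call chain. It also shows one way to build a full list of column names for `toDF`, assuming the vector length is known:

```python
from collections import namedtuple


class FakeArray(list):
    """Stand-in for the NumPy array returned by a real vector's toArray()."""

    def tolist(self):
        return list(self)


class FakeVector:
    """Stand-in for a Spark ML vector; supports only toArray()."""

    def __init__(self, values):
        self._values = [float(v) for v in values]

    def toArray(self):
        return FakeArray(self._values)


# Stand-in for a Spark Row with the two columns used in this article
Row = namedtuple("Row", ["word", "vector"])


def extract(row):
    # Same logic as above: the word followed by each vector element
    return (row.word,) + tuple(row.vector.toArray().tolist())


row = Row("assert", FakeVector([1, 2, 3]))
flat = extract(row)

# Name every output column instead of relying on _2, _3, ...
names = ["word"] + [f"v{i}" for i in range(len(flat) - 1)]

# Assumed usage with a real DataFrame:
# df.rdd.map(extract).toDF(names)
```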

Approach 2: Using a UDF

<code class="python">from pyspark.sql.functions import udf, col
from pyspark.sql.types import ArrayType, DoubleType

def to_array(column):
    def to_array_(v):
        # Convert an ML vector to a plain Python list of floats
        return v.toArray().tolist()
    # asNondeterministic requires Spark 2.3 or later
    return udf(to_array_, ArrayType(DoubleType())).asNondeterministic()(column)

df = df.withColumn("xs", to_array(col("vector")))</code>

Select the desired columns:

<code class="python">df.select(["word"] + [col("xs")[i] for i in range(3)])</code>

Each of these methods splits a vector column into individual columns, which makes the data easier to work with and analyze. On Spark 3.0.0 and above, vector_to_array is the simplest option.
