Implementing a row_number UDF in Hive, and the problems I ran into
To add a row_number to every row in Hive, the first thing to keep in mind is that all of the data must be processed in a single reducer. Here is the code:
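package xx.xxxxx.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

// deterministic = false: the result depends on internal state, not only on the arguments
@UDFType(deterministic = false)
public class RowNumber extends UDF {
    // Maximum number of key columns that can be passed in
    private static int MAX_VALUE = 50;
    // Key column values of the previous row
    private static String comparedColumn[] = new String[MAX_VALUE];
    // Next row number to return
    private static int rowNum = 1;

    public int evaluate(Object... args) {
        // Convert the key columns of the current row to strings
        String columnValue[] = new String[args.length];
        for (int i = 0; i < args.length; i++) {
            columnValue[i] = args[i].toString();
        }
        // First row: remember its key
        if (rowNum == 1) {
            for (int i = 0; i < columnValue.length; i++) {
                comparedColumn[i] = columnValue[i];
            }
        }
        // If any key column changed, remember the new key and restart numbering at 1
        for (int i = 0; i < columnValue.length; i++) {
            if (!comparedColumn[i].equals(columnValue[i])) {
                for (int j = 0; j < columnValue.length; j++) {
                    comparedColumn[j] = columnValue[j];
                }
                rowNum = 1;
                return rowNum++;
            }
        }
        // Same key as the previous row: keep counting
        return rowNum++;
    }
}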
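To check the numbering logic without a Hive cluster, here is a minimal sketch that re-implements the evaluate method above as a plain Java class. The class name RowNumberSketch is hypothetical, and it deliberately differs from the UDF in two ways: its state lives in instance fields instead of static fields, and it uses String.valueOf, so a null argument becomes the string "null" instead of throwing. It shows that the counter restarts whenever the key columns change, so the numbers are only meaningful when rows reach the function already sorted by those columns; with a constant argument such as row_number(1) the key never changes and the result is a plain running count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, Hive-free re-implementation of RowNumber.evaluate, for
 * illustration only. State is per instance here (the UDF uses static fields).
 */
class RowNumberSketch {
    private String[] previousKey = null; // key columns of the previous row
    private int rowNum = 1;              // next number to hand out

    int evaluate(Object... args) {
        // Convert the key columns of the current row to strings
        String[] key = new String[args.length];
        for (int i = 0; i < args.length; i++) {
            key[i] = String.valueOf(args[i]);
        }
        // First row, or any key column changed: remember the key, restart at 1
        if (previousKey == null || !Arrays.equals(previousKey, key)) {
            previousKey = key;
            rowNum = 1;
        }
        return rowNum++;
    }

    public static void main(String[] args) {
        // Rows must already arrive sorted by the key column(s)
        String[] sortedKeys = {"a", "a", "b", "b", "b", "c"};
        RowNumberSketch udf = new RowNumberSketch();
        List<Integer> numbers = new ArrayList<>();
        for (String k : sortedKeys) {
            numbers.add(udf.evaluate(k));
        }
        System.out.println(numbers); // prints [1, 2, 1, 2, 3, 1]
    }
}
```

Static fields, as used in the original UDF, are shared by every instance of the class loaded in the same JVM; instance fields are used here only to keep the sketch easy to test.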
Package the class into a jar and register the function:
add jar /home/hdbatch/jars/iclickhiveudf.jar;
create temporary function row_number as 'cn.iclick.hive.udf.RowNumber';
Be careful how it is used, though. Suppose I want to number the rows of a table, and compare two SQL statements. The first:
create table test_tony as select row_number(1), tid from(select distinct tid from cookie where i_date=20131105)t order by tid;
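The statement above assigns the row numbers incorrectly: it runs with 11 reducers, so the same row numbers are produced 11 times over. Why is the query interpreted differently from what was intended? Looking at the EXPLAIN output for the statement, the cause is a predicate-pushdown error that shows up when writing a non-deterministic UDF.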
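A small simulation makes the 11-reducer symptom concrete. It is not Hive code: ReducerCounterDemo and numberRows are hypothetical names, and the round-robin assignment of rows to reducers is an assumed stand-in for Hive's real partitioning. Each simulated reducer keeps its own counter, just as each reducer task would hold its own copy of the UDF's counter, so in this example with 22 rows and 11 reducers the numbers 1 and 2 are each handed out 11 times, while a single reducer yields 22 unique numbers.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical simulation (not Hive code) of a constant-key row_number
 * evaluated in several reducers, each with its own counter.
 */
class ReducerCounterDemo {
    /** Returns the number each row would receive, in input order. */
    static List<Integer> numberRows(int rowCount, int reducers) {
        int[] counters = new int[reducers];   // one independent counter per reducer task
        List<Integer> out = new ArrayList<>();
        for (int row = 0; row < rowCount; row++) {
            int reducer = row % reducers;     // assumed round-robin split of rows
            counters[reducer]++;
            out.add(counters[reducer]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("1 reducer:   " + numberRows(22, 1));
        System.out.println("11 reducers: " + numberRows(22, 11));
    }
}
```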