What to learn about Java big data

silencement
2019-05-29 17:06:50

Hadoop, the mainstream big data platform, is written in Java, so Java programmers work in a familiar language environment when they move into big data. Many application frameworks built on top of big data platforms are also written in Java, so mastering the language is an advantage in many big data projects.

Of course, Hadoop's core value is that it provides a distributed file system and a distributed computing engine, and most companies have no need to modify that engine. In that case, beyond being comfortable with programming, you usually also need to learn data processing and data mining. If you are heading toward a data mining engineer role in particular, you need to master more algorithm-related knowledge.

Data mining engineers also need to master programming tools, but in most cases they use Hadoop as a platform and a tool: they process and mine data in various scripting languages through the interfaces the platform provides. If you are heading in the data mining direction, it may therefore be more important to be proficient in a language suited to distributed computing, such as Scala, and in libraries such as Spark MLlib.

Learning roadmap for Java big data engineers:

Step One: Distributed computing frameworks

Master the Hadoop and Spark distributed computing frameworks, understand distributed file systems, message queues, and NoSQL databases, and learn the related components: Hadoop, MapReduce (MR), Spark, Hive, HBase, Redis, Kafka, and so on.
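
To give a first feel for the MapReduce (MR) model mentioned above, here is a minimal sketch in plain Java. It deliberately does not use the Hadoop API: the class name `WordCountSketch` and its methods are illustrative assumptions, and everything runs in one process, whereas a real cluster would run the map and reduce phases in parallel across machines.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java imitation of the MapReduce model; hypothetical names, no Hadoop API. */
class WordCountSketch {

    /** Map phase: emit a (word, 1) pair for every word in one input line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<String, Integer>(word, 1));
            }
        }
        return pairs;
    }

    /** Shuffle and reduce phases: group the pairs by word and sum their counts. */
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, stable output
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    /** Runs the map phase on every line, then reduces all emitted pairs. */
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hadoop stores data", "Spark processes data");
        // prints {data=2, hadoop=1, processes=1, spark=1, stores=1}
        System.out.println(wordCount(lines));
    }
}
```

The point of the split is that `map` only ever looks at one line and `reduce` only ever combines values for one key, which is what lets a framework distribute the work.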

Step Two: Algorithms and tools

Learn the main data mining algorithms, such as classification, clustering, association rules, regression, decision trees, and neural networks, and become proficient in one data mining programming language: Python or Scala. Mainstream platforms and frameworks now ship algorithm libraries, such as Mahout on Hadoop and MLlib on Spark, so you can also start learning the algorithms by learning these interfaces and scripting languages.
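
Before relying on a library, it can help to write one of these algorithms by hand. The following is a minimal sketch of k-means clustering on one-dimensional data in plain Java. The class `KMeansSketch`, its method names, and the sample points are assumptions made for illustration; it is not how Mahout or MLlib implement clustering.

```java
import java.util.Arrays;

/** Minimal one-dimensional k-means; a teaching sketch with hypothetical names. */
class KMeansSketch {

    /**
     * Runs a fixed number of assign/update rounds (Lloyd's algorithm).
     *
     * @param points     values to cluster
     * @param centroids  initial centroid guesses (copied, not modified)
     * @param iterations number of rounds to run
     * @return the centroids after the last round
     */
    static double[] fit(double[] points, double[] centroids, int iterations) {
        double[] c = centroids.clone();
        for (int round = 0; round < iterations; round++) {
            double[] sum = new double[c.length];
            int[] count = new int[c.length];
            // Assignment step: attach every point to its nearest centroid.
            for (double p : points) {
                int k = nearest(p, c);
                sum[k] += p;
                count[k]++;
            }
            // Update step: move each centroid to the mean of its points.
            for (int k = 0; k < c.length; k++) {
                if (count[k] > 0) {
                    c[k] = sum[k] / count[k];
                }
            }
        }
        return c;
    }

    /** Index of the centroid closest to p (ties go to the lower index). */
    static int nearest(double p, double[] c) {
        int best = 0;
        for (int k = 1; k < c.length; k++) {
            if (Math.abs(p - c[k]) < Math.abs(p - c[best])) {
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] points = {1.0, 2.0, 3.0, 10.0, 11.0, 12.0};
        double[] result = fit(points, new double[] {0.0, 5.0}, 10);
        // prints [2.0, 11.0]
        System.out.println(Arrays.toString(result));
    }
}
```

With the two starting guesses 0.0 and 5.0, the centroids settle on the means of the two obvious groups after three rounds; the remaining rounds leave them unchanged.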

Step Three: Mathematics

Fill in the mathematics you are missing: advanced mathematics (calculus), probability theory, and linear algebra.
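
One way to see why this mathematics matters: even the simplest regression model is a small calculus exercise. Minimizing the squared error of a line y = a·x + b gives the closed-form slope a = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept b = ȳ − a·x̄. The sketch below implements exactly that in plain Java; the class name `LeastSquaresSketch` and the sample data are assumptions for illustration.

```java
/** Closed-form simple linear regression; a teaching sketch with hypothetical names. */
class LeastSquaresSketch {

    /** Returns {slope, intercept} of the least-squares line through (x[i], y[i]). */
    static double[] fitLine(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0.0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("all x values are identical");
        }
        double slope = sxy / sxx;
        return new double[] {slope, meanY - slope * meanX};
    }

    public static void main(String[] args) {
        double[] line = fitLine(new double[] {1, 2, 3, 4}, new double[] {3, 5, 7, 9});
        // prints slope=2.0, intercept=1.0
        System.out.println("slope=" + line[0] + ", intercept=" + line[1]);
    }
}
```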

Step Four: Project Practice

1) Open source projects, for example TensorFlow: Google's open source library, which has already collected more than 40,000 stars, an impressive number, and also supports mobile devices;

2) Participate in data competitions;

3) Gain project experience through corporate internships.

If you only do big data development or operations and maintenance, you can skip the second and third steps. If you focus on applying existing algorithms to data mining, you can skip the third step at first.
