The Java big data learning path is as follows.
The first stage: Static web page basics (HTML and CSS)
1. Difficulty level: one star
2. Scope: technical knowledge points, a stage project task, and comprehensive ability
3. Main technologies include:
Common HTML tags; common CSS layouts, styles, and positioning; methods for designing and building static pages; etc.
The second stage: JavaSE and JavaWeb
1. Difficulty level: two stars
2. Scope: technical knowledge points, a stage project task, and comprehensive ability
3. Main technologies include:
Java basic syntax; Java object orientation (classes, objects, encapsulation, inheritance, polymorphism, abstract classes, interfaces, commonly used classes, inner classes, common access modifiers, etc.); exceptions; collections; files; IO; MySQL (basic SQL statement operations, multi-table queries, subqueries, stored procedures, transactions, distributed transactions); JDBC; threads; reflection; socket programming; enumerations; generics; design patterns
4. The description is as follows:
What we call Java fundamentals here runs from shallow to deep technical points, through analysis of real business project modules, to the design and implementation of multiple storage methods. This stage is the most important of the first four, because every subsequent stage builds on it, and it is also the stage with the highest learning density in the big data program. This is also where, for the first time, you develop and deliver a real project with both a front end and a back end as a team (combining the technologies of the first stage with those of this second stage).
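To make a few of these second-stage concepts concrete (interfaces, abstract classes, inheritance, polymorphism, generics), here is a minimal self-contained sketch; all class and method names are invented purely for illustration:

```java
import java.util.ArrayList;
import java.util.List;

interface Payable {
    double monthlyPay(); // every employee type must implement this
}

abstract class Employee implements Payable {
    private final String name; // encapsulation: state is private

    Employee(String name) { this.name = name; }

    String getName() { return name; }
}

class Engineer extends Employee {
    Engineer(String name) { super(name); }

    @Override
    public double monthlyPay() { return 12000.0; }
}

class Manager extends Employee {
    Manager(String name) { super(name); }

    @Override
    public double monthlyPay() { return 15000.0; }
}

public class PayrollDemo {
    public static void main(String[] args) {
        // Generics: a type-safe collection of employees
        List<Employee> staff = new ArrayList<>();
        staff.add(new Engineer("Ada"));
        staff.add(new Manager("Grace"));

        // Polymorphism: the same call dispatches to each subclass
        for (Employee e : staff) {
            System.out.println(e.getName() + " earns " + e.monthlyPay());
        }
    }
}
```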
The third stage: Front-end frameworks
1. Difficulty level: two stars
2. Scope: technical knowledge points, a stage project task, and comprehensive ability
3. Main technologies include:
Combined use of Java, jQuery, annotations, and reflection; XML and XML parsing with dom4j and JAXB; JDK 8 new features; SVN; Maven; EasyUI
4. The description is as follows:
Building on the first two stages, we can turn static pages into dynamic ones and make our web content much richer. Of course, from a commercial point of view there are professional front-end designers for this; our goal in designing this stage is that front-end technology exercises thinking and design skills more directly. At the same time, we fold the advanced features of the second stage into this stage, taking learners to the next level.
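As a small taste of the XML-parsing portion of this stage, here is a minimal dom4j sketch (assuming dom4j 2.x; the users.xml file and its layout are invented for illustration):

```java
import java.io.File;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class XmlParseDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical input of the form: <users><user id="1">Ada</user>...</users>
        SAXReader reader = new SAXReader();
        Document doc = reader.read(new File("users.xml"));
        Element root = doc.getRootElement();
        // Iterate over the <user> child elements and print id + text
        for (Element user : root.elements("user")) {
            System.out.println(user.attributeValue("id") + ": " + user.getText());
        }
    }
}
```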
The fourth stage: Enterprise-level development frameworks
1. Difficulty level: three stars
2. Scope: technical knowledge points, a stage project task, and comprehensive ability
3. Main technologies include:
Hibernate; Spring; Spring MVC; log4j and slf4j integration; MyBatis; Struts2; Shiro; Redis; the Activiti process engine; the Nutch crawler; Lucene; web services with CXF; Tomcat clustering and hot standby; MySQL read/write splitting
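As one small taste of this stage, here is a minimal Spring MVC controller sketch, assuming a standard Spring MVC or Spring Boot setup; the endpoint and greeting are invented for illustration:

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class HelloController {
    // GET /hello?name=Ada -> "Hello, Ada"
    @GetMapping("/hello")
    public String hello(@RequestParam(defaultValue = "world") String name) {
        return "Hello, " + name;
    }
}
```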
The fifth stage: First introduction to big data
1. Difficulty level: three stars
2. Scope: technical knowledge points, a stage project task, and comprehensive ability
3. Main technologies include:
Big data fundamentals, part 1 (what big data is, application scenarios, how to learn big data, virtual machine concepts and installation, etc.); common Linux commands (file management, system management, disk management); Linux shell programming (shell variables, loop control, applications); getting started with Hadoop (Hadoop components, single-node environment, directory structure, HDFS interface, MapReduce interface, simple shell usage, accessing Hadoop from Java); HDFS (introduction, shell usage, using the IDEA development tool, fully distributed cluster setup); MapReduce applications (the intermediate computation process, driving MapReduce from Java, running programs, log monitoring); advanced Hadoop applications (introduction to the YARN framework, configuration items and tuning, introduction to CDH, environment setup); extensions (map-side optimization, how to use a COMBINER, computing the TOP K, SQOOP export, VM snapshots, permission management commands, AWK and SED commands)
4. The description is as follows:
This stage is designed to give newcomers a big-picture view of big data. How does it relate to what came before? Having studied Java in the prerequisite courses, you understand how a program runs on a single machine. What about big data, then? Big data processes data by running programs on a large cluster of machines. And since big data is about processing data, storage likewise shifts from single-machine storage to large-scale storage across a cluster. (You ask what a cluster is? Well, suppose I have a big pot of rice. I can finish it by myself, but it takes a long time; if I ask everyone to eat together, we finish quickly. One person alone is just a person; many people together are a crowd, right?) Big data can thus be roughly divided into big data storage and big data processing. So at this stage our course covers the de facto standard of big data: Hadoop. And big data does not run on the Windows 7 or Windows 10 we use daily, but on the most widely used server system: Linux.
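To make "accessing Hadoop from Java" concrete, here is a minimal sketch that lists the root directory of HDFS; the namenode address hdfs://localhost:9000 is an assumption for a local single-node setup:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsListDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Connect to the (assumed) local namenode and list the root directory
        try (FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf)) {
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
        }
    }
}
```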
The sixth stage: Big data databases
1. Difficulty level: four stars
2. Scope: technical knowledge points, a stage project task, and comprehensive ability
3. Main technologies include: getting started with Hive (introduction to Hive, Hive usage scenarios, environment setup, architecture overview, working mechanism); Hive shell programming (table creation, query statements, partitioning and bucketing, index management and views); advanced Hive applications (DISTINCT implementation, GROUP BY, JOIN, SQL conversion principles, Java programming, configuration and tuning); introduction to HBase; HBase shell programming (DDL, DML, creating and querying tables from Java, compression, filters); HBase modules in detail (REGION, HREGIONSERVER, HMASTER, introduction to ZooKeeper, ZooKeeper configuration, integrating HBase with ZooKeeper); advanced HBase features (read and write flows, data models, schema design for read/write hotspots, tuning and configuration)
4. The description is as follows:
This stage is designed to help everyone understand how big data handles large-scale data, shortening our development time and speeding up reads.
How do we simplify things? In the previous stage, writing MapReduce programs by hand for complex business correlations and data mining is very involved. So at this stage we introduce Hive, the data warehouse of the big data world. Note the keyword: data warehouse. I know you are about to ask, so let me say it up front: a data warehouse is used for data mining and analysis. It is usually a very large data center, with data stored in large databases such as Oracle or DB2, which are typically also used for real-time online business. In short, analysis on a data warehouse is relatively slow. But the convenience is that anyone familiar with SQL can learn it fairly easily, and Hive is exactly such a tool: a SQL query tool built on big data. This stage also covers HBase, a database in the big data ecosystem. Confused? Didn't we just learn about a data "warehouse" called Hive? Hive runs on MapReduce, so its queries are quite slow; HBase, built on big data storage, can serve queries in real time. One is for analysis, the other for queries.
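To show the "SQL on big data" idea, here is a minimal sketch that queries Hive over JDBC; the connection URL, credentials, and the page_views table are assumptions for illustration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryDemo {
    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver (needed on older driver versions)
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://localhost:10000/default"; // assumed HiveServer2
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement();
             // A batch-style aggregation: Hive compiles this into MapReduce jobs
             ResultSet rs = stmt.executeQuery(
                 "SELECT url, COUNT(*) AS hits FROM page_views GROUP BY url")) {
            while (rs.next()) {
                System.out.println(rs.getString("url") + "\t" + rs.getLong("hits"));
            }
        }
    }
}
```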
The seventh stage: real-time data collection
1. Difficulty level: four stars
2. Scope: technical knowledge points, a stage project task, and comprehensive ability
3. Main technologies include:
Flume log collection; introduction to Kafka (message queues, application scenarios, cluster setup); Kafka in detail (partitions, topics, consumers, producers, ZooKeeper integration, shell development, shell debugging); advanced Kafka usage (Java development, key configuration, optimization projects); data visualization (introduction to graphs and charts, classification of charting tools, bar and pie charts, 3D charts and maps); introduction to Storm (design ideas, application scenarios, processing flow, cluster installation); Storm development (Storm Maven projects, writing local Storm programs); advanced Storm (Java development, key configuration, optimization projects); timeliness of Kafka asynchronous and batched sending; Kafka global message ordering; Storm multi-concurrency tuning
4. The description is as follows:
The data sources in the previous stages were existing large-scale data sets, so the results of processing and analysis carry a certain delay; the data being processed is typically the previous day's. Example scenarios: website hotlink protection, detecting anomalous customer accounts, real-time credit reporting. If these scenarios were analyzed with the previous day's data, wouldn't it be too late? So at this stage we introduce real-time data collection and analysis, mainly: Flume for real-time data collection with a wide range of supported sources, Kafka for receiving and forwarding data, and Storm for real-time data processing with second-level latency.
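As a small taste of the Kafka side of this pipeline (events flow into Kafka before Storm consumes them), here is a minimal producer sketch; the broker address and the "clicks" topic are assumptions for illustration:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ClickProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Send one click event; send() is asynchronous by default,
            // and close() flushes any buffered records
            producer.send(new ProducerRecord<>("clicks", "user-42", "/index.html"));
        }
    }
}
```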
The eighth stage: Spark data analysis
1. Difficulty level: five stars
2. Scope: technical knowledge points, a stage project task, and comprehensive ability
3. Main technologies include: introduction to Scala (data types, operators, control statements, basic functions); advanced Scala (data structures, classes, objects, traits, pattern matching, regular expressions); advanced Scala usage (higher-order functions, curried functions, partial functions, tail recursion, built-in higher-order functions, etc.); introduction to Spark (environment setup, architecture, run modes); Spark data sets and the programming model; Spark SQL; Spark in depth (DataFrame, Dataset, Spark Streaming principles, sources supported by Spark Streaming, integration with Kafka and sockets, programming model); advanced Spark programming (Spark GraphX, Spark MLlib machine learning); advanced Spark applications (system architecture, key configuration and performance tuning, fault and stage recovery); the Spark ML KMeans algorithm; advanced Scala features such as implicit conversions
4. The description is as follows:
Let's also recap the earlier stages, mainly the fifth one, our first encounter with big data. Hadoop's MapReduce-based analysis of large-scale data sets is relatively slow, including for machine learning and artificial intelligence, and it is not suited to iterative computation. Spark is positioned as the replacement for MapReduce. How does it replace it? Consider their execution models first: Hadoop analyzes data from disk, while Spark analyzes it in memory. If that does not mean much yet, here is a more vivid picture: on a train ride from Beijing to Shanghai, MapReduce is the old green slow train, while Spark is the high-speed rail or the maglev. Spark is developed in the Scala language and naturally supports Scala best, so this course teaches Scala first. What, another language to learn? No, no, no! Let me say just one thing: Scala is built on Java. From historical data storage and analysis (Hadoop, Hive, HBase) to real-time data collection (Flume, Kafka) and analysis (Storm, Spark), all of these depend on one another in real projects.
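To give a feel for Spark's programming model, here is a minimal word count using Spark's Java API (the course itself teaches Scala first; this Java version, assuming Spark 2.x, is just for readers who have not started Scala yet, and local[*] is an assumption for running on one machine):

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkWordCountDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // A tiny in-memory data set standing in for a large HDFS file
            JavaRDD<String> lines = sc.parallelize(
                    Arrays.asList("spark is fast", "spark is in memory"));
            lines.flatMap(line -> Arrays.asList(line.split(" ")).iterator())
                 .mapToPair(word -> new Tuple2<>(word, 1))
                 .reduceByKey(Integer::sum)
                 .collect()
                 .forEach(t -> System.out.println(t._1() + ": " + t._2()));
        }
    }
}
```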