We have waited a long time for lambdas to bring closures to Java, but much of their value would be lost if they could not be used with collections. The problem of migrating existing interfaces to a lambda-friendly style has been solved by default methods. This article takes a close look at bulk data operations in Java collections, where lambdas are at their most powerful.
1. About JSR335
JSR stands for Java Specification Request. The main improvement in Java 8 is Project Lambda (JSR 335), whose goal is to make it easier to write Java code for multi-core processors. JSR 335 = lambda expressions + interface improvements (default methods) + bulk data operations. Together with the previous two articles, this completes our coverage of JSR 335.
2. External VS internal iteration
In the past, Java collections could not express internal iteration; they offered only external iteration, that is, a for or while loop.
List<Person> persons = Arrays.asList(new Person("Joe"), new Person("Jim"), new Person("John"));
for (Person p : persons) {
    p.setLastName("Doe");
}
The example above is the traditional approach, so-called external iteration. The loop runs sequentially in a fixed order. In today's multi-core era, parallelizing it means rewriting the code; how much faster it gets is uncertain, and the change brings risks of its own (thread-safety issues and so on).
Internal iteration needs support from the class library, and lambdas make that practical. Let's rewrite the loop above with a lambda and forEach (a default method that every Collection inherits from Iterable):
persons.forEach(p->p.setLastName("Doe"));
Now the JDK library controls the loop. We no longer need to care how the last name is set on each Person object; in principle the library can decide how to do it based on the runtime environment: in parallel, out of order, or lazily. This is internal iteration: the client passes the behavior p.setLastName to the API as data.
Internal iteration by itself is only loosely related to bulk operations on collections; it mainly lets us feel the change in how code is expressed. The really interesting part of bulk operations is the new Stream API, found in the java.util.stream package added in JDK 8.
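To make "passing behavior as data" concrete, here is a minimal sketch. The Person class below is a hypothetical stand-in (the article does not show its definition), and applyToAll is a helper invented for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class InternalIterationDemo {
    // Hypothetical stand-in for the article's Person class, which is not shown.
    static class Person {
        private final String firstName;
        private String lastName;

        Person(String firstName) { this.firstName = firstName; }
        void setLastName(String lastName) { this.lastName = lastName; }
        String getFullName() { return firstName + " " + lastName; }
    }

    // The behavior arrives as a parameter; the library's forEach drives the loop.
    static void applyToAll(List<Person> persons, Consumer<Person> action) {
        persons.forEach(action);
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(new Person("Joe"), new Person("Jim"), new Person("John"));
        Consumer<Person> setDoe = p -> p.setLastName("Doe"); // behavior stored in a variable
        applyToAll(persons, setDoe);
        persons.forEach(p -> System.out.println(p.getFullName())); // Joe Doe, Jim Doe, John Doe
    }
}
```

Because the lambda is an ordinary Consumer value, it can be stored, passed around, and reused, which is exactly what a hand-written for loop cannot offer.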
3. Stream API
A Stream only represents a flow of data; it is not a data structure, so once it has been traversed it cannot be traversed again. (Keep this in mind when programming: unlike a Collection, which still holds its data no matter how many times it is traversed, a stream is used up.) Its source can be a Collection, an array, I/O, and so on.
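The single-use rule is easy to verify. The sketch below (class and method names are made up for this illustration) runs two terminal operations on the same stream; the second one is rejected with an IllegalStateException, while the source collection can still hand out a fresh stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class StreamReuseDemo {
    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondTraversalFails(List<String> names) {
        Stream<String> stream = names.stream();
        stream.count(); // first terminal operation consumes the stream
        try {
            stream.count(); // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true; // the stream has already been operated upon
        }
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Joe", "Jim", "John");
        System.out.println(secondTraversalFails(names)); // true
        // The collection still holds its data and can produce a new stream at any time.
        System.out.println(names.stream().count()); // 3
    }
}
```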
3.1 Intermediate and terminal methods
Streams provide an interface for bulk data operations that makes working with data easier and faster. They offer operations such as filtering and mapping, and they reduce the number of traversals needed. These methods fall into two kinds: intermediate methods and terminal methods. A stream is by nature a continuous abstraction: intermediate methods always return a Stream, so to obtain a final result we must use a terminal operation to collect what the stream produces. The way to tell the two kinds apart is the return type: if it is a Stream, the method is intermediate; otherwise it is terminal. See the Stream API documentation for details.
Below is a brief introduction to a few intermediate methods (filter, map) and terminal methods (count, collect).
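The return-type rule can be seen by writing out the intermediate results explicitly instead of chaining them. This is only an illustrative sketch; the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class OperationKindsDemo {
    static List<Integer> evenSquares(List<Integer> numbers) {
        // Intermediate methods: each one returns a Stream, so calls can be chained.
        Stream<Integer> evens = numbers.stream().filter(n -> n % 2 == 0);
        Stream<Integer> squares = evens.map(n -> n * n);
        // Terminal method: returns something other than a Stream and ends the pipeline.
        return squares.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(evenSquares(Arrays.asList(1, 2, 3, 4, 5, 6))); // [4, 16, 36]
    }
}
```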
3.1.1 Filter
Filtering is the most natural first operation to apply to a data stream. The Stream interface exposes a filter method that accepts a Predicate, so the filter condition can be written as a lambda expression.
List<Person> persons = …
Stream<Person> personsOver18 = persons.stream().filter(p -> p.getAge() > 18); // keep only people over 18
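As a runnable variant of the filter example, the sketch below names the Predicate explicitly. Person here is a hypothetical minimal class (the article's own definition is not shown), and collect is used only to make the result observable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class FilterDemo {
    // Hypothetical stand-in for the article's Person class.
    static class Person {
        private final String name;
        private final int age;

        Person(String name, int age) { this.name = name; this.age = age; }
        String getName() { return name; }
        int getAge() { return age; }
    }

    // filter keeps only the elements for which the Predicate returns true.
    static List<Person> over18(List<Person> persons) {
        Predicate<Person> isOver18 = p -> p.getAge() > 18; // the lambda is a Predicate<Person>
        return persons.stream().filter(isOver18).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Joe", 17), new Person("Jim", 18), new Person("John", 30));
        over18(persons).forEach(p -> System.out.println(p.getName())); // John
    }
}
```

Note that the condition is strictly greater than 18, so an 18-year-old is filtered out.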
3.1.2 Map
Suppose that, having filtered some data, we now want to convert the objects. The map operation applies an implementation of Function to each element (the generic parameters T and R of Function<T, R> are its input type and result type). First, written as an anonymous inner class:
Stream<Adult> adult = persons.stream()
    .filter(p -> p.getAge() > 18)
    .map(new Function<Person, Adult>() {
        @Override
        public Adult apply(Person person) {
            return new Adult(person); // convert each person over 18 into an Adult
        }
    });
Now, convert the above example into a lambda expression:
Stream<Adult> map = persons.stream()
    .filter(p -> p.getAge() > 18)
    .map(person -> new Adult(person));
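The same conversion can be shortened one step further with a constructor reference. The sketch below assumes hypothetical Person and Adult classes shaped like the ones the article implies (an Adult wrapping a Person); neither definition appears in the original.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class MapDemo {
    // Hypothetical stand-ins for the article's Person and Adult classes.
    static class Person {
        private final String name;
        private final int age;

        Person(String name, int age) { this.name = name; this.age = age; }
        String getName() { return name; }
        int getAge() { return age; }
    }

    static class Adult {
        private final Person person;

        Adult(Person person) { this.person = person; }
        Person getPerson() { return person; }
    }

    static List<Adult> toAdults(List<Person> persons) {
        // T = Person (input), R = Adult (result); equivalent to person -> new Adult(person)
        Function<Person, Adult> toAdult = Adult::new;
        return persons.stream()
                .filter(p -> p.getAge() > 18)
                .map(toAdult) // Stream<Person> becomes Stream<Adult>
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(new Person("Joe", 17), new Person("John", 30));
        toAdults(persons).forEach(a -> System.out.println(a.getPerson().getName())); // John
    }
}
```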
3.1.3 Count
The count method is a terminal method: it counts the elements the stream produces and returns a long. For example, to count the people over 18:
long countOfAdult = persons.stream()
    .filter(p -> p.getAge() > 18)
    .map(person -> new Adult(person))
    .count();
3.1.4 Collect
The collect method is also a terminal method; it gathers the final results into a container:
List<Adult> adultList = persons.stream()
    .filter(p -> p.getAge() > 18)
    .map(person -> new Adult(person))
    .collect(Collectors.toList());
Or, if we want to use a specific implementation class to collect the results:
List<Adult> adultList = persons.stream()
    .filter(p -> p.getAge() > 18)
    .map(person -> new Adult(person))
    .collect(Collectors.toCollection(ArrayList::new));
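The two terminal methods can be compared side by side in a small runnable sketch. To keep it short it works on a plain list of ages rather than Person objects, and it uses TreeSet as the "specific implementation class" to show that the chosen collection affects the result; both choices are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

class CollectDemo {
    // count() is a terminal method and returns a long, not an int.
    static long countOver18(List<Integer> ages) {
        return ages.stream().filter(a -> a > 18).count();
    }

    // Collectors.toList() leaves the choice of List implementation to the library.
    static List<Integer> over18(List<Integer> ages) {
        return ages.stream().filter(a -> a > 18).collect(Collectors.toList());
    }

    // Collectors.toCollection(...) lets the caller pick the concrete collection.
    static TreeSet<Integer> over18Sorted(List<Integer> ages) {
        return ages.stream().filter(a -> a > 18).collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(30, 17, 18, 25, 30);
        System.out.println(countOver18(ages));  // 3
        System.out.println(over18(ages));       // [30, 25, 30]
        System.out.println(over18Sorted(ages)); // [25, 30]
    }
}
```

The TreeSet result is sorted and free of duplicates, which is why picking the collection type explicitly can matter.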
Space is limited, so the other intermediate and terminal methods are not introduced one by one. After reading the examples above, you only need to understand the difference between the two kinds of methods; you can then choose among them as your needs dictate.
3.2 Sequential Stream and Parallel Stream
Each Stream has two modes: sequential execution and parallel execution.
Sequential stream:
List<Person> people = list.stream().collect(Collectors.toList());
Parallel stream:
List<Person> people = list.stream().parallel().collect(Collectors.toList());
As the names suggest, sequential traversal processes each item before moving on to the next. With parallel traversal, the data is divided into multiple segments, each processed on a different thread, and the results are then combined.
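A small sketch can confirm that switching modes changes how the work is executed but not the collected result. The class and method names are invented for this example; for an ordered source such as a List, collect(Collectors.toList()) preserves the encounter order even in parallel mode.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ParallelModeDemo {
    static List<Integer> squaresSequential(List<Integer> numbers) {
        return numbers.stream().map(n -> n * n).collect(Collectors.toList());
    }

    static List<Integer> squaresParallel(List<Integer> numbers) {
        // parallel() only switches the execution mode of the pipeline.
        return numbers.stream().parallel().map(n -> n * n).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(numbers.stream().isParallel());            // false
        System.out.println(numbers.stream().parallel().isParallel()); // true
        // Same elements in the same order, regardless of mode.
        System.out.println(squaresSequential(numbers).equals(squaresParallel(numbers))); // true
    }
}
```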
3.2.1 Parallel stream principle:
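// Conceptual sketch only; someData and process(...) are placeholders
List<Person> originalList = someData;
int mid = originalList.size() / 2;
List<Person> split1 = originalList.subList(0, mid); // split the data into smaller parts
List<Person> split2 = originalList.subList(mid, originalList.size());
CompletableFuture<List<Person>> part1 = CompletableFuture.supplyAsync(() -> process(split1)); // process each part on its own thread
CompletableFuture<List<Person>> part2 = CompletableFuture.supplyAsync(() -> process(split2));
List<Person> revisedList = new ArrayList<>(part1.join()); // merge the results
revisedList.addAll(part2.join());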
3.2.2 Sequential and parallel performance test comparison
On a multi-core machine, a parallel stream can in theory be more than twice as fast as a sequential one. The following simple test gives a rough comparison:
// requires: import java.util.stream.IntStream;
long t0 = System.nanoTime();
// Create a stream of the integers 0 to 999,999 and keep those divisible by 2; toArray() is the terminal method
int[] a = IntStream.range(0, 1_000_000).filter(p -> p % 2 == 0).toArray();
long t1 = System.nanoTime();
// Same computation, this time with a parallel stream
int[] b = IntStream.range(0, 1_000_000).parallel().filter(p -> p % 2 == 0).toArray();
long t2 = System.nanoTime();
// Result on my machine: serial: 0.06s, parallel 0.02s, so the parallel stream was faster in this run
System.out.printf("serial: %.2fs, parallel %.2fs%n", (t1 - t0) * 1e-9, (t2 - t1) * 1e-9);
3.3 About the Fork/Join framework
Hardware parallelism has been available to applications since Java 7: one of the new features of the java.util.concurrent package is a fork/join-style parallel decomposition framework, which is also powerful and efficient. Interested readers can study it on their own; I won't go into details here. Between it and Stream.parallel(), I prefer the latter.
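For readers who want a starting point, here is a minimal sketch of the fork/join style next to the equivalent parallel stream call. SumTask, its THRESHOLD value, and the method names are all invented for this illustration; it is not taken from the JDK or from the article. In the JDK's implementation, parallel streams run on the common fork/join pool by default, which is why the two approaches are closely related.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;

class ForkJoinSumDemo {
    // Sums numbers[from, to) by recursively splitting the range in half.
    static class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10_000; // arbitrary cut-off chosen for this sketch
        private final long[] numbers;
        private final int from;
        private final int to;

        SumTask(long[] numbers, int from, int to) {
            this.numbers = numbers;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += numbers[i]; // small enough: sum directly
                }
                return sum;
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(numbers, from, mid);
            SumTask right = new SumTask(numbers, mid, to);
            left.fork();                     // schedule the left half asynchronously
            long rightSum = right.compute(); // compute the right half in this thread
            return left.join() + rightSum;   // merge the partial results
        }
    }

    static long forkJoinSum(long[] numbers) {
        return ForkJoinPool.commonPool().invoke(new SumTask(numbers, 0, numbers.length));
    }

    // The same computation expressed with a parallel stream.
    static long parallelStreamSum(long[] numbers) {
        return LongStream.of(numbers).parallel().sum();
    }

    public static void main(String[] args) {
        long[] numbers = LongStream.rangeClosed(1, 1_000_000).toArray();
        System.out.println(forkJoinSum(numbers));       // 500000500000
        System.out.println(parallelStreamSum(numbers)); // 500000500000
    }
}
```

The stream version fits on one line, while the hand-written task has to manage splitting, thresholds, and merging itself, which illustrates the preference stated above.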
4. Summary
Without lambdas, Stream would be awkward to use: it would produce a large number of anonymous inner classes, as in the 3.1.2 Map example above. Without default methods, changing the collection framework would inevitably have forced many changes elsewhere. So lambdas plus default methods make the JDK library more powerful and flexible, and the improvements to Stream and the collection framework are the best proof.