Aggregation
preface
Aggregation can be regarded as the most difficult part of ES, and its API is also the most anti-human. SpringData-ElasticSearch gave up the support of aggregation directly, and EE's simplification of its ES aggregation was limited. Nevertheless, it can be regarded as the best framework supporting ES aggregation in the open source framework on the market at present. Please spray lightly when using it. We really tried our best to arrange and combine thousands of aggregation methods and analyze the data of flexible tree barrels. We advise you to abandon your illusions and prepare for the challenge. This is not MySQL. The aggregation of ES and the aggregation of MySQL are completely different things. The aggregation you see now is definitely not the final form of EE's support for aggregation, and we will continue to explore further simplifying aggregation to alleviate the troubles of users using ES aggregation function.
# General aggregation
In MySQL, we can perform group by aggregation by specifying fields, and EE also supports similar aggregation:
@Test
public void testGroupBy() {
LambdaEsQueryWrapper<Document> wrapper = new LambdaEsQueryWrapper<>();
Wrapper.likeright (document:: getcontent, "push");
wrapper.groupBy(Document::getCreator);
//Support multi-field aggregation
// wrapper.groupBy(Document::getCreator,Document::getCreator);
SearchResponse response = documentMapper.search(wrapper);
System.out.println(response);
}
2
3
4
5
6
7
8
9
10
提示
Although the syntax is consistent with MP, in fact, the aggregation results of ES are placed in a separate object, and the format is as follows. Therefore, our high-level syntax needs to use SearchResponse to receive the returned results, which needs to be different from MP and MySQL.
"aggregations": {"solids # creator": {"doc _ count _ error _ upper _ bound": 0, "sum _ other _ doc _ count": 0, "buckets": [{"key": "old man", "doc _ count": 2.
# Other aggregations:
//Find the minimum value
wrapper.min();
//Find the maximum value
wrapper.max();
//average
wrapper.avg();
//Sum
wrapper.sum();
2
3
4
5
6
7
8
If it is necessary to groupBy first, and then calculate the maximum value and average value according to the data in the bucket after groupBy aggregation, it is also supported, and pipeline aggregation will be performed according to the order you specified in the wrapper.
Example:
@Test
public void testAgg() {
//Aggregate according to the creator, and aggregate again according to the number of likes in the bucket after aggregation.
//Note: The specified aggregation parameters are Pipeline aggregation, which is the result after the first aggregation parameter is aggregated, and then aggregated according to the second parameter, corresponding to pipeline aggregation.
LambdaEsQueryWrapper<Document> wrapper = new LambdaEsQueryWrapper<>();
Wrapper.eq(Document::getTitle, "old man")
.groupBy(Document::getCreator)
.max(Document::getStarNum);
SearchResponse response = documentMapper.search(wrapper);
System.out.println(response);
}
2
3
4
5
6
7
8
9
10
11
Set the aggregation bucket size and sorting within the bucket
@Test
@Order(6)
public void testConditionGroupBy() {
LambdaEsQueryWrapper<Document> wrapper = new LambdaEsQueryWrapper<>();
wrapper.match(Document::getContent, "Test")
.size(20) //Set the aggregation bucket size
.bucketOrder(BucketOrder.count(Boolean.TRUE)) //Set bucket sorting rules
.groupBy(Document::getStarNum);
SearchResponse response = documentMapper.search(wrapper);
System.out.println(response);
}
2
3
4
5
6
7
8
9
10
11
In addition, EE also provides a parameter that can be configured whether to open pipeline aggregation. By default, it is on. If you want the results of multiple field aggregation to appear in their own buckets, you can specify the enablePipeline parameter as false.
@Test
public void testAggNotPipeline() {
//For the following two fields, if you don't want to aggregate by pipeline, and the results of their aggregation are displayed in their own buckets, we also provide support.
LambdaEsQueryWrapper<Document> wrapper = new LambdaEsQueryWrapper<>();
//Specifies that enabling pipe aggregation is false.
wrapper.groupBy(false, Document::getCreator, Document::getTitle);
SearchResponse response = documentMapper.search(wrapper);
System.out.println(response);
}
2
3
4
5
6
7
8
9
# Depolymerization
in order to facilitate users to duplicate, we provide a very friendly way to duplicate fields, which solves the trouble that users have to write a lot of code to duplicate according to fields and pagination. It only takes one line to duplicate using ee single fields!
API:
//De-duplication means de-duplication.
wrapper.distinct(R column);
2
Below I use a piece of code to demonstrate the deduplication according to the specified field:
@Test
public void testDistinct() {
//Query all documents titled Old Man, remove duplicates according to the creator, and return them in pages.
LambdaEsQueryWrapper<Document> wrapper = new LambdaEsQueryWrapper<>();
Wrapper.eq(Document::getTitle, "old man")
.distinct(Document::getCreator);
PageInfo<Document> pageInfo = documentMapper.pageQuery(wrapper, 1, 10);
System.out.println(pageInfo);
}
2
3
4
5
6
7
8
9
bad taste
For the support of multi-field deduplication, it is not as simple as the above, because multi-field deduplication cannot be realized by folding, and the data will be put into the bucket and returned. The analysis of data in the bucket, which fields are needed, sorting rules, and coverage rules are too flexible, and we can't help users shield these through the framework. At present, there is no framework in the market to support it. Therefore, we only support the encapsulation of query conditions for multi-field deduplication. The data analysis part needs to be completed by users themselves, so please forgive me. Fortunately, there are not many scenes of multi-field deduplication. If users use multi-field deduplication, please refer to the introduction of groupBy at the beginning of this article, and multi-field deduplication can be realized through groupBy.