Aggregation
Preface
Aggregation can be regarded as the most difficult part of ES, and its API is also the most anti-human, and everyone understands it. For the support of aggregation, SpringData-ElasticSearch directly gave up, and EE's simplification of ES aggregation is also relatively limited. Nevertheless, it can be regarded as the framework that supports ES aggregation among the open source frameworks on the market. Everyone is using it. When you have time, please spray lightly. There are thousands of permutations and combinations of aggregation methods, and data analysis of flexible tree buckets. We really tried our best. We advise everyone to abandon their illusions and prepare for challenges. This is not MySQL. ES aggregation and MySQL aggregation are two completely different things. The aggregation you see now is definitely not the final form of EE's support for aggregation. We will continue to explore further simplifying aggregation to ease the troubles of users using ES aggregation functions.
# regular aggregation
In MySQL, we can perform group by aggregation by specifying fields, and EE also supports similar aggregation:
@Test
public void testGroupBy() {
LambdaEsQueryWrapper<Document> wrapper = new LambdaEsQueryWrapper<>();
wrapper.likeRight(Document::getContent,"Push");
wrapper.groupBy(Document::getCreator);
// support multi-field aggregation
// wrapper.groupBy(Document::getCreator,Document::getCreator);
SearchResponse response = documentMapper.search(wrapper);
System.out.println(response);
}
2
3
4
5
6
7
8
9
10
提示
Although the syntax is consistent with MP, in fact, the aggregated result of ES is placed in a separate object, and the format is as follows. Therefore, our higher-order syntax needs to use SearchResponse to receive the returned result, which needs to be different from MP and MySQL. .
"aggregations":{"sterms#creator":{"doc_count_error_upper_bound":0,"sum_other_doc_count":0,"buckets":[{"key":"Old Man","doc_count":2},{"key": "Pharaoh","doc_count":1}]}}
# Other aggregates:
// find the minimum value
wrapper.min();
// find the maximum value
wrapper.max();
// find the average
wrapper.avg();
// sum
wrapper.sum();
2
3
4
5
6
7
8
If you need to groupBy first, and then based on the data in the bucket after groupBy aggregation, the maximum value, mean value, etc. are also supported, and the pipeline aggregation will be performed according to the order you specified in the wrapper.
Example:
@Test
public void testAgg() {
// Aggregate according to the creator, and aggregate again according to the number of likes in the bucket after the aggregation
// Note: The specified multiple aggregation parameters are pipeline aggregation, which is the result of the aggregation of the first aggregation parameter, and then aggregated according to the second parameter, corresponding to the Pipeline aggregation
LambdaEsQueryWrapper<Document> wrapper = new LambdaEsQueryWrapper<>();
wrapper.eq(Document::getTitle, "Old man")
.groupBy(Document::getCreator)
.max(Document::getStarNum);
SearchResponse response = documentMapper.search(wrapper);
System.out.println(response);
}
2
3
4
5
6
7
8
9
10
11
In addition, EE also provides a parameter to configure whether to enable pipeline aggregation. The default is to enable. If you want the results of multiple field aggregations to appear in their respective buckets, you can specify the enablePipeline parameter to false.
@Test
public void testAggNotPipeline() {
// For the following two fields, if you don't want to aggregate in the pipeline, and the results of each aggregation are displayed in their respective buckets, we also provide support
LambdaEsQueryWrapper<Document> wrapper = new LambdaEsQueryWrapper<>();
// Specify enable pipeline aggregation as false
wrapper.groupBy(false, Document::getCreator, Document::getTitle);
SearchResponse response = documentMapper.search(wrapper);
System.out.println(response);
}
2
3
4
5
6
7
8
9
# Deduplication aggregation
In order to facilitate user deduplication, we provide a very friendly method for field deduplication, which solves the trouble that users need to write a lot of code for deduplication and paging according to fields. It only takes one line to use ee single field to deduplicate Get it done!
API:
// Deduplication, the input parameter is deduplication
wrapper.distinct(R column);
2
Below I use a piece of code to demonstrate deduplication based on specified fields:
@Test
public void testDistinct() {
// Query all documents with the title of the old man, deduplicate according to the creator, and return in pagination
LambdaEsQueryWrapper<Document> wrapper = new LambdaEsQueryWrapper<>();
wrapper.eq(Document::getTitle, "Old man")
.distinct(Document::getCreator);
PageInfo<Document> pageInfo = documentMapper.pageQuery(wrapper, 1, 10);
System.out.println(pageInfo);
}
2
3
4
5
6
7
8
9
bad taste
For multi-field deduplication support, it is not as simple as the above, because multi-field deduplication cannot be achieved by folding, the data will be placed in the bucket and returned, the analysis of the data in the bucket, which fields are needed, the sorting rules, and the coverage rules. is too flexible, we can't help users shield these through the framework, and there is no framework on the market that can support it. Therefore, we only support the encapsulation of query conditions for the deduplication of multiple fields, and the data parsing part needs to be completed by the user. Please understand. Fortunately, there are not too many scenarios for multi-field deduplication. If users are useful to multi-field deduplication, please refer to the introduction to groupBy at the beginning of this article. GroupBy can be used to deduplicate multiple fields.