Index hosting model
Preface
ES is difficult to use, indexes bear the brunt, index creation and update is not only complex, but also difficult to maintain, once the index has changed, you must face the index reconstruction brought about by service downtime and data loss and other problems ... Although the ES official index aliasing mechanism to solve the problem, but the threshold is still very high, the steps are cumbersome, in the production environment by manual operation is very easy to make mistakes to bring serious problems. In order to solve these pain points, Easy-Es provides a variety of strategies to completely free the user from the maintenance of the index, we provide a variety of index processing strategies to meet the individual needs of different users. Through the first experience of indexing, I believe you can also appreciate the maturity and ease of use of EE. The fully automatic smoothing mode, for the first time the world's leading "brother you do not have to move, EE I fully automatic" mode, index creation, update, data migration and all the full life cycle without user intervention, fully automated by EE to complete the process, zero downtime, even the index type can be automatically deduced intelligently, a one-stop service, to your satisfaction. The world's first open source, fully borrowed from the JVM garbage collection algorithm ideas, unprecedented, although the Internet has a smooth transition program, but not fully automated, the process still relies on manual intervention, I advocate for EE, please rest assured that the index hosted to EE, the index will only be completely migrated successfully to delete the old index, otherwise there will be no impact on the original index and data, any accidents can retain the original index and data, so the safety factor is very high. Data, so the safety factor is very high.
Warning: Newbies can try to choose the manual one-click mode when going on the road. To pursue the stability of the production environment, it is recommended that you use the manual mode. Our manual mode also provides a very friendly one-click creation function, which is also sweet to use. of. The automatic transmission mode is not recommended for production. Taking autonomous driving as an example, the current automatic transmission can probably reach the L2+ level of assistance. Because most developers cannot weigh the migration time well, the maximum number of retries and the retry interval configuration are not ideal. The migration failed, so it is not recommended to use the automatic mode in the production environment. This mode is only used as a trial version for the development environment to reduce the burden on developers. Otherwise, this framework will not be responsible for the negative impact of index migration on the production environment data and machine load. .
# mode one: automatic hosting of smooth mode (automatic gears - snow mode) default open this mode
In this mode, the whole life cycle of index creation and data migration can be done without any operation by the user, with zero downtime and no perception by the user, enabling a smooth transition in the production environment, similar to the automatic gearbox-snow mode of a car, smooth and comfortable, completely freeing the user to enjoy the fun of automatic driving! It is worth noting that in auto-hosting mode, the system will automatically generate an index called ee-distribute-lock, which is used internally by the framework and can be ignored by the user, and if unfortunately a deadlock occurs in a very small probability due to other factors such as power failure, the index can be deleted. In addition, when using the index changes, the original index name may be appended with the suffix _s0 or _s1, do not panic, this is a fully automatic smooth migration of zero downtime must go through, the index suffix does not affect the use of the framework will automatically activate the new index. About _s0 and _s1 suffix, in this mode can not be avoided, because to retain the original index data migration, and can not exist at the same time two indexes of the same name, where there is a price to pay, if you do not recognize this way of processing, you can continue to look down, there is always a suitable for you.
The core processing flow is sorted out as shown in the following diagram, may be combined with the source code to see, easier to understand:
! smooth mode.png (opens new window)
# Mode 2: Auto-hosted non-smooth mode (auto-motion mode)
In this mode, index creation and update is done fully automatically and asynchronously by EE, but does not deal with data migration work, very fast similar to the auto-motion mode of the car, simple and brutal, ejection start! Suitable for use in development and testing environments, of course, if you use other tools such as logstash to synchronize data, you can also turn on this mode in the production environment, in this mode will not appear _s0 and _s1 suffix, the index will remain the original name.
! non-smooth-mode.png (opens new window)
提示
The above two automatic mode, the index information mainly relies on the entity class, if the user does not have any configuration of the entity class, EE can still be based on the field type of intelligent inference of the field in the ES storage type, this can further reduce the burden of developers, the new to ES is the gospel of the white man.
Of course, automatic inference by the framework is not enough, we still recommend that you try to use the detailed configuration so that the framework can automatically create a production-level index. For example, for example, String type field, the framework can not infer that your actual query on the field is an exact query or a split-word query, so it can not infer the field in the end with the keyword type or text type, if the text type, the user expects the splitter is what? All of this requires the user to tell the framework through the configuration, otherwise the framework can only be created by default, and then will not be able to do your expectations.
automatically infer the type priority < the type priority specified by the user through the annotation
Automatically infer the mapping table:
JAVA | ES |
---|---|
byte | byte |
short | short |
int | integer |
long | long |
float | float |
double | double |
BigDecimal | keyword |
char | keyword |
String | keyword |
boolean | boolean |
Date | date |
LocalDate | date |
LocalDateTime | date |
List | text |
... | ... |
Best practice example in "automatic" mode:
@Data
@IndexName(shardsNum = 3,replicasNum = 2) // can specify the number of shards, replicas, if the default is the default is 1
public class Document {
/*
* If you want to customize the id in es to the id you provide, such as the id in MySQL, please specify the type in the annotation as customize, so that the id will support any data type)
*/
@IndexId(type = IdType.CUSTOMIZE)
private Long id;
/* ***
* Document title, not specified type is created as keyword_text type by default, can be queried precisely
*/private
private String title;
private String title; /**
* The content of the document, specifying the type and store/query separator
*/
@HighLight(mappingField="highlightContent")
@IndexField(fieldType = FieldType.TEXT, analyzer = Analyzer.IK_SMART, searchAnalyzer = Analyzer.IK_MAX_WORD)
private String content;
/**
* The author adds @TableField annotation, and specifies strategy = FieldStrategy.NOT_EMPTY to indicate that the update strategy is to update when the creator is not an empty string
*/
@IndexField(strategy = FieldStrategy.NOT_EMPTY)
private String creator;
/**
* Creation time
*/
@IndexField(fieldType = FieldType.DATE, dateFormat = "yyyy-MM-dd HH:mm:ss")
private Date gmtCreate;
/**
* A field that does not exist in es, but is added to the model, so that it does not map to es, you can add the annotation @TableField to this type of field, and specify exist=false
*/
@IndexField(exist = false)
private String notExistsField;
/**
* Geographical location latitude and longitude coordinates e.g.: "40.13933715136454,116.63441990026217"
*/
@IndexField(fieldType = FieldType.GEO_POINT)
private String location;
/**
* Graphs (e.g., circle, rectangle)
*/
@IndexField(fieldType = FieldType.GEO_SHAPE)
private String geoLocation;
/**
* Custom field name
*/
@IndexField(value = "wu-la")
private String customField;
/**
* Highlight the field whose return value is mapped
*/
private String highlightContent;
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
# # mode three: manual mode (manual gear)
In this mode, all the maintenance of the index EE framework are not involved, the user to deal with, EE provides out-of-the-box index CRUD-related API, you can choose to use the API to manually maintain the index, due to a high degree of perfection of the API, although it is a manual block, but the use is still simple to burst, a line of code to deal with index creation. Of course you can also maintain the index through es-head and other tools, in short, in this mode, you have a higher degree of freedom, more suitable for those who question the EE framework of conservative users or the pursuit of ultimate flexibility of the user to use, similar to the car's manual gear
In manual mode, EE provides the following API for users to easily call.
- indexName needs to be manually specified by the user
- Object Wrapper is a conditional constructor
// Get index information
GetIndexResponse getIndex();
// Get the specified index information
GetIndexResponse getIndex(String indexName);
// If or not the index exists
Boolean existsIndex(String indexName);
// Create an index based on entities and custom annotations with one key
Boolean createIndex();
// Create an index
Boolean createIndex(LambdaEsIndexWrapper<T> wrapper);
// Update an index
Boolean updateIndex(LambdaEsIndexWrapper<T> wrapper);
// Delete the specified index
Boolean deleteIndex(String indexName);
2
3
4
5
6
7
8
9
10
11
12
13
14
! ManualMode.png (opens new window)
The above API, we only demonstrate the creation of indexes, the other is too simple to be repeated here, if necessary, you can move to the source code test module to see. Create index manually through the API, we provide two ways
-One way: according to the entity class and custom annotations to create a key (recommended), 99.9% of scenarios applicable
/**
* Entity class information
**/
@Data
@IndexName(shardsNum = 3, replicasNum = 2, keepGlobalPrefix = true)
public class Document {
/**
* If you want to customize the id in es to the id you provide, such as the id in MySQL, please specify the type in the annotation as customize or directly in the global configuration file, so that the id will support any data type)
*/
@IndexId(type = IdType.CUSTOMIZE)
private String id;
/**
* Document title, not specified type is created as keyword type by default, can be queried precisely
*/private
private String title;
private String title; /**
* The content of the document, specifying the type and store/query separator
*/
@HighLight(mappingField = "highlightContent")
@IndexField(fieldType = FieldType.TEXT, analyzer = Analyzer.IK_SMART, searchAnalyzer = Analyzer.IK_MAX_WORD)
private String content;
// omit other fields...
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
@Test
public void testCreateIndexByEntity() {
// Then create it directly with one click through the mapper of the entity class, very foolproof
documentMapper.createIndex();
}
```''
:::tip
The use of annotations in the entity class can be found in the annotations section, which is overall rather silly and highly similar to the use of annotations in MP.
:::
-Way II: created by api, each field to be indexed needs to be processed, more tedious, but the best flexibility, support all es can support the creation of all indexes, for 0.01% scenario (not recommended)
```java
@Test
public void testCreatIndex() {
LambdaEsIndexWrapper<Document> wrapper = new LambdaEsIndexWrapper<>();
// For simplicity's sake, keep the index name the same as the entity class name, in lowercase.
wrapper.indexName(Document.class.getSimpleName().toLowerCase());
// here the title of the article mapped to keyword type (does not support word separation), the document content mapped to text type, can be defaulted
// support for split-word query, content splitter can be specified, query splitter can also be specified,, can be default or specify only one of them, not specified is the ES default splitter (standard)
wrapper.mapping(Document::getTitle, FieldType.KEYWORD)
.mapping(Document::getContent, FieldType.TEXT,Analyzer.IK_MAX_WORD,Analyzer.IK_MAX_WORD);
// If the above simple mapping does not meet your business needs, you can customize the mapping
// wrapper.mapping(Map);
// set the shard and replica information, 3 shards, 2 replicas, can be defaulted
wrapper.settings(3,2);
// If the above simple settings do not meet your business needs, you can customize the settings
// wrapper.settings(Settings);
// set alias information, can be defaulted
String aliasName = "daily";
wrapper.createAlias(aliasName);
// Create the index
boolean isOk = documentMapper.createIndex(wrapper);
Assert.assertTrue(isOk);
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
Warm tips
In the entity class, the id field does not need to create an index, otherwise it will report an error.
Because of the automatic reconstruction feature of ES index changes, this interface is designed to create the index required mapping, settings, alias information three in one, although each of these configurations can be default, but we still recommend that you create the index before the planning of the above information, you can avoid subsequent changes to bring unnecessary trouble, if there are subsequent changes, you can still through the Alias migration (recommended, smooth transition), or delete the original index to re-create the way to modify.
# Configuration enable mode
For the above three modes, you only need to add a single line to your project's configuration file application.properties or application.yml:
easy-es:
socketTimeout: 600000 # request communication timeout time unit: ms default value 600000ms in smooth mode, due to the migration of data, the user can adjust the size of this parameter according to the size of the data, otherwise the request is easy to timeout leading to index hosting failure, you are advised to try to give a large not small, the same as that thing, big point is fine, too small you know!
global-config:
process_index_mode: smoothly #smoothly: smoothly mode, not_smoothly: non-smoothly mode, manual: manual mode
async-process-index-blocking: true # Whether asynchronous index processing blocks the main thread or not.
distributed: false # Whether the project is deployed in a distributed environment, the default is true, if it is a single machine, you can fill in false, it will not add distributed locks, more efficient.
reindexTimeOutHours: 72 # rebuild index timeout time in hours, default 72H according to the size of the migrated index data flexibility to specify
2
3
4
5
6
7
If this line is defaulted, smooth mode is enabled by default.
提示
- In auto mode, if the index is hosted successfully, the console will print Congratulations auto process index by Easy-Es is done !
- Unfortunately, auto process index by Easy-Es failed... and an exception log, you can check it according to the exception log information.
- If the index hosting failed, at this time the user called insert-related API insert data, because the index does not exist, es (non-framework) will automatically create a default index for the user to create the index field type keyword_text type, not specified by the user through the annotation, so the situation does not ask me why it did not take effect, because the index hosting because of your configuration or environment The original problem failed.
- When running the test module is strongly recommended to turn on asynchronous processing index blocking the main thread, otherwise the test case is run, the main thread exit, but the asynchronous thread may not be finished running, there may be a deadlock, if unfortunately there is a deadlock, remove the ee-distribute-lock can be.
- Production environment or migration data volume is relatively large, you can configure to turn on non-blocking, so that the service starts faster.
- The above three modes, users can choose according to the actual needs of flexible, free to experience, in the process of using any comments or suggestions to feedback to us, we will continue to optimize and improve,
- EE in the index hosting using the strategy factory design pattern , the future if there are more better patterns , you can easily complete the expansion on the basis of the original code does not change , in line with the principle of open and closed , but also welcome all open source enthusiasts to contribute more patterns PR!
- We will continue to uphold the complexity of leaving the framework, the ease of use to the user of this idea, to forge ahead.
All the above API code demo can be referred to the source code test module -> test directory -> index package under the code