Auto process index
Preface
ES is difficult to use, and the index bears the brunt. The creation of the index is not only complicated, but also difficult to maintain. Once the index changes, it must face the problems of service downtime and data loss caused by index reconstruction... Although ES officially provides an index alias mechanism to Solving problems, but the threshold is still high, the steps are cumbersome, and manual operations are very prone to serious problems in the production environment. In order to solve these pain points, Easy-Es provides a variety of strategies to completely remove users from index maintenance. We provide a variety of index processing strategies to meet the individual needs of different users. Through the initial experience of the index, I believe you can also have a deeper understanding of the maturity and ease of use of EE. Among them, the automatic smoothing mode , For the first time, the world's leading "You don't need to move, EE is fully automatic" mode, the creation, update, data migration and other full life cycles of indexes do not require user intervention, and are fully automated by EE, with zero downtime in the process and even indexing. All types can be intelligently and automatically inferred, one-stop service, and you are satisfied. It is the world's first open source initiative, and it fully draws on the idea of JVM garbage collection algorithm. I speak for EE, please rest assured to host the index to EE, the old index will be deleted only after the complete migration is successful, otherwise it will not affect the original index and data, and the original index and data can be retained in any accident, so The safety factor is very high.
Reminder: Beginners can choose automatic transmission mode as much as possible. For old drivers, please feel free to use automatic transmission and manual transmission. To pursue the stability of the production environment, it is recommended that you use manual transmission mode. Our manual transmission also provides a very friendly one-click creation function. It is also sweet to use. In automatic mode, it is recommended that you fully understand its operation principle and source code before starting production, otherwise many Xiaobai will produce without brains without understanding the principle and how to configure it correctly, which will easily be pitted by yourself. Come on, figure it out please feel free, our smooth mode is actually pretty safe.
# Mode 1: The smooth mode of automatic hosting (automatic block-snow mode) This mode is enabled by default
In this mode, users can complete the whole life cycle of index creation, update, data migration, etc. without any operation. The process has zero downtime and no user perception. It can achieve a smooth transition in the production environment, similar to the automatic transmission of a car - snow Ground mode, stable and comfortable, completely liberate users, and enjoy the fun of automatic stance! It is worth noting that in the automatic hosting mode, the system will automatically generate an index named ee-distribute-lock, which is used internally by the framework and can be ignored by the user. If a deadlock occurs under the probability, you can delete the index. In addition, if the index is changed during use, the original index name may be suffixed with _s0 or _s1. Don’t panic, this is a fully automatic smooth migration with zero downtime The only way to go, the index suffix does not affect the use, the framework will automatically activate the new index. Regarding the _s0 and _s1 suffixes, it is unavoidable in this mode, because the original index data migration must be preserved, and two indexes with the same name cannot exist at the same time. If you don't agree with this method, you can continue to look down, there is always one suitable for you.
# Mode 2: Non-smooth mode of automatic management (automatic gear-sport mode)
In this mode, the index creation and update are automatically and asynchronously completed by EE, but data migration is not handled. The speed is extremely fast, similar to the automatic transmission-sport mode of a car. Of course, if you use other tools such as logstash to synchronize data, you can also enable this mode in the production environment. In this mode, the _s0 and _s1 suffixes will not appear, and the index will keep the original name.
Tips
In the above two automatic modes, the index information mainly relies on the entity class. If the user does not configure any entity class, EE can still intelligently infer the storage type of the field in ES according to the field type, which can further reduce the development cost. It is a burden for the users, and it is even more good news for Xiaobai who has just come into contact with ES.
Of course, it is not enough for the framework to automatically infer. We still recommend that you configure as much detail as possible so that the framework can automatically create production-level indexes. For example, for fields of type String, the framework cannot infer your actual In the query, whether the field is a precise query or a word segmentation query, it cannot infer whether the field is of a keyword type or a text type. If it is a text type, what is the tokenizer expected by the user? These all require the user to tell the framework through configuration, otherwise Frames can only be created by default, which won't do as well as you expect then.
The priority of the automatically inferred type < the type priority specified by the user through the annotation
Automatically infer the mapping table:
JAVA | ES |
---|---|
byte | byte |
short | short |
int | integer |
long | long |
float | float |
double | double |
BigDecimal | keyword |
char | keyword |
String | keyword |
boolean | boolean |
Date | date |
LocalDate | date |
LocalDateTime | date |
List | text |
... | ... |
Best practice example in "Auto" mode:
@Data
@IndexName(shardsNum = 3, replicasNum = 2) // The number of shards and replicas can be specified, if the default is 1
public class Document {
/**
* The unique id in es, if you want to customize the id provided by the id in es, such as the id in MySQL, please specify the type in the annotation as customize, so the id supports any data type)
*/
@IndexId(type = IdType.CUSTOMIZE)
private Long id;
/**
* Document title, if no type is specified, it will be created as keyword type by default, which can be queried accurately
*/
private String title;
/**
* Document content, specifying type and storage/query tokenizer
*/
@HighLight(mappingField="highlightContent")
@IndexField(fieldType = FieldType.TEXT, analyzer = Analyzer.IK_SMART, searchAnalyzer = Analyzer.IK_MAX_WORD)
private String content;
/**
* The author adds the @TableField annotation and indicates that strategy = FieldStrategy.NOT_EMPTY means that the strategy when updating is updated when the creator is not an empty string
*/
@IndexField(strategy = FieldStrategy.NOT_EMPTY)
private String creator;
/**
* Creation time
*/
@IndexField(fieldType = FieldType.DATE, dateFormat = "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis")
private String gmtCreate;
/**
* Fields that do not actually exist in es, but are added to the model. In order not to map with es, you can add the annotation @TableField to this type of field, and specify exist=false
*/
@IndexField(exist = false)
private String notExistsField;
/**
* Geographic location latitude and longitude coordinates For example: "40.13933715136454,116.63441990026217"
*/
@IndexField(fieldType = FieldType.GEO_POINT)
private String location;
/**
* Graphics (eg circle center, rectangle)
*/
@IndexField(fieldType = FieldType.GEO_SHAPE)
private String geoLocation;
/**
* custom field name
*/
@IndexField(value = "wu-la")
private String customField;
/**
* Highlight the field whose return value is mapped
*/
private String highlightContent;
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
# Mode 3: Manual mode (manual transmission)
In this mode, all the maintenance work of the index is not involved in the EE framework and is handled by the user. EE provides an out-of-the-box index CRUD related API, you can choose to use this API to manually maintain the index, because the API is highly perfect, although It is a manual gear, but it is still extremely simple to use, and one line of code can complete the index creation. Of course, you can also maintain the index through tools such as es-head. In short, in this mode, you have a higher degree of freedom, which is more suitable for those who doubt Conservative users of the EE framework or users who pursue extreme flexibility use it. It is similar to the manual transmission of a car. It is not recommended for beginners to use this mode.
In manual transmission mode, EE provides the following APIs for users to make convenient calls:
- indexName needs to be specified manually by the user
- Object Wrapper as Conditional Constructor
// get index information
GetIndexResponse getIndex();
// Get the specified index information
GetIndexResponse getIndex(String indexName);
// is there an index
Boolean existsIndex(String indexName);
// Create indexes with one key based on entities and custom annotations
Boolean createIndex();
// create index
Boolean createIndex(LambdaEsIndexWrapper<T> wrapper);
// update index
Boolean updateIndex(LambdaEsIndexWrapper<T> wrapper);
// delete the specified index
Boolean deleteIndex(String indexName);
2
3
4
5
6
7
8
9
10
11
12
13
14
For the above APIs, we only demonstrate the creation of indexes. Others are too simple to be described here. If necessary, you can move to the source code test module to view. Manually create indexes through API, we provide two ways
-Method 1: One-click creation based on entity classes and custom annotations (recommended), applicable to 99.9% of scenarios
/**
* Entity class information
**/
@Data
@IndexName(shardsNum = 3, replicasNum = 2, keepGlobalPrefix = true)
public class Document {
/**
* The unique id in es, if you want to customize the id provided by the id in es, such as the id in MySQL, please specify the type in the annotation as customize or specify it directly in the global configuration file, so the id supports any data type)
*/
@IndexId(type = IdType.CUSTOMIZE)
private String id;
/**
* Document title, if no type is specified, it will be created as keyword type by default, which can be queried accurately
*/
private String title;
/**
* Document content, specifying type and storage/query tokenizer
*/
@HighLight(mappingField = "highlightContent")
@IndexField(fieldType = FieldType.TEXT, analyzer = Analyzer.IK_SMART, searchAnalyzer = Analyzer.IK_MAX_WORD)
private String content;
// Omit other fields...
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
@Test
public void testCreateIndexByEntity() {
// Then create it directly with one click through the mapper of the entity class, which is very foolish
documentMapper.createIndex();
}
2
3
4
5
Tips
For the usage of annotations in entity classes, please refer to the Annotation chapter, which is relatively fool-like as a whole, and is highly similar to the usage of annotations in MP.
- Method 2: Created through API, each field that needs to be indexed needs to be processed, which is cumbersome, but has the best flexibility, supports all index creation that all ES can support, and is used in 0.01% scenarios (not recommended)
@Test
public void testCreatIndex() {
LambdaEsIndexWrapper<Document> wrapper = new LambdaEsIndexWrapper<>();
// For the sake of simplicity here, the index name must be consistent with the entity class name, with lowercase letters. Later chapters will teach you how to configure and use indexes more flexibly
wrapper.indexName(Document.class.getSimpleName().toLowerCase());
// Here, the article title is mapped to the keyword type (word segmentation is not supported), and the document content is mapped to the text type, which can be defaulted to
// Support word segmentation query, content tokenizer can be specified, query tokenizer can also be specified, either default or only one of them can be specified, if not specified, ES default tokenizer (standard)
wrapper.mapping(Document::getTitle, FieldType.KEYWORD)
.mapping(Document::getContent, FieldType.TEXT,Analyzer.IK_MAX_WORD,Analyzer.IK_MAX_WORD);
// If the above simple mapping cannot meet your business needs, you can customize the mapping
// wrapper.mapping(Map);
// Set shard and replica information, 3 shards, 2 replicas, default
wrapper.settings(3,2);
// If the above simple settings cannot meet your business needs, you can customize the settings
// wrapper.settings(Settings);
// Set alias information, can be default
String aliasName = "daily";
wrapper.createAlias(aliasName);
// create index
boolean isOk = documentMapper.createIndex(wrapper);
Assert.assertTrue(isOk);
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
Tips
In the entity class, the id field does not need to create an index, otherwise an error will be reported.
Due to the feature of automatic reconstruction of ES index changes, the mapping, settings, and alias information required for index creation are combined into one when this interface is designed. Although each of these configurations can be defaulted, we still recommend that you create an index. Planning the above information in advance can avoid unnecessary troubles caused by subsequent modifications. If there are subsequent modifications, you can still modify them by alias migration (recommended, smooth transition), or by deleting the original index and re-creating it. .
# configure enable mode
To configure the above three modes, you only need to add a line of configuration to your project's configuration file application.properties or application.yml:
easy-es:
global-config:
process_index_mode: smoothly #smoothly: smooth mode, not_smoothly: non-smooth mode, manual: manual mode
async-process-index-blocking: true # Whether the asynchronous processing index blocks the main thread by default blocking
distributed: false # Whether the project is deployed in a distributed environment, the default is true, if it is running on a single machine, you can fill in false, and no distributed lock will be added, which is more efficient.
2
3
4
5
If this line is configured by default, smooth mode is enabled by default.
Tips
- When running the test module, it is strongly recommended opening the asynchronous processing index to block the main thread. Otherwise, after the test case runs, the main thread exits, but the asynchronous thread may not finish running, and a deadlock may occur. If a deadlock occurs unfortunately, delete ee-distribute- just lock.
- In the production environment or when the amount of data to be migrated is relatively large, non-blocking can be configured to enable the service to start faster.
- For the above three modes, users can flexibly choose and experience freely according to actual needs. If you have any comments or suggestions during the use process, you can feedback to us, we will continue to optimize and improve,
- EE adopts the strategy + factory design mode in index hosting. If there are more and better modes in the future, the expansion can be easily completed without changing the original code, which conforms to the principle of opening and closing. Open source enthusiasts are welcome to contribute more Mode PR!
- We will continue to uphold the concept of leaving complexity to the framework and leaving ease of use to users, and forge ahead.
All the above API corresponding code demonstrations can refer to the code under the source code test module -> test directory -> index package