Data synchronization solutions
# Background
Many friends in the community Q&A group have asked me how to synchronize data from MySQL to ES. Answering this well requires considering the specific business scenario, the data volume, and the number of developers available. Different situations call for different plans; like deciding how much salt to add when cooking, there is no single direct answer. I can only offer some feasible options, and you can weigh them against your own company's situation.
# Full synchronization
As the name implies, full synchronization copies all the data in the existing database to ES. It is usually needed when ES is introduced for the first time and all existing data has to be initialized. For full synchronization, consider Logstash, the official tool from the ES stack, or DataX, Alibaba's open source project. Of course, if you have fewer than 10 million records, the records themselves are relatively small, and you want to keep things simple and save trouble, inserting in batches with the insertBatch method provided by Easy-Es is not a bad choice.
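The batch-insert approach can be sketched as follows. This is only a minimal illustration: `BatchMapper` is a hypothetical stand-in interface that mirrors the insertBatch signature, `FullSync` and `syncAll` are made-up names, and a real job would read the rows from MySQL page by page rather than hold them all in one list.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical stand-in for a mapper; only the insertBatch signature comes from the document. */
interface BatchMapper<T> {
    Integer insertBatch(Collection<T> entityList);
}

final class FullSync {
    /** Sends all rows to the mapper in chunks of batchSize and returns the total reported count. */
    static <T> int syncAll(List<T> rows, int batchSize, BatchMapper<T> mapper) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int total = 0;
        for (int from = 0; from < rows.size(); from += batchSize) {
            int to = Math.min(from + batchSize, rows.size());
            // subList returns a view, so copy it before handing it to the mapper
            Integer inserted = mapper.insertBatch(new ArrayList<>(rows.subList(from, to)));
            total += (inserted == null ? 0 : inserted);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a", "b", "c", "d", "e");
        List<Integer> batchSizes = new ArrayList<>();
        // The lambda plays the role of the mapper and just records each batch size
        int total = syncAll(rows, 2, batch -> {
            batchSizes.add(batch.size());
            return batch.size();
        });
        System.out.println(total + " rows sent in batches of sizes " + batchSizes);
    }
}
```

Chunking keeps each bulk request small; a suitable batch size depends on document size and cluster capacity and is best found by testing.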
# Incremental synchronization
Once full synchronization is complete and users start using ES, the data keeps changing through CRUD operations, and those changes must be synchronized to ES to keep the data in ES up to date. This is called incremental synchronization. A mature solution is Canal, Alibaba's open source project, but you can also use an MQ or even local message events (for example, Guava's EventBus or Spring Boot's built-in events). Choose a suitable solution according to your business volume, whether you deploy in a distributed environment, and so on. These solutions are similar in principle: a message is published whenever MySQL data changes (insert/delete/update), and a subscriber consumes the event and applies the change to ES, which decouples the two sides. If your concurrency is not too high and the data does not change much, the batch CRUD methods provided by Easy-Es are also acceptable. You can query the changed data through MP (MyBatis-Plus) and then call the API below to apply the changes. The API below works the same way as in MP, so it is not described again here.
Integer updateBatchByIds(Collection<T> entityList);
Integer deleteBatchIds(Collection<? extends Serializable> idList);
Integer insertBatch(Collection<T> entityList);
Integer delete(Wrapper<T> wrapper);
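The publish/subscribe flow described above can be sketched with a tiny in-process bus. Everything here is an illustrative assumption: `ChangeEvent`, `ChangeBus`, and `IndexSyncer` are hypothetical names, the bus stands in for Canal, an MQ, or a local event mechanism, and a plain map stands in for the ES index so the example runs without any external service.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Kind of change made to a MySQL row. */
enum ChangeType { INSERT, UPDATE, DELETE }

/** Hypothetical change message: the row id plus the new document body (null for DELETE). */
final class ChangeEvent {
    final ChangeType type;
    final long id;
    final String doc;

    ChangeEvent(ChangeType type, long id, String doc) {
        this.type = type;
        this.id = id;
        this.doc = doc;
    }
}

/** Minimal synchronous bus; a stand-in for whichever messaging mechanism is chosen. */
final class ChangeBus {
    private final List<Consumer<ChangeEvent>> subscribers = new ArrayList<>();

    void subscribe(Consumer<ChangeEvent> subscriber) {
        subscribers.add(subscriber);
    }

    void publish(ChangeEvent event) {
        for (Consumer<ChangeEvent> subscriber : subscribers) {
            subscriber.accept(event);
        }
    }
}

/** Subscriber that applies each event to an in-memory map standing in for the ES index. */
final class IndexSyncer implements Consumer<ChangeEvent> {
    final Map<Long, String> index = new HashMap<>();

    @Override
    public void accept(ChangeEvent event) {
        switch (event.type) {
            case INSERT:
            case UPDATE:
                index.put(event.id, event.doc); // insert and update both write the latest document
                break;
            case DELETE:
                index.remove(event.id);
                break;
        }
    }
}

final class IncrementalSyncDemo {
    public static void main(String[] args) {
        ChangeBus bus = new ChangeBus();
        IndexSyncer syncer = new IndexSyncer();
        bus.subscribe(syncer);

        // The code that writes to MySQL would publish one event per change
        bus.publish(new ChangeEvent(ChangeType.INSERT, 1L, "first version"));
        bus.publish(new ChangeEvent(ChangeType.UPDATE, 1L, "second version"));
        bus.publish(new ChangeEvent(ChangeType.DELETE, 1L, null));
        System.out.println("documents left in index: " + syncer.index.size());
    }
}
```

In a real subscriber, the map operations would be replaced by calls such as the batch methods listed above, and the publisher only needs to know about the bus, which is where the decoupling comes from.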
Tip
Before the formal synchronization, please plan and design your index structure, shards, replicas, and so on, because an ES index has to be rebuilt once these change. Migrating data is time-consuming and labor-intensive, so don't rush it, or you may find yourself back at square one overnight!
In addition, the schemes above are only for reference. If you find a better solution, you are welcome to try it. I won't walk through the implementation of each scheme here; they are all mature open source components, and if you get stuck you can ask Baidu or ChatGPT.