TiDB 4.0 introduces TiCDC as TiDB’s change data capture framework. It’s an open-source feature that replicates TiDB’s incremental changes to downstream platforms by subscribing to change logs in TiKV (TiDB’s storage engine). It can restore data to a consistent state with any upstream timestamp oracle (TSO) and provides the TiCDC Open Protocol to support other data consumers that subscribe to TiKV’s data changes.
With high data reliability and horizontal scalability, TiCDC provides high-availability replication services for 100 TB clusters with only milliseconds of latency. In TiDB 4.0.6, TiCDC reaches general availability (GA), and you can use it in your production environment.
In this post, we’ll walk you through TiCDC’s features, application scenarios, and real-world case studies.
TiCDC supports these features:
Data high availability
TiCDC captures change logs from TiKV, which is highly available. This ensures high availability of data. Even if TiCDC shuts down, it can resume capturing data normally when you restart it.
Multiple TiCDC nodes can form a cluster. You can evenly schedule replication tasks across the nodes. When you have a large amount of data, you can add nodes to relieve replication pressure.
When a TiCDC node in the cluster fails, the replication tasks on that node are automatically rescheduled to the remaining TiCDC nodes.
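To make the failover behavior above concrete, here is a minimal sketch of moving a failed node’s tasks onto the least-loaded surviving nodes. It is an illustration only, not TiCDC’s actual scheduler: the function `reschedule`, the node names, and the task names are all hypothetical.

```python
def reschedule(assignments, failed_node):
    """Return a new node -> tasks mapping with failed_node's tasks moved
    to the surviving nodes, always picking the least-loaded node.

    Hypothetical helper for illustration; not part of TiCDC.
    """
    survivors = {n: list(t) for n, t in assignments.items() if n != failed_node}
    if not survivors:
        raise RuntimeError("no surviving nodes to take over replication tasks")
    for task in assignments.get(failed_node, []):
        # Break ties by node name so the result is deterministic.
        target = min(survivors, key=lambda n: (len(survivors[n]), n))
        survivors[target].append(task)
    return survivors


# Three hypothetical TiCDC nodes, each running some replication tasks.
assignments = {
    "cdc-1": ["task-a", "task-b"],
    "cdc-2": ["task-c"],
    "cdc-3": ["task-d", "task-e"],
}
after = reschedule(assignments, "cdc-3")
```

Picking the least-loaded node for each orphaned task keeps the distribution roughly even, which matches the "evenly schedule" goal described above.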
Multiple downstream systems and multiple output formats
The database is the core of enterprise IT. It must run stably and have good disaster recovery capability to ensure application continuity.
Weighing the importance of their applications against cost, some users want the core database to meet the disaster recovery requirements of active and standby data centers (DCs). The TiCDC-based disaster recovery solution for TiDB active and standby DCs is a good choice. Built on TiCDC’s data replication feature, this solution is useful when two DCs are far apart and the latency between them is high. It performs one-way data replication between TiDB clusters in two DCs to ensure transactions are eventually consistent and to achieve second-level recovery point objectives (RPOs).
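As a rough sketch of what a second-level RPO means in practice, you can estimate replication lag by comparing two TSOs: the latest upstream timestamp and the timestamp the downstream has caught up to. A TiDB TSO packs a physical time in milliseconds into its high bits and an 18-bit logical counter into its low bits. The function names and the sample timestamps below are made up for illustration.

```python
LOGICAL_BITS = 18  # low 18 bits of a TSO hold the logical counter


def compose_tso(physical_ms, logical=0):
    """Build a TSO from a physical time (ms since Unix epoch) and a logical counter."""
    return (physical_ms << LOGICAL_BITS) | logical


def tso_physical_ms(tso):
    """Extract the physical time in milliseconds from a TSO."""
    return tso >> LOGICAL_BITS


def replication_lag_seconds(upstream_tso, downstream_tso):
    """Approximate lag between upstream and downstream, ignoring the logical part."""
    return (tso_physical_ms(upstream_tso) - tso_physical_ms(downstream_tso)) / 1000.0


# Hypothetical values: the downstream is 2.5 seconds behind the upstream.
upstream = compose_tso(1_600_000_003_500, 7)
downstream = compose_tso(1_600_000_001_000, 0)
lag = replication_lag_seconds(upstream, downstream)
```

If the active DC fails at this moment, the data at risk is roughly what was committed during that lag window.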
You can use TiCDC to set up ring replication between three TiDB clusters and thus build a multi-DC disaster recovery solution for TiDB. If the power fails at one DC, you can switch your application to a TiDB cluster in another DC. In this way, your transactions are eventually consistent, and you achieve second-level RPO. To distribute application access pressure, you can switch routing at the application layer at any time and balance the load by directing traffic to a TiDB cluster that is not busy. This ensures data high availability and makes your cluster more tolerant of failures.
TiCDC ring replication
TiCDC provides real-time, high-throughput, and stable data subscription services for downstream data consumers. To meet users’ needs for using and analyzing various types of data in big data scenarios, TiCDC uses the Open Protocol to connect with heterogeneous ecosystems, including MySQL, Kafka, Pulsar, Flink, Canal, and Maxwell. TiCDC is a good solution for log collection, monitoring data aggregation, streaming data processing, and online and offline analysis.
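To give a feel for what a downstream data consumer does with row-level change events, here is a minimal sketch that applies events to an in-memory copy of a table in commit order. The event shape (`commit_ts`, `table`, `op`, `key`, `row`) is a simplified, hypothetical format chosen for this example; it is not the actual TiCDC Open Protocol wire format.

```python
def apply_events(events, state=None):
    """Apply row change events to a dict of tables, ordered by commit timestamp.

    The event fields used here are hypothetical and only illustrate the idea.
    """
    state = {} if state is None else state
    for ev in sorted(events, key=lambda e: e["commit_ts"]):
        table = state.setdefault(ev["table"], {})
        if ev["op"] == "delete":
            table.pop(ev["key"], None)
        else:  # "insert" or "update": keep the latest row image
            table[ev["key"]] = ev["row"]
    return state


# Events may arrive out of order; sorting by commit_ts restores the order.
events = [
    {"commit_ts": 30, "table": "users", "op": "update", "key": 1, "row": {"name": "bob"}},
    {"commit_ts": 10, "table": "users", "op": "insert", "key": 1, "row": {"name": "alice"}},
    {"commit_ts": 40, "table": "users", "op": "delete", "key": 2, "row": None},
    {"commit_ts": 20, "table": "users", "op": "insert", "key": 2, "row": {"name": "carol"}},
]
snapshot = apply_events(events)
```

A real consumer would read these events from Kafka, Pulsar, or another sink rather than from a list, but the apply step follows the same pattern.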
Xiaohongshu is a popular social media and e-commerce platform in China. The Xiaohongshu app allows users to post and share product reviews, travel blogs, and lifestyle stories via short videos and photos. By July 2019, it had over 300 million registered users.
Xiaohongshu uses TiDB for its core applications in multiple scenarios, including:
- Report analysis
- During a large promotion, providing real-time data to a large display screen
- Logistics warehousing
- An e-commerce data hub
- Content security review and analysis
In the content security review and analysis scenario, the upstream TiDB cluster records security review data written by online applications in real time, which enables real-time data monitoring and analysis.
To analyze review data, TiCDC extracts TiDB’s real-time stream data and sends it downstream to Flink for real-time calculation and aggregation. The calculation results are written back to TiDB for review data analysis, manual efficiency analysis, and management.
Xiaohongshu calls TiCDC’s internal API (which is defined by the sink interface) to customize their sink. They use the Canal Protocol to send data to Flink so that it connects to the existing application system. This greatly reduces the cost of refactoring the application system.
TiCDC’s efficient data replication and support for heterogeneous big data ecosystems have laid a solid foundation for the real-time processing of Xiaohongshu application data.
Autohome is the leading online destination for automobile consumers in China. It is among the most visited auto websites in the world. Its goal is to make buying a car easier and more enjoyable.
Autohome has run TiDB for more than two years, and it’s used in important applications such as forum replies, resource pools, and friend management. For a big sales promotion on August 18, 2020, Autohome deployed TiDB in three DCs across two cities to support applications like a bargain rush, red packets, and a lucky draw. TiCDC replicates TiDB cluster data to a downstream MySQL database in real time. The MySQL database serves as a backup that improves the applications’ ability to tolerate failures. TiCDC’s replication latency is within seconds, which satisfies the real-time requirements of online sales promotion applications.
Smart recommendation is an important Autohome application, and its underlying storage is the resource pool. The resource pool receives and gathers all kinds of information. After the data is processed, it is used for applications like homepage recommendations, product displays, and search. Before Autohome used TiDB, the resource pool used MySQL as the storage layer and used MySQL binlog to send data to Elasticsearch for use in search results. Because of MySQL’s performance and capacity bottlenecks, Autohome switched to TiDB and used TiCDC instead of MySQL binlog to replicate heterogeneous data. TiCDC features high availability, low latency, and support for large-scale clusters, which ensure that applications run stably.
In addition, Autohome has used TiCDC as a base to develop an interface that outputs log data to Kafka to replicate massive heterogeneous data. It is deployed in the production environment and has been running stably for more than two months.
Haier is a world-leading provider of solutions for a better life. The Haier Smart Home app is its official mobile interactive experience portal, providing global users with full-process smart home services, a full-scenario smart home experience, and one-stop smart home customization solutions.
Haier Smart Home’s IT services are built on Alibaba Cloud. Its core application has these database requirements:
- Support for the MySQL protocol
- Elastic scalability based on distributed transactions with strong consistency
- Close integration with various big data technology ecosystems
TiDB 4.0 meets all these requirements, so it’s the ideal choice for Haier Smart Home.
Haier Smart Home uses TiCDC to replicate user information and user posts to Elasticsearch for near real-time searches. Currently, the user table has nearly 10 million rows of data, and its data volume has reached 1.9 GB. In addition, Kafka consumes about 3 million messages per day. TiCDC also provides stable and efficient data replication for the smart recommendation feature’s big data services. Based on the unified TiCDC Open Protocol with row-level data change notification, TiCDC makes it easier for Haier Smart Home departments to analyze data. The smart recommendation feature is under development.
Zhihu, which means “Do you know?” in classical Chinese, is the Quora of China: a question-and-answer website where all kinds of questions are created, answered, edited, and organized by its community of users.
Zhihu uses TiDB as the core database in the Moneta application (which stores posts users have already read). It outputs logs to Kafka via the TiCDC Open Protocol for massive message processing.
As Zhihu’s business volume grew, they encountered problems caused by the limitations of Kafka’s architecture and historical version implementation. In the future, Zhihu’s infrastructure will be cloud-native, and Pulsar supports native geo-replication, which is more in line with Zhihu’s cloud-native infrastructure trend. Therefore, Zhihu replaced Kafka with Pulsar in some applications.
Zhihu developed code on TiCDC’s core module. (See pull requests #751 and #869 on GitHub.) To replicate TiCDC’s data to Pulsar, Zhihu connected TiCDC’s sink to Pulsar. With the help of Pulsar’s geo-replication, TiCDC’s consumers can subscribe to change events regardless of their location. In addition, the Pulsar cluster can quickly scale nodes and recover from failures. Thanks to this, TiCDC’s consumers can get real-time data.
So far, the combination of Pulsar and TiCDC has achieved ideal results. Zhihu will migrate more applications from Kafka to Pulsar. In the future, Zhihu will use Pulsar to replicate TiDB data across clusters.
Does TiCDC sound like something that can help you? You can quickly deploy TiCDC via TiUP and run cdc cli to create replication tasks that replicate real-time writes to TiDB, Pulsar, or Kafka downstream. For details, see Manage TiCDC Cluster and Replication Tasks.
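As a starting point, the sketch below assembles a cdc cli command line for creating a replication task. The `changefeed create` subcommand and the `--pd` and `--sink-uri` flags follow the TiCDC 4.0 documentation as best we recall, so check them against the documentation for your version. The PD address, Kafka address, and topic name are placeholders, and `changefeed_create_command` is a hypothetical helper.

```python
import shlex


def changefeed_create_command(pd_addr, sink_uri):
    """Build the argument list for creating a replication task with cdc cli.

    Hypothetical helper; verify the flags against your TiCDC version.
    """
    return [
        "cdc", "cli", "changefeed", "create",
        f"--pd={pd_addr}",
        f"--sink-uri={sink_uri}",
    ]


# Placeholder addresses and topic name; replace them with your own.
cmd = changefeed_create_command(
    "http://127.0.0.1:2379",
    "kafka://127.0.0.1:9092/ticdc-demo-topic",
)
# Print a shell-safe version of the command for review before running it.
print(shlex.join(cmd))
```

Once you have reviewed the printed command, you could pass `cmd` to `subprocess.run` on a machine where the cdc binary is installed.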
We’d like to thank everyone who has contributed to TiCDC. Your effort has made this release possible.