

Martin Kleppmann - Designing Data-Intensive Applications

Part I.

Reliable, Scalable, and Maintainable Applications

A data-intensive application is typically built from standard building blocks that provide commonly needed functionality. For example, many applications need to:

- Store data so that they, or another application, can find it again later (databases).
- Remember the result of an expensive operation, to speed up reads (caches).
- Allow users to search data by keyword or filter it in various ways (search indexes).
- Send a message to another process, to be handled asynchronously (stream processing).
- Periodically crunch a large amount of accumulated data (batch processing).

Many new tools for data storage and processing have emerged in recent years. They are optimized for a variety of different use cases, and they no longer neatly fit into traditional categories. For example, there are datastores that are also used as message queues (Redis), and there are message queues with database-like durability guarantees (Apache Kafka). The boundaries between the categories are becoming blurred.

Applications also often combine several such tools. For example, if you have an application-managed caching layer (using Memcached or similar), or a full-text search server (such as Elasticsearch or Solr) separate from your main database, it is normally the application code's responsibility to keep those caches and indexes in sync with the main database. Figure 1-1 gives a glimpse of what this may look like (we will go into detail in later chapters).

Figure 1-1. One possible architecture for a data system that combines several components.

Figure 1-2. Simple relational schema for implementing a Twitter home timeline.

Describing Performance

Figure 1-3. Twitter's data pipeline for delivering tweets to followers, with load parameters as of November 2012.
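
The point that application code is normally responsible for keeping an application-managed cache in sync with the main database can be made concrete with a small sketch. The example below is only an illustration under stated assumptions: two in-memory dictionaries stand in for the main database and for a cache such as Memcached, and the class name `UserStore`, its methods, and the `db_reads` counter are hypothetical names introduced here, not part of any real library. It shows one common approach (reading through the cache and invalidating the cached entry on every write); other strategies exist.

```python
class UserStore:
    """Toy data system: a dict 'database' fronted by a dict 'cache'.

    Both dicts are stand-ins (assumptions for illustration), not real
    database or Memcached clients.
    """

    def __init__(self):
        self.db = {}       # stands in for the main database
        self.cache = {}    # stands in for Memcached or similar
        self.db_reads = 0  # counts reads that had to go to the database

    def read(self, key):
        # Try the cache first; fall back to the database on a miss.
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            # Remember the result so that later reads are served from the cache.
            self.cache[key] = value
        return value

    def write(self, key, value):
        # Update the database first, then drop the cached copy so the
        # next read repopulates the cache with the fresh value.
        self.db[key] = value
        self.cache.pop(key, None)


store = UserStore()
store.write("user:1", "Alice")
first = store.read("user:1")    # cache miss: served by the database
second = store.read("user:1")   # cache hit: database is not touched
store.write("user:1", "Alicia")
third = store.read("user:1")    # miss again: the write invalidated the entry
```

If `write` skipped the invalidation step, the final read would keep returning the stale cached value, which is exactly the kind of inconsistency the application code has to guard against when it stitches separate tools together.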
