Huangyizhuo

Tell yourself that you deserve better and move on

CHAPTER 1: SCALE FROM ZERO TO MILLIONS OF USERS

Glossary

  • Continuous refinement and endless improvement
  • scale it up to serve millions of people
  • A journey of a thousand miles begins with a single step, and building a complex system is no different.
  • Domain Name System
  • An example for … is shown below
  • SQL database / NoSQL database
  • have been around for 40 years
  • if xx is not suitable for your specific use cases
  • Failover and redundancy
  • Horizontal scaling is more desirable for large scale applications due to the limitations of vertical scaling.
  • in aother scenario
  • this pic show how it works
  • all the data-modifying commands
  • the data in a slave database might not be up to date
  • content delivery network
  • every time a new web page loads, one or more database calls are executed to fetch data
  • single point of failure
  • considerations for using cache
  • it enables the caching of HTML
  • here is how CDN works at the high-level
  • here is a good example that shows how CDN improves load time
  • Time-to-Live TTL
  • until the tll expires
  • caching infrequently used asset providing no significant benefits
  • the issue is that …
  • however, this adds the overhead(额外的资源开销)
  • traffic load
  • To further scale your system, we need to decouple different components of the system so that they can be scaled independently
  • consider the following use case
  • … are good parctices but not a necessity
  • you can monitor error logs at per server level
  • this kind of powerful database could …
  • vertical scaling comes with some serious drawbacks
  • successive requests to a specific shard could cause server overload
  • Scaling a system is an iterative process

Single server setup

Category Stores Join
SQL Table & Row Supported
NoSQL K-V / Graph / Document / Column Not Supported

Database SQL VS NoSQL

Vertical scaling vs horizontal scaling

Load Balancer

Database replication

Cache

Considerations

  • Decide when to use cache
  • Expiration policy
  • Consistency
  • Mitigating failures
  • Eviction Policy

Content delivery network (CDN)

CDN servers cache static content like images, videos, CSS, JavaScript files, etc.

dynamic content cache enables the caching of HTML pages that are based on request path, query strings, cookies, and request headers.

Considerations

  • Cost
  • Setting an appropriate cache expiry
  • CDN fallback
  • Invalidating files

Stateless Server

Message Queue

Logging, metrics, automation

Sharding/Partition

CHAPTER 2: BACK-OF-THE-ENVELOPE ESTIMATION

Glossary

  • take the time factor into consideration
  • the more nines, the better
  • query per second QPS

Power of Two

  • An ASCII character uses one byte of memory (1 Byte/8 bits)

![image-20240121221801807](/Users/bytedance/Library/Application Support/typora-user-images/image-20240121221801807.png)

Latency numbers every programmer should know

  • 通过DB查询1k pixel的延迟 约为 500ms以内
  • 新加坡 -> 美国,理论最快50ms(按光速算)

Availability numbers

![image-20240121222740100](/Users/bytedance/Library/Application Support/typora-user-images/image-20240121222740100.png)

CHAPTER 4: DESIGN A RATE LIMITER

Glossary

  • Denial of Service (DoS)
  • a user can wirte (no more than/a maximum of)2 posts per day
  • there is an alternative way bedsides the client-side and server-side
  • send 3 request within 1 second
  • each of them has distinct pros and cons
  • Pre-defined
  • API endpoints
  • keep track of how many requests are sent form xxx
  • the request is disallowed
  • slowness of disk access
  • In-memory cache
  • increase counter by one
  • performance optimization
  • in case a request is rate limited, APIs return HTTP code

QUIZ

  • how to design a distributed userID-granular rate limiter
    • proposal 1: redis + TTL

High-level

Where to put the rate limiter?

Algorithms for rate limiting

  • Token bucket algorithm
  • Leaking bucket algorithm

Rate limit Rule

domain: auth descriptors:
- key: auth_type 
- Value: login 
- rate_limit:
	- unit: minute 
	- requests_per_unit: 5

![image-20240124231747594](/Users/bytedance/Library/Application Support/typora-user-images/image-20240124231747594.png)

Two questions in distrbuted env:

  • race condition
    • LUA 原子性操作

CHAPTER 5: DESIGN CONSISTENT HASHING

Glossary

  • let us take an in-depth look at the problem
  • however, problem arise when new servers are added, or existing servers are removed

SKIPPED

CHAPTER 6: DESIGN A KEY-VALUE STORE

Glossary

  • A key-value store, also referred to as a key-value database, is a non-relational database
  • there is no perfect design. Each design achieves a specific balance regarding the tradeoffs of the read, write, and memory usage
  • Distribute data across multiple servers evenly(平均地)

Single server key-value store

Fitting everything in memory is impossible

  • data compression
  • store only frequently used data, the rest on disk
  • let us take a look at some concrete examples
  • We only chosse unique servers while performing the clockwise walk logic
  • no longer need to …
  • a proxy between clients and nodes
  • As with any large system at scale, failures are not only inevitable but common
  • Partition tolerance means the system continues to operate despite network partitions.
  • Quoted from Wikipedia
  • Highlighted boxes indicate inconsistency.
  • If root hashes match, both servers have the same data. If root hashes disagree, then the left child hashes are compared followed by right child hashes.
  • refer to the reference material
  • it will be retrieved from the disk instead

CAP

it is impossible for a distributed system to simultaneously provide more than two of these three guarantees: consistency, availability, and partition tolerance

core components and techniques

Data partition

Data replication

Consistency

vector clock

Handling failures

gossip protocol

Handling temporary failures