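Thread management, exception handling, logging, concurrency, race conditions, ordering, message synchronization, task assignment, network latency, crashes, error recovery, duplicate computation, caching… none of these problems can be solved only by multithreading or coroutines. For most people, the sensible approach is to build the right abstraction and let a framework handle these problems, rather than handling them by hand (unless you are a library author).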
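Consider the following classic problem:
Count, in real time, the word frequencies of the chat messages in a messaging app (without using an existing distributed computing framework, e.g. Spark, Flink, Map-Reduce, Storm…).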
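Before anything is distributed, the per-message step can be sketched as turning one message into a Map[String, Int] of word counts. This is a minimal sketch under an assumption that is not in the original: words are lowercased and split on any run of characters that are not letters or digits (countWords is a hypothetical name). Real chat text with CJK, emoji or URLs would need a proper tokenizer.

```scala
// Hypothetical tokenizer (an assumption): lowercase the message, split on
// runs of non-letter/non-digit characters, drop empty tokens.
def countWords(message: String): Map[String, Int] =
  val words = message.toLowerCase.split("[^\\p{L}\\p{N}]+").toList.filter(_.nonEmpty)
  // Group identical words and sum a count of 1 per occurrence
  words.groupMapReduce(identity)(_ => 1)(_ + _)
```

For example, countWords("Hello monoid, hello Scala!") should equal Map("hello" -> 2, "monoid" -> 1, "scala" -> 1).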
Problem analysis:
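In theoretical computer science, the CAP theorem, also named Brewer’s theorem after computer scientist Eric Brewer, states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:
Consistency: every read receives the most recent write or an error.
Availability: every request receives a (non-error) response, without the guarantee that it contains the most recent write.
Partition tolerance: the system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.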
One workaround for the CAP theorem, which I believe was pioneered at Amazon but which is now widespread, is to go for “eventual consistency”: maintain an AP system, but ensure that every update to a given datum is eventually propagated to every node that needs to know about it.
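A small illustration of why eventual consistency is reachable for counting: if every replica eventually receives every update, the order of arrival does not matter, because integer addition is commutative and associative. The sketch assumes a hypothetical update format of (word, delta) pairs and models a replica's state as a fold over the updates it has received; applyUpdate and replicaState are illustrative names.

```scala
// Hypothetical update format: (word, delta). A replica's state is the fold
// of all updates it has received so far.
type ReplicaCounts = Map[String, Int]

def applyUpdate(state: ReplicaCounts, update: (String, Int)): ReplicaCounts =
  val (word, delta) = update
  state.updated(word, state.getOrElse(word, 0) + delta)

def replicaState(received: Seq[(String, Int)]): ReplicaCounts =
  received.foldLeft(Map.empty[String, Int])(applyUpdate)
```

Two replicas that receive Seq("hello" -> 1, "scala" -> 2, "hello" -> 1) in opposite orders end up with the same Map("hello" -> 2, "scala" -> 2). This only holds if each update is delivered exactly once; a duplicated delivery would be counted twice, which is one motivation for CRDTs.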
In distributed computing, a conflict-free replicated data type (CRDT) is a data structure which can be replicated across multiple computers in a network, where the replicas can be updated independently and concurrently without coordination between the replicas, and where it is always mathematically possible to resolve inconsistencies that might come up.
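One of the simplest CRDTs is the grow-only counter (G-Counter). Below is a minimal state-based sketch in Scala 3, assuming string replica IDs; the names GCounter, increment and merge are illustrative, not from a library. Each replica increments only its own slot, and merging takes the per-replica maximum. That merge is commutative, associative and idempotent, so replicas can exchange state in any order, any number of times, and still converge.

```scala
// State-based grow-only counter: one non-decreasing slot per replica ID.
final case class GCounter(slots: Map[String, Long] = Map.empty):
  // A replica only ever increments its own slot.
  def increment(replica: String, by: Long = 1): GCounter =
    require(by >= 0, "a G-Counter can only grow")
    GCounter(slots.updated(replica, slots.getOrElse(replica, 0L) + by))

  // Per-replica maximum: commutative, associative and idempotent.
  def merge(other: GCounter): GCounter =
    val keys = slots.keySet ++ other.slots.keySet
    GCounter(keys.iterator.map { k =>
      k -> math.max(slots.getOrElse(k, 0L), other.slots.getOrElse(k, 0L))
    }.toMap)

  // The counter's value is the sum over all replicas' slots.
  def value: Long = slots.values.sum
```

For word frequencies, one could keep one such counter per word (e.g. a Map[String, GCounter] merged key-wise); this is an assumption about how the pieces might fit together, not something the original prescribes.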
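Group: let G be a non-empty set with a binary operation * defined on G. If * satisfies
Closure: for all a, b ∈ G, a * b ∈ G;
Associativity: for all a, b, c ∈ G, (a * b) * c = a * (b * c);
Identity: there is an element e ∈ G (the identity, or unit) such that e * a = a * e = a for all a ∈ G;
Inverse: for every a ∈ G there is an a⁻¹ ∈ G such that a * a⁻¹ = a⁻¹ * a = e,
then (G, *) is called a group, or simply G is a group.
If only closure and associativity hold, G is called a semigroup; if closure and associativity hold and G also has an identity element, G is called a monoid.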
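These laws can be spot-checked in code. The following sketch (all helper names are hypothetical) tests associativity, the identity element and inverses on a handful of sample values. Passing such a check is evidence, not a proof, since the laws must hold for every element.

```scala
// Spot-check algebraic laws on sample values (evidence, not a proof).
def isAssociative[T](xs: Seq[T])(op: (T, T) => T): Boolean =
  xs.forall(a => xs.forall(b => xs.forall(c => op(op(a, b), c) == op(a, op(b, c)))))

// e must be a two-sided identity: e * a == a and a * e == a
def hasIdentity[T](xs: Seq[T], e: T)(op: (T, T) => T): Boolean =
  xs.forall(a => op(e, a) == a && op(a, e) == a)

// inv(a) must be a two-sided inverse of a with respect to e
def hasInverses[T](xs: Seq[T], e: T)(inv: T => T)(op: (T, T) => T): Boolean =
  xs.forall(a => op(a, inv(a)) == e && op(inv(a), a) == e)
```

On the samples Seq(-2, 0, 1, 5), (Int, +, 0) passes all three checks and behaves like a group. (String, concatenation, "") is associative with identity "" but has no inverses, so it is only a monoid. Subtraction on Int fails associativity: (-2 - 0) - 1 = -3, but -2 - (0 - 1) = -1.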
trait SemiGroup[T]:
  extension (x: T) def combine (y: T): T
trait Monoid[T] extends SemiGroup[T]:
  def unit: T
object Monoid:
  def apply[T](using m: Monoid[T]) = m
given Monoid[Int] with
  extension (x: Int) def combine (y: Int): Int = x + y
  def unit: Int = 0
1 combine 2 // 3
def combineAll[T: Monoid](xs: List[T]): T =
  xs.foldLeft(Monoid[T].unit)(_.combine(_))
combineAll(List(1, 2, 3)) // 6
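What matters for distribution is associativity: the list can be cut into chunks, each chunk reduced independently, and the partial results combined, giving the same answer as the sequential foldLeft in combineAll. A minimal sketch, assuming a local thread pool (scala.concurrent's global ExecutionContext) stands in for separate machines; unit and combine are passed as plain arguments so the block is self-contained, and chunkedCombine is a hypothetical name.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.*
import scala.concurrent.ExecutionContext.Implicits.global

// Reduce each chunk on the thread pool, then combine the partial results.
// Correct for any associative `combine` whose identity element is `unit`.
def chunkedCombine[T](xs: Seq[T], chunkSize: Int)(unit: T)(combine: (T, T) => T): T =
  require(chunkSize > 0, "chunkSize must be positive")
  val partials: Seq[Future[T]] =
    xs.grouped(chunkSize).toSeq.map(chunk => Future(chunk.foldLeft(unit)(combine)))
  // Future.sequence keeps the chunk order, so associativity alone suffices.
  val combined = Future.sequence(partials).map(_.foldLeft(unit)(combine))
  Await.result(combined, 10.seconds)
```

Because Future.sequence keeps the chunks in their original order, only associativity is required: chunkedCombine(List("a", "b", "c", "d", "e"), 2)("")(_ + _) still yields "abcde" even though string concatenation is not commutative. If partial results could be combined in arbitrary order, combine would also have to be commutative, as the word-count Map merge is.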
type Result = Map[String, Int]
given Monoid[Result] with
  extension (x: Result) def combine (y: Result): Result =
    // Concatenate both maps' entries, group by word, and sum each word's counts with the Int monoid
    (x.toList ::: y.toList).groupBy(_._1).map {
      case (k, v) => (k, v.map(_._2).reduce(_ combine _))
    }.toMap
  def unit: Result = Map.empty
val left = Map("hello" -> 1, "monoid" -> 2)
val right = Map("hello" -> 1, "scala" -> 3)
scala> left combine right
val res1: Result = HashMap(monoid -> 2, scala -> 3, hello -> 2)
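Putting the pieces together, here is a hypothetical end-to-end flow: each node counts the messages it received, and the per-node Map[String, Int] results are merged with the same key-wise sum as the combine above (written here with foldLeft instead of groupBy). The whitespace tokenizer and all names are assumptions for illustration.

```scala
type WordCounts = Map[String, Int]

// Hypothetical tokenizer: lowercase and split on whitespace.
def messageCounts(message: String): WordCounts =
  val words = message.toLowerCase.split("\\s+").toList.filter(_.nonEmpty)
  words.groupMapReduce(identity)(_ => 1)(_ + _)

// Key-wise sum of counts; Map.empty is the unit.
def mergeCounts(x: WordCounts, y: WordCounts): WordCounts =
  y.foldLeft(x) { case (acc, (word, n)) => acc.updated(word, acc.getOrElse(word, 0) + n) }

// What one node would report: the merged counts of the messages it received.
def nodeCounts(messages: Seq[String]): WordCounts =
  messages.map(messageCounts).foldLeft(Map.empty[String, Int])(mergeCounts)
```

Since this merge is commutative and associative with the empty map as unit, nodes can report in any order and be merged in any grouping. It is not idempotent, though: a partial result delivered twice is counted twice, so over an unreliable network one would need exactly-once delivery or an idempotent CRDT such as a grow-only counter per word.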