Golang implementation of the Raft consensus protocol (2)

Posted on 2017-12-13

Abstract

In the era of big data, high-performance distributed systems are required to process large volumes of data. However, coordinating many nodes is not easy. One significant problem is distributed consensus: every node in the cluster must eventually reach agreement without any conflicts.

Raft is a distributed consensus algorithm that has been proven to work. This experiment continues the previous one, implements log replication, and finally tests the whole system under many abnormal conditions.

Setup a Distributed HBase Cluster in Docker

Posted on 2017-12-02

Abstract

Since Hadoop first appeared, its ecosystem has grown steadily. A great deal of software has been developed around Hadoop; Apache HBase and Apache Hive are two examples.

In this experiment, to learn these two tools, we use HBase and Hive to continue our earlier research on wuxia novels.

Setup Apache Hive in Docker

Posted on 2017-12-02

Abstract

Since Hadoop first appeared, its ecosystem has grown steadily. A great deal of software has been developed around Hadoop; Apache HBase and Apache Hive are two examples.

In this experiment, to learn these two tools, we use HBase and Hive to continue our earlier research on wuxia novels.

How to quickly setup a Hadoop cluster in Docker

Posted on 2017-12-02 In Hadoop

Foreword

This post shows how to build a Docker image of Hadoop and how to set up a distributed Hadoop cluster for experimental use only (even on a single machine).

If you want to deploy a large-scale cluster in production, see Setup a distributed Hadoop cluster with docker for more information.

Golang implementation of the Raft consensus protocol (1)

Posted on 2017-11-26

Abstract

In the era of big data, high-performance distributed systems are required to process large volumes of data. However, coordinating many nodes is not easy. One significant problem is distributed consensus: every node in the cluster must eventually reach agreement without any conflict.

Raft is a distributed consensus algorithm that has been proven to work. This experiment focuses on designing and implementing the leader election described in the Raft algorithm.
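
As a rough illustration of what leader election involves, the sketch below shows the vote-granting rule from the Raft paper and the majority check a candidate applies to the votes it collects. The type and function names (`Node`, `VoteRequest`, `HandleRequestVote`, `HasMajority`) are hypothetical and are not taken from the post's code; networking, timers, and persistence are left out.

```go
package main

import "fmt"

// VoteRequest carries the arguments of Raft's RequestVote RPC.
// The struct name and layout are illustrative assumptions.
type VoteRequest struct {
	Term         int // candidate's term
	CandidateID  int // candidate asking for the vote
	LastLogIndex int // index of the candidate's last log entry
	LastLogTerm  int // term of the candidate's last log entry
}

// Node holds only the state needed to decide whether to grant a vote.
type Node struct {
	CurrentTerm  int
	VotedFor     int // -1 means no vote cast in CurrentTerm
	LastLogIndex int
	LastLogTerm  int
}

// HandleRequestVote reports whether this node grants its vote.
func (n *Node) HandleRequestVote(req VoteRequest) bool {
	// Reject candidates from an older term.
	if req.Term < n.CurrentTerm {
		return false
	}
	// A newer term resets any vote cast in the previous term.
	if req.Term > n.CurrentTerm {
		n.CurrentTerm = req.Term
		n.VotedFor = -1
	}
	// At most one vote per term.
	if n.VotedFor != -1 && n.VotedFor != req.CandidateID {
		return false
	}
	// The candidate's log must be at least as up to date as ours.
	upToDate := req.LastLogTerm > n.LastLogTerm ||
		(req.LastLogTerm == n.LastLogTerm && req.LastLogIndex >= n.LastLogIndex)
	if !upToDate {
		return false
	}
	n.VotedFor = req.CandidateID
	return true
}

// HasMajority reports whether votes is a strict majority of clusterSize.
func HasMajority(votes, clusterSize int) bool {
	return votes > clusterSize/2
}

func main() {
	follower := &Node{CurrentTerm: 1, VotedFor: -1}
	req := VoteRequest{Term: 2, CandidateID: 1}
	fmt.Println("vote granted:", follower.HandleRequestVote(req))
	fmt.Println("2 of 3 votes wins:", HasMajority(2, 3))
}
```

Because each node grants at most one vote per term and a candidate needs a strict majority, two candidates cannot both win the same term.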

Implement NTP protocol with golang RPC

Posted on 2017-10-23

Abstract

In some cases it is necessary to synchronize time among nodes. This experiment uses RPC to implement network time synchronization.

Building a Redis Cluster on Docker Swarm

Posted on 2017-06-01 In Redis

Introduction

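Redis is an in-memory database: all data is held in memory, so its capacity is limited by the amount of memory available. When the data set outgrows the memory of a single machine, a cluster is needed to scale out.
A Redis cluster has two kinds of nodes: masters and replicas. Nodes can fail at runtime, so for high availability at least 3 master nodes are required. When a master fails, the remaining nodes use majority voting to pick one of the failed master's replicas to take over as master, and its other replicas become replicas of the new master. Data in a Redis cluster is sharded: the data is divided into a fixed number of slots, and an algorithm determines which master each slot belongs to. Masters hold no redundant copies of each other's data; redundancy is the job of the replicas.
This post uses a 6-node Redis cluster with 3 masters and 3 replicas as the example. All scripts and files are in redis-cluster.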
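
To make the slot mapping concrete, the sketch below computes a key's slot the way Redis Cluster defines it: CRC16 (XMODEM variant) of the key, modulo 16384. Hash tags (the `{...}` syntax) are deliberately ignored here. The `slotUpperBounds` table is an assumed even split across 3 masters used only for illustration; a real cluster's slot ranges depend on how it was created. All function names are hypothetical.

```go
package main

import "fmt"

// slotCount is the fixed number of hash slots in a Redis cluster.
const slotCount = 16384

// crc16 computes CRC16/XMODEM (polynomial 0x1021, initial value 0),
// the checksum variant Redis Cluster uses for key hashing.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = (crc << 1) ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// KeySlot returns the hash slot for key. Hash tags are not handled.
func KeySlot(key string) int {
	return int(crc16([]byte(key))) % slotCount
}

// slotUpperBounds is an assumed layout for 3 masters:
// master 0 owns slots 0-5460, master 1 owns 5461-10922,
// master 2 owns 10923-16383.
var slotUpperBounds = []int{5460, 10922, 16383}

// MasterForSlot returns the index of the master that owns slot under
// the assumed layout above, or -1 if the slot is out of range.
func MasterForSlot(slot int) int {
	if slot < 0 {
		return -1
	}
	for i, upper := range slotUpperBounds {
		if slot <= upper {
			return i
		}
	}
	return -1
}

func main() {
	for _, key := range []string{"user:1000", "novel:title"} {
		slot := KeySlot(key)
		fmt.Printf("%s -> slot %d -> master %d\n", key, slot, MasterForSlot(slot))
	}
}
```

Since the slot depends only on the key, any client can work out which master to contact without asking a central coordinator.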

StreamSpider: Summary and Outlook

Posted on 2017-05-14

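Summary

Designing and implementing this distributed crawler system was a chance to learn about many different areas. Writing an actual crawler and testing it extensively on a cluster taught me much more about crawling, especially page encodings, deduplication, and error handling, which tutorial videos usually do not cover. Apache Storm is a distributed stream-processing platform that is simple in design yet powerful. Learning Storm and using it to build a distributed stream-processing system gave me a deeper understanding of it. Studying how Storm's nodes and components interconnect and exchange data was also a great help in learning about distributed systems.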

StreamSpider: Testing and Analysis

Posted on 2017-05-11

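Testing is an important part of software development, and even more so for distributed systems. Testing both checks that the program runs as expected and measures the system's performance; only testing can uncover problems that static code analysis cannot find. Once the distributed crawler was coded, it was tested on the platform that had already been set up: the system was run for long periods while its state was observed, and its defects and shortcomings were then analyzed.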

StreamSpider: Common Crawler Problems and Handling Strategies

Posted on 2017-05-10

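All kinds of problems came up during testing. The common problems below, along with their solutions (or mitigations), were distilled from repeated adjustments. Because a single design decision sometimes affects several parts of the system, or because the parts are inherently related, some of the content overlaps.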