For the final assignment you can choose between writing a technical blog or implementing the leader election portion of Raft.
The starter code is now available!

Note: You can complete this assignment in a group of 1-3 students.

This project is adapted from MIT’s 6.824 course. Their assignment includes the full Raft protocol, but we will only do the first stage which focuses on leader election. You must implement the functions necessary to allow a candidate node to start an election and receive votes from other nodes. An elected leader should send heartbeat messages to remain leader, and if it fails, then a new election should begin.

We supply you with skeleton code src/raft/raft.go. We also supply a set of tests, which you should use to drive your implementation efforts, and which we’ll use to grade your submitted lab. The tests are in src/raft/test_test.go.

To get up and running, execute the following commands.

$ cd src/raft
$ go test
Test (2A): initial election ...
--- FAIL: TestInitialElection2A (5.04s)
        config.go:326: expected one leader, got none
Test (2A): election after network failure ...
--- FAIL: TestReElection2A (5.03s)
        config.go:326: expected one leader, got none
...
$

The code

Implement Raft by adding code to raft/raft.go. In that file you’ll find skeleton code, plus examples of how to send and receive RPCs.

Your implementation must support the following interface, although some of these functions will not be used for the Election phase. You’ll find more details in comments in raft.go.

// create a new Raft server instance:
rf := Make(peers, me, persister, applyCh)

// ask a Raft for its current term, and whether it thinks it is leader
rf.GetState() (term, isLeader)

A service calls Make(peers,me,...) to create a Raft peer. The peers argument is an array of network identifiers of the Raft peers (including this one), for use with RPC. The me argument is the index of this peer in the peers array. In practice you will never call this yourself - the test cases will call this to setup several nodes and automatically configure the network connections between them.

raft.go contains example code that sends an RPC (sendRequestVote()) and that handles an incoming RPC (RequestVote()). Your Raft peers should exchange RPCs using the labrpc Go package (source in src/labrpc). The tester can tell labrpc to delay RPCs, re-order them, and discard them to simulate various network failures. While you can temporarily modify labrpc, make sure your Raft works with the original labrpc, since that’s what we’ll use to test and grade your lab. Your Raft instances must interact only with RPC; for example, they are not allowed to communicate using shared Go variables or files.

Your Task: Leader Election

Implement Raft leader election and heartbeats (AppendEntries RPCs with no log entries). The goal is for a single leader to be elected, for the leader to remain the leader if there are no failures, and for a new leader to take over if the old leader fails or if packets to/from the old leader are lost. Run go test -run 2A to test your code.

The source code you’ve been provided is for an assignment that implements the full Raft protocol. We’ve removed many of the methods we won’t use, but some are still there. You should focus on the sections of code referred to as 2A (since this was the election phase of the original assignment). You won’t need to implement pieces referred to as 2B or 2C.

Hints:

  • You can’t easily run your Raft implementation directly; instead you should run it by way of the tester, i.e. go test -run 2A.
  • Follow the paper’s Figure 2. At this point you care about sending and receiving RequestVote RPCs, the Rules for Servers that relate to elections, and the State related to leader election,
  • Add the Figure 2 state for leader election to theRaft struct in raft.go. You’ll also need to define a struct to hold information about each log entry.
  • Fill in the RequestVoteArgs and RequestVoteReply structs. Modify Make() to create a background goroutine that will kick off leader election periodically by sending out RequestVote RPCs when it hasn’t heard from another peer for a while. This way a peer will learn who is the leader, if there is already a leader, or become the leader itself. Implement the RequestVote() RPC handler so that servers will vote for one another.
  • To implement heartbeats, define an AppendEntries RPC struct (though you may not need all the arguments yet), and have the leader send them out periodically. Write an AppendEntries RPC handler method that resets the election timeout so that other servers don’t step forward as leaders when one has already been elected.
  • Make sure the election timeouts in different peers don’t always fire at the same time, or else all peers will vote only for themselves and no one will become the leader.
  • The tester requires that the leader send heartbeat RPCs no more than ten times per second.
  • The tester requires your Raft to elect a new leader within five seconds of the failure of the old leader (if a majority of peers can still communicate). Remember, however, that leader election may require multiple rounds in case of a split vote (which can happen if packets are lost or if candidates unluckily choose the same random backoff times). You must pick election timeouts (and thus heartbeat intervals) that are short enough that it’s very likely that an election will complete in less than five seconds even if it requires multiple rounds.
  • The paper’s Section 5.2 mentions election timeouts in the range of 150 to 300 milliseconds. Such a range only makes sense if the leader sends heartbeats considerably more often than once per 150 milliseconds. Because the tester limits you to 10 heartbeats per second, you will have to use an election timeout larger than the paper’s 150 to 300 milliseconds, but not too large, because then you may fail to elect a leader within five seconds.
  • You may find Go’s rand useful.
  • You’ll need to write code that takes actions periodically or after delays in time. The easiest way to do this is to create a goroutine with a loop that calls time.Sleep(). Don’t use Go’s time.Timer or time.Ticker, which are difficult to use correctly.
  • Read this advice about locking and structure.
  • If your code has trouble passing the tests, read the paper’s Figure 2 again; the full logic for leader election is spread over multiple parts of the figure.
  • Don’t forget to implement GetState().
  • The tester calls your Raft’s rf.Kill() when it is permanently shutting down an instance. You can check whether Kill() has been called using rf.killed(). You may want to do this in all loops, to avoid having dead Raft instances print confusing messages.
  • A good way to debug your code is to insert print statements when a peer sends or receives a message, and collect the output in a file with go test -run 2A > out. Then, by studying the trace of messages in the out file, you can identify where your implementation deviates from the desired protocol. You might find DPrintf in util.go useful to turn printing on and off as you debug different problems.
  • Go RPC sends only struct fields whose names start with capital letters. Sub-structures must also have capitalized field names (e.g. fields of log records in an array). The labgob package will warn you about this; don’t ignore the warnings.
  • Check your code with go test -race, and fix any races it reports.

Be sure you pass the 2A tests before submitting Part 2A, so that you see something like this:

$ go test -run 2A
Test (2A): initial election ...
  ... Passed --   4.0  3   32    9170    0
Test (2A): election after network failure ...
  ... Passed --   6.1  3   70   13895    0
PASS
ok      raft    10.187s
$

Each “Passed” line contains five numbers; these are the time that the test took in seconds, the number of Raft peers (usually 3 or 5), the number of RPCs sent during the test, the total number of bytes in the RPC messages, and the number of log entries that Raft reports were committed. Your numbers will differ from those shown here. You can ignore the numbers if you like, but they may help you sanity-check the number of RPCs that your implementation sends. The grading script will fail your solution if it takes more than 600 seconds for all of the tests (go test), or if any individual test takes more than 120 seconds.

Submission

To submit your code, you should merge all your changes to master so that the master branch contains the final product. Please create an issue on your repo titled Submission with the following information:

  • Your team name and members
  • Which team members contributed which parts

Then tag us in the issue using our github usernames(@twood02 and @thelimeburner).