
  1. Proper continuous integration using GitHub

    Hours of dealing with painful merges, rollbacks and fixing commits have driven me to write this post.

    Be it open source or enterprise, it is very common to see developers using Git as a single central repository. And errors like copy/pasting code across branches or moving files around in the filesystem are, unfortunately, just as common.

    No, no, no! Oh no, no, no!

    One can argue that Git and GitHub may be used in a number of different setups, but never with the practices above.

    Allow me to explain.





    The Evil practices



    Don’t copy/paste code across branches!
    Ever! Ever! Ever!

    First of all, you should know that copy/pasting stuff across branches is the root of all evil and it will have your virtual head cut off and set on a stake at the London Bridge of developers.
    It completely defeats the purpose of a repository, because the pasted code arrives with none of its commit history.

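    This will break the “merge” functionality of the repo: Git can no longer tell that both branches contain the same change, so you will have to manually merge any future commits that touch that code on other branches.
    It also removes all accountability and developer credit for the functionality, since the pasted code is attributed to whoever pasted it.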

    Like I said - the root of all evil

    Don’t get me wrong - almost everyone has done it at some point when using Git.
    I have done it in the past, thinking of it as a simple and quick solution to an immediate problem, and I was oh so wrong.

    It is technically possible to correct the mistake later on - but it takes a much deeper knowledge of Git to roll back, cherry-pick, re-merge, rebase, etc. and make it proper again. It can become a real pain.

    Ok, but what if you want to copy a single feature across to another branch?

    Always use git cherry-pick to bring the relevant commits across, then manually merge whatever code is necessary to make it work.
    Whatever you do, never copy-paste across branches.
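
    As a rough sketch of what that looks like in practice, the script below builds a throwaway repository and cherry-picks one commit from a feature branch onto another branch. The branch name release_1, the file names and the commit messages are assumptions made up for the example; the -x flag is optional and simply records where the commit came from.

```shell
set -e
repo=$(mktemp -d)                      # throwaway repo; all names below are hypothetical
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

git checkout -q -b release_1           # the branch that needs the feature
git checkout -q -b myFeature master    # the branch where the work happens

echo "search via querystring" > search.txt
git add search.txt
git commit -q -m "Changed search backend to support querystring"
fix=$(git rev-parse HEAD)              # remember the commit we want to copy

echo "unrelated work" > other.txt
git add other.txt
git commit -q -m "Unrelated work in progress"

git checkout -q release_1
git cherry-pick -x "$fix"              # copies only that commit, keeping author and message
git log --oneline
```

    Only the picked commit lands on release_1; the unrelated work stays on myFeature, and the original author is kept on the new commit.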



    Don’t move files around in the filesystem!

    The issue is that when you move a file in the filesystem without telling Git, Git sees 2 different actions: a deleted file and a new, untracked file. Git only pairs the two back up as a rename by comparing their contents, so if you forget to stage one side, or heavily edit the file as part of the same commit, the link is lost.

    When that happens you won’t be able to see a diff of the changes to the file in the commit; you will see it as a whole new file.
    Also, if someone else is working on the same file, Git may not be able to merge their changes across, as it won’t know the 2 files are effectively the same.

    So, if you want to move a file within your repo, always use the git command for it, and preferably commit the move separately from any edits to the file:

    $git mv folderA/file.a folderB/file.a              #move a file within a git repo

    I can’t stress the "always" enough - developers who have come across this issue will know what I mean.
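
    A minimal sketch of such a move, run in a throwaway repository (the file contents and commit messages are assumptions for illustration). It shows that after git mv the history of the file can still be followed across the move with git log --follow.

```shell
set -e
repo=$(mktemp -d)                      # throwaway repo for the demo
cd "$repo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

mkdir folderA folderB
printf 'line 1\nline 2\nline 3\n' > folderA/file.a
git add folderA/file.a
git commit -q -m "Add file.a"

git mv folderA/file.a folderB/file.a   # moves the file and stages both sides of the move
git status --short                     # the staged change is reported as a rename
git commit -q -m "Move file.a to folderB"

# The history of the file is still reachable from its new path
git log --follow --oneline -- folderB/file.a
```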





    Best Practices



    Commit often

    Committing often will keep your history organised, simpler to understand and to merge, and much simpler to correct if you need to roll back or pick out individual features.

    There is no exact measure, but ideally it will be a few lines of code or a relevant change per commit - whichever comes first.

    E.g.
    Commit ‘Changed search backend to support receiving search queries from querystring’ - a few lines of code.

    Commit ‘Changed search form to GET instead of POST’ - probably small code change, but relevant.
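
    To illustrate why this pays off, here is a small sketch in a throwaway repository where the two example commits above are made separately, and then only the form change is rolled back with git revert. The file names and contents are hypothetical.

```shell
set -e
repo=$(mktemp -d)                      # throwaway repo; file names are made up
cd "$repo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

echo "backend v1" > backend.txt
echo "form method=POST" > form.txt
git add backend.txt form.txt
git commit -q -m "Initial search implementation"

echo "backend v2: reads query from querystring" > backend.txt
git commit -q -am "Changed search backend to support receiving search queries from querystring"

echo "form method=GET" > form.txt
git commit -q -am "Changed search form to GET instead of POST"
form_commit=$(git rev-parse HEAD)

# Roll back only the form change; the backend change is untouched
git revert --no-edit "$form_commit"
cat form.txt backend.txt
```

    Had both changes been squeezed into one commit, rolling back the form change would have meant undoing the backend work too, or editing by hand.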



    Merge often

    The more often you merge, the simpler each merge is, and the less prone you are to manual merging errors.

    If the code is still “warm” in your head it will be very easy to merge: about 5 minutes a day if you merge every day.

    People who have had to merge branches that have diverged for several weeks or even months will understand how painful it can be. It can take several days and a lot of regression tests to make it work properly.
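
    A rough sketch of a routine merge, using a single throwaway repository so it can run anywhere. The branch names sprint_12 and myFeature follow the ones used later in this post; the files are hypothetical. With a real remote, the merge step would typically be a git pull origin sprint_12 run from the feature branch.

```shell
set -e
repo=$(mktemp -d)                      # throwaway repo standing in for a real project
cd "$repo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git checkout -q -b sprint_12
git checkout -q -b myFeature

echo "my feature" > feature.txt
git add feature.txt
git commit -q -m "Start myFeature"

# Meanwhile, a teammate's work lands on the sprint branch
git checkout -q sprint_12
echo "teammate work" > teammate.txt
git add teammate.txt
git commit -q -m "Teammate feature"

# The routine merge: bring the sprint branch into the feature branch
git checkout -q myFeature
git merge -q --no-edit sprint_12
git log --oneline --graph
```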



    Use and abuse branches

    Branches are not evil, quite the opposite.

    A branch a day will blow you away.

    Using a lot of branches makes each merge operation easier, lets you continue working on new features while others are being reviewed or tested, and lets you keep different stable and unstable versions of your application.

    The different use cases for branches are explained further below.





    Continuous Integration and Distributed Revision Control

    For the sake of simplicity - I’ll explain this setup using GitHub as the example frontend.

    Have one central repository as the “main” repository - let’s call it origin - and have each developer fork it on GitHub; let’s call each of these forks the developer’s hub.
    Both kinds of repo are stored remotely on GitHub itself.
    Give everyone participating in the project read+write access to origin. This way, everyone will be able to participate in the pull-request review process later on.

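    The next step is for each developer to $git clone the origin repository into their local development environment - this will be the developer’s local repo.
    Now, on the local repo, add a reference to the developer’s hub, the upstream repository the developer will be pushing to. (Note that many GitHub guides use these two names the other way around, with origin as your fork and upstream as the main repository; this post sticks to the naming above.)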

    $git remote add upstream https://github.com/carlosouro/MyProject.git
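
    To try this setup without touching GitHub, the sketch below uses two local bare repositories as stand-ins for origin and the developer’s hub. The directory names origin.git, hub.git and local are assumptions for the example only.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Local stand-ins for the two GitHub repos (hypothetical paths)
git init -q --bare origin.git          # the main repository
git init -q --bare hub.git             # the developer's fork ("developer's hub")

git clone -q origin.git local          # cloning names the source remote "origin"
cd local
git remote add upstream "$work/hub.git"

git remote -v                          # lists fetch/push URLs for origin and upstream
```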



    The model for a small team will look something like this:

    [Figure: Collaborative Development Model using GitHub]

    Notes: 

    • Downstream is where the merged code comes from; 
    • Upstream is where your new code goes to in order to be merged; 
    • To Live/Test is the mainstream where the common merged code travels to test/live environments. 


    Special Note: In the case of a really big project, our mainstream would probably be just another upstream into a more centralised repo - this chain can continue indefinitely, with several filtering steps where merges and regression tests are made. This is why Git is the perfect tool for open source projects.





    Main branches per sprint, individual branches per feature



    Starting the sprint - Lead Developer

    Ok, let’s build a new set of features.

    Let’s start by creating a branch for our sprint in the origin (main) repository - this branch will serve as the baseline for our sprint’s work and as a regression test branch if necessary.


    $git checkout master                      #go into your master code
    $git pull origin master                    #get the updated master code from origin
    $git checkout -b sprint_12             #create a new branch for the sprint
    $git push origin sprint_12              #push the sprint branch into github’s origin repo



    Developing a feature - Developer

    $git fetch origin                                  #get the latest branches from origin
    $git checkout -b sprint_12 origin/sprint_12        #create your local copy of the sprint branch

    Downstream code comes directly from origin (the main repo) into your local repo.

    $git checkout -b myFeature          #create a branch for the new feature

    Before starting to work, create a branch for what you are about to do, then do your work and commit it on that branch. This way you don’t have to wait for your code to be merged before continuing your work, and you don’t have to roll back everything and rebase if one of your features is not accepted or requires some changes.

    $git push upstream myFeature          #push feature into your developer hub

    Upstream code works its way up the chain from the developer’s local repo to their developer hub, and from there it is pull-requested into the sprint_12 branch of the main repository, origin.
    Another developer can now review your pull request, get to know your code a little better and maybe advise some changes. Naturally, the more complex or deep the change, the more senior the reviewing developer should be, just in case.
    This way the code is reviewed in a fairly distributed way without taking up too much of anyone’s time.

    $git checkout sprint_12                #get back to the sprint branch
    $git pull origin sprint_12              #update the sprint branch


    And you’re ready to start working on your next feature!
    No waiting for your previous feature to be approved first.



    Code Freeze, Regression and closing the sprint

    After a certain date you all agree the sprint is finished and no more features will be included. Ok, it’s regression time: let’s test the sprint_12 branch and make sure everything works well together.
    When finished and happy with the result, let’s merge it into master and make a release.
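
    One possible way to do that final step, sketched in a throwaway repository. The --no-ff merge (which keeps the sprint visible as a single merge commit) and the tag name v1.12.0 are my own assumptions, not something this workflow requires; with a real remote you would finish by pushing master and the tag to origin.

```shell
set -e
repo=$(mktemp -d)                      # throwaway repo; contents are hypothetical
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

git checkout -q -b sprint_12
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Sprint 12 feature"

# Close the sprint: merge into master and mark the release
git checkout -q master
git merge -q --no-ff -m "Merge sprint_12 into master" sprint_12
git tag -a v1.12.0 -m "Release for sprint 12"    # hypothetical version number
git log --oneline --graph --decorate
```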

    The sprint is now closed and we can move on to the next one. In fact, testers can lag a day or two behind in regression testing while the developers have already started working on the next sprint*.

    Note: *for this you should create the sprint_13 branch from sprint_12 (as it is not yet merged into master), so that everyone can work in parallel to the fullest, with no waiting times.
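
    A short sketch of that branching step in a throwaway repository (the commits are placeholders). With a real remote, the new branch would then be pushed with git push origin sprint_13.

```shell
set -e
repo=$(mktemp -d)                      # throwaway repo; commits are placeholders
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

git checkout -q -b sprint_12
echo "sprint 12 work" > sprint12.txt
git add sprint12.txt
git commit -q -m "Sprint 12 work"
git checkout -q master                 # master does not contain sprint_12 yet

# Start the next sprint from sprint_12 instead of master
git checkout -q -b sprint_13 sprint_12
git log --oneline
```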





    A note on bigger projects:

    On bigger projects there are usually authoritative repos along the way to the actual origin in order for senior “lieutenants” to filter out what really matters.
    See more info on these models at http://git-scm.com/book/en/Distributed-Git-Distributed-Workflows



    Hope you put it to good use!
