How to clone a large Mercurial repository?

Howdy readers! Have you ever been cloning a large repository, only to lose an almost complete clone to an unreliable network connection?
The experience is even more dreadful when your internet pack has a limited data allowance. Let’s find out how to clone a large repository when you can’t download all of it in one go.

A few days back, I set out to fix an easy issue and make my first open source contribution to Mozilla, in the Mercurial repository mozilla-central. I didn’t know it was going to be a different learning process altogether. To contribute, the first thing I did was the usual first step of any open source contribution: clone the repository. I simply did this:

$ hg clone <repository>

But I was shocked: I had never cloned a repository as large as this one. As luck would have it, the data pack I was using had a limit of 1 GB per day. The clone was running when I realised I couldn’t fetch the complete repository with the data I had left, so I terminated the process.

And if you are familiar with a DVCS like “git” or “hg”, you probably know that if a clone gets interrupted midway, you are left with nothing on your system, not even an empty folder for the repo.

Now I needed to find an alternative. At first, I decided to visit my friend storymode7 on the weekend and clone the repo using the wifi at his place. But before the weekend arrived, I came across the blog post How-to-clone-a-large-repository-using-git. This post was a real help.

$ git clone --depth=<depth_of_clone> <repository>

With this, you can deepen a clone step by step, from the latest commit back to the first, like this:

$ git clone --depth=1 repository
$ cd repository_dir
$ git pull --depth=10
$ git pull --depth=20
$ git pull --unshallow

This last command pulls the rest of the repository.
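The deepening steps above can be wrapped in a small retrying loop, which helps when the connection drops midway. This is only a sketch under assumptions: the names GIT_CMD, retry and deepen_in_steps are made up for illustration, the depths are arbitrary, and GIT_CMD defaults to "echo git" so the script only prints the commands it would run. Set GIT_CMD=git and run it inside the shallow clone to execute them for real.

```shell
#!/bin/sh
# Dry run by default: print the git commands instead of running them.
# GIT_CMD is a hypothetical knob for this sketch; set GIT_CMD=git for real use.
GIT_CMD=${GIT_CMD:-"echo git"}

# Run a command up to $1 times, pausing one second between failed attempts.
retry() {
    attempts=$1
    shift
    n=1
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep 1
    done
}

# Deepen the clone to each depth given as an argument, then fetch the rest.
deepen_in_steps() {
    for depth in "$@"; do
        retry 3 $GIT_CMD pull --depth="$depth" || return 1
    done
    retry 3 $GIT_CMD pull --unshallow
}

deepen_in_steps 10 20
```

Because every step is retried a few times before giving up, a dropped connection costs at most one step of the download, not the whole clone.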

Using a shallow clone for a git repository was a life saver, and now I knew how to do one. But it wasn’t the exact solution to my problem, because Mercurial has no built-in shallow clone.

I knew that a repo can be cloned partially, but how do you do that in Mercurial? I searched and found an alternative: the “hg clone --rev” command.

“hg clone --rev” is not the same as “git clone --depth”, but it could save me in this situation. “git clone --depth=5” clones only the latest 5 commits, which reduces the download size; that is a shallow clone. “hg clone --rev=5”, on the other hand, clones revision 5 together with all of its ancestors, that is, the oldest few commits of the repo, which also reduces the download size.

Yippee! Now I could clone the repo in chunks with --rev. And I did this:

$ hg clone --rev=1 <repository>
$ cd mozilla-central
$ hg pull --rev=2
$ hg pull --rev=5
$ hg pull --rev=70000
$ hg pull --rev=90000
$ hg pull
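Typing each pull by hand gets tedious, so the same idea can be sketched as a loop. Treat this as an illustration only: HG_CMD, chunk_revs and pull_in_chunks are names invented here, the revision numbers and step size are arbitrary, and HG_CMD defaults to "echo hg" so the script just prints the commands. Set HG_CMD=hg and run it inside the partial clone to execute them.

```shell
#!/bin/sh
# Dry run by default: print the hg commands instead of running them.
# HG_CMD is a hypothetical knob for this sketch; set HG_CMD=hg for real use.
HG_CMD=${HG_CMD:-"echo hg"}

# Print the revisions to request: FIRST, FIRST+STEP, ... and finally LAST.
chunk_revs() {
    first=$1
    last=$2
    step=$3
    rev=$first
    while [ "$rev" -lt "$last" ]; do
        echo "$rev"
        rev=$((rev + step))
    done
    echo "$last"
}

# Pull one chunk at a time; stop at the first failure so the script
# can simply be re-run once the connection is back.
pull_in_chunks() {
    for rev in $(chunk_revs "$1" "$2" "$3"); do
        $HG_CMD pull --rev "$rev" || return 1
    done
}

pull_in_chunks 10000 90000 20000
```

Re-running after a failure is cheap, because Mercurial only transfers changesets that are still missing locally. A plain "hg pull" at the end still fetches whatever remains.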

Once the clone is complete, update your working copy so you can start working on the code, using the following command:

$ hg update

The pull command fetches changes from the remote parent repository but does not change any files in your working directory.

The update command is what actually updates the files in the working directory.

Alternatively, you can download a bundle. See Mercurial bundles for information about downloading a single large file instead of using “hg clone”.
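For the bundle route, the rough sequence is: download the bundle with a resumable downloader, unpack it into an empty repository, then pull whatever is newer than the bundle. The sketch below assumes a lot: BUNDLE_URL and REPO_URL are placeholders (not real addresses), and RUN and clone_from_bundle are names invented here. RUN defaults to "echo", so the script only prints the commands. Run it with RUN set to an empty value to execute them.

```shell
#!/bin/sh
# Placeholder URLs (assumptions): replace with the real bundle and repo URLs.
BUNDLE_URL=${BUNDLE_URL:-"https://example.invalid/repo.hg"}
REPO_URL=${REPO_URL:-"https://example.invalid/repo"}
# Dry run unless RUN is set to an empty value, e.g.  RUN= sh script.sh
RUN=${RUN-echo}

# Build a clone in directory $1 (created next to bundle.hg) from a bundle.
clone_from_bundle() {
    dest=$1
    # curl -C - resumes a partial download, so an interruption is not fatal.
    $RUN curl -L -C - -o bundle.hg "$BUNDLE_URL" &&
    $RUN hg init "$dest" &&
    # Load the bundled history into the empty repository.
    $RUN hg --cwd "$dest" unbundle ../bundle.hg &&
    # Fetch only the changesets that are newer than the bundle.
    $RUN hg --cwd "$dest" pull "$REPO_URL" &&
    $RUN hg --cwd "$dest" update
}

clone_from_bundle mozilla-central
```

The advantage over a plain clone is that the big download is an ordinary file transfer, which can be resumed after a dropped connection.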

Let me know in the comments section below if you know other ways of dealing with an unreliable connection while cloning.
See you in the next post. Till then, this is GeekyShacklebolt bidding you goodbye!

