How to split off a subproject from a Git repository
There are many guides on the Internet describing how to split off a subproject, but too many of them assume the subproject is already cleanly isolated in a subdirectory and always has been. Many of the split-offs I do, however, require messy surgery to separate the subproject. I need a quick, general, gotcha-free, and history-preserving procedure for such messy split-offs.
Let's first go over the gotchas and bad ideas:
- Many people recommend Git's filter-branch command or the separate git-filter-repo tool. These, however, only work for subprojects that are already isolated in a subdirectory. To preserve history, they also require the subproject to have always existed in that subdirectory. That's a tall order.
- You can cherry-pick commits related to the subproject, but that's tedious and nobody will ever find time to do it. It also fails when commits mix subproject changes with changes in the wider project.
- You can fork the project and then cut it down, so that the fork contains only the subproject. This is, however, going to cause a ton of issues. Tags, especially version tags, will be carried over to the new repository. The excess history unrelated to the subproject will pollute GitHub Insights and your contribution activity.
- You can just copy the subproject's files into a pristine repository and refer people to the parent repository for history. This almost works, but isolating a subproject is rarely just a matter of deleting unrelated files. Many non-trivial steps might be needed to create the subproject, and you want to keep the history of these steps.
The right way to do it is to create a branch in the original repository, cut it down there, and only then copy the files to a pristine repository. You can then turn the branch into a tag, which will preserve the history of the split-off.
The exact steps are detailed below. We will split off subproject "puppy" from parent project "dog".
Step 1: Create a branch that will capture changes done during the split-off.
    cd /path/to/dog
    git checkout -b puppy-forkoff
Step 2: Make changes to isolate the puppy subproject and commit. You can create multiple commits if the changes are complicated.
    # Remove unrelated files and directories
    git rm -r unrelated-folder1/ another-unrelated-folder/

    # Move files around as needed to organize your subproject
    mv old-location-of-puppy-files/ new-puppy-location/

    # Commit these changes
    git add .
    git commit -m "Isolated puppy subproject"
Step 3: Test your isolated subproject. Before proceeding, make sure that your subproject is fully functional in isolation. Run any tests or build processes to confirm that it works as expected.
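One way to make this check stricter is to test only what is actually committed on the branch, so that untracked leftovers in your working directory cannot hide a missing file. The following is a minimal sketch under assumptions: it builds a throwaway demo repository, and the file names and the puppy-files/test.sh script are hypothetical stand-ins for your real test or build command.

```shell
set -eu

# Demo setup: a throwaway "dog" repository with hypothetical contents.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$work/dog"
cd "$work/dog"
git symbolic-ref HEAD refs/heads/master   # force the branch name used in this article
mkdir puppy-files
echo 'echo woof' > puppy-files/test.sh    # hypothetical stand-in for a test suite
echo 'bark' > dog.txt
git add .
git commit -q -m "Dog project with puppy inside"
git checkout -q -b puppy-forkoff
git rm -q dog.txt
git commit -q -m "Isolated puppy subproject"
echo 'leftover' > scratch.tmp             # untracked file that must not influence the test

# The actual check: a second working tree holding only the committed branch state.
# --detach is needed because puppy-forkoff is already checked out in the main tree.
git worktree add --detach "$work/puppy-check" puppy-forkoff
result=$(cd "$work/puppy-check" && sh puppy-files/test.sh)
echo "test output: $result"
```

When you are done, git worktree remove followed by the path deletes the extra working tree again.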
Step 4: Create a new repository for the puppy subproject, clone it locally, and copy files into it.
    # Clone the new repository
    cd ..
    git clone https://github.com/user/puppy.git
    cd puppy

    # Copy files from the checked-out puppy-forkoff branch into this repository.
    # The * glob skips dotfiles such as .gitignore (and, conveniently, .git),
    # so copy any dotfiles you need separately.
    cp -r ../dog/* .

    # Add the copied files to the new repository and commit them
    git add .
    git commit -m "Forked off puppy subproject from dog project"
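Copying with cp and a * glob leaves out dotfiles like .gitignore and also picks up untracked files such as build output. A possible alternative, sketched below on throwaway demo repositories, is to export the branch with git archive, which emits exactly the files tracked on puppy-forkoff. The repository contents are hypothetical, and a plain git init stands in for cloning the empty puppy repository.

```shell
set -eu

# Demo setup: throwaway "dog" and "puppy" repositories with hypothetical contents.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$work/dog"
cd "$work/dog"
git symbolic-ref HEAD refs/heads/master
echo 'build/' > .gitignore
echo 'woof' > puppy.txt
git add .
git commit -q -m "Dog project"
git checkout -q -b puppy-forkoff
mkdir build
echo 'artifact' > build/out.bin   # ignored build output, should not be copied
git init -q "$work/puppy"         # stands in for the freshly cloned empty repository

# The copy: stream the tracked files of the branch as a tar archive and unpack them.
cd "$work/puppy"
git -C "$work/dog" archive puppy-forkoff | tar -xf - -C "$work/puppy"
git add .
git commit -q -m "Forked off puppy subproject from dog project"
git ls-files
```

The trade-off is that anything not committed on the branch is silently left behind, which is usually what you want for a clean split-off.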
Step 5: Convert the branch into a tag in the dog repository. This tag will preserve the history of the split-off.
    cd ../dog

    # Create a tag from the current branch state
    git tag puppy-forkoff -m "Forked off puppy subproject"

    # Switch back to master, because we cannot delete checked out branch
    git checkout master

    # Delete the branch without merging, as the tag preserves the state
    git branch -D puppy-forkoff

    # Push the tag to upstream repository
    git push origin puppy-forkoff
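Before relying on the tag, it may be worth confirming that the split-off commits are still reachable once the branch is gone. The sketch below replays the tag and branch commands from this step on a throwaway demo repository (the contents are hypothetical, and the push is omitted because the demo has no remote), then lists the commits that exist under the tag but not on master.

```shell
set -eu

# Demo setup: a throwaway "dog" repository with one split-off commit.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$work/dog"
cd "$work/dog"
git symbolic-ref HEAD refs/heads/master
echo 'bark' > dog.txt
echo 'woof' > puppy.txt
git add .
git commit -q -m "Dog project with puppy inside"
git checkout -q -b puppy-forkoff
git rm -q dog.txt
git commit -q -m "Isolated puppy subproject"

# Convert the branch into a tag, as in Step 5.
git tag puppy-forkoff -m "Forked off puppy subproject"
git checkout -q master
git branch -q -D puppy-forkoff

# Checks: the branch is gone, but the tag still reaches the split-off commits.
branches=$(git branch --list puppy-forkoff)
splitoff_log=$(git log --format=%s master..refs/tags/puppy-forkoff)
echo "$splitoff_log"
```

Writing refs/tags/puppy-forkoff in full is a defensive choice here: it keeps the command unambiguous even if a branch of the same name still exists somewhere.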
Step 6: Back in the master branch, remove puppy-related files.
    # Remove puppy subproject
    git rm -r puppy-files/ other-puppy-files/
    git commit -m "Forked off puppy subproject"
And that's it! You now have two separate projects: dog and puppy.
Puppy gets a clean new repository that captures only new changes and carries no remnants from the dog repository.
Its history, including the split-off steps, is preserved under the puppy-forkoff tag in the dog repository.