Building awesome opensource projects
This is the story behind SourceAFIS, in case you were wondering why anyone would build an opensource fingerprint matcher. I happened to do many things right back then, so perhaps there is something to learn from the story.
SourceAFIS is undoubtedly a successful opensource project, both in terms of popularity and technical quality. It’s the most popular opensource fingerprint recognition engine. It’s cleanly implemented and highly reliable. It was rapidly developed, testing the limits of programmer productivity. It’s denting the user base of commercial vendors. Big customers are starting to consider SourceAFIS instead of a commercial AFIS. SourceAFIS achieves higher accuracy than academic projects incorporating existing published research. User feedback is overwhelmingly positive. The project has attracted contributors and triggered development of spin-off projects. There’s a lot to be proud of. Sure, there is a lot of room for improvement too. I would be the last one to question that. Yet the current version keeps users happy and there’s a clear roadmap to resolve outstanding issues in future versions. But how did I get here? A look back at how it all happened might teach you a lot about how to succeed in software development, be it opensource or commercial. It’s also a useful reminder for me of what works and what doesn’t in software projects. So how did it start?
I spent 2.5 years of my life at Innovatrics, a technology company specializing in high-volume fingerprint matching. I touched every bit of their solution except the core algorithm. That’s nothing unusual in business. Separation of responsibilities and protection of intellectual property take priority over an employee’s curiosity. Nevertheless, I am quite a clever developer and the matching algorithm was bugging me all the time. How does it work? How can it be so fast? How does it deal with score jitter? It was driving me crazy. I needed to know. Since I couldn’t take a look into Innovatrics’ top secret algorithm, I started to invent my own. During lunch and on the bus, my mind was filled with various algorithms that could be used for fast and accurate fingerprint recognition. By the time I left Innovatrics, I already had a full picture of what a fingerprint recognition algorithm looks like. But knowing how it works wasn’t enough. Now something else was bugging me. Would it really work? Did I miss anything? How would it perform in practice? I needed to put my ideas to the test. I started working on SourceAFIS right after leaving Innovatrics in December 2009.
At this point, I was glad I hadn’t signed any non-competition agreement (also somewhat deceptively called a non-disclosure agreement or NDA). If I had, I wouldn’t have been able to test my ideas and they would be waking me from sleep to this day. Not that there was anything to protect with an NDA in the first place. I have never seen the Innovatrics algorithm. And if I haven’t seen it, I couldn’t have possibly copied it, right? This was my second line of legal defense. Take my advice and avoid NDAs at all costs. An NDA doesn’t just protect IP. It will block you from accepting the most interesting and most profitable job offers. It will weaken your position when the time comes to negotiate your salary. Accidental leaks of company data could cost you huge sums of money. Your safety suddenly depends on your employer’s benevolent interpretation of the NDA terms, which could change instantly when you stop being a friendly, orderly employee. A significant portion of your brain is now the property of your employer. You have to ask for permission to use your own knowledge. And worst of all, your creative freedom is gone. You cannot materialize your imagination anymore. Your ideas cannot leave your brain. They will haunt you for years. Your NDA might be buying you a ticket to an insane asylum. It might earn you a few extra bucks now, but it will ruin you in the long run. An NDA is always a losing deal. Be a smart investor. Make sure you own your stuff.
I spent December 2009 writing all the core algorithms of SourceAFIS. I had everything laid out in my head already. I just needed to write it down. I did my research in the first few days. I looked up all the competing opensource projects and tried to learn as much as possible from them. The SourceAFIS extraction algorithm, for example, draws heavily from the NBIS MINDTCT algorithm. Many things I read during those days just confirmed my existing understanding of the fingerprint recognition process. There were a few surprises though. I combined all the good bits from other opensource projects. I then improved on everything as much as I could. I feared failure a lot back then. In order to quiet my own worries, I planned and validated every bit of the SourceAFIS design in great detail very early in the development process. I knew my time was limited and I mercilessly slashed all redundant functionality in order to ensure timely release of the core functionality. Paranoia pays off. A paranoid attitude encourages planning and risk mitigation. Planning is essential for success. If your plan isn’t defensible under scrutiny, you are most likely lying to yourself. Sweet lies will kill you.
I could finally start coding. I am not a particularly fast coder. Yet I needed to do a lot of work in a very short span of time. I needed to figure out how to be super efficient. Everything was already designed in plain text. I just needed to translate it to code. Preparation paid off once again. I used C#, the most advanced language available at the time. C# was perfect for expressing complex high-performance algorithms. I downloaded several fingerprint databases off the net. This way I could run an automated benchmark of accuracy and speed early in the development process. I didn’t have to wait for user feedback. Various constants littered the SourceAFIS code. My design already counted on a hill-climbing algorithm to tune the constants. When I first ran it, it worked like a charm. Within a few hours, it reduced the equal error rate (EER) to half of its original level and effectively saved me weeks of manual tuning. I stubbornly refused all micro-optimizations that would complicate the code. Clean code paid off when it turned out I needed to replace some algorithms with better alternatives. I could just swap a component and run the tests again. I could improve accuracy and speed much more efficiently with algorithmic changes than with micro-optimizations. At first, I didn’t write a single line of test code. Automated tests are good for production quality, but they just stand in the way when prototyping. When I eventually started writing tests, they were all high-level tests that implicitly tested everything underneath the top layer. My tests were heavily data-driven in order to reduce the need for manual coding of numerous special cases. I wrote several visualization routines that were later combined into the Fingerprint Analysis app, which provided unprecedented insight into the inner workings of the algorithm (the Fingerprint Analysis app no longer exists, but its functionality is available via the transparency API and the corresponding visualization library). It was kind of a domain-specific visual debugger. It allowed me to quickly pinpoint the cause of every match error.
I used external tools as much as possible. If I could delegate work to an external library or a tool, I did so. SourceAFIS has very little infrastructural code. There’s a pattern to all these tricks. All my efficiency was in smart decisions. I never learned to code quickly. I learned to write less code and to take fewer steps to stabilize it. In a sense, I hired tools to do jobs for me. I only considered tools that saved me time. I surrounded myself with an army of robot workers.
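To make the benchmark-driven tuning described above more concrete, here is a minimal sketch of hill climbing over matcher constants with EER as the cost. It is written in Python for brevity rather than in the C# used at the time, and it is not SourceAFIS code. The toy matcher, the constant names w_signal and w_noise, and the synthetic score data are all hypothetical stand-ins for a real matcher run against real fingerprint databases.

```python
import random

def equal_error_rate(genuine, impostor):
    """Approximate EER: sweep the threshold over all observed scores and
    report the point where the two error rates are closest to each other."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        fnmr = sum(s < t for s in genuine) / len(genuine)    # genuine pairs rejected
        fmr = sum(s >= t for s in impostor) / len(impostor)  # impostor pairs accepted
        gap = abs(fnmr - fmr)
        if gap < best_gap:
            best_gap, eer = gap, (fnmr + fmr) / 2
    return eer

def hill_climb(constants, cost, rounds=200, step=0.1, seed=0):
    """Greedy hill climbing: nudge one constant at a time by up to +/- step
    (relative) and keep the change only if the cost strictly improves."""
    rng = random.Random(seed)
    best = dict(constants)
    best_cost = cost(best)
    for _ in range(rounds):
        candidate = dict(best)
        name = rng.choice(sorted(candidate))
        candidate[name] *= 1 + rng.uniform(-step, step)
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost

# Hypothetical benchmark data: each pair has one informative feature and one
# feature that is pure noise. A real benchmark would match actual fingerprints.
_rng = random.Random(42)
GENUINE = [(_rng.gauss(1.0, 0.5), _rng.gauss(0.0, 1.0)) for _ in range(100)]
IMPOSTOR = [(_rng.gauss(0.0, 0.5), _rng.gauss(0.0, 1.0)) for _ in range(100)]

def benchmark(constants):
    """Score every pair with a toy weighted-sum matcher and return its EER."""
    def score(pair):
        return constants["w_signal"] * pair[0] + constants["w_noise"] * pair[1]
    return equal_error_rate([score(p) for p in GENUINE],
                            [score(p) for p in IMPOSTOR])

initial = {"w_signal": 1.0, "w_noise": 1.0}
tuned, tuned_eer = hill_climb(initial, benchmark)
print("EER before tuning:", benchmark(initial))
print("EER after tuning: ", tuned_eer)
print("tuned constants:  ", tuned)
```

Because a change is accepted only when the benchmark improves, the tuned EER can never be worse than the starting one. The price is that greedy search can stall in a local minimum, and constants tuned on one dataset should be checked against a separate one to guard against overfitting.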
Once I knew the core functionality of SourceAFIS was stable and healthy, I started to work on user friendliness or, as people call it these days, on user experience. Ease of use turned out to be a killer feature. Most AFIS solutions have complex functionality. They are designed for the most demanding applications with large datasets. They assume that big projects can invest heavily in integration of fingerprint recognition into the overall application. SourceAFIS, on the other hand, is used mostly for small-time applications. It’s used by students in classrooms. It’s used for prototyping and demoing before an expensive AFIS is swapped in. It’s used in small commercial applications of up to 1,000 enrollees. These kinds of applications absolutely require an extremely low total cost of ownership. In light of this observation, features are not that important. They can actually be counter-productive. What’s really important is quality. Bugs are costly to end-users. Counter-intuitive user experience is costly. Lack of documentation is hugely expensive. Here my obsessive perfectionism could really shine. I created a clean API, a highly reliable implementation, complete documentation, a click-through installer, and a wiki listing all the important facts and links in one place. I signed up for FVC-onGoing to give SourceAFIS an external assessment of quality. I made sure that SourceAFIS is discoverable by people who look for it. I even gave SourceAFIS a permissive license in order to save people the hassle of license incompatibilities. Making SourceAFIS into the best opensource fingerprint engine was crucial, because it simplified the user’s decision-making process.
The beauty of my approach is in the details, but perhaps we can observe some key patterns. What ingredients go into my recipe for success? Follow passion, not money. Protect your creative freedom. Draw inspiration from your competitors. Make a defensible plan. Do the basics before going cutting-edge. Delegate work to tools. Make it easy for users. I should add that your approach will depend on the kind of project you want to do. While reading checklists like this one, don’t forget to stay mindful of your goals. I am describing many shortcuts, but you have to pick the ones that help you get to your goal faster.