Category Archives: Product Management

High level architecture and design goals

I come across many businesses that are not grounded in basic high level architecture and design goals.  Basically, they simply leap from features to building stuff.  Most often this happens because everyone is in a mad rush to just get something working as soon as possible.  The fundamental flaw in this approach is the belief that spending the time to build it “the right way” takes longer and is more expensive.  I’ve found the exact opposite to be true.  And for the record, the idea that you can just slop something together quickly for a minimum viable product and later insert a good architecture is pure nonsense – it never happens, ever.

Here are a few basic guidelines that apply to almost anything that you are building.  I’m certain that there are plenty more that are specific to your particular challenge and to the technology choices that you make along the way.

Support Hierarchical Configuration

Don’t cripple configuration automation and unnecessarily burden your operations team with configuration management.  Design configuration points in a hierarchical fashion such that all deployments derive from a base configuration and implement deployment-specific configurations that override the defaults.  Wherever possible, configuration should be modifiable at runtime and should be persistent across restarts.
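
Here is a minimal sketch of the layering idea in Python.  The names (`merge_config`, `BASE_CONFIG`, `PRODUCTION_OVERRIDES`) and the settings themselves are made up for illustration, and it only covers the override part, not runtime changes or persistence.

```python
from typing import Any, Dict


def merge_config(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which values from `override` replace those in `base`.

    Nested dicts are merged key by key; neither input is modified.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical layers: every deployment starts from the same base.
BASE_CONFIG = {
    "log_level": "INFO",
    "database": {"host": "localhost", "port": 5432, "pool_size": 5},
}

# A deployment only states what differs from the base.
PRODUCTION_OVERRIDES = {
    "log_level": "WARNING",
    "database": {"host": "db.prod.internal", "pool_size": 20},
}

production_config = merge_config(BASE_CONFIG, PRODUCTION_OVERRIDES)
```

With this shape, the production deployment inherits `port` from the base and overrides only `host`, `pool_size` and `log_level`, so a new default added to the base reaches every deployment without touching each one.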

Facilitate Production Troubleshooting

Don’t count on your development team’s access to production; it isn’t a good practice to allow a live debug session attached to your production environment.  When log messages are written to record exceptions, they should include as much contextual information as possible so that production support staff can recreate the conditions present at the time of the exception or undo data corruption that results from the error.
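
As a rough illustration, here is what a context-rich log message might look like using Python's standard `logging` module.  The order fields and the `apply_discount` and `process_order` functions are hypothetical; the point is only that the inputs that led to the failure are recorded alongside the stack trace.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("orders")


def apply_discount(order: dict, percent: float) -> float:
    """Hypothetical business rule that rejects bad input."""
    if not 0 <= percent <= 100:
        raise ValueError(f"discount percent out of range: {percent}")
    return order["total"] * (1 - percent / 100)


def process_order(order: dict, percent: float) -> Optional[float]:
    try:
        return apply_discount(order, percent)
    except ValueError:
        # Record everything support staff would need to replay the call.
        context = {
            "order_id": order.get("id"),
            "customer_id": order.get("customer_id"),
            "total": order.get("total"),
            "percent": percent,
        }
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("apply_discount failed: %s", json.dumps(context, sort_keys=True))
        return None
```

Writing the context as JSON is a design choice rather than a requirement; it just makes the values easy to search for and paste back into a test.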

Fail Fast

Don’t unnecessarily retry what you already know won’t ever work.  When exceptional conditions occur, the system should not be configured or coded to retry the action that failed.  This rule is particularly important where interfaces into 3rd party APIs are being configured.  If a 3rd party API is failing and we have no expectation that it should ever fail (say a pool API that provides us with database connections), there should be no attempt at reconnection.  The exception should be logged as a high priority exception (fatal) and messaged to a management system.
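
A small sketch of the connection pool case, assuming a pool object with an `acquire()` method.  That method, `FatalDependencyError` and `notify_management_system` are all placeholders that I made up; substitute whatever your pool and alerting tools actually provide.

```python
import logging

logger = logging.getLogger("startup")


class FatalDependencyError(RuntimeError):
    """Raised when a dependency that should never fail has failed."""


def notify_management_system(message: str) -> None:
    # Placeholder: a real system would page someone or post to a monitoring tool.
    logger.critical("ALERT: %s", message)


def get_connection(pool):
    """Ask the pool for a connection exactly once; on failure log, alert and stop."""
    try:
        return pool.acquire()  # hypothetical pool API
    except Exception as exc:
        # No retry loop here: we log as fatal, alert, and let the caller fail.
        logger.critical("connection pool failed: %r", exc)
        notify_management_system(f"connection pool failed: {exc!r}")
        raise FatalDependencyError("connection pool unavailable") from exc
```

The important part is what is missing: there is no loop and no sleep-and-retry, so a broken dependency surfaces immediately instead of hiding behind minutes of silent retries.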

Automate Test Environment Management

Avoid accumulating manual test drag; there is a huge return on investing in test automation.  The design of test infrastructure should include a framework in which the overall test “suite” initial setup (configuration, code, results and data) is configured to a well known state and individual unit tests are otherwise atomic in their own setup and execution.

N+1 Design

Lots of stuff happens in the real world, stuff that you will never anticipate.  Ensure that anything developed has at least one additional instance in the event of failure.  There should never be fewer than two of anything.

Design for Rollback

Your release will fail, no doubt about it.  Any new design should be backwards compatible with previous releases.  Test your rollback before every release or you will get caught and the impact may be fatal.

Design to Be Disabled

Enable efficient maintenance and minimize outages – planned or unplanned.  Any system or service endpoint should be designed to be capable of being “marked down” or disabled.
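
One simple way to picture this is a switch in front of each endpoint.  The sketch below is illustrative only; the `Endpoint` class, its `mark_down` and `mark_up` methods and `ServiceUnavailable` are names I invented, and a real system would more likely flip the switch through a load balancer, a feature flag service or configuration.

```python
from typing import Any, Callable


class ServiceUnavailable(Exception):
    """Raised when a request reaches an endpoint that is marked down."""


class Endpoint:
    def __init__(self, name: str, handler: Callable[[Any], Any]) -> None:
        self.name = name
        self._handler = handler
        self._enabled = True

    def mark_down(self) -> None:
        """Stop serving requests, e.g. before planned maintenance."""
        self._enabled = False

    def mark_up(self) -> None:
        """Resume serving requests."""
        self._enabled = True

    def handle(self, request: Any) -> Any:
        if not self._enabled:
            # Refuse quickly and clearly instead of timing out.
            raise ServiceUnavailable(f"{self.name} is marked down")
        return self._handler(request)


# Usage: take a hypothetical "search" endpoint out of rotation and back.
search = Endpoint("search", lambda query: f"results for {query}")
search.mark_down()
search.mark_up()
```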

Design to be Monitored

There typically are signs that a failure will occur soon; make sure you know when bad things are accumulating.  The system should be able to identify when it is performing differently than it normally does, in addition to alerting when it is not functioning properly.  An example of this principle is instrumenting the application to report performance statistics on page render times or query execution times.
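
To make “performing differently than normal” concrete, here is a rough sketch that keeps a rolling window of timings and flags a sample that sits far above the recent average.  The `LatencyMonitor` class, the window size and the three standard deviation threshold are my own assumptions, not a recommendation for any particular monitoring tool.

```python
import statistics
from collections import deque


class LatencyMonitor:
    """Track recent timings and flag samples that deviate from the baseline."""

    def __init__(self, window: int = 100, min_samples: int = 10, threshold: float = 3.0) -> None:
        self._samples = deque(maxlen=window)  # rolling baseline of recent timings
        self._min_samples = min_samples
        self._threshold = threshold

    def record(self, seconds: float) -> bool:
        """Store a timing; return True if it is unusually slow for this baseline."""
        unusual = False
        if len(self._samples) >= self._min_samples:
            mean = statistics.mean(self._samples)
            stdev = statistics.pstdev(self._samples)
            # Floor the deviation so a perfectly flat baseline still gives a finite limit.
            limit = mean + self._threshold * max(stdev, 1e-9)
            unusual = seconds > limit
        self._samples.append(seconds)
        return unusual


# Usage: feed it page render or query times as they are measured.
monitor = LatencyMonitor()
for render_time in [0.10, 0.12] * 10:
    monitor.record(render_time)
slow_page_flagged = monitor.record(1.0)
```

Note that nothing here is “broken” when the flag goes up; the page still rendered.  That is the distinction the principle is after: alert on the drift before it becomes an outage.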

Asynchronous Design

While it is nice to count on quick and efficient compute pathways, highly scalable platforms often benefit from offloading and distributing work, and this usually relies on asynchronous designs.  Wherever possible, systems should communicate in an asynchronous fashion.
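
A toy version of the idea using Python's `asyncio`: the producer drops jobs on a queue and moves on, and a separate worker picks them up.  In a real platform the queue would typically be a message broker between services; the `worker` and `main` functions and the doubling “work” are stand-ins of mine.

```python
import asyncio
from typing import List


async def worker(queue: asyncio.Queue, results: List[int]) -> None:
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = await queue.get()
        if job is None:
            break
        results.append(job * 2)  # stand-in for slow work done off the request path


async def main(jobs: List[int]) -> List[int]:
    queue: asyncio.Queue = asyncio.Queue()
    results: List[int] = []
    consumer = asyncio.create_task(worker(queue, results))
    for job in jobs:
        queue.put_nowait(job)  # the producer does not wait for the work to finish
    queue.put_nowait(None)  # tell the worker there is nothing more to do
    await consumer
    return results


processed = asyncio.run(main([1, 2, 3]))
```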

Atomic Compute and Stateless Systems

Don’t attempt to store state outside of your persistent data storage – you’ll unnecessarily create scalability obstacles and cripple your ability to build resilient platforms.

Scale Out Not Up

You’ll eventually not be able to buy a big enough server.  The system should be able to be horizontally split in terms of data, transactions and customers.
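
Splitting by customer can be as simple as hashing the customer id to pick a shard.  This is a sketch under my own assumptions: the shard names are invented, and plain modulo hashing is just the easiest scheme to show.

```python
import hashlib
from typing import Sequence

# Hypothetical shard names; in practice these would be connection settings.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(customer_id: str, shards: Sequence[str] = SHARDS) -> str:
    """Map a customer to a shard using a hash that is stable across processes."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# Usage: the same customer always lands on the same shard.
home_shard = shard_for("customer-42")
```

I use `hashlib` here rather than Python's built-in `hash()` because the built-in is randomized per process for strings, which would send the same customer to different shards after a restart.  Keep in mind that with modulo hashing, changing the number of shards moves most customers; consistent hashing is a common way to reduce that.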


Outsource Unicorn

If I tell you that you can purchase a brand new iPhone for $5, what do you think?  Yet if I tell you that you can hire an outsourced developer for $10 / hour, somehow we set aside our dad’s advice that “nothing is free” and “you get what you pay for”.

Over the years, dozens of companies have asked me to join their company to help them “clean up their technology organization which they have unsuccessfully outsourced”.

Tell me if you’ve heard any of these before:

  • You should outsource that project because engineering resources overseas cost about $10 / hour.
  • Our engineering team is buried for the next 12 months, just outsource that project.
  • That project is a one time project, don’t hire full time resources, just outsource it.
  • We outsourced all our engineering 3 years ago, it isn’t really working and now we need someone to come help us clean up.
  • You can hire a few outsourced resources to augment your core engineering team.

So while I believe that you may know of someone who has successfully outsourced a project, there are some real challenges to getting it to work well.  I’ve learned quite a few lessons along the way – here are some of these learnings.

As you know, communication is always a critical factor in any business – outsourced or not.  Communication in an outsourced relationship, especially offshore, is a very big challenge that is most often completely underestimated.  Not only are there usually language challenges, even if the outsourced team speaks English, but there are time challenges.  You should be prepared to manage workday offsets, sometimes by as much as 12 hours.  Outsourced companies will tell you that their resources will use technology to help communicate efficiently, but Skype and Hangouts work poorly in many global scenarios: delay, echo, drops, etc.  Using Wikis, Basecamp, Slack, ticketing systems and email just like you probably use with your core team is helpful, but is not a great substitute.  Outsourced companies will also tell you that they will assign an on-shore resource who will manage the offshore team as a solution to communication and timeshifted work hours.  This approach helps, but again it is not great and at best it will add lots of cost.  It adds the cost of this onshore resource, but it also adds the cost of inefficiency in having a relay system in place.

In addition to communication, there is a very large issue of resource stability over time.  For whatever reason – maybe because they pay their resources poorly – outsource companies don’t seem to retain resources for more than a month or two.  The cost of changing resources is very high, as usual.  You will pay for the lost productivity of having to bring in a new resource and bring that resource up to speed.  Also, if the onshore manager (the communication relay that I mention above) happens to be the person who leaves your team, then the impact is enormous.  Finally, keep in mind that if you reduce your demand for resources, of course the outsource company will move the resources, and so starting another phase or even supporting your existing product will become very, very expensive – so manage your demand wisely or the supply will go away.

In my opinion, there is a big difference between engineers and coders.  Coders simply implement – they write lines of instructions to tell a computer what to do.  Understanding a problem and designing an approach to solving the problem using computer instructions is a totally different animal.  Designing an end-to-end platform that involves many complicated interactions requires even more experience, education and skill – engineering is very, very different than coding.  I have yet to work with an outsource company that employs engineers, and I have never seen one that employs an architect.  The result that I have routinely seen is a giant pile of code, mostly undocumented and never, ever efficient.  By this, I mean there is never any use of modern object oriented constructs such as inheritance; if a routine is needed elsewhere, the code is copied and pasted there.  So when you have a bug, you probably have that bug in 10 places.  To make matters even worse, I’m just describing simple procedural routines.  I never, ever see good use of more advanced engineering concepts – for example, multi-threading or distributed processing.  All of this leads to a maintenance nightmare.  Sometimes you can lower the impact of this by hiring a full time employee as your architect, but even this is not foolproof.

The above are very serious landmines that are not easy to circumnavigate – you are not going to write an iron clad contract and secure a fixed bid contract to avoid all of this.  And while you may conclude that you should never use outsourced development, that is not really my mission here – you can use it, but it will be far more expensive than advertised and you will need to invest significantly in avoiding issues that will turn your project into a big problem.

Beyond the MVP

One of the most important product management concepts is getting to market sooner rather than later with the Minimum Viable Product (MVP) and iterating or pivoting as you gather feedback and learn, especially if you are building new, innovative products.  This is simply the 80/20 rule applied to product management.  Ignore this concept and you risk over investing before you truly understand market requirements, and you are very unlikely to get a return on your time and money.

But what happens after you’ve validated that your MVP is on-track and valuable?  How do you decide how much of the remaining 20% improvement is worth your time and money?  Information is the missing ingredient and often it doesn’t exist when you’re building something new.  The Imitation Game movie is about how Alan Turing cracked the Enigma code by building what many folks believe is the first computer.  The key was reducing the breadth of the problem, reducing it again and finally searching for a pattern; the same approach gives you a pattern to build into your product.  You can always expand the scope later.

How do you get the non-existent information?  In years past, the only way to get non-existent data was to gather the information yourself – often this is slow and very expensive.  All the map product guys employ many people to drive around and gather information on roads and buildings.  Startups never have the resources that Google has, so today crowdsourcing is an outstanding alternative to gathering your own data.  For example, Waze designed a product that encouraged its users to provide feedback, and it crowdsourced information that would otherwise have cost many millions of dollars to gather.

Finally, when you’re in the trenches of reducing the problem scope and searching for patterns, be on the lookout for technology game changers – i.e. internet, mobile, cloud, etc. – that change assumptions and unlock new pathways.