Engineering Principles – Don’t Over Engineer the Solution

A couple of years ago, I posted some thoughts about high-level architecture and design goals, but they are really just the tip of the iceberg.  Here are some additional engineering principles, predominantly: don’t over-engineer, and leverage the cloud.  Complex systems are difficult and costly to build, expensive to maintain, and scale poorly.  Be on the lookout for complex designs, and sanity-check yourself by explaining your design to a peer: if it is hard to understand, it is probably a bad design.  Simplify, simplify and then simplify again while you follow the 80/20 rule.  Over-engineering is a very common mistake and the root cause of many costly problems, so I thought I would add a few half-baked ideas here at the end of this decade.

In addition to over-engineering, engineers like to build things themselves, sometimes even things that other people have already sunk many thousands of hours into building.  Don’t reinvent the wheel: open-source commodity services and cloud-based managed services let a business make better use of its expensive engineering resources and focus quickly on creating value.

Take risks and learn from mistakes

Agile teams value failing fast; discuss and learn from your failures.  Engineering culture tends to be risk-averse; resist this tendency by aggressively learning through doing while understanding the risks.  There is no need to mitigate every risk scenario.  Evaluate risk, advocate for experimentation, and communicate transparently with the business.  Feature switches provide the ability to experiment in a production environment.
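As a rough illustration of a feature switch, here is a minimal Python sketch of a percentage rollout.  The `FeatureSwitches` class, its method names and the `new_checkout` feature are hypothetical, invented for this example; a real system would more likely read switch state from a configuration service than from memory.

```python
import hashlib


class FeatureSwitches:
    """Hypothetical in-memory feature switches with percentage rollout."""

    def __init__(self):
        # Maps a feature name to the percentage of users (0-100) who see it.
        self._rollout = {}

    def set_rollout(self, feature, percent):
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[feature] = percent

    def is_enabled(self, feature, user_id):
        # Unknown features are treated as switched off.
        percent = self._rollout.get(feature, 0)
        # Hash feature and user together so each user lands in a stable
        # bucket per feature and sees consistent behaviour between requests.
        digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < percent
```

Setting the rollout to 0 acts as the off switch, so a failed experiment can be turned off without a deployment.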

Failing to design for rollback is a faulty design

When an agile team is aggressively taking risks, experimenting in production using feature switches and expecting to fail fast, it is essential that you can easily roll back code, data migrations and configuration changes in the underlying system.  Validate that the rollback works in a staging environment.  Application complexity and frequent code releases are not acceptable reasons to skip investing in rollback.
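One way to validate a data-migration rollback before it reaches production is to apply the migration, roll it back, and check that the schema is unchanged.  The sketch below uses Python’s built-in `sqlite3` module as a stand-in for a staging database; the `MIGRATION` dictionary, the `feature_events` table and the helper names are assumptions made for illustration, not any particular migration tool’s format.

```python
import sqlite3

# Hypothetical migration: every "up" step ships with a matching "down" step.
MIGRATION = {
    "up": "CREATE TABLE feature_events (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "down": "DROP TABLE feature_events",
}


def schema(conn):
    """Return the schema as a sorted list of (name, sql) rows."""
    return conn.execute(
        "SELECT name, sql FROM sqlite_master ORDER BY name"
    ).fetchall()


def rollback_is_clean(conn, migration):
    """Apply the migration, roll it back, and compare schemas."""
    before = schema(conn)
    conn.execute(migration["up"])
    applied = schema(conn)
    conn.execute(migration["down"])
    after = schema(conn)
    # The migration must change something, and the rollback must undo it.
    return applied != before and after == before
```

A check like this only compares schemas; validating that data survives the round trip would need additional assertions.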

Don’t depend upon QA to find errors

It is impossible to replicate a production environment for testing: a replica is expensive to keep in sync, it lacks real user interaction and it will not have accurate customer data.  An agile team will emphasize experimentation in production over quality assurance during feature development, and will deploy small incremental releases whose features can be switched on and off.

Design your application to be monitored

If you are interested in changing the conversation from the number of bugs to customer impact and resolution times, then you should learn to love useful monitoring: it can help you develop the product, decrease debugging times and understand customer impact.  If you are building a cloud-native product, use your cloud provider’s logging services.  Think about your logging strategy up-front by asking three questions: is there a problem, what is the problem, and where did the problem start?  Inefficient or non-purposeful logging will significantly increase storage and compute costs.  Age the logs and ship the log data to a central location to improve usability and reduce incident response times.  Logging should also cover product feature usage (analytics) to enable the business to make data-driven decisions about the use and usefulness of features.
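To make those logging questions concrete, here is a small sketch using Python’s standard `logging` module to emit one JSON object per log line, which is easier to ship to a central location and query.  The field names (`component`, `request_id`, `feature`) and the `build_logger` helper are assumptions chosen for this example, not a standard schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object."""

    def format(self, record):
        payload = {
            "level": record.levelname,        # is there a problem?
            "message": record.getMessage(),   # what is the problem?
            "component": record.name,         # where did it start?
            # Optional fields supplied through the `extra` argument.
            "request_id": getattr(record, "request_id", None),
            "feature": getattr(record, "feature", None),
        }
        return json.dumps(payload)


def build_logger(name, stream):
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger
```

A call such as `logger.error("payment declined", extra={"request_id": "req-123", "feature": "new_checkout"})` then yields a line that ties the error to a request and a product feature, which also serves the feature-usage analytics mentioned above.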

Design for statelessness, asynchronous communication and relaxed temporal constraints

Want to scale?  Independent microservice tiers that use asynchronous communications (queues using message-driven publisher-subscriber architectures) are essential.  It is very rare to need to maintain state server-side, and an agile team will proactively guard against these types of architectures; instead, rely on caching and stateless deployments such as AWS Lambda and ephemeral machines using AWS Elastic Beanstalk and ECS.  Beyond scaling, these principles will also enable modular designs, autonomous functions, fault tolerance, and degraded service rather than outages.

Beyond avoiding server-side state, relax temporal constraints, because temporal coupling significantly undermines fault tolerance and scalability.  Temporal coupling includes synchronous calls between systems, systems in series, interactions where users wait for writes to complete and their evil twin, chained workflows.
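The difference between a synchronous call and a queued message can be sketched with Python’s standard `queue` and `threading` modules.  This is a toy, in-process stand-in for a managed message queue; the `Broker` class, the `run_worker` helper and the use of `None` as a stop signal are all assumptions made for illustration.

```python
import queue
import threading


class Broker:
    """Toy in-process publisher-subscriber broker."""

    def __init__(self):
        self._subscribers = {}  # topic -> list of subscriber queues
        self._lock = threading.Lock()

    def subscribe(self, topic):
        inbox = queue.Queue()
        with self._lock:
            self._subscribers.setdefault(topic, []).append(inbox)
        return inbox

    def publish(self, topic, message):
        # The publisher only enqueues; it never waits for subscribers
        # to finish their work, which removes the temporal coupling.
        with self._lock:
            targets = list(self._subscribers.get(topic, []))
        for inbox in targets:
            inbox.put(message)
        return len(targets)


def run_worker(inbox, handle):
    """Process messages in a background thread until None arrives."""

    def loop():
        while True:
            message = inbox.get()
            if message is None:  # hypothetical stop signal
                break
            handle(message)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

Because `publish` returns as soon as the message is queued, a slow or temporarily unavailable subscriber delays processing rather than failing the caller, which is the degraded-service behaviour the principle is after.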