Module 16: Engineering Rules of Thumb

Quick estimates for how things are built

Part A · what engineering actually is
The one-sentence definition that covers everything
Engineering is the application of science and mathematics to solve real problems under real constraints — where constraints include cost, time, weight, safety, regulation, and what can actually be manufactured.
Science asks "why?"
Physics explains why a bridge can hold weight. A physicist publishes a paper.
Engineering asks "how, cheaply, safely?"
An engineer builds a bridge that holds weight, within budget, before the deadline, that won't fall down.
The engineer's eternal trade-off
Fast, cheap, good — pick two. Every engineering decision is a negotiation between these three.
Part B · the thumb rules every engineer uses
Safety factor · design load × 2–10 · the most important rule

What it means

Build it stronger than the maximum expected load by a multiplier.

Typical factors

Buildings: ×2–3. Bridges: ×4. Aircraft: ×1.5 (weight-critical). Pressure vessels: ×4. Elevators: ×10.

Why not just ×1?

Materials vary, loads are uncertain, manufacturing isn't perfect, and failures are catastrophic.

An elevator cable rated for 1,000 kg is actually strong enough to hold 10,000 kg. This is why buildings don't collapse from one extra person, why bridges don't fail when a lorry hits a pothole, and why planes don't break apart in turbulence. The safety factor quietly saves millions of lives daily.
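The arithmetic is one multiplication; a minimal sketch, with the factors from the table above as illustrative values:

```python
def required_strength(expected_load: float, safety_factor: float) -> float:
    """Minimum strength a component must be designed for."""
    return expected_load * safety_factor

# Illustrative safety factors from the table above
factors = {"building": 3, "bridge": 4, "aircraft": 1.5, "elevator": 10}

# An elevator cable rated for a 1,000 kg load:
print(required_strength(1000, factors["elevator"]))  # → 10000 (kg)
```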

The 80/20 rule (Pareto principle) · 80% of problems come from 20% of causes · prioritisation tool

In practice

80% of bugs come from 20% of the code. 80% of heat loss from 20% of the building envelope. 80% of failures from 20% of components.

Action

Find the 20% first. Fixing it gives 4× more return than spreading effort equally.

Also true:

The last 20% of performance improvement costs 80% of the effort. "Good enough" is often the right engineering answer.

Vilfredo Pareto noticed in 1896 that 80% of land in Italy was owned by 20% of the population — and the ratio kept appearing in unrelated systems. Engineers use it to decide what to fix first, what to optimise, and when to stop. A bridge optimised for the last 5% of strength costs twice as much for 5% gain.

Order-of-magnitude estimation (Fermi estimation) · ±1 order of magnitude is "good enough" · back-of-envelope

What it is

A quick estimate using round numbers that gets you within 10× of the real answer, done in your head in minutes.

Classic example

"How many piano tuners in Chicago?" Fermi's chain: households in the city × fraction owning a piano × tunings per year ÷ tunings one tuner can do per year ≈ 125. Actual: ~150.

Engineering use

Before spending a week calculating: "Is this even in the right ballpark?" If your estimate says it would need 10× more power than available, don't detail-design it.

Enrico Fermi famously estimated the yield of the first nuclear bomb test by dropping scraps of paper and watching how far they moved in the blast wave — getting within a factor of 2 of the measured result. Engineers use this constantly: the first answer should come in minutes, not days. Only if it looks promising do you do the full calculation.
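The piano-tuner estimate takes a few lines. Every input below is a rough assumption; the point is the chain of round factors, not the inputs:

```python
# Fermi estimate: piano tuners in Chicago (all numbers are deliberately round guesses)
population = 3_000_000
people_per_household = 3
piano_fraction = 1 / 10          # assume 1 in 10 households owns a piano
tunings_per_year = 1             # assume each piano is tuned annually
hours_per_tuning = 2             # including travel
working_hours_per_year = 2_000

pianos = population / people_per_household * piano_fraction
tuner_hours_needed = pianos * tunings_per_year * hours_per_tuning
tuners = tuner_hours_needed / working_hours_per_year
print(round(tuners))  # → 100
```

Within 10× of the accepted answer (~150), which is all a Fermi estimate promises.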

Tolerance and fit · nominal ± tolerance · manufacturing reality

What it means

Nothing is made perfectly. A "10 mm" rod is actually 9.95–10.05 mm. The ±0.05 mm is the tolerance.

Clearance vs interference fit

Clearance: shaft is smaller than hole (moves freely — bearings). Interference: shaft is larger (press-fit — never comes apart). Transition: could be either.

Why it matters

A jet engine has thousands of parts with tolerances of ±0.005 mm. Stack tolerances wrong and parts don't fit, or jam, or vibrate to destruction.

Tight tolerances cost money — exponentially. A part machined to ±1 mm costs €5. The same part to ±0.1 mm costs €50. To ±0.01 mm costs €500. The engineering discipline is: use the loosest tolerance that still works. Over-specifying tolerances is one of the most common and expensive rookie engineering mistakes.
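Stacking is where tolerances bite: in the worst case, part tolerances add linearly across an assembly. A minimal sketch with made-up part dimensions, also showing the statistical (root-sum-square) stack-up used when worst-case is too pessimistic:

```python
import math

# (nominal mm, ± tolerance mm) for three parts stacked end to end
parts = [(10.0, 0.05), (25.0, 0.05), (5.0, 0.02)]

nominal = sum(n for n, _ in parts)
worst_case = sum(t for _, t in parts)                      # tolerances add linearly
rss = math.sqrt(sum(t**2 for _, t in parts))               # statistical stack-up

print(f"worst case: {nominal:.2f} ± {worst_case:.2f} mm")  # 40.00 ± 0.12 mm
print(f"RSS:        {nominal:.2f} ± {rss:.3f} mm")         # ≈ ±0.073 mm
```

The RSS figure is smaller because three parts rarely all sit at their extreme limits at once.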

The 10× cost rule of error correction · fix it early, or pay 10× later · project management

The rule

A design error caught at the drawing stage costs €1 to fix. At prototype: €10. In production: €100. After delivery: €1,000+.

Why

As a project progresses, more decisions have been built on top of the error. Fixing the foundation means rebuilding the house.

Applies to software too

A bug found in code review: 1 hour to fix. Found in testing: 1 day. Found in production: 1 week + reputation cost.

This is why engineering reviews, design audits, prototypes, and testing exist. The money spent catching errors early is always cheaper than correcting them later. The Boeing 737 MAX disasters were partly attributed to software changes late in the design process that weren't subjected to the same scrutiny as the original design.

Redundancy · N+1 for reliability, N+2 for safety-critical · failure tolerance

What it means

Have one more than you need. If you need 1 pump, install 2. If you need 2 engines, use 4.

Examples

Commercial aircraft: 2+ engines, 3+ hydraulic systems, 4+ flight computers. Nuclear plants: multiple coolant loops. Data centres: dual power, UPS, generators.

The maths

If each component fails with probability p, two independent components both fail with probability p². If p=0.01, two units: 0.0001 (100× safer).

A single-engine aircraft losing its engine must make a forced landing. A twin-engine commercial plane losing one engine can continue flying safely to an airport. This is why critical infrastructure is never designed to the exact minimum requirement: it always carries spare capacity to survive failures. RAID storage, dual power supplies, backup generators, and emergency brakes are all forms of redundancy.
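The p² arithmetic above generalises to any number of independent redundant units:

```python
def all_fail(p: float, n: int) -> float:
    """Probability that all n independent redundant units fail together."""
    return p ** n

p = 0.01                 # each unit fails 1% of the time
print(all_fail(p, 1))    # single unit: 1 in 100
print(all_fail(p, 2))    # N+1: roughly 1 in 10,000 (100× safer)
print(all_fail(p, 3))    # N+2: roughly 1 in 1,000,000
```

The catch, which the "independent" in the docstring hides, is common-mode failure: two pumps on the same power feed are not truly independent.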

Part C · back-of-envelope calculator
Part D · the major engineering disciplines
Part E · the key concepts every engineer internalises

Feedback loop

Output affects input

A thermostat senses temperature (output) and adjusts the heater (input). Without feedback, systems overshoot or drift. Negative feedback = stability (thermostat). Positive feedback = runaway (microphone squeal, population explosion, compound interest).
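The thermostat can be sketched in a few lines. The gain and loss numbers are made up; the point is the loop, where the output (temperature) feeds back into the input (heater power):

```python
# Proportional thermostat: heat applied in proportion to the error.
setpoint = 20.0   # target temperature, °C
temp = 10.0       # starting room temperature, °C
gain = 0.5        # assumed controller gain
loss = 0.1        # assumed heat-loss rate per step

for _ in range(50):
    error = setpoint - temp              # negative feedback: output drives input
    temp += gain * error - loss * temp   # heater minus losses

print(round(temp, 1))  # → 16.7: stable, but below the setpoint
```

Two things worth noticing: the loop is stable (no runaway), but a pure proportional controller settles below the setpoint because the heater must continually fight the loss term; real controllers add integral action to close that gap. Flip the sign of `gain` and the same loop becomes positive feedback, and the temperature runs away.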

Precision vs accuracy

Repeatable ≠ correct

Accurate = close to truth. Precise = consistent, even if consistently wrong. A scale that always reads 2 kg too heavy is precise but not accurate. A scale that gives random readings near the true value is accurate but not precise. You need both.
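The distinction becomes concrete with numbers (illustrative readings): bias measures accuracy, spread measures precision.

```python
import statistics

true_value = 10.0  # true mass, kg
precise_biased = [12.00, 12.01, 11.99, 12.00]  # consistent, but 2 kg heavy
accurate_noisy = [9.2, 10.9, 9.6, 10.4]        # scattered around the truth

for name, readings in [("precise scale", precise_biased),
                       ("accurate scale", accurate_noisy)]:
    bias = statistics.mean(readings) - true_value  # accuracy: closeness to truth
    spread = statistics.stdev(readings)            # precision: repeatability
    print(f"{name}: bias {bias:+.2f} kg, spread {spread:.2f} kg")
```

The first scale has a large bias but tiny spread; the second the reverse. A good instrument minimises both.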

Failure mode analysis

"What can go wrong?"

FMEA (Failure Mode and Effects Analysis): list every possible failure, how likely it is, how bad the consequence is, and how detectable it is. Product of these three = risk priority number. Fix the high scores first.
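A minimal FMEA table as code, with hypothetical failure modes and scores (each of severity, occurrence, and detection scored 1–10, where a high detection score means the failure is hard to detect):

```python
# (failure mode, severity, occurrence, detection) — all scores invented for illustration
failure_modes = [
    ("seal leaks",    8, 4, 3),
    ("bolt loosens",  6, 5, 2),
    ("sensor drifts", 4, 7, 8),
]

def rpn(mode):
    _, s, o, d = mode
    return s * o * d  # risk priority number

for name, s, o, d in sorted(failure_modes, key=rpn, reverse=True):
    print(f"{name}: RPN = {s * o * d}")
# sensor drifts: RPN = 224
# seal leaks: RPN = 96
# bolt loosens: RPN = 60
```

Note that the moderately harmful but frequent and hard-to-detect failure outranks the most severe one; that reordering is exactly what FMEA is for.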

The weakest link

System = its weakest part

Strengthening the strongest part of a system achieves nothing. A chain with one weak link fails at that link regardless of how strong the other links are. Always find and address the bottleneck first.

Diminishing returns

Each improvement costs more

Getting from 0% to 90% efficiency is often easier than 90% to 99%. The last 1% of fuel efficiency in a car might cost more to achieve than the first 30%. Decide in advance what "good enough" is.

First principles thinking

Build from fundamentals

Instead of "we've always done it this way," ask: what are the physical constraints? What does physics actually allow? Elon Musk famously used this to challenge rocket costs: what are the raw materials? Why does the assembled rocket cost 100× more? This mindset drives engineering breakthroughs.

Part F · famous engineering failures and what they taught

Tacoma Narrows Bridge (1940)

Resonance ignored

The bridge oscillated in a 64 km/h wind and tore itself apart after hours of twisting. Strictly, the mechanism was aeroelastic flutter, a self-excited oscillation, rather than simple forced resonance, but the lesson stands: all structures have natural frequencies, and forces that couple to them amplify to destruction. It changed bridge design permanently: aerodynamic testing of major bridges became mandatory.

Challenger Space Shuttle (1986)

Known failure mode, ignored

O-ring seals were known to fail at low temperatures. Launch-morning temperature: about 2°C, after an overnight freeze, far below any previous launch. Engineers objected; management overrode them. 73 seconds after launch: disaster. Lesson: safety culture matters as much as engineering. Known risks must be escalated, not suppressed.

Millennium Bridge, London (2000)

Positive feedback oscillation

Opened June 2000; wobbled so severely it closed within 2 days. Pedestrians involuntarily walked in sync with the bridge's natural lateral frequency, amplifying the oscillation. Lesson: human interaction with structures creates unexpected feedback loops. Cost £5M to fix.

Mars Climate Orbiter (1999)

Unit conversion error

One team supplied thruster data in imperial units (pound-force seconds); another expected metric (newton-seconds). The $328M mission ended when the spacecraft dipped too deep into the Martian atmosphere and burned up. Lesson: unit consistency is safety-critical. Always check units explicitly at system interfaces.

Part G · test yourself

1. A floor is designed to hold 500 kg/m² with a safety factor of 3. What is the maximum safe load, and why isn't the safety factor just 1?

The floor is structurally designed to hold 500 × 3 = 1,500 kg/m². The "safe load" rating (500 kg/m²) is the published limit users work to; the structure is actually 3× stronger. The factor isn't 1 because materials vary batch to batch, calculation models aren't perfect, loads are dynamic rather than static (people jumping vs standing), materials degrade over time, and a structural failure is catastrophic. A factor of 3 means the floor could suffer a significant calculation error, a bad concrete batch, AND unusually dynamic loading, and still survive.

2. Estimate (back-of-envelope): how much steel is in a typical 10-storey office building?

Rough estimate: ~500–1,000 tonnes. A 10-storey building might cover 1,000 m² per floor = 10,000 m² total floor area. Typical steel intensity for a steel-framed office building: ~50–100 kg per m² of floor area. So: 10,000 m² × 75 kg/m² = 750,000 kg = 750 tonnes. This is the Fermi approach — use a density rule-of-thumb, apply it to the known area, and get an order-of-magnitude answer. Actual buildings vary from 40–150 kg/m² depending on height, span, and design — so the answer "~500–1,000 tonnes" is a solid back-of-envelope estimate.
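The same estimate as code, with the answer's assumed inputs made explicit so each can be challenged separately:

```python
# Fermi estimate: steel in a 10-storey office building (all inputs assumed)
floors = 10
area_per_floor = 1_000      # m², assumed footprint
steel_intensity = 75        # kg steel per m² of floor, mid-range rule of thumb

tonnes = floors * area_per_floor * steel_intensity / 1_000
print(tonnes)  # → 750.0 tonnes
```

Swapping the 40–150 kg/m² extremes into `steel_intensity` gives the 400–1,500 tonne envelope, which is how a Fermi estimate earns its error bars.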

3. Why does a plane with 4 engines not need to be 4× as powerful as a plane with 2 engines to carry the same load?

Because the extra engines are partly redundancy, not capacity. A 4-engine plane (like a Boeing 747) doesn't need more thrust than a 2-engine plane carrying the same weight: both need thrust equal to drag to maintain cruise. The 4-engine design was historically driven by reliability: with 1950s–1970s engine technology, losing one engine was a realistic risk, and 4 engines meant you could lose one and still have 75% of your thrust. Modern high-bypass turbofans are so reliable that twins (like the Boeing 777 and Airbus A350) have made 4-engine designs nearly obsolete on most routes. ETOPS regulations govern how far a twin-engine plane may fly from a diversion airport.

4. A part is specified as "25.00 ± 0.05 mm." What does this mean, and what happens if you tighten the tolerance to ±0.005 mm?

The part must be between 24.95 and 25.05 mm — a 0.1 mm acceptable range. Any part outside this range is rejected. Tightening to ±0.005 mm means the acceptable range shrinks to 0.01 mm — ten times tighter. This requires: a more precise machine (CNC instead of manual), slower machining speed (more passes), temperature-controlled environment (steel expands ~11 µm per metre per °C — at room temperature, a 25mm part changes by 0.3 µm per °C), and more frequent measurement and rejection. Cost increases roughly 5–10× per order-of-magnitude tighter tolerance. The engineering question is always: does the function actually need this precision, or is it being over-specified?

5. A system has three components in series, each with 99% reliability (1% failure rate). What is the system reliability?

97.03%. For components in series, the system works only if ALL components work. System reliability = 0.99 × 0.99 × 0.99 = 0.970299 ≈ 97%. This is the critical insight: chaining reliable components in series always reduces overall reliability. A system with 100 components each at 99% reliability has overall reliability of 0.99¹⁰⁰ = 36.6% — barely a coin flip. This is why complex systems (aircraft, power plants) use redundancy (parallel components) rather than just making each component more reliable. Parallel redundancy: if one fails, the other takes over. System failure requires BOTH to fail = 0.01 × 0.01 = 0.01% — 100× more reliable.