How Systems Engineering differs from IT Systems Administration

I sometimes get asked about my Master’s in Systems Engineering, usually by people who expect it to be more like network administration or similar IT work. I mean, I certainly could be doing IT; it’s how I paid for my undergrad. But compared to the rest of my career experience, that was long enough ago, and on simple enough projects, that I hardly bother making space for it on my resume/CV anymore.

So no, Systems Engineering isn’t like IT. Instead, the fastest way I could describe Systems Engineering is “being selfishly empathetic for your fellow problem-solvers.”

“Selfishly empathetic? Huh?”

What I mean is that by having someone on the team who tries to look at your problem from everyone else’s perspective, and who occasionally pokes everyone else into taking a moment to do the same, you can all together (selfishly) save truckloads of money, time, and heartache.

Truckloads, or boatloads, so to speak…which gets into the longer way I could explain Systems Engineering, by telling a story from where I was employed while earning my Master’s:


It may sound commonplace nowadays to enable disaster recovery by having redundancy across multiple regions from your cloud service provider. But what if you’re completely off-grid, and need real-time disaster responsiveness? Need multiple, physically separate, redundant processors, all running the same software, constantly analyzing the system they’re connected to? Controlling digital & analog I/O, and arbitrating disagreements if any one processor differs from the others in how healthy it thinks the system is?
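To make that “arbitrating disagreements” idea a little more concrete, here is a minimal sketch of one common pattern: simple majority voting over each processor’s reported health. This is purely illustrative, assumes a three-channel setup, and is not the actual design from the story that follows.

```c
/* Hypothetical sketch of majority-vote arbitration across redundant
 * processors -- illustrative only, not the design from this story. */
#include <stdio.h>

typedef enum { HEALTH_OK = 0, HEALTH_DEGRADED = 1, HEALTH_FAILED = 2 } health_t;

/* Return the health state reported by a strict majority of channels,
 * or HEALTH_FAILED if no majority exists (fail safe). */
static health_t arbitrate(const health_t *reports, int n) {
    int counts[3] = {0, 0, 0};
    for (int i = 0; i < n; i++) {
        counts[reports[i]]++;
    }
    for (int state = 0; state < 3; state++) {
        if (counts[state] * 2 > n) {
            return (health_t)state;
        }
    }
    return HEALTH_FAILED; /* no consensus: treat as a fault */
}

int main(void) {
    /* Three redundant channels; one disagrees with the other two. */
    health_t reports[3] = { HEALTH_OK, HEALTH_OK, HEALTH_DEGRADED };
    printf("arbitrated health: %d\n", arbitrate(reports, 3));
    return 0;
}
```

Real arbitration logic in a safety-critical system involves far more than a vote over an enum (timeouts, cross-channel data comparison, tie-breaking among equals), but the shape of the problem is the same.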

My then-employer handed over this problem to a few of our long-tenured engineers, and they proceeded to make a series of well-reasoned, logical technical and business decisions…which still ended in a debacle.

How? First, let’s look at it from each of the leads’ perspectives:

The lead hardware engineer recognized, “This is being treated as a green-field project, we’ve got plenty of design freedom, and we can already tell we’re going to need to produce at least four or five times as many instances of this hardware vs. what a simpler, non-redundant concept would use. And that’s per installation! Plus, we can imagine tons of ways a similar hardware-redundant solution could make future systems safer. Let’s design our own custom hardware in-house, and have it become our next mass-producible, reusable logic board, something ours and future projects could treat as a new standard.”

So they proceeded to design the hardware schematics. When it came time to pick the microcontroller, they chose a commercially available one with no supply-availability concerns (great!), an impressively low price per chip (desirable when anticipating scale-up: even a fraction-of-a-penny difference, multiplied across enough produced units, adds up to a substantial sum for the company and perhaps a well-deserved bonus for someone), and the ability to run a real-time operating system.

Meanwhile, the software lead had been told things needed to operate in real time. No problemo: he was already familiar with writing software for Linux/Unix; real-time forks exist; and the chipmaker’s marketing team was saying the hardware team’s chosen line of microcontrollers (foreshadowing: this product line, not this model) could run a miniaturized but still POSIX-compliant RTOS. So the software team got to work developing code on their own Linux boxes, debugging and testing it on their desktop workstations, trusting that when they deployed it to the real hardware, it would behave pretty much the same.

You probably guessed it by now: Hard nope.

Shaving off a few pennies from the Bill of Materials for hardware only makes a ton of cents [pun certainly intended] when you’re scaling up into the tens or hundreds of thousands of units produced.

The hardware team had failed to recognize that a common piece of cost-savings advice from the broader industry didn’t apply the same way here. Given that we were at a certain defense contractor rather than a consumer-goods maker, this system, at most, ever, would’ve been installed…what, maybe fifty times? Being generous, call it double or triple that if you really want to get ambitious and say it would’ve been adopted as a new standard piece of hardware for other projects, like the designer had been eagerly hoping. Multiply it out all you want, and it’s still chump change compared to what “scale-up” means in the world of consumer electronics.
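To put rough, purely hypothetical numbers on that: say the cheaper chip shaved five cents per unit off the Bill of Materials relative to a more capable part. The sketch below (invented figures, not the project’s actual ones) just multiplies that out at the consumer-electronics scale mentioned above versus this project’s generous-case install count.

```c
/* Toy back-of-the-envelope math: per-chip BOM savings at consumer-electronics
 * scale vs. this project's realistic install count. All figures hypothetical. */
#include <stdio.h>

int main(void) {
    const double savings_per_unit = 0.05;     /* assumed cheaper-chip savings, USD */
    const long consumer_scale_units = 200000; /* "tens or hundreds of thousands" */
    const long project_scale_units = 150;     /* ~fifty installs, tripled to be generous */

    printf("Consumer-goods scale: $%9.2f saved\n", savings_per_unit * consumer_scale_units);
    printf("This project:         $%9.2f saved\n", savings_per_unit * project_scale_units);
    return 0;
}
```

Seven-and-change dollars of parts savings, weighed against the engineering hours later spent wrestling an underpowered chip, is the mismatch in a nutshell.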

As for those few cents saved on the chosen microcontroller vs. higher-cost, more-advanced alternatives? Let me put it this way: later, while looking up documentation from the chipmaker, I stumbled on marketing pamphlets containing recommendations along the lines of: “This microcontroller is great for embedding in simple devices like smoke- and CO2-detectors. If you are designing a more complex apparatus, such as a dishwasher, you should upgrade to this other model or product line of ours instead…” I kid you not, that was an official marketing-team stance, and it assumed the software written for the chip would be simple C code, akin to what you might nowadays put on an Arduino. Never mind trying to keep an entire RTOS in memory while juggling a multithreaded application capable of communicating with its other instances to perform disaster recognition & recovery.

Reading that, and realizing what this employer wanted to control with this hardware, I didn’t know whether to laugh or cry.  

Because of how this employer handled its in-house hardware designs, those schematics were “set in stone” by the time I was introduced to this project. My assigned challenge then became coming up with a custom way to debug the runtime errors the software would encounter. And encounter errors it did, because when deployed to such a memory- and processing-power-restricted bit of kit, not only did the software behave differently than it had when unit-tested on a Linux-based desktop workstation, it literally could not support a traditional debugger and breakpoints.
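For a concrete flavor of “behaving differently,” here is a tiny, entirely hypothetical example (not the project’s actual code) of the kind of thing that runs without a hiccup on a desktop Linux workstation yet can fall over on a target with a small fraction of the memory:

```c
/* Hypothetical illustration: code whose "can't-fail" paths on a desktop
 * become the common failure paths on a memory-constrained embedded target. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static void *worker(void *arg) {
    (void)arg; /* placeholder worker thread */
    return NULL;
}

int main(void) {
    pthread_attr_t attr;
    pthread_t tid;

    pthread_attr_init(&attr);
    /* An 8 MiB thread stack is a non-event on a workstation, but may be
     * flatly impossible on a chip with a few hundred KiB of RAM. */
    pthread_attr_setstacksize(&attr, 8 * 1024 * 1024);

    if (pthread_create(&tid, &attr, worker, NULL) != 0) {
        fprintf(stderr, "thread creation failed\n"); /* dead code on the desktop */
        return EXIT_FAILURE;
    }
    pthread_join(tid, NULL);

    void *buf = malloc(16 * 1024 * 1024); /* 16 MiB scratch buffer: same story */
    if (buf == NULL) {
        fprintf(stderr, "allocation failed\n");
        return EXIT_FAILURE;
    }
    free(buf);
    puts("ran fine here; no guarantee it will on the target");
    return 0;
}
```

Unit tests run on the workstation never exercise those failure branches, which is exactly why the trust that the code would “behave pretty much the same” didn’t survive first contact with the real hardware.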

Exactly how I worked around that debugging limitation is a tangent topic and won’t be covered here. But it got deep enough for me to recognize that, just as I’d facepalmed at the hardware decisions, the software team wasn’t faultless either: just because the lead was familiar with writing concurrent software, and wanted to be able to write and run his code from the comfort of his own familiar workstation, doesn’t mean that was the simplest way to design a perfectly sufficient embedded firmware solution.

To give both the hardware and software leads credit where credit’s due: as hinted above, each had made well-reasoned, cost-efficient-sounding, and comfortably familiar decisions…if regarded in isolation from each other. And that right there was the problem: having seen what happened once the system was integrated (and hindsight being as clear as it is), it’s obvious everyone would’ve been better off with more of what you could call “professional empathy.”


How do things end up better with the help of a Systems Engineer on the team? (Assuming they’re brought on early enough, instead of getting added late in the project the way I was in the story above?)

  • Someone specifically trained not to “lose the forest for the trees” is there, routinely considering the fully-integrated problem and solution as a whole instead of tunnel-visioning on only their own portion. They can’t claim the same technical depth in each field as that field’s component lead (no one human could, across all of them), but they have enough interdisciplinary familiarity to speak about and understand every one of them.
  • Someone’s there to recognize when a decision loses relevance because a coworker’s understanding of a less-familiar discipline (whether another engineering field or the business context) is a bit off-target, such as in the story above, where the hardware team misapplied advice about keeping the Bill of Materials cheap and the software team scope-crept what could’ve been kept simpler. Not part of the above story, but warned about often while I was earning my degree, is how an up-front cost-saving decision might ignore the total cost of ownership, such as the externalized environmental risks and costs of maintaining or disposing of the system or its byproducts after use.
  • Someone’s there ensuring the entire team properly communicates its requirements and designs, including insisting on cross-disciplinary peer review of each other’s documentation (not just review within each working group) to catch such issues early, while it’s still feasible to change course.

All of the above may sound like it strictly slows the project down, but this is a case where “fast is not fast; smooth is fast.” It doesn’t take a whole lot of system complexity to reach the threshold at which having a Systems Engineer pays for itself, whether in direct costs, a healthier project schedule, or mitigated risks.


The above story tries to give a high-level example of a project I’ve seen that was sorely in need of a Systems Engineering perspective, but on its own it doesn’t address how I’ve since taken those lessons learned to other projects and employers. I can see I’ve already written too long a blogpost for today, so I’ll stop here. If you’re interested in hearing more about my more recent work and want me to continue this series, let me know via the post comments or this site’s contact form.

