Duplicate systems - same or different vendors?

© Copyright 2012 Rod Hughes Consulting Pty Ltd



 

There is probably no "right" answer.

In terms of failure rates, the early electromechanical relays hardly ever failed - I recall a study we did based on factory returns of the YTG distance relays - they had a Mean Time Between Failures (MTBF) of something like 350 years.

Of course that is just a mathematical statistic as I know there are utilities all over the world who have recorded many failures or mal-operations of one sort or another and the relays have "only" been in service for 50 years.
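To see why a 350-year MTBF and "many failures in 50 years" are not contradictory, here is a minimal sketch. It assumes a constant failure rate (the usual exponential model), which is my assumption rather than anything stated in the study, and the fleet size of 1000 relays is purely hypothetical.

```python
import math

def prob_failure_within(years_in_service: float, mtbf_years: float) -> float:
    """Probability that one relay fails at least once in the period,
    assuming a constant failure rate of 1/MTBF (exponential model)."""
    return 1.0 - math.exp(-years_in_service / mtbf_years)

def expected_fleet_failures(fleet_size: int, years_in_service: float,
                            mtbf_years: float) -> float:
    """Expected number of failures across a fleet, assuming failed units
    are repaired or replaced and the failure rate stays constant."""
    return fleet_size * years_in_service / mtbf_years

MTBF_YEARS = 350.0     # figure recalled in the text
SERVICE_YEARS = 50.0   # service life mentioned in the text
FLEET_SIZE = 1000      # hypothetical fleet size, for illustration only

p = prob_failure_within(SERVICE_YEARS, MTBF_YEARS)
n = expected_fleet_failures(FLEET_SIZE, SERVICE_YEARS, MTBF_YEARS)
print(f"Chance a given relay fails within {SERVICE_YEARS:.0f} years: {p:.1%}")
print(f"Expected failures in a fleet of {FLEET_SIZE}: {n:.0f}")
```

Under these assumptions roughly one relay in eight would be expected to fail at some point in 50 years, so a large fleet still sees plenty of failures even with a very long MTBF.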

The historical reason for duplicate protections from two different vendors can be attributed to the sound engineering principle of using different measuring principles to maximise the potential coverage of all power system faults.  Let's talk with some imagery - consider all possible fault scenarios fitting in a square 10 x 10.


One vendor's device, by its physical construction (being electromechanical), might detect a circle of fault conditions of say 8.5 diameter - perhaps even able to detect faults that can't happen.  The other vendor's product probably has a different shape of some sort, simply because of its different construction, and may be offset relative to the first vendor's.  The combined effect is that more of the square is covered by two overlapping operating principles, each with a different shape and position in the square.  Sometimes both might operate for the same condition.
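The imagery can be put into rough numbers with a small sketch. Everything other than the 10 x 10 square and the 8.5 diameter is an assumption for illustration: the first circle is placed at the centre of the square, and the second relay is modelled as another circle of the same size with a hypothetical offset, although in practice its shape would differ.

```python
def coverage_fraction(circles, side=10.0, steps=200):
    """Fraction of a side x side square covered by at least one circle.

    circles: list of (centre_x, centre_y, diameter).
    The square is sampled at the midpoints of a regular steps x steps grid.
    """
    cell = side / steps
    covered = 0
    for i in range(steps):
        x = (i + 0.5) * cell
        for j in range(steps):
            y = (j + 0.5) * cell
            if any((x - cx) ** 2 + (y - cy) ** 2 <= (d / 2) ** 2
                   for cx, cy, d in circles):
                covered += 1
    return covered / (steps * steps)

relay_a = (5.0, 5.0, 8.5)  # 8.5 diameter from the text; centred position assumed
relay_b = (6.0, 4.0, 8.5)  # hypothetical second device, offset from the first

only_a = coverage_fraction([relay_a])
both = coverage_fraction([relay_a, relay_b])
print(f"Relay A alone covers about {only_a:.0%} of the square")
print(f"A and B together cover about {both:.0%} of the square")
```

Part of the offset circle falls outside the square, which corresponds to a device able to detect faults that can't happen. Only the part inside the square counts towards coverage.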

As electronic relays came into wider use, we could manipulate the shape and coverage of the relay characteristic closer towards the square boundary - with numerical relays even more so.
On the downside there were early issues with the reliability of new technology components (resistors, capacitors, diodes, transistors ...).  Two different relay manufacturers therefore still gave some 'confidence' that a common failure mode would be unlikely - notwithstanding that it was highly likely that the simple electronic circuits used the same brand of diode, transistor etc.  These days it may be the same microprocessor chip and/or the same communications port chip/stack ...
Some relay vendors for many years put all electronic components through extensive heat soak "burn in" as part of accepting deliveries to their factories. Over a relatively short time, the reliability of electronic components improved to the point that component failures were generally no longer an issue.

Several years ago CIGRE AU B5 did an internal MTBF analysis of utilities' experience based on electromechanical vs electronic vs digital vs numeric.
The outcome largely depended on whether you counted a proactive bug fix alert as a failure, even though your system never experienced the bug and you could apply pre-emptive updates - so not really conclusive as such. It didn't go into comparison of vendors.

Then came the "unreliability" associated with huge complex setting files.

I know of one utility that decided the setting of modern numerical bus protection schemes, with their complex proprietary software, was getting so complex that they could avoid more mal-operations by having the same relay and tool for both of the duplicated systems.  The setting and commissioning engineers then needed to be familiar with only one device - more familiar means fewer 'silly' mistakes, leaving just the possibility of an engineering calculation error, which may then be replicated in both systems but which is more readily detectable by setting reviews and quality checks.

However busbar protection devices are deployed on a per busbar basis and hence there are relatively few instances in any one substation.  In an "emergency" you can possibly/probably get away with a simple temporary extension of the line protection/transformer zones if you have a generic failure mode that affects both busbar protection systems.

On the other hand, applying that philosophy to the several in/out feeders or the transformer protection in the substation is probably not wise, since with the same vendor duplicated it is almost impossible to accept the risk of a common mode failure disabling both systems on all feeders or all transformers.  e.g. there have been various recalls of new relays with bugs, from all vendors, including one I knew of where precisely 91 days after commissioning - not more, not less - the distance relay would lock out.  Not good if that relay was the same one used for both duplicate systems on all incomers and/or outgoers.  (The recall program was fortunately extremely successful as the problem was identified 3-4 months after the first production models were dispatched, so there was not a huge installed base to contact.)  All vendors have such stories.  This one was in the late 1990s so all OK now on that one.

So it is all about risk assessment.

Is the risk and consequence of using a common brand higher than the risk and consequence of using different brands?

The "right" answer is specific to your situation, vendors, fault fixing and recommissioning time including geographic complications.
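As a way of framing that question, here is a minimal sketch that compares the two options as probability times consequence. It borrows the beta-factor common-cause model from general reliability engineering, which is my choice of technique and not something taken from the discussion above. The class `DuplicationOption` and every number in it are hypothetical placeholders to be replaced by your own assessment.

```python
from dataclasses import dataclass

@dataclass
class DuplicationOption:
    name: str
    p_system_fails: float  # yearly chance one of the two systems fails to operate (hypothetical)
    beta: float            # share of those failures common to both systems (hypothetical)
    p_human_error: float   # yearly chance a setting/commissioning error defeats the scheme (hypothetical)
    consequence: float     # relative cost of losing both systems (hypothetical)

    def p_both_fail(self) -> float:
        """Beta-factor model: independent coincidence plus common-mode share."""
        independent = ((1.0 - self.beta) * self.p_system_fails) ** 2
        common_mode = self.beta * self.p_system_fails
        return independent + common_mode

    def risk(self) -> float:
        """Probability of either cause defeating the scheme, times consequence."""
        p_any = 1.0 - (1.0 - self.p_both_fail()) * (1.0 - self.p_human_error)
        return p_any * self.consequence

# Illustrative inputs only: same brand has more common-mode exposure but
# fewer assumed setting errors; different brands the reverse.
same_brand = DuplicationOption("same brand", 0.01, 0.10, 0.001, 100.0)
different_brands = DuplicationOption("different brands", 0.01, 0.01, 0.003, 100.0)

for option in (same_brand, different_brands):
    print(f"{option.name}: P(both fail) = {option.p_both_fail():.6f}, "
          f"risk = {option.risk():.4f}")
```

With these made-up inputs the different-brands option has the lower chance of both systems failing together, yet the same-brand option has the lower overall figure because the assumed human-error term dominates. Change the inputs and the ranking can flip, which is the point: the answer is specific to your situation.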

And now the innovation - 5-minute restoration at 3 am without the technician leaving home .... refer to paper and presentation RH19D at https://ideology.atlassian.net/wiki/x/HYBq


The "Same supplier but different principles" justification

I do agree that to some extent two different principles from one supplier have HAD some merit in the past, e.g. a distance relay and a line differential relay from the same supplier.

Even the "measuring principles" of say the YTG to the PYTS to the SHNB were all definitely different even for distance relays from GEC.  

The principle of the SHNB to the SHPM to the LFZP were all the same although the hardware to provide that measuring principle was a special chip in the SHNB, versus software in SHPM and LFZP.

However these days the algorithms are all standard across the vendor's "platform" of numerical relays.  The hardware of each IED application can even be identical for the auxiliary power supply modules, CT/VT input modules and CPU module - perhaps with a different combination of digital I/O and comms card - this is WHY it is a PLATFORM - just listen to all the vendors wanting to go this way since it minimises development costs.

So whilst the modern distance relay uses a different measuring principle (i.e. calculates impedance in some form) to a line differential relay (i.e. calculates the sum of current in and out of the line at each end), you may find both are identical hardware and firmware platforms - even with the ability to operate as one or the other simply enabled/disabled by configuration code!

The principles may not actually be as different as the "application name" (distance vs differential) of the device would suggest.

The "But modern passenger aircraft all have two or four engines from one supplier" justification

I've heard this aircraft analogy before - "in my humble opinion" it is "cute" but off the mark.  Interestingly it is often quoted by vendors - I wonder if they have a vested interest?

Yes aircraft must keep flying and yes they (we all) accept that when the aircraft manufacturer builds an aircraft they are going to use one engine manufacturer.
We also know that some aircraft can suffer the failure of one engine, on some aircraft more, and still keep flying safely (obviously except single engine aircraft - although they tend to have better glide capabilities than a 400 ton body of steel!).

We also all accept that when we build a new substation we would (generally) buy:

  • all the transformers from manufacturer "A",
  • all the Circuit Breakers from manufacturer "B"
  • all the CTs from manufacturer "C"
  • all the VTs from manufacturer "D"
  • etc
    Or perhaps one or more of those vendors gets lucky and supplies more than one package.

Thereafter refurbishments  or replacements may start to mix and match - something not so easy to do with aircraft engines I would assume.

Interestingly, we may buy transformers sized individually to satisfy supply continuity with one transformer out of service (N-1).
This is the equivalent engineering criterion to the one aircraft engine failure scenario.

However we don't buy two CBs for each bay from the same OR different manufacturers.  It is just not economical.
This is just as we don't build aircraft with duplicate wings using the same OR different steel.

We don't buy duplicate CT stanchions for each bay but we do buy the CTs with multiple cores made by the same CT core manufacturer as duplicated cores.

We don't buy duplicated VTs or even have duplicate secondary windings of the VTs - just use individual circuits running from the one secondary winding.

These are all PROCESS level primary plant/equipment for making the system work as an overall entity.

The aircraft engines are the PROCESS equipment of creating thrust to push the aircraft forward. They are duplicated from one supplier.
The aircraft wings and ailerons are the process equipment that give the aircraft lift to fly. These are NOT duplicated at all.

So if we DID have a common mode of failure of all 4 engines, we would still have a plane crash.

If we did have the right wing fall off we would also have a plane crash.


So let's ignore the PROCESS level comparison as irrelevant to the consideration of single sourced duplicate protection equipment.

The subsequent justification of single-vendor solutions is that aircraft use a sophisticated control system provided by one intelligent piece of equipment (that is the equipment, not necessarily, although hopefully, the supplier ... (smile)).
The duplicate system thereto is probably supplied by the same supplier.
The programming and configuration of the duplicated aircraft control systems was probably done by the same engineering group.

So the more directly comparable "justification" of single-sourced duplicated protection systems would be "if one secondary systems supplier is good enough for flying ...."

However the design of the hardware and the programming of the control system have been specifically developed over perhaps a decade of specific aircraft development in order that hundreds of identical aircraft can be pushed out of the factory and sold to any airline company.
Its operational environment is uniquely and specifically designed, developed, tested, failure modes created and eliminated over several years of intensive engineering.
19 years to be more precise (the A380 development started in 1988, with first flight in 2005 and introduction to commercial flight in 2007 http://en.wikipedia.org/wiki/Airbus_A380)

So to use the "justification" of "if one secondary systems supplier is good enough for flying ...." to buy both duplicated devices from one supplier - well, let's make sure it has been fully tested for:

  • operation for all actual power system faults it would be required to detect,
  • stability for all actual power system faults it would be required not to detect,
  • all failure modes of the hardware and/or firmware combinations.

And then, to make it economical, design the entire power system around a vendor specific secondary IED and its identified capabilities/failure modes, take 20 years to design the first power system and substation to those requirements, and then make every substation thereafter identical in design, construction and testing, all built by the same substation contractor or the utility's internal staff ...

We "could" do all that as a proper engineering equivalency of applying a single-source solution...

Or we could take a pragmatic risk management approach and eliminate much of the effort needed to achieve that certainty by using systems that have as few common modes of failure as we could reasonably be concerned about or have control over.

"Superjumbo wing cracks fix still over a year away" - That will make you think about keeping an eye on the wing to make sure it doesn't start to crack, rather than whether the engine is working, next time you fly ... - what you could do about the cracking is probably somewhat limited other than to calmly alert the Captain ...(wink)

Ultimately it is relatively easy to design and implement systems to work.

The challenge is to know and manage the effects of when any part of the system doesn't work properly .... THIS is responsible engineering!



Contact Me

Skype: (ping even if showing offline)

Email Me

A phone call is nearly always welcome depending on the time of night wherever I am in the world.
Based in Adelaide UTC +9:30 hours e.g.

April-September: Noon UK = 2030 Adelaide
October-March: Noon UK = 2230 Adelaide

  Office + 61 8 7127 6357
  Mobile + 61 419 845 253



Extra Notes:

Disclaimer
No Liability:
Rod Hughes Consulting Pty Ltd accepts no direct nor consequential liability in any manner whatsoever to any party whosoever who may rely on or reference the information contained in these pages.  Information contained in these pages is provided as general reference only without any specific relevance to any particular intended or actual reference to or use of this information. Any person or organisation making reference to or use of this information is at their sole responsibility under their own skill and judgement.

No Waiver, No Licence:
Whilst the information herein is accessible in the public domain, Rod Hughes Consulting Pty Ltd grants no waiver of Copyright nor grants any licence to any party in relation to this information.