One of the most popular wired network communication protocols is Ethernet¹. Since its first release, it has been enhanced several times. Starting from the traditional bus topology with coaxial links, it has evolved into Switched Ethernet architectures with optical links and speeds reaching 10 Gbps. Ethernet is based on the CSMA/CD mechanism, which is a non-deterministic protocol. The introduction of switches partially resolved this non-deterministic nature, but different problems now exist, such as queuing delays and queue lengths. Even though Ethernet is non-deterministic by nature, this did not stop researchers in academia and industry from using the Ether-Channel as a communication medium for the most critical applications: Control Systems.
One of the most popular Networked Control Systems (NCS) protocols is CAN [1,2]. It was developed by BOSCH to communicate control data between different control nodes, replacing the traditional point-to-point links present in early control systems. The automotive industry is the principal driving force behind the development of new control schemes, which is why CAN has a special version for automotive on-board network implementation.
Since Ethernet appeared in the world of wired communication systems, its implementation as a communication medium for NCS was a must. The non-deterministic nature of Ethernet was first thought to be problematic because of the real-time constraints inherent in control systems; however, research showed that Ethernet (or IEEE Std 802.3) performed well in NCS either by changing the packet format for real-time control messages, or by giving higher priority to these messages [3-5]. The standardization process for the use of Ethernet in control is also under way². Rockwell Automation and the ODVA also proposed EtherNet/IP as an industrial version of Ethernet and developed the CIP [6-9]. More references on this topic can be found in .
In , a new methodology was proposed, namely the use of Ethernet (IEEE 802.3) without any modifications in the context of NCS. This proved to be successful not only for pure control loads but also when mixing real-time and non-real-time messages. In , fault-tolerance was introduced on this scheme in the context of several machines working in-line.
This new methodology was also introduced in car networks . It was shown that Gigabit Ethernet was able to integrate real-time control functions with non-real-time entertainment functions. More details about the use of NCS in cars can be found in . It was also shown in [15,16] that the same principle is applicable to train wagon control. In , two train wagons were studied; all sensors, controllers and actuators were connected on top of Gigabit Ethernet. It was shown that this architecture was successful in meeting the real-time delay deadlines. Furthermore, the increase in system reliability due to fault-tolerance was calculated.
In this paper, the fault-tolerant network described in  is revisited. The effect of the efficiency of the error detection and reconfiguration mechanisms on the reliability of the control function is investigated. In order to reduce the effect of unsuccessful reconfiguration on system reliability, a novel scheme is developed. A Continuous Time Markov Chain (CTMC) is then used to prove that the reliability of this scheme is higher than that of a more conventional reconfiguration scheme.
The rest of this paper is organized as follows. Section 2 summarizes some of the work done in Ethernet train networks. Section 3 focuses on the new fault-tolerant scheme developed in this paper and presents the Markov model that is used to calculate system reliability. In Section 4, it is proven that this new scheme increases system reliability. Finally, Section 5 concludes this research.
2. Ethernet Train Control Network
Due to the current technological advancement, entertainment and multimedia are becoming a necessity on board moving vehicles . Consequently, Ethernet is emerging as a promising technology in train control networks over the currently used protocols such as Local Operating Networks (LonWorks), Train Communication Networks (TCN) and Controller Area Network (CAN) [19,20]. In , it was proven that the use of Ethernet as a control protocol in trains could allow an entertainment load to be carried on top of the control load. This was achieved without jeopardizing the packet end-to-end delay requirement of the control data. A Gigabit Ethernet network model is proposed as a control and entertainment network within one 60-seat train wagon . The network consists of 250 nodes, the maximum number of sensors and actuators currently allowable in train standards . Additionally, two categories of entertainment traffic are added to the control traffic. The first load is in the form of video streams. The second load is WiFi traffic produced by mobile wireless nodes (laptops).
With a packet payload of 32 bytes, the sensor-to-actuator packet end-to-end delay was measured using OPNET³ simulations. This measured delay includes all the processing, propagation, encapsulation and de-capsulation delays. This architecture succeeded in meeting the required deadlines. All simulations were run for 16 ms and 1 ms sampling periods. More information can be found in [15,24].
2.1. Enhancing Train Network Reliability
In order to increase system reliability, two controllers are used instead of one . A Control Server (Controller) handles the control load and an Entertainment Server handles the entertainment traffic (video streams and WiFi load). The Entertainment Server acts as a backup for the Controller in order to enhance system reliability. Figure 1 below shows the enhanced network model that was successfully simulated with OPNET.
2.2. Ethernet in Two Train Wagons
In , the network model is upgraded to include two wagons. The two-wagon network consists of two Gigabit optical fibre star topologies, one per wagon. In each wagon, the same network model proposed in  and shown in Figure 1 is modelled. These two star networks are connected to each other via a 10 Gigabit Ethernet optical fibre cable at the main switch level. Thus, the two wagons can exchange information.
To further increase network reliability, both Controllers and both Entertainment Servers serve as backups in case of a Controller failure. The worst case scenario is the one where three out of the four Controllers/Entertainment Servers fail; the remaining Controller/Entertainment Server handles the control load of both wagons while the entertainment is dropped in both wagons. Consequently, each sensor in the system has to multicast four replicas of its packets. These packets are sent to two Controllers and two Entertainment Servers.
Figure 1. One wagon network model.
In the context of two wagons, the main metric under study is the maximum sensor-to-actuator packet end-to-end delay. This measured delay includes all the processing, propagation, encapsulation and de-capsulation delays. OPNET simulations showed that this architecture, when fault free, is successful in meeting the required deadlines. The worst case scenario was also successfully simulated with OPNET; one controller handled the control load of both wagons while the entertainment was completely dropped.
3. Novel Error Detection/Reconfiguration Scheme
The fault-tolerance mentioned above is expected to increase system reliability. Let Rcontrol-FT be the reliability of the control function of both wagons. Furthermore, assume that the controllers in both wagons are identical. The same assumption will also be valid for the Entertainment Servers in both wagons. Let RK be the reliability of any of the two controllers (K1 and
Note that (1−RK) is the unreliability of a Controller while (1−RE) is the unreliability of an Entertainment Server. (1−R(t)) is the probability that the system has failed at time t. Intuitively, the fault tolerant architecture described in the previous section should increase the reliability of the control function in the context of two wagons. Without any fault tolerance, the control function will fail as soon as either of the two Controllers fails. Let this architecture have a reliability Rcont.
The time to failure of electronic equipment has been historically assumed to be exponentially distributed [24, 25]. The failure rate will therefore be constant. Let λK be the failure rate of any of the two Controllers and let λE be the failure rate of any of the two Entertainment Servers. The relation between the reliability and the failure rate is as follows [24,25]:
By comparing Rcontrol-FT and Rcont, it was shown in  that fault-tolerance had increased system reliability as expected.
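The comparison above can be illustrated numerically. The following is a minimal sketch, not the authors' derivation: it uses the standard relation R(t) = exp(−λt) for an exponentially distributed time to failure, and it assumes that Rcont is a series system of the two Controllers and that Rcontrol-FT, with perfect switching, is a parallel system of the two Controllers and the two Entertainment Servers. These two expressions are assumptions inferred from the description in the text, and the failure rates and mission time are hypothetical values chosen only for illustration.

```python
import math


def reliability(failure_rate, t):
    """R(t) = exp(-lambda * t) for an exponentially distributed time to failure."""
    return math.exp(-failure_rate * t)


def r_cont(r_k):
    """No fault tolerance (assumed series system): both Controllers must work."""
    return r_k ** 2


def r_control_ft(r_k, r_e):
    """Fault tolerant with perfect switching (assumed parallel system):
    control survives while at least one of the four servers works."""
    return 1.0 - (1.0 - r_k) ** 2 * (1.0 - r_e) ** 2


# Hypothetical failure rates (per hour) and mission time (hours).
LAMBDA_K = 1e-4
LAMBDA_E = 2e-4
T_HOURS = 1000.0

r_k = reliability(LAMBDA_K, T_HOURS)
r_e = reliability(LAMBDA_E, T_HOURS)
print(f"R_K = {r_k:.4f}, R_E = {r_e:.4f}")
print(f"Rcont = {r_cont(r_k):.4f}")
print(f"Rcontrol-FT = {r_control_ft(r_k, r_e):.6f}")
```

Under these assumptions the fault-tolerant value is always at least as large as the non-fault-tolerant one, since the series system fails on the first Controller failure while the parallel system fails only after all four servers have failed.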
This increase in reliability relies on the implicit assumption that the switching tasks from a failed Controller/Entertainment Server to another operational Controller/Entertainment Server will always be successful. Next, the details of this switching mechanism are explained in the context of a K failing and E taking over its tasks. In the fault-free situation, all packets sent from the sensors are received by both servers: K and E. Only K responds to these packets, calculates the necessary control packet and sends it to the designated actuator node. A watchdog in the form of “live” packets is continuously exchanged between K and E. When E detects a missing watchdog (which indicates the failure of K), it gets into the loop to replace the inactive K and sends the control packet to the designated actuator. The control procedure running on E, which is used to back up K in case of failure, must be designed to accommodate the loss of one packet. Also, the control system must not be susceptible to the loss of one control packet. This is because at most one packet may be lost during the switchover between K and E. A trivial solution in this case would be the “keep previous sample” technique. In this procedure, the actuator applies the previous action to the plant until a new control word is received.
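The switchover and the “keep previous sample” technique can be sketched as a small discrete-time simulation. This is only an illustration under stated assumptions: the class and function names, the one-period watchdog timeout and the placeholder control law (control word = 2 × sample) are all hypothetical and are not taken from the simulated network.

```python
class Actuator:
    """Applies the last received control word until a new one arrives."""

    def __init__(self, initial=0.0):
        self.applied = initial

    def step(self, control_word):
        if control_word is not None:   # a new control word was received
            self.applied = control_word
        return self.applied            # otherwise keep the previous sample


def simulate(samples, k_fails_at, watchdog_timeout=1):
    """Return the action applied to the plant at each sampling instant.

    K answers until it fails at index k_fails_at. E takes over once
    watchdog_timeout consecutive "live" packets from K are missing.
    """
    actuator = Actuator()
    missed = 0
    e_active = False
    applied = []
    for i, sample in enumerate(samples):
        k_alive = i < k_fails_at
        control_word = None
        if k_alive or e_active:
            control_word = 2.0 * sample        # placeholder control law
        if not k_alive and not e_active:
            missed += 1                        # E notices a missing watchdog
            if missed >= watchdog_timeout:
                e_active = True                # E answers from the next period
        applied.append(actuator.step(control_word))
    return applied


print(simulate([1, 2, 3, 4, 5], k_fails_at=2))
```

In this run, K fails before the third sample, so one control packet is lost; the actuator repeats the previous action for that period, and E supplies the control words from the fourth sample onwards.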
This switching mechanism is susceptible to failure. For example, if the inter-communication between K and E fails, E will assume that K has failed and hence, will take over its tasks. Such a conflict will cause a system failure. More details about unsuccessful reconfiguration can be found in . Furthermore, the reconfiguration process in control systems is covered in .
Since the success of the reconfiguration process is not guaranteed, it has to be taken into account in the reliability model. In the literature, the probability of successful detection/reconfiguration is called coverage [24,25,28]. The coverage is a parameter determined by the user and incorporated in reliability/availability models. It is known that a small mistake in the calculation of the coverage can lead to misleading reliability/availability estimations . Also, system reliability is expected to decrease with a decrease in coverage.
A reconfiguration scheme is described next that aims at reducing the effect of coverage on the reliability of the control function Rcontrol-FT.
Details of the New Scheme
Let K1 and
Figure 2 shows this Markov model. The name of any state indicates the operational components in that state. Remember that λE is the failure rate of the Entertainment Server and λK is the failure rate of the Controller. The initial state is
Figure 2. Markov model.
In state 2E, both Controllers have already failed and one of the Entertainment Servers is controlling both wagons. If either E1 or E2 fails, the control function is switched to the remaining operational server. Here again, the coverage is involved in the transition as shown in Figure 2.
The system can be described by the Chapman-Kolmogorov equations. The two row vectors are obvious and the transition matrix T is as shown above.
Given that the system starts in state
4. Efficiency of the New Scheme
In this section, it is proved that the novel reconfiguration scheme described above increases system reliability (Rcontrol-FT). Conventionally, the entire system undergoes reconfiguration in the event of a Controller/Entertainment Server failure. Such a system would be modeled by the CTMC depicted in Figure 3. In this model, it is assumed, for simplicity, that λK = λE = λ. Consequently, the failure of any of the four Controllers/Entertainment Servers may cause a system failure with a probability (1−c), where c is the coverage parameter discussed above. Any state in Figure 3 indicates the number of operational components in that state. A component can be a Controller or an Entertainment Server. This is why the initial state is called “
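A minimal numerical sketch of such a conventional model follows. It assumes states 4, 3, 2, 1 and F (system failure), a transition from state n to state n−1 at rate nλc (covered failure) and to F at rate nλ(1−c) (uncovered failure), and a transition from state 1 to F at rate λ. This transition structure is an assumption inferred from the description, not a transcription of Figure 3, and the RK4 integrator and all function names are illustrative choices.

```python
import math


def conventional_generator(lam, c):
    """Generator matrix for states [4, 3, 2, 1, F] under the assumed model."""
    q = [[0.0] * 5 for _ in range(5)]
    for i, n in enumerate((4, 3, 2)):
        q[i][i + 1] = n * lam * c        # covered failure: one fewer component
        q[i][4] = n * lam * (1.0 - c)    # uncovered failure: system fails
        q[i][i] = -n * lam
    q[3][4] = lam                        # last operational component fails
    q[3][3] = -lam
    return q


def derivative(p, q):
    """Chapman-Kolmogorov equations dP/dt = P * Q for a row vector P."""
    size = len(p)
    return [sum(p[i] * q[i][j] for i in range(size)) for j in range(size)]


def conventional_reliability(lam, c, t, steps=2000):
    """Integrate from state 4 with classical RK4 and return 1 - P_F(t)."""
    q = conventional_generator(lam, c)
    p = [1.0, 0.0, 0.0, 0.0, 0.0]
    h = t / steps
    for _ in range(steps):
        k1 = derivative(p, q)
        k2 = derivative([x + 0.5 * h * k for x, k in zip(p, k1)], q)
        k3 = derivative([x + 0.5 * h * k for x, k in zip(p, k2)], q)
        k4 = derivative([x + h * k for x, k in zip(p, k3)], q)
        p = [x + h / 6.0 * (a + 2.0 * b + 2.0 * d + e)
             for x, a, b, d, e in zip(p, k1, k2, k3, k4)]
    return 1.0 - p[4]


# Hypothetical failure rate (per hour) and mission time (hours).
LAM, T = 1e-3, 1000.0
for cov in (1.0, 0.99, 0.9):
    print(f"c = {cov}: R = {conventional_reliability(LAM, cov, T):.6f}")
# Sanity check for perfect coverage: four identical units in parallel.
print(f"closed form for c = 1: {1.0 - (1.0 - math.exp(-LAM * T)) ** 4:.6f}")
```

With c = 1 the model reduces to four identical units in parallel, and with c = 0 the first failure is always fatal, which gives two closed-form checks on the sketch.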
Figure 3. Markov model for the conventional scheme.
Figure 4. Effect of coverage.
The use of Ethernet in Railway Networked Control Systems at the sensor/actuator level is a relatively new research area. Despite the fact that Ethernet is a non-deterministic protocol, it was proven that it would not violate required real-time delays. This concept has been applied in industrial automation as well as in automotive environments before its use in trains.
This paper focuses on the fault-tolerant aspect of a Networked Control System (NCS) in two train wagons. All sensors, controllers and actuators are connected on top of a single Gigabit Ethernet network. Furthermore, wired and wireless entertainment loads are carried on top of the same control network. Reliability is expected to increase because controller failures do not necessarily cause system failure. However, error detection and system reconfiguration need to be successful in order to improve reliability. The coverage parameter quantitatively describes the probability of successful error detection and reconfiguration.
A novel fault-tolerant scheme is developed. This scheme aims at increasing the reliability of the control function in the presence of the coverage parameter. A Markov model is then used to calculate system reliability. This reliability is then compared to that of a conventional fault-tolerant scheme with coverage. It is proven that the proposed scheme has a higher reliability. All results were compared to estimates produced by the SHARPE⁴ software package and were found to be identical.
 F. L. Lian, J. R. Moyne and D. M. Tilbury, “Performance Evaluation of Control Networks: Ethernet, ControlNet and DeviceNet,” Technical Report, UM-UM-ME AM-99-02, Feb. 1999. http://www.eecs.umich.edu/~impact
 J. Nilsson, “Real-Time Control Systems with Delays,” PhD thesis, Department of Automatic Control, Lund Institute of Technology, Lund, Sweden, 1998.
 S. H. Lee and K. H. Cho, “Congestion Control of High-Speed Gigabit-Ethernet Networks for Industrial Applications,” Proceedings of the IEEE ISIE, Pusan, Korea, June 2001, pp. 270-275.
 J. S. Meditch and C. T. A. Lea, “Stability and Optimization of the CSMA and CSMA/CD Channels,” IEEE Transactions on Communications, Vol. 31, No. 6, June 1983, pp. 763-774. doi:10.1109/TCOM.1983.1095881
 P. Pedreiras, L. Almeida and P. Gai, “The FTT-Ethernet Protocol: Merging Flexibility, Timeliness and Efficiency,” Proceedings of the IEEE Euromicro Conference on Real-Time Systems ECRTS, Vienna, Austria, June 2002.
 “EtherNet/IP Performance and Application Guide,” Allen-Bradley, Rockwell Automation Application Solution.
 B. Lounsbury and J. Westerman, “Ethernet: Surviving the Manufacturing and Industrial Environment,” Allen-Bradley white paper, May 2001.
 ODVA, “Volume 1: CIP Common.” http://www.odva.org/10_2/03_events/03_ethernet-homepage.htm
 ODVA, “Volume 2: EtherNet/IP Adaptation on CIP,” Available: http://www.odva.org/10_2/03_events/03_ethernet-homepage.htm
 J. D. Decotignie, “Ethernet-Based Real-Time and Industrial Communications,” Proceedings of the IEEE, Vol. 93, No. 6, June 2005.
 R. M. Daoud, H. M. ElSayed, H. H. Amer and S. Z. Eid, “Performance of Fast and Gigabit Ethernet in Networked Control Systems,” Proceedings of the IEEE Midwest Symposium on Circuits and Systems MWSCAS,
 R. M. Daoud, H. M. ElSayed and H. H. Amer, “Gigabit Ethernet for Redundant Networked Control Systems,” Proceedings of the IEEE International Conference on Industrial Technology ICIT,
 R. M. Daoud, H. H. Amer, H. M. ElSayed and Y. Sallez, “Ethernet-Based Car Control Network,” Proceedings of the Canadian Conference on Electrical and Computer Engineering CCECE, Ottawa, Canada, May 2006, pp. 1031-1034.
 N. Navet, Y. Song, F. Simonot-Lion and C. Wilwert, “Trends in Automotive Communication Systems,” Proceedings of the IEEE, Vol. 93, No. 6, June 2005.
 M. Aziz, B. Raouf, N. Riad, R. M. Daoud, and H. M. ElSayed, “The Use of Ethernet for Single On-board Train Network,” Proceedings of the IEEE International Conference on Networking, Sensing and Control ICNSC, Sanya, China, April 2008.
 M. Hassan, S. Gamal, S. N. Louis, G. F. Zaki and H. H. Amer, “Fault Tolerant Ethernet Network Model for Control and Entertainment in Railway Transportation Systems,” Proceedings of the Canadian Conference on Electrical and Computer Engineering CCECE, Niagara Falls, Canada, May 2008, pp. 771-774.
 M. Hassan, R. M. Daoud and H. H. Amer, “Two-Wagon Fault-Tolerant Ethernet Networked Control System,” Proceedings of the Applied Computing Conference,
 H. Kitabayashi, K. Ishid, K. Bekki and M. Nagasu, New Train Control and Information Services Utilizing Broadband Networks, 2004, Available: www.hitachi.com
 A. Dean, “Embedded Communication Network Pitfalls.” http://www.embedded.com/97/fe30709.htm
 Trains reference list, Siemens AG Transportation systems trains,
 Train Communication Network, IEC 61375, International Electrotechnical Committee,
 H. Kirrmann and P. A. Zuber, “The IEC/IEEE Train Communication Network,” ABB Corporate Research, Baden, Switzerland, Mar/Apr 2001.
 D. P. Siewiorek and R. S. Swarz, “Reliable Computer Systems—Design and Evaluation,” AK Peters, Natick, Massachusetts, 1998.
 K. S. Trivedi, “Probability and Statistics with Reliability, Queuing, and Computer Science Applications,” Wiley, New York, 2002.
 H. H. Amer and R. M. Daoud, “Parameter Determination for the Markov Modeling of Two-Machine Production Lines,” Proceedings of the International IEEE Conference on Industrial Informatics INDIN, Singapore, August 2006, pp. 1178-1182.
 M. Blanke, M. Kinnaert, J. Lunze and M. Staroswiecki, “Diagnosis and Fault-Tolerant Control,” Springer-Verlag,
 T. F. Arnold, “The Concept of Coverage and its Effect on the Reliability Model of a Repairable System,” IEEE Transactions on Computers, Vol. C-22, No. 3, March 1973. doi:10.1109/T-C.1973.223703
¹ IEEE 802.3 Standard.
² IEC 61784-1,2 available at: www.iec.ch.
³ Official site of OPNET: www.opnet.com.
⁴ Official site of SHARPE: http://sharpe.pratt.duke.edu