Transient Solution of Markov Model for Fault Tolerant System with Redundancy
4.1 INTRODUCTION
To upgrade the reliability and performance of a machining system, fault tolerance measures, namely maintenance and redundancy, must be provided during the design and operating stages. An inventory of spares is kept in many real-time machining systems to increase reliability/availability and system capability and to produce the desired output. From the system designer's point of view, knowledge of the optimal number of spares and repairmen, of operative problems and of the scheduling of the machines helps to achieve the mission reliability. The lifetime and repair attributes of the faulty components should be examined in order to predict the performance metrics needed for a better design and grade of service of a machining system operating in a fault tolerant environment. The system engineer therefore seeks the size of the repair crew and the number of standby units that keep the fault tolerant machining system (FTMS) operating despite the failure of some components, so that the desired level of production is achieved.
The machine repair system (MRS) is physically limited in its waiting space. When the queue of machines waiting for repair reaches a certain length, failed machines can be stopped from joining the MRS until a waiting space is freed by a repair completion. When a working machine stops, production is lost. Therefore, to maintain the continuity of production, both redundancy and maintenance are needed. The repairmen remove and repair the faulty units as and when required so that they can be used again. Many queueing theorists and practitioners have contributed research in this direction (cf. Baker, 1973; Gaver and Lehoczky, 1977; Chiang and Niu, 1981; Goel et al., 1985; Agnihothri, 1989; Gupta, 1995; Moustafa, 1998; Wang and Ke, 2000; Jain et al., 2004; Chikara et al., 2006; Jain et al., 2009; Shinde et al., 2011; Jain, 2013; Jain et al., 2016; Jain and Gupta, 2018; Kumar et al., 2019). Jain et al. (2020) investigated the performance metrics of an FTMS that can operate despite the unavailability of the server due to service interruptions caused by vacations and breakdowns.
When the inflow of failed machines and the repair rates are influenced by the status of the repair station and by the workload already present in the system, state dependent rates can be used to describe the queueing model of the machining system. Some efforts in this direction have been made (cf. Jain, 1997a, b; Jain and Baghel, 2001; Jain et al., 2008; Jain and Chauhan, 2010; Singh et al., 2013). Broken down machines demanding repair may balk and/or renege without being repaired because of a heavy workload or for other reasons. Machine repair problems with state dependent rates were studied by Blackburn (1972), Al-Seedy (1992), Jain and Lata (1994), Jain et al. (2000, 2004), Singh and Jain (2007), Jain and Mittal (2008), Maheshwari et al. (2010), Jain et al. (2014), Dhakad and Jain (2016), and Sharma et al. (2017). The state dependent queueing system operating under the control F-policy was studied by Jain and Sanga (2019), who discussed specialized scenarios concerning the machine repair system and the time-sharing system in order to explore the sensitivity of the system descriptors with respect to the indices established.
The main objective of this chapter is to develop a Markov queueing model for an FTMS with standbys and multiple repairmen that incorporates realistic features, namely state dependent rates. The model is described in Section 4.2. The governing equations for the transient states and their solution by the matrix method are given in Section 4.3. Some system indices and a cost function are obtained in Section 4.4. Section 4.5 presents a sensitivity analysis that demonstrates the impact of key system descriptors on the performance indices. Concluding remarks are given in Section 4.6.
4.2 MATHEMATICAL FORMULATION
Consider a fault tolerant system with a group of Q operating and W cold standby machines. The total number of machines in the FTMS is T = Q + W. The FTMS has a repair crew of s repairmen. The lifetimes and repair times of the machines are assumed to have the Markov property, i.e., to be exponentially distributed. The service facility provides repair on a first-come, first-served (FCFS) basis. Each operating machine fails at rate λ, and μ is the repair rate of each failed machine. The constant parameters a and b give the degree of the reneging and balking functions, respectively. Two different scenarios can arise.
4.2.1 Case I: s < W
The effective failure rates and repair rates for the system are considered as follows: and
The governing equations for different transient states are obtained as:
4.2.2 Case II: W ≤ s
In this case, the birth and death rates are given by: and
In this case, the governing equations are given by:
4.2.3 The Limiting Case
When t tends to infinity, i.e., in the steady state, we denote the state probabilities by π_{i} = lim_{t→∞} P_{i}(t). In this case, using a recursive approach, Eqs. (4.3)-(4.9) and (4.12)-(4.18) above provide the explicit product form solution as follows: where
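The recursive product form can be illustrated numerically. The following is a minimal Python sketch for a generic finite birth-death chain, in which state n counts the failed machines. The function name `steady_state` and the rate lists `birth` and `death` are illustrative assumptions rather than the chapter's notation, and the state dependent rates of Cases I and II would have to be supplied by the reader.

```python
def steady_state(birth, death):
    """Stationary distribution of a finite birth-death chain.

    birth[n] is the rate of the move n -> n+1 (n = 0..T-1).
    death[n] is the rate of the move n+1 -> n (n = 0..T-1).
    Both lists are hypothetical inputs; all death rates must be positive.
    """
    if len(birth) != len(death):
        raise ValueError("birth and death must have the same length")
    weights = [1.0]  # unnormalised probability of state 0
    for lam_n, mu_next in zip(birth, death):
        # Detailed balance: pi[n+1] * death[n] = pi[n] * birth[n]
        weights.append(weights[-1] * lam_n / mu_next)
    total = sum(weights)  # normalising constant
    return [w / total for w in weights]


# Illustrative rates only (not taken from the chapter).
pi = steady_state([1.0, 1.0], [2.0, 2.0])
print(pi)
```

Each probability is the previous one multiplied by the ratio of the birth rate to the next death rate, and the normalising sum plays the role of the reciprocal of the empty-state probability.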
4.3 THE TRANSIENT SOLUTION
The Laplace transforms of Eqs. (4.3)-(4.9) can be written as the matrix equation:
Here, C(a) denotes a (T + 1) × (T + 1) real symmetric tridiagonal matrix.
Here dash (') denotes the transpose of the matrix.
The Laplace transform of the set of Eqs. (4.12)-(4.18) yields
Here, D(a) is the T × T tridiagonal matrix obtained by deleting the first row and first column of C(a).
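For a numerical view of the transformed system, the sketch below solves the Laplace-domain forward equations of a generic birth-death chain at a single value of the transform variable. It is written directly from the forward equations and is not symmetrised, so it does not reproduce the chapter's matrix entry by entry. The names `laplace_probabilities`, `birth`, `death` and the variable `z` (used for the transform variable to avoid a clash with the repairmen count s) are assumptions made for illustration, and the chain is assumed to start with no failed machines.

```python
def laplace_probabilities(z, birth, death):
    """Solve (z*I - G^T) p = e_0 for a birth-death generator G.

    birth[n]: rate n -> n+1, death[n]: rate n+1 -> n (n = 0..T-1).
    Returns the Laplace transforms of the state probabilities at z > 0,
    for a chain that starts in state 0 (an assumption of this sketch).
    """
    T = len(birth)
    n_states = T + 1
    diag = []
    for n in range(n_states):
        out_up = birth[n] if n < T else 0.0
        out_down = death[n - 1] if n > 0 else 0.0
        diag.append(z + out_up + out_down)
    lower = [-b for b in birth]  # lower[n]: coefficient of p[n] in row n+1
    upper = [-d for d in death]  # upper[n]: coefficient of p[n+1] in row n
    rhs = [1.0] + [0.0] * T      # unit vector for the initial state

    # Forward elimination of the tridiagonal system (Thomas algorithm).
    for n in range(1, n_states):
        factor = lower[n - 1] / diag[n - 1]
        diag[n] -= factor * upper[n - 1]
        rhs[n] -= factor * rhs[n - 1]

    # Back substitution.
    p = [0.0] * n_states
    p[-1] = rhs[-1] / diag[-1]
    for n in range(n_states - 2, -1, -1):
        p[n] = (rhs[n] - upper[n] * p[n + 1]) / diag[n]
    return p


# Two-state check with illustrative rates 2 (failure) and 3 (repair).
p2 = laplace_probabilities(1.0, [2.0], [3.0])
p3 = laplace_probabilities(0.5, [1.0, 2.0], [3.0, 4.0])
```

A useful sanity check is that the transformed probabilities sum to 1/z, because the time-domain probabilities sum to one at every instant.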
Applying Cramer's rule, we obtain the matrices C_{k}(a) and D_{k}(a) from Eqs. (4.20) and (4.23), respectively, by replacing the k-th column with the RHS unit vector. Now,
and
Since the determinants of C(a) and D(a) have real and distinct zeros, we can rewrite p_{k}(a) and q_{k}(a) in partial fraction form. Also, we have
where
and
It is noted that the zeros of the polynomials γ(a) and δ(a) are the negative eigenvalues of [γ(0)] and [δ(0)]. Let γ_{1}, γ_{2}, ..., γ_{T} and δ_{1}, δ_{2}, ..., δ_{T} be the eigenvalues of [γ(0)] and [δ(0)], respectively. Thus:
where γ_{i} and δ_{i} (i = 1, 2, ..., T) are all negative.
Hence:
Breaking into partial fractions, the RHS of Eqs. (4.31) and (4.32) yield
and
where
and
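The partial fraction step can be made concrete with a small Python sketch. It assumes a rational transform whose denominator is a product of distinct real linear factors, as stated for the determinants above, and computes each coefficient by evaluating the numerator at the pole and dividing by the product of the pole differences. The helper names `partial_fraction_coeffs` and `invert`, and the two-state example, are hypothetical and are not taken from the chapter.

```python
import math


def partial_fraction_coeffs(numer, poles):
    """Coefficients A_i in N(z) / prod_j (z - r_j) = sum_i A_i / (z - r_i).

    numer: polynomial coefficients of N, highest power first, with
           degree lower than len(poles).
    poles: distinct real zeros r_j of the denominator.
    """
    def horner(coeffs, x):
        acc = 0.0
        for c in coeffs:
            acc = acc * x + c
        return acc

    result = []
    for i, r_i in enumerate(poles):
        denom = 1.0
        for j, r_j in enumerate(poles):
            if j != i:
                denom *= (r_i - r_j)
        result.append(horner(numer, r_i) / denom)
    return result


def invert(coeffs, poles, t):
    """Inverse Laplace transform of sum_i A_i / (z - r_i) at time t."""
    return sum(a * math.exp(r * t) for a, r in zip(coeffs, poles))


# Two-state example with failure rate 2 and repair rate 3: the transform
# of the probability of the up state is (z + 3) / (z * (z + 5)).
coeffs = partial_fraction_coeffs([1.0, 3.0], [0.0, -5.0])
p_up_at_0 = invert(coeffs, [0.0, -5.0], 0.0)
```

The pole at zero carries the steady state value, while each negative pole contributes an exponentially decaying term.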
4.4 PERFORMANCE INDICES
The inverse Laplace transforms of Eqs. (4.33) and (4.34) give
and
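The time-domain probabilities can also be cross-checked numerically without inverting any transform. The sketch below uses uniformization, a different technique from the Laplace transform and eigenvalue approach of this chapter, applied to a generic birth-death chain. The names `transient_probabilities`, `birth`, `death`, `start`, `tol` and `max_terms` are assumptions for illustration. The Poisson series is summed naively, so the sketch is only suitable when the product of the largest exit rate and t is moderate.

```python
import math


def transient_probabilities(t, birth, death, start=0, tol=1e-12,
                            max_terms=100000):
    """State probabilities at time t by uniformization.

    birth[n]: rate n -> n+1, death[n]: rate n+1 -> n (n = 0..T-1).
    """
    T = len(birth)
    n_states = T + 1
    # Total exit rate of every state.
    out = [(birth[n] if n < T else 0.0) + (death[n - 1] if n > 0 else 0.0)
           for n in range(n_states)]
    rate = max(out)  # uniformization rate
    v = [0.0] * n_states
    v[start] = 1.0
    if rate == 0.0 or t == 0.0:
        return v

    def step(vec):
        """One jump of the uniformized discrete-time chain."""
        nxt = [0.0] * n_states
        for n, mass in enumerate(vec):
            nxt[n] += mass * (1.0 - out[n] / rate)  # fictitious self-loop
            if n < T:
                nxt[n + 1] += mass * birth[n] / rate
            if n > 0:
                nxt[n - 1] += mass * death[n - 1] / rate
        return nxt

    weight = math.exp(-rate * t)  # Poisson probability of zero jumps
    result = [weight * x for x in v]
    covered = weight              # Poisson mass summed so far
    k = 0
    while 1.0 - covered > tol and k < max_terms:
        k += 1
        v = step(v)
        weight *= rate * t / k
        result = [r + weight * x for r, x in zip(result, v)]
        covered += weight
    return result


# Two-state check with failure rate 2 and repair rate 3 at t = 0.3;
# the closed form for the up state is 0.6 + 0.4 * exp(-5 * 0.3).
p_t = transient_probabilities(0.3, [2.0], [3.0])
```

Agreement between this series and the closed-form exponentials gives an independent check on the computed eigenvalues and partial fraction coefficients.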
We notice that the system reliability in terms of probabilities is
The mean time to system failure (MTTF) is given as
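As a numerical illustration, the MTTF equals the integral of the reliability over time, and for a birth-death chain it can be computed by first-step analysis. The sketch below assumes, for illustration only, that the system fails when the chain first reaches its highest state (all machines down), that this state is absorbing, and that the system starts with no failed machines. The function name `mean_time_to_failure` and its inputs are hypothetical.

```python
def mean_time_to_failure(birth, death):
    """Expected time to first reach state N = len(birth) from state 0.

    birth[n]: rate n -> n+1 for n = 0..N-1 (all positive).
    death[n]: rate n+1 -> n; only death[0..N-2] are used because
              state N is treated as absorbing (an assumption here).
    """
    mttf = 0.0
    up_time = 0.0  # expected time to climb from state n-1 to state n
    for n, lam_n in enumerate(birth):
        mu_n = death[n - 1] if n > 0 else 0.0  # repair rate in state n
        # First-step analysis: m_n = (1 + mu_n * m_{n-1}) / lam_n
        up_time = (1.0 + mu_n * up_time) / lam_n
        mttf += up_time
    return mttf


# Illustrative rates only: two upward steps at rate 1, one repair rate 1.
mttf_example = mean_time_to_failure([1.0, 1.0], [1.0])
print(mttf_example)
```

Here the first upward step takes one time unit on average and the second takes two, because a repair can send the chain back down before the second failure occurs.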
Some system indices, namely, (i) the mean number of operating machines E(Q), (ii) the mean number of cold standby machines E(W), (iii) the mean number of busy repairmen E(B), (iv) the mean number of idle repairmen E(I), and (v) the operating utilization OU, are obtained by:
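The four counting indices can be sketched in Python from any vector of state probabilities. The sketch assumes that state n is the number of failed machines, that a cold standby replaces a failed operating machine immediately, and that min(n, s) repairmen are busy in state n. These assumptions, the function name `performance_indices` and the dictionary keys are illustrative and may differ from the chapter's expressions. The operating utilization OU is left out.

```python
def performance_indices(probs, Q, W, s):
    """Expected counts from state probabilities probs[0..T], T = Q + W.

    State n is taken to be the number of failed machines (assumption).
    """
    T = Q + W
    if len(probs) != T + 1:
        raise ValueError("probs must have Q + W + 1 entries")
    # Operating machines: Q while spares last, then the survivors.
    e_operating = sum(p * min(Q, T - n) for n, p in enumerate(probs))
    # Cold standbys still unused.
    e_standby = sum(p * max(W - n, 0) for n, p in enumerate(probs))
    # Busy repairmen: one per failed machine, up to s.
    e_busy = sum(p * min(n, s) for n, p in enumerate(probs))
    return {
        "E_operating": e_operating,
        "E_standby": e_standby,
        "E_busy": e_busy,
        "E_idle": s - e_busy,
    }


# Illustrative distribution over n = 0..3 failed machines (made up).
idx = performance_indices([0.5, 0.25, 0.125, 0.125], Q=2, W=1, s=1)
print(idx)
```

The same function can be applied either to the transient probabilities at a chosen time or to the steady state probabilities.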
Further, we develop a cost function to obtain the optimum combination of repairmen and spares. The various cost elements per unit time are taken as given below:
 • C_{H}: Expenditure incurred in holding the broken down operating machines.
 • C_{W}: Expenditure incurred in case of broken down standby machines.
 • C_{B}: Expenditure incurred on the busy repairmen.
 • C_{I}: Expenditure incurred on the idle repairmen.
The total expected (profit) cost is obtained as:
4.5 SENSITIVITY ANALYSIS
To examine the sensitivity of the system indices to the key descriptors, we compute numerical results, which are shown in Table 4.1 and Figures 4.1-4.8. For computation, we set the default parameter values as Q = M = 7, W = S = 5, s = R = 3, λ = 0.2, μ = 1, (a, b) = (0.2, 0.4), C_{H} = 12, C_{W} = 16, C_{B} = 20, C_{I} = 30.
TABLE 4.1
Various Performance Measures by Varying W for Q = 20 and (a, b) = (0, 0)
W    λ      E(Q)     E(W)    E(B)    E(C)
4    0.1     2.83    2.26    1.61    119.51
4    0.3     7.75    0.24    2.90    129.26
4    0.5    12.41    0.01    3.00    213.61
4    0.7    15.82    0.00    3.00    281.46
6    0.1     2.83    4.18    1.61    169.74
6    0.3     8.09    0.78    2.90    118.29
6    0.5    13.33    0.04    3.00    193.13
6    0.7    17.12    0.00    3.00    267.44
8    0.1     2.83    6.17    1.61    224.79
8    0.3     8.28    1.74    2.90    124.01
8    0.5    14.08    0.14    3.00    172.10
8    0.7    18.32    0.00    3.00    251.23
Table 4.1 presents the mean number of broken down machines E(Q), the mean number of cold standby machines E(W), the mean number of busy repairmen E(B) and the average cost E(C) for different numbers of spares (W) and failure rates (λ) with (a, b) = (0, 0). As λ increases, E(W) decreases, whereas E(Q) and E(B) increase. As W takes higher values, E(W) increases for λ ≤ 0.5 and E(Q) increases for λ ≥ 0.3, while E(B) is unchanged to two decimal places; E(C) rises with W for λ = 0.1, falls for λ = 0.5 and 0.7, and is not monotone for λ = 0.3. Figure 4.1(a-c) demonstrates the effect of λ on the reliability for (a, b) = (0.5, 0.3), (1, 1) and (1.5, 1.5), respectively. When the failure rate (λ) increases, the reliability decreases. The reliability falls at first and then levels off as time passes. For a given value of λ, the reliability is lower for (a, b) = (0.5, 0.3) than for (a, b) = (1, 1) and (1.5, 1.5), and the reliability for (a, b) = (1, 1) is higher than that for (a, b) = (1.5, 1.5).
The effect of increasing the number of operating units on the reliability for (a, b) = (0, 0), (0.5, 0.3) and (1.5, 1.5) is shown in Figure 4.2(a-c), respectively. As the number of operating machines and the time t increase, the reliability decreases. For constant values of the number of operating machines and t, the reliability shows an increasing trend.
Figure 4.3(a) and (b) show the sensitivity of the reliability to the number of standby units for (a, b) = (0.5, 0.3) and (a, b) = (1, 1), respectively. The reliability takes higher values as the number of standby units increases but decreases as time grows. Furthermore, the reliability for (a, b) = (0.5, 0.3) is larger than that for (a, b) = (1, 1). Figure 4.3(c) displays the effect of the spares on the reliability for (a, b) = (1.5, 1.5). The reliability for W = S = 2 is almost constant, i.e., 1. We also see that as the number of standby units increases, the reliability decreases, and that as time increases, the reliability diminishes and finally takes an almost constant value.
FIGURE 4.2 (a) Effect of number of operating machines on the reliability for (a, b) = (0.5, 0.3); (b) effect of number of operating machines on the reliability for (a, b) = (1, 1); (c) effect of number of operating machines on the reliability for (a, b) = (1.5, 1.5).
FIGURE 4.3 (a) Effect of number of standby machines on the reliability for (a, b) = (0.5, 0.3); (b) effect of number of standby machines on the reliability for (a, b) = (1, 1); (c) effect of number of standby machines on the reliability for (a, b) = (1.5, 1.5).
FIGURE 4.4 Effect of repair rate (μ) on the reliability for (a, b) = (0.5, 0.3).
Figures 4.4 and 4.5 are plotted to reveal the impact of the repair rate (μ) on the reliability for (a, b) = (0.5, 0.3) and (a, b) = (1.5, 1.5), respectively. The reliability increases with the repair rate (μ); the reliability for (a, b) = (0.5, 0.3) is lower than that for (a, b) = (1.5, 1.5). Figure 4.6 represents the reliability for (a, b) = (0.5, 0.3) for different values of the number of repairmen.
Figures 4.7 and 4.8 display the effects of the number of spares and the number of repairmen, respectively, on the total cost function. The total cost is convex with respect to both the number of spares and the number of repairmen, and the minimum cost is achieved for S = 6 and R = 3, as shown in Figures 4.7 and 4.8, respectively.
FIGURE 4.5 Effect of repair rate (μ) on the reliability for (a, b) = (1, 1).
FIGURE 4.6 Effect of number of repairmen on the reliability for (a,b) = (1.5, 1.5).
FIGURE 4.7 Expected cost by varying the number of standbys.
FIGURE 4.8 Expected cost by varying the number of repairmen.
4.6 CONCLUSION
In this chapter, the transient analysis of a Markov model for a fault tolerant machining system with state dependent rates and cold standby support has been carried out. Performance indices such as the average number of broken down machines, the average number of spares, the average number of busy/idle repairmen and the operating utilization are provided. The study supports performance prediction and the estimation of the inventory of spares and the size of the repair crew. The sensitivity analysis demonstrates how system engineers can use the investigation to find the appropriate combination of repairmen and spares at optimum cost.
REFERENCES
Agnihothri, S. R. 1989. Interrelationship between performance measures for the machine repairmen problem. Naval Research Logistic. 36:265-271.
Al-Seedy, R. O. 1992. The service Erlangian machine interference with balking. Microelectronics Reliability. 32:705-710.
Baker, K. N. 1973. A note on operating policies for the queue M/M/1 with exponential startups. INFOR: Information Systems and Operational Research. 1:71-72.
Blackburn, J. D. 1972. Optimal control of a single server with balking and reneging. Management Science. 29:307-319.
Chiang, D. T. and Niu, S. C. 1981. Reliability of a consecutive K-out-of-N: G system. IEEE Transaction on Reliability. 30:87-90.
Chikara, D., Jain, M. and Baghel, K. P. S. 2006. Interdependent machine repair problem with controllable arrival rates and spares. Acta Ciencia Indica Mathematica. 32:569-574.
Dhakad, M. R. and Jain, M. 2016. Finite controllable Markovian model with balking and reneging. International Journal of Science Technology and Engineering. 2:36-45.
Gaver, D. P. and Lehoczky, J. P. 1977. A diffusion approximation solution for repairmen problem with two types of failures. Management Science. 24:71-81.
Goel, L. R., Gupta, R. and Singh, S. K. 1985. Cost analysis of a two units cold standby system with two types of operation and repair. Microelectronics Reliability. 25:71-75.
Gupta, S. M. 1995. Interrelationship between controlling arrival and service in queueing systems. Computer and Operations Research. 22:1005-1014.
Jain, M. 1997a. (m, M) machine repair problem with spares and state-dependent rates: A diffusion process approach. Microelectronics Reliability. 37:929-933.
Jain, M. 1997b. Optimal N-policy for single server Markovian queue with breakdown, repair and state dependent arrival rate. International Journal of Management System. 13:245-260.
Jain, M. 2013. Transient analysis of machining systems with service interruption, mixed standbys and priority. International Journal of Mathematics in Operational Research. 5:604-625.
Jain, M. and Baghel, K. P. S. 2001. A multicomponent repairable system with spares and state-dependent rates. Nepali Mathematical Science Report. 19:81-92.
Jain, M. and Chauhan, D. 2010. Optimal control policy for state dependent queueing model with service interruption, setup and vacation. Journal of Informatics and Mathematical Sciences. 2:171-181.
Jain, M. and Gupta, R. 2018. N-policy for redundant repairable system with multiple types of warm standbys with switching failure and vacation. International Journal of Mathematics in Operational Research. 13:419-449.
Jain, M. and Lata, P. 1994. M/M/R machine repair problem with reneging. Journal of Engineering and Applied Sciences. 13:139-143.
Jain, M. and Mittal, R. 2008. Transient analysis of channel allocation in cellular radio network with balking and reneging. Journal of Computer Society of India. 38:11-18.
Jain, M., Rakhee and Singh, M. 2004. Bi-level control of degraded machining system with warm standbys, setup and vacation. Applied Mathematical Modeling. 28:1015-1026.
Jain, M. and Sanga, S. S. 2019. Admission control for finite capacity queueing model with general retrial times and state dependent rates. International Journal of Mathematics in Operational Research. 1-44. doi: 10.3934/jimo.2019073.
Jain, M., Sharma, G. C., Baghel, K. P. S. and Shinde, V. 2004. Performance modeling of machining system with mixed standby components, balking and reneging. International Journal of Engineering. 17:169-180.
Jain, M., Sharma, G. C. and Rani, V. 2014. M/M/R+r machining system with reneging, spares and interdependent controlled rates. International Journal of Mathematics in Operational Research. 6:665-679.
Jain, M., Sharma, G. C. and Sharma, R. 2008. Performance modeling of state dependent system with mixed standbys and two modes of failure. Applied Mathematical Modeling. 32:712-724.
Jain, M., Sharma, G. C. and Sharma, V. 2009. Machine repair problem with two types of spares, set up and multiple vacations. Ganita. 60:111-125.
Jain, M., Sharma, R. and Meena, R. K. 2020. Performance modeling of fault tolerant machining system with working vacation and working breakdown. Arabian Journal for Science Engineering. 44:2825-2836.
Jain, M., Shekhar, C. and Shukla, S. 2016. A time-shared machine repair problem with mixed spares under N-policy. Journal of Industrial Engineering International. 12:145-157.
Jain, M., Singh, M. and Baghel, K. P. S. 2000. M/M/C/K/N machine repair problem with balking, reneging, spares and additional repairman. GSR. 26-27:49-60.
Kumar, K., Jain, M. and Shekhar, C. 2019. Machine repair system with F-policy, two unreliable servers and warm standbys. Journal of Testing and Evaluation. 47:1-23.
Maheshwari, S., Sharma, P. and Jain, M. 2010. Machine repair problem with K-type warm spares, multiple vacations for repairmen and reneging. International Journal of Engineering Science and Technology. 2:252-258.
Moustafa, M. S. 1998. Transient analysis of reliability with and without repair for K-out-of-N: G systems with M failure modes. Reliability Engineering and System Safety. 59:317-320.
Sharma, Y. K., Sharma, G. C. and Jain, M. 2017. Analysis of a queueing model with balking for buffer sharing in ATM. Analysis. 13:1-11.
Shinde, V., Sharma, G. C. and Jain, M. 2011. A parallel system sustain by standby units with failure in bulk. Quality Control and Applied Statistics. 56:153-154.
Singh, C. J. and Jain, M. 2007. (m, M) machine repair problem with spares and reneging. Pakistan Journal of Statistics. 23:23-35.
Singh, C. J., Jain, M. and Kumar, B. 2013. Analysis of unreliable bulk queue with state dependent arrival. Journal of Industrial Engineering International. 9:1-9.
Wang, K. H. and Ke, J. C. 2000. A recursive method to the optimal control of an M/G/1 queueing system with finite capacity and infinite capacity. Applied Mathematical Modeling. 24:899-914.