Information Processing and Security Systems
Information Processing and Security Systems

Edited by
Khalid Saeed
Jerzy Pejas
Bialystok Technical University, POLAND
Technical University of Szczecin, POLAND
Springer
Khalid Saeed, Bialystok Technical University, POLAND
Jerzy Pejas, Technical University of Szczecin, POLAND

Library of Congress Cataloging-in-Publication Data

Information processing and security systems / edited by Khalid Saeed, Jerzy Pejas.
p. cm.
ISBN-13: 978-0-387-25091-5 (HB : alk. paper)
ISBN-10: 0-387-25091-3 (HB : alk. paper)
ISBN-10: 0-387-26325-X (e-book)
ISBN-13: 978-0-387-26325-0 (e-book)
1. Computer networks - Security measures. 2. Information storage and retrieval systems - Security measures. 3. Computers - Access control. I. Saeed, Khalid. II. Pejas, Jerzy, 1954-
TK5105.59 .I525 2005
005.8 - dc22
2005049008 © 2005 Springer Science+Business Media, Inc. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, Inc., 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed in the United States of America. 9 8 7 6 5 4 3 2 1 springeronline.com
SPIN 11352075
Table of Contents

Preface ... ix
Foreword ... xi
PART I - DIGITAL IMAGE AND SIGNAL PROCESSING

Fourier Descriptor-Based Deformable Models for Segmentation of the Distal Femur in CT
Eric Berg, Mohamed Mahfouz, Christian Debrunner, Brandon Merkl, William Hoff ... 3

Hierarchical Segmentation of Sparse Surface Data Using Energy-Minimization Approach
Raid Al-Tahir ... 13

Interactive Real-time Image Analysis System for Distant Operation
Mahinda P. Pathegama, Ozdemir Gol ... 23

Analysis/Synthesis Speech Model Based on the Pitch-Tracking Periodic-Aperiodic Decomposition
Piotr Zubrycki, Alexander A. Petrovsky ... 33

Bio-inspired voice activity detector based on the human speech properties in the modulation domain
A. Shadevsky, A. Petrovsky ... 43

A New Step in Arabic Speech Identification: Spoken Digit Recognition
Khalid Saeed, Mohammad K. Nammous ... 55

Split Vector Quantization of Psychoacoustical Modified LSF Coefficients in Speech Coder Based on Pitch-Tracking Periodic-Aperiodic Decomposition
Alexander Petrovsky, Andrzej Sawicki, Alexander Pavlovec ... 67

New results in 3D views of polyhedron generation on view sphere with perspective projection
M. Frydler, W.S. Mokrzycki ... 77

Gram-Schmidt Orthonormalization-Based Color Model for Object Detection
Mariusz Borawski, Pawel Forczmanski ... 87

Eyes detection with motion interpretation
Arkadiusz Kogut ... 95

Financial Distress Prediction Using Different Pattern Recognition Methods
Wiesław Pietruszkiewicz, Leonard Rozenberg ... 103

Genetic algorithms applied to optimal arrangement of collocation points in 3D potential boundary-value problems
Eugeniusz Zieniuk, Krzysztof Szerszen, Agnieszka Boltuc ... 113

PART II - COMPUTER SECURITY AND SAFETY

Fast Computation of Approximation Tables
Krzysztof Chmiel ... 125

Cryptographic Properties of Some Cryptosystem with Modulation of the Chaotic Sequence Parameters
Stefan Berczynski, Yury A. Kravtsov, Jerzy Pejas, Adrian Skrobek ... 135

Keys distribution for asymmetric cryptographic systems
Eugeniusz Kuriata ... 149

Two-pattern test generation with low power consumption based on LFSR
Miroslaw Puczko, Wiaczeslaw Yarmolik ... 159

Unauthorized servers for online certificates status verification
Witold Mackow, Jerzy Pejas ... 167

Micropayments with Privacy - a New Proposal for E-commerce
Krzysztof Szczypiorski, Aneta Zwierko, Igor Margasinski ... 175

A model-based approach to analysis of authentication protocols
Janusz Gorski, Marcin Olszewski ... 187

Accessibility of information in real-time systems
Tomasz Hebisz, Eugeniusz Kuriata ... 197

Protocol for Certificate Based Access Control Policies Description Language
Jerzy Pejas, Pawel Sukiennik ... 207

Impact of the address changing on the detection of pattern sensitive faults
Bartosz Sokol, Ireneusz Mrozek, Wiaczeslaw Yarmolik ... 217

Software IP Protection Based on Watermarking Techniques
Vyacheslav Yarmolik, Siarhei Partsianka ... 227

Probabilistic Analysis of Operational Security for Network Systems
Jolanta Koszelew ... 235

Quality of Service Requirements in Computer Networks with Blocking
Walenty Oniszczuk ... 245

PART III - ARTIFICIAL INTELLIGENCE-ORIENTED TRENDS AND APPLICATIONS

Genetic BDD-oriented Pattern Classifiers
Witold Pedrycz, Zenon A. Sosnowski ... 257

A fuzzy way to evaluate the qualitative attributes in bank lending creditworthiness
Gisella Facchinetti, Giovanni Mastroleo ... 269

Multidimensional Systems, Signals, Circuits, and Repetitive Processes: Theory, Applications, and Future Trends
K. Galkowski, A. Kummert ... 283

Modelling using probabilistic algorithms
A. Borowska, W. Danko, Joanna Karbowska-Chilinska ... 303

Fuzzy Parametric Integral Equations System in modelling of polygonal potential boundary problems described by the Laplace equation
Eugeniusz Zieniuk, Andrzej Kuzelewski ... 317

From Integrated Circuits Technology to Silicon Grey Matter: Hardware Implementation of Artificial Neural Networks
Kurosh Madani ... 327

A Tiny Flat Island in a Huge Lake - How Can We Search for It if Completely Flatland Elsewhere?
Akira Imada ... 353

A Soft Computing Based Approach Using Signal-To-Image Conversion for Computer Aided Medical Diagnosis (CAMD)
Amine Chohra, Nadia Kanaoui, V. Amarger ... 365

The prediction of behaviours of chaotic dynamical systems in 3D state space
M. Pankiewicz, R. Mosdorf ... 375

Idiotypic Networks as a Metaphor for Data Analysis Algorithms
Slawomir T. Wierzchon ... 389

Global learning of decision trees by an evolutionary algorithm
Marek Krętowski, Marek Grzes ... 401

Ships' domains as collision risk at sea in the evolutionary method of trajectory planning
Roman Smierzchalski ... 411

Inputs' Significance Analysis with Rough Sets Theory
Izabela Rejer ... 423

Semi-Markov process in performance evaluation of asynchronous processors
Wojciech Kadlubowski ... 433

Organization of the modeling and simulation of the discrete processes
Emma Kushtina, Alexandre Dolgui, Bartlomiej Malachowski ... 443

The Jeep Problem, searching for the best strategy with a genetic algorithm
Przemyslaw Klęsk ... 453

Evaluation of operation and state of an object using artificial intelligence tools
Henryk Piech, Aleksandra Ptak, Marcin Machura ... 465
Preface This book is based on the most recent results of collaborative research in the field of Information Processing Systems conducted by both university professors and young computer scientists. The extensive research has yielded some novel findings that might be directly employed in further technological innovations. The work reveals groundbreaking scientific achievements and indicates their practical applications. The topics have been selected so as to cover three basic fields in Computer Science. The contents encompass three parts. Part I contains twelve chapters on Digital Image and Signal Processing. Part II consists of thirteen chapters on Computer Security and Safety. Part III includes seventeen chapters dealing with Artificial IntelligenceOriented Trends and their Applications. Throughout the book, a great emphasis is placed on theory as well as practice. The contributions not only reflect invaluable experience of eminent professors in relevant areas but also point out new methods and approaches developed by computer scientists and researchers. Most of the contributions are extended versions of original papers on the topics mentioned above, which were introduced at recent international conferences. These works were reviewed by two or three referees, who recommended publishing them in their modified versions. We expect that this book will throw some new light on unresolved problems and will inspire the reader to greater challenges. Hopefully it will be an effective tool for both senior and young researchers.
Editors, K. Saeed and J. Pejas
Foreword

This book outlines new trends that have emerged in scientific research in the field of Computer Science. It provides a forum for academic work and research conducted by both distinguished and young computer scientists. It is a pleasure to recognize the fact that a great number of academic teachers working for the Faculty of Engineering in Elk at the University of Finance and Management in Bialystok have been actively committed to research studies. Undoubtedly, their remarkable efforts, which are presented in this book, make a valuable contribution to science. This confirms that the newly founded academic centre, which Elk became only at the beginning of the 21st century, constantly raises its educational standards. This book is the result of a fruitful collaboration between the Department of Computer Engineering in the Faculty of Engineering in Elk and the experienced Departments of Computer Science at Bialystok University of Technology and the Technical University of Szczecin. I would like to extend my sincere congratulations to the editors of the book, Dr Khalid Saeed and Dr Jerzy Pejas, on their joint endeavor in producing a work at such a high level.
Professor Jozef Szablowski, President, University of Finance and Management in Bialystok
PART I - Digital Image and Signal Processing
Fourier Descriptor-Based Deformable Models for Segmentation of the Distal Femur in CT

Eric Berg (1), Mohamed Mahfouz (2, 3), Christian Debrunner (1), Brandon Merkl (1), William Hoff (1)

(1) Colorado School of Mines, Golden, Colorado, USA, e-mail: {eberg,cdebrunn,bmerkl,whoff}@mines.edu
(2) University of Tennessee, Knoxville, Tennessee, USA, e-mail: [email protected]
(3) Oak Ridge National Laboratories, Oak Ridge, Tennessee, USA
Abstract: Anatomical shapes present a unique problem in terms of accurate representation and medical image segmentation. Three-dimensional statistical shape models have been extensively researched as a means of autonomously segmenting and representing models. We present a segmentation method driven by a statistical shape model based on a priori shape information from manually segmented training image sets. Our model is comprised of a stack of two-dimensional Fourier descriptors computed from the perimeters of the segmented training image sets after a transformation into a canonical coordinate frame. Our segmentation process alternates between a local active contour process and a projection onto a global PCA basis of the statistical shape model. We apply our method to the segmentation of CT and MRI images of the distal femur and show quantitatively that it recovers bone shape more accurately from real imagery than a recently published method recovers bone shape from synthetically segmented imagery.

Keywords: automatic 3D image segmentation, Fourier shape descriptors, principal components analysis, statistical shape model, active contours, snakes.
1.1 Introduction Current methods in threedimensional image segmentation typically employ statistical shape models, first developed by Cootes and Taylor [1] as a means to
incorporate a priori shape information. A statistical shape model is trained by considering the canonical parameterization of a set of similar shape instances, or training shapes. Performing a principal components analysis (PCA) on the parameterized shapes highlights the statistical modes of variation in the shape, allowing for a possible reduction in the dimension of the model shape space, discussed in more detail in Section 2.3. A well-constructed statistical model will provide a shape constraint on subsequent segmentations of new images.

The primary original contributions of this work are our segmentation algorithm and our statistical shape representation. The segmentation algorithm iterates two steps: a local active contour fitting of shape curves on slices through the dataset, and a global projection of the resulting shape curves onto the PCA basis of our statistical shape model. Prior methods (see e.g., [5]) simultaneously optimize for both local (image) constraints and global (statistical model) constraints. Our method allows us to include a final local (active contours) optimization step, which finds a solution close to (in the image space) but not in the model subspace determined by PCA. Our results show that this final local optimization substantially improves the accuracy of the resulting model.

Our statistical shape representation is based on the principal components of the contour shape estimated on the slices, but PCA is not applied directly to the contour shape. Instead, the contour shapes are represented with Fourier descriptors and then smoothed by removing the high-frequency components, and PCA is applied to the Fourier descriptors collected from the entire bone. This approach reduces the dimensionality of the PCA and restricts the statistical model to smooth bone surfaces.
For the purpose of developing and testing our method, we modeled the human distal femur from 19 sets of 3D images generated by both CT and MRI image modalities (15 CT, 4 MRI); the datasets include both left and right femurs, so we mirrored the left femur across the midsagittal plane as was done in [2, 3], creating additional samples of the right femur.
1.2 Approach In this section, we describe how the statistical models are computed from images and how these models are used in our segmentation approach.
Information Processing and Security Systems
1.2.1 Image Preprocessing

For training and evaluation purposes, each of our nineteen 3D image volumes is manually segmented to model the distal section of the femur. The resulting binary images are rigidly registered to a coordinate system defined in terms of reference points on the bone. We have built a User Interface (UI) to aid the user both in the initial alignment and in visualizing the segmentation process. To perform this alignment, the user first loads an image data set and specifies the voxel size. The user can then visualize the image data as one or more orthogonal planes (axial, coronal, or sagittal) or as an entire volume, when used with volume visualization hardware such as VolumePro. The user then loads the appropriate mean model and, using the visual manipulators shown below in Fig. 1, can scale, rotate or translate the model into alignment with the image data.
1.2.2 Fourier Descriptors

Based on the manual segmentation, we compute binary images at slices of constant z-value in the registered volume. Each binary image contains at least one region representing the bone cross-section; in the case of images intersecting the condyles, two distinct regions may be present. The regions on each slice are closed contours, allowing for a Fourier descriptor (FD) [6, 7] representation to be computed. A more recent application of FDs involves medical image segmentation [5], where the combined optimization approach discussed in the introduction is used. The most significant features of the contour are captured in the lower-frequency terms of the FDs, so we can produce a more concise description of the contour by eliminating the high-frequency terms. We found through experimentation that 32 coefficients sufficiently describe the shape of all distal femur cross-sections.
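As a rough sketch of this representation (our own illustration, not the chapter's code; the function names and the normalization convention are our choices), the FDs of a closed contour can be obtained from an FFT of the complex boundary coordinates, keeping only the 32 lowest-frequency coefficients:

```python
import numpy as np

def fourier_descriptors(contour, n_coeffs=32):
    """Truncated Fourier descriptors of a closed contour given as an
    (N, 2) array of (x, y) perimeter points, N >= n_coeffs."""
    z = contour[:, 0] + 1j * contour[:, 1]   # encode points as complex numbers
    coeffs = np.fft.fft(z) / len(z)          # normalized DFT of the contour
    half = n_coeffs // 2
    # Keep the lowest positive and negative frequencies, which capture
    # the most significant shape features.
    return np.concatenate([coeffs[:half], coeffs[-half:]])

def reconstruct_contour(descriptors, n_points=256):
    """Recover a smoothed contour from the truncated descriptors."""
    half = len(descriptors) // 2
    full = np.zeros(n_points, dtype=complex)
    full[:half] = descriptors[:half]         # low positive frequencies
    full[-half:] = descriptors[half:]        # low negative frequencies
    z = np.fft.ifft(full) * n_points
    return np.column_stack([z.real, z.imag])
```

Truncating to 32 coefficients acts as the low-pass smoothing described above: high-frequency boundary noise is discarded while the overall cross-section shape is preserved.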
Fig. 1. Screenshot of User Interface.
Fig. 2. Model of the distal femur demonstrating the slice structure. Note that the slices shown are a subsampling of the full set of slices
1.2.3 Shape Model Representation and Construction

Each femur model is composed of M contour slices parallel to the axial plane and stacked in the z-direction (Fig. 2). Each of the M contours is described by 32 FDs. A contour is represented by a single vector formed by concatenating the real and imaginary parts of the FD coefficients that represent it. The model vector m is then formed by concatenating the M contour vectors to form a vector of length n = 64M. The modes of variation among the training models are captured by a PCA. Any shape instance in the training set can be represented by the mean model and a linear combination of the eigenvectors [8]. Typically, a model can be accurately approximated by t < T principal components (PCs), corresponding to the t largest eigenvalues. A model is then approximated as

    m ≈ m̄ + Σ_{i=1}^{t} b_i p_i                                  (1)

where b_i is the coefficient multiplying the i-th PC p_i and m̄ is the mean of the training model vectors. The p_i form the principal basis, or shape space, of the model. Thus, our representation of the statistical model consists of the mean model vector and the vectors corresponding to the first t PCs. Section 3 will provide more insight into the effects of model reduction.
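A minimal sketch of this model construction, assuming the training model vectors have already been assembled (the helper names and the SVD-based route to the PCA are our own choices, not the chapter's implementation):

```python
import numpy as np

def build_shape_model(training_vectors, t):
    """Build a PCA shape model from a (T, n) array of model vectors,
    each row the concatenated real/imaginary FD parts of all M slices.
    Returns the mean vector and the first t principal components."""
    X = np.asarray(training_vectors, dtype=float)
    mean = X.mean(axis=0)
    # PCA via SVD of the centered data; rows of Vt are the PCs p_i,
    # ordered by decreasing singular value (i.e. decreasing eigenvalue).
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:t]

def approximate(model_vector, mean, pcs):
    """Shape-space approximation of Equation (1):
    m ~ mean + sum_i b_i p_i, with b obtained by projection."""
    b = pcs @ (model_vector - mean)
    return mean + pcs.T @ b, b
```

With t equal to the rank of the centered training matrix, every training shape is reproduced exactly; smaller t trades fidelity for a more compact shape space.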
Information Processing and Security Systems
1.2.4 3D Image Segmentation

A 3D image volume that is not included in the training set can be segmented by iteratively deforming the FD statistical model until it closely matches the correct features in the 3D image set. The statistical model ensures a segmentation consistent with the training models. Prior to segmentation, the new image set must be registered by hand to the coordinate frame as described in Section 2.1. A search for the optimal segmentation consists of separate local and global optimizations; a local search (unconstrained by the statistical model) is performed independently on each slice using an active contours algorithm, and a global search takes the form of a projection of the FDs of the active contours solutions onto the principal basis of the statistical model. Since the two searches are forced to intersect periodically, the constraints from both optimization functions are imposed. The approach is outlined below in Algorithm 1.

Most previous work in this area involves minimizing a single objective function that combines terms driven by the image data and terms driven by the prior shape information. This approach will often result in a solution that is not the true global minimum, but a spurious local minimum. Our algorithm also allows the final solution to lie outside the principal component space. This is important since one cannot otherwise extract new models that extend beyond the training set. By separating the local and global searches, we can find solutions that the combined optimization cannot find.

Algorithm 1. Segmentation procedure
1) Initialize active contours (see Local Search) based on the mean model m̄
2) Allow active contours to deform independently on each image for h iterations
3) Convert active contours to Fourier descriptors
4) Project FDs onto the principal basis
5) Form a new model from the projected FDs
6) Check Equation (2) for convergence (see below)
   a) if Δ < threshold, repeat steps 2 and 3 only
   b) if Δ > threshold, repeat steps 2-6 with the new model from step 5 as the new initialization for active contours

The alternating process of active contour deformation and shape space projection continues until the following convergence function reaches some empirically determined threshold:

    Δ = Σ_{i=1}^{t} (b_i^(n) - b_i^(n-1))²,   n > 1,              (2)
where Δ represents the change in the model parameters from one iteration to the next. When this squared sum is less than the specified threshold, the model is assumed to have converged on a shape space solution. One additional optimization of the active contours is performed after this convergence.

Local Search. The local search for a solution occurs in the individual 2D image slices using active contours. Active contours, originally developed by Kass et al. [4], employ an energy-minimization approach to detect features in images such as lines and edges. The technique typically works best with images that have clearly defined edges and minimal noise; otherwise a good initialization is required for an active contour to converge on an acceptable solution. In the case of shape modeling, we have a good initialization provided by the model information from the previous iteration (steps 1 or 6b in Algorithm 1). The active contours method minimizes an energy functional consisting of an internal energy dependent on the spacing of and bend between the contour points, and an external energy dependent on the magnitude of the image gradient under the contour. The combined internal and external energies tend to drive the contour toward the image gradients, while maintaining control over the shape of the contour.

Global Search. The global search for a solution occurs in the principal space by computing the parameters b_i, projecting the FDs from step 3 in Algorithm 1 onto the principal basis. This has the effect of constraining the active contours solution to be in the shape space; thus from one iteration to the next, the local solution is projected into shape space to find a global solution. After several iterations, the parameters b_i will vary negligibly, indicating that a global solution has been found. As previously discussed, this may not represent the optimal solution since the actual solution might fall slightly outside the shape space.
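The alternation of local and global searches, with the convergence test of Equation (2), might be skeletonized as follows (a schematic of our own; the active-contour step and the contour-to-FD conversion are left as abstract callables, and all names are hypothetical):

```python
import numpy as np

def segment(initial_b, mean, pcs, local_refine, fds_of,
            threshold=1e-4, max_iter=50):
    """Schematic of Algorithm 1.  `local_refine` deforms the contours
    implied by a model vector against the image data (active contours);
    `fds_of` converts the refined contours back to an FD model vector."""
    b = np.asarray(initial_b, dtype=float)
    for _ in range(max_iter):
        model = mean + pcs.T @ b                 # current shape-space model
        refined = local_refine(model)            # steps 2-3: local fit
        b_new = pcs @ (fds_of(refined) - mean)   # step 4: project onto basis
        delta = np.sum((b_new - b) ** 2)         # Equation (2)
        b = b_new
        if delta < threshold:
            break
    # Step 6a: one final unconstrained local optimization, so the result
    # may lie close to, but outside, the shape space.
    return local_refine(mean + pcs.T @ b)
```

The final call outside the loop is what distinguishes this scheme from a combined single-objective optimization: the returned solution is anchored near the shape space but is not forced to lie in it.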
With this optimal global solution as an initialization for one final local optimization, i.e. an additional active contours adjustment, a solution that is close to the shape space, but optimized to fit the 3D image information, can be found.
1.3 Results In order to verify the ability of our modeling and segmentation techniques to extract anatomical features from medical images, we performed experiments with the 19 CT and MRI datasets. The datasets are each segmented manually and the resulting binary images are transformed and resampled as described in section 2.1. For testing purposes we sample the image volumes to a voxel size of 0.75 x 0.75 x
Information Processing and Security Systems
3.00 mm, resulting in 40 images spanning the distal 120 mm of the femur. The original grayscale images are transformed and resampled by the same transformation as the binary images so that subsequent model-based segmentations will be performed in the model domain. Each of the 19 image sets was autonomously segmented according to Algorithm 1 with a model defined via a leave-one-out approach (see e.g., [11]), where the remaining 18 manual segmentations are used as training shapes. To illustrate the effect of the number of principal components on model-based segmentation, each of the 19 image sets is segmented multiple times, each time with a shape model defined by the remaining 18 datasets with a varying number of PCs (t = 1, 2, ..., 18). This results in 19 × 18 = 342 total segmentations of the 19 datasets. We compare the autonomous model-based segmentation results to the manual segmentation results to obtain a measure of the quality of the segmentation. This comparison is performed in the 3D image domain by computing the shortest 3D Euclidean distance between the two voxel "surfaces" for each voxel on the autonomously segmented set. For each of the 342 segmentations, we obtain a mean separation distance and a maximum, or worst-case, distance. Ideally, these distances are zero for a perfect fit, assuming that the solution we seek is the same as the manual segmentation. We measured the accuracy of our system both with and without the final local optimization of step 6a in Algorithm 1 to determine if this final local optimization produced a better result than a solution constrained to be fully within the shape space. Figs. 3 and 4 show, respectively, the mean and maximum distance averaged
Fig. 3. Mean separation distance between autonomously segmented 3D image volumes and corresponding manually segmented volume. The solid line represents the solution where the active contours are adjusted after the final shape space projection; the dashed line is the solution resulting from the final shape space projection.
Fig. 4. Maximum separation distance between autonomously segmented 3D image volumes and the corresponding manually segmented volume. The solid line represents the solution where the active contours are adjusted after the final shape space projection; the dashed line is the solution resulting from the final shape space projection.

over all segmentations as a function of the number of PCs in the statistical model. The two curves in each plot represent the solution with a final active contours optimization and the solution constrained to lie in the shape space. The mean separation distance shown in Fig. 3 demonstrates that an increase in the number of PCs yields a solution closer to the manual segmentation. Additionally, the final active contours adjustment pushes the mean distance between the expected surface and the actual surface to approximately half of the mean distance found by constraining the solution to be fully within the shape space. We can see the same trend for the maximum separation. Note that the mean distance levels off between 11 and 12 PCs, indicating that an increase in the number of PCs beyond 11 or 12 may not improve the segmentation. Larger training sets may of course require more PCs for accurate representation. Our best mean and maximum distance measures compare well to the best values presented in [3] and are summarized in Table 1. In fact, our best mean value is five times less than the best mean reported by Kaus et al. [3]. In this work the authors are modeling full femurs, which are more variable than the distal femurs we are modeling. However, we must also note that the authors of [3] are not solving the automated image segmentation problem; rather, they are fitting a triangular mesh to manually segmented noiseless binary 3D images to form a surface model. We apply our method to the automated segmentation of noisy 3D
medical images and evaluate our results using the 3D distances between the automatically segmented model and a manually segmented model. The authors of [3] evaluate their results using the 3D deviation between the automatically computed mesh (vertices) and the manually segmented binary 3D images (voxels). Thus, while our method is solving the more difficult problem of automatic image segmentation, the measures for evaluating the results are similar enough that a comparison to their results is valid.
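The separation measure used in this evaluation (the shortest 3D Euclidean distance from each voxel on the automatic surface to the manual surface, summarized by its mean and maximum) can be sketched as follows; this is a brute-force illustration of our own, and a spatial index such as a KD-tree would be needed at realistic volume sizes:

```python
import numpy as np

def surface_distances(auto_pts, manual_pts):
    """Mean and maximum nearest-neighbor 3D Euclidean distance from each
    point of the automatically segmented surface to the manual surface."""
    a = np.asarray(auto_pts, dtype=float)[:, None, :]    # (Na, 1, 3)
    m = np.asarray(manual_pts, dtype=float)[None, :, :]  # (1, Nm, 3)
    # Pairwise distances, then the shortest distance per automatic voxel.
    d = np.sqrt(((a - m) ** 2).sum(axis=2)).min(axis=1)
    return d.mean(), d.max()
```

Both summary values are zero exactly when every automatic surface voxel coincides with some manual surface voxel, matching the ideal case described above.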
                 max Euclidean distance (mm)    mean Euclidean distance (mm)
  Our results    3.35 (13)                      0.28 (12)
  Kaus et al.    ~4.50 (20)                     ~1.40 (20)

Tab. 1. Comparison of results with Kaus et al. The value in parentheses is the lowest number of PCs at which the value occurs.
1.4 Conclusion and Future Work

The experiments have demonstrated that our model building method accurately and compactly captures the shape of the distal femur, and our segmentation procedure successfully segments new datasets (Fig. 5). A comparison of our results to the previously published values of Kaus et al. [3] indicates that the accuracy of our segmentation method compares well to earlier methods. With only 12 PCs we were able to segment a 3D image volume with an average deviation of only 0.28 mm, a deviation of less than a voxel in most 3D medical images.
1.5 References

[1] T. Cootes and C. Taylor, "Active shape models—'Smart snakes'," Proc. British Mach. Vision Conf., pp. 266-275, 1992.

[2] C. Taylor, T. Cootes, A. Hill, and J. Haslam, "Medical Image Segmentation Using Active Shape Models," Proc. Medical Imaging Workshop, Brussels, Belgium, pp. 121-143, 1995.

[3] M. Kaus, V. Pekar, C. Lorenz, R. Truyen, S. Lobregt, and J. Weese, "Automated 3-D PDM Construction from Segmented Images Using Deformable Models," IEEE Transactions on Medical Imaging, Vol. 22, No. 8, pp. 1005-1013, Aug 2003.
Fig. 5. Segmentation result. Initial mean model (left) and final, autonomously segmented model (right)
[4] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active contour models," Int. J. Comput. Vis., vol. 1, pp. 321-331, 1987.

[5] L. Staib and J. Duncan, "Boundary Finding with Parametrically Deformable Models," IEEE PAMI, Vol. 14, No. 11, pp. 1061-1075, Nov 1992.

[6] C. Zahn and R. Roskies, "Fourier Descriptors for Plane Closed Curves," IEEE Transactions on Computers, Vol. 21, No. 3, pp. 269-281, Mar 1972.

[7] E. Persoon and K. Fu, "Shape Discrimination Using Fourier Descriptors," IEEE Trans. on Sys., Man, and Cyber., vol. SMC-7, no. 3, pp. 629-639, Mar 1977.

[8] T. Cootes, A. Hill, C. Taylor, and J. Haslam, "The Use of Active Shape Models for Locating Structures in Medical Images," Image and Vision Computing, vol. 12, no. 6, pp. 355-365, Jul 1994.

[9] T. Cootes, G. Edwards, and C. Taylor, "Active appearance models," in Proc. Eur. Conf. Computer Vision, vol. 2, H. Burkhardt and B. Neumann, Eds., pp. 484-498, 1998.

[10] T. Hutton, B. Buxton, P. Hammond, and H. Potts, "Estimating Average Growth Trajectories in Shape-Space Using Kernel Smoothing," IEEE Transactions on Medical Imaging, Vol. 22, No. 6, pp. 747-753, Jun 2003.

[11] A. Kelemen, G. Szekely, and G. Gerig, "Elastic Model-Based Segmentation of 3-D Neuroradiological Data Sets," IEEE Transactions on Medical Imaging, Vol. 18, No. 10, pp. 828-839, Oct 1999.
Hierarchical Segmentation of Sparse Surface Data Using Energy-Minimization Approach

Raid Al-Tahir
Centre for Caribbean Land and Environmental Appraisal Research (CLEAR)
Department of Surveying and Land Information
The University of the West Indies, Trinidad and Tobago
[email protected] Abstract:
The main objective for this research is to develop an algorithm that produces a dense representation of a surface from a sparse set of observations and facilitates preliminary labeling of discontinuities in the surface. The solution to these issues is of a great interest to the new trends and applications in digital photogrammetry, particularly for largescale urban imagery. This study adopts the approach of a concurrent interpolation of the surface and detection of its discontinuities by the weak membrane. The solution was achieved through developing a multigrid implementation of the Graduate NonConvexity (GNC) algorithm. The conducted experiments proved that the developed method is adequate and applicable for dealing with largescale images of urban areas as it was successful in producing a realistic surface representation and fulfilling other set criteria.
Key words: Surface Reconstruction, Discontinuity Detection, Multigrid Regularization.

1 Introduction
Surface interpolation is a common and important task for several disciplines and applications in geosciences and engineering. This topic has regained research interest with the emergence of new trends and technologies in the collection of geospatial data, such as digital photogrammetry and lidar. Such systems provide a large amount of data in the form of discrete points of three-dimensional coordinates. Photogrammetry is a three-dimensional coordinate measuring technique that uses mainly aerial photographs as the fundamental medium for measurements. The basic mode of operation is based on taking photographs from at least two different view points; light rays are then traced back from each photograph to points on the
object and mathematically intersected to produce the three-dimensional coordinates for these points [1]. The advances in computer technology, digital cameras, and the increasing resolution of recorded images have allowed more optimal utilization and efficiency, as well as the development of new applications [2].

The tasks of surface interpolation and surface analysis play an important role in reconstructing the surface based on photogrammetric data. Properties of visual surfaces, such as breaklines and abrupt discontinuities in surface normals, must be made explicit. They play a critical part in the success of the earlier processes of image matching and stereopsis. These processes are usually executed over a hierarchy of image resolutions (image pyramid) [3]. Moreover, the performance of any surface interpolation method will suffer greatly without incorporating such discontinuities. The matter becomes more critical particularly for large-scale urban imagery, normally associated with a higher percentage of occlusion, repeated patterns, and many surface discontinuities [4].

The large quantity of data generated by a photogrammetric system makes it a challenge to deduce 3D object descriptions from such data. Visual surface discontinuities are definite indications of boundaries of objects on the topographic surface [3]. These boundaries are crucial for furnishing an appropriate surface representation, and for providing the means for surface segmentation and object recognition [5]. Identifying spatial discontinuities is vital in many applications such as segmentation, optical flow, stereo, image reconstruction [6], and extracting high-level cartographic features (e.g., building representations) needed in cartographic analysis and visualization [3]. Despite the significance of surface interpolation and surface analysis, not all algorithmic and implementation aspects for solving the related subtasks are fully developed.
There is no straightforward choice of an optimal surface interpolation method that would accommodate these needs. Thus, the objective of this study is to develop an algorithm that produces a dense representation of a surface from a set of sparse observations and, at the same time, facilitates the preliminary labeling of discontinuities in the surface.
2 Surface Interpolation and Discontinuity Detection
Surface interpolation may be a simple and well-understood routine for some applications, or a more complicated and critical issue for others. The reason is that interpolation is a prediction of what is not known, one that should agree with the data to some extent and behave reasonably between data points. Depending on the density and distribution of the data points as well as on the computational principle, different methods provide different interpretations of the data, and thus different representations of the surface may result [7].
A reasonable expectation for an adopted method of surface interpolation is that it constructs as realistic a surface representation as possible. In addition, it should preserve the essential surface characteristics implied by the observations; this condition is violated when a smooth surface is interpolated over breaklines. Preferably, the adopted algorithm should refrain from introducing new characteristics to the surface, for example new maxima or minima. It should also allow incorporating and utilizing information on discontinuities whenever such information becomes available [4].

The number of methods for interpolating data over a grid is large if one considers all the different variations and solution methods [8]. Several methods were reviewed, tested and evaluated according to the aforementioned criteria. The conclusion was that interpolating a surface by membrane splines fulfills the criteria above and provides explicit information for surface analysis [4]. In this physical/mechanical rendering, depth data are represented by a set of vertical pins scattered within the region; the height of an individual pin corresponds to the elevation of the point. Fitting a surface is then analogous to constraining a membrane (thin film) to pass over the tips of the pins. Such a model generates a minimal-area surface that is continuous, but need not have continuous first partial derivatives [8].

On a different aspect of the issue, since they assume a smooth and continuous surface, simplistic and direct implementations of conventional interpolation methods may not be adequate when a breakline or discontinuity is encountered: the interpolated surface exhibits unrealistic behavior that is manifested as oscillation and overshooting. Hence, discontinuities must be detected at some phase of the interpolation process.
To address the issue of detecting discontinuities in the surface, several research studies have adopted the concept of a "line process", first introduced by [9]. A line process is a set of binary variables located on the lines that connect the grid cells representing the surface. The purpose of a line process is to decouple adjacent cells if their values are different. However, a penalty must be paid when a breakline is introduced. Thus, a breakline will only be introduced when paying the penalty (within a cost or energy function) is less expensive than not having the breakline at all. Presetting the line process variables associated with specific cells can accommodate available information on boundaries.
2.1 Energy Minimization Approach
Because some information is lost in the two-dimensional photographic recording and measuring processes, reconstructing the surface from a discrete set of data is an ill-posed problem [3]. Thus, regularization is normally used to obtain a unique solution to the problem. It involves minimizing an energy function E that essentially includes two functionals. The first functional, D(s), provides a measure of the closeness of the solution (s) to the available data. It is usually taken to be
the sum of the squares of the differences between the interpolated values and the original data. The second functional, S(s), measures the extent to which the solution conforms to the underlying assumption, usually expressed by the smoothness of the solution [8]. The minimization of E is a trade-off between maintaining closeness to the original data and obtaining a smooth surface. The regularization parameter, λ², controls the influence of the two functionals. A penalty function is added to the original energy function E; it takes the form P = α·Σ li, where α is the penalty and li is the line process. This produces a combination of a continuous function for the surface and a discrete one for the lines:

E(s, l) = D(s) + λ²·S(s, l) + α·Σ li    (1)

where the smoothness term S(s, l) is switched off across the cells whose line variable li is set.
The energy functional in equation (1) prefers continuity in the surface, but allows occasional discontinuities if they make for a simpler overall description, a theme called "weak continuity constraints" [10]. This combination facilitates achieving surface reconstruction and discontinuity detection at the same time.
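As a rough illustration, the one-dimensional ("weak string") analogue of this energy can be written down directly. The sketch below assumes lam2 stands for λ² and alpha for the break penalty α; the variable names and the toy data are illustrative, not taken from the paper:

```python
import numpy as np

def weak_string_energy(s, d, l, lam2, alpha):
    """1-D 'weak string' energy: data closeness D(s), smoothness S(s)
    switched off where the line process l marks a break, plus the
    penalty alpha paid for every break introduced."""
    data = np.sum((s - d) ** 2)                         # D(s): closeness to the data
    smooth = lam2 * np.sum(np.diff(s) ** 2 * (1 - l))   # S(s) where no break
    penalty = alpha * np.sum(l)                         # price of the introduced breaks
    return data + smooth + penalty

# A step in the data: introducing one break (l = 1 at the step) is cheaper
# than smoothing across it, so the breakline is accepted.
d = np.array([0.0, 0.0, 0.0, 5.0, 5.0, 5.0])
s = d.copy()
l_smooth = np.zeros(5)
l_break = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
e_smooth = weak_string_energy(s, d, l_smooth, 1.0, 2.0)   # 25.0
e_break = weak_string_energy(s, d, l_break, 1.0, 2.0)     # 2.0
```

With α smaller than the smoothing cost of the jump, the minimum of E places a break exactly at the step, which is the detection behavior described above.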
2.2 Graduated Non-Convexity Algorithm
The energy function E is a non-convex function that has many local minima. In such a case, variational methods are used to guarantee obtaining the global minimum. Accordingly, the solution is achieved by embedding the functional E into a one-parameter family of functionals F(p). The parameter p represents a sequence of numbers ranging from one to zero. The function F(1) is a crude, but convex, approximation to the non-convex function; as p goes to zero, F(p) becomes closer to the original non-convex one [11]. To obtain the function sequence F(p), the approach of the Graduated Non-Convexity (GNC) algorithm developed by [10] is adopted. Accordingly, the line process is first merged with the interpolation function S to create the neighbor interaction function g(t), which is not continuous (Figure 1). The modified configuration is then solved by the graduated non-convexity algorithm, where the non-convex function E is gradually approximated by a convex one, F, through a family of p intermediate functions. The final solution represents a convex approximation that encompasses the original function and has only one global minimum. The neighbor interaction function is modified accordingly and approximated with a set of splines, as shown in Figure 1, denoted g(p)(t) and expressed as

g(p)(t) = λ²t²,               if |t| < q
g(p)(t) = α - c(|t| - r)²/2,  if q ≤ |t| < r
g(p)(t) = α,                  if |t| ≥ r
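The piecewise neighbor interaction function can be sketched as below. The constants q, r and c here follow the standard GNC construction for the weak string, and the value c_star = 1/4 is an assumption of this sketch rather than a quantity given in the text:

```python
import math

def gnc_g(t, lam, alpha, p, c_star=0.25):
    """Spline approximation g(p) of the neighbor interaction function.
    p runs from 1 down to 0; q, r and c follow the standard GNC
    construction; c_star = 1/4 is an assumption of this sketch."""
    c = c_star / p
    r = math.sqrt(alpha * (2.0 / c + 1.0 / lam ** 2))
    q = alpha / (lam ** 2 * r)
    a = abs(t)
    if a < q:
        return lam ** 2 * t * t                 # quadratic (elastic) part
    if a < r:
        return alpha - c * (a - r) ** 2 / 2.0   # concave blending part
    return alpha                                # flat part: breakline accepted

# With lam = alpha = p = 1: q = 1/3 and r = 3, and the three pieces
# meet continuously, so g(p) is a smooth bridge between lam^2*t^2 and alpha.
```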
Figure 1. Pitch estimation and tracking algorithm.

First, the input speech signal is weighted by a 256-point time window (the sampling rate is 8 kHz and the overlap coefficient is 75%). Then the autocorrelation vector is computed. The maximum of the autocorrelation sequence in the interval corresponding to possible fundamental frequency values (typically from 60 to 500 Hz) is taken as the initial estimate of the pitch frequency. In order to improve the robustness of the algorithm, the preprocessing operations proposed by Sondhi [14] are performed first. Next, the initial estimate (F0) of the fundamental frequency is tuned to all present pitch harmonics [11]. After the pitch refining, the voiced/unvoiced decision is made on the basis of the following parameters: the input signal energy, the maximum of the autocorrelation sequence, and the harmonic factor computed as follows:

hf = nh / nmax    (1)

where nh is the number of present harmonics and nmax is the number of all possible harmonics with the given pitch. In case of inability to identify at least two out of four leading
pitch frequency harmonics, the segment is considered unvoiced. The refined pitch value Fr for each weighting window is accompanied by the harmonic factor, which can be understood as the adequacy of the estimation. False pitch frequency estimates obtained during pauses in the speech flow are discarded on the basis of the weighting factors of the pitch frequency estimates and of the values of the input signal level, the speech level and the silence level. In order to prevent gross errors and provide better quality, pitch estimation is performed with a delay of two analysis windows. Estimates of the pitch frequency are included in the current track if the difference between neighbouring windows does not exceed the allowed one. The trace tracking estimate of the pitch frequency is calculated using a linear approximation of the current trace according to the least-squares method. The condition determining the end of trace tracking is checked by the availability of preliminary estimates to the right of the analyzed window and by the harmonic factors. The resulting pitch frequency is determined as:

F0 = hf·Fr + (1 - hf)·Ft    (2)

where Fr is the refined pitch and Ft is the trace tracking estimate. The robustness of the pitch track estimation algorithm is shown in Fig. 2. Measurements were performed with different additive noise levels. The algorithm works even in the presence of strong noise: the standard deviation is less than 1% even if the SNR is 0 dB.
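The first stage of the algorithm (the autocorrelation maximum inside the admissible pitch range) can be sketched as follows; the refining, voicing decision and trace tracking stages are omitted, and the test tone is illustrative:

```python
import numpy as np

def estimate_pitch(frame, fs=8000, fmin=60.0, fmax=500.0):
    """Initial pitch estimate: maximum of the autocorrelation sequence
    inside the lag interval corresponding to fmin..fmax Hz."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    ac = ac / ac[0]                       # normalise so that ac[0] == 1
    lag_min = int(fs / fmax)              # shortest admissible period
    lag_max = int(fs / fmin)              # longest admissible period
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / lag, ac[lag]              # (F0 estimate, peak value)

# 250 Hz tone in a 256-sample window at 8 kHz (the paper's frame setup);
# the period is exactly 32 samples, so the estimate recovers 250 Hz.
n = np.arange(256)
tone = np.sin(2.0 * np.pi * 250.0 * n / 8000.0)
f0, peak = estimate_pitch(tone)
```

The normalized peak value plays the role of the "maximum of the autocorrelation sequence" parameter used in the voiced/unvoiced decision.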
Figure 2. Pitch tracking algorithm experimental results (standard deviation of the estimate versus SNR ratio [dB]).
3 Time-Varying Discrete Fourier Transform
It was mentioned in the introduction that the most important task in the spectral analysis of speech is the determination of the amplitudes and phases of harmonics. Using the STFT as a spectrum analysis tool does not bring good results: the spectral lines of the DFT are always positioned at the same frequencies. In the Pitch-Synchronous DFT this
problem is partially solved [9] by choosing the length of the analysis frame with respect to the fundamental frequency. The spectral lines of this transformation are positioned on the harmonics, but the problem of time-varying pitch remains, because the length of the analysis frame is chosen using only one estimate of the fundamental frequency within the frame. Experiments proved that the fundamental frequency can change by as much as about 20% within 32 ms. Without information about the fundamental frequency change it is impossible to obtain the proper analysis frame length, and the positions of the spectral lines do not match the harmonic frequencies exactly. The Time-Varying DFT does not have these defects. It provides spectral analysis in the harmonic domain, so the fundamental frequency of the analyzed signal and its harmonics always fall on spectral lines [12]. This feature greatly simplifies the spectral analysis of the signals described above. The DFT providing analysis in the harmonic domain is given by:

X(k) = Σ_{n=0..N-1} x(n)·e^(-j·2π·n·k·F0/Fs)

where X(k) is the k-th spectral component corresponding to the k-th harmonic, x(n) is the input signal, N is the transformation length, Fs is the sampling frequency, and F0 is the fundamental frequency. The kernel of the transformation has to be modified in the case of tracking analysis. The argument of the exponential can then be written as:

φ(n, k) = 0                                for n = 0
φ(n, k) = φ(n-1, k) + 2π·k·F0(n)/Fs        otherwise    (5)

where F0(i) is the fundamental frequency at the time specified by i. The transformation providing tracking harmonic analysis is given as follows:

X(k) = Σ_{n=0..N-1} x(n)·e^(-j·φ(n,k)),  k = 0..K    (6)
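A minimal sketch of equations (5) and (6): the kernel phase is accumulated per sample from the pitch track, so a harmonic whose frequency follows the track lands exactly on a spectral line. The 1/N scaling and the chirp test signal are choices of this sketch, not of the paper:

```python
import numpy as np

def tvdft(x, f0_track, fs, n_harm):
    """Time-varying DFT: the kernel phase phi(n, k) = k * phi(n, 1) is
    accumulated per sample from the pitch track f0_track (eqs. (5)-(6))."""
    N = len(x)
    # phi(0) = 0; phi(n) = phi(n-1) + 2*pi*F0(n)/Fs  (recursion of eq. (5))
    phi = np.concatenate(([0.0], 2.0 * np.pi * np.cumsum(f0_track[1:]) / fs))
    X = np.array([np.sum(x * np.exp(-1j * k * phi)) for k in range(n_harm + 1)])
    return X / N    # scaled so a unit-amplitude harmonic gives |X(k)| near 0.5

# Fundamental chirping from 200 Hz to 220 Hz within one 256-sample frame:
fs, N = 8000, 256
f0 = np.linspace(200.0, 220.0, N)
phi1 = np.concatenate(([0.0], 2.0 * np.pi * np.cumsum(f0[1:]) / fs))
x = np.cos(phi1) + 0.5 * np.cos(2.0 * phi1)
X = tvdft(x, f0, fs, 3)
```

Despite the 10% chirp, |X[1]| and |X[2]| stay close to the true half-amplitudes (0.5 and 0.25), which is exactly the "harmonics always on spectral lines" property; a fixed-frequency DFT would smear them.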
The non-orthogonal TV-DFT kernel can cause energy leakage to neighbouring spectral lines. Two strategies are possible to deal with the leakage phenomenon [12]. The first is similar to the one used in the PS-DFT and relies on choosing the analysis frame length with respect to the fundamental frequency changes. The second strategy is based on using pitch-tracking time windows. The idea of this solution is to design a spectral window which follows the fundamental frequency changes. Experiments showed that good leakage compensation can be achieved when the Kaiser window is used as a prototype [11]:

w(n) = I0(β·sqrt(1 - (2τ(n)/(N-1) - 1)²)) / I0(β),  n = 0..N-1    (7)

where N is the window length, β is the window parameter, I0(x) is the zero-order Bessel function, and τ(n) is a function enabling the time-varying feature, given as:
τ(n) = (Fs / (2π·F0)) · φ(n, 1)    (8)

where φ(n,1) is computed using formula (5) and F0 here denotes the average fundamental frequency in the analysis frame.
4 Periodic-Aperiodic Decomposition Scheme
Speech signal decomposition is the basis of the proposed coding method. The proposed method performs decomposition in the whole band, which leads to a more accurate representation of speech; thus the synthesized signal sounds more natural. The speech signal decomposition scheme is shown in Fig. 3. The first step of the decomposition is pitch tracking. This information is passed to the Time-Varying DFT analyzer, whose output gives the amplitudes and initial phases of the harmonics. For the production of the harmonic component a set of time-varying oscillators can be used:

h(n) = Σ_{k=1..K} Ak(n)·cos(φ(n, k) + θk)

where Ak(n) is the amplitude track and θk the initial phase of the k-th harmonic.
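The oscillator-bank synthesis of the harmonic component can be sketched as below; constant amplitudes and zero initial phases θk are illustrative simplifications of this sketch:

```python
import numpy as np

def synth_harmonic(amps, f0_track, fs, theta=None):
    """Harmonic component from a bank of time-varying oscillators:
    h(n) = sum_k A_k * cos(k * phi(n) + theta_k), with phi accumulated
    from the pitch track (constant amplitudes, for simplicity)."""
    K = len(amps)
    theta = np.zeros(K) if theta is None else np.asarray(theta)
    phi = np.concatenate(([0.0], 2.0 * np.pi * np.cumsum(f0_track[1:]) / fs))
    h = np.zeros(len(f0_track))
    for k in range(K):
        h += amps[k] * np.cos((k + 1) * phi + theta[k])
    return h

# Two harmonics of a steady 100 Hz pitch at 8 kHz: the waveform
# repeats every 80 samples, as expected for a 100 Hz fundamental.
f0 = np.full(160, 100.0)
h = synth_harmonic(np.array([1.0, 0.5]), f0, 8000)
```

Because the same accumulated phase φ(n, k) drives both the analyzer and the oscillators, the synthesized harmonic part stays aligned with the original signal even when the pitch varies inside the frame.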
Using the Phon/Bark and Sone/Bark scales for representing the spectral components of voiced speech can enhance the performance of split vector quantization. VQ with the Bark/Sone and Bark/Phon scales gives a better average spectral distortion and fewer outliers than the dB/Hz scale. Experiments show (Table 4) that for the 3-3-4 splitting scheme we obtain the maximum performance: the average spectral distortion SD equals 1.081 dB, with about 2% of outliers in the 2-4 dB range and less than 0.01% of outliers above 4 dB.
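For reference, a common definition of the average spectral distortion (the RMS difference of the log power spectra per frame, averaged over frames) can be computed as follows; the paper's exact frequency range and weighting are not reproduced here:

```python
import numpy as np

def spectral_distortion(p, p_hat, eps=1e-12):
    """Average spectral distortion in dB between two sets of power
    spectra, one row per frame (a common definition; the paper's exact
    frequency weighting is an assumption not reproduced here)."""
    p = np.asarray(p, dtype=float) + eps
    p_hat = np.asarray(p_hat, dtype=float) + eps
    diff = 10.0 * np.log10(p) - 10.0 * np.log10(p_hat)
    return float(np.mean(np.sqrt(np.mean(diff ** 2, axis=-1))))

spectra = np.ones((4, 64))
sd_zero = spectral_distortion(spectra, spectra)        # identical spectra: 0 dB
sd_ten = spectral_distortion(10.0 * spectra, spectra)  # uniform factor 10: 10 dB
```

Outlier percentages of the kind reported in Table 4 are then simply the fractions of frames whose per-frame SD falls in the 2-4 dB range or above 4 dB.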
4. Conclusion
From the work performed we can draw the following conclusions. Firstly, classical split vector quantization is quite applicable to quantizing the periodic part of the speech signal and provides good results.
Our experiments show that 23 and 24 bit/frame split VQ quantizes the LSF coefficients of harmonic speech components with near-transparent quality. A modification of the vector codebook structure allows an improvement of the codebook quality. Perceptual principles applied to the speech coefficients improve the performance of standard vector quantization and reduce the spectral distortion and the number of outliers, thus improving the quality of the quantized speech. We have also found that bit reallocation between subcodebooks gives better performance than uniform bit allocation. A quantizer at the bit rate of 23 bits/vector was designed and evaluated.
5. References
[1] Petrowsky A., Zubrycki P., Sawicki A., "Tonal and Noise Components Separation Based on a Pitch Synchronous DFT Analyzer as a Speech Coding Method", Proceedings of ECCTD, 2003, Vol. III, pp. 169-172.
[2] Linde Y., Buzo A., Gray R.M., "An algorithm for vector quantizer design", IEEE Transactions on Communications, vol. COM-28, pp. 84-95, Jan. 1980.
[3] Gersho A., Gray R., Vector Quantization and Signal Compression, Boston, Kluwer Academic Publishers, 1992.
[4] DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus, Department of Commerce, NIST, Springfield, Virginia, Oct. 1990.
[5] Paliwal K.K., Atal B.S., "Efficient vector quantization of LPC parameters at 24 bits/frame", IEEE Transactions on Speech and Audio Processing, vol. 1, no. 1, pp. 3-14, Jan. 1993.
[6] Katsavounidis I., Kuo C.-C.J., Zhang Z., "A new initialization technique for Generalized Lloyd Iteration", IEEE Signal Processing Letters, vol. 1, no. 10, pp. 144-146, Oct. 1994.
[7] Laroia R., Phamdo N., Farvardin N., "Robust and efficient quantization of speech LSP parameters using structured vector quantizers", Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, Toronto, Canada, pp. 641-644, May 1991.
[8] Gray A.H., Jr., Markel J.D., "Quantization and bit allocation in speech processing", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-24, pp. 459-473, 1976.
[9] Gardner W.R., Rao B.D., "Theoretical Analysis of the High-Rate Vector Quantization of LPC Parameters", IEEE Transactions on Speech and Audio Processing, vol. 3, no. 5, pp. 367-381, 1995.
[10] Zwicker E., Fastl H., Psychoacoustics: Facts and Models, Berlin, Springer-Verlag, 1990.
New results in 3D views of polyhedron generation on view sphere with perspective projection

M. Frydler, W.S. Mokrzycki
Institute of Mathematical Machines, 02-798 Warsaw, Krzywickiego 34, [email protected]
Institute of Computer Science PAS, 01-237 Warsaw, Ordona 21, [email protected]

Abstract: This article concerns the generation of 3D multiview models of a convex polyhedron that are a complete representation of this polyhedron with respect to the viewing sphere with perspective projection. These models are going to be used for visual identification based on them and on a scene depth map. We give a new concept and an algorithm for face-dependent generation of multiface views. It does not require any preprocessing, auxiliary mechanisms, or the complex calculations connected with them.
Key words: Object visual identification, depth map, 3D multiview precise model, viewing sphere with perspective projection, completeness of the viewing representation.
1. Introduction

The method of generating a 3D multiview representation of a polyhedron for object visual identification described in [10] is based on the following idea: centrally generate views relative to the object features chosen for identification, calculate the single-view areas on the viewing sphere which correspond to the generated views, and check whether the whole viewing sphere is covered with single-view areas. If the cover is complete, the generation of the viewing representation is finished. If not, generate additional views corresponding to the uncovered areas of the viewing sphere and check the cover again. Continue until a complete viewing sphere cover is generated. A complete cover of the viewing sphere with single-view areas means that the generated representation is complete.

The methods from [16] and [18] are better: to achieve a complete representation we do not need to act in a loop. A complete representation is obtained by strictly covering the viewing sphere with single-view areas and controlling an "edge" register (of the area not yet covered). When the register is set to "empty", the generation of the multiview representation is done. The generated representation is complete, which follows from the generation method. However, to achieve a complete representation we have to calculate the one-view areas on the viewing sphere and process them in a given order. Without their help it is not possible to get a complete set of views of a virtual polyhedron model. On top of that, the described methods are for convex polyhedra only. In this article we present a
method for generating a complete viewing representation that is more computationally efficient than those described above.
2. Research assumptions

This research focuses on developing a method and an algorithm for the generation of a multiview polyhedron representation. For representation generation we use the viewing sphere with perspective projection. The following conditions have to be met:
1. Models are accurate: every model is equivalent to a B-rep model.
2. Models are viewing models: it is possible to identify the object from any view.
The use of a viewing sphere with perspective projection (Fig. 1) as the projection space allows simple view standardization. Uses: recognition of objects no bigger than a few meters and distant (from the system) by not more than 10-20 meters. The uses mentioned above allow making certain assumptions about the recognition system strategies. We assume the following steps of the recognition process:
1. Determining the recognizable object types.
2. Definition of the identification task (choice of the object's shell features used for identification).
3. Generation of viewing models for each object the system should identify.
4. Creation of a database containing all views of all models.
5. Acquisition of scene space data and visual data.
6. Isolation of scene elements and their transformation into the model structures stored in the database.
7. Identification of objects by comparing them with the database models.
3. View generation space: viewing sphere with perspective projection. Basic concepts.
Let the object be a convex, non-transparent polyhedron without holes or pits. Let us consider its faces as feature areas; these areas will be used as the foundation for determining the accurate multiview model. This model is a set of accurate views, acquired through perspective projection from the viewing sphere, according to the model from [10] (Fig. 1). This model is best suited for 3D scene data acquisition and makes the identification system reliable.
Fig. 1 Concept of the view sphere and "single view" areas.

The concept from [18] of generating the 3D multiview representation based on the assumed generation space model is as follows:
- Circumscribe a sphere on the polyhedron. This sphere is small (radius r) and its center is at the polyhedron's center.
- On this sphere place a space cone with flare angle 2α. This is the viewing cone. The vertex of this cone is a model viewing point VP. The distance between the polyhedron's center and the model viewing point is R. The viewing axis always goes through the sphere's center.
- Unconstrained movement of the cone vertex, with the cone kept tangent to the small sphere, creates a large sphere with radius R. This sphere is called the viewing sphere (Fig. 1).
- The dependencies between the values α, r and R and the polyhedron vertex coordinates (Xvi, Yvi, Zvi) are:

r = max_{i=1..k} sqrt(Xvi² + Yvi² + Zvi²),  R = r / sin α

Generate views, taking into account only the object features selected for identification, i.e. faces. The faces visible in the viewing cone create a view; the external edges of this view create the view's contour. Calculate the single-view areas; these areas correspond to particular views. The complete set of views for a given polyhedron is obtained by covering the viewing sphere with single-view areas. Views are generated in such a way that the corresponding single-view areas completely cover the viewing sphere. The algorithm makes this approach complete.
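The two formulas can be computed directly from the vertex list; the unit-cube example below is illustrative:

```python
import math

def viewing_sphere(vertices, alpha):
    """r: radius of the small sphere circumscribed on the polyhedron
    (centred at its centre); R: radius of the viewing sphere, obtained
    from the tangency of the viewing cone (flare 2*alpha) to the small
    sphere, i.e. R = r / sin(alpha)."""
    r = max(math.sqrt(x * x + y * y + z * z) for (x, y, z) in vertices)
    return r, r / math.sin(alpha)

# Unit cube centred at the origin: r = sqrt(3)/2; with alpha = 30 deg, R = 2r.
cube = [(sx * 0.5, sy * 0.5, sz * 0.5)
        for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]
r, R = viewing_sphere(cube, math.radians(30.0))
```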
Changing from one view to another is a visual event. This event occurs as a result of the movement of the point VP, and is manifested by the appearance of a new feature in a view, the disappearance of a feature, or both.
4. Basic Definitions
The complementary viewing cone is a cone defined by the current viewing axis (which is collinear with its height) and has the opposite direction of flare to the viewing cone. It intersects the viewing cone at an angle of π/2, so its flare angle is π/2 - α.
The vector representation of a polyhedron, VREP, is the set of the polyhedron's face normals translated to the point (0,0,0), Fig. 2a.
Fig. 2 Vector representation of a polyhedron: a) polyhedron with normals; b) normals translated to the point (0,0,0).
6 Algorithm for obtaining views by rotating the complementary cone

For each vector v from the vector representation (VREP) V of the polyhedron, rotate the complementary cone around it and note every visual event. Such an event is manifested by the appearance of a new versor inside the complementary cone or by the disappearance of a versor. The result of this routine is the set of face vectors that can be seen by an observer from the view sphere. If the polyhedron is convex, all faces inside the view contour are visible; if it is not convex, some faces can be invisible.
Information Processing and Security Systems
81
Finally, create the set of different sets of face vectors that can be seen. This can be achieved by adding all the obtained sets to the set and removing the sets that repeat.
Representation completeness. For a convex polyhedron, the fact that some face normals translated to the center of the small view sphere are contained in the complementary cone means that they belong to the same view. By rotating the complementary cone around all the normals, all combinations of face normals that are in some view are obtained. That is why it is justified to assume that this algorithm collects all views.
Fig. 4 Complementary cone rotation around a selected normal that is coaxial with one of the main axes, perspective view.
Fig. 4 is a visual representation of the mentioned process, i.e. the rotation of the complementary cone. In this figure the selected vector v is represented as a blue arrow; the red arrows are vectors from the vector representation of the polyhedron. The violet cone positioned toward the observer is the complementary cone, which is rotated around the blue vector. The green wireframe cone represents the view cone.
Mathematical approach. Let us begin with the algorithm for computing the orientations of the complementary cone (rotated around the vector v) at which it intersects a given vector. It is useful to use the idea of coordinate transformation to simplify this task. For a vector v described in Cartesian coordinates, that is, in the base B containing the three vectors x(1,0,0), y(0,1,0), z(0,0,1), or simply B((1,0,0), (0,1,0), (0,0,1)), we designate a base B'. B' is a base that can be created by multiplying the elements of B by the transformation matrix [B'B]. B' is selected in such a way that the coordinates of the vector v in this base equal v'(1,0,0): v' = [B'B]·v. The set of matrices that fulfil this condition is infinite. One of them can be acquired with the following approach. From the base vectors (from B), select the one whose angular distance to v is closest to 90 degrees, that is, the one whose dot product with v is closest to 0. Denote it t; then:

x' = v
y' = v × t
z' = v × (v × t)

The inverse of the matrix (x', y', z') fulfils our condition. Now let us represent the complementary cone rotated around the vector v' by the vector r(s) that lies on the cone's axis of symmetry and is a function of the rotation angle s around v' (considered in the base B'). Compute r(0) as the rotation of the vector (1,0,0) around the z' axis by the angle β/2 (β is the complementary cone flare angle); then:

r(s) = (r(0).x, r(0).y·cos(s), r(0).y·sin(s))

The complementary cone intersects h' (the vector h transformed from base B to B') only if there is an s such that:

r(s)·h' = cos(β/2)

This is true because h' intersects the complementary cone only if its angular distance to the cone's axis of symmetry equals half the cone angle. It leads to the following equation:

r(0).x·h'.x + r(0).y·cos(s)·h'.y + r(0).y·sin(s)·h'.z = cos(β/2)
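The intersection test described here can be sketched as follows; the function builds the base B', forms the coefficients A, B, C of the intersection equation (with C taken as cos(β/2) - r(0).x·h'.x so that it matches the equation above), and counts the solutions. Unit input vectors are assumed:

```python
import math

def cone_intersections(v, h, beta):
    """Rotate the complementary cone (flare angle beta) around the unit
    vector v and count the rotation angles s at which the unit vector h
    lies on the cone boundary: A*cos(s) + B*sin(s) = C."""
    def dot(a, b): return sum(x * y for x, y in zip(a, b))
    def cross(a, b):
        return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])
    def unit(a):
        n = math.sqrt(dot(a, a)); return tuple(x / n for x in a)

    x_ = unit(v)
    # base vector most orthogonal to v (dot product closest to 0)
    t = min([(1, 0, 0), (0, 1, 0), (0, 0, 1)], key=lambda e: abs(dot(x_, e)))
    y_ = unit(cross(x_, t))
    z_ = unit(cross(x_, y_))
    hp = (dot(h, x_), dot(h, y_), dot(h, z_))     # h expressed in base B'

    half = beta / 2.0
    r0x, r0y = math.cos(half), math.sin(half)     # r(0): (1,0,0) tilted by beta/2
    A = r0y * hp[1]
    B = r0y * hp[2]
    C = math.cos(half) - r0x * hp[0]
    m = math.hypot(A, B)
    if m < abs(C) - 1e-12:
        return 0          # h never meets the rotating cone
    if abs(m - abs(C)) <= 1e-12:
        return 1          # the cone just touches h
    return 2              # h enters and leaves the cone: same view as v
```

A cos(s) + B sin(s) = C has two solutions exactly when |C| < sqrt(A² + B²), one when they are equal, and none otherwise, which reproduces the three cases discussed in the text.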
Denoting the known values as

A = r(0).y·h'.y
B = r(0).y·h'.z
C = cos(β/2) - r(0).x·h'.x

the equation simplifies to the "intersection equation"

A·cos(s) + B·sin(s) - C = 0

For particular data this equation can have 0 solutions, when the vector h' does not intersect the cone for any s; one solution, when the cone "touches" h' at some s; or 2 solutions (s0 < s1), when h' enters and leaves the cone during its rotation. Only in the third case is h considered to "be in one view" with the vector v. The computational results achieved in the base B' are also valid in the base B because of the transformation approach. It is important to check and note whether h belongs to the cone for s0 < s < s1 or (0 < s < s1 or s2 < s

U = (SR, SG, SB),  SR = (1/N)·Σ PR,ij,  SG = (1/N)·Σ PG,ij,  SB = (1/N)·Σ PB,ij    (1)

where PR,ij, PG,ij, PB,ij are points that belong to a certain subset (R, G, B); SR, SG, SB are the mean values of the color components; U is the vector designating the main axis; and N is the number of points. The values calculated according to (1) can be interpreted as the coordinates of a vector which is parallel to the main axis of the set (Fig. 3). Therefore it is taken as the first vector of a new color coordinate system (color space). We move from the source color space (for example RGB) to a new one where the dominant color is related to one of the axes, which will make all calculations (e.g. the description of the
set's boundary) easier. In the simplest case, the line y = ax + b can be replaced by y = c1 and x = c2.
Fig. 3. Projections of a sample color set onto the RG and BG planes (the main axis is presented).

The color space we create is three-dimensional, hence it is necessary to calculate the two other, missing axes. The most important condition to satisfy is the linear independence of all three vectors (axes). That is why we choose these vectors as non-parallel to the main axis U. To simplify, we can freely choose any two of the vectors designating RGB, but only if they are non-parallel to U. In such a case they will create a complete system (in most cases also non-orthogonal). In that kind of system we can represent any color, but the loss of orthogonality makes the calculations very time-consuming. When we take (as the two additional axes) the vectors related to R and G, we can, using the Gram-Schmidt orthonormalization procedure, transform the non-orthogonal coordinate system into an orthonormal one (orthogonal and normalized). After orthonormalization the subsequent vectors of the new coordinate system are described as follows:
W1 = U / |U|
W2 = (R - (W1, R)·W1) / |R - (W1, R)·W1|
W3 = (G - (W1, G)·W1 - (W2, G)·W2) / |G - (W1, G)·W1 - (W2, G)·W2|    (2)

where W1, W2, W3 are the vectors designating the axes of the new coordinate system and (W1, R) is the dot product of W1 and R.
In the way described above we get an orthogonal coordinate system in which the axis W1 is a vector, normalized to one, parallel to U. All points can be transformed into the new system by means of [2]. In our case we use the following:

PW1,ij = (Pij, W1),  PW2,ij = (Pij, W2),  PW3,ij = (Pij, W3)    (3)

where Pij is a point in RGB space and PW1,ij, PW2,ij, PW3,ij are its new coordinates in the W1W2W3 space. After this transformation the new main axis is parallel to W1 (Fig. 4). In the process of evaluating the similarity of an analyzed color to the template one (or a certain range), we omit this component because it is less important (the span along this axis is too large). The larger the distance to this axis, the less similar to the template the color is. Hence it is sufficient to introduce only two limits for each direction (dimension) W1, W2, W3 to get a good enough set boundary. It defines the range of color changes within the analyzed objects. Limiting W1 is not always necessary.
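The construction of W1, W2, W3 and the point transformation (equations (1)-(3)) can be sketched as follows. Taking the mean of the point set as the main-axis vector U, and R and G as the two auxiliary axes, follows the choices described in the text; the degenerate case of U parallel to R is not handled in this sketch, and the sample points are illustrative:

```python
import numpy as np

def build_color_space(points):
    """Custom color space: W1 along the main axis U (taken here as the
    mean of the point set, eq. (1)); W2 and W3 from the R and G axes by
    Gram-Schmidt orthonormalisation (eq. (2))."""
    pts = np.asarray(points, dtype=float)
    u = pts.mean(axis=0)                          # main-axis vector U
    w1 = u / np.linalg.norm(u)
    r = np.array([1.0, 0.0, 0.0])                 # auxiliary axis related to R
    w2 = r - np.dot(w1, r) * w1
    w2 /= np.linalg.norm(w2)
    g = np.array([0.0, 1.0, 0.0])                 # auxiliary axis related to G
    w3 = g - np.dot(w1, g) * w1 - np.dot(w2, g) * w2
    w3 /= np.linalg.norm(w3)
    return np.stack([w1, w2, w3])

def to_custom(points, W):
    """Equation (3): dot products of each RGB point with W1, W2, W3."""
    return np.asarray(points, dtype=float) @ W.T

pts = np.array([[200.0, 150.0, 120.0],
                [180.0, 140.0, 110.0],
                [220.0, 160.0, 130.0]])
W = build_color_space(pts)
coords = to_custom(pts, W)
```

The rows of W are orthonormal by construction, so the projection in to_custom is just a matrix product, and the first coordinate of every transformed point measures the extent along the dominant color direction.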
Fig. 4. The analyzed color set presented in the W1W2W3 space.
We carried out an experiment to check whether this approach can cope with human face detection. The results are presented in a scatter plot (Fig. 5). The color we have been looking for ranges from 230 to 310 for the component W1, from 5 to 40 for W2 and from 0 to 16 for W3. Additionally, we can calculate the angle α between the analyzed color A and the axis W1, which can be taken as an extra similarity coefficient. It can be calculated as follows:

cos α = AW1 / |A|

where AW1 is the projection of A onto the axis W1. The angle α determines the direction of the vector A. If α is close to zero, then A is parallel to W1. Figure 5 presents a graphical representation of α for each pixel of a sample image (the intensity of a pixel is related to the value of α; lower α is represented with darker pixels). However, α cannot be taken as the only similarity factor, because for color components close to zero it creates noise ranging from 0 to 90 degrees. The length of the color vector and its deviation from W1 can be used together to evaluate the similarity between the template color and the analyzed one.
Fig. 5. Cosine of the angle between W1 and the analyzed color (left) and the extracted regions (right).

It should be stressed that operations in a custom color space are not as straightforward as in a classical one (RGB, CMY, YUV): moving along a dimension changes not only the intensity but also the tint. For better usability the
color model should be constructed in such a way that intensity and tint are independent. This will lower the quality of detection, but on the other hand it can be useful for creating more general models for a larger class of objects with larger changes in colors. In most cases we can assume that the main axis of a set is related to intensity changes, i.e. the pixel colors change mostly in this manner; the variations in tint are smaller. Using these assumptions, we define the main axis of the color space as the vector U with its starting point at the origin of the coordinate system and its ending point in the middle of the point set. The vectors defining the axes of the new color space can be calculated using (3). In order to make operations on intensity easier, we transform each point from the W1W2W3 space into the destination space using the following:

B = |C|,  α = arccos(CW1 / |C|),  β = arctan(CW3 / CW2)    (4)

where C is a vector representing a certain color; CW1, CW2, CW3 are the projections of C onto W1, W2, W3; B is the length of C, the intensity component; α is the angle between C and W1, related to the similarity between the sample color and the template one; and β is the angle between the projection of C onto the plane W2W3 and W2, related to the tint similarity of the template and sample colors. Figure 6 shows two more examples of applying the custom color model (Bαβ) for detecting certain color regions. In each case a separate color model has been created.
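The (B, α, β) transform of equation (4) can be sketched as below; atan2 is used in place of arctan so that the tint angle covers the full circle, an implementation choice of this sketch:

```python
import numpy as np

def to_b_alpha_beta(c_w):
    """Equation (4) applied to a color already expressed in W1W2W3
    coordinates: B = length (intensity), alpha = angle to W1 (similarity
    to the template), beta = tint angle in the W2-W3 plane (atan2 is an
    implementation choice replacing arctan)."""
    c_w = np.asarray(c_w, dtype=float)
    B = float(np.linalg.norm(c_w))
    alpha = float(np.arccos(c_w[0] / B))
    beta = float(np.arctan2(c_w[2], c_w[1]))
    return B, alpha, beta

# A color lying in the W1-W2 plane: beta = 0, alpha = arccos(3/5).
B, alpha, beta = to_b_alpha_beta([3.0, 4.0, 0.0])
```

Thresholding α and β instead of raw coordinates gives the intensity-independent similarity check described above.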
3
Summary
The presented custom colour models can be applied in many practical tasks, not restricted only to human face detection and remotely sensed image processing. Their main advantage over other "classical" methods [4,5,6] is the precision of setting the colour limits. It is no longer necessary to calculate complicated boundaries for colour sets. Instead, we have only the parameters describing simple boundaries (a box), or the intensity and the deviation of the analysed colour from the template one.
Fig. 6. Examples of locating certain areas (B,D) in satellite images (A, C) [3]
4
References
[1]
Białasiewicz J. T., "Falki i aproksymacje", Warszawa 2000 [in Polish]
[2]
Lathi B. P., "Teoria sygnałów i układów telekomunikacyjnych", Warszawa 1970 [in Polish]
[3]
Earth Sciences and Image Analysis, NASA Johnson Space Center, "The Gateway to Astronaut Photography of Earth"
[4]
Arbib M. A. and Uchiyama T., "Color image segmentation using competitive learning", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, 1994
[5]
Yang Ming-Hsuan and Ahuja Narendra, "Detecting Human Faces in Color Images", International Conference on Image Processing, vol. 1, 1998
[6]
Garcia C., Tziritas G., "Face Detection Using Quantized Skin Color Regions Merging and Wavelet Packet Analysis", IEEE Transactions on Multimedia, 1(3), 1999
Eyes detection with motion interpretation Arkadiusz Kogut Wolinska 43, 72-100 Goleniów, Poland, email:
[email protected] Abstract: The paper presents two methods of eyes detection in digital colour images with using template matching and projection method. It present a way of using results of the algorithms as a motion interpretation system. Keywords:
1
Object detection, eyes detection, video sequence analysis
Introduction
Most applications created for eye detection concentrate on the result and do not show how easily a detection system can be created. This makes biometrics less accessible for ordinary people, and closing science off in this way can slow down its progress. The system presented in this article is an example of a fully accessible and intuitive application, which allows the user to test and draw his own conclusions based on various materials and conditions, including a graphical representation of the results.
2
The system basics
The application is based on simple rules of HSV analysis for detecting face regions in colour images [1], and on the projection and template matching methods for detecting eye positions [1]. The system was written in Borland Delphi 5.0, which allows easy connection to video sources, both internal and external. Using this kind of software, the development of the application can concentrate on the content-related part of the problem. Picture 1 shows the main form of the system, which lets the user examine the results of all the particular steps of the algorithm. On the right side of the window all the main parameters important for the results can be found. It is possible to change the analysed material, parts of the algorithm, the templates and their size coefficient, and more.
Fig. 1. The main form of the application
Every frame depends on the values set in the parameters area of the main window of the application. HSV analysis is the first step of the algorithm [1]. It is the method which determines the ranges of Hue, Saturation and Value typical for skin colour. The ranges are presented in figure 2. The source image is analysed pixel by pixel, and every pixel is checked to see whether it falls within the skin-colour range or not. From this information a binary map of the skin-coloured regions is created. The binary map is decreased six or eight times, so small noise is removed from the image. To support this process, the binary image is median filtered with a 3x3 mask. The whole process is illustrated frame by frame while the program runs. The biggest skin-coloured areas are treated as faces; they are put into a dynamic array of TImage objects, and all further analysis is performed on that array, which is shown in the bottom frame of the main window (fig. 1).
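A minimal sketch of the skin-colour binary map and the 3x3 median filter described above. The threshold values are those given in figure 2; the input format (a list of rows of (H, S, V) tuples in the 0-255 range) is an assumption of this sketch, not the application's actual data structure:

```python
def skin_binary_map(hsv_image):
    """Binary map of skin-coloured pixels from an HSV image given as
    rows of (H, S, V) tuples in the 0-255 range (thresholds from fig. 2)."""
    def is_skin(h, s, v):
        return (h < 25.5 or h > 242.25) and 58.65 <= s <= 173.4 and 153 <= v <= 255
    return [[1 if is_skin(h, s, v) else 0 for (h, s, v) in row] for row in hsv_image]

def median_filter_3x3(binary):
    """3x3 median filter on a binary map: a pixel survives if the majority
    of its 3x3 neighbourhood is set (border pixels left unchanged)."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            neigh = [binary[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = 1 if sum(neigh) >= 5 else 0
    return out
```

An isolated skin-coloured pixel surrounded by background is removed by the filter, which is exactly the small-noise suppression the text describes.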
2.1 Eyes detection methods
The next step is the actual eye detection. The application offers two solutions to this problem: the Projection Method [1], with median filtering of the summed values, and the Template Matching Method, with a search for minimum deviation [1].
2.1.1 Projection method
The first of them is a very simple method based on summing the white pixels in every line and searching for the maximum, which determines the position of the eyes. As can be noticed in figure 3, the gradient image has the biggest concentration of edges in the eye areas, so summing the pixel values can detect the line where the maximum (and thus the eye line) is.

Component       Range                      Area
1  Hue          H < 25.5 or H > 242.25     Whole face
2  Saturation   58.65 < S < 173.4
3  Value        153 < V < 255
Fig. 2. Ranges of H, S and V used to find the skin-colour areas [2].
This method is very easy to implement, but it has one very weak point. When summing the pixel values in each line, it can mistake the mouth line for the eye line, because open lips can create a maximum bigger than the local maximum at the real eye line. There are ways to remove this problem. The newest of them, found during the creation of the application, is to use median filtering to remove the narrow strip of the local maximum created at the mouth line, because the strip of the local maximum at the eye line is wider. Implementing the median filtering requires a sorting algorithm; it is not important which one is used, because in the worst case there are only nine values to sort, so the efficiency of the sorting method matters little. The efficacy of the method grows by about 80%. The results of the method with and without filtering are shown in figure 3. The fourth column shows the graph of the summed values in the line array. It is easy to notice the big difference from the second column, where the maximum located at the mouth line causes an error in detecting the eye line.
Fig. 3. The way the Projection Method works [3] (columns: input image, graph of per-line sums of pixel values, horizontal eye line; then the same with median filtering of the array of summed line values)
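The projection method with the median-filtering refinement can be sketched as follows. The input is assumed to be a binary edge (gradient) map given as rows of 0/1 values; the exact gradient operator used by the application is not specified in the paper:

```python
def eyes_line_projection(edge_map, mask_size=3):
    """Find the horizontal eye line: sum the white pixels in every row,
    median-filter the row sums to suppress the narrow mouth-line maximum,
    then take the row with the maximum filtered value."""
    sums = [sum(row) for row in edge_map]
    half = mask_size // 2
    filtered = []
    for i in range(len(sums)):
        window = sums[max(0, i - half): i + half + 1]
        # at most mask_size values to sort, so sorting cost is negligible
        filtered.append(sorted(window)[len(window) // 2])
    return max(range(len(filtered)), key=filtered.__getitem__)
```

With a wide maximum at the eye rows and a narrow one-row spike at the mouth, the filter removes the spike and the eye line wins, as described in the text.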
According to the guidelines of this article, the user can choose the source image for analysis, the size of the mask for median filtering, the way of filtering and of detecting the eyes (the same eye line for both eyes or separately), etc.
2.1.2 Template matching method
The other implemented method is based on searching for the minimum deviation between the source image and a template. The program compares parts of the image (of the same size as the template) and calculates a deviation coefficient. The lowest value of the deviation coefficient is treated by the application as the eye region, and the middle of the compared area is taken as the centre of the eye.
Fig. 4. The results of the template matching algorithm [3]: Frame 1 (source: Polsat), Frame 2 (source: TVN)
The user can decide what kind of template the application uses and can set the size coefficient of the template (any bitmap file can be used; its size will always be changed as needed). The template size coefficient is set by default to one third: the width of the template is set to one third of the width of the face, and the height changes proportionally to the width. The size coefficient can be changed in the range 0.05-1 for checking different types of templates. Sample results are presented in figure 4. Template matching is a very efficient method: almost every analysed frame gives good results, provided the quality of the source is also good.
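A sketch of the template matching step: slide the template over the image, compute a deviation coefficient for each window, and take the window with the lowest value; its centre is the eye position. The sum of absolute differences used here is one plausible deviation measure; the paper does not name the exact formula:

```python
def match_template(image, template):
    """Return (row, col) of the centre of the window with minimum deviation
    between a grayscale image and a template (both lists of lists)."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best, best_pos = None, None
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            # deviation coefficient: sum of absolute pixel differences
            dev = sum(abs(image[y + j][x + i] - template[j][i])
                      for j in range(th) for i in range(tw))
            if best is None or dev < best:
                best, best_pos = dev, (y + th // 2, x + tw // 2)
    return best_pos
```

A window that matches the template exactly has deviation zero, so its centre is returned.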
2.2
Results representation
The representation of the results, and the way the system can be used, is motion detection of the user working with the system. It turns the system into a means of manipulating 3D objects. To understand this, we have to look briefly at human anatomy.
2.2.1 Human skull
Considering the way the head moves, we have to take into account movement around two axes. To simplify the calculations, rotation in the plane of the screen has been neglected. To unify the calculations, the distances a and b (fig. 5) need to be normalised; this helps in finding the angle of face rotation.
Fig. 5. Human skull: distance between the eyes (a) and the axis of the occipital joint (b) [5]
By measuring the distance between the eyes and the distance between the eye line and the axis of the occipital joint, we can assume that a : b is about 1 : 1.384. After normalising this data, we can apply it universally to all sizes of detected faces.
2.2.2 Face direction analysis
To calculate the face direction, we have to reverse the 3D image generation method: the image of the face is treated as the result of a generated 3D model. The distance between the eyes is used to calculate the distance between the object and the projection surface of the screen. The head rotation angle is then easy to find using trigonometric functions, as described in fig. 6. Horizontal rotation can be calculated in a similar way.
Fig. 6. The angle of head rotation: d, distance between the eyes; l, screen surface; b, distance between the zero eye position and the real eye position
After image processing, the face direction is represented by two angles of rotation, horizontal and vertical. The size of the angle can be used as the energy of the motion. The last problem is to decide whether the angle means that the object should be rotated left or right; this can be determined by comparison with the zero position of the eyes. The starting position of the eyes can be marked in a calibration process, similar to manual manipulators.
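The angle computation of section 2.2.2 can be sketched as below. The function name and the use of arctan are assumptions of this sketch; the paper says only that the rotation angle follows from trigonometric functions applied to the eye distance and the displacement from the calibrated zero position, using the anatomical ratio a : b = 1 : 1.384:

```python
import math

def head_rotation_angle(eye_distance, displacement):
    """Estimate the head rotation angle (radians) from the detected distance
    between the eyes and the displacement of the eyes' midpoint from the
    calibrated zero position. The pivot depth b comes from the anatomical
    ratio a : b = 1 : 1.384, where a is the distance between the eyes."""
    pivot_depth = eye_distance * 1.384        # b, normalised via a : b = 1 : 1.384
    angle = math.atan(abs(displacement) / pivot_depth)
    # the sign (rotation direction) comes from comparing with the zero position
    return math.copysign(angle, displacement)
```

A displacement of zero gives a zero angle (the user is looking straight at the camera), and the sign of the displacement gives the rotation direction, as discussed in the text.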
3
The system description
The system has a very clear structure based on the rules presented in section two. The first step is calibration of the input image. The person using the system should determine the region that will be used for calculations; this reduces the number of operations needed to find the face and eye positions. During calibration the zero area is also pointed out: this is the region which we take as looking straight at the camera. Next comes the normal analysis process. Every frame taken from the video stream is analysed by the HSV algorithm and then by the template matching algorithm. Various types of templates can be used, and they can be changed while the application is running. In every frame the middle point of the segment between the eyes is marked; this point is the most important for comparison with the zero point. The distance is converted into the angles described in section 2.2.2. How this information is used depends on the user; here it is presented by a 3D model written in OpenGL for Delphi.
Fig. 7. Representation of the application's processing: a) input image, b) green frame of the analysis area obtained in the calibration process, with the yellow cross of the zero point, c) to f) the middle point of the segment between the eyes, marked with a blue dot.
The main problem is the lighting conditions. In daylight there is no problem with the HSV analysis, because the light is diffused and does not distort the colour values. In artificial light the camera needs to be calibrated to achieve a higher efficiency of the algorithm. There is no problem with template matching when the face areas are located correctly (fig. 7). During processing, the application takes the difference between the blue point (fig. 7) and the zero point to find the direction of rotation, because the plain calculation gives the angle but not the direction in which to rotate: the trigonometric functions used in the calculation give only the magnitude of the angle. This is resolved before calculating the angle: the difference value is used to find the b value (fig. 6), so the two are easy to connect. The final result of the processing is the two angles needed by the OpenGL part of the application.
Fig. 8. The example of 3D chemical compound model rotation by implemented motion interpretation system
4
Summary
The presented solutions are examples of a simple analysis of existing solutions, with small innovations such as the median filtering used in the projection method. The biggest advantage of the presented application is its simple and intuitive interface, with a very suggestive way of representing the results. It is difficult to find a revolutionary new solution, but the newest developments, based on linking two (or more) types of algorithms so that the advantages of one compensate for the disadvantages of another, give the feeling that the last word on this subject has not yet been said.
The most interesting point is that the α and β angles can be used in many different situations. They can serve as a typical manipulator which changes the position of the mouse cursor, which can help disabled persons work with a computer. This can also be a new, attractive way of controlling computer games without touching the keyboard, mouse, joystick or any other well-known manipulator (fig. 8).
5
Acknowledgements
The author would like to thank dr hab. inż. Georgy Kukharev, prof. PS, for teaching the basics of biometric systems and for showing that life is more interesting when there is a passion in it.
6
References
[1]
Kukharev Georgy, Kuzminski Adam, "Techniki Biometryczne, Część 1: Metody Rozpoznawania Twarzy", Wydział Informatyki, Politechnika Szczecińska, 2003 [in Polish]
[2]
Kuchariew G., "Detekcja obrazów twarzy w zadaniach 'Nameit'", Materiały VI Sesji Naukowej Informatyki, Wydział Informatyki, Politechnika Szczecińska, Szczecin, Informa 2001 [in Polish]
[3]
Kogut Arkadiusz, "Obliczanie współrzędnych oczu na twarzy w sekwencjach obrazów wideo", Politechnika Szczecińska, Szczecin 2003 [in Polish]
[4]
Baza obrazów badawczych PS, http://gpro.wi.ps.pl/ [in Polish]
[5]
Wydawnictwo LibraMed, http://www.libramed.com.pl/b2b/LWW.htm
Financial Distress Prediction Using Different Pattern Recognition Methods Wiesław Pietruszkiewicz, Leonard Rozenberg Technical University of Szczecin, Faculty of Computer Science, ul. Żołnierska 49, 71-210 Szczecin, POLAND, email: {wpietruszkiewicz, lrozenberg}@wi.ps.pl
Abstract: Prediction of possible financial distress is one of the most important tasks in business. This paper presents the application of several pattern recognition methods in the field of financial distress prediction. All experiments were performed in the troublesome Polish environment. Keywords: Pattern recognition, financial distress, crisis, memory-based methods, neural networks, Discriminant Analysis
1
Forecast of financial distress
Prediction of financial distress is one of the most important tasks in business. It is a must for many groups: for example, capital donors need to protect themselves against the possible loss of credits, and a firm's managers try to forecast a crisis which could drag the enterprise into insolvency. The most widely used method of evaluating a firm's condition is financial analysis. However, its simplicity and usefulness are limited by its ambiguity: a firm with good liquidity may be heading for a fall if its debts exceed some dangerous limit, and in such a case financial analysis will not give clear signals. That shortcoming motivates the development of a model combining several measures into one meaningful output indicating whether a firm shows any signs of possible insolvency. There are many factors that can cause a crisis. Whatever the causes are, one can notice that in each firm there are a few common crisis stages and events (Fig. 1). The schema presented in Fig. 1 concentrates on the crisis' progress; the factors causing it can be found in the appropriate literature [8] or [2] and therefore will not be explained herein.
Fig. 1. The stages of crisis
Analysing the crisis' progress, it can be noticed that it starts in a hidden form (point E2). This is an unnoticeable stage of the crisis, i.e. it cannot be detected without a deeper analysis of the firm's finances. Thus, even if the crisis is hidden and not obviously visible at this stage, this moment must be considered the beginning of the period in which the firm's condition is in great danger. The literature mentions many signals which can be regarded as proofs of a coming insolvency. According to Rozenberg [8], all signals can be divided into 2 groups: financial signals: this group contains such effects as an increasing net loss or decreasing net profit, loss of liquidity, or demand for new sources of capital used not to develop the firm but to keep it running; non-financial signals: in this group such signals can be mentioned as an ascending amount of capital locked up in investments, lack of plans, and frequent changes of managers. Such signals often cannot be identified by a non-skilled person; thus, to workers or business partners it may seem that nothing shows any signs of crisis. Clear proofs of the crisis' existence start at point E3, where even a non-skilled person can recognise that bankruptcy is closer each day. In period P3 the crisis can be noticed: inside the firm, where frequent problems with liabilities occur, such as late salaries or a lack of materials used in production, and thus problems with keeping production running; outside the firm, where unpaid liabilities exist, or the firm lacks the capacity to fulfil placed orders for demanded goods. A firm at stage P3, where the proofs of crisis are obvious, must take immediate reparation steps. Unfortunately, the reparation process is very difficult at this stage and often does not protect against insolvency. Thus the aim is not to identify signals at stage P3 but to forecast the coming crisis at stage P2, i.e. where there are signals of possible disaster but they are well hidden, and where there is still hope [7].
2
Pattern recognition as a solution
The most useful and widely used approach to the problem of financial distress prediction is pattern recognition. It assumes that there must exist some similarities between failed firms, so companies which failed in the past can be used to estimate a pattern useful in evaluating other firms' condition. The first method used for financial distress prediction, and still the most frequently used, is Discriminant Analysis (DA in abbreviation). It may be said that DA aims to find a projection vector (called the discriminating vector) that allows points to be projected from an N-dimensional space into a new space with M dimensions (M < N).
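The classical two-class discriminant analysis referred to here can be sketched as a Fisher-style projection onto one dimension. This is a generic illustration for 2-D feature vectors; the actual financial ratios and data used by the authors are not shown in this excerpt:

```python
def fisher_direction(class0, class1):
    """Fisher linear discriminant for two classes of 2-D points:
    w = Sw^-1 (m1 - m0), projecting feature vectors onto one dimension."""
    def mean(pts):
        n = len(pts)
        return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

    def scatter(pts, m):
        # within-class scatter matrix entries [[a, b], [b, c]]
        a = b = c = 0.0
        for x, y in pts:
            dx, dy = x - m[0], y - m[1]
            a += dx * dx; b += dx * dy; c += dy * dy
        return a, b, c

    m0, m1 = mean(class0), mean(class1)
    a0, b0, c0 = scatter(class0, m0)
    a1, b1, c1 = scatter(class1, m1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1          # pooled Sw = [[a, b], [b, c]]
    det = a * c - b * b
    dmx, dmy = m1[0] - m0[0], m1[1] - m0[1]
    # w = Sw^-1 (m1 - m0), with the 2x2 inverse written out explicitly
    return ((c * dmx - b * dmy) / det, (-b * dmx + a * dmy) / det)

def project(point, w):
    """Project a feature vector onto the discriminating vector."""
    return point[0] * w[0] + point[1] * w[1]
```

The projections of healthy and failed firms along w are then separated by a threshold, which is the one-dimensional output indicator mentioned above.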
Fig. 2. Approximation table TA_f of an example function f
The approximation table of an example function f is presented in figure 2. There exist 10 effective approximations of the function: 2 approximations with probability |Δp| = 1/2 and 8 approximations with probability |Δp| = 1/4. For clarity, figure 3 presents the basic algorithm computing a single value of the approximation table of function f. The auxiliary function BITXOR(...) computes the XOR of the n least significant bits of parameter X. Function N(...) is the counting function of the approximation. The main function TAF(...) returns the value N(X', Y', n, m) - 2^(n-1). The time complexity of the algorithm is O((n+m) * 2^n).
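The TAF computation can be sketched directly from the definitions above. The parity-based form of the counting function N is the standard one for linear approximation tables; the two-variable XOR used in the usage example is the function the text later analyses by hand:

```python
def parity(x: int) -> int:
    """Parity of the bits of x, i.e. the value X[X'] once the mask is applied."""
    return bin(x).count("1") & 1

def counting_n(f, n, xmask, ymask):
    """N(X', Y'): the number of inputs X for which X[X'] equals f(X)[Y']."""
    return sum(1 for x in range(1 << n)
               if parity(x & xmask) == parity(f(x) & ymask))

def taf(f, n, xmask, ymask):
    """A single approximation-table value: DeltaN = N(X', Y') - 2^(n-1)."""
    return counting_n(f, n, xmask, ymask) - (1 << (n - 1))
```

For XOR of two bits, `f = lambda x: (x >> 1) ^ (x & 1)`, the approximation X' = 3, Y' = 1 holds for all four inputs, giving the maximal table value 2, while e.g. X' = 1, Y' = 1 is balanced and gives 0.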
Fig. 3. The basic algorithm TAF(X', Y', n, m) computing a single value of the approximation table of function f (pseudocode)
Theorem. For every function f: {0,1}^n -> {0,1}^m, where n > 1 and m > 1, it holds that TA_f = TA_f0 + TA_f1 and TA_f = TA_f0 - TA_f1 (for X'_{n-1} = 0 and X'_{n-1} = 1, respectively).
Proof. The linear approximation of function f can be written in the following form:

Y[Y'] = X[X'] = X_{n-1} * X'_{n-1} XOR X_r[X'_r],

where X_r = (X_{n-2}, ..., X_0). Let N be the counting function of the approximation of function f, and let N_0 and N_1 be the counting functions of the approximations of functions f_0 and f_1, respectively (the restrictions of f for X_{n-1} = 0 and X_{n-1} = 1). For X'_{n-1} = 0 we have:

N = N_0 + N_1, and DN = N - 2^(n-1) = (N_0 - 2^(n-2)) + (N_1 - 2^(n-2)) = DN_0 + DN_1.

We have obtained that TA_f = TA_f0 + TA_f1. For X'_{n-1} = 1 we have:

N = N_0 + (2^(n-1) - N_1), and DN = N - 2^(n-1) = N_0 - N_1 = (N_0 - 2^(n-2)) - (N_1 - 2^(n-2)) = DN_0 - DN_1.

We have obtained that TA_f = TA_f0 - TA_f1.
Figure 5 presents four elementary approximation tables of Boolean functions of one variable. These tables can be used to compute approximation tables of Boolean functions of more variables, as well as, in general, of functions f: {0,1}^n -> {0,1}^m, where n, m >= 1.
Y
0 1
0 0
X
Y
0 1
1 1
Y'
X' >=>
0 1
Y' 0
1
1 0
1 0
X
Y
X'
0 1
0 1
0 1
X
Y
X'
0 1
1 0
0 1
Fig. 5. Approximation tables of Boolean functions of 1 variable
The case of a Boolean function of two variables is illustrated for the function XOR in figure 6. To compute the approximation table TA_XOR we use the elementary approximation tables from figure 5 for the one-variable functions XOR(0, X_0) and XOR(1, X_0). The upper half of TA_XOR is obtained as the result of addition, and the lower half as the result of subtraction, of the two elementary tables.
Fig. 6. Construction of the approximation table TA_XOR of the two-variable XOR function from the elementary tables
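The add/subtract construction of TA_XOR described above can be checked numerically. This sketch rebuilds each table value of a two-variable function from its one-variable restrictions f0 = f(0, X0) and f1 = f(1, X0), following the theorem; the bit convention X' = 2*X1' + X0' is an assumption of the sketch:

```python
def parity(x: int) -> int:
    return bin(x).count("1") & 1

def ta(f, n, xmask, ymask):
    """DeltaN(X', Y') = N(X', Y') - 2^(n-1) for f on n input bits."""
    n_count = sum(1 for x in range(1 << n)
                  if parity(x & xmask) == parity(f(x) & ymask))
    return n_count - (1 << (n - 1))

def ta_from_halves(f, xmask, ymask):
    """Rebuild a 2-variable table value from the 1-variable restrictions:
    TA_f = TA_f0 + TA_f1 when X1' = 0, and TA_f0 - TA_f1 when X1' = 1."""
    f0 = lambda x0: f(x0)          # restriction X1 = 0
    f1 = lambda x0: f(2 | x0)      # restriction X1 = 1
    xr = xmask & 1                 # mask bit for X0
    t0, t1 = ta(f0, 1, xr, ymask), ta(f1, 1, xr, ymask)
    return t0 + t1 if (xmask >> 1) == 0 else t0 - t1
```

For XOR the recombined values agree with the directly computed table entry for every (X', Y'), which is exactly the decomposition the theorem states.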
I -> I, or sequences of consecutive applications of a finite set of maps {F_m, ..., F_1, F_0}: I -> I. As writing out explicit formulas involving repeated applications of a set of maps can be awkward, we streamline the notation by denoting a map composition by

G(x_n; m, a) = F_m(F_1(F_0(x_n; a))) = F_m o F_1 o F_0(x_n; a)      (3)

Then the n-th iteration of maps (1) and (2) is as follows: R_n
te_ca,a - certificate expiry time, (5) tr_ca,a - optionally, certificate revocation time. Obviously the certificate has all the remaining fields, such as a public key or the CA signature, but these fields do not influence the proposed protocol's operation and will be omitted in this paper. At a moment t_x each certificate may have some features from the set presented below:
- NI - not issued, where t_x < ti_ca,a;
- I - issued, where t_x >= ti_ca,a;
- NE - not expired, where t_x < te_ca,a;
- E - expired, where t_x >= te_ca,a;
- NR - not revoked, where t_x < tr_ca,a;
Additionally, it is obvious that for each CA authority and certificate serial number two basic conditions must be fulfilled: for all ca, a: ti_ca,a < te_ca,a (a certificate expires after it has been issued), and revocation (the tr_ca,a moment) may, but need not, occur only in the period from issue to expiry. Taking this into consideration, a certificate may be in one of the states below:
- {NI,NE,NR} - not issued (and obviously not expired and not revoked);
- {I,NE,NR} - issued, not expired, not revoked;
- {I,NE,R} - issued, not expired, revoked;
- {I,E,NR} - issued, expired, not revoked;
- {I,E,R} - issued, expired, revoked.
For the verifier the one and only correct state is {I,NE,NR}; any other state disqualifies the certificate from use. This situation is presented in Fig. 1 in the form of a graph. The three upper axes symbolise the time flow related to the issue, expiry and revocation events. The lower axes present two possible scenarios of a certificate's life, depending on whether revocation occurs.
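The state classification above can be sketched directly; the function names are illustrative, and `tr` is None when no revocation has occurred:

```python
def certificate_state(tx, ti, te, tr=None):
    """Feature set of a certificate at moment tx, given its issue time ti,
    expiry time te and optional revocation time tr."""
    issued = "I" if tx >= ti else "NI"
    expired = "E" if tx >= te else "NE"
    revoked = "R" if tr is not None and tx >= tr else "NR"
    return (issued, expired, revoked)

def usable(state):
    """For the verifier the one and only correct state is {I, NE, NR}."""
    return state == ("I", "NE", "NR")
```

Note that a not-yet-issued certificate comes out as {NI,NE,NR}, matching the first state in the list above.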
If the end user/verifier holds the certificate whose state he wants to determine, he can check the certificate's issue and expiry times himself. In such a situation the verifier can decide, for any moment t_x, whether the certificate was issued or expired (ti_ca,a <= t_x <= te_ca,a). He is interested only in the server's answer concerning possible revocation, so for a moment t_x he can get the answer {I,NE,NR} or {I,NE,R}.
Fig. 1. Status changes during certificate lifetime
There is a possibility that the verifier does not have the demanded certificate, but only a signed message. We can assume that the signature format allows one to obtain the certificate issuer and the certificate serial number of the certificate associated with the key used for the signature. The verifier knows neither the certificate issue time nor the expiry time; that is why, when he asks about the certificate state at a moment t_x, the server's answer may be one of the following: {NI,NE,NR}, {I,NE,NR}, {I,NE,R}, {I,E,NR} or {I,E,R}.
4
Protocol proposal
4.1
General scheme
This scheme assumes the existence of many independent Certification Authorities. They publish certificates and revocation information in commonly accessible directories. These directories need not be connected with a particular CA, and conversely a CA need not limit its publishing to one particular directory. A CA publishes authentication data with the help of a Trusted Publisher TP (similarly to a directory, a publisher is not associated with one and only one CA). Unauthorized status servers USS have free access to the data stored in the directories, and on that basis they can construct trusted answers to queries about a particular certificate's status. On the other side, the end user R asks questions about certificate status to a local certificate server. This server is responsible for the local management of certificates and revocation information and for communication with the directories
and status servers. The following symbols for the protocol sides will be used in the remainder of this note:
CA - Certification Authority, a generally trusted agent responsible for issuing and revoking certificates and for directory updates.
D - directory, a database of certificates and revocation information (in the form of linked authenticated dictionaries). Both current and archival data are stored there. A directory need not be associated with a particular CA in a one-to-one relation: one directory may hold information from many different CAs, and one CA may publish in many directories.
TP - Trusted Publisher, an agent responsible for publishing some verification information.
USS - Unauthorized Status Server, a server of the on-line certificate status verification service. It accepts requests from the LCS and generates answers on the basis of data taken from the directories. Revocation information is stored in the directories in authenticated dictionaries, so each answer is authenticated a priori by the particular CA. The USS need not be trusted.
LCS - Local Certificate Server, a service in the requestor R's system, responsible for local certificate management and verification of its correctness. It can be a standalone server for a few requestors, or a service for one particular requestor on his local machine.
R - Requestor, a person or a program demanding certificate status information at one particular moment.
Fig. 2 presents an exemplary dependence network in the described scheme.
Fig. 2. Example of dependencies between protocol sides
4.2
Backend
In all certificate status verification protocols two layers can be distinguished. The higher one, the so-called frontend, defines the successive protocol steps, the format of the sent data, etc. The lower one, the backend, is the set of all mechanisms, structures and functions that allow the protocol to operate. For example, the OCSP protocol is actually only a frontend; for correct operation it needs a backend, and usually Certificate Revocation Lists are used for it. It is obvious that the backend solutions used affect the frontend functionality. The backend of the proposed protocol is based on linked authenticated dictionaries. This solution was presented in an earlier note [3]. An authenticated dictionary is a dictionary which responds to a Search command not only with found / not found information, but also adds a proof of the authenticity of this information. There is a new element in a scheme with an authenticated dictionary: an untrusted mediator placed between the trusted data holder and the requestor. The trusted data holder should not serve requestors' Search queries directly; it is responsible only for data structure updates. The holder distributes the updated authenticated dictionary (or an authenticated update package) among the mediators. The holder generates the authenticity proof after each update, and it is an integral part of the authenticated dictionary. Dictionary linking allows trusted storage of archival information and prevents frauds based on changing the revocation order.
4.3
Frontend
The proposed protocol is shown in Fig. 3. Each step is presented as an arrow which indicates the data flow direction and is labelled with the step number. Steps 2 and 3 are optional and are drawn with a dashed line.
LCS : IP_ca,r,s 8) On the basis of the received proofs, the LCS verifies the authenticity of the data and formats the answer state. The possible answers are {I,NE,NR}, {I,E,NR}, {I,NE,R} and {I,E,R}. The answer {NI,NE,NR} is prepared in step 4 if necessary.
LCS -> R : state
5
Summary
The proposal presented above helps to eliminate the described OCSP protocol limitations: there is the possibility of using unauthorized status servers (the certification authority need not control the servers) and of generating time-dependent status requests (for times different from the current time). The backend structure, which is the foundation of the whole protocol, lets us increase the frequency of revocation information updates. The choice of optimal structures for the TAD and PAD dictionary implementations is our current goal; not only efficiency, but also the formal description of the protocol, depends mainly on this choice.
6
References
[1]
Arnes A., "Public Key Certificate Revocation Schemes". PhD thesis, 2000.
[2]
Arnes A., Just M., Knapskog S., Lloyd S., Meijer H., "Selecting Revocation Solutions for PKI", Proceedings of NORDSEC 2000 Fifth Nordic Workshop on Secure IT Systems, Reykjavik, 2000.
[3]
Mackow W., "Linked authenticated dictionaries for certificate status verification", Enhanced Methods in Computer Security, Biometric and Artificial Intelligence Systems, Springer, pp.3546, 2005.
[4]
Chocianowicz W., Mackow W., Pejas J., "Protocols of revocation and status verification methods for public key certificates", RSA Conference, Paris, 2002.
[5]
Myers M., et al., "X.509 Internet Public Key Infrastructure Online Certificate Status Protocol - OCSP", RFC 2560, 1999.
[6]
Fox B., LaMacchia B., "Online Certificate Status Checking in Financial Transactions: The Case for Reissuance", Proceedings of the Third International Conference on Financial Cryptography, pp. 104-117, 1999.
[7]
Willemson J., "Certifcate Revocation Paradigms", Technical Report, Cybernetica 1999.
Micropayments with Privacy: a New Proposal for E-commerce Krzysztof Szczypiorski, Aneta Zwierko, Igor Margasiński Warsaw University of Technology, Institute of Telecommunications, ul. Nowowiejska 15/19, 00-665 Warsaw, Poland email: {K.Szczypiorski, A.Zwierko, I.Margasinski}@tele.pw.edu.pl
Abstract: This paper presents an original concept of micropayment schemes which combine the simplicity of an electronic pre-paid card without a trusted third party with the user's privacy. Two protocols are proposed: the first one is based on a secure one-way hash function, the second one on a cryptographically secure pseudorandom bit generator. The main advantage of the proposed schemes is the ability to perform cryptographic key distribution within the micropayment process. Keywords: micropayments, e-commerce, privacy
1
Introduction
Micropayments (defined as electronic transactions transferring very small sums of money) are very attractive from the privacy perspective. In this paper we propose two new micropayment schemes (MINX: Micropayments with Secure Network Exchange) based on different cryptographic primitives (a one-way function and a cryptographically secure pseudorandom bit generator). They provide both the user and the vendor with a reasonably fast and secure protocol for micropayments. The main idea of the proposed schemes is the ability to perform cryptographic key distribution within the micropayment process. Other advantages of the presented system include immediate double-spending detection, effective forgery prevention and confidentiality of transactions. Some of these unique properties were made possible by merging the roles of a service provider (a vendor) and a broker (a guarantor of a payment) into one: an operator. This situation is typical in the telecommunications environment.
2
Related Work
The most important and best-known micropayment schemes are PayWord and MicroMint, proposed by Ronald L. Rivest and Adi Shamir in 1996 [14]. Both schemes are based on a Trusted Third Party (TTP) named a broker, and need a Public Key Infrastructure (PKI) in order to work properly [10]. In the first one, PayWord, a user utilizes a cryptographic one-way function to produce a sequence of coins. A payment is guaranteed by the broker. Once a day vendors contact brokers and the vendors' money gets transferred. The second scheme is based on a different idea: it also uses a one-way function as a method of producing coins, but the coins come from the broker and are then distributed to users. This method of production makes forging coins much more expensive for an adversary than producing real ones is for the broker; the concept is based on the birthday paradox for one-way functions [10].

Another micropayment scheme was proposed by Torben P. Pedersen within the CAFE project ([12], [1]). It is also based on a one-way function and is very similar to the schemes proposed by Shamir and Rivest, but it was developed independently. The CAFE system is part of the ESPRIT programme. Other schemes were proposed by Shamir (based on a lottery ticket system [13], improved in [11]). A similar micropayment scheme, based on hash functions and called NetPay, was proposed by Xiaoling Dai and John Grundy [2]; their paper also provides details of a possible architecture and implementation of such a system.

The idea of combining some properties of macro- and micropayment schemes was introduced by Stanislaw Jarecki and Andrew Odlyzko in [8]. Their scheme combines the simplicity of an offline micropayment scheme with the online security of a transaction, which means that the vendor consults the broker from time to time during communication with a client; this enables the vendor to check that the client is not cheating. Many other micropayment methods are discussed in [7] and [4].
One of the commercial systems was proposed by IBM: the Internet Keyed Payment Systems (iKP), discussed in [6].
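The hash-chain coin construction used by PayWord (a user applying a one-way function to produce a sequence of coins) can be sketched as follows; this is a minimal illustration with SHA-256 standing in for the one-way function, and all names and parameters are ours, not taken from any of the cited schemes:

```python
import hashlib

def make_chain(seed: bytes, n: int) -> list[bytes]:
    """Build a hash chain w_0, w_1, ..., w_n with w_{i-1} = H(w_i).

    The root w_0 is committed to up front (e.g. signed by the broker);
    the coins w_1, w_2, ... are then revealed one per payment.
    """
    chain = [hashlib.sha256(seed).digest()]   # w_n
    for _ in range(n):
        chain.append(hashlib.sha256(chain[-1]).digest())
    chain.reverse()                           # chain[i] is now w_i
    return chain

def verify_coin(previous: bytes, coin: bytes) -> bool:
    """The vendor checks that hashing the new coin gives the last value seen."""
    return hashlib.sha256(coin).digest() == previous

chain = make_chain(b"user-card-secret", 100)
root = chain[0]
assert verify_coin(root, chain[1])        # first coin verifies against the root
assert not verify_coin(root, chain[2])    # coins must be spent in order
```

The vendor only ever stores the last coin seen, which is what keeps per-payment verification down to a single hash evaluation.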
3
Proposals of New Schemes
We propose two new schemes for micropayments. Both are prepaid-card oriented, which means that they have almost all the advantages and disadvantages of real-life prepaid cards. The main novel idea of the MINX system is performing cryptographic key distribution within the micropayment process.
3.1
Properties of Prepaid Cards
Prepaid cards can be treated as a type of micropayment. In contrast to classic micropayment schemes, there is no TTP. When a user buys a prepaid card, the
user has to trust the operator that the card is valid and ready to use, whereas in a traditional (non-prepaid) purchase the user knows exactly what is being bought. That is why a trusted operator is the major factor in the schemes discussed. Another advantage of a prepaid card is the possibility of using only a fraction of it: a partially used card is ready to be utilized at any time. The process does not require the user to provide the operator with any information during card purchase or usage.

The proposed schemes are based on two different cryptographic primitives: the one-way hash function and the cryptographically secure pseudorandom bit generator. A hash function h maps an input x to an output h(x) of fixed length. For a given x, h(x) is easy to compute. A one-way hash function h has the following properties [10]:

• one-wayness (preimage resistance): for y = h(x), computing x from y is infeasible,
• weak collision resistance (2nd preimage resistance): for given x1 and h(x1) it is computationally infeasible to find x2 ≠ x1 such that h(x1) = h(x2),
• strong collision resistance: it is computationally infeasible to find any x1 ≠ x2 such that h(x1) = h(x2).

A pseudorandom bit generator (PRBG) is a deterministic algorithm which, for a given input (a truly random binary sequence), outputs a much longer binary sequence which "appears" to be random. A PRBG passes the next-bit test if there is no polynomial-time algorithm which can distinguish its output from a truly random sequence with probability significantly greater than ½. A PRBG which passes the next-bit test, even under some plausible but unproved mathematical assumptions, is called a cryptographically secure pseudorandom bit generator (CSPRBG).
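As a concrete example of a CSPRBG construction, the classic Blum-Blum-Shub generator squares modulo n = p·q and outputs one bit per state; its security argument rests on the hardness of factoring n. A toy-sized sketch (the primes below are chosen for illustration only and are far too small for real use):

```python
def bbs_bits(seed: int, p: int, q: int, k: int) -> list[int]:
    """Blum-Blum-Shub: x_{i+1} = x_i^2 mod n; output the parity of each state.

    p and q must be primes congruent to 3 mod 4; the toy values used
    below offer no security whatsoever.
    """
    n = p * q
    x = seed * seed % n          # start from a quadratic residue
    bits = []
    for _ in range(k):
        x = x * x % n
        bits.append(x & 1)       # least significant bit of the state
    return bits

bits = bbs_bits(seed=1234, p=11, q=23, k=16)
assert len(bits) == 16 and set(bits) <= {0, 1}
```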
3.2
MINX General Overview
Basic definitions:

• key: in the MINX system, a key means a secret key for a symmetric cipher (such as the Advanced Encryption Standard (AES) or RC6),
• impulse: one unit of payment, which can be extracted from a prepaid card,
• ID: every user who wants to use a valid card has a unique identifier (ID) assigned by the operator; the ID enables the operator to find the proper secret key for decrypting data received from each user.

Both MINX schemes are based on the same four steps (Fig. 1):

• Step 1. The user shows part of the card to the operator.
• Step 2. The operator sends a confirmation and an assigned ID to the user; at the same time the operator computes the current key.
• Step 3. The user computes the current key and an impulse, encrypts it with the requested data and sends it to the operator.
• Step 4. The operator validates it and sends a response back to the user.
After the completion of step 4, it is possible to establish secure communication between the user and the operator: the key shared between the user and the operator is destined to be used as a session key in all secure exchanges between the parties until a new key gets established. The last two steps are repeated as long as the user wants to use a service provided by the operator (with a set fee), or until the user's virtual prepaid card is used up.
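The paper does not specify the key schedule at this level of detail; purely as a hypothetical illustration of steps 2 and 3, assume both sides derive the current key by hashing the shared card secret together with a round counter (the function name and encoding below are our own invention):

```python
import hashlib

def current_key(card_secret: bytes, counter: int) -> bytes:
    """Hypothetical key schedule: hash the shared card secret together
    with a round counter; both parties can compute it independently."""
    return hashlib.sha256(card_secret + counter.to_bytes(4, "big")).digest()

card = b"prepaid-card-secret"          # known to the user and the operator
user_key = current_key(card, 1)        # step 3: the user computes the key
operator_key = current_key(card, 1)    # step 2: the operator computes the key
assert user_key == operator_key        # the key then serves as a session key
assert current_key(card, 2) != user_key   # a fresh key each round
```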
[Fig. 1. The MINX message flow between the client and the operator: the client sends part of the card (depending on the scheme) and both parties compute the current key.]

The main elements of an XACML policy are:

• <Subject>: identifies whom the request concerns,
• <Resource>: identifies the resources in the system to which access is requested,
• <Rule>: identifies an individual rule in the policy,
• <Condition>: a boolean function over subject, resource, action, and environment attributes,
• <Action>: identifies the actions to be taken,
• <Apply>: applies a function to its arguments.
Figure 1 presents a sample policy written in XACML. The policy applies only to requests for the resource "Printer" and permits access to this resource only if the subject logs in.

[Fig. 1. Sample XACML policy; the listing is garbled in the source, but it targets the resource "Printer" and requires a "login" action.]
3 Certificate based access control

It is possible to extend the proposed access system with certificates, attribute certificates and use conditions. In order to assure compatibility between the access control system and the certificate service, XML will also be used. The XML Signature standard described in [4] proposes a standard representation of X.509-based certificates and attribute certificates in XML. There is also related work concerning RBAC policies in XML for X.509 certificates, described in [3]. The certificate-based access control system described in [8] and [9] is made of three main units:

• policy management point,
• certificate service,
• domain server.
The simplified system architecture is shown in Figure 2. All system information, e.g. the system structure and the domain hierarchy, is stored in an LDAP (Lightweight Directory Access Protocol) service. The management console is used for assigning roles and policies or policy sets to subjects. Policies and policy sets are written in a policy editor. The policy editor and the management console use a domain browser to simplify system management. The policy management system uses data stored in the LDAP service and the certificate service to respond to access requests sent
by the subject. The policy management system uses an external Certificate Authority to validate the certificates of users unknown to the system.
Fig. 2. Certificate-based access control system architecture.

The policy management system is described below.
4
Communication protocol
Based on the data flow diagram presented in [1], we have introduced certificates, attribute certificates and use conditions to create a certificate-based access control system. It allows signing of the access requests sent by the subject, as well as of the policies and policy sets defined by a stakeholder. Users' certificates are used to verify their authenticity and attributes. We have introduced a validation service, which is responsible for the validation of all requests (e.g. request authorization, policy authorization). It is important to ensure security for the system; therefore, at the beginning of each session, communication between the services is established, during which a key pair is generated for each of the services. The mechanism is presented below.
Fig. 3. Modified policy management system for certificate-based access control (step 7c carries environment attributes, certificates, CRLs and other data needed for the validation process).

The model presented in Figure 3 operates as follows:

1. The Policy Administration Point (PAP) defines policies and policy sets. These are signed with its private key; therefore, if an intruder tries to define policies which would allow him to access system resources, they will be rejected, because only authorized users can define policies or policy sets.
2. The Policy Decision Point (PDP) sends policy or policy set authorization requests, along with its signature, to the validation service, which verifies that the policies were issued by an authorized subject. The policy authorization request syntax is presented in Figure 4.

<xs:element name="AuthRequest" type="xacml-context:RequestType"/>
<xs:complexType name="RequestType">
  <xs:sequence>
    <xs:element ref="xacml-context:PolicySet"/>
    <xs:element ref="xacml-context:Policy"/>
    <xs:element ref="xacml-context:Signature"/>
  </xs:sequence>
</xs:complexType>

Fig. 4. Policy authorization request syntax.

3. At the beginning of each session the PDP and the validation service generate a key pair for communication. For a policy or policy set received in an authorization request, the validation service creates a unique MAC, which is then sent back to the PDP. The MAC is generated from the validation result with
the validation service's key. The PDP generates its own MAC value from its own key and then compares it with the MAC received from the validation service. If they are the same, the policy or policy set is valid and is added to the policy definitions stored by the PDP. The policy authorization response syntax is presented in Figure 5.

<xs:element name="AuthResponse" type="xacml-context:ResponseType"/>
<xs:complexType name="ResponseType">
  <xs:sequence>
    <xs:element ref="xacml-context:MAC"/>
  </xs:sequence>
</xs:complexType>

Fig. 5. Policy or policy set authorization response syntax.

4.
The access requester sends a signed access request to the policy enforcement point in order to be granted access to a specific resource. The access request contains a unique number to prevent an intruder from resending an authorization request: the PEP stores the unique number from each received request, so a captured request resent by an intruder will be rejected by the PEP. The signed access request context syntax is presented in Figure 6.

<xs:element name="Request" type="xacml-context:RequestType"/>
<xs:complexType name="RequestType">
  <xs:sequence>
    <xs:element ref="xacml-context:Subject" maxOccurs="unbounded"/>
    <xs:element ref="xacml-context:Resource"/>
    <xs:element ref="xacml-context:Action"/>
    <xs:element ref="xacml-context:Environment" minOccurs="0"/>
    <xs:element ref="xacml-context:Signature"/>
    <xs:element ref="xacml-context:UniqueNumber"/>
  </xs:sequence>
</xs:complexType>

Fig. 6. Signed access request context syntax.

5.
The policy enforcement point sends the request to the context handler.

6. The context handler sends a query to the policy information point to gather the required information about the surroundings, that is: the resource, the environment and validation data.

7. The policy information point receives the subject's attributes and certificate (sent by the environment via the certificate service), environment attributes, resource attributes and use conditions. If the subject is unknown to the system, the local certificate service obtains its certificate from the external Certificate Authority. The local certificate service also gathers the information required for validation, e.g. CRLs.

8. The policy information point sends the gathered information to the context handler.

9. The resource sends its information to the context handler.
10. The context handler sends the target, the subject's attributes and certificate, environment attributes, resource use conditions and attributes, and other required validation information to the policy decision point.

11. The policy decision point verifies the received information by sending a request to the certificate service.

12. The validation service sends a response MAC back to the policy decision point.

13. If the MAC generated by the policy decision point is the same as the one received from the validation service, it sends a positive decision to the context handler; otherwise a negative decision is sent.

14. The context handler sends its response to the policy enforcement point.

15. The policy enforcement point sends its obligations to the obligation service, which grants access to the subject if the decision was positive, or denies access if the PDP's decision was negative.

The presented signed policy context syntax includes types specific to certificate-based access systems, which could follow the XML Signature standard.
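Steps 11-13 (and, symmetrically, steps 2-3) boil down to both parties computing a MAC over the validation result with the session key and comparing the values. A sketch with HMAC-SHA256 standing in for the unspecified MAC construction; all names and the key material are hypothetical:

```python
import hashlib
import hmac

def mac(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 standing in for the unspecified MAC construction."""
    return hmac.new(key, message, hashlib.sha256).digest()

# Key material generated at the start of the session (step 3).
session_key = b"per-session-shared-key"

# The validation service MACs the validation result for the PDP (step 12).
validation_result = b"policy-set: issued by an authorized subject"
service_mac = mac(session_key, validation_result)

# The PDP recomputes the MAC from its own copy of the key (step 13);
# compare_digest avoids leaking information through timing.
pdp_mac = mac(session_key, validation_result)
assert hmac.compare_digest(service_mac, pdp_mac)          # decision: positive
assert mac(session_key, b"tampered result") != service_mac
```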
5
Sample policy
Figure 6 presents a sample policy for the resource 'Printer'. Its target says that the policy applies only to the server called 'Printer'. The policy has a rule which specifies that a subject has to have valid attributes to be able to print on this target (the printer). Many credentials could be included in a single policy: for example, a certain time of day at which the user can use the target, or use conditions (e.g. a user with certain attributes can hold a video conference if the available transfer rate is greater than 1 Mb/s). The sample presented in Figure 6 contains the following specific information:

Lines 002-009: specify the policy target
Lines 003-005: specify the subject for the target, any subject in this case
Lines 006-008: specify the resource of the target, a Printer
Lines 010-024: specify a rule in the policy
Lines 011-023: specify a target for the rule
Lines 012-018: specify the rule's subject, in this case any subject who gives valid attributes
Line 019: specifies the resources for the rule, all resources in the target
Lines 020-022: specify what actions should be taken for the rule: authentication

[Fig. 6. Sample authorization policy; the 25-line listing is garbled in the source. The recoverable fragments include a <Subject> target, the resource "Printer", a <SubjectMatch MatchId="GetAttributes"> condition with <Apply FunctionId="subject-get-attributes"/>, and an "authenticate" action.]
6
Conclusion and further work
In this paper we have shown that combining XACML and the XML Signature standard can lead to the specification of a high-level, powerful certificate-based access control language. The communication protocol can be very effective and error-proof. By introducing certificates into role-based access control, high flexibility can be achieved: not only users known to the system can be authorized and authenticated, but also users from untrusted sources who are not known to the system. Public Key Infrastructure and Privilege Management Infrastructure make this possible. The detailed implementation of the presented protocol is the next step. All certificate services will be implemented using OpenCA; OpenLDAP will be used to store network information. XACML allows us to specify policies and policy sets, and the XML signature schema covers all information required by the validation services.
7 References

[1] S. Godik, T. Moses, "eXtensible Access Control Markup Language (XACML) Version 1.0", OASIS Open, 2003.
[2] "A Brief Introduction to XACML", Sun Microsystems, 2003.
[3] D.W. Chadwick, A. Otenko, "RBAC Policies in XML for X.509 Based Privilege Management", University of Salford.
[4] M. Bartel, J. Boyer, B. Fox, B. LaMacchia, E. Simon, "XML Signature Syntax and Processing", W3C Recommendation, 2002.
[5] R. Chandramouli, "Specification and Validation of Enterprise Access Control Data for Conformance to Model and Policy Constraints", NIST Computer Security Division.
[6] X. Zhang, J. Park, R. Sandhu, "Schema Based XML Security: RBAC Approach", 17th IFIP 11.3 Working Conference on Data and Application Security, Estes Park, Colorado, USA, August 4-6.
[7] R. Chandramouli, "Application of XML Tools for Enterprise-Wide RBAC Implementation Tasks", NIST Computer Security Division.
[8] J. Pejaś, P. Sukiennik, "Access Control Description Language in Wide Distributed Systems", VI Krajowa Konferencja Naukowo-Techniczna "Diagnostyka Procesów Przemysłowych".
[9] P. Sukiennik, "Framework for Certificate-Based Access Control Policies Description Language Ponder", Advanced Computer Systems, 2003.
[10] M. Kurkowski, J. Pejaś, "A Propositional Logic for Access Control in Distributed Systems", in Artificial Intelligence and Security in Computing Systems, Kluwer Academic Publishers, Boston/Dordrecht/London, 2003.
[11] J. Pejaś, "Certificate-Based Access Control Policies Description Language", in Artificial Intelligence and Security in Computing Systems, Kluwer Academic Publishers, Boston/Dordrecht/London, 2003.
Impact of the Address Changing on the Detection of Pattern Sensitive Faults

Bartosz Sokol (1), PhD eng Ireneusz Mrozek (2), Prof. Dr hab Wiaczeslaw Yarmolik (3)

(1) Bialystok University of Technology, Wiejska 45A, 15-351 Bialystok, Poland, imrozek@ii.pb.bialystok.pl, bsokol@ii.pb.bialystok.pl
(2), (3) The University of Finance and Management in Bialystok, branch in Elk, Grunwaldzka 1, Elk, Poland, yarmolik@gw.bsuir.unibel.by
Abstract: This paper introduces a new concept for memory testing, based on transparent memory tests, aimed at pattern sensitive fault detection with a different address order generation technique. It is commonly known that only march tests can now be used to test modern memory chips. Every march test algorithm can be applied in different ways and still be effective in detecting its target faults. Using the properties of degrees of freedom in march testing [6], such as address changing, we can detect Pattern Sensitive Faults (PSF). Combining march tests with the proposed technique allows us to detect all memory faults, including PSF, with high probability. The tests used are more effective, and in many cases experimental studies even show a higher efficiency for simple march tests combined with the proposed technique.

Keywords: memory testing, pattern sensitive faults, march tests

1 Introduction
Testing semiconductor memories is becoming a major cost factor in the production of modern memory chips; hence the selection of the most appropriate test, technique and particular set of fault models becomes increasingly important. In addition, tests with high defect coverage for realistic faults and with an acceptable test length are required. Modern computer systems typically contain a variety of embedded memory arrays, such as caches, branch prediction tables or priority queues for instruction execution [4]. Fault-free memory operation is crucial for the correct behaviour of the complete system, and thus efficient techniques for production testing, as well as for periodic maintenance testing, are mandatory to guarantee the required quality
standards. However, advances in memory technology and in system design turn memory testing into a more and more challenging problem.
1.1
Classification of memory faults
Memory faults can be divided, on the basis of the number of faulty cells, into one-cell faults (e.g. stuck-at faults, transition faults) and multiple-cell faults (e.g. coupling faults), which are more difficult to detect. The general case of a fault belonging to the second group is the Pattern Sensitive Fault (PSF). A cell (the base cell) is said to have a PSF if its value gets altered as a result of a certain pattern of 0s and 1s, a 0→1 transition, or a 1→0 transition in a group of other cells called the cell's neighborhood. Three types of PSF can be distinguished:

• Active PSF: the base cell changes its contents due to a change in the neighborhood pattern; this change consists of a transition in one neighborhood cell, while the remaining neighborhood cells and the base cell contain a certain pattern.
• Passive PSF: the content of the base cell cannot be changed due to a certain neighborhood pattern.
• Static PSF: the content of the base cell is forced to a certain state due to a certain neighborhood pattern.

Pattern sensitive faults arise primarily from high component densities and the related effect of unwanted interacting signals. As RAM density increases, the cells become physically closer and PSFs become the predominant faults. Moreover, other fault classes, such as stuck-at faults, transition faults and coupling faults, can be regarded as special types of PSFs. Because of address line scrambling, done to minimize the silicon area and the critical path length, the effective detection of pattern sensitive faults depends on having the scrambling information [2]. However, this information may not always be available: it may not be published by memory producers, or it can undergo changes (e.g. for a reconfigured RAM, after reconfiguration the logical neighborhood of the memory cells may no longer be the same as the physical neighborhood). That is why it is not always possible to use tests which take advantage of scrambling information.
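A static PSF can be illustrated with a toy memory model in which a fixed neighborhood pattern forces the base cell into a certain state; the addresses and the fault location below are arbitrary choices for illustration:

```python
class PSFMemory:
    """Toy RAM with one static PSF: whenever the neighborhood cells at
    addresses 1 and 2 both hold 1, the base cell at address 0 is forced
    to 0 (an arbitrary illustrative fault, not a real chip model)."""

    def __init__(self, size: int = 8):
        self.cells = [0] * size

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value
        self._apply_fault()

    def read(self, addr: int) -> int:
        return self.cells[addr]

    def _apply_fault(self) -> None:
        if self.cells[1] == 1 and self.cells[2] == 1:
            self.cells[0] = 0    # base cell forced to a certain state

mem = PSFMemory()
mem.write(0, 1)
assert mem.read(0) == 1          # fault not yet activated
mem.write(1, 1)
mem.write(2, 1)                  # the neighborhood pattern appears
assert mem.read(0) == 0          # the static PSF corrupted the base cell
```

A test detects this fault only if it drives the neighborhood through the activating pattern and then reads the base cell, which is exactly why pattern coverage matters in the sections that follow.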
In this paper we propose to use the properties of degrees of freedom in march testing, and march tests, to detect PSF [6]. The suggested technique is based on transparent march tests with a different address order, generated with an LFSR (Linear Feedback Shift Register). We use the most commonly applied tests in the process of memory testing. Their main advantages are as follows:

• high fault coverage;
• linear complexity, O(N), where N is the memory size.
Moreover, these tests can easily be transformed into transparent tests according to [3]. This enables us to use them in a cyclic transparent memory test with a different address order.
2 Detection of pattern sensitive faults with address changing

According to [7], all march tests that cover coupling faults (CFs) also cover SAFs and TFs. It should be noticed that CFs can be regarded as special types of PSFs. Therefore, relation (1) is true:

P(SAF) ≥ P(TF) ≥ P(CF) ≥ P(PSF3) ≥ P(PSF5) ≥ P(PSF9)    (1)

where P(F) is the probability of detection of fault F, and P(PSFn) = min(P(APSFn), P(SPSFn), P(PPSFn)).

According to (1), if we say that the probability of detection of PSF3 by a certain test equals P, this means that the same test detects SAFs, TFs and CFs with probability at least equal to P too [8].
2.1
Generation of "orbit"
When we change the content of memory cells, different multiple faults can be activated, while some faults remain unactivated. In such a case, some multiple faults will be activated during the generation of k+1 states in k cells. When studying the efficiency of memory tests, we have to take into consideration the complexity of generating all 2^k combinations for k memory cells, which is an essential, and in many cases sufficient, condition for detecting the different multiple faults given by a parameter k. To illustrate this statement, the MATS test [1] {⇕(ra, wa*); ⇕(ra*)} is considered. Let us take four memory cells (k = 4) with addresses λ, μ, ν, π and an increasing address order, with initial contents a_λ, a_μ, a_ν, a_π, a_i ∈ {0,1}, i ∈ {λ, μ, ν, π}, where a_i* is the complementary value of a_i. As a result of a one-time use of the MATS test, we obtain five different states in the four cells:

a_λ   a_μ   a_ν   a_π
a_λ*  a_μ   a_ν   a_π
a_λ*  a_μ*  a_ν   a_π
a_λ*  a_μ*  a_ν*  a_π
a_λ*  a_μ*  a_ν*  a_π*

Fig. 1. Five different states after use of the MATS test.
The following example presents the case of address order changing. Let us take four memory cells with addresses λ, μ, ν, π (address order: μ→λ→π→ν) and initial states a_λ, a_μ, a_ν, a_π, a_i ∈ {0,1}, i ∈ {λ, μ, ν, π}, where a_i* is the complementary value of a_i. As a result of a one-time use of the MATS test, we again get five different states in the four cells:

a_λ   a_μ   a_ν   a_π
a_λ   a_μ*  a_ν   a_π
a_λ*  a_μ*  a_ν   a_π
a_λ*  a_μ*  a_ν   a_π*
a_λ*  a_μ*  a_ν*  a_π*

Fig. 2. Five different states after use of the MATS test with a different address order.
When we consider the case of a memory test phase with one read operation, during one phase the k memory cells run through k+1 different patterns. Let us define this set of patterns as an "orbit". We can get different orbits as an effect of:

• initial state changing, using the 3rd Degree of Freedom (DOF): if the march test is built symmetrically, the data written to the cells can be changed completely [6];
• address order changing, using the 1st and 2nd DOF: the address sequence can be freely chosen as long as all addresses occur exactly once and the sequence is reversible (1st DOF); the address sequence for initialization can be freely chosen as long as all addresses occur at least once (2nd DOF) [6];
• using more complex march tests.
The following statements are true for orbits (O).

Statement 1: There are 2^k distinct orbits for the 2^k initial states of k arbitrary memory cells.

Statement 2: Each orbit O contains k+1 different patterns.

Statement 3: There are no two identical orbits O_i and O_j with the same patterns for different initial states i ≠ j.

Statement 4: The minimum number min1(O) of orbits obtained as a result of background changes that is essential to obtain all possible 2^k patterns satisfies the inequality:

min1(O) ≥ 2^k / (k + 1),  k = 1, 2, 3, ...    (2)

Statement 5: There are k! different orbits for k arbitrary memory cells with a constant initial state and a random address order.
For larger values of k, the number of orbits generated by address order changing is significantly bigger than the number of orbits obtained by background changing. In addition, in the case of address order changes, in every orbit the first and the last patterns are the same. For example, for k = 3 we have 6 different orbits:

O0 (123): 000, 100, 110, 111
O1 (132): 000, 100, 101, 111
O2 (213): 000, 010, 110, 111
O3 (231): 000, 010, 011, 111
O4 (312): 000, 001, 101, 111
O5 (321): 000, 001, 011, 111

Fig. 3. Six different orbits for k = 3 cells.

Statement 6: Each orbit O contains k+1 different patterns.

Statement 7: There do not exist two orbits O_i and O_j with the same patterns.

Statement 8: The minimum number min2(O) of orbits obtained as a result of address order changes that is essential to obtain all possible 2^k patterns satisfies the inequality:

min2(O) ≥ (2^k − 2) / (k − 1),  k = 2, 3, 4, ...    (3)
Proofs of the above statements are presented in [9].
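Statement 5 and the pattern coverage behind inequality (3) can be checked mechanically for small k by enumerating the orbits produced by all address orders; a small sketch for k = 3, where the MATS-like pass is modeled simply as complementing the cells one at a time:

```python
from itertools import permutations

def orbit(init, order):
    """States visited while a MATS-like pass complements the cells one
    at a time in the given address order (first and last state included)."""
    state = list(init)
    states = [tuple(state)]
    for addr in order:
        state[addr] ^= 1
        states.append(tuple(state))
    return tuple(states)

k = 3
orbits = {orbit((0,) * k, p) for p in permutations(range(k))}
assert len(orbits) == 6                 # k! distinct orbits (Statement 5)

patterns = {s for o in orbits for s in o}
assert len(patterns) == 2 ** k          # together they reach all 2^k patterns

# Every orbit shares the same first and last pattern, as noted in the text.
assert all(o[0] == (0, 0, 0) and o[-1] == (1, 1, 1) for o in orbits)
```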
2.2
Memory cell address generation technique
During the studies of the efficiency of march tests with reference to PSF, we have to consider two different ways of address changing:

• random address order;
• address order based on an LFSR.

The first approach consists in using the march test 2^n times with different, randomly chosen address orders. In the second approach, the testing session consists of a one-time use of the march test, while each test phase is performed 2^n times longer, according to the addresses generated by an LFSR. Address order generation for a 4-cell memory with an LFSR is presented in Figure 4.
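The LFSR-based address order can be sketched as follows; the tap positions below realize a maximal-length 3-bit register in one common convention for the polynomial 1 + x + x^3 shown in Fig. 4 (bit-numbering conventions differ between implementations):

```python
def lfsr_states(seed: int, taps=(2, 0), nbits: int = 3):
    """Fibonacci-style LFSR sketch: the successive register states serve
    as the pseudo-random cell address order for the march test phases.

    A maximal-length register visits all 2^n - 1 nonzero states before
    repeating; the all-zero state is never reached."""
    state = seed
    out = []
    for _ in range(2 ** nbits - 1):
        out.append(state)
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & ((1 << nbits) - 1)
    return out

addresses = lfsr_states(seed=1)
assert len(set(addresses)) == 7 and 0 not in addresses   # all nonzero 3-bit values
```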
Fig. 4. Address order generation for a 4-cell memory with the LFSR 1 + x + x^3.

Fig. 1. Frequencies of assembler instructions (the most frequent mnemonics visible in the chart include mov, push, call, jmp, sub, movzx, or, ret and and).
The distribution shown in Fig. 1 is strongly irregular. The most frequent operations in Intel x86 assembler code are: data transfer commands (mov), unconditional control flow (jmp), procedure calls (call) and pushing values onto the stack (push). Experiments show that this consistent pattern remains constant for the other analyzed
programs, with small deviations (Fig. 1). This fact allows us to assume that the frequencies of assembler instructions depend mainly on the architecture of the program execution environment. We also analyzed the dependence between the number of different commands and the total number of commands in a program module. All modules were split into three groups: modules containing a small number of commands, modules with an intermediate number of commands, and modules containing a large number of commands. Each group was split into subgroups by the number of different assembler instructions (the size of the alphabet). We determined that the most frequent combinations among the programs chosen for analysis were: small modules with a small alphabet, modules of intermediate size and alphabet, and big modules with a large number of different instructions. In all further experiments we used members of all three sets mentioned above. We composed an alphabet containing 135 different instructions and assigned an index to each of them. Further, we calculated the average value of this index (mx). Such calculations were performed for about 40 modules (.exe and .dll). The results are shown in Fig. 2.

Fig. 2. Dependence between mx and the total number of commands in a module (module lengths up to 500,000 commands; mx roughly between 56 and 70).

The obtained experimental data allow us to conclude that the parameter mx has no strong dependence on program functionality and varies within a certain interval. The bounds of this interval get closer as the total number of instructions in the program grows. This consistent pattern related to the parameter mx is also observed when, instead of analyzing the entire program (N instructions), we analyze a subset containing L randomly chosen instructions. Fig. 3 shows the corresponding dependencies for L=N, L=N/10 and L=N/100, obtained for 18 program modules.
Fig. 3. Experimental dependency between mx and program length, obtained for the entire program, 10% and 1% of its instructions (module lengths up to 40,000 commands; mx roughly between 50 and 100).
The experimental results let us conclude that the value of the parameter mx calculated for a subset of program instructions also varies within a certain interval. Experiments showed that for L=N/100 the value of mx varies in the interval 55-95, and that analysis of even 1 per cent of the program instructions reveals the existence of a consistent pattern. An important issue is determining the minimum L/N ratio at which subset analysis remains reasonable; this is necessary for estimating the advisability of different watermark embedding methods based on statistical characteristics. There are also important questions related to the minimal size of a watermarked program and the minimal size of the instruction alphabet used in it.

One more statistical characteristic of executable code is its autocorrelation function. Let us consider two programs: P and P'. Program P' is obtained from P by a cyclic shift of its commands by k positions. Now, if we compare P and P' and count the coincidences of commands occupying the same positions, we obtain a certain value S(k). Thus, if we consider the instruction indices mentioned above as elements of a random sequence, then S(k) describes its autocorrelation function. Fig. 4 shows the experimental dependency between the shift value k and the relative coincidence value Sn(k)=S(k)/N, where N is the program length.
Fig. 4. Experimental dependency Sn(k) computed for 7 arbitrarily chosen programs.
Experiments showed that, starting with a certain shift distance, the percentage of coincidences remains approximately constant. For the considered Win32 applications we obtained a value of Sn(k) of about 12 per cent for k greater than 2000. We assumed that such a statistical characteristic as the percentage of instruction coincidences calculated for a certain shift value k can also be considered a parameter that keeps an approximately constant value independently of program functionality. In order to check this assumption we analyzed over 40 Win32 program modules and calculated the value Sn(k) for k=2000 for each module. The minimal size of a studied module was about 700 commands and the maximum over 490,000. The obtained results are shown in Fig. 5.
Fig. 5. Dependence between Sn(k) and module size for shift value k equal to 2000
The experimental data allow the assumption that, starting with a program module size greater than, for example, 100,000 commands, Sn(k) varies in a fixed interval (0.1, 0.2). Thus, if we analyze a large enough number of program modules, we can experimentally determine an interval of values in which the parameter Sn(k), calculated for an arbitrary program, will be enclosed with high probability. Hence, Sn(k) can be considered one more statistical characteristic that remains approximately constant independently of the analyzed program's functionality.
3 Proposed watermarking algorithms
The experimental results given above show that some statistical characteristics of a program's executable code depend only weakly on program functionality. This fact will be used for program watermarking. As the basis of all watermarking methods described below we propose to use modifications of the program's statistical characteristics. A very similar approach is used in the classical algorithm for raster image watermarking called "Patchwork" [4]. This approach is based on the fact that, for any sufficiently large number of randomly selected image pixels, the average difference Lab between the brightnesses of the two pixels selected in series is approximately equal to zero. The watermark is embedded by incrementing or decrementing the brightness of some pixels. The pixels whose brightness will be changed are selected using a pseudorandom generator. The goal of these modifications is to reach a significant difference between the new value of the average Lab and the old one for the considered set of pixels called a "patch". The following steps check the existence of a certain watermark: I. Using the secret key as the initial seed for the pseudorandom generator, the coordinates of the pixels belonging to the patch are generated in the same way as was done for watermark embedding; II. For the selected pixels the average Lab is computed; III. If the obtained Lab value is equal to zero, then the watermark does not exist; otherwise the image contains the watermark. Use of other keys for watermark extraction will result in the selection of completely different pixel sets, which were not modified and have the standard value of the average Lab.
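The embedding and the three detection steps above can be sketched as follows (a minimal illustration assuming 8-bit grayscale pixels stored in a flat list; the function names, patch size and brightness step are our assumptions, not the paper's):

```python
import random

def patch_pairs(key, n_pixels, n_pairs):
    """Step I: derive the secret pixel-pair coordinates from the key."""
    rng = random.Random(key)
    return [(rng.randrange(n_pixels), rng.randrange(n_pixels))
            for _ in range(n_pairs)]

def embed(pixels, key, n_pairs=4000, delta=8):
    """Raise the a-pixels and lower the b-pixels so the average
    difference Lab over the patch grows away from zero."""
    marked = list(pixels)
    for a, b in patch_pairs(key, len(pixels), n_pairs):
        marked[a] = min(255, marked[a] + delta)
        marked[b] = max(0, marked[b] - delta)
    return marked

def average_lab(pixels, key, n_pairs=4000):
    """Steps I-II: average brightness difference over the secret pairs."""
    pairs = patch_pairs(key, len(pixels), n_pairs)
    return sum(pixels[a] - pixels[b] for a, b in pairs) / len(pairs)

# Step III: near zero means no watermark; a clear offset means watermarked.
rng = random.Random(0)
image = [rng.randrange(256) for _ in range(10_000)]
marked = embed(image, key="secret")
print(average_lab(image, "secret"), average_lab(marked, "secret"))
```

With the wrong key the detector samples unmodified pixel sets, so the average stays near its standard value of zero.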
In order to adapt this approach for software watermarking we need to choose an assembler-code statistical characteristic that can be used in the same way as Lab in "Patchwork". The value of this characteristic must be constant or vary within a known interval (like Lab for raster images) for any arbitrarily taken program [5]. By modifying this statistical characteristic we embed in a program a special sign (watermark) whose existence will later help to prove the program's ownership and determine its origin.
3.1 Use of command frequencies distribution
The experimental data reflected in Fig. 2 and Fig. 3 point to the fact that the average instruction index (mx), calculated for any sufficiently big program, has a value that varies within a certain interval. Thus, for any arbitrarily selected program we can state with certain probability that the mx value calculated for this program will be enclosed in, for example, the interval 60-80. Fig. 3 shows that the same consistent pattern is observed when, instead of analyzing the entire program, we consider only a certain pseudorandomly selected subset of L commands (L << N). The key idea of the proposed approach consists in modifying only such commands as belong to L. The positions of these commands are generated pseudorandomly using a secret initial seed. Such modifications will affect the instruction frequencies and therefore change m'x (the average index of the instructions belonging to L), but the mx variations caused by them will be indistinguishable. The watermark is embedded when the absolute difference |mx - m'x| is greater than some value ε. Hereby, because the positions of the commands constituting L are computed using a secret key, the watermark can be detected only by the person who knows this key.
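A hedged sketch of this detection rule (the subset size, index range and ε value are illustrative assumptions of ours; the instruction indices are a toy sequence drawn from the 60-80 interval mentioned above):

```python
import random

def mean_index(ops):
    return sum(ops) / len(ops)

def secret_subset(key, n, subset_size):
    """Pseudorandom positions of the L commands, derived from the key."""
    rng = random.Random(key)
    return rng.sample(range(n), subset_size)

def watermark_present(ops, key, subset_size, eps):
    """Detect: |mx - m'x| > eps, where m'x is the mean index over L."""
    positions = secret_subset(key, len(ops), subset_size)
    m_x = mean_index(ops)
    m_x_prime = mean_index([ops[i] for i in positions])
    return abs(m_x - m_x_prime) > eps

# toy program: instruction indices scattered around the 60-80 interval
rng = random.Random(1)
ops = [rng.randint(60, 80) for _ in range(10_000)]
print(watermark_present(ops, "key", 100, eps=5.0))

# embedding: push the indices at the secret positions upward
for i in secret_subset("key", len(ops), 100):
    ops[i] = 80
print(watermark_present(ops, "key", 100, eps=5.0))
```

Only the holder of the key recovers the same positions, so other keys sample an unmodified subset whose mean stays close to mx.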
3.2 Use of autocorrelation function
Our experiments showed that for quite long shift distances (but less than the program length divided by two) an average program has a coincidence percentage of about 12-15% [6]. We propose to embed the watermark by means of semantics-preserving program transformations (instruction substitutions or instruction reordering). The goal of these modifications is to transform the program in such a way that its further pseudorandom modifications (using a certain initial seed) will result in extraordinary S(k) values for certain shift distances. The above-mentioned initial seed is kept secret, and therefore the watermark cannot be extracted without knowledge of its value.
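Detection could then amount to comparing Sn(k) at the secret shift against the 12-15% background level; a minimal sketch (the secret shift, the threshold margin and the toy sequences are our assumptions):

```python
def relative_coincidence(ops, k):
    n = len(ops)
    shifted = ops[k % n:] + ops[:k % n]
    return sum(1 for a, b in zip(ops, shifted) if a == b) / n

def watermark_present(ops, secret_shift, background=0.15, margin=0.10):
    """Flag a watermark when Sn(k) at the secret shift is extraordinary,
    i.e. well above the ~12-15% level of an average program."""
    return relative_coincidence(ops, secret_shift) > background + margin

# a program transformed so that positions i and i+500 systematically coincide
marked = [i % 7 for i in range(500)] * 4      # period 500 -> Sn(500) = 1
unmarked = list(range(2000))                  # all commands distinct
print(watermark_present(marked, 500), watermark_present(unmarked, 500))
```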
4 Conclusions
In this paper the possibilities of using statistical characteristics for software authorship protection are shown. In particular, an implementation of an adapted variant of the "Patchwork" algorithm is considered. The assembler-code autocorrelation function is shown to be one of the program statistical characteristics that can be used for watermark embedding.
5 References
[1] Monden A., Iida H., Matsumoto K. 2000. 'A Practical Method for Watermarking Java Programs', The 24th Computer Software and Applications Conference (COMPSAC 2000), Taipei, Taiwan, pp. 191-197.
[2] Collberg C., Thomborson C. 1999. 'Software Watermarking: Models and Dynamic Embeddings', The 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Jan. 1999, pp. 311-324.
[3] Stern J.P., Hachez G., Koeune F., Quisquater J.-J. 2000. 'Robust Object Watermarking: Application to Code', in A. Pfitzmann, editor, Information Hiding '99, volume 1768 of Lecture Notes in Computer Science (LNCS), Dresden, Germany, Springer-Verlag, pp. 368-378.
[4] Bender W., Gruhl D., Morimoto N., Lu A. 1996. 'Techniques for data hiding', IBM Systems Journal, vol. 35, pp. 313-336.
[5] Yarmolik V.N., Portyanko S.S. 2003. 'State of the Art in Software Ownership Protection', Computer Information Systems and Industrial Management Applications, Editors: Khalid Saeed, Romuald Mosdorf, Olli-Pekka Hilmola. Bialystok, Poland, pp. 188-195.
[6] Partsianka S.S., Yarmolik V.N. 2003. 'Software modules unauthorised use defense by means of watermarking', Izvestiya Belorusskoi Inzhenernoi Akademii, No. 1(15)/3, pp. 163-165.
Probabilistic Analysis of Operational Security for Network Systems
Jolanta Koszelew 1,2
1 The University of Finance and Management in Bialystok, Faculty of Engineering
2 Bialystok Technical University, Faculty of Computer Science, e-mail: koszelew@pb.bialystok.pl
Abstract: Survivability is the ability of a system to continue operating in the presence of failures or malicious attacks [4]. We present an original method for performing probabilistic analysis of the survivability of network systems. In our method we can simulate failure and intrusion events and then observe the effects of the injected events. Our model is based on Markov Decision Processes, which are a generalization of Markov chains, and provides the analysis of probabilistic measures for network systems, such as the probability that a service that has been issued will be finished, or the expected time it takes a service to finish. We illustrate the idea of our technique with a simple example.
Keywords: Markov Decision Processes, Computation Tree Logic, Bayesian network.

1 Introduction
Survivability is the ability of a system to continue operating in the presence of failures or malicious attacks. Network systems in use today are exposed to many accidental failures (e.g. a disk crash) and malicious attacks (e.g. a denial-of-service attack). Disruption of services caused by such undesired events can have catastrophic effects, in particular if the network system serves a medical application. A network system consists of nodes and links connecting the nodes. Communication between the nodes occurs by passing messages over the links. An event in the system can be either a user event, a communication event, or a fault. A service is associated with a given start event and an end event. Our method can be used by the system architect to simulate the effect of a fault, to compute the probability that a service that has been issued will finish, or the expected time
it takes a service to finish. With this information, the architect can weigh decisions related to the survivability of the designed system. Survivability analysis is fundamentally different from the analysis of properties found in other areas (e.g., algorithm analysis of fault-tolerant distributed systems, reliability analysis of hardware systems and security analysis of computer systems). First, survivability analysis must handle a broader range of faults than any of these other areas; we must minimally handle both accidental failures and malicious attacks. Our method allows an architect to incorporate any arbitrary type of fault in the system model. Second, events may depend on each other, especially fault events. In contrast, for ease of analysis, most work in the fault-tolerance literature assumes independence of events. Third, survivability analysis should also be service dependent. The architect of a network system might choose to focus on one selected service as being critical, although the system provides other services. Finally, survivability analysis deals with multiple dimensions: it simultaneously deals with functional correctness, fault-tolerance, security and reliability.
2 Formal Model
Model checking for our method is based on Markov Decision Processes (MDPs) [1]. MDPs are a generalization of classic Markov chains, in which the transition probabilities depend on the events in the history. An MDP is a 4-tuple (S, A, P, c), where: S is a finite state space; A is a finite set of actions; P are the transition probabilities, where P_{s,a,s'} is the probability of moving from state s to s' if action a is chosen; c: S × A → R is the immediate cost, i.e., c(s, a) denotes the cost of choosing action a at state s. For a state s ∈ S, A(s) ⊆ A is the set of actions available at state s. The history at time t (denoted h_t) is the sequence of states encountered and actions taken up to time t. A policy u takes into account the history h_t and determines the next action at time t. Specifically, u_t(a | h_t) is the probability of taking action a given history h_t. A policy u defines a value function V^u: S → R, where
V^u(s) is the expected cost of the actions taken if the MDP uses policy u and starts in state s (the cost c is used to define the expected cost). The technical definition of V^u can be found in [1]. Our aim is to find a policy that minimizes the value V^u.
3 Description of method
In this section we provide a detailed overview of our model, illustrating it with a very simple example. We consider an abstract model of a secret information exchange system, depicted in Fig. 1. There are three levels of organizations: the Agents on the bottom, the Contact Boxes, and the Central Agencies at the top. If two Agents are connected to the same Contact Box, then the exchange of a secret message between them is handled by that Contact Box; there is no need to go through the Central Agencies. To illustrate the architecture, suppose Agent A sends a secret message to Agent C. Because Agents A and C have different Contact Boxes, the message must be verified by the proper Central Agency. Agent A and Agent C are not connected through the same Contact Box, so the message is first sent to a Contact Box connected to Agent A; in this case, let us choose Contact Box I. 1. The message is sent to the Central Agency closest to Contact Box I, in this case Central Agency II. 2. The message is then sent to the Central Agency that has jurisdiction over Agent C, in this case Central Agency II. 3. The message finally makes its way to Agent C through Contact Box III. We now present the details of each step of our method, illustrating them with the above example.
Fig. 1. Secret Information System. Legend: links that can be faulty; links that cannot be faulty; two possible tracks for the message from Agent A to Agent C; Xi denotes the link between Agent X and Contact Box i (A1, A2, B1, B3, C3).
3.1 Model of Network
First, the architect models a network system, which can be done using one of many formalisms. We choose to use state machines, and we use them to model both network nodes and links. We use shared variables to represent communication between the state machines. In our secret information system example, we use state machines to model the Agents, the Contact Boxes, the Central Agencies and the links. We make some simplifying assumptions in the model of our system: 1. There is only one secret message active at any time. 2. The names of the source and destination Agents are decided nondeterministically.

3.2
Assumptions for Faults
Both links and nodes may be faulty. With our state-machine model of the network system, we need not make a distinction between nodes and links when considering faults. To represent faults in our method, for each state machine representing a node we introduce a special variable called fault, which can range over a user-specified set of symbolic values. For example, the following declaration states that there are three modes of operation for a node, representing whether it is in the normal mode of operation, failed, or compromised by an intruder: fault = { normal, failed, intruder }. In our secret information system example, we inject the following faults: 1. Links between the Agents and the Contact Boxes are the only network elements that can be faulty. The Central Agencies and the Contact Boxes do not fail. 2. When a link is faulty, it blocks all messages and no message ever reaches the recipient. 3. Links may become faulty at any time. Therefore, we allow a nondeterministic transition to the state where fault is equal to failed. The intruder value is not used in this case. 4. Agents can sense a faulty link and route the message accordingly. 5. A service is associated with a start event (e.g. an Agent sends a message to any other Agent) and an end event (e.g. an Agent receives a message from any other Agent). On the basis of the above assumptions we can conclude that the Central Agencies are impenetrable and the links between them are highly reliable and secure. Under the normal mode of operation, an Agent receives (nondeterministically) a message with its source address. Depending on the destination address of the message, the Agent either clears it locally or routes it to the appropriate Contact Box. For example, if a message from Agent A to Agent B is sent, then it is first
sent to Contact Box I and then sent to Agent B. On the other hand, the message from Agent A to Agent C has to clear through the Central Agency (as in Fig. 1). If an Agent is faulty, then messages are routed arbitrarily by the intruder. An Agent can at any time nondeterministically transition from the normal mode to the intruder mode. Once the Agent is faulty it stays in that state forever.
3.3 Definition of Survivability Properties
We focus on two classes of these properties: fault related and service related. We use CTL (Computation Tree Logic) [9] to specify survivability properties. 1. Fault Related Properties: It is not possible for a node N to reach a certain unsafe state if the network starts from one of the initial states. The precise semantics of an unsafe state depends on the application. For example, if a node represents a computer protecting a critical resource, it could represent the fact that somebody without the appropriate authority has logged onto the computer. Let the atomic proposition unsafe represent the property that node N is in an unsafe state. We can then express the desired property in CTL as follows:

AG(¬unsafe)    (1)

which says that for all states reachable from the set of initial states it is true that we never reach a state where unsafe is true. The negation of the property is

EF(unsafe)    (2)

which is true if there exists a state reachable from the initial state where unsafe is true. 2. Service Related Properties: A service that is issued always finishes. Let the atomic proposition start express that a service was started, and finished express that the service is finished:

AG(start → AF(finished))    (3)

The above formula expresses that for every reachable state in which a service starts, along every path from that state the service eventually finishes. For the secret information system example, we would like to verify that a message, if sent, is always eventually received. The analysis of survivability properties can help identify the critical nodes in a network system. Suppose we have modeled the effect of a malicious attack on node N. Now we can check whether the desired properties are true in the modified
network system. If the property turns out to be true, the network is resistant to the malicious attack on the node N.
3.4 Generate Scenario Graph
We automatically construct scenario graphs via model checking. A scenario graph is a compact representation of all traces that are counterexamples of a given property. These graphs depict ways in which a network can enter an unsafe state or ways in which a service can fail to finish. We can generate three kinds of scenario graphs: the Fault Scenario Graph, the Service Success Scenario Graph and the Service Fail Scenario Graph. First we describe the construction of the Fault Scenario Graph. We assume that we are trying to verify, using a model checker, whether the specification of the network satisfies AG(¬unsafe). The first step in model checking is to determine the set of states Sr that are reachable from the initial state. Next, the algorithm determines the set of reachable states S_unsafe that have a path to an unsafe state. Then, we construct the set Ru, which consists of the transitions between unsafe states. The Fault Scenario Graph is G = (S_unsafe, Ru), where S_unsafe and Ru represent the nodes and edges of the graph respectively.

Fig. 2. A Simple Service Success Scenario Graph. Events (latency in brackets): messageIsSent(A, C) (start event); [1] transferOfMessage(A, CB1); [1] transferOfMessage(A, CB2); [1] transferOfMessage(CB1, CA1); [1] transferOfMessage(CB2, CA2); [3] transferOfMessage(CA1, CA2); [2] transferOfMessage(CA2, CB3); [1] transferOfMessage(CB3, C); [1] messageIsReceived(C, A) (end event). Edge probabilities: P(up(A2)) = 2/3, P(down(A2) & up(A1)) = 2/9, P(up(C3)) = 3/4.
For services we are interested in verifying that a service once started always eventually finishes. The Service Success Scenario Graph captures all the traces in which the service finishes. The Service Fail Scenario Graph captures all the traces in which the service fails to finish. These scenario graphs are constructed using a procedure similar to the one described for the Fault Scenario Graph. The scenario graph shown in Fig. 2 shows the effect of link failures on the message-transfer service with source address Agent A and destination address Agent C. (Legend for Fig. 2: A, B, C are Agents; CB1, CB2, CB3 are Contact Boxes; CA1, CA2 are Central Agencies.) The start event is labeled messageIsSent(A, C) in the figure. The event corresponding to sending a message from location L1 to L2 is denoted transferOfMessage(L1, L2). We also denote the link between Agent X and Contact Box n as Xn (A1, A2, B1, B3, C3) (Fig. 1). The predicates up(Xn) and down(Xn) indicate whether link Xn is up or down. Recall that we allow links to fail nondeterministically. Therefore, the event transferOfMessage(A, CB2) is performed only if A2 is up, i.e., up(A2) is the precondition for the event transferOfMessage(A, CB2). If a precondition is not shown, it is assumed to be true. Note that a fault in a link can also be construed as an intruder taking over the link and shutting it down. From the graph it is easy to see that a message is received if link A2 and link C3 are up, or if link A2 is down and link A1 and link C3 are up.
3.5 Probabilistic Properties Analysis
We now have a scenario graph and we can perform reliability and latency analysis. First, the architect specifies the probabilities of certain events of interest, such as faults, in the system. We do not assume independence of events and therefore we use a formalism based on Bayesian networks [8]. Our scenario graph with probabilities yields an MDP. Then, using the MDP, we compute reliability and latency by calculating the value function corresponding to the optimal policy. First we calculate these properties for our example. We define the event Xn, which corresponds to link Xn being up, and the event ¬Xn, which corresponds to link Xn being down. Assume that event A2 is dependent on A1 and there are no other dependencies. Let P(A1) = 2/3 and P(C3) = 3/4, where P(A1) and P(C3) are the probabilities of link A1 and link C3 being up. The probability of event A2 depends on the event A1, and we give its conditional probabilities as P(A2 | A1) = 2/3 and P(A2 | ¬A1) = 1/3, reflecting that if link A1 is down, it is more likely that link A2 will go down. If an event A depends on the set of two events A1 and A2, then we have to specify P(A | A1 ∩ A2), P(A | A1 ∩ ¬A2), P(A | ¬A1 ∩ A2) and P(A | ¬A1 ∩ ¬A2). In our example, we first have to compute the probabilities of the two events A2 ∩ A1 and ¬A2 ∩ A1. These events correspond to the events up(A2) and down(A2) & up(A1).
P(A2 ∩ A1) = P(A2 | A1) · P(A1) = 4/9 and P(¬A2 ∩ A1) = (1 − P(A2 | A1)) · P(A1) = 2/9. We add these probabilities to the relevant edges of the scenario graph in Fig. 2. Since we might assign probabilities to only some events (typically faults) and not others, we obtain a structure that has a combination of purely nondeterministic and probabilistic transitions. In our example, the architect might assign probabilities only to the events corresponding to faults. The user of our example system still nondeterministically sends messages. We now explain the algorithm to compute reliability and latency by first considering a property about services. Let G be the Service Success Scenario Graph. Now, the goal of the environment is to devise an optimal policy, or equivalently to choose nondeterministic transitions, in order to minimize reliability or maximize latency. We define a value function V that assigns a value V(s) to each state s in the scenario graph. Next we describe an algorithm to compute the value function V corresponding to this optimal policy. Later we explain how the value function can be interpreted as worst-case reliability and latency. 1. Initial step: V(s) = 1 for each state which has the property finished, and V(s) = 0 for the others. 2. Iteration step: we compute V(s) separately for nondeterministic and probabilistic states:

V(s) = 1, if s is a finished state;
V(s) = min_{s' ∈ succ(s)} ( c(s → s') + V(s') ), if s is a nondeterministic state;
V(s) = Σ_{s' ∈ succ(s)} p(s, s') · ( c(s → s') + V(s') ), if s is a probabilistic state.

In the definition given above, succ(s) is the set of successors of state s, p(s, s') is the probability of a transition from state s to s', and c(s → s') is the cost of the edge s → s' in the scenario graph. Intuitively speaking, a nondeterministic move corresponds to the environment choosing an action to minimize the value. The value of a probabilistic state is the expected value of the values of its successors. After the above algorithm converges, we end up with the desired value function V*. Let s0 be the initial state of the scenario graph. If the cost c associated with the edges is zero, then V*(s0) is the worst-case reliability corresponding to the given property. If the cost c associated with the edges corresponds to the negative of the latency, then the value V*(s0) corresponds to the worst-case latency of the service. In our example (Fig. 2) the worst-case reliability using our algorithm is
(3/4) · (2/3 + 2/9) = 2/3. That is, the worst-case probability that a message sent by Agent A is received by Agent C is 2/3. The latency in days of each event is shown in Fig. 2 inside square brackets, e.g., the latency of the event transferOfMessage(CA2, CB3) is 2 days. The worst-case latency computed by our algorithm is 4 days.
4 Final Remarks and Future Work
Survivability is a fairly new discipline, viewed by many as distinct from the security and fault-tolerance areas [4]. The best-known method for analyzing the survivability of network architectures is the SNA methodology [4], which is recommended as "best practices" for an organization on how to make its systems more secure or more reliable. In contrast, our method is formal, because it is based on model checking. Research on operational security by Ortalo, Deswarte, and Kaaniche [6] is closest to the Generate Scenario Graph step of our method. However, in our method a scenario graph corresponds to a particular service; in contrast, their graph corresponds to a global model of the entire system. Besides, events in our model may depend on each other, especially fault events. There are many well-known techniques for verifying probabilistic systems, and our algorithm for computing reliability draws on this work [3]. The novelty in our work is the systematic combination of different techniques into one method. There are several directions for future work. First, we plan to implement a prototype tool that supports our method. We intend to use the PrAl semantics [7] in this implementation; PrAl is an algorithmic language designed for probabilistic systems. Since for real systems the scenario graph can be very large, we plan to improve the display and query capabilities of our tool so that the architect can more easily manipulate its output. Finally, to make the fault injection process systematic, we are investigating how best to integrate operational security analysis tools (cf. [5]) into our method.
5 References
[1] Altman E., "Constrained Markov Decision Processes", Chapman and Hall, 1998.
[2] Clarke E. M., Grumberg O., Peled D., "Model Checking", MIT Press, 2000.
[3] Courcoubetis C., Yannakakis M., "The complexity of probabilistic verification", Journal of the ACM, 42(4), pp. 857-907, 1995.
[4] Ellison R., Fisher D., Linger R., Longstaff T., Mead N., "Survivable network system analysis: A case study", IEEE Software, 16/4, pp. 307-317, 1999.
[5] Farmer D., Spafford E., "The COPS security checker system", Proceedings of the Summer Usenix Conference, 1995.
[6] Ortalo R., Deswarte Y., Kaaniche M., "Experimenting with quantitative evaluation tools for monitoring operational security", IEEE Transactions on Software Engineering, 25/5, pp. 633-650, 1999.
[7] Koszelew J., "Some methods for verification of probabilistic programs interpreted in finite structures", Computer Information Systems and Industrial Management Applications: CISIM'03, WSFiZ, pp. 140-147, 2003.
[8] Pearl J., "Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference", Morgan Kaufmann, 1988.
[9] Pnueli A., "A temporal logic of concurrent programs", Theoretical Comput. Sci., 13, pp. 45-60, 1980.
Quality of Service Requirements in Computer Networks with Blocking
Walenty Oniszczuk
Bialystok University of Technology, Faculty of Computer Science, 15-351 Bialystok, ul. Wiejska 45A, Poland, e-mail: walenty@ii.pb.bialystok.pl
Abstract: This paper provides an analytical study of closed-type, multi-center networks with two different blocking strategies. The measures of effectiveness related to such models, based on Quality of Service (QoS) requirements, are studied. In finite-population (closed) multi-node models, where the number of tasks is equal to the population in the network, there are service centers and source centers treated as an infinite server (IS means an ample-server model). In the systems described here, a finite number of tasks cycle from one service center to the other. If the buffer in front of a service center is full, the accumulation of new tasks by this center is temporarily suspended (blocking). Keywords: Quality of Service (QoS) requirements, Finite Source Queuing Models with Blocking, Blocking Strategies.
1 Introduction
The theory of queuing networks has been widely applied as a powerful tool for modeling discrete flow systems, such as computer systems and computer or communication networks [2], [4], [7], [11], [14], [15], [17]. Quite often, in complex information systems composed of shared nodes, there are limitations on the capacity of the buffers in front of each operating node. When a buffer is full, the accumulation of new tasks is temporarily suspended and a phenomenon called blocking occurs, until the queue empties and allows new inserts. Simply defined, blocking forces a departure from the queue or an arrival to the queue to stop temporarily due to lack of space in the queue. Queuing network models with finite-capacity queues and blocking have been introduced and applied as more realistic models of systems with finite-capacity resources and population constraints [1], [3], [5], [8], [12], [13], [16], [19]. Consider a computer network consisting of several service centers (or nodes). The number of tasks waiting in one or more service buffers is restricted due to the finite holding space associated with these nodes. The system may be modeled as an
open or closed queuing network with finite capacities [10], [20]. When modeled as a closed network, the total number of tasks waiting for service and those currently receiving service is defined by the parameter N [5], [6], [9], [13], [18].
2 Closed two-center networks with blocking

Let us consider the two-node closed network with two different blocking strategies, as shown in Fig. 1 and Fig. 2. The general assumptions for these models are [13]:
• The source of tasks is finite, say of size N,
• All tasks are generated independently by a unit source (the arrival process is Poissonian with parameter λ = 1/a, where a is the mean inter-arrival time),
• c service lines are available,
• The mean service time is identical (exponential random variables) with S = 1/μ (where μ is the mean service rate),
• The service center buffer capacity is finite, for example equal to m.
2.1 Model with source blocking
In this kind of model (Fig. 1), if (m+c) < N, we have a classical system with blocking. If the buffer is full, the remaining N−m−c source units cannot proceed to the service station. More precisely, any newly generated tasks are forced to wait in front of the source center. A given two-center model with source blocking has only the possible states H_k (k = 0, ..., c+m+1). State H_0 describes the idle system (empty service lines and service buffer) and state H_{c+m+1} describes the state where c tasks are being processed, m tasks are waiting in the queue (the buffer is full) and the next task has been generated but is waiting for the buffer to become available (the network is blocked). The number of tasks in the network at time t is a Markovian birth and death process in which the rates are given by:

λ_0 = N·λ, λ_1 = (N−1)·λ, λ_2 = (N−2)·λ, ..., λ_{c+m} = (N−(c+m))·λ
μ_1 = μ, μ_2 = 2·μ, ..., μ_c = c·μ, ..., μ_{c+m+1} = c·μ
Based on queuing theory [2], [13], [14], [15], before evaluating the main measures of effectiveness, all probabilities of the states p_k (k = 0, ..., c+m+1) in statistical equilibrium must be calculated. The steady-state probability p_k can be interpreted as the probability of finding k tasks in the system with blocking at any arbitrary point of time after the process has reached its statistical equilibrium. The set of equations for the steady-state solution for p_k may be written as:

0 = −(λ_k + μ_k)·p_k + μ_{k+1}·p_{k+1} + λ_{k−1}·p_{k−1}   for k = 1, 2, 3, ..., c+m+1
0 = −λ_0·p_0 + μ_1·p_1   for k = 0    (1)
Fig. 1. Two-center model with source blocking.

These equations may be solved recursively:

p_k = [N! / ((N−k)!·k!)] · (λ/μ)^k · p_0   for k = 1, ..., c
p_k = [N! / ((N−k)!·c!·c^{k−c})] · (λ/μ)^k · p_0   for k = c+1, ..., c+m+1

where p_0 is determined from the normalization condition p_0 + p_1 + ... + p_{c+m+1} = 1.
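As a numerical sketch (the parameter values are arbitrary illustrations), the recursion p_k = (λ_{k−1}/μ_k)·p_{k−1} implied by the birth and death rates above can be evaluated and normalized directly:

```python
from math import factorial  # kept for checking the closed form by hand

def steady_state(N, c, m, lam, mu):
    """Steady-state p_k for the two-center model with source blocking:
    lambda_k = (N - k) * lam for k <= c+m, mu_k = min(k, c) * mu."""
    K = c + m + 1                       # highest state H_{c+m+1}
    p = [1.0]                           # unnormalized, with p_0 = 1
    for k in range(1, K + 1):
        lam_prev = (N - (k - 1)) * lam  # birth rate leaving state k-1
        mu_k = min(k, c) * mu           # death rate leaving state k
        p.append(p[-1] * lam_prev / mu_k)
    total = sum(p)
    return [x / total for x in p]

p = steady_state(N=10, c=2, m=3, lam=0.5, mu=1.0)
print(sum(p))    # normalized distribution over states H_0 .. H_{c+m+1}
```

For k = 1 the recursion gives p_1/p_0 = N·λ/μ, which agrees with the closed form N!/((N−1)!·1!)·(λ/μ).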
for k is the function node (root), V is the set of internal nodes, E is a set of edges, and 0, 1 are the terminal nodes. As an example, Figure 1 illustrates a representation of the function / (xi,x2,xs) defined by teh truth table given on the left, for a special case where the graph is actually a tree. Each inner note is labeled by a variable and has edges directed towards two children: THENedge (shown as T) corresponding to the case where the variable is assigned to 1, and ELSEedge (shown as E) corresponding to the case where the variable is assigned to 0. Each terminal note is labeled 1 or 0. For a given assignment to the variables, the value yielded by the function is determined by tracing a path from the root to a terminal node, following the edges indicated by the values assigned to the variables. The function value is then given by the terminal node label. Due to the way the edges are ordered in this figure, the values
of the terminal nodes, read from left to right, match those in the truth table, read top to bottom.
x1 | x2 | x3 | f
0  | 0  | 0  | 1
0  | 0  | 1  | 0
0  | 1  | 0  | 0
0  | 1  | 1  | 0
1  | 0  | 0  | 0
1  | 0  | 1  | 1
1  | 1  | 0  | 1
1  | 1  | 1  | 1
Figure 1. Truth Table and Decision Tree Representations of a Boolean Function. A dashed (solid) tree branch denotes the case where the decision variable is 0 (1).
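The path-tracing evaluation just described can be made concrete in a few lines; the node class and tree layout below are our own illustrative reconstruction of the decision tree of Figure 1:

```python
class Node:
    """Inner node (tests a variable) or terminal node (carries a value)."""
    def __init__(self, var=None, lo=None, hi=None, value=None):
        self.var, self.lo, self.hi, self.value = var, lo, hi, value

def evaluate(root, assignment):
    """Trace a path from the root to a terminal: follow the ELSE-edge (lo)
    when the variable is 0 and the THEN-edge (hi) when it is 1."""
    node = root
    while node.value is None:
        node = node.hi if assignment[node.var] else node.lo
    return node.value

# The tree of Figure 1; terminals read left to right: 1,0,0,0,0,1,1,1
T0, T1 = Node(value=0), Node(value=1)
left  = Node('x2', lo=Node('x3', lo=T1, hi=T0), hi=Node('x3', lo=T0, hi=T0))
right = Node('x2', lo=Node('x3', lo=T0, hi=T1), hi=Node('x3', lo=T1, hi=T1))
root  = Node('x1', lo=left, hi=right)
```

Evaluating every assignment reproduces the truth table column exactly.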
Figure 2. OBDD Representation of a Single Function for Two Different Variable Orderings.
Information Processing and Security Systems
261
It is known that the form and size of the BDD are very sensitive to the variable order. A random or carelessly chosen variable order will frequently result in a BDD of exponential size. For example, Figure 2 shows two BDD representations of the function denoted by the Boolean expression x1·x4 + x2·x5 + x3·x6, where · denotes the AND operation and + denotes the OR operation. For the case on the left, the variables are ordered x1 < x2 < x3 < x4 < x5 < x6, while for the case on the right they are ordered x1 < x4 < x2 < x5 < x3 < x6. Many variable ordering heuristics have been proposed. Most of these ideas depend on a fundamental operation, adjacent variable swapping [6]. One of the most efficient algorithms, sifting, was proposed by Rudell [7].
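This sensitivity can be checked directly by building reduced OBDDs for both orders. The construction below is an illustrative brute-force sketch of ours (exponential in the number of variables, which is fine for six), not the algorithm used in BDD packages:

```python
def obdd_size(fn, n, order):
    """Build the reduced OBDD of fn (a function of an n-bit tuple) along the
    given variable order and return the number of decision nodes."""
    unique = {}                              # (var, lo, hi) -> shared node

    def mk(var, lo, hi):
        if lo == hi:                         # redundant test: drop the node
            return lo
        return unique.setdefault((var, lo, hi), ('node', var, lo, hi))

    def build(level, assign):
        if level == n:
            bits = [0] * n
            for v, b in zip(order, assign):
                bits[v] = b
            return 'T1' if fn(tuple(bits)) else 'T0'
        return mk(order[level],
                  build(level + 1, assign + (0,)),
                  build(level + 1, assign + (1,)))

    build(0, ())
    return len(unique)

def f(b):                                    # x1*x4 + x2*x5 + x3*x6
    return (b[0] & b[3]) | (b[1] & b[4]) | (b[2] & b[5])

good = obdd_size(f, 6, (0, 3, 1, 4, 2, 5))   # x1 < x4 < x2 < x5 < x3 < x6
bad  = obdd_size(f, 6, (0, 1, 2, 3, 4, 5))   # x1 < x2 < x3 < x4 < x5 < x6
```

Structural hashing via `unique` merges isomorphic subgraphs, so `len(unique)` is the reduced node count; the paired order yields six nodes, while the interleaved order is strictly larger.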
3 A genetic binarization of data

As BDDs operate in a binary environment (viz. the framework of binary data), it is essential to convert continuous data (features) into their binary representatives. Obviously, such a process is not unique and may invoke a number of interesting and potentially promising alternatives. No matter how we proceed, there is a clear objective: we would like to divide the range of each continuous variable into two intervals so that the derived intervals are as homogeneous as possible with respect to the class allocation within each of them. Ideally, one should have elements (patterns) belonging to a single class fully allocated to a single interval. As we are concerned with the binarization of the features, one has to determine "n" binarization points, and a chromosome takes the form of a sequence of pairs ((i_1, s_1), (i_2, s_2), ..., (i_n, s_n)), where i_j is the attribute number and s_j is the binarization point of the i_j-th feature.

Fitness function. The fitness function to be minimized represents the number of conflicts in the dataset. The calculation of the number of conflicts is based on the binary representation of the data points, which is performed by the mapping f: R^{n+1} → {0,1}^n × R defined as

f(x^k_{i_1}, x^k_{i_2}, ..., x^k_{i_n}, ω_k) = (b^k_1, b^k_2, ..., b^k_n, ω_k)
262
where

b^k_j = 0 if x^k_{i_j} ≤ s_j, and b^k_j = 1 otherwise.
We say that two data points are in conflict if they belong to different decision classes but their binary representations are equal. Such situations are captured by a fitness function of the form fit: ({0,1}^n × R)^N → N, where

fit = Σ_{k<l} conflict(k, l),

with conflict(k, l) = 1 if ω_k ≠ ω_l and (b^k_1, ..., b^k_n) = (b^l_1, ..., b^l_n), and conflict(k, l) = 0 otherwise.
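The conflict count defined above can be sketched directly in Python (names are ours; `splits` plays the role of the chromosome of (attribute, split point) pairs):

```python
def binarize(point, splits):
    """splits: list of (attribute index, split value) pairs;
    b_j = 0 if x_{i_j} <= s_j, 1 otherwise."""
    return tuple(0 if point[i] <= s else 1 for i, s in splits)

def fitness(points, classes, splits):
    """Number of conflicts: pairs of points with equal binary codes but
    different decision classes (to be minimized by the GA)."""
    codes = [binarize(p, splits) for p in points]
    n = len(points)
    return sum(1 for k in range(n) for l in range(k + 1, n)
               if codes[k] == codes[l] and classes[k] != classes[l])
```

For example, two points of different classes that fall on the same side of every split contribute one conflict.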
The results of the GA binarization (discretization) for the experimental data are shown in Table 2, which contrasts the genetically optimized split points with a very basic strategy: the split point taken as a simple average of the attribute range.

Point No | Attribute No | MIN | MAX  | Medium split point | GA split point
1        | 2            | 40  | 112  | 76                 | 50.3055
2        | 12           | 109 | 268  | 188.5              | 151.2036
3        | 5            | 2   | 55   | 28.5               | 7.6693
4        | 0            | 73  | 119  | 96                 | 103.7397
5        | 11           | 184 | 1018 | 601                | 468.0225
6        | 9            | 118 | 188  | 153                | 137.0857
7        | 17           | 181 | 211  | 196                | 188.0424
8        | 15           | 0   | 41   | 20.5               | 22.7929
9        | 11           | 184 | 1018 | 601                | 266.4064
10       | 16           | 176 | 206  | 191                | 177.5681

Table 2. The results of the GA binarization.
In all experiments, we used a standard GA with the following parameters: population 200, mutation 0.1, crossover 0.7. These values are the result of some experimentation, and they largely coincide with the typical values encountered in the literature. The obtained values of the fitness function (best individual and population average) are shown in Table 3.

Iter. | Min | Max | Average | Standard deviation
0     | 221 | 380 | 315.14  | 29.745
50    | 199 | 374 | 265.79  | 43.162
100   | 194 | 390 | 262.56  | 52.985
150   | 186 | 374 | 258.38  | 54.771
200   | 186 | 375 | 256.94  | 56.519
250   | 179 | 372 | 255.01  | 57.047
300   | 179 | 380 | 252.07  | 57.588
350   | 173 | 393 | 251.72  | 58.35
400   | 156 | 373 | 250.88  | 58.754
450   | 156 | 404 | 249.21  | 58.783
500   | 156 | 367 | 248.76  | 59.12

Table 3. The values of the fitness function (shown are the best individual and the average fitness for the entire population). In comparison, the simple equal-split strategy gives far worse results, as illustrated in Table 4.
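A self-contained sketch of such a standard GA follows; the paper fixes only the population size, mutation and crossover rates, so truncation selection, one-point crossover and gene-redraw mutation are our own illustrative choices:

```python
import random

def conflicts(points, classes, splits):
    """Fitness to minimize: pairs with equal binary codes, different classes."""
    codes = [tuple(0 if p[i] <= s else 1 for i, s in splits) for p in points]
    n = len(points)
    return sum(1 for k in range(n) for l in range(k + 1, n)
               if codes[k] == codes[l] and classes[k] != classes[l])

def ga_binarize(points, classes, n_attrs, n_splits=10,
                pop_size=200, p_mut=0.1, p_cross=0.7, generations=500):
    lo = [min(p[i] for p in points) for i in range(n_attrs)]
    hi = [max(p[i] for p in points) for i in range(n_attrs)]

    def gene():                               # random (attribute, split) pair
        i = random.randrange(n_attrs)
        return (i, random.uniform(lo[i], hi[i]))

    pop = [[gene() for _ in range(n_splits)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ch: conflicts(points, classes, ch))
        parents = pop[:pop_size // 2]         # truncation selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            if random.random() < p_cross:     # one-point crossover
                cut = random.randrange(1, n_splits)
                child = a[:cut] + b[cut:]
            else:
                child = list(a)
            if random.random() < p_mut:       # mutation: redraw one gene
                child[random.randrange(n_splits)] = gene()
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda ch: conflicts(points, classes, ch))
```

Since the sorted parents are carried over unchanged, the best chromosome found so far is never lost between generations.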
Experiment no.     | GA split point | Medium split point
1                  | 169            | 290
2                  | 165            | 287
3                  | 156            | 277
4                  | 162            | 277
5                  | 162            | 277
Average            | 162.8          | 281.6
Standard deviation | 4.764452       | 6.387488
Table 4. Number of conflicts for the GA split point and uniform split point; note a substantial improvement provided by the genetic optimization.
4 The overall design methodology of BDD classifiers and experimental results

The design of the BDD classifiers consists of two main phases:
• Optimal binarization of continuous features
• Minimization of BDDs using one of the existing minimization techniques (these are standard in digital system design, so we will not cover them in this study)
Each class is described by a separate BDD, so for "c" classes we end up with "c" BDDs. The performance of the BDD for class 2 is summarized in Table 5. Each path of the BDD corresponds to a decision rule, described by the sequence of binary variables listed starting from the top of the tree. Using the classifier amounts to traversing the diagram for a given vector of binary features; we count the number of the most activated paths that are also the shortest. The performance of the classification rules on the testing set is described by the number of properly classified cases.
Rule description | Classification rate of the rule
8 variables:
1609 1511 512 2 | 5
9 variables:
1609 11 5 12 17112 | 27
1609 1511 5 17 112 | 0
160915115121711 | 0
10 variables:
16091511 512 17 11 2 | 4
16091511 5 12 1711 2 | 1
16091511 5 121711 2 | 1
16091511 51217 11 2 | 1
160915115 12 17112 | 0
160915 11 5 12 17112 | 0
Table 5. The performance of BDD for the class 2.
To obtain reliable and stable results, as well as to avoid any potential experimental bias, we divided the data set into a training set (60%) and a testing set (40%) and repeated the experiment 5 times (a rotation method). To assess the results of genetic-based BDD classification, we take into consideration the results of classification obtained when exploiting fuzzy context-based clusters [8]. As summarized in Table 6, some general observations can be derived that hold over the series of experiments conducted so far. It becomes apparent that the genetic-based BDD classification performs better on the testing set (the results on the training set are worse). Note, however, that the size of the decision trees based on fuzzy context-based clusters is much larger compared to the ones generated with the proposed method.
Genetically optimized BDDs:
Experiment no.     | No of nodes | Error on Training (%) | Error on Testing (%)
1                  | 67          | 29.13                 | 36.98
2                  | 66          | 28.94                 | 32.54
3                  | 85          | 25.2                  | 35.2
4                  | 85          | 25.59                 | 36.69
5                  | 85          | 25.59                 | 36.09
Average            | 77.6        | 26.89                 | 35.5
Standard deviation | 10.13903    | 1.965719              | 1.789288

Fuzzy context-based clustering:
Average            | 495.2       | 8.5                   | 36.4
Standard deviation | 62.94       | 0.9                   | 1.81

Table 6. Classification results of the genetically optimized BDDs (bottom rows: fuzzy context-based clustering).
The relevance of the particular attributes from the auto data set is shown in Table 7. This table is based on statistics for 5 decision trees. All of the trees were built on the basis of 10 attributes selected by the GA. The columns of this table correspond to the attribute numbers, while its rows denote the frequency of selection of each particular attribute by the GA. It is easy to notice that there are two dominant attributes, i.e., 11 and 5, which appear in the decision trees 8 and 7 times respectively. Obviously, one should not make overly general projections that go far beyond this particular data set.
Table 7. The distribution of the attributes in decision rules, reflecting their relevance.

The resulting rules, sorted with respect to the number of properly classified cases (for the training set), are gathered in Table 8.

Class 2:
Rule no | Number of variables | Classification rate (training) | Classification rate (testing)
2       | 9                   | 45                             | 27
1       | 8                   | 10                             | 5
4       | 9                   | 9                              | 0
3       | 9                   | 6                              | 0
7       | 10                  | 6                              | 4
9       | 10                  | 5                              | 1
10      | 10                  | 3                              | 1
5       | 10                  | 2                              | 0
6       | 10                  | 1                              | 0
8       | 10                  | 1                              | 1
Table 8. The resulting rules for class 2. The compression rate, calculated as the number of rules divided by the number of objects in the training set, is shown in Table 9. It is worth noting that when we consider only 50% of the best rules from each class, the compression rate is less than 3%, while the classification error rises only slightly, from 35.5% to 36.7%. The resulting trees are very small and easy to analyze by a human expert.
Experiment no.     | No of rules | Compression rate (%)
1                  | 20          | 3.94
2                  | 25          | 4.92
3                  | 34          | 6.69
4                  | 33          | 6.50
5                  | 33          | 6.50
Average            | 29          | 5.71
Standard deviation | 6.20        | 1.22
Table 9. Values of the compression rate versus the number of rules (reported are 5 experiments).
5 Conclusions
We have dealt with the genetic, BDD-oriented design of pattern classifiers and showed how BDDs can be used in the granulation (binarization) of the attributes encountered in the problem. This granulation is carried out for all attributes at once through a genetic algorithm. It has been found through the series of experiments that this approach outperforms other methods of designing decision rules (including decision trees based on fuzzy context-based clustering) in terms of a more compact BDD and higher classification accuracy.
Acknowledgment Support from the Natural Science and Engineering Research Council of Canada (NSERC) is gratefully acknowledged. The second author gratefully acknowledges support from the Technical University of Bialystok (grant W/WI/8/02).
References
[1] Akers B., "Binary decision diagrams", IEEE Trans. Comput., C-27(6), pp. 509-516, 1978.
[2] Wegener I., "BDDs - design, analysis, complexity, and applications", Discrete Applied Mathematics, 138, pp. 229-251, 2004.
[3] Bryant R., "Symbolic Boolean manipulation with ordered binary decision diagrams", ACM Comput. Surv., 24, pp. 293-318, 1992.
[4] Jung W.S., Han S.H., and Ha J., "A fast BDD algorithm for large coherent fault trees analysis", Reliability Engineering & System Safety, 85, pp. 369-374, 2004.
[5] Siebert J.P., "Vehicle recognition using rule based methods", Turing Institute Research Memorandum, TIRM-87-017, March 1987.
[6] Ishiura N., Sawada H., and Yajima S., "Minimization of binary decision diagrams based on exchanges of variables", In Proceedings of the International Conference on Computer-Aided Design, pp. 472-475, Santa Clara, CA, November 1991.
[7] Rudell R., "Dynamic variable ordering for ordered binary decision diagrams", In Proceedings of the International Conference on Computer-Aided Design, pp. 42-47, Santa Clara, CA, November 1993.
[8] Pedrycz W., Sosnowski Z.A., "Designing Decision Trees with the Use of Fuzzy Granulation", IEEE Transactions on Systems, Man, and Cybernetics, Part A, 30, pp. 151-159, 2000.
A fuzzy way to evaluate the qualitative attributes in bank lending creditworthiness Gisella Facchinetti  Giovanni Mastroleo Faculty of Economics, University of Modena and Reggio Emilia, Italy facchinetti@unimore.it, mastroleo@unimore.it
Abstract: In this paper we address bank evaluation of clients in lending credit, based on qualitative attributes. Until now, banks have avoided facing this part of lending credit. There are several reasons for this. One is the impossibility of using a statistical approach; the variables are linguistic attributes, not numbers. Another, which we think really serious, is the difficulty of fixing which qualitative attributes are important. Every bank uses personal contact with the client, and the experts do not behave in a uniform way. Here we present a sketch of our work, performed with an Italian bank, in which a fuzzy approach is used. In particular, we have used two different methods: a fuzzy expert system and a fuzzy cluster method.
Keywords: bank creditworthiness, qualitative attribute, fuzzy expert system, fuzzy cluster
1 Introduction
In this paper we address how a bank may evaluate clients in the lending credit procedure based on their qualitative attributes. All banks have always recognized that lending credit is based on two aspects, one quantitative and one qualitative. Historically, only the first has been treated in a systematic way. The literature is rich in research using instruments of statistical and econometric type. Recently, soft computing instruments have brought a new impulse into this area, and several interesting results have been obtained using neural and fuzzy approaches ([3-6]). Every bank has always treated the second aspect, the qualitative one, in a private mode. No research is available about the codification of this problem. The banks avoid codifying this part of lending credit. There are several reasons for this lack. One is the impossibility of using a statistical approach; the variables are linguistic attributes, not numbers. Another, which we think really serious, is the difficulty of fixing which are
the qualitative attributes the experts regard as important. Every bank uses personal contact with the client. All credit officers have their own methods of evaluating the client's qualitative attributes, and therefore no such information is available in the bank's archives. Recently, for several reasons, some Italian banks have decided to address this problem. One of these banks has constructed a questionnaire including several evaluations of the client's qualitative attributes. They have proposed this questionnaire to their officials with regard to the monitoring problem. The replies to this questionnaire contain information on the clients' current position: "Good Position" and "Bad Position". As they use an expert system for the evaluation of the quantitative attributes, they were interested in a similar idea for the qualitative ones. Because of the several joint works we have had with banks on lending credit problems, they contacted us with this new problem. We immediately understood that this new problem is very interesting, even from the point of view of research, as no papers are present in this field. We discussed at length with them the possible instruments we might use to evaluate the questionnaire results. We agreed that a weighted average is not the best method for a final evaluation, inter alia because they have no idea which would be the correct weights to use. We proposed two different possibilities based on fuzzy logic. The motivation is very natural: fuzzy logic is able to "compute with words", as L. A. Zadeh says, and the variables of the problem are words, not numbers. Here we present a sketch of the two ideas, which are a fuzzy expert system and a fuzzy cluster method. The first is the natural evolution of the expert system they know; the second is due to their interest in having a rating method. The two methods produce about the same results. Notice that these are the results of the qualitative information only.

The complete evaluation needs a mix of qualitative and quantitative data, so we are sure that the addition of quantitative information can only improve the final results. The misclassified cases are 15.7%. We have checked these cases and, for 96% of them, they are cases in which the input data are typical of a Good client but the actual situation is a Bad position, and vice versa. We have some clients with the same expert judgement that are in different classes. So we can say that the two methods show a good performance in discriminating the two classes of clients. The second method offers a rating with two clusters, Good and Bad client, but also a rating with four clusters: Good-Good, Good, Bad, Bad-Bad.
2 Bank's evaluation of client's creditworthiness
The phase concerning bank lending risk evaluation is one of the most important, but also the most difficult, in credit activity. One way to estimate client reliability is to assign a value (either a number or a judgement) which represents
the risk level. One of these possible values is client solvency analysis. This comprises both the moral qualities and the means of the client. The credit file has to supply a photograph of the firm, but it is influenced both by the way in which this is done and by the quality of the persons involved in credit lending. The criteria of analysis used by banks are of two types: static-patrimonial and dynamic-profitability criteria. These criteria are based on economic and financial characteristics of the firm requesting credit, but evaluation of the validity of the programs that show an assumed income capacity is fundamental. In this respect, a lot of qualitative aspects must be considered, more precisely the aspects connected with the technical and organizing characteristics of the firm and with the market and logistic opportunities. In fact, these two aspects contribute to determining the future of the firm and may help the bank to know the firm's economic and financial prospects. All these aspects are present in every lending credit request, but this complex of analyses is not developed in the same way everywhere. Every bank has different procedures that reflect the bank's internal situation, such as bank dimension or organizing complexity, and the same procedure may depend on the characteristics of the client, such as his economic activity or his dimension and organization. In any case, the procedure consists of two phases: one quantitative, one qualitative. Mixing the results of the two analyses produces the final evaluation. Much research and many different methods have been proposed to approach the quantitative valuation. These methods are mostly of statistical, econometric and probabilistic type (see [8], [11]). Recently, "soft computing" techniques have entered the credit world, producing interesting results.
3 Qualitative analysis: an Italian case
The qualitative aspects of a firm requesting credit are useful in the corporate case, but have particular relevance for small businesses. In this situation, book values are scarcely reliable, often having fiscal origins. This suggests that it is better to use qualitative-competitive analysis than the traditional budget to decide a firm's creditworthiness. No research or methods have been proposed for qualitative valuation. This is due to the fact that, until now, the banks have avoided codifying this aspect. There are various reasons for this. One is the impossibility of quantifying the results; the variables are linguistic attributes, not numbers. Another, which we think really serious, is the difficulty of standardizing a procedure able to evaluate the qualitative attributes. Every bank uses personal contact with the client. All credit officers have their own methods for evaluating the client's qualitative attributes. Therefore, in the end, no such information is available in bank archives.
Since we have had many opportunities to work together with banks on clients' creditworthiness, an Italian bank which decided to study a method for the qualitative evaluation of its clients asked us to begin a joint work with them. The bank had already devised a questionnaire which incorporated several questions about the client's qualitative attributes. The questionnaire was proposed to their officials with regard to the monitoring problem. The replies contain information on the clients' actual position, "Good Position" and "Bad Position". A first analysis showed us that the submitted questionnaire was too wide and that, in several cases, the questions were ill-posed. We then faced the problem of how to evaluate the replies to those questionnaires. The simplest way is the typical approach connected with a sort of weighted average, as in social statistics studies, but the bank did not know what the correct weights would be. As they have an expert system for quantitative evaluations, they were very happy to test a fuzzy approach. We proposed two different possibilities: a fuzzy expert system and a fuzzy cluster method. Here we present a sketch of the two ideas.
4 The two fuzzy methods
The reason for using fuzzy logic as the basic instrument is very natural. Fuzzy logic is able to "compute with words", as Professor Zadeh says, and the variables of the problem are judgements expressed in a linguistic way, not numbers. One of the most fascinating and useful features of fuzzy models is their ability to be both linguistically tractable and mathematically sound. In fact, many investigations ([10], [14]) have established that any sufficiently regular mapping from inputs to outputs can be approximated to any degree of accuracy by fuzzy systems while maintaining linguistic interpretability. Formally speaking, it is possible to demonstrate that: given a normed space (X, ||·||), p ∈ R and f ∈ C(K), with K a compact subset of X, for every ε > 0 there exists a fuzzy system with an input-output relationship F such that ||f − F|| < ε, where ||·|| is the Lebesgue norm of some order p. Fuzzy modelling schemes exist that generate extremely smooth approximations of target functions. The Mamdani systems are certainly the most common and were the first fuzzy models for which the universal approximation property was proved. The essential steps of a fuzzy system design are: (i) identifying the problem and choosing the type of fuzzy system which best suits the requirements of the problem, (ii) definition of the input and output variables, their fuzzy values and their membership functions (fuzzification of input and output), (iii) definition of the control rules and their translation into a fuzzy relation, (iv) treatment of any
input information to choose the fuzzy inference method, (v) translation of the fuzzy output into a crisp value (defuzzification methods), (vi) tuning the fuzzy system for validation of the results ([9], [13]). The problems of fuzzification and of constructing blocks of fuzzy rules are treated in several ways. Here we use interviews with experts on the problem. This method does not use the past history of the problem and permits real contact with the experts, which brings into the study all the experience matured over years of work in the field. The fuzzy cluster approach is a data-mining method, which relies on past data. Here we have input-output data, but we use only the former, as we leave the number of clusters the data produce free. Mathematical tools (see chapter 7) indicate that for these data the optimal number of clusters is two or four. Owing to the bank's request, we show only the case of two clusters. The two methods we present are able to discriminate the Good and the Bad clients.
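Steps (ii)-(v) can be illustrated with a toy single-input Mamdani system. The membership shapes, universes and rule base below are our own illustrative assumptions, not the bank system:

```python
def tri(a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    return lambda x: max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))

# (ii) fuzzification: input "judgement" and output "evaluation" on [0, 1]
low,  high = tri(-0.5, 0.0, 0.6), tri(0.4, 1.0, 1.5)
bad,  good = tri(-0.5, 0.0, 0.6), tri(0.4, 1.0, 1.5)

def mamdani(x, steps=101):
    # (iii)-(iv) two rules, max-min inference:
    #   IF judgement IS low  THEN evaluation IS bad
    #   IF judgement IS high THEN evaluation IS good
    w_bad, w_good = low(x), high(x)
    num = den = 0.0
    for i in range(steps):          # (v) centroid defuzzification
        y = i / (steps - 1)
        mu = max(min(w_bad, bad(y)), min(w_good, good(y)))
        num += mu * y
        den += mu
    return num / den if den else 0.5
```

A low linguistic judgement yields a crisp evaluation below 0.5 and a high one above it, which is the qualitative behaviour expected of the rule base.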
5 The bank questionnaire
The analysis areas deal with several aspects of the firm situation:
• The activity of the firm.
• The ownership structure (independent firm or belonging to a group).
• The characteristics of the markets in which it works.
• Its position in the market.
• The critical factors of sector success/weakness.
• Management evaluation.
• Presence of risk factors in the past or in the present.
• The explicative factors of the actual economic-financial position.
Looking at this list, some remarks can be made. Some evaluations may be considered objective, while others are subjective. For example, the "firm's position in the market" is an objective aspect, as it derives from information that is measurable. "Management evaluation" is a subjective aspect: it deals with a valuation based on reputation, image, notoriety and credibility of owners and management. Owing to the bank's request for privacy, we cannot show the complete questionnaire, which contains more than sixty questions. We show the thirty-five variables of the expert system, which are obtained by an aggregation of the initial inputs, done earlier in the process. This aggregation was decided with the bank itself; it is due to the fact that some variables were considered redundant, already covered by other information. In the following we
present the meaning of the inputs (varxx), the intermediate variables (intxx) and the output, shown in the system picture (Figure 1).

var01 Cohesion level and goal unitariness.
var02 Contracting capacity, experience, flexibility and expertise of the decisor.
var03 Undertaker risk propensity.
var04 Number of release signals in the firm's management.
var05 Resources drag.
var06 Decisor morality and will to meet his obligations.
var07 High or irregular standard of living.
var08 Reluctance in the budget presentation.
var09 Insufficient clearness and entirety in the information supplied.
var10 Insufficient caution in budget valuation.
var11 Distribution activity and client assistance.
var12 Image and quality of the product in the market.
var13 Efficiency and capacity of production and procurement.
var14 Price competitiveness.
var15 Marketing with one or few clients or channels of trade.
var16 Geographic typology of the channel of trade.
var17 Level of the life cycle of the firm's products.
var18 Firm's competitiveness in the market.
var19 Membership and position in a group.
var20 Tenure years of the present shareholder.
var21 Firm manager characteristics.
var22 Years of relations with the bank.
var23 Owners' typology.
var24 Problems connected with the generational transfer.
var25 Stopping of the financial underwriting on behalf of the control group or the owners.
var26 Dependence on one or few suppliers.
var27 Redundancies.
var28 Legal or fiscal controversy.
var29 Difficulty and delay in cashing credits.
var30 Reduction of the guarantee capital.
var31 Delay in payment of hires, taxes, and suppliers.
var32 Past crisis, delay in paying a debt, etc.
var33 Firm in economic difficulties.
var34 Financial reorganization or office closure.
var35 Firm belonging to a group which is in financial reorganization.
int01 Other risk factors.
int02 Marketing Capability.
int03 Productivity Capability.
int04 Clients Capability.
int05 Decisors Characteristics.
int06 Decisors Profile.
int07 Ownership Characteristics.
int08 Historic Characteristics.
int09 Behaviour of Decisors.
int10 Managerial behaviour of Decisors.
int11 Informations on Behaviour.
int12 Aggregated judgement of firm.
int13 Decisors Behaviour.
int14 Risk Factors.
int15 Marketing Strength.
int16 Complex of informations about firm.
int17 Competitivity Position.
int18 Ownership.
int19 Risks.
int20 Risk Situations.
int21 Crisis Situations.
int22 Ownership situation.
int23 Firm Evaluation.
output Total Evaluation.

6 The fuzzy system and results
The complexity of the problem and the high number of variables lead to a lot of intermediate variables, which collect a few variables into groups that have a significant meaning. As can be seen, the system (Figure 1) is built of two macro-groups: all the information on the firm, and the risk factors. The two partial outputs produce the final evaluation. This choice is due to the fact that, for the bank, the risk factors have a greater importance than the firm evaluation. The final rules block takes this bank decision into account. As an example, in Figure 2 we show some details about var01 = Cohesion Level, var02 = Decisor Capacity, var03 = Risk Propensity and the rules block that generates the intermediate variable int06 = Decisor Profile. In Figure 3, the first rules block shows the aggregation int09 = Decisor Behaviour = f(int06, int10), where int06 = Decisor Profile and int10 = Managerial behaviour of Decisors. The second is relative to int11 = Behaviour Information = g(var08, var09, var10), with var08 = Insufficient Caution, var09 = Insufficient Clearness, var10 = Reluctance. The third is relative to int13 = Decisor = h(int09, int11), with int09 = Decisor Behaviour and int11 = Behaviour Information. The last rules block (Figure 4) is relative to the partial final evaluation int23 = Firm Evaluation, which depends on int12 = Aggregated Judgement of Firm and int18 = Ownership. The last figure also shows the fuzzification of the partial output Firm Evaluation: its range is [0,1] and it has eight linguistic attributes.
Fig. 1. Design of the system
Fig. 2. Description of var01, var02, var03 and the relative rule block.
Fig. 3. Rule block of int09, int11 and int13.
Fig. 4. Rule block of int12, int18 and int23 (FirmEvaluation) with layout.

We tested the system using as input values the results obtained from the questionnaires proposed to the bank officials. The bank gave us data on 287 positions, 171 in "Good Position" (G) and 116 in "Bad Position" (B). To decide the best cut-off point, we used the classical method of cumulated frequencies. The results are presented in Table 1, which has four columns: the first contains the valuation intervals; the second contains, for every interval, the cumulated percentages of "Good Position"; the third contains the analogous values for "Bad Position"; and the fourth contains the difference between the bad and the good cumulated percentages.
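The cut-off selection by cumulated frequencies can be sketched as a simple threshold scan (function and variable names are ours; bad clients are assumed to receive low evaluations and good clients high ones):

```python
def best_cutoff(good_scores, bad_scores, steps=100):
    """Scan candidate thresholds t and return the one maximizing
    cum_bad(t) - cum_good(t), where cum_* are the cumulated fractions
    of evaluations <= t."""
    lo = min(good_scores + bad_scores)
    hi = max(good_scores + bad_scores)
    best_t, best_diff = lo, float('-inf')
    for i in range(steps + 1):
        t = lo + (hi - lo) * i / steps
        cum_good = sum(s <= t for s in good_scores) / len(good_scores)
        cum_bad = sum(s <= t for s in bad_scores) / len(bad_scores)
        if cum_bad - cum_good > best_diff:
            best_diff, best_t = cum_bad - cum_good, t
    return best_t, best_diff
```

The maximized difference is 1.0 exactly when some threshold perfectly separates the two groups.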
≥ 0 for Re p_k > 0, k = 1, ..., n    (7)

In this case, the scattering matrix obeys the equation

I − S*(p_1, ..., p_n) S(p_1, ..., p_n) ≥ 0 for Re p_k > 0, k = 1, ..., n    (8)
In other words, passivity leads to stability in a Lyapunov-type sense. This aspect is of greatest interest in the context of the design of nD systems, since structural passivity guarantees stability of the resulting system. Furthermore, lossless systems form a very important subclass of passive systems. In such special cases the scattering matrix additionally obeys the property

S*(jω_1, ..., jω_n) S(jω_1, ..., jω_n) = I,  ω_k ∈ R, k = 1, ..., n.    (9)
In the case of real networks this property is equivalent to

S^T(−p_1, ..., −p_n) S(p_1, ..., p_n) = I.    (10)
For simplicity of notation, let us restrict our attention to real circuits in the following, although all results can also be generalized to the complex case.
Stability in particular depends on the denominator polynomial of the scattering matrix S(p_1, ..., p_n). In contrast to 1D systems theory, different classes of Hurwitz polynomials have to be distinguished in the nD case [5, 11, 15, 16]. The class appearing as denominator of scattering matrices of passive circuits is called "scattering Hurwitz polynomial" and obeys the following properties. An n-variable real polynomial g is called scattering Hurwitz if

g(p_1, ..., p_n) ≠ 0 for Re p_k > 0, k = 1, ..., n,

and g(p_1, ..., p_n) and g(−p_1, ..., −p_n) are relatively prime.

Following the above discussion, there might be the impression that the concept of nD circuits is only of academic importance, since such systems cannot be realised in a physical sense by using classical electrical building elements. However, such a conclusion is not true: nD circuits serve as reference networks for the design of stable and robust nD discrete systems by using the bilinear transform. Such a procedure is well known from 1D filter design and can be generalized to the nD case. To be more specific, let g_ij(p_1, ..., p_n) be an element of the scattering matrix of a passive circuit; by using the bilinear transform
p_k = (z_k − 1) / (z_k + 1),   k = 1, ..., n,    (11)

we obtain

H(z_1, ..., z_n) = S_ij((z_1 − 1)/(z_1 + 1), ..., (z_n − 1)/(z_n + 1)),
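This design route can be sketched numerically; the scattering element S below is an illustrative stand-in for a passive 2D circuit, not a network from the paper. The key property is that the bilinear transform maps the unit circles |z_k| = 1 onto the imaginary axes Re p_k = 0, so a stable continuous prototype yields a stable discrete system:

```python
import numpy as np

def S(p1, p2):
    # illustrative scattering element of a hypothetical passive 2D circuit
    return 1.0 / (p1 + p2 + 1.0)

def H(z1, z2):
    # discrete transfer function obtained via p_k = (z_k - 1)/(z_k + 1), Eq. (11)
    return S((z1 - 1.0) / (z1 + 1.0), (z2 - 1.0) / (z2 + 1.0))

# a point on the unit circle |z| = 1 maps to a purely imaginary p
z = np.exp(1j * 0.7)
p = (z - 1.0) / (z + 1.0)
```

Here `p` equals j·tan(0.35), i.e. Re p = 0, and H(1, 1) = S(0, 0), since z = 1 corresponds to p = 0.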
Here on pass k, x_k(t) is the n×1 state vector, y_k(t) is the m×1 pass profile vector, and u_k(t) is the l×1 vector of control inputs. To complete the process description, it is necessary to specify the 'initial conditions' - termed the boundary conditions here, i.e. the state initial vector on each pass and the initial pass profile. Here no loss of generality arises from
assuming x_{k+1}(0) = d_{k+1}, k ≥ 0, and y_0(t) = f(t), where d_{k+1} is an n×1 vector with known constant entries and f(t) is an m×1 vector whose entries are known functions of t. Such a model is clearly a continuous-discrete linear 2D system, but it is also possible to define a discrete linear repetitive process closely resembling the standard discrete 2D system. Following Rogers and Owens [29], the state-space model of a discrete linear repetitive process has the following form over 0 ≤ p ≤ α − 1, k ≥ 0:

x_{k+1}(p + 1) = A x_{k+1}(p) + B u_{k+1}(p) + B_0 y_k(p)
y_{k+1}(p) = C x_{k+1}(p) + D u_{k+1}(p) + D_0 y_k(p)    (14)

where on pass k, x_k(p) is the n×1 state vector, y_k(p) is the m×1 pass profile vector, and u_k(p) is the l×1 vector of control inputs. To complete the process description, it is necessary to specify the 'initial conditions' - termed the boundary conditions here, i.e. the state initial vector on each pass and the initial pass profile. Here no loss of generality arises from assuming x_{k+1}(0) = d_{k+1}, k ≥ 0, and y_0(p) = f(p), where d_{k+1} is an n×1 vector with known constant entries and f(p) is an m×1 vector whose entries are known functions of p. Comparing this process with a discrete linear 2D system of the Roesser form, we easily see that (14) can be transformed to the form of (1), see e.g. [29], with the correspondence
x_{k+1}(p) → x^h(i, j),   y_k(p) → x^v(i, j);   A → A_11,   B_0 → A_12,   C → A_21,   D_0 → A_22.
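The pass-to-pass updating in (14) can be illustrated with a small simulation; all matrices, dimensions and boundary data below are illustrative assumptions (n = 2 states, scalar pass profile, zero input), not values from the paper:

```python
import numpy as np

A  = np.array([[0.5, 0.1],
               [0.0, 0.4]])   # stable along the pass
b0 = np.array([0.2, 0.1])     # B0: influence of the previous pass profile
c  = np.array([1.0, 0.0])     # C
d0 = 0.3                      # D0

alpha, passes = 10, 5
y_prev = np.ones(alpha)       # initial pass profile y_0(p) = f(p)
for k in range(passes):
    x = np.zeros(2)           # boundary condition x_{k+1}(0) = d_{k+1} = 0
    y = np.zeros(alpha)
    for p in range(alpha):
        y[p] = c @ x + d0 * y_prev[p]   # pass-profile update of (14), u = 0
        x = A @ x + b0 * y_prev[p]      # state update of (14), u = 0
    y_prev = y
```

With these (assumed) stable matrices, each new pass profile is a contraction of the previous one, so the profiles decay from pass to pass.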
The discrete repetitive process dynamics can be highlighted by the following figure, which describes the role of the two-dimensional information updating in the system.
Fig. 6: State dynamics - the pass profile y_k(0), ..., y_k(α − 1) computed on pass k is propagated to the subsequent passes k + 1, k + 2 along the pass direction (0 ≤ p ≤ α − 1).
7 Stability and Stabilisation

Stability analysis and appropriate stabilising controller design are of great practical as well as theoretical interest for all system classes. Now we concisely revisit the basic notions and methods. Following e.g. Kaczorek [32], we recall the asymptotic stability definition.
Definition 1. The 2D linear discrete-time system is said to be asymptotically stable if

lim_{i,j→∞} x(i, j) = 0    (15)

under zero input u(i, j) = 0 and boundary conditions such that

sup_j ||x^h(0, j)|| < ∞,   sup_i ||x^v(i, 0)|| < ∞

... there exist matrices P > 0 and Q > 0 satisfying the following LMI:

A_1^T P A_1 ...

... Y > 0, Z and N such that the following LMI holds:

Y + Z       0       ...
0           Z       ...
Y A_1^T     ...     ...
N^T B_2^T   ...     Y

where
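Definition 1 can be illustrated numerically with a scalar Roesser-type recursion; the coefficients below are illustrative assumptions chosen with spectral radius below one, so the state decays toward zero deep inside the grid for bounded boundary data:

```python
import numpy as np

# Scalar Roesser-model recursion with illustrative stable coefficients
A11, A12 = 0.4, 0.1
A21, A22 = 0.2, 0.3

I, J = 40, 40
xh = np.zeros((I + 1, J))     # horizontal state x^h(i, j)
xv = np.zeros((I, J + 1))     # vertical state x^v(i, j)
xh[0, :] = 1.0                # bounded boundary conditions (Definition 1)
xv[:, 0] = 1.0
for i in range(I):
    for j in range(J):
        xh[i + 1, j] = A11 * xh[i, j] + A12 * xv[i, j]
        xv[i, j + 1] = A21 * xh[i, j] + A22 * xv[i, j]
```

Since both row sums of the (assumed) coefficient matrix are 0.5, the state at cell (i, j) is bounded by 0.5 raised to min(i, j), i.e. it vanishes far from the boundary.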
If the program P is of the form x_r := τ (r = 1, ..., h), then the matrix P = [p_ij] we define in the following way:

p_ij = 1   iff   [x_r := τ](σ_i) = σ_j,
p_ij = 0   otherwise.

[PA] If the program P is of the form x_r := random (r = 1, ..., h), then the matrix P is as follows: P = [p_ij], where p is the probability distribution corresponding to random generation of elements from the universe A.

[F] To formulate the other definitions briefly, we first define the matrix of the program of the form while ¬γ do x := x (we will denote this program by [γ?] and the matrix corresponding to it by I[γ?]). Thus:

(I[γ?])_ij = 1   iff   i = j and σ_i ⊨ γ,
(I[γ?])_ij = 0   otherwise.
[C] If P is of the form begin P_1; P_2 end and P_1, P_2 are the matrices corresponding to the subprograms P_1 and P_2, then P is defined by: P = P_1 ∘ P_2.

[DB] If the program P is a branching of the form if γ then P_1 else P_2, then the matrix for it is the following: P = I[γ?] ∘ P_1 + I[¬γ?] ∘ P_2.

[NB] For the probabilistic branching of the form either_p P_1 or P_2, we define the matrix P in the following way: P = p·P_1 + (1 − p)·P_2.

[W] If P is a program of the form while γ do P_1, then we shall use the following equation:

(I − I[γ?] ∘ P_1) ∘ P = I[¬γ?]

motivated by the equivalence of the programs while γ do P_1 and if γ then begin P_1; while γ do P_1 end. If the determinant of the matrix (I − I[γ?] ∘ P_1) is different from 0, then the matrix P can be determined immediately (this is the case of the example considered in Section 2); otherwise, we need a more complicated technique described in [1, 2, 4, 8].
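The matrix semantics above can be sketched numerically; the two-state universe, the distribution of x := random and the predicate γ below are illustrative assumptions, not the example of Section 2:

```python
import numpy as np

# Two-state universe {s0, s1}; the predicate gamma holds only in s1 (assumption).
I_gamma = np.diag([0.0, 1.0])        # I[gamma?]
I_not_gamma = np.diag([1.0, 0.0])    # I[~gamma?]

# [PA] loop body P1: x := random, drawing s0 or s1 with probability 1/2 each
P1 = np.array([[0.5, 0.5],
               [0.5, 0.5]])

# [W] while gamma do P1: solve (I - I[gamma?] o P1) o P = I[~gamma?]
M = np.eye(2) - I_gamma @ P1
assert abs(np.linalg.det(M)) > 1e-12  # determinant different from 0 -> direct solution
P = np.linalg.solve(M, I_not_gamma)
```

For this toy loop, P maps both states to s0 with probability 1: starting in s0 the loop exits immediately, and starting in s1 it terminates in s0 almost surely.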
7 Appendix 2 (Results of the experiments)
The algorithm P considered in Section 2 has been implemented in Pascal and run many (33000) times. Our analysis was restricted to the case of the average number of steps (for each computation the number of steps was established and the average value was determined). This "experimental" average number of steps is equal to 8.7492. The corresponding "theoretical" average number of steps, described by the expression quoted in Section 2, is equal to 8.7475. For our "academic" example of Section 2, the difference between the experimental and theoretical results does not seem essential. However, for models of real situations, each noticeable difference should be analysed. Those differences can be caused by:

• errors of the numerical computations of the "theoretical" values,
• low quality of the generators of pseudorandom objects in the implemented version of the algorithm.
The "numerical" errors, in the case of algorithms of linear algebra, are rather small; the observed errors are rather caused by the low quality of the used generators of pseudorandom objects. Therefore, in cases where the errors seem to be too great, we suggest improving (or changing) the pseudorandom generators.
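The kind of experiment described in this appendix can be sketched as follows; the loop below is a hypothetical stand-in for the algorithm of Section 2, chosen so that its theoretical average number of steps is known exactly (geometric distribution with success probability 1/4, mean 4):

```python
import random

# Hypothetical probabilistic algorithm: while x != 0 do x := random from {0,1,2,3}
def run():
    steps, x = 0, 1
    while x != 0:
        x = random.randrange(4)   # pseudorandom choice from the universe
        steps += 1
    return steps

random.seed(1)
trials = 33000                    # same number of realizations as in the text
avg = sum(run() for _ in range(trials)) / trials
# the "experimental" average should be close to the "theoretical" value 4
```

The gap between `avg` and 4 plays the role of the experimental-vs-theoretical difference discussed above; a poor pseudorandom generator would inflate it.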
Fuzzy Parametric Integral Equations System in modelling of polygonal potential boundary problems described by the Laplace equation

Eugeniusz Zieniuk, Andrzej Kuzelewski

University of Bialystok, Faculty of Mathematics and Physics, Institute of Computer Science, Sosnowa 64, 15-887 Bialystok, Poland
e-mail: ezieniuk@ii.uwb.edu.pl, akuzel@ii.uwb.edu.pl
Abstract: The following paper presents an application of the fuzzy Parametric Integral Equations System (PIES) to solving potential boundary problems in polygonal domains with uncertainly defined boundary geometry and boundary conditions. The proposed method is based on the PIES and ordered fuzzy numbers. The boundary geometry is created using a small number of control points and modelled by parametric linear functions. Neither boundary nor domain discretization is required by the algorithm.

Keywords: Parametric Integral Equations System, interval arithmetic, ordered fuzzy numbers, boundary problems
1 Introduction
The traditional methods of modelling boundary problems do not consider the problem of measurement errors (such as an instrumental error, an error of a method, etc.). In effect, the results obtained from computer simulation can deviate significantly from the real values. In general, the measured values can be treated as numbers belonging to a certain interval. Therefore, the simulation process requires particular methods of modelling boundary problems, which should take the interval nature of the measurements into account. One of the most widely used methods for solving boundary problems is the Boundary Element Method (BEM) [1]. The Fuzzy Boundary Element Method (FBEM) [2] is an extension of the BEM considering the uncertain nature of measurements. However, the FBEM inherits from the BEM a few significant disadvantages (such as discontinuity at the points where segments join, a large number of nodes required to define the boundary geometry precisely, and the process of boundary discretization).
The fuzzy Parametric Integral Equations System (PIES) [3] is an alternative method, free from the above-mentioned disadvantages. The fuzzy PIES was elaborated as an extension of the PIES [4] in order to effectively solve not sharply defined boundary problems. The algorithm allows one to define both the boundary geometry and the boundary conditions approximately, using ordered fuzzy numbers [5]. The fuzzy PIES was created as a result of modifying the mathematical theory describing the traditional PIES with the ordered fuzzy numbers theory [5] and the Kaucher complete interval arithmetic [6]. Earlier experiments on applying the fuzzy PIES show that it inherits the advantages of the PIES, such as effectiveness, high accuracy of results and ease of modelling boundary problems. An application of the fuzzy PIES to solving potential problems described by the Laplace equation in polygonal domains with uncertainly defined boundary geometry is presented in [7]. The aim of this paper is to propose an application of the fuzzy PIES (formulated for the Laplace equation) to solving boundary problems in polygonal domains with both the boundary conditions and the boundary geometry uncertainly defined. The numerical algorithm and the solution of an example problem are presented.
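A minimal sketch of the directed-interval (Kaucher-style) operations underlying such uncertainly defined coefficients; the class, its inner subtraction rule and the sample interval are illustrative assumptions, not the paper's implementation:

```python
# Directed (Kaucher-style) intervals: in Kaucher complete interval arithmetic
# the inner subtraction is the dual of addition, so a - a collapses to [0, 0]
# (unlike classical interval arithmetic, where a - a widens).
class DInterval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi   # lo may exceed hi for improper intervals
    def __add__(self, other):
        return DInterval(self.lo + other.lo, self.hi + other.hi)
    def __sub__(self, other):       # inner (Kaucher) subtraction
        return DInterval(self.lo - other.lo, self.hi - other.hi)
    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

a = DInterval(1.0, 2.0)
d = a - a                           # degenerate interval [0, 0]
```

This non-widening behaviour is one reason complete interval arithmetic suits iterative solvers better than the classical variant.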
2 Definition of boundary geometry and boundary conditions

Uncertainly defined polygonal domains in boundary problems can be approximately described using parametric linear functions. The boundary geometry Ω described using the above-mentioned functions is presented in Fig. 1a.

Fig. 1. Definition of: a) the domain Ω using corner points (marked together with the ranges of changing corner point coordinates), b) boundary conditions given as a region of uncertainty a = [a⁻, a⁺], t ∈ {u, p}, where u denotes Dirichlet and p Neumann boundary conditions.
The boundary functions are approximated using global base functions [8] (Chebyshev polynomials). The coefficients of these functions are interval values reflecting the uncertain nature of the boundary conditions (Fig. 1b). Each function Γ_k(s), k = 1, 2, ..., n (Fig. 1a) can be described by the following function: Γ_k(s) = a_k s + b_k,
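The linear boundary functions Γ_k(s) = a_k s + b_k can be built directly from pairs of consecutive corner points; the helper `segment` and the corner coordinates below are illustrative, not taken from the paper:

```python
import numpy as np

# Gamma_k(s) = a_k * s + b_k for s in [0, 1], from two corner points:
# a_k is the edge vector, b_k is the starting corner.
def segment(P0, P1):
    P0, P1 = np.asarray(P0, float), np.asarray(P1, float)
    a, b = P1 - P0, P0
    return lambda s: a * s + b

g = segment((0.0, 0.0), (2.0, 1.0))   # one edge of a hypothetical polygon
```

Evaluating `g(0.0)` and `g(1.0)` recovers the two corner points, and intermediate values of s trace the edge between them.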
... μ(RCP_12, ECP_12, NCP_12, RCP_22, ECP_22, NCP_22)(O_NCP)) }    (6)

The collection of situations forms the final fuzzy decision-making situation vector O:

O = { μ_ORCP(O_RCP), μ_OECP(O_ECP), μ_ONCP(O_NCP) }    (7)
The Max operation is used for the defuzzification process to give the final decision-making situation O_j:

O_j = Max{ μ_ORCP, μ_OECP, μ_ONCP }    (8)
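The Max defuzzification of (8) reduces to picking the diagnosis class with the largest membership grade; the grades below are illustrative, not clinical values:

```python
# Illustrative membership grades for the three diagnosis classes of (8)
memberships = {"O_RCP": 0.2, "O_ECP": 0.7, "O_NCP": 0.4}

# Max operation: the final decision is the class with maximal membership
diagnosis = max(memberships, key=memberships.get)
```

With these grades the decision is the endo-cochlear class, since 0.7 dominates the other two memberships.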
5 Discussion and Conclusion

In this paper, a promising soft-computing (implying NN and fuzzy logic) image-processing approach to biomedical signal analysis and interpretation, and particularly to BAEP signals, is suggested in order to take advantage of features which are unreachable from the one-dimensional signal (time-dependent waveform). Based on a hybrid scheme, a multiple-model approach, the aim of the suggested approach is to develop an efficient tool for a reliable CAMD. For this purpose, first a processing view of biomedical signals is developed. It consists of a signal-to-image conversion, which opens a large opportunity to benefit from more information than when the data are seen and processed as signals. Second, a multiple-model approach to medical diagnosis is developed from the classification of BAEP signals converted into images. Indeed, this multiple-model approach is mainly based on an NN classification and a fuzzy decision-making stage. Moreover, the efficiency of this classification emerges from the two classification ways: several NNs for the local sub-images, resulting in the local indicators, and one NN for the global image, resulting in the global indicator. The results obtained from the NN classification represent the first step of results, which will be significantly enhanced by the fuzzy decision-making exploiting the expert (human) knowledge. The fuzzy decision-making system based on Mamdani's fuzzy inference must be able to decide on the appropriate diagnosis among Retro-Cochlear Patient (O_RCP), Endo-Cochlear Patient (O_ECP), and Normal-Cochlear Patient (O_NCP). Thus, once this fuzzy part is implemented, the multiple-model approach built of NN and fuzzy decision-making is expected to be an efficient approach for a reliable CAMD. With regard to other approaches [1], [2], [3], the suggested BAEP signal analysis and interpretation approach for a reliable CAMD exploits the two main advantages of its signal-to-image conversion and multiple-model approach [15].

An interesting alternative for future work could be, on the one hand, the investigation of other neural networks for classification, such as fuzzy neural networks or radial basis function networks [10], [16], and on the other hand the generalization of the suggested approach to a larger field of applications, such as fault detection and diagnosis in industrial plants [17].
Acknowledgments

The authors would like to thank Prof. Kurosh Madani for his scientific supervision of the present work, his help and useful discussions.
References

[1] Wolf A., Hall Barbosa C., Monteiro E. C., and Vellasco M., 'Multiple MLP Neural Networks Applied on the Determination of Segment Limits in ECG Signals', 7th International Work-Conf. on Artificial and Natural NN, Proc. Part II, Mao, Menorca, Spain, June 2003, LNCS 2687, Springer-Verlag Berlin Heidelberg, pp. 607-614, 2003.
[2] Piater J. H., Stuchlik F., von Specht H., and Mühler R., 'Fuzzy Sets for Feature Identification in Biomedical Signals with Self-Assessment of Reliability: An Adaptable Algorithm Modeling Human Procedure in BAEP Analysis', Computers and Biomedical Research, 28, pp. 335-353, Academic Press, 1995.
[3] Vannier E., Adam O., and Motsch J.-F., 'Objective detection of brainstem auditory evoked potentials with a priori information from higher presentation levels', Artificial Intelligence in Medicine, 25, pp. 283-301, 2002.
[4] Widrow B. and Lehr M. A., '30 years of adaptive neural networks: perceptron, madaline, and backpropagation', Proc. of the IEEE, Vol. 78, pp. 1415-1441, 1990.
[5] Bazoon M., Stacey D. A., and Cui C., 'A hierarchical artificial neural network system for the classification of cervical cells', IEEE Int. Conf. on NN, Orlando, July 1994.
[6] Goonatilake S. and Khebbal S., 'Intelligent Hybrid Systems', John Wiley & Sons, 1995.
[7] Jordan M. I. and Xu L., 'Convergence results for the EM approach to mixtures of experts architectures', Neural Networks, Vol. 8, No. 9, pp. 1409-1431, 1995.
[8] Haykin S., 'Neural Networks: A Comprehensive Foundation', International Edition, Second Edition, Prentice-Hall, 1999.
[9] Zhang G. P., 'Neural networks for classification: a survey', IEEE Trans. on Systems, Man, and Cybernetics - Part C: Applic. and Reviews, Vol. 30, No. 4, pp. 451-462, 2000.
[10] Azouaoui O. and Chohra A., 'Soft computing based pattern classifiers for the obstacle avoidance behavior of Intelligent Autonomous Vehicles (IAV)', Int. J. of Applied Intelligence, Kluwer Academic Publishers, Vol. 16, No. 3, pp. 249-271, 2002.
[11] Motsch J.-F., 'La dynamique temporelle du tronc cerebral: recueil, extraction, et analyse optimale des potentiels evoques auditifs du tronc cerebral', PhD Thesis, Paris-XII University, 1987.
[12] Gonzalez R. C. and Woods R. E., 'Digital Image Processing', Prentice-Hall, 2002.
[13] Egmont-Petersen M., De Ridder D., and Handels H., 'Image processing with neural networks - a review', Pattern Recognition, 35, pp. 2279-2301, 2002.
[14] Turban E. and Aronson J. E., 'Decision Support Systems and Intelligent Systems', Int. Edition, Sixth Edition, Prentice-Hall, 2001.
[15] Murray-Smith R. and Johansen T. A., 'Multiple Model Approaches to Modelling and Control', Taylor & Francis Publishers, 1997.
[16] Madani K., De Tremiolles G., and Tannhof P., 'Image Processing Using RBF like Neural Networks: A ZISC-036 Based Fully Parallel Implementation Solving Real World and Real Complexity Industrial Problems', Int. J. of Applied Intelligence, Kluwer Academic Publishers, Vol. 18, pp. 195-213, 2003.
[17] De Tremiolles G. I., 'Contribution a l'etude theorique des modeles neuromimetiques et a leur validation experimentale: mise en oeuvre d'applications industrielles', These, Universite Paris XII, 05 Mars 1998.
The prediction of behaviours of chaotic dynamical systems in 3D state space

M. Pankiewicz, R. Mosdorf

The University of Finance and Management in Bialystok, Faculty of Engineering, Grunwaldzka Street 1, Elk, Poland; e-mail: mosdorf@wsfiz.edu.pl

Abstract: In the paper a new three-dimensional visualization technique for the results of methods of prediction of chaotic time series is analyzed. The influence of the graphical presentation of attractors on the quality of forecasting results has been tested. The following methods of prediction of the behaviours of chaotic dynamical systems have been considered: the method of analogs, the center-of-mass prediction method and the local linear prediction method. The forecasting quality has been evaluated using the error function and the correlation coefficient. It is shown that 3D visualization of the attractor is a necessary condition for obtaining proper forecasting results with deterministic chaos methods.
Keywords: deterministic chaos, forecasting, method of analogs, center-of-mass prediction, local linear prediction method

1 Introduction
Forecasting the behaviour of time series is important in many different branches of science [1]. The behaviours of many real systems are chaotic. The prediction of the behaviour of such systems is difficult because of the aperiodic character of their changes in time. In such systems the changes in time are rather similar to noise [2, 11]. The creation of various techniques of computer visualization of data is almost as old as the idea of programming itself [12]. During the last years the development of program visualization systems has been discussed in [12, 13]. It has been shown that data visualization is an important step in understanding many physical processes, including chaotic processes. In this article selected forecasting techniques for chaotic data have been tested: the method of analogs [4, 6], the center-of-mass prediction method (COM) [5, 6] and the local linear prediction method [6]. The effectiveness of these methods has been
analyzed. The data for analysis have been selected from various fields, such as physics and economy. A low-dimensional physical system and selected indexes of the Polish capital market have been analyzed. All the analyzed systems are chaotic, but the characters of the changes of the time series values differ in the particular systems. The considered physical system creates in the 3D phase space an attractor whose trajectories are located in a certain pipe of irregular shape. In the case of the data from the capital market such a pipe does not exist. The correlation dimension of the attractors created from the capital market data is greater than the correlation dimension of the attractors from the physical system data.
2 The roles of analysis of behaviors of chaotic systems

Generally the analysis of the behavior of chaotic systems is based on the analysis of the trajectories of the system in the phase space. The trajectories of a chaotic system in the phase space do not form any simple geometrical objects such as a circle or a torus, but form objects called strange attractors, with a structure resembling a fractal. The analysis starts with the attractor reconstruction. This reconstruction, in a certain embedding dimension, has been carried out using the stroboscope coordination. In this method the subsequent coordinates of the attractor points are calculated basing on subsequent samples between which the distance is equal to the time delay τ. This time delay is a multiple of the time between the samples. For the measured data in the form of a time series:

x_1, x_2, ..., x_N    (1)

the subsequent coordinates of the points y_j of the attractor are calculated as follows:

y_j = {x_j, x_{j+τ}, ..., x_{j+(D−1)τ}}    (2)

where y_j is the subsequent point of the attractor, D is the dimension of the vector y_j, and τ is the time delay. The image of the attractor in D-dimensional space depends on the time delay τ. When the time delay is too small, the attractor gets flattened, and this makes further analysis of its structure impossible. When the time delay τ is too large, the attractor becomes more sophisticated. Therefore the selection of the time delay value is of great significance in the analysis of the attractor properties. For that purpose the autocorrelation function is calculated. The autocorrelation function is the
normalized measure of the linear correlation between samples of the time series. It is defined as follows [2]:

C(τ) = (1/N) Σ_{i=1}^{N−τ} (x_i − x̄)(x_{i+τ} − x̄)    (3)

where N is the number of samples and x_i is the value of the i-th sample. For chaotic data the autocorrelation function rapidly decreases as τ increases. The proper value of the time delay τ for the attractor reconstruction is determined by the condition [2]:

C(τ) ≈ 0.5 C(0)    (4)

In forecasting the behaviour of chaotic time series, the following relation between the last point of the attractor (y_j) and the predicted point (y_{j+1}) of the attractor is searched for:

y_{j+1} = f(y_j)    (5)

The function f depends on the shape of the attractor, which is a function of the time delay. Very often condition (4) does not give the proper value of the time delay. This happens when the time series contains a lot of noise. In this case the final verification of the proper attractor reconstruction may be done only by using 3D visualization techniques. In our work we use the OpenGL technique for the visualization of the 3D attractor.
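The reconstruction (1)-(2) and the time-delay condition (4) can be sketched as follows; the test signal is an illustrative noiseless sine, and `pick_tau` assumes the mean-removed autocorrelation:

```python
import numpy as np

def pick_tau(x, max_tau=100):
    """Smallest tau with C(tau) <= 0.5 * C(0), cf. condition (4);
    C is the mean-removed autocorrelation (an assumption)."""
    x = x - x.mean()
    C0 = np.dot(x, x) / len(x)
    for tau in range(1, max_tau):
        if np.dot(x[:-tau], x[tau:]) / len(x) <= 0.5 * C0:
            return tau
    return max_tau

def delay_embed(x, D, tau):
    """Attractor points y_j = (x_j, x_{j+tau}, ..., x_{j+(D-1)tau}), cf. Eq. (2)."""
    N = len(x) - (D - 1) * tau
    return np.column_stack([x[i * tau: i * tau + N] for i in range(D)])

x = np.sin(0.1 * np.arange(2000))   # illustrative test signal
tau = pick_tau(x)
Y = delay_embed(x, D=3, tau=tau)    # reconstructed 3D trajectory
```

For this sine the autocorrelation falls to half its initial value after roughly a sixth of the period, and the rows of `Y` trace a closed curve in the 3D phase space.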
3 Deterministic chaos prediction methods and error estimation

The method of analogs was proposed by E. Lorenz in 1967 [4, 6]. In this method the forecasted points are chosen from the previous parts of the time series in such a way that the expected point y_i lies on a trajectory which is close to the trajectory containing the last point of the attractor y_j. In this method the forecasted point is calculated according to the following formula:

y_{j+1} = y_{i+1}    (6)

where y_{j+1} is the forecasted point, which belongs to the attractor, and y_{i+1} is the next point on the chosen trajectory.
In the center-of-mass prediction method (COM) the forecasted point is calculated based on the following formula:

y_{j+1} = (1/n) Σ_{i=1}^{n} y_{i+1}    (7)

where y_i are the n neighboring points of the attractor lying on trajectories passing close to the last point of the attractor y_j, i is the index of a chosen point from the neighboring area, i+1 denotes the next point on the chosen trajectory on which the point i lies, and y_{j+1} is the forecasted point, which does not belong to the attractor.

In the local linear prediction method the extension of the attractor trajectory is calculated according to the formula:

y_{j+1} = a·y_j + b    (8)

where a and b are matrices and y_{j+1} is the forecasted point, which does not belong to the attractor. The elements of the matrices a and b are calculated according to the formula:

(a, b) = arg min_{a,b} Σ_{i=1}^{n} || y_{i+1} − a·y_i − b ||²    (9)

where y_i are the n neighboring points of the attractor on trajectories passing close to the last point of the attractor y_j, i is the index of a chosen point from the neighboring area, and i+1 denotes the next point on the chosen trajectory on which the point i lies.

In all the methods mentioned above the quality of forecasting depends on the number of considered neighboring points. The forecasting quality has been evaluated by the error function (E) and the correlation coefficient (CC). The error function is defined as follows [9]:

E = σ_pred / σ    (10)

where σ_pred = ⟨(x_pred(i) − x_obs(i))²⟩^{1/2} is the root-mean-square prediction error and σ is the standard deviation of the observed series. The correlation coefficient CC is the normalized covariance of the observed and the predicted series:

CC = ⟨(x_obs − ⟨x_obs⟩)(x_pred − ⟨x_pred⟩)⟩ / (σ(x_obs) · σ(x_pred))    (11)

When CC > 0, large values in the first series meet large values in the other series; when CC < 0, large values in the first series meet low values in the other series. When CC is close to zero, the time series x_obs and x_pred are not correlated. For the estimation of the quality of forecasting a special computer program for visualization has been prepared [10]. The program uses the OpenGL standard. The attractors are presented on a 3D scene. The knowledge of the system's attractor gives useful information that is necessary to predict trends in the system. The application of methods based on chaos theory requires a proper attractor reconstruction. In Fig. 1 the main window of the program is presented. The sphere indicates the area from which the points used in the prediction procedure have been taken. The bold black line represents the predicted trajectory. The difference between the two forecasts presented in Fig. 1 lies only in the value of the time delay: in Fig. 1a the time delay is equal to 19 samples and in Fig. 1b it is equal to 20 samples. The analysis of the 3D reconstruction of the attractor shows that in the forecast presented in Fig. 1a the predicted trajectory passes parallel to another trajectory, while the predicted trajectory in Fig. 1b passes across the neighbouring trajectory. Therefore the prediction shown in Fig. 1b must be rejected as incorrect. Finally, the 3D visualization shows that even small changes of the entry parameters can cause large differences between the forecast results.
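The two quality measures can be sketched as follows, assuming E is the RMS prediction error normalized by the standard deviation of the observed series and CC is the Pearson correlation coefficient; the toy forecast is illustrative:

```python
import numpy as np

def error_function(x_obs, x_pred):
    # E = sigma_pred / sigma: RMS prediction error over the standard
    # deviation of the observed series (E close to 0 -> good forecast)
    sigma_pred = np.sqrt(np.mean((x_pred - x_obs) ** 2))
    return sigma_pred / np.std(x_obs)

def correlation_coefficient(x_obs, x_pred):
    # CC: Pearson correlation of the observed and the predicted series
    return np.corrcoef(x_obs, x_pred)[0, 1]

x_obs = np.sin(0.1 * np.arange(20))
x_pred = x_obs + 0.01               # a nearly perfect toy forecast
E = error_function(x_obs, x_pred)
CC = correlation_coefficient(x_obs, x_pred)
```

For this nearly perfect forecast E is close to 0 and CC is close to 1, the pattern reported for the physical system in Table 2 below.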
Figure 1. The main window of the program for the 3D presentation of forecasting chaotic systems: a) τ = 19, b) τ = 20. The sphere shows the area from which the neighboring points are taken. Elements of animation have been introduced in the application; the animation allows us to identify the shape of the attractor effectively.
4 Testing the quality of forecasting in physical and economic systems

The chaotic nature of the time series has been measured using the correlation dimension and the largest Lyapunov exponent [1, 14]. The methods of prediction were tested in the three-dimensional phase space. Twenty predicted values of each time series were calculated and compared with the original data. The quality of prediction was evaluated using the function E (Eq. 10) and the correlation coefficient (Eq. 11). These prediction methods were applied to chaotic time series. The first data set has been generated by chaotically departing air bubbles in water [3] (10000 samples). Two other time series, containing the prices of shares of the Polish companies Budimex and Okocim, have been analyzed. The series of prices of Budimex shares contained 2375 daily observations and the series of prices of Okocim shares contained 1190 daily observations. The selected statistical and nonlinear characteristics of the analyzed series are summarized in Table 1. The correlation dimension has been calculated using the Grassberger-Procaccia method [1], and the largest Lyapunov exponent has been calculated using the Wolf algorithm [14]. The results of the calculation
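The center-of-mass forecast (Eq. 7) on a reconstructed attractor can be sketched as follows; the sine-based trajectory, delay and neighbor count are illustrative stand-ins for the experimental series:

```python
import numpy as np

# toy trajectory: delay embedding (Eq. 2) of a sine series
x = np.sin(0.1 * np.arange(1000))
tau, D = 11, 3
N = len(x) - (D - 1) * tau
Y = np.column_stack([x[i * tau: i * tau + N] for i in range(D)])

# Eq. (7): average the successors of the n points nearest to the last point
query = Y[-1]
d = np.linalg.norm(Y[:-1] - query, axis=1)   # only points that have a successor
idx = np.argsort(d)[:10]                     # n = 10 nearest neighbors
pred = Y[idx + 1].mean(axis=0)               # forecasted point, off the attractor
```

The forecast stays inside the bounding box of the attractor by construction, since it is a convex combination of existing trajectory points.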
indicated that the data from the capital market are more chaotic compared with the physical system considered in the paper.

Table 1. The statistical and the nonlinear parameters of the analyzed series

Series          | Samples number | Standard deviation | Max. value | Min. value | Largest Lyapunov exp. [bit/sample] | Time delay [samples] | Correlation dimension
Physical system | 10000          | 504.12             | 1439       | 833        | 0.012                              | 10                   | 2.1
Budimex         | 2375           | 8.70               | 51.6       | 8.1        | 0.067                              | 5                    | 3.6
Okocim          | 1190           | 6.82               | 42.5       | 1.46       | 0.055                              | 30                   | 2.3
The examples of 3D visualization of the results of forecasting are presented in Table 2. The attractor reconstruction has been prepared from the time series generated by the physical system. Three methods of forecasting have been used: analogs, center-of-mass prediction and local linear prediction. The last two columns of Table 2 contain the values of the function E (Eq. 10) and the correlation coefficient CC (Eq. 11).

Table 2. 3D visualization of the prediction methods: 20 predicted samples for τ = 10 (the light line represents the forecast; each visualization shows the attractor in the (x, x+τ, x+2τ) coordinates).

Method of prediction      | E     | CC
Analogs                   | 0.291 | 0.962
Center-of-mass prediction | 0.131 | 0.994
Local linear prediction   | 0.087 | 0.996
In Table 2 the results of 20 predictions have been marked with a light line. The black area contains the attractor trajectories. The predicted trajectory (light line) passes parallel to the other neighbouring trajectories on the attractor. The value of the function E is close to 0 for all the methods under consideration, which proves the high quality of forecasting. In the considered cases the correlation coefficient (CC) is also high. In Fig. 2 the comparison between the original data (Fig. 2a) and the results of prediction is shown. The changes of the function E against the number of samples are presented in Fig. 2b. We can notice that the prediction quality is high for all the considered methods, but the local linear prediction method and the center-of-mass prediction method give better predictions than the method of analogs.
Figure 2. The forecasting evaluation of the chaotic physical system: a) comparison of the original data with the forecasts, b) the function E for the prediction methods: method of analogs (Analog), center-of-mass prediction method (COM) and local linear prediction method (LLP).
Table 3 shows the results of 3D visualization of the prediction methods for the daily prices of shares of the Budimex company. Three methods of forecasting have been used: analogs, center-of-mass prediction and local linear prediction. The last two columns of Table 3 contain the values of the function E (Eq. 10) and the correlation coefficient CC (Eq. 11).
Table 3. The results of prediction of the prices of Budimex shares: 3D visualization of the prediction methods, 20 predicted samples for τ = 5 (the light line represents the forecast).

Method of prediction      | E     | CC
Analogs                   | 2.241 | 0.577
Center-of-mass prediction | 1.878 | 0.230
Local linear prediction   | 3.021 | 0.592
The shape of the attractor created by the prices of Budimex shares seems to be less regular than the shape of the attractor in the previous example. The function E gives values larger than in the previous example. This means that the difference between the forecast and the original data increases. In the considered case the correlation coefficient has lower values than in the previous example, which means that the correlation between the forecast and the original data becomes weak. In Fig. 3 the comparison between the original data (Fig. 3a) and the results of prediction is shown. The changes of the function E against the number of samples are presented in Fig. 3b. We can notice that the quality of prediction is lower
than in the previous example. The analog and center-of-mass prediction methods give better predictions than the local linear prediction method. For all the considered methods, the function E increases in time (Fig. 3b), differently than in the previous example. This means that the quality of forecasting decreases in time.
Figure 3. The forecasting evaluation of the Budimex series: a) comparison of the original data with the forecasts, b) the evolution of the function E for the prediction methods: method of analogs (Analog), center-of-mass prediction method (COM) and local linear prediction method (LLP).
The shape of the attractor from the daily prices of Okocim shares is similar to the shape obtained for Budimex; therefore this attractor has not been presented in the paper. In Fig. 4a the comparison between the original data of the daily prices of Okocim shares and the results of their prediction is shown. The changes of the function E against the number of samples are presented in Fig. 4b. We can notice that the quality of prediction of share prices is lower than the quality obtained for the considered physical system. In this case the local linear prediction method gives better predictions than the other methods.
For all the methods, the function E increases in time (Fig. 4b), differently than in the considered physical system; this means that the quality of forecasting decreases in time. The value of the calculated correlation coefficient ranges from 0.49 to 0.79.
Figure 4. The forecasting evaluation of the Okocim series: a) comparison of the real changes of the time series with the forecasts, b) the evolution of the function E for the prediction methods: method of analogs (Analog), center-of-mass prediction method (COM) and local linear prediction method (LLP).
The above analysis shows that forecasting data from the capital market can be more difficult than forecasting data from the considered physical system. This may be caused by the too small number of analyzed capital-market data. According to Peters [14] we need at least 10000 stock exchange quotations to obtain reliable results, while the analyzed series contain fewer than 3000 samples. A consequence of this can be a reduced ability of forecasting. The obtained results showed that the quality of forecasting chaotic data also depends on the shape of the attractor's trajectory in the 3D phase space.
5
Conclusion
In the paper, methods for predicting the behaviour of chaotic dynamical systems based on deterministic chaos theory have been discussed. The following methods have been considered: the method of analogs, the centre-of-mass prediction method and the local linear prediction method. These methods are based on the geometry of the attractor; therefore, a new 3D visualization technique has been developed in the paper. We can conclude that the graphical presentation of data series is very useful when analysing chaotic time series and when evaluating whether the set-up parameters of forecasting are proper. Modifications of the presented methods have been tested as well. The worst forecasting results were obtained when the method of analogs was used with a small number of neighbouring points, while the quality of the local linear prediction method increases with the number of neighbouring points. Comparison of the results obtained for different series shows that nonlinear forecasting methods give better results for chaotic time series whose attractors have regular shapes. For capital-market data the shapes of the attractors are irregular in comparison with the attractor of the physical system considered in the paper.
6
References
[1] Box G. E. P., Jenkins G. M., Analiza szeregow czasowych: prognozowanie, Wydawnictwo Naukowe PWN, Warszawa 1983.
[2] Schuster H. G., Chaos deterministyczny - wprowadzenie, Wydawnictwo Naukowe PWN, Warszawa 1993.
[3] Mosdorf R., Shoji M., Chaos in bubbling - nonlinear analysis and modeling, Chemical Engineering Science 58, 3837-3846, 2003.
[4] Singh S., Multiple forecasting using local approximation, Pattern Recognition 34, 443-445, 2001.
[5] Reick Ch. H., Page B., Time series prediction by multivariate next neighbor methods with application to zooplankton forecasts, Mathematics and Computers in Simulation 52, 289-310, 2000.
[6] Baker G. L., Gollub J. P., Wstęp do dynamiki układów chaotycznych, Wydawnictwo Naukowe PWN, Warszawa, 1998.
[7] Abarbanel H. D. I., Brown R., Sidorowich J. J., Tsimring L. S., The analysis of observed chaotic data in physical systems, Reviews of Modern Physics, 65(4), 1331-1392, October 1993.
[8] Little S., Ellner S., Pascual M., Kaplan D., Sauer T., Caswell H., Solow A., Detecting nonlinear dynamics in spatio-temporal systems, examples from ecological models, Physica D 96, 321-333, 1996.
[9] Sidorowich J. J., Farmer J. D., Predicting chaotic time series, Physical Review Letters, 59(8), 845-848, 1987.
[10] Zmijkowska M., Wykorzystanie analizy fraktalnej w prognozowaniu. Praca magisterska, Politechnika Białostocka, 2003.
[11] Sidorowich J. J., Farmer J. D., Predicting chaotic time series, Physical Review Letters, 59(8), 845-848, 1987.
[12] Hyrskykari A., Development of Program Visualization Systems, 2nd Czech British Symposium of Visual Aspects of Man-Machine Systems, Praha, 1993.
[13] Szmal P., Francik J., Nowak M., Systemy wizualizacji algorytmów wspomagające badania naukowe, III Konferencja Komputery w badaniach naukowych KOWBAN, 317-322, Polanica Zdrój, 1996.
[14] Peters E. E., Teoria chaosu a rynki kapitalowe, WIG-Press, Warszawa 1997.
Idiotypic Networks as a Metaphor for Data Analysis Algorithms
Sławomir T. Wierzchoń*
Białystok Technical University, Faculty of Computer Science, 15-351 Białystok, Poland, and Institute of Computer Science of the Polish Academy of Sciences, 01-237 Warszawa, Poland; e-mail: stw@ipipan.waw.pl
Abstract: This paper is intended as a tutorial presentation of different models used to reproduce and analyze the main immune functions. An express tour over the vast literature devoted to the subject is offered. The choice of the corresponding bibliographic positions was motivated by their relevance to current research, their computational simplicity, and the richness of behavior of the model suggested by the given source. In particular, some remarks on discrete models are given, together with general hints concerning the design of artificial immune systems.
Keywords: Immune System, Artificial Immune Systems, Exploratory Data Analysis
1
Introduction
Immune algorithms, IA's for short, are representatives of a still-growing family of biologically inspired algorithms, like genetic algorithms, ant algorithms, particle swarm algorithms, etc.; consult [8] for an extensive review of such algorithms and their applications. Briefly speaking, IA's are inspired by work on theoretical immunology and by some mechanisms (described in Section 2.1) used by the natural immune system to cope with external and internal invaders. IA's are adaptive algorithms in which learning takes place by evolutionary mechanisms similar to biological evolution. Their adaptability relies on the continuous generation of novel elements to handle an unpredictable and varying set of situations. Generally we distinguish population-based and network-based IA's. In this paper we focus on population-based algorithms designed for exploratory data analysis.
* This paper was partly supported by Bialystok Technical University grant W/WI/5/04.
Most of these algorithms are based (at least conceptually) on the model proposed in [14], although its authors recognized very soon that "the kinetic equations used in our original paper were highly idealized" ([13], p. 172). Hence, the aim of this tutorial paper is a brief presentation of alternative models used in theoretical immunology, with the hope that these models will improve the behavior of currently designed artificial immune systems (AIS's for brevity) for data analysis. An extended bibliography will allow an interested reader to navigate the different approaches to this subject and the common ideas used by researchers to model important properties of the immune system.
2
Mathematical models of idiotypic networks
The aim of this section is a short description of the immune mechanisms important in designing computer algorithms: clonal selection, somatic hypermutation, tolerance, immune response, idiotypic networks and immune memory. Next we briefly review mathematical models designed to analyze these phenomena.
2.1
Basic notions from immunology
The main actors of the adaptive immune system are so-called lymphocytes, i.e. white blood cells. We distinguish lymphocytes of type B (or B-cells) and lymphocytes of type T (or T-cells). Each B-cell admits about 10^5 receptors located on its surface, called antibodies (or immunoglobulins). Roughly speaking, antibodies are soluble proteins which have high affinity towards external intruders called antigens. The key portion of an antigen that is recognized by the antibody is called the "epitope"; it can be viewed as the identifier of that antigen. Similarly, the "paratope" is the specific region of an antibody that attaches to the epitope. Each type of antibody also has its own antigenic determinant called the "idiotype". Real paratopes and epitopes are 3D molecules. If they are complementary with respect to their geometric or physico-chemical characteristics, we say that the paratope recognizes the introduced epitope; alternatively, we say that the paratope has high affinity with the epitope. To study analytically the interactions among paratopes and epitopes, Perelson and Oster introduced in [24] the notion of "shape-space"¹. If d is the number of characteristics influencing complementarity, then a point in a d-dimensional vector space, called the "shape-space", represents the corresponding molecule. Typically a d-dimensional Euclidean (or Hamming) space is used as the shape-space.
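In a binary Hamming shape-space, recognition can be illustrated by counting complementary bits. A minimal sketch; the string length and the recognition threshold below are arbitrary illustrative choices, not values taken from [24].

```python
def affinity(paratope: str, epitope: str, threshold: int = 6) -> bool:
    """Complementarity in a binary Hamming shape-space: count the
    positions where the two bit strings differ; recognition occurs
    when the number of complementary bits reaches the threshold."""
    assert len(paratope) == len(epitope)
    complementary = sum(p != e for p, e in zip(paratope, epitope))
    return complementary >= threshold

# A fully complementary pair is always recognised:
print(affinity("10110100", "01001011"))  # True
```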
¹ It has been observed that the idea of shape space may be misleading and can produce artifacts which do not reflect any underlying biological reality; see [4]. Hence it should be used with caution.
Suppose a large number of copies of a specific, never seen, antigen are introduced into an organism. Once a sufficient number of paratopes binds the epitopes of the introduced antigen, the so-called primary immune response occurs. It relies upon clonal expansion and somatic hypermutation. By clonal expansion we understand rapid replication of those B-cells which have a high affinity to the antigen. To "tune" the molecular shapes of the paratopes of the produced clones to the shapes of the invading epitopes, each clone is subjected to very intensive mutation, which leads to variation of the immune response. Mutated clones with the highest affinity to the antigen are subjected to further expansion and cloning, while mutants with the lowest affinity are removed from the organism. The process continues until the concentration of epitopes decreases below a sufficiently low threshold. It should be noted that during the primary immune response the interaction with T-cells is crucial to the dynamics of the system. These lymphocytes control the activity of B-cells and may have an excitatory or inhibitory role. A reader interested in the details of how B- and T-cells cooperate is referred to e.g. [20]. A crucial effect of all these interactions is that the response to a new antigen has a bell-shaped form. There exists a minimum concentration (θ1) of epitopes that will elicit the immune response; similarly, for very high concentrations of epitopes (exceeding a second threshold θ2 ≫ θ1) the response decreases. In other words, in the immune system we observe low- and high-dose tolerance. Only a medium dose of the antigen causes an immune response manifested by rapid production of antibodies. In theoretical immunology the response function f(h_i), representing the concentration of antibodies of the i-th type, is modeled by the equation (see e.g. [25])

f(h_i) = [h_i / (θ1 + h_i)] · [θ2 / (θ2 + h_i)]   (1)

where h_i stands for the "field" representing the strength of the influence of all epitopes present in the system on a given antibody.
Usually, if m_ij stands for the affinity between the i-th paratope and the j-th epitope, x_j denotes the concentration of the j-th epitope, and N is the number of different types of epitopes, then the field h_i is computed according to the equation

h_i = Σ_{j=1,...,N} m_ij x_j   (2)
Equation (2) says that the i-th antibody can be stimulated by all the epitopes present in the organism, no matter whether they come from antigens or from other antibodies constituting the given immune system. This is because a new antibody, say Ab_1, generated e.g. through somatic mutation, is a new protein for the organism, and its intensive reproduction during clonal expansion causes a new immune response resulting in the production of antibodies of another type, say Ab_2. In summary, the production of
antibody Ab_1 stimulates the production of other types of antibodies² and these subsequent generations of proteins form a kind of network, called by Jerne an "idiotypic network" (consult [22] or [14] for details). Its characteristic feature is that it can be maintained even in the absence of the antigens inducing the immune response. This is due to symmetric interactions between antibodies: if Ab_i stimulates Ab_{i+1}, then Ab_{i+1} stimulates the production of Ab_i. Since antibody Ab_1 was induced by an antigen Ag, one of its descendants, Ab_n, must have an epitope structurally similar to the epitope characterizing Ag. It is obvious that in the absence of the antigen Ag the antibody Ab_n will maintain the production of the other antibodies belonging to the chain Ab_1 → ... → Ab_n, called an autocatalytic loop (consult [14] or [13]). Now, if the antigen Ag enters the organism again, its shape is "remembered" by such a loop and effective antibodies are produced almost immediately. This phenomenon is referred to as "immune memory", and the fast production of effective antibodies is termed the "secondary immune response".
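Equations (1) and (2) translate directly into code. A minimal sketch: the affinity matrix, concentrations and thresholds θ1, θ2 below are invented for illustration, not values from the literature cited above.

```python
def field(m, x, i):
    """Eq. (2): field h_i = sum_j m[i][j] * x[j] felt by antibody i."""
    return sum(m_ij * x_j for m_ij, x_j in zip(m[i], x))

def response(h, theta1=1.0, theta2=100.0):
    """Eq. (1): bell-shaped activation f(h) = h/(theta1+h) * theta2/(theta2+h).
    Small for low h (low-dose tolerance) and for very large h
    (high-dose tolerance); maximal for medium doses."""
    return (h / (theta1 + h)) * (theta2 / (theta2 + h))

m = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 0.0],
     [0.5, 0.0, 0.0]]          # symmetric affinities m_ij
x = [2.0, 3.0, 1.0]            # epitope concentrations
h0 = field(m, x, 0)            # 1.0*3.0 + 0.5*1.0 = 3.5
print(response(h0))            # medium dose -> strong response
```

Note how `response` reproduces the low- and high-dose tolerance described in the text: it vanishes both as h → 0 and as h → ∞.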
2.2
Models of the immune networks
Many different mathematical approaches have been developed to reproduce and analyze the main immune functions. Broadly speaking, we distinguish between continuous and discrete models. Ordinary differential equations are typical for the first group of models, while cellular automata are commonly used in the second group. From a methodological point of view these models can be labelled as "B-models", where no distinction between free and cell-bound antibodies is made, and "AB-models", where both forms of antibodies are described. Surprisingly, both B- and AB-models lead to the same conclusions as regards the fixed-point properties (stable fixed points of the dynamics are necessary for achieving tolerance in a model) [4]. Below we briefly characterize the most prominent representatives of these groups. A reader interested in other models is referred to [25], [15] and [16].

2.2.1 Continuous models

One of the first continuous models, and the most popular among the AIS community, was the "bit-string model" proposed in [14]. It takes into account only interactions among the paratopes and epitopes of antibodies and antigens, represented by binary strings. The affinity m_ij between the i-th epitope and the j-th paratope is computed in a way reflecting partial matching between the two molecules. The dynamics of a system consisting of N antibody types with concentrations {x_1, ..., x_N} and M
² This idea was confirmed experimentally. Namely, it was observed that e.g. in the case of polio virus infection, Ab_2 carries the internal image of the polio antigen. This means that Ab_2 is induced by the paratope of Ab_1 rather than by its idiotype. See: Fons, U., et al., "From Jenner to Jerne: towards idiotypic vaccines", Immunol. Rev. 90:93-113, 1986.
antigens with concentrations {y_1, ..., y_M} is described by the following system of ordinary differential equations:

dx_i(t)/dt = c [ Σ_{j=1,...,N} m_ji x_i(t) x_j(t) − k_1 Σ_{j=1,...,N} m_ij x_i(t) x_j(t) + Σ_{j=1,...,M} m_ji x_i(t) y_j(t) ] − k_2 x_i(t),   i = 1, ..., N   (3)
The first term represents the stimulation of the i-th paratope by the epitope of an antibody of type j, the second term represents the suppression of an antibody of type i when its epitope is recognized by a paratope of type j, the third term represents the stimulation of the i-th antibody by the antigens, and the last term models the tendency of cells to die. The parameter c is a rate constant that depends on the number of collisions per unit time and the rate of antibody production stimulated by a collision. The constant k_1 represents a possible inequality between stimulation and suppression, and the constant k_2 represents the rate of natural death. The model was used by Bagley et al. [1], who studied another important concept of theoretical immunology: plasticity in an immune network. By plasticity we understand the process of removing and recruiting certain types of antibodies from/into the network. This process enables the immune system to decide which idiotypic determinants should be included in, or removed from, the network without referring to an explicit fitness function. Consequently, the network is flexible and able to modify its structure. Soon it became obvious [13] that the model is too simple to describe the emergence of a self-asserted structure able to sustain the immune functions. A prominent representative of the "B-model" was proposed in [9] and [10]. There it was assumed that B-cells (not antibodies) interact with antigens, and this interaction determines the kinetics of the cell response. B-cells proliferate according to a log bell-shaped response function f which depends on a single field h_i(t) defined as in equation (2). In a system consisting of N different clones of B-cells, their concentrations x_i(t), i = 1, ..., N, vary according to the equation

dx_i(t)/dt = p f(h_i(t)) x_i(t) + b,   i = 1, ..., N   (4)

where p is the rate of clone proliferation, and b is the rate at which new clones are inserted into the system by the bone marrow.
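A forward-Euler sketch of the dynamics (3). The matrices, rate constants and step size below are illustrative assumptions, not the values used in [14]; a separate antibody-antigen affinity matrix `ma` is introduced here for the third term.

```python
import numpy as np

def euler_step(x, y, m, ma, c=0.1, k1=0.5, k2=0.05, dt=0.01):
    """One forward-Euler step of eq. (3).
    x  : antibody concentrations (N,)
    y  : antigen concentrations  (M,)
    m  : antibody-antibody affinities m[i][j] (N, N)
    ma : antibody-antigen affinities  ma[i][j] (N, M)"""
    stim = (m.T @ x) * x     # stimulation:  sum_j m_ji x_i x_j
    supp = (m   @ x) * x     # suppression:  sum_j m_ij x_i x_j
    ag   = (ma  @ y) * x     # antigen term: sum_j ma_ij x_i y_j
    dx = c * (stim - k1 * supp + ag) - k2 * x
    return x + dt * dx

rng = np.random.default_rng(0)
x = np.full(4, 0.1)                        # four antibody types
y = np.array([1.0])                        # one antigen
m = rng.random((4, 4)); m = (m + m.T) / 2  # symmetric for simplicity
ma = rng.random((4, 1))
for _ in range(100):
    x = euler_step(x, y, m, ma)
print(x)   # concentrations after 100 steps
```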
Here it is assumed that: (a) the occurrence of any type of B-cell paratope is equiprobable, and (b) the parameter p suffices to summarize the role of T-cells in the proliferation of stimulated B-cells. Varela and Coutinho proposed in [29] the "second generation immune network model", in which both B-cells and antibodies are taken into account. Their model reflects essential features of the immune network and allows one to study such significant functions as recognition, memory and tolerance. In this model free antibodies and B-cells interact with each other through idiotypes. It is assumed that a B-cell and the free antibodies produced by this cell have identical idiotypes. Following [27],
the affinity m_ij, used in eqn. (2), between the i-th and j-th idiotype can take only two values: "1" (indicating a threshold affinity between the corresponding idiotypes) or "0" (lack of affinity). The matrix M = [m_ij] is said to be the connectivity matrix. Suppose there are N different idiotypes, and f_i (resp. b_i) stands for the concentration of antibodies (resp. B-cells) with idiotype of type i. Then the strength of the influence of antibodies on the i-th antibody is determined by the field σ_i = Σ_{j=1,...,N} m_ij f_j. The concentrations of B-cells and free antibodies vary according to the differential equations
df_i(t)/dt = −k_1 σ_i(t) f_i(t) − k_2 f_i(t) + k_3 mat(σ_i) b_i(t)   (5a)

db_i(t)/dt = −k_4 b_i(t) + k_5 prol(σ_i) b_i(t) + k_6,   i = 1, ..., N   (5b)
where: k_1 is the rate of neutralization of a given type of antibodies by other types of antibodies, k_2 is the rate of death of antibodies, k_3 is the rate of creation of antibodies by B-cells, k_4 is the rate of death of B-cells, k_5 is the rate of B-cell creation, and the term k_6 represents the rate of production of B-cells by the bone marrow. mat(·) and prol(·), the counterparts of the activation function defined in eqn. (1), are bell-shaped functions describing how B-cells mature and proliferate upon activation. In [6] these functions are modeled by the equations:
mat(σ_i) = exp[ −( ln(σ_i/μ_m) / s_m )² ],   prol(σ_i) = exp[ −( ln(σ_i/μ_p) / s_p )² ]   (6)
where s_m, s_p, μ_m, and μ_p are parameters³. This model was studied in depth in [6], [3] and [21]. In particular, it was verified that in the model described by (5a,b) a large spectrum of long-run behaviors can be observed: fixed-point, periodic as well as chaotic states exist. Further, it was shown in [21] that there are periodic states in which some B-cell type plays the role of switching other types to an excited state. The behavior exhibited by the system depends strongly on the form of the connectivity matrix.

2.2.2 Discrete models

The first cellular automaton model, referred to as the BSP model, was proposed by De Boer, Segel and Perelson in [11]. Here each B-cell is characterized by a vector r in d-dimensional Euclidean space, and two B-cells interact if they have complementary shapes. It was assumed that the interaction between two B-cells characterized by the idiotypes r_i and r_j is proportional to the affinity m_ij = f(r_i, r_j) = exp[−dist(r_i, −r_j)/c], where dist stands for the Euclidean distance, and c is a scaling factor. Note that the interaction is maximal whenever the spatial
³ Precise values of these parameters and of the parameters k_1, ..., k_6 can be found e.g. in [6].
coordinates are equal and opposite, i.e. r_i = −r_j. The authors observed that, starting from a nearly homogeneous distribution of the initial population of cells, the final distribution was very inhomogeneous. In particular, in the 2-dimensional case the formation of circular clusters of small radius was observed. To study this model in higher dimensions, Stauffer and Weisbuch proposed in [26] a simplified version of the BSP model called BSP-III. They replaced the Gaussian distribution of interactions by nearest-neighbor interactions, and they assumed that the B-cell concentrations b_i can take only three values, representing the virgin (b_i = 0), suppressed (b_i = 1) and immune (b_i = 2) states. With these simplifications, the authors observed a transition from stable to chaotic regimes in higher (n > 3) dimensions, but for lower dimensions (n < 3) the BSP-III model always evolved to stable configurations. Zorzenon dos Santos and Bernardes studied in [32] another version of BSP-III with a simplified response function defined as f(h) > 0 if h ∈ [Lo, Hi] and f(h) = 0 otherwise. The model is simulated on an n-dimensional hypercubic lattice with N = L^n sites, in such a way that site i interacts with 2n + 1 sites centered at its mirror image; i.e., when n = 2 the cell (x, y) interacts with its mirror image (x', y') = (L−x, L−y) and its four neighbors (L−x±1, L−y) and (L−x, L−y±1). The field of the i-th site is computed as h_i = Σ_{j∈N(i)} b_j, where N(i) stands for the (mirror) neighborhood of the i-th site. The concentration of the i-th B-cell is updated according to the rule:
b_i(t+1) = b_i(t) + 1   if h_i(t) ∈ [Lo, Hi],
b_i(t+1) = b_i(t) − 1   otherwise   (7)
but no change is made if it would lead to b_i(t+1) = −1 or b_i(t+1) = 3. This model exhibits a transition from stable to chaotic regimes for n > 2, depending on the activation threshold Lo and the width of the activation interval (Hi − Lo). The shorter the activation interval, the faster the system goes to chaotic behavior. Moreover, it was observed that the final results are sensitive to the choice of the initial distribution of the concentration of states. This distribution was defined as follows: b_i(0) = 0 with probability (1 − x), b_i(0) = 1 with probability ½x and b_i(0) = 2 with probability ½x. Another variant of BSP-III was proposed by Bersini in [2]. Here antibodies are randomly recruited to points on a two-dimensional lattice of size L × L. Only one new cell i is located randomly at each iteration, at (x_i, y_i); denote by b_i(0) > 0 its initial concentration. As in the previous models, it is assumed that this cell exerts an affinity in a zone Z(ī, r) centered around its mirror image ī with coordinates (x̄_i, ȳ_i) = (L − x_i, L − y_i), where r is a prespecified radius. Now, the field of such an antibody is computed at iteration t+1 as follows:
h_i(t+1) = Σ_{a∈Z(ī,r)} b_a(t) φ( r − dist(ī, a) )   (8)
where φ(x) = 0 if x < 0 and φ(x) = x if x ≥ 0, and dist(·,·) stands for the distance between two cells. The concentration of cell i evolves according to equation (7). A cell dies, and is removed from the system, if its concentration b_i(t) reaches zero. This model was carefully studied by Hart and Ross in [18]. Figure 1 shows two exemplary cell distributions after 10,000 iterations. These authors observed a number of empirical facts without a formal analysis: (1) Tolerant and non-tolerant zones emerge that are not predetermined; a cell placed in a tolerant zone can live for a long time. (2) The zones are asymmetric, which follows from the random nature of the system. (3) The boundaries of zones are generally lines. (4) The set of cells S that lie in the complementary region of any persistent cell x lying on a zone boundary occurs only around boundaries of the complementary region of x. (5) The emergence of stable regions depends on the size of the recognition radius.
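The recruitment dynamics of eqs. (7)-(8) can be sketched compactly. All parameters below (lattice size, radius, thresholds, initial concentration, iteration count) are illustrative, not those of [2] or [18], and cells are updated sequentially rather than synchronously for simplicity.

```python
import random

L, R = 20, 5                 # lattice size and recognition radius
LO, HI = 10, 100             # activation interval [Lo, Hi]
B0 = 10                      # initial concentration of a recruit
cells = {}                   # (x, y) -> concentration b_i

def mirror(p):
    """Mirror image of a cell: (L - x, L - y)."""
    return (L - p[0], L - p[1])

def field(p):
    """Eq. (8): h_i = sum over all cells a of b_a * max(0, R - d),
    where d is the distance from a to the mirror image of p."""
    mx, my = mirror(p)
    h = 0.0
    for (ax, ay), b in cells.items():
        d = ((ax - mx) ** 2 + (ay - my) ** 2) ** 0.5
        h += b * max(0.0, R - d)
    return h

random.seed(1)
for _ in range(250):
    # recruit one new cell per iteration
    cells[(random.randrange(L), random.randrange(L))] = B0
    # eq. (7): stimulated cells grow, unstimulated ones decay
    for p in list(cells):
        cells[p] += 1 if LO <= field(p) <= HI else -1
        if cells[p] <= 0:
            del cells[p]     # a cell at zero concentration dies
print(len(cells))            # size of the surviving network
```

Most recruits decay and die within B0 iterations; only cells whose mirror zone happens to be populated persist, which is the self-assertion effect discussed in the text.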
Figure 1. Stable distribution of immune cells in a self-asserted system. Small squares represent immune cells, circles their regions of affinity. There are two slightly different modes of activity: (a) a cell can belong to its own region of affinity (left), and (b) self-stimulating cells are not allowed (right).
It seems that to explain these facts it is sufficient to note that the probability of locating a new cell at the i-th site equals p = L^(-2), and the probability of locating a sufficient number of cells in the affinity region Z(ī, r) is governed by the binomial distribution F_n(k) of obtaining k successes in n independent Bernoulli trials (the probability of success in a single trial is just p). To stimulate a virgin, or just introduced, cell i, k cells with rather high concentrations should be placed in the
zone Z(ī, r) within a relatively short period, proportional to r and inversely proportional to the value b_i(0). This last requirement follows from the dynamics (7): a non-stimulated cell decreases its concentration by one at each iteration. Practically this means that when e.g. b_i(0) = 10 and r = 10, about 3 cells should be placed in Z(ī, r) within n = 6 subsequent iterations. Such reasoning explains why sufficiently high values of r, b_i(0), and Lo are required to initiate the process of stable structure formation. Figure 2 illustrates different stages of such a process. Figure 3 shows that the number of cells stabilizes around a fixed value through the iterations, while Figure 4 shows that the process itself has a scale-free nature: the probability that a cell will live t iterations obeys the power law P(t) = t^(−τ), with τ ≈ 1.78.
Figure 2. Snapshots of the self-assertion simulations (from upper left to lower right): after 1300, 1500, 2000 and 4000 iterations; the last structure is almost stable.
Figure 3. Number of cells in subsequent iterations. Figure 4. Average number of cells (rhombs) living through t iterations vs. the power law P(t).

min R = a[e ... d], T_CPA > D_h, T_CPA > n T_h, where β > 0, n > 1, and the values of a, b, c, d regulate the risk coefficient.

3
Domain estimation for approaching moving targets
A moving target representing a collision threat is configured as an area of danger, moving with the speed and in the direction (course) identified by the ARPA system. The most significant factors affecting the scale of the ship domain are the following: (1) psychological, human factors, such as the naval experience of the navigator, along with his intuition and knowledge of the navigation area; (2) physical factors, characteristic of the type of the ship, referring to its dimensions
and the relative speed of other approached ships; (3) physical factors, common for all ships in the area of concern, such as traffic level, hydrometeorological conditions, etc. Goodwin E.M. [5] presented a method for estimating the area of danger on the basis of statistical data analysis. Following the maritime law regulations, the area of the object occurrence was divided into three sectors defined by the actual relative bearing to this object. Sector 1 is on the starboard side within the bearing limits of 0° ... DCPA_0,

(10)

where: B - ship width (here, the width B of the own ship is used instead of the unknown width of the ship detected by the ARPA system), DCPA_0 - the assumed value of DCPA.
The above width of the domain is valid for the starboard side. On the port side, however, as well as astern:

d_j = DCPA_0 · E   or   d_j = DCPA_0 · ..., but no less than 0.5 Nm   (11)
where: E - the ratio of the relative speed to the own ship's speed: V_REL / V_0.
5
Exemplary planning of ship trajectory
The system of evolutionary planning of the ship trajectory was tested on numerous cases of passing fixed navigational constraints and moving targets. The analysis of these cases revealed the need to introduce an additional parameter into the evolutionary trajectory planning algorithm, namely the change of the own ship's speed along particular trajectory sections and the ships' domains (see Fig. 5). In practice, the speed was modified using an additional genetic operator: the speed mutation. A set of permissible speed values was defined as V = {3.6; 8.6; 13.6 knots}, from which the mutation operator could take a speed for the trajectory section of concern. Additionally, the total time of trajectory passing was added to the trajectory fitness function, which took into consideration changes in the own ship's speed. After this modification of the evolutionary process of trajectory search, a trajectory is sought which, besides meeting the formerly set safety and economy conditions, represents the shortest time needed to cover the studied distance. For this version of the algorithm, the example involves moving targets sailing with opposite courses on the right and left sides of the own ship (a ferry) and constraints having the form of islands. Here, the population consisted of 40 individuals (passage trajectories), and the changes of trajectories in the population stopped being recorded after 1000 generations. The estimated trajectories secure the passage of the ferry behind the stern of the targets on the left
side; then, having sailed between the islands, it increased its speed, as this did not produce an unacceptable nearing to the target on the right side.
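The speed-mutation operator described above can be sketched as follows. The permissible speed set is taken from the text; the trajectory representation (a list of (waypoint, speed) sections), the mutation probability and the fitness component are illustrative assumptions.

```python
import random

SPEEDS = [3.6, 8.6, 13.6]    # permissible speeds in knots (from the text)

def speed_mutation(trajectory, pm=0.2, rng=random):
    """Mutate the speed gene of each trajectory section with
    probability pm, drawing a new value from the permissible set."""
    mutated = []
    for waypoint, speed in trajectory:
        if rng.random() < pm:
            speed = rng.choice(SPEEDS)
        mutated.append((waypoint, speed))
    return mutated

def passage_time(trajectory):
    """Total passage time (hours), the component added to the fitness:
    section length over section speed; lengths here are straight-line
    distances in Nm between consecutive waypoints."""
    total = 0.0
    for (p, speed), (q, _) in zip(trajectory, trajectory[1:]):
        dist = ((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2) ** 0.5
        total += dist / speed
    return total

traj = [((0, 0), 8.6), ((3, 4), 8.6), ((6, 8), 13.6)]
print(passage_time(traj))   # two 5 Nm sections covered at 8.6 kn
```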
Fig. 5: Trajectory evolution for the case of approaching moving targets in the presence of static navigation constraints (population 40 trajectories).
6
Conclusions
The evolutionary system of ship trajectory planning makes it possible to steer the ship in a well-known environment with both static and dynamic navigation constraints, as well as to make adaptive corrections of the ship trajectory in order to follow unforeseeable changes in the situation at sea. The evolutionary method of determining the safe and optimum trajectory is a new approach to the problem of avoiding collisions at sea. A number of preliminary tests have made it possible to formulate the following conclusions: • evolutionary algorithms can be effectively used for solving the problem of planning a ship trajectory in areas of extensive traffic, such as harbour entrances and coastal regions,
• the introduction of the own ship's speed as a changing parameter makes it possible to solve the problem in a wider range: for particular trajectory sections, the actual speed is evaluated with which the ship covers the section in order to pass safely and economically all navigational constraints, both fixed and moving.
7
References
[1] Burns R.S., An Intelligent Integrated Ship Guidance System. 2nd IFAC Workshop Control Applications in Marine Systems, Genova, Italy, 1992.
[2] Dove M.J., Burns R.S., Stockel C.T., An Automatic Collision Avoidance and Guidance System for Marine Vehicles in Confined Waters. Journal of Navigation, Vol. 39, 1986.
[3] Davis P.V., Dove M.J., Stockel C.T., Computer Simulation of Multi-ship Encounters. Journal of Navigation, Vol. 35, 1982.
[4] Furuhashi T., Nakaoka K., Uchikawa Y., A Study on Classifier System for Finding Control Knowledge of Multi-Input Systems. In: F. Herrera, J.L. Verdegay (Eds.), Genetic Algorithms and Soft Computing, Physica-Verlag, 1996.
[5] Goodwin E.M., A statistical study of ship domains. Journal of Navigation, Vol. 31, 1975.
[6] Hayashi S., Kuwajima S., Sotooka K., Yamakazi H., Murase H., A stranding avoidance system using radar image matching: development and experiment. Journal of Navigation, Vol. 44, 1991.
[7] Iijima Y., Hayashi S., Study towards a twenty-first century intelligent ship. Journal of Navigation, Vol. 44, 1991.
[8] Iijima Y., Hagiwara H., Results of Collision Avoidance Manoeuvre Experiments Using a Knowledge-Based Autonomous Piloting System. Journal of Navigation, Vol. 47, 1994.
[9] Lenart A.S., Collision Threat Parameters for a New Radar Display and Plot Technique. Journal of Navigation, Vol. 36, 1983.
[10] Lin H.S., Xiao J., Michalewicz Z., Evolutionary Algorithm for Path Planning in Mobile Robot Environment. Proceedings IEEE Int. Conference on Evolutionary Computation, Orlando, Florida, 1994.
[11] Michalewicz Z., Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, 3rd edition, 1996.
[12] Michalewicz Z., Xiao J., Evaluation of Paths in Evolutionary Planner/Navigator. Proceedings of the International Workshop on Biologically Inspired Evolutionary Systems, Tokyo, Japan, 1995.
[13] Smierzchalski R., The Application of the Dynamic Interactive Decision Analysis System to the Problem of Avoiding Collisions at Sea (in Polish). 1st Conference "Awioniki", Jawor, Poland, 1995.
[14] Smierzchalski R., The Decision Support System to Design the Safe Manoeuvre Avoiding Collision at Sea. 14th International Conference Information Systems Analysis and Synthesis, Orlando, USA, 1996.
[15] Smierzchalski R., Multi-Criterion Modeling the Collision Situation at Sea for Application in Decision Support. 3rd International Symp. on Methods and Models in Automation and Robotics, Miedzyzdroje, Poland, 1996.
[16] Smierzchalski R., Trajectory planning for ship in collision situations at sea by evolutionary computation. 4th IFAC Conference on Manoeuvring and Control of Marine Craft, Brijuni, Croatia, 1997.
[17] Smierzchalski R., Dynamic Aspect in Evolutionary Computation on Example of Avoiding Collision at Sea. 4th International Symp. on Methods and Models in Automation and Robotics, Miedzyzdroje, Poland, 1997.
[18] Smierzchalski R., Evolutionary Guidance System for Ship in Collision Situation at Sea. 3rd IFAC Conference Intelligent Autonomous Vehicles, Madrid, Spain, 1997.
[19] Smierzchalski R., Michalewicz Z., Adaptive Modeling of a Ship Trajectory in Collision Situations. 2nd IEEE World Congress on Computational Intelligence, Alaska, USA, 1998.
[20] Sudhendar H., Grabowski M., Evolution of Intelligent Shipboard Piloting Systems: A Distributed System for the St Lawrence Seaway. Journal of Navigation, Vol. 49, 1996.
[21] Wawruch R., System of steering ship movement. Works of Navigational Department, Gdynia Maritime Academy, Gdynia, 1998.
[22] Trojanowski K., Michalewicz Z., Planning Path of Mobile Robot (in Polish). 1st Conference Evolutionary Algorithms, Murzasichle, Poland, 1998.
[23] Witt N.A., Sutton R., Miller K.M., Recent Technological Advances in the Control and Guidance of Ships. Journal of Navigation, Vol. 47, 1994.
[24] Xiao J., Michalewicz Z., Zhang L., Evolutionary Planner/Navigator: Operator Performance and Self-Tuning. Proceedings IEEE Int. Conference on Evolutionary Computation, Nagoya, Japan, 1996.
The work was supported by the State Committee for Scientific Research in Poland (grant no. 3 T11A 003 26).
Inputs' Significance Analysis with Rough Sets Theory

Izabela Rejer
University of Szczecin
Mickiewicza 64/66, Szczecin, Poland
e-mail: i rejer@uoo.univ.szczecin.pl
Abstract:
The aim of this article is to show that the proper choice of a discretisation method is a key point in analysing the significance of attributes describing a specific real system with continuous attributes. In the second section of the article the three most popular automatic discretisation methods are presented; in the third section they are used for the discretisation of the continuous attributes describing the system of unemployment in Poland. The results obtained after applying these methods are compared with the results obtained with an expert method proposed in the last part of the article.
Keywords:
Inputs' significance, rough sets theory, discretisation methods
1 Introduction
The most essential problem encountered in the process of modelling a real multidimensional system is the lack of prior knowledge about the factors determining its behaviour. Sometimes there are so many "experts", and so many different ideas about which factors significantly influence the analysed system, that it is extremely difficult to separate the most important ones. One way of dealing with this problem is to apply rough sets theory in the modelling process. According to this theory, there are two main approaches to the analysis of inputs' significance. Both are based on comparing the approximation quality of different subsets of attributes (inputs), calculated according to the following equation:

γ = card Pos(X) / card U    (1)

where: card Pos(X) – cardinality of the positive region of the family of concepts X, card U – cardinality of the universe.
Approach I: First the approximation quality of the whole set of attributes is calculated. Then the attributes are eliminated one by one, and the quality of each reduced subset is estimated. As long as both qualities (of the whole set and of the reduced subset) remain approximately the same, the last discarded attribute can be regarded as unimportant. Approach II: First the number of attributes to be used for modelling the analysed system is fixed. Then the approximation quality is calculated for all subsets consisting of the chosen number of attributes, and the subset with the highest approximation quality is selected for the modelling process. Naturally, when a system is described by a large number of attributes and a small number of examples (which is the most common situation in the real world), only the second approach can be used. As has been demonstrated many times, rough sets theory is a very powerful tool for modelling real systems with discrete attributes. For example, it was successfully applied to diagnosing diseases of the urinary system [1] and to calculating the costs of hospital treatment [4]. However, the attributes describing the analysed system are not always discrete. Sometimes they are continuous, and a discretisation process has to be carried out before rough sets theory can be applied. Since there are many discretisation methods with different characteristics, the question is how to decide which one is the most suitable for a specific system. Generally it is very difficult to give a satisfactory answer, because it is often impossible to choose a single method suitable for the whole analysed system. The reason is that the attributes describing the system can have different characteristics, which means that the discretisation method should be fitted individually to each of them.
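As an illustration of equation (1) and of Approach II, the sketch below computes the approximation quality of an attribute subset for a small, made-up decision table (the data and the function name are illustrative assumptions, not taken from the paper):

```python
from collections import defaultdict

def approximation_quality(rows, attrs, decision):
    """gamma = card Pos(X) / card U for the attribute subset `attrs`.
    Pos(X) collects the objects whose indiscernibility class
    (w.r.t. attrs) is consistent on the decision attribute."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    positive = set()
    for members in classes.values():
        if len({rows[i][decision] for i in members}) == 1:
            positive |= members          # class fits inside a single concept
    return len(positive) / len(rows)

# toy decision table with discrete attributes a, b and decision d
U = [{"a": 0, "b": 0, "d": "low"},
     {"a": 0, "b": 1, "d": "high"},
     {"a": 0, "b": 1, "d": "low"},     # collides with the previous object
     {"a": 1, "b": 0, "d": "high"}]
print(approximation_quality(U, ["a", "b"], "d"))   # 0.5
print(approximation_quality(U, ["a"], "d"))        # 0.25
```

Approach II then amounts to evaluating this quality for every candidate pair of attributes and keeping the pair with the largest value.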
2 Discretisation Process
There are many automatic methods used in the discretisation process; in this article only the three most commonly used will be described. The first is called "the equal subintervals method". According to this method, the variation interval of the analysed attribute is divided into a number of equal subintervals (fig. 1a). This method is a good solution when the data are distributed evenly over the whole interval, but it can give poor results otherwise. Figure 1b shows an example of an attribute characterised by an irregular distribution: there are many data points in the first and last parts of the interval and very few in the middle. If this attribute's interval is divided in the same way as the interval in fig. 1a, the middle subinterval will contain a very limited number of data points (fig. 1c). In most real systems, intervals containing so little data are unwelcome because they lead to the creation of very weakly supported rules.
Naturally, the inconvenience mentioned above can easily be eliminated by expanding the middle subinterval, as in fig. 1d. Of course, the discretisation from fig. 1d is no longer an example of the equal subintervals method; it should rather be regarded as an expert method.
Fig. 1. The method of equal subintervals
The second popular automatic discretisation method is the method of equal numbers of data points. According to this method, the whole variation interval of the attribute is divided into the required number of subintervals containing approximately the same number of data points (fig. 2a). In most cases this approach produces subintervals better fitted to the characteristics of the attribute than the method of equal subintervals. Sometimes, however, especially when clear clusters can be found in the interval, it can result in an improper discretisation. An example of such a situation is shown in fig. 2. If the discretisation of the attribute from fig. 2b is based on the assumption of an equal number of data points in each subinterval, the clusters existing in the data set will not be separated from each other; instead, the marginal ones will be improperly split in the middle (fig. 2c). The proper discretisation of this attribute should look like fig. 2d. The third popular automatic discretisation method is the method of average values. The discretisation process carried out according to this method begins with calculating the arithmetic average of the data points contained in the attribute's variation interval. This average defines the splitting point used for partitioning the whole interval into two subintervals (fig. 3a). Next, the data from the first and the second subinterval are averaged and two further splitting points are defined, dividing the previous subintervals into two subintervals each (fig. 3b). The process is continued until the required number of subintervals is reached. Naturally, this method can only split the whole interval into an even number of subintervals. Like the methods presented before, the method of average values is not suitable for every type of attribute. For example, applying this method to
the attribute shown in fig. 2b will split the clusters existing in the data set (fig. 3c) instead of placing each of them in a separate subinterval (fig. 2d).
Fig. 2. The method of equal number of data points in subintervals
Fig. 3. The method of average values
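The three automatic methods can be sketched as simple cut-point generators; the function names and the sample data below are illustrative assumptions, not taken from the paper:

```python
def equal_subintervals(xs, k):
    """Method 1: divide the variation interval into k equal subintervals."""
    lo, hi = min(xs), max(xs)
    return [lo + (hi - lo) * i / k for i in range(1, k)]

def equal_frequency(xs, k):
    """Method 2: cut points such that each subinterval holds roughly
    the same number of data points."""
    s = sorted(xs)
    return [s[len(s) * i // k] for i in range(1, k)]

def average_value_splits(xs, levels):
    """Method 3: split at the arithmetic average, then recursively at the
    averages of the resulting subintervals (2**levels subintervals)."""
    if levels == 0 or not xs:
        return []
    m = sum(xs) / len(xs)
    left, right = [x for x in xs if x < m], [x for x in xs if x >= m]
    return (average_value_splits(left, levels - 1) + [m]
            + average_value_splits(right, levels - 1))

data = [0.05, 0.1, 0.12, 0.15, 0.8, 0.85, 0.9, 0.95]  # two clear clusters
print(equal_subintervals(data, 3))
print(equal_frequency(data, 3))
print(average_value_splits(data, 1))   # one average split separates the clusters
```

On this clustered sample the single average-value split falls between the clusters, while the equal-frequency cuts land inside them, mirroring the behaviour discussed for figs. 2 and 3.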
3 Case study
The influence of the discretisation method on the ability of rough sets to determine the most significant conditional attributes was analysed using a system describing unemployment in Poland in the years 1992-2001. The decision attribute was the unemployment rate, and the conditional attributes were seven macroeconomic factors chosen on the basis of the economic literature (tab. 1). The decision table contained 120 examples [5][6].
Symbol  Name
CA1     corporate income
CA2     sold production of industry
CA3     dollar's rate of exchange
CA4     money supply
CA5     rate of rediscount
CA6     export
CA7     number of inhabitants
Tab. 1. Conditional attributes of the analysed system

Naturally, given the number of input variables and the number of examples, the second approach to determining inputs' significance (of those presented in section 1) had to be used. Hence, the task of the survey was set as follows: find the two attributes of the greatest importance for the analysed system. Since all the attributes describing the system were continuous, all of them had to be discretised before rough sets theory could be applied. First, the discretisation process was carried out using the methods described briefly in section 2. The approximation quality of subsets containing different pairs of attributes, obtained after applying the different discretisation methods, is presented in tab. 2. The first column of the table contains the numbers of the attributes; the three following columns give the approximation quality obtained with the successive methods: the method of equal subintervals (Method 1), the method of subintervals with equal numbers of data points (Method 2), and the method of average values (Method 3). For each method, the subset with the highest approximation quality indicates the most significant pair of attributes.
No.  Method 1  Method 2  Method 3
12   0.133     0.008     0.108
13   0.017     0.008     0.042
14   0.050     0.017     0.017
15   0.025     0.017     0.017
16   0.008     0.108     0.142
17   0.017     0.017     0.217
23   0.058     0.017     0.033
24   0.067     0.008     0.067
25   0.000     0.067     0.017
26   0.000     0.017     0.000
27   0.025     0.183     0.183
34   0.000     0.000     0.042
35   0.000     0.017     0.017
36   0.042     0.025     0.108
37   0.142     0.017     0.183
45   0.158     0.050     0.117
46   0.008     0.067     0.150
47   0.000     0.150     0.200
56   0.017     0.133     0.000
57   0.017     0.117     0.183
67   0.000     0.200     0.183
Tab. 2. The approximation quality of subsets containing different pairs of attributes

As can be noticed (tab. 2), completely different pairs of attributes were indicated as the most important ones after applying the different discretisation methods:
• method of equal subintervals: money supply and rate of rediscount,
• method of equal numbers of data points in subintervals: export and number of inhabitants,
• method of average values: corporate income and number of inhabitants.
At this stage of the analysis a very important question appeared: how to decide which of the three chosen subsets of attributes really contained the most significant attributes? In other words, which of these three methods was the most suitable for the examined system? Naturally, the attributes' subsets chosen after applying different discretisation methods could not be compared on the basis of the approximation quality. The reason is that the approximation quality is not an absolute measure: it is closely related to the values contained in the discretised decision table, which is completely different in each case. Therefore, another method had to be used to compare the results obtained with the different discretisation methods. In order to deal with this task, a set of neural models (10 models per pair of attributes) was constructed. The parameters of the neural networks used in the modelling process were as follows [2][3] (fig. 4):
• flow of signals: one-way;
• architecture of connections between layers: all to all;
• hidden layers: 1 hidden layer with 5 sigmoid neurons;
• output layer: 1 linear neuron;
• training method: backpropagation algorithm with momentum and variable learning rates;
• length of the training process: 20,000 epochs;
• testing method: 16-fold cross-validation.
The models' performance was compared on the basis of the mean absolute error:

MAE = (1/n) Σᵢ₌₁ⁿ |yᵢ − ŷᵢ|    (2)

where: yᵢ – real outputs, ŷᵢ – modelled outputs, n – number of data points. After the training process, one model per pair of attributes (non-overfitted and characterised by the smallest mean absolute error) was chosen for further analysis. The errors of the chosen models are presented in tab. 3. The table shows that the most precise model was created when the method of subintervals with equal numbers of data points was applied in the discretisation process. Its error
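Equation (2) is a plain mean absolute error; as a quick sketch (illustrative name and numbers, not the paper's data):

```python
def mean_absolute_error(y_true, y_model):
    """Equation (2): average of |y_i - yhat_i| over the n data points."""
    assert len(y_true) == len(y_model)
    return sum(abs(a - b) for a, b in zip(y_true, y_model)) / len(y_true)

print(mean_absolute_error([1.0, 2.0, 4.0], [1.5, 2.0, 3.0]))   # 0.5
```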
was equal to 5.15%. The models created on the basis of the two other discretisation methods were less precise: 5.70% (the method of average values) and 8.25% (the method of equal subintervals).
Fig. 4. A scheme of a neural network used in the modelling process
No.  Error [%]   No.  Error [%]   No.  Error [%]   No.  Error [%]
12   14.11       23   17.71       34   6.55        46   7.13
13   13.51       24   7.28        35   14.65       47   2.90
14   5.78        25   15.91       36   12.10       56   11.20
15   12.79       26   13.86       37   8.69        57   8.25
16   11.35       27   8.45        45   8.25        67   5.15
17   5.70
Tab. 3. The absolute errors of neural models built on the basis of all pairs of attributes

The application of the discretisation method of subintervals with equal numbers of data points gave better results than the application of the other methods. However, as can be observed in tab. 3, the chosen subset of attributes was not the best one. There is another pair of attributes in tab. 3 which allowed the construction of a model almost twice as well fitted as the pair chosen with the method mentioned above. Hence the question: is it really reasonable to use rough sets theory for determining inputs' significance in systems with non-discrete attributes? As will be shown below, the answer is affirmative, but only
under one condition: each continuous attribute describing the analysed system has to be discretised separately, according to its own characteristics. This means that the discretisation process has to be carried out with an expert method. The expert method used to discretise the attributes of the analysed system is based on placing the splitting lines in the discontinuous areas of the attribute's variation interval, or in areas where the slope of the data chain changes rapidly. The rationale for this approach is that the discontinuous areas indicate the borders of clusters existing in the data set, while the slope changes indicate areas of higher (steep areas) and lower (flat areas) density. Naturally, for attributes which have no characteristic areas, the discretisation process can be carried out with one of the methods described in section 2. In such cases it is often of no importance which method is used, because the lack of discontinuities and inflexions means that the data are distributed evenly over the whole data range. The results of applying the expert method to the discretisation of the attributes describing the analysed system are shown in fig. 5.
Fig. 5. The attributes discretisation with the expert method; CA  conditional attribute (numbers like in the tab. 1); DC  decision attribute
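The discontinuity-seeking part of the expert method can be approximated automatically by placing cut points inside the widest gaps of the sorted data. The sketch below is a simplification of the author's manual procedure (the slope-change criterion is omitted), with illustrative names and data:

```python
def gap_cut_points(xs, n_cuts):
    """Place n_cuts splitting lines in the widest gaps (discontinuous
    areas) of the attribute's variation interval."""
    s = sorted(xs)
    widest = sorted(range(len(s) - 1),
                    key=lambda i: s[i + 1] - s[i], reverse=True)[:n_cuts]
    return sorted((s[i] + s[i + 1]) / 2 for i in widest)

data = [0.02, 0.05, 0.1, 0.45, 0.5, 0.55, 0.9, 0.93, 0.97]  # three clusters
print(gap_cut_points(data, 2))   # both cut points fall between the clusters
```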
After the discretisation of the attributes of the analysed system, the approximation quality for each subset of attributes was calculated once again (tab. 4). As can be noticed, the application of the expert method in the discretisation process allowed the really most important attributes of the analysed system to be chosen, i.e. money supply and number of inhabitants.
No.  Approximation quality   No.  Approximation quality   No.  Approximation quality   No.  Approximation quality
12   0.100                   23   0.000                   35   0.017                   56   0.067
13   0.042                   24   0.017                   36   0.033                   57   0.108
14   0.017                   25   0.017                   37   0.175                   67   0.058
15   0.100                   26   0.000                   45   0.117
16   0.133                   27   0.200                   46   0.158
17   0.192                   34   0.042                   47   0.208
Tab. 4. The approximation quality of the subsets containing different pairs of attributes (discretisation with the expert method)
4 Conclusion
Rough sets theory can be successfully applied in the process of determining inputs' significance not only in discrete but also in continuous systems. Naturally, before it is applied to a system described by continuous attributes, all of them have to be discretised. In the real world it is very difficult to find systems whose attributes all behave in a similar way. Normally each attribute has its own characteristics, and therefore cannot be treated in the same way as the others. Hence it is very important to remember that the discretisation process should not be carried out with automatic discretisation methods. Instead, each continuous attribute has to be discretised separately, after careful analysis, according to its own characteristics.
5 References
[1] Czerniak J., Zarzycki H., "Application of rough sets in the presumptive diagnosis of urinary system diseases", Proceedings of the 9th International Conference ACS'2002, pp. 43-51, Kluwer Academic Publishers, London, 2003.
[2] Demuth H., Beale M., "Neural Network Toolbox User's Guide", The MathWorks Inc., Natick, MA, USA, 2000.
[3] Masters T., "Practical Neural Network Recipes in C++", The Scientific-Technical Publishing House, Warsaw, 1996.
[4] Pelczar M., "Rough Sets Using for Medical Services Costs Modeling in Decision Support System", 9th International Conference ACS'2002, Informa, Szczecin, 2002.
[5] Rejer I., "The Method of Modeling the Multidimensional System with Artificial Intelligence Methods on the Example of an Unemployment in Poland", The Scientific Publishing House of the Szczecin University, Szczecin, 2003.
[6] Rejer I., Piegat A., "A Method of Investigating a Significance of Input Variables in Non-Linear High-Dimensional Systems", Proceedings of the 9th International Conference ACS'2002, pp. 73-80, Kluwer Academic Publishers, London, 2002.
Semi-Markov process in performance evaluation of asynchronous processors

Wojciech Kadlubowski
Technical University of Szczecin
26 Kwietnia 10, 71-126 Szczecin, Poland
e-mail: wk@ps.pl
Abstract: A novel model of asynchronous processors for performance analysis is presented. The method of performance evaluation by Semi-Markov simulation is described.

Keywords: Simulation, asynchronous processor, Semi-Markov model

1 Introduction
There is no fixed cycle in asynchronous processors. All interactions between components are synchronised by handshaking, and the state of the processor is altered after each interaction. Usually trace-driven simulators, in which simulation time depends on the length of the benchmark, are used for performance evaluation of processors [1], [2]. The simulations have to be performed for each configuration of the processor and for each benchmark. In the method presented here, the trace of the benchmark is analysed only once, and the simulation time is independent of the length of the trace. Besides performance evaluation, the stochastic method gives additional information about the interactions between the components of a processor. Researchers usually take into account the input cycle of the asynchronous processor [3], [4], which describes the average behaviour of the processor during the execution of a program. There are, however, other important performance parameters, e.g. the occupation of components. In order to obtain performance parameters we use a Semi-Markov process [5], which describes the behaviour of the asynchronous processor. For this process the steady-state distribution is calculated, and then the performance parameter is obtained. Stochastic performance evaluation for synchronous processors is presented in [6].
2 Model of asynchronous processor

2.1 General Model
A general model of asynchronous processors consists of components whose work is synchronised by handshaking. We can consider both superscalar and superpipelined microarchitectures (see Fig. 1). The presented pipeline includes n stages, and each stage consists of m components. It is assumed that a pipeline of length n can be split into p new pipelines (see Fig. 2), each of length n_p.
Fig. 1. The model of a superpipelined and superscalar processor
Fig. 2. The model of a superpipelined and superscalar processor with many execution pipes
The stages preceding the split represent the first stages of the processor (e.g. instruction fetch, instruction decode, etc.). Each of the p pipelines represents a functional unit of the processor (e.g. integer unit, floating point unit, etc.). In-order execution of instructions is assumed. After completion in the current stage, an instruction is sent to the next stage as soon as possible (i.e. when the next stage is idle). In order to perform stochastic simulations we need to analyse a particular model of the processor and create the related Semi-Markov process (see [7], [8]).
2.2 Detailed Model
The detailed model depends on the delays associated with each component. The model presented in this paper differs from the one described in [7], [8]. We drop the idea of execution phases related directly to the input cycle of the processor and instead use clocks connected with each component. When an instruction enters a component, the corresponding clock is set up; the setup value k_i,max is equal to the delay of the component. The clock related to a given component counts down to zero during the execution of an instruction. When the clock reaches zero, the instruction can move to the next component (if the next one is idle) or is blocked (if the next one is busy). The model presented here is more flexible than the one in [7], and very useful when calculating performance parameters other than the input cycle. The state of the model is represented by a set of variables:
(x_i, k_i, y_i, s_i), i = 1, ..., N,

where:
- x_i are connected with the components of the processor and describe their idle or busy states; x_i takes the value 1 when there is an instruction in the corresponding component and 0 when the component is empty;
- k_i represents the execution phase of the instruction in x_i and is described by the related clock; k_i,max is the delay of the component;
- y_i describes the type of the instruction in x_i;
- s_i describes the dependency distance of the instruction in x_i from a previous instruction (i.e. an instruction in other components, ahead in the pipeline); e.g. 0 means the instruction depends on the previous instruction, 1 that it depends on the second previous instruction (see [6]).
The state of the model defined above allows the designer to observe the changes that occur in the asynchronous processor. The processor moves from one state to another when any instruction in the processor moves from one component to another.
The number of variables x_i and k_i in the detailed model may vary when a model with a different number of components is considered. When the delays connected with the components are different, the values of k_i,max are different as well. In every case a separate Semi-Markov process needs to be created.
3 Creation of Semi-Markov process
The analysis of the instruction trace can be done independently of the model of the processor (see [6], [7], [8]). The sequences of instructions are counted and the sequence probabilities are calculated. Based on the processor model and the instruction sequences, a Semi-Markov process is constructed; this process describes the behaviour of the asynchronous processor. Next, the steady-state probabilities are calculated and the sojourn time of each state is estimated. Calculation of the performance parameters consists of two parts:
• The matrices P and S, which define the Semi-Markov process (see [5]), are created (P is the probability matrix of the embedded Markov chain and S describes the sojourn-time probabilities of the process).
• Based on the steady-state distribution, the performance parameters of the asynchronous processor are calculated:
  - the particular states related to a given performance parameter are identified (e.g. states in which blocking of the fetch component occurs);
  - the probability of these states is calculated;
  - the performance parameters are calculated.
For example, given the sojourn time of every state and the steady-state distribution of the Semi-Markov process, we can calculate the average cycle of the asynchronous processor.
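The two-part calculation above can be sketched as follows. The matrix P and the sojourn times T are toy values (not the paper's), and the steady-state distribution pi of the embedded chain is obtained here by simple power iteration:

```python
def stationary(P, iters=1000):
    """Steady-state distribution pi of the embedded Markov chain (pi P = pi),
    found by power iteration (assumes the chain is ergodic)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def long_run_probabilities(P, T):
    """phi_j = pi_j * T_j / sum_i pi_i * T_i: the fraction of time the
    Semi-Markov process spends in state j (T holds mean sojourn times)."""
    pi = stationary(P)
    weighted = [p * t for p, t in zip(pi, T)]
    total = sum(weighted)
    return [w / total for w in weighted]

# toy 3-state embedded chain and sojourn times
P = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [1.0, 0.0, 0.0]]
T = [5.0, 2.0, 8.0]
print(long_run_probabilities(P, T))
```

Summing phi_j over the states in which, e.g., the fetch component is blocked then gives the fraction of time lost to that blocking.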
4 Case study
A simple model of an asynchronous processor is considered in this section. There are four components, representing Instruction Fetch (IF), Instruction Decoder (ID), Functional Unit X (FU_X, which executes instructions of type x) and Functional Unit Y (FU_Y, which executes instructions of type y) (see [7], [8]). The component IF represents the fetch stage and ID the decode stage of the processor. Functional unit X (Y) corresponds to the execution stages for instructions of type x (y).
The configuration of the processor is: m = 1, n = 2, p = 2, n1,max = 1, n2,max = 1,

d(IF) = 3 units of time, d(ID) = 2 units of time, d(FU_X) = 5 units of time, d(FU_Y) = 8 units of time,

where d(·) denotes the delay of a component.

4.1 Definition of the state in the considered model

The state of the model for the given delays is represented by the variables: x1, x2, x3, x4, k1, k2, k3, k4
where:
- x1, x2 are connected with the fetch and decode stages of the processor. They can take the value: x, when there is an instruction of type x; y, when there is an instruction of type y; 0, when there is no instruction in the component. In order to reduce the state space we combined, for fetch and decode, the type of instruction represented by the variable y_i with the occupation of the component represented by x_i into the single variable x_i.
- x3, x4 are connected with FU_X and FU_Y respectively. They take the value 1 when there is an instruction in the corresponding component and 0 when the component is empty.
- k_i represents the execution phase in x_i, described by the related clock; k_i,max is the delay of the component.
The presented model is very simple, but useful for explaining the main idea; much more complicated models can be constructed as well. For these delays the detailed model of the asynchronous processor will be created. Different delays of components can introduce blocking or starving (see [7]) in the interactions between them. In our example the first two components (IF and ID) can suffer from blocking, and the instruction decoder ID can suffer from starving. The delays presented above were chosen in order to show a non-trivial relation between the components of the processor. In Tab. 1 an example of an instruction trace is presented. We assume that the asynchronous processor computes this trace in an infinite loop (i.e. after instruction i21 comes instruction i0, see Fig. 3). In Tab. 2 several cycles together with the related states of the processor are presented. The first column contains the states corresponding to the model of the processor in [7]; the second column contains the states of the processor for the model discussed in this paper.
The next eight columns present the variables that describe the state of the processor; the last column gives the sojourn time of the state.
instruction no.    i0  i1  i2  i3  i4  i5  i6  i7  i8  i9  i10
instruction type   y   y   y   x   y   x   x   y   x   y   x

instruction no.    i11 i12 i13 i14 i15 i16 i17 i18 i19 i20 i21
instruction type   y   x   x   x   y   y   y   x   x   x   x

Tab. 1. Example of an instruction trace
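The clock-based detailed model of section 2.2 can be exercised directly on this trace. The sketch below is illustrative code (not from the paper): it simulates the four components with the delays of the case study, starting from the configuration at the beginning of period c1 = 0 (i0 in fetch, i21 in decode, i20 in FU_X). A state change is recorded only when an instruction actually moves, and the first sojourn times it produces match those derived in the walkthrough of periods c1 = 0..3 below. The trace is reconstructed from Tab. 1 and is an assumption where the scan is ambiguous.

```python
import itertools

# Delays from the case study; 'x'/'y' keys double as FU_X / FU_Y delays.
DELAYS = {"IF": 3, "ID": 2, "x": 5, "y": 8}
# Instruction types i0..i21 (reconstructed from Tab. 1); the trace loops.
TRACE = list("yyyxyxxyxyxy" + "xxxyyyxxxx")

def simulate(n_states):
    """Sojourn times of the first n_states states, starting at period c1 = 0."""
    feed = itertools.cycle(TRACE[1:] + TRACE[:1])     # i1, i2, ..., i21, i0, ...
    IF = [TRACE[0], DELAYS["IF"]]                     # [instruction type, clock]
    ID = [TRACE[21], DELAYS["ID"]]
    FU = {"x": DELAYS["x"], "y": None}                # remaining clock, None = idle
    sojourns = []
    for _ in range(n_states):
        elapsed, moved = 0, False
        while not moved:
            # advance time to the next clock expiry (blocked stages sit at 0)
            running = [s[1] for s in (IF, ID) if s and s[1] > 0]
            running += [k for k in FU.values() if k]
            if not running:
                break                                 # nothing can advance
            dt = min(running)
            elapsed += dt
            for s in (IF, ID):
                if s and s[1] > 0:
                    s[1] -= dt
            for t in FU:
                if FU[t]:
                    FU[t] -= dt
            # a state change occurs only when an instruction actually moves
            for t in FU:
                if FU[t] == 0:                        # finished instruction leaves
                    FU[t], moved = None, True
            if ID and ID[1] == 0 and FU[ID[0]] is None:   # decode -> matching FU
                FU[ID[0]], ID, moved = DELAYS[ID[0]], None, True
            if IF and IF[1] == 0 and ID is None:          # fetch -> decode
                ID, IF, moved = [IF[0], DELAYS["ID"]], None, True
            if IF is None:                                # fetch next instruction
                IF = [next(feed), DELAYS["IF"]]
        sojourns.append(elapsed)
    return sojourns

print(simulate(7))   # → [5, 2, 1, 2, 5, 8, 2]
```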
state in    state in
old model   current model   x1  k1  x2  k2  x3  k3  x4  k4  Ti
0           0               y   3   x   2   1   5   0   0   5
1           1               y   3   y   2   1   5   0   0   2
2           2               y   1   0   0   1   3   1   8   1
–           3               y   3   y   2   1   2   1   7   2
–           4               y   1   y   0   0   0   1   5   5
3           5               x   3   y   2   0   0   1   8   8
4           6               x   3   x   2   0   0   1   8   2
–           7               x   1   0   0   1   5   1   6   1
Tab. 2. The states of the processor in the first cycles

In Fig. 3 there are four charts (IF, ID, FU_X, FU_Y) which present the instruction flow in the components of the processor, with the time scale at the bottom of the picture. Above it, two numbering schemes of periods are presented: the variables c0 and c1 represent the numbers connected with the sojourn times of states in the old model ([7], [8]) and in the current model, respectively. The instructions computed in the components are shown as rectangles with instruction numbers; e.g. i3 means the third instruction. Each rectangle denotes the execution period of the instruction in a component. Note that a rectangle does not denote the occupation of the component by the instruction: for example, rectangle i3 in the IF chart shows only the execution of instruction i3 in fetch (the execution lasts 3 units of time); afterwards this instruction stays in fetch for the next 2 units due to blocking of fetch by the instruction decoder. Blocking is denoted by B and starving by S. An input cycle is defined as the period between the starts of execution of two consecutive instructions in fetch. The length of each input cycle can be different and is determined by the timing relations between the components.
Fig. 3. Instructions computed by the asynchronous processor

The identification of states and of the instruction sequences related to them is similar to the identification described in [8]. The model presented here does not work with the input cycle directly; instead, we have periods c1 (see Fig. 3), which represent the durations of states. At the beginning of period c1 = 0 we have:
- instruction i0 enters fetch → x1 = y, k1 = k1,max = 3
- instruction i21 enters decode → x2 = x, k2 = k2,max = 2
- instruction i20 enters functional unit X → x3 = 1, k3 = k3,max = 5
- no instruction in functional unit Y → x4 = 0, k4 = 0
Execution in the instruction decoder is completed (after 2 units of time), but the instruction is blocked because functional unit X is still busy. After 5 units of time the execution of instruction i20 is completed and the processor goes to the next state. At the beginning of period c1 = 1 we have:
- instruction i1 enters fetch
→ x1 = y, k1 = k1,max = 3
- instruction i0 enters decode → x2 = y, k2 = k2,max = 2
- instruction i21 enters functional unit X → x3 = 1, k3 = k3,max = 5
- no instruction in functional unit Y → x4 = 0, k4 = 0
Execution in the instruction decoder is completed after 2 units of time. Then the instruction moves to the next component and the processor enters a new state. At the beginning of period c1 = 2 we have:
- instruction i1 is executed in fetch
→ x1 = y, k1 = 1
- no instruction in decode → x2 = 0, k2 = 0
- instruction i21 is executed in functional unit X → x3 = 1, k3 = 3
- instruction i0 enters functional unit Y → x4 = 1, k4 = k4,max = 8
Execution in instruction fetch is completed (after 1 unit of time) and the instruction moves to the next component; the processor enters the next state. At the beginning of period c1 = 3 we have:
- instruction i2 enters fetch
→ x1 = y, k1 = k1,max = 3
- instruction i1 enters decode → x2 = y, k2 = k2,max = 2
- instruction i21 is executed in functional unit X → x3 = 1, k3 = 2
- instruction i0 is executed in functional unit Y → x4 = 1, k4 = 7
Execution of the instruction in functional unit X is completed (after 2 units of time) and the processor goes to a new state. The matrix P of the embedded Markov chain is created according to the method described in [7]. In the next step the steady-state distribution π of this chain is calculated. Then the long-time behaviour probabilities φ of the Semi-Markov chain can be computed according to equation (1) (see also [5]):

φ_j = (π_j T_j) / (Σ_i π_i T_i)    (1)

where:
φ_j – long-time behaviour probability of the Semi-Markov chain,
π_j – steady-state probability of the embedded Markov chain,
T_j – average sojourn time in state j.

In Tab. 3 the long-time behaviour probability distribution of the Semi-Markov process is presented (for the considered model and for the given instruction trace).

φ0      φ1      φ2      φ3      φ4
0.0983  0.0393  0.0196  0.0393  0.0983

φ5      φ6      φ7      φ8      φ9
0.15573 0.0393  0.0196  0.0786  0.0196
Dₙ = Σₖ₌₁ⁿ 1/(2k−1), for n > 0.    (2)

As one can see, the solution of the Jeep Problem "has something to do with" a special sum called the harmonic number, Hₙ = Σₖ₌₁ⁿ 1/k. The solutions for successive n are 0, 1, 4/3, 23/15, 176/105, … In a more general case, not constrained to a natural n, when n + a is given as the quantity of fuel (with 0 < a < 1), the solution is
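The connection with harmonic numbers is the standard identity relating the odd-harmonic partial sums to H_n; the short derivation below is a textbook fact, added here for completeness:

```latex
\sum_{k=1}^{n}\frac{1}{2k-1}
  \;=\; \sum_{m=1}^{2n}\frac{1}{m}-\sum_{k=1}^{n}\frac{1}{2k}
  \;=\; H_{2n}-\tfrac{1}{2}H_{n},
\qquad\text{e.g. } n=2:\;\; 1+\tfrac{1}{3}
  \;=\; \tfrac{25}{12}-\tfrac{3}{4}\;=\;\tfrac{4}{3}.
```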
D₍ₙ₊ₐ₎ = a/(2n+1) + ½[ψ₀(n + ½) + γ + 2 ln 2], for n ≥ 0,    (3)
where γ is the Euler-Mascheroni constant¹ and ψ₀(z) the polygamma function². The ½[·] term is another way of producing the value of H₂ₙ − ½Hₙ, [6]. This paper describes an attempt to solve the Jeep Problem by means of a genetic algorithm. The author was trying to answer two main questions: 1. can a genetic algorithm find the best strategy for the jeep? — which can be understood roughly as finding the correct sequence of actions; 2. can a genetic algorithm find the optimal quantities going with the best strategy? — i.e. finding the right
¹ The Euler-Mascheroni constant is defined by the series γ = lim_{n→∞} (Σ_{k=1}^{n} 1/k − ln n) = lim_{n→∞} (H_n − ln n) ≈ 0.577215664901532860606512090082402431042.... It is not known whether this constant is irrational, let alone transcendental [5].
² The polygamma function ψ_n(z) is given by the (n+1)-st derivative of the logarithm of the gamma function Γ(z): ψ_n(z) = d^{n+1}/dz^{n+1} ln Γ(z), whereas Γ(z) is the extension of the factorial to complex and real arguments, Γ(z) = ∫_0^∞ t^{z−1} e^{−t} dt. See [7].
optimal values of distances to travel forth and back, and the optimal quantities of fuel to load or unload. The author executed experiments with both binary- and real-coded GAs. Details of the genetic operations as well as the results are described in the paper.
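The harmonic-number connection described above is easy to verify numerically. The sketch below uses the standard closed form of the jeep's maximum distance, Σ_{k=1}^{n} 1/(2k − 1), and exact rational arithmetic; the function names are illustrative:

```python
from fractions import Fraction

def harmonic(n):
    """Harmonic number H_n = sum_{k=1}^{n} 1/k."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

def jeep_distance(n):
    """Maximum distance reachable on n units of fuel:
    sum_{k=1}^{n} 1/(2k - 1), which equals H_{2n} - H_n / 2."""
    return sum(Fraction(1, 2 * k - 1) for k in range(1, n + 1))

# the identity behind the harmonic-number connection holds exactly
assert all(jeep_distance(n) == harmonic(2 * n) - harmonic(n) / 2
           for n in range(1, 6))
```

Evaluating `jeep_distance` for n = 1, 2, 3, 4 reproduces the successive solutions 1, 4/3, 23/15, 176/105.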
2 Representation of the jeep strategy - chromosome coding

In the Jeep Problem the strategy can be thought of as a sequence of actions which describe the behavior of the jeep throughout the whole journey. A single action in the strategy can be represented by the pair (type of action, value), where the type of action is one of: → (go forth), ← (go back), ↑ (take fuel), ↓ (leave fuel); whereas the value denotes the length of travel or the quantity of fuel. Figure 2 shows the phenotypic representation of exemplary chromosomes, being actually the solutions for n = 2 and n = 3.
strategy for n = 2: ((↑,1), (→,1/3), (↓,1/3), (←,1/3), (↑,1), (→,1/3), (↑,1/3), (→,1)),
strategy for n = 3: ((↑,1), (→,1/5), (↓,3/5), (←,1/5), (↑,1), (→,1/5), (↑,1/5), (→,1/3), (↓,1/3), (←,1/3), (↑,1/5), (←,1/5), (↑,1), (→,1/5), (↑,1/5), (→,1/3), (↑,1/3), (→,1)).

In the binary-coded version each value was encoded on a fixed number of bits, which restricts the values to a discrete set such as 1/15, 2/15, ..., 15/15. Figure 4 illustrates the binary coding of chromosomes.
Actually, 4 would also be the next possible bit length for the value part, in case of n = 2.
As regards the real-coded chromosomes, their form is quite straightforward and corresponds strictly to the phenotypic representation, see figure 5. Genes for the type of action and for the value interlace one after another. The type of action can be one of {0, 1, 2, 3, 4}, whereas the value can be an arbitrary real number from the [0; 1] interval. For real-coded chromosomes it was necessary to introduce a small tolerance term ε when executing the journey of an artificial jeep. This was to allow the jeep to take fuel (↑ action) from a certain neighborhood around its current position⁵ x_k, i.e. from the range (x_k − ε, x_k + ε). Obviously, this is not needed for the binary version, which makes the jeep travel only to discrete positions.

binary-coded chromosome: (0,0,1,1,1,0, 1,0,0,1,0,1, 1,0,0,1,0,1, 0,0,1,1,0,1, 0,1,0,0,0,1, 1,1,0,1,0,0, 1,0,0,0,1,1, 1,0,0,1,0,0)
corresponding strategy: ((→,14/15), (↑,5/15), (↑,5/15), (→,13/15), (←,1/15), (↓,4/15), (↑,3/15), (↑,4/15))
Fig. 4. Exemplary binary-coded chromosome representing a jeep strategy. Bits in bold represent the types of actions: 0,0 - go forth; 0,1 - go back; 1,0 - take fuel; 1,1 - leave fuel. In the example, 4 bits were applied for each value part of an action, providing the granularity of 1/15.
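Decoding a binary-coded chromosome as described in the caption of Fig. 4 can be sketched as follows; the 2 + 4 bit layout and the scaling by 1/15 follow the figure, while the function and dictionary names are illustrative:

```python
# mapping of the two leading type-of-action bits, as in the Fig. 4 caption
ACTIONS = {(0, 0): 'go forth', (0, 1): 'go back',
           (1, 0): 'take fuel', (1, 1): 'leave fuel'}

def decode(bits, value_bits=4):
    """Decode a binary chromosome: each gene is 2 type bits followed by
    value_bits bits; the value integer is scaled by 1/(2**value_bits - 1),
    i.e. a granularity of 1/15 for 4 bits."""
    gene = 2 + value_bits
    strategy = []
    for i in range(0, len(bits), gene):
        action = ACTIONS[tuple(bits[i:i + 2])]
        value = int(''.join(map(str, bits[i + 2:i + gene])), 2)
        strategy.append((action, value / (2 ** value_bits - 1)))
    return strategy
```

Applied to the bit string above, the first gene 0,0,1,1,1,0 decodes to (go forth, 14/15).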
real-coded chromosome: (0, 0.233, 1, 0.958, 3, 0.192, 2, 0.55, 3, 0.01, 0, 0.78883, 1, 0.5, 2, 0.119),
corresponding strategy: ((→, 0.233), ...).    (8)
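Executing a strategy of (action, value) pairs with the ε tolerance can be simulated along these lines. The action names, the cache bookkeeping and the default quantities are assumptions for illustration, not the author's implementation:

```python
def run_jeep(strategy, tank_capacity=1.0, fuel_at_base=2.0, eps=0.02):
    """Execute a jeep strategy and return the final position.
    Fuel may be taken from any cache within eps of the current position."""
    pos, tank = 0.0, 0.0
    caches = {0.0: fuel_at_base}  # fuel containers along the route
    for action, value in strategy:
        if action == 'forth':
            d = min(value, tank)          # cannot travel further than fuel allows
            pos += d; tank -= d
        elif action == 'back':
            d = min(value, tank)
            pos -= d; tank -= d
        elif action == 'take':
            for c in list(caches):        # eps-neighborhood lookup
                if abs(c - pos) <= eps:
                    amount = min(value, caches[c], tank_capacity - tank)
                    caches[c] -= amount; tank += amount
                    break
        elif action == 'leave':
            amount = min(value, tank)
            tank -= amount
            caches[pos] = caches.get(pos, 0.0) + amount
    return pos

# the n = 2 solution from Section 2 reaches 4/3
n2_strategy = [('take', 1), ('forth', 1/3), ('leave', 1/3), ('back', 1/3),
               ('take', 1), ('forth', 1/3), ('take', 1/3), ('forth', 1)]
```

Here 'take', 'forth', 'leave' and 'back' stand in for the ↑, →, ↓ and ← actions of the phenotypic representation.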
3 Crossover, mutation, fitness function, penalty terms

For the crossover operation, two-point crossover was applied. Two-point crossover is often recommended as having reduced susceptibility to the so-called "positional bias", "hitchhikers" and the "endpoint effect" [4, pp. 171-173], and as proving most effective in typical GAs [3, 2]. In binary-coded chromosomes a slight simplification was applied, such that the randomized crossing points could occur anywhere but inside a type-of-action bit string. Figure 6 illustrates the version of two-point crossover that was applied to the binary-coded GA.

[Figure 6: two parent bit strings and the two offspring obtained by exchanging the segment between the two crossing points; frames mark the unbreakable type-of-action parts.]

Fig. 6. Illustration of two-point crossover applied to the binary-coded GA for the Jeep Problem. Type-of-action parts were, for simplification, not breakable by crossover, which is illustrated by frames around them.

As regards the mutation operation, in the binary-coded GA it was a usual flip-flop on a bit picked up at random. In the real-coded GA, the mutation was of greater importance (applied mainly to the value genes) and was done according to the following equation⁷:

A'_j = A_j − (A_j − l)·(1 − 2s)·(1 − g_current/g_max)^β,  for 0 ≤ s < 0.5;
A'_j = A_j + (u − A_j)·(2s − 1)·(1 − g_current/g_max)^β,  for 0.5 ≤ s < 1,    (9)
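The two genetic operators described above can be sketched as follows. The crossover keeps cut points out of the type-of-action bit groups, as in Fig. 6; the mutation function is one plausible reading of equation (9) with its narrowing generation factor. All names are illustrative:

```python
import random

def two_point_crossover(p1, p2, action_bits=2, value_bits=4):
    """Two-point crossover; cut points may fall anywhere except inside
    a type-of-action bit group (the simplification of Fig. 6)."""
    gene = action_bits + value_bits
    # admissible cut positions: gene boundaries and positions inside value parts
    cuts = [i for i in range(1, len(p1))
            if i % gene == 0 or i % gene > action_bits]
    a, b = sorted(random.sample(cuts, 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

def mutate_value(a_j, g_cur, g_max, l=0.0, u=1.0, beta=2):
    """Non-uniform mutation of a real value gene: the neighborhood around
    a_j narrows as evolution progresses, a plausible reading of eq. (9)."""
    s = random.random()
    factor = (1.0 - g_cur / g_max) ** beta
    if s < 0.5:
        return a_j - (a_j - l) * (1.0 - 2.0 * s) * factor  # move toward l
    return a_j + (u - a_j) * (2.0 * s - 1.0) * factor      # move toward u
```

Late in the run (g_cur close to g_max) the factor approaches zero, so mutated values stay close to their parents, which is exactly the exploration-to-exploitation shift described in the text.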
⁷ This refers only to the value genes. The type-of-action genes were simply flip-flopped, since they are discrete.
where A_j denotes the so-far value of the gene under mutation, selected at the random location j; A'_j is the new value after mutation; s is a random number of uniform distribution taken from the [0; 1) interval; l and u are respectively the lower and upper bound of the possible interval for a gene; g_current stands for the index of the current GA generation and g_max for the maximum value of such index; while β is a heuristic exponent, usually chosen to be 2, which shapes how the generation factor (1 − g_current/g_max) narrows the mutation neighborhood around A_j as the evolution progresses⁸. This equation provides aggressive exploration early in the evolution and concentrated, sharpened exploitation in later stages [2]. In the experiments, three probabilities were used:
p_mutation, p_typeOfActionMutation, p_valueMutation. Therefore, setting those to e.g. p_mutation = 0.3, p_typeOfActionMutation = 0.1, p_valueMutation = 0.7 can be understood so that around 30% of the whole population is selected to undergo mutation; then, from within those 30%, around 10% of the genes representing the type of action get mutated and around 70% of the genes representing the value get mutated.

The selection operation was fitness-proportionate, i.e. typical roulette-wheel selection was applied. It would also be possible to use the rank selection operation, which is often recommended for keeping good population diversity in early stages and for preventing premature convergence on local optima; however, a similar effect was achieved by the mutation formula (9)⁹. The fitness function was
f = (x_{2n+2} − c · Σ_{k=2}^{2n+2} |x_k − x_{k−1}|) / (κ · τ · ω),    (10)

where x_{2n+2} is the final position of the jeep after the last step, the term Σ_{k=2}^{2n+2} |x_k − x_{k−1}| calculates the total distance the jeep has traveled throughout its "lifetime" (weighted by a small constant c), and κ, τ, ω represent exponential penalty terms (their meaning will be explained later). The additional payment term, the total distance of the jeep journey, can be equally regarded as the quantity of fuel the jeep has consumed in its lifetime. In other words, being a lazy jeep did not pay.
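The roulette-wheel selection mentioned above can be sketched as follows (a minimal version; individuals with zero fitness are simply never picked):

```python
import random

def roulette_select(population, fitnesses):
    """Fitness-proportionate (roulette-wheel) selection of one individual."""
    total = sum(fitnesses)
    pick = random.uniform(0.0, total)   # spin the wheel
    acc = 0.0
    for individual, f in zip(population, fitnesses):
        acc += f
        if f > 0 and pick <= acc:       # the pointer landed in this slice
            return individual
    return population[-1]               # numerical safety net
```

Each individual occupies a slice of the wheel proportional to its fitness, so fitter jeeps are selected more often while weaker ones still retain a chance.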
⁸ If β = 1, the narrowing term (1 − g_current/g_max)^β would be linear.
⁹ In some experiments the mutation probabilities themselves were also being extinguished along with the evolution progress, using the (1 − g_current/g_max)^β factor.
The penalty terms were exponential, with positive arguments starting from 0, so that when a certain jeep did nothing wrong its penalty was equal to 1; otherwise the penalty term increased fast, punishing the jeep severely. The meaning of each penalty term was as follows: κ was the penalty for the number of actions in the strategy whose effective value was 0, e.g. if the jeep tried to make a traveling action but its tank was empty, or if the jeep tried to load some fuel but there was no fuel available at its position; τ was the penalty for going to negative positions (behind the base); ω was the penalty necessary only in the case of the real-coded GA, and it punished the jeeps that in many steps were traveling very short distances - distances within the range of the tolerance ε. Without ω it was quite interesting to discover that some jeeps in the population had developed a kind of cheating strategy consisting of many short moves, which allowed them to avoid the penalty κ. For example, if ε = 0.02, the starting sequence (→, 0.019), (↑, 1.0) would make the second action successful, since the fuel containers at the base are still accessible within the ε tolerance.
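The exponential penalty terms could be realized along these lines; this is a sketch, and the per-term violation counts and the strength parameter are assumptions, not the author's exact formulation:

```python
import math

def exp_penalty(violations, strength=1.0):
    """Exponential penalty term: equals 1 when nothing went wrong and
    grows fast with the number (or magnitude) of violations."""
    return math.exp(strength * violations)

def combined_penalty(zero_actions, negative_depth, short_moves):
    """kappa: zero-effect actions, tau: travel behind the base,
    omega: eps-short moves (real-coded GA only)."""
    kappa = exp_penalty(zero_actions)
    tau = exp_penalty(negative_depth)
    omega = exp_penalty(short_moves)
    return kappa * tau * omega
```

Since the combined product divides the fitness in equation (10), a blameless jeep is left untouched (penalty 1), while each additional violation shrinks its fitness multiplicatively.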
4 Building blocks

Several first experiments with the GA showed that the populations contained too many chromosomes without the right interlace of the moving actions {→, ←}. Much computation time was lost because of this, and it took many generations of the evolution to eliminate such chromosomes. E.g., in a chromosome the sequence (↑, 0.3), (↑, 0.5) could just as well be replaced with (↑, 0.8). Because of this flaw, the author decided to introduce larger building blocks. One may notice that the optimal strategy can always be built from these three blocks:
• take fuel and go forth - (↑, some value), (→, some value),
• leave fuel and go back - (↓, some value), (←