Abnormal Behavior Modeling and Detection: Some of the Approaches Developed in the CAnADA Project
The importance of intelligent tools that extract abnormal (suspect) behaviors from video surveillance data in real time has grown in recent years, as security teams need access to informative data for security purposes. This is all the more important with the explosion of video surveillance systems in environments such as airports, malls, and subway stations. The objective is to develop a system that detects abnormal events in real time from video surveillance data in order to improve security.
For example, the security team may ask for the video segments in which a certain event (e.g., a person collapsing) happened during a given week, or ask questions such as: What is the average frequency of abnormal events during a day, considering all video streams? What is the relationship between crowd density and abnormal events? Which video streams contain frequent abnormal events? Which videos are most similar to a given one (a segment showing a specific abnormal event)?
So-called abnormal events can arise from individual or collective behaviors. In the first case, a given person performs a set of actions that is known to be potentially dangerous. In the second case, that set of actions is conducted by a significantly large group of people (as typically happens in crowded places).
As a consequence, the methods used to automatically detect abnormal events from video streams must fit both cases. This requirement led us to a dilemma: determining the activity of a particular individual requires focusing on that person during a sufficiently long period of time in order to extract a relatively complex feature vector. While this is possible for a relatively small number of "targets" (say, fewer than a dozen), the computational cost of doing so for a crowd (about a hundred people) is still too high to be achieved in real time. Thus, within the scope of the CAnADA project, we decided to develop two different approaches. The first, referred to as the "global analysis", is designed to deal with crowded scene analysis, while the second, named the "local analysis", targets the analysis of individual actions or behaviors. The final decision-making process can then take into account the outputs of these two complementary methods, as illustrated in the accompanying figure.
The next sections detail these two methods.
1. Global Analysis
1.1. Approach overview
Our approach is built on a framework operating at three levels of features:
Low level: low-level features extracted directly from motion (interest points, blobs). We use a mixture of Gaussians to detect the foreground in low-density (sparsely crowded) areas, and optical flow vectors computed on points of interest in high-density (highly crowded) areas.
Intermediate level: features that carry more semantics than the low-level ones and are computed directly from them, such as crowd density, trajectory, velocity, direction, acceleration, and energy. For example, the density is the ratio of blob area in the scene. The intermediate-level features are computed on the low-level features (blobs, interest points) and stored in basic structures.
High level: features with more semantics than the intermediate level, sufficient to make a decision. At this level we find the normal and abnormal events. The first level of features is generic and not domain dependent, but it is semantically weak. The second level is also independent of the application domain and carries more semantics than the first. The third level is completely dependent on the application domain, with the high-level semantics necessary to make contextual decisions. In our case, the high-level features concern the normality or abnormality of the event in the video. Our approach extracts abnormal events (high-level features) on the basis of the intermediate-level features, themselves based on the low-level features.
More precisely, the approach starts by calculating the motion heat map over a period of time to extract the main regions of motion activity. The use of the heat map image improves the quality of the results and reduces the processing space. Second, points of interest are extracted in the hot regions of the scene. Third, optical flow is computed on these points of interest, restricted to the hot areas of the scene. Fourth, variations of motion are estimated to discriminate potential abnormal events. We assume a crowded environment with no restriction on the number of people.
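To make this processing flow concrete, the following minimal Python sketch outlines these steps with OpenCV. It is an illustrative skeleton rather than the project's implementation: the video path, the warm-up length, and the commented `compute_features`/`is_abnormal` calls are hypothetical placeholders for the feature and decision stages detailed later.

```python
import cv2
import numpy as np

def analyze_stream(path, warmup=500):
    """Illustrative pipeline: heat map -> interest points -> optical flow."""
    cap = cv2.VideoCapture(path)
    mog = cv2.createBackgroundSubtractorMOG2()
    heat = None

    # 1) Accumulate the motion heat map over a warm-up period.
    for _ in range(warmup):
        ok, frame = cap.read()
        if not ok:
            return
        fg = mog.apply(frame)                       # binary foreground blobs
        heat = fg.astype(np.float32) if heat is None else heat + fg
    roi = (heat > heat.mean()).astype(np.uint8) * 255   # mask of "hot" regions

    prev_gray, points = None, None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is not None and points is not None:
            # 3) Optical flow on the interest points (pyramidal Lucas-Kanade).
            nxt, st, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, points, None)
            flow = nxt[st.ravel() == 1] - points[st.ravel() == 1]
            # 4) Motion-variation features and the decision stage (see below):
            # features = compute_features(flow); abnormal = is_abnormal(features)
        # 2) Harris interest points restricted to the hot regions.
        points = cv2.goodFeaturesToTrack(gray, 400, 0.01, 5,
                                         mask=roi, useHarrisDetector=True)
        prev_gray = gray
```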
1.2. Related Work
In state-of-the-art approaches, various features have been proposed, depending on the target abnormal event. Furthermore, learning, adaptive, and incremental processes have been studied, generally suited to specific categories of abnormal events. No solution deals with all abnormal events. Two categories of approaches exist: the first is related to crowd flow analysis, and the second to abnormal event detection in crowd flows.
Works in the first category estimate crowd density (Lin, Chen & Chao, 2001; Ma, Li, Huang & Tian, 2004; Rahmalan, Nixon & Carter, 2006). Some describe a technique for the automatic estimation of crowd density, a part of the problem of automatic crowd monitoring, using texture information based on grey-level transition probabilities in digitized images (Marana, Velastin, Costa & Lotufo, 1997): crowd density feature vectors are extracted from such images and fed to a self-organizing neural network responsible for the density estimation. Marana, Da Fontoura Costa, Lotufo & Velastin (1999) propose a crowd density estimation technique based on the Minkowski fractal dimension; fractal dimension has been widely used to characterize data texture in a large number of physical and biological sciences. Finally, Marana et al. (2005) present a technique for real-time crowd density estimation based on the texture of crowd images: the current image of an input sequence is classified into a crowd density class, and the classification is then corrected by a low-pass filter based on the crowd density classes of the last n images of the sequence.
Schlögl, Wachmann, Kropatsch & Bischof (2001) address the problem of evaluating people counting systems. The relation between the real number of people and the output of the counting system motivates the computation of the maximum number of people as a scenario- and geometry-specific quantity supporting the evaluation of the system output. Based on the camera field of view, the authors determine the maximum number of detectable people using a basic people model. This maximum is computed for an overhead camera and for an arbitrary oblique camera orientation, which is the typical geometry of standard surveillance applications.
Lin, Chen & Chao (2001) go one step further and estimate the number of people in crowded scenes with a complex background from a single image, which yields more valuable information than crowd density alone. There are two major steps in this system: recognition of head-like contours and estimation of crowd size. First, the Haar wavelet transform (HWT) is used to extract the featured areas of head-like contours, and a support vector machine (SVM) is then used to classify these featured areas as head contours or not. Next, a perspective transform technique is used to estimate the crowd size more accurately.
These methods are based on textures and motion area ratio and provide static analysis for crowd surveillance, but they do not detect abnormal situations. There are also optical-flow-based techniques (Boghossian & Velastin, 1999; Davies, Yin & Velastin, 1995) to detect stationary crowds, and tracking techniques using multiple cameras (Cupillard, Avanzi, Bremond & Thonnat, 2004).
The second category detects abnormal events in crowd flows. The general approach consists of modeling normal behaviors and then estimating the deviations between the normal behavior model and the observed behaviors; these deviations are labeled as abnormal. The principle is to exploit the fact that data on normal behaviors are generally available, whereas data on abnormal behaviors are generally scarce, which is why deviations from examples of normal behavior are used to characterize abnormality. In this category, Andrade, Blunsden & Fisher (2006a) combine HMMs and spectral clustering with principal component analysis to detect crowd emergency scenarios; the method was evaluated on simulated data. Ali & Shah (2007) use Lagrangian particle dynamics to detect flow instabilities; this method is efficient for the segmentation of high-density crowd flows (marathons, political events, etc.).
In the same category, but for less crowded scenes, Stauffer & Grimson (2000) propose a visual monitoring system that passively observes moving objects in a site, learns activity patterns from those observations, and detects unusual events that do not match common patterns using a hierarchical classification.
Boiman & Irani (2007) address the problem of detecting irregularities in visual data as a puzzle-construction process: regions of the observed data that can be composed from large contiguous chunks of a database are considered very likely, whereas regions that cannot be composed from the database (or only from small fragmented pieces) are regarded as unlikely or suspicious. Spatial and spatio-temporal appearance-based patch descriptors are generated for each query and database patch, and the inference problem is posed as inference in a probabilistic graphical model. Other work addresses abnormal event detection with incremental approaches (Davis & Bobick, 1997; Xiang & Gong, 2006; Ivanov, Stauffer & Grimson, 1999).
The state of the art on the indexing of abnormal events is relatively weak (e.g., Xie, Hu & Peng, 2004) compared with the work on abnormal event detection or crowd flow analysis presented in the previous paragraphs. Xie, Hu & Peng (2004) propose a semantics-based retrieval framework for traffic video sequences. In order to estimate the low-level motion data, a cluster tracking algorithm is developed, and a hierarchical self-organizing map is applied to learn the activity patterns. Using activity pattern analysis and semantic concept assignment, a set of activity models (representing the general and abnormal behaviors of vehicles) is generated and used as the indexing key for accessing video clips and individual vehicles at the semantic level. The proposed retrieval framework supports various queries, including query by keywords, query by sketch, and multiple-object queries. Other work, such as Wang & Huang (2002), develops a framework for human motion tracking and event detection intended for indexing; however, no direct contribution to indexing was realized.
Several state-of-the-art approaches address the detection of specific events from video. No approach perfectly handles its targeted events, so an approach dealing with a specific event remains a challenge in itself. TRECVID uses the term "challenge" instead of abnormal event or specific event detection from video; in TRECVID vocabulary, our application is thus a challenge. Generally, state-of-the-art approaches deal with specific events, and any adaptation to another application (another specific event detection) requires a complete redesign of the approach.
Our methodology of designing a framework composed of three levels of features contributes to making the approach more general and adaptable to various situations, mainly at the first and second feature levels. At the intermediate level, we compute a number of features to cover a large band of applications. This does not mean that our approach is generic and application independent; we are far from a generic approach. However, the organization of the features in this framework certainly contributes to handling several kinds of abnormal events, and our results are promising.
1.3. Proposed Approach
The hypothesis of our approach is to detect abnormal events in a crowded context from video surveillance data. Since person detection and tracking are difficult in crowded situations, there is no need to represent person tracking at the intermediate feature level; instead, we consider metrics deduced automatically from the low-level features. The main idea is to study the general aspect of motion, and more particularly sudden motion variations, instead of tracking subjects one by one. Tracking techniques in crowded scenes do not achieve good results, not to mention that most methods do not take into account factors such as density, direction, and velocity.
The proposed approach is composed of several steps. First, the motion heat map is extracted; the heat map represents motion intensities, hot areas corresponding to high motion intensity and cold areas to low motion intensity. Second, Harris points of interest are extracted in the hot regions of the scene. In the simplest case, applied to well-delimited areas, we consider a binary heat map, white (movement) and black (no movement); the points of interest are then extracted in the white regions. Third, blobs are extracted using Gaussian mixtures. Fourth, optical flow is computed on the points of interest. Fifth, several features are computed, including density, velocity, and direction. Sixth, we propose a high-level feature, called entropy, that classifies events as abnormal or normal on the basis of the intermediate-level features computed in the previous step.
The first through fifth steps do not depend on a specific application domain; they concern the extraction of the low and intermediate feature levels. The sixth step is dependent on the application domain: it deals with the challenge of detecting abnormal events from video. The majority of the features are extracted in each frame.
1.4. Low-Level Features
1.4.1 - Motion heat map
A heat map is a graphical representation of data in which the values taken by a variable over a two-dimensional map are represented as colors. The motion heat map is a 2D histogram indicating the main regions of motion activity. This histogram is built by accumulating the binary blobs of moving objects extracted by background subtraction. Let H denote the heat map and I_t the binary foreground intensity of frame t. H is defined as:

$$H(i, j) = \sum_{t=1}^{n} I_t(i, j)$$

where n is the number of accumulated frames, and i and j are the coordinates (row and column) of pixel (i, j) of the frame.
The obtained map is used as a mask to define the region of interest (RoI) for the next steps of the method. The use of the heat map improves the quality of the results and reduces the processing time, which is an important factor for real-time applications. Figure 2 shows an example of the heat map associated with motion at an escalator exit. The results are more significant when the video duration is long. In practice, even for the same place, the properties of abnormal events may vary depending on the context (day/night, indoor/outdoor, normal/peak time, etc.), so we build a motion heat map for each set of conditions. If we consider that abnormal events happen when crowd density is high, there is no need to analyze the whole scene, in particular regions with little or no motion; the approach therefore focuses on the regions where motion density is high. The threshold on the density level is contextual information.
Our approach can be seen as an extension of the preprocessing step of Andrade, Blunsden & Fisher (2006b), where optical flow is limited to the foreground. The extension adds sensitivity to motion intensity: the use of the heat map makes our approach more efficient for real-time processing.
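As an illustration of this step, a motion heat map can be accumulated from the binary foreground masks produced by a mixture-of-Gaussians background subtractor. The sketch below assumes OpenCV is available; the file name and the 10%-of-maximum binarization threshold are illustrative choices, not values from the project.

```python
import cv2
import numpy as np

def motion_heat_map(video_path, n_frames=1000):
    """Accumulate binary foreground blobs: H(i, j) = sum_t I_t(i, j)."""
    cap = cv2.VideoCapture(video_path)
    mog = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
    heat = None
    for _ in range(n_frames):
        ok, frame = cap.read()
        if not ok:
            break
        fg = (mog.apply(frame) > 0).astype(np.float32)   # I_t(i, j): 0 or 1
        heat = fg if heat is None else heat + fg
    cap.release()
    return heat

heat = motion_heat_map("escalator_exit.avi")             # hypothetical file name
roi_mask = ((heat / heat.max()) > 0.1).astype(np.uint8) * 255  # illustrative 10% cut
```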
1.4.2 - Points of interest
In this step, we start by extracting the foreground using Gaussian mixtures, and we then extract a set of points of interest from each input frame. A mask obtained from the heat map defines the regions of interest from which the points are extracted. We use Harris corners as points of interest (Harris & Stephens, 1988). The Harris corner detector is a well-known interest point detector, owing to its strong invariance to rotation, scale, illumination variation, and image noise (Schmid, Mohr & Bauckhage, 2000). It is based on the local auto-correlation function of the signal, which measures the local changes of the signal for patches shifted by a small amount in different directions. A discrete predecessor of the Harris detector was described by Moravec (1980), where the discreteness refers to the shifting of the patches.
Given a point (x, y) and a shift (Δx, Δy), the local auto-correlation function is defined as:

$$c(x, y) = \sum_{W} \left[ I(x_i, y_i) - I(x_i + \Delta x, y_i + \Delta y) \right]^2 \qquad (1)$$

where I(x_i, y_i) denotes the image function and (x_i, y_i) are the points in the smooth circular window W centered on (x, y). The shifted image is approximated by a Taylor expansion truncated to the first-order terms:

$$I(x_i + \Delta x, y_i + \Delta y) \approx I(x_i, y_i) + \left[ I_x(x_i, y_i) \;\; I_y(x_i, y_i) \right] \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} \qquad (2)$$

where I_x(·,·) and I_y(·,·) denote the partial derivatives in x and y, respectively. Substituting the right-hand side of Eq. 2 into Eq. 1 yields:

$$c(x, y) = \left[ \Delta x \;\; \Delta y \right] M(x, y) \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} \qquad (3)$$

where:

$$M(x, y) = \sum_{W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \qquad (4)$$
The 2×2 symmetric matrix M(x, y) captures the intensity structure of the local neighborhood. Let λ1 and λ2 be the eigenvalues of M(x, y); the eigenvalues form a rotationally invariant description. Three cases are to be considered:
If both λ1 and λ2 are small, the local auto-correlation function is flat (i.e., c(x, y) changes little in any direction), and the windowed image region has approximately constant intensity.
If one eigenvalue is high and the other low, the local auto-correlation function is ridge shaped: shifts along the ridge (i.e., along the edge) cause little change in c(x, y), while shifts in the orthogonal direction cause a significant change. This case indicates an edge.
If both λ1 and λ2 are high, the local auto-correlation function is sharply peaked, and shifts in any direction result in a significant increase of c(x, y). This case indicates a corner.
Figure 3 shows an example of the Harris corner detector. We consider that in video surveillance scenes, camera positions and lighting conditions provide a large number of corner features that can be easily captured and tracked.
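A brief sketch of masked Harris corner extraction with OpenCV follows; it assumes the `roi_mask` built from the heat map above, and the quality and spacing parameters are illustrative.

```python
import cv2

def harris_points(gray, roi_mask, max_corners=400):
    """Harris corners restricted to the hot regions of the heat map."""
    return cv2.goodFeaturesToTrack(
        gray,
        maxCorners=max_corners,
        qualityLevel=0.01,          # keep corners scoring >= 1% of the best corner
        minDistance=5,              # minimum spacing between corners, in pixels
        mask=roi_mask,              # only search inside the hot regions
        useHarrisDetector=True,
        k=0.04)                     # Harris sensitivity constant
```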
1.4.3 - Tracking points of interest
Once the points of interest are defined, we track them over the following frames using optical flow. For this, we use the Kanade-Lucas-Tomasi feature tracker (Lucas & Kanade, 1981). The Shi-Tomasi criterion (Shi & Tomasi, 1994) uses the smallest eigenvalue of an image block to select features that can be tracked reliably by the Lucas-Kanade tracking algorithm.
After matching features between frames, the result is a set of vectors (a tracking sketch is given after the definitions below):

$$V = \{ V_1, \ldots, V_n \mid V_i = (X_i, Y_i, M_i, \theta_i) \} \qquad (5)$$
where:
Xi: x coordinate of feature i,
Yi: y coordinate of feature i,
Mi: distance between feature i in frame t and its matched feature in frame t+1,
θi: direction of motion of feature i.
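The sketch below, again assuming OpenCV, tracks the Harris points with pyramidal Lucas-Kanade and assembles the vectors V_i = (X_i, Y_i, M_i, θ_i); the window size and pyramid depth are illustrative defaults.

```python
import cv2
import numpy as np

def track_features(prev_gray, gray, p0):
    """Pyramidal Lucas-Kanade tracking; returns (X, Y, M, theta) per feature."""
    p1, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, p0, None, winSize=(15, 15), maxLevel=2)
    ok = status.ravel() == 1
    p = p0[ok].reshape(-1, 2)               # positions in frame t
    q = p1[ok].reshape(-1, 2)               # matched positions in frame t+1
    d = q - p
    M = np.hypot(d[:, 0], d[:, 1])          # displacement magnitude M_i
    theta = np.arctan2(d[:, 1], d[:, 0])    # direction theta_i in (-pi, pi]
    return p[:, 0], p[:, 1], M, theta
```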
The videos shown in Figure 4 were provided by VISUAL TOOLS (a video surveillance company and partner of the project). They show the sets of vectors obtained by optical flow feature tracking in two different situations: image (a) shows an organized vector flow, while image (b) shows a cluttered vector flow due to a collapsing situation.
1.4.4 - Mi and θi
Figure 5 illustrates feature i in frame t, with coordinates P(P_{x_i}, P_{y_i}), and its matched feature in frame t+1, with coordinates Q(Q_{x_i}, Q_{y_i}). We can easily calculate the distance M_i between these two points using the Euclidean metric:

$$M_i = \sqrt{(Q_{x_i} - P_{x_i})^2 + (Q_{y_i} - P_{y_i})^2} \qquad (6)$$

The direction of motion (θ_m) of feature i can be calculated using the following trigonometric function:

$$\theta_m = \arctan\!\left( \frac{y_i}{x_i} \right) \qquad (7)$$

where x_i = Q_{x_i} − P_{x_i} and y_i = Q_{y_i} − P_{y_i}. However, there are several potential problems if we wish to calculate the motion direction using Eq. 7; for example:
Eq. 7 does not work over the complete range of angles from 0° to 360°: only angles between −90° and +90° are returned, and other angles are out of phase. For instance, consider the two points (x1 = 1, y1 = 1) and (x2 = −1, y2 = −1): using Eq. 7, the point (x2 = −1, y2 = −1) produces the same angle as (x1 = 1, y1 = 1), although the two points lie in different quadrants.
Points on the vertical axis have x_i = 0; as a result, computing y_i/x_i on a computer raises a division-by-zero exception.
In order to avoid these problems, we use the atan2(y_i, x_i) function, which takes both x_i and y_i as arguments. The accurate direction of motion θ_m of feature i can then be calculated as:

$$\theta_m = \operatorname{atan2}(y_i, x_i) = \begin{cases} \operatorname{sgn}(y_i)\,\theta & \text{if } x_i > 0 \\ \operatorname{sgn}(y_i)\,(\pi - \theta) & \text{if } x_i < 0 \\ \operatorname{sgn}(y_i)\,\pi/2 & \text{if } x_i = 0 \end{cases} \qquad (8)$$

where θ is the angle in [0, π/2] such that θ = arctan(|y_i/x_i|). The sign function sgn(y_i) is defined as:

$$\operatorname{sgn}(y_i) = \begin{cases} 1 & \text{if } y_i \geq 0 \\ -1 & \text{if } y_i < 0 \end{cases} \qquad (9)$$
Thus the function atan2(y, x) gracefully handles infinite slopes and places the angle in the correct quadrant (e.g., atan2(1, 1) = π/4 and atan2(−1, −1) = −3π/4).
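A quick numerical check of this quadrant handling, using NumPy's arctan2 (purely illustrative):

```python
import numpy as np

# Naive arctan(y/x) confuses opposite quadrants; arctan2 does not.
print(np.arctan(1 / 1), np.arctan(-1 / -1))    # both 0.785... (pi/4)
print(np.arctan2(1, 1), np.arctan2(-1, -1))    # 0.785... (pi/4) and -2.356... (-3pi/4)
print(np.arctan2(1, 0))                        # 1.570... (pi/2), no division by zero
```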
1.5. Intermediate-Level Features
In this step, we define the intermediate-level features that will be needed to infer abnormal events.
1.5.1 - Motion area ratio
This feature is the ratio between the number of blocks containing motion and the total number of blocks; it is a form of density estimation. Each frame is divided into N × M blocks of equal size; in crowded scenes, the area covered by the moving blobs is larger than in non-crowded scenes. For each block (i, j) we define:

$$b(i, j) = \begin{cases} 1 & \text{if block } (i, j) \text{ contains motion} \\ 0 & \text{otherwise} \end{cases}$$

If there are several movements in one block, the block still counts as one. The motion area ratio (MR) is the total number of moving blocks divided by the number of blocks:

$$MR = \frac{1}{N \times M} \sum_{i=1}^{N} \sum_{j=1}^{M} b(i, j)$$
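A minimal sketch of the motion area ratio, assuming a binary foreground mask and an N × M block grid (the 8 × 8 grid is an illustrative choice):

```python
import numpy as np

def motion_area_ratio(fg_mask, n_blocks=(8, 8)):
    """MR = fraction of blocks containing at least one moving pixel."""
    bh = fg_mask.shape[0] // n_blocks[0]
    bw = fg_mask.shape[1] // n_blocks[1]
    moving = 0
    for i in range(n_blocks[0]):
        for j in range(n_blocks[1]):
            block = fg_mask[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            moving += int(block.any())      # several movements still count once
    return moving / (n_blocks[0] * n_blocks[1])
```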
1.5.2 - Direction variance
After calculating the mean angle θ̄ of the optical flow vectors in the current frame, we calculate the variance of their angles as:

$$V_\theta = \frac{1}{n} \sum_{i=1}^{n} (\theta_i - \bar{\theta})^2$$

where V_θ is the angle variance for the current frame and n is the number of optical flow vectors in the frame.
1.5.3 - Motion magnitude variance
Observation shows that this variance increases in abnormal situations. When one or many people walk, even in different directions, they tend to have the same speed, which yields a small motion magnitude variance. This is not the case for abnormal behaviors (e.g., collapsing situations or panic), which often produce a large motion magnitude variance V_M. If M̄ is the mean of the motion magnitudes, then V_M is defined as:

$$V_M = \frac{1}{n} \sum_{i=1}^{n} (M_i - \bar{M})^2$$

where n is the number of optical flow vectors in the frame.
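Both variances follow directly from the per-frame (M_i, θ_i) vectors. A NumPy sketch is given below; note that this naive angle variance ignores the wrap-around at ±π, a simplification kept for illustration (a circular variance would handle it):

```python
import numpy as np

def direction_and_magnitude_variance(M, theta):
    """V_theta and V_M over the optical-flow vectors of one frame."""
    v_theta = np.mean((theta - theta.mean()) ** 2)  # naive: ignores +/-pi wrap-around
    v_mag = np.mean((M - M.mean()) ** 2)
    return v_theta, v_mag
```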
1.5.4 - Direction histogram
The direction histogram DH indicates the direction tendencies, and PC_α indicates its number of peaks. Each bin of the histogram counts the vectors having a given angle. The DH associated with a frame is defined as:

$$DH = \big( DH(\theta_1), \ldots, DH(\theta_s) \big), \qquad DH(\theta_i) = \frac{\left| \{ k \mid \theta_k = \theta_i \} \right|}{n}$$

where DH(θ_i) is the normalized frequency of the optical flow vectors whose angle is θ_i; DH is a vector of size s, the total number of angle bins covering (−π, +π). Finally, PC_α is the number of peaks of DH, counted over intervals of width α:

$$PC_\alpha = \left| \{ i \mid DH(\theta_i) \text{ is a local maximum over an interval of width } \alpha \} \right|$$

where α is a constant indicating the range of the peak intervals.
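A sketch using NumPy and SciPy's peak finder; the bin count and the `distance` peak-spacing parameter are illustrative stand-ins for the paper's s and α:

```python
import numpy as np
from scipy.signal import find_peaks

def direction_histogram(theta, bins=16, alpha=2):
    """Normalized direction histogram DH and its peak count PC_alpha."""
    dh, _ = np.histogram(theta, bins=bins, range=(-np.pi, np.pi))
    dh = dh / max(theta.size, 1)                 # normalized frequencies
    peaks, _ = find_peaks(dh, distance=alpha)    # peaks at least alpha bins apart
    return dh, len(peaks)
```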
1.5.5 - Direction map
Our approach also considers the direction map, which indicates the mean motion in each block of the video. The direction map is associated with a video, composed of a sequence of frames, each frame being divided into blocks.
Block (i, j) of the current frame n is characterized by the mean motion vector MMV_p^n(i, j). To calculate this vector, we use the history of the p previous frames: for each of those frames we compute the average motion vector of the block, then take the mean of these p average vectors:

$$V_k(i, j) = \frac{1}{m_k(i, j)} \sum_{l=1}^{m_k(i, j)} v_l^k(i, j), \qquad MMV_p^n(i, j) = \frac{1}{p} \sum_{k=n-p}^{n-1} V_k(i, j)$$

where n is the index of the current frame, p is the number of previous frames considered, m_k(i, j) is the number of motion vectors in block (i, j) of frame k, v_l^k(i, j) is the l-th motion vector situated in block (i, j) of frame k, V_k(i, j) is the mean motion in block (i, j) of frame k, and MMV_p^n(i, j) is the motion mean over the p frames previous to the current one (frame n).
Figure 7 presents the average motion vector computed for block (i, j) of frame k using all the motion vectors of this block. Using p frames, a new mean motion vector is computed from the p average motion vectors computed previously (see Figure 6). For the computation of the mean motion vector of block B(i, j), we consider only those of the p frames whose average motion vector is non-zero.
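A sketch of the per-block mean motion vector, assuming a hypothetical history structure in which the flow vectors of each frame have already been assigned to blocks:

```python
import numpy as np

def mean_motion_vector(history, i, j):
    """MMV_p^n(i, j): mean of the per-frame average motion vectors of block (i, j).

    `history` is a hypothetical list of p dicts (one per previous frame), each
    mapping (i, j) -> array of 2D motion vectors observed in that block.
    """
    averages = [np.mean(frame[(i, j)], axis=0)
                for frame in history
                if (i, j) in frame and len(frame[(i, j)]) > 0]
    averages = [v for v in averages if np.linalg.norm(v) > 0]  # skip zero vectors
    return np.mean(averages, axis=0) if averages else np.zeros(2)
```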
1.5.6 - Difference of direction map
The difference DD_m between the direction map averaged over previous frames and the instantaneous map of the current frame increases in case of abnormal events. We consider that the average map takes into account the l last frames, while the instantaneous map corresponds to the current frame; N × M is the number of blocks per frame. The right image of Figure 6 shows the mean direction in each block of the field of view of an escalator camera. Some tendencies can be seen: in the blue region the motion is from top to bottom, and in the yellow region it is from right to left.
1.6. High-Level Features
The high-level feature concerns the decision of whether the event in the current frame is normal or abnormal. We present the high-level feature related to the detection of collapsing events, the event within the scope of our project. We also mention another high-level feature related to the detection of opposite-flow events, in order to show that the intermediate-level features used for collapsing event detection are also useful for other challenges.
The high-level feature is called entropy. It supports the decision of whether a collapsing event happens or not. We developed a function E that extracts this feature at frame p. E depends on the motion area ratio (MR), the direction variance (V_θ), the motion magnitude variance (V_M), the peak cardinality of the direction histogram (PC_α), and the difference of direction maps (DD_m) at frame p; p is omitted from the formula to keep the presentation simple. E is calculated at each frame. All these intermediate features were detailed in the previous paragraphs; they individually characterize certain aspects of the abnormal event, and they are generally higher when collapsing events happen in a crowded environment.
The high-level feature E uses certain intermediate-level features directly related to collapsing, presented above. This does not mean that other intermediate-level features are inappropriate, nor that there are no other options for detecting collapsing events by combining the intermediate-level features differently. It simply means that the current measure of E is suitable for collapsing event detection, based on the empirical evidence of our ongoing experiments.
When a collapsing event happens at an escalator exit, the density of crowd motion (motion area ratio MR) increases instantly, because any trouble at the exit increases collisions between people. Imagine that a person stops moving at the exit: the people behind him, carried by the escalator, are suddenly and abnormally stopped. The escalator exit, where in a normal situation the number of people is limited, then sees a sudden increase of people moving in a disordered way. We thus observe the motion of many people in the same region, and the observed motion is generally disordered.
Let us now consider the direction variance (V_θ), the variance of the angle θ. The observation of collapsing events shows an increase of the direction variance: for a few seconds, people who are suddenly stopped at the exit try to move in all directions, without success, which increases the variance of the directions.
Abnormal events such as collapsing or panic trigger varied motion magnitudes, so the motion magnitude variance (V_M) is relatively high compared to normal events. When a collapsing event happens, certain points of interest have high magnitudes while others have low magnitudes, so the variance of the magnitudes is high. This is a typical characteristic of disordered motion.
The number of peaks (PC_α) of the direction histogram DH is notably greater during a collapsing event than in a normal situation. In a collapsing event there are many optical flow vectors with various directions, typical of motion disorder, so the number of distinct directions, and hence the number of peaks, increases. In a normal situation at the escalator exit, the optical flow vectors tend in the same direction, the normal direction of the exit, which limits the number of peaks.
Finally, DD_m calculates the mean difference between the average directions of the previous frames and the directions of the current frame. This value becomes large when there are sudden direction changes. In our challenge, a person walking toward the escalator exit suddenly stops or falls down, producing a sudden change of direction.
These intermediate-level features increase when a collapsing event happens at the exit, and each of them characterizes a different aspect of the collapsing. We therefore define the entropy E characterizing collapsing events as their product:

$$E = MR \cdot V_\theta \cdot V_M \cdot PC_\alpha \cdot DD_m \qquad (24)$$
In order to decide the normality or abnormality of an event on the basis of the entropy E, we need to compute the threshold th that separates the normal and abnormal classes: if E is greater than th, the event is abnormal; otherwise, it is normal. The entropy E depends on the controlled environment, namely the distance of the camera to the scene and the orientation, type, and position of the camera. The greater the distance of the camera to the scene, the smaller the quantity of low-level features (optical flow vectors and blobs); and the more the objects in the images move, the higher the entropy. This dependency does not directly affect the measure itself, which is invariant within a controlled environment, but it does affect the threshold that decides normality or abnormality. A controlled environment corresponds to a video stream; as said before, it is composed of the position, type, distance, and orientation of the camera. For this reason, we expect at least one threshold per video stream: with n video streams, as is the case in sites such as airports, shopping malls, banks, or playgrounds, we expect n thresholds. If the environment changes, the threshold must be regenerated.
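A sketch tying these elements together, assuming the features have been normalized to [0, 1] as stated in the conclusion below; the per-stream threshold th is a placeholder to be calibrated for each controlled environment:

```python
def entropy(mr, v_theta, v_mag, pc_alpha, dd_m):
    """E = MR * V_theta * V_M * PC_alpha * DD_m, inputs normalized to [0, 1]."""
    return mr * v_theta * v_mag * pc_alpha * dd_m

def is_abnormal(e, th):
    """th is a per-video-stream threshold, regenerated when the environment changes."""
    return e > th
```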
1.7. Conclusion regarding this approach
Our philosophy at this stage is to keep the methods suitable for the real-time requirements of the processing. The effort and results obtained at this stage will be used as a foundation and benchmark for the next period. The scope of this document is an approach that extracts the parts of videos corresponding to abnormal events, applied to collapsing detection. The approach calculates information such as density, direction, and velocity, then the entropy of the motions, and decides on the normality or abnormality of the content. We experimented with our approach on different videos corresponding to a real-world environment: incidents at airport escalator exits.
We developed a method that detects abnormality in a crowd flow, using a framework of features composed of three levels: low, intermediate, and high. The low-level and intermediate-level features are not specific to collapsing detection, and the framework is designed to support extensive additions of features at these two levels.
The high-level features are dependent on the application domain. We defined an entropy function suitable for detecting collapsing situations at airport escalators. The entropy is sensitive to crowd density, velocity, and directions.
An important contribution of this work is a method, based on the three-level feature framework, that addresses the application challenge: real-time detection of collapsing events in a crowded environment. The targeted application is the escalator exits of an airport, but the method is adaptable to collapsing events in other settings (e.g., doors, exits). Another interesting contribution is the framework itself, with its two levels of application-independent features: low-level (points of interest, blobs, optical flow) and intermediate-level (motion ratio, direction map, etc.). These features are certainly useful for collapsing event detection, but they are not limited to it; they may be applied to other security-related abnormal events or challenges, such as the detection of opposite-flow events, mentioned above, or of sudden motion variations (typical of panic situations). We do not claim that our method, based on the three-level feature framework, is a generic method that detects any abnormal event in video streams, and our objective is not to present evidence that the system is generic and application independent; we simply consider our method a contribution in that direction. Judging the generic aspect of the framework requires further experiments on the detection of other events, which is out of the scope of our project. To claim genericity, we would have to develop high-level features detecting several abnormal events; we simply stress that our framework may be extended with intermediate-level features to handle new abnormal events (new high-level features).
The experimental results are promising, but we certainly need much more data to decide definitively on the robustness of the method. These data are confidential and not publicly available, which is why we met difficulties in obtaining a large quantity of video data for our experiments: Madrid Airport is very reluctant to provide large quantities of video surveillance data of escalator exits, and our partner (Visual Tools) needs the prior agreement of its customer (the data owner, Madrid Airport) to use the videos in our experiments. Nevertheless, the method applied to the available videos (3 hours) showed promising results on the basis of the statistical evaluation (detection accuracy and recall) we used.
The intermediate-level features are based on the results of the low-level features (optical flow, points of interest, blobs), so their quality depends on the quality of the low-level features. To limit noise in the low-level features: first, the method analyzes the scene only in the regions defined by the heat map, so only regions characterized by motion are considered. Second, the scene is decomposed into blocks of equal size and the points of interest are extracted in each block. Since the blocks generally contain several persons, we avoid problems where the intermediate-level features are sensitive to textures: if a person wears grid-patterned clothing, many corner points will be detected in his region, but they remain limited to a small part of the block, and their motion directions match the person's movement direction; features like the direction histogram are thus not much distorted, the block size being sufficient to limit the effect. Third, the blobs are extracted only when the motion ratio is low; in that case, we optimize the matching between the blobs and the associated persons.
Several parameters are used for the intermediate-level features (e.g., block size, direction histogram size, number of previous frames analyzed in the direction map). These parameters are set at two levels. The first level is the application, here collapsing events in a crowded environment. For example, the direction histogram size is the total number of angle bins covering (−π, +π): we may consider four directions (east, north, west, south) and extend them with north-east, north-west, south-east, and south-west. In the direction map, we considered only four directions, which is enough to show the direction map of motion at escalator exits; another application (e.g., mall traffic) would require a suitable number of directions (at least 8). The number of previous frames analyzed in the direction map corresponds to a few seconds, determined empirically. The range of peak intervals is generally small; we consider that at least 3 peaks are needed to be significant for collapsing detection. The second level of settings is the video stream: the block size N×M depends on the video stream, and more precisely on the distance between the camera and the observed scene. Any learning process would operate at these two levels. In a few words, the settings of these parameters depend on a two-level context, the application and the video stream, and are set empirically.
While the method is based on a framework of three feature levels, the high-level feature, the entropy, is the most visible part of the framework. It is calculated from the following intermediate-level features: the motion ratio, the direction variance, the motion magnitude variance, the peak cardinality of the direction histogram, and the difference of direction maps. These features are one-dimensional and normalized between 0 and 1. Each feature discriminates one aspect of the collapsing, and they generally increase individually when motion becomes disordered, as in collapsing. The product of these features yields the entropy, and we showed empirically that the entropy discriminates collapsing events.
2. Local Analysis
2.1 Overview
This kind of approach relies on a set of preprocessing steps that can be seen as generic, since they can be used in a very wide range of situations (indoor, outdoor) and are sufficiently application independent. In fact, most systems providing a description of a specific individual behavior differ from one another in the way one or more of these steps are performed, rather than by exhibiting an "exotic" processing flow chart.
In our case, the main steps of the local analysis are the following:
Video sequence segmentation
At this level, the entities of interest must be extracted from the video stream. Different techniques can be exploited, such as background removal or temporal differencing.
Tracking-related feature extraction
Once "candidate" shapes are obtained from the previous step, a set of features summarizing their main characteristics is calculated. The corresponding vector must be discriminative enough to singularize each observed entity, without including so many components that it becomes too complex to handle.
Filtering
Using the feature vector attached to each candidate, the entities that are "too far" (according to a specific metric) from the typical "entity of interest" (a person) are pruned.
Robust tracking
The remaining entities are then tracked along the video sequence. This tracking process uses a similarity measure between entities of interest, based on the tracking-related feature vector and a few assumptions regarding trajectory continuity over time. The output of this step is a set of "trajectories" (one per entity). In this context, a trajectory not only contains simple data such as the positions of the center of gravity; it also comprises the feature vector.
Action-recognition-related feature extraction
From the previously obtained trajectory, time-related characteristics of each of the formerly extracted "targets" can be calculated (describing how an entity's appearance evolves over time). Other types of information, such as colorimetric and textural characteristics, can be added to this new feature vector.
Action classification
On the basis of the previous step's output, classification techniques are applied in order to recognize the different types of action (walking slowly or quickly, running, etc.) performed by the tracked individual along the trajectory.
As we can see, applying this process to a video stream allows us to obtain a high-level description of the observed scene from "raw" video data. This description, a series of actions located in space and time, corresponds to the individual's "activity" and can further be used to evaluate its "abnormal" or "unexpected" aspect.
While all the steps of the previous flow chart are "generic", the final decision-making process is, conversely, deeply application dependent and context sensitive. For instance, running in a train station is common (everybody has experienced being late and about to miss a train), while running in a shopping center suggests a potential robbery.
2.2 Local analysis implementation
2.2.1 The tracking process
The approach we developed follows the global flow chart discussed in the introduction. The video stream segmentation is based on a background removal process, while the tracking itself relies on a classical approach: after extracting the entities of interest, a feature vector is computed for each candidate. This vector includes elements chosen to express the size, the location, the orientation, and the pixel distributions along the image axes (x and y). More precisely, given the k-th candidate shape in the current image, we compute (a sketch of this descriptor follows the list):
The coordinates (xBk, yBk) of the upper left corner of the bounding box including the candidate shape,
The width wk and the height hk of this bounding box,
The histograms Hvk and Hhk of the shape's cumulated pixels along the horizontal and vertical axes, respectively,
The image coordinates (xCoGk, yCoGk) of the shape’s center of gravity,
The eigenvalues {λ1k, λ2k} of the shape's pixel distribution,
The principal axis direction Vk = (Vxk, Vyk)T, as a 2D vector.
The total number Nk of pixels in the candidate shape.
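A minimal NumPy sketch of this descriptor for one binary candidate mask follows; the names mirror the list above, and the eigen-decomposition of the pixel scatter matrix is our illustrative reading of the eigenvalue and principal-axis features:

```python
import numpy as np

def shape_descriptor(mask):
    """Tracking-related features of one binary candidate shape (2D 0/1 mask)."""
    ys, xs = np.nonzero(mask)
    x_b, y_b = xs.min(), ys.min()                  # upper-left corner of bounding box
    w, h = xs.max() - x_b + 1, ys.max() - y_b + 1  # bounding box width and height
    hist_h = mask.sum(axis=0)                      # cumulated pixels along columns
    hist_v = mask.sum(axis=1)                      # cumulated pixels along rows
    cog = (xs.mean(), ys.mean())                   # center of gravity
    cov = np.cov(np.vstack([xs, ys]))              # 2x2 pixel scatter matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    return dict(bbox=(x_b, y_b, w, h), hist_h=hist_h, hist_v=hist_v, cog=cog,
                eigvals=eigvals, axis=eigvecs[:, -1], n=xs.size)
```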
The accompanying figure illustrates the different parts of this feature vector.
Thanks to the descriptors associated with each candidate shape in the image sequence, the tracking process itself consists in associating each descriptor in image n with its counterpart in image n+1. This association is based on the computation of a similarity measure between two given descriptors. This metric takes into account the following relative variations between the two entities we try to match:
Bounding box aspect ratio,
Area,
Number of pixels,
Orientation,
Maximum eigenvalue.
In addition, the similarity between the pixel distributions is evaluated using the histograms (along the two possible directions). To do so, we compute the intercorrelation between these two signals (in both directions), which in practice corresponds to a normalized discrete convolution product. As usual, the closer the value is to 1, the better the similarity between the two distributions.
The decision function, allowing us to make the final match between two shapes, is based on a Gaussian kernel applied to a vector including the previous relative variations and the intercorrelation values. It also takes into account the distance between the centers of gravity of the shapes whose similarity is evaluated. The respective weights of these parameters in the decision function should be tuned depending on the type of application; in practice, the tracking tests we made gave very satisfactory results even with all weights set to 1 (no factor such as "shape" or "distance" was dominant).
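A sketch of such a decision function, assuming the hypothetical `shape_descriptor` output above; the Gaussian bandwidth and the scaling of the center-of-gravity distance are illustrative, and, as in the text, all weights are left at 1:

```python
import numpy as np

def intercorrelation(a, b):
    """Normalized correlation between two pixel-count histograms."""
    n = max(len(a), len(b))
    a = np.pad(np.asarray(a, float), (0, n - len(a)))
    b = np.pad(np.asarray(b, float), (0, n - len(b)))
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def similarity(d1, d2, sigma=1.0):
    """Gaussian kernel over relative variations, intercorrelations and CoG distance."""
    _, _, w1, h1 = d1["bbox"]
    _, _, w2, h2 = d2["bbox"]
    terms = np.array([
        abs(w1 / h1 - w2 / h2),                        # bounding box aspect ratio
        abs(w1 * h1 - w2 * h2) / (w1 * h1),            # area
        abs(d1["n"] - d2["n"]) / d1["n"],              # number of pixels
        1 - abs(float(d1["axis"] @ d2["axis"])),       # orientation
        abs(d1["eigvals"][-1] - d2["eigvals"][-1]) / d1["eigvals"][-1],
        1 - intercorrelation(d1["hist_h"], d2["hist_h"]),
        1 - intercorrelation(d1["hist_v"], d2["hist_v"]),
        np.hypot(d1["cog"][0] - d2["cog"][0],
                 d1["cog"][1] - d2["cog"][1]) / 100.0,  # CoG distance, illustrative scale
    ])
    return float(np.exp(-np.sum(terms ** 2) / (2 * sigma ** 2)))  # all weights = 1
```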
The following set of pictures illustrates this tracking process. The upper pair of figures shows the extraction of the tracking-related features, while the lower screenshot represents the set of individual trajectories obtained after the tracking process.
2.3 The classification process
Many different types of classifiers can be exploited to associate a given action with a sub-part of a trajectory. Human activity characterization is a topic that has been addressed in the literature from different points of view (Wang & Singh, 2003; Turaga & Chellappa, 2008). Several techniques apply both to automatic activity recognition and to person recognition by gait as a biometric feature. In this application, we focus on recognizing human activities such as walking, running, hand waving, and boxing, in order to automatically detect abnormal activities in specific places such as commercial spaces.
In this approach, we focus on human activity representation. We distinguish two categories of approaches for human activity analysis: model-based approaches (Meyer & Posl, 1998; Cunado & Nixon, 1997; Wagg & Nixon, 2004; Yam & Nixon, 2004; Ning & Wang, 2002; Fan & Bobbitt, 2009) and appearance-based approaches (BenAbdelkader & Cutler, 2002; Collins & Gross, 2002; Kale & Cuntoor, 2003). Model-based approaches incorporate shape and motion knowledge in the extraction process. They have the advantage of extracting parameters tied to the data used, which makes it possible to measure their relative importance in the recognition task (Wagg & Nixon, 2004). However, finding an efficient model for one kind of activity (e.g., walking/running) requires searching a complex, high-dimensional parameter space. Thus, a generic model covering several human behaviors is quite complex and requires long computing times.
On the other hand, appearance-based approaches typically use human silhouettes and derived features for person or activity recognition (Turaga & Chellappa, 2008). These approaches, which use the spatio-temporal information of the extracted silhouettes, have the advantage of being relatively fast and straightforward compared to model-based approaches.
Our study aims to establish an automatic activity recognition method based on spatio-temporal silhouette analysis measured during a person's motion. Intuitively, recognizing people's activities depends greatly on the changes of the silhouette shape over time in an image sequence. Since a human activity is composed of a sequence of static body poses, our aim is to extract discriminative and compact signatures from those body poses by considering their temporal variations. Eigenspace transformation based on PCA has successfully been used for gait and activity recognition (Wang & Tan, 2003; BenAbdelkader & Cutler, 2002; Verbeke & Vincent, 2007; Guo & Miao, 2008). Wang et al. (Wang & Tan, 2003) propose a PCA-based silhouette analysis approach for people recognition by their gait. BenAbdelkader et al. (BenAbdelkader & Cutler, 2002) conducted PCA on a self-similarity plot, an efficient representation of repetitive motions computed via the correlation of each pair of images in a sequence. More recently, Verbeke & Vincent (2007) used an eigenspace technique to project map data into a lower-dimensional space, while Guo et al. (Guo & Miao, 2008) used PCA on sub-sequence frame histograms and applied Hidden Markov Models (HMM) to recognize 9 different postures.
We were inspired by Wang's method (Wang & Tan, 2003) and conduct a similar preprocessing step to obtain a first distance signal; this preprocessing step is simple in essence and leads to promising results. We then perform LDA on the time-varying distance signals derived from a sequence of silhouette images, in order to reduce the dimensionality of the input feature space. Fisher analysis based on LDA has been demonstrated to be a robust representation method in face recognition and image retrieval (Belhumeur & Hespanha, 1997; Swets & Weng, 1996). Based on these observations, we propose a silhouette-analysis-based activity recognition algorithm using both PCA and LDA.
From the previous trajectory extraction process, using the series of binary silhouettes attached to a given individual, we deduce a fine, connected contour that can be browsed without ambiguity. A normalized distance vector is then computed by unwrapping the contour with respect to the silhouette centroid. Accordingly, the shape changes of these silhouettes over time are transformed into a sequence of distance vectors that approximate the temporal changes of body motion. Finally, LDA is applied to those time-varying distance vectors to compute a discriminative subspace of motion signatures. This training phase is followed by the classification module, which projects the extracted vectors onto the built subspace and determines the person's activity using standard non-parametric pattern classification techniques.
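A sketch of this signature extraction, assuming OpenCV for the contour and scikit-learn for the LDA subspace; the number of contour samples and the commented classification calls are illustrative choices:

```python
import cv2
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def distance_signal(silhouette, n_samples=180):
    """Unwrap the silhouette contour into a normalized centroid-distance vector."""
    contours, _ = cv2.findContours(silhouette, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea).reshape(-1, 2).astype(float)
    centroid = contour.mean(axis=0)
    d = np.linalg.norm(contour - centroid, axis=1)      # distance to the centroid
    idx = np.linspace(0, len(d) - 1, n_samples).astype(int)
    d = d[idx]                                          # resample to a fixed size
    return d / d.max()                                  # scale normalization

# Training: X stacks one signal per silhouette, y holds the activity labels.
# lda = LinearDiscriminantAnalysis().fit(X, y)
# Classification of a new silhouette: lda.predict(distance_signal(s)[None, :])
```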
To measure the effectiveness of the proposed algorithm, we performed a number of experiments on the KTH database (Schüldt & Laptev, 2004). This database contains six types of human actions (walking, jogging, running, boxing, hand waving, and hand clapping) performed by 25 subjects in four scenarios. All sequences were taken with a static camera at a 25 fps frame rate and have an average length of four seconds.
In our silhouette-based recognition approach, we used the silhouette distance signals as input to the subspace LDA method and recognized the walking and running activities from the LDA features. Every sequence consists of 20 frames, and an average of 70 sequences per activity was used to build the feature space. In all experiments we compared our LDA recognition method with the PCA reference method (Wang & Tan, 2003). The following frame summarizes some of the results we obtained.
As the aim of these experiments was mainly to evaluate the effectiveness of the silhouette-based feature vector, we must keep in mind that these results were obtained with a simple metric; more sophisticated classifiers, such as SVMs, would no doubt produce much better results.
2.4 Conclusion regarding this approach
We have proposed a recognition approach for human actions using Linear Discriminant Analysis. For each image sequence, a background subtraction algorithm is first used to segment and track the moving human silhouettes. Then, time-varying distance signals are derived from each sequence of silhouette images. After that, Linear Discriminant Analysis is applied to a learning set of these signals in order to deduce a new feature subspace. Experimental results on the KTH benchmark database show that the proposed approach has encouraging recognition performance and outperforms the state-of-the-art PCA-based approach. The use of more powerful classifiers should lead to even better performance in the activity recognition step.
3. General Conclusion
The work conducted during this project led to a system able to analyze scenes from a "local" (individual) or "global" (crowd) point of view. As stated before, we mostly put the stress on the parts that are generic enough to be considered application independent.
In fact, a large part of the remaining tasks is related to the final decision process: except in simple cases where the actions or events are by themselves relevant enough to warn or alarm the end user, many dangerous situations are complex combinations of activities and events that are deeply connected to the context.
Under these conditions, designing a system that can handle all possible situations seems unrealistic. On the other hand, developing a full set of specific systems (one per application) is clearly a painful task. As a consequence, one possible direction for our further work consists in implementing online learning algorithms targeted at the decision-making process. This way, the system would adapt itself to the application context. The main drawback of this approach is the necessary period of time during which the system "learns" and is not directly functional; conversely, its main advantage is a fully generic structure.
Bibliography
Andrade E. L., Blunsden S. and Fisher R. B., Hidden Markov models for optical flow analysis in crowds, in Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006a, vol. 1, pp. 460-463. DOI: 10.1109/ICPR.2006.621
Andrade E. L., Blunsden S. and Fisher R. B., Modelling crowd scenes for event detection, in Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), Washington, DC, USA, IEEE Computer Society, 2006b, pp. 175-178. DOI: 10.1109/ICPR.2006.806
Belhumeur P., Hespanha J. and Kriegman D., Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection, IEEE Transactions on Pattern Analysis and Machine Intelligence, July 1997, vol. 19, pp. 711-720. DOI: 10.1109/34.598228
BenAbdelkader C., Cutler R. and Davis L., Stride and Cadence as a Biometric in Automatic Person Identification and Verification, Proc. FGR, 2002, pp. 372-377. DOI: 10.1109/AFGR.2002.1004182
BenAbdelkader C., Cutler R. and Davis L., Motion-Based Recognition of People in EigenGait Space, Proc. International Conference on Automatic Face and Gesture Recognition (FGR), 2002. DOI: 10.1109/AFGR.2002.1004165
Boghossian B. and Velastin S., Motion-based machine vision techniques for the management of large crowds, in Proceedings of ICECS '99, the 6th IEEE International Conference on Electronics, Circuits and Systems, 1999, vol. 2, pp. 961-964. DOI: 10.1109/ICECS.1999.813392
Boiman O. and Irani M., Detecting irregularities in images and in video, International Journal of Computer Vision, vol. 74, no. 1, 2007, pp. 17-31. DOI: 10.1007/s11263-006-0009-9
Collins R. T., Gross R. and Shi J., Silhouette-based Human Identification from Body Shape and Gait, Proc. FGR, 2002, pp. 351-356. DOI: 10.1109/AFGR.2002.1004181
Cunado D., Nixon M. S. and Carter J. N., Using Gait as a Biometric, via Phase-Weighted Magnitude Spectra, in J. Bigün, G. Borgefors and G. Chollet (eds.), AVBPA 1997, LNCS, vol. 1206, Springer, Heidelberg, 1997, pp. 95-102. DOI: 10.1007/BFb0015972
Cupillard F., Avanzi A., Bremond F. and Thonnat M., Video understanding for metro surveillance, IEEE International Conference on Networking, Sensing and Control, 2004, vol. 1, pp. 186-191.
10.1109/ICNSC.2004.1297432 :Dabin J., La technique de l’élaboration du droit positif spécialement du droit privé, Bruylant, Sirey, 1935
Davies A., Yin J. H. and Velastin S., Crowd monitoring using image processing, Electronics & Communication Engineering Journal, vol. 7, no. 1, 1995, pp. 37-47. DOI: 10.1049/ecej:19950106
Davis J. W. and Bobick A. F., The representation and recognition of action using temporal templates, M.I.T. Media Laboratory Perceptual Computing Section Technical Report no. 402, in CVPR '97, 1997.
Douadi L., Khoudour L., Chaari A. and Boonaert J., Full Motion Detection System with Post-Processing, IEEE International Conference on Image Processing Theory, Tools and Applications, Paris, France, July 2010. DOI: 10.1109/IPTA.2010.5586768
Fan Q., Bobbitt R., Zhai Y., Yanagawa A., Pankanti S. and Hampapur A., Recognition of repetitive sequential human activity, in CVPR, 2009. DOI: 10.1109/CVPR.2009.5206644
Guo P., Miao Z. and Yuan Y., Posture and Activity Recognition Using Projection Histogram and PCA Methods, Proc. IEEE Congress on Image and Signal Processing, 2008, vol. 2, pp. 397-401. DOI: 10.1109/CISP.2008.367
Harris C. and Stephens M. J., A combined corner and edge detector, Alvey Vision Conference, 1988, pp. 147-152. DOI: 10.5244/C.2
Ivanov A. B. Y., Stauffer C. and Grimson W. E. L., Video surveillance of interactions, in CVPR Workshop on Visual Surveillance, 1999.
Kale A., Cuntoor N., Yegnanarayana B., Rajagopolan A. N. and Chellappa R., Gait Analysis for Human Identification, Proc. AVBPA, 2003, pp. 706-714. DOI: 10.1007/3-540-44887-X
Lin S.-F., Chen J.-Y. and Chao H.-X., Estimation of number of people in crowded scenes using perspective transformation, IEEE Transactions on Systems, Man and Cybernetics, Part A, vol. 31, no. 6, 2001, pp. 645-654.
Lucas B. and Kanade T., An iterative image registration technique with an application to stereo vision, Proceedings of the International Joint Conference on Artificial Intelligence, 1981, pp. 674-679.
Ma R., Li L., Huang W. and Tian Q., On pixel count based crowd density estimation for visual surveillance, IEEE Conference on Cybernetics and Intelligent Systems, 2004, vol. 1, pp. 170-173.
Marana A. N., Cavenaghi M. A., Ulson R. S. and Drumond F. L., Real-Time Crowd Density Estimation Using Images, in ISVC, First International Symposium, Lake Tahoe, NV, USA, December 5-7, 2005. DOI: 10.1007/11595755
Marana A. N., Da Fontoura Costa L., Lotufo R. A. and Velastin S. A., Estimating crowd density with Minkowski fractal dimension, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, March 15-19, 1999, vol. 6. DOI: 10.1109/ICASSP.1999.757602
Marana A. N., Velastin S. A., Costa L. F. and Lotufo R. A., Estimation of crowd density using image processing, in Image Processing for Security Applications, IEE Colloquium, March 10, 1997, Digest no. 1997/074, vol. 11, pp. 1-8. DOI: 10.1049/ic:19970387
Moravec H., Obstacle avoidance and navigation in the real world by a seeing robot rover, Technical Report CMU-RI-TR-3, Carnegie-Mellon University, Robotics Institute, 1980.
Meyer D., Posl J. and Niemann H., Gait Classification with HMMs for Trajectories of Body Parts Extracted by Mixture Densities, Proc. BMVC, 1998, pp. 459-468. DOI: 10.5244/C.12
Ning H., Wang L., Hu W. and Tan T., Articulated Model-Based People Tracking Using Motion Models, Proc. Int. Conf. on Multimodal Interfaces, 2002.
Rahmalan H., Nixon M. S. and Carter J. N., On crowd density estimation for surveillance, International Conference on Crime Detection and Prevention, 2006. DOI: 10.1049/ic:20060360
Schlögl T., Wachmann B., Kropatsch W. and Bischof H., Evaluation of People Counting Systems, in Proceedings of the AAPR/OAGM Workshop 2001, Berchtesgaden, Germany, 2001, pp. 49-53.
Schmid C., Mohr R. and Bauckhage C., Evaluation of interest point detectors, International Journal of Computer Vision, vol. 37, no. 2, 2000, pp. 151-172.
Schüldt C., Laptev I. and Caputo B., Recognizing human actions: a local SVM approach, in Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), Cambridge, UK, August 2004, vol. 3, pp. 32-36. DOI: 10.1109/ICPR.2004.1334462
Shi J. and Tomasi C., Good features to track, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1994, pp. 593-600.
Stauffer C. and Grimson W. E. L., Learning patterns of activity using real-time tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, 2000, pp. 747-757. DOI: 10.1109/34.868677
Swets D. L. and Weng J., Using discriminant eigenfeatures for image retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1996, vol. 18, pp. 831-836. DOI: 10.1109/34.531802
Turaga P., Chellappa R., Subrahmanian V. S. and Udrea O., Machine Recognition of Human Activities: A Survey, IEEE Transactions on Circuits and Systems for Video Technology, 2008. DOI: 10.1109/TCSVT.2008.2005594
Verbeke N. and Vincent N., A PCA-Based Technique to Detect Moving Objects, 15th Scandinavian Conference on Image Analysis (SCIA 2007), Aalborg, Denmark, June 10-14, 2007. DOI: 10.1007/978-3-540-73040-8
Wagg D. K. and Nixon M. S., On automated model-based extraction and analysis of gait, in Proceedings of the 6th IEEE International Conference on Automatic Face and Gesture Recognition, Seoul, Korea, May 2004, pp. 11-16. DOI: 10.1109/AFGR.2004.1301502
Wang J. J. L. and Singh S., Video analysis of human dynamics: a survey, Real-Time Imaging, 2003, 9(5), pp. 321-346. DOI: 10.1016/j.rti.2003.08.001
Wang R. R. and Huang T., A framework of human motion tracking and event detection for video indexing and mining, DIMACS Workshop on Video Mining, 2002.
Wang L., Tan T., Ning H. and Hu W., Silhouette analysis based gait recognition for human identification, IEEE Transactions on Pattern Analysis and Machine Intelligence, December 2003, vol. 25, no. 12, pp. 1505-1518. DOI: 10.1109/TPAMI.2003.1251144
Xiang T. and Gong S., Incremental and adaptive abnormal behavior detection, IEEE International Workshop on Visual Surveillance, 2006, pp. 65-72.
Xie T. T. D., Hu W. and Peng J., Semantic-based traffic video retrieval using activity pattern analysis, International Conference on Image Processing, 2004, vol. 1, pp. 693-696.
Yam C., Nixon M. S. and Carter J. N., Automated Person Recognition by Walking and Running via Model-based Approaches, Pattern Recognition, 2004, 37(5), pp. 1057-1072. DOI: 10.1016/j.patcog.2003.09.012
Authors
PhD student, Université des sciences et techniques de Lille 1.
Laboratoire d'Informatique Fondamentale de Lille, Université des sciences et techniques de Lille 1.
Lecturer and researcher in computer science, École des Mines de Douai, ARMINES. He leads the CAnADA project.
Professor of computer science, member of the LIFL (Laboratoire d'informatique fondamentale de Lille), Université des Sciences et Technologies de Lille 1 (UMR 8022 CNRS).