Final Project – Anomalies Detection


1 Final Project – Anomalies Detection
Nizan Ifrach, Ori Peri. Supervisor: Dr. David Ben-Shimon

2 The problem Our problem is to detect anomalies in a set of tracks in an urban area, where each track consists of a sequence of points.

3 The goal Given a set of tracks, the main goal is to identify the anomalous tracks according to a set of configured attributes, using machine learning algorithms.

4 Data-set background The data (input): Microsoft’s GeoLife project, which builds a social network based on human locations. The GPS trajectory dataset was collected in the GeoLife project over a period of more than three years (from April 2007 to August 2012).

5 Data-set background This dataset contains 17,621 trajectories with a total distance of about 1.2 million kilometers and a total duration of 48,000+ hours. These trajectories were recorded by different GPS loggers and GPS phones, and have a variety of sampling rates. 91 percent of the trajectories are logged in a dense representation, e.g. every 1~5 seconds or every 5~10 meters per point.

6 General Details
Total number of points – 2,400,046
Total number of tracks – 5,376
Total number of users – 24

7 Data example The data is divided into directories, one per user. Each line of a trajectory file contains the following fields (columns), from left to right: Latitude, Longitude, a field that is always set to 0 in this dataset, Altitude, Date (as a number), Date as a string, Time as a string.
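A minimal Python sketch of parsing one trajectory file with the column layout listed above. The names GpsPoint, parse_plt_line and parse_plt are hypothetical, and three details are assumptions not stated on the slide: fields are comma-separated, every file begins with 6 header lines that carry no points, and the string columns use the YYYY-MM-DD and HH:MM:SS formats. The numeric date column is ignored in favor of the two string columns.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class GpsPoint:
    lat: float        # latitude, decimal degrees
    lon: float        # longitude, decimal degrees
    altitude: float   # altitude as logged (unit as given in the dataset)
    timestamp: datetime


def parse_plt_line(line: str) -> GpsPoint:
    """Parse one comma-separated trajectory line into a GpsPoint (assumed 7 fields)."""
    fields = line.strip().split(",")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    lat, lon, _always_zero, alt, _numeric_date, date_str, time_str = fields
    # Build the timestamp from the string date/time columns; the numeric date is unused.
    ts = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")
    return GpsPoint(float(lat), float(lon), float(alt), ts)


def parse_plt(text: str, header_lines: int = 6) -> List[GpsPoint]:
    """Parse a whole file, skipping the assumed header lines and any blank lines."""
    body = text.splitlines()[header_lines:]
    return [parse_plt_line(line) for line in body if line.strip()]
```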

8 Data example Each user logged the transportation mode he or she used during each time interval of the day.
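The per-user transportation log can be read as a list of (start, end, mode) intervals. This sketch assumes a tab-separated text file with one header row and timestamps such as 2008/04/02 11:24:21; that matches our understanding of the GeoLife label files, but treat the format as an assumption. ModeInterval, parse_mode_intervals and mode_at are hypothetical names.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class ModeInterval(NamedTuple):
    start: datetime
    end: datetime
    mode: str


def parse_mode_intervals(text: str) -> List[ModeInterval]:
    """Parse 'start<TAB>end<TAB>mode' rows, skipping one header row (assumed format)."""
    fmt = "%Y/%m/%d %H:%M:%S"
    intervals = []
    for line in text.splitlines()[1:]:  # first row is the header
        if not line.strip():
            continue
        start_s, end_s, mode = line.split("\t")
        intervals.append(ModeInterval(datetime.strptime(start_s.strip(), fmt),
                                      datetime.strptime(end_s.strip(), fmt),
                                      mode.strip().lower()))
    return intervals


def mode_at(intervals: List[ModeInterval], ts: datetime) -> Optional[str]:
    """Return the logged mode whose interval covers ts, or None if none does."""
    for iv in intervals:
        if iv.start <= ts <= iv.end:
            return iv.mode
    return None
```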

9 Data Processing
Parse – parse the trajectory files and the transportation-interval files.
Load – load the parsed data into the relevant tables of an SQL database.
Cross-checking – cross-reference the trajectories with the transportation intervals to create tracks.
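A sketch of the Load and Cross-checking steps, using Python's built-in sqlite3 module as a stand-in for whatever SQL database the project used. The tables points and intervals and their columns are hypothetical. Each point is labeled with the transport mode of the interval that covers its timestamp, and points that no interval covers are dropped by the inner join.

```python
import sqlite3


def cross_check(points, intervals):
    """Load points and mode intervals into SQLite and label each point with its mode.

    points:    iterable of (user_id, ts, lat, lon), ts as 'YYYY-MM-DD HH:MM:SS'
    intervals: iterable of (user_id, start_ts, end_ts, mode), same ts format
    Returns (user_id, ts, lat, lon, mode) rows ordered by user and time.
    """
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE points    (user_id INTEGER, ts TEXT, lat REAL, lon REAL);
        CREATE TABLE intervals (user_id INTEGER, start_ts TEXT, end_ts TEXT, mode TEXT);
    """)
    con.executemany("INSERT INTO points VALUES (?, ?, ?, ?)", points)
    con.executemany("INSERT INTO intervals VALUES (?, ?, ?, ?)", intervals)
    # Inner join: a point survives only if one of its user's intervals covers it.
    rows = con.execute("""
        SELECT p.user_id, p.ts, p.lat, p.lon, i.mode
        FROM points p
        JOIN intervals i
          ON p.user_id = i.user_id AND p.ts BETWEEN i.start_ts AND i.end_ts
        ORDER BY p.user_id, p.ts
    """).fetchall()
    con.close()
    return rows
```

Storing timestamps as 'YYYY-MM-DD HH:MM:SS' text keeps BETWEEN correct, because that format sorts lexicographically in time order. If two logged intervals overlap, a point is returned once per matching interval, so overlapping labels would need their own cleaning rule.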

10 Data Processing
Cleaning corrupted data – GPS errors, e.g. the same track appearing at 2 different coordinates at the same time.
Filter Location – keep only the tracks in the Beijing area.
Filter Transportation – keep only the relevant transport modes.
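The three steps above might be sketched like this. The Beijing bounds are the map coordinates from slide 12 and the kept modes are the four listed on slide 13. Two rules are assumptions of this sketch, not of the project: a timestamp that appears with two different coordinates is dropped entirely (we cannot tell which copy is correct), and a track counts as being in the Beijing area only if all of its points fall inside the bounds.

```python
from collections import defaultdict

# Map bounds from slide 12: (lat_min, lat_max, lon_min, lon_max).
BEIJING_BOUNDS = (39.8, 40.1, 116.2, 116.55)
# The four modes kept after filtering (slide 13), lower-cased.
RELEVANT_MODES = {"walk", "bike", "car", "bus"}


def drop_conflicting_timestamps(points):
    """Drop timestamps seen with 2+ different coordinates; keep one copy of exact repeats.

    points: list of (ts, lat, lon) for one track, in time order.
    """
    coords_at = defaultdict(set)
    for ts, lat, lon in points:
        coords_at[ts].add((lat, lon))
    cleaned, seen = [], set()
    for ts, lat, lon in points:
        if len(coords_at[ts]) > 1 or ts in seen:
            continue  # conflicting timestamp, or a duplicate already kept
        seen.add(ts)
        cleaned.append((ts, lat, lon))
    return cleaned


def in_beijing(lat, lon, bounds=BEIJING_BOUNDS):
    lat_min, lat_max, lon_min, lon_max = bounds
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


def keep_track(points, mode):
    """Keep a track only if its mode is relevant and every point lies inside the map."""
    return mode.lower() in RELEVANT_MODES and all(in_beijing(lat, lon) for _, lat, lon in points)
```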

11 Data Manipulation
LEVEL 1 – Tracks were split according to their transport mode: a track with n different transport modes was separated into n different tracks.
LEVEL 2 – A jump (in distance and time) between two sequential points leads to splitting the track into 2 different tracks.
LEVEL 3 – The Beijing map was divided into small cubes, to improve the learning data by grouping data from the same (small) area together.
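LEVEL 1 and LEVEL 2 can be combined into a single pass over a time-ordered point list, as in this sketch. The thresholds max_gap_s (10 minutes) and max_jump_m (1 km) are hypothetical values chosen for illustration, and a split is triggered when either the time gap or the distance jump is too large. Splitting at every mode change gives one track per contiguous mode segment, which matches the slide's n tracks for n modes when each mode occurs in a single stretch; a walk, bus, walk sequence would yield three tracks.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (mean Earth radius 6,371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = phi2 - phi1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6_371_000.0 * math.asin(math.sqrt(a))


def split_track(points, max_gap_s=600, max_jump_m=1000.0):
    """Split a time-ordered list of (t_seconds, lat, lon, mode) into sub-tracks.

    LEVEL 1: start a new track whenever the transport mode changes.
    LEVEL 2: start a new track on a time gap > max_gap_s or a jump > max_jump_m.
    """
    tracks, current = [], []
    for point in points:
        if current:
            t0, lat0, lon0, mode0 = current[-1]
            t1, lat1, lon1, mode1 = point
            if (mode1 != mode0
                    or t1 - t0 > max_gap_s
                    or haversine_m(lat0, lon0, lat1, lon1) > max_jump_m):
                tracks.append(current)
                current = []
        current.append(point)
    if current:
        tracks.append(current)
    return tracks
```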

12 Beijing Map
Map coordinates (latitude, longitude):
Top (north-east corner) – (40.1, 116.55)
Bottom (south-west corner) – (39.8, 116.2)
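LEVEL 3 divides this bounding box into small cells. A minimal sketch, assuming a square grid in degrees with a hypothetical cell size of 0.01° (roughly 1 km) and assigning each track to the cell of its first point; neither choice comes from the slides, and cell_of and group_tracks_by_cell are hypothetical names.

```python
import math

LAT_MIN, LAT_MAX = 39.8, 40.1     # bottom and top latitude (slide 12)
LON_MIN, LON_MAX = 116.2, 116.55  # bottom and top longitude (slide 12)


def cell_of(lat, lon, cell_deg=0.01):
    """Return the (row, col) grid cell of a point, or None if it is outside the map."""
    if not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX):
        return None
    # Round before ceil so float noise (e.g. 30.0000000000004) does not add a cell.
    n_rows = math.ceil(round((LAT_MAX - LAT_MIN) / cell_deg, 9))
    n_cols = math.ceil(round((LON_MAX - LON_MIN) / cell_deg, 9))
    # Points exactly on the top/right edge are folded into the last row/column.
    row = min(int((lat - LAT_MIN) / cell_deg), n_rows - 1)
    col = min(int((lon - LON_MIN) / cell_deg), n_cols - 1)
    return row, col


def group_tracks_by_cell(tracks, cell_deg=0.01):
    """Group tracks (lists of (lat, lon)) by the cell of their first point."""
    groups = {}
    for track in tracks:
        cell = cell_of(*track[0], cell_deg=cell_deg)
        if cell is not None:
            groups.setdefault(cell, []).append(track)
    return groups
```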

13 Transportation Modes Walk, Bike, Car, Bus
* These are the modes remaining after filtering; the original transportation modes also include Boat, Airplane, etc.

14 Beijing Map – Walk Tracks

15 Beijing Map – Bike Tracks

16 Beijing Map – Car Tracks

17 Beijing Map – Bus Tracks

18 Transport Distribution

19 Walk Tracks

20 Bike Tracks

21 Car Tracks

22 Bus Tracks

23 HBOS Algorithm What: Histogram-based Outlier Score, a fast unsupervised anomaly detection algorithm. Why: relatively good results and an easy implementation.

24 HBOS Algorithm How: The HBOS algorithm is based on the common assumption that having an uncommon value, regardless of whether it is high or low, makes a track relatively anomalous. Suppose we have 20 cars driving in some dense urban area inside Beijing, with the following behavior: A) 18 cars drive at an average speed of 30 km/h. B) 2 cars drive at an average speed of 50 km/h.

25 HBOS Algorithm X axis: average speed, with bins 10 km/h wide. Y axis: frequency.
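The histogram on this slide can be reproduced in a few lines of Python for the 20-car example. Bins are 10 km/h wide and keyed by their lower edge; histogram is a hypothetical helper name.

```python
import math
from collections import Counter


def histogram(values, bin_width=10.0):
    """Count values per fixed-width bin; each bin is keyed by its lower edge."""
    return Counter(math.floor(v / bin_width) * bin_width for v in values)


# The example from the slides: 18 cars at 30 km/h and 2 cars at 50 km/h.
speeds = [30.0] * 18 + [50.0] * 2
print(sorted(histogram(speeds).items()))  # [(30.0, 18), (50.0, 2)]
```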

26 HBOS Algorithm From the HBOS point of view, the 2 cars driving at an average speed of 50 km/h can be considered irregular, i.e. anomalous. Anomaly detection in HBOS is based on grades, and the grades are derived from the histograms.

27 HBOS Algorithm Each of the 18 tracks in group A (cars driving at 30 km/h) gets a grade of 1/18 (1 divided by the height of its bin). Group B tracks get a grade of 1/2. Tracks with a high grade are marked as potentially anomalous. When dealing with multiple attributes (e.g. average speed and direction), the idea is to calculate each attribute's grade separately and then multiply the grades into one final grade.
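A sketch of the grading scheme described above: per attribute, each track's grade is 1 divided by the height of its bin, and the final grade is the product over the attributes, grade(t) = product over i of 1 / h_i(t). The function names, the attribute keys, and the 45° direction bin width are hypothetical. The commonly cited HBOS formulation instead sums log(1 / h_i) over histograms normalized to a maximum height of 1; since the logarithm is monotonic and the normalization only rescales each attribute by a constant, that ranks tracks in the same order as this product.

```python
import math
from collections import Counter


def attribute_grades(values, bin_width):
    """Grade each value as 1 / (height of its fixed-width histogram bin)."""
    bins = [math.floor(v / bin_width) for v in values]
    heights = Counter(bins)
    return [1.0 / heights[b] for b in bins]


def final_grades(tracks, bin_widths):
    """Multiply the per-attribute grades of each track into one final grade.

    tracks:     list of dicts, attribute name -> value
    bin_widths: dict, attribute name -> bin width for that attribute
    """
    final = [1.0] * len(tracks)
    for attr, width in bin_widths.items():
        grades = attribute_grades([t[attr] for t in tracks], width)
        final = [f * g for f, g in zip(final, grades)]
    return final


# Slide example: 18 cars at 30 km/h and 2 at 50 km/h, all heading the same way.
cars = [{"speed": 30.0, "direction": 90.0}] * 18 + [{"speed": 50.0, "direction": 90.0}] * 2
grades = final_grades(cars, {"speed": 10.0, "direction": 45.0})
```

For this example, group A tracks get 1/18 × 1/20 = 1/360 and group B tracks get 1/2 × 1/20 = 1/40, so the two 50 km/h cars rank highest. Direction is binned here as a plain number, so 359° and 1° fall into different bins; a circular binning may suit that attribute better.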

28 HBOS Algorithm We calculated grades for each track for the following 3 attributes: 1) Average speed rate 2) Direction 3) Ineffectiveness

29 The Attributes 1) Average speed – compute the speed between every 2 sequential points, then average these speeds. 2) Direction – the direction measured from the first point to the last point of the track. 3) Ineffectiveness – the sum of the distances between every 2 sequential points in the track, divided by the distance between the first and last points of the track.
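The three attributes can be computed as follows. This is a sketch under stated assumptions: points are (t_seconds, lat, lon) tuples, distances use the haversine great-circle formula with a 6,371 km Earth radius, speeds are reported in km/h, and direction is taken to be the initial compass bearing from the first to the last point (0° north, 90° east); the slide does not say how direction was measured.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = phi2 - phi1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def average_speed_kmh(track):
    """Mean of the speeds between every 2 sequential points."""
    speeds = []
    for (t1, la1, lo1), (t2, la2, lo2) in zip(track, track[1:]):
        dt = t2 - t1
        if dt > 0:  # skip zero-duration pairs to avoid dividing by zero
            speeds.append(haversine_m(la1, lo1, la2, lo2) / dt * 3.6)
    return sum(speeds) / len(speeds) if speeds else 0.0


def direction_deg(track):
    """Initial compass bearing from the first to the last point, in [0, 360)."""
    _, la1, lo1 = track[0]
    _, la2, lo2 = track[-1]
    phi1, phi2 = math.radians(la1), math.radians(la2)
    dlmb = math.radians(lo2 - lo1)
    x = math.sin(dlmb) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0


def ineffectiveness(track):
    """Path length divided by the straight-line distance from first to last point."""
    path = sum(haversine_m(la1, lo1, la2, lo2)
               for (_, la1, lo1), (_, la2, lo2) in zip(track, track[1:]))
    straight = haversine_m(track[0][1], track[0][2], track[-1][1], track[-1][2])
    return path / straight if straight > 0 else math.inf  # closed loop -> infinity
```

A perfectly straight track has an ineffectiveness of 1, and the value grows as the path wanders; a track that ends where it started has a zero straight-line distance, which this sketch maps to infinity.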

30 Walk – average speed

31 Walk – direction

32 Walk – ineffectiveness

33 Walk – final grades histogram

34 Anomalies The next slides present anomalies according to:
1) Average speed grades 2) Direction grades 3) Ineffectiveness grades 4) Final grades. Anomalous tracks are highlighted in red (2 percent of all walking tracks).
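Marking 2 percent of the tracks as anomalous amounts to taking the top-graded slice. This sketch assumes the highlighted tracks are the ones with the highest final grades and rounds the count to the nearest whole track (at least one); the slides do not say how rounding or ties were handled, and top_fraction is a hypothetical name.

```python
def top_fraction(grades, fraction=0.02):
    """Return the ids of the highest-graded tracks, covering `fraction` of all tracks.

    grades: dict mapping track id -> final HBOS grade (higher = more anomalous).
    At least one id is returned when grades is non-empty.
    """
    if not grades:
        return []
    k = max(1, round(len(grades) * fraction))
    ranked = sorted(grades, key=grades.get, reverse=True)
    return ranked[:k]
```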

35 Average speed grades anomalies

36 Direction grades anomalies

37 Ineffectiveness grades anomalies

38 Final grades anomalies

39 Final grades anomalies
User id = 31
Track id = 264
Average speed =
Direction =
Ineffectiveness =
Final grade = 0.002

40 Final grades anomalies
User id = 31
Track id = 119
Average speed =
Direction =
Ineffectiveness =
Final grade =

41 Problems 1) Selecting attributes 2) Weighting attributes
3) Size of bins 4) Dividing into cubes – different topography inside each cube

