Erik de Nooij, IT Chapter Lead Fraud&Cybersec.


1 StreamING models
Real-time model deployment of ML capabilities
Erik de Nooij, IT Chapter Lead Fraud&Cybersec.

2 Who Am I?
IT Chapter Lead within the Fraud & Cybersecurity department, based in Amsterdam
Before ING: implemented enterprise software, mainly knowledge management and CRM related
Background in: Scala, Java, C# (MCSD), Tomcat, WebSphere, Oracle, Cassandra and now… Flink

3 About ING

4 About ING
Worldwide: 35 million customers, 51,000 employees, presence in over 40 countries
Netherlands: 9 million customers
Billion logins yearly on 1 million transactions per day
The Netherlands; Market leaders Benelux; Growth markets; Commercial Banking; Challengers

5 Threats related to fraud & cybersecurity
Threats: Fake ID, Skimming, Phishing, APT, ?
Criminal organization: Individuals, Small groups, Worldwide groups, Organized crime
Response: Manual detection, Rule-based detection, Model-based detection, Anomaly detection

6 Carbanak APT (Advanced Persistent Threat)
This started via a phishing …

7 Goals
Support various types of (ML) models
  Tools to create models versus scoring models
One codebase, SaaS deployment model
Make changes instantly (no downtime)
Multiple domains

8 Goals
Support various types of (ML) models
One codebase, SaaS deployment model
  Pre-processor, decoupled architecture
Make changes instantly (no downtime)
Multiple domains

9 Goals
Support various types of (ML) models
One codebase, SaaS deployment model
Make changes instantly (no downtime)
  Use case
  Feature extraction
  Enriching streams
  End user tooling
  Demo
Multiple domains

10 Goals
Support various types of (ML) models
One codebase, SaaS deployment model
Make changes instantly (no downtime)
Multiple domains
  Examples

11 Support various types of models

12 Creating models offline, scoring online
Offline: model creation (HDFS)
Online: model execution (streaming platform)
Portable model: <PMML /> {PFA}

13 Predictive Model Markup Language (PMML)
The Predictive Model Markup Language (PMML) is an XML-based predictive model interchange format.

Rule: if field1 > 500 AND field2 == 1 AND field3 > 1

    <SimpleRule score="Alert" weight="1.0">
      <CompoundPredicate booleanOperator="and">
        <SimplePredicate field="field1" operator="greaterThan" value="500"/>
        <SimplePredicate field="field2" operator="equal" value="1"/>
        <SimplePredicate field="field3" operator="greaterThan" value="1"/>
      </CompoundPredicate>
    </SimpleRule>
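To make the rule on this slide concrete, here is a minimal Python sketch that evaluates the SimpleRule fragment by hand. It is not how a PMML library scores a model: it handles only the two operators and the "and" predicate shown on the slide, it ignores the PMML namespace that a full document carries, and the fallback score "OK" and the names `rule_fires` and `score` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The SimpleRule fragment from the slide, as a standalone snippet.
RULE_XML = """
<SimpleRule score="Alert" weight="1.0">
  <CompoundPredicate booleanOperator="and">
    <SimplePredicate field="field1" operator="greaterThan" value="500"/>
    <SimplePredicate field="field2" operator="equal" value="1"/>
    <SimplePredicate field="field3" operator="greaterThan" value="1"/>
  </CompoundPredicate>
</SimpleRule>
"""

# Only the two operators used on the slide are handled in this sketch.
OPERATORS = {
    "greaterThan": lambda left, right: left > right,
    "equal": lambda left, right: left == right,
}


def rule_fires(rule, features):
    """Return True when every SimplePredicate in the rule holds."""
    compound = rule.find("CompoundPredicate")
    if compound.get("booleanOperator") != "and":
        raise ValueError("this sketch only handles booleanOperator='and'")
    return all(
        OPERATORS[p.get("operator")](
            float(features[p.get("field")]), float(p.get("value"))
        )
        for p in compound.findall("SimplePredicate")
    )


def score(rule_xml, features, default="OK"):
    """Return the rule's score if it fires, else the assumed default."""
    rule = ET.fromstring(rule_xml.strip())
    return rule.get("score") if rule_fires(rule, features) else default


print(score(RULE_XML, {"field1": 600, "field2": 1, "field3": 2}))
print(score(RULE_XML, {"field1": 600, "field2": 0, "field3": 2}))
```

The first call satisfies all three predicates, the second fails on field2. In production the scoring would be delegated to a PMML library rather than re-implemented like this.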

14 Predictive Model Markup Language (PMML)
The Predictive Model Markup Language (PMML) is an XML-based predictive model interchange format

15 Machine learning tools supporting PMML

16 Model scoring using OpenScoring.io library
Parse the PMML file(s)
Pass on the Feature Set to the model(s)
Run the ‘predict’ function, which returns the output of the model(s)
Diagram: control stream and data stream (feature sets) -> model scoring -> score

17 Supported models (*)
Association rules
Regression
Cluster model
Rule set
General regression
Scorecard
Naive Bayes
Support Vector Machine
k-Nearest neighbours
Tree model
Neural network
Ensemble model
(*) supported models by

18 Goals
Use of various types of models
One codebase, SaaS deployment model
  Pre-processor, decoupled architecture
Make changes instantly (no downtime)
Multiple domains

19 One Bank Strategy
Market leaders Benelux, Challengers, Growth markets, Commercial Banking

20 How flexible is this architecture?
Incoming events carry the amount in different formats: Amount = “42,00”, Amountincents = 4200, Amount = 42.00
All of them feed Feature extraction & Model scoring

21 Decoupled architecture
Business events (Amount = “42.00”, Amountincents = 4200, Amount = 42.00) -> Pre-processor -> Amountincents = 4200 -> Feature extraction & Model scoring
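The pre-processor step can be sketched as a small normalisation function. This is an illustration under assumptions, not the actual pre-processor: the function name `to_cents` and the dictionary-shaped events are hypothetical, and only the three input shapes from the slides are handled (a string with a decimal comma or point, a numeric amount, or an amount already in cents). Thousands separators such as "1.000,00" are deliberately not handled.

```python
from decimal import Decimal


def to_cents(event):
    """Return a copy of the event that carries only Amountincents."""
    if "Amountincents" in event:
        cents = int(event["Amountincents"])
    else:
        raw = event["Amount"]
        # Strings may use a decimal comma ("42,00") or a decimal point ("42.00");
        # numeric values are converted through their string form to avoid
        # binary floating point artefacts.
        text = raw.replace(",", ".") if isinstance(raw, str) else str(raw)
        cents = int((Decimal(text) * 100).to_integral_value())
    normalized = {
        key: value
        for key, value in event.items()
        if key not in ("Amount", "Amountincents")
    }
    normalized["Amountincents"] = cents
    return normalized


for business_event in ({"Amount": "42,00"}, {"Amount": 42.00}, {"Amountincents": 4200}):
    print(to_cents(business_event))
```

Because every variant leaves the pre-processor in the same shape, feature extraction and model scoring only ever need to understand Amountincents.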

22 Goals
Use of various types of models
One codebase, SaaS deployment model
Make changes instantly (no downtime)
  Use case
  Feature extraction
  Enriching streams
  End user tooling
  Demo
Multiple domains

23 Use case
Your phone with the banking app installed is stolen
Limit on the banking app is 1.000,-
Funds are transferred from your account (A) to a mule account (B)

24 Model features and model output
Model features: Amount > 500, NrOf Trxs Last 1h, First Trx <24h ago
Model output: Alert || OK

25 Stream with stateless operators
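Event: B 1000 Ev.1
Pipeline: FeX (feature extraction) -> PMML (model scoring)
Features (Amount, Unknown, PrevTrxs) = (1000, ?, ?)
Without state, only Amount can be derived from the event itself; Unknown and PrevTrxs stay undetermined.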

26 Stream with stateful operators
Events: B 1000 Ev.1, then A B 1000 Ev.2
Pipeline: FeX -> PMML (model scoring) -> Alert || OK
Features (Amount, Unknown, PrevTrxs): Ev.1 = (1000, true, 0), Ev.2 = (1000, true, 1)
STATE after Ev.1:
  (A,B, FirstTrx) = Ev.1
  (A,B, HistoricalTrxs) = ev1 1000
STATE after Ev.2:
  (A,B, FirstTrx) = Ev.1
  (A,B, HistoricalTrxs) = ev1 1000, ev2 1000
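The role of the keyed state can be sketched in plain Python. This is a simplified stand-in for a stateful stream operator, not the actual implementation: the class name and the in-memory dictionaries are hypothetical, and "Unknown" is interpreted here as "the first transaction from A to B was less than 24 hours ago", which is an assumption based on the feature list of the use case.

```python
from collections import defaultdict
from datetime import datetime, timedelta


class StatefulFeatureExtractor:
    """Keeps per (source, dest) state and derives the three model features."""

    def __init__(self):
        self.first_trx = {}               # (source, dest) -> time of first transaction
        self.history = defaultdict(list)  # (source, dest) -> [(time, amount), ...]

    def process(self, source, dest, amount, event_time):
        key = (source, dest)
        # Remember the first transaction for this pair; later events reuse it.
        first = self.first_trx.setdefault(key, event_time)
        unknown = event_time - first < timedelta(hours=24)
        # Count earlier transactions for this pair within the last hour.
        window_start = event_time - timedelta(hours=1)
        prev_trxs = sum(
            1 for seen_at, _ in self.history[key] if window_start <= seen_at <= event_time
        )
        self.history[key].append((event_time, amount))
        return (amount, unknown, prev_trxs)


extractor = StatefulFeatureExtractor()
start = datetime(2017, 1, 1, 12, 0)
print(extractor.process("A", "B", 1000, start))                          # Ev.1
print(extractor.process("A", "B", 1000, start + timedelta(minutes=10)))  # Ev.2
```

In a streaming engine the two dictionaries would be keyed state managed by the platform, so that each key's history lives on exactly one parallel instance.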

27 How to perform aggregate functions on a stream?
Average amount last week: € 37,04
Max amount last month: € 834,12
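One common way to answer this question is to keep only the events that fall inside a time window and aggregate over them. The sketch below illustrates that idea and is not the approach confirmed by the slides: the class `WindowedAmounts` is hypothetical, amounts are assumed to be integers in cents, and events are assumed to arrive in event-time order.

```python
from collections import deque
from datetime import datetime, timedelta


class WindowedAmounts:
    """Holds the amounts seen within a sliding time window."""

    def __init__(self, window):
        self.window = window
        self.events = deque()  # (event_time, amount), oldest first

    def add(self, event_time, amount):
        self.events.append((event_time, amount))
        self._evict(event_time)

    def _evict(self, now):
        # Drop events that are older than the window, relative to the newest event.
        while self.events and self.events[0][0] < now - self.window:
            self.events.popleft()

    def average(self):
        if not self.events:
            return None
        return sum(amount for _, amount in self.events) / len(self.events)

    def maximum(self):
        return max((amount for _, amount in self.events), default=None)


last_week = WindowedAmounts(timedelta(days=7))
start = datetime(2017, 1, 1)
last_week.add(start, 1000)
last_week.add(start + timedelta(days=1), 3000)
print(last_week.average(), last_week.maximum())
```

A second instance with a window of roughly a month would serve the "max amount last month" feature. Storing every event is the simplest option; pre-aggregated buckets per hour or day would trade precision for less state.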

28 Enriching the stream based on multiple keys
Event: A B IP 1000 Ev.1
Split: the event is split per key (A, B, A.B, IP)
Calculating features: A -> A’, B -> B’, A.B -> A.B’, IP -> IP’
Aggregation step: the enriched parts are brought together again
Accounts are distributed across the task managers:
  A, E, I ..  and 192.x.x.1, 192.x.x.5
  B, D, F ..  and 192.x.x.2, 192.x.x.6
  C, G, H ..  and 192.x.x.3, 192.x.x.7
  J, K ..     and 192.x.x.4, …….
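The split step can be sketched as follows. This is an illustration under assumptions: the field names `source`, `dest` and `ip`, both function names, and the CRC32-based partitioning are hypothetical stand-ins. A streaming engine applies its own key hashing, so only the principle carries over: the same key always lands on the same task manager.

```python
import zlib


def split_by_keys(event):
    """Emit one (key, event) pair for every key that features are kept on."""
    return [
        (event["source"], event),                       # A
        (event["dest"], event),                         # B
        (event["source"] + "." + event["dest"], event), # A.B
        (event["ip"], event),                           # IP
    ]


def task_manager_for(key, parallelism):
    """Pick a partition with a stable hash so a key is always routed the same way."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


event = {"id": "Ev.1", "source": "A", "dest": "B", "ip": "192.0.2.1", "amount": 1000}
for key, part in split_by_keys(event):
    print(key, "-> task manager", task_manager_for(key, 4))
```

Each part is enriched with the state for its own key, and a later aggregation step joins the parts of the same event (here identified by `id`) before model scoring.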

29 Aggregating and model scoring
Event: A B IP 1000 Ev.1
Aggregation: the enriched parts of the event (B’, A.B’, IP’, ….) are combined, e.g. (A.B’, 1000) and (B’)
Model scoring inputs: Amount, (A.B).FirstTrx, (A.B).NrTrxs

30 Domain Specific Language (DSL)
A DSL is a domain specific language. We use it to define the behaviour of our operators:
The persist rules (which data to store within state)
Feature calculation rules
Model definition rules

31 Definition instead of code - Persist rule
history[double, := $amount

32 Feature Calculation rules
NrOf Trxs Last 1h
  $eventtime,$eventtime-1hour));
First Trx A to B <24h
  @(sourceAccntNr.destAccntNr).FirstUsed >= $eventtime-24hours;

33 Creating models offline, scoring online
Offline: model creation by a data scientist with offline tooling (HDFS)
Online: model execution (streaming platform)
Passed from offline to online: DSL and portable model <PMML /> {PFA}

34 Control streams

35 Streaming in the definitions
DSL files (model definitions, feature calculation rules, persist rules) are broadcast to the Split and FeX & Model scoring operators
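The effect of broadcasting definitions over a control stream can be sketched without a streaming engine. Everything here is a simplified assumption: `ScoringOperator` and `broadcast` are hypothetical names, and plain Python callables stand in for the DSL files and portable models that the real control stream carries.

```python
class ScoringOperator:
    """One parallel instance of a scoring step."""

    def __init__(self):
        self.definitions = {}  # definition name -> callable(features) -> score

    def on_control(self, name, definition):
        # A control message adds or replaces a definition while the job keeps running.
        self.definitions[name] = definition

    def on_data(self, features):
        # Every data event is scored against whatever definitions are current.
        return {name: rule(features) for name, rule in self.definitions.items()}


def broadcast(operators, name, definition):
    """Deliver the same control message to every parallel instance."""
    for operator in operators:
        operator.on_control(name, definition)


instances = [ScoringOperator(), ScoringOperator()]
broadcast(instances, "limit", lambda f: "Alert" if f["amount"] > 500 else "OK")
print(instances[0].on_data({"amount": 1000}))

# An end user raises the threshold; no instance is restarted.
broadcast(instances, "limit", lambda f: "Alert" if f["amount"] > 2000 else "OK")
print(instances[1].on_data({"amount": 1000}))
```

Because each instance swaps the definition in place when the control message arrives, the next data event is already scored with the new version, which is what makes changes possible without downtime.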

36 Demo

37 Goals
Use of various types of models
One codebase, SaaS deployment model
Make changes instantly (no downtime)
Multiple domains

38 Multiple domains – ponder on this
We have built a feature-extraction engine and used that to make a Fraud-Risk Engine.
Can we also build this?
Customer Notifications?
Calculating RFQs for Bond Prices?
Product Fulfilment engine?
Other?

39 Takeaways
Decoupled architecture with pre-processor
Enriching events with multiple keys
End users making changes
Multiple domains

