1
StreamING models
Realtime model deployment of ML capabilities
Erik de Nooij, IT Chapter Lead Fraud & Cybersecurity
2
Who Am I?
IT Chapter Lead within the Fraud & Cybersecurity department, based in Amsterdam
Before ING: implemented enterprise software, mainly knowledge management and CRM related
Background in: Scala, Java, C# (MCSD), Tomcat, WebSphere, Oracle, Cassandra and now... Flink
3
About ING
4
About ING
Worldwide
  35 million customers
  51,000 employees
  Presence in over 40 countries
Netherlands
  9 million customers
  … billion logins yearly
  1 million transactions per day
Map labels: The Netherlands; Market leaders Benelux; Growth markets; Commercial Banking; Challengers
5
Threats related to fraud & cybersecurity
Fake ID, Skimming, Phishing, APT, ?
Individuals, Small groups, Criminal organization, Organized crime, worldwide groups
Response: Manual detection, Rule based detection, Model based detection, Scanomaly detection
6
Carbanak APT (Advanced Persistent Threat)
This started via a phishing …
7
Goals
Support various types of (ML) models
  Tools to create models versus scoring models
One codebase, SaaS deployment model
Make changes instantly (no downtime)
Multiple domains
8
Goals
Support various types of (ML) models
One codebase, SaaS deployment model
  Pre-processor, Decoupled architecture
Make changes instantly (no downtime)
Multiple domains
9
Goals
Support various types of (ML) models
One codebase, SaaS deployment model
Make changes instantly (no downtime)
  Use case
  Feature extraction
  Enriching streams
  End user tooling
  Demo
Multiple domains
10
Goals
Support various types of (ML) models
One codebase, SaaS deployment model
Make changes instantly (no downtime)
Multiple domains
  Examples
11
Support various types of models
12
Creating models offline, scoring online
Offline: model creation (HDFS)
Online: model execution (streaming platform)
Portable model: <PMML />, {PFA}
13
Predictive Model Markup Language (PMML)
The Predictive Model Markup Language (PMML) is an XML-based predictive model interchange format.
Example rule: if field1 > 500 AND field2 == 1 AND field3 > 1
<SimpleRule score="Alert" weight="1.0">
  <CompoundPredicate booleanOperator="and">
    <SimplePredicate field="field1" operator="greaterThan" value="500"/>
    <SimplePredicate field="field2" operator="equal" value="1"/>
    <SimplePredicate field="field3" operator="greaterThan" value="1"/>
  </CompoundPredicate>
</SimpleRule>
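To make the rule on this slide concrete, here is a minimal Python sketch that evaluates that one SimpleRule fragment against a feature set. It is not a PMML engine and not the Openscoring API: it only handles the two operators and the "and" combination that appear on the slide, and it assumes the fragment has no XML namespace. The names rule_fires and score and the default outcome "OK" are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# The SimpleRule fragment from the slide (no namespace, unlike a full PMML file).
RULE_XML = """
<SimpleRule score="Alert" weight="1.0">
  <CompoundPredicate booleanOperator="and">
    <SimplePredicate field="field1" operator="greaterThan" value="500"/>
    <SimplePredicate field="field2" operator="equal" value="1"/>
    <SimplePredicate field="field3" operator="greaterThan" value="1"/>
  </CompoundPredicate>
</SimpleRule>
"""

# Only the two operators used on the slide are supported in this sketch.
OPERATORS = {
    "greaterThan": lambda actual, expected: actual > expected,
    "equal": lambda actual, expected: actual == expected,
}


def rule_fires(rule, features):
    """Return True when every SimplePredicate in the rule holds for the features."""
    compound = rule.find("CompoundPredicate")
    if compound.get("booleanOperator") != "and":
        raise ValueError("this sketch only handles booleanOperator='and'")
    return all(
        OPERATORS[pred.get("operator")](
            float(features[pred.get("field")]), float(pred.get("value"))
        )
        for pred in compound.findall("SimplePredicate")
    )


def score(rule_xml, features, default="OK"):
    """Return the rule's score when it fires, otherwise the (assumed) default."""
    rule = ET.fromstring(rule_xml.strip())
    return rule.get("score") if rule_fires(rule, features) else default
```

Calling score(RULE_XML, {"field1": 600, "field2": 1, "field3": 2}) yields the rule's score attribute; changing field2 to 0 falls through to the default. A real deployment would hand the full PMML document to a PMML library rather than interpret it by hand.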
14
Predictive Model Markup Language (PMML)
The Predictive Model Markup Language (PMML) is an XML-based predictive model interchange format
15
Machine learning tools supporting PMML
16
Model scoring using OpenScoring.io library
Parse the PMML file(s)
Pass on the Feature Set to the model(s)
Run the ‘predict’ function, which returns the output of the model(s)
Diagram labels: Control stream, Data stream, Feature sets, Model scoring, Score
17
Supported models(*)
Association rules
Cluster model
General regression
Naive Bayes
k-Nearest neighbours
Neural network
Regression
Rule set
Scorecard
Support Vector Machine
Tree model
Ensemble model
(*) supported models by
18
Goals
Use of various types of models
One codebase, SaaS deployment model
  Pre-processor, Decoupled architecture
Make changes instantly (no downtime)
Multiple domains
19
One Bank Strategy
Market leaders Benelux
Challengers
Growth markets
Commercial Banking
20
How flexible is this architecture?
Incoming formats: Amount = “42,00”; Amountincents = 4200; Amount = 42.00
Feature extraction & Model scoring
21
Decoupled architecture
Incoming formats: Amount = “42.00”; Amountincents = 4200; Amount = 42.00
Pre-processor
Business events: Amountincents = 4200
Feature extraction & Model scoring
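A pre-processor of this kind can be sketched in a few lines of Python. The sketch assumes the comma or dot is only ever a decimal separator (no thousands separators) and that amounts have at most two decimals. The function names and the event field names "amount" and "amount_in_cents" are hypothetical, chosen to mirror the slide.

```python
from decimal import Decimal


def amount_in_cents(raw):
    """Normalise '42,00', '42.00' or 42.00 to an integer number of cents.

    Assumption: ',' and '.' are decimal separators only.
    """
    text = str(raw).strip().replace(",", ".")
    return int((Decimal(text) * 100).to_integral_value())


def preprocess(event):
    """Turn a raw event into a business event with a canonical amount field."""
    business_event = {k: v for k, v in event.items() if k != "amount"}
    # An upstream system may already deliver cents; otherwise convert.
    if "amount_in_cents" not in business_event:
        business_event["amount_in_cents"] = amount_in_cents(event["amount"])
    return business_event
```

With this in place the feature extraction and model scoring step only ever sees amount_in_cents, so a new source format means a change in the pre-processor and nowhere else.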
22
Goals
Use of various types of models
One codebase, SaaS deployment model
Make changes instantly (no downtime)
  Use case
  Feature extraction
  Enriching streams
  End user tooling
  Demo
Multiple domains
23
Use case
Your phone with the banking app installed is stolen
Limit on the banking app is 1,000
Funds are transferred from your account (A) to a mule account (B)
24
Model features and model output
Features: Amount > 500; NrOf Trxs Last 1h; First Trx <24h ago
Model
Output: Alert || OK
25
Stream with stateless operators
Event: A B 1000 Ev.1
FeX (Feature extraction): Amount, Unknown, PrevTrxs = (1000, ?, ?)
PMML (Model scoring)
26
Stream with stateful operators
Events: A B 1000 Ev.1; A B 1000 Ev.2
FeX (Feature extraction): Amount, Unknown, PrevTrxs = (1000, true, 0) for Ev.1; (1000, true, 1) for Ev.2
PMML (Model scoring): Alert || OK
STATE after Ev.1
  Key | Value
  (A,B, FirstTrx) | Ev.1
  (A,B, HistoricalTrxs) | ev1 1000
STATE after Ev.2
  Key | Value
  (A,B, FirstTrx) | Ev.1
  (A,B, HistoricalTrxs) | ev1 1000, ev2 1000
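The stateful step can be sketched in plain Python, with a dictionary standing in for the keyed state of the streaming platform. This is an illustration under assumptions, not the engine from the talk: "Unknown" is interpreted here as "the first transaction from A to B was less than 24 hours ago", event times are seconds, and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

HOUR = 3600  # event times are assumed to be in seconds


@dataclass
class PairState:
    """State kept per (source, destination) key, as in the STATE table."""
    first_trx_time: Optional[float] = None
    history: List[Tuple[float, int]] = field(default_factory=list)


class FeatureExtractor:
    def __init__(self):
        self.state = {}  # (source, dest) -> PairState

    def process(self, source, dest, amount, event_time):
        """Return (amount, first_trx_within_24h, nr_prev_trxs_last_hour)."""
        st = self.state.setdefault((source, dest), PairState())
        if st.first_trx_time is None:
            st.first_trx_time = event_time
        first_trx_recent = event_time - st.first_trx_time < 24 * HOUR
        # Count earlier transactions for this pair in the last hour.
        prev_last_hour = sum(
            1 for t, _ in st.history if event_time - HOUR <= t <= event_time
        )
        # Persist the current event only after the features are computed.
        st.history.append((event_time, amount))
        return (amount, first_trx_recent, prev_last_hour)
```

Feeding two transfers of 1000 from A to B ten minutes apart gives (1000, True, 0) and then (1000, True, 1), matching the tuples on the slide. Without the state dictionary the last two values cannot be computed, which is the point of the stateless slide before it.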
27
How to perform aggregate functions on a stream?
Average amount last week: € 37,04
Max amount last month: € 834,12
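One common way to answer such questions on a stream is to keep only the events inside the time window and recompute the aggregate from them. The Python sketch below assumes events arrive in event-time order and that times are in seconds. The class name WindowedAmounts is hypothetical, and the talk does not say this is how its engine does it.

```python
from collections import deque


class WindowedAmounts:
    """Keeps the amounts seen in the last `window_seconds` for one key."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (event_time, amount), oldest first

    def add(self, event_time, amount):
        self.events.append((event_time, amount))
        self._evict(event_time)

    def _evict(self, now):
        # Drop everything that has fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            self.events.popleft()

    def average(self):
        if not self.events:
            return None
        return sum(amount for _, amount in self.events) / len(self.events)

    def maximum(self):
        return max((amount for _, amount in self.events), default=None)
```

One instance with a seven-day window gives "average amount last week" and another with a month-long window gives "max amount last month". The cost is memory proportional to the number of events in the window, per key.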
28
Enriching the stream based on multiple keys
Event: A B IP 1000 Ev.1
Split into keys: A, B, A.B, IP
Aggregation step (calculating features) per key: A → A’, B → B’, A.B → A.B’, IP → IP’
State per task manager:
  A,E,I .. | 192.x.x.1, 192.x.x.5
  D,F .. | 192.x.x.2, 192.x.x.6
  C G, H .. | 192.x.x.3, 192.x.x.7
  J, K .. | 192.x.x.4, …….
Accounts are distributed across the task managers
29
Aggregating and model scoring
Event: A B IP 1000 Ev.1
Aggregation: B’, A.B’, B’, IP’, ….; (A.B’, 1000), (B’)
Model Scoring: Amount, (A.B).FirstTrx, (A.B).NrTrxs
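The split, enrich and aggregate flow can be sketched in Python as follows. It is a stand-in for the keyed partitioning a streaming platform provides, not the implementation from the talk. The function names, the event field names and the key prefixes are all hypothetical.

```python
import zlib


def split_by_keys(event):
    """Emit one (key, event) pair per key that features are calculated on."""
    src, dst, ip = event["source"], event["dest"], event["ip"]
    return [
        ("account:" + src, event),
        ("account:" + dst, event),
        ("pair:" + src + "." + dst, event),
        ("ip:" + ip, event),
    ]


def partition_for(key, parallelism):
    """Pick the task manager that owns a key.

    crc32 is stable across processes, unlike Python's built-in hash() on str.
    """
    return zlib.crc32(key.encode("utf-8")) % parallelism


def aggregate(enriched_parts):
    """Merge the per-key feature dictionaries back into one feature set."""
    features = {}
    for part in enriched_parts:
        features.update(part)
    return features
```

Each sub-event is routed with partition_for, enriched by whichever task manager holds the state for that key, and the partial results are merged by aggregate before model scoring. Because routing depends only on the key, the state for account A always lives in one place.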
30
Domain Specific Language (DSL)
A DSL is a domain specific language. We use it to define the behaviour of our operators:
The persist rules (which data to store within state)
Feature calculation rules
Model definition rules
31
Definition instead of code - Persist rule
history[double, := $amount
32
Feature Calculation rules
NrOf Trxs Last 1h
$eventtime,$eventtime-1hour));
First Trx A to B <24h
@(sourceAccntNr.destAccntNr).FirstUsed >= $eventtime-24hours;
33
Creating models offline, scoring online
Offline: model creation by a data scientist with offline tooling (HDFS)
Portable model: DSL, <PMML />, {PFA}
Online: model execution (streaming platform)
34
Control streams
35
Streaming in the definitions
DSL files: Model definitions, Feature calculation rules, Persist rules
Broadcast
Split
Fex & Model scoring
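The idea of streaming definitions in over a control stream, so that they change without a restart, can be sketched in Python like this. Definitions are plain callables here instead of parsed DSL or PMML, and ScoringOperator, on_control, on_event and broadcast are hypothetical names, not the API of any streaming framework.

```python
class ScoringOperator:
    """One parallel instance of the feature extraction and scoring step."""

    def __init__(self):
        self.definitions = {}  # name -> callable(features) -> "Alert" or "OK"

    def on_control(self, name, definition):
        # Add or replace a definition while the operator keeps running.
        self.definitions[name] = definition

    def on_event(self, features):
        # Score the event against whatever definitions are active right now.
        return {name: d(features) for name, d in self.definitions.items()}


def broadcast(operators, name, definition):
    """Deliver one control message to every parallel operator instance."""
    for op in operators:
        op.on_control(name, definition)
```

Broadcasting a definition under an existing name replaces the old one on every instance, so the next data event is scored with the new rule. In a real deployment the control messages would carry the DSL or PMML text and each operator would parse it on arrival.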
36
Demo
37
Goals
Use of various types of models
One codebase, SaaS deployment model
Make changes instantly (no downtime)
Multiple domains
38
Multiple domains – ponder on this
We have built a feature-extraction engine and used that to make a Fraud-Risk Engine
Can we also build this?….
Customer Notifications?
Calculating RFQs for Bond Prices?
Product Fulfilment engine?
Other?
39
Take aways
Decoupled architecture with preprocessor
Enriching events with multiple keys
End users making changes
Multiple domains