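This section applies classification algorithms to fraud detection.

5.4.1 A fraud detection use case in transaction data

Suppose we have the following sample data:

Legitimate transaction descriptions: data/ch05/fraud/descriptions.txt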

AMAZON.COM

USAIRWAY

EXPEDIA TRAVEL

Fraudulent transaction descriptions: data/ch05/fraud/fraud-descriptions.txt

CAFE QWERTY

whole flash

food ASDFG

The training dataset is generated from these two sets:

[Figure: sample of the generated training dataset]

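Each transaction is defined by the following attribute values, in this order:

• User ID.

• Transaction ID.

• Transaction description.

• Transaction amount.

• x coordinate of the transaction location.

• y coordinate of the transaction location.

• A binary variable indicating whether the transaction is fraudulent (true) or legitimate (false).

The goal is to build a classifier that learns from this data how to recognize a fraudulent transaction.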
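
The attribute order listed above matches the colon-separated records that the classifier prints later in this section (for example 1:1:EXPEDIA TRAVEL:63.29:856.0:717.0:false). As a minimal sketch, assuming the data files use that same layout and that descriptions never contain a colon, one record could be parsed as follows. The TxnRecord class and its field names are hypothetical and are not part of the iweb2 code.

```java
/** Hypothetical holder for one transaction record; not part of iweb2. */
final class TxnRecord {
    final String userId;
    final String txnId;
    final String description;
    final double amount;
    final double x;
    final double y;
    final boolean fraud;

    TxnRecord(String userId, String txnId, String description,
              double amount, double x, double y, boolean fraud) {
        this.userId = userId;
        this.txnId = txnId;
        this.description = description;
        this.amount = amount;
        this.x = x;
        this.y = y;
        this.fraud = fraud;
    }

    /** Parses "userId:txnId:description:amount:x:y:fraud" (assumed layout). */
    static TxnRecord parse(String line) {
        String[] f = line.trim().split(":");
        if (f.length != 7) {
            throw new IllegalArgumentException(
                    "expected 7 fields but got " + f.length + ": " + line);
        }
        return new TxnRecord(f[0], f[1], f[2],
                Double.parseDouble(f[3]),
                Double.parseDouble(f[4]),
                Double.parseDouble(f[5]),
                Boolean.parseBoolean(f[6]));
    }

    public static void main(String[] args) {
        TxnRecord t = parse("1:305:CANADIAN PHARMACY:3978.57:52.0:70.0:true");
        System.out.println(t.description + " amount=" + t.amount + " fraud=" + t.fraud);
    }
}
```

Keeping the label as a boolean field makes it easy to separate legitimate from fraudulent records when building per-user statistics.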

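5.4.2 Neural network overview

[Figure: overview of a neural network]

A neural network is made up of nodes (neurons) that have inputs and outputs and are connected to other nodes.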
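
To make the idea of a node concrete, here is a minimal sketch of a single neuron: it multiplies each input by the weight of its incoming link, adds a bias, and passes the sum through an activation function. The sigmoid activation, the Neuron class, and the numbers in main are assumptions chosen for illustration; this section does not say which activation function the iweb2 network uses.

```java
/** Hypothetical single neuron; illustrates the concept only, not iweb2's Node class. */
final class Neuron {
    private final double[] weights;
    private final double bias;

    Neuron(double[] weights, double bias) {
        this.weights = weights.clone();
        this.bias = bias;
    }

    /** Logistic sigmoid, squashing any real number into (0, 1). Assumed activation. */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Weighted sum of the inputs plus the bias, passed through the activation. */
    double fire(double[] inputs) {
        if (inputs.length != weights.length) {
            throw new IllegalArgumentException(
                    "expected " + weights.length + " inputs but got " + inputs.length);
        }
        double sum = bias;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * inputs[i];
        }
        return sigmoid(sum);
    }

    public static void main(String[] args) {
        // Illustrative weights and inputs only.
        Neuron n = new Neuron(new double[]{0.25, 0.25, 0.25}, 1.0);
        System.out.println(n.fire(new double[]{0.3, 0.7, 0.0}));
    }
}
```

A network is then a set of such nodes arranged in layers, where the outputs of one layer become the inputs of the next.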

5.4.3 A working neural network fraud detector

As before, there are three steps: training, validation, and production use:

package com.hankcs;

import iweb2.ch5.usecase.fraud.NNFraudClassifier;
import iweb2.ch5.usecase.fraud.data.TransactionDataset;
import iweb2.ch5.usecase.fraud.data.TransactionLoader;
import iweb2.ch5.usecase.fraud.util.FraudErrorEstimator;

public class ch5_3_FraudNN
{
    public static void main(String[] args) throws Exception
    {
        // Load the training set
        TransactionDataset ds = TransactionLoader.loadTrainingDataset();

        // Collect each user's spending habits
        ds.calculateUserStats();

        //
        // CREATE the classifier
        //

        // The classifier implementation wraps the neural network model
        NNFraudClassifier nnFraudClassifier = new NNFraudClassifier(ds);

        // Give it a name; it is used later when we serialize the classifier
        nnFraudClassifier.setName("MyNeuralClassifier");

        //
        // TRAIN the classifier
        //

        // Configure the transaction attributes used as inputs to the NN:
        // amount, location, and description
        nnFraudClassifier.useDefaultAttributes();

        // Set the number of training iterations, i.e. how many times
        // the data is propagated through the network
        nnFraudClassifier.setNTrainingIterations(10);

        // Start the training
        nnFraudClassifier.train();

        //
        // STORE the classifier
        //

        // Serialize it so that the trained model survives a crash
        nnFraudClassifier.save();

        // Load a previously saved, already trained classifier
        NNFraudClassifier nnClone = NNFraudClassifier.load(nnFraudClassifier.getName());

        // Classify a couple of samples from the training set as a sanity check.
        // Transaction ID 1 should be a legitimate transaction
        nnClone.classify("1");

        // Transaction ID 305 should be a fraudulent transaction
        nnClone.classify("305");

        // Now calculate the error rate on the test set, using a new dataset
        TransactionDataset testDS = TransactionLoader.loadTestDataset();

        // Helper class that evaluates the accuracy of the classifier
        FraudErrorEstimator auditor = new FraudErrorEstimator(testDS, nnClone);

        auditor.run();
    }
}

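The spending habits that ds.calculateUserStats() collects for each user are the minimum and maximum amounts of legitimate transactions, the set of words appearing in legitimate transaction descriptions, and the range and centroid of the transaction locations:

[Figure: per-user statistics computed by calculateUserStats()]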
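
As a rough illustration of the kind of per-user profile described above, the sketch below collects the minimum and maximum amount, the set of description words, and the centroid of the locations from a user's legitimate transactions. The UserProfile class and its method names are hypothetical, the location range is left out for brevity, and the real calculateUserStats() may compute these values differently. The second transaction in main uses made-up values.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical per-user profile; not the iweb2 implementation. */
final class UserProfile {
    private double minAmount = Double.POSITIVE_INFINITY;
    private double maxAmount = Double.NEGATIVE_INFINITY;
    private final Set<String> words = new TreeSet<>();
    private double sumX;
    private double sumY;
    private int count;

    /** Folds one legitimate transaction into the profile. */
    void addLegitimateTxn(String description, double amount, double x, double y) {
        minAmount = Math.min(minAmount, amount);
        maxAmount = Math.max(maxAmount, amount);
        for (String w : description.toUpperCase(Locale.ROOT).split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        sumX += x;
        sumY += y;
        count++;
    }

    double minAmount() { requireData(); return minAmount; }

    double maxAmount() { requireData(); return maxAmount; }

    Set<String> words() { return words; }

    /** Mean x coordinate of all transactions added so far. */
    double centroidX() { requireData(); return sumX / count; }

    /** Mean y coordinate of all transactions added so far. */
    double centroidY() { requireData(); return sumY / count; }

    private void requireData() {
        if (count == 0) {
            throw new IllegalStateException("no transactions added yet");
        }
    }

    public static void main(String[] args) {
        UserProfile p = new UserProfile();
        p.addLegitimateTxn("EXPEDIA TRAVEL", 63.29, 856.0, 717.0);
        p.addLegitimateTxn("AMAZON.COM", 20.00, 844.0, 703.0); // made-up values
        System.out.println(p.words() + " amounts in [" + p.minAmount() + ", " + p.maxAmount()
                + "] centroid=(" + p.centroidX() + ", " + p.centroidY() + ")");
    }
}
```

Only legitimate transactions are added, so the profile describes what normal behaviour looks like for that user.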

Output:

saved classifier in file: C:\iWeb2\data\ch05\MyNeuralClassifier
loaded classifier from file: MyNeuralClassifier
Transaction:
  >> 1:1:EXPEDIA TRAVEL:63.29:856.0:717.0:false
 
Assessment:
  >> This is a VALID_TXN
Transaction:
  >> 1:305:CANADIAN PHARMACY:3978.57:52.0:70.0:true
 
Assessment:
  >> This is a FRAUD_TXN
Total test dataset txns: 1100, Number of fraud txns:100
Classified correctly: 1100, Misclassified valid txns: 0, Misclassified fraud txns: 0

The error rate appears to be 0, but if we replace "BLACK DIAMOND COFFEE" with "TAOBAO" in data/ch05/fraud/test-txns.txt, misclassifications show up:

saved classifier in file: C:\iWeb2\data\ch05\MyNeuralClassifier
loaded classifier from file: MyNeuralClassifier
Transaction:
  >> 1:1:EXPEDIA TRAVEL:63.29:856.0:717.0:false
 
Assessment:
  >> This is a VALID_TXN
Transaction:
  >> 1:305:CANADIAN PHARMACY:3978.57:52.0:70.0:true
 
Assessment:
  >> This is a FRAUD_TXN
 - n_txnamt = 0.33646216373137205 - n_location = 0.6601082057290067 - n_description = 0.0 - userid = 25.0 - txnid = 500523 - txnamt = 63.79 - location_x = 533.0 - location_y = 503.0 - description = TAOBAO --> VALID_TXN
 - n_txnamt = 1.0138677641585399 - n_location = 0.5745841533228392 - n_description = 0.0 - userid = 26.0 - txnid = 500574 - txnamt = 127.97 - location_x = 734.0 - location_y = 507.0 - description = TAOBAO --> VALID_TXN
 - n_txnamt = 0.35626185958254264 - n_location = 0.658153849503683 - n_description = 0.0 - userid = 23.0 - txnid = 500273 - txnamt = 47.76 - location_x = 966.0 - location_y = 991.0 - description = TAOBAO --> VALID_TXN
 - n_txnamt = 0.48453914767096135 - n_location = 0.655796929157372 - n_description = 0.0 - userid = 21.0 - txnid = 500025 - txnamt = 50.47 - location_x = 980.0 - location_y = 996.0 - description = TAOBAO --> VALID_TXN
Total test dataset txns: 1100, Number of fraud txns:100
Classified correctly: 1096, Misclassified valid txns: 4, Misclassified fraud txns: 0

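This happens because the first test set used the same attribute values as the training set, whereas TAOBAO in the second run is a description the classifier has never seen. Of the 39 TAOBAO transactions, 4 were wrongly flagged as fraud.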
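
The counts in the second run can be turned into rates: 4 misclassified out of 1000 legitimate transactions (1100 total minus 100 fraudulent) is 0.4%, while no fraudulent transaction was missed. The sketch below shows that arithmetic; the ErrorRates class is hypothetical and is not how FraudErrorEstimator is implemented.

```java
/** Hypothetical helper that converts the printed counts into rates. */
final class ErrorRates {

    /** Fraction of misclassified items within a group of the given size. */
    static double rate(int misclassified, int groupSize) {
        if (groupSize <= 0 || misclassified < 0 || misclassified > groupSize) {
            throw new IllegalArgumentException(
                    "invalid counts: " + misclassified + " of " + groupSize);
        }
        return (double) misclassified / groupSize;
    }

    public static void main(String[] args) {
        // Counts taken from the second run's output.
        int total = 1100;
        int fraud = 100;
        int valid = total - fraud;
        int misclassifiedValid = 4;
        int misclassifiedFraud = 0;

        System.out.printf("valid flagged as fraud: %.2f%%%n",
                100 * rate(misclassifiedValid, valid));
        System.out.printf("fraud missed: %.2f%%%n",
                100 * rate(misclassifiedFraud, fraud));
        System.out.printf("overall error: %.2f%%%n",
                100 * rate(misclassifiedValid + misclassifiedFraud, total));
    }
}
```

Reporting the two kinds of error separately matters here because fraudulent transactions make up only a small share of the test set.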

5.4.4 Anatomy of the neural network fraud detector

The most important step is training the neural network:

    /**
     * Trains the neural network.
     *
     * @param nIterations number of times the instances are propagated through the network
     */
    private void trainNeuralNetwork(int nIterations)
    {
        for (int i = 1; i <= nIterations; i++)
        {
            for (Instance instance : ts.getInstances().values())
            {
                double[] nnInput = createNNInputs(instance);
                double[] nnExpectedOutput = createNNOutputs(instance);

                nn.train(nnInput, nnExpectedOutput);
            }

            if (verbose)
            {
                System.out.println("finished training pass: " + i + " out of " + nIterations);
            }
        }
    }

Here nn refers to TransactionNN, a neural network built specifically for the fraud detection use case:

    public TransactionNN(String name)
    {
        super(name);

        createNN351();
    }

This network has a 3-5-1 topology:

    /**
     * Three input nodes, five hidden nodes, and one output node.
     */
    private void createNN351()
    {
        // 1. Define Layers, Nodes and Node Biases
        Layer inputLayer = createInputLayer(
                0, // layer id
                3  // number of nodes
        );

        Layer hiddenLayer = createHiddenLayer(
                1, // layer id
                5, // number of nodes
                new double[]{ /* elided */, 1.5, /* elided */, 0.5, /* elided */ } // node biases (an extra weight per node)
        );

        Layer outputLayer = createOutputLayer(
                2, // layer id
                1, // number of nodes
                new double[]{1.5} // node biases
        );

        setInputLayer(inputLayer);
        setOutputLayer(outputLayer);
        addHiddenLayer(hiddenLayer);
 
        // 2. Define links and weights between nodes
        // Id format: <layerId:nodeIdwithinLayer>

        // Weights for links from Input Layer to Hidden Layer.
        // The connections (synapses) between nodes are created one by one.
        setLink("0:0", "1:0", 0.25);
        setLink("0:0", "1:1", -0.5);
        setLink("0:0", "1:2", 0.25);
        setLink("0:0", "1:3", 0.25);
        setLink("0:0", "1:4", -0.5);

        setLink("0:1", "1:0", 0.25);
        setLink("0:1", "1:1", -0.5);
        setLink("0:1", "1:2", 0.25);
        setLink("0:1", "1:3", 0.25);
        setLink("0:1", "1:4", -0.5);

        setLink("0:2", "1:0", 0.25);
        setLink("0:2", "1:1", -0.5);
        setLink("0:2", "1:2", 0.25);
        setLink("0:2", "1:3", 0.25);
        setLink("0:2", "1:4", -0.5);

        // Weights for links from Hidden Layer to Output Layer
        setLink("1:0", "2:0", -0.5);
        setLink("1:1", "2:0", 0.5);
        setLink("1:2", "2:0", -0.5);
        setLink("1:3", "2:0", -0.5);
        setLink("1:4", "2:0", 0.5);

        if (isVerbose())
        {
            System.out.println("NN created");
        }
    }

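In the 3-5-1 topology, the 3 stands for the three inputs: the normalized transaction amount, the Jaccard coefficient of the transaction description, and the distance between the centroid of the user's transactions and the location of the current transaction.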
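
The Jaccard coefficient of two word sets is the size of their intersection divided by the size of their union. The sketch below is a minimal illustration of how a description could be compared with the words seen in a user's legitimate transactions. The Jaccard class is hypothetical, and the whitespace tokenization is an assumption, so the iweb2 code may differ. A description with no known words, such as TAOBAO in the second run above, scores 0, which is consistent with the n_description = 0.0 values in that output.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for the Jaccard coefficient of two word sets. */
final class Jaccard {

    /** Splits text on whitespace into a set of upper-cased words (assumed tokenization). */
    static Set<String> words(String text) {
        Set<String> result = new HashSet<>();
        for (String w : text.toUpperCase(Locale.ROOT).split("\\s+")) {
            if (!w.isEmpty()) {
                result.add(w);
            }
        }
        return result;
    }

    /** |A intersect B| / |A union B|; defined here as 0 when both sets are empty. */
    static double coefficient(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> known = words("EXPEDIA TRAVEL AMAZON.COM");
        System.out.println(coefficient(words("EXPEDIA TRAVEL"), known));
        System.out.println(coefficient(words("TAOBAO"), known));
    }
}
```

Because the result always lies between 0 and 1, it can be fed to the network without further scaling.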

The setLink() method plays a key role here:

    /**
     * Creates a link (synapse) between two nodes.
     *
     * @param fromNodeId source node
     * @param toNodeId destination node
     * @param w weight of the link
     */
    public void setLink(String fromNodeId, String toNodeId, double w)
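
Only the signature of setLink() is shown here, so the following is a sketch of one way such a method could work rather than the BaseNN code. It parses node ids in the <layerId:nodeIdwithinLayer> format used in createNN351() and stores each weight under a key built from the two ids. The LinkTable class, the weight() lookup, and the rule that a link must go from one layer to the next are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical weight table; illustrates the id format, not BaseNN itself. */
final class LinkTable {
    private final Map<String, Double> weights = new HashMap<>();

    /** Splits "layerId:nodeId" into {layerId, nodeId}. */
    static int[] parseNodeId(String nodeId) {
        String[] parts = nodeId.split(":");
        if (parts.length != 2) {
            throw new IllegalArgumentException("bad node id: " + nodeId);
        }
        return new int[]{Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    /** Registers a link; assumes links only connect a layer to the next one. */
    void setLink(String fromNodeId, String toNodeId, double w) {
        int fromLayer = parseNodeId(fromNodeId)[0];
        int toLayer = parseNodeId(toNodeId)[0];
        if (toLayer != fromLayer + 1) {
            throw new IllegalArgumentException(
                    "link must go to the next layer: " + fromNodeId + " -> " + toNodeId);
        }
        weights.put(fromNodeId + "->" + toNodeId, w);
    }

    /** Looks up the weight of a previously registered link. */
    double weight(String fromNodeId, String toNodeId) {
        Double w = weights.get(fromNodeId + "->" + toNodeId);
        if (w == null) {
            throw new IllegalArgumentException(
                    "no link: " + fromNodeId + " -> " + toNodeId);
        }
        return w;
    }

    public static void main(String[] args) {
        LinkTable table = new LinkTable();
        table.setLink("0:0", "1:1", -0.5);
        table.setLink("1:1", "2:0", 0.5);
        System.out.println(table.weight("0:0", "1:1"));
    }
}
```

String ids keep the network definition readable, at the cost of parsing them whenever a link is created.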

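5.4.5 Creating a base class for general neural networks

This is the BaseNN class: the base class of TransactionNN and a general-purpose neural network implementation.

BaseNN (structure): excerpt from the general neural network base class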

    /**
     * Creates the input layer of the network. Takes the layer ID and the number
     * of nodes as arguments and instantiates a BaseLayer object.
     *
     * @param layerId ID of the layer
     * @param nNodes number of nodes in the layer
     * @return the input layer
     */
    public Layer createInputLayer(int layerId, int nNodes)
    {
        BaseLayer baseLayer = new BaseLayer(layerId);
        for (int i = 0; i < nNodes; i++)
        {
            // Node
            Node node = createInputNode(layerId + ":" + i);
            // Synapse (incoming link)
            Link inlink = new BaseLink();
            inlink.setFromNode(node);
            // The initial weight is 1 and stays unchanged during training
            inlink.setWeight(1.0);
            node.addInlink(inlink);
            baseLayer.addNode(node);
        }

        return baseLayer;
    }
    /**
     * Creates a hidden layer of the network. Takes the layer ID, the number of
     * nodes, and the bias values of those nodes as arguments.
     *
     * @param layerId ID of the layer
     * @param nNodes number of nodes in the layer
     * @param bias bias value for each node
     * @return the hidden layer
     */
    public Layer createHiddenLayer(int layerId, int nNodes, double[] bias)
    {
        if (bias.length != nNodes)
        {
            throw new RuntimeException("Each node should have bias defined.");
        }
        BaseLayer baseLayer = new BaseLayer(layerId);
        for (int i = 0; i < nNodes; i++)
        {
            Node node = createHiddenNode(layerId + ":" + i);
            node.setBias(bias[i]);
            baseLayer.addNode(node);
        }
        return baseLayer;
    }
    /**
     * Creates the output layer.
     *
     * @param layerId ID of the layer
     * @param nNodes number of nodes in the layer
     * @param bias bias value for each node
     * @return the output layer
     */
    public Layer createOutputLayer(int layerId, int nNodes, double[] bias)
    {
        if (bias.length != nNodes)
        {
            throw new RuntimeException("Each node should have bias defined.");
        }

        BaseLayer baseLayer = new BaseLayer(layerId);
        for (int i = 0; i < nNodes; i++)
        {
            Node node = createOutputNode(layerId + ":" + i);
            node.setBias(bias[i]);
            baseLayer.addNode(node);
        }