# SAP Data Science

PAL (Predictive Analysis Library)

1) Association Analysis

2) Classification Analysis

3) Regression

4) Cluster Analysis

5) Time Series Analysis

6) Probability Distribution

7) Outlier Detection

8) Link Prediction

9) Data Preparation

10) Statistics Functions (Univariate)

11) Statistics Functions (Multivariate)

APL (Automated Predictive Library)

Data Preparation

IDA (Initial Data Analysis)

Scatter Plot Matrix

Bubble Plot

Parallel Coordinate Plot

Box Plot

EDA (Exploratory Data Analysis)

Missing Values

Outliers

1) Interquartile Range Test

2) Nearest Neighbor Outlier algorithm

IQR (Interquartile Range)

Example:

10, 11, 10, 9, 10, 24, 11, 12, 10, 9, 1, 11, 12, 13, 12

Step 1: Sort values

1, 9, 9, 10, 10, 10, 10, 11, 11, 11, 12, 12, 12, 13, 24

Step 2: Find median

1, 9, 9, 10, 10, 10, 10, **11**, 11, 11, 12, 12, 12, 13, 24

The median of 15 values is the value at position (15 + 1) / 2 = 8

Step 3: Find lower and upper half from median

(1, 9, 9, 10, 10, 10, 10), **11**, (11, 11, 12, 12, 12, 13, 24)

The lower half spans positions 1 to 7 and the upper half spans positions 9 to 15; the median itself belongs to neither half

Step 4: Find LQ and UQ

(1, 9, 9, **10**, 10, 10, 10), **11**, (11, 11, 12, **12**, 12, 13, 24)

LQ and UQ are the medians of the lower and upper halves, at positions 4 and 12 respectively: LQ = 10 and UQ = 12

Step 5: Calculate IQR

IQR = UQ – LQ = 12 – 10 = 2

Step 6: Calculate the lower and upper fences with coefficient = 3

Lower fence = LQ – 3 * 2 = 10 – 6 = 4

Upper fence = UQ + 3 * 2 = 12 + 6 = 18
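
The arithmetic of Steps 5 and 6 follows a general formula, written here with the coefficient as k (the example uses k = 3; k = 1.5 is the other commonly used value, and with it the fences would be 7 and 15, which still flag 1 and 24):

$$
\begin{aligned}
\text{IQR} &= UQ - LQ \\
\text{Lower fence} &= LQ - k \cdot \text{IQR} \\
\text{Upper fence} &= UQ + k \cdot \text{IQR}
\end{aligned}
$$

A value x is flagged as an outlier when x < Lower fence or x > Upper fence. With LQ = 10, UQ = 12 and k = 3 this gives the fences 4 and 18.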

Step 7: Find outliers

Values 1 and 24 are outliers: 1 lies below the lower fence (4) and 24 lies above the upper fence (18)
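
The seven steps above can be sketched in plain Python as follows. This is a minimal sketch, assuming the quartiles are computed by the median-of-halves method used in the example (the overall median is excluded from both halves when the count is odd). `iqr_outliers` and `median_sorted` are hypothetical helper names, and SAP PAL's own IQR test may compute quartiles differently (for example by interpolation).

```python
def median_sorted(values):
    """Median of an already-sorted, non-empty list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def iqr_outliers(data, coefficient=3.0):
    """Hypothetical helper: flag values outside [LQ - c*IQR, UQ + c*IQR]."""
    values = sorted(data)                          # Step 1: sort
    n = len(values)
    half = n // 2
    lower_half = values[:half]                     # Step 3: positions 1..half
    upper_half = values[half + n % 2:]             # skip the median when n is odd
    lq = median_sorted(lower_half)                 # Step 4: LQ
    uq = median_sorted(upper_half)                 #         UQ
    iqr = uq - lq                                  # Step 5: IQR
    lower_fence = lq - coefficient * iqr           # Step 6: fences
    upper_fence = uq + coefficient * iqr
    # Step 7: keep the original order of the input data
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lq, uq, iqr, lower_fence, upper_fence, outliers


data = [10, 11, 10, 9, 10, 24, 11, 12, 10, 9, 1, 11, 12, 13, 12]
print(iqr_outliers(data))  # (10, 12, 2, 4.0, 18.0, [24, 1])
```

The outliers are listed in input order, which is why 24 appears before 1.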

Nearest Neighbor Outlier

Example:

Number of neighbors = 3

Number of outliers to detect = 2

Step 1: For each object, calculate the Euclidean distance to every other object

Step 2: Take the 3 shortest distances (the 3 nearest neighbors) and calculate their average

Step 3: Report the 2 objects with the largest average distance as outliers
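
The three steps above can be sketched as below. This is a minimal illustration, not PAL's implementation: `knn_outliers` is a hypothetical name, the objects are assumed to be numeric points, and the seven 2-D points are made-up sample data (five clustered around (1.5, 1.5) plus two distant points).

```python
import math


def knn_outliers(points, k=3, n_outliers=2):
    """Hypothetical helper: indices of the n_outliers points whose average
    distance to their k nearest neighbors is largest."""
    scores = []
    for i, p in enumerate(points):
        # Step 1: Euclidean distance from p to every other object
        dists = [math.dist(p, q) for j, q in enumerate(points) if j != i]
        # Step 2: average of the k shortest distances
        nearest = sorted(dists)[:k]
        scores.append(sum(nearest) / len(nearest))
    # Step 3: rank objects by average distance, largest first
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_outliers], scores


# Made-up sample data: a tight cluster plus two isolated points
points = [(1, 1), (1, 2), (2, 1), (2, 2), (1.5, 1.5), (8, 8), (0, 9)]
indices, scores = knn_outliers(points, k=3, n_outliers=2)
print(indices)  # [5, 6]
```

Requires Python 3.8+ for `math.dist`. Averaging over k neighbors, rather than using only the single nearest one, makes the score less sensitive to two outliers that happen to sit close to each other.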

Anomaly Detection using K-means clustering

Number of clusters, K

Percentage of outliers to detect
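
A minimal sketch of how the two parameters above could drive K-means-based anomaly detection: cluster the data into K groups, score each point by its distance to its own cluster center, and flag the given percentage of highest-scoring points. This scoring rule is an assumption for illustration and may differ from SAP PAL's exact method. `kmeans` and `kmeans_anomalies` are hypothetical names, the initialization with the first K points is a simplification chosen to keep the result deterministic, and the nine 2-D points are made-up sample data.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain Lloyd's K-means. Assumption: the first k points are the initial centers."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: index of the nearest center for every point
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(coord) / len(members) for coord in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, labels


def kmeans_anomalies(points, k=2, outlier_percentage=10):
    """Hypothetical helper: indices of the top outlier_percentage % of points
    by distance to their assigned cluster center."""
    centers, labels = kmeans(points, k)
    dists = [math.dist(p, centers[lab]) for p, lab in zip(points, labels)]
    n_out = max(1, round(len(points) * outlier_percentage / 100))
    ranked = sorted(range(len(points)), key=lambda i: dists[i], reverse=True)
    return sorted(ranked[:n_out])


# Made-up sample data: two tight groups plus one stray point at (5, 0)
points = [(1, 1), (9, 9), (1, 2), (2, 1), (2, 2), (9, 8), (8, 9), (8, 8), (5, 0)]
print(kmeans_anomalies(points, k=2, outlier_percentage=10))  # [8]
```

Because the stray point is still assigned to a cluster, it pulls that cluster's center toward itself; distance-to-center scoring works best when outliers are few relative to cluster sizes.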

Descriptive Models

1) Cluster models

2) Association rules

Predictive Models

1) Classification models

2) Regression models

3) Neural network models