Introduction to {mmetrics} package

Introduction

Are you bored with repetitive tabulation work? This package is for you. With {mmetrics}, you define metrics once, reuse them, and easily tabulate along different analysis axes. Focus on more productive things with this package! Have a wonderful life!

Load Dummy Data

First, we load the dummy data shipped with the {mmetrics} package for this example.

# Load dummy data
df <- mmetrics::dummy_data
df
#>    gender age cost impression click conversion
#> 1       M  10   51        101     0          0
#> 2       F  20   52        102     3          1
#> 3       M  30   53        103     6          2
#> 4       F  40   54        104     9          3
#> 5       M  50   55        105    12          4
#> 6       F  60   56        106    15          5
#> 7       M  70   57        107    18          6
#> 8       F  80   58        108    21          7
#> 9       M  90   59        109    24          8
#> 10      F 100   60        110    27          9

Define metrics

Next, we define the metrics to evaluate using mmetrics::define().

# Example metrics
metrics <- mmetrics::define(
  cost = sum(cost),
  ctr  = sum(click)/sum(impression)
)
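
As a rough illustration of what a metric definition holds, the sketch below builds a comparable list with rlang::quos(). It assumes that mmetrics::define() captures its arguments as quosures, which the printed metrics later in this vignette suggest; it is not the package's actual implementation, and metrics_q and toy are names made up for this example.

```r
# Assumption: define() captures expressions much like rlang::quos() does.
metrics_q <- rlang::quos(
  cost = sum(cost),
  ctr  = sum(click) / sum(impression)
)

# The expressions are stored unevaluated, so they can be reused on any data.
names(metrics_q)

# Evaluate one metric against a small hypothetical data frame.
toy <- data.frame(click = c(3, 9), impression = c(102, 104))
ctr_toy <- rlang::eval_tidy(metrics_q[["ctr"]], data = toy)
ctr_toy  # equals 12 / 206
```

Because the expressions are only evaluated when data is supplied, the same definitions can be applied to any grouping of the data.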

How to use mmetrics::add()

mmetrics::add() with a single grouping key

Call mmetrics::add() with a grouping key (here, gender) to get a new data frame containing the defined metrics.

mmetrics::add(df, gender, metrics = metrics)
#> # A tibble: 2 x 3
#>   gender  cost   ctr
#>   <fct>  <int> <dbl>
#> 1 F        280 0.142
#> 2 M        275 0.114
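
For readers who know dplyr, the grouped call above gives the same numbers as a plain group_by() followed by summarise(). The sketch below shows that equivalent under two assumptions: the data frame is rebuilt by hand from the printed dummy data, and the metrics are written with rlang::quos() instead of mmetrics::define(). It illustrates the idea and is not a description of how mmetrics::add() is implemented.

```r
# Dummy data rebuilt by hand from the printed output (an assumption).
df <- data.frame(
  gender     = factor(rep(c("M", "F"), 5)),
  age        = seq(10, 100, by = 10),
  cost       = 51:60,
  impression = 101:110,
  click      = seq(0, 27, by = 3),
  conversion = 0:9
)

# Stand-in for the metrics defined earlier (hypothetical name metrics_q).
metrics_q <- rlang::quos(
  cost = sum(cost),
  ctr  = sum(click) / sum(impression)
)

# Group by gender, then splice the metric expressions into summarise().
by_gender <- dplyr::summarise(dplyr::group_by(df, gender), !!!metrics_q)
by_gender
```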

mmetrics::add() with multiple grouping keys

We can also use multiple grouping keys.

mmetrics::add(df, gender, age, metrics = metrics)
#> # A tibble: 10 x 4
#>    gender   age  cost    ctr
#>    <fct>  <dbl> <int>  <dbl>
#>  1 F         20    52 0.0294
#>  2 F         40    54 0.0865
#>  3 F         60    56 0.142 
#>  4 F         80    58 0.194 
#>  5 F        100    60 0.245 
#>  6 M         10    51 0     
#>  7 M         30    53 0.0583
#>  8 M         50    55 0.114 
#>  9 M         70    57 0.168 
#> 10 M         90    59 0.220

mmetrics::add() without any grouping keys

If we do not specify any grouping keys, mmetrics::add() summarizes all the data by default.

mmetrics::add(df, metrics = metrics)
#> # A tibble: 1 x 2
#>    cost   ctr
#>   <int> <dbl>
#> 1   555 0.128

If we want mmetrics::add() to behave like dplyr::mutate(), we pass the metrics through mmetrics::disaggregate() and set summarize = FALSE.

mmetrics::add(df, metrics = mmetrics::disaggregate(metrics), summarize = FALSE)
#> # A tibble: 10 x 7
#>    gender   age  cost impression click conversion    ctr
#>    <fct>  <dbl> <int>      <int> <dbl>      <int>  <dbl>
#>  1 M         10    51        101     0          0 0     
#>  2 F         20    52        102     3          1 0.0294
#>  3 M         30    53        103     6          2 0.0583
#>  4 F         40    54        104     9          3 0.0865
#>  5 M         50    55        105    12          4 0.114 
#>  6 F         60    56        106    15          5 0.142 
#>  7 M         70    57        107    18          6 0.168 
#>  8 F         80    58        108    21          7 0.194 
#>  9 M         90    59        109    24          8 0.220 
#> 10 F        100    60        110    27          9 0.245

Remove aggregate function from metrics using mmetrics::disaggregate()

It is a hassle to redefine metrics when you want to use them with dplyr::mutate(). In this case, you can use mmetrics::disaggregate(), which removes the first aggregation function applied to each argument and returns the disaggregated metrics.

# Original metrics. sum() is used in these metrics
metrics
#> <list_of<quosure>>
#> 
#> $cost
#> <quosure>
#> expr: ^sum(cost)
#> env:  global
#> 
#> $ctr
#> <quosure>
#> expr: ^sum(click) / sum(impression)
#> env:  global

# Disaggregate metrics!
metrics_disaggregated <- mmetrics::disaggregate(metrics)
# Woo! sum() are removed!!!
metrics_disaggregated
#> $cost
#> <quosure>
#> expr: ^cost
#> env:  global
#> 
#> $ctr
#> <quosure>
#> expr: ^click / impression
#> env:  global
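
To make the transformation concrete, here is a minimal base R sketch of one way to strip sum() calls from an expression. The helper strip_sum() is hypothetical and written only for this illustration; it handles sum() alone and is not the implementation used by mmetrics::disaggregate().

```r
# Hypothetical helper: recursively replace sum(x) with x in an expression.
strip_sum <- function(expr) {
  # Symbols and constants are returned unchanged.
  if (!is.call(expr)) {
    return(expr)
  }
  # A call to sum() is replaced by its (stripped) first argument.
  if (identical(expr[[1]], as.name("sum"))) {
    return(strip_sum(expr[[2]]))
  }
  # Any other call keeps its function and has its arguments processed.
  args <- lapply(as.list(expr)[-1], strip_sum)
  as.call(c(list(expr[[1]]), args))
}

ctr_row  <- strip_sum(quote(sum(click) / sum(impression)))
cost_row <- strip_sum(quote(sum(cost)))
ctr_row
cost_row
```

The results are the row-wise expressions click / impression and cost, matching the disaggregated metrics printed above.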

You can use these metrics with dplyr::mutate() for row-wise metrics computation.

dplyr::mutate(df, !!!metrics_disaggregated)
#>    gender age cost impression click conversion        ctr
#> 1       M  10   51        101     0          0 0.00000000
#> 2       F  20   52        102     3          1 0.02941176
#> 3       M  30   53        103     6          2 0.05825243
#> 4       F  40   54        104     9          3 0.08653846
#> 5       M  50   55        105    12          4 0.11428571
#> 6       F  60   56        106    15          5 0.14150943
#> 7       M  70   57        107    18          6 0.16822430
#> 8       F  80   58        108    21          7 0.19444444
#> 9       M  90   59        109    24          8 0.22018349
#> 10      F 100   60        110    27          9 0.24545455

Alternatively, you can do the same computation using mmetrics::gmutate(), which is defined in our package. In this case, you do not need to write the !!! (bang-bang-bang) operator explicitly.

mmetrics::gmutate(df, metrics = metrics_disaggregated)
#> # A tibble: 10 x 7
#>    gender   age  cost impression click conversion    ctr
#>    <fct>  <dbl> <int>      <int> <dbl>      <int>  <dbl>
#>  1 M         10    51        101     0          0 0     
#>  2 F         20    52        102     3          1 0.0294
#>  3 M         30    53        103     6          2 0.0583
#>  4 F         40    54        104     9          3 0.0865
#>  5 M         50    55        105    12          4 0.114 
#>  6 F         60    56        106    15          5 0.142 
#>  7 M         70    57        107    18          6 0.168 
#>  8 F         80    58        108    21          7 0.194 
#>  9 M         90    59        109    24          8 0.220 
#> 10 F        100    60        110    27          9 0.245

gmutate() and gsummarize()

mmetrics::add() is just a wrapper function for mmetrics::gmutate() and mmetrics::gsummarize(). We can use these functions directly instead of mmetrics::add().

# Exactly the same result as mmetrics::add(df, gender, metrics = metrics)
mmetrics::gsummarize(df, gender, metrics = metrics)
#> # A tibble: 2 x 3
#>   gender  cost   ctr
#>   <fct>  <int> <dbl>
#> 1 F        280 0.142
#> 2 M        275 0.114

mmetrics::gmutate() is useful for calculating metrics such as “ratio in a group”.

# Cost ratio in each gender group
mmetrics::gmutate(df, gender, metrics = mmetrics::define(cost_ratio = cost/sum(cost)))
#> # A tibble: 10 x 7
#>    gender   age  cost impression click conversion cost_ratio
#>    <fct>  <dbl> <int>      <int> <dbl>      <int>      <dbl>
#>  1 M         10    51        101     0          0      0.185
#>  2 F         20    52        102     3          1      0.186
#>  3 M         30    53        103     6          2      0.193
#>  4 F         40    54        104     9          3      0.193
#>  5 M         50    55        105    12          4      0.2  
#>  6 F         60    56        106    15          5      0.2  
#>  7 M         70    57        107    18          6      0.207
#>  8 F         80    58        108    21          7      0.207
#>  9 M         90    59        109    24          8      0.215
#> 10 F        100    60        110    27          9      0.214
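
The "ratio in a group" idea can also be sketched in plain dplyr: inside a grouped mutate(), sum(cost) is computed per group, so each row is divided by its own group's total. The example below assumes a hand-built subset of the dummy data (df_small is a made-up name) and shows the equivalent computation, not the internals of mmetrics::gmutate().

```r
# Two columns of the dummy data, rebuilt by hand (an assumption).
df_small <- data.frame(
  gender = factor(rep(c("M", "F"), 5)),
  cost   = 51:60
)

# Grouped mutate: sum(cost) is the total cost within each gender.
with_ratio <- dplyr::ungroup(
  dplyr::mutate(
    dplyr::group_by(df_small, gender),
    cost_ratio = cost / sum(cost)
  )
)

# Sanity check: within each gender the ratios add up to 1.
ratio_totals <- tapply(with_ratio$cost_ratio, with_ratio$gender, sum)
ratio_totals
```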

Run multiple tabulations at once

If you would like to run the same tabulation for several keys at once, you can combine the !! (bang-bang) operator with rlang::sym() as follows:

# Define keys
keys <- c("gender", "age")
# Run
purrr::map(keys, ~ mmetrics::add(df, !!rlang::sym(.x), metrics = metrics))
#> [[1]]
#> # A tibble: 2 x 3
#>   gender  cost   ctr
#>   <fct>  <int> <dbl>
#> 1 F        280 0.142
#> 2 M        275 0.114
#> 
#> [[2]]
#> # A tibble: 10 x 3
#>      age  cost    ctr
#>    <dbl> <int>  <dbl>
#>  1    10    51 0     
#>  2    20    52 0.0294
#>  3    30    53 0.0583
#>  4    40    54 0.0865
#>  5    50    55 0.114 
#>  6    60    56 0.142 
#>  7    70    57 0.168 
#>  8    80    58 0.194 
#>  9    90    59 0.220 
#> 10   100    60 0.245
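
When there are many keys, it can help to get a named list back so each table can be looked up by its key. The sketch below shows this with dplyr and base R only, assuming a hand-built copy of the dummy data and metrics written with rlang::quos(); df, metrics_q and results are names chosen for this illustration.

```r
# Dummy data rebuilt by hand from the printed output (an assumption).
df <- data.frame(
  gender     = factor(rep(c("M", "F"), 5)),
  age        = seq(10, 100, by = 10),
  cost       = 51:60,
  impression = 101:110,
  click      = seq(0, 27, by = 3)
)

# Stand-in for the metrics defined earlier.
metrics_q <- rlang::quos(
  cost = sum(cost),
  ctr  = sum(click) / sum(impression)
)

keys <- c("gender", "age")

# Naming the input vector makes lapply() return a named list.
results <- lapply(
  stats::setNames(keys, keys),
  function(key) {
    # Turn the string into a symbol and unquote it as the grouping key.
    dplyr::summarise(dplyr::group_by(df, !!rlang::sym(key)), !!!metrics_q)
  }
)

names(results)
results$gender
```

purrr::map() also keeps the names of its input, so passing a named version of keys to the purrr::map() call above should give a named list in the same way.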