Are you bored with repetitive tabulation work? This package is for you. With it, you can define metrics once, reuse them, and easily tabulate along different analysis axes. Focus on more productive things with this package! Have a wonderful life!
First, we load the dummy data from the {mmetrics} package for this example.
# Load dummy data
df <- mmetrics::dummy_data
df
#> gender age cost impression click conversion
#> 1 M 10 51 101 0 0
#> 2 F 20 52 102 3 1
#> 3 M 30 53 103 6 2
#> 4 F 40 54 104 9 3
#> 5 M 50 55 105 12 4
#> 6 F 60 56 106 15 5
#> 7 M 70 57 107 18 6
#> 8 F 80 58 108 21 7
#> 9 M 90 59 109 24 8
#> 10 F 100 60 110 27 9
Next, we define the metrics to evaluate using mmetrics::define().
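A definition consistent with the quosures printed later in this document (sum(cost) and sum(click) / sum(impression)) would look like the sketch below. The exact layout is an assumption.

```r
# Metrics reused throughout the examples. Reconstructed from the quosures
# printed further down; the formatting here is an assumption.
metrics <- mmetrics::define(
  cost = sum(cost),
  ctr  = sum(click) / sum(impression)
)
```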
Call mmetrics::add() with a grouping key (here, gender) to get a new data.frame with the defined metrics.
mmetrics::add(df, gender, metrics = metrics)
#> # A tibble: 2 x 3
#> gender cost ctr
#> <fct> <int> <dbl>
#> 1 F 280 0.142
#> 2 M 275 0.114
We can also use multiple grouping keys.
mmetrics::add(df, gender, age, metrics = metrics)
#> # A tibble: 10 x 4
#> gender age cost ctr
#> <fct> <dbl> <int> <dbl>
#> 1 F 20 52 0.0294
#> 2 F 40 54 0.0865
#> 3 F 60 56 0.142
#> 4 F 80 58 0.194
#> 5 F 100 60 0.245
#> 6 M 10 51 0
#> 7 M 30 53 0.0583
#> 8 M 50 55 0.114
#> 9 M 70 57 0.168
#> 10 M 90 59 0.220
If we do not specify any grouping keys, mmetrics::add() summarizes all the data by default.
mmetrics::add(df, metrics = metrics)
#> # A tibble: 1 x 2
#> cost ctr
#> <int> <dbl>
#> 1 555 0.128
If we want mmetrics::add() to behave like dplyr::mutate(), we pass the metrics through mmetrics::disaggregate() and set summarize = FALSE.
mmetrics::add(df, metrics = mmetrics::disaggregate(metrics), summarize = FALSE)
#> # A tibble: 10 x 7
#> gender age cost impression click conversion ctr
#> <fct> <dbl> <int> <int> <dbl> <int> <dbl>
#> 1 M 10 51 101 0 0 0
#> 2 F 20 52 102 3 1 0.0294
#> 3 M 30 53 103 6 2 0.0583
#> 4 F 40 54 104 9 3 0.0865
#> 5 M 50 55 105 12 4 0.114
#> 6 F 60 56 106 15 5 0.142
#> 7 M 70 57 107 18 6 0.168
#> 8 F 80 58 108 21 7 0.194
#> 9 M 90 59 109 24 8 0.220
#> 10 F 100 60 110 27 9 0.245
mmetrics::disaggregate()
It is a hassle to redefine metrics when you want to use them with dplyr::mutate(). In that case, you can use mmetrics::disaggregate() to remove the first aggregation function from each metric passed as its argument and return the disaggregated metrics.
# Original metrics. sum() is used in these metrics
metrics
#> <list_of<quosure>>
#>
#> $cost
#> <quosure>
#> expr: ^sum(cost)
#> env: global
#>
#> $ctr
#> <quosure>
#> expr: ^sum(click) / sum(impression)
#> env: global
# Disaggregate metrics!
metrics_disaggregated <- mmetrics::disaggregate(metrics)
# Woo! The sum() calls are removed!!!
metrics_disaggregated
#> $cost
#> <quosure>
#> expr: ^cost
#> env: global
#>
#> $ctr
#> <quosure>
#> expr: ^click / impression
#> env: global
You can use these metrics with dplyr::mutate() for row-wise metrics computation.
dplyr::mutate(df, !!!metrics_disaggregated)
#> gender age cost impression click conversion ctr
#> 1 M 10 51 101 0 0 0.00000000
#> 2 F 20 52 102 3 1 0.02941176
#> 3 M 30 53 103 6 2 0.05825243
#> 4 F 40 54 104 9 3 0.08653846
#> 5 M 50 55 105 12 4 0.11428571
#> 6 F 60 56 106 15 5 0.14150943
#> 7 M 70 57 107 18 6 0.16822430
#> 8 F 80 58 108 21 7 0.19444444
#> 9 M 90 59 109 24 8 0.22018349
#> 10 F 100 60 110 27 9 0.24545455
…or, you can do the same computation using mmetrics::gmutate(), defined in our package. In this case, you do not need to write the !!! (bang-bang-bang) operator explicitly.
mmetrics::gmutate(df, metrics = metrics_disaggregated)
#> # A tibble: 10 x 7
#> gender age cost impression click conversion ctr
#> <fct> <dbl> <int> <int> <dbl> <int> <dbl>
#> 1 M 10 51 101 0 0 0
#> 2 F 20 52 102 3 1 0.0294
#> 3 M 30 53 103 6 2 0.0583
#> 4 F 40 54 104 9 3 0.0865
#> 5 M 50 55 105 12 4 0.114
#> 6 F 60 56 106 15 5 0.142
#> 7 M 70 57 107 18 6 0.168
#> 8 F 80 58 108 21 7 0.194
#> 9 M 90 59 109 24 8 0.220
#> 10 F 100 60 110 27 9 0.245
mmetrics::add() is just a wrapper function for mmetrics::gmutate() and mmetrics::gsummarize(). We can use these functions directly instead of mmetrics::add().
# Exactly the same result as mmetrics::add(df, gender, metrics = metrics)
mmetrics::gsummarize(df, gender, metrics = metrics)
#> # A tibble: 2 x 3
#> gender cost ctr
#> <fct> <int> <dbl>
#> 1 F 280 0.142
#> 2 M 275 0.114
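For comparison, the same grouped summary can be written with plain dplyr. This is only a sketch of an equivalent computation, not the package's implementation: the data frame is rebuilt by hand from the table printed at the top (a stand-in for mmetrics::dummy_data), and rlang::quos() stands in for mmetrics::define(), which is an assumption about what define() produces.

```r
library(dplyr)

# Dummy data rebuilt from the printed table (stand-in for mmetrics::dummy_data)
df <- data.frame(
  gender     = factor(rep(c("M", "F"), times = 5)),
  age        = seq(10, 100, by = 10),
  cost       = 51:60,
  impression = 101:110,
  click      = seq(0, 27, by = 3),
  conversion = 0:9
)

# Named quosures standing in for mmetrics::define(...) (assumption)
metrics <- rlang::quos(
  cost = sum(cost),
  ctr  = sum(click) / sum(impression)
)

# Group by the key, then splice the metrics into summarise() with !!!
by_gender <- df %>%
  group_by(gender) %>%
  summarise(!!!metrics)
by_gender
```

Writing it out this way shows what the wrapper saves: the grouping and the !!! splicing no longer have to be spelled out for every analysis axis.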
mmetrics::gmutate() is useful for calculating metrics such as “ratio in a group”.
# Cost ratio in each gender group
mmetrics::gmutate(df, gender, metrics = mmetrics::define(cost_ratio = cost/sum(cost)))
#> # A tibble: 10 x 7
#> gender age cost impression click conversion cost_ratio
#> <fct> <dbl> <int> <int> <dbl> <int> <dbl>
#> 1 M 10 51 101 0 0 0.185
#> 2 F 20 52 102 3 1 0.186
#> 3 M 30 53 103 6 2 0.193
#> 4 F 40 54 104 9 3 0.193
#> 5 M 50 55 105 12 4 0.2
#> 6 F 60 56 106 15 5 0.2
#> 7 M 70 57 107 18 6 0.207
#> 8 F 80 58 108 21 7 0.207
#> 9 M 90 59 109 24 8 0.215
#> 10 F 100 60 110 27 9 0.214
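The same within-group ratio can be sketched in plain dplyr with group_by() followed by mutate(). This is an illustrative equivalent, not necessarily how gmutate() is implemented, and the data frame is rebuilt by hand from the printed table as a stand-in for mmetrics::dummy_data.

```r
library(dplyr)

# Dummy data rebuilt from the printed table (stand-in for mmetrics::dummy_data)
df <- data.frame(
  gender     = factor(rep(c("M", "F"), times = 5)),
  age        = seq(10, 100, by = 10),
  cost       = 51:60,
  impression = 101:110,
  click      = seq(0, 27, by = 3),
  conversion = 0:9
)

# Inside a grouped mutate(), sum(cost) is the total cost of the row's own
# gender group, so each row gets its share of that group's cost
with_ratio <- df %>%
  group_by(gender) %>%
  mutate(cost_ratio = cost / sum(cost)) %>%
  ungroup()
with_ratio
```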
If you would like to run the same code once for each of several keys, you can combine the !! (bang-bang) operator with rlang::sym() as follows:
# Define keys
keys <- c("gender", "age")
# Run
purrr::map(keys, ~ mmetrics::add(df, !!rlang::sym(.x), metrics = metrics))
#> [[1]]
#> # A tibble: 2 x 3
#> gender cost ctr
#> <fct> <int> <dbl>
#> 1 F 280 0.142
#> 2 M 275 0.114
#>
#> [[2]]
#> # A tibble: 10 x 3
#> age cost ctr
#> <dbl> <int> <dbl>
#> 1 10 51 0
#> 2 20 52 0.0294
#> 3 30 53 0.0583
#> 4 40 54 0.0865
#> 5 50 55 0.114
#> 6 60 56 0.142
#> 7 70 57 0.168
#> 8 80 58 0.194
#> 9 90 59 0.220
#> 10 100 60 0.245
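The list returned above is unnamed. If it helps to look results up by key, one option is to name the input vector with purrr::set_names() first, since purrr::map() carries the names of its input over to its output. A minimal sketch, assuming the same data and metrics as in the earlier examples (the metrics definition is reconstructed from the printed quosures):

```r
# Data and metrics as in the earlier examples
df <- mmetrics::dummy_data
metrics <- mmetrics::define(
  cost = sum(cost),
  ctr  = sum(click) / sum(impression)
)

keys <- c("gender", "age")

# set_names(keys) names each element after itself, so the result is a
# list with elements $gender and $age instead of [[1]] and [[2]]
results <- purrr::map(
  purrr::set_names(keys),
  ~ mmetrics::add(df, !!rlang::sym(.x), metrics = metrics)
)
names(results)
```

Each table can then be retrieved by its key, for example results$gender.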