Creating a custom provider for faker with user-specified weights

It wasn’t immediately obvious how to do this, so I thought I’d write it down here.

I wanted to create a custom provider for faker that would select from a list of strings.

I started out with a Dynamic Provider like this:

from faker.providers import DynamicProvider

kebab_type_provider = DynamicProvider(
    provider_name="kebab_type",
    elements=["Doner", "Shish", "Kofte", "Adana"],
)

However, I couldn’t find a good way to specify weights for the choices, and as any regular kebab customer will be aware, the distribution of these choices is unlikely to be uniform.

After a brief stroll through the project issues [1] I came up with the following:


from collections import OrderedDict
from typing import List, Optional

from faker.providers import BaseProvider

# Relative weights; they do not need to sum to any particular total.
_kebab_types = OrderedDict(
    [
        ("Doner", 65),
        ("Shish", 30),
        ("Adana", 5),
        ("Kofte", 20),
    ]
)


class KebabTypeProvider(BaseProvider):
    def kebab_types(self, length: Optional[int] = None) -> List[str]:
        # use_weighting=True is required for the dict values to act as weights.
        return self.random_elements(
            _kebab_types, length, unique=False, use_weighting=True
        )

    def kebab_type(self) -> str:
        return self.kebab_types(1)[0]

It turns out that if the first argument to BaseProvider.random_elements() is an OrderedDict rather than a list, the numeric values are used as weights. The weights don’t need to sum to 1.0, 100, or any other particular number; they are normalised internally. However, this only works if use_weighting=True is passed in the call to random_elements(). The default value for this parameter is None. This behaviour isn’t particularly clear in the documentation, and the symptom of omitting the parameter is that the distribution comes out uniform.
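To make the normalisation concrete, here is a minimal sketch of what the weights above work out to as proportions. It uses only the standard library and does not call Faker at all; expected_shares is a hypothetical helper I made up for illustration, not part of Faker's API.

```python
from collections import OrderedDict

# Same relative weights as in the provider (they sum to 120, not 100).
kebab_weights = OrderedDict(
    [("Doner", 65), ("Shish", 30), ("Adana", 5), ("Kofte", 20)]
)


def expected_shares(weights):
    """Hypothetical helper: divide each weight by the total weight."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


shares = expected_shares(kebab_weights)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
# Doner: 54.2%
# Shish: 25.0%
# Adana: 4.2%
# Kofte: 16.7%
```

So a weight of 65 here means roughly 54% of the picks, not 65%, because the total is 120.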

Testing

import pandas as pd
from faker import Faker

fake = Faker()
fake.add_provider(KebabTypeProvider)

# Draw 10,000 kebab types and count how often each one appears.
samples = fake.kebab_types(10000)

df_test = pd.DataFrame(samples, columns=['k_type'])
print(df_test.groupby('k_type').size())

And the results seem fine:

k_type
Adana     422
Doner    5429
Kofte    1621
Shish    2528
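
As a rough sanity check on "seem fine", the counts above can be compared with what the weights predict for 10,000 draws. This is just a sketch using the numbers already shown; the 5% tolerance is an arbitrary assumption of mine, not a proper statistical test.

```python
# Counts copied from the output above, and the weights from the provider.
observed = {"Adana": 422, "Doner": 5429, "Kofte": 1621, "Shish": 2528}
weights = {"Doner": 65, "Shish": 30, "Adana": 5, "Kofte": 20}

n = sum(observed.values())  # total number of draws
total_weight = sum(weights.values())

relative_errors = {}
for name in sorted(observed):
    expected = n * weights[name] / total_weight
    relative_errors[name] = abs(observed[name] - expected) / expected
    print(f"{name:<6} observed={observed[name]:>5} expected={expected:>7.1f}")

# Arbitrary tolerance (assumption): every count within 5% of its expectation.
print(all(err < 0.05 for err in relative_errors.values()))
```

The expected counts come out at about 417, 5417, 1667 and 2500 for Adana, Doner, Kofte and Shish respectively, so the observed figures are all within a few percent.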