Comparing vCard (.vcf) files with Python

For the last few years I’ve been using Nextcloud for storing much of my personal data, including contacts and calendar entries.

I’m sure many people have had the experience of getting their contacts munged, particularly when changing phones. Both iOS and Android are capable of saving contacts to Nextcloud by default, but sometimes it seems as if they forget this capability. So it’s not uncommon to end up with a contact list split between Google, Apple, Microsoft and so on.

Having resolved to sort out my own lists, I ended up writing a little Python function to help, which is reproduced here.

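import phonenumbers
import polars as pl
import vobject


def parse_vcard_file(vcfs):
    """Return a LazyFrame of E.164 phone numbers and names from vCard data."""
    numbers = []

    for c in vobject.readComponents(vcfs):
        if hasattr(c, 'tel'):
            try:
                # 'GB' is the region assumed when the number has no country code.
                n = phonenumbers.parse(c.tel.value, 'GB')
            except phonenumbers.NumberParseException:
                continue  # Skip values that cannot be parsed as a phone number.

            numbers.append({
                "phone": phonenumbers.format_number(
                    n, phonenumbers.PhoneNumberFormat.E164),
                # A card without FN gets a null name instead of raising an error.
                "name": c.fn.value if hasattr(c, 'fn') else None,
            })
    return pl.from_dicts(numbers).lazy()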

The function takes a stream or a multi-line string and returns a polars LazyFrame. It depends on a few external libraries, namely vobject, phonenumberslite (which is imported as phonenumbers) and polars.

vCard is a slightly odd format; it looks a bit like this:

BEGIN:VCARD
VERSION:3.0
PRODID:-//Apple Inc.//iOS 10.3.3//EN
N:Brewer;Bill;;;
FN:Bill Brewer
item1.TEL;type=pref:07700973645
EMAIL;TYPE=work:bill.brewer@widdicome.fair
REV:2019-01-30T13:01:38Z
END:VCARD

and no two implementations seem to agree on what to write into what field.
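To make the line structure a bit more concrete, here is a minimal sketch that splits a single content line into its group, property name, parameters and value. The helper split_content_line is hypothetical (it is not part of vobject) and the phone number is made up; the sketch ignores quoting, escaping and folded lines, so it only illustrates the format and is no substitute for a real parser.

```python
def split_content_line(line):
    """Split 'group.NAME;param=value:value' into its parts (naive sketch)."""
    # Everything before the first colon is the property head, the rest is the value.
    head, _, value = line.partition(":")
    name, *raw_params = head.split(";")
    # An optional group prefix such as 'item1.' ties related properties together.
    group, _, name = name.rpartition(".")
    params = {}
    for p in raw_params:
        key, _, val = p.partition("=")
        params[key.lower()] = val
    return {"group": group or None, "name": name.upper(),
            "params": params, "value": value}


print(split_content_line("item1.TEL;type=pref:07700900123"))
print(split_content_line("EMAIL;TYPE=work:bill.brewer@widdicome.fair"))
```

Note that parameter names turn up in either case (type and TYPE above), which is one of the small differences between implementations that a proper library smooths over.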

vobject is able to process a stream (file or otherwise) into vcard objects by calling the function readComponents().

I use the phonenumberslite library to normalize the phone number fields into a standard format (E.164), in this case +447700973645. GB is the default region (i.e. +44) that will be used if the number doesn’t have a country code.

This function can be used to generate data frames from exported vcf address books for further analysis, such as:


with open('apple.vcf') as apple_file, open('nextcloud.vcf') as nextcloud_file:
    apple_numbers = parse_vcard_file(apple_file)
    nextcloud_numbers = parse_vcard_file(nextcloud_file)

# Anti-join: rows from the Apple export whose phone number is absent from Nextcloud.
apple_numbers_not_in_nextcloud = apple_numbers.join(
    nextcloud_numbers, on="phone", how="anti")

# display() is available in Jupyter/IPython; use print() in a plain script.
display(apple_numbers_not_in_nextcloud.collect())

# Inner join: numbers present in both exports; Nextcloud's name column gets a suffix.
numbers_in_both_files = apple_numbers.join(
    nextcloud_numbers, on="phone", how="inner", suffix="_nextcloud")
numbers_in_both_files_with_different_names = numbers_in_both_files.filter(
    pl.col("name") != pl.col("name_nextcloud"))

display(numbers_in_both_files_with_different_names.collect())