How to efficiently get the mean of the elements in two list of lists in Python

PythonListMean

Python Problem Overview


I have two lists as follows.

mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]

I want to get the mean of the common elements in the two lists as follows.

myoutput = [["chocolate", 0.5], ["egg", 0.45]]

My current code is as follows

for item1 in mylist1:
    for item2 in mylist2:
        if item1[0] == item2[0]:
             print(np.mean([item1[1], item2[1]]))

However, since there are two for loops (O(n^2) complexity) this is very inefficient for very long lists. I am wondering if there is more standard/efficient way of doing this in Python.

Python Solutions


Solution 1 - Python

You can do it in O(n) (single pass over each list) by converting 1 to a dict, then per item in the 2nd list access that dict (in O(1)), like this:

mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]

l1_as_dict = dict(mylist1)

myoutput = []
for item,price2 in mylist2:
    if item in l1_as_dict:
        price1 = l1_as_dict[item]
        myoutput.append([item, (price1+price2)/2])

print(myoutput)

Output:

[['chocolate', 0.5], ['egg', 0.45]]

Solution 2 - Python

An O(n) solution that will average all items.
Construct a dictionary with a list of the values and then average that dictionary afterwards:

In []:
d = {}
for lst in (mylist1, mylist2):
    for i, v in lst:
        d.setdefault(i, []).append(v)   # alternative use collections.defaultdict

[(k, sum(v)/len(v)) for k, v in d.items()]

Out[]:
[('lemon', 0.1), ('egg', 0.45), ('muffin', 0.3), ('chocolate', 0.5), ('milk', 0.2), ('carrot', 0.8)]

Then if you just want the common ones you can add a guard:

In []:
[(k, sum(v)/len(v)) for k, v in d.items() if len(v) > 1]

Out[]:
[('egg', 0.45), ('chocolate', 0.5)]

This extends to any number of lists and makes no assumption around the number of common elements.

Solution 3 - Python

Here is one solution that uses collections.defaultdict to group the items and calculates the averages with statistics.mean:

from collections import defaultdict
from statistics import mean

mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]

d = defaultdict(list)
for lst in (mylist1, mylist2):
    for k, v in lst:
        d[k].append(v)

result = [[k, mean(v)] for k, v in d.items()]

print(result)
# [['lemon', 0.1], ['egg', 0.45], ['muffin', 0.3], ['chocolate', 0.5], ['milk', 0.2], ['carrot', 0.8]]

If we only want common keys, just check if the values are more than 1:

result = [[k, mean(v)] for k, v in d.items() if len(v) > 1]

print(result)
# [['egg', 0.45], ['chocolate', 0.5]]

We could also just build the result from set intersection:

mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]

d1, d2 = dict(mylist1), dict(mylist2)

result = [[k, (d1[k] + d2[k]) / 2] for k in d1.keys() & d2.keys()]

print(result)
# [['egg', 0.45], ['chocolate', 0.5]]

Solution 4 - Python

You can use the Pandas library to avoid writing any sort of loops yourself.

Your code would be really concise and clean.

Install Pandas like: pip install pandas.

Then try this:

In [132]: import pandas as pd

In [109]: df1 = pd.DataFrame(mylist1)

In [110]: df2 = pd.DataFrame(mylist2)

In [117]: res = pd.merge(df1, df2, on=0)

In [121]: res['mean'] = res.mean(axis=1)

In [125]: res.drop(['1_x', '1_y'], 1, inplace=True)

In [131]: res.values.tolist()
Out[131]: [['egg', 0.45], ['chocolate', 0.5]]

Edit

Pandas is crazy fast because it uses numpy under the hood. Numpy implements highly efficient array operations.

Please check the post : Why is Pandas so madly fast? for more details on calculating mean through pure Python vs Pandas.

Solution 5 - Python

To easily manipulate your values, I'd suggest using a dict, find the common keys, and compute the mean:

mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]

recipe_1 = dict(mylist1)  # {'lemon': 0.1, 'egg': 0.1, 'muffin': 0.3, 'chocolate': 0.5}
recipe_2 = dict(mylist2)  # {'chocolate': 0.5, 'milk': 0.2, 'carrot': 0.8, 'egg': 0.8}

common_keys = recipe_1.keys() & recipe_2.keys()  # {'chocolate', 'egg'}

myoutput = [[item, np.mean((recipe_1[item], recipe_2[item]))] for item in common_keys]
myoutput = [[item, (recipe_1[item] + recipe_2[item]) / 2] for item in common_keys]

Solution 6 - Python

Convert lists to dicts

d_list1 = dict(mylist1)
d_list2 = dict(mylist2)

[[k, (v+d_list2[k])/2] for k, v in d_list1.items() if k in d_list2]
#[['egg', 0.45], ['chocolate', 0.5]]

Solution 7 - Python

You get the common keys from the two lists by using the set intersection method and then using a list comprehension calculate the mean:

mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]

dict1 = dict(mylist1)
dict2 = dict(mylist2)
res = [[key, (dict1.get(key)+dict2.get(key))/2] for key in set(dict1.keys()).intersection(set(dict2.keys()))]
print(res)

Output:

>> [['chocolate', 0.5], ['egg', 0.45]]

Solution 8 - Python

You can do it in the time required for comuting set intersections which is apparently O(min(N1,N2)) where N1, N2 are the list lengths.

intersect = set([a[0] for a in mylist1]).intersection([a[0] for a in mylist2])
d1=dict(mylist1)
d2=dict(mylist2)
{i:(d1[i]+d2[i])/2 for i in intersect}

Solution 9 - Python

Here is a simple, very Pythonic solution:

result = [[x[0], (x[1] + y[1])/2] for x in mylist1 for y in mylist2 if x[0] == y[0]]

It probably is not the fastest solution, but it is faster by virtue of using Python list comprehension to iterate the lists and, since neither this solution nor the OP's will work with multiple instances of a list key value, it replaces the np.mean with a simple average of the two values.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionEmJView Question on Stackoverflow
Solution 1 - PythonAdam.Er8View Answer on Stackoverflow
Solution 2 - PythonAChampionView Answer on Stackoverflow
Solution 3 - PythonRoadRunnerView Answer on Stackoverflow
Solution 4 - PythonMayank PorwalView Answer on Stackoverflow
Solution 5 - PythonazroView Answer on Stackoverflow
Solution 6 - PythonTranshumanView Answer on Stackoverflow
Solution 7 - PythonFullstack GuyView Answer on Stackoverflow
Solution 8 - Pythonjeremy_rutmanView Answer on Stackoverflow
Solution 9 - PythonMarkView Answer on Stackoverflow