TypeError: ufunc 'add' did not contain a loop with signature matching types

Python Problem Overview

I am creating a bag-of-words representation of a sentence. I then look up the words that occur in the sentence in the file "vectors.txt" to get their embedding vectors. Once I have a vector for each word in the sentence, I take the average of those vectors. This is my code:

import nltk
import numpy as np
from nltk import FreqDist
from nltk.corpus import brown


news = brown.words(categories='news') 
news_sents = brown.sents(categories='news') 

fdist = FreqDist(w.lower() for w in news) 
vocabulary = [word for word, _ in fdist.most_common(10)] 
num_sents = len(news_sents) 

def averageEmbeddings(sentenceTokens, embeddingLookupTable):
    listOfEmb=[]
    for token in sentenceTokens:
        embedding = embeddingLookupTable[token] 
        listOfEmb.append(embedding)

    return sum(np.asarray(listOfEmb)) / float(len(listOfEmb))

embeddingVectors = {}

with open("D:\\Embedding\\vectors.txt") as file: 
    for line in file:
       (key, *val) = line.split()
       embeddingVectors[key] = val
    
for i in range(num_sents): 
    features = {}
    for word in vocabulary: 
        features[word] = int(word in news_sents[i])        
    print(features) 
    print(list(features.values()))  
    sentenceTokens = []
    for key, value in features.items():
        if value == 1:
            sentenceTokens.append(key)
    sentenceTokens.remove(".")
    print(sentenceTokens)
    print(averageEmbeddings(sentenceTokens, embeddingVectors))

print(features.keys())

Not sure why, but I get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-4-643ccd012438> in <module>()
 39     sentenceTokens.remove(".")
 40     print(sentenceTokens)
---> 41     print(averageEmbeddings(sentenceTokens, embeddingVectors))
 42 
 43 print(features.keys()) 

<ipython-input-4-643ccd012438> in averageEmbeddings(sentenceTokens, embeddingLookupTable)
 18         listOfEmb.append(embedding)
 19 
---> 20     return sum(np.asarray(listOfEmb)) / float(len(listOfEmb))
 21 
 22 embeddingVectors = {}

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U9') dtype('<U9') dtype('<U9')

P.S. The embedding vectors file looks like:

the 0.011384 0.010512 -0.008450 -0.007628 0.000360 -0.010121 0.004674 -0.000076 
of 0.002954 0.004546 0.005513 -0.004026 0.002296 -0.016979 -0.011469 -0.009159 
and 0.004691 -0.012989 -0.003122 0.004786 -0.002907 0.000526 -0.006146 -0.003058 
one 0.014722 -0.000810 0.003737 -0.001110 -0.011229 0.001577 -0.007403 -0.005355 
in -0.001046 -0.008302 0.010973 0.009608 0.009494 -0.008253 0.001744 0.003263

After using np.sum I get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-13-8a7edbb9d946> in <module>()
 40     sentenceTokens.remove(".")
 41     print(sentenceTokens)
---> 42     print(averageEmbeddings(sentenceTokens, embeddingVectors))
 43 
 44 print(features.keys()) 

<ipython-input-13-8a7edbb9d946> in averageEmbeddings(sentenceTokens, embeddingLookupTable)
 18         listOfEmb.append(embedding)
 19 
---> 20     return np.sum(np.asarray(listOfEmb)) / float(len(listOfEmb))
 21 
 22 embeddingVectors = {}

C:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in sum(a, axis, dtype, out, keepdims)
   1829     else:
   1830         return _methods._sum(a, axis=axis, dtype=dtype,
-> 1831                              out=out, keepdims=keepdims)
   1832 
   1833 

C:\Anaconda3\lib\site-packages\numpy\core\_methods.py in _sum(a, axis, dtype, out, keepdims)
 30 
 31 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):
---> 32     return umr_sum(a, axis, dtype, out, keepdims)
 33 
 34 def _prod(a, axis=None, dtype=None, out=None, keepdims=False):

TypeError: cannot perform reduce with flexible type

Python Solutions

Solution 1 - Python

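You have a numpy array of strings, not floats. That is what dtype('<U9') means: a little-endian Unicode string of up to 9 characters. The values are strings because line.split() returns strings, and they are stored in embeddingVectors without being converted.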
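
To see this concretely, the short sketch below builds an array from string tokens like those line.split() produces and inspects its dtype. The sample values are copied from the embedding file shown in the question; the variable names are illustrative only.

```python
import numpy as np

# Two rows of tokens as line.split() would return them: all strings.
rows = [["0.011384", "-0.008450"], ["0.002954", "0.004546"]]

as_strings = np.asarray(rows)
print(as_strings.dtype.kind)   # 'U' means a Unicode string dtype

# Passing dtype=float parses each string into a number.
as_floats = np.asarray(rows, dtype=float)
print(as_floats.dtype)         # float64
print(as_floats.sum(axis=0))   # element-wise sum over the rows
```

Once the array holds floats, arithmetic such as summing or averaging behaves numerically.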

Try converting the array to floats:

return sum(np.asarray(listOfEmb, dtype=float)) / float(len(listOfEmb))

However, you don't need numpy here at all. Since each embedding is a list of strings, you can average each dimension in plain Python:

return [sum(float(x) for x in dim) / len(listOfEmb) for dim in zip(*listOfEmb)]

Or, if you're really set on using numpy, average along axis 0 so that you get one vector rather than a single number:

return np.asarray(listOfEmb, dtype=float).mean(axis=0)
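
Putting the pieces together, here is a minimal sketch that converts the values to floats while loading and then averages the word vectors per dimension. It assumes the whitespace-separated file format shown in the question. load_embeddings and average_embeddings are hypothetical helper names, an in-memory io.StringIO stands in for vectors.txt, and skipping tokens without an embedding is an assumption rather than something the original code does.

```python
import io
import numpy as np

def load_embeddings(lines):
    """Map each word to a float array parsed from a 'word v1 v2 ...' line."""
    table = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        key, *values = line.split()
        table[key] = np.array(values, dtype=float)  # convert strings here
    return table

def average_embeddings(tokens, table):
    """Return the per-dimension mean of the vectors for the given tokens."""
    # Assumption: tokens with no embedding (e.g. ".") are skipped.
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        raise ValueError("no token has an embedding")
    return np.mean(vectors, axis=0)

# Stand-in for vectors.txt, using two truncated rows from the question.
sample = io.StringIO("the 0.011384 0.010512\nof 0.002954 0.004546\n")
table = load_embeddings(sample)
avg = average_embeddings(["the", "of", "."], table)
print(avg)
```

Converting once at load time means every later lookup already returns numeric data. Passing axis=0 averages across tokens and yields one vector per sentence, whereas a mean with no axis collapses all values into a single number.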

Content Type	Original Author	Original Content on Stackoverflow
Question	Masyaf	View Question on Stackoverflow
Solution 1 - Python	Dunes	View Answer on Stackoverflow
