How to normalize a NumPy array to a unit vector?


Python Problem Overview

I would like to convert a NumPy array to a unit vector. More specifically, I am looking for an equivalent version of this normalisation function:

def normalize(v):
    norm = np.linalg.norm(v)
    if norm == 0: 
       return v
    return v / norm

This function handles the situation where vector v has the norm value of 0.

Are there any similar functions provided in sklearn or numpy?

Python Solutions

Solution 1 - Python

If you're using scikit-learn you can use sklearn.preprocessing.normalize:

import numpy as np
from sklearn.preprocessing import normalize

x = np.random.rand(1000)*10
norm1 = x / np.linalg.norm(x)
norm2 = normalize(x[:,np.newaxis], axis=0).ravel()
print(np.allclose(norm1, norm2))
# True

Solution 2 - Python

I agree that it would be nice if such a function were part of the included libraries, but as far as I know it isn't. So here is a vectorized version for arbitrary axes that should perform well.

import numpy as np

def normalized(a, axis=-1, order=2):
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    l2[l2==0] = 1
    return a / np.expand_dims(l2, axis)

A = np.random.randn(3,3,3)
print(normalized(A,0))
print(normalized(A,1))
print(normalized(A,2))

print(normalized(np.arange(3)[:,None]))
print(normalized(np.arange(3)))
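
One way to sanity-check the function above is to confirm that every vector along the chosen axis ends up with norm 1, while all-zero vectors are left at zero. The snippet below is only a sketch: it repeats `normalized` so that it runs on its own, and the seed and the zeroed slice are arbitrary choices made for illustration.

```python
import numpy as np

def normalized(a, axis=-1, order=2):
    # Same function as above, repeated so this snippet is self-contained.
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    l2[l2 == 0] = 1
    return a / np.expand_dims(l2, axis)

rng = np.random.default_rng(0)      # arbitrary fixed seed
A = rng.standard_normal((3, 3, 3))
A[0, 0, :] = 0                      # force one all-zero vector along the last axis

B = normalized(A, axis=-1)
norms = np.linalg.norm(B, axis=-1)  # norm of each vector after normalization
print(norms)                        # 1 everywhere except the zeroed vector
```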

Solution 3 - Python

This might also work for you

import numpy as np
normalized_v = v / np.sqrt(np.sum(v**2))

but it fails when v is the zero vector, that is, when its norm is 0.

In that case, adding a small constant to the denominator prevents the division by zero.
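
As a rough sketch of the small-constant idea: the helper name `normalize_eps` and the default `eps=1e-12` are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np

def normalize_eps(v, eps=1e-12):
    """Divide v by its L2 norm plus a small constant (hypothetical helper)."""
    v = np.asarray(v, dtype=float)
    # Adding eps keeps the denominator nonzero, so a zero vector
    # stays zero instead of turning into NaNs.
    return v / (np.sqrt(np.sum(v**2)) + eps)

print(normalize_eps([3.0, 4.0]))  # approximately [0.6, 0.8]
print(normalize_eps([0.0, 0.0]))  # stays all zeros
```

The trade-off is that nonzero vectors come out with a norm very slightly below 1, which is usually acceptable but worth knowing.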

Solution 4 - Python

To avoid division by zero I use the machine epsilon, though that may not be ideal.

def normalize(v):
    norm=np.linalg.norm(v)
    if norm==0:
        norm=np.finfo(v.dtype).eps
    return v/norm
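
One caveat with the function above is that `np.finfo` only accepts floating-point dtypes, so an all-zero integer array would raise an error on that line. A possible variant, sketched under the assumption that casting to float is acceptable (the name `normalize_any_dtype` is made up for this example):

```python
import numpy as np

def normalize_any_dtype(v):
    """Variant of the eps approach that also accepts integer arrays (hypothetical)."""
    v = np.asarray(v)
    # np.finfo is defined for floating-point types only, so cast other dtypes first.
    if not np.issubdtype(v.dtype, np.floating):
        v = v.astype(float)
    norm = np.linalg.norm(v)
    if norm == 0:
        norm = np.finfo(v.dtype).eps
    return v / norm

print(normalize_any_dtype(np.array([3, 4])))          # [0.6 0.8]
print(normalize_any_dtype(np.zeros(3, dtype=int)))    # [0. 0. 0.]
```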

Solution 5 - Python

If you have multidimensional data and want each column (axis 0) shifted to start at 0 and then scaled by its sum or its max:

def normalize(_d, to_sum=True, copy=True):
    # d is a (n x dimension) np array
    d = _d if not copy else np.copy(_d)
    d -= np.min(d, axis=0)
    d /= (np.sum(d, axis=0) if to_sum else np.ptp(d, axis=0))
    return d

This uses NumPy's peak-to-peak function, np.ptp.

a = np.random.random((5, 3))

b = normalize(a, copy=False)
b.sum(axis=0) # array([1., 1., 1.]), the columns sum to 1

c = normalize(a, to_sum=False, copy=False)
c.max(axis=0) # array([1., 1., 1.]), the max of each column is 1

Solution 6 - Python

There is also the function unit_vector() to normalize vectors in the popular transformations module by Christoph Gohlke:

import transformations as trafo
import numpy as np

data = np.array([[1.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0],
                 [1.0, 2.0, 3.0]])

print(trafo.unit_vector(data, axis=1))

Solution 7 - Python

You mentioned scikit-learn, so I want to share another solution.

scikit-learn `MinMaxScaler`

scikit-learn has a class called MinMaxScaler that rescales each feature to a value range of your choice.

It also deals with NaN values for us.

> NaNs are treated as missing values: disregarded in fit, and maintained in transform. ... see reference [1]

Code sample

The code is simple:

# Let's say X_train is your input dataframe
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
# create a MinMaxScaler instance
min_max_scaler = MinMaxScaler()
# feed in a numpy array
X_train_norm = min_max_scaler.fit_transform(X_train.values)
# wrap it up if you need a dataframe
df = pd.DataFrame(X_train_norm)

Reference

[1] sklearn.preprocessing.MinMaxScaler

Solution 8 - Python

If you don't need utmost precision, your function can be reduced to:

v_norm = v / (np.linalg.norm(v) + 1e-16)

Solution 9 - Python

If you work with multidimensional arrays, the following fast solution is possible.

Say we have a 2D array that we want to normalize along the last axis, and some of its rows have zero norm.

import numpy as np
arr = np.array([
    [1, 2, 3], 
    [0, 0, 0],
    [5, 6, 7]
], dtype=float)  # np.float was removed in NumPy 1.24; the builtin float is equivalent

lengths = np.linalg.norm(arr, axis=-1)
print(lengths)  # [ 3.74165739  0.         10.48808848]
arr[lengths > 0] = arr[lengths > 0] / lengths[lengths > 0][:, np.newaxis]
print(arr)
# [[0.26726124 0.53452248 0.80178373]
# [0.         0.         0.        ]
# [0.47673129 0.57207755 0.66742381]]

Solution 10 - Python

If you're working with 3D vectors, you can do this concisely using the toolbelt vg. It's a light layer on top of numpy and it supports single values and stacked vectors.

import numpy as np
import vg

x = np.random.rand(1000)*10
norm1 = x / np.linalg.norm(x)
norm2 = vg.normalize(x)
print(np.allclose(norm1, norm2))
# True

I created the library at my last startup, where it was motivated by uses like this: simple ideas which are way too verbose in NumPy.

Solution 11 - Python

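Without sklearn, using just numpy, you can define a function. Note that this one standardizes each row to zero mean and unit standard deviation rather than to unit norm.

Assuming that the rows are the variables and the columns are the samples (axis=1):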

import numpy as np

# Example array
X = np.array([[1,2,3],[4,5,6]])

def stdmtx(X):
    means = X.mean(axis =1)
    stds = X.std(axis= 1, ddof=1)
    X= X - means[:, np.newaxis]
    X= X / stds[:, np.newaxis]
    return np.nan_to_num(X)

output:

X
array([[1, 2, 3],
       [4, 5, 6]])

stdmtx(X)
array([[-1.,  0.,  1.],
       [-1.,  0.,  1.]])

Solution 12 - Python

If you want to normalize n-dimensional feature vectors stored in a 3D tensor, you could also use PyTorch:

import numpy as np
from torch import FloatTensor
from torch.nn.functional import normalize

vecs = np.random.rand(3, 16, 16, 16)
norm_vecs = normalize(FloatTensor(vecs), dim=0, eps=1e-16).numpy()

Solution 13 - Python

A simple dot product would do the job. No need for any extra package.

x = x/np.sqrt(x.dot(x))

By the way, if the norm of x is zero, it is inherently a zero vector, and cannot be converted to a unit vector (which has norm 1). If you want to catch the case of np.array([0,0,...0]), then use

norm = np.sqrt(x.dot(x))
x = x/norm if norm != 0 else x

Solution 14 - Python

For a 2D array, you can use the following one-liner to normalize across rows. To normalize across columns, simply set axis=0.

a / np.linalg.norm(a, axis=1, keepdims=True)
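
The one-liner above computes 0/0 for any all-zero row, which yields NaNs and a runtime warning. A possible guard, sketched here with `np.divide` and its `out` and `where` arguments (the example array is made up for illustration):

```python
import numpy as np

a = np.array([[3.0, 4.0],
              [0.0, 0.0]])

norms = np.linalg.norm(a, axis=1, keepdims=True)
# Divide only where the row norm is nonzero; rows with zero norm
# keep the zeros that `out` was initialized with.
unit = np.divide(a, norms, out=np.zeros_like(a), where=norms != 0)
print(unit)
# [[0.6 0.8]
#  [0.  0. ]]
```

Note that `a` has to be a floating-point array here, since `out` inherits its dtype from `np.zeros_like(a)`.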

Solution 15 - Python

If you want all values of a 1-D array to lie in [0, 1], just use

(a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))

Where a is your 1d-array.

An example:

>>> a = np.array([0, 1, 2, 4, 5, 2])
>>> (a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))
array([0. , 0.2, 0.4, 0.8, 1. , 0.4])

A note on this method: it preserves the proportions between values only if the 1-D array contains at least one 0 and no negative numbers. Otherwise the minimum is shifted to 0 and the ratios between values change.
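
The expression above also divides by zero when every element of the array is the same. A minimal sketch of a guarded version follows; the helper name `min_max_scale` and the choice to map a constant array to all zeros are assumptions, not part of the original answer.

```python
import numpy as np

def min_max_scale(a):
    """Rescale a 1-D array to [0, 1] (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    span = a.max() - a.min()
    if span == 0:
        # Assumption: a constant array has no spread, so return zeros.
        return np.zeros_like(a)
    return (a - a.min()) / span

print(min_max_scale([0, 1, 2, 4, 5, 2]))  # [0.  0.2 0.4 0.8 1.  0.4]
print(min_max_scale([7, 7, 7]))           # [0. 0. 0.]
```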

Content Type	Original Author	Original Content on Stackoverflow
Question	Donbeo	View Question on Stackoverflow
Solution 1 - Python	ali_m	View Answer on Stackoverflow
Solution 2 - Python	Eelco Hoogendoorn	View Answer on Stackoverflow
Solution 3 - Python	mrk	View Answer on Stackoverflow
Solution 4 - Python	Eduard Feicho	View Answer on Stackoverflow
Solution 5 - Python	Jaden Travnik	View Answer on Stackoverflow
Solution 6 - Python	Joe	View Answer on Stackoverflow
Solution 7 - Python	WY Hsu	View Answer on Stackoverflow
Solution 8 - Python	sergio verduzco	View Answer on Stackoverflow
Solution 9 - Python	Stanislav Tsepa	View Answer on Stackoverflow
Solution 10 - Python	paulmelnikow	View Answer on Stackoverflow
Solution 11 - Python	seralouk	View Answer on Stackoverflow
Solution 12 - Python	max0r	View Answer on Stackoverflow
Solution 13 - Python	Ka-Wa Yip	View Answer on Stackoverflow
Solution 14 - Python	Cristian Arteaga	View Answer on Stackoverflow
Solution 15 - Python	sergzach	View Answer on Stackoverflow
