Pandas in AWS lambda gives numpy error

Tags: Python, Pandas, Numpy, Amazon S3, Aws Lambda

Python Problem Overview


I've been trying to run code in AWS Lambda that imports pandas. Here is what I've done. I have a Python file that contains the following simple code (this file has the Lambda handler):

import json
print('Loading function')
import pandas as pd
def lambda_handler(event, context):
    return "Welcome to Pandas usage in AWS Lambda"
  1. I zipped this Python file together with the numpy, pandas and pytz libraries as a deployment package (all of this was done on an Amazon EC2 Linux machine)
  2. Then I uploaded the package to S3
  3. I created a Lambda function (runtime=python3.6) and loaded the deployment package from S3

But when I test the lambda function in AWS Lambda, I get the below error:

Unable to import module 'lambda_function': Missing required dependencies ['numpy']

I already have numpy in the zipped package, but I still get this error. I tried to follow the hints given at https://stackoverflow.com/questions/36054976/pandas-aws-lambda but had no luck.

Has anyone run into the same issue? I would appreciate any hints or suggestions to solve this problem.

Thanks

Python Solutions


Solution 1 - Python

To include numpy in your Lambda zip, follow the instructions on this page in the AWS docs:

How do I add Python packages with compiled binaries to my deployment package and make the package compatible with AWS Lambda?

To paraphrase the instructions using numpy as an example:

  1. Open the module pages at pypi.org. https://pypi.org/project/numpy/

  2. Choose Download files.

  3. Download:

For Python 2.7, module-name-version-cp27-cp27mu-manylinux1_x86_64.whl

e.g. numpy-1.15.2-cp27-cp27mu-manylinux1_x86_64.whl

For Python 3.6, module-name-version-cp36-cp36m-manylinux1_x86_64.whl

e.g. numpy-1.15.2-cp36-cp36m-manylinux1_x86_64.whl

  4. Uncompress the wheel file into the /path/to/project-dir folder. You can use the unzip command on the command line to do this, although there are other ways.

unzip numpy-1.15.2-cp36-cp36m-manylinux1_x86_64.whl

When the wheel file is uncompressed, your deployment package will be compatible with Lambda.

Hope that all makes sense ;)

The end result might look something like this. Note: you should not include the whl file in the deployment package.

[Image: what it might look like]
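Since a wheel is just a zip archive, the unzip step can also be done from Python. This is only a minimal sketch, not part of the AWS instructions: the function name extract_wheel and both paths are hypothetical, and it assumes the wheel has already been downloaded from PyPI.

```python
import zipfile


def extract_wheel(wheel_path, project_dir):
    """Unpack a downloaded .whl file (a zip archive) into project_dir.

    Returns the sorted top-level names that were extracted, so you can see
    which package and metadata directories landed in the project folder.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        wheel.extractall(project_dir)
        return sorted({name.split("/", 1)[0] for name in wheel.namelist()})


# Example call (hypothetical paths):
# extract_wheel("numpy-1.15.2-cp36-cp36m-manylinux1_x86_64.whl", "/path/to/project-dir")
```

The .whl file itself stays where it was downloaded, so it is easy to keep it out of the deployment package.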

Solution 2 - Python

EDIT: I finally figured out how to run pandas & numpy in an AWS Lambda Python 3.6 runtime environment.

I have uploaded my deployment package to the following repo:

git clone https://github.com/pbegle/aws-lambda-py3.6-pandas-numpy.git

Simply add your lambda_function.py to the zip file by running:

zip -ur lambda.zip lambda_function.py

Upload the zip file to S3 and use it as the source for your Lambda function.

ORIGINAL:

The only way I have gotten Pandas to work in a lambda function is by compiling the pandas (and numpy) libraries in an AWS Linux EC2 instance following the steps from this blog post and then using the python 2.7 runtime for my lambda function.

Solution 3 - Python

After doing a lot of research I was able to make it work with Lambda layers.

Create or open a clean directory and follow the steps below:

Prerequisites: Make sure you have Docker up and running

  1. Create a requirements.txt file with the following:

pandas==0.23.4
pytz==2018.7

  2. Create a get_layer_packages.sh file with the following:

#!/bin/bash

export PKG_DIR="python"

rm -rf ${PKG_DIR} && mkdir -p ${PKG_DIR}

docker run --rm -v $(pwd):/foo -w /foo lambci/lambda:build-python3.6 \
    pip install -r requirements.txt --no-deps -t ${PKG_DIR}

  3. Run the following commands in the same directory:

chmod +x get_layer_packages.sh

./get_layer_packages.sh

zip -r pandas.zip .

  4. Upload the layer to an S3 bucket.

  5. Publish the layer to AWS by running the command below:

aws lambda publish-layer-version --layer-name pandas-layer --description "Description of your layer" \
    --content S3Bucket=,S3Key=.zip \
    --compatible-runtimes python3.6 python3.7

  6. Go to the Lambda console and upload your code as a zip file, or use the inline editor.

  7. Click Layers > Add a layer, search for the layer (pandas-layer) among the compatible layers, and select the version.

  8. Also add the AWSLambda-Python36-SciPy1x layer, which is available by default, for importing numpy.

[Image: selecting the layer from the console]

  9. Test the code. It should work now!

Thanks to this medium article https://medium.com/@qtangs/creating-new-aws-lambda-layer-for-python-pandas-library-348b126e9f3e
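The important detail in the steps above is that the packages end up under a top-level python/ directory inside the zip, which is where Lambda looks for Python layer content. As a rough sketch of just that packaging step (the function name build_layer_zip and the paths are hypothetical, and the packages are assumed to have been installed on Amazon Linux already):

```python
import zipfile
from pathlib import Path


def build_layer_zip(packages_dir, zip_path, prefix="python"):
    """Zip every file under packages_dir so that archive paths start with prefix/.

    zip_path should live outside packages_dir, otherwise the archive
    would try to include itself.
    """
    packages_dir = Path(packages_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(packages_dir.rglob("*")):
            if path.is_file():
                arcname = Path(prefix) / path.relative_to(packages_dir)
                archive.write(path, arcname.as_posix())
    return zip_path


# Example call (hypothetical paths):
# build_layer_zip("build/packages", "pandas.zip")
```

Listing the archive afterwards (for example with unzip -l pandas.zip) should show every entry starting with python/.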

Solution 4 - Python

AWS Lambda uses the Amazon Linux operating system. The idea is to download versions of Pandas and NumPy that are compatible with Amazon Linux. When you run pip on Windows or Mac, what you download is specific to that operating system. You need to download the Linux-compatible versions so that your Lambda function can load them. These files are called wheel files.

Create a new local directory containing the lambda_function.py file. Install Pandas into that directory with pip:

$ pip install -t . pandas

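Navigate to https://pypi.org/project/pandas/#files. Search for and download the newest *manylinux1_x86_64.whl package for your Python version, and do the same for NumPy. In my case, I'm using Python 3.6 on my Lambda function, so I downloaded the following:

numpy-1.16.1-cp36-cp36m-manylinux1_x86_64.whl
pandas-0.24.1-cp36-cp36m-manylinux1_x86_64.whl

Download the whl files to the directory that contains lambda_function.py. Remove the pandas, numpy, and *.dist-info directories that pip installed, then unzip the whl files.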

$ rm -r pandas numpy *.dist-info
$ unzip numpy-1.16.1-cp36-cp36m-manylinux1_x86_64.whl
$ unzip pandas-0.24.1-cp36-cp36m-manylinux1_x86_64.whl

Remove the whl files, *.dist-info, and __pycache__, then create the zip.zip archive:

$ rm -r *.whl *.dist-info __pycache__
$ zip -r zip.zip .

Upload the zip.zip file to your Lambda function.


Source: https://medium.com/@korniichuk/lambda-with-pandas-fd81aa2ff25e
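Picking the right wheel comes down to reading the tags in its filename. The sketch below splits a wheel filename into its parts and checks them against a target runtime. The helper names parse_wheel_filename and fits_lambda are hypothetical, the default of cp36 on x86_64 Linux simply mirrors this answer, and treating every "any" wheel as compatible is a simplification.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename):
    """Split name-version[-build]-python-abi-platform.whl into its tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %s" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 parts when an optional build tag is present
        raise ValueError("unexpected wheel filename: %s" % filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return WheelTags(parts[0], parts[1], python_tag, abi_tag, platform_tag)


def fits_lambda(filename, python_tag="cp36"):
    """Heuristic: True if the wheel looks usable on an x86_64 Linux runtime."""
    tags = parse_wheel_filename(filename)
    if tags.platform_tag == "any":
        return True  # pure-Python wheel, nothing compiled (simplified check)
    return (
        tags.python_tag == python_tag
        and "linux" in tags.platform_tag
        and tags.platform_tag.endswith("x86_64")
    )


print(fits_lambda("numpy-1.16.1-cp36-cp36m-manylinux1_x86_64.whl"))
print(fits_lambda("numpy-1.16.1-cp36-cp36m-win_amd64.whl"))
```

The first call reports a match and the second does not, because the win_amd64 platform tag marks a Windows build.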

Solution 5 - Python

To use additional libraries in Lambda, we need to compile them on Amazon Linux (this is important if the underlying library is based on C or C++, as NumPy is) and package them in a ZIP file together with the Python script you want to run in Lambda.

To get the Amazon Linux compiled version of the libraries, you can either find a version that someone has already compiled, like the one by @pbegle, or compile it yourself. To compile it yourself, there are two options:

Following the last option with Docker, it is possible to make it work using the instructions in the blog post above and by adding:

pip install --use-wheel pandas

in the script to compile the libraries:

https://github.com/ryansb/sklearn-build-lambda/blob/master/build.sh#L21

Solution 6 - Python

The main problem is that libraries compiled on a specific OS will only work on that OS. So, if a library is compiled on macOS, it will not run in a Linux environment. Why is this a problem?

One of the dependencies of Pandas is Numpy (which is compiled for speed, see this answer on stackoverflow). AWS Lambda runs on Linux, so if Numpy was compiled on macOS or Windows, the compiled files will only be good for those platforms and will not work within AWS. I think those using Linux will not experience this problem.

To solve this problem and create a working AWS Lambda layer for Pandas, follow these simple steps (see the general steps on the Amazon forum):

mkdir awsPandasLayer #create a directory
cd awsPandasLayer #cd into the directory
pip3 install -t . Pandas #install pandas and all its dependencies 
rm -r pandas numpy *.dist-info __pycache__ #clean up the environment to remove the incompatible numpy and pandas

Then download the latest precompiled packages for Pandas and Numpy into the awsPandasLayer directory created earlier. In my case, since I am using Python 3.7, I downloaded these versions of Pandas and Numpy. Notice that cp37 denotes the Python version, as clearly stated on the site. Then complete the installation as follows:

unzip pandas-1.0.5-cp37-cp37m-manylinux1_x86_64.whl #unzip the pandas precompiled package
unzip numpy-1.18.5-cp37-cp37m-manylinux1_x86_64.whl #unzip the numpy precompiled package
pip3 install -t . openpyxl #I found this is required within AWS Lambda for excel files
rm -r *.whl *.dist-info __pycache__ #clean up unneeded files
zip -r awsPandasLayer.zip . # zip all the files

The zip file can then be uploaded as an AWS Lambda layer and it should work. Note that I am using macOS.
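Before zipping, it can help to check that no macOS or Windows binaries are left in the directory. The sketch below is only a heuristic based on common file naming (.pyd and .dll on Windows, .dylib or a "darwin" tag in .so names on macOS). The function name find_non_linux_binaries is hypothetical, and an untagged .so file cannot be classified this way.

```python
from pathlib import Path


def find_non_linux_binaries(package_dir):
    """Return relative paths of compiled files that look like macOS or Windows builds."""
    root = Path(package_dir)
    suspects = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        name = path.name
        if name.endswith((".pyd", ".dll", ".dylib")):
            suspects.append(path.relative_to(root).as_posix())
        elif name.endswith(".so") and "darwin" in name:
            suspects.append(path.relative_to(root).as_posix())
    return suspects


# Example call (hypothetical directory name taken from the steps above):
# print(find_non_linux_binaries("awsPandasLayer"))
```

An empty result does not prove the package will load in Lambda, but any hit is a file that will certainly not work there.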

Solution 7 - Python

Possible duplicate of https://stackoverflow.com/questions/45204038/cannot-find-mysql-in-nodejs-using-aws-lambda/45204186#45204186

You need to package your libraries with your Lambda function. Because Lambda runs on a public cloud, you cannot configure the environment it runs in.

In your case, since you are using pandas, you need to package pandas in your zip. Find the path to pandas (for example: /Users/dummyUser/anaconda/lib/python3.6/site-packages) and copy the library to the place where you have your Lambda function code. Inside your code, refer to pandas from your local copy. When uploading, zip the whole set (code + libraries) and upload it as usual. It should work.

Solution 8 - Python

I tried some of the solutions here, but most of them didn't work. I liked the idea @Ranadeep Guha suggested of creating a container and downloading the packages there, so that's what I did.

I worked in the directory where my Lambda function was located and created the following files:

Dockerfile :

FROM python:3.8-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt -t /app

requirements.txt : (those were mine)

pandas
numpy
xmltodict

Now, in Git Bash, I run the following command, which generates a Docker image with all the dependencies installed:

docker build -t image_name .

Sending build context to Docker daemon  5.632kB
Step 1/4 : FROM python:3.8-slim
 ---> 56930ef6f6a2
Step 2/4 : WORKDIR /app
 ---> Using cache
 ---> ea0bf539bcad
Step 3/4 : COPY requirements.txt ./
 ---> cb4c005f53cc
Step 4/4 : RUN pip install --no-cache-dir -r requirements.txt -t /app
 ---> Running in a0d179a372b4
Collecting pandas
  Downloading pandas-1.0.3-cp38-cp38-manylinux1_x86_64.whl (10.0 MB)
Collecting numpy
  Downloading numpy-1.18.3-cp38-cp38-manylinux1_x86_64.whl (20.6 MB)
Collecting xmltodict
  Downloading xmltodict-0.12.0-py2.py3-none-any.whl (9.2 kB)
Collecting pytz>=2017.2
  Downloading pytz-2020.1-py2.py3-none-any.whl (510 kB)
Collecting python-dateutil>=2.6.1
  Downloading python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
Collecting six>=1.5
  Downloading six-1.14.0-py2.py3-none-any.whl (10 kB)
Installing collected packages: numpy, pytz, six, python-dateutil, pandas, xmltodict
Successfully installed numpy-1.18.3 pandas-1.0.3 python-dateutil-2.8.1 pytz-2020.1 six-1.14.0 xmltodict-0.12.0

Now just create a Docker container of that image and confirm that everything installed :

winpty docker run --name container_name -it --entrypoint bash image_name

Type ls and you will see all the installed packages.

Now let's copy everything that was installed to your local PC. You can replace the final dot with any location on your PC:

docker cp container_name:/app/. .

Solution 9 - Python

Using ideas from these answers and SO 55695187, I've built a layer for AWS Lambda containing pandas and made it available on GitHub. It has pandas v1.3.1 for Python 3.8. See https://github.com/eoneil1942/pandas. Here is what I did:

Created an EC2 instance running Amazon Linux
Installed python3.8 and pip3, then used pip3 to install pandas
Made an empty directory python/lib/python3.8/site-packages
Copied pandas and pytz from the installed site-packages to this one
zip -r pandas.zip python

Clearly this generalizes to any Python package.
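To illustrate how this generalizes, here is a rough sketch of the "make the directory and copy the packages" steps as one function. The name stage_layer and its arguments are hypothetical, and it assumes the source site-packages was populated on Amazon Linux as described above.

```python
import shutil
from pathlib import Path


def stage_layer(site_packages, layer_root, packages, python_version="3.8"):
    """Copy the named packages into layer_root/python/lib/pythonX.Y/site-packages."""
    target = (
        Path(layer_root) / "python" / "lib"
        / ("python" + python_version) / "site-packages"
    )
    target.mkdir(parents=True, exist_ok=True)
    for name in packages:
        # copytree raises if the destination exists, so start from an empty layer_root
        shutil.copytree(Path(site_packages) / name, target / name)
    return target


# Example call (hypothetical paths):
# stage_layer("/path/to/installed/site-packages", ".", ["pandas", "pytz"])
```

After staging, zip -r pandas.zip python produces the layer archive exactly as in the last step above.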

Solution 10 - Python

This works for me:

  1. In the Lambda code, just import the libraries (do not bundle the libraries in the .zip):

    from scipy.stats import norm

    import pandas as pd

    ...

  2. Then, in the Lambda console, add the layers using the ARN option:

    Scipy-numpy: arn:aws:lambda:us-east-1:668099181075:layer:AWSLambda-Python38-SciPy1x:29

    Pandas: arn:aws:lambda:us-east-1:770693421928:layer:Klayers-python38-pandas:42

  3. Test your lambda.
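Note that both ARNs above point at us-east-1. As far as I know, a function can only attach layers that live in its own region, so it may be worth checking the region part of an ARN before pasting it into the console. A small sketch, with hypothetical helper names parse_layer_arn and layers_outside_region, and assuming the standard "aws" partition:

```python
from typing import NamedTuple


class LayerArn(NamedTuple):
    region: str
    account_id: str
    name: str
    version: int


def parse_layer_arn(arn):
    """Split arn:aws:lambda:<region>:<account>:layer:<name>:<version> into fields."""
    parts = arn.split(":")
    if len(parts) != 8 or parts[:3] != ["arn", "aws", "lambda"] or parts[5] != "layer":
        raise ValueError("not a Lambda layer version ARN: %s" % arn)
    return LayerArn(parts[3], parts[4], parts[6], int(parts[7]))


def layers_outside_region(arns, function_region):
    """Return the ARNs whose region differs from the function's region."""
    return [arn for arn in arns if parse_layer_arn(arn).region != function_region]


layer_arns = [
    "arn:aws:lambda:us-east-1:668099181075:layer:AWSLambda-Python38-SciPy1x:29",
    "arn:aws:lambda:us-east-1:770693421928:layer:Klayers-python38-pandas:42",
]
print(layers_outside_region(layer_arns, "us-east-1"))
```

For a function in us-east-1 the list is empty. For any other region both ARNs are reported, and you would need the equivalent layer ARNs for that region.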

Solution 11 - Python

I've been struggling with a similar error while trying to use the python3.6 engine. When I switched to 2.7 it worked fine for me. I used Amazon AMI to create my zip file, but it has only python3.5, not 3.6. I guess the version mismatch was the reason. But it's just a guess, I haven't tried the process on a python3.6 installation yet.

Solution 12 - Python

This is similar to Ranadeep's answer, but you don't need to use Lambda Layers if you don't want to.

As others have stated, this is not working because pandas/numpy require binaries to be built and the operating system of your build machine (Linux, Mac, Windows) does not match the operating system of Lambda (Amazon Linux).

To solve this, you can use docker to download/build your dependencies and package them up on Amazon Linux. Amazon provides a Docker image for this purpose. See below for how I built my python package for Python 3.6 runtime (they have other dockers for all other runtimes):

Put all of your dependencies into a requirements.txt file, for example:

openpyxl
boto3
pandas

Create a script (e.g. named build.sh) that will build your package. Here is what mine looked like:

#!/bin/bash

# remove old build artifacts
rm -rf build
rm lambda_package.zip

# make build dir and copy my lambda handler file into it
mkdir build
cp lambda_daily_util_gen.py  build/

# Use requirements file to download/build dependencies into the build folder
cd build
pip install -r ../requirements.txt --target .

# Create an lambda package with my files and all dependencies
zip -r9 ../lambda_package.zip .

Ensure you have the Amazon Linux lambda build image pulled:

$ docker pull lambci/lambda

Run your build script inside of the docker container:

Windows:

$ docker run --rm -v "$PWD":/var/task lambci/lambda:build-python3.6 /var/task/build.sh

Mac/Linux:

docker run --rm -v "${PWD}":/var/task lambci/lambda:build-python3.6 bash -c "chmod +x build.sh && ./build.sh"

You should now see a file named lambda_package.zip, built on Amazon Linux, that you can upload to AWS.

Hope that helps.

Solution 13 - Python

With the Serverless Framework, you can easily package and deploy your dependencies correctly.

You only need to:

  1. install serverless

    npm install -g serverless
    
  2. create a serverless.yml in the root of your project with the following:

    service: numpy-test
    
    # define the environment of your lambda
    provider:
      name: aws
      runtime: python3.6
    
    # specify the function you want to deploy
    functions:
      numpy:
        # path to your lambda_handler function
        handler: path/to/function.lambda_handler
    
    # add a plugin that allows serverless to package python libraries
    # specified in the requirements.txt or Pipfile
    plugins:
      - serverless-python-requirements
    
    # this section makes sure your libraries get built correctly
    # for an aws lambda environment
    custom:
      pythonRequirements:
        dockerizePip: non-linux
    

    Adjust path/to/function.lambda_handler so that it points to your own handler.

  3. make sure docker is running and execute

    serverless deploy

Once the deployment is finished, go to the AWS console, look for the function numpy-test-dev-numpy, and test it.

This article explains the necessary steps in detail.

Solution 14 - Python

Maybe the module you're using was built for Python 2.7.

Try installing the pandas module for Python 3.x, i.e.

pip3 install pandas -t .

It worked for me.

Or, for an instant result, change your runtime to Python 2.7 (not recommended).

Solution 15 - Python

To make a deployment package that's compatible with Lambda, download a precompiled package called a wheel (.whl). Uncompress the wheel file on /path/to/project-dir instead of using pip install.

Resolution

  1. Open your module-name pypi.org page. For example: https://pypi.org/project/numpy/

  2. Choose Download files.

  3. Download:

For Python 2.7, module-name-version-cp27-cp27mu-manylinux1_x86_64.whl

For Python 3.6, module-name-version-cp36-cp36m-manylinux1_x86_64.whl

  4. Uncompress the wheel file into the /path/to/project-dir folder.

When the wheel file is uncompressed, your deployment package will be compatible with Lambda.

Solution 16 - Python

This is the only thing that worked for me: Python packages in AWS Lambda made easy

Paraphrasing what they did, this is a small summary of the steps you need to follow:

1 - Install packages in Linux instance

Like other users mentioned, you need to install libraries in a linux instance to make sure it works with your Lambda Function. You can use AWS Cloud9 service to do this.

  • Go to Cloud9 in AWS and click Create Environment
  • Name the environment, select a t2.micro instance, and leave the rest at the default settings
  • Click next step to review the settings and finally click on Create environment

2 - Create Pandas Layer

In the new environment, use the following commands to install the pandas library (which includes numpy):

(Note: You can install more than one module)

mkdir folder
cd folder
virtualenv v-env
source ./v-env/bin/activate
pip install pandas
deactivate

Run the following commands to zip the necessary files and publish the layer version to your AWS account:

mkdir python
cd python
cp -r ../v-env/lib64/python3.7/site-packages/* .
cd ..
zip -r panda_layer.zip python
aws lambda publish-layer-version --layer-name pandas --zip-file fileb://panda_layer.zip --compatible-runtimes python3.7

3 - Add the layer to your lambda

Once you've completed step 2, you will have a layer called pandas in Lambda section in aws (you can see it on the left menu). Make sure that you've selected Runtime 3.7 for your lambda:


Click on Add Layer on the bottom of the page in your Lambda configuration and select the new Layer you've just created.

You can test the Lambda just by importing numpy and pandas. You'll see that you can execute it just fine.
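A test handler for this step could look like the sketch below. It is only an illustration: instead of failing on a bad import, it reports for each library either the version that loaded or the error that was raised, which makes it easier to see the real cause behind a "Missing required dependencies" message.

```python
import importlib


def lambda_handler(event, context):
    """Try to import numpy and pandas and report what happened for each."""
    report = {}
    for module_name in ("numpy", "pandas"):
        try:
            module = importlib.import_module(module_name)
            report[module_name] = getattr(module, "__version__", "unknown")
        except Exception as exc:  # broad on purpose: this handler is a diagnostic
            report[module_name] = "import failed: %r" % (exc,)
    return report
```

Running it with an empty test event in the console returns a small dictionary with one entry per library.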

Finally:

Remember to delete your Cloud9 environment! You can go to AWS Cloud9 section and remove the environment previously created by selecting it and clicking delete. It should terminate the EC2 instance associated with it, but to make sure, go to EC2 section in AWS and check that the instance state is Terminated. If not, click on it and on Instance state select Terminate instance.

Make sure to check the Guide mentioned above to support their work.

Solution 17 - Python

For those still hitting this in 2022 (March 29th to be specific), I encountered this whilst using AWS SAM to build Lambdas. I had the same import pandas as pd line as the OP, but my template.yaml specified an ARM build (because it was originally written on a Mac). However, I was packaging and deploying from a Windows machine.

So, changing the template from ARM to x86_64 solved the problem for me. I did also bundle in pytz and numpy to my dependencies, but that didn't initially solve my problem.
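One way to see which architecture a function really runs on is to have it report that itself. The handler below is a minimal sketch using only the standard library. On Linux, platform.machine() typically returns x86_64 for Intel/AMD and aarch64 for ARM, which you can compare with the architecture your packages were built for.

```python
import platform
import sys


def describe_runtime():
    """Collect the CPU architecture, OS name and Python version of this process."""
    return {
        "machine": platform.machine(),
        "system": platform.system(),
        "python": "%d.%d" % sys.version_info[:2],
    }


def lambda_handler(event, context):
    # Returning the details makes them visible in the console test output
    return describe_runtime()
```

If the reported machine does not match the platform tag of the wheels in the package, compiled libraries such as numpy will fail to import.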

Solution 18 - Python

I wanted to run pandas, numpy and chart_studio (or plotly). Plotly worked when I zipped it on the Mac after running pip3 install chart_studio -t . in the directory, but pandas and numpy would not work at all with this method. Pandas and numpy did work when I packaged their wheel files as layers and added those layers in Lambda, but then plotly would not work as a layer, because Lambda said it was too large to load. What finally worked was combining the two methods: I added layers built from the wheel files for pandas and numpy, ran pip3 install chart_studio -t ., and zipped the result together with the Python file to execute. Hope this tip helps you too.
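The "too large" message comes from Lambda's size quota: the function package and all of its layers together must stay under 250 MB unzipped. A quick local check before uploading could look like this sketch. The helper names are hypothetical, and the limit constant assumes the quota has not changed.

```python
import zipfile

# Lambda quota for the function package plus all layers, unzipped (assumed current)
UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def unzipped_size(zip_path):
    """Sum the uncompressed sizes of all entries in a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sum(info.file_size for info in archive.infolist())


def fits_unzipped_limit(zip_paths, limit=UNZIPPED_LIMIT_BYTES):
    """True if the archives together stay within the unzipped size limit."""
    return sum(unzipped_size(path) for path in zip_paths) <= limit


# Example call (hypothetical file names):
# fits_unzipped_limit(["function.zip", "pandas_layer.zip", "numpy_layer.zip"])
```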

Solution 19 - Python

Your code always gives this error because Lambda does not include any external libraries; it only has the libraries that come with Python by default.

If you are using an external library such as pandas, numpy or any other, you need to install that library on AWS Lambda before using it.

Look at your code:

import json
print('Loading function')
import pandas as pd
def lambda_handler(event, context):
    return "Welcome to Pandas usage in AWS Lambda"

The pandas library is not installed here, so your code does not work.
My suggestion is to write all your code inside the Lambda function, as follows:

import json
def lambda_handler(event, context):
    #install python libray here 
    print('Loading function')
    import pandas as pd
    return "Welcome to Pandas usage in AWS Lambda" 

So the final code looks as follows:

def lambda_handler(event, context):
    import pip

    def install(package):
        if hasattr(pip, 'main'):
            pip.main(['install', package])
        else:
            pip._internal.main(['install', package])

    if __name__ == '__main__':
        install('pandas')

    # install python library here
    print('Loading function')
    import pandas as pd
    return "Welcome to Pandas usage in AWS Lambda"

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Kingz | View Question on Stackoverflow
Solution 1 - Python | chim | View Answer on Stackoverflow
Solution 2 - Python | 0xPeter.eth | View Answer on Stackoverflow
Solution 3 - Python | Ranadeep Guha | View Answer on Stackoverflow
Solution 4 - Python | korniichuk | View Answer on Stackoverflow
Solution 5 - Python | Pierre-Antoine | View Answer on Stackoverflow
Solution 6 - Python | koakande | View Answer on Stackoverflow
Solution 7 - Python | Dishant Kapadiya | View Answer on Stackoverflow
Solution 8 - Python | JeyJ | View Answer on Stackoverflow
Solution 9 - Python | Betty O'Neil | View Answer on Stackoverflow
Solution 10 - Python | qarly_blue | View Answer on Stackoverflow
Solution 11 - Python | Pavel Anni | View Answer on Stackoverflow
Solution 12 - Python | JD D | View Answer on Stackoverflow
Solution 13 - Python | Vincent Claes | View Answer on Stackoverflow
Solution 14 - Python | akshay parkar | View Answer on Stackoverflow
Solution 15 - Python | BigData-Guru | View Answer on Stackoverflow
Solution 16 - Python | Guillermo Garcia | View Answer on Stackoverflow
Solution 17 - Python | shearn89 | View Answer on Stackoverflow
Solution 18 - Python | Kester Belgrove | View Answer on Stackoverflow
Solution 19 - Python | user6502956 | View Answer on Stackoverflow