Performance issue: Java vs C++

JavaC++PerformanceAlgorithm

Java Problem Overview


I have always heard that C++ was way more efficient than Java (and that is why most games are developed in C++).

I wrote a small algorithm to solve the "Eight queens puzzle" in both Java and C++, using the exact same algorithm, and then started to raise the number or squares. When reaching checkboards of 2020 or even 2222, it appears Java is much more effective (3 seconds vs 66 seconds for C++).

I have no idea why, but I am pretty beginning with C++, so it is possible I made some huge performance mistakes, so I will gladly accept any information that would help me understand what is happening.

Below is the code I use in Java:

import java.awt.Point;
import java.util.ArrayList;
import java.util.List;

public class HuitDames {

    /**
     * La liste des coordnnées des dames.
     */
    private static List<Point> positions = new ArrayList<>();

    /**
     * Largeur de la grille.
     */
    private static final int LARGEUR_GRILLE = 22;


    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        int i = 1;
        placerDame(i);
        for (Point point : positions) {
            System.out.println("(" + point.x + "; " + point.y + ")");
        }
    }

    /**
     * Place une dame et return true si la position est bonne.
     * @param i le numéro de la dame.
     * @return si la position est bonne.
     */
    private static boolean placerDame(int i) {

        boolean bonnePosition = false;
        for (int j = 1; j <= LARGEUR_GRILLE && bonnePosition == false; j++) {
            Point emplacement = new Point(i, j);
            positions.add(emplacement);
            if (verifierPrise(emplacement) && (i == LARGEUR_GRILLE || placerDame(i + 1))) {
                bonnePosition = true;
            }
            else {
                positions.remove(i - 1);
            }
        }

        return bonnePosition;
    }

    /**
     * Vérifie que la nouvelle position n'est pas en prise avec une position déjà présente.
     * @param position la position de la nouvelle dame.
     * @return Si la position convient par rapport aux positions des autres dames.
     */
    private static boolean verifierPrise(Point position) {
        boolean nonPrise = true;
        for (Point point : positions) {
            if (!point.equals(position)) {
                // Cas où sur la même colonne.
                if (position.y == point.y) {
                    nonPrise = false;
                }
                // Cas où sur même diagonale.
                if (Math.abs(position.y - point.y) == Math.abs(position.x - point.x)) {
                    nonPrise = false;
                }
            }
        }

        return nonPrise;
    }
}

And below is the code in C++:

#include <iostream>
#include <list>
#include <math.h>
#include <stdlib.h>

using namespace std;


// Class to represent points.
class Point {

    private:
        double xval, yval;

    public:
        // Constructor uses default arguments to allow calling with zero, one,
        // or two values.
        Point(double x = 0.0, double y = 0.0) {
                xval = x;
                yval = y;
        }

        // Extractors.
        double x() { return xval; }
        double y() { return yval; }
};

#define LARGEUR_GRILLE 22
list<Point> positions;


bool verifierNonPrise(Point emplacement) {
    bool nonPrise = true;
    for (list<Point>::iterator it = positions.begin(); it!= positions.end(); it++) {
        if (it->x() != emplacement.x()) {
            if (it->y() == emplacement.y()) {
                nonPrise = false;
            }
            if (abs(it->y() - emplacement.y()) == abs(it->x() - emplacement.x())) {
                nonPrise = false;
            }
        }
    }

    return nonPrise;
}

bool placerDame(int i) {
    bool bonnePosition = false;
    for (int j = 1; j <= LARGEUR_GRILLE && !bonnePosition; j++) {
        Point emplacement(i,j);
        positions.push_back(emplacement);
        if (verifierNonPrise(emplacement) && (i == LARGEUR_GRILLE || placerDame(i + 1))) {
            bonnePosition = true;
        }
        else {
            positions.pop_back();
        }
    }

    return bonnePosition;
}

int main()
{
    int i = 1;
    placerDame(i);
    for (list<Point>::iterator it = positions.begin(); it!= positions.end(); it++) {
        cout << "(" << it->x() << "; " << it->y() << ")" << endl;
    }
    return 0;
}

Java Solutions


Solution 1 - Java

std::list in C++ is a linked list, whereas java.util.ArrayList is an array. Try replacing std::list by std::vector. Also, be sure to compile with optimization turned on.

Solution 2 - Java

##Updates: ###Changes to C++

  • As written:
    Compilation Failure
  • Replace math.h => cmath
    27610 milliseconds
  • Add -O3 flag
    4416 milliseconds
  • Replace std::list with std::vector
    2294 milliseconds
  • Replace Point with std::pair
    2384 milliseconds
  • Made verifierNonPrise const correct
    2351 milliseconds
  • Replaced loop in verifierNonPrise with std::find_if
    929 milliseconds
  • Replacing double with int (to make it equiv to Java)
    549 milliseconds

###Changes to Java

  • As written
    3459 milliseconds
  • Changes verifierNonPrise early return
    368 milliseconds

##Java Vs C++

> javac HuitDames.java
> time java HuitDames
real	0m0.368s
user	0m0.436s
sys	    0m0.042s    
> g++ -O3 -std=c++11 HuitDames.cpp
> time ./a.out
real	0m0.541s
user	0m0.539s
sys	    0m0.002s

##Final Code:

#include <iostream>
#include <vector>
#include <cmath>
#include <stdlib.h>
#include <chrono>
#include <algorithm>

using namespace std;

typedef std::pair<int, int>   Point;

#define LARGEUR_GRILLE 22
vector<Point> positions;


bool verifierNonPrise(Point const& emplacement) {
    return std::find_if(positions.begin(), positions.end(), [&emplacement](Point const& val){
        if (val.first != emplacement.first) {
            if ((val.second == emplacement.second) || (abs(val.second - emplacement.second) == abs(val.first - emplacement.first))) {
                return true;
            }
        }
        return false;
    }) == positions.end();
}

bool placerDame(int i) {
    bool bonnePosition = false;

    for (int j = 1; j <= LARGEUR_GRILLE && !bonnePosition; j++) {
        Point emplacement(i,j);
        positions.push_back(emplacement);
        if (verifierNonPrise(emplacement) && (i == LARGEUR_GRILLE || placerDame(i + 1))) {
            bonnePosition = true;
        }
        else {
            positions.pop_back();
        }
    }
    
    return bonnePosition;
}


int main()
{
    using std::chrono::system_clock;

    system_clock::time_point begin_time = system_clock::now();

    int i = 1;
    placerDame(i);
    for (vector<Point>::iterator it = positions.begin(); it!= positions.end(); it++) {
        cout << "(" << it->first << "; " << it->second << ")" << endl;
    }

    system_clock::time_point end_time = system_clock::now();

    long long elapsed_milliseconds = std::chrono::duration_cast<std::chrono::milliseconds>(end_time - begin_time).count();
    cout << "Duration (milliseconds): "
         << elapsed_milliseconds
         << std::endl;    
}

Solution 3 - Java

Test this version, updated using C++11 features. Tested in GCC 4.9.0 with -std=c++11. Tested on Celeron 1.6 GHz, 512 MB RAM.

Times in my PC:
Original: Duration (milliseconds): 12658
First Version: Duration (milliseconds): 3616
Optimized Version: Duration (milliseconds): 1745

Changes are:

  • Using vector instead of list Benchmark, and Words from Stroustrup.
  • Using const whatever we can, the compiler is able to optimize much more if it known that the value don't change.
  • Using std::pair instead of Point.
  • Using new for-loop with constant iterators.

Source:

#include <iostream>
#include <vector>
#include <chrono>
#include <iomanip>

using namespace std;

typedef std::pair<int, int> Point;

#define LARGEUR_GRILLE 22
vector<Point> positions;

bool verifierNonPrise(const Point& emplacement) {
    bool nonPrise = true;
    for (const auto& p : positions) {
        if (p.first != emplacement.first) {
            if (p.second == emplacement.second) {
                nonPrise = false;
            }
            if (abs(p.second - emplacement.second) ==
                abs(p.first - emplacement.first)) {
                nonPrise = false;
            }
        }
    }

    return nonPrise;
}

bool placerDame(int i) {
    bool bonnePosition = false;
    for (int j = 1; j <= LARGEUR_GRILLE && !bonnePosition; j++) {
        Point emplacement(i, j);
        positions.emplace_back(emplacement);
        if (verifierNonPrise(emplacement) &&
            (i == LARGEUR_GRILLE || placerDame(i + 1))) {
            bonnePosition = true;
        } else {
            positions.pop_back();
        }
    }

    return bonnePosition;
}

int main(int argc, char* argv[]) {
    std::chrono::system_clock::time_point begin_time =
        std::chrono::system_clock::now();

    positions.reserve(LARGEUR_GRILLE);

    placerDame(1);
    for (const auto& p : positions) {
        cout << "(" << p.first << "; " << p.second << ")" << endl;
    }

    std::chrono::system_clock::time_point end_time =
        std::chrono::system_clock::now();
    long long elapsed_milliseconds =
        std::chrono::duration_cast<std::chrono::milliseconds>(
            end_time - begin_time).count();
    std::cout << "Duration (milliseconds): " << elapsed_milliseconds
              << std::endl;

    return 0;
}

Some more deep changes.

Changes include:

  • Returning as early as possible. As soon as the queen can not be placed.
  • Returning to a simpler Point class.
  • Using find_if algorithm for searching queen placement.

Source (some recommendation updated):

#include <algorithm>
#include <iostream>
#include <vector>
#include <chrono>
#include <iomanip>

using namespace std;

struct Point {
    int x, y;
};

#define LARGEUR_GRILLE 22
vector<Point> positions;

bool verifierNonPrise(const Point& emplacement) {
    return find_if(positions.cbegin(), positions.cend(), [&emplacement](const Point& p) {
               return (p.x != emplacement.x &&
                       (p.y == emplacement.y ||
                        abs(p.y - emplacement.y) == abs(p.x - emplacement.x)));
           }) == positions.cend();
}

bool placerDame(int i) {
    for (int j = 1; j <= LARGEUR_GRILLE; j++) {
        Point emplacement{i, j};
        positions.push_back(emplacement);
        if (verifierNonPrise(emplacement) &&
            (i == LARGEUR_GRILLE || placerDame(i + 1))) {
            return true;
        } else {
            positions.pop_back();
        }
    }
    return false;
}

int main(int argc, char* argv[]) {
    std::chrono::system_clock::time_point begin_time =
        std::chrono::system_clock::now();

    positions.reserve(LARGEUR_GRILLE);

    placerDame(1);
    for (const auto& p : positions) {
        cout << "(" << p.x << "; " << p.y << ")" << endl;
    }

    std::chrono::system_clock::time_point end_time =
        std::chrono::system_clock::now();
    long long elapsed_milliseconds =
        std::chrono::duration_cast<std::chrono::milliseconds>(
            end_time - begin_time).count();
    std::cout << "Duration (milliseconds): " << elapsed_milliseconds
              << std::endl;

    return 0;
}

Solution 4 - Java

Comparing a managed, dynamically compiled language like Java to a statically compiled language like C++ is very difficult.

You will always be comparing apples to oranges in a sense, because they are conceptually very different. It starts with the use of the standard libraries (ArrayList vs std::list/vector) that will have potentially wildly different performance characteristics, even your code looks similar in both languages.

Then there is the inherent problem with microbenchmarks in Java (short test in Java are always slower because the JIT will observe program flow before it decides what and how it is to be compiled). Same goes for compiler options for C++, even the structure of the source code (independently compiled and linked classes versus single file source) can make a significant difference (because it changes the amount of "insight" the C++ compiler has into the other classes).

Next is the general difference in memory management, garbage collection vs manual memory management (smart pointers etc. are still considered manual memory management).

Not to mention the general language differences like you need to explicitly declare a method virtual in C++, while in Java every member method is virtual by default (working out if it's really virtual at runtime is left to the VM).

With all those differences there will always be cases where one langauge will have a massive advantage over the other. A simple test with very limited scope (like your test here) says very little about each language as a whole.

Another point people often tend to ignore is: How productive can you be with a language - speed isn't everything (look a how sucessful script langages are in some domains, despite being hardly competive when looking only at excution speed). Lack of performance can be crippling, but so can be low productivity.

Solution 5 - Java

I may be beating a dead horse here, but simply doing a line by line translation of the Java to C++, not even using const reference parameters or any such thing, you can see the C++ is almost twice as fast as Java. All the "syntactic optimization" emplacing etc. has little effect if any...

rep ~/Documents $ g++ -O3 Queen.cpp
rep ~/Documents $ javac Queen.java 
rep ~/Documents $ time java Queen 
(1; 1)
(2; 3)
(3; 5)
(4; 2)
(5; 4)
(6; 10)
(7; 14)
(8; 17)
(9; 20)
(10; 13)
(11; 19)
(12; 22)
(13; 18)
(14; 8)
(15; 21)
(16; 12)
(17; 9)
(18; 6)
(19; 16)
(20; 7)
(21; 11)
(22; 15)

real    0m4.806s
user    0m4.857s
sys     0m0.067s
rep ~/Documents $ time ./a.out
(1; 1)
(2; 3)
(3; 5)
(4; 2)
(5; 4)
(6; 10)
(7; 14)
(8; 17)
(9; 20)
(10; 13)
(11; 19)
(12; 22)
(13; 18)
(14; 8)
(15; 21)
(16; 12)
(17; 9)
(18; 6)
(19; 16)
(20; 7)
(21; 11)
(22; 15)

real    0m2.131s
user    0m2.113s
sys     0m0.000s
rep ~/Documents $

Queen.java (translated to english)

import java.awt.Point;
import java.util.ArrayList;
import java.util.List;

public class Queen {

    private static List<Point> positions = new ArrayList<>();
    private static final int GRID_SIZE = 22;

    public static void main(String[] args) 
    {
        int i = 1;
        placeQueen(i);
        for (Point point : positions) 
        {
            System.out.println("(" + point.x + "; " + point.y + ")");
        }
    }
    
    private static boolean placeQueen(int i) 
    {
        boolean bIsGoodPos = false;
        for (int j = 1; j <= GRID_SIZE && bIsGoodPos == false; j++) 
        {
            Point emplacement = new Point(i, j);
            positions.add(emplacement);
            if (verifyPos(emplacement) && (i == GRID_SIZE || placeQueen(i + 1))) 
            {
                bIsGoodPos = true;
            }
            else 
            {
                positions.remove(i - 1);
            }
        }

        return bIsGoodPos;
    }

    private static boolean verifyPos(Point position) 
    {
        boolean bIsSafe = true;
        for (Point point : positions) 
        {
            if (!point.equals(position)) 
            {
                if (position.y == point.y) 
                {
                    bIsSafe = false;
                }
                if (Math.abs(position.y - point.y) == Math.abs(position.x - point.x)) 
                {
                    bIsSafe = false;
                }
            }
        }

        return bIsSafe;
    }
}

Queen.cpp

#include <cmath>
#include <vector>
#include <iostream>

using namespace std;

struct Point
{
    int x, y;
    Point(int ii, int jj):x(ii), y(jj){}
};

vector<Point> positions;
int GRID_SIZE = 22;


bool verifyPos(Point position) 
{
    bool bIsSafe = true;
    for(int i = 0; i < positions.size(); ++i) 
    {
        Point point = positions[i];
        if(point.x != position.x || point.y != position.y) 
        {
            if(position.y == point.y) 
            {
                bIsSafe = false;
            }
            if(abs(position.y - point.y) == abs(position.x - point.x)) 
            {
                bIsSafe = false;
            }
        }
    }

    return bIsSafe;
}

bool placeQueen(int i) 
{
    bool bIsGoodPos = false;
    for(int j = 1; j <= GRID_SIZE && bIsGoodPos == false; j++) 
    {
        Point p(i, j);
        positions.push_back(p);
        if(verifyPos(p) && (i == GRID_SIZE || placeQueen(i + 1))) 
        {
            bIsGoodPos = true;
        }
        else 
        {
            positions.pop_back();
        }
    }
    return bIsGoodPos;
}


int main(void) 
{
    int i = 1;
    placeQueen(i);
    for(int i = 0; i < positions.size(); ++i) 
    {
        Point p = positions[i];
        cout << "(" << p.x << "; " << p.y << ")" << endl;
    }

    return 0;
}

Solution 6 - Java

C++ can do it in 21 ms (on a old core i7-860) if you use bit maps. For the timing run I commented out the showSoln() call since a graphic display of the chess board takes twice as long as finding the solution.

#include <iostream>
#include <iomanip>
#include <fstream>
#include <omp.h>			            //omp_get_wtime() is my favorite time function
using namespace std;

static const unsigned n(22);			//size of board
static_assert(n<32,"use long unsigned for bit masks if n > 32");
static const unsigned mask((1U<<n)-1);	//n wide bitmask with every bit set

void showSoln(unsigned* selCol, unsigned numSoln) {		//show a solution
    cout << "\nsolution " << numSoln << '\n';
    for (unsigned row=0; row<n; ++row) {
	    for (unsigned col=0; col<n; ++col)
		    cout << (col==selCol[row]? " Q": " .");
	    cout << '\n';
    }
}

void main() {
	//for each row bitmasks that show what columns are attacked, 1 bit means attacked
	unsigned ulAttack[n];			//cols attacked from upper left, shift right for next row
	unsigned upAttack[n];			//cols attacked from straight up, same for next row
	unsigned urAttack[n];			//cols attacked from upper right, shift left for next row
	unsigned allAttack[n];			//OR of all attacks on given row
	allAttack[0]= ulAttack[0]= upAttack[0]= urAttack[0]= 0; //no attacks on row 0
	unsigned row= 0;				//the row where now placing a queen 
	unsigned selCol[n];			    //for each row the selected column
	unsigned numSoln= 0;			//count of soutions found
	double wtime= omp_get_wtime();
	for (;;) {											//loop until find 1st (or all) solutions
		if (allAttack[row]!=mask) {						//if 'row' has a column not attacked
			unsigned long bit;
			_BitScanForward(&bit,~allAttack[row]);		//find lowest column not attacked
			//note - your compiler may have a different intrinsic for find lowest set bit
			selCol[row]= bit;							//remember selected column for this row
			unsigned move= 1U<<bit;						//convert selected column to bitmask
			allAttack[row]|= move;						//mark column attacked to prevent re-use
			if (row==n-1) {								//if move in last row have a soln
				++numSoln;
				showSoln(selCol,numSoln);
				break;									//remove this break if want all solutions
			} else {									//no solution yet, fill in rows below
				unsigned nrow= row+1;					//next row
				//from attacks on this row plus 'move' decide attacks on row below
				ulAttack[nrow]= (ulAttack[row] | move) >> 1;
				upAttack[nrow]= (upAttack[row] | move);
				urAttack[nrow]= ((urAttack[row] | move) << 1) & mask;
				allAttack[nrow]= ulAttack[nrow] | upAttack[nrow] | urAttack[nrow];
				row= nrow;								//go to next row
			}
		} else {				//else move on 'row' is impossible so backtrack
			if (!row)			//if backtrack from row 0 have found all solutions
				break;
			--row;				//try next move in prior row
		}
	}
	wtime= omp_get_wtime() - wtime;
	cout << "numSoln= " << numSoln << '\n';
	cout << "time= " << wtime*1000 << " msec\n";
}

Solution 7 - Java

Also, there is no reason to use float/doouble types for the coordinates.

You should gain performance if you do not force calling floating point abs library call in your C++

Java stores the Point coordinates as integer. The get functions return double, however this is probably easier to optimize away in Java, then in c++.

Solution 8 - Java

It seems that for code that requires not too much memory access and intensive computing the difference between Java an C++ is low. But in the opposite situation the difference looks amazing :

test.java

import java.lang.String;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Scanner;

public class test{
	private static Random gna=new Random();
	
	public static double new_value(double value){
		if (value<0.5) value=0;
		return value*value;
	}
	
	public static void main(String[] args) {
		long start_time = System.currentTimeMillis();	
		List<Double> ze_list=new ArrayList();
		for (int i=0;i<1e8;i++){
			double temp=new_value(gna.nextDouble());
			ze_list.add(temp);
		}
		long end_time = System.currentTimeMillis();
		System.out.println("Time (s) :"+ ((end_time-start_time)/1000));
		Scanner input = new Scanner(System.in);
		String inputval = input.next();
	}
}

and compare it to test.cpp:

#include <iostream>
#include <vector>
#include <ctime>
#include <random>
using namespace std;

static default_random_engine dre1(time(0));
static uniform_real_distribution <double> urd1(0, 1);

static double new_value(double value){
	if (value<0.5) value=0;
	return value*value;
}

int main(void){
		time_t tbegin,tend;
		double texec=0;
		tbegin=time(NULL);
		vector<double> ze_list;
		for (int i=0;i<1e8;i++){
			double temp=new_value(urd1(dre1));
			ze_list.push_back(temp);
		}
		tend=time(NULL);
		texec=difftime(tend,tbegin);
		cout << "\nTime (s) " << texec << " s\n";
		int val=0;
		cin >> val;
	return 0;
}

I just tested it on my Mac :

  • the Java version took 90s and required 3.77 Go
  • the C++ programm took 2s and required only 770 Mo

Maybe there is a possibility to increase Java performances but I cannot see how.

Solution 9 - Java

Java passes objects to methods as references and those references are passed by value, but C++ passes objects by value.

You should change C++ code to make it same as Java (Pass pointers in C++ intstead of passing objects):

bool verifierNonPrise(Point* emplacement) // The correct way for passing arguments.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionrealUser404View Question on Stackoverflow
Solution 1 - JavaBrian BiView Answer on Stackoverflow
Solution 2 - JavaMartin YorkView Answer on Stackoverflow
Solution 3 - JavaNetVipeCView Answer on Stackoverflow
Solution 4 - JavaDurandalView Answer on Stackoverflow
Solution 5 - Javarep_movsdView Answer on Stackoverflow
Solution 6 - JavasnstrandView Answer on Stackoverflow
Solution 7 - JavaJakubView Answer on Stackoverflow
Solution 8 - Java3dmazeView Answer on Stackoverflow
Solution 9 - JavaAmir SaniyanView Answer on Stackoverflow