Performance issue: Java vs C++
JavaC++PerformanceAlgorithmJava Problem Overview
I have always heard that C++ was way more efficient than Java (and that is why most games are developed in C++).
I wrote a small algorithm to solve the "Eight queens puzzle" in both Java and C++, using the exact same algorithm, and then started to raise the number or squares. When reaching checkboards of 2020 or even 2222, it appears Java is much more effective (3 seconds vs 66 seconds for C++).
I have no idea why, but I am pretty beginning with C++, so it is possible I made some huge performance mistakes, so I will gladly accept any information that would help me understand what is happening.
Below is the code I use in Java:
import java.awt.Point;
import java.util.ArrayList;
import java.util.List;
public class HuitDames {
/**
* La liste des coordnnées des dames.
*/
private static List<Point> positions = new ArrayList<>();
/**
* Largeur de la grille.
*/
private static final int LARGEUR_GRILLE = 22;
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
int i = 1;
placerDame(i);
for (Point point : positions) {
System.out.println("(" + point.x + "; " + point.y + ")");
}
}
/**
* Place une dame et return true si la position est bonne.
* @param i le numéro de la dame.
* @return si la position est bonne.
*/
private static boolean placerDame(int i) {
boolean bonnePosition = false;
for (int j = 1; j <= LARGEUR_GRILLE && bonnePosition == false; j++) {
Point emplacement = new Point(i, j);
positions.add(emplacement);
if (verifierPrise(emplacement) && (i == LARGEUR_GRILLE || placerDame(i + 1))) {
bonnePosition = true;
}
else {
positions.remove(i - 1);
}
}
return bonnePosition;
}
/**
* Vérifie que la nouvelle position n'est pas en prise avec une position déjà présente.
* @param position la position de la nouvelle dame.
* @return Si la position convient par rapport aux positions des autres dames.
*/
private static boolean verifierPrise(Point position) {
boolean nonPrise = true;
for (Point point : positions) {
if (!point.equals(position)) {
// Cas où sur la même colonne.
if (position.y == point.y) {
nonPrise = false;
}
// Cas où sur même diagonale.
if (Math.abs(position.y - point.y) == Math.abs(position.x - point.x)) {
nonPrise = false;
}
}
}
return nonPrise;
}
}
And below is the code in C++:
#include <iostream>
#include <list>
#include <math.h>
#include <stdlib.h>
using namespace std;
// Class to represent points.
class Point {
private:
double xval, yval;
public:
// Constructor uses default arguments to allow calling with zero, one,
// or two values.
Point(double x = 0.0, double y = 0.0) {
xval = x;
yval = y;
}
// Extractors.
double x() { return xval; }
double y() { return yval; }
};
#define LARGEUR_GRILLE 22
list<Point> positions;
bool verifierNonPrise(Point emplacement) {
bool nonPrise = true;
for (list<Point>::iterator it = positions.begin(); it!= positions.end(); it++) {
if (it->x() != emplacement.x()) {
if (it->y() == emplacement.y()) {
nonPrise = false;
}
if (abs(it->y() - emplacement.y()) == abs(it->x() - emplacement.x())) {
nonPrise = false;
}
}
}
return nonPrise;
}
bool placerDame(int i) {
bool bonnePosition = false;
for (int j = 1; j <= LARGEUR_GRILLE && !bonnePosition; j++) {
Point emplacement(i,j);
positions.push_back(emplacement);
if (verifierNonPrise(emplacement) && (i == LARGEUR_GRILLE || placerDame(i + 1))) {
bonnePosition = true;
}
else {
positions.pop_back();
}
}
return bonnePosition;
}
int main()
{
int i = 1;
placerDame(i);
for (list<Point>::iterator it = positions.begin(); it!= positions.end(); it++) {
cout << "(" << it->x() << "; " << it->y() << ")" << endl;
}
return 0;
}
Java Solutions
Solution 1 - Java
std::list
in C++ is a linked list, whereas java.util.ArrayList
is an array. Try replacing std::list
by std::vector
. Also, be sure to compile with optimization turned on.
Solution 2 - Java
##Updates: ###Changes to C++
- As written:
Compilation Failure - Replace math.h => cmath
27610 milliseconds - Add -O3 flag
4416 milliseconds - Replace std::list with std::vector
2294 milliseconds - Replace Point with std::pair
2384 milliseconds - Made verifierNonPrise const correct
2351 milliseconds - Replaced loop in verifierNonPrise with
std::find_if
929 milliseconds - Replacing double with int (to make it equiv to Java)
549 milliseconds
###Changes to Java
- As written
3459 milliseconds - Changes
verifierNonPrise
early return
368 milliseconds
##Java Vs C++
> javac HuitDames.java
> time java HuitDames
real 0m0.368s
user 0m0.436s
sys 0m0.042s
> g++ -O3 -std=c++11 HuitDames.cpp
> time ./a.out
real 0m0.541s
user 0m0.539s
sys 0m0.002s
##Final Code:
#include <iostream>
#include <vector>
#include <cmath>
#include <stdlib.h>
#include <chrono>
#include <algorithm>
using namespace std;
typedef std::pair<int, int> Point;
#define LARGEUR_GRILLE 22
vector<Point> positions;
bool verifierNonPrise(Point const& emplacement) {
return std::find_if(positions.begin(), positions.end(), [&emplacement](Point const& val){
if (val.first != emplacement.first) {
if ((val.second == emplacement.second) || (abs(val.second - emplacement.second) == abs(val.first - emplacement.first))) {
return true;
}
}
return false;
}) == positions.end();
}
bool placerDame(int i) {
bool bonnePosition = false;
for (int j = 1; j <= LARGEUR_GRILLE && !bonnePosition; j++) {
Point emplacement(i,j);
positions.push_back(emplacement);
if (verifierNonPrise(emplacement) && (i == LARGEUR_GRILLE || placerDame(i + 1))) {
bonnePosition = true;
}
else {
positions.pop_back();
}
}
return bonnePosition;
}
int main()
{
using std::chrono::system_clock;
system_clock::time_point begin_time = system_clock::now();
int i = 1;
placerDame(i);
for (vector<Point>::iterator it = positions.begin(); it!= positions.end(); it++) {
cout << "(" << it->first << "; " << it->second << ")" << endl;
}
system_clock::time_point end_time = system_clock::now();
long long elapsed_milliseconds = std::chrono::duration_cast<std::chrono::milliseconds>(end_time - begin_time).count();
cout << "Duration (milliseconds): "
<< elapsed_milliseconds
<< std::endl;
}
Solution 3 - Java
Test this version, updated using C++11 features. Tested in GCC 4.9.0 with -std=c++11
. Tested on Celeron 1.6 GHz, 512 MB RAM.
Times in my PC:
Original: Duration (milliseconds): 12658
First Version: Duration (milliseconds): 3616
Optimized Version: Duration (milliseconds): 1745
Changes are:
- Using
vector
instead oflist
Benchmark, and Words from Stroustrup. - Using const whatever we can, the compiler is able to optimize much more if it known that the value don't change.
- Using std::pair instead of Point.
- Using new for-loop with constant iterators.
Source:
#include <iostream>
#include <vector>
#include <chrono>
#include <iomanip>
using namespace std;
typedef std::pair<int, int> Point;
#define LARGEUR_GRILLE 22
vector<Point> positions;
bool verifierNonPrise(const Point& emplacement) {
bool nonPrise = true;
for (const auto& p : positions) {
if (p.first != emplacement.first) {
if (p.second == emplacement.second) {
nonPrise = false;
}
if (abs(p.second - emplacement.second) ==
abs(p.first - emplacement.first)) {
nonPrise = false;
}
}
}
return nonPrise;
}
bool placerDame(int i) {
bool bonnePosition = false;
for (int j = 1; j <= LARGEUR_GRILLE && !bonnePosition; j++) {
Point emplacement(i, j);
positions.emplace_back(emplacement);
if (verifierNonPrise(emplacement) &&
(i == LARGEUR_GRILLE || placerDame(i + 1))) {
bonnePosition = true;
} else {
positions.pop_back();
}
}
return bonnePosition;
}
int main(int argc, char* argv[]) {
std::chrono::system_clock::time_point begin_time =
std::chrono::system_clock::now();
positions.reserve(LARGEUR_GRILLE);
placerDame(1);
for (const auto& p : positions) {
cout << "(" << p.first << "; " << p.second << ")" << endl;
}
std::chrono::system_clock::time_point end_time =
std::chrono::system_clock::now();
long long elapsed_milliseconds =
std::chrono::duration_cast<std::chrono::milliseconds>(
end_time - begin_time).count();
std::cout << "Duration (milliseconds): " << elapsed_milliseconds
<< std::endl;
return 0;
}
Some more deep changes.
Changes include:
- Returning as early as possible. As soon as the queen can not be placed.
- Returning to a simpler Point class.
- Using find_if algorithm for searching queen placement.
Source (some recommendation updated):
#include <algorithm>
#include <iostream>
#include <vector>
#include <chrono>
#include <iomanip>
using namespace std;
struct Point {
int x, y;
};
#define LARGEUR_GRILLE 22
vector<Point> positions;
bool verifierNonPrise(const Point& emplacement) {
return find_if(positions.cbegin(), positions.cend(), [&emplacement](const Point& p) {
return (p.x != emplacement.x &&
(p.y == emplacement.y ||
abs(p.y - emplacement.y) == abs(p.x - emplacement.x)));
}) == positions.cend();
}
bool placerDame(int i) {
for (int j = 1; j <= LARGEUR_GRILLE; j++) {
Point emplacement{i, j};
positions.push_back(emplacement);
if (verifierNonPrise(emplacement) &&
(i == LARGEUR_GRILLE || placerDame(i + 1))) {
return true;
} else {
positions.pop_back();
}
}
return false;
}
int main(int argc, char* argv[]) {
std::chrono::system_clock::time_point begin_time =
std::chrono::system_clock::now();
positions.reserve(LARGEUR_GRILLE);
placerDame(1);
for (const auto& p : positions) {
cout << "(" << p.x << "; " << p.y << ")" << endl;
}
std::chrono::system_clock::time_point end_time =
std::chrono::system_clock::now();
long long elapsed_milliseconds =
std::chrono::duration_cast<std::chrono::milliseconds>(
end_time - begin_time).count();
std::cout << "Duration (milliseconds): " << elapsed_milliseconds
<< std::endl;
return 0;
}
Solution 4 - Java
Comparing a managed, dynamically compiled language like Java to a statically compiled language like C++ is very difficult.
You will always be comparing apples to oranges in a sense, because they are conceptually very different. It starts with the use of the standard libraries (ArrayList vs std::list/vector) that will have potentially wildly different performance characteristics, even your code looks similar in both languages.
Then there is the inherent problem with microbenchmarks in Java (short test in Java are always slower because the JIT will observe program flow before it decides what and how it is to be compiled). Same goes for compiler options for C++, even the structure of the source code (independently compiled and linked classes versus single file source) can make a significant difference (because it changes the amount of "insight" the C++ compiler has into the other classes).
Next is the general difference in memory management, garbage collection vs manual memory management (smart pointers etc. are still considered manual memory management).
Not to mention the general language differences like you need to explicitly declare a method virtual in C++, while in Java every member method is virtual by default (working out if it's really virtual at runtime is left to the VM).
With all those differences there will always be cases where one langauge will have a massive advantage over the other. A simple test with very limited scope (like your test here) says very little about each language as a whole.
Another point people often tend to ignore is: How productive can you be with a language - speed isn't everything (look a how sucessful script langages are in some domains, despite being hardly competive when looking only at excution speed). Lack of performance can be crippling, but so can be low productivity.
Solution 5 - Java
I may be beating a dead horse here, but simply doing a line by line translation of the Java to C++, not even using const reference parameters or any such thing, you can see the C++ is almost twice as fast as Java. All the "syntactic optimization" emplacing etc. has little effect if any...
rep ~/Documents $ g++ -O3 Queen.cpp
rep ~/Documents $ javac Queen.java
rep ~/Documents $ time java Queen
(1; 1)
(2; 3)
(3; 5)
(4; 2)
(5; 4)
(6; 10)
(7; 14)
(8; 17)
(9; 20)
(10; 13)
(11; 19)
(12; 22)
(13; 18)
(14; 8)
(15; 21)
(16; 12)
(17; 9)
(18; 6)
(19; 16)
(20; 7)
(21; 11)
(22; 15)
real 0m4.806s
user 0m4.857s
sys 0m0.067s
rep ~/Documents $ time ./a.out
(1; 1)
(2; 3)
(3; 5)
(4; 2)
(5; 4)
(6; 10)
(7; 14)
(8; 17)
(9; 20)
(10; 13)
(11; 19)
(12; 22)
(13; 18)
(14; 8)
(15; 21)
(16; 12)
(17; 9)
(18; 6)
(19; 16)
(20; 7)
(21; 11)
(22; 15)
real 0m2.131s
user 0m2.113s
sys 0m0.000s
rep ~/Documents $
Queen.java (translated to english)
import java.awt.Point;
import java.util.ArrayList;
import java.util.List;
public class Queen {
private static List<Point> positions = new ArrayList<>();
private static final int GRID_SIZE = 22;
public static void main(String[] args)
{
int i = 1;
placeQueen(i);
for (Point point : positions)
{
System.out.println("(" + point.x + "; " + point.y + ")");
}
}
private static boolean placeQueen(int i)
{
boolean bIsGoodPos = false;
for (int j = 1; j <= GRID_SIZE && bIsGoodPos == false; j++)
{
Point emplacement = new Point(i, j);
positions.add(emplacement);
if (verifyPos(emplacement) && (i == GRID_SIZE || placeQueen(i + 1)))
{
bIsGoodPos = true;
}
else
{
positions.remove(i - 1);
}
}
return bIsGoodPos;
}
private static boolean verifyPos(Point position)
{
boolean bIsSafe = true;
for (Point point : positions)
{
if (!point.equals(position))
{
if (position.y == point.y)
{
bIsSafe = false;
}
if (Math.abs(position.y - point.y) == Math.abs(position.x - point.x))
{
bIsSafe = false;
}
}
}
return bIsSafe;
}
}
Queen.cpp
#include <cmath>
#include <vector>
#include <iostream>
using namespace std;
struct Point
{
int x, y;
Point(int ii, int jj):x(ii), y(jj){}
};
vector<Point> positions;
int GRID_SIZE = 22;
bool verifyPos(Point position)
{
bool bIsSafe = true;
for(int i = 0; i < positions.size(); ++i)
{
Point point = positions[i];
if(point.x != position.x || point.y != position.y)
{
if(position.y == point.y)
{
bIsSafe = false;
}
if(abs(position.y - point.y) == abs(position.x - point.x))
{
bIsSafe = false;
}
}
}
return bIsSafe;
}
bool placeQueen(int i)
{
bool bIsGoodPos = false;
for(int j = 1; j <= GRID_SIZE && bIsGoodPos == false; j++)
{
Point p(i, j);
positions.push_back(p);
if(verifyPos(p) && (i == GRID_SIZE || placeQueen(i + 1)))
{
bIsGoodPos = true;
}
else
{
positions.pop_back();
}
}
return bIsGoodPos;
}
int main(void)
{
int i = 1;
placeQueen(i);
for(int i = 0; i < positions.size(); ++i)
{
Point p = positions[i];
cout << "(" << p.x << "; " << p.y << ")" << endl;
}
return 0;
}
Solution 6 - Java
C++ can do it in 21 ms (on a old core i7-860) if you use bit maps. For the timing run I commented out the showSoln() call since a graphic display of the chess board takes twice as long as finding the solution.
#include <iostream>
#include <iomanip>
#include <fstream>
#include <omp.h> //omp_get_wtime() is my favorite time function
using namespace std;
static const unsigned n(22); //size of board
static_assert(n<32,"use long unsigned for bit masks if n > 32");
static const unsigned mask((1U<<n)-1); //n wide bitmask with every bit set
void showSoln(unsigned* selCol, unsigned numSoln) { //show a solution
cout << "\nsolution " << numSoln << '\n';
for (unsigned row=0; row<n; ++row) {
for (unsigned col=0; col<n; ++col)
cout << (col==selCol[row]? " Q": " .");
cout << '\n';
}
}
void main() {
//for each row bitmasks that show what columns are attacked, 1 bit means attacked
unsigned ulAttack[n]; //cols attacked from upper left, shift right for next row
unsigned upAttack[n]; //cols attacked from straight up, same for next row
unsigned urAttack[n]; //cols attacked from upper right, shift left for next row
unsigned allAttack[n]; //OR of all attacks on given row
allAttack[0]= ulAttack[0]= upAttack[0]= urAttack[0]= 0; //no attacks on row 0
unsigned row= 0; //the row where now placing a queen
unsigned selCol[n]; //for each row the selected column
unsigned numSoln= 0; //count of soutions found
double wtime= omp_get_wtime();
for (;;) { //loop until find 1st (or all) solutions
if (allAttack[row]!=mask) { //if 'row' has a column not attacked
unsigned long bit;
_BitScanForward(&bit,~allAttack[row]); //find lowest column not attacked
//note - your compiler may have a different intrinsic for find lowest set bit
selCol[row]= bit; //remember selected column for this row
unsigned move= 1U<<bit; //convert selected column to bitmask
allAttack[row]|= move; //mark column attacked to prevent re-use
if (row==n-1) { //if move in last row have a soln
++numSoln;
showSoln(selCol,numSoln);
break; //remove this break if want all solutions
} else { //no solution yet, fill in rows below
unsigned nrow= row+1; //next row
//from attacks on this row plus 'move' decide attacks on row below
ulAttack[nrow]= (ulAttack[row] | move) >> 1;
upAttack[nrow]= (upAttack[row] | move);
urAttack[nrow]= ((urAttack[row] | move) << 1) & mask;
allAttack[nrow]= ulAttack[nrow] | upAttack[nrow] | urAttack[nrow];
row= nrow; //go to next row
}
} else { //else move on 'row' is impossible so backtrack
if (!row) //if backtrack from row 0 have found all solutions
break;
--row; //try next move in prior row
}
}
wtime= omp_get_wtime() - wtime;
cout << "numSoln= " << numSoln << '\n';
cout << "time= " << wtime*1000 << " msec\n";
}
Solution 7 - Java
Also, there is no reason to use float/doouble types for the coordinates.
You should gain performance if you do not force calling floating point abs library call in your C++
Java stores the Point coordinates as integer. The get functions return double, however this is probably easier to optimize away in Java, then in c++.
Solution 8 - Java
It seems that for code that requires not too much memory access and intensive computing the difference between Java an C++ is low. But in the opposite situation the difference looks amazing :
test.java
import java.lang.String;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Scanner;
public class test{
private static Random gna=new Random();
public static double new_value(double value){
if (value<0.5) value=0;
return value*value;
}
public static void main(String[] args) {
long start_time = System.currentTimeMillis();
List<Double> ze_list=new ArrayList();
for (int i=0;i<1e8;i++){
double temp=new_value(gna.nextDouble());
ze_list.add(temp);
}
long end_time = System.currentTimeMillis();
System.out.println("Time (s) :"+ ((end_time-start_time)/1000));
Scanner input = new Scanner(System.in);
String inputval = input.next();
}
}
and compare it to test.cpp:
#include <iostream>
#include <vector>
#include <ctime>
#include <random>
using namespace std;
static default_random_engine dre1(time(0));
static uniform_real_distribution <double> urd1(0, 1);
static double new_value(double value){
if (value<0.5) value=0;
return value*value;
}
int main(void){
time_t tbegin,tend;
double texec=0;
tbegin=time(NULL);
vector<double> ze_list;
for (int i=0;i<1e8;i++){
double temp=new_value(urd1(dre1));
ze_list.push_back(temp);
}
tend=time(NULL);
texec=difftime(tend,tbegin);
cout << "\nTime (s) " << texec << " s\n";
int val=0;
cin >> val;
return 0;
}
I just tested it on my Mac :
- the Java version took 90s and required 3.77 Go
- the C++ programm took 2s and required only 770 Mo
Maybe there is a possibility to increase Java performances but I cannot see how.
Solution 9 - Java
Java passes objects to methods as references and those references are passed by value, but C++ passes objects by value.
You should change C++ code to make it same as Java (Pass pointers in C++ intstead of passing objects):
bool verifierNonPrise(Point* emplacement) // The correct way for passing arguments.