C job interview - casting and comparing

CCastingCharInt

C Problem Overview


I was confronted with a tricky (IMO) question. I needed to compare two MAC addresses, in the most efficient manner.

The only thought that crossed my mind in that moment was the trivial solution - a for loop, and comparing locations, and so I did, but the interviewer was aiming to casting.

The MAC definition:

typedef struct macA {
   char data[6];
} MAC;

And the function is (the one I was asked to implement):

int isEqual(MAC* addr1, MAC* addr2)
{
    int i;

    for(i = 0; i<6; i++)
    {
        if(addr1->data[i] != addr2->data[i]) 
            return 0;
    }
    return 1;
}

But as mentioned, he was aiming for casting.

Meaning, to somehow cast the MAC address given to an int, compare both of the addresses, and return.

But when casting, int int_addr1 = (int)addr1;, only four bytes will be casted, right? Should I check the remaining ones? Meaning locations 4 and 5?

Both char and int are integer types so casting is legal, but what happens in the described situation?

C Solutions


Solution 1 - C

If he is really dissatisfied with this approach (which is essentially a brain fart, since you aren't comparing megabytes or gigabytes of data, so one shan't really be worrying about "efficiency" and "speed" in this case), just tell him that you trust the quality and speed of the standard library:

int isEqual(MAC* addr1, MAC* addr2)
{
    return memcmp(&addr1->data, &addr2->data, sizeof(addr1->data)) == 0;
}

Solution 2 - C

If your interviewer demands that you produce undefined behavior, I would probably look for a job elsewhere.

The correct initial approach would be to store the MAC address in something like a uint64_t, at least in-memory. Then comparisons would be trivial, and implementable efficiently.

Solution 3 - C

Cowboy time:

typedef struct macA {
   char data[6];
} MAC;

typedef struct sometimes_works {
   long some;
   short more;
} cast_it

typedef union cowboy
{
    MAC real;
    cast_it hack;
} cowboy_cast;

int isEqual(MAC* addr1, MAC* addr2)
{
     assert(sizeof(MAC) == sizeof(cowboy_cast));  // Can't be bigger
     assert(sizeof(MAC) == sizeof(cast_it));      // Can't be smaller

     if ( ( ((cowboy_cast *)addr1)->hack.some == ((cowboy_cast *)addr2)->hack.some )
       && ( ((cowboy_cast *)addr1)->hack.more == ((cowboy_cast *)addr2)->hack.more ) )
             return (0 == 0);

     return (0 == 42);
}

Solution 4 - C

There is nothing wrong with an efficient implementation, for all you know this has been determined to be hot code that is called many many times. And in any case, its okay for interview questions to have odd constraints.

Logical AND is a priori a branching instruction due to short-circuit evaluation even if it doesn't compile this way, so lets avoid it, we don't need it. Nor do we need to convert our return value to a true bool (true or false, not 0 or anything that's not zero).

Here is a fast solution on 32-bit: XOR will capture the differences, OR will record difference in both parts, and NOT will negate the condition into EQUALS, not UNEQUAL. The LHS and RHS are independent computations, so a superscalar processor can do this in parallel.

int isEqual(MAC* addr1, MAC* addr2)
{
    return ~((*(int*)addr2 ^ *(int*)addr1) | (int)(((short*)addr2)[2] ^ ((short*)addr1)[2]));
}

EDIT
The purpose of the above code was to show that this could be done efficiently without branching. Comments have pointed out this C++ classifies this as undefined behavior. While true, VS handles this fine. Without changing the interviewer's struct definition and function signature, in order to avoid undefined behavior an extra copy must be made. So the non-undefined behavior way without branching but with an extra copy would be as follows:

int isEqual(MAC* addr1, MAC* addr2)
{
    struct IntShort
    {
        int   i;
        short s;
    };

    union MACU
    {
        MAC addr;
        IntShort is;
    };

    MACU u1;
    MACU u2;
    
    u1.addr = *addr1; // extra copy
    u2.addr = *addr2; // extra copy

    return ~((u1.is.i ^ u2.is.i) | (int)(u1.is.s ^ u2.is.s)); // still no branching
}

Solution 5 - C

This would work on most systems,and be faster than your solution.

int isEqual(MAC* addr1, MAC* addr2)
{
    return ((int32*)addr1)[0] == ((int32*)addr2)[0] &&  ((int16*)addr1)[2] == ((int16*)addr2)[2];
}

would inline nicely too, could be handy at the center of loop on a system where you can check the details are viable.

Solution 6 - C

Non-portable casting solution.

In a platform I use (PIC24 based), there is a type int48, so making a safe assumption char is 8 bits and the usual alignment requirements:

int isEqual(MAC* addr1, MAC* addr2) {
  return *((int48_t*) &addr1->data) == *((int48_t*) &addr2->data);
}

Of course, this is not usable on many platforms, but then so are a number of solutions that are not portable either, depending on assumed int size, no padding, etc.

The highest portable solution (and reasonably fast given a good compiler) is the memcmp() offered by @H2CO3.

Going to a higher design level and using a wide enough integer type like uint64_t instead of struct macA, as suggested by Kerrek SB, is very appealing.

Solution 7 - C

To do type punning correctly you have to use an union. Otherwise you will break the rules strict aliasing which certain compilers follow, and the result will be undefined.

int EqualMac( MAC* a , MAC* b )
{
	union
	{
        MAC m ;
	    uint16_t w[3] ;

	} ua , ub ;

	ua.m = *a ;
	ub.m = *b ;

	if( ua.w[0] != ub.w[0] )  
		return 0 ;

	if( ua.w[1] != ub.w[1] )
		return 0 ;

	if( ua.w[2] != ub.w[2] )
		return 0 ;

return 1 ;
}

According to C99 it is safe to read from an union member that is not the last used to store a value in it.

>If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

Solution 8 - C

You have a MAC structure (which contains an array of 6 bytes),

typedef struct {
    char data[6];
} MAC;

Which agrees with this article about typedef for fixed length byte array.

The naive approach would be to assume the MAC address is word aligned (which is probably what the interviewer wanted), albeit not guaranteed.

typedef unsigned long u32;
typedef   signed long s32;
typedef unsigned short u16;
typedef   signed short s16;

int
MACcmp(MAC* mac1, MAC* mac2)
{
    if(!mac1 || !mac2) return(-1); //check for NULL args
    u32 m1 = *(u32*)mac1->data;
    U32 m2 = *(u32*)mac2->data;
    if( m1 != m2 ) return (s32)m1 - (s32)m2;
    u16 m3 = *(u16*)(mac1->data+4);
    u16 m2 = *(u16*)(mac2->data+4);
    return (s16)m3 - (s16)m4;
}

Slightly safer would be to interpret the char[6] as a short[3] (MAC more likely to be aligned on even byte boundaries than odd),

typedef unsigned short u16;
typedef   signed short s16;

int
MACcmp(MAC* mac1, MAC* mac2)
{
    if(!mac1 || !mac2) return(-1); //check for NULL args
    u16* p1 = (u16*)mac1->data;
    u16* p2 = (u16*)mac2->data;
    for( n=0; n<3; ++n ) {
        if( *p1 != *p2 ) return (s16)*p1 - (s16)*p2;
    }
    return(0);
}

Assume nothing, and copy to word aligned storage, but the only reason for typecasting here is to satisfy the interviewer,

typedef unsigned short u16;
typedef   signed short s16;

int
MACcmp(MAC* mac1, MAC* mac2)
{
    if(!mac1 || !mac2) return(-1); //check for NULL args
    u16 m1[3]; u16 p2[3];
    memcpy(m1,mac1->data,6);
    memcpy(m2,mac2->data,6);
    for( n=0; n<3; ++n ) {
        if( m1[n] != m2[n] ) return (s16)m1[n] - (s16)m2[n];
    }
    return(0);
}

Save yourself lots of work,

int
MACcmp(MAC* mac1, MAC* mac2)
{
    if(!mac1 || !mac2) return(-1);
    return memcmp(mac1->data,mac2->data,6);
}

Solution 9 - C

Function memcmp will eventually do the loop itself. So by using it, you would basically just make things less efficient (due to the additional function-call).

Here is an optional solution:

typedef struct
{
	int x;
	short y;
}
MacAddr;

int isEqual(MAC* addr1, MAC* addr2)
{
	return *(MacAddr*)addr1 == *(MacAddr*)addr2;
}

The compiler will most likely convert this code into two comparisons, since the MacAddr structure contains two fields.

Cavity: unless your CPU supports unaligned load/store operations, addr1 and addr2 must be aligned to 4 bytes (i.e., they must be located in addresses that are divisible by 4). Otherwise, a memory access violation will most likely occur when the function is executed.

You may divide the structure into 3 fields of 2 bytes each, or 6 fields of 1 byte each (reducing the alignment restriction to 2 or 1 respectively). But bare in mind that a single comparison in your source code is not necessarily a single comparison in the executable image (i.e., during runtime).

BTW, unaligned load/store operations by themselves may add runtime latency, if they require more "nops" in the CPU pipeline. This is really a matter of CPU architecture, which I doubt they meant to "dig into" that far in your job interview. However, in order to assert that the compiled code does not contain such operations (if indeed they are "expensive"), you could ensure that the variables are always aligned to 8 bytes AND add a #pragma (compiler directive) telling the compiler "not to worry about this".

Solution 10 - C

May be he had in mind a definition of MAC that used unsigned char and was thinking to:

int isEqual(MAC* addr1, MAC* addr2) { return strncmp((*addr1).data,(*addr2).data,6)==0; }

which implies a cast from (unsigned char *) to (char *). Anyway bad question.

Solution 11 - C

By the way, for those truly looking for a performant answer, the following is branchless, and while it does more fetches (one per char), they should all be from the same cache line, so not very expensive.

int isEqual(MAC* addr1, MAC* addr2)
{
    return
      (addr1->data[0] == addr2->data[0])
    & (addr1->data[1] == addr2->data[1])
    & (addr1->data[2] == addr2->data[2])
    & (addr1->data[3] == addr2->data[3])
    & (addr1->data[4] == addr2->data[4])
    & (addr1->data[5] == addr2->data[5])
    ;
}

See it live (and branchless) here

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionItzik984View Question on Stackoverflow
Solution 1 - Cuser529758View Answer on Stackoverflow
Solution 2 - CKerrek SBView Answer on Stackoverflow
Solution 3 - CGlenn TeitelbaumView Answer on Stackoverflow
Solution 4 - CAprioriView Answer on Stackoverflow
Solution 5 - CRichardPlunkettView Answer on Stackoverflow
Solution 6 - Cchux - Reinstate MonicaView Answer on Stackoverflow
Solution 7 - CthisView Answer on Stackoverflow
Solution 8 - CChuckCottrillView Answer on Stackoverflow
Solution 9 - Cbarak manosView Answer on Stackoverflow
Solution 10 - Cuser1708042View Answer on Stackoverflow
Solution 11 - CGlenn TeitelbaumView Answer on Stackoverflow