Storing sex (gender) in database

SqlDatabase Design

Sql Problem Overview


I want to store a user's gender in a database with as little (size/performance) cost as possible.

So far, 3 scenarios come to mind

  1. Int - aligned with Enum in code (1 = Male, 2 = Female, 3 = ...)
  2. char(1) - Store m, f or another single character identifier
  3. Bit (boolean) - is there an appropriate field name for this option?

The reason I ask is because of this answer which mentions that chars are smaller than booleans.

I should clarify that I'm using MS SQL 2008, which DOES in fact have the bit datatype.

Sql Solutions


Solution 1 - Sql

There is already an ISO standard for this; no need to invent your own scheme:

http://en.wikipedia.org/wiki/ISO_5218

Per the standard, the column should be called "Sex" and the 'closest' data type would be tinyint with a CHECK constraint or lookup table as appropriate.

Solution 2 - Sql

I'd call the column "gender".

Data Type   Bytes Taken          Number/Range of Values
------------------------------------------------
TinyINT     1                    255 (zero to 255)
INT         4            -       2,147,483,648 to 2,147,483,647
BIT         1 (2 if 9+ columns)  2 (0 and 1)
CHAR(1)     1                    26 if case insensitive, 52 otherwise

The BIT data type can be ruled out because it only supports two possible genders which is inadequate. While INT supports more than two options, it takes 4 bytes -- performance will be better with a smaller/more narrow data type.

CHAR(1) has the edge over TinyINT - both take the same number of bytes, but CHAR provides a more narrow number of values. Using CHAR(1) would make using "m", "f",etc natural keys, vs the use of numeric data which are referred to as surrogate/artificial keys. CHAR(1) is also supported on any database, should there be a need to port.

Conclusion

I would use Option 2: CHAR(1).

Addendum

An index on the gender column likely would not help because there's no value in an index on a low cardinality column. Meaning, there's not enough variety in the values for the index to provide any value.

Solution 3 - Sql

In medicine there are four genders: male, female, indeterminate, and unknown. You mightn't need all four but you certainly need 1, 2, and 4. It's not appropriate to have a default value for this datatype. Even less to treat it as a Boolean with 'is' and 'isn't' states.

Solution 4 - Sql

An Int (or TinyInt) aligned to an Enum field would be my methodology.

First, if you have a single bit field in a database, the row will still use a full byte, so as far as space savings, it only pays off if you have multiple bit fields.

Second, strings/chars have a "magic value" feel to them, regardless of how obvious they may seem at design time. Not to mention, it lets people store just about any value they would not necessarily map to anything obvious.

Third, a numeric value is much easier (and better practice) to create a lookup table for, in order to enforce referential integrity, and can correlate 1-to-1 with an enum, so there is parity in storing the value in memory within the application or in the database.

Solution 5 - Sql

Option 3 is your best bet, but not all DB engines have a "bit" type. If you don't have a bit, then TinyINT would be your best bet.

Solution 6 - Sql

I use char 'f', 'm' and 'u' because I surmise the gender from name, voice and conversation, and sometimes don't know the gender. The final determination is their opinion.

It really depends how well you know the person and whether your criteria is physical form or personal identity. A psychologist might need additional options - cross to female, cross to male, trans to female, trans to male, hermaphrodite and undecided. With 9 options, not clearly defined by a single character, I might go with Hugo's advice of tiny integer.

Solution 7 - Sql

CREATE TABLE Admission (
    Rno INT PRIMARY KEY AUTO_INCREMENT,
    Name VARCHAR(25) NOT NULL,
    Gender ENUM('M','F'),
    Boolean_Valu boolean,
    Dob Date,
    Fees numeric(7,2) NOT NULL
);




insert into Admission (Name,Gender,Boolean_Valu,Dob,Fees)values('Raj','M',true,'1990-07-12',50000);
insert into Admission (Name,Gender,Boolean_Valu,Dob,Fees)values('Rani','F',false,'1994-05-10',15000);
select * from admission;

enter link description here

Solution 8 - Sql

I would go with Option 3 but multiple NON NULLABLE bit columns instead of one. IsMale (1=Yes / 0=No) IsFemale (1=Yes / 0=No)

if requried: IsUnknownGender (1=Yes / 0=No) and so on...

This makes for easy reading of the definitions, easy extensibility, easy programmability, no possibility of using values outside the domain and no requirement of a second lookup table+FK or CHECK constraints to lock down the values.

EDIT: Correction, you do need at least one constraint to ensure the set flags are valid.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionMarkoView Question on Stackoverflow
Solution 1 - SqlPondlifeView Answer on Stackoverflow
Solution 2 - SqlOMG PoniesView Answer on Stackoverflow
Solution 3 - Sqluser207421View Answer on Stackoverflow
Solution 4 - SqlHugoView Answer on Stackoverflow
Solution 5 - Sqlajacian81View Answer on Stackoverflow
Solution 6 - SqlzaracView Answer on Stackoverflow
Solution 7 - SqlMohammad AsifView Answer on Stackoverflow
Solution 8 - SqlHansLindgrenView Answer on Stackoverflow