Storing sex (gender) in database
SqlDatabase DesignSql Problem Overview
I want to store a user's gender in a database with as little (size/performance) cost as possible.
So far, 3 scenarios come to mind
- Int - aligned with Enum in code (1 = Male, 2 = Female, 3 = ...)
- char(1) - Store m, f or another single character identifier
- Bit (boolean) - is there an appropriate field name for this option?
The reason I ask is because of this answer which mentions that chars are smaller than booleans.
I should clarify that I'm using MS SQL 2008, which DOES in fact have the bit datatype.
Sql Solutions
Solution 1 - Sql
There is already an ISO standard for this; no need to invent your own scheme:
http://en.wikipedia.org/wiki/ISO_5218
Per the standard, the column should be called "Sex" and the 'closest' data type would be tinyint with a CHECK constraint or lookup table as appropriate.
Solution 2 - Sql
I'd call the column "gender".
Data Type Bytes Taken Number/Range of Values
------------------------------------------------
TinyINT 1 255 (zero to 255)
INT 4 - 2,147,483,648 to 2,147,483,647
BIT 1 (2 if 9+ columns) 2 (0 and 1)
CHAR(1) 1 26 if case insensitive, 52 otherwise
The BIT data type can be ruled out because it only supports two possible genders which is inadequate. While INT supports more than two options, it takes 4 bytes -- performance will be better with a smaller/more narrow data type.
CHAR(1)
has the edge over TinyINT - both take the same number of bytes, but CHAR provides a more narrow number of values. Using CHAR(1)
would make using "m", "f",etc natural keys, vs the use of numeric data which are referred to as surrogate/artificial keys. CHAR(1)
is also supported on any database, should there be a need to port.
Conclusion
I would use Option 2: CHAR(1).
Addendum
An index on the gender column likely would not help because there's no value in an index on a low cardinality column. Meaning, there's not enough variety in the values for the index to provide any value.
Solution 3 - Sql
In medicine there are four genders: male, female, indeterminate, and unknown. You mightn't need all four but you certainly need 1, 2, and 4. It's not appropriate to have a default value for this datatype. Even less to treat it as a Boolean with 'is' and 'isn't' states.
Solution 4 - Sql
An Int
(or TinyInt
) aligned to an Enum
field would be my methodology.
First, if you have a single bit
field in a database, the row will still use a full byte, so as far as space savings, it only pays off if you have multiple bit
fields.
Second, strings/chars have a "magic value" feel to them, regardless of how obvious they may seem at design time. Not to mention, it lets people store just about any value they would not necessarily map to anything obvious.
Third, a numeric value is much easier (and better practice) to create a lookup table for, in order to enforce referential integrity, and can correlate 1-to-1 with an enum, so there is parity in storing the value in memory within the application or in the database.
Solution 5 - Sql
Option 3 is your best bet, but not all DB engines have a "bit" type. If you don't have a bit, then TinyINT would be your best bet.
Solution 6 - Sql
I use char 'f', 'm' and 'u' because I surmise the gender from name, voice and conversation, and sometimes don't know the gender. The final determination is their opinion.
It really depends how well you know the person and whether your criteria is physical form or personal identity. A psychologist might need additional options - cross to female, cross to male, trans to female, trans to male, hermaphrodite and undecided. With 9 options, not clearly defined by a single character, I might go with Hugo's advice of tiny integer.
Solution 7 - Sql
CREATE TABLE Admission (
Rno INT PRIMARY KEY AUTO_INCREMENT,
Name VARCHAR(25) NOT NULL,
Gender ENUM('M','F'),
Boolean_Valu boolean,
Dob Date,
Fees numeric(7,2) NOT NULL
);
insert into Admission (Name,Gender,Boolean_Valu,Dob,Fees)values('Raj','M',true,'1990-07-12',50000);
insert into Admission (Name,Gender,Boolean_Valu,Dob,Fees)values('Rani','F',false,'1994-05-10',15000);
select * from admission;
Solution 8 - Sql
I would go with Option 3 but multiple NON NULLABLE bit columns instead of one. IsMale (1=Yes / 0=No) IsFemale (1=Yes / 0=No)
if requried: IsUnknownGender (1=Yes / 0=No) and so on...
This makes for easy reading of the definitions, easy extensibility, easy programmability, no possibility of using values outside the domain and no requirement of a second lookup table+FK or CHECK constraints to lock down the values.
EDIT: Correction, you do need at least one constraint to ensure the set flags are valid.