What is the practical difference between data.frame and data.table in R?

R, data.table

R Problem Overview


Apparently in my last question I demonstrated confusion between data.frame and data.table. Admittedly, I didn't realize there was a distinction.

I have read the help for each, but in practical, everyday terms: what is the difference, what are the implications, and what is each used for? I am looking for guidance on when each one is the appropriate choice.

R Solutions


Solution 1 - R

While this is a broad question, if someone is new to R this can be confusing and the distinction can get lost.

All data.tables are also data.frames. Loosely speaking, you can think of data.tables as data.frames with extra features.
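As a small illustration of that relationship, the sketch below checks the class of each object. It assumes the data.table package is installed; the names df and dt and their contents are made up for the example.

```r
library(data.table)

# Hypothetical example data, not taken from the original question
df <- data.frame(id = 1:3, value = c(10, 20, 30))
dt <- as.data.table(df)

class(df)  # "data.frame"
class(dt)  # "data.table" "data.frame"

# A data.table passes the data.frame check, but not the other way round
dt_is_df <- is.data.frame(dt)
df_is_dt <- is.data.table(df)

# Functions written for data.frames generally accept a data.table as well
n_rows <- nrow(dt)
```

Because the class vector of dt contains "data.frame", code that only checks for a data.frame will usually accept it.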

data.frame is part of base R.

data.table is a package that extends data.frames. Two of its most notable features are speed and cleaner syntax.

However, that syntactic sugar differs from the standard R syntax for data.frame, while being hard for the untrained eye to distinguish at a glance. If you read a code snippet with no other context indicating that it works with data.tables and then apply it to a data.frame, it may fail or produce unexpected results. (A clear giveaway that you are working with data.tables, besides the library/require call, is the presence of the assignment operator :=, which is unique to data.table.)
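To make the "looks the same, behaves differently" point concrete, here is a rough sketch comparing the same-looking expressions on both types. The objects df and dt and the column names x, y, z are assumptions for the example, not code from the original answer.

```r
library(data.table)

# Hypothetical example data
df <- data.frame(x = 1:4, y = c("a", "b", "c", "d"))
dt <- as.data.table(df)

# A single unnamed index means different things:
df_second <- df[2]   # data.frame: selects the second COLUMN (y)
dt_second <- dt[2]   # data.table: selects the second ROW

# Filtering: the data.frame needs the df$ prefix, the data.table does not
df_big <- df[df$x > 2, ]
dt_big <- dt[x > 2]

# := adds a column by reference; this form only works on a data.table
dt[, z := x * 2]
# The same expression on df would raise an error instead of adding a column
```

The first pair is the kind of silent difference that produces unexpected results, while the := line is the kind that fails outright on a data.frame.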

With all that being said, I think it is hard to truly appreciate the beauty of data.table without having experienced the shortcomings of data.frame (for example, see the first 3 bullet points of @eddi's answer). In other words, I would strongly suggest learning how to work with and manipulate data.frames first, and then moving on to data.tables.

Solution 2 - R

A few differences in my day to day life that come to mind (in no particular order):

  • not having to repeat the data.table name over and over in expressions, which with data.frames leads to clumsy syntax and silly mistakes (on the flip side, I sometimes miss the TAB-completion of names)
  • much faster and very intuitive by (grouping) operations
  • no more frantically hitting Ctrl-C after typing df and forgetting how large df was, since a large data.table prints only its first and last few rows (which also means I almost never use head)
  • faster and better file reading with fread
  • the package also provides a number of other utility functions, like %between% or rbindlist that make life better
  • faster everything else, since a lot of data.frame operations copy the entire thing needlessly
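A few of the points above can be sketched in code. This is only an illustration on a tiny, made-up table (the names dt, region, and amount are hypothetical), so it shows the syntax rather than the speed difference.

```r
library(data.table)

# Hypothetical sales data
dt <- data.table(
  region = c("north", "south", "north", "south"),
  amount = c(10, 20, 30, 40)
)

# Grouped sum: columns are referenced directly, without repeating dt$
totals <- dt[, .(total = sum(amount)), by = region]

# One possible base R equivalent on a plain data.frame, for comparison
df <- as.data.frame(dt)
totals_df <- aggregate(amount ~ region, data = df, FUN = sum)

# %between% keeps values inside an inclusive range
mid <- dt[amount %between% c(15, 35)]

# rbindlist stacks a list of tables into one
stacked <- rbindlist(list(dt, dt))

# fread normally reads a file; here it parses lines of text instead
small <- fread(text = c("a,b", "1,x", "2,y"))
```

In real use, fread would typically be given a file path, and the speed benefits only become noticeable on much larger data.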

Solution 3 - R

They are similar. A data frame is a list of vectors of equal length, while a data table (data.table) inherits from data.frame. Therefore data tables are data frames, but data frames are not necessarily data tables. The data.table package and function were written to speed up indexing, ordered joins, assignment, grouping, and listing columns (etc.).
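The "list of equal-length vectors" structure and a keyed join can be sketched as follows. The tables scores and people, and their columns, are invented for the example.

```r
library(data.table)

# Hypothetical example data
df <- data.frame(id = 1:3, score = c(90, 80, 70))

# A data.frame is a list with one element per column
df_is_list <- is.list(df)
col_lengths <- lengths(df)   # every column has the same length

# Keyed join between two data.tables
scores <- as.data.table(df)
people <- data.table(id = 1:3, name = c("Ann", "Bob", "Cy"))
setkey(scores, id)
setkey(people, id)

# For each row of scores, look up the matching row of people by key
joined <- people[scores]
```

Setting a key sorts the table by that column, which is what lets the join in the last line match rows without naming the join column again.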

See http://datatable.r-forge.r-project.org/datatable-intro.pdf for more information.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | remarkableearth | View Question on Stackoverflow
Solution 1 - R | Ricardo Saporta | View Answer on Stackoverflow
Solution 2 - R | eddi | View Answer on Stackoverflow
Solution 3 - R | Ellis Valentiner | View Answer on Stackoverflow