Understanding __getitem__ method
Python Problem Overview
I have gone through most of the documentation of __getitem__ in the Python docs, but I am still unable to grasp the meaning of it.
So all I can understand is that __getitem__ is used to implement calls like self[key]. But what is the use of it?
Let's say I have a Python class defined in this way:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __getitem__(self, key):
        print("Inside `__getitem__` method!")
        return getattr(self, key)

p = Person("Subhayan", 32)
print(p["age"])
This returns the results as expected. But why use __getitem__ in the first place? I have also heard that Python calls __getitem__ internally. But why does it do it?
Can someone please explain this in more detail?
Python Solutions
Solution 1 - Python
Cong Ma does a good job of explaining what __getitem__ is used for, but I want to give you an example which might be useful.
Imagine a class which models a building. The data for the building includes a number of attributes, including descriptions of the companies that occupy each floor.
Without using __getitem__ we would have a class like this:
class Building(object):
    def __init__(self, floors):
        self._floors = [None] * floors

    def occupy(self, floor_number, data):
        self._floors[floor_number] = data

    def get_floor_data(self, floor_number):
        return self._floors[floor_number]

building1 = Building(4)  # Construct a building with 4 floors
building1.occupy(0, 'Reception')
building1.occupy(1, 'ABC Corp')
building1.occupy(2, 'DEF Inc')
print(building1.get_floor_data(2))
We could, however, use __getitem__ (and its counterpart __setitem__) to make the usage of the Building class 'nicer'.
class Building(object):
    def __init__(self, floors):
        self._floors = [None] * floors

    def __setitem__(self, floor_number, data):
        self._floors[floor_number] = data

    def __getitem__(self, floor_number):
        return self._floors[floor_number]

building1 = Building(4)  # Construct a building with 4 floors
building1[0] = 'Reception'
building1[1] = 'ABC Corp'
building1[2] = 'DEF Inc'
print(building1[2])
Whether you use __setitem__ like this really depends on how you plan to abstract your data. In this case we have decided to treat a building as a container of floors. You could also implement an iterator for the Building, and maybe even the ability to slice (i.e. get more than one floor's data at a time); it depends on what you need.
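As a rough sketch of the iterator and slicing ideas mentioned above: because this Building stores its floors in a list, a slice object passed to __getitem__ can simply be handed on to that list. The __len__ and __iter__ methods are additions for illustration and are not part of the class shown earlier.

```python
class Building:
    def __init__(self, floors):
        self._floors = [None] * floors

    def __setitem__(self, floor_number, data):
        self._floors[floor_number] = data

    def __getitem__(self, floor_number):
        # building[1] passes an int, building[1:3] passes a slice object.
        # The underlying list understands both, so we just delegate.
        return self._floors[floor_number]

    def __len__(self):
        # Illustrative addition: lets len(building) work.
        return len(self._floors)

    def __iter__(self):
        # Illustrative addition: lets "for floor in building" work.
        return iter(self._floors)


building1 = Building(4)
building1[0] = 'Reception'
building1[1] = 'ABC Corp'
building1[2] = 'DEF Inc'

print(building1[1:3])   # data for floors 1 and 2, as a list
print(len(building1))   # number of floors
for floor in building1:
    print(floor)
```

Whether a slice should return a plain list or a new Building is a design choice; the sketch takes the simpler option.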
Solution 2 - Python
The [] syntax for getting an item by key or index is just syntactic sugar.
When you evaluate a[i], Python calls a.__getitem__(i) (or type(a).__getitem__(a, i), but this distinction is about inheritance models and is not important here). Even if the class of a does not explicitly define this method, it is usually inherited from an ancestor class.
All the (Python 2.7) special method names and their semantics are listed here: https://docs.python.org/2.7/reference/datamodel.html#special-method-names
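For the curious, here is a small sketch of the type(a).__getitem__(a, i) detail, assuming Python 3. The class names Indexable and Plain are made up for the illustration. It suggests that the [] syntax looks the method up on the class, so a function attached only to an instance is not used.

```python
class Indexable:
    def __getitem__(self, i):
        return i * 2


class Plain:
    pass


a = Indexable()
# Both spellings end up running the same method.
print(a[21])
print(type(a).__getitem__(a, 21))

b = Plain()
# Attach a __getitem__ to the instance only, not to the class.
b.__getitem__ = lambda i: i * 2
try:
    b[21]
    instance_lookup_worked = True
except TypeError:
    # The [] syntax ignored the instance attribute.
    instance_lookup_worked = False

print(instance_lookup_worked)
```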
Solution 3 - Python
The magic method __getitem__ is basically used for accessing list items, dictionary entries, array elements etc. It is very useful for a quick lookup of instance attributes.
Here I am showing this with an example class Person that can be instantiated by 'name', 'age', and 'dob' (date of birth). The __getitem__ method is written in a way that one can access the indexed instance attributes, such as first or last name, day, month or year of the dob, etc.
import copy

# Constants that can be used to index date of birth's Date-Month-Year
D = 0
M = 1
Y = -1

class Person(object):
    def __init__(self, name, age, dob):
        self.name = name
        self.age = age
        self.dob = dob

    def __getitem__(self, indx):
        print("Calling __getitem__")
        p = copy.copy(self)
        p.name = p.name.split(" ")[indx]
        p.dob = p.dob[indx]  # or, p.dob = p.dob.__getitem__(indx)
        return p
Suppose one user input is as follows:
p = Person(name = 'Jonab Gutu', age = 20, dob=(1, 1, 1999))
With the help of the __getitem__ method, the user can access the indexed attributes, e.g.:
print(p[0].name)  # print first (or last) name
print(p[Y].dob)   # print (Date or Month or) Year of the 'date of birth'
Solution 4 - Python
As you suggested, like other “dunder methods” in Python, __getitem__ enables the development of container objects with the features of native container types. For example, implementing __len__ in a class will allow one to pass instances of that class to the built-in len. In conjunction with __setitem__ and __delitem__, the method __getitem__ allows one to perform create-read-update-delete operations on a container:
x[0] = "bork" # calls __setitem__
y = x[0] # calls __getitem__
del x[0] # calls __delitem__
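To make those three lines concrete, here is a minimal sketch of a container that records which special method each statement triggers. The class name Shelf and its calls attribute are invented for the illustration.

```python
class Shelf:
    """Hypothetical container that logs which special method ran."""

    def __init__(self):
        self._items = {}
        self.calls = []

    def __setitem__(self, key, value):
        self.calls.append("__setitem__")
        self._items[key] = value

    def __getitem__(self, key):
        self.calls.append("__getitem__")
        return self._items[key]

    def __delitem__(self, key):
        self.calls.append("__delitem__")
        del self._items[key]

    def __len__(self):
        return len(self._items)


x = Shelf()
x[0] = "bork"   # calls __setitem__
y = x[0]        # calls __getitem__
del x[0]        # calls __delitem__

print(x.calls)
print(len(x))
```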
With that, a specific example that answers your question would be overriding __getitem__ to implement "lazy" dict subclasses. The aim is to avoid building, all at once, a dictionary whose key-value pairs either already exist in inordinately large numbers in other containers, or are expensive or complicated to assemble from those containers, such as when the dictionary values are resources that are distributed over the internet.
Suppose you have two lists, keys and values, whereby {k:v for k,v in zip(keys, values)} is the dictionary that you need, which must be made lazy for speed or efficiency purposes:
class LazyDict(dict):
    def __init__(self, keys, values):
        self.lazy_keys = keys
        self.lazy_values = values
        super().__init__()

    def __getitem__(self, key):
        # Only move a pair into the real dict the first time it is requested.
        if key not in self:
            try:
                i = self.lazy_keys.index(key)
                self.__setitem__(self.lazy_keys.pop(i), self.lazy_values.pop(i))
            except (ValueError, IndexError):
                raise KeyError("%s not in map" % str(key))
        return super().__getitem__(key)
This is a contrived example that makes assumptions about duplicate keys in the input. In the context of subclassing a dictionary to be lazy, always make sure to include logic for dealing with duplicate keys.
Usage:
>>> a = [1,2,3,4]
>>> b = [1,2,2,3]
>>> c = LazyDict(a,b)
>>> c[1]
1
>>> c[4]
3
>>> c[2]
2
>>> c[3]
2
>>> d = LazyDict(a,b)
>>> d.items()
dict_items([])
Solution 5 - Python
For readability and consistency. That question is part of why operator overloading exists, since __getitem__ is one of the functions that implement it.
If you get an unknown class, written by an unknown author, and you want to add its 3rd element to its 5th element, you can very well assume that obj[3] + obj[5] will work.
What would that line look like in a language that does not support operator overloading? Probably something like obj.get(3).add(obj.get(5))? Or maybe obj.index(3).plus(obj.index(5))?
The problem with the second approach is that (1) it's much less readable and (2) you can't guess, you have to look up the documentation.
Solution 6 - Python
A common library that uses this technique is the 'email' module. It uses the __getitem__ method in the email.message.Message class, which in turn is inherited by MIME-related classes.
In the end, all you need to get a valid MIME-type message with sane defaults is to add your headers. There's a lot more going on under the hood, but the usage is simple.
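from email.mime.text import MIMEText

# message_text, to, sender and subject are assumed to be defined elsewhere
message = MIMEText(message_text)
message['to'] = to  # header assignment goes through Message.__setitem__
message['from'] = sender
message['subject'] = subject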
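Reading a header back is where __getitem__ itself comes in. The following is a small self-contained sketch using placeholder addresses. Header names are matched case-insensitively, and a missing header yields None rather than raising KeyError.

```python
from email.mime.text import MIMEText

# Placeholder values, for illustration only.
message = MIMEText("Hello there")
message['to'] = 'alice@example.com'
message['from'] = 'bob@example.com'
message['subject'] = 'Greetings'

# Each lookup below goes through Message.__getitem__.
print(message['subject'])
print(message['To'])    # header names are case-insensitive
print(message['cc'])    # a header that was never set
```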
Solution 7 - Python
As a side note, the __getitem__ method also allows you to turn your object into an iterable.
Example: if used with iter(), it can generate as many int squared values as you want:
class MyIterable:
    def __getitem__(self, index):
        return index ** 2

obj = MyIterable()
obj_iter = iter(obj)
for i in range(1000):
    print(next(obj_iter))
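The iterator above never ends because __getitem__ never fails. As a sketch of the finite case, raising IndexError tells Python that the sequence is exhausted. The class name Squares and its count parameter are made up for the illustration.

```python
class Squares:
    """Hypothetical finite sequence built only on __getitem__."""

    def __init__(self, count):
        self._count = count

    def __getitem__(self, index):
        # Iteration asks for index 0, 1, 2, ... until IndexError is raised.
        if index >= self._count:
            raise IndexError(index)
        return index ** 2


squares = list(Squares(5))
print(squares)

# With no __contains__ or __iter__, "in" also falls back to __getitem__.
print(9 in Squares(5))
```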
Solution 8 - Python
Django core has several interesting and nifty usages for magic methods, including __getitem__. These were my recent finds:
- Django HTTP Request
  - When you submit GET/POST data in Django, it will be stored in Django's request object as the request.GET/request.POST dict. This dict is of type QueryDict, which inherits from MultiValueDict.
  - When you submit data, say user_id=42, the QueryDict will be stored/represented as:
        <QueryDict: {'user_id': ['42']}>
    So, the passed data becomes 'user_id': ['42'] instead of the intuitive 'user_id': '42'. MultiValueDict's docstring explains, though, why it needs to auto-convert this to list format:
        > This class exists to solve the irritating problem raised by cgi.parse_qs, which returns a list for every key..
  - Given that the QueryDict values are transformed into lists, they would then need to be accessed like this (same idea with request.GET):
    - request.POST['user_id'][0]
    - request.POST['user_id'][-1]
    - request.POST.get('user_id')[0]
    - request.POST.get('user_id')[-1]
    But these are horrible ways to access the data. So Django overrides __getitem__ and get in MultiValueDict. This is the simplified version:
        def __getitem__(self, key):
            """
            Accesses the list value automatically
            using the `-1` list index.
            """
            list_ = super().__getitem__(key)
            return list_[-1]

        def get(self, key, default=None):
            """
            Just calls the `__getitem__` above.
            """
            return self[key]
    With these, you can now use more intuitive accessors:
    - request.POST['user_id']
    - request.POST.get('user_id')
- Django Forms
  - In Django, you could declare forms like this (includes ModelForm):
        class ArticleForm(...):
            title = ...
  - These forms inherit from BaseForm, and have these overridden magic methods (simplified version):
        def __iter__(self):
            for name in self.fields:
                yield self[name]

        def __getitem__(self, name):
            return self.fields[name]
    resulting in these convenient patterns:
        # Instead of `for field in form.fields`.
        # This is a common pattern in Django templates.
        for field in form:
            ...

        # Instead of `title = form.fields['title']`
        title = form['title']
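The MultiValueDict idea can be tried without Django installed. The sketch below is a hypothetical stand-in, not Django's actual implementation. The class name LastValueDict is invented, and its getlist helper only loosely mirrors the Django method of the same name.

```python
class LastValueDict(dict):
    """Hypothetical stand-in: values are lists, lookups return the last item."""

    def __getitem__(self, key):
        values = super().__getitem__(key)   # raises KeyError if key is absent
        return values[-1]

    def get(self, key, default=None):
        # Reuse __getitem__ so both accessors behave the same way.
        try:
            return self[key]
        except KeyError:
            return default

    def getlist(self, key):
        # Escape hatch for callers that want every submitted value.
        return super().__getitem__(key)


post = LastValueDict({'user_id': ['41', '42']})
print(post['user_id'])
print(post.get('user_id'))
print(post.get('missing', 'n/a'))
print(post.getlist('user_id'))
```

Unlike the simplified get shown above, this version honours the default argument, which seemed the less surprising choice for a standalone example.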
In summary, magic methods (or their overrides) increase code readability and developer experience/convenience.