Search in lists of lists by given index


Python Problem Overview


I have a list of two-item lists and need to search for things in it.

If the list is:

list = [['a','b'], ['a','c'], ['b','d']]

I can search for a pair easily by doing

['a','b'] in list

Now, is there a way to see if I have a pair in which a string is present in just the second position? I can do this:

for i in range(0, len(list)):
    if list[i][1] == search:
        found = 1

But is there a (better) way without the for loop? I don't need to know i or keep the loop going after it's found.

Python Solutions


Solution 1 - Python

Here's the Pythonic way to do it:

data = [['a','b'], ['a','c'], ['b','d']]
search = 'c'
any(e[1] == search for e in data)

Or... well, I'm not going to claim this is the "one true Pythonic way" to do it, because at some point it becomes a little subjective what is Pythonic and what isn't, or which method is more Pythonic than another. But using any() is definitely more typical Python style than a for loop as in, e.g., RichieHindle's answer.

Of course there is a hidden loop in the implementation of any, although it breaks out of the loop as soon as it finds a match.
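To see that early exit in action, here is a small sketch in Python 3 syntax. The helper second_items and the visited list are hypothetical names introduced only for this illustration; they record which sublists the generator actually reads.

```python
data = [['a', 'b'], ['a', 'c'], ['b', 'd']]
search = 'c'
visited = []  # hypothetical bookkeeping, only for this demonstration


def second_items(pairs):
    """Yield the second item of each pair, recording each pair that is read."""
    for pair in pairs:
        visited.append(pair)
        yield pair[1]


found = any(item == search for item in second_items(data))
print(found)    # True
print(visited)  # [['a', 'b'], ['a', 'c']]; the third pair is never read
```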


Since I was bored I made a timing script to compare performance of the different suggestions, modifying some of them as necessary to make the API the same. Now, we should bear in mind that fastest is not always best, and being fast is definitely not the same thing as being Pythonic. That being said, the results are... strange. Apparently for loops are very fast, which is not what I expected, so I'd take these with a grain of salt until it's clear why they've come out the way they do.

Anyway, when I used the list defined in the question with three sublists of two elements each, from fastest to slowest I get these results:

  1. RichieHindle's answer with the for loop, clocking in at 0.22 μs
  2. Terence Honles' first suggestion which creates a list, at 0.36 μs
  3. Pierre-Luc Bedard's answer (last code block), at 0.43 μs
  4. Essentially tied between Markus's answer and the for loop from the original question, at 0.48 μs
  5. Coady's answer using operator.itemgetter(), at 0.53 μs
  6. Close enough to count as a tie between Alex Martelli's answer with ifilter() and Anon's answer, at 0.67 μs (Alex's is consistently about half a microsecond faster)
  7. Another close-enough tie between jojo's answer, mine, Brandon E Taylor's (which is identical to mine), and Terence Honles' second suggestion using any(), all coming in at 0.81-0.82 μs
  8. And then user27221's answer using nested list comprehensions, at 0.95 μs

Obviously the actual timings are not meaningful on anyone else's hardware, but the differences between them should give some idea of how close the different methods are.

When I use a longer list, things change a bit. I started with the list in the question, with three sublists, and appended another 197 sublists, for a total of 200 sublists each of length two. Using this longer list, here are the results:

  1. RichieHindle's answer, at the same 0.22 μs as with the shorter list
  2. Coady's answer using operator.itemgetter(), again at 0.53 μs
  3. Terence Honles' first suggestion which creates a list, at 0.36 μs
  4. Another virtual tie between Alex Martelli's answer with ifilter() and Anon's answer, at 0.67 μs
  5. Again a close-enough tie between my answer, Brandon E Taylor's identical method, and Terence Honles' second suggestion using any(), all coming in at 0.81-0.82 μs

Those are the ones that keep their original timing when the list is extended. The rest, which don't, are

  1. The for loop from the original question, at 1.24 μs
  2. Terence Honles' first suggestion which creates a list, at 7.49 μs
  3. Pierre-Luc Bedard's answer (last code block), at 8.12 μs
  4. Markus's answer, at 10.27 μs
  5. jojo's answer, at 19.87 μs
  6. And finally user27221's answer using nested list comprehensions, at 60.59 μs
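The timing script itself is not shown. A comparison along these lines could be set up with the standard timeit module; the sketch below is an assumption about how such a test might look in Python 3, not the script that produced the numbers above. The function names and the filler pairs are made up for illustration, and only three of the approaches are included.

```python
import timeit

# Filler data: the three pairs from the question plus 197 made-up pairs.
data = [['a', 'b'], ['a', 'c'], ['b', 'd']] + [['x', 'y']] * 197
search = 'c'


def with_for_loop(pairs, target):
    """Explicit loop that stops at the first match."""
    for pair in pairs:
        if pair[1] == target:
            return True
    return False


def with_any(pairs, target):
    """Generator expression consumed by any()."""
    return any(pair[1] == target for pair in pairs)


def with_list(pairs, target):
    """Builds the full list of matches before testing it."""
    return len([pair for pair in pairs if pair[1] == target]) > 0


candidates = [with_for_loop, with_any, with_list]
repeats = 10000
timings = {}
for func in candidates:
    total = timeit.timeit(lambda: func(data, search), number=repeats)
    timings[func.__name__] = total / repeats * 1e6  # microseconds per call

for name, micros in sorted(timings.items(), key=lambda item: item[1]):
    print("%-15s %.2f us" % (name, micros))
```

Where the match sits in the list matters: an early match favors the methods that stop at the first hit, so a fair comparison would also try a target near the end of the list and one that is absent.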

Solution 2 - Python

You're always going to have a loop - someone might come along with a clever one-liner that hides the loop within a call to map() or similar, but it's always going to be there.

My preference would always be to have clean and simple code, unless performance is a major factor.

Here's perhaps a more Pythonic version of your code:

data = [['a','b'], ['a','c'], ['b','d']]
search = 'c'
for sublist in data:
    if sublist[1] == search:
        print "Found it!", sublist
        break
# Prints: Found it! ['a', 'c']

It breaks out of the loop as soon as it finds a match.
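If a loop-free spelling is still wanted and the matching sublist itself is needed, the built-in next() with a default value is one option. This is a sketch in Python 3 syntax, not part of the original answer; first_match is a hypothetical helper name.

```python
def first_match(pairs, target):
    """Return the first pair whose second item equals target, or None."""
    return next((pair for pair in pairs if pair[1] == target), None)


data = [['a', 'b'], ['a', 'c'], ['b', 'd']]
print(first_match(data, 'c'))  # ['a', 'c']
print(first_match(data, 'z'))  # None
```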

(You have a typo, by the way, in ['b''d'].)

Solution 3 - Python

>>> the_list =[ ['a','b'], ['a','c'], ['b','d'] ]
>>> any('c' == x[1] for x in the_list)
True

Solution 4 - Python

The above all look good, but do you want to keep the result? If so, you can use the following:

result = [element for element in data if element[1] == search]

Then a simple

len(result)

lets you know if anything was found (and now you can do things with the results).

Of course, this does not handle elements with fewer than two items, which you should check for unless you know every element has at least two. (And if the length is fixed, should you be using a tuple instead? Tuples are immutable.)
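A length check can be folded into the condition so that short sublists are skipped rather than raising IndexError. The sketch below uses made-up data containing an empty and a one-item sublist, and is written in Python 3 syntax.

```python
data = [['a', 'b'], [], ['a'], ['a', 'c']]  # made-up data with short sublists
search = 'c'

# len(element) > 1 is checked first, so element[1] is only read when it exists.
result = [element for element in data if len(element) > 1 and element[1] == search]
print(result)        # [['a', 'c']]
print(bool(result))  # True
```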

If you know all items have the same fixed length, you can also do:

any(second == search for _, second in data)

Or, for len(data[0]) == 4:

any(second == search for _, second, _, _ in data)

...and I would recommend using

for element in data:
   ...

instead of

for i in range(len(data)):
   ...

(This is for future use: iterate over the elements directly unless you need to save or use i. Also, just so you know, the 0 in range(0, n) is not required; you only need the full syntax if you are starting at a nonzero value.)
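When the index is needed as well, enumerate() gives both the position and the element without range(len(...)). A short sketch in Python 3 syntax; position is a name chosen for this example.

```python
data = [['a', 'b'], ['a', 'c'], ['b', 'd']]
search = 'c'

position = None
for i, element in enumerate(data):
    if element[1] == search:
        position = i  # remember where the match was found
        break

print(position)  # 1
```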

Solution 5 - Python

>>> my_list =[ ['a', 'b'], ['a', 'c'], ['b', 'd'] ]
>>> 'd' in (x[1] for x in my_list)
True

Editing to add:

Both David's answer using any and mine using in will end when they find a match since we're using generator expressions. Here is a test using an infinite generator to show that:

def mygen():
    ''' Infinite generator '''
    while True:
        yield 'xxx'  # Just to include a non-match in the generator
        yield 'd'

print 'd' in (x for x in mygen())     # True
print any('d' == x for x in mygen())  # True
# print 'q' in (x for x in mygen())     # Never ends if uncommented
# print any('q' == x for x in mygen())  # Never ends if uncommented

I just like simply using in instead of both == and any.

Solution 6 - Python

What about:

list =[ ['a','b'], ['a','c'], ['b','d'] ]
search = 'b'

filter(lambda x:x[1]==search,list)

This will return each list in the list of lists with the second element being equal to search.
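Note that in Python 2 filter() returns a list, while in Python 3 it returns a lazy iterator, so the result has to be wrapped in list() to see the matches. A sketch in Python 3 syntax, with the variable renamed to pairs (my choice) so the built-in list is not shadowed:

```python
pairs = [['a', 'b'], ['a', 'c'], ['b', 'd']]
search = 'b'

matches = list(filter(lambda x: x[1] == search, pairs))
print(matches)  # [['a', 'b']]

# The same selection written as a list comprehension.
same_matches = [x for x in pairs if x[1] == search]
print(same_matches == matches)  # True
```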

Solution 7 - Python

Markus has one way to avoid using the word for -- here's another, which should have much better performance for long the_lists...:

import itertools
found = any(itertools.ifilter(lambda x:x[1]=='b', the_list))
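itertools.ifilter exists only in Python 2. In Python 3 the built-in filter() is already lazy, so a rough equivalent, assuming the same three-pair the_list used in the other answers, might look like this:

```python
the_list = [['a', 'b'], ['a', 'c'], ['b', 'd']]

# filter() yields matching sublists one at a time; any() stops at the first one.
found = any(filter(lambda x: x[1] == 'b', the_list))
print(found)  # True

not_found = any(filter(lambda x: x[1] == 'z', the_list))
print(not_found)  # False
```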

Solution 8 - Python

Nothing wrong with using a gen exp, but if the goal is to inline the loop...

>>> import itertools, operator
>>> 'b' in itertools.imap(operator.itemgetter(1), the_list)
True

Should be the fastest as well.
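itertools.imap is also Python 2 only; in Python 3 the built-in map() is lazy and plays the same role. A sketch of the equivalent, again assuming the three-pair the_list (the name second is mine):

```python
import operator

the_list = [['a', 'b'], ['a', 'c'], ['b', 'd']]

# operator.itemgetter(1) builds a callable that returns item 1 of its argument.
second = operator.itemgetter(1)
print(second(['a', 'b']))            # b
print('b' in map(second, the_list))  # True
print('z' in map(second, the_list))  # False
```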

Solution 9 - Python

OK, this is an old post, but no one used a list comprehension to answer :P

list =[ ['a','b'], ['a','c'], ['b','d'] ]
Search = 'c'

# prints the first item of every pair where Search is in either position
print [x for x,y in list if x == Search or y == Search]

# prints the first item of every pair where Search is in the second position
print [x for x,y in list if y == Search]

Solution 10 - Python

>>> the_list =[ ['a','b'], ['a','c'], ['b','d'] ]
>>> "b" in zip(*the_list)[1]
True

zip() takes a bunch of lists and groups elements together by index, effectively transposing the list-of-lists matrix. The asterisk takes the contents of the_list and sends it to zip as arguments, so you're effectively passing the three lists separately, which is what zip wants. All that remains is to check if "b" (or whatever) is in the list made up of elements with the index you're interested in.
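The transposition is easier to see when printed. In Python 3 zip() returns an iterator rather than a list, so it cannot be indexed directly; the sketch below wraps it in list() first. The name columns is mine.

```python
the_list = [['a', 'b'], ['a', 'c'], ['b', 'd']]

columns = list(zip(*the_list))  # same as zip(['a', 'b'], ['a', 'c'], ['b', 'd'])
print(columns)            # [('a', 'a', 'b'), ('b', 'c', 'd')]
print("b" in columns[1])  # True
```

Keep in mind that zip() builds every column and stops at the shortest sublist, so unlike the generator-based answers it always walks the whole list before the membership test runs.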

Solution 11 - Python

I was searching for a deep find for dictionaries and didn't find one. Based on this article I was able to create the following. Thanks and Enjoy!!

def deapFind( theList, key, value ):
    # Return True as soon as one dictionary maps key to exactly value
    for x in theList:
        if( value == x[key] ):
            return True
    return False

theList = [{ "n": "aaa", "d": "bbb" }, { "n": "ccc", "d": "ddd" }]
print 'Result: ' + str (deapFind( theList, 'n', 'aaa'))

I'm using == instead of the in operator since in returns True for partial matches on strings. In other words, with in, searching for 'aa' on the n key would return True. I don't think that would be desired.
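The difference between the two operators shows up as soon as the stored values are strings. A sketch in Python 3 syntax; deep_find is a hypothetical variant that also skips dictionaries lacking the key, which the function above does not do (it would raise KeyError).

```python
def deep_find(dicts, key, value):
    """Return True if any dict maps key to exactly value."""
    missing = object()  # sentinel, so a stored None is not mistaken for "absent"
    return any(d.get(key, missing) == value for d in dicts)


the_list = [{"n": "aaa", "d": "bbb"}, {"n": "ccc", "d": "ddd"}, {"d": "eee"}]

print(deep_find(the_list, "n", "aaa"))  # True
print(deep_find(the_list, "n", "aa"))   # False, == needs an exact match
print("aa" in the_list[0]["n"])         # True, in does a substring test on strings
```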

HTH

Solution 12 - Python

I think using nested list comprehensions is the most elegant way to solve this, because the intermediate result is the position where the element is. An implementation would be:

list =[ ['a','b'], ['a','c'], ['b','d'] ]
search = 'c'
any([ (list.index(x),x.index(y)) for x in list for y in x if y == search ] )
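One caveat: list.index() returns the first position of an equal item, so repeated sublists or repeated items would all report the same coordinates, and each call rescans the list. A variation using enumerate() avoids both. This is a sketch in Python 3 syntax with made-up data containing a duplicate; pairs and positions are names chosen here.

```python
pairs = [['a', 'c'], ['a', 'b'], ['a', 'c']]  # made-up data with a repeated sublist
search = 'c'

positions = [
    (row, col)
    for row, sub in enumerate(pairs)
    for col, item in enumerate(sub)
    if item == search
]
print(positions)        # [(0, 1), (2, 1)]
print(bool(positions))  # True
```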

Solution 13 - Python

Given below is a simple way to find exactly where in the list the item is.

for i in range(0, len(a)):
    sublist = a[i]
    # use a separate inner index so the outer index i is not overwritten
    for j in range(0, len(sublist)):
        if search == sublist[j]:
            print "found in sublist " + "a" + str(i)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type           Original Author          Original Content on Stackoverflow
Question               greye                    View Question on Stackoverflow
Solution 1 - Python    David Z                  View Answer on Stackoverflow
Solution 2 - Python    RichieHindle             View Answer on Stackoverflow
Solution 3 - Python    Brandon E Taylor         View Answer on Stackoverflow
Solution 4 - Python    Terence Honles           View Answer on Stackoverflow
Solution 5 - Python    Anon                     View Answer on Stackoverflow
Solution 6 - Python    jojo                     View Answer on Stackoverflow
Solution 7 - Python    Alex Martelli            View Answer on Stackoverflow
Solution 8 - Python    A. Coady                 View Answer on Stackoverflow
Solution 9 - Python    Pierre-Luc Bedard        View Answer on Stackoverflow
Solution 10 - Python   Markus                   View Answer on Stackoverflow
Solution 11 - Python   Keith                    View Answer on Stackoverflow
Solution 12 - Python   user27221                View Answer on Stackoverflow
Solution 13 - Python   Diliup Gabadamudalige    View Answer on Stackoverflow