Filtering Data

Query filtering

When the stored object is a Pandas DataFrame, the .get method supports keyword-style filtering and an optional lazy evaluation mode. Filters are applied remotely, inside the database, which is typically much faster than retrieving the full DataFrame and filtering it locally.

om.datasets.get('foodf', x__gt=5)
=>
    x
 6  6
 7  7
 8  8
 9  9

The filter syntax is <column>__<operator>=<value>, where the operator is one of the following:

  • eq compare equal (this is also the default when using the short form, i.e. <column>=<value>)

  • gt greater than

  • gte greater or equal

  • lt less than

  • lte less or equal

  • between between two values, specify value as a 2-tuple

  • contains contains a value, specify value as a sequence

  • startswith starts with a string

  • endswith ends with a string

  • isnull is a null value, specify value as a boolean
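To make the meaning of the operators concrete, the following is a minimal local sketch that emulates the <column>__<operator>=<value> syntax on an in-memory Pandas DataFrame. It is not how omega|ml implements filtering (omega|ml applies filters inside the database); the helper name filter_frame and the sample data are hypothetical. The sketch assumes that between includes both bounds, and it leaves out contains because its exact matching rules are not specified here.

```python
import operator

import pandas as pd

# Local, pandas-only emulation of the keyword filter syntax.
# NOT omega|ml's implementation: omega|ml evaluates filters in the database.
_COMPARE = {
    "eq": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def filter_frame(df, **filters):
    """Hypothetical helper: apply <column>__<operator>=<value> filters to df."""
    mask = pd.Series(True, index=df.index)
    for key, value in filters.items():
        # Split "x__gt" into ("x", "gt"); a bare "x" yields an empty operator.
        column, _, op = key.partition("__")
        op = op or "eq"  # short form <column>=<value> means equality
        series = df[column]
        if op in _COMPARE:
            mask &= _COMPARE[op](series, value)
        elif op == "between":
            low, high = value  # value is a 2-tuple
            mask &= series.between(low, high)  # assumption: bounds inclusive
        elif op == "startswith":
            mask &= series.str.startswith(value, na=False)
        elif op == "endswith":
            mask &= series.str.endswith(value, na=False)
        elif op == "isnull":
            mask &= series.isnull() == value  # value is a boolean
        else:
            raise ValueError(f"unsupported operator: {op}")
    return df[mask]


# Hypothetical sample data, similar in shape to the 'foodf' example.
df = pd.DataFrame({"x": range(10), "name": [f"a{i}" for i in range(10)]})

result = filter_frame(df, x__gt=5)           # rows where x is 6, 7, 8, 9
combined = filter_frame(df, x__between=(2, 4), name__endswith="3")
```

Multiple keyword filters are combined with a logical AND in this sketch, so combined holds only the row where x is 3.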

In general get returns a Pandas DataFrame. See the Pandas documentation for ways to work with DataFrames.

However, unlike Pandas, omega|ml provides methods to work with data that is larger than memory. This is covered in the next section.