Filtering Data
Query filtering
When operating on a Pandas DataFrame, the .get method provides
keyword-style filtering and an optional lazy evaluation mode. Filters are
applied remotely inside the database, so they run much faster than
filtering the returned DataFrame.
om.datasets.get('foodf', x__gt=5)
=>
   x
6  6
7  7
8  8
9  9
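The difference between filtering inside the database and filtering the returned data can be sketched with a stand-in. The example below uses Python's built-in sqlite3 module purely for illustration; it is not omega|ml's storage backend, and the table contents (x = 0..9) are an assumption chosen to match the output above.

```python
import sqlite3

# Stand-in database: an in-memory SQLite table with one column, x = 0..9.
# This only illustrates the idea; omega|ml does not use SQLite here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foodf (x INTEGER)")
conn.executemany("INSERT INTO foodf (x) VALUES (?)", [(i,) for i in range(10)])

# Filter inside the database: only matching rows are sent to the client.
remote = conn.execute(
    "SELECT x FROM foodf WHERE x > ? ORDER BY x", (5,)
).fetchall()

# Filter after retrieval: every row is transferred, then filtered locally.
all_rows = conn.execute("SELECT x FROM foodf ORDER BY x").fetchall()
local = [row for row in all_rows if row[0] > 5]

rows_transferred_remote = len(remote)
rows_transferred_local = len(all_rows)
conn.close()
```

Both approaches select the same rows, but the in-database filter transfers only the matching rows, while the local filter has to fetch the whole table first.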
The filter syntax is <column>__<operator>=<value>, where the operator
is one of the following:
eq          compare equal (this is also the default when using the short form, i.e. <column>=<value>)
gt          greater than
gte         greater or equal
lt          less than
lte         less or equal
between     between two values, specify value as a 2-tuple
contains    contains a value, specify value as a sequence
startswith  starts with a string
endswith    ends with a string
isnull      is a null value, specify value as a boolean
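To make the keyword syntax concrete, here is a minimal pure-Python sketch of how a <column>__<operator> key could be split and evaluated against a row. The helper names (split_filter, matches, OPERATORS) are hypothetical and this is not omega|ml's implementation. It assumes that between is inclusive on both ends, and it leaves out contains because the list above does not pin down its matching rules.

```python
# Hypothetical illustration of the filter semantics, not omega|ml code.
OPERATORS = {
    "eq": lambda v, arg: v == arg,
    "gt": lambda v, arg: v is not None and v > arg,
    "gte": lambda v, arg: v is not None and v >= arg,
    "lt": lambda v, arg: v is not None and v < arg,
    "lte": lambda v, arg: v is not None and v <= arg,
    # Assumption: both bounds are inclusive.
    "between": lambda v, arg: v is not None and arg[0] <= v <= arg[1],
    "startswith": lambda v, arg: isinstance(v, str) and v.startswith(arg),
    "endswith": lambda v, arg: isinstance(v, str) and v.endswith(arg),
    "isnull": lambda v, arg: (v is None) == bool(arg),
}


def split_filter(key):
    """Split 'x__gt' into ('x', 'gt'); a bare 'x' defaults to ('x', 'eq')."""
    column, sep, op = key.rpartition("__")
    if sep and op in OPERATORS:
        return column, op
    return key, "eq"


def matches(row, **filters):
    """Return True if the row (a dict) satisfies every keyword filter."""
    for key, arg in filters.items():
        column, op = split_filter(key)
        if not OPERATORS[op](row.get(column), arg):
            return False
    return True


# Sample rows standing in for the 'foodf' dataset (assumed to hold x = 0..9).
rows = [{"x": i} for i in range(10)]
selected = [r["x"] for r in rows if matches(r, x__gt=5)]
in_range = [r["x"] for r in rows if matches(r, x__between=(2, 4))]
```

With these sample rows, x__gt=5 keeps the values 6 through 9, the same rows as in the example output above. Several keyword filters passed together are combined with a logical AND in this sketch.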
In general get returns a Pandas DataFrame. See the Pandas
documentation for ways to work with DataFrames.
However, unlike Pandas, omega|ml provides methods to work with data that is larger than memory. This is covered in the next section.