Suppose you have a dataframe df which as many columns including 'Age' which is an integer. Now let's say you want to drop all the rows with 'Age' as negative number.
df_age_negative = df[ df['Age'] < 0 ] # Step 1df = df.drop(df_age_negative.index, axis=0) # Step 2
Hope this is much simpler and helps you.
You can also pass to DataFrame.drop the label itself (instead of Series of index labels):
In[17]: dfOut[17]: a b c d eone 0.456558 -2.536432 0.216279 -1.305855 -0.121635two -1.015127 -0.445133 1.867681 2.179392 0.518801In[18]: df.drop('one')Out[18]: a b c d etwo -1.015127 -0.445133 1.867681 2.179392 0.518801
Which is equivalent to:
In[19]: df.drop(df.index[[0]])Out[19]: a b c d etwo -1.015127 -0.445133 1.867681 2.179392 0.518801
If I want to drop a row which has let's say index x
, I would do the following:
df = df[df.index != x]
If I would want to drop multiple indices (say these indices are in the list unwanted_indices
), I would do:
desired_indices = [i for i in len(df.index) if i not in unwanted_indices]desired_df = df.iloc[desired_indices]
Here is a bit specific example, I would like to show. Say you have many duplicate entries in some of your rows. If you have string entries you could easily use string methods to find all indexes to drop.
ind_drop = df[df['column_of_strings'].apply(lambda x: x.startswith('Keyword'))].index
And now to drop those rows using their indexes
new_df = df.drop(ind_drop)
Use only the Index arg to drop row:-
df.drop(index = 2, inplace = True)
For multiple rows:-
df.drop(index=[1,3], inplace = True)
In a comment to @theodros-zelleke's answer, @j-jones asked about what to do if the index is not unique. I had to deal with such a situation. What I did was to rename the duplicates in the index before I called drop()
, a la:
dropped_indexes = <determine-indexes-to-drop>df.index = rename_duplicates(df.index)df.drop(df.index[dropped_indexes], inplace=True)
where rename_duplicates()
is a function I defined that went through the elements of index and renamed the duplicates. I used the same renaming pattern as pd.read_csv()
uses on columns, i.e., "%s.%d" % (name, count)
, where name
is the name of the row and count
is how many times it has occurred previously.
Determining the index from the boolean as described above e.g.
df[df['column'].isin(values)].index
can be more memory intensive than determining the index using this method
pd.Index(np.where(df['column'].isin(values))[0])
applied like so
df.drop(pd.Index(np.where(df['column'].isin(values))[0]), inplace = True)
This method is useful when dealing with large dataframes and limited memory.
To drop rows with indices 1, 2, 4 you can use:
df[~df.index.isin([1, 2, 4])]
The tilde operator ~
negates the result of the method isin
. Another option is to drop indices:
df.loc[df.index.drop([1, 2, 4])]
Look at the following dataframe df
df
column1 column2 column30 1 11 211 2 12 222 3 13 233 4 14 244 5 15 255 6 16 266 7 17 277 8 18 288 9 19 299 10 20 30
Lets drop all the rows which has an odd number in column1
Create a list of all the elements in column1 and keep only those elements that are even numbers (the elements that you dont want to drop)
keep_elements = [x for x in df.column1 if x%2==0]
All the rows with the values [2, 4, 6, 8, 10]
in its column1 will be retained or not dropped.
df.set_index('column1',inplace = True)df.drop(df.index.difference(keep_elements),axis=0,inplace=True)df.reset_index(inplace=True)
We make the column1 as index and drop all the rows that are not required. Then we reset the index back.df
column1 column2 column30 2 12 221 4 14 242 6 16 263 8 18 284 10 20 30
As Dennis Golomazov's answer suggests, using drop
to drop rows. You can select to keep rows instead. Let's say you have a list of row indices to drop called indices_to_drop
. You can convert it to a mask as follows:
mask = np.ones(len(df), bool)mask[indices_to_drop] = False
You can use this index directly:
df_new = df.iloc[mask]
The nice thing about this method is that mask
can come from any source: it can be a condition involving many columns, or something else.
The really nice thing is, you really don't need the index of the original DataFrame at all, so it doesn't matter if the index is unique or not.
The disadvantage is of course that you can't do the drop in-place with this method.
Consider an example dataframe
df = index column10 001 102 203 30
we want to drop 2nd and 3rd index rows.
Approach 1:
df = df.drop(df.index[2,3])or df.drop(df.index[2,3],inplace=True)print(df)df = index column10 003 30#This approach removes the rows as we wanted but the index remains unordered
Approach 2
df.drop(df.index[2,3],inplace=True,ignore_index=True)print(df)df = index column10 001 30#This approach removes the rows as we wanted and resets the index.
This worked for me
# Create a list containing the index numbers you want to removeindex_list = list(range(42766, 42798))df.drop(df.index[index_list], inplace =True)df.shape
This should drop all indexes within that created range