I am trying to get the max value from a pandas DataFrame as a whole. I am not interested in which row or column it came from; I just want the single max value within the DataFrame.

Here is my DataFrame:

import pandas as pd

df = pd.DataFrame({
    'group1': ['a','a','a','b','b','b','c','c','d','d','d','d','d'],
    'group2': ['c','c','d','d','d','e','f','f','e','d','d','d','e'],
    'value1': [1.1,2,3,4,5,6,7,8,9,1,2,3,4],
    'value2': [7.1,8,9,10,11,12,43,12,34,5,6,2,3],
})

This is what it looks like:

   group1 group2  value1  value2
0       a      c     1.1     7.1
1       a      c     2.0     8.0
2       a      d     3.0     9.0
3       b      d     4.0    10.0
4       b      d     5.0    11.0
5       b      e     6.0    12.0
6       c      f     7.0    43.0
7       c      f     8.0    12.0
8       d      e     9.0    34.0
9       d      d     1.0     5.0
10      d      d     2.0     6.0
11      d      d     3.0     2.0
12      d      e     4.0     3.0

Expected output:

43.0

I assumed that df.max() would do the job, but it returns the max of each column, which is not what I want. I need the max of the entire DataFrame.


Best Answer


The max of all the values in the DataFrame can be obtained using df.to_numpy().max(), or for pandas < 0.24.0 we use df.values.max():

In [10]: df.to_numpy().max()
Out[10]: 'f'

The max is f rather than 43.0 since, in CPython2,

In [11]: 'f' > 43.0
Out[11]: True

In CPython2, Objects of different types ... are ordered by their type names. So any str compares as greater than any int since 'str' > 'int'.

In Python3, comparison of strings and ints raises a TypeError.
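The Python 3 behaviour can be checked with a small sketch. It assumes a shortened, three-row version of the question's DataFrame (the data below is illustrative, not the full original) and simply records whether the mixed-type comparison raises:

```python
import pandas as pd

# Shortened stand-in for the question's DataFrame: string and float columns mixed.
df = pd.DataFrame({
    "group1": ["a", "b", "c"],
    "value1": [1.1, 4.0, 7.0],
    "value2": [7.1, 10.0, 43.0],
})

# to_numpy() on mixed columns yields an object array, so max() has to compare
# str with float, which Python 3 refuses to do.
try:
    df.to_numpy().max()
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

So on Python 3 the string columns have to be excluded before taking the max.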


To find the max value in the numeric columns only, use

df.select_dtypes(include=[np.number]).max()
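Note that the expression above returns one maximum per numeric column (a Series). A minimal sketch of reducing that to a single scalar, again assuming a shortened version of the question's data:

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the question's DataFrame.
df = pd.DataFrame({
    "group1": ["a", "b", "c"],
    "group2": ["c", "d", "f"],
    "value1": [1.1, 4.0, 7.0],
    "value2": [7.1, 10.0, 43.0],
})

numeric = df.select_dtypes(include=[np.number])  # keeps value1 and value2 only

per_column = numeric.max()          # Series: one max per numeric column
overall = numeric.to_numpy().max()  # single scalar over all numeric values

print(overall)  # 43.0
```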

The simplest answer is the following:

df.max(numeric_only=True).max()

Explanation:
series = df.max() gives you a Series containing the maximum value of each column.
Therefore series.max() gives you the maximum of the whole DataFrame.

numeric_only is required when strings are involved; as @unutbu's answer points out, the result for the OP's question would otherwise be f in Python 2 and a TypeError in Python 3.
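The two steps from the explanation can be sketched as follows, using a shortened version of the question's data (the three rows are illustrative):

```python
import pandas as pd

# Shortened stand-in for the question's DataFrame.
df = pd.DataFrame({
    "group1": ["a", "b", "c"],
    "value1": [1.1, 4.0, 7.0],
    "value2": [7.1, 10.0, 43.0],
})

# Step 1: column-wise max, skipping the non-numeric column.
per_column = df.max(numeric_only=True)

# Step 2: max of those per-column maxima gives the overall max.
overall = per_column.max()

print(overall)  # 43.0
```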

An alternative way:

df.melt().value.max()

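Essentially, melt() reshapes the DataFrame into long format, stacking every value into a single column named value. With mixed string and numeric columns, as in the question, that column holds both types, so in Python 3 you would typically need to select the numeric columns first.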
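A minimal sketch of the melt() approach, assuming the numeric columns are selected beforehand so that the value column holds only numbers (the data is a shortened version of the question's):

```python
import pandas as pd

# Shortened stand-in for the question's DataFrame.
df = pd.DataFrame({
    "group1": ["a", "b", "c"],
    "value1": [1.1, 4.0, 7.0],
    "value2": [7.1, 10.0, 43.0],
})

numeric = df.select_dtypes(include="number")

# melt() with no arguments produces two columns: 'variable' (the original
# column name) and 'value' (the cell contents).
long_form = numeric.melt()

overall = long_form["value"].max()
print(overall)  # 43.0
```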

Using NumPy's max:

np.max(df.values)

or

np.nanmax(df.values)

or in pandas

df.values.max()
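The variants above differ in how they treat missing values. The sketch below assumes an all-numeric DataFrame with one NaN (illustrative data, not from the question) to show the difference:

```python
import numpy as np
import pandas as pd

# All-numeric example with one missing value.
df = pd.DataFrame({
    "value1": [1.1, np.nan, 3.0],
    "value2": [7.1, 43.0, 12.0],
})

values = df.to_numpy()

plain = np.max(values)       # NaN propagates, so the result is nan
skipping = np.nanmax(values) # NaN is ignored
pandas_way = df.max().max()  # pandas skips NaN by default (skipna=True)

print(plain, skipping, pandas_way)
```

If the data may contain NaN, np.nanmax or the pandas methods are the safer choice.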

For the overall max, check the previous answer. For the max of the value columns only, use e.g.:

val_cols = [c for c in df.columns if c.startswith('val')]
print(df[val_cols].max())

Max can be found in these two steps:

maxForCol = allData.max(axis=0)  # max of each column (axis=0 reduces over the rows)
globalMax = maxForCol.max()  # max across all columns

The df.max(numeric_only=True).max() answer is probably best for the general case, but if you need speed and your values are limited to known columns, selecting those columns first will use fewer CPU cycles. For example:

df[['value1', 'value2']].max(numeric_only=True).max()

You can drop the numeric_only if the specified columns are known to contain only numbers.