I wrote a Flask app that takes a fish's measurements (lengths and widths) from the user and predicts its species, but as soon as I submit the values I get this warning:

UserWarning: X does not have valid feature names, but LogisticRegression was fitted with feature names

Training script:

    import pickle

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    df = pd.read_csv('Fish.csv')
    df.head()

    X = df.drop('Species', axis=1)
    y = df['Species']
    cols = X.columns
    index = X.index

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    random = RandomForestClassifier()
    random.fit(X_train, y_train)
    y_pred = random.predict(X_test)
    score = accuracy_score(y_test, y_pred)

    # Assumed step (missing from the snippet): the warning names
    # LogisticRegression, so logistic_model was presumably fitted like this.
    logistic_model = LogisticRegression()
    logistic_model.fit(X_train, y_train)

    # Create a Pickle file
    with open("model.pkl", "wb") as pickle_out:
        pickle.dump(logistic_model, pickle_out)

    logistic_model.predict([[242.0, 23.2, 25.4, 30.0, 11.5200, 4.0200]])

Flask app:

    import pickle

    import numpy as np
    import pandas as pd
    from flask import Flask, request, jsonify, render_template

    app = Flask(__name__)

    # Load the model saved by the training script
    with open("model.pkl", "rb") as pickle_in:
        random = pickle.load(pickle_in)

    @app.route('/')
    def home():
        return render_template('index.html')

    @app.route('/predict', methods=["POST"])
    def predict():
        """For rendering results on HTML GUI"""
        # Form values arrive as strings, so convert them to floats
        int_features = [float(x) for x in request.form.values()]
        final_features = [np.array(int_features)]
        prediction = random.predict(final_features)
        return render_template('index.html', prediction_text='The fish belongs to species {}'.format(str(prediction)))

    if __name__ == '__main__':
        app.run()

Data set: https://www.kaggle.com/datasets/aungpyaeap/fish-market


Best Answer


I faced the same warning: UserWarning: X does not have valid feature names, but LogisticRegression was fitted with feature names.

The warning means that the model was fitted (in model.fit()) on a DataFrame, X_train, whose columns have names, but the data you pass to predict(), such as a list or a NumPy array reshaped into a row vector, carries no feature names. To avoid it, give the rows you want to predict on the same feature names that were used for training.


I hope this helps beginners who are making predictions on unseen data with a fitted model.

Your X is a pandas DataFrame and your y is a pandas Series. Before fitting the random forest classifier, convert them to NumPy arrays:

    X = X.values
    y = y.values

After this, do the train/test split:

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

Now fit the model (the code is the same as yours):

    from sklearn.ensemble import RandomForestClassifier
    random = RandomForestClassifier()
    random.fit(X_train, y_train)
    y_pred = random.predict(X_test)
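A small sketch of why this removes the warning, using made-up data in place of Fish.csv (the column names and values are illustrative assumptions): a model fitted on NumPy arrays stores no feature names, so there is nothing for a later array input to mismatch.

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Made-up stand-in for the fish data; names and values are illustrative.
df = pd.DataFrame({
    "Length": [10.0, 12.0, 30.0, 32.0],
    "Width": [2.0, 2.5, 6.0, 6.5],
    "Species": ["Roach", "Roach", "Pike", "Pike"],
})

# Convert to NumPy arrays before fitting, as described above.
X = df.drop("Species", axis=1).values
y = df["Species"].values

model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Fitted on arrays, the model records no feature names.
has_feature_names = hasattr(model, "feature_names_in_")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    prediction = model.predict(np.array([[11.0, 2.2]]))

name_warnings = [w for w in caught if "feature names" in str(w.message)]

print(has_feature_names)
print(len(name_warnings))
```

The trade-off is that scikit-learn can no longer check column order for you, so the values must always be supplied in the same order as the training columns.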

In the Flask app you pass the input as a NumPy array, but during training you used a pandas DataFrame; that is why the warning was raised. Once the model is trained on NumPy arrays and model.pkl is regenerated, the warning should go away.
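If you would rather keep training on a DataFrame, an alternative is to give the Flask input the same column names. The sketch below assumes a hypothetical helper, `form_to_frame`, and assumes the feature columns are those of the Kaggle fish-market CSV without `Species`; check them against your own `X.columns`.

```python
import pandas as pd


def form_to_frame(form_values, feature_names):
    """Build a one-row DataFrame from raw form strings (hypothetical helper)."""
    values = [float(v) for v in form_values]
    if len(values) != len(feature_names):
        raise ValueError(
            f"expected {len(feature_names)} values, got {len(values)}"
        )
    return pd.DataFrame([values], columns=list(feature_names))


# Assumed training columns; verify against X.columns in the training script.
feature_names = ["Weight", "Length1", "Length2", "Length3", "Height", "Width"]

# Form values arrive as strings, in the order the fields were submitted.
row = form_to_frame(["242.0", "23.2", "25.4", "30.0", "11.52", "4.02"], feature_names)
print(row)
```

Inside the `predict` view this could be called as `form_to_frame(request.form.values(), random.feature_names_in_)` and the result passed to `random.predict`. This relies on the form fields being submitted in the same order as the training columns.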