Everything here comes down to the trade-off between exploitation and exploration.
Gradient Descent (full-batch, also called Batch Gradient Descent) uses all the data to compute each weight update, which gives a more accurate gradient estimate. In neural networks, Mini-batch Gradient Descent is used instead, because computing the gradient over the entire dataset at every step is too expensive in practice. Stochastic Gradient Descent (SGD), by contrast, uses only a single example per update, which adds noise. With batch GD you exploit more of the data per update.
With SGD you can escape local minima: using a single example makes the updates noisy, and that noise favors exploration, so you can reach solutions that batch GD could not. With SGD you explore more.
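The contrast between the two update rules can be sketched with a toy linear regression in NumPy. This is only an illustration under assumed details (MSE loss, a made-up random dataset, learning rate 0.1), not a prescribed implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 samples, 3 features
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

w = np.zeros(3)
lr = 0.1

# Batch GD: one update uses the gradient averaged over ALL samples
# (accurate estimate -> exploitation).
grad_full = 2 * X.T @ (X @ w - y) / len(X)
w_batch = w - lr * grad_full

# SGD: one update uses the gradient of a SINGLE random sample
# (noisy estimate -> exploration).
i = rng.integers(len(X))
grad_one = 2 * X[i] * (X[i] @ w - y[i])
w_sgd = w - lr * grad_one
```

The single-sample gradient points in roughly the right direction on average, but any individual update can deviate a lot, and that deviation is exactly the noise that helps SGD escape local minima.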
Mini-batch Gradient Descent takes the dataset and breaks it into N chunks, where each chunk has B samples (B is the batch_size). Iterating over all the chunks still forces you to go through every sample in the dataset once per epoch.
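The chunking step can be sketched like this (a minimal example with a toy 10-sample dataset and B = 3; the last chunk is simply smaller when B does not divide the dataset size):

```python
import numpy as np

data = np.arange(10)    # toy "dataset" of 10 samples
batch_size = 3          # B

# Break the dataset into chunks of B samples each.
batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

# Every sample lands in exactly one chunk, so one pass over the
# batches is one pass (epoch) over the whole dataset.
```

In practice the dataset is usually shuffled before chunking each epoch, so the batches differ between epochs.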