I have some sentences like this one.

c = "In Acid-base reaction (page[4]), why does it create water and not H+?" 

I want to remove all special characters except for '?&+-/

I know that if I want to remove all special characters, I can simply use

gsub("[[:punct:]]", "", c)
# [1] "In Acidbase reaction page4 why does it create water and not H"

However, some special characters such as + - ? are also removed, which I intend to keep.

I tried to create a string of special characters that I can use in some code like this

gsub("[special_string]", "", c)

The best I can do is to come up with this

cat("!\"#$%()*,.:;<=>@[\\]^_`{|}~.")

However, the following code just won't work

gsub("[cat("!\"#$%()*,.:;<=>@[\\]^_`{|}~.")]", "", c)

What should I do to remove special characters, except for a few that I want to keep?

Thanks


Best Answer


gsub("[^[:alnum:][:blank:]+?&/\\-]", "", c)
# [1] "In Acid-base reaction page4 why does it create water and not H+?"
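The negated class above reads as "delete everything that is not a letter, a digit, a blank, or one of the listed characters". Here is a minimal sketch of the same idea that also keeps the apostrophe from the asker's list. The names `txt` and `keep_class` and the second test string are made up for illustration; they are not part of the original answer.

```r
# The sentence from the question (named txt here to avoid shadowing base::c).
txt <- "In Acid-base reaction (page[4]), why does it create water and not H+?"

# Assumed variant of the pattern: keep letters, digits, blanks and ' ? & + / -
# The "-" is written last inside the brackets so it is taken literally.
keep_class <- "[^[:alnum:][:blank:]'?&+/-]"

cleaned <- gsub(keep_class, "", txt)
print(cleaned)
# [1] "In Acid-base reaction page4 why does it create water and not H+?"

# A made-up string that exercises every kept character.
demo <- gsub(keep_class, "", "Tom's (A&B) 1+1=2? yes/no - ok!")
print(demo)
# [1] "Tom's A&B 1+12? yes/no - ok"
```

Listing what to keep is usually shorter than listing what to drop, since the keep list here is only a handful of characters.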

In order to get your method to work, you need to put the literal "]" immediately after the leading "[":

gsub("[][!#$%()*,.:;<=>@^_`|~.{}]", "", c)
# [1] "In Acid-base reaction page4 why does it create water and not H+?"

You can then put the inner "[" anywhere. If you needed to remove the minus sign as well, it would have to go last in the class. See the ?regex help page, in the text that follows the list of predefined character classes.
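The placement rules can be sketched as a small helper that builds the bracket expression from a vector of characters to remove. Note that `cat()` only prints and returns NULL, which is why it cannot be spliced into a pattern; `paste0()` returns the string. `make_class` is a hypothetical name, not a base R function. As an extra precaution it places "[" late, so it is never followed by ".", ":" or "=", and it keeps "^" out of first position.

```r
# Hypothetical helper: build a bracket expression matching any of `chars`.
make_class <- function(chars) {
  chars  <- unique(chars)
  first  <- intersect("]", chars)   # "]" must come right after the opening "["
  open   <- intersect("[", chars)   # placed late: POSIX treats "[.", "[:", "[=" specially
  caret  <- intersect("^", chars)   # a leading "^" would negate the class
  last   <- intersect("-", chars)   # "-" is literal only in first or last position
  middle <- setdiff(chars, c("]", "[", "^", "-"))
  body   <- paste0(c(first, middle, open, caret, last), collapse = "")
  # Not handled in this sketch: an empty set, or one where "^" would end up first.
  stopifnot(nchar(body) > 0, substr(body, 1, 1) != "^")
  paste0("[", body, "]")
}

# Characters to remove, split from a plain string (no backslash, to keep it simple).
to_remove <- strsplit("!#$%()*,.:;<=>@[]^_`{|}~", "")[[1]]
pattern   <- make_class(to_remove)

txt    <- "In Acid-base reaction (page[4]), why does it create water and not H+?"
result <- gsub(pattern, "", txt)
print(result)
# [1] "In Acid-base reaction page4 why does it create water and not H+?"
```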

I think you're after a regex solution. I'll give you a messy solution and an add-on package solution (shameless self-promotion).

There's likely a better regex:

x <- "In Acid-base reaction (page[4]), why does it create water and not H+?"
keeps <- c("+", "-", "?")

## Regex solution
gsub(paste0(".*?($|'|", paste(paste0("\\", keeps), collapse = "|"), "|[^[:punct:]]).*?"), "\\1", x)

## qdap: add-on package solution
library(qdap)
strip(x, keeps, lower = FALSE)
## [1] "In Acid-base reaction page why does it create water and not H+?"