Problem
I made a function to censor a certain word from a piece of text. Is there anything I should fix or do more efficiently? What are some other approaches I could have taken to detecting words and censoring them?(This took me a lot of time to brainstorm)
Also, I would appreciate it if you told me how you brainstorm how to write a program.
def detect_word_start(text,word):
count = 0
text_count = 0
for x in text:
text_count+=1
if x == word[count]:
count+=1
if count == len(word):
return text_count - len(word) + 1
else:
count = 0
def censor(text,word):
word_start = detect_word_start(text,word)
censored_text = ''
count = 0
for x in text:
count += 1
if count >= word_start and count < word_start+len(word):
censored_text += '*'
else:
censored_text += x
print(censored_text)
Solution
Bugs
The censor()
function prints its progress as a triangle of text, as censored_text
lengthens one character at a time. Did you accidentally indent the print()
statement one level too far? In any case, it would be better practice to return
its output, so that the caller can choose to print()
it instead (or do something else with it).
“Censoring” implies that all appearances of the word
should be replaced. However, your censor()
function only replaces the first occurrence, since count
is never reset to 0.
Your detect_word_start(text, word)
function is basically text.index(word)
, with some differences:
-
If the
word
is not found, it returnsNone
, whereastext.index(word)
would raise aValueError
exception. Since thecensor()
function doesn’t handle the possibility that theword
never appears in thetext
, your code could crash. (I would expect thecensor()
function to return thetext
unmodified, if there is nothing to redact.) -
It returns a one-based index, which is very unconventional in Python, where indexes start from 0. (Your
censor()
function also counts characters starting from 1, so at least you are consistent.) -
It misses some possible results because it makes one pass through the
text
, and will not backtrack. For example,detect_word_start('coconut', 'con')
anddetect_word_start('nonoperational', 'nope')
both returnNone
.That might be acceptable behaviour, if you are only searching for complete space-delimited words. However, from the way the code was written, it appears that this behaviour was unintended.
More expressive and efficient code
Counting loops are usually better written using enumerate()
.
Python supports chained comparisons:
Comparisons can be chained arbitrarily, e.g.,
x < y <= z
is equivalent tox < y and y <= z
…
Here is the same code, preserving your 1-based indexing convention and other censorship bugs:
def censor(text, word):
word_start = detect_word_start(text, word)
censored_text = ''
for i, char in enumerate(text, 1):
if word_start <= i < word_start + len(word):
censored_text += '*'
else:
censored_text += x
return censored_text
We can save a few lines using a conditional expression. (I know, it looks worse, but there’s a reason that you’ll see…)
def censor(text, word):
word_start = detect_word_start(text, word)
censored_text = ''
for i, char in enumerate(text):
censored_text += '*' if (word_start <= i < word_start + len(word)) else x
return censored_text
There is an efficiency problem with your code. Python strings are immutable. Therefore, building strings using +=
concatenation is not optimal:
Concatenating immutable sequences always results in a new object. This means that building up a sequence by repeated concatenation will have a quadratic runtime cost in the total sequence length.
A way to avoid that performance problem would be to construct the result using ''.join(…)
along with a generator expression:
def censor(text, word):
word_start = detect_word_start(text, word)
return ''.join(
'*' if word_start <= i < word_start + len(word) else x
for i, char in enumerate(text)
)
Simple solution
There is a very simple solution, using str.replace()
and the *
operation on a string:
def censor(text, word):
return text.replace(word, '*' * len(word))
Let’s start by removing print from the loop and add a return value
def censor(text,word):
#[...]
else:
censored_text += x
return censored_text
now we have a testable function. we add some tests
import unittest
class TestSome(unittest.TestCase):
def test_not_found(self):
self.assertEqual(censor("", "bar"), "")
self.assertEqual(censor("f", "bar"), "f")
self.assertEqual(censor("foo", "bar"), "foo")
self.assertEqual(censor("fooo", "bar"), "fooo")
def test_found(self):
self.assertEqual(censor("bar", "bar"), "***")
self.assertEqual(censor("bar!", "bar"), "***!")
self.assertEqual(censor("cow bar", "bar"), "cow ***")
def test_parts(self):
self.assertEqual(censor("foobar", "bar"), "foobar")
self.assertEqual(censor("bare", "bar"), "bare")
def test_capital(self):
self.assertEqual(censor("Bar", "bar"), "***")
def test_multiple(self):
self.assertEqual(censor("foo bar bar foo", "bar"), "foo *** *** foo")
if __name__ == '__main__':
unittest.main()
Running the tests show that the functions are not meeting my expectations.
detect_word_start()
does not return a value if nothing found- you miss words at the beginning of a sentence becaue of capitalization
- you censor different words containing the character sequence
- you miss multiple appearences