Problem
class text_generator
{
public int getWordIndex(List<string> source, Random value)
{
return value.Next(0, source.Count - 1);
}
public bool checkListLength(List<string> source)
{
return source.Count == 0;
}
public string getText(List<string> source, List<string> backup_source, Random value)
{
if (checkListLength(source))
{
source.AddRange(backup_source);
}
;
int index = getWordIndex(source, value);
string result = source[index];
source.RemoveAt(index);
return result;
}
}
Then I open a main list and an empty list.
text_generator textix = new text_generator();
List<string> hi = new List<string> { "Hi", "Howdy", "Hey", "etc" };
List<string> work_hi = new List<string>();
And… Generate. They will always be different until all elements will be used.
Random rand = new Random();
Console.WriteLine(textix.getText(work_hi, hi, rand));
My question is: While this code works fine, it seems a bit long. Is it possible to make the same with just one method? Is it possible NOT to open one more list? How can I do it?
Solution
There is lots that can be improved here. I won’t repeat the great points that others have already made. Some thoughts:
-
Why are all these methods public? Do you intend that CheckListLength is going to be called by code external to this class? Only make things public that are part of the public surface of the class.
-
Why are any of these methods instance methods? There is no instance data. They could all be static.
-
The contract of the primary method is bizarre. Summing up: you require that the user pass in two lists. If the first is empty, it becomes a copy of the second. The second list is then never touched again. Random items are chosen from the first list, and removed. Think about how many ways there are for that to go horribly wrong. What if the user passes in two completely different lists? What if they pass in two references to the same list? This is a disaster waiting to happen.
-
Consider the cost. Suppose the method is used as it is supposed to be: we pass in an empty list and a list with N items. We wish to get out the N items in a random order. Now, I will tell you this crucial fact: the cost of removing an item from a list at random is on average proportional to the number of items in the list. That’s because lists are packed. If you have a million items in a list and you remove item number 123, then item 124 has to be moved into slot 123, item 125 has to be moved into slot 124, and so on up to item 1000000 being moved into slot 999999. You are performing that operation N times, and the cost each time is proportional to N. If you work out the math you’ll find that on average, if you use this method to fully randomize a list, then you are performing about N2/2 operations. That’s nothing if N is 4, but it is a trillion if N is a million; you will wait a long time to get your list randomized.
-
Your goal of producing a random sequence without modifying the original list is a really good one. There are two far easier ways to do that, and I encourage you to try both of them. They are:
First: make a method with this signature:
public static List<T> ProduceShuffle<T>(this IList<T> original)
You can have the method take a Random
if you like as well.
the action of the method is:
- Make a copy of the original list.
- Loop from the end of the list to the beginning of the list
- Each time through the loop, choose a random number less than or equal to the current index. This is key; you must not choose a random number covering the entire size of the list, otherwise the shuffle is biased.
- Swap the element at the current index with the element at the randomly chosen index.
This is the Knuth Shuffle, also called the Fischer-Yates shuffle, and it is very efficient. Can you implement the code from my description? Can you produce an argument that this shuffle is both efficient and unbiased?
The second way to do it is much shorter but slightly less efficient.
public static List<T> ProduceShuffle<T>(this IEnumerable<T> original, Random random)
{
return original.OrderBy(x => random.NextDouble()).ToList();
}
That is: generate a random fraction associated with each element. Produce a sorted sequence — without modifying the original sequence — that is sorted by that random element. Then turn that thing into a list and return it.
I note that some people on StackOverflow will tell you to sort by a new guid rather than by a random number because guids are random. This is a bad idea. Guids are unique; that guids happen to gain their uniqueness through randomness is an implementation detail subject to change. Use guids as a source of uniqueness, not as a source of randomness.
Yes. It’s possible to write this code cleanly and succinctly – and I can imagine implementing it as a single function (in an appropriately named class) or a small class with significantly more functionality.
However I don’t believe in giving an answer. What I have done is provided notes about how one could improve this code, and at the end a different way of thinking about the problem. It’s now your job to learn from what has been written, apply it to your work, and come up with an improved solution.
Good luck!
Any
public bool checkListLength(List<string> source)
{
return source.Count == 0;
}
This method is somewhat pointless – it’s longer than the logic and not clearly named. Instead, consider using ‘source.Count == 0’ or an extension method such as ‘source.Any()’.
Params
public string getText(List<string> source, List<string> backup_source...
This is where the ‘params’ keyword can come in useful:
public string getText(Random rand, params List<string>[] sources)
You can use it as such:
...getText(rand, firstList);
...getText(rand, firstList, secondList, thirdList);
Private Field
You keep passing a Random
around. This creates more work for everyone, and doesn’t server a clearly defined purpose. The generator should be responsible for maintaining the Random
.
class text_generator
{
Random rand = new Random();
....
}
Alternatively, one could create a static Random class that handles it for the entire project.
public static class RandomManager
{
Random rand = new Random();
public static int Next(int min, int max)
{
return rand.Next(min,max);
}
....
}
Which would be used like so:
RandomManager.Next(0,10);
CamelCase
We tend to use CamelCase in C#. UpperCamelCase for Property, Method, Class and NameSpaces, and lowerCameCase for fields, parameter and variable names.
text_generator
is a class, so TextGenerator
.
getText
is a method name so GetText
.
index
is a variable, so index
is correct.
source
is a parameter, so source
is correct.
Naming
As mentioned checkListLength
is a long name that doesn’t describe what it does – maybe IsEmpty
would have been more appropriate. You’ve called this text_generator
… and yet it doesn’t generate any text. Maybe RandomDialog
might have been a better name. With this class name RandomDialog.getText(...)
might be better named RandomDialog.FromLists(..)
.
Correctness
The code as is will throw an exception if the provided lists are null or empty… you either need to handle these statements or through a more informational guard statement yourself. You should also document this through comments.
/// <summary>
/// This function....
/// </summary>
/// <param name="sources">Sources is ... and must have a valid item to retrieve or an ArgumentException will be thrown!</param>
/// <returns>What this method returns.</returns>
public string GetText(params List<string>[] sources)
{
if (sources == null || sources.Count == 0)
throw new ArgumentException("Sources cannot be null or empty");
var listWithItems = sources.FirstOrDefault(s => s != null && s.Any());
if (listWithItems == null)
throw new ArgumentException("No items could be found in sources!");
....
}
Intent
A very important thing in programming is being able to describe your intent – which is what you want to do. From your comment:
I Generate random elements from a list without repetition until all of
them are used. Then it starts anew
This is not what your code does. It does not ‘generate’ random items, it retrieves items in a random order and removes them from the incoming list. It does not start anew… it crashes. You either need to restate your intent, change how your code implements this intent, or both.
Side Effects
As @RubberDuck so elegantly puts it:
A method named GetXxx shouldn’t have side effects. It’s surprising to
call a “Get” method and have it modify the objects you send into it.
Look into something called the Principle of Least Astonishment – it’s a great way of thinking!
The principle of least astonishment (POLA) applies to user interface
and software design, from the ergonomics standpoint. It is
alternatively referred to as the law or rule of least astonishment, or
of least surprise. “If a necessary feature has a high astonishment
factor, it may be necessary to redesign the feature.”
Possible Solution
Given what I think you’re looking for, consider using a Queue<T>
. A queue is a data structure that returns to you, in order, items in a once-off manner. Since you want to prove one or more lists, and have the results ordered in a specific manner I would create a method that does this. That is it takes in collection of List<T>
s using the params keyword, and returns a queue populated according to your logic.
If you need to add some more logic – e.g. repopulate this queue once it’s been finished – I would then wrap it in a class to perform this functionality. For bonus points (and for more advanced users) look into interfaces/concepts such as IEnumerable<T>/IEnumerator<T>
to see if you can turn this into a infinite, steamed, deferred, side-effect-free, query-able enumerable 🙂
First, let’s give your methods some vertical breathing space for readability and standardize you indentation (copy paste error when you posted?)
class text_generator
{
public int getWordIndex(List<string> source, Random value)
{
return value.Next(0, source.Count - 1);
}
public bool checkListLength(List<string> source)
{
return source.Count == 0;
}
public string getText(List<string> source, List<string> backup_source, Random value)
{
if (checkListLength(source))
{
source.AddRange(backup_source);
}
int index = getWordIndex(source, value);
string result = source[index];
source.RemoveAt(index);
return result;
}
}
Much nicer to read, right? Anyway. I don’t like using “value” as a parameter name because value
is a keyword.
public int getWordIndex(List<string> source, Random value)
Since value
is the name the language gives to property setter args, I find this a bit confusing to my brain. It keeps thinking your method is a property when it’s not. Why not call it what it is? I’d go with rand
. It’s a pretty standard name and makes the internals of the method make lots more sense.
public int getWordIndex(List<string> source, Random rand)
{
return rand.Next(0, source.Count - 1);
}
As was mentioned by another poster, you can entirely do away with your checkListLength
method by replacing it with Any()
. What wasn’t mentioned was the performance difference. Count
has to iterate the entire List to return a count. Any
bails out as soon as it finds an element. On a list with a large number of items. This can be significant. It might not matter here, but get in the habit of using the right tool for the job now, and it’ll save you a headache when you need to do this on a list with tons and tons of elements.
if (source.Any())
{
source.AddRange(backup_source);
}
Heslacher pointed out to me in chat that I was wrong about this. What I said is only true if you’re working with a plain IEnumerable
. Since you’re working with List
, there’s no performance penalty, but I still recommend using Any()
as using Count
could have a performance penalty if you use it on a plain IEnumerable
. Also, Any()
is a better abstraction than Count == 0
. It does make the intent clearer.
Side Effect Alert!
A method named GetXxx
shouldn’t have side effects. It’s surprising to call a “Get” method and have it modify the objects you send into it.
int index = getWordIndex(source, value);
string result = source[index];
source.RemoveAt(index);
If you want to remove items from the source list, I’d recommend moving it up a level of abstraction. Principle of least surprise and all.
I’m sure you legitimately need to remove it from the list, but you’re modifying the list that’s being sent in from the client code. This is the kind of thing that creates nasty nasty hard to find bugs. Because who would ever think that library’s GetFooBar
method would remove items from the list you give it?! It’s bad design and needs to be rethought.
Perhaps instead, this class could be initialized with a copy of a list to work with instead?
Naming conventions:
Please read the Capitalization Conventions by Microsoft. Class names will Always be capitalized, same goes for method names.
tekst_generator
becomsTextGenerator
getText
becomesFetText
- and so on…
The var
keyword:
From the C# Programming Guide:
The var keyword can also be useful when the specific type of the variable is tedious to type on the keyboard, or is obvious, or does not add to the readability of the code.
So lines like:
int index = getWordIndex(source, value);
would become:
var index = GetWordIndex(source, value);
You should let your class handle more than you are doing now and inside your class divide all tasks properly. I’ve rewritten it so you will understand:
public class TextGenerator
{
public TextGenerator(List<string> source)
{
Source = source;
_random = new Random();
}
private readonly Random _random;
public List<string> Source { get; private set; }
public int MaxRandom
{
get { return Source.Count; }
}
private int GetNextIndex()
{
return _random.Next(0, MaxRandom);
}
public string GetText(List<string> backupSource)
{
if (MaxRandom == 0)
{
Source.AddRange(backupSource);
}
var index = GetNextIndex();
var result = Source[index];
Source.RemoveAt(index);
return result;
}
}
Key points here:
- Constructor with a parameter where you can instantly set the source list
- The
Random
resides inside the class, no more worries for the user of the class MaxRandom
property that returns the maximum for the next random valueGetNextIndex
method to return a next valid index- The
GetText
method is generally the same, only adapted to the changes
Usage of the class:
var generator = new TextGenerator(new List<string> { "Hi", "Howdy", "Hey", "etc" });
Console.WriteLine(generator.GetText(new List<string> { "Some", "backup", "values" }));
You’re violating this naming convention:
- DO NOT use underscores, hyphens, or any other nonalphanumeric characters.:
text_generator
,work_hi
,backup_source
And also this capitalizing convention:
- Do use Pascal casing for all public member, type, and namespace names consisting of multiple words.:
text_generator
,getWordIndex
Moreover, textix
, hi
and work_hi
are meaningless names that don’t inform me what they are and what they contain. Give them proper names, e.g. instead of hi
: greetings
or salutations
.