Loading tab-separated tweet data into an array


I’m working on a school project and was wondering whether there is a better, faster, more professional way of storing this information. I’m also restricted to using only an array; don’t ask me why, but we are not allowed to use ArrayLists yet.

My Code:

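public void loadTweets(String fileName){
    try {
        File file = new File(fileName);
        Scanner s = new Scanner(file);
        // First pass: count the lines so the array can be sized.
        while (s.hasNextLine()) {
            numberOfTweets++;
            s.nextLine();
        }
        s.close();

        tweets = new String[numberOfTweets];

        // Second pass: keep the third tab-separated field of each line.
        s = new Scanner(file);
        int counter = 0;
        while (s.hasNextLine()) {
            String[] elements = s.nextLine().split("\t");
            tweets[counter] = elements[2];
            counter++;
        }
        s.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}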

File Example:

Each field is separated by a tab and it goes, user > date posted > tweet.

USER_989b85bb 2010-03-04T15:34:46 @USER_6921e61d can I be...
USER_989b85bb 2010-03-04T15:34:47 superstar 
USER_a75657c2 2010-03-03T00:02:54 @USER_13e8a102 They reached a
USER_a75657c2 2010-03-07T21:45:48 So SunChips made a bag...
USER_ee551c6c 2010-03-07T15:40:27 drthema: Do something today that
USER_6c78461b 2010-03-03T05:13:34 @USER_a3d59856 yes, i watched...
USER_92b2293c 2010-03-04T14:00:11 RT @USER_5aac9e88: Let no 1 push u
USER_75c62ed9 2010-03-07T03:35:38 @USER_cb237f7f Congrats on...


The prohibition on ArrayList is unfortunate. One natural solution would be to use Files.readAllLines(), but that returns a List<String>, which is probably off-limits to you. Likewise, Files.lines() produces a Stream<String>, which would be even better and thus probably even more forbidden to you.

Your workaround is to open the file twice, which is definitely undesirable. (File I/O is considered “expensive”.) If I had to make a recommendation based on arrays, I would suggest

  1. Files.readAllBytes() to slurp the entire file into a byte array.
  2. Make a String from the byte array.
  3. Use String.split() to form an array of lines.
  4. For each line, retain only the third field.

My reasoning is that you eventually have to read the entire file anyway, so you might as well read it all at once, and only once. Once you have a string, you can take advantage of String.split().
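The four steps above could look roughly like the sketch below. The class name TweetSlurper, the helper extractTweets, and the choice of UTF-8 are assumptions made for illustration; the sketch also assumes every non-empty line has at least three tab-separated fields.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class TweetSlurper {
    /** Reads the whole file once and returns the third field of every line. */
    static String[] loadTweets(String fileName) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(fileName));      // step 1
        String content = new String(bytes, StandardCharsets.UTF_8);  // step 2 (UTF-8 is an assumption)
        return extractTweets(content);
    }

    /** Steps 3 and 4, kept separate so they can be tried without a file. */
    static String[] extractTweets(String content) {
        if (content.isEmpty()) {
            return new String[0];
        }
        String[] lines = content.split("\\R");  // step 3: split on any line break
        String[] tweets = new String[lines.length];
        for (int i = 0; i < lines.length; i++) {
            // A limit of 3 keeps any tab inside the tweet text in the third field.
            String[] fields = lines[i].split("\t", 3);
            tweets[i] = fields[2];              // step 4 (assumes a well-formed line)
        }
        return tweets;
    }

    public static void main(String[] args) {
        String sample = "USER_a\t2010-03-04T15:34:46\tfirst tweet\n"
                      + "USER_b\t2010-03-04T15:34:47\tsecond tweet\n";
        for (String tweet : extractTweets(sample)) {
            System.out.println(tweet);
        }
    }
}
```

Because the array of lines already has the right length, no separate counting pass is needed.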

I would also like to note that catching IOException to print a stack trace is counterproductive. If you don’t have a good way to handle an exception, just let it propagate by declaring public void loadTweets(…) throws IOException. That way, you’re letting the caller know that something went wrong — which is exactly what exceptions are meant for.

A good improvement would be to store each tweet as a Tweet object instead of using Strings only.

It could look like:

public class Tweet {
  private final String user;
  private final LocalDateTime date;
  private final String message;
  public Tweet(String user, LocalDateTime date, String message) {
    this.user = user;
    this.date = date;
    this.message = message;
  }
}

So reading a tweet would be:

String[] elements = s.nextLine().split("\t");
String user = elements[0];
LocalDateTime date = LocalDateTime.parse(elements[1]);
String message = elements[2];
Tweet t = new Tweet(user, date, message);

You also probably want to check that the tweet is properly formed:

if (elements.length != 3) {
  throw new RuntimeException("Expected 3 fields but received " + elements.length);
}

Finally, I would suggest making this a pure function instead of mutating external state:

public Tweet[] loadTweets(String fileName) {
  Tweet[] tweets = ...;

  return tweets;
}

This will make the method more reusable and make your code less complicated to follow.
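Putting the pieces of this answer together, a minimal sketch might look like the following. The names TweetParsingSketch, parseLine and parseTweets are hypothetical, the getters are added only so the fields can be read, and IllegalArgumentException (a subclass of RuntimeException) is swapped in for the plain RuntimeException used above.

```java
import java.time.LocalDateTime;

class TweetParsingSketch {
    static final class Tweet {
        private final String user;
        private final LocalDateTime date;
        private final String message;

        Tweet(String user, LocalDateTime date, String message) {
            this.user = user;
            this.date = date;
            this.message = message;
        }

        String getUser() { return user; }
        LocalDateTime getDate() { return date; }
        String getMessage() { return message; }
    }

    /** Parses one "user TAB date TAB tweet" line, rejecting malformed input. */
    static Tweet parseLine(String line) {
        // A limit of -1 keeps trailing empty fields so they are counted too.
        String[] elements = line.split("\t", -1);
        if (elements.length != 3) {
            throw new IllegalArgumentException(
                    "Expected 3 fields but received " + elements.length);
        }
        // LocalDateTime.parse accepts ISO-8601 values such as 2010-03-04T15:34:46.
        return new Tweet(elements[0], LocalDateTime.parse(elements[1]), elements[2]);
    }

    /** Pure function: the result depends only on the argument, no fields are touched. */
    static Tweet[] parseTweets(String[] lines) {
        Tweet[] tweets = new Tweet[lines.length];
        for (int i = 0; i < lines.length; i++) {
            tweets[i] = parseLine(lines[i]);
        }
        return tweets;
    }

    public static void main(String[] args) {
        Tweet t = parseLine("USER_a\t2010-03-04T15:34:46\thello");
        System.out.println(t.getUser() + " at " + t.getDate() + ": " + t.getMessage());
    }
}
```

How the lines are obtained (Scanner, Files.readAllBytes, or anything else) is deliberately left out, so the parsing can be exercised on its own.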

Building upon @200_success’s answer, I suppose you can use a Scanner on the newly-created single String from the file’s content, delimit it by either the line separator or the tab character, and then extract every third field…

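import java.util.Arrays;
import java.util.Scanner;
import java.util.StringJoiner;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/**
 * @return a String of A-X letters delimited by \t, and a newline every three letters.
 */
private static String createTestString() {
    StringJoiner joiner = new StringJoiner(System.lineSeparator());
    for (int i = 65; i < 88; i += 3) {
        joiner.add(IntStream.range(i, i + 3)
                            .mapToObj(c -> Character.valueOf((char) c).toString())
                            .collect(Collectors.joining("\t")));
    }
    return joiner.toString();
}

public static void main(String[] args) {
    String[] tweets = new String[0];
    try (Scanner scanner = new Scanner(createTestString())) {
        // Split tokens on either a line separator or a tab.
        scanner.useDelimiter(System.lineSeparator() + "|\t");
        int counter = 0;
        while (scanner.hasNext()) {
            String next = scanner.next();
            if (++counter % 3 == 0) {
                // Every third token is a tweet: grow the array by one and store it.
                tweets = Arrays.copyOfRange(tweets, 0, tweets.length + 1);
                tweets[(counter / 3) - 1] = next;
            }
        }
    }
    System.out.println(Arrays.toString(tweets));
}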

// output
[C, F, I, L, O, R, U, X]

As @200_success said, it’s unfortunate that a List can’t be used, which means we have to resort to manually re-creating our array, making it one element larger every time.
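Growing by one element copies the whole array on every insertion. A common alternative, which is not part of the answers above, is to double the capacity when the array is full and trim it once at the end. The class GrowableStringArray below is a hypothetical illustration of that idea, and the initial capacity of 4 is an arbitrary choice.

```java
import java.util.Arrays;

class GrowableStringArray {
    private String[] items = new String[4];  // arbitrary starting capacity
    private int size = 0;                    // number of slots actually in use

    void add(String item) {
        if (size == items.length) {
            // Full: copy into an array twice as large.
            items = Arrays.copyOf(items, items.length * 2);
        }
        items[size++] = item;
    }

    /** Returns a copy trimmed to the number of elements added. */
    String[] toArray() {
        return Arrays.copyOf(items, size);
    }

    public static void main(String[] args) {
        GrowableStringArray tweets = new GrowableStringArray();
        for (int i = 0; i < 10; i++) {
            tweets.add("tweet " + i);
        }
        System.out.println(Arrays.toString(tweets.toArray()));
    }
}
```

With doubling, the array is copied only when it fills up, so the number of copies grows with the logarithm of the tweet count instead of with the count itself.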

As illustrated above as well, you should use try-with-resources on Scanner instances so that the underlying I/O can be safely and efficiently handled; not so applicable for my String example, but more pertinent to your actual file input.
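For file input, a try-with-resources version could look like the sketch below. The class and method names are assumptions; readTweets takes an already-open Scanner so the loop can be tried on a String as well, and each line is assumed to have at least three tab-separated fields.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.Scanner;

class ScannerFileSketch {
    /** Opens the file once; the Scanner is closed even if reading fails. */
    static String[] loadTweets(String fileName) throws FileNotFoundException {
        try (Scanner scanner = new Scanner(new File(fileName))) {
            return readTweets(scanner);
        }
    }

    /** Collects the third tab-separated field of every remaining line. */
    static String[] readTweets(Scanner scanner) {
        String[] tweets = new String[0];
        while (scanner.hasNextLine()) {
            String[] fields = scanner.nextLine().split("\t", 3);
            // Grow by one element, as in the answer above.
            tweets = Arrays.copyOf(tweets, tweets.length + 1);
            tweets[tweets.length - 1] = fields[2];
        }
        return tweets;
    }

    public static void main(String[] args) {
        try (Scanner scanner = new Scanner("u1\td1\thello\nu2\td2\tworld\n")) {
            System.out.println(Arrays.toString(readTweets(scanner)));
        }
    }
}
```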
