This blog post is dedicated to making sense of the dates of all of these concerts. The date formats are inconsistent and use strange abbreviations. There are also many concerts whose shows run over long weekends or other stretches, meaning there are multiple dates in a single ‘event’. For the purposes of this project at this point, I’m just trying to find the ‘StartDate’ of each of these shows. This will give me a good idea of when in the year these shows are happening.
A quick print of two 5-row sections of the data provides a view into all of the different structures of dates. We see that #2 in the first set took place over the days following Christmas in 2014. Meanwhile, shows in June and July are expressed in a completely different format.
Due to some weirdness in GitHub I am not able to print the output of the above syntax, so I’ve provided a snapshot of it below. At first glance the code seems to have mostly worked, but on closer inspection that is not actually the case: I wasn’t able to correctly transform any of the multi-date events.

To fix this, I use an iterative process to pick apart the pieces of the dates I’m concerned about. First I isolate the Month, the Year, and the first Day listed. I then recombine them into a StartDate variable.
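As a rough illustration of that isolate-and-recombine idea, here is a minimal Python sketch. It is not the code used for this project: the function name `start_date`, the regular expression, and the sample strings are all assumptions made for the example, since the exact spellings in the ‘Dates’ column may differ.

```python
import re
from datetime import date

# Three-letter month prefixes mapped to month numbers.
MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}

# Assumed shape: a month name (optionally abbreviated with a period),
# then the first day number, then a four-digit year somewhere later.
PATTERN = re.compile(r"([A-Za-z]{3})[A-Za-z]*\.?\s+(\d{1,2}).*?(\d{4})")


def start_date(text):
    """Return the first date mentioned in `text`, or None if nothing matches."""
    match = PATTERN.search(text)
    if match is None:
        return None
    month = MONTHS.get(match.group(1).lower())
    if month is None:
        return None
    # Recombine the isolated pieces into a single date value.
    return date(int(match.group(3)), month, int(match.group(2)))
```

Under these assumptions, a hypothetical multi-date string such as `"Dec. 26-28, 2014"` resolves to 26 December 2014, because only the first day listed is kept.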

This method works for dates in the ‘Mon. Day, Year’ format and for those with multiple dates. However, dates in the other ‘Day-Month-Year’ format are not correctly translated, so I use an ifelse statement to fill in the gaps from one method with the other.
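The fill-in-the-gaps logic can be sketched in Python as "try the first parser, and fall back to the second only when the first returns nothing". This is a hypothetical illustration, not the ifelse statement from the project. The helper names, the sample strings, and the choice to read a two-digit year as 20xx are all assumptions.

```python
import re
from datetime import date

MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}


def parse_month_first(text):
    """Handle strings shaped like 'Dec. 26-28, 2014' (assumed format)."""
    m = re.search(r"([A-Za-z]{3})[A-Za-z]*\.?\s+(\d{1,2}).*?(\d{4})", text)
    if m is None or m.group(1).lower() not in MONTHS:
        return None
    return date(int(m.group(3)), MONTHS[m.group(1).lower()], int(m.group(2)))


def parse_day_first(text):
    """Handle strings shaped like '15-Jun-15' (assumed format)."""
    m = re.fullmatch(r"\s*(\d{1,2})-([A-Za-z]{3})[A-Za-z]*-(\d{2}|\d{4})\s*", text)
    if m is None or m.group(2).lower() not in MONTHS:
        return None
    year = int(m.group(3))
    if year < 100:
        year += 2000  # assumption: two-digit years belong to the 2000s
    return date(year, MONTHS[m.group(2).lower()], int(m.group(1)))


def start_date(text):
    """Use the month-first result when it exists, else the day-first result."""
    parsed = parse_month_first(text)
    return parsed if parsed is not None else parse_day_first(text)
```

Strings that match neither shape come back as `None`, which makes the remaining gaps easy to count before moving on.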

With that, our outputs are now complete, and these 10 examples look to have been translated correctly. To get a much better idea of whether everything went as expected, I create a table of 100 randomly selected observations, showing the original ‘Dates’ value next to the new ‘StartDate’ value, to spot-check my transformations.
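A spot-check sample like this can be drawn in a few lines. The Python sketch below is only an illustration: it assumes the data is available as a list of (Dates, StartDate) pairs, and the function name `spot_check_sample` and the fixed seed are choices made for the example.

```python
import random


def spot_check_sample(rows, n=100, seed=1):
    """Return up to `n` randomly chosen rows, reproducibly for a given seed."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return rng.sample(rows, min(n, len(rows)))


# Hypothetical usage: pairs of (original 'Dates' text, parsed 'StartDate').
rows = [("Dec. 26-28, 2014", "2014-12-26"), ("15-Jun-15", "2015-06-15")]
for original, parsed in spot_check_sample(rows):
    print(f"{original!r:>22} -> {parsed}")
```

Fixing the seed means the same 100 rows come back on every run, which helps when re-checking after a change to the parsing rules.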
With that, I think the Events data is pretty much done. I rename ‘newdata’ to ‘Events’ and, in order to show snapshots of the data itself, I delete the ‘Prices’ column and replace it with a ‘Num_Prices’ variable indicating how many price categories there were. I don’t know enough about GitHub code to display it otherwise; I have a feeling the problem has something to do with the comma-separated nature of the list of prices. Either way, I don’t think it is a huge loss. In the next post I’ll begin to do some exploratory analysis.
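For reference, counting the price categories in a comma-separated string might look like the Python sketch below. The function name `num_prices` and the sample strings are hypothetical, and the sketch assumes that commas only ever separate prices; a value written with a thousands separator, such as "$1,250.00", would be miscounted.

```python
def num_prices(prices):
    """Count the non-empty entries in a comma-separated list of prices."""
    if not prices:
        return 0
    # Assumption: every comma is a separator between two price entries.
    return sum(1 for part in prices.split(",") if part.strip())
```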