I downloaded the OTC tickers with that new URL, renamed the file to "OTC.csv", and imported it into Excel. From there I pulled out the tickers and added them to the end of my "tickers.xls" file (a long list of all the tickers combined). Then I copied them all into a text file, saved it, and reran the "stipTickers.pl" script against it (that is the script that looks for characters that aren't letters and removes any ticker containing one).
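For anyone following along without the Perl script to hand, here is a rough sketch of the same filtering idea in Python. It is not the actual "stipTickers.pl" code. The function name and the sample tickers are made up for illustration, and it assumes one ticker per line and that "letters" means plain A-Z.

```python
import re

# Assumption: a "clean" ticker is made up of ASCII letters only.
LETTERS_ONLY = re.compile(r"[A-Za-z]+")

def strip_tickers(lines):
    """Return only the tickers that consist entirely of letters."""
    kept = []
    for line in lines:
        ticker = line.strip()  # drop surrounding whitespace and newlines
        if ticker and LETTERS_ONLY.fullmatch(ticker):
            kept.append(ticker)
    return kept

# Hypothetical sample input, one ticker per entry.
sample = ["MSFT", "BRK.A", "ABCD", "XYZ-W", ""]
print(strip_tickers(sample))  # ['MSFT', 'ABCD']
```

Anything with a dot, a dash, a digit, or any other non-letter gets dropped whole, which is the behavior described above.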
Two things to note from doing that:
1) I am on a Mac, so to paste them into a text document I just double-clicked the old tickers file and pasted the new data over it. Double-clicking opened the file in TextEdit by default. When I saved it, TextEdit must have worked some Mac "magic" on it and ended each line with a Mac newline character (a carriage return, which shows up as "^M"). When I ran the script, it saw the whole file as one very long line with an illegal character in it and deleted it all. I confirmed this from the command line, and to fix it I pasted the data back in via BBEdit, which is much better about behaving properly and not forcing Mac conventions on you (there are many ways around this; that was the one I chose).
2) I ran the script again after that. Here are the before and after figures:
Before: 11780
After: 11000
So now we have an even 11K stock tickers to look at and try to get data for, updating their data at the end of each day (late at night, actually).
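On the newline problem from point 1, here is one of the "many ways around this", sketched in Python. The function names are my own. The idea is to count which line endings a file uses and then convert everything to the Unix "\n" that a line-by-line script expects.

```python
def count_line_endings(data: bytes) -> dict:
    """Count Windows (CRLF), old Mac (CR), and Unix (LF) line endings."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "cr": data.count(b"\r") - crlf,  # lone carriage returns ("^M")
        "lf": data.count(b"\n") - crlf,  # lone line feeds
    }

def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    # Replace CRLF first so it does not turn into two newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

# Hypothetical file contents saved with CR-only line endings.
raw = b"AAPL\rMSFT\rIBM\r"
print(count_line_endings(raw))                         # {'crlf': 0, 'cr': 3, 'lf': 0}
print(normalize_newlines(raw).decode().splitlines())   # ['AAPL', 'MSFT', 'IBM']
```

A file where "lf" is zero and "cr" is not is exactly the case that looks like one very long line to a script splitting on "\n".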
So the next part in the series is going to be how to automate it all so that we can start getting data in without hammering the Yahoo servers. We will also be readjusting our storage requirements to somewhere between 600 and 700MB.
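To give a feel for what "not hammering" could look like, here is a minimal Python sketch that pauses between requests. It is only an illustration of the idea, not the automation the series will actually use. The `fetch` argument is a stand-in for whatever really downloads one ticker's data (no real Yahoo URL or API is assumed here), and the one-second default delay is my own guess at something polite.

```python
import time

def fetch_all(tickers, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(ticker) for each ticker, pausing between requests."""
    results = {}
    for i, ticker in enumerate(tickers):
        if i > 0:
            sleep(delay_seconds)  # wait before every request after the first
        results[ticker] = fetch(ticker)
    return results

# Stand-in fetcher for demonstration only; it does no network access.
def fake_fetch(ticker):
    return "data for " + ticker

print(fetch_all(["AAPL", "MSFT", "IBM"], fake_fetch, delay_seconds=0.01))
```

With a one-second pause, 11,000 tickers would take 11,000 seconds, a little over three hours, which fits comfortably into a late-night run.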
To automate, we can essentially use the scripts we already have; we just have to put in full paths so that when they are run by a cron job, they know where to look. Some might argue that had we put in the full paths from the start, we wouldn't need to go back and change this. That is completely and totally correct, but I know from my own experience on my server that this way is easier with this setup in terms of explaining it to someone else.
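The reason full paths matter is that a cron job usually does not start in the directory where the scripts live, so a bare file name like "tickers.txt" gets looked up in the wrong place. As a hedged illustration in Python (the file name and the helper are hypothetical, not something from the existing scripts), one way to always end up with a full path is:

```python
from pathlib import Path

def resolve_data_path(name, base_dir):
    """Return an absolute path for the file `name` inside `base_dir`."""
    return Path(base_dir).expanduser().resolve() / name

# Hypothetical example: build a full path from a known base directory.
tickers_file = resolve_data_path("tickers.txt", ".")
print(tickers_file.is_absolute())  # True
print(tickers_file.name)           # tickers.txt
```

In a real script the base directory would be a fixed location, or the script's own directory via `Path(__file__).parent`, rather than "." as in the demo.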
So stay tuned and hopefully learn something along the way.
Posted by ESS at March 3, 2005 06:10 PM