August 2015

Aug 30, 2015

Analyzing Twitter data with R (part 3: Cleaning & organizing the Data)


In the previous parts we explained how to set up access to Twitter's API and how to import tweets with a simple R command. In this third part we will organize and clean the imported data so that it ends up in a good, reusable format.

So let's pick up where we left off. With the authentication process finished, we now want to import certain tweets. The searchTwitter command from the twitteR package will do the job for us.
We need to create a new object that will hold the data we are about to import, and assign it the result of our searchTwitter call.

In our example, we will query Twitter for tweets containing the hashtag #Tunisia and import 1000 English tweets. We will then preview the imported data with the head() function:
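
A minimal sketch of that query, assuming the twitteR package is installed and the authentication from the previous part has already been done in the current session. The wrapper name fetch_tunisia_tweets is made up for illustration, and the twListToDF() conversion is an optional extra step rather than something this post prescribes:

```r
# Sketch only: assumes twitteR is installed and setup_twitter_oauth()
# has already been run in this session (see the previous parts).
# fetch_tunisia_tweets is a hypothetical helper name.
fetch_tunisia_tweets <- function(n = 1000) {
  # Ask for up to n English tweets containing the hashtag #Tunisia
  tweets <- twitteR::searchTwitter("#Tunisia", n = n, lang = "en")
  # Optional: flatten the list of status objects into a data frame
  twitteR::twListToDF(tweets)
}

# Usage, once authenticated:
# tweets_df <- fetch_tunisia_tweets()
# head(tweets_df)
```

Twitter may return fewer tweets than requested, so it is worth checking the number of rows before going further.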




Aug 17, 2015

String manipulation functions in R (part 2)


First part of this series of posts.


This is the second part of a series of posts on string and text manipulation in R. Each post picks a few functions and shares them here with basic examples. I hope that by the end of this series I'll have a good archive of lessons, from beginner level to expert level.

The points we will cover in part 2 are:

  1. character encoding 
  2. length of a string 
  3. lowercase and uppercase conversion 
  4. basic string comparison 
  5. concatenating strings
  6. extracting sub-strings 
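
As a quick preview, here is a rough sketch with one base R call per point in the list above. The sample strings are made up for illustration, and the functions shown are only one reasonable choice for each task:

```r
s <- "Hello World"

# 1. character encoding: inspect the declared encoding and convert it
x <- "caf\u00e9"                                  # "café" via a Unicode escape
enc <- Encoding(x)                                # declared encoding of x
x_latin1 <- iconv(x, from = "UTF-8", to = "latin1")

# 2. length of a string (number of characters)
n_chars <- nchar(s)

# 3. lowercase and uppercase conversion
upper <- toupper(s)
lower <- tolower(s)

# 4. basic string comparison (case-sensitive by default)
same_exact <- "apple" == "Apple"
same_ignoring_case <- tolower("apple") == tolower("Apple")

# 5. concatenating strings
with_space <- paste("Hello", "World")             # separator defaults to " "
no_space <- paste0("Hello", "World")              # no separator

# 6. extracting sub-strings by start and stop position
first_word <- substr(s, start = 1, stop = 5)

print(c(n_chars, upper, lower, with_space, no_space, first_word))
```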

character encoding 



Aug 8, 2015

Mathematical Symbols in LaTeX


I have been going through my hard drive and organizing some of my files (PDFs, eBooks and articles), and I found a very interesting document about LaTeX math symbols.

Please deactivate ad-blockers to be able to see the PDF document!

UPDATE:

I published this article a few years ago, and it has attracted the most attention of all my blog posts, so I'd like to thank all the visitors who shared it.

Please take some time to check out other articles I have shared about Data Science, Machine Learning and Big Data:

Data Mining basic and advanced concepts - Part 2 : Describing Data




Here are some of the most interesting parts of it:


Aug 5, 2015

M&M&M : Mean ~ Median ~ Mode : Uncovered



Is it just me, or are people still confusing the 3 M's?

 


I've noticed a common mistake some folks make when talking about statistical concepts: they cannot differentiate between the mean, the median and the mode of a particular data variable.

For statisticians and domain experts, this might seem like a "stupid" topic to talk about, since statistics is actually a very deep field, and these 3 points are just small dots in an ocean of concepts and knowledge.

But it is very important for those who work with numbers to really get the idea behind the basic concepts they see or use every day. As confusing as it might seem for some people, an argument in which you use the "mean" to justify a decision you made will be totally wrong if what you were really referring to in your reasoning was the "mode".
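
To make the difference concrete, here is a small sketch in R on a made-up sample (items bought by nine hypothetical customers). Base R has mean() and median() built in, but its mode() function returns the storage type of an object, not the statistical mode, so the helper stat_mode below is my own illustrative name:

```r
# Hypothetical sample: items bought by 9 customers, with one big outlier
x <- c(1, 2, 2, 2, 3, 4, 5, 9, 35)

mean_x <- mean(x)       # arithmetic average, pulled upward by the outlier
median_x <- median(x)   # middle value of the sorted data

# Statistical mode: the most frequent value(s).
# stat_mode is an illustrative helper, not a base R function.
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}
mode_x <- stat_mode(x)

print(c(mean = mean_x, median = median_x, mode = mode_x))
```

On this sample the three values are 7, 3 and 2, so a claim like "a typical customer buys 7 items" would be misleading if what you really meant was the most common basket size.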