Jun 19, 2023

Open Banking 101: from a customer's perspective


 



Following the first part of this article series about Open Banking, this second part is built a little differently.

In the previous part we saw a straightforward definition of the concept of Open Banking, its regulatory background, and its expansion and development throughout the world.

For this second part, I wanted to focus on the advantages of Open Banking from a customer's perspective.

It's true that most of the public content online about Open Banking discusses this innovation from a bank's or a fintech's point of view, and usually offers no arguments or explanation for the average person who comes across an account aggregation request without necessarily knowing what it is.

So why would an "average Joe" agree to share personal banking information with a third-party company?

Below are 4 major advantages of switching to an Open Banking customer experience. Enjoy the illustrations!




So what's Open Banking for the simple everyday customer?




Why would customers accept account aggregation, or the "Open Banking" option?











Apr 15, 2023

Open Banking 101 : Leveraging Opportunities & Disruptions


The banking and financial services industry has undergone a significant transformation due to the rapid advancement in technology and digitalization. 

This paradigm shift has paved the way for Open Banking, which has revolutionized the way banks and financial institutions operate. 

The introduction of Open Banking has provided an opportunity for banks to offer enhanced services, leverage customer data in new ways, and drive innovation in financial products and services. 

Open Banking has brought about a new era of collaboration between traditional banks and fintech startups, and data scientists are playing an essential role in this new landscape. 


Image by pch.vector on Freepik

However, with the opportunities come risks. The sensitive nature of financial data requires banks to ensure the security and privacy of customer information. Therefore, regulatory bodies have implemented guidelines such as the PSD2 (DSP2) to safeguard against potential threats and provide an operating context for banks and fintechs to mitigate all possible risks.


In this article, I will explore the opportunities and disruptions presented by Open Banking. I will discuss how banks can unlock customer insights, mitigate risks, and analyze customer behavior to gain a competitive advantage. 

I will also explore how banks can maximize revenue through cross-selling and up-selling strategies, and the essential role of data scientists in driving the Open Banking revolution.

Stick around, because by the end of this post you will have a better understanding of Open Banking and the potential it holds for driving innovation and growth in the banking industry and in the global digital economy.


"Unlocking Customer Insights in the Age of Open Banking"


Open Banking is a framework that allows third-party providers to access financial data, including payment accounts, through the use of open APIs (Application Programming Interfaces). Open Banking is designed to promote competition and innovation, and to enhance customer experience in the banking sector.

Image by sentavio on Freepik

The European Union (EU) was the first to introduce open banking regulations, with the Payment Services Directive 2 (PSD2) coming into effect in January 2018. PSD2 requires banks to make customer account data available to third-party providers (TPPs) through APIs, subject to the customer's consent.

This has led to a wave of innovation in the financial services industry, with new companies developing products and services that use open banking data to improve customer experience.


In the United States, there is no federal open banking regulation, but there are a number of state-level regulations that are similar to PSD2. For example, the California Consumer Privacy Act (CCPA) gives consumers the right to access their financial data and share it with third-party providers.


In Canada, the Competition Bureau has released a report on open banking, which recommends that the government develop a national framework for open banking. The report also recommends that the government work with the financial industry to develop standards for open banking APIs.


In Latin America, there is no regional open banking regulation, but there are a number of countries that have introduced their own regulations. For example, Brazil's open banking framework, which is known as Open Banking Brasil, came into effect in February 2020. Open Banking Brasil requires banks to make customer account data available to TPPs through APIs, subject to the customer's consent.


In Australia, the government has released a draft open banking policy, which is currently open for consultation. The policy proposes that banks be required to make customer account data available to TPPs through APIs, subject to the customer's consent.

Image by storyset on Freepik


Open Banking has opened up a wealth of customer data that was previously siloed within banks and financial institutions. This data presents an opportunity for banks to gain valuable insights into their customers' behavior, preferences, and financial needs.


Through Open Banking APIs, banks can access customer data from multiple sources, including payment accounts, loans, and credit cards. By aggregating this data and using advanced analytics and machine learning algorithms, banks can develop a holistic view of their customers' financial behavior and preferences.


This deeper understanding of customers can be leveraged to enhance customer experience and drive innovation. 

For example, banks can use customer data to offer personalized financial products and services, such as customized investment portfolios and savings plans, that better meet individual needs.


Moreover, customer data can also be used to develop predictive models that anticipate customers' future needs and behavior. For instance, by analyzing spending patterns, banks can predict when customers are likely to need a loan or a credit card upgrade and proactively offer these services before customers even ask for them.
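To make this concrete, here is a toy, hedged sketch of such a propensity model: a simple classifier trained on spending-pattern features. The feature names, synthetic data and target definition are invented for illustration; a real Open Banking model would be trained on consented account data.

```python
# A toy sketch of a "next need" propensity model built on aggregated spending
# features. Everything here (features, data, label) is synthetic and illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1_000
X = np.column_stack([
    rng.normal(2_000, 600, n),      # average monthly card spend
    rng.normal(0.35, 0.10, n),      # share of income spent on essentials
    rng.integers(0, 4, n),          # overdraft events over the last 3 months
])
y = (X[:, 2] >= 2).astype(int)      # synthetic label: "likely to need credit soon"

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

print("held-out accuracy:", model.score(X_test, y_test))
print("credit propensity for a new customer:",
      model.predict_proba([[2_500, 0.55, 3]])[0, 1])
```

Scores like these are what would feed the proactive offers described above, always under the consent and transparency constraints discussed in the rest of this article.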


Overall, Open Banking presents banks with a unique opportunity to leverage customer data to gain valuable insights that can be used to enhance customer experience, drive innovation, and gain a competitive advantage. However, it is important for banks to ensure that they are collecting and using customer data in an ethical and transparent manner to maintain customer trust and satisfaction.


In the next part of this article series, you'll learn more about mitigating risks in the Open Banking landscape.



Oct 10, 2021

My Style'Z web app [Style transfer in action]


Sometimes you start playing with an idea, time passes, and you end up with a project that you need to finish.

I am sure this is not the case for most people; however, it perfectly describes the process by which I made the StyleZ app.

I have always been fascinated by applications where neural networks are used outside of the traditional context. One of the ideas proposed for this kind of technology is style transfer.


The Scream by Edvard Munch



In this post, I will explain the steps I followed to implement and deploy a neural style transfer app. The idea is not to write a technical post, but to lay out the process of taking a very small project from an idea to production.



Neural Style Transfer


Neural style transfer is an optimization technique that takes two images, a content image (a photo of a person, a building, or anything else) and a style reference image (an artwork by a famous painter, for example), and blends them together so that the output looks like the content image, but "styled" or "painted" in the manner of the style reference image.

This is implemented by optimizing the output image to match the content statistics of the content image and the style statistics of the style reference image.

These statistics are extracted from the images using a convolutional neural network.

In summary, the algorithm takes the content image and the style image that we want to match. It transforms the base input image by minimizing the content and style distances (losses) with back-propagation, creating an image that matches the content of the content image and the style of the style image.


I will not go into further detail; however, here is a link to the article where the authors explain all the mathematical details and the steps that go into achieving style transfer.
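For the curious, below is a rough sketch of this optimization loop in TensorFlow, in the spirit of the original paper and of Google's tutorial. The layer choices, loss weights, image paths and iteration count are illustrative assumptions, not the exact settings used in the app.

```python
# A rough sketch of optimization-based neural style transfer using a pre-trained
# VGG19 as the feature extractor; settings are illustrative, not the app's own.
import tensorflow as tf

def load_image(path, max_dim=512):
    img = tf.io.decode_image(tf.io.read_file(path), channels=3, dtype=tf.float32)
    scale = max_dim / tf.cast(tf.reduce_max(tf.shape(img)[:2]), tf.float32)
    new_size = tf.cast(tf.cast(tf.shape(img)[:2], tf.float32) * scale, tf.int32)
    return tf.image.resize(img, new_size)[tf.newaxis, ...]

style_layers = ["block1_conv1", "block2_conv1", "block3_conv1", "block4_conv1", "block5_conv1"]
content_layers = ["block5_conv2"]

vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg.trainable = False
extractor = tf.keras.Model(vgg.input, [vgg.get_layer(n).output for n in style_layers + content_layers])

def gram_matrix(feat):
    # Style statistics: correlations between a layer's feature maps.
    result = tf.einsum("bijc,bijd->bcd", feat, feat)
    locations = tf.cast(tf.shape(feat)[1] * tf.shape(feat)[2], tf.float32)
    return result / locations

def features(image):
    outputs = extractor(tf.keras.applications.vgg19.preprocess_input(image * 255.0))
    return [gram_matrix(o) for o in outputs[:len(style_layers)]], outputs[len(style_layers):]

content_image = load_image("content.jpg")   # hypothetical input paths
style_image = load_image("style.jpg")
style_targets, _ = features(style_image)
_, content_targets = features(content_image)

generated = tf.Variable(content_image)       # start the output from the content image
optimizer = tf.keras.optimizers.Adam(learning_rate=0.02)
style_weight, content_weight = 1e-2, 1e4

@tf.function
def train_step(image):
    with tf.GradientTape() as tape:
        style_out, content_out = features(image)
        style_loss = tf.add_n([tf.reduce_mean((s - t) ** 2)
                               for s, t in zip(style_out, style_targets)]) / len(style_layers)
        content_loss = tf.add_n([tf.reduce_mean((c - t) ** 2)
                                 for c, t in zip(content_out, content_targets)]) / len(content_layers)
        loss = style_weight * style_loss + content_weight * content_loss
    grad = tape.gradient(loss, image)
    optimizer.apply_gradients([(grad, image)])
    image.assign(tf.clip_by_value(image, 0.0, 1.0))

for _ in range(200):    # a few hundred steps already give a recognizable result
    train_step(generated)
```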


The different styles applied to our Eminem Image



Building the front-end

I don't see myself as a superstar professional graphic designer per se; however, over the years I had to learn to do graphics work for my NGO: build the website, make brochures and posters for events, etc.

So I have that experience to thank for my way with graphic design.

For this app, it wasn't a complex job once you figure out the macro-idea, so I laid out the prerequisites for the web application. I wanted a page that allows the user to:

  • upload a photo
  • choose a style from a predefined styles list
  • submit and see the result


Those were the basic needs to make the overall concept work; however, I wanted to add more swag to the app. So the user should also be able to:

  • go back after seeing the result to choose another style
  • have some useful information about the artwork
  • have links to the technical resources behind the style transfer


I put the sketches together on a blank sheet and then moved on to choosing the tools.




The trio: HTML, CSS and JavaScript

I wanted the web app to have a soul. Being an admirer of harmony, it seemed to me that it would make sense for a style transfer app to have a certain "style" of its own. So the main page was split into 3 parts:

Top section:

This is where the user will land at first. So it needed to be attractive and very easy to understand and use.




There are 3 components in this section:

  • The first one is a slider to showcase the possible styles a user could apply to a photo.
  • Just below that, the same concept but with the names of the styles with all of the available styles laid out in a row.
  • And the final sub-section is the upload form. In this part of the top section, I tried to keep it simple, with a button to choose the photo from a local directory, a drop down menu for style selection, and finally a submit button.


Center section:



This is really an extra part. I was already set on keeping it simple with the top part of the page, which is enough to do the job.
However, I really wanted to share an appreciation for the art that inspired the style transfer algorithm.

So I selected some famous paintings and artworks and linked to their Wikipedia pages, or other pages that provide sufficient information on the painting, the artist and the history behind it.


Bottom section:

This was also an optional part. However, if we are going to talk about the art, we need to also talk about the algorithm that made it happen.

So I linked to the original style transfer paper, and to Google's TensorFlow tutorial on style transfer, so that the (curious) users can see the construction process step by step.





Front-end of the results page:


This also needed some thought. The main thing here was based on 2 prerequisites:

- Keep the same coherent style

- Provide a way for the user to try another style

There was also a third point, but I had to let it go. It was the possibility of adding an automatic download button for the result image.

I had to let it go because it was a bit tricky to handle in JavaScript: some browsers have limitations on letting users download files, and I didn't have the time to dig deeper into that.



So the final results page has an upper section with both the original and the styled images, and a button that allows users to test another style.

Responsive design!!

An important thing I need to mention here: nowadays, websites are very often opened on mobile devices, and for some sites mobile traffic is by far the majority.

People are always on the go, and I can't assume that a future user will be on their laptop to use my app, so I had to make sure my front-end meets this criterion.




Building the back-end


Since the whole project was about doing something fun and not too complex, I made the choice of keeping the complexity of the work limited to building the style transfer model.

That being said, the simplest backend choice I could make was to build everything around Flask, and that is exactly what I did.


For those of you unfamiliar with Flask: Flask is a micro web framework written in Python. It is classified as a micro-framework because it does not require particular tools or libraries. It has no database abstraction layer, form validation, or any other components where pre-existing third-party libraries provide common functions.
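To give an idea of what "everything around Flask" can look like, here is a minimal, hedged sketch of such a backend. The routes, the inline HTML and the apply_style() placeholder are illustrative assumptions, not the app's actual code.

```python
# A minimal sketch of a Flask backend for the style transfer app; apply_style()
# is a hypothetical placeholder standing in for the real model call.
import os
import uuid
from flask import Flask, request

app = Flask(__name__)
UPLOAD_DIR = "/tmp/stylez"            # nothing is kept once the user refreshes
os.makedirs(UPLOAD_DIR, exist_ok=True)
ALLOWED = {".jpg", ".jpeg", ".png"}

def apply_style(photo_path, style_name):
    """Placeholder: run the style transfer model and return the result path."""
    return photo_path                  # the real app writes a new, styled image here

@app.route("/", methods=["GET"])
def index():
    return """<form action="/stylize" method="post" enctype="multipart/form-data">
                 <input type="file" name="photo">
                 <select name="style"><option>the_scream</option><option>starry_night</option></select>
                 <button type="submit">Stylize</button>
               </form>"""

@app.route("/stylize", methods=["POST"])
def stylize():
    photo = request.files.get("photo")
    style = request.form.get("style", "the_scream")
    if photo is None or os.path.splitext(photo.filename)[1].lower() not in ALLOWED:
        return "Please upload a jpg or png image.", 400   # warn on pdf/zip uploads
    path = os.path.join(UPLOAD_DIR, f"{uuid.uuid4().hex}.jpg")
    photo.save(path)
    result = apply_style(path, style)
    return f'Styled image ready at {result}. <a href="/">Try another style</a>'

if __name__ == "__main__":
    app.run()
```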



Testing everything

There were a few test scenarios I needed to keep in mind, where the user, intentionally or not, does one of the following things:

  • upload a non-photo file (a pdf or zip, for example)
  • upload a very small image, or a grayscale, black-and-white or single-color image
  • adversarial attempts to reveal the code, the model or the backend
  • other security risks


For the first two, testing wasn't very difficult to implement, and by taking both scenarios into account I was able to correct and enhance the overall behavior, with warning messages displayed to the user in case they slipped and uploaded the wrong file type.
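As a rough illustration of that kind of upload check, here is a small validation sketch using Pillow; the minimum size and the warning messages are assumptions made for the example.

```python
# A sketch of upload validation with Pillow: reject non-images and tiny images.
from PIL import Image, UnidentifiedImageError

def validate_upload(path, min_side=64):
    """Return a warning message for bad uploads, or None if the file is usable."""
    try:
        with Image.open(path) as img:
            img.verify()                       # raises if the file is not a valid image
        with Image.open(path) as img:          # reopen: verify() leaves the file unusable
            width, height = img.size
    except (UnidentifiedImageError, OSError):
        return "The uploaded file is not an image (pdf and zip files are rejected)."
    if min(width, height) < min_side:
        return "The image is too small to be styled properly."
    return None
```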


For the adversarial attacks and other security risks: since it is a free-access app, the only thing I wanted to avoid was having the app broken or taken offline. Other than that, most of the code is available online and I don't really have to worry about people trying to "hack" it, so I did what I knew was possible to keep the app and its users safe.

That's why the app does not store any original image or result image, as soon as the user refreshes the page, everything is gone.


It was also important for me to implement this keep-nothing feature, because otherwise the hosting fees would cost me a fortune. So security for users is guaranteed and my pockets won't hurt as much.



Deploying 


This wasn't a difficult thing to decide, since I had 2 conditions to respect:

  • Finding a free hosting tier
  • Easy to deploy without the hassle of too many configs


Amazon and Heroku were my two options to test. The only concern I had, and needed to check, was the hosting space. I have worked with both solutions in a professional / production setting, so most of the thinking about sizing, redundancy and availability was at a very different scale.

This app is supposed to be shared with my network and will probably be used intensively during the first week after I share the link and the blog post publicly, with usage expected to settle down to a more stable level after that.

I shouldn't expect 5000 hits/second (unless something happens and a neural network app goes viral for some reason).

Also, to make sure I won't get hit by a sudden crash due to high usage, I intended not to post any link on Hacker News or any other public website.


So if I did the math right, based on the viewer and interaction numbers on my blog, my website landing page and my 2 social network accounts (Twitter and LinkedIn), I calculated that even if I chose to post at a "virality-prone" time on all of them at the same time, I shouldn't worry about going above the free tier limits of either platform.


So considering all of the above and a few other reasons, I decided to go with Heroku, and did some testing from various locations around the world to check for latency and availability; all in all, everything was good to go.


To sum everything up:


It was a nice, fun project to do, outside of the do-stuff-for-work mindset. I took my time with it and had fun figuring out some of the parts I hadn't had time to work on deeply before, especially JavaScript and CSS.


The front-end design was a pleasant part because it took me back to the days when I used to do graphic design for my NGO; I actually spent more time working on this front-end than I did on my website landing page.

I won't say I didn't learn much, or that I learned a ton. It was fun and I got to make something with an EMINEM photo in it. So I'm happy, Slim Shady is happy, and we're all good!

Next Steps?

Not sure what to do next: either add more styles inspired by other artworks, add other features (ideas are yet to come), or probably think about another thing entirely (while writing this blog post, I already had an idea for the next weekend project, and I like it!).


I've hidden something in the app; I hope you're one of the smart and curious ones who will find it.



Jun 5, 2021

The Island with a Million Eyes: Analyzing Social Interactions on Twitter





#KohLanta: the 2021 season finale


Koh-Lanta is now a staple of the French television landscape. This show, inspired by "Survivor", polarizes social media and stirs up the excitement, the joy and sometimes the **anger** of its thousands of fans.


As in real life, where human nature reveals its *qualities* and *flaws* when faced with challenges, **communities** and alliances take shape.

And "betrayal" and "disappointment" often follow (funny, it's as if I were describing a random episode of #Kohlanta 🧐).


In this article, I mix business with pleasure and share a few interesting insights to show that, just as in real life (and exactly the same in #Kohlanta), communities are often the most **dominant** social formation in human interactions, and that sharing the same vision/personality or the same opinions plays an important role both in how these communities form and in how opinions are shaped.




Visualization of the collected data, visual identification of the **important** members of the network, and identification of user groupings


Context: In this article, I analyze the public tweets, reactions, mentions and retweets around the hashtag #Kohlanta.

Data collection was performed in real time during the broadcast, plus 2 hours before and after it, to ensure the data was as exhaustive as possible.
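For context, here is a hedged sketch of what that real-time collection step could look like, assuming the tweepy 3.x streaming API that was current at the time; the credentials are placeholders and the storage logic is left out.

```python
# A sketch of streaming #KohLanta tweets with tweepy 3.x; credentials are placeholders.
import tweepy

CONSUMER_KEY, CONSUMER_SECRET = "xxx", "xxx"
ACCESS_TOKEN, ACCESS_SECRET = "xxx", "xxx"

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)

class KohLantaListener(tweepy.StreamListener):
    def on_status(self, status):
        # Keep the author, the text and the mentions: the raw material for the graph.
        record = {
            "user": status.user.screen_name,
            "text": status.text,
            "mentions": [m["screen_name"] for m in status.entities.get("user_mentions", [])],
        }
        print(record)                              # the real pipeline stores this locally

    def on_error(self, status_code):
        return status_code != 420                  # stop when Twitter signals rate limiting

stream = tweepy.Stream(auth=auth, listener=KohLantaListener())
stream.filter(track=["#KohLanta"])                 # ran from ~2h before to ~2h after the show
```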

The types and nature of the relationships between users, and the tweets themselves, were analyzed in order to understand (and demonstrate, if possible) that what happens on this show (and on social media as well) is nothing but an expression of human and social nature in all its states.


(Real-time interactions between community members. @DenisBrogniart at the center of the dominant communities in the network)


[FYI: Just before publishing this analysis, all the raw data was deleted :)]


🔥 The top 5 users on Twitter 🔥


The top 5 are the users with the highest number of followers == maximum reach when a tweet is shared.

These are the users who, by sharing a piece of information, have the **power** to reach a large number of users and, depending on how relevant or interesting that information is, become the first nodes in its diffusion graph (this is where the concept of virality comes in).

The users in our top 5 have the perfect profile for that: 100% of them are media accounts or media representatives:

1- @tf1 : 5 628 169 followers :




2- @le_parisien: 2 791 114 followers:




3- @20minutes : 2 448 378 followers:




4- @mediavenir : 1 155 465 followers :




5- @bchameroy : 1 047 579 followers:




Combined, these 5 users alone have a total reach of roughly 13 million users.

The top 10 of the analyzed users have a reach of 17 million.

So they do play their main role of relaying information and lighting the spark on Twitter.



🤼 Let's talk about communities 🤼


On social media, there are 4 kinds of user groupings:

1. those who follow a user or a hashtag

2. those who share without following

3. those who create the content themselves

4. those who do 1 and/or 2 and/or 3


A community usually forms around shared interests between its members: either they are friends or know each other in real life, or they share a common denominator (here, they are all fans of the show #Kohlanta).


📊 Numbers & Observations 📊


Without going too deep into the mathematical details, the starting point of this analysis is graph theory, and more precisely the methods used to detect communities, cliques, clans/clubs, leaders and influencers.

(Roughly speaking: it comes down to measuring the interactions and relationships between all the members of a network, finding a central point, and iterating until you end up with groupings of users that are close to one another.)
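As an illustration of that iteration, here is a minimal sketch of this kind of graph analysis with networkx; the interaction edge list (built from retweets and mentions) is assumed to have been collected already, and the exact algorithms behind the figures below may differ.

```python
# A minimal sketch: cliques, communities and "leaders" from an interaction graph.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# (source_user, target_user) pairs: who retweeted or mentioned whom
interactions = [("alice", "DenisBrogniart"), ("bob", "DenisBrogniart"), ("alice", "bob")]

G = nx.Graph()
G.add_edges_from(interactions)

cliques = [c for c in nx.find_cliques(G) if len(c) >= 2]       # closed circles of users
communities = list(greedy_modularity_communities(G))            # larger user groupings
centrality = nx.degree_centrality(G)                            # a simple proxy for "leaders"
leaders = sorted(centrality, key=centrality.get, reverse=True)[:3]

print(f"{len(cliques)} cliques, {len(communities)} communities, leaders: {leaders}")
```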


This way, I was able to detect:

- 112 user cliques

- 9 highly influential communities

- 3 leaders / influencers


Below is an illustration of the different components of the #Kohlanta hashtag's social network on Twitter:



The cliques

A clique is a group of individuals who interact and share similar interests.

These are isolated users with a small number of followers (between 0 and 60), hence a small circle of contacts on Twitter, which results in non-existent or limited reach and interactions.

The 112 cliques are made up of groups of users who interact in closed circles, usually with between 2 and 4 members who retweet or like the tweets of one of the clique's members.

  

The communities:

There are 7 of them, but they share very strong common ties: "pivot" users who bridge the different communities.

You'll understand the why and the how in a moment.



I was able to identify the typical profile of 3 of these communities; I present them here in decreasing order of their total size in terms of members/users.

Below are a few examples of the polarization of opinions within the different communities:













#1: The neutral fans:

No particular polarization of opinions. These are fans who simply follow the show and do not comment on its content or events. This is the community that used the fewest insults towards the show's contestants. The tweet analysis reveals that the (unintentional) use of a certain lexical field influences which of the 3 communities a user belongs to.



#2: The happy ones:

This is the community with the largest number of people happy with the results of the finale. This does not necessarily translate into satisfaction with the winning contestant, but according to their tweets, these people did not express any particular discontent. They did, however, share particularly strong opinions about some of the losing contestants.


#3: The unhappy ones:


This is a group of users who share a feeling of general discontent towards the show, towards some or all of the contestants, and above all towards the final result.

This community also shares one particular trait: its members express themselves very strongly, go beyond simple comments, and make heavy use of insults (sometimes very personal and violent) towards certain contestants.


One particularity of these 3 groups is that the "follower" relationship is not the only determinant of a user's prior membership in one of them. Admittedly, being part of a leader's network (we'll get to that shortly) does confirm part of the hypothesis about sharing the same opinions on certain topics.

It should be made clear that belonging to these communities was not a deliberate choice by the users. In other words, there was no necessary and mandatory condition of choosing one's words or feelings in order to be part of a user group.

It is actually the community detection that comes in to confirm the hypotheses a posteriori.

So these users are not necessarily aware that they belong to a community sharing the same views and opinions on the #Kohlanta 2021 finale 😉 (although, if you take 5 minutes to think about it, it all makes perfect sense in the end!).

  

👨‍✈️ Finally, the "Leaders" 👨‍✈️


I was able to identify 3 leaders who polarize and steer the "debate" on Twitter.

They are not the most-followed accounts (surprised?).

These are users who have the following characteristics (or advantages):

- Being the first to share a screenshot or an image from the show: when #Kohlanta starts appearing in Twitter's trending topics, they are the ones seen first and therefore get the largest number of likes and shares. Users often take the easiest option, which is to retweet.

- Having a well-developed sense of humor?: I could not explain this from a sociological/psychological point of view, but it is the phenomenon of someone who very quickly comes up with a funny comment about a contestant or a situation in the show and pairs it with an image or a "meme", which gives them easy access to a large number of "passive" users who do nothing but share, perhaps for lack of "creativity".

For example:

 or:

 or even:



- Being the show's host: even though Denis Brogniart did not tweet during the show (he has actually been inactive since May 28), he single-handedly centralizes all the interactions of the 9 communities.

Mentions of his name and his Twitter account monopolized a large part of the discussion during the broadcast:



(NB: The goal of this exercise was not to classify and label people into these 3 groups, so I have not revealed the identities of the members of the different communities, or of the leaders. Except for the mention of Denis Brogniart, obviously.)


To my surprise:

The contestants of the 2021 edition had no effect on the social interactions during the 2021 season finale. Let me explain:

Some were mentioned in tweets, but their presence on Twitter did not shape or steer the **debate** between users. They were, however, the subject of a number of "polarizing" comments.

On the other hand, several contestants from previous editions dominated part of the interactions (Go Sam! Go Claude! Go Teheiura!), so a nostalgia effect, if I dare say?

In short, last night it was the viewers who drove and steered the **debate** on Twitter.




Well, there you have it!

The total time invested in this exercise was:

5 hours to collect the data (I monitored the start for 1 hour), 1 hour for cleaning, 1 hour for building and validating the communities, and 1.5 hours to find the inspiration for the text above.

4.5 hours of fun for me…

The topic of SNA (social network analysis) has fascinated me for years. I have done research work on similar problems and managed to put into production a few models that are still running today (AKA generating revenue thanks to social network analysis).

It fascinates me because it is the perfect manifestation of human behavior, and it lets me dive into economic, sociological and psychological topics in order to make sense of the numbers.

So an SNA project often means a lot of reading and a lot of thinking, but above all it is a fun exercise!

🎤 💧

Mic drop


On the same theme, read my 2015 article on the analysis of Twitter interactions following the terrorist attacks in Tunisia.



Apr 25, 2021

On the hype around #DataScience


Here is my latest Twitter storm about the death of Data Science.

Enjoy!

Thomas Cole, The Course of Empire, Destruction, 1836, The Metropolitan Museum of Art, New York, USA.

 

 

 




For the rest of the Twitter storm, I invite you to check it out... on Twitter!


 

Dec 12, 2020

Why and how I deleted 4000+ connections from Linkedin


The Procession of the Trojan Horse in Troy by Domenico Tiepolo (1773)

 

First things first: could someone remind me of the main goal (or goals) of using Linkedin as a professional social network? 

It seems undeniable to me that, with more than a few thousand connections and followers, every time I opened LinkedIn to check what was going on (or to reply to that desperate student looking for advice or an internship), I could not fully process what was being posted, nor the sheer quantity of things that were of absolutely no interest to me personally.


In this post, I will explain the reasons that led me to delete a big chunk of my LinkedIn network, and how I defined and identified the connections that had to be deleted.


As with Facebook, Instagram or even Twitter, every place has its own use.

Keeping in touch with close friends and family on Facebook, following beloved artists and brands on Instagram, and enjoying the occasional discussion with like-minded people on Twitter is, in my view, the appropriate way of using each social network.

However, with LinkedIn it has always been a relationship of love and not-so-much-love. You may wonder why...

I think the way I view and use LinkedIn is far different from what other people are using it for.

Like any product, we may have different ways of using it: 

a car can drive you from point A to point B, and it can also have a couple of subwoofers and loudspeakers to play music as if you were in a nightclub.

I don't mind having cars in our cities per se (apart from them being a source of pollution, accidents and a lot of waste); they solve a fundamental problem of transporting people and goods. However, in my personal opinion, playing loud music around urban areas is not what a car was mainly built for.

Therefore, living in an area where most people compete for the loudest bass coming out of their trucks at 2 am is not something I would welcome with a happy attitude, and moving to a calmer area with more civilized people would be at the top of my to-do list.

The same analogy could be applied to Linkedin. 

We all have our uses for this website, and despite other people's different understanding of why they are there, I do not intend to move out of it, especially since it gives us a luxury we don't have in real life: a "delete connection" and an "unfollow" button (yeah, I hear you wishing we had those in real life too).

So what is LinkedIn anyway? Let's break it down by asking more specific questions.

Is it :

  • a job posting site?
  • a professional network?
  • a place to share your kid's drawings?
  • a place to ask for donations?
  • a place to promote products?
  • a place to find love?
  • a place to scam / be scammed?
  • a place to post political views?
  • a place to talk about religion?
  • a place to gather information / spy on people?

Digging deeper requires some much-needed clarity on this matter. We have all observed a certain change on this website. I believe it all started with someone posting a meme or a funny joke, people liking it, others adding them to their network, and BAM!

The change happened. I've seen it all... I mean all types of garbage...

On principle, I'm all for seeing a cute kid's drawing of his father's ugly face with spaghetti sauce on a dirty napkin. However, I am a true believer that LinkedIn is not the proper place for that.

In my view, LinkedIn has lost sight of its primary goal.

This is a personal-opinion type of post, because again, I'm all for your freedom to post, say and share anything you feel like sharing. And I would love for you to extend me the same courtesy and accept that I have the freedom and the right not to want to see irrelevant posts there.

I will not be analyzing or talking about the change of Linkedin feed algos and how they affected what we see and what type of posts get the most visibility. 

I'm sure the folks over there are doing their best to maximize shareholder profits (and by the by, maximizing user engagement).

I'm here to talk a bit about the kinds of engagement that I did not like, and what I did to do the "grand ménage" (big cleanup) of my network.

While we're at it, if someone from LinkedIn is reading this, I'm curious to know if you guys have thought about the engagement quality on the site.

Your abuse reporting system is an absolute joke in terms of UX, and you really need to figure out a way to offer help the same way you shove the premium upgrade button everywhere, and the same way you made it a 3-step process to take our money.

I would love for you to make the same UX simplifications to hear out your users instead of sending them on an infinite dance around the FAQ page (yes, I tried to report someone posting really offensive stuff, and spent 35 minutes clicking from page to page, only to come back to the main FAQ page afterwards). This is by design, and a very bad one.


Let's get to know the targets:

A disclaimer: this is not an attack on these people and their freedom to share whatever they want. This is a personal desire of mine to clean up a personal space, a space where I do not tolerate low-value posts filling up my sight.


In order to identify potential candidates for deletion, I first needed to define the different target groups with clear and simple characterizing traits, since deleting a few thousand connections all at once is made almost impossible by LinkedIn (the UX team needs to look into a "select all to delete" feature).

Let us start with the recent trendy garbage:


1- People tapping twice, and anyone sharing a "tap twice to see..."



This segment has been the most annoying one recently, and I don't have to sell you on this; no justification is needed for anyone participating in this wave of spamming. It was a trick to get more exposure by applying the same method first used on Instagram (Insta has a double-tap-to-like feature that was not available until recently on LinkedIn).

I understand if a social media influencer uses those silly methods to collect likes and get more exposure. Doing this on LinkedIn is a sign of stupidity. STOP THIS!


2- No personal photo whatsoever



This is self-explanatory: if I have never met you and I already have you on LinkedIn, the minimum is to have your full name and photo.

If you are afraid of showing the world what you look like, it is hard to accept being connected with actual faceless ghosts. Unless I personally know you and you somehow decided to delete your profile photo, "you be gooooone" too, buddy.


3- Photos of cats, dogs, cars, the beach, nature, food... instead of a human being

Well, if you have no respect for yourself, how do you expect others to have any for you? Imagine having an ID with something other than your own photo on it.

Since I believe an online profile is something we have control over, why not put up your best professional photo, or at least a photo with a nice smile, without showing the world your bathing suit.

Are you a fan of cats or dogs? We all are! Who doesn't love those cute companions? But as I mentioned above, there are far more appropriate places for that. LinkedIn is definitely not one of them.


4- No name, or only initials


It is like sitting at Starbucks: someone introduces themselves to you and asks for your business card, and when they hand you theirs, it is blank, or it has just a couple of initials. No pal, not interested.


5- Natural spammers 

This segment both tricky and easy to get rid of. I will explain in the "How" part how I got rid of most of them. However, my specific rule may not generalize well with others. 

It is composed of those that do the following things repeatedly  :

- Post more than 5 times / day

- Like more than 5 posts / day

- Comment on more than 5 posts / day

Combined with:

- Shouldn't show up in my feed more than 5 times / day for any of the above reasons

- Never sent me a message or an InMail

- Have sent me a message or an InMail that was totally irrelevant

- Those scraping profiles for contact info, AKA: have sent me an unsolicited email at least once

- Those shoving their political / religious views down our throats; one mistake and "you be goooone"


Doing the filtering and selection for this segment took a bit of time too; it required a massive amount of historical data and manual labeling of some of the content in advance. It was worth every second! A simplified sketch of the flagging rule is shown below.
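This is only a toy, hedged interpretation of the thresholds listed above; the counters are assumed to come from the monitoring step described later, and the names are invented for the example.

```python
# A toy interpretation of the daily-activity spam rule described above.
from collections import Counter

DAILY_LIMIT = 5

def is_spam_candidate(daily_activity, ever_messaged_me, message_was_irrelevant_or_unsolicited):
    """daily_activity: Counter with 'posts', 'likes', 'comments', 'feed_appearances'."""
    over_limit = any(daily_activity[key] > DAILY_LIMIT
                     for key in ("posts", "likes", "comments", "feed_appearances"))
    no_real_contact = (not ever_messaged_me) or message_was_irrelevant_or_unsolicited
    return over_limit and no_real_contact

# Example: a connection who liked 12 posts today and never sent me a message
activity = Counter(posts=1, likes=12, comments=0, feed_appearances=7)
print(is_spam_candidate(activity, ever_messaged_me=False,
                        message_was_irrelevant_or_unsolicited=False))
```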

 

6- Romeos... Juliettes... the other type

This segment annoys most people. The definition is pretty straightforward: those who treat LinkedIn as a dating website, all those who post inappropriate content, and those who have been publicly named and shamed by others.

I admit that the work I did to remove this segment of people from my network was done purely manually. I did not have enough data points to build something that wouldn't miss.

And since I was luckily not a target for marriage proposals on LinkedIn, I had to rely on one basic rule, which I choose not to disclose here.


7- I am recruiting for UAE, DUBAI, CANADA, MARS...


It is a sad feeling when, while browsing through the posts, you see a well-respected connection falling victim to the scam of those trying to be the wiseass. I do not believe that copy/pasting this type of post proves any point whatsoever. As much as I don't appreciate scammers, I have even less appreciation for lesson-givers. So this group is also an identified target for deletion.


Let's summarize and move forward :


The potential target list can get larger with special cases. I decided to keep it as short as possible for the time being in order to test the actual benefits of this first iteration. 

All in all, 7 target groups were identified for immediate removal. Some of them have direct identifying characteristics, and others are only identified by behavioral traits.




This simplification helps in the execution phase: profile-identifying features are easy to spot once well defined and do not require any activity history to be processed, while behavioral features are harder to come by and require complex definitions and an actual analysis of posts, comments and user activity in order to flag those suckers and delete them.
 
I must note that I was extremely careful while implementing this project for 2 main reasons:

  • Mistakes happen, and I needed to make sure that my code does not delete based on false interpretations

  • Acquiring user activity data was extremely tricky: it needed some patience and a lot of scraping. Doing so meant that I could be detected by LinkedIn for unusual activity, so my work needed to be as human-like as possible, or I risked being flagged by the website.
The breakdown of the work went like this:

Profile features identification


The easiest one was identifying the first segment based on predefined profile features. As explained above, there are mainly 3 features I am interested in:
  • The name
  • The profile picture
  • Profile description
Since LinkedIn allows for a full download of network information, this process was done manually. However, the export did not yield some of the data I needed to analyze and apply the identification rules. Yes, you've guessed it right: I couldn't download profile photos from the archive dump provided automatically.
This is why an additional script was added to go through all the profiles one by one, after retrieving the corresponding URLs.
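As a rough illustration only, here is a toy version of the profile-based rules (segments 2 and 4 above); the field names are invented and do not reflect LinkedIn's actual export schema.

```python
# A toy sketch of profile-based flags: missing names/initials and missing photos.
import re

INITIALS_ONLY = re.compile(r"^(?:[A-Z]\.? ?){1,3}$")    # e.g. "J.D." or "A B"

def flag_profile(profile):
    flags = []
    name = profile.get("full_name", "").strip()
    if not name or INITIALS_ONLY.match(name):
        flags.append("no_name_or_initials_only")
    if not profile.get("photo_url"):                     # photo URLs were scraped separately
        flags.append("no_profile_photo")
    return flags

connections = [{"full_name": "J. D.", "photo_url": None},
               {"full_name": "Jane Doe", "photo_url": "https://example.com/jane.jpg"}]
candidates = [c for c in connections if flag_profile(c)]
print(candidates)
```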









Behavioral features identification


Behavioral features identification was the longest part to work on, for obvious reasons.
The challenging part was getting the data needed, AKA the signals required to flag targeted profiles.
Without going into more detail, it took around 90 days of monitoring to recover a decent amount of data, which allowed me to identify the largest group.
Below is a simplified diagram explaining how I split the targets, and what went into doing the identification.

I insist on "simplified" because I do not intend to reveal the actual work and the details of the various steps, for obvious reasons.


This simplified analysis lead to the implementation of 2 modules:

- Activity module

- Text analysis module

For the first module, during the monitoring period, since one of the criteria for identifying potential targets was a limit on how often they post, like or comment per day, I had to capture this information, and for every connection going beyond that limit, a flag was set on the corresponding profile.

The text analysis module was at first a sub-part of the activity module. Since I was getting post contents, I figured I could download it all. However, it turned out to be a very complex task.

Honestly, the whole project was, as you would have imagined, full of trial and error. Figuring out the optimal way to capture relevant information without consuming too much time or resources took a while.


After the whole thing was set in motion, I was able to gather and flag a large amount of information. It was unexpectedly mind-blowing!


The final fun part: Results!


As the title reveals, I was able to successfully and happily delete more than 4k useless connections on LinkedIn.

Most of them were profiles of people I never met, people outside of my work network, working in industries and positions that were not relevant to me. So the actual loss was not huge in terms of exposure to the world, six-degrees-of-separation-wise.

Was it worth the effort? YES! a thousand times YES!

I can actually feel the difference in the quality of posts in my feed. I got rid of many spammers, and the overall garbage-to-gold ratio has dropped tremendously.

Other than the actual approximation of the total number of deleted connections, I will keep the rest of the details for myself. No personal identifying information or codes will be posted anywhere.

This project was a very self-indulgent exercise, where I applied some of the stuff I usually do at work.

Regex, image processing and Deep Learning were heavily used in the behavioral-targeting part of the project.  

I had to analyze a large volume of text, activity flags, posts and content ranging from photos to links and videos. Comments were out of scope for this project; however, comment activity was taken into account for obvious reasons.


I personally went through each and every deletion candidate for manual validation. Since the data volumes and the mix of different techniques made it a bit tricky to pre-label posts and profiles, some manual labor went into achieving the final goal.


My final advice for anyone complaining about LinkedIn: clean up and delete those who annoy you. Simple.


This work took around 4 months to complete: about 90 days went into gathering the necessary data, plus 4 weeks of trial and error over multiple weekends.

Doing this write-up took a couple of days, and I went back and forth regarding the use of certain terms, especially the "not so politically correct" ones. So what you got here is a mix of both worlds and the fruit of some post-meditation writing.


Below are some other sources of fellow humans complaining about how sucky LinkedIn has become... I feel you, guys!


Quora : Why does LinkedIn suck?

Quora : What are the things that suck on LinkedIn?

Quora again : How bad is Linkedin?


Legal disclaimer: for apparent reasons, I would like to say that this post, and all the stuff behind it, is a work of fiction. I love sci-fi stuff and this write-up is part of an imaginary scenario that never happened. It was all a dream, probably. No one is responsible for this. And if it bothers you in any way, feel free to take a chill pill. This never happened, okay?

Nov 18, 2020

Text Visualization : Come and get inspired !!


Doing data viz for some types of tasks can be difficult, especially when ordinary line, bar or pie charts will not do a good enough job of explaining the ideas we want to convey.


Text, in particular, is a bit delicate to represent with traditional techniques. Depending on the core idea we would like to convey, it sometimes turns out to be a much harder task.

The Theodore Psalter, AD 1066: Add MS 19352, f. 100r




I remember a few situations when I attended a presentation, only to leave more confused by the charts that were used.

I have talked many times on my Twitter about the importance of data viz, and how we really need to make sure that graphs are as simple and as clear as possible, containing easy-to-understand information, without confusing the audience.


Yes, I really do understand that this requires a certain patience and expertise; however, why not get inspired by and learn from people who have already done the task, and try to understand the story behind the data and the goals and objectives of such graphs?

I stumbled upon a great resource that I wanted to share with the world, so here it is:



It is called the "Text Visualization Browser: A Visual Survey of Text Visualization Techniques".

I really enjoyed the contents, especially the papers associated with each data viz. 

The good thing is, they have selected a few good papers on really interesting topics, so it won't be a hard job to read the paper and understand the subject matter instead of jumping straight to the charts.


Read !

Learn!

Get inspired!

Apply!

Repeat !



Jun 29, 2020

On the importance of using Docker 🐳


I have always talked about the importance of adopting agile methodology in analytics projects.

With the fast pace of data acquisition and the immense volume of data points collected every hour in certain domains, being able to iterate and deploy very fast is a vital necessity in today's cutting-edge era.

A fish swallows an Egyptian soldier in a mosaic scene depicting the splitting of the Red Sea, from the Exodus story




I wanted to share some thoughts on a very important set of technologies that are widely used in certain organizations for automating, deploying and running robust and stable data products.


I am quite sure you have heard of Docker! That big blue smiling whale 🐳

Below, I will boil down in simple terms what it is and why I believe it is an important tool in the field of data analytics.

What is Docker ?


Simply put, Docker is an open-source platform built around the container paradigm. It lets developers eliminate the hours of work lost every year to the repetitive chores of setting up environments and installing operating systems and other applications, thereby making them more efficient and giving them precious time that can be spent on higher-value tasks.

It allows an application and its dependencies to be packaged in a virtualized container, so it can be deployed in other environments without the hassle of thinking about compatibility and runtime issues.


Docker makes it possible to launch an application in an isolated environment, which means it can be deployed on premises, in a public or private cloud, and so on.

Docker is not the equivalent of a virtual machine, because it does not require the user to worry about setting up a guest OS and its dependencies. Its technology relies on the Linux kernel and the host system's resources (CPU, memory, storage, etc.) to provide an isolated environment for the application.
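As a small, hedged illustration of that workflow from Python, here is a sketch using the docker SDK (pip install docker); the image name, tag and command are invented for the example, and a Dockerfile is assumed to exist in the current directory.

```python
# A sketch of building and running a containerized data app via the docker SDK.
import docker

client = docker.from_env()                         # talks to the local Docker daemon

# Build an image from the local Dockerfile: dependencies are baked into the image,
# not into the host system.
image, build_logs = client.images.build(path=".", tag="my-data-app:latest")

# Run the application in an isolated container and collect its output.
output = client.containers.run("my-data-app:latest", command="python score.py", remove=True)
print(output.decode())
```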



Benefits of using Docker


Several benefits can be gained by using Docker in an enterprise context. It goes without saying that the success of this technology comes as no surprise considering the gains it offers:


Flexibility & Portability :


One of the major pain points for IT professionals working with different sets of applications is the extreme complexity of guaranteeing that applications are smoothly deployed and run in their production environment.

This complexity is doubled when the company maintains different environments for development, testing and production. It is not always possible to have identical systems across all these environments, and sometimes upgrading the Python version on a server might have disastrous consequences for other applications using the older version.

Thanks to its technology, Docker lets organizations worry less about this issue and gives its users a certain flexibility and portability. The core system and its dependencies are not affected by any one application: each Docker container has its own requirements installed inside it.

This makes spinning up a Docker container and deploying it anywhere a piece of cake! You don't need to think about whether the application dev team has made sure the exact system requirements are respected, and it gives the dev team the flexibility to focus only on building working software, without the hassle of prerequisite installation and version compatibility.



Efficiency & Scalability :


It is hard to overstate how much efficiency Docker brings with all its features. Reclaiming the time that would otherwise be spent verifying and installing prerequisites is a major efficiency gain: no more time lost to inadequate configuration, compatibility errors and the other issues we all face.

Docker's isolation feature allows developers to pick and choose their technological stack and use the most suitable version of any software to include in their application, which means that if you believe a certain programming language is more user-friendly than another, or if you work more efficiently with R than with Python, Docker lets you use the tools that are best for you.

And because launching an application with Docker is a matter of a couple of clicks (which can be automated), spinning up a container in a few seconds provides a major scalability feature that can be used when demand is very high and an application needs to scale up or down.


Security:


Lastly and most importantly, security is a lesser-known benefit of Docker. By design, Docker ensures that every application running inside a container is isolated from other applications and containers, even if they are deployed on the same system/server.

Docker ensures that every container is completely segregated and isolated, and from an architecture point of view, the system administrator has complete control over the flow of data and the interactions between the different systems.



Final thoughts:


I believe that data scientists and software engineers are better off focusing on enhancing their code and building efficient, user-friendly applications. Time lost setting up infrastructure and dealing with compatibility issues is, for me, a lost investment.

Using Docker in today's extremely competitive world is a must. It has become some sort of filter criterion for me to evaluate how efficient and competitive a company is, based on their tech stack and whether or not they use this type of technology.

I should mention that container-based solutions are not limited to Docker; there are various other tools that can substitute for it.

It goes without saying that some people have brought up the disadvantages of Docker in certain contexts. I am fully aware that different situations call for an adapted solution or tool.

Docker might have a few drawbacks; however, I believe that for a well-structured organization, taking advantage of this tool while keeping its constraints in mind (and doing something about them) is the answer here.


Feel free to share your thoughts about your experience with Docker in the comments section! If you need any advice on how to use Docker in data science applications, hit me up on Twitter!