
Showing posts from 2015

The Web of Content: Why Sitecore Matters

There's this thing about software that goes big: it always has a philosophy at its core. I believe in a simple equation that defines software: Problem + Philosophy = Technology. Let's look at the problem first. As discussed in my last post, we're trying to model content that changes often, in somewhat predictable ways, during its development phase; after that, the focus shifts to adding new content, upgrading the current technology stack, and maintaining compatibility and flexibility while doing so. These are the prime issues, but yet another issue grows out of them: the common solutions to this problem of software inertia. Once written, software can only be burned down. In that sense software is no different from temples, mosques and churches: it's very difficult to convince the end user to adopt a radical change, and hence, for the most part, useful software must retain flexibility. Now that we're done with the background...

Why Content Management?

Do you know how many websites there are right now on the internet? I don't see a reason to look that up, because it's a number that changes at an ever-increasing rate. You might quote an approximate number today and tomorrow you'd be far behind the actual one. The question is not how many websites are made, but why so many websites are made and what it is they really contain. Well, like everything else on computers, anything that is not an instruction to the computer is of use to the viewer, us humans, and that's what we call content, or data. Now you might think: really, you want an uncountable number of websites? What do these websites contain? Well, they contain content. A website that sells shoes contains information about shoes, their pictures, prices, etc., and also some information about the seller. What happened with web programming is that it became really broad because it had to tackle these millions of websites that are...

Data Types in R

For understanding data types in any language, it often helps to find the most elementary data type, because other data types are usually built on top of it. In R's case, vectors are the most elementary, so let's define a vector first. Vectors - They are used to store ordered data (data where order matters) of the same base type, i.e. you can store 1 and 3.4 in a vector because they have the same base type: numeric. There are other base types, like logical (TRUE, FALSE) and character ("a", "b"). If that's confusing, think about coordinates in a plane or in space. The order of the numbers used to specify coordinates matters (i.e. (2,1) is different from (1,2)), and hence we would use a vector to store things like that. Arrays (multi-dimensional) - Arrays are the most common data type in other languages, but not in R. R stores arrays as vectors with two other parameters: the number of dimensions and names f...
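To make the vector/array distinction above concrete, here is a minimal R sketch (the values are illustrative only):

```r
# Vectors hold ordered values of a single base type.
v <- c(1, 3.4)
typeof(v)                 # "double" - both elements share the numeric base type

# Mixing base types forces coercion to the more general one (character here).
mixed <- c(1, "a")        # becomes c("1", "a")

# Order matters: the point (2,1) is not the point (1,2).
identical(c(2, 1), c(1, 2))   # FALSE

# An array is just a vector carrying a dim attribute (and optional dimnames).
a <- array(1:6, dim = c(2, 3))
dim(a)                    # 2 3
```

Note how `array()` does not introduce new storage: stripping the `dim` attribute gives back the underlying vector, which is exactly the "vector plus parameters" representation the post describes.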

Data Mining and R/Rattle : First Experiment

Data mining is the activity of harnessing useful insights from vast amounts of data based on a certain model. Instead of relying on one particular model or way of analysis, data mining is context-based: it may employ multiple models and can work on a variety of data sources, including text, audio, video, images, etc. Data mining is an ongoing process, and scenarios where the development of the data model stagnates are relatively few. This is because the mined insights act as feedback for a better model, and hence the process is generally much like agile software development. The role of a data miner in an organization begins with an understanding of the domain and of the data itself. This understanding is crucial to defining and refining the model that is the heart of the mining process. A model is constructed and then critiqued by domain experts and data experts. This completes one cycle of data mining, and the cycle continues in the same way every time, using the insights generated from th...

PornHub.com: Terrible Data

Some years ago Pornhub.com was asked to gather and analyze data for BuzzFeed to answer a seemingly simple yet statistically involved question: "Who watches more porn, the Reds (Republicans) or the Blues (Democrats)?" Aside from the fact that this question was probably not the most efficient use of their time, BuzzFeed relied completely on the prowess of Pornhub's statisticians and declared that the analysis made it evident that the Democrats took the lead by a 13% margin. It turned out that the data mining process was flawed. This led to a lot of seemingly plausible insights which were later shown to be inaccurate. Here are a few mistakes they made: The first step was to obtain data about the state-wise distribution of Republicans versus Democrats. Now you would think this can emerge from data about the percentage of voters per state. But initially they assumed that just because there was a democratic victory (or a republican victo...

Knewton: Gravitating towards data-driven education

There are multiple companies that have sprung up in the past decade which use big data to create novel products that might completely change the way we think about education itself. Knewton (probably a famous name spelled wrong) is a company in the data-driven education space that derives insights from students' extended interaction with their course material and adapts the material to the learner's pace, strengths and weaknesses. The platform provides features like tracking a student's day-to-day activity to determine their "most efficient" study time and which materials suit them, and then dynamically generating varied learning plans tailored to the student's specific needs. What is more exciting about this company is that it was started in 2008 and has now grown to involve textbook publishers, universities and software companies. Knewton's CEO, Jose Ferreira, provides an approximate classification of the data-gathering categories that the platform uses. ...

Data Privacy: How Little We Know

Data privacy has always been a concern, and now, with a growing data-hungry corporate world where everyone wants user-specific data, the privacy of individuals is in grave danger. Here I describe, in two sections, cases covered by the popular media. It wouldn't be far-fetched to assume that there are plenty more that haven't struck the media as newsworthy. The Facebook Moods Experiment I guess everyone came across this article once, or must have heard about it in the news, because guess who it's about? Facebook! Yes, our good old social network that has now become second nature to surfing the internet dented the beliefs of its users (well, certainly some of us) by conducting an experiment that involved manipulating the emotions of 700,000 Facebook users. The experiment was done to study the effects that the news feed had on people's emotions and how they spread through networks. The study was published in the Proceedings of the National Academy of Sciences, USA....

The P vs NP question: Relativization and its importance

1.0: Relativization In this section we discuss the importance of relativization as a technique and how it has opened new directions in complexity theory. We will also see why this otherwise powerful technique is insufficient to resolve the P-NP dilemma, yet provides counter-arguments against other techniques and hence has continued to be useful in complexity theory. 1.1: Introduction Relativization was first introduced by Baker, Gill and Solovay [2.1] and has remained an important technique because it provides arguments against proofs that try to settle the P versus NP question. The general idea is as follows: suppose we can show, for some statement S, that there exists an oracle A such that S fails relative to A in some oracle model. Then any proof that S holds must not relativize in that model, for otherwise the statement would also hold relative to A. If we can also find an oracle relative to which S holds, then no relativizable technique can decide the tr...
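For concreteness, the Baker-Gill-Solovay result the excerpt builds on can be stated in two lines. (The choice of a PSPACE-complete language for the first oracle is the standard construction; the second oracle is built by diagonalization.)

```latex
% Baker-Gill-Solovay (1975): oracles exist giving both answers,
% so no relativizing argument can settle P vs NP.
\exists A \;:\; \mathrm{P}^{A} = \mathrm{NP}^{A}
  \quad\text{(e.g.\ $A$ any PSPACE-complete language),}
\qquad
\exists B \;:\; \mathrm{P}^{B} \neq \mathrm{NP}^{B}.
```

Taking S to be "P = NP" (or its negation) in the argument above, these two oracles together show that any proof resolving P versus NP must use a non-relativizing technique.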

Data Journalism: Do we trust the numbers?

Data journalism can be rephrased as data-aided journalism. The ability to extract information from various sources is at the heart of the big data process called data mining. The current trend in journalism is to rely more on data mining techniques to gather data that allows for meaningful exploitation of facts and insights that would otherwise be difficult to analyze manually. A lot of news stations have started investing in big data over the past few years, but this has led to another problem. How can we trust the analysis performed by the news channels? Not only does the technique of big data create the myth that data analysis serves as conclusive evidence, it also disengages the audience from conducting its own inquiry. I'm not implying here the curation of viewers' minds, rather the false trust generated by statistical analysis in general. Statistical analysis, which is at the heart of big data, is only used to indicate the presence of trends ...

Making Machines: Introduction

Machines are now ubiquitous, but once they were just thoughts in a crazy man's mind. Well, actually, many men, but for our understanding there were three people who set the ball rolling: 1. Alan Turing 2. Alonzo Church 3. Kurt Gödel All of these men were, in their own ways, trying to figure out a mathematical model for a machine. Those readers who lost me here should think of a mathematical model as a way of describing anything around you with pen and paper. It's almost like taking notes, but mathematics is the art of taking notes about your notes, and doing all that until you make sense of what is possible by twisting what you know in ways you couldn't have directly imagined. Math is just a language, and it's not necessary for all of us to think in that language. My blog here is another effort among many to explain what "computation" means and how it is both different from and similar to the word "computer" we use every day now. Think about what yo...