Posts

Why computers aren't amazingly smart

Artificial intelligence is the field of computer science that aims to make computers more useful than they already are by making them receptive to their environment, giving them novel ways to store, process and expand a knowledge base, and letting them make informed decisions based on that accumulated knowledge. As you can see from this definition, the field has a variety of sub-fields, or places where one can contribute. Let's begin by talking about what motivates artificial intelligence. How can the task of making decisions in machines (which is mostly inspired by how humans think and act) be understood as a process? The best way to draw a parallel is to keep in mind that the machine is essentially trying to act like a human, albeit without a perfect resemblance. State-of-the-art thinking machines are not even as good as a teenager. In fact, most would not be able to do simple tasks like differentiating a girl from a boy as fast and as accur...

The Web of Content : Why Sitecore Matters

There's this thing about software that goes big: it always has a philosophy at its very core. I believe in a simple equation that defines software: Problem + Philosophy = Technology. Let's look at the problem first. As discussed in my last post, we're trying to model content that is bound to change, often and in somewhat predictable ways, during its development phase; after that is done, we need to focus more on being able to add new content, upgrade the current technology stack, and, while doing that, maintain compatibility and flexibility. These are the prime issues, but yet another issue grows out of them: software inertia. Once written, software can only be burned down. In this respect software is no different from temples, mosques and churches: it's very difficult to convince the end-user to adopt a radical change, and hence, for the most part, useful software must retain flexibility. Now that we're done with the background...

Why Content Management ?

Do you know how many websites there are right now on the internet? I don't see a reason to look that up, because it's a number that changes at a rate that is itself increasing every second. You might quote an approximate number today, and tomorrow you'd be far behind the actual number. The question is not how many websites are made, but why so many websites are made and what it really is that they contain. Well, like everything else on computers, anything that is not instructions for the computer is of use to the viewer, us humans; and that's what we call content, or data. Now you might think: really, you want an uncountable number of websites? What do these websites contain? Well, they contain content. A website that sells shoes contains information about shoes, their pictures, prices, etc., and also some information about the seller. What happened with web programming is that it became really broad, because it had to tackle these millions of websites that are...

Data Types in R

To understand data types in any language, it often helps to find the most elementary data type, because the other data types are usually built on top of it. In R's case, vectors are the most elementary, so let's define a vector first. Vectors - They are used to store ordered data (data where order matters) of the same base type, i.e. you can store 1 and 3.4 in the same vector because they share the base type numeric. There are other base types, like logical (TRUE, FALSE) and character ("a", "b"). If that's confusing, think about coordinates in a plane or in space. The order of the numbers used to specify coordinates matters (i.e. (2,1) is different from (1,2)), and hence we would use a vector to store things like that. Arrays (multi-dimensional) - Arrays are the most common data type in other languages, but not in R. R stores arrays as vectors with two other parameters - the number of dimensions and names f...
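To make the vector/array distinction above concrete, here is a minimal R sketch (the variable names are my own, not from the post) showing how mixing base types coerces a vector, and how an array is just a vector carrying a dimension attribute:

```r
# Vectors are ordered and homogeneous: mixing base types coerces
# everything to the most general type (here, character).
coords <- c(2, 1)                 # numeric vector; c(2, 1) is not c(1, 2)
mixed  <- c(1, "a")
class(mixed)                      # "character"

# An array is stored as a plain vector plus a dimension attribute.
a <- array(1:6, dim = c(2, 3))    # 2 rows, 3 columns
dim(a)                            # 2 3
length(a)                         # 6 - still six elements underneath
```

Dropping the `dim` attribute (e.g. with `as.vector(a)`) recovers the underlying flat vector, which is why vectors count as the elementary type here.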

Data Mining and R/Rattle : First Experiment

Data mining is the activity of extracting useful insights from vast amounts of data based on a certain model. Instead of relying on one particular model or mode of analysis, data mining is context-based and may employ multiple models, and it can work on a variety of data sources, including text, audio, video, images, etc. Data mining is an ongoing process, and scenarios where the development of the data model stagnates are relatively few. This is because the mined insights act as feedback for a better model, and hence the process is generally much like agile software development. The role of a data miner in an organization begins with an understanding of the domain and of the data itself. This step is crucial to defining and refining the model that is the heart of the mining process. A model is constructed and then critiqued by domain experts and data experts. This completes one cycle of data mining, and the cycle continues in the same way every time, using the insights generated from th...

PornHub.com : Terrible Data

Some years ago, Pornhub.com was asked to gather and analyze data for BuzzFeed to answer a seemingly simple yet statistically involved question: "Who watches more porn, the Reds (Republicans) or the Blues (Democrats)?" Aside from the fact that this question was probably not the most efficient use of their time, BuzzFeed relied completely on the prowess of Pornhub's statisticians and declared that the analysis showed the Democrats in the lead by a 13% margin. It turned out that the data mining process was flawed. This led to a lot of seemingly plausible insights that were later shown to be inaccurate. Here are a few of the mistakes they made: The first step was to obtain data about the state-wise distribution of Republicans versus Democrats. You would think this could come from data about the percentage of voters per state. But initially they assumed that just because there was a Democratic victory (or a Republican victo...
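To see why that winner-take-all assumption distorts the numbers, here is a toy calculation in R (the traffic figures and vote shares are invented for illustration, not Pornhub's actual data): a state that leans Democratic 51-49 contributes all of its traffic to the Democratic column under winner-take-all, even though nearly half its viewers voted the other way.

```r
# Invented figures: page views and Democratic vote share for two states.
traffic   <- c(stateA = 100, stateB = 200)
dem_share <- c(stateA = 0.51, stateB = 0.45)

# Winner-take-all: a state's whole audience is credited to its winning party.
winner_take_all_dem <- sum(traffic[dem_share > 0.5])   # 100

# Proportional: weight each state's traffic by its actual vote share.
proportional_dem <- sum(traffic * dem_share)           # 0.51*100 + 0.45*200 = 141
```

With these made-up numbers, the two methods disagree by over 40%, so any "Reds vs. Blues" headline figure depends heavily on which assumption the analysts baked in.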

Knewton : Gravitating towards data-driven education

There are multiple companies that have sprung up in the past decade that use big data to create novel products which might completely change the way we think about education itself. Knewton (probably a famous name spelled wrong) is a company in the data-driven education space that derives insights from students' extended interaction with their course material and adapts the material to the learner's pace, strengths and weaknesses. The platform provides features like tracking a student's day-to-day activity to determine their "most efficient" study time and which materials suit them, and then dynamically generating varied learning plans tailored to the student's specific needs. What is more exciting about this company is that it was started in 2008 and has now grown to involve textbook publishers, universities and software companies. Knewton's CEO, Jose Ferreira, provides an approximate classification of the data-gathering categories that the platform uses. ...