“The pervasive digitalization of our lives has brought exponential growth in datafication; we are all, without exception, “producers of data and relationships” who are drawn ever more directly into the processes of capitalist accumulation.
To give a sense of the quantities and the acceleration involved, with datafication as the central pivot, Martin Hilbert of the Annenberg School for Communication and Journalism has calculated that, whereas in 2000 only 25% of the world's recorded information was in digital format and 75% was held on analog media, by 2013 digitized information was estimated at around 1,200 exabytes, or 98% of the total, leaving only 2% to analog.
So if, until a few years ago, data could be analyzed by an expert or a research group, using conventional machines and algorithms to extract the information useful for profiling, today big data makes those models inadequate and obsolete: for the first time, we possess more data than our tools allow us to manage. Hence the need for Artificial Intelligence techniques such as Machine Learning: whereas in a classical algorithm the machine performs set calculations in a predetermined order, in Machine Learning the algorithm draws its lifeblood from the data, which becomes its raw material. The remarkable power of this approach is that it can be applied to virtually any context.
To do this we need just one thing: a sufficiently large dataset, containing information relevant to our goals, on which to train our algorithms.
PRAGMA ETIMOS, a leader in data refining (Data Economy), has for some time included in its integrated Research and Development activities the study of Sentiment Analysis, a branch of Natural Language Processing, with the aim of determining the level of emotional “positivity” or “negativity” of an unstructured text. Building such a machine learning algorithm required collecting a large quantity of “labeled” texts. The web supplies these through its many open social platforms: publicly available texts, each attached to a rating that already constitutes a sentiment label, so that training can be as accurate as possible. The recognition process then had to be calibrated through precise tokenization (separating the elements of the text), lemmatization (handling inflected forms), and a lookup of each normalized element in the database of entries cataloged as positive or negative; the algorithm then computes the actual polarity by comparing the various scores.”
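The steps named in the passage (tokenization, lemmatization, lookup in a catalog of positive and negative entries, and a final polarity score) can be illustrated with a minimal sketch. It is not the PRAGMA ETIMOS implementation: the `LEMMAS` and `LEXICON` tables, the function names, and the choice to average the matched scores are all assumptions made for illustration, and a real system would rely on far larger resources, possibly derived from the labeled texts described above.

```python
import re

# Hypothetical toy lemma table: maps inflected forms to a base form.
LEMMAS = {
    "loved": "love", "loves": "love",
    "hated": "hate", "hates": "hate",
    "worse": "bad", "worst": "bad",
    "better": "good", "best": "good",
}

# Hypothetical polarity lexicon: base form -> score in [-1, 1].
LEXICON = {
    "love": 1.0, "great": 0.8, "good": 0.5,
    "hate": -1.0, "terrible": -0.8, "bad": -0.5,
}


def tokenize(text):
    """Split a text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lemmatize(token):
    """Return the base form of a token, or the token itself if unknown."""
    return LEMMAS.get(token, token)


def polarity(text):
    """Average the lexicon scores of the lemmas found in the text.

    Returns 0.0 when no token matches the lexicon (neutral by assumption).
    """
    lemmas = (lemmatize(tok) for tok in tokenize(text))
    scores = [LEXICON[lemma] for lemma in lemmas if lemma in LEXICON]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for sample in (
        "I loved it, the service was great",
        "Terrible food, and I hated the wait",
        "The table was by the window",
    ):
        print(f"{polarity(sample):+.2f}  {sample}")
```

Averaging keeps the score within the lexicon's range regardless of text length; summing the scores, or weighting them by a model trained on the rated texts, would be equally plausible ways of "comparing the various scores".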