
Alternative data: clean datasets and why they’re important

Today, more than 80% of data is unstructured. Data is being produced as we speak, from every conversation we have on social media to every piece of content published by news sources. To produce meaningful, actionable insight from data, it is important to know how to work with it in its unstructured form.

Data and data scientists’ work in the financial industry

Investors are looking at everything from satellite images of store parking lots to trending topics on social media, key events, product release dates and more as leading indicators of market movement. For these investors, doing “great research” has shifted from simply reading every available analyst report to methodically combining potential sources of insight to reach objective, statistically based conclusions and make smarter decisions. Investment managers must constantly be on the lookout for data streams that can lead to smarter investments.

Structured and unstructured, traditional and alternative data are the bread and butter of data scientists: professionals who can gather large amounts of data, analyze it, and synthesize the information into actionable signals. It is highly complex work that produces carefully considered output, and with an average yearly salary of $100-150K, every minute of their work counts. The reality is that most data scientists spend most of their time cleaning data. According to several studies, data scientists spend on average 60% of their time cleaning and organizing data.

David Blackwell, head of client analytics at UBS Wealth Management, estimated in a study that as much as 70% of analysts’ time can be spent managing raw data, cleaning it and preparing it for analysis. “This means that only a fraction of analyst work-hours is left to extract insights and guide strategic decisions,” he added. In other words, data scientists who work with different traditional and alternative datasets to find alpha generation potential spend a large share of their time on challenges like data connectivity, data cleaning and varying data quality.

What is involved in data cleaning?

Data cleaning, also called data cleansing, is the process of ensuring that your data is correct, consistent and usable: identifying any errors or corruptions in the data, then correcting or deleting them, or processing them manually as needed to prevent the error from happening again.
Incorrect or inconsistent data can create a number of quality issues that lead to false conclusions. That is why data cleaning is an important step in many data analysis workflows.
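The process above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the record fields (`company`, `date`, `headline`), the sample rows and the function name `clean_records` are all hypothetical. Each row is normalized and checked, then either kept or set aside with a reason so it can be corrected or reviewed manually.

```python
from datetime import date

# Hypothetical raw records, e.g. scraped headlines tagged with a company and a date.
RAW = [
    {"company": " Apple Inc. ", "date": "2019-10-01", "headline": "Apple unveils new product"},
    {"company": "apple inc.", "date": "2019-10-01", "headline": "Apple unveils new product"},  # duplicate once normalized
    {"company": "Tesla", "date": "2019-13-01", "headline": "Tesla deliveries rise"},  # invalid month
    {"company": "", "date": "2019-10-02", "headline": "Markets rally"},  # missing company
]

def clean_records(raw):
    """Return (clean, rejected): normalized, deduplicated rows and rows that failed a check."""
    clean, rejected, seen = [], [], set()
    for row in raw:
        company = row.get("company", "").strip()
        headline = row.get("headline", "").strip()
        # Parse the date; malformed values are set aside rather than guessed.
        try:
            day = date.fromisoformat(row.get("date", ""))
        except ValueError:
            rejected.append((row, "invalid date"))
            continue
        if not company or not headline:
            rejected.append((row, "missing field"))
            continue
        # Case-insensitive key so trivially different copies count as duplicates.
        key = (company.lower(), day, headline.lower())
        if key in seen:
            rejected.append((row, "duplicate"))
            continue
        seen.add(key)
        clean.append({"company": company, "date": day, "headline": headline})
    return clean, rejected

clean, rejected = clean_records(RAW)
for row, reason in rejected:
    print(reason, row)
```

Keeping rejected rows together with a reason, instead of silently dropping them, is what makes the manual-review step possible.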

Data cleaning comes in all shapes and sizes, and no single template handles every situation, but the cleaned data should meet these key criteria:
Accuracy – The degree to which the data is close to the true values.
Completeness – The degree to which all required data is known.
Consistency – The degree to which the data is consistent, within the same data set or across multiple data sets.
Uniformity – The degree to which the data is specified using the same unit of measure.
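To make the four criteria concrete, the sketch below scores a toy price series against each of them. Everything in it is hypothetical: the ticker "XYZ", the two vendor series, the reference ("true") values, the function names and the 0.1% tolerance used to decide whether two prices agree. The metric definitions are one simple way to quantify each criterion, not a standard.

```python
import math

# Hypothetical daily closes for an illustrative ticker "XYZ" from two vendors.
# Each entry is (price, unit); None marks a missing observation.
VENDOR_A = {
    "2019-10-01": (100.0, "USD"),
    "2019-10-02": (101.0, "USD"),
    "2019-10-03": (None, "USD"),
    "2019-10-04": (10200.0, "USc"),  # same price, but quoted in cents
}
VENDOR_B = {
    "2019-10-01": (100.0, "USD"),
    "2019-10-02": (101.5, "USD"),
    "2019-10-03": (102.5, "USD"),
    "2019-10-04": (102.0, "USD"),
}
# Hypothetical trusted reference used as the "true values" for accuracy.
REFERENCE = {"2019-10-01": 100.0, "2019-10-02": 101.0, "2019-10-03": 102.5, "2019-10-04": 102.0}

REL_TOL = 1e-3  # assumption: prices within 0.1% of each other count as equal

def completeness(data):
    """Share of observations that are present (not None)."""
    return sum(p is not None for p, _ in data.values()) / len(data)

def uniformity(data, unit="USD"):
    """Share of observations recorded in the expected unit."""
    return sum(u == unit for _, u in data.values()) / len(data)

def accuracy(data, reference):
    """Share of present observations that match the reference value."""
    present = [(d, p) for d, (p, _) in data.items() if p is not None]
    hits = sum(math.isclose(p, reference[d], rel_tol=REL_TOL) for d, p in present)
    return hits / len(present)

def consistency(a, b):
    """Share of dates where both sources have a value and the values agree."""
    shared = [d for d in a if d in b and a[d][0] is not None and b[d][0] is not None]
    agree = sum(math.isclose(a[d][0], b[d][0], rel_tol=REL_TOL) for d in shared)
    return agree / len(shared)

print("completeness", completeness(VENDOR_A))
print("uniformity", uniformity(VENDOR_A))
print("accuracy", accuracy(VENDOR_A, REFERENCE))
print("consistency", consistency(VENDOR_A, VENDOR_B))
```

Note how a single uniformity problem (one price quoted in cents) also lowers the accuracy and consistency scores, which is why unit normalization is often one of the first fixes applied.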

What can be done to reduce the time data scientists spend cleaning datasets?

1. Outsource the cleaning stage to contractors.
2. Use NLP in the cleaning process to save time.
3. When buying external datasets, make sure the data is structured, clean and linked to tickers.
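Points 2 and 3 both come down to turning messy text into clean, ticker-linked records. The sketch below shows the simplest version of the ticker-linking step: a rule-based company-name normalizer followed by a lookup. It is a stand-in, not NLP in the full sense; a production system would typically use named-entity recognition and a maintained security master. The alias table, the suffix list and the function names `normalize_name` and `link_to_ticker` are hypothetical.

```python
import re

# Hypothetical alias table; a real one would come from a security master or reference-data vendor.
ALIASES_TO_TICKER = {
    "apple": "AAPL",
    "tesla": "TSLA",
    "tesla motors": "TSLA",
    "microsoft": "MSFT",
}

# Common legal-form suffixes stripped before lookup (illustrative, not exhaustive).
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "plc", "llc"}

def normalize_name(name):
    """Lowercase, replace punctuation with spaces and drop trailing legal-form suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

def link_to_ticker(name):
    """Return the ticker for a company name, or None if it cannot be linked."""
    return ALIASES_TO_TICKER.get(normalize_name(name))

for raw_name in ["Apple Inc.", "TESLA MOTORS, INC.", "Microsoft Corp", "Acme Widgets Ltd"]:
    print(raw_name, "->", link_to_ticker(raw_name))
```

Names that cannot be linked return None, so they can be routed to manual review rather than silently dropped.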

To conclude, while hedge funds and asset managers recognize the alpha generation potential of traditional and alternative datasets, they face many challenges, such as data connectivity, data cleaning and varying data quality, which reduce their capacity to backtest new datasets.


 

Using big data and NLP technologies to capture alpha by collecting, structuring, and revealing events from news articles, press releases, and financial social media.

(Views and recommendations given in this section are for research purposes only. Please consult your financial adviser before taking any position in the stock(s) or currencies mentioned.) Neither First to invest nor any of its officers, employees, representatives, agents or independent contractors are, in such capacities, licensed financial advisors, registered investment advisers or registered broker-dealers. First to invest does not provide investment or financial advice or make investment recommendations. Nothing contained in this communication constitutes a solicitation, recommendation, promotion, endorsement or offer by First to invest of any particular security, transaction or investment.

October 13th, 2019 | Case studies, Case study, Uncategorized, Use case