Data Professionals are the workforce specialising primarily in the data science field, i.e. the field that derives meaningful information from large amounts of complex data, or big data. Data science combines different fields of work in statistics and computation to interpret data for decision-making purposes.
01 Prepare documentation to outline data sources, models and algorithms used and developed;
02 Extract data from data sources;
03 Propose new uses for existing data sources and structures;
04 Integrate multiple data sets to build large and complex data sets;
05 Build software to scrub, combine and manage data;
06 Apply data mining techniques to investigate leads, identify patterns and regularities in data;
07 Implement automated processes to produce scale models;
08 Develop complex code, scripts and data pipelines to process structured and unstructured data;
09 Test data system configurations to increase efficiency;
10 Enable searching, data visualisation, and advanced analytics functionality;
11 Monitor data system performance;
12 Facilitate data cleansing and enrichment; and
13 Perform data validation and quality control checks.
What are the skills required?
Skills are the knowledge and abilities that are essential to carry out a task with determined results. Skills can often be divided into domain-general and domain-specific skills.
Hard Skill
Machine Learning
Machine learning is the subfield of computer science that, according to Arthur Samuel, gives "computers the ability to learn without being explicitly programmed." Samuel, an American pioneer in the field of computer gaming and artificial intelligence, coined the term "machine learning" in 1959 while at IBM. Having evolved from the study of pattern recognition and computational learning theory in artificial intelligence, machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms go beyond strictly static program instructions by making data-driven predictions or decisions, building a model from sample inputs.
Hard Skill
Python
Python is a widely used high-level programming language for general-purpose programming, created by Guido van Rossum and first released in 1991.
Hard Skill
SQL
SQL (pronounced "S-Q-L" or "sequel"; Structured Query Language) is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS).
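As a minimal sketch of what querying a relational database looks like, the following uses Python's built-in sqlite3 module with an in-memory database. The table name `sales`, its columns and its rows are hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)

# Aggregate with GROUP BY: total amount per region, sorted by region name.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()
```

Here `totals` holds one row per region, each pairing the region name with its summed amount.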
Hard Skill
Analytics
Ability to deconstruct information into smaller categories in order to draw conclusions
Soft Skill
Communication
The ability to communicate effectively with superiors, colleagues, and staff is essential, no matter what industry you work in.
Hard Skill
Data Analysis
Data analysis, also known as analysis of data or data analytics, is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making.
Hard Skill
Data Visualization
Data visualization or data visualisation is viewed by many disciplines as a modern equivalent of visual communication.
Soft Skill
English
Able to speak English
Hard Skill
Mandarin
Able to speak Mandarin
Hard Skill
Pandas
In computer programming, pandas is a software library written for the Python programming language for data manipulation and analysis.
Soft Skill
Research
Experience performing creative and systematic work to understand a product, market, or customer, either before building a new solution, or to troubleshoot an existing issue
Hard Skill
Analysis
The ability to collect and analyze information, problem-solve, and make decisions
Hard Skill
Data Integration
Data integration involves combining data residing in different sources and providing users with a unified view of them.
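A unified view can be sketched in a few lines, assuming two hypothetical sources (`crm_records` and `billing_records`) that share a `customer_id` key. This is only one simple approach; real integrations usually also handle schema mapping and conflict resolution.

```python
# Two hypothetical sources that describe the same customers.
crm_records = [
    {"customer_id": 1, "name": "Aisha"},
    {"customer_id": 2, "name": "Wei"},
]
billing_records = [
    {"customer_id": 1, "balance": 250.0},
    {"customer_id": 3, "balance": 40.0},
]


def integrate(*sources, key="customer_id"):
    """Merge records from all sources into one view keyed on `key`."""
    unified = {}
    for source in sources:
        for record in source:
            # Records sharing a key are merged; later sources win on conflicts.
            unified.setdefault(record[key], {}).update(record)
    return [unified[k] for k in sorted(unified)]


unified_view = integrate(crm_records, billing_records)
```

Customers that appear in only one source are kept with whatever fields that source provides.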
Hard Skill
Data Transformation
Working experience of Data Transformation, which is the process of converting data from one format or structure into another format or structure.
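For instance, converting CSV text into JSON is a small format-to-format transformation. The sketch below uses only the Python standard library; the column names and sample rows are hypothetical.

```python
import csv
import io
import json

# Hypothetical CSV input; in practice this would come from a file or feed.
csv_text = "id,city,temp_c\n1,Kuala Lumpur,31.5\n2,Penang,29.0\n"


def csv_to_json(text):
    """Convert CSV text to a JSON array, casting the numeric columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "id": int(row["id"]),
            "city": row["city"],
            "temp_c": float(row["temp_c"]),
        })
    return json.dumps(rows)


json_text = csv_to_json(csv_text)
```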
Hard Skill
Data Validation
In computer science, data validation is the process of ensuring that a program operates on clean, correct and useful data.
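A minimal sketch of record-level validation might look like the following. The field names and rules (`id`, `email`, `age`) are assumptions made for illustration; real rules depend on the data set.

```python
def validate_record(record):
    """Return a list of problems found in one record (empty list means valid)."""
    problems = []
    if not isinstance(record.get("id"), int):
        problems.append("id must be an integer")
    email = record.get("email")
    if not isinstance(email, str) or "@" not in email:
        problems.append("email must contain '@'")
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 130:
        problems.append("age must be an integer between 0 and 130")
    return problems
```

Returning the list of problems, rather than raising on the first one, lets a quality-control report show everything wrong with a record at once.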
Hard Skill
Data wrangling
Working experience of Data wrangling, which is the process of transforming and mapping data from one raw data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics.
Soft Skill
Planning
Planning (also called forethought) is the process of thinking about and organizing the activities required to achieve a desired goal.
Hard Skill
Predictive Analytics
Predictive analytics encompasses a variety of statistical techniques from predictive modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future or otherwise unknown events.
Soft Skill
Bahasa Malaysia
Able to speak Bahasa Malaysia
Hard Skill
BigQuery
Working experience of BigQuery, which is a RESTful web service that enables interactive analysis of massively large datasets, working in conjunction with Google Cloud Storage. It is a managed, serverless data warehouse offered as a Platform as a Service (PaaS) that may be used complementarily with MapReduce.
BigQuery provides external access to the Dremel technology, a scalable, interactive ad hoc query system for analysis of read-only nested data.
Hard Skill
Business Analysis
Experience identifying business needs and determining solutions to business problems.
Hard Skill
Data Archiving
Working experience of Data Archiving, which is the process of moving data that is no longer actively used to a separate storage device for long-term retention. Archive data consists of older data that is still important to the organization and may be needed for future reference, as well as data that must be retained for regulatory compliance. Data archives are indexed and have search capabilities so files and parts of files can be easily located and retrieved.
Hard Skill
Data Cleaning
Data cleansing or data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database and refers to identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data.
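The detect-then-correct-or-remove process described above can be sketched as follows. The record layout (`name`, `email`) and the cleaning rules are hypothetical and kept deliberately simple.

```python
def clean_records(records):
    """Normalise names and emails, drop invalid rows, and remove duplicates."""
    cleaned = []
    seen = set()
    for record in records:
        # Correct: trim whitespace and normalise letter case.
        name = (record.get("name") or "").strip().title()
        email = (record.get("email") or "").strip().lower()
        if not name or "@" not in email:
            continue  # remove incomplete or clearly invalid records
        if email in seen:
            continue  # remove duplicates, keyed on the normalised email
        seen.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


raw = [
    {"name": "  aisha rahman ", "email": "Aisha@Example.com"},
    {"name": "Aisha Rahman", "email": "aisha@example.com "},
    {"name": "", "email": "unknown@example.com"},
    {"name": "Wei", "email": None},
]
cleaned = clean_records(raw)
```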
Hard Skill
Data Entry
Working experience of Data Entry. Data entry is the professional term for entering information into a computer or data-recording system using an electronic or mechanical device.
Hard Skill
Data Management
Data management comprises all the disciplines related to managing data as a valuable resource.
Hard Skill
Data Manipulation
Data manipulation is the process of changing data in an effort to make it easier to read or be more organized.
Hard Skill
Data Mining
Data mining is the computing process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.
Hard Skill
Data Modeling
Data modeling in software engineering is the process of creating a data model for an information system by applying certain formal techniques.
Hard Skill
Data Modelling
Data modelling is the process of creating a data model for the data to be stored in a database.
Hard Skill
Data Warehousing
In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence.
Hard Skill
Database Management
Database management refers to the actions a business takes to manipulate and control data to meet necessary conditions throughout the entire data lifecycle.
Soft Skill
Decision Making
In psychology, decision-making is regarded as the cognitive process resulting in the selection of a belief or a course of action among several alternative possibilities.
Hard Skill
Forecasting
Forecasting is the process of making predictions of the future based on past and present data and most commonly by analysis of trends.
Hard Skill
Regression Analysis
Working experience of Regression Analysis, which is a set of statistical processes for estimating the relationships among variables.
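The simplest case, a straight-line fit between two variables by ordinary least squares, can be sketched without any libraries. The function name `fit_line` and the sample data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 1 + 2x.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

In practice a statistics library would also report uncertainty on the estimates, which this sketch omits.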
Hard Skill
Statistical Modelling
A statistical model is a simplified, mathematically formalized way to approximate reality (i.e. what generates your data) and, optionally, to make predictions from this approximation. The statistical model is the mathematical equation that is used.
Hard Skill
.NET
.NET Framework (pronounced dot net) is a software framework developed by Microsoft that runs primarily on Microsoft Windows.
Hard Skill
Apache Oozie
Working experience of Apache Oozie, a workflow scheduler system for managing Apache Hadoop jobs.
Hard Skill
Apache Sqoop
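Working experience of Apache Sqoop, a tool for transferring bulk data between Apache Hadoop and structured datastores such as relational databases.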
Hard Skill
Attribution Analysis
Working experience of Attribution Analysis, which is a performance-evaluation tool used to analyze the ability of portfolio and fund managers. Attribution analysis uncovers the impact of the manager's investment decisions with regard to overall investment policy, asset allocation, security selection and activity.
Hard Skill
Big Data Analytics
Working experience of Big Data Analytics, which refers to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from big data - data sets that are so voluminous and complex that traditional data-processing application software is inadequate to deal with them.
Hard Skill
Business Intelligence Reporting
Business Intelligence (BI) comprises the strategies and technologies used by enterprises for the data analysis of business information.
Hard Skill
Cassandra
Apache Cassandra is a free and open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.
Hard Skill
Change data capture
Working experience of Change data capture. In databases, change data capture (CDC) is a set of software design patterns used to determine (and track) the data that has changed so that action can be taken using the changed data.
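One simple CDC pattern is snapshot comparison: diff the previous and current state of a table to find inserts, updates and deletes. The sketch below assumes snapshots held as dictionaries keyed by primary key. Log-based and trigger-based approaches are also common and are not shown here.

```python
def capture_changes(previous, current):
    """Compare two snapshots keyed by primary key and list what changed."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key, row in previous.items():
        if key not in current:
            changes.append(("delete", key, row))
    return changes


# Hypothetical snapshots of an orders table.
before = {1: {"status": "new"}, 2: {"status": "paid"}}
after = {1: {"status": "shipped"}, 3: {"status": "new"}}
changes = capture_changes(before, after)
```

A downstream job can then act only on the entries in `changes` instead of reprocessing the whole table.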
Hard Skill
Customer Data Integration
Integration of information from multiple applications, created and maintained by organisations, into one consistent and transparent data structure
Hard Skill
Data Acquisition
Data acquisition is the process of sampling signals that measure real world physical conditions and converting the resulting samples into digital numeric values that can be manipulated by a computer.
Hard Skill
Data Architecture
In information technology, data architecture is composed of models, policies, rules or standards that govern which data is collected, and how it is stored, arranged, integrated, and put to use in data systems and in organizations.
Hard Skill
Data Conversion
Data conversion is the conversion of computer data from one format to another.
Hard Skill
Data Engineering
Working experience of Data Engineering, which includes gathering and collecting data, storing it, processing it in batches or in real time, and setting up access to it, e.g. via an API.
Field of Study Required
Field of study consists of a broad area of academic and skills qualifications that come under a similar branch of subject knowledge. In addition, courses offered under each field of study require similar academic entry requirements.
Mathematics
Mathematics is the study of abstract deductive systems. It includes algebra, arithmetic, geometry, real and complex analysis and pure and applied mathematics.