Artificial Intelligence
Articles
eBooks
Interview Questions
Videos
Keras
Articles
Create Model using both Sequential and Functional API in Keras
Deep dive into Keras - Convolution Neural Network (CNN)
Deep Dive into Modules provided by Keras Library
Detailed understanding of Keras Layers
Detailed understanding of Keras applications
Detailed understanding of the Keras Model compilation process
How Keras helps in Deep Learning and Architecture of Keras Library
How to write a simple MLP-based Artificial Neural Network to perform regression prediction
Keras Backend Implementations and overview of Deep Learning
Overview of Deep Learning Library Keras and How to install Keras Library on your machine
Write a simple Long Short-Term Memory (LSTM) based RNN to do sequence analysis
eBooks
Interview Questions
Videos
TensorFlow
Articles
Concept of Agents and Environments in AI
Hidden Layer Perceptron in TensorFlow
Multi-layer Perceptron in TensorFlow
Concept of Fuzzy Logic Systems
Deep Dive into TensorFlow Playground
Difference between TensorFlow and Keras
Difference between TensorFlow and PyTorch
Difference between TensorFlow and Theano
How to Install TensorFlow Through pip in Windows
Idea of Intelligence and components of Intelligence
Implementation of Neural Network in TensorFlow
Linear Regression in TensorFlow
Machine Learning and Deep Learning
How to Install TensorFlow through Anaconda
Introduction of Convolutional Neural Network in TensorFlow
Long short-term memory (LSTM) RNN in TensorFlow
What are Artificial Neural Networks?
Working of Convolutional Neural Network
Advantages and Disadvantages of TensorFlow
Architecture of TensorFlow explained
AI - Popular Search Algorithms
Artificial Intelligence - Research Areas
Artificial Neural Network in TensorFlow
CIFAR-10 and CIFAR-100 Dataset in TensorFlow
Classification of Neural Network in TensorFlow
TensorFlow Single and Multiple GPU
TensorFlow Security and TensorFlow Vs Caffe
Style Transferring in TensorFlow
Single Layer Perceptron in TensorFlow
Robotics in Artificial Intelligence
Recurrent Neural Network (RNN) in TensorFlow
eBooks
Interview Questions
Videos
Data Science Introduction and How to set up python
How to use Hibernate Query Language
Handling Arrays and Strings in PHP
Cookies and Sessions Handling in PHP
Model Evaluation and Model Prediction in Keras
How to create our own Customized Layer in Keras Library
Overview of Artificial Intelligence and its Application
Natural Language Processing in AI
Top Artificial Intelligence Interview Questions and Answers
Top Oracle DBA Interview Questions and Answers
Basics of Splunk and Installation of Splunk Environment
Microsoft Azure Solutions Architect Certification Exam Questions (AZ-300 & AZ-301)
Best Approach for Storing data to AWS DynamoDB and S3 – AWS Implementation
Maintain High Availability in AWS with anticipated Additional Load
BI and Visualization
Articles
eBooks
Videos
Cognos Analytics
Articles
Perform Report Operations in IBM Cognos
Introduction to IBM Cognos and its Components and Services
How to open, create, save, run and print report in Cognos
How to open, create and save Analysis in Analysis Studio in Cognos
How to create report in Report Studio
How to create List and CrossTab Report in Cognos
How to create a package using Cognos
Filters and Custom Calculations in Cognos
Data Warehouse Schemas, ETL and Reporting Tools
Cognos Studios and other capabilities
eBooks
Videos
Cognos - Relationships in Metadata Model
Cognos TM1
eBooks
Interview Questions
Videos
Microsoft Excel
Articles
How to Merge & Wrap Cells, Borders and Shades and Apply Formatting in Excel
BackStage View and Explore Window in Excel
Creating Formulas, Copying Formulas in Excel
Data Sorting and Using Ranges in Excel
Data Tables and Pivot Tables in Excel
Excel Fill Handle and Excel If Function
Freeze Panes and Conditional Format in Excel
Header and Footer, Page Break and Set Background in Excel
How to add Graphics and Perform Cross-Referencing in Excel
How to Create and Copy Worksheet in Excel
How to Create Worksheets in Excel
How to enter values and move around in Excel
How to Insert Comments and Add Text Box in Excel
How to Open, Close, Delete and Hide Worksheet in Excel
How to Perform Copy & Paste, Find & Replace in Excel
Using Styles, Themes and Templates in Excel
Using Functions and Built-in Functions in Excel
Translate Worksheet and workbook Security in Excel
Simple Charts and Pivot Charts in Excel
Sheet Options, Adjust Margins and Page Orientation in Excel
Printing Worksheets and Email workbooks in Excel
Perform Spell Check, Zoom In-Out and Use Special Symbols in Excel
How to use COUNT, COUNTIF, and COUNTIFS Function and Advanced If in Excel
How to Undo Changes, Setting Cell and Fonts, Text Decoration in Excel
How to Select, Insert, Delete and Move Data in Excel
How to Rotate Cells, Setting Colors and Text Alignment in Excel
eBooks
Interview Questions
Videos
OBIEE
Articles
Concept of Testing Repository in OBIEE
Understanding Schemas in OBIEE
Overview of Oracle Business Intelligence Enterprise Edition (OBIEE)
Multiple Logical Table Sources, Calculation Measures and Dimension Hierarchies
Level-Based Measures and Aggregates in OBIEE
Deep Dive into Repositories in OBIEE
eBooks
Interview Questions
Videos
Pentaho
Articles
User interfaces available in Pentaho and their navigation
Overview of Pentaho and How to install Pentaho on your system
How to use the Pentaho Reporting Designer
How to use Grouping in Pentaho
How to use Functions in Reports in Pentaho
How to create Chart Report in Pentaho
eBooks
Interview Questions
Videos
Power BI
Articles
Visualization Options in Power BI
Power BI Data Sources and How to connect with them
Power BI - Supported Data Sources
Power BI - Comparison with Other BI Tools
Overview of Power BI Embedded, Power BI Gateway and Power BI Report Server
Overview of Business Intelligence (BI) and Power BI
How to use various DAX functions in Power BI
How to Share Power BI Dashboard
How to Integrate Excel in Power BI
How to Download and Install Power BI Desktop
eBooks
Interview Questions
Videos
QlikView
Articles
List Box and Multi Box in QlikView
Navigation Options in QlikView
Overview of Data files (QVD) in QlikView
Processing Web Files in QlikView
Resident Load, Preceding Load and Incremental Load in QlikView
How to create Cross Tables in QlikView
How to create Pie Chart in QlikView
Straight Tables and Pivot Tables in QlikView
Database Connection in QlikView
Dimensions and Measures in QlikView
Usage of Keep Command in QlikView
Using Peek and RangeSum Function in QlikView
Handling Delimited Files in QlikView
Handling Excel Files in QlikView
How to create Bar Chart in QlikView
Using Match and Rank Function in QlikView
Overview of QlikView and How to install QlikView on your machine
Inline Data and Scripting in QlikView
Data Transformation in QlikView
Creating Dashboard in QlikView
Concept of Star Schema and Synthetic Key in QlikView
Concatenation and Master Calendar in QlikView
Column Manipulation in QlikView
eBooks
Interview Questions
Videos
Circular Reference in QlikView
Qlik Sense
Articles
Navigating in Qlik Sense Selections
Qlik Sense Conditional Functions
Qlik Sense Counter and Exponential and Logarithmic Functions
Qlik Sense Developer: Roles and Responsibilities
Overview of Gauge Chart in Qlik Sense
Qlik Sense Advantages and Limitations
Qlik Sense Architecture Components
Qlik Sense Capabilities for people, Groups and Organizations
What is Qlik Sense Pivot Table?
Ways of Qlik Sense Collaboration
Qlik Sense Formatting Functions
Qlik Sense Distribution, Trigonometric and Hyperbolic Functions
Qlik Sense Mapping and Logical Functions
Qlik Sense Financial Functions
Types of Qlik Sense Aggregation Functions
Tableau vs Qlik Sense vs Power BI
Significance of Text and Image in Qlik Sense
Set Analysis and Set Expressions in Qlik Sense
QlikView Vs Qlik Sense: Overview
Using a Scatter Plot in Qlik Sense
Types of Operators in Qlik Sense
Qlik Sense System Requirements
Treemap Visualization in Qlik Sense
Qlik Sense Interpretation Functions
Modulo Functions in Qlik Sense
Key Performance Indicators (KPI) in Qlik Sense
Introduction to Qlik Sense Mashup
How to Manage Content and Resources in Qlik Management Console
How to Interact With Qlik Sense Visualizations?
How to Interact with Qlik Sense interface
How to create Qlik Sense Application
General Numeric Functions in Qlik Sense
Concept of Social Engineering Attacks and Cross-Site Scripting
Components of Qlik Sense Desktop
eBooks
Interview Questions
Videos
SSAS
eBooks
Interview Questions
Videos
SSIS
eBooks
Interview Questions
Videos
SSRS
eBooks
Interview Questions
Videos
Tableau Desktop
Articles
How to create Pareto Chart in Tableau
How to create Gantt Chart in Tableau
How to create Dual Axis Chart, Box Plot and Heat Map in Tableau
How to create Crosstab and Motion Chart in Tableau
How to create Bump and Bubble Chart in Tableau
How to create Bar, Line and Pie Chart in Tableau
How to Build Hierarchy and Groups in Tableau
Filter Operations and Extract Filters in Tableau
Different Tools of Tableau and Tableau Architecture
Understanding Tableau Navigation and Data Terminology
Understanding Tableau Desktop Workspace
Data Window, Data Types, Data Aggregation and File Types in Tableau
Top 10 Data Visualization Tools
Tableau Quick and Context Filters
Perform Table Calculations in Tableau
Perform Data Sorting in Tableau
Perform Calculation and Operators and Functions in Tableau
Overview of Tableau and Data Visualization
How to perform Numeric, String and Date Calculations in Tableau
How to Join Data in Tableau using multiple sources
Condition Filters, Data Source and Top Filters in Tableau
Comparison of Tableau and Power BI
How to install Tableau on your system
How to create Waterfall, Bullet and Area Chart in Tableau
eBooks
Interview Questions
Videos
Tableau Server
TIBCO BW
eBooks
Videos
How to Clone Repository in Git
Overview of SSRS and its Architecture
Overview of SSIS and why SSIS is required
Introduction to TIBCO Business Works (TIBCO BW)
Overview of Tableau Server and How to install it
Overview of SSAS and its Architecture
Setting Up Distributed Servers in Tableau Server
Aggregate Functions in QlikView
Concept of Data Warehouse and Dimension Modelling
Business and Presentation Layer of OBIEE explained
BI Tools for giant Data Visualization
Aggregation Functions in Qlik Sense
Introduction to Cognos TM1 Perspective
How to Setup TM1 Application Server
How to Configure Security in TM1
Concept of Dimensions in Cognos TM1
Cognos TM1 Installation and Configuration
How to create Tree Maps and Heat Maps in Tableau
How to create Scatter Plot and Histogram Chart in Tableau
How to add Page Footer Fields in Pentaho
Formatting Report Elements in Pentaho Reporting Designer
How to Perform Data Validation and Data Filtering in Excel
How to create Power BI Dashboard and Reports
Top Qlik Sense Interview Questions and Answers
Top Microsoft BI Interview Questions and Answers
Top TIBCO Spotfire Interview Questions and Answers
Top OBIEE Interview Questions and Answers
Top Tableau Desktop Interview Questions and Answers
Top Tableau Server Interview Questions and Answers
Top QlikView Interview Questions and Answers
Top TIBCO Business Works Interview Questions and Answers
Top Oracle Hyperion Interview Questions and Answers
Top Power BI Interview Questions and Answers
Top Pentaho Interview Questions and Answers
Top Cognos TM1 Interview Questions and Answers
Top IBM DataStage Interview Questions and Answers
Top IBM Cognos Analytics Interview Questions and Answers
Big Data
eBooks
Videos
Apache Cassandra
Articles
Deep dive into Cassandra Query Language Collections and user defined data types.
Deep dive into Cassandra Shell Commands
How to Create and Alter Tables in Apache Cassandra
How to Create and Drop Indexes in Apache Cassandra
How to create, alter and drop Keyspaces in Cassandra
How to Drop and Truncate Tables in Apache Cassandra
How to set up Both cqlsh and Java environments to work with Cassandra
How to Perform CRUD (Create, Read, Update and Delete) Operations in Table in Apache Cassandra
Introduction to Apache Cassandra, History and Architecture
Overview of How Cassandra Stores its data
Overview of important class in Cassandra and introduction of Cassandra query shell language
eBooks
Interview Questions
Videos
Apache NiFi
Articles
How to Monitor System statistics using Apache NiFi
Concept of Logging in Apache NiFi
Basic Concepts of Apache NiFi and its Installation
Deep Dive into Apache NiFi – Flow Files, Queues, Process Groups and Labels
Deep dive into Apache NiFi Processors
Detailed understanding of Apache NiFi Templates
How to Administer Apache NiFi and Create Flows in Apache NiFi
Understanding Apache NiFi APIs with request and response example
Understanding Apache NiFi Processors Categorization and its relationship
Introduction to Apache NiFi, its History, Features and Architecture
eBooks
Interview Questions
Videos
Apache Oozie
eBooks
Interview Questions
Videos
Apache Pig
Articles
Explanation of Apache Pig Group and Cogroup Operators
Detailed Study of Architecture of Apache Pig
Deep Dive into Pig Latin Diagnostic Operators
Deep Dive into Apache Pig Functions: Load & Store, Bag & Tuple, String, Date-time, Math
Apache Pig Basics, Features and Comparison with MapReduce, Hive & SQL and History of Apache Pig
Explanation of Shell and Utility Commands provided by Apache Grunt Shell
How to Install Apache Pig and Configure Pig
How to Load data to Apache Pig from Hadoop File System
How to run Apache Pig Scripts in Batch Mode
How to Store data in Apache Pig using Store Operator
How to use Cross Operator and Union Operator in Pig Latin
How to use Split and Filter Operator in Apache Pig Latin
How to use the Join Operators in Pig Latin
How to use Distinct, For Each, Order By, Limit Operators and Eval Functions in Apache Pig
eBooks
Interview Questions
Videos
Apache Spark
Articles
Overview of Scala programming language and How to install Scala on your system
How to Install Apache Spark on your system
How to perform pattern matching in Scala and use of Regex expressions
How to use Functions in Scala programming Language
How to use Collections in Scala
How to use Arrays in Scala Programming Language
How to perform Exception Handling in Scala Language
How to Deploy Spark Application on Cluster
Extractor Object in Scala and how to perform pattern matching using extractors
Details of Data Types and Basic Literals in Scala
Detailed understanding of Operators in Scala Language
Deep dive into File Handling in Scala
Deep dive into Advanced programming in Spark
Basics of Scala Programming Language
Concept of String Manipulation in Scala
Conditional statements and Loop control structures in Scala
Concept of Resilient Distributed Datasets (RDD) in Apache Spark
How to use Classes and Objects in Scala programming
Overview of Apache Spark Framework
Spark Core and implementation of RDD transformations and actions in RDD programming
eBooks
Interview Questions
Videos
Detailed understanding of Scala Access Modifiers
Apache Sqoop
Articles
How to use the Apache Sqoop Eval and Codegen tool
How to list out the databases and tables of a particular database using Sqoop
How to import data and tables from MySQL to Hadoop HDFS
How to export data back from Hadoop HDFS to RDBMS and Create and maintain the Sqoop jobs
Introduction, Installation and Configuration of Apache Sqoop
eBooks
Interview Questions
Videos
Apache Storm
Articles
Application of Apache Storm Framework in Yahoo Finance
Concept of Cluster Architecture in Apache Storm
Introduction to Apache Storm and Core Concepts of Apache Storm
How to Install Apache Storm framework on your machine
How to implement Mobile Call log Analyzer using Apache Storm
How Apache Storm is used in Twitter
eBooks
Interview Questions
Videos
Detailed understanding of Workflow of Apache Storm
Hadoop and MapReduce
Articles
Concept of Combiners in Hadoop MapReduce
Concept of MapReduce in BigData
Detailed understanding of Hadoop Architecture and Hadoop Distributed File System (HDFS)
Concept of Partitioner in MapReduce and its implementation using example
Deep dive into Hadoop administration
Deep dive into the MapReduce API
Detailed understanding of Hadoop Distributed File System (HDFS)
Phases of MapReduce Data flow and detailed understanding of Mapreduce API
Overview of YARN and its components and benefits of YARN
Overview of Big Data and Hadoop, Big Data technologies
Implementation of Word Count program using Hadoop MapReduce
Operation of MapReduce in Hadoop framework using Java
Implementation of Character Count program using Hadoop MapReduce
How to set up Hadoop Multi-Node Cluster on a distributed environment
How to perform operations in Hadoop and commands used in Hadoop
How to install Hadoop on your system
eBooks
Videos
How to install Hadoop Framework on your system
How the MapReduce Algorithm works using example
HBase
Articles
Deep dive into HBase architecture
Deep dive into Java Client API for HBase and its associated classes
How to create and List Table in HBase shell
How to create data in an HBase table
How to delete data in Table in HBase
How to enable and disable a Table using HBase shell
How to install HBase and configure on your system
How to make changes to an existing Table and describe it in HBase
How to read data from Table in HBase
How to start HBase interactive shell and how HBase general commands work
How to Stop HBase using Java API
How to update data in Table using HBase Shell
How to verify the existence of a Table and How to Drop a Table in HBase
Overview of HBase, its Advantages, Features and history
Deep dive into HBase Scan, Count and Truncate command and how to achieve security in HBase
eBooks
Interview Questions
Videos
Top Apache HBase Interview Questions and Answers
Hive and Impala
Articles
Concept of Partitioning of table in Hive
Detailed understanding of built-in functions available in Hive
Different Data Types in Hive which are involved in creation of table.
How to Alter the attributes of a table and delete a Table in Hive
How to create a table in Hive and how to insert data into it
How to create and drop a database in Hive
How to create and manage Views and Create and Drop an index in Hive
How to install Hive on your system
How to perform Join operations in Hive Query Language (HQL)
How to use the select statement in Hive Query Language
Introduction to Impala, its features, advantages and disadvantages
How to start Impala Shell and the various options of the shell
How to select a database using Command and select database using Hue Browser in Impala
How to perform changes on a given table and how to delete table in Impala
How to fetch the data from one or more tables in a database and fetch description in Impala
How to download, install and set up Impala in your system
How to create a table in the required database in Impala
How to Create, Alter and Drop a View in Impala
Explanation of Union Clause, With Clause and Distinct Operator in Impala
Explanation of Limit Clause and Offset Clause in Impala
Data Types in Impala Query Language
Detailed understanding of Architecture of Impala
Explanation of Order by Clause, Group by Clause and Having Clause in Impala
How to add new records into an existing table in a database using INSERT in Impala
How to create a database in Impala
eBooks
Videos
How to drop a database in Impala
Top Apache Impala Interview Questions and Answers
Top Apache Hive Interview Questions and Answers
MongoDB
Articles
Advanced Indexing in MongoDB and Limitation of Indexing in MongoDB
Concept of Capped Collections and Auto-Increment Sequence in MongoDB
Concept of Map Reduce in MongoDB
Concept of Relationships and Database References in MongoDB
Concept of Sharding process and How to create a backup in MongoDB
Data Modelling in MongoDB and How to create and Drop database in MongoDB
Deep dive into Covered Queries in MongoDB and Analyzing queries
Deep dive into Replication process in MongoDB
How to Create and Drop a collection using MongoDB
How to Insert, Update, Delete and Query Document in MongoDB Collection
How to Install MongoDB on your system
How to limit records using MongoDB and use projection in MongoDB
How to Set up MongoDB JDBC driver
How to sort records in MongoDB and concept of Indexing and Aggregation in MongoDB
How to use Regex Expressions and Text Search in MongoDB
MongoDB Administration using RockMongo and concept of GridFS in MongoDB
Overview of MongoDB, its history and purpose of building MongoDB
Understand NoSQL Databases and MongoDB advantages over Relational DBMS
eBooks
Interview Questions
Videos
Splunk
Articles
Deep dive into Splunk Search processing Language (SPL)
How to perform Basic Search in Splunk
How to perform searching using fields in Splunk
How to perform Time Range search in Splunk
How to share and export the search result in Splunk
A Deep Dive into Splunk Web Interface
eBooks
Videos
Top Splunk SIEM Interview Questions and Answers
Top Splunk Interview Questions and Answers
Deep dive into Cassandra Query Language Collections and user defined data types.
Deep dive into Cassandra Shell Commands
How to Create and Alter Tables in Apache Cassandra
How to Create and Drop Indexes in Apache Cassandra
How to create, alter and drop Keyspaces in Cassandra
How to Drop and Truncate Tables in Apache Cassandra
How to set up Both cqlsh and Java environments to work with Cassandra
How to Perform CRUD ( Create , Read , Update and Delete ) Operations in Table in Apache Cassandra
Introduction to Apache Cassandra, History and Architecture
Overview of How Cassandra Stores its data
Overview of important class in Cassandra and introduction of Cassandra query shell language
User Defined Functions in Apache Pig Latin
How to use Variables in Scala with the help of example
Deep Dive into Trident – an extension of Apache Storm
Top Splunk SIEM Interview Questions and Answers
Top Big Data Hadoop Interview Questions and Answers
Top MongoDB Interview Questions and Answers
Top Scala Interview Questions and Answers
Top Splunk Interview Questions and Answers
Top Hadoop Administration Interview Questions and Answers
Top Apache Sqoop Interview Questions and Answers
Top Apache NiFi Interview Questions and Answers
Top Apache Impala Interview Questions and Answers
Top Apache HBase Interview Questions and Answers
Top Apache Flume Interview Questions and Answers
Top Apache Spark Interview Questions and Answers
Top Apache Pig Interview Questions and Answers
Top Apache Cassandra Interview Questions and Answers
Top Apache Hive Interview Questions and Answers
Top Apache Oozie Interview Questions and Answers
Top Apache Storm Interview Questions and Answers
Deep Dive into Apache NiFi User Interface
Deep dive into built-in operators of Hive
Concept of Atomic Operations in MongoDB
Deep Dive into Apache Oozie Workflow
How to Configure Oozie Workflow using Property File
Concept of Coordinators applications using Apache Oozie
Basics of Apache Oozie and Oozie Editors
Deep Dive into Oozie Bundle System and CLI & Extensions
Process of Data Ingestion in Splunk Environment
Basics of Splunk and Installation of Splunk Environment
Blockchain
Articles
Introduction to Ethereum and Smart Contracts
Ethereum - Interacting with Deployed Contract
Ethereum – Attaching Wallet to Ganache Blockchain
Ethereum - Creating Contract Users
Concept of Blockchain Double Spending and Bitcoin Cash
Bitcoin Forks and SegWit and Blockchain Merkle Tree
Comparison between Blockchain and Database
Basic Components of Bitcoin and Blockchain Proof of Work
Ethereum - Solidity for Contract Writing
Ethereum - Ganache for Blockchain
eBooks
Interview Questions
Videos
BlockChain and Ethereum
Articles
eBooks
Interview Questions
Videos
Overview and History of Blockchain
Overview of Bitcoin and Key Concepts of Bitcoin
Top BlockChain Interview Questions and Answers
Top Ethereum Interview Questions and Answers
Best Approach for Storing data to AWS DynamoDB and S3 – AWS Implementation
Maintain High Availability in AWS with anticipated Additional Load
Cloud Computing
Articles
eBooks
Interview Questions
Videos
AWS
Articles
How to Use Amazon Machine Learning
How to use Amazon KCL and set up Amazon EMR
How to Set Up Amazon RDS (Relational Database Service)
How to Configure AWS Direct Connect
How to Configure Amazon Simple Storage Service (S3)
How to Configure Amazon Route 53
How AWS CloudFront Delivers the Content
Amazon Elastic Block Storage (EBS) and Storage Gateway
How to use Simple Workflow Service (SWF) and Amazon WorkMail
Understanding of AWS Management Console
How to Set Up AWS Data Pipeline
eBooks
Videos
How to Create Amazon Workspaces
Top Azure Developer Interview Questions and Answers
Top Amazon Web Services (AWS) Interview Questions and Answers
Azure
Articles
How to configure Azure Cloud Service
How to configure Azure Load Balancer
How to Configure Azure Storage Security
How to create Azure Mobile App
Overview of Microsoft Azure and Cloud Computing
Creating App Service Plan in Azure Portal
Azure Virtual Machines and Compute Service
Azure Virtual Machine Scale Set and Auto Scaling
Azure Table, Queue and Disk Storage
Azure Storage Monitoring and Resource Tool
Azure Storage Building Blocks and Storage Account
Azure Storage account and Blob service configuration
Azure SQL Managed Instance and SQL Stretch Database
Azure SQL Database and its Configuration
Azure Network Service and Azure Virtual Network
Azure Media Service and Database Service
Azure Backup and Virtual Machine Security
Azure Availability Zones and Sets and VNet Connectivity
Azure App Service Monitoring and Azure CDN
eBooks
Interview Questions
Videos
Azure App Service Backup and Security
Azure API Apps and API Management
Top Azure Architect Interview Questions and Answers
Migration of 3-tier e-commerce web application using Amazon web Services (AWS)
Cyber Security
Articles
eBooks
Interview Questions
Videos
Ethical Hacking
Articles
Concept of Enumeration in Ethical Hacking
Concept of Exploitation in Ethical Hacking
Concept of Social Engineering Attacks and Cross-Site Scripting
Concept of SQL Injection Attack
Concept of TCP/IP Hijacking and Trojan Attacks
DDOS Attacks in Ethical Hacking
Ethical Hacking - Fingerprinting
Ethical Hacking - Footprinting
Processes in Ethical Hacking and Reconnaissance
eBooks
Interview Questions
Videos
Frontend Development
Articles
eBooks
Interview Questions
Videos
Angular JS
Articles
Create Angular Application and Angular MVC Architecture
Custom Directives in Angular JS
Dependency Injection in Angular JS
Directives and Filters in Angular JS
Embedding Html Pages within HTML page
Expressions and Controllers in Angular JS
How to create Forms in Angular JS
How to create Single Page Application via multiple views
Internationalization in Angular JS
Services Architecture in Angular JS
Spring Angular CRUD Application
Spring Angular Login & Logout Application
Spring Angular Search Field Application
Tables and HTML DOM in Angular JS
Using Directives and Expressions in Angular JS
eBooks
Interview Questions
Videos
React JS
Articles
Comparison Between AngularJS and ReactJS
How to implement flux pattern in React Applications
How to Animate elements using React
Error Handling using Error Boundaries
Environment Setup for React JS
Component Life Cycle Methods in React JS
Comparison between ReactJS and React Native
Overview of ReactJS and its Features
Overview of React Redux with an example
How to set up Router for an app
Using Refs and Keys in React JS
eBooks
Interview Questions
Videos
How to Setup AngularJS Environment
Understanding ReactJS Components
Top React JS Interview Questions and Answers
IOT
Articles
IoT project of controlling home light using WiFi Node MCU, and Relay module
IoT project of Sonar system using Ultrasonic Sensor HC-SR04 and Arduino device
IoT project of Temperature and Pressure measurement using Pressure sensor BMP180 and Arduino device
IoT (Internet of Things) Project: Google Firebase controlling LED with NodeMCU
IoT link Communication Protocol
IoT Decision Framework and Architecture
IoT in Energy and Biometrics Domain
IoT in Security Camera and Smart Home
IoT in Smart Agriculture and Healthcare Domain
IoT Network Layer and Session Layer Protocols
IoT – Platform and ThingWorx in IoT
IoT Project Google Firebase controlling LED using Android App
IoT Project: Google Firebase using NodeMCU ESP8266
Overview of Internet of Things (IoT)
CISCO Virtualized Packet Zone and Salesforce in IoT
Embedded Devices (System) in (IoT) and IoT Ecosystem
GE Predix Platform and Eclipse IoT
How is IoT transforming businesses and IoT in transportation
eBooks
Interview Questions
Videos
Internet of Things – Contiki and Security Flaws
Internet of Things – Security and Identity Protection
Top Internet of Things (IoT) Interview Questions and Answers
Mobile Development
Articles
eBooks
Interview Questions
Videos
Operating Systems
Articles
eBooks
Interview Questions
Videos
Programming and Frameworks
Articles
Cookies in Laravel based web applications
Encryption and Hashing in Laravel
How to create Blade Templates Layout
How to Create Façade in Laravel
How to perform Redirections and connect to Database
Installation Process of Laravel
Introduction to Laravel and its History
Laravel vs CodeIgniter and Laravel vs Symfony
Laravel vs Django and Laravel vs WordPress
Middleware Mechanism in Laravel
Process of Authentication and Authorization in Laravel
Responses in Laravel web applications
Understanding Release Process in Laravel
How to setup Check/Money Order payment method in Magento 2
Dynamic Content Handling in PHP
eBooks
Interview Questions
Videos
Hibernate and Spring
Articles
How to use Node Package Manager and REPL Terminal
Handling GET and POST Request in NodeJS
Using Sessions and POJO Classes in Hibernate
Transaction Management in Spring
Overview and Architecture of Spring Framework
ORM Overview and Overview of Hibernate
IoC Containers, AOP and JDBC Framework in Spring
Injecting Inner Beans and Collections in Spring
How to use Criteria Queries in Hibernate
How to perform Java Based Configuration in Spring
How to Install Hibernate and its Configuration
eBooks
Interview Questions
Videos
Java
Articles
Variables and Keywords in Java
Transaction Management and Batch Processing in JDBC
StringBuffer and StringBuilder Class in Java
String Vs StringBuffer Vs StringBuilder
Stream API Improvement in Java 9
Static Binding and Dynamic Binding and Final Keyword
Serialization and Reflection in Java
Properties class and Generics in Java
Method Parameter Reflection in Java
Java StringJoiner and ArrayList Vs Vector
Java Queue and Deque Interface
Java Parallel Array Sorting and Type Inference
Java Networking and Socket Programming
Java Nested Interface and Method Overloading and Overriding
Java Method References and Functional Interfaces
Java Garbage Collection and Java Runtime Class
Java forEach loop and Collectors
Java Comments and Naming Conventions
Java 9 Process API Improvement
Java 9 Module System and Control Panel
Java 9 Anonymous Inner Classes Improvement and SafeVarargs Annotation
Introduction to Java and History of Java
Inter-thread communication and Deadlock in Java
How to write the Hello World Java program
How to create Immutable class in Java
Features of Java and C++ Vs Java
Exception Handling with Method Overriding in Java
Difference between JDK, JRE, and JVM
Deep Dive into Threads in Java
Deep Dive into LinkedList in Java
Deep dive into LinkedHashMap and TreeMap
Deep dive into HashSet, LinkedHashSet and TreeSet
Deep Dive into HashMap in Java
Deep Dive into ArrayList in Java
Conditional Statements in Java
Concept of Method Overloading and Method Overriding in Java
Concept of Inheritance and Aggregation in Java
Comparable and Comparator interface in Java
eBooks
Interview Questions
Videos
JSP
eBooks
Interview Questions
Videos
Laravel
Articles
Understanding Release Process in Laravel
Responses in Laravel web applications
Process of Authentication and Authorization in Laravel
Middleware Mechanism in Laravel
Laravel vs Django and Laravel vs WordPress
Laravel vs CodeIgniter and Laravel vs Symfony
Introduction to Laravel and its History
Installation Process of Laravel
How to perform Redirections and connect to Database
How to Create Façade in Laravel
How to create Blade Templates Layout
Encryption and Hashing in Laravel
Cookies in Laravel based web applications
Contracts and CSRF Protection in Laravel
Available Validation Rules of Laravel
eBooks
Interview Questions
Videos
Magento
Articles
Architecture of Magento 2 and Product Overview
How to use the multi language feature of Magento
How to Setup System Theme, Page Title, Layout and New Pages in Magento
How to Setup Shipping Rates and Payment Plans in Magento
How to setup shipping methods in Magento 2
How to Setup Paypal Payment and Google checkout in Magento
How to Setup Newsletter in Magento
How to Setup Google Analytics, YouTube Videos and Facebook Likes in Magento
How to setup Check/Money Order payment method in Magento 2
How to set up Zero Subtotal Checkout payment method in Magento 2
How to set up the tax rules, tax rates, and tax zones in Magento 2
How to set up Purchase Order (PO) payment method in Magento 2
How to set up Order Emails in Magento 2
How to set up multiple websites, stores, and store views in Magento 2
How to Set up Contact, Categories, Products and Inventory in Magento
How to set up Cash on Delivery (COD) payment method in Magento 2
How to set up Bank Transfer payment method in Magento 2
How to set up Authorize.net method in Magento 2
How to Manage Tax Classes in Magento
How to Install Magento on your system
How to install Magento 2 using Composer
How to install Magento 2 on windows
Ways for Site Optimization in Magento
Store Configuration in Magento 2
Search Engine Optimization in Magento 2
Products and their Types in Magento 2
Overview of Magento and its Features
Orders Life Cycle in Magento 2
Basic Configuration in Magento 2
Create and Manage CMS (Content Management System) in Magento 2
How to add the product on Home page in Magento 2
How to configure and Manage the Inventory in Magento 2
How to create Attribute Sets in Magento 2
How to create Product Attributes in Magento 2
How to create Product Category in Magento 2
eBooks
Interview Questions
Videos
NodeJS
Articles
Scaffolding and Middleware in ExpressJS
Overview of expressJS, installation and Request-response model
NodeJS environment setup and Creating First Application
How to scale application in NodeJS and concept of packaging
Event Driven Programming in NodeJS
Cookies Management, Routing and Template Engine in ExpressJS
eBooks
Interview Questions
Videos
PHP
Articles
Variable Types and Constant Types in PHP
Operations in MySQL DB using PHP
Object Oriented Programming in PHP
Login with Facebook and Paypal Integration in PHP
Dynamic Content Handling in PHP
How to Install PHP on your system
How to access information from DB using PHP and AJAX
Error and Exception Handling in PHP
CRUD operations in MySQL DB using PHP
eBooks
Interview Questions
Videos
Python
Articles
Variable Types and Basic Operators in Python
Time Series, Geographical and Graph Data in Python
Sending Email using SMTP in Python
Processing CSV, JSON and XLS Data in Python
MySQL Database Access in Python
Multithreaded Programming in Python
Introduction to Python and Installing Python
How to draw different Charts in Python
Handling Relational and NoSQL Databases in Python
Handling Date and Time in Python
Extension Programming with C in Python
Data Wrangling and Data Aggregations in Python
Data Science Libraries in Python
eBooks
Interview Questions
Videos
Servlet
eBooks
Interview Questions
Videos
Spring Boot
Articles
How to write a Scheduler on the Spring applications and CORS Support
Service Components in Spring Boot
Tracing Micro Service Logs in Spring Boot
How to perform Bootstrapping on a Spring Boot application
How to use Spring Boot JDBC driver connection to connect the database
How to write a unit test case by using Mockito and Web Controller
Spring Boot - Code Structure and Build Systems
Spring Boot - Enabling Swagger2
Spring Boot - Google Cloud Platform
Spring Boot - Rest Controller Unit Test
Spring Boot - Securing Web Applications
Spring Boot - Tomcat Deployment
Spring Boot Architecture and Why Spring Boot is used
Spring Boot Security mechanisms and OAuth2 with JWT
Spring Vs Spring Boot Vs Spring MVC
Application Properties in Spring Boot
How to implement the SMS sending and making voice calls by using Spring Boot with Twilio
Building RESTful Web Services using Spring Boot
Consuming RESTful Web Services by using jQuery AJAX
Create a Web Application in Spring Boot using Thymeleaf
Creating Servlet Filter using Spring Boot
Exception Handling in Spring Boot
File Handling using Spring Boot
How to add the Google OAuth2 Sign-In by using Spring Boot application with Gradle build
How to build a Eureka Server using Spring Boot
How to build an interactive web application by using Spring Boot with Web sockets
How to Build Spring Boot Admin Server and Client
How to Create Applications that consume Restful Web Services
How to Create Spring Cloud Configuration Server
How to Configure Flyway Database in your Spring Boot application
How to create a Docker Image using Maven and Gradle
How to create a Spring Boot Application using Maven and Gradle
How to create Zuul Proxy Server application in Spring Boot
How to implement the Apache Kafka in Spring Boot application
eBooks
Interview Questions
Videos
Calendar and Date and Time in Python
Concept of Callbacks and Streams in NodeJS
Comparison of NodeJS with other programming languages
Environment Setup for Spring Framework
Call by Value and Call by Reference in Java
How to generate Order Report in Magento 2
How to create Product in Magento 2
Handling Arrays and Strings in PHP
Standard Tag Library (JSTL) in JSP
Page Redirecting and Hits Counter and Auto Refresh
Overview of Java Server Pages and its Architecture
How to Access Database with JSP
Servlets - Server HTTP Response
Servlets - Page Redirection and Auto Refresh
Internationalization in Servlets
Handling Date and Time using Servlets
Exception Handling in Servlets
Overview of Servlets and setup of Environment
How to implement the Internationalization in Spring Boot
How to implement the Hystrix in a Spring Boot application
Artisan Console for interaction in Laravel
Application Structure of Laravel
Expression Language (EL) in JSP
Project Management and Methodologies
Articles
eBooks
Interview Questions
Videos
Robotic Process Automation
eBooks
Interview Questions
Videos
RPA-UiPath
Articles
eBooks
Interview Questions
Videos
Working of RPA and its Services
Understanding User Interface Components
UiPath Studio - Workflow Design
RPA Use Cases and Applications
RPA Life Cycle and Implementation
Recording using UiPath in Detail
Keyboard Shortcuts and Customization in UiPath Studio
Key Basics of UiPath and the related concepts
Installation of UiPath on your local system
How to work with Automation Projects in UiPath and their Debugging methods
How to deal and work with variables and arguments in UiPath
Data Scraping and Screen Scraping in UiPath
Comparison of RPA and AI, Test Automation and Traditional Automation
Architecture and Components of RPA
Advantages and drawbacks of RPA
Top Robotic Process Automation (RPA) with UiPath Interview Questions and Answers
Salesforce
Articles
Different Levels of Data Access in Salesforce
Variables & Formulas in Salesforce
Using Records, Fields and Tables in Salesforce
Using Forms and List Controllers in Salesforce
Creating Static Resources in Salesforce
Standard and Custom Objects in Salesforce platform
Overview of Salesforce and its architecture
Master Detail Relationship in Salesforce
Lookup Relationship in Salesforce
How to Import Data in Salesforce
How to Export Data from Salesforce
How to Define Sharing Rules in Salesforce
How to create Visualforce Pages in Salesforce
How to create Reports and Dashboards in Salesforce
Get Started with Salesforce - Environment
How to Create a Role Hierarchy in Salesforce
eBooks
Interview Questions
Videos
Apex Programming
Articles
Classes and Methods in Apex programming language
Concept of Objects and Interfaces in Apex programming language
Database Methods and process of executing the Apex class in Salesforce
Deployment in Salesforce using Sandbox
Enterprise Application Development Example
How to Perform Debugging in Apex
How to perform the various Database Modification Functionalities in Salesforce
How to perform Unit Testing in Apex
Overview of Apex Programming and its environment
Search Functionality using SOSL and SOQL
Understand Batch Processing in Salesforce Apex
Understanding deciding, Loops and Collections in Apex
Understanding Governor Limits in Salesforce Apex
Understanding the info Types and variables in Apex programming language
Understanding the environment for Salesforce Apex development
Understanding the String Manipulation, Arrays and Constants in Apex programming language
eBooks
Interview Questions
Videos
Classes and Methods in Apex programming language
Concept of Objects and Interfaces in Apex programming language
Database Methods and process of executing the Apex class in Salesforce
Deployment in Salesforce using Sandbox
Enterprise Application Development Example
How to Perform Debugging in Apex
How to perform the various Database Modification Functionalities in Salesforce
How to perform Unit Testing in Apex
Overview of Apex Programming and its environment
Search Functionality using SOSL and SOQL
Understand Batch Processing in Salesforce Apex
Understanding deciding, Loops and Collections in Apex
Understanding Governor Limits in Salesforce Apex
Understanding the info Types and variables in Apex programming language
Understanding the environment for Salesforce Apex development
Understanding the String Manipulation, Arrays and Constants in Apex programming language
Different Levels of Data Access in Salesforce
Variables & Formulas in Salesforce
Using Records, Fields and Tables in Salesforce
Using Forms and List Controllers in Salesforce
Creating Static Resources in Salesforce
Standard and Custom Objects in Salesforce platform
Overview of Salesforce and its architecture
Master Detail Relationship in Salesforce
Lookup Relationship in Salesforce
How to Import Data in Salesforce
How to Export Data from Salesforce
How to Define Sharing Rules in Salesforce
How to create Visual force Pages in Salesforce
How to create Reports and Dashboards in Salesforce
Get Started with Salesforce - Environment
How to Create a Role Hierarchy in Salesforce
Classes and Methods in Apex programming language
Concept of Objects and Interfaces in Apex programming language
Database Methods and process of executing the Apex class in Salesforce
Deployment in Salesforce using Sandbox
Enterprise Application Development Example
How to Perform Debugging in Apex
How to perform the various Database Modification Functionalities in Salesforce
How to perform Unit Testing in Apex
Overview of Apex Programming and its environment
Search Functionality using SOSL and SOQL
Understand Batch Processing in Salesforce Apex
Understanding deciding, Loops and Collections in Apex
Understanding Governor Limits in Salesforce Apex
Understanding the info Types and variables in Apex programming language
Understanding the environment for Salesforce Apex development
Understanding the String Manipulation, Arrays and Constants in Apex programming language
Using Formula Fields in Salesforce
SAP
Articles
unv Universe in SAP Business Object
Using Formula Bar and Universe Operations in SAP Universe Designer
Using LOVs and Create, Edit and Save a Universe
How to Display Financial Tables in SAP Simple Finance
Concept of Period Lock Transaction in SAP Simple Finance
Concept of Asset Scrapping in SAP Simple Finance
Create Default Account Assignment in SAP Simple Finance
How to Create a Primary Cost in G-L Account
Asset Accounting in SAP Simple Finance
Concept of Integrated Business Planning and Integration of Simple Finance with other Modules
eBooks
Interview Questions
Videos
SAP Business Object
Articles
Using Filters in SAP BO Analysis
Sheets and Sharing Workspaces in SAP BO Analysis
Perform Conditional Formatting in SAP BO Analysis
Overview of SAP Business Object Analysis
How to create a Workspace in SAP Business Objects
How to Connect to SAP BW in SAP Business Objects
Export Options in SAP BO Analysis
Concept of Sub Analysis in SAP BO
eBooks
Interview Questions
Videos
Using Filters in SAP BO Analysis
Sheets and Sharing Workspaces in SAP BO Analysis
Perform Conditional Formatting in SAP BO Analysis
Overview of SAP Business Object Analysis
How to create a Workspace in SAP Business Objects
How to Connect to SAP BW in SAP Business Objects
Export Options in SAP BO Analysis
Concept of Sub Analysis in SAP BO
Calculations in SAP BO Analysis
SAP Hana
Articles
Alert Monitoring and Logging in SAP Hana
Authentications and Authorization Methods in SAP HANA
DXC Replication Method and CTL Method and MDX provider in SAP Hana
Excel Integration with SAP Hana and Bi 4.0 Connectivity to Hana Views
User Administration & Role Management and Security Overview in SAP Hana
Usage of SQL Script in SAP Hana
SQL Triggers, Synonym and Data Profiling in SAP Hana
SQL Overview and Data Types in SAP Hana
SQL Functions and Operators in SAP Hana
SQL Expressions, Stored Procedures and Sequences in SAP Hana
Packages and Attribute and Analytic View in SAP Hana
Modeling and Schemas in SAP HANA
Log Based and ETL Based Replication in SAP Hana
License Management and Auditing in SAP Hana
Information Modeler and System Monitor in SAP HANA
High Availability and Backup and Recovery in SAP Hana
Export and Import Options in Sap Hana
eBooks
Videos
Alert Monitoring and Logging in SAP Hana
Authentications and Authorization Methods in SAP HANA
DXC Replication Method and CTL Method and MDX provider in SAP Hana
Excel Integration with SAP Hana and Bi 4.0 Connectivity to Hana Views
User Administration & Role Management and Security Overview in SAP Hana
Usage of SQL Script in SAP Hana
SQL Triggers, Synonym and Data Profiling in SAP Hana
SQL Overview and Data Types in SAP Hana
SQL Functions and Operators in SAP Hana
SQL Expressions, Stored Procedures and Sequences in SAP Hana
Packages and Attribute and Analytic View in SAP Hana
Modeling and Schemas in SAP HANA
Log Based and ETL Based Replication in SAP Hana
License Management and Auditing in SAP Hana
Information Modeler and System Monitor in SAP HANA
High Availability and Backup and Recovery in SAP Hana
Export and Import Options in Sap Hana
Data Replication Overview in SAP Hana
Analytic Privileges and Information Composer in SAP Hana
SAP Hana Adminstration
Articles
SAP HANA Admin Studio and System Management
Overview of SAP HANA Administration
SAP HANA License Management and Multitenant DB Container Management
Smart Data Access and Integration with Hadoop
How to Start, Stop and Monitor a HANA System
HANA XS Application Service and Data Provisioning in SAP Hana
Data Compression and Solman Integration in SAP Hana
eBooks
Interview Questions
Videos
SAP HANA Admin Studio and System Management
Overview of SAP HANA Administration
SAP HANA License Management and Multitenant DB Container Management
Smart Data Access and Integration with Hadoop
How to Start, Stop and Monitor a HANA System
HANA XS Application Service and Data Provisioning in SAP Hana
Data Compression and Solman Integration in SAP Hana
SAP Hana Finance
Articles
Profitability Analysis and Management Accounting in SAP Simple Finance
Overview of SAP Hana and SAP Hana Finance
Migration and Manual Reposting of Costs in SAP Simple Finance
How to Display Financial Tables in SAP Simple Finance
Concept of Period Lock Transaction in SAP Simple Finance
Concept of Asset Scrapping in SAP Simple Finance
Create Default Account Assignment in SAP Simple Finance
How to Create a Primary Cost in G-L Account
Ledger Management in SAP Simple Finance
Reporting Options and G/L Accounting in SAP Simple Finance
Universal Journal and Document Number in SAP Simple Finance
SAP Simple Finance Architecture and Deployment Options
Asset Accounting in SAP Simple Finance
Concept of Integrated Business Planning and Integration of Simple Finance with other Modules
eBooks
Interview Questions
Videos
Profitability Analysis and Management Accounting in SAP Simple Finance
Overview of SAP Hana and SAP Hana Finance
Migration and Manual Reposting of Costs in SAP Simple Finance
How to Display Financial Tables in SAP Simple Finance
Concept of Period Lock Transaction in SAP Simple Finance
Concept of Asset Scrapping in SAP Simple Finance
Create Default Account Assignment in SAP Simple Finance
How to Create a Primary Cost in G-L Account
Ledger Management in SAP Simple Finance
Reporting Options and G/L Accounting in SAP Simple Finance
Universal Journal and Document Number in SAP Simple Finance
SAP Simple Finance Architecture and Deployment Options
Asset Accounting in SAP Simple Finance
Concept of Integrated Business Planning and Integration of Simple Finance with other Modules
SAP Hana Logistics
Articles
Supply Chain Planning and Integrated Business Planning in SAP Hana Logistics
Overview of SAP Hana Simple Logistics
MRP Procedures and Key Features in SAP Simple Logistics
MIGO Transactions in SAP Simple Logistics
Manufacturing Process in SAP Simple Logistics
Invoice Management and Operational Procurement in SAP Simple Logistics
How to Manage Business Partner in SAP Simple Logistics
How to Execute MRP Live planning
How to Create Business Partner in SAP HANA Logistics
Fiori UX and Deployment and Procurement Types in SAP Hana Logistics
Execute Discrete Production in SAP Hana Logistics
Contract Management and Perform Procurement Transfer Stock in SAP Hana Logistics
eBooks
Interview Questions
Videos
Supply Chain Planning and Integrated Business Planning in SAP Hana Logistics
Overview of SAP Hana Simple Logistics
MRP Procedures and Key Features in SAP Simple Logistics
MIGO Transactions in SAP Simple Logistics
Manufacturing Process in SAP Simple Logistics
Invoice Management and Operational Procurement in SAP Simple Logistics
How to Manage Business Partner in SAP Simple Logistics
How to Execute MRP Live planning
How to Create Business Partner in SAP HANA Logistics
Fiori UX and Deployment and Procurement Types in SAP Hana Logistics
Execute Discrete Production in SAP Hana Logistics
Contract Management and Perform Procurement Transfer Stock in SAP Hana Logistics
Concept of Simplification Item in SAP Simple Logistics
SAP UDT & IDT
Articles
Building Data Foundation in SAP IDT
Building Query in Query Panel, Publishing in SAP IDT
Business Layer Properties in SAP IDT
Dealing with Published Universes in SAP IDT
Deploying Universe in SAP Universe Designer
Format Editor Overview in SAP IDT
How to create universe in SAP IDT
How to use Table Browser and Derived Tables in SAP Universal Designer
Joins In Data Foundation in SAP IDT
Managing Connections in SAP IDT
Managing Resources in Repository, Qualifiers and Owners
OLAP Data Sources in SAP Universe Designer
Overview of SAP Universe Designer
unv Universe in SAP Business Object
Using Formula Bar and Universe Operations in SAP Universe Designer
Using LOVs and Create, Edit and Save a Universe
Concept of Calculated Measures and Aggregate Awareness
Business Layer View in SAP IDT
eBooks
Interview Questions
Videos
Building Data Foundation in SAP IDT
Building Query in Query Panel, Publishing in SAP IDT
Business Layer Properties in SAP IDT
Dealing with Published Universes in SAP IDT
Deploying Universe in SAP Universe Designer
Format Editor Overview in SAP IDT
How to create universe in SAP IDT
How to use Table Browser and Derived Tables in SAP Universal Designer
Joins In Data Foundation in SAP IDT
Managing Connections in SAP IDT
Managing Resources in Repository, Qualifiers and Owners
OLAP Data Sources in SAP Universe Designer
Overview of SAP Universe Designer
unv Universe in SAP Business Object
Using Formula Bar and Universe Operations in SAP Universe Designer
Using LOVs and Create, Edit and Save a Universe
Concept of Calculated Measures and Aggregate Awareness
Business Layer View in SAP IDT
Sap Webi
Articles
Working with Reports in SAP Webi
Sending Documents in SAP Web Intelligence
Query Filters and Filters Type in SAP Webi
Queries using Bex and Analysis View in SAP Webi
How to use Formulas and Variables in SAP Webi
How to use Breaks, Sorts and Ranking Data in SAP Webi
How to Create SAP Webi documents
How to achieve Conditional Formatting in SAP Webi
eBooks
Interview Questions
Videos
Working with Reports in SAP Webi
Sending Documents in SAP Web Intelligence
Query Filters and Filters Type in SAP Webi
Queries using Bex and Analysis View in SAP Webi
How to use Formulas and Variables in SAP Webi
How to use Breaks, Sorts and Ranking Data in SAP Webi
How to Create SAP Webi documents
How to achieve Conditional Formatting in SAP Webi
SAP HANA Admin Studio and System Management
Overview of SAP HANA Administration
SAP HANA License Management and Multitenant DB Container Management
Smart Data Access and Integration with Hadoop
Building Data Foundation in SAP IDT
Building Query in Query Panel, Publishing in SAP IDT
Business Layer Properties in SAP IDT
Dealing with Published Universes in SAP IDT
Deploying Universe in SAP Universe Designer
Format Editor Overview in SAP IDT
How to create universe in SAP IDT
How to use Table Browser and Derived Tables in SAP Universal Designer
Joins In Data Foundation in SAP IDT
Managing Connections in SAP IDT
Managing Resources in Repository, Qualifiers and Owners
OLAP Data Sources in SAP Universe Designer
Overview of SAP Universe Designer
unv Universe in SAP Business Object
Using Formula Bar and Universe Operations in SAP Universe Designer
Using LOVs and Create, Edit and Save a Universe
Concept of Calculated Measures and Aggregate Awareness
Business Layer View in SAP IDT
Profitability Analysis and Management Accounting in SAP Simple Finance
Overview of SAP Hana and SAP Hana Finance
Migration and Manual Reposting of Costs in SAP Simple Finance
How to Display Financial Tables in SAP Simple Finance
Concept of Period Lock Transaction in SAP Simple Finance
Concept of Asset Scrapping in SAP Simple Finance
Create Default Account Assignment in SAP Simple Finance
How to Create a Primary Cost in G-L Account
Ledger Management in SAP Simple Finance
Reporting Options and G/L Accounting in SAP Simple Finance
Universal Journal and Document Number in SAP Simple Finance
SAP Simple Finance Architecture and Deployment Options
Alert Monitoring and Logging in SAP Hana
Authentications and Authorization Methods in SAP HANA
DXC Replication Method and CTL Method and MDX provider in SAP Hana
Excel Integration with SAP Hana and Bi 4.0 Connectivity to Hana Views
Working with Reports in SAP Webi
Sending Documents in SAP Web Intelligence
Query Filters and Filters Type in SAP Webi
Queries using Bex and Analysis View in SAP Webi
How to use Formulas and Variables in SAP Webi
How to use Breaks, Sorts and Ranking Data in SAP Webi
How to Create SAP Webi documents
How to achieve Conditional Formatting in SAP Webi
Filtering Report Data in SAP Webi
Drill Options in Reports and Sharing Reports in SAP Webi
Supply Chain Planning and Integrated Business Planning in SAP Hana Logistics
Overview of SAP Hana Simple Logistics
MRP Procedures and Key Features in SAP Simple Logistics
MIGO Transactions in SAP Simple Logistics
Manufacturing Process in SAP Simple Logistics
Invoice Management and Operational Procurement in SAP Simple Logistics
How to Manage Business Partner in SAP Simple Logistics
How to Execute MRP Live planning
How to Create Business Partner in SAP HANA Logistics
Fiori UX and Deployment and Procurement Types in SAP Hana Logistics
Execute Discrete Production in SAP Hana Logistics
Contract Management and Perform Procurement Transfer Stock in SAP Hana Logistics
Concept of Simplification Item in SAP Simple Logistics
Analyze Sales Orders in SAP Simple Logistics
User Administration & Role Management and Security Overview in SAP Hana
Usage of SQL Script in SAP Hana
SQL Triggers, Synonym and Data Profiling in SAP Hana
SQL Overview and Data Types in SAP Hana
SQL Functions and Operators in SAP Hana
SQL Expressions, Stored Procedures and Sequences in SAP Hana
Packages and Attribute and Analytic View in SAP Hana
Modeling and Schemas in SAP HANA
Log Based and ETL Based Replication in SAP Hana
License Management and Auditing in SAP Hana
Information Modeler and System Monitor in SAP HANA
High Availability and Backup and Recovery in SAP Hana
Export and Import Options in Sap Hana
Data Replication Overview in SAP Hana
Analytic Privileges and Information Composer in SAP Hana
Using Filters in SAP BO Analysis
Sheets and Sharing Workspaces in SAP BO Analysis
Perform Conditional Formatting in SAP BO Analysis
Overview of SAP Business Object Analysis
How to create a Workspace in SAP Business Objects
Asset Accounting in SAP Simple Finance
Concept of Integrated Business Planning and Integration of Simple Finance with other Modules
How to Connect to SAP BW in SAP Business Objects
Export Options in SAP BO Analysis
Concept of Sub Analysis in SAP BO
Calculations in SAP BO Analysis
Aggregations and Hierarchies in SAP BO Analysis
SAP IDT - Overview and User Interface
Creating Parameters and Schemas in SAP Universe Designer
How to Start, Stop and Monitor a HANA System
HANA XS Application Service and Data Provisioning in SAP Hana
Data Compression and Solman Integration in SAP Hana
Authentication Methods supported by SAP HANA
Auditing Activities in SAP Hana
Top SAP S4 HANA Logistics Interview Questions and Answers
Top SAP S4 HANA Finance Interview Questions and Answers
Top SAP HANA Interview Questions and Answers
Software Testing
Articles
eBooks
Interview Questions
Videos
Selenium WebDriver
Articles
How to run your Selenium Test Scripts on IE Browser
How to run your Selenium Test Scripts on Firefox Browser
Comparison of Selenium vs QTP and Selenium Tool Suite
How to run your Selenium Test Scripts on Safari Browser
Overview of Selenium WebDriver
Overview of Selenium, its features and limitations
Scrolling an internet page in Selenium WebDriver
Selenium IDE- Locating Strategies by Identifier and By Id
Selenium IDE- Locating Strategies by Name, XPath , CSS and DOM
How to run your Selenium Test Scripts on Chrome Browser
How to Handle Alerts in Selenium WebDriver
Selenium WebDriver - Navigation and Web Element Commands
How to handle radio buttons and checkbox in selenium web driver
Selenium WebDriver - Browser Commands
Selenium WebDriver- Locating Strategies and Handling Drop-downs
Comparison between Selenium WebDriver and Selenium RC
Creating Test Cases Manually in Selenium IDE
How to create Login test suit in Selenium IDE
How to create your First Selenium Automation Test Script
Selenium IDE- Commands (Selenese)
Using Assertions in Selenium WebDriver
Overview of Selenium Integrated Development Environment (IDE)
eBooks
Interview Questions
Videos
How to run your Selenium Test Scripts on IE Browser
How to run your Selenium Test Scripts on Firefox Browser
Comparison of Selenium vs QTP and Selenium Tool Suite
How to run your Selenium Test Scripts on Safari Browser
Overview of Selenium WebDriver
Overview of Selenium, its features and limitations
Scrolling an internet page in Selenium WebDriver
Selenium IDE- Locating Strategies by Identifier and By Id
Selenium IDE- Locating Strategies by Name, XPath , CSS and DOM
How to run your Selenium Test Scripts on Chrome Browser
How to Handle Alerts in Selenium WebDriver
Selenium WebDriver - Navigation and Web Element Commands
How to handle radio buttons and checkbox in selenium web driver
Selenium WebDriver - Browser Commands
Selenium WebDriver- Locating Strategies and Handling Drop-downs
Comparison between Selenium WebDriver and Selenium RC
Creating Test Cases Manually in Selenium IDE
How to create Login test suit in Selenium IDE
How to create your First Selenium Automation Test Script
Selenium IDE- Commands (Selenese)
Using Assertions in Selenium WebDriver
Overview of Selenium Integrated Development Environment (IDE)
Selenium with Maven
Articles
Execute Selenium code through Maven and TestNG
How to Configure Selenium using NUnit in Visual Studio
How to Configure Selenium with Visual Studio in C#
How to handle or download dependency Jar using Maven
Write a Selenium test script using C#
Selenium Test Script using NUnit
How to write a Selenium test script using C#
eBooks
Interview Questions
Videos
Execute Selenium code through Maven and TestNG
How to Configure Selenium using NUnit in Visual Studio
How to Configure Selenium with Visual Studio in C#
How to handle or download dependency Jar using Maven
Write a Selenium test script using C#
Selenium Test Script using NUnit
How to write a Selenium test script using C#
Test NG
Articles
How to Run test cases in TestNG without java compiler
Overview of TestNG and its Features
Importance of XML file in TestNG Configuration
How to use TestNG Annotation Attributes
How to Run test cases with Regex in TestNG
How to install TestNG Framework and Configuration in Eclipse
How to enable and disable test cases in TestNG
eBooks
Interview Questions
Videos
How to Run test cases in TestNG without java compiler
Overview of TestNG and its Features
Importance of XML file in TestNG Configuration
How to use TestNG Annotation Attributes
How to Run test cases with Regex in TestNG
How to install TestNG Framework and Configuration in Eclipse
How to enable and disable test cases in TestNG
How to run your Selenium Test Scripts on IE Browser
How to run your Selenium Test Scripts on Firefox Browser
Comparison of Selenium vs QTP and Selenium Tool Suite
How to run your Selenium Test Scripts on Safari Browser
Overview of Selenium WebDriver
Overview of Selenium, its features and limitations
Scrolling an internet page in Selenium WebDriver
Selenium IDE- Locating Strategies by Identifier and By Id
Selenium IDE- Locating Strategies by Name, XPath , CSS and DOM
How to run your Selenium Test Scripts on Chrome Browser
How to Handle Alerts in Selenium WebDriver
Selenium WebDriver - Navigation and Web Element Commands
How to handle radio buttons and checkbox in selenium web driver
Selenium WebDriver - Browser Commands
Selenium WebDriver- Locating Strategies and Handling Drop-downs
Comparison between Selenium WebDriver and Selenium RC
Creating Test Cases Manually in Selenium IDE
How to create Login test suit in Selenium IDE
How to create your First Selenium Automation Test Script
Execute Selenium code through Maven and TestNG
How to Configure Selenium using NUnit in Visual Studio
How to Configure Selenium with Visual Studio in C#
How to handle or download dependency Jar using Maven
Write a Selenium test script using C#
Selenium Test Script using NUnit
How to write a Selenium test script using C#
Write and Execute the Selenium test script
Using Maven with Selenium TestNG
Selenium IDE- Commands (Selenese)
Using Assertions in Selenium WebDriver
Overview of Selenium Integrated Development Environment (IDE)
How to Run test cases in TestNG without java compiler
Overview of TestNG and its Features
Importance of XML file in TestNG Configuration
How to use TestNG Annotation Attributes
How to Run test cases with Regex in TestNG
How to install TestNG Framework and Configuration in Eclipse
How to enable and disable test cases in TestNG
How to create TestNG Listeners
Top Data Science Interview Questions and Answers
Last updated on Feb 18 2022
What is logistic regression in Data Science?
Logistic regression, also called the logit model, is a method for predicting a binary outcome from a linear combination of predictor variables.
Name three types of biases that can occur during sampling
In the sampling process, there are three types of biases, which are:
- Selection bias
- Undercoverage bias
- Survivorship bias
Discuss Decision Tree algorithm
A decision tree is a popular supervised machine learning algorithm. It is mainly used for regression and classification. It breaks a dataset down into smaller and smaller subsets. A decision tree can handle both categorical and numerical data.
How do you build a random forest model?
A random forest is built up of a number of decision trees. If you split the data into different packages and make a decision tree in each of the different groups of data, the random forest brings all those trees together.
Steps to build a random forest model:
- Randomly select ‘k’ features from a total of ‘m’ features where k << m
- Among the ‘k’ features, calculate the node D using the best split point
- Split the node into daughter nodes using the best split
- Repeat steps two and three until leaf nodes are finalized
- Build the forest by repeating steps one to four ‘n’ times to create ‘n’ trees
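The steps above can be sketched in plain Python. This is a deliberately simplified illustration, not a production implementation: each "tree" is a single-split stump, every tree is trained on a bootstrap sample (an assumption, since the steps do not say how the data is divided), and the names `best_stump`, `build_forest`, and `predict` are hypothetical, as is the toy data.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def best_stump(rows, labels, feature_ids):
    """Pick the threshold split with the fewest errors among the given features."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # not a real split
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority class
        m = majority(labels)
        return (0, float("inf"), m, m)
    return best[1:]

def build_forest(rows, labels, n_trees, k, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and k random features."""
    rng = random.Random(seed)
    m = len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feats = rng.sample(range(m), k)                 # k of the m features
        forest.append(best_stump([rows[i] for i in idx],
                                 [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = [l if row[f] <= t else r for f, t, l, r in forest]
    return majority(votes)

# Hypothetical toy data: three features, two classes.
rows = [[1, 2, 1], [2, 1, 2], [3, 3, 3], [7, 8, 7], [8, 7, 9], [9, 9, 8]]
labels = [0, 0, 0, 1, 1, 1]
forest = build_forest(rows, labels, n_trees=25, k=2)
print(predict(forest, [0, 0, 0]), predict(forest, [10, 10, 10]))
```

A full random forest grows each tree to many levels and re-samples the k features at every node; the stump keeps the sketch short while still showing the sampling and voting.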
How can you avoid overfitting your model?
Overfitting refers to a model that is fitted too closely to a small amount of data and ignores the bigger picture. There are three main methods to avoid overfitting:
- Keep the model simple: take fewer variables into account, thereby removing some of the noise in the training data
- Use cross-validation techniques, such as k-fold cross-validation
- Use regularization techniques, such as LASSO, that penalize certain model parameters if they’re likely to cause overfitting
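As a rough illustration of the cross-validation point, the sketch below splits sample indices into k folds so that every sample is used for validation exactly once. The function name `k_fold_indices` is hypothetical, and the folds are taken in order without shuffling to keep the example short.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

folds = list(k_fold_indices(10, 3))
for train, val in folds:
    print("train:", train, "validate:", val)
```

A model would be trained on each `train` split and scored on the matching `val` split; averaging the scores gives an estimate that is less sensitive to one lucky or unlucky split.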
What are the differences between supervised and unsupervised learning?
How is logistic regression done?
Logistic regression measures the relationship between the dependent variable (our label of what we want to predict) and one or more independent variables (our features) by estimating probability using its underlying logistic function (sigmoid).
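A minimal sketch of that idea: a linear combination of the features is passed through the sigmoid to obtain a probability, which is then thresholded to give a class. The weights, bias, and the 0.5 threshold are hypothetical values chosen for illustration; in practice the coefficients are learned from data.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical coefficients, not fitted to any real data.
weights = [0.8, -0.4]
bias = -0.2

p = predict_proba([2.0, 1.0], weights, bias)
label = 1 if p >= 0.5 else 0
print(round(p, 3), label)
```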
The image shown below depicts how logistic regression works:
Explain the steps in making a decision tree.
- Take the entire data set as input
- Calculate entropy of the target variable, as well as the predictor attributes
- Calculate the information gain of all attributes (we gain information by sorting different objects from each other)
- Choose the attribute with the highest information gain as the root node
- Repeat the same procedure on every branch until the decision node of each branch is finalized
For example, let’s say you want to build a decision tree to decide whether you should accept or decline a job offer. The decision tree for this case is as shown:
It is clear from the decision tree that an offer is accepted if:
- Salary is greater than $,
- The commute is less than an hour
- Incentives are offered
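The entropy and information-gain steps can be sketched as follows. The four job offers are hypothetical toy data, and the helper names `entropy` and `information_gain` are introduced only for this example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical toy data set of job offers.
offers = [
    {"salary": "high", "commute": "short", "accept": "yes"},
    {"salary": "high", "commute": "long", "accept": "yes"},
    {"salary": "low", "commute": "short", "accept": "no"},
    {"salary": "low", "commute": "long", "accept": "no"},
]

gains = {a: information_gain(offers, a, "accept") for a in ("salary", "commute")}
root = max(gains, key=gains.get)  # attribute with the highest information gain
print(gains, root)
```

In this toy data the salary attribute separates the outcomes perfectly, so it has the highest gain and becomes the root node.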
Differentiate between univariate, bivariate, and multivariate analysis.
Univariate
Univariate data contains only one variable. The purpose of the univariate analysis is to describe the data and find patterns that exist within it.
Example: height of students
The patterns can be studied by drawing conclusions using mean, median, mode, dispersion or range, minimum, maximum, etc.
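For instance, these summary statistics can be computed with Python's standard `statistics` module. The heights below are made-up values used only to show the calls.

```python
from statistics import mean, median, mode

# Hypothetical heights of five students, in centimetres.
heights = [150, 160, 160, 170, 180]

summary = {
    "mean": mean(heights),
    "median": median(heights),
    "mode": mode(heights),
    "min": min(heights),
    "max": max(heights),
    "range": max(heights) - min(heights),
}
print(summary)
```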
Bivariate
Bivariate data involves two different variables. The analysis of this type of data deals with causes and relationships and the analysis is done to determine the relationship between the two variables.
Example: temperature and ice cream sales in the summer season
Here, the relationship is visible from the table that temperature and sales are directly proportional to each other. The hotter the temperature, the better the sales.
Multivariate
When data involves three or more variables, it is categorized as multivariate. It is similar to bivariate data but contains more than one dependent variable.
Example: data for house price prediction
The patterns can be studied by drawing conclusions using mean, median, mode, dispersion or range, minimum, maximum, etc. You can start describing the data and using it to guess what the price of the house will be.
What are the feature selection methods used to select the right variables?
There are two main methods for feature selection: filter methods and wrapper methods.
Filter Methods
This involves:
- Linear discriminant analysis
- ANOVA
- Chi-Square
The best analogy for selecting features is “bad data in, bad answer out.” When we’re limiting or selecting the features, it’s all about cleaning up the data coming in.
Wrapper Methods
This involves:
- Forward Selection: We test one feature at a time and keep adding them until we get a good fit
- Backward Selection: We test all the features and start removing them to see what works better
- Recursive Feature Elimination: Recursively looks through all the different features and how they pair together
Wrapper methods are very labor-intensive, and high-end computers are needed if a lot of data analysis is performed with the wrapper method.
You are given a data set consisting of variables with more than percent missing values. How will you deal with them?
The following are ways to handle missing data values:
If the data set is large, we can simply remove the rows with missing data values. It is the quickest way; we use the rest of the data to predict the values.
For smaller data sets, we can substitute missing values with the mean of the rest of the data using a pandas DataFrame in Python. There are different ways to do so, such as df.fillna(df.mean()).
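Both strategies can be sketched without any third-party library. The helper names and sample values below are hypothetical, and missing entries are represented as `None`; with pandas, the roughly equivalent calls would be `df.dropna()` and `df.fillna(df.mean())`.

```python
from statistics import mean

def drop_rows_with_missing(rows):
    """Keep only rows that have no missing (None) entries."""
    return [r for r in rows if None not in r]

def fill_missing_with_mean(values):
    """Replace None entries in one column with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to compute a mean from
    m = mean(observed)
    return [m if v is None else v for v in values]

# Hypothetical data.
table = [[1, 2], [None, 3], [4, 5]]
ages = [20, None, 30, 40, None]

kept = drop_rows_with_missing(table)
filled = fill_missing_with_mean(ages)
print(kept, filled)
```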
For the given points, how will you calculate the Euclidean distance in Python?
plot1 = [x1, y1]
plot2 = [x2, y2]
The Euclidean distance can be calculated as follows:
euclidean_distance = sqrt( (plot1[0]-plot2[0])**2 + (plot1[1]-plot2[1])**2 )
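A runnable version of the same calculation is sketched below. The coordinates are hypothetical values picked for illustration, and `math.dist` (available from Python 3.8) is shown as a shortcut that should give the same result.

```python
import math

# Hypothetical points; substitute the coordinates you are given.
plot1 = [1, 3]
plot2 = [2, 5]

# Square root of the sum of squared coordinate differences.
euclidean_distance = math.sqrt(
    (plot1[0] - plot2[0]) ** 2 + (plot1[1] - plot2[1]) ** 2
)

# Built-in helper that works for points of any matching dimension.
same_distance = math.dist(plot1, plot2)

print(euclidean_distance, same_distance)
```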
What is dimensionality reduction, and what are its benefits?
Dimensionality reduction refers to the process of converting a data set with vast dimensions into data with fewer dimensions (fields) to convey similar information concisely.
This reduction helps in compressing data and reducing storage space. It also reduces computation time as fewer dimensions lead to less computing. It removes redundant features; for example, there’s no point in storing a value in two different units (meters and inches).
How will you calculate eigenvalues and eigenvectors of the following 3x3 matrix?
-2 | -4 | 2
-2 | 1 | 2
4 | 2 | 5
The characteristic equation is det(A - λI) = 0.
Expanding determinant:
(-2 - λ) [(1-λ) (5-λ) - 2×2] + 4[(-2) × (5-λ) - 4×2] + 2[(-2) × 2 - 4(1-λ)] = 0
-λ^3 + 4λ^2 + 27λ - 90 = 0,
λ^3 - 4λ^2 - 27λ + 90 = 0
Here we have a cubic equation in λ whose roots are the eigenvalues.
By hit and trial:
3^3 - 4 × 3^2 - 27 × 3 + 90 = 0
Hence, (λ - 3) is a factor:
λ^3 - 4λ^2 - 27λ + 90 = (λ - 3) (λ^2 - λ - 30)
Eigenvalues are 3, -5, 6:
(λ - 3) (λ^2 - λ - 30) = (λ - 3) (λ+5) (λ-6)
Calculate eigenvector for λ = 3
For X = 1,
-5 - 4Y + 2Z = 0,
-2 - 2Y + 2Z = 0
Subtracting the two equations:
3 + 2Y = 0,
Substituting back into the second equation:
Y = -(3/2)
Z = -(1/2)
Similarly, we can calculate the eigenvectors for -5 and 6.
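A hand calculation like this can be checked numerically. The sketch below assumes the matrix is A = [[-2, -4, 2], [-2, 1, 2], [4, 2, 5]] (an assumption made for this example), finds the integer roots of det(A - λI) by brute force, and confirms that A·v = 3v for the candidate eigenvector v = (1, -3/2, -1/2). The helper names are hypothetical.

```python
# Assumed example matrix for this check.
A = [[-2, -4, 2],
     [-2, 1, 2],
     [4, 2, 5]]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def char_det(lam):
    """det(A - lam * I); it is zero exactly when lam is an eigenvalue."""
    shifted = [[A[r][c] - (lam if r == c else 0) for c in range(3)] for r in range(3)]
    return det3(shifted)

def mat_vec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Brute-force search over small integers is enough for this example.
eigenvalues = [lam for lam in range(-10, 11) if char_det(lam) == 0]

v = [1, -1.5, -0.5]  # candidate eigenvector for lambda = 3
Av = mat_vec(A, v)
print(eigenvalues, Av)
```

For matrices whose eigenvalues are not small integers, a numerical routine from a linear algebra library would be the usual choice instead of a brute-force search.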
How should you maintain a deployed model?
The steps to maintain a deployed model are:
Monitor
Constant monitoring of all models is needed to determine their performance accuracy. When you change something, you want to figure out how your changes are going to affect things. This needs to be monitored to ensure it’s doing what it’s supposed to do.
Evaluate
Evaluation metrics of the current model are calculated to determine if a new algorithm is needed.
Compare
The new models are compared to each other to determine which model performs the best.
Rebuild
The best performing model is re-built on the current state of data.
What are recommender systems?
A recommender system predicts how a user would rate a specific product based on their preferences. It can be split into two different areas:
Collaborative Filtering
As an example, Last.fm recommends tracks that other users with similar interests play often. This is also commonly seen on Amazon after making a purchase; customers may notice the following message accompanied by product recommendations: “Users who bought this also bought…”
Content-based Filtering
As an example: Pandora uses the properties of a song to recommend music with similar properties. Here, we look at content, instead of looking at who else is listening to music.
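To illustrate content-based filtering, the sketch below compares items by the cosine similarity of their feature vectors and recommends the closest one. The song names, the three features, and their values are all hypothetical; real systems use far richer item descriptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical song features: [tempo, acousticness, vocals], scaled to 0..1.
songs = {
    "song_a": [0.9, 0.1, 0.8],
    "song_b": [0.8, 0.2, 0.9],
    "song_c": [0.1, 0.9, 0.2],
}

def most_similar(title):
    """Return the other song whose features are closest to the given song."""
    target = songs[title]
    scores = {t: cosine_similarity(target, v) for t, v in songs.items() if t != title}
    return max(scores, key=scores.get)

recommendation = most_similar("song_a")
print(recommendation)
```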
RMSE and MSE are two of the most common measures of accuracy for a linear regression model.
RMSE indicates the Root Mean Square Error.
MSE indicates the Mean Square Error.
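Both measures are easy to compute by hand. In this sketch the actual and predicted values are hypothetical; MSE is the average squared difference between them, and RMSE is its square root, which brings the error back to the units of the target.

```python
import math

def mse(y_true, y_pred):
    """Mean Square Error: average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

# Hypothetical values.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]

print(mse(actual, predicted), rmse(actual, predicted))
```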
How can you select k for k-means?
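We use the elbow method to select k for k-means clustering. The idea of the elbow method is to run k-means clustering on the data set for a range of values of k, where k is the number of clusters, and to compute the within-cluster sum of squares (WSS) for each k.
WSS is defined as the sum of the squared distances between each member of a cluster and its centroid. When WSS is plotted against k, the point where the curve bends and further increases in k bring only small reductions in WSS (the elbow) is chosen as k.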
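The elbow method can be sketched in plain Python as follows. This is a simplified illustration, not a production implementation: the helper names (sq_dist, kmeans_wss), the toy data and the fixed number of iterations are all assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_wss(points, k, iterations=20, seed=0):
    """Run a basic k-means and return the within-cluster sum of squares."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

# Toy data: three well-separated groups (hypothetical values).
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

wss = {k: kmeans_wss(points, k) for k in range(1, 6)}
for k, value in wss.items():
    print(k, round(value, 2))
```

Plotting the printed values against k and looking for the bend in the curve is how the elbow is read off in practice.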
What is the significance of p-value?
p-value typically ≤ 0.05
This indicates strong evidence against the null hypothesis, so you reject the null hypothesis.
p-value typically > 0.05
This indicates weak evidence against the null hypothesis, so you fail to reject the null hypothesis.
p-value at cutoff 0.05
This is considered to be marginal, meaning it could go either way.
How can outlier values be treated?
You can drop outliers only if they are garbage values.
Example: height of an adult = abc ft. This cannot be true, as the height cannot be a string value. In this case, outliers can be removed.
If the outliers have extreme values, they can be removed. For example, if all the data points are clustered between 0 and 10, but one point lies at 100, then we can remove this point.
If you cannot drop outliers, you can try the following:
- Try a different model. Data detected as outliers by linear models can be fit by nonlinear models. Therefore, be sure you are choosing the correct model.
- Try normalizing the data. This way, the extreme data points are pulled to a similar range.
- You can use algorithms that are less affected by outliers; an example would be random forests.
How can time-series data be declared as stationary?
It is stationary when the variance and mean of the series are constant with time.
Here is a visual example:
In the first graph, the variance is constant with time. Here, X is the time factor and Y is the variable. The value of Y goes through the same points all the time; in other words, it is stationary.
In the second graph, the waves get bigger, which means it is non-stationary and the variance is changing with time.
How can you calculate accuracy using a confusion matrix?
Consider this confusion matrix:
You can see the values for total data, actual values, and predicted values.
The formula for accuracy is:
Accuracy = (True Positive + True Negative) / Total Observations
= (262 + 347) / 650
= 609 / 650
= 0.93
As a result, we get an accuracy of 93 percent.
Write the equation and calculate the precision and recall rate.
Consider the same confusion matrix used in the previous question.
Precision = (True positive) / (True Positive + False Positive)
= 262 / 277
= 0.94
Recall Rate = (True Positive) / (True Positive + False Negative)
= 262 / 288
= 0.90
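The three formulas can be put together in a short Python sketch. The function name and the counts below are hypothetical and are not taken from the confusion matrix discussed above.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```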
‘People who bought this also bought…’ recommendations seen on Amazon are a result of which algorithm?
The recommendation engine is accomplished with collaborative filtering. Collaborative filtering exploits the behavior of other users and their purchase history in terms of ratings, selections, etc.
The engine makes predictions on what might interest a person based on the preferences of other users. In this algorithm, item features are unknown.
For example, a sales page shows that a certain number of people buy a new phone and also buy tempered glass at the same time. Next time, when a person buys a phone, he or she may see a recommendation to buy tempered glass as well.
Write a basic SQL query that lists all orders with customer information.
Usually, we have order tables and customer tables that contain the following columns:
Order Table
OrderId
CustomerId
OrderNumber
TotalAmount
Customer Table
Id
FirstName
LastName
City
Country
The SQL query is:
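SELECT OrderNumber, TotalAmount, FirstName, LastName, City, Country
FROM "Order"
JOIN Customer
ON "Order".CustomerId = Customer.Id;
Because ORDER is a reserved word in SQL, the table name has to be quoted (double quotes in standard SQL; some databases use backticks or square brackets instead).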
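The join can be tried out with Python's built-in sqlite3 module. The sketch below assumes the column layout listed above; the sample rows are invented for the example, and the table name is quoted because ORDER is a reserved word.

```python
import sqlite3

# In-memory database with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (Id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT,
                       City TEXT, Country TEXT);
CREATE TABLE "Order" (OrderId INTEGER PRIMARY KEY, CustomerId INTEGER,
                      OrderNumber TEXT, TotalAmount REAL);
INSERT INTO Customer VALUES (1, 'Ada', 'Lovelace', 'London', 'UK');
INSERT INTO Customer VALUES (2, 'Alan', 'Turing', 'Manchester', 'UK');
INSERT INTO "Order" VALUES (10, 1, 'A-100', 250.0);
INSERT INTO "Order" VALUES (11, 2, 'A-101', 99.5);
INSERT INTO "Order" VALUES (12, 1, 'A-102', 40.0);
""")

# Every order together with the customer who placed it.
rows = conn.execute("""
SELECT OrderNumber, TotalAmount, FirstName, LastName, City, Country
FROM "Order"
JOIN Customer ON "Order".CustomerId = Customer.Id
ORDER BY OrderNumber
""").fetchall()

for row in rows:
    print(row)
```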
You are given a dataset on cancer detection. You have built a classification model and achieved an accuracy of 96 percent. Why shouldn’t you be happy with your model performance? What can you do about it?
Cancer detection results in imbalanced data. In an imbalanced dataset, accuracy should not be used as a measure of performance. It is important to focus on the remaining four percent, which represents the patients who were wrongly diagnosed. Early diagnosis is crucial when it comes to cancer detection, and can greatly improve a patient’s prognosis.
Hence, to evaluate model performance, we should use Sensitivity (True Positive Rate), Specificity (True Negative Rate), and the F measure to determine the class-wise performance of the classifier.
Which of the following machine learning algorithms can be used for imputing missing values of both categorical and continuous variables?
- K-means clustering
- Linear regression
- K-NN (k-nearest neighbor)
- Decision trees
The K nearest neighbor algorithm can be used because it finds the nearest neighbors of a record based on all the other features and fills in the missing value from those neighbors, whether the variable is categorical or continuous.
K-means clustering and linear regression cannot handle missing values directly, so the imputation has to happen in pre-processing; otherwise, they will fail. Decision trees have the same problem, although this varies between implementations.
Below are the eight actual values of the target variable in the train file. What is the entropy of the target variable?
[0, 0, 0, 1, 1, 1, 1, 1]
Choose the correct answer.
- -(5/8 log(5/8) + 3/8 log(3/8))
- 5/8 log(5/8) + 3/8 log(3/8)
- 3/8 log(5/8) + 5/8 log(3/8)
- 5/8 log(3/8) – 3/8 log(5/8)
The target variable, in this case, is 1.
The formula for calculating the entropy is:
Entropy = -(p/n log(p/n) + (n – p)/n log((n – p)/n))
Putting p=5 and n=8, we get
Entropy = A = -(5/8 log(5/8) + 3/8 log(3/8))
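A minimal Python sketch of the entropy calculation is shown below. It assumes a base-2 logarithm (the base is not stated above), and the label lists are hypothetical examples rather than the values from the question.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2, an assumption) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Hypothetical label lists, for illustration only.
balanced = [0, 1, 0, 1]   # two classes, evenly split
pure = [1, 1, 1, 1]       # a single class

print(entropy(balanced))
print(entropy(pure))
```

An evenly split binary target has the maximum entropy of one bit, while a target with a single class has zero entropy.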
We want to predict the probability of death from heart disease based on three risk factors: age, gender, and blood cholesterol level. What is the most appropriate algorithm for this case?
Choose the correct option:
- Logistic Regression
- Linear Regression
- K-means clustering
- Apriori algorithm
The most appropriate algorithm for this case is A, logistic regression, because the target is the probability of a binary outcome (death or no death).
After studying the behavior of a population, you have identified four specific individual types that are valuable to your study. You would like to find all users who are most similar to each individual type. Which algorithm is most appropriate for this study?
Choose the correct option:
- K-means clustering
- Linear regression
- Association rules
- Decision trees
As we are looking to group people around four specific individual types, the number of groups gives us the value of k (k = 4). Therefore, K-means clustering (answer A) is the most appropriate algorithm for this study.
You have run the association rules algorithm on your dataset, and the two rules {banana, apple} => {grape} and {apple, orange} => {grape} have been found to be relevant. What else must be true?
Choose the right answer:
- {banana, apple, grape, orange} must be a frequent itemset
- {banana, apple} => {orange} must be a relevant rule
- {grape} => {banana, apple} must be a relevant rule
- {grape, apple} must be a frequent itemset
The answer is D: {grape, apple} must be a frequent itemset. Both rules contain apple and grape, and every subset of a frequent itemset is itself frequent.
Your organization has a website where visitors randomly receive one of two coupons. It is also possible that visitors to the website will not receive a coupon. You have been asked to determine if offering a coupon to website visitors has any impact on their purchase decisions. Which analysis method should you use?
- One-way ANOVA
- K-means clustering
- Association rules
- Student’s t-test
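The answer is A: One-way ANOVA, because it compares the mean outcome across more than two groups (here: first coupon, second coupon, and no coupon).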
What are feature vectors?
A feature vector is an n-dimensional vector of numerical features that represent an object. In machine learning, feature vectors are used to represent numeric or symbolic characteristics (called features) of an object in a mathematical way that’s easy to analyze.
What are the steps in making a decision tree?
- Take the entire data set as input.
- Look for a split that maximizes the separation of the classes. A split is any test that divides the data into two sets.
- Apply the split to the input data (divide step).
- Re-apply steps two and three to each of the divided subsets.
- Stop when you meet any stopping criteria.
- Clean up the tree if you went too far doing splits. This step is called pruning.
What is root cause analysis?
Root cause analysis was initially developed to analyze industrial accidents but is now widely used in other areas. It is a problem-solving technique used for isolating the root causes of faults or problems. A factor is called a root cause if its removal from the problem-fault sequence prevents the final undesirable event from recurring.
What is logistic regression?
Logistic regression is also known as the logit model. It is a technique used to predict a binary outcome from a linear combination of predictor variables.
What are recommender systems?
Recommender systems are a subclass of information filtering systems that are meant to predict the preferences or ratings that a user would give to a product.
Explain cross-validation.
Cross-validation is a model validation technique for evaluating how the outcomes of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the objective is to forecast and one wants to estimate how accurately a model will perform in practice.
The goal of cross-validation is to set aside a data set to test the model in the training phase (i.e. a validation data set) to limit problems like overfitting and gain insight into how the model will generalize to an independent data set.
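The splitting step of k-fold cross-validation can be sketched as follows. The function name k_fold_indices is hypothetical, and the sketch splits the indices in order without shuffling, which real projects usually add.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size

folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Each observation is used for validation exactly once and for training in the remaining folds; the model is fitted k times and the k validation scores are averaged.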
What is collaborative filtering?
Collaborative filtering is the process most recommender systems use to find patterns by combining the viewpoints of many users, data sources, and agents: items are recommended to a user based on the preferences of other users with similar behavior.
Do gradient descent methods always converge to similar points?
They do not. In some cases, they reach a local minimum (a local optimum) rather than the global optimum. Which point is reached is governed by the data and the starting conditions.
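The dependence on the starting point can be illustrated with a small sketch. The function f below is a made-up one-dimensional example with two minima, and the learning rate and step count are arbitrary choices for the illustration.

```python
def gradient_descent(grad, x0, learning_rate=0.01, steps=5000):
    """Plain gradient descent on a one-dimensional function."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

def f(x):
    # Hypothetical objective with two minima of different depth.
    return x ** 4 - 3 * x ** 2 + x

def grad_f(x):
    return 4 * x ** 3 - 6 * x + 1

left = gradient_descent(grad_f, x0=-2.0)   # starts left of the origin
right = gradient_descent(grad_f, x0=2.0)   # starts right of the origin

print("from -2.0:", left, "f =", f(left))
print("from  2.0:", right, "f =", f(right))
```

Both runs stop at a point where the gradient is zero, but only the run started on the left reaches the lower of the two minima.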
What is the goal of A/B Testing?
This is statistical hypothesis testing for randomized experiments with two variants, A and B. The objective of A/B testing is to identify changes to a web page that maximize or increase the outcome of a strategy.
What are the drawbacks of the linear model?
- The assumption of linearity of the errors
- It can’t be used for count outcomes or binary outcomes
- There are overfitting problems that it can’t solve
What is the law of large numbers?
It is a theorem that describes the result of performing the same experiment a large number of times. This theorem forms the basis of frequency-style thinking. It states that the sample mean, sample variance, and sample standard deviation converge to the quantities they are trying to estimate.
What are confounding variables?
These are extraneous variables in a statistical model that correlate directly or inversely with both the dependent and the independent variable. If the estimate fails to account for the confounding factor, the measured relationship between the two is distorted.
What is star schema?
It is a traditional database schema with a central fact table. Satellite tables map IDs to physical names or descriptions and are connected to the central fact table through the ID fields; these tables are known as lookup tables and are principally useful in real-time applications, as they save a lot of memory. Sometimes, star schemas involve several layers of summarization to retrieve information faster.
How regularly must an algorithm be updated?
You will want to update an algorithm when:
- You want the model to evolve as data streams through infrastructure
- The underlying data source is changing
- There is a case of non-stationarity
What are eigenvalue and eigenvector?
Eigenvectors are the directions along which a particular linear transformation acts by flipping, compressing, or stretching. Eigenvalues are the factors by which the transformation stretches or compresses along those directions.
Eigenvectors are used for understanding linear transformations. In data analysis, we usually calculate the eigenvectors for a correlation or covariance matrix.
Why is resampling done?
Resampling is done in any of these cases:
- Estimating the accuracy of sample statistics by using subsets of accessible data, or drawing randomly with replacement from a set of data points
- Substituting labels on data points when performing significance tests
- Validating models by using random subsets (bootstrapping, cross-validation)
What is selection bias?
Selection bias, in general, is a problematic situation in which error is introduced due to a non-random population sample.
What are the types of biases that can occur during sampling?
- Selection bias
- Undercoverage bias
- Survivorship bias
What is survivorship bias?
Survivorship bias is the logical error of focusing on the things that survived some process and casually overlooking those that did not because of their lack of prominence. This can lead to wrong conclusions in numerous ways.
How do you work towards a random forest?
The underlying principle of this technique is that several weak learners combine to provide a strong learner. The steps involved are:
- Build several decision trees on bootstrapped training samples of data
- On each tree, each time a split is considered, a random sample of m predictors is chosen as split candidates out of all p predictors
- Rule of thumb: at each split, m ≈ √p
- Predictions: by majority vote across the trees
What is Selection Bias?
Selection bias is a kind of error that occurs when the researcher decides who is going to be studied. It is usually associated with research where the selection of participants isn’t random. It is sometimes referred to as the selection effect. It is the distortion of statistical analysis, resulting from the method of collecting samples. If the selection bias is not taken into account, then some conclusions of the study may not be accurate.
The types of selection bias include:
- Sampling bias: It is a systematic error due to a non-random sample of a population causing some members of the population to be less likely to be included than others resulting in a biased sample.
- Time interval: A trial may be terminated early at an extreme value (often for ethical reasons), but the extreme value is likely to be reached by the variable with the largest variance, even if all variables have a similar mean.
- Data: When specific subsets of data are chosen to support a conclusion, or bad data is rejected on arbitrary grounds, instead of according to previously stated or generally agreed criteria.
- Attrition: Attrition bias is a kind of selection bias caused by attrition (loss of participants), that is, discounting trial subjects/tests that did not run to completion.
What is bias-variance trade-off?
Bias: Bias is an error introduced in your model due to oversimplification of the machine learning algorithm. It can lead to underfitting. When you train your model, it makes simplifying assumptions to make the target function easier to learn.
Low bias machine learning algorithms: Decision Trees, k-NN and SVM
High bias machine learning algorithms: Linear Regression, Logistic Regression
Variance: Variance is an error introduced in your model due to an overly complex machine learning algorithm: the model also learns noise from the training data set and performs badly on the test data set. It can lead to high sensitivity and overfitting.
Normally, as you increase the complexity of your model, you will see a reduction in error due to lower bias in the model. However, this only happens until a particular point. As you continue to make your model more complex, you end up over-fitting your model and hence your model will start suffering from high variance.
Bias-Variance trade-off: The goal of any supervised machine learning algorithm is to have low bias and low variance to achieve good prediction performance.
- The k-nearest neighbour algorithm has low bias and high variance, but the trade-off can be changed by increasing the value of k which increases the number of neighbours that contribute to the prediction and in turn increases the bias of the model.
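- The support vector machine algorithm has low bias and high variance, but the trade-off can be changed through the C parameter. When C is defined as the number of margin violations allowed in the training data, increasing it increases the bias but decreases the variance. (In libraries where C is instead the penalty on violations, such as scikit-learn, the direction is reversed: a smaller C gives more bias and less variance.)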
There is no escaping the relationship between bias and variance in machine learning. Increasing the bias will decrease the variance. Increasing the variance will decrease bias.
What is a confusion matrix?
The confusion matrix is a 2 X 2 table that contains the four outputs provided by a binary classifier. Various measures, such as error rate, accuracy, specificity, sensitivity, precision and recall, are derived from it.
A data set used for performance evaluation is called a test data set. It should contain the correct labels and predicted labels.
The predicted labels will be exactly the same as the correct labels if the performance of a binary classifier is perfect.
In real-world scenarios, the predicted labels usually match only part of the observed labels.
A binary classifier predicts all data instances of a test data set as either positive or negative. This produces four outcomes:
- True-positive(TP) — Correct positive prediction
- False-positive(FP) — Incorrect positive prediction
- True-negative(TN) — Correct negative prediction
- False-negative(FN) — Incorrect negative prediction
Basic measures derived from the confusion matrix
- Error Rate = (FP+FN)/(P+N)
- Accuracy = (TP+TN)/(P+N)
- Sensitivity(Recall or True positive rate) = TP/P
- Specificity(True negative rate) = TN/N
- Precision(Positive predicted value) = TP/(TP+FP)
- F-Score(Harmonic mean of precision and recall) = (1+b²)(PREC.REC)/(b²PREC+REC) where b is commonly 0.5, 1, 2.
What is the difference between “long” and “wide” format data?
In the wide format, a subject’s repeated responses are in a single row, and each response is in a separate column. In the long format, each row is one time point per subject. You can recognize data in wide format by the fact that columns generally represent groups.
What do you understand by the term Normal Distribution?
Data can be distributed in different ways, with a bias to the left or to the right, or it can all be jumbled up.
However, data can also be distributed around a central value without any bias to the left or right, in which case it follows a normal distribution in the form of a bell-shaped curve.
Figure: Normal distribution in a bell curve
The random variables are distributed in the form of a symmetrical, bell-shaped curve.
Properties of Normal Distribution are as follows;
- Unimodal -one mode
- Symmetrical -left and right halves are mirror images
- Bell-shaped -maximum height (mode) at the mean
- Mean, Mode, and Median are all located in the center
- Asymptotic -the tails approach the horizontal axis but never touch it
What is correlation and covariance in statistics?
Covariance and correlation are two mathematical concepts that are widely used in statistics. Both establish the relationship and measure the dependency between two random variables. Though they do similar work in mathematical terms, they are different from each other.
Correlation: Correlation is a technique for measuring and estimating the quantitative relationship between two variables. It measures how strongly two variables are related, on a unit-free scale from -1 to 1.
Covariance: Covariance is a measure that indicates the extent to which two random variables change together. It describes the systematic relation between a pair of random variables, wherein a change in one variable is reciprocated by a corresponding change in the other. Unlike correlation, its magnitude depends on the units of the variables.
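The difference between the two measures can be seen in a short sketch. The sample covariance and Pearson correlation are computed by hand below; the variable names and the numbers are hypothetical.

```python
import math

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data: hours studied and exam score.
hours = [1, 2, 3, 4, 5]
score = [52, 54, 59, 61, 64]
score_x10 = [s * 10 for s in score]  # same scores in a different unit

print(covariance(hours, score), covariance(hours, score_x10))
print(correlation(hours, score), correlation(hours, score_x10))
```

Rescaling one variable multiplies the covariance by the same factor but leaves the correlation unchanged, which is why correlation is easier to compare across data sets.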
What is the difference between Point Estimates and Confidence Interval?
Point Estimation gives us a particular value as an estimate of a population parameter. Method of Moments and Maximum Likelihood estimator methods are used to derive Point Estimators for population parameters.
A confidence interval gives us a range of values which is likely to contain the population parameter. The confidence interval is generally preferred, as it tells us how likely this interval is to contain the population parameter. This likelihood or probability is called the confidence level or confidence coefficient and is represented by 1 – alpha, where alpha is the level of significance.
What is the goal of A/B Testing?
It is a hypothesis test for a randomized experiment with two variants, A and B.
The goal of A/B testing is to identify changes to a web page that maximize or increase the outcome of interest. A/B testing is a fantastic method for figuring out the best online promotional and marketing strategies for your business. It can be used to test everything from website copy to sales emails to search ads.
An example of this could be identifying the click-through rate for a banner ad.
What is p-value?
When you perform a hypothesis test in statistics, a p-value helps you determine the strength of your results. A p-value is a number between 0 and 1, and its value denotes the strength of the evidence. The claim which is on trial is called the null hypothesis.
A low p-value (≤ 0.05) indicates strong evidence against the null hypothesis, which means we can reject the null hypothesis. A high p-value (> 0.05) indicates weak evidence against the null hypothesis, which means we fail to reject it. A p-value right at 0.05 is marginal and could go either way. To put it another way,
High p-values: your data are likely with a true null. Low p-values: your data are unlikely with a true null.
In any 15-minute interval, there is a 20% probability that you will see at least one shooting star. What is the probability that you see at least one shooting star in the period of an hour?
Probability of not seeing any shooting star in 15 minutes is
= 1 – P( Seeing at least one shooting star )
= 1 – 0.2 = 0.8
Probability of not seeing any shooting star in the period of one hour (four independent 15-minute intervals)
= (0.8) ^ 4 = 0.4096
Probability of seeing at least one shooting star in the one hour
= 1 – P( Not seeing any star )
= 1 – 0.4096 = 0.5904
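The same complement rule can be written as a one-line function. The numbers passed in below (a 20% chance per interval and four intervals in the hour) are assumptions made for the illustration, and the intervals are treated as independent.

```python
def prob_at_least_one(p_interval, n_intervals):
    """P(at least one event) over n independent intervals."""
    return 1 - (1 - p_interval) ** n_intervals

# Assumed inputs: 20% chance per interval, four intervals in one hour.
p_hour = prob_at_least_one(0.2, 4)
print(round(p_hour, 4))
```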
How can you generate a random number between 1 – 7 with only a die?
- Any die has six sides from 1-6. There is no way to get seven equal outcomes from a single rolling of a die. If we roll the die twice and consider the event of two rolls, we now have 36 different outcomes.
- To get our 7 equal outcomes we have to reduce this 36 to a number divisible by 7. We can thus consider only 35 outcomes and exclude the other one.
- A simple scenario can be to exclude the combination (6,6), i.e., to roll the die again if 6 appears twice.
- All the remaining combinations from (1,1) till (6,5) can be divided into 7 parts of 5 each. This way all the seven sets of outcomes are equally likely.
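The procedure above can be sketched in Python as a small rejection-sampling routine, assuming a standard six-sided die. The function names and the way the two rolls are mapped to a result are illustrative choices; any mapping that splits the 35 kept outcomes into seven equal groups works.

```python
import random

def map_rolls(first, second):
    """Map two rolls of a six-sided die to 1..7, or None for the rejected pair."""
    if (first, second) == (6, 6):
        return None  # the single excluded outcome: roll again
    index = (first - 1) * 6 + (second - 1)  # 0..34 for the 35 kept outcomes
    return index // 5 + 1                   # seven groups of five outcomes

def random_1_to_7(rng=random):
    """Roll twice, re-rolling on (6, 6), until a value in 1..7 is produced."""
    while True:
        result = map_rolls(rng.randint(1, 6), rng.randint(1, 6))
        if result is not None:
            return result

print([random_1_to_7() for _ in range(10)])
```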
A certain couple tells you that they have two children, at least one of which is a girl. What is the probability that they have two girls?
In the case of two children, there are 4 equally likely possibilities
BB, BG, GB and GG;
where B = Boy and G = Girl and the first letter denotes the first child.
From the question, we can exclude the first case of BB. Thus from the remaining 3 possibilities of BG, GB & GG, we have to find the probability of the case with two girls.
Thus, P(Having two girls given one girl) = 1 / 3
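The answer can be checked by enumerating the equally likely families, as in this short sketch (the variable names are illustrative):

```python
from fractions import Fraction
from itertools import product

# All equally likely (first child, second child) combinations.
families = list(product("BG", repeat=2))

# Condition on "at least one girl", then count the two-girl families.
at_least_one_girl = [f for f in families if "G" in f]
two_girls = [f for f in at_least_one_girl if f == ("G", "G")]

probability = Fraction(len(two_girls), len(at_least_one_girl))
print(probability)
```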
A jar has 1000 coins, of which 999 are fair and 1 is double headed. Pick a coin at random, and toss it 10 times. Given that you see 10 heads, what is the probability that the next toss of that coin is also a head?
There are two ways of choosing the coin. One is to pick a fair coin and the other is to pick the one with two heads.
Probability of selecting fair coin = 999/1000 = 0.999
Probability of selecting unfair coin = 1/1000 = 0.001
Selecting 10 heads in a row = Selecting fair coin * Getting 10 heads + Selecting an unfair coin
P (A) = 0.999 * (1/2)^10 = 0.999 * (1/1024) = 0.000976
P (B) = 0.001 * 1 = 0.001
P( A / A + B ) = 0.000976 / (0.000976 + 0.001) = 0.4939
P( B / A + B ) = 0.001 / 0.001976 = 0.5061
Probability of selecting another head = P(A/A+B) * 0.5 + P(B/A+B) * 1 = 0.4939 * 0.5 + 0.5061 = 0.7531
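The Bayes' rule calculation can be reproduced exactly with fractions. The sketch below is parameterized; the values passed in (1000 coins, one double-headed coin, 10 heads observed) are assumptions chosen for the illustration, and the function name is hypothetical.

```python
from fractions import Fraction

def prob_next_head(n_coins, n_double_headed, n_heads_seen):
    """P(next toss is a head | only heads seen so far), via Bayes' rule."""
    prior_fair = Fraction(n_coins - n_double_headed, n_coins)
    prior_double = Fraction(n_double_headed, n_coins)
    like_fair = Fraction(1, 2) ** n_heads_seen  # a fair coin gives all heads
    like_double = Fraction(1)                   # a double-headed coin always does
    evidence = prior_fair * like_fair + prior_double * like_double
    post_fair = prior_fair * like_fair / evidence
    post_double = prior_double * like_double / evidence
    return post_fair * Fraction(1, 2) + post_double * 1

# Assumed inputs: 1000 coins, 1 double-headed, 10 heads observed.
answer = prob_next_head(1000, 1, 10)
print(answer, round(float(answer), 4))
```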
What do you understand by statistical power of sensitivity and how do you calculate it?
Sensitivity is commonly used to validate the accuracy of a classifier (Logistic, SVM, Random Forest etc.).
Sensitivity is nothing but “Predicted true events / Total events”. True events here are the events which were true and which the model also predicted as true.
Calculation of sensitivity is pretty straightforward.
Sensitivity = (True Positives) / (Positives in Actual Dependent Variable)
What are the differences between over-fitting and under-fitting?
In statistics and machine learning, one of the most common tasks is to fit a model to a set of training data, so as to be able to make reliable predictions on general, unseen data.
In overfitting, a statistical model describes random error or noise instead of the underlying relationship. Overfitting occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. A model that has been overfitted has poor predictive performance, as it overreacts to minor fluctuations in the training data.
Underfitting occurs when a statistical model or machine learning algorithm cannot capture the underlying trend of the data. Underfitting would occur, for example, when fitting a linear model to non-linear data. Such a model too would have poor predictive performance.
How to combat Overfitting and Underfitting?
To combat overfitting and underfitting, you can resample the data to estimate the model accuracy (for example, with k-fold cross-validation) and hold out a validation dataset to evaluate the model.
What is regularisation? Why is it useful?
Regularisation is the process of adding a tuning parameter to a model to induce smoothness in order to prevent overfitting. This is most often done by adding a constant multiple of a norm of the weight vector to the loss. The norm is usually L1 (Lasso) or L2 (ridge). The model should then minimize the loss function calculated on the training set plus this penalty.
What Are Confounding Variables?
In statistics, a confounder is a variable that influences both the dependent variable and independent variable.
For example, if you are researching whether a lack of exercise leads to weight gain,
lack of exercise = independent variable
weight gain = dependent variable.
A confounding variable here would be any other variable that affects both of these variables, such as the age of the subject.
What is selection Bias?
Selection bias occurs when the sample obtained is not representative of the population intended to be analysed.
Explain how a ROC curve works.
The ROC curve is a graphical representation of the true positive rate plotted against the false positive rate at various classification thresholds. It is often used to visualize the trade-off between sensitivity (true positive rate) and the false positive rate.
What is TF/IDF vectorization?
TF–IDF, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval and text mining.
The TF–IDF value increases proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words appear more frequently in general.
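A minimal sketch of the idea, using one common variant of the weighting (term frequency as a share of the document, inverse document frequency as log(N / df)), is shown below. Real libraries apply additional smoothing and normalization; the function name and the sample sentences are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document (simple TF * log(N/df))."""
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical mini-corpus.
docs = ["the cat sat", "the dog barked", "the cat purred"]
scores = tf_idf(docs)
print(scores[0])
```

A word that appears in every document ("the") gets a weight of zero, while a word unique to one document ("sat") gets the highest weight in that document.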
Why do we generally use the Softmax non-linearity function as the last operation in a network?
It is because it takes in a vector of real numbers and returns a probability distribution. Its definition is as follows. Let x be a vector of real numbers (positive, negative, whatever, there are no constraints).
Then the i’th component of Softmax(x) is:
Softmax(x)_i = exp(x_i) / Σ_j exp(x_j)
It should be clear that the output is a probability distribution: each element is non-negative and the sum over all components is 1.
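A small Python sketch of softmax follows. Subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result; the input vector is a made-up example.

```python
import math

def softmax(x):
    """Turn a list of real numbers into a probability distribution."""
    m = max(x)                           # shift for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical network outputs (logits).
probs = softmax([2.0, -1.0, 0.5])
print(probs, sum(probs))
```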
Python or R – Which one would you prefer for text analytics?
We would prefer Python for the following reasons:
- Python has the Pandas library, which provides easy-to-use data structures and high-performance data analysis tools.
- R is more suitable for machine learning than just text analysis.
- Python generally performs faster for text analytics.
How does data cleaning play a vital role in the analysis?
Data cleaning can help in analysis because:
- Cleaning data from multiple sources helps to transform it into a format that data analysts or data scientists can work with.
- Data Cleaning helps to increase the accuracy of the model in machine learning.
- It is a cumbersome process because, as the number of data sources increases, the time taken to clean the data increases exponentially due to the number of sources and the volume of data generated by these sources.
- It might take up to 80% of the time just to clean the data, making it a critical part of the analysis task.
Differentiate between univariate, bivariate and multivariate analysis.
Univariate, bivariate and multivariate analyses are descriptive statistical analysis techniques that are differentiated by the number of variables involved at a given point in time. For example, a pie chart of sales by territory involves only one variable, so the analysis can be referred to as univariate analysis.
Bivariate analysis attempts to understand the relationship between two variables at a time, as in a scatterplot. For example, analyzing the volume of sales against spending can be considered an example of bivariate analysis.
Multivariate analysis deals with the study of more than two variables to understand the effect of variables on the responses.
Can you cite some examples where a false positive is more important than a false negative?
Let us first understand what false positives and false negatives are.
- False Positives are the cases where you wrongly classified a non-event as an event a.k.a Type I error.
- False Negatives are the cases where you wrongly classify events as non-events, a.k.a Type II error.
Example: In the medical field, assume you have to give chemotherapy to patients. Assume a patient comes to the hospital and tests positive for cancer based on the lab prediction, but he actually doesn’t have cancer. This is a case of a false positive. Here it is extremely dangerous to start chemotherapy on this patient when he actually does not have cancer. In the absence of cancerous cells, chemotherapy will damage his normal healthy cells and might lead to severe diseases, even cancer.
Example: Let’s say an e-commerce company decided to give a $1000 gift voucher to the customers whom they expect to purchase at least $10,000 worth of items. They send the free voucher mail directly to customers without any minimum purchase condition because they expect to make at least 20% profit on items sold above $10,000. The problem arises if the $1000 gift vouchers are sent to customers who have not actually purchased anything but are marked as having made $10,000 worth of purchases: each such false positive costs the company a voucher with no sale to offset it.
Can you cite some examples where a false negative is more important than a false positive?
Example: Assume there is an airport ‘A’ which has received high-security threats, and based on certain characteristics they identify whether a particular passenger can be a threat or not. Due to a shortage of staff, they decide to scan only the passengers predicted as risk positives by their predictive model. What will happen if a passenger who is a true threat is flagged as a non-threat by the airport model?
Example: What if a jury or judge decides to let a criminal go free?
Example: What if you refused to marry a very good person based on your predictive model, and you happen to meet him/her after a few years and realize that you had a false negative?
Can you cite some examples where both false positive and false negatives are equally important?
In the banking industry, giving loans is the primary source of making money, but if your repayment rate is not good you will not make any profit; rather, you will risk huge losses.
Banks don’t want to lose good customers and, at the same time, they don’t want to acquire bad customers. In this scenario, both the false positives and false negatives become very important to measure.
What is Cluster Sampling?
Cluster sampling is a technique used when it becomes difficult to study the target population spread across a wide area and simple random sampling cannot be applied. Cluster Sample is a probability sample where each sampling unit is a collection or cluster of elements.
For example, a researcher wants to survey the academic performance of high school students in Japan. He can divide the entire population of Japan into different clusters (cities). Then the researcher selects a number of clusters, depending on his research, through simple or systematic random sampling.
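The selection step can be sketched in a few lines of Python. This is only an illustration: the city names, the student IDs and the `cluster_sample` helper are hypothetical, and whole clusters are drawn by simple random sampling.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Pick whole clusters at random and return every element inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [unit for name in chosen for unit in clusters[name]]
    return chosen, sample

# Hypothetical data: students grouped by city (the clusters).
students_by_city = {
    "Tokyo": ["t1", "t2", "t3"],
    "Osaka": ["o1", "o2"],
    "Kyoto": ["k1", "k2", "k3", "k4"],
    "Nagoya": ["n1"],
}

chosen_cities, sampled_students = cluster_sample(students_by_city, n_clusters=2)
```

Note that the sampling unit is the city, not the student: once a city is chosen, all of its students enter the sample.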
Let’s continue our Data Science Interview Questions blog with some more statistics questions.
What is Systematic Sampling?
Systematic sampling is a statistical technique where elements are selected from an ordered sampling frame. In systematic sampling, the list is traversed in a circular manner, so once you reach the end of the list, you continue from the top again. The most common form of systematic sampling is the equal-probability method, in which every k-th element is selected.
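A minimal sketch of this idea, assuming a random starting point and a fixed interval k equal to the frame size divided by the sample size; the `systematic_sample` helper is hypothetical, not a library function.

```python
import random

def systematic_sample(frame, n, seed=0):
    """Select n elements from an ordered frame at a fixed interval k,
    wrapping around to the start when the end of the list is reached."""
    k = len(frame) // n  # sampling interval
    if k == 0:
        raise ValueError("sample size must not exceed the frame size")
    start = random.Random(seed).randrange(len(frame))  # random starting index
    return [frame[(start + i * k) % len(frame)] for i in range(n)]

frame = list(range(1, 21))  # ordered sampling frame of 20 units
sample = systematic_sample(frame, n=5)
```

The modulo operation implements the circular traversal: an index that runs past the end of the frame continues from the top.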
What are Eigenvectors and Eigenvalues?
Eigenvectors are used for understanding linear transformations. In data analysis, we usually calculate the eigenvectors for a correlation or covariance matrix. Eigenvectors are the directions along which a particular linear transformation acts by flipping, compressing or stretching.
The eigenvalue can be thought of as the strength of the transformation in the direction of the eigenvector, that is, the factor by which the stretching or compression occurs.
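As an illustration, the snippet below computes the eigenvalues and eigenvectors of a covariance matrix with NumPy. The data set is synthetic and purely hypothetical; `numpy.linalg.eigh` is used because a covariance matrix is symmetric.

```python
import numpy as np

# Hypothetical 2-D data set with two correlated columns.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([x, 2.0 * x + rng.normal(scale=0.5, size=200)])

cov = np.cov(data, rowvar=False)                 # 2x2 covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigenvalues in ascending order

# Each column v of `eigenvectors` satisfies cov @ v == eigenvalue * v.
v = eigenvectors[:, -1]   # direction of largest variance
lam = eigenvalues[-1]     # stretch factor along that direction
```

Applying the covariance matrix to `v` only rescales it by `lam`; the direction does not change, which is exactly what makes `v` an eigenvector.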
Can you explain the difference between a Validation Set and a Test Set?
A validation set can be considered a part of the training set, as it is used for parameter selection and to avoid overfitting of the model being built.
On the other hand, a test set is used for testing or evaluating the performance of a trained machine learning model.
In simple terms, the differences can be summarized as follows: the training set is used to fit the parameters, i.e. the weights, and the test set is used to assess the performance of the model, i.e. to evaluate its predictive power and generalization.
Explain cross-validation.
Cross-validation is a model validation technique for evaluating how the outcomes of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the objective is prediction and one wants to estimate how accurately a model will perform in practice.
The goal of cross-validation is to set aside a data set to test the model during the training phase (i.e. a validation data set) in order to limit problems like overfitting and to gain insight into how the model will generalize to an independent data set.
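One common way to do this is k-fold cross-validation, where each fold takes a turn as the validation set. The sketch below only generates the index splits; the `k_fold_indices` helper is hypothetical, and it assumes the rows have already been shuffled if their order carries no meaning.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
```

A model would be trained on each `train` list and scored on the matching `validation` list; averaging the k scores gives the cross-validated estimate.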
What is Machine Learning?
Machine Learning explores the study and construction of algorithms that can learn from and make predictions on data. It is closely related to computational statistics and is used to devise complex models and algorithms that lend themselves to prediction, which in commercial use is known as predictive analytics. The image below represents the various domains Machine Learning lends itself to.
What is Supervised Learning?
Supervised learning is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples.
Algorithms: Support Vector Machines, Regression, Naive Bayes, Decision Trees, K-nearest Neighbor Algorithm and Neural Networks
E.g. If you built a fruit classifier, the labels will be “this is an orange, this is an apple and this is a banana”, based on showing the classifier examples of apples, oranges and bananas.
What is Unsupervised learning?
Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labelled responses.
Algorithms: Clustering, Anomaly Detection, Neural Networks and Latent Variable Models
E.g. In the same example, a fruit clustering will categorize as “fruits with soft skin and lots of dimples”, “fruits with shiny hard skin” and “elongated yellow fruits”.
What are the various classification algorithms?
The diagram lists the most important classification algorithms.
What is ‘Naive’ in a Naive Bayes?
The Naive Bayes Algorithm is based on the Bayes Theorem. Bayes’ theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event.
The algorithm is ‘naive’ because it assumes that the features are conditionally independent of one another given the class, an assumption that may or may not turn out to be correct.
Explain SVM algorithm in detail.
SVM stands for support vector machine. It is a supervised machine learning algorithm which can be used for both regression and classification. If you have n features in your training data set, SVM plots each data point in n-dimensional space, with the value of each feature being the value of a particular coordinate. SVM uses hyperplanes to separate out the different classes based on the provided kernel function.
What are the different kernels in SVM?
There are four types of kernels in SVM.
- Linear Kernel
- Polynomial kernel
- Radial basis kernel
- Sigmoid kernel
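For reference, the four kernels can be written as plain functions of two vectors. This is a sketch using the standard textbook definitions; the parameter names `gamma`, `coef0` and `degree` and their default values are assumptions chosen for illustration.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    # Radial basis function: depends only on the squared distance.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)

x, y = [1.0, 2.0], [3.0, 0.5]
```

Each function returns a similarity score for a pair of points; the SVM uses these scores in place of raw dot products, which is what lets it separate classes that are not linearly separable in the original space.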
Explain Decision Tree algorithm in detail.
A decision tree is a supervised machine learning algorithm mainly used for Regression and Classification. It breaks down a data set into smaller and smaller subsets while at the same time an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision tree can handle both categorical and numerical data.
What are Entropy and Information gain in Decision tree algorithm?
The core algorithm for building a decision tree is called ID3. ID3 uses Entropy and Information Gain to construct a decision tree.
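Both quantities are easy to compute by hand. The following sketch assumes Shannon entropy in bits and defines information gain as the parent entropy minus the size-weighted entropy of the child groups; the labels and splits are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy of the parent minus the weighted entropy of the child groups."""
    total = len(labels)
    weighted = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - weighted

parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]   # children are pure
useless_split = [["yes", "no"], ["yes", "no"]]   # children are as mixed as the parent
```

At each node the tree-building algorithm evaluates the candidate splits and picks the one with the highest information gain, so here it would prefer `perfect_split`.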
Why do you need to perform resampling?
Resampling is done in the following cases:
- Estimating the accuracy of sample statistics by drawing randomly with replacement from a set of data points, or by using subsets of the available data
- Exchanging labels on data points when performing significance tests
- Validating models by using random subsets
List out the libraries in Python used for Data Analysis and Scientific Computations.
- SciPy
- Pandas
- Matplotlib
- NumPy
- SciKit
- Seaborn
What is Power Analysis?
Power analysis is an integral part of experimental design. It helps you to determine the sample size required to detect an effect of a given size with a specific level of confidence. It also allows you to determine the probability of detecting an effect of a given size under a sample size constraint.
Explain Collaborative filtering
Collaborative filtering is a technique for finding patterns by combining viewpoints, multiple data sources, and various agents.
What is bias?
Bias is an error introduced in your model because of the oversimplification of a machine learning algorithm. It can lead to underfitting.
Discuss ‘Naive’ in a Naive Bayes algorithm?
The Naive Bayes algorithm is based on Bayes’ theorem, which describes the probability of an event based on prior knowledge of conditions which might be related to that specific event. It is called ‘naive’ because it assumes that the features are independent of one another given the class.
What is a Linear Regression?
Linear regression is a statistical method where the score of a variable ‘A’ is predicted from the score of a second variable ‘B’. B is referred to as the predictor variable and A as the criterion variable.
State the difference between the expected value and mean value
There are not many differences, but the two terms are used in different contexts. Mean value is generally referred to when you are discussing a probability distribution, whereas expected value is referred to in the context of a random variable.
What is the aim of conducting A/B Testing?
A/B testing is used to conduct randomized experiments with two variants, A and B. The goal of this testing method is to identify changes to a web page that maximize or increase the outcome of a strategy.
What is Ensemble Learning?
Ensemble learning is a method of combining a diverse set of learners to improve the stability and predictive power of the model. Two types of ensemble learning methods are:
Bagging
The bagging method helps you to implement similar learners on small sample populations. It helps you to make more accurate predictions.
Boosting
Boosting is an iterative method which allows you to adjust the weight of an observation depending on the last classification. Boosting decreases the bias error and helps you to build strong predictive models.
Discuss Artificial Neural Networks
Artificial neural networks (ANNs) are a special set of algorithms that have revolutionized machine learning. They adapt to changing input, so the network generates the best possible result without redesigning the output criteria.
What is Back Propagation?
Back-propagation is the essence of neural net training. It is the method of tuning the weights of a neural net based on the error rate obtained in the previous epoch. Proper tuning of the weights helps you to reduce error rates and to make the model reliable by increasing its generalization.
What is the K-means clustering method?
K-means clustering is an important unsupervised learning method. It is the technique of partitioning data into a chosen number of clusters, K. It is used to group data points in order to find similarity in the data.
Explain the difference between Data Science and Data Analytics
Data scientists need to slice data to extract valuable insights that a data analyst can apply to real-world business scenarios. The main difference between the two is that data scientists have more technical knowledge than business analysts. Moreover, they don’t need the understanding of the business that is required for data visualization.
Explain the method to collect and analyze data to use social media to predict the weather condition.
You can collect social media data using the Facebook, Twitter and Instagram APIs. For example, for Twitter, you can construct features from each tweet such as the tweeted date, the number of retweets, the list of followers, etc. Then you can use a multivariate time series model to predict the weather condition.
When do you need to update the algorithm in Data science?
You need to update an algorithm in the following situations:
- You want your data model to evolve as data streams through the infrastructure
- The underlying data source is changing
- The data is non-stationary
Explain the benefits of using statistics by Data Scientists
Statistics help data scientists to get a better idea of customers’ expectations. Using statistical methods, data scientists can gain knowledge about consumer interest, behavior, engagement, retention, etc. Statistics also help you to build powerful data models to validate certain inferences and predictions.
Name various types of Deep Learning Frameworks
- PyTorch
- Microsoft Cognitive Toolkit
- TensorFlow
- Caffe
- Chainer
- Keras
Explain Auto-Encoder
Autoencoders are learning networks that transform inputs into outputs with the fewest possible errors. This means that the output should be as close to the input as possible.
Define Boltzmann Machine
A Boltzmann machine is a simple learning algorithm. It helps you to discover features that represent complex regularities in the training data. This algorithm allows you to optimize the weights and the quantity for the given problem.
Explain why Data Cleansing is essential and which method you use to maintain clean data
Dirty data often leads to incorrect insights, which can damage the prospects of any organization. For example, suppose you want to run a targeted marketing campaign, but your data incorrectly tells you that a specific product will be in demand with your target audience; the campaign will fail.
What is skewed Distribution & uniform distribution?
A skewed distribution occurs when the data is concentrated on one side of the plot, whereas a uniform distribution occurs when the data is spread equally across the range.
When does underfitting occur in a statistical model?
Underfitting occurs when a statistical model or machine learning algorithm is not able to capture the underlying trend of the data.
What is reinforcement learning?
Reinforcement learning is a learning mechanism about how to map situations to actions. The end result should help you to maximize the numerical reward signal. In this method, a learner is not told which action to take but instead must discover which action offers the maximum reward. This method is based on a reward/penalty mechanism.
Name commonly used algorithms.
The four algorithms most commonly used by data scientists are:
- Linear regression
- Logistic regression
- Random Forest
- KNN
What is precision?
Precision is one of the most commonly used error metrics in classification. It is the ratio of true positives to all predicted positives. Its range is from 0 to 1, where 1 represents 100%.
What is a univariate analysis?
An analysis which is applied to one attribute at a time is known as univariate analysis. The boxplot is a widely used univariate tool.
How do you overcome challenges to your findings?
To overcome challenges to my findings, one needs to encourage discussion, demonstrate leadership, and respect different opinions.
Explain cluster sampling technique in Data science
A cluster sampling method is used when it is challenging to study a target population spread across a wide area and simple random sampling can’t be applied.
State the difference between a Validation Set and a Test Set
A validation set is mostly considered a part of the training set, as it is used for parameter selection, which helps you to avoid overfitting of the model being built.
A test set, in contrast, is used for testing or evaluating the performance of a trained machine learning model.
Explain the term Binomial Probability Formula?
“The binomial distribution contains the probabilities of every possible success on N trials for independent events that have a probability of π of occurring.”
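The probability of exactly k successes in N trials is C(N, k) · π^k · (1 − π)^(N − k). A minimal sketch of that formula follows; the function name `binomial_pmf` is hypothetical, and `p` plays the role of π.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 2 heads in 4 fair coin flips: C(4, 2) * 0.5**4 = 0.375
prob = binomial_pmf(2, 4, 0.5)
```

Summing `binomial_pmf(k, n, p)` over every k from 0 to n gives 1, since those outcomes cover all possibilities.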
What is a recall?
Recall is the ratio of true positives to all actual positives. It ranges from 0 to 1.
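To make the two metrics concrete: precision is TP / (TP + FP) and recall is TP / (TP + FN). The sketch below computes both from hypothetical label lists; the `precision_recall` helper is for illustration only.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
```

Here both values come out to 0.75: three of the four predicted positives are correct, and three of the four actual positives are found.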
Discuss normal distribution
A normal distribution is symmetric about its centre, so the mean, median and mode are equal.
While working on a data set, how can you select important variables? Explain
You can use the following methods of variable selection:
- Remove the correlated variables before selecting important variables
- Use linear regression and select variables based on their p-values.
- Use Backward, Forward Selection, and Stepwise Selection
- Use XGBoost or Random Forest, and plot a variable importance chart.
- Measure information gain for the given set of features and select top n features accordingly.
Is it possible to capture the correlation between continuous and categorical variable?
Yes, we can use the analysis of covariance (ANCOVA) technique to capture the association between continuous and categorical variables.
Would treating a categorical variable as a continuous variable result in a better predictive model?
Yes, but only when the variable is ordinal in nature. In that case, treating the categorical variable as continuous can give a better predictive model.
What are exploding gradients?
Gradient:
The gradient is the direction and magnitude calculated during the training of a neural network; it is used to update the network weights in the right direction and by the right amount.
“Exploding gradients are a problem where large error gradients accumulate and result in very large updates to neural network model weights during training.” At an extreme, the values of the weights can become so large that they overflow and result in NaN values.
This makes your model unstable and unable to learn from your training data.
What is selection Bias?
Selection bias occurs when the sample obtained is not representative of the population intended to be analysed.
What are support vectors in SVM.
In the above diagram we see that the thinner lines mark the distance from the classifier to the closest data points called the support vectors (darkened data points). The distance between the two thin lines is called the margin.
What is Prior probability and likelihood?
Prior probability is the proportion of the dependent variable in the data set, while the likelihood is the probability of classifying a given observation in the presence of some other variable.
Explain Recommender Systems?
A recommender system is a subclass of information filtering techniques. It helps you to predict the preferences or ratings which users are likely to give to a product.
Name three disadvantages of using a linear model
Three disadvantages of the linear model are:
- The assumption of linearity of the errors.
- You can’t use this model for binary or count outcomes
- There are plenty of overfitting problems that it can’t solve
What is pruning in Decision Tree ?
When we remove the sub-nodes of a decision node, the process is called pruning. It is the opposite of splitting.
What is Ensemble Learning?
Ensemble learning is the art of combining a diverse set of learners (individual models) to improve the stability and predictive power of the model. There are many types of ensemble learning, but the two most popular techniques are described below.
Bagging
Bagging tries to implement similar learners on small sample populations and then takes a mean of all the predictions. In generalised bagging, you can use different learners on different populations. As you would expect, this helps to reduce the variance error.
Boosting
Boosting is an iterative technique which adjusts the weight of an observation based on the last classification. If an observation was classified incorrectly, it tries to increase the weight of this observation, and vice versa. Boosting in general decreases the bias error and builds strong predictive models. However, boosted models may overfit the training data.
What cross-validation technique would you use on a time series data set?
Instead of using k-fold cross-validation, you should be aware of the fact that a time series is not randomly distributed data: it is inherently ordered chronologically.
In the case of time series data, you should use techniques like forward chaining, where you train the model on past data and then evaluate it on forward-facing data.
fold 1 : training [1], test [2]
fold 2 : training [1 2], test [3]
fold 3 : training [1 2 3], test [4]
fold 4 : training [1 2 3 4], test [5]
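A minimal sketch of forward chaining, assuming the series has been cut into consecutive, chronologically ordered blocks numbered from 1; the `forward_chaining_splits` helper is hypothetical.

```python
def forward_chaining_splits(n_blocks):
    """Yield (train_blocks, test_block) pairs; each fold trains on all
    blocks up to a point in time and tests on the next block."""
    blocks = list(range(1, n_blocks + 1))
    for i in range(1, n_blocks):
        yield blocks[:i], blocks[i]

splits = list(forward_chaining_splits(5))
```

With five blocks this produces four folds, and the test block always lies strictly after every training block, so the model never sees the future during training.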
What is logistic regression? Or State an example when you have used logistic regression recently.
Logistic regression, often referred to as the logit model, is a technique to predict a binary outcome from a linear combination of predictor variables. For example, suppose you want to predict whether a particular political leader will win an election or not. In this case, the outcome of the prediction is binary, i.e. 0 or 1 (Win/Lose). The predictor variables here would be the amount of money spent on election campaigning for a particular candidate, the amount of time spent campaigning, etc.
What is a Box Cox Transformation?
The dependent variable for a regression analysis might not satisfy one or more assumptions of an ordinary least squares regression. The residuals could either curve as the prediction increases or follow a skewed distribution. In such scenarios, it is necessary to transform the response variable so that the data meets the required assumptions.
A Box-Cox transformation is a statistical technique to transform non-normal dependent variables into a normal shape. Normality is an important assumption for many statistical techniques; if your data isn’t normal, applying a Box-Cox transformation means that you are able to run a broader number of tests. The Box-Cox transformation is named after the statisticians George Box and Sir David Roxbee Cox, who collaborated on a paper and developed the technique.
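The transformation itself is a one-line formula: y = (x^λ − 1) / λ for λ ≠ 0, and y = ln(x) for λ = 0, defined for strictly positive x. The sketch below applies it for a λ that is supplied by hand; the `box_cox` helper and the sample values are hypothetical.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v**lam - 1) / lam for v in values]

skewed = [1.0, 2.0, 4.0, 8.0, 16.0]
log_like = box_cox(skewed, 0)      # lambda = 0 is the natural logarithm
sqrt_like = box_cox(skewed, 0.5)   # lambda = 0.5 is a shifted, scaled square root
```

In practice λ is usually estimated from the data rather than chosen by hand, for example by maximum likelihood as `scipy.stats.boxcox` does when no λ is given.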
How will you define the number of clusters in a clustering algorithm?
Though the clustering algorithm is not specified, this question is mostly asked in reference to K-Means clustering, where “K” defines the number of clusters. For example, the following image shows three different groups.
The Within Sum of Squares (WSS) is generally used to explain the homogeneity within a cluster. If you plot the WSS for a range of numbers of clusters, you will get the plot shown below. The graph is generally known as the Elbow Curve.
The red-circled point in the above graph is the point after which you don’t see any meaningful decrease in WSS. This point is known as the bending point and is taken as K in K-Means. This is the most widely used approach, but a few data scientists also use hierarchical clustering first to create dendrograms and identify the distinct groups from there.
What is deep learning?
Deep learning is a subfield of machine learning built on artificial neural networks, which are inspired by the structure and function of the brain. There are many algorithms under machine learning, such as linear regression, SVM and neural networks, and deep learning is just an extension of neural networks. In ordinary neural nets we consider a small number of hidden layers, but in deep learning algorithms we consider a huge number of hidden layers to better understand the input-output relationship.
What are Recurrent Neural Networks (RNNs)?
Recurrent nets are a type of artificial neural network designed to recognise patterns in sequences of data, such as time series from stock markets and government agencies. To understand recurrent nets, you first have to understand the basics of feed forward nets. Both kinds of network, RNN and feed forward, are named after the way they channel information through a series of mathematical operations performed at the nodes of the network. One feeds information straight through (never touching the same node twice), while the other cycles it through a loop, and the latter are called recurrent.
Recurrent networks, on the other hand, take as their input not just the current input example they see, but also what they have perceived previously in time. The BTSXPE at the bottom of the drawing represents the input example in the current moment, and CONTEXT UNIT represents the output of the previous moment. The decision a recurrent neural network reached at time t-1 affects the decision that it will reach one moment later at time t. So recurrent networks have two sources of input, the present and the recent past, which combine to determine how they respond to new data, much as we do in life.
The error they generate will return via back propagation and be used to adjust their weights until error can’t go any lower. Remember, the purpose of recurrent nets is to accurately classify sequential input. We rely on the back propagation of error and gradient descent to do so.
Back propagation in feed forward networks moves backward from the final error through the outputs, weights and inputs of each hidden layer, assigning those weights responsibility for a portion of the error by calculating their partial derivatives — ∂E/∂w, or the relationship between their rates of change. Those derivatives are then used by our learning rule, gradient descent, to adjust the weights up or down, whichever direction decreases error.
Recurrent networks rely on an extension of back propagation called back propagation through time, or BPTT. Time, in this case, is simply expressed by a well-defined, ordered series of calculations linking one time step to the next, which is all back propagation needs to work.
What is the difference between machine learning and deep learning?
Machine learning:
Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed. Machine learning can be divided into the following three categories.
- Supervised machine learning,
- Unsupervised machine learning,
- Reinforcement learning
Deep learning:
Deep Learning is a sub field of machine learning concerned with algorithms inspired by the structure and function of the brain called artificial neural networks.
What is reinforcement learning ?
Reinforcement learning
Reinforcement learning is learning what to do and how to map situations to actions. The goal is to maximise the numerical reward signal. The learner is not told which action to take, but instead must discover which action will yield the maximum reward. Reinforcement learning is inspired by the way human beings learn; it is based on a reward/penalty mechanism.
Explain what regularization is and why it is useful.
Regularisation is the process of adding a tuning parameter to a model to induce smoothness in order to prevent overfitting. This is most often done by adding a constant multiple of a norm of the weight vector to the loss. The norm used is often L1 (Lasso) or L2 (Ridge). The model predictions should then minimize the loss function calculated on the regularized training set.
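As a rough sketch of the two penalties, assuming the common convention of a tuning constant `alpha` multiplied by the L1 norm (sum of absolute weights) or the squared L2 norm (sum of squared weights); the function names and the example numbers are hypothetical.

```python
def l1_penalty(weights, alpha):
    """Lasso penalty: alpha times the sum of absolute weights."""
    return alpha * sum(abs(w) for w in weights)

def l2_penalty(weights, alpha):
    """Ridge penalty: alpha times the sum of squared weights."""
    return alpha * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, alpha, kind="l2"):
    """Loss on the training data plus the chosen penalty on the weights."""
    penalty = l1_penalty(weights, alpha) if kind == "l1" else l2_penalty(weights, alpha)
    return data_loss + penalty

weights = [0.5, -2.0, 0.0]
lasso_loss = regularized_loss(1.0, weights, alpha=0.1, kind="l1")
ridge_loss = regularized_loss(1.0, weights, alpha=0.1, kind="l2")
```

Because large weights make the penalty grow, minimizing the combined loss pushes the model toward smaller weights; the L2 term punishes the single large weight (-2.0) much more heavily than the L1 term does.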
What is TF/IDF vectorization?
tf–idf, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval and text mining. The tf–idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words appear more frequently in general.
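A minimal sketch of one common variant, assuming tf is the term count divided by the document length and idf is log(N / df), where N is the number of documents and df is the number of documents containing the term. Libraries often use smoothed variants, so exact values will differ; the corpus below is hypothetical.

```python
import math

def tf_idf(term, document, corpus):
    """tf-idf of `term` in `document` (a list of tokens) relative to
    `corpus` (a list of tokenized documents)."""
    tf = document.count(term) / len(document)
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)

corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]
score_the = tf_idf("the", corpus[0], corpus)   # appears in every document
score_cat = tf_idf("cat", corpus[0], corpus)   # appears in two documents
score_mat = tf_idf("mat", corpus[0], corpus)   # appears in one document
```

The word "the" scores zero because it occurs in every document, while "mat", which is unique to the first document, gets the highest weight.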
What are Recommender Systems?
Recommender systems are a subclass of information filtering systems that are meant to predict the preferences or ratings that a user would give to a product. Recommender systems are widely used for movies, news, research articles, products, social tags, music, etc.
What is the difference between Regression and classification ML techniques?
Both regression and classification techniques come under supervised machine learning. In a supervised machine learning algorithm, we have to train the model using a labelled data set: while training, we explicitly provide the correct labels, and the algorithm tries to learn the pattern from input to output. If our labels are discrete values, such as class A or class B, then it is a classification problem; if our labels are continuous values, such as real numbers, then it is a regression problem.
If your machine has only a few GB of RAM and you want to train your model on a data set that is larger than that, how would you go about this problem? Have you ever faced this kind of problem in your machine learning/data science experience so far?
First of all, you have to ask which ML model you want to train.
For neural networks: train in small batches read from a memory-mapped NumPy array.
Steps:
- Open the data set as a memory-mapped NumPy array (np.memmap). A memory map creates a mapping of the complete data set on disk; it doesn’t load the complete data set into memory.
- Index into the array to read only the rows you need.
- Pass this data to the neural network.
- Keep the batch size small.
For SVM: partial fit will work.
Steps:
- Divide the one big data set into several small data sets.
- Use a partial-fit method, which requires only a subset of the complete data set at a time. In scikit-learn, for example, SGDClassifier with hinge loss (a linear SVM) provides partial_fit; the kernel SVC class does not.
- Repeat the previous step for the other subsets.
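The batching idea can be sketched with `numpy.memmap`. In this illustration a small temporary file stands in for a data set that is too big for RAM; the file name, array shape and the `iter_batches` helper are all hypothetical.

```python
import os
import tempfile
import numpy as np

# Write a hypothetical data set to disk (a stand-in for a file too big for RAM).
path = os.path.join(tempfile.mkdtemp(), "features.dat")
n_rows, n_cols = 1000, 8
writer = np.memmap(path, dtype="float32", mode="w+", shape=(n_rows, n_cols))
writer[:] = np.arange(n_rows * n_cols, dtype="float32").reshape(n_rows, n_cols)
writer.flush()
del writer

# Re-open read-only: rows are read from disk only when they are indexed.
data = np.memmap(path, dtype="float32", mode="r", shape=(n_rows, n_cols))

def iter_batches(array, batch_size):
    """Yield consecutive in-memory copies of `batch_size` rows."""
    for start in range(0, array.shape[0], batch_size):
        yield np.array(array[start:start + batch_size])

batch_sizes = [len(batch) for batch in iter_batches(data, batch_size=256)]
```

Only one batch is held in memory at a time, so each batch can be handed to a training step (a network update, or a partial-fit call on an incremental model) before the next one is read.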
What is p-value?
When you perform a hypothesis test in statistics, a p-value can help you determine the strength of your results. A p-value is a number between 0 and 1, and its value denotes the strength of the evidence. The claim which is on trial is called the null hypothesis.
A low p-value (≤ 0.05) indicates strong evidence against the null hypothesis, which means we can reject the null hypothesis. A high p-value (> 0.05) indicates weak evidence against the null hypothesis, which means we fail to reject it. A p-value close to 0.05 is marginal and could go either way. To put it another way,
High p-values: your data are likely with a true null. Low p-values: your data are unlikely with a true null.
What are different ranking algorithms?
Traditional ML algorithms solve a prediction problem (classification or regression) on a single instance at a time. E.g. if you are doing spam detection on email, you will look at all the features associated with that email and classify it as spam or not. The aim of traditional ML is to come up with a class (spam or no-spam) or a single numerical score for that instance.
Learning to Rank (LTR) algorithms solve a ranking problem on a list of items. The aim of LTR is to come up with an optimal ordering of those items. As such, LTR doesn’t care much about the exact score that each item gets, but cares more about the relative ordering among all the items. RankNet, LambdaRank and LambdaMART are all LTR algorithms developed by Chris Burges and his colleagues at Microsoft Research.
- RankNet: The cost function for RankNet aims to minimize the number of inversions in the ranking. RankNet optimizes the cost function using Stochastic Gradient Descent.
- LambdaRank: Burges et al. found that during the RankNet training procedure you don’t need the costs, only the gradients (λ) of the cost with respect to the model score. You can think of these gradients as little arrows attached to each document in the ranked list, indicating the direction we’d like those documents to move. Further, they found that scaling the gradients by the change in NDCG found by swapping each pair of documents gave good results. The core idea of LambdaRank is to use this new cost function for training a RankNet. On experimental datasets, this shows both speed and accuracy improvements over the original RankNet.
- LambdaMART: LambdaMART combines LambdaRank and MART (Multiple Additive Regression Trees). While MART uses gradient boosted decision trees for prediction tasks, LambdaMART uses gradient boosted decision trees with a cost function derived from LambdaRank to solve a ranking task. On experimental datasets, LambdaMART has shown better results than LambdaRank and the original RankNet.
Can you enumerate the various differences between Supervised and Unsupervised Learning?
Answer: Supervised learning is a type of machine learning where a function is inferred from labeled training data. The training data contains a set of training examples.
Unsupervised learning, on the other hand, is a type of machine learning where inferences are drawn from datasets containing input data without labeled responses. Following are the various other differences between the two types of machine learning:
- Algorithms Used – Supervised learning makes use of Decision Trees, K-nearest Neighbor algorithm, Neural Networks, Regression, and Support Vector Machines. Unsupervised learning uses Anomaly Detection, Clustering, Latent Variable Models, and Neural Networks.
- Enables – Supervised learning enables classification and regression, whereas unsupervised learning enables clustering, dimension reduction, and density estimation
- Use – While supervised learning is used for prediction, unsupervised learning finds use in analysis
What do you understand by the Selection Bias? What are its various types?
Answer: Selection bias is typically associated with research that doesn’t have a random selection of participants. It is a type of error that occurs when a researcher decides who is going to be studied. On some occasions, selection bias is also referred to as the selection effect.
In other words, selection bias is a distortion of statistical analysis that results from the sample collecting method. When selection bias is not taken into account, some conclusions made by a research study might not be accurate. Following are the various types of selection bias:
- Sampling Bias – A systematic error caused by a non-random sample of a population, in which some members are less likely to be included than others, resulting in a biased sample.
- Time Interval – A trial might be ended early at an extreme value, usually for ethical reasons, but the extreme value is most likely to be reached by the variable with the most variance, even though all variables have a similar mean.
- Data – Results when specific data subsets are selected to support a conclusion, or when bad data is rejected on arbitrary grounds.
- Attrition – Caused by attrition, i.e. the loss of participants, when trial subjects or tests that didn’t run to completion are discounted.
Please explain the goal of A/B Testing.
Answer: A/B Testing is a form of statistical hypothesis testing for a randomized experiment with two variants, A and B. The goal of A/B Testing is to identify changes to a webpage that maximize or increase the likelihood of an outcome of interest.
A highly reliable method for finding out the best online marketing and promotional strategies for a business, A/B Testing can be employed for testing everything, ranging from sales emails to search ads and website copy.
How will you calculate the Sensitivity of machine learning models?
Answer: In machine learning, Sensitivity is used for validating the accuracy of a classifier, such as Logistic Regression, Random Forest, and SVM. It is also known as REC (recall) or TPR (true positive rate).
Sensitivity can be defined as the ratio of correctly predicted true events to the total number of actual true events, i.e.:
Sensitivity = True Positives / Positives in Actual Dependent Variable
Here, true events are those events that were true as predicted by a machine learning model. The best sensitivity is 1.0 and the worst sensitivity is 0.0.
Could you draw a comparison between overfitting and underfitting?
Answer: In order to make reliable predictions on general untrained data in machine learning and statistics, it is required to fit a (machine learning) model to a set of training data. Overfitting and underfitting are two of the most common modeling errors that occur while doing so.
Following are the various differences between overfitting and underfitting:
- Definition – A statistical model suffering from overfitting describes some random error or noise in place of the underlying relationship. When underfitting occurs, a statistical model or machine learning algorithm fails in capturing the underlying trend of the data.
- Occurrence – When a statistical model or machine learning algorithm is excessively complex, it can result in overfitting. Example of a complex model is one having too many parameters when compared to the total number of observations. Underfitting occurs when trying to fit a linear model to non-linear data.
- Poor Predictive Performance – Although both overfitting and underfitting yield poor predictive performance, the way in which each one of them does so is different. While the overfitted model overreacts to minor fluctuations in the training data, the underfit model under-reacts to even bigger fluctuations.
What do you mean by cluster sampling and systematic sampling?
Answer: When studying the target population spread throughout a wide area becomes difficult and applying simple random sampling becomes ineffective, the technique of cluster sampling is used. A cluster sample is a probability sample, in which each of the sampling units is a collection or cluster of elements.
Following the technique of systematic sampling, elements are chosen from an ordered sampling frame. The list is advanced in a circular fashion. This is done in such a way so that once the end of the list is reached, the same is progressed from the start, or top, again.
Can you compare the validation set with the test set?
Answer: A validation set is a portion of the training data held out for hyperparameter selection and for avoiding overfitting of the machine learning model being developed. In contrast, a test set is meant for evaluating the performance of a trained machine learning model.
What do you understand by linear regression and logistic regression?
Answer: Linear regression is a form of statistical technique in which the score of some variable Y is predicted on the basis of the score of a second variable X, referred to as the predictor variable. The Y variable is known as the criterion variable.
Also known as the logit model, logistic regression is a statistical technique for predicting the binary outcome from a linear combination of predictor variables.
Please explain Recommender Systems along with an application.
Answer: Recommender Systems is a subclass of information filtering systems, meant for predicting the preferences or ratings awarded by a user to some product.
An application of a recommender system is the product recommendations section in Amazon. This section contains items based on the user’s search history and past orders.
What are outlier values and how do you treat them?
Answer: Outlier values, or simply outliers, are data points in statistics that don’t belong to a certain population. An outlier value is an abnormal observation that is very much different from other values belonging to the set.
Identification of outlier values can be done by using univariate or some other graphical analysis method. A few outlier values can be assessed individually, but a large set of outlier values requires substituting them with either the 99th or the 1st percentile values.
There are two popular ways of treating outlier values:
- To change the value so that it can be brought within a range
- To simply remove the value
Note: – Not all extreme values are outlier values.
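The first treatment (bringing values within a range) can be sketched as percentile capping. This is a minimal illustration, not a prescribed method: the function name `cap_outliers`, the 1st/99th percentile defaults, and the sample data are assumptions made for the example.

```python
import statistics

def cap_outliers(values, lower_pct=1, upper_pct=99):
    """Clip values to the given lower and upper percentiles."""
    # quantiles(n=100) returns 99 cut points; index k - 1 is the k-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]

# Hypothetical data: the values 1..100 plus one extreme reading.
data = list(range(1, 101)) + [10_000]
capped = cap_outliers(data)
print(min(capped), max(capped))
```

Capping keeps every row in the dataset, which is usually preferable to removal when the extreme values are genuine observations rather than recording errors.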
Please enumerate the various steps involved in an analytics project.
Answer: Following are the numerous steps involved in an analytics project:
- Understanding the business problem
- Exploring the data and familiarizing with the same
- Preparing the data for modeling by means of detecting outlier values, transforming variables, treating missing values, et cetera
- Running the model and analyzing the result for making appropriate changes or modifications to the model (an iterative step that repeats until the best possible outcome is gained)
- Validating the model using a new dataset
- Implementing the model and tracking the result for analyzing the performance of the same
Could you explain how to define the number of clusters in a clustering algorithm?
Answer: The primary objective of clustering is to group together similar entities in such a way that while entities within a group are similar to each other, the groups remain different from one another.
Generally, the Within Sum of Squares (WSS) is used for explaining the homogeneity within a cluster. For defining the number of clusters in a clustering algorithm, WSS is plotted for a range of cluster counts. The resultant graph is known as the Elbow Curve.
The Elbow Curve contains a point after which the WSS no longer decreases appreciably. This is known as the bending point and represents K in K–Means.
Although the aforementioned is the widely-used approach, another important approach is the Hierarchical clustering. In this approach, dendrograms are created first and then distinct groups are identified from there.
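The elbow approach can be illustrated with a small pure-Python sketch. It assumes one-dimensional data and a deliberately basic k-means with a deterministic start; `kmeans_wss` and the sample points are hypothetical names and values chosen for the example, and a real project would normally use a library implementation.

```python
def kmeans_wss(points, k, iterations=20):
    """Run a basic 1-D k-means and return the within-cluster sum of squares."""
    ordered = sorted(points)
    # Deterministic start: spread the initial centroids evenly over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: (p - centroids[c]) ** 2)
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sum(min((p - c) ** 2 for c in centroids) for p in points)

# Hypothetical data with three well-separated groups.
data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
for k in range(1, 6):
    print(k, round(kmeans_wss(data, k), 3))
```

For this data the WSS drops sharply up to k = 3 and barely changes afterwards, so the bend of the curve sits at three clusters.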
What do you understand by Deep Learning?
Answer: Deep Learning is a paradigm of machine learning that displays a great degree of analogy with the functioning of the human brain. It is based on neural networks with many layers, of which convolutional neural networks (CNN) are one well-known example.
Deep learning has a wide array of uses, ranging from social network filtering to medical image analysis and speech recognition. Although Deep Learning has been present for a long time, it’s only recently that it has gained worldwide acclaim. This is mainly due to:
- An increase in the amount of data generation via various sources
- The growth in hardware resources required for running Deep Learning models
Caffe, Chainer, Keras, Microsoft Cognitive Toolkit, Pytorch, and TensorFlow are some of the most popular Deep Learning frameworks as of today.
Please explain Gradient Descent.
Answer: The degree of change in the output of a function relating to the changes made to the inputs is known as a gradient. In a neural network, it measures the change in error with respect to the change in the weights. A gradient can also be comprehended as the slope of a function.
Gradient Descent refers to descending to the bottom of a valley, as opposed to climbing up a hill. It is a minimization algorithm meant for minimizing a given cost function.
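As a minimal sketch of the idea, the loop below repeatedly steps against the slope of a function. The cost f(x) = (x - 3)^2, the learning rate, and the step count are assumptions picked for illustration only.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Move x against the gradient a fixed number of times."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

# Hypothetical cost f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(round(minimum, 4))
```

A learning rate that is too large makes the updates overshoot the valley, while one that is too small makes the descent slow.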
How does Backpropagation work? Also, state its various variants.
Answer: Backpropagation refers to a training algorithm used for multilayer neural networks. Following the backpropagation algorithm, the error is moved from an end of the network to all weights inside the network. Doing so allows for efficient computation of the gradient.
Backpropagation works in the following way:
- Forward propagation of training data
- The output and target are used for computing derivatives
- Backpropagate for computing the derivative of the error with respect to the output activation
- Reuse the previously calculated derivatives to compute the derivatives for the preceding layers
- Updating the weights
Following are the various variants of Backpropagation, which differ in how much data is used for each gradient computation:
- Batch Gradient Descent – The gradient is calculated for the complete dataset and an update is performed on each iteration
- Mini-batch Gradient Descent – Mini-batch samples are used for calculating the gradient and updating the parameters (a variant of the Stochastic Gradient Descent approach)
- Stochastic Gradient Descent – Only a single training example is used to calculate the gradient and update the parameters
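The three variants differ only in how many examples feed each update, which the sketch below makes explicit through a single `batch_size` argument. The one-weight model y = w * x, the data, and the hyperparameters are assumptions chosen to keep the example small.

```python
import random

def train(xs, ys, batch_size, epochs=200, lr=0.01, seed=0):
    """Fit y = w * x by gradient descent on the mean squared error."""
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of mean((w * x - y)^2) with respect to w over this batch.
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # hypothetical data generated with w = 2

w_batch = train(xs, ys, batch_size=len(xs))  # batch: one update per epoch
w_mini = train(xs, ys, batch_size=2)         # mini-batch: two updates per epoch
w_sgd = train(xs, ys, batch_size=1)          # stochastic: one update per example
print(w_batch, w_mini, w_sgd)
```

On this noise-free data all three settle on the same weight; on real data the smaller batches give noisier but more frequent updates.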
What do you know about Autoencoders?
Answer: Autoencoders are simple learning networks used for transforming inputs into outputs with the minimum possible error. It means that the resulting outputs are very close to the inputs.
A couple of layers are added between the input and the output with the size of each layer smaller than the size pertaining to the input layer. An autoencoder receives unlabeled input that is encoded for reconstructing the output.
Please explain the concept of a Boltzmann Machine.
Answer: A Boltzmann Machine features a simple learning algorithm that enables it to discover interesting features representing complex regularities present in the training data. It is basically used for optimizing the quantity and weight for some given problem.
The simple learning algorithm involved in a Boltzmann Machine is very slow in networks that have many layers of feature detectors.
What are the skills required as a Data Scientist that could help in using Python for data analysis purposes?
Answer: The skills required as a Data Scientist that could help in using Python for data analysis purposes are stated under:
- Expertise in Pandas Dataframes, Scikit-learn, and N-dimensional NumPy Arrays.
- Skills to apply element-wise vector and matrix operations on NumPy arrays.
- Ability to understand built-in data types, including tuples, sets, dictionaries, and various others.
- Familiarity with the Anaconda distribution and the Conda package manager.
- Capability in writing efficient list comprehensions and small, clean functions, and in avoiding traditional for loops.
- Knowledge of profiling Python scripts and optimizing bottlenecks.
What is the full form of GAN? Explain GAN.
Answer: The full form of GAN is Generative Adversarial Network. Its task is to take inputs from a noise vector and send them forward to the Generator and then to the Discriminator, which identifies and differentiates the real and fake inputs.
What are the vital components of GAN?
Answer: There are two vital components of GAN. These include the following:
- Generator: The Generator acts as a forger, which creates fake copies.
- Discriminator: The Discriminator acts as a recognizer that tells fake copies from real ones.
What is the Computational Graph?
Answer: A computational graph is a graphical representation on which TensorFlow is based. It is a wide network of nodes in which each node represents a particular mathematical operation, while the edges between these nodes carry tensors. This is the reason the name TensorFlow refers to a flow of tensors. Because data flows through it in the form of a graph, the computational graph is also called a DataFlow Graph.
What are tensors?
Answer: Tensors are mathematical objects that represent collections of higher-dimensional data, such as alphabets and numerals, each with a rank, that are fed as inputs to the neural network.
Why is TensorFlow considered a high priority in learning Data Science?
Answer: TensorFlow is considered a high priority in learning Data Science because it provides support for programming languages such as C++ and Python. According to this view, it lets various data science processes compile and complete faster, within the stipulated time frame, than with the conventional Keras and Torch libraries. TensorFlow supports computing devices including the CPU and GPU for faster input, editing, and analysis of data.
What is Dropout in Data Science?
Answer: Dropout is a tool in Data Science that is used for dropping out the hidden and visible units of a network on a random basis. It prevents overfitting of the data by dropping a percentage of the nodes during training, at the cost of more iterations being needed for the network to converge.
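A minimal sketch of the mechanism follows. It assumes the common "inverted dropout" variant, in which surviving activations are rescaled so their expected value is unchanged; the function name, the 0.5 rate, and the sample activations are illustrative assumptions, not values from the text.

```python
import random

def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # fixed seed, for repeatability only
hidden = [0.5, 1.5, 2.0, 1.0]
# Dropped units become 0.0; surviving units are divided by the keep probability.
print(dropout(hidden, rate=0.5, rng=rng))
```

Dropout is applied only during training; at inference time all units are kept.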
What is Batch normalization in Data Science?
Answer: Batch Normalization in Data Science is a technique used to improve the performance and stability of a neural network. This is done by normalizing the inputs in each layer so that the mean output activation is 0 and the standard deviation is 1.
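The normalization step itself can be sketched in a few lines. This assumes a single feature across one batch and leaves out the learnable scale and shift parameters that full batch normalization layers add; `batch_normalize`, `eps`, and the sample batch are names and values chosen for the example.

```python
import math

def batch_normalize(values, eps=1e-5):
    """Shift and scale a batch to roughly zero mean and unit standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    # eps guards against division by zero when the batch has no variance.
    return [(v - mean) / math.sqrt(var + eps) for v in values]

batch = [2.0, 4.0, 6.0, 8.0]  # hypothetical activations of one unit
normalized = batch_normalize(batch)
print([round(v, 3) for v in normalized])
```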
What is the difference between Batch and Stochastic Gradient Descent?
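Answer: Batch Gradient Descent computes the gradient over the complete dataset before each update, whereas Stochastic Gradient Descent computes the gradient and updates the parameters using a single training example at a time.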
What are Auto-Encoders?
Answer: Auto-Encoders are learning networks that are meant to transform inputs into outputs with the lowest possible error. They aim to keep the output as close as possible to the input. An Auto-Encoder works by placing layers between the input and the output, and these layers are kept smaller than the input layer for faster processing.
What are the various Machine Learning Libraries and their benefits?
Answer: The various machine learning libraries and their benefits are as follows.
- Numpy: It is used for scientific computation.
- Statsmodels: It is used for time-series analysis.
- Pandas: It is used for tabular data analysis.
- Scikit-learn: It is used for data modeling and pre-processing.
- Tensorflow: It is used for the deep learning process.
- Regular Expressions: It is used for text processing.
- Pytorch: It is used for the deep learning process.
- NLTK: It is used for text processing.
What is an Activation function?
Answer: An Activation function helps in introducing non-linearity into the neural network. This is done to help the network learn complex functions. Without the activation function, the neural network would be able to perform only linear functions and apply linear combinations. The activation function, therefore, lets artificial neurons represent complex functions and combinations when delivering an output based on the inputs.
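The point about linearity can be checked with a tiny sketch: two stacked linear layers collapse into a single linear map, while inserting an activation between them does not. The one-input layers, their weights, and the choice of ReLU are assumptions made purely for illustration.

```python
def linear(w, b):
    """Return a one-input linear layer x -> w * x + b."""
    return lambda x: w * x + b

def relu(x):
    return max(0.0, x)

layer1 = linear(2.0, 1.0)
layer2 = linear(-3.0, 0.5)

def stacked_linear(x):
    # Two linear layers with nothing in between: -3 * (2x + 1) + 0.5 = -6x - 2.5
    return layer2(layer1(x))

def stacked_relu(x):
    # Same layers with a ReLU in between: no single straight line reproduces this.
    return layer2(relu(layer1(x)))

for x in (-2.0, -1.0, 1.0):
    print(x, stacked_linear(x), stacked_relu(x))
```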
What are vanishing gradients?
Answer: Vanishing gradients is a condition in which the slope (gradient) becomes too small during the training of an RNN. The result is poor performance, low accuracy, and long training times.
What are exploding gradients?
Answer: Exploding gradients is a condition in which the errors grow at an exponential or otherwise very high rate during the training of an RNN. The error gradient accumulates and results in very large updates to the neural network, which can cause an overflow and produce NaN values.
What is the full form of LSTM? What is its function?
Answer: LSTM stands for Long Short-Term Memory. It is a recurrent neural network that is capable of learning long-term dependencies and recalling information for long periods as part of its default behavior.
What are the different steps in LSTM?
Answer: The different steps in LSTM include the following.
- Step 1: The network decides what needs to be remembered and what needs to be forgotten.
- Step 2: The selection is made for cell state values that can be updated.
- Step 3: The network decides what can be made part of the current output.
What is Pooling in a CNN?
Answer: Pooling is a method used to reduce the spatial dimensions of a CNN. It performs downsampling operations to reduce dimensionality and create pooled feature maps. Pooling in a CNN works by sliding a filter window over the input matrix.
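As an illustration, the sketch below applies max pooling, one common pooling choice, with a 2x2 window and a stride of 2. The window size, the stride, and the sample feature map are assumptions for the example.

```python
def max_pool_2x2(matrix):
    """Downsample a 2-D list with a 2x2 window and stride 2 (max pooling)."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [max(matrix[r][c], matrix[r][c + 1],
             matrix[r + 1][c], matrix[r + 1][c + 1])
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]

feature_map = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
print(max_pool_2x2(feature_map))  # [[6, 8], [3, 4]]
```

The 4x4 feature map shrinks to 2x2, keeping only the strongest response in each window.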
What is RNN?
Answer: RNN stands for Recurrent Neural Network. RNNs are a type of artificial neural network designed for sequences of data, including stock market data, time series, and various others. Understanding how RNNs work starts with understanding the basics of feedforward nets.
What are the different layers in a CNN?
Answer: There are four different layers in a CNN. These include the following.
- Convolutional Layer: In this layer, several small picture windows are created to go over the data.
- ReLU Layer: This layer helps in bringing non-linearity to the network and converts the negative pixels to zero so that the output becomes a rectified feature map.
- Pooling Layer: This layer reduces the dimensionality of the feature map.
- Fully Connected Layer: This layer recognizes and classifies the objects in the image.
What is an Epoch in Data Science?
Answer: An Epoch in Data Science represents one complete iteration over the entire dataset. It includes everything that is applied to the learning model.
What is a Batch in Data Science?
Answer: A Batch refers to one of the smaller parts into which a dataset is divided so that the information can be passed into the system piece by piece. Batches are used when the developer cannot pass the entire dataset into the neural network at once.
What is the iteration in Data Science? Give an example?
Answer: An Iteration in Data Science is one step within an Epoch in which a single batch of data is processed. The number of iterations is, therefore, the number of batches the data is divided into. For example, when there are 10,000 images and the batch size is 200, the Epoch will run 50 iterations (10,000 / 200).
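The relationship between dataset size, batch size, and iterations is simple arithmetic, sketched below. The figures 10,000 and 200 are used as illustrative values, and rounding up for a final partial batch is an assumption about how the leftover samples are handled.

```python
import math

def iterations_per_epoch(num_samples, batch_size):
    """Number of batches needed to pass over the whole dataset once."""
    return math.ceil(num_samples / batch_size)

print(iterations_per_epoch(10_000, 200))  # 50
print(iterations_per_epoch(1_050, 100))   # 11, the last batch holds only 50 samples
```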
What is the cost function?
Answer: Cost functions are a tool to evaluate how good the model’s performance is. They take into consideration the errors and losses computed at the output layer. During the backpropagation process, these errors are moved backward through the neural network, and various other training functions are applied.
What are hyperparameters?
Answer: A hyperparameter is a kind of parameter whose value is set before the learning process begins, so that the network training requirements can be identified and the structure of the network can be improved. Examples include the number of hidden units, the learning rate, the number of epochs, and various others.
Which skills are important to become a certified Data Scientist?
Answer: The important skills to become a certified Data Scientist include the following:
- Knowledge of built-in data types including lists, tuples, sets, and related.
- Expertise in N-dimensional NumPy Arrays.
- Ability to apply Pandas Dataframes.
- Strong command of element-wise vector operations.
- Knowledge of matrix operations on NumPy arrays.
What is an Artificial Neural Network in Data Science?
Answer: An Artificial Neural Network in Data Science is a specific set of algorithms, inspired by biological neural networks, that adapt to changes in the input so that the best output can be achieved. It helps in generating the best possible results without the need to redesign the output methods.
What is Deep Learning in Data Science?
Answer: Deep Learning in Data Science is the name given to a paradigm of machine learning that shows a great degree of analogy with the functioning of the human brain.
What is Ensemble learning?
Answer: Ensemble learning is the process of combining a diverse set of learners, that is, individual models, with each other. It helps in improving the stability and predictive power of the model.
What are the different kinds of Ensemble learning?
Answer: The different kinds of Ensemble learning include the following.
- Bagging: It trains simple learners on small sample populations and takes the mean of their predictions for estimation purposes.
- Boosting: It is an iterative technique that adjusts the weight of each observation based on the previous classification before the next outcome prediction is made.
What are some of the steps for data wrangling and data cleaning before applying machine learning algorithms?
There are many steps that can be taken when data wrangling and data cleaning. Some of the most common steps are listed below:
- Data profiling: Almost everyone starts off by getting an understanding of their dataset. More specifically, you can look at the shape of the dataset with .shape and a description of your numerical variables with .describe().
- Data visualizations: Sometimes, it’s useful to visualize your data with histograms, boxplots, and scatterplots to better understand the relationships between variables and also to identify potential outliers.
- Syntax errors: This includes making sure there’s no stray white space, making sure letter casing is consistent, and checking for typos. You can check for typos by using .unique() or by using bar graphs.
- Standardization or normalization: Depending on the dataset you’re working with and the machine learning method you decide to use, it may be useful to standardize or normalize your data so that different scales of different variables don’t negatively impact the performance of your model.
- Handling null values: There are a number of ways to handle null values, including deleting rows with null values altogether, replacing null values with the mean/median/mode, replacing null values with a new category (e.g. unknown), predicting the values, or using machine learning models that can deal with null values.
- Other things include: removing irrelevant data, removing duplicates, and type conversion.
How to deal with unbalanced binary classification?
There are a number of ways to handle unbalanced binary classification (assuming that you want to identify the minority class):
- First, you want to reconsider the metrics that you’d use to evaluate your model. The accuracy of your model might not be the best metric to look at, and an example shows why. Let’s say 99 bank withdrawals were not fraudulent and 1 withdrawal was. If your model simply classified every instance as “not fraudulent”, it would have an accuracy of 99%! Therefore, you may want to consider using metrics like precision and recall.
- Another method to improve unbalanced binary classification is by increasing the cost of misclassifying the minority class. By increasing the penalty of such, the model should classify the minority class more accurately.
- Lastly, you can improve the balance of classes by oversampling the minority class or by undersampling the majority class.
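The last option can be sketched as random oversampling, which duplicates minority-class rows until the classes are the same size. This is one simple approach among several; the function name, the fixed seed, and the toy data are assumptions for illustration.

```python
import random

def oversample_minority(features, labels, minority_label, seed=0):
    """Duplicate random minority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority:
        raise ValueError("no minority-class rows to sample from")
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    indices = majority + minority + extra
    rng.shuffle(indices)
    return [features[i] for i in indices], [labels[i] for i in indices]

# Hypothetical data: 8 legitimate withdrawals (0) and 2 fraudulent ones (1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = oversample_minority(X, y, minority_label=1)
print(y_bal.count(0), y_bal.count(1))  # 8 8
```

Resampling should be applied to the training split only, so that the evaluation data keeps the original class balance.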
What is the difference between a box plot and a histogram?
Boxplot vs Histogram
While boxplots and histograms are visualizations used to show the distribution of the data, they communicate information differently.
Histograms are bar charts that show the frequency of a numerical variable’s values and are used to approximate the probability distribution of the given variable. It allows you to quickly understand the shape of the distribution, the variation, and potential outliers.
Boxplots communicate different aspects of the distribution of data. While you can’t see the shape of the distribution through a box plot, you can gather other information like the quartiles, the range, and outliers. Boxplots are especially useful when you want to compare multiple distributions at the same time because they take up less space than histograms.
Describe different regularization methods, such as L1 and L2 regularization?
Both L1 and L2 regularization are methods used to reduce the overfitting of training data. Least Squares minimizes the sum of the squared residuals, which can result in low bias but high variance.
L2 Regularization, also called ridge regression, minimizes the sum of the squared residuals plus lambda times the slope squared. This additional term is called the Ridge Regression Penalty. This increases the bias of the model, making the fit worse on the training data, but also decreases the variance.
If you take the ridge regression penalty and replace it with the absolute value of the slope, then you get Lasso regression or L1 regularization.
L2 is less robust but has a stable solution and always one solution. L1 is more robust but has an unstable solution and can possibly have multiple solutions.
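Written out, the three objectives differ only in the penalty term. The notation below is an assumption introduced for this sketch: y_i are the observed values, \hat{y}_i the model's predictions, \beta_j the slope coefficients, and \lambda the regularization strength.

```latex
\begin{aligned}
\text{Least squares:} \quad & \min_{\beta} \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2 \\
\text{Ridge (L2):} \quad & \min_{\beta} \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \\
\text{Lasso (L1):} \quad & \min_{\beta} \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\end{aligned}
```

Setting \lambda = 0 recovers ordinary least squares in both cases, and larger values of \lambda shrink the coefficients more strongly.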
StatQuest has an amazing video on Lasso and Ridge regression.
Neural Network Fundamentals
A neural network is a multi-layered model inspired by the human brain. Like the neurons in our brain, each circle in the accompanying diagram represents a node. The blue circles represent the input layer, the black circles represent the hidden layers, and the green circles represent the output layer. Each node in the hidden layers represents a function that the inputs go through, ultimately leading to an output in the green circles. The formal term for these functions is the activation function, of which the sigmoid is a common example.
If you want a step by step example of creating a neural network, check out Victor Zhou’s article.
If you’re a visual/audio learner, 3Blue1Brown has an amazing series on neural networks and deep learning on YouTube.
How to define/select metrics?
There isn’t a one-size-fits-all metric. The metric(s) chosen to evaluate a machine learning model depends on various factors:
- Is it a regression or classification task?
- What is the business objective? E.g. precision vs recall
- What is the distribution of the target variable?
There are a number of metrics that can be used, including adjusted R-squared, MAE, MSE, accuracy, recall, precision, F1 score, and the list goes on.
Explain what precision and recall are
Recall attempts to answer “What proportion of actual positives was identified correctly?”
Precision attempts to answer “What proportion of positive identifications was actually correct?”
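Both questions reduce to ratios of confusion-matrix counts, as the sketch below shows. The function name and the counts are hypothetical; the F1 score (the harmonic mean of the two) is included only because it is a common way to combine them.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(p, r, round(f1, 3))  # 0.8 0.5 0.615
```

Here the model is right about most of the positives it flags (precision 0.8) but finds only half of the actual positives (recall 0.5).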
Explain what a false positive and a false negative are. Why is it important to distinguish these from each other? Provide examples when false positives are more important than false negatives, false negatives are more important than false positives and when these two types of errors are equally important
A false positive is an incorrect identification of the presence of a condition when it’s absent.
A false negative is an incorrect identification of the absence of a condition when it’s actually present.
An example of when false negatives are more important than false positives is when screening for cancer. It’s much worse to say that someone doesn’t have cancer when they do, instead of saying that someone does and later realizing that they don’t.
This is a subjective argument, but false positives can be worse than false negatives from a psychological point of view. For example, a false positive for winning the lottery could be a worse outcome than a false negative because people normally don’t expect to win the lottery anyways.
Assume you need to generate a predictive model using multiple regression. Explain how you intend to validate this model
There are two main ways that you can do this:
- A) Adjusted R-squared.
R-squared is a measurement that tells you what proportion of the variance in the dependent variable is explained by the variance in the independent variables. In simpler terms, while the coefficients estimate trends, R-squared represents the scatter around the line of best fit.
However, every additional independent variable added to a model increases the R-squared value (or at best leaves it unchanged), so a model with several independent variables may seem to be a better fit even if it isn’t. This is where adjusted R² comes in. The adjusted R² compensates for each additional independent variable and only increases if each given variable improves the model above what would be expected by chance. This is important since we are creating a multiple regression model.
- B) Cross-Validation
A method familiar to most people is cross-validation, which splits the data into training and testing sets, typically repeating the split several times so that each part of the data is used for testing once. See the answer to the first question for more on this.
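Both validation approaches can be sketched briefly. The adjusted R² function uses the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1); the k-fold helper only produces index splits and assumes contiguous, unshuffled folds. All names and sample figures here are illustrative assumptions.

```python
def adjusted_r_squared(r_squared, n_samples, n_predictors):
    """Penalize R-squared for the number of predictors in the model."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1 - (1 - r_squared) * (n_samples - 1) / dof

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

# Same R-squared, more predictors: the adjusted value drops.
print(round(adjusted_r_squared(0.90, 50, 5), 4))
print(round(adjusted_r_squared(0.90, 50, 10), 4))

for train_idx, test_idx in k_fold_indices(10, 3):
    print(test_idx)
```

In practice the model is refit on each training split and scored on the matching test split, and the scores are averaged.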
What does NLP stand for?
NLP stands for Natural Language Processing. It is a branch of artificial intelligence that gives machines the ability to read and understand human languages.
When would you use random forests Vs SVM and why?
There are a couple of reasons why a random forest can be a better choice of model than a support vector machine:
- Random forests allow you to determine the feature importance. SVMs don’t offer this directly.
- Random forests are much quicker and simpler to build than an SVM.
- For multi-class classification problems, SVMs require a one-vs-rest method, which is less scalable and more memory intensive.
Why is dimension reduction important?
Dimensionality reduction is the process of reducing the number of features in a dataset. This is important mainly in the case when you want to reduce variance in your model (overfitting).
Wikipedia states four advantages of dimensionality reduction:
- It reduces the time and storage space required
- Removal of multi-collinearity improves the interpretation of the parameters of the machine learning model
- It becomes easier to visualize the data when reduced to very low dimensions such as 2D or 3D
- It avoids the curse of dimensionality
What is principal component analysis? Explain the sort of problems you would use PCA for.
In its simplest sense, PCA involves projecting higher-dimensional data (e.g. 3 dimensions) onto a smaller space (e.g. 2 dimensions). This results in a lower dimension of data (2 dimensions instead of 3 dimensions) while keeping all original variables in the model.
PCA is commonly used for compression purposes, to reduce required memory and to speed up the algorithm, as well as for visualization purposes, making it easier to summarize data.
Why is Naive Bayes so bad? How would you improve a spam detection algorithm that uses naive Bayes?
One major drawback of Naive Bayes is that it makes the strong assumption that the features are independent of one another, which is rarely the case.
One way to improve such an algorithm that uses Naive Bayes is by decorrelating the features so that the assumption holds true.
What are the drawbacks of a linear model?
There are a couple of drawbacks of a linear model:
- A linear model holds some strong assumptions that may not be true in application. It assumes a linear relationship, multivariate normality, no or little multicollinearity, no auto-correlation, and homoscedasticity
- A linear model can’t be used for discrete or binary outcomes.
- You can’t vary the model flexibility of a linear model.
Do you think small decision trees are better than a large one? Why?
Another way of asking this question is “Is a random forest a better model than a decision tree?” And the answer is yes because a random forest is an ensemble method that takes many weak decision trees to make a strong learner. Random forests are more accurate, more robust, and less prone to overfitting.
Why is mean square error a bad measure of model performance? What would you suggest instead?
Mean Squared Error (MSE) gives a relatively high weight to large errors, so it tends to put too much emphasis on large deviations. A more robust alternative is MAE (mean absolute error).
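As a minimal sketch of this difference, the snippet below compares the two metrics on made-up predictions. The data values and the helper names `mse` and `mae` are assumptions for illustration only.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets and two hypothetical sets of predictions.
y_true = [10.0, 12.0, 11.0, 13.0]
y_pred_small = [11.0, 11.0, 12.0, 12.0]    # every prediction is off by 1
y_pred_outlier = [10.0, 12.0, 11.0, 17.0]  # one prediction is off by 4

print(mse(y_true, y_pred_small), mae(y_true, y_pred_small))
print(mse(y_true, y_pred_outlier), mae(y_true, y_pred_outlier))
```

Both prediction sets have the same total absolute error, so MAE treats them alike, while MSE penalizes the single large miss more heavily.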
What are the assumptions required for linear regression? What if some of these assumptions are violated?
The assumptions are as follows:
- The sample data used to fit the model is representative of the population
- The relationship between X and the mean of Y is linear
- The variance of the residual is the same for any value of X (homoscedasticity)
- Observations are independent of each other
- For any value of X, Y is normally distributed.
Extreme violations of these assumptions will make the results unreliable. Small violations of these assumptions will result in a greater bias or variance of the estimate.
What is collinearity and what to do with it? How to remove multicollinearity?
Multicollinearity exists when an independent variable is highly correlated with another independent variable in a multiple regression equation. This can be problematic because it undermines the statistical significance of an independent variable.
You could use the Variance Inflation Factor (VIF) to determine if there is any multicollinearity between independent variables. A common rule of thumb is that multicollinearity exists when the VIF exceeds a chosen threshold; values of 5 or 10 are commonly used.
How to check if the regression model fits the data well?
There are a couple of metrics that you can use:
- R-squared/Adjusted R-squared: Relative measure of fit. This was explained in a previous answer.
- F-test: Evaluates the null hypothesis that all regression coefficients are equal to zero vs the alternative hypothesis that at least one doesn’t equal zero.
- RMSE: Absolute measure of fit.
What is a decision tree?
Decision trees are a popular model, used in operations research, strategic planning, and machine learning. Each decision point in the tree is called a node, and the more nodes you have, the more accurate your decision tree will be (generally). The last nodes of the decision tree, where a decision is made, are called the leaves of the tree. Decision trees are intuitive and easy to build but fall short when it comes to accuracy.
What is a random forest? Why is it good?
Random forests are an ensemble learning technique that builds off of decision trees. Random forests involve creating multiple decision trees using bootstrapped datasets of the original data and randomly selecting a subset of variables at each step of the decision tree. The model then selects the mode of all of the predictions of each decision tree. By relying on a “majority wins” model, it reduces the risk of error from an individual tree.
For example, a single decision tree on its own might predict one class, while the mode across all of the decision trees predicts another; the forest goes with the majority. This is the power of random forests.
Random forests offer several other benefits: they perform strongly, can model non-linear boundaries, need no separate cross-validation (out-of-bag samples give an error estimate), and provide feature importance.
What is a kernel? Explain the kernel trick
A kernel is a way of computing the dot product of two vectors x and y in some (possibly very high dimensional) feature space, which is why kernel functions are sometimes called “generalized dot product”.
The kernel trick is a method of using a linear classifier to solve a non-linear problem by transforming linearly inseparable data to linearly separable ones in a higher dimension.
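A small sketch can make the "generalized dot product" idea concrete. It assumes a degree-2 polynomial kernel k(x, y) = (x · y)^2 on 2-dimensional inputs, whose explicit feature map is phi(x) = (x1^2, sqrt(2) x1 x2, x2^2). The function names and sample vectors are illustrative choices, not taken from the text above.

```python
import math


def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit degree-2 feature map for a 2-D input (assumed for illustration)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, y):
    """Degree-2 polynomial kernel: computed entirely in the original 2-D space."""
    return dot(x, y) ** 2


x = (1.0, 2.0)
y = (3.0, 4.0)

# Both routes give the same number, but the kernel never builds phi(x) or phi(y).
print(poly_kernel(x, y))
print(dot(phi(x), phi(y)))
```

The kernel returns the feature-space dot product without ever constructing the higher-dimensional vectors, which is what lets a linear classifier work in that space cheaply.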
Is it beneficial to perform dimensionality reduction before fitting an SVM? Why or why not?
When the number of features is greater than the number of observations, then performing dimensionality reduction will generally improve the SVM.
What is overfitting?
Overfitting is an error where the model ‘fits’ the data too well, resulting in a model with high variance and low bias. As a consequence, an overfit model will inaccurately predict new data points even though it has a high accuracy on the training data.
What is boosting?
Boosting is an ensemble method to improve a model by reducing its bias and variance, ultimately converting weak learners to strong learners. The general idea is to train a weak learner and sequentially iterate and improve the model by learning from the previous learner.
The probability of finding an item at location A is ., and . at location B. What is the probability that the item would be found on the Amazon website?
We need to make some assumptions about this question before we can answer it. Let’s assume that there are two possible places to purchase a particular item on Amazon and the probability of finding it at location A is . and B is .. The probability of finding the item on Amazon can be explained as so:
We can reword the above as P(A) = . and P(B) = .. Furthermore, let’s assume that these are independent events, meaning that the probability of one event is not impacted by the other. We can then use the formula…
P(A or B) = P(A) + P(B) - P(A and B)
P(A or B) = . + . - (. * .)
P(A or B) = .
You randomly draw a coin from coins — unfair coin (head-head), fair coins (head-tail) and roll it times. If the result is heads, what is the probability that the coin is unfair?
This can be answered using the Bayes Theorem. The extended equation for the Bayes Theorem is the following:
Assume that the probability of picking the unfair coin is denoted as P(A) and the probability of flipping 10 heads in a row is denoted as P(B). Then P(B|A) is equal to 1, P(B|¬A) is equal to 0.5^10, and P(¬A) is equal to 1 - P(A).
If you fill in the equation, then P(A|B) = . or .%.
Difference between convex and non-convex cost function; what does it mean when a cost function is non-convex?
A convex function is one where a line drawn between any two points on the graph lies on or above the graph. It has one minimum.
A non-convex function is one where a line drawn between two points on the graph may fall below the graph in between. It is often characterized as “wavy” and can have several local minima.
When a cost function is non-convex, the optimizer may converge to a local minimum instead of the global minimum, which is typically undesired in machine learning models from an optimization perspective.
Walk through the probability fundamentals
Eight rules of probability
- Rule #1: For any event A, 0 ≤ P(A) ≤ 1; in other words, the probability of an event can range from 0 to 1
- Rule #2: The sum of the probabilities of all possible outcomes always equals 1.
- Rule #3: P(not A) = 1 - P(A); This rule explains the relationship between the probability of an event and its complement event. A complement event is one that includes all possible outcomes that aren’t in A.
- Rule #4: If A and B are disjoint events (mutually exclusive), then P(A or B) = P(A) + P(B); this is called the addition rule for disjoint events
- Rule #5: P(A or B) = P(A) + P(B) - P(A and B); this is called the general addition rule.
- Rule #6: If A and B are two independent events, then P(A and B) = P(A) * P(B); this is called the multiplication rule for independent events.
- Rule #7: The conditional probability of event B given event A is P(B|A) = P(A and B) / P(A)
- Rule #8: For any two events A and B, P(A and B) = P(A) * P(B|A); this is called the general multiplication rule
Counting Methods
Factorial Formula: n! = n x (n - 1) x (n - 2) x … x 2 x 1
Use when the number of items is equal to the number of places available.
Eg. Find the total number of ways 5 people can sit in 5 empty seats.
= 5 x 4 x 3 x 2 x 1 = 120
Fundamental Counting Principle (multiplication)
This method should be used when repetitions are allowed and the number of ways to fill an open place is not affected by previous fills.
Eg. There are types of breakfasts, types of lunches, and types of desserts. The total number of combinations is = x x =
Permutations: P(n,r)= n! / (n−r)!
This method is used when replacements are not allowed and order of item ranking matters.
Eg. A code has 4 digits in a particular order and the digits range from 0 to 9. How many permutations are there if one digit can only be used once?
P(n,r) = 10!/(10-4)! = (10x9x8x7x6x5x4x3x2x1)/(6x5x4x3x2x1) = 5040
Combinations Formula: C(n,r)=(n!)/[(n−r)!r!]
This is used when replacements are not allowed and the order in which items are ranked does not matter.
Eg. To win the lottery, you must select the correct numbers in any order from to . What is the number of possible combinations?
C(n,r) = ! / (–)!! = ,,
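The counting formulas above map directly onto functions in Python's standard `math` module (`math.perm` and `math.comb` need Python 3.8 or later). The specific inputs below (5 seats, a 4-digit code over 10 digits, 6 numbers out of 52) are assumed values picked for illustration.

```python
import math

# Factorial: arrange n items in n places (assumed: 5 people, 5 seats).
ways_to_seat = math.factorial(5)

# Permutations P(n, r) = n! / (n - r)!: order matters, no reuse
# (assumed: a 4-digit code drawn from the 10 digits 0-9).
code_permutations = math.perm(10, 4)

# Combinations C(n, r) = n! / ((n - r)! r!): order does not matter
# (assumed: choose 6 numbers out of 52).
lottery_combinations = math.comb(52, 6)

print(ways_to_seat, code_permutations, lottery_combinations)
```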
Describe Markov chains?
Brilliant provides a great definition of Markov chains:
“A Markov chain is a mathematical system that experiences transitions from one state to another according to certain probabilistic rules. The defining characteristic of a Markov chain is that no matter how the process arrived at its present state, the possible future states are fixed. In other words, the probability of transitioning to any particular state is dependent solely on the current state and time elapsed.”
The actual math behind Markov chains requires knowledge of linear algebra and matrices.
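To give a flavor of that math, here is a minimal sketch of a two-state Markov chain in plain Python. The transition probabilities and the state meanings (say, "sunny" and "rainy") are hypothetical; the point is only that the next distribution depends on the current one alone.

```python
def step(distribution, transition):
    """Advance a probability distribution one step: new[j] = sum_i old[i] * T[i][j]."""
    n = len(transition)
    return [
        sum(distribution[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]


# Hypothetical transition matrix: row i holds P(next state = j | current state = i).
transition = [
    [0.9, 0.1],  # from state 0
    [0.5, 0.5],  # from state 1
]

first_step = step([1.0, 0.0], transition)

# Repeated steps settle on a distribution that no longer changes.
distribution = [1.0, 0.0]
for _ in range(50):
    distribution = step(distribution, transition)

print(first_step, distribution)
```

For this particular matrix the long-run distribution approaches (5/6, 1/6), regardless of the starting state.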
A box has red cards and black cards. Another box has red cards and black cards. You want to draw two cards at random from one of the two boxes, one card at a time. Which box has a higher probability of getting cards of the same color and why?
The box with red cards and black cards has a higher probability of getting two cards of the same color. Let’s walk through each step.
Let’s say the first card you draw from each deck is a red Ace.
This means that in the deck with reds and blacks, there’s now reds and blacks. Therefore your odds of drawing another red are equal to /(+) or /.
In the deck with reds and blacks, there would then be reds and blacks. Therefore your odds of drawing another red are equal to /(+) or /.
Since / > /, the second deck with more cards has a higher probability of getting two cards of the same color.
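The same comparison can be checked for any box size. The sketch below computes the exact probability of drawing two same-colored cards without replacement; the box sizes of 12+12 and 24+24 cards are assumed values for illustration, and `prob_same_color` is a hypothetical helper name.

```python
from fractions import Fraction


def prob_same_color(n_red, n_black):
    """P(two cards drawn without replacement share a color)."""
    total = n_red + n_black
    p_both_red = Fraction(n_red, total) * Fraction(n_red - 1, total - 1)
    p_both_black = Fraction(n_black, total) * Fraction(n_black - 1, total - 1)
    return p_both_red + p_both_black


small_box = prob_same_color(12, 12)  # assumed box sizes
large_box = prob_same_color(24, 24)

print(small_box, large_box, large_box > small_box)
```

With equal numbers of each color, the probability is (n - 1) / (2n - 1), which grows toward 1/2 as the box gets larger.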
You are at a casino and have two dice to play with. You win $ every time you roll a 5. If you play till you win and then stop, what is the expected payout?
- Let’s assume that it costs $ every time you want to play.
- There are 36 possible combinations with two dice.
- Of the 36 combinations, there are 4 combinations that result in rolling a five. This means that there is a 4/36 or 1/9 chance of rolling a 5.
- A 1/9 chance of winning means you’ll lose eight times and win once (theoretically).
- Therefore, your expected payout is equal to $. * — $. * = -$..
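The reasoning above can be reproduced by enumerating every outcome of two dice. In this sketch the prize of 10 and the cost of 5 per play are hypothetical amounts, and the expected number of plays uses the geometric-distribution result 1/p.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes of rolling two six-sided dice.
rolls = list(product(range(1, 7), repeat=2))


def prob_sum(target):
    """Exact probability that the two dice sum to `target`."""
    hits = sum(1 for a, b in rolls if a + b == target)
    return Fraction(hits, len(rolls))


p_five = prob_sum(5)

# Playing until the first win takes 1/p plays on average.
expected_plays = 1 / p_five

prize, cost = 10, 5  # hypothetical dollar amounts
expected_payout = prize - cost * expected_plays

print(p_five, expected_plays, expected_payout)
```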
How can you tell if a given coin is biased?
This isn’t a trick question. The answer is simply to perform a hypothesis test:
- The null hypothesis is that the coin is not biased and the probability of flipping heads equals 50% (p = 0.5). The alternative hypothesis is that the coin is biased and p != 0.5.
- Flip the coin a fixed, large number of times.
- Calculate the Z-score (if the sample is smaller than 30, you would calculate the t-statistic instead).
- Compare against alpha (for a two-tailed test, 0.05/2 = 0.025 in each tail).
- If p-value > alpha, the null is not rejected and there is no evidence that the coin is biased.
- If p-value < alpha, the null is rejected and the coin is considered biased.
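The steps above can be sketched as a one-proportion z-test using only the standard library. The counts (280 heads in 500 flips) are hypothetical, and the function returns a two-sided p-value, which is compared against the full alpha (for example 0.05) rather than alpha/2.

```python
import math


def coin_bias_z_test(heads, flips, p0=0.5):
    """Return (z, two-sided p-value) for H0: P(heads) = p0."""
    p_hat = heads / flips
    standard_error = math.sqrt(p0 * (1 - p0) / flips)
    z = (p_hat - p0) / standard_error
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p_value = coin_bias_z_test(heads=280, flips=500)  # hypothetical counts
print(z, p_value)
```

With these assumed counts the p-value falls below 0.05, so the null hypothesis of a fair coin would be rejected.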
Make an unfair coin fair
Since a coin flip is a binary outcome, you can make an unfair coin fair by flipping it twice. If you flip it twice, there are two outcomes that you can bet on: heads followed by tails or tails followed by heads.
P(heads) * P(tails) = P(tails) * P(heads)
This makes sense since each coin toss is an independent event. This means that if you get heads → heads or tails → tails, you would need to reflip the coin.
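This procedure (often attributed to von Neumann) is easy to simulate. The sketch below assumes a coin that lands heads 80% of the time; the bias value, the seed, and the function names are all illustrative.

```python
import random


def biased_flip(rng, p_heads):
    """One flip of a coin that lands heads with probability p_heads."""
    return "H" if rng.random() < p_heads else "T"


def fair_flip(rng, p_heads):
    """Flip twice; keep the first result only when the two flips differ."""
    while True:
        first = biased_flip(rng, p_heads)
        second = biased_flip(rng, p_heads)
        if first != second:
            return first  # HT -> "H", TH -> "T"; HH and TT are re-flipped


rng = random.Random(0)  # fixed seed so the run is repeatable
flips = [fair_flip(rng, p_heads=0.8) for _ in range(20000)]
share_heads = flips.count("H") / len(flips)
print(share_heads)
```

The share of heads should land close to 0.5 even though the underlying coin is heavily biased, at the cost of discarding every HH and TT pair.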
You are about to get on a plane to London, you want to know whether you have to bring an umbrella or not. You call three of your random friends and ask each one of them if it’s raining. The probability that your friend is telling the truth is / and the probability that they are playing a prank on you by lying is /. If all of them tell that it is raining, then what is the probability that it is actually raining in London.
You can tell that this question is related to Bayesian theory because of the last statement which essentially follows the structure, “What is the probability A is true given B is true?” Therefore we need to know the probability of it raining in London on a given day. Let’s assume it’s %.
P(A) = probability of it raining = %
P(B) = probability of all friends say that it’s raining
P(A|B) probability that it’s raining given they’re telling that it is raining
P(B|A) probability that all friends say that it’s raining given it’s raining = (/)³ = /
Step 1: Solve for P(B)
P(A|B) = P(B|A) * P(A) / P(B), can be rewritten as
P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)
P(B) = (/)³ * . + (/)³ * . = .*/ + .*/
Step 2: Solve for P(A|B)
P(A|B) = . * (/) / ( .*/ + .*/)
P(A|B) = / ( + ) = /
Therefore, if all three friends say that it’s raining, then there’s an / chance that it’s actually raining.
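The two steps can be written as a short Bayes' theorem calculation with exact fractions. The inputs here are assumptions for illustration: each friend tells the truth with probability 2/3, and the prior probability of rain is 1/4.

```python
from fractions import Fraction


def posterior(prior, likelihood_if_true, likelihood_if_false):
    """P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|not A) P(not A)]."""
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence


p_truth = Fraction(2, 3)     # assumed: chance a single friend tells the truth
prior_rain = Fraction(1, 4)  # assumed: prior probability of rain

# All three friends say "rain": all truthful if raining, all lying if not.
p_rain = posterior(prior_rain, p_truth ** 3, (1 - p_truth) ** 3)
print(p_rain)
```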
You are given cards with four different colors- Green cards, Red Cards, Blue cards, and Yellow cards. The cards of each color are numbered from one to ten. Two cards are picked at random. Find out the probability that the cards picked are not of the same number and same color.
Since these events are not independent, we can use the rule:
P(A and B) = P(A) * P(B|A) ,which is also equal to
P(not A and not B) = P(not A) * P(not B | not A)
For example:
P(not and not yellow) = P(not ) * P(not yellow | not )
P(not and not yellow) = (/) * (/)
P(not and not yellow) = .
Therefore, the probability that the cards picked are not the same number and the same color is .%.
How do you assess the statistical significance of an insight?
You would perform hypothesis testing to determine statistical significance. First, you would state the null hypothesis and alternative hypothesis. Second, you would calculate the p-value, the probability of obtaining the observed results of a test assuming that the null hypothesis is true. Last, you would set the level of the significance (alpha) and if the p-value is less than the alpha, you would reject the null — in other words, the result is statistically significant.
Explain what a long-tailed distribution is and provide three examples of relevant phenomena that have long tails. Why are they important in classification and regression problems?
Example of a long tail distribution
A long-tailed distribution is a type of heavy-tailed distribution that has a tail (or tails) that drop off gradually and asymptotically.
Practical examples include the power law, the Pareto principle (more commonly known as the 80-20 rule), and product sales (i.e. best-selling products vs others).
It’s important to be mindful of long-tailed distributions in classification and regression problems because the least frequently occurring values make up the majority of the population. This can ultimately change the way that you deal with outliers, and it also conflicts with some machine learning techniques with the assumption that the data is normally distributed.
What is the Central Limit Theorem? Explain it. Why is it important?
Statistics How To provides the best definition of CLT, which is:
“The central limit theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size gets larger no matter what the shape of the population distribution.”
The central limit theorem is important because it is used in hypothesis testing and also to calculate confidence intervals.
What is the statistical power?
‘Statistical power’ refers to the power of a binary hypothesis test, which is the probability that the test rejects the null hypothesis given that the alternative hypothesis is true.
Explain selection bias (with regard to a dataset, not variable selection). Why is it important? How can data management procedures such as missing data handling make it worse?
Selection bias is the phenomenon of selecting individuals, groups or data for analysis in such a way that proper randomization is not achieved, ultimately resulting in a sample that is not representative of the population.
Understanding and identifying selection bias is important because it can significantly skew results and provide false insights about a particular population group.
Types of selection bias include:
- sampling bias: a biased sample caused by non-random sampling
- time interval: selecting a specific time frame that supports the desired conclusion. e.g. conducting a sales analysis near Christmas.
- exposure: includes clinical susceptibility bias, protopathic bias, indication bias.
- data: includes cherry-picking, suppressing evidence, and the fallacy of incomplete evidence.
- attrition: attrition bias is similar to survivorship bias, where only those that ‘survived’ a long process are included in an analysis, or failure bias, where only those that ‘failed’ are included
- observer selection: related to the Anthropic principle, which is a philosophical consideration that any data we collect about the universe is filtered by the fact that, in order for it to be observable, it must be compatible with the conscious and sapient life that observes it.
Handling missing data can make selection bias worse because different methods impact the data in different ways. For example, if you replace null values with the mean of the data, you are adding bias in the sense that you’re assuming that the data is not as spread out as it might actually be.
Provide a simple example of how an experimental design can help answer a question about behavior. How does experimental data contrast with observational data?
Observational data comes from observational studies which are when you observe certain variables and try to determine if there is any correlation.
Experimental data comes from experimental studies which are when you control certain variables and hold them constant to determine if there is any causality.
An example of experimental design is the following: split a group up into two. The control group lives their lives normally. The test group is told to drink a glass of wine every night for days. Then research can be conducted to see how wine affects sleep.
Is mean imputation of missing data acceptable practice? Why or why not?
Mean imputation is the practice of replacing null values in a data set with the mean of the data.
Mean imputation is generally bad practice because it doesn’t take into account feature correlation. For example, imagine we have a table showing age and fitness score and imagine that an eighty-year-old has a missing fitness score. If we took the average fitness score across the whole age range, then the eighty-year-old would appear to have a much higher fitness score than he actually should.
Second, mean imputation reduces the variance of the data and increases bias in our data. This leads to a less accurate model and a narrower confidence interval due to a smaller variance.
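The variance effect is easy to verify on a toy example. The observed values and the number of missing entries below are made up for illustration.

```python
import statistics

observed = [2.0, 4.0, 6.0, 8.0]  # hypothetical non-missing values
n_missing = 2                    # hypothetical number of missing entries

# Mean imputation: every missing entry is replaced by the observed mean.
imputed = observed + [statistics.mean(observed)] * n_missing

print(statistics.mean(observed), statistics.mean(imputed))
print(statistics.pvariance(observed), statistics.pvariance(imputed))
```

The mean is unchanged, but the variance shrinks because the imputed points sit exactly at the center of the data.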
What is an outlier? Explain how you might screen for outliers and what would you do if you found them in your dataset. Also, explain what an inlier is and how you might screen for them and what would you do if you found them in your dataset.
An outlier is a data point that differs significantly from other observations.
Depending on the cause of the outlier, they can be bad from a machine learning perspective because they can worsen the accuracy of a model. If the outlier is caused by a measurement error, it’s important to remove them from the dataset. There are a couple of ways to identify outliers:
Z-score/standard deviations: if we know that 99.7% of data in a data set lie within three standard deviations, then we can calculate the size of one standard deviation, multiply it by 3, and identify the data points that are outside of this range. Likewise, we can calculate the z-score of a given point, and if it’s beyond +/- 3, then it’s an outlier.
Note that there are a few contingencies that need to be considered when using this method: the data must be normally distributed, it is not applicable for small data sets, and the presence of too many outliers can throw off the z-score.
Interquartile Range (IQR): IQR, the concept used to build boxplots, can also be used to identify outliers. The IQR is equal to the difference between the 3rd quartile and the 1st quartile. You can then identify a point as an outlier if it is less than Q1 - 1.5*IQR or greater than Q3 + 1.5*IQR. For normally distributed data, this comes to approximately 2.698 standard deviations.
Other methods include DBScan clustering, Isolation Forests, and Robust Random Cut Forests.
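The first two screening methods can be sketched with the standard `statistics` module (`statistics.quantiles` needs Python 3.8 or later). The data set is hypothetical, and quartile values depend on the interpolation method, so other tools may place the fences slightly differently.

```python
import statistics


def iqr_outliers(data):
    """Points below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]


def zscore_outliers(data, threshold=3.0):
    """Points whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(data)
    std_dev = statistics.pstdev(data)
    return [x for x in data if abs(x - mean) / std_dev > threshold]


# Hypothetical measurements with one suspicious value.
data = [10, 11, 12, 12, 13, 13, 14, 14, 15, 16, 95]

print(iqr_outliers(data))
print(zscore_outliers(data))
```

On a data set this small the extreme value inflates the standard deviation and only barely clears the z-score threshold, which illustrates why the z-score method is less reliable for small samples.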
An inlier is a data observation that lies within the rest of the dataset and is unusual or an error. Since it lies in the dataset, it is typically harder to identify than an outlier and requires external data to identify them. Should you identify any inliers, you can simply remove them from the dataset to address them.
How do you handle missing data? What imputation techniques do you recommend?
There are several ways to handle missing data:
- Delete rows with missing data
- Mean/Median/Mode imputation
- Assigning a unique value
- Predicting the missing values
- Using an algorithm which supports missing values, like random forests
The best method is to delete rows with missing data as it ensures that no bias or variance is added or removed, and ultimately results in a robust and accurate model. However, this is only recommended if there’s a lot of data to start with and the percentage of missing values is low.
You have data on the duration of calls to a call center. Generate a plan for how you would code and analyze these data. Explain a plausible scenario for what the distribution of these durations might look like. How could you test, even graphically, whether your expectations are borne out?
First I would conduct EDA — Exploratory Data Analysis to clean, explore, and understand my data. As part of my EDA, I could compose a histogram of the duration of calls to see the underlying distribution.
My guess is that the duration of calls would follow a lognormal distribution. The reason that I believe it’s positively skewed is that the lower end is limited to 0, since a call can’t last a negative number of seconds. However, on the upper end, there is likely to be a small proportion of calls that are extremely long by comparison.
Lognormal Distribution Example
Explain likely differences between administrative datasets and datasets gathered from experimental studies. What are likely problems encountered with administrative data? How do experimental methods help alleviate these problems? What problem do they bring?
Administrative datasets are typically datasets used by governments or other organizations for non-statistical reasons.
Administrative datasets are usually larger and more cost-efficient than experimental studies. They are also regularly updated assuming that the organization associated with the administrative dataset is active and functioning. At the same time, administrative datasets may not capture all of the data that one may want and may not be in the desired format either. They are also prone to quality issues and missing entries.
You are compiling a report for user content uploaded every month and notice a spike in uploads in October. In particular, a spike in picture uploads. What might you think is the cause of this, and how would you test it?
There are a number of potential reasons for a spike in photo uploads:
- A new feature may have been implemented in October which involves uploading photos and gained a lot of traction by users. For example, a feature that gives the ability to create photo albums.
- Similarly, it’s possible that the process of uploading photos before was not intuitive and was improved in the month of October.
- There may have been a viral social media movement that involved uploading photos that lasted for all of October. Eg. Movember but something more scalable.
- It’s possible that the spike is due to people posting pictures of themselves in costumes for Halloween.
The method of testing depends on the cause of the spike, but you would conduct hypothesis testing to determine if the inferred cause is the actual cause.
Give examples of data that does not have a Gaussian distribution, nor log-normal.
- Any type of categorical data won’t have a gaussian distribution or lognormal distribution.
- Exponential distributions — eg. the amount of time that a car battery lasts or the amount of time until an earthquake occurs.
What is root cause analysis? How to identify a cause vs. a correlation? Give examples
Root cause analysis: a method of problem-solving used for identifying the root cause(s) of a problem.
Correlation measures the relationship between two variables and ranges from -1 to 1. Causation is when a first event appears to have caused a second event. Causation essentially looks at direct relationships while correlation can look at both direct and indirect relationships.
Example: a higher crime rate is associated with higher sales in ice cream in Canada, aka they are positively correlated. However, this doesn’t mean that one causes another. Instead, it’s because both occur more when it’s warmer outside.
You can test for causation using hypothesis testing or A/B testing.
Give an example where the median is a better measure than the mean
When there are a number of outliers that positively or negatively skew the data.
Given two fair dice, what is the probability of getting scores that sum to 4? To 8?
There are 3 combinations of rolling a 4 (1+3, 3+1, 2+2):
P(rolling a 4) = 3/36 = 1/12
There are 5 combinations of rolling an 8 (2+6, 6+2, 3+5, 5+3, 4+4):
P(rolling an 8) = 5/36
What is the Law of Large Numbers?
The Law of Large Numbers is a theory that states that as the number of trials increases, the average of the result will become closer to the expected value.
Eg. the proportion of heads from flipping a fair coin a very large number of times should be closer to 0.5 than the proportion from only a few flips.
How do you calculate the needed sample size?
Formula for margin of error
You can use the margin of error (ME) formula to determine the desired sample size.
- t/z = t/z score used to calculate the confidence interval
- ME = the desired margin of error
- S = sample standard deviation
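Using the symbols defined above, the margin of error for a mean is commonly written ME = (t or z) * S / sqrt(n), which rearranges to n = ((t or z) * S / ME)^2. The sketch below applies that rearrangement; the inputs (z = 1.96, S = 15, ME = 2) are hypothetical.

```python
import math


def required_sample_size(score, std_dev, margin_of_error):
    """Smallest n with score * std_dev / sqrt(n) <= margin_of_error."""
    return math.ceil((score * std_dev / margin_of_error) ** 2)


# Hypothetical inputs: z-score 1.96, sample standard deviation 15, margin of error 2.
n = required_sample_size(score=1.96, std_dev=15.0, margin_of_error=2.0)
print(n)
```

The result is rounded up because a fractional observation is not possible and rounding down would exceed the desired margin of error.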
When you sample, what bias are you inflicting?
Potential biases include the following:
- Sampling bias: a biased sample caused by non-random sampling
- Undercoverage bias: some members of the population are inadequately represented in the sample
- Survivorship bias: error of overlooking observations that did not make it past a form of selection process.
How do you control for biases?
There are many things that you can do to control and minimize bias. Two common things include randomization, where participants are assigned by chance, and random sampling, sampling in which each member has an equal probability of being chosen.
What are confounding variables?
A confounding variable, or a confounder, is a variable that influences both the dependent variable and the independent variable, causing a spurious association, a mathematical relationship in which two or more variables are associated but not causally related.
What is A/B testing?
A/B testing is a form of two-sample hypothesis testing that compares two versions, the control and the variant, of a single variable. It is commonly used to improve and optimize user experience and marketing.
How do you prove that males are on average taller than females by knowing just gender height?
You can use hypothesis testing to prove that males are taller on average than females.
The null hypothesis would state that males and females are the same height on average, while the alternative hypothesis would state that the average height of males is greater than the average height of females.
Then you would collect a random sample of heights of males and females and use a t-test to determine if you reject the null or not.
Infection rates at a hospital above a infection per person-days at risk are considered high. A hospital had infections over the last person-days at risk. Give the p-value of the correct one-sided test of whether the hospital is below the standard.
Since we are looking at the number of events (# of infections) occurring within a given timeframe, this is a Poisson distribution question.
The probability of observing k events in an interval
Null (H0): infection per person-days
Alternative (H1): > infection per person-days
k (actual) = infections
lambda (theoretical) = (/)*
p = . or .% calculated using .poisson() in excel or ppois in R
Since p-value < alpha (assuming % level of significance), we reject the null and conclude that the hospital is below the standard.
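For readers without Excel or R, the same lower-tail Poisson probability can be computed directly. The figures below (a standard of 1 infection per 100 person-days, 1787 person-days of exposure, 10 observed infections) are assumed values for illustration.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_cdf(k, lam):
    """P(X <= k): sum of the pmf from 0 through k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


# Hypothetical inputs.
rate_per_person_day = 1 / 100
person_days = 1787
observed_infections = 10

lam = rate_per_person_day * person_days          # expected infections under the standard
p_value = poisson_cdf(observed_infections, lam)  # chance of seeing this few or fewer
print(lam, p_value)
```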
You flip a biased coin (p(head)=.) five times. What’s the probability of getting three or more heads?
Use the General Binomial Probability formula to answer this question:
General Binomial Probability Formula
p = .
n = 5
k = 3, 4, 5
P(3 or more heads) = P(3 heads) + P(4 heads) + P(5 heads) = . or %
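The binomial formula referred to above is P(X = k) = C(n, k) * p^k * (1 - p)^(n - k). The sketch below sums it for k = 3, 4, 5 with n = 5; the head probability of 0.8 is an assumed value for illustration.

```python
import math


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


p_heads = 0.8  # assumed bias of the coin
p_three_or_more = sum(binomial_pmf(k, 5, p_heads) for k in (3, 4, 5))
print(p_three_or_more)
```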
A random variable X is normal with mean and a standard deviation . Calculate P(X>)
Using Excel…
p = 1 - norm.dist(x, mean, standard_dev, true)
p= .
Consider the number of people that show up at a bus station is Poisson with mean ./h. What is the probability that at most three people show up in a four hour period?
x = 3
mean = .* =
using Excel…
p = poisson.dist(,,true)
p = .
An HIV test has a sensitivity of .% and a specificity of .%. A subject from a population of prevalence .% receives a positive test result. What is the precision of the test (i.e the probability he is HIV positive)?
Equation for Precision (PV)
Precision = Positive Predictive Value = PV
PV = (prevalence * sensitivity) / [(prevalence * sensitivity) + ((1 - prevalence) * (1 - specificity))]
PV = . or .%
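The precision calculation is Bayes' theorem applied to a diagnostic test: true positives divided by all positives. In the sketch below the sensitivity, specificity, and prevalence are assumed values for illustration.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(condition present | positive test result)."""
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)


# Assumed inputs: 99.7% sensitivity, 98.5% specificity, 0.1% prevalence.
ppv = positive_predictive_value(0.997, 0.985, 0.001)
print(ppv)
```

Even with a highly accurate test, a very low prevalence keeps the precision low because false positives from the large healthy group outnumber true positives.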
You are running for office and your pollster polled hundred people. Sixty of them claimed they will vote for you. Can you relax?
- Assume that there’s only you and one other opponent.
- Also, assume that we want a % confidence interval. This gives us a z-score of ..
Confidence interval formula
p-hat = 60/100 = 0.6
z* = .
n = 100
This gives us a confidence interval of [.,.]. Therefore, given a confidence interval of %, if you are okay with the worst scenario of tying then you can relax. Otherwise, you cannot relax until you got out of to claim yes.
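The interval comes from the usual normal-approximation formula for a proportion, p-hat +/- z* * sqrt(p-hat * (1 - p-hat) / n). The sketch below uses the poll counts from the question and assumes z* = 1.96, the value that corresponds to a 95% confidence level.

```python
import math


def proportion_ci(successes, n, z):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 60 of 100 respondents; z = 1.96 is an assumed 95% confidence level.
low, high = proportion_ci(successes=60, n=100, z=1.96)
print(low, high)
```

Under that assumption the lower bound sits just above 0.5, which is why the margin over a tie is so thin.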
Geiger counter records radioactive decays in minutes. Find an approximate % interval for the number of decays per hour.
- Since this is a Poisson distribution question, mean = lambda = variance, which also means that standard deviation = square root of the mean
- a % confidence interval implies a z score of .
- one standard deviation =
Therefore the confidence interval = +/- . = [., .]
The homicide rate in Scotland fell last year to from the year before. Is this reported change really noteworthy?
- Since this is a Poisson distribution question, mean = lambda = variance, which also means that standard deviation = square root of the mean
- a % confidence interval implies a z score of .
- one standard deviation = sqrt() = .
Therefore the confidence interval = +/- . = [., .]. Since is within this confidence interval, we can assume that this change is not very noteworthy.
Consider influenza epidemics for two-parent heterosexual families. Suppose that the probability is % that at least one of the parents has contracted the disease. The probability that the father has contracted influenza is % while the probability that both the mother and father have contracted the disease is %. What is the probability that the mother has contracted influenza?
Using the General Addition Rule in probability:
P(mother or father) = P(mother) + P(father) - P(mother and father)
P(mother) = P(mother or father) + P(mother and father) - P(father)
P(mother) = . + .–.
P(mother) = .
Suppose that diastolic blood pressures (DBPs) for men aged – are normally distributed with a mean of (mm Hg) and a standard deviation of . About what is the probability that a random – year old has a DBP less than ?
Since is one standard deviation below the mean, take the area of the Gaussian distribution to the left of one standard deviation.
= . + . = .%
In a population of interest, a sample of men yielded a sample average brain volume of ,cc and a standard deviation of cc. What is a % Student’s T confidence interval for the mean brain volume in this new population?
Confidence interval for sample
Given a confidence level of % and degrees of freedom equal to , the t-score = .
Confidence interval = +/- .*(/)
Confidence interval = [., .]
A diet pill is given to subjects over six weeks. The average difference in weight (follow up — baseline) is – pounds. What would the standard deviation of the difference in weight have to be for the upper endpoint of the % T confidence interval to touch ?
Upper bound = mean + t-score*(standard deviation/sqrt(sample size))
= – + .*(s/)
= . * s /
s = .
Therefore the standard deviation would have to be at least approximately . for the upper bound of the % T confidence interval to touch .
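The algebra above (solving the upper-bound formula for s) can be sketched as follows. The helper name and all inputs are hypothetical: a mean difference of -3, 16 subjects, and a t-score of 2.131 (15 degrees of freedom, two-sided 95%) are assumptions for illustration only.

```python
import math


def sd_for_upper_bound(mean_diff, upper_bound, sample_size, t_score):
    """Solve upper_bound = mean_diff + t_score * s / sqrt(n) for s."""
    return (upper_bound - mean_diff) * math.sqrt(sample_size) / t_score


# Hypothetical inputs, not the figures from the question.
s = sd_for_upper_bound(mean_diff=-3.0, upper_bound=0.0,
                       sample_size=16, t_score=2.131)

# Plugging s back into the upper-bound formula should return the target (0).
check = -3.0 + 2.131 * s / math.sqrt(16)
print(round(s, 3), round(check, 6))
```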
In a study of emergency room waiting times, investigators consider a new and the standard triage systems. To test the systems, administrators selected nights and randomly assigned the new triage system to be used on nights and the standard system on the remaining nights. They calculated the nightly median waiting time (MWT) to see a physician. The average MWT for the new system was hours with a variance of . while the average MWT for the old system was hours with a variance of .. Consider the % confidence interval estimate for the differences of the mean MWT associated with the new system. Assume a constant variance. What is the interval? Subtract in this order (New System — Old System).
Confidence Interval = mean +/- t-score * standard error (see above)
mean = new mean — old mean = – = –
t-score = . given df= (–) and confidence interval of %
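standard error = sqrt(((n_new - 1)*s_new² + (n_old - 1)*s_old²)/(n_new + n_old - 2)) * sqrt(1/n_new + 1/n_old), where s² is each system’s variance and n is its number of nights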
standard error = .
confidence interval = [-., -.]
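Here is a minimal sketch of the pooled-variance interval used above. The t-score is passed in rather than computed, and every number in the example (means of 10 and 12, variances of 0.5 and 0.7, 10 observations per group, t = 2.101 for 18 degrees of freedom) is a hypothetical value, not data from the question.

```python
import math


def pooled_interval(mean_a, var_a, n_a, mean_b, var_b, n_b, t_score):
    """Confidence interval for mean_a - mean_b assuming a common variance."""
    # Pooled variance: weight each group's variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_error = math.sqrt(pooled_var) * math.sqrt(1 / n_a + 1 / n_b)
    diff = mean_a - mean_b
    return diff - t_score * std_error, diff + t_score * std_error


# Hypothetical inputs; 2.101 is the two-sided 95% t-score for df = 18.
low, high = pooled_interval(10.0, 0.5, 10, 12.0, 0.7, 10, t_score=2.101)
print(round(low, 3), round(high, 3))
```

If both endpoints are negative, as in this made-up example, the interval lies entirely below zero, which supports a lower mean for the first group.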
To further test the hospital triage system, administrators selected nights and randomly assigned a new triage system to be used on nights and a standard system on the remaining nights. They calculated the nightly median waiting time (MWT) to see a physician. The average MWT for the new system was hours with a standard deviation of . hours while the average MWT for the old system was hours with a standard deviation of hours. Consider the hypothesis of a decrease in the mean MWT associated with the new treatment. What does the % independent group confidence interval with unequal variances suggest vis a vis this hypothesis? (Because there’s so many observations per group, just use the Z quantile instead of the T.)
Assuming we subtract in this order (New System - Old System):
confidence interval formula for two independent samples
mean = new mean — old mean = – = –
z-score = . confidence interval of %
standard error = sqrt(((n_new - 1)*s_new² + (n_old - 1)*s_old²)/(n_new + n_old - 2)) * sqrt(1/n_new + 1/n_old)
standard error = .
lower bound = -–.*. = -.
upper bound = -+.*. = -.
confidence interval = [-., -.]
Write a SQL query to get the second highest salary from the Employee table. For example, given the Employee table below, the query should return as the second highest salary. If there is no second highest salary, then the query should return null.
+----+--------+
| Id | Salary |
+----+--------+
|    |        |
|    |        |
|    |        |
+----+--------+
SOLUTION A: Using IFNULL, OFFSET
- IFNULL(expression, alt): returns expression if it is not null, otherwise returns alt. We’ll use this to return null if there’s no second-highest salary.
- OFFSET: used together with LIMIT (after ORDER BY) to skip the first n rows of the result. This is useful here because we want the second row (the second-highest salary).
SELECT
    IFNULL(
        (SELECT DISTINCT Salary
         FROM Employee
         ORDER BY Salary DESC
         LIMIT 1 OFFSET 1
        ), NULL) AS SecondHighestSalary
FROM Employee
LIMIT 1
SOLUTION B: Using MAX()
This query says to choose the largest salary that isn’t equal to the overall MAX salary, which is equivalent to choosing the second-highest salary!
SELECT MAX(salary) AS SecondHighestSalary
FROM Employee
WHERE salary != (SELECT MAX(salary) FROM Employee)
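To try both approaches without a database server, here is a small sketch using Python's built-in sqlite3 module. SQLite stands in for the SQL dialect above (it also supports IFNULL and LIMIT ... OFFSET), and the three salaries are hypothetical sample rows. The outer "FROM Employee LIMIT 1" is left out of the first query here so that it still returns a row when the table is empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Salary INTEGER)")
# Hypothetical sample rows.
conn.executemany("INSERT INTO Employee (Id, Salary) VALUES (?, ?)",
                 [(1, 100), (2, 200), (3, 300)])

OFFSET_QUERY = """
SELECT IFNULL(
    (SELECT DISTINCT Salary FROM Employee
     ORDER BY Salary DESC LIMIT 1 OFFSET 1),
    NULL) AS SecondHighestSalary
"""

MAX_QUERY = """
SELECT MAX(Salary) AS SecondHighestSalary
FROM Employee
WHERE Salary != (SELECT MAX(Salary) FROM Employee)
"""


def second_highest(query):
    """Run one of the queries and return its single value (None for NULL)."""
    return conn.execute(query).fetchone()[0]


print(second_highest(OFFSET_QUERY), second_highest(MAX_QUERY))
```

Both queries return NULL (None in Python) when the table holds fewer than two distinct salaries.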
Write a SQL query to find all duplicate emails in a table named Person.
+----+---------+
| Id | Email   |
+----+---------+
|    | a@b.com |
|    | c@d.com |
|    | a@b.com |
+----+---------+
SOLUTION A: COUNT() in a Subquery
First, a subquery counts how often each email appears. Then the subquery is filtered WHERE the count is greater than 1.
SELECT Email
FROM (
    SELECT Email, COUNT(Email) AS count
    FROM Person
    GROUP BY Email
) AS email_count
WHERE count > 1
SOLUTION B: HAVING Clause
- HAVING is a clause that lets you filter on aggregates after a GROUP BY, much as WHERE filters individual rows.
SELECT Email
FROM Person
GROUP BY Email
HAVING COUNT(Email) > 1
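Both variants can be checked side by side with Python's built-in sqlite3 module. This is a sketch only: SQLite stands in for the dialect above, the Ids 1 to 3 are assumed, and the subquery alias is renamed to cnt to avoid reusing the function name count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (Id INTEGER PRIMARY KEY, Email TEXT)")
# Emails from the sample table; the Ids are assumed.
conn.executemany("INSERT INTO Person (Id, Email) VALUES (?, ?)",
                 [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")])

SUBQUERY_VERSION = """
SELECT Email
FROM (
    SELECT Email, COUNT(Email) AS cnt
    FROM Person
    GROUP BY Email
) AS email_count
WHERE cnt > 1
"""

HAVING_VERSION = """
SELECT Email
FROM Person
GROUP BY Email
HAVING COUNT(Email) > 1
"""

duplicates_subquery = [row[0] for row in conn.execute(SUBQUERY_VERSION)]
duplicates_having = [row[0] for row in conn.execute(HAVING_VERSION)]
print(duplicates_subquery, duplicates_having)
```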
Given a Weather table, write a SQL query to find all dates’ Ids with higher temperature compared to its previous (yesterday’s) dates.
+---------+------------------+------------------+
| Id(INT) | RecordDate(DATE) | Temperature(INT) |
+---------+------------------+------------------+
|         |                  |                  |
|         |                  |                  |
|         |                  |                  |
|         |                  |                  |
+---------+------------------+------------------+
- DATEDIFF calculates the difference between two dates and is used to make sure we’re comparing today’s temperature to yesterday’s temperature.
In plain English, the query is saying, Select the Ids where the temperature on a given day is greater than the temperature yesterday.
SELECT DISTINCT a.Id
FROM Weather a, Weather b
WHERE a.Temperature > b.Temperature
AND DATEDIFF(a.RecordDate, b.RecordDate) = 1
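The same self-join can be tried locally with Python's sqlite3 module. Note the swap: SQLite has no DATEDIFF function, so this sketch uses julianday() differences instead, which is an assumption of the example rather than the query above. The four rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Weather (Id INTEGER PRIMARY KEY, RecordDate TEXT, Temperature INTEGER)"
)
# Hypothetical rows: temperature rises on the 2nd and 4th day.
conn.executemany(
    "INSERT INTO Weather (Id, RecordDate, Temperature) VALUES (?, ?, ?)",
    [(1, "2015-01-01", 10), (2, "2015-01-02", 25),
     (3, "2015-01-03", 20), (4, "2015-01-04", 30)],
)

# julianday(a) - julianday(b) = 1 plays the role of DATEDIFF(a, b) = 1.
QUERY = """
SELECT DISTINCT a.Id
FROM Weather a, Weather b
WHERE a.Temperature > b.Temperature
  AND julianday(a.RecordDate) - julianday(b.RecordDate) = 1
ORDER BY a.Id
"""

warmer_ids = [row[0] for row in conn.execute(QUERY)]
print(warmer_ids)
```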
The Employee table holds all employees. Every employee has an Id, a salary, and there is also a column for the department Id.
+----+-------+--------+--------------+
| Id | Name  | Salary | DepartmentId |
+----+-------+--------+--------------+
|    | Joe   |        |              |
|    | Jim   |        |              |
|    | Henry |        |              |
|    | Sam   |        |              |
|    | Max   |        |              |
+----+-------+--------+--------------+
The Department table holds all departments of the company.
+----+----------+
| Id | Name     |
+----+----------+
|    | IT       |
|    | Sales    |
+----+----------+
Write a SQL query to find employees who have the highest salary in each of the departments. For the above tables, your SQL query should return the following rows (order of rows does not matter).
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT         | Max      |        |
| IT         | Jim      |        |
| Sales      | Henry    |        |
+------------+----------+--------+
- The IN clause is shorthand for multiple OR conditions in a WHERE statement. For example, WHERE country = 'Canada' OR country = 'USA' is the same as WHERE country IN ('Canada', 'USA').
- In this case, we first find the highest Salary per department (i.e. per DepartmentId) in the Employee table. Then we join the two tables and keep only the rows WHERE the (DepartmentId, Salary) pair is in that filtered result.
SELECT
    Department.Name AS 'Department',
    Employee.Name AS 'Employee',
    Salary
FROM Employee
INNER JOIN Department ON Employee.DepartmentId = Department.Id
WHERE (DepartmentId, Salary)
    IN
    ( SELECT
          DepartmentId, MAX(Salary)
      FROM
          Employee
      GROUP BY DepartmentId
    )
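Here is a sketch of the same query run through Python's sqlite3 module. It assumes a SQLite build that supports row values in IN (3.15 or newer), and the Ids and salaries are hypothetical values picked so that Max and Jim tie in IT and Henry leads Sales, matching the expected output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Employee (
    Id INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER, DepartmentId INTEGER
);
-- Hypothetical Ids and salaries.
INSERT INTO Department VALUES (1, 'IT'), (2, 'Sales');
INSERT INTO Employee VALUES
    (1, 'Joe', 70000, 1), (2, 'Jim', 90000, 1), (3, 'Henry', 80000, 2),
    (4, 'Sam', 60000, 2), (5, 'Max', 90000, 1);
""")

QUERY = """
SELECT Department.Name AS Department, Employee.Name AS Employee, Salary
FROM Employee
INNER JOIN Department ON Employee.DepartmentId = Department.Id
WHERE (DepartmentId, Salary) IN (
    SELECT DepartmentId, MAX(Salary)
    FROM Employee
    GROUP BY DepartmentId
)
"""

# A set is used because the order of rows does not matter.
top_earners = set(conn.execute(QUERY).fetchall())
print(sorted(top_earners))
```

Because the filter compares (DepartmentId, Salary) pairs, ties within a department are all returned.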
Mary is a teacher in a middle school and she has a table seat storing students’ names and their corresponding seat ids. The column id is a continuous increment. Mary wants to change seats for the adjacent students.
Can you write a SQL query to output the result for Mary?
+----+---------+
| id | student |
+----+---------+
|    | Abbot   |
|    | Doris   |
|    | Emerson |
|    | Green   |
|    | Jeames  |
+----+---------+
For the sample input, the output is:
+----+---------+
| id | student |
+----+---------+
|    | Doris   |
|    | Abbot   |
|    | Green   |
|    | Emerson |
|    | Jeames  |
+----+---------+
Note:
If the number of students is odd, there is no need to change the last one’s seat.
- Think of a CASE WHEN THEN statement like an IF statement in coding.
- The first WHEN statement checks whether the number of rows is odd and, if it is, ensures that the last id does not change.
- The second WHEN statement adds 1 to each odd id (e.g. 1, 3, 5 becomes 2, 4, 6).
- Similarly, the ELSE branch subtracts 1 from each even id (2, 4, 6 becomes 1, 3, 5).
SELECT
    CASE
        WHEN ((SELECT MAX(id) FROM seat) % 2 = 1) AND id = (SELECT MAX(id) FROM seat) THEN id
        WHEN id % 2 = 1 THEN id + 1
        ELSE id - 1
    END AS id, student
FROM seat
ORDER BY id
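To see the swap in action, here is a sketch using Python's sqlite3 module with the five students from the sample table. The ids are assumed to run from 1 to 5, since the problem only says the id column is a continuous increment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seat (id INTEGER PRIMARY KEY, student TEXT)")
# Ids 1..5 are an assumption; the names come from the sample table.
conn.executemany(
    "INSERT INTO seat (id, student) VALUES (?, ?)",
    [(1, "Abbot"), (2, "Doris"), (3, "Emerson"), (4, "Green"), (5, "Jeames")],
)

QUERY = """
SELECT
    CASE
        -- Odd number of rows: the last student keeps their seat.
        WHEN (SELECT MAX(id) FROM seat) % 2 = 1
             AND id = (SELECT MAX(id) FROM seat) THEN id
        -- Odd ids move one seat up, even ids move one seat down.
        WHEN id % 2 = 1 THEN id + 1
        ELSE id - 1
    END AS id, student
FROM seat
ORDER BY id
"""

swapped = conn.execute(QUERY).fetchall()
print(swapped)
```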
If there are 8 marbles of equal weight and 1 marble that weighs a little bit more (for a total of 9 marbles), how many weighings are required to determine which marble is the heaviest?
Two weighings would be required (see part A and B above):
- You would split the nine marbles into three groups of three and weigh two of the groups. If the scale balances, you know that the heavy marble is in the third group of marbles. Otherwise, you’ll take the group that weighs more.
- Then you would exercise the same step, but you’d have three groups of one marble instead of three groups of three.
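The two steps above can be simulated with a short sketch. The function name and the weights (10 for a normal marble, 11 for the heavy one) are hypothetical, and the code assumes the number of marbles is a power of three so that each split gives three equal groups.

```python
def find_heavy_marble(weights):
    """Return (index of the heavy marble, number of weighings used)."""
    candidates = list(range(len(weights)))
    weighings = 0
    while len(candidates) > 1:
        third = len(candidates) // 3
        left = candidates[:third]
        right = candidates[third:2 * third]
        rest = candidates[2 * third:]
        # One use of the balance scale: compare the first two groups.
        weighings += 1
        left_weight = sum(weights[i] for i in left)
        right_weight = sum(weights[i] for i in right)
        if left_weight == right_weight:
            candidates = rest      # scale balances: heavy marble was set aside
        elif left_weight > right_weight:
            candidates = left
        else:
            candidates = right
    return candidates[0], weighings


# Hypothetical weights: nine marbles, the one at index 5 is heavier.
marbles = [10] * 9
marbles[5] = 11
print(find_heavy_marble(marbles))
```

Each weighing cuts the candidates to a third, which is why nine marbles need two weighings.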
How would the change of prime membership fee affect the market?
I’m not 100% sure about the answer to this question but will give my best shot!
Let’s take the instance where there’s an increase in the prime membership fee — there are two parties involved, the buyers and the sellers.
For the buyers, the impact of an increase in the prime membership fee ultimately depends on their price elasticity of demand. If the price elasticity is high, then a given increase in price will result in a large drop in demand, and vice versa. Buyers that continue to purchase a membership are likely Amazon’s most loyal and active customers, and they are also likely to place a higher emphasis on products with prime.
Sellers will take a hit, as there is now a higher cost of purchasing Amazon’s basket of products. That being said, some products will take a harder hit while others may not be impacted. It is likely that premium products that Amazon’s most loyal customers purchase would not be affected as much, like electronics.
If a PM says that they want to double the number of ads in Newsfeed, how would you figure out if this is a good idea or not?
You can perform an A/B test by splitting the users into two groups: a control group with the normal number of ads and a test group with double the number of ads. Then you would choose the metric that defines what a “good idea” is. For example, the null hypothesis could be that doubling the number of ads has no impact on the time spent on Facebook, and the alternative hypothesis that doubling the number of ads reduces the time spent on Facebook. However, you can choose a different metric, like the number of active users or the churn rate. Then you would conduct the test and determine its statistical significance to decide whether or not to reject the null.
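As a rough illustration of the last step, here is a sketch of a two-sample z-test on time spent, using only the Python standard library. The function name and both samples are hypothetical, and a z-test is just one reasonable choice for large groups; a real experiment would also need a pre-chosen significance level and sample size.

```python
import math
from statistics import NormalDist, mean, stdev


def two_sample_z_test(control, treatment):
    """Return (z, two-sided p-value) for the difference in means."""
    standard_error = math.sqrt(
        stdev(control) ** 2 / len(control)
        + stdev(treatment) ** 2 / len(treatment)
    )
    z = (mean(treatment) - mean(control)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical minutes per user per day in each group.
control = [30, 32, 28, 31, 29] * 20      # normal number of ads
treatment = [27, 29, 25, 28, 26] * 20    # double the number of ads
z, p_value = two_sample_z_test(control, treatment)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value together with a negative z would suggest that the test group spends less time on the platform than the control group.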
What is: lift, KPI, robustness, model fitting, design of experiments, 80/20 rule?
Lift: lift is a measure of the performance of a targeting model measured against a random choice targeting model; in other words, lift tells you how much better your model is at predicting things than if you had no model.
KPI: stands for Key Performance Indicator, which is a measurable metric used to determine how well a company is achieving its business objectives, e.g. error rate.
Robustness: generally, robustness refers to a system’s ability to handle variability and remain effective.
Model fitting: refers to how well a model fits a set of observations.
Design of experiments: also known as DOE, it is the design of any task that aims to describe and explain the variation of information under conditions that are hypothesized to reflect the variable. In essence, an experiment aims to predict an outcome based on a change in one or more inputs (independent variables).
80/20 rule: also known as the Pareto principle; states that 80% of the effects come from 20% of the causes, e.g. 80% of sales come from 20% of customers.
Define quality assurance, six sigma.
Quality assurance: an activity or set of activities focused on maintaining a desired level of quality by minimizing mistakes and defects.
Six sigma: a specific type of quality assurance methodology composed of a set of techniques and tools for process improvement. A six-sigma process is one in which 99.99966% of all outcomes are free of defects.
If % of Facebook users on iOS use Instagram, but only % of Facebook users on Android use Instagram, how would you investigate the discrepancy?
There are a number of possible variables that can cause such a discrepancy that I would check to see:
- The demographics of iOS and Android users might differ significantly. For example, according to Hootsuite, % of females use Instagram as opposed to % of men. If the proportion of female users for iOS is significantly larger than for Android then this can explain the discrepancy (or at least a part of it). This can also be said for age, race, ethnicity, location, etc…
- Behavioural factors can also have an impact on the discrepancy. If iOS users use their phones more heavily than Android users, it’s more likely that they’ll indulge in Instagram and other apps than someone who spent significantly less time on their phones.
- Another possible factor to consider is how Google Play and the App Store differ. For example, if Android users have significantly more apps (and social media apps) to choose from, that may cause greater dilution of users.
- Lastly, any differences in the user experience can deter Android users from using Instagram compared to iOS users. If the app is more buggy for Android users than iOS users, they’ll be less likely to be active on the app.
Likes/user and minutes spent on a platform are increasing but the total number of users is decreasing. What could be the root cause of it?
Generally, you would want to probe the interviewer for more information but let’s assume that this is the only information that he/she is willing to give.
Focusing on likes per user, there are two reasons why this would have gone up. The first reason is that the engagement of users has generally increased on average over time — this makes sense because as time passes, active users are more likely to be loyal users as using the platform becomes a habitual practice. The other reason why likes per user would increase is that the denominator, the total number of users, is decreasing. Assuming that users that stop using the platform are inactive users, aka users with little engagement and fewer likes than average, this would increase the average number of likes per user.
The explanation above can also be applied to minutes spent on the platform. Active users are becoming more engaged over time, while users with little usage are becoming inactive. Overall, the increase in engagement outweighs the users with little engagement.
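The denominator effect described above is easy to verify with made-up numbers. In this sketch the user counts and likes are entirely hypothetical: 60 active users with 40 likes each and 40 inactive users with 2 likes each.

```python
from statistics import mean

# Hypothetical likes per user for two kinds of users.
active_likes = [40] * 60      # engaged users who stay on the platform
inactive_likes = [2] * 40     # low-engagement users who leave

likes_per_user_before = mean(active_likes + inactive_likes)
likes_per_user_after = mean(active_likes)

total_likes_before = sum(active_likes) + sum(inactive_likes)
total_likes_after = sum(active_likes)

print(likes_per_user_before, likes_per_user_after)
print(total_likes_before, total_likes_after)
```

Likes per user go up even though both the number of users and the total number of likes go down, purely because the users who left were below average.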
To take it a step further, it’s possible that the ‘users with little engagement’ were bots. Over time, Facebook has been able to develop algorithms to spot and remove bots. If there were a significant number of bots before, this could potentially be the root cause of this phenomenon.
Facebook sees that likes are up % year over year, why could this be?
The total number of likes in a given year is a function of the total number of users and the average number of likes per user (which I’ll refer to as engagement).
Some potential reasons for an increase in the total number of users are the following: users acquired due to international expansion and younger age groups signing up for Facebook as they get older.
Some potential reasons for an increase in engagement are an increase in usage of the app from users that are becoming more and more loyal, new features and functionality, and an improved user experience.
If we were testing product X, what metrics would you look at to determine if it is a success?
The metrics that determine a product’s success are dependent on the business model and what the business is trying to achieve through the product. The book Lean analytics lays out a great framework that one can use to determine what metrics to use in a given scenario:
Framework from Lean Analytics
So, this brings us to the end of the Data Science Interview Questions blog. This Tecklearn ‘Top Data Science Interview Questions and Answers’ blog helps you with commonly asked questions if you are looking for a job in the Data Science domain. If you wish to learn Data Science and build a career in the Data Science domain, then check out our interactive Data Science Training using R Language, which comes with 24*7 support to guide you throughout your learning period.
https://www.tecklearn.com/course/data-science-training-using-r-language/
Data Science using R Language Training
About the Course
Tecklearn’s Data Science using R Language Training develops knowledge and skills to visualize, transform, and model data in R language. It helps you to master the Data Science with R concepts such as data visualization, data manipulation, machine learning algorithms, charts, hypothesis testing, etc. through industry use cases, and real-time examples. Data Science course certification training lets you master data analysis, R statistical computing, connecting R with Hadoop framework, Machine Learning algorithms, time-series analysis, K-Means Clustering, Naïve Bayes, business analytics and more. This course will help you gain hands-on experience in deploying Recommender using R, Evaluation, Data Transformation etc.
Why Should you take Data Science Using R Training?
- The Average salary of a Data Scientist in R is $123k per annum – Glassdoor.com
- A recent market study shows that the Data Analytics Market is expected to grow at a CAGR of 30.08% from 2020 to 2023, which would equate to $77.6 billion.
- IBM, Amazon, Apple, Google, Facebook, Microsoft, Oracle & other MNCs worldwide are using data science for their Data analysis.
What you will Learn in this Course?
Introduction to Data Science
- Need for Data Science
- What is Data Science
- Life Cycle of Data Science
- Applications of Data Science
- Introduction to Big Data
- Introduction to Machine Learning
- Introduction to Deep Learning
- Introduction to R & RStudio
- Project Based Data Science
Introduction to R
- Introduction to R
- Data Exploration
- Operators in R
- Inbuilt Functions in R
- Flow Control Statements & User Defined Functions
- Data Structures in R
Data Manipulation
- Need for Data Manipulation
- Introduction to dplyr package
- select(), filter(), mutate(), sample_n(), sample_frac() & count() functions
- Getting summarized results with the summarise() function
- Combining different functions with the pipe operator
- Implementing sql like operations with sqldf()
Visualization of Data
- Loading different types of dataset in R
- Arranging the data
- Plotting the graphs
Introduction to Statistics
- Types of Data
- Probability
- Correlation and Co-variance
- Hypothesis Testing
- Standardization and Normalization
Introduction to Machine Learning
- What is Machine Learning?
- Machine Learning Use-Cases
- Machine Learning Process Flow
- Machine Learning Categories
- Supervised Learning algorithm: Linear Regression and Logistic Regression
Logistic Regression
- Intro to Logistic Regression
- Simple Logistic Regression in R
- Multiple Logistic Regression in R
- Confusion Matrix
- ROC Curve
Classification Techniques
- What is classification and what are its use cases?
- What is Decision Tree?
- Algorithm for Decision Tree Induction
- Creating a Perfect Decision Tree
- Confusion Matrix
- What is Random Forest?
- What is Naive Bayes?
- Support Vector Machine: Classification
Decision Tree
- Decision Tree in R
- Information Gain
- Gini Index
- Pruning
Recommender Engines
- What are Association Rules & their use cases?
- What is a Recommendation Engine & how does it work?
- Types of Recommendations
- User-Based Recommendation
- Item-Based Recommendation
- Difference: User-Based and Item-Based Recommendation
- Recommendation use cases
Time Series Analysis
- What is Time Series data?
- Time Series variables
- Different components of Time Series data
- Visualize the data to identify Time Series Components
- Implement ARIMA model for forecasting
- Exponential smoothing models
- Identifying the time series scenarios in which different Exponential Smoothing models can be applied
Got a question for us? Please mention it in the comments section and we will get back to you.