Course Outline

Day 01

Overview of Big Data Business Intelligence for Criminal Intelligence Analysis

  • Case Studies from Law Enforcement - Predictive Policing.
  • Big Data adoption rates in Law Enforcement Agencies and their alignment with Big Data Predictive Analytics for future operations.
  • Emerging technology solutions such as gunshot sensors, surveillance video, and social media.
  • Leveraging Big Data technology to mitigate information overload.
  • Integrating Big Data with Legacy data.
  • Foundational understanding of enabling technologies in predictive analytics.
  • Data Integration & Dashboard visualization.
  • Fraud management.
  • Business Rules and Fraud detection.
  • Threat detection and profiling.
  • Cost-benefit analysis for Big Data implementation.

Introduction to Big Data

  • Key characteristics of Big Data: Volume, Variety, Velocity, and Veracity.
  • MPP (Massively Parallel Processing) architecture.
  • Data Warehouses – static schema, slowly evolving datasets.
  • MPP Databases: Greenplum, Exadata, Teradata, Netezza, Vertica, etc.
  • Hadoop-Based Solutions – no structural constraints on datasets.
  • Typical pattern: HDFS, MapReduce (crunch), retrieval from HDFS.
  • Apache Spark for stream processing.
  • Batch processing – suited for analytical/non-interactive tasks.
  • Velocity: CEP (Complex Event Processing) streaming data.
  • Typical choices – CEP products (e.g., InfoSphere Streams, Apama, MarkLogic, etc.).
  • Less production-ready options – Storm/S4.
  • NoSQL Databases – (columnar and key-value): Best suited as analytical adjuncts to data warehouses/databases.

NoSQL Solutions

  • KV Store - Keyspace, Flare, SchemaFree, RAMCloud, Oracle NoSQL Database (OnDB).
  • KV Store - Dynamo, Voldemort, Dynomite, SubRecord, MotionDb, DovetailDB.
  • KV Store (Hierarchical) - GT.M, Caché.
  • KV Store (Ordered) - TokyoTyrant, Lightcloud, NMDB, Luxio, MemcacheDB, Actord.
  • KV Cache - Memcached, Repcached, Coherence, Infinispan, eXtreme Scale, JBossCache, Velocity, Terracotta.
  • Tuple Store - Gigaspaces, Coord, Apache River.
  • Object Database - ZopeDB, db4o, Shoal.
  • Document Store - CouchDB, Cloudant, Couchbase, MongoDB, Jackrabbit, XML-Databases, ThruDB, CloudKit, Persevere, Riak-Basho, Scalaris.
  • Wide Columnar Store - BigTable, HBase, Apache Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI.

Varieties of Data: Introduction to Data Cleaning Issues in Big Data

  • RDBMS – static structure/schema; does not promote an agile, exploratory environment.
  • NoSQL – semi-structured; provides enough structure to store data without an exact schema prior to storage.
  • Data cleaning issues.

Hadoop

  • When to select Hadoop?
  • STRUCTURED - Enterprise data warehouses/databases can store massive data (at a cost) but impose structure (not ideal for active exploration).
  • SEMI STRUCTURED data – difficult to handle using traditional solutions (DW/DB).
  • Warehousing data = HUGE effort and static even after implementation.
  • For variety & volume of data, crunched on commodity hardware – HADOOP.
  • Commodity H/W needed to create a Hadoop Cluster.

Introduction to MapReduce/HDFS

  • MapReduce – distribute computing over multiple servers.
  • HDFS – make data available locally for the computing process (with redundancy).
  • Data – can be unstructured/schema-less (unlike RDBMS).
  • Developer responsibility to make sense of data.
  • Programming MapReduce = working with Java (pros/cons), manually loading data into HDFS.
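
The map, shuffle, and reduce phases listed above can be sketched in a few lines. This is a minimal in-process illustration in plain Python, not the Hadoop Java API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample records are assumptions made for the example, and a real job would read its input from HDFS and run across many servers.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence over an in-memory list of records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample records, standing in for lines of a file stored in HDFS.
    sample = ["theft reported downtown", "vehicle theft reported"]
    print(run_job(sample))
```

The developer decides what the key and value mean; the framework only guarantees that all values for the same key reach the same reducer.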

Day 02

Big Data Ecosystem -- Building Big Data ETL (Extract, Transform, Load) -- Which Big Data Tools to use and when?

  • Hadoop vs. Other NoSQL solutions.
  • For interactive, random access to data.
  • HBase (column-oriented database) on top of Hadoop.
  • Random access to data but restrictions imposed (max 1 PB).
  • Not ideal for ad-hoc analytics; good for logging, counting, time-series.
  • Sqoop - Import from databases to Hive or HDFS (JDBC/ODBC access).
  • Flume – Stream data (e.g., log data) into HDFS.

Big Data Management System

  • Moving parts, compute nodes start/fail: ZooKeeper - For configuration/coordination/naming services.
  • Complex pipeline/workflow: Oozie – manage workflow, dependencies, daisy chain.
  • Deployment, configuration, cluster management, upgrades, etc. (sys admin): Ambari.
  • In Cloud: Whirr.

Predictive Analytics -- Fundamental Techniques and Machine Learning based Business Intelligence

  • Introduction to Machine Learning.
  • Learning classification techniques.
  • Bayesian Prediction – preparing a training file.
  • Support Vector Machine.
  • KNN p-Tree Algebra & vertical mining.
  • Neural Networks.
  • Big Data large variable problem – Random Forest (RF).
  • Big Data Automation problem – Multi-model ensemble RF.
  • Automation through Soft10-M.
  • Text analytic tool – Treeminer.
  • Agile learning.
  • Agent-based learning.
  • Distributed learning.
  • Introduction to Open Source Tools for predictive analytics: R, Python, RapidMiner, Mahout.
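
As a small illustration of the classification techniques listed above, the following is a sketch of plain k-nearest-neighbour (KNN) voting in Python. It is the textbook algorithm, not the p-Tree vertical-mining variant named in the outline, and the function name `knn_predict` and the toy data are assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    training: list of (feature_tuple, label) pairs.
    query: feature tuple with the same number of dimensions.
    """
    # Sort training points by Euclidean distance to the query point.
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical two-feature training set with two classes.
    training = [
        ((0.0, 0.0), "low"), ((0.0, 1.0), "low"), ((1.0, 0.0), "low"),
        ((5.0, 5.0), "high"), ((5.0, 6.0), "high"), ((6.0, 5.0), "high"),
    ]
    print(knn_predict(training, (0.5, 0.5)))
    print(knn_predict(training, (5.5, 5.5)))
```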

Predictive Analytics Ecosystem and its application in Criminal Intelligence Analysis

  • Technology and the investigative process.
  • Insight analytics.
  • Visualization analytics.
  • Structured predictive analytics.
  • Unstructured predictive analytics.
  • Threat/fraudster/vendor profiling.
  • Recommendation Engine.
  • Pattern detection.
  • Rule/Scenario discovery – failure, fraud, optimization.
  • Root cause discovery.
  • Sentiment analysis.
  • CRM analytics.
  • Network analytics.
  • Text analytics for obtaining insights from transcripts, witness statements, internet chatter, etc.
  • Technology-assisted review.
  • Fraud analytics.
  • Real-time analytics.

Day 03

Real Time and Scalable Analytics Over Hadoop

  • Why common analytic algorithms fail in Hadoop/HDFS.
  • Apache Hama – for Bulk Synchronous distributed computing.
  • Apache Spark – for cluster computing and real-time analytics.
  • CMU GraphLab – graph-based asynchronous approach to distributed computing.
  • KNN p-Tree – algebra-based approach from Treeminer for reduced hardware cost of operation.

Tools for eDiscovery and Forensics

  • eDiscovery over Big Data vs. Legacy data – a comparison of cost and performance.
  • Predictive coding and Technology Assisted Review (TAR).
  • Live demo of vMiner for understanding how TAR enables faster discovery.
  • Faster indexing through HDFS – Velocity of data.
  • NLP (Natural Language processing) – open source products and techniques.
  • eDiscovery in foreign languages – technology for foreign language processing.

Big Data BI for Cyber Security – Getting a 360-degree view, speedy data collection and threat identification

  • Understanding the basics of security analytics – attack surface, security misconfiguration, host defenses.
  • Network infrastructure / Large data pipe / Response ETL for real-time analytics.
  • Prescriptive vs predictive – fixed rule-based detection vs auto-discovery of threat rules from metadata.

Gathering disparate data for Criminal Intelligence Analysis

  • Using IoT (Internet of Things) as sensors for capturing data.
  • Using Satellite Imagery for Domestic Surveillance.
  • Using surveillance and image data for criminal identification.
  • Other data gathering technologies – drones, body cameras, GPS tagging systems, and thermal imaging technology.
  • Combining automated data retrieval with data obtained from informants, interrogation, and research.
  • Forecasting criminal activity.

Day 04

Fraud Prevention BI from Big Data in Fraud Analytics

  • Basic classification of Fraud Analytics – rules-based vs predictive analytics.
  • Supervised vs unsupervised Machine learning for Fraud pattern detection.
  • Business-to-business fraud, medical claims fraud, insurance fraud, tax evasion, and money laundering.
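
The contrast between rules-based and unsupervised detection can be sketched as follows. This is a minimal illustration, assuming a fixed amount limit as the business rule and a simple z-score outlier test as a stand-in for unsupervised pattern detection; the function names, thresholds, and amounts are hypothetical.

```python
from statistics import mean, stdev


def rule_based_flags(amounts, limit):
    """Fixed business rule: flag any amount above a preset limit."""
    return [amount > limit for amount in amounts]


def zscore_flags(amounts, threshold=2.0):
    """Unsupervised check: flag amounts far from the mean, in standard deviations."""
    mu, sigma = mean(amounts), stdev(amounts)
    if sigma == 0:
        # All amounts are identical, so nothing stands out.
        return [False] * len(amounts)
    return [abs(amount - mu) / sigma > threshold for amount in amounts]


if __name__ == "__main__":
    # Hypothetical claim amounts; the last one is deliberately unusual.
    amounts = [100, 102, 98, 101, 99, 100, 103, 97, 100, 900]
    print(rule_based_flags(amounts, limit=500))
    print(zscore_flags(amounts))
```

The rule needs someone to choose the limit in advance, while the z-score test derives its notion of "unusual" from the data itself.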

Social Media Analytics – Intelligence gathering and analysis

  • How criminals use Social Media to organize, recruit, and plan.
  • Big Data ETL API for extracting social media data.
  • Text, image, metadata, and video.
  • Sentiment analysis from social media feeds.
  • Contextual and non-contextual filtering of social media feeds.
  • Social Media Dashboard to integrate diverse social media.
  • Automated profiling of social media profiles.
  • Live demo of each analytic will be given through the Treeminer Tool.

Big Data Analytics in image processing and video feeds

  • Image Storage techniques in Big Data – Storage solutions for data exceeding petabytes.
  • LTFS (Linear Tape File System) and LTO (Linear Tape-Open).
  • GPFS-LTFS (General Parallel File System - Linear Tape File System) – layered storage solution for Big image data.
  • Fundamentals of image analytics.
  • Object recognition.
  • Image segmentation.
  • Motion tracking.
  • 3-D image reconstruction.

Biometrics, DNA and Next Generation Identification Programs

  • Beyond fingerprinting and facial recognition.
  • Speech recognition, keystroke dynamics (analyzing a user's typing pattern), and CODIS (Combined DNA Index System).
  • Beyond DNA matching: using forensic DNA phenotyping to construct a face from DNA samples.

Big Data Dashboard for quick accessibility of diverse data and display

  • Integration of existing application platforms with Big Data Dashboards.
  • Big Data management.
  • Case Study of Big Data Dashboards: Tableau and Pentaho.
  • Use Big Data apps to push location-based services in Government.
  • Tracking system and management.

Day 05

How to justify Big Data BI implementation within an organization

  • Defining the ROI (Return on Investment) for implementing Big Data.
  • Case studies for saving Analyst Time in data collection and preparation – increasing productivity.
  • Cost savings from lower database licensing costs.
  • Revenue gain from location-based services.
  • Cost savings from fraud prevention.
  • An integrated spreadsheet approach for calculating approximate expenses vs. Revenue gain/savings from Big Data implementation.
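
The spreadsheet-style calculation above can be sketched in code. This is a minimal illustration, assuming ROI is defined as (gain - cost) / cost; the function names, the line-item labels, and every figure are placeholders, not real costs or savings.

```python
def roi(total_gain, total_cost):
    """Return on investment as a fraction: (gain - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_gain - total_cost) / total_cost


def summarize(benefits, costs):
    """Total the benefit and cost line items and report net gain and ROI."""
    gain = sum(benefits.values())
    cost = sum(costs.values())
    return {"gain": gain, "cost": cost, "net": gain - cost, "roi": roi(gain, cost)}


if __name__ == "__main__":
    # Placeholder annual figures in arbitrary currency units, for illustration only.
    benefits = {
        "analyst_time_saved": 60.0,
        "lower_database_licensing": 40.0,
        "location_based_services": 20.0,
        "fraud_prevention": 30.0,
    }
    costs = {"hardware": 50.0, "software_and_support": 30.0, "training": 20.0}
    print(summarize(benefits, costs))
```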

Step-by-step procedure for replacing a legacy data system with a Big Data System

  • Big Data Migration Roadmap.
  • What critical information is needed before architecting a Big Data system?
  • What are the different ways for calculating Volume, Velocity, Variety, and Veracity of data?
  • How to estimate data growth.
  • Case studies.
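
One simple way to estimate data growth is to assume a constant annual growth rate and compound it. The sketch below takes that assumption; the function names and the sample volume, rate, and capacity are hypothetical, and real growth is rarely this regular.

```python
def projected_volume(current_tb, annual_growth_rate, years):
    """Compound growth: V_n = V_0 * (1 + r) ** n, with r as a fraction (0.4 = 40%)."""
    return current_tb * (1 + annual_growth_rate) ** years


def years_until(current_tb, annual_growth_rate, capacity_tb):
    """Smallest whole number of years after which volume reaches the capacity."""
    if current_tb <= 0 or annual_growth_rate <= 0:
        raise ValueError("current_tb and annual_growth_rate must be positive")
    years = 0
    volume = current_tb
    while volume < capacity_tb:
        volume *= 1 + annual_growth_rate
        years += 1
    return years


if __name__ == "__main__":
    # Hypothetical starting point: 100 TB today, doubling every year.
    print(projected_volume(100, 1.0, 3))
    print(years_until(100, 1.0, 800))
```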

Review of Big Data Vendors and their products

  • Accenture.
  • APTEAN (Formerly CDC Software).
  • Cisco Systems.
  • Cloudera.
  • Dell.
  • EMC.
  • GoodData Corporation.
  • Guavus.
  • Hitachi Data Systems.
  • Hortonworks.
  • HP.
  • IBM.
  • Informatica.
  • Intel.
  • Jaspersoft.
  • Microsoft.
  • MongoDB (Formerly 10Gen).
  • Mu Sigma.
  • NetApp.
  • Opera Solutions.
  • Oracle.
  • Pentaho.
  • Platfora.
  • QlikTech.
  • Quantum.
  • Rackspace.
  • Revolution Analytics.
  • Salesforce.
  • SAP.
  • SAS Institute.
  • Sisense.
  • Software AG/Terracotta.
  • Soft10 Automation.
  • Splunk.
  • Sqrrl.
  • Supermicro.
  • Tableau Software.
  • Teradata.
  • Think Big Analytics.
  • Tidemark Systems.
  • Treeminer.
  • VMware (Part of EMC).

Q/A session.

Requirements

  • Knowledge of law enforcement procedures and data systems.
  • Basic understanding of SQL/Oracle or relational databases.
  • Basic understanding of statistics (at the spreadsheet level).

Target Audience

  • Law enforcement specialists with a technical background.

Duration: 35 hours