Computer Science graduate with 9 years of experience building Big Data applications with both batch and streaming capabilities, including data analytics in R and Spark MLlib.
Experienced in migrating classical DW/BI applications, along with unstructured sensor, logging, and machine-generated data, to Big Data architectures that accommodate large-scale, varied-source data ingestion, modelling, profiling, and governance.
I have worked in Germany (Berlin, Nuremberg) and Great Britain (London) for Tier 1 business organisations, building enterprise DW and Big Data applications.
Experienced in the Big Data ecosystem (Hadoop, Spark (Scala), Spark Streaming, Spark MLlib, Apache Drill, Kafka, Flume, Apache Samza, Impala, Hive, Java MapReduce, HBase, Oozie, ZooKeeper, YARN, Storm, Solr, Elasticsearch, etc.). Enhanced Cloudera Manager monitoring through Nagios plugins.
Implemented a Big Data lake architecture on multi-node Hadoop 2.0 clusters.
Databases: NoSQL – HBase, Cassandra, MongoDB, CouchDB
Columnar Databases – SAP HANA, CDH Impala, Exasol
RDBMS – PostgreSQL, Oracle, MySQL, Teradata
• Big Data ecosystem:
o Data Management: YARN, Impala Admission Control
o Data Access: Spark (Scala), Spark Streaming, Java MapReduce, Cloudera Impala, Hive, Solr
o Data Integration: Sqoop, Flume, Kafka, Apache Drill, Apache Ambari,
Informatica TDM, Informatica PowerCenter, TIBCO EI, Talend
o Workflow Manager: Oozie
• Lambda Architecture
• Conceptual and Physical Data Modelling for Big Data
• Functional Programming using Scala
• Big Data Design Patterns.
• Deep learning and machine learning libraries
• Java – Spring, Hibernate, REST, Collections (List, Tree, HashMap, Set), advanced Generics
• Advanced Analytics, Data Visualization – Dashboards, Customer Scorecards, Data Modelling, KPIs