Bengaluru, Karnataka, India
Self-motivated, diligent professional with 3 years of experience in data science, currently working as a Data Scientist. I am eager to thrive in a challenging environment where I can demonstrate my skills, apply my knowledge, and contribute to the organization's growth. I am particularly interested in Artificial Intelligence and seek opportunities in Data Science, Machine Learning, Data Analysis, and related fields where I can contribute to the employer's success while continuing to build my own capabilities.
TECHNICAL SKILLS: Data Structures & Algorithms, Machine Learning, Deep Learning, Natural Language Processing, SQL, Python, Statistical Modelling, Classification, Clustering, Regression, Feature Engineering, Data Visualization.
Programming Languages: Python (proficient), Java (basics).
Frameworks & Libraries: TensorFlow, TensorFlow Lite, Keras, NumPy, Pandas, PyTorch, scikit-learn, OpenCV, Matplotlib, Seaborn, Plotly Dash, re, Pandas Profiling, Flask, Heroku, Streamlit.
Software & Tools: Git, GitHub, PyCharm, VS Code, Jupyter Notebook, Hugging Face, Roboflow.
Excellent communication skills, quick to learn new skills, and hard-working, with strong knowledge of core subjects. Participated in various sports events, including Annual Sports Day at school, and in cultural events at school and college.
🔍 You can reach me at:
✅ GitHub repository: https://github.com/MMuttalib1326
✅ Kaggle: https://www.kaggle.com/mohdmuttalib
• Designed and developed scalable data pipelines on Amazon Web Services to process and transform large datasets.
• Built and managed ETL workflows to load data into Amazon Redshift for analytics and reporting.
• Wrote complex SQL queries for data extraction, transformation, and performance optimization in Redshift.
• Optimized Redshift tables using distribution keys, sort keys, and query tuning to improve performance.
• Automated data workflows and scheduling using tools such as Airflow and cron jobs.
• Ensured data quality and integrity through validation checks and pipeline monitoring.
• Collaborated with data analysts and data scientists to provide clean, structured datasets for business insights.
• Monitored and troubleshot data pipeline failures, reducing downtime and ensuring reliability.
• Managed data storage on AWS (e.g., S3) for efficient and scalable data handling.
• Used Git for version control and collaborated with team members on code deployment.
• Improved data pipeline efficiency by optimizing resource usage and reducing processing time for large-scale datasets.
• Documented data architecture, pipeline workflows, and best practices to support team collaboration and future scalability.
I am proud to be a Kaggle Expert. My skills in programming, statistical analysis, and machine learning algorithms have been honed through consistent high performance in Kaggle's challenging competitions. As a Kaggle Expert, I am committed to contributing to the data science community by sharing my knowledge through tutorials, blogs, and discussions. This designation reflects my dedication and hard work in the field, and I am eager to continue growing and making an impact in data science.
I participated in numerous competitions, using my skills in machine learning, data analysis, and visualization to deliver high-quality results. I also actively engaged with the Kaggle community by sharing my insights and techniques, helping others learn and improve their skills.
I have been actively involved in open-source contributions on GitHub, exploring and contributing to various machine-learning projects. Through this experience, I have been able to gain a better understanding of the open-source development process, hone my coding and problem-solving skills, and make valuable connections with other developers. I love the idea of open source and I am looking forward to continuing to contribute to the open-source community in the future.
• Mastering acquired skills such as Advanced Excel, Advanced SQL, Power BI, Tableau, Looker Studio, and Python.
• Working on personal development and building a professional portfolio.
• Challenging myself every day to raise my level of understanding.
• Learning personal branding and developing business communication skills for smooth professional interaction.
• Exploring leading libraries, tools, technologies, and frameworks such as Big Data, Hadoop, Hive, Airflow, Apache, PySpark, ETL, Snowflake, OpenCV, spaCy, Web Scraping, Azure, AWS, Streamlit, Flask, Keras, TensorFlow, and Deep Learning.
• Developed an unsupervised ML model that clusters comparable records by matching text-based attributes.
• Utilized one-hot encoding to transform data and evaluated feature correlation.
• Experimented with the Elbow method, Hierarchical Clustering, and Silhouette analysis to find the optimal number of clusters.
• Performed Topic Modeling using LDA and LSA to uncover the latent topics of the contents for the calculated 3 clusters.
• Performed EDA and implemented clustering algorithms on over 7,700 records of Netflix Movies and TV Shows, and successfully identified well-separated clusters in high-dimensional space.
• Determined the optimal number of clusters using the Elbow method and Silhouette scores. The optimal number of clusters was 4 for K-Means and 2 for Hierarchical clustering.
• Processed the textual features using NLP techniques, including text cleaning, tokenization, text normalization, and text vectorization using TF-IDF, followed by PCA to handle the resulting sparse matrix of over 46,000 attributes.
Skills: k-means clustering · Hierarchical Clustering · PCA · Scikit-Learn · Data Analysis · NumPy · Seaborn
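To illustrate the Silhouette analysis mentioned above, here is a minimal sketch in plain Python. It is not the project's actual code: the toy 2-D points, the candidate labelings, and the function names are assumptions made for illustration, and in practice a library routine such as scikit-learn's `silhouette_score` would normally be used on the real feature matrix.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (higher is better)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two tight groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Hypothetical candidate labelings for k = 2 and k = 3.
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 2, 1, 1, 1],
}

scores = {k: silhouette_score(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

The number of clusters whose labeling gives the highest mean silhouette is taken as the preferred choice; on this toy data the two-cluster labeling scores higher than the three-cluster one.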
• The dataset consists of cab trip records and is based on the 2016 NYC cab trip record data.
• The dataset was originally published by the NYC Taxi and Limousine Commission.
• It contains trip information such as pickup time, geo-coordinates, and number of passengers.
• The task was to build a model that predicts the total ride duration of taxi trips in New York City from individual trip attributes.
• Our main goal in this project was to determine the factors that affect taxi trip duration and service, and to explore what other insights can be obtained from the same dataset.
• Before visualizing the data, the data was analyzed, and missing values were checked and treated.
• Data visualization showed that most trips took between 10 and 30 minutes to complete.
• For predicting the trip duration of a particular taxi, the XGBoost regressor was the most suitable model compared to the other models.
• This type of prediction and research in the cab booking segment helps companies increase their profit.
• Predicting bookings and peak hours is a very important factor for cab providers.
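As an illustration of how trip attributes such as pickup time and geo-coordinates can be turned into model inputs, here is a minimal feature-engineering sketch in plain Python. The field names (`pickup_datetime`, `pickup_latitude`, and so on), the example record, and the choice of haversine distance are assumptions for illustration, not the project's confirmed pipeline.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def trip_features(trip):
    """Derive simple numeric features from one trip record (a dict with hypothetical keys)."""
    pickup = datetime.strptime(trip["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
    return {
        "distance_km": haversine_km(
            trip["pickup_latitude"], trip["pickup_longitude"],
            trip["dropoff_latitude"], trip["dropoff_longitude"],
        ),
        "pickup_hour": pickup.hour,           # 0-23, captures peak hours
        "pickup_weekday": pickup.weekday(),   # 0 = Monday ... 6 = Sunday
        "passenger_count": trip["passenger_count"],
    }


# Hypothetical example record; the values are made up for illustration.
example_trip = {
    "pickup_datetime": "2016-03-14 17:24:55",
    "pickup_latitude": 40.7679,
    "pickup_longitude": -73.9822,
    "dropoff_latitude": 40.7656,
    "dropoff_longitude": -73.9646,
    "passenger_count": 1,
}

features = trip_features(example_trip)
print(features)
```

Features like these could then be passed to a regression model; the distance and pickup hour are typical candidates when looking for factors that affect trip duration.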
• Collected, cleaned, and preprocessed large datasets using Python (Pandas, NumPy) to ensure data quality and consistency.
• Performed exploratory data analysis (EDA) to identify patterns, trends, and insights for business decision-making.
• Built and implemented machine learning models (e.g., regression, classification) to solve real-world problems.
• Evaluated model performance using metrics such as accuracy, precision, recall, and F1-score.
• Visualized data and model results using tools such as Matplotlib, Seaborn, Tableau, and Power BI.
• Worked with SQL databases to extract and manipulate structured data.
• Collaborated with cross-functional teams to understand business requirements and translate them into data solutions.
• Developed data pipelines and automated repetitive data processing tasks.
• Documented methodologies, experiments, and results for future reference and team use.
• Assisted in deploying models and presenting findings to stakeholders.
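For reference, the evaluation metrics named above can be computed as in the following minimal sketch for a binary classifier. The labels and predictions are made-up illustration data, and the function name is hypothetical; in practice a library such as scikit-learn provides equivalent routines.

```python
def binary_classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels and predictions for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = binary_classification_metrics(y_true, y_pred)
print(metrics)
```

Precision and recall are reported alongside accuracy because accuracy alone can be misleading when the classes are imbalanced.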