Malware-Detection Web App
Malware analysis using deep learning and machine learning, deployed on the AWS cloud. The ML and DL algorithms are written in Python; the server part is written in Node.js.
Abstract
In today's technological world, the number of malware samples grows rapidly every day. It is difficult to log every new sample into a database, and a signature-based search over a malware database is computationally expensive. Manual analysis is also error-prone: with thousands of malware families circulating on the internet, an analyst can easily assign a sample to the wrong family. The objective of our app is to automate malware analysis, using artificial intelligence to statically analyse malware samples and predict their class. Through this web app, anyone can check whether their Portable Executable (PE) files are benign or malicious and, for malicious files, which malware subclass they belong to. A well-designed architecture built on the available data and algorithms lets the app classify real-world malware more precisely. In this project, we use LightGBM, XGBoost and CNN algorithms on various features extracted from the PE file. Deploying this malware-detection AI on the cloud lets users around the world scan their files. It also gives us data from those users, which can be used to tune the deployed model and make its predictions more accurate.
Introduction
Malware is a major threat to the security of computer users and can cause huge financial losses to firms. It comes in many forms, such as adware, rootkits, backdoors, ransomware, trojans, worms and spyware, so detecting it has become an evolving problem for researchers. We also live in an age where new security threats and infections are discovered on a daily basis, and antivirus software fails to detect zero-day attacks, which are increasingly well engineered and obfuscated. Convolutional Neural Networks (CNNs) have demonstrated better performance here because feature engineering, feature learning and feature representation are acquired automatically rather than hand-crafted. Boosted decision trees working on n-grams have been found to produce better results than both the Naive Bayes classifier and Support Vector Machines. Since CNNs work well on image classification, the bytes files can be converted into images and classified as images. The cloud is an essential technology for scaling services and businesses. Deploying in the cloud lets us pay only for the computation we actually consume. This reduces the cost of deployment and allows our app to scale whenever it is needed.
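A minimal sketch of the two static representations mentioned above, assuming pure Python with no external libraries. `byte_ngrams` counts overlapping byte n-grams of the kind a boosted-tree model could use as features. `bytes_to_image` reshapes raw bytes into rows of grayscale pixels for a CNN. Both function names and the fixed row width of 256 are illustrative assumptions, not the project's actual code.

```python
from __future__ import annotations

import math
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams (n consecutive bytes) in a binary.

    Illustrative helper, not taken from the project's code.
    """
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def bytes_to_image(data: bytes, width: int = 256) -> list[list[int]]:
    """Reshape raw bytes into rows of `width` grayscale pixels (0-255).

    The fixed width is an assumption; the last row is zero-padded so
    every row has the same length.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


if __name__ == "__main__":
    sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x4D, 0x5A])
    print(byte_ngrams(sample)[b"MZ"])      # 2
    print(bytes_to_image(sample, width=4))  # [[77, 90, 144, 0], [77, 90, 0, 0]]
```

In practice the n-gram counts would be turned into a fixed-length feature vector (for example, one column per selected n-gram), and the image would be resized to the CNN's input size. Those steps depend on the trained models and are not shown.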
Working
When the EC2 instance is created, Elastic Beanstalk starts the Node.js main thread, which automatically installs the required Python packages and sets up the environment for the backend process. The app is accessible to both mobile and Windows users, but scanning of Android APK files is not yet supported. Because it is deployed on the cloud, the website is available 24x7 and scales with the number of users accessing it. To scan a file, the user uploads it (the maximum upload size is 5 MB) and invokes the predict function by clicking the Predict button on the upload form on the main page. The request is transmitted securely to the server over HTTPS, together with the uploaded file, and the file is stored on the EC2 instance for further processing. Node.js handles all of the steps described above. After the file has been received, Node.js spawns a Python process through the bash shell of the EC2 Linux instance. That Python process carries out all of the machine-learning and deep-learning work inside the EC2 instance.
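A minimal sketch of what the spawned Python entry point might look like. The file name `predict.py`, the function names, the one-line JSON output contract and the `classify` stub are all hypothetical. Reading "5 MB" as 5 × 1024 × 1024 bytes is also an assumption. On the Node.js side, something like `child_process.spawn('python3', ['predict.py', uploadPath])` would start the script and read the JSON line from the child's stdout.

```python
import json
import os
import sys

# Assumption: "5 MB" is read as 5 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 5 * 1024 * 1024


def classify(data: bytes) -> dict:
    """Hypothetical stand-in for the trained models that Model.py loads.

    It only checks for the DOS 'MZ' magic; a real deployment would extract
    features here and run the trained classifiers on them.
    """
    return {"is_pe": data[:2] == b"MZ", "label": "unknown"}


def run(path: str, max_bytes: int = MAX_UPLOAD_BYTES) -> dict:
    """Validate the uploaded file's size, read it, and classify it."""
    size = os.path.getsize(path)
    if size > max_bytes:
        return {"error": f"file is {size} bytes; limit is {max_bytes}"}
    with open(path, "rb") as fh:
        data = fh.read()
    return classify(data)


def main(argv: list) -> int:
    """Entry point: expects the uploaded file's path as the only argument."""
    if len(argv) != 2:
        print(json.dumps({"error": "usage: predict.py <uploaded-file>"}))
        return 2
    result = run(argv[1])
    # The Node.js parent reads this single JSON line from the child's stdout.
    print(json.dumps(result))
    return 1 if "error" in result else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Re-checking the size limit in Python is defensive. The Node.js upload handler would normally reject oversized files before spawning the process.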
Python Process
The Python process extracts static features from the PE headers and converts the portable executable content into '.bytes' and '.asm' files. These are then passed to the machine-learning model, which was trained on a local machine and deployed to the EC2 instances. Model.py contains the complete architecture that these files go through in order to be classified. This process is discussed
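A minimal sketch of the first two steps, assuming only the Python standard library. `pe_header_features` reads a few COFF file-header fields using the standard PE/COFF layout: the `e_lfanew` offset at 0x3C, the `PE\0\0` signature, then the 20-byte COFF header. `to_bytes_listing` renders raw bytes as a hex listing, one offset plus up to 16 bytes per line. Both names, the chosen fields and the listing layout are assumptions; they are not taken from this project's code or Model.py. The `.asm` disassembly step is not sketched here.

```python
import struct


def pe_header_features(data: bytes) -> dict:
    """Read COFF file-header fields from a PE image (standard PE/COFF layout)."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing DOS 'MZ' header")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    if len(data) < pe_offset + 24:
        raise ValueError("truncated COFF header")
    # The 20-byte COFF header follows the 4-byte signature.
    (machine, n_sections, timestamp, _symtab_ptr, _n_symbols,
     opt_header_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": machine,
        "number_of_sections": n_sections,
        "timestamp": timestamp,
        "size_of_optional_header": opt_header_size,
        "characteristics": characteristics,
    }


def to_bytes_listing(data: bytes) -> str:
    """Render bytes as a hex listing: an 8-digit file offset, then up to 16 bytes."""
    lines = []
    for off in range(0, len(data), 16):
        chunk = data[off:off + 16]
        lines.append(f"{off:08X} " + " ".join(f"{b:02X}" for b in chunk))
    return "\n".join(lines)
```

The `.bytes` files in the Microsoft Malware Classification Challenge (BIG 2015) dataset use a similar hex-dump layout keyed by address. Whether this project's `.bytes` files follow that exact format is an assumption, and the address column here is simply the file offset.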