Tools
-
Keras
Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation.
-
Python
Python has become a strong contender for the title of go-to language for all things data science.
-
TensorFlow
Developed by the Google Brain team, TensorFlow is now released under the Apache 2.0 open-source license and is one of the leading libraries for building neural networks. It's written in Python, C++ and CUDA.
-
Scala
Scala is a modern multi-paradigm programming language that runs on the Java Virtual Machine and interoperates with Java. It is designed to express common programming patterns in a concise, elegant, and type-safe way. Spark loves Scala (not the choir though).
-
Torch
Torch is an open-source machine learning library for neural networks, based on Lua and using LuaJIT as its scripting language.
-
Apache Kafka
Apache Kafka is an open-source stream-processing software platform developed by the Apache Software Foundation, written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
-
Hadoop
Hadoop, together with its filesystem HDFS, is open-source software from the Apache Software Foundation for the distributed storage and processing of big data.
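Hadoop's classic processing engine, Hadoop MapReduce, splits a job into a map phase, a shuffle that groups intermediate results by key, and a reduce phase. The following is a minimal single-process Python sketch of that programming model, assuming a word-count job as the example. It does not use Hadoop's API, and the helper names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step (hypothetical helper): emit (word, 1) for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step (hypothetical helper): group all values emitted for the same key.

    In a Hadoop cluster this grouping moves data between machines;
    here it is just an in-memory dict.
    """
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step (hypothetical helper): combine each key's values, here by summing."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, then shuffle, then reduce, all in one process."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    docs = ["Big data needs big tools", "Hadoop processes big data"]
    print(word_count(docs))
    # {'big': 3, 'data': 2, 'needs': 1, 'tools': 1, 'hadoop': 1, 'processes': 1}
```

On a real cluster, Hadoop runs many map tasks in parallel, ideally on the nodes that already store the relevant HDFS blocks, and then sends the grouped pairs to reduce tasks; this sketch keeps only the shape of the computation.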
-
R
R is the go-to tool of our data scientists. It's great for exploratory analysis and gives easy access to statistical, mathematical and machine learning functions.
-
Hive
The Apache Hive data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL.
-
Spark
Spark can run on Hadoop 2's YARN and can read any existing Hadoop data. It is designed to run programs faster by doing more of its data processing in memory. The Spark developers claim that it can run programs up to 100 times faster than Hadoop MapReduce in memory, or 10 times faster on disk.