bigdata_analyse: Big Data Analysis Projects

Last update: Dec 30, 2022


Overview

bigdata_analyse

A collection of big data analysis projects

wish

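By analyzing datasets from different industries with different technology stacks, the project aims to:

Understand the business analysis metrics used in different domains
Strengthen data processing, data analysis, and data visualization skills
Gain hands-on experience with big data batch processing and stream processing
Gain hands-on experience with data mining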

tip

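The main programming languages used in the project are Python, SQL, and HQL.
.ipynb files can be opened with Jupyter Notebook; to set it up, see the Jupyter Notebook installation instructions.

Jupyter Notebook is a web-based interactive Python editor that can be installed directly with pip. It also supports Markdown, which makes it well suited to data analysis and visualization, as well as to writing articles and example code.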

list

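Topic	Processing mode	Tech stack	Dataset download
Analysis of 100 million Taobao user behavior records	Offline processing	Cleaning: hive + Analysis: hive + Visualization: echarts	Alibaba Cloud or Baidu Netdisk (extraction code: 5ipq)
Real-time analysis of 10 million Taobao user behavior records	Real-time processing	Data source: kafka + Real-time analysis: flink + Visualization: (es + kibana)	Baidu Netdisk (extraction code: m4mc)
Analysis of 3 million player records from the game 《野蛮时代》 (Brutal Age)	Offline processing	Cleaning: pandas + Analysis: mysql + Visualization: pyecharts	Baidu Netdisk (extraction code: paq4)
Analysis of 1.3 million Shenzhen Tong card-swipe records	Offline processing	Cleaning: pandas + Analysis: impala + Visualization: dbeaver	Baidu Netdisk (extraction code: t561)
Analysis of 100,000 Xiamen job-posting records	Offline processing	Cleaning: pandas + Analysis: hive + Visualization: (hue + pyecharts) + Prediction: sklearn	Baidu Netdisk (extraction code: 9wx0)
Analysis of 7,000 rental listing records	Offline processing	Cleaning: pandas + Analysis: sqlite + Visualization: matplotlib	Baidu Netdisk (extraction code: 9en3)
Analysis of 6,000 records of closed-down companies	Offline processing	Cleaning: pandas + Analysis: pandas + Visualization: (jupyter notebook + pyecharts)	Baidu Netdisk (extraction code: xvgm)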
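
Most of the offline entries above share one pattern: clean the raw records first, then analyze them with SQL. The sketch below is a rough illustration of that pattern and is not taken from the project. The sample rows are made up, the column names (district, rent, area) and the helpers clean and average_rent_by_district are hypothetical, and plain Python stands in for pandas so that the example runs with the standard library alone.

```python
import sqlite3

# Made-up sample rows standing in for a raw rental dataset (NOT the project's data).
# Hypothetical columns: district, monthly rent as scraped text, area in square metres.
raw_rows = [
    ("A", "3500", "60"),
    ("A", " 4,200 ", "75"),
    ("B", "2800", "45"),
    ("B", "", "50"),        # missing rent: dropped during cleaning
    ("B", "2800", "45"),    # exact duplicate: dropped during cleaning
]


def clean(rows):
    """Strip whitespace and thousands separators, drop incomplete and duplicate rows."""
    seen = set()
    cleaned = []
    for district, rent, area in rows:
        rent = rent.strip().replace(",", "")
        area = area.strip()
        if not (district and rent and area):
            continue
        record = (district, float(rent), float(area))
        if record in seen:
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned


def average_rent_by_district(records):
    """Load cleaned records into an in-memory SQLite table and aggregate with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE rental (district TEXT, rent REAL, area REAL)")
        conn.executemany("INSERT INTO rental VALUES (?, ?, ?)", records)
        query = (
            "SELECT district, COUNT(*), AVG(rent) "
            "FROM rental GROUP BY district ORDER BY district"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


records = clean(raw_rows)
summary = average_rent_by_district(records)
for district, listings, avg_rent in summary:
    print(district, listings, avg_rent)
```

Keeping the cleaning step separate from the SQL step mirrors the split in the tech stack column. Under that assumption, the storage engine (sqlite here; mysql, hive, or impala in other rows) could be swapped without touching the cleaning logic.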

refer

https://tianchi.aliyun.com/dataset/

https://opendata.sz.gov.cn/data/api/toApiDetails/29200_00403601

https://www.kesci.com/home/dataset

Owner

Way

