Plans to translate a data mining textbook into Chinese

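KBits_PenPen · November 5, 2019, 14:58

My advisor would like to publish a Chinese translation of a Julia book on data mining, for teaching and outreach. The current candidates are the ones in the "Books+Classes using Julia for teaching" section of the link. If you have recommendations, please reply. Thanks!
So that we can apply to the university or department for publication funding, we would prefer a published book that is not open source. If there is a book you would like to see, please reply with the title, or add +1 under a title that is already listed. I will add to the candidate list myself as well. Each person may cast multiple votes. We lean toward the title with the most votes.
Afterwards we will contact the original author to discuss copyright (no guarantee that this works out). Suggestions on how this could contribute to the community are also welcome. Thanks!
Candidate materials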

woclass · November 5, 2019, 15:26

Remember to check the Julia version; anything compatible with LTS v1.0.5+ is fine.

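johnnychen94 · November 5, 2019, 15:48

I think it is better to choose what to translate based on your own internal needs.

For example, if you plan to offer a linear algebra course, you could consider translating Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares, so that course preparation and translation can support each other.

Of course, this is just my personal opinion: merely translating existing content can easily turn into a formality.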

singularitti · November 6, 2019, 00:39

How about Algorithms for Optimization?

Download: http://dl.booktolearn.com/ebooks2/computer/algorithms/9780262039420_Algorithms_for_Optimization_7cc2.pdf

Book template: GitHub - sisl/tufte_algorithms_book: A template for textbooks in the same style as Algorithms for Optimization

Youtube video introduction: https://www.youtube.com/watch?v=ofWy5kaZU3g&list=PLlHZu1B49BRZ5n7mw8x17HTqJQ7ar7kpG&index=6&t=0s

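woclass · November 6, 2019, 06:45

My understanding is that you want something leaning toward Statistics and Data Science.
The linear algebra textbook mentioned above feels too basic to me.
If you could be more specific, it would be easier for people to make recommendations. For example, should the book cover the foundations of linear algebra and mathematical statistics? Do you want a quick-start Julia tutorial? Or are you aiming at higher-level applications?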

Books

[PDF] Statistics with Julia: Fundamentals for Data Science, Machine Learning and Artificial Intelligence
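Draft, not yet published; expected to be published within 2019.
This one leans toward statistical foundations. It includes a basic introduction to Julia and covers basic probability, probability distributions, data visualization, statistical inference, confidence intervals, hypothesis testing, and linear regression. The last two chapters cover basic machine learning and dynamic probabilistic models.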
TOC

by Hayden Klok and Yoni Nazarathy. (DRAFT. PDF will be taken down when the book is published later in 2019).

Contents
- 1 Introducing Julia
  - 1.1 Language Overview
  - 1.2 Setup and Interface
  - 1.3 Crash Course by Example
  - 1.4 Plots, Images and Graphics
  - 1.5 Random Numbers and Monte Carlo
  - 1.6 Integration with Other Languages
- 2 Basic Probability
  - 2.1 Random Experiments
  - 2.2 Working With Sets
  - 2.3 Independence
  - 2.4 Conditional Probability
  - 2.5 Bayes’ Rule
- 3 Probability Distributions
  - 3.1 Random Variables
  - 3.2 Moment Based Descriptors
  - 3.3 Functions Describing Distributions
  - 3.4 The Distributions and Related Packages
  - 3.5 Families of Discrete Distributions
  - 3.6 Families of Continuous Distributions
  - 3.7 Joint Distributions and Covariance
- 4 Processing and Summarizing Data
  - 4.1 Data Frames and Cleaning Data
  - 4.2 Summarizing Data
  - 4.3 Plots for Single Samples and Time Series
  - 4.4 Plots for Multiple Samples
  - 4.5 Plots for Multivariate and High Dimensional Data
  - 4.6 Plots for the Board Room
  - 4.7 Working with Files and Remote Servers
- 5 Statistical Inference Concepts
  - 5.1 A Random Sample
  - 5.2 Sampling from a Normal Population
  - 5.3 The Central Limit Theorem
  - 5.4 Point Estimation
  - 5.5 Confidence Interval as a Concept
  - 5.6 Hypothesis Tests Concepts
  - 5.7 A Taste of Bayesian Statistics
- 6 Confidence Intervals
  - 6.1 Single Sample Confidence Intervals for the Mean
  - 6.2 Two Sample Confidence Intervals for the Difference in Means
  - 6.3 Bootstrap Confidence Intervals
  - 6.4 Confidence Interval for the Variance of Normal Population
  - 6.5 Prediction Intervals
  - 6.6 Credible Intervals
- 7 Hypothesis Testing
  - 7.1 Single Sample Hypothesis Tests for the Mean
  - 7.2 Two Sample Hypothesis Tests for Comparing Means
  - 7.3 Analysis of Variance (ANOVA)
  - 7.4 Independence and Goodness of Fit
  - 7.5 Power Curves
- 8 Linear Regression and Extensions
  - 8.1 Clouds of Points and Least Squares
  - 8.2 Linear Regression with One Variable
  - 8.3 Multiple Linear Regression
  - 8.4 Model Adaptations
  - 8.5 Model Selection
  - 8.6 Logistic Regression and the Generalized Linear Model
  - 8.7 Time Series and Forecasting
- 9 Machine Learning Basics
  - 9.1 Training, Validation and Testing
  - 9.2 Bias, Variance and Regularization
  - 9.3 Supervised Learning Methods
  - 9.4 Unsupervised Learning Methods
  - 9.5 Reinforcement Learning and MDP
  - 9.6 A Taste of Generative Adversarial Networks
- 10 Simulation of Dynamic Models
  - 10.1 Deterministic Dynamical Systems
  - 10.2 Markov Chains
  - 10.3 Discrete Event Simulation
  - 10.4 Models with Additive Noise
  - 10.5 Network Reliability
  - 10.6 Common Random Numbers and Multiple RNGs
- Appendix A How-to in Julia
  - A.1 Basics
  - A.2 Text and I/O
  - A.3 Data Structures
  - A.4 Data Frames
  - A.5 Mathematics
  - A.6 Randomness, Statistics and Machine Learning
  - A.7 Graphics
- Appendix B Additional Julia Features
- Appendix C Additional Packages
- Bibliography
- List of code listings
- Index
Data Science with Julia - CRC Press Book
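January 11, 2019 - 220 pages
This book includes an introduction to Julia and covers data preprocessing, visualization, supervised and unsupervised learning, and interoperability with R.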
TOC

Table of Contents
- Chapter 1 Introduction
  - DATA SCIENCE
  - BIG DATA
  - JULIA
  - JULIA PACKAGES
  - R PACKAGES
  - DATASETS
  - Overview
  - Beer Data
  - Coffee Data
  - Leptograpsus Crabs Data
  - Food Preferences Data
  - x Data
  - Iris Data
  - OUTLINE OF THE CONTENTS OF THIS MONOGRAPH
- Chapter 2 Core Julia
  - VARIABLE NAMES
  - TYPES
  - Numeric
  - Floats
  - Strings
  - Tuples
  - DATA STRUCTURES
  - Arrays
  - Dictionaries
  - CONTROL FLOW
  - Compound Expressions
  - Conditional Evaluation
  - Loops
  - Basics
  - Loop termination
  - Exception Handling
  - FUNCTIONS
- Chapter 3 Working With Data
  - DATAFRAMES
  - CATEGORICAL DATA
  - IO
  - USEFUL DATAFRAME FUNCTIONS
  - SPLIT-APPLY-COMBINE STRATEGY
  - QUERY.JL
- Chapter 4 Visualizing Data
  - GADFLY.JL
  - VISUALIZING UNIVARIATE DATA
  - DISTRIBUTIONS
  - VISUALIZING BIVARIATE DATA
  - ERROR BARS
  - FACETS
  - SAVING PLOTS
- Chapter 5 Supervised Learning
  - INTRODUCTION
  - CROSS-VALIDATION
  - Overview
  - K-Fold Cross-Validation
  - K-NEAREST NEIGHBOURS CLASSIFICATION
  - CLASSIFICATION AND REGRESSION TREES
  - Overview
  - Classification Trees
  - Regression Trees
  - Comments
  - BOOTSTRAP
  - RANDOM FORESTS
  - GRADIENT BOOSTING
  - Overview
  - Beer Data
  - Food Data
  - COMMENTS
- Chapter 6 Unsupervised Learning
  - INTRODUCTION
  - PRINCIPAL COMPONENTS ANALYSIS
  - PROBABILISTIC PRINCIPAL COMPONENTS ANALYSIS
  - EM ALGORITHM FOR PPCA
  - Background: EM Algorithm
  - E-step
  - M-step
  - Woodbury Identity
  - Initialization
  - Stopping Rule
  - Implementing the EM Algorithm for PPCA
  - Comments
  - K-MEANS CLUSTERING
  - MIXTURE OF PPCAS
  - Model
  - Parameter Estimation
  - Illustrative Example: Coffee Data
- Chapter 7 R Interoperability
  - ACCESSING R DATASETS
  - INTERACTING WITH R
  - EXAMPLE: CLUSTERING AND DATA REDUCTION FOR THE COFFEE DATA
  - Coffee Data
  - PGMM Analysis
  - VSCC Analysis
  - EXAMPLE: FOOD DATA
  - Overview
  - Random Forests
[julia v0.4] Julia for Data Science
The content of this one is relevant, but it uses version 0.4, which is quite old. For reference only.

TOC

1 The Groundwork – Julia’s Environment
2 Data Munging
3 Data Exploration
4 Deep Dive into Inferential Statistics
5 Making Sense of Data Using Visualization
6 Supervised Machine Learning
7 Unsupervised Machine Learning
8 Creating Ensemble Models
9 Time Series
10 Collaborative Filtering and Recommendation System
11 Introduction to Deep Learning
vmls-julia-companion.pdf
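This is the companion booklet to that linear algebra textbook, and it contains a great many examples. If your goal is teaching, I think this format is worth using as a reference.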

course

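As for the MIT courses, see whether any of them meet your needs.
Most of the other courses lean toward numerical analysis and optimization; there are too few on statistics.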
- STAT 590F, Topics in Statistical Computing: Julia Seminar (Prof. Heike Hofmann), Fall 2014
- Northeastern University, Fall 2016: MTH3300: Applied Probability & Statistics
- 223490-0286, Statistical Learning Methods (Bogumił Kamiński): Fall 2017, Spring 2018, Fall 2018

KBits_PenPen · November 6, 2019, 14:35

Got it, thanks for the reminder!

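KBits_PenPen · November 6, 2019, 14:38

Understood, thanks for the suggestion! Right now the need is indeed for teaching, and it should be at the introductory data mining level.

The purpose of the translation is teaching. I will discuss this with my advisor, since it should make the work more productive. Thanks for the reminder!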

KBits_PenPen · November 6, 2019, 14:45

Thanks! I think it's an excellent book with Julia tutorials, but I'm not familiar with its contents. As I understand it, data mining mainly involves techniques related to EDA (Exploratory Data Analysis). To ensure the quality of the translation, we would consider it later. Thanks!

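KBits_PenPen · November 6, 2019, 14:49

Thanks for the suggestions! For now we probably only need material on data mining. The first book you recommended seems like a very good fit.
My advisor has not yet raised any need for more foundational material or for numerical optimization.
Integrating introductory Julia material into the course is probably necessary. The examples in the booklet should make good supplementary exercises.
Thank you very much!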
