Plan to translate a data mining textbook into Chinese


#1

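My advisor would like to publish a Chinese translation of a Julia-based data mining book, for teaching and outreach. The current candidates are in the "Books + Classes using Julia for teaching" section of the link. If you have recommendations, please reply. Thanks!
To be able to apply to the university or college for publication funding, we would probably prefer a published book that is not open source. If there is a book you would like to read, please reply with its title, or +1 a title that is already listed. I will also add to the candidate list myself. Each person may cast multiple votes. We lean toward the title with the most votes.
After that, we will contact the original authors to discuss copyright (no guarantee this will work out...). Suggestions on how this could contribute to the community are also welcome. Thanks!
Candidate materials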


#2

Remember to check the Julia version; as long as the material is compatible with LTS v1.0.5+, it is fine.


#3

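I think it is better to choose what to translate based on your internal needs.

For example, if you plan to offer a linear algebra course, you could consider translating Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares, so that course preparation and translation can support each other.

Of course, this is only my personal opinion: merely translating existing content can easily turn into a formality.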


#4

How about https://mitpress.mit.edu/books/algorithms-optimization?

Download: http://dl.booktolearn.com/ebooks2/computer/algorithms/9780262039420_Algorithms_for_Optimization_7cc2.pdf

Book template: https://github.com/sisl/tufte_algorithms_book

Youtube video introduction: https://www.youtube.com/watch?v=ofWy5kaZU3g&list=PLlHZu1B49BRZ5n7mw8x17HTqJQ7ar7kpG&index=6&t=0s


#5

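My understanding is that you want something leaning toward Statistics and Data Science.
The linear algebra textbook mentioned above feels too basic to me.
If you could be more specific, it would be easier for people to make recommendations. For example, should the book cover the basics of linear algebra and mathematical statistics? Do you want a quick introduction to Julia? Or would you prefer higher-level applications?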


Books

  • [PDF] Statistics with Julia: Fundamentals for Data Science, Machine Learning and Artificial Intelligence
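    Draft, not yet published; expected to be published in 2019.
    This one leans toward statistical foundations. It includes a basic introduction to Julia and covers basic probability and statistics, probability distributions, data visualization, statistical inference, confidence intervals, hypothesis testing, and linear regression; the last two chapters also cover basic machine learning and dynamic probabilistic models.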

    TOC

    by Hayden Klok and Yoni Nazarathy. (DRAFT. PDF will be taken down when the book is published later in 2019).

    Contents

    • 1 Introducing Julia
      • 1.1 Language Overview
      • 1.2 Setup and Interface
      • 1.3 Crash Course by Example
      • 1.4 Plots, Images and Graphics
      • 1.5 Random Numbers and Monte Carlo
      • 1.6 Integration with Other Languages
    • 2 Basic Probability
      • 2.1 Random Experiments
      • 2.2 Working With Sets
      • 2.3 Independence
      • 2.4 Conditional Probability
      • 2.5 Bayes’ Rule
    • 3 Probability Distributions
      • 3.1 Random Variables
      • 3.2 Moment Based Descriptors
      • 3.3 Functions Describing Distributions
      • 3.4 The Distributions and Related Packages
      • 3.5 Families of Discrete Distributions
      • 3.6 Families of Continuous Distributions
      • 3.7 Joint Distributions and Covariance
    • 4 Processing and Summarizing Data
      • 4.1 Data Frames and Cleaning Data
      • 4.2 Summarizing Data
      • 4.3 Plots for Single Samples and Time Series
      • 4.4 Plots for Multiple Samples
      • 4.5 Plots for Multivariate and High Dimensional Data
      • 4.6 Plots for the Board Room
      • 4.7 Working with Files and Remote Servers
    • 5 Statistical Inference Concepts
      • 5.1 A Random Sample
      • 5.2 Sampling from a Normal Population
      • 5.3 The Central Limit Theorem
      • 5.4 Point Estimation
      • 5.5 Confidence Interval as a Concept
      • 5.6 Hypothesis Tests Concepts
      • 5.7 A Taste of Bayesian Statistics
    • 6 Confidence Intervals
      • 6.1 Single Sample Confidence Intervals for the Mean
      • 6.2 Two Sample Confidence Intervals for the Difference in Means
      • 6.3 Bootstrap Confidence Intervals
      • 6.4 Confidence Interval for the Variance of Normal Population
      • 6.5 Prediction Intervals
      • 6.6 Credible Intervals
    • 7 Hypothesis Testing
      • 7.1 Single Sample Hypothesis Tests for the Mean
      • 7.2 Two Sample Hypothesis Tests for Comparing Means
      • 7.3 Analysis of Variance (ANOVA)
      • 7.4 Independence and Goodness of Fit
      • 7.5 Power Curves
    • 8 Linear Regression and Extensions
      • 8.1 Clouds of Points and Least Squares
      • 8.2 Linear Regression with One Variable
      • 8.3 Multiple Linear Regression
      • 8.4 Model Adaptations
      • 8.5 Model Selection
      • 8.6 Logistic Regression and the Generalized Linear Model
      • 8.7 Time Series and Forecasting
    • 9 Machine Learning Basics
      • 9.1 Training, Validation and Testing
      • 9.2 Bias, Variance and Regularization
      • 9.3 Supervised Learning Methods
      • 9.4 Unsupervised Learning Methods
      • 9.5 Reinforcement Learning and MDP
      • 9.6 A Taste of Generative Adversarial Networks
    • 10 Simulation of Dynamic Models
      • 10.1 Deterministic Dynamical Systems
      • 10.2 Markov Chains
      • 10.3 Discrete Event Simulation
      • 10.4 Models with Additive Noise
      • 10.5 Network Reliability
      • 10.6 Common Random Numbers and Multiple RNGs
    • Appendix A How-to in Julia
      • A.1 Basics
      • A.2 Text and I/O
      • A.3 Data Structures
      • A.4 Data Frames
      • A.5 Mathematics
      • A.6 Randomness, Statistics and Machine Learning
      • A.7 Graphics
    • Appendix B Additional Julia Features
    • Appendix C Additional Packages
    • Bibliography
    • List of code listings
    • Index
  • Data Science with Julia - CRC Press Book
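    January 11, 2019 - 220 pages
    This book includes an introduction to Julia and covers data preprocessing, visualization, supervised and unsupervised learning, and interoperability with R.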

    TOC

    Table of Contents

    • Chapter 1 Introduction
      • DATA SCIENCE
      • BIG DATA
      • JULIA
      • JULIA PACKAGES
      • R PACKAGES
      • DATASETS
      • Overview
      • Beer Data
      • Coffee Data
      • Leptograpsus Crabs Data
      • Food Preferences Data
      • x Data
      • Iris Data
      • OUTLINE OF THE CONTENTS OF THIS MONOGRAPH
    • Chapter 2 Core Julia
      • VARIABLE NAMES
      • TYPES
      • Numeric
      • Floats
      • Strings
      • Tuples
      • DATA STRUCTURES
      • Arrays
      • Dictionaries
      • CONTROL FLOW
      • Compound Expressions
      • Conditional Evaluation
      • Loops
      • Basics
      • Loop termination
      • Exception Handling
      • FUNCTIONS
    • Chapter 3 Working With Data
      • DATAFRAMES
      • CATEGORICAL DATA
      • IO
      • USEFUL DATAFRAME FUNCTIONS
      • SPLIT-APPLY-COMBINE STRATEGY
      • QUERY.JL
    • Chapter 4 Visualizing Data
      • GADFLY.JL
      • VISUALIZING UNIVARIATE DATA
      • DISTRIBUTIONS
      • VISUALIZING BIVARIATE DATA
      • ERROR BARS
      • FACETS
      • SAVING PLOTS
    • Chapter 5 Supervised Learning
      • INTRODUCTION
      • CROSS-VALIDATION
      • Overview
      • K-Fold Cross-Validation
      • K-NEAREST NEIGHBOURS CLASSIFICATION
      • CLASSIFICATION AND REGRESSION TREES
      • Overview
      • Classification Trees
      • Regression Trees
      • Comments
      • BOOTSTRAP
      • RANDOM FORESTS
      • GRADIENT BOOSTING
      • Overview
      • Beer Data
      • Food Data
      • COMMENTS
    • Chapter 6 Unsupervised Learning
      • INTRODUCTION
      • PRINCIPAL COMPONENTS ANALYSIS
      • PROBABILISTIC PRINCIPAL COMPONENTS ANALYSIS
      • EM ALGORITHM FOR PPCA
      • Background: EM Algorithm
      • E-step
      • M-step
      • Woodbury Identity
      • Initialization
      • Stopping Rule
      • Implementing the EM Algorithm for PPCA
      • Comments
      • K-MEANS CLUSTERING
      • MIXTURE OF PPCAS
      • Model
      • Parameter Estimation
      • Illustrative Example: Coffee Data
    • Chapter 7 R Interoperability
      • ACCESSING R DATASETS
      • INTERACTING WITH R
      • EXAMPLE: CLUSTERING AND DATA REDUCTION FOR THE COFFEE DATA
      • Coffee Data
      • PGMM Analysis
      • VSCC Analysis
      • EXAMPLE: FOOD DATA
      • Overview
      • Random Forests
  • [julia v0.4] Julia for Data Science
    The content of this one is relevant, but it uses version 0.4, which is quite old. For reference only.

    TOC

    1 The Groundwork – Julia’s Environment
    2 Data Munging
    3 Data Exploration
    4 Deep Dive into Inferential Statistics
    5 Making Sense of Data Using Visualization
    6 Supervised Machine Learning
    7 Unsupervised Machine Learning
    8 Creating Ensemble Models
    9 Time Series
    10 Collaborative Filtering and Recommendation System
    11 Introduction to Deep Learning

  • vmls-julia-companion.pdf
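    This is the companion booklet to that linear algebra textbook; it contains a great many examples. If you have teaching in mind, I think this format is worth consulting.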

Courses

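  • The MIT courses; see whether any of them fit your needs.
  • Most of the other courses lean toward numerical analysis and optimization; too few cover statistics.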
    • STAT 590F, Topics in Statistical Computing: Julia Seminar (Prof. Heike Hofmann), Fall 2014
    • Northeastern University, Fall 2016: MTH3300: Applied Probability & Statistics
    • 223490-0286, Statistical Learning Methods (Bogumił Kamiński): Fall 2017, Spring 2018, Fall 2018

#6

Got it, thanks for the reminder!


#7

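Understood, thanks for the suggestion! At the moment this is indeed a teaching need, and it should be at the level of an introduction to data mining.

The purpose of the translation is teaching; I will discuss with my advisor how to get more out of it. Thanks for the reminder!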


#8

Thanks! I think it’s an excellent book with Julia tutorials, but I’m not familiar with the contents. As far as I can tell, data mining mainly involves EDA (Exploratory Data Analysis) techniques. To ensure translation quality, we will consider it later. Thanks!


#9

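Thanks for the suggestions! For now we probably only need material on data mining. The first book you recommended seems like a very good fit.
Our advisor has not yet raised any need for more foundational material or for numerical optimization.
Integrating introductory Julia material into the course is probably necessary. The examples in the booklet should make good supplementary exercises.
Thank you very much!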