215. Kth Largest Element in an Array

发布于 2020-03-24 | 分类于 LeetCode | 2分钟 | 302字数

Find the kth largest element in an unsorted array. Note that it is the kth largest element in the sorted order, not the kth distinct element.

Example 1:

Input: [3,2,1,5,6,4] and k = 2
Output: 5

Example 2:

Input: [3,2,3,1,2,4,5,5,6] and k = 4

阅读全文 »

CSLMap和CCSMap

发布于 2020-02-06 | 分类于 Java | 6分钟 | 1392字数

1. 跳表

1.1 跳表基本思路

对于有序数据的存储，常见的数据结构有：

数组：空间复杂度O(n)，定位插入/删除时间复杂度O(n)，查找时间复杂度O(logn)
链表：空间复杂度O(n)，定位插入/删除时间复杂度O(1)，查找时间复杂度O(n)
二叉树：空间复杂度O(n)，定位插入/删除时间复杂度O(logn)，查找时间复杂度Olog(n)

阅读全文 »

Spark Shuffle Manager解析

发布于 2020-01-24 | 分类于 Spark | 3分钟 | 563字数

Shuffle是下一个Stage向上一个Stage获取数据的过程，其中涉及到序列化、磁盘IO、网络IO等相对较慢的操作，所以减少和优化Shuffle过程有利于程序性能的提升。

MapReduce中的shuffle：

1. Spark Shuffle过程

在Spark中Shuffle和MR中的类似，也分为Shuffle Read和Shuffle Write，而且shuffle write也是将结果写入到磁盘中。

2. ShuffleManager

Spark中对于Shuffle的管理由ShuffleManager完成，ShuffleManager也从开始的HashShuffleManager演变为现在常用的SortShuffleManager。

阅读全文 »

SparkSQL join的实现

发布于 2020-01-02 | 分类于 Spark | 2分钟 | 381字数

SparkSQL对于join的实现有3种：

Hash Join
- Shuffle Hash Join
- Broadcast Hash Join（Map-Side Join）
阅读全文 »

560. Subarray Sum Equals K

发布于 2019-04-04 | 分类于 LeetCode | 3分钟 | 495字数

Given an array of integers and an integer k, you need to find the total number of continuous subarrays whose sum equals to k.

Example 1:
Input:nums = [1,1,1], k = 2
Output: 2
Note:

The length of the array is in range [1, 20,000].

阅读全文 »

Java中的equals和==有什么不同

发布于 2019-03-06 | 分类于 Java | 4分钟 | 749字数

今天看到两道题：

equals和==有什么区别？

Integer a = 1;

阅读全文 »

136. Single Number

发布于 2019-02-25 | 分类于 LeetCode | 1分钟 | 152字数

136. Single Number

Difficulty: Easy

Given a non-empty array of integers, every element appears twice except for one. Find that single one.

Note:

Your algorithm should have a linear runtime complexity. Could you implement it without using extra memory?

Example 1:

Input: [2,2,1]

阅读全文 »

Spark广播机制解析

发布于 2018-05-23 | 分类于 Spark | 4分钟 | 837字数

1. 使用场景

在spark开发过程中，广播变量会被分发到Executor，当Executor执行多个Task时会复用广播变量。所以广播变量最常见的使用场景有两个：

共享重资源：重资源的初始化十分消耗性能（例如：各类存储系统的客户端和连接池等），如果不使用广播可能造成频繁的初始化而影响性能，即使基于分区创建连接，在分区数较多的情况下也会占用大量资源。
共享只读数据：多次使用的只读数据可以使用广播发送到每个节点，避免重复传输，起到性能优化的作用。

2. 实现方式

阅读全文 »

Java如何写单元测试

发布于 2017-10-11 | 分类于 Java | 6分钟 | 987字数

1. JUnit 基本测试类

class StandardTests {

    @BeforeAll
    static void initAll() {
    }

阅读全文 »