无言不语

One-Way Hash Functions

Posted 2019-03-29 | Category: Cryptography

What a one-way hash function is

  1. A one-way hash function is also known as a message digest function or simply a hash function
  2. The output of a one-way hash function, the hash value, is also called a message digest or fingerprint

Properties of one-way hash functions

  1. The output length is fixed: hashing a message of any length yields a digest of the same fixed size
  2. Hash computation is fast and efficient
  3. Different inputs produce different hash values (in practice, it is computationally infeasible to find two messages with the same digest)
  4. The same input always produces the same hash value (hashing is deterministic)
  5. All the hash algorithms are public
  6. The function is one-way: the input cannot be recovered from the digest
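These properties are easy to observe with Python's built-in hashlib (SHA256 shown here; the other classic algorithms behave the same way):

```python
import hashlib

# Digests are fixed-length, deterministic, and change completely on any input change.
a = hashlib.sha256(b"hello").hexdigest()
b = hashlib.sha256(b"hello").hexdigest()
c = hashlib.sha256(b"hello!").hexdigest()

print(len(a))    # 64 hex characters = 256 bits, whatever the input length
print(a == b)    # True: same input, same digest
print(a == c)    # False: different input, different digest
```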

Classic one-way hash algorithms

  1. MD4
  2. MD5
  3. SHA1
  4. SHA256
  5. SHA512

MD5 and SHA256 are two of the most widely used one-way hash functions. Although MD5 was broken by the Chinese cryptographer Wang Xiaoyun in 2005, it is still in large-scale use; SHA256 is now the more common choice, and no practical break of it is known. In the BITS system, every index is hashed with SHA256, so even someone who obtains an index sees only a meaningless string unless they already know the original data.

Common application scenarios of one-way hash functions

  1. Storing user passwords in a database
    There have recently been repeated reports of company databases being breached with users' account passwords stored in plaintext, leaking large amounts of data. That practice is extremely irresponsible and dangerous.
    The better approach is to store only the output of a one-way hash function and compare hash values at each login. Because the function is one-way, even if the database is stolen, the users' passwords cannot be recovered from it.
    (Sites that claim to "crack" hashes really just do brute-force lookups against large tables of precomputed hashes of common plaintexts. There are simple countermeasures, such as "salting": append extra data such as $%*^& to the input before hashing, or hash the hash again, which defeats such precomputed tables.)
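The salting idea can be sketched in Python as follows. This is an illustration only: it uses a per-user random salt rather than a fixed string, and production systems should prefer a deliberately slow KDF such as PBKDF2 or bcrypt over a single fast SHA256 pass.

```python
import hashlib
import os

def hash_password(password, salt=None):
    """Salt and hash a password; returns (salt, hex digest) to store in the DB."""
    if salt is None:
        salt = os.urandom(16)   # random per-user salt, stored next to the digest
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt, digest

def verify_password(password, salt, stored_digest):
    """Recompute the salted hash and compare it with the stored digest."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest() == stored_digest

salt, digest = hash_password("hunter2")
print(verify_password("hunter2", salt, digest))   # True
print(verify_password("wrong", salt, digest))     # False
```

Because each user gets a different salt, identical passwords no longer produce identical database entries, so a precomputed table is useless.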

  2. Detecting file tampering
    Most sites that offer downloads publish the SHA256 of each file, because one-way hash functions detect tampering: if the SHA256 of the downloaded file does not match the published value, the file may have been modified and could contain a virus or be a pirated copy. In the code that follows we will also implement computing a file's SHA256.
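A minimal way to compute a file's SHA256 in Python, reading in chunks so that large files do not have to fit in memory:

```python
import hashlib

def file_sha256(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read chunk by chunk."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Compare the result with the value published on the download page; any mismatch means the file was altered in transit or on the server.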

  3. Digital signatures
    One-way hash functions are also used when creating digital signatures.
    A digital signature is the digital-world counterpart of a real-world signature or seal. Signing is computationally expensive, so the signature is generally not applied to the whole message: the message's hash value is computed first, and that hash is then encrypted with the signer's private key to produce the digital signature.

  4. Pseudorandom number generators
    One-way hash functions can be used to build pseudorandom number generators.
    Random numbers used in cryptography must have the property that it is practically impossible to predict future values of the sequence from past ones. The one-wayness of a hash function can be used to provide this unpredictability.
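A toy generator built on this idea hashes an internal seed together with an increasing counter. This only sketches the principle; real applications should use a vetted CSPRNG such as os.urandom, not a homemade construction:

```python
import hashlib

class HashPRNG:
    """Toy PRNG: each output is SHA256(seed_state || counter)."""

    def __init__(self, seed):
        # The internal state never leaves the object; the one-wayness of
        # SHA256 prevents recovering it from the outputs.
        self._state = hashlib.sha256(seed).digest()
        self._counter = 0

    def next_bytes(self):
        """Return 32 fresh pseudorandom bytes."""
        self._counter += 1
        return hashlib.sha256(self._state + self._counter.to_bytes(8, "big")).digest()

rng = HashPRNG(b"seed material")
print(rng.next_bytes().hex())   # 64 hex chars; unpredictable without the seed
```

Seeing past outputs does not help an attacker: predicting the next output would require inverting SHA256 to recover the hidden state.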

  5. Instant upload ("秒传")
    Many cloud-storage and network-drive services use this property of one-way hash functions to implement instant upload.
    A hash value is like a file's fingerprint. When a user uploads a file, the client first computes the file's hash and looks it up in the server's database; if the same value already exists, an identical file has already been uploaded, so there is no need to transfer it again; the existing copy is simply shared. This greatly reduces server load and storage space through deduplication.
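The deduplication check can be sketched with an in-memory dict standing in for the server-side index (the names here are illustrative, not any real service's API):

```python
import hashlib

store = {}   # digest -> file contents; stands in for the server's storage

def upload(data):
    """Store the data only if its digest is unknown; return the digest."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in store:
        print("instant upload: an identical file is already stored")
    else:
        store[digest] = data
    return digest

first = upload(b"some large file contents")
second = upload(b"some large file contents")   # deduplicated, stored only once
print(first == second, len(store))             # True 1
```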

References

What is a hash function: https://web.archive.org/web/20061206022506/http://www.rsasecurity.com/rsalabs/node.asp?id=2176
Hash function: https://zh.wikipedia.org/wiki/%E6%95%A3%E5%88%97%E5%87%BD%E6%95%B8
SHA family: https://zh.wikipedia.org/wiki/SHA%E5%AE%B6%E6%97%8F
Rainbow table: https://zh.wikipedia.org/wiki/%E5%BD%A9%E8%99%B9%E8%A1%A8

hive on spark java.net.ConnectException: Connection refused

Posted 2018-12-14 | Categories: Big Data, spark

When running hive on spark from the hive CLI

Running the commands:

```sql
set hive.execution.engine=spark;
select count(1) from ****;
```

produces the following error:

```
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
```

Inspecting the YARN logs with yarn logs -applicationId application_1544681531973_0028:

18/12/14 15:16:28 INFO client.RMProxy: Connecting to ResourceManager at *.com/10.19.170.62:8032

Container: container_1544681531973_0028_01_000001 on *.com_8041

LogType:stderr
Log Upload Time:Thu Dec 13 16:54:50 +0800 2018
LogLength:4896
Log Contents:
18/12/13 16:54:46 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
18/12/13 16:54:47 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1544681531973_0028_000001
18/12/13 16:54:48 INFO spark.SecurityManager: Changing view acls to: yarn,root
18/12/13 16:54:48 INFO spark.SecurityManager: Changing modify acls to: yarn,root
18/12/13 16:54:48 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, root); users with modify permissions: Set(yarn, root)
18/12/13 16:54:48 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
18/12/13 16:54:48 INFO yarn.ApplicationMaster: Waiting for spark context initialization…
18/12/13 16:54:48 INFO client.RemoteDriver: Connecting to: localhost:39851
18/12/13 16:54:49 ERROR yarn.ApplicationMaster: User class threw exception: java.util.concurrent.ExecutionException: java.net.ConnectException: 拒绝连接: localhost/127.0.0.1:39851
java.util.concurrent.ExecutionException: java.net.ConnectException: 拒绝连接: localhost/127.0.0.1:39851
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
at org.apache.hive.spark.client.RemoteDriver.(RemoteDriver.java:156)
at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:552)
Caused by: java.net.ConnectException: 拒绝连接: localhost/127.0.0.1:39851
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)
18/12/13 16:54:49 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.util.concurrent.ExecutionException: java.net.ConnectException: 拒绝连接: localhost/127.0.0.1:39851)
18/12/13 16:54:49 ERROR yarn.ApplicationMaster: Uncaught exception:
java.util.concurrent.ExecutionException: java.net.ConnectException: 拒绝连接: localhost/127.0.0.1:39851
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
at org.apache.hive.spark.client.RemoteDriver.(RemoteDriver.java:156)
at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:552)
Caused by: java.net.ConnectException: 拒绝连接: localhost/127.0.0.1:39851
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)
18/12/13 16:54:49 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: java.util.concurrent.ExecutionException: java.net.ConnectException: 拒绝连接: localhost/127.0.0.1:39851)
18/12/13 16:54:49 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1544681531973_0028
18/12/13 16:54:49 INFO util.ShutdownHookManager: Shutdown hook called

LogType:stdout
Log Upload Time:Thu Dec 13 16:54:50 +0800 2018
LogLength:0
Log Contents:

I spent a whole day without finding the cause ("拒绝连接" in the log above is the JVM locale's rendering of "Connection refused"). I suspected a missing service process, a broken connection, or a bad installation; I tried reinstalling spark, inspected various processes, and restarted the cluster N times. What finally helped was comparing which paths worked and which did not.

Analyzing the cause

  1. hive runs jobs normally when not using the spark engine
  2. spark-shell submits jobs normally
  3. hive on spark cannot submit jobs
  4. yarn runs jobs normally

Searching the Cloudera documentation for the hive on spark setup steps turned up the following:

By default, if a Spark service is available, the Hive dependency on the Spark service is configured. To change this configuration, do the following:

  1. In the Cloudera Manager Admin Console, go to the Hive service.
  2. Click the Configuration tab.
  3. Search for the Spark On YARN Service. To configure the Spark service, select the Spark service name. To remove the dependency, select none.
  4. Click Save Changes.
  5. Go to the Spark service.
  6. Add a Spark gateway role to the host running HiveServer2.
  7. Return to the Home page by clicking the Cloudera Manager logo.
  8. Click the icon next to any stale services to invoke the cluster restart wizard.
  9. Click Restart Stale Services.
  10. Click Restart Now.
  11. Click Finish.
  12. In the Hive client, configure the Spark execution engine.

Reviewing the installation steps revealed the problem:
Add a Spark gateway role to the host running HiveServer2.
The machine with the Spark gateway role was not the one running HiveServer2, so I installed the required environment on the right machine through the CDH web UI. Then, on that machine, I ran hive, executed SET hive.execution.engine=spark; and ran the scripts again.
The exception disappeared and jobs were submitted and ran normally.

Installing python 2.7 on centos 6.5

Posted 2018-06-27 | Categories: centos6.5, python

Install build dependencies

```shell
yum groupinstall "Development tools"
yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel
```

Download and install Python 2.7

```shell
wget https://www.python.org/ftp/python/2.7.11/Python-2.7.11.tgz
tar vxf Python-2.7.11.tgz
cd Python-2.7.11
./configure --prefix=/usr/local
make && make install
```

Make python 2.7 the system default

```shell
mv /usr/bin/python /usr/bin/python2.6.6
ln -s /usr/local/bin/python2.7 /usr/bin/python
```

Update the shebang at the top of /usr/bin/yum (yum still needs the system python 2.6)

```
#!/usr/bin/python
```

change to

```
#!/usr/bin/python2.6.6
```

Install setuptools

```shell
wget https://pypi.python.org/packages/ff/d4/209f4939c49e31f5524fa0027bf1c8ec3107abaf7c61fdaad704a648c281/setuptools-21.0.0.tar.gz#md5=81964fdb89534118707742e6d1a1ddb4
tar vxf setuptools-21.0.0.tar.gz
cd setuptools-21.0.0
python setup.py install
```

Install pip

```shell
wget https://pypi.python.org/packages/41/27/9a8d24e1b55bd8c85e4d022da2922cb206f183e2d18fee4e320c9547e751/pip-8.1.1.tar.gz#md5=6b86f11841e89c8241d689956ba99ed7
tar vxf pip-8.1.1.tar.gz
cd pip-8.1.1
python setup.py install
```

Spark runtime bug

Posted 2018-06-26 | Categories: Big Data, spark

Running a command in spark-shell produces the following error:

java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn/topology.py" (in directory "/home/108857"): error=2

This is a Cloudera bug: copy /etc/hadoop/conf.cloudera.yarn/topology* from a datanode to the machine running spark-shell and the error goes away.

spark pyspark

```python
from pyspark.sql import HiveContext
sqc = HiveContext(sc)
sqc.sql("use xcredit").show()
sqc.sql("select * from xxx.xxx").show()
```

spark spark-shell

```scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)
import sqlContext.implicits._
sqlContext.sql("SELECT * FROM xxx.xxx limit 10").collect().foreach(println)
sqlContext.sql("insert overwrite local directory '/data/spark_test' row format delimited fields terminated by ',' select ln.index_digest, ln.begin_date, count(distinct lt.p_id) as c from tb_distinct_analyze_loan ln join tb_distinct_analyze_loan lt on ln.index_digest = lt.index_digest where ln.begin_date >= date_sub(current_date, 1095) and lt.begin_date >= date_sub(current_date, 1095) and ((ln.begin_date < lt.end_date and ln.begin_date >= lt.begin_date) or (ln.end_date >= lt.begin_date and ln.end_date <= lt.end_date)) group by ln.index_digest, ln.begin_date")
```

gitlab maintenance commands

Posted 2018-06-13 | Category: gitlab

Disable gitlab's bundled nginx

```shell
vim /etc/gitlab/gitlab.rb
nginx['enable'] = false
```

Set the external URL and service port

```shell
external_url 'http://xxx.com'
unicorn['port'] = 8800
```

After changing the config file, reconfigure to apply the changes

```shell
gitlab-ctl reconfigure
```

Start, restart, stop, check status, tail all logs, tail one component's logs

```shell
gitlab-ctl start
gitlab-ctl restart
gitlab-ctl stop
gitlab-ctl status
gitlab-ctl tail
gitlab-ctl tail unicorn
```

gitlab push/clone failure: JWT::DecodeError (Nil JSON web token): lib/gitlab/workhorse.rb:120:in `verify_

The problem is in the reverse-proxy configuration: nginx or apache must proxy to http://gitlab-workhorse, not to http://127.0.0.1:8080.

Edit the nginx config so workers run as the git user

```
vim /etc/nginx/nginx.conf
user git;
```

Add the gitlab site config

```nginx
# /etc/nginx/conf.d/gitlab.conf
upstream gitlab-workhorse {
    server unix:/var/opt/gitlab/gitlab-workhorse/socket;
}

# HTTPS server
#proxy_cache_path /data/nginx/cache levels=1:2 keys_zone=STATIC:10 inactive=24h max_size=1g;
server {
    listen 80 default_server;
    #server_name git.okycz.com;
    #ssl on;
    #ssl_certificate
    #ssl_certificate_key
    #ssl_session_cache shared:SSL:1m;
    #ssl_session_timeout 5m;
    #ssl_ciphers HIGH:!aNULL:!MD5;
    #ssl_prefer_server_ciphers on;
    proxy_read_timeout 1800;
    proxy_send_timeout 1800;
    #proxy_connect_timeout 600;
    #send_timeout 600;
    #keepalive_timeout 600s;

    location / {
        proxy_pass http://gitlab-workhorse/;
        proxy_set_header Host $host:$server_port;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Via "nginx";
        #proxy_cache STATIC;
        #proxy_cache_valid 200 1d;
        #proxy_cache_use_stale error timeout invalid_header updating http_500 http_502 http_503 http_504;
    }

    location ~ ^/(assets)/ {
        root /opt/gitlab/embedded/service/gitlab-rails/public;
        gzip_static on; # serve pre-gzipped assets
        expires max;
        add_header Cache-Control public;
    }

    #rewrite ^(.*)$ https://$host$1 permanent;
}
```

webide exception

I made it work behind an nginx reverse proxy by adding $request_uri to the proxy_pass:

```nginx
proxy_pass http://my-gitlab:5555/$request_uri;
```
© 2019 huawei
Powered by Hexo
Theme - NexT.Pisces