Omics – Hunter (https://evvail.com)
Evvail | A place for sharing experience

Fixing "The following signatures couldn't be verified because the public key is not available: NO_PUBKEY XXXXXXXXXXX" on Linux
https://evvail.com/2025/07/20/2919.html
Sun, 20 Jul 2025 15:32:16 +0000

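I haven't updated the blog in a while. Over the past couple of days I ran into this error while updating software on Ubuntu:

This error mainly occurs because apt has no public key matching the signature on the package index of the source in use (here, a mirror inside China).

The fix is to import the corresponding public key; in this case it is `ED65462EC8D5E4C5`

# For other similar errors, just substitute the key ID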
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys ED65462EC8D5E4C5

Then refresh the package index again and the problem is resolved.

sudo apt update
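
When several repositories are affected, the key IDs can be pulled out of the apt output instead of being copied by hand. A minimal Python sketch, assuming the error text has already been captured as a string; the function name `missing_key_ids` and the sample message are illustrative and not part of apt:

```python
import re


def missing_key_ids(apt_output: str) -> list[str]:
    """Return the distinct key IDs that follow NO_PUBKEY, in order of appearance."""
    seen = []
    for key_id in re.findall(r"NO_PUBKEY\s+([0-9A-Fa-f]{8,40})", apt_output):
        key_id = key_id.upper()
        if key_id not in seen:
            seen.append(key_id)
    return seen


# Hypothetical apt output; the mirror URL is a placeholder.
sample = (
    "W: GPG error: http://mirror.example/ubuntu jammy InRelease: "
    "The following signatures couldn't be verified because the public key "
    "is not available: NO_PUBKEY ED65462EC8D5E4C5 NO_PUBKEY ED65462EC8D5E4C5\n"
)
print(missing_key_ids(sample))
```

Each ID it returns can then be passed to the key import command above.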

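If apt-key is unavailable, you are on a newer release: newer systems have moved away from apt-key and manage keyrings with gpg directly. In that case use the command below (alternatively, download the GPG key file first, or import it from a local file):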

sudo gpg --no-default-keyring --keyring /etc/apt/trusted.gpg.d/repo-name.gpg --keyserver keyserver.ubuntu.com --recv ED65462EC8D5E4C5

Syntax reference

apt-key [options] [key file]

gpg [options] [command] [file name]

GPT4All: a framework that can run local models on the CPU
https://evvail.com/2024/04/03/2888.html
Wed, 03 Apr 2024 01:07:03 +0000

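In recent years, large models have been at the center of discussion. ChatGPT, with its powerful language processing, shows up everywhere. What can large models do? They can write code, chat, automate office work, write short essays, turn text into images, extend images, generate video, and more.

For personal reasons the blog went quiet for half a year; today we pick things back up with the topic of the moment, large models. Most of the time, a tiny fraction of a US cent is enough to try a language model like ChatGPT. But ChatGPT is commercial, closed-source software, so for tinkerers there is little to do beyond chatting with it, which takes away much of the fun.

Today's platform is GPT4All ("Open-source large language models that run locally on your CPU and nearly any GPU"), a framework open-sourced by nomic-ai for running large models on a local computer. GPT4All uses GGUF as its standard model format, which greatly lowers the hardware requirements and the cost of experimenting.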

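What is GGUF?

GGUF is the model file format used by llama.cpp, the pure C/C++ implementation of LLaMA inference written by developer Georgi Gerganov. Its biggest advantage is fast inference on the CPU with no GPU required: it supports running quantized models on the CPU, which makes low-resource LLM deployment possible.
So we can download many good models from the Hugging Face repositories to test and learn with.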
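
Model hubs host files in several formats, so a quick sanity check after downloading is to look at the first four bytes of the file, which for GGUF are the ASCII characters GGUF. A small sketch; `looks_like_gguf` and the file names are hypothetical, and passing this check does not guarantee that a model will load:

```python
import os
import tempfile

GGUF_MAGIC = b"GGUF"  # the four ASCII bytes at the start of a GGUF file


def looks_like_gguf(path: str) -> bool:
    """Return True if the file at `path` starts with the GGUF magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(4) == GGUF_MAGIC


# Demonstration on throwaway files (stand-ins, not real models).
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "fake-model.gguf")
    bad = os.path.join(tmp, "not-a-model.bin")
    with open(good, "wb") as fh:
        fh.write(GGUF_MAGIC + b"\x00" * 12)
    with open(bad, "wb") as fh:
        fh.write(b"hello world")
    print(looks_like_gguf(good), looks_like_gguf(bad))
```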

Installers are provided for different platforms:

Windows: https://gpt4all.io/installers/gpt4all-installer-win64.exe

Mac: https://gpt4all.io/installers/gpt4all-installer-darwin.dmg

Ubuntu: https://gpt4all.io/installers/gpt4all-installer-linux.run

1) It can do Q&A

2) It can serve as a personal writing assistant

3) It can write code

4) It can understand text and extract summaries

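Benchmark results for some open-source models:

| Model | BoolQ | PIQA | HellaSwag | WinoGrande | ARC-e | ARC-c | OBQA | Avg |
|---|---|---|---|---|---|---|---|---|
| GPT4All-J 6B v1.0 | 73.4 | 74.8 | 63.4 | 64.7 | 54.9 | 36 | 40.2 | 58.2 |
| GPT4All-J v1.1-breezy | 74 | 75.1 | 63.2 | 63.6 | 55.4 | 34.9 | 38.4 | 57.8 |
| GPT4All-J v1.2-jazzy | 74.8 | 74.9 | 63.6 | 63.8 | 56.6 | 35.3 | 41 | 58.6 |
| GPT4All-J v1.3-groovy | 73.6 | 74.3 | 63.8 | 63.5 | 57.7 | 35 | 38.8 | 58.1 |
| GPT4All-J Lora 6B | 68.6 | 75.8 | 66.2 | 63.5 | 56.4 | 35.7 | 40.2 | 58.1 |
| GPT4All LLaMa Lora 7B | 73.1 | 77.6 | 72.1 | 67.8 | 51.1 | 40.4 | 40.2 | 60.3 |
| GPT4All 13B snoozy | 83.3 | 79.2 | 75 | 71.3 | 60.9 | 44.2 | 43.4 | 65.3 |
| GPT4All Falcon | 77.6 | 79.8 | 74.9 | 70.1 | 67.9 | 43.4 | 42.6 | 65.2 |
| Nous-Hermes | 79.5 | 78.9 | 80 | 71.9 | 74.2 | 50.9 | 46.4 | 68.8 |
| Nous-Hermes2 | 83.9 | 80.7 | 80.1 | 71.3 | 75.7 | 52.1 | 46.2 | 70.0 |
| Nous-Puffin | 81.5 | 80.7 | 80.4 | 72.5 | 77.6 | 50.7 | 45.6 | 69.9 |
| Dolly 6B | 68.8 | 77.3 | 67.6 | 63.9 | 62.9 | 38.7 | 41.2 | 60.1 |
| Dolly 12B | 56.7 | 75.4 | 71 | 62.2 | 64.6 | 38.5 | 40.4 | 58.4 |
| Alpaca 7B | 73.9 | 77.2 | 73.9 | 66.1 | 59.8 | 43.3 | 43.4 | 62.5 |
| Alpaca Lora 7B | 74.3 | 79.3 | 74 | 68.8 | 56.6 | 43.9 | 42.6 | 62.8 |
| GPT-J 6.7B | 65.4 | 76.2 | 66.2 | 64.1 | 62.2 | 36.6 | 38.2 | 58.4 |
| LLama 7B | 73.1 | 77.4 | 73 | 66.9 | 52.5 | 41.4 | 42.4 | 61.0 |
| LLama 13B | 68.5 | 79.1 | 76.2 | 70.1 | 60 | 44.6 | 42.2 | 63.0 |
| Pythia 6.7B | 63.5 | 76.3 | 64 | 61.1 | 61.3 | 35.2 | 37.2 | 56.9 |
| Pythia 12B | 67.7 | 76.6 | 67.3 | 63.8 | 63.9 | 34.8 | 38 | 58.9 |
| Fastchat T5 | 81.5 | 64.6 | 46.3 | 61.8 | 49.3 | 33.3 | 39.4 | 53.7 |
| Fastchat Vicuña 7B | 76.6 | 77.2 | 70.7 | 67.3 | 53.5 | 41.2 | 40.8 | 61.0 |
| Fastchat Vicuña 13B | 81.5 | 76.8 | 73.3 | 66.7 | 57.4 | 42.7 | 43.6 | 63.1 |
| StableVicuña RLHF | 82.3 | 78.6 | 74.1 | 70.9 | 61 | 43.5 | 44.4 | 65.0 |
| StableLM Tuned | 62.5 | 71.2 | 53.6 | 54.8 | 52.4 | 31.1 | 33.4 | 51.3 |
| StableLM Base | 60.1 | 67.4 | 41.2 | 50.1 | 44.9 | 27 | 32 | 46.1 |
| Koala 13B | 76.5 | 77.9 | 72.6 | 68.8 | 54.3 | 41 | 42.8 | 62.0 |
| Open Assistant Pythia 12B | 67.9 | 78 | 68.1 | 65 | 64.2 | 40.4 | 43.2 | 61.0 |
| Mosaic MPT7B | 74.8 | 79.3 | 76.3 | 68.6 | 70 | 42.2 | 42.6 | 64.8 |
| Mosaic mpt-instruct | 74.3 | 80.4 | 77.2 | 67.8 | 72.2 | 44.6 | 43 | 65.6 |
| Mosaic mpt-chat | 77.1 | 78.2 | 74.5 | 67.5 | 69.4 | 43.3 | 44.2 | 64.9 |
| Wizard 7B | 78.4 | 77.2 | 69.9 | 66.5 | 56.8 | 40.5 | 42.6 | 61.7 |
| Wizard 7B Uncensored | 77.7 | 74.2 | 68 | 65.2 | 53.5 | 38.7 | 41.6 | 59.8 |
| Wizard 13B Uncensored | 78.4 | 75.5 | 72.1 | 69.5 | 57.5 | 40.4 | 44 | 62.5 |
| GPT4-x-Vicuna-13b | 81.3 | 75 | 75.2 | 65 | 58.7 | 43.9 | 43.6 | 63.2 |
| Falcon 7b | 73.6 | 80.7 | 76.3 | 67.3 | 71 | 43.3 | 44.4 | 65.2 |
| Falcon 7b instruct | 70.9 | 78.6 | 69.8 | 66.7 | 67.9 | 42.7 | 41.2 | 62.5 |
| text-davinci-003 | 88.1 | 83.8 | 83.4 | 75.8 | 83.9 | 63.9 | 51 | 75.7 |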
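
The Avg column appears to be the plain arithmetic mean of the seven task scores (BoolQ through OBQA). A quick Python check on three rows, with the numbers copied from the table; the variable names are only for illustration:

```python
# (seven task scores, reported Avg) for three rows of the table
rows = {
    "GPT4All-J 6B v1.0": ([73.4, 74.8, 63.4, 64.7, 54.9, 36, 40.2], 58.2),
    "GPT4All 13B snoozy": ([83.3, 79.2, 75, 71.3, 60.9, 44.2, 43.4], 65.3),
    "text-davinci-003": ([88.1, 83.8, 83.4, 75.8, 83.9, 63.9, 51], 75.7),
}

for name, (scores, reported) in rows.items():
    mean = sum(scores) / len(scores)
    print(f"{name}: mean={mean:.1f} reported={reported}")
```

Because it is an unweighted mean, a model that is strong on one benchmark and weak on another can end up with the same Avg as a uniformly mediocre one, so the per-task columns are worth reading too.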

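The project also offers a number of models for download:

The above is the out-of-the-box route: download a model and the installer, install, and it is ready to use.

We can also use the Python bindings to customize things ourselves. First install with pip: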

pip install gpt4all

Once installed, test with the officially recommended model below:

from gpt4all import GPT4All

model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")

output = model.generate("The capital of France is ", max_tokens=3)

print(output)

If you have a capable graphics card, you can call it like this:

from gpt4all import GPT4All

model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf", device='gpu') # device='amd', device='intel'

output = model.generate("The capital of France is ", max_tokens=3)

print(output)

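After testing, the overall impression is good: question answering and term explanations are quite accurate. Chat quality is still some distance behind ChatGPT, but it is more than enough for learning and DIY.

References: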

1.https://github.com/nomic-ai/gpt4all

2.https://gpt4all.io/index.html

Fixing "unable to access index of repository mran.microsoft.com"
https://evvail.com/2023/08/10/2879.html
Thu, 10 Aug 2023 03:19:20 +0000

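When using Microsoft's distribution of R, you may see "Warning: unable to access index for repository https://mran.microsoft.com". The main cause is that Microsoft has stopped supporting MRAN (it was officially shut down on July 1, 2023).

For those of us who have this version of R installed, many packages have accumulated over time, and reinstalling everything would waste a lot of time. Here is a way to change the configuration so that R no longer contacts the Microsoft repository.

Open the R installation directory:

Open the Rprofile.site file

Find RevoUtils::getRevoRepos() and replace it with the CRAN repository or a domestic mirror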

r["CRAN"] <- 'https://mirrors.tuna.tsinghua.edu.cn/CRAN'

This resolves the repository access problem.
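
If the same edit has to be made on several machines, the replacement can be scripted. A minimal Python sketch that works on the text of Rprofile.site; the sample line is an assumed example of what the file may contain, and `point_repos_at_mirror` is a hypothetical helper name. Back up the real file before overwriting it:

```python
CRAN_MIRROR = "https://mirrors.tuna.tsinghua.edu.cn/CRAN"
REVO_CALL = "RevoUtils::getRevoRepos()"


def point_repos_at_mirror(rprofile_text: str, mirror: str = CRAN_MIRROR) -> str:
    """Replace every RevoUtils::getRevoRepos() call with a quoted mirror URL."""
    return rprofile_text.replace(REVO_CALL, f"'{mirror}'")


# Assumed sample line; the actual contents of Rprofile.site may differ.
sample = 'r["CRAN"] <- RevoUtils::getRevoRepos()\n'
patched = point_repos_at_mirror(sample)
print(patched, end="")
```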

References:

1.https://techcommunity.microsoft.com/t5/azure-sql-blog/microsoft-r-application-network-retirement/ba-p/3707161

Perl: using variables that contain special characters in pattern matching
https://evvail.com/2023/05/09/2871.html
Tue, 09 May 2023 08:26:31 +0000

Matching text is a very common operation when processing text with Perl, for example:

$str = "string\n[]\\";

$str =~ m/string/;
# matches "string"

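But when a variable is used as the pattern and it contains special characters, the Perl interpreter treats them as regex metacharacters by default, for example: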

$reg = "[";
# Used directly, this fails with: Unmatched [ in regex; marked by <-- HERE in m/ [ <-- HERE  / at .....
$str =~ m/$reg/g;

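Here Perl treats [ as the start of a character class and looks for the closing ]. Often, though, we want it treated as a literal character rather than a special one.

There are two ways to keep Perl from interpreting it as a special character:

1) Add the escape character \

# Note: inside double quotes, two backslashes are needed
$reg = "\\[";
# or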
$reg = '\[';

2) Use the quotemeta function (it escapes all metacharacters in EXPR)

$reg = "[";
$reg = quotemeta( $reg );
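
The same pitfall exists in other regex engines. For comparison, a short Python sketch: `re.escape` plays the role that quotemeta plays in Perl. This is only an analogy; the variable names are made up for the example:

```python
import re

text = "string\n[]\\"
needle = "["

# Used as a pattern directly, "[" opens a character class that never closes.
try:
    re.search(needle, text)
except re.error as exc:
    print("unescaped pattern failed:", exc)

# re.escape backslash-escapes the metacharacter, like quotemeta in Perl.
escaped = re.escape(needle)
position = re.search(escaped, text).start()
print(escaped, position)
```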

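Below is a regular expression quick reference. Microsoft's summary for .NET is very good, so I am borrowing it here to share. Note that it describes the .NET engine: most of it, but not all, carries over to Perl.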

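Character Escapes

The backslash character (\) in a regular expression indicates that the character that follows it either is a special character (as shown in the following table), or should be interpreted literally. For more information, see Character Escapes.

| Escaped character | Description | Pattern | Matches |
|---|---|---|---|
| `\a` | Matches a bell (alarm) character, \u0007. | `\a` | "\u0007" in "Error!" + '\u0007' |
| `\b` | In a character class, matches a backspace, \u0008. | `[\b]{3,}` | "\b\b\b\b" in "\b\b\b\b" |
| `\t` | Matches a tab, \u0009. | `(\w+)\t` | "item1\t" and "item2\t" in "item1\titem2\t" |
| `\r` | Matches a carriage return, \u000D. (\r is not equivalent to the newline character, \n.) | `\r\n(\w+)` | "\r\nThese" in "\r\nThese are\ntwo lines." |
| `\v` | Matches a vertical tab, \u000B. | `[\v]{2,}` | "\v\v\v" in "\v\v\v" |
| `\f` | Matches a form feed, \u000C. | `[\f]{2,}` | "\f\f\f" in "\f\f\f" |
| `\n` | Matches a new line, \u000A. | `\r\n(\w+)` | "\r\nThese" in "\r\nThese are\ntwo lines." |
| `\e` | Matches an escape, \u001B. | `\e` | "\x001B" in "\x001B" |
| `\nnn` | Uses octal representation to specify a character (nnn consists of two or three digits). | `\w\040\w` | "a b" and "c d" in "a bc d" |
| `\xnn` | Uses hexadecimal representation to specify a character (nn consists of exactly two digits). | `\w\x20\w` | "a b" and "c d" in "a bc d" |
| `\cX`, `\cx` | Matches the ASCII control character specified by X or x, where X or x is the letter of the control character. | `\cC` | "\x0003" in "\x0003" (Ctrl-C) |
| `\unnnn` | Matches a Unicode character by using hexadecimal representation (exactly four digits, as represented by nnnn). | `\w\u0020\w` | "a b" and "c d" in "a bc d" |
| `\` | When followed by a character that is not recognized as an escaped character in this and other tables in this topic, matches that character. For example, `\*` is the same as `\x2A`, and `\.` is the same as `\x2E`. This lets the regular expression engine tell language elements (such as * or ?) apart from character literals (written `\*` or `\?`). | `\d+[\+-x\*]\d+` | "2+2" and "3*9" in "(2+2) * 3*9" |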

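Character Classes

A character class matches any one of a set of characters. Character classes include the language elements listed in the following table. For more information, see Character Classes.

| Character class | Description | Pattern | Matches |
|---|---|---|---|
| `[character_group]` | Matches any single character in character_group. By default, the match is case-sensitive. | `[ae]` | "a" in "gray"; "a" and "e" in "lane" |
| `[^character_group]` | Negation: matches any single character that is not in character_group. By default, characters in character_group are case-sensitive. | `[^aei]` | "r", "g", and "n" in "reign" |
| `[first-last]` | Character range: matches any single character in the range from first to last. | `[A-Z]` | "A" and "B" in "AB123" |
| `.` | Wildcard: matches any single character except \n. To match a literal period character (. or \u002E), precede it with the escape character (`\.`). | `a.e` | "ave" in "nave"; "ate" in "water" |
| `\p{name}` | Matches any single character in the Unicode general category or named block specified by name. | `\p{Lu}`; `\p{IsCyrillic}` | "C" and "L" in "City Lights"; "Д" and "Ж" in "ДЖem" |
| `\P{name}` | Matches any single character that is not in the Unicode general category or named block specified by name. | `\P{Lu}`; `\P{IsCyrillic}` | "i", "t", and "y" in "City"; "e" and "m" in "ДЖem" |
| `\w` | Matches any word character. | `\w` | "I", "D", "A", "1", and "3" in "ID A1.3" |
| `\W` | Matches any non-word character. | `\W` | " " and "." in "ID A1.3" |
| `\s` | Matches any white-space character. | `\w\s` | "D " in "ID A1.3" |
| `\S` | Matches any non-white-space character. | `\s\S` | " _" in "int __ctr" |
| `\d` | Matches any decimal digit. | `\d` | "4" in "4 = IV" |
| `\D` | Matches any character other than a decimal digit. | `\D` | " ", "=", " ", "I", and "V" in "4 = IV" |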

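Anchors

Anchors, or atomic zero-width assertions, cause a match to succeed or fail depending on the current position in the string, but they do not cause the engine to advance through the string or consume characters. The metacharacters listed in the following table are anchors. For more information, see Anchors.

| Assertion | Description | Pattern | Matches |
|---|---|---|---|
| `^` | By default, the match must start at the beginning of the string; in multiline mode, it must start at the beginning of the line. | `^\d{3}` | "901" in "901-333-" |
| `$` | By default, the match must occur at the end of the string or before \n at the end of the string; in multiline mode, it must occur before the end of the line or before \n at the end of the line. | `-\d{3}$` | "-333" in "-901-333" |
| `\A` | The match must occur at the start of the string. | `\A\d{3}` | "901" in "901-333-" |
| `\Z` | The match must occur at the end of the string or before \n at the end of the string. | `-\d{3}\Z` | "-333" in "-901-333" |
| `\z` | The match must occur at the end of the string. | `-\d{3}\z` | "-333" in "-901-333" |
| `\G` | The match must occur at the point where the previous match ended, or if there was no previous match, at the position in the string where matching started. | `\G\(\d\)` | "(1)", "(3)", and "(5)" in "(1)(3)(5)[7](9)" |
| `\b` | The match must occur on a boundary between a \w (alphanumeric) and a \W (nonalphanumeric) character. | `\b\w+\s\w+\b` | "them theme" and "them them" in "them theme them them" |
| `\B` | The match must not occur on a \b boundary. | `\Bend\w*\b` | "ends" and "ender" in "end sends endure lender" |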
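
To see the anchors in action, here is a small sketch using Python's `re` module rather than .NET or Perl. Only anchors that Python also supports are used, and the two-line input string is made up for the example:

```python
import re

text = "901-333-\n555-444-"

# ^ matches only at the start of the string unless multiline mode is on.
start_default = re.findall(r"^\d{3}", text)
start_multiline = re.findall(r"^\d{3}", text, re.MULTILINE)

# \A ignores multiline mode and always means the start of the string.
start_absolute = re.findall(r"\A\d{3}", text, re.MULTILINE)

# $ anchors the match at the end of the string.
end_match = re.findall(r"-\d{3}$", "-901-333")

# \b and \B: word boundary and non-boundary, as in the table rows.
pairs = re.findall(r"\b\w+\s\w+\b", "them theme them them")
inner = re.findall(r"\Bend\w*\b", "end sends endure lender")

print(start_default, start_multiline, start_absolute)
print(end_match, pairs, inner)
```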

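Grouping Constructs

Grouping constructs delineate subexpressions of a regular expression and typically capture substrings of an input string. Grouping constructs include the language elements listed in the following table. For more information, see Grouping Constructs.

| Grouping construct | Description | Pattern | Matches |
|---|---|---|---|
| `(subexpression)` | Captures the matched subexpression and assigns it a one-based ordinal number. | `(\w)\1` | "ee" in "deep" |
| `(?<name>subexpression)` or `(?'name'subexpression)` | Captures the matched subexpression into a named group. | `(?<double>\w)\k<double>` | "ee" in "deep" |
| `(?<name1-name2>subexpression)` or `(?'name1-name2'subexpression)` | Defines a balancing group definition. For more information, see the "Balancing Group Definition" section in Grouping Constructs. | `(((?'Open'\()[^\(\)]*)+((?'Close-Open'\))[^\(\)]*)+)*(?(Open)(?!))$` | "((1-3)*(3-1))" in "3+2^((1-3)*(3-1))" |
| `(?:subexpression)` | Defines a noncapturing group. | `Write(?:Line)?` | "WriteLine" in "Console.WriteLine()"; "Write" in "Console.Write(value)" |
| `(?imnsx-imnsx:subexpression)` | Applies or disables the specified options within subexpression. For more information, see Regular Expression Options. | `A\d{2}(?i:\w+)\b` | "A12xl" and "A12XL" in "A12xl A12XL a12xl" |
| `(?=subexpression)` | Zero-width positive lookahead assertion. | `\b\w+\b(?=.+and.+)` | "cats" and "dogs" in "cats, dogs and some mice." |
| `(?!subexpression)` | Zero-width negative lookahead assertion. | `\b\w+\b(?!.+and.+)` | "and", "some", and "mice" in "cats, dogs and some mice." |
| `(?<=subexpression)` | Zero-width positive lookbehind assertion. | `\b\w+\b(?<=.+and.+)` | "some" and "mice" in "cats, dogs and some mice." |
| | | `\b\w+\b(?<=.+and.*)` | "and", "some", and "mice" in "cats, dogs and some mice." |
| `(?<!subexpression)` | Zero-width negative lookbehind assertion. | `\b\w+\b(?<!.+and.+)` | "cats", "dogs", and "and" in "cats, dogs and some mice." |
| | | `\b\w+\b(?<!.+and.*)` | "cats" and "dogs" in "cats, dogs and some mice." |
| `(?>subexpression)` | Atomic group. | `(?>a|ab)c` | "ac" in "ac"; no match in "abc" |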

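Lookarounds at a glance

When the regular expression engine hits a lookaround expression, it takes a substring reaching from the current position to the start (lookbehind) or end (lookahead) of the original string, and then runs Regex.IsMatch on that substring using the lookaround pattern. Whether the result counts as a success then depends on whether the assertion is positive or negative.

| Lookaround | Name | Function |
|---|---|---|
| `(?=check)` | Positive lookahead | Asserts that what immediately follows the current position in the string is "check" |
| `(?<=check)` | Positive lookbehind | Asserts that what immediately precedes the current position in the string is "check" |
| `(?!check)` | Negative lookahead | Asserts that what immediately follows the current position in the string is not "check" |
| `(?<!check)` | Negative lookbehind | Asserts that what immediately precedes the current position in the string is not "check" |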
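
A short Python sketch of the four lookarounds. One caveat that matters when porting patterns: Python's `re` requires lookbehind patterns to have a fixed width, so the variable-length .NET examples such as `(?<=.+and.+)` do not compile there. The `prices` string is made up for the example:

```python
import re

sentence = "cats, dogs and some mice."

# Lookahead: words followed (somewhere later) by "and", and words that are not.
before_and = re.findall(r"\b\w+\b(?=.+and.+)", sentence)
not_before_and = re.findall(r"\b\w+\b(?!.+and.+)", sentence)

prices = "cost: $42, count: 7"

# Lookbehind (fixed width in Python): digits right after "$", and digits that are not.
after_dollar = re.findall(r"(?<=\$)\d+", prices)
not_after_dollar = re.findall(r"(?<!\$)\b\d+", prices)

print(before_and, not_before_and)
print(after_dollar, not_after_dollar)
```

In all four cases the lookaround text itself is not part of the match; only the word or number is returned.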

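Once they match, atomic groups are not re-evaluated, even if the remainder of the pattern fails because of that match. This can significantly improve performance when quantifiers occur within the atomic group or in the remainder of the pattern.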

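Quantifiers

A quantifier specifies how many instances of the previous element (which can be a character, a group, or a character class) must be present in the input string for a match to occur. Quantifiers include the language elements listed in the following table. For more information, see Quantifiers.

| Quantifier | Description | Pattern | Matches |
|---|---|---|---|
| `*` | Matches the previous element zero or more times. | `a.*c` | "abcbc" in "abcbc" |
| `+` | Matches the previous element one or more times. | `be+` | "bee" in "been"; "be" in "bent" |
| `?` | Matches the previous element zero or one time. | `rai?` | "rai" in "rain" |
| `{n}` | Matches the previous element exactly n times. | `,\d{3}` | ",043" in "1,043.6"; ",876", ",543", and ",210" in "9,876,543,210" |
| `{n,}` | Matches the previous element at least n times. | `\d{2,}` | "166", "29", "1930" |
| `{n,m}` | Matches the previous element at least n times, but no more than m times. | `\d{3,5}` | "166", "17668"; "19302" in "193024" |
| `*?` | Matches the previous element zero or more times, but as few times as possible. | `a.*?c` | "abc" in "abcbc" |
| `+?` | Matches the previous element one or more times, but as few times as possible. | `be+?` | "be" in "been"; "be" in "bent" |
| `??` | Matches the previous element zero or one time, but as few times as possible. | `rai??` | "ra" in "rain" |
| `{n}?` | Matches the preceding element exactly n times. | `,\d{3}?` | ",043" in "1,043.6"; ",876", ",543", and ",210" in "9,876,543,210" |
| `{n,}?` | Matches the previous element at least n times, but as few times as possible. | `\d{2,}?` | "166", "29", "1930" |
| `{n,m}?` | Matches the previous element between n and m times, but as few times as possible. | `\d{3,5}?` | "166", "17668"; "193" and "024" in "193024" |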

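The difference between greedy and lazy quantifiers is easiest to see side by side. A minimal sketch in Python's `re`, reusing inputs from the quantifier examples above:

```python
import re

# Greedy: .* runs to the last "c"; lazy: .*? stops at the first one.
greedy = re.search(r"a.*c", "abcbc").group()
lazy = re.search(r"a.*?c", "abcbc").group()

# {3,5} takes up to five digits at once; {3,5}? takes three at a time.
greedy_digits = re.findall(r"\d{3,5}", "193024")
lazy_digits = re.findall(r"\d{3,5}?", "193024")

# ?? prefers zero occurrences of the optional "i".
lazy_optional = re.search(r"rai??", "rain").group()

print(greedy, lazy)
print(greedy_digits, lazy_digits)
print(lazy_optional)
```
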
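Backreference Constructs

A backreference allows a previously matched subexpression to be identified subsequently in the same regular expression. The following table lists the backreference constructs supported by regular expressions in .NET. For more information, see Backreference Constructs.

| Backreference construct | Description | Pattern | Matches |
|---|---|---|---|
| `\number` | Backreference. Matches the value of a numbered subexpression. | `(\w)\1` | "ee" in "seek" |
| `\k<name>` | Named backreference. Matches the value of a named expression. | `(?<char>\w)\k<char>` | "ee" in "seek" |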
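Alternation Constructs

Alternation constructs modify a regular expression to enable either/or matching. These constructs include the language elements listed below. For more information, see Alternation Constructs.

`|`
Matches any one element separated by the vertical bar (|) character. Pattern: `th(e|is|at)`. Matches: "the" and "this" in "this is the day."

`(?(expression)yes|no)` or `(?(expression)yes)`
Matches yes if the regular expression pattern designated by expression matches; otherwise, matches the optional no part. expression is interpreted as a zero-width assertion. To avoid ambiguity with a named or numbered capturing group, you can optionally use an explicit assertion, like this: `(?( (?=expression) )yes|no)`. Pattern: `(?(A)A\d{2}\b|\b\d{3}\b)`. Matches: "A10" and "910" in "A10 C103 910"

`(?(name)yes|no)` or `(?(name)yes)`
Matches yes if name, a named or numbered capturing group, has a match; otherwise, matches the optional no. Pattern: `(?<quoted>")?(?(quoted).+?"|\S+\s)`. Matches: "Dogs.jpg " and "\"Yiska playing.jpg\"" in "Dogs.jpg \"Yiska playing.jpg\""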

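Substitutions

Substitutions are regular expression language elements that are supported in replacement patterns. For more information, see Substitutions. The metacharacters listed in the following table are atomic zero-width assertions.

| Character | Description | Pattern | Replacement pattern | Input string | Result string |
|---|---|---|---|---|---|
| `$number` | Substitutes the substring matched by group number. | `\b(\w+)(\s)(\w+)\b` | `$3$2$1` | "one two" | "two one" |
| `${name}` | Substitutes the substring matched by the named group name. | `\b(?<word1>\w+)(\s)(?<word2>\w+)\b` | `${word2} ${word1}` | "one two" | "two one" |
| `$$` | Substitutes a literal "$". | `\b(\d+)\s?USD` | `$$$1` | "103 USD" | "$103" |
| `$&` | Substitutes a copy of the whole match. | `\$?\d*\.?\d+` | `**$&**` | "$1.30" | "**$1.30**" |
| `` $` `` | Substitutes all the text of the input string before the match. | `B+` | `` $` `` | "AABBCC" | "AAAACC" |
| `$'` | Substitutes all the text of the input string after the match. | `B+` | `$'` | "AABBCC" | "AACCCC" |
| `$+` | Substitutes the last group that was captured. | `B+(C+)` | `$+` | "AABBCCDD" | "AACCDD" |
| `$_` | Substitutes the entire input string. | `B+` | `$_` | "AABBCC" | "AAAABBCCCC" |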
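
Replacement syntax is one of the places where engines differ most. As a rough comparison, here are four of the substitutions above written for Python's `re.sub`, which uses `\1`, `\g<name>` and `\g<0>` where .NET uses `$1`, `${name}` and `$&`. This is a sketch of the Python equivalents, not of .NET behavior:

```python
import re

# $3$2$1  ->  \3\2\1
swapped = re.sub(r"\b(\w+)(\s)(\w+)\b", r"\3\2\1", "one two")

# ${word2} ${word1}  ->  \g<word2> \g<word1>
swapped_named = re.sub(
    r"\b(?P<word1>\w+)(\s)(?P<word2>\w+)\b",
    r"\g<word2> \g<word1>",
    "one two",
)

# $$$1  ->  $\1   ("$" is an ordinary character in a Python replacement)
dollars = re.sub(r"\b(\d+)\s?USD", r"$\1", "103 USD")

# $&  ->  \g<0>   (the whole match)
starred = re.sub(r"\$?\d*\.?\d+", r"**\g<0>**", "$1.30")

print(swapped, swapped_named, dollars, starred)
```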

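Regular Expression Options

You can specify options that control how the regular expression engine interprets a regular expression pattern. Many of these options can be specified either inline (in the regular expression pattern) or as one or more RegexOptions constants. This quick reference lists only inline options. For more information about inline and RegexOptions options, see the article Regular Expression Options.

You can specify an inline option in two ways:

  • By using the miscellaneous construct (?imnsx-imnsx), where a minus sign (-) before an option or set of options turns those options off. For example, (?i-mn) turns case-insensitive matching (i) on, turns multiline mode (m) off, and turns unnamed group captures (n) off. The option applies to the regular expression pattern from the point at which the option is defined, and is effective either to the end of the pattern or to the point where another construct reverses the option.
  • By using the grouping construct (?imnsx-imnsx:subexpression), which defines options for the specified group only.

The .NET regular expression engine supports the following inline options:

| Option | Description | Pattern | Matches |
|---|---|---|---|
| i | Use case-insensitive matching. | `\b(?i)a(?-i)a\w+\b` | "aardvark" and "aaaAuto" in "aardvark AAAuto aaaAuto Adam breakfast" |
| m | Use multiline mode. ^ and $ match the beginning and end of a line, instead of the beginning and end of a string. | For an example, see the "Multiline Mode" section in Regular Expression Options. | |
| n | Do not capture unnamed groups. | For an example, see the "Explicit Captures Only" section in Regular Expression Options. | |
| s | Use single-line mode. | For an example, see the "Single-line Mode" section in Regular Expression Options. | |
| x | Ignore unescaped white space in the regular expression pattern. | `\b(?x) \d+ \s \w+` | "1 aardvark" and "2 cats" in "1 aardvark 2 cats IV centurions" |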
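
Inline options also exist in Python's `re`, with one restriction worth knowing: a global flag such as `(?i)` or `(?x)` has to come at the very start of the pattern, while the scoped form `(?i:...)` can appear anywhere. A small sketch under that assumption, so the patterns are adapted rather than copied from the .NET rows:

```python
import re

# Scoped option: only the \w+ part is case-insensitive, the leading "A" is not.
scoped = re.findall(r"A\d{2}(?i:\w+)\b", "A12xl A12XL a12xl")

# Global case-insensitive flag at the start of the pattern.
insensitive = re.findall(r"(?i)\ba\w+\b", "aardvark AAAuto aaaAuto Adam breakfast")

# Verbose mode: unescaped whitespace in the pattern is ignored.
verbose = re.findall(r"(?x) \b \d+ \s \w+", "1 aardvark 2 cats IV centurions")

print(scoped)
print(insensitive)
print(verbose)
```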

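Miscellaneous Constructs

Miscellaneous constructs either modify a regular expression pattern or provide information about it. The following table lists the miscellaneous constructs supported by .NET. For more information, see Miscellaneous Constructs.

| Construct | Definition | Example |
|---|---|---|
| `(?imnsx-imnsx)` | Sets or disables options such as case insensitivity in the middle of a pattern. For more information, see Regular Expression Options. | `\bA(?i)b\w+\b` matches "ABA" and "Able" in "ABA Able Act" |
| `(?#comment)` | Inline comment. The comment ends at the first closing parenthesis. | `\bA(?#Matches words starting with A)\w+\b` |
| `#` [to end of line] | X-mode comment. The comment starts at an unescaped # and continues to the end of the line. | `(?x)\bA\w+\b#Matches words starting with A` |

References: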

1.https://learn.microsoft.com/zh-cn/dotnet/standard/base-types/regular-expression-language-quick-reference

A roundup of the ggplot2 family of packages: 120+
https://evvail.com/2023/03/26/2864.html
Sun, 26 Mar 2023 15:30:00 +0000

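ggplot2 has become the main package for plotting and visualization in R. Most of the R packages currently built on top of ggplot2 are summarized below: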

Package Description Tags Repository Author Author page
ggQQunif Make QQ plots for big data expected to be uniformly distributed, e.g. p-values. visualization,quantiles,p-values,statistics,big data https://github.com/rcorty/ggQQunif/ rcorty https://github.com/rcorty
ggupset Combination Matrix Axis for ‘ggplot2’ to Create ‘UpSet’ Plots visualization,upset,combination matrix https://github.com/const-ae/ggupset/ const-ae https://github.com/const-ae
xmrr Generate XMR Control Chart Data from Time-Series Data. XmR, Visualization, Control Charts, QC, XBar https://github.com/Zanidean/xmrr/ Alex Zanidean https://github.com/Zanidean
gg3D 3D perspective plots for ggplot2 3D, Visualization https://github.com/AckerDWM/gg3D/ Daniel Acker https://github.com/AckerDWM
ggQC Use ggQC to plot single, faceted and multi-layered quality control charts. QC, XmR, XbarR, SixSigma, Visualization https://github.com/kenithgrey/ggQC/ Kenith Grey https://github.com/kenithgrey
ggdist ‘ggdist’ provides stats and geoms for visualizing distributions and uncertainty. visualization,uncertainty,confidence,probability https://github.com/mjskay/ggdist/ mjskay https://github.com/mjskay
ggedit ggedit is aimed to interactively edit ggplot layers, scales and themes aesthetics visualization, interactive, shiny, general,themes https://github.com/metrumresearchgroup/ggedit/ yonicd https://github.com/yonicd
ggpage Creates Page Layout Visualizations. visualization,text https://github.com/emilhvitfeldt/ggpage/ emilhvitfeldt https://github.com/emilhvitfeldt
ggbreak Set Axis Break for ‘ggplot2’ visualization, geoms https://github.com/YuLab-SMU/ggbreak/ YuLab-SMU https://github.com/YuLab-SMU
ggimg Graphics Layers for Plotting Image Data with ggplot2. visualization, geoms https://github.com/statsmaths/ggimg/ statsmaths https://github.com/statsmaths
gganatogram gganatogram makes it possible to visualise tissues for different organisms or cell compartments. anatograms, tissue, visualization, anatomy, expression, pharmacology https://github.com/jespermaag/gganatogram/ jespermaag https://github.com/jespermaag
ggforce ggforce is aimed at providing missing functionality to ggplot2 through the extension system introduced with ggplot2 v2.0.0. visualization,general https://github.com/thomasp85/ggforce/ thomasp85 https://github.com/thomasp85
ggalt A compendium of ‘geoms’, ‘coords’ and ‘stats’ for ‘ggplot2’. visualization,general https://github.com/hrbrmstr/ggalt/ hrbrmstr https://github.com/hrbrmstr
ggiraph htmlwidget to make ‘ggplot’ graphics interactive. visualization,general https://github.com/davidgohel/ggiraph/ davidgohel https://github.com/davidgohel
ggmuller Creates Muller plots for visualizing evolutionary dynamics. visualization,evolution,dynamics https://github.com/robjohnnoble/ggmuller/ robjohnnoble https://github.com/robjohnnoble
ggstance ggstance implements horizontal versions of common ggplot2 geoms. visualization,general https://github.com/lionel-/ggstance/ lionel- https://github.com/lionel-
ggrepel Repel overlapping text labels away from each other. visualization,general https://github.com/slowkow/ggrepel/ slowkow https://github.com/slowkow
ggraph ggraph is tailored at plotting graph-like data structures (graphs, networks, trees, hierarchies…). visualization,general https://github.com/thomasp85/ggraph/ thomasp85 https://github.com/thomasp85
gginnards Find, delete, insert and move plot layers. Delete unused data from the data object stored within a ggplot object. Dump data to the R console. grammar extensions,layer manipulation,debug https://github.com/aphalo/gginnards/ aphalo https://github.com/aphalo
ggpp Add plots, tables and grobs as plot insets; nudge labels away from a focal point or line; filter observations by local density. grammar extensions,plot insets,position nudge, npc https://github.com/aphalo/ggpp/ aphalo https://github.com/aphalo
ggpmisc Annotate plots with fitted model equations, ANOVA tables, summary tables; find and label peaks and valleys; annotations support grouping and facets. visualization,general,model fit,anova,table https://github.com/aphalo/ggpmisc/ aphalo https://github.com/aphalo
geomnet geomnet implements network visualizations in ggplot2 via geom_net. visualization,general https://github.com/sctyner/geomnet/ sctyner https://github.com/sctyner
ggExtra ggExtra lets you add marginal density plots or histograms to ggplot2 scatterplots. histogram,marginal,density https://github.com/daattali/ggExtra/ daattali https://github.com/daattali
ggfortify A unified ggplot2 interface to the results of many popular statistical packages. visualization,general https://github.com/sinhrks/ggfortify/ terrytangyuan https://github.com/terrytangyuan
autoplotly Automatic generation of interactive visualizations for popular statistical results. visualization,general https://github.com/terrytangyuan/autoplotly/ terrytangyuan https://github.com/terrytangyuan
gganimate A Grammar of Animated Graphics. visualization,general https://github.com/thomasp85/gganimate/ thomasp85 https://github.com/thomasp85
ggfx Pixel Filters for ‘ggplot2’ and ‘grid’ visualization,general https://github.com/thomasp85/ggfx/ thomasp85 https://github.com/thomasp85
plotROC plotROC provides functions to generate an interactive ROC curve plot for web use, and print versions. visualization,general https://github.com/sachsmc/plotROC/ sachsmc https://github.com/sachsmc
ggbump Bump Chart and Sigmoid Curves. visualization,general,geoms https://github.com/davidsjoberg/ggbump/ davidsjoberg https://github.com/davidsjoberg
ggthemes Some extra geoms, scales, and themes for ggplot. visualization,general,themes https://github.com/jrnold/ggthemes/ jrnold https://github.com/jrnold
ggspectra ‘ggspectra’ extends ‘ggplot2’ with stats, geoms and annotations for plotting light spectra. visualization,general https://github.com/aphalo/ggspectra/ aphalo https://github.com/aphalo
ggstatsplot ‘ggstatsplot’ provides a collection of functions to enhance ‘ggplot2’ plots with results from statistical tests. visualization,statistics https://github.com/IndrajeetPatil/ggstatsplot/ IndrajeetPatil https://github.com/IndrajeetPatil
ggnetwork The ggnetwork package provides a way to build network plots with ggplot2. visualization,general https://github.com/briatte/ggnetwork/ briatte https://github.com/briatte
ggtech ggplot2 tech themes, scales, and geoms. visualization,general,themes https://github.com/ricardo-bion/ggtech/ ricardo-bion https://github.com/ricardo-bion
ggradar ggradar allows you to build radar charts with ggplot2. visualization,general https://github.com/ricardo-bion/ggradar/ ricardo-bion https://github.com/ricardo-bion
ggx A Natural Language Interface to ‘ggplot2’. visualization,nlp https://github.com/brandmaier/ggx/ brandmaier https://github.com/brandmaier
ggTimeSeries This R package offers novel time series visualisations. visualization,general https://github.com/Ather-Energy/ggTimeSeries/ Ather-Energy https://github.com/Ather-Energy
ggtree ggtree is designed for visualizing phylogenetic tree and different types of associated annotation data. visualization,general https://github.com/GuangchuangYu/ggtree/ GuangchuangYu https://github.com/GuangchuangYu
ggseas Seasonal adjustment on the fly extension for ggplot2. visualization,general https://github.com/ellisp/ggseas/ ellisp https://github.com/ellisp
ggsci A collection of ‘ggplot2’ color palettes inspired by scientific journals and science fiction TV shows. visualization,general https://github.com/road2stat/ggsci/ road2stat https://github.com/road2stat
ggmosaic ggmosaic implements mosaic plots in ‘ggplot2’ via geom_mosaic. visualization,general https://github.com/haleyjeppson/ggmosaic/ haleyjeppson https://github.com/haleyjeppson
survminer Drawing Survival Curves using ‘ggplot2’ visualization,survival https://github.com/kassambara/survminer/ kassambara https://github.com/kassambara
ggeasy Easy Access to ‘ggplot2’ Commands visualization,teaching https://github.com/jonocarroll/ggeasy/ jonocarroll https://github.com/jonocarroll
ggside Side Grammar Graphics visualization,correlation https://github.com/jtlandis/ggside/ jtlandis https://github.com/jtlandis
ggcorrplot Visualization of a correlation matrix using ‘ggplot2’ visualization,correlation https://github.com/kassambara/ggcorrplot/ kassambara https://github.com/kassambara
ggpubr ‘ggplot2’ Based Publication Ready Plots visualization,statistics https://github.com/kassambara/ggpubr/ kassambara https://github.com/kassambara
ggthemr Themes for ggplot visualization,general,themes https://github.com/cttobin/ggthemr/ cttobin https://github.com/cttobin
GGally ggally extends ‘ggplot2’ by adding several functions to reduce the complexity of combining geometric objects with transformed data. visualization,general https://github.com/ggobi/ggally/ ggobi https://github.com/ggobi
ggseqlogo Publication-ready sequence logos using ggplot2. visualization,general https://github.com/omarwagih/ggseqlogo/ omarwagih https://github.com/omarwagih
ggChernoff Visualise multivariate data using human faces visualization https://github.com/Selbosh/ggChernoff/ Selbosh https://github.com/Selbosh
ggridges Ridgeline plot geoms for ‘ggplot2’ visualization,general https://github.com/clauswilke/ggridges/ clauswilke https://github.com/clauswilke
lemon Repositioning legends and adding brackets to axes in ‘ggplot2’. visualization,brackets,axis https://github.com/stefanedwards/lemon/ stefanedwards https://github.com/stefanedwards
cowplot Streamlined plot theme and plot annotations for ‘ggplot2’ visualization,general,themes https://github.com/wilkelab/cowplot/ clauswilke https://github.com/clauswilke
qqplotr Quantile-quantile and probability-probability plot extensions for ‘ggplot2’ quantile-quantile,probability-probability https://github.com/aloy/qqplotr/ almeidaxan https://github.com/almeidaxan
ggalluvial A ‘ggplot2’ extension for alluvial diagrams. visualization,categorical,time series https://github.com/corybrunson/ggalluvial/ corybrunson https://github.com/corybrunson
patchwork Easy composition of ggplot plots using arithmetic operators visualization,composition https://github.com/thomasp85/patchwork/ thomasp85 https://github.com/thomasp85
ggquiver Quiver/velocity plots for ‘ggplot2’. visualization,quiver,velocity,vector https://github.com/mitchelloharawild/ggquiver/ mitchelloharawild https://github.com/mitchelloharawild
ggsignif Significance Brackets for ‘ggplot2’. visualization,multiple comparisons https://github.com/const-ae/ggsignif/ const-ae and IndrajeetPatil https://github.com/const-ae and https://github.com/IndrajeetPatil
ggdag Causal directed acyclic graphs (DAGs) in `ggplot2` visualization,dags,inference https://github.com/malcolmbarrett/ggdag/ malcolmbarrett https://github.com/malcolmbarrett
ggformula `ggplot2` via formulas and pipes visualization,general,interface https://github.com/ProjectMOSAIC/ggformula/ rpruim https://github.com/rpruim
ggbeeswarm Create beeswarm plots, which avoids overlapping datapoints. visualization, beeswarm, categorical https://github.com/eclarke/ggbeeswarm/ Erik Clarke and Scott Sherrill-Mix https://github.com/eclarke
ggperiodic Automagically augment periodic data in `ggplot2` visualization,periodic https://github.com/eliocamp/ggperiodic/ eliocamp https://github.com/eliocamp
ggpol ggpol adds parliament diagrams and several other geoms to ggplot2. visualization,general https://github.com/erocoar/ggpol/ erocoar https://github.com/erocoar
ggpirate Pirate plots for `ggplot2` visualization https://github.com/mikabr/ggpirate/ mikabr https://github.com/mikabr
esquisse Explore and Visualize Your Data Interactively with `ggplot2` visualization,interface https://github.com/dreamrs/esquisse/ dreamrs https://github.com/dreamrs
ggdark Dark Mode for `ggplot2` Themes visualization,general,themes https://github.com/nsgrantham/ggdark/ nsgrantham https://github.com/nsgrantham
sugrrants Supporting Graphs for Analysing Temporal Data with `ggplot2` visualization,calendar,time-series https://github.com/earowang/sugrrants/ earowang https://github.com/earowang
tvthemes `ggplot2` Themes & Palettes from popular TV shows! visualization,general,palettes,themes https://github.com/Ryo-N7/tvthemes/ Ryo-N7 https://github.com/Ryo-N7
ggfittext `ggplot2` geoms to fit text in a box visualization,general,text https://github.com/wilkox/ggfittext/ wilkox https://github.com/wilkox
ggparty `ggplot2` visualizations for the `partykit` package visualization,tree,partykit https://github.com/martin-borkovec/ggparty/ martin-borkovec https://github.com/martin-borkovec
gggenes `ggplot2` geoms to draw gene arrow maps visualization,general,genetics https://github.com/wilkox/gggenes/ wilkox https://github.com/wilkox
gggenomes a grammar of graphics for comparative genomics visualization,genetics,genomics https://github.com/thackl/gggenomes/ thackl https://github.com/thackl
treemapify Draw treemaps in `ggplot2` visualization,general,treemap https://github.com/wilkox/treemapify/ wilkox https://github.com/wilkox
lindia Create diagnostics plots for linear regression visualization,general,diagnostics,regression https://github.com/yeukyul/lindia/ yeukyul https://github.com/yeukyul
gghalves gghalves adds half-geoms to `ggplot2`. visualization,general https://github.com/erocoar/gghalves/ erocoar https://github.com/erocoar
ggrastr Rasterize only specific layers of your plot visualization,raster https://github.com/vpetukhov/ggrastr/ vpetukhov https://github.com/vpetukhov
ggpointdensity Introduces `geom_pointdensity()`: A cross between a scatter plot and a 2D density plot. visualization,general https://github.com/LKremer/ggpointdensity/ LKremer https://github.com/LKremer
ggsom The aim of this package is to offer more variability of graphics based on the self-organizing maps. visualization,SOM,multi-dimensional,parallel-coordinates https://github.com/oldlipe/ggsom/ oldlipe https://github.com/oldlipe
ggnewscale Use multiple fill and colour scales in ‘ggplot2’. visualization,general,scales https://github.com/eliocamp/ggnewscale/ eliocamp https://github.com/eliocamp
ggh4x Options for tailored facets, multiple colourscales and miscellaneous visualization,general,scales,facets https://github.com/teunbrand/ggh4x/ teunbrand https://github.com/teunbrand
ggcharts Shorten the distance from data visualization idea to actual plot visualization,general https://github.com/thomas-neitmann/ggcharts/ thomas-neitmann https://github.com/thomas-neitmann
humapr Visualise topographic human data with choropleths visualization,general,tabulation,choropleth https://github.com/benskov/humapr/ benskov https://github.com/benskov
ggshadow Draw a shadow below lines to make busy plots more aesthetically pleasing visualization,general https://github.com/marcmenem/ggshadow/ marcmenem https://github.com/marcmenem
ggseg Draw polygons of brain atlas segmentations visualization,brain imaging https://github.com/LCBC-UiO/ggseg/ Athanasiamo https://github.com/Athanasiamo
mdthemes ‘ggplot2’ themes that render text as markdown/HTML visualization,themes https://github.com/thomas-neitmann/mdthemes/ thomas-neitmann https://github.com/thomas-neitmann
ggwordcloud A word cloud text geom for ‘ggplot2’. visualization,text https://github.com/lepennec/ggwordcloud/ lepennec https://github.com/lepennec
ggasym Asymmetric matrix plotting with multiple scales. visualization,multi-dimensional,matrix,scales https://github.com/jhrcook/ggasym/ jhrcook https://github.com/jhrcook
gglorenz Plotting Lorenz curves with the blessing of ggplot2. visualization,general,statistics https://github.com/jjchern/gglorenz/ jjchern https://github.com/jjchern
hrbrthemes A compilation of extra {ggplot2} themes, scales and utilities, including a spell check function for plot label fields and an overall emphasis on typography. theme,typography https://github.com/hrbrmstr/hrbrthemes/ hrbrmstr https://github.com/hrbrmstr
ggpattern Pattern fills for ggplot2 geoms. visualization,pattern https://github.com/coolbutuseless/ggpattern/ coolbutuseless https://github.com/coolbutuseless
ggtext Improved text rendering support for `ggplot2` general,theme,typography https://github.com/clauswilke/ggtext/ Claus Wilke https://github.com/clauswilke
calendR Ready to Print Monthly and Yearly Calendars visualization, calendar, time-series https://github.com/R-CoderDotCom/calendR/ R-CoderDotCom https://github.com/R-CoderDotCom
ggip Data visualization of IP addresses and networks visualization, cyber, space-filling curves https://github.com/davidchall/ggip/ davidchall https://github.com/davidchall
gglm Grammar of Graphics for linear model diagnostic plots. visualization,modeling,diagnostic https://github.com/graysonwhite/gglm/ graysonwhite https://github.com/graysonwhite
econocharts Microeconomics and Macroeconomics Charts economics, microeconomics, macroeconomics https://github.com/R-CoderDotCom/econocharts/ R-CoderDotCom https://github.com/R-CoderDotCom
ComplexUpset Visualize set intersections and add ggplot2 annotations visualization,venn,set,intersections,venn-diagram,upset https://github.com/krassowski/complex-upset/ krassowski https://github.com/krassowski
ggchromatic Colourspace Scales for ‘ggplot2’ visualization,scales https://github.com/teunbrand/ggchromatic/ teunbrand https://github.com/teunbrand
ggheatmap ggplot2 version of heatmap visualization, heatmap https://github.com/XiaoLuo-boy/ggheatmap/ XiaoLuo-boy https://github.com/XiaoLuo-boy
see Visualisation Toolbox for ‘easystats’ and Extra Geoms, Themes and Color Palettes for ‘ggplot2’ visualizations,statistics https://github.com/easystats/see/ easystats https://github.com/easystats
directlabels Framework for adding direct labels to lattice or ggplot2 plots. visualization, direct-labels, positioning, general, plot-labelling https://github.com/tdhock/directlabels/ tdhock https://github.com/tdhock
ggHoriPlot Horizon Plots for ggplot2 visualization,general,horizon-plot,time-series https://github.com/rivasiker/ggHoriPlot/ rivasiker https://github.com/rivasiker
ggtrace Outline groups of data points using ggplot2 visualization https://github.com/rnabioco/ggtrace/ sheridar https://github.com/sheridar
ggESDA Exploratory Symbolic Data Analysis with ‘ggplot2’. visualization,symbolic data,interval-valued data https://github.com/kiangkiangkiang/ggESDA/ kiangkiangkiang https://github.com/kiangkiangkiang
geomtextpath Create curved text and directly label lines in ggplot typography,plot-labelling,visualization https://github.com/AllanCameron/geomtextpath/ AllanCameron https://github.com/AllanCameron
ggdensity Interpretable bivariate density visualization with highest density regions visualization,density-estimation https://github.com/jamesotto852/ggdensity/ jamesotto852 https://github.com/jamesotto852
ggtranscript Visualizing transcript structure and annotation using ggplot2 visualization,genetics,genomics,transcripts,annotation https://github.com/dzhang32/ggtranscript/ dzhang32 https://github.com/dzhang32
piecepackr Board game graphics board games, geoms https://github.com/piecepackr/piecepackr/ trevorld https://github.com/trevorld
oblicubes 3D Rendering Using Obliquely Projected Cubes and Cuboids visualization, geoms https://github.com/trevorld/oblicubes/ trevorld https://github.com/trevorld
ggDoubleHeat A heatmap-like visualization tool visualization, geoms https://github.com/PursuitOfDataScience/ggDoubleHeat/ PursuitOfDataScience https://github.com/PursuitOfDataScience
nflplotR ‘nflplotR’ provides a set of functions to visualize National Football League analysis in ‘ggplot2’. general,scales,geoms,images,theme,elements https://github.com/nflverse/nflplotR/ mrcaseb https://github.com/mrcaseb
ggbraid Braid ribbons in ggplot2. visualization,general,geoms https://github.com/nsgrantham/ggbraid/ nsgrantham https://github.com/nsgrantham
ggblanket Simplify ggplot2 visualisation visualization https://github.com/davidhodge931/ggblanket/ davidhodge931 https://github.com/davidhodge931
ggpie Create pie and donut plot using ggplot2. visualization,general,pie,donut,rose pie https://github.com/showteeth/ggpie/ showteeth https://github.com/showteeth
ggstar Multiple Geometric Shape Point Layer for ‘ggplot2’ visualization, different shape points https://github.com/xiangpin/ggstar/ xiangpin https://github.com/xiangpin
ggarchery Flexible segment geoms with arrows for ‘ggplot2’ visualization, arrows https://github.com/mdhall272/ggarchery/ mdhall272 https://github.com/mdhall272
tidyterra ‘ggplot2’ geoms for ‘terra’ rasters and vectors visualization, raster, spatial https://github.com/dieghernan/tidyterra/ dieghernan https://github.com/dieghernan
ggseqplot ‘ggseqplot’ renders sequence plots using ggplot2. visualization,sequence analysis https://github.com/maraab23/ggseqplot/ maraab23 https://github.com/maraab23
ggsurvfit Flexible Time-to-Event Figures visualization,survival,statistics https://github.com/ddsjoberg/ggsurvfit/ ddsjoberg https://github.com/ddsjoberg
ggsector Create sector plots using ggplot2. visualization, geoms, sector, fan https://github.com/yanpd01/ggsector/ yanpd01 https://github.com/yanpd01
ggterror Create T-errorbars like in THAT paper visualization, geoms https://github.com/mivalek/ggterror/ mivalek https://github.com/mivalek

References:

1.https://exts.ggplot2.tidyverse.org/gallery/

Genomics: MAF file analysis and visualization with maftools https://evvail.com/2023/03/06/2856.html Mon, 06 Mar 2023 04:03:00 +0000

The post "Genomics: MAF file analysis and visualization with maftools" first appeared on Omics - Hunter.
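The MAF (Mutation Annotation Format) file format is widely used to store detected somatic variants. TCGA has sequenced more than 30 different cancer types, with over 200 samples per cancer type, and the resulting somatic variant data are stored in MAF format. maftools aims to summarize, analyze, annotate, and visualize MAF files efficiently, whether they come from TCGA or from other genomic studies.

1. Installing maftools

# Install BiocManager first if it is missing
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# Install from Bioconductor
BiocManager::install("maftools")

# Install from GitHub
BiocManager::install("PoisonAlien/maftools")

2. Preparing the MAF file

How a MAF file is generated depends on the annotation software, because different annotators produce slightly different annotated VCF output.

1) If you annotate with VEP, use vcf2maf to generate the MAF file.

2) If you annotate with GATK Funcotator, pass --output-file-format MAF to write a MAF file directly.

3) If you annotate with ANNOVAR, use the maftools function annovarToMaf to generate the MAF file.

Common file formats, data portals, and annotation tools: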

File formats | Data portals | Annotation tools
Mutation Annotation Format (MAF) | TCGA | vcf2maf – for converting your VCF files to MAF
Variant Call Format (VCF) | ICGC | Ensembl Variant Effect Predictor (VEP)
ICGC Simple Somatic Mutation Format | Broad Firehose | Annovar
- | cBioPortal | Funcotator
- | CIViC – Clinical interpretation of variants in cancer | -
- | DGIdb – Information on drug-gene interactions and the druggable genome | -

3. Preparing maftools input files

read.maf(
  maf,
  clinicalData = NULL,
  rmFlags = FALSE,
  removeDuplicatedVariants = TRUE,
  useAll = TRUE,
  gisticAllLesionsFile = NULL,
  gisticAmpGenesFile = NULL,
  gisticDelGenesFile = NULL,
  gisticScoresFile = NULL,
  cnLevel = "all",
  cnTable = NULL,
  isTCGA = FALSE,
  vc_nonSyn = NULL,
  verbose = TRUE
)

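1) The MAF file described above (gzip-compressed MAF files are accepted; required).

2) Clinical data associated with each Sample/Tumor_Sample_Barcode in the MAF (TSV format; optional but recommended, because later visualizations may use these annotations).

3) Copy number data, if available. This can be GISTIC output or a table containing sample names, gene names, and copy number status (Amp or Del).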
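Before handing a MAF file to read.maf, it can help to confirm that the header carries the mandatory columns. The sketch below is a plain shell check, not part of maftools: the helper name check_maf_columns is hypothetical, the column list follows the mandatory fields named in the maftools documentation, and the function assumes an uncompressed, tab-separated file.

```shell
# Mandatory MAF columns according to the maftools documentation.
REQUIRED_COLS="Hugo_Symbol Chromosome Start_Position End_Position Reference_Allele Tumor_Seq_Allele2 Variant_Classification Variant_Type Tumor_Sample_Barcode"

# check_maf_columns FILE (hypothetical helper, not part of maftools)
# Prints every mandatory column missing from the header; returns 1 if any is missing.
check_maf_columns() {
    local maf="$1" header col missing=0
    # The header is the first line that does not start with '#'.
    header=$(grep -v '^#' "$maf" | head -n 1)
    for col in $REQUIRED_COLS; do
        # Split the header on tabs and look for an exact, literal match.
        if ! printf '%s\n' "$header" | tr '\t' '\n' | grep -qxF "$col"; then
            echo "missing: $col"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (file name is a placeholder):
#   check_maf_columns my_cohort.maf && echo "header looks complete"
```

For a gzip-compressed MAF, decompress it to a temporary file first; the check itself only reads the header line.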

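4. A quick demonstration

This section briefly demonstrates the visualization features of maftools; see the official documentation (recommended) for details.

library(maftools)

laml = read.maf(maf = system.file('extdata', 'tcga_laml.maf.gz', package = 'maftools'),
                clinicalData = system.file('extdata', 'tcga_laml_annot.tsv', package = 'maftools'))

# Show the per-sample summary
getSampleSummary(laml)
# Show the per-gene summary
getGeneSummary(laml)
# Show the clinical data associated with the samples
getClinicalData(laml)
# Show all fields available in the MAF
getFields(laml)

# Plot a summary of the MAF
plotmafSummary(maf = laml, rmOutlier = TRUE, addStat = 'median', dashboard = TRUE, titvRaw = FALSE)

Draw an oncoplot, a heatmap-style visualization commonly used in genomics: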

oncoplot(maf = laml, top = 10)

That concludes this brief introduction to maftools. Below are some commonly used tools recommended by the maftools authors:

  • TRONCO – Repository of the TRanslational ONCOlogy library (R)
  • dndscv – dN/dS methods to quantify selection in cancer and somatic evolution (R)
  • clonevol – Inferring and visualizing clonal evolution in multi-sample cancer sequencing (R)
  • sigminer – Primarily for signature analysis and visualization in R. Supports maftools output (R)
  • GenVisR – Primarily for visualization (R)
  • comut – Primarily for visualization (Python)
  • TCGAmutations – pre-compiled curated somatic mutations from TCGA cohorts (from Broad Firehose and TCGA MC3 Project) that can be loaded into maftools (R)
  • somaticfreq – rapid genotyping of known somatic hotspot variants from the tumor BAM files. Generates a browsable/sharable HTML report. (C)

References:

1.https://bioconductor.org/packages/release/bioc/vignettes/maftools/inst/doc/maftools.html

2.https://github.com/PoisonAlien/maftools

3.Mayakonda A, Lin DC, Assenov Y, Plass C, Koeffler HP. 2018. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Research. PMID: 30341162

Multi-omics data mining with the R package MOVICS https://evvail.com/2022/11/29/2838.html Tue, 29 Nov 2022 09:39:00 +0000

The post "Multi-omics data mining with the R package MOVICS" first appeared on Omics - Hunter.
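MOVICS is an R package that integrates multi-omics data analysis and visualization. Its workflow covers many aspects of multi-omics analysis and is organized into three modules:

  • GET Module: identify subtypes and classes in multi-omics data through clustering
  • COMP Module: compare the resulting subtypes along multiple dimensions
  • RUN Module: validate the subtypes and mine markers

See the figure below for details: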

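1) Installing the R package

First make sure your R version is > 4.0.1 and that BiocManager is at v3.11 before installing; a version mismatch can cause the installation to fail. MOVICS also depends on many packages: CIMLR, ClassDiscovery, ConsensusClusterPlus, IntNMF, PINSPlus, SNFtool, coca, dplyr, ggplot2, iClusterPlus, mogsa, vegan, circlize, survival, survminer, ggpp, tibble, limma, DESeq2, edgeR, aricode, ggalluvial, flexclust, reshape2, clusterProfiler, GSVA, grid, cowplot, jstable, impute, CMScaller, car, genefilter, ggpubr, preprocessCore, ridge, sva, grDevices, maftools, patchwork, ComplexHeatmap (>= 2.5.5), pamr, clusterRepro, officer. Confirm that all dependencies are installed, then install MOVICS:

# Use devtools to install the R package hosted on GitHub

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
if (!require("devtools"))
    install.packages("devtools")
# The package is large, so the download takes a while
devtools::install_github("xlucpu/MOVICS")

Several packages that are hard to download have been uploaded to a cloud drive for convenience: cloud drive download

After downloading, install each one with a command like the following: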

install.packages("D:/aricode_1.0.1.tar.gz", repos = NULL, type = "source")

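2) Example walkthrough

We use the data from the official documentation for this demonstration.

library(MOVICS)

# Demo data provided by the package authors
load(system.file("extdata", "brca.tcga.RData", package = "MOVICS", mustWork = TRUE))
load(system.file("extdata", "brca.yau.RData",  package = "MOVICS", mustWork = TRUE))

# Extract the data used in the downstream analysis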
mo.data   <- brca.tcga[1:4]

count     <- brca.tcga$count

fpkm      <- brca.tcga$fpkm

maf       <- brca.tcga$maf

segment   <- brca.tcga$segment

surv.info <- brca.tcga$clin.info

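Determine the optimal number of clusters. The getClustNum function uses the clustering prediction index (CPI) and Gap statistics to estimate it:

optk.brca <- getClustNum(data        = mo.data,
                         is.binary   = c(F,F,F,T), # the 4th dataset is somatic mutation data, a binary matrix
                         try.N.clust = 2:8,        # try 2 through 8 clusters; adjust as needed
                         fig.name    = "CLUSTER NUMBER OF TCGA-BRCA")

Judging from the figure, 5 clusters is optimal (you can also take the biological context of your specific case into account when choosing).

MOVICS provides several clustering algorithms and flexible ways to call them, as follows:

# Suppose we want the iClusterBayes algorithm; it can be called directly like this: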
iClusterBayes.res <- getiClusterBayes(data        = mo.data,
                                      N.clust     = 5,
                                      type        = c("gaussian","gaussian","gaussian","binomial"),
                                      n.burnin    = 1800,
                                      n.draw      = 1200,
                                      prior.gamma = c(0.5, 0.5, 0.5, 0.5),
                                      sdev        = 0.05,
                                      thin        = 3)

# Alternatively, call getMOIC and name the algorithm through methodslist:
iClusterBayes.res <- getMOIC(data        = mo.data,
                             N.clust     = 5,
                             methodslist = "iClusterBayes", # specify only ONE algorithm here
                             type        = c("gaussian","gaussian","gaussian","binomial"), # data type corresponding to the list
                             n.burnin    = 1800,
                             n.draw      = 1200,
                             prior.gamma = c(0.5, 0.5, 0.5, 0.5),
                             sdev        = 0.05,
                             thin        = 3)

You can also run several clustering algorithms in a single call, although this takes a long time:

moic.res.list <- getMOIC(data        = mo.data,
                         methodslist = list("SNF", "PINSPlus", "NEMO", "COCA", "LRAcluster", "ConsensusClustering", "IntNMF", "CIMLR", "MoCluster"),
                         N.clust     = 5,
                         type        = c("gaussian", "gaussian", "gaussian", "binomial"))

Next, use getConsensusMOIC to obtain a consensus heatmap of the clustering results:

cmoic.brca <- getConsensusMOIC(moic.res.list = moic.res.list,
                               fig.name      = "CONSENSUS HEATMAP",
                               distance      = "euclidean",
                               linkage       = "average")

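Then we assess the clustering quality quantitatively with the silhouette, which measures how similar each sample is to its own cluster compared with the other clusters: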

getSilhouette(sil      = cmoic.brca$sil, # a sil object returned by getConsensusMOIC()
              fig.path = getwd(),
              fig.name = "SILHOUETTE",
              height   = 5.5,
              width    = 5)

Finally, we draw a heatmap based on the clustering results:

# Convert the methylation beta values to M values
indata <- mo.data
indata$meth.beta <- log2(indata$meth.beta / (1 - indata$meth.beta))

# Standardize the data
plotdata <- getStdiz(data       = indata,
                     halfwidth  = c(2,2,2,NA), # no truncation for mutation
                     centerFlag = c(T,T,T,F), # no center for mutation
                     scaleFlag  = c(T,T,T,F)) # no scale for mutation
# Select the top 10 features of each omics layer
feat   <- iClusterBayes.res$feat.res
feat1  <- feat[which(feat$dataset == "mRNA.expr"),][1:10,"feature"]
feat2  <- feat[which(feat$dataset == "lncRNA.expr"),][1:10,"feature"]
feat3  <- feat[which(feat$dataset == "meth.beta"),][1:10,"feature"]
feat4  <- feat[which(feat$dataset == "mut.status"),][1:10,"feature"]
annRow <- list(feat1, feat2, feat3, feat4)

# Define the color scheme for each omics layer
mRNA.col   <- c("#00FF00", "#008000", "#000000", "#800000", "#FF0000")
lncRNA.col <- c("#6699CC", "white"  , "#FF3C38")
meth.col   <- c("#0074FE", "#96EBF9", "#FEE900", "#F00003")
mut.col    <- c("grey90" , "black")
col.list   <- list(mRNA.col, lncRNA.col, meth.col, mut.col)

# Draw the heatmap (slow)
getMoHeatmap(data          = plotdata,
             row.title     = c("mRNA","lncRNA","Methylation","Mutation"),
             is.binary     = c(F,F,F,T), # the 4th data is mutation which is binary
             legend.name   = c("mRNA.FPKM","lncRNA.FPKM","M value","Mutated"),
             clust.res     = iClusterBayes.res$clust.res, # cluster results
             clust.dend    = NULL, # no dendrogram
             show.rownames = c(F,F,F,F), # specify for each omics data
             show.colnames = FALSE, # show no sample names
             annRow        = annRow, # mark selected features
             color         = col.list,
             annCol        = NULL, # no annotation for samples
             annColors     = NULL, # no annotation color
             width         = 10, # width of each subheatmap
             height        = 5, # height of each subheatmap
             fig.name      = "COMPREHENSIVE HEATMAP OF ICLUSTERBAYES")
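The beta-to-M-value conversion at the top of the heatmap code can be written out explicitly. It is the base-2 logit transform; the symbols β and M below are only notation for the values in meth.beta before and after the conversion, not names used by MOVICS.

```latex
M = \log_2\!\left(\frac{\beta}{1-\beta}\right),
\qquad
\beta = \frac{2^{M}}{2^{M}+1},
\qquad 0 < \beta < 1
```

For example, β = 0.8 gives M = log2(0.8 / 0.2) = log2(4) = 2, and β = 0.5 gives M = 0. A beta value of exactly 0 or 1 yields an infinite M value, so such entries need handling (for instance a small offset) before the transform.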

That concludes this introduction to MOVICS. The package provides many more visualization methods; see the references for details.

References:

1.https://github.com/xlucpu/MOVICS

2.https://xlucpu.github.io/MOVICS/MOVICS-VIGNETTE.html

3.Lu, X., Meng, J., Zhou, Y., Jiang, L., and Yan, F. (2020). MOVICS: an R package for multi-omics integration and visualization in cancer subtyping. Bioinformatics, btaa1018.

Fusion gene analysis with STAR-Fusion https://evvail.com/2022/10/05/2822.html Wed, 05 Oct 2022 05:27:00 +0000

The post "Fusion gene analysis with STAR-Fusion" first appeared on Omics - Hunter.
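Breakage and rearrangement of certain important genes can play a major role in the onset and progression of disease, and sequencing is now routinely used to predict candidate fusion events. This post gives a brief introduction and usage guide to the widely used tool STAR-Fusion (published in Genome Biology).

The figure below shows a simplified STAR-Fusion analysis workflow:

First, install STAR-Fusion. Most installation problems come from a mismatch between the STAR version and the STAR-Fusion version. Below is the official compatibility table (follow it strictly when installing):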

STAR-Fusion version | Compatible STAR version | CTAT genome lib
STAR-Fusion v1.10.0 | STAR v2.7.8a | CTAT genome lib StarFv1.10
STAR-Fusion v1.9.0 | STAR v2.7.2b | CTAT genome lib StarFv1.9
STAR-Fusion v1.8.0 | STAR v2.7.2b | CTAT genome lib StarFv1.8
STAR-Fusion v1.7.0 | STAR v2.7.2a | CTAT genome lib StarFv1.7
STAR-Fusion v1.6.0 | STAR v2.7.0f | https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/__genome_libs_StarFv1.6/
STAR-Fusion v1.5.0 | STAR v2.6.1a | https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/__genome_libs_StarFv1.3/
STAR-Fusion v1.4.0 | STAR v2.6.0a | https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/__genome_libs_StarFv1.3/
STAR-Fusion v1.3.2 | STAR v2.6.0a | https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/__genome_libs_StarFv1.3/
STAR-Fusion v1.2.0 | STAR v2.5.3a | https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/__genome_libs_pre-StarFv1.3/
STAR-Fusion v1.1.0 | STAR v2.5.3a | https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/__genome_libs_pre-StarFv1.3/
STAR-Fusion v1.0.0 | STAR v2.5.2a | https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/__genome_libs_pre-StarFv1.3/
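Since version mismatches are the main source of trouble, the table above can be turned into a small lookup for use in a setup script. This is only a sketch: expected_star_version is a hypothetical helper name, and the pairs are copied from the table, so it knows nothing about releases newer than v1.10.0.

```shell
# expected_star_version STAR_FUSION_VERSION (hypothetical helper)
# Prints the STAR version paired with a STAR-Fusion release in the table above;
# prints "unknown" and returns 1 for releases the table does not list.
expected_star_version() {
    case "$1" in
        v1.10.0)        echo "2.7.8a" ;;
        v1.9.0|v1.8.0)  echo "2.7.2b" ;;
        v1.7.0)         echo "2.7.2a" ;;
        v1.6.0)         echo "2.7.0f" ;;
        v1.5.0)         echo "2.6.1a" ;;
        v1.4.0|v1.3.2)  echo "2.6.0a" ;;
        v1.2.0|v1.1.0)  echo "2.5.3a" ;;
        v1.0.0)         echo "2.5.2a" ;;
        *)              echo "unknown"; return 1 ;;
    esac
}

# Usage, assuming STAR is on PATH and prints its bare version string:
#   [ "$(STAR --version)" = "$(expected_star_version v1.10.0)" ] || echo "STAR version mismatch"
```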

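Deploying through conda or Docker is recommended (the STAR-Fusion and STAR versions installed by conda are sometimes mismatched; consult the table above and swap versions if needed). You can also deploy the latest version manually:

# Docker is probably the most convenient option
docker pull trinityctat/starfusion
# Start an interactive shell in the container, mounting the current directory at /data
docker run --rm -it -v "$(pwd)":/data trinityctat/starfusion:latest bash


# Manual deployment: clone the repository; --recursive makes sure the complete code base is fetched
git clone --recursive https://github.com/STAR-Fusion/STAR-Fusion.git

Install the required software:

Because STAR-Fusion includes a large number of Perl scripts, the following Perl modules must be installed: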

perl -MCPAN -e shell
   install DB_File
   install URI::Escape
   install Set::IntervalTree
   install Carp::Assert
   install JSON::XS
   install PerlIO::gzip

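Download the genome library (https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/):

The ’plug-n-play’ version ships with prebuilt indexes and works out of the box, but the file is large and takes time to download. The other option is the source version, about 4 GB, which requires you to build the indexes and the necessary library files yourself after downloading. I personally recommend the source version.

# After downloading, build the index database (the plug-n-play archive is large mainly because of the STAR index)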
tar xvf CTAT_resource_lib.tar.gz

cd CTAT_resource_lib/

$STAR_FUSION_HOME/ctat-genome-lib-builder/prep_genome_lib.pl \
                         --genome_fa ref_genome.fa \
                         --gtf gencode.*.annotation.gtf \
                         --fusion_annot_lib fusion_lib.*.dat.gz \
                         --annot_filter_rule AnnotFilterRule.pm \
                         --pfam_db current \
                         --dfam_db human \
                         --human_gencode_filter

Next, we can run the fusion analysis directly:

# Run the analysis: --left_fq and --right_fq take the paired-end clean reads; for single-end data, --left_fq alone is enough
# The command below starts from FASTQ files, paired-end
STAR-Fusion \
       --left_fq rnaseq_1.fastq.gz \
       --right_fq rnaseq_2.fastq.gz \
       --genome_lib_dir ctat_genome_lib_build_dir \
       --output_dir STAR-Fusion_outdir
# Single-end
STAR-Fusion \
       --left_fq rnaseq.fastq.gz \
       --genome_lib_dir ctat_genome_lib_build_dir \
       --output_dir STAR-Fusion_outdir

# If STAR has already been run with the chimeric options and produced Chimeric.out.junction, pass that file with -J to skip the alignment and finish faster
STAR-Fusion \
       -J /path/Chimeric.out.junction \
       --genome_lib_dir ctat_genome_lib_build_dir \
       --output_dir STAR-Fusion_outdir

Results are written to the STAR-Fusion_outdir folder (the two files below are described in detail at https://github.com/STAR-Fusion/STAR-Fusion/wiki#Outputs, so the details are not repeated here):

star-fusion.fusion_predictions.tsv
star-fusion.fusion_predictions.abridged.tsv
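A first pass over the abridged predictions is often a simple support filter. The sketch below keeps fusions with at least a given number of junction reads and spanning fragments. It assumes the header row names the columns #FusionName, JunctionReadCount and SpanningFragCount; filter_fusions is a hypothetical helper, and the thresholds in the usage line are arbitrary examples rather than recommended cutoffs.

```shell
# filter_fusions FILE MIN_JUNCTION MIN_SPANNING (hypothetical helper)
# Prints fusion name, junction read count and spanning fragment count for every
# row that meets both minimums. Columns are located by header name, not position.
filter_fusions() {
    awk -F'\t' -v minj="$2" -v mins="$3" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "#FusionName")       n = i
                if ($i == "JunctionReadCount") j = i
                if ($i == "SpanningFragCount") s = i
            }
            next
        }
        n && j && s && $j >= minj && $s >= mins {
            print $n "\t" $j "\t" $s
        }
    ' "$1"
}

# Usage:
#   filter_fusions STAR-Fusion_outdir/star-fusion.fusion_predictions.abridged.tsv 3 1
```

Looking columns up by name keeps the filter working if the column order differs between STAR-Fusion releases; if any of the three names is absent, the function prints nothing.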

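Afterwards, FusionInspector can be used to inspect and validate the predicted fusions (via the --FusionInspector option). You can also examine the effect of each fusion on coding regions (--examine_coding_effect) or reconstruct the fusion transcripts de novo with Trinity (--denovo_reconstruct).

# All-in-one command
# Note: this step is slow, roughly 5 hours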
STAR-Fusion \
     --left_fq rnaseq_1.fastq.gz \
     --right_fq rnaseq_2.fastq.gz \
     --genome_lib_dir ctat_genome_lib_build_dir \
     --FusionInspector validate \
     --denovo_reconstruct \
     --examine_coding_effect

That concludes this brief introduction to STAR-Fusion; see the links below for more information.

References:

1.https://github.com/STAR-Fusion/STAR-Fusion

2.https://github.com/STAR-Fusion/STAR-Fusion-Tutorial/wiki

rsync: a tool for data synchronization and resumable transfers https://evvail.com/2022/09/23/2816.html Fri, 23 Sep 2022 07:58:40 +0000

The post "rsync: a tool for data synchronization and resumable transfers" first appeared on Omics - Hunter.
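Rsync is an open-source, versatile tool for full and incremental data synchronization and backup, both locally and with remote hosts. It currently supports mainstream systems such as Linux, Windows, and macOS. Its main features: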

  • can update whole directory trees and filesystems
  • optionally preserves symbolic links, hard links, file ownership, permissions, devices and times
  • requires no special privileges to install
  • internal pipelining reduces latency for multiple files
  • can use rsh, ssh or direct sockets as the transport
  • supports anonymous rsync which is ideal for mirroring

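1) Installing rsync works like installing any other Linux software: you can compile it by following the official documentation (https://download.samba.org/pub/rsync/INSTALL). Windows users can build and run it under Cygwin, or use cwRsync, a prebuilt distribution for Windows:

cwrsync_6.2.5_x64_free.zip | 3.95 MB | SHA256 hash | PGP signature
cwrsync_6.2.4_x64_free.zip | 3.96 MB | SHA256 hash | PGP signature
cwrsync_5.5.0_x86_free.zip | 3.32 MB | SHA256 hash | PGP signature

2) Using rsync

Option reference: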

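-v, --verbose verbose output
-q, --quiet quiet mode
-c, --checksum enable checksum comparison: decide which files to transfer by checksum rather than by modification time and size
-a, --archive archive mode: transfer recursively and preserve most file attributes; equivalent to -rlptgoD
-r, --recursive recurse into subdirectories
-R, --relative use relative path names
-b, --backup make backups: when a file with the same name already exists at the destination, the old file is renamed to filename~ (use --suffix to choose a different backup suffix)
--backup-dir=DIR store the backup files in directory DIR
--suffix=SUFFIX set the backup suffix (default ~)
-u, --update update only: skip files that already exist in DST with a modification time newer than the source file (newer files are not overwritten)
-l, --links preserve symbolic links
-L, --copy-links treat symbolic links like regular files (copy what they point to)
--copy-unsafe-links copy only the targets of symbolic links that point outside the SRC tree
--safe-links ignore symbolic links that point outside the SRC tree
-H, --hard-links preserve hard links
-p, --perms preserve file permissions
-o, --owner preserve file owner
-g, --group preserve file group
-D preserve device files and special files (same as --devices --specials)
-t, --times preserve modification times
-S, --sparse handle sparse files specially to save space in DST
-n, --dry-run show which files would be transferred without transferring anything
-W, --whole-file copy whole files without the incremental (delta-transfer) check
-x, --one-file-system do not cross filesystem boundaries
-B, --block-size=SIZE block size used by the checksum algorithm (the 700-byte default applies to older versions; current versions derive it from the file size)
-e, --rsh=COMMAND remote shell to use for the transfer, for example rsh or ssh
--rsync-path=PATH path to the rsync command on the remote server
-C, --cvs-exclude auto-ignore files the same way CVS does, to exclude files you do not want transferred
--existing update only files that already exist in DST; do not create new files there
--delete delete files in DST that do not exist in SRC
--delete-excluded also delete excluded files on the receiving side
--delete-after delete after the transfer has finished
--ignore-errors delete even if I/O errors occur
--max-delete=NUM delete at most NUM files
--partial keep partially transferred files so that an interrupted transfer can be resumed
--force force deletion of directories even if they are not empty
--numeric-ids do not map numeric user and group IDs to user and group names
--timeout=TIME I/O timeout, in seconds
-I, --ignore-times do not skip files that match in size and modification time
--size-only when deciding whether to transfer a file, look only at its size and ignore its time
--modify-window=NUM timestamp window within which modification times count as equal (default 0)
-T, --temp-dir=DIR create temporary files in DIR
--compare-dest=DIR also compare against files in DIR when deciding whether a transfer is needed
-P same as --partial --progress
--progress show progress during the transfer
-z, --compress compress file data during the transfer
--exclude=PATTERN exclude files matching PATTERN
--include=PATTERN do not exclude files matching PATTERN
--exclude-from=FILE read exclude patterns from FILE
--include-from=FILE read include patterns from FILE
--version print version information
--address bind to a specific address
--config=FILE use an alternate configuration file instead of the default rsyncd.conf
--port=PORT use an alternate rsync service port
--blocking-io use blocking I/O for the remote shell
--stats print file-transfer statistics
--log-format=FORMAT set the log format
--password-file=FILE read the password from FILE
--bwlimit=KBPS limit I/O bandwidth, in KBytes per second
-h, --help show help (-h means --help only when it is the only option given)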

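We usually reach for scp to transfer data, but scp cannot resume an interrupted transfer, which is a real inconvenience when handling large datasets. The commands below show rsync in a few common scenarios:

# Scenario 1
# Push files to the server
rsync -avzP /Bigdata/ username@ip:/path/to/
# Download from the server
rsync -avzP username@ip:/path/to/ /Bigdata/

# Scenario 2
# Sync while excluding files with a given extension
rsync -avzP --exclude='*.txt' username@ip:/path/to/ /Bigdata/
# Multiple exclude patterns
rsync -avzP --exclude='*.txt' --exclude='*.html' username@ip:/path/to/ /Bigdata/
# Sync only specific files (exclude everything except .bam files).
# --include='*/' keeps subdirectories in play so that .bam files inside them are found; add -m to skip directories that end up empty
rsync -avzP --include='*/' --include='*.bam' --exclude='*' username@ip:/path/to/ /Bigdata/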
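Because -P keeps partially transferred files, rerunning the same rsync command picks up where the previous attempt stopped. On an unreliable link it can be convenient to wrap the command in a retry loop; the sketch below is a generic shell helper (retry is a hypothetical name, not an rsync feature), and the host and paths in the usage line are the same placeholders used above.

```shell
# retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...] (hypothetical helper)
# Runs COMMAND until it succeeds, waiting DELAY_SECONDS between attempts,
# and gives up with status 1 after MAX_TRIES failed attempts.
retry() {
    local max="$1" delay="$2" try=1
    shift 2
    until "$@"; do
        if [ "$try" -ge "$max" ]; then
            echo "giving up after $try attempts" >&2
            return 1
        fi
        try=$((try + 1))
        sleep "$delay"
    done
}

# Usage: up to 5 attempts, 30 seconds apart
#   retry 5 30 rsync -avzP username@ip:/path/to/ /Bigdata/
```

Each retry reruns the full command, so rsync rechecks the file list and transfers only what is still missing or incomplete.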

See the official documentation for more information.

References:

1.https://rsync.samba.org/

2.https://github.com/WayneD/rsync

NCBI Datasets usage guide https://evvail.com/2022/08/31/2805.html Wed, 31 Aug 2022 03:58:00 +0000

The post "NCBI Datasets usage guide" first appeared on Omics - Hunter.
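NCBI is where we routinely download genomes and look up gene information, and its interface has been improving steadily. NCBI's new Datasets tool makes life even easier for bioinformaticians. It offers access in three main ways:

1) Web access that is easier to use, with a better search and download experience: NCBI Datasets website

2) Command-line access: Command-line tools

3) Programmatic access through an API: API

Below is a brief introduction to the command-line tools:

1. Download and installation

The datasets tool gives access to data across NCBI resources, and dataformat converts the results between formats.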

System | Architecture | Download
Linux | AMD64 | datasets, dataformat
macOS | Universal | datasets, dataformat
Windows (64-bit) | AMD64 | datasets, dataformat
Linux | ARM64 | datasets, dataformat
Linux | ARM (32-bit) | datasets, dataformat

Installation through conda is also supported:

# Create the ncbi_datasets environment and install ncbi-datasets-cli
conda create -n ncbi_datasets -c conda-forge ncbi-datasets-cli
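Both tools need to be on PATH before the examples will run, which with conda means the environment has to be activated first. A small guard like the sketch below makes that failure explicit; require_tools is a hypothetical helper name and nothing in it is specific to NCBI Datasets.

```shell
# require_tools NAME... (hypothetical helper)
# Reports every command that is not on PATH; returns 1 if any is missing.
require_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   require_tools datasets dataformat || echo "activate the ncbi_datasets environment first"
```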

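2. Examples

# Download gene data and save it as example_gene_data_package.zip
datasets download gene gene-id 1,2,3,9,10,11,12,13,14,15,16,17 --filename example_gene_data_package.zip
# List the files in the package (-Z1 prints names only; run plain `unzip` to extract)
unzip -Z1 example_gene_data_package.zip
# Convert the gene report to TSV and show the first 10 lines
dataformat tsv gene --fields gene-id,symbol,transcript-name --package example_gene_data_package.zip | head --lines=10

Format conversion also supports Excel output and more; feel free to download the tools and try the more advanced usage.

References: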

1.https://www.ncbi.nlm.nih.gov/datasets/docs/v1/getting_started/
